Visit www.arxivdaily.com for daily digests with abstracts, covering CS | Physics | Math | Economics | Statistics | Finance | Biology | Electrical Engineering, plus search, favorites, posting, and more!
cs.LG: 117 papers today
Graph (graph learning | graph neural networks | graph optimization, etc.) (8 papers)
【1】 NetFense: Adversarial Defenses against Privacy Attacks on Neural Networks for Graph Data
Authors: I-Chung Hsieh, Cheng-Te Li Affiliations: Institute of Data Science, National Cheng Kung University Link: https://arxiv.org/abs/2106.11865
Abstract: Recent advances in protecting node privacy on graph data and in attacking graph neural networks (GNNs) have gained much attention, yet no prior work brings these two essential tasks together. Imagine an adversary can utilize powerful GNNs to infer users' private labels in a social network. How can we adversarially defend against such privacy attacks while maintaining the utility of perturbed graphs? In this work, we propose a novel research task, adversarial defenses against GNN-based privacy attacks, and present a graph perturbation-based approach, NetFense, to achieve the goal. NetFense can simultaneously keep graph data unnoticeability (i.e., make only limited changes to the graph structure), maintain the prediction confidence of targeted label classification (i.e., preserve data utility), and reduce the prediction confidence of private label classification (i.e., protect the privacy of nodes). Experiments on single- and multiple-target perturbations using three real graph datasets show that the graphs perturbed by NetFense effectively maintain data utility (i.e., model unnoticeability) on targeted label classification and significantly decrease the prediction confidence of private label classification (i.e., privacy protection). Extensive studies also bring several insights, such as the flexibility of NetFense, the preservation of local neighborhoods under data unnoticeability, and better privacy protection for high-degree nodes.
【2】 Towards Automated Evaluation of Explanations in Graph Neural Networks
Authors: Vanya BK, Balaji Ganesan, Aniket Saxena, Devbrat Sharma, Arvind Agarwal Notes: 5 pages, 4 figures, XAI Workshop at ICML 2021 Link: https://arxiv.org/abs/2106.11864
Abstract: Explaining Graph Neural Network predictions to end users of AI applications in easily understandable terms remains an unsolved problem. In particular, we do not have well-developed methods for automatically evaluating explanations in ways that are closer to how users consume them. Based on recent application trends and our own experiences with real-world problems, we propose automatic evaluation approaches for GNN explanations.
【3】 Graph coarsening: From scientific computing to machine learning
Authors: Jie Chen, Yousef Saad, Zechen Zhang Affiliations: University of Minnesota Link: https://arxiv.org/abs/2106.11863
Abstract: The general method of graph coarsening or graph reduction has been a remarkably useful and ubiquitous tool in scientific computing, and it is now just starting to have a similar impact in machine learning. The goal of this paper is to take a broad look at coarsening techniques that have been successfully deployed in scientific computing and to see how similar principles are finding their way into more recent applications related to machine learning. In scientific computing, coarsening plays a central role in algebraic multigrid methods as well as the related class of multilevel incomplete LU factorizations. In machine learning, graph coarsening goes under various names, e.g., graph downsampling or graph reduction. Its goal in most cases is to replace some original graph by one which has fewer nodes, but whose structure and characteristics are similar to those of the original graph. As will be seen, a common strategy in these methods is to rely on spectral properties to define the coarse graph.
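As a concrete illustration of the spectral strategy mentioned in the abstract, here is a minimal numpy sketch of one coarsening level: bipartition the nodes by the sign of the Fiedler vector (the Laplacian eigenvector with the second-smallest eigenvalue) and aggregate edge weights accordingly. This is an illustrative toy of my own construction, not one of the multilevel schemes the paper surveys.
```python
import numpy as np

def spectral_coarsen(A):
    """One coarsening level: split nodes by the sign of the Fiedler
    vector of the graph Laplacian, then aggregate edge weights."""
    L = np.diag(A.sum(axis=1)) - A            # combinatorial Laplacian
    _, eigvecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                   # second-smallest eigenvector
    assign = (fiedler >= 0).astype(int)       # cluster 0 or 1 per node
    P = np.eye(2)[assign]                     # one-hot assignment matrix
    A_coarse = P.T @ A @ P                    # summed inter-cluster weights
    np.fill_diagonal(A_coarse, 0)             # drop self-loops
    return A_coarse, assign

# Toy graph: two triangles joined by a single edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
print(spectral_coarsen(A))  # the two triangles become the two coarse nodes
```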
【4】 A Deep Latent Space Model for Graph Representation Learning
Authors: Hanxuan Yang, Qingchao Kong, Wenji Mao Link: https://arxiv.org/abs/2106.11721
Abstract: Graph representation learning is a fundamental problem for modeling relational data and benefits a number of downstream applications. Traditional Bayesian-based graph models and recent deep-learning-based GNNs suffer from impracticability or a lack of interpretability, respectively, and combined models for undirected graphs have been proposed to overcome these weaknesses. As a large portion of real-world graphs are directed graphs (of which undirected graphs are special cases), in this paper we propose a Deep Latent Space Model (DLSM) for directed graphs to incorporate the traditional latent-variable-based generative model into deep learning frameworks. Our proposed model consists of a graph convolutional network (GCN) encoder and a stochastic decoder, which are layer-wise connected by a hierarchical variational auto-encoder architecture. By specifically modeling degree heterogeneity using node random factors, our model possesses better interpretability in both community structure and degree heterogeneity. For fast inference, stochastic gradient variational Bayes (SGVB) is adopted using a non-iterative recognition model, which is much more scalable than traditional MCMC-based methods. Experiments on real-world datasets show that the proposed model achieves state-of-the-art performance on both link prediction and community detection tasks while learning interpretable node embeddings. The source code is available at https://github.com/upperr/DLSM.
【5】 A Vertical Federated Learning Framework for Graph Convolutional Network
Authors: Xiang Ni, Xiaolong Xu, Lingjuan Lyu, Changhua Meng, Weiqiang Wang Affiliations: Ant Group Link: https://arxiv.org/abs/2106.11593
Abstract: Recently, Graph Neural Networks (GNNs) have achieved remarkable success on various real-world problems involving graph data. However, in most industries data exists in the form of isolated islands, and data privacy and security are important concerns. In this paper, we propose FedVGCN, a federated GCN learning paradigm for the privacy-preserving node classification task under a vertically partitioned data setting, which can be generalized to existing GCN models. Specifically, we split the computation graph data into two parts. For each iteration of the training process, the two parties transfer intermediate results to each other under homomorphic encryption. We conduct experiments on benchmark data, and the results demonstrate the effectiveness of FedVGCN in the case of GraphSage.
【6】 Continuous-Depth Neural Models for Dynamic Graph Prediction
Authors: Michael Poli, Stefano Massaroli, Clayton M. Rabideau, Junyoung Park, Atsushi Yamashita, Hajime Asama, Jinkyoo Park Affiliations: The University of Tokyo; Syntensor Notes: Extended version of the workshop paper "Graph Neural Ordinary Differential Equations". arXiv admin note: substantial text overlap with arXiv:1911.07532 Link: https://arxiv.org/abs/2106.11581
Abstract: We introduce the framework of continuous-depth graph neural networks (GNNs). Neural graph differential equations (Neural GDEs) are formalized as the counterpart to GNNs where the input-output relationship is determined by a continuum of GNN layers, blending discrete topological structures and differential equations. The proposed framework is shown to be compatible with static GNN models and is extended to dynamic and stochastic settings through hybrid dynamical system theory. Here, Neural GDEs improve performance by exploiting the underlying dynamics geometry, further introducing the ability to accommodate irregularly sampled data. Results prove the effectiveness of the proposed models across applications such as traffic forecasting and prediction in genetic regulatory networks.
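The continuous-depth idea can be made concrete in a few lines of numpy: treat one graph-convolution step as the right-hand side of an ODE over node features and integrate it, here with explicit Euler steps as a stand-in for the adaptive solvers used in practice. The function names and the tanh vector field are illustrative assumptions, not the paper's architecture.
```python
import numpy as np

def gnn_vector_field(H, A_hat, W):
    """ODE right-hand side dH/dt = tanh(A_hat @ H @ W),
    i.e. a single graph-convolution step over node features H."""
    return np.tanh(A_hat @ H @ W)

def neural_gde_forward(H0, A_hat, W, t_span=1.0, n_steps=20):
    """Integrate node features through a continuum of GNN 'layers'."""
    H, dt = H0, t_span / n_steps
    for _ in range(n_steps):
        H = H + dt * gnn_vector_field(H, A_hat, W)   # explicit Euler step
    return H

rng = np.random.default_rng(0)
n, d = 5, 4
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 1.0)                             # add self-loops
D = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_hat = D @ A @ D                                    # normalized adjacency
H0 = rng.normal(size=(n, d))
W = 0.1 * rng.normal(size=(d, d))
print(neural_gde_forward(H0, A_hat, W).shape)        # (5, 4)
```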
【7】 Graph Routing between Capsules
Authors: Yang Li, Wei Zhao, Erik Cambria, Suhang Wang, Steffen Eger Affiliations: Northwestern Polytechnical University, China; Nanyang Technological University, Singapore; Technical University of Darmstadt, Germany; Pennsylvania State University, USA Link: https://arxiv.org/abs/2106.11531
Abstract: Routing methods in capsule networks often learn a hierarchical relationship for capsules in successive layers, but the intra-relation between capsules in the same layer is less studied, even though this intra-relation is a key factor for semantic understanding in text data. Therefore, in this paper we introduce a new capsule network with graph routing to learn both relationships, where capsules in each layer are treated as the nodes of a graph. We investigate strategies for deriving the adjacency and degree matrices from a layer of capsules using three different distances, and propose a graph routing mechanism between those capsules. We validate our approach on five text classification datasets, and our findings suggest that the approach combining bottom-up routing and top-down attention performs best. Such an approach also demonstrates generalization capability across datasets. Compared to state-of-the-art routing methods, the accuracy improvements on the five datasets we used were 0.82, 0.39, 0.07, 1.01, and 0.02, respectively.
【8】 ConvDySAT: Deep Neural Representation Learning on Dynamic Graphs via Self-Attention and Convolutional Neural Networks
Authors: Ahmad Hafez, Atulya Praphul, Yousef Jaradt, Ezani Godwin Affiliations: Student research project (supervisor: Ahmed Rashed), M.Sc. in Data Analytics, Universität Hildesheim Link: https://arxiv.org/abs/2106.11430
Abstract: Learning node representations on temporal graphs is a fundamental step toward learning real-world dynamic graphs efficiently. Real-world graphs continuously evolve over time, with changing edge weights, nodes being removed and added, and edges appearing and disappearing, whereas previous graph representation learning methods generally focused on static graphs. We present ConvDySAT as an enhancement of DySAT, one of the state-of-the-art dynamic methods, by augmenting convolutional neural networks with the self-attention mechanism, the method employed in DySAT to express structural and temporal evolution. We conducted single-step link prediction on a communication network and a rating network. Experimental results show significant performance gains for ConvDySAT over various state-of-the-art methods.
Transformers (4 papers)
【1】 MODETR: Moving Object Detection with Transformers
Authors: Eslam Mohamed, Ahmad El-Sallab Affiliations: Deep Learning Research, Valeo R&D Cairo, Egypt Link: https://arxiv.org/abs/2106.11422
Abstract: Moving Object Detection (MOD) is a crucial task for the autonomous driving pipeline. MOD is usually handled via two-stream convolutional architectures that incorporate both appearance and motion cues, without considering the inter-relations between the spatial and motion features. In this paper, we tackle this problem through multi-head attention mechanisms across both the spatial and motion streams. We propose MODETR, a Moving Object DEtection TRansformer network comprised of multi-stream transformer encoders for the spatial and motion modalities and an object transformer decoder that produces the moving-object bounding boxes using set predictions. The whole architecture is trained end-to-end using a bipartite loss. Several methods of incorporating motion cues into the Transformer model are explored, including two-stream RGB and Optical Flow (OF) methods, and multi-stream architectures that take advantage of sequence information. To incorporate temporal information, we propose a new Temporal Positional Encoding (TPE) approach that extends the Spatial Positional Encoding (SPE) in DETR, and we explore two architectural choices for it, balancing between speed and time. To evaluate our network, we perform the MOD task on the KITTI MOD [6] dataset. Results show a significant 5% mAP improvement of the Transformer network for MOD over the state-of-the-art methods. Moreover, the proposed TPE encoding provides a 10% mAP improvement over the SPE baseline.
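The abstract does not spell out the TPE construction, so the following numpy sketch is only one plausible reading: keep DETR-style sinusoidal codes for the spatial token index and concatenate an analogous sinusoidal code for the frame index. All names and the concatenation scheme are assumptions made for illustration.
```python
import numpy as np

def sinusoidal_encoding(positions, d):
    """DETR/Transformer-style sine-cosine encoding of a 1-D index."""
    freqs = 1.0 / (10000 ** (2 * np.arange(d // 2) / d))
    angles = positions[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

def spatio_temporal_encoding(n_tokens, n_frames, d_model):
    """Hypothetical TPE: half the channels encode the spatial token
    index (SPE), the other half encode the frame index (TPE)."""
    spe = sinusoidal_encoding(np.arange(n_tokens), d_model // 2)
    tpe = sinusoidal_encoding(np.arange(n_frames), d_model // 2)
    spe = np.broadcast_to(spe, (n_frames, n_tokens, d_model // 2))
    tpe = np.broadcast_to(tpe[:, None, :], (n_frames, n_tokens, d_model // 2))
    return np.concatenate([spe, tpe], axis=-1)

print(spatio_temporal_encoding(100, 4, 256).shape)  # (4, 100, 256)
```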
【2】 Spatio-Temporal Multi-Task Learning Transformer for Joint Moving Object Detection and Segmentation
Authors: Eslam Mohamed, Ahmed El-Sallab Affiliations: Valeo R&D Cairo, Egypt Link: https://arxiv.org/abs/2106.11401
Abstract: Moving objects have special importance for autonomous driving tasks. Detecting moving objects can be posed as Moving Object Segmentation, by segmenting the object pixels, or as Moving Object Detection, by generating a bounding box for the moving targets. In this paper, we present a Multi-Task Learning architecture, based on Transformers, to jointly perform both tasks through one network. Due to the importance of motion features to the task, the whole setup is based on spatio-temporal aggregation. We evaluate the performance of the individual-task architectures versus the MTL setup, both with early shared encoders and with late shared encoder-decoder transformers. For the latter, we present a novel joint-tasks query decoder transformer that enables us to have task-dedicated heads out of the shared model. To evaluate our approach, we use the KITTI MOD [29] dataset. Results show a 1.5% mAP improvement for Moving Object Detection and a 2% IoU improvement for Moving Object Segmentation over the individual-task networks.
【3】 Hi-BEHRT: Hierarchical Transformer-based model for accurate prediction of clinical events using multimodal longitudinal electronic health records
Authors: Yikuan Li, Mohammad Mamouei, Gholamreza Salimi-Khorshidi, Shishir Rao, Abdelaali Hassaine, Dexter Canoy, Thomas Lukasiewicz, Kazem Rahimi Link: https://arxiv.org/abs/2106.11360
Abstract: Electronic health records represent a holistic overview of patients' trajectories. Their increasing availability has fueled new hopes of leveraging them to develop accurate risk prediction models for a wide range of diseases. Given the complex interrelationships of medical records and patient outcomes, deep learning models have shown clear merits in achieving this goal. However, a key limitation of these models remains their capacity for processing long sequences. Capturing the whole history of medical encounters is expected to lead to more accurate predictions, but the inclusion of records collected over decades and from multiple resources can inevitably exceed the receptive field of existing deep learning architectures. This can result in missing crucial long-term dependencies. To address this gap, we present Hi-BEHRT, a hierarchical Transformer-based model that can significantly expand the receptive field of Transformers and extract associations from much longer sequences. Using multimodal large-scale linked longitudinal electronic health records, Hi-BEHRT exceeds the state-of-the-art BEHRT by 1% to 5% in area under the receiver operating characteristic (AUROC) curve and 3% to 6% in area under the precision-recall (AUPRC) curve on average, and by 3% to 6% (AUROC) and 3% to 11% (AUPRC) for patients with long medical histories, on 5-year risk prediction of heart failure, diabetes, chronic kidney disease, and stroke. Additionally, because pretraining for hierarchical Transformers is not well established, we provide an effective end-to-end contrastive pre-training strategy for Hi-BEHRT using EHR, improving its transferability for predicting clinical events with relatively small training datasets.
【4】 Transformer-based Spatial-Temporal Feature Learning for EEG Decoding
Authors: Yonghao Song, Xueyu Jia, Lie Yang, Longhan Xie Affiliations: Shien-Ming Wu School of Intelligent Engineering, South China University of Technology Notes: 10 pages, 6 figures Link: https://arxiv.org/abs/2106.11170
Abstract: At present, methods based on convolutional neural networks (CNNs) are usually used to decode electroencephalography (EEG). However, CNNs have limitations in perceiving global dependencies, which is not adequate for common EEG paradigms with strong overall relationships. Regarding this issue, we propose a novel EEG decoding method that relies mainly on the attention mechanism. The EEG data is first preprocessed and spatially filtered. Then, attention transforming is applied on the feature-channel dimension so that the model can enhance more relevant spatial features. The most crucial step is to slice the data in the time dimension for attention transforming, finally obtaining a highly distinguishable representation. At this point, global average pooling and a simple fully-connected layer are used to classify different categories of EEG data. Experiments on two public datasets indicate that the strategy of attention transforming effectively utilizes spatial and temporal features, and we reach the state-of-the-art level in multi-classification of EEG with fewer parameters. As far as we know, this is the first time a detailed and complete method based on the Transformer idea has been proposed in this field. It has good potential to promote the practical application of brain-computer interfaces (BCI). The source code can be found at: https://github.com/anranknight/EEG-Transformer.
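The core operation behind the attention transforming described above is standard scaled dot-product self-attention; the minimal numpy sketch below shows that operation applied to a toy sequence of temporal EEG slices. It is not the paper's full pipeline (preprocessing, spatial filtering, and the channel/time transforms are omitted), and all shapes are illustrative.
```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(-1, keepdims=True))  # stable softmax
    weights /= weights.sum(-1, keepdims=True)
    return weights @ V

# EEG-style toy input: 8 temporal slices, 32 features per slice.
rng = np.random.default_rng(1)
x = rng.normal(size=(8, 32))
out = scaled_dot_product_attention(x, x, x)  # self-attention across slices
print(out.shape)                             # (8, 32)
```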
GAN | Adversarial | Attacks | Generation (5 papers)
【1】 Multiple Organ Failure Prediction with Classifier-Guided Generative Adversarial Imputation Networks
Authors: Xinlu Zhang, Yun Zhao, Rachael Callcut, Linda Petzold Affiliations: Department of Computer Science, University of California, Santa Barbara, USA; UC Davis Health, Davis, USA Notes: BioKDD Link: https://arxiv.org/abs/2106.11878
Abstract: Multiple organ failure (MOF) is a severe syndrome with a high mortality rate among Intensive Care Unit (ICU) patients. Early and precise detection is critical for clinicians to make timely decisions. An essential challenge in applying machine learning models to electronic health records (EHRs) is the pervasiveness of missing values. Most existing imputation methods are confined to the data preprocessing phase, failing to capture the relationship between data and outcome for downstream predictions. In this paper, we propose classifier-guided generative adversarial imputation networks (Classifier-GAIN) for MOF prediction to bridge this gap, by incorporating both observed data and label information. Specifically, the classifier takes imputed values from the generator (imputer) to predict task outcomes and provides additional supervision signals to the generator via joint training. The classifier-guided generator imputes missing values with label-awareness during training, improving the classifier's performance during inference. We conduct extensive experiments showing that our approach consistently outperforms classical and state-of-the-art neural baselines across a range of missing-data scenarios and evaluation metrics.
【2】 Self-Supervised Iterative Contextual Smoothing for Efficient Adversarial Defense against Gray- and Black-Box Attack
Authors: Sungmin Cha, Naeun Ko, Youngjoon Yoo, Taesup Moon Affiliations: Department of Electrical and Computer Engineering, Seoul National University; NAVER AI Lab; NAVER Clova Notes: Preprint version Link: https://arxiv.org/abs/2106.11644
Abstract: We propose a novel and effective input-transformation-based adversarial defense method against gray- and black-box attacks, which is computationally efficient and does not require any adversarial training or retraining of a classification model. We first show that a very simple iterative Gaussian smoothing can effectively wash out adversarial noise and achieve substantially high robust accuracy. Based on this observation, we propose Self-Supervised Iterative Contextual Smoothing (SSICS), which aims to reconstruct the original discriminative features from the Gaussian-smoothed image in a context-adaptive manner, while still smoothing out the adversarial noise. Experiments on ImageNet show that our SSICS achieves both high standard accuracy and very competitive robust accuracy for gray- and black-box attacks, e.g., transfer-based PGD attacks and score-based attacks. A noteworthy point is that our defense is free of computationally expensive adversarial training, yet it can approach that robust accuracy via input transformation alone.
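The first stage of this defense, iterative Gaussian smoothing of the input, is easy to sketch; assuming an HxWxC float image and SciPy, it could look like the following. The SSICS reconstruction network itself is not shown, and the classifier is an unmodified, pre-trained model.
```python
import numpy as np
from scipy.ndimage import gaussian_filter

def iterative_gaussian_smoothing(image, sigma=1.0, n_iters=5):
    """Repeatedly blur an input to wash out adversarial noise before
    classification; no retraining of the classifier is needed."""
    out = image
    for _ in range(n_iters):
        out = gaussian_filter(out, sigma=(sigma, sigma, 0))  # HxWxC, no blur across channels
    return out

x = np.random.rand(32, 32, 3).astype(np.float32)  # stand-in for a (possibly attacked) image
x_smooth = iterative_gaussian_smoothing(x)
# logits = classifier(x_smooth)  # classifier unchanged; SSICS would reconstruct features here
print(x_smooth.shape)
```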
【3】 On Adversarial Robustness of Synthetic Code Generation
Authors: Mrinal Anand, Pratik Kayal, Mayank Singh Affiliations: Indian Institute of Technology Gandhinagar, India Link: https://arxiv.org/abs/2106.11629
Abstract: Automatic code synthesis from natural language descriptions is a challenging task. We have witnessed massive progress in recent years in developing code generation systems for domain-specific languages (DSLs) employing sequence-to-sequence deep learning techniques. In this paper, we specifically experiment with AlgoLisp DSL-based generative models and showcase the existence of significant dataset bias through different classes of adversarial examples. We also experiment with two variants of Transformer-based models that outperform all existing AlgoLisp DSL-based code generation baselines. Consistent with current state-of-the-art systems, our proposed models, too, achieve poor performance under adversarial settings. Therefore, we propose several dataset augmentation techniques to reduce bias and showcase their efficacy using robust experimentation.
【4】 Particle Cloud Generation with Message Passing Generative Adversarial Networks
Authors: Raghav Kansal, Javier Duarte, Hao Su, Breno Orzari, Thiago Tomei, Maurizio Pierini, Mary Touranakou, Jean-Roch Vlimant, Dimitrios Gunopulos Affiliations: University of California San Diego, La Jolla, CA, USA; Universidade Estadual Paulista, São Paulo, Brazil; European Organization for Nuclear Research (CERN), Geneva, Switzerland; California Institute of Technology, Pasadena, CA, USA Notes: 13 pages, 4 figures, 2 tables, and a 3-page appendix Link: https://arxiv.org/abs/2106.11535
Abstract: In high energy physics (HEP), jets are collections of correlated particles produced ubiquitously in particle collisions such as those at the CERN Large Hadron Collider (LHC). Machine-learning-based generative models, such as generative adversarial networks (GANs), have the potential to significantly accelerate LHC jet simulations. However, despite jets having a natural representation as a set of particles in momentum space, a.k.a. a particle cloud, to our knowledge no generative models have been applied to such a dataset. We introduce a new particle cloud dataset (JetNet) and, due to the similarities between particle and point clouds, apply existing point cloud GANs to it. Results are evaluated using (1) the 1-Wasserstein distance between high- and low-level feature distributions, (2) a newly developed Fréchet ParticleNet Distance, (3) the coverage metric, and (4) the minimum matching distance metric. Existing GANs are found to be inadequate for physics applications, hence we develop a new message passing GAN (MPGAN), which outperforms existing point cloud GANs on virtually every metric and shows promise for use in HEP. We propose JetNet as a novel point-cloud-style dataset for the machine learning community to experiment with, and set MPGAN as a benchmark to improve upon for future generative models.
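To give a flavour of message passing over a particle cloud, here is a toy permutation-equivariant layer in numpy: every particle aggregates messages from all others and updates its features. The layer sizes, activations, and mean aggregation are my own illustrative assumptions; MPGAN's actual generator architecture differs in its details.
```python
import numpy as np

def message_passing_layer(X, W_msg, W_upd):
    """Fully-connected message passing over a set of particles:
    each particle aggregates messages from all others, then updates."""
    n = X.shape[0]
    msgs = np.maximum(X @ W_msg.T, 0.0)               # per-particle message, ReLU
    agg = (msgs.sum(0, keepdims=True) - msgs) / (n - 1)  # mean over j != i
    return np.tanh(np.concatenate([X, agg], axis=1) @ W_upd.T)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))              # 30 particles, 3 momentum-like features
W_msg = 0.3 * rng.normal(size=(16, 3))    # message weights
W_upd = 0.3 * rng.normal(size=(3, 19))    # update weights (3 + 16 inputs)
print(message_passing_layer(X, W_msg, W_upd).shape)  # (30, 3), permutation-equivariant
```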
【5】 f-Domain-Adversarial Learning: Theory and Algorithms
Authors: David Acuna, Guojun Zhang, Marc T. Law, Sanja Fidler Affiliations: NVIDIA; University of Toronto; Vector Institute; University of Waterloo Notes: ICML 2021 Link: https://arxiv.org/abs/2106.11344
Abstract: Unsupervised domain adaptation is used in many machine learning applications where, during training, a model has access to unlabeled data in the target domain and a related labeled dataset. In this paper, we introduce a novel and general domain-adversarial framework. Specifically, we derive a novel generalization bound for domain adaptation that exploits a new measure of discrepancy between distributions based on a variational characterization of f-divergences. It recovers the theoretical results of Ben-David et al. (2010a) as a special case and supports divergences used in practice. Based on this bound, we derive a new algorithmic framework that introduces a key correction to the original adversarial training method of Ganin et al. (2016). We show that many regularizers and ad-hoc objectives introduced in this framework over the last years are then not required to achieve performance comparable to (if not better than) state-of-the-art domain-adversarial methods. Experimental analysis conducted on real-world natural language and computer vision datasets shows that our framework outperforms existing baselines and obtains the best results for f-divergences that were not previously considered in domain-adversarial learning.
Semi-/Weakly/Un-/Supervised | Uncertainty | Active Learning (5 papers)
【1】 MEAL: Manifold Embedding-based Active Learning
Authors: Deepthi Sreenivasaiah, Thomas Wollmann Affiliations: Merantix Labs GmbH, Berlin, Germany Link: https://arxiv.org/abs/2106.11858
Abstract: Image segmentation is a common and challenging task in autonomous driving. The availability of sufficient pixel-level annotations for the training data is a hurdle. Active learning helps learning from small amounts of data by suggesting the most promising samples for labeling. In this work, we propose a new pool-based method for active learning that proposes promising image regions in each acquisition step. The problem is framed in an exploration-exploitation setting by combining an embedding based on Uniform Manifold Approximation, to model representativeness, with entropy as an uncertainty measure, to model informativeness. We applied our proposed method to the challenging autonomous driving datasets CamVid and Cityscapes and performed a quantitative comparison with state-of-the-art methods. We find that our active learning method achieves better performance on CamVid compared to other methods, while on Cityscapes the performance lift was negligible.
【2】 Active Learning under Pool Set Distribution Shift and Noisy Data
Authors: Andreas Kirsch, Tom Rainforth, Yarin Gal Affiliations: Department of Computer Science Link: https://arxiv.org/abs/2106.11719
Abstract: Active learning is essential for more label-efficient deep learning. Bayesian active learning has focused on BALD, which reduces model parameter uncertainty. However, we show that BALD gets stuck on out-of-distribution or junk data that is not relevant to the task. We examine a novel Expected Predictive Information Gain (EPIG) to deal with distribution shifts of the pool set. EPIG reduces the uncertainty of predictions on an unlabelled evaluation set sampled from the test data distribution, whose distribution might differ from the pool set distribution. Based on this, our new EPIG-BALD acquisition function for Bayesian neural networks selects samples that improve performance on the test data distribution, rather than samples that reduce model uncertainty everywhere, including in out-of-distribution regions with low density under the test data distribution. Our method outperforms state-of-the-art Bayesian active learning methods on high-dimensional datasets and avoids out-of-distribution junk data in cases where the current state-of-the-art methods fail.
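For context, the BALD score that this paper starts from is the mutual information between predictions and model parameters, estimated from an ensemble of stochastic forward passes (e.g., MC dropout). The numpy sketch below computes that baseline score; EPIG additionally evaluates predictive uncertainty on a held-out evaluation set drawn from the test distribution, which is not shown here.
```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    return -(p * np.log(p + eps)).sum(axis=axis)

def bald_score(probs):
    """BALD mutual information I[y; w | x] from sampled predictions.

    probs: (n_samples, n_points, n_classes), e.g. MC-dropout passes.
    """
    mean_p = probs.mean(axis=0)                 # marginal predictive distribution
    predictive_entropy = entropy(mean_p)        # H[y | x, D]
    expected_entropy = entropy(probs).mean(0)   # E_w H[y | x, w]
    return predictive_entropy - expected_entropy

# 20 stochastic passes over 5 pool points with 10 classes (toy data).
probs = np.random.dirichlet(np.ones(10), size=(20, 5))
print(bald_score(probs))  # one score per pool point; acquire the argmax
```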
【3】 Recent Deep Semi-supervised Learning Approaches and Related Works
Authors: Gyeongho Kim Affiliations: Department of Industrial Engineering Link: https://arxiv.org/abs/2106.11528
Abstract: This work presents an overview of recent semi-supervised learning approaches and related works. Despite the remarkable success of neural networks in various applications, several formidable constraints remain, including the need for large amounts of labeled data. Therefore, semi-supervised learning, a learning scheme in which scarce labels and a larger amount of unlabeled data are utilized to train models (e.g., deep neural networks), is becoming more important. Based on the key assumptions of semi-supervised learning, namely the manifold assumption, cluster assumption, and continuity assumption, the work reviews recent semi-supervised learning approaches. In particular, methods that use deep neural networks in a semi-supervised learning setting are primarily discussed. The existing works are first classified based on their underlying ideas and explained, and then holistic approaches that unify these ideas are detailed.
【4】 Credal Self-Supervised Learning
Authors: Julian Lienen, Eyke Hüllermeier Affiliations: Department of Computer Science, Paderborn University, Germany; Institute of Informatics, University of Munich (LMU), Germany Notes: 17 pages, 1 figure, 7 tables Link: https://arxiv.org/abs/2106.11853
Abstract: Self-training is an effective approach to semi-supervised learning. The key idea is to let the learner itself iteratively generate "pseudo-supervision" for unlabeled instances based on its current hypothesis. In combination with consistency regularization, pseudo-labeling has shown promising performance in various domains, for example in computer vision. To account for the hypothetical nature of the pseudo-labels, these are commonly provided in the form of probability distributions. Still, one may argue that even a probability distribution represents an excessive level of informedness, as it suggests that the learner precisely knows the ground-truth conditional probabilities. In our approach, we therefore allow the learner to label instances in the form of credal sets, that is, sets of (candidate) probability distributions. Thanks to this increased expressiveness, the learner is able to represent uncertainty and a lack of knowledge in a more flexible and more faithful manner. To learn from weakly labeled data of that kind, we leverage methods that have recently been proposed in the realm of so-called superset learning. In an exhaustive empirical evaluation, we compare our methodology to state-of-the-art self-supervision approaches, showing competitive to superior performance, especially in low-label scenarios incorporating a high degree of uncertainty.
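The abstract does not give the exact credal-set construction, so the sketch below uses one standard recipe, a linear-vacuous mixture: the pseudo-label becomes the set of all distributions (1 - alpha) * p + alpha * q for arbitrary q, summarized here by per-class lower/upper probability bounds. Treat it purely as an illustration of labeling with credal sets rather than the paper's method.
```python
import numpy as np

def credal_pseudo_label(probs, alpha=0.3):
    """Relax a point prediction into a credal set via linear-vacuous
    mixing; returned as per-class lower and upper probability bounds."""
    lower = (1.0 - alpha) * probs   # mass the learner commits to
    upper = lower + alpha           # remaining mass could go to any class
    return lower, upper

probs = np.array([0.7, 0.2, 0.1])   # the learner's current point prediction
lo, up = credal_pseudo_label(probs)
print(lo, up)  # class 0 probability is only claimed to lie in [0.49, 0.79]
```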
【5】 Improving Ultrasound Tongue Image Reconstruction from Lip Images Using Self-supervised Learning and Attention Mechanism
Authors: Haiyang Liu, Jihan Zhang Affiliations: Graduate School of Information Science and Technology, The University of Tokyo, Japan; School of Mechanical Engineering, Southeast University, Nanjing, China Notes: Accepted in KDD Workshop (BIOKDD 2021) Link: https://arxiv.org/abs/2106.11769
Abstract: Speech production is a dynamic procedure involving multiple human organs, including the tongue, jaw, and lips. Modeling the dynamics of vocal tract deformation is a fundamental problem in understanding speech, the most common means of daily human communication. Researchers employ several sensory streams to describe the process simultaneously, and these streams are incontrovertibly statistically related to one another. In this paper, we address the following question: given an observable image sequence of lips, can we depict the corresponding tongue motion? We formulate this problem as a self-supervised learning problem and employ a two-stream convolutional network and a long short-term memory network for the learning task, with an attention mechanism. We evaluate the performance of the proposed method by leveraging unlabeled lip videos to predict an upcoming ultrasound tongue image sequence. The results show that our model is able to generate images close to real ultrasound tongue images, achieving a matching between the two imaging modalities.
Transfer | Zero/Few/One-Shot | Adaptation (3 papers)
【1】 The Hitchhiker's Guide to Prior-Shift Adaptation
Authors: Tomas Sipka, Milan Sulc, Jiri Matas Affiliations: Dept. of Cybernetics, Czech Technical University in Prague Notes: 16 pages, 7 figures Link: https://arxiv.org/abs/2106.11695
Abstract: In many computer vision classification tasks, class priors at test time often differ from the priors on the training set. In the case of such prior shift, classifiers must be adapted correspondingly to maintain close-to-optimal performance. This paper analyzes methods for adapting probabilistic classifiers to new priors and for estimating new priors on an unlabeled test set. We propose a novel method to address a known issue of prior estimation methods based on confusion matrices, where inconsistent estimates of decision probabilities and confusion matrices lead to negative values in the estimated priors. Experiments on fine-grained image classification datasets provide insight into the best practice of prior-shift estimation and classifier adaptation, and show that the proposed method achieves state-of-the-art results in prior adaptation. Applying the best practice to two tasks with naturally imbalanced priors, learning from web-crawled images and plant species classification, increased recognition accuracy by 1.1% and 3.4%, respectively.
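The basic adaptation analyzed here has a one-line form: rescale the classifier's posteriors by the ratio of new to training priors and renormalize. A minimal numpy sketch follows; the paper's contribution, a consistent confusion-matrix-based estimator of the new priors, is not shown.
```python
import numpy as np

def adapt_to_new_priors(posteriors, train_priors, test_priors):
    """Re-weight classifier posteriors under prior shift:
    p_new(y|x) proportional to p(y|x) * pi_new(y) / pi_train(y)."""
    adapted = posteriors * (test_priors / train_priors)
    return adapted / adapted.sum(axis=-1, keepdims=True)

p = np.array([[0.6, 0.3, 0.1]])        # classifier output for one input
pi_train = np.array([1/3, 1/3, 1/3])   # balanced training set
pi_test = np.array([0.1, 0.1, 0.8])    # a rare class dominates at test time
print(adapt_to_new_priors(p, pi_train, pi_test))  # class 2 is now favored
```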
【2】 Adaptive Learning Rate and Momentum for Training Deep Neural Networks
Authors: Zhiyong Hao, Yixuan Jiang, Huihua Yu, Hsiao-Dong Chiang Affiliations: Cornell University, Ithaca, NY, USA Link: https://arxiv.org/abs/2106.11548
Abstract: Recent progress in deep learning relies heavily on the quality and efficiency of training algorithms. In this paper, we develop a fast training method motivated by the nonlinear Conjugate Gradient (CG) framework: the Conjugate Gradient with Quadratic line-search (CGQ) method. On the one hand, a quadratic line search determines the step size according to the current loss landscape. On the other hand, the momentum factor is dynamically updated in computing the conjugate gradient parameter (as in Polak-Ribiere). Theoretical results ensuring the convergence of our method in strongly convex settings are developed. Experiments on image classification datasets show that our method yields faster convergence than other local solvers and has better generalization capability (test set accuracy). One major advantage of the proposed method is that tedious hand-tuning of hyperparameters such as the learning rate and momentum is avoided.
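A deterministic toy version of the two ingredients, a Polak-Ribiere momentum factor and a step size from a one-dimensional quadratic fit along the search direction, might look as follows. The paper applies these inside stochastic mini-batch training of deep networks, which this sketch does not attempt.
```python
import numpy as np

def cg_quadratic_linesearch(f, grad, x0, n_iters=50, trial_step=1.0):
    """Nonlinear CG: Polak-Ribiere momentum plus a step size from a
    quadratic fit phi(t) = a t^2 + b t + c along the search direction."""
    x = x0.copy()
    g = grad(x)
    d = -g
    for _ in range(n_iters):
        if g @ g < 1e-18:                     # converged
            break
        b = g @ d                             # phi'(0), negative for descent d
        f0, f1 = f(x), f(x + trial_step * d)
        a = (f1 - f0 - b * trial_step) / trial_step**2
        t = -b / (2 * a) if a > 1e-12 else trial_step  # parabola minimizer
        x = x + t * d
        g_new = grad(x)
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))  # Polak-Ribiere+
        d = -g_new + beta * d
        g = g_new
    return x

f = lambda x: (x[0] - 3) ** 2 + 10 * (x[1] + 1) ** 2
grad = lambda x: np.array([2 * (x[0] - 3), 20 * (x[1] + 1)])
print(cg_quadratic_linesearch(f, grad, np.zeros(2)))  # approx. [3, -1]
```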
【3】 BiAdam: Fast Adaptive Bilevel Optimization Methods
Authors: Feihu Huang, Heng Huang Affiliations: Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, USA Notes: 20 pages, 2 tables Link: https://arxiv.org/abs/2106.11396
Abstract: Bilevel optimization has recently attracted increased interest in machine learning due to its many applications, such as hyper-parameter optimization and policy optimization. Although some methods have recently been proposed to solve bilevel problems, these methods do not consider using adaptive learning rates. To fill this gap, in this paper we propose a class of fast and effective adaptive methods for solving bilevel optimization problems where the outer problem is possibly nonconvex and the inner problem is strongly convex. Specifically, we propose a fast single-loop BiAdam algorithm based on the basic momentum technique, which achieves a sample complexity of $\tilde{O}(\epsilon^{-4})$ for finding an $\epsilon$-stationary point. At the same time, we propose an accelerated version of the BiAdam algorithm (VR-BiAdam) using a variance-reduction technique, which reaches the best-known sample complexity of $\tilde{O}(\epsilon^{-3})$. To further reduce the computation in estimating derivatives, we propose a fast single-loop stochastic approximated BiAdam algorithm (saBiAdam) that avoids the Hessian inverse and still achieves a sample complexity of $\tilde{O}(\epsilon^{-4})$ without large batches. We further present an accelerated version of the saBiAdam algorithm (VR-saBiAdam), which also reaches the best-known sample complexity of $\tilde{O}(\epsilon^{-3})$. We apply the unified adaptive matrices to our methods as in SUPER-ADAM (Huang et al., 2021), which include many types of adaptive learning rates. Moreover, our framework can flexibly use the momentum and variance-reduction techniques. In particular, we provide a useful convergence analysis framework for both constrained and unconstrained bilevel optimization. To the best of our knowledge, we are the first to study adaptive bilevel optimization methods with adaptive learning rates.
Reinforcement Learning (8 papers)
【1】 Off-Policy Reinforcement Learning with Delayed Rewards
Authors: Beining Han, Zhizhou Ren, Zuofan Wu, Yuan Zhou, Jian Peng Affiliations: Tsinghua University; University of Illinois at Urbana-Champaign Notes: 24 pages Link: https://arxiv.org/abs/2106.11854
Abstract: We study deep reinforcement learning (RL) algorithms with delayed rewards. In many real-world tasks, instant rewards are often not readily accessible, or are not even defined immediately after the agent performs actions. In this work, we first formally define the environment with delayed rewards and discuss the challenges raised by the non-Markovian nature of such environments. Then, we introduce a general off-policy RL framework with a new Q-function formulation that can handle delayed rewards with theoretical convergence guarantees. For practical tasks with high-dimensional state spaces, we further introduce the HC-decomposition rule of the Q-function in our framework, which naturally leads to an approximation scheme that helps boost training efficiency and stability. We finally conduct extensive experiments to demonstrate the superior performance of our algorithms over existing work and its variants.
【2】 Emphatic Algorithms for Deep Reinforcement Learning
Authors: Ray Jiang, Tom Zahavy, Zhongwen Xu, Adam White, Matteo Hessel, Charles Blundell, Hado van Hasselt Affiliations: Department of Computing Science, University of Alberta Link: https://arxiv.org/abs/2106.11779
Abstract: Off-policy learning allows us to learn about possible policies of behavior from experience generated by a different behavior policy. Temporal difference (TD) learning algorithms can become unstable when combined with function approximation and off-policy sampling; this is known as the "deadly triad". The emphatic temporal difference (ETD($\lambda$)) algorithm ensures convergence in the linear case by appropriately weighting the TD($\lambda$) updates. In this paper, we extend the use of emphatic methods to deep reinforcement learning agents. We show that naively adapting ETD($\lambda$) to popular deep reinforcement learning algorithms, which use forward-view multi-step returns, results in poor performance. We then derive new emphatic algorithms for use in the context of such algorithms, and we demonstrate that they provide noticeable benefits in small problems designed to highlight the instability of TD methods. Finally, we observed improved performance when applying these algorithms at scale to classic Atari games from the Arcade Learning Environment.
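For reference, the linear-case ETD($\lambda$) update that the paper builds on (following Sutton, Mahmood, and White, 2016) weights the eligibility trace by a follow-on/emphasis term:
```latex
\begin{aligned}
F_t &= \rho_{t-1}\,\gamma\,F_{t-1} + i(S_t), \qquad F_0 = i(S_0),\\
M_t &= \lambda\, i(S_t) + (1-\lambda)\,F_t,\\
e_t &= \rho_t\bigl(\gamma\lambda\, e_{t-1} + M_t\,\nabla_{w}\hat v(S_t, w_t)\bigr),\\
\delta_t &= R_{t+1} + \gamma\,\hat v(S_{t+1}, w_t) - \hat v(S_t, w_t),\\
w_{t+1} &= w_t + \alpha\,\delta_t\, e_t,
\end{aligned}
```
where $\rho_t$ is the importance-sampling ratio, $i(\cdot)$ an interest function, $F_t$ the follow-on trace, $M_t$ the emphasis that re-weights the TD($\lambda$) update, and $\alpha$ the step size.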
【3】 MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for Cooperative Multi-Agent Reinforcement Learning
Authors: Zhiwei Xu, Dapeng Li, Yunpeng Bai, Guoliang Fan Affiliations: Fusion Innovation Center, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China Notes: 7 pages, 2 figures, 2 tables. Accepted by IJCNN 2021 Link: https://arxiv.org/abs/2106.11652
Abstract: In the real world, many tasks require multiple agents to cooperate with each other under the condition of local observations. To solve such problems, many multi-agent reinforcement learning methods based on Centralized Training with Decentralized Execution have been proposed. One representative class of work is value decomposition, which decomposes the global joint Q-value $Q_{\text{jt}}$ into individual Q-values $Q_a$ to guide individuals' behaviors, e.g., VDN (Value-Decomposition Networks) and QMIX. However, these baselines often ignore the randomness in the situation. We propose MMD-MIX, a method that combines distributional reinforcement learning and value decomposition to alleviate the above weaknesses. Besides, to improve data sampling efficiency, we drew inspiration from REM (Random Ensemble Mixture), a robust RL algorithm, to explicitly introduce randomness into MMD-MIX. The experiments demonstrate that MMD-MIX outperforms prior baselines in the StarCraft Multi-Agent Challenge (SMAC) environment.
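The additive decomposition used by the VDN baseline mentioned above is simple enough to state in code: the joint Q-value is the sum of per-agent Q-values, so per-agent greedy action selection is consistent with joint greedy selection. MMD-MIX replaces this with a distributional, QMIX-style mixing, which is not sketched here.
```python
import numpy as np

def vdn_joint_q(per_agent_q, actions):
    """VDN-style decomposition: Q_jt(s, a) = sum over agents of Q_a(s_a, a_a).

    per_agent_q: list of (n_actions,) arrays, one per agent.
    actions: chosen action index per agent.
    """
    return sum(q[a] for q, a in zip(per_agent_q, actions))

qs = [np.array([1.0, 2.0]), np.array([0.5, -0.5]), np.array([3.0, 0.0])]
greedy = [int(np.argmax(q)) for q in qs]   # per-agent greedy = joint greedy
print(greedy, vdn_joint_q(qs, greedy))     # [1, 0, 0] 5.5
```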
【4】 Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation
Authors: Jiafan He, Dongruo Zhou, Quanquan Gu Notes: 30 pages Link: https://arxiv.org/abs/2106.11612
Abstract: We study reinforcement learning (RL) with linear function approximation. Existing algorithms for this problem have only high-probability regret and/or Probably Approximately Correct (PAC) sample complexity guarantees, which cannot guarantee convergence to the optimal policy. In this paper, in order to overcome this limitation of existing algorithms, we propose a new algorithm called FLUTE, which enjoys uniform-PAC convergence to the optimal policy with high probability. The uniform-PAC guarantee is the strongest possible guarantee for reinforcement learning in the literature: it directly implies both PAC and high-probability regret bounds, making our algorithm superior to all existing algorithms with linear function approximation. At the core of our algorithm are a novel minimax value function estimator and a multi-level partition scheme for selecting training samples from historical observations. Both of these techniques are new and of independent interest.
【5】 Reinforcement learning for PHY layer communications
Authors: Philippe Mary, Visa Koivunen, Christophe Moy Notes: Chapter in "Machine Learning and Wireless Communications", in press Link: https://arxiv.org/abs/2106.11595
Abstract: In this chapter, we give comprehensive examples of applying RL to optimize the physical layer of wireless communications by defining different classes of problems and possible solutions to handle them. In Section 9.2, we present the basic theory needed to address an RL problem, i.e., the Markov decision process (MDP) and the partially observable Markov decision process (POMDP), as well as two very important and widely used RL algorithms, Q-learning and SARSA. We also introduce the deep reinforcement learning (DRL) paradigm, and the section ends with an introduction to the multi-armed bandits (MAB) framework. Section 9.3 focuses on some toy examples to illustrate how the basic concepts of RL are employed in communication systems. We present applications extracted from the literature with simplified system models, using notation similar to that of Section 9.2 of this chapter. In Section 9.3, we also focus on modeling RL problems, i.e., how action and state spaces and rewards are chosen. The chapter concludes in Section 9.4 with prospective thoughts on RL trends, and it ends with a review of the broader state of the art in Section 9.5.
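As a pointer to the material in Section 9.2, the tabular Q-learning update discussed there is reproduced below in a few lines of numpy, dressed with a PHY-layer flavour (states as channel-quality levels, actions as modulation schemes; these labels are illustrative). SARSA differs only in bootstrapping from the action actually taken next instead of the max.
```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    SARSA would use Q[s_next, a_next] for the action actually taken."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Toy setup: 4 channel-quality states, 3 candidate modulation schemes.
Q = np.zeros((4, 3))
Q = q_learning_update(Q, s=2, a=1, r=1.0, s_next=3)
print(Q[2, 1])  # 0.1
```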
【6】 Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations
Authors: Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan Affiliations: Google Research; Cornell University; Tel Aviv University; Courant Institute of Mathematical Sciences Link: https://arxiv.org/abs/2106.11519
Abstract: There have been many recent advances on provably efficient reinforcement learning (RL) in problems with rich observation spaces. However, all these works share a strong realizability assumption about the optimal value function of the true MDP. Such realizability assumptions are often too strong to hold in practice. In this work, we consider the more realistic setting of agnostic RL with rich observation spaces and a fixed class of policies $\Pi$ that may not contain any near-optimal policy. We provide an algorithm for this setting whose error is bounded in terms of the rank $d$ of the underlying MDP. Specifically, our algorithm enjoys a sample complexity bound of $\widetilde{O}\left((H^{4d} K^{3d} \log|\Pi|)/\epsilon^2\right)$, where $H$ is the length of episodes, $K$ is the number of actions, and $\epsilon > 0$ is the desired sub-optimality. We also provide a nearly matching lower bound for this agnostic setting, which shows that the exponential dependence on rank is unavoidable without further assumptions.
【7】 Policy Smoothing for Provably Robust Reinforcement Learning
Authors: Aounon Kumar, Alexander Levine, Soheil Feizi Affiliations: University of Maryland Link: https://arxiv.org/abs/2106.11420
Abstract: The study of provable adversarial robustness for deep neural network (DNN) models has mainly focused on static supervised learning tasks such as image classification. However, DNNs have been used extensively in real-world adaptive tasks such as reinforcement learning (RL), making RL systems vulnerable to adversarial attacks. The key challenge in adversarial RL is that the attacker can adapt itself to the defense strategy used by the agent in previous time steps to strengthen its attack in future steps. In this work, we study the provable robustness of RL against norm-bounded adversarial perturbations of the inputs. We focus on smoothing-based provable defenses and propose policy smoothing, where the agent adds Gaussian noise to its observation at each time step before applying the policy network, to make itself less sensitive to adversarial perturbations of its inputs. Our main theoretical contribution is to prove an adaptive version of the Neyman-Pearson lemma where the adversarial perturbation at a particular time can be a stochastic function of current and previous observations and states, as well as previously observed actions. Using this lemma, we adapt the robustness certificates produced by randomized smoothing in the static setting of image classification to the dynamic setting of RL. We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input. We show that our certificates are tight by constructing a worst-case setting that achieves the bounds derived in our analysis. In our experiments, we show that this method can yield meaningful certificates in complex environments, demonstrating its effectiveness against adversarial attacks.
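The smoothing mechanism itself is a one-liner: perturb each observation with Gaussian noise before the policy acts on it. The sketch below uses a stand-in policy of my own; the certification procedure (bounding the smoothed policy's total reward under norm-bounded perturbations) is the paper's contribution and is not reproduced here.
```python
import numpy as np

def smoothed_policy(policy, obs, sigma=0.1, rng=None):
    """Policy smoothing: add Gaussian noise to the observation at each
    time step before applying the policy network."""
    if rng is None:
        rng = np.random.default_rng()
    return policy(obs + rng.normal(scale=sigma, size=obs.shape))

# Stand-in policy network; in practice this is the trained agent.
policy = lambda o: int(o.sum() > 0)
print(smoothed_policy(policy, np.zeros(4)))
```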
【8】 Interpretable Model-based Hierarchical Reinforcement Learning using Inductive Logic Programming
Authors: Duo Xu, Faramarz Fekri
Affiliations: Department of Electrical and Computer Engineering, Georgia Institute of Technology
Link: https://arxiv.org/abs/2106.11417
Abstract: Recently deep reinforcement learning has achieved tremendous success in wide ranges of applications. However, it notoriously lacks data-efficiency and interpretability. Data-efficiency is important as interacting with the environment is expensive. Further, interpretability can increase the transparency of the black-box-style deep RL models and hence gain trust from the users. In this work, we propose a new hierarchical framework via symbolic RL, leveraging a symbolic transition model to improve the data-efficiency and introduce the interpretability for learned policy. This framework consists of a high-level agent, a subtask solver and a symbolic transition model. Without assuming any prior knowledge on the state transition, we adopt inductive logic programming (ILP) to learn the rules of symbolic state transitions, introducing interpretability and making the learned behavior understandable to users. In empirical experiments, we confirmed that the proposed framework offers approximately 30% to 40% more data efficiency than previous methods.
Symbolic Methods|Symbolic Learning (1 paper)
【1】 Any equation is a forest: Symbolic genetic algorithm for discovering open-form partial differential equations (SGA-PDE)
Authors: Yuntian Chen, Yingtao Luo, Qiang Liu, Hao Xu, Dongxiao Zhang
Affiliations: Intelligent Energy Laboratory, Frontier Research Center, Peng Cheng Laboratory, Shenzhen, P. R. China; Department of Computer Science, University of Washington, Seattle, WA, U.S.A.
Note: 24 pages, 16 figures
Link: https://arxiv.org/abs/2106.11927
Abstract: Partial differential equations (PDEs) are concise and understandable representations of domain knowledge, which are essential for deepening our understanding of physical processes and predicting future responses. However, the PDEs of many real-world problems are uncertain, which calls for PDE discovery. We propose the symbolic genetic algorithm (SGA-PDE) to discover open-form PDEs directly from data without prior knowledge about the equation structure. SGA-PDE focuses on the representation and optimization of PDE. Firstly, SGA-PDE uses symbolic mathematics to realize the flexible representation of any given PDE, transforms a PDE into a forest, and converts each function term into a binary tree. Secondly, SGA-PDE adopts a specially designed genetic algorithm to efficiently optimize the binary trees by iteratively updating the tree topology and node attributes. The SGA-PDE is gradient-free, which is a desirable characteristic in PDE discovery since it is difficult to obtain the gradient between the PDE loss and the PDE structure. In the experiment, SGA-PDE not only successfully discovered nonlinear Burgers' equation, Korteweg-de Vries (KdV) equation, and Chafee-Infante equation, but also handled PDEs with fractional structure and compound functions that cannot be solved by conventional PDE discovery methods.
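To make the representation concrete, here is a minimal sketch of the "each function term is a binary tree" idea: internal nodes hold operators, leaves hold operands, and a candidate PDE is a forest (list) of such trees. The node grammar below is an assumption for illustration; the paper's operator set and genetic operators are richer.

```python
# Function terms as binary trees; a candidate PDE is a list of such trees,
# whose topology and node attributes a genetic algorithm would mutate.
import numpy as np

class Node:
    def __init__(self, op, left=None, right=None):
        self.op, self.left, self.right = op, left, right

    def evaluate(self, u, x):
        if self.op == "u":    return u
        if self.op == "x":    return x
        if self.op == "+":    return self.left.evaluate(u, x) + self.right.evaluate(u, x)
        if self.op == "*":    return self.left.evaluate(u, x) * self.right.evaluate(u, x)
        if self.op == "d/dx": # numerical derivative of the subtree w.r.t. x
            return np.gradient(self.left.evaluate(u, x), x)
        raise ValueError(self.op)

# The term u * u_x from Burgers' equation as a single tree of the forest:
u_ux = Node("*", Node("u"), Node("d/dx", Node("u")))
x = np.linspace(0.0, 1.0, 101)
u = np.sin(2 * np.pi * x)
print(u_ux.evaluate(u, x)[:3])
```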
Medical (3 papers)
【1】 Categorising Fine-to-Coarse Grained Misinformation: An Empirical Study of COVID-19 Infodemic
Authors: Ye Jiang, Xingyi Song, Carolina Scarton, Ahmet Aker, Kalina Bontcheva
Affiliations: University of Sheffield, Department of Computer Science, Sheffield, S1 4DP, UK; University of Duisburg-Essen, Department of Computer Science and Applied Cognitive Science, Duisburg, Germany
Link: https://arxiv.org/abs/2106.11702
Abstract: The spread of COVID-19 misinformation over social media has already drawn the attention of many researchers. According to Google Scholar, about 26,000 COVID-19-related misinformation studies have been published to date. Most of these studies focus on 1) detecting and/or 2) analysing the characteristics of COVID-19-related misinformation. However, the study of the social behaviours related to misinformation is often neglected. In this paper, we introduce a fine-grained annotated misinformation tweets dataset including social behaviours annotation (e.g. comment or question to the misinformation). The dataset not only allows social behaviours analysis but is also suitable for both evidence-based and non-evidence-based misinformation classification tasks. In addition, we introduce leave-claim-out validation in our experiments and demonstrate that the misinformation classification performance could be significantly different when applied to real-world unseen misinformation.
【2】 From SIR to SEAIRD: a novel data-driven modeling approach based on the Grey-box System Theory to predict the dynamics of COVID-19
Authors: Komi Midzodzi Pékpé, Djamel Zitouni, Gilles Gasso, Wajdi Dhifli, Benjamin C. Guinhouya
Link: https://arxiv.org/abs/2106.11918
Abstract: Common compartmental modeling for COVID-19 is based on a priori knowledge and numerous assumptions. Additionally, they do not systematically incorporate asymptomatic cases. Our study aimed at providing a framework for data-driven approaches, by leveraging the strengths of the grey-box system theory or grey-box identification, known for its robustness in problem solving under partial, incomplete, or uncertain data. Empirical data on confirmed cases and deaths, extracted from an open source repository, were used to develop the SEAIRD compartment model. Adjustments were made to fit current knowledge on the COVID-19 behavior. The model was implemented and solved using an Ordinary Differential Equation solver and an optimization tool. A cross-validation technique was applied, and the coefficient of determination $R^2$ was computed in order to evaluate the goodness-of-fit of the model. Key epidemiological parameters were finally estimated and we provided the rationale for the construction of the SEAIRD model. When applied to Brazil's cases, SEAIRD produced an excellent agreement to the data, with a coefficient of determination $R^2 \geq 90\%$. The probability of COVID-19 transmission was generally high ($\geq 95\%$). On the basis of a 20-day modeling data, the incidence rate of COVID-19 was as low as 3 infected cases per 100,000 exposed persons in Brazil and France. Within the same time frame, the fatality rate of COVID-19 was the highest in France (16.4%) followed by Brazil (6.9%), and the lowest in Russia ($\leq 1\%$). SEAIRD represents an asset for modeling infectious diseases in their dynamical stable phase, especially for new viruses when pathophysiology knowledge is very limited.
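As a hedged illustration of the compartmental structure, the sketch below integrates one plausible SEAIRD right-hand side with an ODE solver. The flow structure, parameter names, and rate values are assumptions for illustration, not necessarily the authors' exact formulation; in the grey-box setup the rates would be fitted to the case/death data.

```python
# One plausible SEAIRD parameterization: S(usceptible), E(xposed),
# A(symptomatic), I(nfected symptomatic), R(ecovered), D(eceased).
import numpy as np
from scipy.integrate import solve_ivp

def seaird(t, y, beta, sigma, p_asym, gamma, mu):
    S, E, A, I, R, D = y
    N = S + E + A + I + R + D
    new_exposed = beta * S * (I + A) / N
    dS = -new_exposed
    dE = new_exposed - sigma * E
    dA = p_asym * sigma * E - gamma * A                # asymptomatic recover only
    dI = (1 - p_asym) * sigma * E - (gamma + mu) * I   # symptomatic may die
    dR = gamma * (A + I)
    dD = mu * I
    return [dS, dE, dA, dI, dR, dD]

y0 = [1e6 - 10, 10, 0, 0, 0, 0]                        # near-fully susceptible population
sol = solve_ivp(seaird, (0, 180), y0,
                args=(0.4, 1 / 5, 0.4, 1 / 10, 0.002),  # placeholder rates to be fitted
                dense_output=True)
```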
【3】 MIMIR: Deep Regression for Automated Analysis of UK Biobank Body MRI
Authors: Taro Langner, Andrés Martínez Mora, Robin Strand, Håkan Ahlström, Joel Kullberg
Link: https://arxiv.org/abs/2106.11731
Abstract: UK Biobank (UKB) is conducting a large-scale study of more than half a million volunteers, collecting health-related information on genetics, lifestyle, blood biochemistry, and more. Medical imaging furthermore targets 100,000 subjects, with 70,000 follow-up sessions, enabling measurements of organs, muscle, and body composition. With up to 170,000 mounting MR images, various methodologies are accordingly engaged in large-scale image analysis. This work presents an experimental inference engine that can automatically predict a comprehensive profile of subject metadata from UKB neck-to-knee body MRI. In cross-validation, it accurately inferred baseline characteristics such as age, height, weight, and sex, but also emulated measurements of body composition by DXA, organ volumes, and abstract properties like grip strength, pulse rate, and type 2 diabetic status (AUC: 0.866). The proposed system can automatically analyze thousands of subjects within hours and provide individual confidence intervals. The underlying methodology is based on convolutional neural networks for image-based mean-variance regression on two-dimensional representations of the MRI data. This work aims to make the proposed system available for free to researchers, who can use it to obtain fast and fully-automated estimates of 72 different measurements immediately upon release of new UK Biobank image data.
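The mean-variance regression underlying the system can be illustrated with a Gaussian negative log-likelihood head: the network predicts a mean and a log-variance per target measurement, which is also what makes per-subject confidence intervals possible. The head layout below is an assumption for illustration, not the paper's architecture.

```python
# A sketch of image-based mean-variance regression: the CNN head emits
# [mu, log sigma^2] and is trained with the Gaussian negative log-likelihood.
import torch

def gaussian_nll(mean, log_var, target):
    # 0.5 * [ log sigma^2 + (y - mu)^2 / sigma^2 ], averaged over the batch
    return 0.5 * (log_var + (target - mean) ** 2 / log_var.exp()).mean()

head_out = torch.randn(8, 2)               # stand-in for the CNN head output
mean, log_var = head_out[:, 0], head_out[:, 1]
loss = gaussian_nll(mean, log_var, torch.randn(8))
# A ~95% interval per subject: mean +/- 1.96 * exp(0.5 * log_var)
```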
Clustering (3 papers)
【1】 A Clustering-based Framework for Classifying Data Streams
Authors: Xuyang Yan, Abdollah Homaifar, Mrinmoy Sarkar, Abenezer Girma, Edward Tunstel
Affiliations: North Carolina A&T State University, Greensboro, NC, USA; Raytheon Technologies Research Center, East Hartford, CT, USA
Note: This paper has been accepted by IJCAI 2021
Link: https://arxiv.org/abs/2106.11823
Abstract: The non-stationary nature of data streams strongly challenges traditional machine learning techniques. Although some solutions have been proposed to extend traditional machine learning techniques for handling data streams, these approaches either require an initial label set or rely on specialized design parameters. The overlap among classes and the labeling of data streams constitute other major challenges for classifying data streams. In this paper, we propose a clustering-based data stream classification framework to handle non-stationary data streams without utilizing an initial label set. A density-based stream clustering procedure is used to capture novel concepts with a dynamic threshold and an effective active label querying strategy is introduced to continuously learn the new concepts from the data streams. The sub-cluster structure of each cluster is explored to handle the overlap among classes. Experimental results and quantitative comparison studies reveal that the proposed method provides statistically better or comparable performance than the existing methods.
【2】 Kernel Clustering with Sigmoid-based Regularization for Efficient Segmentation of Sequential Data
Authors: Tung Doan, Atsuhiro Takasu
Affiliations: The Graduate University for Advanced Studies, SOKENDAI, Shonan Village, Hayama, Kanagawa, Japan; National Institute of Informatics, Hitotsubashi, Chiyoda, Tokyo, Japan
Link: https://arxiv.org/abs/2106.11541
Abstract: Kernel segmentation aims at partitioning a data sequence into several non-overlapping segments that may have nonlinear and complex structures. In general, it is formulated as a discrete optimization problem with combinatorial constraints. A popular algorithm for optimally solving this problem is dynamic programming (DP), which has quadratic computation and memory requirements. Given that sequences in practice are too long, this algorithm is not a practical approach. Although many heuristic algorithms have been proposed to approximate the optimal segmentation, they have no guarantee on the quality of their solutions. In this paper, we take a differentiable approach to alleviate the aforementioned issues. First, we introduce a novel sigmoid-based regularization to smoothly approximate the combinatorial constraints. Combining it with the objective of balanced kernel clustering, we formulate a differentiable model termed Kernel clustering with sigmoid-based regularization (KCSR), where the gradient-based algorithm can be exploited to obtain the optimal segmentation. Second, we develop a stochastic variant of the proposed model. By using the stochastic gradient descent algorithm, which has much lower time and space complexities, for optimization, the second model can perform segmentation on overlong data sequences. Finally, for simultaneously segmenting multiple data sequences, we slightly modify the sigmoid-based regularization to further introduce an extended variant of the proposed model. Through extensive experiments on various types of data sequences, the performances of our models are evaluated and compared with those of the existing methods. The experimental results validate the advantages of the proposed models. Our Matlab source code is available on github.
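A rough sketch of the sigmoid relaxation idea follows: each change point induces a soft step, and products of steps yield differentiable segment indicators, so boundary locations can be moved by gradient descent. This illustrates the relaxation only, under assumed normalized positions and a temperature parameter; KCSR's exact objective differs.

```python
# Soft segment membership from sigmoids: w[t, k] ~ 1 iff position t lies in
# segment k, and w is differentiable w.r.t. the boundary locations.
import torch

def soft_segment_indicators(T, boundaries, temperature=0.05):
    t = torch.linspace(0.0, 1.0, T).unsqueeze(1)          # positions in [0, 1]
    b = torch.sigmoid((t - boundaries) / temperature)     # (T, K-1) soft steps
    left = torch.cat([torch.ones(T, 1), b], dim=1)        # "past boundary k-1"
    right = torch.cat([1 - b, torch.ones(T, 1)], dim=1)   # "before boundary k"
    return left * right                                   # (T, K) soft one-hot

w = soft_segment_indicators(T=200, boundaries=torch.tensor([0.3, 0.7]))
# Each row of w sums to ~1; gradients flow back to the boundary tensor.
```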
【3】 Routine Clustering of Mobile Sensor Data Facilitates Psychotic Relapse Prediction in Schizophrenia Patients
Authors: Joanne Zhou, Bishal Lamichhane, Dror Ben-Zeev, Andrew Campbell, Akane Sano
Affiliations: Department of Statistics, Rice University, Houston, US; Department of Electrical and Computer Engineering, Rice University, Houston, US; BRiTE Center, Psychiatry and Behavioral Sciences, University of Washington, Seattle, US
Link: https://arxiv.org/abs/2106.11487
Abstract: We aim to develop clustering models to obtain behavioral representations from continuous multimodal mobile sensing data towards relapse prediction tasks. The identified clusters could represent different routine behavioral trends related to daily living of patients as well as atypical behavioral trends associated with impending relapse. We used the mobile sensing data obtained in the CrossCheck project for our analysis. Continuous data from six different mobile sensing-based modalities (e.g. ambient light, sound/conversation, acceleration etc.) obtained from a total of 63 schizophrenia patients, each monitored for up to a year, were used for the clustering models and relapse prediction evaluation. Two clustering models, Gaussian Mixture Model (GMM) and Partition Around Medoids (PAM), were used to obtain behavioral representations from the mobile sensing data. The features obtained from the clustering models were used to train and evaluate a personalized relapse prediction model using Balanced Random Forest. The personalization was done by identifying optimal features for a given patient based on a personalization subset consisting of other patients who are of similar age. The clusters identified using the GMM and PAM models were found to represent different behavioral patterns (such as clusters representing sedentary days, active but with low communications days, etc.). Significant changes near the relapse periods were seen in the obtained behavioral representation features from the clustering models. The clustering model based features, together with other features characterizing the mobile sensing data, resulted in an F2 score of 0.24 for the relapse prediction task in a leave-one-patient-out evaluation setting. This obtained F2 score is significantly higher than a random classification baseline with an average F2 score of 0.042.
Point Cloud|SLAM|Radar|LiDAR|Depth/RGB-D (1 paper)
【1】 SeqNetVLAD vs PointNetVLAD: Image Sequence vs 3D Point Clouds for Day-Night Place Recognition
Authors: Sourav Garg, Michael Milford
Affiliations: QUT Centre for Robotics, Queensland University of Technology
Note: Accepted to CVPR 2021 Workshop on 3D Vision and Robotics (3DVR). this https URL
Link: https://arxiv.org/abs/2106.11481
Abstract: Place Recognition is a crucial capability for mobile robot localization and navigation. Image-based or Visual Place Recognition (VPR) is a challenging problem as scene appearance and camera viewpoint can change significantly when places are revisited. Recent VPR methods based on "sequential representations" have shown promising results as compared to traditional sequence score aggregation or single image based techniques. In parallel to these endeavors, 3D point clouds based place recognition is also being explored following the advances in deep learning based point cloud processing. However, a key question remains: is an explicit 3D structure based place representation always superior to an implicit "spatial" representation based on sequence of RGB images which can inherently learn scene structure. In this extended abstract, we attempt to compare these two types of methods by considering a similar "metric span" to represent places. We compare a 3D point cloud based method (PointNetVLAD) with image sequence based methods (SeqNet and others) and showcase that image sequence based techniques approach, and can even surpass, the performance achieved by point cloud based methods for a given metric span. These performance variations can be attributed to differences in data richness of input sensors as well as data accumulation strategies for a mobile robot. While a perfect apple-to-apple comparison may not be feasible for these two different modalities, the presented comparison takes a step in the direction of answering deeper questions regarding spatial representations, relevant to several applications like Autonomous Driving and Augmented/Virtual Reality. Source code available publicly at https://github.com/oravus/seqNet.
Federated Learning|Privacy|Encryption (2 papers)
【1】 Enabling Long-Term Cooperation in Cross-Silo Federated Learning: A Repeated Game Perspective
Authors: Ning Zhang, Qian Ma, Xu Chen
Link: https://arxiv.org/abs/2106.11814
Abstract: Cross-silo federated learning (FL) is a distributed learning approach where clients train a global model cooperatively while keeping their local data private. Different from cross-device FL, clients in cross-silo FL are usually organizations or companies which may execute multiple cross-silo FL processes repeatedly due to their time-varying local data sets, and aim to optimize their long-term benefits by selfishly choosing their participation levels. While there has been some work on incentivizing clients to join FL, the analysis of the long-term selfish participation behaviors of clients in cross-silo FL remains largely unexplored. In this paper, we analyze the selfish participation behaviors of heterogeneous clients in cross-silo FL. Specifically, we model the long-term selfish participation behaviors of clients as an infinitely repeated game, with the stage game being a selfish participation game in one cross-silo FL process (SPFL). For the stage game SPFL, we derive the unique Nash equilibrium (NE), and propose a distributed algorithm for each client to calculate its equilibrium participation strategy. For the long-term interactions among clients, we derive a cooperative strategy for clients which minimizes the number of free riders while increasing the amount of local data for model training. We show that enforced by a punishment strategy, such a cooperative strategy is a SPNE of the infinitely repeated game, under which some clients who are free riders at the NE of the stage game choose to be (partial) contributors. We further propose an algorithm to calculate the optimal SPNE which minimizes the number of free riders while maximizing the amount of local data for model training. Simulation results show that our proposed cooperative strategy at the optimal SPNE can effectively reduce the number of free riders and increase the amount of local data for model training.
【2】 FLRA: A Reference Architecture for Federated Learning Systems
Authors: Sin Kit Lo, Qinghua Lu, Hye-Young Paik, Liming Zhu
Affiliations: Data61, CSIRO, Sydney, Australia; University of New South Wales, Sydney, Australia
Note: Accepted by ECSA 2021
Link: https://arxiv.org/abs/2106.11570
Abstract: Federated learning is an emerging machine learning paradigm that enables multiple devices to train models locally and formulate a global model, without sharing the clients' local data. A federated learning system can be viewed as a large-scale distributed system, involving different components and stakeholders with diverse requirements and constraints. Hence, developing a federated learning system requires both software system design thinking and machine learning knowledge. Although much effort has been put into federated learning from the machine learning perspectives, our previous systematic literature review on the area shows that there is a distinct lack of considerations for software architecture design for federated learning. In this paper, we propose FLRA, a reference architecture for federated learning systems, which provides a template design for federated learning-based solutions. The proposed FLRA reference architecture is based on an extensive review of existing patterns of federated learning systems found in the literature and existing industrial implementation. The FLRA reference architecture consists of a pool of architectural patterns that could address the frequently recurring design problems in federated learning architectures. The FLRA reference architecture can serve as a design guideline to assist architects and developers with practical solutions for their problems, which can be further customised.
Inference|Analysis|Understanding|Explainability (6 papers)
【1】 Sphynx: ReLU-Efficient Network Design for Private Inference
Authors: Minsu Cho, Zahra Ghodsi, Brandon Reagen, Siddharth Garg, Chinmay Hegde
Affiliations: New York University; University of California San Diego
Link: https://arxiv.org/abs/2106.11755
Abstract: The emergence of deep learning has been accompanied by privacy concerns surrounding users' data and service providers' models. We focus on private inference (PI), where the goal is to perform inference on a user's data sample using a service provider's model. Existing PI methods for deep networks enable cryptographically secure inference with little drop in functionality; however, they incur severe latency costs, primarily caused by non-linear network operations (such as ReLUs). This paper presents Sphynx, a ReLU-efficient network design method based on micro-search strategies for convolutional cell design. Sphynx achieves Pareto dominance over all existing private inference methods on CIFAR-100. We also design large-scale networks that support cryptographically private inference on Tiny-ImageNet and ImageNet.
【2】 Sequential Late Fusion Technique for Multi-modal Sentiment Analysis
Authors: Debapriya Banerjee, Fotios Lygerakis, Fillia Makedon
Affiliations: The University of Texas at Arlington, Arlington, Texas, USA
Note: 2 pages, 1 figure, 1 table
Link: https://arxiv.org/abs/2106.11473
Abstract: Multi-modal sentiment analysis plays an important role for providing better interactive experiences to users. Each modality in multi-modal data can provide different viewpoints or reveal unique aspects of a user's emotional state. In this work, we use text, audio and visual modalities from the MOSI dataset and we propose a novel fusion technique using a multi-head attention LSTM network. Finally, we perform a classification task and evaluate its performance.
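A minimal sketch of such a fusion architecture follows: per-modality LSTM encoders produce one summary vector each, and multi-head attention over the stacked summaries lets modalities weight one another before classification. The layer sizes and input dimensions are placeholders, not the paper's configuration.

```python
# Late fusion of text/audio/video with modality-level multi-head attention.
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    def __init__(self, dims=(300, 74, 35), hidden=64, n_classes=2):
        super().__init__()
        self.encoders = nn.ModuleList(
            nn.LSTM(d, hidden, batch_first=True) for d in dims)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.head = nn.Linear(hidden * len(dims), n_classes)

    def forward(self, text, audio, video):
        summaries = []
        for enc, x in zip(self.encoders, (text, audio, video)):
            _, (h, _) = enc(x)             # final hidden state per modality
            summaries.append(h[-1])
        m = torch.stack(summaries, dim=1)  # (batch, 3 modalities, hidden)
        fused, _ = self.attn(m, m, m)      # modalities attend to each other
        return self.head(fused.flatten(1))

model = LateFusion()
logits = model(torch.randn(4, 20, 300), torch.randn(4, 20, 74), torch.randn(4, 20, 35))
```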
【3】 Efficient Inference via Universal LSH Kernel
Authors: Zichang Liu, Benjamin Coleman, Anshumali Shrivastava
Affiliations: Department of Computer Science, Rice University, Houston, TX; Department of Electrical and Computer Engineering
Link: https://arxiv.org/abs/2106.11426
Abstract: Large machine learning models achieve unprecedented performance on various tasks and have evolved as the go-to technique. However, deploying these compute and memory hungry models on resource-constrained environments poses new challenges. In this work, we propose mathematically provable Representer Sketch, a concise set of count arrays that can approximate the inference procedure with simple hashing computations and aggregations. Representer Sketch builds upon the popular Representer Theorem from kernel literature, hence the name, providing a generic fundamental alternative to the problem of efficient inference that goes beyond the popular approach such as quantization, iterative pruning and knowledge distillation. A neural network function is transformed to its weighted kernel density representation, which can be very efficiently estimated with our sketching algorithm. Empirically, we show that Representer Sketch achieves up to 114x reduction in storage requirement and 59x reduction in computation complexity without any drop in accuracy.
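The count-array idea can be illustrated as follows: training points are aggregated into a few small arrays via locality-sensitive hashes, and inference reduces to hashing plus table lookups. The signed-random-projection hash and the array sizes below are illustrative choices, not the paper's exact construction.

```python
# Approximating a kernel density sum with LSH-indexed count arrays.
import numpy as np

rng = np.random.default_rng(0)
d, n_arrays, bits = 16, 10, 8             # data dim, repetitions, hash bits
planes = rng.normal(size=(n_arrays, bits, d))
arrays = np.zeros((n_arrays, 2 ** bits))

def srp_index(x, r):
    signs = (planes[r] @ x > 0).astype(int)    # signed random projections
    return int("".join(map(str, signs)), 2)    # bucket id in array r

def add(x, weight=1.0):                        # "training": aggregate weights
    for r in range(n_arrays):
        arrays[r, srp_index(x, r)] += weight

def query(x):                                  # inference: mean of lookups
    return np.mean([arrays[r, srp_index(x, r)] for r in range(n_arrays)])

for point in rng.normal(size=(1000, d)):
    add(point)
print(query(rng.normal(size=d)))               # approximate density score
```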
【4】 Membership Inference on Word Embedding and Beyond
Authors: Saeed Mahloujifar, Huseyin A. Inan, Melissa Chase, Esha Ghosh, Marcello Hasegawa
Affiliations: Princeton University; Microsoft Research; Microsoft Corporation
Link: https://arxiv.org/abs/2106.11384
Abstract: In the text processing context, most ML models are built on word embeddings. These embeddings are themselves trained on some datasets, potentially containing sensitive data. In some cases this training is done independently, in other cases, it occurs as part of training a larger, task-specific model. In either case, it is of interest to consider membership inference attacks based on the embedding layer as a way of understanding sensitive information leakage. But, somewhat surprisingly, membership inference attacks on word embeddings and their effect in other natural language processing (NLP) tasks that use these embeddings, have remained relatively unexplored. In this work, we show that word embeddings are vulnerable to black-box membership inference attacks under realistic assumptions. Furthermore, we show that this leakage persists through two other major NLP applications: classification and text-generation, even when the embedding layer is not exposed to the attacker. We show that our MI attack achieves high attack accuracy against a classifier model and an LSTM-based language model. Indeed, our attack is a cheaper membership inference attack on text-generative models, which does not require the knowledge of the target model or any expensive training of text-generative models as shadow models.
【5】 Understanding top-down attention using task-oriented ablation design
Authors: Freddie Bickford Smith, Brett D. Roads, Xiaoliang Luo, Bradley C. Love
Affiliations: University of Oxford; University College London
Link: https://arxiv.org/abs/2106.11339
Abstract: Top-down attention allows neural networks, both artificial and biological, to focus on the information most relevant for a given task. This is known to enhance performance in visual perception. But it remains unclear how attention brings about its perceptual boost, especially when it comes to naturalistic settings like recognising an object in an everyday scene. What aspects of a visual task does attention help to deal with? We aim to answer this with a computational experiment based on a general framework called task-oriented ablation design. First we define a broad range of visual tasks and identify six factors that underlie task variability. Then on each task we compare the performance of two neural networks, one with top-down attention and one without. These comparisons reveal the task-dependence of attention's perceptual boost, giving a clearer idea of the role attention plays. Whereas many existing cognitive accounts link attention to stimulus-level variables, such as visual clutter and object scale, we find greater explanatory power in system-level variables that capture the interaction between the model, the distribution of training data and the task format. This finding suggests a shift in how attention is studied could be fruitful. We make publicly available our code and results, along with statistics relevant to ImageNet-based experiments beyond this one. Our contribution serves to support the development of more human-like vision models and the design of more informative machine-learning experiments.
【6】 Analysis and Tuning of a Voice Assistant System for Dysfluent Speech
Authors: Vikramjit Mitra, Zifang Huang, Colin Lea, Lauren Tooley, Sarah Wu, Darren Botten, Ashwini Palekar, Shrinath Thelapurath, Panayiotis Georgiou, Sachin Kajarekar, Jefferey Bigham
Affiliations: Apple, Cupertino, CA, USA
Note: 5 pages, 1 page reference, 2 figures
Link: https://arxiv.org/abs/2106.11759
Abstract: Dysfluencies and variations in speech pronunciation can severely degrade speech recognition performance, and for many individuals with moderate-to-severe speech disorders, voice operated systems do not work. Current speech recognition systems are trained primarily with data from fluent speakers and as a consequence do not generalize well to speech with dysfluencies such as sound or word repetitions, sound prolongations, or audible blocks. The focus of this work is on quantitative analysis of a consumer speech recognition system on individuals who stutter and production-oriented approaches for improving performance for common voice assistant tasks (i.e., "what is the weather?"). At baseline, this system introduces a significant number of insertion and substitution errors resulting in intended speech Word Error Rates (isWER) that are 13.64% worse (absolute) for individuals with fluency disorders. We show that by simply tuning the decoding parameters in an existing hybrid speech recognition system one can improve isWER by 24% (relative) for individuals with fluency disorders. Tuning these parameters translates to 3.6% better domain recognition and 1.7% better intent recognition relative to the default setup for the 18 study participants across all stuttering severities.
Detection (3 papers)
【1】 Towards Reducing Labeling Cost in Deep Object Detection
Authors: Ismail Elezi, Zhiding Yu, Anima Anandkumar, Laura Leal-Taixé, Jose M. Alvarez
Affiliations: TUM, NVIDIA, Caltech
Note: Includes supplementary material
Link: https://arxiv.org/abs/2106.11921
Abstract: Deep neural networks have reached very high accuracy on object detection but their success hinges on large amounts of labeled data. To reduce the dependency on labels, various active-learning strategies have been proposed, typically based on the confidence of the detector. However, these methods are biased towards best-performing classes and can lead to acquired datasets that are not good representatives of the data in the testing set. In this work, we propose a unified framework for active learning, that considers both the uncertainty and the robustness of the detector, ensuring that the network performs accurately in all classes. Furthermore, our method is able to pseudo-label the very confident predictions, suppressing a potential distribution drift while further boosting the performance of the model. Experiments show that our method comprehensively outperforms a wide range of active-learning methods on PASCAL VOC07+12 and MS-COCO, having up to a 7.7% relative improvement, or up to 82% reduction in labeling cost.
【2】 Detecting Anomalous User Behavior in Remote Patient Monitoring
Authors: Deepti Gupta, Maanak Gupta, Smriti Bhatt, Ali Saman Tosun
Affiliations: Dept. of Computer Science, University of Texas at San Antonio, San Antonio, Texas, USA; Dept. of Computer Science, Tennessee Technological University, Cookeville, Tennessee, USA
Link: https://arxiv.org/abs/2106.11844
Abstract: The growth in Remote Patient Monitoring (RPM) services using wearable and non-wearable Internet of Medical Things (IoMT) promises to improve the quality of diagnosis and facilitate timely treatment for a gamut of medical conditions. At the same time, the proliferation of IoMT devices increases the potential for malicious activities that can lead to catastrophic results including theft of personal information, data breach, and compromised medical devices, putting human lives at risk. IoMT devices generate tremendous amount of data that reflect user behavior patterns including both personal and day-to-day social activities along with daily routine health monitoring. In this context, there are possibilities of anomalies generated due to various reasons including unexpected user behavior, faulty sensor, or abnormal values from malicious/compromised devices. To address this problem, there is an imminent need to develop a framework for securing the smart health care infrastructure to identify and mitigate anomalies. In this paper, we present an anomaly detection model for RPM utilizing IoMT and smart home devices. We propose Hidden Markov Model (HMM) based anomaly detection that analyzes normal user behavior in the context of RPM comprising both smart home and smart health devices, and identifies anomalous user behavior. We design a testbed with multiple IoMT devices and home sensors to collect data and use the HMM model to train using network and user behavioral data. Proposed HMM based anomaly detection model achieved over 98% accuracy in identifying the anomalies in the context of RPM.
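A hedged sketch of HMM-based anomaly scoring in this spirit: fit a Gaussian HMM on windows of normal multi-sensor behavior, then flag windows whose average log-likelihood under the model falls below a threshold calibrated on held-out normal data. The feature layout and threshold rule here are assumptions, not the paper's exact pipeline.

```python
# Likelihood-based anomaly flagging with a Gaussian HMM (hmmlearn).
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(2000, 6))        # 6 sensor channels, normal behavior
model = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
model.fit(normal)

def is_anomalous(window, threshold):
    score = model.score(window) / len(window)    # average log-likelihood per sample
    return score < threshold

# Calibrate the threshold on held-out normal windows, e.g. a low percentile:
held_out = [normal[i:i + 100] for i in range(0, 2000, 100)]
scores = [model.score(w) / len(w) for w in held_out]
threshold = np.percentile(scores, 5)
print(is_anomalous(rng.normal(3, 1, size=(100, 6)), threshold))  # shifted data
```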
【3】 Data Augmentation for Opcode Sequence Based Malware Detection
Authors: Niall McLaughlin, Jesus Martinez del Rincon
Affiliations: The Centre for Secure Information Technologies (CSIT), Queen's University Belfast
Note: 11 pages, 3 figures
Link: https://arxiv.org/abs/2106.11821
Abstract: Data augmentation has been successfully used in many areas of deep-learning to significantly improve model performance. Typically data augmentation simulates realistic variations in data in order to increase the apparent diversity of the training-set. However, for opcode-based malware analysis, where deep learning methods are already achieving state of the art performance, it is not immediately clear how to apply data augmentation. In this paper we study different methods of data augmentation starting with basic methods using fixed transformations and moving to methods that adapt to the data. We propose a novel data augmentation method based on using an opcode embedding layer within the network and its corresponding opcode embedding matrix to perform adaptive data augmentation during training. To the best of our knowledge this is the first paper to carry out a systematic study of different augmentation methods applied to opcode sequence based malware classification.
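The adaptive, embedding-driven augmentation can be sketched as follows: opcode ids are occasionally swapped for their nearest neighbours in the current embedding matrix, producing label-preserving variants of a training sample. This illustrates the idea only; the swap rule below is an assumption, not the paper's exact procedure.

```python
# Embedding-space augmentation for opcode sequences.
import torch

def augment_opcodes(ids, embedding_weight, swap_prob=0.1):
    emb = torch.nn.functional.normalize(embedding_weight, dim=1)
    sims = emb @ emb.T                     # cosine similarity between opcodes
    sims.fill_diagonal_(-1.0)              # exclude self-matches
    nearest = sims.argmax(dim=1)           # closest opcode to each opcode
    mask = torch.rand(ids.shape) < swap_prob
    return torch.where(mask, nearest[ids], ids)

vocab, dim = 256, 32
embedding = torch.nn.Embedding(vocab, dim)  # trained jointly with the classifier
batch = torch.randint(0, vocab, (4, 500))   # 4 sequences of 500 opcodes
augmented = augment_opcodes(batch, embedding.weight.detach())
```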
Classification|Recognition (4 papers)
【1】 Notes on the H-measure of classifier performance
Authors: D. J. Hand, C. Anagnostopoulos
Note: 13 pages
Link: https://arxiv.org/abs/2106.11888
Abstract: The H-measure is a classifier performance measure which takes into account the context of application without requiring a rigid value of relative misclassification costs to be set. Since its introduction in 2009 it has become widely adopted. This paper answers various queries which users have raised since its introduction, including questions about its interpretation, the choice of a weighting function, whether it is strictly proper, and its coherence, and relates the measure to other work.
【2】 User Identification across Social Networking Sites using User Profiles and Posting Patterns
Authors: Prashant Solanki, Kwan Hui Lim, Aaron Harwood
Affiliations: School of Computing and Information Systems, The University of Melbourne; Information Systems Technology and Design Pillar, Singapore University of Technology and Design
Note: Accepted at the 2021 International Joint Conference on Neural Networks (IJCNN'21)
Link: https://arxiv.org/abs/2106.11815
Abstract: With the prevalence of online social networking sites (OSNs) and mobile devices, people are increasingly reliant on a variety of OSNs for keeping in touch with family and friends, and using it as a source of information. For example, a user might utilise multiple OSNs for different purposes, such as using Flickr to share holiday pictures with family and friends, and Twitter to post short messages about their thoughts. Identifying the same user across multiple OSNs is an important task as this allows us to understand the usage patterns of users among different OSNs, make recommendations when a user registers for a new OSN, and various other useful applications. To address this problem, we propose an algorithm based on the multilayer perceptron using various types of features, namely: (i) user profile, such as name, location, description; (ii) temporal distribution of user generated content; and (iii) embedding based on user name, real name and description. Using a Twitter and Flickr dataset of users and their posting activities, we perform an empirical study on how these features affect the performance of user identification across the two OSNs and discuss our main findings based on the different features.
【3】 Gradient-based Label Binning in Multi-label Classification
Authors: Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz, Eyke Hüllermeier
Affiliations: Knowledge Engineering Group, TU Darmstadt, Darmstadt, Germany; Computational Data Analysis Group, JKU Linz, Linz, Austria; Heinz Nixdorf Institute, Paderborn University, Paderborn, Germany
Link: https://arxiv.org/abs/2106.11690
Abstract: In multi-label classification, where a single example may be associated with several class labels at the same time, the ability to model dependencies between labels is considered crucial to effectively optimize non-decomposable evaluation measures, such as the Subset 0/1 loss. The gradient boosting framework provides a well-studied foundation for learning models that are specifically tailored to such a loss function and recent research attests the ability to achieve high predictive accuracy in the multi-label setting. The utilization of second-order derivatives, as used by many recent boosting approaches, helps to guide the minimization of non-decomposable losses, due to the information about pairs of labels it incorporates into the optimization process. On the downside, this comes with high computational costs, even if the number of labels is small. In this work, we address the computational bottleneck of such an approach -- the need to solve a system of linear equations -- by integrating a novel approximation technique into the boosting procedure. Based on the derivatives computed during training, we dynamically group the labels into a predefined number of bins to impose an upper bound on the dimensionality of the linear system. Our experiments, using an existing rule-based algorithm, suggest that this may boost the speed of training, without any significant loss in predictive performance.
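A rough sketch of the binning idea: labels are grouped by their current gradient statistics and one aggregate score is fitted per bin, capping the dimension of the problem at the number of bins. Quantile binning over a per-label Newton ratio is one plausible grouping rule, assumed here for illustration; the paper's criterion may differ.

```python
# Dynamic label binning from boosting derivatives (decoupled diagonal case).
import numpy as np

def bin_labels(gradients, hessian_diag, n_bins=4):
    # Criterion: the per-label ratio used for a (decoupled) Newton step.
    criterion = -gradients / np.maximum(hessian_diag, 1e-12)
    edges = np.quantile(criterion, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(criterion, edges)             # label -> bin id
    scores = np.zeros(n_bins)
    for b in range(n_bins):                          # one aggregate score per bin
        members = bins == b
        if members.any():
            scores[b] = -gradients[members].sum() / hessian_diag[members].sum()
    return bins, scores

g = np.random.randn(20)                              # gradients for 20 labels
h = np.random.rand(20) + 0.5                         # diagonal Hessian entries
bins, bin_scores = bin_labels(g, h)                  # 20 labels -> 4 bins
```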
【4】 Incremental Deep Neural Network Learning using Classification Confidence Thresholding
Authors: Justin Leo, Jugal Kalita
Affiliations: Department of Computer Science, University of Colorado at Colorado Springs
Note: Accepted to IEEE TNNLS
Link: https://arxiv.org/abs/2106.11437
Abstract: Most modern neural networks for classification fail to take into account the concept of the unknown. Trained neural networks are usually tested in an unrealistic scenario with only examples from a closed set of known classes. In an attempt to develop a more realistic model, the concept of working in an open set environment has been introduced. This in turn leads to the concept of incremental learning where a model with its own architecture and initial trained set of data can identify unknown classes during the testing phase and autonomously update itself if evidence of a new class is detected. Some problems that arise in incremental learning are inefficient use of resources to retrain the classifier repeatedly and the decrease of classification accuracy as multiple classes are added over time. This process of instantiating new classes is repeated as many times as necessary, accruing errors. To address these problems, this paper proposes the Classification Confidence Threshold approach to prime neural networks for incremental learning to keep accuracies high by limiting forgetting. A lean method is also used to reduce resources used in the retraining of the neural network. The proposed method is based on the idea that a network is able to incrementally learn a new class even when exposed to a limited number of samples associated with the new class. This method can be applied to most existing neural networks with minimal changes to network architecture.
Representation Learning (1 paper)
【1】 Provably Efficient Representation Learning in Low-rank Markov Decision Processes
Authors: Weitong Zhang, Jiafan He, Dongruo Zhou, Amy Zhang, Quanquan Gu
Affiliations: University of California; Department of Electrical Engineering and Computer Science
Note: 27 pages
Link: https://arxiv.org/abs/2106.11935
Abstract: The success of deep reinforcement learning (DRL) is due to the power of learning a representation that is suitable for the underlying exploration and exploitation task. However, existing provable reinforcement learning algorithms with linear function approximation often assume the feature representation is known and fixed. In order to understand how representation learning can improve the efficiency of RL, we study representation learning for a class of low-rank Markov Decision Processes (MDPs) where the transition kernel can be represented in a bilinear form. We propose a provably efficient algorithm called ReLEX that can simultaneously learn the representation and perform exploration. We show that ReLEX always performs no worse than a state-of-the-art algorithm without representation learning, and will be strictly better in terms of sample efficiency if the function class of representations enjoys a certain mild "coverage" property over the whole state-action space.
Encoders (2 papers)
【1】 Deep Stereo Image Compression with Decoder Side Information using Wyner Common Information
Authors: Nitish Mital, Ezgi Ozyilkan, Ali Garjani, Deniz Gunduz
Affiliations: Department of Electrical and Electronics Engineering, Imperial College London; Department of Computer Engineering, Sharif University of Technology
Note: 19 pages, 18 figures
Link: https://arxiv.org/abs/2106.11723
Abstract: We present a novel deep neural network (DNN) architecture for compressing an image when a correlated image is available as side information only at the decoder. This problem is known as distributed source coding (DSC) in information theory. In particular, we consider a pair of stereo images, which generally have high correlation with each other due to overlapping fields of view, and assume that one image of the pair is to be compressed and transmitted, while the other image is available only at the decoder. In the proposed architecture, the encoder maps the input image to a latent space, quantizes the latent representation, and compresses it using entropy coding. The decoder is trained to extract the Wyner's common information between the input image and the correlated image from the latter. The received latent representation and the locally generated common information are passed through a decoder network to obtain an enhanced reconstruction of the input image. The common information provides a succinct representation of the relevant information at the receiver. We train and demonstrate the effectiveness of the proposed approach on the KITTI dataset of stereo image pairs. Our results show that the proposed architecture is capable of exploiting the decoder-only side information, and outperforms previous work on stereo image compression with decoder side information.
【2】 Encoder-Decoder Architectures for Clinically Relevant Coronary Artery Segmentation
Authors: João Lourenço Silva, Miguel Nobre Menezes, Tiago Rodrigues, Beatriz Silva, Fausto J. Pinto, Arlindo L. Oliveira
Affiliations: INESC-ID, Instituto Superior Técnico, University of Lisbon; Cardiology Department, CAML, CCUL, Lisbon School of Medicine, University of Lisbon
Link: https://arxiv.org/abs/2106.11447
Abstract: Coronary X-ray angiography is a crucial clinical procedure for the diagnosis and treatment of coronary artery disease, which accounts for roughly 16% of global deaths every year. However, the images acquired in these procedures have low resolution and poor contrast, making lesion detection and assessment challenging. Accurate coronary artery segmentation not only helps mitigate these problems, but also allows the extraction of relevant anatomical features for further analysis by quantitative methods. Although automated segmentation of coronary arteries has been proposed before, previous approaches have used non-optimal segmentation criteria, leading to less useful results. Most methods either segment only the major vessel, discarding important information from the remaining ones, or segment the whole coronary tree based mostly on contrast information, producing a noisy output that includes vessels that are not relevant for diagnosis. We adopt a better-suited clinical criterion and segment vessels according to their clinical relevance. Additionally, we simultaneously perform catheter segmentation, which may be useful for diagnosis due to the scale factor provided by the catheter's known diameter, and is a task that has not yet been performed with good results. To derive the optimal approach, we conducted an extensive comparative study of encoder-decoder architectures trained on a combination of focal loss and a variant of generalized dice loss. Based on the EfficientNet and UNet++ architectures, we propose a line of efficient and high-performance segmentation models using a new decoder architecture, the EfficientUNet++, whose best-performing version achieved average dice scores of 0.8904 and 0.7526 for the artery and catheter classes, respectively, and an average generalized dice score of 0.9234.
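The training objective — a combination of focal loss and a generalized-Dice-style term — can be sketched as below. The class weighting and the mixing coefficient are illustrative, not the paper's exact variant.

```python
# Combined focal + generalized Dice loss for multi-class segmentation.
import torch
import torch.nn.functional as F

def focal_plus_dice(logits, target, gamma=2.0, alpha=0.5, eps=1e-6):
    # logits: (B, C, H, W); target: (B, H, W) integer class map
    log_p = F.log_softmax(logits, dim=1)
    ce = F.nll_loss(log_p, target, reduction="none")       # per-pixel CE
    focal = ((1 - ce.neg().exp()) ** gamma * ce).mean()    # down-weight easy pixels

    p = log_p.exp()
    one_hot = F.one_hot(target, logits.shape[1]).permute(0, 3, 1, 2).float()
    w = 1.0 / (one_hot.sum(dim=(0, 2, 3)) ** 2 + eps)      # inverse-frequency weights
    inter = (w * (p * one_hot).sum(dim=(0, 2, 3))).sum()
    union = (w * (p + one_hot).sum(dim=(0, 2, 3))).sum()
    dice = 1 - 2 * inter / (union + eps)
    return alpha * focal + (1 - alpha) * dice

loss = focal_plus_dice(torch.randn(2, 3, 64, 64), torch.randint(0, 3, (2, 64, 64)))
```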
Optimization|Convergence (7 papers)
【1】 Local policy search with Bayesian optimization
Authors: Sarah Müller, Alexander von Rohr, Sebastian Trimpe
Affiliations: Max Planck Institute for Intelligent Systems, Stuttgart, Germany; Institute for Data Science in Mechanical Engineering, RWTH Aachen University, Germany; IAV GmbH, Gifhorn, Germany; Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
Link: https://arxiv.org/abs/2106.11899
Abstract: Reinforcement learning (RL) aims to find an optimal policy by interaction with an environment. Consequently, learning complex behavior requires a vast number of samples, which can be prohibitive in practice. Nevertheless, instead of systematically reasoning and actively choosing informative samples, policy gradients for local search are often obtained from random perturbations. These random samples yield high variance estimates and hence are sub-optimal in terms of sample complexity. Actively selecting informative samples is at the core of Bayesian optimization, which constructs a probabilistic surrogate of the objective from past samples to reason about informative subsequent ones. In this paper, we propose to join both worlds. We develop an algorithm utilizing a probabilistic model of the objective function and its gradient. Based on the model, the algorithm decides where to query a noisy zeroth-order oracle to improve the gradient estimates. The resulting algorithm is a novel type of policy search method, which we compare to existing black-box algorithms. The comparison reveals improved sample complexity and reduced variance in extensive empirical evaluations on synthetic objectives. Further, we highlight the benefits of active sampling on popular RL benchmarks.
【2】 Latency-Aware Neural Architecture Search with Multi-Objective Bayesian Optimization
Authors: David Eriksson, Pierce I-Jen Chuang, Sam Daulton, Ahmed Aly, Arun Babu, Akshat Shrivastava, Peng Xia, Shicong Zhao, Ganesh Venkatesh, Maximilian Balandat
Affiliations: Facebook
Note: To Appear at the 8th ICML Workshop on Automated Machine Learning, ICML 2021
Link: https://arxiv.org/abs/2106.11890
Abstract: When tuning the architecture and hyperparameters of large machine learning models for on-device deployment, it is desirable to understand the optimal trade-offs between on-device latency and model accuracy. In this work, we leverage recent methodological advances in Bayesian optimization over high-dimensional search spaces and multi-objective Bayesian optimization to efficiently explore these trade-offs for a production-scale on-device natural language understanding model at Facebook.
【3】 On Constrained Optimization in Differentiable Neural Architecture Search 标题:可微神经结构搜索中的约束优化问题研究
作者:Kaitlin Maile,Erwan Lecarpentier,Hervé Luga,Dennis G. Wilson 机构: IRIT, University of Toulouse, Toulouse, France, IRT Saint-Exupery, Toulouse, France, ISAE-SUPAERO, University of Toulouse, Toulouse, France 链接:https://arxiv.org/abs/2106.11655 摘要:可微结构搜索(DARTS)是最近提出的一种基于可微松弛的神经结构搜索(NAS)方法。由于它的成功,最近提出了许多分析和改进DARTS框架的变体。通过将问题看作一个约束双层优化问题,我们提出并分析了三种改进方案,即结构权重竞争、更新调度和向离散化方向的正则化。首先,我们引入一种新的方法来激活体系结构权重,它可以防止边缘内的混淆竞争,并允许跨边缘的公平比较,以帮助离散化。接下来,我们提出了一个基于每小批量网络信息的动态调度方案,以使体系结构更新更加及时。最后,我们考虑了两种正则化方法,基于离散化的邻近性和交替方向乘子法(ADMM)算法,以促进早期离散化。我们的结果表明,这种新的激活方案减少了最终架构的大小,并且正则化提高了搜索结果的可靠性,同时保持了与NAS中最新技术相当的性能,特别是当与我们新的动态通知调度一起使用时。 摘要:Differentiable Architecture Search (DARTS) is a recently proposed neural architecture search (NAS) method based on a differentiable relaxation. Due to its success, numerous variants analyzing and improving parts of the DARTS framework have recently been proposed. By considering the problem as a constrained bilevel optimization, we propose and analyze three improvements to architectural weight competition, update scheduling, and regularization towards discretization. First, we introduce a new approach to the activation of architecture weights, which prevents confounding competition within an edge and allows for fair comparison across edges to aid in discretization. Next, we propose a dynamic schedule based on per-minibatch network information to make architecture updates more informed. Finally, we consider two regularizations, based on proximity to discretization and the Alternating Directions Method of Multipliers (ADMM) algorithm, to promote early discretization. Our results show that this new activation scheme reduces final architecture size and the regularizations improve reliability in search results while maintaining comparable performance to state-of-the-art in NAS, especially when used with our new dynamic informed schedule.
【4】 Adapting Stepsizes by Momentumized Gradients Improves Optimization and Generalization 标题:用矩阵化梯度调整步长提高了最优化和泛化能力
作者:Yizhou Wang,Yue Kang,Can Qin,Yi Xu,Huan Wang,Yulun Zhang,Yun Fu 机构:† Northeastern University, ‡ University of California, Davis 备注:40 pages, 27 figures 链接:https://arxiv.org/abs/2106.11514 摘要:自适应梯度方法,如Adam,在机器学习中取得了巨大的成功。通过用过去梯度平方的滑动平均的平方根来缩放梯度,这类方法可以实现现代深层神经网络的快速训练。然而,它们的泛化能力比随机梯度下降(SGD)差,并且在训练的早期阶段容易陷入局部极小。有趣的是,我们发现将Adam预条件项中的梯度替换为其动量化版本可以很好地解决这些问题。直觉上,带有动量的梯度包含了更精确的方向信息,因此它的二阶矩估计是比原始梯度更好的缩放选择。因此,我们提出AdaMomentum作为一个新的优化器,以达到训练更快同时泛化更好的目标。我们进一步发展了一个理论来支持优化和泛化的改进,并在凸和非凸条件下提供收敛性保证。在各种模型和任务上的大量实验表明,在视觉任务上,AdaMomentum与SGD表现出相当的性能,并且在包括语言处理在内的其他任务上也取得了一致的最新成果。 摘要:Adaptive gradient methods, such as Adam, have achieved tremendous success in machine learning. Scaling gradients by square roots of the running averages of squared past gradients, such methods are able to attain rapid training of modern deep neural networks. Nevertheless, they are observed to generalize worse than stochastic gradient descent (SGD) and tend to be trapped in local minima at an early stage during training. Intriguingly, we discover that substituting the gradient in the preconditioner term with the momentumized version in Adam can well solve the issues. The intuition is that gradient with momentum contains more accurate directional information and therefore its second moment estimation is a better choice for scaling than raw gradient's. Thereby we propose AdaMomentum as a new optimizer reaching the goal of training faster while generalizing better. We further develop a theory to back up the improvement in optimization and generalization and provide convergence guarantee under both convex and nonconvex settings. Extensive experiments on various models and tasks demonstrate that AdaMomentum exhibits comparable performance to SGD on vision tasks, and achieves state-of-the-art results consistently on other tasks including language processing.
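为帮助理解上文"将Adam预条件项中的梯度替换为其动量化版本"这一核心改动,下面给出一个单步更新的最小Python草图。这只是按摘要描述所作的示意(超参数沿用Adam的常见默认值,函数与变量名均为假设),并非作者的参考实现。
    import numpy as np

    def adamomentum_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # 示意实现:二阶矩v由动量化梯度m而非原始梯度grad累积,
        # 这是摘要所述与Adam的关键区别;其余步骤与Adam一致(假设)。
        m = beta1 * m + (1 - beta1) * grad        # 一阶矩(动量)
        v = beta2 * v + (1 - beta2) * m * m       # 用动量化梯度累积二阶矩
        m_hat = m / (1 - beta1 ** t)              # 偏差校正
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v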
【5】 Instance-Optimal Compressed Sensing via Posterior Sampling 标题:基于后验采样的实例最优压缩感知
作者:Ajil Jalal,Sushrut Karmalkar,Alexandros G. Dimakis,Eric Price 机构:University of Texas at Austin, Department of Electrical and Computer Engineering; University of Texas at Austin, Department of Computer Science 链接:https://arxiv.org/abs/2106.11438 摘要:我们描述了从已知先验分布提取的信号的压缩感知的测量复杂性,即使先验的支持是整个空间(而不是,比方说,稀疏向量)。对于高斯测量和信号的任意先验分布,我们证明了后验采样估计器达到了接近最优的恢复保证。此外,只要分布估计(例如,来自可逆生成模型的估计)在Wasserstein距离下接近真实分布,该结果对模型失配具有鲁棒性。我们利用Langevin动力学实现了深生成先验的后验抽样估计,并通过实证发现它产生了准确且比MAP更具多样性的估计。 摘要:We characterize the measurement complexity of compressed sensing of signals drawn from a known prior distribution, even when the support of the prior is the entire space (rather than, say, sparse vectors). We show for Gaussian measurements and any prior distribution on the signal, that the posterior sampling estimator achieves near-optimal recovery guarantees. Moreover, this result is robust to model mismatch, as long as the distribution estimate (e.g., from an invertible generative model) is close to the true distribution in Wasserstein distance. We implement the posterior sampling estimator for deep generative priors using Langevin dynamics, and empirically find that it produces accurate estimates with more diversity than MAP.
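下面用一个极简的Python草图说明摘要中"利用Langevin动力学实现后验采样"的基本结构,对应线性高斯测量模型 y = Ax + 噪声;其中 score_prior 是假设给定的先验得分函数(例如来自可逆生成模型),步长与步数均为示意取值,并非论文的具体实现。
    import numpy as np

    def langevin_posterior_sample(y, A, score_prior, sigma=0.1, eta=1e-4, n_steps=1000, seed=0):
        # 无调整Langevin动力学:沿后验对数密度的梯度上升并注入高斯噪声。
        # score_prior(x) 返回 grad log p(x),此处为假设的外部函数。
        rng = np.random.default_rng(seed)
        x = rng.standard_normal(A.shape[1])
        for _ in range(n_steps):
            grad_lik = A.T @ (y - A @ x) / sigma**2   # 高斯似然的对数梯度
            x = x + eta * (score_prior(x) + grad_lik) + np.sqrt(2 * eta) * rng.standard_normal(x.shape)
        return x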
【6】 Asynchronous Stochastic Optimization Robust to Arbitrary Delays 标题:对任意时延具有鲁棒性的异步随机优化
作者:Alon Cohen,Amit Daniely,Yoel Drori,Tomer Koren,Mariano Schain 机构:Google Research Tel Aviv, Hebrew University of Jerusalem, Blavatnik School of Computer Science, Tel Aviv University 链接:https://arxiv.org/abs/2106.11879 摘要:我们考虑了具有延迟梯度的随机优化问题,其中,在每个时间步$t$,算法使用来自步骤$t-d_t$的过时随机梯度进行更新,延迟$d_t$可以是任意的。此设置抽象了异步分布式优化,其中中央服务器接收由工作机计算的梯度更新。这些机器可能会经历计算和通信负载,这些负载可能会随着时间的推移而显著变化。在一般的非凸光滑优化环境中,我们给出了一个简单有效的算法,它需要$O(\sigma^2/\epsilon^4 + \tau/\epsilon^2)$步来寻找$\epsilon$-平稳点$x$,其中$\tau$是平均延迟$\frac{1}{T}\sum_{t=1}^T d_t$,而$\sigma^2$是随机梯度的方差。这比以前的工作有所改进,以前的工作表明,随机梯度下降达到了相同的速率,但是相对于最大延迟$\max_t d_t$,它可以显著大于平均延迟,特别是在异构分布式系统中。实验证明了该算法在时延分布偏态或重尾情况下的有效性和鲁棒性。 摘要:We consider stochastic optimization with delayed gradients where, at each time step $t$, the algorithm makes an update using a stale stochastic gradient from step $t - d_t$ for some arbitrary delay $d_t$. This setting abstracts asynchronous distributed optimization where a central server receives gradient updates computed by worker machines. These machines can experience computation and communication loads that might vary significantly over time. In the general non-convex smooth optimization setting, we give a simple and efficient algorithm that requires $O(\sigma^2/\epsilon^4 + \tau/\epsilon^2)$ steps for finding an $\epsilon$-stationary point $x$, where $\tau$ is the average delay $\frac{1}{T}\sum_{t=1}^T d_t$ and $\sigma^2$ is the variance of the stochastic gradients. This improves over previous work, which showed that stochastic gradient descent achieves the same rate but with respect to the maximal delay $\max_{t} d_t$, that can be significantly larger than the average delay especially in heterogeneous distributed systems. Our experiments demonstrate the efficacy and robustness of our algorithm in cases where the delay distribution is skewed or heavy-tailed.
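下面的Python玩具模拟仅用于说明摘要中的"延迟梯度"设定:第$t$步使用第$t-d_t$步迭代点处的陈旧梯度进行更新;论文的算法在此基础上还针对平均延迟给出了收敛保证,此处不作复现。
    import numpy as np

    def delayed_sgd(grad_fn, w0, delays, lr=0.01):
        # 玩具模拟:第t步的更新使用 max(0, t - d_t) 步处迭代点的陈旧梯度。
        # 仅演示延迟设定本身,非论文算法。
        history = [np.asarray(w0, dtype=float)]
        w = history[0].copy()
        for t, d in enumerate(delays):
            stale = history[max(0, t - d)]   # d_t 步之前的迭代点
            w = w - lr * grad_fn(stale)
            history.append(w.copy())
        return w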
【7】 Local convexity of the TAP free energy and AMP convergence for Z2-synchronization 标题:TAP自由能的局部凸性与Z2同步的AMP收敛性
作者:Michael Celentano,Zhou Fan,Song Mei 机构:Department of Statistics and Data Science, Yale University 备注:56 pages 链接:https://arxiv.org/abs/2106.11428 摘要:本文以高维贝叶斯模型的一个典型例子Z2同步为例,研究了基于TAP方法的均值场变分贝叶斯推理。我们证明了当信号强度$\lambda>1$(弱恢复阈值)时,TAP自由能泛函在Bayes后验定律的平均值附近存在唯一的局部极小值。此外,在这个极小值的局部邻域中的TAP自由能是强凸的。因此,自然梯度/镜像下降算法可从局部初始化线性收敛到该极小值,而这一初始化可由近似消息传递(AMP)的有限次迭代获得。这为通过最小化TAP自由能在高维进行变分推理提供了严格的基础。我们还分析了AMP的有限样本收敛性,表明AMP在任何$\lambda>1$下于TAP极小值处渐近稳定,并且在足够大的$\lambda$下从谱初始化出发线性收敛到该极小值。这样的保证比状态演化分析得到的结果更强,状态演化分析只描述无限样本极限下固定数量的AMP迭代。我们的证明结合Kac-Rice公式和Sudakov-Fernique高斯比较不等式,分析了在其局部邻域内满足强凸性和稳定性条件的临界点的复杂性。 摘要:We study mean-field variational Bayesian inference using the TAP approach, for Z2-synchronization as a prototypical example of a high-dimensional Bayesian model. We show that for any signal strength $\lambda > 1$ (the weak-recovery threshold), there exists a unique local minimizer of the TAP free energy functional near the mean of the Bayes posterior law. Furthermore, the TAP free energy in a local neighborhood of this minimizer is strongly convex. Consequently, a natural-gradient/mirror-descent algorithm achieves linear convergence to this minimizer from a local initialization, which may be obtained by a finite number of iterates of Approximate Message Passing (AMP). This provides a rigorous foundation for variational inference in high dimensions via minimization of the TAP free energy. We also analyze the finite-sample convergence of AMP, showing that AMP is asymptotically stable at the TAP minimizer for any $\lambda > 1$, and is linearly convergent to this minimizer from a spectral initialization for sufficiently large $\lambda$. Such a guarantee is stronger than results obtainable by state evolution analyses, which only describe a fixed number of AMP iterations in the infinite-sample limit. Our proofs combine the Kac-Rice formula and Sudakov-Fernique Gaussian comparison inequality to analyze the complexity of critical points that satisfy strong convexity and stability conditions within their local neighborhoods.
预测|估计(2篇)
【1】 Robust Regression Revisited: Acceleration and Improved Estimation Rates 标题:稳健回归回顾:加速和改进的估计率
作者:Arun Jambulapati,Jerry Li,Tselil Schramm,Kevin Tian 机构:Stanford University 备注:47 pages 链接:https://arxiv.org/abs/2106.11938 摘要:我们研究了强污染模型下统计回归问题的快速算法,目标是在给定对抗性污染样本的情况下近似优化广义线性模型(GLM)。这一研究领域的前期工作是基于Prasad等人的稳健梯度下降框架(一种使用有偏梯度查询的一阶方法)或Diakonikolas等人的Sever框架(一种调用平稳点求解器的迭代离群点去除方法)。我们提出了一种用于稳健回归问题的近似线性时间算法,与现有算法相比,该算法具有更好的运行时间或估计保证。对于光滑GLMs的一般情况(如logistic回归),我们证明了Prasad等人的稳健梯度下降框架可以被加速,并且证明了我们的算法可扩展到优化Lipschitz GLMs的Moreau包络(如支持向量机),回答了文献中的几个开放性问题。对于研究充分的稳健线性回归问题,我们提出了一种比以往的近似线性时间算法获得更高估计率的方法。有趣的是,我们的方法从Bakshi和Prasad的平方和算法背景下提出的一个可辨识性证明出发,该证明实现了最优错误率,但需要很大的多项式运行时间和样本复杂度。我们在Sever框架下重新解释了他们的证明,在更少的分布假设下得到了一个快得多且样本效率更高的算法。 摘要:We study fast algorithms for statistical regression problems under the strong contamination model, where the goal is to approximately optimize a generalized linear model (GLM) given adversarially corrupted samples. Prior works in this line of research were based on the robust gradient descent framework of Prasad et al., a first-order method using biased gradient queries, or the Sever framework of Diakonikolas et al., an iterative outlier-removal method calling a stationary point finder. We present nearly-linear time algorithms for robust regression problems with improved runtime or estimation guarantees compared to the state-of-the-art. For the general case of smooth GLMs (e.g. logistic regression), we show that the robust gradient descent framework of Prasad et al. can be accelerated, and show our algorithm extends to optimizing the Moreau envelopes of Lipschitz GLMs (e.g. support vector machines), answering several open questions in the literature. For the well-studied case of robust linear regression, we present an alternative approach obtaining improved estimation rates over prior nearly-linear time algorithms. Interestingly, our method starts with an identifiability proof introduced in the context of the sum-of-squares algorithm of Bakshi and Prasad, which achieved optimal error rates while requiring large polynomial runtime and sample complexity. We reinterpret their proof within the Sever framework and obtain a dramatically faster and more sample-efficient algorithm under fewer distributional assumptions.
【2】 Rank-one matrix estimation with groupwise heteroskedasticity 标题:具有GroupWise异方差的秩一矩阵估计
作者:Joshua K. Behne,Galen Reeves 备注:22 pages, 3 figures 链接:https://arxiv.org/abs/2106.11950 摘要:研究了在不同噪声水平下,由高斯观测值估计秩一矩阵的问题。这个问题是由聚类和社区检测中的应用程序引起的,其中潜在变量可以被划分为固定数量的已知组(例如,用户和项目),并且矩阵的块对应于不同类型的成对交互(例如,用户-用户、用户-项目或项目-项目交互)。在块数固定而变量数趋于无穷大的情况下,我们证明了矩阵和潜变量估计的最小均方误差的渐近精确公式。这些公式描述了问题的弱恢复阈值,并揭示了噪声方差在一定尺度下的不变性。我们还推导了一个近似的消息传递算法和一个梯度下降算法,并通过实验证明了这些算法在一定的区域内达到了信息论的极限。 摘要:We study the problem of estimating a rank-one matrix from Gaussian observations where different blocks of the matrix are observed under different noise levels. This problem is motivated by applications in clustering and community detection where latent variables can be partitioned into a fixed number of known groups (e.g., users and items) and the blocks of the matrix correspond to different types of pairwise interactions (e.g., user-user, user-item, or item-item interactions). In the setting where the number of blocks is fixed while the number of variables tends to infinity, we prove asymptotically exact formulas for the minimum mean-squared error in estimating both the matrix and the latent variables. These formulas describe the weak recovery thresholds for the problem and reveal invariance properties with respect to certain scalings of the noise variance. We also derive an approximate message passing algorithm and a gradient descent algorithm and show empirically that these algorithms achieve the information-theoretic limits in certain regimes.
其他神经网络|深度学习|模型|建模(26篇)
【1】 Revisiting Deep Learning Models for Tabular Data 标题:表格数据的深度学习模型再探
作者:Yury Gorishniy,Ivan Rubachev,Valentin Khrulkov,Artem Babenko 机构:† Yandex, Russia, ‡ Moscow Institute of Physics and Technology, Russia, ♣ National Research University Higher School of Economics, Russia 备注:Code: this https URL 链接:https://arxiv.org/abs/2106.11959 摘要:对表格数据进行深入学习的必要性仍然是大量研究工作解决的一个悬而未决的问题。最近关于表格式DL的文献提出了几个据报道优于传统的"浅"模型(如梯度增强决策树)的深层结构。然而,由于现有的工作经常使用不同的基准和调整协议,因此不清楚所提出的模型是否普遍优于GBDT。此外,这些模型往往没有相互比较,因此,为实践者确定最佳的深度模型是一个挑战。在这项工作中,我们首先对最近为表格数据开发的几大类DL模型进行了全面回顾。我们在广泛的数据集上仔细地调整和评估它们,并揭示了两个重要的发现。首先,我们证明了GBDT和DL模型之间的选择在很大程度上依赖于数据,并且仍然没有普遍的最优解决方案。其次,我们证明了一个简单的类ResNet架构是一个出人意料地有效的基线,它比DL文献中的大多数复杂模型都要好(参见本条之后的示意代码)。最后,我们为表格数据设计了Transformer架构的一个简单改编版本,它成为一个新的强DL基线,并在GBDT占优的数据集上缩小了DL模型与GBDT之间的差距。 摘要:The necessity of deep learning for tabular data is still an unanswered question addressed by a large number of research efforts. The recent literature on tabular DL proposes several deep architectures reported to be superior to traditional "shallow" models like Gradient Boosted Decision Trees. However, since existing works often use different benchmarks and tuning protocols, it is unclear if the proposed models universally outperform GBDT. Moreover, the models are often not compared to each other, therefore, it is challenging to identify the best deep model for practitioners. In this work, we start from a thorough review of the main families of DL models recently developed for tabular data. We carefully tune and evaluate them on a wide range of datasets and reveal two significant findings. First, we show that the choice between GBDT and DL models highly depends on data and there is still no universally superior solution. Second, we demonstrate that a simple ResNet-like architecture is a surprisingly effective baseline, which outperforms most of the sophisticated models from the DL literature. Finally, we design a simple adaptation of the Transformer architecture for tabular data that becomes a new strong DL baseline and reduces the gap between GBDT and DL models on datasets where GBDT dominates.
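作为参考,下面给出摘要所称"简单的类ResNet基线"在表格数据上可能采取的一个最小PyTorch示意块;层宽、归一化与dropout的具体取值均为假设,并非论文的确切配置。
    import torch
    import torch.nn as nn

    class TabularResNetBlock(nn.Module):
        # 类ResNet表格基线的一个最小示意块(假设性配置,非论文原始实现)。
        def __init__(self, d, d_hidden, dropout=0.1):
            super().__init__()
            self.norm = nn.BatchNorm1d(d)
            self.lin1 = nn.Linear(d, d_hidden)
            self.lin2 = nn.Linear(d_hidden, d)
            self.drop = nn.Dropout(dropout)

        def forward(self, x):
            h = self.lin2(self.drop(torch.relu(self.lin1(self.norm(x)))))
            return x + h  # 残差连接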
【2】 RootPainter3D: Interactive-machine-learning enables rapid and accurate contouring for radiotherapy 标题:RootPainter3D:交互式机器学习使放射治疗的轮廓快速准确
作者:Abraham George Smith,Jens Petersen,Cynthia Terrones-Campos,Anne Kiil Berthelsen,Nora Jarrett Forbes,Sune Darkner,Lena Specht,Ivan Richter Vogelius 机构:Department of Computer Science, University of Copenhagen, Department of Oncology, Rigshospitalet, University of Copenhagen, Department of Infectious Diseases, Rigshospitalet, University of Copenhagen 链接:https://arxiv.org/abs/2106.11942 摘要:危险器官勾画仍然是放射治疗中的一个瓶颈,许多深度学习方法在评估临床数据时达不到预期的效果。我们考察了在危险器官勾画任务中使用交互式机器学习方法所带来的准确性和时间节省。我们将该方法与Eclipse勾画软件进行了比较,发现它与手工勾画有很强的一致性,Dice分数为0.95。随着标注图像数量的增加,使用校正式标注创建注释所需的时间也越来越少,与手动方法相比节省了大量时间:在描绘了923幅图像之后,心脏平均只需2分2秒即可完成描绘,而手动描绘需要7分1秒。我们的实验表明,带有校正式标注的交互式机器学习提供了一种快速、易用的方式,使非计算机科学家能够在常规临床工作流程中训练深度学习模型来分割自己感兴趣的结构。源代码见 https://github.com/Abe404/RootPainter3D 。 摘要:Organ-at-risk contouring is still a bottleneck in radiotherapy, with many deep learning methods falling short of promised results when evaluated on clinical data. We investigate the accuracy and time-savings resulting from the use of an interactive-machine-learning method for an organ-at-risk contouring task. We compare the method to the Eclipse contouring software and find strong agreement with manual delineations, with a dice score of 0.95. The annotations created using corrective-annotation also take less time to create as more images are annotated, resulting in substantial time savings compared to manual methods, with hearts that take 2 minutes and 2 seconds to delineate on average, after 923 images have been delineated, compared to 7 minutes and 1 second when delineating manually. Our experiment demonstrates that interactive-machine-learning with corrective-annotation provides a fast and accessible way for non computer-scientists to train deep-learning models to segment their own structures of interest as part of routine clinical workflows. Source code is available at https://github.com/Abe404/RootPainter3D.
【3】 On the importance of cross-task features for class-incremental learning 标题:论跨任务特征对班级增量学习的重要性
作者:Albin Soutif--Cormerais,Marc Masana,Joost Van de Weijer,Bartłomiej Twardowski 机构:∗LAMP team, Computer Vision Center, UAB Barcelona, Spain, †VLO team, Institute of Computer Graphics and Vision, TU Graz, Austria 备注:includes supplementary material 链接:https://arxiv.org/abs/2106.11930 摘要:在类增量学习中,资源有限的agent需要学习一系列的分类任务,形成了一个不断增长的分类问题,其约束条件是不能访问以前任务中的数据。与任务增量学习(推理时任务ID可用)的主要区别在于,学习者还需要执行跨任务区分,即区分没有一起出现过的类。解决这个问题的方法有很多种,大多数都是利用大小不可忽略的外部存储器(缓冲区)。本文对跨任务特征的学习进行了消融研究,并考察其对类增量学习中所用基本重放策略性能的影响。我们还为类增量学习定义了一种新的遗忘度量,并发现遗忘并不是导致性能低下的主要原因。实验结果表明,未来的类增量学习算法不仅要防止遗忘,而且要提高跨任务特征的质量。当每个任务的类数较少时,这一点尤为重要。 摘要:In class-incremental learning, an agent with limited resources needs to learn a sequence of classification tasks, forming an ever growing classification problem, with the constraint of not being able to access data from previous tasks. The main difference with task-incremental learning, where a task-ID is available at inference time, is that the learner also needs to perform cross-task discrimination, i.e. distinguish between classes that have not been seen together. Approaches to tackle this problem are numerous and mostly make use of an external memory (buffer) of non-negligible size. In this paper, we ablate the learning of cross-task features and study its influence on the performance of basic replay strategies used for class-IL. We also define a new forgetting measure for class-incremental learning, and see that forgetting is not the principal cause of low performance. Our experimental results show that future algorithms for class-incremental learning should not only prevent forgetting, but also aim to improve the quality of the cross-task features. This is especially important when the number of classes per task is small.
【4】 Physics-Informed Deep Reversible Regression Model for Temperature Field Reconstruction of Heat-Source Systems 标题:热源系统温度场重构的物理信息深度可逆回归模型
作者:Zhiqiang Gong,Weien Zhou,Jun Zhang,Wei Peng,Wen Yao 机构:National Innovation Institute of Defense Technology 链接:https://arxiv.org/abs/2106.11929 摘要:在工程系统中,为了保证热源的正常工作乃至长寿命,对热源部件寿命期内的温度进行监测是必不可少的。然而,以往的方法主要采用插值估计,需要大量的温度张量才能得到精确的估计。为了解决这一问题,本文提出了一种新的基于物理信息的温度场重构深度代理模型。首先,定义了热源系统的温度场重建任务。然后,本文针对所提出的任务开发了深度代理模型映射。最后,考虑到传热的物理性质,本文提出了四种不同的损失,并结合这些损失联合学习深度代理模型。对典型的二维热源系统进行了实验研究,验证了所提出的基于物理信息的深度代理模型在温度场重建中的有效性和高效性。 摘要:Temperature monitoring during the life time of heat-source components in engineering systems becomes essential to ensure the normal work and even the long working life of the heat sources. However, prior methods, which mainly use interpolation-based estimation, require large amounts of temperature tensors for an accurate estimation. To solve this problem, this work develops a novel physics-informed deep surrogate model for temperature field reconstruction. First, we define the temperature field reconstruction task of heat-source systems. Then, this work develops the deep surrogate model mapping for the proposed task. Finally, considering the physical properties of heat transfer, this work proposes four different losses and jointly learns the deep surrogate model with these losses. Experimental studies have been conducted over typical two-dimensional heat-source systems to demonstrate the effectiveness and efficiency of the proposed physics-informed deep surrogate models for temperature field reconstruction.
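为说明"物理约束损失"的一般构造方式,下面给出一个假设性的PyTorch草图:在配点上惩罚稳态热传导方程 -laplace(T) = q 的残差。论文实际组合了四种不同的损失,此处仅示意其中一类可能的形式,并非论文的具体损失定义。
    import torch

    def heat_residual_loss(model, xy, source):
        # 假设 model: (N,2) 坐标 -> (N,1) 温度,source 形状为 (N,)。
        # 用自动微分计算拉普拉斯项,惩罚 -laplace(T) - q 的均方残差(示意)。
        xy = xy.clone().requires_grad_(True)
        T = model(xy)
        grads = torch.autograd.grad(T.sum(), xy, create_graph=True)[0]
        lap = 0.0
        for i in range(xy.shape[1]):
            lap = lap + torch.autograd.grad(grads[:, i].sum(), xy, create_graph=True)[0][:, i]
        return ((-lap - source) ** 2).mean()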
【5】 Deep Phasor Networks: Connecting Conventional and Spiking Neural Networks 标题:深度相量网络:连接常规神经网络和尖峰神经网络
作者:Wilkie Olin-Ammentorp,Maxim Bazhenov 机构:Department of Medicine, University of California, San Diego, Institute for Neural Computation, University of California, San Diego 备注:18 pages, 6 figures 链接:https://arxiv.org/abs/2106.11908 摘要:在这项工作中,我们扩展了标准的神经网络,假设神经元的激活与单位圆上的复数的角度相对应,即"相量"。这样的网络中的每一层通过对前一层的相位进行加权叠加并计算新的相位值来产生新的激活。这种广义体系结构允许模型达到高精度,并具有独特的优点,即可以在考虑或不考虑时间变量的情况下执行网络的数学等效版本。重要的是,时域中相位角的值可以稀疏地由一系列周期性重复的δ函数或"尖峰"表示。我们展示了在标准深度学习任务上相量网络的非时域训练,并且证明了这些网络可以在传统的非时域或脉冲时域中执行,而不需要转换步骤。这为构建深度网络提供了一个新的基础,该网络通过适合于神经形态计算硬件的基于时间尖峰的计算进行操作。 摘要:In this work, we extend standard neural networks by building upon an assumption that neuronal activations correspond to the angle of a complex number lying on the unit circle, or 'phasor.' Each layer in such a network produces new activations by taking a weighted superposition of the previous layer's phases and calculating the new phase value. This generalized architecture allows models to reach high accuracy and carries the singular advantage that mathematically equivalent versions of the network can be executed with or without regard to a temporal variable. Importantly, the value of a phase angle in the temporal domain can be sparsely represented by a periodically repeating series of delta functions or 'spikes'. We demonstrate the atemporal training of a phasor network on standard deep learning tasks and show that these networks can then be executed in either the traditional atemporal domain or spiking temporal domain with no conversion step needed. This provides a novel basis for constructing deep networks which operate via temporal, spike-based calculations suitable for neuromorphic computing hardware.
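摘要描述的相量层可以用几行代码示意:上一层的激活是角度,先映射为单位圆上的复数,做加权叠加后再取辐角作为新激活。下面是一个概念性的NumPy草图,仅供理解,并非论文实现。
    import numpy as np

    def phasor_layer(phases, W):
        # 概念性示意:phases 为上一层激活(相位角),W 为实权重矩阵(假设)。
        z = W @ np.exp(1j * phases)   # 复相量的加权叠加
        return np.angle(z)            # 新激活仍是相位角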
【6】 Dangers of Bayesian Model Averaging under Covariate Shift 标题:协变量漂移下贝叶斯模型平均的危险性
作者:Pavel Izmailov,Patrick Nicholson,Sanae Lotfi,Andrew Gordon Wilson 机构:†New York University, ‡Stevens Capital Management 链接:https://arxiv.org/abs/2106.11905 摘要:神经网络的近似贝叶斯推理被认为是标准训练的一种稳健的替代方法,通常对分布外的数据具有良好的性能。然而,贝叶斯神经网络(BNNs)在通过全批量哈密顿蒙特卡罗(Hamiltonian Monte Carlo)进行高保真近似推理时,在协变量漂移下泛化能力较差,甚至不如经典估计。我们解释了这个令人惊讶的结果,说明了贝叶斯模型平均在协变量漂移下实际上可能是有问题的,特别是在输入特征中的线性相关性导致后验缺乏收缩的情况下。此外,我们还说明了为什么相同的问题不影响许多近似推理程序或经典的最大后验概率(MAP)训练。最后,我们提出了新的先验,以提高BNN对多种协变量漂移来源的鲁棒性。 摘要:Approximate Bayesian inference for neural networks is considered a robust alternative to standard training, often providing good performance on out-of-distribution data. However, Bayesian neural networks (BNNs) with high-fidelity approximate inference via full-batch Hamiltonian Monte Carlo achieve poor generalization under covariate shift, even underperforming classical estimation. We explain this surprising result, showing how a Bayesian model average can in fact be problematic under covariate shift, particularly in cases where linear dependencies in the input features cause a lack of posterior contraction. We additionally show why the same issue does not affect many approximate inference procedures, or classical maximum a-posteriori (MAP) training. Finally, we propose novel priors that improve the robustness of BNNs to many sources of covariate shift.
【7】 Failing with Grace: Learning Neural Network Controllers that are Boundedly Unsafe 标题:优雅地失败:学习有界不安全的神经网络控制器
作者:Panagiotis Vlantis,Michael M. Zavlanos 机构:Duke University 链接:https://arxiv.org/abs/2106.11881 摘要:在这项工作中,我们考虑学习一个前馈神经网络(NN)控制器的问题,以安全地引导任意形状的平面机器人通过一个紧致且布满障碍的工作空间。与现有方法强烈依赖于接近安全状态空间边界的数据点密度来训练具有闭环安全保证的神经网络控制器不同,我们提出了一种方法,该方法取消了在实际中难以满足的数据假设,并允许"优雅的"安全违规,即其程度有界且可在空间上加以控制。为此,我们采用可达性分析方法来封装训练过程中的安全约束。具体地说,为了获得闭环系统前向可达集的一个计算效率高的过逼近,我们将机器人的状态空间划分为若干个单元,并在训练好的控制律下自适应地细分包含可能脱离安全集的状态的单元。为此,我们首先设计机器人足迹的适当欠近似与过近似,以便自适应地将配置空间细分为单元。然后,利用每个单元的前向可达集和不可行机器人配置集之间的重叠作为安全违规的度量,我们在损失函数中引入惩罚项来惩罚训练过程中的这种重叠。因此,我们的方法可以为闭环系统学习一个安全向量场,同时,通过闭环系统的前向可达集的过逼近与不安全状态集的重叠,给出整个配置空间上安全违规的数值最坏情况界。此外,它还可以控制计算复杂度和这些边界的紧密性之间的折衷。最后,我们进行了仿真研究,验证了该方案的有效性。 摘要:In this work, we consider the problem of learning a feed-forward neural network (NN) controller to safely steer an arbitrarily shaped planar robot in a compact and obstacle-occluded workspace. Unlike existing methods that depend strongly on the density of data points close to the boundary of the safe state space to train NN controllers with closed-loop safety guarantees, we propose an approach that lifts such assumptions on the data that are hard to satisfy in practice and instead allows for graceful safety violations, i.e., of a bounded magnitude that can be spatially controlled. To do so, we employ reachability analysis methods to encapsulate safety constraints in the training process. Specifically, to obtain a computationally efficient over-approximation of the forward reachable set of the closed-loop system, we partition the robot's state space into cells and adaptively subdivide the cells that contain states which may escape the safe set under the trained control law. To do so, we first design appropriate under- and over-approximations of the robot's footprint to adaptively subdivide the configuration space into cells. Then, using the overlap between each cell's forward reachable set and the set of infeasible robot configurations as a measure for safety violations, we introduce penalty terms into the loss function that penalize this overlap in the training process. As a result, our method can learn a safe vector field for the closed-loop system and, at the same time, provide numerical worst-case bounds on safety violation over the whole configuration space, defined by the overlap between the over-approximation of the forward reachable set of the closed-loop system and the set of unsafe states. Moreover, it can control the tradeoff between computational complexity and tightness of these bounds. Finally, we provide a simulation study that verifies the efficacy of the proposed scheme.
【8】 Randomness In Neural Network Training: Characterizing The Impact of Tooling 标题:神经网络训练中的随机性:工具影响的表征
作者:Donglin Zhuang,Xingyao Zhang,Shuaiwen Leon Song,Sara Hooker 机构:School of Computer Science, The University of Sydney, Department of Computer Science, University of Washington, Google Research, Brain 备注:21 pages, 10 figures 链接:https://arxiv.org/abs/2106.11872 摘要:机器学习中对确定性的追求不成比例地集中在描述由算法设计选择引入的噪声的影响上。在这项工作中,我们讨论了一个理解和研究都较少的问题:我们对工具的选择如何将随机性引入深度神经网络训练。我们在不同类型的硬件、加速器、最先进的网络和开源数据集上进行大规模实验,以描述工具选择如何影响系统中的非确定性水平、所述非确定性的影响以及消除不同噪声源的成本。我们的发现令人惊讶,并表明非确定性的影响是微妙的。虽然top-1准确度之类的顶级指标没有受到明显影响,但数据分布某些部分的模型性能对引入随机性更为敏感。我们的结果表明确定性工具对于人工智能的安全性至关重要。然而,我们也发现,确保确定性的成本在神经网络结构和硬件类型之间有很大差异,例如,相对于非确定性训练,在一系列广泛使用的GPU加速器架构上,开销分别高达746%、241%和196%。本文使用的源代码可在https://github.com/usyd-fsalab/NeuralNetworkRandomness 获取。 摘要:The quest for determinism in machine learning has disproportionately focused on characterizing the impact of noise introduced by algorithmic design choices. In this work, we address a less well understood and studied question: how does our choice of tooling introduce randomness to deep neural network training. We conduct large scale experiments across different types of hardware, accelerators, state of art networks, and open-source datasets, to characterize how tooling choices contribute to the level of non-determinism in a system, the impact of said non-determinism, and the cost of eliminating different sources of noise. Our findings are surprising, and suggest that the impact of non-determinism is nuanced. While top-line metrics such as top-1 accuracy are not noticeably impacted, model performance on certain parts of the data distribution is far more sensitive to the introduction of randomness. Our results suggest that deterministic tooling is critical for AI safety. However, we also find that the cost of ensuring determinism varies dramatically between neural network architectures and hardware types, e.g., with overhead up to 746%, 241%, and 196% on a spectrum of widely used GPU accelerator architectures, relative to non-deterministic training. The source code used in this paper is available at https://github.com/usyd-fsalab/NeuralNetworkRandomness.
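作为背景参考,下面列出PyTorch中用于消除工具链随机性的常见开关(示意清单,并非论文给出的配方);论文测量的正是启用这类确定性设置所带来的开销。
    import os, random
    import numpy as np
    import torch

    def make_deterministic(seed=0):
        # PyTorch确定性训练的常见设置(示意,非论文原文的配置)。
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # 部分CUDA核需要
        torch.use_deterministic_algorithms(True)           # 遇到非确定性算子时报错
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False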
【9】 Machine learning for risk assessment in gender-based crime 标题:机器学习在性别犯罪风险评估中的应用
作者:Ángel González-Prieto,Antonio Brú,Juan Carlos Nuño,José Luis González-Álvarez 机构: Universidad Aut´onoma de Madrid, Universidad Complutense de Madrid, Universidad Polit´ecnica de Madrid 备注:17 pages, 5 figures, 4 tables. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible 链接:https://arxiv.org/abs/2106.11847 摘要:基于性别的犯罪是当代社会最令人关注的祸害之一。为了从根本上消除这一威胁,世界各国政府投入了大量的经济和人力资源。尽管作出了这些努力,但准确预测性别暴力受害者再次受到攻击的风险仍然是一个非常困难的开放问题。开发新的方法来发布准确、公平和快速的预测,将使警察部队能够选择最适当的措施来防止再次犯罪。在这项工作中,我们建议应用机器学习(ML)技术来建立模型,准确预测性别暴力罪犯的再犯风险。这项工作贡献的相关性有三个方面:(i)提出的最大似然法优于现有的基于经典统计技术的风险评估算法;(ii)这项研究是通过一个官方专用数据库进行的,有40000多份关于性别暴力的报告,以及(iii)提出了两个新的质量指标,用于评估模型提供的有效警察保护及其产生的投入资源过载。此外,我们提出了一个混合模型,将统计预测方法与ML方法相结合,允许当局实现从先前存在的模型到基于ML的模型的平稳过渡。这种混合性质使决策过程能够在警察系统的效率和所采取的保护措施的积极性之间取得最佳平衡。 摘要:Gender-based crime is one of the most concerning scourges of contemporary society. Governments worldwide have invested lots of economic and human resources to radically eliminate this threat. Despite these efforts, providing accurate predictions of the risk that a victim of gender violence has of being attacked again is still a very hard open problem. The development of new methods for issuing accurate, fair and quick predictions would allow police forces to select the most appropriate measures to prevent recidivism. In this work, we propose to apply Machine Learning (ML) techniques to create models that accurately predict the recidivism risk of a gender-violence offender. The relevance of the contribution of this work is threefold: (i) the proposed ML method outperforms the preexisting risk assessment algorithm based on classical statistical techniques, (ii) the study has been conducted through an official specific-purpose database with more than 40,000 reports of gender violence, and (iii) two new quality measures are proposed for assessing the effective police protection that a model supplies and the overload in the invested resources that it generates. Additionally, we propose a hybrid model that combines the statistical prediction methods with the ML method, permitting authorities to implement a smooth transition from the preexisting model to the ML-based model. This hybrid nature enables a decision-making process to optimally balance between the efficiency of the police system and aggressiveness of the protection measures taken.
【10】 Lower and Upper Bounds on the VC-Dimension of Tensor Network Models 标题:张量网络模型VC维数的上下界
作者:Behnoush Khavari,Guillaume Rabusseau 机构:Université de Montréal, DIRO & Mila, CIFAR AI Chair 链接:https://arxiv.org/abs/2106.11827 摘要:张量网络方法已经成为凝聚态物理进展的一个关键组成部分,并且由于其紧凑地表示高维物体的能力,最近引起了机器学习界的兴趣。例如,张量网络方法可用于有效学习指数大特征空间中的线性模型[Stoudenmire和Schwab,2016]。在这项工作中,我们推导了用于分类、回归和补全的一大类张量网络模型的VC维和伪维的上下界。我们的上界适用于由任意张量网络结构参数化的线性模型;我们还推导了常用张量分解模型(CP、张量链(Tensor Train)、张量环和Tucker)的下界,表明了我们的一般上界的严密性。这些结果被用来导出一个泛化界,它可以应用于低秩矩阵的分类以及基于任何常用张量分解模型的线性分类器。作为我们结果的推论,我们得到了[Stoudenmire and Schwab,2016]中介绍的矩阵乘积态分类器的VC维数作为所谓键维数(即张量链秩)的函数的一个界,这回答了Cirac、Garre-Rubio和Pérez-García在[Cirac et al.,2019]中列出的一个开放问题。 摘要:Tensor network methods have been a key ingredient of advances in condensed matter physics and have recently sparked interest in the machine learning community for their ability to compactly represent very high-dimensional objects. Tensor network methods can for example be used to efficiently learn linear models in exponentially large feature spaces [Stoudenmire and Schwab, 2016]. In this work, we derive upper and lower bounds on the VC dimension and pseudo-dimension of a large class of tensor network models for classification, regression and completion. Our upper bounds hold for linear models parameterized by arbitrary tensor network structures, and we derive lower bounds for common tensor decomposition models (CP, Tensor Train, Tensor Ring and Tucker) showing the tightness of our general upper bound. These results are used to derive a generalization bound which can be applied to classification with low rank matrices as well as linear classifiers based on any of the commonly used tensor decomposition models. As a corollary of our results, we obtain a bound on the VC dimension of the matrix product state classifier introduced in [Stoudenmire and Schwab, 2016] as a function of the so-called bond dimension (i.e. tensor train rank), which answers an open problem listed by Cirac, Garre-Rubio and Pérez-García in [Cirac et al., 2019].
【11】 A Stealthy and Robust Fingerprinting Scheme for Generative Models 标题:一种适用于产生式模型的隐形鲁棒指纹方案
作者:Li Guanlin,Guo Shangwei,Wang Run,Xu Guowen,Zhang Tianwei 机构:Nanyang Technological University, Chongqing University, Wuhan University 链接:https://arxiv.org/abs/2106.11760 摘要:本文提出了一种新的用于生成模型知识产权保护的指纹识别方法。针对判别模型的已有方案通常采用对抗样本作为指纹,这会带来异常的推理行为和预测结果。因此,这些方法并不隐蔽,很容易被对手识别。我们的方法利用不可见后门技术来克服上述限制。具体来说,我们设计验证样本,其模型输出看起来正常,但可以触发后门分类器作出异常预测。我们提出了一种新的后门嵌入方法,结合独特三元组损失(Unique-Triplet Loss)和细粒度分类,以提高指纹的有效性。大量的评估结果表明,该方法对于不同的GAN模型具有更高的鲁棒性、唯一性和隐蔽性。 摘要:This paper presents a novel fingerprinting methodology for the Intellectual Property protection of generative models. Prior solutions for discriminative models usually adopt adversarial examples as the fingerprints, which give anomalous inference behaviors and prediction results. Hence, these methods are not stealthy and can be easily recognized by the adversary. Our approach leverages the invisible backdoor technique to overcome the above limitation. Specifically, we design verification samples, whose model outputs look normal but can trigger a backdoor classifier to make abnormal predictions. We propose a new backdoor embedding approach with Unique-Triplet Loss and fine-grained categorization to enhance the effectiveness of our fingerprints. Extensive evaluations show that this solution can outperform other strategies with higher robustness, uniqueness and stealthiness for various GAN models.
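作为参考,下面给出标准三元组损失的PyTorch写法;论文提出的独特三元组损失(Unique-Triplet Loss)是在此思想上的变体(拉近验证样本与触发类、推远其它类),其确切形式摘要未给出,此处不作复现。
    import torch
    import torch.nn.functional as F

    def triplet_loss(anchor, positive, negative, margin=0.2):
        # 标准三元组损失(参考写法,非论文的Unique-Triplet Loss本身)。
        d_pos = F.pairwise_distance(anchor, positive)
        d_neg = F.pairwise_distance(anchor, negative)
        return F.relu(d_pos - d_neg + margin).mean()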
【12】 Symplectic Learning for Hamiltonian Neural Networks 标题:哈密顿神经网络的辛学习
作者:Marco David,Florian Méhats 机构: SpaceAble (Paris, France), École Normale Supérieure, PSL Research University, F-, Paris, France, Univ. Rennes, CNRS, INRIA-MINGuS, IRMAR-UMR , F-, Rennes, France 备注:10 pages, 4 figures; Source code, datasets and pre-trained models available at this https URL 链接:https://arxiv.org/abs/2106.11753 摘要:机器学习方法广泛应用于自然科学领域,通过观测数据对物理系统进行建模和预测。然而,它们常常被当作理解不足的“黑匣子”,无视现有的数学结构和问题的不变量。最近,哈密顿神经网络(HNNs)的提出朝着统一的“灰盒”方法迈出了第一步,它利用物理洞察力来提高哈密顿系统的性能。本文利用具有不同损失函数的哈密顿系统的辛结构,探索了一种改进的HNNs训练方法。这使损失从人为的下限中解脱出来。我们在数学上保证了一个精确的哈密顿函数的存在,HNN可以学习这个函数。这使我们能够证明和数值分析HNNs所产生的错误,从而使它们完全可以解释。最后,我们提出了一种新的训练后校正方法,仅从离散化的观测数据中获得任意阶的真实哈密顿量。 摘要:Machine learning methods are widely used in the natural sciences to model and predict physical systems from observation data. Yet, they are often used as poorly understood "black boxes," disregarding existing mathematical structure and invariants of the problem. Recently, the proposal of Hamiltonian Neural Networks (HNNs) took a first step towards a unified "gray box" approach, using physical insight to improve performance for Hamiltonian systems. In this paper, we explore a significantly improved training method for HNNs, exploiting the symplectic structure of Hamiltonian systems with a different loss function. This frees the loss from an artificial lower bound. We mathematically guarantee the existence of an exact Hamiltonian function which the HNN can learn. This allows us to prove and numerically analyze the errors made by HNNs which, in turn, renders them fully explainable. Finally, we present a novel post-training correction to obtain the true Hamiltonian only from discretized observation data, up to an arbitrary order.
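HNN的核心构造可以用几行PyTorch代码示意:由学习到的标量函数H(q, p)通过哈密顿方程 dq/dt = dH/dp、dp/dt = -dH/dq 诱导向量场,摘要所述的辛训练即围绕该结构展开。以下为概念性草图,并非论文实现。
    import torch

    def hamiltonian_vector_field(H, q, p):
        # H 为学习到的标量哈密顿函数(假设可调用,返回标量批);
        # 用自动微分由H得到哈密顿向量场。
        q = q.clone().requires_grad_(True)
        p = p.clone().requires_grad_(True)
        dHdq, dHdp = torch.autograd.grad(H(q, p).sum(), (q, p), create_graph=True)
        return dHdp, -dHdq   # (dq/dt, dp/dt)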
【13】 Lifted Model Checking for Relational MDPs 标题:关系MDP的提升模型检测
作者:Wen-Chi Yang,Jean-François Raskin,Luc De Raedt 链接:https://arxiv.org/abs/2106.11735 摘要:模型检验是为了验证具有随机和非确定性行为的系统的行为而发展起来的。它被用来为这样的系统提供保证。虽然大多数模型检验方法侧重于命题模型,但各种概率规划和强化学习框架处理的是关系域,例如STRIPS规划和关系马尔可夫决策过程。在关系设置中使用命题模型检验需要对模型进行接地,这导致了众所周知的状态爆炸问题和不可处理性。我们提出了pCTL-REBEL,一种在关系MDP上验证pCTL属性的提升模型检验方法。它将关系Bellman更新算子REBEL(一种用于基于模型的关系强化学习的提升值迭代方法)扩展到关系模型检验。pCTL-REBEL是提升式的(lifted),这意味着模型无需接地,而是在抽象的关系层面上利用对称性并进行推理。理论上,我们证明了pCTL模型检验方法对于关系MDP是可判定的,即使对于可能无限的域,只要状态有一个有界的大小。实际上,我们给出了提升关系模型检验的算法和实现,并且我们证明了提升方法提高了模型检验方法的可扩展性。 摘要:Model checking has been developed for verifying the behaviour of systems with stochastic and non-deterministic behavior. It is used to provide guarantees about such systems. While most model checking methods focus on propositional models, various probabilistic planning and reinforcement frameworks deal with relational domains, for instance, STRIPS planning and relational Markov Decision Processes. Using propositional model checking in relational settings requires one to ground the model, which leads to the well known state explosion problem and intractability. We present pCTL-REBEL, a lifted model checking approach for verifying pCTL properties on relational MDPs. It extends REBEL, the relational Bellman update operator, which is a lifted value iteration approach for model-based relational reinforcement learning, toward relational model-checking. PCTL-REBEL is lifted, which means that rather than grounding, the model exploits symmetries and reasons at an abstract relational level. Theoretically, we show that the pCTL model checking approach is decidable for relational MDPs even for possibly infinite domains provided that the states have a bounded size. Practically, we contribute algorithms and an implementation of lifted relational model checking, and we show that the lifted approach improves the scalability of the model checking approach.
【14】 FLEA: Provably Fair Multisource Learning from Unreliable Training Data 标题:FLEA:从不可靠训练数据中可证明公平的多源学习
作者:Eugenia Iofinova,Nikola Konstantinov,Christoph H. Lampert 机构:IST Austria 链接:https://arxiv.org/abs/2106.11732 摘要:公平感知学习的目标是构建分类器,不仅能做出准确的预测,而且不歧视特定群体。它是一个快速发展的机器学习领域,具有深远的社会影响。然而,现有的公平学习方法容易受到训练数据中意外或恶意伪影的影响,从而在不知情的情况下产生不公平的分类器。在这项工作中,我们解决了在健壮的多源环境中从不可靠的训练数据中公平学习的问题,其中可用的训练数据来自多个源,其中一小部分可能不代表真实的数据分布。我们引入了FLEA,一种基于过滤的算法,允许学习系统识别和抑制那些如果用于训练会对公平性或准确性产生负面影响的数据源。我们通过在多个数据集上的一系列实验证明了我们的方法的有效性。此外,我们正式地证明,只要受影响的数据源少于一半,在数据足够的情况下,FLEA就可以保护学习者不受不可靠数据的影响。 摘要:Fairness-aware learning aims at constructing classifiers that not only make accurate predictions, but do not discriminate against specific groups. It is a fast-growing area of machine learning with far-reaching societal impact. However, existing fair learning methods are vulnerable to accidental or malicious artifacts in the training data, which can cause them to unknowingly produce unfair classifiers. In this work we address the problem of fair learning from unreliable training data in the robust multisource setting, where the available training data comes from multiple sources, a fraction of which might be not representative of the true data distribution. We introduce FLEA, a filtering-based algorithm that allows the learning system to identify and suppress those data sources that would have a negative impact on fairness or accuracy if they were used for training. We show the effectiveness of our approach by a diverse range of experiments on multiple datasets. Additionally we prove formally that, given enough data, FLEA protects the learner against unreliable data as long as the fraction of affected data sources is less than half.
【15】 Learning Dynamical Systems from Noisy Sensor Measurements using Multiple Shooting 标题:利用多次打靶法从噪声传感器测量中学习动态系统
作者:Armand Jordana,Justin Carpentier,Ludovic Righetti 机构:Tandon School of Engineering, New York University, Brooklyn, NY, Inria, Département d'informatique de l'ENS, École normale supérieure, CNRS, PSL Research University, Paris, France 链接:https://arxiv.org/abs/2106.11712 摘要:动态系统建模在捕捉和理解复杂的物理现象中起着至关重要的作用。当物理模型不够精确或难以用解析公式描述时,可以使用神经网络等通用函数逼近器直接从传感器测量中获取系统动力学。到目前为止,学习这些神经网络参数的现有方法对大多数感兴趣的动力系统的固有不稳定性非常敏感,这反过来又妨碍了对很长序列的研究。在这项工作中,我们介绍了一种基于多重打靶法的通用且可扩展的方法,用于学习间接观测动力系统的潜在表示。我们在直接从原始图像观察到的系统上实现了最先进的性能。此外,我们证明了我们的方法对噪声测量是鲁棒的,并且可以处理复杂的动力学系统,例如混沌系统。 摘要:Modeling dynamical systems plays a crucial role in capturing and understanding complex physical phenomena. When physical models are not sufficiently accurate or hardly describable by analytical formulas, one can use generic function approximators such as neural networks to capture the system dynamics directly from sensor measurements. As for now, current methods to learn the parameters of these neural networks are highly sensitive to the inherent instability of most dynamical systems of interest, which in turn prevents the study of very long sequences. In this work, we introduce a generic and scalable method based on multiple shooting to learn latent representations of indirectly observed dynamical systems. We achieve state-of-the-art performances on systems observed directly from raw images. Further, we demonstrate that our method is robust to noisy measurements and can handle complex dynamical systems, such as chaotic ones.
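下面用一个简化的PyTorch草图说明多重打靶目标的一般结构:把长轨迹切成若干短段,每段拥有可学习的初值,段内短程推演拟合观测,并用连续性罚项粘合相邻段,从而缓解长程推演的不稳定性。积分器(此处为欧拉法)与张量形状均为假设,并非论文的原始实现。
    import torch

    def multiple_shooting_loss(dynamics, seg_inits, obs, dt, lam=1.0):
        # seg_inits: 各段可学习初值的列表;obs 形状为 (段数, 每段步数, 状态维)。
        fit, cont = 0.0, 0.0
        for k, x0 in enumerate(seg_inits):
            x = x0
            for t in range(obs.shape[1]):
                fit = fit + ((x - obs[k, t]) ** 2).mean()
                x = x + dt * dynamics(x)          # 一步欧拉推演(示意)
            if k + 1 < len(seg_inits):
                cont = cont + ((x - seg_inits[k + 1]) ** 2).mean()  # 连续性罚项
        return fit + lam * cont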
【16】 Distributional Gradient Matching for Learning Uncertain Neural Dynamics Models 标题:分布梯度匹配在不确定神经动力学模型学习中的应用
作者:Lenart Treven,Philippe Wenk,Florian Dörfler,Andreas Krause 机构:Learning and Adaptive Systems Group & Automatic Control Lab, ETH Zürich, Switzerland, ETH Zürich and Max Planck ETH Center for Learning Systems 链接:https://arxiv.org/abs/2106.11609 摘要:微分方程,特别是神经微分方程,是连续时间系统辨识中的关键技术。虽然许多确定性学习算法是基于伴随方法的数值积分设计的,但许多下游任务,如主动学习、强化学习中的探索、鲁棒控制或滤波,都需要对预测不确定性进行精确估计。在这项工作中,我们提出了一种新的方法来估计具有认知不确定性的神经常微分方程,避免了数值积分的瓶颈。我们直接在状态空间中建模不确定性,而不是在ODE参数中建模不确定性。我们的算法,即分布梯度匹配(DGM),联合训练一个平滑器(smoother)和一个动力学模型,并通过最小化Wasserstein损失来匹配它们的梯度。我们的实验表明,与传统的基于数值积分的近似推理方法相比,我们的方法训练速度更快,预测未见过的轨迹更快,并且在神经ODE的情形下精度显著更高。 摘要:Differential equations in general and neural ODEs in particular are an essential technique in continuous-time system identification. While many deterministic learning algorithms have been designed based on numerical integration via the adjoint method, many downstream tasks such as active learning, exploration in reinforcement learning, robust control, or filtering require accurate estimates of predictive uncertainties. In this work, we propose a novel approach towards estimating epistemically uncertain neural ODEs, avoiding the numerical integration bottleneck. Instead of modeling uncertainty in the ODE parameters, we directly model uncertainties in the state space. Our algorithm - distributional gradient matching (DGM) - jointly trains a smoother and a dynamics model and matches their gradients via minimizing a Wasserstein loss. Our experiments show that, compared to traditional approximate inference methods based on numerical integration, our approach is faster to train, faster at predicting previously unseen trajectories, and in the context of neural ODEs, significantly more accurate.
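下面给出梯度匹配这一训练结构的简化示意(为简明起见假设状态为一维):平滑器x(t)的时间导数应与动力学模型f(x(t))一致。论文用Wasserstein损失匹配两者的分布,此处仅以均方误差演示其"免数值积分"的思路,并非DGM本身。
    import torch

    def gradient_matching_loss(smoother, dynamics, t):
        # smoother: t -> x(t)(此处假设标量状态);dynamics: x -> f(x)。
        t = t.clone().requires_grad_(True)
        x = smoother(t)
        dxdt = torch.autograd.grad(x.sum(), t, create_graph=True)[0]  # 平滑器的时间导数
        return ((dxdt - dynamics(x)) ** 2).mean()   # 简化为MSE;论文用Wasserstein损失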
【17】 SSUL: Semantic Segmentation with Unknown Label for Exemplar-based Class-Incremental Learning 标题:SSUL:基于样本类增量学习的未知标注语义分割
作者:Sungmin Cha,Beomyoung Kim,Youngjoon Yoo,Taesup Moon 机构: Department of Electrical and Computer Engineering, Seoul National University, NAVER AI Lab, NAVER Clova 链接:https://arxiv.org/abs/2106.11562 摘要:我们考虑类增量式语义分割(CISS)问题。虽然最近提出的一些算法利用了知识蒸馏(KD)技术的变体来解决这个问题,但是它们仅仅部分地解决了CISS中导致灾难性遗忘的关键问题,即背景类的语义漂移和多标签预测问题。为了更好地应对这些挑战,我们提出了一种新的方法,称为SSUL-M(Semantic Segmentation with Unknown Label with Memory),通过仔细地结合几种为语义分割量身定做的技术。更具体地说,我们有三个主要贡献:(1)在背景类中建模未知类,帮助学习未来类(有助于可塑性);(2)冻结骨干网络和过去的分类器,使用二元交叉熵损失和伪标记克服灾难性遗忘(有助于稳定性);(3)首次在CISS中利用微型样本记忆来同时提高可塑性和稳定性。结果表明,在标准的基准数据集上,我们的方法比最新的基线具有明显更好的性能。此外,我们通过深入和广泛的消融分析证明了我们的贡献,并讨论了与用于分类的标准类增量学习相比CISS问题所具有的不同性质。 摘要:We consider a class-incremental semantic segmentation (CISS) problem. While some recently proposed algorithms utilized variants of knowledge distillation (KD) technique to tackle the problem, they only partially addressed the key additional challenges in CISS that causes the catastrophic forgetting; i.e., the semantic drift of the background class and multi-label prediction issue. To better address these challenges, we propose a new method, dubbed as SSUL-M (Semantic Segmentation with Unknown Label with Memory), by carefully combining several techniques tailored for semantic segmentation. More specifically, we make three main contributions; (1) modeling unknown class within the background class to help learning future classes (help plasticity), (2) freezing backbone network and past classifiers with binary cross-entropy loss and pseudo-labeling to overcome catastrophic forgetting (help stability), and (3) utilizing tiny exemplar memory for the first time in CISS to improve both plasticity and stability. As a result, we show our method achieves significantly better performance than the recent state-of-the-art baselines on the standard benchmark datasets. Furthermore, we justify our contributions with thorough and extensive ablation analyses and discuss different natures of the CISS problem compared to the standard class-incremental learning for classification.
【18】 A Logical Neural Network Structure With More Direct Mapping From Logical Relations 标题:一种从逻辑关系到更直接映射的逻辑神经网络结构
作者:Gang Wang 机构:School of Computer Science and Technology, Taiyuan University of Technology 链接:https://arxiv.org/abs/2106.11463 摘要:逻辑关系广泛存在于人类活动中。人类利用它们根据各种条件进行判断和决策,这些条件以if-then规则的形式体现出来。作为认知智能的一种重要形式,正确地将逻辑关系表示并存储到计算机系统中,是进行自动判断和决策的前提,特别是在医学诊断等高风险领域。然而,现有的数字人工神经网络(ANN)模型在图像识别等感知智能方面表现得很好,而在逻辑表示等认知智能方面表现得不好,阻碍了ANN的进一步应用。为了解决这个问题,研究人员尝试设计逻辑ANN模型来表示和存储逻辑关系。虽然这方面的研究已经取得了一些进展,但由于这些逻辑ANN模型的结构仍然不能更直接地与逻辑关系进行映射,导致相应的逻辑关系无法从它们的网络结构中读出,因此目前的工作仍然存在不足。因此,为了用神经网络结构更清晰地表示逻辑关系,并从中读出逻辑关系,本文根据逻辑表示的需要,通过设计新的逻辑神经元和连接,提出了一种新的逻辑神经网络模型。与最近关于逻辑ANN模型的工作相比,这种逻辑ANN模型采用更直接的映射方法,与逻辑关系有更清晰的对应关系,从而可以按照网络结构的连接模式读出逻辑关系。此外,该模型使用的神经元也更少。 摘要:Logical relations widely exist in human activities. Human use them for making judgement and decision according to various conditions, which are embodied in the form of if-then rules. As an important kind of cognitive intelligence, it is prerequisite of representing and storing logical relations rightly into computer systems so as to make automatic judgement and decision, especially for high-risk domains like medical diagnosis. However, current numeric ANN (Artificial Neural Network) models are good at perceptual intelligence such as image recognition while they are not good at cognitive intelligence such as logical representation, blocking the further application of ANN. To solve it, researchers have tried to design logical ANN models to represent and store logical relations. Although there are some advances in this research area, recent works still have disadvantages because the structures of these logical ANN models still don't map more directly with logical relations which will cause the corresponding logical relations cannot be read out from their network structures. Therefore, in order to represent logical relations more clearly by the neural network structure and to read out logical relations from it, this paper proposes a novel logical ANN model by designing the new logical neurons and links in demand of logical representation. Compared with the recent works on logical ANN models, this logical ANN model has more clear corresponding with logical relations using the more direct mapping method herein, thus logical relations can be read out following the connection patterns of the network structure. Additionally, fewer neurons are used.
【19】 Hardness of Samples Is All You Need: Protecting Deep Learning Models Using Hardness of Samples 标题:样本的硬度就是你所需要的:用样本的硬度保护深度学习模型
作者:Amir Mahdi Sadeghzadeh,Faezeh Dehghan,Amir Mohammad Sobhanian,Rasool Jalili 机构:Sharif University of Technology 备注:19 pages, 9 figures, 9 tables 链接:https://arxiv.org/abs/2106.11424 摘要:最近的一些研究表明,基于深度神经网络(DNN)的分类器容易受到模型提取攻击。在模型提取攻击中,敌方利用目标分类器创建一个替代分类器,在某些准则下模仿目标分类器。本文研究了样本的硬度,证明了模型提取攻击样本的硬度直方图与正常样本的硬度直方图是可区分的。正常样本来自目标分类器的训练数据分布。由于基于DNN的分类器的训练过程是在多个历元中进行的,因此我们可以把这个过程看作是一个子分类器序列,使得每个子分类器都是在一个历元的末尾创建的。我们使用子分类器序列来计算样本的硬度。研究了样本硬度与分类器输出可信度之间的关系。提出了面向硬度的检测方法(HODA)来检测模型提取攻击的样本序列。结果表明,HODA只需观察100个攻击样本,就可以检测出模型提取攻击的样本序列,成功率很高。我们还研究了对抗样本的硬度,指出对抗样本的硬度直方图不同于正常样本的硬度直方图。 摘要:Several recent studies have shown that Deep Neural Network (DNN)-based classifiers are vulnerable against model extraction attacks. In model extraction attacks, an adversary exploits the target classifier to create a surrogate classifier imitating the target classifier with respect to some criteria. In this paper, we investigate the hardness degree of samples and demonstrate that the hardness degree histogram of model extraction attacks samples is distinguishable from the hardness degree histogram of normal samples. Normal samples come from the target classifier's training data distribution. As the training process of DNN-based classifiers is done in several epochs, we can consider this process as a sequence of subclassifiers so that each subclassifier is created at the end of an epoch. We use the sequence of subclassifiers to calculate the hardness degree of samples. We investigate the relation between hardness degree of samples and the trust in the classifier outputs. We propose Hardness-Oriented Detection Approach (HODA) to detect the sample sequences of model extraction attacks. The results demonstrate that HODA can detect the sample sequences of model extraction attacks with a high success rate by only watching 100 attack samples. We also investigate the hardness degree of adversarial examples and indicate that the hardness degree histogram of adversarial examples is distinct from the hardness degree histogram of normal samples.
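按摘要思路,样本硬度可以由"各历元末子分类器"的预测序列计算得到。下面是一种可能定义的NumPy示意(统计各子分类器的误分比例,越晚或始终未被学会的样本硬度越高);论文的确切定义可能不同。
    import numpy as np

    def hardness_degree(epoch_preds, labels):
        # epoch_preds 形状为 (历元数, 样本数),为各历元末子分类器的预测(假设)。
        epoch_preds = np.asarray(epoch_preds)
        return (epoch_preds != np.asarray(labels)).mean(axis=0)  # 每个样本的硬度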
【20】 Learn Like The Pro: Norms from Theory to Size Neural Computation 标题:向专业人士学习:从理论到大小神经计算的规范
作者:Margaret Trautner,Ziwei Li,Sai Ravela 机构:Department of Computation and Mathematical Sciences, California Institute of Technology, Pasadena, CA, Earth, Atmospheric and Planetary Sciences, Massachusetts Institute of Technology, Cambridge, MA, Earth Signals and Systems Group 备注:7 pages 链接:https://arxiv.org/abs/2106.11409 摘要:神经网络的优化设计是许多应用中的一个关键问题。在这里,我们研究了具有多项式非线性的动态系统是如何为神经系统的设计提供信息的。我们提出了一个可学习性度量及其相关特征来量化学习动力学的近平衡行为。将神经系统的可学习性等同于参考系统的等效参数估计度量,从而建立了网络结构的界。这样,来自理论的规范为神经结构提供了一个很好的第一个猜测,然后可以进一步适应数据。该方法既不需要训练,也不需要训练数据。它揭示了一类具有模拟连续或离散时间多项式动力学的乘性节点的神经网络的精确规模。它还为经典的前馈网络提供了相对严格的下限,这与模拟评估是一致的。 摘要:The optimal design of neural networks is a critical problem in many applications. Here, we investigate how dynamical systems with polynomial nonlinearities can inform the design of neural systems that seek to emulate them. We propose a Learnability metric and its associated features to quantify the near-equilibrium behavior of learning dynamics. Equating the Learnability of neural systems with equivalent parameter estimation metric of the reference system establishes bounds on network structure. In this way, norms from theory provide a good first guess for neural structure, which may then further adapt with data. The proposed approach neither requires training nor training data. It reveals exact sizing for a class of neural networks with multiplicative nodes that mimic continuous- or discrete-time polynomial dynamics. It also provides relatively tight lower size bounds for classical feed-forward networks that is consistent with simulated assessments.
【21】 Dive into Deep Learning 标题:深入研究深度学习
作者:Aston Zhang,Zachary C. Lipton,Mu Li,Alexander J. Smola 备注:(HTML) this https URL (GitHub) this https URL 链接:https://arxiv.org/abs/2106.11342 摘要:这本开源的书代表了我们让深度学习变得平易近人的尝试,向读者传授概念、上下文和代码。整本书是在Jupyter笔记本中起草的,无缝集成了说明图、数学和交互式示例以及自包含的代码。我们的目标是提供一个资源,它可以:(i)免费提供给每个人;(ii)提供足够的技术深度,为实际成为应用机器学习科学家提供起点;(iii)包括可运行代码,向读者展示如何在实践中解决问题;(iv)允许我们和整个社区快速更新;(v)辅以一个论坛,就技术细节进行互动讨论并回答问题。 摘要:This open-source book represents our attempt to make deep learning approachable, teaching readers the concepts, the context, and the code. The entire book is drafted in Jupyter notebooks, seamlessly integrating exposition figures, math, and interactive examples with self-contained code. Our goal is to offer a resource that could (i) be freely available for everyone; (ii) offer sufficient technical depth to provide a starting point on the path to actually becoming an applied machine learning scientist; (iii) include runnable code, showing readers how to solve problems in practice; (iv) allow for rapid updates, both by us and also by the community at large; (v) be complemented by a forum for interactive discussion of technical details and to answer questions.
【22】 Feedback Shaping: A Modeling Approach to Nurture Content Creation 标题:反馈塑造:培育内容创作的建模方法
作者:Ye Tu,Chun Lo,Yiping Yuan,Shaunak Chatterjee 机构:LinkedIn Corporation 备注:None 链接:https://arxiv.org/abs/2106.11312 摘要:社交媒体平台通过诸如newsfeed之类的推荐系统将内容创造者和内容消费者聚集在一起。到目前为止,此类推荐系统的重点主要是对内容消费者偏好进行建模,并对其体验进行优化。然而,同样重要的是,通过优先考虑创作者的兴趣来培育内容创作,因为高质量的内容形成了可持续参与和对话的种子,带来了新的消费者,同时保留了现有的消费者。在这项工作中,我们提出了一种建模方法来预测内容消费者的反馈如何激励创作者。然后,我们利用此模型通过重塑反馈分布来优化内容创建者的新闻提要体验,从而形成更活跃的内容生态系统。实际上,我们讨论了如何平衡消费者和创作者的用户体验,以及如何利用强大的网络效应进行在线A/B测试。我们在LinkedIn newsfeed上展示了一个已部署的用例,在该用例中,我们使用这种方法显著改进了内容创建,同时又不损害消费者的体验。 摘要:Social media platforms bring together content creators and content consumers through recommender systems like newsfeed. The focus of such recommender systems has thus far been primarily on modeling the content consumer preferences and optimizing for their experience. However, it is equally critical to nurture content creation by prioritizing the creators' interests, as quality content forms the seed for sustainable engagement and conversations, bringing in new consumers while retaining existing ones. In this work, we propose a modeling approach to predict how feedback from content consumers incentivizes creators. We then leverage this model to optimize the newsfeed experience for content creators by reshaping the feedback distribution, leading to a more active content ecosystem. Practically, we discuss how we balance the user experience for both consumers and creators, and how we carry out online A/B tests with strong network effects. We present a deployed use case on the LinkedIn newsfeed, where we used this approach to improve content creation significantly without compromising the consumers' experience.
【23】 Sparsistent Model Discovery 标题:稀疏模型发现
作者:Georges Tod,Gert-Jan Both,Remy Kusters 机构:Université de Paris, INSERM U, Center for Research and Interdisciplinarity (CRI), F-, Paris, France 链接:https://arxiv.org/abs/2106.11936 摘要:从非常有限的观测数据中发现时空数据集背后的偏微分方程,在许多科学领域都是非常重要的。然而,基于稀疏回归的模型发现算法何时能够真正恢复底层物理过程,仍然是一个悬而未决的问题。我们将基于Lasso的模型发现算法性能不佳的原因追溯到其潜在的变量选择不一致性:这意味着即使库中存在真实的模型,也可能无法被选中。通过首先重新审视Lasso的不可表示性条件(IRC),我们获得了一些关于何时可能发生这种情况的见解。然后,我们证明了自适应Lasso比Lasso有更多的机会满足IRC,并建议将其集成到一个带有稳定性选择和误差控制的深度学习模型发现框架中。实验结果表明,在高噪声水平下,我们可以用单一一组超参数从非常有限的样本中恢复出若干非线性、混沌的典型偏微分方程。 摘要:Discovering the partial differential equations underlying spatio-temporal datasets from very limited observations is of paramount interest in many scientific fields. However, it remains an open question to know when model discovery algorithms based on sparse regression can actually recover the underlying physical processes. We trace back the poor performance of Lasso-based model discovery algorithms to their potential variable selection inconsistency: meaning that even if the true model is present in the library, it might not be selected. By first revisiting the irrepresentability condition (IRC) of the Lasso, we gain some insights of when this might occur. We then show that the adaptive Lasso will have more chances of verifying the IRC than the Lasso and propose to integrate it within a deep learning model discovery framework with stability selection and error control. Experimental results show we can recover several nonlinear and chaotic canonical PDEs with a single set of hyperparameters from a very limited number of samples at high noise levels.
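自适应Lasso的核心是一个简单的重加权技巧,可用几行代码示意:先用普通最小二乘得到权重 w_j = 1/|beta_ols_j|^gamma,再对重标定后的特征做普通Lasso。以下为基于scikit-learn的最小草图(省略了论文框架中的稳定性选择与误差控制,参数取值均为示意)。
    import numpy as np
    from sklearn.linear_model import Lasso, LinearRegression

    def adaptive_lasso(X, y, alpha=0.1, gamma=1.0):
        # 自适应Lasso:数据驱动的权重使大系数受到更轻的惩罚,
        # 从而比普通Lasso更容易满足IRC(示意实现)。
        beta_ols = LinearRegression(fit_intercept=False).fit(X, y).coef_
        w = 1.0 / (np.abs(beta_ols) ** gamma + 1e-12)   # 自适应权重
        model = Lasso(alpha=alpha, fit_intercept=False).fit(X / w, y)
        return model.coef_ / w                          # 还原重标定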
【24】 Machine Learning for Model Order Selection in MIMO OFDM Systems 标题:机器学习在MIMO OFDM系统模型阶数选择中的应用
作者:Brenda Vilas Boas,Wolfgang Zirwas,Martin Haardt 机构:Nokia, Germany, Ilmenau University of Technology, Germany 备注:to be published 链接:https://arxiv.org/abs/2106.11633 摘要:各种无线信道估计方法,例如MUSIC和ESPRIT,都依赖于模型阶的先验知识。因此,正确估计组成这些信道的多径分量(mpc)的数目是非常重要的。然而,具有许多散射体的环境可能产生密集分布的mpc。除了噪声外,mpc的这种聚类使得模型顺序选择任务在实践中很难与目前已知的算法相比较。在本文中,我们利用MIMO正交频分复用(OFDM)系统的多维特性,提出了一种机器学习(ML)方法,该方法能够在几乎相干的情况下以比现有方法更高的精度确定mpc的数目。此外,我们的结果显示,我们提出的ML方法具有更高的可靠性。 摘要:A variety of wireless channel estimation methods, e.g., MUSIC and ESPRIT, rely on prior knowledge of the model order. Therefore, it is important to correctly estimate the number of multipath components (MPCs) which compose such channels. However, environments with many scatterers may generate MPCs which are closely spaced. This clustering of MPCs in addition to noise makes the model order selection task difficult in practice to currently known algorithms. In this paper, we exploit the multidimensional characteristics of MIMO orthogonal frequency division multiplexing (OFDM) systems and propose a machine learning (ML) method capable of determining the number of MPCs with a higher accuracy than state of the art methods in almost coherent scenarios. Moreover, our results show that our proposed ML method has an enhanced reliability.
【25】 Physics-constrained deep neural network method for estimating parameters in a redox flow battery 标题:物理约束的深度神经网络氧化还原液流电池参数估计方法
作者:QiZhi He,Panos Stinis,Alexandre Tartakovsky 机构:Physical and Computational Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA , Department of Civil and Environmental Engineering, University of Illinois, Urbana-Champaign, Urbana, IL 链接:https://arxiv.org/abs/2106.11451 摘要:本文提出了一种基于物理约束深度神经网络(PCDNN)的钒氧化还原液流电池(VRFB)零维(0D)模型参数估计方法。在这种方法中,我们使用深度神经网络(DNN)将模型参数近似为运行条件的函数。该方法允许将VRFB计算模型作为物理约束集成到参数学习过程中,从而提高参数估计和电池电压预测的精度。通过一个实验数据集,我们证明PCDNN方法可以在一系列运行条件下估计模型参数,并且与使用传统逆方法估计的、与运行条件无关的恒定参数的0D模型预测相比,改进了0D模型的电压预测。我们还证明了PCDNN方法对于估计DNN训练中未使用的运行条件下的参数值具有更好的泛化能力。 摘要:In this paper, we present a physics-constrained deep neural network (PCDNN) method for parameter estimation in the zero-dimensional (0D) model of the vanadium redox flow battery (VRFB). In this approach, we use deep neural networks (DNNs) to approximate the model parameters as functions of the operating conditions. This method allows the integration of the VRFB computational models as the physical constraints in the parameter learning process, leading to enhanced accuracy of parameter estimation and cell voltage prediction. Using an experimental dataset, we demonstrate that the PCDNN method can estimate model parameters for a range of operating conditions and improve the 0D model prediction of voltage compared to the 0D model prediction with constant operation-condition-independent parameters estimated with traditional inverse methods. We also demonstrate that the PCDNN approach has an improved generalization ability for estimating parameter values for operating conditions not used in the DNN training.
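下面按我们的理解给出"物理模型作为约束参与训练"这一思路的极简PyTorch示意;其中physics_forward只是占位假设,并非论文中的0D VRFB方程,数据亦为随机占位:

```python
import torch
import torch.nn as nn

# DNN把工况映射为模型参数,再经(占位的)物理正向模型预测电压
net = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

cond = torch.rand(128, 3)      # 工况(如电流密度、流量、SOC;均为假设)
v_obs = torch.rand(128, 1)     # 实测电压(此处为随机占位数据)

def physics_forward(params, cond):
    # 占位正向模型 V = theta_0 * x_0 + theta_1,实际应替换为0D电池模型
    return params[:, :1] * cond[:, :1] + params[:, 1:2]

for step in range(500):
    params = net(cond)
    v_pred = physics_forward(params, cond)        # 物理模型嵌入计算图
    data_loss = ((v_pred - v_obs) ** 2).mean()
    reg = (params ** 2).mean()                    # 简单的参数正则(假设)
    loss = data_loss + 0.01 * reg
    opt.zero_grad(); loss.backward(); opt.step()
```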
【26】 Tensor Learning-based Precoder Codebooks for FD-MIMO Systems 标题:基于张量学习的FD-MIMO系统预编码码本
作者:Keerthana Bhogi,Chiranjib Saha,Harpreet S. Dhillon 备注:30 pages, 8 figures 链接:https://arxiv.org/abs/2106.11374 摘要:本文提出了一种利用张量学习为发射端配备均匀平面阵列(UPA)的全维(FD)多输入多输出(MIMO)系统设计低复杂度预编码码本的有效方法。特别地,我们没有使用统计信道模型,而是使用以机器学习为基础的无模型数据驱动方法来生成适应周围传播条件的码本。我们使用FD-MIMO信道的张量表示,并利用其特性来设计信道预编码器的量化版本。通过对信道的张量分解,我们得到最优预编码器的最佳表示,即两个低维预编码器的Kronecker积(KP)的函数,二者分别对应于UPA的水平和垂直维度。然后,我们量化该预编码器以设计乘积码本,使得由信道状态信息(CSI)量化导致的互信息平均损失最小化。其关键技术贡献在于利用预编码器上的约束条件,将乘积码本设计问题归约为笛卡尔积Grassmann流形(CPM)上的无监督聚类问题,其中聚类质心构成有限大小的预编码器码本。通过在CPM上运行$K$-means聚类,可以有效地找到该码本。通过在CPM上引入适当的诱导距离度量,我们证明了乘积码本的构造等价于在对应水平和垂直维度的因子流形上找到最优质心集。仿真结果证明了所提设计准则学习码本的能力以及所设计码本的优良性能。 摘要:This paper develops an efficient procedure for designing low-complexity codebooks for precoding in a full-dimension (FD) multiple-input multiple-output (MIMO) system with a uniform planar array (UPA) antenna at the transmitter (Tx) using tensor learning. In particular, instead of using statistical channel models, we utilize a model-free data-driven approach with foundations in machine learning to generate codebooks that adapt to the surrounding propagation conditions. We use a tensor representation of the FD-MIMO channel and exploit its properties to design quantized version of the channel precoders. We find the best representation of the optimal precoder as a function of Kronecker Product (KP) of two low-dimensional precoders, respectively corresponding to the horizontal and vertical dimensions of the UPA, obtained from the tensor decomposition of the channel. We then quantize this precoder to design product codebooks such that an average loss in mutual information due to quantization of channel state information (CSI) is minimized. The key technical contribution lies in exploiting the constraints on the precoders to reduce the product codebook design problem to an unsupervised clustering problem on a Cartesian Product Grassmann manifold (CPM), where the cluster centroids form a finite-sized precoder codebook. This codebook can be found efficiently by running a $K$-means clustering on the CPM. With a suitable induced distance metric on the CPM, we show that the construction of product codebooks is equivalent to finding the optimal set of centroids on the factor manifolds corresponding to the horizontal and vertical dimensions. Simulation results are presented to demonstrate the capability of the proposed design criterion in learning the codebooks and the attractive performance of the designed codebooks.
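作为说明,下面给出秩一预编码方向(即格拉斯曼流形G(n,1)上的"线")上基于弦距离的K-means聚类的简化示意;论文处理的是笛卡尔积格拉斯曼流形与乘积码本,此处仅演示单个因子流形上的聚类思想,天线数、码本大小等参数均为假设:

```python
import numpy as np

def grassmann_kmeans(V, K, iters=50, seed=0):
    # V: (N, n) 单位范数复向量,视作G(n,1)上的点(秩一预编码方向)
    rng = np.random.default_rng(seed)
    C = V[rng.choice(len(V), K, replace=False)].copy()
    for _ in range(iters):
        # 弦距离 d(u,c)^2 = 1 - |u^H c|^2:按|u^H c|^2最大分配最近质心
        labels = (np.abs(V.conj() @ C.T) ** 2).argmax(axis=1)
        for k in range(K):
            M = V[labels == k]
            if len(M):
                # 质心 = 簇内投影矩阵之和 Σ v v^H 的主特征向量
                _, eigvecs = np.linalg.eigh(M.T @ M.conj())
                C[k] = eigvecs[:, -1]
    return C, labels

# 示例:为4天线、码本大小16量化随机瑞利信道方向
H = np.random.randn(1000, 4) + 1j * np.random.randn(1000, 4)
H /= np.linalg.norm(H, axis=1, keepdims=True)
codebook, _ = grassmann_kmeans(H, K=16)
```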
其他(23篇)
【1】 Variance-Aware Off-Policy Evaluation with Linear Function Approximation 标题:基于线性函数逼近的方差感知离策略评估
作者:Yifei Min,Tianhao Wang,Dongruo Zhou,Quanquan Gu 机构:Department of Statistics and Data Science, Yale University; University of California 备注:70 pages, 4 figures 链接:https://arxiv.org/abs/2106.11960 摘要:研究了线性函数逼近强化学习中的离策略评估(OPE)问题,其目的是基于行为策略收集的离线数据来估计目标策略的价值函数。我们建议加入价值函数的方差信息来提高OPE的样本效率。更具体地,对于时间非齐次的分幕式线性马尔可夫决策过程(MDP),我们提出了一种算法VA-OPE,它利用价值函数的估计方差对拟合Q迭代中的Bellman残差进行加权。我们证明了我们的算法比已知最好的结果具有更紧的误差界。我们还对行为策略和目标策略之间的分布偏移给出了细粒度的刻画。大量的数值实验证实了我们的理论。 摘要:We study the off-policy evaluation (OPE) problem in reinforcement learning with linear function approximation, which aims to estimate the value function of a target policy based on the offline data collected by a behavior policy. We propose to incorporate the variance information of the value function to improve the sample efficiency of OPE. More specifically, for time-inhomogeneous episodic linear Markov decision processes (MDPs), we propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration. We show that our algorithm achieves a tighter error bound than the best-known result. We also provide a fine-grained characterization of the distribution shift between the behavior policy and the target policy. Extensive numerical experiments corroborate our theory.
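为说明"用估计方差对回归残差重新加权"的核心思想,下面给出一个异方差加权最小二乘的极简示意(并非VA-OPE的完整算法;var_hat为假设已获得的方差估计):

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.normal(size=(500, 8))                 # 状态-动作特征(假设)
theta_true = rng.normal(size=8)
noise_sd = np.linspace(0.1, 2.0, 500)           # 异方差噪声水平
y = Phi @ theta_true + noise_sd * rng.normal(size=500)   # 回归目标(类比Bellman目标)

W = 1.0 / noise_sd ** 2                         # 权重 = 估计方差的倒数
PhiW = Phi * W[:, None]
theta = np.linalg.solve(PhiW.T @ Phi, PhiW.T @ y)   # 加权最小二乘解
print(np.round(theta - theta_true, 3))          # 高方差样本被降权,估计更稳
```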
【2】 Reusing Combinatorial Structure: Faster Iterative Projections over Submodular Base Polytopes 标题:组合结构的重用:子模基多面体上的快速迭代投影
作者:Jai Moondra,Hassan Mortagy,Swati Gupta 机构:Georgia Institute of Technology 链接:https://arxiv.org/abs/2106.11943 摘要:优化算法,如投影牛顿法、FISTA、镜像下降法及其变体,具有接近最优的遗憾界和收敛速度,但在每次迭代中都可能遇到计算“投影”的计算瓶颈(例如,在线镜像下降有$O(T^{1/2})$的遗憾)。另一方面,条件梯度变体在每次迭代中求解一个线性优化问题,但会导致次优的速率(例如,在线Frank-Wolfe有$O(T^{3/4})$的遗憾)。受运行时间与收敛速度之间这种权衡的启发,我们考虑在广泛存在的子模基多面体$B(f)$上对邻近点进行迭代投影。我们开发了一个工具包,从离散和连续两种视角加速投影的计算。随后,我们改造了away-step Frank-Wolfe算法以利用这些信息并实现提前终止。对于基于基数的子模多面体这一特殊情形,我们将某些Bregman投影的计算时间改进了$\Omega(n/\log(n))$倍。我们的理论结果表明运行时间可减少几个数量级,初步的计算实验也印证了这一点。 摘要:Optimization algorithms such as projected Newton's method, FISTA, mirror descent and its variants enjoy near-optimal regret bounds and convergence rates, but suffer from a computational bottleneck of computing "projections" in potentially each iteration (e.g., $O(T^{1/2})$ regret of online mirror descent). On the other hand, conditional gradient variants solve a linear optimization in each iteration, but result in suboptimal rates (e.g., $O(T^{3/4})$ regret of online Frank-Wolfe). Motivated by this trade-off in runtime v/s convergence rates, we consider iterative projections of close-by points over widely-prevalent submodular base polytopes $B(f)$. We develop a toolkit to speed up the computation of projections using both discrete and continuous perspectives. We subsequently adapt the away-step Frank-Wolfe algorithm to use this information and enable early termination. For the special case of cardinality based submodular polytopes, we improve the runtime of computing certain Bregman projections by a factor of $\Omega(n/\log(n))$. Our theoretical results show orders of magnitude reduction in runtime in preliminary computational experiments.
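作为"投影本身是瓶颈"的一个具体例子,下面给出向概率单纯形(即基数型子模函数 f(S)=min(|S|,1) 的基多面体)做欧氏投影的经典排序算法示意;论文的加速来自跨迭代复用组合结构,此处未体现:

```python
import numpy as np

def project_to_simplex(y):
    # 将y欧氏投影到 {x >= 0, sum(x) = 1}:O(n log n) 的排序算法
    u = np.sort(y)[::-1]                       # 降序排序
    css = np.cumsum(u)
    idx = np.arange(1, len(y) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)     # 平移量
    return np.maximum(y + theta, 0.0)

print(project_to_simplex(np.array([0.8, 1.4, -0.3])))  # -> [0.2 0.8 0. ]
```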
【3】 Speed Benchmarking of Genetic Programming Frameworks 标题:遗传编程框架的速度基准
作者:Francisco Baeta,João Correia,Tiago Martins,Penousal Machado 链接:https://arxiv.org/abs/2106.11919 摘要:众所周知,遗传规划(GP)的设计本身就带有计算开销高昂的负担。多年来,人们开发了许多技术来缓解这一问题,其中数据矢量化由于GP固有的并行性,可以说仍然是最有吸引力的策略。在这项工作中,我们采用了一系列基准测试,旨在比较多个现有框架中不同的矢量化与迭代式实现方法的性能和演化能力。其中,用Python编写的新型开源引擎TensorGP被证明能极大地受益于TensorFlow库,以加速GP中的域求值阶段。所给出的性能基准表明,TensorGP引擎成功领先:对于适应度样例数较多的问题,相对加速超过两个数量级。此外,由于能够计算更大的域,我们认为TensorGP的性能提升有助于发现更精确的候选解。 摘要:Genetic Programming (GP) is known to suffer from the burden of being computationally expensive by design. While, over the years, many techniques have been developed to mitigate this issue, data vectorization, in particular, is arguably still the most attractive strategy due to the parallel nature of GP. In this work, we employ a series of benchmarks meant to compare both the performance and evolution capabilities of different vectorized and iterative implementation approaches across several existing frameworks. Namely, TensorGP, a novel open-source engine written in Python, is shown to greatly benefit from the TensorFlow library to accelerate the domain evaluation phase in GP. The presented performance benchmarks demonstrate that the TensorGP engine manages to pull ahead, with relative speedups above two orders of magnitude for problems with a higher number of fitness cases. Additionally, as a consequence of being able to compute larger domains, we argue that TensorGP performance gains aid the discovery of more accurate candidate solutions.
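下面用NumPy给出矢量化求值收益的一个极简示意(并非TensorGP的实际实现;表达式与数据均为假设):

```python
import numpy as np
import time

expr = lambda x, y: np.sin(x) * y + x ** 2      # 假想的GP个体(表达式树)

n = 100_000
xs, ys = np.random.rand(n), np.random.rand(n)
target = np.cos(xs)

t0 = time.time()
# 迭代式:对每个适应度样例逐点求值
err_loop = sum((expr(xs[i], ys[i]) - target[i]) ** 2 for i in range(n))
t1 = time.time()
# 矢量化:整个域一次性求值
err_vec = float(np.sum((expr(xs, ys) - target) ** 2))
t2 = time.time()
print(f"loop: {t1 - t0:.3f}s  vectorized: {t2 - t1:.3f}s")
```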
【4】 MONCAE: Multi-Objective Neuroevolution of Convolutional Autoencoders 标题:MONCAE:卷积自动编码器的多目标神经进化
作者:Daniel Dimanov,Emili Balaguer-Ballester,Colin Singleton,Shahin Rostami 机构:Bournemouth University, Bournemouth, BH,BB, UK, CountingLab, Reading, UK, Data Science Lab, Polyra Limited, Bournemouth, UK 备注:Published as a Poster paper in ICLR 2021 Neural Architecture Search workshop 链接:https://arxiv.org/abs/2106.11914 摘要:在这篇论文中,我们提出了一种新的神经进化方法来识别卷积自动编码器的结构和超参数。值得注意的是,据我们目前所知,我们首次在自动编码器的神经结构搜索背景下使用了超体积指标(hypervolume indicator)。结果表明,该方法将图像压缩了10倍以上,同时保留了足够的信息,在大部分任务上实现了图像分类。因此,这种新方法可以用来加速图像压缩的AutoML流水线。 摘要:In this paper, we present a novel neuroevolutionary method to identify the architecture and hyperparameters of convolutional autoencoders. Remarkably, we used a hypervolume indicator in the context of neural architecture search for autoencoders, for the first time to our current knowledge. Results show that images were compressed by a factor of more than 10, while still retaining enough information to achieve image classification for the majority of the tasks. Thus, this new approach can be used to speed up the AutoML pipeline for image compression.
【5】 Making Invisible Visible: Data-Driven Seismic Inversion with Physics-Informed Data Augmentation 标题:使不可见变为可见:基于物理信息数据增强的数据驱动地震反演
作者:Yuxin Yang,Xitong Zhang,Qiang Guan,Youzuo Lin 备注:13 pages, 12 figures, submitted to IEEE Transactions on Geoscience and Remote Sensing 链接:https://arxiv.org/abs/2106.11892 摘要:深度学习和数据驱动方法在科学领域显示出巨大的潜力。数据驱动技术的前景依赖于大量高质量训练数据集的可用性。由于通过昂贵的物理实验、仪器和模拟获取数据的成本很高,近年来,科学应用中的数据增强技术已成为获取科学数据的一个新方向。然而,现有的数据增强技术源于计算机视觉,产生了物理上不可接受的数据样本,对我们感兴趣的领域问题没有帮助。本文提出了一种基于卷积神经网络的物理信息数据增强技术。具体来说,我们的生成模型利用不同的物理知识(如控制方程、可观察知觉和物理现象)来提高合成数据的质量。为了验证我们的数据增强技术的有效性,我们将其应用于利用模拟CO$_2$泄漏数据进行的地下地震全波形反演。我们的兴趣是反演与非常小的CO$_2$泄漏相关的地下速度模型。通过综合数值试验验证了本文方法的有效性。通过比较和分析,我们表明,利用我们的物理信息数据增强技术,数据驱动的地震成像可以得到显著的增强。特别地,在使用我们的技术获得的增强训练集时,在一般尺寸泄漏的测试场景中成像质量提高了15%,在小尺寸泄漏的测试场景中成像质量提高了17%。 摘要:Deep learning and data-driven approaches have shown great potential in scientific domains. The promise of data-driven techniques relies on the availability of a large volume of high-quality training datasets. Due to the high cost of obtaining data through expensive physical experiments, instruments, and simulations, data augmentation techniques for scientific applications have emerged as a new direction for obtaining scientific data recently. However, existing data augmentation techniques originating from computer vision, yield physically unacceptable data samples that are not helpful for the domain problems that we are interested in. In this paper, we develop new physics-informed data augmentation techniques based on convolutional neural networks. Specifically, our generative models leverage different physics knowledge (such as governing equations, observable perception, and physics phenomena) to improve the quality of the synthetic data. To validate the effectiveness of our data augmentation techniques, we apply them to solve a subsurface seismic full-waveform inversion using simulated CO$_2$ leakage data. Our interest is to invert for subsurface velocity models associated with very small CO$_2$ leakage. We validate the performance of our methods using comprehensive numerical tests. Via comparison and analysis, we show that data-driven seismic imaging can be significantly enhanced by using our physics-informed data augmentation techniques. Particularly, the imaging quality has been improved by 15% in test scenarios of general-sized leakage and 17% in small-sized leakage when using an augmented training set obtained with our techniques.
【6】 Dynamic Customer Embeddings for Financial Service Applications 标题:金融服务应用程序的动态客户嵌入
作者:Nima Chitsazan,Samuel Sharpe,Dwipam Katariya,Qianyu Cheng,Karthik Rajasethupathy 备注:ICML Workshop on Representation Learning for Finance and E-Commerce Applications 链接:https://arxiv.org/abs/2106.11880 摘要:随着金融服务(FS)公司经历剧烈的技术驱动变革,新数据流的可用性为更全面地了解客户提供了机会。我们提出了动态客户嵌入(DCE),这是一个利用客户的数字活动和广泛的金融环境来学习FS行业中客户的密集表示的框架。我们的方法考察移动或网络数字会话中的客户操作和页面浏览、会话本身的先后顺序,以及登录时整个组织中常见金融特征的快照。我们使用真实世界的数据在三个预测问题中测试我们的客户嵌入:1)客户在下一个数字会话中的意图,2)客户在会话后呼叫呼叫中心的概率,以及3)数字会话属于欺诈的概率。DCE在所有三个下游问题上都表现出了性能提升。 摘要:As financial services (FS) companies have experienced drastic technology driven changes, the availability of new data streams provides the opportunity for more comprehensive customer understanding. We propose Dynamic Customer Embeddings (DCE), a framework that leverages customers' digital activity and a wide range of financial context to learn dense representations of customers in the FS industry. Our method examines customer actions and pageviews within a mobile or web digital session, the sequencing of the sessions themselves, and snapshots of common financial features across our organization at the time of login. We test our customer embeddings using real world data in three prediction problems: 1) the intent of a customer in their next digital session, 2) the probability of a customer calling the call centers after a session, and 3) the probability of a digital session to be fraudulent. DCE showed performance lift in all three downstream problems.
【7】 Stochastic Polyak Stepsize with a Moving Target 标题:具有移动目标的随机Polyak步长
作者:Robert M. Gower,Aaron Defazio,Michael Rabbat 机构:LTCI, Télécom Paris, Institut Polytechnique de Paris, Facebook AI Research 备注:41 pages, 13 figures, 1 table 链接:https://arxiv.org/abs/2106.11851 摘要:我们提出了一种新的随机梯度方法,它使用记录的过去损失值来减少方差。我们的方法可以解释为Polyak步长的一个新的随机变体,在不假设插值的情况下全局收敛。我们的方法引入辅助变量,每个数据点一个,用于跟踪每个数据点的损失值。我们证明了该方法可以解释为在线SGD的一个特殊变体,从而为其提供了全局收敛理论。新方法每个数据点只存储一个标量,为内存成为瓶颈的方差缩减场景开辟了新的应用。 摘要:We propose a new stochastic gradient method that uses recorded past loss values to reduce the variance. Our method can be interpreted as a new stochastic variant of the Polyak Stepsize that converges globally without assuming interpolation. Our method introduces auxiliary variables, one for each data point, that track the loss value for each data point. We provide a global convergence theory for our method by showing that it can be interpreted as a special variant of online SGD. The new method only stores a single scalar per data point, opening up new applications for variance reduction where memory is the bottleneck.
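下面给出带逐样本目标值的随机Polyak步长的一个极简示意;其中每个数据点的目标t_i的更新规则是我们的简化假设,并非论文原式:

```python
import numpy as np

# 最小二乘玩具问题:带逐样本"移动目标"t_i的Polyak步长
rng = np.random.default_rng(0)
A, b = rng.normal(size=(100, 5)), rng.normal(size=100)
w, t = np.zeros(5), np.zeros(100)     # t:每个数据点只存一个标量

for it in range(2000):
    i = rng.integers(100)
    r = A[i] @ w - b[i]
    loss_i = 0.5 * r ** 2
    g = r * A[i]
    # 目标为t_i(而非0)的Polyak步长
    step = max(loss_i - t[i], 0.0) / (g @ g + 1e-12)
    w -= min(step, 1.0) * g                     # 截断以保证稳定(假设)
    t[i] = 0.9 * t[i] + 0.05 * loss_i           # 缓慢追踪观测损失的"移动目标"(简化假设)
```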
【8】 Speeding Up OPFython with Numba 标题:使用Numba加速OPFython
作者:Gustavo H. de Rosa,João Paulo Papa 机构:Department of Computing, São Paulo State University, Bauru, São Paulo - Brazil 备注:12 pages, 1 figure 链接:https://arxiv.org/abs/2106.11828 摘要:最优路径森林(OPF)是一种受图启发的分类器,已被证明是最先进的算法,在各种各样的任务中可与Logistic回归和支持向量机相媲美。最近,其基于Python的版本OPFython被提出,以提供更友好的框架和更快的原型开发环境。然而,基于Python的算法比对应的基于C的算法慢,在面对大量数据时会影响其性能。因此,本文提出了一种简单而高效的加速方案:使用Numba包加速基于Numpy的计算,以提高算法的整体性能。实验结果表明,所提出的方法取得了比朴素的基于Python的OPF更好的结果,并加快了其距离计算速度。 摘要:A graph-inspired classifier, known as Optimum-Path Forest (OPF), has proven to be a state-of-the-art algorithm comparable to Logistic Regressors, Support Vector Machines in a wide variety of tasks. Recently, its Python-based version, denoted as OPFython, has been proposed to provide a more friendly framework and a faster prototyping environment. Nevertheless, Python-based algorithms are slower than their counterpart C-based algorithms, impacting their performance when confronted with large amounts of data. Therefore, this paper proposed a simple yet highly efficient speed up using the Numba package, which accelerates Numpy-based calculations and attempts to increase the algorithm's overall performance. Experimental results showed that the proposed approach achieved better results than the naïve Python-based OPF and sped up its distance measurement calculation.
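下面给出用Numba的@njit加速距离计算的一个极简示意(并非OPFython的实际代码,仅演示其加速机制):

```python
import numpy as np
from numba import njit

@njit(cache=True)
def pairwise_sq_dist(X, Y):
    # OPF中大量出现的成对距离计算:JIT编译后以机器码速度执行
    D = np.empty((X.shape[0], Y.shape[0]))
    for i in range(X.shape[0]):
        for j in range(Y.shape[0]):
            s = 0.0
            for k in range(X.shape[1]):
                d = X[i, k] - Y[j, k]
                s += d * d
            D[i, j] = s
    return D

X = np.random.rand(500, 16)
Y = np.random.rand(500, 16)
D = pairwise_sq_dist(X, Y)   # 首次调用触发编译,后续调用无此开销
```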
【9】 Evo* 2021 -- Late-Breaking Abstracts Volume 标题:Evo* 2021最新突破摘要(Late-Breaking Abstracts)卷
作者:A. M. Mora,A. I. Esparcia-Alcázar 备注:LBAs accepted in Evo* 2021. Part of the Conference Proceedings 链接:https://arxiv.org/abs/2106.11804 摘要:本卷收录了提交给Evo* 2021会议的Late-Breaking Abstracts,该会议于2021年4月7日至9日在线举行。这些论文介绍了正在进行的研究和初步结果,探讨将生物启发方法(主要是进化计算)的不同途径应用于各类问题,其中大多数是真实世界的问题。 摘要:Volume with the Late-Breaking Abstracts submitted to the Evo* 2021 Conference, held online from 7 to 9 of April 2021. These papers present ongoing research and preliminary results investigating on the application of different approaches of Bioinspired Methods (mainly Evolutionary Computation) to different problems, most of them real world ones.
【10】 SISA: Securing Images by Selective Alteration 标题:SISA:通过选择性更改保护图像
作者:Prutha Gaherwar,Shraddha Joshi,Raviraj Joshi,Rahul Khengare 机构: Department of Computer Technology, Pune Institute of Computer Technology, Pune, Maharashtra, India, Department of Computer Science and Engineering, Indian Institute of Technology, Madras, Chennai, Tamil Nadu, India 备注:Accepted at ICTCS 2020 链接:https://arxiv.org/abs/2106.11770 摘要:随着移动和摄像设备的普及,以图像形式出现的数字内容急剧增加。随着个人生活不断被记录在照片中,其落入窃听者之手的风险令人严重关切。辅助存储是存储个人和其他图像的首选介质,我们的工作关注这些图像的安全性。虽然加密是确保图像安全的最佳方法,但完全加密和解密是一个计算密集型的过程。此外,随着照相机的日臻完善,图像质量以及像素密度也大大提高,这使得加密和解密的代价更高。因此,我们深入研究了基于感兴趣区域的选择性加密和选择性模糊:不是加密或模糊整张照片,而是只对图像的选定区域进行编码。我们给出了照片部分加密与完全加密的对比分析。这种编码将帮助我们在不损害安全性的情况下降低加密开销。由于解密时间的减少,利用该技术的应用程序将变得更可用。此外,模糊图像比加密图像更可读,使我们能够定义安全级别。我们利用Mask-RCNN(基于区域的卷积神经网络)和YOLO(You Only Look Once)等机器学习算法来选择感兴趣区域。这些算法为目标识别树立了新的基准。我们开发了一个端到端系统来演示我们的选择性加密思想。 摘要:With an increase in mobile and camera devices' popularity, digital content in the form of images has increased drastically. As personal life is being continuously documented in pictures, the risk of losing it to eavesdroppers is a matter of grave concern. Secondary storage is the most preferred medium for the storage of personal and other images. Our work is concerned with the security of such images. While encryption is the best way to ensure image security, full encryption and decryption is a computationally-intensive process. Moreover, as cameras are getting better every day, image quality, and thus, the pixel density has increased considerably. The increased pixel density makes encryption and decryption more expensive. We, therefore, delve into selective encryption and selective blurring based on the region of interest. Instead of encrypting or blurring the entire photograph, we only encode selected regions of the image. We present a comparative analysis of the partial and full encryption of the photos. This kind of encoding will help us lower the encryption overhead without compromising security. The applications utilizing this technique will become more usable due to the reduction in the decryption time. Additionally, blurred images being more readable than encrypted ones, allowed us to define the level of security. We leverage the machine learning algorithms like Mask-RCNN (Region-based convolutional neural network) and YOLO (You Only Look Once) to select the region of interest. These algorithms have set new benchmarks for object recognition. We develop an end to end system to demonstrate our idea of selective encryption.
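下面给出"只模糊感兴趣区域"这一思路的极简OpenCV示意;检测框此处用固定矩形代替Mask-RCNN/YOLO的输出,文件名photo.jpg为假设:

```python
import cv2

# 读入图像,并对(假设的)检测框区域做高斯模糊,其余像素保持不变
img = cv2.imread("photo.jpg")          # 假设该文件存在
x, y, w, h = 100, 80, 160, 200         # 假设的检测框,实际应来自检测器
roi = img[y:y+h, x:x+w]
img[y:y+h, x:x+w] = cv2.GaussianBlur(roi, (51, 51), 0)  # 核越大越不可辨认
cv2.imwrite("photo_blurred.png", img)
```

与整图加密相比,这样只处理一小块像素,计算量随感兴趣区域面积而非整幅图像大小增长。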
【11】 Privacy Amplification via Iteration for Shuffled and Online PNSGD 标题:混洗PNSGD和在线PNSGD的迭代隐私放大
作者:Matteo Sordello,Zhiqi Bu,Jinshuo Dong 机构:Department of Statistics, University of Pennsylvania, Graduate Group in AMCS, University of Pennsylvania, IDEAL Institute, Northwestern University 链接:https://arxiv.org/abs/2106.11767 摘要:本文考虑迭代隐私放大框架,该框架最初由Feldman等人提出,随后由Asoodeh等人通过收缩系数进行了简化分析。本文主要研究投影噪声随机梯度下降(PNSGD)算法在隐藏中间更新情况下的隐私保证问题。现有文献的一个局限是只研究了提前停止的PNSGD,而对于应用更广泛的、作用于混洗数据集的PNSGD还没有结果。此外,当以在线方式接收新数据时,关于如何降低注入噪声的方案也尚未被提出。在这项工作中,我们首先证明了混洗PNSGD的隐私保证:当每个样本量$n$下的噪声固定、但随$n$增大按预定速率减小时,对其进行渐近分析,以达到隐私损失的收敛。然后,我们分析了在线设置,并为注入噪声的大小提供了一个更快的衰减方案,同样保证了隐私损失的收敛性。 摘要:In this paper, we consider the framework of privacy amplification via iteration, which is originally proposed by Feldman et al. and subsequently simplified by Asoodeh et al. in their analysis via the contraction coefficient. This line of work focuses on the study of the privacy guarantees obtained by the projected noisy stochastic gradient descent (PNSGD) algorithm with hidden intermediate updates. A limitation in the existing literature is that only the early stopped PNSGD has been studied, while no result has been proved on the more widely-used PNSGD applied on a shuffled dataset. Moreover, no scheme has been yet proposed regarding how to decrease the injected noise when new data are received in an online fashion. In this work, we first prove a privacy guarantee for shuffled PNSGD, which is investigated asymptotically when the noise is fixed for each sample size $n$ but reduced at a predetermined rate when $n$ increases, in order to achieve the convergence of privacy loss. We then analyze the online setting and provide a faster decaying scheme for the magnitude of the injected noise that also guarantees the convergence of privacy loss.
【12】 LV-BERT: Exploiting Layer Variety for BERT 标题:LV-BERT:利用BERT的层多样性
作者:Weihao Yu,Zihang Jiang,Fei Chen,Qibin Hou,Jiashi Feng 机构:National University of Singapore, Huawei Noah's Ark Lab 备注:Accepted to Findings of ACL 2021. The code and pre-trained models are available at this https URL 链接:https://arxiv.org/abs/2106.11740 摘要:现代预训练语言模型大多建立在以交错顺序堆叠自注意力层和前馈层的主干之上。在本文中,除了这种刻板的层模式之外,我们还从层类型集和层顺序两个方面利用层多样性来改进预训练模型。具体来说,除了原有的自注意力层和前馈层外,我们在层类型集中引入了卷积,实验发现这对预训练模型是有益的。此外,除了原有的交错顺序,我们探索了更多的层顺序,以发现更强大的架构。然而,引入的层多样性导致了超过数十亿个候选模型的巨大架构空间,而从头开始训练单个候选模型就已经需要巨大的计算成本,使得通过直接训练大量候选模型来搜索这样的空间不可行。为了解决这个问题,我们首先预训练一个超网,所有候选模型的权重都可以从中继承,然后采用以预训练精度为指导的进化算法来寻找最优结构。大量实验表明,用我们的方法得到的LV-BERT模型在各种下游任务上都优于BERT及其变体。例如,LV-BERT-small在GLUE测试集上达到78.8,比强基线ELECTRA-small高出1.8。 摘要:Modern pre-trained language models are mostly built upon backbones stacking self-attention and feed-forward layers in an interleaved order. In this paper, beyond this stereotyped layer pattern, we aim to improve pre-trained models by exploiting layer variety from two aspects: the layer type set and the layer order. Specifically, besides the original self-attention and feed-forward layers, we introduce convolution into the layer type set, which is experimentally found beneficial to pre-trained models. Furthermore, beyond the original interleaved order, we explore more layer orders to discover more powerful architectures. However, the introduced layer variety leads to a large architecture space of more than billions of candidates, while training a single candidate model from scratch already requires huge computation cost, making it not affordable to search such a space by directly training large amounts of candidate models. To solve this problem, we first pre-train a supernet from which the weights of all candidate models can be inherited, and then adopt an evolutionary algorithm guided by pre-training accuracy to find the optimal architecture. Extensive experiments show that LV-BERT model obtained by our method outperforms BERT and its variants on various downstream tasks. For example, LV-BERT-small achieves 78.8 on the GLUE testing set, 1.8 higher than the strong baseline ELECTRA-small.
【13】 A Unified Framework for Conservative Exploration 标题:保守探索的统一框架
作者:Yunchang Yang,Tianhao Wu,Han Zhong,Evrard Garcelon,Matteo Pirotta,Alessandro Lazaric,Liwei Wang,Simon S. Du 机构:Center for Data Science, Peking University, University of Science and Technology of China, Facebook AI Research, Key Laboratory of Machine Perception, MOE, School of EECS, Peking University, University of Washington 链接:https://arxiv.org/abs/2106.11692 摘要:我们研究受保守约束的赌博机(bandit)和强化学习(RL)问题,其中智能体被要求至少达到给定基线策略的表现。这种设置与现实世界领域尤为相关,包括数字营销、医疗保健、生产、金融等。对于多臂赌博机、线性赌博机和表格型RL,以前的工作提出了专门的算法和理论分析。在本文中,我们提出了一个保守赌博机和RL的统一框架,其核心技术是计算运行基线策略所获得的必要且充分的预算。对于下界,我们的框架给出了一个黑盒归约,它将非保守设置中的某个下界转化为保守设置中的一个新下界。我们加强了现有的保守多臂赌博机的下界,并得到了保守线性赌博机、表格型RL和低秩MDP的新下界。对于上界,我们的框架通过简单的分析将某个非保守的置信上界(UCB)算法转化为保守算法。对于多臂赌博机、线性赌博机和表格型RL,我们的新上界以明显更简单的分析收紧或匹配了现有结果。我们还得到了保守低秩MDP的一个新上界。 摘要:We study bandits and reinforcement learning (RL) subject to a conservative constraint where the agent is asked to perform at least as well as a given baseline policy. This setting is particularly relevant in real-world domains including digital marketing, healthcare, production, finance, etc. For multi-armed bandits, linear bandits and tabular RL, specialized algorithms and theoretical analyses were proposed in previous work. In this paper, we present a unified framework for conservative bandits and RL, in which our core technique is to calculate the necessary and sufficient budget obtained from running the baseline policy. For lower bounds, our framework gives a black-box reduction that turns a certain lower bound in the nonconservative setting into a new lower bound in the conservative setting. We strengthen the existing lower bound for conservative multi-armed bandits and obtain new lower bounds for conservative linear bandits, tabular RL and low-rank MDP. For upper bounds, our framework turns a certain nonconservative upper-confidence-bound (UCB) algorithm into a conservative algorithm with a simple analysis. For multi-armed bandits, linear bandits and tabular RL, our new upper bounds tighten or match existing ones with significantly simpler analyses. We also obtain a new upper bound for conservative low-rank MDP.
【14】 Repulsive Deep Ensembles are Bayesian 标题:排斥深度集成是贝叶斯的
作者:Francesco D'Angelo,Vincent Fortuin 机构:ETH Zürich, Zürich, Switzerland 链接:https://arxiv.org/abs/2106.11642 摘要:深度集成由于其概念上的简单性和高效性,最近在深度学习社区中广受欢迎。然而,在用梯度下降独立训练的集成成员之间保持功能多样性是一个挑战。当添加更多集成成员时,这可能导致病态,例如集成性能饱和并收敛到单个模型的性能。此外,这不仅影响其预测的质量,更影响集成的不确定性估计,从而影响其在分布外数据上的性能。我们假设,通过阻止不同的集成成员坍缩为同一个函数,可以克服这一限制。为此,我们在深度集成的更新规则中引入了一个核化排斥项。我们证明,这种简单的修改不仅强制并保持了成员之间的多样性,而且更重要的是,将最大后验推理转化为真正的贝叶斯推理。也就是说,我们证明了所提出的排斥集成的训练动力学遵循与真实后验之间KL散度的Wasserstein梯度流。我们研究了权重空间和函数空间中的排斥项,并在合成和真实世界预测任务上将其性能与标准集成和贝叶斯基线进行了实证比较。 摘要:Deep ensembles have recently gained popularity in the deep learning community for their conceptual simplicity and efficiency. However, maintaining functional diversity between ensemble members that are independently trained with gradient descent is challenging. This can lead to pathologies when adding more ensemble members, such as a saturation of the ensemble performance, which converges to the performance of a single model. Moreover, this does not only affect the quality of its predictions, but even more so the uncertainty estimates of the ensemble, and thus its performance on out-of-distribution data. We hypothesize that this limitation can be overcome by discouraging different ensemble members from collapsing to the same function. To this end, we introduce a kernelized repulsive term in the update rule of the deep ensembles. We show that this simple modification not only enforces and maintains diversity among the members but, even more importantly, transforms the maximum a posteriori inference into proper Bayesian inference. Namely, we show that the training dynamics of our proposed repulsive ensembles follow a Wasserstein gradient flow of the KL divergence with the true posterior. We study repulsive terms in weight and function space and empirically compare their performance to standard ensembles and Bayesian baselines on synthetic and real-world prediction tasks.
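下面给出核化排斥更新的一个SVGD风格示意(与论文思想相近的常见形式,并非其原始公式;步长、带宽等均为假设):

```python
import torch

def rbf_kernel(W, h):
    # W: (M, d) M个集成成员的参数向量
    return torch.exp(-torch.cdist(W, W) ** 2 / (2 * h ** 2))

def repulsive_step(W, grad_logp, lr=1e-2, h=1.0):
    # grad_logp: (M, d) 各成员处的 ∇ log p(w)
    K = rbf_kernel(W, h)                                    # (M, M)
    drive = K @ grad_logp                                   # 核加权的驱动项
    diff = W.unsqueeze(1) - W.unsqueeze(0)                  # diff[i, j] = w_i - w_j
    repulse = (K.unsqueeze(-1) * diff).sum(dim=1) / h ** 2  # Σ_j ∇_{w_j} k(w_j, w_i)
    return W + lr * (drive + repulse) / W.shape[0]

# 示例:标准高斯目标,∇log p(w) = -w;排斥项让成员铺开而非坍缩到众数
W = torch.randn(10, 2)
for _ in range(500):
    W = repulsive_step(W, -W)
```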
【15】 Information Retrieval for ZeroSpeech 2021: The Submission by University of Wroclaw 标题:ZeroSpeech 2021的信息检索:弗罗茨瓦夫大学的参赛方案
作者:Jan Chorowski,Grzegorz Ciesielski,Jarosław Dzikowski,Adrian Łańcucki,Ricard Marxer,Mateusz Opala,Piotr Pusz,Paweł Rychlikowski,Michał Stypułkowski 备注:Published in Interspeech 2021 链接:https://arxiv.org/abs/2106.11603 摘要:我们提出了一些低资源的方法来完成2021年零资源语音挑战赛的任务。我们以组织者提出的作为基线的语音无监督表示为基础,这些表示由CPC导出并用k-means算法聚类。我们证明,对这些表示进行改进的简单方法可以缩小差距,甚至超过使用高计算预算的解决方案。这些结果表明,CPC导出的表示对于训练语言模型来说仍然太过嘈杂,但对于较简单的模式匹配和检索形式来说足够稳定。 摘要:We present a number of low-resource approaches to the tasks of the Zero Resource Speech Challenge 2021. We build on the unsupervised representations of speech proposed by the organizers as a baseline, derived from CPC and clustered with the k-means algorithm. We demonstrate that simple methods of refining those representations can narrow the gap, or even improve upon the solutions which use a high computational budget. The results lead to the conclusion that the CPC-derived representations are still too noisy for training language models, but stable enough for simpler forms of pattern matching and retrieval.
【16】 Finding Valid Adjustments under Non-ignorability with Minimal DAG Knowledge 标题:在不可忽略性下用最少的DAG知识寻找有效调整集
作者:Abhin Shah,Karthikeyan Shanmugam,Kartik Ahuja 机构:MIT, IBM Research, MILA 链接:https://arxiv.org/abs/2106.11560 摘要:从观察数据中估计治疗效果是因果推理中的一个基本问题。有两种截然不同的思想流派解决了这个问题。一方面,Pearl学派的框架通常以有向无环图(DAG)的形式假设结构知识(由专家提供),并提供诸如后门准则之类的图形准则来识别有效的调整集。另一方面,潜在结果(PO)框架通常假设所有观察到的特征满足可忽略性(即没有隐藏的混杂),而这通常是不可检验的。在这项工作中,我们为连接这两个框架迈出了一步。我们表明,即使只知道治疗变量的一个父变量(由专家提供),值得注意的是,这也足以检验一大类(但并非全部)后门准则。重要的是,我们还涵盖了一个非平凡的情形:在不要求观察到治疗变量所有父变量的情况下,观察到的整个特征集不满足可忽略性(从而推广了PO框架)。我们的关键技术思想涉及一个更一般的结果:给定一个合成的子采样(或环境)变量,它是父变量的函数,我们证明了一个涉及该子采样变量的不变性检验等价于检验一大类后门准则。我们在合成数据以及真实的因果效应估计基准上展示了我们的方法。 摘要:Treatment effect estimation from observational data is a fundamental problem in causal inference. There are two very different schools of thought that have tackled this problem. On the one hand, the Pearlian framework commonly assumes structural knowledge (provided by an expert) in the form of Directed Acyclic Graphs (DAGs) and provides graphical criteria such as the back-door criterion to identify the valid adjustment sets. On the other hand, the potential outcomes (PO) framework commonly assumes that all the observed features satisfy ignorability (i.e., no hidden confounding), which in general is untestable. In this work, we take steps to bridge these two frameworks. We show that even if we know only one parent of the treatment variable (provided by an expert), then quite remarkably it suffices to test a broad class of (but not all) back-door criteria. Importantly, we also cover the non-trivial case where the entire set of observed features is not ignorable (generalizing the PO framework) without requiring all the parents of the treatment variable to be observed. Our key technical idea involves a more general result -- Given a synthetic sub-sampling (or environment) variable that is a function of the parent variable, we show that an invariance test involving this sub-sampling variable is equivalent to testing a broad class of back-door criteria. We demonstrate our approach on synthetic data as well as real causal effect estimation benchmarks.
【17】 Differentiable Architecture Search Without Training Nor Labels: A Pruning Perspective 标题:无需训练与标签的可微结构搜索:一种剪枝视角
作者:Miao Zhang,Steven Su,Shirui Pan,Xiaojun Chang,Wei Huang,Gholamreza Haffari 机构:Monash University, University of Technology Sydney 链接:https://arxiv.org/abs/2106.11542 摘要:利用权重共享和连续松弛,使梯度下降能够通过双层优化范式交替优化超网权重和架构参数,可微结构搜索(Differentiable ARchiTecture Search,DARTS)以其简单高效的特点成为神经结构搜索(Neural ARchiTecture Search,NAS)的主流方法。然而,最近的研究发现,在DARTS中,搜索到的架构的性能几乎没有随着优化的推进而提高。此外,一些同期工作表明,NAS不用标签也能找到更具竞争力的架构。上述观察结果表明,DARTS中的监督信号可能是架构优化的一个不良指标,由此引出一个基本问题:与其使用监督信号来执行双层优化,我们能否在不需要任何训练或标签的情况下找到高质量的架构?我们通过将NAS定制为初始化时的网络剪枝问题,给出了肯定的答案。借助最新的初始化网络剪枝技术,我们设计了一个FreeFlow代理,无需任何训练或标签即可对NAS中候选操作的重要性打分,并相应地提出了一个称为“无训练无标签神经结构搜索”(FreeNAS)的新框架。我们表明,在没有任何训练和标签的情况下,采用所提FreeFlow代理的FreeNAS可以优于大多数NAS基线。更重要的是,我们的框架极为高效,在单个GPU上分别只需3.6秒和79秒即可完成NAS-Bench-201和DARTS搜索空间的架构搜索。我们希望我们的工作能从初始化剪枝的角度激发更多解决NAS的尝试。 摘要:By leveraging weight-sharing and continuous relaxation to enable gradient descent to alternately optimize the supernet weights and the architecture parameters through a bi-level optimization paradigm, Differentiable ARchiTecture Search (DARTS) has become the mainstream method in Neural Architecture Search (NAS) due to its simplicity and efficiency. However, more recent works found that the performance of the searched architecture barely increases with the optimization proceeding in DARTS. In addition, several concurrent works show that the NAS could find more competitive architectures without labels. The above observations reveal that the supervision signal in DARTS may be a poor indicator for architecture optimization, inspiring a foundational question: instead of using the supervision signal to perform bi-level optimization, can we find high-quality architectures without any training nor labels? We provide an affirmative answer by customizing the NAS as a network pruning at initialization problem. By leveraging recent techniques on the network pruning at initialization, we designed a FreeFlow proxy to score the importance of candidate operations in NAS without any training nor labels, and proposed a novel framework called training and label free neural architecture search (FreeNAS) accordingly. We show that, without any training nor labels, FreeNAS with the proposed FreeFlow proxy can outperform most NAS baselines. More importantly, our framework is extremely efficient, which completes the architecture search within only 3.6s and 79s on a single GPU for the NAS-Bench-201 and DARTS search space, respectively. We hope our work inspires more attempts in solving NAS from the perspective of pruning at initialization.
【18】 An Accurate Non-accelerometer-based PPG Motion Artifact Removal Technique using CycleGAN 标题:一种基于CycleGAN的无加速度计PPG运动伪影精确去除技术
作者:Amir Hosein Afandizadeh Zargari,Seyed Amir Hossein Aqajari,Hadi Khodabandeh,Amir M. Rahmani,Fadi Kurdahi 机构: University of California 备注:Submitted to ACM Health 链接:https://arxiv.org/abs/2106.11512 摘要:光电容积描记术(PPG)是一种简单且廉价的光学技术,广泛应用于医疗领域,用于提取有价值的健康相关信息,例如心率变异性、血压和呼吸频率。使用便携式可穿戴设备可以轻松地连续和远程采集PPG信号。然而,这些测量设备容易受到日常生活活动引起的运动伪影的影响。消除运动伪影最常见的方法是使用额外的加速度计传感器,这有两个局限:i)高功耗;ii)需要在可穿戴设备中集成加速度计传感器(某些可穿戴设备并不具备)。本文提出了一种低功耗、不基于加速度计的PPG运动伪影去除方法,其精度优于现有方法。我们利用循环生成对抗网络(CycleGAN)从含噪PPG信号中重构干净的PPG信号。与最先进的方法相比,我们新颖的基于机器学习的技术在不使用加速度计等额外传感器的情况下,将运动伪影去除效果提高了9.5倍。 摘要:A photoplethysmography (PPG) is an uncomplicated and inexpensive optical technique widely used in the healthcare domain to extract valuable health-related information, e.g., heart rate variability, blood pressure, and respiration rate. PPG signals can easily be collected continuously and remotely using portable wearable devices. However, these measuring devices are vulnerable to motion artifacts caused by daily life activities. The most common ways to eliminate motion artifacts use extra accelerometer sensors, which suffer from two limitations: i) high power consumption and ii) the need to integrate an accelerometer sensor in a wearable device (which is not required in certain wearables). This paper proposes a low-power non-accelerometer-based PPG motion artifacts removal method outperforming the accuracy of the existing methods. We use Cycle Generative Adversarial Network to reconstruct clean PPG signals from noisy PPG signals. Our novel machine-learning-based technique achieves 9.5 times improvement in motion artifact removal compared to the state-of-the-art without using extra sensors such as an accelerometer.
【19】 How well do you know your summarization datasets? 标题:您对摘要数据集的了解程度如何?
作者:Priyam Tejaswin,Dhruv Naik,Pengfei Liu 机构:Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 备注:Accepted into Findings of ACL-IJCNLP 2021 链接:https://arxiv.org/abs/2106.11388 摘要:最先进的摘要系统是在从网上抓取的海量数据集上训练和评估的。尽管它们被广泛使用,但我们对这些数据集的基本特征(数据噪声、摘要复杂性等)知之甚少,也不清楚这些特征如何影响系统性能以及ROUGE等自动度量的可靠性。在这项研究中,我们从三个流行的摘要数据集中手工分析了600个样本。我们的研究由一个六类类型学驱动,它刻画不同的噪声类型(缺失的事实、实体)和摘要难度(抽取式、生成式)。随后我们对27个最先进的摘要模型和5个流行的度量进行了深入分析,并报告了我们的主要见解:(1) 数据集具有不同的数据质量和复杂性分布,这可以追溯到它们的收集过程。(2) 模型的性能和度量的可靠性依赖于样本复杂性。(3) 由于参考摘要的多样性较差,忠实的摘要往往得分较低。我们发布了代码、带注释的数据和模型输出。 摘要:State-of-the-art summarization systems are trained and evaluated on massive datasets scraped from the web. Despite their prevalence, we know very little about the underlying characteristics (data noise, summarization complexity, etc.) of these datasets, and how these affect system performance and the reliability of automatic metrics like ROUGE. In this study, we manually analyze 600 samples from three popular summarization datasets. Our study is driven by a six-class typology which captures different noise types (missing facts, entities) and degrees of summarization difficulty (extractive, abstractive). We follow with a thorough analysis of 27 state-of-the-art summarization models and 5 popular metrics, and report our key insights: (1) Datasets have distinct data quality and complexity distributions, which can be traced back to their collection process. (2) The performance of models and reliability of metrics is dependent on sample complexity. (3) Faithful summaries often receive low scores because of the poor diversity of references. We release the code, annotated data and model outputs.
【20】 Photozilla: A Large-Scale Photography Dataset and Visual Embedding for 20 Photography Styles 标题:Photozilla:一个大规模摄影数据集和20种摄影样式的视觉嵌入
作者:Trisha Singhal,Junhua Liu,Lucienne T. M. Blessing,Kwan Hui Lim 机构:Singapore University of Technology and Design, Singapore, Forth AI, Singapore 备注:In the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021. (Poster) 链接:https://arxiv.org/abs/2106.11359 摘要:社交媒体平台的出现促进了数码摄影的发展,从而催生了视觉应用的繁荣。基于这一动机,我们引入了一个称为"Photozilla"的大规模数据集,其中包括超过990k的图像,属于10种不同的摄影风格。然后利用该数据集训练3个分类模型,将图像自动分类为相应的风格,分类准确率约为96%。随着数码摄影的迅速发展,我们看到新的摄影风格以指数级的速度出现。基于此,我们提出了一种新颖的基于Siamese的网络,它以训练好的分类模型为基础架构,只需25个训练样本就能适应并分类未见过的风格。在识别另外10种不同类型的摄影风格时,我们报告了超过68%的准确率。此数据集位于https://trisha025.github.io/Photozilla/ 摘要:The advent of social media platforms has been a catalyst for the development of digital photography that engendered a boom in vision applications. With this motivation, we introduce a large-scale dataset termed 'Photozilla', which includes over 990k images belonging to 10 different photographic styles. The dataset is then used to train 3 classification models to automatically classify the images into the relevant style which resulted in an accuracy of ~96%. With the rapid evolution of digital photography, we have seen new types of photography styles emerging at an exponential rate. On that account, we present a novel Siamese-based network that uses the trained classification models as the base architecture to adapt and classify unseen styles with only 25 training samples. We report an accuracy of over 68% for identifying 10 other distinct types of photography styles. This dataset can be found at https://trisha025.github.io/Photozilla/
【21】 Cogment: Open Source Framework For Distributed Multi-actor Training, Deployment & Operations 标题:Cogment:面向分布式多参与者训练、部署和操作的开源框架
作者:AI Redefined,Sai Krishna Gottipati,Sagar Kurandwad,Clodéric Mars,Gregory Szriftgiser,François Chabot 备注:16 pages, 7 figures 链接:https://arxiv.org/abs/2106.11345 摘要:得益于强化学习和人在回路学习方面的若干进展,让人类直接参与以助益人工智能代理的训练正日益受到关注。人类可以为代理提供奖励、演示任务、设计课程或在环境中行动,但这些好处也伴随着架构、功能设计和工程上的复杂性。我们提出了Cogment,一个统一的开源框架,它引入了actor形式化方法来支持多种人-代理协作类型和训练方法。得益于分布式微服务体系结构,它还具有开箱即用的可扩展性,并为上述复杂性提供了解决方案。 摘要:Involving humans directly for the benefit of AI agents' training is getting traction thanks to several advances in reinforcement learning and human-in-the-loop learning. Humans can provide rewards to the agent, demonstrate tasks, design a curriculum, or act in the environment, but these benefits also come with architectural, functional design and engineering complexities. We present Cogment, a unifying open-source framework that introduces an actor formalism to support a variety of humans-agents collaboration typologies and training approaches. It is also scalable out of the box thanks to a distributed micro-service architecture, and offers solutions to the aforementioned complexities.
【22】 Surrogate-based variational data assimilation for tidal modelling 标题:用于潮汐模拟的基于代理的变分数据同化
作者:Rem-Sophia Mouradi,Cédric Goeury,Olivier Thual,Fabrice Zaoui,Pablo Tassi 机构:EDF R&D, National Laboratory for Hydraulics and Environment (LNHE), Quai Watier, Climate, Environment, Coupling and Uncertainties research unit (CECI) at the European, Center for Research and Advanced Training in Scientific Computation (CERFACS), French 链接:https://arxiv.org/abs/2106.11926 摘要:资料同化(DA)被广泛应用于结合物理知识和观测,目前在地球科学中常用于进行参数标定。在气候变化的背景下,旧的标定不一定能用于新的情景。这就提出了DA计算成本的问题,因为昂贵的基于物理的数值模型需要重新分析。因此,降维和元建模代表了有吸引力的方向,例如最近一些工作提出的集成方法与变分方法的混合,以结合它们的优势(效率、非线性框架)。然而,它们通常基于蒙特卡罗(MC)类型的采样,而为了提高效率往往需要大幅增加集成规模,因此在基于集成的方法中同样构成计算负担。为了解决这些问题,我们提出并对比了两种用替代模型代替复杂模型的方法:(i)PODEn3DVAR,直接受PODEn4DVAR启发,依赖于基于集成的参数-状态联合本征正交分解(POD),提供一个线性元模型;(ii)POD-PCE-3DVAR,其中模型状态先经POD降维,再用多项式混沌展开(PCE)学习,得到一个非线性元模型。这两种元模型都允许写出一个近似的代价函数,其最小值可以解析计算,或以可忽略的代价通过梯度下降得到。此外,我们为POD-PCE-3DVAR给出了适配的元模型误差协方差矩阵,从而大幅改进基于元模型的DA分析。我们在一个孪生实验中检验了所提方法,并在一个基于实测数据的问题上与经典3DVAR进行了比较。结果很有希望,POD-PCE-3DVAR表现尤为突出,对经典3DVAR显示出良好的收敛性,并对噪声具有鲁棒性。 摘要:Data assimilation (DA) is widely used to combine physical knowledge and observations. It is nowadays commonly used in geosciences to perform parametric calibration. In a context of climate change, old calibrations can not necessarily be used for new scenarios. This raises the question of DA computational cost, as costly physics-based numerical models need to be reanalyzed. Reduction and metamodelling represent therefore interesting perspectives, for example proposed in recent contributions as hybridization between ensemble and variational methods, to combine their advantages (efficiency, non-linear framework). They are however often based on Monte Carlo (MC) type sampling, which often requires considerable increase of the ensemble size for better efficiency, therefore representing a computational burden in ensemble-based methods as well. To address these issues, two methods to replace the complex model by a surrogate are proposed and confronted : (i) PODEn3DVAR directly inspired from PODEn4DVAR, relies on an ensemble-based joint parameter-state Proper Orthogonal Decomposition (POD), which provides a linear metamodel ; (ii) POD-PCE-3DVAR, where the model states are POD reduced then learned using Polynomial Chaos Expansion (PCE), resulting in a non-linear metamodel. Both metamodels allow to write an approximate cost function whose minimum can be analytically computed, or deduced by a gradient descent at negligible cost. Furthermore, adapted metamodelling error covariance matrix is given for POD-PCE-3DVAR, allowing to substantially improve the metamodel-based DA analysis. Proposed methods are confronted on a twin experiment, and compared to classical 3DVAR on a measurement-based problem. Results are promising, in particular superior with POD-PCE-3DVAR, showing good convergence to classical 3DVAR and robustness to noise.
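下面给出POD降维步骤的极简NumPy示意(快照数据为随机占位;POD-PCE-3DVAR随后用PCE学习降维坐标对输入参数的依赖,此处未展示):

```python
import numpy as np

# 快照矩阵:空间自由度 × 快照数(此处为随机占位数据)
S = np.random.rand(500, 40)
mean = S.mean(axis=1, keepdims=True)
U, sig, _ = np.linalg.svd(S - mean, full_matrices=False)

energy = np.cumsum(sig ** 2) / np.sum(sig ** 2)
r = int(np.searchsorted(energy, 0.99)) + 1   # 保留99%能量所需的模态数
Phi = U[:, :r]                               # POD基(空间模态)
coeffs = Phi.T @ (S - mean)                  # 降维坐标,可用多项式回归(PCE)学习
S_approx = mean + Phi @ coeffs               # 由r个模态重构的近似快照
```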
【23】 Algorithmic Recourse in Partially and Fully Confounded Settings Through Bounding Counterfactual Effects 标题:通过限制反事实效应在部分和完全混淆环境下的算法追索权
作者:Julius von Kügelgen,Nikita Agarwal,Jakob Zeitler,Afsaneh Mastouri,Bernhard Schölkopf 机构:Max Planck Institute for Intelligent Systems Tübingen, Germany, Department of Engineering, University of Cambridge, United Kingdom, Graduate Training Centre of Neuroscience, International Max Planck Research School 备注:Preliminary workshop version; work in progress 链接:https://arxiv.org/abs/2106.11849 摘要:算法追索权旨在为个人提供可操作的建议,以便从自动化决策系统中获得更有利的结果。由于它涉及对在物理世界中实施的干预进行推理,追索权从根本上说是一个因果问题。现有的方法使用从数据中学习到的因果模型,在不存在隐藏混杂的假设以及加性噪声等建模假设下,计算追索行动的效果。在Balke和Pearl(1994)的开创性工作的基础上,我们针对离散随机变量提出了一种替代方法,该方法放宽了这些假设,并允许未观测到的混杂和任意结构方程。所提出的方法只需要指定因果图和混杂结构,并对追索行动的预期反事实效应给出界。如果下界高于某个阈值,即位于决策边界的另一侧,则可在期望意义上保证追索权。 摘要:Algorithmic recourse aims to provide actionable recommendations to individuals to obtain a more favourable outcome from an automated decision-making system. As it involves reasoning about interventions performed in the physical world, recourse is fundamentally a causal problem. Existing methods compute the effect of recourse actions using a causal model learnt from data under the assumption of no hidden confounding and modelling assumptions such as additive noise. Building on the seminal work of Balke and Pearl (1994), we propose an alternative approach for discrete random variables which relaxes these assumptions and allows for unobserved confounding and arbitrary structural equations. The proposed approach only requires specification of the causal graph and confounding structure and bounds the expected counterfactual effect of recourse actions. If the lower bound is above a certain threshold, i.e., on the other side of the decision boundary, recourse is guaranteed in expectation.