Machine Learning arXiv Daily Digest [7.22]

2021-07-27 11:12:45

Visit www.arxivdaily.com for digests with abstracts, covering CS, Physics, Mathematics, Economics, Statistics, Finance, Biology, and Electrical Engineering, with search, bookmarking, and posting features.

cs.LG: 72 papers today

Graph (graph learning | graph neural networks | graph optimization, etc.) (4 papers)

【1】 Bridging the Gap between Spatial and Spectral Domains: A Theoretical Framework for Graph Neural Networks

Authors: Zhiqian Chen, Fanglan Chen, Lei Zhang, Taoran Ji, Kaiqun Fu, Liang Zhao, Feng Chen, Lingfei Wu, Charu Aggarwal, Chang-Tien Lu
Affiliations: Department of Computer Science and Engineering, Mississippi State University
Link: https://arxiv.org/abs/2107.10234
Abstract: During the past decade, deep learning's performance has been widely recognized in a variety of machine learning tasks, ranging from image classification and speech recognition to natural language understanding. Graph neural networks (GNNs) are a type of deep learning model designed to handle non-Euclidean problems using graph-structured data that are difficult to solve with traditional deep learning techniques. The majority of GNNs were created using a variety of processes, including random walk, PageRank, graph convolution, and heat diffusion, making direct comparisons impossible. Previous studies have primarily focused on classifying current models into distinct categories, with little investigation of their internal relationships. This research proposes a unified theoretical framework and a novel perspective that can methodologically integrate existing GNNs into our framework. We survey and categorize existing GNN models into spatial and spectral domains, and show the linkages between subcategories within each domain. Further investigation reveals a strong relationship between the spatial and spectral domains and the subgroups within them.
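The spatial–spectral connection the paper surveys can be seen concretely for polynomial filters: a spectral filter g(λ) = Σ_k θ_k λ^k applied in the Laplacian's eigenbasis equals the same polynomial in L applied directly as spatial propagation. A minimal numpy sketch (illustrative only, not the authors' code; graph, features, and coefficients are made up):

```python
import numpy as np

# Toy undirected path graph on 4 nodes
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A            # combinatorial graph Laplacian

X = np.random.RandomState(0).randn(4, 3)  # node features
theta = [0.5, -0.2, 0.1]                  # polynomial filter coefficients

# Spectral view: filter the eigenvalues, then map back with the eigenbasis
lam, U = np.linalg.eigh(L)
g_lam = sum(t * lam**k for k, t in enumerate(theta))
out_spectral = U @ np.diag(g_lam) @ U.T @ X

# Spatial view: the same polynomial applied as repeated neighborhood propagation
out_spatial = sum(t * np.linalg.matrix_power(L, k) @ X
                  for k, t in enumerate(theta))
# The two computations coincide exactly for polynomial filters
```

For such filters the two domains are two views of one operator, which is the kind of linkage the framework formalizes.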

【2】 Relational Graph Convolutional Networks: A Closer Look

Authors: Thiviyan Thanapalasingam, Lucas van Berkel, Peter Bloem, Paul Groth
Affiliations: Informatics Institute, University of Amsterdam, The Netherlands; Vrije Universiteit Amsterdam
Link: https://arxiv.org/abs/2107.10015
Abstract: In this paper, we describe a reproduction of the Relational Graph Convolutional Network (RGCN). Using our reproduction, we explain the intuition behind the model. Our reproduction results empirically validate the correctness of our implementations using benchmark Knowledge Graph datasets on node classification and link prediction tasks. Our explanation provides a friendly understanding of the different components of the RGCN for both users and researchers extending the RGCN approach. Furthermore, we introduce two new configurations of the RGCN that are more parameter efficient. The code and datasets are available at https://github.com/thiviyanT/torch-rgcn.
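At the core of an RGCN layer is a sum of relation-specific propagations plus a self-connection. A hedged numpy sketch of that message-passing rule (shapes, row normalization, and names are illustrative assumptions, not the torch-rgcn implementation):

```python
import numpy as np

def rgcn_layer(H, A_r, W_r, W_self):
    """One RGCN layer: sum of per-relation propagations plus a self-loop.

    H      : (n, d_in) node features
    A_r    : list of (n, n) row-normalized adjacency matrices, one per relation
    W_r    : list of (d_in, d_out) relation-specific weight matrices
    W_self : (d_in, d_out) self-connection weights
    """
    Z = H @ W_self
    for A, W in zip(A_r, W_r):
        Z = Z + A @ H @ W          # aggregate neighbors under this relation
    return np.maximum(Z, 0.0)      # ReLU

rng = np.random.RandomState(0)
n, d_in, d_out, R = 5, 4, 2, 3
H = rng.randn(n, d_in)
A_r = []
for _ in range(R):
    A = (rng.rand(n, n) < 0.4).astype(float)
    rowsum = A.sum(axis=1, keepdims=True)
    A_r.append(np.divide(A, rowsum, out=np.zeros_like(A), where=rowsum > 0))
W_r = [rng.randn(d_in, d_out) * 0.1 for _ in range(R)]
W_self = rng.randn(d_in, d_out) * 0.1
Z = rgcn_layer(H, A_r, W_r, W_self)
```

The per-relation weight matrices are where the parameter count explodes with many relations, which is what the paper's more parameter-efficient configurations target.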

【3】 A Factor Graph-based approach to vehicle sideslip angle estimation

Authors: Antonio Leanza, Giulio Reina, Jose-Luis Blanco-Claraco
Comments: 15 pages, 9 figures
Link: https://arxiv.org/abs/2107.09815
Abstract: Sideslip angle is an important variable for understanding and monitoring vehicle dynamics, but it lacks an inexpensive method of direct measurement. Therefore, it is typically estimated from inertial and other proprioceptive sensors onboard using filtering methods from the Kalman Filter family. As a novel alternative, this work proposes modelling the problem directly as a graphical model (factor graph), which can then be optimized using a variety of methods, such as whole-dataset batch optimization for offline processing or a fixed-lag smoother for online operation. Experimental results on real vehicle datasets validate the proposal, with good agreement between estimated and actual sideslip angle, showing performance similar to the state of the art and great potential for future extensions due to the flexible mathematical framework.

【4】 Group Contrastive Self-Supervised Learning on Graphs

Authors: Xinyi Xu, Cheng Deng, Yaochen Xie, Shuiwang Ji
Affiliations: Department of Computer Science and Engineering, Texas A&M University
Link: https://arxiv.org/abs/2107.09787
Abstract: We study self-supervised learning on graphs using contrastive methods. A general scheme of prior methods is to optimize two-view representations of input graphs. In many studies, a single graph-level representation is computed as one of the contrastive objectives, capturing limited characteristics of graphs. We argue that contrasting graphs in multiple subspaces enables graph encoders to capture more abundant characteristics. To this end, we propose a group contrastive learning framework in this work. Our framework embeds the given graph into multiple subspaces, of which each representation is prompted to encode specific characteristics of graphs. To learn diverse and informative representations, we develop principled objectives that enable us to capture the relations among both intra-space and inter-space representations in groups. Under the proposed framework, we further develop an attention-based representor function to compute representations that capture different substructures of a given graph. Built upon our framework, we extend two existing methods into GroupCL and GroupIG, equipped with the proposed objectives. Comprehensive experimental results show our framework achieves a promising boost in performance on a variety of datasets. In addition, our qualitative results show that features generated from our representor successfully capture various specific characteristics of graphs.
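The gist of contrasting two views in multiple subspaces can be sketched as follows: project each view's graph embeddings through G subspace heads and average an InfoNCE-style loss over them. This is only an illustration of the general idea; the paper's actual intra-space and inter-space objectives, and its attention-based representor, differ:

```python
import numpy as np

def info_nce(Za, Zb, tau=0.5):
    """InfoNCE between two aligned view batches (n, d); positives on the diagonal."""
    Za = Za / np.linalg.norm(Za, axis=1, keepdims=True)
    Zb = Zb / np.linalg.norm(Zb, axis=1, keepdims=True)
    logits = Za @ Zb.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_prob)))

def group_contrastive_loss(Ha, Hb, projections, tau=0.5):
    """Average InfoNCE over G subspace projections of two graph views."""
    return float(np.mean([info_nce(Ha @ P, Hb @ P, tau) for P in projections]))

rng = np.random.RandomState(0)
n, d, g_dim, G = 8, 16, 4, 3
Ha, Hb = rng.randn(n, d), rng.randn(n, d)                   # two-view embeddings
projections = [rng.randn(d, g_dim) for _ in range(G)]       # G subspace heads
loss = group_contrastive_loss(Ha, Hb, projections)
```

Each subspace head can specialize to a different graph characteristic because the loss is applied per subspace rather than once on a single pooled representation.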

Transformer (1 paper)

【1】 Audio Captioning Transformer

Authors: Xinhao Mei, Xubo Liu, Qiushi Huang, Mark D. Plumbley, Wenwu Wang
Affiliations: Centre for Vision, Speech and Signal Processing (CVSSP) and Department of Computer Science, University of Surrey, UK
Comments: 5 pages, 1 figure
Link: https://arxiv.org/abs/2107.09817
Abstract: Audio captioning aims to automatically generate a natural language description of an audio clip. Most captioning models follow an encoder-decoder architecture, where the decoder predicts words based on the audio features extracted by the encoder. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are often used as the audio encoder. However, CNNs can be limited in modelling temporal relationships among the time frames in an audio signal, while RNNs can be limited in modelling the long-range dependencies among the time frames. In this paper, we propose an Audio Captioning Transformer (ACT), which is a full Transformer network based on an encoder-decoder architecture and is totally convolution-free. The proposed method has a better ability to model the global information within an audio signal as well as capture temporal relationships between audio events. We evaluate our model on AudioCaps, which is the largest audio captioning dataset publicly available. Our model shows competitive performance compared to other state-of-the-art approaches.

GAN | Adversarial | Attacks | Generation (6 papers)

【1】 Boundary of Distribution Support Generator (BDSG): Sample Generation on the Boundary

Authors: Nikolaos Dionelis
Affiliations: The University of Edinburgh, Edinburgh, UK
Link: https://arxiv.org/abs/2107.09950
Abstract: Generative models, such as Generative Adversarial Networks (GANs), have been used for unsupervised anomaly detection. While performance keeps improving, several limitations exist, particularly attributed to difficulties at capturing multimodal supports and to the ability to approximate the underlying distribution closer to the tails, i.e. the boundary of the distribution's support. This paper proposes an approach that attempts to alleviate such shortcomings. We propose an invertible-residual-network-based model, the Boundary of Distribution Support Generator (BDSG). GANs generally do not guarantee the existence of a probability distribution, and here we use the recently developed Invertible Residual Network (IResNet) and Residual Flow (ResFlow) for density estimation. These models have not yet been used for anomaly detection. We leverage IResNet and ResFlow for Out-of-Distribution (OoD) sample detection and for sample generation on the boundary using a compound loss function that forces the samples to lie on the boundary. The BDSG addresses non-convex support, disjoint components, and multimodal distributions. Results on synthetic data and data from multimodal distributions, such as MNIST and CIFAR-10, demonstrate competitive performance compared to methods from the literature.

【2】 Fast and Scalable Adversarial Training of Kernel SVM via Doubly Stochastic Gradients

Authors: Huimin Wu, Zhengmian Hu, Bin Gu
Affiliations: School of Computer & Software, Nanjing University of Information Science & Technology, P.R. China; Department of Electrical & Computer Engineering, University of Pittsburgh, PA, USA; JD Finance America Corporation, Mountain View, CA, USA
Link: https://arxiv.org/abs/2107.09937
Abstract: Adversarial attacks, by generating examples that are almost indistinguishable from natural examples, pose a serious threat to learning models. Defending against adversarial attacks is a critical element of a reliable learning system. The support vector machine (SVM) is a classical yet still important learning algorithm even in the current deep learning era. Although a wide range of research has been done in recent years to improve the adversarial robustness of learning models, most of it is limited to deep neural networks (DNNs), and work on kernel SVM is still vacant. In this paper, we focus on kernel SVM and propose adv-SVM to improve its adversarial robustness via adversarial training, which has been demonstrated to be among the most promising defense techniques. To the best of our knowledge, this is the first work devoted to fast and scalable adversarial training of kernel SVM. Specifically, we first build a connection between perturbations of samples in the original and kernel spaces, and then give a reduced and equivalent formulation of adversarial training of kernel SVM based on this connection. Next, doubly stochastic gradients (DSG) based on two unbiased stochastic approximations (one on training points and the other on random features) are applied to update the solution of our objective function. Finally, we prove that our algorithm optimized by DSG converges to the optimal solution at the rate of O(1/t) under both constant and diminishing stepsizes. Comprehensive experimental results show that our adversarial training algorithm is robust against various attacks while retaining efficiency and scalability similar to the classical DSG algorithm.
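A toy sketch of the two ingredients named in the abstract — random-feature stochastic gradients and a worst-case perturbation of each training sample — on a synthetic linear problem. All hyperparameters, the RBF feature construction, and the FGSM-style inner step are my illustrative choices, not the paper's algorithm:

```python
import numpy as np

rng = np.random.RandomState(0)

# Random Fourier features approximating an RBF kernel (Omega ~ N(0, I))
d, D = 2, 64
Omega = rng.randn(d, D)
b = rng.uniform(0.0, 2.0 * np.pi, D)
phi = lambda X: np.sqrt(2.0 / D) * np.cos(X @ Omega + b)

# Toy linearly separable data
X = rng.randn(200, d)
y = np.sign(X[:, 0] + 0.3 * X[:, 1])

w = np.zeros(D)
eps, lr, lam = 0.05, 0.1, 1e-3
for _ in range(300):
    i = rng.randint(len(X))
    x = X[i]
    # Worst-case (FGSM-style) perturbation of x against its own label
    s = np.sin(x @ Omega + b)
    grad_fx = -np.sqrt(2.0 / D) * Omega @ (s * w)   # gradient of w.phi(x) w.r.t. x
    x_adv = x - eps * y[i] * np.sign(grad_fx)
    # Stochastic hinge-loss gradient step on the perturbed sample
    z_adv = phi(x_adv[None, :])[0]
    margin = y[i] * (w @ z_adv)
    grad_w = lam * w - (y[i] * z_adv if margin < 1.0 else 0.0)
    w = w - lr * grad_w

acc = float(np.mean(np.sign(phi(X) @ w) == y))
```

The random features stand in for one of the two stochastic approximations (the other being sampling of training points); the paper additionally derives the reduced adversarial-training formulation and its O(1/t) convergence, which this sketch does not cover.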

【3】 Defending against Reconstruction Attack in Vertical Federated Learning

Authors: Jiankai Sun, Yuanshun Yao, Weihao Gao, Junyuan Xie, Chong Wang
Comments: Accepted to International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with ICML 2021 (FL-ICML'21)
Link: https://arxiv.org/abs/2107.09898
Abstract: Recently, researchers have studied input leakage problems in Federated Learning (FL), where a malicious party can reconstruct sensitive training inputs provided by users from shared gradients. This raises concerns about FL, since input leakage contradicts the privacy-preserving intention of using FL. Despite a relatively rich literature on attacks and defenses for input reconstruction in horizontal FL, input leakage and protection in vertical FL have only recently begun to draw researchers' attention. In this paper, we study how to defend against input leakage attacks in vertical FL. We design an adversarial-training-based framework that contains three modules: adversarial reconstruction, noise regularization, and distance correlation minimization. These modules can be employed either individually or together, since they are independent of each other. Through extensive experiments on a large-scale industrial online advertising dataset, we show our framework is effective in protecting input privacy while retaining the model utility.

【4】 Using Undervolting as an On-Device Defense Against Adversarial Machine Learning Attacks

Authors: Saikat Majumdar, Mohammad Hossein Samavatian, Kristin Barber, Radu Teodorescu
Affiliations: Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
Link: https://arxiv.org/abs/2107.09804
Abstract: Deep neural network (DNN) classifiers are powerful tools that drive a broad spectrum of important applications, from image recognition to autonomous vehicles. Unfortunately, DNNs are known to be vulnerable to adversarial attacks that affect virtually all state-of-the-art models. These attacks make small, imperceptible modifications to inputs that are sufficient to induce the DNNs to produce the wrong classification. In this paper we propose a novel, lightweight adversarial correction and/or detection mechanism for image classifiers that relies on undervolting (running a chip at a voltage slightly below its safe margin). We propose using controlled undervolting of the chip running the inference process in order to introduce a limited number of compute errors. We show that these errors disrupt the adversarial input in a way that can be used either to correct the classification or to detect the input as adversarial. We evaluate the proposed solution in an FPGA design and through software simulation. We evaluate 10 attacks on two popular DNNs and show an average detection rate of 80% to 95%.
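The detection idea can be illustrated with a stand-in linear classifier: inject random compute faults into the forward pass and flag inputs whose prediction flips under faults more often than usual, on the intuition that adversarial inputs sit close to a decision boundary. This is purely illustrative; the paper operates on real DNNs with voltage-induced hardware errors, not the toy sign-flip corruption used here:

```python
import numpy as np

rng = np.random.RandomState(0)
W = rng.randn(10, 3)                      # stand-in classifier: 10-dim input, 3 classes

def predict(x, fault_rate=0.0, fault_rng=None):
    """Forward pass with optional random sign-flip faults on per-feature terms."""
    contrib = x[:, None] * W              # (10, 3) per-feature logit contributions
    if fault_rate > 0.0:
        flips = fault_rng.rand(*contrib.shape) < fault_rate
        contrib = np.where(flips, -contrib, contrib)   # bit-error-like corruption
    return int(np.argmax(contrib.sum(axis=0)))

def flip_rate(x, trials=100, fault_rate=0.1):
    """Fraction of faulty runs whose prediction differs from the clean one."""
    fault_rng = np.random.RandomState(1)
    clean = predict(x)
    return sum(predict(x, fault_rate, fault_rng) != clean
               for _ in range(trials)) / trials

x_confident = 2.0 * W[:, 0]               # strongly aligned with one class
x_borderline = 0.01 * rng.randn(10)       # near the decision boundary
```

A detector would threshold `flip_rate`: boundary-hugging inputs (as adversarial examples tend to be) flip more readily under injected errors than confidently classified ones.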

【5】 High-Resolution Pelvic MRI Reconstruction Using a Generative Adversarial Network with Attention and Cyclic Loss

Authors: Guangyuan Li, Jun Lv, Xiangrong Tong, Chengyan Wang, Guang Yang
Affiliations: School of Computer and Control Engineering, Yantai University, Yantai, China; Human Phenome Institute, Fudan University, Shanghai, China; Cardiovascular Research Centre, Royal Brompton Hospital, London, UK
Comments: 21 pages, 7 figures, 4 tables
Link: https://arxiv.org/abs/2107.09989
Abstract: Magnetic resonance imaging (MRI) is an important medical imaging modality, but its acquisition speed is quite slow due to physiological limitations. Recently, super-resolution methods have shown excellent performance in accelerating MRI. In some circumstances, it is difficult to obtain high-resolution images even with prolonged scan time. Therefore, we propose a novel super-resolution method that uses a generative adversarial network (GAN) with cyclic loss and an attention mechanism to generate high-resolution MR images from low-resolution MR images by a factor of 2. We trained and validated our model on pelvic images from healthy subjects, while data from patients were used for testing. The MR dataset was obtained using different imaging sequences, including T2, T2W SPAIR, and mDIXON-W. Four methods, i.e., BICUBIC, SRCNN, SRGAN, and EDSR, were used for comparison. Structural similarity, peak signal-to-noise ratio, root mean square error, and variance inflation factor were used as metrics to evaluate the performance of the proposed method. Various experimental results showed that our method can better restore the details of high-resolution MR images compared to the other methods. In addition, the reconstructed high-resolution MR images can provide better lesion textures in tumor patients, which is promising for use in clinical diagnosis.

【6】 3D-StyleGAN: A Style-Based Generative Adversarial Network for Generative Modeling of Three-Dimensional Medical Images

Authors: Sungmin Hong, Razvan Marinescu, Adrian V. Dalca, Anna K. Bonkhoff, Martin Bretzner, Natalia S. Rost, Polina Golland
Affiliations: JPK Stroke Research Center, Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
Comments: 11 pages, 6 figures, 2 tables. Provisionally accepted at the DGM4MICCAI workshop at MICCAI 2021
Link: https://arxiv.org/abs/2107.09700
Abstract: Image synthesis via Generative Adversarial Networks (GANs) of three-dimensional (3D) medical images has great potential that can be extended to many medical applications, such as image enhancement and disease progression modeling. However, current GAN technologies for 3D medical image synthesis need to be significantly improved to be readily adapted to real-world medical problems. In this paper, we extend the state-of-the-art StyleGAN2 model, which natively works with two-dimensional images, to enable 3D image synthesis. In addition to image synthesis, we investigate the controllability and interpretability of the 3D-StyleGAN via style vectors inherited from the original StyleGAN2 that are highly suitable for medical applications: (i) the latent-space projection and reconstruction of unseen real images, and (ii) style mixing. We demonstrate the 3D-StyleGAN's performance and feasibility with ~12,000 three-dimensional full-brain MR T1 images, although it can be applied to any 3D volumetric images. Furthermore, we explore different configurations of hyperparameters to investigate potential improvement of the image synthesis with larger networks. The code and pre-trained networks are available online: https://github.com/sh4174/3DStyleGAN.

Semi/Weakly/Un/Supervised | Uncertainty | Active Learning (4 papers)

【1】 Black-box Probe for Unsupervised Domain Adaptation without Model Transferring

Authors: Kunhong Wu, Yucheng Shi, Yahong Han, Yunfeng Shao, Bingshuai Li
Affiliations: College of Intelligence and Computing, Tianjin University, Tianjin, China; Huawei Noah's Ark Lab, Huawei Technologies
Link: https://arxiv.org/abs/2107.10174
Abstract: In recent years, researchers have been paying increasing attention to the threats brought by deep learning models to data security and privacy, especially in the field of domain adaptation. Existing unsupervised domain adaptation (UDA) methods can achieve promising performance without transferring data from the source domain to the target domain. However, UDA with representation alignment or self-supervised pseudo-labeling relies on the transferred source models. In many data-critical scenarios, methods based on model transferring may suffer from membership inference attacks and expose private data. In this paper, we aim to overcome a challenging new setting where the source models are only queryable but cannot be transferred to the target domain. We propose Black-box Probe Domain Adaptation (BPDA), which adopts a query mechanism to probe and refine information from the source model using a third-party dataset. In order to gain more informative query results, we further propose Distributionally Adversarial Training (DAT) to align the distribution of third-party data with that of target data. BPDA uses a public third-party dataset and adversarial examples based on DAT as the information carrier between source and target domains, dispensing with transferring source data or models. Experimental results on the Digit-Five, Office-Caltech, Office-31, Office-Home, and DomainNet benchmarks demonstrate the feasibility of BPDA without model transferring.

【2】 S4T: Source-free domain adaptation for semantic segmentation via self-supervised selective self-training

Authors: Viraj Prabhu, Shivam Khare, Deeksha Kartik, Judy Hoffman
Affiliations: Georgia Institute of Technology
Link: https://arxiv.org/abs/2107.10140
Abstract: Most modern approaches for domain adaptive semantic segmentation rely on continued access to source data during adaptation, which may be infeasible due to computational or privacy constraints. We focus on source-free domain adaptation for semantic segmentation, wherein a source model must adapt itself to a new target domain given only unlabeled target data. We propose Self-Supervised Selective Self-Training (S4T), a source-free adaptation algorithm that first uses the model's pixel-level predictive consistency across diverse views of each target image along with model confidence to classify pixel predictions as either reliable or unreliable. Next, the model is self-trained, using predicted pseudolabels for reliable predictions and pseudolabels inferred via a selective interpolation strategy for unreliable ones. S4T matches or improves upon the state-of-the-art in source-free adaptation on 3 standard benchmarks for semantic segmentation within a single epoch of adaptation.
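The reliable/unreliable pixel split can be sketched in numpy: keep a pixel's pseudolabel as reliable when predictions agree across views and the confidence clears a threshold. The threshold, shapes, and aggregation below are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def reliability_mask(probs_views, conf_thresh=0.9):
    """Split pixel predictions into reliable / unreliable.

    probs_views : (V, H, W, C) softmax outputs over V views of the same image,
                  aligned back to a common frame.
    Returns (pseudolabels, reliable) where reliable is a boolean (H, W) mask.
    """
    preds = probs_views.argmax(axis=-1)                 # (V, H, W) per-view labels
    consistent = (preds == preds[0]).all(axis=0)        # same class in every view
    confidence = probs_views.mean(axis=0).max(axis=-1)  # (H, W) mean-view confidence
    reliable = consistent & (confidence > conf_thresh)
    return preds[0], reliable

# Toy check: one agreeing high-confidence pixel, one disagreeing pixel
p = np.zeros((2, 1, 2, 3))
p[:, 0, 0] = [0.05, 0.95, 0.0]      # both views: confident class 1
p[0, 0, 1] = [0.6, 0.4, 0.0]        # views disagree at this pixel
p[1, 0, 1] = [0.3, 0.7, 0.0]
labels, reliable = reliability_mask(p)
```

Reliable pixels would be self-trained on directly; unreliable ones would instead get pseudolabels from the paper's selective interpolation strategy, which this sketch omits.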

【3】 Uncertainty Estimation and Out-of-Distribution Detection for Counterfactual Explanations: Pitfalls and Solutions

Authors: Eoin Delaney, Derek Greene, Mark T. Keane
Affiliations: School of Computer Science, University College Dublin
Link: https://arxiv.org/abs/2107.09734
Abstract: Whilst an abundance of techniques have recently been proposed to generate counterfactual explanations for the predictions of opaque black-box systems, markedly less attention has been paid to exploring the uncertainty of these generated explanations. This becomes a critical issue in high-stakes scenarios, where uncertain and misleading explanations could have dire consequences (e.g., medical diagnosis and treatment planning). Moreover, it is often difficult to determine if the generated explanations are well grounded in the training data and sensitive to distributional shifts. This paper proposes several practical solutions that can be leveraged to solve these problems by establishing novel connections with other research works in explainability (e.g., trust scores) and uncertainty estimation (e.g., Monte Carlo Dropout). Two experiments demonstrate the utility of our proposed solutions.

【4】 Towards Lower-Dose PET using Physics-Based Uncertainty-Aware Multimodal Learning with Robustness to Out-of-Distribution Data

Authors: Viswanath P. Sudarshan, Uddeshya Upadhyay, Gary F. Egan, Zhaolin Chen, Suyash P. Awate
Affiliations: Computer Science and Engineering (CSE) Department, Indian Institute of Technology (IIT) Bombay, Mumbai, India; IITB-Monash Research Academy; Monash Biomedical Imaging (MBI), Monash University, Melbourne, Australia
Comments: Accepted at Medical Image Analysis
Link: https://arxiv.org/abs/2107.09892
Abstract: Radiation exposure in positron emission tomography (PET) imaging limits its usage in the studies of radiation-sensitive populations, e.g., pregnant women, children, and adults that require longitudinal imaging. Reducing the PET radiotracer dose or acquisition time reduces photon counts, which can deteriorate image quality. Recent deep-neural-network (DNN) based methods for image-to-image translation enable the mapping of low-quality PET images (acquired using substantially reduced dose), coupled with the associated magnetic resonance imaging (MRI) images, to high-quality PET images. However, such DNN methods focus on applications involving test data that match the statistical characteristics of the training data very closely and give little attention to evaluating the performance of these DNNs on new out-of-distribution (OOD) acquisitions. We propose a novel DNN formulation that models (i) the underlying sinogram-based physics of the PET imaging system and (ii) the uncertainty in the DNN output through the per-voxel heteroscedasticity of the residuals between the predicted and the high-quality reference images. Our sinogram-based uncertainty-aware DNN framework, namely suDNN, estimates a standard-dose PET image using multimodal input in the form of (i) a low-dose/low-count PET image and (ii) the corresponding multi-contrast MRI images, leading to improved robustness of suDNN to OOD acquisitions. Results on in vivo simultaneous PET-MRI, and various forms of OOD data in PET-MRI, show the benefits of suDNN over the current state of the art, quantitatively and qualitatively.

Transfer | Zero/Few/One-Shot | Adaptation (2 papers)

【1】 Multi-agent Reinforcement Learning Improvement in a Dynamic Environment Using Knowledge Transfer

Authors: Mahnoosh Mahdavimoghaddam, Amin Nikanjam, Monireh Abdoos
Affiliations: K. N. Toosi University of Technology, Tehran, Iran; Shahid Beheshti University, Tehran, Iran; SWAT Lab., Polytechnique Montréal, Quebec, Canada
Comments: arXiv admin note: text overlap with arXiv:1912.07796 by other authors
Link: https://arxiv.org/abs/2107.09807
Abstract: Cooperative multi-agent systems are being widely used in different domains. Interaction among agents brings benefits, including reduced operating costs, high scalability, and facilitation of parallel processing. These systems are also a good option for handling large-scale, unknown, and dynamic environments. However, learning in these environments has become a very important challenge in various applications. These challenges include the effect of search-space size on learning time, inefficient cooperation among agents, and the lack of proper coordination among agents' decisions. Moreover, reinforcement learning algorithms may suffer from long convergence times in these problems. In this paper, a communication framework using knowledge-transfer concepts is introduced to address such challenges in the herding problem with a large state space. To handle the convergence problem, knowledge transfer is utilized, which can significantly increase the efficiency of reinforcement learning algorithms. Coordination between the agents is carried out through a head agent in each group of agents and a coordinator agent, respectively. The results demonstrate that this framework can indeed enhance the speed of learning and reduce convergence time.

【2】 Adaptive Inducing Points Selection For Gaussian Processes

Authors: Théo Galy-Fajou, Manfred Opper
Affiliations: Technical University of Berlin
Comments: Accepted at Continual Learning Workshop - ICML 2020: this https URL
Link: https://arxiv.org/abs/2107.10066
Abstract: Gaussian Processes (GPs) are flexible non-parametric models with strong probabilistic interpretation. While being a standard choice for performing inference on time series, GPs have few techniques to work in a streaming setting. Bui et al. (2017) developed an efficient variational approach to train online GPs by using sparsity techniques: the whole set of observations is approximated by a smaller set of inducing points (IPs) and moved around as new data arrive. Both the number and the locations of the IPs greatly affect the performance of the algorithm. In addition to optimizing their locations, we propose to adaptively add new points, based on the properties of the GP and the structure of the data.
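A minimal sketch of one plausible adaptive rule: append an inducing point when the incoming observation is poorly covered by the current set under the kernel. The RBF kernel and the 0.8 coverage threshold are my illustrative choices, not necessarily the paper's criterion:

```python
import numpy as np

def rbf(x, Z, lengthscale=1.0):
    """RBF kernel between a point x (d,) and a set of points Z (m, d)."""
    d2 = ((Z - x) ** 2).sum(axis=1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def update_inducing(Z, x, thresh=0.8):
    """Append x as an inducing point if no existing one covers it well."""
    if len(Z) == 0 or rbf(x, np.asarray(Z)).max() < thresh:
        Z.append(np.asarray(x, dtype=float))
    return Z

# Streaming 1-D observations: two well-separated clusters
Z = []
stream = [np.array([0.0]), np.array([0.05]), np.array([3.0]), np.array([3.1])]
for x in stream:
    update_inducing(Z, x)
# Only one inducing point is kept per cluster
```

The point count thus tracks the data's structure instead of being fixed in advance, which is the behaviour the paper argues for in a streaming setting.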

强化学习(4篇)

【1】 Demonstration-Guided Reinforcement Learning with Learned Skills 标题:带学习技能的示范引导式强化学习

作者:Karl Pertsch,Youngwoon Lee,Yue Wu,Joseph J. Lim 机构:University of Southern California 链接:https://arxiv.org/abs/2107.10253 摘要:示范引导强化学习(RL)是一种利用奖励反馈和一组目标任务示范来学习复杂行为的有效方法。先前的演示引导RL方法将每个新任务视为一个独立的学习问题,并尝试一步一步地跟随所提供的演示,类似于人类试图通过跟随演示者的精确肌肉运动来模仿完全看不见的行为。当然,这样的学习会很慢,但新的行为往往不是完全看不见的:它们与我们以前学过的行为共享子任务。在这项工作中,我们的目标是利用这种共享的子任务结构来提高演示引导RL的效率。我们首先从跨多个任务收集的大量离线经验数据集中学习一组可重用的技能。然后,我们提出了基于技能的示范学习(SkiLD),这是一种示范引导RL算法,它通过遵循示范技能而不是原始动作来有效地利用所提供的示范,从而比以前的示范引导RL方法有显著的性能改进。在长视距迷宫导航和复杂机器人操作任务中验证了该方法的有效性。 摘要:Demonstration-guided reinforcement learning (RL) is a promising approach for learning complex behaviors by leveraging both reward feedback and a set of target task demonstrations. Prior approaches for demonstration-guided RL treat every new task as an independent learning problem and attempt to follow the provided demonstrations step-by-step, akin to a human trying to imitate a completely unseen behavior by following the demonstrator's exact muscle movements. Naturally, such learning will be slow, but often new behaviors are not completely unseen: they share subtasks with behaviors we have previously learned. In this work, we aim to exploit this shared subtask structure to increase the efficiency of demonstration-guided RL. We first learn a set of reusable skills from large offline datasets of prior experience collected across many tasks. We then propose Skill-based Learning with Demonstrations (SkiLD), an algorithm for demonstration-guided RL that efficiently leverages the provided demonstrations by following the demonstrated skills instead of the primitive actions, resulting in substantial performance improvements over prior demonstration-guided RL approaches. We validate the effectiveness of our approach on long-horizon maze navigation and complex robot manipulation tasks.

【2】 A Deep Reinforcement Learning Approach for Fair Traffic Signal Control 标题:一种用于公平交通信号控制的深度强化学习方法

作者:Majid Raeis,Alberto Leon-Garcia 机构:University of Toronto 备注:7 pages, Accepted at ITSC 2021 (International Conference on Intelligent Transportation Systems) 链接:https://arxiv.org/abs/2107.10146 摘要:交通信号控制是城市交通管理中最有效的方法之一。近年来，基于深度强化学习(DRL)的交通控制方法因其能够利用实时交通数据而受到广泛关注，而传统手工设计的方法往往难以充分利用这些数据。最近的基于DRL的方法主要集中在最大化车辆的吞吐量或最小化车辆的平均行驶时间，而交通信号控制器的公平性常常被忽略。这一点尤其重要，因为忽略公平性可能导致某些车辆经历极端等待时间，或者特定交通流的吞吐量受到交叉口另一冲突流量波动的高度影响。为了解决这些问题，我们引入了两个公平性的概念：基于延迟的公平性和基于吞吐量的公平性，分别对应上述两个问题。此外，我们还提出了两种基于DRL的交通信号控制方法来实现这些公平性概念，这两种方法都可以获得较高的吞吐量。我们使用三种流量到达分布来评估我们提出的方法的性能，发现我们的方法在测试场景中的性能优于基线。 摘要:Traffic signal control is one of the most effective methods of traffic management in urban areas. In recent years, traffic control methods based on deep reinforcement learning (DRL) have gained attention due to their ability to exploit real-time traffic data, which is often poorly used by the traditional hand-crafted methods. While most recent DRL-based methods have focused on maximizing the throughput or minimizing the average travel time of the vehicles, the fairness of the traffic signal controllers has often been neglected. This is particularly important as neglecting fairness can lead to situations where some vehicles experience extreme waiting times, or where the throughput of a particular traffic flow is highly impacted by the fluctuations of another conflicting flow at the intersection. In order to address these issues, we introduce two notions of fairness: delay-based and throughput-based fairness, which correspond to the two issues mentioned above. Furthermore, we propose two DRL-based traffic signal control methods for implementing these fairness notions, that can achieve a high throughput as well. We evaluate the performance of our proposed methods using three traffic arrival distributions, and find that our methods outperform the baselines in the tested scenarios.
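摘要提出了基于延迟与基于吞吐量两种公平性概念，但未给出具体公式。作为示意，下面用 Jain 公平性指数充当可计算的替代度量（这是我们的假设，并非论文原定义）：对每车等待时间取该指数即“基于延迟”的公平性，对各车流吞吐量取该指数即“基于吞吐量”的公平性。

```python
def jain_fairness(values):
    """Jain's fairness index: equals 1.0 when all values are equal,
    and approaches 1/n when a single entry dominates."""
    n = len(values)
    s = sum(values)
    return s * s / (n * sum(v * v for v in values))

# Delay-based fairness (illustrative): apply the index to per-vehicle waiting times.
# Throughput-based fairness (illustrative): apply it to per-flow throughputs.
```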

【3】 MarsExplorer: Exploration of Unknown Terrains via Deep Reinforcement Learning and Procedurally Generated Environments 标题:MarsExplorer:通过深度强化学习和程序生成环境探索未知地形

作者:Dimitrios I. Koutras,Athanasios Ch. Kapoutsis,Angelos A. Amanatiadis,Elias B. Kosmatopoulos 机构:Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece, Information Technologies Institute, The Centre for Research & Technology, Hellas, Thessaloniki, Greece 链接:https://arxiv.org/abs/2107.09996 摘要:本文是弥合强大的深度强化学习方法与未知地形探索/覆盖问题之间差距的初步尝试。在此范围内，本文提出了 MarsExplorer，一个与 OpenAI Gym 兼容、专门面向未知区域探索/覆盖的环境。MarsExplorer将最初的机器人问题转化为一个强化学习设置，各种现成的算法都可以求解。任何学习到的策略都可以直接应用到机器人平台上，而无需对机器人的动力学建立详细的仿真模型来应用不同的学习/适应阶段。它的核心特征之一是可控的多维地形过程化生成，这是生成具有较强泛化能力的策略的关键。在MarsExplorer环境中训练了四种不同的最新RL算法(A3C、PPO、Rainbow和SAC)，并与人类水平的平均性能进行了比较。在后续的实验分析中，分析了多维难度设置对表现最佳算法(PPO)学习能力的影响。一个里程碑式的结果是生成了一个遵循希尔伯特曲线(Hilbert curve)的探索策略，而无需向环境提供该信息，也无需直接或间接地奖励类似希尔伯特曲线的轨迹。实验分析最后在扩展地形尺寸下，将PPO学习到的策略与基于边界(frontier-based)的探索进行了比较。源代码位于:https://github.com/dimikout3/GeneralExplorationPolicy. 摘要:This paper is an initial endeavor to bridge the gap between powerful Deep Reinforcement Learning methodologies and the problem of exploration/coverage of unknown terrains. Within this scope, MarsExplorer, an openai-gym compatible environment tailored to exploration/coverage of unknown areas, is presented. MarsExplorer translates the original robotics problem into a Reinforcement Learning setup that various off-the-shelf algorithms can tackle. Any learned policy can be straightforwardly applied to a robotic platform without an elaborate simulation model of the robot's dynamics to apply a different learning/adaptation phase. One of its core features is the controllable multi-dimensional procedural generation of terrains, which is the key for producing policies with strong generalization capabilities. Four different state-of-the-art RL algorithms (A3C, PPO, Rainbow, and SAC) are trained on the MarsExplorer environment, and a proper evaluation of their results compared to the average human-level performance is reported.
In the follow-up experimental analysis, the effect of the multi-dimensional difficulty setting on the learning capabilities of the best-performing algorithm (PPO) is analyzed. A milestone result is the generation of an exploration policy that follows the Hilbert curve without providing this information to the environment or rewarding directly or indirectly Hilbert-curve-like trajectories. The experimental analysis is concluded by comparing PPO learned policy results with frontier-based exploration context for extended terrain sizes. The source code can be found at: https://github.com/dimikout3/GeneralExplorationPolicy.
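上文提到学到的策略会呈现希尔伯特曲线式的探索轨迹。作为参照，下面给出经典的“曲线索引到网格坐标”的迭代转换（标准算法，仅用于生成参考轨迹，与论文代码无关）：

```python
def hilbert_d2xy(order, d):
    """Convert index d along a Hilbert curve of the given order
    into (x, y) coordinates on a 2^order x 2^order grid."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate the quadrant when needed
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y
```

对任意阶数，相邻索引对应的格点在网格上都是相邻的（曼哈顿距离为 1），这正是该曲线适合覆盖/探索任务的原因。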

【4】 Enhancing Loop-Invariant Synthesis via Reinforcement Learning 标题:基于强化学习的环路不变综合

作者:Takeshi Tsukada,Hiroshi Unno,Taro Sekiyama,Kohei Suenaga 机构:Chiba University, Japan, University of Tsukuba, Japan, National Institute of Informatics, Japan, Kyoto University, Japan 链接:https://arxiv.org/abs/2107.09766 摘要:循环不变式综合是所有程序验证过程的基础。由于该问题在一般情况下不可判定，不变式综合工具必然要使用启发式方法。尽管人们普遍认为启发式方法的设计对验证器的有效性能至关重要，但针对为各个不变式综合工具获取最优启发式的研究却很少，开发人员一直是手工调整工具的启发式方法。本研究表明，我们可以通过强化学习，为不变式综合器PCSat有效且自动地学习良好的启发式方法。实验表明，结合强化学习所学启发式的PCSat在该任务上优于最先进的求解器。据我们所知，这是首个研究为不变式综合工具学习启发式方法的工作。 摘要:Loop-invariant synthesis is the basis of every program verification procedure. Due to its undecidability in general, a tool for invariant synthesis necessarily uses heuristics. Despite the common belief that the design of heuristics is vital for the effective performance of a verifier, little work has been performed toward obtaining the optimal heuristics for each invariant-synthesis tool. Instead, developers have hand-tuned the heuristics of tools. This study demonstrates that we can effectively and automatically learn a good heuristic via reinforcement learning for an invariant synthesizer PCSat. Our experiment shows that PCSat combined with the heuristic learned by reinforcement learning outperforms the state-of-the-art solvers for this task. To the best of our knowledge, this is the first work that investigates learning the heuristics of an invariant synthesis tool.

医学相关(3篇)

【1】 Machine Learning for Real-World Evidence Analysis of COVID-19 Pharmacotherapy 标题:机器学习在冠状病毒药物治疗实证分析中的应用

作者:Aurelia Bustos,Patricio Mas_Serrano,Mari L. Boquera,Jose M. Salinas 机构:AI Medical Research Unit, MedBravo∗, Pharmacy Department, HGUA†, ISABIAL‡, Mari Luz Boquera, Jose Maria Salinas, IT Department§, San Juan University Hospital 备注:22 pages, 7 tables, 11 figures 链接:https://arxiv.org/abs/2107.10239 摘要:引言:临床实践中产生的真实世界数据可用于分析COVID-19药物治疗的真实世界证据(RWE)和验证随机临床试验(RCTs)的结果。机器学习(ML)方法在RWE中得到了广泛的应用,是一种很有前途的精密医学工具。在这项研究中,ML方法用于研究西班牙巴伦西亚地区COVID-19住院治疗的疗效。方法:采用10个卫生部门2020年1月至2021年1月的5244例和1312例COVID-19住院病例,分别对remdesivir、皮质类固醇、tocilizumab、lopinavir-ritonavir、阿奇霉素和氯喹/羟基氯喹的治疗效果模型(TE-ML)进行训练和验证。另外两个卫生部门的2390名住院患者被保留作为一项独立测试,以回顾性分析使用cox比例风险模型的TE-ML模型选择的人群中治疗的生存益处。使用治疗倾向评分调整TE-ML模型,以控制与结果相关的治疗前混杂变量,并进一步评估其无效性。ML架构基于增强的决策树。结果:在TE-ML模型确定的人群中,只有Remdesivir和Tocilizumab与生存时间增加显著相关,危险比分别为0.41(P=0.04)和0.21(P=0.001)。氯喹衍生物、洛匹那韦、利托那韦和阿奇霉素对存活率无影响。解释TE-ML模型预测的工具在患者层面被探索为个性化决策和精确医学的潜在工具。结论:ML法适用于COVID-19药物治疗的RWE分析。所得结果重现了RWE上已发表的结果,并验证了RCT的结果。 摘要:Introduction: Real-world data generated from clinical practice can be used to analyze the real-world evidence (RWE) of COVID-19 pharmacotherapy and validate the results of randomized clinical trials (RCTs). Machine learning (ML) methods are being used in RWE and are promising tools for precision-medicine. In this study, ML methods are applied to study the efficacy of therapies on COVID-19 hospital admissions in the Valencian Region in Spain. Methods: 5244 and 1312 COVID-19 hospital admissions - dated between January 2020 and January 2021 from 10 health departments, were used respectively for training and validation of separate treatment-effect models (TE-ML) for remdesivir, corticosteroids, tocilizumab, lopinavir-ritonavir, azithromycin and chloroquine/hydroxychloroquine. 2390 admissions from 2 additional health departments were reserved as an independent test to analyze retrospectively the survival benefits of therapies in the population selected by the TE-ML models using cox-proportional hazard models. 
TE-ML models were adjusted using treatment propensity scores to control for pre-treatment confounding variables associated to outcome and further evaluated for futility. ML architecture was based on boosted decision-trees. Results: In the populations identified by the TE-ML models, only Remdesivir and Tocilizumab were significantly associated with an increase in survival time, with hazard ratios of 0.41 (P = 0.04) and 0.21 (P = 0.001), respectively. No survival benefits from chloroquine derivatives, lopinavir-ritonavir and azithromycin were demonstrated. Tools to explain the predictions of TE-ML models are explored at patient-level as potential tools for personalized decision making and precision medicine. Conclusion: ML methods are suitable tools toward RWE analysis of COVID-19 pharmacotherapies. Results obtained reproduce published results on RWE and validate the results from RCTs.

【2】 ECG Heartbeat Classification Using Multimodal Fusion 标题:基于多模式融合的心电心跳分类

作者:Zeeshan Ahmad,Anika Tabassum,Ling Guan,Naimul Khan 机构:(Fellow, IEEE), NAIMUL MEFRAZ KHAN, (Senior Member, IEEE), Financial support from NSERC and Dapasoft Inc. (CRDPJ,-,) to conduct the research is highly appreciated. 链接:https://arxiv.org/abs/2107.09869 摘要:心电图(ECG)是诊断和对抗心律失常、心肌梗死(MI)等严重心血管综合征的权威资料。目前的机器学习技术要么依赖于人工提取的特征，要么依赖于仅直接利用一维心电信号的大型复杂的深度学习网络。由于智能多模态融合可以在最先进的水平上通过一个高效的深度网络来实现，因此，本文提出了两种计算效率高的多模态融合框架，称为多模态图像融合(MIF)和多模态特征融合(MFF)。在这些框架的输入下，我们使用Gramian角场(GAF)、递推图(RP)和Markov变换场(MTF)将原始心电数据转换成三种不同的图像。在MIF中，我们首先通过组合三种成像模式来执行图像融合，以创建单个图像模式作为卷积神经网络(CNN)的输入。在MFF中，我们从CNNs的倒数第二层提取特征，并对其进行融合，以获得更好的分类器性能所需的唯一和相互依赖的信息。最后利用这些信息特征训练支持向量机(SVM)分类器对心电信号进行分类。我们通过在PhysioNets MIT-BIH数据集上对5种不同的心律失常(符合AAMI EC57协议)和PTB诊断数据集上对心肌梗死(MI)分类进行实验,证明了所提出的融合模型的优越性。对心律失常和心肌梗死的分类准确率分别为99.7%和99.2%。 摘要:Electrocardiogram (ECG) is an authoritative source to diagnose and counter critical cardiovascular syndromes such as arrhythmia and myocardial infarction (MI). Current machine learning techniques either depend on manually extracted features or large and complex deep learning networks which merely utilize the 1D ECG signal directly. Since intelligent multimodal fusion can perform at the state-of-the-art level with an efficient deep network, therefore, in this paper, we propose two computationally efficient multimodal fusion frameworks for ECG heart beat classification called Multimodal Image Fusion (MIF) and Multimodal Feature Fusion (MFF). At the input of these frameworks, we convert the raw ECG data into three different images using Gramian Angular Field (GAF), Recurrence Plot (RP) and Markov Transition Field (MTF). In MIF, we first perform image fusion by combining three imaging modalities to create a single image modality which serves as input to the Convolutional Neural Network (CNN). In MFF, we extracted features from penultimate layer of CNNs and fused them to get unique and interdependent information necessary for better performance of classifier. 
These informational features are finally used to train a Support Vector Machine (SVM) classifier for ECG heart-beat classification. We demonstrate the superiority of the proposed fusion models by performing experiments on PhysioNets MIT-BIH dataset for five distinct conditions of arrhythmias which are consistent with the AAMI EC57 protocols and on PTB diagnostics dataset for Myocardial Infarction (MI) classification. We achieved classification accuracy of 99.7% and 99.2% on arrhythmia and MI classification, respectively.
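摘要中将一维心电信号经 Gramian Angular Field (GAF) 等变换转为图像。下面是 GASF（求和形式）的一个极简 NumPy 示意，仅展示变换本身，与论文的网络结构无关：

```python
import numpy as np

def gramian_angular_field(x):
    """Gramian Angular Summation Field of a 1-D signal.

    The signal is rescaled to [-1, 1], mapped to angles via arccos,
    and pixel (i, j) of the resulting image is cos(phi_i + phi_j)."""
    x = np.asarray(x, dtype=float)
    x_scaled = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    phi = np.arccos(np.clip(x_scaled, -1, 1))
    return np.cos(phi[:, None] + phi[None, :])
```

得到的 n×n 对称矩阵即可作为 CNN 的单通道输入；RP、MTF 两种变换同理各产生一幅图像。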

【3】 Checkovid: A COVID-19 misinformation detection system on Twitter using network and content mining perspectives 标题:Checkovid:一个基于网络和内容挖掘的Twitter冠状病毒错误信息检测系统

作者:Sajad Dadgar,Mehdi Ghatee 机构:Department of Mathematics and Computer Science, Amirkabir University of Technology, No. , Hafez Avenue, Tehran ,-, Iran 备注:20 Pages, 18 Figures, 7 Tables, Submitted for Review Process in a Journal 链接:https://arxiv.org/abs/2107.09768 摘要:在COVID-19大流行期间，由于社交隔离和居家隔离，社交媒体平台成为沟通的理想渠道。此外，它也是大规模错误信息传播的主要来源，即所谓的“信息疫情”(infodemic)。因此，自动辟谣错误信息是一个至关重要的问题。为了解决这个问题，我们在Twitter上提出了两个COVID-19相关的错误信息数据集，并提出了一个基于机器学习算法和NLP技术、包含基于网络与基于内容两类流程的错误信息检测系统。在基于网络的流程中，我们关注社会属性、网络特征和用户。另一方面，在基于内容的流程中，我们直接利用tweet的内容对错误信息进行分类，包括文本分类模型(段落级和句子级)和相似度模型。基于网络流程的评估结果表明，人工神经网络模型的结果最好，F1值为88.68%。 摘要:During the COVID-19 pandemic, social media platforms were ideal for communicating due to social isolation and quarantine. Also, it was the primary source of misinformation dissemination on a large scale, referred to as the infodemic. Therefore, automatic debunking misinformation is a crucial problem. To tackle this problem, we present two COVID-19 related misinformation datasets on Twitter and propose a misinformation detection system comprising network-based and content-based processes based on machine learning algorithms and NLP techniques. In the network-based process, we focus on social properties, network characteristics, and users. On the other hand, we classify misinformation using the content of the tweets directly in the content-based process, which contains text classification models (paragraph-level and sentence-level) and similarity models. The evaluation results on the network-based process show the best results for the artificial neural network model with an F1 score of 88.68%. 
In the content-based process, our novel similarity models, which obtained an F1 score of 90.26%, show an improvement in the misinformation classification results compared to the network-based models. In addition, in the text classification models, the best result was achieved using the stacking ensemble-learning model by obtaining an F1 score of 95.18%. Furthermore, we test our content-based models on the Constraint@AAAI2021 dataset, and by getting an F1 score of 94.38%, we improve the baseline results. Finally, we develop a fact-checking website called Checkovid that uses each process to detect misinformative and informative claims in the domain of COVID-19 from different perspectives.

自动驾驶|车辆|车道检测等(1篇)

【1】 Training Electric Vehicle Charging Controllers with Imitation Learning 标题:用模拟学习方法训练电动汽车充电控制器

作者:Martin Pilát 备注:Submitted to ICTAI 2021 链接:https://arxiv.org/abs/2107.10111 摘要:随着电动汽车数量的增加,协调电动汽车充电的问题变得越来越重要。本文提出了一种电动汽车充电协调控制器的训练方法。与此主题的大多数现有工作不同,我们要求控制器保护用户的隐私,因此我们不允许控制器与任何第三方进行任何通信。为了训练控制器,我们使用了模仿学习的思想——我们首先用二次优化方法为问题的松弛版本找到一个最优解,然后训练控制器来模仿这个解。研究了最优解的正则化对控制器性能的影响。在实际数据上对该方法进行了评估,结果表明,与使用进化算法训练的类似控制器相比,该方法的性能和训练速度都有所提高。 摘要:The problem of coordinating the charging of electric vehicles gains more importance as the number of such vehicles grows. In this paper, we develop a method for the training of controllers for the coordination of EV charging. In contrast to most existing works on this topic, we require the controllers to preserve the privacy of the users, therefore we do not allow any communication from the controller to any third party. In order to train the controllers, we use the idea of imitation learning -- we first find an optimum solution for a relaxed version of the problem using quadratic optimization and then train the controllers to imitate this solution. We also investigate the effects of regularization of the optimum solution on the performance of the controllers. The method is evaluated on realistic data and shows improved performance and training speed compared to similar controllers trained using evolutionary algorithms.
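该文的流程是“先用二次优化求解松弛问题，再训练控制器模仿该解”。下面用一个玩具化的 NumPy 示意还原这一流程：松弛最优解取“尽量均匀充电”（固定总充电量下平方和最小的方案），控制器是以“剩余能量/剩余时间”为特征的线性模型，用最小二乘拟合专家动作。特征选择与场景规模均为我们的假设，并非论文设置。

```python
import numpy as np

def relaxed_schedule(energy_needed, horizon, max_rate):
    """Toy relaxed optimum: spread the required energy evenly over the
    horizon (minimises the sum of squared rates for a fixed total)."""
    rate = min(energy_needed / horizon, max_rate)
    return np.full(horizon, rate)

def train_imitation_controller(states, expert_actions):
    """Fit a linear controller action = states @ w by least squares
    on the expert demonstrations."""
    w, *_ = np.linalg.lstsq(states, expert_actions, rcond=None)
    return w

# Build demonstrations; the controller feature is remaining_energy / remaining_time
demos, actions = [], []
for energy, horizon in [(10.0, 5), (12.0, 4)]:
    plan = relaxed_schedule(energy, horizon, max_rate=5.0)
    remaining = energy
    for t, a in enumerate(plan):
        demos.append([remaining / (horizon - t)])
        actions.append(a)
        remaining -= a
states = np.array(demos)
w = train_imitation_controller(states, np.array(actions))
```

训练好的控制器只依赖本地可观测的状态特征，不需要与第三方通信，这与论文的隐私约束是一致的。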

点云|SLAM|雷达|激光|深度RGBD相关(1篇)

【1】 Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations 标题:具有一般RELU激活的Depth-2神经网络的高效学习算法

作者:Pranjal Awasthi,Alex Tang,Aravindan Vijayaraghavan 机构:Google Research, Northwestern University 备注:36 pages (including appendix) 链接:https://arxiv.org/abs/2107.10209 摘要:在温和的非简并假设下，我们提出多项式时间和样本有效的算法来学习具有一般ReLU激活的未知深度2前馈神经网络。特别地，我们考虑学习形如 $f(x) = a^{\mathsf{T}}\sigma(W^{\mathsf{T}}x + b)$ 的未知网络，其中 $x$ 服从高斯分布，$\sigma(t) := \max(t,0)$ 为 ReLU 激活。以往学习 ReLU 激活网络的工作均假设偏置 $b$ 为零。为了处理偏置项的存在，我们提出的算法对函数 $f(x)$ 的 Hermite 展开所产生的多个高阶张量进行鲁棒分解。利用这些思想，我们还在最小假设下建立了网络参数的可辨识性。 摘要:We present polynomial time and sample efficient algorithms for learning an unknown depth-2 feedforward neural network with general ReLU activations, under mild non-degeneracy assumptions. In particular, we consider learning an unknown network of the form $f(x) = a^{\mathsf{T}}\sigma(W^{\mathsf{T}}x + b)$, where $x$ is drawn from the Gaussian distribution, and $\sigma(t) := \max(t,0)$ is the ReLU activation. Prior works for learning networks with ReLU activations assume that the bias $b$ is zero. In order to deal with the presence of the bias terms, our proposed algorithm consists of robustly decomposing multiple higher order tensors arising from the Hermite expansion of the function $f(x)$. Using these ideas we also establish identifiability of the network parameters under minimal assumptions.
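论文的核心是分解 $f(x)$ 的 Hermite 展开产生的高阶张量。下面仅示意其中最低阶的一步：由 Stein 引理，对标准高斯输入有 $E[f(x)x] = E[\nabla f(x)]$，对单个 ReLU 单元该一阶矩与权重 $w$ 同向，因而蒙特卡罗估计一阶矩即可恢复权重方向（处理偏置与多个神经元需要论文中的高阶张量分解，此处从略）：

```python
import numpy as np

def first_moment_direction(f, dim, n_samples=200_000, seed=0):
    """Monte-Carlo estimate of E[f(x) x] for standard Gaussian x.

    By Stein's lemma this equals E[grad f(x)]; for a single ReLU unit
    f(x) = relu(w.x + b) it is proportional to w, so the first Hermite
    coefficient recovers the weight direction even with nonzero bias."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, dim))
    return (f(X)[:, None] * X).mean(axis=0)

# A single ReLU neuron with a nonzero bias (illustrative ground truth)
w_true = np.array([0.6, -0.8, 0.0])
b = 0.3
f = lambda X: np.maximum(X @ w_true + b, 0.0)

m = first_moment_direction(f, dim=3)
cos = m @ w_true / (np.linalg.norm(m) * np.linalg.norm(w_true))
```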

联邦学习|隐私保护|加密(1篇)

【1】 Federated Learning using Smart Contracts on Blockchains, based on Reward Driven Approach 标题:基于奖励驱动的区块链智能合约联合学习

作者:Monik Raj Behera,Sudhir Upadhyay,Suresh Shetty 备注:9 pages, 7 figures and 1 table 链接:https://arxiv.org/abs/2107.10243 摘要:近几年来,联邦机器学习在需要从数据中获取见解的同时,继续获得兴趣和动力,同时保护数据提供者的隐私。然而,在采用联合学习方面存在的其他挑战之一是缺乏公平、透明和普遍同意的奖励联合学习贡献者的激励计划。区块链网络上的智能合约为网络的所有参与者提供透明、不变和可独立验证的证据。我们利用区块链上智能合约的这种开放和透明的性质来定义贡献者的激励规则,它基于一种新的标量-联邦贡献。这种基于智能合约的奖励驱动模型有可能彻底改变企业采用联合学习的方式。我们的贡献有两个方面:第一,展示基于智能合约的区块链如何成为联邦学习的一个非常自然的沟通渠道。第二,利用这个基础设施,我们可以展示如何建立每个代理贡献的直观度量,并将其与训练和奖励流程的生命周期集成。 摘要:Over the recent years, Federated machine learning continues to gain interest and momentum where there is a need to draw insights from data while preserving the data provider's privacy. However, one among other existing challenges in the adoption of federated learning has been the lack of fair, transparent and universally agreed incentivization schemes for rewarding the federated learning contributors. Smart contracts on a blockchain network provide transparent, immutable and independently verifiable proofs by all participants of the network. We leverage this open and transparent nature of smart contracts on a blockchain to define incentivization rules for the contributors, which is based on a novel scalar quantity - federated contribution. Such a smart contract based reward-driven model has the potential to revolutionize the federated learning adoption in enterprises. Our contribution is two-fold: first is to show how smart contract based blockchain can be a very natural communication channel for federated learning. Second, leveraging this infrastructure, we can show how an intuitive measure of each agents' contribution can be built and integrated with the life cycle of the training and reward process.
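摘要中的激励依据是一个称为“联邦贡献”的标量，但未给出公式。下面以“留一法验证指标增益”作为该标量的一个假设性替身，只演示链下打分逻辑本身（真实方案是将此类规则写入区块链智能合约执行）：

```python
def federated_contribution(global_metric, metrics_without):
    """Leave-one-out contribution scores, normalised to sum to 1.

    metrics_without[i] is the validation metric of the global model
    aggregated WITHOUT agent i; the larger the drop relative to the
    full model, the larger the contribution credited to agent i."""
    raw = [max(global_metric - m, 0.0) for m in metrics_without]
    total = sum(raw)
    if total == 0:
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]
```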

推理|分析|理解|解释(7篇)

【1】 Answer-Set Programs for Reasoning about Counterfactual Interventions and Responsibility Scores for Classification 标题:关于反事实干预和分类责任得分推理的答案集程序

作者:Leopoldo Bertossi,Gabriela Reyes 机构:Universidad Adolfo Ib´a˜nez, and, Millennium Inst. for Foundational Research on Data (IMFD), Santiago, Chile 备注:Extended version with appendices of conference submission (under review). arXiv admin note: text overlap with arXiv:2106.10562 链接:https://arxiv.org/abs/2107.10159 摘要:我们描述了如何使用答案集程序声明性地指定对分类下实体的反事实干预,以及它们的原因。特别是,它们可以用来定义和计算责任分数,作为分类模型结果的基于归因的解释。该方法允许包含领域知识并支持查询应答。给出了一个朴素贝叶斯分类器的详细实例。 摘要:We describe how answer-set programs can be used to declaratively specify counterfactual interventions on entities under classification, and reason about them. In particular, they can be used to define and compute responsibility scores as attribution-based explanations for outcomes from classification models. The approach allows for the inclusion of domain knowledge and supports query answering. A detailed example with a naive-Bayes classifier is presented.

【2】 The Effectiveness of Intermediate-Task Training for Code-Switched Natural Language Understanding 标题:中级任务训练对语码转换自然语言理解的有效性

作者:Archiki Prasad,Mohammad Ali Rehan,Shreya Pathak,Preethi Jyothi 机构:Indian Institute of Technology, Bombay 链接:https://arxiv.org/abs/2107.09931 摘要:虽然最近的基准测试在改进多语言任务的预训练多语言模型的泛化方面激发了大量新的工作,但是改进代码转换自然语言理解任务的技术却很少被探索。在这项工作中,我们建议使用双语中间预训练作为一种可靠的技术,在使用代码切换文本的三种不同的自然语言处理任务上获得大而一致的性能增益。我们在印地语-英语自然语言推理(NLI)、问答(QA)和西班牙语-英语情感分析(SA)的平均准确度和F1分数上分别比以前的最先进系统有了7.87%、20.15%和10.99%的绝对提高。我们在四种不同的代码转换语言对(印地语英语、西班牙语英语、泰米尔语英语和马拉雅拉姆语英语)上展示了SA的一致性能增益。我们还提出了一种代码切换蒙面语言建模(MLM)预训练技术,与使用真实代码切换文本的标准MLM预训练相比,该技术始终有利于SA。 摘要:While recent benchmarks have spurred a lot of new work on improving the generalization of pretrained multilingual language models on multilingual tasks, techniques to improve code-switched natural language understanding tasks have been far less explored. In this work, we propose the use of bilingual intermediate pretraining as a reliable technique to derive large and consistent performance gains on three different NLP tasks using code-switched text. We achieve substantial absolute improvements of 7.87%, 20.15%, and 10.99%, on the mean accuracies and F1 scores over previous state-of-the-art systems for Hindi-English Natural Language Inference (NLI), Question Answering (QA) tasks, and Spanish-English Sentiment Analysis (SA) respectively. We show consistent performance gains on four different code-switched language-pairs (Hindi-English, Spanish-English, Tamil-English and Malayalam-English) for SA. We also present a code-switched masked language modelling (MLM) pretraining technique that consistently benefits SA compared to standard MLM pretraining using real code-switched text.

【3】 GLIME: A new graphical methodology for interpretable model-agnostic explanations 标题:GLIME:一种新的图形化解释方法--模型不可知性解释

作者:Zoumpolia Dikopoulou,Serafeim Moustakidis,Patrik Karlsson 链接:https://arxiv.org/abs/2107.09927 摘要:可解释人工智能(XAI)是一个新兴的领域,在这个领域中,一系列的过程和工具使人们能够更好地理解由黑盒模型生成的决策。然而,大多数可用的XAI工具通常仅限于简单的解释,主要是量化各个特性对模型输出的影响。因此,人类用户无法理解特征之间的相互关系以进行预测,而训练模型的内部工作机制仍然是隐藏的。本文致力于开发一种新的图形化解释工具,该工具不仅能显示模型的重要特征,而且能揭示特征之间的条件关系和推理,捕捉特征对模型决策的直接和间接影响。提出的XAI方法称为gLIME,它提供了全局(对于整个数据集)或局部(对于特定数据点)的图形模型不可知解释。它依赖于局部可解释模型不可知解释(LIME)与图形最小绝对收缩和选择算子(GLASSO)的结合,产生无向高斯图形模型。采用正则化方法将小的偏相关系数压缩到零,从而提供更稀疏、更易于解释的图形解释。选择两个著名的分类数据集(活检和OAI)来证实gLIME在稳健性和一致性方面优于LIME。具体来说,gLIME在两个数据集上实现了特征重要性方面的稳定性提高(76%-96%,而使用LIME则为52%-77%)。gLIME展示了一种独特的潜力,通过提供信息丰富的图形化解释,可以打开黑匣子,从而扩展XAI当前最先进的功能。 摘要:Explainable artificial intelligence (XAI) is an emerging new domain in which a set of processes and tools allow humans to better comprehend the decisions generated by black box models. However, most of the available XAI tools are often limited to simple explanations mainly quantifying the impact of individual features to the models' output. Therefore, human users are not able to understand how the features are related to each other to make predictions, whereas the inner workings of the trained models remain hidden. This paper contributes to the development of a novel graphical explainability tool that not only indicates the significant features of the model but also reveals the conditional relationships between features and the inference capturing both the direct and indirect impact of features to the models' decision. The proposed XAI methodology, termed as gLIME, provides graphical model-agnostic explanations either at the global (for the entire dataset) or the local scale (for specific data points). It relies on a combination of local interpretable model-agnostic explanations (LIME) with graphical least absolute shrinkage and selection operator (GLASSO) producing undirected Gaussian graphical models. 
Regularization is adopted to shrink small partial correlation coefficients to zero providing sparser and more interpretable graphical explanations. Two well-known classification datasets (BIOPSY and OAI) were selected to confirm the superiority of gLIME over LIME in terms of both robustness and consistency over multiple permutations. Specifically, gLIME accomplished increased stability over the two datasets with respect to features' importance (76%-96% compared to 52%-77% using LIME). gLIME demonstrates a unique potential to extend the functionality of the current state-of-the-art in XAI by providing informative graphically given explanations that could unlock black boxes.
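gLIME 的核心是“LIME 式的局部扰动 + graphical lasso 得到稀疏高斯图模型”。下面用 scikit-learn 的 GraphicalLasso 给出一个极简示意：在 x0 附近采样扰动，把黑盒预测作为额外变量，对 [特征, 预测] 联合估计稀疏精度矩阵（零的非对角元表示条件独立）。采样规模与 alpha 等均为假设值，并非论文设置。

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def glime_graph(model_predict, x0, n_samples=500, scale=0.5, alpha=0.05, seed=0):
    """Sketch of a gLIME-style local graphical explanation.

    Perturb around x0, append the black-box prediction as an extra
    variable, and fit a sparse Gaussian graphical model with graphical
    lasso. Returns the estimated precision matrix."""
    rng = np.random.default_rng(seed)
    X = x0 + scale * rng.standard_normal((n_samples, len(x0)))
    y = model_predict(X)
    Z = np.column_stack([X, y])
    gl = GraphicalLasso(alpha=alpha).fit(Z)
    return gl.precision_
```

若黑盒只依赖第 0 个特征，精度矩阵中“特征 0 与预测”的耦合应明显强于其余特征与预测的耦合。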

【4】 MG-NET: Leveraging Pseudo-Imaging for Multi-Modal Metagenome Analysis 标题:MG-NET:利用伪成像进行多模态元基因组分析

作者:Sathyanarayanan N. Aakur,Sai Narayanan,Vineela Indla,Arunkumar Bagavathi,Vishalini Laguduva Ramnath,Akhilesh Ramachandran 机构: Department of Computer Science, Oklahoma State University, Stillwater, OK, USA, Oklahoma Animal Disease Diagnostic Laboratory, College of Veterinary Medicine 备注:To appear in MICCAI 2021 链接:https://arxiv.org/abs/2107.09883 摘要:新病原体和人畜共患疾病(如SARS-CoV-2)的出现强调了开发新的诊断和干预管道的必要性,这些管道可以从少量标记数据中快速学习。结合下一代测序技术的进步,基于元基因组的诊断工具有望彻底改变快速定点诊断。然而,发展这样一种方法有着重大的挑战,其中主要的挑战是学习自我监督的表示法,这有助于用极少量的标记数据检测新的病原体特征。这是一个特别困难的任务,因为密切相关的病原体可以共享90%以上的基因组结构。在这项工作中,我们通过提出MG-Net来解决这些挑战,MG-Net是一个自我监督的表示学习框架,它利用了来自临床元基因组序列的伪成像数据的多模态上下文。我们表明,该框架可以从未标记的数据中学习健壮的表示,可用于下游任务,如对标记数据访问有限的元基因组序列分类。大量的实验表明,在每个类只有1000个样本的情况下,学习到的特征比当前的基线元基因组表现要好。 摘要:The emergence of novel pathogens and zoonotic diseases like the SARS-CoV-2 have underlined the need for developing novel diagnosis and intervention pipelines that can learn rapidly from small amounts of labeled data. Combined with technological advances in next-generation sequencing, metagenome-based diagnostic tools hold much promise to revolutionize rapid point-of-care diagnosis. However, there are significant challenges in developing such an approach, the chief among which is to learn self-supervised representations that can help detect novel pathogen signatures with very low amounts of labeled data. This is particularly a difficult task given that closely related pathogens can share more than 90% of their genome structure. In this work, we address these challenges by proposing MG-Net, a self-supervised representation learning framework that leverages multi-modal context using pseudo-imaging data derived from clinical metagenome sequences. We show that the proposed framework can learn robust representations from unlabeled data that can be used for downstream tasks such as metagenome sequence classification with limited access to labeled data. 
Extensive experiments show that the learned features outperform current baseline metagenome representations, given only 1000 samples per class.

【5】 Explainable AI Enabled Inspection of Business Process Prediction Models 标题:可解释的人工智能支持的业务流程预测模型检查

作者:Chun Ouyang,Renuka Sindhgatta,Catarina Moreira 备注:17 pages, 6 figures, 1 table 链接:https://arxiv.org/abs/2107.09767 摘要:以机器学习技术为基础的现代数据分析已经成为以数据为主导的决策自动化的关键因素。作为最新数据分析的一个重要分支,业务流程预测也面临着一个挑战,即缺乏对底层“黑箱”预测模型的推理和决策的解释。随着可解释机器学习技术的发展,可以为黑盒模型生成解释,使得(人类)用户能够访问机器学习预测背后的推理。在本文中,我们的目标是提出一种方法,允许我们使用模型解释来研究机器学习预测应用的某些推理,并检测潜在的问题,从而增强对业务流程预测模型的信任。我们的方法的一个新贡献是模型检查的建议,它利用了可解释的机器学习机制产生的解释和从记录历史进程执行的事件日志中提取的上下文或领域知识。从这项工作中得出的结论有望作为开发模型可靠性度量和业务流程预测上下文中的评估的关键输入。 摘要:Modern data analytics underpinned by machine learning techniques has become a key enabler to the automation of data-led decision making. As an important branch of state-of-the-art data analytics, business process predictions are also faced with a challenge in regard to the lack of explanation to the reasoning and decision by the underlying `black-box' prediction models. With the development of interpretable machine learning techniques, explanations can be generated for a black-box model, making it possible for (human) users to access the reasoning behind machine learned predictions. In this paper, we aim to present an approach that allows us to use model explanations to investigate certain reasoning applied by machine learned predictions and detect potential issues with the underlying methods thus enhancing trust in business process prediction models. A novel contribution of our approach is the proposal of model inspection that leverages both the explanations generated by interpretable machine learning mechanisms and the contextual or domain knowledge extracted from event logs that record historical process execution. Findings drawn from this work are expected to serve as a key input to developing model reliability metrics and evaluation in the context of business process predictions.

【6】 Delving Into Deep Walkers: A Convergence Analysis of Random-Walk-Based Vertex Embeddings 标题:深入研究深度漫游:基于随机游走的顶点嵌入的收敛性分析

作者:Dominik Kloepfer,Angelica I. Aviles-Rivero,Daniel Heydecker 机构:Department of Applied Mathematics and Theoretical Physics 链接:https://arxiv.org/abs/2107.10014 摘要:近年来，基于随机游走的图顶点嵌入技术的影响越来越大，它能有效地将图转换为更易于计算的格式，同时保留相关信息，在多个任务中表现出良好的性能。然而，这些算法的理论性质，特别是超参数和图结构对其收敛性的影响，至今还没有得到很好的理解。本文对基于随机游走的嵌入技术进行了理论分析。首先，我们证明了在一些较弱的假设下，由随机游走导出的顶点嵌入确实在随机游走数 $N \to \infty$ 的单极限和 $N$ 与每条游走长度 $L \to \infty$ 同时趋于无穷的双重极限下均收敛。其次，我们推导了集中界，量化了单极限与双重极限下语料的收敛速度。第三，我们利用这些结果推导出选择超参数 $N$ 和 $L$ 的一种启发式方法。我们通过在多个来自实际应用的图上进行一系列数值与可视化实验，验证并说明了研究结果的实际重要性。 摘要:Graph vertex embeddings based on random walks have become increasingly influential in recent years, showing good performance in several tasks as they efficiently transform a graph into a more computationally digestible format while preserving relevant information. However, the theoretical properties of such algorithms, in particular the influence of hyperparameters and of the graph structure on their convergence behaviour, have so far not been well-understood. In this work, we provide a theoretical analysis for random-walks based embeddings techniques. Firstly, we prove that, under some weak assumptions, vertex embeddings derived from random walks do indeed converge both in the single limit of the number of random walks $N \to \infty$ and in the double limit of both $N$ and the length of each random walk $L \to \infty$. Secondly, we derive concentration bounds quantifying the convergence rate of the corpora for the single and double limits. Thirdly, we use these results to derive a heuristic for choosing the hyperparameters $N$ and $L$. We validate and illustrate the practical importance of our findings with a range of numerical and visual experiments on several graphs drawn from real-world applications.
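论文分析的对象是“从随机游走生成语料、再训练嵌入”这类方法中的超参数 $N$（每个顶点的游走次数）与 $L$（游走长度）。下面是生成 DeepWalk 式语料的极简纯 Python 示意；得到的语料随后可喂给 word2vec 之类的模型（此处从略）：

```python
import random

def random_walks(adj, num_walks, walk_length, seed=0):
    """Generate a DeepWalk-style corpus: num_walks uniform random walks
    of walk_length vertices starting from every vertex of the graph.

    adj maps each vertex to the list of its neighbours; N = num_walks
    and L = walk_length are the hyperparameters analysed in the paper."""
    rng = random.Random(seed)
    corpus = []
    for _ in range(num_walks):
        for start in adj:
            walk = [start]
            for _ in range(walk_length - 1):
                walk.append(rng.choice(adj[walk[-1]]))
            corpus.append(walk)
    return corpus
```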

【7】 EMG Pattern Recognition via Bayesian Inference with Scale Mixture-Based Stochastic Generative Models 标题:基于尺度混合随机生成模型的贝叶斯推理肌电模式识别

作者:Akira Furui,Takuya Igaue,Toshio Tsuji 机构:Graduate School of Advanced Science and Engineering, Hiroshima University, Higashi-hiroshima ,-, Japan, Graduate School of Engineering, The University of Tokyo, Bunkyo-ku ,-, Japan 备注:This paper is accepted for publication in Expert Systems with Applications 链接:https://arxiv.org/abs/2107.09853 摘要:肌电图(EMG)由于能够反映人体运动意图,已被广泛应用于假手和信息设备的信号接口。虽然EMG分类方法已经被引入到基于EMG的控制系统中,但是它们没有完全考虑EMG信号的随机特性。提出了一种基于尺度混合生成模型的肌电模式分类方法。比例混合模型是一种随机肌电模型,它将肌电方差看作一个随机变量,使得方差中的不确定性得以表示。将该模型进行了扩展,并将其应用于肌电信号的模式分类。该方法通过变分贝叶斯学习进行训练,实现了模型复杂度的自动确定。此外,为了用部分判别法优化该方法的超参数,提出了一种基于互信息的确定方法。仿真和肌电分析实验验证了超参数与分类精度的关系以及该方法的有效性。使用公共肌电数据集进行的比较表明,该方法优于各种传统的分类器。这些结果表明了所提方法的有效性及其对肌电控制系统的适用性。在肌电模式识别中,基于能反映肌电信号随机特征的生成模型的分类器比传统的通用分类器具有更好的识别效果。 摘要:Electromyogram (EMG) has been utilized to interface signals for prosthetic hands and information devices owing to its ability to reflect human motion intentions. Although various EMG classification methods have been introduced into EMG-based control systems, they do not fully consider the stochastic characteristics of EMG signals. This paper proposes an EMG pattern classification method incorporating a scale mixture-based generative model. A scale mixture model is a stochastic EMG model in which the EMG variance is considered as a random variable, enabling the representation of uncertainty in the variance. This model is extended in this study and utilized for EMG pattern classification. The proposed method is trained by variational Bayesian learning, thereby allowing the automatic determination of the model complexity. Furthermore, to optimize the hyperparameters of the proposed method with a partial discriminative approach, a mutual information-based determination method is introduced. Simulation and EMG analysis experiments demonstrated the relationship between the hyperparameters and classification accuracy of the proposed method as well as the validity of the proposed method. 
The comparison using public EMG datasets revealed that the proposed method outperformed the various conventional classifiers. These results indicated the validity of the proposed method and its applicability to EMG-based control systems. In EMG pattern recognition, a classifier based on a generative model that reflects the stochastic characteristics of EMG signals can outperform the conventional general-purpose classifier.
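尺度混合模型把肌电方差本身当作随机变量:先从某个先验中抽取方差,再在给定方差下抽取高斯观测。下面以逆伽马先验为例给出一个示意性采样(先验选择与参数均为本文假设,仅用于说明"方差不确定性"的含义):

```python
import numpy as np

def sample_scale_mixture(n, alpha=3.0, beta=2.0, seed=0):
    """Scale mixture of Gaussians: variance ~ InverseGamma(alpha, beta),
    signal | variance ~ N(0, variance).  Marginally this yields a
    heavy-tailed (Student-t-like) amplitude distribution, which matches
    the stochastic character of EMG signals better than a plain Gaussian.
    """
    rng = np.random.default_rng(seed)
    # Inverse-gamma draws obtained as beta divided by gamma draws.
    variance = beta / rng.gamma(alpha, 1.0, size=n)
    return rng.normal(0.0, np.sqrt(variance))

x = sample_scale_mixture(10000)
```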

检测相关(1篇)

【1】 Window Detection In Facade Imagery: A Deep Learning Approach Using Mask R-CNN 标题:幕墙图像中的窗口检测:一种基于Mask R-CNN的深度学习方法

作者:Nils Nordmark,Mola Ayenew 机构: University of Gothenburg 备注:13 pages, 65 figures, 1 table 链接:https://arxiv.org/abs/2107.10006 摘要:在计算机视觉中,建筑物外立面中的窗口解析是一项长期以来一直被人们所期待但却极具挑战性的任务。它对于城市分析、语义重建、生命周期分析、数字孪生和场景解析以及其他需要高质量语义数据的建筑相关任务至关重要。本文研究了Mask R-CNN框架在立面图像窗口检测中的应用。我们利用迁移学习,在COCO预训练权重的基础上用自行收集的立面街景图像数据集训练所提出的方法,以生成新窗口类的实例分割。实验结果表明,我们提出的方法仅通过迁移学习和数据增强在相对较小的数据集上训练网络,即使不使用后优化技术,也能取得与以往最先进的窗口检测方法相当的结果。 摘要:The parsing of windows in building facades is a long-desired but challenging task in computer vision. It is crucial to urban analysis, semantic reconstruction, lifecycle analysis, digital twins, and scene parsing amongst other building-related tasks that require high-quality semantic data. This article investigates the usage of the mask R-CNN framework to be used for window detection of facade imagery input. We utilize transfer learning to train our proposed method on COCO weights with our own collected dataset of street view images of facades to produce instance segmentations of our new window class. Experimental results show that our suggested approach with a relatively small dataset trains the network only with transfer learning and augmentation achieves results on par with prior state-of-the-art window detection approaches, even without post-optimization techniques.

分类|识别(5篇)

【1】 Distribution of Classification Margins: Are All Data Equal? 标题:分类边距的分布:所有数据都相等吗?

作者:Andrzej Banburski,Fernanda De La Torre,Nishka Pant,Ishana Shastri,Tomaso Poggio 机构: Brown University 备注:Previously online as CBMM Memo 115 on the CBMM MIT site 链接:https://arxiv.org/abs/2107.10199 摘要:最近的理论结果表明,在指数损失函数下,深度神经网络的梯度下降使分类裕度局部最大,这相当于在裕度约束下使权重矩阵的范数最小化。然而,解的这一性质并不能完全描述其泛化性能。我们从理论上证明了训练集上裕度分布曲线下的面积实际上是一个很好的泛化度量。然后,我们证明,在实现数据分离后,可以动态地将训练集减少99%以上,而不会显著降低性能。有趣的是,得到的"高容量"特征子集在不同的训练运行中并不一致,这与理论上的观点一致,即在SGD下,以及在存在批量归一化和权重衰减的情况下,所有训练点都应收敛到相同的渐近裕度。 摘要:Recent theoretical results show that gradient descent on deep neural networks under exponential loss functions locally maximizes classification margin, which is equivalent to minimizing the norm of the weight matrices under margin constraints. This property of the solution however does not fully characterize the generalization performance. We motivate theoretically and show empirically that the area under the curve of the margin distribution on the training set is in fact a good measure of generalization. We then show that, after data separation is achieved, it is possible to dynamically reduce the training set by more than 99% without significant loss of performance. Interestingly, the resulting subset of "high capacity" features is not consistent across different training runs, which is consistent with the theoretical claim that all training points should converge to the same asymptotic margin under SGD and in the presence of both batch normalization and weight decay.
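论文的度量是训练集上分类裕度(margin)分布曲线下的面积。对多分类 logits,单个样本的裕度是真实类 logit 减去其余类中最大的 logit;下面给出一个简化的示意实现(将"曲线下面积"简化为排序后裕度的均值,属本文假设,并非论文的精确定义):

```python
import numpy as np

def margin_auc(logits, labels):
    """Margin of an example = true-class logit minus the largest
    other-class logit.  As a simplification (our assumption), the area
    under the sorted-margin curve is taken to be the mean margin.
    """
    idx = np.arange(len(labels))
    true = logits[idx, labels]
    masked = logits.astype(float).copy()
    masked[idx, labels] = -np.inf  # exclude the true class from the max
    margins = true - masked.max(axis=1)
    return float(np.sort(margins).mean())

logits = np.array([[2.0, 0.5, -1.0], [0.2, 1.5, 0.0]])
auc = margin_auc(logits, np.array([0, 1]))  # margins 1.5 and 1.3 -> 1.4
```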

【2】 CGANs with Auxiliary Discriminative Classifier 标题:带辅助判别分类器的CGAN

作者:Liang Hou,Qi Cao,Huawei Shen,Xueqi Cheng 机构:Data Intelligence System Research Center, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences 链接:https://arxiv.org/abs/2107.10060 摘要:条件生成模型旨在学习数据和标签的联合分布,从而实现条件生成。其中,辅助分类器生成对抗网络(AC-GAN)得到了广泛的应用,但存在生成样本类内多样性低的问题。本文指出其根本原因是AC-GAN的分类器不依赖于生成器,因此不能为生成器提供信息指导来逼近目标的联合分布,导致条件熵被最小化,从而降低了类内多样性。基于这一发现,我们提出了一种新的带有辅助判别分类器的cGANs(ADC-GAN)来解决AC-GAN的问题。具体地说,辅助判别分类器通过在识别标签的同时区分真实数据和生成数据,从而感知生成器。然后,我们基于辅助分类器和原始判别器对生成器进行优化,使生成样本的联合分布和边缘分布与真实样本的联合分布和边缘分布相匹配。我们在合成数据集和真实数据集上提供了理论分析和实证证据,以证明所提出的ADC-GAN相比其他竞争性cGANs的优越性。 摘要:Conditional generative models aim to learn the underlying joint distribution of data and labels, and thus realize conditional generation. Among them, auxiliary classifier generative adversarial networks (AC-GAN) have been widely used, but suffer from the issue of low intra-class diversity on generated samples. In this paper, we point out that the fundamental reason is that the classifier of AC-GAN is generator-agnostic, and thus cannot provide informative guidance to the generator to approximate the target joint distribution, leading to a minimization of conditional entropy that decreases the intra-class diversity. Based on this finding, we propose novel cGANs with auxiliary discriminative classifier (ADC-GAN) to address the issue of AC-GAN. Specifically, the auxiliary discriminative classifier becomes generator-aware by distinguishing between the real and fake data while recognizing their labels. We then optimize the generator based on the auxiliary classifier along with the original discriminator to match the joint and marginal distributions of the generated samples with those of the real samples. We provide theoretical analysis and empirical evidence on synthetic and real-world datasets to demonstrate the superiority of the proposed ADC-GAN compared to competitive cGANs.

【3】 Integration of Autoencoder and Functional Link Artificial Neural Network for Multi-label Classification 标题:自动编码器与函数连接人工神经网络相结合的多标签分类

作者:Anwesha Law,Ashish Ghosh 链接:https://arxiv.org/abs/2107.09904 摘要:多标签分类是当前一个非常活跃的研究课题,它处理的是由于单个数据实例可同时具有多个活动标签而产生的复杂且相互重叠的边界。我们提出了一种能够提取底层特征并引入非线性的分类器来处理复杂的决策边界。提出了一种新的神经网络模型,其中输入特征经过两种变换,分别来自多标签函数链人工神经网络和自动编码器。首先,利用基函数对原始特征进行函数扩展。随后,由自动编码器对扩展后的特征进行变换和降维。该网络通过两层变换提高了多标签数据的可分性,同时将扩展后的特征空间缩小到更易于管理的程度。这样就平衡了输入维度,即使对于有限的数据量,也可以获得更好的分类性能。该网络已在5个ML数据集上进行了验证,与6个成熟的ML分类器相比,显示了其优越的性能。此外,我们还同时构造了该网络的单标签变体,并在四个相关数据集上与三个现有分类器进行了对比测试,以验证其有效性。 摘要:Multi-label (ML) classification is an actively researched topic currently, which deals with convoluted and overlapping boundaries that arise due to several labels being active for a particular data instance. We propose a classifier capable of extracting underlying features and introducing non-linearity to the data to handle the complex decision boundaries. A novel neural network model has been developed where the input features are subjected to two transformations adapted from multi-label functional link artificial neural network and autoencoders. First, a functional expansion of the original features are made using basis functions. This is followed by an autoencoder-aided transformation and reduction on the expanded features. This network is capable of improving separability for the multi-label data owing to the two-layer transformation while reducing the expanded feature space to a more manageable amount. This balances the input dimension which leads to a better classification performance even for a limited amount of data. The proposed network has been validated on five ML datasets which shows its superior performance in comparison with six well-established ML classifiers. Furthermore, a single-label variation of the proposed network has also been formulated simultaneously and tested on four relevant datasets against three existing classifiers to establish its effectiveness.

【4】 Regularized Classification-Aware Quantization 标题:正则化分类感知量化

作者:Daniel Severo,Elad Domanovitz,Ashish Khisti 机构:Electrical and Computer Engineering, University of Toronto, Toronto, Canada 备注:Accepted to the 30th Biennial Symposium on Communications (BSC) 2021 链接:https://arxiv.org/abs/2107.09716 摘要:传统上,量化是为了最小化数据源的重建误差。当考虑下游分类任务时,其他失真度量可能是有意义的;如0-1分类损失。此外,当这些量化器被部署到生产环境中时,它们的性能最好不会恶化,因为在线重新学习方案并不总是可能的。在这项工作中,我们提出了一类算法,学习分布式量化方案的二进制分类任务。我们的方法在看不见的数据上有很好的表现,并且比以前的方法速度快,和数据集大小的二次项成正比。它的工作原理是用重建误差正则化0-1损失。我们对合成的混合和二元高斯数据进行了实验,并将训练、测试和泛化误差与文献中的一系列基准量化方案进行了比较。我们的方法称为正则化分类感知量化。 摘要:Traditionally, quantization is designed to minimize the reconstruction error of a data source. When considering downstream classification tasks, other measures of distortion can be of interest; such as the 0-1 classification loss. Furthermore, it is desirable that the performance of these quantizers not deteriorate once they are deployed into production, as relearning the scheme online is not always possible. In this work, we present a class of algorithms that learn distributed quantization schemes for binary classification tasks. Our method performs well on unseen data, and is faster than previous methods proportional to a quadratic term of the dataset size. It works by regularizing the 0-1 loss with the reconstruction error. We present experiments on synthetic mixture and bivariate Gaussian data and compare training, testing, and generalization errors with a family of benchmark quantization schemes from the literature. Our method is called Regularized Classification-Aware Quantization.
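该方法的核心是用重建误差正则化 0-1 损失。下面在一维特征、单阈值二元量化器这一最简单情形下示意该目标函数(实现细节与函数名为本文假设,并非论文算法本身):

```python
import numpy as np

def learn_quantizer(x, y, lam=0.1):
    """Pick a single binary-quantizer threshold for scalar features x by
    minimizing 0-1 classification loss regularized with reconstruction
    error (mean squared distance to each cell's centroid).
    A hypothetical one-dimensional sketch of the paper's objective.
    """
    best = (np.inf, None)
    for t in np.unique(x):
        cell = x > t
        # 0-1 loss of predicting the majority label in each quantization cell.
        loss01 = sum(min(np.mean(y[m] == 0), np.mean(y[m] == 1)) * m.mean()
                     for m in (cell, ~cell) if m.any())
        # Reconstruction error when each cell is represented by its centroid.
        recon = sum(((x[m] - x[m].mean()) ** 2).mean() * m.mean()
                    for m in (cell, ~cell) if m.any())
        obj = loss01 + lam * recon
        if obj < best[0]:
            best = (obj, t)
    return best[1]

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0, 0, 1, 1])
threshold = learn_quantizer(x, y)
```

当 lam=0 时该目标退化为纯分类感知量化;lam 越大,所选阈值越接近最小化重建误差的经典量化器,这正是论文用以稳定部署后性能的正则化思路。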

【5】 Quantum Measurement Classification with Qudits 标题:基于Qudits的量子测量分类

作者:Diego H. Useche,Andres Giraldo-Carvajal,Hernan M. Zuluaga-Bucheli,Jose A. Jaramillo-Villegas,Fabio A. González 备注:15 pages, 10 figures 链接:https://arxiv.org/abs/2107.09781 摘要:本文提出了一个用于密度估计和监督分类的经典-量子混合程序。该程序在高维量子计算机模拟器中以量子电路的形式实现。我们证明了所提出的量子协议允许估计概率密度函数,并以监督学习的方式进行预测。该模型可以推广到在高维量子计算机中求密度矩阵的期望值。我们在多个数据集上进行了实验。结果表明,该方法是一种在高维量子计算机上实现有监督分类和密度估计的可行策略。 摘要:This paper presents a hybrid classical-quantum program for density estimation and supervised classification. The program is implemented as a quantum circuit in a high-dimensional quantum computer simulator. We show that the proposed quantum protocols allow to estimate probability density functions and to make predictions in a supervised learning manner. This model can be generalized to find expected values of density matrices in high-dimensional quantum computers. Experiments on various data sets are presented. Results show that the proposed method is a viable strategy to implement supervised classification and density estimation in a high-dimensional quantum computer.

表征(1篇)

【1】 Deep learning for temporal data representation in electronic health records: A systematic review of challenges and methodologies 标题:电子健康记录中时间数据表示的深度学习:挑战和方法的系统回顾

作者:Feng Xie,Han Yuan,Yilin Ning,Marcus Eng Hock Ong,Mengling Feng,Wynne Hsu,Bibhas Chakraborty,Nan Liu 机构: Programme in Health Services and Systems Research, Duke-NUS Medical School, Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore, Department of Emergency Medicine, Singapore General Hospital, Singapore 链接:https://arxiv.org/abs/2107.09951 摘要:目的:时态电子健康档案(EHRs)可作为临床事件预测、慢性病管理等二次利用的丰富信息。然而,时态数据表示存在挑战。因此,我们试图找出这些挑战,并通过对深度学习解决方案的系统研究来评估应对这些挑战的新方法。方法:我们搜索了五个数据库(PubMed、EMBASE、电气与电子工程师协会[IEEE]Xplore数字图书馆、计算机械协会[ACM]数字图书馆和科学网),并在一些著名的计算机科学会议论文集中进行了手工搜索。从2010年1月1日到2020年8月30日,我们寻找了关于结构化EHR数据中时态数据表示的深度学习方法的文章。我们从时间序列的性质、方法论和模型实现三个角度对所选文章进行了总结和分析。结果:我们收集了98篇关于深度学习的时态数据表示的文章。确定了四个主要挑战,包括数据不规则性、数据异质性、数据稀疏性和模型不透明性。然后我们研究了如何应用深度学习技术来应对这些挑战。最后,我们讨论了深度学习带来的一些开放性挑战。结论:时态EHR数据对临床预测建模和数据利用提出了几个主要挑战。在某种程度上,当前的深度学习解决方案可以解决这些挑战。未来的研究可以考虑设计综合的和完整的解决方案。此外,研究人员应将额外的临床领域知识纳入研究设计,并提高模型的可解释性,以促进其在临床实践中的实施。 摘要:Objective: Temporal electronic health records (EHRs) can be a wealth of information for secondary uses, such as clinical events prediction or chronic disease management. However, challenges exist for temporal data representation. We therefore sought to identify these challenges and evaluate novel methodologies for addressing them through a systematic examination of deep learning solutions. Methods: We searched five databases (PubMed, EMBASE, the Institute of Electrical and Electronics Engineers [IEEE] Xplore Digital Library, the Association for Computing Machinery [ACM] digital library, and Web of Science) complemented with hand-searching in several prestigious computer science conference proceedings. We sought articles that reported deep learning methodologies on temporal data representation in structured EHR data from January 1, 2010, to August 30, 2020. We summarized and analyzed the selected articles from three perspectives: nature of time series, methodology, and model implementation. 
Results: We included 98 articles related to temporal data representation using deep learning. Four major challenges were identified, including data irregularity, data heterogeneity, data sparsity, and model opacity. We then studied how deep learning techniques were applied to address these challenges. Finally, we discuss some open challenges arising from deep learning. Conclusion: Temporal EHR data present several major challenges for clinical prediction modeling and data utilization. To some extent, current deep learning solutions can address these challenges. Future studies can consider designing comprehensive and integrated solutions. Moreover, researchers should incorporate additional clinical domain knowledge into study designs and enhance the interpretability of the model to facilitate its implementation in clinical practice.

优化|敛散性(2篇)

【1】 Neural Fixed-Point Acceleration for Convex Optimization 标题:凸优化的神经不动点加速算法

作者:Shobha Venkataraman,Brandon Amos 机构:Facebook AI 备注:AutoML@ICML2021 链接:https://arxiv.org/abs/2107.10254 摘要:不动点迭代是数值计算的核心,在实时应用中常常是计算瓶颈,而实时应用通常需要中等精度的快速解。经典的不动点问题加速方法侧重于设计具有适用于任何不动点问题的理论保证的算法。利用元学习和经典加速算法的思想,我们提出了一个神经不动点加速框架,该框架可以自动学习加速从某一分布中抽取的凸不动点问题。我们将我们的框架应用于SCS(最先进的凸锥规划求解器),并设计模型和损失函数,以克服学习展开优化和加速不稳定性的挑战。我们的工作将神经加速引入到任何可以用CVXPY表示的优化问题中。本文的源代码可在 https://github.com/facebookresearch/neural-scs 获取。 摘要:Fixed-point iterations are at the heart of numerical computing and are often a computational bottleneck in real-time applications, which typically instead need a fast solution of moderate accuracy. Classical acceleration methods for fixed-point problems focus on designing algorithms with theoretical guarantees that apply to any fixed-point problem. We present neural fixed-point acceleration, a framework to automatically learn to accelerate convex fixed-point problems that are drawn from a distribution, using ideas from meta-learning and classical acceleration algorithms. We apply our framework to SCS, the state-of-the-art solver for convex cone programming, and design models and loss functions to overcome the challenges of learning over unrolled optimization and acceleration instabilities. Our work brings neural acceleration into any optimization problem expressible with CVXPY. The source code behind this paper is available at https://github.com/facebookresearch/neural-scs
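作为参照,经典不动点加速(如 Aitken/Steffensen 外推)对任意不动点问题使用同一条固定规则,而神经加速则针对问题分布学习专用的加速器。下面是标量情形的经典 Aitken 加速示意(与论文的神经方法无关,仅说明"加速不动点迭代"指什么):

```python
import math

def aitken_fixed_point(f, x0, iters=10):
    """Classical Aitken delta-squared (Steffensen) acceleration for a
    scalar fixed-point problem x = f(x) -- the kind of hand-designed
    accelerator that the neural framework learns to replace with a
    problem-distribution-specific one.
    """
    x = x0
    for _ in range(iters):
        x1, x2 = f(x), f(f(x))
        denom = x2 - 2 * x1 + x
        if abs(denom) < 1e-15:   # already converged
            return x2
        # Aitken extrapolation step.
        x = x - (x1 - x) ** 2 / denom
    return x

root = aitken_fixed_point(math.cos, 1.0)  # converges to the Dottie number ~0.739085
```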

【2】 On the Convergence of Prior-Guided Zeroth-Order Optimization Algorithms 标题:关于先验引导零阶优化算法的收敛性

作者:Shuyu Cheng,Guoqiang Wu,Jun Zhu 机构:Dept. of Comp. Sci. and Tech., BNRist Center, State Key Lab for Intell. Tech. & Sys., Institute for AI, THBI Lab, Tsinghua University, Beijing, China 备注:Code available at this https URL 链接:https://arxiv.org/abs/2107.10110 摘要:零阶优化被广泛应用于处理具有挑战性的任务,如基于查询的黑盒对抗攻击和强化学习。在基于有限差分的梯度估计过程中,人们尝试了多种方法来整合先验信息,并取得了很好的实证结果。然而,它们的收敛性还不是很清楚。本文在贪心下降框架下,利用不同的梯度估计量,分析了先验引导ZO算法的收敛性,试图填补这一空白。为先验随机无梯度(PRGF)算法的收敛性提供了保证。此外,为了进一步加速贪心下降法,我们提出了一个新的加速随机搜索(ARS)算法,结合先验信息,并进行了收敛性分析。最后,我们的理论结果被几个数值基准和对抗性攻击的实验所证实。 摘要:Zeroth-order (ZO) optimization is widely used to handle challenging tasks, such as query-based black-box adversarial attacks and reinforcement learning. Various attempts have been made to integrate prior information into the gradient estimation procedure based on finite differences, with promising empirical results. However, their convergence properties are not well understood. This paper makes an attempt to fill this gap by analyzing the convergence of prior-guided ZO algorithms under a greedy descent framework with various gradient estimators. We provide a convergence guarantee for the prior-guided random gradient-free (PRGF) algorithms. Moreover, to further accelerate over greedy descent methods, we present a new accelerated random search (ARS) algorithm that incorporates prior information, together with a convergence analysis. Finally, our theoretical results are confirmed by experiments on several numerical benchmarks as well as adversarial attacks.
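基于有限差分的零阶梯度估计可以把先验方向(例如上一步的梯度估计或替代模型的梯度)与随机方向一起纳入方向导数平均。下面是一个简化的 PRGF 风格估计器示意(方向权重与采样方式均为本文假设,并非论文中的精确方案):

```python
import numpy as np

def prgf_gradient(f, x, prior, mu=1e-4, q=10, seed=0):
    """Prior-guided random gradient-free (PRGF-style) estimator: average
    finite-difference directional derivatives over q random unit
    directions plus one prior direction.  Simplified sketch.
    """
    rng = np.random.default_rng(seed)
    dirs = [prior / np.linalg.norm(prior)]
    dirs += [v / np.linalg.norm(v) for v in rng.standard_normal((q, x.size))]
    g = np.zeros_like(x)
    for u in dirs:
        # Forward finite difference along direction u.
        g += (f(x + mu * u) - f(x)) / mu * u
    return g / len(dirs)

f = lambda z: float(z @ z)          # true gradient is 2z
x = np.array([1.0, -2.0, 3.0])
g = prgf_gradient(f, x, prior=x)    # prior happens to align with the gradient
```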

预测|估计(4篇)

【1】 Predicting Issue Types on GitHub 标题:预测GitHub上的问题类型

作者:Rafael Kallis,Andrea Di Sorbo,Gerardo Canfora,Sebastiano Panichella 机构:Valdon Group, Seilergraben , Zurich, Switzerland, University of Sannio, Piazza Guerrazzi, Benevento, Italy, Zurich University of Applied Sciences, Obere Kirchgasse , Winterthur, Switzerland 链接:https://arxiv.org/abs/2107.09936 摘要:软件维护和演化涉及到软件项目成功的关键活动。为了支持这些活动并保持代码的最新性和无错误性,软件社区使用问题跟踪器,即用于发送信号、处理和解决软件系统中发生的问题的工具。然而,在热门项目中,每天都会提交几十到几百个问题报告。在这种情况下,确定每个提交报告的类型(例如,错误报告、功能请求等)将有助于管理和确定要解决的问题的优先次序。为了支持问题处理活动,本文提出了一个GitHub应用程序Ticket Tagger,它通过机器学习技术分析问题标题和描述,自动识别GitHub上提交的报告类型,并为每个问题分配相应的标签。我们对该工具在大约30000个GitHub问题上的预测性能进行了实证评估。我们的结果表明,Ticket Tagger能够以相当高的效率为GitHub问题识别并分配正确的标签。考虑到这些结果以及该工具被设计为易于集成到GitHub问题管理过程中这一事实,Ticket Tagger对于开发人员来说是一个有用的解决方案。 摘要:Software maintenance and evolution involves critical activities for the success of software projects. To support such activities and keep code up-to-date and error-free, software communities make use of issue trackers, i.e., tools for signaling, handling, and addressing the issues occurring in software systems. However, in popular projects, tens or hundreds of issue reports are daily submitted. In this context, identifying the type of each submitted report (e.g., bug report, feature request, etc.) would facilitate the management and the prioritization of the issues to address. To support issue handling activities, in this paper, we propose Ticket Tagger, a GitHub app analyzing the issue title and description through machine learning techniques to automatically recognize the types of reports submitted on GitHub and assign labels to each issue accordingly. We empirically evaluated the tool's prediction performance on about 30,000 GitHub issues. Our results show that the Ticket Tagger can identify the correct labels to assign to GitHub issues with reasonably high effectiveness. Considering these results and the fact that the tool is designed to be easily integrated in the GitHub issue management process, Ticket Tagger consists in a useful solution for developers.

【2】 High-dimensional Multivariate Time Series Forecasting in IoT Applications using Embedding Non-stationary Fuzzy Time Series 标题:物联网应用中嵌入非平稳模糊时间序列的高维多变量时间序列预测

作者:Hugo Vinicius Bitencourt,Frederico Gadelha Guimarães 机构:Machine Intelligence and Data Science Lab (MINDS), Graduate Program in Electrical Engineering, Universidade Federal de Minas Gerais, Av. Antônio Carlos ,-, Belo Horizonte, MG, Brazil 备注:6 pages, 1 figure, submitted to the 7th IEEE LA-CCI (Latin American Conference on Computational Intelligence) 链接:https://arxiv.org/abs/2107.09785 摘要:在物联网(IoT)中,数据是从不同的数据源连续记录的,设备的嵌入式电子元件可能发生故障,从而导致高维数据集和概念漂移事件。因此,能够处理高维非平稳时间序列的方法在物联网应用中具有重要价值。模糊时间序列(Fuzzy Time Series,FTS)模型是数据驱动的非参数模型,具有实现简单、精度高等特点。不幸的是,FTS在处理多变量数据集和概念漂移的场景时遇到了困难。本文提出了一种处理高维非平稳时间序列的新方法,将原始高维数据投影到低维嵌入空间,并采用FTS方法。结合这些技术可以更好地表示非平稳多元时间序列的复杂内容和准确的预测。该模型能解释98%的方差,RMSE、MAE和MAPE分别达到11.52%、2.68%和2.91%。 摘要:In Internet of things (IoT), data is continuously recorded from different data sources and devices can suffer faults in their embedded electronics, thus leading to a high-dimensional data sets and concept drift events. Therefore, methods that are capable of high-dimensional non-stationary time series are of great value in IoT applications. Fuzzy Time Series (FTS) models stand out as data-driven non-parametric models of easy implementation and high accuracy. Unfortunately, FTS encounters difficulties when dealing with data sets of many variables and scenarios with concept drift. We present a new approach to handle high-dimensional non-stationary time series, by projecting the original high-dimensional data into a low dimensional embedding space and using FTS approach. Combining these techniques enables a better representation of the complex content of non-stationary multivariate time series and accurate forecasts. Our model is able to explain 98% of the variance and reach 11.52% of RMSE, 2.68% of MAE and 2.91% of MAPE.

【3】 Statistical Estimation from Dependent Data 标题:相依数据的统计估计

作者:Yuval Dagan,Constantinos Daskalakis,Nishanth Dikkala,Surbhi Goel,Anthimos Vardis Kandiros 机构:EECS & CSAIL, MIT, Google Research, Microsoft Research NYC 备注:41 pages, ICML 2021 链接:https://arxiv.org/abs/2107.09773 摘要:我们考虑一个一般的统计估计问题:不同观测的二元标签在给定其特征向量的条件下并不相互独立,而是相互依赖,以刻画诸如在空间域、时间域或社交网络上收集观测从而引入依赖性的场景。我们用马尔可夫随机场的语言来建模这些依赖关系,重要的是,允许这些依赖关系是实质性的,也就是说,我们并不假设刻画这些依赖关系的马尔可夫随机场处于高温区。作为我们的主要贡献,我们为这个模型提供了算法和统计上有效的估计率,并给出了logistic回归、稀疏logistic回归和依赖数据的神经网络设置下的一些实例。我们的估计保证源于从单个样本估计Ising模型参数(即外场和相互作用强度)的新结果。我们在真实的网络数据上评估了我们的估计方法,结果表明,在三个文本分类数据集(Cora、Citeseer和Pubmed)上,它优于忽略依赖关系的标准回归方法。 摘要:We consider a general statistical estimation problem wherein binary labels across different observations are not independent conditioned on their feature vectors, but dependent, capturing settings where e.g. these observations are collected on a spatial domain, a temporal domain, or a social network, which induce dependencies. We model these dependencies in the language of Markov Random Fields and, importantly, allow these dependencies to be substantial, i.e. do not assume that the Markov Random Field capturing these dependencies is in high temperature. As our main contribution we provide algorithms and statistically efficient estimation rates for this model, giving several instantiations of our bounds in logistic regression, sparse logistic regression, and neural network settings with dependent data. Our estimation guarantees follow from novel results for estimating the parameters (i.e. external fields and interaction strengths) of Ising models from a single sample. We evaluate our estimation approach on real networked data, showing that it outperforms standard regression approaches that ignore dependencies, across three text classification datasets: Cora, Citeseer and Pubmed.

【4】 Predicting trajectory behaviour via machine-learned invariant manifolds 标题:基于机器学习不变流形的弹道行为预测

作者:Vladimír Krajňák,Shibabrat Naik,Stephen Wiggins 机构: School of Mathematics, University of Bristol, Fry building, Woodland Road, Bristol BS, UG, United Kingdom 链接:https://arxiv.org/abs/2107.10154 摘要:在本文中,我们使用支持向量机(SVM)发展了一个机器学习框架,以发现能够区分不同反应路径的相空间结构。机器学习模型是利用哈密顿方程的轨迹数据来训练的,但同样适用于分子动力学模拟。该框架专门设计为只需要对系统动力学的最少先验知识。我们用Chesnavich提出的离子与分子反应的模型哈密顿量来检验我们的方法,它由两部分组成:一部分是代表 $\text{CH}_3^{+}$ 离子的刚性对称陀螺,另一部分是可移动的 $\text{H}$ 原子。我们从轨迹开始,使用支持向量机来确定不同类别轨迹对应的初始条件之间的边界。然后,我们证明了不同轨迹类别之间的这些边界近似于Chesnavich模型早期分析中观察到的同类型不变相空间结构。我们的方法在设计时考虑到了向高维应用的扩展。支持向量机即使在数据量很小的情况下也能很好地工作,因此对于高维系统和轨迹积分代价高昂的系统,我们的方法在计算上比现有方法更为适用。 摘要:In this paper we use support vector machines (SVM) to develop a machine learning framework to discover the phase space structure that can distinguish between distinct reaction pathways. The machine learning model is trained using data from trajectories of Hamilton's equations but lends itself for use in molecular dynamics simulation. The framework is specifically designed to require minimal a priori knowledge of the dynamics in a system. We benchmark our approach with a model Hamiltonian for the reaction of an ion and a molecule due to Chesnavich consisting of two parts: a rigid, symmetric top representing the $\text{CH}_3^{+}$ ion, and a mobile $\text{H}$ atom. We begin with trajectories and use support vector machines to determine the boundaries between initial conditions corresponding to different classes of trajectories. We then show that these boundaries between different classes of trajectories approximate invariant phase space structures of the same type observed in earlier analyses of Chesnavich's model. Our approach is designed with extensions to higher-dimensional applications in mind. SVM is known to work well even with small amounts of data, therefore our approach is computationally better suited than existing methods for high-dimensional systems and systems where integrating trajectories is expensive.
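论文的做法是:以轨迹的初始条件为样本、以轨迹类别为标签训练 SVM,学到的决策边界近似相空间中的不变结构。下面用一个二维玩具数据集和极简的线性次梯度 SVM 做示意(论文使用的是核 SVM;函数名与数据均为本文假设,仅演示"用初始条件分类轨迹"这一思路):

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Tiny sub-gradient (Pegasos-style) linear SVM.  Labels y in {-1, +1};
    the learned hyperplane w.x + b = 0 approximates the boundary between
    two classes of trajectories in the space of initial conditions.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            if y[i] * (X[i] @ w + b) < 1:      # margin violation
                w = (1 - lr * lam) * w + lr * y[i] * X[i]
                b += lr * y[i]
            else:                               # only regularization shrink
                w = (1 - lr * lam) * w
    return w, b

# Toy initial conditions (position, momentum); label = reaction outcome.
X = np.array([[-1.0, -0.5], [-0.8, -0.2], [0.9, 0.4], [1.1, 0.3]])
y = np.array([-1, -1, 1, 1])
w, b = train_linear_svm(X, y)
```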

其他神经网络|深度学习|模型|建模(10篇)

【1】 On the Memorization Properties of Contrastive Learning 标题:论对比学习的记忆特性

作者:Ildus Sadrtdinov,Nadezhda Chirkova,Ekaterina Lobacheva 机构: HSE University 备注:Published in Workshop on Overparameterization: Pitfalls & Opportunities at ICML 2021 链接:https://arxiv.org/abs/2107.10143 摘要:对深层神经网络(DNNs)的记忆研究有助于理解DNNs学习的模式和方式,并促进DNN训练方法的改进。在这项工作中,我们研究了一种广泛使用的对比自监督学习方法SimCLR的记忆特性,并将其与监督学习和随机标签训练的记忆进行了比较。我们发现,训练对象和增广在SimCLR如何学习它们的意义上可能具有不同的复杂性。此外,我们还证明了SimCLR在训练对象复杂度分布上类似于随机标签训练。 摘要:Memorization studies of deep neural networks (DNNs) help to understand what patterns and how do DNNs learn, and motivate improvements to DNN training approaches. In this work, we investigate the memorization properties of SimCLR, a widely used contrastive self-supervised learning approach, and compare them to the memorization of supervised learning and random labels training. We find that both training objects and augmentations may have different complexity in the sense of how SimCLR learns them. Moreover, we show that SimCLR is similar to random labels training in terms of the distribution of training objects complexity.

【2】 Learning Theorem Proving Components 标题:学习定理证明组件

作者:Karel Chvalovský,Jan Jakubův,Miroslav Olšák,Josef Urban 机构: Czech Technical University in Prague, Prague, Czechia, University of Innsbruck, Innsbruck, Austria 备注:Accepted to TABLEAUX'21 链接:https://arxiv.org/abs/2107.10034 摘要:基于给定子句过程的饱和式自动定理证明器(ATPs)是目前经典一阶逻辑最强大的通用推理器。然而,在这样的系统中,子句选择启发式算法常常是孤立地评估子句,而忽略其他子句。最近,通过为E/ENIGMA系统配备一个图神经网络(GNN),该网络基于在先前选择的子句上下文中的评估来选择下一个给定的子句,这种情况发生了变化。在这项工作中,我们描述了几种算法并用ENIGMA进行了实验,提出了基于学习从句图重要成分的上下文评价思想。 摘要:Saturation-style automated theorem provers (ATPs) based on the given clause procedure are today the strongest general reasoners for classical first-order logic. The clause selection heuristics in such systems are, however, often evaluating clauses in isolation, ignoring other clauses. This has changed recently by equipping the E/ENIGMA system with a graph neural network (GNN) that chooses the next given clause based on its evaluation in the context of previously selected clauses. In this work, we describe several algorithms and experiments with ENIGMA, advancing the idea of contextual evaluation based on learning important components of the graph of clauses.

【3】 Memorization in Deep Neural Networks: Does the Loss Function matter? 标题:深度神经网络中的记忆:损失函数重要吗?

作者:Deep Patel,P. S. Sastry 机构:Indian Institute of Science, Bangalore, India - 备注:Accepted at PAKDD 2021. 12 pages and 5 figures 链接:https://arxiv.org/abs/2107.09957 摘要:深度神经网络,往往由于过度参数化,显示出能够准确记忆甚至随机标记的数据。实证研究也表明,没有一种标准的正则化技术能够缓解这种过度拟合。我们研究损失函数的选择是否会影响这种记忆。我们用MNIST和CIFAR-10这两个基准数据集进行了实证研究,结果表明,相对于交叉熵或平方误差损失,对称损失函数显著提高了网络抵抗这种过度拟合的能力。然后,我们给出了记忆鲁棒性的形式化定义,并从理论上解释了为什么对称损失提供了这种鲁棒性。我们的结果清楚地表明,在这种记忆现象中,损失函数单独起作用。 摘要:Deep Neural Networks, often owing to the overparameterization, are shown to be capable of exactly memorizing even randomly labelled data. Empirical studies have also shown that none of the standard regularization techniques mitigate such overfitting. We investigate whether the choice of the loss function can affect this memorization. We empirically show, with benchmark data sets MNIST and CIFAR-10, that a symmetric loss function, as opposed to either cross-entropy or squared error loss, results in significant improvement in the ability of the network to resist such overfitting. We then provide a formal definition for robustness to memorization and provide a theoretical explanation as to why the symmetric losses provide this robustness. Our results clearly bring out the role loss functions alone can play in this phenomenon of memorization.
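这里"对称损失"指对所有类别求和后与预测无关的损失,即 $\sum_k L(f(x),k)$ 为常数——抗记忆鲁棒性的论证正建立在该性质上。下面的小例子(数值为本文假设)验证:基于 softmax 概率的 MAE 满足对称性,而交叉熵不满足:

```python
import numpy as np

def mae_loss(p, k):
    """Mean absolute error between softmax probabilities p and one-hot label k."""
    onehot = np.eye(len(p))[k]
    return float(np.abs(p - onehot).sum())

def ce_loss(p, k):
    """Standard cross-entropy, for comparison (not symmetric)."""
    return float(-np.log(p[k]))

p1 = np.array([0.7, 0.2, 0.1])
p2 = np.array([0.1, 0.1, 0.8])
mae_sum1 = sum(mae_loss(p1, k) for k in range(3))  # = 2*(K-1) = 4 for any p
mae_sum2 = sum(mae_loss(p2, k) for k in range(3))
ce_sum1 = sum(ce_loss(p1, k) for k in range(3))    # depends on p
ce_sum2 = sum(ce_loss(p2, k) for k in range(3))
```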

【4】 Preventing dataset shift from breaking machine-learning biomarkers 标题:防止数据集移动破坏机器学习生物标记物

作者:Jérôme Dockès,Gaël Varoquaux,Jean-Baptiste Poline 机构:McGill University, INRIA. JB Poline and Gaël Varoquaux contributed equally to this work. 备注:GigaScience, BioMed Central, In press 链接:https://arxiv.org/abs/2107.09947 摘要:机器学习带来了从具有丰富生物医学测量数据的队列中提取新的生物标志物的希望。一个好的生物标志物能够可靠地检测相应的病症。然而,生物标志物通常是从与目标人群不同的队列中提取的。这种不匹配,称为数据集偏移,可能会破坏生物标志物在新个体上的应用。在生物医学研究中,数据集偏移经常发生,例如由于招募偏倚。当数据集偏移发生时,标准的机器学习技术不足以提取和验证生物标志物。本文概述了数据集偏移何时以及如何破坏机器学习提取的生物标志物,以及相应的检测和校正策略。 摘要:Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedical measurements. A good biomarker is one that gives reliable detection of the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new individuals. Dataset shifts are frequent in biomedical research, e.g. because of recruitment biases. When a dataset shift occurs, standard machine-learning techniques do not suffice to extract and validate biomarkers. This article provides an overview of when and how dataset shifts breaks machine-learning extracted biomarkers, as well as detection and correction strategies.

【5】 Communication and Computation Reduction for Split Learning using Asynchronous Training 标题:利用异步训练减少分裂学习的通信和计算量

作者:Xing Chen,Jingtao Li,Chaitali Chakrabarti 备注:Accepted by SIPS '21 链接:https://arxiv.org/abs/2107.09786 摘要:分割学习是一种很有前途的隐私保护分布式学习方案,它对边缘设备的计算量要求较低,但存在边缘设备与服务器之间通信开销大的缺点。为了减少通信开销,本文提出了一种基于丢失的异步训练方案,该方案更新客户端模型的频率较低,并且只在选定的时间段发送/接收激活/梯度。为了进一步减少通信开销,在传输之前使用8位浮点对激活/梯度进行量化。所提出的通信缩减方法的另一个好处是,由于客户端模型更新的数量减少,客户端的计算量减少。此外,所提出的基于通信约简的分割学习方法的隐私性与传统的分割学习方法基本相同。在CIFAR-10上的VGG11、VGG13和ResNet18模型上的仿真结果表明,在单客户端情况下,当精度下降小于0.5%时,通信开销降低了1.64x-106.7x,客户端计算量减少了2.86x-32.1x。对于5和10个客户案例,VGG11的通信成本降低了11.9倍和11.3倍,准确度损失为0.5%。 摘要:Split learning is a promising privacy-preserving distributed learning scheme that has low computation requirement at the edge device but has the disadvantage of high communication overhead between edge device and server. To reduce the communication overhead, this paper proposes a loss-based asynchronous training scheme that updates the client-side model less frequently and only sends/receives activations/gradients in selected epochs. To further reduce the communication overhead, the activations/gradients are quantized using 8-bit floating point prior to transmission. An added benefit of the proposed communication reduction method is that the computations at the client side are reduced due to reduction in the number of client model updates. Furthermore, the privacy of the proposed communication reduction based split learning method is almost the same as traditional split learning. Simulation results on VGG11, VGG13 and ResNet18 models on CIFAR-10 show that the communication cost is reduced by 1.64x-106.7x and the computations in the client are reduced by 2.86x-32.1x when the accuracy degradation is less than 0.5% for the single-client case. For 5 and 10-client cases, the communication cost reduction is 11.9x and 11.3x on VGG11 for 0.5% loss in accuracy.
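传输前把激活/梯度量化为8比特,相对float32可将通信量压缩约4倍。论文采用8位浮点格式;下面用更简单的均匀定点8位量化示意"量化-传输-反量化"这一往返过程(属本文的简化假设,并非论文的量化格式):

```python
import numpy as np

def quantize_8bit(x):
    """Uniform symmetric 8-bit quantization of an activation/gradient
    tensor.  (The paper uses an 8-bit floating-point format; this
    fixed-point version is a simpler stand-in for illustration.)
    """
    scale = max(float(np.abs(x).max()), 1e-12) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)  # sent over the wire
    return q, scale

def dequantize(q, scale):
    """Receiver-side reconstruction from the int8 payload and its scale."""
    return q.astype(np.float32) * scale

x = np.linspace(-1.0, 1.0, 9, dtype=np.float32)
q, s = quantize_8bit(x)
x_hat = dequantize(q, s)
```

int8 数组可直接序列化传输;接收端用同一 scale 反量化,重建误差不超过半个量化步长。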

【6】 kNet: A Deep kNN Network To Handle Label Noise 标题:kNet:一种处理标签噪声的深度kNN网络

作者:Itzik Mizrahi,Shai Avidan 机构:Tel Aviv University 链接:https://arxiv.org/abs/2107.09735 摘要:深层神经网络的训练需要大量的标记数据。大规模收集这些数据不可避免地会产生标签噪声,因此,需要开发对标签噪声具有鲁棒性的学习算法。近年来,k近邻(kNN)成为解决这一问题的可行方法。尽管取得了成功,kNN也并非没有问题。主要是,它需要巨大的内存占用来存储所有的训练样本,并且需要先进的数据结构来支持在给定查询样本时快速检索相关样例。我们提出了一个神经网络,称为kNet,学习执行kNN。一旦训练完成,我们就不再需要存储训练数据,处理查询样本就是一个简单的推理过程。为了使用kNet,我们首先在数据集上训练一个初始网络,然后在初始网络的倒数第二层上训练kNet。我们发现kNet给出了kNN的一个光滑近似,不能处理kNN所能表现出的样本间急剧标签变化。这表明,目前kNet最适合用相当大的k来近似kNN。在两个数据集上的实验表明,这正是kNN工作得最好的区域,因此kNN可以用kNet代替。在实践中,kNet在所有标签噪声区域中始终能将所有初始网络的结果提升至多3%。 摘要:Deep Neural Networks require large amounts of labeled data for their training. Collecting this data at scale inevitably causes label noise. Hence, the need to develop learning algorithms that are robust to label noise. In recent years, k Nearest Neighbors (kNN) emerged as a viable solution to this problem. Despite its success, kNN is not without its problems. Mainly, it requires a huge memory footprint to store all the training samples and it needs an advanced data structure to allow for fast retrieval of the relevant examples, given a query sample. We propose a neural network, termed kNet, that learns to perform kNN. Once trained, we no longer need to store the training data, and processing a query sample is a simple matter of inference. To use kNet, we first train a preliminary network on the data set, and then train kNet on the penultimate layer of the preliminary network. We find that kNet gives a smooth approximation of kNN, and cannot handle the sharp label changes between samples that kNN can exhibit. This indicates that currently kNet is best suited to approximate kNN with a fairly large k. Experiments on two data sets show that this is the regime in which kNN works best, and can therefore be replaced by kNet. In practice, kNet consistently improves the results of all preliminary networks, in all label noise regimes, by up to 3%.
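论文中"kNN的光滑近似"这一思想可以用如下草图说明:把硬性的top-k投票换成对负距离的softmax加权投票,从而得到可微的近邻预测(仅为示意,并非论文的kNet网络结构):

```python
import numpy as np

def soft_knn_predict(x, X_train, y_train, n_classes, tau=1.0):
    """Differentiable surrogate for a kNN vote: softmax over negative
    squared distances. Smaller tau approaches a hard nearest-neighbour
    vote; this is an illustrative stand-in, not the paper's kNet."""
    d2 = np.sum((X_train - x) ** 2, axis=1)   # squared distances to train set
    w = np.exp(-d2 / tau)
    w /= w.sum()                              # softmax weights over neighbours
    probs = np.zeros(n_classes)
    for wi, yi in zip(w, y_train):
        probs[yi] += wi                       # weighted class vote
    return probs

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
y = np.array([0, 0, 1])
p = soft_knn_predict(np.array([0.05, 0.0]), X, y, n_classes=2, tau=0.5)
```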

【7】 Machine Learning Approaches to Automated Flow Cytometry Diagnosis of Chronic Lymphocytic Leukemia 标题:机器学习方法在慢性淋巴细胞白血病自动流式细胞术诊断中的应用

作者:Akum S. Kang,Loveleen C. Kang,Stephen M. Mastorides,Philip R. Foulis,Lauren A. DeLand,Robert P. Seifert,Andrew Borkowski 机构: University of South Florida, 5Department of Pathology, University of Florida 备注:4 pp 链接:https://arxiv.org/abs/2107.09728 摘要:流式细胞术是一种在单个细胞依次通过激发光源时,测量其多种荧光和光散射相关参数的技术。这些细胞用抗体标记以检测各种抗原,荧光信号反映抗原的表达。多参数流式细胞术数据的解读费力、耗时且昂贵,需要由训练有素的医学技术人员和病理学家在二维图上手工解读细胞分布并进行模式识别。我们尝试使用多种机器学习算法,为临床流式细胞术病例开发一种自动分析方法,自动区分正常病例与慢性淋巴细胞白血病病例。其中梯度提升(Gradient Boosting)取得了最佳效果:XGBoost分类器对恶性肿瘤病例进行前瞻性分类的特异性为1.00,敏感性为0.67,阴性预测值为0.75,阳性预测值为1.00,总体准确度为0.83。 摘要:Flow cytometry is a technique that measures multiple fluorescence and light scatter-associated parameters from individual cells as they flow in single file through an excitation light source. These cells are labeled with antibodies to detect various antigens and the fluorescence signals reflect antigen expression. Interpretation of the multiparameter flow cytometry data is laborious, time-consuming, and expensive. It involves manual interpretation of cell distribution and pattern recognition on two-dimensional plots by highly trained medical technologists and pathologists. Using various machine learning algorithms, we attempted to develop an automated analysis for clinical flow cytometry cases that would automatically classify normal and chronic lymphocytic leukemia cases. We achieved the best success with Gradient Boosting. The XGBoost classifier achieved a specificity of 1.00 and a sensitivity of 0.67, a negative predictive value of 0.75, a positive predictive value of 1.00, and an overall accuracy of 0.83 in prospectively classifying cases with malignancies.
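摘要中报告的各项指标都可以直接由2×2混淆矩阵算出;下面给出一个计算草图(其中的计数为虚构示例,并非该研究的真实数据):

```python
def binary_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, NPV and accuracy from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical counts chosen only for illustration
# (specificity 1.00 and PPV 1.00 imply zero false positives).
m = binary_metrics(tp=2, fp=0, fn=1, tn=3)
```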

【8】 Learning a Large Neighborhood Search Algorithm for Mixed Integer Programs 标题:混合整数规划的大邻域搜索学习算法

作者:Nicolas Sonnerat,Pengming Wang,Ira Ktena,Sergey Bartunov,Vinod Nair 机构:Equal contributors, DeepMind 链接:https://arxiv.org/abs/2107.10201 摘要:大邻域搜索(LNS)是一种组合优化启发式算法,它从待优化变量的一个赋值开始,通过在当前赋值周围搜索一个大邻域来迭代改进。在本文中,我们考虑基于学习的混合整数规划(MIP)的LNS方法。我们训练了一个Neural Diving模型来表示赋值上的概率分布,该模型与现有的MIP求解器一起生成初始赋值。将后续的搜索步骤描述为马尔可夫决策过程,我们训练神经邻域选择(Neural Neighborhood Selection)策略,在每一步选择一个搜索邻域,然后用MIP求解器搜索该邻域,找到下一个赋值。策略网络采用模仿学习的方法进行训练。我们提出了一个用于模仿的目标策略:在给定足够计算资源的情况下,保证在指定大小邻域的所有可能选择中,选出包含最优下一个赋值的邻域。我们的方法在五个具有大规模实例、来自不同应用(包括Google的两个生产应用)的真实MIP数据集上匹配或优于所有基线。在较长的运行时间下,它在其中三个数据集上取得了比最佳基线好$2\times$到$37.8\times$的平均原始间隙(primal gap)。 摘要:Large Neighborhood Search (LNS) is a combinatorial optimization heuristic that starts with an assignment of values for the variables to be optimized, and iteratively improves it by searching a large neighborhood around the current assignment. In this paper we consider a learning-based LNS approach for mixed integer programs (MIPs). We train a Neural Diving model to represent a probability distribution over assignments, which, together with an existing MIP solver, generates an initial assignment. Formulating the subsequent search steps as a Markov Decision Process, we train a Neural Neighborhood Selection policy to select a search neighborhood at each step, which is searched using a MIP solver to find the next assignment. The policy network is trained using imitation learning. We propose a target policy for imitation that, given enough compute resources, is guaranteed to select the neighborhood containing the optimal next assignment across all possible choices for the neighborhood of a specified size. Our approach matches or outperforms all the baselines on five real-world MIP datasets with large-scale instances from diverse applications, including two production applications at Google. At large running times it achieves $2\times$ to $37.8\times$ better average primal gap than the best baseline on three of the datasets.
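LNS核心的"破坏-修复"循环可以在一个玩具0/1背包问题上示意如下(随机选邻域、暴力修复分别代替论文中学习到的邻域选择策略和MIP求解器,仅为演示):

```python
import itertools
import random

def lns_knapsack(values, weights, cap, n_iters=50, nbhd=3, seed=0):
    """Toy Large Neighborhood Search for 0/1 knapsack: repeatedly unfix
    ("destroy") a few variables and re-optimize ("repair") them
    exhaustively, keeping only feasible improving moves."""
    rng = random.Random(seed)
    n = len(values)
    x = [0] * n            # feasible start: empty knapsack
    best = 0
    for _ in range(n_iters):
        free = rng.sample(range(n), nbhd)     # neighborhood: variables to unfix
        fixed = [i for i in range(n) if i not in free]
        fixed_w = sum(weights[i] * x[i] for i in fixed)
        fixed_v = sum(values[i] * x[i] for i in fixed)
        for bits in itertools.product([0, 1], repeat=nbhd):
            w = fixed_w + sum(weights[i] * b for i, b in zip(free, bits))
            v = fixed_v + sum(values[i] * b for i, b in zip(free, bits))
            if w <= cap and v > best:         # accept only improving moves
                best = v
                for i, b in zip(free, bits):
                    x[i] = b
    return x, best

x, best = lns_knapsack([6, 5, 4, 3, 2], [5, 4, 3, 2, 1], cap=10)
```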

【9】 KalmanNet: Neural Network Aided Kalman Filtering for Partially Known Dynamics 标题:KalmanNet:神经网络辅助部分已知动力学的卡尔曼滤波

作者:Guy Revach,Nir Shlezinger,Xiaoyong Ni,Adria Lopez Escoriza,Ruud J. G. van Sloun,Yonina C. Eldar 机构: Ben-Gurion University of the Negev 链接:https://arxiv.org/abs/2107.10043 摘要:动态系统的实时状态估计是信号处理与控制中的一项基本任务。对于由完全已知的线性高斯状态空间(SS)模型表示的系统,著名的Kalman滤波器(KF)是一种低复杂度的最优解。然而,在实践中,基本SS模型的线性度和对它的准确认识往往是不存在的。在这里,我们提出了KalmanNet,一种实时状态估计器,它从数据中学习,在具有部分信息的非线性动态下进行Kalman滤波。通过在KF流中加入结构SS模型和专用的递归神经网络模块,我们保持了经典算法的数据效率和可解释性,同时隐式地从数据中学习复杂动力学。数值计算表明,KalmanNet方法克服了非线性和模型失配的缺点,优于经典的滤波方法。 摘要:Real-time state estimation of dynamical systems is a fundamental task in signal processing and control. For systems that are well-represented by a fully known linear Gaussian state space (SS) model, the celebrated Kalman filter (KF) is a low complexity optimal solution. However, both linearity of the underlying SS model and accurate knowledge of it are often not encountered in practice. Here, we present KalmanNet, a real-time state estimator that learns from data to carry out Kalman filtering under non-linear dynamics with partial information. By incorporating the structural SS model with a dedicated recurrent neural network module in the flow of the KF, we retain data efficiency and interpretability of the classic algorithm while implicitly learning complex dynamics from data. We numerically demonstrate that KalmanNet overcomes nonlinearities and model mismatch, outperforming classic filtering methods operating with both mismatched and accurate domain knowledge.
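作为参考,KalmanNet所增强的经典KF递推(预测+更新)可以用如下最小的线性高斯草图表示(矩阵取值为玩具示例,与论文无关):

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict + update step of the classical Kalman filter."""
    # Predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Toy 1D example: estimate a constant value from noisy readings.
F = H = np.eye(1)
Q, R = 1e-5 * np.eye(1), 0.1 * np.eye(1)
x, P = np.zeros(1), np.eye(1)
rng = np.random.default_rng(0)
for _ in range(200):
    z = np.array([1.0]) + 0.3 * rng.standard_normal(1)
    x, P = kalman_step(x, P, z, F, H, Q, R)
```

经过200步观测后,状态估计收敛到真值1.0附近,协方差也随之收缩。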

【10】 Manifold learning-based polynomial chaos expansions for high-dimensional surrogate models 标题:基于流形学习的高维代理模型多项式混沌展开

作者:Katiana Kontolati,Dimitrios Loukrezis,Ketson R. M. dos Santos,Dimitrios G. Giovanis,Michael D. Shields 机构:Shields,a, Department of Civil & Systems Engineering, Johns Hopkins University, Baltimore MD, USA, Institute for Accelerator Science and Electromagnetic Fields, Technische Universität Darmstadt, Darmstadt, Germany 备注:29 pages, 14 figures 链接:https://arxiv.org/abs/2107.09814 摘要:本文提出了一种基于流形学习的复杂时空过程不确定性量化方法。我们的第一个目标是识别一组高维数据的嵌入,这些数据代表计算或分析模型的感兴趣的数量。为此,我们采用了Grassmannian扩散映射,这是一种两步非线性降维技术,它允许我们降低数据的维数,并以节省和廉价的方式识别有意义的几何描述。然后利用多项式混沌展开法构造随机输入参数与约化空间扩散坐标之间的映射关系。提出了一种自适应聚类方法来确定潜在空间中的最优聚类数目。点的相似性允许我们构造一些几何谐波仿真器,这些仿真器最终被用作一组廉价的预训练模型,以执行潜在特征到环境空间实现的逆映射,从而执行精确的样本外预测。因此,所提出的方法作为一个编码器-解码器系统,能够自动处理非常高维的数据,同时在小数据区成功地运行。该方法在两个基准问题和一个模拟两种物质间一级化学反应的对流扩散反应方程组上进行了验证。在所有的测试案例中,所提出的方法都能达到高精度的近似,最终导致UQ任务的显著加速。 摘要:In this work we introduce a manifold learning-based method for uncertainty quantification (UQ) in systems describing complex spatiotemporal processes. Our first objective is to identify the embedding of a set of high-dimensional data representing quantities of interest of the computational or analytical model. For this purpose, we employ Grassmannian diffusion maps, a two-step nonlinear dimension reduction technique which allows us to reduce the dimensionality of the data and identify meaningful geometric descriptions in a parsimonious and inexpensive manner. Polynomial chaos expansion is then used to construct a mapping between the stochastic input parameters and the diffusion coordinates of the reduced space. An adaptive clustering technique is proposed to identify an optimal number of clusters of points in the latent space. The similarity of points allows us to construct a number of geometric harmonic emulators which are finally utilized as a set of inexpensive pre-trained models to perform an inverse map of realizations of latent features to the ambient space and thus perform accurate out-of-sample predictions. 
Thus, the proposed method acts as an encoder-decoder system which is able to automatically handle very high-dimensional data while simultaneously operating successfully in the small-data regime. The method is demonstrated on two benchmark problems and on a system of advection-diffusion-reaction equations which model a first-order chemical reaction between two species. In all test cases, the proposed method is able to achieve highly accurate approximations which ultimately lead to the significant acceleration of UQ tasks.

其他(15篇)

【1】 Using system context information to complement weakly labeled data 标题:使用系统上下文信息来补充弱标签数据

作者:Matthias Meyer,Michaela Wenner,Clément Hibert,Fabian Walter,Lothar Thiele 机构:Computer Engineering and Networks Laboratory, Zurich, Switzerland, Laboratory of Hydraulics, Hydrology and Glaciology, ETH Zurich, WSL Birmensdorf, Clément Hibert, Institut de Physique du Globe de Strasbourg, University of Strasbourg 备注:Also appears in "Proceedings of the First Workshop on Weakly Supervised Learning (WeaSuL)" arXiv:2107.03690 链接:https://arxiv.org/abs/2107.10236 摘要:用传感器网络收集的真实世界数据集通常包含不完整和不确定的标签,以及系统环境产生的伪迹。由于人力和时间开销、专家可用性有限以及真值(ground truth)缺失,对于大规模和长期的传感器网络部署来说,完整和可靠的标注通常是不可行的。此外,如果用于分析的机器学习方法对部署的某些特征敏感,则需要为每个新部署重复标注和学习。为了应对这些挑战,我们提出利用在信息图中形式化的系统上下文信息,通过对比学习将其嵌入到学习过程中。基于真实数据,我们证明了这种方法在弱标签数据的情况下提高了准确度,并提高了分类器对新传感器位置的鲁棒性和可迁移性。 摘要:Real-world datasets collected with sensor networks often contain incomplete and uncertain labels as well as artefacts arising from the system environment. Complete and reliable labeling is often infeasible for large-scale and long-term sensor network deployments due to the labor and time overhead, limited availability of experts and missing ground truth. In addition, if the machine learning method used for analysis is sensitive to certain features of a deployment, labeling and learning needs to be repeated for every new deployment. To address these challenges, we propose to make use of system context information formalized in an information graph and embed it in the learning process via contrastive learning. Based on real-world data we show that this approach leads to an increased accuracy in case of weakly labeled data and leads to an increased robustness and transferability of the classifier to new sensor locations.

【2】 JEFL: Joint Embedding of Formal Proof Libraries 标题:JEFL:形式证明库的联合嵌入

作者:Qingxiang Wang,Cezary Kaliszyk 机构: University of Innsbruck, Austria, University of Warsaw, Poland 备注:Submission to FroCoS 2021 链接:https://arxiv.org/abs/2107.10188 摘要:不同交互式证明助手库所采用的逻辑基础的异质性,使得在它们之间发现相似的数学概念变得困难。在本文中,我们将先前提出的跨库概念匹配算法与我们的无监督嵌入方法进行了比较,后者可以帮助我们检索相似的概念。我们的方法基于Word2Vec的fasttext实现,并在其上添加了一个树遍历模块,使其算法适应我们数据导出管道的表示格式。我们比较了这些方法的可解释性、可定制性和在线可服务性,并认为神经嵌入方法更有潜力集成到交互式证明助手中。 摘要:The heterogeneous nature of the logical foundations used in different interactive proof assistant libraries has rendered discovery of similar mathematical concepts among them difficult. In this paper, we compare a previously proposed algorithm for matching concepts across libraries with our unsupervised embedding approach that can help us retrieve similar concepts. Our approach is based on the fasttext implementation of Word2Vec, on top of which a tree traversal module is added to adapt its algorithm to the representation format of our data export pipeline. We compare the explainability, customizability, and online-servability of the approaches and argue that the neural embedding approach has more potential to be integrated into an interactive proof assistant.

【3】 Leave-one-out Unfairness 标题:留一(Leave-one-out)不公平性

作者:Emily Black,Matt Fredrikson 机构:Carnegie Mellon University 备注:None 链接:https://arxiv.org/abs/2107.10171 摘要:我们引入了留一不公平性(leave-one-out unfairness),它刻画了当训练数据中加入或移除另外某一个人时,模型对某个人的预测有多大可能发生变化。留一不公平性基于这样的理念:公平的决定不应是任意的,不应取决于任何一个人是否恰好被纳入训练数据这一偶然事件。留一不公平性与算法稳定性密切相关,但它关注的是单个点的预测结果在训练数据发生单位变化时的一致性,而不是模型的总体误差。除了形式化留一不公平性之外,我们还刻画了深度模型在真实数据上表现出留一不公平的程度,包括泛化误差很小的情形。此外,我们证明了对抗训练和随机平滑技术对留一公平性有相反的影响,这揭示了深度模型中稳健性、记忆、个体公平性和留一公平性之间的关系。最后,我们讨论了可能受留一不公平性负面影响的重要实际应用。 摘要:We introduce leave-one-out unfairness, which characterizes how likely a model's prediction for an individual will change due to the inclusion or removal of a single other person in the model's training data. Leave-one-out unfairness appeals to the idea that fair decisions are not arbitrary: they should not be based on the chance event of any one person's inclusion in the training data. Leave-one-out unfairness is closely related to algorithmic stability, but it focuses on the consistency of an individual point's prediction outcome over unit changes to the training data, rather than the error of the model in aggregate. Beyond formalizing leave-one-out unfairness, we characterize the extent to which deep models behave leave-one-out unfairly on real data, including in cases where the generalization error is small. Further, we demonstrate that adversarial training and randomized smoothing techniques have opposite effects on leave-one-out fairness, which sheds light on the relationships between robustness, memorization, individual fairness, and leave-one-out fairness in deep models. Finally, we discuss salient practical applications that may be negatively affected by leave-one-out unfairness.
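留一(leave-one-out)不公平性的定义可以用一个玩具1-近邻模型直观化(仅为示意,论文研究的是深度模型):删除单个训练点就可能翻转对某个固定测试个体的预测。

```python
import numpy as np

def one_nn_predict(X_train, y_train, x):
    """Predict the label of the single nearest training point."""
    d = np.sum((X_train - x) ** 2, axis=1)
    return int(y_train[int(np.argmin(d))])

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0, 1, 0])
x_test = np.array([0.9])

pred_full = one_nn_predict(X, y, x_test)                 # nearest point is 1.0 -> label 1
pred_loo = one_nn_predict(X[[0, 2]], y[[0, 2]], x_test)  # drop that point -> label 0
flipped = pred_full != pred_loo                          # leave-one-out instability
```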

【4】 Incentivizing Compliance with Algorithmic Instruments 标题:用算法工具激励依从性

作者:Daniel Ngo,Logan Stapleton,Vasilis Syrgkanis,Zhiwei Steven Wu 机构:University of Minnesota, Carnegie Mellon University 备注:In Proceedings of the Thirty-eighth International Conference on Machine Learning (ICML 2021), 17 pages of main text, 53 pages total, 3 figures 链接:https://arxiv.org/abs/2107.10093 摘要:随机实验由于参与者的潜在不依从性而容易受到选择偏差的影响。虽然现有的许多工作都将依从性作为一种静态行为进行研究,但我们提出了一个博弈论模型,将依从性作为可能随时间变化的动态行为进行研究。在每一轮中,社会规划者与一系列异质代理互动,这些代理带着未被观察到的私人类型到达,该类型既决定了他们对行动(例如对照和治疗)的先验偏好,也决定了他们在不接受任何治疗情况下的基线奖励。规划者为每个代理提供一个随机化的建议,该建议可以改变他们的信念和行动选择。我们开发了一种新的推荐机制,将规划者的推荐视为一种工具变量(IV),它只影响代理的行动选择,而不影响观察到的奖励。我们通过仔细地将历史(规划者与先前代理之间的交互)映射到随机推荐来构建这样的IV。即使最初的代理可能完全不依从,我们的机制也能随着时间的推移激励依从,从而能够估计每种治疗的治疗效果,并最小化以确定最佳治疗为目标的规划者的累积后悔。 摘要:Randomized experiments can be susceptible to selection bias due to potential non-compliance by the participants. While much of the existing work has studied compliance as a static behavior, we propose a game-theoretic model to study compliance as dynamic behavior that may change over time. In rounds, a social planner interacts with a sequence of heterogeneous agents who arrive with their unobserved private type that determines both their prior preferences across the actions (e.g., control and treatment) and their baseline rewards without taking any treatment. The planner provides each agent with a randomized recommendation that may alter their beliefs and their action selection. We develop a novel recommendation mechanism that views the planner's recommendation as a form of instrumental variable (IV) that only affects an agent's action selection, but not the observed rewards. We construct such IVs by carefully mapping the history -- the interactions between the planner and the previous agents -- to a random recommendation. 
Even though the initial agents may be completely non-compliant, our mechanism can incentivize compliance over time, thereby enabling the estimation of the treatment effect of each treatment, and minimizing the cumulative regret of the planner whose goal is to identify the optimal treatment.

【5】 Interpreting diffusion score matching using normalizing flow 标题:用归一化流解释扩散分数匹配

作者:Wenbo Gong,Yingzhen Li 机构:Department of Engineering, University of Cambridge, UK, Department of Computing 备注:8 pages, International Conference on Machine Learning (ICML) INNF 2021 Workshop Spotlight 链接:https://arxiv.org/abs/2107.10072 摘要:分数匹配(score matching, SM)及其相关的Stein差异(SD)在模型训练和评价中取得了巨大的成功。然而,最近的研究显示了它们在处理某些类型分布时的局限性。一种可能的修正是将原始分数匹配(或Stein差异)与扩散矩阵相结合,这称为扩散分数匹配(DSM)(或扩散Stein差异(DSD))。然而,对扩散缺乏解释,限制了它只能用于简单分布和人工选择的矩阵。在这项工作中,我们试图通过使用归一化流(normalizing flow)解释扩散矩阵来填补这一空白。具体地说,我们从理论上证明了DSM(或DSD)等价于在归一化流定义的变换空间中计算的原始分数匹配(或Stein差异),其中扩散矩阵是流的Jacobian矩阵的逆。此外,我们还建立了它与黎曼流形的联系,并进一步推广到连续流,其中DSM的变化由一个常微分方程刻画。 摘要:Score matching (SM) and its related counterpart, Stein discrepancy (SD), have achieved great success in model training and evaluations. However, recent research shows their limitations when dealing with certain types of distributions. One possible fix is incorporating the original score matching (or Stein discrepancy) with a diffusion matrix, which is called diffusion score matching (DSM) (or diffusion Stein discrepancy (DSD)). However, the lack of interpretation of the diffusion limits its usage within simple distributions and manually chosen matrix. In this work, we plan to fill this gap by interpreting the diffusion matrix using normalizing flows. Specifically, we theoretically prove that DSM (or DSD) is equivalent to the original score matching (or Stein discrepancy) evaluated in the transformed space defined by the normalizing flow, where the diffusion matrix is the inverse of the flow's Jacobian matrix. In addition, we also build its connection to Riemannian manifolds and further extend it to continuous flows, where the change of DSM is characterized by an ODE.
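为便于理解,这里用一组假设性的记号给出SM与DSM目标的示意(沿用一般扩散分数匹配文献的常见写法,符号约定未必与本文完全一致):

```latex
% Fisher-divergence form of score matching (SM), with model score
% s_\theta(x) = \nabla_x \log p_\theta(x) and data distribution q:
J_{\mathrm{SM}}(\theta)
  = \tfrac{1}{2}\, \mathbb{E}_{x \sim q}
    \big[\, \lVert s_\theta(x) - \nabla_x \log q(x) \rVert^2 \,\big]
% Diffusion score matching (DSM) weights the score difference
% by a diffusion matrix m(x):
J_{\mathrm{DSM}}(\theta)
  = \tfrac{1}{2}\, \mathbb{E}_{x \sim q}
    \big[\, \lVert m(x)^{\top}\big( s_\theta(x) - \nabla_x \log q(x) \big) \rVert^2 \,\big]
```

按摘要的说法,在这种记号下论文的核心结论是:对于归一化流 $T$,$J_{\mathrm{DSM}}$ 等价于在变换空间 $y=T(x)$ 中计算的普通 $J_{\mathrm{SM}}$,其中 $m(x)$ 取为流的Jacobian矩阵之逆(具体的转置与符号约定以论文为准)。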

【6】 Differentiable Feature Selection, a Reparameterization Approach 标题:可微特征选择:一种重参数化方法

作者:Jérémie Dona,Patrick Gallinari 机构: Sorbonne Université, CNRS, LIP, F-, Paris, France, Criteo AI Labs, Paris, France 备注:None 链接:https://arxiv.org/abs/2107.10030 摘要:我们考虑面向重建的特征选择任务,即选择一个小的特征子集,使得完整的数据实例可以由其重建。这在涉及昂贵物理测量、传感器布置或信息压缩等若干场景中尤为重要。为了打破这个问题固有的组合性质,我们将任务表述为优化一个能实现准确重建的二元掩模分布。这样我们面临两个主要挑战。一个是二元分布引起的可微性问题。第二个是通过以相关方式选择变量来消除冗余信息,这需要对二元分布的协方差进行建模。我们通过引入对logitNormal分布的一种新的重参数化松弛来解决这两个问题。通过在多个高维图像基准上的评估,我们证明了所提方法提供了一种有效的探索方案,并能高效地完成面向重建的特征选择。我们表明,该方法利用了数据的内在几何结构,有利于重建。 摘要:We consider the task of feature selection for reconstruction which consists in choosing a small subset of features from which whole data instances can be reconstructed. This is of particular importance in several contexts involving for example costly physical measurements, sensor placement or information compression. To break the intrinsic combinatorial nature of this problem, we formulate the task as optimizing a binary mask distribution enabling an accurate reconstruction. We then face two main challenges. One concerns differentiability issues due to the binary distribution. The second one corresponds to the elimination of redundant information by selecting variables in a correlated fashion which requires modeling the covariance of the binary distribution. We address both issues by introducing a relaxation of the problem via a novel reparameterization of the logitNormal distribution. We demonstrate that the proposed method provides an effective exploration scheme and leads to efficient feature selection for reconstruction through evaluation on several high dimensional image benchmarks. We show that the method leverages the intrinsic geometry of the data, facilitating reconstruction.
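该方法核心的重参数化可以示意如下:logitNormal样本就是对高斯变量做sigmoid压缩,从而得到二元掩模的可微松弛(温度等细节为笔者的假设,并非论文的确切方案):

```python
import numpy as np

def sample_logit_normal_mask(mu, log_sigma, rng, tau=1.0):
    """Reparameterized logitNormal sample: a sigmoid-squashed Gaussian.

    The sample tends to {0, 1} as tau -> 0 but stays differentiable in
    (mu, log_sigma) for any tau > 0. Illustrative sketch only.
    """
    eps = rng.standard_normal(mu.shape)       # noise, independent of parameters
    z = mu + np.exp(log_sigma) * eps          # Gaussian in logit space
    return 1.0 / (1.0 + np.exp(-z / tau))     # squash into (0, 1)

rng = np.random.default_rng(0)
mu = np.array([-4.0, 0.0, 4.0])               # one logit per candidate feature
mask = sample_logit_normal_mask(mu, log_sigma=np.full(3, -1.0), rng=rng, tau=0.5)
```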

【7】 Deep Iterative 2D/3D Registration 标题:深度迭代2D/3D配准

作者:Srikrishna Jaganathan,Jian Wang,Anja Borsdorf,Karthik Shetty,Andreas Maier 机构: Pattern Recognition Lab, FAU Erlangen-Nürnberg, Erlangen, Germany., Siemens Healthineers AG, Forchheim, Germany. 备注:10 pages,2 figures, Accepted at MICCAI 2021 链接:https://arxiv.org/abs/2107.10004 摘要:基于深度学习的二维/三维配准方法具有很强的鲁棒性,但在临床应用中往往缺乏必要的配准精度。使用基于经典优化的2D/3D配准方法和基于深度学习的技术相结合的细化步骤可以提供所需的精度。但是,它也会增加运行时间。在这项工作中,我们提出了一个新的深度学习驱动的2D/3D配准框架,它可以端到端地用于迭代配准任务,而无需依赖任何进一步的细化步骤。我们通过学习使用点到平面对应的2D/3D配准框架的更新步骤来实现这一点。使用基于迭代残差精化的光流估计,结合作为已知算子嵌入的点到平面对应解算器,学习更新步骤。该方法的平均运行时间约为8s,平均重投影距离误差为0.60 $\pm$ 0.40 mm,成功率为97%,捕获范围为60mm。高配准精度、高鲁棒性和快速运行时间的结合使得我们的解决方案非常适合临床应用。 摘要:Deep Learning-based 2D/3D registration methods are highly robust but often lack the necessary registration accuracy for clinical application. A refinement step using the classical optimization-based 2D/3D registration method applied in combination with Deep Learning-based techniques can provide the required accuracy. However, it also increases the runtime. In this work, we propose a novel Deep Learning driven 2D/3D registration framework that can be used end-to-end for iterative registration tasks without relying on any further refinement step. We accomplish this by learning the update step of the 2D/3D registration framework using Point-to-Plane Correspondences. The update step is learned using iterative residual refinement-based optical flow estimation, in combination with the Point-to-Plane correspondence solver embedded as a known operator. Our proposed method achieves an average runtime of around 8s, a mean re-projection distance error of 0.60 $\pm$ 0.40 mm with a success ratio of 97 percent and a capture range of 60 mm. The combination of high registration accuracy, high robustness, and fast runtime makes our solution ideal for clinical applications.

【8】 Online structural kernel selection for mobile health 标题:面向移动健康的在线结构内核选择

作者:Eura Shin,Pedja Klasnja,Susan Murphy,Finale Doshi-Velez 备注:Workshop paper in ICML IMLH 2021 链接:https://arxiv.org/abs/2107.09949 摘要:出于移动健康中高效且个性化学习的需求,我们研究了多任务环境下高斯过程回归的在线核选择问题。为此,我们提出了一种关于核组合的新生成过程。我们的方法表明,核演化的轨迹可以在用户之间迁移以提高学习效率,并且核本身对于移动健康(mHealth)预测目标是有意义的。 摘要:Motivated by the need for efficient and personalized learning in mobile health, we investigate the problem of online kernel selection for Gaussian Process regression in the multi-task setting. We propose a novel generative process on the kernel composition for this purpose. Our method demonstrates that trajectories of kernel evolutions can be transferred between users to improve learning and that the kernels themselves are meaningful for an mHealth prediction goal.

【9】 Design of Experiments for Stochastic Contextual Linear Bandits 标题:随机上下文线性老虎机的实验设计

作者:Andrea Zanette,Kefan Dong,Jonathan Lee,Emma Brunskill 机构:Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA, Department of Computer Science 备注:Initial submission 链接:https://arxiv.org/abs/2107.09912 摘要:在随机线性背景bandit设置中,存在多个minimax程序用于策略的探索,这些策略对所获取的数据是反应性的。在实践中,部署这些算法可能会有很大的工程开销,特别是当数据集以分布式方式收集时,或者当需要人在回路中实现不同的策略时。在这种情况下,使用单一的非反应性策略进行探索是有益的。假设某些批处理上下文是可用的,我们设计一个单一的随机策略来收集一个好的数据集,从中可以提取一个接近最优的策略。我们提出了一个理论分析以及数值实验的合成和现实世界的数据集。 摘要:In the stochastic linear contextual bandit setting there exist several minimax procedures for exploration with policies that are reactive to the data being acquired. In practice, there can be a significant engineering overhead to deploy these algorithms, especially when the dataset is collected in a distributed fashion or when a human in the loop is needed to implement a different policy. Exploring with a single non-reactive policy is beneficial in such cases. Assuming some batch contexts are available, we design a single stochastic policy to collect a good dataset from which a near-optimal policy can be extracted. We present a theoretical analysis as well as numerical experiments on both synthetic and real-world datasets.

【10】 Private Alternating Least Squares: Practical Private Matrix Completion with Tighter Rates 标题:私有交替最小二乘:具有更紧速率的实用私有矩阵完成

作者:Steve Chien,Prateek Jain,Walid Krichene,Steffen Rendle,Shuang Song,Abhradeep Thakurta,Li Zhang 链接:https://arxiv.org/abs/2107.09802 摘要:研究了用户级隐私下的差分私有矩阵完备问题。我们设计了一种流行的交替最小二乘(ALS)方法的联合差分私有变体,该方法实现了:(i)矩阵完成的样本复杂度(以项目数、用户数为单位)接近最优,以及(ii)理论上和基准数据集上最著名的隐私/效用权衡。特别是,我们首次对引入噪声以确保DP的ALS进行了全局收敛性分析,并表明,与最著名的替代方案(Jain et al.(2018)提出的私有Frank Wolfe算法)相比,我们的误差界限在项目和用户数量方面具有更好的伸缩性,这在实际问题中是至关重要的。在标准基准上的广泛验证表明,该算法与精心设计的采样程序相结合,比现有的技术具有更高的精度,有望成为第一个实用的DP嵌入模型。 摘要:We study the problem of differentially private (DP) matrix completion under user-level privacy. We design a joint differentially private variant of the popular Alternating-Least-Squares (ALS) method that achieves: i) (nearly) optimal sample complexity for matrix completion (in terms of number of items, users), and ii) the best known privacy/utility trade-off both theoretically, as well as on benchmark data sets. In particular, we provide the first global convergence analysis of ALS with noise introduced to ensure DP, and show that, in comparison to the best known alternative (the Private Frank-Wolfe algorithm by Jain et al. (2018)), our error bounds scale significantly better with respect to the number of items and users, which is critical in practical problems. Extensive validation on standard benchmarks demonstrate that the algorithm, in combination with carefully designed sampling procedures, is significantly more accurate than existing techniques, thus promising to be the first practical DP embedding model.
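论文所私有化的基本原语是非私有的ALS半步更新:固定物品因子V,对每个用户行在其观测条目上解一个岭最小二乘问题。以下为极简草图(并非DP算法本身,噪声注入等隐私机制均未包含):

```python
import numpy as np

def als_update_users(R, mask, V, lam=0.1):
    """One (non-private) ALS half-step: with item factors V fixed,
    solve a ridge least-squares problem per user over observed entries."""
    k = V.shape[1]
    U = np.zeros((R.shape[0], k))
    for i in range(R.shape[0]):
        obs = mask[i]                               # observed items for user i
        A = V[obs].T @ V[obs] + lam * np.eye(k)
        b = V[obs].T @ R[i, obs]
        U[i] = np.linalg.solve(A, b)
    return U

rng = np.random.default_rng(0)
U_true = rng.standard_normal((5, 2))
V = rng.standard_normal((4, 2))
R = U_true @ V.T                         # exact rank-2 ratings matrix
mask = rng.random((5, 4)) < 0.8          # observed-entry indicator
U = als_update_users(R, mask, V)
err_before = float(np.sum(mask * R ** 2))             # predicting all zeros
err_after = float(np.sum(mask * (R - U @ V.T) ** 2))  # after one half-step
```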

【11】 Faster Matchings via Learned Duals 标题:通过学习的对偶实现更快的匹配

作者:Michael Dinitz,Sungjin Im,Thomas Lavastida,Benjamin Moseley,Sergei Vassilvitskii 备注:27 pages, 7 figures 链接:https://arxiv.org/abs/2107.09770 摘要:最近的一项研究调查了如何用机器学习的预测来增强算法,以克服最坏情况的下限。这一领域揭示了有趣的算法洞察问题,特别是成功的设计有竞争力的在线算法。然而,用预测来改进算法运行时间的问题在很大程度上还没有被探索。我们在这个方向上迈出了第一步,将机器学习预测的思想与“温启动”原始-对偶算法的思想结合起来。我们考虑组合优化中最重要的原语之一:加权二分匹配及其推广到$b$匹配。我们确定了三个关键的挑战时,使用学习对偶变量的原始对偶算法。首先,预测对偶可能是不可行的,所以我们给出了一个算法,有效地将预测不可行对偶映射到附近的可行解。第二,一旦对偶是可行的,他们可能不是最优的,所以我们证明了他们可以用来快速找到一个最优解。最后,这种预测只有在可以学习的情况下才有用,因此我们证明了学习对偶匹配问题的样本复杂度很低。我们通过对真实数据和合成数据的实验来验证我们的理论发现。因此,我们给出了一个严格的,实用的,经验有效的方法来计算二部匹配。 摘要:A recent line of research investigates how algorithms can be augmented with machine-learned predictions to overcome worst case lower bounds. This area has revealed interesting algorithmic insights into problems, with particular success in the design of competitive online algorithms. However, the question of improving algorithm running times with predictions has largely been unexplored. We take a first step in this direction by combining the idea of machine-learned predictions with the idea of "warm-starting" primal-dual algorithms. We consider one of the most important primitives in combinatorial optimization: weighted bipartite matching and its generalization to $b$-matching. We identify three key challenges when using learned dual variables in a primal-dual algorithm. First, predicted duals may be infeasible, so we give an algorithm that efficiently maps predicted infeasible duals to nearby feasible solutions. Second, once the duals are feasible, they may not be optimal, so we show that they can be used to quickly find an optimal solution. Finally, such predictions are useful only if they can be learned, so we show that the problem of learning duals for matching has low sample complexity. We validate our theoretical findings through experiments on both real and synthetic data. As a result we give a rigorous, practical, and empirically effective method to compute bipartite matchings.

【12】 What Do You Get When You Cross Beam Search with Nucleus Sampling? 标题:把束搜索与核采样结合会得到什么?

作者:Uri Shaham,Omer Levy 机构:The Blavatnik School of Computer Science, Tel Aviv University 链接:https://arxiv.org/abs/2107.09729 摘要:本文将波束搜索与核采样的概率剪枝技术相结合,提出了两种用于自然语言生成的确定性核搜索算法。第一种算法p-exact search对下一个令牌分布进行局部剪枝,并对剩余空间进行精确搜索。第二种算法,动态波束搜索,根据候选概率分布的熵来缩小和扩大波束大小。尽管nucleus搜索背后有概率直觉,但在机器翻译和摘要基准测试上的实验表明,这两种算法都达到了与标准beam搜索相同的性能水平。 摘要:We combine beam search with the probabilistic pruning technique of nucleus sampling to create two deterministic nucleus search algorithms for natural language generation. The first algorithm, p-exact search, locally prunes the next-token distribution and performs an exact search over the remaining space. The second algorithm, dynamic beam search, shrinks and expands the beam size according to the entropy of the candidate's probability distribution. Despite the probabilistic intuition behind nucleus search, experiments on machine translation and summarization benchmarks show that both algorithms reach the same performance levels as standard beam search.
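两种算法共同依赖的nucleus(top-p)剪枝步骤可以独立示意如下(输入为假设的词元分布,仅为演示):

```python
import numpy as np

def nucleus_prune(probs, p=0.9):
    """Keep the smallest set of highest-probability tokens whose total
    mass is >= p, zero out the rest, and renormalize."""
    order = np.argsort(probs)[::-1]                  # tokens by descending prob
    csum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(csum, p)) + 1       # smallest prefix with mass >= p
    pruned = np.zeros_like(probs)
    kept = order[:cutoff]
    pruned[kept] = probs[kept]
    return pruned / pruned.sum()

probs = np.array([0.5, 0.3, 0.15, 0.05])
out = nucleus_prune(probs, p=0.9)                    # drops the 0.05 tail token
```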

【13】 Differentiable Annealed Importance Sampling and the Perils of Gradient Noise 标题:可微退火重要性抽样与梯度噪声的危害

作者:Guodong Zhang,Kyle Hsu,Jianing Li,Chelsea Finn,Roger Grosse 机构:University of Toronto,Vector Institute,Stanford University 备注:22 pages 链接:https://arxiv.org/abs/2107.10211 摘要:退火重要性抽样(AIS)及其相关算法是边缘似然估计的高效工具,但由于使用了Metropolis-Hastings(MH)校正步骤,因此不完全可微。可微性是一个可取的性质,因为它将承认的可能性,优化边际可能性作为一个目标使用梯度为基础的方法。为此,我们提出了一种可微AIS算法,通过放弃MH步骤,进一步解除了小批量计算的锁定。我们提供了一个详细的收敛性分析贝叶斯线性回归超越了以往的分析,明确说明非完美过渡。利用这个分析,我们证明了我们的算法在整批设置下是一致的,并且提供了一个次线性的收敛速度。然而,我们证明了算法在使用小批量梯度时是不一致的,这是由于最后一次迭代收敛到后验点和消除路径随机误差的目标之间的根本不相容。这一结果与随机优化和随机梯度Langevin动力学的经验形成了鲜明的对比,在随机优化和随机梯度Langevin动力学中,梯度噪声的影响可以通过采取更小的步骤来消除。我们的否定结果主要依赖于我们对平稳分布收敛性的明确考虑,这有助于解释开发实用有效的利用小批量梯度的AIS类算法的困难。 摘要:Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation, but are not fully differentiable due to the use of Metropolis-Hastings (MH) correction steps. Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective using gradient-based methods. To this end, we propose a differentiable AIS algorithm by abandoning MH steps, which further unlocks mini-batch computation. We provide a detailed convergence analysis for Bayesian linear regression which goes beyond previous analyses by explicitly accounting for non-perfect transitions. Using this analysis, we prove that our algorithm is consistent in the full-batch setting and provide a sublinear convergence rate. However, we show that the algorithm is inconsistent when mini-batch gradients are used due to a fundamental incompatibility between the goals of last-iterate convergence to the posterior and elimination of the pathwise stochastic error. This result is in stark contrast to our experience with stochastic optimization and stochastic gradient Langevin dynamics, where the effects of gradient noise can be washed out by taking more steps of a smaller size. 
Our negative result relies crucially on our explicit consideration of convergence to the stationary distribution, and it helps explain the difficulty of developing practically effective AIS-like algorithms that exploit mini-batch gradients.

【14】 A variational approximate posterior for the deep Wishart process 标题:深度Wishart过程的变分近似后验估计

作者:Sebastian W. Ober,Laurence Aitchison 机构:Department of Engineering, University of Cambridge, Cambridge, UK, Department of Computer Science, University of Bristol, Bristol, UK 备注:20 pages 链接:https://arxiv.org/abs/2107.10125 摘要:最近的工作引入了深度内核进程作为NNs的完全基于内核的替代方案(Aitchison等人,2020)。深核过程通过交替地从半正定矩阵上的分布中采样核并执行非线性变换,灵活地学习良好的顶层表示。一种特殊的深核过程,即深Wishart过程(DWP),由于它的先验等价于深高斯过程(DGP)的先验而引起了人们的特别关注。然而,由于在半正定矩阵上缺乏足够灵活的分布,DWPs中的推理仍然是不可能的。本文通过推广Wishart概率密度的Bartlett分解,给出了一种在半正定矩阵上获得柔性分布的新方法。我们使用这个新的分布来发展一个包含跨层依赖的DWP的近似后验分布。我们提出了一种双随机诱导点的DWP推理方案,并通过实验证明了在DWP中的推理比在具有等价先验知识的DGP中的推理具有更好的性能。 摘要:Recent work introduced deep kernel processes as an entirely kernel-based alternative to NNs (Aitchison et al. 2020). Deep kernel processes flexibly learn good top-layer representations by alternately sampling the kernel from a distribution over positive semi-definite matrices and performing nonlinear transformations. A particular deep kernel process, the deep Wishart process (DWP), is of particular interest because its prior is equivalent to deep Gaussian process (DGP) priors. However, inference in DWPs has not yet been possible due to the lack of sufficiently flexible distributions over positive semi-definite matrices. Here, we give a novel approach to obtaining flexible distributions over positive semi-definite matrices by generalising the Bartlett decomposition of the Wishart probability density. We use this new distribution to develop an approximate posterior for the DWP that includes dependency across layers. We develop a doubly-stochastic inducing-point inference scheme for the DWP and show experimentally that inference in the DWP gives improved performance over doing inference in a DGP with the equivalent prior.
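作者所推广的标准Bartlett分解本身可以用几行numpy示意:通过下三角因子采样Wishart矩阵(这是经典构造的草图,论文的泛化形式见原文):

```python
import numpy as np

def wishart_bartlett(df, scale, rng):
    """Draw W ~ Wishart(df, scale) via the Bartlett decomposition:
    W = L A A^T L^T with scale = L L^T, A lower-triangular,
    A_ii^2 ~ chi2(df - i) and A_ij ~ N(0, 1) for i > j."""
    p = scale.shape[0]
    L = np.linalg.cholesky(scale)
    A = np.zeros((p, p))
    for i in range(p):
        A[i, i] = np.sqrt(rng.chisquare(df - i))
        for j in range(i):
            A[i, j] = rng.standard_normal()
    M = L @ A
    return M @ M.T

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
W = wishart_bartlett(df=5, scale=Sigma, rng=rng)   # symmetric positive definite
```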

【15】 Discovering Latent Causal Variables via Mechanism Sparsity: A New Principle for Nonlinear ICA 标题:通过机制稀疏性发现潜在因果变量:非线性独立成分分析的一种新原理

作者:Sébastien Lachapelle,Pau Rodríguez López,Rémi Le Priol,Alexandre Lacoste,Simon Lacoste-Julien 机构: Université de Montréal 2Element AI 备注:Appears in: Workshop on the Neglected Assumptions in Causal Inference (NACI) at the 38 th International Conference on Machine Learning, 2021. 19 pages 链接:https://arxiv.org/abs/2107.10098 摘要:可以说,为潜在高维的现象找到一个可解释的低维表示是科学事业的核心。独立成分分析(ICA)指的是将这一目标形式化并为实际应用提供估计程序的一类方法。当潜在因素稀疏地依赖于观测到的辅助变量和/或过去的潜在因素时,本文提出机制稀疏正则化作为实现非线性ICA的新原则。我们证明,如果将潜在机制正则化为稀疏,并且数据生成过程满足某种图形准则,则潜在变量可以在排列意义下被恢复。作为一个特例,我们的框架展示了如何利用对潜在因素的未知目标干预来解耦它们,从而在ICA和因果关系之间建立进一步的联系。我们用玩具实验验证了我们的理论结果。 摘要:It can be argued that finding an interpretable low-dimensional representation of a potentially high-dimensional phenomenon is central to the scientific enterprise. Independent component analysis (ICA) refers to an ensemble of methods which formalize this goal and provide estimation procedures for practical application. This work proposes mechanism sparsity regularization as a new principle to achieve nonlinear ICA when latent factors depend sparsely on observed auxiliary variables and/or past latent factors. We show that the latent variables can be recovered up to a permutation if one regularizes the latent mechanisms to be sparse and if some graphical criterion is satisfied by the data generating process. As a special case, our framework shows how one can leverage unknown-target interventions on the latent factors to disentangle them, thus drawing further connections between ICA and causality. We validate our theoretical results with toy experiments.
