Machine Learning arXiv Digest [7.12]

2021-07-27 10:47:24

Visit www.arxivdaily.com for daily digests with abstracts, covering CS, Physics, Math, Economics, Statistics, Finance, Biology, and Electrical Engineering, with search, bookmarking, and posting features!

cs.LG: 83 papers today in total

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (3 papers)

【1】 Graph-based Deep Generative Modelling for Document Layout Generation

Authors: Sanket Biswas, Pau Riba, Josep Lladós, Umapada Pal
Affiliations: Computer Vision Center & Computer Science Department, Universitat Autonoma de Barcelona, Spain; CVPR Unit, Indian Statistical Institute, India
Note: Accepted by ICDAR Workshops-GLESDO 2021
Link: https://arxiv.org/abs/2107.04357
Abstract: One of the major prerequisites for any deep learning approach is the availability of large-scale training data. When dealing with scanned document images in real-world scenarios, the principal information about their content is stored in the layout itself. In this work, we propose an automated deep generative model that uses Graph Neural Networks (GNNs) to generate synthetic data with highly variable and plausible document layouts, which can be used to train document interpretation systems, especially in digital mailroom applications. It is also the first graph-based approach to the document layout generation task evaluated on administrative document images, in this case invoices.

【2】 Robust Counterfactual Explanations on Graph Neural Networks

Authors: Mohit Bajaj, Lingyang Chu, Zi Yu Xue, Jian Pei, Lanjun Wang, Peter Cho-Ho Lam, Yong Zhang
Affiliations: Huawei Technologies Canada Co. Ltd.; McMaster University; The University of British Columbia; Simon Fraser University
Link: https://arxiv.org/abs/2107.04086
Abstract: Massive deployment of Graph Neural Networks (GNNs) in high-stakes applications generates a strong demand for explanations that are robust to noise and align well with human intuition. Most existing methods generate explanations by identifying a subgraph of the input graph that correlates strongly with the prediction. These explanations are not robust to noise, because independently optimizing the correlation for a single input can easily overfit noise. Moreover, they do not align well with human intuition, because removing an identified subgraph from the input graph does not necessarily change the prediction result. In this paper, we propose a novel method to generate robust counterfactual explanations on GNNs by explicitly modelling the common decision logic of GNNs on similar input graphs. Our explanations are naturally robust to noise because they are produced from the common decision boundaries of a GNN that govern the predictions of many similar input graphs. The explanations also align well with human intuition because removing the set of edges identified by an explanation from the input graph changes the prediction significantly. Exhaustive experiments on many public datasets demonstrate the superior performance of our method.

【3】 Quantitative Evaluation of Explainable Graph Neural Networks for Molecular Property Prediction

Authors: Jiahua Rao, Shuangjia Zheng, Yuedong Yang
Affiliations: School of Computer Science and Engineering, Sun Yat-sen University; Galixir
Link: https://arxiv.org/abs/2107.04119
Abstract: Advances in machine learning have led to graph neural network-based methods for drug discovery, yielding promising results in molecular design, chemical synthesis planning, and molecular property prediction. However, the acceptance of current graph neural networks (GNNs) in drug discovery remains limited due to their lack of interpretability. Although this major weakness has been mitigated by the development of explainable artificial intelligence (XAI) techniques, the "ground truth" assignment in most explainable tasks ultimately rests on subjective human judgment, so the quality of model interpretation is hard to evaluate quantitatively. In this work, we first build three levels of benchmark datasets to quantitatively assess the interpretability of state-of-the-art GNN models. We then implement recent XAI methods in combination with different GNN algorithms to highlight the benefits, limitations, and future opportunities for drug discovery. As a result, GradInput and IG generally provide the best model interpretability for GNNs, especially when combined with GraphNet and CMPNN. The integrated XAI package is fully open-sourced and can be used by practitioners to train new models on other drug discovery tasks.
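As a quick illustration of one of the attribution methods named above, Gradient × Input (GradInput) multiplies each input feature by the gradient of the model output with respect to that feature. The sketch below is not from the paper's package; it uses a toy linear model with hypothetical weights, where the gradient is exact:

```python
import numpy as np

# Toy "model": a linear scorer f(x) = w . x, so df/dx_i = w_i exactly.
# GradInput attributes the prediction to feature i as x_i * (df/dx_i).
w = np.array([0.5, -2.0, 0.0, 1.5])   # hypothetical learned weights
x = np.array([1.0, 1.0, 3.0, 2.0])    # one hypothetical input feature vector

grad = w                               # exact gradient of the linear model
attribution = x * grad                 # GradInput attribution per feature

# For a linear model the attributions sum exactly to the model output f(x).
print(attribution)
print(attribution.sum(), w @ x)
```

For a deep GNN the gradient would come from backpropagation rather than being the weight vector itself, but the per-feature product is the same.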

Transformer (1 paper)

【1】 ViTGAN: Training GANs with Vision Transformers

Authors: Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu
Affiliations: UC San Diego; Google Research
Link: https://arxiv.org/abs/2107.04589
Abstract: Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring less vision-specific inductive bias. In this paper, we investigate whether this observation can be extended to image generation. To this end, we integrate the ViT architecture into generative adversarial networks (GANs). We observe that existing regularization methods for GANs interact poorly with self-attention, causing serious instability during training. To resolve this issue, we introduce novel regularization techniques for training GANs with ViTs. Empirically, our approach, named ViTGAN, achieves performance comparable to the state-of-the-art CNN-based StyleGAN2 on the CIFAR-10, CelebA, and LSUN bedroom datasets.

GAN | Adversarial | Attack | Generation (5 papers)

【1】 White-Box Cartoonization Using An Extended GAN Framework

Authors: Amey Thakur, Hasan Rizvi, Mega Satish
Affiliations: Department of Computer Engineering, University of Mumbai
Note: 5 pages, 6 figures. International Journal of Engineering Applied Sciences and Technology, 2021
Link: https://arxiv.org/abs/2107.04551
Abstract: In the present study, we propose a new framework for estimating generative models via an adversarial process that extends an existing GAN framework, and develop white-box controllable image cartoonization, which can generate high-quality cartooned images/videos from real-world photos and videos. The learning objectives of our system are based on three distinct representations: surface representation, structure representation, and texture representation. The surface representation refers to the smooth surface of the images. The structure representation relates to the sparse colour blocks and compresses generic content. The texture representation captures the texture, curves, and features in cartoon images. The Generative Adversarial Network (GAN) framework decomposes the images into these representations and learns from them to generate cartoon images. This decomposition makes the framework more controllable and flexible, allowing users to make changes based on the required output. This approach outperforms previous systems in maintaining the clarity, colours, textures, and shapes of images while still exhibiting the characteristics of cartoon images.

【2】 Adversarial Mixture Density Networks: Learning to Drive Safely from Collision Data

Authors: Sampo Kuutti, Saber Fallah, Richard Bowden
Note: Accepted at the IEEE Intelligent Transportation Systems Conference (ITSC) 2021
Link: https://arxiv.org/abs/2107.04485
Abstract: Imitation learning has been widely used to learn control policies for autonomous driving based on pre-recorded data. However, imitation-learning-based policies have been shown to be susceptible to compounding errors when encountering states outside the training distribution. Further, these agents have been demonstrated to be easily exploitable by adversarial road users aiming to create collisions. To overcome these shortcomings, we introduce Adversarial Mixture Density Networks (AMDN), which learn two distributions from separate datasets. The first is a distribution of safe actions learned from a dataset of naturalistic human driving. The second is a distribution representing unsafe actions likely to lead to collision, learned from a dataset of collisions. During training, we leverage these two distributions to provide an additional loss based on their similarity. By penalising the safe action distribution based on its similarity to the unsafe action distribution when training on the collision dataset, a more robust and safe control policy is obtained. We demonstrate the proposed AMDN approach in a vehicle-following use case, and evaluate it under naturalistic and adversarial testing environments. We show that despite its simplicity, AMDN provides significant benefits for the safety of the learned control policy when compared to pure imitation learning or standard mixture density network approaches.
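The core AMDN idea above, an extra loss term that penalises the safe action distribution for resembling the unsafe one, can be caricatured with two 1-D Gaussian action heads. The distributions, the closed-form KL divergence, and the exp(-KL) similarity penalty below are all illustrative assumptions, not the paper's actual loss:

```python
import math

def gaussian_kl(mu_p, sig_p, mu_q, sig_q):
    """Closed-form KL(P || Q) for two univariate Gaussians."""
    return math.log(sig_q / sig_p) + (sig_p**2 + (mu_p - mu_q)**2) / (2 * sig_q**2) - 0.5

# Hypothetical predicted action distributions (e.g. a steering command):
safe   = (0.00, 0.10)   # mean, std of the "safe" head
unsafe = (0.40, 0.15)   # mean, std of the "unsafe" (collision) head

kl = gaussian_kl(*safe, *unsafe)
# One plausible similarity penalty: large when the distributions overlap (small KL).
penalty = math.exp(-kl)
print(round(kl, 3), round(penalty, 3))
```

A well-separated pair of heads yields a large KL and hence a small penalty, which is the training signal the abstract describes.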

【3】 Adversarial Domain Adaptation with Self-Training for EEG-based Sleep Stage Classification

Authors: Emadeldeen Eldele, Mohamed Ragab, Zhenghua Chen, Min Wu, Chee-Keong Kwoh, Xiaoli Li, Cuntai Guan
Note: Under review
Link: https://arxiv.org/abs/2107.04470
Abstract: Sleep staging is of great importance in the diagnosis and treatment of sleep disorders. Recently, numerous data-driven deep learning models have been proposed for automatic sleep staging. They mainly rely on the assumption that training and testing data are drawn from the same distribution, which may not hold in real-world scenarios. Unsupervised domain adaptation (UDA) has recently been developed to handle this domain shift problem. However, previous UDA methods applied to sleep staging have two main limitations. First, they rely on a totally shared model for domain alignment, which may lose domain-specific information during feature extraction. Second, they only align the source and target distributions globally, without considering the class information in the target domain, which hinders the classification performance of the model. In this work, we propose a novel adversarial learning framework to tackle the domain shift problem in the unlabeled target domain. First, we develop unshared attention mechanisms to preserve the domain-specific features in the source and target domains. Second, we design a self-training strategy to align the fine-grained class distributions of the source and target domains via target-domain pseudo labels. We also propose dual distinct classifiers to increase the robustness and quality of the pseudo labels. The experimental results on six cross-domain scenarios validate the efficacy of our proposed framework for sleep staging and its advantage over state-of-the-art UDA methods.
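A minimal sketch of the pseudo-labelling step with dual classifiers described above: keep a target-domain pseudo label only when both classifier heads agree and are confident. The softmax outputs and the 0.6 threshold are made up for illustration; the paper's actual selection rule may differ:

```python
import numpy as np

# Hypothetical softmax outputs of two distinct classifier heads on 6 unlabeled
# target-domain samples, over 3 sleep-stage classes (shapes only; not real EEG).
p1 = np.array([[.8,.1,.1],[.4,.5,.1],[.2,.2,.6],[.9,.05,.05],[.3,.4,.3],[.1,.8,.1]])
p2 = np.array([[.7,.2,.1],[.1,.8,.1],[.3,.1,.6],[.85,.1,.05],[.5,.3,.2],[.2,.7,.1]])

y1, y2 = p1.argmax(1), p2.argmax(1)          # each head's predicted class
conf = np.minimum(p1.max(1), p2.max(1))       # the weaker of the two confidences
# Keep a pseudo-label only when both heads agree and both are confident enough.
mask = (y1 == y2) & (conf >= 0.6)
pseudo = y1[mask]
print(mask, pseudo)
```

The retained (sample, pseudo-label) pairs would then supervise the fine-grained class alignment on the target domain.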

【4】 Learning to Detect Adversarial Examples Based on Class Scores

Authors: Tobias Uelwer, Felix Michels, Oliver De Candido
Affiliations: Department of Computer Science, Heinrich Heine University Düsseldorf, Germany; Department of Electrical and Computer Engineering, Technical University of Munich, Germany
Note: Accepted at the 44th German Conference on Artificial Intelligence (KI 2021)
Link: https://arxiv.org/abs/2107.04435
Abstract: Given the increasing threat of adversarial attacks on deep neural networks (DNNs), research on efficient detection methods is more important than ever. In this work, we take a closer look at adversarial attack detection based on the class scores of an already trained classification model. We propose to train a support vector machine (SVM) on the class scores to detect adversarial examples. Our method is able to detect adversarial examples generated by various attacks, and can easily be applied to a plethora of deep classification models. We show that our approach yields an improved detection rate compared to an existing method, whilst being easy to implement. We perform an extensive empirical analysis on different deep classification models, investigating various state-of-the-art adversarial attacks. Moreover, we observe that our proposed method is better at detecting combinations of adversarial attacks. This work indicates the potential of detecting various adversarial attacks simply by using the class scores of an already trained classification model.
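The detection recipe above is simple enough to sketch end to end: treat the class-score vector as a feature and fit an SVM to separate clean from adversarial inputs. The synthetic "logits" below (peaked for clean inputs, flatter for adversarial ones) stand in for a real model's scores; this illustrates the idea, not the paper's experimental setup:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-in 10-way class-score vectors. Assumed behaviour for the sketch:
# clean inputs produce sharply peaked scores, adversarial ones flatter peaks.
clean = rng.normal(0, 1, (200, 10)); clean[np.arange(200), rng.integers(0, 10, 200)] += 6
adv   = rng.normal(0, 1, (200, 10)); adv[np.arange(200), rng.integers(0, 10, 200)] += 2

X = np.vstack([clean, adv])
y = np.array([0] * 200 + [1] * 200)          # 0 = clean, 1 = adversarial

det = SVC(kernel="rbf").fit(X[::2], y[::2])  # train the detector on half
acc = det.score(X[1::2], y[1::2])            # evaluate on the held-out half
print(acc)
```

With a real model, `X` would be the score vectors collected from unmodified and attacked inputs; nothing about the classifier itself needs to change.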

【5】 Deep Image Synthesis from Intuitive User Input: A Review and Perspectives

Authors: Yuan Xue, Yuan-Chen Guo, Han Zhang, Tao Xu, Song-Hai Zhang, Xiaolei Huang
Affiliations: The Pennsylvania State University, University Park, PA, USA; Tsinghua University, Beijing, China; Google Brain, Mountain View, CA, USA; Facebook, Menlo Park, CA, USA
Link: https://arxiv.org/abs/2107.04240
Abstract: In many applications of computer graphics, art, and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph, or layout, and have a computer system automatically generate photo-realistic images that adhere to the input content. While classic works that allow such automatic image content generation have followed a framework of image retrieval and composition, recent advances in deep generative models such as generative adversarial networks (GANs), variational autoencoders (VAEs), and flow-based methods have enabled more powerful and versatile image generation tasks. This paper reviews recent works on image synthesis given intuitive user input, covering advances in input versatility, image generation methodology, benchmark datasets, and evaluation metrics. This motivates new perspectives on input representation and interactivity, cross-pollination between major image generation paradigms, and evaluation and comparison of generation methods.

Semi/Weakly/Un/Supervised | Uncertainty | Active Learning (4 papers)

【1】 Behavior Self-Organization Supports Task Inference for Continual Robot Learning

Authors: Muhammad Burhan Hafez, Stefan Wermter
Affiliations: Department of Informatics, University of Hamburg
Note: Accepted at IROS 2021
Link: https://arxiv.org/abs/2107.04533
Abstract: Recent advances in robot learning have enabled robots to become increasingly better at mastering a predefined set of tasks. On the other hand, as humans, we have the ability to learn a growing set of tasks over our lifetime. Continual robot learning is an emerging research direction with the goal of endowing robots with this ability. In order to learn new tasks over time, the robot first needs to infer the task at hand. Task inference, however, has received little attention in the multi-task learning literature. In this paper, we propose a novel approach to continual learning of robotic control tasks. Our approach performs unsupervised learning of behavior embeddings by incrementally self-organizing demonstrated behaviors. Task inference is made by finding the behavior embedding nearest to a demonstrated behavior, which is used together with the environment state as input to a multi-task policy trained with reinforcement learning to optimize performance over tasks. Unlike previous approaches, our approach makes no assumptions about task distribution and requires no task exploration to infer tasks. We evaluate our approach in experiments with concurrently and sequentially presented tasks and show that it outperforms other multi-task learning approaches in terms of generalization performance and convergence speed, particularly in the continual learning setting.
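The task-inference step described above, finding the stored behavior embedding nearest to a demonstrated behavior, reduces to a nearest-neighbor lookup. The sketch below uses cosine similarity and hypothetical 3-D embeddings; the paper's embeddings and distance metric may differ:

```python
import numpy as np

def infer_task(demo_embedding, task_embeddings):
    """Return the index of the stored behavior embedding nearest (cosine) to the demo."""
    e = demo_embedding / np.linalg.norm(demo_embedding)
    T = task_embeddings / np.linalg.norm(task_embeddings, axis=1, keepdims=True)
    return int(np.argmax(T @ e))

# Hypothetical self-organized behavior embeddings for 3 known tasks.
tasks = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.7, 0.7, 0.0]])
demo = np.array([0.6, 0.75, 0.1])   # embedding of a newly demonstrated behavior
print(infer_task(demo, tasks))
```

The inferred index (here the mixed task) would then be fed, with the environment state, to the multi-task policy.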

【2】 Offline reinforcement learning with uncertainty for treatment strategies in sepsis

Authors: Ran Liu, Joseph L. Greenstein, James C. Fackler, Jules Bergmann, Melania M. Bembea, Raimond L. Winslow
Affiliations: Institute for Computational Medicine, The Johns Hopkins University; Department of Biomedical Engineering, The Johns Hopkins University School of Medicine & Whiting School of Engineering
Note: 25 pages, 8 figures
Link: https://arxiv.org/abs/2107.04491
Abstract: Guideline-based treatment for sepsis and septic shock is difficult because sepsis comprises a disparate range of life-threatening organ dysfunctions whose pathophysiology is not fully understood. Early intervention in sepsis is crucial for patient outcomes, yet those interventions have adverse effects and are frequently overadministered. Greater personalization is necessary, as no single action is suitable for all patients. We present a novel application of reinforcement learning in which we identify optimal recommendations for sepsis treatment from data, estimate their confidence level, and identify treatment options infrequently observed in training data. Rather than a single recommendation, our method can present several treatment options. We examine learned policies and discover that reinforcement learning is biased against aggressive intervention due to the confounding relationship between mortality and the level of treatment received. We mitigate this bias using subspace learning, and develop methodology that can yield more accurate learned policies across healthcare applications.

【3】 Differentially private training of neural networks with Langevin dynamics for calibrated predictive uncertainty

Authors: Moritz Knolle, Alexander Ziller, Dmitrii Usynin, Rickmer Braren, Marcus R. Makowski, Daniel Rueckert, Georgios Kaissis
Affiliations: Department of Diagnostic and Interventional Radiology, School of Medicine, Technical University of Munich, Germany; Institute for Artificial Intelligence and Informatics in Medicine, Technical University of Munich
Note: Accepted to the ICML 2021 Theory and Practice of Differential Privacy Workshop
Link: https://arxiv.org/abs/2107.04296
Abstract: We show that differentially private stochastic gradient descent (DP-SGD) can yield poorly calibrated, overconfident deep learning models. This represents a serious issue for safety-critical applications, e.g. in medical diagnosis. We highlight and exploit parallels between stochastic gradient Langevin dynamics, a scalable Bayesian inference technique for training deep neural networks, and DP-SGD, in order to train differentially private, Bayesian neural networks with minor adjustments to the original (DP-SGD) algorithm. Our approach provides considerably more reliable uncertainty estimates than DP-SGD, as demonstrated empirically by a reduction in expected calibration error (~5-fold on MNIST, ~2-fold on the Pediatric Pneumonia Dataset).
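The parallel the paper exploits can be seen directly in the update rules: both DP-SGD and SGLD add Gaussian noise to a gradient step, with DP-SGD additionally clipping per-example gradients. The sketch below shows the two steps side by side on stand-in gradients; the constants and shapes are illustrative, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_step(w, per_ex_grads, lr=0.1, clip=1.0, sigma=1.0):
    """DP-SGD: clip each per-example gradient, average, then add Gaussian noise."""
    norms = np.linalg.norm(per_ex_grads, axis=1, keepdims=True)
    clipped = per_ex_grads * np.minimum(1.0, clip / norms)
    noisy = clipped.mean(0) + rng.normal(0, sigma * clip / len(per_ex_grads), w.shape)
    return w - lr * noisy

def sgld_step(w, mean_grad, lr=0.1):
    """SGLD: plain gradient step plus noise whose scale is tied to the step size."""
    return w - lr * mean_grad + rng.normal(0, np.sqrt(2 * lr), w.shape)

w = np.zeros(3)
g = rng.normal(0, 1, (8, 3))          # stand-in per-example gradients
w_dp = dp_sgd_step(w, g)
w_sgld = sgld_step(w, g.mean(0))
print(w_dp, w_sgld)
```

The structural similarity of the two noisy updates is what lets a lightly modified DP-SGD double as approximate Bayesian posterior sampling.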

【4】 Measuring and Improving Model-Moderator Collaboration using Uncertainty Estimation

Authors: Ian D. Kivlichan, Zi Lin, Jeremiah Liu, Lucy Vasserman
Affiliations: Jigsaw; Google Research
Note: WOAH 2021
Link: https://arxiv.org/abs/2107.04212
Abstract: Content moderation is often performed by a collaboration between humans and machine learning models. However, it is not well understood how to design the collaborative process so as to maximize the combined moderator-model system performance. This work presents a rigorous study of this problem, focusing on an approach that incorporates model uncertainty into the collaborative process. First, we introduce principled metrics to describe the performance of the collaborative system under capacity constraints on the human moderator, quantifying how efficiently the combined system utilizes human decisions. Using these metrics, we conduct a large benchmark study evaluating the performance of state-of-the-art uncertainty models under different collaborative review strategies. We find that an uncertainty-based strategy consistently outperforms the widely used strategy based on toxicity scores, and moreover that the choice of review strategy drastically changes the overall system performance. Our results demonstrate the importance of rigorous metrics for understanding and developing effective moderator-model systems for content moderation, as well as the utility of uncertainty estimation in this domain.
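The two review strategies compared above can be sketched as ranking rules under a fixed human-review budget. Here the distance of the score from the 0.5 decision boundary is used as a crude stand-in for model uncertainty; the uncertainty models benchmarked in the paper are far richer:

```python
import numpy as np

def review_queue(scores, budget, strategy="uncertainty"):
    """Pick which items a human moderator reviews under a capacity budget.

    scores: model P(toxic) per item. 'toxicity' reviews the highest scores;
    'uncertainty' reviews the scores closest to the 0.5 decision boundary
    (a simple proxy for uncertainty, assumed only for this sketch).
    """
    if strategy == "toxicity":
        key = -scores                  # most toxic first
    else:
        key = np.abs(scores - 0.5)     # least certain first
    return np.argsort(key)[:budget]

p = np.array([0.97, 0.52, 0.08, 0.45, 0.99, 0.61])
print(sorted(review_queue(p, 2, "toxicity").tolist()))
print(sorted(review_queue(p, 2, "uncertainty").tolist()))
```

The toxicity rule spends human effort on items the model is already sure about, while the uncertainty rule routes the borderline cases, which is the intuition behind the paper's finding.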

Transfer | Zero/Few/One-Shot | Adaptation (6 papers)

【1】 BayesSimIG: Scalable Parameter Inference for Adaptive Domain Randomization with IsaacGym

Authors: Rika Antonova, Fabio Ramos, Rafael Possas, Dieter Fox
Affiliations: Department of Computer Science, Stanford University, USA; NVIDIA, USA; School of Computer Science, University of Sydney, Australia
Link: https://arxiv.org/abs/2107.04527
Abstract: BayesSim is a statistical technique for domain randomization in reinforcement learning based on likelihood-free inference of simulation parameters. This paper outlines BayesSimIG: a library that provides an implementation of BayesSim integrated with the recently released NVIDIA IsaacGym. This combination allows large-scale parameter inference with end-to-end GPU acceleration. Both inference and simulation get GPU speedup, with support for running more than 10K parallel simulation environments for complex robotics tasks that can have more than 100 simulation parameters to estimate. BayesSimIG provides an integration with TensorBoard to easily visualize slices of high-dimensional posteriors. The library is built in a modular way to support research experiments with novel ways to collect and process the trajectories from the parallel IsaacGym environments.

【2】 Online Adaptation to Label Distribution Shift

Authors: Ruihan Wu, Chuan Guo, Yi Su, Kilian Q. Weinberger
Affiliations: Cornell University; Facebook AI Research
Link: https://arxiv.org/abs/2107.04520
Abstract: Machine learning models often encounter distribution shifts when deployed in the real world. In this paper, we focus on adaptation to label distribution shift in the online setting, where the test-time label distribution is continually changing and the model must dynamically adapt to it without observing the true labels. Leveraging a novel analysis, we show that the lack of true labels does not hinder estimation of the expected test loss, which enables the reduction of online label shift adaptation to conventional online learning. Informed by this observation, we propose adaptation algorithms inspired by classical online learning techniques such as Follow The Leader (FTL) and Online Gradient Descent (OGD) and derive their regret bounds. We empirically verify our findings under both simulated and real-world label distribution shifts and show that OGD is particularly effective and robust across a variety of challenging label shift scenarios.
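A minimal caricature of online label shift adaptation: keep a running estimate of the test-time label marginal with OGD-style updates, then reweight the classifier's posteriors by the prior ratio. The update below simply nudges the estimate toward the model's predicted marginal, a crude, label-free surrogate for the unbiased loss estimate the paper actually derives:

```python
import numpy as np

def reweight(probs, q_est, p_train):
    """Adjust classifier posteriors for a shifted label prior (Bayes' rule)."""
    adj = probs * (q_est / p_train)
    return adj / adj.sum()

p_train = np.array([0.5, 0.5])       # label marginal seen during training
q_est = p_train.copy()               # online estimate of the test-time marginal
lr = 0.1

# Stream of predicted posteriors under a drifting label distribution.
stream = [np.array([0.6, 0.4]), np.array([0.7, 0.3]), np.array([0.8, 0.2])]
for probs in stream:
    # OGD-style step toward the observed prediction (illustrative surrogate).
    q_est += lr * (probs - q_est)
    q_est /= q_est.sum()

print(np.round(q_est, 3), np.round(reweight(stream[-1], q_est, p_train), 3))
```

No true labels are consumed anywhere in the loop, which mirrors the constraint of the online setting described above.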

【3】 Lithography Hotspot Detection via Heterogeneous Federated Learning with Local Adaptation

Authors: Xuezhong Lin, Jingyu Pan, Jinming Xu, Yiran Chen, Cheng Zhuo
Affiliations: Zhejiang University, China; Duke University, USA
Note: 8 pages, 9 figures
Link: https://arxiv.org/abs/2107.04367
Abstract: As technology scaling approaches the physical limit, lithography hotspot detection has become an essential task in design for manufacturability. While deploying pattern matching or machine learning in hotspot detection can help save significant simulation time, such methods typically demand non-trivial quality data to build the model, which most design houses are short of. Moreover, design houses are unwilling to directly share such data with other houses to build a unified model, which can be ineffective for a design house with unique design patterns due to data insufficiency. On the other hand, with data homogeneity in each design house, locally trained models can easily be over-fitted, losing generalization ability and robustness. In this paper, we propose a heterogeneous federated learning framework for lithography hotspot detection that addresses the aforementioned issues. On one hand, the framework builds a more robust centralized global sub-model through heterogeneous knowledge sharing while keeping local data private. On the other hand, the global sub-model can be combined with a local sub-model to better adapt to local data heterogeneity. The experimental results show that the proposed framework can overcome the challenge of non-independent and identically distributed (non-IID) data and heterogeneous communication to achieve very high performance compared to other state-of-the-art methods, while guaranteeing a good convergence rate in various scenarios.

【4】 FedAdapt: Adaptive Offloading for IoT Devices in Federated Learning

Authors: Di Wu, Rehmat Ullah, Paul Harvey, Peter Kilpatrick, Ivor Spence, Blesson Varghese
Affiliations: School of Electronics, Queen's University Belfast
Note: 13 pages
Link: https://arxiv.org/abs/2107.04271
Abstract: Applying Federated Learning (FL) on Internet-of-Things devices is necessitated by the large volumes of data they produce and growing concerns about data privacy. However, three challenges need to be addressed to make FL efficient: (i) executing on devices with limited computational capabilities, (ii) accounting for stragglers due to the computational heterogeneity of devices, and (iii) adapting to changing network bandwidths. This paper presents FedAdapt, an adaptive offloading FL framework to mitigate the aforementioned challenges. FedAdapt accelerates local training in computationally constrained devices by leveraging layer offloading of deep neural networks (DNNs) to servers. Further, FedAdapt adopts reinforcement-learning-based optimization and clustering to adaptively identify which layers of the DNN should be offloaded to a server for each individual device, to tackle the challenges of computational heterogeneity and changing network bandwidth. Experimental studies are carried out on a lab-based testbed comprising five IoT devices. By offloading a DNN from the device to the server, FedAdapt reduces the training time of a typical IoT device by over half compared to classic FL. The training time of extreme stragglers and the overall training time can be reduced by up to 57%. Furthermore, with changing network bandwidth, FedAdapt is demonstrated to reduce the training time by up to 40% compared to classic FL, without sacrificing accuracy. FedAdapt can be downloaded from https://github.com/qub-blesson/FedAdapt.
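Layer offloading as described above splits a sequential network at some index: the device computes the first layers, ships the activation, and the server finishes the pass. The toy numpy model below just checks that any split point reproduces the full forward pass; it is a sketch of the mechanism, not FedAdapt's code:

```python
import numpy as np

# A tiny sequential "DNN" as a list of weight matrices (hypothetical shapes).
rng = np.random.default_rng(0)
layers = [rng.normal(0, 0.1, (16, 32)), rng.normal(0, 0.1, (32, 32)),
          rng.normal(0, 0.1, (32, 8))]

def forward(x, weights):
    for W in weights:
        x = np.maximum(x @ W, 0.0)    # ReLU layers
    return x

def split_forward(x, split):
    """Run layers [:split] on the device, ship the activation, finish on the server."""
    activation = forward(x, layers[:split])      # computed on the IoT device
    return forward(activation, layers[split:])   # offloaded to the server

x = rng.normal(0, 1, (1, 16))
full = forward(x, layers)
for split in range(len(layers) + 1):             # every split point gives the same output
    assert np.allclose(split_forward(x, split), full)
print(full.shape)
```

FedAdapt's contribution is choosing the split point per device with reinforcement learning; the mechanics of the split itself are this simple.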

【5】 Exploring Dropout Discriminator for Domain Adaptation

作者:Vinod K Kurmi,Venkatesh K Subramanian,Vinay P. Namboodiri 机构:a Electrical Engineering Department, Indian Institute of Technology Kanpur, Kanpur, India, b Department of Computer Science and Engineering, Indian Institute of Technology 备注:This work is an extension of our BMVC-2019 paper (arXiv:1907.10628) 链接:https://arxiv.org/abs/2107.04231 摘要:如何使分类器适应新的领域是机器学习中一个具有挑战性的问题。这一点已经通过许多基于深度和非深度学习的方法来解决。在所使用的方法中,对抗性学习方法被广泛应用于解决许多深度学习问题和领域适应问题。这些方法基于一个确保源分布和目标分布接近的鉴别器。然而,这里我们建议,与其使用由单个鉴别器获得的点估计,不如使用基于鉴别器集合的分布来弥合这一差距。这可以通过使用多个分类器或使用传统的集成方法来实现。与此相反,我们提出一个基于 Monte Carlo dropout 的集成鉴别器就足以获得基于分布的鉴别器。具体来说,我们提出了一个基于课程学习的 dropout 鉴别器,该鉴别器逐渐增加基于样本的分布的方差,并使用相应的反向梯度来对齐源和目标的特征表示。一组鉴别器有助于模型有效地学习数据分布。它还提供了更好的梯度估计来训练特征抽取器。详细的结果和深入的消融(ablation)分析表明,我们的模型优于最新的结果。 摘要:Adaptation of a classifier to new domains is one of the challenging problems in machine learning. This has been addressed using many deep and non-deep learning based methods. Among the methodologies used, that of adversarial learning is widely applied to solve many deep learning problems along with domain adaptation. These methods are based on a discriminator that ensures source and target distributions are close. However, here we suggest that rather than using a point estimate obtained by a single discriminator, it would be useful if a distribution based on ensembles of discriminators could be used to bridge this gap. This could be achieved using multiple classifiers or using traditional ensemble methods. In contrast, we suggest that a Monte Carlo dropout based ensemble discriminator could suffice to obtain the distribution based discriminator. Specifically, we propose a curriculum based dropout discriminator that gradually increases the variance of the sample based distribution and the corresponding reverse gradients are used to align the source and target feature representations. An ensemble of discriminators helps the model to learn the data distribution efficiently. It also provides better gradient estimates to train the feature extractor.
The detailed results and thorough ablation analysis show that our model outperforms state-of-the-art results.
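As a rough illustration of the core idea, the sketch below (not the authors' code; the linear discriminator, feature size, and dropout rate are all invented for the example) shows how Monte Carlo dropout turns a single domain discriminator into a sample-based distribution of predictions whose mean and spread can be read off:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single-layer domain discriminator: features -> P(source).
W = rng.normal(size=(16, 1))

def mc_dropout_discriminator(feats, n_samples=32, p_drop=0.5):
    """Run the same discriminator n_samples times with fresh dropout
    masks, yielding a distribution of domain predictions instead of a
    single point estimate."""
    outs = []
    for _ in range(n_samples):
        mask = rng.random(feats.shape[1]) > p_drop   # one Bernoulli mask per pass
        h = feats * mask / (1.0 - p_drop)            # inverted dropout scaling
        logits = h @ W
        outs.append(1.0 / (1.0 + np.exp(-logits)))   # sigmoid domain probability
    outs = np.stack(outs)                            # (n_samples, batch, 1)
    return outs.mean(axis=0), outs.std(axis=0)

feats = rng.normal(size=(4, 16))                     # a small batch of features
mean, std = mc_dropout_discriminator(feats)
```

The curriculum in the paper gradually increases the variance of this distribution during training; in this sketch the spread comes only from the fixed dropout rate.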

【6】 Improved Breath Phase and Continuous Adventitious Sound Detection in Lung and Tracheal Sound Using Mixed Set Training and Domain Adaptation 标题:基于混合集合训练和域自适应的肺音和气管音的改进呼吸相位和连续异声检测

作者:Fu-Shun Hsu,Shang-Ran Huang,Chang-Fu Su,Chien-Wen Huang,Yuan-Ren Cheng,Chun-Chieh Chen,Chun-Yu Wu,Chung-Wei Chen,Yen-Chun Lai,Tang-Wei Cheng,Nian-Jhen Lin,Wan-Ling Tsai,Ching-Shiang Lu,Chuan Chen,Feipei Lai 机构:a Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, b Department of Critical Care Medicine, Far Eastern Memorial Hospital, New Taipei, Taiwan, c Heroic Faith Medical Science Co., Ltd., Taipei, Taiwan 备注:To be submitted, 31 pages, 6 figures, 5 tables 链接:https://arxiv.org/abs/2107.04229 摘要:在此之前,我们建立了一个肺音数据库HF_Lung_V2,并提出了卷积双向门控循环单元(CNN-BiGRU)模型,该模型具有足够的能力检测肺音中的吸入、呼气、连续不定音(CAS)和间断不定音。在这项研究中,我们建立了一个气管音数据库HF_Tracheal_V1,包含11107段15秒的气管音录音、23087个吸入标签、16728个呼气标签和6874个CAS标签。将HF_Tracheal_V1的气管音和HF_Lung_V2的肺音联合或单独用于训练CNN-BiGRU模型,分别进行肺音和气管音分析。对不同的训练策略进行了研究和比较:(1)采用完全训练(从头开始训练),单独用肺音训练肺音模型,单独用气管音训练气管音模型;(2)采用同时包含肺音和气管音的混合集训练模型;(3)利用域自适应技术,用气管音数据微调预先训练好的肺音模型,反之亦然。结果表明,仅用肺音训练的模型在气管音分析中表现较差,反之亦然。然而,与阳性对照(仅由肺音训练的肺模型,反之亦然)相比,混合集训练和域自适应可以提高肺音中呼气和CAS检测的性能,以及气管音中吸入、呼气和CAS检测的性能。特别是在需要一石二鸟(用单一模型兼顾两类声音)的情况下,混合集训练得到的模型更具优势。 摘要:Previously, we established a lung sound database, HF_Lung_V2 and proposed convolutional bidirectional gated recurrent unit (CNN-BiGRU) models with adequate ability for inhalation, exhalation, continuous adventitious sound (CAS), and discontinuous adventitious sound detection in the lung sound. In this study, we proceeded to build a tracheal sound database, HF_Tracheal_V1, containing 11107 15-second tracheal sound recordings, 23087 inhalation labels, 16728 exhalation labels, and 6874 CAS labels. The tracheal sound in HF_Tracheal_V1 and the lung sound in HF_Lung_V2 were either combined or used alone to train the CNN-BiGRU models for respective lung and tracheal sound analysis.
Different training strategies were investigated and compared: (1) using full training (training from scratch) to train the lung sound models using lung sound alone and train the tracheal sound models using tracheal sound alone, (2) using a mixed set that contains both the lung and tracheal sound to train the models, and (3) using domain adaptation that finetuned the pre-trained lung sound models with the tracheal sound data and vice versa. Results showed that the models trained only by lung sound performed poorly in the tracheal sound analysis and vice versa. However, the mixed set training and domain adaptation can improve the performance of exhalation and CAS detection in the lung sound, and inhalation, exhalation, and CAS detection in the tracheal sound compared to positive controls (lung models trained only by lung sound and vice versa). Especially, a model derived from the mixed set training prevails in the situation of killing two birds with one stone.

强化学习(2篇)

【1】 Attend2Pack: Bin Packing through Deep Reinforcement Learning with Attention 标题:Attend2Pack:带注意的深度强化学习装箱

作者:Jingwei Zhang,Bin Zi,Xiaoyu Ge 机构:China 2Australian National University 备注:Reinforcement Learning for Real Life (RL4RealLife) Workshop in the 38th International Conference on Machine Learning, 2021 链接:https://arxiv.org/abs/2107.04333 摘要:本文试图通过学习的角度来解决装箱问题(BPP)。在基于自注意力的编码和深度强化学习算法的基础上,我们提出了一种新的端到端学习模型。通过对组合动作空间进行分解,并利用一种新的训练技术,即优先过采样(prioritized oversampling),这是一种加速同策略(on-policy)学习的通用方案,我们在一系列实验环境中获得了最先进的性能。此外,虽然所提出的方法attend2pack面向离线装箱问题(offline-BPP),但我们将方法简化到严格的在线装箱(online-BPP)设置,在该设置下它同样能够实现最先进的性能。通过一系列的消融研究以及与以前一系列工作的比较,我们希望为这一研究领域提供一个有效的基线方法。 摘要:This paper seeks to tackle the bin packing problem (BPP) through a learning perspective. Building on self-attention-based encoding and deep reinforcement learning algorithms, we propose a new end-to-end learning model for this task of interest. By decomposing the combinatorial action space, as well as utilizing a new training technique denoted as prioritized oversampling, which is a general scheme to speed up on-policy learning, we achieve state-of-the-art performance in a range of experimental settings. Moreover, although the proposed approach attend2pack targets offline-BPP, we strip our method down to the strict online-BPP setting where it is also able to achieve state-of-the-art performance. With a set of ablation studies as well as comparisons against a range of previous works, we hope to offer a valid baseline approach to this field of study.

【2】 Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning 标题:高效的基于模型的多智能体平均场强化学习

作者:Barna Pasztor,Ilija Bogunovic,Andreas Krause 机构:ETH Zürich 备注:28 pages, 2 figures, Preprint, Submitted to NeurIPS 2021 链接:https://arxiv.org/abs/2107.04050 摘要:由于智能体之间的交互所带来的固有复杂性,多智能体系统中的学习极具挑战性。我们通过平均场控制(MFC)来处理具有大量交互智能体(如群体)的系统。MFC考虑一个由相同智能体组成的渐近无限群体,其目标是协作最大化集体回报。具体而言,我们考虑系统动力学未知的情形,其目标是在优化回报的同时从经验中学习。我们提出了一个高效的基于模型的强化学习算法$\text{M}^3\text{-UCRL}$,该算法以回合(episode)方式运行,并可证明地解决了这个问题。$\text{M}^3\text{-UCRL}$在策略学习期间使用置信上界来平衡探索和利用。我们的主要理论贡献是通过一种新的平均场类型分析获得的首个针对MFC的基于模型RL的一般遗憾界。$\text{M}^3\text{-UCRL}$可以用不同的模型(如神经网络或高斯过程)实例化,并与神经网络策略学习有效结合。我们实证地证明了$\text{M}^3\text{-UCRL}$在群体运动问题上的收敛性,该问题控制无限多个个体,使其最大化位置相关的回报并避开拥挤区域。 摘要:Learning in multi-agent systems is highly challenging due to the inherent complexity introduced by agents' interactions. We tackle systems with a huge population of interacting agents (e.g., swarms) via Mean-Field Control (MFC). MFC considers an asymptotically infinite population of identical agents that aim to collaboratively maximize the collective reward. Specifically, we consider the case of unknown system dynamics where the goal is to simultaneously optimize for the rewards and learn from experience. We propose an efficient model-based reinforcement learning algorithm $\text{M}^3\text{-UCRL}$ that runs in episodes and provably solves this problem. $\text{M}^3\text{-UCRL}$ uses upper-confidence bounds to balance exploration and exploitation during policy learning. Our main theoretical contributions are the first general regret bounds for model-based RL for MFC, obtained via a novel mean-field type analysis. $\text{M}^3\text{-UCRL}$ can be instantiated with different models such as neural networks or Gaussian Processes, and effectively combined with neural network policy learning. We empirically demonstrate the convergence of $\text{M}^3\text{-UCRL}$ on the swarm motion problem of controlling an infinite population of agents seeking to maximize location-dependent reward and avoid congested areas.

医学相关(2篇)

【1】 Multi-level Stress Assessment from ECG in a Virtual Reality Environment using Multimodal Fusion 标题:基于多模式融合的虚拟现实环境心电多水平应力评估

作者:Zeeshan Ahmad,Suha Rabbani,Muhammad Rehman Zafar,Syem Ishaque,Sridhar Krishnan,Naimul Khan 备注:Under review 链接:https://arxiv.org/abs/2107.04566 摘要:由于其非侵入性,心电图(ECG)是在严肃虚拟现实(serious VR)应用中评估压力的一个有吸引力的选择。然而,现有的机器学习模型性能较差。此外,现有的研究只进行二元压力评估,而要开发更具吸引力的基于生物反馈的应用,多水平的评估是必要的。现有研究将单一体验(如观看虚拟现实视频)标注并分类为单一的压力水平,这同样妨碍了动态体验的设计,而动态体验可以利用实时游戏中的压力评估。在本文中,我们报告了一项关于虚拟现实压力评估的新研究的结果,其中评估了三种压力水平。心电图数据收集自9名体验VR过山车的用户。然后,由三名评分员在10秒的时间段内将虚拟现实体验手动标注为三种压力水平。接着,我们提出了一个新的多模态深度融合模型,利用频谱图和一维心电图,仅从一秒钟的窗口即可给出压力预测。实验结果表明,该模型优于经典的基于HRV的ML模型(准确率提高9%)和基线深度学习模型(准确率提高2.5%)。我们还报告了基准WESAD数据集上的结果,以显示该模型的优越性。 摘要:ECG is an attractive option to assess stress in serious Virtual Reality (VR) applications due to its non-invasive nature. However, the existing Machine Learning (ML) models perform poorly. Moreover, existing studies only perform a binary stress assessment, while to develop a more engaging biofeedback-based application, multi-level assessment is necessary. Existing studies annotate and classify a single experience (e.g. watching a VR video) to a single stress level, which again prevents design of dynamic experiences where real-time in-game stress assessment can be utilized. In this paper, we report our findings on a new study on VR stress assessment, where three stress levels are assessed. ECG data was collected from 9 users experiencing a VR roller coaster. The VR experience was then manually labeled in 10-seconds segments to three stress levels by three raters. We then propose a novel multimodal deep fusion model utilizing spectrogram and 1D ECG that can provide a stress prediction from just a 1-second window. Experimental results demonstrate that the proposed model outperforms the classical HRV-based ML models (9% increase in accuracy) and baseline deep learning models (2.5% increase in accuracy). We also report results on the benchmark WESAD dataset to show the superiority of the model.

【2】 3D RegNet: Deep Learning Model for COVID-19 Diagnosis on Chest CT Image 标题:3D RegNet:胸部CT图像冠状病毒诊断的深度学习模型

作者:Haibo Qi,Yuhan Wang,Xinyu Liu 机构:Xidian University, Xian, China 链接:https://arxiv.org/abs/2107.04055 摘要:本文提出了一种基于三维RegNet的神经网络,用于诊断冠状病毒(Covid-19)感染患者的身体状况。在临床医学的应用中,肺部CT图像被医生用来判断病人是否感染了冠状病毒。然而,这种诊断方法存在一些不足,例如耗时长、准确率低。肺作为人体较大的器官,如果利用二维切片图像进行诊断,会丢失重要的空间特征。为此,本文设计了一个基于三维图像的深度学习模型。三维图像作为输入数据,由二维肺部图像序列组成,从中提取相关的冠状病毒感染三维特征并进行分类。结果表明,三维模型在测试集上取得了0.8379的F1得分和0.8807的AUC值。 摘要:In this paper, a 3D-RegNet-based neural network is proposed for diagnosing the physical condition of patients with coronavirus (Covid-19) infection. In the application of clinical medicine, lung CT images are utilized by practitioners to determine whether a patient is infected with coronavirus. However, this diagnostic method has some drawbacks, such as being time-consuming and having low accuracy. Since the lungs are a relatively large organ of the human body, important spatial features would be lost if they were diagnosed using two-dimensional slice images. Therefore, in this paper, a deep learning model with 3D images was designed. The 3D input data comprised two-dimensional pulmonary image sequences, from which relevant coronavirus-infection 3D features were extracted and classified. The results show that, on the test set, the 3D model achieved an F1 score of 0.8379 and an AUC value of 0.8807.

蒸馏|知识提取(3篇)

【1】 Form2Seq : A Framework for Higher-Order Form Structure Extraction 标题:Form2Seq:一种高阶表单结构提取框架

作者:Milan Aggarwal,Hiresh Gupta,Mausoom Sarkar,Balaji Krishnamurthy 机构:Media and Data Science Research Labs, Adobe, Adobe Experience Cloud 备注:This paper has been presented at EMNLP 2020 链接:https://arxiv.org/abs/2107.04419 摘要:文档结构提取是近几十年来广泛研究的一个领域,最近的工作将其作为一种基于完全卷积网络的文档图像语义分割任务。这种方法受图像分辨率的限制,无法消除密集区域(表单中常见)中的结构歧义。为了缓解这一问题,我们提出了Form2Seq,一个新颖的受序列到序列(Seq2Seq)启发的框架,用于使用文本进行结构提取,特别关注表单,它利用了结构的相对空间排列。我们讨论两个任务:1) 将底层组成元素(TextBlock和空可填充小部件)分类为10种类型,如字段标题、列表项和其他;2) 将低级元素分组为高阶结构,如文本字段、选项字段和选项组,用作表单中的信息收集机制。为了实现这一点,我们按照自然的阅读顺序线性地排列组成元素,将它们的空间和文本表示馈入Seq2Seq框架,该框架根据最终任务依次输出每个元素的预测。我们针对分组任务修改了Seq2Seq,并讨论了两个任务级联端到端训练相对于单独训练所获得的改进。实验结果表明了我们基于文本的方法的有效性:在分类任务上达到90%的准确率,在上述分组任务上分别取得75.82、86.01和61.63的F1,优于分割基线。此外,我们还展示了我们的框架在ICDAR 2013数据集上取得了表格结构识别的最新(state-of-the-art)结果。 摘要:Document structure extraction has been a widely researched area for decades with recent works performing it as a semantic segmentation task over document images using fully-convolution networks. Such methods are limited by image resolution due to which they fail to disambiguate structures in dense regions which appear commonly in forms. To mitigate this, we propose Form2Seq, a novel sequence-to-sequence (Seq2Seq) inspired framework for structure extraction using text, with a specific focus on forms, which leverages relative spatial arrangement of structures. We discuss two tasks; 1) Classification of low-level constituent elements (TextBlock and empty fillable Widget) into ten types such as field captions, list items, and others; 2) Grouping lower-level elements into higher-order constructs, such as Text Fields, ChoiceFields and ChoiceGroups, used as information collection mechanism in forms. To achieve this, we arrange the constituent elements linearly in natural reading order, feed their spatial and textual representations to Seq2Seq framework, which sequentially outputs prediction of each element depending on the final task.
We modify Seq2Seq for grouping task and discuss improvements obtained through cascaded end-to-end training of two tasks versus training in isolation. Experimental results show the effectiveness of our text-based approach achieving an accuracy of 90% on classification task and an F1 of 75.82, 86.01, 61.63 on groups discussed above respectively, outperforming segmentation baselines. Further, we show our framework achieves state-of-the-art results for table structure recognition on the ICDAR 2013 dataset.
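The linearization step, arranging constituent elements in natural reading order before feeding them to the Seq2Seq model, can be sketched as follows (a simplified heuristic with an invented vertical tolerance; the paper does not spell out this exact procedure):

```python
def reading_order(elements, line_tol=10):
    """Arrange layout elements (each with top-left x, y coordinates) in
    natural reading order: top-to-bottom lines, then left-to-right
    within a line. line_tol is the vertical tolerance (in pixels) for
    treating two elements as lying on the same line."""
    elems = sorted(elements, key=lambda e: (e["y"], e["x"]))
    ordered, current, last_y = [], [], None
    for e in elems:
        if last_y is not None and abs(e["y"] - last_y) > line_tol:
            # The previous line is complete: flush it left-to-right.
            ordered += sorted(current, key=lambda e: e["x"])
            current = []
        current.append(e)
        last_y = e["y"]
    ordered += sorted(current, key=lambda e: e["x"])
    return ordered

# Toy form: "a" and "b" share a line, "c" sits below them.
elements = [{"id": "b", "x": 200, "y": 12},
            {"id": "a", "x": 10, "y": 8},
            {"id": "c", "x": 15, "y": 60}]
order = [e["id"] for e in reading_order(elements)]  # -> ["a", "b", "c"]
```

A real system would track line membership more robustly (e.g. by bounding-box overlap rather than a fixed tolerance), but the output of this step is exactly what the Seq2Seq encoder consumes: one linear sequence of elements.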

【2】 UniRE: A Unified Label Space for Entity Relation Extraction 标题:UniRE:一种用于实体关系抽取的统一标签空间

作者:Yijun Wang,Changzhi Sun,Yuanbin Wu,Hao Zhou,Lei Li,Junchi Yan 机构:Department of Computer Science and Engineering, Shanghai Jiao Tong University, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, School of Computer Science and Technology, East China Normal University, ByteDance AI Lab 备注:ACL2021 链接:https://arxiv.org/abs/2107.04292 摘要:许多联合实体关系抽取模型为两个子任务(即实体检测和关系分类)建立了两个独立的标签空间。我们认为,这种设置可能会阻碍实体和关系之间的信息交互。在这项工作中,我们建议消除对两个子任务标签空间的不同处理。我们模型的输入是一个包含一个句子中所有单词对的表。实体和关系在表中分别用正方形和矩形表示。我们使用一个统一的分类器来预测每个单元格的标签,从而统一了两个子任务的学习。在测试时,我们提出了一种有效(且快速)的近似解码器,用于从表中找出正方形和矩形。在三个基准(ACE04、ACE05、SciERC)上的实验表明,我们的模型只需使用一半的参数,就能达到与最佳抽取器相当的精度,而且速度更快。 摘要:Many joint entity relation extraction models setup two separated label spaces for the two sub-tasks (i.e., entity detection and relation classification). We argue that this setting may hinder the information interaction between entities and relations. In this work, we propose to eliminate the different treatment on the two sub-tasks' label spaces. The input of our model is a table containing all word pairs from a sentence. Entities and relations are represented by squares and rectangles in the table. We apply a unified classifier to predict each cell's label, which unifies the learning of two sub-tasks. For testing, an effective (yet fast) approximate decoder is proposed for finding squares and rectangles from tables. Experiments on three benchmarks (ACE04, ACE05, SciERC) show that, using only half the number of parameters, our model achieves competitive accuracy with the best extractor, and is faster.
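A toy version of the unified label table can be sketched as follows (the label set, sentence, and run-based decoder are invented for illustration; the paper's decoder is more careful — e.g. two adjacent entities of the same type would be merged by this naive diagonal scan):

```python
import numpy as np

NONE, PER, ORG, WORKS_FOR = 0, 1, 2, 3   # toy label ids

def build_table(n, entities, relations):
    """Fill a word-pair table: an entity (i, j, label) is a square on
    the diagonal block [i..j] x [i..j]; a relation (span1, span2, label)
    is the off-diagonal rectangle spanned by the two entity spans."""
    T = np.full((n, n), NONE, dtype=int)
    for i, j, lab in entities:
        T[i:j + 1, i:j + 1] = lab
    for (i1, j1), (i2, j2), lab in relations:
        T[i1:j1 + 1, i2:j2 + 1] = lab
    return T

def decode_entities(T):
    """Approximate decoder: scan the diagonal for maximal runs of the
    same non-NONE label and report them as entity squares."""
    spans, i, n = [], 0, T.shape[0]
    while i < n:
        lab = T[i, i]
        if lab == NONE:
            i += 1
            continue
        j = i
        while j + 1 < n and T[j + 1, j + 1] == lab:
            j += 1
        spans.append((i, j, int(lab)))
        i = j + 1
    return spans

# "John works for Acme Corp": PER span (0,0), ORG span (3,4), one relation.
T = build_table(5, [(0, 0, PER), (3, 4, ORG)],
                [((0, 0), (3, 4), WORKS_FOR)])
```

Decoding entities first and then reading the relation label off the rectangle between two decoded spans is what lets a single per-cell classifier serve both sub-tasks.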

【3】 Multi-path Convolutional Neural Networks Efficiently Improve Feature Extraction in Continuous Adventitious Lung Sound Detection 标题:多路径卷积神经网络有效提高连续异位肺音检测的特征提取

作者:Fu-Shun Hsu,Shang-Ran Huang,Chien-Wen Huang,Chun-Chieh Chen,Yuan-Ren Cheng,Feipei Lai 机构: Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan, University, Taipei, Taiwan, Department of Critical Care Medicine, Far Eastern Memorial Hospital, New Taipei, Heroic Faith Medical Science Co., Ltd., Taipei, Taiwan 备注:To be submitted, 32 pages, 8 figures, 2 tables 链接:https://arxiv.org/abs/2107.04226 摘要:我们之前建立了一个大型的肺音数据库HF_Lung_V2(Lung_V2)。我们训练了卷积双向门控循环单元(CNN-BiGRU)网络,用于在记录水平上检测吸入、呼气、连续不定音(CAS)和间断不定音。然而,由于多种原因,CAS检测性能较差,其中一个原因就是CAS模式的高度多样化。为了使原来的CNN-BiGRU模型更有效地学习CAS模式,并且不造成太大的计算负担,研究了三种对CNN层网络结构进行最小修改的策略:(1)利用残差块使CNN层更深一点;(2)通过增加CNN核的数目,使CNN层更宽一点;(3)将特征输入分为多条路径(该模型记为多路径CNN-BiGRU)。评估了CAS片段和事件检测的性能。 摘要:We previously established a large lung sound database, HF_Lung_V2 (Lung_V2). We trained convolutional-bidirectional gated recurrent unit (CNN-BiGRU) networks for detecting inhalation, exhalation, continuous adventitious sound (CAS) and discontinuous adventitious sound at the recording level on the basis of Lung_V2. However, the performance of CAS detection was poor due to many reasons, one of which is the highly diversified CAS patterns. To make the original CNN-BiGRU model learn the CAS patterns more effectively and not cause too much computing burden, three strategies involving minimal modifications of the network architecture of the CNN layers were investigated: (1) making the CNN layers a bit deeper by using the residual blocks, (2) making the CNN layers a bit wider by increasing the number of CNN kernels, and (3) separating the feature input into multiple paths (the model was denoted by Multi-path CNN-BiGRU). The performance of CAS segment and event detection were evaluated.
Results showed that improvement in CAS detection was observed among all the proposed architecture-modified models. The F1 score for CAS event detection of the proposed models increased from 0.445 to 0.491-0.530, which was deemed significant. However, the Multi-path CNN-BiGRU model outperformed the other models in terms of the number of winning titles (five) in total nine evaluation metrics. In addition, the Multi-path CNN-BiGRU model did not cause extra computing burden (0.97-fold inference time) compared to the original CNN-BiGRU model. Conclusively, the Multi-path CNN layers can efficiently improve the effectiveness of feature extraction and subsequently result in better CAS detection.

自动驾驶|车辆|车道检测等(3篇)

【1】 ARC: Adversarially Robust Control Policies for Autonomous Vehicles 标题:ARC:自主车辆的对抗性鲁棒控制策略

作者:Sampo Kuutti,Saber Fallah,Richard Bowden 备注:Accepted in IEEE Intelligent Transportation Systems Conference (ITSC) 2021 链接:https://arxiv.org/abs/2107.04487 摘要:深度神经网络已经证明了其学习各种任务控制策略的能力。然而,这些基于神经网络的策略已被证明容易被敌对代理利用。因此,有必要开发技术来学习对对手具有鲁棒性的控制策略。我们引入了对抗鲁棒控制(ARC),它在相同的损失下,端到端地训练主角策略和对手策略。主角的目标是最大化该损失,而对手则试图将其最小化。我们在一个高速公路驾驶场景中演示了所提出的ARC训练,其中主角控制跟随车辆,而对手控制领头车辆。通过训练主角对抗一组对手,它学习到一种鲁棒性显著更强的控制策略,该策略可推广到多种对抗策略。结果表明,与原策略相比,该方法在面对新对手时最多可减少90.25%的碰撞次数。此外,通过利用辅助蒸馏损失,我们证明了微调后的控制策略在其原始训练分布上的性能没有下降。 摘要:Deep neural networks have demonstrated their capability to learn control policies for a variety of tasks. However, these neural network-based policies have been shown to be susceptible to exploitation by adversarial agents. Therefore, there is a need to develop techniques to learn control policies that are robust against adversaries. We introduce Adversarially Robust Control (ARC), which trains the protagonist policy and the adversarial policy end-to-end on the same loss. The aim of the protagonist is to maximise this loss, whilst the adversary is attempting to minimise it. We demonstrate the proposed ARC training in a highway driving scenario, where the protagonist controls the follower vehicle whilst the adversary controls the lead vehicle. By training the protagonist against an ensemble of adversaries, it learns a significantly more robust control policy, which generalises to a variety of adversarial strategies. The approach is shown to reduce the amount of collisions against new adversaries by up to 90.25%, compared to the original policy. Moreover, by utilising an auxiliary distillation loss, we show that the fine-tuned control policy shows no drop in performance across its original training distribution.
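The min-max structure described in the abstract — both players trained end-to-end on one shared loss, the protagonist ascending and the adversary descending — can be illustrated on a toy scalar game (the quadratic loss and learning rate are invented; the paper applies this to driving policies, not scalar parameters):

```python
def arc_train(steps=500, lr=0.05):
    """Toy ARC-style training: protagonist parameter p does gradient
    ASCENT on a shared loss while adversary parameter a does gradient
    DESCENT on the same loss, mirroring the roles in the abstract.
    Loss: L(p, a) = -(p - 1)^2 + (a + 2)^2, so the protagonist's
    optimum is p = 1 and the adversary's is a = -2."""
    p, a = 5.0, 5.0
    for _ in range(steps):
        grad_p = -2.0 * (p - 1.0)   # dL/dp
        grad_a = 2.0 * (a + 2.0)    # dL/da
        p += lr * grad_p            # ascent on the shared loss
        a -= lr * grad_a            # descent on the same loss
    return p, a

p, a = arc_train()   # p -> 1, a -> -2
```

With a separable loss the simultaneous updates converge cleanly; in the actual adversarial driving setting the players are coupled, which is what makes training against an ensemble of adversaries necessary.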

【2】 A First Look at Class Incremental Learning in Deep Learning Mobile Traffic Classification 标题:深度学习移动流量分类中的类增量学习初探

作者:Giampaolo Bovenzi,Lixuan Yang,Alessandro Finamore,Giuseppe Aceto,Domenico Ciuonzo,Antonio Pescapè,Dario Rossi 机构:Huawei Technology France, † University of Napoli Federico II 备注:Accepted for publication at Network Traffic Measurement and Analysis Conference (TMA), September 2021 链接:https://arxiv.org/abs/2107.04464 摘要:近年来,深度学习(Deep Learning,DL)的流行再次激发了人们对流量分类的兴趣,一些研究表明,基于DL的分类器能够准确地识别互联网应用程序的流量。即使有硬件加速器(GPU、TPU)的帮助,DL模型训练仍然很昂贵,并且限制了频繁更新模型以适应不断变化的互联网流量(特别是移动流量)的能力。为了解决这个痛点,在这项工作中,我们探索了增量学习(IL)技术,在不进行完全再训练的情况下向模型添加新类,从而加快模型的更新周期。我们采用最先进的IL方法iCarl,并使用MIRAGE-2019(一个包含40个Android应用流量的公共数据集),旨在理解"流量分类中是否存在增量学习的用例"。通过剖析iCarl的内部结构,我们讨论了改进其设计的方法,并给出了一个修订版本,即iCarl 。尽管我们的分析表明这些技术尚处于起步阶段,但在通往自动化的基于DL的流量分析系统的道路上,IL技术是一个很有前途的研究方向。 摘要:The recent popularity growth of Deep Learning (DL) re-ignited the interest towards traffic classification, with several studies demonstrating the accuracy of DL-based classifiers to identify Internet applications' traffic. Even with the aid of hardware accelerators (GPUs, TPUs), DL model training remains expensive, and limits the ability to operate frequent model updates necessary to fit to the ever evolving nature of Internet traffic, and mobile traffic in particular. To address this pain point, in this work we explore Incremental Learning (IL) techniques to add new classes to models without a full retraining, hence speeding up model's updates cycle. We consider iCarl, a state of the art IL method, and MIRAGE-2019, a public dataset with traffic from 40 Android apps, aiming to understand "if there is a case for incremental learning in traffic classification". By dissecting iCarl internals, we discuss ways to improve its design, contributing a revised version, namely iCarl . Although our analysis reveals their infancy, IL techniques are a promising research area on the roadmap towards automated DL-based traffic analysis systems.
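The part of iCarl that makes adding a class cheap is its nearest-mean-of-exemplars classification rule, which can be sketched as follows (feature vectors and class names are synthetic; a real deployment would use features from the trained traffic-classification backbone):

```python
import numpy as np

class NearestMeanClassifier:
    """Sketch of iCarl's nearest-mean-of-exemplars rule: each class is
    represented by the mean of its L2-normalised exemplar features, so
    adding a class only requires storing one new mean vector -- no full
    retraining over the old classes."""
    def __init__(self):
        self.means = {}   # class name -> normalised mean feature

    def add_class(self, cls, exemplar_feats):
        m = np.asarray(exemplar_feats).mean(axis=0)
        self.means[cls] = m / np.linalg.norm(m)

    def predict(self, feat):
        feat = feat / np.linalg.norm(feat)
        # Cosine similarity to each stored class mean.
        return max(self.means, key=lambda c: feat @ self.means[c])

rng = np.random.default_rng(1)
clf = NearestMeanClassifier()
proto_a, proto_b = np.array([1., 0., 0.]), np.array([0., 1., 0.])
clf.add_class("app_a", proto_a + 0.05 * rng.normal(size=(20, 3)))
clf.add_class("app_b", proto_b + 0.05 * rng.normal(size=(20, 3)))
# Later, incrementally add a third traffic class without retraining:
proto_c = np.array([0., 0., 1.])
clf.add_class("app_c", proto_c + 0.05 * rng.normal(size=(20, 3)))
```

The full method also includes exemplar selection (herding) and a distillation loss when the feature extractor itself is updated; this sketch covers only the classification rule.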

【3】 Learning to Delegate for Large-scale Vehicle Routing 标题:大规模车辆路径问题的授权学习

作者:Sirui Li,Zhongxia Yan,Cathy Wu 机构:MIT 链接:https://arxiv.org/abs/2107.04139 摘要:车辆路径问题(VRP)是一类具有广泛实际应用的组合问题。虽然以前的启发式或基于学习的工作可以在多达100个客户的小问题实例上获得不错的解,但它们的性能不能扩展到大问题。针对大规模VRP问题,本文提出了一种新的学习增强局部搜索算法。该方法通过识别适当的子问题,并将它们的改进委托(delegating)给一个黑盒子求解器,来迭代地改进解。在每个步骤中,我们利用空间局部性,只考虑线性数量的子问题,而不是指数数量。我们将子问题选择建模为回归问题,并在生成的问题实例训练集上训练一个Transformer。我们证明,在规模为500到3000的VRP上,我们的方法达到了最先进的性能,速度比强基线最多提高15倍。 摘要:Vehicle routing problems (VRPs) are a class of combinatorial problems with wide practical applications. While previous heuristic or learning-based works achieve decent solutions on small problem instances of up to 100 customers, their performance does not scale to large problems. This article presents a novel learning-augmented local search algorithm to solve large-scale VRP. The method iteratively improves the solution by identifying appropriate subproblems and $\textit{delegating}$ their improvement to a black box subsolver. At each step, we leverage spatial locality to consider only a linear number of subproblems, rather than exponential. We frame subproblem selection as a regression problem and train a Transformer on a generated training set of problem instances. We show that our method achieves state-of-the-art performance, with a speed-up of up to 15 times over strong baselines, on VRPs with sizes ranging from 500 to 3000.
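The spatial-locality idea — only a linear number of neighbourhood-based subproblems is ever considered — can be sketched with plain k-nearest-neighbour subproblem construction (coordinates and subproblem size are invented; the paper additionally learns, via a Transformer regressor, which of these candidates to delegate next):

```python
import numpy as np

def local_subproblems(coords, k=5):
    """For each customer, form one candidate subproblem from its k
    nearest neighbours -- a linear number of subproblems in n, rather
    than the exponential number of arbitrary customer subsets."""
    n = len(coords)
    subproblems = []
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        nearest = np.argsort(d)[:k]   # includes customer i itself (distance 0)
        subproblems.append(sorted(nearest.tolist()))
    return subproblems

rng = np.random.default_rng(2)
coords = rng.random((30, 2))          # 30 customers in the unit square
subs = local_subproblems(coords, k=5)
```

Each candidate subset would then be handed to the black-box subsolver; only the selection of which subset to improve is learned.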

联邦学习|隐私保护|加密(1篇)

【1】 Personalized Federated Learning over non-IID Data for Indoor Localization 标题:基于非IID数据的个性化联合学习室内定位

作者:Peng Wu,Tales Imbiriba,Junha Park,Sunwoo Kim,Pau Closas 机构: Electrical and Computer Engineering Department, Northeastern University, Boston, MA, Department of Electronic Engineering, Hanyang University, Seoul, Korea 链接:https://arxiv.org/abs/2107.04189 摘要:由于无线信道传播模型物理特性的复杂性,利用数据驱动方法对目标进行定位和跟踪是一个热门话题。在这些建模方法中,需要在维护用户隐私的同时收集数据以准确地训练模型。合作实现这些目标的一个吸引人的方案被称为联邦学习(FL)。FL方案的一个挑战是存在非独立同分布(非IID)数据,这是由对不同区域的不均匀探索造成的。在本文中,我们考虑使用最近的FL方案训练一组个性化的模型,然后通过贝叶斯规则进行最优融合,这使其适用于室内定位场景。 摘要:Localization and tracking of objects using data-driven methods is a popular topic due to the complexity in characterizing the physics of wireless channel propagation models. In these modeling approaches, data needs to be gathered to accurately train models, at the same time that user's privacy is maintained. An appealing scheme to cooperatively achieve these goals is known as Federated Learning (FL). A challenge in FL schemes is the presence of non-independent and identically distributed (non-IID) data, caused by unevenly exploration of different areas. In this paper, we consider the use of recent FL schemes to train a set of personalized models that are then optimally fused through Bayesian rules, which makes it appropriate in the context of indoor localization.
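The fusion step can be illustrated with the standard precision-weighted product-of-Gaussians rule, a common Bayesian way to combine independent Gaussian estimates of the same quantity (the numbers are invented, and the paper's exact fusion rule may differ):

```python
import numpy as np

def fuse_gaussian(mus, sigmas2):
    """Precision-weighted (product-of-Gaussians) fusion of per-client
    position estimates: the fused precision is the sum of the
    individual precisions, and the fused mean is the precision-weighted
    average of the individual means."""
    mus, sigmas2 = np.asarray(mus, float), np.asarray(sigmas2, float)
    prec = 1.0 / sigmas2
    var = 1.0 / prec.sum()
    mu = var * (prec * mus).sum()
    return mu, var

# Two personalized models estimate the same 1-D coordinate (in metres):
mu, var = fuse_gaussian([2.0, 3.0], [1.0, 4.0])   # -> mu = 2.2, var = 0.8
```

Note the fused variance is smaller than either input variance — combining the personalized models never loses information under this independence assumption.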

推理|分析|理解|解释(7篇)

【1】 Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation 标题:RELU激活人工神经网络训练中梯度流的收敛性分析

作者:Arnulf Jentzen,Adrian Riekert 机构: Applied Mathematics: Institute for Analysis and Numerics, University of Münster, Germany, e-mail: ajentzen@uni-muenster.de, School of Data Science and Shenzhen Research Institute of Big Data 备注:37 pages 链接:https://arxiv.org/abs/2107.04479 摘要:梯度下降(GD)型优化方案是训练具有修正线性单元(ReLU)激活的人工神经网络(ANN)的标准方法。这种方案可以看作是与ReLU激活神经网络训练相关的梯度流(GF)的离散化;GD型优化方案在训练ReLU激活神经网络时的数学收敛性分析中的大部分关键困难,似乎已经存在于相应GF微分方程的动力学之中。本文的核心工作正是在训练具有ReLU激活的三层(一层输入层、一层隐藏层和一层输出层)神经网络的背景下分析这类GF微分方程。特别地,在本文中,我们证明了在目标函数可能是多维且连续的情况下,以及在输入数据的概率分布关于Lebesgue测度绝对连续的情况下,每个有界GF轨迹的风险收敛到某个临界点的风险。另外,本文在一维仿射线性目标函数且输入数据的概率分布为标准均匀分布的情况下,证明了当初始风险足够小时,每个有界GF轨迹的风险收敛到零。最后,在隐层只有一个神经元(一维隐层)的特殊情况下,我们加强了上述针对仿射线性目标函数的结果,证明了当初始风险足够小时,每个(不一定有界的)GF轨迹的风险都收敛到零。 摘要:Gradient descent (GD) type optimization schemes are the standard methods to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Such schemes can be considered as discretizations of gradient flows (GFs) associated to the training of ANNs with ReLU activation and most of the key difficulties in the mathematical convergence analysis of GD type optimization schemes in the training of ANNs with ReLU activation seem to be already present in the dynamics of the corresponding GF differential equations. It is the key subject of this work to analyze such GF differential equations in the training of ANNs with ReLU activation and three layers (one input layer, one hidden layer, and one output layer). In particular, in this article we prove in the case where the target function is possibly multi-dimensional and continuous and in the case where the probability distribution of the input data is absolutely continuous with respect to the Lebesgue measure that the risk of every bounded GF trajectory converges to the risk of a critical point.
In addition, in this article we show in the case of a 1-dimensional affine linear target function and in the case where the probability distribution of the input data coincides with the standard uniform distribution that the risk of every bounded GF trajectory converges to zero if the initial risk is sufficiently small. Finally, in the special situation where there is only one neuron on the hidden layer (1-dimensional hidden layer) we strengthen the above named result for affine linear target functions by proving that the risk of every (not necessarily bounded) GF trajectory converges to zero if the initial risk is sufficiently small.
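The relationship between GD and the gradient flow that motivates the paper can be made concrete with an explicit-Euler discretisation of the GF for a tiny one-hidden-layer ReLU network (sizes, data, target, and step size are invented; this only illustrates the setting, not the paper's proofs):

```python
import numpy as np

rng = np.random.default_rng(3)

# Tiny 1-16-1 ReLU network and a toy regression risk on U(0,1) inputs.
X = rng.uniform(0, 1, size=(64, 1))
Y = np.abs(X - 0.5)                      # a continuous target function
W1, b1 = rng.normal(size=(1, 16)), np.zeros(16)
W2 = rng.normal(size=(16, 1)) * 0.1

def risk(W1, b1, W2):
    H = np.maximum(X @ W1 + b1, 0.0)     # ReLU hidden layer
    return float(np.mean((H @ W2 - Y) ** 2))

def grads(W1, b1, W2):
    H = np.maximum(X @ W1 + b1, 0.0)
    E = (H @ W2 - Y) * (2.0 / len(X))
    dW2 = H.T @ E
    dH = (E @ W2.T) * (H > 0)            # ReLU is differentiable a.e.
    return X.T @ dH, dH.sum(axis=0), dW2

# Explicit Euler steps theta_{k+1} = theta_k - eta * grad approximate the
# gradient flow d theta / dt = -grad; the risk decreases along the path.
eta, risks = 0.05, [risk(W1, b1, W2)]
for _ in range(300):
    dW1, db1, dW2 = grads(W1, b1, W2)
    W1, b1, W2 = W1 - eta * dW1, b1 - eta * db1, W2 - eta * dW2
    risks.append(risk(W1, b1, W2))
```

The paper's results concern the exact continuous-time flow (eta -> 0) and its convergence to critical points; the discretised trajectory above is only the numerical shadow of that object.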

【2】 Understanding the Distributions of Aggregation Layers in Deep Neural Networks 标题:理解深度神经网络中聚合层的分布

作者:Eng-Jon Ong,Sameed Husain,Miroslaw Bober 机构: University of Surrey 链接:https://arxiv.org/abs/2107.04458 摘要:聚合过程在几乎所有的深度网络模型中都是普遍存在的。它是一种重要的机制,用于将深层特征整合为更紧凑的表示,同时提高对过拟合的鲁棒性,并在深度网络中提供空间不变性。特别地,全局聚合层与DNN输出层的接近性意味着聚合特征对深度网络的性能有直接的影响。利用信息论方法可以更好地理解这种关系。然而,这需要了解聚合层激活的分布。为了实现这一点,我们提出了一种新的数学公式,用于解析地建模涉及深度特征聚合的层的输出值的概率分布。一个重要的成果是,我们能够解析地预测DNN中输出节点的KL散度。我们还在一系列不同的分类任务和数据集上,通过实验验证了我们的理论预测与经验观测相符。 摘要:The process of aggregation is ubiquitous in almost all deep nets models. It functions as an important mechanism for consolidating deep features into a more compact representation, whilst increasing robustness to overfitting and providing spatial invariance in deep nets. In particular, the proximity of global aggregation layers to the output layers of DNNs mean that aggregated features have a direct influence on the performance of a deep net. A better understanding of this relationship can be obtained using information theoretic methods. However, this requires the knowledge of the distributions of the activations of aggregation layers. To achieve this, we propose a novel mathematical formulation for analytically modelling the probability distributions of output values of layers involved with deep feature aggregation. An important outcome is our ability to analytically predict the KL-divergence of output nodes in a DNN. We also experimentally verify our theoretical predictions against empirical observations across a range of different classification tasks and datasets.
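One simple instance of the analytic-versus-empirical comparison the paper performs can be sketched by modelling two aggregation-layer output nodes as Gaussians (an invented stand-in for the paper's actual distributional model) and checking the closed-form KL against a Monte Carlo estimate:

```python
import numpy as np

def kl_gauss(mu1, s1, mu2, s2):
    """Closed-form KL( N(mu1, s1^2) || N(mu2, s2^2) )."""
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

# Treat two output nodes as Gaussian and verify the analytic KL
# against an empirical estimate from samples of the first node.
rng = np.random.default_rng(4)
mu1, s1, mu2, s2 = 0.0, 1.0, 1.0, 2.0
x = rng.normal(mu1, s1, size=200_000)
log_p = -0.5 * ((x - mu1) / s1) ** 2 - np.log(s1 * np.sqrt(2 * np.pi))
log_q = -0.5 * ((x - mu2) / s2) ** 2 - np.log(s2 * np.sqrt(2 * np.pi))
kl_mc = float(np.mean(log_p - log_q))   # Monte Carlo KL estimate
kl_exact = float(kl_gauss(mu1, s1, mu2, s2))
```

The paper's contribution is deriving the distribution of aggregated activations itself; once a distributional form is in hand, this is the kind of analytic-versus-empirical check its experiments run at scale.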

【3】 Likelihood ratio-based policy gradient methods for distorted risk measures: A non-asymptotic analysis 标题:基于似然比的扭曲风险测度政策梯度方法的非渐近分析

作者:Nithia Vijayan,Prashanth L. A 机构:Department of Computer Science and Engineering, Indian Institute of Technology Madras 链接:https://arxiv.org/abs/2107.04422 摘要:我们提出了策略梯度算法来解决风险敏感强化学习(RL)环境下的控制问题。该算法的目标是在一个幕式马尔可夫决策过程(MDP)中最大化累积报酬的扭曲风险测度(DRM)。我们推导了策略梯度定理的一个变体,以适配DRM目标。利用这个定理,结合基于似然比(LR)的梯度估计方案,我们提出了在同策略(on-policy)和异策略(off-policy)RL设置下优化DRM的策略梯度算法。我们推导了非渐近界,证明我们的算法收敛到DRM目标的一个近似驻点。 摘要:We propose policy-gradient algorithms for solving the problem of control in a risk-sensitive reinforcement learning (RL) context. The objective of our algorithm is to maximize the distorted risk measure (DRM) of the cumulative reward in an episodic Markov decision process (MDP). We derive a variant of the policy gradient theorem that caters to the DRM objective. Using this theorem in conjunction with a likelihood ratio (LR) based gradient estimation scheme, we propose policy gradient algorithms for optimizing DRM in both on-policy and off-policy RL settings. We derive non-asymptotic bounds that establish the convergence of our algorithms to an approximate stationary point of the DRM objective.
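An empirical distorted risk measure — the quantity these policy gradients optimise — can be computed directly from sampled returns as follows (a standard sample-based DRM with a power distortion chosen for illustration; the paper's estimators are gradient-based, not this direct form):

```python
import numpy as np

def drm(rewards, g):
    """Empirical distorted risk measure: sort outcomes in decreasing
    order and weight the i-th largest by g(i/n) - g((i-1)/n), where g
    is a distortion function with g(0) = 0 and g(1) = 1."""
    x = np.sort(np.asarray(rewards, float))[::-1]
    n = len(x)
    i = np.arange(1, n + 1)
    w = g(i / n) - g((i - 1) / n)
    return float((w * x).sum())

rewards = np.array([1.0, 2.0, 3.0, 4.0])
mean_val = drm(rewards, lambda u: u)       # identity distortion -> plain mean
averse = drm(rewards, lambda u: u ** 2)    # convex g overweights low rewards
```

With the identity distortion the DRM reduces to the expected return; a convex distortion shifts weight onto the worst outcomes, giving the risk-averse objective the policy gradient then ascends.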

【4】 Understanding surrogate explanations: the interplay between complexity, fidelity and coverage 标题:理解代理解释:复杂性、保真度和覆盖率之间的相互作用

作者:Rafael Poyiadzi,Xavier Renard,Thibault Laugel,Raul Santos-Rodriguez,Marcin Detyniecki 机构: University of Bristol, Bristol, United Kingdom, AXA, Paris, France, Sorbonne Université, CNRS, LIP, F-, Paris, France, Polish Academy of Science, IBS PAN, Warsaw, Poland 备注:12 pages, 8 figures 链接:https://arxiv.org/abs/2107.04309 摘要:本文分析了替代解释背后的基本要素,以更好地理解其内在机制。我们从考虑全局代理开始阐述,描述代理的复杂度与其对所建模黑盒的保真度之间的权衡。我们表明,从全局过渡到局部(即减少覆盖率),能在代理的保真度-复杂度帕累托前沿上获得更有利的条件。我们讨论了复杂度、保真度和覆盖率之间的相互作用,并考虑不同的用户需求如何导致将这些量作为约束或惩罚项的不同问题表述。我们还通过实验证明了如何使局部代理解释过程具有交互性,从而得到更好的解释。 摘要:This paper analyses the fundamental ingredients behind surrogate explanations to provide a better understanding of their inner workings. We start our exposition by considering global surrogates, describing the trade-off between complexity of the surrogate and fidelity to the black-box being modelled. We show that transitioning from global to local - reducing coverage - allows for more favourable conditions on the Pareto frontier of fidelity-complexity of a surrogate. We discuss the interplay between complexity, fidelity and coverage, and consider how different user needs can lead to problem formulations where these are either constraints or penalties. We also present experiments that demonstrate how the local surrogate interpretability procedure can be made interactive and lead to better explanations.
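The coverage-fidelity trade-off can be demonstrated directly: fit a linear surrogate to a black box over a wide and a narrow neighbourhood and compare the fit (the sine black box, radii, and sample counts are all invented for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(5)

def black_box(x):
    return np.sin(3.0 * x)              # stand-in for the opaque model

def surrogate_fidelity(x0, radius, n=400):
    """Fit a linear surrogate to the black box on points within
    `radius` of x0 (the coverage) and return its mean squared error
    there (the infidelity)."""
    xs = rng.uniform(x0 - radius, x0 + radius, size=n)
    ys = black_box(xs)
    A = np.stack([xs, np.ones_like(xs)], axis=1)
    coef, *_ = np.linalg.lstsq(A, ys, rcond=None)
    return float(np.mean((A @ coef - ys) ** 2))

mse_global = surrogate_fidelity(x0=0.0, radius=3.0)   # wide coverage
mse_local = surrogate_fidelity(x0=0.0, radius=0.3)    # narrow coverage
```

Shrinking coverage lets the same low-complexity (linear) surrogate achieve far higher fidelity — the favourable shift of the Pareto frontier the paper describes.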

【5】 Sensitivity analysis in differentially private machine learning using hybrid automatic differentiation 标题:混合自动微分在差分私有机器学习中的灵敏度分析

作者:Alexander Ziller,Dmitrii Usynin,Moritz Knolle,Kritika Prakash,Andrew Trask,Rickmer Braren,Marcus Makowski,Daniel Rueckert,Georgios Kaissis 机构: Technical University of Munich, Germany, Institute of Diagnostic and Interventional Radiology, Technical University of Munich, Germany, OpenMined, Department of Computing 备注:Accepted to the ICML 2021 Theory and Practice of Differential Privacy Workshop 链接:https://arxiv.org/abs/2107.04265 摘要:近年来,出现了诸如差分隐私(DP)之类的隐私保护的正式方法,这种方法能够部署到诸如机器学习(ML)之类的数据驱动任务中。为了使大规模ML与对个人隐私损失进行有原则分析所需的闭式推理相协调,需要引入新的工具来进行自动敏感性分析,并通过计算流跟踪个人的数据及其特征。为此,我们介绍了一种新颖的混合(hybrid)自动微分(AD)系统,它结合了反向模式AD的效率和在计算图中获得任意给定量的闭式表达式的能力。这使得对任意可微函数组合的敏感性建模成为可能,例如在私有数据上训练神经网络。我们通过分析统计数据库查询的各个DP保证来演示我们的方法。此外,我们还探讨了该技术在DP神经网络训练中的应用。 摘要:In recent years, formal methods of privacy protection such as differential privacy (DP), capable of deployment to data-driven tasks such as machine learning (ML), have emerged. Reconciling large-scale ML with the closed-form reasoning required for the principled analysis of individual privacy loss requires the introduction of new tools for automatic sensitivity analysis and for tracking an individual's data and their features through the flow of computation. For this purpose, we introduce a novel \textit{hybrid} automatic differentiation (AD) system which combines the efficiency of reverse-mode AD with an ability to obtain a closed-form expression for any given quantity in the computational graph. This enables modelling the sensitivity of arbitrary differentiable function compositions, such as the training of neural networks on private data. We demonstrate our approach by analysing the individual DP guarantees of statistical database queries. Moreover, we investigate the application of our technique to the training of DP neural networks.
Our approach can enable the principled reasoning about privacy loss in the setting of data processing, and further the development of automatic sensitivity analysis and privacy budgeting systems.
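为了直观说明"敏感度分析 + 隐私预算"这条链路(并非论文的混合AD系统,仅为一个假设性的极简示意),下面对一个有界均值查询手工给出全局L1敏感度,并据此校准拉普拉斯机制的噪声尺度;其中函数名与接口均为示例假设:

```python
import math
import random

def mean_query_sensitivity(n, lower, upper):
    # Global L1 sensitivity of a bounded mean over n records:
    # replacing one record shifts the mean by at most (upper - lower) / n.
    return (upper - lower) / n

def laplace_release(value, sensitivity, epsilon, rng):
    # epsilon-DP release: add Laplace noise with scale = sensitivity / epsilon,
    # sampled via inverse-CDF from a uniform draw.
    scale = sensitivity / epsilon
    u = rng.random() - 0.5
    return value - scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

if __name__ == "__main__":
    rng = random.Random(0)
    data = [0.2, 0.4, 0.9, 0.5]
    true_mean = sum(data) / len(data)
    sens = mean_query_sensitivity(len(data), 0.0, 1.0)
    print(laplace_release(true_mean, sens, epsilon=1.0, rng=rng))
```

论文的贡献正是把这里的"手工推导敏感度"一步自动化:由AD系统沿计算图得到任意可微查询的闭式敏感度表达式。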

【6】 MCMC Variational Inference via Uncorrected Hamiltonian Annealing 标题:基于未校正哈密顿退火的MCMC变分推断

作者:Tomas Geffner,Justin Domke 机构:College of Information and Computer Science, University of Massachusetts, Amherst, Amherst, MA 链接:https://arxiv.org/abs/2107.04150 摘要:给定一个非归一化的目标分布,我们希望从中获得近似样本,并在其(对数)归一化常数log Z上获得一个紧下界。结合哈密顿MCMC的退火重要性采样(AIS)是实现这一目标的强大方法。它的主要缺点是使用了不可微的转移核,这使得其众多参数难以调整。我们提出了一个将类AIS过程与未校正哈密顿MCMC结合使用的框架,称为未校正哈密顿退火(Uncorrected Hamiltonian Annealing)。我们的方法得到了log Z上紧且可微的下界。经验表明,我们的方法比其他竞争方法取得了更好的性能,并且利用重参数化梯度调整其参数的能力可能带来性能的大幅提升。 摘要:Given an unnormalized target distribution we want to obtain approximate samples from it and a tight lower bound on its (log) normalization constant log Z. Annealed Importance Sampling (AIS) with Hamiltonian MCMC is a powerful method that can be used to do this. Its main drawback is that it uses non-differentiable transition kernels, which makes tuning its many parameters hard. We propose a framework to use an AIS-like procedure with Uncorrected Hamiltonian MCMC, called Uncorrected Hamiltonian Annealing. Our method leads to tight and differentiable lower bounds on log Z. We show empirically that our method yields better performances than other competing approaches, and that the ability to tune its parameters using reparameterization gradients may lead to large performance improvements.
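AIS/UHA所收紧的对象是log Z的重要性采样下界:由Jensen不等式,E_q[log(p~(x)/q(x))] ≤ log Z。下面用纯标准库Python给出一个最小示意(目标为未归一化标准高斯,真实log Z = 0.5·log 2π),只演示这一单步下界,而非论文的退火转移序列;函数名为示例假设:

```python
import math
import random

def log_unnormalized_target(x):
    # Unnormalized standard Gaussian: true log Z = 0.5 * log(2 * pi).
    return -0.5 * x * x

def log_z_lower_bound(n_samples, proposal_std=1.5, seed=0):
    # Jensen's inequality: E_q[log(p~(x)/q(x))] = log Z - KL(q || p) <= log Z.
    # AIS/UHA tighten this bound with a sequence of annealed transitions.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, proposal_std)
        log_q = (-0.5 * (x / proposal_std) ** 2
                 - math.log(proposal_std * math.sqrt(2.0 * math.pi)))
        total += log_unnormalized_target(x) - log_q
    return total / n_samples
```

当proposal恰为归一化目标时(proposal_std=1.0),KL项为零,下界精确等于log Z;proposal偏离目标越远,下界越松,这正是需要退火链来弥补的差距。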

【7】 Scaling Gaussian Processes with Derivative Information Using Variational Inference 标题:利用变分推论对具有导数信息的高斯过程进行标度

作者:Misha Padidar,Xinran Zhu,Leo Huang,Jacob R. Gardner,David Bindel 机构:Cornell University, University of Pennsylvania 链接:https://arxiv.org/abs/2107.04061 摘要:具有导数信息的高斯过程在导数信息可用的许多环境中都很有用,包括自然科学中出现的许多贝叶斯优化和回归任务。然而,当在$D$输入维度中对$N$点进行训练时,合并导数观测值会带来占主导地位的$O(N^3D^3)$计算成本。即使是中等规模的问题也难以解决。虽然最近的工作已经解决了低-$D$设置中的这一棘手问题,但是高-$N$,高-$D$设置仍然没有被探索,并且具有很大的价值,特别是随着机器学习问题越来越高维化。本文介绍了利用变分推理实现带导数的完全可伸缩高斯过程回归的方法。类似于使用诱导值来稀疏化训练集的标签,我们引入了诱导方向导数的概念来稀疏化训练集的偏导数信息。这使得我们能够构造一个包含导数信息的变分后验,但其大小既不依赖于完整数据集大小$N$,也不依赖于完整维度$D$。我们展示了我们的方法在各种任务上的完全可扩展性,从高维仿星器(stellarator)聚变回归任务到使用贝叶斯优化在Pubmed上训练图卷积神经网络。令人惊讶的是,我们发现,即使在只有标签数据可用的情况下,我们的方法也可以提高回归性能。 摘要:Gaussian processes with derivative information are useful in many settings where derivative information is available, including numerous Bayesian optimization and regression tasks that arise in the natural sciences. Incorporating derivative observations, however, comes with a dominating $O(N^3D^3)$ computational cost when training on $N$ points in $D$ input dimensions. This is intractable for even moderately sized problems. While recent work has addressed this intractability in the low-$D$ setting, the high-$N$, high-$D$ setting is still unexplored and of great value, particularly as machine learning problems increasingly become high dimensional. In this paper, we introduce methods to achieve fully scalable Gaussian process regression with derivatives using variational inference. Analogous to the use of inducing values to sparsify the labels of a training set, we introduce the concept of inducing directional derivatives to sparsify the partial derivative information of a training set. This enables us to construct a variational posterior that incorporates derivative information but whose size depends neither on the full dataset size $N$ nor the full dimensionality $D$. 
We demonstrate the full scalability of our approach on a variety of tasks, ranging from a high dimensional stellarator fusion regression task to training graph convolutional neural networks on Pubmed using Bayesian optimization. Surprisingly, we find that our approach can improve regression performance even in settings where only label data is available.

分类|识别(4篇)

【1】 Specialists Outperform Generalists in Ensemble Classification 标题:专家在合奏分类中的表现优于多面手

作者:Sascha Meyen,Frieder Göppert,Helen Alber,Ulrike von Luxburg,Volker H. Franz 机构:Department of Computer Science, University of Tübingen, Tübingen, Germany, Max Planck Institute for Intelligent Systems, Tübingen, Germany 链接:https://arxiv.org/abs/2107.04381 摘要:考虑一组$K$个精度已知的个体分类器。在接收到一个测试点时,每个分类器输出一个预测的标签和对该特定测试点的预测的置信度。在本文中,我们讨论的问题是:能否确定该集成的准确性。令人惊讶的是,即使在该设置中以统计上最优的方式组合分类器,也不能像在置信度加权多数投票的标准设置中那样,从单个分类器的精度计算出所得集成分类器的精度。我们证明了集成精度的紧上下界。我们显式地构造达到上下界的个体分类器:专家和通才。我们的理论结果具有非常实际的意义:(1) 如果我们使用集成方法,并且可以选择从零开始构造我们的个体(独立)分类器,那么我们应该瞄准专家分类器而不是多面手;(2) 我们的界限可以用来确定至少需要多少分类器才能达到所需的集成精度。最后,我们通过考虑真实标签和单个分类器输出之间的互信息来改进界限。 摘要:Consider an ensemble of $k$ individual classifiers whose accuracies are known. Upon receiving a test point, each of the classifiers outputs a predicted label and a confidence in its prediction for this particular test point. In this paper, we address the question of whether we can determine the accuracy of the ensemble. Surprisingly, even when classifiers are combined in the statistically optimal way in this setting, the accuracy of the resulting ensemble classifier cannot be computed from the accuracies of the individual classifiers - as would be the case in the standard setting of confidence weighted majority voting. We prove tight upper and lower bounds on the ensemble accuracy. We explicitly construct the individual classifiers that attain the upper and lower bounds: specialists and generalists. Our theoretical results have very practical consequences: (1) If we use ensemble methods and have the choice to construct our individual (independent) classifiers from scratch, then we should aim for specialist classifiers rather than generalists. (2) Our bounds can be used to determine how many classifiers are at least required to achieve a desired ensemble accuracy. Finally, we improve our bounds by considering the mutual information between the true label and the individual classifier's output.
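文中作为对照的"标准设置"即置信度加权多数投票,其机制可用几行代码示意(仅为说明该基线的纯示例,与论文的界限证明无关):

```python
def confidence_weighted_vote(predictions):
    # predictions: list of (label, confidence) pairs, one per classifier.
    # Confidence-weighted majority voting: sum confidences per label,
    # then return the label with the highest total score.
    scores = {}
    for label, conf in predictions:
        scores[label] = scores.get(label, 0.0) + conf
    return max(scores, key=scores.get)
```

例如三个分类器分别给出("a", 0.9)、("b", 0.6)、("b", 0.55)时,"b"的累计置信度1.15胜过"a"的0.9。论文的要点在于:即便按此类统计最优方式组合,集成精度也无法仅由个体精度推出。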

【2】 Multitask Multi-database Emotion Recognition 标题:多任务多数据库情感识别

作者:Manh Tu Vu,Marie Beurton-Aimar 机构:Lucine, Avenue Emile Counord, Bordeaux, France, LaBRI, Cours de la Libération, Talence CEDEX, France 链接:https://arxiv.org/abs/2107.04127 摘要:在这项工作中,我们介绍了我们提交给第二届野外情感行为分析(ABAW)2021竞赛的方案。我们在多个数据库上训练一个统一的深度学习模型来执行两项任务:七种基本面部表情预测和效价-唤醒(valence-arousal)估计。由于这些数据库并未同时包含这两个任务的标签,我们应用知识蒸馏技术训练了两个网络:一个教师模型和一个学生模型。学生模型将使用真值标签和从预训练教师模型导出的软标签进行训练。在训练过程中,为了更好地利用任务间的相关性,我们又增加了一个任务,即上述两个任务的组合。我们还利用比赛所用AffWild2数据库中两个任务之间共享的视频,进一步提高了网络的性能。实验结果表明,该网络在AffWild2数据库的验证集上取得了良好的效果。代码和预训练模型公开于 https://github.com/glmanhtu/multitask-abaw-2021 摘要:In this work, we introduce our submission to the 2nd Affective Behavior Analysis in-the-wild (ABAW) 2021 competition. We train a unified deep learning model on multiple databases to perform two tasks: seven basic facial expressions prediction and valence-arousal estimation. Since these databases do not contain labels for both tasks, we applied the knowledge distillation technique to train two networks: one teacher and one student model. The student model is trained using both ground truth labels and soft labels derived from the pretrained teacher model. During the training, we add one more task, which is the combination of the two mentioned tasks, for better exploiting inter-task correlations. We also exploit the sharing of videos between the two tasks of the AffWild2 database that is used in the competition, to further improve the performance of the network. Experimental results show that the network achieves promising results on the validation set of the AffWild2 database. Code and pretrained model are publicly available at https://github.com/glmanhtu/multitask-abaw-2021
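文中的师生蒸馏训练可示意如下:学生损失由对真值标签的交叉熵与对教师"软标签"(温度软化后的分布)的KL散度加权组合。以下是一个假设性的纯Python极简实现(非比赛提交代码,温度与权重取值均为示例):

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; higher temperature yields softer labels.
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    # Blend hard-label cross-entropy with the KL divergence from the
    # temperature-softened teacher distribution to the student's.
    p_student = softmax(student_logits)
    hard = -math.log(p_student[true_label])
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(q_teacher, q_student))
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return alpha * hard + (1.0 - alpha) * (temperature ** 2) * soft
```

当学生与教师的logits一致且预测正确时,软标签项消失、总损失接近零;教师分布与学生分歧越大,蒸馏项惩罚越重。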

【3】 Machine Learning for Stuttering Identification: Review, Challenges & Future Directions 标题:机器学习在口吃识别中的应用:回顾、挑战和未来方向

作者:Shakeel Ahmad Sheikh,Md Sahidullah,Fabrice Hirsch,Slim Ouni 机构: Université de Lorraine 备注:under review in ACM Computing Surveys 链接:https://arxiv.org/abs/2107.04057 摘要:口吃是一种言语障碍,在此期间,言语的流动被不自觉的停顿和重复的声音打断。口吃识别是一个涉及病理学、心理学、声学、信号处理等多学科交叉的研究课题,其检测难度大且复杂。机器学习和深度学习的最新发展极大地改变了语音领域,然而对口吃识别的关注却很少。这项工作填补了这一空白,试图把跨学科领域的研究人员聚集在一起。本文综述了基于声学特征、统计和深度学习的口吃/不流利分类方法。我们还提出了一些挑战和未来可能的方向。 摘要:Stuttering is a speech disorder during which the flow of speech is interrupted by involuntary pauses and repetition of sounds. Stuttering identification is an interesting interdisciplinary domain research problem which involves pathology, psychology, acoustics, and signal processing that makes it hard and complicated to detect. Recent developments in machine and deep learning have dramatically revolutionized speech domain, however minimal attention has been given to stuttering identification. This work fills the gap by trying to bring researchers together from interdisciplinary fields. In this paper, we review comprehensively acoustic features, statistical and deep learning based stuttering/disfluency classification methods. We also present several challenges and possible future directions.

【4】 Scopeformer: n-CNN-ViT Hybrid Model for Intracranial Hemorrhage Classification 标题:Scopeform:N-CNN-VIT混合颅内出血分类模型

作者:Yassine Barhoumi,Rasool Ghulam 机构:Rowan University 链接:https://arxiv.org/abs/2107.04575 摘要:我们提出了一个由卷积神经网络(CNN)集成组成的特征生成器主干,以改进最近出现的视觉Transformer(ViT)模型。我们解决了RSNA颅内出血分类问题,即从CT切片中识别各种出血类型。我们表明,通过逐步叠加多个Xception CNN提取的多个特征图,可以为ViT模型构造一个特征丰富的输入。我们的方法允许ViT模型在多个层次上关注相关特征。此外,使用不同的范式对n个CNN进行预训练,可以得到多样化的特征集,并进一步提高所提出的n-CNN-ViT的性能。在加权对数损失值为0.0708的情况下,测试准确率达到98.04%。所提出的体系结构在用于特征提取的CNN数量和ViT的规模方面都是模块化和可伸缩的。 摘要:We propose a feature generator backbone composed of an ensemble of convolutional neural networks (CNNs) to improve the recently emerging Vision Transformer (ViT) models. We tackled the RSNA intracranial hemorrhage classification problem, i.e., identifying various hemorrhage types from computed tomography (CT) slices. We show that by gradually stacking several feature maps extracted using multiple Xception CNNs, we can develop a feature-rich input for the ViT model. Our approach allowed the ViT model to pay attention to relevant features at multiple levels. Moreover, pretraining the n CNNs using various paradigms leads to a diverse feature set and further improves the performance of the proposed n-CNN-ViT. We achieved a test accuracy of 98.04% with a weighted logarithmic loss value of 0.0708. The proposed architecture is modular and scalable in both the number of CNNs used for feature extraction and the size of the ViT.
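"叠加多个CNN特征图作为ViT输入"这一思路可示意如下:沿通道维拼接各主干网络的特征图,再按空间位置切分为token。以下用嵌套列表代替张量,仅为结构示意(函数名与切分方式为假设,非论文实现):

```python
def stack_feature_maps(feature_maps):
    # feature_maps: list of (channels, height, width) activations from
    # several backbone CNNs; concatenate them along the channel axis.
    h, w = len(feature_maps[0][0]), len(feature_maps[0][0][0])
    assert all(len(f[0]) == h and len(f[0][0]) == w for f in feature_maps)
    return [channel for fmap in feature_maps for channel in fmap]

def to_patch_tokens(stacked):
    # One ViT-style token per spatial location, of dimension = total channels.
    c, h, w = len(stacked), len(stacked[0]), len(stacked[0][0])
    return [[stacked[k][i][j] for k in range(c)]
            for i in range(h) for j in range(w)]
```

这样每个token同时携带所有主干网络在该位置的响应,ViT的自注意力便可跨层次地关注不同CNN提取的特征。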

表征(1篇)

【1】 Autoencoder-driven Spiral Representation Learning for Gravitational Wave Surrogate Modelling 标题:引力波代理模拟的自动编码器驱动的螺旋表示学习

作者:Paraskevi Nousi,Styliani-Christina Fragkouli,Nikolaos Passalis,Panagiotis Iosif,Theocharis Apostolatos,George Pappas,Nikolaos Stergioulas,Anastasios Tefas 机构:Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece, Department of Physics, University of Athens, Athens, Greece, Department of Physics, Aristotle University of Thessaloniki, Thessaloniki, Greece 链接:https://arxiv.org/abs/2107.04312 摘要:近年来,人工神经网络在引力波天文学领域得到了广泛的应用,例如在计算量大的二元黑洞激发和合并波形模型的替代建模方面。替代模型产生了快速而精确的引力波近似,神经网络被用于在训练样本之外的任意波形中插值替代模型系数的最后一步。我们使用自动编码器研究了经验插值系数中潜在结构的存在性。我们证明了当系数空间压缩到二维时,会出现螺旋结构,其中螺旋角与质量比呈线性关系。基于这一发现,我们设计了一个参数可学习的螺旋模块,作为神经网络的第一层,学习将输入空间映射到系数。螺旋模块在多种神经网络结构上进行评估,始终比基线模型实现更好的速度-精度折衷。在桌面GPU上进行了深入的实验研究,最终得到了一个替代模型,该模型能在1ms内一次前向传递数百万个输入参数,而相应生成的波形与地面真值波形之间的失配优于比较的基线方法。我们预期在旋转黑洞双星的情况下,也会存在类似的底层结构和相应的计算增益。 摘要:Recently, artificial neural networks have been gaining momentum in the field of gravitational wave astronomy, for example in surrogate modelling of computationally expensive waveform models for binary black hole inspiral and merger. Surrogate modelling yields fast and accurate approximations of gravitational waves and neural networks have been used in the final step of interpolating the coefficients of the surrogate model for arbitrary waveforms outside the training sample. We investigate the existence of underlying structures in the empirical interpolation coefficients using autoencoders. We demonstrate that when the coefficient space is compressed to only two dimensions, a spiral structure appears, wherein the spiral angle is linearly related to the mass ratio. Based on this finding, we design a spiral module with learnable parameters, that is used as the first layer in a neural network, which learns to map the input space to the coefficients. The spiral module is evaluated on multiple neural network architectures and consistently achieves better speed-accuracy trade-off than baseline models. 
A thorough experimental study is conducted and the final result is a surrogate model which can evaluate millions of input parameters in a single forward pass in under 1ms on a desktop GPU, while the mismatch between the corresponding generated waveforms and the ground-truth waveforms is better than the compared baseline methods. We anticipate the existence of analogous underlying structures and corresponding computational gains also in the case of spinning black hole binaries.

3D|3D重建等相关(1篇)

【1】 Comparison of 2D vs. 3D U-Net Organ Segmentation in abdominal 3D CT images 标题:腹部三维CT图像二维与三维U-net器官分割的比较

作者:Nico Zettler,Andre Mastmeyer 机构:Aalen University, Beethovenstr. , Burren Campus, Germany, Aalen, Baden-Wuerttemberg 备注:None 链接:https://arxiv.org/abs/2107.04062 摘要:提出了一种对三维CT图像中5个腹部器官进行三维分割的两步法。首先将每个相关器官的感兴趣体积提取为边界框。提取的体积作为第二阶段的输入,其中两个结构维度不同、相互比较的U-Net将器官分割重建为标签掩码。在这项工作中,我们重点比较2D U-Net与其3D U-Net对应结构。我们的初步结果表明,Dice系数最多提升约6%。令我们惊讶的是,在本研究中,例如肝脏和肾脏,使用更快且更节省GPU显存的2D U-Net反而获得了显著更好的分割效果。对于其他腹部关键器官没有显著差异,但我们观察到2D U-Net在所有研究器官的GPU计算开销方面具有非常显著的优势。 摘要:A two-step concept for 3D segmentation on 5 abdominal organs inside volumetric CT images is presented. First each relevant organ's volume of interest is extracted as bounding box. The extracted volume acts as input for a second stage, wherein two compared U-Nets with different architectural dimensions reconstruct an organ segmentation as label mask. In this work, we focus on comparing 2D U-Nets vs. 3D U-Net counterparts. Our initial results indicate Dice improvements of about 6% at maximum. In this study to our surprise, liver and kidneys for instance were tackled significantly better using the faster and GPU-memory saving 2D U-Nets. For other abdominal key organs, there were no significant differences, but we observe highly significant advantages for the 2D U-Net in terms of GPU computational efforts for all organs under study.

优化|敛散性(3篇)

【1】 Optimal Gradient-based Algorithms for Non-concave Bandit Optimization 标题:基于最优梯度的非凹Bandit优化算法

作者:Baihe Huang,Kaixuan Huang,Sham M. Kakade,Jason D. Lee,Qi Lei,Runzhe Wang,Jiaqi Yang 机构:Peking University, Princeton University, University of Washington, Microsoft Research, Tsinghua University 链接:https://arxiv.org/abs/2107.04518 摘要:线性报酬和凹报酬的bandit问题已经得到了广泛的研究,但对非凹报酬bandit问题的研究相对较少。本文研究了一大类未知报酬函数为非凹函数的bandit问题,包括低秩广义线性bandit问题和带多项式激活的双层神经网络bandit问题。对于低秩广义线性bandit问题,我们给出了一个关于维度minimax最优的算法,驳斥了[LMT21, JWWN19]中的两个猜想。我们的算法基于一个统一的零阶优化范式,具有很强的普适性,并在若干结构化多项式设置中获得(关于维度的)最优速率。我们进一步证明了我们的算法在RL生成模型设置中的适用性,相比已有方法改进了样本复杂度。最后,我们证明了标准的乐观算法(如UCB)在维度因子上是次优的。在具有无噪声报酬的神经网络设置(多项式激活函数)中,我们提出了一种样本复杂度等于内在代数维数的bandit算法。我们再次证明,乐观方法具有更差的样本复杂度,即关于外在维度的多项式(在多项式次数上可能指数级地更差)。 摘要:Bandit problems with linear or concave reward have been extensively studied, but relatively few works have studied bandits with non-concave reward. This work considers a large family of bandit problems where the unknown underlying reward function is non-concave, including the low-rank generalized linear bandit problems and two-layer neural network with polynomial activation bandit problem. For the low-rank generalized linear bandit problem, we provide a minimax-optimal algorithm in the dimension, refuting both conjectures in [LMT21, JWWN19]. Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality and attains optimal rates in several structured polynomial settings (in the dimension). We further demonstrate the applicability of our algorithms in RL in the generative model setting, resulting in improved sample complexity over prior approaches. Finally, we show that the standard optimistic algorithms (e.g., UCB) are sub-optimal by dimension factors. In the neural net setting (with polynomial activation functions) with noiseless reward, we provide a bandit algorithm with sample complexity equal to the intrinsic algebraic dimension. 
Again, we show that optimistic approaches have worse sample complexity, polynomial in the extrinsic dimension (which could be exponentially worse in the polynomial degree).

【2】 Model compression as constrained optimization, with application to neural nets. Part V: combining compressions 标题:模型压缩为约束优化,并将其应用于神经网络。第五部分:组合压缩

作者:Miguel Á. Carreira-Perpiñán,Yerlan Idelbayev 备注:29 pages, 9 figures, 10 tables 链接:https://arxiv.org/abs/2107.04380 摘要:模型压缩一般采用量化、低秩逼近或剪枝等方法,近年来对各种算法进行了研究。一个基本问题是:对于给定的模型,哪种类型的压缩效果更好?或者更好:我们可以通过适当的方式组合压缩来改进吗?我们一般将其描述为一个优化损失的问题,但其中权重被限制为等于单独压缩部分的相加组合;给出了相应零件参数的学习算法。通过对深层神经网络的实验,我们观察到:1)我们可以在误差压缩空间中找到明显更好的模型,表明不同的压缩类型具有互补性;2)最佳的组合类型取决于神经网络的类型。例如,我们可以压缩resnet和AlexNet,每个权值仅使用1位,而不会降低错误率,代价是添加几个浮点权值。然而,通过将低秩与少量浮点权值相结合,VGG网络可以得到更好的压缩。 摘要:Model compression is generally performed by using quantization, low-rank approximation or pruning, for which various algorithms have been researched in recent years. One fundamental question is: what types of compression work better for a given model? Or even better: can we improve by combining compressions in a suitable way? We formulate this generally as a problem of optimizing the loss but where the weights are constrained to equal an additive combination of separately compressed parts; and we give an algorithm to learn the corresponding parts' parameters. Experimentally with deep neural nets, we observe that 1) we can find significantly better models in the error-compression space, indicating that different compression types have complementary benefits, and 2) the best type of combination depends exquisitely on the type of neural net. For example, we can compress ResNets and AlexNet using only 1 bit per weight without error degradation at the cost of adding a few floating point weights. However, VGG nets can be better compressed by combining low-rank with a few floating point weights.
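摘要中"权值等于若干单独压缩部分的相加组合"的思想,可用"1比特符号量化 + 少量浮点修正"的玩具示例来示意(假设性实现,仅演示相加组合这一形式,并非论文的约束优化算法):

```python
def sign_quantize(weights):
    # 1-bit quantization: each weight becomes alpha * sign(w), where the
    # scale alpha is chosen as the mean absolute value of the weights.
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]

def additive_compress(weights, n_float=1):
    # Additive combination of two compressed parts: a 1-bit quantized part
    # plus a few floating-point corrections on the coordinates where the
    # quantization residual is largest.
    quantized = sign_quantize(weights)
    residual = [w - q for w, q in zip(weights, quantized)]
    top = sorted(range(len(weights)), key=lambda i: abs(residual[i]),
                 reverse=True)[:n_float]
    corrections = {i: residual[i] for i in top}
    approx = [q + corrections.get(i, 0.0) for i, q in enumerate(quantized)]
    return approx, corrections
```

少量浮点修正即可消除量化误差最大的坐标,这与摘要中"每个权值1比特、外加几个浮点权值"的观察相呼应;论文则是在训练损失下联合学习各压缩部分的参数。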

【3】 Many Objective Bayesian Optimization 标题:多目标贝叶斯优化

作者:Lucia Asencio Martín,Eduardo C. Garrido-Merchán 机构: Garrido-Merch´anUniversidad Aut´onoma de Madrid 备注:arXiv admin note: text overlap with arXiv:2101.08061 链接:https://arxiv.org/abs/2107.04126 摘要:一些实际问题需要评估昂贵且有噪声的目标函数。此外,这些目标函数的解析表达式可能是未知的。这些函数被称为黑匣子,例如,估计机器学习算法的泛化误差,并根据其超参数计算其预测时间。多目标贝叶斯优化(MOBO)是一组已成功应用于黑箱同时优化的方法。具体来说,BO方法依赖于目标函数的概率模型,通常是高斯过程。该模型生成目标的预测分布。然而,当多目标优化问题中的目标个数为3个或3个以上时,即多目标设置时,MOBO方法存在问题。特别是,BO过程的代价更高,因为考虑到更多的目标,通过超体积计算解的质量的代价也更高,最重要的是,我们必须评估每个目标函数,浪费昂贵的计算、经济或其他资源。然而,由于优化问题涉及到更多的目标,其中一些目标很可能是多余的,并且没有添加有关问题解决方案的信息。提出了一种表示GP预测分布相似程度的度量方法。我们还提出了一个多目标贝叶斯优化算法,该算法使用这个度量来确定两个目标是否冗余。该算法在发现相似度的情况下停止对其中一个进行评价,既节省了资源,又不影响多目标BO算法的性能。我们在一组玩具、合成、基准和真实的实验中展示了经验证据,证明了GPs预测分布度量和算法的有效性。 摘要:Some real problems require the evaluation of expensive and noisy objective functions. Moreover, the analytical expression of these objective functions may be unknown. These functions are known as black-boxes, for example, estimating the generalization error of a machine learning algorithm and computing its prediction time in terms of its hyper-parameters. Multi-objective Bayesian optimization (MOBO) is a set of methods that has been successfully applied for the simultaneous optimization of black-boxes. Concretely, BO methods rely on a probabilistic model of the objective functions, typically a Gaussian process. This model generates a predictive distribution of the objectives. However, MOBO methods have problems when the number of objectives in a multi-objective optimization problem are 3 or more, which is the many objective setting. In particular, the BO process is more costly as more objectives are considered, computing the quality of the solution via the hyper-volume is also more costly and, most importantly, we have to evaluate every objective function, wasting expensive computational, economic or other resources. 
However, as more objectives are involved in the optimization problem, it is highly probable that some of them are redundant and do not add information about the problem solution. A measure that represents how similar GP predictive distributions are is proposed. We also propose a many objective Bayesian optimization algorithm that uses this metric to determine whether two objectives are redundant. The algorithm stops evaluating one of them if the similarity is found, saving resources and not hurting the performance of the multi-objective BO algorithm. We show empirical evidence of the effectiveness of the metric and the algorithm in a set of toy, synthetic, benchmark and real experiments.

预测|估计(3篇)

【1】 Group-Node Attention for Community Evolution Prediction 标题:群体节点关注度在群落演化预测中的应用

作者:Matt Revelle,Carlotta Domeniconi,Ben Gelman 机构:George Mason University, Fairfax, VA , USA 链接:https://arxiv.org/abs/2107.04522 摘要:随着人们进入和离开社交网络以及他们的活动行为的改变,社交网络中的社区会随着时间的推移而演变。预测群落结构随时间变化的任务称为群落进化预测。该领域的现有工作集中于开发定义事件的框架,同时使用传统的分类方法进行实际预测。提出了一种基于结构和时间信息预测群落演化事件的图神经网络。该模型(GNAN)包含一个组节点注意组件,该组件支持可变大小的输入和基于成员和邻居节点特征的组学习表示。与标准的基线方法进行了比较评估,我们证明了我们的模型优于基线。此外,我们还展示了网络趋势对模型性能的影响。 摘要:Communities in social networks evolve over time as people enter and leave the network and their activity behaviors shift. The task of predicting structural changes in communities over time is known as community evolution prediction. Existing work in this area has focused on the development of frameworks for defining events while using traditional classification methods to perform the actual prediction. We present a novel graph neural network for predicting community evolution events from structural and temporal information. The model (GNAN) includes a group-node attention component which enables support for variable-sized inputs and learned representation of groups based on member and neighbor node features. A comparative evaluation with standard baseline methods is performed and we demonstrate that our model outperforms the baselines. Additionally, we show the effects of network trends on model performance.

【2】 Probabilistic Trajectory Prediction with Structural Constraints 标题:考虑结构约束的概率弹道预测

作者:Weiming Zhi,Lionel Ott,Fabio Ramos 机构: 1 School of Computer Science, the University of Sydney 备注:To appear at IROS 2021 链接:https://arxiv.org/abs/2107.04193 摘要:这项工作解决了预测环境中动态物体运动轨迹的问题。在预测运动模式方面的最新进展通常依赖于机器学习技术从观察到的轨迹推断运动模式,而没有直接结合已知规则的机制。我们提出了一个结合概率学习和约束轨迹优化的新框架。我们的框架的学习组件提供了一个分布在未来的运动轨迹的条件下观察过去的坐标。然后将该分布作为约束优化问题的先验条件,该约束优化问题对轨迹分布施加机会约束。这将导致符合约束的轨迹分布非常类似于先前的。特别地,我们的研究集中在碰撞约束上,使得外推的未来轨迹分布符合环境结构。我们在真实世界和模拟数据集上实证证明了我们的框架学习运动数据的复杂概率运动轨迹的能力,同时直接强制约束以提高通用性,产生更健壮和更高质量的轨迹分布。 摘要:This work addresses the problem of predicting the motion trajectories of dynamic objects in the environment. Recent advances in predicting motion patterns often rely on machine learning techniques to extrapolate motion patterns from observed trajectories, with no mechanism to directly incorporate known rules. We propose a novel framework, which combines probabilistic learning and constrained trajectory optimisation. The learning component of our framework provides a distribution over future motion trajectories conditioned on observed past coordinates. This distribution is then used as a prior to a constrained optimisation problem which enforces chance constraints on the trajectory distribution. This results in constraint-compliant trajectory distributions which closely resemble the prior. In particular, we focus our investigation on collision constraints, such that extrapolated future trajectory distributions conform to the environment structure. We empirically demonstrate on real-world and simulated datasets the ability of our framework to learn complex probabilistic motion trajectories for motion data, while directly enforcing constraints to improve generalisability, producing more robust and higher quality trajectory distributions.

【3】 Ensembles of Randomized NNs for Pattern-based Time Series Forecasting 标题:基于模式的时间序列预测的随机神经网络集成

作者:Grzegorz Dudek,Paweł Pełka 机构:Electrical Engineering Faculty, Częstochowa University of Technology, Częstochowa, Poland 备注:arXiv admin note: text overlap with arXiv:2107.01705 链接:https://arxiv.org/abs/2107.04091 摘要:本文提出了一种基于随机神经网络的集成预测方法。改进的随机学习算法根据数据和目标函数特征生成网络参数,简化了个体学习者的拟合能力。基于模式的时间序列表示方法适用于多季节性时间序列的预测。我们提出了六个策略来控制集合成员的多样性。通过对四个实际预报问题的实例分析,验证了该方法的有效性和优越的性能。在预测精度方面,它优于统计模型以及最先进的机器学习模型。该方法具有训练速度快、结构简单、易于实现、精度高、能够处理时间序列的非平稳性和多个季节性等优点。 摘要:In this work, we propose an ensemble forecasting approach based on randomized neural networks. Improved randomized learning streamlines the fitting abilities of individual learners by generating network parameters in accordance with the data and target function features. A pattern-based representation of time series makes the proposed approach suitable for forecasting time series with multiple seasonality. We propose six strategies for controlling the diversity of ensemble members. Case studies conducted on four real-world forecasting problems verified the effectiveness and superior performance of the proposed ensemble forecasting approach. It outperformed statistical models as well as state-of-the-art machine learning models in terms of forecasting accuracy. The proposed approach has several advantages: fast and easy training, simple architecture, ease of implementation, high accuracy and the ability to deal with nonstationarity and multiple seasonality in time series.

其他神经网络|深度学习|模型|建模(17篇)

【1】 Universal Multilayer Network Exploration by Random Walk with Restart 标题:基于随机游走和重启的通用多层网络探索

作者:Anthony Baptista,Aitor Gonzalez,Anaïs Baudot 机构:Aix-Marseille Univ, INSERM, MMG, Turing Center for Living Systems, Marseille, France; Aix-Marseille Univ, INSERM, TAGC, Turing Center for Living Systems, Marseille, France; Barcelona Supercomputing Center, Barcelona, Spain 链接:https://arxiv.org/abs/2107.04565 摘要:数年来,数据的数量和种类急剧增加。这些数据通常用网络来表示,然后用网络理论的方法来探索。近年来,网络探索方法不断扩展,以利用更复杂、更丰富的网络框架。例如,随机游走已经扩展到探索多层网络。然而,现有的随机游走方法在可处理的网络层组合和异构性方面受到限制。为了应对多层网络日益增加的多样性和复杂性,需要新的分析和数值随机游走方法。我们在此提出MultiXrank,这是一个Python包,它通过优化的实现在任何类型的多层网络上支持带重启的随机游走(RWR)。该软件包由RWR的通用数学公式支持。我们使用留一法交叉验证和链路预测来评估MultiXrank,并引入协议来测量添加或删除多层网络数据对预测性能的影响。通过对参数空间的深入探索,进一步测量了MultiXrank对输入参数的敏感性。 摘要:The amount and variety of data is increasing drastically for several years. These data are often represented as networks, which are then explored with approaches arising from network theory. Recent years have witnessed the extension of network exploration methods to leverage more complex and richer network frameworks. Random walks, for instance, have been extended to explore multilayer networks. However, current random walk approaches are limited in the combination and heterogeneity of network layers they can handle. New analytical and numerical random walk methods are needed to cope with the increasing diversity and complexity of multilayer networks. We propose here MultiXrank, a Python package that enables Random Walk with Restart (RWR) on any kind of multilayer network with an optimized implementation. This package is supported by a universal mathematical formulation of the RWR. We evaluated MultiXrank with leave-one-out cross-validation and link prediction, and introduced protocols to measure the impact of the addition or removal of multilayer network data on prediction performances. We further measured the sensitivity of MultiXrank to input parameters by in-depth exploration of the parameter space. 
Finally, we illustrate the versatility of MultiXrank with different use-cases of unsupervised node prioritization and supervised classification in the context of human genetic diseases.
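MultiXrank的核心原语是带重启的随机游走(RWR);其在单层网络上的标准幂迭代版本可示意如下(多层异构情形是该过程的推广;以下为教科书式RWR,并非该软件包的实现):

```python
def random_walk_with_restart(adj, seed_node, restart=0.3, n_iter=200):
    # Power iteration for RWR on a single undirected graph given by a
    # symmetric adjacency matrix `adj` (nested lists). At each step the
    # walker follows a column-normalized edge with probability (1 - restart)
    # and jumps back to the seed node with probability `restart`.
    n = len(adj)
    deg = [sum(row[j] for row in adj) for j in range(n)]  # column sums
    restart_vec = [1.0 if i == seed_node else 0.0 for i in range(n)]
    p = restart_vec[:]
    for _ in range(n_iter):
        walk = [sum(adj[i][j] * p[j] / deg[j] for j in range(n) if deg[j])
                for i in range(n)]
        p = [(1.0 - restart) * walk[i] + restart * restart_vec[i]
             for i in range(n)]
    return p  # stationary proximity scores relative to the seed
```

稳态分数衡量每个节点相对种子节点的邻近度,可直接用于节点优先级排序;MultiXrank将转移矩阵推广到跨层、跨网络的异构跳转。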

【2】 Redescription Model Mining 标题:再描述模型挖掘

作者:Felix I. Stamm,Martin Becker,Markus Strohmaier,Florian Lemmerich 机构:University of Würzburg, Germany, RWTH Aachen University & GESIS, University of Passau 链接:https://arxiv.org/abs/2107.04462 摘要:本文介绍了重描述模型挖掘,这是一种识别跨两个数据集的可解释模式的新方法;这两个数据集只共享部分属性,且没有共同的实例。特别是,重描述模型挖掘的目的是找到一对可描述的数据子集(每个数据集一个),这些数据子集针对一个预先指定的模型类产生相似的异常模型。为了实现这一点,我们结合了两个以前独立的研究领域:异常模型挖掘和重描述挖掘。对于这个新的问题设置,我们开发了兴趣度度量来选择有前途的模式,提出了有效的算法,并在合成和真实数据上展示了它们的潜力。所揭示的模式可以暗示在两个数据集中均有表现的共同底层现象,从而能够发现不出现在同一数据集中的属性(组合)之间的可能关联。 摘要:This paper introduces Redescription Model Mining, a novel approach to identify interpretable patterns across two datasets that share only a subset of attributes and have no common instances. In particular, Redescription Model Mining aims to find pairs of describable data subsets -- one for each dataset -- that induce similar exceptional models with respect to a prespecified model class. To achieve this, we combine two previously separate research areas: Exceptional Model Mining and Redescription Mining. For this new problem setting, we develop interestingness measures to select promising patterns, propose efficient algorithms, and demonstrate their potential on synthetic and real-world data. Uncovered patterns can hint at common underlying phenomena that manifest themselves across datasets, enabling the discovery of possible associations between (combinations of) attributes that do not appear in the same dataset.

【3】 Improving Model Robustness with Latent Distribution Locally and Globally 标题:利用局部和全局潜在分布提高模型稳健性

作者:Zhuang Qian,Shufei Zhang,Kaizhu Huang,Qiufeng Wang,Rui Zhang,Xinping Yi 机构:Department of Electrical Engineering and Electronics, University of Liverpool, Liverpool, L,GJ, United Kingdom, School of Advanced Technology, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu , P.R.China 链接:https://arxiv.org/abs/2107.04401 摘要:在这项工作中,我们从全局流形的角度考虑深度神经网络对对抗攻击的模型鲁棒性。利用局部和全局的潜在信息,我们提出了一种新的通过鲁棒优化的对抗训练方法,以及一种通过判别器和分类器之间的对抗博弈生成潜在流形对抗样本(LMAE)的简单方法。提出的基于潜在分布的对抗训练(ATLD)方法在无监督的情况下利用潜在流形构造LMAE来防御对抗攻击。ATLD保留了潜在流形的局部和全局信息,提高了对抗攻击的鲁棒性。为了验证我们提出的方法的有效性,我们在不同的数据集(如CIFAR-10、CIFAR-100、SVHN)上进行了大量的实验,实验结果表明,我们的方法在对抗鲁棒性方面比现有的方法(如特征散射)有很大的提高。源代码可在https://github.com/LitterQ/ATLD-pytorch. 摘要:In this work, we consider model robustness of deep neural networks against adversarial attacks from a global manifold perspective. Leveraging both the local and global latent information, we propose a novel adversarial training method through robust optimization, and a tractable way to generate Latent Manifold Adversarial Examples (LMAEs) via an adversarial game between a discriminator and a classifier. The proposed adversarial training with latent distribution (ATLD) method defends against adversarial attacks by crafting LMAEs with the latent manifold in an unsupervised manner. ATLD preserves the local and global information of latent manifold and promises improved robustness against adversarial attacks. To verify the effectiveness of our proposed method, we conduct extensive experiments over different datasets (e.g., CIFAR-10, CIFAR-100, SVHN) with different adversarial attacks (e.g., PGD, CW), and show that our method substantially outperforms the state-of-the-art (e.g., Feature Scattering) in adversarial robustness by a large accuracy margin. The source codes are available at https://github.com/LitterQ/ATLD-pytorch.

【4】 Hoechst Is All You Need: LymphocyteClassification with Deep Learning 标题:Hoechst是您所需要的一切:基于深度学习的淋巴细胞分类

作者:Jessica Cooper,In Hwa Um,Ognjen Arandjelović,David J Harrison 机构:University of St Andrews 备注:15 pages, 4 figures 链接:https://arxiv.org/abs/2107.04388 摘要:多重免疫荧光和免疫组织化学使癌症病理学家能够识别细胞表面表达的几种蛋白质,从而使细胞分类、更好地了解肿瘤微环境、更准确的诊断、预后,并根据个别病人的免疫状况进行量身定制的免疫治疗。然而,他们是昂贵和耗时的过程,需要复杂的染色和成像技术的专家技术人员。Hoechst染色更便宜,也更容易进行,但在这种情况下通常不使用,因为它与DNA结合,而不是与免疫荧光技术靶向的蛋白质结合,而且以前认为仅基于DNA形态来分化表达这些蛋白质的细胞是不可能的。在这项工作中,我们展示了另一种方法,即训练一个深度卷积神经网络来识别表达三种蛋白质(T淋巴细胞标记CD3和CD8,以及B淋巴细胞标记CD20)的细胞,其精确度和召回率超过90%,仅来自Hoechst 33342染色的组织。我们的模型学习了以前未知的与这些蛋白表达相关的形态学特征,这些特征可用于准确区分淋巴细胞亚型,用于评估免疫细胞浸润等关键预后指标,从而预测和改善患者的预后,而无需昂贵的多重免疫荧光。 摘要:Multiplex immunofluorescence and immunohistochemistry benefit patients by allowing cancer pathologists to identify several proteins expressed on the surface of cells, enabling cell classification, better understanding of the tumour micro-environment, more accurate diagnoses, prognoses, and tailored immunotherapy based on the immune status of individual patients. However, they are expensive and time consuming processes which require complex staining and imaging techniques by expert technicians. Hoechst staining is much cheaper and easier to perform, but is not typically used in this case as it binds to DNA rather than to the proteins targeted by immunofluorescent techniques, and it was not previously thought possible to differentiate cells expressing these proteins based only on DNA morphology. In this work we show otherwise, training a deep convolutional neural network to identify cells expressing three proteins (T lymphocyte markers CD3 and CD8, and the B lymphocyte marker CD20) with greater than 90% precision and recall, from Hoechst 33342 stained tissue only. 
Our model learns previously unknown morphological features associated with expression of these proteins which can be used to accurately differentiate lymphocyte subtypes for use in key prognostic metrics such as assessment of immune cell infiltration, and thereby predict and improve patient outcomes without the need for costly multiplex immunofluorescence.

【5】 Bib2Auth: Deep Learning Approach for Author Disambiguation using Bibliographic Data 标题:Bib2Auth:利用书目数据进行作者排歧的深度学习方法

作者:Zeyd Boukhers,Nagaraj Bahubali,Abinaya Thulsi Chandrasekaran,Adarsh Anand,Soniya Manchenahalli Gnanendra Prasad,Sriram Aralappa 备注:Accepted and presented at the workshop BiblioDAP@KDD2021 链接:https://arxiv.org/abs/2107.04382 摘要:在数字图书馆中,由于姓名存在同名与异名现象,作者姓名消歧一直是一个关键的开放性问题。在本文中,我们提出了一种新的方法,通过依赖作者的合作模式和研究领域,将作者姓名与现实世界中的实体联系起来。我们的监督深度学习模型通过捕捉作者与合作作者的关系和研究领域来识别作者,研究领域以目标作者的出版物的标题和来源为代表。这些属性通过其语义和符号表示进行编码。为此,Bib2Auth使用来自DBLP存储库的约22K条书目记录,并以每对合作作者为单位进行训练。大量实验证明了该方法既能区分同名作者,也能识别具有不同姓名写法的同一作者。Bib2Auth在一个相对较大的数据集上表现出良好的性能,这使得它能够直接集成到书目索引中。 摘要:Author name ambiguity remains a critical open problem in digital libraries due to synonymy and homonymy of names. In this paper, we propose a novel approach to link author names to their real-world entities by relying on their co-authorship pattern and area of research. Our supervised deep learning model identifies an author by capturing his/her relationship with his/her co-authors and area of research, which is represented by the titles and sources of the target author's publications. These attributes are encoded by their semantic and symbolic representations. To this end, Bib2Auth uses ~ 22K bibliographic records from the DBLP repository and is trained with each pair of co-authors. The extensive experiments have proved the capability of the approach to distinguish between authors sharing the same name and recognize authors with different name variations. Bib2Auth has shown good performance on a relatively large dataset, which qualifies it to be directly integrated into bibliographic indices.

【6】 IDRLnet: A Physics-Informed Neural Network Library 标题:IDRLnet:一个物理信息神经网络库

作者:Wei Peng,Jun Zhang,Weien Zhou,Xiaoyu Zhao,Wen Yao,Xiaoqian Chen 机构:Defense Innovation Institute 链接:https://arxiv.org/abs/2107.04320 摘要:物理信息神经网络(PINN)是一种用于求解偏微分方程正、反问题的科学计算框架。本文系统地介绍了一个Python工具箱IDRLnet,用于通过PINN建模和求解问题。IDRLnet为各种PINN算法和应用构建了框架。它提供了一种结构化的方法来将几何对象、数据源、人工神经网络、损失度量和优化器合并到Python中。此外,它还提供了求解含噪反问题、变分极小化和积分微分方程的功能。新的PINN变体可以很容易地集成到框架中。源代码、教程和文档可从 https://github.com/idrl-lab/idrlnet 获取。 摘要:Physics Informed Neural Network (PINN) is a scientific computing framework used to solve both forward and inverse problems modeled by Partial Differential Equations (PDEs). This paper introduces IDRLnet, a Python toolbox for modeling and solving problems through PINN systematically. IDRLnet constructs the framework for a wide range of PINN algorithms and applications. It provides a structured way to incorporate geometric objects, data sources, artificial neural networks, loss metrics, and optimizers within Python. Furthermore, it provides functionality to solve noisy inverse problems, variational minimization, and integral differential equations. New PINN variants can be integrated into the framework easily. Source code, tutorials, and documentation are available at https://github.com/idrl-lab/idrlnet.
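下面给出一段仅作示意的纯 Python 代码,演示 PINN 类框架中"PDE 残差 + 边界条件"复合损失的组装方式。注意:这并非 IDRLnet 的真实 API;其中以 ODE u'(x)=u(x)、u(0)=1 为例,并用中心差分近似导数,均为演示用假设(真实 PINN 中 u 由神经网络给出,导数通过自动微分计算)。

```python
import math

def pinn_loss(u, xs, h=1e-5):
    """PINN 风格的复合损失:配点上 PDE 残差的均方 + 边界条件项。
    示例方程为 ODE u'(x) = u(x), u(0) = 1;导数用中心差分近似。"""
    residual = sum(((u(x + h) - u(x - h)) / (2 * h) - u(x)) ** 2 for x in xs)
    boundary = (u(0.0) - 1.0) ** 2
    return residual / len(xs) + boundary

xs = [0.1 * i for i in range(1, 11)]           # 配点
good = pinn_loss(math.exp, xs)                  # 精确解 e^x:损失接近 0
bad = pinn_loss(lambda x: 1.0 + x, xs)          # 错误候选:残差显著
```

训练时只需将该损失交给优化器,对网络参数最小化即可;差分近似仅为保持示例自包含。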

【7】 On the Variance of the Fisher Information for Deep Learning 标题:深度学习中Fisher信息的方差研究

作者:Alexander Soen,Ke Sun 机构:The Australian National University, Canberra, Australia, CSIRO’s Data, Sydney, Australia 链接:https://arxiv.org/abs/2107.04205 摘要:Fisher信息矩阵(FIM)已被应用于深度学习领域。它与损失景观、参数方差、二阶优化和深度学习理论密切相关。精确的FIM要么没有封闭形式,要么计算代价过高。在实践中,它几乎总是基于经验样本进行估计。我们基于FIM的两种等价表示研究了两个这样的估计量。它们都是无偏的,并且相对于底层的"真实"FIM是一致的。它们的估计质量由以封闭形式给出的方差来刻画。我们给出了它们方差的上界,并分析了深度神经网络的参数结构如何影响方差。我们讨论了这一方差度量以及我们的界在深度学习中的意义。 摘要:The Fisher information matrix (FIM) has been applied to the realm of deep learning. It is closely related to the loss landscape, the variance of the parameters, second order optimization, and deep learning theory. The exact FIM is either unavailable in closed form or too expensive to compute. In practice, it is almost always estimated based on empirical samples. We investigate two such estimators based on two equivalent representations of the FIM. They are both unbiased and consistent with respect to the underlying "true" FIM. Their estimation quality is characterized by their variance given in closed form. We bound their variances and analyze how the parametric structure of a deep neural network can impact the variance. We discuss the meaning of this variance measure and our bounds in the context of deep learning.
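作为直观示意,下面用一个单参数高斯模型 N(θ,1)(其 Fisher 信息恰为 1)演示 FIM 两种等价表示对应的蒙特卡洛估计量:基于得分平方的估计量自身带有方差,而在该模型下基于负 Hessian 的估计量方差为零。这只是说明"估计量方差"这一概念的玩具例子,并非论文中的推导或界。

```python
import random
import statistics

random.seed(0)
theta = 0.0  # N(theta, 1) 的真实均值;该模型的 Fisher 信息恰为 1

def score_estimator(n):
    """第一种表示:得分平方的样本均值 (1/n) * sum s(x)^2,其中 s(x) = x - theta。"""
    return statistics.fmean((random.gauss(theta, 1.0) - theta) ** 2 for _ in range(n))

def neg_hessian_estimator(n):
    """第二种表示:负 Hessian 的样本均值;对 N(theta, 1) 恒为 1,故该估计量方差为零。"""
    return statistics.fmean(1.0 for _ in range(n))

runs = [score_estimator(200) for _ in range(300)]
mean_est = statistics.fmean(runs)       # 接近真实值 1(无偏)
var_est = statistics.pvariance(runs)    # 严格为正:估计量自身的抽样方差
```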

【8】 Structured Model Pruning of Convolutional Networks on Tensor Processing Units 标题:张量处理单元上卷积网络的结构化模型剪枝

作者:Kongtao Chen,Ken Franko,Ruoxin Sang 备注:International Conference on Machine Learning 2021 链接:https://arxiv.org/abs/2107.04191 摘要:卷积神经网络的部署往往受到高计算和存储要求的阻碍。结构化模型剪枝是缓解这些需求的一种很有前途的方法。以VGG-16模型为例,我们在张量处理单元(tpu)上测量了各种结构化模型修剪方法和数据集(CIFAR-10和ImageNet)的精度效率权衡。为了度量模型的实际性能,我们为TensorFlow2开发了一个结构化的模型修剪库来修改模型(而不是添加遮罩层)。我们发现,结构化模型修剪可以显著提高tpu上的模型内存使用率和速度,而不会损失准确性,特别是对于小型数据集(如CIFAR-10)。 摘要:The deployment of convolutional neural networks is often hindered by high computational and storage requirements. Structured model pruning is a promising approach to alleviate these requirements. Using the VGG-16 model as an example, we measure the accuracy-efficiency trade-off for various structured model pruning methods and datasets (CIFAR-10 and ImageNet) on Tensor Processing Units (TPUs). To measure the actual performance of models, we develop a structured model pruning library for TensorFlow2 to modify models in place (instead of adding mask layers). We show that structured model pruning can significantly improve model memory usage and speed on TPUs without losing accuracy, especially for small datasets (e.g., CIFAR-10).
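为说明"结构化剪枝直接就地修改模型结构(而非添加掩码层)"的思路,下面是一段纯 Python 示意:按 L1 范数删除整组滤波器,并同步删除下一层对应的输入通道以保持形状一致。此处的 L1 准则和扁平化数据结构均为演示用假设,并非该文所述库的实际实现。

```python
def prune_filters(weights, next_weights, keep):
    """结构化剪枝示意:按最小 L1 范数整组删除滤波器。
    weights 为滤波器列表(每个滤波器是一个展平的系数列表);
    next_weights 为下一层"每个输出单元 x 每个输入通道"的权重,
    必须同步删掉被剪滤波器对应的输入通道,形状才能保持一致。"""
    norms = [sum(abs(w) for w in f) for f in weights]
    kept = sorted(sorted(range(len(weights)), key=lambda i: -norms[i])[:keep])
    pruned = [weights[i] for i in kept]
    pruned_next = [[row[i] for i in kept] for row in next_weights]
    return pruned, pruned_next

layer = [[0.9, -0.8], [0.01, 0.02], [0.5, 0.4]]   # 第 2 个滤波器几乎失效
nxt = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]           # 2 个输出单元 x 3 个输入通道
pruned, pruned_next = prune_filters(layer, nxt, keep=2)
```

剪枝后两层的形状仍然匹配,因此模型可以直接继续前向传播,这正是"就地修改"相对于掩码方案能在 TPU 上真正节省内存与时间的原因。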

【9】 Greedy structure learning from data that contains systematic missing values 标题:从包含系统缺失值的数据中进行贪婪结构学习

作者:Yang Liu,Anthony C. Constantinou 机构:Received: date Accepted: date 链接:https://arxiv.org/abs/2107.04184 摘要:从包含缺失值的数据中学习是许多领域的常见现象。相对较少的贝叶斯网络结构学习算法能够处理缺失数据,而能处理的那些算法往往依赖于假设数据随机缺失的标准方法,例如期望最大化算法。由于缺失数据通常是系统性的,因此需要更实用的方法来有效地处理包含非随机缺失值的数据集。缺少处理系统性缺失数据的方法,阻碍了BN结构学习方法在缺失并非随机的现实问题中的应用。本文描述了贪婪搜索结构学习的三种变体,它们利用成对删除和逆概率加权来最大限度地利用观测数据,并限制由缺失值引起的潜在偏差。前两个变体可以看作第三个(也是表现最好的)变体的简化版本,但它们本身对于说明学习精度的逐步提升很重要。实证研究表明,无论是在学习精度和效率方面,还是在数据随机缺失和非随机缺失的情况下,所提方法都优于常用的最先进的结构EM算法。 摘要:Learning from data that contain missing values represents a common phenomenon in many domains. Relatively few Bayesian Network structure learning algorithms account for missing data, and those that do tend to rely on standard approaches that assume missing data are missing at random, such as the Expectation-Maximisation algorithm. Because missing data are often systematic, there is a need for more pragmatic methods that can effectively deal with data sets containing missing values not missing at random. The absence of approaches that deal with systematic missing data impedes the application of BN structure learning methods to real-world problems where missingness is not random. This paper describes three variants of greedy search structure learning that utilise pairwise deletion and inverse probability weighting to maximally leverage the observed data and to limit potential bias caused by missing values. The first two of the variants can be viewed as sub-versions of the third and best performing variant, but are important in their own right in illustrating the successive improvements in learning accuracy. The empirical investigations show that the proposed approach outperforms the commonly used and state-of-the-art Structural EM algorithm, both in terms of learning accuracy and efficiency, as well as both when data are missing at random and not at random.
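下面用两个极简函数示意文中各变体所依赖的两个基本构件:成对删除(只在两个变量同时被观测到的记录上计算统计量)与逆概率加权(用 1/p 补偿更容易缺失的记录)。"观测概率已知"这一设定是演示用假设,代码并非论文中的结构学习算法本身。

```python
def pairwise_cov(xs, ys):
    """成对删除:仅使用 x、y 同时被观测到(非 None)的记录计算协方差。"""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    return sum((x - mx) * (y - my) for x, y in pairs) / n

def ipw_mean(values, obs_prob):
    """逆概率加权:观测值按 1/p 加权,以补偿系统性更易缺失的记录。"""
    num = sum(v / p for v, p in zip(values, obs_prob) if v is not None)
    den = sum(1.0 / p for v, p in zip(values, obs_prob) if v is not None)
    return num / den

cov = pairwise_cov([1.0, 2.0, None, 4.0], [1.0, None, 3.0, 5.0])
m = ipw_mean([10.0, None, 30.0], [0.5, 0.5, 1.0])
```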

【10】 Does Form Follow Function? An Empirical Exploration of the Impact of Deep Neural Network Architecture Design on Hardware-Specific Acceleration 标题:形式遵循功能吗?深度神经网络结构设计对硬件加速影响的实证研究

作者:Saad Abbasi,Mohammad Javad Shafiee,Ellick Chan,Alexander Wong 机构:Waterloo, ON, Canada, University of Waterloo, DarwinAI, Intel Corporation, United States 备注:8 pages 链接:https://arxiv.org/abs/2107.04144 摘要:关于深层神经网络架构设计和硬件特定加速的形式和功能之间的细粒度关系是研究文献中没有很好研究的一个领域,形式通常由精度而不是硬件功能决定。在这项研究中,我们进行了全面的实证研究,以探讨深层神经网络架构设计对通过特定硬件加速实现的推理加速程度的影响。更具体地说,我们通过OpenVINO微处理器特定加速和GPU特定加速的视角,实证研究了各种常用的宏体系结构设计模式对不同体系结构深度的影响。实验结果表明,在利用硬件特定加速的情况下,平均推理速度提高了380%,而推理速度的提高程度因宏体系结构设计模式的不同而有很大的差异,其中最快的加速速度达到了550%。此外,我们还深入探讨了随着体系结构深度和宽度的增加,FLOPs需求、3级缓存效率和网络延迟之间的关系。最后,我们分析了在各种手工制作的深度卷积神经网络架构设计以及通过神经架构搜索策略发现的架构设计中,使用硬件特定加速与本地深度学习框架相比,推理时间的减少。我们发现,DARTS派生的体系结构受益于硬件特定软件加速的最大改进(1200%),而基于深度瓶颈卷积的MobileNet-V2的总体推断时间最低,约为2.4毫秒。 摘要:The fine-grained relationship between form and function with respect to deep neural network architecture design and hardware-specific acceleration is one area that is not well studied in the research literature, with form often dictated by accuracy as opposed to hardware function. In this study, a comprehensive empirical exploration is conducted to investigate the impact of deep neural network architecture design on the degree of inference speedup that can be achieved via hardware-specific acceleration. More specifically, we empirically study the impact of a variety of commonly used macro-architecture design patterns across different architectural depths through the lens of OpenVINO microprocessor-specific and GPU-specific acceleration. Experimental results showed that while leveraging hardware-specific acceleration achieved an average inference speed-up of 380%, the degree of inference speed-up varied drastically depending on the macro-architecture design pattern, with the greatest speedup achieved on the depthwise bottleneck convolution design pattern at 550%. Furthermore, we conduct an in-depth exploration of the correlation between FLOPs requirement, level 3 cache efficacy, and network latency with increasing architectural depth and width. 
Finally, we analyze the inference time reductions using hardware-specific acceleration when compared to native deep learning frameworks across a wide variety of hand-crafted deep convolutional neural network architecture designs as well as ones found via neural architecture search strategies. We found that the DARTS-derived architecture benefited from the greatest improvement from hardware-specific software acceleration (1200%), while the depthwise bottleneck convolution-based MobileNet-V2 had the lowest overall inference time of around 2.4 ms.

【11】 Fedlearn-Algo: A flexible open-source privacy-preserving machine learning platform 标题:FedLearning-Algo:一个灵活的开源隐私保护机器学习平台

作者:Bo Liu,Chaowei Tan,Jiazhou Wang,Tao Zeng,Huasong Shan,Houpu Yao,Huang Heng,Peng Dai,Liefeng Bo,Yanqing Chen 机构:JD Finance America Corporation, Mountain View, CA, USA 链接:https://arxiv.org/abs/2107.04129 摘要:本文介绍了一个开源的隐私保护机器学习平台Fedlearn-Algo。我们利用这个平台展示了我们在隐私保护机器学习算法方面的研究和开发成果。作为第一批新的FL算法实例,我们发布了垂直联邦核二元分类模型和垂直联邦随机森林模型。在我们的实践中,它们已经被测试为比现有的垂直联合学习模型更有效。除了新的FL算法实例外,我们还发布了一个机器通信模块。统一数据传输接口支持在机器之间传输广泛使用的数据格式。我们将通过添加更多的功能模块和算法示例来维护这个平台。 摘要:In this paper, we present Fedlearn-Algo, an open-source privacy preserving machine learning platform. We use this platform to demonstrate our research and development results on privacy preserving machine learning algorithms. As the first batch of novel FL algorithm examples, we release vertical federated kernel binary classification model and vertical federated random forest model. They have been tested to be more efficient than existing vertical federated learning models in our practice. Besides the novel FL algorithm examples, we also release a machine communication module. The uniform data transfer interface supports transfering widely used data formats between machines. We will maintain this platform by adding more functional modules and algorithm examples.

【12】 Deep Learning for Mean Field Games and Mean Field Control with Applications to Finance 标题:平均场对策和平均场控制的深度学习及其在金融中的应用

作者:René Carmona,Mathieu Laurière 链接:https://arxiv.org/abs/2107.04568 摘要:金融市场以及更一般的宏观经济模型涉及大量个体,它们通过诸如价格等由所有主体的总体行为产生的变量相互作用。平均场对策被引入用于研究当参与者数量趋于无穷时此类问题的纳什均衡。这一理论在过去十年中利用分析和概率工具得到了广泛发展,并发现了从经济学到人群运动的广泛应用。最近,它与机器学习的交互吸引了越来越多的兴趣。这一方向特别适用于求解具有复杂结构、高维或带有共同随机性来源的超大规模博弈。在本章中,我们回顾了关于平均场对策与深度学习之间相互作用的文献,重点介绍三类方法,并特别强调金融应用。 摘要:Financial markets and more generally macro-economic models involve a large number of individuals interacting through variables such as prices resulting from the aggregate behavior of all the agents. Mean field games have been introduced to study Nash equilibria for such problems in the limit when the number of players is infinite. The theory has been extensively developed in the past decade, using both analytical and probabilistic tools, and a wide range of applications have been discovered, from economics to crowd motion. More recently the interaction with machine learning has attracted a growing interest. This aspect is particularly relevant to solve very large games with complex structures, in high dimension or with common sources of randomness. In this chapter, we review the literature on the interplay between mean field games and deep learning, with a focus on three families of methods. A special emphasis is given to financial applications.

【13】 The Bayesian Learning Rule 标题:贝叶斯学习规则

作者:Mohammad Emtiyaz Khan,Håvard Rue 机构:RIKEN Center for AI Project, Tokyo, Japan, H˚avard Rue, CEMSE Division, KAUST, Thuwal, Saudi Arabia 链接:https://arxiv.org/abs/2107.04562 摘要:我们证明了许多机器学习算法都是一种称为贝叶斯学习规则的算法的具体实例。该规则源于贝叶斯原理,产生了优化、深度学习和图形模型等领域的广泛算法。这包括经典算法,如岭回归、牛顿法和卡尔曼滤波,以及现代深度学习算法,如随机梯度下降、RMSprop和Dropout。推导这种算法的关键思想是利用自然梯度估计的候选分布来逼近后验分布。不同的候选分布会导致不同的算法,而对自然梯度的进一步逼近会导致这些算法的变体。我们的工作不仅统一、推广和改进了现有的算法,而且有助于我们设计新的算法。 摘要:We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide-range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated by using natural gradients. Different candidate distributions result in different algorithms and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.

【14】 Deep Learning for Reduced Order Modelling and Efficient Temporal Evolution of Fluid Simulations 标题:用于流体模拟降阶建模和高效时间演化的深度学习

作者:Pranshu Pant,Ruchit Doshi,Pranav Bahl,Amir Barati Farimani 机构:Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, PA, USA 备注:16 pages, 11 figures 链接:https://arxiv.org/abs/2107.04556 摘要:降阶模型(ROM)已被广泛应用于高阶动力系统的低阶表示。使用这些表示,ROMs可以有效地模拟流场,同时使用明显较少的参数。传统的rom通过使用降维技术(如适当正交分解(POD))将高阶流形线性投影到低维空间来实现这一点。在这项工作中,我们开发了一个新的深度学习框架DL-ROM(深度学习-降阶建模)来创建一个能够非线性投影到降阶状态的神经网络。然后,我们使用学习到的简化状态,有效地预测未来的时间步长的模拟使用三维自动编码器和三维U-Net为基础的架构。我们的模型DL-ROM能够从所学习的ROM中创建高度精确的重构,并且因此能够通过在所学习的简化状态下的时间遍历来有效地预测未来的时间步长。所有这些都是在没有地面真值监督或需要迭代求解昂贵的Navier-Stokes(NS)方程的情况下实现的,从而节省了大量的计算量。为了测试我们的方法的有效性和性能,我们使用重建性能和计算运行时度量在五个不同的计算流体动力学(CFD)数据集上评估了我们的实现。DL-ROM可以将迭代求解器的计算运行时间减少近两个数量级,同时保持一个可接受的错误阈值。 摘要:Reduced Order Modelling (ROM) has been widely used to create lower order, computationally inexpensive representations of higher-order dynamical systems. Using these representations, ROMs can efficiently model flow fields while using significantly lesser parameters. Conventional ROMs accomplish this by linearly projecting higher-order manifolds to lower-dimensional space using dimensionality reduction techniques such as Proper Orthogonal Decomposition (POD). In this work, we develop a novel deep learning framework DL-ROM (Deep Learning - Reduced Order Modelling) to create a neural network capable of non-linear projections to reduced order states. We then use the learned reduced state to efficiently predict future time steps of the simulation using 3D Autoencoder and 3D U-Net based architectures. Our model DL-ROM is able to create highly accurate reconstructions from the learned ROM and is thus able to efficiently predict future time steps by temporally traversing in the learned reduced state. All of this is achieved without ground truth supervision or needing to iteratively solve the expensive Navier-Stokes(NS) equations thereby resulting in massive computational savings. 
To test the effectiveness and performance of our approach, we evaluate our implementation on five different Computational Fluid Dynamics (CFD) datasets using reconstruction performance and computational runtime metrics. DL-ROM can reduce the computational runtimes of iterative solvers by nearly two orders of magnitude while maintaining an acceptable error threshold.
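作为文中"传统 ROM 将高阶流形线性投影到低维空间(如 POD)"这一步骤的最小示意:给定一组(此处直接假设已知的)正交归一基,将高维快照投影为降阶系数再重建。实际 POD 基通常由快照矩阵的 SVD 得到,为保持示例自包含此处省略。

```python
def rom_project(snapshots, basis):
    """线性 ROM 示意:快照 -> 降阶系数(投影)-> 重建。
    basis 为正交归一基向量的列表;位于其张成子空间内的数据可被精确重建。"""
    coeffs = [[sum(b_i * s_i for b_i, s_i in zip(b, s)) for b in basis]
              for s in snapshots]
    dim = len(snapshots[0])
    recon = [[sum(c * b[j] for c, b in zip(cs, basis)) for j in range(dim)]
             for cs in coeffs]
    return coeffs, recon

basis = [[0.6, 0.8]]                       # 一维降阶基(单位向量)
snaps = [[1.2, 1.6], [0.6, 0.8]]           # 恰好位于该基张成的子空间内
coeffs, recon = rom_project(snaps, basis)  # 秩一数据:重建误差为零
```

DL-ROM 正是把这一步的线性投影换成自编码器学到的非线性映射,再在降阶状态上做时间推进。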

【15】 Continual Learning in the Teacher-Student Setup: Impact of Task Similarity 标题:师生系统中的持续学习:任务相似性的影响

作者:Sebastian Lee,Sebastian Goldt,Andrew Saxe 机构: UK 2International School ofAdvanced Studies (SISSA), Italy 3Department of Ex-perimental Psychology, University of Oxford 备注:None 链接:https://arxiv.org/abs/2107.04384 摘要:持续学习——按顺序学习许多任务的能力——对人工学习系统至关重要。然而,深度网络的标准训练方法经常遭受灾难性遗忘,即学习新任务会抹去先前任务的知识。虽然灾难性遗忘给问题贴上了标签,但任务间干扰的理论原因仍然不清楚。在这里,我们试图通过研究师生设置(teacher-student setup)中的持续学习来缩小理论与实践之间的差距。我们将以往针对师生设置中两层网络的分析工作扩展到多个教师。以每一位教师代表一个不同的任务,我们研究了教师之间的关系如何影响学生在任务切换时表现出的遗忘和迁移程度。与最近的研究一致,我们发现当任务依赖于相似的特征时,中等程度的任务相似性会导致最大的遗忘。然而,特征相似性只是任务关联的一种方式。师生设置允许我们在读出层(隐层到输出的权重)和特征(输入到隐层的权重)的层面上分离任务相似性。我们发现两种类型的相似性、初始迁移/遗忘率、最大迁移/遗忘和长期迁移/遗忘之间存在复杂的相互作用。总之,这些结果有助于阐明导致灾难性遗忘的各种因素。 摘要:Continual learning - the ability to learn many tasks in sequence - is critical for artificial learning systems. Yet standard training methods for deep networks often suffer from catastrophic forgetting, where learning new tasks erases knowledge of earlier tasks. While catastrophic forgetting labels the problem, the theoretical reasons for interference between tasks remain unclear. Here, we attempt to narrow this gap between theory and practice by studying continual learning in the teacher-student setup. We extend previous analytical work on two-layer networks in the teacher-student setup to multiple teachers. Using each teacher to represent a different task, we investigate how the relationship between teachers affects the amount of forgetting and transfer exhibited by the student when the task switches. In line with recent work, we find that when tasks depend on similar features, intermediate task similarity leads to greatest forgetting. However, feature similarity is only one way in which tasks may be related. The teacher-student approach allows us to disentangle task similarity at the level of readouts (hidden-to-output weights) and features (input-to-hidden weights). We find a complex interplay between both types of similarity, initial transfer/forgetting rates, maximum transfer/forgetting, and long-term transfer/forgetting.
Together, these results help illuminate the diverse factors contributing to catastrophic forgetting.

【16】 Training a Deep Neural Network via Policy Gradients for Blind Source Separation in Polyphonic Music Recordings 标题:基于策略梯度的深层神经网络复调音乐盲源分离

作者:Sören Schulze,Johannes Leuschner,Emily J. King 机构:Center for Industrial Mathematics, University of Bremen, Bibliothekstr. , Bremen, Germany;, Mathematics Department, Colorado State University, Campus Delivery, Weber Bldg, Fort, 链接:https://arxiv.org/abs/2107.04235 摘要:本文提出了一种在音频信号中实现乐器声音盲分离的方法。我们通过一个参数模型来描述单个音调,训练一个字典来捕捉谐波的相对振幅。模型参数的预测采用深度神经网络的U网络。基于模型预测和单个STFT时间帧之间的差异,在没有地面真值信息的情况下训练网络。由于一些模型参数不产生有用的反向传播梯度,我们对它们进行随机建模,并使用策略梯度代替。为了提供相位信息并解释基于字典的表示中的不精确性,我们还让网络输出一个直接预测,然后使用该预测来重新合成各个乐器的音频信号。由于神经网络的灵活性,非谐性可以无缝结合,不需要对输入光谱进行预处理。我们的算法产生高质量的分离结果,对各种不同的音频样本(包括声学和合成)的干扰特别低,前提是样本包含足够的训练数据,并且乐器的光谱特性足够稳定,可以通过字典进行近似。 摘要:We propose a method for the blind separation of sounds of musical instruments in audio signals. We describe the individual tones via a parametric model, training a dictionary to capture the relative amplitudes of the harmonics. The model parameters are predicted via a U-Net, which is a type of deep neural network. The network is trained without ground truth information, based on the difference between the model prediction and the individual STFT time frames. Since some of the model parameters do not yield a useful backpropagation gradient, we model them stochastically and employ the policy gradient instead. To provide phase information and account for inaccuracies in the dictionary-based representation, we also let the network output a direct prediction, which we then use to resynthesize the audio signals for the individual instruments. Due to the flexibility of the neural network, inharmonicity can be incorporated seamlessly and no preprocessing of the input spectra is required. Our algorithm yields high-quality separation results with particularly low interference on a variety of different audio samples, both acoustic and synthetic, provided that the sample contains enough data for the training and that the spectral characteristics of the musical instruments are sufficiently stable to be approximated by the dictionary.

【17】 On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models 标题:基于HMM和CTC的全上下文ASR模型的无格增强MMI训练

作者:Xiaohui Zhang,Vimal Manohar,David Zhang,Frank Zhang,Yangyang Shi,Nayan Singhal,Julian Chan,Fuchun Peng,Yatharth Saraf,Mike Seltzer 机构:Facebook AI, USA 备注:submitted to ASRU 2021 链接:https://arxiv.org/abs/2107.04154 摘要:混合自动语音识别(ASR)模型通常采用CTC或LF-MMI准则进行顺序训练。然而,它们有着截然不同的遗产,通常在不同的框架中实现。本文通过解耦建模单元和标签拓扑的概念,建立适当的分子/分母图,建立了混合声学建模的通用框架。在这个框架中,我们证明了LF-MMI是一个强大的训练标准,适用于有限上下文和全上下文模型,对于单字/单字/双字/单字单元,以及HMM/CTC两种拓扑结构。在此框架下,我们提出了三种新的训练方案:chenone(ch)/wordpiece(wp)-CTC-bMMI和wordpiece(wp)-HMM-bMMI,它们在训练性能、译码效率和译码时间戳精度方面具有不同的优势。在Librispeech上对不同训练方案的优点进行了综合评价,并在两个实际ASR任务上对wp-CTC-bMMI和ch-CTC-bMMI进行了评价。此外,我们还发现双字符HMM-MMI模型比传统的非神经GMM-HMM模型具有更好的对齐效果。 摘要:Hybrid automatic speech recognition (ASR) models are typically sequentially trained with CTC or LF-MMI criteria. However, they have vastly different legacies and are usually implemented in different frameworks. In this paper, by decoupling the concepts of modeling units and label topologies and building proper numerator/denominator graphs accordingly, we establish a generalized framework for hybrid acoustic modeling (AM). In this framework, we show that LF-MMI is a powerful training criterion applicable to both limited-context and full-context models, for wordpiece/mono-char/bi-char/chenone units, with both HMM/CTC topologies. From this framework, we propose three novel training schemes: chenone(ch)/wordpiece(wp)-CTC-bMMI, and wordpiece(wp)-HMM-bMMI with different advantages in training performance, decoding efficiency and decoding time-stamp accuracy. The advantages of different training schemes are evaluated comprehensively on Librispeech, and wp-CTC-bMMI and ch-CTC-bMMI are evaluated on two real world ASR tasks to show their effectiveness. Besides, we also show bi-char(bc) HMM-MMI models can serve as better alignment models than traditional non-neural GMM-HMMs.

其他(17篇)

【1】 ANCER: Anisotropic Certification via Sample-wise Volume Maximization 标题:ANCER:基于逐样本体积最大化的各向异性认证

作者:Francisco Eiras,Motasem Alfarra,M. Pawan Kumar,Philip H. S. Torr,Puneet K. Dokania,Bernard Ghanem,Adel Bibi 机构: University of Oxford, United Kingdom, KAUST, Saudi Arabia, Five AI Limited, United Kingdom 备注:First two authors and the last one contributed equally to this work 链接:https://arxiv.org/abs/2107.04570 摘要:随机平滑是近年来出现的一种能够大规模认证深度神经网络分类器的有效工具。关于随机平滑的所有现有工作都集中于各向同性$\ell_p$认证,其优点是所产生的证书可以通过$\ell_p$范数半径在各向同性方法之间方便地比较。然而,各向同性认证限制了在最坏情况对手下输入周围可认证的区域,即它无法考虑其他"邻近的"、可能很大的恒定预测安全区域。为了缓解这个问题,(i)我们在简化分析之后,从理论上把各向同性随机平滑的$\ell_1$和$\ell_2$证书推广到广义的各向异性情形。此外,(ii)我们提出了便于比较一般证书的评估指标(若一个证书认证的区域是另一证书区域的超集,则前者更优),并通过认证区域的体积来量化每个证书。我们介绍了ANCER,一个通过体积最大化为给定测试集样本获得各向异性证书的实用框架。 摘要:Randomized smoothing has recently emerged as an effective tool that enables certification of deep neural network classifiers at scale. All prior art on randomized smoothing has focused on isotropic $\ell_p$ certification, which has the advantage of yielding certificates that can be easily compared among isotropic methods via $\ell_p$-norm radius. However, isotropic certification limits the region that can be certified around an input to worst-case adversaries, i.e., it cannot reason about other "close", potentially large, constant prediction safe regions. To alleviate this issue, (i) we theoretically extend the isotropic randomized smoothing $\ell_1$ and $\ell_2$ certificates to their generalized anisotropic counterparts following a simplified analysis. Moreover, (ii) we propose evaluation metrics allowing for the comparison of general certificates - a certificate is superior to another if it certifies a superset region - with the quantification of each certificate through the volume of the certified region. We introduce ANCER, a practical framework for obtaining anisotropic certificates for a given test set sample via volume maximization.
Our empirical results demonstrate that ANCER achieves state-of-the-art $\ell_1$ and $\ell_2$ certified accuracy on both CIFAR-10 and ImageNet at multiple radii, while certifying substantially larger regions in terms of volume, thus highlighting the benefits of moving away from isotropic analysis. Code used in our experiments is available at https://github.com/MotasemAlfarra/ANCER.
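作为背景示意,下面给出各向同性随机平滑认证的蒙特卡洛草图:在高斯噪声下对基分类器做多数投票,估计最高类概率 p,并返回 σ·Φ⁻¹(p) 形式的 ℓ2 认证半径。真实认证还需对 p 取置信下界(如 Clopper-Pearson),ANCER 进一步将各向同性推广为各向异性;此处的一维玩具分类器纯属演示用假设。

```python
import random
from statistics import NormalDist

random.seed(1)

def certify_isotropic(base_classifier, x, sigma, n=2000):
    """各向同性随机平滑示意:高斯噪声下多数投票,返回 (预测类, l2 认证半径)。"""
    votes = {}
    for _ in range(n):
        noisy = [xi + random.gauss(0.0, sigma) for xi in x]
        c = base_classifier(noisy)
        votes[c] = votes.get(c, 0) + 1
    top, count = max(votes.items(), key=lambda kv: kv[1])
    p_hat = min(count / n, 1.0 - 1.0 / n)   # 避免 p=1 时 Phi^{-1} 发散
    radius = sigma * NormalDist().inv_cdf(p_hat) if p_hat > 0.5 else 0.0
    return top, radius

# 玩具基分类器:第一维为负判为类 0,否则为类 1
f = lambda v: 0 if v[0] < 0.0 else 1
label, radius = certify_isotropic(f, [-1.0, 0.0], sigma=0.5)
# 理论上 p 约为 Phi(2) ≈ 0.977,故半径约为 0.5 * 2 = 1.0
```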

【2】 Using Machine Translation to Localize Task Oriented NLG Output 标题:使用机器翻译实现面向任务的NLG输出本地化

作者:Scott Roy,Cliff Brunk,Kyu-Young Kim,Justin Zhao,Markus Freitag,Mihir Kale,Gagan Bansal,Sidharth Mudgal,Chris Varano 机构:Google, Inc. 备注:12 pages, 10 figures 链接:https://arxiv.org/abs/2107.04512 摘要:面向任务的自然语言应用程序(如googleassistant、Siri或Alexa)面临的挑战之一是将输出本地化为多种语言。本文探讨了如何将机器翻译应用于英语输出。使用机器翻译具有很强的可伸缩性,因为它可以处理任何英语输出,并且可以处理动态文本,但在其他方面,问题是不适合。所需的质量栏近乎完美,句子范围极窄,而且句子往往与机器翻译训练数据中的句子差别很大。这种需求组合在机器翻译领域是一种新颖的领域适应。我们可以通过建立现有的想法并添加新的想法来达到所需的质量标准:微调域内翻译、添加来自Web的句子、添加语义注释以及使用自动错误检测。本文分享了我们的研究方法和结果,并提出了一个蒸馏模型,为规模翻译模型服务。 摘要:One of the challenges in a task oriented natural language application like the Google Assistant, Siri, or Alexa is to localize the output to many languages. This paper explores doing this by applying machine translation to the English output. Using machine translation is very scalable, as it can work with any English output and can handle dynamic text, but otherwise the problem is a poor fit. The required quality bar is close to perfection, the range of sentences is extremely narrow, and the sentences are often very different than the ones in the machine translation training data. This combination of requirements is novel in the field of domain adaptation for machine translation. We are able to reach the required quality bar by building on existing ideas and adding new ones: finetuning on in-domain translations, adding sentences from the Web, adding semantic annotations, and using automatic error detection. The paper shares our approach and results, together with a distillation model to serve the translation models at scale.

【3】 Batch Inverse-Variance Weighting: Deep Heteroscedastic Regression 标题:批次逆方差加权:深度异方差回归

作者:Vincent Mai,Waleed Khamies,Liam Paull 机构:Whaleed Khamies, Robotics and Embodied AI Lab, Mila - Quebec Institute of Artificial Intelligence, Université de Montréal, Canada, Canada CIFAR AI Chair 备注:Accepted at the Uncertainty in Deep Learning (UDL) workshop at ICML 2021 链接:https://arxiv.org/abs/2107.04497 摘要:异方差回归是一种监督学习任务,其中每个标签都受到来自不同分布的噪声的影响。这种噪声可能由标注过程引起,并且由于违反了i.i.d.假设而对学习算法的性能产生负面影响。然而,在许多情况下,标注过程能够估计每个标签噪声分布的方差,这可以作为附加信息来减轻这种影响。基于Gauss-Markov定理,我们将逆方差加权均方误差用于神经网络的参数优化。我们引入了批量逆方差(Batch Inverse-Variance,BIV)损失函数,它对接近真实值(near-ground-truth)的样本具有鲁棒性,并允许控制有效学习率。实验结果表明,与L2损失、逆方差加权以及基于滤波的基线相比,BIV在两个含噪数据集上显著提升了网络的性能。 摘要:Heteroscedastic regression is the task of supervised learning where each label is subject to noise from a different distribution. This noise can be caused by the labelling process, and impacts negatively the performance of the learning algorithm as it violates the i.i.d. assumptions. In many situations however, the labelling process is able to estimate the variance of such distribution for each label, which can be used as an additional information to mitigate this impact. We adapt an inverse-variance weighted mean square error, based on the Gauss-Markov theorem, for parameter optimization on neural networks. We introduce Batch Inverse-Variance, a loss function which is robust to near-ground truth samples, and allows to control the effective learning rate. Our experimental results show that BIV improves significantly the performance of the networks on two noisy datasets, compared to L2 loss, inverse-variance weighting, as well as a filtering-based baseline.
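下面是逆方差加权均方误差的一个极简示意:每个样本的平方误差按 1/(σ² + ε) 加权,其中 ε 防止近似无噪声("接近真实值")的标签权重发散,这正是 BIV 所强调的鲁棒性方向。具体的归一化与超参数选择可能与论文实现不同,仅作概念演示。

```python
def inverse_variance_mse(preds, labels, label_vars, eps=1e-2):
    """逆方差加权 MSE:权重 w_i = 1/(sigma_i^2 + eps),再按权重和归一化。
    eps 使方差接近 0 的(近似真值)标签不会主导整个批次。"""
    weights = [1.0 / (v + eps) for v in label_vars]
    total = sum(weights)
    return sum(w * (p - y) ** 2 for w, p, y in zip(weights, preds, labels)) / total

# 第二个样本的标签噪声方差大,其误差被显著降权
loss = inverse_variance_mse([1.0, 3.0], [0.0, 0.0], [0.0, 0.99])
plain = (1.0 + 9.0) / 2.0   # 普通(未加权)MSE = 5.0
```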

【4】 Aligning an optical interferometer with beam divergence control and continuous action space 标题:利用光束发散控制和连续作用空间对准光学干涉仪

作者:Stepan Makarenko,Dmitry Sorokin,Alexander Ulanov,A. I. Lvovsky 机构:Russian Quantum Center, Moscow, Russia, Moscow Institute of Physics and Technology, Russia, University of Oxford, United Kingdom 备注:12 pages, 5 figures 链接:https://arxiv.org/abs/2107.04457 摘要:强化学习正在从模拟环境走向物理装置,逐步应用于现实世界的问题。在这项工作中,我们实现了基于视觉的光学马赫-曾德尔干涉仪对准,其一条臂中装有共焦望远镜,用于控制相应光束的直径和发散度。我们使用连续动作空间;指数缩放使我们能够处理跨越两个数量级以上的动作。我们的智能体仅在带有域随机化的模拟环境中训练。在实验评估中,智能体的性能明显优于现有解决方案和人类专家。 摘要:Reinforcement learning is finding its way to real-world problem application, transferring from simulated environments to physical setups. In this work, we implement vision-based alignment of an optical Mach-Zehnder interferometer with a confocal telescope in one arm, which controls the diameter and divergence of the corresponding beam. We use a continuous action space; exponential scaling enables us to handle actions within a range of over two orders of magnitude. Our agent trains only in a simulated environment with domain randomizations. In an experimental evaluation, the agent significantly outperforms an existing solution and a human expert.
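摘要提到"指数缩放使我们能够处理跨越两个数量级以上的动作"。下面是这一思路的一种可能实现(摘要未给出具体公式,映射形式与量程均为演示用假设):把 [-1, 1] 的连续动作几何地映射到 [lo, hi],使整个量程上的相对分辨率均匀。

```python
def exp_scale(action, lo=1e-3, hi=1e-1):
    """把 [-1, 1] 中的动作按指数(几何)方式映射到 [lo, hi],此处跨两个数量级。"""
    t = (action + 1.0) / 2.0        # [-1, 1] -> [0, 1]
    return lo * (hi / lo) ** t

lo_val = exp_scale(-1.0)   # 区间下端 lo
hi_val = exp_scale(1.0)    # 区间上端 hi
mid_val = exp_scale(0.0)   # 几何中点 sqrt(lo * hi) = 1e-2
```

与线性缩放相比,动作每移动固定一段距离,控制量就乘以固定倍数,因此小量程处不会丢失分辨率。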

【5】 A Comparison of Contextual and Non-Contextual Preference Ranking for Set Addition Problems 标题:集合相加问题的上下文偏好排序与非上下文偏好排序的比较

作者:Timo Bertram,Johannes Fürnkranz,Martin Müller 机构: Johannes-Kepler Universit¨at, University of Al-berta 备注:None 链接:https://arxiv.org/abs/2107.04438 摘要:本文研究了评估向集合中添加元素的问题。这个问题之所以困难,是因为在一般情况下,它不能被简化为选项之间的无条件偏好。因此,我们基于决策的上下文对偏好进行建模。我们讨论并比较了两种不同的孪生(Siamese)网络结构:一种是比较添加后所得两个集合的双生网络(twin network),另一种是对每个候选元素相对于现有集合的贡献进行建模的三元组网络(triplet network)。我们在一个真实任务上评估这两种设置:在集换式卡牌游戏《万智牌》(Magic: The Gathering)中学习人类构筑套牌的卡牌偏好。我们证明了三元组方法比双生网络取得了更好的结果,并且两者在该任务上都优于以往的结果。 摘要:In this paper, we study the problem of evaluating the addition of elements to a set. This problem is difficult, because it can, in the general case, not be reduced to unconditional preferences between the choices. Therefore, we model preferences based on the context of the decision. We discuss and compare two different Siamese network architectures for this task: a twin network that compares the two sets resulting after the addition, and a triplet network that models the contribution of each candidate to the existing set. We evaluate the two settings on a real-world task; learning human card preferences for deck building in the collectible card game Magic: The Gathering. We show that the triplet approach achieves a better result than the twin network and that both outperform previous results on this task.

【6】 How to choose an Explainability Method? Towards a Methodical Implementation of XAI in Practice 标题:如何选择可解释性方法?如何在实践中有条不紊地实施XAI

作者:Tom Vermeire,Thibault Laugel,Xavier Renard,David Martens,Marcin Detyniecki 机构: University of Antwerp, Prinsstraat , Antwerp, Belgium, AXA, Paris, France, Sorbonne Universit´e, CNRS, LIP, F-, Paris, France, Polish Academy of Science, IBS PAN, Warsaw, Poland 链接:https://arxiv.org/abs/2107.04427 摘要:由于监管举措和公众意识的转变,解释性正成为使用自动化决策的组织的一个重要要求。在这个领域中,已经引入了各种不同的算法来提供这种解释性,但是机器学习领域的现有文献很少关注利益相关者的需求,而这些利益相关者的需求是在人机界面领域中研究的。因此,想要或需要提供这种可解释性的组织将面临为其用例选择适当方法的问题。在本文中,我们认为需要一种方法来弥补利益相关者的需求和解释方法之间的差距。我们将介绍我们正在进行的创建此方法的工作,以帮助数据科学家向利益相关者提供可解释性。特别是,我们的贡献包括用于描述XAI方法和用户需求的文档(如附录所示),我们的方法建立在这些文档的基础上。 摘要:Explainability is becoming an important requirement for organizations that make use of automated decision-making due to regulatory initiatives and a shift in public awareness. Various and significantly different algorithmic methods to provide this explainability have been introduced in the field, but the existing literature in the machine learning community has paid little attention to the stakeholder whose needs are rather studied in the human-computer interface community. Therefore, organizations that want or need to provide this explainability are confronted with the selection of an appropriate method for their use case. In this paper, we argue there is a need for a methodology to bridge the gap between stakeholder needs and explanation methods. We present our ongoing work on creating this methodology to help data scientists in the process of providing explainability to stakeholders. In particular, our contributions include documents used to characterize XAI methods and user requirements (shown in Appendix), which our methodology builds upon.

【7】 Multiaccurate Proxies for Downstream Fairness 标题:面向下游公平性的多精度代理

作者:Emily Diana,Wesley Gill,Michael Kearns,Krishnaram Kenthapadi,Aaron Roth,Saeed Sharifi-Malvajerdi 机构:University of Pennsylvania, Amazon AWS AI 链接:https://arxiv.org/abs/2107.04423 摘要:我们研究这样一个问题:当敏感特征在训练时不可用时,如何训练一个必须满足人口统计公平(demographic fairness)条件的模型?换言之,当我们没有种族数据时,如何训练一个按种族公平的模型?我们采用公平管道的视角:一个能够访问敏感特征的"上游"学习者,会从其他属性中学习这些敏感特征的代理模型。代理的目标是让一个对其预测任务仅作最小假设的通用"下游"学习者能够使用该代理,训练出对真实敏感特征公平的模型。我们证明,对下游模型类满足多精度(multiaccuracy)约束即足以达成这一目标,并给出了学习此类代理的样本高效与预言机高效(oracle-efficient)的算法以及泛化界。一般而言,多精度比分类精度更容易满足,即使在敏感特征难以预测的情况下也能满足。 摘要:We study the problem of training a model that must obey demographic fairness conditions when the sensitive features are not available at training time -- in other words, how can we train a model to be fair by race when we don't have data about race? We adopt a fairness pipeline perspective, in which an "upstream" learner that does have access to the sensitive features will learn a proxy model for these features from the other attributes. The goal of the proxy is to allow a general "downstream" learner -- with minimal assumptions on their prediction task -- to be able to use the proxy to train a model that is fair with respect to the true sensitive features. We show that obeying multiaccuracy constraints with respect to the downstream model class suffices for this purpose, and provide sample- and oracle-efficient algorithms and generalization bounds for learning such proxies. In general, multiaccuracy can be much easier to satisfy than classification accuracy, and can be satisfied even when the sensitive features are hard to predict.
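作为参考,多精度约束的一种常见定义形式可写作如下(这是文献中多精度的标准表述,符号为示意性假设,论文中的确切定义以原文为准):

```latex
% 多精度(multiaccuracy)约束的示意性定义:
% 代理 \hat{g} 关于下游模型类 \mathcal{H} 是 \alpha-多精度的,若
\left| \mathbb{E}_{x \sim \mathcal{D}}
  \left[ \bigl( g(x) - \hat{g}(x) \bigr)\, h(x) \right] \right|
  \le \alpha
  \quad \text{对所有 } h \in \mathcal{H},
% 其中 g 是真实敏感特征,\mathcal{D} 是数据分布。
```

直观上,该约束要求代理的误差与下游模型类中的任何函数都近似不相关,因此下游学习者用代理替代真实敏感特征不会在其模型类内产生系统性偏差。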

【8】 Multi-headed Neural Ensemble Search 标题:多头神经集成搜索

作者:Ashwin Raaghav Narayanan,Arber Zela,Tonmoy Saikia,Thomas Brox,Frank Hutter 机构: University of Freiburg, Bosch Center for Artificial Intelligence 备注:8 pages, 12 figures, 3 tables 链接:https://arxiv.org/abs/2107.04369 摘要:使用不同随机种子训练的CNN模型集成(也称为深度集成,Deep Ensembles)被认为比单个CNN获得更高的性能。神经集成搜索(NES)可以通过增加架构多样性进一步提升性能。然而,在有限的计算资源下,NES的开销仍然令人望而却步。在这项工作中,我们将NES扩展到多头集成(multi-headed ensembles),它由一个连接多个预测头的共享主干组成。与深度集成不同,这些多头集成可以端到端训练,使我们能够利用单次(one-shot)NAS方法来优化集成目标。通过大量实证评估,我们证明了多头集成搜索发现鲁棒集成的速度快3倍,同时在预测性能和不确定性校准两方面都与其他集成搜索方法性能相当。 摘要:Ensembles of CNN models trained with different seeds (also known as Deep Ensembles) are known to achieve superior performance over a single copy of the CNN. Neural Ensemble Search (NES) can further boost performance by adding architectural diversity. However, the scope of NES remains prohibitive under limited computational resources. In this work, we extend NES to multi-headed ensembles, which consist of a shared backbone attached to multiple prediction heads. Unlike Deep Ensembles, these multi-headed ensembles can be trained end to end, which enables us to leverage one-shot NAS methods to optimize an ensemble objective. With extensive empirical evaluations, we demonstrate that multi-headed ensemble search finds robust ensembles 3 times faster, while having comparable performance to other ensemble search methods, in both predictive performance and uncertainty calibration.
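下面用NumPy给出"共享主干+多预测头"的前向计算示意(网络结构、维度与权重均为假设性玩具设定,仅说明特征只计算一次、各头预测取平均的集成方式):

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# 假设性的极小模型:一个共享线性主干,M 个预测头
D, H, C, M = 5, 8, 3, 4          # 输入维度、隐藏维度、类别数、头数
W_backbone = rng.normal(size=(D, H))
heads = [rng.normal(size=(H, C)) for _ in range(M)]

def ensemble_predict(x):
    """前向传播:主干特征只计算一次,再对各头的预测取平均。"""
    feat = np.tanh(x @ W_backbone)           # 共享主干
    probs = [softmax(feat @ Wh) for Wh in heads]
    return np.mean(probs, axis=0)            # 集成 = 各头预测的均值

p = ensemble_predict(rng.normal(size=D))
print(p.shape, round(float(p.sum()), 6))
```

与独立训练M个完整网络的深度集成相比,这种结构只有预测头是成员独有的,因此可以端到端联合训练,这正是摘要中"可利用one-shot NAS优化集成目标"的前提。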

【9】 A Survey on Low-Resource Neural Machine Translation 标题:低资源神经机器翻译研究综述

作者:Rui Wang,Xu Tan,Renqian Luo,Tao Qin,Tie-Yan Liu 机构:Microsoft Research Asia 备注:A short version has been submitted to IJCAI2021 Survey Track on Feb. 26th, 2021, accepted on Apr. 16th, 2021. 14 pages, 4 figures 链接:https://arxiv.org/abs/2107.04239 摘要:神经方法在机器翻译上已经达到最先进的精度,但收集大规模平行数据的成本很高。因此,大量研究致力于在平行数据非常有限的条件下(即低资源场景)进行神经机器翻译(NMT)。本文对低资源NMT进行了综述,并根据所使用的辅助数据将相关工作分为三类:(1)利用源语言和/或目标语言的单语数据;(2)利用辅助语言的数据;(3)利用多模态数据。我们希望我们的综述能够帮助研究人员更好地理解这一领域,启发他们设计更好的算法,并帮助行业从业者为自己的应用选择合适的算法。 摘要:Neural approaches have achieved state-of-the-art accuracy on machine translation but suffer from the high cost of collecting large scale parallel data. Thus, a lot of research has been conducted for neural machine translation (NMT) with very limited parallel data, i.e., the low-resource setting. In this paper, we provide a survey for low-resource NMT and classify related works into three categories according to the auxiliary data they used: (1) exploiting monolingual data of source and/or target languages, (2) exploiting data from auxiliary languages, and (3) exploiting multi-modal data. We hope that our survey can help researchers to better understand this field and inspire them to design better algorithms, and help industry practitioners to choose appropriate algorithms for their applications.

【10】 Safe Exploration by Solving Early Terminated MDP 标题:通过求解提前终止MDP实现安全探索

作者:Hao Sun,Ziping Xu,Meng Fang,Zhenghao Peng,Jiadong Guo,Bo Dai,Bolei Zhou 机构:CUHK,University of Michigan,Tencent,HKUST,NTU 链接:https://arxiv.org/abs/2107.04200 摘要:安全探索对于强化学习(RL)的实际应用至关重要。以前的工作将安全探索问题建模为约束马尔可夫决策过程(Constrained MDP,CMDP),即在约束条件下优化策略。然而,人类在遇到任何潜在危险时往往会立即停止,而很少在危险中学习如何安全行事。受人类学习方式的启发,我们提出了一种在提前终止MDP(ET-MDP)框架下解决安全RL问题的新方法。我们首先将ET-MDP定义为一个无约束MDP,其最优值函数与对应的CMDP相同。随后,我们提出了一种基于上下文模型的离策略(off-policy)算法来求解ET-MDP,从而以更好的渐近性能和更高的学习效率求解对应的CMDP。在各种CMDP任务上的实验表明,该方法相比以往直接求解CMDP的方法有了显著改进。 摘要:Safe exploration is crucial for the real-world application of reinforcement learning (RL). Previous works consider the safe exploration problem as Constrained Markov Decision Process (CMDP), where the policies are being optimized under constraints. However, when encountering any potential dangers, human tends to stop immediately and rarely learns to behave safely in danger. Motivated by human learning, we introduce a new approach to address safe RL problems under the framework of Early Terminated MDP (ET-MDP). We first define the ET-MDP as an unconstrained MDP with the same optimal value function as its corresponding CMDP. An off-policy algorithm based on context models is then proposed to solve the ET-MDP, which thereby solves the corresponding CMDP with better asymptotic performance and improved learning efficiency. Experiments on various CMDP tasks show a substantial improvement over previous methods that directly solve CMDP.
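下面是ET-MDP思想的一个极简示意(环境与策略均为虚构的玩具例子,并非论文算法):一旦产生任何代价信号就提前终止回合,从而把带约束的CMDP转化为一个无约束的MDP。

```python
import random

random.seed(0)

def step(state, action):
    """假设性的一维玩具环境:返回 (下一状态, 奖励, 代价)。"""
    nxt = state + (1 if action else -1)
    reward = 1.0
    cost = 1.0 if abs(nxt) > 3 else 0.0   # 进入 |s| > 3 视为"不安全"
    return nxt, reward, cost

def rollout_et_mdp(policy, horizon=50):
    """提前终止MDP:一旦产生任何代价就结束回合,
    由此代价约束被隐式地编码进了无约束的回报最大化目标。"""
    state, ret = 0, 0.0
    for _ in range(horizon):
        state, r, c = step(state, policy(state))
        if c > 0:          # 提前终止取代了CMDP中的显式约束
            break
        ret += r
    return ret

safe_policy = lambda s: s < 0          # 在原点附近来回,始终安全
reckless    = lambda s: True           # 一直向右走,很快越界
print(rollout_et_mdp(safe_policy), rollout_et_mdp(reckless))
```

由于违规策略的回合被提前截断,其累计回报天然低于安全策略,无约束的回报最大化因此偏向安全行为;这正是"ET-MDP与对应CMDP具有相同最优值函数"这一构造的直觉。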

【11】 REX: Revisiting Budgeted Training with an Improved Schedule 标题:REX:以改进的学习率调度重新审视预算受限训练

作者:John Chen,Cameron Wolfe,Anastasios Kyrillidis 机构:Rice University, Houston, Texas, USA 链接:https://arxiv.org/abs/2107.04197 摘要:深度学习实践者通常在有限的计算与资金预算下工作。因此,设计在任何预算下都表现良好的优化算法至关重要。线性学习率调度被认为是最好的预算感知调度,因为它在低预算情形下优于大多数其他调度。另一方面,众所周知,一些学习率调度(如30-60-90阶梯调度)在模型可以训练很多轮(epoch)时能够取得高性能。然而,人们往往无法预先知道预算是大是小,因此学习率调度的最优选择只能视具体情况而定。在本文中,我们将学习率调度的选择问题表述为两部分的组合:(i)选择一个轮廓(profile,即刻画学习率调度的连续函数);(ii)选择一个采样率(即从该轮廓更新/采样学习率的频率)。我们提出了一种新的轮廓与采样率组合,称为反射指数(Reflected Exponential,REX)调度,并使用SGD和Adam优化器在七种不同的实验设置下对其进行了评估。在低预算情形下,REX优于线性调度;在高预算和低预算情形下,REX均与几种最先进的学习率调度(线性、阶梯、指数、余弦、平台期阶梯衰减和OneCycle)相当或更优。此外,REX不需要额外的计算、存储或超参数。 摘要:Deep learning practitioners often operate on a computational and monetary budget. Thus, it is critical to design optimization algorithms that perform well under any budget. The linear learning rate schedule is considered the best budget-aware schedule, as it outperforms most other schedules in the low budget regime. On the other hand, learning rate schedules -- such as the 30-60-90 step schedule -- are known to achieve high performance when the model can be trained for many epochs. Yet, it is often not known a priori whether one's budget will be large or small; thus, the optimal choice of learning rate schedule is made on a case-by-case basis. In this paper, we frame the learning rate schedule selection problem as a combination of (i) selecting a profile (i.e., the continuous function that models the learning rate schedule), and (ii) choosing a sampling rate (i.e., how frequently the learning rate is updated/sampled from this profile). We propose a novel profile and sampling rate combination called the Reflected Exponential (REX) schedule, which we evaluate across seven different experimental settings with both SGD and Adam optimizers. 
REX outperforms the linear schedule in the low budget regime, while matching or exceeding the performance of several state-of-the-art learning rate schedules (linear, step, exponential, cosine, step decay on plateau, and OneCycle) in both high and low budget regimes. Furthermore, REX requires no added computation, storage, or hyperparameters.
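下面给出"轮廓+采样率"这一表述的代码示意。注意:其中REX轮廓的闭式表达是根据论文描述整理的假设形式,准确定义请以原文为准;线性调度部分则是标准做法。

```python
def rex_lr(t, T, lr0=0.1):
    """反射指数(REX)轮廓,每个迭代步都采样一次。
    注意:此闭式形式是依据论文描述作出的假设,确切公式以原文为准。"""
    z = 1.0 - t / T            # 剩余训练进度
    return lr0 * z / (0.5 + 0.5 * z)

def linear_lr(t, T, lr0=0.1):
    """作为对照的线性调度。"""
    return lr0 * (1.0 - t / T)

T = 100
schedule = [rex_lr(t, T) for t in range(T + 1)]
print(schedule[0], schedule[-1])           # 从 lr0 开始,衰减到 0
# REX 在训练中段保持高于线性调度的学习率:
print(all(rex_lr(t, T) >= linear_lr(t, T) for t in range(T + 1)))
```

这个分解说明了摘要中的两个自由度:换掉 `rex_lr` 的函数体即更换轮廓;改变调用 `rex_lr` 的频率(每步、每轮等)即更换采样率。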

【12】 EasyCom: An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments 标题:EasyCom:一个支持嘈杂环境中轻松交流算法的增强现实数据集

作者:Jacob Donley,Vladimir Tourbabin,Jung-Suk Lee,Mark Broyles,Hao Jiang,Jie Shen,Maja Pantic,Vamsi Krishna Ithapu,Ravish Mehra 机构:Facebook Reality Labs Research, USA; Facebook AI Applied Research, UK 备注:Dataset is available at: this https URL 链接:https://arxiv.org/abs/2107.04174 摘要:增强现实(AR)作为一个平台,有望帮助缓解鸡尾酒会效应。未来的AR头戴设备可能会利用来自多种不同模态的传感器阵列的信息。在波束成形和语音增强等任务上训练和测试信号处理与机器学习算法,需要高质量、具有代表性的数据。据作者所知,截至发表之时,尚无公开数据集包含在嘈杂环境中带有动态移动和对话的、同步的以自我为中心的多通道音频和视频。在这项工作中,我们描述、评估并发布了一个包含超过5小时多模态数据的数据集,可用于训练和测试旨在改善AR眼镜佩戴者对话体验的算法。我们给出了一种基线方法在语音可懂度、质量和信噪比改善方面的结果,并显示所有测试指标均有改进。我们发布的数据集包含AR眼镜以自我为中心的多通道麦克风阵列音频、宽视场RGB视频、语音源姿态、头戴式麦克风音频、标注的语音活动、语音转录、头部边界框、语音目标和声源识别标签。我们创建并发布这个数据集,以促进针对鸡尾酒会问题的多模态AR解决方案研究。 摘要:Augmented Reality (AR) as a platform has the potential to facilitate the reduction of the cocktail party effect. Future AR headsets could potentially leverage information from an array of sensors spanning many different modalities. Training and testing signal processing and machine learning algorithms on tasks such as beam-forming and speech enhancement require high quality representative data. To the best of the author's knowledge, as of publication there are no available datasets that contain synchronized egocentric multi-channel audio and video with dynamic movement and conversations in a noisy environment. In this work, we describe, evaluate and release a dataset that contains over 5 hours of multi-modal data useful for training and testing algorithms for the application of improving conversations for an AR glasses wearer. We provide speech intelligibility, quality and signal-to-noise ratio improvement results for a baseline method and show improvements across all tested metrics. The dataset we are releasing contains AR glasses egocentric multi-channel microphone array audio, wide field-of-view RGB video, speech source pose, headset microphone audio, annotated voice activity, speech transcriptions, head bounding boxes, target of speech and source identification labels. 
We have created and are releasing this dataset to facilitate research in multi-modal AR solutions to the cocktail party problem.

【13】 Towards Robust Active Feature Acquisition 标题:面向鲁棒主动特征获取

作者:Yang Li,Siyuan Shan,Qin Liu,Junier B. Oliva 机构:Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 链接:https://arxiv.org/abs/2107.04163 摘要:真正的智能系统需要在数据不完整和不确定的情况下做出关键决策。主动特征获取(AFA)是朝着这个目标迈出的一步:特征被顺序地获取以改进预测。然而,现有的AFA模型都只处理一小部分候选特征,难以扩展到大的特征空间。此外,它们并不了解自己能够可信预测的有效域,因此容易受到分布外(OOD)输入的影响。为了弥补这些不足,使AFA模型更接近实际应用,我们提出了若干改进现有AFA方法的技术。我们的框架借助分层获取策略可以轻松处理大量特征,并在面向部分观测数据的OOD检测器的帮助下,对OOD输入更加鲁棒。大量实验证明了我们的框架相对于强基线的有效性。 摘要:Truly intelligent systems are expected to make critical decisions with incomplete and uncertain data. Active feature acquisition (AFA), where features are sequentially acquired to improve the prediction, is a step towards this goal. However, current AFA models all deal with a small set of candidate features and have difficulty scaling to a large feature space. Moreover, they are ignorant about the valid domains where they can predict confidently, thus they can be vulnerable to out-of-distribution (OOD) inputs. In order to remedy these deficiencies and bring AFA models closer to practical use, we propose several techniques to advance the current AFA approaches. Our framework can easily handle a large number of features using a hierarchical acquisition policy and is more robust to OOD inputs with the help of an OOD detector for partially observed data. Extensive experiments demonstrate the efficacy of our framework over strong baselines.

【14】 Even Faster SNN Simulation with Lazy Event-driven Plasticity and Shared Atomics 标题:基于惰性事件驱动可塑性与共享原子操作的更快SNN模拟

作者:Dennis Bautembach,Iason Oikonomidis,Antonis Argyros 机构:FORTH - ICS & CSD - UOC 备注:Submitted to IEEE-HPEC 2021 链接:https://arxiv.org/abs/2107.04092 摘要:我们提出了两种新的优化方法来加速基于时钟的脉冲神经网络(SNN)模拟器。第一项优化针对脉冲时间依赖可塑性(STDP):它结合了惰性可塑性与事件驱动可塑性,并使用位域(bitfield)和整数内建函数(intrinsics)高效计算突触前和突触后脉冲。与单纯的事件驱动可塑性相比,它提供了更高的带宽,并比最接近的竞争对手实现了1.5倍-2倍的加速。第二项优化针对脉冲传递(spike delivery):我们对图表示进行划分,使任意时刻需要更新的神经元数量有界,从而可以在共享内存而非全局内存中执行更新。这比最接近的竞争对手快2-2.5倍。这两项优化是我们最先进的SNN模拟器"Spice"中STDP与脉冲传递多年迭代演进的最终阶段。所提出的优化并不局限于我们的图表示或流水线,而是适用于多种模拟器设计。我们在三个成熟的模型上评估性能,并与其他三个最先进的模拟器进行比较。 摘要:We present two novel optimizations that accelerate clock-based spiking neural network (SNN) simulators. The first one targets spike timing dependent plasticity (STDP). It combines lazy- with event-driven plasticity and efficiently facilitates the computation of pre- and post-synaptic spikes using bitfields and integer intrinsics. It offers higher bandwidth than event-driven plasticity alone and achieves a 1.5x-2x speedup over our closest competitor. The second optimization targets spike delivery. We partition our graph representation in a way that bounds the number of neurons that need be updated at any given time which allows us to perform said update in shared memory instead of global memory. This is 2x-2.5x faster than our closest competitor. Both optimizations represent the final evolutionary stages of years of iteration on STDP and spike delivery inside "Spice" (/spaIk/), our state of the art SNN simulator. The proposed optimizations are not exclusive to our graph representation or pipeline but are applicable to a multitude of simulator designs. We evaluate our performance on three well-established models and compare ourselves against three other state of the art simulators.
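下面用纯Python示意"位域+位计数"记录脉冲历史的思路(GPU上对应 __popc 等整数内建函数;此处仅为概念演示,并非Spice的实际实现):每个神经元最近的脉冲历史被打包进一个32位整数,于是"最近k步是否/几次发放"只需一次掩码加位计数。

```python
# 每个神经元最近的脉冲历史打包进一个32位整数,
# 查询"最近k步发放了几次"就是 掩码 + 位计数(popcount)。
MASK32 = 0xFFFFFFFF

def push_spike(history, spiked):
    """移入最新一个时间步(第0位 = 当前步)。"""
    return ((history << 1) | int(spiked)) & MASK32

def spikes_in_last(history, k):
    """对最近 k 位做位计数(GPU上可用 __popc 等整数内建函数,
    纯Python中用 bin(...).count('1') 演示)。"""
    return bin(history & ((1 << k) - 1)).count("1")

h = 0
for s in [1, 0, 1, 1, 0]:       # 脉冲序列,最早的在前
    h = push_spike(h, s)
print(bin(h), spikes_in_last(h, 3), spikes_in_last(h, 5))
```

相比为每个脉冲事件单独存储时间戳,这种表示让STDP所需的"突触前/后最近是否发放"查询变成常数时间的整数运算,这正是摘要中带宽优势的来源。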

【15】 Accelerating Spherical k-Means 标题:加速球面k-均值算法

作者:Erich Schubert,Andreas Lang,Gloria Feher 机构:TU Dortmund University, Dortmund, Germany 链接:https://arxiv.org/abs/2107.04074 摘要:球面k-均值是一种广泛应用于稀疏高维数据(如文档向量)的聚类算法。虽然针对原始k-means算法已有若干改进和加速,但并非所有方法都能轻松迁移到球面变体:许多加速技术,如Elkan和Hamerly的算法,都依赖于欧氏距离的三角不等式。然而,为了提高计算效率,球面k-均值使用余弦相似度而非距离。在本文中,我们将Elkan和Hamerly加速引入球面k-均值算法,直接基于余弦而非欧氏距离进行计算,从而获得可观的加速,并在真实数据上评估这些球面加速方法。 摘要:Spherical k-means is a widely used clustering algorithm for sparse and high-dimensional data such as document vectors. While several improvements and accelerations have been introduced for the original k-means algorithm, not all easily translate to the spherical variant: Many acceleration techniques, such as the algorithms of Elkan and Hamerly, rely on the triangle inequality of Euclidean distances. However, spherical k-means uses Cosine similarities instead of distances for computational efficiency. In this paper, we incorporate the Elkan and Hamerly accelerations to the spherical k-means algorithm working directly with the Cosines instead of Euclidean distances to obtain a substantial speedup and evaluate these spherical accelerations on real data.
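下面是球面k-均值的一个极简NumPy实现示意(初始化方式与演示数据均为假设,且未包含论文引入的Elkan/Hamerly加速):行向量先做L2归一化,分配步用余弦相似度(单位球面上的点积),中心更新后重新投影回球面。

```python
import numpy as np

def spherical_kmeans(X, k, init_idx, iters=20):
    """极简球面k-均值:X 的各行被L2归一化,分配使用余弦相似度
    (即单位球面上的点积),簇中心为重新归一化的均值。"""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    C = X[init_idx].copy()                      # 演示用的确定性初始化
    for _ in range(iters):
        labels = np.argmax(X @ C.T, axis=1)     # 余弦最大者获胜
        for j in range(k):
            members = X[labels == j]
            if len(members):
                m = members.sum(axis=0)
                C[j] = m / np.linalg.norm(m)    # 投影回单位球面
    return labels, C

rng = np.random.default_rng(1)
# 两个角向分离明显的点簇(假设性演示数据)
X = np.vstack([rng.normal(loc=[5, 0], scale=0.5, size=(20, 2)),
               rng.normal(loc=[0, 5], scale=0.5, size=(20, 2))])
labels, C = spherical_kmeans(X, k=2, init_idx=[0, 20])
print(sorted(set(labels.tolist())))
```

注意分配步只需一次矩阵乘法 `X @ C.T`,这正是论文关注的瓶颈:Elkan/Hamerly式的上下界剪枝在此处可以跳过大部分相似度计算。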

【16】 A Triangle Inequality for Cosine Similarity 标题:余弦相似度的一个三角不等式

作者:Erich Schubert 机构:TU Dortmund University, Dortmund, Germany 链接:https://arxiv.org/abs/2107.04071 摘要:相似性搜索是许多数据分析技术的一个基本问题。许多高效的搜索技术依赖于度量的三角不等式,它允许根据距离的传递性界剪枝部分搜索空间。最近,余弦相似度已经成为标准欧氏度量的一种流行替代,特别是在文本数据和神经网络嵌入的场景中。遗憾的是,余弦相似度不是一个度量,不满足标准的三角不等式;因此,许多面向余弦的搜索技术依赖于局部敏感哈希等近似技术。本文推导了一个余弦相似度的三角不等式,该不等式适用于许多标准搜索结构(如VP树、覆盖树和M树)上的高效相似性搜索;我们证明了这个界是紧的,并讨论了它的快速近似。我们希望这能激发关于余弦相似度精确搜索加速的新研究,并可能延伸到现有距离度量工作之外的其他相似性度量。 摘要:Similarity search is a fundamental problem for many data analysis techniques. Many efficient search techniques rely on the triangle inequality of metrics, which allows pruning parts of the search space based on transitive bounds on distances. Recently, Cosine similarity has become a popular alternative choice to the standard Euclidean metric, in particular in the context of textual data and neural network embeddings. Unfortunately, Cosine similarity is not metric and does not satisfy the standard triangle inequality. Instead, many search techniques for Cosine rely on approximation techniques such as locality sensitive hashing. In this paper, we derive a triangle inequality for Cosine similarity that is suitable for efficient similarity search with many standard search structures (such as the VP-tree, Cover-tree, and M-tree); show that this bound is tight and discuss fast approximations for it. We hope that this spurs new research on accelerating exact similarity search for cosine similarity, and possible other similarity measures beyond the existing work for distance metrics.
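角距离 d(a,b) = arccos(sim(a,b)) 在单位球面上满足三角不等式,由此可以得到一个余弦相似度的传递性下界:sim(a,c) >= cos(arccos sim(a,b) + arccos sim(b,c))。这一下界形式是标准的球面几何推导,论文中的确切界与其快速近似请以原文为准。一个数值验证示意:

```python
import math
import random

random.seed(0)

def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def sim_lower_bound(sim_ab, sim_bc):
    """若 alpha = arccos sim(a,b)、beta = arccos sim(b,c),
    则角距离的三角不等式给出 sim(a,c) >= cos(alpha + beta)(角度和截断到 pi)。"""
    angle = (math.acos(max(-1.0, min(1.0, sim_ab)))
             + math.acos(max(-1.0, min(1.0, sim_bc))))
    return math.cos(min(angle, math.pi))

# 用随机高斯向量做穷举式检验
ok = True
for _ in range(1000):
    a, b, c = ([random.gauss(0, 1) for _ in range(5)] for _ in range(3))
    ok &= cos_sim(a, c) >= sim_lower_bound(cos_sim(a, b), cos_sim(b, c)) - 1e-9
print(ok)
```

在VP树等基于枢轴的索引中,这类下界允许在只知道查询点与枢轴、枢轴与数据点的相似度时,剪掉不可能进入结果集的子树。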

【17】 Generalization of the Change of Variables Formula with Applications to Residual Flows 标题:变量替换公式的推广及其在残差流中的应用

作者:Niklas Koenen,Marvin N. Wright,Peter Maaß,Jens Behrmann 机构: University of Bremen, Germany, Leibniz Institute for Prevention Research and Epidemiology – BIPS 链接:https://arxiv.org/abs/2107.04346 摘要:标准化流程利用变量替换公式(CVF)定义灵活的密度模型。然而,CVF中光滑变换(微分同胚)的要求对这些模型的构造提出了重大挑战。为了扩大流的设计空间,我们引入$\mathcal{L}$-微分同胚作为广义变换,它可以在零Lebesgue测度集上违反这些要求。这种松弛允许使用非光滑激活函数,例如ReLU。最后,我们将所得结果应用于平面、径向和收缩残差流。 摘要:Normalizing flows leverage the Change of Variables Formula (CVF) to define flexible density models. Yet, the requirement of smooth transformations (diffeomorphisms) in the CVF poses a significant challenge in the construction of these models. To enlarge the design space of flows, we introduce $\mathcal{L}$-diffeomorphisms as generalized transformations which may violate these requirements on zero Lebesgue-measure sets. This relaxation allows e.g. the use of non-smooth activation functions such as ReLU. Finally, we apply the obtained results to planar, radial, and contractive residual flows.
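作为背景,标准化流所依赖的变量替换公式以及收缩残差流的对数行列式展开可写作如下(后者是残差流文献中的已知结果,此处仅作参考性说明):

```latex
% 变量替换公式(CVF):对(分段)光滑双射 f 与基分布 p_Z,
p_X(x) = p_Z\bigl(f(x)\bigr)\,
  \left| \det \frac{\partial f(x)}{\partial x} \right|.
% 对残差流 f(x) = x + g(x),当 \mathrm{Lip}(g) < 1 时,
% 对数行列式可展开为幂级数(J_g 为 g 的雅可比矩阵):
\log \left| \det \frac{\partial f(x)}{\partial x} \right|
  = \operatorname{tr} \log \bigl( I + J_g(x) \bigr)
  = \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k}
      \operatorname{tr} \bigl( J_g(x)^{k} \bigr).
```

本文的贡献正是放宽第一行公式对 f 处处光滑的要求:允许在零Lebesgue测度集上不可微,从而使ReLU这类非光滑激活也可用于构造流模型。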
