cs.LG: 74 papers today
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (3 papers)
【1】 Trans4E: Link Prediction on Scholarly Knowledge Graphs
Authors: Mojtaba Nayyeri, Gokce Muge Cil, Sahar Vahdati, Francesco Osborne, Mahfuzur Rahman, Simone Angioni, Angelo Salatino, Diego Reforgiato Recupero, Nadezhda Vassilyeva, Enrico Motta, Jens Lehmann
Affiliations: SDA Research Group, University of Bonn (Germany); Institute for Applied Informatics (InfAI), Fraunhofer IAIS, Dresden (Germany); Knowledge Media Institute, The Open University, Milton Keynes (UK)
Link: https://arxiv.org/abs/2107.03297
Abstract: The incompleteness of Knowledge Graphs (KGs) is a crucial issue affecting the quality of AI-based services. In the scholarly domain, KGs describing research publications typically lack important information, hindering our ability to analyse and predict research dynamics. In recent years, link prediction approaches based on Knowledge Graph Embedding models have become the standard remedy for this issue. In this work, we present Trans4E, a novel embedding model that is particularly fit for KGs which include N-to-M relations with $N \gg M$. This is typical for KGs that categorize a large number of entities (e.g., research articles, patents, persons) according to a relatively small set of categories. Trans4E was applied on two large-scale knowledge graphs, the Academia/Industry DynAmics (AIDA) and Microsoft Academic Graph (MAG), for completing the information about Fields of Study (e.g., 'neural networks', 'machine learning', 'artificial intelligence') and affiliation types (e.g., 'education', 'company', 'government'), improving the scope and accuracy of the resulting data. We evaluated our approach against alternative solutions on AIDA, MAG, and four other benchmarks (FB15k, FB15k-237, WN18, and WN18RR). Trans4E outperforms the other models when using low embedding dimensions and obtains competitive results in high dimensions.
【2】 Graphing else matters: exploiting aspect opinions and ratings in explainable graph-based recommendations
Authors: Iván Cantador, Andrés Carvallo, Fernando Diez, Denis Parra
Affiliations: Universidad Autónoma de Madrid; Pontificia Universidad Católica de Chile
Link: https://arxiv.org/abs/2107.03226
Abstract: The success of neural network embeddings has entailed a renewed interest in using knowledge graphs for a wide variety of machine learning and information retrieval tasks. In particular, current recommendation methods based on graph embeddings have shown state-of-the-art performance. These methods commonly encode latent rating patterns and content features. Different from previous work, in this paper, we propose to exploit embeddings extracted from graphs that combine information from ratings and aspect-based opinions expressed in textual reviews. We then adapt and evaluate state-of-the-art graph embedding techniques over graphs generated from Amazon and Yelp reviews on six domains, outperforming baseline recommenders. Our approach has the advantage of providing explanations that leverage the aspect-based opinions given by users about recommended items. Furthermore, we also provide examples of the applicability of recommendations utilizing aspect opinions as explanations in a visualization dashboard, which allows obtaining information about the most and least liked aspects of similar users, obtained from the embeddings of an input graph.
【3】 Joint Embedding of Structural and Functional Brain Networks with Graph Neural Networks for Mental Illness Diagnosis
Authors: Yanqiao Zhu, Hejie Cui, Lifang He, Lichao Sun, Carl Yang
Affiliations: Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; Department of Computer Science, Emory University; Department of Computer Science and Engineering, Lehigh University
Note: Accepted to ICML 2021 Workshop on Computational Approaches to Mental Health
Link: https://arxiv.org/abs/2107.03220
Abstract: Multimodal brain networks characterize complex connectivities among different brain regions from both structural and functional aspects and provide a new means for mental disease analysis. Recently, Graph Neural Networks (GNNs) have become a de facto model for analyzing graph-structured data. However, how to employ GNNs to extract effective representations from brain networks in multiple modalities remains largely unexplored. Moreover, as brain networks provide no initial node features, how to design informative node attributes and leverage edge weights for GNNs to learn remains unsolved. To this end, we develop a novel multiview GNN for multimodal brain networks. In particular, we regard each modality as a view of the brain network and employ contrastive learning for multimodal fusion. Then, we propose a GNN model which takes advantage of the message passing scheme by propagating messages based on degree statistics and brain region connectivities. Extensive experiments on two real-world disease datasets (HIV and Bipolar) demonstrate the effectiveness of our proposed method over state-of-the-art baselines.
GAN | adversarial | attack | generation related (12 papers)
【1】 Incorporating Label Uncertainty in Understanding Adversarial Robustness
Authors: Xiao Zhang, David Evans
Affiliations: Department of Computer Science, University of Virginia
Note: 20 pages, 6 figures, 1 table
Link: https://arxiv.org/abs/2107.03250
Abstract: A fundamental question in adversarial machine learning is whether a robust classifier exists for a given task. A line of research has made progress towards this goal by studying concentration of measure, but without considering data labels. We argue that standard concentration fails to fully characterize the intrinsic robustness of a classification problem, since it ignores data labels, which are essential to any classification task. Building on a novel definition of label uncertainty, we empirically demonstrate that error regions induced by state-of-the-art models tend to have much higher label uncertainty compared with randomly selected subsets. This observation motivates us to adapt a concentration estimation algorithm to account for label uncertainty, resulting in more accurate intrinsic robustness measures for benchmark image classification problems. We further provide empirical evidence showing that adding an abstain option for classifiers based on label uncertainty can help improve both the clean and robust accuracies of models.
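The abstain mechanism mentioned at the end of the abstract can be illustrated with a minimal sketch. Note that the paper's label-uncertainty measure is derived from human annotations; here softmax entropy stands in as an assumed proxy, and the threshold value is likewise an assumption:

```python
import numpy as np

def predict_with_abstain(probs, threshold=1.0):
    """Predict the argmax class, abstaining (label -1) whenever the
    softmax entropy exceeds `threshold` (an assumed uncertainty proxy)."""
    probs = np.asarray(probs, dtype=float)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    preds = probs.argmax(axis=1)
    preds[entropy > threshold] = -1  # abstain on high-uncertainty inputs
    return preds

# A confident prediction vs. a near-uniform (uncertain) one over 3 classes.
probs = [[0.98, 0.01, 0.01],   # entropy ~0.11 -> predict class 0
         [0.34, 0.33, 0.33]]   # entropy ~1.10 -> abstain
print(predict_with_abstain(probs))  # [ 0 -1]
```

Tuning the threshold trades coverage against accuracy: a lower threshold abstains more often but keeps only the most confident (and typically more accurate) predictions.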
【2】 On Training Instance Selection for Few-Shot Neural Text Generation
Authors: Ernie Chang, Xiaoyu Shen, Hui-Syuan Yeh, Vera Demberg
Affiliations: Dept. of Language Science and Technology, Saarland University
Note: Accepted at ACL 2021
Link: https://arxiv.org/abs/2107.03176
Abstract: Large-scale pretrained language models have led to dramatic improvements in text generation. Impressive performance can be achieved by finetuning only on a small number of instances (few-shot setting). Nonetheless, almost all previous work simply applies random sampling to select the few-shot training instances. Little to no attention has been paid to the selection strategies and how they would affect model performance. In this work, we present a study on training instance selection in few-shot neural text generation. The selection decision is made based only on the unlabeled data, so as to identify the most worthwhile data points that should be annotated under some budget of labeling cost. Based on the intuition that the few-shot training instances should be diverse and representative of the entire data distribution, we propose a simple selection strategy with K-means clustering. We show that even with the naive clustering-based approach, the generation models consistently outperform random sampling on three text generation tasks: data-to-text generation, document summarization, and question generation. We hope that this work will call for more attention on this largely unexplored area.
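The clustering-based selection strategy can be sketched roughly as follows, using a plain NumPy K-means over instance embeddings. The embedding space, the number of clusters, and the choice of the instance nearest to each centroid are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def select_few_shot(embeddings, k, iters=20, seed=0):
    """Cluster unlabeled instance embeddings with K-means and return the
    index of the instance closest to each centroid: a diverse, representative
    subset to annotate (a sketch of a clustering-based selection idea)."""
    X = np.asarray(embeddings, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):  # Lloyd's algorithm
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    return sorted(int(dists[:, j].argmin()) for j in range(k))

# Two well-separated blobs: one instance should be picked from each.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
picked = select_few_shot(X, k=2)
print(picked)  # [0, 3]
```

The annotation budget maps directly to `k`: with a budget of k labels, the k medoid-like picks cover the data distribution better than k random draws.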
【3】 Controlled Caption Generation for Images Through Adversarial Attacks
Authors: Nayyer Aafaq, Naveed Akhtar, Wei Liu, Mubarak Shah, Ajmal Mian
Affiliations: University of Western Australia, Stirling Highway, WA; University of Central Florida, Central Florida Blvd, Orlando, Florida, USA
Link: https://arxiv.org/abs/2107.03050
Abstract: Deep learning is found to be vulnerable to adversarial examples. However, its adversarial susceptibility in image caption generation is under-explored. We study adversarial examples for vision-and-language models, which typically adopt an encoder-decoder framework consisting of two major components: a Convolutional Neural Network (CNN) for image feature extraction and a Recurrent Neural Network (RNN) for caption generation. In particular, we investigate attacks on the visual encoder's hidden layer that is fed to the subsequent recurrent network. The existing methods either attack the classification layer of the visual encoder or back-propagate the gradients from the language model. In contrast, we propose a GAN-based algorithm for crafting adversarial examples for neural image captioning that mimics the internal representation of the CNN, such that the resulting deep features of the input image enable a controlled incorrect caption generation through the recurrent network. Our contribution provides new insights for understanding adversarial attacks on vision systems with a language component. The proposed method employs two strategies for a comprehensive evaluation. The first examines whether a neural image captioning system can be misled into outputting targeted image captions. The second analyzes the possibility of forcing keywords into the predicted captions. Experiments show that our algorithm can craft effective adversarial images based on the CNN hidden layers to fool the captioning framework. Moreover, we find the proposed attack to be highly transferable. Our work leads to new robustness implications for neural image captioning.
【4】 Keiki: Towards Realistic Danmaku Generation via Sequential GANs
Authors: Ziqi Wang, Jialin Liu, Georgios N. Yannakakis
Affiliations: Research Institute of Trustworthy Autonomous System, Southern University of Science and Technology; Guangdong Provincial Key Laboratory of Brain-inspired Intelligent Computation, Department of Computer Science and Engineering, Shenzhen, China
Note: Accepted by the 2021 IEEE Conference on Games
Link: https://arxiv.org/abs/2107.02991
Abstract: Search-based procedural content generation methods have recently been introduced for the autonomous creation of bullet hell games. Search-based methods, however, can hardly model patterns of danmakus -- the bullet hell shooting entities -- explicitly, and the resulting levels often look unrealistic. In this paper, we present a novel bullet hell game platform named Keiki, which allows the representation of danmakus as a parametric sequence which, in turn, can model the sequential behaviours of danmakus. We employ three types of generative adversarial networks (GANs) and test Keiki across three metrics designed to quantify the quality of the generated danmakus. The time-series GAN and periodic spatial GAN show different yet competitive performance in terms of the evaluation metrics adopted, their deviation from human-designed danmakus, and the diversity of generated danmakus. The preliminary experimental studies presented here showcase the potential of time-series GANs for sequential content generation in games.
【5】 Deep Extrapolation for Attribute-Enhanced Generation
Authors: Alvin Chan, Ali Madani, Ben Krause, Nikhil Naik
Affiliations: Salesforce Research; NTU
Link: https://arxiv.org/abs/2107.02968
Abstract: Attribute extrapolation in sample generation is challenging for deep neural networks operating beyond the training distribution. We formulate a new task for extrapolation in sequence generation, focusing on natural language and proteins, and propose GENhance, a generative framework that enhances attributes through a learned latent space. Trained on movie reviews and a computed protein stability dataset, GENhance can generate strongly-positive text reviews and highly stable protein sequences without being exposed to similar data during training. We release our benchmark tasks and models to contribute to the study of generative modeling extrapolation and data-driven design in biology and chemistry.
【6】 Immunization of Pruning Attack in DNN Watermarking Using Constant Weight Code
Authors: Minoru Kuribayashi, Tatsuya Yasui, Asad Malik, Nobuo Funabiki
Link: https://arxiv.org/abs/2107.02961
Abstract: To ensure protection of the intellectual property rights of DNN models, watermarking techniques have been investigated to insert side-information into the models without seriously degrading the performance of the original task. One of the threats to DNN watermarking is the pruning attack, in which less important neurons in the model are pruned to make it faster and more compact, as well as to remove the watermark. In this study, we investigate a channel coding approach to resist the pruning attack. As the channel model is completely different from conventional models like digital images, it has been an open problem what kind of encoding method is suitable for DNN watermarking. A novel encoding approach using constant weight codes to immunize against the effects of pruning attacks is presented. To the best of our knowledge, this is the first study that introduces an encoding technique for DNN watermarking to make it robust against pruning attacks.
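A toy illustration of constant-weight codes, the coding primitive the abstract names: every codeword has exactly w ones, so zeroing (pruning) any embedded one-bit breaks the weight invariant and is detectable. The enumerative codebook below is an assumed construction for illustration; the paper's actual code design and embedding mechanism are not specified here:

```python
from itertools import combinations

def constant_weight_codebook(n, w):
    """All length-n binary codewords with exactly w ones (weight w)."""
    words = []
    for ones in combinations(range(n), w):
        word = [0] * n
        for i in ones:
            word[i] = 1
        words.append(tuple(word))
    return words

book = constant_weight_codebook(n=5, w=2)
print(len(book))           # C(5,2) = 10 codewords -> can encode 10 messages
codeword = book[3]         # encode message index 3 as a codeword
assert sum(codeword) == 2  # weight invariant holds for every codeword

# Pruning (zeroing) an embedded 1-bit is detectable: the weight drops below w.
pruned = list(codeword)
pruned[codeword.index(1)] = 0
print(sum(pruned) == 2)    # False -> pruning detected
```

The constant weight also keeps the watermark's footprint in the model uniform across codewords, which is part of what makes the weight check a usable integrity signal.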
【7】 Bi-Level Poisoning Attack Model and Countermeasure for Appliance Consumption Data of Smart Homes
Authors: Mustain Billah, Adnan Anwar, Ziaur Rahman, Syed Md. Galib
Affiliations: Department of CSE, Jashore University of Science and Technology (JUST), Jashore, Bangladesh; Centre for Cyber Security Research and Innovation, Deakin University, Australia
Link: https://arxiv.org/abs/2107.02897
Abstract: Accurate building energy prediction is useful in various applications, from building energy automation and management to optimal storage control. However, vulnerabilities should be considered when designing building energy prediction models, as intelligent attackers can deliberately influence the model performance using sophisticated attack models. These may consequently degrade the prediction accuracy, which may affect the efficiency and performance of the building energy management systems. In this paper, we investigate the impact of bi-level poisoning attacks on regression models of energy usage obtained from household appliances. Furthermore, an effective countermeasure against poisoning attacks on the prediction model is proposed. Attacks and defenses are evaluated on a benchmark dataset. Experimental results show that an intelligent cyber-attacker can poison the prediction model to manipulate the decision. However, our proposed solution successfully ensures an effective defense against such poisoning attacks compared to other benchmark techniques.
【8】 Bio-Inspired Adversarial Attack Against Deep Neural Networks
Authors: Bowei Xi, Yujie Chen, Fan Fei, Zhan Tu, Xinyan Deng
Affiliations: Department of Statistics, Purdue University; University of Chicago; School of Mechanical Engineering
Link: https://arxiv.org/abs/2107.02895
Abstract: The paper develops a new adversarial attack against deep neural networks (DNNs), based on applying bio-inspired design to moving physical objects. To the best of our knowledge, this is the first work to introduce physical attacks with a moving object. Instead of following the dominant attack strategy in the existing literature, i.e., introducing minor perturbations to a digital input or a stationary physical object, we show two new successful attack strategies in this paper. We show that by superimposing several patterns onto one physical object, a DNN becomes confused and picks one of the patterns to assign a class label. Our experiment with three flapping-wing robots demonstrates the possibility of developing an adversarial camouflage to cause a targeted mistake by the DNN. We also show that certain motions can reduce the dependency among consecutive frames in a video and make an object detector "blind", i.e., unable to detect an object present in the video. Hence, in a successful physical attack against a DNN, targeted motion against the system should also be considered.
【9】 Adversarial Machine Learning for Cybersecurity and Computer Vision: Current Developments and Challenges
Authors: Bowei Xi
Affiliations: Department of Statistics, Purdue University
Link: https://arxiv.org/abs/2107.02894
Abstract: We provide a comprehensive overview of adversarial machine learning, focusing on two application domains, i.e., cybersecurity and computer vision. Research in adversarial machine learning addresses a significant threat to the wide application of machine learning techniques -- they are vulnerable to carefully crafted attacks from malicious adversaries. For example, deep neural networks fail to correctly classify adversarial images, which are generated by adding imperceptible perturbations to clean images. We first discuss three main categories of attacks against machine learning techniques -- poisoning attacks, evasion attacks, and privacy attacks. Then the corresponding defense approaches are introduced, along with the weaknesses and limitations of the existing defense approaches. We note that adversarial samples in cybersecurity and computer vision are fundamentally different. While adversarial samples in cybersecurity often have different properties/distributions compared with the training data, adversarial images in computer vision are created with minor input perturbations. This further complicates the development of robust learning techniques, because a robust learning technique must withstand different types of attacks.
【10】 RAILS: A Robust Adversarial Immune-inspired Learning System
Authors: Ren Wang, Tianqi Chen, Stephen Lindsly, Cooper Stansbury, Alnawaz Rehemtulla, Indika Rajapakse, Alfred Hero
Affiliations: University of Michigan
Note: arXiv admin note: text overlap with arXiv:2012.10485
Link: https://arxiv.org/abs/2107.02840
Abstract: Adversarial attacks against deep neural networks (DNNs) are continuously evolving, requiring increasingly powerful defense strategies. We develop a novel adversarial defense framework inspired by the adaptive immune system: the Robust Adversarial Immune-inspired Learning System (RAILS). Initializing a population of exemplars that is balanced across classes, RAILS starts from a uniform label distribution that encourages diversity and debiases a potentially corrupted initial condition. RAILS implements an evolutionary optimization process to adjust the label distribution and achieve specificity towards the ground truth. RAILS displays a tradeoff between robustness (diversity) and accuracy (specificity), providing a new immune-inspired perspective on adversarial learning. We empirically validate the benefits of RAILS through several adversarial image classification experiments on the MNIST, SVHN, and CIFAR-10 datasets. For the PGD attack, RAILS is found to improve the robustness over existing methods by at least 5.62%, 12.5%, and 10.32%, respectively, without appreciable loss of standard accuracy.
【11】 A Deep Residual Star Generative Adversarial Network for multi-domain Image Super-Resolution
Authors: Rao Muhammad Umer, Asad Munir, Christian Micheloni
Affiliations: Dept. of Computer Science, University of Udine, Udine, Italy
Note: 5 pages, 6th International Conference on Smart and Sustainable Technologies 2021; arXiv admin note: text overlap with arXiv:2009.03693, arXiv:2005.00953
Link: https://arxiv.org/abs/2107.03145
Abstract: Recently, most state-of-the-art single image super-resolution (SISR) methods have attained impressive performance by using deep convolutional neural networks (DCNNs). The existing SR methods have limited performance due to fixed degradation settings, i.e., usually a bicubic downscaling of the low-resolution (LR) image. However, in real-world settings, the LR degradation process is unknown and can be bicubic LR, bilinear LR, nearest-neighbor LR, or real LR. Therefore, most SR methods are ineffective and inefficient in handling more than one degradation setting within a single network. To handle multiple degradations, i.e., multi-domain image super-resolution, we propose a deep Super-Resolution Residual StarGAN (SR2*GAN), a novel and scalable approach that super-resolves the LR images for multiple LR domains using only a single model. The proposed scheme is trained in a StarGAN-like network topology with a single generator and discriminator network. We demonstrate the effectiveness of our proposed approach in quantitative and qualitative experiments compared to other state-of-the-art methods.
【12】 GAN-based Data Augmentation for Chest X-ray Classification
Authors: Shobhita Sundaram, Neha Hulkund
Affiliations: Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
Note: Spotlight Talk at KDD 2021 - Applied Data Science for Healthcare Workshop
Link: https://arxiv.org/abs/2107.02970
Abstract: A common problem in computer vision -- particularly in medical applications -- is a lack of sufficiently diverse, large sets of training data. These datasets often suffer from severe class imbalance. As a result, networks often overfit and are unable to generalize to novel examples. Generative Adversarial Networks (GANs) offer a novel method of synthetic data augmentation. In this work, we evaluate the use of GAN-based data augmentation to artificially expand the CheXpert dataset of chest radiographs. We compare performance to traditional augmentation and find that GAN-based augmentation leads to higher downstream performance for underrepresented classes. Furthermore, we see that this result is pronounced in low-data regimes. This suggests that GAN-based augmentation is a promising area of research for improving network performance when data collection is prohibitively expensive.
Semi-/weakly-/un-/fully-supervised | uncertainty | active learning (5 papers)
【1】 A Survey of Uncertainty in Deep Neural Networks
Authors: Jakob Gawlikowski, Cedrique Rovile Njieutcheu Tassi, Mohsin Ali, Jongseok Lee, Matthias Humt, Jianxiang Feng, Anna Kruspe, Rudolph Triebel, Peter Jung, Ribana Roscher, Muhammad Shahzad, Wen Yang, Richard Bamler, Xiao Xiang Zhu
Link: https://arxiv.org/abs/2107.03342
Abstract: Due to their increasing spread, confidence in neural network predictions has become more and more important. However, basic neural networks do not deliver certainty estimates and may suffer from over- or under-confidence. Many researchers have been working on understanding and quantifying uncertainty in a neural network's prediction. As a result, different types and sources of uncertainty have been identified, and a variety of approaches to measure and quantify uncertainty in neural networks have been proposed. This work gives a comprehensive overview of uncertainty estimation in neural networks, reviews recent advances in the field, highlights current challenges, and identifies potential research opportunities. It is intended to give anyone interested in uncertainty estimation in neural networks a broad overview and introduction, without presupposing prior knowledge in this field. A comprehensive introduction to the most crucial sources of uncertainty is given, and their separation into reducible model uncertainty and irreducible data uncertainty is presented. The modeling of these uncertainties based on deterministic neural networks, Bayesian neural networks, ensembles of neural networks, and test-time data augmentation approaches is introduced, and different branches of these fields as well as the latest developments are discussed. For practical application, we discuss different measures of uncertainty, approaches for the calibration of neural networks, and give an overview of existing baselines and implementations. Different examples from the wide spectrum of challenges in different fields give an idea of the needs and challenges regarding uncertainties in practical applications. Additionally, the practical limitations of current methods for mission- and safety-critical real-world applications are discussed, and an outlook on the next steps towards a broader usage of such methods is given.
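As one concrete example from the survey's taxonomy, test-time data augmentation estimates predictive uncertainty by running several perturbed copies of an input through the same model and measuring the spread of the predictions. The sketch below uses a hypothetical linear "model" and Gaussian noise as the augmentation; both are assumptions for illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def tta_uncertainty(model, x, n_aug=100, noise=0.1, seed=0):
    """Mean prediction and per-class std. dev. over noisy copies of x.
    A larger spread (std) indicates higher predictive uncertainty."""
    rng = np.random.default_rng(seed)
    preds = np.array([model(x + rng.normal(0.0, noise, size=x.shape))
                      for _ in range(n_aug)])
    return preds.mean(axis=0), preds.std(axis=0)

# Hypothetical 2-class "model": a fixed linear layer followed by softmax.
W = np.array([[2.0, -1.0], [-1.0, 2.0]])
model = lambda x: softmax(W @ x)

mean, std = tta_uncertainty(model, np.array([1.0, 0.0]))
print(mean.round(2), std.max())
```

The same loop structure covers the survey's other sampling-based estimators: replace the noisy input with dropout-enabled forward passes (MC dropout) or with members of an ensemble.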
【2】 SelfCF: A Simple Framework for Self-supervised Collaborative Filtering
Authors: Xin Zhou, Aixin Sun, Yong Liu, Jie Zhang, Chunyan Miao
Affiliations: School of Computer Science and Engineering
Link: https://arxiv.org/abs/2107.03019
Abstract: Collaborative filtering (CF) is widely used to learn an informative latent representation of a user or item from observed interactions. Existing CF-based methods commonly adopt negative sampling to discriminate between different items. That is, observed user-item pairs are treated as positive instances; unobserved pairs are considered negative instances and are sampled under a defined distribution for training. Training with negative sampling on large datasets is computationally expensive. Further, negative items should be carefully sampled under the defined distribution, in order to avoid selecting an observed positive item in the training dataset. Unavoidably, some negative items sampled from the training dataset could be positive in the test set. Recently, self-supervised learning (SSL) has emerged as a powerful tool to learn a model without negative samples. In this paper, we propose a self-supervised collaborative filtering framework (SelfCF) that is specially designed for the recommender scenario with implicit feedback. The main idea of SelfCF is to augment the output embeddings generated by backbone networks, because it is infeasible to augment the raw input of user/item ids. We propose and study three output perturbation techniques that can be applied to different types of backbone networks, including both traditional CF models and graph-based models. By encapsulating two popular recommendation models into the framework, our experiments on three datasets show that the best performance of our framework is comparable to or better than the supervised counterpart. We also show that SelfCF can boost the performance by up to 8.93% on average, compared with another self-supervised framework as the baseline. Source code is available at: https://github.com/enoche/SelfCF.
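The output-perturbation idea can be sketched in a few lines of NumPy: the backbone's output embedding is perturbed twice, and the two views are pulled together with a cosine-similarity loss, with no negative samples involved. Random dropout of embedding dimensions is assumed here as the perturbation; the paper's three concrete perturbation operators may differ:

```python
import numpy as np

def perturb(emb, drop_rate, rng):
    """Randomly zero embedding dimensions (an assumed output perturbation),
    rescaling so the expected magnitude is preserved."""
    mask = rng.random(emb.shape) >= drop_rate
    return emb * mask / (1.0 - drop_rate)

def cosine_loss(a, b):
    """Negative mean cosine similarity between two views (no negatives)."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return -np.mean(np.sum(a * b, axis=-1))

rng = np.random.default_rng(0)
user_emb = rng.normal(size=(4, 8))   # stand-in backbone output for 4 users
v1 = perturb(user_emb, 0.1, rng)
v2 = perturb(user_emb, 0.1, rng)
loss = cosine_loss(v1, v2)
print(-1.0 <= loss <= 0.0)           # views of the same row stay aligned
```

Minimizing this loss encourages the backbone to produce embeddings that are stable under perturbation, without ever sampling a negative user-item pair.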
【3】 Scalable Teacher Forcing Network for Semi-Supervised Large Scale Data Streams
Authors: Mahardhika Pratama, Choiru Za'in, Edwin Lughofer, Eric Pardede, Dwi A. P. Rahayu
Affiliations: School of Computer Science and Engineering, Nanyang Technological University, Singapore; Monash University, Australia; Department of Knowledge-Based Mathematical Systems, Johannes Kepler University, Linz, Austria
Link: https://arxiv.org/abs/2107.02943
Abstract: The large-scale data stream problem refers to a high-speed information flow which cannot be processed in a scalable manner under a traditional computing platform. This problem also imposes an expensive labelling cost, making the deployment of fully supervised algorithms infeasible. On the other hand, the problem of semi-supervised large-scale data streams is little explored in the literature, because most works are designed for traditional single-node computing environments while also being fully supervised approaches. This paper offers the Weakly Supervised Scalable Teacher Forcing Network (WeScatterNet) to cope with the scarcity of labelled samples and large-scale data streams simultaneously. WeScatterNet is crafted under the distributed computing platform of Apache Spark, with a data-free model fusion strategy for model compression after the parallel computing stage. It features an open network structure to address the global and local drift problems, while integrating a data augmentation, annotation and auto-correction ($DA^3$) method for handling partially labelled data streams. The performance of WeScatterNet is numerically evaluated on six large-scale data stream problems with only $25\%$ label proportions. It shows highly competitive performance, even compared with fully supervised learners with $100\%$ label proportions.
【4】 Supervised Bayesian Specification Inference from Demonstrations
Authors: Ankit Shah, Pritish Kamath, Shen Li, Patrick Craven, Kevin Landers, Kevin Oden, Julie Shah
Affiliations: Massachusetts Institute of Technology
Link: https://arxiv.org/abs/2107.02912
Abstract: When observing task demonstrations, human apprentices are able to identify whether a given task is executed correctly long before they gain expertise in actually performing that task. Prior research into learning from demonstrations (LfD) has failed to capture this notion of the acceptability of a task's execution; meanwhile, temporal logics provide a flexible language for expressing task specifications. Inspired by this, we present Bayesian specification inference, a probabilistic model for inferring task specification as a temporal logic formula. We incorporate methods from probabilistic programming to define our priors, along with a domain-independent likelihood function to enable sampling-based inference. We demonstrate the efficacy of our model for inferring specifications, with over 90% similarity observed between the inferred specification and the ground truth, both within a synthetic domain and during a real-world table setting task.
【5】 Logit-based Uncertainty Measure in Classification
Authors: Huiyu Wu, Diego Klabjan
Affiliations: Northwestern University
Link: https://arxiv.org/abs/2107.02845
Abstract: We introduce a new, reliable, and agnostic uncertainty measure for classification tasks called logit uncertainty. It is based on the logit outputs of neural networks. In particular, we show that this new uncertainty measure yields superior performance compared to existing uncertainty measures on different tasks, including out-of-sample detection and finding erroneous predictions. We analyze the theoretical foundations of the measure and explore a relationship with high-density regions. We also demonstrate how to test uncertainty using intermediate outputs in the training of generative adversarial networks. We propose two potential ways to utilize logit-based uncertainty in real-world applications, and show that the uncertainty measure outperforms existing measures.
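A minimal sketch of reading uncertainty directly off logits follows. The paper's actual measure is not specified in the abstract; the margin between the two largest logits used here is only a stand-in illustrating the idea of a logit-based (rather than softmax-probability-based) score.

```python
import numpy as np

def logit_margin_uncertainty(logits):
    # A simple logit-based uncertainty proxy: the negated gap between the two
    # largest logits per sample. Small gap -> ambiguous -> high uncertainty.
    part = np.sort(logits, axis=1)
    return -(part[:, -1] - part[:, -2])

confident = np.array([[9.0, 0.5, -1.0]])   # clear winner: low uncertainty
ambiguous = np.array([[2.1, 2.0, -1.0]])   # two close logits: high uncertainty
u_conf = logit_margin_uncertainty(confident)[0]
u_amb = logit_margin_uncertainty(ambiguous)[0]
assert u_amb > u_conf
```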
Transfer | Zero/Few/One-Shot | Adaptation (5 papers)
【1】 Differentiable Architecture Pruning for Transfer Learning
Authors: Nicolo Colombo, Yang Gao
Affiliations: Department of Computer Science, Royal Holloway University of London, Egham, UK
Comments: 19 pages (main appendix), 7 figures and 1 table, Workshop @ ICML 2021, 24th July 2021
Link: https://arxiv.org/abs/2107.03375
Abstract: We propose a new gradient-based approach for extracting sub-architectures from a given large model. Contrary to existing pruning methods, which are unable to disentangle the network architecture and the corresponding weights, our architecture-pruning scheme produces transferable new structures that can be successfully retrained to solve different tasks. We focus on a transfer-learning setup where architectures can be trained on a large data set but very few data points are available for fine-tuning them on new tasks. We define a new gradient-based algorithm that trains architectures of arbitrarily low complexity independently from the attached weights. Given a search space defined by an existing large neural model, we reformulate the architecture search task as a complexity-penalized subset-selection problem and solve it through a two-temperature relaxation scheme. We provide theoretical convergence guarantees and validate the proposed transfer-learning strategy on real data.
【2】 Enhancing an Intelligent Digital Twin with a Self-organized Reconfiguration Management based on Adaptive Process Models
Authors: Timo Müller, Benjamin Lindemann, Tobias Jung, Nasser Jazdi, Michael Weyrich
Affiliations: Institute of Industrial Automation and Software Engineering, University of Stuttgart, Pfaffenwaldring, Stuttgart, Germany
Comments: 6 pages, 2 figures. Submitted to 54th CIRP Conference on Manufacturing Systems 2021
Link: https://arxiv.org/abs/2107.03324
Abstract: Shorter product life cycles and the increasing individualization of production lead to an increased reconfiguration demand in the domain of industrial automation systems, which will be dominated by cyber-physical production systems in the future. In constantly changing systems, however, not all configuration alternatives of the almost infinite state space are fully understood. Thus, certain configurations can lead to process instability, a reduction in quality or machine failures. Therefore, this paper presents an approach that enhances an intelligent Digital Twin with a self-organized reconfiguration management based on adaptive process models in order to find optimized configurations more comprehensively.
【3】 Distributed adaptive algorithm based on the asymmetric cost of error functions
Authors: Sihai Guan, Qing Cheng, Yong Zhao
Affiliations: College of Electronic and Information, Southwest Minzu University, Chengdu, China; Key Laboratory of Electronic and Information Engineering, State Ethnic Affairs Commission, Chengdu
Link: https://arxiv.org/abs/2107.03067
Abstract: In this paper, a family of novel diffusion adaptive estimation algorithms is proposed from the asymmetric cost function perspective by combining the diffusion strategy with the linear-linear cost (LLC), quadratic-quadratic cost (QQC), and linear-exponential cost (LEC) at all distributed network nodes; the algorithms are named diffusion LLCLMS (DLLCLMS), diffusion QQCLMS (DQQCLMS), and diffusion LECLMS (DLECLMS), respectively. The stability of the mean estimation error and the computational complexity of these three diffusion algorithms are then analyzed theoretically. Finally, several simulation experiments are designed to verify the superiority of the three proposed diffusion algorithms. Experimental simulation results show that the DLLCLMS, DQQCLMS, and DLECLMS algorithms are more robust to the input signal and impulsive noise than the DSELMS, DRVSSLMS, and DLLAD algorithms. In brief, theoretical analysis and experimental results show that the proposed DLLCLMS, DQQCLMS, and DLECLMS algorithms have superior performance when estimating an unknown linear system under changeable impulsive noise environments and different types of input signals.
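The adapt-then-combine structure underlying such diffusion LMS algorithms can be sketched as follows. This is a hedged illustration: the gradient below uses an assumed linear-linear (LLC) cost with illustrative slopes, not the paper's exact DLLCLMS recursion, and the three-node network, combination weights, and step size are toy choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def llc_grad(e, a=1.0, b=2.0):
    # Gradient of an assumed asymmetric linear-linear cost:
    # cost(e) = a*e for e >= 0 and -b*e for e < 0 (slopes a, b illustrative).
    return a if e >= 0 else -b

# Adapt-then-combine diffusion over a 3-node network estimating w_true.
w_true = np.array([0.5, -0.3, 0.8])
W = np.zeros((3, 3))                      # one local estimate per node
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
mu = 0.01                                 # step size
for _ in range(3000):
    psi = np.empty_like(W)
    for k in range(3):                    # adaptation step at each node
        x = rng.normal(size=3)
        e = float(w_true @ x + 0.01 * rng.normal() - W[k] @ x)
        psi[k] = W[k] + mu * llc_grad(e) * x
    for k in range(3):                    # combination step (uniform weights)
        W[k] = psi[neighbors[k]].mean(axis=0)

assert np.linalg.norm(W[0] - w_true) < 0.15   # all nodes agree near w_true
```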
【4】 ADAPT: Awesome Domain Adaptation Python Toolbox
Authors: Antoine de Mathelin, François Deheeger, Guillaume Richard, Mathilde Mougeot, Nicolas Vayatis
Affiliations: Manufacture Française des Pneumatiques Michelin, Clermont-Ferrand, France; Centre Borelli, Université Paris-Saclay, CNRS, ENS Paris-Saclay, Gif-sur-Yvette, France
Comments: 8 pages, 4 figures
Link: https://arxiv.org/abs/2107.03049
Abstract: ADAPT is an open-source Python library providing the implementation of several domain adaptation methods. The library is suited for scikit-learn estimator objects (objects which implement fit and predict methods) and TensorFlow models. Most of the implemented methods are developed in an estimator-agnostic fashion, offering various possibilities adapted to multiple usages. The library offers three modules corresponding to the three principal strategies of domain adaptation: (i) feature-based, containing methods performing feature transformation; (ii) instance-based, with the implementation of reweighting techniques; and (iii) parameter-based, proposing methods to adapt pre-trained models to novel observations. Full documentation is available online at https://adapt-python.github.io/adapt/ with a gallery of examples. Besides, the library presents a high test coverage.
【5】 Transfer Learning in Information Criteria-based Feature Selection
Authors: Shaohan Chen, Nikolaos V. Sahinidis, Chuanhou Gao
Affiliations: School of Mathematical Sciences, Zhejiang University, Hangzhou, China; H. Milton Stewart School of Industrial & Systems Engineering and School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta, GA, USA
Link: https://arxiv.org/abs/2107.02847
Abstract: This paper investigates the effectiveness of transfer learning based on Mallows' Cp. We propose a procedure that combines transfer learning with Mallows' Cp (TLCp) and prove that it outperforms the conventional Mallows' Cp criterion in terms of accuracy and stability. Our theoretical results indicate that, for any sample size in the target domain, the proposed TLCp estimator performs better than the Cp estimator by the mean squared error (MSE) metric in the case of orthogonal predictors, provided that i) the dissimilarity between the tasks from the source domain and the target domain is small, and ii) the procedure parameters (complexity penalties) are tuned according to certain explicit rules. Moreover, we show that our transfer learning framework can be extended to other feature selection criteria, such as the Bayesian information criterion. By analyzing the solution of the orthogonalized Cp, we identify an estimator that asymptotically approximates the solution of the Cp criterion in the case of non-orthogonal predictors. Similar results are obtained for the non-orthogonal TLCp. Finally, simulation studies and applications with real data demonstrate the usefulness of the TLCp scheme.
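The classical Mallows' Cp that TLCp builds on can be computed directly; the sketch below scores candidate feature subsets on a toy regression. The transfer-learning extension (pooling a source-domain task into the criterion) is not shown.

```python
import numpy as np

rng = np.random.default_rng(2)

def mallows_cp(X, y, subset, full_sse, full_dof):
    # Mallows' Cp for a candidate subset: Cp = SSE_subset / s^2 - n + 2p,
    # with s^2 = full_sse / full_dof estimated from the full model.
    n = len(y)
    Xs = X[:, subset]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    sse = float(np.sum((y - Xs @ beta) ** 2))
    s2 = full_sse / full_dof
    return sse / s2 - n + 2 * len(subset)

n, p = 200, 5
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + 0.5 * rng.normal(size=n)  # only 0 and 2 matter

beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
full_sse = float(np.sum((y - X @ beta_full) ** 2))

cp_good = mallows_cp(X, y, [0, 2], full_sse, n - p)
cp_bad = mallows_cp(X, y, [1, 3], full_sse, n - p)
assert cp_good < cp_bad   # the truly relevant subset scores lower (better)
```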
Reinforcement Learning (1 paper)
【1】 Evaluating the progress of Deep Reinforcement Learning in the real world: aligning domain-agnostic and domain-specific research
Authors: Juan Jose Garau-Luis, Edward Crawley, Bruce Cameron
Affiliations: Massachusetts Institute of Technology
Link: https://arxiv.org/abs/2107.03015
Abstract: Deep Reinforcement Learning (DRL) is considered a potential framework to improve many real-world autonomous systems; it has attracted the attention of multiple and diverse fields. Nevertheless, successful deployment in the real world is a test most DRL models still need to pass. In this work we focus on this issue by reviewing and evaluating the research efforts from both domain-agnostic and domain-specific communities. On one hand, we offer a comprehensive summary of DRL challenges and summarize the different proposals to mitigate them; this helps identify five gaps in domain-agnostic research. On the other hand, from the domain-specific perspective, we discuss different success stories and argue why other models might fail to be deployed. Finally, we discuss ways to move forward, accounting for both perspectives.
Medical (1 paper)
【1】 AGD-Autoencoder: Attention Gated Deep Convolutional Autoencoder for Brain Tumor Segmentation
Authors: Tim Cvetko
Comments: 8 pages, 2 figures
Link: https://arxiv.org/abs/2107.03323
Abstract: Brain tumor segmentation is a challenging problem in medical image analysis. The goal is to generate salient masks that accurately identify brain tumor regions in an fMRI screening. In this paper, we propose a novel attention gate (AG) model for brain tumor segmentation that utilizes both an edge detecting unit and an attention gated network to highlight and segment the salient regions from fMRI images. This feature enables us to eliminate the necessity of having to explicitly point towards the damaged area (external tissue localization) and classify (classification) as per classical computer vision techniques. AGs can easily be integrated within deep convolutional neural networks (CNNs). Minimal computational overhead is required, while the AGs increase the sensitivity scores significantly. We show that the edge detector along with an attention gated mechanism provides a sufficient method for brain segmentation, reaching an IoU of 0.78.
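An additive attention gate of the kind described can be sketched in plain NumPy. This is an assumed formulation in the style of attention-gated segmentation networks, with random matrices standing in for learned weights; it is not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(6)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, Wx, Wg, psi):
    # Additive attention gate: skip features x are rescaled by a coefficient
    # computed from x and a coarser gating signal g, so salient (tumor)
    # positions pass through and background is suppressed.
    att = np.maximum(x @ Wx + g @ Wg, 0.0)   # ReLU of the joint projection
    alpha = sigmoid(att @ psi)               # one coefficient per position
    return x * alpha[:, None], alpha

n, cx, cg, ci = 16, 8, 8, 4                  # positions and channel sizes
x = rng.normal(size=(n, cx))                 # skip-connection features
g = rng.normal(size=(n, cg))                 # gating features
Wx = rng.normal(size=(cx, ci))
Wg = rng.normal(size=(cg, ci))
psi = rng.normal(size=ci)

gated, alpha = attention_gate(x, g, Wx, Wg, psi)
assert gated.shape == x.shape
assert np.all((alpha >= 0.0) & (alpha <= 1.0))
```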
Recommendation (1 paper)
【1】 From Zero to The Hero: A Collaborative Market Aware Recommendation System for Crowd Workers
Authors: Hamid Shamszare, Razieh Saremi, Sanam Jena
Affiliations: Stevens Institute of Technology, Hoboken, NJ, USA
Comments: 11 pages, 7 figures, 4 tables
Link: https://arxiv.org/abs/2107.02890
Abstract: The success of software crowdsourcing depends on an active and trustworthy pool of worker supply. The uncertainty of crowd workers' behaviors makes it challenging to predict workers' success and plan accordingly. In a competitive crowdsourcing marketplace, competition for success over shared tasks adds another layer of uncertainty to crowd workers' decision-making process. Preliminary analysis of software worker behaviors reveals an alarming task dropping rate of 82.9%. These factors lead to the need for an automated recommendation system for CSD workers to improve the visibility and predictability of their success in the competition. To that end, this paper proposes a collaborative recommendation system for crowd workers. The proposed recommendation system uses five input metrics based on workers' collaboration history in the pool, workers' preferences in taking tasks in terms of monetary prize and duration, workers' specialty, and workers' proficiency. The proposed method then recommends the most suitable tasks for a worker to compete on, based on the worker's probability of success in the task. Experimental results on 260 active crowd workers demonstrate that by following just the top three task recommendations by success probability, workers can achieve success rates of up to 86%.
Clustering (2 papers)
【1】 Hub and Spoke Logistics Network Design for Urban Region with Clustering-Based Approach
Authors: Quan Duong, Dang Nguyen, Quoc Nguyen
Affiliations: GHN Data Science, Ho Chi Minh, Vietnam
Link: https://arxiv.org/abs/2107.03080
Abstract: This study aims to propose effective modeling and an approach for designing a logistics network in an urban area, in order to offer an efficient flow distribution network as a competitive strategy in the logistics industry, where demand is sensitive to both price and time. A multi-stage approach is introduced to select the number of hubs and allocate spokes to the hubs for flow distribution and hub location detection. Specifically, a fuzzy clustering model whose objective function minimizes the approximate transportation cost is employed; the next phase focuses on balancing the demand capacity among the hubs with the help of domain experts; afterward, the facility location and vehicle routing problems within the network are introduced. To demonstrate the approach's advantages, an experiment was performed on the designed network and its actual transportation cost, using real operational data specific to the Ho Chi Minh City infrastructure conditions. Additionally, we show the flexibility of the designed network in flow distribution, and present computational experiments that develop managerial insights contributing to the network design decision-making process.
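The fuzzy clustering phase can be illustrated with a plain fuzzy c-means loop: each delivery point receives a soft membership in every hub cluster. This sketch keeps only the clustering core; the paper's model additionally folds approximate transportation cost into the objective, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(7)

def fuzzy_c_means(points, c=2, m=2.0, iters=50):
    # Standard fuzzy c-means: alternate between fuzzified cluster centers and
    # membership updates u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)).
    n = len(points)
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ points) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(points[:, None, :] - centers[None], axis=2) + 1e-12
        U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
    return centers, U

# Two well-separated groups of "spokes" around candidate hub sites.
pts = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
centers, U = fuzzy_c_means(pts)
assert np.allclose(U.sum(axis=1), 1.0)   # memberships are a soft partition
```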
【2】 Probabilistic partition of unity networks: clustering based deep approximation
Authors: Nat Trask, Mamikon Gulian, Andy Huang, Kookjin Lee
Affiliations: Center for Computing Research, Sandia National Laboratories, Albuquerque, NM; Electrical Models and Simulation; Quantitative Modeling and Analysis
Comments: 12 pages, 6 figures
Link: https://arxiv.org/abs/2107.03066
Abstract: Partition of unity networks (POU-Nets) have been shown capable of realizing algebraic convergence rates for regression and solution of PDEs, but require empirical tuning of training parameters. We enrich POU-Nets with a Gaussian noise model to obtain a probabilistic generalization amenable to gradient-based minimization of a maximum likelihood loss. The resulting architecture provides spatial representations of both noiseless and noisy data as Gaussian mixtures, with closed-form expressions for variance which provide an estimator of local error. The training process yields remarkably sharp partitions of input space based upon correlation of function values. This classification of training points is amenable to a hierarchical refinement strategy that significantly improves the localization of the regression, allowing higher-order polynomial approximation to be utilized. The framework scales more favorably to large data sets than Gaussian process regression and allows for spatially varying uncertainty, leveraging the expressive power of deep neural networks while bypassing the expensive training associated with other probabilistic deep learning methods. Compared to standard deep neural networks, the framework demonstrates hp-convergence without the use of regularizers to tune the localization of partitions. We provide benchmarks quantifying performance in high/low dimensions, demonstrating that convergence rates depend only on the latent dimension of data within high-dimensional space. Finally, we introduce a new open-source data set of PDE-based simulations of a semiconductor device and perform unsupervised extraction of a physically interpretable reduced-order basis.
Super-Resolution | Denoising | Deblurring | Dehazing (2 papers)
【1】 Structured Denoising Diffusion Models in Discrete State-Spaces
Authors: Jacob Austin, Daniel Johnson, Jonathan Ho, Danny Tarlow, Rianne van den Berg
Affiliations: Google Research, Brain Team
Comments: 10 pages plus references and appendices. First two authors contributed equally
Link: https://arxiv.org/abs/2107.03006
Abstract: Denoising diffusion probabilistic models (DDPMs) (Ho et al. 2020) have shown impressive results on image and waveform generation in continuous state spaces. Here, we introduce Discrete Denoising Diffusion Probabilistic Models (D3PMs), diffusion-like generative models for discrete data that generalize the multinomial diffusion model of Hoogeboom et al. 2021 by going beyond corruption processes with uniform transition probabilities. This includes corruption with transition matrices that mimic Gaussian kernels in continuous space, matrices based on nearest neighbors in embedding space, and matrices that introduce absorbing states. The third allows us to draw a connection between diffusion models and autoregressive and mask-based generative models. We show that the choice of transition matrix is an important design decision that leads to improved results in image and text domains. We also introduce a new loss function that combines the variational lower bound with an auxiliary cross-entropy loss. For text, this model class achieves strong results on character-level text generation while scaling to large vocabularies on LM1B. On the image dataset CIFAR-10, our models approach the sample quality and exceed the log-likelihood of the continuous-space DDPM model.
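The uniform and absorbing-state corruption processes can be written down directly. The sketch below is a minimal illustration of the forward process q(x_t | x_0) for a 5-state variable; the number of states and the corruption rate are toy choices, not values from the paper.

```python
import numpy as np

K = 5          # number of discrete states (e.g. pixel values or tokens)
beta = 0.1     # per-step corruption probability

# Uniform transition matrix: with probability beta, resample uniformly
# (the multinomial-diffusion special case that D3PMs generalize).
Q_uniform = (1 - beta) * np.eye(K) + beta * np.ones((K, K)) / K

# Absorbing-state matrix: with probability beta, jump to a [MASK] state
# (index K-1) and stay there forever; this is the variant that links
# D3PMs to mask-based generative models. The last row is already the
# identity on the absorbing state after these two lines.
Q_absorb = (1 - beta) * np.eye(K)
Q_absorb[:, K - 1] += beta

def corrupt(x_onehot, Q, steps):
    # Closed-form multi-step corruption: q(x_t | x_0) = Cat(x_0 @ Q^t).
    return x_onehot @ np.linalg.matrix_power(Q, steps)

x0 = np.eye(K)[0]                     # a data point in state 0
probs = corrupt(x0, Q_absorb, steps=50)
# After many steps nearly all mass sits on the absorbing [MASK] state.
assert probs[-1] > 0.99
```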
【2】 A comparative study of various Deep Learning techniques for spatio-temporal Super-Resolution reconstruction of Forced Isotropic Turbulent flows
Authors: T. S. Sachin Venkatesh, Rajat Srivastava, Pratyush Bhatt, Prince Tyagi, Raj Kumar Singh
Affiliations: Department of Applied Physics, Delhi Technological University, New Delhi, India; Department of Mechanical Engineering, Delhi Technological University, New Delhi, India
Comments: 10 pages, 10 figures, 2 tables, accepted for IMECE2021
Link: https://arxiv.org/abs/2107.03361
Abstract: Super-resolution is an innovative technique that upscales the resolution of an image or a video, and thus enables us to reconstruct high-fidelity images from low-resolution data. This study performs super-resolution analysis on turbulent flow fields, spatially and temporally, using various state-of-the-art machine learning techniques such as ESPCN, ESRGAN and TecoGAN to reconstruct high-resolution flow fields from low-resolution flow field data, especially keeping in mind the need for low resource consumption and rapid result production/verification. The dataset used for this study is extracted from the 'isotropic 1024 coarse' dataset, which is a part of the Johns Hopkins Turbulence Databases (JHTDB). We have utilized pre-trained models and fine-tuned them to our needs, so as to minimize the computational resources and the time required for the implementation of the super-resolution models. The advantages presented by this method far exceed the expectations and the outcomes of regular single-structure models. The results obtained through these models are then compared using MSE, PSNR, SAM, VIF and SCC metrics, in order to evaluate the upscaled results, find the balance between computational power and output quality, and then identify the most accurate and efficient model for spatial and temporal super-resolution of turbulent flow fields.
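Of the five comparison metrics, MSE and PSNR are simple to state; a minimal sketch follows, with random arrays standing in for reconstructed and reference flow fields.

```python
import numpy as np

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def psnr(a, b, max_val=1.0):
    # Peak signal-to-noise ratio: higher means the reconstruction is closer
    # to the high-resolution reference field.
    return float(10.0 * np.log10(max_val ** 2 / mse(a, b)))

rng = np.random.default_rng(8)
reference = rng.random((32, 32))                          # stand-in high-res field
good = np.clip(reference + 0.01 * rng.normal(size=(32, 32)), 0, 1)
poor = np.clip(reference + 0.10 * rng.normal(size=(32, 32)), 0, 1)
assert psnr(reference, good) > psnr(reference, poor)      # better upscaling, higher PSNR
```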
Autonomous Driving | Vehicles | Lane Detection etc. (1 paper)
【1】 Efficient Detection of Botnet Traffic by features selection and Decision Trees
Authors: Javier Velasco-Mata, Víctor González-Castro, Eduardo Fidalgo, Enrique Alegre
Affiliations: Universidad de León; INCIBE
Comments: Submitted to IEEE Access
Link: https://arxiv.org/abs/2107.02896
Abstract: Botnets are one of the online threats with the biggest presence, causing billions in losses to global economies. Nowadays, the increasing number of devices connected to the Internet makes it necessary to analyze large amounts of network traffic data. In this work, we focus on increasing the performance of botnet traffic classification by selecting those features that further increase the detection rate. For this purpose we use two feature selection techniques, Information Gain and Gini Importance, which led to three pre-selected subsets of five, six and seven features. Then, we evaluate the three feature subsets along with three models: Decision Tree, Random Forest and k-Nearest Neighbors. To test the performance of the three feature vectors and the three models, we generate two datasets based on the CTU-13 dataset, namely QB-CTU13 and EQB-CTU13. We measure performance as the macro-averaged F1 score over the computational time required to classify a sample. The results show that the highest performance is achieved by Decision Trees using a five-feature set, which obtained a mean F1 score of 85%, classifying each sample in an average time of 0.78 microseconds.
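The Information Gain ranking step can be sketched in plain NumPy on a toy two-feature "flow" table; the Gini-importance ranker and the subsequent Decision Tree training are not shown.

```python
import numpy as np

rng = np.random.default_rng(4)

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(x, y):
    # Information gain H(Y) - H(Y|X) of one discretized traffic feature X
    # with respect to the botnet/benign label Y.
    gain = entropy(y)
    for v in np.unique(x):
        mask = x == v
        gain -= mask.mean() * entropy(y[mask])
    return gain

# Toy traffic table: feature 0 tracks the label, feature 1 is random noise.
y = rng.integers(0, 2, size=1000)                      # 0 = benign, 1 = botnet
f_signal = np.where(rng.random(1000) < 0.9, y, 1 - y)  # 90% correlated with y
f_noise = rng.integers(0, 2, size=1000)

gains = [information_gain(f_signal, y), information_gain(f_noise, y)]
ranking = np.argsort(gains)[::-1]   # keep the top-k features for the classifier
assert ranking[0] == 0              # the informative feature ranks first
```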
Federated Learning | Privacy Preservation | Encryption (2 papers)
【1】 RoFL: Attestable Robustness for Secure Federated Learning
Authors: Lukas Burkhalter, Hidde Lycklama à Nijeholt, Alexander Viand, Nicolas Küchler, Anwar Hithnawi
Affiliations: ETH Zürich
Comments: 20 pages, 15 figures
Link: https://arxiv.org/abs/2107.03311
Abstract: Federated Learning is an emerging decentralized machine learning paradigm that allows a large number of clients to train a joint model without the need to share their private data. Participants instead only share the ephemeral updates necessary to train the model. To ensure the confidentiality of the client updates, Federated Learning systems employ secure aggregation; clients encrypt their gradient updates, and only the aggregated model is revealed to the server. Achieving this level of data protection, however, presents new challenges to the robustness of Federated Learning, i.e., the ability to tolerate failures and attacks. Unfortunately, in this setting, a malicious client can now easily exert influence on the model behavior without being detected. As Federated Learning is being deployed in practice in a range of sensitive applications, its robustness is growing in importance. In this paper, we take a step towards understanding and improving the robustness of secure Federated Learning. We start with a systematic study that evaluates and analyzes existing attack vectors, discusses potential defenses, and assesses their effectiveness. We then present RoFL, a secure Federated Learning system that improves robustness against malicious clients through input checks on the encrypted model updates. RoFL extends Federated Learning's secure aggregation protocol to allow expressing a variety of properties and constraints on model updates using zero-knowledge proofs. To enable RoFL to scale to typical Federated Learning settings, we introduce several ML and cryptographic optimizations specific to Federated Learning. We implement and evaluate a prototype of RoFL and show that realistic ML models can be trained in a reasonable time while improving robustness.
【2】 DER Forecast using Privacy Preserving Federated Learning
Authors: Venkatesh Venkataramanan, Sridevi Kaza, Anuradha M. Annaswamy
Affiliations: Department of Mechanical Engineering, Massachusetts Institute of Technology
Link: https://arxiv.org/abs/2107.03248
Abstract: With the increasing penetration of Distributed Energy Resources (DERs) at the grid edge, including renewable generation, flexible loads, and storage, accurate prediction of distributed generation and consumption at the consumer level becomes important. However, DER prediction based on the transmission of customer-level data, either repeatedly or in large amounts, is not feasible due to privacy concerns. In this paper, a distributed machine learning approach, Federated Learning, is proposed to carry out DER forecasting using a network of IoT nodes, each of which transmits a model of the consumption and generation patterns without revealing consumer data. We consider a simulation study which includes 1000 DERs, and show that our method preserves consumer privacy while still leading to an accurate forecast. We also evaluate grid-specific performance metrics such as load swings and load curtailment, and show that our FL algorithm leads to satisfactory performance. Simulations are also performed on the Pecan Street dataset to demonstrate the validity of the proposed approach on real data.
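The federated pattern described (nodes share model weights, never data) can be sketched with a linear consumption model and federated averaging. This is a generic FedAvg illustration under toy data, not the paper's forecasting model.

```python
import numpy as np

rng = np.random.default_rng(5)

def local_step(w, X, y, lr=0.1, epochs=20):
    # Local linear-regression training on one node's private load data;
    # only the resulting weights, never the data, leave the node.
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def fed_avg(weights, sizes):
    # Server-side federated averaging, weighted by local sample counts.
    return np.average(weights, axis=0, weights=np.asarray(sizes, dtype=float))

# Three IoT nodes with private (feature, consumption) data from a shared trend.
w_true = np.array([1.5, -0.7])
nodes = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ w_true + 0.05 * rng.normal(size=50)
    nodes.append((X, y))

w_global = np.zeros(2)
for _ in range(10):                                  # communication rounds
    local = [local_step(w_global, X, y) for X, y in nodes]
    w_global = fed_avg(local, [len(y) for _, y in nodes])

assert np.linalg.norm(w_global - w_true) < 0.1       # global model recovers the trend
```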
推理|分析|理解|解释(4篇)
【1】 On Codomain Separability and Label Inference from (Noisy) Loss Functions 标题:基于(噪声)损失函数的余域可分性和标号推断
作者:Abhinav Aggarwal,Shiva Prasad Kasiviswanathan,Zekun Xu,Oluwaseyi Feyisetan,Nathanael Teissier 机构:Amazon 链接:https://arxiv.org/abs/2107.03022 摘要:机器学习分类器通常依赖于一个私有(隐藏)数据集上的损失函数进行性能评估。标签推理是最近提出的一个问题,即仅从所选预测向量处评估的(可能受扰动的)损失函数值重建这个私有数据集的基本真值标签,而不需要任何其他访问隐藏数据集的方法。已有的结果表明,这种推断在交叉熵损失等特定损失函数上是可能的。本文引入共域可分性的概念,形式化地研究了从任意(噪声)损失函数值中进行标签推断的充要条件。利用这个概念,我们证明了对于许多常用的损失函数,包括具有公共激活函数的多类交叉熵和一些基于Bregman散度的损失,可以设计任意噪声水平的标签推理攻击。我们证明了这些攻击也可以通过实际的神经网络模型进行,并从形式上和经验上论证了有限精度算法在这种情况下的作用。 摘要:Machine learning classifiers rely on loss functions for performance evaluation, often on a private (hidden) dataset. Label inference was recently introduced as the problem of reconstructing the ground truth labels of this private dataset from just the (possibly perturbed) loss function values evaluated at chosen prediction vectors, without any other access to the hidden dataset. Existing results have demonstrated this inference is possible on specific loss functions like the cross-entropy loss. In this paper, we introduce the notion of codomain separability to formally study the necessary and sufficient conditions under which label inference is possible from any (noisy) loss function values. Using this notion, we show that for many commonly used loss functions, including multiclass cross-entropy with common activation functions and some Bregman divergence-based losses, it is possible to design label inference attacks for arbitrary noise levels. We demonstrate that these attacks can also be carried out through actual neural network models, and argue, both formally and empirically, the role of finite precision arithmetic in this setting.
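A toy demonstration of why loss-value access leaks labels (this is an illustration of the attack surface with binary cross-entropy, not the paper's codomain-separability construction; the probe values 0.9/0.5 and the threshold are illustrative choices):

```python
import math

def bce_loss(labels, preds):
    # mean binary cross-entropy over the hidden labels (the "oracle")
    n = len(labels)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(labels, preds)) / n

def infer_labels(loss_oracle, n):
    """Recover all n hidden binary labels with n loss queries.

    Probe example i with p=0.9 and everyone else with p=0.5; the
    contribution of the probed example is -log(0.9) if its label is 1
    and -log(0.1) if it is 0, which the loss value reveals.
    """
    base = math.log(2)  # -log(0.5): label-independent contribution
    labels = []
    for i in range(n):
        preds = [0.5] * n
        preds[i] = 0.9
        ci = loss_oracle(preds) * n - base * (n - 1)
        labels.append(1 if ci < 1.0 else 0)  # -log(0.9)≈0.11 vs -log(0.1)≈2.30
    return labels

hidden = [1, 0, 1, 1, 0]
recovered = infer_labels(lambda preds: bce_loss(hidden, preds), len(hidden))
```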
【2】 Generalization Error Analysis of Neural networks with Gradient Based Regularization 标题:基于梯度正则化的神经网络泛化误差分析
作者:Lingfeng Li,Xue-Cheng Tai,Jiang Yang 链接:https://arxiv.org/abs/2107.02797 摘要:研究了基于梯度的神经网络正则化方法。主要研究了两种正则化方法:全变分正则化和Tikhonov正则化。应用这些方法相当于用神经网络求解一些偏微分方程,在实际应用中大多是高维的。在这项工作中,我们引入了一个通用的框架来分析正则化网络的泛化误差。误差估计依赖于近似误差和求积误差(quadrature error)两个假设。此外,我们还对图像分类任务进行了实验,结果表明基于梯度的方法可以显著提高神经网络的泛化能力和对抗鲁棒性。实验中还考虑了基于梯度的方法的图扩展。 摘要:We study gradient-based regularization methods for neural networks. We mainly focus on two regularization methods: the total variation and the Tikhonov regularization. Applying these methods is equivalent to using neural networks to solve some partial differential equations, mostly in high dimensions in practical applications. In this work, we introduce a general framework to analyze the generalization error of regularized networks. The error estimate relies on two assumptions on the approximation error and the quadrature error. Moreover, we conduct some experiments on the image classification tasks to show that gradient-based methods can significantly improve the generalization ability and adversarial robustness of neural networks. A graphical extension of the gradient-based methods is also considered in the experiments.
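The Tikhonov-style term penalizes the squared gradient of the network output with respect to its input. A minimal sketch of computing that penalty on a toy function via central finite differences (the paper applies this inside a training loss; the function, point, and step size below are illustrative):

```python
import numpy as np

def squared_grad_penalty(f, x, eps=1e-4):
    """Tikhonov-style penalty ||grad_x f(x)||^2, estimated with
    central finite differences (autodiff would be used in practice)."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return float(np.sum(g ** 2))

# f(x) = x0^2 + 3*x1 has gradient (2*x0, 3); at x=(1,0) the penalty is 4+9=13
penalty = squared_grad_penalty(lambda z: z[0] ** 2 + 3 * z[1],
                               np.array([1.0, 0.0]))
```

In training, the total objective would be `data_loss + lam * penalty`, which discourages steep input-output gradients and is what connects these regularizers to adversarial robustness.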
【3】 MD-split+: Practical Local Conformal Inference in High Dimensions 标题:MD-split+:实用的高维局部共形推理
作者:Benjamin LeRoy,David Zhao 机构:Department of Statistics and Data Science, Carnegie Mellon University 备注:Appearing in ICML 2021 workshop on distribution-free uncertainty quantification 链接:https://arxiv.org/abs/2107.03280 摘要:量化模型预测中的不确定性是寻求不仅仅是点预测的实践者的共同目标。不确定性量化的一个工具是共形推理,它可以帮助为黑盒模型创建概率有效的预测区域。经典的共形预测只提供了边缘有效性,而在许多情况下,局部有效的预测区域是可取的。在应用局部共形预测时,如何最好地划分特征空间X仍然是一个悬而未决的问题。我们提出MD-split+,一种实用的局部共形方法,它基于条件密度估计模型的局部模型性能来创建X分区。我们的方法能处理这些模型可能被错误指定的复杂真实数据设置,并可扩展到高维输入。我们讨论了我们的局部分区在理念上如何与一种不可实现的条件共形推理方法的预期行为相一致。我们还通过实验将我们的方法与其他局部共形方法进行了比较。 摘要:Quantifying uncertainty in model predictions is a common goal for practitioners seeking more than just point predictions. One tool for uncertainty quantification that requires minimal assumptions is conformal inference, which can help create probabilistically valid prediction regions for black box models. Classical conformal prediction only provides marginal validity, whereas in many situations locally valid prediction regions are desirable. Deciding how best to partition the feature space X when applying localized conformal prediction is still an open question. We present MD-split+, a practical local conformal approach that creates X partitions based on localized model performance of conditional density estimation models. Our method handles complex real-world data settings where such models may be misspecified, and scales to high-dimensional inputs. We discuss how our local partitions philosophically align with expected behavior from an unattainable conditional conformal inference approach. We also empirically compare our method against other local conformal approaches.
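The locally valid step — running split conformal separately within each partition — can be sketched as follows. This is a toy version that assumes the partitions are already given; MD-split+'s contribution is precisely *how* to choose those partitions (by localized density-model performance), which this sketch does not implement:

```python
import numpy as np

def groupwise_conformal_width(abs_resid, groups, alpha=0.1):
    """Split-conformal interval half-widths, computed per partition.

    Within each group g of calibration points, the width is the
    ceil((n_g + 1) * (1 - alpha))-th smallest absolute residual
    (clipped to the largest one), giving ~(1-alpha) coverage per group.
    """
    widths = {}
    for g in np.unique(groups):
        r = np.sort(abs_resid[groups == g])
        k = int(np.ceil((len(r) + 1) * (1 - alpha))) - 1
        widths[int(g)] = float(r[min(k, len(r) - 1)])
    return widths

# group 1 is much noisier than group 0, so it gets wider intervals
resid = np.array([0.1, 0.2, 0.3, 0.4, 1.0, 2.0, 3.0, 4.0, 5.0])
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
w = groupwise_conformal_width(resid, groups, alpha=0.1)
```

A single global quantile would over-cover the quiet group and under-cover the noisy one; per-partition quantiles adapt the width locally.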
【4】 Coastal water quality prediction based on machine learning with feature interpretation and spatio-temporal analysis 标题:基于特征解释和时空分析的机器学习近岸海域水质预测
作者:Luka Grbčić,Siniša Družeta,Goran Mauša,Tomislav Lipić,Darija Vukić Lušić,Marta Alvir,Ivana Lučin,Ante Sikirica,Davor Davidović,Vanja Travaš,Daniela Kalafatović,Kristina Pikelj,Hana Fajković,Lado Kranjčević 机构:Department of Fluid Mechanics and Computational Engineering, University of Rijeka, Vukovarska , Rijeka, Croatia, Department of Computer Engineering, University of, Center for Advanced Computing and Modelling, University of Rijeka, Radmile Matejčić 链接:https://arxiv.org/abs/2107.03230 摘要:沿海水质管理是一个公共卫生问题,因为糟糕的沿海水质可能含有危害人类健康的病原体。以旅游业为导向的国家需要在夏季积极监测旅游热点地区的沿海水域状况。本研究利用克罗地亚里耶卡市15个公共海滩大肠杆菌和肠球菌的常规监测数据,建立了基于环境参数预测其水平的机器学习模型,并探讨了它们与环境应激源的关系。使用所有采样点的测量数据训练了梯度提升(Catboost、Xgboost)、随机森林、支持向量回归和人工神经网络模型,并用于基于环境特征预测大肠杆菌和肠球菌值。通过10折交叉验证对机器学习模型的稳定性和可推广性进行评估,结果表明,与其他评估的ML算法(包括Xgboost、随机森林、支持向量回归和人工神经网络)相比,Catboost算法表现最好,预测大肠杆菌和肠球菌的R$^2$值分别为0.71和0.68。我们还使用SHapley加法解释技术来识别和解释哪些特征具有最大的预测能力。结果表明,现场盐度测量是预测大肠杆菌和肠球菌水平的最重要特征。最后,在沿海水质最低的地点检验了两种ML模型的时空精度。空间上的大肠杆菌和肠球菌模型的R$^2$值分别为0.85和0.83,而时间模型的R$^2$值分别为0.74和0.67。在沿海水质较高的地点,时间模型也获得了0.44和0.46的中等R$^2$值。 摘要:Coastal water quality management is a public health concern, as poor coastal water quality can harbor pathogens that are dangerous to human health. Tourism-oriented countries need to actively monitor the condition of coastal water at tourist popular sites during the summer season. In this study, routine monitoring data of $Escherichia Coli$ and enterococci across 15 public beaches in the city of Rijeka, Croatia, were used to build machine learning models for predicting their levels based on environmental parameters as well as to investigate their relationships with environmental stressors. Gradient Boosting (Catboost, Xgboost), Random Forests, Support Vector Regression and Artificial Neural Networks were trained with measurements from all sampling sites and used to predict $E. Coli$ and enterococci values based on environmental features. 
Evaluation of stability and generalizability with 10-fold cross-validation showed that the Catboost algorithm performed best, with R$^2$ values of 0.71 and 0.68 for predicting $E. Coli$ and enterococci, respectively, compared to the other evaluated ML algorithms (Xgboost, Random Forests, Support Vector Regression and Artificial Neural Networks). We also used the SHapley Additive exPlanations technique to identify and interpret which features have the most predictive power. The results show that measured site salinity is the most important feature for forecasting both $E. Coli$ and enterococci levels. Finally, the spatial and temporal accuracy of both ML models was examined at sites with the lowest coastal water quality. The spatial $E. Coli$ and enterococci models achieved strong R$^2$ values of 0.85 and 0.83, while the temporal models achieved R$^2$ values of 0.74 and 0.67. The temporal model also achieved moderate R$^2$ values of 0.44 and 0.46 at a site with high coastal water quality.
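The 10-fold cross-validated R$^2$ evaluation used above can be sketched generically (a plain numpy sketch; the study's actual models are gradient-boosting regressors, here replaced by a least-squares fit purely to keep the example self-contained):

```python
import numpy as np

def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def kfold_r2(X, y, fit_predict, k=10, seed=0):
    """Mean R^2 over k folds: fit on k-1 folds, score on the held-out fold."""
    idx = np.random.default_rng(seed).permutation(len(y))
    scores = []
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(r2_score(y[test], fit_predict(X[train], y[train], X[test])))
    return float(np.mean(scores))

def linreg_fit_predict(Xtr, ytr, Xte):
    # ordinary least squares with a bias column, as a stand-in model
    coef, *_ = np.linalg.lstsq(np.c_[Xtr, np.ones(len(Xtr))], ytr, rcond=None)
    return np.c_[Xte, np.ones(len(Xte))] @ coef

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0  # noiseless linear target
score = kfold_r2(X, y, linreg_fit_predict, k=10)
```

On this noiseless toy target the held-out R$^2$ is essentially 1; on real monitoring data the same procedure yields the 0.71/0.68 figures reported above.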
检测相关(1篇)
【1】 New Methods and Datasets for Group Anomaly Detection From Fundamental Physics 标题:基础物理群体异常检测的新方法和新数据集
作者:Gregor Kasieczka,Benjamin Nachman,David Shih 机构: Universität Hamburg, Department of Physics & Astronomy, Rutgers University 备注:Accepted for ANDEA (Anomaly and Novelty Detection, Explanation and Accommodation) Workshop at KDD 2021 链接:https://arxiv.org/abs/2107.02821 摘要:数据中异常过密区域的识别(即群体或集体异常检测)是一个具有大量实际应用的丰富问题。然而,与点异常或其他类型的单实例异常值相比,它在更广泛的ML社区中受到的关注相对较少。其中一个原因是缺乏强大的基准数据集。在本文中,我们首先解释了在获得诺贝尔奖的希格斯玻色子发现之后,无监督的群体异常检测如何成为基础物理学的一个新前沿(其动机是寻找新的粒子和力)。然后,我们提出了一个真实的合成基准数据集(LHCO2020)来开发群体异常检测算法。最后,我们比较了几种现有的用于无监督群体异常检测的统计上可靠的技术,并在LHCO2020数据集上展示了它们的性能。 摘要:The identification of anomalous overdensities in data - group or collective anomaly detection - is a rich problem with a large number of real world applications. However, it has received relatively little attention in the broader ML community, as compared to point anomalies or other types of single instance outliers. One reason for this is the lack of powerful benchmark datasets. In this paper, we first explain how, after the Nobel-prize winning discovery of the Higgs boson, unsupervised group anomaly detection has become a new frontier of fundamental physics (where the motivation is to find new particles and forces). Then we propose a realistic synthetic benchmark dataset (LHCO2020) for the development of group anomaly detection algorithms. Finally, we compare several existing statistically-sound techniques for unsupervised group anomaly detection, and demonstrate their performance on the LHCO2020 dataset.
分类|识别(4篇)
【1】 Bias-Tolerant Fair Classification 标题:容许偏差的公平分类
作者:Yixuan Zhang,Feng Zhou,Zhidong Li,Yang Wang,Fang Chen 机构:Data Science Institute, University of Technology Sydney, Department of Computer Science and Technology, Tsinghua University 链接:https://arxiv.org/abs/2107.03207 摘要:标签偏差和选择偏差被认为是阻碍机器学习结果公平性的两个原因。当标注决策受到敏感特征的干扰时,会产生标注偏差;当数据采样过程中存在主观偏差时,会产生选择偏差。更糟糕的是,基于这些数据训练的模型可能会继承甚至加剧歧视。大多数算法公平性方法在预先定义的公平性约束下执行经验风险最小化,这倾向于以牺牲准确性换取公平性。然而,这样的方法在达到期望的公平水平时,会牺牲受偏差影响的个体的利益(获得积极结果)。因此,我们提出了一个容许偏差的公平正则化损失(Bias-Tolerant FAir Regularized Loss, B-FARL),它试图利用受标签偏差和选择偏差影响的数据重新获得这些收益。B-FARL将有偏数据作为输入,调用一个近似于用公平但潜在的数据训练的模型,从而在不需要显式约束的情况下防止歧视。此外,我们通过分解B-FARL来展示其有效成分,并利用元学习框架对B-FARL进行优化。在真实数据集上的实验结果表明,我们的方法可以有效地提高朝向真实但潜在标签方向的公平性。 摘要:The label bias and selection bias are acknowledged as two reasons in data that will hinder the fairness of machine-learning outcomes. The label bias occurs when the labeling decision is disturbed by sensitive features, while the selection bias occurs when subjective bias exists during the data sampling. Even worse, models trained on such data can inherit or even intensify the discrimination. Most algorithmic fairness approaches perform an empirical risk minimization with predefined fairness constraints, which tends to trade-off accuracy for fairness. However, such methods would achieve the desired fairness level with the sacrifice of the benefits (receive positive outcomes) for individuals affected by the bias. Therefore, we propose a Bias-Tolerant FAir Regularized Loss (B-FARL), which tries to regain the benefits using data affected by label bias and selection bias. B-FARL takes the biased data as input, calls a model that approximates the one trained with fair but latent data, and thus prevents discrimination without constraints required. In addition, we show the effective components by decomposing B-FARL, and we utilize the meta-learning framework for the B-FARL optimization. The experimental results on real-world datasets show that our method is empirically effective in improving fairness towards the direction of true but latent labels.
【2】 Nested Counterfactual Identification from Arbitrary Surrogate Experiments 标题:任意代理实验中的嵌套式反事实鉴定
作者:Juan D Correa,Sanghack Lee,Elias Bareinboim 机构:Seoul National University, Columbia University 链接:https://arxiv.org/abs/2107.03190 摘要:因果关系阶梯描述了代理人可能感兴趣的三种性质不同的活动类型,即看(观察)、做(干预)和想象(反事实)(Pearl和Mackenzie,2018)。因果层次结构带来的推理挑战是,数据是由观察或干预系统的代理收集的(第1层和第2层),而它的目标可能是了解如果它采取不同的行动过程会发生什么,与实际结果相反(第3层)。虽然人们对允许从观察到干预进行跨层推断的条件有着坚实的理解,但在针对反事实量时,结果却要少得多。在本文中,我们研究从观察和实验的任意组合中识别嵌套反事实。具体地说,基于嵌套反事实的一个更明确的定义,我们证明了反事实去嵌套定理(counterfactual unnesting theorem, CUT),它允许我们将任意嵌套的反事实映射到非嵌套的反事实。例如,中介和公平性分析中的应用通常会引发直接、间接和虚假效应的概念,这自然需要嵌套。其次,我们引入了从观测分布和实验分布的任意组合进行反事实识别的充要图形条件。最后,我们提出了一个有效且完整的识别嵌套反事实的算法;算法未能返回查询的表达式意味着该查询不可识别。 摘要:The Ladder of Causation describes three qualitatively different types of activities an agent may be interested in engaging in, namely, seeing (observational), doing (interventional), and imagining (counterfactual) (Pearl and Mackenzie, 2018). The inferential challenge imposed by the causal hierarchy is that data is collected by an agent observing or intervening in a system (layers 1 and 2), while its goal may be to understand what would have happened had it taken a different course of action, contrary to what factually ended up happening (layer 3). While there exists a solid understanding of the conditions under which cross-layer inferences are allowed from observations to interventions, the results are somewhat scarcer when targeting counterfactual quantities. In this paper, we study the identification of nested counterfactuals from an arbitrary combination of observations and experiments. Specifically, building on a more explicit definition of nested counterfactuals, we prove the counterfactual unnesting theorem (CUT), which allows one to map arbitrary nested counterfactuals to unnested ones. For instance, applications in mediation and fairness analysis usually evoke notions of direct, indirect, and spurious effects, which naturally require nesting. 
Second, we introduce a sufficient and necessary graphical condition for counterfactual identification from an arbitrary combination of observational and experimental distributions. Lastly, we develop an efficient and complete algorithm for identifying nested counterfactuals; failure of the algorithm returning an expression for a query implies it is not identifiable.
【3】 Urban Tree Species Classification Using Aerial Imagery 标题:基于航空影像的城市树种分类
作者:Emily Waters,Mahdi Maktabdar Oghaz,Lakshmi Babu Saheer 机构:Anglia Ruskin University 备注:International Conference on Machine Learning (ICML 2021), Workshop on Tackling Climate Change with Machine Learning 链接:https://arxiv.org/abs/2107.03182 摘要:城市树木有助于调节温度,减少能源消耗,改善城市空气质量,降低风速,减轻城市热岛效应。城市树木在减缓气候变化和全球变暖方面也发挥着关键作用,它捕获和储存大气中的二氧化碳,而二氧化碳是温室气体的最大贡献者。利用航空图像进行树木自动检测和物种分类是可持续森林和城市树木管理的有力工具。因此,本研究首先提供了一个利用Google Maps航空影像生成城市树木标签数据集的管道,然后研究了VGG和ResNet等先进的深度卷积神经网络模型在不同参数下如何处理城市树木航空影像的分类问题。实验结果表明,我们的最佳模型在6个树种上的平均准确率达到60%。 摘要:Urban trees help regulate temperature, reduce energy consumption, improve urban air quality, reduce wind speeds, and mitigate the urban heat island effect. Urban trees also play a key role in climate change mitigation and global warming by capturing and storing atmospheric carbon-dioxide, which is the largest contributor to greenhouse gases. Automated tree detection and species classification using aerial imagery can be a powerful tool for sustainable forest and urban tree management. Hence, this study first offers a pipeline for generating a labelled dataset of urban trees using Google Maps aerial images and then investigates how state-of-the-art deep Convolutional Neural Network models such as VGG and ResNet handle the classification problem of urban tree aerial images under different parameters. Experimental results show our best model achieves an average accuracy of 60% over 6 tree species.
【4】 Exact Learning Augmented Naive Bayes Classifier 标题:精确学习增广朴素贝叶斯分类器
作者:Shouta Sugahara,Maomi Ueno 机构:Graduate school of Informatics and Engineering, The University of Electro-Communications, -,-, Chofugaoka, Chofu-shi, Tokyo, Japan, Editor: 备注:29 pages 链接:https://arxiv.org/abs/2107.03018 摘要:以往的研究表明,在给定特征变量的情况下,通过最大化一类变量的条件对数似然(CLL)得到的贝叶斯网络(BNs)的分类精度高于通过最大化边缘似然(ML)得到的分类精度。然而,在早期的研究中,这两个分数的表现之间的差异可能是由于他们使用的是近似的学习算法,而不是精确的学习算法。本文比较了用CLL近似学习和用ML精确学习的BNs分类精度,结果表明,对于大数据,最大化ML得到的BNs分类精度高于最大化CLL得到的BNs分类精度。然而,研究结果也显示,当样本量较小且类别变数有多个父变数时,使用ML的精确学习BNs的分类准确率要比其他方法差得多。为了解决这一问题,我们提出了一种精确学习的增广朴素贝叶斯分类器(ANB),它保证了类变量没有父变量。该方法保证了在精确学习的BN之后渐近估计同一类。对比实验表明,该方法具有良好的性能。 摘要:Earlier studies have shown that classification accuracies of Bayesian networks (BNs) obtained by maximizing the conditional log likelihood (CLL) of a class variable, given the feature variables, were higher than those obtained by maximizing the marginal likelihood (ML). However, differences between the performances of the two scores in the earlier studies may be attributed to the fact that they used approximate learning algorithms, not exact ones. This paper compares the classification accuracies of BNs with approximate learning using CLL to those with exact learning using ML. The results demonstrate that the classification accuracies of BNs obtained by maximizing the ML are higher than those obtained by maximizing the CLL for large data. However, the results also demonstrate that the classification accuracies of exact learning BNs using the ML are much worse than those of other methods when the sample size is small and the class variable has numerous parents. To resolve the problem, we propose an exact learning augmented naive Bayes classifier (ANB), which ensures a class variable with no parents. The proposed method is guaranteed to asymptotically estimate the identical class posterior to that of the exactly learned BN. Comparison experiments demonstrated the superior performance of the proposed method.
优化|敛散性(4篇)
【1】 Efficient Matrix-Free Approximations of Second-Order Information, with Applications to Pruning and Optimization 标题:二阶信息的高效无矩阵逼近及其在剪枝和优化中的应用
作者:Elias Frantar,Eldar Kurtic,Dan Alistarh 机构:IST Austria 链接:https://arxiv.org/abs/2107.03356 摘要:有效地逼近损失函数的局部曲率信息是深度神经网络优化和压缩的关键工具。然而,现有的大多数二阶信息近似方法计算量大或存储量大,限制了其实用性。在这项工作中,我们研究了无矩阵的线性时间方法来估计逆Hessian向量积(IHVPs),适用于Hessian可以近似为秩一矩阵之和的情形,例如用经验Fisher矩阵近似Hessian的经典做法。作为M-FAC框架的一部分,我们提出了两种新算法:第一种算法是针对网络压缩而定制的,如果Hessian以$m$个秩一矩阵之和给出,则可以计算$d$维的IHVP,使用$O(dm^2)$预计算,计算IHVP的成本为$O(dm)$,对于逆Hessian的任何单个元素,查询成本为$O(m)$。第二种算法针对优化设置,我们希望计算逆Hessian(在优化步骤的滑动窗口上估计)和给定梯度方向之间的乘积,这是预处理SGD所需的。我们给出了一个计算IHVP的代价为$O(dm+m^2)$、从滑动窗口中添加或删除任何梯度的代价为$O(dm+m^3)$的算法。与现有的二阶方法相比,这两种算法以较低的计算开销在网络修剪和优化方面取得了最先进的结果。[10]和[18]中提供了实现。 摘要:Efficiently approximating local curvature information of the loss function is a key tool for optimization and compression of deep neural networks. Yet, most existing methods to approximate second-order information have high computational or storage costs, which can limit their practicality. In this work, we investigate matrix-free, linear-time approaches for estimating Inverse-Hessian Vector Products (IHVPs) for the case when the Hessian can be approximated as a sum of rank-one matrices, as in the classic approximation of the Hessian by the empirical Fisher matrix. We propose two new algorithms as part of a framework called M-FAC: the first algorithm is tailored towards network compression and can compute the IHVP for dimension $d$, if the Hessian is given as a sum of $m$ rank-one matrices, using $O(dm^2)$ precomputation, $O(dm)$ cost for computing the IHVP, and query cost $O(m)$ for any single element of the inverse Hessian. The second algorithm targets an optimization setting, where we wish to compute the product between the inverse Hessian, estimated over a sliding window of optimization steps, and a given gradient direction, as required for preconditioned SGD. We give an algorithm with cost $O(dm + m^2)$ for computing the IHVP and $O(dm + m^3)$ for adding or removing any gradient from the sliding window. 
These two algorithms yield state-of-the-art results for network pruning and optimization with lower computational overhead relative to existing second-order methods. Implementations are available at [10] and [18].
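The rank-one structure is what makes matrix-free IHVPs possible: with $H = \lambda I + \frac{1}{m}\sum_i g_i g_i^\top$, repeated Sherman-Morrison updates give $H^{-1}v$ without ever forming $H$. A naive sketch of that recursion (illustrative only; M-FAC's actual algorithms reorganize this computation to reach the stated complexities):

```python
import numpy as np

def build_ihvp(grads, lam):
    """Return v -> H^{-1} v for H = lam*I + (1/m) * sum_i g_i g_i^T,
    built by applying Sherman-Morrison once per rank-one term."""
    m = len(grads)
    us, denoms = [], []

    def apply_inv(v, k):
        # inverse of lam*I + (1/m) * sum of the first k rank-one terms
        x = v / lam
        for u, g, d in zip(us[:k], grads[:k], denoms[:k]):
            x = x - u * (g @ x) / d
        return x

    for k, g in enumerate(grads):
        u = apply_inv(g, k)          # u_k = H_{k-1}^{-1} g_k
        us.append(u)
        denoms.append(m + g @ u)     # Sherman-Morrison denominator
    return lambda v: apply_inv(v, m)

# sanity check against a dense solve
rng = np.random.default_rng(1)
grads = [rng.normal(size=6) for _ in range(4)]
lam = 0.5
ihvp = build_ihvp(grads, lam)
H = lam * np.eye(6) + sum(np.outer(g, g) for g in grads) / len(grads)
v = rng.normal(size=6)
```

Only the $m$ vectors $u_k$ and $m$ scalars are stored, i.e., $O(dm)$ memory instead of the $O(d^2)$ a dense inverse Hessian would need.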
【2】 KaFiStO: A Kalman Filtering Framework for Stochastic Optimization 标题:KaFiStO:一种随机优化的卡尔曼滤波框架
作者:Aram Davtyan,Sepehr Sameni,Llukman Cerkezi,Givi Meishvilli,Adam Bielski,Paolo Favaro 机构:Computer Vision Group, University of Bern 链接:https://arxiv.org/abs/2107.03331 摘要:优化问题通常是一个确定性问题,通过梯度下降等迭代过程求解。然而,当训练神经网络时,由于样本子集的随机选择,损失函数随(迭代)时间而变化。这种随机化将优化问题转化为随机问题。我们建议考虑一些参考优化的损失作为嘈杂的观察。这种对损失的解释使我们可以采用Kalman滤波作为优化器,因为它的递推公式是用来从噪声测量中估计未知参数的。此外,我们还证明了未知参数演化的Kalman滤波动力学模型可以用来捕捉动量和Adam等先进方法的梯度动力学。我们称这种随机优化方法为KaFiStO。KaFiStO是一种易于实现、可扩展、高效的神经网络训练方法。我们表明,它也产生参数估计,与现有的优化算法相比,在多个神经网络架构和机器学习任务,如计算机视觉和语言建模。 摘要:Optimization is often cast as a deterministic problem, where the solution is found through some iterative procedure such as gradient descent. However, when training neural networks the loss function changes over (iteration) time due to the randomized selection of a subset of the samples. This randomization turns the optimization problem into a stochastic one. We propose to consider the loss as a noisy observation with respect to some reference optimum. This interpretation of the loss allows us to adopt Kalman filtering as an optimizer, as its recursive formulation is designed to estimate unknown parameters from noisy measurements. Moreover, we show that the Kalman Filter dynamical model for the evolution of the unknown parameters can be used to capture the gradient dynamics of advanced methods such as Momentum and Adam. We call this stochastic optimization method KaFiStO. KaFiStO is an easy to implement, scalable, and efficient method to train neural networks. We show that it also yields parameter estimates that are on par with or better than existing optimization algorithms across several neural network architectures and machine learning tasks, such as computer vision and language modeling.
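The predict/update recursion that KaFiStO builds on can be illustrated with the textbook scalar Kalman filter estimating an (almost) static unknown from noisy observations (a toy illustration of the filtering machinery only; KaFiStO's actual formulation treats the *loss value* as the noisy observation and handles full parameter vectors):

```python
def kalman_estimate(zs, q=1e-5, r=0.1, x0=0.0, p0=1.0):
    """Scalar Kalman filter: each z is a noisy measurement of unknown x.

    q: process noise (drift of the unknown), r: measurement noise,
    x0/p0: prior mean and variance.
    """
    x, p = x0, p0
    for z in zs:
        p += q               # predict: uncertainty grows slightly
        k = p / (p + r)      # Kalman gain
        x += k * (z - x)     # update toward the innovation
        p *= (1 - k)         # posterior uncertainty shrinks
    return x

# repeated measurements of the value 5.0 pull the estimate from 0 to ~5
est = kalman_estimate([5.0] * 50)
```

The gain `k` automatically balances trust in the prior estimate against trust in each new measurement, which is the behavior KaFiStO exploits in place of a hand-tuned learning rate.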
【3】 Combined Global and Local Search for Optimization with Gaussian Process Models 标题:基于高斯过程模型的全局搜索和局部搜索相结合的优化算法
作者:Qun Meng,Songhao Wang,Szu Hui Ng 机构:Department of Industrial Systems Engineering and Management, National University of Singapore, Singapore , Department of Information Systems and Management Engineering, Southern University of Science and Technology, No. 链接:https://arxiv.org/abs/2107.03217 摘要:基于高斯过程模型的优化在仿真和机器学习中有着广泛的应用。一般情况下,它首先根据真实响应中的一些观测值来估计GP模型,然后利用该模型来指导搜索,目的是快速找到全局最优解。尽管它的应用很成功,但它有一些限制,可能会阻碍它的广泛应用。首先,建立一个精确的GP模型可能是困难的和计算昂贵的,特别是当响应函数是多模态的或在设计空间内变化很大时。第二,即使有适当的模型,搜索过程也可能在移动到全局最优之前陷入次优区域,这是因为在当前最佳解周围花费了过多的精力。本文在优化框架中采用了加性全局和局部GP(AGLGP)模型。该模型以基于诱导点的GP稀疏逼近为基础,结合不同区域的独立局部模型。由于这些特性,AGLGP模型适用于数据量相对较大的多模态响应。在此模型的基础上,提出了一种全局和局部搜索相结合的优化算法。它首先将整个设计空间划分为不相交的局部区域,并用全局模型确定一个有希望的区域。然后,在选定区域中建立一个局部模型来指导该区域内的详细搜索。当找到一个好的局部解时,该算法将切换回全局步长。CGLO算法的全局性和局部性使得它能够同时利用全局搜索和局部搜索的优点来有效地定位全局最优解。 摘要:Gaussian process (GP) model based optimization is widely applied in simulation and machine learning. In general, it first estimates a GP model based on a few observations from the true response and then employs this model to guide the search, aiming to quickly locate the global optimum. Despite its successful applications, it has several limitations that may hinder its broader usage. First, building an accurate GP model can be difficult and computationally expensive, especially when the response function is multi-modal or varies significantly over the design space. Second, even with an appropriate model, the search process can be trapped in suboptimal regions before moving to the global optimum due to the excessive effort spent around the current best solution. In this work, we adopt the Additive Global and Local GP (AGLGP) model in the optimization framework. The model is rooted in the inducing-points-based GP sparse approximations and is combined with independent local models in different regions. With these properties, the AGLGP model is suitable for multi-modal responses with relatively large data sizes. 
Based on this AGLGP model, we propose a Combined Global and Local search for Optimization (CGLO) algorithm. It first divides the whole design space into disjoint local regions and identifies a promising region with the global model. Next, a local model in the selected region is fit to guide detailed search within this region. The algorithm then switches back to the global step when a good local solution is found. The global and local natures of CGLO enable it to enjoy the benefits of both global and local search to efficiently locate the global optimum.
【4】 Distributed stochastic optimization with large delays 标题:大延迟分布式随机优化
作者:Zhengyuan Zhou,Panayotis Mertikopoulos,Nicholas Bambos,Peter W. Glynn,Yinyu Ye 备注:41 pages, 8 figures; to be published in Mathematics of Operations Research 链接:https://arxiv.org/abs/2107.02919 摘要:分布式异步随机梯度下降(DASGD)算法是解决大规模随机优化问题最广泛使用的方法之一,它是在分布式计算体系结构上(可能异步地)并行化随机梯度下降而产生的一系列算法。然而,DASGD有效实现的一个关键障碍是延迟问题:当一个计算节点贡献一个梯度更新时,全局模型参数可能已经被其他节点更新了好几次,从而使得这个梯度信息过时。如果节点的计算吞吐量饱和,这些延迟会迅速累积,因此在存在大延迟的情况下,DASGD的收敛性可能会受到影响。我们的第一个贡献是,通过仔细调整算法的步长,即使延迟以多项式速率无界增长,仍然可以在均方意义下收敛到临界集。我们还对一类广泛的结构化优化问题(称为变分相干,variationally coherent)建立了更精细的结果,其中我们证明了在相同的延迟假设下,DASGD以概率$1$收敛到一个全局最优解。总之,这些结果通过提供最先进的理论保证和算法设计见解,为大规模非凸随机优化的广阔图景作出了贡献。 摘要:One of the most widely used methods for solving large-scale stochastic optimization problems is distributed asynchronous stochastic gradient descent (DASGD), a family of algorithms that result from parallelizing stochastic gradient descent on distributed computing architectures (possibly) asynchronously. However, a key obstacle in the efficient implementation of DASGD is the issue of delays: when a computing node contributes a gradient update, the global model parameter may have already been updated by other nodes several times over, thereby rendering this gradient information stale. These delays can quickly add up if the computational throughput of a node is saturated, so the convergence of DASGD may be compromised in the presence of large delays. Our first contribution is that, by carefully tuning the algorithm's step-size, convergence to the critical set is still achieved in mean square, even if the delays grow unbounded at a polynomial rate. We also establish finer results in a broad class of structured optimization problems (called variationally coherent), where we show that DASGD converges to a global optimum with probability $1$ under the same delay assumptions. Together, these results contribute to the broad landscape of large-scale non-convex stochastic optimization by offering state-of-the-art theoretical guarantees and providing insights for algorithm design.
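The staleness problem and the step-size remedy can be seen in a one-dimensional toy: update with a gradient that is a fixed number of steps old, but shrink the step-size over time so the staleness is absorbed (a deterministic illustration with an arbitrary $t^{-3/4}$ schedule and delay; the paper's setting is stochastic with growing, random delays):

```python
def delayed_sgd(grad, x0, steps, delay, lr0=0.1):
    """Gradient descent whose update at step t uses the gradient
    evaluated at the iterate from `delay` steps earlier."""
    xs = [float(x0)]
    for t in range(steps):
        stale = xs[max(0, t - delay)]          # stale iterate
        lr = lr0 / (1 + t) ** 0.75             # decaying step-size
        xs.append(xs[-1] - lr * grad(stale))
    return xs[-1]

# minimize f(x) = x^2 with gradients delayed by 10 steps
x_final = delayed_sgd(lambda x: 2 * x, x0=5.0, steps=3000, delay=10)
```

With a constant large step-size the same recursion can oscillate or diverge, since each update reacts to where the iterate *was*; the decaying schedule is what restores convergence.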
预测|估计(4篇)
【1】 Regularization-based Continual Learning for Fault Prediction in Lithium-Ion Batteries 标题:基于正则化的连续学习在锂离子电池故障预测中的应用
作者:Benjamin Maschler,Sophia Tatiyosyan,Michael Weyrich 机构:a University of Stuttgart, Institute of Industrial Automation and Software Engineering, Pfaffenwaldring , Stuttgart, Germany 备注:6 pages, 5 figures, 4 tables. Accepted at CIRP ICME 2021. arXiv admin note: text overlap with arXiv:2101.00509 链接:https://arxiv.org/abs/2107.03336 摘要:近年来,锂离子电池的使用已大大扩展到许多工业部门的产品,如汽车、电动工具或医疗设备。因此,对电池故障的早期预测和强有力的理解可以大大提高这些领域的产品质量。虽然目前的数据驱动故障预测方法在其训练的确切过程中提供了良好的结果,但它们往往缺乏灵活适应变化的能力,例如在操作或环境参数方面。持续的学习保证了这样的灵活性,允许以前学习的知识自动适应新的任务。因此,本文从一组正则化策略中讨论了不同的持续学习方法,并基于一个真实的电池磨损数据集对这些方法进行了实现、评估和比较。在线弹性权重整合提供了最好的结果,但是,与所有被检查的方法一样,它的性能似乎强烈依赖于任务特征和任务序列。 摘要:In recent years, the use of lithium-ion batteries has greatly expanded into products from many industrial sectors, e.g. cars, power tools or medical devices. An early prediction and robust understanding of battery faults could therefore greatly increase product quality in those fields. While current approaches for data-driven fault prediction provide good results on the exact processes they were trained on, they often lack the ability to flexibly adapt to changes, e.g. in operational or environmental parameters. Continual learning promises such flexibility, allowing for an automatic adaption of previously learnt knowledge to new tasks. Therefore, this article discusses different continual learning approaches from the group of regularization strategies, which are implemented, evaluated and compared based on a real battery wear dataset. Online elastic weight consolidation delivers the best results, but, as with all examined approaches, its performance appears to be strongly dependent on task characteristics and task sequence.
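The best-performing strategy above, elastic weight consolidation, augments the new task's loss with a quadratic penalty that anchors parameters important to the previous task. A minimal sketch of that penalty (toy values; in practice `fisher` is the diagonal Fisher information estimated on the previous task's data):

```python
import numpy as np

def ewc_loss(task_loss, theta, theta_star, fisher, lam=1.0):
    """New-task loss plus the EWC penalty (lam/2) * sum_i F_i (θ_i - θ*_i)^2.

    theta_star: parameters after the previous task; fisher: per-parameter
    importance weights, so important weights are "elastic" but anchored.
    """
    penalty = 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)
    return float(task_loss + penalty)

# parameter 2 is twice as important (fisher=2), so drifting it costs more
loss = ewc_loss(1.0,
                theta=np.array([1.0, 2.0]),
                theta_star=np.zeros(2),
                fisher=np.array([1.0, 2.0]),
                lam=2.0)
```

During continual training, gradients of this combined loss let the model adapt to new battery-wear regimes while resisting forgetting of previously learned ones.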
【2】 Predicting with Confidence on Unseen Distributions 标题:不可见分布的置信度预测
作者:Devin Guillory,Vaishaal Shankar,Sayna Ebrahimi,Trevor Darrell,Ludwig Schmidt 机构:UC Berkeley, Amazon, Toyota Research Institute 链接:https://arxiv.org/abs/2107.03315 摘要:最近的研究表明,当对来自于接近但不同于训练分布的分布的数据进行评估时,机器学习模型的性能会有很大的不同。因此,预测模型在未知分布上的性能是一个重要的挑战。我们的工作结合了领域适应和预测不确定性文献中的技术,使我们能够在不访问标记数据的情况下预测模型在具有挑战性的未知分布上的精度。在分布偏移的背景下,分布距离常常被用来调整模型并改善其在新领域的性能,然而在这些研究中,精度估计或其他形式的预测不确定性常常被忽略。通过考察多种成熟的分布距离(如Frechet距离或最大平均差异),我们确定它们无法在分布偏移下给出可靠的性能估计。另一方面,我们发现分类器预测的置信度差异(DoC)成功地估计了分类器在各种偏移下的性能变化。我们特别研究了合成分布偏移和自然分布偏移之间的区别,并观察到尽管DoC很简单,但它始终优于其他分布差异的量化方法。DoC可将几个现实且具有挑战性的分布偏移的预测误差减少近一半($46\%$),例如,在ImageNet-Vid-Robust和ImageNet-Rendition数据集上。 摘要:Recent work has shown that the performance of machine learning models can vary substantially when models are evaluated on data drawn from a distribution that is close to but different from the training distribution. As a result, predicting model performance on unseen distributions is an important challenge. Our work connects techniques from domain adaptation and predictive uncertainty literature, and allows us to predict model accuracy on challenging unseen distributions without access to labeled data. In the context of distribution shift, distributional distances are often used to adapt models and improve their performance on new domains, however accuracy estimation, or other forms of predictive uncertainty, are often neglected in these investigations. Through investigating a wide range of established distributional distances, such as Frechet distance or Maximum Mean Discrepancy, we determine that they fail to induce reliable estimates of performance under distribution shift. On the other hand, we find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts. 
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that, despite its simplicity, DoC consistently outperforms other quantifications of distributional difference. DoC reduces predictive error by almost half ($46\%$) on several realistic and challenging distribution shifts, e.g., on the ImageNet-Vid-Robust and ImageNet-Rendition datasets.
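The DoC idea reduces to simple arithmetic: the drop in mean max-softmax confidence between source and shifted data serves as a label-free proxy for the drop in accuracy (a minimal sketch; the confidence values and base accuracy below are made up for illustration):

```python
import numpy as np

def doc_predicted_accuracy(base_conf, shift_conf, base_acc):
    """Difference-of-Confidences (DoC) accuracy estimate.

    base_conf / shift_conf: per-example max-softmax confidences on the
    source and shifted distributions; base_acc: labeled source accuracy.
    No labels on the shifted data are needed.
    """
    doc = float(np.mean(base_conf)) - float(np.mean(shift_conf))
    return base_acc - doc

# confidence dips from 0.90 to 0.78 under shift -> predict a ~12pt accuracy drop
pred_acc = doc_predicted_accuracy(np.array([0.92, 0.88]),
                                  np.array([0.80, 0.76]),
                                  base_acc=0.85)
```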
【3】 Intensity Prediction of Tropical Cyclones using Long Short-Term Memory Network 标题:利用长短期记忆网络预报热带气旋强度
作者:Koushik Biswas,Sandeep Kumar,Ashish Kumar Pandey 机构:Department of Computer Science, IIIT Delhi, New Delhi, India & Shaheed Bhagat Singh College, University of Delhi; Department of Mathematics, IIIT Delhi 备注:10 pages 链接:https://arxiv.org/abs/2107.03187 摘要:热带气旋的强度是多种多样的,如果强度足够大,就会造成巨大的生命和财产损失。因此,提前预报热带气旋的强度具有十分重要的意义。我们提出了一种新的基于叠加双向长短时记忆网络(BiLSTM)的模型结构,以最大地表持续风速(MSWS)来预测热带气旋的强度。该模型能提前很长时间(最多72小时)高精度地预测MSWS。我们将该模型应用于1982年至2018年北印度洋热带气旋,并在最近的两个热带气旋Fani和Vayu上检验了其性能。该模型预测未来3、12、24、36、48、60和72小时的MSWS(以节为单位),平均绝对误差分别为1.52、3.66、5.88、7.42、8.96、10.15和11.92。 摘要:Tropical cyclones can be of varied intensity and cause a huge loss of lives and property if the intensity is high enough. Therefore, predicting the intensity of tropical cyclones well in advance is of utmost importance. We propose a novel stacked bidirectional long short-term memory network (BiLSTM) based model architecture to predict the intensity of a tropical cyclone in terms of Maximum surface sustained wind speed (MSWS). The proposed model can predict MSWS well in advance (up to 72 h) with very high accuracy. We have applied the model on tropical cyclones in the North Indian Ocean from 1982 to 2018 and checked its performance on two recent tropical cyclones, namely, Fani and Vayu. The model predicts MSWS (in knots) for the next 3, 12, 24, 36, 48, 60, and 72 hours with a mean absolute error of 1.52, 3.66, 5.88, 7.42, 8.96, 10.15, and 11.92, respectively.
【4】 Big Data Information and Nowcasting: Consumption and Investment from Bank Transactions in Turkey 标题:大数据信息与现在预测:来自土耳其银行交易的消费和投资
作者:Ali B. Barlas,Seda Guler Mert,Berk Orkun Isa,Alvaro Ortiz,Tomasa Rodrigo,Baris Soybilgen,Ege Yazgan 机构: Baris Soybil-gen (Bilgi University) and Ege Yazgan (Bilgi University)AbstractWe use the aggregate information from individual-to-firm and firm-to-firm in GarantiBBVA Bank transactions to mimic domestic private demand 备注:31 pages, 7 figures, 9 tables 链接:https://arxiv.org/abs/2107.03299 摘要:我们利用加兰蒂BBVA银行交易中从个人到企业和企业到企业的总信息来模拟国内私人需求。特别是,我们以土耳其为例,实时复制季度国民账户总消费和投资(固定资本形成总额)及其较大组成部分(机械设备和建筑)。为了验证来自这些指标的信息的有用性,我们使用不同的即时预测模型测试了这两个指标对土耳其国内生产总值的即时预测能力。结果是成功的,并证实了有用的消费和投资银行交易的即时预测目的。当传统的硬数据信息稀缺时,大数据信息的价值在即时预测过程开始时更具相关性。这使得这些信息特别适用于那些统计数据发布滞后时间较长的国家,比如新兴市场。 摘要:We use the aggregate information from individual-to-firm and firm-to-firm in Garanti BBVA Bank transactions to mimic domestic private demand. Particularly, we replicate the quarterly national accounts aggregate consumption and investment (gross fixed capital formation) and its bigger components (Machinery and Equipment and Construction) in real time for the case of Turkey. In order to validate the usefulness of the information derived from these indicators we test the nowcasting ability of both indicators to nowcast the Turkish GDP using different nowcasting models. The results are successful and confirm the usefulness of Consumption and Investment Banking transactions for nowcasting purposes. The value of the Big data information is more relevant at the beginning of the nowcasting process, when the traditional hard data information is scarce. This makes this information specially relevant for those countries where statistical release lags are longer like the Emerging Markets.
其他神经网络|深度学习|模型|建模(6篇)
【1】 Evaluating Large Language Models Trained on Code 标题:评估针对代码训练的大型语言模型
作者:Mark Chen,Jerry Tworek,Heewoo Jun,Qiming Yuan,Henrique Ponde,Jared Kaplan,Harri Edwards,Yura Burda,Nicholas Joseph,Greg Brockman,Alex Ray,Raul Puri,Gretchen Krueger,Michael Petrov,Heidy Khlaaf,Girish Sastry,Pamela Mishkin,Brooke Chan,Scott Gray,Nick Ryder,Mikhail Pavlov,Alethea Power,Lukasz Kaiser,Mohammad Bavarian,Clemens Winter,Philippe Tillet,Felipe Such,Dave Cummings,Matthias Plappert,Fotios Chantzis,Elizabeth Barnes,Ariel Herbert-Voss,Will Guss,Alex Nichol,Igor Babuschkin,Suchir Balaji,Shantanu Jain,Andrew Carr,Jan Leike,Josh Achiam,Vedant Misra,Evan Morikawa,Alec Radford,Matthew Knight,Miles Brundage,Mira Murati,Katie Mayer,Peter Welinder,Bob McGrew,Dario Amodei,Sam McCandlish,Ilya Sutskever,Wojciech Zaremba 链接:https://arxiv.org/abs/2107.03374 摘要:我们介绍Codex,这是一个在GitHub公开代码上微调的GPT语言模型,并研究了它编写Python代码的能力。Codex的一个专门生产版本为GitHub Copilot提供支持。在HumanEval(我们发布的一个新评估集,用于衡量从docstring合成程序的功能正确性)上,我们的模型解决了28.8%的问题,而GPT-3解决了0%,GPT-J解决了11.4%。此外,我们发现,从模型中重复采样是为困难提示生成可用解决方案的一种出人意料的有效策略。使用这种方法,在每个问题采样100次的情况下,我们解决了70.2%的问题。对模型的仔细考察揭示了它的局限性,包括难以处理描述长操作链的docstring,以及难以将操作绑定到变量。 摘要:We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables.
Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
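The headline numbers above rest on the pass@k metric; a minimal implementation of the unbiased estimator the paper describes, 1 - C(n-c, k)/C(n, k) given n samples of which c pass the tests, might look as follows.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples with c correct ones."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples and 3 correct, pass@1 reduces to c/n = 0.3.
print(round(pass_at_k(10, 3, 1), 6))  # 0.3
print(pass_at_k(10, 10, 5))           # 1.0: all samples are correct
```

The subtraction form avoids the numerical pitfalls of multiplying many small probabilities when n is large.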
【2】 RISAN: Robust Instance Specific Abstention Network 标题:RISAN:健壮的实例特定弃权网络
作者:Bhavya Kalra,Kulin Shah,Naresh Manwani 机构:Machine Learning Lab, International Institute of Information Technology, Hyderabad, India, Microsoft Research, Bangalore, India 链接:https://arxiv.org/abs/2107.03090 摘要:在本文中,我们提出了用于学习实例特定弃权(拒绝选项)二分类器的深层结构。所提出的方法使用Kulin Shah和Naresh Manwani在("Online Active Learning of Reject Option Classifiers", AAAI, 2020)中描述的双sigmoid损失函数作为性能度量。我们证明了双sigmoid损失是分类校准的。我们还证明了0-d-1损失的超额风险以双sigmoid损失的超额风险为上界。我们推导了所提出的拒绝选项分类器结构的泛化误差界。为了验证所提出方法的有效性,我们在多个真实数据集上进行了实验。我们观察到,所提出的方法不仅性能与最先进的方法相当,而且对标签噪声具有鲁棒性。我们还提供了可视化,以观察网络学习到的与弃权决定相对应的重要特征。 摘要:In this paper, we propose deep architectures for learning instance specific abstain (reject option) binary classifiers. The proposed approach uses double sigmoid loss function as described by Kulin Shah and Naresh Manwani in ("Online Active Learning of Reject Option Classifiers", AAAI, 2020), as a performance measure. We show that the double sigmoid loss is classification calibrated. We also show that the excess risk of 0-d-1 loss is upper bounded by the excess risk of double sigmoid loss. We derive the generalization error bounds for the proposed architecture for reject option classifiers. To show the effectiveness of the proposed approach, we experiment with several real world datasets. We observe that the proposed approach not only performs comparable to the state-of-the-art approaches, it is also robust against label noise. We also provide visualizations to observe the important features learned by the network corresponding to the abstaining decision.
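As background for the reject-option setting (a generic sketch, not RISAN's learned, instance-specific architecture), a classifier with an abstention band makes three-way decisions: predict a label when the score is confident, abstain otherwise.

```python
def reject_option_predict(score, band):
    """Generic reject-option rule: predict the sign of the score unless
    it falls inside the abstention band (the band may vary per instance)."""
    if abs(score) <= band:
        return "abstain"
    return 1 if score > 0 else -1

# A confident positive, a confident negative, and a borderline case:
print(reject_option_predict(0.9, 0.3))   # 1
print(reject_option_predict(-0.7, 0.3))  # -1
print(reject_option_predict(0.1, 0.3))   # abstain
```

In the 0-d-1 risk mentioned above, each abstention incurs cost d, so widening the band trades prediction errors against rejection cost.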
【3】 Discriminative Mutual Information Estimators for Channel Capacity Learning 标题:用于信道容量学习的鉴别互信息估计器
作者:Nunzio A. Letizia,Andrea M. Tonello 机构:University of Klagenfurt, Chair of Embedded Communication Systems 备注:13 pages, 4 figures 链接:https://arxiv.org/abs/2107.03084 摘要:信道容量在现代通信系统的发展中起着至关重要的作用,因为它代表了信息在通信信道上可靠传输的最大速率。然而,对于大多数信道来说,找到一个封闭形式的容量表达式仍然是一个开放的挑战。这是因为它需要执行两项艰巨的任务:a)计算信道输入和输出之间的互信息;b)针对信道输入处的信号分布将其最大化。在本文中,我们处理这两个任务。受隐式生成模型的启发,我们提出了一种新的协作框架,可以自动学习任何类型的无记忆信道的信道容量。特别地,我们首先开发了一种新的方法,直接从通常用于训练对抗网络的鉴别器中估计互信息,称为判别互信息估计器(DIME)。其次,我们将鉴别器纳入称为CORTICAL的合作信道容量学习框架,其中鉴别器学习区分相关和独立的信道输入-输出样本,而生成器学习产生使鉴别器表现最佳的最优信道输入分布。 摘要:Channel capacity plays a crucial role in the development of modern communication systems as it represents the maximum rate at which information can be reliably transmitted over a communication channel. Nevertheless, for the majority of channels, finding a closed-form capacity expression remains an open challenge. This is because it requires to carry out two formidable tasks a) the computation of the mutual information between the channel input and output, and b) its maximization with respect to the signal distribution at the channel input. In this paper, we address both tasks. Inspired by implicit generative models, we propose a novel cooperative framework to automatically learn the channel capacity, for any type of memory-less channel. In particular, we firstly develop a new methodology to estimate the mutual information directly from a discriminator typically deployed to train adversarial networks, referred to as discriminative mutual information estimator (DIME). Secondly, we include the discriminator in a cooperative channel capacity learning framework, referred to as CORTICAL, where a discriminator learns to distinguish between dependent and independent channel input-output samples while a generator learns to produce the optimal channel input distribution for which the discriminator exhibits the best performance.
Lastly, we prove that a particular choice of the cooperative value function solves the channel capacity estimation problem. Simulation results demonstrate that the proposed method offers high accuracy.
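DIME recovers mutual information from a density ratio that a discriminator learns to approximate. As a sanity check of the underlying identity (with the ratio computed from known densities rather than a trained discriminator, so this is not the paper's method), the Monte Carlo average of the log density ratio matches the closed-form Gaussian mutual information:

```python
import math, random

random.seed(0)
rho = 0.8
# Closed form for a bivariate Gaussian: I(X;Y) = -0.5 * ln(1 - rho^2).
exact = -0.5 * math.log(1 - rho**2)

def log_ratio(x, y):
    """log p(x,y) - log p(x) - log p(y) for a standard bivariate Gaussian."""
    joint = (-(x*x - 2*rho*x*y + y*y) / (2*(1 - rho**2))
             - math.log(2*math.pi) - 0.5*math.log(1 - rho**2))
    marginals = -0.5*(x*x + y*y) - math.log(2*math.pi)
    return joint - marginals

n, est = 200_000, 0.0
for _ in range(n):
    x = random.gauss(0, 1)
    y = rho * x + math.sqrt(1 - rho**2) * random.gauss(0, 1)
    est += log_ratio(x, y)
est /= n
print(abs(est - exact) < 0.02)  # True: MC average matches the closed form
```

A discriminator-based estimator replaces `log_ratio` with a learned function of the samples; everything else stays the same.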
【4】 Harnessing Heterogeneity: Learning from Decomposed Feedback in Bayesian Modeling 标题:利用异构性:从贝叶斯建模中分解的反馈中学习
作者:Kai Wang,Bryan Wilder,Sze-chuan Suen,Bistra Dilkina,Milind Tambe 机构:Harvard University, USA, University of Southern California, USA 链接:https://arxiv.org/abs/2107.03003 摘要:学习和优化一个由多个子组件组成的复杂系统,其中这些组件可以是代理或自主传感器,这引起了人们极大的兴趣。在这方面的丰富文献中,基于agent和特定领域的仿真可以捕获复杂的动力学和子组交互,但是在这样的仿真上进行优化在计算和算法上都具有挑战性。贝叶斯方法,如高斯过程(GPs),可以用来学习一个计算上易于处理的近似基础动力学,但通常忽略了有关复杂系统中子群的详细信息。我们试图通过提出分解反馈的思想来找到两个世界中最好的一个,它捕获了基于组的异质性和动态性。我们引入了一种新的分解GP回归方法来结合子组分解反馈。与以前的方法相比,我们的修正回归具有更低的方差,因此后验概率更准确;它还允许我们引入一个分解GP-UCB优化算法,利用子组反馈。该方法的贝叶斯性质使得优化算法易于处理,并具有收敛性和无遗憾性的理论保证。为了证明这项工作的广泛适用性,我们在两个不同的社会问题上执行了我们的算法:异质人群中的传染病控制和分布式天气传感器的分配。实验结果表明,与现有方法相比,新方法有了显著的改进。 摘要:There is significant interest in learning and optimizing a complex system composed of multiple sub-components, where these components may be agents or autonomous sensors. Among the rich literature on this topic, agent-based and domain-specific simulations can capture complex dynamics and subgroup interaction, but optimizing over such simulations can be computationally and algorithmically challenging. Bayesian approaches, such as Gaussian processes (GPs), can be used to learn a computationally tractable approximation to the underlying dynamics but typically neglect the detailed information about subgroups in the complicated system. We attempt to find the best of both worlds by proposing the idea of decomposed feedback, which captures group-based heterogeneity and dynamics. We introduce a novel decomposed GP regression to incorporate the subgroup decomposed feedback. Our modified regression has provably lower variance -- and thus a more accurate posterior -- compared to previous approaches; it also allows us to introduce a decomposed GP-UCB optimization algorithm that leverages subgroup feedback. The Bayesian nature of our method makes the optimization algorithm tractable with a theoretical guarantee on convergence and no-regret property.
To demonstrate the wide applicability of this work, we execute our algorithm on two disparate social problems: infectious disease control in a heterogeneous population and allocation of distributed weather sensors. Experimental results show that our new method provides significant improvement compared to the state-of-the-art.
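For readers unfamiliar with the GP building block the paper modifies, here is a minimal noise-free GP posterior mean in one dimension (a textbook sketch with a closed-form 2x2 inverse, not the decomposed variant):

```python
import math

def k(a, b):
    """Squared-exponential kernel, unit length-scale and variance."""
    return math.exp(-0.5 * (a - b) ** 2)

# Two training points; invert the 2x2 Gram matrix in closed form.
X, y = [0.0, 1.0], [0.0, 1.0]
jitter = 1e-8  # tiny diagonal term for numerical stability
K11, K12, K22 = k(X[0], X[0]) + jitter, k(X[0], X[1]), k(X[1], X[1]) + jitter
det = K11 * K22 - K12 * K12
inv = [[K22 / det, -K12 / det], [-K12 / det, K11 / det]]
alpha = [inv[0][0] * y[0] + inv[0][1] * y[1],
         inv[1][0] * y[0] + inv[1][1] * y[1]]

def posterior_mean(x):
    return k(x, X[0]) * alpha[0] + k(x, X[1]) * alpha[1]

# The posterior mean interpolates the noise-free training targets:
print(abs(posterior_mean(0.0) - 0.0) < 1e-6)  # True
print(abs(posterior_mean(1.0) - 1.0) < 1e-6)  # True
print(0.0 < posterior_mean(0.5) < 1.0)        # True: smooth in between
```

The decomposed regression of the paper fits one such GP per subgroup signal and sums them, which is where the variance reduction comes from.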
【5】 Principles for Evaluation of AI/ML Model Performance and Robustness 标题:AI/ML模型性能和稳健性的评价原则
作者:Olivia Brown,Andrew Curtis,Justin Goodwin 链接:https://arxiv.org/abs/2107.02868 摘要:美国国防部(DoD)已经大幅增加了对人工智能和机器学习(AI/ML)能力的设计、评估和部署的投资,以满足国家安全需求。虽然AI/ML在学术和商业领域取得了许多成功,但其中许多系统也被证明是脆弱和不健壮的。在复杂和不断变化的国家安全环境中,在这些新能力部署到战场之前,国防部必须建立一个健全和系统的过程来评估AI/ML模型的性能和健壮性。本文回顾了AI/ML开发过程,重点介绍了AI/ML模型评估的常见最佳实践,并向国防部评估人员提出了建议,以确保为国家安全需要部署强大的AI/ML能力。 摘要:The Department of Defense (DoD) has significantly increased its investment in the design, evaluation, and deployment of Artificial Intelligence and Machine Learning (AI/ML) capabilities to address national security needs. While there are numerous AI/ML successes in the academic and commercial sectors, many of these systems have also been shown to be brittle and nonrobust. In a complex and ever-changing national security environment, it is vital that the DoD establish a sound and methodical process to evaluate the performance and robustness of AI/ML models before these new capabilities are deployed to the field. This paper reviews the AI/ML development process, highlights common best practices for AI/ML model evaluation, and makes recommendations to DoD evaluators to ensure the deployment of robust AI/ML capabilities for national security needs.
【6】 Immuno-mimetic Deep Neural Networks (Immuno-Net) 标题:免疫仿生深度神经网络(Immuno-Net)
作者:Ren Wang,Tianqi Chen,Stephen Lindsly,Cooper Stansbury,Indika Rajapakse,Alfred Hero 机构:University of Michigan 链接:https://arxiv.org/abs/2107.02842 摘要:仿生学在人工神经网络的进化中起着关键的作用。到目前为止,这类计算机模拟(in silico)隐喻一直由神经科学和认知心理学的概念主导。在本文中,我们介绍了一种不同类型的仿生模型,它借用免疫系统的概念来设计健壮的深层神经网络。这种免疫模拟模型为深层神经网络对抗对抗性攻击提供了一个新的计算生物学框架。在这个Immuno-Net框架中,我们定义了一个健壮的自适应免疫启发学习系统(Immuno-Net RAILS),它在计算机中模拟B细胞用于保护哺乳动物宿主免受致病性攻击的自适应生物学机制。当应用于基准数据集上的图像分类任务时,我们证明了Immuno-Net RAILS可以将基线方法(DkNN加固的CNN)的对抗性精度提高多达12.5%,而在干净数据上没有明显的精度损失。 摘要:Biomimetics has played a key role in the evolution of artificial neural networks. Thus far, in silico metaphors have been dominated by concepts from neuroscience and cognitive psychology. In this paper we introduce a different type of biomimetic model, one that borrows concepts from the immune system, for designing robust deep neural networks. This immuno-mimetic model leads to a new computational biology framework for robustification of deep neural networks against adversarial attacks. Within this Immuno-Net framework we define a robust adaptive immune-inspired learning system (Immuno-Net RAILS) that emulates, in silico, the adaptive biological mechanisms of B-cells that are used to defend a mammalian host against pathogenic attacks. When applied to image classification tasks on benchmark datasets, we demonstrate that Immuno-net RAILS results in improvement of as much as 12.5% in adversarial accuracy of a baseline method, the DkNN-robustified CNN, without appreciable loss of accuracy on clean data.
其他(16篇)
【1】 Mitigating Performance Saturation in Neural Marked Point Processes: Architectures and Loss Functions 标题:减轻神经标记点过程的性能饱和:结构和损失函数
作者:Tianbo Li,Tianze Luo,Yiping Ke,Sinno Jialin Pan 机构:Sea AI Lab, Singapore, Nanyang Technological University 备注:9 pages, 4 figures, accepted by KDD-21 research track. The source code is available at this https URL Hawkes-Processes-GCHP 链接:https://arxiv.org/abs/2107.03354 摘要:属性化事件序列在实践中经常遇到。最近的一个研究方向是将神经网络与统计模型——标记点过程相结合,标记点过程是处理属性事件序列的传统工具。神经标记点过程具有很好的概率模型解释能力和神经网络的表示能力。然而,我们发现神经标记点过程的性能并不总是随着网络结构的复杂化和大型化而提高,这就是我们所说的性能饱和现象。这是由于神经标记点过程的泛化误差同时由网络的表示能力和模型规格决定的。因此,我们可以得出两个主要结论:第一,在某些情况下,简单的网络结构并不比复杂的网络结构差;其次,使用适当的概率假设与提高网络的复杂性同等重要,甚至更重要。基于这一观察,我们提出了一种简单的基于图的网络结构GCHP,它只使用图卷积层,因此可以很容易地被并行机制加速。我们直接考虑到达时间的分布,而不是对条件强度函数施加特定假设,并提出使用似然比损失与矩匹配机制进行优化和模型选择。实验结果表明,GCHP能显著减少训练时间,而在间隔时间概率假设下的似然比损失能显著提高模型性能。 摘要:Attributed event sequences are commonly encountered in practice. A recent research line focuses on incorporating neural networks with the statistical model -- marked point processes, which is the conventional tool for dealing with attributed event sequences. Neural marked point processes possess good interpretability of probabilistic models as well as the representational power of neural networks. However, we find that performance of neural marked point processes is not always increasing as the network architecture becomes more complicated and larger, which is what we call the performance saturation phenomenon. This is due to the fact that the generalization error of neural marked point processes is determined by both the network representational ability and the model specification at the same time. Therefore we can draw two major conclusions: first, simple network structures can perform no worse than complicated ones for some cases; second, using a proper probabilistic assumption is as equally, if not more, important as improving the complexity of the network. Based on this observation, we propose a simple graph-based network structure called GCHP, which utilizes only graph convolutional layers, thus it can be easily accelerated by the parallel mechanism. 
We directly consider the distribution of interarrival times instead of imposing a specific assumption on the conditional intensity function, and propose to use a likelihood ratio loss with a moment matching mechanism for optimization and model selection. Experimental results show that GCHP can significantly reduce training time and the likelihood ratio loss with interarrival time probability assumptions can greatly improve the model performance.
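A toy version of the modeling stance above, with a plain exponential interarrival model fit by maximum likelihood in place of GCHP's neural parameterization (the rate and sample size are made up for the example):

```python
import random

random.seed(11)
# Modeling interarrival times directly: for an exponential model the
# MLE of the rate is 1 / mean(interarrival gaps).
true_rate = 2.0
gaps = [random.expovariate(true_rate) for _ in range(50_000)]
rate_hat = len(gaps) / sum(gaps)
print(abs(rate_hat - true_rate) < 0.05)  # True: the MLE recovers the rate
```

GCHP replaces this fixed parametric family with a graph-convolutional network, but the fitted object, the interarrival-time distribution, is the same.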
【2】 Samplets: A new paradigm for data compression 标题:样本集:一种新的数据压缩范式
作者:Helmut Harbrecht,Michael Multerer 链接:https://arxiv.org/abs/2107.03337 摘要:在这篇文章中,我们通过将Tausch-White小波的构造转移到数据领域来引入样本集(samplets)的新概念。通过这种方法,我们得到了离散数据的多级表示,直接实现了数据压缩、奇异点检测和自适应。应用样本集来表示核矩阵(它们出现在基于核的学习或高斯过程回归中),我们最终得到准稀疏矩阵。通过对小条目设置阈值,这些矩阵可压缩为O(N log N)个相关条目,其中N是数据点的数目。此功能允许使用减少填充(fill-in)的重排序,以获得压缩矩阵的稀疏分解。除了全面介绍样本集及其性质外,我们还进行了大量的数值研究来验证这种方法。我们的结果表明,样本集朝着使大型数据集可供分析的方向迈出了相当大的一步。 摘要:In this article, we introduce the novel concept of samplets by transferring the construction of Tausch-White wavelets to the realm of data. This way we obtain a multilevel representation of discrete data which directly enables data compression, detection of singularities and adaptivity. Applying samplets to represent kernel matrices, as they arise in kernel based learning or Gaussian process regression, we end up with quasi-sparse matrices. By thresholding small entries, these matrices are compressible to O(N log N) relevant entries, where N is the number of data points. This feature allows for the use of fill-in reducing reorderings to obtain a sparse factorization of the compressed matrices. Besides the comprehensive introduction to samplets and their properties, we present extensive numerical studies to benchmark the approach. Our results demonstrate that samplets mark a considerable step in the direction of making large data sets accessible for analysis.
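The quasi-sparsity claim can be illustrated directly: kernel matrices of well-separated data have many tiny entries, and thresholding them leaves far fewer than N^2 (samplets achieve this in a multilevel basis; the sketch below just thresholds the raw matrix entrywise, with an assumed exponential kernel and 1-D points).

```python
import math

pts = [float(i) for i in range(200)]  # well-separated 1-D points
tau = 1e-6                            # threshold for "relevant" entries

kept = 0
for x in pts:
    for y in pts:
        if math.exp(-abs(x - y)) > tau:  # exponential kernel entry
            kept += 1

total = len(pts) ** 2
print(kept, total)        # only a thin band around the diagonal survives
print(kept < total // 2)  # True
```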
【3】 Probabilistic semi-nonnegative matrix factorization: a Skellam-based framework 标题:概率半非负矩阵分解:一个基于Skellam的框架
作者:Benoit Fuentes,Gaël Richard 备注:Submitted for publication 链接:https://arxiv.org/abs/2107.03317 摘要:我们提出了一种新的概率模型来求解半非负矩阵分解(SNMF),称为Skellam-SNMF。它是一个分层生成模型,由先验分量、服从Skellam分布的隐变量和观测数据组成。推导了两种推理算法:用于最大后验估计的期望最大化(EM)算法和用于完全贝叶斯推理的变分贝叶斯EM(VBEM)算法,后者包括参数先验分布的估计。在这个基于Skellam的模型中,我们还引入了实值目标数据$x$和两个非负参数$\lambda_{0}$、$\lambda_{1}$之间的一个新的散度$\mathcal{D}$,满足$\mathcal{D}\left(x\mid\lambda_{0},\lambda_{1}\right)=0\Leftrightarrow x=\lambda_{0}-\lambda_{1}$,它是Kullback-Leibler(KL)散度的推广。最后,我们对这些新算法进行了实验研究,以了解它们的行为,并证明它们在真实数据的自动聚类任务中可以优于经典的SNMF方法。 摘要:We present a new probabilistic model to address semi-nonnegative matrix factorization (SNMF), called Skellam-SNMF. It is a hierarchical generative model consisting of prior components, Skellam-distributed hidden variables and observed data. Two inference algorithms are derived: Expectation-Maximization (EM) algorithm for maximum a posteriori estimation and Variational Bayes EM (VBEM) for full Bayesian inference, including the estimation of parameters prior distribution. From this Skellam-based model, we also introduce a new divergence $\mathcal{D}$ between a real-valued target data $x$ and two nonnegative parameters $\lambda_{0}$ and $\lambda_{1}$ such that $\mathcal{D}\left(x\mid\lambda_{0},\lambda_{1}\right)=0\Leftrightarrow x=\lambda_{0}-\lambda_{1}$, which is a generalization of the Kullback-Leibler (KL) divergence. Finally, we conduct experimental studies on those new algorithms in order to understand their behavior and prove that they can outperform the classic SNMF approach on real data in a task of automatic clustering.
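A Skellam variable is the difference of two independent Poisson variables, which is why it can model real-valued (signed) data with nonnegative parameters; the sketch below checks the mean empirically (Knuth's Poisson sampler, illustrative parameter values):

```python
import math, random

random.seed(1)

def poisson(lam):
    """Knuth's method for sampling a Poisson variate (fine for small lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

# Skellam(lam0, lam1) = Poisson(lam0) - Poisson(lam1),
# so its mean is lam0 - lam1 (here 4.0 - 1.5 = 2.5).
lam0, lam1 = 4.0, 1.5
n = 20_000
mean = sum(poisson(lam0) - poisson(lam1) for _ in range(n)) / n
print(abs(mean - (lam0 - lam1)) < 0.1)  # True
```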
【4】 SoundStream: An End-to-End Neural Audio Codec 标题:Soundstream:一种端到端的神经音频编解码器
作者:Neil Zeghidour,Alejandro Luebs,Ahmed Omran,Jan Skoglund,Marco Tagliasacchi 链接:https://arxiv.org/abs/2107.03312 摘要:我们提出了SoundStream,一种新的神经音频编解码器,可以有效地压缩语音,音乐和一般音频比特率通常是针对语音定制编解码器。SoundStream依赖于一个模型结构,该结构由一个完全卷积的编码器/解码器网络和一个残差矢量量化器组成,它们端到端地联合训练。训练利用了文本到语音和语音增强方面的最新进展,这些技术结合了对抗性和重建损失,允许从量化嵌入中生成高质量的音频内容。通过对量化器层应用结构化丢包进行训练,单个模型可以跨3kbps到18kbps的可变比特率进行操作,与在固定比特率下训练的模型相比,质量损失可以忽略不计。此外,该模型还适用于低延迟实现,支持流式推理,并在智能手机CPU上实时运行。在使用24kHz采样率音频的主观评估中,3kbps的声音流在12kbps时优于Opus,在9.6kbps时接近EVS。此外,我们能够在编码器或解码器端执行联合压缩和增强,而无需额外的延迟,这是我们通过语音背景噪声抑制来证明的。 摘要:We present SoundStream, a novel neural audio codec that can efficiently compress speech, music and general audio at bitrates normally targeted by speech-tailored codecs. SoundStream relies on a model architecture composed by a fully convolutional encoder/decoder network and a residual vector quantizer, which are trained jointly end-to-end. Training leverages recent advances in text-to-speech and speech enhancement, which combine adversarial and reconstruction losses to allow the generation of high-quality audio content from quantized embeddings. By training with structured dropout applied to quantizer layers, a single model can operate across variable bitrates from 3kbps to 18kbps, with a negligible quality loss when compared with models trained at fixed bitrates. In addition, the model is amenable to a low latency implementation, which supports streamable inference and runs in real time on a smartphone CPU. In subjective evaluations using audio at 24kHz sampling rate, SoundStream at 3kbps outperforms Opus at 12kbps and approaches EVS at 9.6kbps. Moreover, we are able to perform joint compression and enhancement either at the encoder or at the decoder side with no additional latency, which we demonstrate through background noise suppression for speech.
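Residual vector quantization, the quantizer family used in SoundStream, can be sketched in a few lines: each stage quantizes the residual left by the previous one, and decoding sums the chosen codewords (the tiny hand-picked codebooks below are illustrative, not learned).

```python
def nearest(codebook, v):
    """Codeword closest to v in squared Euclidean distance."""
    return min(codebook, key=lambda c: sum((a - b) ** 2 for a, b in zip(c, v)))

def rvq_encode(v, codebooks):
    residual, chosen = list(v), []
    for cb in codebooks:
        c = nearest(cb, residual)
        chosen.append(c)
        residual = [r - ci for r, ci in zip(residual, c)]
    return chosen

def rvq_decode(chosen):
    out = [0.0] * len(chosen[0])
    for c in chosen:
        out = [o + ci for o, ci in zip(out, c)]
    return out

coarse = [(0.0, 0.0), (1.0, 1.0), (-1.0, -1.0)]        # stage-1 codebook
fine = [(0.0, 0.0), (0.25, 0.0), (0.0, 0.25), (-0.25, 0.0)]  # stage-2 codebook
target = [1.2, 1.05]
codes = rvq_encode(target, [coarse, fine])
approx = rvq_decode(codes)
err = sum((a - t) ** 2 for a, t in zip(approx, target)) ** 0.5
coarse_err = sum((a - t) ** 2
                 for a, t in zip(nearest(coarse, target), target)) ** 0.5
print(err < coarse_err)  # True: the second stage refines the first
```

Dropping trailing stages at decode time is what gives a single model its variable bitrate, which is the role of the structured dropout mentioned above.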
【5】 Episodic Bandits with Stochastic Experts 标题:带随机专家的情节式Bandit
作者:Nihal Sharma,Soumya Basu,Karthikeyan Shanmugam,Sanjay Shakkottai 机构:Department of ECE, The University of Texas at Austin, Google, Mountain View, Research AI, IBM Research NY 链接:https://arxiv.org/abs/2107.03263 摘要:我们研究了上下文bandit问题的一个版本,其中agent通过一组随机专家策略,对图结构环境中的一个节点进行软控制。agent在多个情节(episode)中与环境交互,每个情节具有不同的上下文分布;这会导致"最佳专家"在不同情节中发生变化。我们的目标是开发一个能在各个情节中跟踪最佳专家的agent。在agent不了解专家策略或上下文分布变化的情况下,我们引入了基于经验散度的UCB(ED-UCB)算法。在温和的假设下,我们证明了从$\tilde{O}(N\log(NT^2\sqrt{E}))$个样本进行自举可得到$\tilde{O}(E(N+1)+\frac{N\sqrt{E}}{T^2})$的遗憾。如果agent事先知道专家策略,那么我们可以将遗憾改进为$\tilde{O}(EN)$,而不需要任何自举。在已知专家策略的情况下,我们的分析还将非情节设置下已有的对数遗憾界收紧为与问题相关的常数。最后我们通过仿真实验验证了我们的发现。 摘要:We study a version of the contextual bandit problem where an agent is given soft control of a node in a graph-structured environment through a set of stochastic expert policies. The agent interacts with the environment over episodes, with each episode having different context distributions; this results in the `best expert' changing across episodes. Our goal is to develop an agent that tracks the best expert over episodes. We introduce the Empirical Divergence-based UCB (ED-UCB) algorithm in this setting where the agent does not have any knowledge of the expert policies or changes in context distributions. With mild assumptions, we show that bootstrapping from $\tilde{O}(N\log(NT^2\sqrt{E}))$ samples results in a regret of $\tilde{O}(E(N+1)+\frac{N\sqrt{E}}{T^2})$. If the expert policies are known to the agent a priori, then we can improve the regret to $\tilde{O}(EN)$ without requiring any bootstrapping. Our analysis also tightens pre-existing logarithmic regret bounds to a problem-dependent constant in the non-episodic setting when expert policies are known. We finally empirically validate our findings through simulations.
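For readers new to the upper-confidence-bound principle behind ED-UCB, here is classic UCB1 on Bernoulli arms (the textbook rule with assumed arm means, not the paper's stochastic-expert setting):

```python
import math, random

random.seed(3)
means = [0.2, 0.5, 0.8]  # assumed Bernoulli arm means; arm 2 is best
counts = [0] * 3
totals = [0.0] * 3

T = 5000
for t in range(1, T + 1):
    if t <= 3:
        arm = t - 1  # pull each arm once to initialize
    else:
        # optimism: empirical mean plus a confidence radius
        arm = max(range(3), key=lambda i:
                  totals[i] / counts[i]
                  + math.sqrt(2 * math.log(t) / counts[i]))
    reward = 1.0 if random.random() < means[arm] else 0.0
    counts[arm] += 1
    totals[arm] += reward

print(counts)  # the best arm dominates the pull counts
```

ED-UCB replaces the per-arm empirical mean with a divergence-based score over expert policies, but the optimism-under-uncertainty structure is the same.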
【6】 "Are you sure?": Preliminary Insights from Scaling Product Comparisons to Multiple Shops
作者:Patrick John Chia,Bingqing Yu,Jacopo Tagliabue 机构:Montreal, Canada, Coveo Labs, New York, NY 链接:https://arxiv.org/abs/2107.03256 摘要:大型电子商务公司引入了比较表作为一种新的推荐方式。然而,在没有预先存在的训练/分类数据的情况下进行大规模比较仍然是一个公开的挑战,特别是在长尾商店的运营限制下。我们展示了构建一个比较管道的初步结果,该管道设计用于在多车间场景中进行扩展:我们描述了我们的设计选择,并在多个车间上运行广泛的基准测试来对其进行压力测试。最后,我们对物业选择进行了一个小型用户研究,并通过讨论潜在的改进和突出有待解决的问题得出结论。 摘要:Large eCommerce players introduced comparison tables as a new type of recommendations. However, building comparisons at scale without pre-existing training/taxonomy data remains an open challenge, especially within the operational constraints of shops in the long tail. We present preliminary results from building a comparison pipeline designed to scale in a multi-shop scenario: we describe our design choices and run extensive benchmarks on multiple shops to stress-test it. Finally, we run a small user study on property selection and conclude by discussing potential improvements and highlighting the questions that remain to be addressed.
【7】 Scalable Data Balancing for Unlabeled Satellite Imagery 标题:无标签卫星影像的可伸缩数据平衡
作者:Deep Patel,Erin Gao,Anirudh Koul,Siddha Ganju,Meher Anand Kasam 备注:Accepted to COSPAR 2021 Workshop on Machine Learning for Space Sciences. 5 pages, 9 figures 链接:https://arxiv.org/abs/2107.03227 摘要:数据不平衡是机器学习中普遍存在的问题。在大规模收集和注释的数据集中,数据不平衡可以通过对频繁类的欠采样和稀有类的过采样来手动缓解,也可以通过插补和扩充技术来计划。在这两种情况下,平衡数据都需要标签。换句话说,只有带注释的数据才能平衡。收集完全注释的数据集是一项挑战,特别是对于大型卫星系统,如未标记的NASA的35 PB地球图像数据集。尽管NASA的地球图像数据集是未标记的,但我们可以依赖数据源的隐含属性来假设其不平衡性,例如地球图像中的土地和水的分布。我们提出了一种新的迭代方法来平衡未标记的数据。我们的方法利用图像嵌入作为图像标签的代理,可以用来平衡数据,并最终在训练时提高整体精度。 摘要:Data imbalance is a ubiquitous problem in machine learning. In large scale collected and annotated datasets, data imbalance is either mitigated manually by undersampling frequent classes and oversampling rare classes, or planned for with imputation and augmentation techniques. In both cases balancing data requires labels. In other words, only annotated data can be balanced. Collecting fully annotated datasets is challenging, especially for large scale satellite systems such as the unlabeled NASA's 35 PB Earth Imagery dataset. Although the NASA Earth Imagery dataset is unlabeled, there are implicit properties of the data source that we can rely on to hypothesize about its imbalance, such as distribution of land and water in the case of the Earth's imagery. We present a new iterative method to balance unlabeled data. Our method utilizes image embeddings as a proxy for image labels that can be used to balance data, and ultimately when trained increases overall accuracy.
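A minimal sketch of the embedding-as-proxy-label idea: assign each embedding to its nearest prototype (a stand-in for a cluster label), then undersample the dominant cluster. The 1-D "embeddings" and fixed prototypes below are illustrative placeholders for real image embeddings and clustering.

```python
import random

random.seed(7)
prototypes = {"water": 0.0, "land": 1.0}  # assumed cluster centers
embeddings = ([random.gauss(0.0, 0.1) for _ in range(90)]
              + [random.gauss(1.0, 0.1) for _ in range(10)])  # 9:1 imbalance

clusters = {"water": [], "land": []}
for e in embeddings:
    label = min(prototypes, key=lambda p: abs(prototypes[p] - e))
    clusters[label].append(e)

sizes = {k: len(v) for k, v in clusters.items()}
smallest = min(sizes.values())
balanced = [e for v in clusters.values() for e in random.sample(v, smallest)]
print(sizes["water"] > sizes["land"])  # True: imbalanced before
print(len(balanced) == 2 * smallest)   # True: balanced after undersampling
```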
【8】 RAM-VO: Less is more in Visual Odometry 标题:RAM-VO:视觉里程计中的少即是多
作者:Iury Cleveston,Esther L. Colombini 机构:Laboratory of Robotics and Cognitive Systems (LaRoCS), Institute of Computing, University of Campinas, Campinas, S˜ao Paulo, Brazil 链接:https://arxiv.org/abs/2107.02974 摘要:建造能够在没有人监督的情况下运行的车辆需要确定代理人的姿势。视觉里程计(VO)算法仅利用输入图像的视觉变化来估计自我运动。最新的VO方法广泛使用卷积神经网络(CNN)来实现深度学习,这在处理高分辨率图像时增加了大量的成本。此外,在VO任务中,输入数据越多并不意味着预测效果越好;相反,架构可能会过滤掉无用的信息。因此,实现计算效率高、轻量级的体系结构至关重要。在这项工作中,我们提出了RAM-VO,一个扩展的经常性注意模型(RAM)的视觉里程计任务。RAM-VO改进了信息的视觉和时间表示,实现了近端策略优化(PPO)算法来学习鲁棒策略。结果表明,RAM-VO可以用大约300万个参数对单目输入图像进行6个自由度的回归。此外,在KITTI数据集上的实验表明,RAM-VO只使用了5.7%的可用视觉信息就获得了具有竞争力的结果。 摘要:Building vehicles capable of operating without human supervision requires the determination of the agent's pose. Visual Odometry (VO) algorithms estimate the egomotion using only visual changes from the input images. The most recent VO methods implement deep-learning techniques using convolutional neural networks (CNN) extensively, which add a substantial cost when dealing with high-resolution images. Furthermore, in VO tasks, more input data does not mean a better prediction; on the contrary, the architecture may filter out useless information. Therefore, the implementation of computationally efficient and lightweight architectures is essential. In this work, we propose the RAM-VO, an extension of the Recurrent Attention Model (RAM) for visual odometry tasks. RAM-VO improves the visual and temporal representation of information and implements the Proximal Policy Optimization (PPO) algorithm to learn robust policies. The results indicate that RAM-VO can perform regressions with six degrees of freedom from monocular input images using approximately 3 million parameters. In addition, experiments on the KITTI dataset demonstrate that RAM-VO achieves competitive results using only 5.7% of the available visual information.
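The "glimpse" mechanism that RAM-style models use in place of full-frame CNN input can be sketched as a simple crop around an attended location (illustrative only, on a synthetic 8x8 image):

```python
def glimpse(image, row, col, size):
    """Crop a size x size patch centered at (row, col), clipped at borders."""
    half = size // 2
    return [r[max(0, col - half):col + half + 1]
            for r in image[max(0, row - half):row + half + 1]]

image = [[10 * r + c for c in range(8)] for r in range(8)]
patch = glimpse(image, 3, 4, 3)
print(patch)  # [[23, 24, 25], [33, 34, 35], [43, 44, 45]]
```

Processing a few such small patches per frame, rather than the whole high-resolution image, is what keeps the parameter and compute budget low.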
【9】 Universal Approximation for Log-concave Distributions using Well-conditioned Normalizing Flows 标题:使用良态正规化流的对数凹分布的通用逼近
作者:Holden Lee,Chirag Pabbaraju,Anish Sevekari,Andrej Risteski 机构:Duke University, Carnegie Mellon University 备注:40 pages, 0 figures 链接:https://arxiv.org/abs/2107.02951 摘要:规范化流是一类广泛应用的具有可处理似然的潜变量生成模型。仿射耦合(Dinh et al, 2014-16)模型是一种特别常见的标准化流类型,其中潜变量到可观测变量变换的雅可比矩阵是三角形的,允许在线性时间内计算似然。尽管仿射耦合被广泛使用,但该体系结构的特殊结构使得理解它们的表示能力具有挑战性。普遍近似问题直到最近才由三篇平行论文(Huang等人,2020;Zhang等人,2020;Koehler等人,2020)解决--它们表明合理正则的分布可以用仿射耦合任意好地逼近--尽管所用网络具有几乎奇异的雅可比矩阵。由于病态雅可比矩阵是基于似然的训练的一个障碍,基本问题仍然存在:哪些分布可以用条件良好的仿射耦合流来近似?本文证明了任意对数凹分布都可以用条件良好的仿射耦合流来近似。在证明技术方面,我们揭示并利用了仿射耦合结构、欠阻尼Langevin动力学(一种常用于从Gibbs测度采样的随机微分方程)和Hénon映射(辛微分同胚研究中出现的一种结构化动力学系统)之间的深层联系。我们的结果也为仿射耦合的训练实践提供了信息:我们用iid高斯近似输入分布的填充版本--Koehler等人(2020)根据经验观察到这一策略可以产生条件更好的流,但迄今为止没有任何理论基础。因此,我们的证明为训练标准化流时高斯填充的好处提供了理论证据。 摘要:Normalizing flows are a widely used class of latent-variable generative models with a tractable likelihood. Affine-coupling (Dinh et al, 2014-16) models are a particularly common type of normalizing flows, for which the Jacobian of the latent-to-observable-variable transformation is triangular, allowing the likelihood to be computed in linear time. Despite the widespread usage of affine couplings, the special structure of the architecture makes understanding their representational power challenging. The question of universal approximation was only recently resolved by three parallel papers (Huang et al.,2020;Zhang et al.,2020;Koehler et al.,2020) -- who showed reasonably regular distributions can be approximated arbitrarily well using affine couplings -- albeit with networks with a nearly-singular Jacobian. As ill-conditioned Jacobians are an obstacle for likelihood-based training, the fundamental question remains: which distributions can be approximated using well-conditioned affine coupling flows? In this paper, we show that any log-concave distribution can be approximated using well-conditioned affine-coupling flows.
In terms of proof techniques, we uncover and leverage deep connections between affine coupling architectures, underdamped Langevin dynamics (a stochastic differential equation often used to sample from Gibbs measures) and Hénon maps (a structured dynamical system that appears in the study of symplectic diffeomorphisms). Our results also inform the practice of training affine couplings: we approximate a padded version of the input distribution with iid Gaussians -- a strategy which Koehler et al. (2020) empirically observed to result in better-conditioned flows, but had hitherto no theoretical grounding. Our proof can thus be seen as providing theoretical evidence for the benefits of Gaussian padding when training normalizing flows.
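The Hénon map mentioned in the proof sketch is easy to iterate directly; with the standard parameters a = 1.4, b = 0.3 its orbits settle onto a bounded strange attractor:

```python
def henon(x, y, a=1.4, b=0.3):
    """One step of the classical Henon map: (x, y) -> (1 - a*x^2 + y, b*x)."""
    return 1.0 - a * x * x + y, b * x

x, y = 0.0, 0.0
orbit = []
for _ in range(1000):
    x, y = henon(x, y)
    orbit.append((x, y))

# Orbits stay on a bounded attractor for these parameters:
print(all(abs(px) < 2 and abs(py) < 2 for px, py in orbit))  # True
```

Its relevance here is structural: the map is a composition of shear-like steps, the same kind of triangular update that an affine coupling layer performs.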
【10】 Scaling up Continuous-Time Markov Chains Helps Resolve Underspecification 标题:扩展连续时间马尔可夫链有助于解决不规范问题
作者:Alkis Gotovos,Rebekka Burkholz,John Quackenbush,Stefanie Jegelka 机构:MIT, Harvard University 链接:https://arxiv.org/abs/2107.02911 摘要:在许多生物医学应用中,建立离散项目集(如基因突变)的时间演化模型是一个基本问题。我们通过连续时间马尔可夫链的镜头来处理这个问题,并且证明在通常的横截面数据设置中,所产生的学习任务通常是不明确的。我们探索了一种可能令人惊讶的补救方法:包括一些额外的独立项可以帮助确定时间顺序,从而解决不明确的问题。这与将分析局限于相关项目的一小部分的常见做法形成了鲜明对比,这在很大程度上是由于现有方法的伸缩性差。为了将我们的理论观点应用到实践中,我们提出了一种学习连续时间马尔可夫链的近似似然最大化方法,它可以扩展到数百个项目,并且比以前的方法快几个数量级。我们证明了我们的方法对合成和真实癌症数据的有效性。 摘要:Modeling the time evolution of discrete sets of items (e.g., genetic mutations) is a fundamental problem in many biomedical applications. We approach this problem through the lens of continuous-time Markov chains, and show that the resulting learning task is generally underspecified in the usual setting of cross-sectional data. We explore a perhaps surprising remedy: including a number of additional independent items can help determine time order, and hence resolve underspecification. This is in sharp contrast to the common practice of limiting the analysis to a small subset of relevant items, which is followed largely due to poor scaling of existing methods. To put our theoretical insight into practice, we develop an approximate likelihood maximization method for learning continuous-time Markov chains, which can scale to hundreds of items and is orders of magnitude faster than previous methods. We demonstrate the effectiveness of our approach on synthetic and real cancer data.
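A minimal continuous-time Markov chain over item sets can be simulated with the standard competing-exponentials recipe (independent per-item acquisition rates here, a much simpler special case than the learned model in the paper; the rates are made up):

```python
import random

random.seed(5)
rates = {"A": 2.0, "B": 0.5}  # assumed per-item acquisition rates

def sample_path(t_end):
    """Simulate item acquisitions up to time t_end; return the order."""
    t, state, order = 0.0, set(), []
    while len(state) < len(rates):
        absent = [i for i in rates if i not in state]
        total = sum(rates[i] for i in absent)
        t += random.expovariate(total)  # time to the next acquisition
        if t > t_end:
            break
        r, acc = random.random() * total, 0.0
        for i in absent:                # pick the item, proportional to rate
            acc += rates[i]
            if r <= acc:
                state.add(i)
                order.append(i)
                break
    return order

first = 0
for _ in range(2000):
    p = sample_path(10.0)
    if p and p[0] == "A":
        first += 1
# "A" fires first with probability 2.0 / 2.5 = 0.8:
print(0.7 < first / 2000 < 0.9)  # True
```

The paper's underspecification issue arises when only the end-state sets (cross-sectional data) are observed, not the orders that a simulation like this produces.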
【11】 An algorithmic view of $\ell_2$ regularization and some path-following algorithms 标题:$\ell_2$正则化的一种算法视角及若干路径跟踪算法
作者:Yunzhang Zhu,Renxiong Liu 机构:Department of Statistics, The Ohio State University, Columbus, OH, USA 备注:62 pages, 7 figures 链接:https://arxiv.org/abs/2107.03322 摘要:建立了凸损失函数的$\ell_2$-正则化解路径与常微分方程(ODE)解的等价性。重要的是,这种等价性揭示了解路径可以看作是应用于经验损失的梯度下降法和牛顿法的混合方法的流,这类似于广泛使用的优化技术信赖域法。这提供了一个有趣的$\ell_2$正则化算法视图,与传统上认为$\ell_2$正则化解路径类似于经验损失梯度流的观点形成对比。我们提出了基于同伦方法和数值ODE求解器的路径跟踪算法来数值逼近解路径。特别地,我们分别考虑牛顿法和梯度下降法作为同伦方法的基础算法,并建立了它们在解路径上的逼近误差率。重要的是,我们的理论提出了新的网格点选择方案,以保证解路径具有任意小的次优性。在计算量方面,我们证明了为了获得整个解路径的$\epsilon$-次优性,牛顿法所需的牛顿步数为$\mathcal{O}(\epsilon^{-1/2})$,而梯度下降法所需的梯度步数为$\mathcal{O}\left(\epsilon^{-1}\ln(\epsilon^{-1})\right)$。最后,以$\ell_2$-正则logistic回归为例,验证了所提出的路径跟踪算法的有效性。 摘要:We establish an equivalence between the $\ell_2$-regularized solution path for a convex loss function, and the solution of an ordinary differential equation (ODE). Importantly, this equivalence reveals that the solution path can be viewed as the flow of a hybrid of gradient descent and Newton method applying to the empirical loss, which is similar to a widely used optimization technique called trust region method. This provides an interesting algorithmic view of $\ell_2$ regularization, and is in contrast to the conventional view that the $\ell_2$ regularization solution path is similar to the gradient flow of the empirical loss. New path-following algorithms based on homotopy methods and numerical ODE solvers are proposed to numerically approximate the solution path. In particular, we consider respectively Newton method and gradient descent method as the basis algorithm for the homotopy method, and establish their approximation error rates over the solution path. Importantly, our theory suggests novel schemes to choose grid points that guarantee an arbitrarily small suboptimality for the solution path.
In terms of computational cost, we prove that in order to achieve an $\epsilon$-suboptimality for the entire solution path, the number of Newton steps required for the Newton method is $\mathcal{O}(\epsilon^{-1/2})$, while the number of gradient steps required for the gradient descent method is $\mathcal{O}\left(\epsilon^{-1} \ln(\epsilon^{-1})\right)$. Finally, we use $\ell_2$-regularized logistic regression as an illustrating example to demonstrate the effectiveness of the proposed path-following algorithms.
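In the simplest one-dimensional least-squares case the regularization path studied above has a closed form, beta(lam) = <x, y> / (<x, x> + lam), which flows from 0 at lam = infinity to the unregularized solution as lam -> 0; a grid of lam values traces the path that the paper's ODE view parameterizes continuously (toy data below).

```python
x = [1.0, 2.0, 3.0]
y = [2.0, 4.1, 5.9]
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)

def beta(lam):
    """Ridge solution for 1-D least squares with penalty 0.5*lam*beta^2."""
    return sxy / (sxx + lam)

# Follow the path on a decreasing grid of regularization strengths:
path = [beta(lam) for lam in (100.0, 10.0, 1.0, 0.1, 0.0)]
print(all(b1 < b2 for b1, b2 in zip(path, path[1:])))  # monotone here
print(abs(path[-1] - sxy / sxx) < 1e-12)  # ends at the OLS solution
```

In higher dimensions no closed form exists for general losses, which is where the homotopy and numerical-ODE path-following algorithms come in.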
【12】 A Closed-Form Approximation to the Conjugate Prior of the Dirichlet and Beta Distributions 标题:Dirichlet分布和Beta分布的共轭先验的闭式逼近
作者:Kaspar Thommen 链接:https://arxiv.org/abs/2107.03183 摘要:我们推导了Dirichlet分布和beta分布的共轭先验,并用数值例子对其进行了探讨,以获得对分布本身、超参数及其收敛条件的直观理解。由于先验的不确定性,我们继续定义和分析一个闭式近似。最后,我们提供了一个实现这种近似的算法,该算法能够在不需要蒙特卡罗模拟的情况下,对Dirichlet和beta似然性进行完全可处理的贝叶斯共轭处理。 摘要:We derive the conjugate prior of the Dirichlet and beta distributions and explore it with numerical examples to gain an intuitive understanding of the distribution itself, its hyperparameters, and conditions concerning its convergence. Due to the prior's intractability, we proceed to define and analyze a closed-form approximation. Finally, we provide an algorithm implementing this approximation that enables fully tractable Bayesian conjugate treatment of Dirichlet and beta likelihoods without the need for Monte Carlo simulations.
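As background, the tractable direction of conjugacy, a Beta prior updating against Bernoulli observations in closed form, contrasts with the intractable reverse problem the paper addresses (a conjugate prior on the parameters of a beta likelihood itself):

```python
def beta_update(a, b, observations):
    """Conjugate update: Beta(a, b) prior + Bernoulli data
    -> Beta(a + heads, b + tails) posterior."""
    heads = sum(observations)
    tails = len(observations) - heads
    return a + heads, b + tails

a, b = beta_update(1, 1, [1, 1, 0, 1])  # uniform prior, 3 heads, 1 tail
print((a, b))       # (4, 2)
print(a / (a + b))  # posterior mean 2/3
```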
【13】 Neural Contextual Bandits without Regret
Authors: Parnian Kassraie, Andreas Krause Affiliations: ETH Zurich Comments: 37 pages, 6 figures Link: https://arxiv.org/abs/2107.03144 Abstract: Contextual bandits are a rich model for sequential decision making given side information, with important applications, e.g., in recommender systems. We propose novel algorithms for contextual bandits harnessing neural networks to approximate the unknown reward function. We resolve the open problem of proving sublinear regret bounds in this setting for general context sequences, considering both fully-connected and convolutional networks. To this end, we first analyze NTK-UCB, a kernelized bandit optimization algorithm employing the Neural Tangent Kernel (NTK), and bound its regret in terms of the NTK maximum information gain $\gamma_T$, a complexity parameter capturing the difficulty of learning. Our bounds on $\gamma_T$ for the NTK may be of independent interest. We then introduce our neural network based algorithm NN-UCB, and show that its regret closely tracks that of NTK-UCB. Under broad non-parametric assumptions about the reward function, our approach converges to the optimal policy at a $\tilde{\mathcal{O}}(T^{-1/2d})$ rate, where $d$ is the dimension of the context.
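The kernelized UCB idea above can be sketched with an off-the-shelf RBF kernel standing in for the NTK: maintain a kernel-ridge posterior over rewards and pick the action whose upper confidence bound $\mu + \beta\sigma$ is largest. The kernel choice, exploration coefficient, and toy reward below are assumptions for illustration, not the paper's NTK-UCB.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    """RBF kernel matrix (a simple stand-in for the Neural Tangent Kernel)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * ls ** 2))

class KernelUCB:
    """GP-UCB-style contextual bandit: pick the action with the largest
    posterior upper confidence bound mu + beta * sigma."""
    def __init__(self, beta=2.0, noise=0.1):
        self.beta, self.noise = beta, noise
        self.X, self.y = [], []

    def choose(self, contexts):
        if not self.X:
            return 0                          # no data yet: arbitrary first pick
        X, y = np.array(self.X), np.array(self.y)
        Kinv = np.linalg.inv(rbf(X, X) + self.noise * np.eye(len(X)))
        k = rbf(contexts, X)
        mu = k @ Kinv @ y                     # posterior mean per action
        var = 1.0 - np.einsum("ij,jk,ik->i", k, Kinv, k)  # posterior variance
        return int(np.argmax(mu + self.beta * np.sqrt(np.maximum(var, 0.0))))

    def update(self, x, r):
        self.X.append(x); self.y.append(r)

rng = np.random.default_rng(0)
reward = lambda x: np.exp(-(x ** 2).sum())    # toy reward, peaked at the origin
agent = KernelUCB()
total, total_random = 0.0, 0.0
for t in range(150):
    contexts = rng.uniform(-2, 2, size=(5, 2))   # 5 candidate actions per round
    a = agent.choose(contexts)
    agent.update(contexts[a], reward(contexts[a]))
    total += reward(contexts[a])
    total_random += reward(contexts[rng.integers(5)])
```

The paper's NN-UCB replaces the fixed kernel with a trained network whose NTK this loop approximates; the regret analysis hinges on how fast the posterior variances shrink, which $\gamma_T$ quantifies.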
【14】 Test for non-negligible adverse shifts
Authors: Vathy M. Kamulete Affiliations: Enterprise Model Risk Management, Royal Bank of Canada, Toronto, Canada Comments: 14 pages, 4 figures, preprint Link: https://arxiv.org/abs/2107.02990 Abstract: Statistical tests for dataset shift are susceptible to false alarms: they are sensitive to minor differences where there is in fact adequate sample coverage and predictive performance. We propose instead a robust framework for tests of dataset shift based on outlier scores, D-SOS for short. D-SOS detects adverse shifts and can identify false alarms caused by benign ones. It posits that a new (test) sample is not substantively worse than an old (training) sample, and not that the two are equal. The key idea is to reduce observations to outlier scores and compare contamination rates. Beyond comparing distributions, users can define what worse means in terms of predictive performance and other relevant notions. We show how versatile and practical D-SOS is for a wide range of real and simulated datasets. Unlike tests of equal distribution and of goodness-of-fit, the D-SOS tests are uniquely tailored to serve as robust performance metrics to monitor model drift and dataset shift.
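The core recipe, reduce observations to outlier scores and compare contamination rates, can be sketched as follows. The distance-based score, quantile level, and data are illustrative assumptions, not the paper's D-SOS test statistic.

```python
import numpy as np

def outlier_scores(train, x, exclude_self=False):
    """Mean distance to the 5 nearest training points (a simple stand-in
    for the outlier scores that observations are reduced to)."""
    d = np.abs(x[:, None] - train[None, :])
    d.sort(axis=1)
    # When scoring the training set against itself, drop the zero
    # self-distance in column 0.
    return d[:, 1:6].mean(axis=1) if exclude_self else d[:, :5].mean(axis=1)

def contamination_rate(train, test, q=0.95):
    """Fraction of test points scoring above the q-quantile of training
    scores. The D-SOS idea is to ask whether this exceeds the nominal
    1 - q by a non-negligible margin, not to test equality of distributions."""
    thresh = np.quantile(outlier_scores(train, train, exclude_self=True), q)
    return float((outlier_scores(train, test) > thresh).mean())

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 2000)
benign = rng.normal(0.0, 1.0, 500)     # same distribution: no adverse shift
adverse = rng.normal(3.0, 1.0, 500)    # shifted mean: adverse shift
```

A benign sample yields a contamination rate near the nominal 5%, while the shifted sample's rate is far larger; the paper wraps this comparison in a formal test and lets the score encode other notions of "worse", such as predictive performance.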
【15】 Solution of Physics-based Bayesian Inverse Problems with Deep Generative Priors
Authors: Dhruv V Patel, Deep Ray, Assad A Oberai Affiliations: Department of Aerospace and Mechanical Engineering, University of Southern California, Los Angeles, California, USA Comments: Paper: 18 pages, 5 figures. Supplementary: 9 pages, 6 figures, 2 tables Link: https://arxiv.org/abs/2107.02926 Abstract: Inverse problems are notoriously difficult to solve because they can have no solutions, multiple solutions, or have solutions that vary significantly in response to small perturbations in measurements. Bayesian inference, which poses an inverse problem as a stochastic inference problem, addresses these difficulties and provides quantitative estimates of the inferred field and the associated uncertainty. However, it is difficult to employ when inferring vectors of large dimensions, and/or when prior information is available through previously acquired samples. In this paper, we describe how deep generative adversarial networks can be used to represent the prior distribution in Bayesian inference and overcome these challenges. We apply these ideas to inverse problems that are diverse in terms of the governing physical principles, sources of prior knowledge, type of measurement, and the extent of available information about measurement noise. In each case we apply the proposed approach to infer the most likely solution and quantitative estimates of uncertainty.
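The idea of inferring in the generator's latent space can be sketched with a toy linear "generator" and a linear forward operator, for which the latent posterior is Gaussian and both the most likely solution (the MAP estimate) and its uncertainty are available in closed form. The dimensions, operators, and noise level below are illustrative assumptions; the paper uses a trained GAN and general physics-based forward models.

```python
import numpy as np

rng = np.random.default_rng(1)
k, d, m = 4, 20, 10                        # latent, field, measurement dims
G = rng.normal(size=(d, k))                # stand-in linear "generator"
F = rng.normal(size=(m, d)) / np.sqrt(d)   # linear forward operator
sigma = 0.05                               # measurement noise level

z_true = rng.normal(size=k)                # latent code of the true field
y = F @ (G @ z_true) + sigma * rng.normal(size=m)   # noisy measurements

# Posterior over the latent z with prior z ~ N(0, I) and Gaussian noise:
# maximize -||F G z - y||^2 / (2 sigma^2) - ||z||^2 / 2  (the MAP estimate).
A = F @ G
H = A.T @ A / sigma ** 2 + np.eye(k)        # posterior precision
z_map = np.linalg.solve(H, A.T @ y / sigma ** 2)
cov = np.linalg.inv(H)                      # latent posterior covariance
field_map = G @ z_map                       # most likely inferred field
field_std = np.sqrt(np.diag(G @ cov @ G.T)) # pointwise uncertainty
```

Replacing the linear stand-in with a trained GAN generator makes the posterior non-Gaussian, which is where the sampling and optimization machinery of the paper comes in; the low-dimensional latent space is what tames the large-dimensional field.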
【16】 Particle Convolution for High Energy Physics
Authors: Chase Shimmin Affiliations: Department of Physics, Yale University, New Haven, CT Comments: To be presented at ML4Jets 2021 Link: https://arxiv.org/abs/2107.02908 Abstract: We introduce the Particle Convolution Network (PCN), a new type of equivariant neural network layer suitable for many tasks in jet physics. The particle convolution layer can be viewed as an extension of the Deep Sets and Energy Flow Network architectures, in which the permutation-invariant operator is promoted to a group convolution. While the PCN can be implemented for various kinds of symmetries, we consider the specific case of rotation about the jet axis in the $\eta$-$\phi$ plane. In two standard benchmark tasks, q/g tagging and top tagging, we show that the rotational PCN (rPCN) achieves performance comparable to graph networks such as ParticleNet. Moreover, we show that it is possible to implement an IRC-safe rPCN, which significantly outperforms existing IRC-safe tagging methods on both tasks. We speculate that by generalizing the PCN to include additional convolutional symmetries relevant to jet physics, it may outperform the current state of the art set by graph networks, while offering a new degree of control over physically motivated inductive biases.
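Promoting a permutation-invariant Deep-Sets operator to a group convolution can be sketched with a discrete rotation group about the jet axis: apply one shared per-particle filter at each rotation, sum over particles, then pool over the group, so the output is unchanged under both particle reordering and rotations in the group. The filter, pooling choices, and (pt, eta, phi) layout below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def phi_rotate(particles, dphi):
    """Rotate each particle about the jet axis; columns are (pt, eta, phi)."""
    out = particles.copy()
    out[:, 2] = (out[:, 2] + dphi) % (2.0 * np.pi)
    return out

def particle_conv(particles, W, n_rot=8):
    """Toy rotational particle convolution: evaluate one shared filter at
    n_rot discrete rotations, sum over particles (the Deep-Sets-style
    permutation-invariant pooling), then pool over the rotation group."""
    responses = [
        np.tanh(phi_rotate(particles, 2.0 * np.pi * r / n_rot) @ W).sum(axis=0)
        for r in range(n_rot)
    ]
    return np.max(responses, axis=0)   # invariant under the discrete rotations

rng = np.random.default_rng(0)
# 30 particles with pt in [0, 50), eta in [0, 1), phi in [0, 2*pi):
jet = rng.uniform(size=(30, 3)) * np.array([50.0, 1.0, 2.0 * np.pi])
W = rng.normal(size=(3, 4))            # one shared 4-channel filter
out = particle_conv(jet, W)
```

Rotating the jet by any multiple of $2\pi/8$ only permutes the list of group responses, so the pooled output is unchanged; a continuous-rotation version of the same construction is what the equivariant layer in the paper builds on.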