cs.AI (Artificial Intelligence): 126 papers in total
【1】 Discovering and Achieving Goals via World Models Link: https://arxiv.org/abs/2110.09514
Authors: Russell Mendonca, Oleh Rybkin, Kostas Daniilidis, Danijar Hafner, Deepak Pathak Affiliations: Carnegie Mellon University, University of Pennsylvania, University of Toronto Note: NeurIPS 2021. First two authors contributed equally. Website at this https URL Abstract: How can artificial agents learn to solve many diverse tasks in complex visual environments in the absence of any supervision? We decompose this question into two problems: discovering new goals and learning to reliably achieve them. We introduce Latent Explorer Achiever (LEXA), a unified solution to these that learns a world model from image inputs and uses it to train an explorer and an achiever policy from imagined rollouts. Unlike prior methods that explore by reaching previously visited states, the explorer plans to discover unseen surprising states through foresight, which are then used as diverse targets for the achiever to practice. After the unsupervised phase, LEXA solves tasks specified as goal images zero-shot without any additional learning. LEXA substantially outperforms previous approaches to unsupervised goal-reaching, both on prior benchmarks and on a new challenging benchmark with a total of 40 test tasks spanning across four standard robotic manipulation and locomotion domains. LEXA further achieves goals that require interacting with multiple objects in sequence. Finally, to demonstrate the scalability and generality of LEXA, we train a single general agent across four distinct environments. Code and videos at https://orybkin.github.io/lexa/
【2】 In a Nutshell, the Human Asked for This: Latent Goals for Following Temporal Specifications Link: https://arxiv.org/abs/2110.09461
Authors: Borja G. León, Murray Shanahan, Francesco Belardinelli Affiliations: Department of Computing, Imperial College London, London, United Kingdom Abstract: We address the problem of building agents whose goal is to satisfy out-of-distribution (OOD) multi-task instructions expressed in temporal logic (TL) by using deep reinforcement learning (DRL). Recent works provided evidence that the deep learning architecture is a key feature when teaching a DRL agent to solve OOD tasks in TL. Yet, studies of their performance are still limited. In this work, we analyse various state-of-the-art (SOTA) architectures that include generalisation mechanisms such as relational layers, the soft-attention mechanism, or hierarchical configurations, when generalising safety-aware tasks expressed in TL. Most importantly, we present a novel deep learning architecture that induces agents to generate latent representations of their current goal given both the human instruction and the current observation from the environment. We find that applying our proposed configuration to SOTA architectures yields significantly stronger performance when executing new tasks in OOD environments.
【3】 NormFormer: Improved Transformer Pretraining with Extra Normalization Link: https://arxiv.org/abs/2110.09456
Authors: Sam Shleifer, Jason Weston, Myle Ott Affiliations: Facebook AI Research Abstract: During pretraining, the Pre-LayerNorm transformer suffers from a gradient magnitude mismatch: gradients at early layers are much larger than at later layers. These issues can be alleviated by our proposed NormFormer architecture, which adds three normalization operations to each layer: a Layer Norm after self-attention, head-wise scaling of self-attention outputs, and a Layer Norm after the first fully connected layer. The extra operations incur negligible compute cost (a 0.4% parameter increase), but improve pretraining perplexity and downstream task performance for both causal and masked language models ranging from 125 million to 2.7 billion parameters. For example, adding NormFormer on top of our strongest 1.3B-parameter baseline can reach equal perplexity 24% faster, or converge 0.27 perplexity better in the same compute budget. This model reaches GPT3-Large (1.3B) zero-shot performance 60% faster. For masked language modeling, NormFormer improves fine-tuned GLUE performance by 1.9% on average. Code to train NormFormer models is available in fairseq: https://github.com/pytorch/fairseq/tree/main/examples/normformer
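The three added operations are concrete enough to sketch. Below is a minimal PyTorch rendering of a Pre-LN block with the NormFormer extras; all dimensions are illustrative, and note one simplification: the paper scales each head's output before the attention output projection, while this sketch scales head-sized chunks of the projected output.

```python
import torch
import torch.nn as nn

class NormFormerBlock(nn.Module):
    """Pre-LN block plus NormFormer's three extras: (1) LayerNorm after
    self-attention, (2) learned per-head scaling of attention outputs,
    (3) LayerNorm after the first fully connected layer."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.n_heads = n_heads
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln_pre_attn = nn.LayerNorm(d_model)
        self.ln_post_attn = nn.LayerNorm(d_model)             # extra op (1)
        self.head_scale = nn.Parameter(torch.ones(n_heads))   # extra op (2)
        self.ln_pre_ff = nn.LayerNorm(d_model)
        self.fc1, self.fc2 = nn.Linear(d_model, d_ff), nn.Linear(d_ff, d_model)
        self.ln_mid_ff = nn.LayerNorm(d_ff)                   # extra op (3)
        self.act = nn.GELU()

    def forward(self, x):
        h, _ = self.attn(*[self.ln_pre_attn(x)] * 3)
        b, t, d = h.shape  # scale head-sized chunks (simplification, see text)
        h = (h.view(b, t, self.n_heads, -1)
              * self.head_scale.view(1, 1, -1, 1)).view(b, t, d)
        x = x + self.ln_post_attn(h)
        h = self.fc2(self.ln_mid_ff(self.act(self.fc1(self.ln_pre_ff(x)))))
        return x + h

y = NormFormerBlock()(torch.randn(2, 16, 512))   # -> (2, 16, 512)
```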
【4】 TLDR: Twin Learning for Dimensionality Reduction Link: https://arxiv.org/abs/2110.09455
Authors: Yannis Kalantidis, Carlos Lassance, Jon Almazán, Diane Larlus Affiliations: NAVER LABS Europe Note: Code available at: this https URL Abstract: Dimensionality reduction methods are unsupervised approaches which learn low-dimensional spaces where some properties of the initial space, typically the notion of "neighborhood", are preserved. They are a crucial component of diverse tasks like visualization, compression, indexing, and retrieval. Aiming for a totally different goal, self-supervised visual representation learning has been shown to produce transferable representation functions by learning models that encode invariance to artificially created distortions, e.g. a set of hand-crafted image transformations. Unlike manifold learning methods that usually require propagation on large k-NN graphs or complicated optimization solvers, self-supervised learning approaches rely on simpler and more scalable frameworks for learning. In this paper, we unify these two families of approaches from the angle of manifold learning and propose TLDR, a dimensionality reduction method for generic input spaces that ports the simple self-supervised learning framework of Barlow Twins to a setting where it is hard or impossible to define an appropriate set of distortions by hand. We propose to use nearest neighbors to build pairs from a training set, and a redundancy reduction loss borrowed from the self-supervised literature, to learn an encoder that produces representations invariant across such pairs. TLDR is a method that is simple, easy to implement and train, and of broad applicability; it consists of an offline nearest neighbor computation step that can be highly approximated, and a straightforward learning process that does not require mining negative samples to contrast, eigendecompositions, or cumbersome optimization solvers. By replacing PCA with TLDR, we are able to increase the performance of GeM-AP by 4% mAP for 128 dimensions, and to retain its performance with 16x fewer dimensions.
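As a rough illustration of the two ingredients named above, the sketch below pairs each training vector with a random one of its nearest neighbors and trains a linear encoder with a Barlow Twins style redundancy-reduction loss. The sizes, the linear encoder, and the choice of k are illustrative assumptions, not the paper's configuration.

```python
import torch

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Redundancy-reduction loss: push the cross-correlation matrix between
    the two batches of embeddings toward the identity matrix."""
    n, d = z_a.shape
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-6)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-6)
    c = (z_a.T @ z_b) / n                        # d x d cross-correlation
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lam * off_diag

# TLDR-style pairs: instead of two augmentations of one image, pair each
# training vector with one of its k nearest neighbours (illustrative).
x = torch.randn(256, 128)                        # raw input vectors
dists = torch.cdist(x, x)
dists.fill_diagonal_(float("inf"))
nn_idx = dists.topk(3, largest=False).indices    # 3-NN per sample
pairs = x[nn_idx[torch.arange(256), torch.randint(0, 3, (256,))]]

encoder = torch.nn.Linear(128, 32)               # stand-in for the encoder
loss = barlow_twins_loss(encoder(x), encoder(pairs))
loss.backward()                                  # one offline-pairs training step
```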
【5】 Beltrami Flow and Neural Diffusion on Graphs Link: https://arxiv.org/abs/2110.09443
Authors: Benjamin Paul Chamberlain, James Rowbottom, Davide Eynard, Francesco Di Giovanni, Xiaowen Dong, Michael M. Bronstein Affiliations: University of Oxford; Twitter Inc. and Imperial College London Note: 21 pages, 5 figures. Proceedings of the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS) 2021 Abstract: We propose a novel class of graph neural networks based on the discretised Beltrami flow, a non-Euclidean diffusion PDE. In our model, node features are supplemented with positional encodings derived from the graph topology and jointly evolved by the Beltrami flow, producing simultaneously continuous feature learning and topology evolution. The resulting model generalises many popular graph neural networks and achieves state-of-the-art results on several benchmarks.
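To make the diffusion reading concrete, here is a generic explicit-Euler diffusion step on a graph, with node states (features concatenated with positional encodings) moving toward an attention-weighted neighbour mean. This is an illustrative discretisation of graph diffusion, not the paper's exact Beltrami operator.

```python
import numpy as np

def diffusion_step(x, adj, tau=0.1):
    """One explicit-Euler step of dx/dt = (A(x) - I) x, where A(x) is a
    row-normalised, similarity-based coupling restricted to graph edges."""
    sim = x @ x.T                                # similarity-based coupling
    sim = np.where(adj > 0, sim, -np.inf)        # restrict to the graph
    w = np.exp(sim - sim.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)         # attention-like weights
    return x + tau * (w @ x - x)

n, d = 6, 4
rng = np.random.default_rng(0)
adj = (rng.random((n, n)) < 0.5).astype(float)
adj = np.maximum(adj, adj.T)
np.fill_diagonal(adj, 1)                         # keep self-loops
x = rng.standard_normal((n, d))                  # features + positional encodings
for _ in range(10):                              # evolve the joint state
    x = diffusion_step(x, adj)
```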
【6】 Goal Agnostic Planning using Maximum Likelihood Paths in Hypergraph World Models Link: https://arxiv.org/abs/2110.09442
Authors: Christopher Robinson Affiliations: Department of Electrical and Computer Engineering, University of Louisville, Louisville, KY, USA Note: 58 pages, 27 figures Abstract: In this paper, we present a hypergraph-based machine learning algorithm, a datastructure-driven maintenance method, and a planning algorithm based on a probabilistic application of Dijkstra's algorithm. Together, these form a goal-agnostic automated planning engine for an autonomous learning agent which incorporates beneficial properties of both classical Machine Learning and traditional Artificial Intelligence. We prove that the algorithm determines optimal solutions within the problem space, mathematically bound learning performance, and supply a mathematical model analyzing system state progression through time, yielding explicit predictions for learning curves, goal achievement rates, and response to abstractions and uncertainty. To validate performance, we exhibit results from applying the agent to three archetypal planning problems, including composite hierarchical domains, and highlight empirical findings which illustrate properties elucidated in the analysis.
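One standard reading of "a probabilistic application of Dijkstra's algorithm" is to weight each edge by the negative log of its transition probability, so that the shortest path is the maximum-likelihood path. A self-contained sketch on a toy graph (the paper works with hypergraphs; this flat-graph simplification is ours):

```python
import heapq, math

def max_likelihood_path(graph, start, goal):
    """Dijkstra over -log(p) edge weights: minimising summed -log(p)
    maximises the product of transition probabilities along the path."""
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, math.inf):
            continue
        for v, p in graph.get(u, []):
            nd = d - math.log(p)
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], math.exp(-dist[goal])

# toy world model: graph[u] = [(v, P(reach v from u)), ...]
graph = {"s": [("a", 0.9), ("b", 0.5)], "a": [("g", 0.5)], "b": [("g", 1.0)]}
print(max_likelihood_path(graph, "s", "g"))  # (['s', 'b', 'g'], 0.5)
```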
【7】 FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection Link: https://arxiv.org/abs/2110.09441
Authors: Zhenyu Zhang, Yewei Gu, Xiaowei Yi, Xianfeng Zhao Affiliations: State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences Abstract: With the rapid development of text-to-speech (TTS) and voice conversion (VC) technologies, detecting synthetic speech has become increasingly difficult. In order to promote the development of synthetic speech detection models against Mandarin TTS and VC technologies, we have constructed a challenging Mandarin dataset and organized the accompanying audio track of the first fake media forensic challenge of the China Society of Image and Graphics (FMFCC-A). The FMFCC-A dataset is by far the largest publicly available Mandarin dataset for synthetic speech detection; it contains 40,000 synthesized Mandarin utterances generated by 11 Mandarin TTS systems and two Mandarin VC systems, and 10,000 genuine Mandarin utterances collected from 58 speakers. The FMFCC-A dataset is divided into training, development and evaluation sets, which are used for research on the detection of synthesized Mandarin speech under various previously unknown speech synthesis systems or audio post-processing operations. In addition to describing the construction of the FMFCC-A dataset, we provide a detailed analysis of two baseline methods and the top-performing submissions from the FMFCC-A, which illustrates the usefulness and challenge of the FMFCC-A dataset. We hope that the FMFCC-A dataset can fill the lack of Mandarin datasets for synthetic speech detection.
【8】 Measuring Cognitive Status from Speech in a Smart Home Environment Link: https://arxiv.org/abs/2110.09421
Authors: Kathleen C. Fraser, Majid Komeili Abstract: The population is aging, and becoming more tech-savvy. The United Nations predicts that by 2050, one in six people in the world will be over age 65 (up from one in 11 in 2019), and this increases to one in four in Europe and Northern America. Meanwhile, the proportion of American adults over 65 who own a smartphone has risen 24 percentage points from 2013-2017, and the majority have Internet in their homes. Smart devices and smart home technology have profound potential to transform how people age, their ability to live independently in later years, and their interactions with their circle of care. Cognitive health is a key component to independence and well-being in old age, and smart homes present many opportunities to measure cognitive status in a continuous, unobtrusive manner. In this article, we focus on speech as a measurement instrument for cognitive health. Existing methods of cognitive assessment suffer from a number of limitations that could be addressed through smart home speech sensing technologies. We begin with a brief tutorial on measuring cognitive status from speech, including some pointers to useful open-source software toolboxes for the interested reader. We then present an overview of the preliminary results from pilot studies on active and passive smart home speech sensing for the measurement of cognitive health, and conclude with some recommendations and challenge statements for the next wave of work in this area, to help overcome both technical and ethical barriers to success.
【9】 Using Psychological Characteristics of Situations for Social Situation Comprehension in Support Agents Link: https://arxiv.org/abs/2110.09397
Authors: Ilir Kola, Catholijn M. Jonker, M. Birna van Riemsdijk Affiliations: Delft University of Technology & Leiden Institute for Advanced Computer Science; University of Twente, The Netherlands Note: 19 pages Abstract: Support agents that help users in their daily lives need to take into account not only the user's characteristics, but also the social situation of the user. Existing work on including social context uses some type of situation cue as an input to information processing techniques in order to assess the expected behavior of the user. However, research shows that it is important to also determine the meaning of a situation, a step which we refer to as social situation comprehension. We propose using psychological characteristics of situations, which have been proposed in social science for ascribing meaning to situations, as the basis for social situation comprehension. Using data from user studies, we evaluate this proposal from two perspectives. First, from a technical perspective, we show that psychological characteristics of situations can be used as input to predict the priority of social situations, and that psychological characteristics of situations can be predicted from the features of a social situation. Second, we investigate the role of the comprehension step in human-machine meaning making. We show that psychological characteristics can be successfully used as a basis for explanations given to users about the decisions of an agenda management personal assistant agent.
【10】 Ceasing Hate with MoH: Hate Speech Detection in Hindi-English Code-Switched Language Link: https://arxiv.org/abs/2110.09393
Authors: Arushi Sharma, Anubha Kabra, Minni Jain Affiliations: Optum Global Advantage; Adobe Inc.; Delhi Technological University Note: Accepted in Elsevier Journal of Information Processing and Management. Sharma and Kabra made equal contributions Abstract: Social media has become a bedrock for people to voice their opinions worldwide. Due to the greater sense of freedom that comes with the anonymity feature, it is possible to disregard social etiquette online and attack others without facing severe consequences, inevitably propagating hate speech. The current measures to sift online content and offset the spread of hatred do not go far enough. One factor contributing to this is the prevalence of regional languages in social media and the paucity of language-flexible hate speech detectors. The proposed work focuses on analyzing hate speech in Hindi-English code-switched language. Our method explores transformation techniques to capture precise text representations. To retain the structure of the data and yet use it with existing algorithms, we developed MoH, or Map Only Hindi, which means "Love" in Hindi. The MoH pipeline consists of language identification and Roman-to-Devanagari Hindi transliteration using a knowledge base of Roman Hindi words. Finally, it employs the fine-tuned Multilingual BERT and MuRIL language models. We conducted several quantitative experimental studies on three datasets and evaluated performance using Precision, Recall, and F1 metrics. The first experiment studies the performance of MoH-mapped text with classical machine learning models and shows an average increase of 13% in F1 scores. The second compares the proposed work's scores with those of the baseline models and shows a rise in performance of 6%. Finally, the third compares the proposed MoH technique with various data simulations using an existing transliteration library. Here, MoH outperforms the rest by 15%. Our results demonstrate a significant improvement in the state-of-the-art scores on all three datasets.
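The pipeline steps named above (token-level language identification, then knowledge-base transliteration) reduce to a dictionary pass in the simplest case. The sketch below uses an invented three-word knowledge base purely for illustration; the paper's actual resources are much larger.

```python
# Hypothetical knowledge base of Roman-script Hindi words -> Devanagari.
ROMAN_TO_DEVANAGARI = {
    "pyaar": "प्यार",
    "nahi": "नहीं",
    "hai": "है",
}

def moh_map(tokens):
    """Identify each token's language by knowledge-base lookup, then
    transliterate the Roman-Hindi tokens; everything else passes through."""
    out = []
    for tok in tokens:
        low = tok.lower()
        if low in ROMAN_TO_DEVANAGARI:       # Roman-script Hindi token
            out.append(ROMAN_TO_DEVANAGARI[low])
        else:                                # treat the rest as English
            out.append(tok)
    return out

print(moh_map("this is pyaar nahi hai".split()))
# ['this', 'is', 'प्यार', 'नहीं', 'है'] -- ready for multilingual BERT / MuRIL
```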
【11】 Neuro-Symbolic Forward Reasoning Link: https://arxiv.org/abs/2110.09383
Authors: Hikaru Shindo, Devendra Singh Dhami, Kristian Kersting Affiliations: AI and Machine Learning Group, Dept. of Computer Science, TU Darmstadt, Germany; Centre for Cognitive Science, TU Darmstadt, Germany; Hessian Center for AI (hessian.AI), Darmstadt, Germany Note: Preprint Abstract: Reasoning is an essential part of human intelligence and thus has been a long-standing goal in artificial intelligence research. With the recent success of deep learning, incorporating reasoning with deep learning systems, i.e., neuro-symbolic AI, has become a major field of interest. We propose the Neuro-Symbolic Forward Reasoner (NSFR), a new approach for reasoning tasks taking advantage of differentiable forward-chaining using first-order logic. The key idea is to combine differentiable forward-chaining reasoning with object-centric (deep) learning. Differentiable forward-chaining reasoning computes logical entailments smoothly, i.e., it deduces new facts from given facts and rules in a differentiable manner. The object-centric learning approach factorizes raw inputs into representations in terms of objects. Thus, it allows us to provide a consistent framework to perform the forward-chaining inference from raw inputs. NSFR factorizes the raw inputs into object-centric representations, converts them into probabilistic ground atoms, and finally performs differentiable forward-chaining inference using weighted rules. Our comprehensive experimental evaluations on object-centric reasoning data sets, 2D Kandinsky patterns and 3D CLEVR-Hans, and a variety of tasks show the effectiveness and advantage of our approach.
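A toy version of differentiable forward chaining can be written in a few lines: ground atoms carry probabilities, rule bodies are evaluated with soft conjunction (product), groundings are amalgamated with soft disjunction (max), and gradients flow back to the inputs. The rule, atoms, and weights below are invented for illustration and are far simpler than NSFR's machinery.

```python
import torch

# Rule: grandparent(x, z) <- parent(x, y), parent(y, z)
people = ["ann", "bob", "eve"]
# P(parent(i, j)) as produced by some perception network (illustrative values)
parent = torch.tensor([[0.0, 0.9, 0.0],
                       [0.0, 0.0, 0.8],
                       [0.0, 0.0, 0.0]], requires_grad=True)
rule_weight = torch.tensor(0.95, requires_grad=True)

# soft conjunction over the body, indexed (x, y, z): p(x,y) * p(y,z)
body = parent.unsqueeze(2) * parent.unsqueeze(0)
# soft disjunction (max) over the chain variable y, scaled by the rule weight
grandparent = rule_weight * body.max(dim=1).values

print(grandparent[0, 2])      # P(grandparent(ann, eve)) ~ 0.95 * 0.9 * 0.8
grandparent.sum().backward()  # gradients reach the probabilistic ground atoms
```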
【12】 Forecasting Nonverbal Social Signals during Dyadic Interactions with Generative Adversarial Neural Networks Link: https://arxiv.org/abs/2110.09378
Authors: Nguyen Tan Viet Tuyen, Oya Celiktutan Affiliations: Department of Engineering Abstract: We are approaching a future where social robots will progressively become widespread in many aspects of our daily lives, including education, healthcare, work, and personal use. All of such practical applications require that humans and robots collaborate in human environments, where social interaction is unavoidable. Along with verbal communication, successful social interaction is closely coupled with the interplay between nonverbal perception and action mechanisms, such as observation of gaze behaviour and following their attention, coordinating the form and function of hand gestures. Humans perform nonverbal communication in an instinctive and adaptive manner, with no effort. For robots to be successful in our social landscape, they should therefore engage in social interactions in a humanlike way, with increasing levels of autonomy. In particular, nonverbal gestures are expected to endow social robots with the capability of emphasizing their speech, or showing their intentions. Motivated by this, our research sheds light on modeling human behaviors in social interactions, specifically, forecasting human nonverbal social signals during dyadic interactions, with an overarching goal of developing robotic interfaces that can learn to imitate human dyadic interactions. Such an approach will ensure the messages encoded in the robot gestures could be perceived by interacting partners in a facile and transparent manner, which could help improve the interacting partner's perception and enhance the outcomes of the social interaction.
【13】 Understanding Dimensional Collapse in Contrastive Self-supervised Learning Link: https://arxiv.org/abs/2110.09348
Authors: Li Jing, Pascal Vincent, Yann LeCun, Yuandong Tian Affiliations: Facebook AI Research Note: 15 pages, 10 figures Abstract: Self-supervised visual representation learning aims to learn useful representations without relying on human annotations. Joint-embedding approaches are based on maximizing the agreement between embedding vectors from different views of the same image. Various methods have been proposed to solve the collapsing problem where all embedding vectors collapse to a trivial constant solution. Among these methods, contrastive learning prevents collapse via negative sample pairs. It has been shown that non-contrastive methods suffer from a lesser collapse problem of a different nature: dimensional collapse, whereby the embedding vectors end up spanning a lower-dimensional subspace instead of the entire available embedding space. Here, we show that dimensional collapse also happens in contrastive learning. In this paper, we shed light on the dynamics at play in contrastive learning that lead to dimensional collapse. Inspired by our theory, we propose a novel contrastive learning method, called DirectCLR, which directly optimizes the representation space without relying on a trainable projector. Experiments show that DirectCLR outperforms SimCLR with a trainable linear projector on ImageNet.
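Under our reading of the abstract, DirectCLR applies the contrastive loss directly to a subvector of the backbone representation instead of a projector output. The sketch below shows that idea with a standard InfoNCE loss, plus a common way to diagnose dimensional collapse from the singular-value spectrum of the embeddings; the dimensions, temperature, and threshold are illustrative.

```python
import torch
import torch.nn.functional as F

def directclr_loss(r1, r2, d0=32, temp=0.1):
    """InfoNCE applied to the first d0 dims of the raw representation;
    the subvector replaces a trainable projector (illustrative reading)."""
    z1 = F.normalize(r1[:, :d0], dim=1)
    z2 = F.normalize(r2[:, :d0], dim=1)
    logits = z1 @ z2.T / temp                 # positives on the diagonal
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

r1, r2 = torch.randn(64, 128), torch.randn(64, 128)  # two augmented views
print(directclr_loss(r1, r2))

# Diagnosing dimensional collapse: inspect the singular-value spectrum of
# the embedding matrix; collapsed dimensions show up as ~zero singular values.
emb = torch.randn(512, 128) @ torch.diag(torch.cat([torch.ones(40),
                                                    1e-4 * torch.ones(88)]))
sv = torch.linalg.svdvals(emb - emb.mean(0))
print((sv > 1e-2).sum().item(), "of", len(sv), "dimensions effectively used")
```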
【14】 Intrusion-Free Graph Mixup Link: https://arxiv.org/abs/2110.09344
Authors: Hongyu Guo, Yongyi Mao Affiliations: National Research Council Canada, Ottawa; School of Electrical Engineering & Computer Science, University of Ottawa, Ottawa, Ontario Abstract: We present a simple and yet effective interpolation-based regularization technique to improve the generalization of Graph Neural Networks (GNNs). We leverage the recent advances in the Mixup regularizer for vision and text, where random sample pairs and their labels are interpolated to create synthetic samples for training. Unlike images or natural sentences, which embrace a grid or linear sequence format, graphs have arbitrary structure and topology, which play a vital role in the semantic information of a graph. Consequently, even simply deleting or adding one edge of a graph can dramatically change its semantic meaning. This makes interpolating graph inputs very challenging, because mixing random graph pairs may naturally create graphs with identical structure but with different labels, causing the manifold intrusion issue. To cope with this obstacle, we propose the first input mixing schema for Mixup on graphs. We theoretically prove that our mixing strategy can recover the source graphs from the mixed graph, and guarantees that the mixed graphs are manifold-intrusion free. We also empirically show that our method can effectively regularize graph classification learning, resulting in superior predictive accuracy over popular graph augmentation baselines.
【15】 COVIDRead: A Large-scale Question Answering Dataset on COVID-19 Link: https://arxiv.org/abs/2110.09321
Authors: Tanik Saikh, Sovan Kumar Sahoo, Asif Ekbal, Pushpak Bhattacharyya Affiliations: Department of Computer Science and Engineering, Indian Institute of Technology Patna, Bihta, Patna, India; Indian Institute of Technology Bombay, Mumbai, Maharashtra, India Note: 20 pages, 7 figures Abstract: During this pandemic situation, extracting any relevant information related to COVID-19 will be immensely beneficial to the community at large. In this paper, we present a very important resource, COVIDRead, a Stanford Question Answering Dataset (SQuAD)-like dataset of over 100k question-answer pairs. The dataset consists of Context-Answer-Question triples. Primarily the questions from the context are constructed in an automated way. After that, the system-generated questions are manually checked by human annotators. This is a precious resource that could serve many purposes, ranging from common people's queries regarding this very uncommon disease to managing articles by editors/associate editors of a journal. We establish several end-to-end neural network based baseline models that attain the lowest F1 of 32.03% and the highest F1 of 37.19%. To the best of our knowledge, we are the first to provide a QA dataset of such volume on COVID-19. This dataset creates a new avenue for research on COVID-19 by providing a benchmark dataset and a baseline model.
【16】 Mixed Reality using Illumination-aware Gradient Mixing in Surgical Telepresence: Enhanced Multi-layer Visualization Link: https://arxiv.org/abs/2110.09318
Authors: Nirakar Puri, Abeer Alsadoon, P. W. C. Prasad, Nada Alsalami, Tarik A. Rashid Abstract: Background and aim: Surgical telepresence using augmented perception has been applied, but mixed reality is still being researched and is only theoretical. The aim of this work is to propose a solution that improves visualization in the final merged video by producing globally consistent videos when the intensity of illumination in the input source and target videos varies. Methodology: The proposed system uses an enhanced multi-layer visualization with illumination-aware gradient mixing using an Illumination-Aware Video Composition algorithm. Particle Swarm Optimization is used to find the best sample pair from the foreground and background regions, and image pixel correlation is used to estimate the alpha matte. Particle Swarm Optimization helps to recover the original colour and depth of the unknown pixels in the unknown region. Result: Our results showed improved accuracy, obtained by reducing the mean squared error in selecting the best sample pair for the unknown region in each of 10 samples for bowel, jaw and breast. The amount of this reduction is 16.48% relative to the state-of-the-art system. As a result, visibility accuracy is improved from 89.4% to 97.7%, which helps to keep the hand visible even under differing light. Conclusion: The illumination effect and alpha pixel correlation improve visualization accuracy, produce globally consistent composition results, and maintain temporal coherency when compositing two videos with high and inverse illumination effects. In addition, this paper provides a solution for selecting the best sample pair for the unknown region to obtain the original colour and depth.
【17】 Energon: Towards Efficient Acceleration of Transformers Using Dynamic Sparse Attention Link: https://arxiv.org/abs/2110.09310
Authors: Zhe Zhou, Junlin Liu, Zhenyu Gu, Guangyu Sun Affiliations: Peking University Abstract: In recent years, transformer models have revolutionized Natural Language Processing (NLP) and also show promising performance on Computer Vision (CV) tasks. Despite their effectiveness, transformers' attention operations are hard to accelerate due to complicated data movement and quadratic computational complexity, prohibiting real-time inference on resource-constrained edge-computing platforms. To tackle this challenge, we propose Energon, an algorithm-architecture co-design approach that accelerates various transformers using dynamic sparse attention. With the observation that attention results only depend on a few important query-key pairs, we propose a multi-round filtering algorithm to dynamically identify such pairs at runtime. We adopt low bitwidth in each filtering round and only use high-precision tensors in the attention stage to reduce overall complexity. By this means, we significantly mitigate the computational cost with negligible accuracy loss. To enable such an algorithm with lower latency and better energy-efficiency, we also propose an Energon co-processor architecture. Elaborated pipelines and specialized optimizations jointly boost the performance and reduce power consumption. Extensive experiments on both NLP and CV benchmarks demonstrate that Energon achieves $161\times$ and $8.4\times$ geo-mean speedup and up to $10^4\times$ and $10^3\times$ energy reduction compared with an Intel Xeon 5220 CPU and an NVIDIA V100 GPU. Compared to the state-of-the-art attention accelerators SpAtten and $A^3$, Energon also achieves $1.7\times$ and $1.25\times$ speedup and $1.6\times$ and $1.5\times$ higher energy efficiency.
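The multi-round filtering idea can be mimicked in a few lines: score query-key pairs with aggressively quantised operands, keep only the top-k keys per query, then run full-precision attention on the survivors. The quantisation, k, and shapes below are illustrative, not Energon's actual bit-widths or hardware pipeline.

```python
import torch

def filtered_attention(q, k, v, keep=8):
    """Two-stage sparse attention: coarse low-precision filtering, then
    exact attention restricted to the selected query-key pairs."""
    # round 1: coarse scores using aggressively quantised operands
    q_lo = (q * 4).round() / 4               # mimic low-bitwidth filtering
    k_lo = (k * 4).round() / 4
    coarse = q_lo @ k_lo.T                   # (nq, nk) approximate scores
    idx = coarse.topk(keep, dim=-1).indices  # important keys per query
    # round 2: full-precision attention on the survivors only
    k_sel, v_sel = k[idx], v[idx]            # (nq, keep, d)
    scores = (q.unsqueeze(1) * k_sel).sum(-1) / k.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1).unsqueeze(-1).mul(v_sel).sum(1)

q, k, v = (torch.randn(16, 64) for _ in range(3))
out = filtered_attention(q, k, v)            # (16, 64); most pairs skipped
```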
【18】 A Prior Guided Adversarial Representation Learning and Hypergraph Perceptual Network for Predicting Abnormal Connections of Alzheimer's Disease Link: https://arxiv.org/abs/2110.09302
Authors: Qiankun Zuo, Baiying Lei, Shuqiang Wang, Yong Liu, Bingchuan Wang, Yanyan Shen Affiliations: Shenzhen University, China; Yong Liu is with the Gaoling School of Artificial Intelligence, Renmin University of China Abstract: Alzheimer's disease is characterized by alterations of the brain's structural and functional connectivity during its progressive degenerative processes. Existing auxiliary diagnostic methods have accomplished the classification task, but few of them can accurately evaluate the changing characteristics of brain connectivity. In this work, a prior guided adversarial representation learning and hypergraph perceptual network (PGARL-HPN) is proposed to predict abnormal brain connections using triple-modality medical images. Concretely, a prior distribution from the anatomical knowledge is estimated to guide multimodal representation learning using an adversarial strategy. Also, the pairwise collaborative discriminator structure is further utilized to narrow the difference of representation distribution. Moreover, the hypergraph perceptual network is developed to effectively fuse the learned representations while establishing high-order relations within and between multimodal images. Experimental results demonstrate that the proposed model outperforms other related methods in analyzing and predicting Alzheimer's disease progression. More importantly, the identified abnormal connections are partly consistent with previous neuroscience discoveries. The proposed model can evaluate characteristics of abnormal brain connections at different stages of Alzheimer's disease, which is helpful for cognitive disease study and early treatment.
【19】 The AI Triplet: Computational, Conceptual, and Mathematical Representations in AI Education Link: https://arxiv.org/abs/2110.09290
Authors: Maithilee Kunda Affiliations: Department of Computer Science, Vanderbilt University, Nashville, TN, USA Abstract: Expertise in AI requires integrating computational, conceptual, and mathematical knowledge and representations. We propose this trifecta as an "AI triplet," similar in spirit to the "chemistry triplet" that has influenced the past four decades of chemistry education. We describe a rationale for this triplet and how it maps onto topics commonly taught in AI courses, such as tree search and gradient descent. Also, similar to impacts of the chemistry triplet on chemistry education, we suggest an initial example of how considering the AI triplet may help pinpoint obstacles in AI education, i.e., how student learning might be scaffolded to approach expert-level flexibility in moving between the points of the triplet.
【20】 SafeAccess: An Intelligent System to Make Smart Homes Safer and Americans with Disability Act Compliant Link: https://arxiv.org/abs/2110.09273
Authors: Shahinur Alam Affiliations: Department of Electrical and Computer Engineering, The University of Memphis, Engineering Science Building, Memphis, TN Abstract: Smart homes are becoming ubiquitous, but they are not Americans with Disability Act (ADA) compliant. Smart homes equipped with ADA-compliant appliances and services are critical for people with disabilities (i.e., visual impairments and limited mobility) to improve independence, safety, and quality of life. Despite all advancements in smart home technologies, some fundamental design and implementation issues remain. For example, people with disabilities often feel insecure about responding when someone knocks on the door or rings the doorbell. In this paper, we present an intelligent system called "SafeAccess" to build safer and ADA-compliant premises (e.g. smart homes, offices). The key functionalities of SafeAccess are: 1) monitoring the inside/outside of premises and identifying incoming people; 2) providing users relevant information to assess incoming threats (e.g., burglary, robbery) and ongoing crimes; 3) allowing users to grant safe access to homes for friends/family members. We have addressed several technical and research challenges: developing models to detect and recognize persons/activities, generating image descriptions, and designing an ADA-compliant end-to-end system. In addition, we have designed a prototype smart door showcasing the proof-of-concept. The premises are expected to be equipped with cameras placed in strategic locations that facilitate monitoring the premises 24/7 to identify incoming persons and to generate image descriptions. The system generates a pre-structured message from the image description to assess incoming threats and immediately notify the users. The completeness and generalization of models have been ensured through a rigorous quantitative evaluation. The users' satisfaction and the reliability of the system have been measured using the PYTHEIA scale and were rated excellent (internal consistency: Cronbach's alpha is 0.784; test-retest reliability is 0.939).
【21】 Rebuilding Trust: Queer in AI Approach to Artificial Intelligence Risk Management Link: https://arxiv.org/abs/2110.09271
Authors: Ashwin, William Agnew, Juan Pajaro, Arjun Subramonian Note: Queer in AI Response to NIST AI Risk Management Framework Abstract: AI, machine learning, and data science methods are already pervasive in our society and technology, affecting all of our lives in many subtle ways. Trustworthy AI has become an important topic because trust in AI systems and their creators has been lost, or was never present in the first place. Researchers, corporations, and governments have long and painful histories of excluding marginalized groups from technology development, deployment, and oversight. As a direct result of this exclusion, these technologies have long histories of being less useful or even harmful to minoritized groups. This infuriating history illustrates that industry cannot be trusted to self-regulate and why trust in commercial AI systems and development has been lost. We argue that any AI development, deployment, and monitoring framework that aspires to trust must incorporate both feminist, non-exploitative participatory design principles and strong, outside, and continual monitoring and testing. We additionally explain the importance of considering aspects of trustworthiness beyond just transparency, fairness, and accountability, specifically, to consider justice and shifting power to the people and disempowered as core values to any trustworthy AI system. Creating trustworthy AI starts by funding, supporting, and empowering groups like Queer in AI so the field of AI has the diversity and inclusion to credibly and effectively develop trustworthy AI. Through our years of work and advocacy, we have developed expert knowledge around questions of if and how gender, sexuality, and other aspects of identity should be used in AI systems and how harms along these lines should be mitigated. Based on this, we discuss a gendered approach to AI, and further propose a queer epistemology and analyze the benefits it can bring to AI.
【22】 Single Layer Predictive Normalized Maximum Likelihood for Out-of-Distribution Detection Link: https://arxiv.org/abs/2110.09246
Authors: Koby Bibas, Meir Feder, Tal Hassner Affiliations: School of Electrical Engineering, Tel Aviv University; Facebook AI Note: NeurIPS 2021 Abstract: Detecting out-of-distribution (OOD) samples is vital for developing machine learning based models for critical safety systems. Common approaches for OOD detection assume access to some OOD samples during training, which may not be available in a real-life scenario. Instead, we utilize the predictive normalized maximum likelihood (pNML) learner, in which no assumptions are made about the tested input. We derive an explicit expression for the pNML and its generalization error, denoted as the regret, for a single-layer neural network (NN). We show that this learner generalizes well when (i) the test vector resides in a subspace spanned by the eigenvectors associated with the large eigenvalues of the empirical correlation matrix of the training data, or (ii) the test sample is far from the decision boundary. Furthermore, we describe how to efficiently apply the derived pNML regret to any pretrained deep NN, by employing the explicit pNML for the last layer, followed by the softmax function. Applying the derived regret to deep NNs requires neither additional tunable parameters nor extra data. We extensively evaluate our approach on 74 OOD detection benchmarks using DenseNet-100, ResNet-34, and WideResNet-40 models trained with CIFAR-100, CIFAR-10, SVHN, and ImageNet-30, showing a significant improvement of up to 15.6% over recent leading methods.
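The generic pNML recipe behind the last-layer procedure can be sketched by brute force: for each candidate label, refit the (single) layer on the training set plus the test point carrying that label, take the refitted model's probability of the label, and normalise; the log of the normaliser is the regret. This follows the pNML definition rather than the paper's closed-form expression, and the data below are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))                       # training features
y = (X[:, 0] + 0.3 * rng.standard_normal(200) > 0).astype(int)
x_test = rng.standard_normal(5)                         # test sample

scores = []
for label in (0, 1):
    # "genie": refit the single layer with the test point labelled `label`
    Xa = np.vstack([X, x_test])
    ya = np.append(y, label)
    clf = LogisticRegression(max_iter=1000).fit(Xa, ya)
    scores.append(clf.predict_proba(x_test.reshape(1, -1))[0, label])

regret = np.log(sum(scores))       # larger regret suggests an OOD sample
p_pnml = np.array(scores) / sum(scores)
print(p_pnml, regret)
```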
【23】 Value Alignment: A Formal Approach Link: https://arxiv.org/abs/2110.09240
Authors: Carles Sierra, Nardine Osman, Pablo Noriega, Jordi Sabater-Mir, Antoni Perelló-Moragues Affiliations: Artificial Intelligence Research Institute (IIIA-CSIC), Bellaterra, Catalonia Note: accepted paper at the Responsible Artificial Intelligence Agents Workshop of the 18th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2019) Abstract: Value alignment is one of the principles that should govern autonomous AI systems. It essentially states that a system's goals and behaviour should be aligned with human values. But how to ensure value alignment? In this paper we first provide a formal model to represent values through preferences, and ways to compute value aggregations, i.e. preferences with respect to a group of agents and/or preferences with respect to sets of values. Value alignment is then defined, and computed, for a given norm with respect to a given value through the increase/decrease that it produces in the preferences of future states of the world. We focus on norms as it is norms that govern behaviour, and as such, the alignment of a given system with a given value will be dictated by the norms the system follows.
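A toy rendering of the alignment computation, as we read the abstract: a norm restricts the transitions available from a state, and its alignment with a value is the preference gain of the reachable future states. The states, preference numbers, and norm below are invented for illustration.

```python
# Preference of each state under one value (e.g. fairness); invented numbers.
pref_fairness = {"s0": 0.2, "s1": 0.8, "s2": 0.4}

# Successor states available with and without the norm in force;
# here the norm forbids moving from s0 to s2.
transitions = {
    "no_norm": {"s0": ["s1", "s2"]},
    "norm":    {"s0": ["s1"]},
}

def alignment(norm, state, pref):
    """Mean preference shift of reachable future states vs the current one."""
    succ = transitions[norm][state]
    return sum(pref[s] for s in succ) / len(succ) - pref[state]

print(alignment("norm", "s0", pref_fairness))     # 0.6: norm raises preference
print(alignment("no_norm", "s0", pref_fairness))  # 0.4: less aligned baseline
```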
【24】 Accountability in AI: From Principles to Industry-specific Accreditation Link: https://arxiv.org/abs/2110.09232
Authors: Chris Percy, Simo Dragicevic, Sanjoy Sarkar, Artur S. d'Avila Garcez Note: 24 pages, 2 figures, 2 tables Abstract: Recent AI-related scandals have shed a spotlight on accountability in AI, with increasing public interest and concern. This paper draws on literature from public policy and governance to make two contributions. First, we propose an AI accountability ecosystem as a useful lens on the system, with different stakeholders requiring and contributing to specific accountability mechanisms. We argue that the present ecosystem is unbalanced, with a need for improved transparency via AI explainability and adequate documentation and process formalisation to support internal audit, leading up eventually to external accreditation processes. Second, we use a case study in the gambling sector to illustrate, in a subset of the overall ecosystem, the need for industry-specific accountability principles and processes. We define and evaluate critically the implementation of key accountability principles in the gambling industry, namely addressing algorithmic bias and model explainability, before concluding and discussing directions for future work based on our findings. Keywords: Accountability, Explainable AI, Algorithmic Bias, Regulation.
【25】 Machine Learning Featurizations for AI Hacking of Political Systems Link: https://arxiv.org/abs/2110.09231
Authors: Nathan E. Sanders, Bruce Schneier Affiliations: Science and International Affairs, Harvard Kennedy School, USA Note: 19 pages, 2 figures Abstract: What would the inputs be to a machine whose output is the destabilization of a robust democracy, or whose emanations could disrupt the political power of nations? In the recent essay "The Coming AI Hackers," Schneier (2021) proposed a future application of artificial intelligences to discover, manipulate, and exploit vulnerabilities of social, economic, and political systems at speeds far greater than humans' ability to recognize and respond to such threats. This work advances the concept by applying to it theory from machine learning, hypothesizing some possible "featurization" (input specification and transformation) frameworks for AI hacking. Focusing on the political domain, we develop graph and sequence data representations that would enable the application of a range of deep learning models to predict attributes and outcomes of political systems. We explore possible data models, datasets, predictive tasks, and actionable applications associated with each framework. We speculate about the likely practical impact and feasibility of such models, and conclude by discussing their ethical implications.
【26】 Governance and Communication of Algorithmic Decision Making: A Case Study on Public Sector Link: https://arxiv.org/abs/2110.09226
Authors: Erik Jonk, Deniz Iren Affiliations: Information Sciences, Open Universiteit, Heerlen, the Netherlands Note: 10 pages Abstract: Algorithmic Decision Making (ADM) has permeated all aspects of society. Government organizations are also affected by this trend. However, the use of ADM has been getting negative attention from the public, media, and interest groups. There are little to no actionable guidelines for government organizations to create positive impact through ADM. In this case study, we examined eight municipal organizations in the Netherlands regarding their actual and intended use of ADM. We interviewed key personnel and decision makers. Our results show that municipalities mostly use ADM in an ad hoc manner, and they have not systematically defined or institutionalized a data science process yet. They operate risk-averse, and they clearly express the need for cooperation, guidance, and even supervision at the national level. Third parties, mostly commercial, are often involved in the ADM development lifecycle, without systematic governance. Communication on the use of ADM is generally responsive to negative attention from the media and public. There are strong indications of the need for an ADM governance framework. In this paper, we present our findings in detail, along with actionable insights on governance, communication, and performance evaluation of ADM systems.
【27】 Noise-Resilient Ensemble Learning using Evidence Accumulation Clustering Link: https://arxiv.org/abs/2110.09212
Authors: Gaëlle Candel, David Naccache Affiliations: Worldline TSS Labs, Paris; Département d'informatique de l'ENS, ENS, CNRS, PSL University, Paris Note: 12 pages, submitted and accepted to ANTIC-2021 (International Conference on Advanced Network Technologies and Intelligent Computing) Abstract: Ensemble learning methods combine multiple algorithms performing the same task to build a group with superior quality. These systems are well adapted to the distributed setup, where each peer or machine of the network hosts one algorithm and communicates its results to its peers. Ensemble learning methods are naturally resilient to the absence of several peers thanks to ensemble redundancy. However, the network can be corrupted, altering the prediction accuracy of a peer, which has a deleterious effect on the ensemble quality. In this paper, we propose a noise-resilient ensemble classification method, which helps to improve accuracy and correct random errors. The approach is inspired by Evidence Accumulation Clustering, adapted to classification ensembles. We compared it to the naive voter model over four multi-class datasets. Our model showed greater resilience, allowing us to recover predictions under a very high noise level. In addition, as the method is based on evidence accumulation clustering, it is highly flexible, as it can combine classifiers with different label definitions.
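A small simulation contrasts naive majority voting with an evidence-accumulation style combination: pairwise "same label" evidence is accumulated into a co-association matrix across the ensemble, and each sample is then relabelled from its strongly co-associated peers. The data, noise model, and threshold are illustrative, not the paper's protocol.

```python
import numpy as np

rng = np.random.default_rng(1)
true = np.repeat([0, 1, 2], 20)                  # 60 samples, 3 classes
n_clf, noise = 15, 0.35
# noisy ensemble: each classifier's prediction is corrupted with prob `noise`
preds = np.where(rng.random((n_clf, 60)) < noise,
                 rng.integers(0, 3, (n_clf, 60)), true)

# naive voter baseline: per-sample majority over classifiers
vote = np.apply_along_axis(lambda c: np.bincount(c, minlength=3).argmax(),
                           0, preds)

# evidence accumulation: C[i, j] = fraction of classifiers giving samples
# i and j the same label
C = (preds[:, :, None] == preds[:, None, :]).mean(axis=0)

# consensus labelling: each sample adopts the majority vote among the
# samples it co-associates with most strongly
consensus = np.empty(60, dtype=int)
for i in range(60):
    support = C[i] > 0.5
    consensus[i] = np.bincount(vote[support], minlength=3).argmax()

print("majority vote accuracy:", (vote == true).mean())
print("evidence-accumulation accuracy:", (consensus == true).mean())
```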
【28】 On the Completeness and Complexity of the Lifted Dynamic Junction Tree Algorithm Link: https://arxiv.org/abs/2110.09197
Authors: Marcel Gehrke Affiliations: Institute of Information Systems, University of Lübeck, Lübeck Note: StaRAI 2021 Abstract: Lifted inference allows performing inference in polynomial time w.r.t. domain sizes. For a lifted algorithm, completeness investigates model classes for which the algorithm is guaranteed to compute a lifted solution. We contribute, to the best of our knowledge, the first completeness and complexity analysis for a temporal lifted algorithm, the so-called lifted dynamic junction tree algorithm (LDJT). To treat time as a first-class citizen, LDJT introduces some constraints. Given these constraints, we analyse the classes of liftable models. Further, we show that LDJT has many advantages from a complexity point of view compared to a propositional temporal inference algorithm w.r.t. domain sizes. Therefore, LDJT advances the number of models for which inference tasks can be solved in reasonable time, not only from a practical point of view but also from a theoretical point of view.
【29】 MDP Abstraction with Successor Features Link: https://arxiv.org/abs/2110.09196
Authors: Dongge Han, Michael Wooldridge, Sebastian Tschiatschek Affiliations: Department of Computer Sciences, University of Oxford, United Kingdom; University of Vienna, Austria Abstract: Abstraction plays an important role in the generalisation of knowledge and skills, and is key to sample-efficient learning and planning. For many complex problems, an abstract plan can be formed first, which is then instantiated by filling in the necessary low-level details. Often, such an abstract plan generalizes well to related new problems. We study abstraction in the context of reinforcement learning, in which agents may perform state or temporal abstractions. Temporal abstractions, aka options, represent temporally-extended actions in the form of option policies. However, typically acquired option policies cannot be directly transferred to new environments due to changes in the state space or transition dynamics. Furthermore, many existing state abstraction schemes ignore the correlation between state and temporal abstraction. In this work, we propose successor abstraction, a novel abstraction scheme building on successor features. This includes an algorithm for encoding and instantiation of abstract options across different environments, and a state abstraction mechanism based on the abstract options. Our successor abstraction allows us to learn abstract environment models with semantics that are transferable across different environments through encoding and instantiation of abstract options. Empirically, we achieve better transfer and improved performance on a set of benchmark tasks as compared to relevant state-of-the-art baselines.
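For readers unfamiliar with the successor features the scheme builds on: for a fixed policy, psi(s) = phi(s) + gamma * E[psi(s')], and any reward of the form r(s) = phi(s) . w yields values V(s) = psi(s) . w, so one psi transfers across tasks. A minimal closed-form computation on an invented MDP:

```python
import numpy as np

n_states, d, gamma = 4, 3, 0.9
rng = np.random.default_rng(0)
phi = rng.standard_normal((n_states, d))              # state features
P = rng.dirichlet(np.ones(n_states), size=n_states)   # policy-induced transitions

# successor features in closed form: psi = (I - gamma * P)^{-1} phi
psi = np.linalg.solve(np.eye(n_states) - gamma * P, phi)

# the same psi is reused across tasks (different reward weight vectors w)
for w in (np.array([1.0, 0.0, 0.0]), np.array([0.0, -1.0, 2.0])):
    values = psi @ w                                  # V(s) = psi(s) . w
    print(values.round(2))
```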
【30】 A Formalisation of Abstract Argumentation in Higher-Order Logic 标题:高阶逻辑中抽象论证的一种形式化 链接:https://arxiv.org/abs/2110.09174
作者:Alexander Steen,David Fuenmayor 机构:University of Luxembourg, Department of Computer Science, avenue de la Fonte, L-, Esch-sur-Alzette, Luxembourg 备注:33 pages, 8 figures; submitted article 摘要:我们提出了一种基于经典高阶逻辑编码的抽象论证框架表示方法。这为使用交互式和自动推理工具的抽象论证框架的计算机辅助评估提供了统一的框架。这使得元理论属性的形式化分析和验证成为可能,并且能够灵活地生成与著名论证语义相关的扩展和标签。 摘要:We present an approach for representing abstract argumentation frameworks based on an encoding into classical higher-order logic. This provides a uniform framework for computer-assisted assessment of abstract argumentation frameworks using interactive and automated reasoning tools. This enables the formal analysis and verification of meta-theoretical properties as well as the flexible generation of extensions and labellings with respect to well-known argumentation semantics.
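为直观理解被编码的语义(论文在高阶逻辑中做形式化与机器验证,这里仅用Python做外延式穷举示意):稳定扩展是内部无攻击、且攻击所有外部论证的论证子集。

```python
from itertools import combinations

def stable_extensions(args, attacks):
    """args: 论证集合; attacks: {(a, b)} 表示 a 攻击 b。"""
    exts = []
    for r in range(len(args) + 1):
        for E in map(set, combinations(args, r)):
            conflict_free = not any((a, b) in attacks for a in E for b in E)
            attacks_rest = all(any((a, b) in attacks for a in E)
                               for b in set(args) - E)
            if conflict_free and attacks_rest:
                exts.append(E)
    return exts

# a 与 b 互相攻击, b 还攻击 c: 稳定扩展为 {b} 与 {a, c}
print(stable_extensions({"a", "b", "c"}, {("a", "b"), ("b", "a"), ("b", "c")}))
```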
【31】 Projected Model Counting: Beyond Independent Support 标题:投影模型计数:超越独立支持 链接:https://arxiv.org/abs/2110.09171
作者:Jiong Yang,Supratik Chakraborty,Kuldeep S. Meel 机构: School of Computing, National University of Singapore, Indian Institute of Technology, Bombay 摘要:在过去十年中,人们对投影模型计数实用技术的兴趣激增。尽管取得了重大进展,但性能的可扩展性仍然是该领域的致命弱点。现代计数器中使用的一个关键思想是在独立支持集上对投影模型进行计数,它通常只是投影集(即我们原本想要投影的变量集)的一个小子集。虽然这一想法对提升可扩展性是有效的,但在投影集之外的变量上投影计数是否有益的问题尚未被探讨。在本文中,我们研究了这个问题,并表明与直觉相反,在投影集之外的变量上投影可能是有益的。在诸如二值化神经网络验证、信息流量化、电网可靠性等应用中,投影模型计数的一个良好上界通常就足够了。我们表明,在一些这样的情况下,我们可以确定一组变量,称为上界支持集(UBS),它不一定是投影集的子集,但在UBS上对投影模型计数可保证给出真实投影模型计数的上界。理论上,UBS可以比最小的独立支持集小指数级。我们的实验表明,即便不考虑这一点,基于UBS的投影计数也可以比基于独立支持集的投影计数更高效,同时产生质量非常高的界。基于广泛的实验,我们发现基于UBS的投影计数可以求解许多超出最先进的基于独立支持集的投影模型计数器能力范围的问题实例。 摘要:The past decade has witnessed a surge of interest in practical techniques for projected model counting. Despite significant advancements, however, performance scaling remains the Achilles' heel of this field. A key idea used in modern counters is to count models projected on an independent support that is often a small subset of the projection set, i.e. the original set of variables on which we wanted to project. While this idea has been effective in scaling performance, the question of whether it can be beneficial to count models projected on variables beyond the projection set has not been explored. In this paper, we study this question and show that contrary to intuition, it can be beneficial to project on variables beyond the projection set. In applications such as verification of binarized neural networks, quantification of information flow, reliability of power grids etc., a good upper bound of the projected model count often suffices. We show that in several such cases, we can identify a set of variables, called upper bound support (UBS), that is not necessarily a subset of the projection set, and yet counting models projected on UBS guarantees an upper bound of the true projected model count. Theoretically, a UBS can be exponentially smaller than the smallest independent support. Our experiments show that even otherwise, UBS-based projected counting can be more efficient than independent support-based projected counting, while yielding bounds of very high quality. Based on extensive experiments, we find that UBS-based projected counting can solve many problem instances that are beyond the reach of a state-of-the-art independent support-based projected model counter.
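下面用一个可运行的朴素枚举示意投影模型计数,并演示“在更大的变量集上计数给出上界”这一直觉(注意:论文中的UBS不必是投影集的超集,此处仅取超集作为最简单的说明;子句编码与函数名均为示例假设):

```python
from itertools import product

def projected_count(clauses, n_vars, proj):
    """子句用带符号整数表示文字, 如 (1, -2) 表示 x1 ∨ ¬x2;
    返回公式的模型在 proj 变量上的不同投影数。"""
    seen = set()
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            seen.add(tuple(bits[v - 1] for v in proj))
    return len(seen)

f = [(1, 2), (-1, 3)]                 # (x1 ∨ x2) ∧ (¬x1 ∨ x3)
print(projected_count(f, 3, [1]))     # 真实投影计数: 2
print(projected_count(f, 3, [1, 3]))  # 在更多变量上计数: 3, 是上界
```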
【32】 A Dimensionality Reduction Approach for Convolutional Neural Networks 标题:卷积神经网络的一种降维方法 链接:https://arxiv.org/abs/2110.09163
作者:Laura Meneghetti,Nicola Demo,Gianluigi Rozza 机构:Mathematics Area, mathLab, SISSA, via Bonomea , I-, Trieste, Italy 摘要:本文的重点是将经典的模型降阶技术,如主动子空间和本征正交分解,应用于深层神经网络。通过将上述降维技术与输入输出映射(如多项式混沌扩展和前馈神经网络)相结合,我们提出了一种通用的方法来减少预训练网络的层数。压缩现有卷积神经网络结构的必要性是由于其在具有特定存储约束的嵌入式系统中的应用。我们的实验表明,在节省内存分配的同时,所得到的约化网络可以达到与被检测的原始卷积神经网络相似的精度水平。 摘要:The focus of this paper is the application of classical model order reduction techniques, such as Active Subspaces and Proper Orthogonal Decomposition, to Deep Neural Networks. We propose a generic methodology to reduce the number of layers of a pre-trained network by combining the aforementioned techniques for dimensionality reduction with input-output mappings, such as Polynomial Chaos Expansion and Feedforward Neural Networks. The necessity of compressing the architecture of an existing Convolutional Neural Network is motivated by its application in embedded systems with specific storage constraints. Our experiment shows that the reduced nets obtained can achieve a level of accuracy similar to the original Convolutional Neural Network under examination, while saving in memory allocation.
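作为方法直觉的极简示意(非论文代码):对某一层在一批输入上的激活快照做SVD,取累计能量达到阈值的前k个模态作为投影基,即本征正交分解(POD)式的降维;随后可用一个小型输入输出映射(如多项式混沌展开或小型前馈网络)替代被截断的后续层。阈值 energy 等参数为示例假设。

```python
import numpy as np

def pod_basis(activations, energy=0.99):
    """activations: (n_samples, n_features) 某层输出的快照矩阵。"""
    X = activations - activations.mean(axis=0)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(ratio, energy)) + 1   # 能量占比达标的最小模态数
    return Vt[:k].T                                # (n_features, k) 投影矩阵

# 用法: z = act @ W 得到低维表示, 再训练一个小映射逼近原网络的后续输出。
```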
【33】 Newsalyze: Effective Communication of Person-Targeting Biases in News Articles 标题:Newsalyze:有效传播新闻文章中针对人物的偏见 链接:https://arxiv.org/abs/2110.09158
作者:Felix Hamborg,Kim Heinser,Anastasia Zhukova,Karsten Donnay,Bela Gipp 机构:Heidelberg Academy of Sciences and Humanities, Germany, University of Konstanz, Germany, University of Zurich, Switzerland, University of Wuppertal, Germany 摘要:媒体偏见及其极端形式(假新闻)可以决定性地影响公众舆论。特别是在报道政策问题时,倾向性的新闻报道可能会强烈影响社会决策,例如在民主选举中。我们的论文为解决这个问题做出了三点贡献。首先,我们提出了一个偏见识别系统,它结合了自然语言理解领域的最新方法。第二,我们设计了对偏见敏感的可视化,向非专业新闻消费者传达新闻文章中的偏见。第三,我们的主要贡献是一项大规模用户研究,它在近似日常新闻消费的环境中测量偏见意识,例如,我们向受访者提供新闻概览和单篇文章。我们不仅测量了可视化对受访者偏见意识的影响,还通过联合(conjoint)设计精确定位了可视化各组成部分的影响。我们的偏见敏感概览强烈且显著地提高了受访者的偏见意识。我们的研究进一步表明,由于个别新闻文章中存在实质性偏见,我们的内容驱动识别方法能够检测到立场相似的新闻文章组。相比之下,所综述的先前工作仅能提升偏见的可见度,例如通过区分左翼和右翼媒体。 摘要:Media bias and its extreme form, fake news, can decisively affect public opinion. Especially when reporting on policy issues, slanted news coverage may strongly influence societal decisions, e.g., in democratic elections. Our paper makes three contributions to address this issue. First, we present a system for bias identification, which combines state-of-the-art methods from natural language understanding. Second, we devise bias-sensitive visualizations to communicate bias in news articles to non-expert news consumers. Third, our main contribution is a large-scale user study that measures bias-awareness in a setting that approximates daily news consumption, e.g., we present respondents with a news overview and individual articles. We not only measure the visualizations' effect on respondents' bias-awareness, but we can also pinpoint the effects on individual components of the visualizations by employing a conjoint design. Our bias-sensitive overviews strongly and significantly increase bias-awareness in respondents. Our study further suggests that our content-driven identification method detects groups of similarly slanted news articles due to substantial biases present in individual news articles. In contrast, the reviewed prior work merely facilitates the visibility of biases, e.g., by distinguishing left- and right-wing outlets.
【34】 Lifting DecPOMDPs for Nanoscale Systems -- A Work in Progress 标题:面向纳米级系统的提升DecPOMDP:一项进行中的工作 链接:https://arxiv.org/abs/2110.09152
作者:Tanya Braun,Stefan Fischer,Florian Lau,Ralf Möller 机构: Institute of Information Systems, University of Lübeck, Lübeck, Germany, Institute of Telematics, University of Lübeck, Lübeck, Germany 备注:Accepted at the Tenth International Workshop on Statistical Relational AI (StarAI-2021) 摘要:基于DNA的纳米网络有着广泛的应用前景,尤其是在医学领域。在具有大量智能体、部分可观测的随机环境和带噪声观测的条件下,这样的纳米级系统可以被建模为分散式部分可观测马尔可夫决策过程(DecPOMDP)。由于智能体集合是主导因素,本文提出(i)提升DecPOMDP,将智能体集合划分为不可区分的智能体子集,从而减少最坏情况下所需的空间,以及(ii)作为应用的纳米级医疗系统。未来工作将转向提升DecPOMDP的求解与实现。 摘要:DNA-based nanonetworks have a wide range of promising use cases, especially in the field of medicine. With a large set of agents, a partially observable stochastic environment, and noisy observations, such nanoscale systems can be modelled as a decentralised, partially observable, Markov decision process (DecPOMDP). As the agent set is a dominating factor, this paper presents (i) lifted DecPOMDPs, partitioning the agent set into sets of indistinguishable agents, reducing the worst-case space required, and (ii) a nanoscale medical system as an application. Future work turns to solving and implementing lifted DecPOMDPs.
【35】 How to Effectively Identify and Communicate Person-Targeting Media Bias in Daily News Consumption? 标题:如何在日常新闻消费中有效识别和传播针对人物的媒体偏见? 链接:https://arxiv.org/abs/2110.09151
作者:Felix Hamborg,Timo Spinde,Kim Heinser,Karsten Donnay,Bela Gipp 机构:University of Konstanz, Konstanz, Germany, Heidelberg Academy of Sciences and Humanities, Heidelberg, Germany, University of Zurich, Zurich, Switzerland, University of Wuppertal, Wuppertal, Germany 摘要:倾向性的新闻报道强烈影响公众舆论。对政治及相关问题的报道尤其如此,已有研究表明,新闻中的偏见可能会影响选举和其他集体决策。鉴于其至关重要性,社会科学界对新闻报道进行了长期研究,形成了描述它的综合模型,以及有效但昂贵的分析方法,如内容分析。我们提出了一个正在开发中的新闻推荐系统,它首次将内容分析的人工流程自动化,以揭示报道政策问题的新闻文章中针对人物的偏见。在一项大规模用户研究中,我们就这一跨学科研究方向得到了非常有希望的结果。我们的推荐系统能够检测并揭示单篇新闻文章中实际存在的实质性框架。相比之下,之前的工作仅能提升偏见的可见度,例如通过区分左翼和右翼媒体。此外,我们的研究表明,推荐对同一事件采用不同框架的新闻文章可以显著提高受访者的偏见意识。 摘要:Slanted news coverage strongly affects public opinion. This is especially true for coverage on politics and related issues, where studies have shown that bias in the news may influence elections and other collective decisions. Due to its vital importance, news coverage has long been studied in the social sciences, resulting in comprehensive models to describe it and effective yet costly methods to analyze it, such as content analysis. We present an in-progress system for news recommendation that is the first to automate the manual procedure of content analysis to reveal person-targeting biases in news articles reporting on policy issues. In a large-scale user study, we find very promising results regarding this interdisciplinary research direction. Our recommender detects and reveals substantial frames that are actually present in individual news articles. In contrast, prior work rather only facilitates the visibility of biases, e.g., by distinguishing left- and right-wing outlets. Further, our study shows that recommending news articles that differently frame an event significantly improves respondents' awareness of bias.
【36】 BEAMetrics: A Benchmark for Language Generation Evaluation Evaluation 标题:BEAMetrics:一个评估语言生成评价指标的基准 链接:https://arxiv.org/abs/2110.09147
作者:Thomas Scialom,Felix Hill 机构:Sorbonne Université, CNRS, LIP, F-, reciTAL, Paris, France, DeepMind 摘要:自然语言处理(NLP)系统越来越多地被训练来生成开放式文本,而不是在候选回复之间做分类。这使得对生成语言的评价指标(即在给定上下文和/或人工参考回复的情况下为系统输出打分的函数)的研究变得至关重要。然而,不同的指标各有优势和偏差,在某些任务上比在其他任务上更好地反映人类直觉。目前还没有一种简单、统一的方法,可以在一组有代表性的任务上比较、分析或评估指标。在这里,我们描述了评估自动指标的基准(BEAMetrics),这一资源让针对新指标的研究本身更易于评估。BEAMetrics用户可以在多样的任务、质量维度(流利性、连贯性、信息性等)和语言上,快速比较现有指标和新指标与人类判断的一致性。正如文本生成领域的专家可能预测的那样,BEAMetrics揭示了现有指标之间显著的任务相关差异,以及在回答空间复杂或高度依赖一般知识的任务上一贯的糟糕表现。虽然该分析突显了当前研究实践面临的一个关键问题,但BEAMetrics也通过促进更好的指标研究为解决这一问题做出了贡献,特别是那些能够刻画许多现代NLP应用中上下文与一般知识之间复杂交互的指标。BEAMetrics在MIT许可证下提供:https://github.com/ThomasScialom/BEAMetrics 摘要:Natural language processing (NLP) systems are increasingly trained to generate open-ended text rather than classifying between responses. This makes research on evaluation metrics for generated language -- functions that score system output given the context and/or human reference responses -- of critical importance. However, different metrics have different strengths and biases, and reflect human intuitions better on some tasks than others. There is currently no simple, unified way to compare, analyse or evaluate metrics across a representative set of tasks. Here, we describe the Benchmark to Evaluate Automatic Metrics (BEAMetrics), a resource to make research into new metrics itself easier to evaluate. BEAMetrics users can quickly compare existing and new metrics with human judgements across a diverse set of tasks, quality dimensions (fluency vs. coherence vs. informativeness etc.), and languages. As generation experts might predict, BEAMetrics reveals stark task-dependent differences between existing metrics, and consistently poor performance on tasks with complex answer spaces or high reliance on general knowledge. While this analysis highlights a critical issue facing current research practice, BEAMetrics also contributes to its resolution by facilitating research into better metrics -- particularly those that can account for the complex interaction between context and general knowledge inherent to many modern NLP applications. BEAMetrics is available under the MIT License: https://github.com/ThomasScialom/BEAMetrics
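评估“评价指标”的核心计算是指标得分与人工评分之间的相关性。下面是一个最小示意(数值为虚构,仅演示接口,与BEAMetrics内部实现无关):

```python
from scipy.stats import kendalltau, pearsonr

metric_scores = [0.61, 0.42, 0.77, 0.30, 0.55]  # 某自动指标对5个样本的打分
human_scores = [3.8, 2.9, 4.5, 2.1, 3.2]        # 同一批样本的人工评分

tau, p_tau = kendalltau(metric_scores, human_scores)
r, p_r = pearsonr(metric_scores, human_scores)
print(f"Kendall tau={tau:.3f} (p={p_tau:.3f}), Pearson r={r:.3f}")
```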
【37】 Ensembling Graph Predictions for AMR Parsing 标题:面向AMR解析的图预测集成 链接:https://arxiv.org/abs/2110.09131
作者:Hoang Thanh Lam,Gabriele Picco,Yufang Hou,Young-Suk Lee,Lam M. Nguyen,Dzung T. Phan,Vanessa López,Ramon Fernandez Astudillo 机构: IBM Research, Dublin, Ireland, IBM Research, Thomas J. Watson Research Center, Yorktown Heights, USA 备注:Accepted at NeurIPS 2021 摘要:在许多机器学习任务中,模型被训练来预测图等结构化数据。例如,在自然语言处理中,将文本解析为依存树或抽象意义表示(AMR)图是非常常见的。另一方面,集成方法将来自多个模型的预测结合起来,创建一个比单个预测更稳健和准确的新预测。文献中已有许多针对分类或回归问题提出的集成技术,然而,图预测的集成还没有得到深入研究。在这项工作中,我们将该问题形式化为挖掘得到一组图预测最大程度支持的最大图。由于该问题是NP难的,我们提出了一种高效的启发式算法来逼近最优解。为了验证我们的方法,我们在AMR解析问题上进行了实验。实验结果表明,所提出的方法能够结合最先进的AMR解析器的优势,在五个标准基准数据集上创建出比任何单个模型都更准确的新预测。 摘要:In many machine learning tasks, models are trained to predict structured data such as graphs. For example, in natural language processing, it is very common to parse texts into dependency trees or abstract meaning representation (AMR) graphs. On the other hand, ensemble methods combine predictions from multiple models to create a new one that is more robust and accurate than individual predictions. In the literature, there are many ensembling techniques proposed for classification or regression problems; however, ensemble graph prediction has not been studied thoroughly. In this work, we formalize this problem as mining the largest graph that is the most supported by a collection of graph predictions. As the problem is NP-Hard, we propose an efficient heuristic algorithm to approximate the optimal solution. To validate our approach, we carried out experiments in AMR parsing problems. The experimental results demonstrate that the proposed approach can combine the strength of state-of-the-art AMR parsers to create new predictions that are more accurate than any individual models in five standard benchmark datasets.
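作为该思路的简化示意(论文为NP难的“最受支持的最大图”挖掘问题给出了启发式算法,这里仅用边投票近似):保留获得足够多解析器支持的边。阈值 theta 与三元组编码均为示例假设。

```python
from collections import Counter

def ensemble_graph(graphs, theta=0.5):
    """graphs: 每个解析器输出一个边集合 {(src, label, dst)}。"""
    votes = Counter(e for g in graphs for e in set(g))
    need = theta * len(graphs)
    return {e for e, c in votes.items() if c >= need}

g1 = {("want", "ARG0", "boy"), ("want", "ARG1", "go")}
g2 = {("want", "ARG0", "boy"), ("want", "ARG1", "leave")}
g3 = {("want", "ARG0", "boy"), ("want", "ARG1", "go")}
print(ensemble_graph([g1, g2, g3]))   # 仅保留多数解析器支持的边
```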
【38】 Analyzing Wikipedia Membership Dataset and Predicting Unconnected Nodes in the Signed Networks 标题:分析维基百科成员数据集并预测符号网络中的未连接节点 链接:https://arxiv.org/abs/2110.09111
作者:Zhihao Wu,Taoran Li,Ray Roman 机构:University of California, Los Angeles 备注:The work was done in UCLA CS249 17Spring 摘要:在数字交互时代,存在于社交媒体上的人际关系可能与线下完全相同的交互有所不同。研究社交网络成员之间潜在或虚假的关系,对计算机科学家而言是一个富有成果的研究领域。在这里,我们利用Precision-Recall曲线下面积和ROC来研究如何预测社交网络中两个未连接的人之间的关系。我们将社交网络建模为符号图,比较了三元模型、潜在信息模型和情感模型,并使用它们预测点对点交互:首先使用普通符号网络,然后使用以评论为上下文的符号网络。我们发现我们的模型比随机模型好得多,并且在不同情况下可以互为补充。 摘要:In the age of digital interaction, person-to-person relationships existing on social media may be different from the very same interactions that exist offline. Examining potential or spurious relationships between members in a social network is a fertile area of research for computer scientists -- here we examine how relationships can be predicted between two unconnected people in a social network by using area under the Precision-Recall curve and ROC. Modeling the social network as a signed graph, we compare the Triadic model, Latent Information model and Sentiment model and use them to predict peer to peer interactions, first using a plain signed network, and second using a signed network with comments as context. We see that our models are much better than a random model and could complement each other in different cases.
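三元(triadic)模型的直觉可用结构平衡理论示意:沿共同邻居的两跳路径,以符号乘积投票预测未连接节点对的符号。以下为本文自拟的最小实现(非论文原代码):

```python
import networkx as nx

def predict_sign(G, u, v):
    """G: 边带 'sign'(+1/-1)属性的无向符号图。"""
    votes = sum(G[u][w]["sign"] * G[w][v]["sign"]
                for w in set(G[u]) & set(G[v]))   # 每条两跳路径投一票
    return 1 if votes >= 0 else -1

G = nx.Graph()
G.add_edge("a", "w", sign=1); G.add_edge("w", "b", sign=-1)
G.add_edge("a", "x", sign=-1); G.add_edge("x", "b", sign=-1)
print(predict_sign(G, "a", "b"))   # 两条路径分别投 -1 与 +1, 打平取 +1
```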
【39】 Graph Convolution Neural Network For Weakly Supervised Abnormality Localization In Long Capsule Endoscopy Videos 标题:基于图卷积神经网络的长胶囊内窥镜视频弱监督异常定位 链接:https://arxiv.org/abs/2110.09110
作者:Sodiq Adewole,Philip Fernandes,James Jablonski,Andrew Copland,Michael Porter,Sana Syed,Donald Brown 机构:Department of Systems and Information Engineering, University of Virginia, Charlottesville, VA, USA, Department of Pediatrics, School of Medicine, University of Virginia, Charlottesville, VA, USA 摘要:长视频中的时间活动定位是一个重要问题。获取长无线胶囊内窥镜(WCE)视频帧级标签的成本过高。在本文中,我们提出了一种仅使用弱视频级标签的长WCE视频端到端时间异常定位方法。医生使用胶囊内镜(CE)作为非手术、非侵入性的方法检查整个消化道,以诊断疾病或异常。虽然CE彻底改变了传统的内窥镜检查程序,但一次CE检查可能持续8小时,产生多达100,000帧图像。医生必须逐帧检查整个视频,以识别捕获相关异常的帧,有时这可能只有一帧。考虑到这种高度冗余,分析长CE视频可能非常繁琐、耗时且容易出错。本文提出了一种新的多步定位方法,仅使用弱视频标签,对长视频中捕获感兴趣异常的目标帧进行端到端定位。首先,我们开发了一种使用变点检测技术的自动时间分割方法,将视频在时间上分割成均匀、同质且可识别的片段。然后,我们使用图卷积神经网络(GCNN)来学习每个视频片段的表示。利用弱视频片段标签,我们训练GCNN模型,使其在视频片段至少包含一个异常帧时将其识别为异常。最后,利用训练好的GCNN模型的参数,我们将网络的最后一层替换为时间池化层,以定位每个异常视频片段中的相关异常帧。我们的方法在图分类任务上的准确率为89.9%,在异常帧定位任务上的特异性为97.5%。 摘要:Temporal activity localization in long videos is an important problem. The cost of obtaining frame level label for long Wireless Capsule Endoscopy (WCE) videos is prohibitive. In this paper, we propose an end-to-end temporal abnormality localization for long WCE videos using only weak video level labels. Physicians use Capsule Endoscopy (CE) as a non-surgical and non-invasive method to examine the entire digestive tract in order to diagnose diseases or abnormalities. While CE has revolutionized traditional endoscopy procedures, a single CE examination could last up to 8 hours generating as much as 100,000 frames. Physicians must review the entire video, frame-by-frame, in order to identify the frames capturing relevant abnormality. This could sometimes be as few as just a single frame. Given this very high level of redundancy, analyzing long CE videos can be very tedious, time consuming and also error prone. This paper presents a novel multi-step method for an end-to-end localization of target frames capturing abnormalities of interest in the long video using only weak video labels. First, we developed an automatic temporal segmentation using change point detection technique to temporally segment the video into uniform, homogeneous and identifiable segments. Then we employed Graph Convolutional Neural Network (GCNN) to learn a representation of each video segment. Using weak video segment labels, we trained our GCNN model to recognize each video segment as abnormal if it contains at least a single abnormal frame. Finally, leveraging the parameters of the trained GCNN model, we replaced the final layer of the network with a temporal pool layer to localize the relevant abnormal frames within each abnormal video segment. Our method achieved an accuracy of 89.9% on the graph classification task and a specificity of 97.5% on the abnormal frames localization task.
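论文第一步的时间分割可以用现成的变点检测库来示意(此处假设使用 ruptures 库;论文未指明具体实现,合成数据仅作演示,penalty 值为随意取值):

```python
import numpy as np
import ruptures as rpt

rng = np.random.default_rng(0)
# 合成的“帧特征”序列: 三段均值不同的片段, 模拟视频内容的变化
frame_feats = np.vstack([rng.normal(0, 1, (200, 8)),
                         rng.normal(3, 1, (150, 8)),
                         rng.normal(-2, 1, (250, 8))])

algo = rpt.Pelt(model="rbf").fit(frame_feats)
print(algo.predict(pen=10))   # 各均匀片段的右端点, 期望接近 [200, 350, 600]
```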
【40】 Vega: A 10-Core SoC for IoT End-Nodes with DNN Acceleration and Cognitive Wake-Up From MRAM-Based State-Retentive Sleep Mode 标题:Vega:一种面向物联网终端节点的10核SoC,支持DNN加速和基于MRAM的状态保持睡眠模式的认知唤醒 链接:https://arxiv.org/abs/2110.09101
作者:Davide Rossi,Francesco Conti,Manuel Eggimann,Alfio Di Mauro,Giuseppe Tagliavini,Stefan Mach,Marco Guermandi,Antonio Pullini,Igor Loi,Jie Chen,Eric Flamand,Luca Benini 机构:University of Bologna 备注:13 pages, 11 figures, 8 tables, journal paper 摘要:物联网要求终端节点具有超低功耗、支撑长电池寿命的常开能力,以及高性能、高能效和极高的灵活性,以处理复杂且快速演进的近传感器分析算法(NSAA)。我们介绍了Vega,一种物联网终端节点SoC,能够从1.7 µW的完全状态保持认知睡眠模式,扩展到NSAA(包括移动DNN推断)上32.2 GOPS(@ 49.4 mW)的峰值性能,并利用1.6 MB状态保持SRAM和4 MB非易失性MRAM。为了满足NSAA的性能和灵活性要求,SoC具有10个RISC-V核:一个用于SoC和IO管理的核,以及一个支持多精度SIMD整数和浮点计算的9核集群。Vega在8位INT计算上实现了业界领先的615 GOPS/W能效(在硬件加速下,8位DNN推断可提升到1.3 TOPS/W)。在浮点(FP)计算中,它在32位和16位FP上分别达到业界领先的79和129 GFLOPS/W能效。两个可编程机器学习(ML)加速器分别在认知睡眠和活动状态下提升能效。 摘要:The Internet-of-Things requires end-nodes with ultra-low-power always-on capability for a long battery lifetime, as well as high performance, energy efficiency, and extreme flexibility to deal with complex and fast-evolving near-sensor analytics algorithms (NSAAs). We present Vega, an IoT end-node SoC capable of scaling from a 1.7 µW fully retentive cognitive sleep mode up to 32.2 GOPS (@ 49.4 mW) peak performance on NSAAs, including mobile DNN inference, exploiting 1.6 MB of state-retentive SRAM, and 4 MB of non-volatile MRAM. To meet the performance and flexibility requirements of NSAAs, the SoC features 10 RISC-V cores: one core for SoC and IO management and a 9-core cluster supporting multi-precision SIMD integer and floating-point computation. Vega achieves SoA-leading efficiency of 615 GOPS/W on 8-bit INT computation (boosted to 1.3 TOPS/W for 8-bit DNN inference with hardware acceleration). On floating-point (FP) computation, it achieves SoA-leading efficiency of 79 and 129 GFLOPS/W on 32- and 16-bit FP, respectively. Two programmable machine-learning (ML) accelerators boost energy efficiency in cognitive sleep and active states, respectively.
【41】 Learning to Learn a Cold-start Sequential Recommender 标题:学会学习冷启动序贯推荐器 链接:https://arxiv.org/abs/2110.09083
作者:Xiaowen Huang,Jitao Sang,Jian Yu,Changsheng Xu 机构:Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, China; Peng Cheng Laboratory, China 摘要:冷启动推荐是当代在线应用中一个迫切需要解决的问题,其目的是为行为稀疏的用户提供尽可能准确的推荐。许多数据驱动算法(如广泛使用的矩阵分解)由于数据稀疏性而表现不佳。本文采用元学习的思想来解决用户冷启动推荐问题。我们提出了一个基于元学习的冷启动顺序推荐框架metaCSR,包括三个主要部分:扩散表示器(Diffusion Representer),通过交互图上的信息扩散学习更好的用户/物品嵌入;序列推荐器(Sequential Recommender),用于捕获行为序列的时间依赖性;元学习器(Meta Learner),用于提取和传播先前用户的可迁移知识,并为新用户学习良好的初始化。metaCSR能够从常规用户行为中学习共同模式并优化初始化,使模型在一次或几次梯度更新后即可快速适应新用户,达到最优性能。在三个广泛使用的数据集上进行的大量定量实验表明,metaCSR在处理用户冷启动问题方面性能显著。同时,一系列定性分析表明,所提出的metaCSR具有良好的泛化能力。 摘要:The cold-start recommendation is an urgent problem in contemporary online applications. It aims to provide users whose behaviors are literally sparse with as accurate recommendations as possible. Many data-driven algorithms, such as the widely used matrix factorization, underperform because of data sparseness. This work adopts the idea of meta-learning to solve the user's cold-start recommendation problem. We propose a meta-learning based cold-start sequential recommendation framework called metaCSR, including three main components: Diffusion Representer for learning better user/item embedding through information diffusion on the interaction graph; Sequential Recommender for capturing temporal dependencies of behavior sequences; Meta Learner for extracting and propagating transferable knowledge of prior users and learning a good initialization for new users. metaCSR holds the ability to learn the common patterns from regular users' behaviors and optimize the initialization so that the model can quickly adapt to new users after one or a few gradient updates to achieve optimal performance. The extensive quantitative experiments on three widely-used datasets show the remarkable performance of metaCSR in dealing with user cold-start problem. Meanwhile, a series of qualitative analysis demonstrates that the proposed metaCSR has good generalization.
【42】 Towards General Deep Leakage in Federated Learning 标题:迈向联邦学习中的通用深度泄漏 链接:https://arxiv.org/abs/2110.09074
作者:Jiahui Geng,Yongli Mou,Feifei Li,Qing Li,Oya Beyan,Stefan Decker,Chunming Rong 机构: University of Stavanger, RWTH-Aachen University, University of Cologne 摘要:与传统的集中式训练不同,联邦学习(FL)通过共享和聚合本地模型而不是本地数据来保护用户隐私,同时提高全局模型的性能。尽管这种训练方式看起来安全,但一些研究表明,攻击者仍然可以基于共享的梯度信息恢复私有数据。这种即时重建攻击值得深入研究,因为它可以发生在训练的任何阶段,无论是模型训练的开始还是结束;它不需要相关数据集,也不需要训练其他模型。我们突破了一些不切实际的假设和限制,将这种重建攻击应用于更广泛的场景。我们提出的方法可分别对应FedSGD和FedAvg使用场景,从共享的梯度或权重中重建训练数据。我们提出了一种零样本方法,即使批次中存在重复标签也能恢复标签。我们研究了标签恢复与图像恢复之间的关系,发现即使批次中只有一个错误推断的标签,图像恢复也会失败;我们还发现,当批次中的图像具有相同标签时,相应的图像会被恢复为该类图像的融合。我们的方法在经典图像基准(包括CIFAR-10和ImageNet)上进行了评估,其批量大小、图像质量以及对标签分布的适应性均超过了最先进的GradInversion。 摘要:Unlike traditional central training, federated learning (FL) improves the performance of the global model by sharing and aggregating local models rather than local data to protect the users' privacy. Although this training approach appears secure, some research has demonstrated that an attacker can still recover private data based on the shared gradient information. This on-the-fly reconstruction attack deserves to be studied in depth because it can occur at any stage of training, whether at the beginning or at the end of model training; no relevant dataset is required and no additional models need to be trained. We break through some unrealistic assumptions and limitations to apply this reconstruction attack in a broader range of scenarios. We propose methods that can reconstruct the training data from shared gradients or weights, corresponding to the FedSGD and FedAvg usage scenarios, respectively. We propose a zero-shot approach to restore labels even if there are duplicate labels in the batch. We study the relationship between the label and image restoration. We find that image restoration fails even if there is only one incorrectly inferred label in the batch; we also find that when batch images have the same label, the corresponding image is restored as a fusion of that class of images. Our approaches are evaluated on classic image benchmarks, including CIFAR-10 and ImageNet. The batch size, image quality, and the adaptability of the label distribution of our approach exceed those of GradInversion, the state-of-the-art.
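梯度重建攻击的核心是优化一个虚拟输入,使其产生的梯度匹配受害者共享的梯度。下面是FedSGD场景下的极简PyTorch示意(假设标签已按论文思路先行恢复;invert_gradients 为自拟名称,非论文官方实现):

```python
import torch
import torch.nn.functional as F

def invert_gradients(model, true_grads, label, shape, steps=200, lr=0.1):
    dummy = torch.randn(shape, requires_grad=True)   # 随机初始化的虚拟输入
    opt = torch.optim.Adam([dummy], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(model(dummy), label)
        grads = torch.autograd.grad(loss, model.parameters(),
                                    create_graph=True)
        # 让虚拟输入产生的梯度逼近受害者共享的真实梯度
        diff = sum(((g - t) ** 2).sum() for g, t in zip(grads, true_grads))
        diff.backward()
        opt.step()
    return dummy.detach()
```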
【43】 Ranking Facts for Explaining Answers to Elementary Science Questions 标题:解释初等科学问题答案的排序事实 链接:https://arxiv.org/abs/2110.09036
作者:Jennifer D'Souza,Isaiah Onando Mulang',Soeren Auer 机构:University of Bonn, Germany 备注:25 pages, 5 figures, accepted for publication in NLE 摘要:在多项选择题考试中,学生从典型的四个选项中选择一个答案,并可以解释他们为什么做出那个特定的选择。学生擅长理解自然语言问题,并能基于他们的领域知识,通过在各种相关事实之间“连点成线”轻松推断出问题的答案。面向基础科学问答的自动推理,我们提出了一个新任务:从人工撰写的事实中为答案生成解释。为此,我们研究了一个实际可扩展的框架,即利用面向领域的手工特征的富特征支持向量机。解释是从WorldTree语料库中近5,000条人工标注的候选事实中构建的。我们的目标是在可用的候选事实中,为问题正确答案的解释更好地匹配有效事实。为此,我们的特征提供了一个全面的语言与语义统一范式。对应的机器学习问题是事实的偏好排序,我们为此对比了逐点回归与成对learning-to-rank。我们的贡献是:(1)一个系统比较两种偏好排序方法的案例研究;(2)一种实际可行的方法,可以超越基于BERT的重排序模型的一些变体;(3)人工设计的特征使其成为该任务的可解释机器学习模型。 摘要:In multiple-choice exams, students select one answer from among typically four choices and can explain why they made that particular choice. Students are good at understanding natural language questions and based on their domain knowledge can easily infer the question's answer by 'connecting the dots' across various pertinent facts. Considering automated reasoning for elementary science question answering, we address the novel task of generating explanations for answers from human-authored facts. For this, we examine the practically scalable framework of feature-rich support vector machines leveraging domain-targeted, hand-crafted features. Explanations are created from a human-annotated set of nearly 5,000 candidate facts in the WorldTree corpus. Our aim is to obtain better matches for valid facts of an explanation for the correct answer of a question over the available fact candidates. To this end, our features offer a comprehensive linguistic and semantic unification paradigm. The machine learning problem is the preference ordering of facts, for which we test pointwise regression versus pairwise learning-to-rank. Our contributions are: (1) a case study in which two preference ordering approaches are systematically compared; (2) it is a practically competent approach that can outperform some variants of BERT-based reranking models; and (3) the human-engineered features make it an interpretable machine learning model for the task.
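成对learning-to-rank的常见做法(RankSVM式,此处仅作示意,并非论文的特征或数据)是在特征差向量上训练线性分类器,其权重即为打分函数:

```python
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_ltr_fit(X, y):
    """对每个 (更优, 更差) 事实对, 用特征差向量构造成对分类样本。"""
    diffs, labels = [], []
    for i in range(len(X)):
        for j in range(len(X)):
            if y[i] > y[j]:
                diffs.append(X[i] - X[j]); labels.append(1)
                diffs.append(X[j] - X[i]); labels.append(-1)
    return LinearSVC().fit(np.array(diffs), np.array(labels)).coef_.ravel()

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
y = X @ np.array([1.0, 0, 2, 0, -1]) + 0.1 * rng.normal(size=20)
w = pairwise_ltr_fit(X, y)
print(np.argsort(-(X @ w))[:3])   # 排名前三的候选事实
```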
【44】 Edge Rewiring Goes Neural: Boosting Network Resilience via Policy Gradient 标题:边缘重新布线走向神经:通过策略梯度提高网络弹性 链接:https://arxiv.org/abs/2110.09035
作者:Shanchao Yang,Kaili Ma,Baoxiang Wang,Hongyuan Zha 机构:School of Data Science, The Chinese University of Hong Kong, Shenzhen, Department of Computer Science and Engineering, Shenzhen Institute of Artificial Intelligence and Robotics for Society 摘要:提高网络的弹性可保护系统免受自然灾害和恶意攻击。这通常通过引入新边来实现,但新边可能超出节点可以维持的最大连接数。许多研究因此求助于保度的重连操作,即将现有边$AC, BD$交换为新边$AB, CD$。一系列重要研究集中于该技术的理论和实践结果,但留有三个局限:网络效用损失、局部最优性和直推性。在本文中,我们提出ResiNet,一个基于强化学习(RL)的框架,用于发现对各种灾害和攻击具有弹性的网络拓扑。ResiNet与具体优化目标无关,可通过将效用纳入目标函数来实现效用平衡。通常出现在贪心算法中的局部最优性,通过将累积弹性增益转化为逐步重连的顺序决策过程来解决。直推性指的是需要对每个输入图运行计算密集型优化,我们带有自回归、置换不变可变动作空间的RL变体消除了这一限制。ResiNet由我们的技术创新FireGNN(过滤增强型GNN)提供支持,它能够区分差异细微的图。因此,ResiNet可以捕获局部结构变化并在连续的图之间调整其决策,而这对GNN是不可行的。大量实验表明,只需少量重连操作,ResiNet就能在多个图上实现接近最优的弹性增益,同时兼顾效用,与现有方法相比具有较大优势。 摘要:Improving the resilience of a network protects the system from natural disasters and malicious attacks. This is typically achieved by introducing new edges, which however may reach beyond the maximum number of connections a node could sustain. Many studies then resort to the degree-preserving operation of rewiring, which swaps existing edges $AC, BD$ to new edges $AB, CD$. A significant line of studies focuses on this technique for theoretical and practical results while leaving three limitations: network utility loss, local optimality, and transductivity. In this paper, we propose ResiNet, a reinforcement learning (RL)-based framework to discover resilient network topologies against various disasters and attacks. ResiNet is objective-agnostic, which allows the utility to be balanced by incorporating it into the objective function. The local optimality, typically seen in greedy algorithms, is addressed by casting the cumulative resilience gain into a sequential decision process of step-wise rewiring. The transductivity, which refers to the necessity to run a computationally intensive optimization for each input graph, is lifted by our variant of RL with auto-regressive permutation-invariant variable action space. ResiNet is armed with our technical innovation, Filtration enhanced GNN (FireGNN), which distinguishes graphs with minor differences. It is thus possible for ResiNet to capture local structure changes and adapt its decision among consecutive graphs, which is known to be infeasible for GNNs. Extensive experiments demonstrate that with a small number of rewiring operations, ResiNet achieves a near-optimal resilience gain on multiple graphs while balancing the utility, with a large margin compared to existing approaches.
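保度重连操作 AC,BD→AB,CD 及其弹性评估可以用 networkx 直接示意。下面是一个局部贪心基线的草图(论文正是用RL策略取代这类贪心;attack_fraction 等参数为示例假设):

```python
import networkx as nx

def resilience(G, attack_fraction=0.1):
    """模拟按度数的针对性攻击后, 剩余最大连通分量的规模。"""
    H = G.copy()
    top = sorted(H.degree, key=lambda kv: -kv[1])[: int(attack_fraction * len(H))]
    H.remove_nodes_from([n for n, _ in top])
    return max((len(c) for c in nx.connected_components(H)), default=0)

G = nx.barabasi_albert_graph(100, 2, seed=0)
best = resilience(G)
for _ in range(50):                     # 每步尝试一次保度重连, 改进则接受
    H = G.copy()
    nx.double_edge_swap(H, nswap=1, max_tries=100)
    if resilience(H) > best:
        G, best = H, resilience(H)
print(best)
```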
【45】 Arjun: An Efficient Independent Support Computation Technique and its Applications to Counting and Sampling 标题:Arjun:一种高效的独立支持集计算技术及其在计数与采样中的应用 链接:https://arxiv.org/abs/2110.09026
作者:Mate Soos,Kuldeep S. Meel 机构:National University of Singapore 摘要:给定变量集$X$上的布尔公式$\varphi$和投影集$\mathcal{P} \subseteq X$,若任意两个解只要在变量子集$\mathcal{I}$上一致就必然在$\mathcal{P}$上一致,则称$\mathcal{I}$是$\mathcal{P}$的独立支持集。独立支持集的概念与可追溯到1901年的经典可定义性概念相关,已被研究数十年。最近,由于独立支持集对基于散列的计数和采样技术至关重要,为给定公式确定独立支持集的计算问题变得非常重要。在本文中,我们设计了一种高效且可扩展的独立支持集计算技术,能够处理来自真实世界基准的公式。我们的算法框架称为Arjun,采用隐式和显式可定义性概念,并基于门识别技术与基于假设的框架的紧密集成。我们证明,使用Arjun增强最先进的模型计数器ApproxMC4和采样器UniGen3可带来显著的性能提升。特别是,在1896个基准中,经Arjun增强的ApproxMC4多求解了387个计数基准,而经Arjun增强的UniGen3在相同时间限制内多完成了319个采样基准。 摘要:Given a Boolean formula $\varphi$ over the set of variables $X$ and a projection set $\mathcal{P} \subseteq X$, a subset of variables $\mathcal{I}$ is an independent support of $\mathcal{P}$ if, whenever two solutions agree on $\mathcal{I}$, they also agree on $\mathcal{P}$. The notion of independent support is related to the classical notion of definability dating back to 1901, and has been studied over the decades. Recently, the computational problem of determining independent support for a given formula has attained importance owing to the crucial role of independent support in hashing-based counting and sampling techniques. In this paper, we design an efficient and scalable independent support computation technique that can handle formulas arising from real-world benchmarks. Our algorithmic framework, called Arjun, employs implicit and explicit definability notions, and is based on a tight integration of gate-identification techniques and an assumption-based framework. We demonstrate that augmenting the state of the art model counter ApproxMC4 and sampler UniGen3 with Arjun leads to significant performance improvements. In particular, ApproxMC4 augmented with Arjun counts 387 more benchmarks out of 1896 while UniGen3 augmented with Arjun samples 319 more benchmarks within the same time limit.
【46】 Utilizing Active Machine Learning for Quality Assurance: A Case Study of Virtual Car Renderings in the Automotive Industry 标题:利用主动机器学习进行质量保证--以汽车行业虚拟汽车渲染为例 链接:https://arxiv.org/abs/2110.09023
作者:Patrick Hemmer,Niklas Kühl,Jakob Schöffer 机构:Karlsruhe Institute of Technology 备注:Hawaii International Conference on System Sciences 2022 (HICSS-55) 摘要:计算机生成的汽车模型图像已成为汽车制造商广告方案中不可或缺的一部分。例如,它们被用于汽车配置器中,为客户提供根据个人喜好在线配置汽车的可能性。然而,由于车型日益复杂,以人为主导的质量保证难以跟上大批量的目视检查。尽管机器学习在许多视觉检查任务中的应用已经取得了巨大成功,但它对大型标注数据集的需求仍然是在实践中使用此类系统的主要障碍。在本文中,我们提出了一种基于主动机器学习的质量保证系统,该系统在不影响性能的情况下,只需显著更少的标注实例即可识别有缺陷的虚拟汽车渲染。通过在一家德国汽车制造商部署我们的系统,可以克服启动困难,提高检测过程效率,从而实现经济效益。 摘要:Computer-generated imagery of car models has become an indispensable part of car manufacturers' advertisement concepts. They are for instance used in car configurators to offer customers the possibility to configure their car online according to their personal preferences. However, human-led quality assurance faces the challenge to keep up with high-volume visual inspections due to the car models' increasing complexity. Even though the application of machine learning to many visual inspection tasks has demonstrated great success, its need for large labeled data sets remains a central barrier to using such systems in practice. In this paper, we propose an active machine learning-based quality assurance system that requires significantly fewer labeled instances to identify defective virtual car renderings without compromising performance. By employing our system at a German automotive manufacturer, start-up difficulties can be overcome, the inspection process efficiency can be increased, and thus economic advantages can be realized.
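不确定性采样是主动学习最常用的查询策略:反复把最接近决策边界的样本交给人工标注。以下为合成数据上的最小示意(与论文系统的具体模型和数据无关):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
labeled = list(range(10))                       # 初始少量已标注样本
pool = [i for i in range(1000) if i not in labeled]

for round_ in range(5):
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])[:, 1]
    query = [pool[i] for i in np.argsort(np.abs(proba - 0.5))[:20]]
    labeled += query                            # 模拟人工标注后并入训练集
    pool = [i for i in pool if i not in query]
    print(round_, round(clf.score(X, y), 3))
```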
【47】 Reinforcement Learning-Based Coverage Path Planning with Implicit Cellular Decomposition 标题:基于强化学习的隐式元胞分解覆盖路径规划 链接:https://arxiv.org/abs/2110.09018
作者:Javad Heydari,Olimpiya Saha,Viswanath Ganapathy 备注:20 pages 摘要:在一般已知环境中,覆盖路径规划是NP难的。当环境未知时,机器人需要依靠覆盖期间建立的在线地图信息来规划其路径,这就变得更具挑战性。一项重要的研究工作集中在设计启发式或近似算法,以实现合理的性能。此类算法在覆盖面积或覆盖成本(例如,覆盖时间或能量消耗)方面具有次优性能。在本文中,我们对覆盖问题进行了系统分析,并将其表述为一个最优停止时间问题,其中明确考虑了覆盖性能和成本之间的权衡。接下来,我们证明了强化学习(RL)技术可以用于计算解决问题。为此,我们提供了一些技术和实践方面的考虑,以促进RL算法的应用并提高解决方案的效率。最后,通过在网格世界环境和Gazebo模拟器中的实验,我们证明了基于强化学习的算法能够有效地覆盖真实的未知室内环境,并且优于目前的技术水平。 摘要:Coverage path planning in a generic known environment is shown to be NP-hard. When the environment is unknown, it becomes more challenging as the robot is required to rely on its online map information built during coverage for planning its path. A significant research effort focuses on designing heuristic or approximate algorithms that achieve reasonable performance. Such algorithms have sub-optimal performance in terms of covering the area or the cost of coverage, e.g., coverage time or energy consumption. In this paper, we provide a systematic analysis of the coverage problem and formulate it as an optimal stopping time problem, where the trade-off between coverage performance and its cost is explicitly accounted for. Next, we demonstrate that reinforcement learning (RL) techniques can be leveraged to solve the problem computationally. To this end, we provide some technical and practical considerations to facilitate the application of the RL algorithms and improve the efficiency of the solutions. Finally, through experiments in grid world environments and Gazebo simulator, we show that reinforcement learning-based algorithms efficiently cover realistic unknown indoor environments, and outperform the current state of the art.
【48】 Finding Everything within Random Binary Networks 标题:查找随机二进制网络中的所有内容 链接:https://arxiv.org/abs/2110.08996
作者:Kartik Sreenivasan,Shashank Rajput,Jy-yong Sohn,Dimitris Papailiopoulos 机构:University of Wisconsin-Madison 摘要:Ramanujan等人(2020年)最近的一项工作提供了重要的经验证据,表明充分过参数化的随机神经网络包含未经训练的子网络,这些子网络在若干预测任务上达到了最先进的精度。后续的一系列理论工作为这些发现提供了理由:他们证明,带有常用连续值随机初始化的轻度过参数化神经网络,确实可以通过剪枝来近似任何目标网络。在这项工作中,我们证明这些随机权重的幅值甚至无关紧要。我们证明,通过简单地剪枝一个二值${\pm 1}$权重的随机网络,任何目标网络都可以被近似到任意精度,而该随机网络的宽度和深度仅比目标网络多一个多项式对数因子。 摘要:A recent work by Ramanujan et al. (2020) provides significant empirical evidence that sufficiently overparameterized, random neural networks contain untrained subnetworks that achieve state-of-the-art accuracy on several predictive tasks. A follow-up line of theoretical work provides justification of these findings by proving that slightly overparameterized neural networks, with commonly used continuous-valued random initializations can indeed be pruned to approximate any target network. In this work, we show that the amplitude of those random weights does not even matter. We prove that any target network can be approximated up to arbitrary accuracy by simply pruning a random network of binary ${\pm 1}$ weights that is only a polylogarithmic factor wider and deeper than the target network.
【49】 Deep Transfer Learning & Beyond: Transformer Language Models in Information Systems Research 标题:深度迁移学习与超越:信息系统研究中的Transformer语言模型 链接:https://arxiv.org/abs/2110.08975
作者:Ross Gruetzemacher,David Paradice 机构:Wichita State University, Frank Barton School of Business 备注:Under review (revised once). Section 2, the literature review on deep transfer learning and transformer language models, is a valuable introduction for a broad audience (not just information systems researchers) 摘要:人们普遍认为人工智能即将改变业务,但目前对这种转变范围的看法可能是短视的。涉及Transformer语言模型(TLM)的自然语言处理的最新进展,为AI驱动的业务和社会转型提供了一条潜在途径,其影响超出了大多数人目前的预见。我们回顾了这一最新进展,以及顶级IS期刊中利用文本挖掘的近期文献,以勾勒出未来IS研究如何从这些新技术中获益。我们对现有IS文献的回顾表明,次优的文本挖掘技术非常普遍,而更先进的TLM可用于增强和扩展涉及文本数据的IS研究,并开启新的IS研究主题,从而为研究社区创造更多价值。这之所以可能,是因为这些技术使开发非常强大的定制系统变得更加容易,并且在广泛的任务和应用上,其性能优于现有方法。此外,多语言模型使多语言研究能够获得更高质量的文本分析。我们还指出了IS研究的新方向,如语言用户界面,它们可能为未来IS研究提供更大的潜力。 摘要:AI is widely thought to be poised to transform business, yet current perceptions of the scope of this transformation may be myopic. Recent progress in natural language processing involving transformer language models (TLMs) offers a potential avenue for AI-driven business and societal transformation that is beyond the scope of what most currently foresee. We review this recent progress as well as recent literature utilizing text mining in top IS journals to develop an outline for how future IS research can benefit from these new techniques. Our review of existing IS literature reveals that suboptimal text mining techniques are prevalent and that the more advanced TLMs could be applied to enhance and increase IS research involving text data, and to enable new IS research topics, thus creating more value for the research community. This is possible because these techniques make it easier to develop very powerful custom systems and their performance is superior to existing methods for a wide range of tasks and applications. Further, multilingual language models make possible higher quality text analytics for research in multiple languages. We also identify new avenues for IS research, like language user interfaces, that may offer even greater potential for future IS research.
【50】 SS-MAIL: Self-Supervised Multi-Agent Imitation Learning 标题:SS-MAIL:自监督多Agent模仿学习 链接:https://arxiv.org/abs/2110.08963
作者:Akshay Dharmavaram,Tejus Gupta,Jiachen Li,Katia P. Sycara 机构:Carnegie Mellon University,UC Berkeley 备注:Pre-Print 摘要:目前,多智能体专家模仿领域主要由两大类算法主导:行为克隆(BC)和对抗性模仿学习(AIL)。BC方法存在复合误差问题,因为它们忽略了轨迹生成问题的顺序决策性质,而且无法有效地建模多模态行为。AIL方法虽然解决了复合误差和多模态策略训练的问题,但其训练动态不稳定。在这项工作中,我们通过引入一种新的自监督损失来解决这个问题,该损失鼓励判别器逼近更丰富的奖励函数。我们使用该方法训练一个基于图的多智能体actor-critic架构,它以学习到的潜在交互图为条件,学习一个集中式策略。我们表明,我们的方法(SS-MAIL)在真实世界预测任务以及定制设计的合成实验中均优于先前的最新方法。我们通过建立与成本正则化学徒学习的理论联系,证明SS-MAIL属于AIL方法家族。此外,我们利用自监督公式引入了一种新的基于teacher forcing的课程(轨迹强制),通过逐步增加生成轨迹的长度来提高样本效率。SS-MAIL框架通过稳定策略训练、改进奖励塑形能力以及提供多模态轨迹建模能力,提升了多智能体模仿能力。 摘要:The current landscape of multi-agent expert imitation is broadly dominated by two families of algorithms - Behavioral Cloning (BC) and Adversarial Imitation Learning (AIL). BC approaches suffer from compounding errors, as they ignore the sequential decision-making nature of the trajectory generation problem. Furthermore, they cannot effectively model multi-modal behaviors. While AIL methods solve the issue of compounding errors and multi-modal policy training, they are plagued with instability in their training dynamics. In this work, we address this issue by introducing a novel self-supervised loss that encourages the discriminator to approximate a richer reward function. We employ our method to train a graph-based multi-agent actor-critic architecture that learns a centralized policy, conditioned on a learned latent interaction graph. We show that our method (SS-MAIL) outperforms prior state-of-the-art methods on real-world prediction tasks, as well as on custom-designed synthetic experiments. We prove that SS-MAIL is part of the family of AIL methods by providing a theoretical connection to cost-regularized apprenticeship learning. Moreover, we leverage the self-supervised formulation to introduce a novel teacher forcing-based curriculum (Trajectory Forcing) that improves sample efficiency by progressively increasing the length of the generated trajectory. The SS-MAIL framework improves multi-agent imitation capabilities by stabilizing the policy training, improving the reward shaping capabilities, as well as providing the ability for modeling multi-modal trajectories.
【51】 Improving Robustness of Reinforcement Learning for Power System Control with Adversarial Training 标题:对抗性训练提高强化学习在电力系统控制中的鲁棒性 链接:https://arxiv.org/abs/2110.08956
作者:Alexander Pan,Yongkyun Lee,Huan Zhang,Yize Chen,Yuanyuan Shi 机构:Huan Zhang is with the Department of Computer Science; Yuanyuan Shi is with the Department of Electrical and Computer Engineering, University of California San Diego 备注:Published at 2021 ICML RL4RL Workshop; submitted to 2022 PSCC 摘要:由于可再生能源的普及及其固有的间歇性和随机性,当前电力系统面临严峻的运行挑战。来自强化学习(RL)的数据驱动决策算法为高效运行清洁能源系统提供了一种解决方案。尽管与基于模型的控制模型相比,RL算法取得了可观的性能,但对安全关键物理系统中RL鲁棒性的研究还很有限。在这项工作中,我们首先展示了几个在竞赛中获胜的、用于电力系统控制的最先进RL智能体容易受到对抗攻击。具体而言,我们使用对抗性马尔可夫决策过程来学习攻击策略,并在白盒和黑盒攻击设置下,通过成功攻击来自Learning To Run a Power Network(L2RPN)挑战赛的多个获胜智能体,证明了我们攻击的效力。然后,我们提出使用对抗训练来提高RL智能体对攻击的鲁棒性,并避免不可行的运行决策。据我们所知,我们的工作首次强调了电网控制RL算法的脆弱性,并贡献了一种提高其鲁棒性和安全性的有效防御方案。 摘要:Due to the proliferation of renewable energy and its intrinsic intermittency and stochasticity, current power systems face severe operational challenges. Data-driven decision-making algorithms from reinforcement learning (RL) offer a solution towards efficiently operating a clean energy system. Although RL algorithms achieve promising performance compared to model-based control models, there has been limited investigation of RL robustness in safety-critical physical systems. In this work, we first show that several competition-winning, state-of-the-art RL agents proposed for power system control are vulnerable to adversarial attacks. Specifically, we use an adversary Markov Decision Process to learn an attack policy, and demonstrate the potency of our attack by successfully attacking multiple winning agents from the Learning To Run a Power Network (L2RPN) challenge, under both white-box and black-box attack settings. We then propose to use adversarial training to increase the robustness of RL agent against attacks and avoid infeasible operational decisions. To the best of our knowledge, our work is the first to highlight the fragility of grid control RL algorithms, and contribute an effective defense scheme towards improving their robustness and security.
【52】 Sim-to-Real Transfer in Multi-agent Reinforcement Networking for Federated Edge Computing 标题:面向联邦边缘计算的多智能体强化网络中的Sim-to-Real迁移 链接:https://arxiv.org/abs/2110.08952
作者:Pinyarash Pinyoanuntapong,Tagore Pothuneedi,Ravikumar Balakrishnan,Minwoo Lee,Chen Chen,Pu Wang 机构:University of North Carolina Charlotte, Intel Labs, University of Central Florida 备注:5 pages 摘要:基于无线多跳边缘计算网络的联邦学习(FL),即多跳FL,是一种经济高效的分布式设备端深度学习范式。本文介绍了基于Linux的高保真模拟器FedEdge simulator,它支持多跳FL系统的快速原型开发、sim-to-real代码迁移和知识迁移。FedEdge模拟器构建在面向硬件的FedEdge实验框架之上,并新扩展了一个真实物理层仿真器。该仿真器利用基于轨迹的信道建模和动态链路调度,以最小化模拟器与物理测试床之间的现实差距。我们的初步实验证明了FedEdge模拟器的高保真度,以及它在强化学习优化的多跳FL中sim-to-real知识迁移方面的优越性能。 摘要:Federated Learning (FL) over wireless multi-hop edge computing networks, i.e., multi-hop FL, is a cost-effective distributed on-device deep learning paradigm. This paper presents FedEdge simulator, a high-fidelity Linux-based simulator, which enables fast prototyping, sim-to-real code, and knowledge transfer for multi-hop FL systems. FedEdge simulator is built on top of the hardware-oriented FedEdge experimental framework with a new extension of the realistic physical layer emulator. This emulator exploits trace-based channel modeling and dynamic link scheduling to minimize the reality gap between the simulator and the physical testbed. Our initial experiments demonstrate the high fidelity of the FedEdge simulator and its superior performance on sim-to-real knowledge transfer in reinforcement learning-optimized multi-hop FL.
【53】 Developing a novel fair-loan-predictor through a multi-sensitive debiasing pipeline: DualFair 标题:通过多敏感去偏管道开发一种新的公平贷款预测器:DualFair 链接:https://arxiv.org/abs/2110.08944
作者:Arashdeep Singh,Jashandeep Singh,Ariba Khan,Amar Gupta 机构:Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory, Fresno, USA, Cambridge, USA 备注:10 pages, 2 figures, 3 tables, 1 pseudocode 摘要:机器学习(ML)模型越来越多地用于对人们生活产生重大影响的高风险应用。尽管用途广泛,这些模型仍有可能基于种族、性别或族裔对某些社会群体产生偏见。以前的许多工作都试图通过更新训练数据(预处理)、改变模型学习过程(处理中)或操纵模型输出(后处理)来减轻这种“模型歧视”。然而,这些工作还没有扩展到多敏感参数和敏感选项(MSPSO)领域,其中敏感参数是可能受到歧视的属性(例如种族),敏感选项是敏感参数中的取值(例如黑人或白人),因此它们的实际可用性有限。先前关于公平性的工作也受制于准确性与公平性的权衡,使得二者难以同时达到较高水平。此外,以前的文献未能提供适用于MSPSO的整体公平性指标。在本文中,我们通过(a)创建一种名为DualFair的新型偏见缓解技术和(b)开发一种可处理MSPSO的新公平性指标(即AWI)来解决这三个问题。最后,我们使用一个全面的美国抵押贷款数据集测试了我们的新缓解方法,并表明我们的分类器(即公平贷款预测器)获得了比当前最先进模型更好的公平性和准确性指标。 摘要:Machine learning (ML) models are increasingly used for high-stake applications that can greatly impact people's lives. Despite their use, these models have the potential to be biased towards certain social groups on the basis of race, gender, or ethnicity. Many prior works have attempted to mitigate this "model discrimination" by updating the training data (pre-processing), altering the model learning process (in-processing), or manipulating model output (post-processing). However, these works have not yet been extended to the realm of multi-sensitive parameters and sensitive options (MSPSO), where sensitive parameters are attributes that can be discriminated against (e.g race) and sensitive options are options within sensitive parameters (e.g black or white), thus giving them limited real-world usability. Prior work in fairness has also suffered from an accuracy-fairness tradeoff that prevents both the accuracy and fairness from being high. Moreover, previous literature has failed to provide holistic fairness metrics that work with MSPSO. In this paper, we solve all three of these problems by (a) creating a novel bias mitigation technique called DualFair and (b) developing a new fairness metric (i.e. AWI) that can handle MSPSO. Lastly, we test our novel mitigation method using a comprehensive U.S mortgage lending dataset and show that our classifier, or fair loan predictor, obtains better fairness and accuracy metrics than current state-of-the-art models.
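论文涉及的基于群体的公平性概念可以用很少的代码量化。下面以标准的人口均等差异为例(这只是通用定义的示意,并非论文提出的AWI指标):

```python
import numpy as np

def demographic_parity_diff(y_pred, group):
    """各敏感组正类预测率之差; 0 表示完全满足人口均等。"""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(demographic_parity_diff(y_pred, group))   # |0.75 - 0.25| = 0.5
```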
【54】 Poisoning Attacks on Fair Machine Learning 标题:对公平机器学习的投毒攻击 链接:https://arxiv.org/abs/2110.08932
作者:Minh-Hao Van,Wei Du,Xintao Wu,Aidong Lu 机构: University of Arkansas at Fayetteville, University of North Carolina at Charlotte 摘要:公平机器学习和对抗学习都得到了广泛研究,然而,攻击公平机器学习模型受到的关注较少。在本文中,我们提出了一个框架,旨在高效地生成投毒样本,同时攻击模型准确性和算法公平性。我们的攻击框架可以针对使用各种基于群体的公平概念(如人口均等和均等几率)训练的公平机器学习模型。我们开发了三种在线攻击:对抗性采样、对抗性标注和对抗性特征修改。这三种攻击都通过采样、标注或修改一部分训练数据来高效地生成投毒样本,以降低测试精度。我们的框架使攻击者能够灵活调整攻击对预测准确性或公平性的侧重,并准确量化每个候选点对准确性损失和公平性违背的影响,从而生成有效的投毒样本。在两个真实数据集上的实验证明了该框架的有效性和效率。 摘要:Both fair machine learning and adversarial learning have been extensively studied. However, attacking fair machine learning models has received less attention. In this paper, we present a framework that seeks to effectively generate poisoning samples to attack both model accuracy and algorithmic fairness. Our attacking framework can target fair machine learning models trained with a variety of group based fairness notions such as demographic parity and equalized odds. We develop three online attacks, adversarial sampling, adversarial labeling, and adversarial feature modification. All three attacks effectively and efficiently produce poisoning samples via sampling, labeling, or modifying a fraction of training data in order to reduce the test accuracy. Our framework enables attackers to flexibly adjust the attack's focus on prediction accuracy or fairness and accurately quantify the impact of each candidate point to both accuracy loss and fairness violation, thus producing effective poisoning samples. Experiments on two real datasets demonstrate the effectiveness and efficiency of our framework.
【55】 Explaining generalization in deep learning: progress and fundamental limits 标题:解释深度学习中的泛化:进展与基本局限 链接:https://arxiv.org/abs/2110.08922
作者:Vaishnavh Nagarajan 机构:School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, Thesis Committee: J. Zico Kolter (Chair), Andrej Risteski, Ameet Talwalkar, Nathan Srebro (Toyota Technological Institute at Chicago), Submitted in partial fulfillment of the requirements 备注:arXiv admin note: text overlap with arXiv:1902.04742 摘要:本文研究了深度学习理论中一个基本的开放性挑战:为什么深度网络在过参数化、未正则化并将训练数据拟合到零误差的情况下仍能很好地泛化?在论文的第一部分,我们将实证研究通过随机梯度下降训练深度网络如何隐式地控制网络的容量。随后,为了说明这如何带来更好的泛化,我们将推导数据相关的、基于一致收敛的泛化界,改进其对参数数量的依赖。由于其简单性和通用性,一致收敛实际上是深度学习文献中使用最广泛的工具。鉴于一致收敛的普及,在本论文中,我们还将退后一步,确定一致收敛作为解释泛化的工具的基本局限。特别是,我们将展示在一些过参数化设置的示例中,任何一致收敛界都只能给出空洞的泛化界。考虑到这一点,在论文的最后一部分,我们将改变方向,引入一种使用未标注数据估计泛化的经验性技术。我们的技术不依赖于任何基于一致收敛的复杂度概念,并且非常精确。我们将从理论上说明我们的技术为何具有如此高的精度。最后,我们将讨论未来工作如何探索将分布假设纳入泛化界的新方法(例如以未标注数据的形式),并探索推导界的其他工具,或许是通过修改一致收敛,或是开发全新的工具。 摘要:This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized and fitting the training data to zero error? In the first part of the thesis, we will empirically study how training deep networks via stochastic gradient descent implicitly controls the networks' capacity. Subsequently, to show how this leads to better generalization, we will derive data-dependent, uniform-convergence-based generalization bounds with improved dependencies on the parameter count. Uniform convergence has in fact been the most widely used tool in deep learning literature, thanks to its simplicity and generality. Given its popularity, in this thesis, we will also take a step back to identify the fundamental limits of uniform convergence as a tool to explain generalization. In particular, we will show that in some example overparameterized settings, any uniform convergence bound will provide only a vacuous generalization bound. With this realization in mind, in the last part of the thesis, we will change course and introduce an empirical technique to estimate generalization using unlabeled data. Our technique does not rely on any notion of uniform-convergence-based complexity and is remarkably precise. We will theoretically show why our technique enjoys such precision. We will conclude by discussing how future work could explore novel ways to incorporate distributional assumptions in generalization bounds (such as in the form of unlabeled data) and explore other tools to derive bounds, perhaps by modifying uniform convergence or by developing completely new tools altogether.
【56】 Green Simulation Assisted Policy Gradient to Accelerate Stochastic Process Control 标题:绿色仿真辅助策略梯度加速随机过程控制 链接:https://arxiv.org/abs/2110.08902
作者:Hua Zheng,Wei Xie,M. Ben Feng 机构:Department of Mechanical and Industrial Engineering, Northeastern University, Boston, MA, Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON Canada 备注:36 pages, 7 figures 摘要:本研究的动机来自生物制药制造中的关键挑战,包括高复杂性、高不确定性以及非常有限的过程数据,且每次实验通常都非常昂贵。为了支持最优且鲁棒的过程控制,我们为在线和离线学习设置提出了一个通用的绿色仿真辅助策略梯度(GS-PG)框架。为解决最先进的强化学习(RL)的关键局限(如样本效率低和可靠性低),我们构建了一种基于混合似然比的策略梯度估计,它可以利用在不同输入(包括过程模型系数和决策策略参数)下进行的历史实验的信息。然后,为了加速最优鲁棒策略的学习,我们进一步提出了一种基于方差缩减的样本选择方法,使GS-PG能够智能地选择和重用最相关的历史轨迹。在学习过程机理和搜索最优策略的过程中,选择规则会自动更新要重用的样本。我们的理论和实证研究表明,所提框架的性能优于最先进的策略梯度方法,并能在高不确定性下加速复杂随机系统的最优鲁棒过程控制。 摘要:This study is motivated by the critical challenges in biopharmaceutical manufacturing, including high complexity, high uncertainty, and very limited process data. Each experiment run is often very expensive. To support the optimal and robust process control, we propose a general green simulation assisted policy gradient (GS-PG) framework for both online and offline learning settings. Basically, to address the key limitations of state-of-the-art reinforcement learning (RL), such as sample inefficiency and low reliability, we create a mixture likelihood ratio based policy gradient estimation that can leverage the information from historical experiments conducted under different inputs, including process model coefficients and decision policy parameters. Then, to accelerate the learning of optimal and robust policy, we further propose a variance reduction based sample selection method that allows GS-PG to intelligently select and reuse most relevant historical trajectories. The selection rule automatically updates the samples to be reused during the learning of process mechanisms and the search for optimal policy. Our theoretical and empirical studies demonstrate that the proposed framework can perform better than the state-of-the-art policy gradient approach and accelerate the optimal robust process control for complex stochastic systems under high uncertainty.
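混合似然比复用历史轨迹的直觉:用新策略与历史行为混合分布的概率比作重要性权重,对旧轨迹的回报重新加权;将权重再乘以各自的score函数即得梯度估计。以下自归一化加权的numpy示意仅展示权重部分(非论文完整算法,数值为虚构):

```python
import numpy as np

def mlr_weighted_return(logp_new, logp_mix, returns):
    """logp_new/logp_mix: 各历史轨迹在新策略与混合行为分布下的对数概率。"""
    w = np.exp(logp_new - logp_mix)   # 似然比权重
    w /= w.sum()                      # 自归一化以降低方差
    return float(np.sum(w * returns))

logp_new = np.array([-10.0, -12.0, -9.5])
logp_mix = np.array([-11.0, -11.5, -10.0])
print(mlr_weighted_return(logp_new, logp_mix, np.array([1.0, 0.5, 2.0])))
```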
【57】 Network Augmentation for Tiny Deep Learning 标题:用于微型深度学习的网络增强 链接:https://arxiv.org/abs/2110.08890
作者:Han Cai,Chuang Gan,Ji Lin,Song Han 机构:Massachusetts Institute of Technology, MIT-IBM Watson AI Lab 摘要:我们介绍了一种新的训练方法,即网络增强(Network Augmentation,NetAug),以提高微型神经网络的性能。现有的正则化技术(例如数据增强、Dropout)通过添加噪声克服过拟合,在大型神经网络(例如ResNet50)上取得了很大成功。然而,我们发现这些技术会损害微型神经网络的性能。我们认为,训练微型模型不同于训练大型模型:与其增强数据,不如增强模型,因为微型模型受限于容量,往往存在欠拟合而不是过拟合的问题。为了缓解此问题,NetAug增强网络本身(即反向dropout),而不是向数据集或网络中注入噪声。它将微型模型嵌入更大的模型中,鼓励其在作为独立模型工作之外,还作为更大模型的子模型获得额外监督。在测试时,仅使用微型模型进行推理,因此推理开销为零。我们展示了NetAug在图像分类和目标检测上的有效性。NetAug持续提升微型模型的性能,在ImageNet上实现了高达2.1%的精度提升,在Cars数据集上提升了4.3%。在Pascal VOC上,NetAug以相同的计算成本带来了2.96%的mAP提升。 摘要:We introduce Network Augmentation (NetAug), a new training method for improving the performance of tiny neural networks. Existing regularization techniques (e.g., data augmentation, dropout) have shown much success on large neural networks (e.g., ResNet50) by adding noise to overcome over-fitting. However, we found these techniques hurt the performance of tiny neural networks. We argue that training tiny models is different from large models: rather than augmenting the data, we should augment the model, since tiny models tend to suffer from under-fitting rather than over-fitting due to limited capacity. To alleviate this issue, NetAug augments the network (reverse dropout) instead of inserting noise into the dataset or the network. It puts the tiny model into larger models and encourages it to work as a sub-model of larger models to get extra supervision, in addition to functioning as an independent model. At test time, only the tiny model is used for inference, incurring zero inference overhead. We demonstrate the effectiveness of NetAug on image classification and object detection. NetAug consistently improves the performance of tiny models, achieving up to 2.1% accuracy improvement on ImageNet, and 4.3% on Cars. On Pascal VOC, NetAug provides 2.96% mAP improvement with the same computational cost.
【58】 Learning First-Order Rules with Relational Path Contrast for Inductive Relation Reasoning 标题:用于归纳关系推理的带关系路径对比的一阶规则学习 链接:https://arxiv.org/abs/2110.08810
作者:Yudai Pan,Jun Liu,Lingling Zhang,Xin Hu,Tianzhe Zhao,Qika Lin 机构:School of Computer Science and Technology, Xi'an Jiaotong University 备注:This work is going to be submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible 摘要:知识图谱(KG)中的关系推理旨在预测不完整三元组中缺失的关系,主流范式是学习关系和实体的嵌入,这仅限于直推式设置,难以在归纳场景下处理未见过的实体。先前的归纳方法可扩展且消耗更少的资源,它们利用子图中实体和三元组的结构来获得归纳能力。然而,为了获得更好的推理结果,模型应当从潜在规则中获取与实体无关的关系语义,并解决子图中规则稀缺导致的监督不足问题。为了解决这些问题,我们提出了一种新的基于图卷积网络(GCN)的、带关系路径对比的可解释归纳推理方法,称为RPC-IR。RPC-IR首先提取两个实体之间的关系路径并学习其表示,然后通过构建正、负关系路径创新性地引入对比策略,并提出了一种同时考虑监督信息和对比信息的联合训练策略。在三个归纳数据集上的综合实验表明,与最新的归纳推理方法相比,RPC-IR取得了出色的性能,并能显式地表示逻辑规则以实现可解释性。 摘要:Relation reasoning in knowledge graphs (KGs) aims at predicting missing relations in incomplete triples, whereas the dominant paradigm is learning the embeddings of relations and entities, which is limited to a transductive setting and has restriction on processing unseen entities in an inductive situation. Previous inductive methods are scalable and consume less resource. They utilize the structure of entities and triples in subgraphs to own inductive ability. However, in order to obtain better reasoning results, the model should acquire entity-independent relational semantics in latent rules and solve the deficient supervision caused by scarcity of rules in subgraphs. To address these issues, we propose a novel graph convolutional network (GCN)-based approach for interpretable inductive reasoning with relational path contrast, named RPC-IR. RPC-IR firstly extracts relational paths between two entities and learns representations of them, and then innovatively introduces a contrastive strategy by constructing positive and negative relational paths. A joint training strategy considering both supervised and contrastive information is also proposed. Comprehensive experiments on three inductive datasets show that RPC-IR achieves outstanding performance comparing with the latest inductive reasoning methods and could explicitly represent logical rules for interpretability.
【59】 Coordinated Multi-Agent Pathfinding for Drones and Trucks over Road Networks 标题:基于多智能体的无人机和卡车在公路网上的协同寻路 链接:https://arxiv.org/abs/2110.08802
作者:Shushman Choudhury,Kiril Solovey,Mykel Kochenderfer,Marco Pavone 机构:Stanford University, Stanford, CA, USA, Technion - Israel Institute of Technology, Haifa, Israel 摘要:我们解决了无人机和卡车车队在大规模城市道路网络上的路径规划问题。为了节省有限的飞行能量,无人机可以在前往目的地的途中把卡车作为临时运输方式。与独立运营无人机和卡车相比,这种协调可以显著节省车辆总行驶距离,即卡车行驶距离与无人机飞行距离之和。但是,在决定哪些卡车和无人机应该协调、以及何时何地协调最有利时,可能会产生令人望而却步的计算成本。我们通过将整体上难以求解的问题解耦为可处理的子问题并分阶段求解,来应对这一基本权衡。第一阶段只求解卡车,通过计算使其更有可能成为无人机有用中转选项的路径。第二阶段只求解无人机,将其路由到由道路网络和第一阶段卡车路径所定义的中转网络组成的复合网络上。我们设计了一个完整的算法框架,将每个阶段都建模为多智能体寻路问题,并实现了两种不同的求解方法。我们在包含近4500个顶点和10000条边的真实曼哈顿道路网络上,用多达100个智能体进行了大量模拟来评估我们的方法。与独立求解卡车和无人机相比,我们的框架节省了超过50%的车辆行驶距离,并在普通商用硬件上于5分钟内计算出所有设置的解。 摘要:We address the problem of routing a team of drones and trucks over large-scale urban road networks. To conserve their limited flight energy, drones can use trucks as temporary modes of transit en route to their own destinations. Such coordination can yield significant savings in total vehicle distance traveled, i.e., truck travel distance and drone flight distance, compared to operating drones and trucks independently. But it comes at the potentially prohibitive computational cost of deciding which trucks and drones should coordinate and when and where it is most beneficial to do so. We tackle this fundamental trade-off by decoupling our overall intractable problem into tractable sub-problems that we solve stage-wise. The first stage solves only for trucks, by computing paths that make them more likely to be useful transit options for drones. The second stage solves only for drones, by routing them over a composite of the road network and the transit network defined by truck paths from the first stage. We design a comprehensive algorithmic framework that frames each stage as a multi-agent path-finding problem and implement two distinct methods for solving them. We evaluate our approach on extensive simulations with up to $100$ agents on the real-world Manhattan road network containing nearly $4500$ vertices and $10000$ edges. Our framework saves on more than $50\%$ of vehicle distance traveled compared to independently solving for trucks and drones, and computes solutions for all settings within $5$ minutes on commodity hardware.
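The stage-wise decoupling can be illustrated with ordinary shortest paths standing in for the paper's multi-agent path-finding solvers; the transit discount factor of 0.1 and the toy graph below are invented for the sketch.

```python
import networkx as nx

def stagewise_route(road, truck_tasks, drone_tasks):
    """Stage 1: route trucks on the road network. Stage 2: route drones on
    a composite of the road network and transit edges along truck paths."""
    truck_paths = {t: nx.shortest_path(road, s, g, weight="w")
                   for t, (s, g) in truck_tasks.items()}
    composite = road.copy()
    for path in truck_paths.values():
        for u, v in zip(path, path[1:]):
            # riding a truck is cheap: discount cost along truck routes
            composite.add_edge(u, v, w=0.1 * road[u][v]["w"])
    drone_paths = {d: nx.shortest_path(composite, s, g, weight="w")
                   for d, (s, g) in drone_tasks.items()}
    return truck_paths, drone_paths

g = nx.Graph()
g.add_weighted_edges_from([(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)], weight="w")
print(stagewise_route(g, {"truck": (0, 3)}, {"drone": (0, 2)}))
```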
【60】 Taming Visually Guided Sound Generation 标题:驯服视觉引导的声音生成 链接:https://arxiv.org/abs/2110.08791
作者:Vladimir Iashin,Esa Rahtu 机构:Computing Sciences, Tampere University, Tampere, Finland 备注:Accepted as an oral presentation for the BMVC 2021. Code: this https URL Project page: this https URL 摘要:视觉诱导音频生成的最新进展基于对短小、低保真、单一类别声音的采样。此外,在高端GPU上,从最先进的模型中采样1秒音频需要数分钟。在这项工作中,我们提出了一个单一模型,它能以少于播放该音频所需的时间,在单个GPU上根据开放域视频的一组帧生成与视觉相关的高保真声音。我们训练一个transformer,在给定视频特征集合的情况下,从预训练的频谱图码本中采样新的频谱图。该码本由VQGAN的一个变体获得,该变体通过一种新的基于频谱图的感知损失训练,以产生紧凑的采样空间。生成的频谱图使用基于窗口的GAN转换为波形,显著加快了生成速度。考虑到缺乏自动评估生成频谱图的指标,我们还构建了一系列称为FID和MKL的指标。这些指标基于一种名为Melception的新型声音分类器,旨在评估开放域样本的保真度和相关性。我们在小型和大型数据集上进行了定性和定量研究,以评估生成样本的保真度和相关性。我们还将我们的模型与最新技术进行了比较,观察到在质量、模型大小和计算时间方面的实质性改进。代码、演示和示例:v-iashin.github.io/SpecVQGAN 摘要:Recent advances in visually-induced audio generation are based on sampling short, low-fidelity, and one-class sounds. Moreover, sampling 1 second of audio from the state-of-the-art model takes minutes on a high-end GPU. In this work, we propose a single model capable of generating visually relevant, high-fidelity sounds prompted with a set of frames from open-domain videos in less time than it takes to play it on a single GPU. We train a transformer to sample a new spectrogram from the pre-trained spectrogram codebook given the set of video features. The codebook is obtained using a variant of VQGAN trained to produce a compact sampling space with a novel spectrogram-based perceptual loss. The generated spectrogram is transformed into a waveform using a window-based GAN that significantly speeds up generation. Considering the lack of metrics for automatic evaluation of generated spectrograms, we also build a family of metrics called FID and MKL. These metrics are based on a novel sound classifier, called Melception, and designed to evaluate the fidelity and relevance of open-domain samples. Both qualitative and quantitative studies are conducted on small- and large-scale datasets to evaluate the fidelity and relevance of generated samples. We also compare our model to the state-of-the-art and observe a substantial improvement in quality, size, and computation time. Code, demo, and samples: v-iashin.github.io/SpecVQGAN
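A toy sketch of the sampling loop: a transformer autoregressively draws codebook indices conditioned on video features, which a VQGAN decoder would then turn into a spectrogram. All sizes (vocabulary, dimensions, steps) are placeholders, and the untrained network only illustrates the control flow, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CodebookSampler(nn.Module):
    """Toy autoregressive sampler over spectrogram codebook indices,
    conditioned on video features (hypothetical sizes throughout)."""
    def __init__(self, vocab=1024, dim=64, vid_dim=32):
        super().__init__()
        self.tok = nn.Embedding(vocab + 1, dim)    # +1 for a BOS token
        self.vid = nn.Linear(vid_dim, dim)
        layer = nn.TransformerEncoderLayer(dim, 4, batch_first=True)
        self.body = nn.TransformerEncoder(layer, 2)
        self.head = nn.Linear(dim, vocab)
        self.vocab = vocab

    @torch.no_grad()
    def sample(self, video_feats, steps=16):
        ids = torch.full((1, 1), self.vocab)       # start with BOS
        cond = self.vid(video_feats).unsqueeze(0)  # (1, T_video, dim)
        for _ in range(steps):
            h = torch.cat([cond, self.tok(ids)], dim=1)
            logits = self.head(self.body(h))[:, -1]
            nxt = torch.multinomial(logits.softmax(-1), 1)
            ids = torch.cat([ids, nxt], dim=1)
        return ids[:, 1:]   # codebook indices; a VQGAN decoder maps them to a spectrogram

print(CodebookSampler().sample(torch.randn(8, 32)))
```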
【61】 An LSTM-based Plagiarism Detection via Attention Mechanism and a Population-based Approach for Pre-Training Parameters with imbalanced Classes 标题:基于LSTM的注意机制抄袭检测和基于群体的不平衡类预训练参数方法 链接:https://arxiv.org/abs/2110.08771
作者:Seyed Vahid Moravvej,Seyed Jalaleddin Mousavirad,Mahshid Helali Moghadam,Mehrdad Saadatmand 机构:Department of Computer Engineering, Isfahan University of Technology, Isfahan, Iran, Department of Computer Engineering, Hakim Sabzevari University, Sabzevar, Iran, RISE Research Institutes of Sweden, Sweden, Mälardalen University, Västerås, Sweden 备注:12 pages, The 28th International Conference on Neural Information Processing (ICONIP2021), BALI, Indonesia 摘要:剽窃是学术和工业环境中的主要问题之一,剽窃检测的目标是在文档或源代码中找到相似的条目。本文提出了一种基于长短时记忆(LSTM)和注意力机制、并由基于种群的参数初始化方法加持的结构,称为LSTM-AM-ABC。基于梯度的优化算法(如反向传播,BP)在LSTM、注意力机制和前馈神经网络的学习过程中被广泛使用,但存在陷入局部最优等问题。为了解决这个问题,可以使用基于种群的元启发式(PBMH)算法。为此,本文采用一种PBMH算法,即人工蜂群(ABC),来缓解该问题。我们提出的算法可以同时为LSTM、注意力机制和前馈神经网络找到模型学习的初始值。换句话说,ABC算法为启动BP算法找到了一个有希望的起点。在评估中,我们将所提算法与传统方法和基于种群的方法进行了比较。结果清楚地表明,所提方法可以提供有竞争力的性能。 摘要:Plagiarism is one of the leading problems in academic and industrial environments; the goal of plagiarism detection is to find similar items in a typical document or source code. This paper proposes an architecture based on a Long Short-Term Memory (LSTM) and attention mechanism called LSTM-AM-ABC boosted by a population-based approach for parameter initialization. Gradient-based optimization algorithms such as back-propagation (BP) are widely used in the literature for the learning process in LSTM, attention mechanism, and feed-forward neural network, while they suffer from some problems such as getting stuck in local optima. To tackle this problem, population-based metaheuristic (PBMH) algorithms can be used. To this end, this paper employs a PBMH algorithm, artificial bee colony (ABC), to mitigate the problem. Our proposed algorithm can find the initial values for model learning in all LSTM, attention mechanism, and feed-forward neural network, simultaneously. In other words, the ABC algorithm finds a promising point for starting the BP algorithm. For evaluation, we compare our proposed algorithm with both conventional and population-based methods. The results clearly show that the proposed method can provide competitive performance.
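A heavily simplified, runnable stand-in for the population-based initialization step: a population search picks promising initial weights, after which standard back-propagation would fine-tune them. The population size, perturbation scale, and the random-restart simplification are all assumptions; a real ABC algorithm has employed/onlooker/scout bee phases not modeled here.

```python
import torch
import torch.nn as nn

def population_init(make_model, loss_fn, population=20, rounds=5):
    """Crude population-based search (standing in for artificial bee colony)
    that picks promising initial weights before back-propagation starts."""
    best, best_loss = None, float("inf")
    with torch.no_grad():
        for _ in range(population):                 # random food sources
            model = make_model()
            loss = loss_fn(model).item()
            if loss < best_loss:
                best, best_loss = model, loss
        for _ in range(rounds):                     # local perturbation phase
            cand = make_model()
            for p_c, p_b in zip(cand.parameters(), best.parameters()):
                p_c.copy_(p_b + 0.05 * torch.randn_like(p_b))
            cand_loss = loss_fn(cand).item()
            if cand_loss < best_loss:
                best, best_loss = cand, cand_loss
    return best   # fine-tune `best` with a standard optimizer (BP) afterwards

x, y = torch.randn(64, 10), torch.randn(64, 1)
loss_fn = lambda m: nn.functional.mse_loss(m(x), y)
model = population_init(lambda: nn.Linear(10, 1), loss_fn)
print(loss_fn(model).item())
```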
【62】 Towards Better Long-range Time Series Forecasting using Generative Adversarial Networks 标题:利用产生式对抗性网络进行更好的长期时间序列预测 链接:https://arxiv.org/abs/2110.08770
作者:Shiyu Liu,Mehul Motani 机构:Department of Electrical and Computer Engineering, National University of Singapore 备注:7 pages main paper with 4 pages appendix 摘要:时间序列数据的准确长期预测是能源、医疗和金融等许多行业的一个重要问题。近年来,生成对抗网络(GAN)为解决许多问题提供了革命性的方法。然而,使用GAN改进长期时间序列预测的方法仍然相对未被探索。在本文中,我们利用条件Wasserstein GAN(CWGAN)并用误差惩罚项对其进行扩充,从而产生一种旨在生成高质量合成时间序列数据的新生成模型,称为CWGAN-TS。通过使用这种合成数据,我们开发了一种长期预测方法,称为生成式预测(GenF),由三部分组成:(i)CWGAN-TS,用于生成接下来几个时间步的合成数据;(ii)根据生成数据和观测数据进行长期预测的预测器;(iii)一种信息论聚类(ITC)算法,用于更好地训练CWGAN-TS和预测器。我们在三个公共数据集上的实验结果表明,GenF显著优于各种最先进的基准和经典方法。在大多数情况下,与性能最佳的基准相比,预测性能(平均绝对误差)提高了6%-12%,参数减少了37%。最后,我们进行了消融研究,以证明CWGAN-TS和ITC算法的有效性。 摘要:Accurate long-range forecasting of time series data is an important problem in many sectors, such as energy, healthcare, and finance. In recent years, Generative Adversarial Networks (GAN) have provided a revolutionary approach to many problems. However, the use of GAN to improve long-range time series forecasting remains relatively unexplored. In this paper, we utilize a Conditional Wasserstein GAN (CWGAN) and augment it with an error penalty term, leading to a new generative model which aims to generate high-quality synthetic time series data, called CWGAN-TS. By using such synthetic data, we develop a long-range forecasting approach, called Generative Forecasting (GenF), consisting of three components: (i) CWGAN-TS to generate synthetic data for the next few time steps. (ii) a predictor which makes long-range predictions based on generated and observed data. (iii) an information theoretic clustering (ITC) algorithm to better train the CWGAN-TS and the predictor. Our experimental results on three public datasets demonstrate that GenF significantly outperforms a diverse range of state-of-the-art benchmarks and classical approaches. In most cases, we find a 6% - 12% improvement in predictive performance (mean absolute error) and a 37% reduction in parameters compared to the best performing benchmark. Lastly, we conduct an ablation study to demonstrate the effectiveness of the CWGAN-TS and the ITC algorithm.
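One plausible reading of the two ingredients in code: a WGAN-style generator loss with an added error penalty, and a GenF loop that rolls the generator forward before handing observed-plus-synthetic data to the predictor. The penalty weight `lam`, the MSE form of the penalty, and the stand-in models are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def cwgan_ts_generator_loss(critic, fake, real, cond, lam=10.0):
    """Conditional WGAN generator loss augmented with a supervised error
    penalty, following the abstract's description of CWGAN-TS at a high level."""
    adv = -critic(fake, cond).mean()       # standard WGAN generator term
    penalty = F.mse_loss(fake, real)       # error penalty on generated steps
    return adv + lam * penalty

def generative_forecast(generator, predictor, history, synth_steps=3):
    """GenF: generate the next few steps, then predict long-range values
    from observed plus synthetic data (both models are black boxes here)."""
    window = history
    for _ in range(synth_steps):
        nxt = generator(window)            # one synthetic step ahead
        window = torch.cat([window, nxt], dim=1)
    return predictor(window)

critic = lambda x, c: x.mean(dim=(1, 2))  # stand-in critic
fake, real = torch.randn(2, 3, 1), torch.randn(2, 3, 1)
print(cwgan_ts_generator_loss(critic, fake, real, cond=None))
generator = lambda w: w[:, -1:]            # naive persistence "generator"
predictor = lambda w: w.mean(dim=1)        # naive "long-range" predictor
print(generative_forecast(generator, predictor, torch.randn(2, 8, 1)))
```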
【63】 TIP: Task-Informed Motion Prediction for Intelligent Systems 标题:提示:面向智能系统的任务知晓运动预测 链接:https://arxiv.org/abs/2110.08750
作者:Xin Huang,Guy Rosman,Ashkan Jasour,Stephen G. McGill,John J. Leonard,Brian C. Williams 机构:Massachusetts Institute of Technology, Toyota Research Institute 备注:8 pages, 6 figures, 2 tables 摘要:运动预测对于智能驾驶系统非常重要,它提供道路参与者行为的未来分布,并支持各种决策任务。现有的运动预测器通常通过基于预测精度的任务无关度量进行优化和评估。这些度量无法考虑预测在下游任务中的使用,并可能导致次优的任务性能。我们提出了一个任务知晓的运动预测框架,该框架联合考虑预测精度和任务效用,通过预测更好地支持下游任务。任务效用函数不需要完整的任务信息,而只需要对任务效用的规范,从而产生能服务于广泛下游任务的预测器。我们在自动驾驶和并行自主两个任务效用用例上展示了我们的框架,并在Waymo开放运动数据集上展示了任务知晓预测器相对于任务无关预测器的优势。 摘要:Motion prediction is important for intelligent driving systems, providing the future distributions of road agent behaviors and supporting various decision making tasks. Existing motion predictors are often optimized and evaluated via task-agnostic measures based on prediction accuracy. Such measures fail to account for the use of prediction in downstream tasks, and could result in sub-optimal task performance. We propose a task-informed motion prediction framework that jointly reasons about prediction accuracy and task utility, to better support downstream tasks through its predictions. The task utility function does not require the full task information, but rather a specification of the utility of the task, resulting in predictors that serve a wide range of downstream tasks. We demonstrate our framework on two use cases of task utilities, in the context of autonomous driving and parallel autonomy, and show the advantage of task-informed predictors over task-agnostic ones on the Waymo Open Motion dataset.
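The joint objective can be sketched as an accuracy term traded off against a scalar task-utility term; the trade-off weight `beta` and the toy utility function are invented for illustration, not the paper's specification.

```python
import torch

def task_informed_loss(pred_traj, true_traj, task_utility, beta=0.5):
    """Jointly scores prediction accuracy and downstream task utility.
    `task_utility` maps a predicted trajectory batch to a scalar payoff
    and stands in for the paper's task-utility specification."""
    accuracy_term = ((pred_traj - true_traj) ** 2).mean()
    utility_term = task_utility(pred_traj)   # higher is better
    return accuracy_term - beta * utility_term

pred = torch.randn(4, 10, 2, requires_grad=True)   # (batch, horizon, xy)
true = torch.randn(4, 10, 2)
# hypothetical utility: penalize predicted positions encroaching on x > 0
utility = lambda t: -t[..., 0].clamp(min=0).mean()
print(task_informed_loss(pred, true, utility))
```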
【64】 Reminding the Incremental Language Model via Data-Free Self-Distillation 标题:通过无数据自蒸馏提醒增量语言模型 链接:https://arxiv.org/abs/2110.08745
作者:Han Wang,Ruiliu Fu,Chengzhang Li,Xuejun Zhang,Jun Zhou,Yonghong Yan 机构: Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, China, University of Chinese Academy of Sciences, Beijing, China 备注:8 pages, 5 figures 摘要:使用伪数据的增量语言学习可以缓解神经网络中的灾难性遗忘。然而,为了获得更好的性能,以前的方法对先前任务的伪数据有更高的要求:当使用的伪数据更少时,性能会显著下降。此外,随着不同任务的顺序学习,伪数据的分布会逐渐偏离真实数据;学习的任务越多,偏差越大,从而导致更严重的灾难性遗忘。为了解决这些问题,我们提出了通过无数据自蒸馏(DFSD)来提醒增量语言模型,其包括基于推土机距离(Earth Mover's Distance)的自蒸馏和隐藏数据增强。通过估计GPT-2各层的知识分布并将其从教师模型迁移到学生模型,基于推土机距离的自蒸馏可以显著减少对伪数据的需求。隐藏数据增强将伪数据的生成建模为一个隐藏数据增强过程,其中每个样本都是所有已训练任务数据的混合,从而极大地缓解由偏差引起的灾难性遗忘。实验结果表明,即使伪数据最多减少90%,我们的DFSD也能超过以前最先进的方法。 摘要:Incremental language learning with pseudo-data can alleviate catastrophic forgetting in neural networks. However, to obtain better performance, former methods have higher demands for pseudo-data of the previous tasks. The performance dramatically decreases when fewer pseudo-data are employed. In addition, the distribution of pseudo-data gradually deviates from the real data with the sequential learning of different tasks. The deviation will be greater with more tasks learned, which results in more serious catastrophic forgetting. To address these issues, we propose reminding incremental language model via data-free self-distillation (DFSD), which includes self-distillation based on the Earth Mover's Distance and hidden data augmentation. By estimating the knowledge distribution in all layers of GPT-2 and transforming it from teacher model to student model, the self-distillation based on the Earth Mover's Distance can significantly reduce the demand for pseudo-data. Hidden data augmentation can greatly alleviate the catastrophic forgetting caused by deviations via modeling the generation of pseudo-data as a hidden data augmentation process, where each sample is a mixture of all trained task data. The experimental results demonstrate that our DFSD can exceed the previous state-of-the-art methods even if the maximum decrease in pseudo-data is 90%.
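A simplified sketch of EMD-based matching between teacher and student layer statistics, using SciPy's 1-D Wasserstein distance. Note this version is not differentiable, so real training would need a differentiable surrogate (e.g., a Sinkhorn approximation), and the one-to-one layer pairing shown is an assumption.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def emd_distillation_loss(teacher_layers, student_layers):
    """Sums the 1-D Earth Mover's Distance between matching teacher and
    student layer activations (a simplified reading of the abstract's
    EMD-based self-distillation; GPT-2 layers would be matched carefully)."""
    return sum(
        wasserstein_distance(t.ravel(), s.ravel())
        for t, s in zip(teacher_layers, student_layers)
    )

# hypothetical per-layer activations: (batch, hidden) for three layers
teacher = [np.random.randn(4, 768) for _ in range(3)]
student = [np.random.randn(4, 768) for _ in range(3)]
print(emd_distillation_loss(teacher, student))
```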
【65】 A model for full local image interpretation 标题:一种完整局部图像解译模型 链接:https://arxiv.org/abs/2110.08744
作者:Guy Ben-Yosef,Liav Assif,Daniel Harari,Shimon Ullman 机构:Department of Computer Science, Weizmann Institute of Science, Rehovot, Israel, Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA 备注:None 摘要:我们描述了人类提供场景中组件详细解释能力的计算模型。人类几乎可以在任何地方识别图像中有意义的成分,而识别这些成分是视觉过程的重要组成部分,也是理解周围场景及其对观众潜在意义的重要组成部分。详细的解释超出了当前视觉识别模型的范围。我们的模型表明,这是一个基本限制,与现有模型依赖于前馈但有限的自顶向下处理这一事实有关。在我们的模型中,第一个识别阶段导致类候选的初始激活,这是不完整的,并且精确度有限。然后,该阶段触发特定于类的解释和验证过程的应用,从而恢复对可见场景更丰富、更准确的解释。我们讨论了该模型对人类和计算机视觉模型的视觉解释的影响。 摘要:We describe a computational model of humans' ability to provide a detailed interpretation of components in a scene. Humans can identify in an image meaningful components almost everywhere, and identifying these components is an essential part of the visual process, and of understanding the surrounding scene and its potential meaning to the viewer. Detailed interpretation is beyond the scope of current models of visual recognition. Our model suggests that this is a fundamental limitation, related to the fact that existing models rely on feed-forward but limited top-down processing. In our model, a first recognition stage leads to the initial activation of class candidates, which is incomplete and with limited accuracy. This stage then triggers the application of class-specific interpretation and validation processes, which recover richer and more accurate interpretation of the visible scene. We discuss implications of the model for visual interpretation by humans and by computer vision models.
【66】 Improving End-To-End Modeling for Mispronunciation Detection with Effective Augmentation Mechanisms 标题:用有效的增强机制改进发音错误检测的端到端建模 链接:https://arxiv.org/abs/2110.08731
作者:Tien-Hong Lo,Yao-Ting Sung,Berlin Chen 机构:National Taiwan Normal University, Taipei City, Taiwan 备注:7 pages, 2 figures, 4 tables, accepted to Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2021) 摘要:近年来,端到端(E2E)模型在开发发音错误检测(MD)系统方面引起了广泛的研究关注,该类模型以第二语言(L2)学习者话语的频谱向量序列作为输入,并生成相应的音素级序列作为输出。然而,由于缺乏足够的L2说话人标注语音数据用于模型估计,相对于构建在DNN-HMM声学模型之上的传统模型,E2E MD模型容易过拟合。为了缓解这一关键问题,我们在本文中提出了两种建模策略来增强E2E MD模型的判别能力,它们分别隐式地利用预训练声学模型中编码的语音学与音系学特征,以及训练数据参考转录中包含的信息。第一种是输入增强,其目的是从DNN-HMM声学模型中蒸馏关于语音辨别的知识。第二种是标签增强,其设法从训练数据的转录中捕获更多的音系模式。在L2-ARCTIC英语数据集上进行的一系列实证实验似乎证实了我们的E2E MD模型相比一些顶尖E2E MD模型和基于DNN-HMM声学模型的经典发音评分方法的有效性。 摘要:Recently, end-to-end (E2E) models, which allow to take spectral vector sequences of L2 (second-language) learners' utterances as input and produce the corresponding phone-level sequences as output, have attracted much research attention in developing mispronunciation detection (MD) systems. However, due to the lack of sufficient labeled speech data of L2 speakers for model estimation, E2E MD models are prone to overfitting in relation to conventional ones that are built on DNN-HMM acoustic models. To alleviate this critical issue, we in this paper propose two modeling strategies to enhance the discrimination capability of E2E MD models, each of which can implicitly leverage the phonetic and phonological traits encoded in a pretrained acoustic model and contained within reference transcripts of the training data, respectively. The first one is input augmentation, which aims to distill knowledge about phonetic discrimination from a DNN-HMM acoustic model. The second one is label augmentation, which manages to capture more phonological patterns from the transcripts of training data. A series of empirical experiments conducted on the L2-ARCTIC English dataset seem to confirm the efficacy of our E2E MD model when compared to some top-of-the-line E2E MD models and a classic pronunciation-scoring based method built on a DNN-HMM acoustic model.
【67】 VoteHMR: Occlusion-Aware Voting Network for Robust 3D Human Mesh Recovery from Partial Point Clouds 标题:VoteHMR:遮挡感知投票网络用于从部分点云中稳健地恢复三维人体网格 链接:https://arxiv.org/abs/2110.08729
作者:Guanze Liu,Yu Rong,Lu Sheng 机构:College of Software, Beihang University, Beijing, China, The Chinese University of Hong Kong, Hong Kong SAR, China 备注:Our paper is accepted to MM 2021 as oral 摘要:从点云中恢复三维人体网格对于包括AR/VR和人类行为理解在内的各种任务都至关重要。该领域以前的工作要么需要高质量的3D人体扫描,要么需要序列点云,因而无法轻易应用于消费级深度传感器捕获的低质量3D扫描。在本文中,我们首次尝试从单帧部分点云重建可靠的三维人体形状。为此,我们提出了一种端到端可学习的方法,称为VoteHMR。VoteHMR的核心是一种新型的遮挡感知投票网络,该网络首先能从输入的部分点云可靠地生成可见的关节级特征,然后通过人体骨骼的运动树补全关节级特征。与以往工作使用的整体特征相比,关节级特征不仅能有效地编码人体几何信息,而且对具有自遮挡和缺失区域的噪声输入具有鲁棒性。该方法利用关节级特征和输入点云全局特征提供的丰富互补线索,为SMPL等统计三维人体模型给出可靠且解耦的参数预测。该方法在SURREAL和DFAUST两个大规模数据集上实现了最先进的性能。此外,VoteHMR还在Berkeley MHAD等真实世界数据集上展示了卓越的泛化能力。 摘要:3D human mesh recovery from point clouds is essential for various tasks, including AR/VR and human behavior understanding. Previous works in this field either require high-quality 3D human scans or sequential point clouds, which cannot be easily applied to low-quality 3D scans captured by consumer-level depth sensors. In this paper, we make the first attempt to reconstruct reliable 3D human shapes from single-frame partial point clouds. To achieve this, we propose an end-to-end learnable method, named VoteHMR. The core of VoteHMR is a novel occlusion-aware voting network that can first reliably produce visible joint-level features from the input partial point clouds, and then complete the joint-level features through the kinematic tree of the human skeleton. Compared with holistic features used by previous works, the joint-level features can not only effectively encode the human geometry information but also be robust to noisy inputs with self-occlusions and missing areas. By exploiting the rich complementary clues from the joint-level features and global features from the input point clouds, the proposed method encourages reliable and disentangled parameter predictions for statistical 3D human models, such as SMPL. The proposed method achieves state-of-the-art performances on two large-scale datasets, namely SURREAL and DFAUST. Furthermore, VoteHMR also demonstrates superior generalization ability on real-world datasets, such as Berkeley MHAD.
【68】 Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation 标题:无图神经网络:通过蒸馏教授旧的MLP新技巧 链接:https://arxiv.org/abs/2110.08727
作者:Shichang Zhang,Yozen Liu,Yizhou Sun,Neil Shah 机构:University of California, Los Angeles, Snap Inc. 摘要:图神经网络(GNN)近年来在图机器学习中广受欢迎,并在广泛的节点分类任务中取得了优异的效果。然而,由于数据依赖带来的可扩展性挑战,GNN在业界的实际部署中不太受欢迎。也就是说,GNN推理依赖于距离目标多跳的邻居节点,获取这些节点会给延迟受限的应用带来负担。现有的推理加速方法(如剪枝和量化)通过减少乘加(MAC)运算,可以在一定程度上加速GNN。然而,由于数据依赖没有得到解决,它们的改进是有限的。相反,多层感知器(MLP)对图数据没有依赖,推理速度比GNN快得多,尽管其节点分类精度通常不如GNN。基于这些互补的优势和劣势,我们通过知识蒸馏(KD)将GNN和MLP结合起来。我们的工作表明,GNN知识蒸馏可以大幅提升MLP的性能。我们将蒸馏得到的MLP称为无图神经网络(GLNN),因为它们在推理时不依赖图。我们表明,性能相当的GLNN推理速度比GNN快146X-273X,比其他加速方法快14X-27X。同时,在涉及7个数据集的直推和归纳预测的生产环境下,GLNN的准确率比独立MLP平均提高12.36%,并在7个数据集中的6个上与GNN持平。对GLNN的全面分析展示了GLNN何时以及为何能取得与GNN相当的结果,并表明GLNN是延迟受限应用的便捷选择。 摘要:Graph Neural Networks (GNNs) have recently become popular for graph machine learning and have shown great results on wide node classification tasks. Yet, GNNs are less popular for practical deployments in the industry owing to their scalability challenges incurred by data dependency. Namely, GNN inference depends on neighbor nodes multiple hops away from the target, and fetching these nodes burdens latency-constrained applications. Existing inference acceleration methods like pruning and quantization can speed up GNNs to some extent by reducing Multiplication-and-ACcumulation (MAC) operations. However, their improvements are limited given the data dependency is not resolved. Conversely, multi-layer perceptrons (MLPs) have no dependency on graph data and infer much faster than GNNs, even though they are less accurate than GNNs for node classification in general. Motivated by these complementary strengths and weaknesses, we bring GNNs and MLPs together via knowledge distillation (KD). Our work shows that the performance of MLPs can be improved by large margins with GNN KD. We call the distilled MLPs Graph-less Neural Networks (GLNNs) as they have no inference graph dependency. We show that GLNN with competitive performance infer faster than GNNs by 146X-273X and faster than other acceleration methods by 14X-27X. Meanwhile, under a production setting involving both transductive and inductive predictions across 7 datasets, GLNN accuracies improve over stand-alone MLPs by 12.36% on average and match GNNs on 6/7 datasets. A comprehensive analysis of GLNN shows when and why GLNN can achieve competitive results to GNNs and suggests GLNN as a handy choice for latency-constrained applications.
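The GNN-to-MLP distillation objective is standard soft-label knowledge distillation and can be sketched directly; the mixing weight `lam` and temperature `T` below are conventional choices, not values from the paper.

```python
import torch
import torch.nn.functional as F

def glnn_kd_loss(mlp_logits, labels, gnn_logits, lam=0.5, T=2.0):
    """Trains an MLP on ground-truth labels plus soft targets from a
    pretrained GNN teacher, as in GNN-to-MLP knowledge distillation."""
    hard = F.cross_entropy(mlp_logits, labels)
    soft = F.kl_div(
        F.log_softmax(mlp_logits / T, dim=-1),
        F.softmax(gnn_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T                       # rescale gradients for the temperature
    return lam * hard + (1 - lam) * soft

mlp_logits = torch.randn(8, 7, requires_grad=True)  # MLP sees node features only
gnn_logits = torch.randn(8, 7)                      # fixed teacher outputs
labels = torch.randint(0, 7, (8,))
print(glnn_kd_loss(mlp_logits, labels, gnn_logits))
```

At inference the MLP runs alone, which is what removes the multi-hop neighbor-fetching cost.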
【69】 Black-box Adversarial Attacks on Network-wide Multi-step Traffic State Prediction Models 标题:全网多步流量状态预测模型的黑盒对抗性攻击 链接:https://arxiv.org/abs/2110.08712
作者:Bibek Poudel,Weizi Li 机构:Department of Computer Science, University of Memphis 备注:Accepted to IEEE International Conference on Intelligent Transportation Systems (ITSC), 2021 摘要:交通状态预测是许多智能交通系统应用的必要条件。该主题的最新进展集中于全网、多步预测,其中最先进的性能由深度学习模型,特别是基于图神经网络的模型实现。虽然深度学习模型的预测精度很高,但这些模型的鲁棒性引发了许多安全担忧,因为添加到输入中的不可察觉的扰动会严重降低模型性能。在这项工作中,我们提出了一个对抗攻击框架,将预测模型视为黑箱,即假设不知道模型架构、训练数据和(超)参数。但我们假设对手可以用任意输入查询预测模型并获得相应的输出。接着,对手可以利用输入-输出对训练替代模型,并基于替代模型生成对抗信号。为了测试攻击的有效性,我们检验了两个最先进的基于图神经网络的模型(GCGRNN和DCRNN)。结果,对手可以使目标模型的预测精度下降多达54%。作为比较,我们还检验了两个传统统计模型(线性回归和历史平均)。虽然这两个模型的预测精度不高,但它们要么受到的影响可以忽略(低于3%),要么对对手的攻击免疫。 摘要:Traffic state prediction is necessary for many Intelligent Transportation Systems applications. Recent developments of the topic have focused on network-wide, multi-step prediction, where state of the art performance is achieved via deep learning models, in particular, graph neural network-based models. While the prediction accuracy of deep learning models is high, these models' robustness has raised many safety concerns, given that imperceptible perturbations added to input can substantially degrade the model performance. In this work, we propose an adversarial attack framework by treating the prediction model as a black-box, i.e., assuming no knowledge of the model architecture, training data, and (hyper)parameters. However, we assume that the adversary can oracle the prediction model with any input and obtain corresponding output. Next, the adversary can train a substitute model using input-output pairs and generate adversarial signals based on the substitute model. To test the attack effectiveness, two state of the art, graph neural network-based models (GCGRNN and DCRNN) are examined. As a result, the adversary can degrade the target model's prediction accuracy up to $54\%$. In comparison, two conventional statistical models (linear regression and historical average) are also examined. While these two models do not produce high prediction accuracy, they are either influenced negligibly (less than $3\%$) or are immune to the adversary's attack.
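A compact sketch of the substitute-model pipeline: query the black-box target for outputs, fit a substitute on the input-output pairs, then craft FGSM-style perturbations from the substitute's gradients and transfer them to the target. The linear oracle, the MSE objective, and `eps` are placeholders.

```python
import torch
import torch.nn as nn

def substitute_attack(oracle, substitute, inputs, eps=0.1, epochs=50):
    """Black-box attack sketch: label inputs with the target model (oracle),
    fit a substitute on those pairs, then generate adversarial signals from
    the substitute's gradients."""
    with torch.no_grad():
        targets = oracle(inputs)                 # query-only access
    opt = torch.optim.Adam(substitute.parameters(), lr=1e-2)
    for _ in range(epochs):                      # train the substitute
        opt.zero_grad()
        nn.functional.mse_loss(substitute(inputs), targets).backward()
        opt.step()
    x = inputs.clone().requires_grad_(True)      # craft the perturbation
    nn.functional.mse_loss(substitute(x), targets).backward()
    return inputs + eps * x.grad.sign()          # transfer to the oracle

oracle = lambda x: x @ torch.ones(5, 1)          # hypothetical traffic predictor
adv = substitute_attack(oracle, nn.Linear(5, 1), torch.randn(16, 5))
print(adv.shape)
```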
【70】 A Q-Learning-based Approach for Distributed Beam Scheduling in mmWave Networks 标题:一种基于Q学习的毫米波网络分布式波束调度方法 链接:https://arxiv.org/abs/2110.08704
作者:Xiang Zhang,Shamik Sarkar,Arupjyoti Bhuyan,Sneha Kumar Kasera,Mingyue Ji 机构:Department of Electrical and Computer Engineering, University of Utah, School of Computing, University of Utah, Idaho National Laboratory 备注:10 pages 摘要:我们考虑了毫米波(mmWave)蜂窝网络的分布式下行链路波束调度和功率分配问题,其中属于不同服务运营商的多个基站(BS)共享相同的未授权频谱,彼此之间没有中央协调或合作。我们的目标是设计高效的分布式波束调度和功率分配算法,使网络级收益(定义为总吞吐量与功率惩罚项的加权和)最大化。为此,我们提出了一种分布式功率分配和自适应调度方法,通过将每个基站建模为独立的Q学习智能体,在共享频谱上实现有效的干扰管理。作为基线,我们将所提方法与之前针对同一问题开发的最先进的非合作博弈方法进行了比较。我们在各种场景下进行了大量实验,以验证多个因素对两种方法性能的影响。实验结果表明,与基于博弈的方法相比,所提方法通过从经验中学习,能更好地适应不同的干扰情况,并能获得更高的收益。所提方法也可以集成到我们先前开发的Lyapunov随机优化框架中,以实现有最优性保证的网络效用最大化。如此一来,收益函数中的权重可以通过从Lyapunov优化框架导出的子问题中得到的虚拟队列值自动地、最优地确定。 摘要:We consider the problem of distributed downlink beam scheduling and power allocation for millimeter-Wave (mmWave) cellular networks where multiple base stations (BSs) belonging to different service operators share the same unlicensed spectrum with no central coordination or cooperation among them. Our goal is to design efficient distributed beam scheduling and power allocation algorithms such that the network-level payoff, defined as the weighted sum of the total throughput and a power penalization term, can be maximized. To this end, we propose a distributed scheduling approach to power allocation and adaptation for efficient interference management over the shared spectrum by modeling each BS as an independent Q-learning agent. As a baseline, we compare the proposed approach to the state-of-the-art non-cooperative game-based approach which was previously developed for the same problem. We conduct extensive experiments under various scenarios to verify the effect of multiple factors on the performance of both approaches. Experiment results show that the proposed approach adapts well to different interference situations by learning from experience and can achieve higher payoff than the game-based approach. The proposed approach can also be integrated into our previously developed Lyapunov stochastic optimization framework for the purpose of network utility maximization with optimality guarantee. As a result, the weights in the payoff function can be automatically and optimally determined by the virtual queue values from the sub-problems derived from the Lyapunov optimization framework.
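Each base station can be sketched as an independent tabular Q-learning agent over quantized interference states and power levels; the state/action sizes and the toy payoff (throughput growing with power, minus a power penalty) are illustrative assumptions, not the paper's model.

```python
import numpy as np

class BaseStationAgent:
    """Independent tabular Q-learning agent for one base station, choosing
    a transmit-power level given a quantized interference state."""
    def __init__(self, n_states=4, n_powers=3, alpha=0.1, gamma=0.9, eps=0.1):
        self.q = np.zeros((n_states, n_powers))
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, s):
        if np.random.rand() < self.eps:          # epsilon-greedy exploration
            return np.random.randint(self.q.shape[1])
        return int(self.q[s].argmax())

    def update(self, s, a, payoff, s_next):
        td = payoff + self.gamma * self.q[s_next].max() - self.q[s, a]
        self.q[s, a] += self.alpha * td

agent = BaseStationAgent()
s = 0
for _ in range(100):                             # toy interaction loop
    a = agent.act(s)
    payoff = float(a) - 0.4 * a**2               # throughput minus power penalty
    s_next = np.random.randint(4)                # interference from other operators
    agent.update(s, a, payoff, s_next)
    s = s_next
print(agent.q)
```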
【71】 Towards Instance-Optimal Offline Reinforcement Learning with Pessimism 标题:面向实例最优的悲观离线强化学习 链接:https://arxiv.org/abs/2110.08695
作者:Ming Yin,Yu-Xiang Wang 机构:Department of Computer Science, UC Santa Barbara, Department of Statistics and Applied Probability, UC Santa Barbara 备注:NeurIPS, 2021 摘要:我们研究离线强化学习(offline RL)问题,其目标是利用来自行为策略$\mu$的数据,在未知马尔可夫决策过程(MDP)中学习奖励最大化的策略。特别地,我们考虑有限时域MDP下离线RL的样本复杂度问题。以往的工作基于不同的数据覆盖假设研究该问题,其学习保证用覆盖系数表示,缺乏对系统量的显式刻画。在这项工作中,我们分析了自适应悲观值迭代(APVI)算法,并推导出接近匹配下式的次优性上界 [ O\left(\sum_{h=1}^H\sum_{s_h,a_h}d^{\pi^\star}_h(s_h,a_h)\sqrt{\frac{\mathrm{Var}_{P_{s_h,a_h}}(V^\star_{h+1}+r_h)}{d^\mu_h(s_h,a_h)}}\sqrt{\frac{1}{n}}\right). ] 作为补充,我们还在弱假设(若$d^{\pi^\star}_h(s_h,a_h)>0$则$d^\mu_h(s_h,a_h)>0$)下证明了每实例的信息论下界。与以往的极小极大下界不同,每实例下界(通过局部极小极大性)是更强的标准,因为它分别适用于每个单独的实例。这里$\pi^\star$是最优策略,$\mu$是行为策略,$d_h^\mu$是边际状态-动作概率。我们将上式称为内在离线强化学习界,因为它直接蕴含所有现有的最优结果:均匀数据覆盖假设下的极小极大速率、无时域(horizon-free)设置、单策略集中性以及紧的问题相关结果。随后,我们将结果推广到无假设情形(即不对$\mu$作任何假设),并得到无假设的内在界。由于其一般形式,我们相信内在界有助于阐明是什么使特定问题变得困难,并揭示离线RL的基本挑战。 摘要:We study the offline reinforcement learning (offline RL) problem, where the goal is to learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) using the data coming from a policy $\mu$. In particular, we consider the sample complexity problems of offline RL for finite-horizon MDPs. Prior works study this problem based on different data-coverage assumptions, and their learning guarantees are expressed by the covering coefficients which lack the explicit characterization of system quantities. In this work, we analyze the Adaptive Pessimistic Value Iteration (APVI) algorithm and derive the suboptimality upper bound that nearly matches [ O\left(\sum_{h=1}^H\sum_{s_h,a_h}d^{\pi^\star}_h(s_h,a_h)\sqrt{\frac{\mathrm{Var}_{P_{s_h,a_h}}(V^\star_{h+1}+r_h)}{d^\mu_h(s_h,a_h)}}\sqrt{\frac{1}{n}}\right). ] In complementary, we also prove a per-instance information-theoretical lower bound under the weak assumption that $d^\mu_h(s_h,a_h)>0$ if $d^{\pi^\star}_h(s_h,a_h)>0$. Different from the previous minimax lower bounds, the per-instance lower bound (via local minimaxity) is a much stronger criterion as it applies to individual instances separately. Here $\pi^\star$ is an optimal policy, $\mu$ is the behavior policy and $d_h^\mu$ is the marginal state-action probability. We call the above equation the intrinsic offline reinforcement learning bound since it directly implies all the existing optimal results: minimax rate under uniform data-coverage assumption, horizon-free setting, single policy concentrability, and the tight problem-dependent results. Later, we extend the result to the assumption-free regime (where we make no assumption on $\mu$) and obtain the assumption-free intrinsic bound. Due to its generic form, we believe the intrinsic bound could help illuminate what makes a specific problem hard and reveal the fundamental challenges in offline RL.
【72】 Classical-to-Quantum Transfer Learning for Spoken Command Recognition Based on Quantum Neural Networks 标题:基于量子神经网络的经典到量子转移学习在口令识别中的应用 链接:https://arxiv.org/abs/2110.08689
作者:Jun Qi,Javier Tejedor 机构:Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA, Escuela Politecnica Superior, Universidad San Pablo-CEU, CEU Universities, Madrid, Spain 备注:submitted to ICASSP'22 摘要:本研究将机器学习算法中的迁移学习扩展到新兴的、用于语音命令识别(SCR)的端到端混合量子神经网络(QNN)。我们基于QNN的SCR系统由经典和量子两部分组成:(1)经典部分主要依靠一维卷积神经网络(CNN)提取语音特征;(2)量子部分建立在带有少量可学习参数的变分量子电路之上。由于在含噪声中等规模量子(NISQ)设备上从头训练混合端到端QNN效率低下,我们提出了一种混合迁移学习算法,允许将预训练的经典网络迁移到混合QNN模型的经典部分。预训练的经典网络通过与变分量子电路(VQC)联合微调得到进一步修改和增强。混合迁移学习方法对基于QNN的SCR任务特别有吸引力,因为低维经典特征有望被编码到量子态中。我们在Google语音命令数据集上评估了应用于SCR混合经典-量子QNN的混合迁移学习算法,经典模拟结果表明,混合迁移学习可以提升我们在SCR任务上的基线性能。 摘要:This work investigates an extension of transfer learning applied in machine learning algorithms to the emerging hybrid end-to-end quantum neural network (QNN) for spoken command recognition (SCR). Our QNN-based SCR system is composed of classical and quantum components: (1) the classical part mainly relies on a 1D convolutional neural network (CNN) to extract speech features; (2) the quantum part is built upon the variational quantum circuit with a few learnable parameters. Since it is inefficient to train the hybrid end-to-end QNN from scratch on a noisy intermediate-scale quantum (NISQ) device, we put forth a hybrid transfer learning algorithm that allows a pre-trained classical network to be transferred to the classical part of the hybrid QNN model. The pre-trained classical network is further modified and augmented through jointly fine-tuning with a variational quantum circuit (VQC). The hybrid transfer learning methodology is particularly attractive for the task of QNN-based SCR because low-dimensional classical features are expected to be encoded into quantum states. We assess the hybrid transfer learning algorithm applied to the hybrid classical-quantum QNN for SCR on the Google speech command dataset, and our classical simulation results suggest that the hybrid transfer learning can boost our baseline performance on the SCR task.
【73】 Blockchain and Federated Edge Learning for Privacy-Preserving Mobile Crowdsensing 标题:区块链与联邦边缘学习在隐私保护移动群智感知中的应用 链接:https://arxiv.org/abs/2110.08671
作者:Qin Hu,Zhilin Wang,Minghui Xu,Xiuzhen Cheng 机构:Department of Computer and Information Science, Indiana University-Purdue University Indianapolis; School of Computer Science and Technology, Shandong University 摘要:移动群智感知(MCS)依靠大量工作者的移动性,帮助请求者以更高的灵活性和更低的成本完成各种感知任务。然而,对于传统的MCS,原始数据传输对通信资源的巨大消耗以及对数据存储和计算能力的高要求,阻碍了资源有限的潜在请求者使用MCS。为了促进MCS的广泛应用,我们提出了一个新的MCS学习框架,该框架利用区块链技术和基于联邦学习(FL)的边缘智能新概念,涉及四类主要实体:请求者、区块链、边缘服务器和作为工作者的移动设备。尽管已有一些关于基于区块链的MCS和基于区块链的FL的研究,但它们无法解决MCS在容纳资源受限的请求者方面的根本挑战,也无法处理请求者和工作者参与学习过程所带来的隐私问题。为填补这一空白,我们设计了四个主要过程,即任务发布、数据感知与提交、学习以返回最终结果、以及支付结算与分配,以应对恶意边缘服务器和不诚实请求者等内外部威胁带来的重大挑战。具体而言,我们提出了一种基于机制设计的数据提交规则,以保证移动设备的数据隐私在边缘服务器上得到真实保护;阐述了基于联盟链的FL,以保障分布式学习过程的安全;并设计了一种合作强制控制策略,以促使请求者支付全额报酬。我们进行了大量仿真来评估所设计方案的性能。 摘要:Mobile crowdsensing (MCS) counting on the mobility of massive workers helps the requestor accomplish various sensing tasks with more flexibility and lower cost. However, for the conventional MCS, the large consumption of communication resources for raw data transmission and high requirements on data storage and computing capability hinder potential requestors with limited resources from using MCS. To facilitate the widespread application of MCS, we propose a novel MCS learning framework leveraging on blockchain technology and the new concept of edge intelligence based on federated learning (FL), which involves four major entities, including requestors, blockchain, edge servers and mobile devices as workers. Even though there exist several studies on blockchain-based MCS and blockchain-based FL, they cannot solve the essential challenges of MCS with respect to accommodating resource-constrained requestors or deal with the privacy concerns brought by the involvement of requestors and workers in the learning process. To fill the gaps, four main procedures, i.e., task publication, data sensing and submission, learning to return final results, and payment settlement and allocation, are designed to address major challenges brought by both internal and external threats, such as malicious edge servers and dishonest requestors. Specifically, a mechanism design based data submission rule is proposed to guarantee the data privacy of mobile devices being truthfully preserved at edge servers; consortium blockchain based FL is elaborated to secure the distributed learning process; and a cooperation-enforcing control strategy is devised to elicit full payment from the requestor. Extensive simulations are carried out to evaluate the performance of our designed schemes.
【74】 Finding Critical Scenarios for Automated Driving Systems: A Systematic Literature Review 标题:寻找自动驾驶系统的关键场景:系统的文献综述 链接:https://arxiv.org/abs/2110.08664
作者:Xinhai Zhang,Jianbo Tao,Kaige Tan,Martin Törngren,José Manuel Gaspar Sánchez,Muhammad Rusyadi Ramli,Xin Tao,Magnus Gyllenhammar,Franz Wotawa,Naveen Mohan,Mihai Nica,Hermann Felbinger 备注:37 pages, 24 figures 摘要:基于场景的方法在自动驾驶系统的研究和工程中受到了极大关注。由于驾驶环境的复杂性和不确定性,以及驾驶任务本身的复杂性,ADS或ADAS可能遇到的驾驶场景数量几乎是无限的。因此,必须能够对场景的识别进行推理,尤其是那些若不加考虑便可能带来不可接受风险的关键场景。关键场景对于支持设计、验证和确认工作以及作为安全论证的基础尤为重要。在这篇论文中,我们给出了自动驾驶背景下一项系统性文献综述的结果。主要贡献有:(i)为关键场景识别方法引入了一个全面的分类法;(ii)基于该分类法概述了2017年至2020年间86篇论文所代表的最新研究;以及(iii)确定了有待进一步研究的开放问题和方向。所提供的分类法包括三个主要视角:问题定义(为什么)、解决方案(推导场景的方法)以及对所建立场景的评估。此外,我们还从覆盖范围、实用性和场景空间爆炸的角度讨论了开放的研究问题。 摘要:Scenario-based approaches have been receiving a huge amount of attention in research and engineering of automated driving systems. Due to the complexity and uncertainty of the driving environment, and the complexity of the driving task itself, the number of possible driving scenarios that an ADS or ADAS may encounter is virtually infinite. Therefore it is essential to be able to reason about the identification of scenarios and in particular critical ones that may impose unacceptable risk if not considered. Critical scenarios are particularly important to support design, verification and validation efforts, and as a basis for a safety case. In this paper, we present the results of a systematic literature review in the context of autonomous driving. The main contributions are: (i) introducing a comprehensive taxonomy for critical scenario identification methods; (ii) giving an overview of the state-of-the-art research based on the taxonomy encompassing 86 papers between 2017 and 2020; and (iii) identifying open issues and directions for further research. The provided taxonomy comprises three main perspectives encompassing the problem definition (the why), the solution (the methods to derive scenarios), and the assessment of the established scenarios. In addition, we discuss open research issues considering the perspectives of coverage, practicability, and scenario space explosion.
【75】 Learning UI Navigation through Demonstrations composed of Macro Actions 标题:通过由宏操作组成的演示学习UI导航 链接:https://arxiv.org/abs/2110.08653
作者:Wei Li 机构:Google Research, New York, NY 摘要:我们开发了一个框架来可靠地构建能够进行UI导航的代理。状态空间从原始像素简化为从屏幕理解(如OCR和图标检测)中提取的一组UI元素。动作空间仅限于UI元素加上少数全局动作。动作可以针对任务定制,每个动作都是以状态检查为条件的一系列基本操作。有了这样的设计,我们只需少量演示情节即可训练DQfD和BC代理。我们提出了演示增强,可显著减少所需的人工演示数量。我们对DQfD进行了定制,允许在截图上收集演示,从而提高罕见情形的演示覆盖率。演示只在前一版本代理评估失败的案例上收集。通过在评估、演示收集和训练之间循环数十次迭代,该代理在由80个应用和网站组成、初始状态和视图参数随机化的环境中,搜索任务成功率达到98.7%。 摘要:We have developed a framework to reliably build agents capable of UI navigation. The state space is simplified from raw-pixels to a set of UI elements extracted from screen understanding, such as OCR and icon detection. The action space is restricted to the UI elements plus a few global actions. Actions can be customized for tasks and each action is a sequence of basic operations conditioned on status checks. With such a design, we are able to train DQfD and BC agents with a small number of demonstration episodes. We propose demo augmentation that significantly reduces the required number of human demonstrations. We made a customization of DQfD to allow demos collected on screenshots to facilitate the demo coverage of rare cases. Demos are only collected for the failed cases during the evaluation of the previous version of the agent. With 10s of iterations looping over evaluation, demo collection, and training, the agent reaches a 98.7% success rate on the search task in an environment of 80 apps and websites where initial states and viewing parameters are randomized.
【76】 Equivariant Discrete Normalizing Flows 标题:等变离散标准化流 链接:https://arxiv.org/abs/2110.08649
作者:Avishek Joey Bose,Ivan Kobyzev 机构:McGill University and Mila, Huawei Noah's Ark Lab 备注:Preprint 摘要:生成式建模的核心是揭示产生观测数据的潜在因素,这些因素通常可以建模为自然对称性,通过对某些变换律的不变性和等变性表现出来。然而,当前的方法以连续标准化流的形式表达,需要构造等变向量场——这限制了它们在自然图像等传统高维生成建模领域中的简单应用。在本文中,我们着重于使用离散层构建等变标准化流。我们首先从理论上证明了作用于紧空间上的紧群的等变映射的存在性。我们进一步引入两种新的等变流:$G$-耦合流和$G$-残差流,它们用等变映射将经典的耦合流和残差流提升到给定的群$G$。我们的$G$-残差流构造还是通用的:我们证明了任意$G$-等变微分同胚都可以由一个$G$-残差流精确表示。最后,我们首次在CIFAR-10等图像数据集上进行实验来补充我们的理论见解,并表明$G$-等变离散标准化流可以提高数据效率、加快收敛速度并改进似然估计。 摘要:At its core, generative modeling seeks to uncover the underlying factors that give rise to observed data that can often be modelled as the natural symmetries that manifest themselves through invariances and equivariances to certain transformations laws. However, current approaches are couched in the formalism of continuous normalizing flows that require the construction of equivariant vector fields -- inhibiting their simple application to conventional higher dimensional generative modelling domains like natural images. In this paper we focus on building equivariant normalizing flows using discrete layers. We first theoretically prove the existence of an equivariant map for compact groups whose actions are on compact spaces. We further introduce two new equivariant flows: $G$-coupling Flows and $G$-Residual Flows that elevate classical Coupling and Residual Flows with equivariant maps to a prescribed group $G$. Our construction of $G$-Residual Flows are also universal, in the sense that we prove a $G$-equivariant diffeomorphism can be exactly mapped by a $G$-residual flow. Finally, we complement our theoretical insights with experiments -- for the first time -- on image datasets like CIFAR-10 and show $G$-Equivariant Discrete Normalizing flows lead to increased data efficiency, faster convergence, and improved likelihood estimates.
【77】 Local Advantage Actor-Critic for Robust Multi-Agent Deep Reinforcement Learning 标题:鲁棒多Agent深度强化学习的局部优势执行者-批评者 链接:https://arxiv.org/abs/2110.08642
作者:Yuchen Xiao,Xueguang Lyu,Christopher Amato 机构: Northeastern University 备注:None 摘要:策略梯度方法已在多智能体强化学习中得到广泛应用,但由于环境随机性和智能体探索(即非平稳性)的存在,策略梯度方法的方差很高,而信用分配的困难可能使这一问题进一步恶化。因此,需要一种不仅能够有效解决上述两个问题,而且足够鲁棒、能胜任各种任务的方法。为此,我们提出了一种新的多智能体策略梯度方法,称为鲁棒局部优势(ROLA)Actor-Critic。ROLA允许每个智能体学习一个单独的动作值函数作为局部批评者,并通过一种基于集中式批评者的新型集中式训练方法来缓解环境的非平稳性。利用该局部批评者,每个智能体计算一个基线以减少其策略梯度估计的方差,由此得到相对于其他智能体选择的期望优势动作值,从而隐式地改善信用分配。我们在多种基准上评估了ROLA,并展示了其相对于许多最先进的多智能体策略梯度算法的鲁棒性和有效性。 摘要:Policy gradient methods have become popular in multi-agent reinforcement learning, but they suffer from high variance due to the presence of environmental stochasticity and exploring agents (i.e., non-stationarity), which is potentially worsened by the difficulty in credit assignment. As a result, there is a need for a method that is not only capable of efficiently solving the above two problems but also robust enough to solve a variety of tasks. To this end, we propose a new multi-agent policy gradient method, called Robust Local Advantage (ROLA) Actor-Critic. ROLA allows each agent to learn an individual action-value function as a local critic as well as ameliorating environment non-stationarity via a novel centralized training approach based on a centralized critic. By using this local critic, each agent calculates a baseline to reduce variance on its policy gradient estimation, which results in an expected advantage action-value over other agents' choices that implicitly improves credit assignment. We evaluate ROLA across diverse benchmarks and show its robustness and effectiveness over a number of state-of-the-art multi-agent policy gradient algorithms.
【78】 Conceptual Modeling and Artificial Intelligence: Mutual Benefits from Complementary Worlds 标题:概念建模与人工智能:互补世界的互惠互利 链接:https://arxiv.org/abs/2110.08637
作者:Dominik Bork 机构:TU Wien, Business Informatics Group, Vienna, Austria 备注:Editorial preface to the 3rd Int. Workshop on Conceptual Modeling Meets Artificial Intelligence (CMAI'2021) 摘要:概念建模(CM)应用抽象来降低所研究系统(例如,现实的摘录)的复杂性。作为概念建模过程的结果,可以得到一种人类可解释的形式化表示(即概念模型),从而实现人与人之间的理解和交流,以及机器的处理。人工智能(AI)算法同样被应用于复杂现实(通常由大量数据表示),以识别模式或对数据中的实体进行分类。除了两种方法的共同点之外,从结果上可以观察到显著的差异:概念模型是可理解、可复现且显式的知识表示,而AI技术能够高效地从给定输入导出输出,却像黑箱一样运作。AI解决方案通常缺乏可理解性和可复现性,即使是AI系统的开发者也无法解释为什么会产生某种输出。在概念建模与人工智能(CMAI)研讨会上,我们感兴趣的是探讨CM与AI这两个迄今为止基本上彼此孤立的学科的交叉点。研讨会基于这样一个假设:通过i)研究概念建模(CM)能为AI贡献什么,以及ii)反过来,人工智能(AI)能为CM贡献什么,可以实现多方面的互惠互利。 摘要:Conceptual modeling (CM) applies abstraction to reduce the complexity of a system under study (e.g., an excerpt of reality). As a result of the conceptual modeling process a human interpretable, formalized representation (i.e., a conceptual model) is derived which enables understanding and communication among humans, and processing by machines. Artificial Intelligence (AI) algorithms are also applied to complex realities (regularly represented by vast amounts of data) to identify patterns or to classify entities in the data. Aside from the commonalities of both approaches, a significant difference can be observed by looking at the results. While conceptual models are comprehensible, reproducible, and explicit knowledge representations, AI techniques are capable of efficiently deriving an output from a given input while acting as a black box. AI solutions often lack comprehensiveness and reproducibility. Even the developers of AI systems can't explain why a certain output is derived. In the Conceptual Modeling meets Artificial Intelligence (CMAI) workshop, we are interested in tackling the intersection of the two, thus far, mostly isolated disciplines of CM and AI. The workshop embraces the assumption, that manifold mutual benefits can be realized by i) investigating what Conceptual Modeling (CM) can contribute to AI, and ii) the other way around, what Artificial Intelligence (AI) can contribute to CM.
【79】 On the Pareto Frontier of Regret Minimization and Best Arm Identification in Stochastic Bandits 标题:随机土匪中遗憾最小化的Pareto前沿与最佳臂识别 链接:https://arxiv.org/abs/2110.08627
作者:Zixin Zhong,Wang Chi Cheung,Vincent Y. F. Tan 机构:Department of Mathematics, Department of Electrical and Computer Engineering, and Department of Industrial Systems and Management, National University of Singapore 备注:27 pages, 8 figures 摘要:我们研究随机多臂老虎机中两个典型目标——遗憾最小化(RM)与固定时域下的最佳臂识别(BAI)——的帕累托前沿。众所周知,利用与探索之间的平衡对RM和BAI都至关重要,而探索对于后一目标达到最优性能更为关键。为了把这一点说得更精确,我们首先设计并分析了BoBW-lil'UCB$(\gamma)$算法,它在$\gamma$的不同取值下分别对RM或BAI实现阶数意义上的最优性能。作为补充,我们证明了没有任何算法能同时对RM和BAI两个目标实现最优性能。更准确地说,我们对具有给定BAI失败概率的任何算法所能达到的遗憾建立了非平凡的下界。该分析表明,在某些区间内,BoBW-lil'UCB$(\gamma)$达到帕累托最优(至多相差常数或小项)。数值实验进一步表明,当应用于困难实例时,BoBW-lil'UCB的表现优于其有力竞争者UCB$_{\alpha}$(Degenne et al., 2019),后者是为RM和固定置信度下的BAI设计的。 摘要:We study the Pareto frontier of two archetypal objectives in stochastic bandits, namely, regret minimization (RM) and best arm identification (BAI) with a fixed horizon. It is folklore that the balance between exploitation and exploration is crucial for both RM and BAI, but exploration is more critical in achieving the optimal performance for the latter objective. To make this precise, we first design and analyze the BoBW-lil'UCB$(\gamma)$ algorithm, which achieves order-wise optimal performance for RM or BAI under different values of $\gamma$. Complementarily, we show that no algorithm can simultaneously perform optimally for both the RM and BAI objectives. More precisely, we establish non-trivial lower bounds on the regret achievable by any algorithm with a given BAI failure probability. This analysis shows that in some regimes BoBW-lil'UCB$(\gamma)$ achieves Pareto-optimality up to constant or small terms. Numerical experiments further demonstrate that when applied to difficult instances, BoBW-lil'UCB outperforms a close competitor UCB$_{\alpha}$ (Degenne et al., 2019), which is designed for RM and BAI with a fixed confidence.
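A loose sketch of a lil'UCB-style index in which a scale parameter `gamma` dials exploration up (favoring identification) or down (favoring regret minimization); the constants and the exact bonus form below are illustrative, not the paper's.

```python
import numpy as np

def lil_ucb_index(means, counts, gamma=1.0, delta=0.01):
    """lil'UCB-style indices with an exploration scale `gamma`: larger gamma
    explores more (better for BAI), smaller gamma exploits more (better for
    RM) -- a loose reading of the BoBW-lil'UCB(gamma) trade-off."""
    counts = np.maximum(counts, 2)
    bonus = gamma * np.sqrt(2 * np.log(np.log(counts) / delta) / counts)
    return means + bonus

means = np.array([0.40, 0.50, 0.45])     # empirical arm means
counts = np.array([10, 10, 10])          # pulls per arm
print(lil_ucb_index(means, counts, gamma=0.5).argmax())  # next arm to pull
```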
【80】 Generative Adversarial Imitation Learning for End-to-End Autonomous Driving on Urban Environments 标题:城市环境端到端自主驾驶的生成性对抗性模仿学习 链接:https://arxiv.org/abs/2110.08586
作者:Gustavo Claudio Karl Couto,Eric Aislan Antonelo 机构:Automation and Systems Engineering Department, Federal University of Santa Catarina, Florianopolis, Brazil 摘要:自动驾驶是一项复杂的任务,自1989年第一辆自动驾驶汽车ALVINN问世以来,就一直采用有监督的学习方法或行为克隆(BC)来解决这一问题。在BC中,使用状态-动作对对对神经网络进行训练,这些状态-动作对构成由专家(即人类驾驶员)制作的训练集。然而,这种类型的模仿学习没有考虑在导航轨迹的不同时刻采取的行动之间可能存在的时间依赖性。强化学习(RL)算法可以更好地处理这些类型的任务,它需要定义一个奖励函数。另一方面,最近的模仿学习方法,如生成性对抗性模仿学习(GAIL),可以在不明确要求定义奖励函数的情况下训练策略,允许代理直接在专家轨迹的训练集上通过试错学习。在这项工作中,我们提出了两种GAIL变体,用于在城市场景的真实CARLA模拟环境中进行车辆自主导航。它们都使用相同的网络结构,处理来自三个正面摄像头的高维图像输入,以及表示速度的其他九个连续输入,稀疏轨迹的下一个点和高级驾驶指令。我们证明了这两种方法都能在训练结束后从头到尾模拟专家轨迹,但在收敛时间和训练稳定性方面,用BC扩充的GAIL损失函数优于前者。 摘要:Autonomous driving is a complex task, which has been tackled since the first self-driving car ALVINN in 1989, with a supervised learning approach, or behavioral cloning (BC). In BC, a neural network is trained with state-action pairs that constitute the training set made by an expert, i.e., a human driver. However, this type of imitation learning does not take into account the temporal dependencies that might exist between actions taken in different moments of a navigation trajectory. These type of tasks are better handled by reinforcement learning (RL) algorithms, which need to define a reward function. On the other hand, more recent approaches to imitation learning, such as Generative Adversarial Imitation Learning (GAIL), can train policies without explicitly requiring to define a reward function, allowing an agent to learn by trial and error directly on a training set of expert trajectories. In this work, we propose two variations of GAIL for autonomous navigation of a vehicle in the realistic CARLA simulation environment for urban scenarios. Both of them use the same network architecture, which process high dimensional image input from three frontal cameras, and other nine continuous inputs representing the velocity, the next point from the sparse trajectory and a high-level driving command. We show that both of them are capable of imitating the expert trajectory from start to end after training ends, but the GAIL loss function that is augmented with BC outperforms the former in terms of convergence time and training stability.
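The BC-augmented GAIL objective can be sketched as an adversarial term pushing the discriminator to label policy actions as expert, plus a regression to expert actions. Real GAIL optimizes the policy with an RL step that uses the discriminator as a reward; the directly differentiable form and placeholder networks below are simplifications.

```python
import torch
import torch.nn.functional as F

def gail_bc_policy_loss(discriminator, policy_actions, expert_actions,
                        states, lam=0.5):
    """Adversarial imitation signal plus a behavioral-cloning regression to
    expert actions, as in the BC-augmented GAIL variant described above."""
    d_out = discriminator(states, policy_actions)
    adv_term = F.binary_cross_entropy_with_logits(
        d_out, torch.ones_like(d_out))       # fool D into predicting 'expert'
    bc_term = F.mse_loss(policy_actions, expert_actions)
    return adv_term + lam * bc_term

# placeholder discriminator and batch (real ones would be neural networks)
disc = lambda s, a: s.sum(-1, keepdim=True) + a.sum(-1, keepdim=True)
states = torch.randn(8, 12)
policy_actions = torch.randn(8, 2, requires_grad=True)
expert_actions = torch.randn(8, 2)
print(gail_bc_policy_loss(disc, policy_actions, expert_actions, states))
```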
【81】 Visual-aware Attention Dual-stream Decoder for Video Captioning 标题:视频字幕的视觉感知注意双流解码器 链接:https://arxiv.org/abs/2110.08578
作者:Zhixin Sun,Xian Zhong,Shuqin Chen,Lin Li,Luo Zhong 机构: School of Computer and Artificial Intelligence, Wuhan University of Technology, Wuhan, China, Hubei Key Laboratory of Transportation Internet of Things, Wuhan University of Technology, Wuhan, China 摘要:视频字幕是一项具有挑战性的任务,需要捕捉不同的视觉部分并用句子加以描述,因为它要求视觉与语言的连贯性。当前视频字幕方法中的注意力机制学习为每一帧分配权重,从而动态地辅助解码器,但这可能无法显式地建模从序列帧中提取的视觉特征的相关性和时间一致性。为了生成语义连贯的句子,我们提出了一种新的视觉感知注意(VA)模型,该模型将时间序列帧的动态变化与前一时刻的单词拼接起来,作为注意力机制的输入来提取序列特征。此外,流行的方法在训练中广泛使用教师强制(TF)学习,即下一个标记是在先前的真实标记的条件下生成的,这会丢失先前已生成标记中的语义信息。因此,我们设计了一个自强制(SF)流,将前一标记概率分布中的语义信息作为输入,以增强当前标记。双流解码器(DD)架构统一了TF和SF两个流,为两个流生成句子以促进带标注的字幕生成。同时,利用双流解码器还缓解了TF学习中由训练与测试之间的差异导致的暴露偏差问题。在Microsoft视频描述(MSVD)语料库和MSR视频到文本(MSR-VTT)数据集上的实验研究证明了所提出的视觉感知注意双流解码器(VADD)的有效性。 摘要:Video captioning is a challenging task that captures different visual parts and describes them in sentences, for it requires visual and linguistic coherence. The attention mechanism in the current video captioning method learns to assign weight to each frame, promoting the decoder dynamically. This may not explicitly model the correlation and the temporal coherence of the visual features extracted in the sequence frames. To generate semantically coherent sentences, we propose a new Visual-aware Attention (VA) model, which concatenates dynamic changes of temporal sequence frames with the words at the previous moment, as the input of attention mechanism to extract sequence features. In addition, the prevalent approaches widely use the teacher-forcing (TF) learning during training, where the next token is generated conditioned on the previous ground-truth tokens. The semantic information in the previously generated tokens is lost. Therefore, we design a self-forcing (SF) stream that takes the semantic information in the probability distribution of the previous token as input to enhance the current token. The Dual-stream Decoder (DD) architecture unifies the TF and SF streams, generating sentences to promote the annotated captioning for both streams. Meanwhile, with the Dual-stream Decoder utilized, the exposure bias problem is alleviated, caused by the discrepancy between the training and testing in the TF learning. The effectiveness of the proposed Visual-aware Attention Dual-stream Decoder (VADD) is demonstrated through the result of experimental studies on the Microsoft video description (MSVD) corpus and MSR-Video to text (MSR-VTT) datasets.
【82】 DPNAS: Neural Architecture Search for Deep Learningwith Differential Privacy 标题:DPNAS:基于差分隐私的深度学习神经结构搜索 链接:https://arxiv.org/abs/2110.08557
作者:Anda Cheng,Jiaxing Wang,Xi Sheryl Zhang,Qiang Chen,Peisong Wang,Jian Cheng 机构:Institute of Automation 摘要:为了获得有意义的差分隐私(DP)保证而训练深度神经网络(DNN)会严重降低模型效用。在本文中,我们证明了在私有深度学习的背景下,DNN的架构对模型效用有显著影响,而其影响在以往研究中基本未被探索。鉴于这一缺失,我们提出了首个采用神经架构搜索为私有深度学习自动设计模型的框架,称为DPNAS。为了将私有学习与架构搜索相结合,我们精心设计了一个新的搜索空间,并提出了一种DP感知的候选模型训练方法。我们通过实验证实了所提框架的有效性。搜索得到的模型DPNASNet实现了最先进的隐私/效用权衡,例如,对于$(\epsilon,\delta)=(3,1\times10^{-5})$的隐私预算,我们的模型在MNIST上的测试精度为98.57%,在FashionMNIST上为88.09%,在CIFAR-10上为68.33%。此外,通过研究生成的架构,我们提供了若干关于设计利于私有学习的DNN的有趣发现,这可以为差分隐私下的深度学习模型设计提供新的思路。 摘要:Training deep neural networks (DNNs) for meaningful differential privacy (DP) guarantees severely degrades model utility. In this paper, we demonstrate that the architecture of DNNs has a significant impact on model utility in the context of private deep learning, whereas its effect is largely unexplored in previous studies. In light of this missing, we propose the very first framework that employs neural architecture search to automatic model design for private deep learning, dubbed as DPNAS. To integrate private learning with architecture search, we delicately design a novel search space and propose a DP-aware method for training candidate models. We empirically certify the effectiveness of the proposed framework. The searched model DPNASNet achieves state-of-the-art privacy/utility trade-offs, e.g., for the privacy budget of $(\epsilon, \delta)=(3, 1\times10^{-5})$, our model obtains test accuracy of $98.57\%$ on MNIST, $88.09\%$ on FashionMNIST, and $68.33\%$ on CIFAR-10. Furthermore, by studying the generated architectures, we provide several intriguing findings of designing private-learning-friendly DNNs, which can shed new light on model design for deep learning with differential privacy.
【83】 HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression 标题:HRKD:面向跨域语言模型压缩的层次关系知识抽取 链接:https://arxiv.org/abs/2110.08551
作者:Chenhe Dong,Yaliang Li,Ying Shen,Minghui Qiu 机构: Sun Yat-sen University , Alibaba Group 备注:EMNLP 2021 摘要:在许多自然语言处理任务中,与传统的神经网络方法相比,大型预训练语言模型(PLM)表现出了压倒性的性能。然而,其庞大的模型规模和较低的推理速度阻碍了在实际中资源受限设备上的部署。本文旨在通过知识蒸馏压缩PLM,并提出了一种分层关系知识蒸馏(HRKD)方法,以同时捕获分层和领域关系信息。具体来说,为了增强模型能力和可迁移性,我们借鉴元学习的思想,建立领域关系图来捕获不同领域之间的关系信息。为了动态地为每个领域选择最具代表性的原型,我们提出了一种分层比较-聚合机制来捕获分层关系。在公共多领域数据集上的大量实验证明了我们的HRKD方法的优越性能及其强大的少样本学习能力。为了便于复现,我们在https://github.com/cheneydon/hrkd发布了代码。 摘要:On many natural language processing tasks, large pre-trained language models (PLMs) have shown overwhelming performances compared with traditional neural network methods. Nevertheless, their huge model size and low inference speed have hindered the deployment on resource-limited devices in practice. In this paper, we target to compress PLMs with knowledge distillation, and propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain relational information. Specifically, to enhance the model capability and transferability, we leverage the idea of meta-learning and set up domain-relational graphs to capture the relational information across different domains. And to dynamically select the most representative prototypes for each domain, we propose a hierarchical compare-aggregate mechanism to capture hierarchical relationships. Extensive experiments on public multi-domain datasets demonstrate the superior performance of our HRKD method as well as its strong few-shot learning ability. For reproducibility, we release the code at https://github.com/cheneydon/hrkd.
【84】 Multimodal Dialogue Response Generation 标题:多模态对话响应生成 链接:https://arxiv.org/abs/2110.08515
作者:Qingfeng Sun,Yujing Wang,Can Xu,Kai Zheng,Yaming Yang,Huang Hu,Fei Xu,Jessica Zhang,Xiubo Geng,Daxin Jiang 机构:Microsoft STC Asia, Microsoft Research Asia 备注:This paper has been submitted before 15th October @ 11:59pm AOE(UTC -12) 摘要:以图像进行回复被认为是智能会话代理的一项重要能力。然而,现有的研究只关注依赖基于检索方法的多模态对话模型,而忽略了生成式方法。为了填补这一空白,我们首先提出了一个多模态对话生成模型,该模型以对话历史为输入,生成文本序列或图像作为回复。学习这样的模型通常需要同时包含文本和图像的多模态对话,而这类数据很难获得。受实践中这一挑战的启发,我们在只有有限训练样例可用的自然假设下考虑多模态对话生成。在这种低资源环境下,我们设计了一种新颖的会话代理Divter,以便将依赖多模态对话的参数从整个生成模型中分离出来。通过这种方式,模型的主要部分可以分别从大量纯文本对话和文本-图像对中学习,然后用有限的训练样例拟合全部参数。大量实验表明,我们的方法在自动和人工评估中都达到了最先进的结果,并能生成信息丰富的文本和高分辨率的图像回复。 摘要:Responding with images has been recognized as an important capability for an intelligent conversational agent. Yet existing works only focus on exploring the multimodal dialogue models which depend on retrieval-based methods, but neglecting generation methods. To fill in the gaps, we first present a multimodal dialogue generation model, which takes the dialogue history as input, then generates a textual sequence or an image as response. Learning such a model often requires multimodal dialogues containing both texts and images which are difficult to obtain. Motivated by the challenge in practice, we consider multimodal dialogue generation under a natural assumption that only limited training examples are available. In such a low-resource setting, we devise a novel conversational agent, Divter, in order to isolate parameters that depend on multimodal dialogues from the entire generation model. By this means, the major part of the model can be learned from a large number of text-only dialogues and text-image pairs respectively, then the whole parameters can be well fitted using the limited training examples. Extensive experiments demonstrate our method achieves state-of-the-art results in both automatic and human evaluation, and can generate informative text and high-resolution image responses.
【85】 AugmentedCode: Examining the Effects of Natural Language Resources in Code Retrieval Models 标题:AugmentedCode:检查自然语言资源在代码检索模型中的影响 链接:https://arxiv.org/abs/2110.08512
作者:Mehdi Bahrami,N. C. Shrikanth,Yuji Mizobuchi,Lei Liu,Masahiro Fukuyori,Wei-Peng Chen,Kazuki Munakata 机构:Fujitsu Research of America, Sunnyvale, CA, USA, North Carolina State University, Raleigh, NC, USA, Fujitsu Research Ltd., Kawasaki, Japan 备注:7 pages, 2 figures, 5 tables, 1 video 摘要:代码检索允许软件工程师通过自然语言查询来搜索代码,它同时依赖自然语言处理和软件工程技术。从搜索代码片段到搜索函数代码,已有多种代码检索方面的尝试。在本文中,我们介绍了增强代码(AugmentedCode)检索,它利用代码中已有的信息构造增强编程语言,以提高代码检索模型的性能。我们构建了一个大规模Python语料库,并展示了增强编程语言的框架和结果:其在CodeSearchNet和CodeBERT上的平均倒数排名(MRR)分别达到0.73和0.96。性能最优的微调增强代码检索模型已发布在HuggingFace:https://huggingface.co/Fujitsu/AugCode ,演示视频见:https://youtu.be/mnZrUTANjGs 。 摘要:Code retrieval is allowing software engineers to search codes through a natural language query, which relies on both natural language processing and software engineering techniques. There have been several attempts on code retrieval from searching snippet codes to function codes. In this paper, we introduce Augmented Code (AugmentedCode) retrieval which takes advantage of existing information within the code and constructs augmented programming language to improve the code retrieval models' performance. We curated a large corpus of Python and showcased the framework and the results of augmented programming language which outperforms on CodeSearchNet and CodeBERT with a Mean Reciprocal Rank (MRR) of 0.73 and 0.96, respectively. The best-performing fine-tuned augmented code retrieval model is published in HuggingFace at https://huggingface.co/Fujitsu/AugCode and a demonstration video is available at: https://youtu.be/mnZrUTANjGs .
【86】 What can we learn from universal Turing machines? 标题:我们能从万能图灵机学到什么? 链接:https://arxiv.org/abs/2110.08511
作者:Maurice Margenstern 备注:35 pages, 5 tables 摘要:在本论文中,我们构造了一个教学通用图灵机。我们试图理解哪些与生物现象的比较可以从它的编码和工作中推断出来。 摘要:In the present paper, we construct what we call a pedagogical universal Turing machine. We try to understand which comparisons with biological phenomena can be deduced from its encoding and from its working.
【87】 Lifelong Topological Visual Navigation 标题:终身拓扑视觉导航 链接:https://arxiv.org/abs/2110.08488
作者:Rey Reza Wiyatno,Anqi Xu,Liam Paull 机构:Montréal Robotics and Embodied AI Lab (REAL), DIRO, Université de Montréal 备注:Project page: this https URL 摘要:机器人仅使用视觉进行导航的能力因其简单而吸引人。传统的基于视觉的导航方法需要预先构建地图(这一步骤艰巨且容易失败),或者只能精确地重复先前执行过的轨迹。较新的基于学习的视觉导航技术减少了对地图的依赖,转而直接从图像输入中学习导航策略。目前有两种流行的范式:端到端方法完全放弃显式的地图表示;拓扑方法则仍保留空间的一些松散连接。然而,端到端方法往往难以胜任长距离导航任务,而基于拓扑图的解决方案则容易因图中的伪边而失败。在这项工作中,我们提出了一种基于学习的拓扑视觉导航方法,该方法带有图更新策略,可随时间推移提升终身导航性能。我们从基于采样的规划算法中获得灵感来构建基于图像的拓扑图,与基线方法相比,得到的图更稀疏,但导航性能更高。此外,与从固定训练环境中学习的控制器不同,我们表明我们的模型可以使用来自机器人实际部署环境的相对较小的数据集进行微调。我们还在真实世界部署中进一步评估了系统的性能。 摘要:The ability for a robot to navigate with only the use of vision is appealing due to its simplicity. Traditional vision-based navigation approaches required a prior map-building step that was arduous and prone to failure, or could only exactly follow previously executed trajectories. Newer learning-based visual navigation techniques reduce the reliance on a map and instead directly learn policies from image inputs for navigation. There are currently two prevalent paradigms: end-to-end approaches forego the explicit map representation entirely, and topological approaches which still preserve some loose connectivity of the space. However, while end-to-end methods tend to struggle in long-distance navigation tasks, topological map-based solutions are prone to failure due to spurious edges in the graph. In this work, we propose a learning-based topological visual navigation method with graph update strategies that improve lifelong navigation performance over time. We take inspiration from sampling-based planning algorithms to build image-based topological graphs, resulting in sparser graphs yet with higher navigation performance compared to baseline methods. Also, unlike controllers that learn from fixed training environments, we show that our model can be finetuned using a relatively small dataset from the real-world environment where the robot is deployed. We further assess performance of our system in real-world deployments.
【88】 Streaming Decision Trees and Forests 标题:流式决策树和林 链接:https://arxiv.org/abs/2110.08483
作者:Haoyin Xu,Jayanta Dey,Sambit Panda,Joshua T. Vogelstein 机构:Johns Hopkins University 摘要:机器学习成功地利用了现代数据,并为无数现实问题提供了计算解决方案,包括物理和生物医学发现。目前,估计器既可以处理所有样本一次性可用的场景,也可以处理需要持续更新的场景。然而,基于批处理决策树和随机森林(批处理数据任务中的主流方法)的流式算法仍有改进空间。在本文中,我们探索了用最简单的部分拟合算法来扩展批处理树,并在三个复杂度不同的分类任务上测试了我们的模型:流式决策树(SDT)和流式决策森林(SDF)。作为参照,实验中还包含了现有的流式树(Hoeffding树和Mondrian森林)和批处理估计器。在所有三项任务中,SDF始终能产生高精度,而现有估计器则遇到空间限制和精度波动。因此,我们的流式树和森林显示出进一步改进的巨大潜力,是解决分布漂移和迁移学习等问题的良好候选。 摘要:Machine learning has successfully leveraged modern data and provided computational solutions to innumerable real-world problems, including physical and biomedical discoveries. Currently, estimators could handle both scenarios with all samples available and situations requiring continuous updates. However, there is still room for improvement on streaming algorithms based on batch decision trees and random forests, which are the leading methods in batch data tasks. In this paper, we explore the simplest partial fitting algorithm to extend batch trees and test our models: stream decision tree (SDT) and stream decision forest (SDF) on three classification tasks of varying complexities. For reference, both existing streaming trees (Hoeffding trees and Mondrian forests) and batch estimators are included in the experiments. In all three tasks, SDF consistently produces high accuracy, whereas existing estimators encounter space restraints and accuracy fluctuations. Thus, our streaming trees and forests show great potential for further improvements, which are good candidates for solving problems like distribution drift and transfer learning.
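上文SDT/SDF的核心是"对批处理树做部分拟合"。下面给出一个极简的示意草图(缓存-重拟合策略与模型结构均为本文为演示所作的假设,并非论文的SDT实现):每当新数据块到达,就把它并入缓存并在全部已见数据上重新拟合一棵sklearn决策树,从而以最朴素的方式获得流式接口。

```python
# 示意性草图:用"缓存+重拟合"模拟流式决策树的partial_fit接口(非论文官方实现)
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class NaiveStreamTree:
    """每次partial_fit把新数据块并入缓存,并在全部已见数据上重新拟合整棵树。"""
    def __init__(self):
        self.tree = DecisionTreeClassifier()
        self._X, self._y = None, None

    def partial_fit(self, X, y):
        if self._X is None:
            self._X, self._y = X, y
        else:
            self._X = np.vstack([self._X, X])
            self._y = np.concatenate([self._y, y])
        self.tree.fit(self._X, self._y)
        return self

    def predict(self, X):
        return self.tree.predict(X)

# 用法:按数据块依次到达的方式训练
rng = np.random.default_rng(0)
model = NaiveStreamTree()
for _ in range(5):                       # 5个依次到达的数据块
    X = rng.normal(size=(100, 4))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    model.partial_fit(X, y)
print(model.predict(rng.normal(size=(3, 4))))
```

这种朴素做法的内存开销随数据线性增长,恰好对应摘要中提到的"空间限制"问题;论文中的SDT/SDF正是要在保持树模型精度的同时缓解这一权衡。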
【89】 Learning Cooperation and Online Planning Through Simulation and Graph Convolutional Network 标题:基于仿真和图卷积网络的学习协作与在线规划 链接:https://arxiv.org/abs/2110.08480
作者:Rafid Ameer Mahmud,Fahim Faisal,Saaduddin Mahmud,Md. Mosaddek Khan 机构:University of Dhaka, University of Massachusetts Amherst 摘要:多agent马尔可夫决策过程(Multi-agent Markov Decision Process,MMDP)是多agent协作环境下序列决策算法建模的有效方法。在这一领域,已经开发了许多基于集中和分散规划的算法。然而,动态变化的环境,加上指数大小的状态和联合行动空间,使得这些算法难以同时提供效率和可扩展性。最近,集中式规划算法FV-MCTS-MP和分散式规划算法ABC(交替最大化与行为克隆)在解决MMDP方面取得了显著的成绩。然而,它们分别无法适应动态变化的环境,以及无法处理智能体之间缺乏通信的情形。在此背景下,我们介绍了一种基于仿真的在线规划算法,我们称之为SiCLOP,用于多智能体协作环境。具体而言,SiCLOP定制蒙特卡罗树搜索(MCTS),并使用协调图(CG)和图卷积网络(GCN)学习协作,提供MMDP问题的实时解决方案。它还通过有效修剪动作空间来提高可伸缩性。此外,与FV-MCTS-MP和ABC不同,SiCLOP支持迁移学习,使学习到的智能体能够在不同的环境中运行。在多智能体环境下,我们还对算法的收敛性进行了理论讨论。最后,我们广泛的实证结果表明,SiCLOP显著优于最先进的在线规划算法。 摘要:Multi-agent Markov Decision Process (MMDP) has been an effective way of modelling sequential decision making algorithms for multi-agent cooperative environments. A number of algorithms based on centralized and decentralized planning have been developed in this domain. However, dynamically changing environment, coupled with exponential size of the state and joint action space, make it difficult for these algorithms to provide both efficiency and scalability. Recently, Centralized planning algorithm FV-MCTS-MP and decentralized planning algorithm Alternate Maximization with Behavioural Cloning (ABC) have achieved notable performance in solving MMDPs. However, they are not capable of adapting to dynamically changing environments and accounting for the lack of communication among agents, respectively. Against this background, we introduce a simulation based online planning algorithm, that we call SiCLOP, for multi-agent cooperative environments. Specifically, SiCLOP tailors Monte Carlo Tree Search (MCTS) and uses Coordination Graph (CG) and Graph Convolutional Network (GCN) to learn cooperation and provides real time solution of a MMDP problem. It also improves scalability through an effective pruning of action space. Additionally, unlike FV-MCTS-MP and ABC, SiCLOP supports transfer learning, which enables learned agents to operate in different environments. We also provide theoretical discussion about the convergence property of our algorithm within the context of multi-agent settings. Finally, our extensive empirical results show that SiCLOP significantly outperforms the state-of-the-art online planning algorithms.
【90】 Improving Compositional Generalization with Self-Training for Data-to-Text Generation 标题:通过自训练提高数据到文本生成的构图泛化能力 链接:https://arxiv.org/abs/2110.08467
作者:Sanket Vaibhav Mehta,Jinfeng Rao,Yi Tay,Mihir Kale,Ankur Parikh,Hongtao Zhong,Emma Strubell 机构:Carnegie Mellon University,Google,Google Research 备注:10 pages 摘要:数据到文本生成侧重于从结构化语义表示生成流畅的自然语言响应。这种表示是组合的,允许以各种方式组合原子意义图式,以表达自然语言中丰富的语义。最近,预训练语言模型(LMs)在数据到文本任务中取得了令人印象深刻的结果,尽管这些LMs在多大程度上能推广到新的语义表示尚不清楚。在这项工作中,我们系统地研究了当前最先进的生成模型在数据到文本任务中的组合泛化能力。通过在组合式Weather数据集中模拟结构变化,我们表明T5模型无法推广到未见过的结构。接下来,我们展示了基于模板的输入表示能极大提高模型性能,而单纯扩大模型规模并不能解决泛化不足的问题。为了进一步提高模型性能,我们提出了一种基于自训练的方法,利用微调后的BLEURT进行伪响应选择。在少样本Weather和多域SGD数据集上的大量实验表明,我们的方法带来了很强的增益。 摘要:Data-to-text generation focuses on generating fluent natural language responses from structured semantic representations. Such representations are compositional, allowing for the combination of atomic meaning schemata in various ways to express the rich semantics in natural language. Recently, pretrained language models (LMs) have achieved impressive results on data-to-text tasks, though it remains unclear the extent to which these LMs generalize to new semantic representations. In this work, we systematically study the compositional generalization of current state-of-the-art generation models in data-to-text tasks. By simulating structural shifts in the compositional Weather dataset, we show that T5 models fail to generalize to unseen structures. Next, we show that template-based input representations greatly improve the model performance and model scale does not trivially solve the lack of generalization. To further improve the model's performance, we propose an approach based on self-training using finetuned BLEURT for pseudo-response selection. Extensive experiments on the few-shot Weather and multi-domain SGD datasets demonstrate strong gains of our method.
【91】 Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining 标题:用快速采样和流水线加速图神经网络的训练和推理 链接:https://arxiv.org/abs/2110.08450
作者:Tim Kaler,Nickolas Stathas,Anne Ouyang,Alexandros-Stavros Iliopoulos,Tao B. Schardl,Charles E. Leiserson,Jie Chen 摘要:提高图神经网络(GNNs)的训练和推理性能面临着一个在一般神经网络中不常见的挑战:由于多跳图邻域沿网络层呈指数增长,创建小批量需要大量计算和数据移动。这种独特的挑战带来了一系列不同的系统设计选择。我们主张在分布式多GPU环境中使用邻域采样进行小批量训练,在此环境下,我们确定了开发人员迄今未充分探索的主要性能瓶颈:小批量的准备和传输。我们提出了一系列改进来缓解这些瓶颈,包括性能工程化的邻域采样器、共享内存并行化策略,以及将批量传输与GPU计算流水线化。我们还进行了实证分析,支持在推理中使用采样,表明测试精度并未受到实质影响。这一观察将训练和推理统一起来,简化了模型实现。我们报告了多个基准数据集和GNN体系结构上的综合实验结果,包括一个演示:对于ogbn-papers100M数据集,我们的系统SALIENT在单个GPU上相对标准PyTorch Geometric实现取得3倍加速,使用16个GPU可进一步获得8倍的并行加速。其中,使用采样扇出(15,10,5)训练一个3层GraphSAGE模型每个epoch需要2.0秒,使用扇出(20,20,20)进行推理需要2.4秒,测试精度达到64.58%。 摘要:Improving the training and inference performance of graph neural networks (GNNs) is faced with a challenge uncommon in general neural networks: creating mini-batches requires a lot of computation and data movement due to the exponential growth of multi-hop graph neighborhoods along network layers. Such a unique challenge gives rise to a diverse set of system design choices. We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment, under which we identify major performance bottlenecks hitherto under-explored by developers: mini-batch preparation and transfer. We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler, a shared-memory parallelization strategy, and the pipelining of batch transfer with GPU computation. We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised. Such an observation unifies training and inference, simplifying model implementation. We report comprehensive experimental results with several benchmark data sets and GNN architectures, including a demonstration that, for the ogbn-papers100M data set, our system SALIENT achieves a speedup of 3x over a standard PyTorch-Geometric implementation with a single GPU and a further 8x parallel speedup with 16 GPUs. Therein, training a 3-layer GraphSAGE model with sampling fanout (15, 10, 5) takes 2.0 seconds per epoch and inference with fanout (20, 20, 20) takes 2.4 seconds, attaining test accuracy 64.58%.
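作为补充,下面用一个玩具图演示摘要中"采样扇出(15,10,5)"的含义(假设使用PyTorch Geometric的NeighborLoader做邻域采样;图数据与超参均为演示设定,并非SALIENT的官方代码):

```python
# 示意性草图:邻域采样小批量的构造(非SALIENT官方实现)
import torch
from torch_geometric.data import Data
from torch_geometric.loader import NeighborLoader

# 玩具图:100个节点、500条随机边、16维节点特征
x = torch.randn(100, 16)
edge_index = torch.randint(0, 100, (2, 500))
data = Data(x=x, edge_index=edge_index)

loader = NeighborLoader(
    data,
    num_neighbors=[15, 10, 5],  # 3层,每一跳采样的邻居数上限,即扇出(15,10,5)
    batch_size=32,              # 每个小批量的种子节点数
)

for batch in loader:
    # batch是以种子节点为根的采样子图;SALIENT优化的正是这类小批量的
    # 准备与主机到GPU的传输,并把传输与GPU计算流水线化
    print(batch.num_nodes, batch.edge_index.shape)
    break
```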
【92】 Self-Annotated Training for Controllable Image Captioning 标题:用于可控图像字幕的自注解训练 链接:https://arxiv.org/abs/2110.08446
作者:Zhangzi Zhu,Tianlei Wang,Hong Qu 机构:University of Electronic Science and Technology of China 摘要:可控图像字幕(CIC)任务旨在根据指定的控制信号生成字幕。本文从两个方面对CIC进行了改进:1)现有的强化训练方法不适用于与结构相关的CIC模型,因为基于准确性的奖励主要关注内容而不是语义结构。由于缺乏强化训练,模型无法生成更准确、更可控的句子。为了解决上述问题,我们提出了一种新的结构相关CIC模型强化训练方法:自注释训练(SAT),其中设计了递归采样机制(RSM),强制输入控制信号与实际输出句子匹配。在MSCOCO上进行的大量实验表明,在长度控制任务中,我们的SAT方法将C-Transformer(XE)的CIDEr-D分数从118.6提高到130.1,在时态控制任务中从132.2提高到142.7,同时与控制信号保持99%以上的匹配精度。2)我们引入了一个新的控制信号:句子质量。配备它之后,CIC模型能够按需生成不同质量级别的字幕。实验表明,在不借助额外真实字幕信息的情况下,由最高句子质量水平控制的模型在准确性上比基线模型表现得更好。 摘要:The Controllable Image Captioning (CIC) task aims to generate captions conditioned on designated control signals. In this paper, we improve CIC from two aspects: 1) Existing reinforcement training methods are not applicable to structure-related CIC models due to the fact that the accuracy-based reward focuses mainly on contents rather than semantic structures. The lack of reinforcement training prevents the model from generating more accurate and controllable sentences. To solve the problem above, we propose a novel reinforcement training method for structure-related CIC models: Self-Annotated Training (SAT), where a recursive sampling mechanism (RSM) is designed to force the input control signal to match the actual output sentence. Extensive experiments conducted on MSCOCO show that our SAT method improves C-Transformer (XE) on CIDEr-D score from 118.6 to 130.1 in the length-control task and from 132.2 to 142.7 in the tense-control task, while maintaining more than 99% matching accuracy with the control signal. 2) We introduce a new control signal: sentence quality. Equipped with it, CIC models are able to generate captions of different quality levels as needed. Experiments show that without additional information of ground truth captions, models controlled by the highest level of sentence quality perform much better in accuracy than baseline models.
【93】 Unsupervised Natural Language Inference Using PHL Triplet Generation 标题:基于PHL三元组生成的无监督自然语言推理 链接:https://arxiv.org/abs/2110.08438
作者:Neeraj Varshney,Pratyay Banerjee,Tejas Gokhale,Chitta Baral 机构:Arizona State University 备注:9 pages, 2 figures, 8 tables 摘要:当在各自的训练数据集上进行训练时,基于Transformer的模型在各种自然语言推理(NLI)基准上取得了令人印象深刻的性能。但是,在某些情况下,可能无法获得训练样本,或者收集样本可能会耗费大量时间和资源。在这项工作中,我们解决了这一挑战,并对无监督NLI进行了探索性研究,这是一种没有人类注释训练样本的范式。我们在三种具有挑战性的设置下研究NLI:PH、P和NPH,它们在可用于学习的未标记数据的范围上不同。作为一种解决方案,我们提出了一种程序化数据生成方法,该方法利用一组句子变换来收集用于训练NLI模型的PHL(前提、假设、标签)三元组,而不需要人类注释的训练数据集。综合实验表明,该方法在PH、P、NPH设置下的准确率分别为66.75%、65.9%、65.39%,优于所有现有基线。此外,仅使用约0.1%的训练数据集(500个样本)对我们的模型进行微调,与在相同的500个实例上从头开始训练的模型相比,准确率提高了12.2%。 摘要:Transformer-based models have achieved impressive performance on various Natural Language Inference (NLI) benchmarks, when trained on respective training datasets. However, in certain cases, training samples may not be available or collecting them could be time-consuming and resource-intensive. In this work, we address this challenge and present an explorative study on unsupervised NLI, a paradigm in which no human-annotated training samples are available. We investigate NLI under three challenging settings: PH, P, and NPH that differ in the extent of unlabeled data available for learning. As a solution, we propose a procedural data generation approach that leverages a set of sentence transformations to collect PHL (Premise, Hypothesis, Label) triplets for training NLI models, bypassing the need for human-annotated training datasets. Comprehensive experiments show that this approach results in accuracies of 66.75%, 65.9%, 65.39% in PH, P, NPH settings respectively, outperforming all existing baselines. Furthermore, fine-tuning our models with as little as ~0.1% of the training dataset (500 samples) leads to 12.2% higher accuracy than the model trained from scratch on the same 500 instances.
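下面的草图演示"用句子变换程序化地生成PHL三元组"这一思路(仅含本文虚构的两条极简规则,远少于论文实际使用的变换集合):

```python
# 示意性草图:用简单句子变换生成PHL(前提、假设、标签)三元组(非论文完整实现)
NEGATIONS = {"is": "is not", "are": "are not", "can": "cannot"}

def make_entailment(premise):
    # 截断式变换:去掉末尾若干词,得到被前提蕴含的较弱假设
    words = premise.rstrip(".").split()
    return " ".join(words[: max(3, len(words) - 2)]) + "."

def make_contradiction(premise):
    # 否定式变换:把第一个系动词/情态动词取反,得到矛盾假设
    words = premise.rstrip(".").split()
    for i, w in enumerate(words):
        if w in NEGATIONS:
            words[i] = NEGATIONS[w]
            return " ".join(words) + "."
    return None  # 无法取反时放弃该样本

def generate_phl(premise):
    triplets = [(premise, make_entailment(premise), "entailment")]
    contradiction = make_contradiction(premise)
    if contradiction:
        triplets.append((premise, contradiction, "contradiction"))
    return triplets

for triplet in generate_phl("The dog is running in the park."):
    print(triplet)
```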
【94】 TorchEsegeta: Framework for Interpretability and Explainability of Image-based Deep Learning Models 标题:TorchEsegeta:基于图像的深度学习模型的可解释性和可解释性框架 链接:https://arxiv.org/abs/2110.08429
作者:Soumick Chatterjee,Arnab Das,Chirag Mandal,Budhaditya Mukhopadhyay,Manish Vipinraj,Aniruddh Shukla,Rajatha Nagaraja Rao,Chompunuch Sarasaen,Oliver Speck,Andreas Nürnberger 机构:Data and Knowledge Engineering Group, Otto von Guericke University Magdeburg, Germany, Biomedical Magnetic Resonance, Otto von Guericke University Magdeburg, Germany, Institute for Medical Engineering, Otto von Guericke University Magdeburg, Germany 摘要:临床医生通常非常怀疑在实践中应用自动图像处理方法,尤其是基于深度学习的方法。造成这种情况的一个主要原因是这些方法的黑箱性质,以及难以洞察自动推导出的决策这一固有问题。为了增加对这些方法的信任,本文通过描绘对算法决策影响最大的解剖区域,提出了有助于解释深度学习算法结果的方法。此外,本研究提出了一个统一的框架TorchEsegeta,用于将各种可解释性和可说明性技术应用于深度学习模型,为临床医生生成视觉解释和说明,以佐证其临床发现。这也将有助于增强对此类方法的信心。该框架建立在现有的可解释性和可说明性技术的基础上,这些技术目前主要关注分类模型,本框架将其扩展到分割任务。此外,这些方法已被适配到三维模型以进行体积分析。所提出的框架提供了使用不忠实度(infidelity)和敏感度(sensitivity)指标定量比较视觉解释的方法。数据科学家可以使用该框架对其模型进行事后解释和说明,开发更多可解释的工具,并向临床医生展示研究结果,以增强他们对此类模型的信任。所提出的框架基于一个用例场景进行评估,该场景中的血管分割模型是在人脑飞行时间(TOF)磁共振血管造影(MRA)图像上训练的。本文给出了不同模型和解释方法比较研究的定量与定性结果。此外,本文还对现有的几种可解释性和可说明性方法进行了广泛的概述。 摘要:Clinicians are often very sceptical about applying automatic image processing approaches, especially deep learning based methods, in practice. One main reason for this is the black-box nature of these approaches and the inherent problem of missing insights of the automatically derived decisions. In order to increase trust in these methods, this paper presents approaches that help to interpret and explain the results of deep learning algorithms by depicting the anatomical areas which influence the decision of the algorithm most. Moreover, this research presents a unified framework, TorchEsegeta, for applying various interpretability and explainability techniques for deep learning models and generate visual interpretations and explanations for clinicians to corroborate their clinical findings. In addition, this will aid in gaining confidence in such methods. The framework builds on existing interpretability and explainability techniques that are currently focusing on classification models, extending them to segmentation tasks. In addition, these methods have been adapted to 3D models for volumetric analysis. The proposed framework provides methods to quantitatively compare visual explanations using infidelity and sensitivity metrics. This framework can be used by data scientists to perform post-hoc interpretations and explanations of their models, develop more explainable tools and present the findings to clinicians to increase their faith in such models. The proposed framework was evaluated based on a use case scenario of vessel segmentation models trained on Time-of-Flight (TOF) Magnetic Resonance Angiogram (MRA) images of the human brain. Quantitative and qualitative results of a comparative study of different models and interpretability methods are presented. Furthermore, this paper provides an extensive overview of several existing interpretability and explainability methods.
【95】 Finding Backdoors to Integer Programs: A Monte Carlo Tree Search Framework 标题:寻找整数程序的后门:蒙特卡罗树搜索框架 链接:https://arxiv.org/abs/2110.08423
作者:Elias B. Khalil,Pashootan Vaezipoor,Bistra Dilkina 机构:Department of Mechanical, & Industrial Engineering, University of Toronto, Department of Computer Science, University of Southern California 摘要:在混合整数线性规划(MIP)中,(强)后门是实例整数变量的“小”子集,具有以下特性:在分支定界过程中,通过仅对后门中的变量进行分支,实例可以求解为全局最优。为广泛使用的MIP基准集或特定问题族构建预先计算的后门数据集,可以围绕MIP的新颖结构特性提出新的问题,或者解释为什么理论上困难的问题可以在实践中有效解决。现有的寻找后门的算法以各种方式依赖于对候选变量子集的抽样,这种方法已经证明了MIPLIB2003和MIPLIB2010中的一些实例存在后门。然而,由于勘探和开发之间的不平衡,这些算法无法在任务中持续成功。我们提出了BaMCTS,一个用于查找MIPs后门的蒙特卡罗树搜索框架。广泛的算法工程、与传统MIP概念的混合以及与CPLEX解算器的紧密集成使我们的方法在MIPLIB2017实例上的性能优于基线,从而更频繁、更高效地找到后门。 摘要:In Mixed Integer Linear Programming (MIP), a (strong) backdoor is a "small" subset of an instance's integer variables with the following property: in a branch-and-bound procedure, the instance can be solved to global optimality by branching only on the variables in the backdoor. Constructing datasets of pre-computed backdoors for widely used MIP benchmark sets or particular problem families can enable new questions around novel structural properties of a MIP, or explain why a problem that is hard in theory can be solved efficiently in practice. Existing algorithms for finding backdoors rely on sampling candidate variable subsets in various ways, an approach which has demonstrated the existence of backdoors for some instances from MIPLIB2003 and MIPLIB2010. However, these algorithms fall short of consistently succeeding at the task due to an imbalance between exploration and exploitation. We propose BaMCTS, a Monte Carlo Tree Search framework for finding backdoors to MIPs. Extensive algorithmic engineering, hybridization with traditional MIP concepts, and close integration with the CPLEX solver have enabled our method to outperform baselines on MIPLIB2017 instances, finding backdoors more frequently and more efficiently.
【96】 Information-Theoretic Measures of Dataset Difficulty 标题:数据集难度的信息论测度 链接:https://arxiv.org/abs/2110.08420
作者:Kawin Ethayarajh,Yejin Choi,Swabha Swayamdipta 机构:Stanford University♥, Allen Institute for Artificial Intelligence♣, Paul G. Allen School of Computer Science, University of Washington♦ 摘要:评估数据集的难度通常涉及将最先进的模型与人类进行比较;性能差距越大,数据集就被认为越难。这一框架不仅不正式,而且对于每个实例有多难、或者是什么属性使它对给定模型而言困难,几乎不提供任何理解。为了解决这些问题,我们提出了一种信息论视角,将数据集难度定义为可用信息($\textit{usable information}$)的缺失。测量可用信息与测量性能一样容易,但具有一定的理论优势。后者只允许我们在同一数据集上比较不同模型,前者还允许我们在同一模型下比较不同数据集。然后,我们引入逐点$\mathcal{V}$-信息($\textit{pointwise}$ $\mathcal{V}$-$\textit{information}$,PVI)来度量单个实例的难度,PVI越高的实例对模型$\mathcal{V}$越容易。通过在测量可用信息之前对输入进行操作,我们可以理解数据集为什么对给定模型容易或困难,并利用这一点发现了广泛使用的基准中的标注伪迹。 摘要:Estimating the difficulty of a dataset typically involves comparing state-of-the-art models to humans; the bigger the performance gap, the harder the dataset is said to be. Not only is this framework informal, but it also provides little understanding of how difficult each instance is, or what attributes make it difficult for a given model. To address these problems, we propose an information-theoretic perspective, framing dataset difficulty as the absence of $\textit{usable information}$. Measuring usable information is as easy as measuring performance, but has certain theoretical advantages. While the latter only allows us to compare different models w.r.t the same dataset, the former also allows us to compare different datasets w.r.t the same model. We then introduce $\textit{pointwise}$ $\mathcal{V}$-$\textit{information}$ (PVI) for measuring the difficulty of individual instances, where instances with higher PVI are easier for model $\mathcal{V}$. By manipulating the input before measuring usable information, we can understand $\textit{why}$ a dataset is easy or difficult for a given model, which we use to discover annotation artefacts in widely-used benchmarks.
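按论文的定义,单个实例的PVI可由"带输入微调的模型"与"空输入微调的模型"对金标签给出的概率直接算出。下面是一个最小示意(真实流程中这两个概率来自两个微调后的模型,这里用手工数值代替):

```python
# 示意性草图:逐点V-信息(PVI)的计算
# PVI(x -> y) = -log2 g0(y | 空输入) + log2 g1(y | x)
import math

def pvi(p_y_given_null, p_y_given_x):
    """p_y_given_null:空输入模型对金标签的概率;p_y_given_x:带输入模型对金标签的概率。"""
    return -math.log2(p_y_given_null) + math.log2(p_y_given_x)

# 例:空输入模型只能按标签先验猜到0.5,带输入的模型给金标签0.9
print(round(pvi(0.5, 0.9), 3))   # ≈0.848比特:该实例对模型V较容易
# 若带输入的模型反而更不确定,PVI为负,说明该实例对模型V很难
print(round(pvi(0.5, 0.3), 3))   # ≈-0.737比特
```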
【97】 Open Domain Question Answering over Virtual Documents: A Unified Approach for Data and Text 标题:虚拟文档上的开放领域问答:一种数据和文本的统一方法 链接:https://arxiv.org/abs/2110.08417
作者:Kaixin Ma,Hao Cheng,Xiaodong Liu,Eric Nyberg,Jianfeng Gao 机构:♣ Carnegie Mellon University ♠ Microsoft Research 摘要:由于其在数据和文本上具有通用接口的潜力,数据到文本生成最近变得越来越流行。然而,以前的工作很少关注其在下游任务中的应用,例如使用转换后的数据进行知识锚定(grounding)或推理。在这项工作中,我们的目标是弥合这一差距,并使用数据到文本的方法作为知识密集型应用(即开放域问答(QA))的结构化知识编码手段。具体来说,我们提出了一个用于数据和文本上的开放域QA的言语化-检索-阅读(verbalizer-retriever-reader)框架,其中来自Wikipedia的言语化表格和来自Wikidata的三元组被用作增强的知识源。我们表明,我们的统一数据和文本QA系统UDT-QA可以有效地受益于扩展的知识索引,从而比纯文本基线获得大幅提升。值得注意的是,我们的方法在Natural Questions上创造了单模型的最新纪录。此外,我们的分析表明,无论在适配还是热插拔设置下,言语化的知识都更适合用于答案推理。 摘要:Due to its potential for a universal interface over both data and text, data-to-text generation is becoming increasingly popular recently. However, little previous work has focused on its application to downstream tasks, e.g. using the converted data for grounding or reasoning. In this work, we aim to bridge this gap and use the data-to-text method as a means for encoding structured knowledge for knowledge-intensive applications, i.e. open-domain question answering (QA). Specifically, we propose a verbalizer-retriever-reader framework for open-domain QA over data and text where verbalized tables from Wikipedia and triples from Wikidata are used as augmented knowledge sources. We show that our Unified Data and Text QA, UDT-QA, can effectively benefit from the expanded knowledge index, leading to large gains over text-only baselines. Notably, our approach sets the single-model state-of-the-art on Natural Questions. Furthermore, our analyses indicate that verbalized knowledge is preferred for answer reasoning for both adapted and hot-swap settings.
【98】 Evaluating the Faithfulness of Importance Measures in NLP by Recursively Masking Allegedly Important Tokens and Retraining 标题:用递归掩蔽和再训练评估自然语言处理中重要性度量的可信性 链接:https://arxiv.org/abs/2110.08412
作者:Andreas Madsen,Nicholas Meade,Vaibhav Adlakha,Siva Reddy 机构: Mila – Quebec AI Institute, Polytechnique Montréal, McGill, Facebook CIFAR AI Chair 摘要:为了解释NLP模型,许多方法会指出哪些输入标记对预测很重要。然而,一个悬而未决的问题是,这些方法是否准确地反映了模型的逻辑,这一特性通常被称为忠实性。在这项工作中,我们改编并改进了Hooker等人(2019)最近提出的一个来自计算机视觉的忠实性基准,名为ROAR(移除和再训练)。我们通过递归移除数据集冗余来改进ROAR,否则这些冗余会干扰ROAR。我们将ROAR改编并应用于流行的NLP重要性度量,即注意力、梯度和积分梯度。此外,我们使用互信息作为额外的基线。评估是在一系列分类任务上进行的,这些任务常用于注意力忠实性的相关文献。最后,我们提出了一个标量忠实性度量,它可以方便地比较不同论文的结果。我们发现,被认为对计算机视觉任务不忠实的重要性度量在NLP任务中表现良好,重要性度量的忠实性依赖于任务,并且积分梯度的计算开销很少是值得的。 摘要:To explain NLP models, many methods inform which input tokens are important for a prediction. However, an open question is if these methods accurately reflect the model's logic, a property often called faithfulness. In this work, we adapt and improve a recently proposed faithfulness benchmark from computer vision called ROAR (RemOve And Retrain), by Hooker et al. (2019). We improve ROAR by recursively removing dataset redundancies, which otherwise interfere with ROAR. We adapt and apply ROAR to popular NLP importance measures, namely attention, gradient, and integrated gradients. Additionally, we use mutual information as an additional baseline. Evaluation is done on a suite of classification tasks often used in the faithfulness of attention literature. Finally, we propose a scalar faithfulness metric, which makes it easy to compare results across papers. We find that importance measures considered to be unfaithful for computer vision tasks perform favorably for NLP tasks, the faithfulness of an importance measure is task-dependent, and the computational overhead of integrated gradient is rarely justified.
【99】 sbp-env: A Python Package for Sampling-based Motion Planner and Samplers 标题:sbp-env:基于采样的运动规划器和采样器的Python包 链接:https://arxiv.org/abs/2110.08402
作者:Tin Lai 机构:School of Computer Science, The University of Sydney, Australia 备注:None 摘要:基于采样的运动规划器测试环境(sbp-env)是一个全功能框架,用于快速测试不同的基于采样的运动规划算法。sbp-env注重对框架各个方面进行灵活调整的能力,并将主要规划组件分为两类:(i)采样器和(ii)规划器。运动规划研究的重点主要集中在(i)提高采样效率(使用启发式或学习到的分布等方法)和(ii)规划器使用不同例程构建连通图的算法层面。因此,通过分离这两个组件,可以快速替换不同组件来测试新的想法。 摘要:Sampling-based motion planners' testing environment (sbp-env) is a full feature framework to quickly test different sampling-based algorithms for motion planning. sbp-env focuses on the flexibility of tinkering with different aspects of the framework, and has divided the main planning components into two categories (i) samplers and (ii) planners. The focus of motion planning research has been mainly on (i) improving the sampling efficiency (with methods such as heuristic or learned distribution) and (ii) the algorithmic aspect of the planner using different routines to build a connected graph. Therefore, by separating the two components one can quickly swap out different components to test novel ideas.
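为说明"采样器与规划器解耦"带来的可替换性,下面给出一个纯示意的接口草图(类名与接口均为本文虚构,并非sbp-env的真实API):

```python
# 示意性草图:采样器/规划器解耦的最小接口(非sbp-env真实API)
from abc import ABC, abstractmethod
import random

class Sampler(ABC):
    @abstractmethod
    def sample(self):
        """返回配置空间中的一个采样点。"""

class UniformSampler(Sampler):
    def __init__(self, low=0.0, high=1.0, dim=2):
        self.low, self.high, self.dim = low, high, dim

    def sample(self):
        return tuple(random.uniform(self.low, self.high) for _ in range(self.dim))

class Planner(ABC):
    def __init__(self, sampler: Sampler):
        self.sampler = sampler      # 规划器只依赖Sampler接口,可任意替换采样策略

    @abstractmethod
    def plan(self, n_iters: int):
        """构建连通图/树。"""

class NaiveRRT(Planner):
    def plan(self, n_iters: int):
        nodes = [(0.0, 0.0)]        # 起点
        for _ in range(n_iters):
            q = self.sampler.sample()
            nearest = min(nodes, key=lambda p: sum((a - b) ** 2 for a, b in zip(p, q)))
            # 从最近节点向采样点走一步(省略碰撞检测与步长上限)
            nodes.append(tuple(a + 0.5 * (b - a) for a, b in zip(nearest, q)))
        return nodes

print(len(NaiveRRT(UniformSampler()).plan(100)))   # 换用其他Sampler即可测试新采样思路
```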
【100】 Comparing Human and Machine Bias in Face Recognition 标题:人脸识别中人与机器偏差的比较 链接:https://arxiv.org/abs/2110.08396
作者:Samuel Dooley,Ryan Downing,George Wei,Nathan Shankar,Bradon Thymes,Gudrun Thorkelsdottir,Tiye Kurtz-Miott,Rachel Mattson,Olufemi Obiwumi,Valeriia Cherepanova,Micah Goldblum,John P Dickerson,Tom Goldstein 机构:University of Maryland, University of Massachusetts, Amherst, Pomona College, Howard University, University of California, San Diego, University of Georgia, Haverford College 摘要:最近的许多研究发现并讨论了面部分析技术中存在的严重偏差问题,发现了基于感知性别、皮肤类型、照明条件等因素的人群之间的表现差异。这些审计在测量算法偏差方面非常重要且成功,但面临两个主要挑战:(1)审计使用的是缺乏高质量元数据的面部识别数据集,如LFW和CelebA;(2)审计没有将观察到的算法偏差与人类对照的偏差进行比较。在本文中,我们发布了对LFW和CelebA数据集的改进,这将使未来的研究人员能够获得不受数据集重大缺陷(例如同一图像同时出现在图库和测试集中)影响的算法偏差测量值。我们还利用这些新数据开发了一系列具有挑战性的人脸识别和验证问题,并让各种算法和一个大型、均衡的人类评审员样本作答。我们发现,计算机模型和人类调查参与者都在验证任务上表现显著更好;在两项任务中,二者在深肤色或女性受试者上的准确率通常较低;并且当自身人口统计特征与题目匹配时,准确率更高。在这两项任务上,计算机模型的准确度高于调查参与者,并表现出与人类调查参与者程度相似的偏差。 摘要:Much recent research has uncovered and discussed serious concerns of bias in facial analysis technologies, finding performance disparities between groups of people based on perceived gender, skin type, lighting condition, etc. These audits are immensely important and successful at measuring algorithmic bias but have two major challenges: the audits (1) use facial recognition datasets which lack quality metadata, like LFW and CelebA, and (2) do not compare their observed algorithmic bias to the biases of their human alternatives. In this paper, we release improvements to the LFW and CelebA datasets which will enable future researchers to obtain measurements of algorithmic bias that are not tainted by major flaws in the dataset (e.g. identical images appearing in both the gallery and test set). We also use these new data to develop a series of challenging facial identification and verification questions that we administered to various algorithms and a large, balanced sample of human reviewers. We find that both computer models and human survey participants perform significantly better at the verification task, generally obtain lower accuracy rates on dark-skinned or female subjects for both tasks, and obtain higher accuracy rates when their demographics match that of the question. Computer models are observed to achieve a higher level of accuracy than the survey participants on both tasks and exhibit bias to similar degrees as the human survey participants.
【101】 A Bayesian Approach for Medical Inquiry and Disease Inference in Automated Differential Diagnosis 标题:自动鉴别诊断中医学查询和疾病推理的贝叶斯方法 链接:https://arxiv.org/abs/2110.08393
作者:Hong Guan,Chitta Baral 机构:Arizona State University 摘要:我们提出了一种用于医学查询和疾病推断的贝叶斯方法,这是鉴别诊断的两个主要阶段。与以往模拟给定概率数据并使用ML算法的工作不同,我们直接使用快速医学参考(QMR)信念网络,在推理阶段应用贝叶斯推理,在查询阶段应用贝叶斯实验设计。此外,我们通过将贝叶斯实验设计框架从一步搜索扩展到多步搜索来改进查询阶段。我们的方法具有一些实际优势,因为它是可解释的,不需要昂贵的训练,并且能够适应新的变化,而无需任何额外的努力。我们的实验表明,我们的方法在两个模拟数据集SymCAT和HPO上取得了最新的结果,在两个诊断对话数据集Muzhi和Dxy上取得了有竞争力的结果。 摘要:We propose a Bayesian approach for both medical inquiry and disease inference, the two major phases in differential diagnosis. Unlike previous work that simulates data from given probabilities and uses ML algorithms on them, we directly use the Quick Medical Reference (QMR) belief network, and apply Bayesian inference in the inference phase and Bayesian experimental design in the inquiry phase. Moreover, we improve the inquiry phase by extending the Bayesian experimental design framework from one-step search to multi-step search. Our approach has some practical advantages as it is interpretable, free of costly training, and able to adapt to new changes without any additional effort. Our experiments show that our approach achieves new state-of-the-art results on two simulated datasets, SymCAT and HPO, and competitive results on two diagnosis dialogue datasets, Muzhi and Dxy.
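摘要中"在查询阶段应用贝叶斯实验设计"的常见做法是:选择使疾病后验熵下降最多(期望信息增益最大)的症状来提问。下面用一个两疾病、两症状的玩具信念网演示一步搜索情形下的计算(数值均为虚构,并非QMR网络或论文实现):

```python
# 示意性草图:以期望信息增益选择下一个询问的症状(玩具版,非论文实现)
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

prior = {"flu": 0.6, "cold": 0.4}                      # 疾病先验(虚构)
p_symptom = {"fever":  {"flu": 0.9, "cold": 0.2},      # P(症状|疾病)(虚构)
             "sneeze": {"flu": 0.3, "cold": 0.8}}

def posterior(symptom, present):
    post = {}
    for d, p in prior.items():
        like = p_symptom[symptom][d] if present else 1 - p_symptom[symptom][d]
        post[d] = like * p
    z = sum(post.values())
    return {d: v / z for d, v in post.items()}

def expected_info_gain(symptom):
    p_yes = sum(p_symptom[symptom][d] * prior[d] for d in prior)
    h_post = (p_yes * entropy(posterior(symptom, True).values())
              + (1 - p_yes) * entropy(posterior(symptom, False).values()))
    return entropy(prior.values()) - h_post

# 询问阶段:挑期望信息增益最大的症状提问;论文的多步搜索则对问答序列做同样的展开
print({s: round(expected_info_gain(s), 3) for s in p_symptom})
print("ask:", max(p_symptom, key=expected_info_gain))
```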
【102】 A Neural Network Ensemble Approach to System Identification 标题:一种用于系统辨识的神经网络集成方法 链接:https://arxiv.org/abs/2110.08382
作者:Elisa Negrini,Giovanna Citti,Luca Capogna 机构:Worcester Polytechnic Institute, USA; Department of Mathematics, University of Bologna 摘要:我们提出了一种从轨迹数据中学习未知控制方程的新算法,该算法使用神经网络集成。给定未知动力系统$\dot{x}(t)=f(t,x(t))$的解的样本$x(t)$,我们使用神经网络集成来近似函数$f$。我们以积分形式表示方程,并使用欧拉法预测每个连续时间步的解,在每次迭代中使用不同的神经网络作为$f$的先验。这个过程产生M-1个与时间无关的网络,其中M是观测到$x(t)$的时间步数。最后,我们通过神经网络插值得到单个函数$f(t,x(t))$。在我们以前的工作中,我们对数据做数值微分,并将其作为Lipschitz正则化神经网络逼近$f$的目标;与之不同,我们的新方法避免了数值微分,因为后者在噪声存在时是不稳定的。我们在数据有噪声和无噪声的多个例子上测试了新算法。我们的经验表明,在损失函数中添加Lipschitz正则项可以改善控制方程的泛化和恢复,并且该方法改进了我们以前的方法,特别是在存在噪声、数值微分只能提供低质量目标数据的情况下。最后,我们将我们的结果与Raissi等人arXiv:1801.01236(2018)提出的方法以及SINDy进行比较。 摘要:We present a new algorithm for learning unknown governing equations from trajectory data, using an ensemble of neural networks. Given samples of solutions $x(t)$ to an unknown dynamical system $\dot{x}(t)=f(t,x(t))$, we approximate the function $f$ using an ensemble of neural networks. We express the equation in integral form and use Euler method to predict the solution at every successive time step using at each iteration a different neural network as a prior for $f$. This procedure yields M-1 time-independent networks, where M is the number of time steps at which $x(t)$ is observed. Finally, we obtain a single function $f(t,x(t))$ by neural network interpolation. Unlike our earlier work, where we numerically computed the derivatives of data, and used them as target in a Lipschitz regularized neural network to approximate $f$, our new method avoids numerical differentiations, which are unstable in presence of noise. We test the new algorithm on multiple examples both with and without noise in the data. We empirically show that generalization and recovery of the governing equation improve by adding a Lipschitz regularization term in our loss function and that this method improves our previous one especially in presence of noise, when numerical differentiation provides low quality target data. Finally, we compare our results with the method proposed by Raissi, et al. arXiv:1801.01236 (2018) and with SINDy.
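下面按摘要描述的流程给出一个可运行的小草图:对一条玩具轨迹,为每个时间步配一个小网络$f_k$,用欧拉法$x_{k+1}\approx x_k+\Delta t\,f_k(x_k)$构造损失,从而避免对数据做数值微分(网络结构与超参为本文假设,且略去了论文中的Lipschitz正则项与最终的网络插值步骤):

```python
# 示意性草图:每个时间步一个网络 + 欧拉积分,学习未知的f(非论文官方代码)
import torch
import torch.nn as nn

torch.manual_seed(0)
dt, M = 0.1, 20
t = torch.arange(M) * dt
x = torch.exp(-t).unsqueeze(1)             # 玩具轨迹:dx/dt = -x 的解

nets = [nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))
        for _ in range(M - 1)]             # M-1个与时间步一一对应的网络
params = [p for net in nets for p in net.parameters()]
opt = torch.optim.Adam(params, lr=1e-2)

for epoch in range(2000):
    opt.zero_grad()
    loss = 0.0
    for k in range(M - 1):
        # 欧拉法:x_{k+1} ≈ x_k + dt * f_k(x_k),无需对带噪数据求数值导数
        pred = x[k] + dt * nets[k](x[k])
        loss = loss + (pred - x[k + 1]).pow(2).sum()
    loss.backward()
    opt.step()

print(float(nets[0](x[0])))   # 应接近f(x_0) = -x_0 = -1(欧拉离散化带来少量偏差)
```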
【103】 Starkit: RoboCup Humanoid KidSize 2021 Worldwide Champion Team Paper 标题:Starkit:RoboCup人形KidSize 2021世界冠军团体论文 链接:https://arxiv.org/abs/2110.08377
作者:Egor Davydenko,Ivan Khokhlov,Vladimir Litvinenko,Ilya Ryakin,Ilya Osokin,Azer Babaev 机构:Team Starkit, Moscow Institute of Physics and Technology, Russia 备注:15 pages, 10 figures 摘要:本文致力于介绍在悉尼RoboCup 2019和全球RoboCup 2021之间正在开发的功能。这些特征包括与视觉相关的问题,如检测和定位、机械和算法创新。由于比赛是虚拟举行的,本文还考虑了模拟的具体特点。我们概述了已经尝试过的方法,并分析了它们的前提条件、观点和性能评估。 摘要:This article is devoted to the features that were under development between RoboCup 2019 Sydney and RoboCup 2021 Worldwide. These features include vision-related matters, such as detection and localization, mechanical and algorithmic novelties. Since the competition was held virtually, the simulation-specific features are also considered in the article. We give an overview of the approaches that were tried out along with the analysis of their preconditions, perspectives and the evaluation of their performance.
【104】 Revisiting Popularity and Demographic Biases in Recommender Evaluation and Effectiveness 标题:再论推荐人评价和有效性中的人气和人口偏向 链接:https://arxiv.org/abs/2110.08353
作者:Nicola Neophytou,Bhaskar Mitra,Catherine Stinson 机构:The University of Manchester, Manchester, UK; Microsoft, Montréal, Quebec, Canada; School of Computing, Queen's University, Kingston, ON, Canada 摘要:推荐算法容易受到流行偏差的影响:即使流行项目不能满足用户需求,也会倾向于推荐它们。一个相关的问题是,推荐质量可能因人口统计组而异。与其他用户相比,边缘化群体或在训练数据中代表性不足的群体从这些算法中获得的推荐可能相关性更低。在最近的一项研究中,Ekstrand等人调查了推荐器的表现如何随受欢迎程度和人口统计学特征而变化,并发现在两个数据集上,二元性别之间的推荐效用在统计学上存在显著差异,在一个数据集上,年龄存在显著影响。在这里,我们复现了这些结果,并通过额外的分析对其进行扩展。我们发现在年龄和性别方面,推荐器的表现均存在统计学上的显著差异。我们观察到,老年用户的推荐效用稳步下降,女性的推荐效用低于男性。我们还发现,对于来自数据集中更具代表性的国家的用户,效用更高。此外,我们发现,总使用量和所消费内容的受欢迎程度是推荐器绩效的有力预测因素,并且它们在人口统计学组之间也存在显著差异。 摘要:Recommendation algorithms are susceptible to popularity bias: a tendency to recommend popular items even when they fail to meet user needs. A related issue is that the recommendation quality can vary by demographic groups. Marginalized groups or groups that are under-represented in the training data may receive less relevant recommendations from these algorithms compared to others. In a recent study, Ekstrand et al. investigate how recommender performance varies according to popularity and demographics, and find statistically significant differences in recommendation utility between binary genders on two datasets, and significant effects based on age on one dataset. Here we reproduce those results and extend them with additional analyses. We find statistically significant differences in recommender performance by both age and gender. We observe that recommendation utility steadily degrades for older users, and is lower for women than men. We also find that the utility is higher for users from countries with more representation in the dataset. In addition, we find that total usage and the popularity of consumed content are strong predictors of recommender performance and also vary significantly across demographic groups.
【105】 Towards Transparent Interactive Semantic Parsing via Step-by-Step Correction 标题:基于分步纠错的透明交互式语义分析 链接:https://arxiv.org/abs/2110.08345
作者:Lingbo Mo,Ashley Lewis,Huan Sun,Michael White 机构:The Ohio State University 摘要:现有的语义分析研究主要集中在将自然语言话语一次性映射到相应的逻辑形式上。然而,由于自然语言可能包含大量的歧义和可变性,这是一个困难的挑战。在这项工作中,我们研究了一个交互式语义分析框架,该框架以自然语言一步一步地解释预测的逻辑形式,并使用户能够通过自然语言反馈对各个步骤进行更正。我们将重点放在知识库问答(KBQA)上,作为我们框架的一个实例,旨在提高解析过程的透明度,并帮助用户适当地信任最终答案。为此,我们构建了INSPIRED,一个由ComplexWebQuestions数据集衍生的众包对话数据集。我们的实验表明,具有人类反馈的交互式框架有可能极大地提高整体解析精度。此外,我们开发了一个对话模拟管道,以便在无需进一步众包工作的情况下,针对各种最先进的KBQA模型评估我们的框架。结果表明,我们的交互式语义分析框架有望对这类模型普遍有效。 摘要:Existing studies on semantic parsing focus primarily on mapping a natural-language utterance to a corresponding logical form in one turn. However, because natural language can contain a great deal of ambiguity and variability, this is a difficult challenge. In this work, we investigate an interactive semantic parsing framework that explains the predicted logical form step by step in natural language and enables the user to make corrections through natural-language feedback for individual steps. We focus on question answering over knowledge bases (KBQA) as an instantiation of our framework, aiming to increase the transparency of the parsing process and help the user appropriately trust the final answer. To do so, we construct INSPIRED, a crowdsourced dialogue dataset derived from the ComplexWebQuestions dataset. Our experiments show that the interactive framework with human feedback has the potential to greatly improve overall parse accuracy. Furthermore, we develop a pipeline for dialogue simulation to evaluate our framework w.r.t. a variety of state-of-the-art KBQA models without involving further crowdsourcing effort. The results demonstrate that our interactive semantic parsing framework promises to be effective across such models.
【106】 HyperSeed: Unsupervised Learning with Vector Symbolic Architectures 标题:HyperSeed:基于矢量符号体系结构的无监督学习 链接:https://arxiv.org/abs/2110.08343
作者:Evgeny Osipov,Sachin Kahawala,Dilantha Haputhanthri,Thimal Kempitiya,Daswin De Silva,Damminda Alahakoon,Denis Kleyko 机构:Centre for Data Analytics and Cognition, La Trobe University 摘要:受生物神经形态硬件最新创新的启发,本文提出了一种新的无监督机器学习方法Hyperseed,该方法利用向量符号体系结构(VSA)快速学习未标记数据的拓扑保持特征映射。它依赖于VSA的两个主要能力:绑定操作和叠加计算。在本文中,我们介绍了在傅里叶全息约化表示(FHRR)VSA模型中表达的Hyperseed算法部分,该模型特别适合在尖峰神经形态硬件上实现。Hyperseed算法有两个独特的新颖之处:1)仅从少量输入数据样本进行学习;2)基于单个向量运算的学习规则。这些特性在合成数据集以及示例性基准用例(IRIS分类和使用n-gram统计的语言识别任务)上得到了演示。 摘要:Motivated by recent innovations in biologically-inspired neuromorphic hardware, this paper presents a novel unsupervised machine learning approach named Hyperseed that leverages Vector Symbolic Architectures (VSA) for fast learning a topology preserving feature map of unlabelled data. It relies on two major capabilities of VSAs: the binding operation and computing in superposition. In this paper, we introduce the algorithmic part of Hyperseed expressed within Fourier Holographic Reduced Representations VSA model, which is specifically suited for implementation on spiking neuromorphic hardware. The two distinctive novelties of the Hyperseed algorithm are: 1) Learning from only few input data samples and 2) A learning rule based on a single vector operation. These properties are demonstrated on synthetic datasets as well as on illustrative benchmark use-cases, IRIS classification and a language identification task using n-gram statistics.
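摘要中提到的两个VSA核心能力(绑定与叠加计算)在FHRR模型中分别对应单位模复向量的逐元素相乘与相加。下面是一个最小演示(维度与数值均为演示设定,非论文代码):

```python
# 示意性草图:FHRR VSA的绑定/解绑与叠加(维度与数值为演示设定)
import numpy as np

rng = np.random.default_rng(0)
d = 1024

def random_fhrr(dim):
    return np.exp(1j * rng.uniform(-np.pi, np.pi, dim))   # 随机相位的单位模复向量

def bind(a, b):        # 绑定:逐元素复数乘法(相位相加)
    return a * b

def unbind(c, a):      # 解绑:乘以共轭向量
    return c * np.conj(a)

def sim(a, b):         # 相似度:归一化的实部内积
    return float(np.real(np.vdot(a, b)) / len(a))

x, y, z = random_fhrr(d), random_fhrr(d), random_fhrr(d)
s = bind(x, y) + bind(z, z)                # 把两个绑定对叠加到同一向量
print(round(sim(unbind(s, x), y), 2))      # ≈1:从叠加中解绑恢复出y
print(round(sim(unbind(s, x), z), 2))      # ≈0:与无关向量近似正交
```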
【107】 Control Prefixes for Text Generation 标题:用于文本生成的控件前缀 链接:https://arxiv.org/abs/2110.08329
作者:Jordan Clive,Kris Cao,Marek Rei 机构:Imperial College London, DeepMind, London, UK 摘要:提示学习方法通过使用特定于任务的提示和输入,使预先训练的语言模型适应下游应用程序。当前关于文本生成中提示学习的大多数工作都依赖于数据集中所有示例共享的数据集级提示。我们扩展了这种方法,并提出了一种动态方法,即控制前缀(Control Prefixes),它允许在每个提示中包含依赖条件输入的信息。控制前缀位于提示学习和受控生成的交叉点,使模型能够在文本生成期间具有更细粒度的控制。该方法将属性级可学习表示合并到预先训练的Transformer的不同层中,允许生成的文本在特定方向上被引导。我们对该技术进行了系统评估,并将其应用于自然语言生成(NLG)GEM基准中的五个数据集。我们展示了几个数据到文本数据集上的最新结果,包括WebNLG。 摘要:Prompt learning methods adapt pre-trained language models to downstream applications by using a task-specific prompt together with the input. Most of the current work on prompt learning in text generation relies on a shared dataset-level prompt for all examples in the dataset. We extend this approach and propose a dynamic method, Control Prefixes, which allows for the inclusion of conditional input-dependent information in each prompt. Control Prefixes is at the intersection of prompt learning and controlled generation, empowering the model to have finer-grained control during text generation. The method incorporates attribute-level learnable representations into different layers of a pre-trained transformer, allowing for the generated text to be guided in a particular direction. We provide a systematic evaluation of the technique and apply it to five datasets from the GEM benchmark for natural language generation (NLG). We present state-of-the-art results on several data-to-text datasets, including WebNLG.
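下面给出一个高度简化的草图来说明"属性相关的可学习前缀"这一思想:按样本的控制属性查表得到前缀向量并拼到输入序列之前。注意这是本文的示意结构,论文的做法更精细(把属性级表示注入预训练Transformer的不同层):

```python
# 示意性草图:按控制属性查表的可学习前缀(简化结构,非论文完整方法)
import torch
import torch.nn as nn

class ControlPrefix(nn.Module):
    def __init__(self, n_attr_values, prefix_len, d_model):
        super().__init__()
        # 每个属性取值对应一段长为prefix_len的可学习前缀
        self.prefix = nn.Embedding(n_attr_values, prefix_len * d_model)
        self.prefix_len, self.d_model = prefix_len, d_model

    def forward(self, token_emb, attr_id):
        # token_emb: (B, T, d);attr_id: (B,) 每个样本的控制属性编号
        p = self.prefix(attr_id).view(-1, self.prefix_len, self.d_model)
        return torch.cat([p, token_emb], dim=1)   # 前缀在前,原序列在后

module = ControlPrefix(n_attr_values=3, prefix_len=4, d_model=8)
token_emb = torch.randn(2, 5, 8)                  # 批大小2、序列长5
out = module(token_emb, torch.tensor([0, 2]))     # 两个样本使用不同的控制属性
print(out.shape)                                  # torch.Size([2, 9, 8])
```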
【108】 Robustness of different loss functions and their impact on networks learning capability 标题:不同损失函数的鲁棒性及其对网络学习能力的影响 链接:https://arxiv.org/abs/2110.08322
作者:Vishal Rajput 机构:Computer Science Department, KU Leuven, Belgium 摘要:人工智能的最新发展使其无处不在,每个行业都试图采用某种形式的智能数据处理。尽管该领域取得了如此多的进步,AI的全部能力仍有待行业发掘。由于对这类自主系统缺乏信任,涉及风险因素的行业仍然对人工智能的使用持谨慎态度。当今的人工智能在很多方面可能都很好,但在推理方面却非常糟糕,人工智能的这种行为可能导致灾难性的结果。自动驾驶汽车撞人或无人机卡在树上,就是人工智能决策导致灾难性后果的几个例子。为了深入了解和解释人工智能的学习能力,我们将尝试分析损失函数的工作原理。在我们的例子中,我们将使用两组损失函数:广义损失函数(如二元交叉熵,即BCE)和专门损失函数(如Dice损失或focal损失)。通过一系列实验,我们将确定组合不同的损失函数是否比使用单个损失函数更好,如果是,那么背后的原因是什么。为了确定广义损失和专门损失之间的差异,我们将使用上述损失训练若干模型,然后比较它们在对抗样本上的鲁棒性。特别是,我们将观察当我们改变与最显著梯度对应的像素时,不同模型的精度下降的速度。 摘要:Recent developments in AI have made it ubiquitous, every industry is trying to adopt some form of intelligent processing of their data. Despite so many advances in the field, AI's full capability is yet to be exploited by the industry. Industries that involve some risk factors still remain cautious about the usage of AI due to the lack of trust in such autonomous systems. Present-day AI might be very good at a lot of things but it is very bad at reasoning and this behavior of AI can lead to catastrophic results. Autonomous cars crashing into a person or a drone getting stuck in a tree are a few examples where AI decisions lead to catastrophic results. To develop insight and generate an explanation about the learning capability of AI, we will try to analyze the working of loss functions. For our case, we will use two sets of loss functions, generalized loss functions like Binary cross-entropy or BCE and specialized loss functions like Dice loss or focal loss. Through a series of experiments, we will establish whether combining different loss functions is better than using a single loss function and if yes, then what is the reason behind it. In order to establish the difference between generalized loss and specialized losses, we will train several models using the above-mentioned losses and then compare their robustness on adversarial examples. In particular, we will look at how fast the accuracy of different models decreases when we change the pixels corresponding to the most salient gradients.
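为直观起见,下面给出"广义损失(BCE)与专门损失(Dice)线性组合"的一个常见写法(权重与平滑项为假设值,仅示意组合方式,并非论文的实验配置):

```python
# 示意性草图:BCE与Dice损失的线性组合(权重为假设)
import torch
import torch.nn.functional as F

def dice_loss(logits, target, eps=1.0):
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum()
    return 1 - (2 * inter + eps) / (prob.sum() + target.sum() + eps)

def combined_loss(logits, target, w_bce=0.5, w_dice=0.5):
    bce = F.binary_cross_entropy_with_logits(logits, target)
    return w_bce * bce + w_dice * dice_loss(logits, target)

logits = torch.randn(2, 1, 8, 8, requires_grad=True)   # 玩具分割输出
target = torch.randint(0, 2, (2, 1, 8, 8)).float()     # 玩具二值掩码
loss = combined_loss(logits, target)
loss.backward()
print(float(loss))
```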
【109】 Dynamic probabilistic logic models for effective abstractions in RL 标题:RL中有效抽象的动态概率逻辑模型 链接:https://arxiv.org/abs/2110.08318
作者:Harsha Kokel,Arjun Manoharan,Sriraam Natarajan,Balaraman Ravindran,Prasad Tadepalli 机构:The University of Texas at Dallas, Robert Bosch Centre for Data Science and Artificial Intelligence at Indian Institute of Technology Madras, Oregon State University 备注:Accepted at StarAI 2021 (held in conjunction with IJCLR 2021) 摘要:在复杂的强化学习环境中,状态抽象可以实现样本高效学习和更好的任务转移。最近,我们提出了RePReL(Kokel et al.2021),这是一个分层框架,利用关系规划器为学习提供有用的状态抽象。我们简要概述了这个框架,并使用动态概率逻辑模型来设计这些状态抽象。我们的实验表明,RePReL不仅在手头的任务上获得了更好的性能和有效的学习,而且对看不见的任务也有更好的泛化能力。 摘要:State abstraction enables sample-efficient learning and better task transfer in complex reinforcement learning environments. Recently, we proposed RePReL (Kokel et al. 2021), a hierarchical framework that leverages a relational planner to provide useful state abstractions for learning. We present a brief overview of this framework and the use of a dynamic probabilistic logic model to design these state abstractions. Our experiments show that RePReL not only achieves better performance and efficient learning on the task at hand but also demonstrates better generalization to unseen tasks.
【110】 GrowSpace: Learning How to Shape Plants 标题:GrowSpace:学习如何塑造植物 链接:https://arxiv.org/abs/2110.08307
作者:Yasmeen Hitti,Ionelia Buzatu,Manuel Del Verme,Mark Lefsrud,Florian Golemo,Audrey Durand 机构:McGill University, Mila, Johannes Kepler Universität Linz, Université de Montréal, Mila, Element AI, Université Laval, Mila 摘要:植物是动态系统,是我们生存和延续不可或缺的组成部分。植物面临环境变化,并随着时间的推移适应周围的条件。我们认为,植物对环境刺激的反应是现实世界问题的一个很好的例子,可以在强化学习(RL)框架内处理。为了通过移动光源来控制植物,我们提出GrowSpace作为新的RL基准。模拟器的后端使用空间殖民算法实现,这是一种基于空间竞争的植物生长模型。与视频游戏RL环境相比,该模拟器解决的是一个真实世界问题,并作为一个试验台,以比物理实验更快的方式可视化植物的生长和运动。GrowSpace由一系列挑战组成,这些挑战涉及控制、多阶段学习、公平性和多目标学习等问题。我们提供了智能体基线和案例研究,以展示所提基准的难度。 摘要:Plants are dynamic systems that are integral to our existence and survival. Plants face environment changes and adapt over time to their surrounding conditions. We argue that plant responses to an environmental stimulus are a good example of a real-world problem that can be approached within a reinforcement learning (RL) framework. With the objective of controlling a plant by moving the light source, we propose GrowSpace, as a new RL benchmark. The back-end of the simulator is implemented using the Space Colonisation Algorithm, a plant growing model based on competition for space. Compared to video game RL environments, this simulator addresses a real-world problem and serves as a test bed to visualize plant growth and movement in a faster way than physical experiments. GrowSpace is composed of a suite of challenges that tackle several problems such as control, multi-stage learning, fairness and multi-objective learning. We provide agent baselines alongside case studies to demonstrate the difficulty of the proposed benchmark.
【111】 When Combating Hype, Proceed with Caution 标题:在打击炒作时,请谨慎行事。 链接:https://arxiv.org/abs/2110.08300
作者:Samuel R. Bowman 机构:New York University 摘要:为了避免强化关于最先进语言技术能力的广泛宣传,研究人员开发了框架和引用的实践,这些实践有助于淡化该领域的成功。尽管这些做法的用意很好,但它们往往会误导甚至错误地宣称我们的最佳技术的局限性。这是一个问题,而且可能比看上去更严重:它限制了我们减轻NLP部署带来的短期危害的能力,也限制了我们为更遥远的未来进步带来的潜在巨大影响做好准备的能力。本文敦促研究人员对这些说法保持谨慎,并提出一些研究方向和沟通策略,以便于避免或反驳这些说法。 摘要:In an effort to avoid reinforcing widespread hype about the capabilities of state-of-the-art language technology, researchers have developed practices in framing and citation that serve to deemphasize the field's successes. Though well-meaning, these practices often yield misleading or even false claims about the limits of our best technology. This is a problem, and it may be more serious than it looks: It limits our ability to mitigate short-term harms from NLP deployments and it limits our ability to prepare for the potentially enormous impacts of more distant future advances. This paper urges researchers to be careful about these claims and suggests some research directions and communication strategies that will make it easier to avoid or rebut them.
【112】 Explainable Student Performance Prediction With Personalized Attention for Explaining Why A Student Fails 标题:用个性化注意解释学生失败原因的可解释学生表现预测 链接:https://arxiv.org/abs/2110.08268
作者:Kun Niu,Xipeng Cao,Yicong Yu 机构:School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 备注:AAAI 2021 Workshop on AI Education/TIPCE 2021 摘要:随着高等教育中学生不及格率的不断增加,预测学生下学期的表现已成为一项重要需求。个性化的学生表现预测有助于教育工作者全面了解学生状况,并提前进行有效干预。然而,现有工作几乎不考虑学生表现预测的可解释性,而这正是教育工作者最关心的问题。在本文中,我们提出了一种新的基于个性化注意的可解释学生成绩预测方法(ESPA),该方法利用学生档案中的关系和相关课程的先验知识。所设计的双向长短时记忆(BiLSTM)结构提取特定模式路径中的语义信息。为了利用相似路径的内部关系,我们提出了一种局部和全局层面的注意机制,以区分不同学生或课程对预测的影响。因此,有效的路径推理可以用来预测学生的表现。ESPA在学生成绩预测方面始终优于其他最先进的模型,且结果可以直观地解释。这项工作可以帮助教育工作者更好地理解行为对学生学习的不同影响。 摘要:As student failure rates continue to increase in higher education, predicting student performance in the following semester has become a significant demand. Personalized student performance prediction helps educators gain a comprehensive view of student status and effectively intervene in advance. However, existing works scarcely consider the explainability of student performance prediction, which educators are most concerned about. In this paper, we propose a novel Explainable Student performance prediction method with Personalized Attention (ESPA) by utilizing relationships in student profiles and prior knowledge of related courses. The designed Bidirectional Long Short-Term Memory (BiLSTM) architecture extracts the semantic information in the paths with specific patterns. As for leveraging similar paths' internal relations, a local and global-level attention mechanism is proposed to distinguish the influence of different students or courses for making predictions. Hence, valid reasoning on paths can be applied to predict the performance of students. The ESPA consistently outperforms the other state-of-the-art models for student performance prediction, and the results are intuitively explainable. This work can help educators better understand the different impacts of behavior on students' studies.
【113】 PG^2Net: Personalized and Group Preferences Guided Network for Next Place Prediction 标题:PG^2Net:用于下一地点预测的个性化和群体偏好引导网络 链接:https://arxiv.org/abs/2110.08266
作者:Huifeng Li,Bin Wang,Fan Xia,Xi Zhai,Sulei Zhu,Yanyan Xu 机构:Shandong University 摘要:预测下一个访问地点是人类移动行为建模的关键,它在传染病控制、城市规划、交通管理和出行推荐等领域发挥着重要作用。为了实现这一点,一个典型的解决方案是设计基于RNN的模块,以捕获用户对不同位置的偏好。尽管这些基于RNN的方法可以有效地学习个体对其访问地点的隐藏个性化偏好,但用户之间的交互只能通过位置表示进行弱学习。针对这一点,我们提出了一个名为个性化和群体偏好引导网络(PG$^2$Net)的端到端框架,该框架同时在个体和群体层面考虑用户对不同地点的偏好。具体而言,PG$^2$Net将Bi-LSTM和注意机制结合起来,以捕获每个用户的长期移动倾向。为了学习人群的群体偏好,我们利用访问的时空信息来构建时空依赖模块。我们采用图嵌入方法将用户的轨迹映射到一个隐藏空间中,捕捉其序列关系。此外,我们设计了一个辅助损失来学习下一个位置的向量表示。在两个Foursquare签到数据集和一个手机数据集上的实验结果表明,与最先进的基线相比,我们的模型具有优势。源代码可在https://github.com/urbanmobility/PG2Net. 摘要:Predicting the next place to visit is a key in human mobility behavior modeling, which plays a significant role in various fields, such as epidemic control, urban planning, traffic management, and travel recommendation. To achieve this, one typical solution is designing modules based on RNN to capture their preferences to various locations. Although these RNN-based methods can effectively learn individual's hidden personalized preferences to her visited places, the interactions among users can only be weakly learned through the representations of locations. Targeting this, we propose an end-to-end framework named personalized and group preference guided network (PG$^2$Net), considering the users' preferences to various places at both individual and collective levels. Specifically, PG$^2$Net concatenates Bi-LSTM and attention mechanism to capture each user's long-term mobility tendency. To learn population's group preferences, we utilize spatial and temporal information of the visitations to construct a spatio-temporal dependency module. We adopt a graph embedding method to map users' trajectory into a hidden space, capturing their sequential relation. In addition, we devise an auxiliary loss to learn the vectorial representation of her next location. Experiment results on two Foursquare check-in datasets and one mobile phone dataset indicate the advantages of our model compared to the state-of-the-art baselines. Source codes are available at https://github.com/urbanmobility/PG2Net.
【114】 Knowledge-driven Active Learning 标题:知识驱动的主动学习 链接:https://arxiv.org/abs/2110.08265
作者:Gabriele Ciravegna,Frederic Precioso,Marco Gori 机构:Università di Firenze, Université Côte d'Azur, Università di Siena 备注:Submitted to the ICLR 2022 conference 摘要:在过去几年中,深度学习模型变得越来越流行。然而,在监督数据量有限且人工标注费用昂贵的情况下,它们仍然无法部署。主动学习策略旨在解决这一问题:只需对少数未标注样本进行监督,这些样本在加入训练集后能最大程度地提升模型性能。大多数策略都基于不确定性样本选择,甚至常常局限于靠近决策边界的样本。在这里,我们提出了一种考虑领域知识的截然不同的方法。事实上,在多标签分类的情况下,类别之间的关系提供了一种发现不一致预测的途径,即模型最有可能需要监督的预测。我们开发了一个框架,将一阶逻辑知识转换为约束,并检查它们的违反情况,作为样本选择的自然指南。我们的经验表明,知识驱动策略优于标准策略,尤其是在领域知识完整的数据集上。此外,我们展示了该方法如何能发现远离训练数据的数据分布。最后,所提出的知识驱动策略也可以很容易地用于基于标准不确定性的技术难以应用的目标检测问题。 摘要:In the last few years, Deep Learning models have become increasingly popular. However, their deployment is still precluded in those contexts where the amount of supervised data is limited and manual labelling expensive. Active learning strategies aim at solving this problem by requiring supervision only on few unlabelled samples, which improve the most model performances after adding them to the training set. Most strategies are based on uncertain sample selection, and even often restricted to samples lying close to the decision boundary. Here we propose a very different approach, taking into consideration domain knowledge. Indeed, in the case of multi-label classification, the relationships among classes offer a way to spot incoherent predictions, i.e., predictions where the model may most likely need supervision. We have developed a framework where first-order-logic knowledge is converted into constraints and their violation is checked as a natural guide for sample selection. We empirically demonstrate that knowledge-driven strategy outperforms standard strategies, particularly on those datasets where domain knowledge is complete. Furthermore, we show how the proposed approach enables discovering data distributions lying far from training data. Finally, the proposed knowledge-driven strategy can be also easily used in object-detection problems where standard uncertainty-based techniques are difficult to apply.
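下面的草图演示"把一阶逻辑约束转成违反度并据此挑选待标注样本"的思路(类别、蕴含规则与t-范数打分均为示意,并非论文的完整框架):

```python
# 示意性草图:以约束违反度驱动的主动学习样本选择(非论文完整实现)
import numpy as np

rng = np.random.default_rng(0)
CLASSES = ["cat", "dog", "animal"]
IMPLIES = [("cat", "animal"), ("dog", "animal")]   # 领域知识:猫/狗蕴含动物
idx = {c: i for i, c in enumerate(CLASSES)}

probs = rng.uniform(size=(100, len(CLASSES)))      # 模型对未标注池的多标签预测(玩具)

def violation(p):
    # 乘积t-范数下,约束A=>B的违反度取p(A)*(1-p(B));对所有规则求和
    return sum(p[idx[a]] * (1 - p[idx[b]]) for a, b in IMPLIES)

scores = np.array([violation(p) for p in probs])
query = np.argsort(scores)[-10:]                   # 违反最严重的10个样本交给人工标注
print(query, scores[query].round(2))
```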
【115】 Self-supervised Contrastive Attributed Graph Clustering 标题:自监督对比属性图聚类 链接:https://arxiv.org/abs/2110.08264
作者:Wei Xia,Quanxue Gao,Ming Yang,Xinbo Gao 机构:Xidian University, Westfield State University, Chongqing University of Posts and Telecommunications 摘要:属性图聚类是图分析中的一项基本而富有挑战性的任务,它从节点属性和拓扑图中学习节点表示进行聚类。最近,基于图对比学习(GCL)的方法在这项任务上取得了令人印象深刻的聚类性能。然而,我们观察到现有的基于GCL的方法1)不能从不精确的聚类标签中获益;2) 需要进行后处理操作以获取群集标签;3) 无法解决样本外(OOS)问题。为了解决这些问题,我们提出了一种新的属性图聚类网络,即自监督对比属性图聚类(SCAGC)。在SCAGC中,通过利用不准确的聚类标签,设计了一种自我监督的对比损失,其目的是最大化簇内节点的相似性,同时最小化簇间节点的相似性,用于节点表示学习。同时,构建了聚类模块,通过对比不同聚类的表示,直接输出聚类标签。因此,对于OOS节点,SCAGC可以直接计算它们的集群标签。在四个基准数据集上的大量实验结果表明,SCAGC始终优于11种竞争聚类方法。 摘要:Attributed graph clustering, which learns node representation from node attribute and topological graph for clustering, is a fundamental but challenging task for graph analysis. Recently, methods based on graph contrastive learning (GCL) have obtained impressive clustering performance on this task. Yet, we observe that existing GCL-based methods 1) fail to benefit from imprecise clustering labels; 2) require a post-processing operation to get clustering labels; 3) cannot solve out-of-sample (OOS) problem. To address these issues, we propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC). In SCAGC, by leveraging inaccurate clustering labels, a self-supervised contrastive loss, which aims to maximize the similarities of intra-cluster nodes while minimizing the similarities of inter-cluster nodes, are designed for node representation learning. Meanwhile, a clustering module is built to directly output clustering labels by contrasting the representation of different clusters. Thus, for the OOS nodes, SCAGC can directly calculate their clustering labels. Extensive experimental results on four benchmark datasets have shown that SCAGC consistently outperforms 11 competitive clustering methods.
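下面用PyTorch给出"以(可能不准确的)聚类标签构造自监督对比损失"的一个示意草图:同簇节点互为正对、异簇为负对(温度等超参为假设,并非SCAGC的完整实现):

```python
# 示意性草图:以聚类标签为监督的对比损失(非SCAGC官方代码)
import torch
import torch.nn.functional as F

def cluster_contrastive_loss(z, labels, tau=0.5):
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                        # 两两余弦相似度(带温度)
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask  # 同簇且非自身
    logits = sim.masked_fill(self_mask, float('-inf'))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_count = pos.sum(1).clamp(min=1)
    # 仅保留正对位置的log概率求和,再对每个锚点的正对数取平均
    loss = -log_prob.masked_fill(~pos, 0.0).sum(1) / pos_count
    return loss.mean()

z = torch.randn(8, 16, requires_grad=True)       # 8个节点的表示
labels = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])  # 当前轮聚类模块输出的标签
loss = cluster_contrastive_loss(z, labels)
loss.backward()
print(float(loss))
```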
【116】 Effective Certification of Monotone Deep Equilibrium Models 标题:单调深度平衡模型的有效证明 链接:https://arxiv.org/abs/2110.08260
作者:Mark Niklas Müller,Robin Staab,Marc Fischer,Martin Vechev 机构:Department of Computer Science, ETH Zurich, Switzerland 摘要:单调算子均衡模型(monDEQ)是一类结合了强大的深度均衡范式和收敛保证的模型。此外,它们对对抗性扰动的固有鲁棒性使得研究它们的可证明性成为一个有希望的研究方向。不幸的是,现有的方法要么不精确,要么在可伸缩性方面受到严重限制。在这项工作中,我们提出了第一个可伸缩且精确的monDEQ验证器,它基于两个关键思想:(i)一种实现高效包含检查的新凸松弛;(ii)刻画monDEQ核心不动点运算如何作用于集合(而非具体输入)的非平凡数学洞见。我们的验证器在具有挑战性的$\ell_\infty$扰动上的广泛评估表明,它在速度(两个数量级)和可扩展性(一个数量级)方面超过了最先进的水平,同时在相同网络上获得高25%的认证精度。 摘要:Monotone Operator Equilibrium Models (monDEQs) represent a class of models combining the powerful deep equilibrium paradigm with convergence guarantees. Further, their inherent robustness to adversarial perturbations makes investigating their certifiability a promising research direction. Unfortunately, existing approaches are either imprecise or severely limited in scalability. In this work, we propose the first scalable and precise monDEQ verifier, based on two key ideas: (i) a novel convex relaxation enabling efficient inclusion checks, and (ii) non-trivial mathematical insights characterizing the fixpoint operations at the heart of monDEQs on sets rather than concrete inputs. An extensive evaluation of our verifier on the challenging $\ell_\infty$ perturbations demonstrates that it exceeds state-of-the-art performance in terms of speed (two orders of magnitude) and scalability (an order of magnitude) while yielding 25% higher certified accuracies on the same networks.
【117】 Learning When and What to Ask: a Hierarchical Reinforcement Learning Framework 标题:学习何时问什么:分层强化学习框架 链接:https://arxiv.org/abs/2110.08258
作者:Khanh Nguyen,Yonatan Bisk,Hal Daumé III 机构:University of Maryland, Carnegie Mellon University, Microsoft Research 备注:15 pages, 3 figures, 4 tables 摘要:可靠的人工智能代理应注意其知识的局限性,并在感觉到他们没有足够的知识做出合理决策时咨询人类。我们制定了一个分层强化学习框架,用于学习决定何时向人类请求额外信息,以及哪些类型的信息有助于请求。我们的框架通过允许代理与助手交互来利用其知识完成任务,从而扩展了部分观察到的马尔可夫决策过程(POMDP)。模拟人类辅助导航问题的结果证明了我们框架的有效性:通过我们的方法学习的交互策略的辅助,导航策略在任务成功率方面比单独执行任务提高了7倍。交互策略也很有效:平均而言,在任务执行期间执行的所有操作中,只有四分之一是信息请求。我们用分层的政策结构分析学习的好处和挑战,并为未来的工作提出方向。 摘要:Reliable AI agents should be mindful of the limits of their knowledge and consult humans when sensing that they do not have sufficient knowledge to make sound decisions. We formulate a hierarchical reinforcement learning framework for learning to decide when to request additional information from humans and what type of information would be helpful to request. Our framework extends partially-observed Markov decision processes (POMDPs) by allowing an agent to interact with an assistant to leverage their knowledge in accomplishing tasks. Results on a simulated human-assisted navigation problem demonstrate the effectiveness of our framework: aided with an interaction policy learned by our method, a navigation policy achieves up to a 7x improvement in task success rate compared to performing tasks only by itself. The interaction policy is also efficient: on average, only a quarter of all actions taken during a task execution are requests for information. We analyze benefits and challenges of learning with a hierarchical policy structure and suggest directions for future work.
【118】 C-AllOut: Catching & Calling Outliers by Type Link: https://arxiv.org/abs/2110.08257
Authors: Guilherme D. F. Silva, Leman Akoglu, Robson L. F. Cordeiro. Affiliations: Carnegie Mellon University, University of São Paulo. Note: 9 4 pages, 3 figures, 11 tables. Abstract: Given an unlabeled dataset, wherein we have access only to pairwise similarities (or distances), how can we effectively (1) detect outliers, and (2) annotate/tag the outliers by type? Outlier detection has a large literature, yet we find a key gap in the field: to our knowledge, no existing work addresses the outlier annotation problem. Outliers are broadly classified into 3 types, representing distinct patterns that could be valuable to analysts: (a) global outliers are severe yet isolated cases that do not repeat, e.g., a data collection error; (b) local outliers diverge from their peers within a context, e.g., a particularly short basketball player; and (c) collective outliers are isolated micro-clusters that may indicate coalition or repetition, e.g., frauds that exploit the same loophole. This paper presents C-AllOut: a novel and effective outlier detector that annotates outliers by type. It is parameter-free and scalable, besides working only with pairwise similarities (or distances) when needed. We show that C-AllOut achieves performance on par with or significantly better than state-of-the-art detectors when spotting outliers regardless of their type. It is also highly effective in annotating outliers of particular types, a task that none of the baselines can perform.
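C-AllOut's internals differ, but the intuition behind the three types can be sketched with simple k-NN statistics on distances (a hypothetical illustration of ours, not the paper's algorithm):

```python
# Illustrative per-type outlier scores from k-NN distances; not C-AllOut itself.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def type_scores(X, k_small=5, k_large=30):
    nn = NearestNeighbors(n_neighbors=k_large + 1).fit(X)
    dist, idx = nn.kneighbors(X)                     # column 0 is the point itself
    d_small, d_large = dist[:, k_small], dist[:, k_large]
    global_score = d_small                           # far from everyone: global
    neigh = d_small[idx[:, 1:k_small + 1]].mean(axis=1)
    local_score = d_small / (neigh + 1e-12)          # LOF-style ratio: local
    collective_score = d_large / (d_small + 1e-12)   # tight micro-cluster, isolated
    return global_score, local_score, collective_score
```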
【119】 A Field Guide to Scientific XAI: Transparent and Interpretable Deep Learning for Bioinformatics Research Link: https://arxiv.org/abs/2110.08253
Authors: Thomas P Quinn, Sunil Gupta, Svetha Venkatesh, Vuong Le. Affiliations: Applied Artificial Intelligence Institute (A2I2), Deakin University, Geelong, Australia. Abstract: Deep learning has become popular because of its potential to achieve high accuracy in prediction tasks. However, accuracy is not always the only goal of statistical modelling, especially for models developed as part of scientific research. Rather, many scientific models are developed to facilitate scientific discovery, by which we mean to abstract a human-understandable representation of the natural world. Unfortunately, the opacity of deep neural networks limits their role in scientific discovery, creating a new demand for models that are transparently interpretable. This article is a field guide to transparent model design. It provides a taxonomy of transparent model design concepts, a practical workflow for putting design concepts into practice, and a general template for reporting design choices. We hope this field guide will help researchers more effectively design transparently interpretable models, and thus enable them to use deep learning for scientific discovery.
【120】 A Rate-Distortion Framework for Explaining Black-box Model Decisions Link: https://arxiv.org/abs/2110.08252
Authors: Stefan Kolek, Duc Anh Nguyen, Ron Levie, Joan Bruna, Gitta Kutyniok. Affiliations: Department of Mathematics, Ludwig Maximilian University of Munich; Courant Institute of Mathematical Sciences, New York University. Abstract: We present the Rate-Distortion Explanation (RDE) framework, a mathematically well-founded method for explaining black-box model decisions. The framework is based on perturbations of the target input signal and applies to any differentiable pre-trained model such as neural networks. Our experiments demonstrate the framework's adaptability to diverse data modalities, particularly images, audio, and physical simulations of urban environments.
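Heavily condensed, the optimization can be sketched like this (our reading with made-up names; the actual framework is more careful about the reference/perturbation distribution): learn a mask s so that keeping only the masked part of the input preserves the model's output (low distortion), while a sparsity term keeps the "rate" low.

```python
# Simplified rate-distortion explanation mask; an illustrative sketch only.
import torch

def rde_mask(model, x, lam=0.05, steps=300, lr=0.1):
    target = model(x).detach()                     # decision to be preserved
    s_logit = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([s_logit], lr=lr)
    for _ in range(steps):
        s = torch.sigmoid(s_logit)
        noise = torch.randn_like(x)                # reference perturbation
        out = model(s * x + (1 - s) * noise)       # keep masked part, perturb the rest
        distortion = ((out - target) ** 2).mean()
        rate = s.mean()                            # fraction of input retained
        loss = distortion + lam * rate
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(s_logit).detach()         # relevance mask
```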
【121】 SGEN: Single-cell Sequencing Graph Self-supervised Embedding Network Link: https://arxiv.org/abs/2110.09413
Authors: Ziyi Liu, Minghui Liao, Fulin Luo, Bo Du. Affiliations: National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence, School of Computer Science, and Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan, China. Note: 6 pages body, 2 pages references. Abstract: Single-cell sequencing plays a significant role in exploring biological processes such as embryonic development, cancer evolution, and cell differentiation. These biological properties can be presented by a two-dimensional scatter plot. However, single-cell sequencing data generally have very high dimensionality. Therefore, dimensionality reduction should be used to process the high-dimensional sequencing data for 2D visualization and subsequent biological analysis. Traditional dimensionality reduction methods, which do not consider the structural characteristics of single-cell sequencing data, have difficulty revealing the data structure in the 2D representation. In this paper, we develop a 2D feature representation method based on graph convolutional networks (GCN) for the visualization of single-cell data, termed single-cell sequencing graph embedding network (SGEN). This method constructs the graph from the similarity relationships between cells and adopts GCN to analyze the neighbor embedding information of samples, which brings similar cells closer to each other on the 2D scatter plot. The results show that SGEN achieves a clear 2D distribution and preserves the high-dimensional relationships among different cells. Meanwhile, similar cell clusters have spatial continuity rather than relying heavily on random initialization, so the scatter plot can reflect the trajectory of cell development.
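The graph construction and propagation rule at the core of SGEN can be sketched as follows (our simplified illustration with untrained weights; the paper's self-supervised training objective is omitted):

```python
# Sketch of kNN-graph construction plus two GCN propagation steps to 2D.
import numpy as np
from sklearn.neighbors import kneighbors_graph

def gcn_embed_2d(X, k=15, hidden=64, seed=0):
    A = kneighbors_graph(X, k, mode="connectivity").toarray()
    A = np.maximum(A, A.T) + np.eye(len(X))        # symmetrize, add self-loops
    d = A.sum(axis=1)
    A_hat = A / np.sqrt(np.outer(d, d))            # D^{-1/2} A D^{-1/2}
    rng = np.random.default_rng(seed)
    W1 = 0.1 * rng.normal(size=(X.shape[1], hidden))
    W2 = 0.1 * rng.normal(size=(hidden, 2))
    H = np.maximum(A_hat @ X @ W1, 0.0)            # GCN layer with ReLU
    return A_hat @ H @ W2                          # 2D coordinates per cell
```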
【122】 An actor-critic algorithm with deep double recurrent agents to solve the job shop scheduling problem Link: https://arxiv.org/abs/2110.09076
Authors: Marta Monaci, Valerio Agasucci, Giorgio Grani. Affiliations: Sapienza University of Rome, Dept. of Computer Science, Control and Management Engineering, Rome, Italy; OptRail, Rome, Italy; SINTEF Digital, Dept. of Mathematics and Cybernetics, Oslo, Norway. Abstract: There is a growing interest in integrating machine learning techniques and optimization to solve challenging optimization problems. In this work, we propose a deep reinforcement learning methodology for the job shop scheduling problem (JSSP). The aim is to build up a greedy-like heuristic able to learn on some distribution of JSSP instances, different in the number of jobs and machines. The need for fast scheduling methods is well known, and it arises in many areas, from transportation to healthcare. We model the JSSP as a Markov Decision Process and then exploit the efficacy of reinforcement learning to solve the problem. We adopt an actor-critic scheme, where the action taken by the agent is influenced by policy considerations on the state-value function. The procedure is adapted to take into account the challenging nature of the JSSP, where the state and the action space change not only for each instance but also after each decision. To tackle the variability in the number of jobs and operations in the input, we model the agent using two incident LSTM models, a special type of deep neural network. Experiments show the algorithm reaches good solutions in a short time, proving that it is possible to generate new greedy heuristics just from learning-based methodologies. Benchmarks have been generated in comparison with the commercial solver CPLEX. As expected, the model can generalize, to some extent, to larger problems or to instances originating from a distribution different from the one used in training.
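A bare-bones sketch of the double-recurrent actor-critic (our simplification with assumed feature shapes; not the authors' exact architecture): one LSTM summarizes jobs, another machines, the actor scores the currently schedulable operations, and the critic estimates the state value.

```python
# Skeletal two-LSTM actor-critic for JSSP-style decisions; illustrative only.
import torch
import torch.nn as nn

class JSSPActorCritic(nn.Module):
    def __init__(self, feat=8, hid=64):
        super().__init__()
        self.job_enc = nn.LSTM(feat, hid, batch_first=True)
        self.mach_enc = nn.LSTM(feat, hid, batch_first=True)
        self.actor = nn.Linear(2 * hid + feat, 1)   # one score per candidate op
        self.critic = nn.Linear(2 * hid, 1)

    def forward(self, jobs, machines, ops):
        # jobs: (1, J, feat), machines: (1, M, feat), ops: (O, feat)
        _, (hj, _) = self.job_enc(jobs)
        _, (hm, _) = self.mach_enc(machines)
        state = torch.cat([hj[-1], hm[-1]], dim=-1)            # (1, 2*hid)
        scores = self.actor(torch.cat([state.expand(len(ops), -1), ops], dim=-1))
        return torch.softmax(scores.squeeze(-1), dim=0), self.critic(state)
```

Because the encoders consume variable-length sequences, the same network handles instances with any number of jobs, machines, and candidate operations, which is the property the paper exploits.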
【123】 Pareto Navigation Gradient Descent: a First-Order Algorithm for Optimization in Pareto Set Link: https://arxiv.org/abs/2110.08713
Authors: Mao Ye, Qiang Liu. Affiliations: University of Texas at Austin. Abstract: Many modern machine learning applications, such as multi-task learning, require finding optimal model parameters to trade-off multiple objective functions that may conflict with each other. The notion of the Pareto set allows us to focus on the set of (often infinitely many) models that cannot be strictly improved. But it does not provide an actionable procedure for picking one or a few special models to return to practical users. In this paper, we consider optimization in Pareto set (OPT-in-Pareto), the problem of finding Pareto models that optimize an extra reference criterion function within the Pareto set. This function can either encode a specific preference from the users, or represent a generic diversity measure for obtaining a set of diversified Pareto models that are representative of the whole Pareto set. Unfortunately, despite being a highly useful framework, efficient algorithms for OPT-in-Pareto have been largely missing, especially for large-scale, non-convex, and non-linear objectives in deep learning. A naive approach is to apply Riemannian manifold gradient descent on the Pareto set, which yields a high computational cost due to the need for eigen-calculation of Hessian matrices. We propose a first-order algorithm that approximately solves OPT-in-Pareto using only gradient information, with both high practical efficiency and theoretically guaranteed convergence. Empirically, we demonstrate that our method works efficiently for a variety of challenging multi-task-related problems.
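A first-order update in this spirit can be sketched as follows (a PCGrad-style simplification we wrote for illustration, not the paper's exact algorithm): follow the gradient of the reference criterion, but project out any component that would increase a task loss, so that iterates stay near the Pareto set.

```python
# Illustrative projected direction: descend the criterion without hurting tasks.
import numpy as np

def pareto_nav_direction(g_ref, task_grads, tol=1e-12):
    d = g_ref.copy()                       # gradient of the reference criterion f
    for g in task_grads:                   # gradients of the task losses L_i
        overlap = d @ g
        if overlap < 0:                    # stepping along -d would increase L_i
            d = d - (overlap / (g @ g + tol)) * g
    return d                               # use as: theta -= lr * d
```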
【124】 A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer Link: https://arxiv.org/abs/2110.08598
Authors: Hu Hu, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Chin-Hui Lee. Affiliations: School of Electrical and Computer Engineering, Georgia Institute of Technology, GA, USA; Computer Engineering School, University of Enna Kore, Italy. Note: Submitted to ICASSP 2022. Abstract: We propose a variational Bayesian (VB) approach to learning distributions of latent variables in deep neural network (DNN) models for cross-domain knowledge transfer, to address acoustic mismatches between training and testing conditions. Instead of carrying out point estimation in conventional maximum a posteriori estimation, which risks a curse of dimensionality when estimating a huge number of model parameters, we focus our attention on estimating a manageable number of latent variables of DNNs via a VB inference framework. To accomplish model transfer, knowledge learnt from a source domain is encoded in prior distributions of latent variables and optimally combined, in a Bayesian sense, with a small set of adaptation data from a target domain to approximate the corresponding posterior distributions. Experimental results on device adaptation in acoustic scene classification show that our proposed VB approach obtains good improvements on target devices and consistently outperforms 13 state-of-the-art knowledge transfer algorithms.
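Schematically (our paraphrase; variable names and the diagonal-Gaussian form are assumptions), adaptation maximizes an ELBO whose KL term anchors the target-domain posterior over latent variables to the source-domain prior:

```python
# Sketch of an ELBO with a source-domain prior over Gaussian latent variables.
import torch
import torch.nn.functional as F

def elbo_loss(logits, labels, q_mu, q_logvar, p_mu, p_logvar):
    nll = F.cross_entropy(logits, labels)          # fit the small adaptation set
    kl = 0.5 * (p_logvar - q_logvar
                + (q_logvar.exp() + (q_mu - p_mu) ** 2) / p_logvar.exp()
                - 1.0).sum()                       # KL(q_target || p_source)
    return nll + kl / labels.numel()
```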
【125】 ASR4REAL: An extended benchmark for speech models Link: https://arxiv.org/abs/2110.08583
Authors: Morgane Riviere, Jade Copet, Gabriel Synnaeve. Affiliations: Facebook AI Research. Note: Submitted to ICASSP 2022. Abstract: Popular ASR benchmarks such as Librispeech and Switchboard are limited in the diversity of settings and speakers they represent. We introduce a set of benchmarks matching real-life conditions, aimed at spotting possible biases and weaknesses in models. We have found that even though recent models do not seem to exhibit a gender bias, they usually show important performance discrepancies by accent, and even larger ones depending on the socio-economic status of the speakers. Finally, all tested models show a strong performance drop when tested on conversational speech, and in this precise context even a language model trained on a dataset as big as Common Crawl does not seem to have a significant positive effect, which reiterates the importance of developing conversational language models.
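The stratified evaluation this calls for is easy to sketch (helper names and the grouping are ours): compute word error rate per speaker group (accent, socio-economic status) rather than a single corpus-level number.

```python
# Per-group word error rate; a small self-contained evaluation sketch.
def wer(ref, hyp):
    r, h = ref.split(), hyp.split()
    d = [[i + j if i * j == 0 else 0 for j in range(len(h) + 1)]
         for i in range(len(r) + 1)]               # edit-distance table
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (r[i - 1] != h[j - 1]))
    return d[-1][-1] / max(len(r), 1)

def wer_by_group(samples):
    """samples: iterable of (group, reference, hypothesis) triples."""
    groups = {}
    for g, ref, hyp in samples:
        groups.setdefault(g, []).append(wer(ref, hyp))
    return {g: sum(v) / len(v) for g, v in groups.items()}
```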
【126】 Reinforcement Learning for Standards Design Link: https://arxiv.org/abs/2110.06909
Authors: Shahrukh Khan Kasi, Sayandev Mukherjee, Lin Cheng, Bernardo A. Huberman. Affiliations: AI4Networks Center, University of Oklahoma, Tulsa, OK, USA; Next-Generation Systems, CableLabs, Santa Clara, CA, USA and Louisville, CO, USA. Abstract: Communications standards are designed via committees of humans holding repeated meetings over months or even years until consensus is achieved. This includes decisions regarding the modulation and coding schemes to be supported over an air interface. We propose a way to "automate" the selection of the set of modulation and coding schemes to be supported over a given air interface and thereby streamline both the standards design process and the ease of extending the standard to support new modulation schemes applicable to new higher-level applications and services. Our scheme involves machine learning, whereby a constructor entity submits proposals to an evaluator entity, which returns a score for the proposal. The constructor employs reinforcement learning to iterate on its submitted proposals until it achieves a score that constructor and evaluator previously agreed indicates satisfaction of the required design criteria (including performance metrics for transmissions over the interface).
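A toy version of the constructor-evaluator loop (entirely our construction; the paper does not prescribe this exact update): treat each candidate modulation-and-coding scheme as a Bernoulli inclusion variable and nudge the proposal distribution with REINFORCE until the evaluator's score clears the agreed threshold.

```python
# Toy constructor: REINFORCE over which schemes to include in the standard.
import numpy as np

def design_scheme_set(evaluator, n_schemes=16, target=0.9, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    logits = np.zeros(n_schemes)               # per-scheme inclusion logits
    baseline, proposal = 0.0, np.zeros(n_schemes, dtype=bool)
    for _ in range(10_000):
        p = 1.0 / (1.0 + np.exp(-logits))
        proposal = rng.random(n_schemes) < p   # sample a scheme subset
        score = evaluator(proposal)            # black-box evaluator score
        if score >= target:                    # agreed acceptance threshold
            break
        logits += lr * (score - baseline) * (proposal - p)   # REINFORCE step
        baseline = 0.9 * baseline + 0.1 * score
    return proposal
```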