Artificial Intelligence Academic Digest [8.31]

2021-09-16 14:53:26


cs.AI (Artificial Intelligence), 60 papers in total

【1】 Trustworthy AI for Process Automation on a Chylla-Haase Polymerization Reactor
Link: https://arxiv.org/abs/2108.13381
Authors: Daniel Hein, Daniel Labisch
Affiliation: Siemens AG, Digital Industries, Karlsruhe, Germany
Abstract: In this paper, genetic programming reinforcement learning (GPRL) is utilized to generate human-interpretable control policies for a Chylla-Haase polymerization reactor. Such continuously stirred tank reactors (CSTRs) with jacket cooling are widely used in the chemical industry, in the production of fine chemicals, pigments, polymers, and medical products. Despite appearing rather simple, controlling CSTRs in real-world applications is quite a challenging problem to tackle. GPRL utilizes already existing data from the reactor and fully automatically generates a set of optimized simplistic control strategies, so-called policies, that the domain expert can choose from. Note that these policies are white-box models of low complexity, which makes them easy to validate and implement in the target control system, e.g., SIMATIC PCS 7. However, despite its low complexity, the automatically generated policy yields high performance in terms of reactor temperature control deviation, which we empirically evaluate on the original reactor template.

【2】 Survival Prediction of Heart Failure Patients using Stacked Ensemble Machine Learning Algorithm
Link: https://arxiv.org/abs/2108.13367
Authors: S. M Mehedi Zaman, Wasay Mahmood Qureshi, Md. Mohsin Sarker Raihan, Ocean Monjur, Abdullah Bin Shams
Affiliations: Department of Electrical and Electronic Engineering, Islamic University of Technology, Gazipur, Bangladesh; Department of Biomedical Engineering, Khulna University of Engineering & Technology, Khulna, Bangladesh
Note: This article has been submitted for publication in Biomedical Physics & Engineering Express
Abstract: Cardiovascular disease, especially heart failure, is one of the major health hazards of our time and a leading cause of death worldwide. Advancement in data mining techniques using machine learning (ML) models is paving the way for promising prediction approaches. Data mining is the process of converting massive volumes of raw data created by healthcare institutions into meaningful information that can aid in making predictions and crucial decisions. Collecting various follow-up data from patients who have had heart failure, analyzing those data, and utilizing several ML models to predict the survival possibility of cardiovascular patients is the key aim of this study. Due to the imbalance of the classes in the dataset, the Synthetic Minority Oversampling Technique (SMOTE) has been implemented. Two unsupervised models (K-Means and Fuzzy C-Means clustering) and three supervised classifiers (Random Forest, XGBoost and Decision Tree) have been used in our study. After thorough investigation, our results demonstrate a superior performance of the supervised ML algorithms over the unsupervised models. Moreover, we designed and propose a supervised stacked ensemble learning model that can achieve an accuracy, precision, recall and F1 score of 99.98%. Our study shows that only certain attributes collected from the patients are imperative to successfully predict the possibility of survival post heart failure, using supervised ML algorithms.
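To make the pipeline above concrete, here is a minimal, hypothetical sketch (not the authors' code) of SMOTE rebalancing followed by a stacked ensemble, using scikit-learn and imbalanced-learn; GradientBoostingClassifier stands in for XGBoost to keep dependencies small, and synthetic data stands in for the heart-failure records:

```python
# Illustrative sketch: SMOTE oversampling on the training split, then a
# stacking ensemble over the base classifiers named in the abstract.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Synthetic imbalanced data standing in for the patient records.
X, y = make_classification(n_samples=1000, n_features=12,
                           weights=[0.85, 0.15], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Balance the minority class on the training split only.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)

# Stack the base classifiers under a logistic-regression meta-learner.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0)),
                ("dt", DecisionTreeClassifier(random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_res, y_res)
print(classification_report(y_test, stack.predict(X_test)))
```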

【3】 Representation and Processing of Instantaneous and Durative Temporal Phenomena
Link: https://arxiv.org/abs/2108.13365
Authors: Manolis Pitsikalis, Alexei Lisitsa, Shan Luo
Affiliation: Department of Computer Science, University of Liverpool
Note: Pre-proceedings paper presented at the 31st International Symposium on Logic-Based Program Synthesis and Transformation (LOPSTR 2021), Tallinn, Estonia, and Virtual, September 7-8, 2021 (arXiv:2107.10160)
Abstract: Event definitions in Complex Event Processing systems are constrained by the expressiveness of each system's language. Some systems allow the definition of instantaneous complex events, while others allow the definition of durative complex events. While there are exceptions that offer both options, they often lack interval relations such as those specified by Allen's interval algebra. In this paper, we propose a new logic-based temporal phenomena definition language, specifically tailored for Complex Event Processing, that allows the representation of both instantaneous and durative phenomena and the temporal relations between them. Moreover, we demonstrate the expressiveness of our proposed language by employing a maritime use case where we define maritime events of interest. Finally, we analyse the execution semantics of our proposed language for stream processing and introduce the `Phenesthe' implementation prototype.

【4】 On the Multilingual Capabilities of Very Large-Scale English Language Models
Link: https://arxiv.org/abs/2108.13349
Authors: Jordi Armengol-Estapé, Ona de Gibert Bonet, Maite Melero
Affiliation: Text Mining Unit, Barcelona Supercomputing Center
Abstract: Generative Pre-trained Transformers (GPTs) have recently been scaled to unprecedented sizes in the history of machine learning. These models, solely trained on the language modeling objective, have been shown to exhibit outstanding few-shot learning capabilities in a number of different tasks. Nevertheless, aside from anecdotal experiences, little is known regarding their multilingual capabilities, given that the pre-training corpus is almost entirely composed of English text. In this work, we investigate the multilingual skills of GPT-3, focusing on one language that barely appears in the pre-training corpus, Catalan, which makes the results especially meaningful; we assume that our results may be relevant for other languages as well. We find that the model shows outstanding performance, particularly in generative tasks, with predictable limitations mostly in language understanding tasks, but still with remarkable results given the zero-shot scenario. We investigate its potential and limits in extractive question-answering and natural language generation, as well as the effect of scale in terms of model size.

【5】 Enlisting 3D Crop Models and GANs for More Data Efficient and Generalizable Fruit Detection
Link: https://arxiv.org/abs/2108.13344
Authors: Zhenghao Fei, Alex Olenskyj, Brian N. Bailey, Mason Earles
Affiliation: University of California, Davis
Abstract: Training real-world neural network models to achieve high performance and generalizability typically requires a substantial amount of labeled data, spanning a broad range of variation. This data-labeling process can be both labor and cost intensive. To achieve desirable predictive performance, a trained model is typically applied in a domain where the data distribution is similar to the training dataset. However, for many agricultural machine learning problems, training datasets are collected at a specific location, during a specific period in time of the growing season. Since agricultural systems exhibit substantial variability in terms of crop type, cultivar, management, seasonal growth dynamics, lighting condition, sensor type, etc., a model trained on one dataset often does not generalize well across domains. To enable more data efficient and generalizable neural network models in agriculture, we propose a method that generates photorealistic agricultural images from a synthetic 3D crop model domain into real-world crop domains. The method uses a semantically constrained GAN (generative adversarial network) to preserve the fruit position and geometry. We observe that a baseline CycleGAN method generates visually realistic target domain images but does not preserve fruit position information, while our method maintains fruit positions well. Image generation results on vineyard grape day and night images show that the visual outputs of our network are much better compared to a baseline network. Incremental training experiments in vineyard grape detection tasks show that the images generated from our method can significantly speed up the domain adaptation process, increase performance for a given number of labeled images (i.e., data efficiency), and decrease labeling requirements.

【6】 A Mathematical Walkthrough and Discussion of the Free Energy Principle
Link: https://arxiv.org/abs/2108.13343
Authors: Beren Millidge, Anil Seth, Christopher L Buckley
Affiliations: School of Informatics, University of Edinburgh; Sackler Centre for Consciousness Science and Evolutionary and Adaptive Systems Research Group, School of Engineering and Informatics, University of Sussex
Note: 30/08/21 initial upload
Abstract: The Free-Energy-Principle (FEP) is an influential and controversial theory which postulates a deep and powerful connection between the stochastic thermodynamics of self-organization and learning through variational inference. Specifically, it claims that any self-organizing system which can be statistically separated from its environment, and which maintains itself at a non-equilibrium steady state, can be construed as minimizing an information-theoretic functional -- the variational free energy -- and thus performing variational Bayesian inference to infer the hidden state of its environment. This principle has also been applied extensively in neuroscience, and is beginning to make inroads in machine learning by spurring the construction of novel and powerful algorithms by which action, perception, and learning can all be unified under a single objective. While its expansive and often grandiose claims have spurred significant debates in both philosophy and theoretical neuroscience, the mathematical depth and lack of accessible introductions and tutorials for the core claims of the theory have often precluded a deep understanding within the literature. Here, we aim to provide a mathematically detailed, yet intuitive walk-through of the formulation and central claims of the FEP while also providing a discussion of the assumptions necessary and potential limitations of the theory. Additionally, since the FEP is still a living theory, subject to internal controversy, change, and revision, we also present a detailed appendix highlighting and condensing current perspectives as well as controversies about the nature, applicability, and the mathematical assumptions and formalisms underlying the FEP.
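For reference, the information-theoretic functional named above has a standard textbook form; the decomposition below is the generic definition of the variational free energy, not a result specific to this paper:

```latex
% Variational free energy: a KL divergence to the exact posterior plus the
% (fixed) negative log evidence, hence an upper bound on surprisal.
F[q] = \mathbb{E}_{q(z)}\big[\log q(z) - \log p(x, z)\big]
     = \mathrm{KL}\big(q(z) \,\|\, p(z \mid x)\big) - \log p(x) \;\geq\; -\log p(x)
```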

【7】 A Conditional Cascade Model for Relational Triple Extraction
Link: https://arxiv.org/abs/2108.13303
Authors: Feiliang Ren, Longhui Zhang, Shujuan Yin, Xiaofeng Zhao, Shilei Liu, Bochao Li
Affiliation: Northeastern University, Shenyang, China
Note: CIKM2021-Short
Abstract: Tagging based methods are one of the mainstream methods in relational triple extraction. However, most of them suffer greatly from the class imbalance issue. Here we propose a novel tagging based model that addresses this issue from the following two aspects. First, at the model level, we propose a three-step extraction framework that can reduce the total number of samples greatly, which implicitly decreases the severity of the mentioned issue. Second, at the intra-model level, we propose a confidence threshold based cross entropy loss that can directly neglect some samples in the major classes. We evaluate the proposed model on NYT and WebNLG. Extensive experiments show that it can address the mentioned issue effectively and achieves state-of-the-art results on both datasets. The source code of our model is available at: https://github.com/neukg/ConCasRTE.
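As an illustration of the second idea, below is one plausible, hypothetical form of a confidence-threshold cross-entropy in PyTorch: majority-class samples that the model already predicts above a threshold `tau` contribute no loss or gradient. The paper's exact formulation may differ:

```python
# Sketch of a confidence-threshold cross-entropy: drop confident majority-class
# samples from the loss so the minority classes dominate the gradient signal.
import torch
import torch.nn.functional as F

def confidence_threshold_ce(logits, targets, majority_mask, tau=0.95):
    probs = F.softmax(logits, dim=-1)
    conf = probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # p(true class)
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    # Ignore confident samples from the major classes; keep everything else.
    keep = ~(majority_mask & (conf > tau))
    return (per_sample * keep.float()).sum() / keep.float().sum().clamp(min=1.0)

logits = torch.randn(8, 4, requires_grad=True)
targets = torch.randint(0, 4, (8,))
majority = targets == 0  # pretend class 0 is the dominant tag
loss = confidence_threshold_ce(logits, targets, majority)
loss.backward()
```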

【8】 E-Commerce Promotions Personalization via Online Multiple-Choice Knapsack with Uplift Modeling
Link: https://arxiv.org/abs/2108.13298
Authors: Javier Albert, Dmitri Goldenberg
Abstract: Promotions and discounts are essential components of modern e-commerce platforms, where they are often used to incentivize customers towards purchase completion. Promotions also affect revenue and may incur a monetary loss that is often limited by a dedicated promotional budget. We study the Online Constrained Multiple-Choice Promotions Personalization Problem, where the optimization goal is to select for each customer which promotion to present in order to maximize purchase completions, while also complying with global budget limitations. Our work formalizes the problem as an Online Multiple Choice Knapsack Problem and extends the existing literature by addressing cases with negative weights and values. We provide a real-time adaptive method that guarantees budget constraint compliance and achieves above 99.7% of the optimal promotional impact on various datasets. Our method is evaluated in a large-scale experimental study at one of the leading online travel platforms in the world.
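To convey the flavor of online multiple-choice knapsack selection, here is a much-simplified threshold heuristic: each arriving customer is assigned the promotion maximizing uplift value minus a budget "shadow price" times cost, with the price tightening as the budget fills. This sketch does not reproduce the paper's treatment of negative weights and values or its guarantees:

```python
# Toy online multiple-choice knapsack: pick at most one option per arrival,
# scored against an adaptive shadow price on the remaining budget.
import random

def choose_promotion(options, price):
    """options: list of (uplift_value, cost); returns best index or None."""
    best_i, best_score = None, 0.0
    for i, (value, cost) in enumerate(options):
        score = value - price * cost
        if score > best_score:
            best_i, best_score = i, score
    return best_i  # None means "offer no promotion"

budget, spent, price = 1000.0, 0.0, 0.1
random.seed(0)
for _ in range(5000):  # stream of customers
    options = [(random.random(), random.uniform(0.1, 1.0)) for _ in range(3)]
    i = choose_promotion(options, price)
    if i is not None and spent + options[i][1] <= budget:
        spent += options[i][1]
    # Raise the shadow price as the budget fills, becoming more selective.
    price = 0.1 * (1.0 + 9.0 * spent / budget)
print(f"budget used: {spent:.1f} / {budget:.0f}")
```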

【9】 Multi-Agent Simulation for AI Behaviour Discovery in Operations Research
Link: https://arxiv.org/abs/2108.13296
Authors: Michael Papasimeon, Lyndon Benke
Affiliation: Defence Science and Technology Group, Lorimer Street, Fishermans Bend, VIC, Australia
Note: 14 pages, 7 figures. To be published in proceedings of the 22nd International Workshop on Multi-Agent-Based Simulation (MABS 2021) at AAMAS 2021
Abstract: We describe ACE0, a lightweight platform for evaluating the suitability and viability of AI methods for behaviour discovery in multi-agent simulations. Specifically, ACE0 was designed to explore AI methods for multi-agent simulations used in operations research studies related to new technologies such as autonomous aircraft. Simulation environments used in production are often high-fidelity, complex, require significant domain knowledge and as a result have high R&D costs. Minimal and lightweight simulation environments can help researchers and engineers evaluate the viability of new AI technologies for behaviour discovery in a more agile and potentially cost effective manner. In this paper we describe the motivation for the development of ACE0. We provide a technical overview of the system architecture, describe a case study of behaviour discovery in the aerospace domain, and provide a qualitative evaluation of the system. The evaluation includes a brief description of collaborative research projects with academic partners, exploring different AI behaviour discovery methods.

【10】 The missing link: Developing a safety case for perception components in automated driving
Link: https://arxiv.org/abs/2108.13294
Authors: Rick Salay, Krzysztof Czarnecki, Hiroshi Kuwajima, Hirotoshi Yasuoka, Toshihiro Nakae, Vahdat Abdelzad, Chengjie Huang, Maximilian Kahn, Van Duong Nguyen
Affiliations: University of Waterloo, Waterloo, Canada; DENSO CORPORATION, Tokyo, Japan
Abstract: Safety assurance is a central concern for the development and societal acceptance of automated driving (AD) systems. Perception is a key aspect of AD that relies heavily on Machine Learning (ML). Despite the known challenges with the safety assurance of ML-based components, proposals have recently emerged for unit-level safety cases addressing these components. Unfortunately, AD safety cases express safety requirements at the system-level and these efforts are missing the critical linking argument connecting safety requirements at the system-level to component performance requirements at the unit-level. In this paper, we propose a generic template for such a linking argument specifically tailored for perception components. The template takes a deductive and formal approach to define strong traceability between levels. We demonstrate the applicability of the template with a detailed case study and discuss its use as a tool to support incremental development of perception components.

【11】 The effects of data size on Automated Essay Scoring engines
Link: https://arxiv.org/abs/2108.13275
Authors: Christopher Ormerod, Amir Jafari, Susan Lottridge, Milan Patel, Amy Harris, Paul van Wamelen
Note: 14 pages, 3 figures, 5 tables
Abstract: We study the effects of data size and quality on the performance of Automated Essay Scoring (AES) engines that are designed in accordance with three different paradigms: a frequency and hand-crafted feature-based model, a recurrent neural network model, and a pretrained transformer-based language model that is fine-tuned for classification. We expect that each type of model benefits from the size and the quality of the training data in very different ways. Standard practices for developing training data for AES engines were established with feature-based methods in mind; however, since neural networks are increasingly being considered in a production setting, this work seeks to inform us as to how to establish better training data for neural networks that will be used in production.

【12】 Deep Reinforcement Learning at the Edge of the Statistical Precipice
Link: https://arxiv.org/abs/2108.13264
Authors: Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare
Affiliations: Google Research, Brain Team; MILA, Université de Montréal; MILA, McGill University
Abstract: Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally-demanding benchmarks has led to the practice of evaluating only a small number of runs per task, exacerbating the statistical uncertainty in point estimates. In this paper, we argue that reliable evaluation in the few-run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field. We illustrate this point using a case study on the Atari 100k benchmark, where we find substantial discrepancies between conclusions drawn from point estimates alone versus a more thorough statistical analysis. With the aim of increasing the field's confidence in reported results with a handful of runs, we advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results, as well as present more robust and efficient aggregate metrics, such as interquartile mean scores, to achieve small uncertainty in results. Using such statistical tools, we scrutinize performance evaluations of existing algorithms on other widely used RL benchmarks including the ALE, Procgen, and the DeepMind Control Suite, again revealing discrepancies in prior comparisons. Our findings call for a change in how we evaluate performance in deep RL, for which we present a more rigorous evaluation methodology, accompanied with an open-source library rliable, to prevent unreliable results from stagnating the field.
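The interquartile mean (IQM) advocated above is simple to compute: it is the mean of the middle 50% of scores. A bare NumPy/SciPy sketch follows; the authors' rliable library adds stratified bootstrap confidence intervals and performance profiles on top of this, and the synthetic scores below merely stand in for real run-by-task results:

```python
# IQM = trimmed mean with the top and bottom 25% of scores removed.
import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(0)
scores = rng.normal(1.0, 0.3, size=(5, 26))  # 5 runs x 26 tasks (e.g., Atari 100k)

iqm = trim_mean(scores.flatten(), proportiontocut=0.25)

# Simple percentile-bootstrap confidence interval, resampling over runs.
boot = [trim_mean(scores[rng.integers(0, 5, 5)].flatten(), 0.25)
        for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"IQM = {iqm:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```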

【13】 Adaptive perturbation adversarial training: based on reinforcement learning
Link: https://arxiv.org/abs/2108.13239
Authors: Zhishen Nie, Ying Lin, Sp Ren, Lan Zhang
Affiliations: School of Software, Yunnan University, Kunming, Yunnan Province, China; Key Laboratory in Software Engineering of Yunnan Province
Abstract: Adversarial training has become the primary method to defend against adversarial samples. However, it is hard to apply in practice due to many shortcomings. One of the shortcomings of adversarial training is that it reduces the recognition accuracy of normal samples. Adaptive perturbation adversarial training is proposed to alleviate this problem. It uses marginal adversarial samples that are close to the decision boundary but do not cross it for adversarial training, which improves the accuracy of model recognition while maintaining the robustness of the model. However, searching for marginal adversarial samples brings additional computational costs. This paper proposes a method for finding marginal adversarial samples based on reinforcement learning, and combines it with the latest fast adversarial training technology, which effectively speeds up the training process and reduces training costs.
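The RL-based search for marginal samples is the paper's contribution; as a simple stand-in that conveys what "marginal" means, the hypothetical sketch below bisects the magnitude of an FGSM perturbation to land close to, but not across, the decision boundary:

```python
# Find the largest FGSM step size that does NOT yet flip the prediction,
# yielding a "marginal" adversarial sample near the decision boundary.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
x = torch.randn(1, 10)
y = torch.tensor([0])

x.requires_grad_(True)
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
direction = x.grad.sign()  # FGSM ascent direction

lo, hi = 0.0, 1.0
for _ in range(20):  # bisection on the perturbation size
    mid = (lo + hi) / 2
    pred = model(x.detach() + mid * direction).argmax(dim=1)
    if pred.item() == y.item():
        lo = mid  # still correctly classified: can push further
    else:
        hi = mid  # crossed the boundary: back off
x_marginal = x.detach() + lo * direction  # close to, but not across, the boundary
print(f"marginal epsilon ~ {lo:.4f}")
```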

【14】 PTRAIL -- A python package for parallel trajectory data preprocessing
Link: https://arxiv.org/abs/2108.13202
Authors: Salman Haidri, Yaksh J. Haranwala, Vania Bogorny, Chiara Renso, Vinicius Prado da Fonseca, Amilcar Soares
Affiliations: Department of Computer Science, Memorial University, St. John's, NL, Canada; Institute of Science and Technology A. Faedo, National Research Council of Italy; Universidade Federal de Santa Catarina (UFSC), Brazil
Abstract: Trajectory data represent a trace of an object that changes its position in space over time. This kind of data is complex to handle and analyze, since it is generally produced in huge quantities, often prone to errors generated by the geolocation device, human mishandling, or area coverage limitation. Therefore, there is a need for software specifically tailored to preprocess trajectory data. In this work we propose PTRAIL, a Python package offering several trajectory preprocessing steps, including filtering, feature extraction, and interpolation. PTRAIL uses parallel computation and vectorization, being suitable for large datasets and fast compared to other Python libraries.

【15】 Enterprise Architecture Model Transformation Engine
Link: https://arxiv.org/abs/2108.13169
Authors: Erik Heiland, Peter Hillmann, Andreas Karcher
Affiliation: Universität der Bundeswehr München, Werner-Heisenberg-Weg, Neubiberg, Germany
Abstract: With increasing linkage within value chains, the IT systems of different companies are also being connected with each other. This enables the integration of services within the Industry 4.0 movement in order to improve the quality and performance of processes. Enterprise architecture models form the basis for this through better business-IT alignment. However, the heterogeneity of the modeling frameworks and description languages makes a concatenation considerably difficult, especially given differences in syntax, semantics and relations. Therefore, this paper presents a transformation engine to convert enterprise architecture models between several languages. We developed the first generic translation approach that is free of specific meta-modeling and is flexibly adaptable to arbitrary modeling languages. The transformation process is defined by various pattern matching techniques using a rule-based description language. It uses set theory and first-order logic for an intuitive description as a basis. The concept is practically evaluated using an example in the area of a large German IT service provider. Anyhow, the approach is applicable across a wide range of enterprise architecture frameworks.

【16】 A Sentiment Analysis Dataset for Trustworthiness Evaluation
Link: https://arxiv.org/abs/2108.13140
Authors: Lijie Wang, Hao Liu, Shuyuan Peng, Hongxuan Tang, Xinyan Xiao, Ying Chen, Hua Wu
Affiliation: Baidu Inc., Beijing, China
Abstract: While deep learning models have greatly improved the performance of most artificial intelligence tasks, they are often criticized as untrustworthy due to the black-box problem. Consequently, many works have been proposed to study the trustworthiness of deep learning. However, as most open datasets are designed for evaluating the accuracy of model outputs, there is still a lack of appropriate datasets for evaluating the inner workings of neural networks. The lack of datasets obviously hinders the development of trustworthiness research. Therefore, in order to systematically evaluate the factors for building trustworthy systems, we propose a novel and well-annotated sentiment analysis dataset to evaluate robustness and interpretability. To evaluate these factors, our dataset contains diverse annotations about the challenging distribution of instances, manual adversarial instances and sentiment explanations. Several evaluation metrics are further proposed for interpretability and robustness. Based on the dataset and metrics, we conduct comprehensive comparisons of the trustworthiness of three typical models, and also study the relations between accuracy, robustness and interpretability. We release this trustworthiness evaluation dataset at https://github/xyz and hope our work can facilitate progress on building more trustworthy systems for real-world applications.

【17】 Investigating Vulnerabilities of Deep Neural Policies
Link: https://arxiv.org/abs/2108.13093
Authors: Ezgi Korkmaz
Affiliation: KTH Royal Institute of Technology, Stockholm, Sweden
Note: Presented at the Conference on Uncertainty in Artificial Intelligence (UAI) 2021
Abstract: Reinforcement learning policies based on deep neural networks are vulnerable to imperceptible adversarial perturbations to their inputs, in much the same way as neural network image classifiers. Recent work has proposed several methods to improve the robustness of deep reinforcement learning agents to adversarial perturbations based on training in the presence of these imperceptible perturbations (i.e. adversarial training). In this paper, we study the effects of adversarial training on the neural policy learned by the agent. In particular, we follow two distinct parallel approaches to investigate the outcomes of adversarial training on deep neural policies based on worst-case distributional shift and feature sensitivity. For the first approach, we compare the Fourier spectrum of minimal perturbations computed for both adversarially trained and vanilla trained neural policies. Via experiments in the OpenAI Atari environments we show that minimal perturbations computed for adversarially trained policies are more focused on lower frequencies in the Fourier domain, indicating a higher sensitivity of these policies to low frequency perturbations. For the second approach, we propose a novel method to measure the feature sensitivities of deep neural policies and we compare these feature sensitivity differences in state-of-the-art adversarially trained deep neural policies and vanilla trained deep neural policies. We believe our results can be an initial step towards understanding the relationship between adversarial training and different notions of robustness for neural policies.
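The first analysis can be illustrated with a few lines of NumPy: take the 2D FFT of a perturbation and measure what fraction of its spectral energy lies at low spatial frequencies. The perturbation here is random noise; in the paper it is the minimal adversarial perturbation computed for a trained policy:

```python
# Fraction of a perturbation's 2D spectral energy near the zero-frequency center.
import numpy as np

def low_freq_energy_fraction(perturbation, radius=8):
    spec = np.fft.fftshift(np.fft.fft2(perturbation))
    power = np.abs(spec) ** 2
    h, w = perturbation.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    return power[mask].sum() / power.sum()

rng = np.random.default_rng(0)
delta = rng.normal(size=(84, 84))  # Atari-sized frame perturbation
print(f"low-frequency energy fraction: {low_freq_energy_fraction(delta):.3f}")
```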

【18】 An Introduction to Variational Inference
Link: https://arxiv.org/abs/2108.13083
Authors: Ankush Ganguly, Samuel W. F. Earp
Affiliation: Sertis Vision Lab
Note: 13 pages, 9 figures
Abstract: Approximating complex probability densities is a core problem in modern statistics. In this paper, we introduce the concept of Variational Inference (VI), a popular method in machine learning that uses optimization techniques to estimate complex probability densities. This property allows VI to converge faster than classical methods, such as Markov chain Monte Carlo sampling. Conceptually, VI works by choosing a family of probability density functions and then finding the one closest to the actual probability density -- often using the Kullback-Leibler (KL) divergence as the optimization metric. We introduce the Evidence Lower Bound to tractably compute the approximated probability density and we review the ideas behind mean-field variational inference. Finally, we discuss the applications of VI to variational auto-encoders (VAE) and VAE-Generative Adversarial Networks (VAE-GAN). With this paper, we aim to explain the concept of VI and assist in future research with this approach.
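The decomposition at the heart of such tutorials is worth stating explicitly; the following identity is the standard one that the ELBO derivation rests on (maximizing the ELBO over q is equivalent to minimizing the KL term, since log p(x) is fixed):

```latex
\log p(x) = \underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p(x,z)}{q(z)}\right]}_{\text{ELBO}(q)}
          + \underbrace{\mathrm{KL}\!\left(q(z) \,\|\, p(z \mid x)\right)}_{\geq\, 0}
```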

【19】 To tune or not to tune? An Approach for Recommending Important Hyperparameters
Link: https://arxiv.org/abs/2108.13066
Authors: Mohamadjavad Bahmani, Radwa El Shawi, Nshan Potikyan, Sherif Sakr
Affiliation: Data Systems Group, University of Tartu, Tartu, Estonia
Note: Presented at The Fifth International Workshop on Automation in Machine Learning, held in conjunction with the KDD 2021 Conference
Abstract: Novel technologies in automated machine learning ease the complexity of algorithm selection and hyperparameter optimization. Hyperparameters are important for machine learning models as they significantly influence their performance. Many optimization techniques have achieved notable success in hyperparameter tuning and surpassed the performance of human experts. However, depending on such black-box techniques can leave machine learning practitioners without insight into the relative importance of different hyperparameters. In this paper, we consider building the relationship between the performance of machine learning models and their hyperparameters to discover trends and gain insights, with empirical results based on six classifiers and 200 datasets. Our results enable users to decide whether it is worth conducting a possibly time-consuming tuning strategy, to focus on the most important hyperparameters, and to choose adequate hyperparameter spaces for tuning. The results of our experiments show that gradient boosting and Adaboost outperform other classifiers across the 200 problems. However, they need tuning to boost their performance. Overall, the results obtained from this study provide a quantitative basis to focus efforts toward guided automated hyperparameter optimization and contribute toward the development of better automated machine learning frameworks.
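One simple way to probe hyperparameter importance in the spirit of this study (a hypothetical sketch, not the paper's protocol): vary a single hyperparameter with everything else fixed and compare the spread of cross-validated scores; a larger spread suggests a more important hyperparameter:

```python
# Rough per-hyperparameter sensitivity probe via cross-validated score spread.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, random_state=0)
for name, values in [("learning_rate", [0.01, 0.1, 0.3]), ("max_depth", [2, 3, 6])]:
    scores = [cross_val_score(GradientBoostingClassifier(**{name: v}, random_state=0),
                              X, y, cv=3).mean() for v in values]
    print(f"{name}: spread = {max(scores) - min(scores):.3f}")
```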

【20】 Satisfiability and Containment of Recursive SHACL
Link: https://arxiv.org/abs/2108.13063
Authors: Paolo Pareti, George Konstantinidis, Fabio Mogavero
Affiliations: University of Winchester, Sparkford Road, Winchester, United Kingdom; University of Southampton, University Road, Southampton, United Kingdom; Università degli Studi di Napoli Federico II, Corso Umberto I, Napoli, Italy
Abstract: The Shapes Constraint Language (SHACL) is the recent W3C recommendation language for validating RDF data, by verifying certain shapes on graphs. Previous work has largely focused on the validation problem, and the standard decision problems of satisfiability and containment, crucial for design and optimisation purposes, have only been investigated for simplified versions of SHACL. Moreover, the SHACL specification does not define the semantics of recursively-defined constraints, which led to several alternative recursive semantics being proposed in the literature. The interaction between these different semantics and important decision problems has not been investigated yet. In this article we provide a comprehensive study of the different features of SHACL, by providing a translation to a new first-order language, called SCL, that precisely captures the semantics of SHACL. We also present MSCL, a second-order extension of SCL, which allows us to define, in a single formal logic framework, the main recursive semantics of SHACL. Within this language we also provide an effective treatment of filter constraints which are often neglected in the related literature. Using this logic we provide a detailed map of (un)decidability and complexity results for the satisfiability and containment decision problems for different SHACL fragments. Notably, we prove that both problems are undecidable for the full language, but we present decidable combinations of interesting features, even in the face of recursion.

【21】 Demystifying Drug Repurposing Domain Comprehension with Knowledge Graph Embedding
Link: https://arxiv.org/abs/2108.13051
Authors: Edoardo Ramalli, Alberto Parravicini, Guido Walter Di Donato, Mirko Salaris, Céline Hudelot, Marco Domenico Santambrogio
Affiliations: Politecnico di Milano, DEIB, Milan, Italy; Université Paris-Saclay, CentraleSupélec, MICS Lab, Gif-sur-Yvette, France
Note: 5 pages, IEEE BioCAS 2021
Abstract: Drug repurposing is more relevant than ever due to drug development's rising costs and the need to respond to emerging diseases quickly. Knowledge graph embedding enables drug repurposing using heterogeneous data sources combined with state-of-the-art machine learning models to predict new drug-disease links in the knowledge graph. As in many machine learning applications, significant work is still required to understand the predictive models' behavior. We propose a structured methodology to better understand machine learning models' results for drug repurposing, suggesting key elements of the knowledge graph to improve predictions while saving computational resources. We reduce the training set by 11.05% and the embedding space by 31.87%, with only a 2% accuracy reduction, and increase accuracy by 60% on the open ogbl-biokg graph adding only 1.53% new triples.
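For context on the link-prediction mechanics behind such predictions, a generic TransE-style scorer ranks a candidate (drug, treats, disease) triple by -||h + r - t||. The sketch below, with random embeddings and hypothetical entity names, illustrates KGE scoring in general rather than the specific model analyzed in the paper:

```python
# TransE-style scoring: a triple (h, r, t) is plausible when h + r is close to t.
import numpy as np

rng = np.random.default_rng(0)
dim = 64
entity_emb = {name: rng.normal(size=dim)
              for name in ["aspirin", "ibuprofen", "migraine"]}
relation_emb = {"treats": rng.normal(size=dim)}

def transe_score(head, relation, tail):
    return -np.linalg.norm(entity_emb[head] + relation_emb[relation] - entity_emb[tail])

# Rank candidate drugs for "migraine" by plausibility of the `treats` link.
candidates = ["aspirin", "ibuprofen"]
ranked = sorted(candidates, key=lambda d: transe_score(d, "treats", "migraine"),
                reverse=True)
print(ranked)
```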

【22】 Auto-Split: A General Framework of Collaborative Edge-Cloud AI
Link: https://arxiv.org/abs/2108.13041
Authors: Amin Banitalebi-Dehkordi, Naveen Vedula, Jian Pei, Fei Xia, Lanjun Wang, Yong Zhang
Affiliations: Huawei Technologies Canada Co. Ltd., Vancouver, Canada; School of Computing Science, Simon Fraser University; Shenzhen, China
Abstract: In many industry scale applications, large and resource consuming machine learning models reside in powerful cloud servers. At the same time, large amounts of input data are collected at the edge of the cloud. The inference results are also communicated to users or passed to downstream tasks at the edge. The edge often consists of a large number of low-power devices. It is a big challenge to design industry products to support sophisticated deep model deployment and conduct model inference in an efficient manner so that the model accuracy remains high and the end-to-end latency is kept low. This paper describes the techniques and engineering practice behind Auto-Split, an edge-cloud collaborative prototype of Huawei Cloud. This patented technology is already validated on selected applications, is on its way for broader systematic edge-cloud application integration, and is being made available for public use as an automated pipeline service for end-to-end cloud-edge collaborative intelligence deployment. To the best of our knowledge, there is no existing industry product that provides the capability of Deep Neural Network (DNN) splitting.

【23】 Integrated Decision and Control at Multi-Lane Intersections with Mixed Traffic Flow
Link: https://arxiv.org/abs/2108.13038
Authors: Jianhua Jiang, Yangang Ren, Yang Guan, Shengbo Eben Li, Yuming Yin, Dongjie Yu, Xiaoping Jin
Affiliations: Tsinghua University, Beijing, China; China Agricultural University
Note: 8 pages, 10 figures
Abstract: Autonomous driving at intersections is one of the most complicated and accident-prone traffic scenarios, especially with mixed traffic participants such as vehicles, bicycles and pedestrians. The driving policy should make safe decisions to handle the dynamic traffic conditions and meet the requirements of on-board computation. However, most current research focuses on simplified intersections considering only the surrounding vehicles and idealized traffic lights. This paper improves the integrated decision and control framework and develops a learning-based algorithm to deal with complex intersections with mixed traffic flows, which can not only take into account the realistic characteristics of traffic lights, but also learn a safe policy under different safety constraints. We first consider different velocity models for green and red lights in the training process and use a finite state machine to handle different modes of light transformation. Then we design different types of distance constraints for vehicles, traffic lights, pedestrians and bicycles respectively, and formulate the constrained optimal control problems (OCPs) to be optimized. Finally, reinforcement learning (RL) with value and policy networks is adopted to solve the series of OCPs. In order to verify the safety and efficiency of the proposed method, we design a multi-lane intersection with the existence of large-scale mixed traffic participants and set practical traffic light phases. The simulation results indicate that the trained decision and control policy can well balance safety and tracking performance. Compared with model predictive control (MPC), the computational time is three orders of magnitude lower.

【24】 Aleatoric Description Logic for Probabilistic Reasoning (Long Version)
Link: https://arxiv.org/abs/2108.13036
Authors: Tim French, Tom Smoker
Affiliation: The University of Western Australia
Note: Short version submitted to DL2021
Abstract: Description logics are a powerful tool for describing ontological knowledge bases. That is, they give a factual account of the world in terms of individuals, concepts and relations. In the presence of uncertainty, such factual accounts are not feasible, and a subjective or epistemic approach is required. Aleatoric description logic models uncertainty in the world as aleatoric events, by the roll of the dice, where an agent has subjective beliefs about the bias of these dice. This provides a subjective Bayesian description logic, where propositions and relations are assigned probabilities according to what a rational agent would bet, given a configuration of possible individuals and dice. Aleatoric description logic is shown to generalise the description logic ALC, and can be seen to describe a probability space of interpretations of a restriction of ALC where all roles are functions. Several computational problems are considered and model-checking and consistency checking algorithms are presented. Finally, aleatoric description logic is shown to be able to model learning, where agents are able to condition their beliefs on the bias of dice according to observations.

【25】 SurRoL: An Open-source Reinforcement Learning Centered and dVRK Compatible Platform for Surgical Robot Learning
Link: https://arxiv.org/abs/2108.13035
Authors: Jiaqi Xu, Bin Li, Bo Lu, Yun-Hui Liu, Qi Dou, Pheng-Ann Heng
Affiliations: T Stone Robotics Institute and Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong
Note: 8 pages, 8 figures, 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Abstract: Autonomous surgical execution relieves tedious routines and surgeon's fatigue. Recent learning-based methods, especially reinforcement learning (RL) based methods, achieve promising performance for dexterous manipulation, which usually requires simulation to collect data efficiently and reduce hardware cost. The existing learning-based simulation platforms for medical robots suffer from limited scenarios and simplified physical interactions, which degrade the real-world performance of learned policies. In this work, we designed SurRoL, an RL-centered simulation platform for surgical robot learning compatible with the da Vinci Research Kit (dVRK). The designed SurRoL integrates a user-friendly RL library for algorithm development and a real-time physics engine, which is able to support more PSM/ECM scenarios and more realistic physical interactions. Ten learning-based surgical tasks are built in the platform, which are common in real autonomous surgical execution. We evaluate SurRoL using RL algorithms in simulation, provide in-depth analysis, deploy the trained policies on the real dVRK, and show that our SurRoL achieves better transferability in the real world.

【26】 Transport-based Counterfactual Models
Link: https://arxiv.org/abs/2108.13025
Authors: Lucas de Lara, Alberto González-Sanz, Nicholas Asher, Jean-Michel Loubes
Affiliations: Institut de Mathématiques de Toulouse, Université Paul Sabatier; Institut de Recherche en Informatique de Toulouse, Université Paul Sabatier
Abstract: Counterfactual frameworks have grown popular in explainable and fair machine learning, as they offer a natural notion of causation. However, state-of-the-art models to compute counterfactuals are either unrealistic or unfeasible. In particular, while Pearl's causal inference provides appealing rules to calculate counterfactuals, it relies on a model that is unknown and hard to discover in practice. We address the problem of designing realistic and feasible counterfactuals in the absence of a causal model. We define transport-based counterfactual models as collections of joint probability distributions between observable distributions, and show their connection to causal counterfactuals. More specifically, we argue that optimal transport theory defines relevant transport-based counterfactual models, as they are numerically feasible, statistically-faithful, and can even coincide with causal counterfactual models. We illustrate the practicality of these models by defining sharper fairness criteria than typical group fairness conditions.
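A minimal sketch of a transport-based counterfactual, assuming the POT library (`pip install pot`) and synthetic Gaussian groups: couple the feature distributions of two groups with optimal transport and map each individual to its barycentric counterpart in the other group. This illustrates the flavor of the construction, not the paper's exact models:

```python
# Optimal-transport coupling between two groups, then barycentric projection.
import numpy as np
import ot  # Python Optimal Transport

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, size=(100, 2))   # group A (e.g., protected attribute = 0)
xt = rng.normal(1.5, 1.0, size=(100, 2))   # group B

a = np.full(100, 1 / 100)                  # uniform weights on both samples
M = ot.dist(xs, xt)                        # squared-Euclidean cost matrix
G = ot.emd(a, a, M)                        # optimal coupling (100 x 100)

# Barycentric projection: the counterfactual of xs[i] under the coupling.
xs_counterfactual = (G @ xt) / a[:, None]
print(xs_counterfactual[0], "is the counterfactual of", xs[0])
```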

【27】 A Temporal Knowledge Graph Completion Method Based on Balanced Timestamp Distribution
Link: https://arxiv.org/abs/2108.13024
Authors: Kangzheng Liu, Yuhong Zhang
Affiliation: National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
Note: 14 pages, 1 figure
Abstract: Completion through the embedding representation of the knowledge graph (KGE) has been a research hotspot in recent years. Realistic knowledge graphs are mostly related to time, while most existing KGE algorithms ignore time information. A few existing methods directly or indirectly encode time information, but they ignore the balance of the timestamp distribution, which greatly limits the performance of temporal knowledge graph completion (KGC). In this paper, a temporal KGC method is proposed based on the direct-encoding time information framework, and a given time slice is treated as the finest granularity for balanced timestamp distribution. A large number of experiments on temporal knowledge graph datasets extracted from the real world demonstrate the effectiveness of our method.
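One plausible reading of "balanced timestamp distribution" (an assumption for illustration, not the paper's stated procedure) is equal-frequency time slicing: bucket raw timestamps by quantiles so that each slice, treated as the finest granularity, carries a comparable number of facts:

```python
# Equal-frequency (quantile) time slicing of skewed raw timestamps.
import numpy as np

rng = np.random.default_rng(0)
timestamps = rng.exponential(scale=365, size=10_000)  # heavily skewed raw times

n_slices = 20
edges = np.quantile(timestamps, np.linspace(0, 1, n_slices + 1))
slice_ids = np.clip(np.searchsorted(edges, timestamps, side="right") - 1,
                    0, n_slices - 1)

counts = np.bincount(slice_ids, minlength=n_slices)
print(counts)  # roughly equal counts per slice, unlike fixed-width binning
```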

【28】 Communication-Computation Efficient Device-Edge Co-Inference via AutoML
Link: https://arxiv.org/abs/2108.13009
Authors: Xinjie Zhang, Jiawei Shao, Yuyi Mao, Jun Zhang
Affiliations: Dept. of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong; Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong
Abstract: Device-edge co-inference, which partitions a deep neural network between a resource-constrained mobile device and an edge server, recently emerges as a promising paradigm to support intelligent mobile applications. To accelerate the inference process, on-device model sparsification and intermediate feature compression are regarded as two prominent techniques. However, as the on-device model sparsity level and intermediate feature compression ratio have direct impacts on computation workload and communication overhead respectively, and both of them affect the inference accuracy, finding the optimal values of these hyper-parameters brings a major challenge due to the large search space. In this paper, we endeavor to develop an efficient algorithm to determine these hyper-parameters. By selecting a suitable model split point and a pair of encoder/decoder for the intermediate feature vector, this problem is cast as a sequential decision problem, for which a novel automated machine learning (AutoML) framework is proposed based on deep reinforcement learning (DRL). Experiment results on an image classification task demonstrate the effectiveness of the proposed framework in achieving a better communication-computation trade-off and significant inference speedup against various baseline schemes.
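The trade-off being searched can be seen in a toy PyTorch sketch: the split layer k fixes the on-device compute (layers up to k) and the size of the intermediate feature that must be transmitted. The DRL search over split points and encoder/decoder pairs is omitted here; this is only a hypothetical illustration of the search space:

```python
# Enumerate candidate split points and report the bytes that would cross the link.
import torch
import torch.nn as nn

layers = nn.ModuleList([nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
                        nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
                        nn.Conv2d(32, 64, 3, stride=2)])
x = torch.randn(1, 3, 224, 224)
for k in range(len(layers)):
    h = x
    for layer in layers[:k + 1]:   # on-device part of the network
        h = layer(h)
    # float32 feature size the device would have to send to the edge server.
    print(f"split after layer {k}: feature to transmit = {h.numel() * 4} bytes")
```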

【29】 X2Teeth: 3D Teeth Reconstruction from a Single Panoramic Radiograph
Link: https://arxiv.org/abs/2108.13004
Authors: Yuan Liang, Weinan Song, Jiawei Yang, Liang Qiu, Kun Wang, Lei He
Affiliation: University of California, Los Angeles, CA, USA
Abstract: 3D teeth reconstruction from X-ray is important for dental diagnosis and many clinical operations. However, no existing work has explored the reconstruction of teeth for a whole cavity from a single panoramic radiograph. Different from single object reconstruction from photos, this task has the unique challenge of constructing multiple objects at high resolutions. To conquer this task, we develop a novel ConvNet, X2Teeth, that decomposes the task into teeth localization and single-shape estimation. We also introduce a patch-based training strategy, such that X2Teeth can be end-to-end trained for optimal performance. Extensive experiments show that our method can successfully estimate the 3D structure of the cavity and reflect the details of each tooth. Moreover, X2Teeth achieves a reconstruction IoU of 0.681, which significantly outperforms the encoder-decoder method by 1.71x and the retrieval-based method by 1.52x. Our method can also be promising for other multi-anatomy 3D reconstruction tasks.

【30】 3DStyleNet: Creating 3D Shapes with Geometric and Texture Style Variations
Link: https://arxiv.org/abs/2108.12958
Authors: Kangxue Yin, Jun Gao, Maria Shugrina, Sameh Khamis, Sanja Fidler
Affiliations: NVIDIA; University of Toronto; Vector Institute
Note: Accepted to ICCV 2021. Supplementary material can be found on the project page.
Abstract: We propose a method to create plausible geometric and texture style variations of 3D objects in the quest to democratize 3D content creation. Given a pair of textured source and target objects, our method predicts a part-aware affine transformation field that naturally warps the source shape to imitate the overall geometric style of the target. In addition, the texture style of the target is transferred to the warped source object with the help of a multi-view differentiable renderer. Our model, 3DStyleNet, is composed of two sub-networks trained in two stages. First, the geometric style network is trained on a large set of untextured 3D shapes. Second, we jointly optimize our geometric style network and a pre-trained image style transfer network with losses defined over both the geometry and the rendering of the result. Given a small set of high-quality textured objects, our method can create many novel stylized shapes, resulting in effortless 3D content creation and style-aware data augmentation. We showcase our approach qualitatively on 3D content stylization, and provide user studies to validate the quality of our results. In addition, our method can serve as a valuable tool to create 3D data augmentations for computer vision tasks. Extensive quantitative analysis shows that 3DStyleNet outperforms alternative data augmentation techniques for the downstream task of single-image 3D reconstruction.

【31】 Searching for Two-Stream Models in Multivariate Space for Video Recognition 标题:用于视频识别的多变量空间双流模型搜索 链接:https://arxiv.org/abs/2108.12957

作者:Xinyu Gong,Heng Wang,Zheng Shou,Matt Feiszli,Zhangyang Wang,Zhicheng Yan 机构:†Facebook AI, ‡The University of Texas at Austin 备注:Accepted by ICCV 2021 摘要:传统的视频模型依赖单一流来捕获复杂的时空特征。最近关于双流视频模型(如SlowFast network和AssembleNet)的工作为两条流分别指定学习互补的特征,从而实现了更强的性能。然而,手动设计两条流以及中间的融合块是一项艰巨的任务,需要探索巨大的设计空间。这样的人工探索非常耗时,当计算资源有限且探索不足时,往往会以次优架构告终。在这项工作中,我们提出了一种实用的神经结构搜索方法,能够在巨大的空间中高效地搜索双流视频模型。我们设计了一个多元搜索空间,包括6个搜索变量,以覆盖设计双流模型时的各种选择。此外,我们提出了一种渐进式搜索过程,依次搜索各个流、融合块和注意力块的结构。我们证明在该设计空间中可以自动发现性能显著更好的双流模型。我们搜索得到的双流模型Auto-TSNet在标准基准上始终优于其他模型。在Kinetics数据集上,与SlowFast模型相比,我们的Auto-TSNet-L模型将FLOPS减少了近11倍,同时达到相同的78.9%精度。在Something-Something-V2上,与其他每段视频计算量低于50 GFLOPS的方法相比,Auto-TSNet-M的准确率至少提高2%。 摘要:Conventional video models rely on a single stream to capture the complex spatial-temporal features. Recent work on two-stream video models, such as SlowFast network and AssembleNet, prescribe separate streams to learn complementary features, and achieve stronger performance. However, manually designing both streams as well as the in-between fusion blocks is a daunting task, requiring to explore a tremendously large design space. Such manual exploration is time-consuming and often ends up with sub-optimal architectures when computational resources are limited and the exploration is insufficient. In this work, we present a pragmatic neural architecture search approach, which is able to search for two-stream video models in giant spaces efficiently. We design a multivariate search space, including 6 search variables to capture a wide variety of choices in designing two-stream models. Furthermore, we propose a progressive search procedure, by searching for the architecture of individual streams, fusion blocks, and attention blocks one after the other. We demonstrate two-stream models with significantly better performance can be automatically discovered in our design space. Our searched two-stream models, namely Auto-TSNet, consistently outperform other models on standard benchmarks. On Kinetics, compared with the SlowFast model, our Auto-TSNet-L model reduces FLOPS by nearly 11 times while achieving the same accuracy 78.9%. On Something-Something-V2, Auto-TSNet-M improves the accuracy by at least 2% over other methods which use less than 50 GFLOPS per video.

【32】 RetroGAN: A Cyclic Post-Specialization System for Improving Out-of-Knowledge and Rare Word Representations 标题:RetroGAN:一种改进知识库外与稀有词表示的循环后专门化系统 链接:https://arxiv.org/abs/2108.12941

作者:Pedro Colon-Hernandez,Yida Xin,Henry Lieberman,Catherine Havasi,Cynthia Breazeal,Peter Chin 机构:MIT Media Lab, Boston University, MIT CSAIL, Basis Technologies 摘要:改装(retrofitting)是一种将词向量在空间中移近或移远的技术,用以反映它们在知识库(KB)中的关系。但是,改装只适用于该知识库中已有的概念。RetroGAN使用一对生成对抗网络(GAN)来学习概念与其改装后对应概念之间的一对一映射。它将该映射应用于(即"后专门化",post-specialization)未出现在原始知识库中的概念,其方式类似于某些自然语言系统处理词表外条目的方式。我们在三个单词相似性基准和一个下游句子简化任务上测试了我们的系统,并达到了最先进的水平(CARD-660)。总之,我们的结果证明了我们的系统对于知识库外概念和稀有词泛化的有效性。 摘要:Retrofitting is a technique used to move word vectors closer together or further apart in their space to reflect their relationships in a Knowledge Base (KB). However, retrofitting only works on concepts that are present in that KB. RetroGAN uses a pair of Generative Adversarial Networks (GANs) to learn a one-to-one mapping between concepts and their retrofitted counterparts. It applies that mapping (post-specializes) to handle concepts that do not appear in the original KB in a manner similar to how some natural language systems handle out-of-vocabulary entries. We test our system on three word-similarity benchmarks and a downstream sentence simplification task and achieve the state of the art (CARD-660). Altogether, our results demonstrate our system's effectiveness for out-of-knowledge and rare word generalization.
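下面用PyTorch给出一个极简的循环一致性(CycleGAN式)草图,示意摘要中"用一对GAN学习原始词向量与改装后词向量之间一一映射"的思路;网络结构、损失权重等均为假设,并非RetroGAN原文实现:

```python
# 示意代码:用循环一致性学习“原始词向量 -> 改装后词向量”的映射。
# 网络规模与损失权重均为假设值,仅演示思路(此处省略判别器自身的训练步)。
import torch
import torch.nn as nn

dim = 300  # 词向量维度(假设)

def mlp(d):
    return nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))

G = mlp(dim)   # 原始空间 -> 改装空间
F = mlp(dim)   # 改装空间 -> 原始空间
D_retro = nn.Linear(dim, 1)  # 判别器:判断向量是否像真实的改装向量

opt = torch.optim.Adam(list(G.parameters()) + list(F.parameters()), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(x_plain):
    # 对抗项:G(x_plain)应骗过判别器
    adv = bce(D_retro(G(x_plain)), torch.ones(len(x_plain), 1))
    # 循环一致性:F(G(x)) ≈ x,促使映射近似一一对应
    cyc = (F(G(x_plain)) - x_plain).abs().mean()
    loss = adv + 10.0 * cyc
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```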

【33】 Distributed Swarm Collision Avoidance Based on Angular Calculations 标题:基于角度计算的分布式群体避碰 链接:https://arxiv.org/abs/2108.12934

作者:SeyedZahir Qazavi,Samaneh Hosseini Semnani 机构:Department of Electrical and Computer Engineering, Isfahan University of Technology 摘要:碰撞避免是机器人领域最重要的课题之一。目标是将机器人从初始位置移动到目标位置,以便它们在最短的时间内以最少的能量遵循最短的非碰撞路径。本文提出了一种适用于密集复杂二维和三维环境的分布式实时算法。该算法使用角度计算来选择每个机器人的最佳运动方向,并且已经证明,这些单独的计算会导致代理之间的合作行为。我们在各种仿真和实验场景下评估了所提出的方法,并将结果与该领域的两个重要算法FMP和ORCA进行了比较。结果表明,该方法比ORCA方法至少快25%,比FMP方法至少快7%,并且比两种方法都更可靠。实验表明,所提出的方法能够实现Crazyflie微型无人机群的完全自主导航。 摘要:Collision avoidance is one of the most important topics in the robotics field. The goal is to move the robots from initial locations to target locations such that they follow shortest non-colliding paths in the shortest time and with the least amount of energy. In this paper, a distributed and real-time algorithm for dense and complex 2D and 3D environments is proposed. This algorithm uses angular calculations to select the optimal direction for the movement of each robot and it has been shown that these separate calculations lead to a form of cooperative behavior among agents. We evaluated the proposed approach on various simulation and experimental scenarios and compared the results with FMP and ORCA, two important algorithms in this field. The results show that the proposed approach is at least 25% faster than ORCA and at least 7% faster than FMP and also more reliable than both methods. The proposed method is shown to enable fully autonomous navigation of a swarm of crazyflies.
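摘要未给出角度计算的具体形式;下面是一个假设性的二维示意:机器人在若干候选方向中,先排除与邻居方位角过近的方向,再选择最接近目标方向的一个。该草图仅用于说明"基于角度选择运动方向"的思想,并非论文原算法:

```python
# 示意代码:基于角度的方向选择(假设性实现,非论文原算法)。
import numpy as np

def pick_direction(pos, goal, neighbors, safe_angle=np.deg2rad(20), n_candidates=36):
    """在n_candidates个候选方向中,避开邻居方位角并选择最接近目标方向的一个。"""
    goal_ang = np.arctan2(*(goal - pos)[::-1])                 # 目标方向角
    blocked = [np.arctan2(*(p - pos)[::-1]) for p in neighbors]  # 各邻居的方位角
    best, best_cost = None, np.inf
    for ang in np.linspace(-np.pi, np.pi, n_candidates, endpoint=False):
        # 与任一邻居方位角的夹角小于安全角度,则该方向不可行
        if any(abs(np.angle(np.exp(1j * (ang - b)))) < safe_angle for b in blocked):
            continue
        cost = abs(np.angle(np.exp(1j * (ang - goal_ang))))    # 与目标方向的偏差
        if cost < best_cost:
            best, best_cost = ang, cost
    return best  # 可能为None(所有方向都被挡住)
```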

【34】 KO codes: Inventing Nonlinear Encoding and Decoding for Reliable Wireless Communication via Deep-learning 标题:KO码:用深度学习发明面向可靠无线通信的非线性编码与译码 链接:https://arxiv.org/abs/2108.12920

作者:Ashok Vardhan Makkuva,Xiyang Liu,Mohammad Vahid Jamali,Hessam Mahdavifar,Sewoong Oh,Pramod Viswanath 机构:(Equal contribution) Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign; Paul G. Allen School of Computer Science & Engineering, University of Washington; Department of Electrical Engineering and Computer Science 摘要:里程碑式的编码(landmark codes)是可靠物理层通信的基石,例如Reed-Muller码、BCH码、卷积码、Turbo码、LDPC码和极化码:每种码都是线性码,都代表着一次数学上的突破。它们对人类的影响是巨大的:这些编码中的每一种都被用于全球无线通信标准(卫星、WiFi、蜂窝)。在经典加性高斯白噪声(AWGN)信道上通信的可靠性使得不同编码的基准测试和排序成为可能。在本文中,我们构造了KO码,这是一族计算高效、由深度学习驱动的(编码器、解码器)对,在标准AWGN信道上的可靠性优于最先进水平。在AWGN信道上,KO码在具有挑战性的中短分组长度范围内,在低复杂度连续消除(successive cancellation)译码下击败了最先进的Reed-Muller码和极化码。我们证明,KO码的增益主要来自将信息比特直接非线性映射为发送的实数符号(绕过调制),同时仍拥有高效、高性能的译码器。使这成为可能的关键技术创新,是设计了一族新的神经结构,其灵感来源于Reed-Muller码和极化码核心的Kronecker运算(KO)的计算树。这些结构为发现更丰富的、迄今尚未探索的非线性代数结构铺平了道路。代码可从 https://github.com/deepcomm/KOcodes 获取。 摘要:Landmark codes underpin reliable physical layer communication, e.g., Reed-Muller, BCH, Convolution, Turbo, LDPC and Polar codes: each is a linear code and represents a mathematical breakthrough. The impact on humanity is huge: each of these codes has been used in global wireless communication standards (satellite, WiFi, cellular). Reliability of communication over the classical additive white Gaussian noise (AWGN) channel enables benchmarking and ranking of the different codes. In this paper, we construct KO codes, a computationally efficient family of deep-learning driven (encoder, decoder) pairs that outperform the state-of-the-art reliability performance on the standardized AWGN channel. KO codes beat state-of-the-art Reed-Muller and Polar codes, under the low-complexity successive cancellation decoding, in the challenging short-to-medium block length regime on the AWGN channel. We show that the gains of KO codes are primarily due to the nonlinear mapping of information bits directly to transmit real symbols (bypassing modulation) and yet possess an efficient, high performance decoder. The key technical innovation that renders this possible is design of a novel family of neural architectures inspired by the computation tree of the Kronecker Operation (KO) central to Reed-Muller and Polar codes. These architectures pave the way for the discovery of a much richer class of hitherto unexplored nonlinear algebraic structures. The code is available at https://github.com/deepcomm/KOcodes
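作为背景演示:Reed-Muller码与极化码的生成矩阵都来自2x2核矩阵的Kronecker幂,KO码的神经结构正是受这一计算树启发。下面的numpy片段构造Kronecker幂并编码一个示例消息(仅为背景示意,并非KO码实现):

```python
# 示意代码:极化码/Reed-Muller码背后的Kronecker幂结构(背景演示,非KO码实现)。
import numpy as np

G2 = np.array([[1, 0],
               [1, 1]])            # 2x2核矩阵

def kronecker_power(G, n):
    out = np.array([[1]])
    for _ in range(n):
        out = np.kron(out, G)      # G的n次Kronecker幂:块长为2^n的变换矩阵
    return out

G8 = kronecker_power(G2, 3)        # 块长为8的极化变换矩阵
codeword = (np.array([1, 0, 1, 1, 0, 0, 0, 1]) @ G8) % 2   # GF(2)上的线性编码
```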

【35】 Lipschitz Continuity Guided Knowledge Distillation 标题:Lipschitz连续性指导的知识蒸馏 链接:https://arxiv.org/abs/2108.12905

作者:Yuzhang Shang,Bin Duan,Ziliang Zong,Liqiang Nie,Yan Yan 机构:Department of Computer Science, Illinois Institute of Technology, USA, Department of Computer Science, Texas State University, USA, School of Computer Science and Technology, Shandong University, China 备注:This work has been accepted by ICCV 2021 摘要:通过将知识从较大的教师网络蒸馏到较小的学生网络,知识蒸馏已成为最重要的模型压缩技术之一。尽管先前的蒸馏方法通过精心设计各种类型的知识取得了巨大的成功,但它们忽略了神经网络的功能特性,这使得将这些技术应用于新任务的过程不可靠且非常繁琐。为了缓解这一问题,本文首先利用Lipschitz连续性来更好地表示神经网络的功能特性,并指导知识蒸馏过程。特别地,我们提出了一种新的Lipschitz连续性引导的知识蒸馏框架,通过最小化两个神经网络Lipschitz常数之间的距离来忠实地蒸馏知识,这使得教师网络能够更好地正则化学生网络并提高相应的性能。针对计算Lipschitz常数这一NP难问题,我们推导了一个可解释的近似算法,并给出了明确的理论推导。实验结果表明,在CIFAR-100、ImageNet和PASCAL VOC数据集上的多个知识蒸馏任务(例如分类、分割和目标检测)中,我们的方法优于其他基准方法。 摘要:Knowledge distillation has become one of the most important model compression techniques by distilling knowledge from larger teacher networks to smaller student ones. Although great success has been achieved by prior distillation methods via delicately designing various types of knowledge, they overlook the functional properties of neural networks, which makes the process of applying those techniques to new tasks unreliable and non-trivial. To alleviate such problem, in this paper, we initially leverage Lipschitz continuity to better represent the functional characteristic of neural networks and guide the knowledge distillation process. In particular, we propose a novel Lipschitz Continuity Guided Knowledge Distillation framework to faithfully distill knowledge by minimizing the distance between two neural networks' Lipschitz constants, which enables teacher networks to better regularize student networks and improve the corresponding performance. We derive an explainable approximation algorithm with an explicit theoretical derivation to address the NP-hard problem of calculating the Lipschitz constant. Experimental results have shown that our method outperforms other benchmarks over several knowledge distillation tasks (e.g., classification, segmentation and object detection) on CIFAR-100, ImageNet, and PASCAL VOC datasets.
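Lipschitz常数的精确计算是NP难的;一个常见的可微近似是将各线性层权重矩阵的最大奇异值连乘。下面的PyTorch草图按此思路把师生网络Lipschitz估计之差加入蒸馏损失;这种近似方式与损失权重均为假设,并非论文的原始算法:

```python
# 示意代码:用谱范数连乘近似网络的Lipschitz常数,并惩罚师生之间的差距。
# 对卷积层将权重展平后取谱范数只是常用的粗略近似;alpha为假设的损失权重。
import torch
import torch.nn as nn

def lipschitz_estimate(model):
    """对所有Linear/Conv2d层权重取最大奇异值并连乘(可微)。"""
    est = torch.tensor(1.0)
    for m in model.modules():
        if isinstance(m, (nn.Linear, nn.Conv2d)):
            w = m.weight.flatten(1)                       # (out, in*k*k)
            est = est * torch.linalg.matrix_norm(w, ord=2)  # 最大奇异值
    return est

def distill_loss(student, teacher, x, labels, alpha=0.1):
    ce = nn.functional.cross_entropy(student(x), labels)
    with torch.no_grad():
        lip_t = lipschitz_estimate(teacher)               # 教师侧不回传梯度
    lip_s = lipschitz_estimate(student)
    return ce + alpha * (lip_s - lip_t).abs()
```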

【36】 Generating Answer Candidates for Quizzes and Answer-Aware Question Generators 标题:为测验和答案感知的问题生成器生成候选答案 链接:https://arxiv.org/abs/2108.12898

作者:Kristiyan Vachev,Momchil Hardalov,Georgi Karadzhov,Georgi Georgiev,Ivan Koychev,Preslav Nakov 机构:FMI, Sofia University, “St. Kliment Ohridski”, Sofia, Bulgaria, Department of Computer, Science and Technology, University of Cambridge, UK, Releva AI, FMI and GATE, Qatar Computing Research Institute, HBKU, Doha, Qatar 备注:None 摘要:在教育领域,开放式测验已经成为评估学生知识的重要工具。然而,手动准备此类问题是一项繁琐的任务,因此,自动生成问题已被提议作为一种可能的替代方案。到目前为止,绝大多数研究都集中在生成问题文本上,依赖于具有现成答案的问答数据集,而如何首先提出答案候选者的问题基本上被忽略了。在这里,我们的目标是弥合这一差距。特别是,我们提出了一个模型,可以为给定的文本段生成指定数量的候选答案,然后讲师可以使用该模型手动编写问题,或者将其作为输入传递给自动答案感知问题生成器。我们的实验表明,我们提出的答案候选生成模型优于几个基线。 摘要:In education, open-ended quiz questions have become an important tool for assessing the knowledge of students. Yet, manually preparing such questions is a tedious task, and thus automatic question generation has been proposed as a possible alternative. So far, the vast majority of research has focused on generating the question text, relying on question answering datasets with readily picked answers, and the problem of how to come up with answer candidates in the first place has been largely ignored. Here, we aim to bridge this gap. In particular, we propose a model that can generate a specified number of answer candidates for a given passage of text, which can then be used by instructors to write questions manually or can be passed as an input to automatic answer-aware question generators. Our experiments show that our proposed answer candidate generation model outperforms several baselines.

【37】 Neural Network Gaussian Processes by Increasing Depth 标题:递增深度的神经网络高斯过程 链接:https://arxiv.org/abs/2108.12862

作者:Shao-Qun Zhang,Feng-Lei Fan 机构:National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China, AI-based X-ray Imaging System (AXIS) Lab, Rensselaer Polytechnic Institute, Troy, NY, USA 摘要:近年来,人们对无限宽网络和高斯过程之间的对应关系越来越感兴趣。尽管当前的神经网络高斯过程理论有效且优雅,但据我们所知,所有的神经网络高斯过程本质上都是由宽度增加引起的。然而,在深度学习时代,我们更关心的是神经网络的深度以及深度如何影响网络的行为。受宽度-深度对称性考虑的启发,我们使用一个捷径(shortcut)网络来表明,增加神经网络的深度也可以产生高斯过程,这是对现有理论的一个有价值的补充,有助于揭示深度学习的真实图景。除了提出由深度导出的高斯过程之外,我们还从理论上刻画了它的一致紧性及其相关核的最小特征值。这些表征不仅可以增强我们对所提出的深度诱导高斯过程的理解,而且为将来的应用铺平了道路。最后,我们通过在两个真实数据集上的回归实验来检验所提出的高斯过程的性能。 摘要:Recent years have witnessed an increasing interest in the correspondence between infinitely wide networks and Gaussian processes. Despite the effectiveness and elegance of the current neural network Gaussian process theory, to the best of our knowledge, all the neural network Gaussian processes are essentially induced by increasing width. However, in the era of deep learning, what concerns us more regarding a neural network is its depth as well as how depth impacts the behaviors of a network. Inspired by a width-depth symmetry consideration, we use a shortcut network to show that increasing the depth of a neural network can also give rise to a Gaussian process, which is a valuable addition to the existing theory and contributes to revealing the true picture of deep learning. Beyond the proposed Gaussian process by depth, we theoretically characterize its uniform tightness property and the smallest eigenvalue of its associated kernel. These characterizations can not only enhance our understanding of the proposed depth-induced Gaussian processes, but also pave the way for future applications. Lastly, we examine the performance of the proposed Gaussian process by regression experiments on two real-world data sets.
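作为背景,宽度趋于无穷时标准NNGP核的逐层递推可写成下式;这是现有"由宽度导出"理论的通用形式,并非本文由深度导出的GP核:

```latex
K^{(0)}(x, x') = \sigma_b^2 + \sigma_w^2 \, \frac{x^\top x'}{d}, \qquad
K^{(\ell)}(x, x') = \sigma_b^2 + \sigma_w^2 \,
\mathbb{E}_{f \sim \mathcal{GP}\left(0,\, K^{(\ell-1)}\right)}
\left[ \phi\big(f(x)\big)\, \phi\big(f(x')\big) \right].
```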

【38】 Flow-Guided Video Inpainting with Scene Templates 标题:基于场景模板的流引导视频修复 链接:https://arxiv.org/abs/2108.12845

作者:Dong Lao,Peihao Zhu,Peter Wonka,Ganesh Sundaramoorthi 机构:KAUST, Saudi Arabia 摘要:我们考虑填补视频中缺失时空区域的问题。我们引入关于场景(无缺失区域)的图像生成模型以及从场景到图像的映射,给出了一种新的基于流的解决方案。我们使用该模型联合推断场景模板(场景的一种二维表示)和映射。这确保了所生成的帧间光流与底层场景的一致性,减少了基于流的修复中的几何畸变。模板通过新的L2-L1插值方案映射到视频中缺失的区域,产生清晰的修复结果,并减少常见的模糊和失真伪影。我们在两个基准数据集上表明,我们的方法在定量指标和用户研究上都超过了最先进水平。 摘要:We consider the problem of filling in missing spatio-temporal regions of a video. We provide a novel flow-based solution by introducing a generative model of images in relation to the scene (without missing regions) and mappings from the scene to images. We use the model to jointly infer the scene template, a 2D representation of the scene, and the mappings. This ensures consistency of the frame-to-frame flows generated to the underlying scene, reducing geometric distortions in flow based inpainting. The template is mapped to the missing regions in the video by a new L2-L1 interpolation scheme, creating crisp inpaintings and reducing common blur and distortion artifacts. We show on two benchmark datasets that our approach out-performs state-of-the-art quantitatively and in user studies.

【39】 A Hybrid Rule-Based and Data-Driven Approach to Driver Modeling through Particle Filtering 标题:基于规则和数据驱动的混合粒子滤波驾驶员建模方法 链接:https://arxiv.org/abs/2108.12820

作者:Raunak Bhattacharyya,Soyeon Jung,Liam Kruse,Ransalu Senanayake,Mykel Kochenderfer 机构:Stanford Intelligent Systems Laboratory, Department of Aeronautics and Astronautics, Stanford University 备注:arXiv admin note: text overlap with arXiv:2005.02597 摘要:自动驾驶车辆需要模拟周围人类驾驶车辆的行为,才能成为安全高效的交通参与者。现有的人类驾驶行为建模方法依赖于数据驱动和基于规则的方法。虽然数据驱动模型更具表现力,但基于规则的模型是可解释的,这是驾驶等安全关键领域的一项重要要求。然而,基于规则的模型不能充分代表数据,而数据驱动的模型由于会产生不现实的驾驶行为(如碰撞),仍然无法生成真实的交通仿真。在本文中,我们提出了一种将基于规则的建模与数据驱动学习相结合的方法。虽然规则由驾驶员模型的可解释参数控制,但这些参数是使用粒子滤波从驾驶演示数据在线学习的。我们使用来自三个真实驾驶演示数据集的数据,对高速公路驾驶和合流任务进行了驾驶员建模实验。我们的结果表明,基于混合规则和数据驱动方法的驾驶员模型能够准确地捕捉真实世界的驾驶行为。此外,我们通过让人类进行驾驶图灵测试来评估我们的模型生成的驾驶行为的真实性,在测试中,他们被要求区分真实驾驶视频和使用我们的驾驶模型生成的视频。 摘要:Autonomous vehicles need to model the behavior of surrounding human driven vehicles to be safe and efficient traffic participants. Existing approaches to modeling human driving behavior have relied on both data-driven and rule-based methods. While data-driven models are more expressive, rule-based models are interpretable, which is an important requirement for safety-critical domains like driving. However, rule-based models are not sufficiently representative of data, and data-driven models are yet unable to generate realistic traffic simulation due to unrealistic driving behavior such as collisions. In this paper, we propose a methodology that combines rule-based modeling with data-driven learning. While the rules are governed by interpretable parameters of the driver model, these parameters are learned online from driving demonstration data using particle filtering. We perform driver modeling experiments on the task of highway driving and merging using data from three real-world driving demonstration datasets. Our results show that driver models based on our hybrid rule-based and data-driven approach can accurately capture real-world driving behavior. Further, we assess the realism of the driving behavior generated by our model by having humans perform a driving Turing test, where they are asked to distinguish between videos of real driving and those generated using our driver models.
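摘要提到用粒子滤波从驾驶演示数据在线学习驾驶员模型参数。下面是一个通用的自举(bootstrap)粒子滤波草图:把参数假设当作粒子,按模型预测与观测的匹配程度加权,并在有效粒子数过低时重采样;观测模型与超参数均为假设,并非论文实现:

```python
# 示意代码:用自举粒子滤波在线估计驾驶员模型参数theta(观测模型为假设的高斯似然)。
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, obs, predict, noise=0.01, sigma=0.5):
    """particles: (N, d)参数粒子; predict(theta)->该参数下的预测观测; obs: 实际观测。"""
    particles = particles + rng.normal(0, noise, particles.shape)  # 参数随机游走
    pred = np.array([predict(p) for p in particles])
    lik = np.exp(-0.5 * ((pred - obs) / sigma) ** 2)               # 高斯似然(假设)
    weights = weights * lik
    weights = weights / weights.sum()
    if 1.0 / np.sum(weights ** 2) < len(particles) / 2:            # 有效粒子数过低则重采样
        idx = rng.choice(len(particles), len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights
```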

【40】 DropAttack: A Masked Weight Adversarial Training Method to Improve Generalization of Neural Networks 标题:DropAttack:一种提高神经网络泛化能力的掩蔽权重对抗性训练方法 链接:https://arxiv.org/abs/2108.12805

作者:Shiwen Ni,Jiawen Li,Hung-Yu Kao 机构:Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan 摘要:对抗训练已被证明是一种有效的正则化方法,可以提高模型的泛化能力。然而,现有的对抗训练方法仅对原始输入样本或嵌入向量进行攻击,其攻击缺乏覆盖性和多样性。为了进一步提高攻击的广度和深度,我们提出了一种新的掩蔽权重对抗训练方法DropAttack,它通过在不同维度的输入层和隐藏层中有意添加最坏情况下的对抗扰动来增强模型的泛化,并最小化每一层产生的对抗风险。DropAttack是一种通用技术,可以应用于各种不同结构的神经网络。为了验证该方法的有效性,我们使用了自然语言处理(NLP)和计算机视觉(CV)领域的五个公共数据集进行实验评估。我们将该方法与其他对抗训练方法和正则化方法进行了比较,我们的方法在所有数据集上都达到了最新水平。此外,与其他标准训练方法相比,DropAttack仅使用一半的训练数据,就可以获得相同的性能。理论分析表明,DropAttack可以对模型的部分输入和权重参数随机地进行梯度正则化。进一步的可视化实验表明,DropAttack可以将模型的最小风险推到更低、更平坦的损失区域。我们的源代码在https://github.com/nishiwen1214/DropAttack. 摘要:Adversarial training has been proven to be a powerful regularization method to improve the generalization of models. However, current adversarial training methods only attack the original input sample or the embedding vectors, and their attacks lack coverage and diversity. To further enhance the breadth and depth of attack, we propose a novel masked weight adversarial training method called DropAttack, which enhances generalization of model by adding intentionally worst-case adversarial perturbations to both the input and hidden layers in different dimensions and minimize the adversarial risks generated by each layer. DropAttack is a general technique and can be adapted to a wide variety of neural networks with different architectures. To validate the effectiveness of the proposed method, we used five public datasets in the fields of natural language processing (NLP) and computer vision (CV) for experimental evaluating. We compare the proposed method with other adversarial training methods and regularization methods, and our method achieves state-of-the-art on all datasets. In addition, DropAttack can achieve the same performance when using only half of the training data compared to standard training methods. Theoretical analysis reveals that DropAttack can perform gradient regularization at random on some of the input and weight parameters of the model. Further visualization experiments show that DropAttack can push the minimum risk of the model to lower and flatter loss landscapes. Our source code is publicly available on https://github.com/nishiwen1214/DropAttack.
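下面用PyTorch勾勒DropAttack的核心思想:对输入施加带随机掩码的一步FGSM式最坏情况扰动,并把对抗损失加到原损失上。为简洁起见只演示输入层的扰动(论文同时扰动隐藏层与权重),掩码概率与步长均为假设值:

```python
# 示意代码:带随机掩码的对抗扰动(DropAttack思想的一步FGSM式近似,超参为假设值)。
import torch
import torch.nn as nn

def dropattack_loss(model, x, y, eps=0.01, p=0.5):
    x = x.clone().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]            # 损失对输入的梯度
    mask = (torch.rand_like(x) < p).float()           # 随机掩码:只扰动部分维度
    x_adv = x + eps * mask * grad.sign()              # 最坏情况方向上的掩码扰动
    adv_loss = nn.functional.cross_entropy(model(x_adv.detach()), y)
    return loss + adv_loss                            # 原始损失 + 对抗损失
```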

【41】 Interpretable Propaganda Detection in News Articles 标题:新闻文章中的可解释性宣传检测 链接:https://arxiv.org/abs/2108.12802

作者:Seunghak Yu,Giovanni Da San Martino,Mitra Mohtarami,James Glass,Preslav Nakov 机构: Amazon Alexa AI, Seattle, WA, USA, Department of Mathematics, University of Padova, Italy, MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA, Qatar Computing Research Institute, HBKU, Qatar 备注:None 摘要:如今,在线用户每天都会接触到误导性和宣传性的新闻文章和媒体帖子。为了应对这种情况,已经设计了许多方法,旨在实现更健康、更安全的在线新闻和媒体消费。自动系统能够支持人类检测此类内容;然而,广泛采用这些系统的一个主要障碍是,除了准确之外,这些系统的决定还需要具有可解释性,以便得到用户的信任和广泛采用。由于误导性和宣传性内容通过使用大量欺骗技术影响读者,我们建议检测并展示此类技术的使用,以提供可解释性。特别是,我们定义了定性描述特征,并分析了它们对检测欺骗技术的适用性。我们进一步表明,我们的可解释特征可以很容易地与预先训练的语言模型相结合,产生最先进的结果。 摘要:Online users today are exposed to misleading and propagandistic news articles and media posts on a daily basis. To counter this, a number of approaches have been designed aiming to achieve a healthier and safer online news and media consumption. Automatic systems are able to support humans in detecting such content; yet, a major impediment to their broad adoption is that besides being accurate, the decisions of such systems need also to be interpretable in order to be trusted and widely adopted by users. Since misleading and propagandistic content influences readers through the use of a number of deception techniques, we propose to detect and to show the use of such techniques as a way to offer interpretability. In particular, we define qualitatively descriptive features and we analyze their suitability for detecting deception techniques. We further show that our interpretable features can be easily combined with pre-trained language models, yielding state-of-the-art results.

【42】 Markov Switching Model for Driver Behavior Prediction: Use cases on Smartphones 标题:用于驾驶员行为预测的马尔可夫切换模型:智能手机上的使用案例 链接:https://arxiv.org/abs/2108.12801

作者:Ahmed B. Zaky,Mohamed A. Khamis,Walid Gomaa 机构:Benha University, Cairo , Egypt, Big Data Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen , Guangdong, China, Cyber-Physical Systems Lab, Egypt-Japan University of Science and Technology (E-JUST) 摘要:一些智能交通系统专注于研究各种驾驶员行为,以实现多种目标。这包括分析驾驶员动作、敏感性、分心和响应时间的能力。由于数据收集是学习和验证不同驾驶状况的主要关注点之一,因此我们提出了一个通过使用智能手机的低成本数据收集解决方案验证的驾驶员行为转换模型。使用真实数据集对所提出的模型进行了验证,以预测驾驶员在短时间内的行为。对运动检测(特别是使用智能手机的驾驶行为检测)进行了文献综述。采用多重马尔可夫切换变量自回归(MSVAR)模型对收集的驾驶员行为数据进行精密拟合。这不仅可以对驾驶员行为进行更准确的预测,而且可以对整个驾驶情况进行更准确的预测。还介绍了所提出模型的性能以及合适的模型选择标准。提出的驾驶员行为预测框架可用于事故预测和驾驶员安全系统。 摘要:Several intelligent transportation systems focus on studying the various driver behaviors for numerous objectives. This includes the ability to analyze driver actions, sensitivity, distraction, and response time. As the data collection is one of the major concerns for learning and validating different driving situations, we present a driver behavior switching model validated by a low-cost data collection solution using smartphones. The proposed model is validated using a real dataset to predict the driver behavior in short duration periods. A literature survey on motion detection (specifically driving behavior detection using smartphones) is presented. Multiple Markov Switching Variable Auto-Regression (MSVAR) models are implemented to achieve a sophisticated fitting with the collected driver behavior data. This yields more accurate predictions not only for driver behavior but also for the entire driving situation. The performance of the presented models together with a suitable model selection criteria is also presented. The proposed driver behavior prediction framework can potentially be used in accident prediction and driver safety systems.
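马尔可夫切换自回归模型的预测可以这样理解:对各隐状态(如"平稳/激进"两种驾驶风格)分别做AR预测,再按前向滤波得到的状态概率加权。下面是一个两状态一阶切换AR的numpy示意,所有参数均为假设值:

```python
# 示意代码:两状态一阶马尔可夫切换AR模型的前向滤波与一步预测(参数为假设值)。
import numpy as np

A = np.array([[0.95, 0.05],    # 隐状态转移矩阵(如“平稳驾驶”/“激进驾驶”)
              [0.10, 0.90]])
phi   = np.array([0.8, 0.3])   # 各状态下的AR(1)系数
mu    = np.array([0.0, 2.0])   # 各状态下的均值
sigma = np.array([0.2, 0.8])   # 各状态下的噪声标准差

def filter_and_predict(y, p0=np.array([0.5, 0.5])):
    """对观测序列y做前向滤波,返回按状态概率加权的一步预测值。"""
    p = p0
    for t in range(1, len(y)):
        prior = A.T @ p                                  # 先验状态概率
        resid = y[t] - (mu + phi * (y[t - 1] - mu))      # 各状态下的残差
        lik = np.exp(-0.5 * (resid / sigma) ** 2) / sigma  # 似然(略去公共常数)
        p = prior * lik
        p = p / p.sum()                                  # 滤波后验
    p_next = A.T @ p
    return p_next @ (mu + phi * (y[-1] - mu))            # 加权的一步预测
```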

【43】 TCCT: Tightly-Coupled Convolutional Transformer on Time Series Forecasting 标题:TCCT:面向时间序列预测的紧耦合卷积Transformer 链接:https://arxiv.org/abs/2108.12784

作者:Li Shen,Yangzhu Wang 机构:Beihang University 摘要:时间序列预测对于广泛的实际应用至关重要。最近的研究表明,Transformer在处理此类问题,特别是长序列时间序列输入(LSTI)和长序列时间序列预测(LSTF)问题方面具有优越性。为了提高效率并增强Transformer的局部性,这些研究在不同程度上将Transformer与CNN相结合。然而,它们的组合是松散耦合的,未能充分利用CNN。为了解决这个问题,我们提出了紧耦合卷积Transformer(TCCT)的概念和三种TCCT架构,将改造后的CNN架构应用到Transformer中:(1) CSPAttention:通过将CSPNet与自注意力机制融合,自注意力机制的计算成本降低了30%,内存使用降低了50%,同时达到相当或更高的预测精度;(2) 扩张因果卷积:该方法修改Informer提出的蒸馏(distilling)操作,用扩张因果卷积层替换标准卷积层,以获得指数级增长的感受野;(3) 直通(passthrough)机制:将其应用于自注意力块堆栈,有助于类Transformer模型以可忽略的额外计算成本获得更细粒度的信息。我们在真实数据集上的实验表明,我们的TCCT架构可以大大提高现有最先进的Transformer模型(包括标准Transformer、LogTrans和Informer)在时间序列预测上的性能,同时计算和内存成本更低。 摘要:Time series forecasting is essential for a wide range of real-world applications. Recent studies have shown the superiority of Transformer in dealing with such problems, especially long sequence time series input(LSTI) and long sequence time series forecasting(LSTF) problems. To improve the efficiency and enhance the locality of Transformer, these studies combine Transformer with CNN in varying degrees. However, their combinations are loosely-coupled and do not make full use of CNN. To address this issue, we propose the concept of tightly-coupled convolutional Transformer(TCCT) and three TCCT architectures which apply transformed CNN architectures into Transformer: (1) CSPAttention: through fusing CSPNet with self-attention mechanism, the computation cost of self-attention mechanism is reduced by 30% and the memory usage is reduced by 50% while achieving equivalent or beyond prediction accuracy. (2) Dilated causal convolution: this method is to modify the distilling operation proposed by Informer through replacing canonical convolutional layers with dilated causal convolutional layers to gain exponentially receptive field growth. (3) Passthrough mechanism: the application of passthrough mechanism to stack of self-attention blocks helps Transformer-like models get more fine-grained information with negligible extra computation costs. Our experiments on real-world datasets show that our TCCT architectures could greatly improve the performance of existing state-of-art Transformer models on time series forecasting with much lower computation and memory costs, including canonical Transformer, LogTrans and Informer.
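其中的"扩张因果卷积"通常可用左侧补零的一维卷积实现,保证输出只依赖过去的时间步,且感受野随扩张率指数增长;下面给出一个常见的PyTorch写法(示意实现,非论文源码):

```python
# 示意代码:扩张因果卷积层——仅在序列左侧补零,使卷积只“看见”过去的时间步。
import torch
import torch.nn as nn

class DilatedCausalConv1d(nn.Module):
    def __init__(self, channels, kernel_size=3, dilation=2):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation          # 左侧补零长度
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                                # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))          # 只在左侧补零 -> 因果性
        return self.conv(x)

y = DilatedCausalConv1d(16)(torch.randn(2, 16, 100))     # 输出形状 (2, 16, 100)
```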

【44】 Risk-Aware Fine-Grained Access Control in Cyber-Physical Contexts 标题:网络物理环境下的风险感知细粒度访问控制 链接:https://arxiv.org/abs/2108.12739

作者:Jinxin Liu,Murat Simsek,Burak Kantarci,Melike Erol-Kantarci,Andrew Malton,Andrew Walenstein 备注:ACM Digital Threats: Research and Practice, 2021. 30 pages, 14 Figures, 14 Tables 摘要:用户对资源的访问可能只需要在特定条件和环境下授予,特别是在网络物理环境下。不幸的是,在动态环境中创建和修改上下文敏感的访问控制解决方案会给管理授权上下文带来持续的挑战。本文提出了RASA,一种上下文敏感的访问授权方法和机制,利用无监督机器学习自动推断基于风险的授权决策边界。我们在医疗保健使用环境中探索RASA,其中网络和物理条件为保护私人健康信息创造了特定于上下文的风险。风险级别与安全策略建议的访问控制决策相关。引入耦合方法,使用共存频率和持续时间跟踪上下文中对象的共存,并对这些对象进行聚类,以揭示具有共同风险水平的行为集合;这些用于创建授权决策边界。此外,我们还提出了一种评估风险水平的方法,并根据相应的风险水平对集群进行标记。我们将RASA生成的策略与基于规则的启发式策略进行对比,评估其应用前景。通过采用三种不同的耦合特征(基于频率、基于持续时间和组合特征),无监督方法的决策与策略的决策一致性超过99%。 摘要:Access to resources by users may need to be granted only upon certain conditions and contexts, perhaps particularly in cyber-physical settings. Unfortunately, creating and modifying context-sensitive access control solutions in dynamic environments creates ongoing challenges to manage the authorization contexts. This paper proposes RASA, a context-sensitive access authorization approach and mechanism leveraging unsupervised machine learning to automatically infer risk-based authorization decision boundaries. We explore RASA in a healthcare usage environment, wherein cyber and physical conditions create context-specific risks for protecting private health information. The risk levels are associated with access control decisions recommended by a security policy. A coupling method is introduced to track coexistence of the objects within context using frequency and duration of coexistence, and these are clustered to reveal sets of actions with common risk levels; these are used to create authorization decision boundaries. In addition, we propose a method for assessing the risk level and labelling the clusters with respect to their corresponding risk levels. We evaluate the promise of RASA-generated policies against a heuristic rule-based policy. By employing three different coupling features (frequency-based, duration-based, and combined features), the decisions of the unsupervised method and that of the policy are more than 99% consistent.
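"耦合特征+聚类"的流程大致可以如下示意:以共存频率和共存持续时间为特征做KMeans聚类,再按簇中心粗略排序风险等级。特征构造与风险排序方式均为假设,并非RASA原文实现:

```python
# 示意代码:对(共存频率, 共存持续时间)特征做KMeans聚类,并按簇中心粗略标注风险等级。
# 特征与“共存越频繁/越久风险越低”的假设仅为演示,非RASA原文实现。
import numpy as np
from sklearn.cluster import KMeans

# 每行一个“上下文中的对象组合”:[共存频率, 平均共存持续时间(分钟)]
X = np.array([[50, 30.0], [48, 25.0], [5, 2.0], [3, 1.5], [20, 10.0], [18, 12.0]])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
order = np.argsort(-km.cluster_centers_.sum(axis=1))   # 按“频率+时长”降序排列簇
risk = {c: level for level, c in enumerate(order)}     # 0=低风险, 2=高风险
print([risk[c] for c in km.labels_])                   # 每个样本的风险等级
```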

【45】 Event Extraction as Natural Language Generation 标题:作为自然语言生成的事件抽取 链接:https://arxiv.org/abs/2108.12724

作者:I-Hung Hsu,Kuan-Hao Huang,Elizabeth Boschee,Scott Miller,Prem Natarajan,Kai-Wei Chang,Nanyun Peng 机构:†Information Science Institute, University of Southern California, ‡Computer Science Department, University of California, Los Angeles 备注:The first two authors contribute equally 摘要:事件提取(EE)是识别文本中事件触发器及其参数的任务,通常被描述为一个分类或结构化预测问题。此类模型通常将标签简化为数字标识符,使其无法利用标签语义(例如,名为逮捕的事件类型与逮捕、拘留或拘捕等词相关)。这将妨碍泛化到新的事件类型。在这项工作中,我们将EE描述为一项自然语言生成任务,并提出了GenEE,这是一个模型,不仅可以捕获事件中的复杂依赖项,而且可以很好地推广到不可见或罕见的事件类型。给定一个段落和一个事件类型,GenEE被训练按照该事件类型的预定义模板生成一个自然句子。然后将生成的输出解码为触发器和参数预测。自回归生成过程自然地建模了预测之间的依赖关系——预测的每个新词都依赖于输出句子中已经生成的词。在生成过程中使用精心设计的输入提示,GenEE能够捕获标签语义,从而实现对新事件类型的泛化。实证结果表明,在所有Zero-Shot、Few-Shot和高资源场景下,我们的模型在事件提取任务上都取得了很好的性能。特别是,在高资源环境下,GenEE在参数提取方面优于最先进的模型,并且在端到端EE任务上与当前最好的模型相比具有竞争力。 摘要:Event extraction (EE), the task that identifies event triggers and their arguments in text, is usually formulated as a classification or structured prediction problem. Such models usually reduce labels to numeric identifiers, making them unable to take advantage of label semantics (e.g. an event type named Arrest is related to words like arrest, detain, or apprehend). This prevents the generalization to new event types. In this work, we formulate EE as a natural language generation task and propose GenEE, a model that not only captures complex dependencies within an event but also generalizes well to unseen or rare event types. Given a passage and an event type, GenEE is trained to generate a natural sentence following a predefined template for that event type. The generated output is then decoded into trigger and argument predictions. The autoregressive generation process naturally models the dependencies among the predictions -- each new word predicted depends on those already generated in the output sentence. Using carefully designed input prompts during generation, GenEE is able to capture label semantics, which enables the generalization to new event types. Empirical results show that our model achieves strong performance on event extraction tasks under all zero-shot, few-shot, and high-resource scenarios. Especially, in the high-resource setting, GenEE outperforms the state-of-the-art model on argument extraction and gets competitive results with the current best on end-to-end EE tasks.
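"按模板生成、再解码回结构化预测"的流程可示意如下:给定事件类型的模板,模型生成填充后的句子,然后与模板对齐解析出论元。下面的模板与解析方式均为假设示例,并非GenEE的原始模板:

```python
# 示意代码:从按模板生成的句子中解析论元(模板与示例为假设,非GenEE原文模板)。
import re

template = "<arg1> was arrested by <arg2> in <arg3>"
generated = "John Doe was arrested by the police in Boston"

# 把模板中的占位符替换为命名捕获组,再与生成句做整体匹配
pattern = re.escape(template)
for slot in re.findall(r"<(arg\d+)>", template):
    pattern = pattern.replace(re.escape(f"<{slot}>"), f"(?P<{slot}>.+?)")
match = re.fullmatch(pattern, generated)
print(match.groupdict())  # {'arg1': 'John Doe', 'arg2': 'the police', 'arg3': 'Boston'}
```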

【46】 CHAINGE: A Blockchain Solution to Automate Payment Detail Updates to Subscription Services 标题:CHAINGE:自动更新订阅服务支付细节的区块链解决方案 链接:https://arxiv.org/abs/2108.12705

作者:David Buckley,Gueltoum Bendiab,Stavros Shiaeles,Nick Savage,Nicholas Kolokotronis 机构:∗Cyber Security Research Group, University of Portsmouth, PO,UP, Portsmouth, UK, †Department of Informatics and Telecommunications, University of Peloponnese 备注:None 摘要:基于订阅的业务模式的兴起导致客户需要管理其支付的订阅数量相应增加。对于客户来说,对多个订阅的付款进行管理已成为一项非常复杂且不安全的任务,尤其是在卡丢失、被盗或过期时更新付款详细信息。此外,根据安全报告,这一过程大多是手动的,容易受到人为错误、数字欺诈和数据泄露的影响。因此,在本文中,我们提出了一种新的方法来自动化、管理和简化更新和管理用户订阅付款过程中涉及的金融供应链。这是通过利用Hyperledger Sawtooth区块链框架实现的,该框架允许消费者在中央数字钱包中输入其支付卡详细信息,并将其订阅链接到其卡。正在更新的卡触发区块链上的事件,从而允许在订阅系统上自动更新支付详细信息。对所提出的系统原型进行的验证测试表明,其当前实现是安全的。 摘要:The rise of the subscription-based business model has led to a corresponding increase in the number of subscriptions where a customer needs to manage their payments. This management of payments for multiple subscriptions has become a very complicated and insecure task for customers, especially when it comes to renewing payment details when the card is lost, stolen, or expires. In addition, this, mostly manual, process is vulnerable to human error, digital frauds, and data breaches, according to security reports. Thus, in this paper, we propose a novel approach to automate, manage and simplify the Financial Supply Chain involved in the process of updating and managing payments to user subscriptions. This is done by utilising the Hyperledger Sawtooth blockchain framework, that allows a consumer to enter their payment card details in a central digital wallet and link their subscriptions to their cards. The card being updated triggers an event on the blockchain, which allow for the payment details to be updated on subscription systems automatically. The verification tests performed on the prototype of the proposed system shows that its current implementation has been securely achieved.

【47】 DKM: Differentiable K-Means Clustering Layer for Neural Network Compression 标题:DKM:神经网络压缩的可微K-均值聚类层 链接:https://arxiv.org/abs/2108.12659

作者:Minsik Cho,Keivan A. Vahid,Saurabh Adya,Mohammad Rastegari 机构:Apple 摘要:深度神经网络(DNN)模型压缩用于有效的设备推断,对于减少内存需求和将用户数据保存在设备上变得越来越重要。为此,我们提出了一种新的可微k-均值聚类层(DKM),并将其应用于基于训练时间权重聚类的DNN模型压缩。DKM将k-均值聚类作为一个注意问题,并支持参数和聚类质心的联合优化。与以前依赖额外正则化器和参数的工作不同,基于DKM的压缩保持了原始损失函数和模型结构的固定。我们评估了用于计算机视觉和自然语言处理(NLP)任务的各种DNN模型上基于DKM的压缩。我们的结果表明,DKM在ImageNet1k和GLUE基准测试上提供了优越的压缩和精度权衡。例如,基于DKM的压缩可以在3.3MB模型大小(29.4x模型压缩系数)的ResNet50 DNN模型上提供74.5%的top-1 ImageNet1k精度。对于MobileNet-v1,这是一个具有挑战性的DNN压缩,DKM提供62.8%的top-1 ImageNet1k精度,模型大小为0.74 MB(模型压缩系数为22.4倍)。这一结果比当前最先进的DNN压缩算法的top-1精度高6.8%,模型尺寸相对较小33%。此外,DKM可以将DistilBERT模型压缩11.8倍,而GLUE NLP基准测试的精度损失最小(1.1%)。 摘要:Deep neural network (DNN) model compression for efficient on-device inference is becoming increasingly important to reduce memory requirements and keep user data on-device. To this end, we propose a novel differentiable k-means clustering layer (DKM) and its application to train-time weight clustering-based DNN model compression. DKM casts k-means clustering as an attention problem and enables joint optimization of the parameters and clustering centroids. Unlike prior works that rely on additional regularizers and parameters, DKM-based compression keeps the original loss function and model architecture fixed. We evaluated DKM-based compression on various DNN models for computer vision and natural language processing (NLP) tasks. Our results demonstrate that DKM delivers superior compression and accuracy trade-off on ImageNet1k and GLUE benchmarks. For example, DKM-based compression can offer 74.5% top-1 ImageNet1k accuracy on ResNet50 DNN model with 3.3MB model size (29.4x model compression factor). For MobileNet-v1, which is a challenging DNN to compress, DKM delivers 62.8% top-1 ImageNet1k accuracy with 0.74 MB model size (22.4x model compression factor). This result is 6.8% higher top-1 accuracy and 33% relatively smaller model size than the current state-of-the-art DNN compression algorithms. Additionally, DKM enables compression of DistilBERT model by 11.8x with minimal (1.1%) accuracy loss on GLUE NLP benchmarks.
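"把k-means写成注意力问题"的核心是:对权重到各聚类中心的负距离做softmax得到软分配,使聚类中心可与网络参数联合端到端优化。下面是一个一维权重的最小PyTorch示意(真实DKM作用于整个权重张量,温度等超参为假设):

```python
# 示意代码:可微k-means——以负距离的softmax作为软分配(注意力),温度tau为假设超参。
import torch

def dkm_soft_assign(weights, centroids, tau=0.05):
    """weights: (n,) 待聚类的标量权重; centroids: (k,) 聚类中心。"""
    dist = (weights[:, None] - centroids[None, :]) ** 2   # (n, k) 两两平方距离
    attn = torch.softmax(-dist / tau, dim=1)              # 软分配 = 注意力权重
    compressed = attn @ centroids                         # 用中心的加权和近似原权重
    return compressed, attn

w = torch.randn(1000, requires_grad=True)
c = torch.randn(16, requires_grad=True)
w_hat, _ = dkm_soft_assign(w, c)
loss = ((w_hat - w) ** 2).mean()   # 整个过程对w与c都可导,可与任务损失联合优化
loss.backward()
```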

【48】 Smoothing Dialogue States for Open Conversational Machine Reading 标题:用于开放式对话式机器阅读的平滑对话状态 链接:https://arxiv.org/abs/2108.12599

作者:Zhuosheng Zhang,Siru Ouyang,Hai Zhao,Masao Utiyama,Eiichiro Sumita 机构: Department of Computer Science and Engineering, Shanghai Jiao Tong University, Key Laboratory of Shanghai Education Commission for Intelligent Interaction, and Cognitive Engineering, Shanghai Jiao Tong University 备注:Accepted by EMNLP 2021 Main Conference 摘要:对话式机器阅读(CMR)要求机器在决策与问题生成这两种关键对话状态之间,通过多轮交互与人类交流。在更贴近现实的开放CMR设定中,检索到的背景知识带有噪声,给信息传递带来严重挑战。现有研究通常为这两个子任务训练独立或流水线式的系统。然而,这些方法简单地使用硬标签决策来触发问题生成,最终会限制模型性能。在这项工作中,我们提出了一种有效的门控策略:在单个解码器中平滑两种对话状态,并桥接决策与问题生成,以提供更丰富的对话状态参考。在OR-ShARC数据集上的实验表明了该方法的有效性,取得了新的最先进结果。 摘要:Conversational machine reading (CMR) requires machines to communicate with humans through multi-turn interactions between two salient dialogue states of decision making and question generation processes. In open CMR settings, as the more realistic scenario, the retrieved background knowledge would be noisy, which results in severe challenges in the information transmission. Existing studies commonly train independent or pipeline systems for the two subtasks. However, those methods are trivial by using hard-label decisions to activate question generation, which eventually hinders the model performance. In this work, we propose an effective gating strategy by smoothing the two dialogue states in only one decoder and bridge decision making and question generation to provide a richer dialogue state reference. Experiments on the OR-ShARC dataset show the effectiveness of our method, which achieves new state-of-the-art results.

【49】 Layer-wise Model Pruning based on Mutual Information 标题:基于互信息的分层模型剪枝 链接:https://arxiv.org/abs/2108.12594

作者:Chun Fan,Jiwei Li,Xiang Ao,Fei Wu,Yuxian Meng,Xiaofei Sun 机构:♦Zhejiang University, ♠Computer Center of Peking University, ⋆Peng Cheng Laboratory, ▼Key Lab of Intelligent Information Processing of Chinese Academy of Sciences, ♣ Shannon.AI 备注:To appear at EMNLP2021 摘要:与基于权重的剪枝技术相比,所提出的剪枝策略具有以下优点:(1) 避免了不规则的内存访问,因为表示和矩阵可以压缩到更小但更密集的对应项中,从而带来更大的加速;(2) 该方法采用自上而下的剪枝方式,基于顶层的训练信号,从更全局的角度进行操作,并通过将全局信号的效果传播到各层来剪枝,从而在相同的稀疏度水平下获得更好的性能。大量实验表明,在相同的稀疏度水平下,该策略比基于权重的剪枝方法(如幅度剪枝、运动剪枝)具有更高的加速比和性能。 摘要:The proposed pruning strategy offers merits over weight-based pruning techniques: (1) it avoids irregular memory access since representations and matrices can be squeezed into their smaller but dense counterparts, leading to greater speedup; (2) in a manner of top-down pruning, the proposed method operates from a more global perspective based on training signals in the top layer, and prunes each layer by propagating the effect of global signals through layers, leading to better performances at the same sparsity level. Extensive experiments show that at the same sparsity level, the proposed strategy offers both greater speedup and higher performances than weight-based pruning methods (e.g., magnitude pruning, movement pruning).

【50】 Distilling the Knowledge of Large-scale Generative Models into Retrieval Models for Efficient Open-domain Conversation 标题:将大规模生成模型的知识蒸馏到检索模型中以实现高效的开放域对话 链接:https://arxiv.org/abs/2108.12582

作者:Beomsu Kim,Seokjun Seo,Seungju Han,Enkhbayar Erdenee,Buru Chang 机构:Hyperconnect 备注:EMNLP21-Findings 摘要:尽管大规模生成模型在开放域会话中表现出色,但由于高延迟,它们在构建实时会话系统方面的实用性较差。另一方面,检索模型可以以更低的延迟返回响应,但由于会话质量受预定义响应集的限制,因此其性能不如大规模生成模型。为了利用这两种方法,我们提出了一种称为G2R(生成到检索蒸馏)的新训练方法,该方法通过将生成模型的知识注入检索模型中,在利用大规模生成模型的会话能力的同时保持检索模型的效率。G2R由两种不同的提取技术组成:数据级G2R通过大规模生成模型生成的额外响应来扩充对话数据集,模型级G2R通过知识提取损失将生成模型评估的响应质量分数转换为检索模型的分数。通过大量实验,包括人类评估,我们证明了我们的基于检索的会话系统经过G2R训练后,与基线检索模型相比,性能有了显著提高,同时推理延迟显著低于大规模生成模型。 摘要:Despite the remarkable performance of large-scale generative models in open-domain conversation, they are known to be less practical for building real-time conversation systems due to high latency. On the other hand, retrieval models could return responses with much lower latency but show inferior performance to the large-scale generative models since the conversation quality is bounded by the pre-defined response set. To take advantage of both approaches, we propose a new training method called G2R (Generative-to-Retrieval distillation) that preserves the efficiency of a retrieval model while leveraging the conversational ability of a large-scale generative model by infusing the knowledge of the generative model into the retrieval model. G2R consists of two distinct techniques of distillation: the data-level G2R augments the dialogue dataset with additional responses generated by the large-scale generative model, and the model-level G2R transfers the response quality score assessed by the generative model to the score of the retrieval model by the knowledge distillation loss. Through extensive experiments including human evaluation, we demonstrate that our retrieval-based conversation system trained with G2R shows a substantially improved performance compared to the baseline retrieval model while showing significantly lower inference latency than the large-scale generative models.

【51】 AMMASurv: Asymmetrical Multi-Modal Attention for Accurate Survival Analysis with Whole Slide Images and Gene Expression Data 标题:AMMASurv:面向全切片图像与基因表达数据精确生存分析的非对称多模态注意力 链接:https://arxiv.org/abs/2108.12565

作者:Ruoqi Wang,Ziwang Huang,Haitao Wang,Hejun Wu 机构:School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China 备注:8 pages 摘要:使用多模态数据,如结合全切片图像(WSI)和基因表达数据进行生存分析,可导致更准确的生存预测。以往的多模态生存模型不能有效地挖掘每个模态的内在信息。此外,尽管实验结果表明WSI提供的信息比基因表达数据更有效,但以前的方法认为来自不同模态的信息同等重要,因此无法灵活利用模态之间的潜在联系。为了解决上述问题,我们提出了一种新的非对称多模态方法AMMASurv。具体来说,我们在Transformer编码器中为多模态数据设计了一种非对称多模态注意力机制(AMMA),以实现更灵活的多模态信息融合,用于生存预测。与以往的工作不同,AMMASurv能够有效地利用每个模态中的固有信息,并灵活地适应不同重要性的模态。通过大量实验验证了该模型的有效性。令人鼓舞的结果表明,我们的方法优于其他最先进的方法。 摘要:The use of multi-modal data such as the combination of whole slide images (WSIs) and gene expression data for survival analysis can lead to more accurate survival predictions. Previous multi-modal survival models are not able to efficiently excavate the intrinsic information within each modality. Moreover, despite experimental results show that WSIs provide more effective information than gene expression data, previous methods regard the information from different modalities as similarly important so they cannot flexibly utilize the potential connection between the modalities. To address the above problems, we propose a new asymmetrical multi-modal method, termed as AMMASurv. Specifically, we design an asymmetrical multi-modal attention mechanism (AMMA) in Transformer encoder for multi-modal data to enable a more flexible multi-modal information fusion for survival prediction. Different from previous works, AMMASurv can effectively utilize the intrinsic information within every modality and flexibly adapts to the modalities of different importance. Extensive experiments are conducted to validate the effectiveness of the proposed model. Encouraging results demonstrate the superiority of our method over other state-of-the-art methods.

【52】 Anytime Stochastic Task and Motion Policies 标题:任意时间(anytime)随机任务与运动策略 链接:https://arxiv.org/abs/2108.12537

作者:Naman Shah,Siddharth Srivastava 机构:School of Computing, Arizona State University 摘要:为了解决复杂、长视界(long-horizon)的任务,智能机器人需要结合运动规划进行高级、抽象的规划和推理。然而,抽象模型通常是有损的,使用它们计算的计划或策略可能是不可执行的。这些问题在随机情况下会加剧,机器人需要对多种意外情况进行推理和计划。我们提出了一种在随机环境下集成任务和运动规划的新方法。与以前在这个方向上的工作相比,我们表明我们的方法可以有效地计算集成的任务和运动策略,这些策略的分支结构编码了处理多个执行时偶然事件的代理行为。我们证明了我们的算法是概率完备的,并且能够以任意时间(anytime)方式计算可行的解策略,从而使遇到未解决偶然事件的概率随时间降低。一组具有挑战性问题的实证结果表明了我们方法的实用性和适用范围。 摘要:In order to solve complex, long-horizon tasks, intelligent robots need to carry out high-level, abstract planning and reasoning in conjunction with motion planning. However, abstract models are typically lossy and plans or policies computed using them can be inexecutable. These problems are exacerbated in stochastic situations where the robot needs to reason about and plan for multiple contingencies. We present a new approach for integrated task and motion planning in stochastic settings. In contrast to prior work in this direction, we show that our approach can effectively compute integrated task and motion policies whose branching structures encode agent behaviors that handle multiple execution-time contingencies. We prove that our algorithm is probabilistically complete and can compute feasible solution policies in an anytime fashion so that the probability of encountering an unresolved contingency decreases over time. Empirical results on a set of challenging problems show the utility and scope of our method.

【53】 Combining chest X-rays and EHR data using machine learning to diagnose acute respiratory failure 标题:利用机器学习结合胸部X线和EHR数据诊断急性呼吸衰竭 链接:https://arxiv.org/abs/2108.12530

作者:Sarah Jabbour,David Fouhey,Ella Kazerooni,Jenna Wiens,Michael W Sjoding 机构:Computer Science and Engineering, University of Michigan, Ann Arbor, Michigan; Department of Radiology, University of Michigan Medical School, Ann Arbor, Michigan 备注:43 pages, 10 tables, 4 figures 摘要:当患者出现急性呼吸衰竭时,准确识别潜在病因对于确定最佳治疗至关重要,但在临床实践中区分常见诊断可能具有挑战性。机器学习模型可以通过增强临床决策来改善医学诊断,并在急性呼吸衰竭患者的诊断评估中发挥作用。虽然已经开发了机器学习模型来识别胸片(如肺炎)的常见发现,但通过分析电子健康记录(EHR)中的临床相关数据来增强这些方法可能有助于急性呼吸衰竭的诊断。对机器学习模型进行训练,以使用胸片和EHR数据预测急性呼吸衰竭(肺炎、心力衰竭和/或COPD)的原因,这些数据来自内部队列中的患者,使用基于医生图表审查的诊断。模型还使用出院诊断代码对外部队列中的患者进行了测试。结合胸片和EHR数据的模型在肺炎和COPD方面优于单独基于每种模式的模型。对于肺炎,联合模型AUROC为0.79(0.78-0.79),图像模型AUROC为0.73(0.72-0.75),EHR模型AUROC为0.73(0.70-0.76);对于慢性阻塞性肺病,综合指数为0.89(0.83-0.91),影像指数为0.85(0.77-0.89),EHR为0.80(0.76-0.84);对于心力衰竭,合并:0.80(0.77-0.84),图像:0.77(0.71-0.81),EHR:0.80(0.75-0.82)。在外部队列中,心力衰竭和COPD的表现是一致的,但肺炎的表现略有下降。总的来说,结合胸片和EHR数据的机器学习模型可以准确区分急性呼吸衰竭的常见原因。需要进一步的工作来确定这些模型是否可以帮助临床医生在临床环境中诊断急性呼吸衰竭。 摘要:When patients develop acute respiratory failure, accurately identifying the underlying etiology is essential for determining the best treatment, but it can be challenging to differentiate between common diagnoses in clinical practice. Machine learning models could improve medical diagnosis by augmenting clinical decision making and play a role in the diagnostic evaluation of patients with acute respiratory failure. While machine learning models have been developed to identify common findings on chest radiographs (e.g. pneumonia), augmenting these approaches by also analyzing clinically relevant data from the electronic health record (EHR) could aid in the diagnosis of acute respiratory failure. Machine learning models were trained to predict the cause of acute respiratory failure (pneumonia, heart failure, and/or COPD) using chest radiographs and EHR data from patients within an internal cohort using diagnoses based on physician chart review. Models were also tested on patients in an external cohort using discharge diagnosis codes. A model combining chest radiographs and EHR data outperformed models based on each modality alone for pneumonia and COPD. For pneumonia, the combined model AUROC was 0.79 (0.78-0.79), image model AUROC was 0.73 (0.72-0.75), and EHR model AUROC was 0.73 (0.70-0.76); for COPD, combined: 0.89 (0.83-0.91), image: 0.85 (0.77-0.89), and EHR: 0.80 (0.76-0.84); for heart failure, combined: 0.80 (0.77-0.84), image: 0.77 (0.71-0.81), and EHR: 0.80 (0.75-0.82). In the external cohort, performance was consistent for heart failure and COPD, but declined slightly for pneumonia. Overall, machine learning models combining chest radiographs and EHR data can accurately differentiate between common causes of acute respiratory failure. Further work is needed to determine whether these models could aid clinicians in the diagnosis of acute respiratory failure in clinical settings.
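这类"影像+EHR"的组合常用后期融合实现:分别编码两种模态,拼接后做多标签预测。以下PyTorch草图仅示意这种结构,编码器选择与各维度均为假设,与论文的具体模型无关:

```python
# 示意代码:胸片+EHR的后期融合多标签分类器(结构与维度为假设,仅演示思路)。
import torch
import torch.nn as nn
import torchvision

class FusionModel(nn.Module):
    def __init__(self, ehr_dim=64, n_labels=3):   # 3个标签:肺炎/心力衰竭/COPD
        super().__init__()
        self.img = torchvision.models.resnet18(num_classes=128)        # 图像编码器
        self.ehr = nn.Sequential(nn.Linear(ehr_dim, 128), nn.ReLU())   # EHR编码器
        self.head = nn.Linear(128 + 128, n_labels)

    def forward(self, xray, ehr):
        z = torch.cat([self.img(xray), self.ehr(ehr)], dim=1)  # 拼接两种模态的表示
        return self.head(z)   # 多标签logits,配合BCEWithLogitsLoss使用

logits = FusionModel()(torch.randn(2, 3, 224, 224), torch.randn(2, 64))
```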

【54】 Robustness Disparities in Commercial Face Detection 标题:商用人脸检测中的鲁棒性差异 链接:https://arxiv.org/abs/2108.12508

作者:Samuel Dooley,Tom Goldstein,John P. Dickerson 机构:University of Maryland 摘要:在过去的十年中,人脸检测和分析系统已经被大公司部署,并受到学者和活动家的批评。聚焦于系统性能的批评分析了系统输出的差异,即针对不同的Fitzpatrick皮肤类型或感知性别检测到人脸的频率。然而,我们关注这些系统输出在自然噪声扰动下的鲁棒性。我们给出了针对三个此类系统鲁棒性的首个详细基准:Amazon Rekognition、Microsoft Azure和Google Cloud Platform。我们使用标准的以及最近发布的学术人脸数据集,定量分析各系统的鲁棒性趋势。在所有数据集和系统中,我们普遍发现,与其他身份的人相比,年龄较大、外表偏男性化、肤色较深或光线昏暗的人的照片更容易出错。 摘要:Facial detection and analysis systems have been deployed by large companies and critiqued by scholars and activists for the past decade. Critiques that focus on system performance analyze disparity of the system's output, i.e., how frequently is a face detected for different Fitzpatrick skin types or perceived genders. However, we focus on the robustness of these system outputs under noisy natural perturbations. We present the first of its kind detailed benchmark of the robustness of three such systems: Amazon Rekognition, Microsoft Azure, and Google Cloud Platform. We use both standard and recently released academic facial datasets to quantitatively analyze trends in robustness for each. Across all the datasets and systems, we generally find that photos of individuals who are older, masculine presenting, of darker skin type, or have dim lighting are more susceptible to errors than their counterparts in other identities.

【55】 StressNAS: Affect State and Stress Detection Using Neural Architecture Search 标题:StressNAS:基于神经架构搜索的情感状态与压力检测 链接:https://arxiv.org/abs/2108.12502

作者:Lam Huynh,Tri Nguyen,Thu Nguyen,Susanna Pirttikangas,Pekka Siirtola 机构:Center for Machine Vision and Signal Analysis, University of Oulu, Center for Ubiquitous Computing, University of Oulu, Economics and Business Administration, University of Oulu, Biomimetics and Intelligent Systems Group, University of Oulu 备注:5 pages, 2 figures 摘要:智能手表已迅速发展出准确采集生理信号的能力。作为一种吸引人的应用,压力检测因其对人类健康的潜在益处而吸引了许多研究。深入研究深度神经网络(DNN)在通过生理信号增强人类决策方面的适用性是非常有价值的。然而,由于这一现象的复杂性,手动设计DNN是一项繁琐的任务,尤其是在压力检测中。为此,我们提出了一种基于神经架构搜索的优化深度神经网络训练方案,仅使用来自WESAD的腕戴设备数据。实验表明,在三状态和两状态分类器中,使用WESAD腕部信号的组合,我们的方法比传统的ML方法分别高8.22%和6.02%。此外,该方法可以最大限度地减少对人工设计DNN的需求,同时将性能提高4.39%(三状态)和8.99%(二分类)。 摘要:Smartwatches have rapidly evolved towards capabilities to accurately capture physiological signals. As an appealing application, stress detection attracts many studies due to its potential benefits to human health. It is propitious to investigate the applicability of deep neural networks (DNN) to enhance human decision-making through physiological signals. However, manually engineering DNN proves a tedious task especially in stress detection due to the complex nature of this phenomenon. To this end, we propose an optimized deep neural network training scheme using neural architecture search merely using wrist-worn data from WESAD. Experiments show that our approach outperforms traditional ML methods by 8.22% and 6.02% in the three-state and two-state classifiers, respectively, using the combination of WESAD wrist signals. Moreover, the proposed method can minimize the need for human-design DNN while improving performance by 4.39% (three-state) and 8.99% (binary).

【56】 Disrupting Adversarial Transferability in Deep Neural Networks 标题:破坏深度神经网络中的对抗可转移性 链接:https://arxiv.org/abs/2108.12492

作者:Christopher Wiedeman,Ge Wang 机构:Rensselaer Polytechnic Institute 备注:18 pages, 12 figures 摘要:对抗性攻击的可转移性是深度学习中公认的现象。先前的工作通过识别共同的对抗子空间和决策边界之间的相关性部分解释了可转移性,但我们在文献中发现除此之外几乎没有其他解释。在本文中,我们提出,表面上不同的模型之间的可转移性是由于不同深度神经网络提取的特征之间具有高度的线性相关性。换句话说,在同一任务中训练的两个模型在参数空间中似乎相距遥远,它们可能以相同的方式提取特征,只是在潜在空间之间进行微小的移动和旋转。此外,我们还展示了如何应用特征相关损失(将提取的特征在潜在空间中解相关)来显著降低模型之间对抗性攻击的可转移性,这表明模型以语义不同的方式完成任务。最后,我们提出了一种双颈自动编码器(DNA),它利用这种特征相关性损失来创建输入信息的两种有意义的不同编码,降低了可传输性。 摘要:Adversarial attack transferability is a well-recognized phenomenon in deep learning. Prior work has partially explained transferability by recognizing common adversarial subspaces and correlations between decision boundaries, but we have found little explanation in the literature beyond this. In this paper, we propose that transferability between seemingly different models is due to a high linear correlation between features that different deep neural networks extract. In other words, two models trained on the same task that are seemingly distant in the parameter space likely extract features in the same fashion, just with trivial shifts and rotations between the latent spaces. Furthermore, we show how applying a feature correlation loss, which decorrelates the extracted features in a latent space, can drastically reduce the transferability of adversarial attacks between models, suggesting that the models complete tasks in semantically different ways. Finally, we propose a Dual Neck Autoencoder (DNA), which leverages this feature correlation loss to create two meaningfully different encodings of input information with reduced transferability.
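文中"特征相关损失"的一种直接实现方式:对两个模型在同一批输入上的特征做批内标准化,再惩罚其互相关矩阵的元素。下面是一个假设性的PyTorch草图,并非论文的原始定义:

```python
# 示意代码:惩罚两个模型所提取特征之间的线性互相关(假设性实现)。
import torch

def feature_decorrelation_loss(f1, f2, eps=1e-5):
    """f1, f2: (batch, dim) 两个模型对同一批输入提取的特征。"""
    z1 = (f1 - f1.mean(0)) / (f1.std(0) + eps)   # 按特征维做批内标准化
    z2 = (f2 - f2.mean(0)) / (f2.std(0) + eps)
    corr = z1.T @ z2 / f1.shape[0]               # (dim, dim) 互相关矩阵
    return (corr ** 2).mean()                    # 相关性越低,损失越小
```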

【57】 Learning Inner-Group Relations on Point Clouds 标题:学习点云上的群内关系 链接:https://arxiv.org/abs/2108.12468

作者:Haoxi Ran,Wei Zhuo,Jun Liu,Li Lu 机构: Sichuan University, † Tencent 备注:ICCV 2021. arXiv admin note: text overlap with arXiv:2011.14285 摘要:计算机视觉中关系网络的流行与未充分探索的基于点的方法形成了鲜明的对比。在本文中,我们探讨了局部关系算子的可能性,并考察了它们的可行性。我们提出了一个可扩展且高效的模块,称为组关系聚合器。该模块基于组内各点特征的聚合来计算组特征,聚合权重由几何关系和语义关系给出。我们采用这个模块来设计我们的RPNet。我们进一步验证了RPNet在深度和宽度两个方面对分类和分割任务的可扩展性。令人惊讶的是,实证结果表明,较宽的RPNet适合分类,而较深的RPNet更适合分割。RPNet在具有挑战性的基准上实现了最先进的分类和分割。我们还将我们的局部聚合器与PointNet++进行了比较:参数量约为其30%,并节省50%的计算量。最后,我们通过实验验证了RPNet对刚性变换和噪声的鲁棒性。 摘要:The prevalence of relation networks in computer vision is in stark contrast to underexplored point-based methods. In this paper, we explore the possibilities of local relation operators and survey their feasibility. We propose a scalable and efficient module, called group relation aggregator. The module computes a feature of a group based on the aggregation of the features of the inner-group points weighted by geometric relations and semantic relations. We adopt this module to design our RPNet. We further verify the expandability of RPNet, in terms of both depth and width, on the tasks of classification and segmentation. Surprisingly, empirical results show that wider RPNet fits for classification, while deeper RPNet works better on segmentation. RPNet achieves state-of-the-art for classification and segmentation on challenging benchmarks. We also compare our local aggregator with PointNet++, with around 30% parameters and 50% computation saving. Finally, we conduct experiments to reveal the robustness of RPNet with regard to rigid transformation and noises.

【58】 Code-switched inspired losses for generic spoken dialog representations 标题:受代码切换启发的损失,用于通用口语对话表示 链接:https://arxiv.org/abs/2108.12465

作者:Emile Chapuis,Pierre Colombo,Matthieu Labeau,Chloe Clave 机构:LTCI, Telecom Paris, Institut Polytechnique de Paris, IBM GBS France 备注:None 摘要:口语对话系统需要能够处理多种语言以及对话内部的多语言混用(例如代码切换的情况)。在这项工作中,我们提出了为学习多语言口语对话表示而定制的新预训练损失。这些损失的目的是让模型接触代码切换的语言。为了扩大训练规模,我们从OpenSubtitles(一个由24.3G词元组成的大型多语言语料库)自动构建了一个由五种不同语言(法语、意大利语、英语、德语和西班牙语)的多语言对话组成的预训练语料库。我们在MIAM(一个新的基准,由上述五种语言的五个对话行为语料库组成)上测试了这些通用表示,并在两个新的多语言下游任务(即多语言掩码话语检索和多语言不一致性识别)上进行了评估。我们的实验表明,新的受代码切换启发的损失在单语言和多语言设置下都取得了更好的性能。 摘要:Spoken dialog systems need to be able to handle both multiple languages and multilinguality inside a conversation (e.g., in case of code-switching). In this work, we introduce new pretraining losses tailored to learn multilingual spoken dialog representations. The goal of these losses is to expose the model to code-switched language. To scale up training, we automatically build a pretraining corpus composed of multilingual conversations in five different languages (French, Italian, English, German and Spanish) from OpenSubtitles, a huge multilingual corpus composed of 24.3G tokens. We test the generic representations on MIAM, a new benchmark composed of five dialog act corpora on the same aforementioned languages as well as on two novel multilingual downstream tasks (i.e., multilingual mask utterance retrieval and multilingual inconsistency identification). Our experiments show that our new code switched-inspired losses achieve a better performance in both monolingual and multilingual settings.

【59】 Automatic Text Evaluation through the Lens of Wasserstein Barycenters 标题:从Wasserstein重心视角进行的文本自动评测 链接:https://arxiv.org/abs/2108.12463

作者:Pierre Colombo,Guillaume Staerman,Chloe Clavel,Pablo Piantanida 机构:LTCI, Telecom Paris, Institut Polytechnique de Paris, IBM GBS France, ∗CentraleSupelec-CNRS, Universite Paris-Saclay 备注:None 摘要:我们介绍了一种新的度量BaryScore,用于评估基于深度上下文化嵌入(例如BERT、RoBERTa、ELMo)的文本生成。这一指标由一个依赖最优传输工具(即Wasserstein距离和重心)的新框架推动。通过将深度上下文化嵌入的层输出建模为概率分布而非向量嵌入,该框架提供了一种通过Wasserstein空间拓扑聚合不同输出的自然方式。此外,它为我们的指标提供了理论依据,并提供了现有方案(例如MoverScore和BertScore)之外的替代选择。数值评估在四个不同的任务上进行:机器翻译、摘要、数据到文本(data2text)生成和图像字幕。我们的结果表明,BaryScore优于其他基于BERT的度量,并表现出更一致的行为,尤其是在文本摘要任务上。 摘要:A new metric, BaryScore, to evaluate text generation based on deep contextualized embeddings (e.g., BERT, RoBERTa, ELMo) is introduced. This metric is motivated by a new framework relying on optimal transport tools, i.e., Wasserstein distance and barycenter. By modelling the layer output of deep contextualized embeddings as a probability distribution rather than by a vector embedding, this framework provides a natural way to aggregate the different outputs through the Wasserstein space topology. In addition, it provides theoretical grounds to our metric and offers an alternative to available solutions (e.g., MoverScore and BertScore). Numerical evaluation is performed on four different tasks: machine translation, summarization, data2text generation and image captioning. Our results show that BaryScore outperforms other BERT based metrics and exhibits more consistent behaviour in particular for text summarization.
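把上下文嵌入视作经验分布后,两段文本的距离可以用Wasserstein距离衡量。下面用随机投影的"切片Wasserstein"给出一个容易计算的近似示意(借助scipy的一维Wasserstein距离);这只是简化近似,并非BaryScore的原始定义:

```python
# 示意代码:切片Wasserstein距离——将高维token嵌入随机投影到一维,
# 对一维经验分布用scipy计算Wasserstein距离并取平均。仅为近似示意,非BaryScore原始定义。
import numpy as np
from scipy.stats import wasserstein_distance

def sliced_wasserstein(emb_a, emb_b, n_proj=64, seed=0):
    """emb_a: (n_tokens_a, d), emb_b: (n_tokens_b, d) 两段文本的token嵌入。"""
    rng = np.random.default_rng(seed)
    d = emb_a.shape[1]
    total = 0.0
    for _ in range(n_proj):
        v = rng.normal(size=d)
        v /= np.linalg.norm(v)                      # 随机单位方向
        total += wasserstein_distance(emb_a @ v, emb_b @ v)
    return total / n_proj
```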

【60】 Why and How Governments Should Monitor AI Development 标题:政府为什么以及如何监控人工智能的发展 链接:https://arxiv.org/abs/2108.12427

作者:Jess Whittlestone,Jack Clark 机构:Centre for the Study of Existential Risk, University of Cambridge, Anthropic, EXECUTIVE SUMMARY, We outline a proposal for improving the governance of artificial intelligence (AI) by investing in government capacity 摘要:在本文中,我们概述了一项通过投资于政府能力来改善人工智能(AI)治理的建议,以系统地衡量和监测AI系统的能力和影响。如果采用,这将为政府提供更多关于人工智能生态系统的信息,使其能够更有效地指导人工智能的开发和部署,使其朝着最具社会效益和经济效益的方向发展。它还将创建能够快速识别AI生态系统变化可能带来的潜在威胁或危害的基础设施,如战略转型能力的出现或有害系统的部署。我们首先概述了推动这一提议的问题:简言之,传统治理方法难以跟上人工智能的发展速度。然后,我们提出了解决这一问题的建议:政府必须投资于计量和监测基础设施。我们详细讨论了这一建议,概述了政府可以重点衡量和监测的具体事项,以及这将为决策带来的好处。最后,我们概述了一些潜在的试点项目以及在实践中实施该项目的一些考虑因素。 摘要:In this paper we outline a proposal for improving the governance of artificial intelligence (AI) by investing in government capacity to systematically measure and monitor the capabilities and impacts of AI systems. If adopted, this would give governments greater information about the AI ecosystem, equipping them to more effectively direct AI development and deployment in the most societally and economically beneficial directions. It would also create infrastructure that could rapidly identify potential threats or harms that could occur as a consequence of changes in the AI ecosystem, such as the emergence of strategically transformative capabilities, or the deployment of harmful systems. We begin by outlining the problem which motivates this proposal: in brief, traditional governance approaches struggle to keep pace with the speed of progress in AI. We then present our proposal for addressing this problem: governments must invest in measurement and monitoring infrastructure. We discuss this proposal in detail, outlining what specific things governments could focus on measuring and monitoring, and the kinds of benefits this would generate for policymaking. Finally, we outline some potential pilot projects and some considerations for implementing this in practice.

机器翻译,仅供参考
