cs.AI (Artificial Intelligence), 52 papers in total
【1】 Improving Coherence and Consistency in Neural Sequence Models with Dual-System, Neuro-Symbolic Reasoning
Authors: Maxwell Nye, Michael Henry Tessler, Joshua B. Tenenbaum, Brenden M. Lake Affiliations: MIT, NYU and Facebook AI Research Link: https://arxiv.org/abs/2107.02794 Abstract: Human reasoning can often be understood as an interplay between two systems: the intuitive and associative ("System 1") and the deliberative and logical ("System 2"). Neural sequence models -- which have been increasingly successful at performing complex, structured tasks -- exhibit the advantages and failure modes of System 1: they are fast and learn patterns from data, but are often inconsistent and incoherent. In this work, we seek a lightweight, training-free means of improving existing System 1-like sequence models by adding System 2-inspired logical reasoning. We explore several variations on this theme in which candidate generations from a neural sequence model are examined for logical consistency by a symbolic reasoning module, which can either accept or reject the generations. Our approach uses neural inference to mediate between the neural System 1 and the logical System 2. Results in robust story generation and grounded instruction-following show that this approach can increase the coherence and accuracy of neurally-based generations.
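The accept/reject loop described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `propose` stands in for the neural sequence model, `is_consistent` for the symbolic checker, and the toy world state and both toy functions are hypothetical.

```python
import random

def generate_consistent(propose, is_consistent, state, max_tries=10, rng=None):
    """Sample candidates from a System 1-like proposer until a
    System 2-like checker accepts one; return None if all are rejected."""
    rng = rng or random.Random(0)
    for _ in range(max_tries):
        candidate = propose(state, rng)
        if is_consistent(state, candidate):
            return candidate
    return None

# Hypothetical toy world: the state maps each entity to its true location.
def toy_propose(state, rng):
    # Stand-in for a neural generator: guesses a (possibly wrong) fact.
    entity = rng.choice(sorted(state))
    location = rng.choice(["kitchen", "garden"])
    return (entity, location)

def toy_is_consistent(state, candidate):
    # Stand-in for a symbolic checker: accept only facts matching the state.
    entity, location = candidate
    return state[entity] == location

state = {"Alice": "kitchen", "Bob": "garden"}
result = generate_consistent(toy_propose, toy_is_consistent, state, max_tries=50)
```

The generator is never retrained here; the checker only filters its outputs, which matches the training-free framing of the abstract.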
【2】 Learned Visual Navigation for Under-Canopy Agricultural Robots
Authors: Arun Narenthiran Sivakumar, Sahil Modi, Mateus Valverde Gasparino, Che Ellis, Andres Eduardo Baquero Velasquez, Girish Chowdhary, Saurabh Gupta Affiliations: Department of Agricultural and Biological Engineering, University of Illinois at Urbana-Champaign (UIUC), Department of Computer Science, UIUC, Department of Electrical and Computer Engineering, UIUC, EarthSense Inc. Comments: RSS 2021. Project website with data and videos: this https URL Link: https://arxiv.org/abs/2107.02792 Abstract: We describe a system for visually guided autonomous navigation of under-canopy farm robots. Low-cost under-canopy robots can drive between crop rows under the plant canopy and accomplish tasks that are infeasible for over-the-canopy drones or larger agricultural equipment. However, autonomously navigating them under the canopy presents a number of challenges: unreliable GPS and LiDAR, high cost of sensing, challenging farm terrain, clutter due to leaves and weeds, and large variability in appearance over the season and across crop types. We address these challenges by building a modular system that leverages machine learning for robust and generalizable perception from monocular RGB images from low-cost cameras, and model predictive control for accurate control in challenging terrain. Our system, CropFollow, is able to autonomously drive 485 meters per intervention on average, outperforming a state-of-the-art LiDAR-based system (286 meters per intervention) in extensive field testing spanning over 25 km.
【3】 Causal Bandits on General Graphs
Authors: Aurghya Maiti, Vineet Nair, Gaurav Sinha Affiliations: Adobe Research, Technion Israel Institute of Technology Comments: 35 pages Link: https://arxiv.org/abs/2107.02772 Abstract: We study the problem of determining the best intervention in a Causal Bayesian Network (CBN) specified only by its causal graph. We model this as a stochastic multi-armed bandit (MAB) problem with side-information, where the interventions correspond to the arms of the bandit instance. First, we propose a simple regret minimization algorithm that takes as input a semi-Markovian causal graph with atomic interventions and possibly unobservable variables, and achieves $\tilde{O}(\sqrt{M/T})$ expected simple regret, where $M$ is dependent on the input CBN and could be very small compared to the number of arms. We also show that this is almost optimal for CBNs described by causal graphs having an $n$-ary tree structure. Our simple regret minimization results, both upper and lower bound, subsume previous results in the literature, which assumed additional structural restrictions on the input causal graph. In particular, our results indicate that the simple regret guarantee of our proposed algorithm can only be improved by considering more nuanced structural restrictions on the causal graph. Next, we propose a cumulative regret minimization algorithm that takes as input a general causal graph with all observable nodes and atomic interventions and performs better than the optimal MAB algorithm that does not take causal side-information into account. We also experimentally compare both our algorithms with the best known algorithms in the literature. To the best of our knowledge, this work gives the first simple and cumulative regret minimization algorithms for CBNs with general causal graphs under atomic interventions and having unobserved confounders.
【4】 MAJORITY-3SAT (and Related Problems) in Polynomial Time
Authors: Shyan Akmal, Ryan Williams Affiliations: MIT Comments: Abstract shortened to fit arXiv requirements Link: https://arxiv.org/abs/2107.02748 Abstract: Majority-SAT is the problem of determining whether an input $n$-variable formula in conjunctive normal form (CNF) has at least $2^{n-1}$ satisfying assignments. Majority-SAT and related problems have been studied extensively in various AI communities interested in the complexity of probabilistic planning and inference. Although Majority-SAT has been known to be PP-complete for over 40 years, the complexity of a natural variant has remained open: Majority-$k$SAT, where the input CNF formula is restricted to have clause width at most $k$. We prove that for every $k$, Majority-$k$SAT is in P. In fact, for any positive integer $k$ and rational $\rho \in (0,1)$ with bounded denominator, we give an algorithm that can determine whether a given $k$-CNF has at least $\rho \cdot 2^n$ satisfying assignments, in deterministic linear time (whereas the previous best-known algorithm ran in exponential time). Our algorithms have interesting positive implications for counting complexity and the complexity of inference, significantly reducing the known complexities of related problems such as E-MAJ-$k$SAT and MAJ-MAJ-$k$SAT. At the heart of our approach is an efficient method for solving threshold counting problems by extracting sunflowers found in the corresponding set system of a $k$-CNF. We also show that the tractability of Majority-$k$SAT is somewhat fragile. For the closely related GtMajority-SAT problem (where we ask whether a given formula has greater than $2^{n-1}$ satisfying assignments), which is known to be PP-complete, we show that GtMajority-$k$SAT is in P for $k \le 3$, but becomes NP-complete for $k \geq 4$. These results are counterintuitive, because the "natural" classifications of these problems would have been PP-completeness, and because there is a stark difference in the complexity of GtMajority-$k$SAT and Majority-$k$SAT for all $k \ge 4$.
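To make the two problem definitions above concrete, here is a brute-force sketch. It enumerates all $2^n$ assignments, so it is exponential-time and is emphatically not the paper's sunflower-based algorithm; the clause encoding (signed integers for literals) and the function names are assumptions chosen for illustration.

```python
from itertools import product

def count_satisfying(clauses, n):
    """Count assignments satisfying a CNF over variables 1..n.
    A literal is a nonzero int: v means x_v, -v means NOT x_v."""
    count = 0
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            count += 1
    return count

def is_majority(clauses, n):
    # Majority-SAT: at least 2^(n-1) satisfying assignments.
    return 2 * count_satisfying(clauses, n) >= 2 ** n

def is_gt_majority(clauses, n):
    # GtMajority-SAT: strictly more than 2^(n-1) satisfying assignments.
    return 2 * count_satisfying(clauses, n) > 2 ** n
```

For the single clause `[[1]]` over two variables, exactly half of the assignments satisfy the formula, so `is_majority` holds while `is_gt_majority` does not. This boundary case is where the two problems differ.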
【5】 Neural Computing
Authors: Ayushe Gangal, Peeyush Kumar, Sunita Kumari, Aditya Kumar Affiliations: Deenbandhu Chhotu Ram University of Science and Technology Comments: Book chapter, 25 pages, 16 figures, 5 tables Link: https://arxiv.org/abs/2107.02744 Abstract: This chapter aims to provide a next-level understanding of real-world problems and of the available solutions to them that fall within the domain of neural computing and are intelligent in their approach. It seeks to evoke a sense of innovation among educationalists, researchers, academic professionals, students and other interested people by highlighting the work done by major researchers and innovators in this field, thereby encouraging readers to develop newer and more advanced techniques. Societal problems are discussed and various solutions are presented by means of the theories and research developed so far. Apart from a theoretical understanding, the chapter focuses on the different types of neural networks discovered so far, applications of some of these networks, and the working and core concepts involved in those applications.
【6】 AdaRL: What, Where, and How to Adapt in Transfer Reinforcement Learning
Authors: Biwei Huang, Fan Feng, Chaochao Lu, Sara Magliacane, Kun Zhang Affiliations: Carnegie Mellon University, City University of Hong Kong, University of Cambridge, University of Amsterdam, MIT-IBM Watson AI Lab Link: https://arxiv.org/abs/2107.02729 Abstract: Most approaches in reinforcement learning (RL) are data-hungry and specific to fixed environments. In this paper, we propose a principled framework for adaptive RL, called AdaRL, that adapts reliably to changes across domains. Specifically, we construct a generative environment model for the structural relationships among variables in the system and embed the changes in a compact way, which provides a clear and interpretable picture for locating what and where the changes are and how to adapt. Based on the environment model, we characterize a minimal set of representations, including both domain-specific factors and domain-shared state representations, that suffice for reliable and low-cost transfer. Moreover, we show that by explicitly leveraging a compact representation to encode changes, we can adapt the policy with only a few samples without further policy optimization in the target domain. We illustrate the efficacy of AdaRL through a series of experiments that allow for changes in different components of Cartpole and Atari games.
【7】 A Unified Off-Policy Evaluation Approach for General Value Function
Authors: Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang Affiliations: The Ohio State University, Princeton University, Northwestern University Comments: submitted for publication Link: https://arxiv.org/abs/2107.02711 Abstract: The General Value Function (GVF) is a powerful tool for representing both predictive and retrospective knowledge in reinforcement learning (RL). In practice, multiple interrelated GVFs often need to be evaluated jointly with pre-collected off-policy samples. In the literature, the gradient temporal difference (GTD) learning method has been adopted to evaluate GVFs in the off-policy setting, but such an approach may suffer from a large estimation error even if the function approximation class is sufficiently expressive. Moreover, no previous work has formally established a convergence guarantee to the ground truth GVFs under function approximation settings. In this paper, we address both issues through the lens of a class of GVFs with causal filtering, which cover a wide range of RL applications such as reward variance, value gradient, cost in anomaly detection, stationary distribution gradient, etc. We propose a new algorithm called GenTD for off-policy GVF evaluation and show that GenTD learns multiple interrelated multi-dimensional GVFs as efficiently as a single canonical scalar value function. We further show that, unlike GTD, the GVFs learned by GenTD are guaranteed to converge to the ground truth GVFs as long as the function approximation power is sufficiently large. To the best of our knowledge, GenTD is the first off-policy GVF evaluation algorithm that has a global optimality guarantee.
【8】 ML-Quadrat & DriotData: A Model-Driven Engineering Tool and a Low-Code Platform for Smart IoT Services
Authors: Armin Moin, Andrei Mituca, Atta Badii, Stephan Günnemann Affiliations: Department of Informatics, Technical University of Munich, Germany, DriotData UG, Department of Computer Science, University of Reading, United Kingdom Comments: Preliminary version Link: https://arxiv.org/abs/2107.02692 Abstract: In this paper, we present the novel early tool prototype of ML-Quadrat, which is an open source research prototype, based on the Eclipse Modeling Framework (EMF) and the state of the art in the literature of Model-Driven Software Engineering (MDSE) for smart Cyber-Physical Systems (CPS) and the Internet of Things (IoT). Its envisioned users are mostly software developers, who might not have deep knowledge and skills in the heterogeneous IoT platforms and the diverse Artificial Intelligence (AI) technologies, specifically regarding Data Analytics and Machine Learning (DAML). ML-Quadrat is released under the terms of the Apache 2.0 license on GitHub: https://github.com/arminmoin/ML-Quadrat. Additionally, the novel early tool prototype of DriotData, a Low-Code platform targeting citizen data scientists and citizen/end-user software developers, is demonstrated. DriotData exploits and adopts ML-Quadrat and offers an extended version of it as a web-based service to companies, especially Small- and Medium-Sized Enterprises (SME). A basic web-based demo of the Minimum Viable Product (MVP) of DriotData is already available. Finally, a short video demonstrating the tools is available on YouTube: https://youtu.be/YCNFfhmy_JY.
【9】 Enabling Un-/Semi-Supervised Machine Learning for MDSE of the Real-World CPS/IoT Applications
Authors: Armin Moin, Atta Badii, Stephan Günnemann Affiliations: Department of Informatics, Technical University of Munich, Germany, Department of Computer Science, University of Reading, United Kingdom Comments: Preliminary version Link: https://arxiv.org/abs/2107.02690 Abstract: In this paper, we propose a novel approach to support domain-specific Model-Driven Software Engineering (MDSE) for the real-world use-case scenarios of smart Cyber-Physical Systems (CPS) and the Internet of Things (IoT). We argue that the majority of the data available in practice for Artificial Intelligence (AI), specifically Machine Learning (ML), is unlabeled. Hence, unsupervised and/or semi-supervised ML approaches are the practical choices. However, prior work in the literature of MDSE has considered supervised ML approaches, which only work with labeled training data. Our proposed approach is fully implemented and integrated with an existing state-of-the-art MDSE tool to serve the CPS/IoT domain. Moreover, we validate the proposed approach using a portion of the open data of the REFIT reference dataset for the smart energy systems domain. Our model-to-code transformations (code generators) provide the full source code of the desired IoT services out of the model instances in an automated manner. Currently, we generate the source code in Java and Python. The Python code is responsible for the ML functionalities and uses the APIs of several ML libraries and frameworks, namely Scikit-Learn, Keras and TensorFlow. For unsupervised and semi-supervised learning, the APIs of Scikit-Learn are deployed. In addition to the pure MDSE approach, where certain ML methods, e.g., K-Means, Mini-Batch K-Means, DBSCAN, Spectral Clustering, Gaussian Mixture Model, Self-Training, Label Propagation and Label Spreading, are supported, a more flexible, hybrid approach is also enabled to support the practitioner in deploying a pre-trained ML model with any arbitrary architecture and learning algorithm.
【10】 A Model-Driven Engineering Approach to Machine Learning and Software Modeling
Authors: Armin Moin, Atta Badii, Stephan Günnemann Comments: Preliminary version Link: https://arxiv.org/abs/2107.02689 Abstract: Models are used in both the Software Engineering (SE) and the Artificial Intelligence (AI) communities. In the former case, models of software, which may specify the software system architecture on different levels of abstraction, could be used in various stages of the Software Development Life-Cycle (SDLC), from early conceptualization and design, to verification, implementation, testing and evolution. However, in the latter case, i.e., AI, models may provide smart capabilities, such as prediction and decision-making support. For instance, in Machine Learning (ML), which is the most popular sub-discipline of AI at the present time, mathematical models may learn useful patterns in the observed data instances and can become capable of making better predictions or recommendations in the future. The goal of this work is to create synergy by bringing models in the said communities together and proposing a holistic approach. We illustrate how software models can become capable of producing or dealing with data analytics and ML models. The main focus is on the Internet of Things (IoT) and smart Cyber-Physical Systems (CPS) use cases, where both ML and model-driven (model-based) SE play a key role. In particular, we implement the proposed approach in an open source prototype and validate it using two use cases from the IoT/CPS domain.
【11】 VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer
Authors: Zineng Tang, Jaemin Cho, Hao Tan, Mohit Bansal Affiliations: UNC Chapel Hill Comments: 18 pages (5 figures, 10 tables) Link: https://arxiv.org/abs/2107.02681 Abstract: Since visual perception can give rich information beyond text descriptions for world understanding, there has been increasing interest in leveraging visual grounding for language learning. Recently, vokenization has attracted attention by using the predictions of a text-to-image retrieval model as labels for language model supervision. Despite its success, the method suffers from approximation error of using finite image labels and the lack of vocabulary diversity of a small image-text dataset. To overcome these limitations, we present VidLanKD, a video-language knowledge distillation method for improving language understanding. We train a multi-modal teacher model on a video-text dataset, and then transfer its knowledge to a student language model with a text dataset. To avoid approximation error, we propose to use different knowledge distillation objectives. In addition, the use of a large-scale video-text dataset helps learn diverse and richer vocabularies. In our experiments, VidLanKD achieves consistent improvements over text-only language models and vokenization models, on several downstream language understanding tasks including GLUE, SQuAD, and SWAG. We also demonstrate the improved world knowledge, physical reasoning, and temporal reasoning capabilities of our model by evaluating on the GLUE-diagnostics, PIQA, and TRACIE datasets. Lastly, we present comprehensive ablation studies as well as visualizations of the learned text-to-video grounding results of our teacher and student language models. Our code and models are available at: https://github.com/zinengtang/VidLanKD
【12】 Does Dataset Complexity Matters for Model Explainers?
Authors: José Ribeiro, Raíssa Silva, Ronnie Alves Affiliations: Federal University of Pará - UFPA, Belém, Brazil, Federal Institute of Pará - IFPA, Ananindeua, Brazil, IRMB, Montpellier University, Montpellier, France, La Ligue Contre le Cancer, Montpellier, France Comments: 12 pages, 5 figures Link: https://arxiv.org/abs/2107.02661 Abstract: Strategies based on Explainable Artificial Intelligence (XAI) have emerged in computing to promote a better understanding of predictions made by black box models. Most XAI-based tools used today explain these types of models by generating attribute rankings, that is, by analyzing Attribute Importance. There is no consensus on which XAI tool generates a general rank of explainability; for this reason, several tools have been proposed (Ciu, Dalex, Eli5, Lofo, Shap and Skater). Here, we present an experimental benchmark of explainable AI techniques capable of producing model-agnostic global explainability ranks based on tabular data related to different problems. We seek to answer questions such as "Are the explanations generated by the different tools the same, similar or different?" and "How does data complexity play along model explainability?". The results from the construction of 82 computational models and 592 ranks shed some light on the other side of the problem of explainability: dataset complexity!
【13】 Automatic size and pose homogenization with spatial transformer network to improve and accelerate pediatric segmentation
Authors: Giammarco La Barbera, Pietro Gori, Haithem Boussaid, Bruno Belucci, Alessandro Delmonte, Jeanne Goulin, Sabine Sarnacki, Laurence Rouet, Isabelle Bloch Affiliations: LTCI, Telecom Paris, Institut Polytechnique de Paris, France, Philips Research Paris, Suresnes, France, IMAG, Imagine Institute, Universite de Paris, France Link: https://arxiv.org/abs/2107.02655 Abstract: Due to a high heterogeneity in pose and size and to a limited amount of available data, segmentation of pediatric images is challenging for deep learning methods. In this work, we propose a new CNN architecture that is pose and scale invariant thanks to the use of a Spatial Transformer Network (STN). Our architecture is composed of three sequential modules that are estimated together during training: (i) a regression module to estimate a similarity matrix to normalize the input image to a reference one; (ii) a differentiable module to find the region of interest to segment; (iii) a segmentation module, based on the popular UNet architecture, to delineate the object. Unlike the original UNet, which strives to learn a complex mapping, including pose and scale variations, from a finite training dataset, our segmentation module learns a simpler mapping focusing on images with normalized pose and size. Furthermore, the use of an automatic bounding box detection through STN saves time and especially memory, while keeping similar performance. We test the proposed method in kidney and renal tumor segmentation on abdominal pediatric CT scans. Results indicate that the estimated STN homogenization of size and pose accelerates the segmentation (25h), compared to standard data-augmentation (33h), while obtaining a similar quality for the kidney (88.01% of Dice score) and improving the renal tumor delineation (from 85.52% to 87.12%).
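The Dice score quoted in this abstract measures the overlap between a predicted mask and a reference mask. A minimal sketch of the standard formula follows, assuming binary masks flattened into 0/1 sequences; the function name and the convention of returning 1.0 when both masks are empty are assumptions, and the paper's exact evaluation protocol is not stated in the abstract.

```python
def dice_score(pred, target):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for binary masks
    given as equal-length flat sequences of 0/1 values."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

# One overlapping voxel, mask sizes 2 and 1: Dice = 2 * 1 / (2 + 1).
example = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

A reported value such as 88.01% corresponds to this quantity multiplied by 100.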
【14】 Multi-Level Graph Contrastive Learning
Authors: Pengpeng Shao, Tong Liu, Dawei Zhang, Jianhua Tao, Feihu Che, Guohua Yang Affiliations: National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China Link: https://arxiv.org/abs/2107.02639 Abstract: Graph representation learning, which aims to learn a discriminative embedding for each node in a graph, has attracted a surge of interest recently. Most of these representation methods focus on supervised learning and heavily depend on label information. However, annotated graphs are expensive to obtain in the real world, especially in specialized domains (e.g., biology), because the annotator needs domain knowledge to label the graph. Self-supervised learning provides a feasible solution to this problem for graph representation learning. In this paper, we propose a Multi-Level Graph Contrastive Learning (MLGCL) framework for learning robust representations of graph data by contrasting space views of graphs. Specifically, we introduce a novel pair of contrastive views: topological and feature-space views. The original graph is a first-order approximation structure and contains uncertainty or error, while the $k$NN graph generated from encoded features preserves high-order proximity. Thus, the $k$NN graph generated from encoded features not only provides a complementary view, but is also better suited for a GNN encoder to extract discriminative representations. Furthermore, we develop a multi-level contrastive mode to preserve the local similarity and semantic similarity of graph-structured data simultaneously. Extensive experiments indicate that MLGCL achieves promising results compared with the existing state-of-the-art graph representation learning methods on seven datasets.
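The feature-space view mentioned above is a $k$NN graph built from node features. A rough sketch of such a construction is given below; the use of Euclidean distance, the directed-edge representation, and the function name are assumptions for illustration, since the abstract does not specify the similarity measure or the encoder that produces the features.

```python
import math

def knn_graph(features, k):
    """Return the set of directed edges (i, j) such that node j is one of
    the k nearest neighbours of node i under Euclidean distance."""
    edges = set()
    for i, fi in enumerate(features):
        # Sort all other nodes by distance to node i (ties broken by index).
        neighbours = sorted(
            (math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i
        )
        for _, j in neighbours[:k]:
            edges.add((i, j))
    return edges

# Hypothetical 2-D features forming two well-separated pairs.
toy_features = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
toy_edges = knn_graph(toy_features, k=1)
```

Nodes that are close in feature space become neighbours even if the original topology does not link them, which is the sense in which this view complements the original graph.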
【15】 Embracing the Dark Knowledge: Domain Generalization Using Regularized Knowledge Distillation
Authors: Yufei Wang, Haoliang Li, Lap-pui Chau, Alex C. Kot Affiliations: Rapid-Rich Object Search Lab, Nanyang Technological University, Department of Electrical Engineering, City University of Hong Kong Comments: Accepted by ACM MM, 2021 Link: https://arxiv.org/abs/2107.02629 Abstract: Though convolutional neural networks are widely used in different tasks, lack of generalization capability in the absence of sufficient and representative data is one of the challenges that hinder their practical application. In this paper, we propose a simple, effective, and plug-and-play training strategy named Knowledge Distillation for Domain Generalization (KDDG), which is built upon a knowledge distillation framework with the gradient filter as a novel regularization term. We find that both the "richer dark knowledge" from the teacher network and the gradient filter we propose can reduce the difficulty of learning the mapping, which further improves the generalization ability of the model. We also conduct extensive experiments to show that our framework can significantly improve the generalization capability of deep neural networks in different tasks, including image classification, segmentation, and reinforcement learning, by comparing our method with existing state-of-the-art domain generalization techniques. Last but not least, we propose to adopt two metrics to analyze our proposed method in order to better understand how it benefits the generalization capability of deep neural networks.
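For readers unfamiliar with the knowledge distillation framework this abstract builds on, the following sketch shows the commonly used distillation loss on temperature-softened outputs. It is the generic formulation, not KDDG itself: the gradient filter regularizer proposed in the paper is not reproduced, and the function names and default temperature are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions,
    scaled by temperature squared as is common in distillation."""
    p = softmax(teacher_logits, temperature)   # soft targets ("dark knowledge")
    q = softmax(student_logits, temperature)   # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

A higher temperature flattens the teacher distribution, so the relative probabilities of the non-target classes contribute more to the loss.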
【16】 How to Discover a Semantic Web Service by Knowing Its Functionality Parameters
Authors: Golsa Heidari, Kamran Zamanifar, Naser Nematbakhsh, Farhad Mardookhi Affiliations: Young Researchers Club, Dept. of Computer Science, Islamic Azad University, Najafabad Branch, Isfahan, Iran, University of Isfahan Comments: 5 pages, 1 figure, 2 tables, ICSTE 2010 Link: https://arxiv.org/abs/2107.02609 Abstract: In this work, we show how to discover a semantic web service among a repository of web services. We present a new approach for web service discovery based on calculating the similarity of service functions. We define the web service functions with the Web Ontology Language (OWL). We wrote some rules for comparing the parameters of two web services. Our algorithm compares the input/output parameters of two web services by building a bipartite graph. We compute the similarity rate by using the Ford-Fulkerson algorithm. The higher the similarity, the smaller the differences between their functions. Finally, our algorithm chooses the service that has the highest similarity. As a consequence, our method is useful when we need to find a web service suitable to replace an existing one that has failed. This situation is very common and important, especially in autonomic systems, since we need to ensure the availability of the application that is based on the failed web service. We use a Universal Description, Discovery and Integration (UDDI) compliant web service registry.
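The bipartite comparison described above can be sketched as a maximum matching between the parameters of two services, found with augmenting paths (Ford-Fulkerson specialized to unit capacities). This is an illustrative sketch only: the `compatible` rule, the parameter names, and the normalization of the similarity rate by the larger parameter count are assumptions, not the authors' OWL-based rules.

```python
def max_bipartite_matching(left, right, compatible):
    """Size of a maximum matching between two parameter lists, where
    compatible(a, b) says whether parameters a and b may be paired."""
    match_of_right = {}  # index in right -> index in left

    def try_augment(i, visited):
        # Search for an augmenting path starting at left node i.
        for j in range(len(right)):
            if j in visited or not compatible(left[i], right[j]):
                continue
            visited.add(j)
            if j not in match_of_right or try_augment(match_of_right[j], visited):
                match_of_right[j] = i
                return True
        return False

    return sum(1 for i in range(len(left)) if try_augment(i, set()))

def similarity(params_a, params_b, compatible):
    """Assumed similarity rate: matched pairs over the larger parameter count."""
    if not params_a and not params_b:
        return 1.0
    matched = max_bipartite_matching(params_a, params_b, compatible)
    return matched / max(len(params_a), len(params_b))

# Hypothetical rule: identical concepts or a listed equivalent pair match.
EQUIVALENT = {("City", "Location")}
def toy_compatible(a, b):
    return a == b or (a, b) in EQUIVALENT or (b, a) in EQUIVALENT

score = similarity(["City", "Date"], ["Date", "Location", "Price"], toy_compatible)
```

Choosing a replacement for a failed service then amounts to computing this score against every candidate in the registry and taking the highest one.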
【17】 Meta-Reinforcement Learning for Heuristic Planning
Authors: Ricardo Luna Gutierrez, Matteo Leonetti Affiliations: University of Leeds Comments: ICAPS 2021 Link: https://arxiv.org/abs/2107.02603 Abstract: In Meta-Reinforcement Learning (meta-RL), an agent is trained on a set of tasks to prepare for and learn faster in new, unseen, but related tasks. The training tasks are usually hand-crafted to be representative of the expected distribution of test tasks, and hence all of them are used in training. We show that, given a set of training tasks, learning can be both faster and more effective (leading to better performance in the test tasks) if the training tasks are appropriately selected. We propose a task selection algorithm, Information-Theoretic Task Selection (ITTS), based on information theory, which optimizes the set of tasks used for training in meta-RL, irrespective of how they are generated. The algorithm establishes which training tasks are both sufficiently relevant for the test tasks and different enough from one another. We reproduce different meta-RL experiments from the literature and show that ITTS improves the final performance in all of them.
【18】 Depth-Aware Multi-Grid Deep Homography Estimation with Contextual Correlation
Authors: Lang Nie, Chunyu Lin, Kang Liao, Shuaicheng Liu, Yao Zhao Link: https://arxiv.org/abs/2107.02524 Abstract: Homography estimation is an important task in computer vision applications such as image stitching, video stabilization, and camera calibration. Traditional homography estimation methods depend heavily on the quantity and distribution of feature points, leading to poor robustness in textureless scenes. Learning-based solutions, in contrast, try to learn robust deep features but show unsatisfactory performance in scenes with low overlap rates. In this paper, we address the two problems simultaneously by designing a contextual correlation layer, which can capture the long-range correlation on feature maps and can be flexibly integrated into a learning framework. In addition, considering that a single homography cannot represent the complex spatial transformation in depth-varying images with parallax, we propose to predict multi-grid homography from global to local. Moreover, we equip our network with depth perception capability by introducing a novel depth-aware shape-preserved loss. Extensive experiments demonstrate the superiority of our method over other state-of-the-art solutions on a synthetic benchmark dataset and a real-world dataset. The codes and models will be available at https://github.com/nie-lang/Multi-Grid-Deep-Homogarphy.
【19】 An Evaluation of Machine Learning and Deep Learning Models for Drought Prediction using Weather Data
Authors: Weiwei Jiang, Jiayun Luo Affiliations: Department of Electronic Engineering, Tsinghua University, Beijing, China; Department of Statistics, University of California-Los Angeles, Los Angeles, USA Comments: GitHub link: this https URL Link: https://arxiv.org/abs/2107.02517 Abstract: Drought is a serious natural disaster that has a long duration and a wide range of influence. To decrease drought-caused losses, drought prediction is the basis for designing the corresponding drought prevention and disaster reduction measures. While this problem has been studied in the literature, it remains unknown whether drought can be precisely predicted with machine learning models using weather data. To answer this question, a real-world public dataset is leveraged in this study, and different drought levels are predicted using the last 90 days of 18 meteorological indicators as the predictors. In a comprehensive approach, 16 machine learning models and 16 deep learning models are evaluated and compared. The results show that no single model achieves the best performance for all evaluation metrics simultaneously, which indicates that the drought prediction problem is still challenging. As benchmarks for further studies, the code and results are publicly available in a GitHub repository.
【20】 Empowering NGOs in Countering Online Hate Messages
Authors: Yi-Ling Chung, Serra Sinem Tekiroglu, Sara Tonelli, Marco Guerini Affiliations: Fondazione Bruno Kessler, Via Sommarive, Povo, Trento, Italy; University of Trento, Italy Comments: Preprint of the paper published in Online Social Networks and Media Journal (OSNEM) Link: https://arxiv.org/abs/2107.02472 Abstract: Studies on online hate speech have mostly focused on the automated detection of harmful messages. Little attention has been devoted so far to the development of effective strategies to fight hate speech, in particular through the creation of counter-messages. While existing manual scrutiny and intervention strategies are time-consuming and not scalable, advances in natural language processing have the potential to provide a systematic approach to hatred management. In this paper, we introduce a novel ICT platform that NGO operators can use to monitor and analyze social media data, along with a counter-narrative suggestion tool. Our platform aims at increasing the efficiency and effectiveness of operators' activities against islamophobia. We test the platform with more than one hundred NGO operators in three countries through qualitative and quantitative evaluation. Results show that NGOs favor the platform solution with the suggestion tool, and that the time required to produce counter-narratives significantly decreases.
【21】 On-edge Multi-task Transfer Learning: Model and Practice with Data-driven Task Allocation
Authors: Zimu Zheng, Qiong Chen, Chuang Hu, Dan Wang, Fangming Liu Link: https://arxiv.org/abs/2107.02466 Abstract: On edge devices, data scarcity is a common problem, for which transfer learning serves as a widely suggested remedy. Nevertheless, transfer learning imposes a heavy computation burden on resource-constrained edge devices. Existing task allocation works usually assume all submitted tasks are equally important, leading to inefficient resource allocation at the task level when directly applied to Multi-task Transfer Learning (MTL). To address these issues, we first reveal that it is crucial to measure the impact of tasks on overall decision performance improvement and to quantify task importance. We then show that task allocation with task importance for MTL (TATIM) is a variant of the NP-complete Knapsack problem, where the complicated computation to solve this problem needs to be conducted repeatedly under varying contexts. To solve TATIM with high computational efficiency, we propose a Data-driven Cooperative Task Allocation (DCTA) approach. Finally, we evaluate the performance of DCTA not only with a trace-driven simulation, but also with a new comprehensive real-world AIOps case study that bridges model and practice via a new architecture and main components design within the AIOps system. Extensive experiments show that our DCTA reduces processing time by a factor of 3.24 and saves 48.4% of energy consumption compared with the state-of-the-art when solving TATIM.
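To make the Knapsack connection in this abstract concrete, the sketch below solves a plain 0/1 knapsack in which each task has an integer resource cost and an importance score. It illustrates only the underlying combinatorial problem, not DCTA or the authors' exact TATIM formulation; the task names, costs, and importance values are invented for the example.

```python
def allocate_tasks(tasks, budget):
    """Pick tasks maximizing total importance within an integer resource budget.

    tasks:  list of (name, cost, importance) with integer cost >= 0.
    Returns (best_total_importance, tuple_of_chosen_names).
    Classic 0/1 knapsack dynamic program, O(len(tasks) * budget).
    """
    # best[b] = best (importance, chosen names) using at most b resource units
    best = [(0.0, ()) for _ in range(budget + 1)]
    for name, cost, importance in tasks:
        # Iterate budgets downwards so each task is used at most once.
        for b in range(budget, cost - 1, -1):
            candidate = best[b - cost][0] + importance
            if candidate > best[b][0]:
                best[b] = (candidate, best[b - cost][1] + (name,))
    return best[budget]


# Hypothetical tasks: (name, resource cost, importance).
tasks = [("detect", 3, 0.6), ("forecast", 4, 0.7), ("diagnose", 2, 0.5)]
total, chosen = allocate_tasks(tasks, budget=5)
```

Because the exact program has to be re-solved whenever costs or importance values change, the abstract's motivation for a cheaper data-driven approach follows directly.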
【22】 EVARS-GPR: EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data
Authors: Florian Haselbeck, Dominik G. Grimm Affiliations: Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Schulgasse, Straubing, Germany; Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Petersgasse, Straubing, Germany Link: https://arxiv.org/abs/2107.02463 Abstract: Time series forecasting is a growing domain with diverse applications. However, changes of the system behavior over time due to internal or external influences are challenging. Therefore, predictions of a previously learned forecasting model might not be useful anymore. In this paper, we present EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data (EVARS-GPR), a novel online algorithm that is able to handle sudden shifts in the target variable scale of seasonal data. For this purpose, EVARS-GPR combines online change point detection with a refitting of the prediction model using data augmentation for samples prior to a change point. Our experiments on simulated data show that EVARS-GPR is applicable for a wide range of output scale changes. EVARS-GPR has on average a 20.8% lower RMSE on different real-world datasets compared to methods with a similar computational resource consumption. Furthermore, we show that our algorithm leads to a six-fold reduction of the averaged runtime in relation to all comparison partners with a periodical refitting strategy. In summary, we present a computationally efficient online forecasting algorithm for seasonal time series with changes of the target variable scale and demonstrate its functionality on simulated as well as real-world data. All code is publicly available on GitHub: https://github.com/grimmlab/evars-gpr.
【23】 Comparing PCG metrics with Human Evaluation in Minecraft Settlement Generation
Authors: Jean-Baptiste Hervé, Christoph Salge Affiliations: University of Hertfordshire, Hatfield, UK Comments: Accepted to the FDG'21 workshop on PCG Link: https://arxiv.org/abs/2107.02457 Abstract: There is a range of metrics that can be applied to the artifacts produced by procedural content generation, and several of them come with qualitative claims. In this paper, we adapt a range of existing PCG metrics to generated Minecraft settlements, develop a few new metrics inspired by PCG literature, and compare the resulting measurements to existing human evaluations. The aim is to analyze how those metrics capture human evaluation scores in different categories, how the metrics generalize to another game domain, and how metrics deal with more complex artifacts. We provide an exploratory look at a variety of metrics and provide an information gain and several correlation analyses. We found some relationships between human scores and metrics counting specific elements, measuring the diversity of blocks, and measuring the presence of crafting materials for the complex blocks that are present.
【24】 Neural Mixture Models with Expectation-Maximization for End-to-end Deep Clustering
Authors: Dumindu Tissera, Kasun Vithanage, Rukshan Wijesinghe, Alex Xavier, Sanath Jayasena, Subha Fernando, Ranga Rodrigo Affiliations: Department of Electronic & Telecommunication Engineering, University of Moratuwa, Sri Lanka; CodeGen QBITS Lab, University of Moratuwa, Sri Lanka Link: https://arxiv.org/abs/2107.02453 Abstract: Any clustering algorithm must synchronously learn to model the clusters and allocate data to those clusters in the absence of labels. Mixture model-based methods model clusters with pre-defined statistical distributions and allocate data to those clusters based on the cluster likelihoods. They iteratively refine those distribution parameters and member assignments following the Expectation-Maximization (EM) algorithm. However, the cluster representability of such hand-designed distributions that employ a limited number of parameters is not adequate for most real-world clustering tasks. In this paper, we realize mixture model-based clustering with a neural network where the final layer neurons, with the aid of an additional transformation, approximate cluster distribution outputs. The network parameters serve as the parameters of those distributions. The result is an elegant and far more general representation of clusters than a restricted mixture of hand-designed distributions. We train the network end-to-end via batch-wise EM iterations where the forward pass acts as the E-step and the backward pass acts as the M-step. In image clustering, the mixture-based EM objective can be used as the clustering objective along with existing representation learning methods. In particular, we show that when mixture-EM optimization is fused with consistency optimization, it improves on the clustering performance of consistency optimization alone. Our trained networks outperform single-stage deep clustering methods that still depend on k-means, with unsupervised classification accuracy of 63.8% in STL10, 58% in CIFAR10, 25.9% in CIFAR100, and 98.9% in MNIST.
【25】 Integrating Circle Kernels into Convolutional Neural Networks
Authors: Kun He, Chao Li, Yixiao Yang, Gao Huang, John E. Hopcroft Affiliations: School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan; Tsinghua University, Beijing; Computer Science Department, Cornell University Comments: 13 pages, 6 figures Link: https://arxiv.org/abs/2107.02451 Abstract: The square kernel is a standard unit for contemporary Convolutional Neural Networks (CNNs), as it fits well with the tensor computation for the convolution operation. However, the receptive field in the human visual system is actually isotropic, like a circle. Motivated by this observation, we propose using circle kernels with isotropic receptive fields for the convolution, and our training takes an approximately equivalent amount of calculation when compared with the corresponding CNN with square kernels. Our preliminary experiments demonstrate the rationality of circle kernels. We then propose a kernel boosting strategy that integrates the circle kernels with square kernels for the training and inference, and we further let the kernel size/radius be learnable during the training. Note that we reparameterize the circle kernels or integrated kernels before the inference, so they incur no extra computation or parameter overhead at test time. Extensive experiments on several standard datasets, ImageNet, CIFAR-10 and CIFAR-100, using the circle kernels or integrated kernels on typical existing CNNs, show that our approach exhibits highly competitive performance. Specifically, on ImageNet with standard data augmentation, our approach dramatically boosts the performance of MobileNetV3-Small by 5.20% top-1 accuracy and 3.39% top-5 accuracy, and boosts the performance of MobileNetV3-Large by 2.16% top-1 accuracy and 1.18% top-5 accuracy.
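As a rough intuition for what an isotropic receptive field means on a pixel grid, the sketch below builds a binary disc mask and applies it to a square kernel. This is a deliberate simplification and an assumption on our part: the abstract does not say how the circle kernel is sampled, and a hard mask is only the crudest possible approximation. The function names and the default radius are hypothetical.

```python
def circle_mask(size, radius=None):
    """Binary mask keeping the kernel positions whose centers lie inside a disc."""
    center = (size - 1) / 2
    if radius is None:
        radius = size / 2  # assumption: disc inscribed in the square kernel
    return [
        [1 if (r - center) ** 2 + (c - center) ** 2 <= radius ** 2 else 0
         for c in range(size)]
        for r in range(size)
    ]


def apply_mask(kernel, mask):
    """Zero out the weights of a square kernel that fall outside the disc."""
    return [[w * m for w, m in zip(k_row, m_row)] for k_row, m_row in zip(kernel, mask)]


mask5 = circle_mask(5)
masked = apply_mask([[1.0] * 5 for _ in range(5)], mask5)
```

With a 5x5 kernel and the default radius, only the four corner weights are dropped, so the masked kernel covers the same area in every direction more evenly than the full square does.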
【26】 Dynamical System Parameter Identification using Deep Recurrent Cell Networks
Authors: Erdem Akagündüz, Oguzhan Cifdaloz Comments: Final version published in Journal of Neural Computing and Applications Link: https://arxiv.org/abs/2107.02427 Abstract: In this paper, we investigate the parameter identification problem in dynamical systems through a deep learning approach. Focusing mainly on second-order, linear time-invariant dynamical systems, we study the topic of damping factor identification. By utilizing a six-layer deep neural network with different recurrent cells, namely GRUs, LSTMs or BiLSTMs, and by feeding input-output sequence pairs captured from a dynamical system simulator, we search for an effective deep recurrent architecture to resolve the damping factor identification problem. Our results show that, although previously not utilized for this task in the literature, bidirectional gated recurrent cells (BiLSTMs) provide better parameter identification results when compared to unidirectional gated recurrent memory cells such as GRUs and LSTMs. This indicates that an input-output sequence pair of finite length, collected from a dynamical system and observed anachronistically, may carry information in both time directions for predicting a dynamical system's parameter.
【27】 Learning Semantic Segmentation of Large-Scale Point Clouds with Random Sampling
Authors: Qingyong Hu, Bo Yang, Linhai Xie, Stefano Rosa, Yulan Guo, Zhihua Wang, Niki Trigoni, Andrew Markham Affiliations: Yang is with the Department of Computing, The Hong Kong Polytechnic University; Guo is with the School of Electronics and Communication Engineering, Sun Yat-sen University Comments: IEEE TPAMI 2021. arXiv admin note: substantial text overlap with arXiv:1911.11236 Link: https://arxiv.org/abs/2107.02389 Abstract: We study the problem of efficient semantic segmentation of large-scale 3D point clouds. Because they rely on expensive sampling techniques or computationally heavy pre/post-processing steps, most existing approaches can only be trained on and operate over small-scale point clouds. In this paper, we introduce RandLA-Net, an efficient and lightweight neural architecture to directly infer per-point semantics for large-scale point clouds. The key to our approach is to use random point sampling instead of more complex point selection approaches. Although remarkably computation and memory efficient, random sampling can discard key features by chance. To overcome this, we introduce a novel local feature aggregation module to progressively increase the receptive field for each 3D point, thereby effectively preserving geometric details. Comparative experiments show that our RandLA-Net can process 1 million points in a single pass up to 200x faster than existing approaches. Moreover, extensive experiments on five large-scale point cloud datasets, including Semantic3D, SemanticKITTI, Toronto3D, NPM3D and S3DIS, demonstrate the state-of-the-art semantic segmentation performance of our RandLA-Net.
【28】 Estimates for the Branching Factors of Atari Games
Authors: Mark J. Nelson Affiliations: American University, Washington, DC, USA Comments: Accepted at IEEE Conference on Games (CoG) 2021 Link: https://arxiv.org/abs/2107.02385 Abstract: The branching factor of a game is the average number of new states reachable from a given state. It is a widely used metric in AI research on board games, but less often computed or discussed for videogames. This paper provides estimates for the branching factors of 103 Atari 2600 games, as implemented in the Arcade Learning Environment (ALE). Depending on the game, ALE exposes between 3 and 18 available actions per frame of gameplay, which is an upper bound on branching factor. This paper shows, based on an enumeration of the first 1 million distinct states reachable in each game, that the average branching factor is usually much lower, in many games barely above 1. In addition to reporting the branching factors, this paper aims to clarify what constitutes a distinct state in ALE.
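The enumeration idea in this abstract can be sketched on a toy environment: explore states breadth-first, count the distinct successors of each expanded state, and average. This is an illustrative sketch, not the paper's procedure. The `toy_step` function is a made-up stand-in for an emulator step, and counting every distinct successor (including the state itself) is an assumption about what "new states" means.

```python
from collections import deque


def estimate_branching_factor(start, actions, step, max_states=1_000_000):
    """Average number of distinct successor states over the expanded states."""
    seen = {start}
    queue = deque([start])
    total_successors = 0
    expanded = 0
    while queue and expanded < max_states:
        state = queue.popleft()
        # Different actions that lead to the same state count only once.
        successors = {step(state, action) for action in actions}
        total_successors += len(successors)
        expanded += 1
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return total_successors / expanded


def toy_step(position, action):
    """Hypothetical 5-cell corridor: only LEFT and RIGHT change the state."""
    if action == "LEFT":
        return max(0, position - 1)
    if action == "RIGHT":
        return min(4, position + 1)
    return position  # NOOP and FIRE leave the state unchanged


actions = ["NOOP", "FIRE", "LEFT", "RIGHT"]
branching = estimate_branching_factor(2, actions, toy_step)
```

Even in this toy case, four available actions yield an average of only 2.6 distinct successors, which mirrors the abstract's point that the action count is just an upper bound.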
【29】 A Short Note on the Relationship of Information Gain and Eluder Dimension
Authors: Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei Affiliations: Princeton University, University of Washington, Microsoft Research Link: https://arxiv.org/abs/2107.02377 Abstract: Eluder dimension and information gain are two widely used complexity measures in bandit and reinforcement learning. Eluder dimension was originally proposed as a general complexity measure of function classes, but the common examples where it is known to be small are function spaces (vector spaces). In these cases, the primary tool to upper bound the eluder dimension is the elliptic potential lemma. Interestingly, the elliptic potential lemma also features prominently in the analysis of linear bandits/reinforcement learning and their nonparametric generalization, the information gain. We show that this is not a coincidence -- eluder dimension and information gain are equivalent in a precise sense for reproducing kernel Hilbert spaces.
【30】 Weighted Gaussian Process Bandits for Non-stationary Environments
Authors: Yuntian Deng, Xingyu Zhou, Baekjin Kim, Ambuj Tewari, Abhishek Gupta, Ness Shroff Affiliations: Ohio State University, Columbus, OH, USA; Wayne State University, Detroit, MI, USA; Department of Statistics, University of Michigan, Ann Arbor, MI, USA; Department of ECE and CSE Link: https://arxiv.org/abs/2107.02371 Abstract: In this paper, we consider the Gaussian process (GP) bandit optimization problem in a non-stationary environment. To capture external changes, the black-box function is allowed to be time-varying within a reproducing kernel Hilbert space (RKHS). To this end, we develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression. A key challenge is how to cope with infinite-dimensional feature maps. To that end, we leverage kernel approximation techniques to prove a sublinear regret bound, which is the first (frequentist) sublinear regret guarantee on weighted time-varying bandits with general nonlinear rewards. This result generalizes both non-stationary linear bandits and standard GP-UCB algorithms. Further, a novel concentration inequality is achieved for weighted Gaussian process regression with general weights. We also provide universal upper bounds and weight-dependent upper bounds for weighted maximum information gains. These results are potentially of independent interest for applications such as news ranking and adaptive pricing, where weights can be adopted to capture the importance or quality of data. Finally, we conduct experiments to highlight the favorable gains of the proposed algorithm in many cases when compared to existing methods.
【31】 Discrete-Valued Neural Communication
Authors: Dianbo Liu, Alex Lamb, Kenji Kawaguchi, Anirudh Goyal, Chen Sun, Michael Curtis Mozer, Yoshua Bengio Affiliations: MIT, MILA and Deepmind, Google Brain and University of Colorado, co-first author Link: https://arxiv.org/abs/2107.02367 Abstract: Deep learning has advanced from fully connected architectures to structured models organized into components, e.g., the transformer composed of positional elements, modular architectures divided into slots, and graph neural nets made up of nodes. In structured models, an interesting question is how to conduct dynamic and possibly sparse communication among the separate components. Here, we explore the hypothesis that restricting the transmitted information among components to discrete representations is a beneficial bottleneck. The motivating intuition is human language, in which communication occurs through discrete symbols. Even though individuals have different understandings of what a "cat" is based on their specific experiences, the shared discrete token makes it possible for communication among individuals to be unimpeded by individual differences in internal representation. To discretize the values of concepts dynamically communicated among specialist components, we extend the quantization mechanism from the Vector-Quantized Variational Autoencoder to multi-headed discretization with shared codebooks and use it for discrete-valued neural communication (DVNC). Our experiments show that DVNC substantially improves systematic generalization in a variety of architectures -- transformers, modular architectures, and graph neural networks. We also show that DVNC is robust to the choice of hyperparameters, making the method very useful in practice. Moreover, we establish a theoretical justification of our discretization process, proving that it has the ability to increase noise robustness and reduce the underlying dimensionality of the model.
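The multi-headed discretization with a shared codebook mentioned in this abstract can be sketched, under assumptions, as nearest-neighbour quantization applied to equal-sized segments of a message vector. The function name, the squared Euclidean distance, and the toy codebook are illustrative choices; training details such as gradient handling and codebook updates are omitted entirely.

```python
def quantize_multihead(vector, codebook, num_heads):
    """Split `vector` into `num_heads` segments and snap each segment to its
    nearest entry in one shared codebook.

    Returns (quantized_vector, chosen_code_indices).
    """
    if len(vector) % num_heads != 0:
        raise ValueError("vector length must be divisible by num_heads")
    seg_len = len(vector) // num_heads
    if any(len(code) != seg_len for code in codebook):
        raise ValueError("every codebook entry must have the segment length")

    quantized, indices = [], []
    for head in range(num_heads):
        segment = vector[head * seg_len:(head + 1) * seg_len]
        # Nearest codebook entry by squared Euclidean distance.
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((s - c) ** 2 for s, c in zip(segment, codebook[k])),
        )
        indices.append(nearest)
        quantized.extend(codebook[nearest])
    return quantized, indices


# Hypothetical shared codebook with three 2-dimensional codes.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
message, codes = quantize_multihead([0.9, 1.2, -0.8, 0.7], codebook, num_heads=2)
```

With K codes and H heads, a message can take one of K**H discrete values while the codebook itself stays small, which is one way to read the appeal of sharing it across heads.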
【32】 Effects of Smart Traffic Signal Control on Air Quality
Authors: Paolo Fazzini, Marco Torre, Valeria Rizza, Francesco Petracchini Affiliations: Institute of Atmospheric Pollution Research, CNR, Rome, Italy Comments: 23 pages, 21 figures. arXiv admin note: substantial text overlap with arXiv:2107.01347 Link: https://arxiv.org/abs/2107.02361 Abstract: Adaptive traffic signal control (ATSC) in urban traffic networks poses a challenging task due to the complicated dynamics arising in traffic systems. In recent years, several approaches based on multi-agent deep reinforcement learning (MARL) have been studied experimentally. These approaches propose distributed techniques in which each signalized intersection is seen as an agent in a stochastic game whose purpose is to optimize the flow of vehicles in its vicinity. In this setting, the system evolves towards an equilibrium among the agents that proves beneficial for the whole traffic network. A recently developed multi-agent variant of the well-established advantage actor-critic (A2C) algorithm, called MA2C (multi-agent A2C), exploits the promising idea of some communication among the agents. In this view, the agents share their strategies with other neighbor agents, thereby stabilizing the learning process even when the agents grow in number and variety. We experimented with MA2C in two traffic networks located in Bologna (Italy) and found that its action translates into a significant decrease in the amount of pollutants released into the environment.
【33】 Leveraging Clinical Context for User-Centered Explainability: A Diabetes Use Case
Authors: Shruthi Chari, Prithwish Chakraborty, Mohamed Ghalwash, Oshani Seneviratne, Elif K. Eyigoz, Daniel M. Gruen, Ching-Hua Chen, Pablo Meyer Rojas, Deborah L. McGuinness Affiliations: Rensselaer Polytechnic Institute (RPI), NY, USA; Center for Computational Health, IBM Research, NY, USA; IBM Watson Health, MA, USA Comments: 4 pages, 4 tables, 3 figures, 2.5 pages of appendices. Accepted, to appear at: KDD Workshop on Applied Data Science for Healthcare (DSHealth), 2021, Virtual Link: https://arxiv.org/abs/2107.02359 Abstract: Academic advances of AI models in high-precision domains, like healthcare, need to be made explainable in order to enhance real-world adoption. Our past studies and ongoing interactions indicate that medical experts can use AI systems with greater trust if there are ways to connect the model inferences about patients to explanations that are tied back to the context of use. Specifically, risk prediction is a complex problem of diagnostic and interventional importance to clinicians, wherein they consult different sources to make decisions. To enable the adoption of the ever-improving AI risk prediction models in practice, we have begun to explore techniques to contextualize such models along three dimensions of interest: the patients' clinical state, AI predictions about their risk of complications, and algorithmic explanations supporting the predictions. We validate the importance of these dimensions by implementing a proof-of-concept (POC) in a type-2 diabetes (T2DM) use case where we assess the risk of chronic kidney disease (CKD) - a common T2DM comorbidity. Within the POC, we include risk prediction models for CKD, post-hoc explainers of the predictions, and other natural-language modules which operationalize domain knowledge and CPGs to provide context. With primary care physicians (PCP) as our end-users, we present our initial results and clinician feedback in this paper. Our POC approach covers multiple knowledge sources and clinical scenarios, blends knowledge to explain data and predictions to PCPs, and received an enthusiastic response from our medical expert.
【34】 Impact of On-Chip Interconnect on In-Memory Acceleration of Deep Neural Networks 标题:片上互连对深度神经网络内存加速的影响
作者:Gokul Krishnan,Sumit K. Mandal,Chaitali Chakrabarti,Jae-sun Seo,Umit Y. Ogras,Yu Cao 机构: Arizona State University, University of Wisconsin-Madison 链接:https://arxiv.org/abs/2107.02358 摘要:随着深度神经网络(DNN)的广泛应用,机器学习算法已经朝着两个不同的方向发展——一个是不断增加的连接密度以获得更好的精度,另一个是更紧凑的尺寸以提高能效。连接密度的增加增加了片上数据的移动,这使得高效的片上通信成为DNN加速器的关键功能。这项工作的贡献是三方面的。首先,我们说明了基于点对点(P2P)的互连无法处理DNNs的大量片上数据移动。第二,我们评估了P2P和片上网络(NoC)互连(具有规则的拓扑结构,如mesh)对于一系列dnn的基于SRAM和ReRAM的内存计算(IMC)架构。分析表明了IMC-DNN加速器最佳互连选择的必要性。最后,我们对不同的dnn进行了实验评估,以获得NoC树和NoC网的IMC架构的性能。我们的结论是,在tile层,NoC树适合于在边缘使用紧凑的dnn,而NoC网格对于加速具有高连接密度的dnn是必要的。此外,我们提出了一种技术,以确定最佳选择互连任何给定的DNN。在该技术中,我们使用NoC的分析模型来评估任何给定DNN的端到端通信延迟。我们证明了IMC架构中的互连优化使得VGG-19推断的能量延迟面积积比最新的ReRAM架构提高了6$倍。 摘要:With the widespread use of Deep Neural Networks (DNNs), machine learning algorithms have evolved in two diverse directions -- one with ever-increasing connection density for better accuracy and the other with more compact sizing for energy efficiency. The increase in connection density increases on-chip data movement, which makes efficient on-chip communication a critical function of the DNN accelerator. The contribution of this work is threefold. First, we illustrate that the point-to-point (P2P)-based interconnect is incapable of handling a high volume of on-chip data movement for DNNs. Second, we evaluate P2P and network-on-chip (NoC) interconnect (with a regular topology such as a mesh) for SRAM- and ReRAM-based in-memory computing (IMC) architectures for a range of DNNs. This analysis shows the necessity for the optimal interconnect choice for an IMC DNN accelerator. Finally, we perform an experimental evaluation for different DNNs to empirically obtain the performance of the IMC architecture with both NoC-tree and NoC-mesh. We conclude that, at the tile level, NoC-tree is appropriate for compact DNNs employed at the edge, and NoC-mesh is necessary to accelerate DNNs with high connection density. 
Furthermore, we propose a technique to determine the optimal choice of interconnect for any given DNN. In this technique, we use analytical models of NoC to evaluate end-to-end communication latency of any given DNN. We demonstrate that the interconnect optimization in the IMC architecture results in up to 6$\times$ improvement in energy-delay-area product for VGG-19 inference compared to the state-of-the-art ReRAM-based IMC architectures.
【35】 Proof Generation in CDSAT 标题:CDSAT中的证明生成
作者:Maria Paola Bonacina 机构:Dipartimento di Informatica, Università degli Studi di Verona, Verona, Italy 链接:https://arxiv.org/abs/2107.02351 摘要:总结了冲突驱动的可满足性(CDSAT)框架的主要思想,提出了CDSAT中的证明生成方法。 摘要:The main ideas in the CDSAT (Conflict-Driven Satisfiability) framework for SMT are summarized, leading to approaches to proof generation in CDSAT.
【36】 Multi-Modal Mutual Information (MuMMI) Training for Robust Self-Supervised Deep Reinforcement Learning 标题:鲁棒自监督深度强化学习的多模态互信息(MuMMI)训练
作者:Kaiqi Chen,Yong Lee,Harold Soh 机构:Dept. of Computer Science, National University of Singapore. 备注:10 pages, Published in ICRA 2021 链接:https://arxiv.org/abs/2107.02339 摘要:这项工作的重点是学习有用和强大的深世界模型使用多个,可能不可靠的,传感器。我们发现,目前的方法不足以鼓励模式之间的共同代表性;这可能会导致下游任务的性能不佳,以及对特定传感器的过度依赖。作为一个解决方案,我们提出了一个新的多模态深潜状态空间模型,利用互信息下界进行训练。关键的创新是一个特别设计的密度比估计器,鼓励每个模态的潜在代码之间的一致性。我们的任务是学习多模态自然MuJoCo基准上的策略(以自我监督的方式),以及一个具有挑战性的擦表任务。实验表明,我们的方法明显优于现有的深度强化学习方法,尤其是在缺少观察的情况下。 摘要:This work focuses on learning useful and robust deep world models using multiple, possibly unreliable, sensors. We find that current methods do not sufficiently encourage a shared representation between modalities; this can cause poor performance on downstream tasks and over-reliance on specific sensors. As a solution, we contribute a new multi-modal deep latent state-space model, trained using a mutual information lower-bound. The key innovation is a specially-designed density ratio estimator that encourages consistency between the latent codes of each modality. We tasked our method to learn policies (in a self-supervised manner) on multi-modal Natural MuJoCo benchmarks and a challenging Table Wiping task. Experiments show our method significantly outperforms state-of-the-art deep reinforcement learning methods, particularly in the presence of missing observations.
【37】 Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering 标题:当心你的异类!视觉答疑中离群点对主动学习的负面影响研究
作者:Siddharth Karamcheti,Ranjay Krishna,Li Fei-Fei,Christopher D. Manning 机构:Department of Computer Science, Stanford University 备注:Accepted at ACL-IJCNLP 2021. 17 pages, 16 Figures 链接:https://arxiv.org/abs/2107.02331 摘要:主动学习有望缓解有监督机器学习对海量数据的需求:在主题分类和目标识别等传统任务中,它成功地将样本效率提高了一个数量级。然而,我们发现了一个惊人的反差,这一承诺:跨5个模型和4个数据集的任务,视觉问答,各种各样的主动学习方法无法超越随机选择。为了理解这种差异,我们在每个示例的基础上分析了8种主动学习方法,并将问题确定为集体异常值——主动学习方法更喜欢获取但模型无法学习的示例组(例如,询问图像中的文本或需要外部知识的问题)。通过系统的消融实验和定性可视化,我们验证了集体异常值是导致基于池的主动学习退化的普遍现象。值得注意的是,我们发现,随着主动学习池中集体异常值的减少,主动学习样本效率显著提高。最后,我们进行了讨论,并提出了在未来工作中减轻这些异常值影响的建议。 摘要:Active learning promises to alleviate the massive data needs of supervised machine learning: it has successfully improved sample efficiency by an order of magnitude on traditional tasks like topic classification and object recognition. However, we uncover a striking contrast to this promise: across 5 models and 4 datasets on the task of visual question answering, a wide variety of active learning approaches fail to outperform random selection. To understand this discrepancy, we profile 8 active learning methods on a per-example basis, and identify the problem as collective outliers -- groups of examples that active learning methods prefer to acquire but models fail to learn (e.g., questions that ask about text in images or require external knowledge). Through systematic ablation experiments and qualitative visualizations, we verify that collective outliers are a general phenomenon responsible for degrading pool-based active learning. Notably, we show that active learning sample efficiency increases significantly as the number of collective outliers in the active learning pool decreases. We conclude with a discussion and prescriptive recommendations for mitigating the effects of these outliers in future work.
【38】 Polarized skylight orientation determination artificial neural network 标题:偏振天窗方位确定人工神经网络
作者:Huaju Liang,Hongyang Bai,Ke Hu,Xinbo Lv 机构:School of Energy and Power Engineering, Nanjing University of Science and Technology, Nanjing, School of Automation, Nanjing University of Science and Technology, Nanjing , China 链接:https://arxiv.org/abs/2107.02328 摘要:本文提出了一种利用偏光天窗确定方位的人工神经网络方法。该神经网络具有特定的扩张卷积,可以提取不同偏振方向的光强信息。然后在网络中直接提取偏振度(DOP)和偏振角(AOP)。另外,网络输出采用了方向指数函数编码,能更好地反映昆虫对偏振信息的编码,提高了定位精度。最后,在一个公共偏光天窗导航数据集上进行了训练和测试,实验结果证明了网络的稳定性和有效性。 摘要:This paper proposes an artificial neural network to determine orientation using polarized skylight. This neural network has specific dilated convolution, which can extract light intensity information of different polarization directions. Then, the degree of polarization (DOP) and angle of polarization (AOP) are directly extracted in the network. In addition, the exponential function encoding of orientation is designed as the network output, which can better reflect the insect's encoding of polarization information, and improve the accuracy of orientation determination. Finally, training and testing were conducted on a public polarized skylight navigation dataset, and the experimental results proved the stability and effectiveness of the network.
【39】 Pedestrian Emergence Estimation and Occlusion-Aware Risk Assessment for Urban Autonomous Driving 标题:城市自动驾驶行人涌现估计与遮挡感知风险评估
作者:Mert Koc,Ekim Yurtsever,Keith Redmill,Umit Ozguner 机构:Department of Electrical and Computer Engineering, The Ohio State University 备注:Accepted to ITSC2021 链接:https://arxiv.org/abs/2107.02326 摘要:避免看不见或部分闭塞的脆弱道路使用者(VRU)是城市场景中完全自主驾驶的一个主要挑战。然而,遮挡感知的风险评估系统还没有得到广泛的研究。在这里,我们提出了一个行人出现估计和遮挡感知的城市自主驾驶风险评估系统。首先,提出的系统利用可用的上下文信息,如可见的汽车和行人,估计行人出现在闭塞区域的概率。这些概率然后用于风险评估框架,并纳入纵向运动控制器。所提出的控制器测试了几个基线控制器,这些控制器概括了一些常见的驾驶风格。模拟的测试场景包括随机停放的汽车和行人,他们中的大多数被挡在汽车的视线之外,随机出现。提出的控制器在安全性和舒适性方面优于基线。 摘要:Avoiding unseen or partially occluded vulnerable road users (VRUs) is a major challenge for fully autonomous driving in urban scenes. However, occlusion-aware risk assessment systems have not been widely studied. Here, we propose a pedestrian emergence estimation and occlusion-aware risk assessment system for urban autonomous driving. First, the proposed system utilizes available contextual information, such as visible cars and pedestrians, to estimate pedestrian emergence probabilities in occluded regions. These probabilities are then used in a risk assessment framework, and incorporated into a longitudinal motion controller. The proposed controller is tested against several baseline controllers that recapitulate some commonly observed driving styles. The simulated test scenarios include randomly placed parked cars and pedestrians, most of whom are occluded from the ego vehicle's view and emerge randomly. The proposed controller outperformed the baselines in terms of safety and comfort measures.
【40】 A visual introduction to Gaussian Belief Propagation 标题:高斯信念传播的可视化介绍
作者:Joseph Ortiz,Talfan Evans,Andrew J. Davison 机构:Imperial College London,DeepMind 备注:See online version of this article: this https URL 链接:https://arxiv.org/abs/2107.02308 摘要:在这篇文章中,我们提出了一个可视化的介绍高斯信念传播(GBP),一个近似的概率推理算法,通过在任意结构的因子图的节点之间传递消息来进行操作。作为循环信念传播的一种特殊情况,GBP更新只依赖于局部信息,并且将独立于消息调度而收敛。我们的主要论点是,考虑到计算硬件的最新趋势,GBP具有正确的计算特性,可以作为未来机器学习系统的可伸缩分布式概率推理框架。 摘要:In this article, we present a visual introduction to Gaussian Belief Propagation (GBP), an approximate probabilistic inference algorithm that operates by passing messages between the nodes of arbitrarily structured factor graphs. A special case of loopy belief propagation, GBP updates rely only on local information and will converge independently of the message schedule. Our key argument is that, given recent trends in computing hardware, GBP has the right computational properties to act as a scalable distributed probabilistic inference framework for future machine learning systems.
【41】 Knowledge Modelling and Active Learning in Manufacturing 标题:制造业中的知识建模与主动学习
作者:Jože M. Rožanec,Inna Novalija,Patrik Zajec,Klemen Kenda,Dunja Mladenić 机构:Jožef Stefan Institute, Jamova, Ljubljana, Slovenia, Jožef Stefan International Postgraduate School, Jamova, Ljubljana, Slovenia 链接:https://arxiv.org/abs/2107.02298 摘要:制造领域日益数字化,需要建立足够的知识模型来获取相关信息。本体论和知识图提供了建模和关联各种概念、问题和配置的方法。两者都可以通过演绎推理产生新的知识,识别缺失的知识。数字化增加了可用的数据量,但许多数据没有标记,不能直接用于训练有监督的机器学习模型。主动学习可以用来识别信息量最大的数据实例,从而获得用户的反馈,减少摩擦,最大限度地获取知识。通过将语义技术和主动学习相结合,可以利用现有的知识和数据来解决制造领域中的多个用例。 摘要:The increasing digitalization of the manufacturing domain requires adequate knowledge modeling to capture relevant information. Ontologies and Knowledge Graphs provide means to model and relate a wide range of concepts, problems, and configurations. Both can be used to generate new knowledge through deductive inference and identify missing knowledge. While digitalization increases the amount of data available, much data is not labeled and cannot be directly used to train supervised machine learning models. Active learning can be used to identify the most informative data instances for which to obtain users' feedback, reduce friction, and maximize knowledge acquisition. By combining semantic technologies and active learning, multiple use cases in the manufacturing domain can be addressed taking advantage of the available knowledge and data.
【42】 A Review of Explainable Artificial Intelligence in Manufacturing 标题:制造业中的可解释人工智能研究综述
作者:Georgios Sofianidis,Jože M. Rožanec,Dunja Mladenić,Dimosthenis Kyriazis 机构:Department of Digital Systems, University of Piraeus, Piraeus, Greece, Jožef Stefan Institute, Jamova, Ljubljana, Slovenia, Jožef Stefan International Postgraduate School, Jamova, Ljubljana, Slovenia 链接:https://arxiv.org/abs/2107.02295 摘要:人工智能(AI)系统在制造领域的实施,利用了诸如深度学习和强化学习技术等强大工具,实现了更高的生产效率、卓越的性能和更安全的操作。尽管这些模型精度很高,但它们大多被认为是黑匣子:人类无法理解。不透明性影响对系统的信任,这是决策过程中的一个关键因素。本文概述了可解释人工智能(XAI)技术作为提高模型透明度的一种手段。我们分析不同的指标来评估这些技术,并描述了在制造领域的几个应用场景。 摘要:The implementation of Artificial Intelligence (AI) systems in the manufacturing domain enables higher production efficiency, outstanding performance, and safer operations, leveraging powerful tools such as deep learning and reinforcement learning techniques. Despite the high accuracy of these models, they are mostly considered black boxes: they are unintelligible to the human. Opaqueness affects trust in the system, a factor that is critical in the context of decision-making. We present an overview of Explainable Artificial Intelligence (XAI) techniques as a means of boosting the transparency of models. We analyze different metrics to evaluate these techniques and describe several application scenarios in the manufacturing domain.
【43】 Weakly Supervised Named Entity Tagging with Learnable Logical Rules 标题:具有可学习逻辑规则的弱监督命名实体标注
作者:Jiacheng Li,Haibo Ding,Jingbo Shang,Julian McAuley,Zhe Feng 机构:University of California, San Diego, Bosch Research North America 链接:https://arxiv.org/abs/2107.02282 摘要:研究了利用少量规则作为弱监督来构建实体标注系统的问题。以往的方法主要是基于上下文和专家提供的规则对实体类型进行消歧,而假设实体跨度是给定的。在这项工作中,我们提出了一种新的方法,即引导高质量的逻辑规则来训练一个完全自动化的神经标记器。具体来说,我们引入由简单规则组成的复合规则,以提高边界检测的精度,并生成更多样化的伪标签。我们进一步设计了一个动态标签选择策略,以确保伪标签的质量,从而避免过度拟合神经标记器。在三个数据集上的实验表明,该方法的性能优于其他弱监督方法,甚至可以与现有的远程监督标记方法相媲美,该方法仅从20条简单规则出发,词库超过2000条。我们的方法可以作为在新兴领域和任务中快速构建标记器的工具。案例研究表明,学习规则可以解释预测实体。 摘要:We study the problem of building entity tagging systems by using a few rules as weak supervision. Previous methods mostly focus on disambiguation entity types based on contexts and expert-provided rules, while assuming entity spans are given. In this work, we propose a novel method TALLOR that bootstraps high-quality logical rules to train a neural tagger in a fully automated manner. Specifically, we introduce compound rules that are composed from simple rules to increase the precision of boundary detection and generate more diverse pseudo labels. We further design a dynamic label selection strategy to ensure pseudo label quality and therefore avoid overfitting the neural tagger. Experiments on three datasets demonstrate that our method outperforms other weakly supervised methods and even rivals a state-of-the-art distantly supervised tagger with a lexicon of over 2,000 terms when starting from only 20 simple rules. Our method can serve as a tool for rapidly building taggers in emerging domains and tasks. Case studies show that learned rules can potentially explain the predicted entities.
【44】 Dueling Bandits with Adversarial Sleeping 标题:对抗性睡眠决斗土匪
作者:Aadirupa Saha,Pierre Gaillard 链接:https://arxiv.org/abs/2107.02274 摘要:介绍了具有随机偏好和对抗可用性的睡眠决斗土匪问题(DB-SPAA)。在几乎所有的决斗土匪应用中,决策空间往往随时间而变化;例如,零售店管理,网上购物,餐厅推荐,搜索引擎优化等。令人惊讶的是,这'睡觉方面'决斗土匪从来没有研究过的文献。与决斗强盗一样,目标是通过顺序查询项目对的偏好反馈来与最佳手臂竞争。然而,非平凡性的结果是由于非平稳项空间,允许任何任意子集项在每一轮都不可用。其目的是找到一个最佳的“无遗憾”政策,可以确定最好的可用项目在每一轮,而不是标准的“固定最佳武器遗憾目标”决斗土匪。我们首先导出DB-SPAA的实例特定下界$\Omega\left(\sum_{i=1}^{K-1}\sum_{j=i+1}^{K} \frac{\log T}{\Delta(i,j)}\right)$,其中$K$是项目数,$\Delta(i,j)$是项目$i$和$j$之间的差距。这表明偏好反馈下的睡眠问题比经典的多臂土匪问题更为困难。然后我们提出了两种算法,具有近似最优的遗憾保证。我们的结果得到了经验的证实。 摘要:We introduce the problem of sleeping dueling bandits with stochastic preferences and adversarial availabilities (DB-SPAA). In almost all dueling bandit applications, the decision space often changes over time, e.g., retail store management, online shopping, restaurant recommendation, search engine optimization, etc. Surprisingly, this 'sleeping aspect' of dueling bandits has never been studied in the literature. Like dueling bandits, the goal is to compete with the best arm by sequentially querying the preference feedback of item pairs. The non-triviality, however, stems from the non-stationary item space, which allows arbitrary subsets of items to become unavailable in each round. The goal is to find an optimal 'no-regret' policy that can identify the best available item at each round, as opposed to the standard 'fixed best-arm regret objective' of dueling bandits. We first derive an instance-specific lower bound for DB-SPAA, $\Omega\left(\sum_{i=1}^{K-1}\sum_{j=i+1}^{K} \frac{\log T}{\Delta(i,j)}\right)$, where $K$ is the number of items and $\Delta(i,j)$ is the gap between items $i$ and $j$. This indicates that the sleeping problem with preference feedback is inherently more difficult than that for classical multi-armed bandits (MAB). We then propose two algorithms with near-optimal regret guarantees. Our results are corroborated empirically.
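示例 (illustrative, not from the paper): The lower bound quoted in the abstract above sums $\log T / \Delta(i,j)$ over all item pairs $i < j$. A minimal Python sketch of that double sum follows, assuming a symmetric gap matrix with strictly positive off-diagonal entries; the function name `pairwise_log_gap_sum`, the 3-item gap values, and the horizon are hypothetical, and the constant hidden by the $\Omega$ notation is ignored.

```python
import math


def pairwise_log_gap_sum(gaps, horizon):
    """Return sum over pairs i < j of log(horizon) / gaps[i][j].

    `gaps` is a K x K matrix of pairwise gaps Delta(i, j); only entries
    with i < j are read. This is the expression inside the Omega(...)
    bound, not the bound itself (the hidden constant is unknown here).
    """
    k = len(gaps)
    total = 0.0
    for i in range(k - 1):
        for j in range(i + 1, k):
            if gaps[i][j] <= 0:
                raise ValueError("gaps must be strictly positive for i != j")
            total += math.log(horizon) / gaps[i][j]
    return total


# Hypothetical gap matrix for K = 3 items (made-up values for illustration).
example_gaps = [
    [0.0, 0.1, 0.2],
    [0.1, 0.0, 0.1],
    [0.2, 0.1, 0.0],
]
value = pairwise_log_gap_sum(example_gaps, horizon=10_000)
print(f"sum of log(T)/Delta(i,j) over pairs: {value:.2f}")
```

The sum grows as any pairwise gap shrinks, which matches the intuition that items with nearly equal preference are harder to tell apart and so contribute more to the regret lower bound.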
【45】 VolNet: Estimating Human Body Part Volumes from a Single RGB Image 标题:VolNet:从单个RGB图像估计人体部位体积
作者:Fabian Leinen,Vittorio Cozzolino,Torsten Schön 机构:Technical University of Munich, Audi AG, Technische Hochschule Ingolstadt 链接:https://arxiv.org/abs/2107.02259 摘要:从单个RGB图像估计人体体积是一个具有挑战性的问题,尽管研究界对此关注甚少。然而,VolNet是一种利用2D和3D姿势估计、身体部位分割和从单个2D RGB图像中提取的体积回归并结合受试者身高的结构,可用于估计总身体体积。VolNet被设计用于预测中间任务中的二维和三维姿势以及身体部位分割。我们生成了一个合成的、大规模的人体照片真实感图像数据集,这些图像具有广泛的身体形状和真实的姿势,称为SURREALvols。通过使用VolNet并结合多重叠加沙漏网络和ResNeXt,我们的模型在10%的容许阈值下正确预测了约82%的病例的体积。这是一个相当大的改善相比,国家的最先进的解决方案,如BodyNet只有约38%的成功率。 摘要:Human body volume estimation from a single RGB image is a challenging problem that has received minimal attention from the research community. However, VolNet, an architecture leveraging 2D and 3D pose estimation, body part segmentation, and volume regression extracted from a single 2D RGB image combined with the subject's body height, can be used to estimate the total body volume. VolNet is designed to predict the 2D and 3D pose as well as the body part segmentation in intermediate tasks. We generated a synthetic, large-scale dataset of photo-realistic images of human bodies with a wide range of body shapes and realistic poses called SURREALvols. By using VolNet and combining multiple stacked hourglass networks together with ResNeXt, our model correctly predicted the volume in ~82% of cases with a 10% tolerance threshold. This is a considerable improvement compared to state-of-the-art solutions such as BodyNet with only a ~38% success rate.
【46】 Vision Xformers: Efficient Attention for Image Classification 标题:视觉变形器:图像分类的有效关注点
作者:Pranav Jeevan,Amit Sethi 机构:Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, India 备注:7 pages, 4 figures 链接:https://arxiv.org/abs/2107.02239 摘要:线性注意机制为克服二次复杂度的瓶颈提供了希望,二次复杂度限制了transformer模型在视觉任务中的应用。我们修改了ViT体系结构,将二次注意替换为高效的变换器,如Performer、Linformer和Nyströmformer,线性复杂度,创建视觉X-formers(ViX),从而处理更长的序列数据。结果表明,ViX比ViT具有更好的图像分类性能,占用较少的计算资源。我们进一步证明在ViX中用卷积层代替嵌入线性层可以进一步提高其性能。我们对LeViT和Compact Convolutional Transformer(CCT)等最新vision transformer模型的测试表明,用Nyströmformer或Performer替换注意力可以节省GPU的使用和内存,而不会降低性能。合并这些变化可以使数据和计算资源有限的人能够访问Transformer,从而使Transformer民主化。 摘要:Linear attention mechanisms provide hope for overcoming the bottleneck of quadratic complexity which restricts application of transformer models in vision tasks. We modify the ViT architecture to work on longer sequence data by replacing the quadratic attention with efficient transformers of linear complexity, such as Performer, Linformer and Nyströmformer, creating Vision X-formers (ViX). We show that ViX performs better than ViT in image classification while consuming fewer computing resources. We further show that replacing the embedding linear layer by convolutional layers in ViX further increases their performance. Our tests on recent vision transformer models like LeViT and Compact Convolutional Transformer (CCT) show that replacing the attention with Nyströmformer or Performer saves GPU usage and memory without deteriorating performance. Incorporating these changes can democratize transformers by making them accessible to those with limited data and computing resources.
【47】 End-to-End Weak Supervision 标题:端到端监管不力
作者:Salva Rühling Cachay,Benedikt Boecking,Artur Dubrawski 机构:Technical University of Darmstadt, Carnegie Mellon University 链接:https://arxiv.org/abs/2107.02233 摘要:聚合多个弱监督源(WS)可以通过替换繁琐的人工收集基本事实标签来缓解许多机器学习应用程序中普遍存在的数据标签瓶颈。然而,目前不使用任何标记训练数据的最新方法需要两个独立的建模步骤:基于WS源学习概率潜变量模型——做出实践中很少成立的假设——然后进行下游模型训练。重要的是,建模的第一步不考虑下游模型的性能。为了解决这些问题,我们提出了一种端到端的直接学习下游模型的方法,通过最大化其与通过使用神经网络重新参数化以前的概率后验概率而生成的概率标签的一致性。我们的结果显示,在下游测试集的终端模型性能方面,以及在弱监督源之间的依赖性方面,性能比以前的工作有了改进。 摘要:Aggregating multiple sources of weak supervision (WS) can ease the data-labeling bottleneck prevalent in many machine learning applications, by replacing the tedious manual collection of ground truth labels. Current state of the art approaches that do not use any labeled training data, however, require two separate modeling steps: Learning a probabilistic latent variable model based on the WS sources -- making assumptions that rarely hold in practice -- followed by downstream model training. Importantly, the first step of modeling does not consider the performance of the downstream model. To address these caveats we propose an end-to-end approach for directly learning the downstream model by maximizing its agreement with probabilistic labels generated by reparameterizing previous probabilistic posteriors with a neural network. Our results show improved performance over prior work in terms of end model performance on downstream test sets, as well as in terms of improved robustness to dependencies among weak supervision sources.
【48】 Meta-learning Amidst Heterogeneity and Ambiguity 标题:异质与歧义中的元学习
作者:Kyeongryeol Go,Seyoung Yun 机构:Graduate School of AI, KAIST, Daejeon, South Korea 链接:https://arxiv.org/abs/2107.02228 摘要:元学习的目的是学习一个模型,它可以处理由未知但共享的分布生成的多个任务。然而,典型的元学习算法假设任务是相似的,这样一个元学习者就足以聚合各个方面的变化。此外,当有限的信息作为上下文时,对不确定性的考虑较少。在本文中,我们设计了一个新的元学习框架,称为异质性和歧义中的元学习(MAHA),它在基于任务识别能力的预测方面优于以往的工作。通过大量的回归和分类实验,我们证明了该模型的有效性,该模型对任务异质性和模糊性都具有鲁棒性。 摘要:Meta-learning aims to learn a model that can handle multiple tasks generated from an unknown but shared distribution. However, typical meta-learning algorithms have assumed the tasks to be similar such that a single meta-learner is sufficient to aggregate the variations in all aspects. In addition, there has been less consideration on uncertainty when limited information is given as context. In this paper, we devise a novel meta-learning framework, called Meta-learning Amidst Heterogeneity and Ambiguity (MAHA), that outperforms previous works in terms of prediction based on its ability on task identification. By extensively conducting several experiments in regression and classification, we demonstrate the validity of our model, which turns out to be robust to both task heterogeneity and ambiguity.
【49】 An Evolutionary Algorithm for Task Scheduling in Crowdsourced Software Development 标题:众包软件开发中任务调度的一种进化算法
作者:Razieh Saremi,Hardik Yagnik,Julian Togelius,Ye Yang,Guenther Ruhe 机构:Stevens Institute of Technology, Hoboken NJ, USA, New York University, NYC NY, USA, University of Calgary, Calgary, Alberta, Canada 备注:16 pages, 5 figures, 3 tables 链接:https://arxiv.org/abs/2107.02202 摘要:软件任务的复杂性和众包开发者行为的不确定性使得规划众包软件开发(CSD)项目具有挑战性。在竞争激烈的众包市场中,来自多个同时开放的任务的共享工作者资源的竞争为软件众包的潜在结果增加了另一层不确定性。这些因素导致需要通过自动化调度来支持CSD经理,以提高众包流程和结果的可见性和可预测性。为此,本文提出了一种基于进化算法的众包软件开发任务调度方法。提出的进化调度方法采用多目标遗传算法来推荐一个最优的任务开始日期。该方法使用三个适应度函数,分别基于项目工期、任务相似度和任务失败预测。任务失败适应度函数使用神经网络来预测特定任务开始日期的任务失败概率。然后,该方法为整个项目和每个单独的任务推荐最佳的任务开始日期,以达到最低的项目失败率。对4个项目的实验结果表明,该方法有可能缩短项目工期33-78%。 摘要:The complexity of software tasks and the uncertainty of crowd developer behaviors make it challenging to plan crowdsourced software development (CSD) projects. In a competitive crowdsourcing marketplace, competition for shared worker resources from multiple simultaneously open tasks adds another layer of uncertainty to the potential outcomes of software crowdsourcing. These factors lead to the need for supporting CSD managers with automated scheduling to improve the visibility and predictability of crowdsourcing processes and outcomes. To that end, this paper proposes an evolutionary algorithm-based task scheduling method for crowdsourced software development. The proposed evolutionary scheduling method uses a multiobjective genetic algorithm to recommend an optimal task start date. The method uses three fitness functions, based on project duration, task similarity, and task failure prediction, respectively. The task failure fitness function uses a neural network to predict the probability of task failure with respect to a specific task start date. The proposed method then recommends the best task start dates for the project as a whole and for each individual task so as to achieve the lowest project failure ratio. 
Experimental results on 4 projects demonstrate that the proposed method has the potential to reduce project duration by 33-78%.
【50】 Agents that Listen: High-Throughput Reinforcement Learning with Multiple Sensory Systems 标题:倾听的代理:多感觉系统的高通量强化学习
作者:Shashank Hegde,Anssi Kanervisto,Aleksei Petrenko 机构:University of Southern California, Los Angeles, United States, University of Eastern Finland, Joensuu, Finland 备注:To appear in IEEE Conference on Games 2021. Video demonstrations and experiment can be found at this https URL 链接:https://arxiv.org/abs/2107.02195 摘要:人类和其他聪明的动物进化出高度复杂的感知系统,将多种感官形态结合起来。另一方面,最先进的人工智能体主要依靠视觉输入或仪器化环境提供的结构化低维观察。基于视觉和听觉的联合输入来学习行为仍然是一个新的研究课题,除了简单的场景之外,还没有被探索过。为了促进这方面的进展,我们引入了新版本的VizDoom模拟器,以创建一个高效的学习环境,提供原始音频观察。我们研究了不同模型架构在一系列任务中的性能,这些任务要求代理识别声音并执行以自然语言给出的指令。最后,我们训练我们的代理人玩完整的游戏的厄运,并发现它可以始终击败传统的视觉为基础的对手。我们目前正在将增强的模拟器与主ViZDoom代码库合并。视频演示和实验代码可以在https://sites.google.com/view/sound-rl. 摘要:Humans and other intelligent animals evolved highly sophisticated perception systems that combine multiple sensory modalities. On the other hand, state-of-the-art artificial agents rely mostly on visual inputs or structured low-dimensional observations provided by instrumented environments. Learning to act based on combined visual and auditory inputs is still a new topic of research that has not been explored beyond simple scenarios. To facilitate progress in this area we introduce a new version of VizDoom simulator to create a highly efficient learning environment that provides raw audio observations. We study the performance of different model architectures in a series of tasks that require the agent to recognize sounds and execute instructions given in natural language. Finally, we train our agent to play the full game of Doom and find that it can consistently defeat a traditional vision-based adversary. We are currently in the process of merging the augmented simulator with the main ViZDoom code repository. Video demonstrations and experiment code can be found at https://sites.google.com/view/sound-rl.
【51】 Identifying negativity factors from social media text corpus using sentiment analysis method 标题:使用情感分析方法从社交媒体文本语料库中识别负面因素
作者:Mohammad Aimal,Maheen Bakhtyar,Junaid Baber,Sadia Lakho,Umar Mohammad,Warda Ahmed,Jahanvash Karim 机构:Department of CS and IT, University of Balochistan, SBK Women’s University Balochistan, Institute of Management Sciences 链接:https://arxiv.org/abs/2107.02175 摘要:自动情感分析在决策中起着至关重要的作用。许多组织花费大量预算,通过手动查看反馈/评论或tweet来了解客户满意度。自动情绪分析可以给出针对任何事件、产品或活动收到的评论的总体情况。通常,评论/推文分为两大类,一类是负面的,一类是正面的。然而,负面评论过于抽象,难以理解其根本原因和语境。组织有兴趣找出负面影响的确切原因。在这项研究中,我们将负面评论分层次进行,并将它们与更多的类联系起来。tweet是从Twitter和Facebook等社交媒体网站提取的。如果情绪分析将任何tweet分类为负面类,那么我们进一步尝试将负面评论与更多可能的负面类联系起来。根据专家意见,负面评论/微博进一步分为8类。评估了不同的机器学习算法,并报告了它们的精度。 摘要:Automatic sentiment analysis plays a vital role in decision making. Many organizations spend a large budget to understand their customers' satisfaction by manually going over feedback, comments, or tweets. Automatic sentiment analysis can give an overall picture of the comments received about any event, product, or activity. Usually, the comments/tweets are classified into two main classes, negative and positive. However, the negative class is too abstract to reveal the underlying reason or context, and organizations are interested in identifying the exact reason for the negativity. In this research study, we go down hierarchically into the negative comments and link them with more classes. Tweets are extracted from social media sites such as Twitter and Facebook. If the sentiment analysis classifies a tweet into the negative class, we then try to associate that negative comment with more specific negative classes. Based on expert opinions, the negative comments/tweets are further classified into 8 classes. Different machine learning algorithms are evaluated and their accuracies are reported.
【52】 WisdomNet: Prognosis of COVID-19 with Slender Prospect of False Negative Cases and Vaticinating the Probability of Maturation to ARDS using Posteroanterior Chest X-Rays 标题:WisdomNet:冠状病毒的预后与假阴性病例前景渺茫及应用后前胸X线片评估ARDS的成熟概率
作者:Peeyush Kumar,Ayushe Gangal,Sunita Kumari 机构:G.B. Pant Government Engineering College, Delhi - , India. 备注:None 链接:https://arxiv.org/abs/2107.01392 摘要:冠状病毒是一个由多种病毒组成的大病毒家族,其中一些在哺乳动物中传播,另一些在人类中引起疾病。COVID-19传染性强,传播迅速,可早期诊断为卓越状态。世界各地的研究人员、医学专家和组织一直在不懈地努力与这种病毒作斗争,帮助遏制这种病毒。本文提出了一种新的神经网络&WisdomNet,用于胸部X线诊断COVID-19。智慧网以群体智慧的概念为其创始理念。它是一个两层卷积神经网络(CNN),以胸部x线图像为输入。所提出的神经网络的两层都由若干个神经网络组成。本研究使用的数据集由Cohen博士在GitHub上编辑和共享的COVID-19阳性患者的胸部x射线图像组成,健康肺和受病毒和细菌肺炎影响的肺的胸部x射线图像从Kaggle获得。该网络不仅能精确定位COVID-19的存在,而且能给出该疾病发展为急性呼吸窘迫综合征(ARDS)的可能性。因此,预测COVID-19阳性患者的疾病进展。该网络还通过采用高阈值来缩小假阴性病例的发生率,从而有助于抑制疾病的传播,并使成功预测COVID-19、细菌性和病毒性肺炎患者胸部x光片中COVID-19的准确率达到100%。 摘要:Coronavirus is a large virus family consisting of diverse viruses, some of which disseminate among mammals and others cause sickness among humans. COVID-19 is highly contagious and is rapidly spreading, rendering its early diagnosis of preeminent status. Researchers, medical specialists and organizations all over the globe have been working tirelessly to combat this virus and help in its containment. In this paper, a novel neural network called WisdomNet has been proposed, for the diagnosis of COVID-19 using chest X-rays. The WisdomNet uses the concept of Wisdom of Crowds as its founding idea. It is a two-layered convolutional Neural Network (CNN), which takes chest x-ray images as input. Both layers of the proposed neural network consist of a number of neural networks each. The dataset used for this study consists of chest x-ray images of COVID-19 positive patients, compiled and shared by Dr. Cohen on GitHub, and the chest x-ray images of healthy lungs and lungs affected by viral and bacterial pneumonia were obtained from Kaggle. The network not only pinpoints the presence of COVID-19, but also gives the probability of the disease maturing into Acute Respiratory Distress Syndrome (ARDS). Thus, predicting the progression of the disease in the COVID-19 positive patients. 
The network also reduces the occurrence of false negative cases by employing a high threshold value, which helps curb the spread of the disease, and it reports an accuracy of 100% in predicting COVID-19 among the chest x-rays of patients affected by COVID-19, bacterial pneumonia, and viral pneumonia.