cs.AI Artificial Intelligence, 28 papers in total
【1】 A Logic of Expertise
Authors: Joseph Singleton Affiliation: Cardiff University, Cardiff, UK Link: https://arxiv.org/abs/2107.10832 Abstract: In this paper we introduce a simple modal logic framework to reason about the expertise of an information source. In the framework, a source is an expert on a proposition $p$ if they are able to correctly determine the truth value of $p$ in any possible world. We also consider how information may be false, but true after accounting for the lack of expertise of the source. This is relevant for modelling situations in which information sources make claims beyond their domain of expertise. We use non-standard semantics for the language based on an expertise set with certain closure properties. It turns out there is a close connection between our semantics and S5 epistemic logic, so that expertise can be expressed in terms of knowledge at all possible states. We use this connection to obtain a sound and complete axiomatisation.
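The S5 connection above has a compact computational reading: if the source's expertise is modelled as a partition of the possible worlds (its indistinguishability classes), the source is an expert on $p$ exactly when $p$ is a union of partition cells. A minimal sketch of this check (our illustration, not the paper's code):

```python
# Worlds and a hypothetical expertise partition: cells are the
# source's indistinguishability classes (the S5 connection).
worlds = {1, 2, 3, 4}
partition = [{1, 2}, {3}, {4}]

def is_expert(p, partition):
    """A source is an expert on proposition p (a set of worlds) iff p
    is a union of partition cells, i.e. the source can determine the
    truth value of p in every possible world."""
    return all(cell <= p or cell.isdisjoint(p) for cell in partition)

# {1, 2} is a whole cell -> expert; {1, 3} cuts the cell {1, 2} -> not.
```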
【2】 Philosophical Specification of Empathetic Ethical Artificial Intelligence
Authors: Michael Timothy Bennett, Yoshihiro Maruyama Note: To appear in IEEE Transactions on Cognitive and Developmental Systems Link: https://arxiv.org/abs/2107.10715 Abstract: In order to construct an ethical artificial intelligence (AI) two complex problems must be overcome. Firstly, humans do not consistently agree on what is or is not ethical. Second, contemporary AI and machine learning methods tend to be blunt instruments which either search for solutions within the bounds of predefined rules, or mimic behaviour. An ethical AI must be capable of inferring unspoken rules, interpreting nuance and context, possess and be able to infer intent, and explain not just its actions but its intent. Using enactivism, semiotics, perceptual symbol systems and symbol emergence, we specify an agent that learns not just arbitrary relations between signs but their meaning in terms of the perceptual states of its sensorimotor system. Subsequently it can learn what is meant by a sentence and infer the intent of others in terms of its own experiences. It has malleable intent because the meaning of symbols changes as it learns, and its intent is represented symbolically as a goal. As such it may learn a concept of what is most likely to be considered ethical by the majority within a population of humans, which may then be used as a goal.
The meaning of abstract symbols is expressed using perceptual symbols of raw sensorimotor stimuli as the weakest (consistent with Ockham's Razor) necessary and sufficient concept, an intensional definition learned from an ostensive definition, from which the extensional definition or category of all ethical decisions may be obtained. Because these abstract symbols are the same for both situation and response, the same symbol is used when either performing or observing an action. This is akin to mirror neurons in the human brain. Mirror symbols may allow the agent to empathise, because its own experiences are associated with the symbol, which is also associated with the observation of another agent experiencing something that symbol represents.
【3】 A Framework for Imbalanced Time-series Forecasting
Authors: Luis P. Silvestrin, Leonardos Pantiskas, Mark Hoogendoorn Affiliation: Computer Science Department, Vrije Universiteit Amsterdam, NL Link: https://arxiv.org/abs/2107.10709 Abstract: Time-series forecasting plays an important role in many domains. Boosted by the advances in Deep Learning algorithms, it has for instance been used to predict wind power for eolic energy production, stock market fluctuations, or motor overheating. In some of these tasks, we are interested in predicting accurately some particular moments which often are underrepresented in the dataset, resulting in a problem known as imbalanced regression. In the literature, while recognized as a challenging problem, limited attention has been devoted to how to handle the problem in a practical setting. In this paper, we put forward a general approach to analyze time-series forecasting problems focusing on those underrepresented moments to reduce imbalances. Our approach has been developed based on a case study in a large industrial company, which we use to exemplify the approach.
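As an illustration of the imbalanced-regression setting the abstract describes, one generic remedy is to reweight training samples so that underrepresented target values count more in the loss. The sketch below is a hypothetical inverse-frequency scheme, not the authors' framework:

```python
import numpy as np

def relevance_weights(y, n_bins=10):
    """Inverse-frequency weights for regression targets: rare
    (underrepresented) target values receive larger weights.
    A generic sketch, not the paper's exact method."""
    counts, edges = np.histogram(y, bins=n_bins)
    # map each target to its histogram bin (indices 0 .. n_bins-1)
    bin_idx = np.clip(np.digitize(y, edges[1:-1]), 0, n_bins - 1)
    w = 1.0 / np.maximum(counts[bin_idx], 1)
    return w / w.sum() * len(y)  # normalise so the mean weight is 1

def weighted_mse(y_true, y_pred, w):
    """Weighted squared-error loss using the relevance weights."""
    return float(np.mean(w * (np.asarray(y_true) - np.asarray(y_pred)) ** 2))
```

With 90 common samples and 10 rare ones, the rare samples end up with roughly nine times the weight of the common ones.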
【4】 Typing assumptions improve identification in causal discovery
Authors: Philippe Brouillard, Perouz Taslakian, Alexandre Lacoste, Sebastien Lachapelle, Alexandre Drouin Note: Accepted for presentation as a contributed talk at the Workshop on the Neglected Assumptions in Causal Inference (NACI) at the 38th International Conference on Machine Learning, 2021 Link: https://arxiv.org/abs/2107.10703 Abstract: Causal discovery from observational data is a challenging task to which an exact solution cannot always be identified. Under assumptions about the data-generative process, the causal graph can often be identified up to an equivalence class. Proposing new realistic assumptions to circumscribe such equivalence classes is an active field of research. In this work, we propose a new set of assumptions that constrain possible causal relationships based on the nature of the variables. We thus introduce typed directed acyclic graphs, in which variable types are used to determine the validity of causal relationships. We demonstrate, both theoretically and empirically, that the proposed assumptions can result in significant gains in the identification of the causal graph.
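The type-based constraint can be made concrete with a small check: under one natural reading of the assumption, all edges between variables of two given types must share a single orientation. A hypothetical sketch (our illustration, not the authors' code):

```python
def type_consistent(edges, var_type):
    """Check a typed-DAG consistency rule (our reading of the paper's
    assumption): for any pair of distinct types, every edge between
    variables of those types must point the same way."""
    orientation = {}  # frozenset{type_a, type_b} -> (source_type, target_type)
    for a, b in edges:  # directed edge a -> b
        ta, tb = var_type[a], var_type[b]
        if ta == tb:
            continue  # within-type edges are unconstrained in this sketch
        key = frozenset((ta, tb))
        if orientation.setdefault(key, (ta, tb)) != (ta, tb):
            return False
    return True
```

For example, with types T1 = {a, c} and T2 = {b, d}, the edge set {a→b, d→c} is rejected because the two edges orient the T1/T2 pair in opposite directions.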
【5】 Dialogue Object Search
Authors: Monica Roy, Kaiyu Zheng, Jason Liu, Stefanie Tellex Affiliation: Department of Computer Science, Brown University Note: 3 pages, 1 figure. Robotics: Science and Systems (RSS) 2021 Workshop on Robotics for People (R4P): Perspectives on Interaction, Learning and Safety. Extended Abstract Link: https://arxiv.org/abs/2107.10653 Abstract: We envision robots that can collaborate and communicate seamlessly with humans. It is necessary for such robots to decide both what to say and how to act, while interacting with humans. To this end, we introduce a new task, dialogue object search: A robot is tasked to search for a target object (e.g., fork) in a human environment (e.g., kitchen), while engaging in a "video call" with a remote human who has additional but inexact knowledge about the target's location. That is, the robot conducts speech-based dialogue with the human, while sharing the image from its mounted camera. This task is challenging at multiple levels, from data collection, algorithm and system development, to evaluation. Despite these challenges, we believe such a task unblocks the path towards more intelligent and collaborative robots. In this extended abstract, we motivate and introduce the dialogue object search task and analyze examples collected from a pilot study. We then discuss our next steps and conclude with several challenges on which we hope to receive feedback.
【6】 DEAP-FAKED: Knowledge Graph based Approach for Fake News Detection
Authors: Mohit Mayank, Shakshi Sharma, Rajesh Sharma Note: 8 Link: https://arxiv.org/abs/2107.10648 Abstract: Fake News on social media platforms has attracted a lot of attention in recent times, primarily for events related to politics (2016 US Presidential elections) and healthcare (infodemic during COVID-19), to name a few. Various methods have been proposed for detecting Fake News. The approaches span techniques related to network analysis, Natural Language Processing (NLP), and Graph Neural Networks (GNNs). In this work, we propose DEAP-FAKED, a knowleDgE grAPh FAKe nEws Detection framework for identifying Fake News. Our approach is a combination of NLP -- where we encode the news content, and GNN techniques -- where we encode the Knowledge Graph (KG). The variety of these encodings provides a complementary advantage to our detector. We evaluate our framework using two publicly available datasets containing articles from domains such as politics, business, technology, and healthcare. As part of dataset pre-processing, we also remove bias, such as the source of the articles, which could impact the performance of the models. DEAP-FAKED obtains an F1-score of 88% and 78% on the two datasets, improvements of 21% and 3% respectively, which shows the effectiveness of the approach.
【7】 HANT: Hardware-Aware Network Transformation
Authors: Pavlo Molchanov, Jimmy Hall, Hongxu Yin, Jan Kautz, Nicolo Fusi, Arash Vahdat Affiliation: NVIDIA, Microsoft Research Link: https://arxiv.org/abs/2107.10624 Abstract: Given a trained network, how can we accelerate it to meet efficiency needs for deployment on particular hardware? The commonly used hardware-aware network compression techniques address this question with pruning, kernel fusion, quantization and lowering precision. However, these approaches do not change the underlying network operations. In this paper, we propose hardware-aware network transformation (HANT), which accelerates a network by replacing inefficient operations with more efficient alternatives using a neural-architecture-search-like approach. HANT tackles the problem in two phases: in the first phase, a large number of alternative operations for every layer of the teacher model are trained using layer-wise feature map distillation. In the second phase, the combinatorial selection of efficient operations is relaxed to an integer optimization problem that can be solved in a few seconds. We extend HANT with kernel fusion and quantization to improve throughput even further. Our experimental results on accelerating the EfficientNet family show that HANT can accelerate them by up to 3.6x with a <0.4% drop in top-1 accuracy on the ImageNet dataset. When comparing at the same latency level, HANT can accelerate EfficientNet-B4 to the same latency as EfficientNet-B1 while having 3% higher accuracy.
We examine a large pool of operations, up to 197 per layer, and we provide insights into the selected operations and final architectures.
【8】 CNN-based Realized Covariance Matrix Forecasting
Authors: Yanwen Fang, Philip L. H. Yu, Yaohua Tang Note: 17 pages, 5 figures Link: https://arxiv.org/abs/2107.10602 Abstract: It is well known that modeling and forecasting realized covariance matrices of asset returns play a crucial role in the field of finance. The availability of high-frequency intraday data enables modeling of the realized covariance matrices directly. However, most of the models available in the literature depend on strong structural assumptions and often suffer from the curse of dimensionality. We propose an end-to-end trainable model built on a CNN and Convolutional LSTM (ConvLSTM) which does not require any distributional or structural assumptions but can handle high-dimensional realized covariance matrices consistently. The proposed model focuses on local structures and spatiotemporal correlations. It learns a nonlinear mapping that connects the historical realized covariance matrices to the future one. Our empirical studies on synthetic and real-world datasets demonstrate its excellent forecasting ability compared with several advanced volatility models.
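For readers unfamiliar with the input these models consume, the standard realized covariance estimator sums outer products of intraday return vectors, RC = sum_t r_t r_t^T. A short background sketch (not the authors' code):

```python
import numpy as np

def realized_covariance(returns):
    """Realized covariance matrix from T intraday return vectors of
    dimension d: RC = sum_t r_t r_t^T. This is the standard estimator
    such forecasting models take as input."""
    r = np.asarray(returns, dtype=float)  # shape (T, d)
    return r.T @ r                        # shape (d, d), symmetric PSD
```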
【9】 Multiple Query Optimization using a Hybrid Approach of Classical and Quantum Computing
Authors: Tobias Fankhauser, Marc E. Solèr, Rudolf M. Füchslin, Kurt Stockinger Affiliation: Zurich University of Applied Sciences, Winterthur, Switzerland; European Centre for Living Technology (ECLT), Ca' Bottacin, Dorsoduro, Venice, Italy Note: 18 pages, 16 figures Link: https://arxiv.org/abs/2107.10508 Abstract: Quantum computing promises to solve difficult optimization problems in chemistry, physics and mathematics more efficiently than classical computers, but requires fault-tolerant quantum computers with millions of qubits. To overcome errors introduced by today's quantum computers, hybrid algorithms combining classical and quantum computers are used. In this paper we tackle the multiple query optimization problem (MQO), an important NP-hard problem in the area of data-intensive applications. We propose a novel hybrid classical-quantum algorithm to solve the MQO on a gate-based quantum computer. We perform a detailed experimental evaluation of our algorithm and compare its performance against a competing approach that employs a quantum annealer -- another type of quantum computer. Our experimental results demonstrate that our algorithm currently can only handle small problem sizes due to the limited number of qubits available on a gate-based quantum computer compared to a quantum computer based on quantum annealing. However, our algorithm shows a qubit efficiency of close to 99%, almost a factor of 2 higher than the state-of-the-art implementation.
Finally, we analyze how our algorithm scales with larger problem sizes and conclude that our approach shows promising results for near-term quantum computers.
【10】 Abstract Reasoning via Logic-guided Generation
Authors: Sihyun Yu, Sangwoo Mo, Sungsoo Ahn, Jinwoo Shin Affiliation: Korea Advanced Institute of Science and Technology (KAIST); Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) Note: ICML 2021 Workshop on Self-Supervised Learning for Reasoning and Perception (Spotlight Talk) Link: https://arxiv.org/abs/2107.10493 Abstract: Abstract reasoning, i.e., inferring complicated patterns from given observations, is a central building block of artificial general intelligence. While humans find the answer by either eliminating wrong candidates or first constructing the answer, prior deep neural network (DNN)-based methods focus on the former, discriminative approach. This paper aims to design a framework for the latter approach and bridge the gap between artificial and human intelligence. To this end, we propose logic-guided generation (LoGe), a novel generative DNN framework that reduces abstract reasoning to an optimization problem in propositional logic. LoGe is composed of three steps: extract propositional variables from images, reason the answer variables with a logic layer, and reconstruct the answer image from the variables. We demonstrate that LoGe outperforms black-box DNN frameworks for generative abstract reasoning on the RAVEN benchmark, i.e., reconstructing answers based on capturing correct rules of various attributes from observations.
【11】 Neural Ordinary Differential Equation Model for Evolutionary Subspace Clustering and Its Applications
Authors: Mingyuan Bai, S. T. Boris Choy, Junping Zhang, Junbin Gao Link: https://arxiv.org/abs/2107.10484 Abstract: The neural ordinary differential equation (neural ODE) model has attracted increasing attention in time series analysis for its capability to process irregular time steps, i.e., data are not observed over equally-spaced time intervals. In multi-dimensional time series analysis, one task is to conduct evolutionary subspace clustering, aiming at clustering temporal data according to their evolving low-dimensional subspace structures. Many existing methods can only process time series with regular time steps, while time series are unevenly sampled in many situations, such as when data are missing. In this paper, we propose a neural ODE model for evolutionary subspace clustering to overcome this limitation, and a new objective function with a subspace self-expressiveness constraint is introduced. We demonstrate that this method can not only interpolate data at any time step for the evolutionary subspace clustering task, but also achieve higher accuracy than other state-of-the-art evolutionary subspace clustering methods. Both synthetic and real-world data are used to illustrate the efficacy of our proposed method.
【12】 Efficient Neural Causal Discovery without Acyclicity Constraints
Authors: Phillip Lippe, Taco Cohen, Efstratios Gavves Affiliation: University of Amsterdam, QUVA lab; Qualcomm AI Research Note: 8th Causal Inference Workshop at UAI 2021 (contributed talk). 34 pages, 12 figures Link: https://arxiv.org/abs/2107.10483 Abstract: Learning the structure of a causal graphical model using both observational and interventional data is a fundamental problem in many scientific fields. A promising direction is continuous optimization for score-based methods, which efficiently learn the causal graph in a data-driven manner. However, to date, those methods require constrained optimization to enforce acyclicity or lack convergence guarantees. In this paper, we present ENCO, an efficient structure learning method for directed, acyclic causal graphs leveraging observational and interventional data. ENCO formulates the graph search as an optimization of independent edge likelihoods, with the edge orientation being modeled as a separate parameter. Consequently, we can provide convergence guarantees of ENCO under mild conditions without constraining the score function with respect to acyclicity. In experiments, we show that ENCO can efficiently recover graphs with hundreds of nodes, an order of magnitude larger than what was previously possible, while handling deterministic variables and latent confounders.
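The edge parameterisation the abstract describes, as we read it, factors each directed edge i→j into an existence probability and a separate orientation probability. A hypothetical sketch of sampling a graph from such parameters (the names `gamma` and `theta` and the exact factorisation are our assumptions, not the authors' implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_graph(gamma, theta, rng):
    """Sample a directed adjacency matrix: edge i->j is present with
    probability sigmoid(gamma[i, j]) * sigmoid(theta[i, j]), where
    gamma scores edge existence and theta (intended antisymmetric,
    theta[j, i] = -theta[i, j]) scores its orientation."""
    p = sigmoid(gamma) * sigmoid(theta)
    np.fill_diagonal(p, 0.0)  # no self-loops
    return (rng.random(p.shape) < p).astype(int)
```

Training would then follow the gradients of the data likelihood with respect to `gamma` and `theta` over such sampled graphs, which is what removes the need for an explicit acyclicity constraint.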
【13】 Copy and Paste method based on Pose for ReID
Authors: Cheng Yang Affiliation: Huazhong University of Science and Technology, Wuhan, China Link: https://arxiv.org/abs/2107.10479 Abstract: Re-identification (ReID) aims at matching objects in surveillance cameras with different viewpoints. The field is developing very fast, but at this stage there is no processing method for the ReID task in multiple scenarios. However, this does happen all the time in real life, for example in security scenarios. This paper explores a new scenario of re-identification, which differs in perspective, background, and pose (walking or cycling). Obviously, ordinary ReID processing methods cannot handle this scenario well. The best way to deal with it is to introduce image datasets for this scenario, but doing so is very expensive. To solve this problem, this paper proposes a simple and effective way to generate images for new scenarios, named the Copy and Paste method based on Pose (CPP). The CPP is a method based on key-point detection that uses copy and paste to composite a new semantic image dataset from two different semantic image datasets. For example, we can use pedestrians and bicycles to generate images that show the same person riding different bicycles. The CPP is suitable for ReID tasks in new scenarios, and it outperforms the state of the art on the original datasets in original ReID tasks. Specifically, it also achieves better generalization performance on third-party public datasets. Code and datasets composited by the CPP will be made available in the future.
【14】 MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking
Authors: Xiao Wang, Xiujun Shu, Shiliang Zhang, Bo Jiang, Yaowei Wang, Yonghong Tian, Feng Wu Affiliation: Shiliang Zhang and Yonghong Tian are also with Peking University; Bo Jiang is with the School of Computer Science and Technology, Anhui University; Feng Wu is also with the University of Science and Technology of China Note: In Peer Review Link: https://arxiv.org/abs/2107.10433 Abstract: Many RGB-T trackers attempt to attain robust feature representation by utilizing an adaptive weighting scheme (or attention mechanism). Different from these works, we propose a new dynamic modality-aware filter generation module (named MFGNet) to boost the message communication between visible and thermal data by adaptively adjusting the convolutional kernels for various input images in practical tracking. Given the image pairs as input, we first encode their features with the backbone network. Then, we concatenate these feature maps and generate dynamic modality-aware filters with two independent networks. The visible and thermal filters are used to conduct a dynamic convolutional operation on their corresponding input feature maps respectively. Inspired by residual connections, both the generated visible and thermal feature maps are summed with the input feature maps. The augmented feature maps are then fed into the RoI align module to generate instance-level features for subsequent classification.
To address issues caused by heavy occlusion, fast motion, and out-of-view, we propose to conduct a joint local and global search by exploiting a new direction-aware target-driven attention mechanism. The spatial and temporal recurrent neural network is used to capture the direction-aware context for accurate global attention prediction. Extensive experiments on three large-scale RGB-T tracking benchmark datasets validated the effectiveness of our proposed algorithm. The project page of this paper is available at https://sites.google.com/view/mfgrgbttrack/.
【15】 Shedding some light on Light Up with Artificial Intelligence
Authors: Libo Sun, James Browning, Roberto Perera Affiliation: Samuel Ginn College of Engineering, Auburn University, Auburn, AL, USA; Department of Aerospace Engineering Note: 14 pages, 16 figures; for associated codes, see <this https URL> Link: https://arxiv.org/abs/2107.10429 Abstract: The Light-Up puzzle, also known as the AKARI puzzle, has never been solved using modern artificial intelligence (AI) methods. Currently, the most widely used computational technique to autonomously develop solutions involves evolution theory algorithms. This project is an effort to apply new AI techniques to solve the Light-Up puzzle faster and more efficiently. The algorithms explored for producing optimal solutions include hill climbing, simulated annealing, feed-forward neural networks (FNN), and convolutional neural networks (CNN). Two algorithms were developed for hill climbing and simulated annealing, using 2 actions (add and remove light bulb) versus 3 actions (add, remove, or move a light bulb to a different cell). Both the hill climbing and simulated annealing algorithms showed higher accuracy for the case of 3 actions. Simulated annealing was shown to significantly outperform hill climbing, FNN, CNN, and an evolutionary theory algorithm, achieving 100% accuracy in 30 unique board configurations. Lastly, while the FNN and CNN algorithms showed low accuracies, their computational times were significantly faster compared to the remaining algorithms.
The GitHub repository for this project can be found at https://github.com/rperera12/AKARI-LightUp-GameSolver-with-DeepNeuralNetworks-and-HillClimb-or-SimulatedAnnealing.
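The 3-action simulated annealing described above can be sketched generically. The toy energy below scores a wall-free board (bulb conflicts plus unlit cells); the authors' solver would instead encode the full AKARI rules (walls and number clues), so treat this as an illustrative skeleton:

```python
import math
import random

N = 4  # toy wall-free board: a bulb lights its whole row and column

def energy(bulbs):
    """Constraint violations for the toy board: no two bulbs may see
    each other (share a row/column) and every cell must be lit. A real
    AKARI energy would also encode walls and number clues."""
    bulbs = sorted(bulbs)
    conflicts = sum(1 for i, a in enumerate(bulbs) for b in bulbs[i + 1:]
                    if a[0] == b[0] or a[1] == b[1])
    rows = {r for r, _ in bulbs}
    cols = {c for _, c in bulbs}
    unlit = sum(1 for r in range(N) for c in range(N)
                if r not in rows and c not in cols)
    return conflicts + unlit

def anneal(steps=5000, t0=2.0, seed=0):
    """Simulated annealing with the 3 actions from the abstract:
    add, remove, or move a bulb."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(N) for c in range(N)]
    state = set(rng.sample(cells, 3))
    best, best_e = set(state), energy(state)
    for k in range(steps):
        t = t0 * (1.0 - k / steps) + 1e-9  # linear cooling schedule
        cand = set(state)
        action = rng.choice(("add", "remove", "move"))
        if action in ("remove", "move") and cand:
            cand.discard(rng.choice(sorted(cand)))
        if action in ("add", "move"):
            cand.add(rng.choice(cells))
        d_e = energy(cand) - energy(state)
        # accept improvements always, worsenings with Boltzmann probability
        if d_e <= 0 or rng.random() < math.exp(-d_e / t):
            state = cand
        if energy(state) < best_e:
            best, best_e = set(state), energy(state)
    return best, best_e
```

The 2-action variant simply drops the "move" branch; the abstract's finding is that the richer move set reaches valid boards more reliably.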
【16】 Evaluation of In-Person Counseling Strategies To Develop Physical Activity Chatbot for Women
Authors: Kai-Hui Liang, Patrick Lange, Yoo Jung Oh, Jingwen Zhang, Yoshimi Fukuoka, Zhou Yu Affiliation: Columbia University; University of California, Davis; San Francisco Note: Accepted by SIGDIAL 2021 as a long paper Link: https://arxiv.org/abs/2107.10410 Abstract: Artificial intelligence chatbots are the vanguard in technology-based intervention to change people's behavior. To develop intervention chatbots, the first step is to understand natural language conversation strategies in human conversation. This work introduces an intervention conversation dataset collected from a real-world physical activity intervention program for women. We designed comprehensive annotation schemes in four dimensions (domain, strategy, social exchange, and task-focused exchange) and annotated a subset of dialogs. We built a strategy classifier with context information to detect strategies from both trainers and participants based on the annotation. To understand how human intervention induces effective behavior changes, we analyzed the relationships between the intervention strategies and the participants' changes in the barrier and social support for physical activity. We also analyzed how participants' baseline weight correlates to the amount of occurrence of the corresponding strategy. This work lays the foundation for developing a personalized physical activity intervention bot. The dataset and code are available at https://github.com/KaihuiLiang/physical-activity-counseling
【17】 Reinforcement Learning Agent Training with Goals for Real World Tasks
Authors: Xuan Zhao, Marcos Campos Note: Accepted to Reinforcement Learning for Real Life (RL4RealLife) Workshop at the 38th International Conference on Machine Learning Link: https://arxiv.org/abs/2107.10390 Abstract: Reinforcement Learning (RL) is a promising approach for solving various control, optimization, and sequential decision making tasks. However, designing reward functions for complex tasks (e.g., with multiple objectives and safety constraints) can be challenging for most users and usually requires multiple expensive trials (reward function hacking). In this paper we propose a specification language (Inkling Goal Specification) for complex control and optimization tasks, which is very close to natural language and allows a practitioner to focus on problem specification instead of reward function hacking. The core elements of our framework are: (i) mapping the high-level language to a predicate temporal logic tailored to control and optimization tasks, (ii) a novel automaton-guided dense reward generation that can be used to drive RL algorithms, and (iii) a set of performance metrics to assess the behavior of the system. We include a set of experiments showing that the proposed method provides great ease of use to specify a wide range of real-world tasks, and that the reward generated is able to drive the policy training to achieve the specified goal.
【18】 Uncertainty-Aware Task Allocation for Distributed Autonomous Robots
Authors: Liang Sun, Leonardo Escamilla Affiliation: Department of Mechanical and Aerospace Engineering, New Mexico State University Link: https://arxiv.org/abs/2107.10350 Abstract: This paper addresses task-allocation problems with uncertainty in situational awareness for distributed autonomous robots (DARs). The uncertainty propagation over a task-allocation process is done by using the Unscented Transform with the sigma-point sampling mechanism. It has great potential to be employed for generic task-allocation schemes, in the sense that there is no need to modify an existing task-allocation method that was developed without considering the uncertainty in the situational awareness. The proposed framework was tested in a simulated environment where the decision-maker needs to determine an optimal allocation of multiple locations assigned to multiple mobile flying robots whose locations come as random variables of known mean and covariance. The simulation result shows that the proposed stochastic task allocation approach generates an assignment with 30% less overall cost than the one without considering the uncertainty.
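The Unscented Transform step can be illustrated directly: propagate a Gaussian belief over a robot's location through a nonlinear cost by evaluating it at 2n+1 sigma points and recombining with fixed weights. A generic sketch (standard scheme, not the authors' implementation):

```python
import numpy as np

def unscented_transform(mean, cov, f, kappa=0.0):
    """Approximate the mean and variance of f(x) for x ~ N(mean, cov)
    using 2n+1 sigma points. cov must be positive definite (Cholesky)."""
    n = len(mean)
    L = np.linalg.cholesky((n + kappa) * cov)  # matrix square root
    sigma = [mean] + [mean + L[:, i] for i in range(n)] \
                   + [mean - L[:, i] for i in range(n)]
    w0 = kappa / (n + kappa) if n + kappa else 0.0
    w = np.array([w0] + [1.0 / (2 * (n + kappa))] * (2 * n))
    y = np.array([f(s) for s in sigma])
    m = np.dot(w, y)                 # propagated mean of the cost
    v = np.dot(w, (y - m) ** 2)      # propagated variance of the cost
    return m, v
```

In the paper's setting, `f` would be an allocation cost (e.g., travel distance to an assigned location), letting an unmodified task-allocation method operate on the propagated cost statistics. For a linear `f` the transform is exact, which gives an easy sanity check.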
【19】 iReason: Multimodal Commonsense Reasoning using Videos and Natural Language with Interpretability 标题:iReason:基于视频和自然语言的可解释多模态常识推理
作者:Aman Chadha,Vinija Jain 机构:Department of Computer Science, Stanford University 备注:12 pages, 1 figure, 7 tables 链接:https://arxiv.org/abs/2107.10300 摘要:因果关系知识对于构建健壮的人工智能系统至关重要。深度学习模型通常在需要因果推理的任务上表现不佳,而因果推理通常是使用某种形式的常识推导出来的,这些常识不是在输入中立即可用的,而是由人类隐含地推断出来的。先前的工作已经揭示了在没有因果关系的情况下,模型所受到的虚假的观测偏差。虽然语言表征模型在学习的嵌入中保留了语境知识,但在训练过程中它们不考虑因果关系。通过将因果关系与输入特征融合到一个执行视觉认知任务(如场景理解、视频字幕、视频问答等)的现有模型中,由于因果关系带来的洞察力,可以获得更好的性能。最近,有人提出了一些模型来处理从视觉或文本模态中挖掘因果数据的任务。然而,目前还没有广泛的研究通过视觉和语言形式并置来挖掘因果关系。虽然图像为我们提供了丰富且易于处理的资源,可以从中挖掘因果关系知识,但视频密度更高,并且由自然的时间顺序事件组成。此外,文本信息提供了视频中可能隐含的细节。我们提出了iReason,一个使用视频和自然语言字幕推断视觉语义常识知识的框架。此外,iReason的架构集成了一个因果合理化模块,以帮助解释性、错误分析和偏差检测的过程。我们通过与语言表征学习模型(BERT,GPT-2)以及当前最先进的多模态因果关系模型的双管齐下的比较分析,证明了iReason的有效性。 摘要:Causality knowledge is vital to building robust AI systems. Deep learning models often perform poorly on tasks that require causal reasoning, which is often derived using some form of commonsense knowledge not immediately available in the input but implicitly inferred by humans. Prior work has unraveled spurious observational biases that models fall prey to in the absence of causality. While language representation models preserve contextual knowledge within learned embeddings, they do not factor in causal relationships during training. By blending causal relationships with the input features to an existing model that performs visual cognition tasks (such as scene understanding, video captioning, video question-answering, etc.), better performance can be achieved owing to the insight causal relationships bring about. Recently, several models have been proposed that have tackled the task of mining causal data from either the visual or textual modality. However, there does not exist widespread research that mines causal relationships by juxtaposing the visual and language modalities. While images offer a rich and easy-to-process resource for us to mine causality knowledge from, videos are denser and consist of naturally time-ordered events. 
Also, textual information offers details that could be implicit in videos. We propose iReason, a framework that infers visual-semantic commonsense knowledge using both videos and natural language captions. Furthermore, iReason's architecture integrates a causal rationalization module to aid the process of interpretability, error analysis and bias detection. We demonstrate the effectiveness of iReason using a two-pronged comparative analysis with language representation learning models (BERT, GPT-2) as well as current state-of-the-art multimodal causality models.
【20】 How to Tell Deep Neural Networks What We Know 标题:如何告诉深度神经网络我们所知道的
作者:Tirtharaj Dash,Sharad Chitlangia,Aditya Ahuja,Ashwin Srinivasan 机构:Department of Computer Science & Information Systems, Department of Electrical and Electronics Engineering, Anuradha and Prashanth Palakurthi Centre for AI Research (APPCAIR), BITS Pilani, K.K. Birla Goa Campus, Goa, India 备注:12 pages (full version); substantial overlap with arXiv:2103.00180 链接:https://arxiv.org/abs/2107.10295 摘要:我们简要综述了在用神经网络构建模型时纳入现有科学知识的各种方式。纳入领域知识不仅对构建科学助手有特别的意义,而且对许多其他借助人机协作来理解数据的领域也是如此。在许多此类实例中,若能以足够精确的形式编码领域的人类知识并加以提供,基于机器的模型构建可从中显著获益。本文从对输入、损失函数和深度网络结构的改变三方面考察领域知识的纳入方式。这一分类只是为了便于阐述:在实践中,我们预期会组合使用这些改变。在每一类别中,我们都描述了已被证明能显著改变网络性能的技术。 摘要:We present a short survey of ways in which existing scientific knowledge is included when constructing models with neural networks. The inclusion of domain-knowledge is of special interest not just to constructing scientific assistants, but also, many other areas that involve understanding data using human-machine collaboration. In many such instances, machine-based model construction may benefit significantly from being provided with human-knowledge of the domain encoded in a sufficiently precise form. This paper examines the inclusion of domain-knowledge by means of changes to: the input, the loss-function, and the architecture of deep networks. The categorisation is for ease of exposition: in practice we expect a combination of such changes will be employed. In each category, we describe techniques that have been shown to yield significant changes in network performance.
【21】 HARP-Net: Hyper-Autoencoded Reconstruction Propagation for Scalable Neural Audio Coding 标题:HARP-Net:适用于可伸缩神经音频编码的超自动编码重建传播
作者:Darius Petermann,Seungkwon Beack,Minje Kim 机构:Indiana University, Department of Intelligent Systems Engineering, Bloomington, IN, USA, Electronics and Telecommunications Research Institute, Daejeon, South Korea 备注:Accepted to the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2021, Mohonk Mountain House, New Paltz, NY 链接:https://arxiv.org/abs/2107.10843 摘要:基于自动编码器的编解码器采用量化将其瓶颈层激活转换为位串,这一过程阻碍了编码器和解码器部分之间的信息流。为了规避这个问题,我们在相应的编码器-解码器层对之间使用额外的跳跃连接。其假设是:在镜像的自动编码器拓扑中,解码器层重构其相应编码器层的中间特征表示。因此,从相应编码器层直接传播来的任何附加信息都有助于重建。我们以附加自动编码器的形式实现这种跳跃连接,每个附加自动编码器都是一个小型编解码器,用于压缩成对的编码器-解码器层之间的大量数据传输。实验验证表明,与普通的自动编码器基线相比,所提出的超自动编码架构提高了感知音频质量。 摘要:An autoencoder-based codec employs quantization to turn its bottleneck layer activation into bitstrings, a process that hinders information flow between the encoder and decoder parts. To circumvent this issue, we employ additional skip connections between the corresponding pair of encoder-decoder layers. The assumption is that, in a mirrored autoencoder topology, a decoder layer reconstructs the intermediate feature representation of its corresponding encoder layer. Hence, any additional information directly propagated from the corresponding encoder layer helps the reconstruction. We implement this kind of skip connections in the form of additional autoencoders, each of which is a small codec that compresses the massive data transfer between the paired encoder-decoder layers. We empirically verify that the proposed hyper-autoencoded architecture improves perceptual audio quality compared to an ordinary autoencoder baseline.
【22】 Interpretable SincNet-based Deep Learning for Emotion Recognition from EEG brain activity 标题:基于可解释SincNet的脑电信号情感识别深度学习
作者:Juan Manuel Mayor-Torres,Mirco Ravanelli,Sara E. Medina-DeVilliers,Matthew D. Lerner,Giuseppe Riccardi 机构:Mila - Quebec Artificial Intelligence Institute, Stony Brook University 链接:https://arxiv.org/abs/2107.10790 摘要:机器学习方法(如深度学习)在医学领域显示出良好的效果。然而,这些算法缺乏可解释性,可能会阻碍它们在医疗决策支持系统中的应用。本文研究了一种可解释的深度学习技术,称为SincNet。SincNet是一种卷积神经网络,它通过可训练的sinc函数高效地学习定制的带通滤波器。在这项研究中,我们使用SincNet来分析孤独症谱系障碍(ASD)个体的神经活动,这些个体的神经振荡活动存在特征性差异。特别地,我们提出了一种新的基于SincNet的神经网络,利用EEG信号检测ASD患者的情绪。所学习的滤波器可以很容易地加以检视,以确定脑电图频谱的哪一部分被用于预测情绪。我们发现,系统会自动学习到ASD个体中常见的高$\alpha$(9-13 Hz)和$\beta$(13-30 Hz)波段抑制。这一结果与最近关于情绪识别的神经科学研究一致,后者发现这些波段抑制与在ASD个体中观察到的行为缺陷有关。SincNet可解释性的提升没有以牺牲情绪识别性能为代价。 摘要:Machine learning methods, such as deep learning, show promising results in the medical domain. However, the lack of interpretability of these algorithms may hinder their applicability to medical decision support systems. This paper studies an interpretable deep learning technique, called SincNet. SincNet is a convolutional neural network that efficiently learns customized band-pass filters through trainable sinc-functions. In this study, we use SincNet to analyze the neural activity of individuals with Autism Spectrum Disorder (ASD), who experience characteristic differences in neural oscillatory activity. In particular, we propose a novel SincNet-based neural network for detecting emotions in ASD patients using EEG signals. The learned filters can be easily inspected to detect which part of the EEG spectrum is used for predicting emotions. We found that our system automatically learns the high-$\alpha$ (9-13 Hz) and $\beta$ (13-30 Hz) band suppression often present in individuals with ASD. This result is consistent with recent neuroscience studies on emotion recognition, which found an association between these band suppressions and the behavioral deficits observed in individuals with ASD. The improved interpretability of SincNet is achieved without sacrificing performance in emotion recognition.
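SincNet 所学习的"由可训练 sinc 函数定义的带通滤波器",可以用下述草图直观说明(仅为示意:采样率、窗函数与各参数均为本文假设,并非论文实现;SincNet 中 f_low、f_high 是可训练参数,这里直接给定):

```python
import math

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def sinc_bandpass(f_low, f_high, fs, taps=129):
    """带通 FIR 核:两个加窗低通 sinc 响应之差。
    SincNet 正是以一对截止频率 (f_low, f_high) 参数化这样的滤波器。"""
    assert 0 < f_low < f_high < fs / 2 and taps % 2 == 1
    m = taps // 2
    kernel = []
    for n in range(-m, m + 1):
        h = (2 * f_high / fs) * sinc(2 * f_high * n / fs) \
          - (2 * f_low / fs) * sinc(2 * f_low * n / fs)
        w = 0.54 - 0.46 * math.cos(2 * math.pi * (n + m) / (taps - 1))  # Hamming 窗
        kernel.append(h * w)
    return kernel

def gain(kernel, f, fs):
    """滤波器在频率 f 处的幅度响应 |H(f)|。"""
    m = len(kernel) // 2
    re = sum(h * math.cos(2 * math.pi * f * (i - m) / fs) for i, h in enumerate(kernel))
    im = sum(h * math.sin(2 * math.pi * f * (i - m) / fs) for i, h in enumerate(kernel))
    return math.hypot(re, im)

# 假设 EEG 采样率为 128 Hz,构造一个 9-13 Hz(高 alpha 波段)带通核
alpha_kernel = sinc_bandpass(9.0, 13.0, fs=128.0)
```

由于每个卷积核只由两个截止频率决定,训练后直接读出 (f_low, f_high) 即可知道网络关注频谱的哪一段——这正是文中"可解释性"的来源。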
【23】 Multi-modal Residual Perceptron Network for Audio-Video Emotion Recognition 标题:用于音视频情感识别的多模态残差感知器网络
作者:Xin Chang,Władysław Skarbek 链接:https://arxiv.org/abs/2107.10742 摘要:情感识别是人机交互(HCI)的一个重要研究领域。目前人们正采用深度神经网络(DNN)建模工具来处理音视频情感识别(AVER)问题。在已发表的论文中,作者通常只展示多模态优于纯音频或纯视频单模态的案例。然而,也存在单一模态更优的情况。在我们的研究中,我们假设对于情感事件的模糊类别,一种模态的较高噪声会放大间接表示在建模神经网络参数中的第二种模态的较低噪声。为了避免这种跨模态信息干扰,我们定义了一个多模态残差感知器网络(MRPN),它从多模态网络分支中学习,生成噪声更低的深度特征表示。对于所提出的MRPN模型和新的流媒体数字电影时间增强算法,最先进的平均识别率在Ryerson情感语音和歌曲视听数据库(RAVDESS)上提高到91.4%,在众包情感多模态演员数据集(Crema-D)上提高到83.15%。此外,MRPN概念显示了其在处理不限于光学和声学类型信号源的多模态分类器中的潜力。 摘要:Emotion recognition is an important research field for Human-Computer Interaction (HCI). Audio-Video Emotion Recognition (AVER) is now attacked with Deep Neural Network (DNN) modeling tools. In published papers, as a rule, the authors show only cases of the superiority of multi modalities over audio-only or video-only modalities. However, there are cases where the superiority of a single modality can be found. In our research, we hypothesize that for fuzzy categories of emotional events, the higher noise of one modality can amplify the lower noise of the second modality represented indirectly in the parameters of the modeling neural network. To avoid such cross-modal information interference we define a multi-modal Residual Perceptron Network (MRPN) which learns from multi-modal network branches creating deep feature representation with reduced noise. For the proposed MRPN model and the novel time augmentation for streamed digital movies, the state-of-the-art average recognition rate was improved to 91.4% for The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset and to 83.15% for the Crowd-sourced Emotional multi-modal Actors Dataset (Crema-D). Moreover, the MRPN concept shows its potential for multi-modal classifiers dealing with signal sources not only of optical and acoustical type.
【24】 Structure-aware Interactive Graph Neural Networks for the Prediction of Protein-Ligand Binding Affinity 标题:基于结构感知交互图神经网络的蛋白质-配体结合亲和力预测
作者:Shuangli Li,Jingbo Zhou,Tong Xu,Liang Huang,Fan Wang,Haoyi Xiong,Weili Huang,Dejing Dou,Hui Xiong 机构:University of Science and Technology of China,Business Intelligence Lab, Baidu Research, Baidu Inc.,Baidu Research USA,Oregon State University,HWL Consulting LLC,Rutgers University 备注:11 pages, 8 figures, Accepted by KDD 2021 (Research Track) 链接:https://arxiv.org/abs/2107.10670 摘要:药物的发现往往依赖于蛋白质-配体结合亲和力的成功预测。最近的进展表明,应用图神经网络(GNNs)通过学习蛋白质-配体复合物的表示来更好地预测亲和力是很有希望的。然而,现有的解决方案通常将蛋白质-配体复合物作为拓扑图数据处理,没有充分利用生物分子的结构信息。在GNN模型中,原子间的长程相互作用也被忽略了。为此,我们提出了一种结构感知的交互式图神经网络(SIGN),它由两个部分组成:极性启发图注意层(PGAL)和成对交互式池(PiPool)。具体地说,PGAL迭代地执行节点边缘聚合过程来更新节点和边缘的嵌入,同时保留原子之间的距离和角度信息。然后,采用PiPool方法收集交互边缘,并进行后续重建损失,以反映全局交互。在两个基准上进行了详尽的实验研究,验证了该方法的优越性。 摘要:Drug discovery often relies on the successful prediction of protein-ligand binding affinity. Recent advances have shown great promise in applying graph neural networks (GNNs) for better affinity prediction by learning the representations of protein-ligand complexes. However, existing solutions usually treat protein-ligand complexes as topological graph data, thus the biomolecular structural information is not fully utilized. The essential long-range interactions among atoms are also neglected in GNN models. To this end, we propose a structure-aware interactive graph neural network (SIGN) which consists of two components: polar-inspired graph attention layers (PGAL) and pairwise interactive pooling (PiPool). Specifically, PGAL iteratively performs the node-edge aggregation process to update embeddings of nodes and edges while preserving the distance and angle information among atoms. Then, PiPool is adopted to gather interactive edges with a subsequent reconstruction loss to reflect the global interactions. Exhaustive experimental study on two benchmarks verifies the superiority of SIGN.
【25】 Digital Einstein Experience: Fast Text-to-Speech for Conversational AI 标题:数字爱因斯坦体验:对话式人工智能的快速文本到语音转换
作者:Joanna Rownicka,Kilian Sprenkamp,Antonio Tripiana,Volodymyr Gromoglasov,Timo P Kunz 机构:Aflorithmic Labs Ltd. 备注:accepted at Interspeech 2021 链接:https://arxiv.org/abs/2107.10658 摘要:我们描述了为会话人工智能用例创建和提供自定义语音的方法。更具体地说,我们为数字爱因斯坦角色提供了一个声音,以实现数字对话体验中的人机交互。为了生成符合上下文的语音,我们首先设计一个语音字符,然后生成与所需语音属性相对应的录音。然后我们模拟声音。我们的解决方案利用FastSpeech2从音素中预测对数标度的mel谱图,并利用并行WaveGAN生成波形。该系统支持字符输入,并在输出端提供语音波形。我们为选定的单词使用自定义词典,以确保它们的正确发音。我们提出的云架构能够实现快速的语音传输,使我们能够与数字版本的Albert Einstein进行实时对话。 摘要:We describe our approach to create and deliver a custom voice for a conversational AI use-case. More specifically, we provide a voice for a Digital Einstein character, to enable human-computer interaction within the digital conversation experience. To create the voice which fits the context well, we first design a voice character and we produce the recordings which correspond to the desired speech attributes. We then model the voice. Our solution utilizes Fastspeech 2 for log-scaled mel-spectrogram prediction from phonemes and Parallel WaveGAN to generate the waveforms. The system supports a character input and gives a speech waveform at the output. We use a custom dictionary for selected words to ensure their proper pronunciation. Our proposed cloud architecture enables for fast voice delivery, making it possible to talk to the digital version of Albert Einstein in real-time.
【26】 Improving Polyphonic Sound Event Detection on Multichannel Recordings with the Sørensen-Dice Coefficient Loss and Transfer Learning 标题:利用Sørensen-Dice系数损失和迁移学习改进多通道录音的复音事件检测
作者:Karn N. Watcharasupat,Thi Ngoc Tho Nguyen,Ngoc Khanh Nguyen,Zhen Jian Lee,Douglas L. Jones,Woon Seng Gan 机构:School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore., Dept. of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, IL, USA. 备注:Under review for the 6th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2021 链接:https://arxiv.org/abs/2107.10471 摘要:最近,Sørensen–Dice系数作为一种损失函数(也称为Dice损失)越来越受欢迎,因为它在负样本数显著超过正样本数的任务中具有鲁棒性,如语义分割、自然语言处理和声音事件检测。用二元交叉熵损失对复调声音事件检测系统进行的传统训练常常导致次优的检测性能,因为训练往往被来自负样本的更新所淹没。在本文中,我们研究了Dice损失、模态内和模态间迁移学习、数据增强和录音格式对多通道输入的复调声音事件检测系统性能的影响。我们的分析表明,在不同的训练设置和录音格式下,以Dice损失训练的复调声音事件检测系统在F1分数和错误率方面始终优于以交叉熵损失训练的系统。通过使用迁移学习和适当组合不同的数据增强技术,我们进一步提高了性能。 摘要:The Sørensen–Dice coefficient has recently seen rising popularity as a loss function (also known as Dice loss) due to its robustness in tasks where the number of negative samples significantly exceeds that of positive samples, such as semantic segmentation, natural language processing, and sound event detection. Conventional training of polyphonic sound event detection systems with binary cross-entropy loss often results in suboptimal detection performance as the training is often overwhelmed by updates from negative samples. In this paper, we investigated the effect of the Dice loss, intra- and inter-modal transfer learning, data augmentation, and recording formats, on the performance of polyphonic sound event detection systems with multichannel inputs. Our analysis showed that polyphonic sound event detection systems trained with Dice loss consistently outperformed those trained with cross-entropy loss across different training settings and recording formats in terms of F1 score and error rate. We achieved further performance gains via the use of transfer learning and an appropriate combination of different data augmentation techniques.
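Dice 损失对正负样本极不平衡的鲁棒性,可以用下面的极简对比草图说明(仅为示意:帧级标签与数值均为虚构示例,并非论文的实验设置):

```python
import math

def dice_loss(pred, target, eps=1e-7):
    """软 Sørensen–Dice 损失(作用于展平的帧级激活预测)。
    pred 为 [0, 1] 内的概率,target 为二值标签。"""
    intersection = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)

def bce_loss(pred, target, eps=1e-7):
    """逐帧平均的二元交叉熵,用作对比。"""
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(pred, target)) / len(pred)

# 严重不平衡的帧序列:100 帧中只有 1 个正样本。
# 一个完全漏检事件、但把所有负样本都判对的模型,
# 在 BCE 下几乎不受惩罚,而 Dice 损失仍接近最大值。
target = [1.0] + [0.0] * 99
missed = [0.01] * 100          # 处处输出接近 0 的概率
bce_missed = bce_loss(missed, target)    # ≈ 0.056
dice_missed = dice_loss(missed, target)  # ≈ 0.99
```

直观地看,Dice 损失的梯度只由正样本附近的项主导,不会像 BCE 那样被海量负样本的更新"淹没",这正是摘要中观察到的优势来源。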
【27】 What Makes Sound Event Localization and Detection Difficult? Insights from Error Analysis 标题:是什么让声音事件定位和检测变得困难?从错误分析中得到的启示
作者:Thi Ngoc Tho Nguyen,Karn N. Watcharasupat,Zhen Jian Lee,Ngoc Khanh Nguyen,Douglas L. Jones,Woon Seng Gan 机构:School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore., Dept. of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, IL, USA. 备注:Under review for the 6th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2021 链接:https://arxiv.org/abs/2107.10469 摘要:声事件定位与检测(SELD)是一个新兴的研究课题,旨在将声事件检测与波达方向估计的任务统一起来。因此,SELD继承了噪声、混响、干扰、复调和声源的非平稳性等两个任务的挑战。此外,SELD常常面临一个额外的挑战,即为多个重叠的声音事件分配检测到的声音类别和到达方向之间的正确对应。以往的研究表明,混响环境中的未知干扰往往会导致SELD系统性能的严重下降。为了进一步了解SELD任务的挑战,我们对两个SELD系统进行了详细的错误分析,这两个系统在DCASE SELD挑战的团队类别中都排名第二,一个在2020年,一个在2021年。实验结果表明,复调是SELD的主要挑战,因为很难检测出所有感兴趣的声音事件。此外,SELD系统对于训练集中占主导地位的复调场景往往会产生较少的错误。 摘要:Sound event localization and detection (SELD) is an emerging research topic that aims to unify the tasks of sound event detection and direction-of-arrival estimation. As a result, SELD inherits the challenges of both tasks, such as noise, reverberation, interference, polyphony, and non-stationarity of sound sources. Furthermore, SELD often faces an additional challenge of assigning correct correspondences between the detected sound classes and directions of arrival to multiple overlapping sound events. Previous studies have shown that unknown interferences in reverberant environments often cause major degradation in the performance of SELD systems. To further understand the challenges of the SELD task, we performed a detailed error analysis on two of our SELD systems, which both ranked second in the team category of DCASE SELD Challenge, one in 2020 and one in 2021. Experimental results indicate polyphony as the main challenge in SELD, due to the difficulty in detecting all sound events of interest. In addition, the SELD systems tend to make fewer errors for the polyphonic scenario that is dominant in the training set.
【28】 Machine Learning Characterization of Cancer Patients-Derived Extracellular Vesicles using Vibrational Spectroscopies 标题:基于振动光谱的癌症患者细胞外小泡的机器学习表征
作者:Abicumaran Uthamacumaran,Samir Elouatik,Mohamed Abdouh,Michael Berteau-Rainville,Zhu-Hua Gao,Goffredo Arena 机构:Concordia University, Department of Physics, Montreal, QC, Canada, Université de Montréal, Département de chimie, Montreal, QC, Canada, Cancer Research Program, Research Institute of the McGill University Health Centre, Decarie 备注:50 pages 链接:https://arxiv.org/abs/2107.10332 摘要:癌症的早期发现在医学上是一个具有挑战性的问题。癌症患者的血清富含异质性的分泌型脂质结合细胞外小泡(EVs),它们承载着代表其来源细胞的复杂信息与生物标志物,目前正是液体活检和癌症筛查领域的研究对象。振动光谱为评估复杂生物样品的结构和生物物理性质提供了非侵入性方法。在这项研究中,对从9名癌症患者(涵盖结直肠癌、肝细胞癌、乳腺癌和胰腺癌4种亚型)和5名健康患者(对照组)的血清中提取的EVs进行了多次拉曼光谱测量。作为拉曼分析的补充,还对4种癌症亚型中的2种进行了FTIR(傅里叶变换红外)光谱测量。AdaBoost随机森林分类器、决策树和支持向量机(SVM)将经基线校正的癌症EVs拉曼光谱与健康对照组的光谱(共18条)区分开来:当光谱范围缩减到1800至1940 cm$^{-1}$、并采用0.5的训练/测试划分时,分类准确率大于90%。对14条光谱的FTIR分类准确率为80%。我们的研究结果表明,基本的机器学习算法是区分癌症患者EVs与健康患者EVs复杂振动光谱的有力工具。这些实验方法有望成为机器智能辅助早期癌症筛查的有效液体活检手段。 摘要:The early detection of cancer is a challenging problem in medicine. The blood sera of cancer patients are enriched with heterogeneous secretory lipid bound extracellular vesicles (EVs), which present a complex repertoire of information and biomarkers, representing their cell of origin, that are being currently studied in the field of liquid biopsy and cancer screening. Vibrational spectroscopies provide non-invasive approaches for the assessment of structural and biophysical properties in complex biological samples. In this study, multiple Raman spectroscopy measurements were performed on the EVs extracted from the blood sera of 9 patients consisting of four different cancer subtypes (colorectal cancer, hepatocellular carcinoma, breast cancer and pancreatic cancer) and five healthy patients (controls). FTIR (Fourier Transform Infrared) spectroscopy measurements were performed as a complementary approach to Raman analysis, on two of the four cancer subtypes.
The AdaBoost Random Forest Classifier, Decision Trees, and Support Vector Machines (SVM) distinguished the baseline corrected Raman spectra of cancer EVs from those of healthy controls (18 spectra) with a classification accuracy of greater than 90% when reduced to a spectral frequency range of 1800 to 1940 cm$^{-1}$, and subjected to a 0.5 training/testing split. FTIR classification on 14 spectra showed an 80% accuracy. Our findings demonstrate that basic machine learning algorithms are powerful tools to distinguish the complex vibrational spectra of cancer patient EVs from those of healthy patients. These experimental methods hold promise as a valid and efficient liquid biopsy for machine intelligence-assisted early cancer screening.