人工智能学术速递[6.30]

2021-07-02 16:26:48

访问www.arxivdaily.com获取含摘要速递,涵盖CS|物理|数学|经济|统计|金融|生物|电气领域,更有搜索、收藏、发帖等功能!点击阅读原文即可访问

cs.AI人工智能,共计58篇

【1】 A Representation Learning Perspective on the Importance of Train-Validation Splitting in Meta-Learning 标题:从表征学习的角度看元学习中训练验证分离的重要性

作者:Nikunj Saunshi,Arushi Gupta,Wei Hu 机构:Department of Computer Science, Princeton University 备注:In proceedings of ICML 2021 链接:https://arxiv.org/abs/2106.15615 摘要:元学习中的一种有效方法是利用多个"训练任务"来学习模型参数的良好初始化,之后从该初始化出发进行微调,只需极少样本即可解决未见过的"测试任务"。虽然这类方法在实践中很成功,但对它们的理论认识还很有限。这项工作研究了这些方法的一个重要方面:在元训练期间将每个任务的数据划分为训练(支持)集和验证(查询)集。受最近工作(Raghu et al., 2020)的启发,我们从表征学习的角度来看待这类元学习方法,并论证训练-验证分割会促使学到的表征趋于低秩而不损害表达能力,而不做分割的变体则倾向于得到高秩表征。由于低秩性有利于样本效率,这种分割策略只需极少样本即可解决未见过的测试任务。我们给出了在一个子空间元学习实例上将这一思想形式化为线性表示学习的理论结果,并在仿真和标准元学习基准上通过实验验证了分割带来的实际好处。 摘要:An effective approach in meta-learning is to utilize multiple "train tasks" to learn a good initialization for model parameters that can help solve unseen "test tasks" with very few samples by fine-tuning from this initialization. Although successful in practice, theoretical understanding of such methods is limited. This work studies an important aspect of these methods: splitting the data from each task into train (support) and validation (query) sets during meta-training. Inspired by recent work (Raghu et al., 2020), we view such meta-learning methods through the lens of representation learning and argue that the train-validation split encourages the learned representation to be low-rank without compromising on expressivity, as opposed to the non-splitting variant that encourages high-rank representations. Since sample efficiency benefits from low-rankness, the splitting strategy will require very few samples to solve unseen test tasks. We present theoretical results that formalize this idea for linear representation learning on a subspace meta-learning instance, and experimentally verify this practical benefit of splitting in simulations and on standard meta-learning benchmarks.
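
为直观说明摘要中"支持集用于内循环适应、查询(验证)集用于外循环更新"的划分方式,下面给出一个一阶 MAML 风格的极简示意代码(非论文官方实现;模型取线性回归器,步长、步数等均为假设):

```python
import torch
import torch.nn.functional as F

def forward(x, params):
    """线性回归器的函数式前向: params = [weight, bias]"""
    w, b = params
    return F.linear(x, w, b)

def meta_step(params, tasks, inner_lr=0.01, outer_lr=0.001, inner_steps=5):
    """一阶元更新: 每个任务的数据划分为支持集 (x_s, y_s) 与查询集 (x_q, y_q)。"""
    meta_grads = [torch.zeros_like(p) for p in params]
    for x_s, y_s, x_q, y_q in tasks:
        fast = [p.clone() for p in params]
        for _ in range(inner_steps):                       # 内循环: 只用支持(训练)集做适应
            fast = [p.detach().requires_grad_(True) for p in fast]
            grads = torch.autograd.grad(F.mse_loss(forward(x_s, fast), y_s), fast)
            fast = [p - inner_lr * g for p, g in zip(fast, grads)]
        fast = [p.detach().requires_grad_(True) for p in fast]
        loss_q = F.mse_loss(forward(x_q, fast), y_q)       # 外循环: 用查询(验证)集评估适应结果
        for mg, g in zip(meta_grads, torch.autograd.grad(loss_q, fast)):
            mg += g
    return [p - outer_lr * mg / len(tasks) for p, mg in zip(params, meta_grads)]

# 用法示意: params = [torch.randn(1, 8), torch.randn(1)]; tasks 为若干 (x_s, y_s, x_q, y_q) 元组
```

若去掉这种划分、直接用全部任务数据既做内循环适应又做外层评估,即对应摘要中所说的"非分割变体"。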

【2】 Learning Task Informed Abstraction 标题:学习任务感知抽象

作者:Xiang Fu,Ge Yang,Pulkit Agrawal,Tommi Jaakkola 备注:8 pages, 12 figures 链接:https://arxiv.org/abs/2106.15612 摘要:现有的基于模型的强化学习方法在复杂的视觉场景中进行操作时,由于无法确定任务相关特征的优先级,因而存在一定的困难。为了缓解这个问题,我们提出学习任务通知抽象(TIA),明确区分奖励相关的视觉特征和分心。对于TIA的学习,我们引入了任务通知MDP(TiMDP)的形式化方法,该方法通过训练两个通过合作重建学习视觉特征的模型来实现,但其中一个模型与奖赏信号是敌对分离的。实验结果表明,在许多视觉控制任务中,TIA比最先进的方法有显著的性能提高,而这些任务中自然和无约束的视觉分心是一个巨大的挑战。 摘要:Current model-based reinforcement learning methods struggle when operating from complex visual scenes due to their inability to prioritize task-relevant features. To mitigate this problem, we propose learning Task Informed Abstractions (TIA) that explicitly separates reward-correlated visual features from distractors. For learning TIA, we introduce the formalism of Task Informed MDP (TiMDP) that is realized by training two models that learn visual features via cooperative reconstruction, but one model is adversarially dissociated from the reward signal. Empirical evaluation shows that TIA leads to significant performance gains over state-of-the-art methods on many visual control tasks where natural and unconstrained visual distractions pose a formidable challenge.
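
TIA 的核心在于同时训练两个分支:任务分支的隐状态用于预测奖励并参与重建,干扰分支同样参与合作重建、但与奖励信号对抗地解耦。下面用梯度反转层给出一个高度简化的损失拼装示意(非论文官方实现;原方法基于 Dreamer 式循环状态空间模型与图像观测,此处仅用扁平向量与线性层演示,对抗解耦的具体实现方式、网络结构与权重系数均为本示意的假设):

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """前向恒等、反向取负的梯度反转层, 作为"对抗解耦"的一种简化写法。"""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg()

class TwoBranchModel(nn.Module):
    def __init__(self, obs_dim=64, z_dim=16):
        super().__init__()
        self.enc_task = nn.Linear(obs_dim, z_dim)      # 任务相关分支
        self.enc_dist = nn.Linear(obs_dim, z_dim)      # 干扰分支
        self.decoder = nn.Linear(2 * z_dim, obs_dim)   # 两分支合作重建观测
        self.reward_head = nn.Linear(z_dim, 1)         # 任务分支预测奖励
        self.probe_head = nn.Linear(z_dim, 1)          # 探测干扰分支中残留的奖励信息

def tia_like_loss(m, obs, reward, lam=1.0):
    zt, zd = m.enc_task(obs), m.enc_dist(obs)
    loss_recon = ((m.decoder(torch.cat([zt, zd], -1)) - obs) ** 2).mean()
    loss_reward = ((m.reward_head(zt).squeeze(-1) - reward) ** 2).mean()
    # 探测头照常最小化预测误差; 经过梯度反转后, 干扰分支编码器反而学着"擦掉"奖励信息
    probe = m.probe_head(GradReverse.apply(zd)).squeeze(-1)
    return loss_recon + loss_reward + lam * ((probe - reward) ** 2).mean()
```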

【3】 Framework for an Intelligent Affect Aware Smart Home Environment for Elderly People 标题:面向老年人的智能情感感知智能家居环境框架

作者:Nirmalya Thakur,Chia Y. Han 机构:edu Department of Electrical Engineering and Computer Science University of Cincinnati Cincinnati 备注:None 链接:https://arxiv.org/abs/2106.15599 摘要:在过去的几十年里,老年人的人口一直在快速增长,预计他们的人口在不久的将来还会进一步增加。随着年龄的增长,老年人面临着身体残疾、认知问题、记忆力减退和行为紊乱等问题,这与他们日益增长的需求有关。为了减轻他们在世界经济中的财政负担,提高他们的生活质量,必须开发具有适应性、辅助性和智能性的基于技术的解决方案。智能情感感知系统不仅可以分析,而且可以预测老年人在物联网环境中与技术的日常交互中的行为,具有巨大的潜力,可以作为改善智能家居中老年人用户体验的长期解决方案。因此,这项工作提出了一个老年人智能情感感知环境的框架,不仅可以分析他们互动的情感成分,而且可以预测他们可能的用户体验,甚至在他们开始在给定的智能家居环境中从事任何活动之前。这种对用户体验的预测将为增强用户体验提供空间,从而增强此类智能系统的辅助性和适应性。为了支持这一框架在改善智能家居中老年人生活质量方面的有效性,我们在三个数据集上进行了测试,并对结果进行了介绍和讨论。 摘要:The population of elderly people has been increasing at a rapid rate over the last few decades and their population is expected to further increase in the upcoming future. Their increasing population is associated with their increasing needs due to problems like physical disabilities, cognitive issues, weakened memory and disorganized behavior, that elderly people face with increasing age. To reduce their financial burden on the world economy and to enhance their quality of life, it is essential to develop technology-based solutions that are adaptive, assistive and intelligent in nature. Intelligent Affect Aware Systems that can not only analyze but also predict the behavior of elderly people in the context of their day to day interactions with technology in an IoT-based environment, holds immense potential for serving as a long-term solution for improving the user experience of elderly in smart homes. This work therefore proposes the framework for an Intelligent Affect Aware environment for elderly people that can not only analyze the affective components of their interactions but also predict their likely user experience even before they start engaging in any activity in the given smart home environment. This forecasting of user experience would provide scope for enhancing the same, thereby increasing the assistive and adaptive nature of such intelligent systems. To uphold the efficacy of this proposed framework for improving the quality of life of elderly people in smart homes, it has been tested on three datasets and the results are presented and discussed.

【4】 The Values Encoded in Machine Learning Research 标题:机器学习研究中编码的价值观

作者:Abeba Birhane,Pratyusha Kalluri,Dallas Card,William Agnew,Ravit Dotan,Michelle Bao 机构:University College Dublin & Lero, Dublin, Ireland, Stanford University, University of Washington, University of California, Berkeley 备注:Data and code available at this https URL 链接:https://arxiv.org/abs/2106.15590 摘要:机器学习(ML)目前对世界产生着巨大影响,并日益影响着社区与机构实践。因此,我们必须质疑把该领域视为价值中立或普遍有益的模糊观念,并考察该领域实际正在推进哪些具体价值。在本文中,我们对发表于顶级机器学习会议 ICML 和 NeurIPS 的 100 篇高被引 ML 论文进行了定量与定性分析,严格审视该领域的价值取向。我们标注了能够揭示论文价值取向的关键特征:它们如何为项目选择辩护、推崇哪些方面、是否考虑潜在负面后果,以及其机构隶属与资金来源。我们发现,社会需求(即便被提及)与项目选择之间的联系通常非常松散,而对负面后果的考虑则极为罕见。我们识别出机器学习研究中被推崇的 67 种价值,其中论文最常以性能、泛化、效率、研究者可理解性、新颖性以及在已有工作基础上的构建来为自身辩护和评估。我们提供了大量文本证据并分析了这些价值是如何被操作化的。值得注意的是,这些最受推崇的价值在当前的定义与应用方式中,其假设和含义往往支持权力的集中。最后,我们发现这些高被引论文与科技公司和精英大学之间的联系日益紧密。 摘要:Machine learning (ML) currently exerts an outsized influence on the world, increasingly affecting communities and institutional practices. It is therefore critical that we question vague conceptions of the field as value-neutral or universally beneficial, and investigate what specific values the field is advancing. In this paper, we present a rigorous examination of the values of the field by quantitatively and qualitatively analyzing 100 highly cited ML papers published at premier ML conferences, ICML and NeurIPS. We annotate key features of papers which reveal their values: how they justify their choice of project, which aspects they uplift, their consideration of potential negative consequences, and their institutional affiliations and funding sources. We find that societal needs are typically very loosely connected to the choice of project, if mentioned at all, and that consideration of negative consequences is extremely rare. We identify 67 values that are uplifted in machine learning research, and, of these, we find that papers most frequently justify and assess themselves based on performance, generalization, efficiency, researcher understanding, novelty, and building on previous work. We present extensive textual evidence and analysis of how these values are operationalized. Notably, we find that each of these top values is currently being defined and applied with assumptions and implications generally supporting the centralization of power. Finally, we find increasingly close ties between these highly cited papers and tech companies and elite universities.

【5】 As easy as APC: Leveraging self-supervised learning in the context of time series classification with varying levels of sparsity and severe class imbalance 标题:与APC一样简单:在具有不同级别稀疏性和严重类别失衡的时间序列分类环境中利用自我监督学习

作者:Fiorella Wever,T. Anderson Keller,Victor Garcia,Laura Symul 机构: 1University of Amsterdam, Germany 3Department of Statistics, Stan-ford University 链接:https://arxiv.org/abs/2106.15577 摘要:高水平的稀疏性和强的类不平衡是普遍存在的挑战,往往同时出现在现实世界的时间序列数据。虽然大多数方法分别处理每个问题,但我们提出的方法同时处理这两个问题,同时对数据施加较少的假设。在这项工作中,我们提出利用自监督学习方法,特别是自回归预测编码(APC),来学习在缺失数据和类别不平衡的情况下时间序列数据的相关隐藏表示。我们使用GRU或GRU-D编码器在两个真实数据集上应用APC,并表明在所有设置下应用APC的一步超前预测可以改善分类结果。事实上,通过应用GRU-D-APC,我们在Physionet基准上获得了最先进的AUPRC结果。 摘要:High levels of sparsity and strong class imbalance are ubiquitous challenges that are often presented simultaneously in real-world time series data. While most methods tackle each problem separately, our proposed approach handles both in conjunction, while imposing fewer assumptions on the data. In this work, we propose leveraging a self-supervised learning method, specifically Autoregressive Predictive Coding (APC), to learn relevant hidden representations of time series data in the context of both missing data and class imbalance. We apply APC using either a GRU or GRU-D encoder on two real-world datasets, and show that applying one-step-ahead prediction with APC improves the classification results in all settings. In fact, by applying GRU-D - APC, we achieve state-of-the-art AUPRC results on the Physionet benchmark.
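
摘要中的 APC(自回归预测编码)本质上是让编码器在时间维上做一步超前预测的自监督目标。下面是一个基于 GRU 的极简示意(非论文官方实现;输入维度、隐藏维度等均为假设),预训练得到的隐藏表示随后可接下游分类器:

```python
import torch
import torch.nn as nn

class APC(nn.Module):
    """一步超前预测: 用 t 时刻及之前的隐藏状态预测 t+1 时刻的输入特征。"""
    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, feat_dim)

    def forward(self, x):                  # x: (batch, time, feat_dim)
        h, _ = self.rnn(x)                 # h[:, t] 只依赖 x[:, :t+1]
        pred = self.head(h[:, :-1])        # 用前 T-1 步的隐藏状态做预测
        target = x[:, 1:]                  # 目标是后移一步的原始特征
        return ((pred - target) ** 2).mean(), h

# 用法示意: 先自监督预训练, 再把隐藏表示 h 交给下游分类器
x = torch.randn(8, 50, 32)
model = APC()
loss, h = model(x)
loss.backward()
```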

【6】 Learning latent causal graphs via mixture oracles 标题:基于混合预言的潜在因果图学习

作者:Bohdan Kivva,Goutham Rajendran,Pradeep Ravikumar,Bryon Aragam 机构:University of Chicago, Carnegie Mellon University 备注:37 pages 链接:https://arxiv.org/abs/2106.15563 摘要:我们研究了在潜在变量存在的情况下,从数据中重建因果图模型的问题。主要的问题是恢复因果结构的潜在变量,同时考虑到一般,潜在的非线性变量之间的依赖性。在许多实际问题中,原始观测值(例如图像中的像素)之间的依赖性远不如某些高级潜在特征(例如概念或对象)之间的依赖性,这就是感兴趣的设置。我们提供了一个条件,在这个条件下,潜在的表示和潜在的潜在因果模型都可以通过简化为一个混合预言来识别。证明是建设性的,并导致几个算法显式重建完整的图形模型。我们讨论了有效的算法,并提供了实验来说明算法在实际中的应用。 摘要:We study the problem of reconstructing a causal graphical model from data in the presence of latent variables. The main problem of interest is recovering the causal structure over the latent variables while allowing for general, potentially nonlinear dependence between the variables. In many practical problems, the dependence between raw observations (e.g. pixels in an image) is much less relevant than the dependence between certain high-level, latent features (e.g. concepts or objects), and this is the setting of interest. We provide conditions under which both the latent representations and the underlying latent causal model are identifiable by a reduction to a mixture oracle. The proof is constructive, and leads to several algorithms for explicitly reconstructing the full graphical model. We discuss efficient algorithms and provide experiments illustrating the algorithms in practice.

【7】 Subgroup Generalization and Fairness of Graph Neural Networks 标题:图神经网络的子群泛化与公平性

作者:Jiaqi Ma,Junwei Deng,Qiaozhu Mei 机构:School of Information, University of Michigan; Department of EECS 链接:https://arxiv.org/abs/2106.15535 摘要:尽管图神经网络(GNN)近年来取得了大量成功应用,但对其泛化能力的理论理解仍然很少,尤其是在数据并非独立同分布(IID)的节点级任务上。对泛化性能的理论研究有助于理解 GNN 模型的基本问题(例如公平性),并有助于设计更好的学习方法。在本文中,我们针对非 IID 的半监督学习设定为 GNN 提出了一种新的 PAC-Bayes 分析。此外,我们还分析了模型在未标记节点不同子群上的泛化性能,从而能够从理论角度进一步研究 GNN 在"准确率差异"意义下的(不)公平性。在合理的假设下,我们证明了测试子群与训练集之间的距离可能是影响 GNN 在该子群上性能的关键因素,这提示在公平学习中需要特别关注训练节点的选择。跨多个 GNN 模型和数据集的实验支持了我们的理论结果。 摘要:Despite enormous successful applications of graph neural networks (GNNs) recently, theoretical understandings of their generalization ability, especially for node-level tasks where data are not independent and identically-distributed (IID), have been sparse. The theoretical investigation of the generalization performance is beneficial for understanding fundamental issues (such as fairness) of GNN models and designing better learning methods. In this paper, we present a novel PAC-Bayesian analysis for GNNs under a non-IID semi-supervised learning setup. Moreover, we analyze the generalization performances on different subgroups of unlabeled nodes, which allows us to further study an accuracy-(dis)parity-style (un)fairness of GNNs from a theoretical perspective. Under reasonable assumptions, we demonstrate that the distance between a test subgroup and the training set can be a key factor affecting the GNN performance on that subgroup, which calls special attention to the training node selection for fair learning. Experiments across multiple GNN models and datasets support our theoretical results.

【8】 GraphAnoGAN: Detecting Anomalous Snapshots from Attributed Graphs 标题:GraphAnoGAN:从属性图中检测异常快照

作者:Siddharth Bhatia,Yiwei Wang,Bryan Hooi,Tanmoy Chakraborty 机构: National University of Singapore, IIIT-Delhi, India 备注:Accepted at ECML-PKDD 2021 链接:https://arxiv.org/abs/2106.15504 摘要:最近,从图形中发现异常快照引起了极大的关注。现有的研究利用诸如子空间选择、自我网络或社区分析等浅层学习机制来解决这个问题。这些模型没有考虑网络中结构和属性之间的多方面相互作用。本文提出了一种异常快照排序框架GraphAnoGAN,它由两个核心组件生成模型和判别模型组成。具体地说,生成模型从候选图快照集中学习近似异常样本的分布,判别模型检测采样的快照是否来自地面真值。在4个真实网络上的实验表明,GraphAnoGAN优于6个基线,具有显著的优势(与所有数据集平均的最佳基线相比,准确率和召回率分别提高28.29%和22.01%)。 摘要:Finding anomalous snapshots from a graph has garnered huge attention recently. Existing studies address the problem using shallow learning mechanisms such as subspace selection, ego-network, or community analysis. These models do not take into account the multifaceted interactions between the structure and attributes in the network. In this paper, we propose GraphAnoGAN, an anomalous snapshot ranking framework, which consists of two core components -- generative and discriminative models. Specifically, the generative model learns to approximate the distribution of anomalous samples from the candidate set of graph snapshots, and the discriminative model detects whether the sampled snapshot is from the ground-truth or not. Experiments on 4 real-world networks show that GraphAnoGAN outperforms 6 baselines with a significant margin (28.29% and 22.01% higher precision and recall, respectively compared to the best baseline, averaged across all datasets).

【9】 Curious Explorer: a provable exploration strategy in Policy Learning 标题:好奇的探索者:政策学习中的一种可证明的探索策略

作者:Marco Miani,Maurizio Parton,Marco Romito 机构:University of Pisa, University of Chieti-Pescara 链接:https://arxiv.org/abs/2106.15503 摘要:对策略梯度方法而言,能够访问一个具有探索性的重启分布(所谓的宽覆盖假设)至关重要。这是因为,虽然目标函数对不太可能出现的状态的更新不敏感,但智能体仍然需要在这些状态上得到改进,才能达到接近最优的回报。因此,在分析实用策略梯度方法的理论性质时,往往会以某种形式使用宽覆盖假设。然而,这一假设在某些环境中并不可行,例如在线学习时,或只能从一个固定初始状态重启时。在这些情况下,经典策略梯度算法的收敛性和样本效率都可能很差。在本文中,我们提出了好奇探索者(Curious Explorer),一种新颖而简单的迭代式状态空间探索策略,可配合任意起始分布 $\rho$ 使用。好奇探索者从 $\rho$ 出发,利用分配给访问不足状态集合的内在奖励生成一系列策略,每个策略都以有依据的方式比前一个更具探索性,最后根据这些探索性策略的状态访问分布输出一个重启分布 $\mu$。好奇探索者是可证明的:我们给出了最优策略访问那些被访问不足的状态的频率的理论上界。当把一个 PAC 优化器接入好奇探索者时,这些上界可用于证明 PAC 收敛性与样本效率结果。这使得 REINFORCE 无需任何覆盖假设即可获得全局收敛与样本效率结果,并有望推广到任何在宽覆盖下保证 PAC 收敛的策略梯度方法。最后,我们将好奇探索者的输出接入 REINFORCE 和 TRPO,并用实验表明它能在探索困难的 MDP 中提升性能。 摘要:Having access to an exploring restart distribution (the so-called wide coverage assumption) is critical with policy gradient methods. This is due to the fact that, while the objective function is insensitive to updates in unlikely states, the agent may still need improvements in those states in order to reach a nearly optimal payoff. For this reason, wide coverage is used in some form when analyzing theoretical properties of practical policy gradient methods. However, this assumption can be unfeasible in certain environments, for instance when learning is online, or when restarts are possible only from a fixed initial state. In these cases, classical policy gradient algorithms can have very poor convergence properties and sample efficiency. In this paper, we develop Curious Explorer, a novel and simple iterative state space exploration strategy that can be used with any starting distribution $\rho$. Curious Explorer starts from $\rho$, then using intrinsic rewards assigned to the set of poorly visited states produces a sequence of policies, each one more exploratory than the previous one in an informed way, and finally outputs a restart model $\mu$ based on the state visitation distribution of the exploratory policies. Curious Explorer is provable, in the sense that we provide theoretical upper bounds on how often an optimal policy visits poorly visited states. These bounds can be used to prove PAC convergence and sample efficiency results when a PAC optimizer is plugged in Curious Explorer. This allows to achieve global convergence and sample efficiency results without any coverage assumption for REINFORCE, and potentially for any other policy gradient method ensuring PAC convergence with wide coverage. Finally, we plug (the output of) Curious Explorer into REINFORCE and TRPO, and show empirically that it can improve performance in MDPs with challenging exploration.
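
好奇探索者的关键流程是:统计状态访问次数,对访问稀少的状态赋予内在奖励,迭代得到越来越具探索性的策略,最后把探索策略的状态访问分布作为重启分布 $\mu$。下面用表格型 MDP 给出一个流程示意(非论文实现;其中 run_policy 是假设存在的黑盒策略优化器,内在奖励的具体形式与迭代次数也均为假设):

```python
import numpy as np

def curious_explorer_sketch(run_policy, n_states, n_iters=5, bonus=1.0):
    """run_policy(intrinsic_reward) 为假设的黑盒: 在给定内在奖励下训练策略并返回若干条状态轨迹."""
    visits = np.zeros(n_states)
    for _ in range(n_iters):
        # 访问越少的状态, 内在奖励越大 (一种常见的 count-based 形式, 仅作示意)
        intrinsic = bonus / np.sqrt(visits + 1.0)
        trajectories = run_policy(intrinsic)
        for traj in trajectories:
            for s in traj:
                visits[s] += 1
    mu = visits / visits.sum()        # 重启分布: 按探索性策略的状态访问频率归一化
    return mu
```

得到的 mu 随后可作为下游策略梯度方法(如 REINFORCE、TRPO)的重启分布使用。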

【10】 Attentive Neural Processes and Batch Bayesian Optimization for Scalable Calibration of Physics-Informed Digital Twins 标题:关注神经过程和批量贝叶斯优化在物理信息数字孪生可扩展校准中的应用

作者:Ankush Chakrabarty,Gordon Wichern,Christopher Laughman 备注:12 pages, accepted to ICML 2021 Workshop on Tackling Climate Change with Machine Learning 链接:https://arxiv.org/abs/2106.15502 摘要:以物理为基础的动态系统模型是构建环境的数字孪生体的关键组成部分。这些数字孪生子使节能基础设施的设计成为可能,但必须进行适当校准,以准确反映系统行为,以便进行下游预测和分析。现代建筑的动力系统模型通常由大量的参数来描述,在仿真过程中会产生大量的计算开销。为了在无需过度模拟的情况下处理数字孪生子的大规模校准,我们提出了ANP-BBO:一种利用注意力神经过程(ANP)的可扩展并行批处理贝叶斯优化(BBO)方法。 摘要:Physics-informed dynamical system models form critical components of digital twins of the built environment. These digital twins enable the design of energy-efficient infrastructure, but must be properly calibrated to accurately reflect system behavior for downstream prediction and analysis. Dynamical system models of modern buildings are typically described by a large number of parameters and incur significant computational expenditure during simulations. To handle large-scale calibration of digital twins without exorbitant simulations, we propose ANP-BBO: a scalable and parallelizable batch-wise Bayesian optimization (BBO) methodology that leverages attentive neural processes (ANPs).

【11】 Self-Contrastive Learning 标题:自我对比学习

作者:Sangmin Bae,Sungnyun Kim,Jongwoo Ko,Gihun Lee,Seungjong Noh,Se-Young Yun 机构:KAIST, SK hynix 链接:https://arxiv.org/abs/2106.15499 摘要:本文提出了一种新的对比学习框架,称为自对比学习(SelfCon),它是在网络不同层次的多个输出中进行自我对比的学习。我们确认SelfCon损失保证了中间表示和最后表示之间的互信息(MI)的下界。此外,我们通过各种MI估计的实证研究表明,SelfCon损失与MI的增加和更好的分类性能高度相关。在我们的实验中,SelfCon优于有监督对比学习(SupCon),不需要多视角的批处理,并且计算成本更低。特别是在ResNet-18上,CIFAR-100数据集的top-1分类准确率为76.45%,分别比SupCon和交叉熵损失高2.87%和4.36%。我们发现,减少消失梯度和过度拟合的问题,使我们的方法优于同行。 摘要:This paper proposes a novel contrastive learning framework, coined as Self-Contrastive (SelfCon) Learning, that self-contrasts within multiple outputs from the different levels of a network. We confirmed that SelfCon loss guarantees the lower bound of mutual information (MI) between the intermediate and last representations. Besides, we empirically showed, via various MI estimators, that SelfCon loss highly correlates to the increase of MI and better classification performance. In our experiments, SelfCon surpasses supervised contrastive (SupCon) learning without the need for a multi-viewed batch and with the cheaper computational cost. Especially on ResNet-18, we achieved top-1 classification accuracy of 76.45% for the CIFAR-100 dataset, which is 2.87% and 4.36% higher than SupCon and cross-entropy loss, respectively. We found that mitigating both vanishing gradient and overfitting issue makes our method outperform the counterparts.
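
SelfCon 的做法可以理解为:把同一网络的中间层表示和最终表示当作两个"视图",在二者之间做有监督对比。下面是一个复用 SupCon 式损失、只取两个层次输出的简化示意(非论文官方实现;如何抽取中间表示、投影头结构等均为假设):

```python
import torch
import torch.nn.functional as F

def supcon_two_views(z1, z2, labels, tau=0.1):
    """z1: 中间层投影特征, z2: 末层投影特征, 形状均为 (N, d); 同类样本(跨两个视图)互为正样本."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)           # (2N, d)
    y = torch.cat([labels, labels], dim=0)
    sim = z @ z.t() / tau                                        # 余弦相似度 / 温度
    mask_self = torch.eye(len(y), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask_self, -1e9)                       # 去掉与自身的对比
    pos = ((y.unsqueeze(0) == y.unsqueeze(1)) & ~mask_self).float()
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # 对每个锚点, 对其所有正样本的对数概率取平均
    loss = -(log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()
```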

【12】 Few-Shot Electronic Health Record Coding through Graph Contrastive Learning 标题:基于图形对比学习的Few-Shot电子病历编码

作者:Shanshan Wang,Pengjie Ren,Zhumin Chen,Zhaochun Ren,Huasheng Liang,Qiang Yan,Evangelos Kanoulas,Maarten de Rijke 链接:https://arxiv.org/abs/2106.15467 摘要:电子健康记录(EHR)编码的任务是为每个EHR分配ICD代码。以往的研究大多只关注于频繁的ICD码,对罕见和频繁的ICD码的处理方法也不尽相同。这些方法对常见的ICD码有很好的性能,但由于ICD码的分布极不均衡,对少数ICD码的性能很不理想。我们试图通过使用基于对比图的EHR编码框架CoGraph来提高频繁和罕见ICD码的性能,CoGraph将EHR编码重新转换为一个小镜头学习任务。首先,我们为每个EHR构造一个异构EHR词实体(HEWE)图,其中从EHR中提取的词和实体作为节点,它们之间的关系作为边。然后,CoGraph从不同的ICD码中学习不同的图之间的相似性和不相似性,以便在它们之间传递信息。在少数镜头学习场景中,该模型在训练过程中只能访问频繁的ICD码,这可能会迫使它编码只对频繁的ICD码有用的特征。为了降低这种风险,CoGraph设计了两种图形对比学习方案GSCL和GECL,它们利用HEWE图形结构对可转换特征进行编码。GSCL利用HEWE图中不同子图的内部相关性,而GECL利用不同临床阶段HEWE图之间的相互相关性。在micimic-III基准数据集上的实验表明,CoGraph在EHR编码方面,不仅在频繁ICD码上,而且在稀有码上,在多个评价指标上都显著优于现有的编码方法。在频繁ICD码上,GSCL和GECL分别提高了1.31%和0.61%的分类精度和F1,在稀有ICD码上CoGraph分别提高了2.12%和2.95%。 摘要:Electronic health record (EHR) coding is the task of assigning ICD codes to each EHR. Most previous studies either only focus on the frequent ICD codes or treat rare and frequent ICD codes in the same way. These methods perform well on frequent ICD codes but due to the extremely unbalanced distribution of ICD codes, the performance on rare ones is far from satisfactory. We seek to improve the performance for both frequent and rare ICD codes by using a contrastive graph-based EHR coding framework, CoGraph, which re-casts EHR coding as a few-shot learning task. First, we construct a heterogeneous EHR word-entity (HEWE) graph for each EHR, where the words and entities extracted from an EHR serve as nodes and the relations between them serve as edges. Then, CoGraph learns similarities and dissimilarities between HEWE graphs from different ICD codes so that information can be transferred among them. In a few-shot learning scenario, the model only has access to frequent ICD codes during training, which might force it to encode features that are useful for frequent ICD codes only. To mitigate this risk, CoGraph devises two graph contrastive learning schemes, GSCL and GECL, that exploit the HEWE graph structures so as to encode transferable features. GSCL utilizes the intra-correlation of different sub-graphs sampled from HEWE graphs while GECL exploits the inter-correlation among HEWE graphs at different clinical stages. Experiments on the MIMIC-III benchmark dataset show that CoGraph significantly outperforms state-of-the-art methods on EHR coding, not only on frequent ICD codes, but also on rare codes, in terms of several evaluation indicators. On frequent ICD codes, GSCL and GECL improve the classification accuracy and F1 by 1.31% and 0.61%, respectively, and on rare ICD codes CoGraph has more obvious improvements by 2.12% and 2.95%.

【13】 A Mechanism for Producing Aligned Latent Spaces with Autoencoders 标题:一种利用自动编码器产生对齐潜在空间的机制

作者:Saachi Jain,Adityanarayanan Radhakrishnan,Caroline Uhler 机构: and Institute for Data, Massachusetts Institute ofTechnology 1Code available at https 链接:https://arxiv.org/abs/2106.15456 摘要:对齐的潜在空间,其中输入空间中有意义的语义转移对应于嵌入空间中的翻译,对下游任务的成功起着重要作用,如无监督聚类和数据插补。在这项工作中,我们证明了线性和非线性自动编码器通过沿着数据的左奇异向量拉伸产生对齐的潜在空间。我们充分描述了线性自动编码器中的拉伸量,并提供了一个初始化方案,使用这些网络沿顶部方向任意拉伸。我们还量化了拉伸量在非线性自动编码器在一个简化的设置。我们利用我们的理论结果在基因表达空间的细胞类型和单词嵌入空间的语义转移中对齐药物特征。 摘要:Aligned latent spaces, where meaningful semantic shifts in the input space correspond to a translation in the embedding space, play an important role in the success of downstream tasks such as unsupervised clustering and data imputation. In this work, we prove that linear and nonlinear autoencoders produce aligned latent spaces by stretching along the left singular vectors of the data. We fully characterize the amount of stretching in linear autoencoders and provide an initialization scheme to arbitrarily stretch along the top directions using these networks. We also quantify the amount of stretching in nonlinear autoencoders in a simplified setting. We use our theoretical results to align drug signatures across cell types in gene expression space and semantic shifts in word embedding spaces.

【14】 A Systematic Evaluation of Domain Adaptation in Facial Expression Recognition 标题:人脸表情识别中领域自适应的系统评价

作者:Yan San Kong,Varsha Suresh,Jonathan Soh,Desmond C. Ong 机构:Department of Information Systems and Analytics, National University of Singapore, Department of Computer Science, National University of Singapore, Institute of High Performance Computing, Agency for Science, Technology and Research, Singapore 链接:https://arxiv.org/abs/2106.15453 摘要:面部表情识别是一个重要的商业应用,但一个常见的限制是,应用程序通常需要对样本外分布进行预测,其中目标图像可能与模型训练的图像具有非常不同的特性。这些模型在看不见的目标域上做得好还是坏?在本文中,我们提供了一个系统的评估领域适应在面部表情识别。利用最先进的迁移学习技术和六个常用的面部表情数据集(三个在实验室收集,三个在野外收集),我们进行了广泛的循环实验,以检验最先进的CNN模型的分类精度。我们还进行多源实验,检查模型从多源数据集传输的能力,包括(i)设置内(例如,实验室到实验室),(ii)交叉设置(例如,在野外到实验室),(iii)混合设置(例如,实验室和野外到实验室)传输学习实验。我们发现,迁移学习的准确率并不高,并且随着目标数据集的不同而不同,而源数据集的准确率则较低。一般来说,传递的最佳设置包括微调预训练模型的权重,我们发现,无论设置如何,使用更多数据集的训练都可以提高传递性能。最后,我们讨论了对FER模型(尤其是已部署应用程序)的可推广性进行更多、定期的系统性研究的必要性。 摘要:Facial Expression Recognition is a commercially important application, but one common limitation is that applications often require making predictions on out-of-sample distributions, where target images may have very different properties from the images that the model was trained on. How well, or badly, do these models do on unseen target domains? In this paper, we provide a systematic evaluation of domain adaptation in facial expression recognition. Using state-of-the-art transfer learning techniques and six commonly-used facial expression datasets (three collected in the lab and three "in-the-wild"), we conduct extensive round-robin experiments to examine the classification accuracies for a state-of-the-art CNN model. We also perform multi-source experiments where we examine a model's ability to transfer from multiple source datasets, including (i) within-setting (e.g., lab to lab), (ii) cross-setting (e.g., in-the-wild to lab), (iii) mixed-setting (e.g., lab and wild to lab) transfer learning experiments. We find sobering results that the accuracy of transfer learning is not high, and varies idiosyncratically with the target dataset, and to a lesser extent the source dataset. Generally, the best settings for transfer include fine-tuning the weights of a pre-trained model, and we find that training with more datasets, regardless of setting, improves transfer performance. We end with a discussion of the need for more -- and regular -- systematic investigations into the generalizability of FER models, especially for deployed applications.

【15】 Coach2vec: autoencoding the playing style of soccer coaches 标题:Coach2vec:自动编码足球教练的打法

作者:Paolo Cintia,Luca Pappalardo 机构: autoencoding the playing style of soccer coachesPaolo Cintia (University of Pisa)Luca Pappalardo (ISTI-CNR 链接:https://arxiv.org/abs/2106.15444 摘要:捕捉职业足球教练的比赛风格是一个复杂的,但几乎没有探索,在体育分析任务。如今,描述足球比赛的每个相关时空方面的数字数据的可用性,允许以自动的方式捕获和分析球员、球队和教练的比赛风格。在本文中,我们提出了coach2vec,一个工作流程,以捕捉专业教练的比赛方式使用比赛项目流和人工智能。Coach2vec从每场比赛中提取出球占有率,根据相似度对球占有率进行聚类,重构教练员的典型球占有率。然后,它使用一个自动编码器,一种人工神经网络,来获得每个教练的比赛风格的简明表示(编码)。我们在描述意大利甲级联赛过去四个赛季的足球日志上进行的实验,揭示了著名教练之间有趣的相似之处,为模拟比赛风格和职业教练的定量比较铺平了道路。 摘要:Capturing the playing style of professional soccer coaches is a complex, and yet barely explored, task in sports analytics. Nowadays, the availability of digital data describing every relevant spatio-temporal aspect of soccer matches, allows for capturing and analyzing the playing style of players, teams, and coaches in an automatic way. In this paper, we present coach2vec, a workflow to capture the playing style of professional coaches using match event streams and artificial intelligence. Coach2vec extracts ball possessions from each match, clusters them based on their similarity, and reconstructs the typical ball possessions of coaches. Then, it uses an autoencoder, a type of artificial neural network, to obtain a concise representation (encoding) of the playing style of each coach. Our experiments, conducted on soccer-logs describing the last four seasons of the Italian first division, reveal interesting similarities between prominent coaches, paving the road to the simulation of playing styles and the quantitative comparison of professional coaches.

【16】 Semantic Reasoning from Model-Agnostic Explanations 标题:基于模型不可知性解释的语义推理

作者:Timen Stepišnik Perdih,Nada Lavrač,Blaž Škrlj 机构:Jožef Stefan Institute, Ljubljana, Slovenia; University of Nova Gorica, Nova Gorica, Slovenia 链接:https://arxiv.org/abs/2106.15433 摘要:随着黑盒模型的广泛采用,LIME 和 SHAP 等基于实例的事后(post hoc)解释工具变得越来越流行。这些工具产生的解释能精确指出与给定预测相关的关键特征的贡献。然而,所得到的解释仍停留在原始特征层面,没有广泛领域知识的人类专家未必能够理解。我们提出了 ReEx(Reasoning with Explanations),一种适用于由任意实例级解释器(如 SHAP)生成的解释的方法。通过使用本体形式的背景知识,ReEx 以类似"最小一般泛化"的方式对实例解释进行泛化。得到的符号化描述针对各个类别,并基于解释器的输出给出泛化结论。由此导出的语义解释可能更具信息量,因为它们在更一般的背景知识(例如生物过程层面)的语境中描述关键属性。我们在九个生物数据集上展示了 ReEx 的性能,表明可以得到紧凑的语义解释,且其比将术语直接链接到特征名的通用本体映射更具信息量。ReEx 以易用的 Python 库的形式提供,并与 SHAP 等工具兼容。据我们所知,这是最早将语义推理与当代模型解释方法直接结合的方法之一。本文为预印本,完整版本的 DOI 为:10.1109/SAMI50585.2021.9378668 摘要:With the wide adoption of black-box models, instance-based post hoc explanation tools, such as LIME and SHAP became increasingly popular. These tools produce explanations, pinpointing contributions of key features associated with a given prediction. However, the obtained explanations remain at the raw feature level and are not necessarily understandable by a human expert without extensive domain knowledge. We propose ReEx (Reasoning with Explanations), a method applicable to explanations generated by arbitrary instance-level explainers, such as SHAP. By using background knowledge in the form of ontologies, ReEx generalizes instance explanations in a least general generalization-like manner. The resulting symbolic descriptions are specific for individual classes and offer generalizations based on the explainer's output. The derived semantic explanations are potentially more informative, as they describe the key attributes in the context of more general background knowledge, e.g., at the biological process level. We showcase ReEx's performance on nine biological data sets, showing that compact, semantic explanations can be obtained and are more informative than generic ontology mappings that link terms directly to feature names. ReEx is offered as a simple-to-use Python library and is compatible with tools such as SHAP and similar. To our knowledge, this is one of the first methods that directly couples semantic reasoning with contemporary model explanation methods. This paper is a preprint. Full version's doi is: 10.1109/SAMI50585.2021.9378668

【17】 A Convergent and Efficient Deep Q Network Algorithm 标题:一种收敛高效的深度Q网络算法

作者:Zhikang T. Wang,Masahito Ueda 机构:Department of Physics, Institute for Physics of Intelligence, University of Tokyo; RIKEN Center for Emergent Matter Science (CEMS) 链接:https://arxiv.org/abs/2106.15419 摘要:尽管深度 Q 网络(DQN)强化学习算法及其变体在经验上取得了成功,但人们对 DQN 的理解仍然不足,也无法保证其收敛性。在这项工作中,我们表明,在现实设定下 DQN 可能发散并停止工作。尽管已有基于梯度的收敛方法,我们证明它们在学习行为上存在固有问题,并阐明了它们在实践中经常失败的原因。为克服这些问题,我们通过对 DQN 算法的仔细修改提出了一种收敛的 DQN 算法(C-DQN),并证明该算法是收敛的,且可以使用很大的折扣因子(0.9998)。它能在困难设定下稳健学习,并能在适度的计算预算内学会 Atari 2600 基准中一些 DQN 无法学会的困难游戏。我们的代码已公开发布,可用于复现我们的结果。 摘要:Despite the empirical success of the deep Q network (DQN) reinforcement learning algorithm and its variants, DQN is still not well understood and it does not guarantee convergence. In this work, we show that DQN can diverge and cease to operate in realistic settings. Although there exist gradient-based convergent methods, we show that they actually have inherent problems in learning behaviour and elucidate why they often fail in practice. To overcome these problems, we propose a convergent DQN algorithm (C-DQN) by carefully modifying DQN, and we show that the algorithm is convergent and can work with large discount factors (0.9998). It learns robustly in difficult settings and can learn several difficult games in the Atari 2600 benchmark where DQN fail, within a moderate computational budget. Our codes have been publicly released and can be used to reproduce our results.

【18】 High-dimensional separability for one- and few-shot learning 标题:单次和少次学习的高维可分性

作者:Alexander N. Gorban,Bogdan Grechuk,Evgeny M. Mirkes,Sergey V. Stasenko,Ivan Y. Tyukin 机构:Department of Mathematics, University of Leicester, Leicester, UK, Lobachevsky University, Nizhni Novgorod, Russia, Department of Geoscience and Petroleum, Norwegian University of Science and Technology, , Received: date; Accepted: date; Published: date 链接:https://arxiv.org/abs/2106.15416 摘要:这项工作是由一个实际的问题,人工智能(AI)错误的纠正。对一个大型人工智能系统进行系统的再训练几乎是不可能的。为了解决这个问题,专门的外部设备,校正器,被开发出来。他们应该提供快速和非迭代的系统修复,而不需要修改遗留的人工智能系统。AI校正器的一个通用部分是一个分类器,它应该将不希望的和错误的行为从正常操作中分离出来。训练这样的分类器是一个巨大的挑战,在核心的一个和少数镜头学习方法。简单方法的有效性是基于显著的维度缩减或维度效应的支持。随机可分性是维数现象的一个优点,它允许一次或几次错误纠正:在高维数据集中,在广泛的假设下,每个点都可以通过简单而健壮的线性判别法从集合的其余部分中分离出来。引入了数据域的层次结构,其中每个数据簇都有一个细粒度的内部结构等,建立并证明了新的细粒度数据分布的随机分离定理。在数据空间模式紧嵌入的假设下,证明了无限维极限下的分离定理。提出了一种新的人工智能系统多重校正方法,并以深度卷积神经网络预测误差和学习新类对象为例进行了说明。 摘要:This work is driven by a practical question, corrections of Artificial Intelligence (AI) errors. Systematic re-training of a large AI system is hardly possible. To solve this problem, special external devices, correctors, are developed. They should provide quick and non-iterative system fix without modification of a legacy AI system. A common universal part of the AI corrector is a classifier that should separate undesired and erroneous behavior from normal operation. Training of such classifiers is a grand challenge at the heart of the one- and few-shot learning methods. Effectiveness of one- and few-short methods is based on either significant dimensionality reductions or the blessing of dimensionality effects. Stochastic separability is a blessing of dimensionality phenomenon that allows one-and few-shot error correction: in high-dimensional datasets under broad assumptions each point can be separated from the rest of the set by simple and robust linear discriminant. The hierarchical structure of data universe is introduced where each data cluster has a granular internal structure, etc. New stochastic separation theorems for the data distributions with fine-grained structure are formulated and proved. Separation theorems in infinite-dimensional limits are proven under assumptions of compact embedding of patterns into data space. New multi-correctors of AI systems are presented and illustrated with examples of predicting errors and learning new classes of objects by a deep convolutional neural network.
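
摘要中的"随机可分性"指的是:在高维数据中,几乎任意一个点都能用一个简单而稳健的线性判别从其余点中分离出来,这正是单样本/少样本纠错的基础。下面用随机高维数据做一个数值小实验(仅作示意,采用最朴素的以该点自身方向作阈值化判别;样本分布与维度均为假设):

```python
import numpy as np

def separable_fraction(n=1000, d=200, seed=0):
    """检验: 用判别式 <x_i, x> >= <x_i, x_i> 能把多大比例的点 x_i 与其余点分开."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d)) / np.sqrt(d)    # 各向同性的高维样本 (示意用)
    G = X @ X.T                                 # 两两内积
    sep = 0
    for i in range(n):
        others = np.delete(G[i], i)
        if others.max() < G[i, i]:              # 其余点都落在该判别超平面的负侧
            sep += 1
    return sep / n

print(separable_fraction())   # 维度越高, 可分比例越接近 1
```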

【19】 Explaining the Performance of Multi-label Classification Methods with Data Set Properties 标题:用数据集属性解释多标签分类方法的性能

作者:Jasmin Bogatinovski,Ljupčo Todorovski,Sašo Džeroski,Dragi Kocev 机构:Joˇzef Stefan Institute, Ljubljana, Slovenia, Joˇzef Stefan IPSchool, Ljubljana, Slovenia, Dept. of Distributed Operating Systems, TU Berlin, Germany, University of Ljubljana, Slovenia, Bias Variance Labs, Ljubljana, Slovenia 链接:https://arxiv.org/abs/2106.15411 摘要:元学习概括了不同学习任务的经验经验,为机器学习算法的行为提供了重要的经验启示。在本文中,我们提出了一个全面的元学习研究的数据集和方法的多标签分类(MLC)。MLC是一个实际相关的机器学习任务,其中每个示例同时被标记为多个标签。在这里,我们使用50个描述数据不同属性的元特征来分析40个MLC数据集。本研究的主要发现如下。首先,描述MLC数据集空间的最突出的元特征是评估标签空间的不同方面的元特征。其次,元模型表明,最重要的元特征描述了标签空间,并且,描述标签之间关系的元特征往往比描述单个标签之间和内部分布的元特征更频繁地出现。第三,超参数的优化可以提高预测性能,然而,通常改进的程度并不总是证明资源利用率的合理性。 摘要:Meta learning generalizes the empirical experience with different learning tasks and holds promise for providing important empirical insight into the behaviour of machine learning algorithms. In this paper, we present a comprehensive meta-learning study of data sets and methods for multi-label classification (MLC). MLC is a practically relevant machine learning task where each example is labelled with multiple labels simultaneously. Here, we analyze 40 MLC data sets by using 50 meta features describing different properties of the data. The main findings of this study are as follows. First, the most prominent meta features that describe the space of MLC data sets are the ones assessing different aspects of the label space. Second, the meta models show that the most important meta features describe the label space, and, the meta features describing the relationships among the labels tend to occur a bit more often than the meta features describing the distributions between and within the individual labels. Third, the optimization of the hyperparameters can improve the predictive performance, however, quite often the extent of the improvements does not always justify the resource utilization.
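
下面给出几个摘要中所说"描述标签空间"的常见多标签元特征(标签基数、标签密度、不同标签组合数等)的计算示意;论文具体使用的 50 个元特征以原文为准,此处仅演示这类量的计算方式:

```python
import numpy as np

def label_space_meta_features(Y):
    """Y: (n_samples, n_labels) 的 0/1 标签矩阵, 返回若干常见的标签空间元特征."""
    n, q = Y.shape
    card = Y.sum(axis=1).mean()                  # 标签基数: 每个样本的平均标签数
    density = card / q                           # 标签密度
    distinct = len({tuple(row) for row in Y})    # 不同标签组合 (labelset) 的个数
    freq_std = Y.mean(axis=0).std()              # 各标签出现频率的离散程度 (粗略刻画不均衡)
    return {"cardinality": card, "density": density,
            "distinct_labelsets": distinct, "freq_std": freq_std}

Y = (np.random.rand(100, 10) < 0.2).astype(int)
print(label_space_meta_features(Y))
```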

【20】 Automated Repair of Process Models with Non-Local Constraints Using State-Based Region Theory 标题:基于状态区域理论的非局部约束过程模型自动修复

作者:Anna Kalenkova,Josep Carmona,Artem Polyvyanyy,Marcello La Rosa 机构:School of Computing and Information Systems, The University of Melbourne, Australia, Department of Computer Science, Polytechnic, University of Catalonia, Spain 链接:https://arxiv.org/abs/2106.15398 摘要:最先进的进程发现方法从事件日志构建自由选择的进程模型。因此,构建的模型不考虑事件之间的间接依赖关系。当输入行为不是自由选择时,这些方法就不能提供精确的模型。在本文中,我们提出了一种新的方法来增强自由选择过程模型,通过添加非自由选择结构发现a-后验通过基于区域的技术。这使我们能够受益于现有过程发现方法的性能和所采用的基本合成技术的准确性。我们证明了当存在间接依赖时,该方法保持了事件日志的适应度,同时提高了精度。该方法已经在合成数据集和实际数据集上实现和测试。结果表明,该方法能有效地修复从事件日志中发现的模型。 摘要:State-of-the-art process discovery methods construct free-choice process models from event logs. Consequently, the constructed models do not take into account indirect dependencies between events. Whenever the input behaviour is not free-choice, these methods fail to provide a precise model. In this paper, we propose a novel approach for enhancing free-choice process models by adding non-free-choice constructs discovered a-posteriori via region-based techniques. This allows us to benefit from the performance of existing process discovery methods and the accuracy of the employed fundamental synthesis techniques. We prove that the proposed approach preserves fitness with respect to the event log while improving the precision when indirect dependencies exist. The approach has been implemented and tested on both synthetic and real-life datasets. The results show its effectiveness in repairing models discovered from event logs.

【21】 Globally Optimal Hierarchical Reinforcement Learning for Linearly-Solvable Markov Decision Processes 标题:线性可解马尔可夫决策过程的全局最优分层强化学习

作者:Guillermo Infante,Anders Jonsso,Vicenç Gómez 机构:DTIC, Universitat Pompeu Fabra 链接:https://arxiv.org/abs/2106.15380 摘要:在这项工作中,我们提出了一种新的方法来分层强化学习线性可解马尔可夫决策过程。我们的方法假设状态空间是分区的,子任务在分区之间移动。我们在多个抽象层次上表示值函数,并使用子任务的组合性来估计每个分区中状态的最佳值。策略是在这些最优值估计上隐式定义的,而不是在子任务之间分解。因此,我们的方法可以学习全局最优策略,并且不会受到高层决策的非平稳性的影响。如果多个分区具有等效的动态性,则可以共享这些分区的子任务。如果边界状态集小于整个状态空间,那么我们的方法的样本复杂度将明显小于平面学习者的样本复杂度,并且我们在几个实验中验证了这一点。 摘要:In this work we present a novel approach to hierarchical reinforcement learning for linearly-solvable Markov decision processes. Our approach assumes that the state space is partitioned, and the subtasks consist in moving between the partitions. We represent value functions on several levels of abstraction, and use the compositionality of subtasks to estimate the optimal values of the states in each partition. The policy is implicitly defined on these optimal value estimates, rather than being decomposed among the subtasks. As a consequence, our approach can learn the globally optimal policy, and does not suffer from the non-stationarity of high-level decisions. If several partitions have equivalent dynamics, the subtasks of those partitions can be shared. If the set of boundary states is smaller than the entire state space, our approach can have significantly smaller sample complexity than that of a flat learner, and we validate this empirically in several experiments.

【22】 DRILL -- Deep Reinforcement Learning for Refinement Operators in $\mathcal{ALC}$ 标题:面向 $\mathcal{ALC}$ 中求精算子的深度强化学习(DRILL)

作者:Caglar Demir,Axel-Cyrille Ngonga Ngomo 机构:Data Science Research Group, Paderborn University 链接:https://arxiv.org/abs/2106.15373 摘要:基于求精算子的方法已成功应用于RDF知识图的类表达式学习。这些方法往往需要探索大量的概念才能找到合适的假设。这种需求可以说源于当前方法依赖短视的启发式函数来引导其在无限概念空间中的搜索。相应地,深度强化学习通过估计各状态所蕴含的折扣累积未来奖励,为缓解这种短视问题提供了有效手段。在这项工作中,我们利用深度强化学习来加速 $\mathcal{ALC}$ 中概念的学习,提出了一种新的类表达式学习方法 DRILL,它使用卷积深度 Q 学习模型来引导搜索。凭借其体系结构,DRILL 能够在标准硬件上每秒计算超过 $10^3$ 个类表达式的预期折扣累积未来回报。我们在四个基准数据集上将 DRILL 与最先进的方法进行了对比评估。结果表明,在所有基准数据集上,DRILL 收敛到目标状态的速度至少比最先进的模型快 2.7 倍。我们提供了方法的开源实现,包括训练和评估脚本以及预训练模型。 摘要:Approaches based on refinement operators have been successfully applied to class expression learning on RDF knowledge graphs. These approaches often need to explore a large number of concepts to find adequate hypotheses. This need arguably stems from current approaches relying on myopic heuristic functions to guide their search through an infinite concept space. In turn, deep reinforcement learning provides effective means to address myopia by estimating how much discounted cumulated future reward states promise. In this work, we leverage deep reinforcement learning to accelerate the learning of concepts in $\mathcal{ALC}$ by proposing DRILL -- a novel class expression learning approach that uses a convolutional deep Q-learning model to steer its search. By virtue of its architecture, DRILL is able to compute the expected discounted cumulated future reward of more than $10^3$ class expressions in a second on standard hardware. We evaluate DRILL on four benchmark datasets against state-of-the-art approaches. Our results suggest that DRILL converges to goal states at least 2.7$\times$ faster than state-of-the-art models on all benchmark datasets. We provide an open-source implementation of our approach, including training and evaluation scripts as well as pre-trained models.

【23】 MAML is a Noisy Contrastive Learner 标题:MAML是一个嘈杂的对比学习者

作者:Chia-Hsiang Kao,Wei-Chen Chiu,Pin-Yu Chen 机构:†National Yang Ming Chiao Tung University, Taiwan, ‡IBM Research 备注:15 pages, 11 figures 链接:https://arxiv.org/abs/2106.15367 摘要:模型不可知元学习(MAML)是当前最流行、应用最广泛的元学习算法之一,在各种学习问题中取得了显著的成功。然而,由于嵌套内循环和外循环更新的独特设计分别控制着任务特定学习和元模型中心学习,MAML的潜在学习目标仍然是隐含的,因此阻碍了对其更直接的理解。本文为MAML的工作机制提供了一个新的视角,发现:MAML类似于一个使用监督对比目标函数的元学习者,其中的查询特征被拉向同一类的支持特征,而不是不同类的支持特征,通过基于余弦相似性的分析实验验证了这种对比性。此外,我们的分析显示,香草MAML算法有一个不良的干扰项源自随机初始化和跨任务交互。因此,我们提出了一种简单而有效的技术,即归零技术来减轻这种干扰,并在minimagenet和Omniglot数据集上进行了大量的实验,证明了我们提出的技术所带来的一致性改进,从而很好地验证了它的有效性。 摘要:Model-agnostic meta-learning (MAML) is one of the most popular and widely-adopted meta-learning algorithms nowadays, which achieves remarkable success in various learning problems. Yet, with the unique design of nested inner-loop and outer-loop updates which respectively govern the task-specific and meta-model-centric learning, the underlying learning objective of MAML still remains implicit and thus impedes a more straightforward understanding of it. In this paper, we provide a new perspective to the working mechanism of MAML and discover that: MAML is analogous to a meta-learner using a supervised contrastive objective function, where the query features are pulled towards the support features of the same class and against those of different classes, in which such contrastiveness is experimentally verified via an analysis based on the cosine similarity. Moreover, our analysis reveals that the vanilla MAML algorithm has an undesirable interference term originating from the random initialization and the cross-task interaction. We therefore propose a simple but effective technique, zeroing trick, to alleviate such interference, where the extensive experiments are then conducted on both miniImagenet and Omniglot datasets to demonstrate the consistent improvement brought by our proposed technique thus well validating its effectiveness.
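
论文用"查询特征与同类/异类支持特征的余弦相似度"来验证 MAML 的对比性行为。下面是这种分析的一个简化计算示意(特征如何得到取决于具体骨干网络,这里仅假设已有特征矩阵与标签):

```python
import torch
import torch.nn.functional as F

def contrastiveness_gap(q_feat, q_label, s_feat, s_label):
    """比较查询特征与同类支持特征、异类支持特征的平均余弦相似度之差."""
    q = F.normalize(q_feat, dim=1)                   # (Nq, d)
    s = F.normalize(s_feat, dim=1)                   # (Ns, d)
    sim = q @ s.t()                                  # 两两余弦相似度
    same = (q_label.unsqueeze(1) == s_label.unsqueeze(0)).float()
    pos = (sim * same).sum() / same.sum()            # 同类对的平均相似度
    neg = (sim * (1 - same)).sum() / (1 - same).sum()  # 异类对的平均相似度
    return (pos - neg).item()                        # 差值越大, 越符合"对比学习"式的行为
```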

【24】 Federated Learning for Intrusion Detection in IoT Security: A Hybrid Ensemble Approach 标题:物联网安全中入侵检测的联合学习:一种混合集成方法

作者:Sayan Chatterjee,Manjesh K. Hanawal 机构:IEOR, IIT Bombay, Mumbai, India 备注:11 pages, 3 figures 链接:https://arxiv.org/abs/2106.15349 摘要:物联网在智能城市、医疗保健、供应链、交通运输等领域发挥着重要作用,成为恶意攻击的目标。过去在这方面的工作主要集中在集中式入侵检测系统(IDS),假设存在一个中心实体来执行数据分析和识别威胁。然而,这样的IDS可能并不总是可行的,主要是由于数据在多个源之间的传播和在中心节点的收集可能是昂贵的。另外,早期的工作主要集中在提高真阳性率(TPR),而忽略了假阳性率(FPR),这也是避免系统不必要停机的关键。在本文中,我们首先提出了一种基于混合集成模型的入侵检测系统体系结构,称为PHEC,它比现有的体系结构具有更好的性能。然后,我们将此模型调整为执行局部训练并仅聚合模型参数的联合学习框架。接下来,我们提出了在集中式和联邦式环境下的抗噪声PHEC来解决标签噪声问题。该方法使用加权凸代理损失函数作为分类器。该结构还利用了KNN分类器对噪声数据的自然鲁棒性。在四个测试数据集上的实验结果表明,该模型在保持低噪声和干净数据的FPR的同时,实现了较高的TPR。此外,他们还证明了混合集成模型在联邦环境中的性能接近于集中式环境。 摘要:Critical role of Internet of Things (IoT) in various domains like smart city, healthcare, supply chain and transportation has made them the target of malicious attacks. Past works in this area focused on centralized Intrusion Detection System (IDS), assuming the existence of a central entity to perform data analysis and identify threats. However, such IDS may not always be feasible, mainly due to spread of data across multiple sources and gathering at central node can be costly. Also, the earlier works primarily focused on improving True Positive Rate (TPR) and ignored the False Positive Rate (FPR), which is also essential to avoid unnecessary downtime of the systems. In this paper, we first present an architecture for IDS based on hybrid ensemble model, named PHEC, which gives improved performance compared to state-of-the-art architectures. We then adapt this model to a federated learning framework that performs local training and aggregates only the model parameters. Next, we propose Noise-Tolerant PHEC in centralized and federated settings to address the label-noise problem. The proposed idea uses classifiers using weighted convex surrogate loss functions. Natural robustness of KNN classifier towards noisy data is also used in the proposed architecture. Experimental results on four benchmark datasets drawn from various security attacks show that our model achieves high TPR while keeping FPR low on noisy and clean data. Further, they also demonstrate that the hybrid ensemble models achieve performance in federated settings close to that of the centralized settings.
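
摘要提到联邦设置下"执行本地训练并仅聚合模型参数"。下面是 FedAvg 式参数聚合的一个通用示意(与论文的混合集成模型 PHEC 并非一一对应,仅用于说明"本地训练 + 按样本量加权平均参数"的基本流程;函数名与接口均为假设):

```python
from typing import Dict, List
import torch

def fedavg(client_states: List[Dict[str, torch.Tensor]], client_sizes: List[int]):
    """按各客户端样本量加权平均模型参数, 返回聚合后的 state_dict."""
    total = sum(client_sizes)
    global_state = {}
    for key in client_states[0]:
        global_state[key] = sum(
            state[key] * (n / total) for state, n in zip(client_states, client_sizes)
        )
    return global_state

# 用法示意: 每轮把 global_state 下发给各客户端, 各自本地训练若干轮后再次聚合
```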

【25】 Probabilistic Attention for Interactive Segmentation 标题:交互式分词中的概率关注度

作者:Prasad Gabbur,Manjot Bilkhu,Javier Movellan 机构:Apple 备注:17 pages, 8 figures 链接:https://arxiv.org/abs/2106.15338 摘要:我们给出了注意力机制的一种概率解释,并证明 Transformer 中标准的点积注意力是最大后验(MAP)推理的特例。所提出的方法建议使用期望最大化(EM)算法在线调整键(key)与值(value)模型参数。这种方法适用于如下情形:外部主体(如标注者)在推理时提供了关于某些 token 正确取值的信息(例如某些像素的语义类别),而我们需要以有原则的方式把这些新信息传播到其他 token。我们在一个交互式语义分割任务上展示了该方法,其中标注者与模型在线协作以提高标注效率。在标准基准上,我们观察到在低反馈情形下键的自适应可提升模型性能(约 10% mIoU),而在高反馈情形下值的传播可提升模型的响应能力。我们的概率注意力模型的 PyTorch 层实现将会公开。 摘要:We provide a probabilistic interpretation of attention and show that the standard dot-product attention in transformers is a special case of Maximum A Posteriori (MAP) inference. The proposed approach suggests the use of Expectation Maximization algorithms for online adaptation of key and value model parameters. This approach is useful for cases in which external agents, e.g., annotators, provide inference-time information about the correct values of some tokens, e.g, the semantic category of some pixels, and we need for this new information to propagate to other tokens in a principled manner. We illustrate the approach on an interactive semantic segmentation task in which annotators and models collaborate online to improve annotation efficiency. Using standard benchmarks, we observe that key adaptation boosts model performance ($\sim$10% mIoU) in the low feedback regime and value propagation improves model responsiveness in the high feedback regime. A PyTorch layer implementation of our probabilistic attention model will be made publicly available.
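
论文指出 Transformer 的标准点积注意力是 MAP 推理的特例,并用 EM 在线调整 key/value 参数;完整的概率化推导请见原文。为便于对照,下面先给出被其"概率化解释"的标准缩放点积注意力本身(示意代码):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """q:(N,Lq,d), k:(N,Lk,d), v:(N,Lk,dv); 返回 softmax(QK^T / sqrt(d)) V 及注意力权重."""
    d = q.size(-1)
    attn = F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)  # 权重可被解读为后验责任度
    return attn @ v, attn
```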

【26】 SE-MD: A Single-encoder multiple-decoder deep network for point cloud generation from 2D images 标题:SE-MD:一种用于从二维图像生成点云的单编码器多解码器深度网络

作者:Abdul Mueed Hafiz,Rouf Ul Alam Bhat,Shabir Ahmad Parah,M. Hassaballah 机构:the date of receipt and acceptance should be inserted later 链接:https://arxiv.org/abs/2106.15325 摘要:从单个二维RGB图像生成三维模型是一项具有挑战性的计算机视觉研究课题。针对同一问题,已经提出了使用传统网络体系结构的各种技术。然而,目前的研究工作还很有限,存在着各种各样的问题,如使用低效的三维表示格式、弱的三维模型生成主干、无法生成稠密点云、稠密点云生成的后处理依赖性以及RGB图像中轮廓的依赖性。本文提出了一种新的二维RGB图像到点云的转换技术,该技术利用网络结构中的并行化概念,以其高效、健壮和简单的模型改进了该领域的研究现状。它不仅利用了点云的高效和丰富的三维表示,而且利用了一种新颖而健壮的点云生成主干来解决当前普遍存在的问题。这涉及使用单个编码器-多解码器深度网络架构,其中每个解码器生成特定的固定视点。然后融合所有视点生成密集点云。对该技术进行了各种实验,并将其性能与其它先进技术进行了比较,取得了显著的效果。代码位于https://github.com/mueedhafiz1982/ 摘要:3D model generation from single 2D RGB images is a challenging and actively researched computer vision task. Various techniques using conventional network architectures have been proposed for the same. However, the body of research work is limited and there are various issues like using inefficient 3D representation formats, weak 3D model generation backbones, inability to generate dense point clouds, dependence of post-processing for generation of dense point clouds, and dependence on silhouettes in RGB images. In this paper, a novel 2D RGB image to point cloud conversion technique is proposed, which improves the state of art in the field due to its efficient, robust and simple model by using the concept of parallelization in network architecture. It not only uses the efficient and rich 3D representation of point clouds, but also uses a novel and robust point cloud generation backbone in order to address the prevalent issues. This involves using a single-encoder multiple-decoder deep network architecture wherein each decoder generates certain fixed viewpoints. This is followed by fusing all the viewpoints to generate a dense point cloud. Various experiments are conducted on the technique and its performance is compared with those of other state of the art techniques and impressive gains in performance are demonstrated. Code is available at https://github.com/mueedhafiz1982/

【27】 Effective Evaluation of Deep Active Learning on Image Classification Tasks 标题:深度主动学习在图像分类任务中的有效性评价

作者:Nathan Beck,Durga Sivasubramanian,Apurva Dani,Ganesh Ramakrishnan,Rishabh Iyer 机构:The University of Texas at Dallas, Indian Institute of Technology, Bombay, AIFY Innovation Labs 备注:9 pages in main paper, 6 figures in main paper, 3 tables in main paper. 23 pages in total, 15 figures in total, 14 tables in total. Submitted to and currently under review for NeurIPS 2021 链接:https://arxiv.org/abs/2106.15324 摘要:为了提高深度学习的效率,越来越多的论文研究了基于深度模型的主动学习。然而,在普遍的实验环境中存在着一些问题,主要是由于缺乏统一的实施和基准。当前文献中存在的问题包括:对不同AL算法性能的观察有时相互矛盾,无意中排除了重要的泛化方法,如用于优化的数据扩充和SGD,缺乏对评估方面的研究,如AL的标记效率,对于AL优于随机抽样(RS)的情况,很少或没有明确的说明。在这项工作中,我们提出了一个统一的重新实现的国家最先进的AL算法的背景下的图像分类,我们仔细研究这些问题作为有效的评估方面。在积极的一面,我们表明,与使用数据扩充的RS相比,AL技术的标签效率提高了2到4倍。令人惊讶的是,当包含数据扩充时,使用BADGE(一种最先进的方法)与简单的不确定性采样相比,不再有一致的收益。然后,我们仔细分析现有方法在不同冗余度和每个类的示例数下的性能。最后,我们提供了一些见解供AL从业者在未来的工作中考虑,例如AL批量大小的影响、初始化的影响、每轮重新训练新模型的重要性以及其他见解。 摘要:With the goal of making deep learning more label-efficient, a growing number of papers have been studying active learning (AL) for deep models. However, there are a number of issues in the prevalent experimental settings, mainly stemming from a lack of unified implementation and benchmarking. Issues in the current literature include sometimes contradictory observations on the performance of different AL algorithms, unintended exclusion of important generalization approaches such as data augmentation and SGD for optimization, a lack of study of evaluation facets like the labeling efficiency of AL, and little or no clarity on the scenarios in which AL outperforms random sampling (RS). In this work, we present a unified re-implementation of state-of-the-art AL algorithms in the context of image classification, and we carefully study these issues as facets of effective evaluation. On the positive side, we show that AL techniques are 2x to 4x more label-efficient compared to RS with the use of data augmentation. Surprisingly, when data augmentation is included, there is no longer a consistent gain in using BADGE, a state-of-the-art approach, over simple uncertainty sampling. We then do a careful analysis of how existing approaches perform with varying amounts of redundancy and number of examples per class. Finally, we provide several insights for AL practitioners to consider in future work, such as the effect of the AL batch size, the effect of initialization, the importance of retraining a new model at every round, and other insights.
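
摘要中反复作为对照的"简单不确定性采样"可以用预测熵实现。下面是一个批量选样的示意(与文中统一重实现的具体接口无关,仅说明这一基线的做法):

```python
import numpy as np

def entropy_uncertainty_sampling(probs, budget):
    """probs: (n_unlabeled, n_classes) 的预测概率; 返回预测熵最大的 budget 个样本下标."""
    eps = 1e-12
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    return np.argsort(-entropy)[:budget]

# 用法示意: 对未标注池做一次前向得到 probs, 选出 64 个样本送去标注
probs = np.random.dirichlet(np.ones(10), size=1000)
picked = entropy_uncertainty_sampling(probs, budget=64)
```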

【28】 TNCR: Table Net Detection and Classification Dataset 标题:TNCR:表网检测和分类数据集

作者:Abdelrahman Abdallah,Alexander Berendeyev,Islam Nuradin,Daniyar Nurseitov 机构:a Department of Machine Learning & Data Science , Satbayev, University, Almaty, Almaty, Kazakhstan, National Open Research Laboratory for Information and Space Technologies, Satbayev 链接:https://arxiv.org/abs/2106.15322 摘要:我们提出TNCR,一个新的表格数据集,不同的图像质量收集免费网站。TNCR数据集可用于扫描文档图像中的表检测,并可将其分为5类。TNCR包含9428个高质量的标记图像。在本文中,我们已经实现了最先进的基于深度学习的表检测方法,以创建几个强大的基线。在TNCR数据集上,采用ResNeXt-101-64x4d骨干网的Cascade Mask R-CNN方法的准确率为79.7%,召回率为89.8%,f1得分为84.4%,取得了最好的效果。我们已经使TNCR开源,希望鼓励对表检测、分类和结构识别采用更深入的学习方法。数据集和经过训练的模型检查点在https://github.com/abdoelsayed2016/TNCR_Dataset. 摘要:We present TNCR, a new table dataset with varying image quality collected from free websites. The TNCR dataset can be used for table detection in scanned document images and their classification into 5 different classes. TNCR contains 9428 high-quality labeled images. In this paper, we have implemented state-of-the-art deep learning-based methods for table detection to create several strong baselines. Cascade Mask R-CNN with ResNeXt-101-64x4d Backbone Network achieves the highest performance compared to other methods with a precision of 79.7%, recall of 89.8%, and f1 score of 84.4% on the TNCR dataset. We have made TNCR open source in the hope of encouraging more deep learning approaches to table detection, classification, and structure recognition. The dataset and trained model checkpoints are available at https://github.com/abdoelsayed2016/TNCR_Dataset.

【29】 Adaptive Sample Selection for Robust Learning under Label Noise 标题:标签噪声下鲁棒学习的自适应样本选择

作者:Deep Patel,P. S. Sastry 机构:Department of Electrical Engineering, Indian Institute of Science, Bangalore, Karnataka 备注:Preprint. Under review 链接:https://arxiv.org/abs/2106.15292 摘要:深度神经网络(DNNs)已被证明在有噪声标记的数据存在时容易被记忆或过度拟合。针对这种噪声数据下的鲁棒学习问题,提出了几种算法。一个突出的算法类依赖于样本选择策略,课程学习的动机。例如,许多算法使用“小损失技巧”,其中选择损失值低于某个阈值的部分样本进行训练。这些算法对这些阈值非常敏感,很难确定或学习这些阈值。通常,这些算法还需要标签噪声率等信息,而这些信息在实际中通常是不可用的。在本文中,我们提出了一个数据相关的自适应样本选择策略,它只依赖于给定小批量的批次统计信息来提供对标签噪声的鲁棒性。该算法不需要任何额外的样本选择超参数,不需要任何噪声率信息,也不需要访问带有干净标签的单独数据。我们在基准测试数据集上验证了算法的有效性。 摘要:Deep Neural Networks (DNNs) have been shown to be susceptible to memorization or overfitting in the presence of noisily labelled data. For the problem of robust learning under such noisy data, several algorithms have been proposed. A prominent class of algorithms rely on sample selection strategies, motivated by curriculum learning. For example, many algorithms use the `small loss trick' wherein a fraction of samples with loss values below a certain threshold are selected for training. These algorithms are sensitive to such thresholds, and it is difficult to fix or learn these thresholds. Often, these algorithms also require information such as label noise rates which are typically unavailable in practice. In this paper, we propose a data-dependent, adaptive sample selection strategy that relies only on batch statistics of a given mini-batch to provide robustness against label noise. The algorithm does not have any additional hyperparameters for sample selection, does not need any information on noise rates, and does not need access to separate data with clean labels. We empirically demonstrate the effectiveness of our algorithm on benchmark datasets.
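
论文的自适应选样只依赖当前小批量内的统计量,且不引入额外的选样超参数;其确切准则以原文为准。下面给出一个"按批内损失统计量筛选样本"的通用示意(阈值取批内均值仅是本示意的假设,并非论文的具体规则):

```python
import torch

def select_by_batch_statistics(losses: torch.Tensor) -> torch.Tensor:
    """losses: 小批量中逐样本的损失 (N,); 返回被选中参与反向传播的样本掩码."""
    threshold = losses.mean()          # 仅依赖本批统计量; 论文中的阈值定义以原文为准
    return losses <= threshold

# 用法示意: 损失需逐样本计算 (reduction='none'), 再对被选中样本求平均后反传
# per_sample = F.cross_entropy(logits, labels, reduction='none')
# mask = select_by_batch_statistics(per_sample.detach())
# loss = per_sample[mask].mean()
```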

【30】 Cascaded Diffusion Models for High Fidelity Image Generation 标题:用于高保真图像生成的级联扩散模型

作者:Jonathan Ho,Chitwan Saharia,William Chan,David J. Fleet,Mohammad Norouzi,Tim Salimans 机构:Google Research 链接:https://arxiv.org/abs/2106.15282 摘要:我们证明了级联扩散模型能够在类条件ImageNet生成挑战上生成高保真图像,而不需要任何辅助图像分类器的帮助来提高样本质量。级联扩散模型包括多个扩散模型的管道,这些扩散模型生成分辨率不断提高的图像,首先是最低分辨率的标准扩散模型,然后是一个或多个超分辨率扩散模型,这些模型依次对图像进行上采样并添加更高分辨率的细节。我们发现级联管道的样本质量主要依赖于条件增强,我们提出的方法是将低分辨率条件输入数据增强到超分辨率模型中。我们的实验表明,条件增强可以防止级联模型采样过程中的复合误差,帮助我们训练级联管道,在64x64、128x128和256x256分辨率下的FID分数分别达到1.48、3.52和4.88,优于BigGAN-deep。 摘要:We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation challenge, without any assistance from auxiliary image classifiers to boost sample quality. A cascaded diffusion model comprises a pipeline of multiple diffusion models that generate images of increasing resolution, beginning with a standard diffusion model at the lowest resolution, followed by one or more super-resolution diffusion models that successively upsample the image and add higher resolution details. We find that the sample quality of a cascading pipeline relies crucially on conditioning augmentation, our proposed method of data augmentation of the lower resolution conditioning inputs to the super-resolution models. Our experiments show that conditioning augmentation prevents compounding error during sampling in a cascaded model, helping us to train cascading pipelines achieving FID scores of 1.48 at 64x64, 3.52 at 128x128 and 4.88 at 256x256 resolutions, outperforming BigGAN-deep.
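
摘要中的"条件增强"指对送入超分辨率扩散模型的低分辨率条件输入做数据增强,以缓解级联采样时的误差累积。下面是一个高斯加噪形式的简化示意(论文中还讨论了其他增强形式;此处的插值方式与噪声幅度均为假设):

```python
import torch
import torch.nn.functional as F

def augment_low_res_condition(x_lr, out_size, noise_std=0.1):
    """x_lr: (N,C,h,w) 低分辨率条件图像; 上采样到目标尺寸后加入高斯噪声作为条件增强."""
    x_up = F.interpolate(x_lr, size=out_size, mode='bilinear', align_corners=False)
    return x_up + noise_std * torch.randn_like(x_up)

# 训练超分辨率扩散模型时, 条件输入使用增强后的图像, 而非干净的上采样图像
# x_cond = augment_low_res_condition(x_lr, out_size=(256, 256))
```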

【31】 Open-Set Representation Learning through Combinatorial Embedding 标题:基于组合嵌入的开集表示学习

作者:Geeho Kim,Bohyung Han 机构:Computer Vision Lab. & ASRI, Seoul National University 备注:12 pages, 4 figures 链接:https://arxiv.org/abs/2106.15278 摘要:视觉识别任务通常仅限于处理一小部分类,因为其余类的标签不可用。我们感兴趣的是通过基于标记类和未标记类的实例的表示学习来识别数据集中的新概念,并将识别范围扩展到已知类和新类。为了解决这个具有挑战性的任务,我们提出了一种组合学习方法,该方法利用多个有监督的元分类器在异构标签空间上给出的组合知识,自然地将示例聚类到不可见的类中。我们还引入了一种度量学习策略来估计成对伪标签,以改进未标记示例的表示,有效地保留了已知类和新类之间的语义关系。该算法通过联合优化来发现新的概念,增强未知类的可辨性,同时学习可归纳为新类的已知类的表示。我们的大量实验表明,该方法在多图像检索和新的类发现基准测试中取得了显著的性能改进。 摘要:Visual recognition tasks are often limited to dealing with a small subset of classes simply because the labels for the remaining classes are unavailable. We are interested in identifying novel concepts in a dataset through representation learning based on the examples in both labeled and unlabeled classes, and extending the horizon of recognition to both known and novel classes. To address this challenging task, we propose a combinatorial learning approach, which naturally clusters the examples in unseen classes using the compositional knowledge given by multiple supervised meta-classifiers on heterogeneous label spaces. We also introduce a metric learning strategy to estimate pairwise pseudo-labels for improving representations of unlabeled examples, which preserves semantic relations across known and novel classes effectively. The proposed algorithm discovers novel concepts via a joint optimization of enhancing the discrimitiveness of unseen classes as well as learning the representations of known classes generalizable to novel ones. Our extensive experiments demonstrate remarkable performance gains by the proposed approach in multiple image retrieval and novel class discovery benchmarks.

【32】 Learning Control Policies for Imitating Human Gaits 标题:模仿人体步态的学习控制策略

作者:Utkarsh A. Mishra 机构: IjspeertDEPARTMENT OF MECHANICAL AND INDUSTRIAL ENGINEERINGINDIAN INSTITUTE OF TECHNOLOGY ROORKEEROORKEE - 2 47667 (INDIA)APRIL 备注:47 pages, 17 figures, Bachelor of Technology Final Year Project Report 链接:https://arxiv.org/abs/2106.15273 摘要:本报告介绍了一个旨在学习模仿人类步态的框架。人类以最有效的方式表现出像行走、跑步和跳跃这样的动作,这是这个项目的动力来源。骨骼和肌肉骨骼人体模型被认为是在矢状面运动,并从两者的结果进行了详尽的比较。骨骼模型是由运动驱动的,而肌肉骨骼模型是通过肌腱驱动的。无模型强化学习算法用于优化逆动力学控制行为,以满足模仿参考运动的目标,以及最小化电机消耗的功率和肌肉消耗的代谢能量的次要目标。一方面,电机驱动模型的控制行为是通过比例微分控制器将目标关节角转化为关节力矩。另一方面,肌腱驱动模型的控制作用是将肌肉的激励隐式地转化为肌肉的激活,再转化为施加在关节上力矩的肌力。研究发现,肌腱驱动模型比运动驱动模型具有更大的优越性,因为它们由于肌肉激活动力学而具有固有的光滑性,并且不需要任何外部调节器。最后,讨论了一种在框架中获得重要决策变量最优配置的策略。所有的结果和分析都以说明性、定性和定量的方式呈现。附件中提供了支持视频链接。 摘要:The work presented in this report introduces a framework aimed towards learning to imitate human gaits. Humans exhibit movements like walking, running, and jumping in the most efficient manner, which served as the source of motivation for this project. Skeletal and Musculoskeletal human models were considered for motions in the sagittal plane, and results from both were compared exhaustively. While skeletal models are driven with motor actuation, musculoskeletal models perform through muscle-tendon actuation. Model-free reinforcement learning algorithms were used to optimize inverse dynamics control actions to satisfy the objective of imitating a reference motion along with secondary objectives of minimizing effort in terms of power spent by motors and metabolic energy consumed by the muscles. On the one hand, the control actions for the motor actuated model is the target joint angles converted into joint torques through a Proportional-Differential controller. While on the other hand, the control actions for the muscle-tendon actuated model is the muscle excitations converted implicitly to muscle activations and then to muscle forces which apply moments on joints. Muscle-tendon actuated models were found to have superiority over motor actuation as they are inherently smooth due to muscle activation dynamics and don't need any external regularizers. Finally, a strategy that was used to obtain an optimal configuration of the significant decision variables in the framework was discussed. All the results and analysis are presented in an illustrative, qualitative, and quantitative manner. Supporting video links are provided in the Appendix.
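The report converts the policy's target joint angles into joint torques through a Proportional-Differential controller for the motor-actuated model. Below is a minimal sketch of such a PD law; the gain values and the 3-joint toy configuration are illustrative assumptions, not values from the report.

```python
import numpy as np

def pd_torque(q_target, q, q_dot, kp=300.0, kd=30.0):
    """Proportional-Differential control: torque that drives the joint angles q
    toward q_target while damping the joint velocities q_dot.
    Gains kp, kd are illustrative, not values from the report."""
    return kp * (q_target - q) - kd * q_dot

# toy usage with a 3-joint planar leg
q        = np.array([0.10, -0.40, 0.20])   # current joint angles (rad)
q_dot    = np.array([0.50,  0.00, -0.30])  # current joint velocities (rad/s)
q_target = np.array([0.00, -0.60, 0.35])   # angles proposed by the learned policy
print(pd_torque(q_target, q, q_dot))
```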

【33】 Unsupervised Technique To Conversational Machine Reading 标题:无监督技术在对话式机器阅读中的应用

作者:Peter Ochieng,Dennis Mugambi 机构:Department of Computing and Informatics, Taita Taveta University, P.O. Box , – , Voi,Kenya., Dennis Kaburu, Jomo Kenyatta University of Agriculture and Technology, P.O. Box , CITY SQUARE,Nairobi ,Kenya. 链接:https://arxiv.org/abs/2106.15247 摘要:会话式机器阅读(CMR)工具近年来发展迅速。现有的工具依赖于有监督学习技术,需要有标记的数据集进行训练。监督技术需要为每个新的规则文本创建一个手动标记的数据集。这是乏味和容易出错的。本文介绍并论证了无监督学习技术在CMR开发中的应用。具体来说,我们演示了如何无监督学习可以用于规则提取和蕴涵模块的CMR。与目前最好的CMR工具相比,我们开发的框架报告微观平均精度提高了3.3%,宏观平均精度提高了1.4%。 摘要:Conversational machine reading (CMR) tools have seen a rapid progress in the recent past. The current existing tools rely on the supervised learning technique which require labeled dataset for their training. The supervised technique necessitates that for every new rule text, a manually labeled dataset must be created. This is tedious and error prone. This paper introduces and demonstrates how unsupervised learning technique can be applied in the development of CMR. Specifically, we demonstrate how unsupervised learning can be used in rule extraction and entailment modules of CMR. Compared to the current best CMR tool, our developed framework reports 3.3% improvement in micro averaged accuracy and 1.4 % improvement in macro averaged accuracy.

【34】 Exploring the Efficacy of Automatically Generated Counterfactuals for Sentiment Analysis 标题:自动生成的反事实在情感分析中的有效性探讨

作者:Yang Linyi,Li Jiazheng,Cunningham Pádraig,Zhang Yue,Smyth Barry,Dong Ruihai 机构: University College Dublin 2 School of Computer Science, University College Dublin 3 School of Engineering, Westlake University 4 Institute of Advanced Technology, Westlake Institute for Advanced Study{linyi 备注:ACL-21, Main Conference, Long Paper 链接:https://arxiv.org/abs/2106.15231 摘要:近年来,虽然最先进的NLP模型在许多任务中都取得了优异的性能,但人们对其鲁棒性以及对训练和测试数据中可能存在的系统偏差的潜在敏感性提出了重要的问题。当在现场面对分布外的数据时,这些问题表现为性能问题。最近的一个解决方案是使用反事实增强的数据集,以减少对原始数据中可能存在的虚假模式的依赖。生成高质量的增强数据可能既昂贵又耗时,因为它通常需要人工反馈和众包工作。在这项工作中,我们提出了一个替代方案,通过描述和评估一种自动生成反事实数据的方法,用于数据扩充和解释。通过对多个不同数据集的综合评估,并使用各种最先进的基准,说明了我们的方法如何在模型性能方面取得显著改进,这一改进与基于原始数据的模型训练相比,甚至与利用人工生成的增强数据训练的模型相比。 摘要:While state-of-the-art NLP models have been achieving the excellent performance of a wide range of tasks in recent years, important questions are being raised about their robustness and their underlying sensitivity to systematic biases that may exist in their training and test data. Such issues come to be manifest in performance problems when faced with out-of-distribution data in the field. One recent solution has been to use counterfactually augmented datasets in order to reduce any reliance on spurious patterns that may exist in the original data. Producing high-quality augmented data can be costly and time-consuming as it usually needs to involve human feedback and crowdsourcing efforts. In this work, we propose an alternative by describing and evaluating an approach to automatically generating counterfactual data for data augmentation and explanation. A comprehensive evaluation on several different datasets and using a variety of state-of-the-art benchmarks demonstrate how our approach can achieve significant improvements in model performance when compared to models training on the original data and even when compared to models trained with the benefit of human-generated augmented data.

【35】 Leveraging Static Models for Link Prediction in Temporal Knowledge Graphs 标题:利用静态模型进行时态知识图中的链接预测

作者:Wessel Radstok,Mel Chekol 机构:Utrecht University 链接:https://arxiv.org/abs/2106.15223 摘要:在知识图嵌入(KGE)中包含事实的时间范围为改进结果嵌入提供了重要的机会,从而提高了下游应用程序的性能。然而,很少有研究致力于这一领域,与没有时间范围的训练模型(静态模型)相比,许多已开展的研究报告只略微改善了结果。此外,他们没有利用静态模型的现有工作,而是引入了特定于时态知识图的新模型。我们提出了一种新的观点,通过集中精力处理数据来利用现有静态嵌入模型的能力。我们的方法SpliMe从信号处理领域和早期的图形嵌入工作中得到了启发。我们证明了SpliMe与时态KGE的最新技术相竞争或优于后者。此外,我们揭示了当前用于评估时态图上静态模型性能的过程中存在的问题,并介绍了两种方法来抵消这些问题。 摘要:The inclusion of temporal scopes of facts in knowledge graph embedding (KGE) presents significant opportunities for improving the resulting embeddings, and consequently for increased performance in downstream applications. Yet, little research effort has focussed on this area and much of the carried out research reports only marginally improved results compared to models trained without temporal scopes (static models). Furthermore, rather than leveraging existing work on static models, they introduce new models specific to temporal knowledge graphs. We propose a novel perspective that takes advantage of the power of existing static embedding models by focussing effort on manipulating the data instead. Our method, SpliMe, draws inspiration from the field of signal processing and early work in graph embedding. We show that SpliMe competes with or outperforms the current state of the art in temporal KGE. Additionally, we uncover issues with the procedure currently used to assess the performance of static models on temporal graphs and introduce two ways to counteract them.
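The abstract's key idea is to transform the temporal data so that unmodified static KGE models can be applied. The sketch below shows one transformation in that spirit, folding a discretized timestamp into the predicate; it is an illustrative stand-in, not necessarily SpliMe's exact split/merge operators.

```python
def fold_time_into_predicate(quadruples, num_bins, t_min, t_max):
    """Turn (subject, predicate, object, timestamp) quadruples into static
    (subject, predicate@bin, object) triples by folding a discretized timestamp
    into the predicate, so an off-the-shelf static KGE model can be trained.
    This is an illustrative transformation in the spirit of "manipulating the
    data instead of the model"; it is not claimed to be SpliMe's exact operator."""
    width = (t_max - t_min) / num_bins
    triples = []
    for s, p, o, t in quadruples:
        b = min(int((t - t_min) / width), num_bins - 1)
        triples.append((s, f"{p}@bin{b}", o))
    return triples

quads = [("obama", "presidentOf", "usa", 2010),
         ("obama", "presidentOf", "usa", 2015),
         ("trump", "presidentOf", "usa", 2018)]
print(fold_time_into_predicate(quads, num_bins=2, t_min=2009, t_max=2021))
```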

【36】 Fact Check: Analyzing Financial Events from Multilingual News Sources 标题:事实核查:从多语种新闻来源分析金融事件

作者:Yang Linyi,Ng Tin Lok James,Smyth Barry,Dong Ruihai 机构:University College Dublin;Trinity College Dublin 备注:Demo 链接:https://arxiv.org/abs/2106.15221 摘要:近年来,金融新闻数据的规模和复杂性急剧增加,这使得投资分析师提取有价值的见解并进行分析变得越来越具有挑战性。我们提出了FactCheck in finance,这是一个基于web的新闻聚合器,具有深度学习模型,可以为分析人员提供多语种新闻来源的重要金融事件的整体视图,并使用无监督聚类方法提取事件。系统提供了一个web界面,使用基于Transformer的事实检查器检查新闻文章的可信度。事实检查器的性能是用一个与并购(M&A)事件相关的数据集来评估的,并显示出优于几个强基线。 摘要:The explosion in the sheer magnitude and complexity of financial news data in recent years makes it increasingly challenging for investment analysts to extract valuable insights and perform analysis. We propose FactCheck in finance, a web-based news aggregator with deep learning models, to provide analysts with a holistic view of important financial events from multilingual news sources and extract events using an unsupervised clustering method. A web interface is provided to examine the credibility of news articles using a transformer-based fact-checker. The performance of the fact checker is evaluated using a dataset related to merger and acquisition (M&A) events and is shown to outperform several strong baselines.

【37】 Rethinking the Evaluation of Neural Machine Translation 标题:对神经机器翻译评价的再思考

作者:Jianhao Yan,Chenming Wu,Fandong Meng,Jie Zhou 机构:WeChat AI, Tencent, China 备注:Submitted to NeurIPS 2021 链接:https://arxiv.org/abs/2106.15217 摘要:神经机器翻译系统的评估通常建立在特定解码方法(例如,波束搜索)的生成翻译的基础上,其评估指标超过生成翻译(例如,BLEU)。然而,该评估框架存在启发式搜索算法带来的高搜索错误,并且受到对一个最佳候选进行评估的性质的限制。本文提出了一种新的评估协议,不仅避免了搜索错误的影响,而且从模型排序的角度提供了系统级的评估。特别地,我们的方法是基于我们新提出的精确top-$k$解码而不是波束搜索。我们的方法评估模型错误的距离候选空间评分的参考和模型分别。在WMT'14英德版上的大量实验表明,糟糕的排名能力与众所周知的波束搜索诅咒有关,最先进的Transformer模型面临着严重的排名错误。通过评估各种模型架构和技术,我们提供了几个有趣的发现。最后,为了在与原始波束搜索相同的时间开销下有效地逼近精确搜索算法,我们提出了一种最小堆增强波束搜索算法。 摘要:The evaluation of neural machine translation systems is usually built upon generated translation of a certain decoding method (e.g., beam search) with evaluation metrics over the generated translation (e.g., BLEU). However, this evaluation framework suffers from high search errors brought by heuristic search algorithms and is limited by its nature of evaluation over one best candidate. In this paper, we propose a novel evaluation protocol, which not only avoids the effect of search errors but provides a system-level evaluation in the perspective of model ranking. In particular, our method is based on our newly proposed exact top-$k$ decoding instead of beam search. Our approach evaluates model errors by the distance between the candidate spaces scored by the references and the model respectively. Extensive experiments on WMT'14 English-German demonstrate that bad ranking ability is connected to the well-known beam search curse, and state-of-the-art Transformer models are facing serious ranking errors. By evaluating various model architectures and techniques, we provide several interesting findings. Finally, to effectively approximate the exact search algorithm with same time cost as original beam search, we present a minimum heap augmented beam search algorithm.
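To illustrate the exact top-k decoding on which the proposed evaluation protocol is built, here is a toy best-first search that returns the k globally highest-scoring complete sequences under a model. The toy vocabulary and scoring function are assumptions, and the paper's efficient minimum-heap-augmented beam search is not reproduced.

```python
import heapq
import numpy as np

def exact_top_k(log_prob_fn, vocab, eos, k, max_len):
    """Best-first (Dijkstra-style) search returning the k highest-scoring
    complete sequences under the model, i.e. exact top-k decoding for short
    sequences. Because log-probs are non-positive, the first k completed
    sequences popped from the heap are globally optimal."""
    heap = [(0.0, ())]            # entries: (negative log-prob, token sequence)
    finished = []
    while heap and len(finished) < k:
        neg_lp, seq = heapq.heappop(heap)
        if (seq and seq[-1] == eos) or len(seq) == max_len:
            finished.append((-neg_lp, seq))
            continue
        for tok, lp in zip(vocab, log_prob_fn(seq)):   # expand with next tokens
            heapq.heappush(heap, (neg_lp - lp, seq + (tok,)))
    return finished

# toy model: fixed next-token distribution over a 3-word vocabulary
vocab = ["a", "b", "</s>"]
def toy_log_probs(prefix):
    return np.log([0.5, 0.3, 0.2]) if len(prefix) < 2 else np.log([0.1, 0.1, 0.8])

for score, seq in exact_top_k(toy_log_probs, vocab, eos="</s>", k=3, max_len=4):
    print(round(score, 3), seq)
```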

【38】 Counterfactual Explanations for Arbitrary Regression Models 标题:任意回归模型的反事实解释

作者:Thomas Spooner,Danial Dervovic,Jason Long,Jon Shepard,Jiahao Chen,Daniele Magazzeni 机构:J. P. Morgan AI Research 备注:20 pages, 5 figures, 3 tables 链接:https://arxiv.org/abs/2106.15212 摘要:提出了一种新的基于贝叶斯优化的反事实解释(CFE)方法,该方法适用于分类模型和回归模型。该方法是一种全局收敛的搜索算法,支持任意回归模型以及特征稀疏性、可操作追索等约束条件,并能在从先前查询中学习的同时并行地回答多个反事实问题。我们在一个严格的数学框架下,利用可微势来建立回归模型的CFE搜索,解决了基于阈值的目标的鲁棒性问题。我们证明了在这个框架下,(a)验证反事实的存在性是NP完全的;并且(b)利用这种势寻找实例是CLS完全的。我们描述了一个统一的CFE算法,它使用一个专门的采集函数,将期望改进与具有理想性质的指数多项式(EP)函数族组合起来。我们对真实世界基准域的评估显示了高的样本效率和精度。 摘要:We present a new method for counterfactual explanations (CFEs) based on Bayesian optimisation that applies to both classification and regression models. Our method is a globally convergent search algorithm with support for arbitrary regression models and constraints like feature sparsity and actionable recourse, and furthermore can answer multiple counterfactual questions in parallel while learning from previous queries. We formulate CFE search for regression models in a rigorous mathematical framework using differentiable potentials, which resolves robustness issues in threshold-based objectives. We prove that in this framework, (a) verifying the existence of counterfactuals is NP-complete; and (b) that finding instances using such potentials is CLS-complete. We describe a unified algorithm for CFEs using a specialised acquisition function that composes both expected improvement and an exponential-polynomial (EP) family with desirable properties. Our evaluation on real-world benchmark domains demonstrate high sample-efficiency and precision.

【39】 Inconspicuous Adversarial Patches for Fooling Image Recognition Systems on Mobile Devices 标题:用于欺骗移动设备上的图像识别系统的不起眼的对抗性补丁

作者:Tao Bai,Jinqi Luo,Jun Zhao 机构: Zhao are with the School of Computer Science andEngineering, Nanyang Technological University 备注:arXiv admin note: substantial text overlap with arXiv:2009.09774 链接:https://arxiv.org/abs/2106.15202 摘要:基于深度学习的图像识别系统在当今世界的移动设备上得到了广泛的应用。然而,在最近的研究中,深度学习模式很容易受到对抗性例子的影响。对抗性例子的一个变体,叫做对抗性补丁,由于其强大的攻击能力而引起了研究人员的注意。尽管敌对补丁的攻击成功率很高,但由于补丁与原始图像的视觉不一致性,很容易被检测到。此外,文献中的对抗性补丁生成通常需要大量的数据,计算量大且耗时。为了应对这些挑战,我们提出了一种方法来生成一个单一的图像不显眼的敌对补丁。在该方法中,首先根据受害者模型的感知敏感度来确定斑块的位置,然后利用多个尺度生成器和鉴别器从粗到精地生成对抗性斑块。鼓励补丁与对抗训练的背景图像保持一致,同时保持强大的攻击能力。通过对不同结构和训练方法的模型进行大量实验,证明了该方法在白盒环境下具有很强的攻击能力,在黑盒环境下具有很好的可移植性。与其他对抗性补丁相比,我们的对抗性补丁具有可忽略的风险,可以避免人类的观察,这得到了显著性图和用户评估结果的支持。最后,我们证明了我们的对抗补丁可以应用于物理世界。 摘要:Deep learning based image recognition systems have been widely deployed on mobile devices in today's world. In recent studies, however, deep learning models are shown vulnerable to adversarial examples. One variant of adversarial examples, called adversarial patch, draws researchers' attention due to its strong attack abilities. Though adversarial patches achieve high attack success rates, they are easily being detected because of the visual inconsistency between the patches and the original images. Besides, it usually requires a large amount of data for adversarial patch generation in the literature, which is computationally expensive and time-consuming. To tackle these challenges, we propose an approach to generate inconspicuous adversarial patches with one single image. In our approach, we first decide the patch locations basing on the perceptual sensitivity of victim models, then produce adversarial patches in a coarse-to-fine way by utilizing multiple-scale generators and discriminators. The patches are encouraged to be consistent with the background images with adversarial training while preserving strong attack abilities. Our approach shows the strong attack abilities in white-box settings and the excellent transferability in black-box settings through extensive experiments on various models with different architectures and training methods. Compared to other adversarial patches, our adversarial patches hold the most negligible risks to be detected and can evade human observations, which is supported by the illustrations of saliency maps and results of user evaluations. Lastly, we show that our adversarial patches can be applied in the physical world.

【40】 Action Set Based Policy Optimization for Safe Power Grid Management 标题:基于动作集的安全电网管理策略优化

作者:Bo Zhou,Hongsheng Zeng,Yuecheng Liu,Kejiao Li,Fan Wang,Hao Tian 机构:Baidu Inc., China 备注:accepted by ECML PKDD 2021; 1st place in NeurIPS2020 RL challenge: power grid management 链接:https://arxiv.org/abs/2106.15200 摘要:由于电力消费波动、可再生能源供电不稳定以及人为和自然灾害等不可预知的事故,维持现代电网的稳定变得越来越困难。由于电网运行必须考虑其对未来稳定性的影响,强化学习(RL)被用来提供电网管理中的顺序决策。然而,现有的方法没有考虑环境约束。因此,学习到的政策有可能在紧急情况下选择违反约束的行动,这将使电力线路过载问题升级并导致大规模停电。在这项工作中,我们提出了一种新的方法来解决这个问题,它建立在基于搜索的规划算法之上。在规划阶段,搜索空间仅限于策略生成的动作集。所选动作严格遵循约束条件,通过系统提供的仿真功能测试其结果。在学习阶段,为了解决梯度不能传播到策略的问题,我们引入进化策略和黑箱策略优化来直接改进策略,使长期收益最大化。在NeurIPS 2020 Learning to Run Power Network(L2RPN)竞赛中,我们的解决方案安全地管理了电网,并在这两个方面都排名第一。 摘要:Maintaining the stability of the modern power grid is becoming increasingly difficult due to fluctuating power consumption, unstable power supply coming from renewable energies, and unpredictable accidents such as man-made and natural disasters. As the operation on the power grid must consider its impact on future stability, reinforcement learning (RL) has been employed to provide sequential decision-making in power grid management. However, existing methods have not considered the environmental constraints. As a result, the learned policy has risk of selecting actions that violate the constraints in emergencies, which will escalate the issue of overloaded power lines and lead to large-scale blackouts. In this work, we propose a novel method for this problem, which builds on top of the search-based planning algorithm. At the planning stage, the search space is limited to the action set produced by the policy. The selected action strictly follows the constraints by testing its outcome with the simulation function provided by the system. At the learning stage, to address the problem that gradients cannot be propagated to the policy, we introduce Evolutionary Strategies (ES) with black-box policy optimization to improve the policy directly, maximizing the returns of the long run. In NeurIPS 2020 Learning to Run Power Network (L2RPN) competition, our solution safely managed the power grid and ranked first in both tracks.
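A minimal sketch of the planning stage described above: the search is restricted to the action set proposed by the policy, each candidate is checked with the system's simulation function, and the best feasible candidate is executed. All function names and the toy power-grid actions are illustrative assumptions; the ES-based policy-learning stage is omitted.

```python
def plan_with_action_set(state, policy_top_k, simulate, fallback_action):
    """Planning stage sketch: restrict the search space to the action set
    produced by the policy, simulate each candidate, discard candidates that
    violate constraints (e.g. overloaded lines), and return the feasible
    candidate with the best simulated return."""
    best_action, best_return = fallback_action, float("-inf")
    for action in policy_top_k(state):
        ok, simulated_return = simulate(state, action)
        if ok and simulated_return > best_return:
            best_action, best_return = action, simulated_return
    return best_action

# toy usage with stand-in components
candidates = {"s": ["do_nothing", "reconfigure_line_3", "redispatch_gen_1"]}
def toy_policy_top_k(state): return candidates[state]
def toy_simulate(state, action):            # -> (constraints_satisfied, return)
    table = {"do_nothing": (True, 0.0),
             "reconfigure_line_3": (False, 5.0),   # would violate a constraint
             "redispatch_gen_1": (True, 3.0)}
    return table[action]
print(plan_with_action_set("s", toy_policy_top_k, toy_simulate, "do_nothing"))
```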

【41】 Enhancing the Analysis of Software Failures in Cloud Computing Systems with Deep Learning 标题:利用深度学习增强云计算系统中的软件故障分析

作者:Domenico Cotroneo,Luigi De Simone,Pietro Liguori,Roberto Natella 机构:Universita degli Studi di Napoli Federico II, Naples, Italy 备注:Paper accepted to the Journal of Systems and Software on June 28th, 2021 链接:https://arxiv.org/abs/2106.15182 摘要:由于云计算系统日益复杂,故障数据量大、噪声大,识别云计算系统的故障模式是一项既困难又耗时的任务。本文提出了一种分析云系统故障数据的新方法,以减轻分析人员对特征工程数据进行手工微调的负担。该方法利用了一系列基于深度学习的无监督聚类算法Deep Embedded Clustering(DEC),它使用一个自动编码器来优化数据维数和簇间方差。我们将该方法应用于OpenStack云计算平台的环境中,包括原始故障数据和结合异常检测预处理算法。结果表明,该方法在聚类纯度方面的性能与人工微调聚类相当,甚至在某些情况下优于人工微调聚类,从而避免了对深层域知识的需要,减少了分析的工作量。在所有情况下,该方法提供了更好的性能比无监督聚类时,没有特征工程应用于数据。此外,该方法得到的失效模式分布更接近于失效模式的实际频率。 摘要:Identifying the failure modes of cloud computing systems is a difficult and time-consuming task, due to the growing complexity of such systems, and the large volume and noisiness of failure data. This paper presents a novel approach for analyzing failure data from cloud systems, in order to relieve human analysts from manually fine-tuning the data for feature engineering. The approach leverages Deep Embedded Clustering (DEC), a family of unsupervised clustering algorithms based on deep learning, which uses an autoencoder to optimize data dimensionality and inter-cluster variance. We applied the approach in the context of the OpenStack cloud computing platform, both on the raw failure data and in combination with an anomaly detection pre-processing algorithm. The results show that the performance of the proposed approach, in terms of purity of clusters, is comparable to, or in some cases even better than manually fine-tuned clustering, thus avoiding the need for deep domain knowledge and reducing the effort to perform the analysis. In all cases, the proposed approach provides better performance than unsupervised clustering when no feature engineering is applied to the data. Moreover, the distribution of failure modes from the proposed approach is closer to the actual frequency of the failure modes.
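The pipeline relies on Deep Embedded Clustering (DEC). Its distinctive step, sketched below in numpy, is the Student's-t soft cluster assignment over autoencoder latents and the sharpened target distribution whose KL divergence is minimized; the autoencoder itself and the anomaly-detection pre-processing are omitted, and the toy latents are made up.

```python
import numpy as np

def dec_soft_assignments(z, centroids, alpha=1.0):
    """DEC-style soft cluster assignment: Student's t similarity between each
    latent vector z_i (from the autoencoder) and each cluster centre."""
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def dec_target_distribution(q):
    """Sharpened target distribution used by DEC; minimizing KL(p || q)
    pulls points toward high-confidence clusters."""
    w = (q ** 2) / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

# toy usage: 5 latent vectors (e.g. autoencoder codes of failure reports), 2 clusters
rng = np.random.default_rng(1)
z = rng.normal(size=(5, 3))
centroids = rng.normal(size=(2, 3))
q = dec_soft_assignments(z, centroids)
p = dec_target_distribution(q)
print("KL(p||q) =", round(float((p * np.log(p / q)).sum()), 4))
```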

【42】 The Price of Selfishness: Conjunctive Query Entailment for ALCSelf is 2ExpTime-hard 标题:自私的代价:ALCSelf的合取查询蕴涵是2ExpTime-Hard

作者:Bartosz Bednarczyk,Sebastian Rudolph 机构:Computational Logic Group, Technische Universität Dresden, Germany, Institute of Computer Science, University of Wrocław, Poland 链接:https://arxiv.org/abs/2106.15150 摘要:在基于逻辑的知识表示中,问答问题实质上已经取代了单纯的可满足性检验,成为人们最关心的推理问题。对于基本描述逻辑ALC中的知识库,合取查询(CQ)应答的计算复杂度众所周知是ExpTime完全的,因此并不比可满足性更难。当逻辑被某些特性(如计数或角色层次结构)扩展时,这一点不会改变,而添加其他特性(逆角色、名义,或传递性与角色层次结构的组合)则会使CQ应答的难度呈指数级增加。我们通过展示一个令人惊讶的事实对这一系列结果做出贡献:即使仅用Self算子扩展ALC(它在许多其他场景中被证明是无害的),也会将CQ蕴涵的复杂度提高到2ExpTime。与这类问题的常见做法一样,我们的证明建立了从运行于指数空间的交替图灵机的归约,但要使这一方法在这个特定的受限环境中奏效,还需要一些新颖的思想和编码技巧。 摘要:In logic-based knowledge representation, query answering has essentially replaced mere satisfiability checking as the inferencing problem of primary interest. For knowledge bases in the basic description logic ALC, the computational complexity of conjunctive query (CQ) answering is well known to be ExpTime-complete and hence not harder than satisfiability. This does not change when the logic is extended by certain features (such as counting or role hierarchies), whereas adding others (inverses, nominals or transitivity together with role-hierarchies) turns CQ answering exponentially harder. We contribute to this line of results by showing the surprising fact that even extending ALC by just the Self operator - which proved innocuous in many other contexts - increases the complexity of CQ entailment to 2ExpTime. As common for this type of problem, our proof establishes a reduction from alternating Turing machines running in exponential space, but several novel ideas and encoding tricks are required to make the approach work in that specific, restricted setting.

【43】 SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption 标题:SCARF:基于随机特征破坏的自监督对比学习

作者:Dara Bahri,Heinrich Jiang,Yi Tay,Donald Metzler 机构:Google Research 链接:https://arxiv.org/abs/2106.15147 摘要:自监督对比表征学习在视觉和自然语言领域取得了令人难以置信的成功,能够以数量级更少的标记数据实现最先进的性能。然而,这些方法是特定领域的,很少有人在实际的表格数据集上利用这种技术。我们提出了SCARF,一种简单、广泛适用的对比学习技术,其中视图是通过破坏一个随机特征子集形成的。当用于在OpenML-CC18基准的69个真实世界表格分类数据集上预训练深度神经网络时,SCARF不仅在全监督环境下提高了分类精度,而且在存在标签噪声以及只有一小部分训练数据有标签的半监督环境下也提高了分类精度。我们表明,SCARF补充了现有策略,并优于自动编码器等替代方法。我们进行了全面的消融实验,详细说明了一系列因素的重要性。 摘要:Self-supervised contrastive representation learning has proved incredibly successful in the vision and natural language domains, enabling state-of-the-art performance with orders of magnitude less labeled data. However, such methods are domain-specific and little has been done to leverage this technique on real-world tabular datasets. We propose SCARF, a simple, widely-applicable technique for contrastive learning, where views are formed by corrupting a random subset of features. When applied to pre-train deep neural networks on the 69 real-world, tabular classification datasets from the OpenML-CC18 benchmark, SCARF not only improves classification accuracy in the fully-supervised setting but does so also in the presence of label noise and in the semi-supervised setting where only a fraction of the available training data is labeled. We show that SCARF complements existing strategies and outperforms alternatives like autoencoders. We conduct comprehensive ablations, detailing the importance of a range of factors.
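The corruption used by SCARF is simple enough to sketch directly: replace a random subset of each sample's features with values drawn from the empirical marginal of that feature, then apply a contrastive (InfoNCE-style) loss between the clean and corrupted views. The 0.6 corruption rate, the linear stand-in encoder and the loss details below are illustrative assumptions.

```python
import numpy as np

def scarf_corrupt(x_batch, x_pool, corruption_rate=0.6, rng=None):
    """SCARF-style view: for each sample, replace a random subset of its
    features with the same feature taken from random rows of the training
    pool (i.e. sampled from that feature's empirical marginal)."""
    rng = rng or np.random.default_rng()
    b, f = x_batch.shape
    mask = rng.random((b, f)) < corruption_rate
    rows = rng.integers(0, len(x_pool), size=(b, f))
    donors = x_pool[rows, np.arange(f)]        # donors[i, j] = x_pool[rows[i, j], j]
    return np.where(mask, donors, x_batch)

def info_nce(z1, z2, temperature=0.5):
    """Contrastive loss treating (z1[i], z2[i]) as the positive pair and all
    other rows of z2 as negatives (cosine similarities)."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))

rng = np.random.default_rng(0)
x_pool = rng.normal(size=(1000, 8))            # unlabeled tabular training data
x = x_pool[:32]                                # one mini-batch
w = rng.normal(size=(8, 16))                   # stand-in for the encoder network
z_anchor, z_view = x @ w, scarf_corrupt(x, x_pool, rng=rng) @ w
print("InfoNCE loss:", round(float(info_nce(z_anchor, z_view)), 3))
```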

【44】 Do Not Deceive Your Employer with a Virtual Background: A Video Conferencing Manipulation-Detection System 标题:不要用虚拟背景欺骗你的雇主:视频会议操作检测系统

作者:Mauro Conti,Simone Milani,Ehsan Nowroozi,Gabriele Orazi 机构:Department of Mathematics, University of Padua, Italy; Department of Information Engineering, University of Padua, Italy 备注:6 pages 链接:https://arxiv.org/abs/2106.15130 摘要:最新一代的视频会议软件允许用户出于隐私考虑利用虚拟背景来隐藏个人环境,尤其是在与雇主的正式会议中。另一方面,用户也可能希望借助虚拟背景隐藏自己所处的位置,从而在会议中欺骗他人。在这种情况下,开发能够识别这种用于欺骗的虚拟背景的工具就显得十分重要。此外,由于恶意用户可以通过在视频上应用一系列对抗性编辑步骤来掩盖任何暴露的痕迹,此类检测器必须对不同类型的攻击具有鲁棒性。本文研究了构建一种高效工具的可行性,用于检测视频会议用户的背景是否真实。特别地,我们提供了第一个计算像素共现矩阵并利用它们在光谱和空间波段之间搜索不一致性的工具。我们的实验证实,交叉共现矩阵提高了检测器对不同类型攻击的鲁棒性。本工作在彩色SPAM特征方面的性能尤其值得注意。此外,检测器对几何变换、滤波、对比度增强和不同质量因子的JPEG压缩等后处理的鲁棒性也十分显著。 摘要:The last-generation video conferencing software allows users to utilize a virtual background to conceal their personal environment due to privacy concerns, especially in official meetings with other employers. On the other hand, users maybe want to fool people in the meeting by considering the virtual background to conceal where they are. In this case, developing tools to understand the virtual background utilize for fooling people in meeting plays an important role. Besides, such detectors must prove robust against different kinds of attacks since a malicious user can fool the detector by applying a set of adversarial editing steps on the video to conceal any revealing footprint. In this paper, we study the feasibility of an efficient tool to detect whether a videoconferencing user background is real. In particular, we provide the first tool which computes pixel co-occurrences matrices and uses them to search for inconsistencies among spectral and spatial bands. Our experiments confirm that cross co-occurrences matrices improve the robustness of the detector against different kinds of attacks. This work's performance is especially noteworthy with regard to color SPAM features. Moreover, the performance especially is significant with regard to robustness versus post-processing, like geometric transformations, filtering, contrast enhancement, and JPEG compression with different quality factors.
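A cross-band pixel co-occurrence matrix of the kind the detector builds its features on can be computed in a few lines; the 8-level quantization and the random toy frame below are illustrative assumptions, and the subsequent classifier is not shown.

```python
import numpy as np

def cross_band_cooccurrence(band_a, band_b, levels=8):
    """Cross co-occurrence matrix between two quantized image bands:
    entry (i, j) counts pixels whose value falls in bin i in band_a and
    bin j in band_b at the same location. The 8-level quantization is an
    illustrative choice."""
    qa = np.clip((band_a * levels).astype(int), 0, levels - 1)
    qb = np.clip((band_b * levels).astype(int), 0, levels - 1)
    co = np.zeros((levels, levels), dtype=np.int64)
    np.add.at(co, (qa.ravel(), qb.ravel()), 1)
    return co

# toy usage on a random RGB frame normalized to [0, 1]
rng = np.random.default_rng(0)
frame = rng.random((64, 64, 3))
co_rg = cross_band_cooccurrence(frame[..., 0], frame[..., 1])
print(co_rg.shape, co_rg.sum())   # (8, 8), 64*64 pixel pairs counted
```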

【45】 Regularized OFU: an Efficient UCB Estimator forNon-linear Contextual Bandit 标题:正则化OFU:一种有效的非线性上下文盗贼的UCB估计器

作者:Yichi Zhou,Shihong Song,Huishuai Zhang,Jun Zhu,Wei Chen,Tie-Yan Liu 机构:Tsinghua University 链接:https://arxiv.org/abs/2106.15128 摘要:探索与利用(EE)的平衡是上下文bandit中的一个基本问题。一个强有力的EE权衡原则是"面对不确定性时的乐观主义"(OFU),即agent根据奖励的置信上界(UCB)采取行动。对于线性/核上下文bandit,OFU已经达到了(接近)最优的遗憾界。然而,对于非线性的复杂任务,例如以深度神经网络作为奖励函数的上下文bandit,如何得到高效且有效的EE权衡方法目前尚不清楚。本文提出了一种新的OFU算法,称为正则化OFU(ROFU)。在ROFU中,我们用一个可微函数来度量奖励的不确定性,并通过求解一个正则化优化问题来计算置信上界。我们证明,对于多臂bandit、核上下文bandit和神经正切核bandit,ROFU在特定不确定性度量下达到(接近)最优的遗憾界,这从理论上证明了它在EE权衡上的有效性。重要的是,ROFU可以用基于梯度的优化器非常高效地实现,并且很容易扩展到神经正切核之外的一般深度神经网络模型,这与以往的OFU方法形成鲜明对比。实证评估表明,在各种设置下,ROFU在上下文bandit问题上都表现得非常好。 摘要:Balancing exploration and exploitation (EE) is a fundamental problem in contextual bandit. One powerful principle for EE trade-off is Optimism in Face of Uncertainty (OFU), in which the agent takes the action according to an upper confidence bound (UCB) of reward. OFU has achieved (near-)optimal regret bound for linear/kernel contextual bandits. However, it is in general unknown how to derive efficient and effective EE trade-off methods for non-linear complex tasks, such as contextual bandit with deep neural network as the reward function. In this paper, we propose a novel OFU algorithm named regularized OFU (ROFU). In ROFU, we measure the uncertainty of the reward by a differentiable function and compute the upper confidence bound by solving a regularized optimization problem. We prove that, for multi-armed bandit, kernel contextual bandit and neural tangent kernel bandit, ROFU achieves (near-)optimal regret bounds with certain uncertainty measure, which theoretically justifies its effectiveness on EE trade-off. Importantly, ROFU admits a very efficient implementation with gradient-based optimizer, which easily extends to general deep neural network models beyond neural tangent kernel, in sharp contrast with previous OFU methods. The empirical evaluation demonstrates that ROFU works extremely well for contextual bandits under various settings.
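The core computation, an upper confidence bound obtained by solving a regularized optimization problem with a gradient-based optimizer, can be sketched for a linear reward model as below. The plain L2 regularizer and the linear model are simplifying assumptions; ROFU's actual differentiable uncertainty measure and model classes differ.

```python
import numpy as np

def rofu_style_ucb(x, theta_hat, lam=1.0, lr=0.1, steps=50):
    """Illustrative UCB in the spirit of ROFU:
        UCB(x) = max_theta  theta @ x  -  lam * ||theta - theta_hat||^2,
    solved here by plain gradient ascent. The linear model and the L2 pull
    toward the fitted parameters theta_hat are stand-ins, not the paper's
    exact uncertainty measure."""
    theta = theta_hat.copy()
    for _ in range(steps):
        grad = x - 2.0 * lam * (theta - theta_hat)
        theta += lr * grad
    return float(theta @ x)

theta_hat = np.array([0.5, -0.2, 0.1])            # parameters fitted on past data
for arm in [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 1.0])]:
    print(arm, "UCB =", round(rofu_style_ucb(arm, theta_hat), 3))
```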

【46】 Neural Machine Translation for Low-Resource Languages: A Survey 标题:面向低资源语言的神经机器翻译研究综述

作者:Surangika Ranathunga,En-Shiun Annie Lee,Marjana Prifti Skenduli,Ravi Shekhar,Mehreen Alam,Rishemjit Kaur 机构: University of Moratuwa, University of Toronto, University of New York Tirana, Queen Mary University, National University of Computer and Emerging Sciences 备注:35 pages, 8 figures 链接:https://arxiv.org/abs/2106.15115 摘要:神经机器翻译(NMT)在不到十年的时间里经历了巨大的增长,已经进入了一个成熟阶段。虽然被认为是机器翻译中应用最广泛的解决方案,但由于大型并行语料库的不可用,它在低资源语言对上的性能仍然低于高资源语言对上的性能。因此,针对低资源语言对的NMT技术的实现成为近年来NMT研究领域的一个热点,并由此产生了大量的研究报道。本文对低资源语言NMT(LRL-NMT)的研究进展进行了详细的综述,并对最流行的解决方案进行了定量分析。基于我们在回顾以往工作的基础上得出的结论,本文提供了一套针对给定LRL数据环境选择NMT技术的指南。它还提出了LRL-NMT研究领域的整体观点,并提供了进一步加强LRL-NMT研究工作的建议清单。 摘要:Neural Machine Translation (NMT) has seen a tremendous spurt of growth in less than ten years, and has already entered a mature phase. While considered as the most widely used solution for Machine Translation, its performance on low-resource language pairs still remains sub-optimal compared to the high-resource counterparts, due to the unavailability of large parallel corpora. Therefore, the implementation of NMT techniques for low-resource language pairs has been receiving the spotlight in the recent NMT research arena, thus leading to a substantial amount of research reported on this topic. This paper presents a detailed survey of research advancements in low-resource language NMT (LRL-NMT), along with a quantitative analysis aimed at identifying the most popular solutions. Based on our findings from reviewing previous work, this survey paper provides a set of guidelines to select the possible NMT technique for a given LRL data setting. It also presents a holistic view of the LRL-NMT research landscape and provides a list of recommendations to further enhance the research efforts on LRL-NMT.

【47】 Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation 标题:不要从字面上理解:文本生成的编辑不变序列丢失

作者:Guangyi Liu,Zichao Yang,Tianhua Tao,Xiaodan Liang,Zhen Li,Bowen Zhou,Shuguang Cui,Zhiting Hu 机构:Chinese University of Hong Kong, Shenzhen, Carnegie Mellon University, Tsinghua University, Sun Yat-Sen University, JD AI Research, UC San Diego 备注:10 pages, 5 figures 链接:https://arxiv.org/abs/2106.15078 摘要:神经文本生成模型通常通过序列交叉熵损失最大化对数似然来训练,这鼓励了目标序列和生成序列之间的精确逐标记匹配。当目标序列不完美时,例如当目标序列被噪声破坏时,或者当只有弱序列监督可用时,这种训练目标是次优的。为了解决这一问题,我们提出了一种新的编辑不变序列丢失(EISL),它计算目标n-gram与生成序列中所有n-gram的匹配丢失。EISL从卷积网络(ConvNets)中获得灵感,卷积网络对图像具有平移不变性,因此对n-gram的平移具有鲁棒性,能够容忍对目标序列的编辑。此外,EISL的计算本质上是以目标n-gram为核的卷积运算,易于在现有库中实现。为了验证EISL的有效性,我们在三个任务上进行了实验:含噪声目标序列的机器翻译、无监督文本风格转换和非自回归机器翻译。实验结果表明,在这三个任务上,我们的方法都明显优于交叉熵损失法。 摘要:Neural text generation models are typically trained by maximizing log-likelihood with the sequence cross entropy loss, which encourages an exact token-by-token match between a target sequence with a generated sequence. Such training objective is sub-optimal when the target sequence not perfect, e.g., when the target sequence is corrupted with noises, or when only weak sequence supervision is available. To address this challenge, we propose a novel Edit-Invariant Sequence Loss (EISL), which computes the matching loss of a target n-gram with all n-grams in the generated sequence. EISL draws inspirations from convolutional networks (ConvNets) which are shift-invariant to images, hence is robust to the shift of n-grams to tolerate edits in the target sequences. Moreover, the computation of EISL is essentially a convolution operation with target n-grams as kernels, which is easy to implement with existing libraries. To demonstrate the effectiveness of EISL, we conduct experiments on three tasks: machine translation with noisy target sequences, unsupervised text style transfer, and non-autoregressive machine translation. Experimental results show our method significantly outperforms cross entropy loss on these three tasks.
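Since EISL is essentially a convolution with target n-grams as kernels over the generated token distributions, a simplified version is easy to write down. The sketch below scores every target n-gram at every generated position and averages the negative log matches; the paper's exact normalization and length handling are not reproduced.

```python
import numpy as np

def eisl_ngram_loss(probs, target, n=2):
    """Edit-invariant style n-gram matching loss (simplified sketch).
    probs:  (T, V) per-position token probabilities from the generator.
    target: list of target token ids.
    Each target n-gram is scored at every generated position (equivalent to
    convolving its one-hot kernel over the probability sequence), the matches
    are summed, and the negative log is averaged over target n-grams."""
    T, _ = probs.shape
    losses = []
    for start in range(len(target) - n + 1):
        gram = target[start:start + n]
        scores = [np.prod([probs[i + j, gram[j]] for j in range(n)])
                  for i in range(T - n + 1)]
        losses.append(-np.log(np.sum(scores) + 1e-12))
    return float(np.mean(losses))

rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 10))                 # generated sequence of length 6
probs = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)
print("EISL-style loss:", round(eisl_ngram_loss(probs, target=[3, 7, 7, 1], n=2), 3))
```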

【48】 K-ZSL: Resources for Knowledge-driven Zero-shot Learning 标题:K-ZSL:知识驱动的零距离学习资源

作者:Yuxia Geng,Jiaoyan Chen,Zhuo Chen,Jeff Z. Pan,Zonggang Yuan,Huajun Chen 机构:Zhejiang University, Hangzhou, China, University of Oxford, Oxford, United Kingdom 备注:Under Review 链接:https://arxiv.org/abs/2106.15047 摘要:外部知识(又称边信息)在Zero-Shot学习(ZSL)中起着至关重要的作用,ZSL旨在预测从未出现在训练数据中的未知类。文本、属性等几种外部知识已经得到了广泛的研究,但它们本身的局限性在于语义的不完全性。因此,最近的一些研究提出使用知识图(KG)来表示各种知识,因为KG具有很高的表达能力和兼容性。然而,ZSL社区仍然缺乏研究和比较基于KG的ZSL方法的标准基准。在本文中,我们提出了5个资源,用于Zero-Shot图像分类(ZS-IMGC)和Zero-ShotKG补全(ZS-KGC)中基于KG的研究。对于每个资源,我们都提供了一个基准测试及其KG,其语义范围从文本到属性,从关系知识到逻辑表达式。我们已经清楚地展示了资源是如何构造的,它们的统计数据和格式,以及如何在评估ZSL方法的性能和解释的案例中使用它们。我们的资源可在https://github.com/China-UK-ZSL/Resources_for_KZSL. 摘要:External knowledge (a.k.a side information) plays a critical role in zero-shot learning (ZSL) which aims to predict with unseen classes that have never appeared in training data. Several kinds of external knowledge such as text and attribute have been widely investigated, but they alone are limited with incomplete semantics. Therefore, some very recent studies propose to use Knowledge Graph (KG) due to its high expressivity and compatibility for representing kinds of knowledge. However, the ZSL community is still short of standard benchmarks for studying and comparing different KG-based ZSL methods. In this paper, we proposed 5 resources for KG-based research in zero-shot image classification (ZS-IMGC) and zero-shot KG completion (ZS-KGC). For each resource, we contributed a benchmark and its KG with semantics ranging from text to attributes, from relational knowledge to logical expressions. We have clearly presented how the resources are constructed, their statistics and formats, and how they can be utilized with cases in evaluating ZSL methods' performance and explanations. Our resources are available at https://github.com/China-UK-ZSL/Resources_for_KZSL.

【49】 EVPropNet: Detecting Drones By Finding Propellers For Mid-Air Landing And Following 标题:EVPropNet:通过寻找用于半空着陆和跟随的螺旋桨来探测无人机

作者:Nitin J. Sanket,Chahat Deep Singh,Chethan M. Parameshwara,Cornelia Fermüller,Guido C. H. E. de Croon,Yiannis Aloimonos 备注:11 pages, 10 figures, 6 tables. Accepted in Robotics: Science and Systems (RSS) 2021 链接:https://arxiv.org/abs/2106.15045 摘要:无人机的易获得性迅速增加,对公共安全和保密构成威胁。大多数商用或定制的无人机都是多旋翼的,由多个螺旋桨组成。由于这些螺旋桨高速旋转,它们通常是图像中移动最快的部分,经典相机在不产生严重运动模糊的情况下无法直接"看到"它们。我们利用了一类特别适合这种场景的传感器,称为事件相机,它具有高时间分辨率、低延迟和高动态范围。在本文中,我们建立了螺旋桨的几何模型,并用它来生成模拟事件,这些事件被用来训练一个叫做EVPropNet的深度神经网络,以便从事件相机的数据中检测螺旋桨。EVPropNet无需任何微调或再训练即可直接迁移到现实世界。我们介绍了该网络的两个应用:(a)跟踪并跟随一架无标记的无人机;(b)降落在一架近似悬停的无人机上。我们在许多使用不同螺旋桨形状和尺寸的真实实验中成功地评估和演示了所提出的方法。即使螺旋桨被遮挡60%,我们的网络仍能以85.1%的检测率检测到螺旋桨,并且可以在2W功率预算下以高达35Hz的频率运行。据我们所知,这是第一个基于深度学习的螺旋桨检测(进而检测无人机)解决方案。最后,我们的应用也分别在跟踪和着陆任务中显示出92%和90%的令人印象深刻的成功率。 摘要:The rapid rise of accessibility of unmanned aerial vehicles or drones pose a threat to general security and confidentiality. Most of the commercially available or custom-built drones are multi-rotors and are comprised of multiple propellers. Since these propellers rotate at a high-speed, they are generally the fastest moving parts of an image and cannot be directly "seen" by a classical camera without severe motion blur. We utilize a class of sensors that are particularly suitable for such scenarios called event cameras, which have a high temporal resolution, low-latency, and high dynamic range. In this paper, we model the geometry of a propeller and use it to generate simulated events which are used to train a deep neural network called EVPropNet to detect propellers from the data of an event camera. EVPropNet directly transfers to the real world without any fine-tuning or retraining. We present two applications of our network: (a) tracking and following an unmarked drone and (b) landing on a near-hover drone. We successfully evaluate and demonstrate the proposed approach in many real-world experiments with different propeller shapes and sizes. Our network can detect propellers at a rate of 85.1% even when 60% of the propeller is occluded and can run at upto 35Hz on a 2W power budget. To our knowledge, this is the first deep learning-based solution for detecting propellers (to detect drones). Finally, our applications also show an impressive success rate of 92% and 90% for the tracking and landing tasks respectively.

【50】 Are conditional GANs explicitly conditional? 标题:有条件的甘斯是否有明确的条件?

作者:Houssem-eddine Boulahbal,Adrian Voicila,Andrew Comport 机构:Renault Software Labs, CNRS-I3S, Sophia Antipolis University 链接:https://arxiv.org/abs/2106.15011 摘要:本文提出了条件生成对抗网络(cGANs)的两个重要贡献,以改进利用该体系结构的各种应用。第一个主要贡献是对cGAN的分析,以表明它们不是显式条件的。特别地,将证明鉴别器以及cGAN不会自动学习输入之间的条件性。第二个贡献是一种新的方法,称为acontrario,它通过一种新的acontrario损失为对抗体系结构的两个部分显式地建模条件性,该损失涉及训练鉴别器来学习无条件(不利)示例。这导致了一种新型的GAN数据增强方法(acontrario学习),它允许使用不利示例将生成器的搜索空间限制为条件输出。我们通过提出一种概率分布分析,进行了广泛的实验来评估鉴别器的条件性。在语义图像合成、图像分割和单目深度预测等不同应用上与cGAN体系结构的比较表明,在众所周知的数据集上,使用Fréchet Inception距离(FID)、平均交并比(mIoU)、对数均方根误差(RMSE log)和统计差异箱数(NDB)等不同度量,性能都有显著改进。 摘要:This paper proposes two important contributions for conditional Generative Adversarial Networks (cGANs) to improve the wide variety of applications that exploit this architecture. The first main contribution is an analysis of cGANs to show that they are not explicitly conditional. In particular, it will be shown that the discriminator and subsequently the cGAN does not automatically learn the conditionality between inputs. The second contribution is a new method, called acontrario, that explicitly models conditionality for both parts of the adversarial architecture via a novel acontrario loss that involves training the discriminator to learn unconditional (adverse) examples. This leads to a novel type of data augmentation approach for GANs (acontrario learning) which allows to restrict the search space of the generator to conditional outputs using adverse examples. Extensive experimentation is carried out to evaluate the conditionality of the discriminator by proposing a probability distribution analysis. Comparisons with the cGAN architecture for different applications show significant improvements in performance on well known datasets including, semantic image synthesis, image segmentation and monocular depth prediction using different metrics including Fréchet Inception Distance (FID), mean Intersection over Union (mIoU), Root Mean Square Error log (RMSE log) and Number of statistically-Different Bins (NDB).

【51】 Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment 标题:基于信用分配算法无关性的强化学习模块性

作者:Michael Chang,Sidhant Kaushik,Sergey Levine,Thomas L. Griffiths 机构:Equal contribution 1Department of Computer Science, USA 2Department of ComputerScience, Princeton University 备注:Long Presentation at the Thirty-eighth International Conference on Machine Learning (ICML) 2021. 21 pages, 11 figures 链接:https://arxiv.org/abs/2106.14993 摘要:许多迁移问题需要重新使用以前的最优决策来解决新的任务,这表明需要学习算法来修改选择特定动作的机制,而不是选择其他动作的机制。然而,如何实现这种模块化的信用分配,目前还没有一种形式主义或理论。为了回答这个问题,我们定义模块化信用分配作为一个约束,以最小化不同决策的反馈信号之间的算法互信息。通过对学习算法本身进行因果分析,我们引入了我们称之为模块化的准则来检验学习算法是否满足这个约束。我们将最近提出的社会决策框架推广为比Markov决策过程更细粒度的形式主义,以证明对于不包含循环的决策序列,某些单步时间差动作值方法满足这个准则,而所有的策略梯度方法则不满足。经验证据表明,这种行动价值方法比政策梯度方法对转移问题的样本效率更高,转移问题只需要对先前最优决策序列进行稀疏的改变。 摘要:Many transfer problems require re-using previously optimal decisions for solving new tasks, which suggests the need for learning algorithms that can modify the mechanisms for choosing certain actions independently of those for choosing others. However, there is currently no formalism nor theory for how to achieve this kind of modular credit assignment. To answer this question, we define modular credit assignment as a constraint on minimizing the algorithmic mutual information among feedback signals for different decisions. We introduce what we call the modularity criterion for testing whether a learning algorithm satisfies this constraint by performing causal analysis on the algorithm itself. We generalize the recently proposed societal decision-making framework as a more granular formalism than the Markov decision process to prove that for decision sequences that do not contain cycles, certain single-step temporal difference action-value methods meet this criterion while all policy-gradient methods do not. Empirical evidence suggests that such action-value methods are more sample efficient than policy-gradient methods on transfer problems that require only sparse changes to a sequence of previously optimal decisions.

【52】 The Food Recognition Benchmark: Using DeepLearning to Recognize Food on Images 标题:食物识别基准:使用深度学习识别图像上的食物

作者:Sharada Prasanna Mohanty,Gaurav Singhal,Eric Antoine Scuccimarra,Djilani Kebaili,Harris Héritier,Victor Boulanger,Marcel Salathé 机构:AIcrowd Research, AIcrowd; bOtto von Guericke University Magdeburg; cSociété des Produits Nestlé, Nestlé Research Center; dDigital Epidemiology Lab, EPFL (École, polytechnique fédérale de Lausanne) 链接:https://arxiv.org/abs/2106.14977 摘要:图像上食物的自动识别有许多有趣的应用,包括医学队列中的营养跟踪。这个问题已经得到了大量的研究关注,但是一个正在进行的开发开放和可复制算法的公共基准已经丢失。在这里,我们报告了使用通过移动MyFoodRepo应用程序获得的公开食物图像建立这样一个基准。通过四轮测试,基准测试发布了MyFoodRepo-273数据集,包括24119幅图像和总共39325个分割多边形,这些多边形被分类在273个不同的类中。在最后一轮测试中,模型在同一平台的私人测试集上进行了评估,共有5000张图像和7865条注释。在273个食品类别中表现最好的模型的平均准确率达到0.568(第4轮),平均召回率达到0.885(第3轮)。我们展示了第4轮结果的实验验证,并讨论了为未来几轮增加数据集大小和多样性而设计的基准设置的含义。 摘要:The automatic recognition of food on images has numerous interesting applications, including nutritional tracking in medical cohorts. The problem has received significant research attention, but an ongoing public benchmark to develop open and reproducible algorithms has been missing. Here, we report on the setup of such a benchmark using publicly available food images sourced through the mobile MyFoodRepo app. Through four rounds, the benchmark released the MyFoodRepo-273 dataset constituting 24,119 images and a total of 39,325 segmented polygons categorized in 273 different classes. Models were evaluated on private tests sets from the same platform with 5,000 images and 7,865 annotations in the final round. Top-performing models on the 273 food categories reached a mean average precision of 0.568 (round 4) and a mean average recall of 0.885 (round 3). We present experimental validation of round 4 results, and discuss implications of the benchmark setup designed to increase the size and diversity of the dataset for future rounds.

【53】 Achieving Real-Time Object Detection on MobileDevices with Neural Pruning Search 标题:用神经修剪搜索实现移动设备上的实时目标检测

作者:Pu Zhao,Wei Niu,Geng Yuan,Yuxuan Cai,Bin Ren,Yanzhi Wang,Xue Lin 机构:Northeastern University, Boston, MA, William & Mary, Williamsburg, VA 备注:Presented on the HiPEAC 2021 workshop (cogarch 2021) 链接:https://arxiv.org/abs/2106.14943 摘要:目标检测在自动驾驶汽车安全发展中起着重要的作用。然而,在自动驾驶汽车上的移动系统由于计算资源有限,给目标检测带来了困难。为了实现这一点,我们提出了一个编译器感知的神经剪枝搜索框架,以实现对自主车辆进行二维和三维目标检测的高速推理。该框架自动搜索每一层的剪枝方案和剪枝率,在编译器优化的情况下找到一个最适合的剪枝,以优化检测精度和速度性能。我们的实验表明,该方法首次在现成的手机上实现了(接近)实时、55ms和99ms的推理时间,分别用于基于YOLOv4的二维目标检测和基于PointPillars的三维检测,精度损失很小(或没有)。 摘要:Object detection plays an important role in self-driving cars for security development. However, mobile systems on self-driving cars with limited computation resources lead to difficulties for object detection. To facilitate this, we propose a compiler-aware neural pruning search framework to achieve high-speed inference on autonomous vehicles for 2D and 3D object detection. The framework automatically searches the pruning scheme and rate for each layer to find a best-suited pruning for optimizing detection accuracy and speed performance under compiler optimization. Our experiments demonstrate that for the first time, the proposed method achieves (close-to) real-time, 55ms and 99ms inference times for YOLOv4 based 2D object detection and PointPillars based 3D detection, respectively, on an off-the-shelf mobile phone with minor (or no) accuracy loss.

【54】 Overview of BioASQ 2021: The ninth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering 标题:BioASQ 2021综述:第九届BioASQ关于大规模生物医学语义索引和问题回答的挑战

作者:Anastasios Nentidis,Georgios Katsimpras,Eirini Vandorou,Anastasia Krithara,Luis Gasco,Martin Krallinger,Georgios Paliouras 机构: National Center for Scientific Research “Demokritos”, Athens, Greece, Aristotle University of Thessaloniki, Thessaloniki, Greece, Barcelona Supercomputing Center, Barcelona, Spain 备注:25 pages, 15 tables, 3 figures. arXiv admin note: text overlap with arXiv:2106.14618 链接:https://arxiv.org/abs/2106.14885 摘要:推进大规模生物医学语义索引和问答技术的发展是BioASQ挑战的主要焦点。BioASQ组织了各自的任务,不同的团队开发了基于相同基准数据集的系统,这些数据集代表了生物医学领域专家的真实信息需求。本文以2021年评估论坛(CLEF)的会议和实验室为背景,对第九版BioASQ挑战进行了概述。今年,一项新的问答任务命名为Synergy,它的引入是为了支持研究COVID-19疾病的研究人员,并测量参与研究的团队在问题仍在发展时辨别信息的能力。共有42支拥有170多个系统的队伍报名参加挑战赛的4项任务。与前几年类似,评估结果显示,与基线相比,绩效有所提高,这表明该领域的最新技术水平不断提高。 摘要:Advancing the state-of-the-art in large-scale biomedical semantic indexing and question answering is the main focus of the BioASQ challenge. BioASQ organizes respective tasks where different teams develop systems that are evaluated on the same benchmark datasets that represent the real information needs of experts in the biomedical domain. This paper presents an overview of the ninth edition of the BioASQ challenge in the context of the Conference and Labs of the Evaluation Forum (CLEF) 2021. In this year, a new question answering task, named Synergy, is introduced to support researchers studying the COVID-19 disease and measure the ability of the participating teams to discern information while the problem is still developing. In total, 42 teams with more than 170 systems were registered to participate in the four tasks of the challenge. The evaluation results, similarly to previous years, show a performance gain against the baselines which indicates the continuous improvement of the state-of-the-art in this field.

【55】 What Is Consciousness? Artificial Intelligence, Real Intelligence, Quantum Mind, And Qualia 标题:意识是什么?人工智能、真实智能、量子思维和量子

作者:Stuart A. Kauffman,Andrea Roli 机构:Institute for Systems Biology, Seattle, USA, Department of Computer Science and Engineering, Campus of Cesena, Alma Mater Studiorum Universita di Bologna, European Centre for Living Technology, Venezia, Italy 链接:https://arxiv.org/abs/2106.15515 摘要:我们以一种新的方式来探讨"意识是什么?"这个问题:不是笛卡尔的"系统怀疑",而是生物体如何在自己的世界中找到自己的路。找到自己的路,就是要找到世界上那些可能有益或有害的特征的可能用途。"用X来完成Y的可能用途"就是"可供性(Affordance)"。X的用途数目是不确定的,不同的用途彼此无序,也不能相互推出。所有的生物适应,要么是由可遗传变异和选择所抓住的可供性,要么是(快得多地)由生物体在其世界中行动、发现用X来完成Y的用途所抓住的可供性。基于此,我们得出了相当惊人的结论:1)强人工智能是不可能的。通用图灵机无法"找到"新颖的可供性。2)大脑-心智不是纯粹的经典物理,因为没有任何经典物理系统可以成为一台其动力学行为与"可能用途"同构的模拟计算机。3)大脑-心智必须部分是量子的,目前已有6.0西格玛到7.3西格玛之间不断增加的证据支持。4)基于海森堡将量子态解释为通过测量把"潜能(Potentia)"转化为"现实(Actuals)",一个自然的假设是,心智将潜能实现为现实。这一假设在5.2西格玛的水平上得到支持。于是,心智对纠缠的大脑-心智-世界状态的实现被体验为感受质(qualia),并使得"看见"或"感知"用X完成Y的用途成为可能。我们能够也确实会临时拼凑出解决办法,而计算机不能。5)除了人们熟悉的量子计算机之外,我们还考虑了Trans-Turing系统。 摘要:We approach the question, "What is Consciousness?" in a new way, not as Descartes' "systematic doubt", but as how organisms find their way in their world. Finding one's way involves finding possible uses of features of the world that might be beneficial or avoiding those that might be harmful. "Possible uses of X to accomplish Y" are "Affordances". The number of uses of X is indefinite, the different uses are unordered and are not deducible from one another. All biological adaptations are either affordances seized by heritable variation and selection or, far faster, by the organism acting in its world finding uses of X to accomplish Y. Based on this, we reach rather astonishing conclusions: 1) Strong AI is not possible. Universal Turing Machines cannot "find" novel affordances. 2) Brain-mind is not purely classical physics for no classical physics system can be an analogue computer whose dynamical behavior can be isomorphic to "possible uses". 3) Brain mind must be partly quantum -- supported by increasing evidence at 6.0 sigma to 7.3 Sigma. 4) Based on Heisenberg's interpretation of the quantum state as "Potentia" converted to "Actuals" by Measurement, a natural hypothesis is that mind actualizes Potentia. This is supported at 5.2 Sigma. Then Mind's actualization of entangled brain-mind-world states are experienced as qualia and allow "seeing" or "perceiving" of uses of X to accomplish Y. We can and do jury-rig. Computers cannot. 5) Beyond familiar quantum computers, we consider Trans-Turing-Systems.

【56】 Deep Random Projection Outlyingness for Unsupervised Anomaly Detection 标题:基于深度随机投影的无监督异常检测

作者:Martin Bauw,Santiago Velasco-Forero,Jesus Angulo,Claude Adnet,Olivier Airiau 机构:Center for Mathematical Morphology, MINES ParisTech, PSL Research University, France, Thales LAS France, Advanced Radar Concepts, Limours, France 链接:https://arxiv.org/abs/2106.15307 摘要:随机投影是一种常用的算法设计技术,广泛应用于各种领域,包括信息检索、压缩感知和离群度度量等。在这项工作中,我们对原始的随机投影离群度度量进行了修改,并将其与神经网络相结合,以获得一种能够处理多模态正态性的无监督异常检测方法。我们给出了理论和实验论证,以说明异常评分估计器、随机投影维数和投影数目的选择依据。研究了自适应dropout的贡献,以及该方法的仿射稳定性。所提出的神经网络方法的性能与最先进的异常检测方法相当。在MNIST、Fashion-MNIST和CIFAR-10数据集上进行的实验表明了该方法的有效性,并提示了扩展到半监督设置的可能性。 摘要:Random projection is a common technique for designing algorithms in a variety of areas, including information retrieval, compressive sensing and measuring of outlyingness. In this work, the original random projection outlyingness measure is modified and associated with a neural network to obtain an unsupervised anomaly detection method able to handle multimodal normality. Theoretical and experimental arguments are presented to justify the choices of the anomaly score estimator, the dimensions of the random projections, and the number of such projections. The contribution of adapted dropouts is investigated, along with the affine stability of the proposed method. The performance of the proposed neural network approach is comparable to a state-of-the-art anomaly detection method. Experiments conducted on the MNIST, Fashion-MNIST and CIFAR-10 datasets show the relevance of the proposed approach, and suggest a possible extension to a semi-supervised setup.
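For reference, the classical (non-deep) random projection outlyingness measure that the paper modifies can be computed as below: project onto random unit directions and take the worst-case robustly standardized deviation. The number of projections and the toy data are illustrative; the neural network component of the proposed method is not shown.

```python
import numpy as np

def random_projection_outlyingness(X, x_query, num_projections=1000, rng=None):
    """Classical random projection (Stahel-Donoho style) outlyingness:
    project the data onto random unit directions u and report, for the query
    point, max_u |u.x - median(u.X)| / MAD(u.X)."""
    rng = rng or np.random.default_rng()
    U = rng.normal(size=(num_projections, X.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    proj_X = X @ U.T                      # (n_samples, num_projections)
    proj_x = x_query @ U.T                # (num_projections,)
    med = np.median(proj_X, axis=0)
    mad = np.median(np.abs(proj_X - med), axis=0) + 1e-12
    return float(np.max(np.abs(proj_x - med) / mad))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                     # "normal" training data
print("inlier :", round(random_projection_outlyingness(X, X[0], rng=rng), 2))
print("outlier:", round(random_projection_outlyingness(X, np.full(5, 6.0), rng=rng), 2))
```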

【57】 DCASE 2021 Task 3: Spectrotemporally-aligned Features for Polyphonic Sound Event Localization and Detection 标题:DCASE 2021任务3:用于复调声音事件定位和检测的谱时间对齐特征

作者:Thi Ngoc Tho Nguyen,Karn Watcharasupat,Ngoc Khanh Nguyen,Douglas L. Jones,Woon Seng Gan 机构: School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore., Dept. of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, IL, USA. 备注:5 pages, Technical Report for DCASE 2021 Challenge Task 3 链接:https://arxiv.org/abs/2106.15190 摘要:声事件定位与检测由声事件检测和到达方向估计两个子任务组成。声事件检测主要依靠时频模式来区分不同的声音类别,而到达方向估计则利用麦克风之间的幅度或相位差来估计声源方向。因此,通常很难同时对这两个子任务进行联合训练。我们提出了一种新的特征空间线索增强对数谱图(SALSA),它具有信号功率和源到达方向之间的精确时频映射。该特征包括多信道对数谱图以及估计的直接混响比和谱图上每个时频bin的空间协方差矩阵的主特征向量的归一化版本。在DCASE 2021数据集上进行的具有方向干扰的声音事件定位和检测的实验结果表明,基于这种新特征训练的深度学习模型的性能大大优于DCASE挑战基线。为了进一步提高DCASE声音事件定位和检测挑战的系统性能,我们结合了几个结构稍有不同的模型,对这些模型进行了新特性的训练。 摘要:Sound event localization and detection consists of two subtasks which are sound event detection and direction-of-arrival estimation. While sound event detection mainly relies on time-frequency patterns to distinguish different sound classes, direction-of-arrival estimation uses magnitude or phase differences between microphones to estimate source directions. Therefore, it is often difficult to jointly train these two subtasks simultaneously. We propose a novel feature called spatial cue-augmented log-spectrogram (SALSA) with exact time-frequency mapping between the signal power and the source direction-of-arrival. The feature includes multichannel log-spectrograms stacked along with the estimated direct-to-reverberant ratio and a normalized version of the principal eigenvector of the spatial covariance matrix at each time-frequency bin on the spectrograms. Experimental results on the DCASE 2021 dataset for sound event localization and detection with directional interference showed that the deep learning-based models trained on this new feature outperformed the DCASE challenge baseline by a large margin. We combined several models with slightly different architectures that were trained on the new feature to further improve the system performances for the DCASE sound event localization and detection challenge.
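A heavily reduced sketch of a SALSA-style feature is given below: per time-frequency bin, the multichannel log-spectrogram is stacked with a normalized principal eigenvector of a local spatial covariance matrix. The direct-to-reverberant-ratio channel, the exact normalization and the averaging length used in the paper are not reproduced; treat the parameter values here as assumptions.

```python
import numpy as np
from scipy.signal import stft

def salsa_like_features(audio, fs=24000, nperseg=512, avg_frames=4):
    """Reduced SALSA-style feature: multichannel log-spectrogram plus, per
    time-frequency bin, the principal eigenvector of a spatial covariance
    matrix averaged over a few neighbouring frames, normalized by the
    magnitude of its first-channel component."""
    # audio: (num_channels, num_samples)
    specs = np.stack([stft(ch, fs=fs, nperseg=nperseg)[2] for ch in audio])  # (C, F, T)
    C, F, T = specs.shape
    log_spec = np.log(np.abs(specs) ** 2 + 1e-10)
    eigvecs = np.zeros((C, F, T), dtype=complex)
    for t in range(T):
        sl = specs[:, :, max(0, t - avg_frames + 1):t + 1]                   # (C, F, frames)
        R = np.einsum('cfk,dfk->fcd', sl, sl.conj()) / sl.shape[-1]          # (F, C, C)
        _, v = np.linalg.eigh(R)                                             # ascending eigenvalues
        principal = v[:, :, -1]                                              # (F, C)
        eigvecs[:, :, t] = (principal / (np.abs(principal[:, :1]) + 1e-10)).T
    return log_spec, eigvecs

rng = np.random.default_rng(0)
mics = rng.normal(size=(4, 24000))               # 1 second of 4-channel noise
log_spec, eigvecs = salsa_like_features(mics)
print(log_spec.shape, eigvecs.shape)
```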

【58】 Meta-learning for Matrix Factorization without Shared Rows or Columns 标题:无共享行或列的矩阵因式分解的元学习

作者:Tomoharu Iwata 机构:NTT Communication Science Laboratories 链接:https://arxiv.org/abs/2106.15133 摘要:我们提出了一种方法,元学习知识的矩阵分解从各种矩阵,并利用这些知识分解看不见的矩阵。该方法使用一个以矩阵为输入的神经网络,生成给定矩阵的分解矩阵的先验分布。神经网络是元学习的,当分解矩阵通过最大后验概率(MAP)估计适应每个矩阵时,期望的插补误差最小。我们使用梯度下降法进行MAP估计,这使得我们能够通过梯度下降步骤来反向传播期望的插补误差,以更新神经网络参数,因为每个梯度下降步骤都是以封闭形式编写的,并且是可微的。该方法可以从矩阵中进行元学习,即使矩阵的行和列不共享,并且矩阵的大小不同。在三个用户项目评分数据集的实验中,我们证明了我们提出的方法在经过不同的矩阵训练后,可以从不可见矩阵中的有限个观察值中填充缺失值。 摘要:We propose a method that meta-learns a knowledge on matrix factorization from various matrices, and uses the knowledge for factorizing unseen matrices. The proposed method uses a neural network that takes a matrix as input, and generates prior distributions of factorized matrices of the given matrix. The neural network is meta-learned such that the expected imputation error is minimized when the factorized matrices are adapted to each matrix by a maximum a posteriori (MAP) estimation. We use a gradient descent method for the MAP estimation, which enables us to backpropagate the expected imputation error through the gradient descent steps for updating neural network parameters since each gradient descent step is written in a closed form and is differentiable. The proposed method can meta-learn from matrices even when their rows and columns are not shared, and their sizes are different from each other. In our experiments with three user-item rating datasets, we demonstrate that our proposed method can impute the missing values from a limited number of observations in unseen matrices after being trained with different matrices.
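The inner loop of the proposed method, MAP estimation of the factor matrices by gradient descent under Gaussian priors, can be sketched as below. Here the prior means are simply given; in the paper they are produced by the meta-learned network, and the backpropagation of the imputation error through these gradient steps is omitted.

```python
import numpy as np

def map_factorize(R, mask, U0, V0, lam=1.0, lr=0.01, steps=500):
    """MAP estimation of factor matrices U, V on the observed entries of R,
    with Gaussian priors centred at U0, V0 (in the paper these prior means
    come from the meta-learned neural network; here they are just given).
    Plain gradient descent, so each step is closed-form and differentiable."""
    U, V = U0.copy(), V0.copy()
    for _ in range(steps):
        E = mask * (U @ V.T - R)                  # error on observed entries only
        gU = E @ V + lam * (U - U0)
        gV = E.T @ U + lam * (V - V0)
        U -= lr * gU
        V -= lr * gV
    return U, V

rng = np.random.default_rng(0)
true_U, true_V = rng.normal(size=(6, 2)), rng.normal(size=(5, 2))
R = true_U @ true_V.T
mask = (rng.random(R.shape) < 0.6).astype(float)  # 60% of ratings observed
U, V = map_factorize(R, mask, rng.normal(size=(6, 2)), rng.normal(size=(5, 2)), lam=0.1)
rmse = np.sqrt((((U @ V.T - R) * (1 - mask)) ** 2).sum() / max((1 - mask).sum(), 1))
print("imputation RMSE on held-out entries:", round(float(rmse), 3))
```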
