人工智能学术速递[9.9]



cs.AI人工智能,共计36篇

【1】 Video2Skill: Adapting Events in Demonstration Videos to Skills in an Environment using Cyclic MDP Homomorphisms 标题:Video2Skill:使用循环MDP同态使演示视频中的事件适应环境中的技能 链接:https://arxiv.org/abs/2109.03813

作者:Sumedh A Sontakke,Sumegh Roychowdhury,Mausoom Sarkar,Nikaash Puri,Balaji Krishnamurthy,Laurent Itti 机构:Indian Institute of Technology, Kharagpur, Adobe MDSR, University of Southern California 摘要:人类擅长从带有文本注释的演示中学习长时间任务,在线教程视频的迅速普及证明了这一点。直观地说,这种能力可以分为两个不同的子任务——首先,将长时间的演示序列划分为语义上有意义的事件;第二,在自己的环境中将这些事件转化为有意义的行为。在这里,我们介绍了Video2Skill(V2S),它试图通过允许机器人手臂从人类烹饪视频中学习来将这种能力扩展到人工代理。我们首先使用序列到序列自动编码器风格的架构来学习长时间范围演示中事件的时间潜在空间。然后,我们使用少量离线和不相关的交互数据(由专家控制的机器人手臂的状态-动作对序列)将这些表示转移到机器人目标域,以将这些事件调整为可操作的表示,即技能。通过实验,我们证明了我们的方法可以实现自监督类比学习,在这种学习中,智能体可以学习在人类演示数据中的运动和机器人环境中的行为之间进行类比。我们还展示了我们的模型学习方法的有效性——演示Video2Skill如何利用人类演示中的先验知识来超越传统的长视野动态模型学习。最后,我们展示了我们的非表格决策方法的实用性,即利用视频演示生成Zero-Shot技能。 摘要:Humans excel at learning long-horizon tasks from demonstrations augmented with textual commentary, as evidenced by the burgeoning popularity of tutorial videos online. Intuitively, this capability can be separated into 2 distinct subtasks - first, dividing a long-horizon demonstration sequence into semantically meaningful events; second, adapting such events into meaningful behaviors in one's own environment. Here, we present Video2Skill (V2S), which attempts to extend this capability to artificial agents by allowing a robot arm to learn from human cooking videos. We first use sequence-to-sequence Auto-Encoder style architectures to learn a temporal latent space for events in long-horizon demonstrations. We then transfer these representations to the robotic target domain, using a small amount of offline and unrelated interaction data (sequences of state-action pairs of the robot arm controlled by an expert) to adapt these events into actionable representations, i.e., skills. Through experiments, we demonstrate that our approach results in self-supervised analogy learning, where the agent learns to draw analogies between motions in human demonstration data and behaviors in the robotic environment. We also demonstrate the efficacy of our approach on model learning - demonstrating how Video2Skill utilizes prior knowledge from human demonstration to outperform traditional model learning of long-horizon dynamics. Finally, we demonstrate the utility of our approach for non-tabula rasa decision-making, i.e, utilizing video demonstration for zero-shot skill generation.

【2】 Panoptic nuScenes: A Large-Scale Benchmark for LiDAR Panoptic Segmentation and Tracking 标题:全景NuScenes:一种用于LiDAR全景分割和跟踪的大规模基准 链接:https://arxiv.org/abs/2109.03805

作者:Whye Kit Fong,Rohit Mohan,Juana Valeria Hurtado,Lubing Zhou,Holger Caesar,Oscar Beijbom,Abhinav Valada 机构:Department of Computer Science, University of Freiburg 备注:The benchmark is available at this https URL and this https URL 摘要:动态代理的全景场景理解和跟踪对于机器人和自动车辆在城市环境中导航至关重要。由于激光雷达提供与照明无关的精确场景几何描述,因此使用激光雷达点云执行这些任务可以提供可靠的预测。然而,现有数据集缺乏城市场景类型的多样性,动态对象实例数量有限,这既妨碍了对这些任务的学习,也妨碍了对所开发方法的可靠基准测试。在本文中,我们介绍了大规模全景nuScenes基准数据集,它扩展了我们流行的nuScenes数据集,并为语义分割、全景分割和全景跟踪任务提供了逐点的地面真相注释。为了便于比较,我们在建议的数据集上为每项任务提供了几个强大的基线。此外,我们分析了现有的全景跟踪指标的缺点,并提出了一种新的以实例为中心的指标来解决这些问题。我们提供了大量的实验,与现有的数据集相比,这些实验证明了全景式nuScenes的实用性,并使在线评估服务器在nuScenes.org上可用。我们相信,这一扩展将加速研究动态城市环境场景理解的新方法。 摘要:Panoptic scene understanding and tracking of dynamic agents are essential for robots and automated vehicles to navigate in urban environments. As LiDARs provide accurate illumination-independent geometric depictions of the scene, performing these tasks using LiDAR point clouds provides reliable predictions. However, existing datasets lack diversity in the type of urban scenes and have a limited number of dynamic object instances which hinders both learning of these tasks as well as credible benchmarking of the developed methods. In this paper, we introduce the large-scale Panoptic nuScenes benchmark dataset that extends our popular nuScenes dataset with point-wise groundtruth annotations for semantic segmentation, panoptic segmentation, and panoptic tracking tasks. To facilitate comparison, we provide several strong baselines for each of these tasks on our proposed dataset. Moreover, we analyze the drawbacks of the existing metrics for the panoptic tracking problem and propose a novel instance-centric metric that addresses the concerns. We present extensive experiments that demonstrate the utility of Panoptic nuScenes compared to existing datasets and make the online evaluation server available at nuScenes.org. We believe that this extension will accelerate the research of novel methods for scene understanding of dynamic urban environments.

【3】 Highly Parallel Autoregressive Entity Linking with Discriminative Correction 标题:具有判别校正的高度并行自回归实体链接 链接:https://arxiv.org/abs/2109.03792

作者:Nicola De Cao,Wilker Aziz,Ivan Titov 机构:University of Amsterdam,University of Edinburgh 备注:Accepted at EMNLP2021 Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Code at this https URL . 8 pages, 1 figure, 3 tables 摘要:生成性方法最近被证明对实体消歧和实体链接(即联合提及检测和消歧)都是有效的。然而,先前提出的EL自回归公式存在以下问题:i)复杂(深度)解码器导致的高计算成本;ii)随源序列长度扩展的非并行解码;以及iii)需要对大量数据进行训练。在这项工作中,我们提出了一种非常有效的方法,该方法将所有潜在提及的自回归链接并行化,并依赖于浅层高效解码器。此外,我们还通过一个额外的鉴别成分,即一个校正项,来增强生成目标,使我们能够直接优化生成器的排名。综合起来,这些技术解决了上述所有问题:我们的模型比以前的生成方法快70倍以上,精度更高,优于标准英语数据集AIDA CoNLL上的最先进方法。源代码可在https://github.com/nicola-decao/efficient-autoregressive-EL 摘要:Generative approaches have been recently shown to be effective for both Entity Disambiguation and Entity Linking (i.e., joint mention detection and disambiguation). However, the previously proposed autoregressive formulation for EL suffers from i) high computational cost due to a complex (deep) decoder, ii) non-parallelizable decoding that scales with the source sequence length, and iii) the need for training on a large amount of data. In this work, we propose a very efficient approach that parallelizes autoregressive linking across all potential mentions and relies on a shallow and efficient decoder. Moreover, we augment the generative objective with an extra discriminative component, i.e., a correction term which lets us directly optimize the generator's ranking. When taken together, these techniques tackle all the above issues: our model is >70 times faster and more accurate than the previous generative method, outperforming state-of-the-art approaches on the standard English dataset AIDA-CoNLL. Source code available at https://github.com/nicola-decao/efficient-autoregressive-EL

【4】 Active Learning by Acquiring Contrastive Examples 标题:通过获取对比实例进行主动学习 链接:https://arxiv.org/abs/2109.03764

作者:Katerina Margatina,Giorgos Vernikos,Loïc Barrault,Nikolaos Aletras 机构:†University of Sheffield, ‡EPFL, ∗HEIG-VD 备注:Accepted at EMNLP 2021 摘要:用于主动学习的常见采集函数使用不确定性或多样性采样,旨在分别从未标记数据池中选择困难和多样的数据点。在这项工作中,利用这两个方面的优势,我们提出了一个采集函数,用于选择对比示例(contrastive examples),即在模型特征空间中相似、但模型输出的预测分布差异最大的数据点。我们将我们的方法CAL(对比主动学习)与四个自然语言理解任务和七个数据集中的一组不同的采集函数进行了比较。我们的实验表明,无论是域内还是域外数据,CAL在所有任务中的性能始终优于或等于最佳性能基线。我们还对我们的方法进行了广泛的消融研究,并进一步分析了所有主动获取的数据集,结果表明,与其他策略相比,CAL在不确定性和多样性之间实现了更好的权衡。 摘要:Common acquisition functions for active learning use either uncertainty or diversity sampling, aiming to select difficult and diverse data points from the pool of unlabeled data, respectively. In this work, leveraging the best of both worlds, we propose an acquisition function that opts for selecting contrastive examples, i.e. data points that are similar in the model feature space and yet the model outputs maximally different predictive likelihoods. We compare our approach, CAL (Contrastive Active Learning), with a diverse set of acquisition functions in four natural language understanding tasks and seven datasets. Our experiments show that CAL performs consistently better or equal than the best performing baseline across all tasks, on both in-domain and out-of-domain data. We also conduct an extensive ablation study of our method and we further analyze all actively acquired datasets showing that CAL achieves a better trade-off between uncertainty and diversity compared to other strategies.
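
下面是按摘要思路给出的一个概念性示意(并非作者的官方实现):用最近邻在特征空间中找相似样本,再用预测分布之间的KL散度度量分歧,对未标注样本打分;其中的特征、概率与函数名均为随机占位的假设数据。

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def kl(p, q, eps=1e-12):
    """KL(p || q),p、q 为离散预测分布。"""
    p, q = p + eps, q + eps
    return np.sum(p * np.log(p / q))

def cal_scores(pool_feats, pool_probs, labeled_feats, labeled_probs, k=10):
    """CAL 思路的示意打分:对每个未标注样本,取其在特征空间中最近的 k 个已标注邻居,
    用预测分布的平均 KL 散度作为分数;分数越大表示"特征相似但预测分歧大",越值得标注。"""
    nn = NearestNeighbors(n_neighbors=k).fit(labeled_feats)
    _, idx = nn.kneighbors(pool_feats)
    scores = []
    for i, neigh in enumerate(idx):
        scores.append(np.mean([kl(labeled_probs[j], pool_probs[i]) for j in neigh]))
    return np.array(scores)

# 随机占位数据:200 个未标注样本、50 个已标注样本,3 个类别
rng = np.random.default_rng(0)
pool_f, lab_f = rng.normal(size=(200, 32)), rng.normal(size=(50, 32))
pool_p = rng.dirichlet(np.ones(3), size=200)
lab_p = rng.dirichlet(np.ones(3), size=50)
scores = cal_scores(pool_f, pool_p, lab_f, lab_p)
print(np.argsort(-scores)[:16])   # 选出本轮要送去标注的 16 个样本
```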

【5】 Memory and Knowledge Augmented Language Models for Inferring Salience in Long-Form Stories 标题:推理长篇故事显著性的记忆和知识增强语言模型 链接:https://arxiv.org/abs/2109.03754

作者:David Wilmot,Frank Keller 机构:School of Informatics, University of Edinburgh 备注:Accepted to the EMNLP 2021 Conference as a long-paper, 9 pages, 15 pages with appendices and references, 2 figures, 4 tables 摘要:测量事件显著性对于理解故事至关重要。本文从巴特基本函数和惊奇理论出发,提出了一种新的无监督显著性检测方法,并将其应用于较长的叙事形式。我们改进了标准transformer语言模型,加入了一个外部知识库(源自检索增强生成),并添加了一个内存机制,以提高较长作品的性能。我们使用一种新颖的方法,从Shmoop经典文学作品语料库中使用章节对齐的摘要来获得显著性注释。我们对这些数据的评估表明,我们的显著性检测模型在非知识库和内存增强语言模型的基础上提高了性能,这两个模型对这种改进都至关重要。 摘要:Measuring event salience is essential in the understanding of stories. This paper takes a recent unsupervised method for salience detection derived from Barthes Cardinal Functions and theories of surprise and applies it to longer narrative forms. We improve the standard transformer language model by incorporating an external knowledgebase (derived from Retrieval Augmented Generation) and adding a memory mechanism to enhance performance on longer works. We use a novel approach to derive salience annotation using chapter-aligned summaries from the Shmoop corpus for classic literary works. Our evaluation against this data demonstrates that our salience detection model improves performance over and above a non-knowledgebase and memory augmented language model, both of which are crucial to this improvement.

【6】 Conjectures, Tests and Proofs: An Overview of Theory Exploration 标题:猜想、检验与证明:理论探索综述 链接:https://arxiv.org/abs/2109.03721

作者:Moa Johansson,Nicholas Smallbone 机构:Chalmers University of Technology, Gothenburg, Sweden 备注:None 摘要:数学推理的一个关键组成部分是能够对手头的问题领域进行有趣的猜测。在本文中,我们简要介绍了一个名为QuickSpec的理论探索系统,该系统能够自动发现关于给定函数集的有趣猜想。QuickSpec的工作原理是将术语生成与随机测试交错,形成候选猜想。通过从小尺寸开始并确保只考虑对于已经发现的猜想不可约的项,这是易于处理的。QuickSpec已成功应用于生成自动归纳定理证明的引理以及生成函数程序的规范。我们概述了QuickSpec的典型用例,并演示了如何轻松地将其连接到用户选择的定理证明器。 摘要:A key component of mathematical reasoning is the ability to formulate interesting conjectures about a problem domain at hand. In this paper, we give a brief overview of a theory exploration system called QuickSpec, which is able to automatically discover interesting conjectures about a given set of functions. QuickSpec works by interleaving term generation with random testing to form candidate conjectures. This is made tractable by starting from small sizes and ensuring that only terms that are irreducible with respect to already discovered conjectures are considered. QuickSpec has been successfully applied to generate lemmas for automated inductive theorem proving as well as to generate specifications of functional programs. We give an overview of typical use-cases of QuickSpec, as well as demonstrating how to easily connect it to a theorem prover of the user's choice.
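
QuickSpec 本身是一个 Haskell 工具,这里不复现其实现;下面用 Python 对"项生成 + 随机测试形成候选猜想"这一核心思路做一个极简示意,其中函数集合与候选等式的形式(只考虑两个一元函数的交换律)均为假设。

```python
import random
from itertools import combinations

# 假设的函数集合(示意):作用在整数上的简单一元函数
funcs = {
    "double": lambda x: 2 * x,
    "negate": lambda x: -x,
    "inc":    lambda x: x + 1,
}

def random_tests(n=200, lo=-1000, hi=1000):
    return [random.randint(lo, hi) for _ in range(n)]

def conjecture_commutation(funcs, tests):
    """示意 QuickSpec 的思路:枚举候选等式(这里只考虑 f(g(x)) = g(f(x))),
    用随机测试筛掉明显不成立的,剩下的作为"猜想"输出,留待定理证明器验证。"""
    conjectures = []
    for (name_f, f), (name_g, g) in combinations(funcs.items(), 2):
        if all(f(g(x)) == g(f(x)) for x in tests):
            conjectures.append(f"{name_f}({name_g}(x)) = {name_g}({name_f}(x))")
    return conjectures

random.seed(0)
print(conjecture_commutation(funcs, random_tests()))
# double 与 negate 可交换;含 inc 的组合会被随机测试排除
```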

【7】 BotSpot: Deep Learning Classification of Bot Accounts within Twitter 标题:BotSpot:Twitter内机器人账户的深度学习分类 链接:https://arxiv.org/abs/2109.03710

作者:Christopher Braker,Stavros Shiaeles,Gueltoum Bendiab,Nick Savage,Konstantinos Limniotis 机构:This paper is a preprint; it has been accepted for, publication in NEW,AN , ruSMART ,: Internet, of Things, Smart Spaces, and Next Generation Networks, and Systems pp ,-, DOI: ,.,,-,-,-,-,_ 备注:None 摘要:Twitter的开放性特性允许程序通过Twitter API自动生成和控制Twitter帐户。这些账户被称为机器人,可以像真人一样自动执行推特、重新推特、跟踪、取消跟踪或直接向其他账户发送消息等操作。他们还可以执行恶意任务,如传播假新闻、垃圾邮件、恶意软件和其他网络犯罪。在本文中,我们介绍了一种新的机器人检测方法,使用深度学习,多层感知器神经网络和机器人帐户的九个特征。开发了一个网络爬虫,用于自动从公共Twitter帐户收集数据,并使用860个人类和机器人帐户样本构建测试和训练数据集。在初始训练完成后,多层感知器神经网络的总体准确率达到92%,证明了该方法的性能。 摘要:The openness feature of Twitter allows programs to generate and control Twitter accounts automatically via the Twitter API. These accounts, which are known as bots, can automatically perform actions such as tweeting, re-tweeting, following, unfollowing, or direct messaging other accounts, just like real people. They can also conduct malicious tasks such as spreading of fake news, spams, malicious software and other cyber-crimes. In this paper, we introduce a novel bot detection approach using deep learning, with the Multi-layer Perceptron Neural Networks and nine features of a bot account. A web crawler is developed to automatically collect data from public Twitter accounts and build the testing and training datasets, with 860 samples of human and bot accounts. After the initial training is done, the Multilayer Perceptron Neural Networks achieved an overall accuracy rate of 92%, which proves the performance of the proposed approach.
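
摘要提到使用账户的九个特征训练多层感知机。下面用 scikit-learn 给出一个流程示意:特征与标签均为随机占位数据(与论文的真实特征和 860 个账户样本无关),仅演示训练与评估的管线。

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# 占位数据:860 个账户、9 个数值特征(规模与论文一致,内容为随机假设)
rng = np.random.default_rng(42)
X = rng.normal(size=(860, 9))
y = rng.integers(0, 2, size=860)          # 0 = 人类账户, 1 = 机器人账户

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500,
                                  random_state=0))
clf.fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))  # 随机数据上约 0.5,换成真实特征才有意义
```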

【8】 Continuous Entailment Patterns for Lexical Inference in Context 标题:语境中词汇推理的连续蕴涵模式 链接:https://arxiv.org/abs/2109.03695

作者:Martin Schmitt,Hinrich Schütze 机构:Center for Information and Language Processing (CIS), LMU Munich, Germany 备注:Accepted as a short paper at EMNLP 2021. Code available at this https URL 摘要:将预训练语言模型(PLM)与文本模式相结合在Zero-Shot和Few-Shot设置中都有帮助。对于Zero-Shot性能,设计与自我监督预训练期间看到的文本非常相似的模式是有意义的,因为模型从未见过其他任何东西。有监督的训练允许更大的灵活性。如果我们允许PLM词汇表之外的标记,则模式可以更灵活地适应PLM的特性。对比“标记”可以是任何连续向量的模式与必须在词汇表元素之间进行离散选择的模式,我们称我们的方法为连续模式(CONAN)。我们在两个已建立的上下文词汇推理(LIiC)基准(又称谓词蕴涵)上评估了CONAN,这是一项具有挑战性的自然语言理解任务,训练集相对较小。与离散模式直接比较,CONAN持续带来性能提升,刷新了当前最优结果。我们的实验为增强PLM在LIiC上的性能的模式类型提供了有价值的见解,并提出了有关我们使用文本模式理解PLM的重要问题。 摘要:Combining a pretrained language model (PLM) with textual patterns has been shown to help in both zero- and few-shot settings. For zero-shot performance, it makes sense to design patterns that closely resemble the text seen during self-supervised pretraining because the model has never seen anything else. Supervised training allows for more flexibility. If we allow for tokens outside the PLM's vocabulary, patterns can be adapted more flexibly to a PLM's idiosyncrasies. Contrasting patterns where a "token" can be any continuous vector vs. those where a discrete choice between vocabulary elements has to be made, we call our method CONtinuous pAtterNs (CONAN). We evaluate CONAN on two established benchmarks for lexical inference in context (LIiC) a.k.a. predicate entailment, a challenging natural language understanding task with relatively small training sets. In a direct comparison with discrete patterns, CONAN consistently leads to improved performance, setting a new state of the art. Our experiments give valuable insights into the kind of pattern that enhances a PLM's performance on LIiC and raise important questions regarding our understanding of PLMs using text patterns.
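
“连续模式”的关键在于提示中的“token”可以是任意可训练的连续向量。下面用 PyTorch 给出一个与具体 PLM 解耦的最小示意(嵌入维度、提示长度与拼接方式均为假设,并非论文实现):

```python
import torch
import torch.nn as nn

class ContinuousPattern(nn.Module):
    """把 n_prompt 个可训练连续向量拼接到输入词嵌入前面作为"连续提示";
    PLM 主体参数可以冻结,只训练这些向量。"""
    def __init__(self, n_prompt=4, hidden=768):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_prompt, hidden) * 0.02)

    def forward(self, token_embeds):              # token_embeds: (B, L, H)
        b = token_embeds.size(0)
        p = self.prompt.unsqueeze(0).expand(b, -1, -1)
        return torch.cat([p, token_embeds], dim=1)  # (B, n_prompt + L, H)

# 用随机嵌入演示形状变化
x = torch.randn(2, 10, 768)
print(ContinuousPattern()(x).shape)   # torch.Size([2, 14, 768])
```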

【9】 On Event-Driven Knowledge Graph Completion in Digital Factories 标题:数字工厂中事件驱动的知识图补全研究 链接:https://arxiv.org/abs/2109.03655

作者:Martin Ringsquandl,Evgeny Kharlamov,Daria Stepanova,Steffen Lamparter,Raffaello Lepratti,Ian Horrocks,Peer Kröger 机构:Ludwig-Maximilians Universit¨at, Munich, Germany, Oxford University, Oxford, United Kingdom, Max-Planck-Institut f¨ur Informatik, Saarbr¨ucken, Germany, Siemens AG CT, Siemens PLM Software, Genoa, Italy, Peer Kr¨oger 备注:None 摘要:智能工厂配备了能够感知其制造环境、相互作用和控制生产过程的机器。此类工厂的顺利运行要求进行监测和诊断的机器和工程人员共享有关工厂的详细的通用工业知识,例如以知识图的形式。创建和维护此类知识的成本很高,需要自动化。在这项工作中,我们展示了专门针对工业应用的机器学习如何帮助完成知识图。特别是,我们展示了知识完成如何从智能工厂中常见的事件日志中获益。我们从一个受现实世界启发的智能工厂的知识图上对此进行评估,结果令人鼓舞。 摘要:Smart factories are equipped with machines that can sense their manufacturing environments, interact with each other, and control production processes. Smooth operation of such factories requires that the machines and engineering personnel that conduct their monitoring and diagnostics share a detailed common industrial knowledge about the factory, e.g., in the form of knowledge graphs. Creation and maintenance of such knowledge is expensive and requires automation. In this work we show how machine learning that is specifically tailored towards industrial applications can help in knowledge graph completion. In particular, we show how knowledge completion can benefit from event logs that are common in smart factories. We evaluate this on the knowledge graph from a real world-inspired smart factory with encouraging results.

【10】 Tactile Image-to-Image Disentanglement of Contact Geometry from Motion-Induced Shear 标题:运动诱导剪切下接触几何的触觉图像到图像解缠 链接:https://arxiv.org/abs/2109.03615

作者:Anupam K. Gupta,Laurence Aitchison,Nathan F. Lepora 机构:Department of Engineering Maths and Bristol Robotics Laboratory, University of Bristol United Kingdom, Department of Computer Science 备注:15 pages, 6 figure, under review CORL 2021 摘要:机器人触摸,特别是使用软光学触觉传感器时,会因运动相关剪切而产生变形。传感器接触刺激物的方式与关于刺激物几何形状的触觉信息纠缠在一起。在这项工作中,我们提出了一个有监督的卷积深度神经网络模型,该模型学习在潜在空间中分离由接触几何引起的传感器变形分量和由滑动诱发剪切引起的传感器变形分量。该方法通过从剪切图像重建未剪切触觉图像并显示它们与无滑动运动采集的未剪切触觉图像匹配来验证。此外,未剪切触觉图像提供了从剪切数据不可能实现的接触几何体的忠实重建,以及可用于伺服控制围绕各种2D形状滑动的接触姿态的稳健估计。最后,将接触几何重建与伺服控制滑动相结合,实现了各种二维形状的真实全对象重建。该方法对具有剪切敏感触觉的机器人深度学习模型具有广泛的适用性。 摘要:Robotic touch, particularly when using soft optical tactile sensors, suffers from distortion caused by motion-dependent shear. The manner in which the sensor contacts a stimulus is entangled with the tactile information about the geometry of the stimulus. In this work, we propose a supervised convolutional deep neural network model that learns to disentangle, in the latent space, the components of sensor deformations caused by contact geometry from those due to sliding-induced shear. The approach is validated by reconstructing unsheared tactile images from sheared images and showing they match unsheared tactile images collected with no sliding motion. In addition, the unsheared tactile images give a faithful reconstruction of the contact geometry that is not possible from the sheared data, and robust estimation of the contact pose that can be used for servo control sliding around various 2D shapes. Finally, the contact geometry reconstruction in conjunction with servo control sliding were used for faithful full object reconstruction of various 2D shapes. The methods have broad applicability to deep learning models for robots with a shear-sensitive sense of touch.

【11】 LiDARTouch: Monocular metric depth estimation with a few-beam LiDAR 标题:LiDARTouch:用小波束激光雷达进行单目测量深度估计 链接:https://arxiv.org/abs/2109.03569

作者:Florent Bartoccioni,Éloi Zablocki,Patrick Pérez,Matthieu Cord,Karteek Alahari 机构:Valeo.ai, rue de Courcelle, Paris, France, Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, Grenoble, France. 备注:Preprint. Under review 摘要:基于视觉的深度估计是自治系统的一个关键特性,它通常依赖于单个摄像机或多个独立的摄像机。在这种单目设置中,密集深度通过一个或多个昂贵的激光雷达(例如64光束)的额外输入获得,或仅通过相机方法获得,这些方法存在尺度模糊和无限深度问题。在本文中,我们提出了一种新的方案,通过将单目相机与轻型激光雷达(例如,具有4束光束)相结合来密集估计米制深度,这是当今汽车级大规模生产的激光扫描仪的典型特征。受最近自我监督方法的启发,我们引入了一种称为LiDARTouch的新框架,利用激光雷达的“触感”从单目图像估计密集深度图,即不需要密集地面真实深度。在我们的设置中,最小激光雷达输入在三个不同的层次上起作用:作为附加模型的输入,在自监督激光雷达重建目标函数中,以及估计姿态变化(自监督深度估计体系结构的关键组成部分)。我们的LiDARTouch框架在KITTI数据集的自监督深度估计方面达到了最新水平,从而支持我们将非常稀疏的LiDAR信号与其他视觉特征相结合的选择。此外,我们还表明,使用几束激光雷达可以缓解仅相机方法所面临的尺度模糊和无限深度问题。我们还证明,完全监督深度完井文献中的方法可以适应具有最小激光雷达信号的自我监督机制。 摘要:Vision-based depth estimation is a key feature in autonomous systems, which often relies on a single camera or several independent ones. In such a monocular setup, dense depth is obtained with either additional input from one or several expensive LiDARs, e.g., with 64 beams, or camera-only methods, which suffer from scale-ambiguity and infinite-depth problems. In this paper, we propose a new alternative of densely estimating metric depth by combining a monocular camera with a light-weight LiDAR, e.g., with 4 beams, typical of today's automotive-grade mass-produced laser scanners. Inspired by recent self-supervised methods, we introduce a novel framework, called LiDARTouch, to estimate dense depth maps from monocular images with the help of ``touches'' of LiDAR, i.e., without the need for dense ground-truth depth. In our setup, the minimal LiDAR input contributes on three different levels: as an additional model's input, in a self-supervised LiDAR reconstruction objective function, and to estimate changes of pose (a key component of self-supervised depth estimation architectures). Our LiDARTouch framework achieves new state of the art in self-supervised depth estimation on the KITTI dataset, thus supporting our choices of integrating the very sparse LiDAR signal with other visual features. Moreover, we show that the use of a few-beam LiDAR alleviates scale ambiguity and infinite-depth issues that camera-only methods suffer from. We also demonstrate that methods from the fully-supervised depth-completion literature can be adapted to a self-supervised regime with a minimal LiDAR signal.

【12】 NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task--Next Sentence Prediction 标题:NSP-BERT:一种通过原始预训练任务--下一句预测--的基于提示的零样本学习者 链接:https://arxiv.org/abs/2109.03564

作者:Yi Sun,Yu Zheng,Chao Hao,Hangping Qiu 机构:Army Engineering University of PLA 备注:11 pages, 9 figures 摘要:使用提示来利用语言模型执行各种下游任务,也称为基于提示的学习或提示学习,与训练前和微调范式相比,最近取得了显著的成功。尽管如此,几乎所有基于提示的方法都是标记级的,这意味着它们都使用GPT的从左到右语言模型或BERT的蒙面语言模型来执行完形填空风格的任务。在本文中,我们尝试使用RoBERTa和其他模型放弃的BERT原始预训练任务——下一句预测(NSP),在Zero-Shot场景中完成几个NLP任务。与令牌级技术不同,我们的基于句子级提示的方法NSP-BERT不需要固定提示的长度或要预测的位置,允许它轻松处理实体链接等任务。基于NSP-BERT的特点,我们为各种下游任务提供了几种快速构建模板。我们特别提出了一种两阶段提示的词义消歧方法。我们映射标签的策略显著提高了模型在句子对任务上的性能。在FewCLUE基准上,我们的NSP-BERT在大多数任务上都优于其他零样本方法,并且接近少样本方法。 摘要:Using prompts to utilize language models to perform various downstream tasks, also known as prompt-based learning or prompt-learning, has lately gained significant success in comparison to the pre-train and fine-tune paradigm. Nonetheless, virtually all prompt-based methods are token-level, meaning they all utilize GPT's left-to-right language model or BERT's masked language model to perform cloze-style tasks. In this paper, we attempt to accomplish several NLP tasks in the zero-shot scenario using a BERT original pre-training task abandoned by RoBERTa and other models--Next Sentence Prediction (NSP). Unlike token-level techniques, our sentence-level prompt-based method NSP-BERT does not need to fix the length of the prompt or the position to be predicted, allowing it to handle tasks such as entity linking with ease. Based on the characteristics of NSP-BERT, we offer several quick building templates for various downstream tasks. We suggest a two-stage prompt method for word sense disambiguation tasks in particular. Our strategies for mapping the labels significantly enhance the model's performance on sentence pair tasks. On the FewCLUE benchmark, our NSP-BERT outperforms other zero-shot methods on most of these tasks and comes close to the few-shot methods.
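
NSP-BERT 的基本用法可以理解为:把待分类文本作为句子 A,把每个候选标签对应的提示句作为句子 B,用 NSP 头给出的 IsNext 概率打分。下面用 HuggingFace transformers 给出一个零样本分类的示意,其中模型名 bert-base-chinese 与提示句均为假设示例,并非论文的完整模板设计。

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

name = "bert-base-chinese"          # 仅作示例,论文中的模型选择与提示构造更细致
tok = BertTokenizer.from_pretrained(name)
model = BertForNextSentencePrediction.from_pretrained(name).eval()

def nsp_zero_shot(text, label_prompts):
    """对每个候选标签的提示句计算 NSP 的 IsNext 概率,返回得分最高的标签。"""
    scores = {}
    for label, prompt in label_prompts.items():
        enc = tok(text, prompt, return_tensors="pt")
        with torch.no_grad():
            logits = model(**enc).logits          # 形状 (1, 2):索引 0 = IsNext
        scores[label] = torch.softmax(logits, dim=-1)[0, 0].item()
    return max(scores, key=scores.get), scores

label_prompts = {"体育": "这是一条体育新闻。", "财经": "这是一条财经新闻。"}
print(nsp_zero_shot("球队在昨晚的比赛中逆转取胜。", label_prompts))
```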

【13】 Do What Nature Did To Us: Evolving Plastic Recurrent Neural Networks For Task Generalization 标题:做大自然对我们做的事:进化可塑循环神经网络用于任务泛化 链接:https://arxiv.org/abs/2109.03554

作者:Fan Wang,Hao Tian,Haoyi Xiong,Hua Wu,Yang Cao,Yu Kang,Haifeng Wang 机构:† Baidu Inc., ‡ Department of Automation, University of Science and Technology of China 摘要:虽然人工神经网络(ANN)已被广泛应用于机器学习,但研究人员越来越关注ANN和生物神经网络(BNN)之间的差距。在本文中,我们提出了一个称为进化可塑循环神经网络(EPRNN)的框架。受BNN的启发,EPRNN将进化策略、可塑性规则和基于递归的学习组合在一个元学习框架中,用于推广到不同的任务。更具体地说,EPRNN结合嵌套循环进行元学习——外循环搜索神经网络的最佳初始参数和学习规则;内部循环适应特定的任务。在EPRNN的内环中,我们通过使用递归学习机制锻造可塑性,有效地获得了长期记忆和短期记忆,这两种机制被认为是BNN中记忆性的原因。内部循环设置与BNN非常相似,BNN既不从任何梯度oracle查询优化,也不需要学习目标的精确形式。为了评估EPRNN的性能,我们在两组任务中进行了广泛的实验:序列预测和轮式机器人导航。实验结果表明,与基于可塑性和递归的最新技术相比,EPRNN具有独特的优势,同时在任务中与基于深度学习的方法相比具有相当好的性能。实验结果表明,EPRNN有潜力推广到各种任务,并鼓励在可塑性和基于递归的学习机制方面做出更多努力。 摘要:While artificial neural networks (ANNs) have been widely adopted in machine learning, researchers are increasingly obsessed by the gaps between ANNs and biological neural networks (BNNs). In this paper, we propose a framework named as Evolutionary Plastic Recurrent Neural Networks (EPRNN). Inspired by BNN, EPRNN composes Evolution Strategies, Plasticity Rules, and Recursion-based Learning all in one meta learning framework for generalization to different tasks. More specifically, EPRNN incorporates with nested loops for meta learning -- an outer loop searches for optimal initial parameters of the neural network and learning rules; an inner loop adapts to specific tasks. In the inner loop of EPRNN, we effectively attain both long term memory and short term memory by forging plasticity with recursion-based learning mechanisms, both of which are believed to be responsible for memristance in BNNs. The inner-loop setting closely simulate that of BNNs, which neither query from any gradient oracle for optimization nor require the exact forms of learning objectives. To evaluate the performance of EPRNN, we carry out extensive experiments in two groups of tasks: Sequence Predicting, and Wheeled Robot Navigating. The experiment results demonstrate the unique advantage of EPRNN compared to state-of-the-arts based on plasticity and recursion while yielding comparably good performance against deep learning based approaches in the tasks. The experiment results suggest the potential of EPRNN to generalize to variety of tasks and encourage more efforts in plasticity and recursion based learning mechanisms.

【14】 Cross-lingual Offensive Language Identification for Low Resource Languages: The Case of Marathi 标题:低资源语言的跨语言攻击性语言识别--以马拉提语为例 链接:https://arxiv.org/abs/2109.03552

作者:Saurabh Gaikwad,Tharindu Ranasinghe,Marcos Zampieri,Christopher M. Homan 机构:Rochester Institute of Technology, USA, University of Wolverhampton, UK 备注:Accepted to RANLP 2021 摘要:攻击性语言在社交媒体上的广泛存在推动了能够自动识别此类内容的系统的发展。除了几个显著的例外,大多数关于自动攻击性语言识别的研究都涉及英语。为了解决这个缺点,我们引入了MOLD,即马拉地攻击性语言数据集。MOLD是第一个为马拉地语编译的同类数据集,从而为低资源印度-雅利安语的研究开辟了一个新领域。我们展示了在该数据集上进行的多组机器学习实验的结果,包括零样本(zero-shot)实验,以及利用孟加拉语、英语和印地语现有数据、基于最先进跨语言Transformer模型的其他迁移学习实验。 摘要:The widespread presence of offensive language on social media motivated the development of systems capable of recognizing such content automatically. Apart from a few notable exceptions, most research on automatic offensive language identification has dealt with English. To address this shortcoming, we introduce MOLD, the Marathi Offensive Language Dataset. MOLD is the first dataset of its kind compiled for Marathi, thus opening a new domain for research in low-resource Indo-Aryan languages. We present results from several machine learning experiments on this dataset, including zero-shot and other transfer learning experiments on state-of-the-art cross-lingual transformers from existing data in Bengali, English, and Hindi.

【15】 A Survey of Deep Reinforcement Learning in Recommender Systems: A Systematic Review and Future Directions 标题:深度强化学习在推荐系统中的研究综述:系统回顾与未来发展方向 链接:https://arxiv.org/abs/2109.03540

作者:Xiaocong Chen,Lina Yao,Julian McAuley,Guangling Zhou,Xianzhi Wang 机构:GUANGLIN ZHOU, University of New South Wales, Australia 摘要:鉴于深度强化学习(DRL)在推荐系统研究中的出现以及近年来取得的一些成果,本次调查旨在及时全面地概述推荐系统中深度强化学习的最新趋势。我们从在推荐系统中应用DRL的动机开始。然后,我们提供了当前基于DRL的推荐系统的分类和现有方法的总结。我们讨论新出现的主题和开放性问题,并提供我们对推进该领域的观点。这项调查为来自学术界和工业界的读者提供了关于该主题的介绍材料,并确定了进一步研究的显著机会。 摘要:In light of the emergence of deep reinforcement learning (DRL) in recommender systems research and several fruitful results in recent years, this survey aims to provide a timely and comprehensive overview of the recent trends of deep reinforcement learning in recommender systems. We start with the motivation of applying DRL in recommender systems. Then, we provide a taxonomy of current DRL-based recommender systems and a summary of existing methods. We discuss emerging topics and open issues, and provide our perspective on advancing the domain. This survey serves as introductory material for readers from academia and industry into the topic and identifies notable opportunities for further research.

【16】 How do I update my model? On the resilience of Predictive Process Monitoring models to change 标题:如何更新我的模型?预测性过程监控模型应对变化的弹性研究 链接:https://arxiv.org/abs/2109.03501

作者:Williams Rizzi1,Chiara Di Francescomarino,Chiara Ghidini,Fabrizio Maria Maggi 机构: Fondazione Bruno Kessler (FBK), Trento, Italy, Free University of Bozen-Bolzano, Bolzano, Italy 摘要:现有的经过充分研究的预测性流程监控技术通常基于过去的流程执行情况构建预测模型,然后使用该模型预测新的正在进行的案例的未来,而不可能在新案例完成执行后使用新案例更新该模型。这可能会使预测过程监控过于僵化,无法处理在实际环境中工作的过程的可变性,这些环境随着时间的推移不断演变和/或表现出新的变化行为。作为这个问题的解决方案,我们评估了三种不同策略的使用,它们允许预测模型的周期性重新发现或增量构建,以便利用新的可用数据。评估侧重于新学习的预测模型相对于原始模型在准确性和时间方面的性能,并使用了大量真实和合成数据集,有无明确的概念漂移。结果证明了增量学习算法在实际环境中预测过程监控的潜力。 摘要:Existing well investigated Predictive Process Monitoring techniques typically construct a predictive model based on past process executions, and then use it to predict the future of new ongoing cases, without the possibility of updating it with new cases when they complete their execution. This can make Predictive Process Monitoring too rigid to deal with the variability of processes working in real environments that continuously evolve and/or exhibit new variant behaviours over time. As a solution to this problem, we evaluate the use of three different strategies that allow the periodic rediscovery or incremental construction of the predictive model so as to exploit new available data. The evaluation focuses on the performance of the new learned predictive models, in terms of accuracy and time, against the original one, and uses a number of real and synthetic datasets with and without explicit Concept Drift. The results provide an evidence of the potential of incremental learning algorithms for predicting process monitoring in real environments.

【17】 A Deep Reinforcement Learning Approach for Constrained Online Logistics Route Assignment 标题:约束在线物流路径分配的深度强化学习方法 链接:https://arxiv.org/abs/2109.03467

作者:Hao Zeng,Yangdong Liu,Dandan Zhang,Kunpeng Han,Haoyuan Hu 机构:Cainiao Network, Wenyi West Road, Xixi Block B, Hangzhou, China 备注:8 pages, 7 figures 摘要:随着网上购物的盛行和电子商务平台的出现,每天都有大量包裹被运送。因此,对于物流行业来说,如何为每个运输包裹正确分配候选物流路线至关重要,因为这会对总物流成本优化和业务约束满意度(如运输枢纽容量和交付供应商的交付比例)产生重大影响。该在线路径分配问题可以看作是一个有约束的在线决策问题。值得注意的是,每天大量(超过${10^5}$)的包裹,包裹信息的可变性和非马尔可夫特征,在不过度违反约束的情况下,难以获得(接近)最优解。在本文中,我们开发了一种称为PPO-RA的无模型DRL方法,其中使用专用技术改进了近端策略优化(PPO),以应对路由分配(RA)的挑战。参与者和评论家网络使用注意机制和参数共享来适应每个具有不同数量和身份的候选路线的传入包裹,而无需建模非马尔可夫包裹到达动力学,因为我们假设了i.i.d.包裹到达。我们通过模拟将PPO-RA与广泛使用的基线进行比较,使用记录的交付包裹数据来评估PPO-RA的性能。结果表明,该方法能够在满足大多数约束条件的同时实现可观的成本节约。 摘要:As online shopping prevails and e-commerce platforms emerge, there is a tremendous number of parcels being transported every day. Thus, it is crucial for the logistics industry on how to assign a candidate logistics route for each shipping parcel properly as it leaves a significant impact on the total logistics cost optimization and business constraints satisfaction such as transit hub capacity and delivery proportion of delivery providers. This online route-assignment problem can be viewed as a constrained online decision-making problem. Notably, the large amount (beyond ${10^5}$) of daily parcels, the variability and non-Markovian characteristics of parcel information impose difficulties on attaining (near-) optimal solution without violating constraints excessively. In this paper, we develop a model-free DRL approach named PPO-RA, in which Proximal Policy Optimization (PPO) is improved with dedicated techniques to address the challenges for route assignment (RA). The actor and critic networks use attention mechanism and parameter sharing to accommodate each incoming parcel with varying numbers and identities of candidate routes, without modeling non-Markovian parcel arriving dynamics since we make assumption of i.i.d. parcel arrival. We use recorded delivery parcel data to evaluate the performance of PPO-RA by comparing it with widely-used baselines via simulation. The results show the capability of the proposed approach to achieve considerable cost savings while satisfying most constraints.

【18】 ArchivalQA: A Large-scale Benchmark Dataset for Open Domain Question Answering over Archival News Collections 标题:ArchivalQA:档案新闻集中开放领域问答的大规模基准数据集 链接:https://arxiv.org/abs/2109.03438

作者:Jiexin Wang,Adam Jatowt,Masatoshi Yoshikawa 机构:Kyoto University, Japan, University of Innsbruck, Austria 摘要:在过去几年中,由于深度学习技术的发展和大规模问答数据集的可用性,开放领域问答(ODQA)得到了迅速发展。然而,当前的数据集基本上是为同步文档收集而设计的(例如,维基百科)。时态新闻收藏,如跨越几十年的长期新闻档案,很少用于训练模型,尽管它们对我们的社会非常有价值。为了促进ODQA领域对此类历史收藏的研究,我们介绍了ArchivalQA,这是一个大型问答数据集,由1067056对问答组成,专为时态新闻问答而设计。此外,我们根据问题难度和时态表达式的包含情况创建了数据集的四个子部分,我们认为这对于训练或测试具有不同优势和能力的ODQA系统是有用的。我们介绍的新的QA数据集构建框架也可以应用于创建其他类型集合上的数据集。 摘要:In the last few years, open-domain question answering (ODQA) has advanced rapidly due to the development of deep learning techniques and the availability of large-scale QA datasets. However, the current datasets are essentially designed for synchronic document collections (e.g., Wikipedia). Temporal news collections such as long-term news archives spanning several decades, are rarely used in training the models despite they are quite valuable for our society. In order to foster the research in the field of ODQA on such historical collections, we present ArchivalQA, a large question answering dataset consisting of 1,067,056 question-answer pairs which is designed for temporal news QA. In addition, we create four subparts of our dataset based on the question difficulty levels and the containment of temporal expressions, which we believe could be useful for training or testing ODQA systems characterized by different strengths and abilities. The novel QA dataset-constructing framework that we introduce can be also applied to create datasets over other types of collections.

【19】 Fixed Support Tree-Sliced Wasserstein Barycenter 标题:固定支撑树切片Wasserstein重心 链接:https://arxiv.org/abs/2109.03431

作者:Yuki Takezawa,Ryoma Sato,Zornitsa Kozareva,Sujith Ravi,Makoto Yamada 机构:Kyoto University, RIKEN AIP, Facebook AI Research, SliceX AI 摘要:Wasserstein重心在自然语言处理和计算机视觉等领域得到了广泛的研究。然而,解决Wasserstein重心问题需要较高的计算成本,因为计算Wasserstein距离需要与支撑数量相关的二次时间。相比之下,树上的Wasserstein距离(称为树Wasserstein距离)可以在线性时间内计算,并允许快速比较大量分布。在这项研究中,我们提出了树瓦瑟斯坦距离下的重心,称为固定支撑树瓦瑟斯坦重心(FS-TWB)及其扩展,称为固定支撑树切片瓦瑟斯坦重心(FS-TSWB)。更具体地说,我们首先证明了FS-TWB和FS-TSWB问题是凸优化问题,可以使用投影次梯度下降法求解。此外,我们还利用树的Wasserstein重心问题的性质,提出了一种更有效的计算次梯度和目标函数值的算法。通过真实世界的实验,我们表明,使用所提出的算法,FS-TWB和FS-TSWB的求解速度比原始Wasserstein重心快两个数量级。 摘要:The Wasserstein barycenter has been widely studied in various fields, including natural language processing, and computer vision. However, it requires a high computational cost to solve the Wasserstein barycenter problem because the computation of the Wasserstein distance requires a quadratic time with respect to the number of supports. By contrast, the Wasserstein distance on a tree, called the tree-Wasserstein distance, can be computed in linear time and allows for the fast comparison of a large number of distributions. In this study, we propose a barycenter under the tree-Wasserstein distance, called the fixed support tree-Wasserstein barycenter (FS-TWB) and its extension, called the fixed support tree-sliced Wasserstein barycenter (FS-TSWB). More specifically, we first show that the FS-TWB and FS-TSWB problems are convex optimization problems and can be solved by using the projected subgradient descent. Moreover, we propose a more efficient algorithm to compute the subgradient and objective function value by using the properties of tree-Wasserstein barycenter problems. Through real-world experiments, we show that, by using the proposed algorithm, the FS-TWB and FS-TSWB can be solved two orders of magnitude faster than the original Wasserstein barycenter.
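
摘要指出 FS-TWB/FS-TSWB 是凸问题,可用投影次梯度下降求解。下面给出一个 numpy 极简示意:用"边-叶子"关联矩阵 B 和边权 w 把树-Wasserstein 距离写成 L1 形式,再对重心目标做次梯度更新并投影回概率单纯形;树结构与分布均为随机假设,且未包含论文提出的高效次梯度计算。

```python
import numpy as np

def project_simplex(v):
    """把向量投影到概率单纯形(Duchi 等人的标准算法)。"""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / (np.arange(v.size) + 1) > 0)[0][-1]
    return np.maximum(v + (1 - css[rho]) / (rho + 1), 0)

def tree_wasserstein(a, b, B, w):
    """d_T(a,b) = sum_e w_e * |边 e 下方叶子的质量差|,B 为边-叶子关联矩阵。"""
    return np.sum(w * np.abs(B @ (a - b)))

def fs_twb(dists, B, w, steps=500, lr=0.05):
    """固定支撑树-Wasserstein 重心的投影次梯度下降示意。"""
    a = np.full(B.shape[1], 1.0 / B.shape[1])
    for t in range(1, steps + 1):
        g = np.mean([B.T @ (w * np.sign(B @ (a - b))) for b in dists], axis=0)
        a = project_simplex(a - lr / np.sqrt(t) * g)
    return a

# 随机假设的树(4 个叶子、6 条边)与三个待平均的分布
rng = np.random.default_rng(0)
B = rng.integers(0, 2, size=(6, 4)).astype(float)
w = rng.uniform(0.1, 1.0, size=6)
dists = rng.dirichlet(np.ones(4), size=3)
bary = fs_twb(dists, B, w)
print(bary, sum(tree_wasserstein(bary, b, B, w) for b in dists))
```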

【20】 It is AI's Turn to Ask Human a Question: Question and Answer Pair Generation for Children Storybooks in FairytaleQA Dataset 标题:轮到人工智能向人类提问了:FairytaleQA数据集中儿童故事书的问答对生成 链接:https://arxiv.org/abs/2109.03423

作者:Bingsheng Yao,Dakuo Wang,Tongshuang Wu,Tran Hoang,Branda Sun,Toby Jia-Jun Li,Mo Yu,Ying Xu 机构:Rensselaer Polytechnic Institute, IBM Research, University of Washington, University of California Irvine, University of Notre Dame 摘要:现有问答(QA)数据集的创建主要是为了让AI能够回答人类提出的问题。但在教育应用中,教师和家长有时可能不知道应该问孩子什么问题才能最大限度地提高他们的语言学习效果。教育专家在46本童话故事书中为幼儿读者标记了一个新发布的图书QA数据集(FairytaleQA),我们为这个新应用开发了一个自动化QA生成模型体系结构。我们的模型(1)通过基于教学框架的精心设计的启发式,从给定的故事书段落中提取候选答案;(2) 使用语言模型生成对应于每个提取答案的适当问题;并且,(3)使用另一个QA模型对顶级QA对进行排名。自动和人工评估表明,我们的模型优于基线。我们还证明,我们的方法可以通过对200本未标记的故事书进行数据扩充,帮助解决儿童图书QA数据集的稀缺性问题。 摘要:Existing question answering (QA) datasets are created mainly for the application of having AI to be able to answer questions asked by humans. But in educational applications, teachers and parents sometimes may not know what questions they should ask a child that can maximize their language learning results. With a newly released book QA dataset (FairytaleQA), which educational experts labeled on 46 fairytale storybooks for early childhood readers, we developed an automated QA generation model architecture for this novel application. Our model (1) extracts candidate answers from a given storybook passage through carefully designed heuristics based on a pedagogical framework; (2) generates appropriate questions corresponding to each extracted answer using a language model; and, (3) uses another QA model to rank top QA-pairs. Automatic and human evaluations show that our model outperforms baselines. We also demonstrate that our method can help with the scarcity issue of the children's book QA dataset via data augmentation on 200 unlabeled storybooks.

【21】 Visual Sensation and Perception Computational Models for Deep Learning: State of the art, Challenges and Prospects 标题:深度学习中的视觉和知觉计算模型:现状、挑战和展望 链接:https://arxiv.org/abs/2109.03391

作者:Bing Wei,Yudi Zhao,Kuangrong Hao,Lei Gao 机构:E 摘要:视觉感知是指在环境意识和理解中感知、组织、识别和解释视觉信息的过程。基于视觉感知的计算模型来源于认知科学、信息科学、人工智能等学科,具有复杂性和多样性的特点。本文从生物视觉机制和计算视觉理论出发,系统地研究了面向深度学习的视知觉计算模型。然后,对视觉感知计算模型的发展前景提出了一些看法。最后,本文还总结了当前视觉感知面临的挑战,并预测了未来的发展趋势。通过本次调查,将为这一方向的研究提供全面的参考。 摘要:Visual sensation and perception refers to the process of sensing, organizing, identifying, and interpreting visual information in environmental awareness and understanding. Computational models inspired by visual perception have the characteristics of complexity and diversity, as they come from many subjects such as cognition science, information science, and artificial intelligence. In this paper, visual perception computational models oriented deep learning are investigated from the biological visual mechanism and computational vision theory systematically. Then, some points of view about the prospects of the visual perception computational models are presented. Finally, this paper also summarizes the current challenges of visual perception and predicts its future development trends. Through this survey, it will provide a comprehensive reference for research in this direction.

【22】 RoadAtlas: Intelligent Platform for Automated Road Defect Detection and Asset Management 标题:RoadAtlas:道路缺陷自动检测和资产管理的智能平台 链接:https://arxiv.org/abs/2109.03385

作者:Zhuoxiao Chen,Yiyun Zhang,Yadan Luo,Zijian Wang,Jinjiang Zhong,Anthony Southon 机构:The University of Queensland,Logan City Council, Australia 摘要:随着基于深度学习的智能检测算法的快速发展,道路缺陷自动识别和道路标线解析技术取得了很大的进展。这可以有效地解决专业检查员手工审查街道的过程既昂贵又耗时的问题。为了实现这一目标,我们推出了RoadAtlas,这是一种新型的端到端集成系统,可支持1)道路缺陷检测、2)道路标记解析、3)基于web的仪表板,用于显示和输入用户数据,以及4)包含结构良好的数据库和开发的API的后端。 摘要:With the rapid development of intelligent detection algorithms based on deep learning, much progress has been made in automatic road defect recognition and road marking parsing. This can effectively address the issue of an expensive and time-consuming process for professional inspectors to review the street manually. Towards this goal, we present RoadAtlas, a novel end-to-end integrated system that can support 1) road defect detection, 2) road marking parsing, 3) a web-based dashboard for presenting and inputting data by users, and 4) a backend containing a well-structured database and developed APIs.

【23】 DeepZensols: Deep Natural Language Processing Framework 标题:DeepZensols:深层自然语言处理框架 链接:https://arxiv.org/abs/2109.03383

作者:Paul Landes,Barbara Di Eugenio,Cornelia Caragea 机构:Department of Computer Science, University of Illinois at Chicago 摘要:通过分发公开的源代码在出版物中复制结果变得越来越流行。鉴于再现机器学习(ML)实验的难度,在减少这些结果的方差方面做出了重大努力。与任何科学一样,持续复制结果的能力有效地强化了工作的基本假设,因此,应该被视为与研究本身的新颖方面一样重要。这项工作的贡献是一个能够重现一致结果的框架,并提供了一种轻松创建、训练和评估自然语言处理(NLP)深度学习(DL)模型的方法。 摘要:Reproducing results in publications by distributing publicly available source code is becoming ever more popular. Given the difficulty of reproducing machine learning (ML) experiments, there have been significant efforts in reducing the variance of these results. As in any science, the ability to consistently reproduce results effectively strengthens the underlying hypothesis of the work, and thus, should be regarded as important as the novel aspect of the research itself. The contribution of this work is a framework that is able to reproduce consistent results and provides a means of easily creating, training, and evaluating natural language processing (NLP) deep learning (DL) models.
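
框架自身的 API 这里不展开;下面仅示意"固定随机种子以提高结果可复现性"这一通用做法(属于常见实践,并非 DeepZensols 的具体接口):

```python
import os
import random
import numpy as np
import torch

def set_seed(seed: int = 42):
    """固定常见随机源,使同一配置下的训练结果尽量可复现
    (通用做法示意,并非 DeepZensols 的接口)。"""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)
```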

【24】 Self-supervised Contrastive Cross-Modality Representation Learning for Spoken Question Answering 标题:口语答疑中的自监督对比跨模态表征学习 链接:https://arxiv.org/abs/2109.03381

作者:Chenyu You,Nuo Chen,Yuexian Zou 机构:† Department of Electrical Engineering, Yale University, New Haven, CT, USA, ‡ADSPLAB, School of ECE, Peking University, Shenzhen, China, §Peng Cheng Laboratory, Shenzhen, China 摘要:口语问答(SQA)需要对口语文档和问题进行细粒度的理解,以实现最佳答案预测。在本文中,我们提出了一种新的口语问答训练方案,包括自我监督训练阶段和对比表征学习阶段。在自监督阶段,我们提出了三个辅助的自监督任务,包括话语恢复、话语插入和问题识别,并联合训练模型在不需要任何附加数据或注释的情况下捕获语音文档之间的一致性和连贯性。然后,我们提出在对比目标中通过多种增广策略学习噪声不变的话语表征,包括广度删除和广度替换。此外,我们还设计了一种时间对齐方法,注意在学习到的公共空间中对语音文本线索进行语义对齐,从而有利于SQA任务的完成。通过这种方式,训练方案可以更有效地指导生成模型预测更合适的答案。实验结果表明,我们的模型在三个SQA基准上达到了最先进的结果。 摘要:Spoken question answering (SQA) requires fine-grained understanding of both spoken documents and questions for the optimal answer prediction. In this paper, we propose novel training schemes for spoken question answering with a self-supervised training stage and a contrastive representation learning stage. In the self-supervised stage, we propose three auxiliary self-supervised tasks, including utterance restoration, utterance insertion, and question discrimination, and jointly train the model to capture consistency and coherence among speech documents without any additional data or annotations. We then propose to learn noise-invariant utterance representations in a contrastive objective by adopting multiple augmentation strategies, including span deletion and span substitution. Besides, we design a Temporal-Alignment attention to semantically align the speech-text clues in the learned common space and benefit the SQA tasks. By this means, the training schemes can more effectively guide the generation model to predict more proper answers. Experimental results show that our model achieves state-of-the-art results on three SQA benchmarks.
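
摘要中的"跨度删除""跨度替换"增强在实现上比较直接。下面给出对 token 序列做这两种增强的最小示意(跨度长度上限与替换词表均为假设,并非论文的完整增强方案):

```python
import random

def span_delete(tokens, max_len=3, seed=None):
    """随机删除一段连续 token(跨度删除增强)。"""
    rng = random.Random(seed)
    if len(tokens) < 2:
        return list(tokens)
    length = rng.randint(1, min(max_len, len(tokens) - 1))
    start = rng.randint(0, len(tokens) - length)
    return tokens[:start] + tokens[start + length:]

def span_substitute(tokens, vocab, max_len=3, seed=None):
    """随机把一段连续 token 替换为词表中的随机 token(跨度替换增强)。"""
    rng = random.Random(seed)
    length = rng.randint(1, min(max_len, len(tokens)))
    start = rng.randint(0, len(tokens) - length)
    replacement = [rng.choice(vocab) for _ in range(length)]
    return tokens[:start] + replacement + tokens[start + length:]

toks = "what time does the talk start".split()
print(span_delete(toks, seed=1))
print(span_substitute(toks, vocab=["[MASK]", "when", "lecture"], seed=1))
```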

【25】 Malware Squid: A Novel IoT Malware Traffic Analysis Framework using Convolutional Neural Network and Binary Visualisation 标题:恶意软件Squid:一种基于卷积神经网络和二值可视化的物联网恶意软件流量分析框架 链接:https://arxiv.org/abs/2109.03375

作者:Robert Shire,Stavros Shiaeles,Keltoum Bendiab,Bogdan Ghita,Nicholas Kolokotronis 备注:None 摘要:近年来,物联网设备迅速发展和普及,越来越多的普通设备获得了网络能力,并成为不断增长的物联网网络的一部分。随着这一指数增长和资源的限制,防范恶意软件等安全威胁变得越来越困难,因为其发展速度快于防御机制所能处理的速度。传统的安全系统无法检测未知的恶意软件,因为它们使用基于签名的方法。在本文中,我们旨在通过引入一种新的物联网恶意软件流量分析方法来解决这个问题,该方法使用神经网络和二进制可视化。该方法的主要动机是更快地检测和分类新恶意软件(零日恶意软件)。实验结果表明,该方法能够满足实际应用的精度要求。 摘要:Internet of Things devices have seen a rapid growth and popularity in recent years with many more ordinary devices gaining network capability and becoming part of the ever growing IoT network. With this exponential growth and the limitation of resources, it is becoming increasingly harder to protect against security threats such as malware due to its evolving faster than the defence mechanisms can handle with. The traditional security systems are not able to detect unknown malware as they use signature-based methods. In this paper, we aim to address this issue by introducing a novel IoT malware traffic analysis approach using neural network and binary visualisation. The prime motivation of the proposed approach is to faster detect and classify new malware (zero-day malware). The experiment results show that our method can satisfy the accuracy requirement of practical application.
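
按摘要的思路,可以把流量字节"图像化"后交给 CNN 分类。下面用 PyTorch 给出一个极简示意:把字节串零填充成 32×32 灰度图,并用一个小卷积网络输出两类 logits;图像尺寸、网络结构与示例字节串均为假设,并非论文采用的二值可视化方案。

```python
import numpy as np
import torch
import torch.nn as nn

def bytes_to_image(payload: bytes, size: int = 32) -> torch.Tensor:
    """把字节序列截断/零填充为 size*size 的灰度图,取值归一化到 [0, 1]。"""
    buf = np.frombuffer(payload, dtype=np.uint8)[: size * size]
    buf = np.pad(buf, (0, size * size - buf.size))
    return torch.tensor(buf, dtype=torch.float32).reshape(1, size, size) / 255.0

class TrafficCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(32 * 8 * 8, n_classes),
        )

    def forward(self, x):
        return self.net(x)

img = bytes_to_image(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n")
logits = TrafficCNN()(img.unsqueeze(0))   # 形状 (1, 2),训练后可解释为良性/恶意
print(logits.shape)
```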

【26】 Identifying Influential Nodes in Two-mode Data Networks using Formal Concept Analysis 标题:基于形式概念分析的双模数据网络影响节点识别 链接:https://arxiv.org/abs/2109.03372

作者:Mohamed-Hamza Ibrahim,Rokia Missaoui,Jean Vaillancourt 机构:Universit´e du Qu´ebec en Outaouais, Canada, HEC Montreal, Canada 摘要:识别双模网络中的重要参与者(或节点)通常仍然是挖掘、分析和解释现实世界网络的关键挑战。虽然传统的二部中心度指数通常用于识别影响网络信息流的关键节点,但在复杂情况下,如具有复杂局部结构的大规模网络或缺乏关于网络拓扑和某些属性的完整知识时,它们通常会产生较差的结果。在本文中,我们介绍了一种新的用于识别双模网络中重要节点的二部中心度度量——双面(BF)。BF度量使用形式概念分析的强大数学形式主义,利用概念意图的面来识别具有有影响力的双桥连通性且不位于不相关桥中的节点。与现成的中心性指数不同,它量化了一个节点如何通过BICLUES对其相邻节点产生内聚子结构影响,而不是通过其不存在于非影响桥而处于网络核心外围节点。我们在几个真实和合成网络上的实验表明,BF比现有的显著的二部中心性度量(如介数、贴近度、特征向量和投票排名等)更有效。 摘要:Identifying important actors (or nodes) in a two-mode network often remains a crucial challenge in mining, analyzing, and interpreting real-world networks. While traditional bipartite centrality indices are often used to recognize key nodes that influence the network information flow, they frequently produce poor results in intricate situations such as massive networks with complex local structures or a lack of complete knowledge about the network topology and certain properties. In this paper, we introduce Bi-face (BF), a new bipartite centrality measurement for identifying important nodes in two-mode networks. Using the powerful mathematical formalism of Formal Concept Analysis, the BF measure exploits the faces of concept intents to identify nodes that have influential bicliques connectivity and are not located in irrelevant bridges. Unlike off-the shelf centrality indices, it quantifies how a node has a cohesive-substructure influence on its neighbour nodes via bicliques while not being in network core-peripheral ones through its absence from non-influential bridges. Our experiments on several real-world and synthetic networks show the efficiency of BF over existing prominent bipartite centrality measures such as betweenness, closeness, eigenvector, and vote-rank among others.

【27】 Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation 标题:基于分解代码图表示的深度学习软件漏洞检测 链接:https://arxiv.org/abs/2109.03341

作者:Yufan Zhuang,Sahil Suneja,Veronika Thost,Giacomo Domeniconi,Alessandro Morari,Jim Laredo 机构:IBM Research, Yorktown Heights, NY, USA 备注:Submitted June 2020 摘要:识别易受攻击的代码是防范软件安全漏洞的预防措施。单调乏味的专家工作已经花费在构建静态分析器上,但不安全的模式几乎没有被完全列举出来。这项工作探索了一种从代码语料库中自动学习不安全模式的深度学习方法。由于代码通过解析自然地允许图形结构,因此我们开发了一种新的图形神经网络(GNN)来利用程序的语义上下文和结构规则性,以提高预测性能。与一般GNN相比,我们的增强包括从程序的几个解析图中学习到的多个表示的综合,以及利用标记的精细粒度的新训练损失度量。我们的模型在两个真实数据集上优于多种基于文本、图像和图形的方法。 摘要:Identifying vulnerable code is a precautionary measure to counter software security breaches. Tedious expert effort has been spent to build static analyzers, yet insecure patterns are barely fully enumerated. This work explores a deep learning approach to automatically learn the insecure patterns from code corpora. Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program, in order to improve prediction performance. Compared with a generic GNN, our enhancements include a synthesis of multiple representations learned from the several parsed graphs of a program, and a new training loss metric that leverages the fine granularity of labeling. Our model outperforms multiple text, image and graph-based approaches, across two real-world datasets.

【28】 On the Challenges of Evaluating Compositional Explanations in Multi-Hop Inference: Relevance, Completeness, and Expert Ratings 标题:论多跳推理中评价成分解释的挑战:相关性、完备性和专家评分 链接:https://arxiv.org/abs/2109.03334

作者:Peter Jansen,Kelly Smith,Dan Moreno,Huitzilin Ortiz 机构:University of Arizona, USA 备注:Accepted to EMNLP 2021 摘要:构建成分解释需要模型结合两个或多个事实,这些事实一起描述问题答案的正确性。通常,这些“多跳”解释相对于一个(或少量)黄金解释进行评估。在这项工作中,我们发现这些评估大大低估了模型的性能,无论是在所包含事实的相关性方面,还是在模型生成的解释的完整性方面,因为模型经常发现并生成与黄金解释不同的有效解释。为了解决这个问题,我们构建了一个包含12.6万条领域专家(科学教师)相关性评级的大型语料库,用以补充标准化科学试题的解释语料库,并由此发现了另外8万条未被标注为黄金(gold)的相关事实。我们基于不同的方法(生成、排名和图式)构建了三个强大的模型,并通过实验表明:虽然专家增强的评级能更好地估计解释质量,但与完全人工的专家判断相比,原始(黄金)评估和专家增强的自动评估仍然都大大低估了模型性能,差距最高可达36%,且不同模型受到的影响并不均衡。这对准确评估组合推理模型产生的解释提出了重大的方法学挑战。 摘要:Building compositional explanations requires models to combine two or more facts that, together, describe why the answer to a question is correct. Typically, these "multi-hop" explanations are evaluated relative to one (or a small number of) gold explanations. In this work, we show these evaluations substantially underestimate model performance, both in terms of the relevance of included facts, as well as the completeness of model-generated explanations, because models regularly discover and produce valid explanations that are different than gold explanations. To address this, we construct a large corpus of 126k domain-expert (science teacher) relevance ratings that augment a corpus of explanations to standardized science exam questions, discovering 80k additional relevant facts not rated as gold. We build three strong models based on different methodologies (generation, ranking, and schemas), and empirically show that while expert-augmented ratings provide better estimates of explanation quality, both original (gold) and expert-augmented automatic evaluations still substantially underestimate performance by up to 36% when compared with full manual expert judgements, with different models being disproportionately affected. This poses a significant methodological challenge to accurately evaluating explanations produced by compositional reasoning models.

【29】 CyGIL: A Cyber Gym for Training Autonomous Agents over Emulated Network Systems 标题:CyGIL:一种在仿真网络系统上训练自治Agent的网络健身房 链接:https://arxiv.org/abs/2109.03331

作者:Li Li,Raed Fayad,Adrian Taylor 机构:Defence Research and Development Canada, Dept. of Electrical and Computer Engineering, Queens University, Canada 备注:Presented at 1st International Workshop on Adaptive Cyber Defense, 2021 (arXiv:2108.08476) 摘要:鉴于强化学习(RL)在各个领域的成功,有希望探索其方法在智能自主网络代理开发中的应用。实现这一发展需要有代表性的RL训练环境。为此,这项工作介绍了CyGIL:一个用于网络网络操作的模拟RL训练环境的实验测试台。CyGIL使用无状态环境体系结构,并结合MITRE ATT&CK框架来建立高保真训练环境,同时提供充分抽象的接口以支持RL训练。其全面的行动空间和灵活的游戏设计允许代理训练专注于特定的高级持久性威胁(APT)配置文件,并纳入广泛的潜在威胁和漏洞。通过在保真度和简单性之间取得平衡,它旨在利用最先进的RL算法应用于现实世界的网络防御。 摘要:Given the success of reinforcement learning (RL) in various domains, it is promising to explore the application of its methods to the development of intelligent and autonomous cyber agents. Enabling this development requires a representative RL training environment. To that end, this work presents CyGIL: an experimental testbed of an emulated RL training environment for network cyber operations. CyGIL uses a stateless environment architecture and incorporates the MITRE ATT&CK framework to establish a high fidelity training environment, while presenting a sufficiently abstracted interface to enable RL training. Its comprehensive action space and flexible game design allow the agent training to focus on particular advanced persistent threat (APT) profiles, and to incorporate a broad range of potential threats and vulnerabilities. By striking a balance between fidelity and simplicity, it aims to leverage state of the art RL algorithms for application to real-world cyber defence.

【30】 Predicting Process Name from Network Data 标题:从网络数据预测进程名称 链接:https://arxiv.org/abs/2109.03328

作者:Justin Allen,David Knapp,Kristine Monteith 机构:Lawrence Livermore National Laboratory 备注:Presented at 1st International Workshop on Adaptive Cyber Defense, 2021 (arXiv:2108.08476) 摘要:基于应用程序生成的网络数据识别应用程序的能力可能是网络防御的宝贵工具。我们报告了一种机器学习技术,该技术能够使用类似netflow的特性来预测产生流量的应用程序。在我们的实验中,我们使用了从部署在大型企业环境中的基于主机的传感器获得的地面真相标签;我们将随机森林和多层感知器应用于浏览器与非浏览器识别、浏览器指纹识别和进程名称预测等任务。对于这些任务中的每一项,我们将演示机器学习模型如何仅使用类似netflow的特征作为分类基础来实现高分类精度。 摘要:The ability to identify applications based on the network data they generate could be a valuable tool for cyber defense. We report on a machine learning technique capable of using netflow-like features to predict the application that generated the traffic. In our experiments, we used ground-truth labels obtained from host-based sensors deployed in a large enterprise environment; we applied random forests and multilayer perceptrons to the tasks of browser vs. non-browser identification, browser fingerprinting, and process name prediction. For each of these tasks, we demonstrate how machine learning models can achieve high classification accuracy using only netflow-like features as the basis for classification.
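
下面用 scikit-learn 给出"类 netflow 特征 + 随机森林"做进程/应用分类的流程示意;特征名、进程标签与数据均为随机占位假设,并非论文使用的企业环境数据。

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# 假设的类 netflow 特征(占位):连接时长、双向字节数/包数、目的端口等
rng = np.random.default_rng(7)
n = 2000
X = pd.DataFrame({
    "duration":  rng.exponential(1.0, n),
    "bytes_out": rng.integers(0, 1_000_000, n),
    "bytes_in":  rng.integers(0, 1_000_000, n),
    "pkts_out":  rng.integers(1, 1000, n),
    "pkts_in":   rng.integers(1, 1000, n),
    "dst_port":  rng.choice([80, 443, 53, 22, 8080], n),
})
y = rng.choice(["chrome.exe", "svchost.exe", "ssh"], n)   # 进程名标签(占位)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())   # 随机标签下约 1/3,仅演示流程
```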

【31】 Effective and interpretable dispatching rules for dynamic job shops via guided empirical learning 标题:基于引导式经验学习的动态作业车间有效可解释调度规则 链接:https://arxiv.org/abs/2109.03323

作者:Cristiane Ferreira,Gonçalo Figueira,Pedro Amorim 机构:INESC TEC, Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, sn,-, Porto, Portugal 摘要:工业4.0的出现使生产系统更加灵活,也更加动态。在这些环境中,通常需要依靠调度规则对排程进行实时调整。尽管到上世纪90年代为止已取得实质性进展,这些规则的性能至今仍相当有限。机器学习文献正在开发各种方法来改进它们,但是产生的规则很难解释,并且不能很好地推广到广泛的环境中。本文是将机器学习与领域问题推理相结合用于调度的首次重大尝试。这一想法包括利用后者获得的见解来指导前者的实证研究。我们的假设是,这种有指导的经验学习过程应该产生有效且可解释的调度规则,并且可以很好地推广到不同的实例类。我们在以最小化拖期为目标的经典动态作业车间(job shop)调度问题上测试了我们的方法,该问题是研究最为深入的调度问题之一。结果表明,我们的方法能够找到新的最先进的规则,在绝大多数情况下,从宽松交货期到紧张交货期、从低利用率到拥挤车间,这些规则的表现明显优于现有文献。总体而言,平均改善率为19%。此外,这些规则紧凑、可解释,并能很好地推广到极端的、看不见的场景。 摘要:The emergence of Industry 4.0 is making production systems more flexible and also more dynamic. In these settings, schedules often need to be adapted in real-time by dispatching rules. Although substantial progress was made until the '90s, the performance of these rules is still rather limited. The machine learning literature is developing a variety of methods to improve them, but the resulting rules are difficult to interpret and do not generalise well for a wide range of settings. This paper is the first major attempt at combining machine learning with domain problem reasoning for scheduling. The idea consists of using the insights obtained with the latter to guide the empirical search of the former. Our hypothesis is that this guided empirical learning process should result in dispatching rules that are effective and interpretable and which generalise well to different instance classes. We test our approach in the classical dynamic job shop scheduling problem minimising tardiness, which is one of the most well-studied scheduling problems. Nonetheless, results suggest that our approach was able to find new state-of-the-art rules, which significantly outperform the existing literature in the vast majority of settings, from loose to tight due dates and from low utilisation conditions to congested shops. Overall, the average improvement is 19%. Moreover, the rules are compact, interpretable, and generalise well to extreme, unseen scenarios.
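
调度规则本质上是对队列中每个作业打分的优先级函数。下面给出经典 ATC(Apparent Tardiness Cost)规则的一个示意实现,用来说明这类函数的一般形态;它并不是论文通过引导式经验学习发现的新规则,参数 k 与平均加工时间也只是假设取值。

```python
import math

def atc_priority(job, t, k=2.0, p_bar=10.0):
    """经典 ATC 优先指数示意:p = 加工时间, d = 交货期, t = 当前时刻;
    指数越大越先加工。论文搜索得到的新规则即是这一类优先级函数的改进版本。"""
    p, d = job["p"], job["d"]
    slack = max(d - p - t, 0.0)
    return (1.0 / p) * math.exp(-slack / (k * p_bar))

queue = [{"id": "J1", "p": 5, "d": 30}, {"id": "J2", "p": 12, "d": 25},
         {"id": "J3", "p": 8, "d": 60}]
t = 10.0
next_job = max(queue, key=lambda j: atc_priority(j, t))
print(next_job["id"])   # 在该假设队列与时刻下输出 J1
```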

【32】 Melatect: A Machine Learning Model Approach For Identifying Malignant Melanoma in Skin Growths 标题:Melatect:一种识别皮肤生长过程中恶性黑色素瘤的机器学习模型方法 链接:https://arxiv.org/abs/2109.03310

作者:Vidushi Meel,Asritha Bodepudi 机构:Lexington High School, Lexington, MA 备注:11 Pages, Preprint 摘要:恶性黑色素瘤是一种常见的皮肤癌,在转移前通常可以治愈,黑色素瘤生长在远离原发部位的器官中。黑色素瘤是最危险的皮肤癌类型,如果不治疗,转移的可能性很高。本文介绍了Melatect,一种识别潜在恶性黑色素瘤的机器学习模型。一个递归的计算机图像分析算法被用来创建一个机器学习模型,能够检测可能的黑色素瘤。该比较是使用国际皮肤成像合作(ISIC)档案中的20000张良性和恶性病变原始图像进行的,这些图像增加到60000张。使用ISIC图像子集对该算法进行的测试表明,该算法在95%以上的时间内准确地将病变分为恶性病变或良性病变,且无明显偏差或过度拟合。后来创建了Melatect iOS应用程序(未发布),其中嵌入了机器学习模型。使用该应用程序,用户可以使用该应用程序拍摄皮肤损伤(痣)的照片,然后通过机器学习模型进行处理,并通知用户其损伤是否可能异常。Melatect提供了一种方便的方法,可以获得关于病变的免费建议,并随着时间的推移跟踪这些病变。 摘要:Malignant melanoma is a common skin cancer that is mostly curable before metastasis, where melanoma growths spawn in organs away from the original site. Melanoma is the most dangerous type of skin cancer if left untreated due to the high chance of metastasis. This paper presents Melatect, a machine learning model that identifies potential malignant melanoma. A recursive computer image analysis algorithm was used to create a machine learning model which is capable of detecting likely melanoma. The comparison is performed using 20,000 raw images of benign and malignant lesions from the International Skin Imaging Collaboration (ISIC) archive that were augmented to 60,000 images. Tests of the algorithm using subsets of the ISIC images suggest it accurately classifies lesions as malignant or benign over 95% of the time with no apparent bias or overfitting. The Melatect iOS app was later created (unpublished), in which the machine learning model was embedded. With the app, users have the ability to take pictures of skin lesions (moles) using the app, which are then processed through the machine learning model, and users are notified whether their lesion could be abnormal or not. Melatect provides a convenient way to get free advice on lesions and track these lesions over time.

【33】 Have a break from making decisions, have a MARS: The Multi-valued Action Reasoning System 标题:从决策中解脱出来,拥有一个MARS:多值动作推理系统 链接:https://arxiv.org/abs/2109.03283

作者:Cosmin Badea 机构:Department of Computing, Imperial College London, SW,AZ, UK 摘要:多值行为推理系统(MARS)是一种基于价值的人工智能道德决策模型。给定一组可用的行为和一个潜在的道德范式,通过使用MARS,人们可以确定道德上首选的行为。它可以在自动化实践推理和规范性决策分析的背景下,用于实施和模拟不同的伦理理论、不同的道德范式以及这些理论和范式的组合。它还可以用来模拟道德困境,并发现导致期望结果的道德范式。在这篇文章中,我们给出了火星的一个简明描述,解释了它的用途,并在现有文献中进行了比较。 摘要:The Multi-valued Action Reasoning System (MARS) is an automated value-based ethical decision-making model for artificial agents (AI). Given a set of available actions and an underlying moral paradigm, by employing MARS one can identify the ethically preferred action. It can be used to implement and model different ethical theories, different moral paradigms, as well as combinations of such, in the context of automated practical reasoning and normative decision analysis. It can also be used to model moral dilemmas and discover the moral paradigms that result in the desired outcomes therein. In this paper, we give a condensed description of MARS, explain its uses, and comparatively place it in the existing literature.
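
摘要没有给出 MARS 的形式化细节;下面是"基于价值的行为评分"这类模型一般形态的最小示意:行为对各道德价值的影响分值、范式权重与聚合方式都是假设的占位,并非 MARS 的实际定义。

```python
# 每个行为对若干道德价值的影响打分(假设数据),
# 道德范式由各价值的权重刻画;选取加权得分最高的行为。
actions = {
    "tell_truth": {"honesty": 1.0, "harm_avoidance": -0.3, "loyalty": -0.2},
    "stay_silent": {"honesty": -0.5, "harm_avoidance": 0.6, "loyalty": 0.4},
}
paradigm = {"honesty": 0.5, "harm_avoidance": 0.3, "loyalty": 0.2}   # 假设的范式

def preferred_action(actions, paradigm):
    score = lambda vals: sum(paradigm.get(v, 0.0) * x for v, x in vals.items())
    return max(actions, key=lambda a: score(actions[a]))

print(preferred_action(actions, paradigm))   # 在该假设范式下输出 tell_truth
```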

【34】 Exploration of Quantum Neural Architecture by Mixing Quantum Neuron Designs 标题:混合量子神经元设计探索量子神经结构 链接:https://arxiv.org/abs/2109.03806

作者:Zhepeng Wang,Zhiding Liang,Shanglin Zhou,Caiwen Ding,Jinjun Xiong,Yiyu Shi,Weiwen Jiang 机构:†George Mason University, VA, USA. ‡University of Notre Dame, IN, USA., §University of Connecticut, CT, USA. ¶University at Buffalo, NY, USA. 摘要:随着实际量子计算机中量子比特(qubits)数量的不断增加,在量子计算机上实现和加速流行的深度学习成为可能。随着这一趋势,出现了基于不同量子神经元设计的量子神经结构。量子深度学习的一个基本问题随之而来:什么是最好的量子神经结构?受经典计算中神经结构设计(通常采用多种类型的神经元)的启发,本文首次尝试混合量子神经元设计来构建量子神经结构。我们观察到,现有的量子神经元设计可能迥然不同却又互补,例如来自变分量子线路(VQC)和QuantumFlow的神经元。更具体地说,VQC可以使用实值权重,但难以扩展到多层;而QuantumFlow可以高效地构建多层网络,却仅限于使用二值权重。为了发挥各自的优势,我们提出将二者混合,并找出一种无需额外高成本测量即可无缝连接它们的方法。我们进一步研究了混合量子神经元的设计原则,为未来的量子神经结构探索提供指导。实验结果表明,使用混合量子神经元识别出的量子神经结构在MNIST数据集上可以达到90.62%的准确率,而VQC和QuantumFlow分别为52.77%和69.92%。 摘要:With the constant increase of the number of quantum bits (qubits) in the actual quantum computers, implementing and accelerating the prevalent deep learning on quantum computers are becoming possible. Along with this trend, there emerge quantum neural architectures based on different designs of quantum neurons. A fundamental question in quantum deep learning arises: what is the best quantum neural architecture? Inspired by the design of neural architectures for classical computing which typically employs multiple types of neurons, this paper makes the very first attempt to mix quantum neuron designs to build quantum neural architectures. We observe that the existing quantum neuron designs may be quite different but complementary, such as neurons from variation quantum circuits (VQC) and Quantumflow. More specifically, VQC can apply real-valued weights but suffer from being extended to multiple layers, while QuantumFlow can build a multi-layer network efficiently, but is limited to use binary weights. To take their respective advantages, we propose to mix them together and figure out a way to connect them seamlessly without additional costly measurement. We further investigate the design principles to mix quantum neurons, which can provide guidance for quantum neural architecture exploration in the future. Experimental results demonstrate that the identified quantum neural architectures with mixed quantum neurons can achieve 90.62% of accuracy on the MNIST dataset, compared with 52.77% and 69.92% on the VQC and QuantumFlow, respectively.
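为直观说明"混合两类量子神经元"的思路,下面给出一个纯NumPy的经典模拟草图:第一层模拟可用实值参数的VQC风格神经元,第二层模拟只允许±1二值权重的QuantumFlow风格神经元。该示例只是在经典计算机上对摘要思想的假设性示意,函数形式均为本文虚构,并非真实量子线路或论文的实际结构。

```python
import numpy as np

rng = np.random.default_rng(0)

def vqc_style_neuron(x: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """经典方式模拟"VQC风格"神经元:实值参数theta作为旋转角,
    输出各量子比特测得|1>的概率(仅为示意,并非真实量子线路)。"""
    return np.sin((x + theta) / 2.0) ** 2

def quantumflow_style_neuron(p: np.ndarray, w_sign: np.ndarray) -> float:
    """模拟"QuantumFlow风格"神经元:权重限定为±1(二值),
    对上一层输出做符号加权平均后取平方作为激活。"""
    z = float(np.mean(w_sign * p))
    return z ** 2

# 混合结构:第一层为VQC风格(实值权重),第二层为QuantumFlow风格(二值权重)
x = rng.uniform(0, np.pi, size=4)          # 编码后的输入
theta = rng.uniform(0, np.pi, size=4)      # 可训练的实值参数
w_sign = rng.choice([-1.0, 1.0], size=4)   # 可训练的二值权重
print(quantumflow_style_neuron(vqc_style_neuron(x, theta), w_sign))
```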

【35】 Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning 标题:批异步随机逼近的收敛性及其在强化学习中的应用 链接:https://arxiv.org/abs/2109.03445

作者:Rajeeva L. Karandikar,M. Vidyasagar 备注:11 pages 摘要:随机近似(SA)算法是一种广泛使用的概率方法,用于在只能获得$\mathbf{f}(\cdot)$的带噪测量时,求解形式为$\mathbf{f}(\boldsymbol{\theta})=\mathbf{0}$的方程,其中$\mathbf{f}:\mathbb{R}^d\rightarrow\mathbb{R}^d$。在迄今为止的文献中,可以区分“同步”更新与“异步”更新:前者每次更新当前猜测$\boldsymbol{\theta}_t$的整个向量,后者每次只更新$\boldsymbol{\theta}_t$的一个分量。在凸优化和非凸优化中,还存在“批量”更新的概念,即在每个时刻$t$更新$\boldsymbol{\theta}_t$的部分(而非全部)分量。此外,还有使用“本地”时钟与“全局”时钟的区别。在迄今为止的文献中,使用本地时钟时的收敛性证明假设测量噪声是i.i.d.序列,而这一假设在强化学习(RL)中并不成立。在本文中,我们给出了批异步随机逼近(BASA)的一般收敛理论,它适用于测量噪声构成鞅差序列的情形,且无论更新使用本地时钟还是全局时钟均成立。这是迄今为止最一般的结果,涵盖了所有其他结果。 摘要:The stochastic approximation (SA) algorithm is a widely used probabilistic method for finding a solution to an equation of the form $\mathbf{f}(\boldsymbol{\theta}) = \mathbf{0}$ where $\mathbf{f} : \mathbb{R}^d \rightarrow \mathbb{R}^d$, when only noisy measurements of $\mathbf{f}(\cdot)$ are available. In the literature to date, one can make a distinction between "synchronous" updating, whereby the entire vector of the current guess $\boldsymbol{\theta}_t$ is updated at each time, and "asynchronous" updating, whereby only one component of $\boldsymbol{\theta}_t$ is updated. In convex and nonconvex optimization, there is also the notion of "batch" updating, whereby some but not all components of $\boldsymbol{\theta}_t$ are updated at each time $t$. In addition, there is also a distinction between using a "local" clock versus a "global" clock. In the literature to date, convergence proofs when a local clock is used make the assumption that the measurement noise is an i.i.d. sequence, an assumption that does not hold in Reinforcement Learning (RL). In this note, we provide a general theory of convergence for batch asynchronous stochastic approximation (BASA), that works whether the updates use a local clock or a global clock, for the case where the measurement noises form a martingale difference sequence. This is the most general result to date and encompasses all others.
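下面是批异步随机逼近(BASA)更新方式的一个极简数值示意:以线性问题 $\mathbf{f}(\boldsymbol{\theta})=b-A\boldsymbol{\theta}$ 为例,每一步只对随机选取的部分分量做带噪更新,并用各分量自己的"本地时钟"决定步长。示例中的噪声、步长与问题规模均为本文假设,仅用于说明更新形式,不代表论文中的收敛条件或证明。

```python
import numpy as np

rng = np.random.default_rng(0)

# 目标:求解 f(theta) = b - A @ theta = 0(A 正定时解为 A^{-1} b)
d = 5
M = np.eye(d) + 0.1 * rng.standard_normal((d, d))
A = M @ M.T
b = rng.standard_normal(d)

def noisy_f(theta: np.ndarray) -> np.ndarray:
    """f 的带噪测量:噪声零均值,构成鞅差序列(这里用独立高斯噪声示意)。"""
    return b - A @ theta + 0.1 * rng.standard_normal(d)

theta = np.zeros(d)
local_clock = np.zeros(d)                 # "本地时钟":每个分量各自的更新次数
for t in range(20000):
    batch = rng.choice(d, size=2, replace=False)   # 每步只更新部分分量(批异步)
    y = noisy_f(theta)
    for i in batch:
        local_clock[i] += 1
        alpha = 1.0 / (local_clock[i] + 10)        # 基于本地时钟的递减步长
        theta[i] += alpha * y[i]

print(np.linalg.norm(theta - np.linalg.solve(A, b)))  # 误差应当较小
```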

【36】 Efficient conformer: Progressive downsampling and grouped attention for automatic speech recognition 标题:Efficient Conformer:用于自动语音识别的渐进式下采样与分组注意力 链接:https://arxiv.org/abs/2109.01163

作者:Maxime Burchi,Valentin Vielzeuf 机构:Orange Labs, Cesson-Sévigné, France 备注:None 摘要:最近提出的Conformer架构将卷积与注意力相结合来建模局部和全局依赖,在自动语音识别中取得了最先进的性能。在本文中,我们研究如何在有限的计算预算下降低Conformer架构的复杂度,从而得到一种更高效的架构设计,我们称之为Efficient Conformer(高效Conformer)。我们在Conformer编码器中引入渐进式下采样,并提出了一种名为分组注意力的新注意力机制:对于序列长度$n$、隐藏维度$d$和分组大小参数$g$,可将注意力复杂度从$O(n^{2}d)$降低到$O(n^{2}d/g)$。我们还尝试将跨步(strided)多头自注意力用作一种全局下采样操作。我们的实验在LibriSpeech数据集上进行,使用CTC和RNN-Transducer损失。我们表明,在相同的计算预算下,所提架构相比Conformer在训练和解码更快的同时取得了更好的性能。我们的13M参数CTC模型在test-clean/test-other测试集上,不使用语言模型时取得3.6%/9.0%的有竞争力WER,使用外部n-gram语言模型时取得2.7%/6.7%的WER,同时推理速度比我们的CTC Conformer基线快29%,训练速度快36%。 摘要:The recently proposed Conformer architecture has shown state-of-the-art performances in Automatic Speech Recognition by combining convolution with attention to model both local and global dependencies. In this paper, we study how to reduce the Conformer architecture complexity with a limited computing budget, leading to a more efficient architecture design that we call Efficient Conformer. We introduce progressive downsampling to the Conformer encoder and propose a novel attention mechanism named grouped attention, allowing us to reduce attention complexity from $O(n^{2}d)$ to $O(n^{2}d / g)$ for sequence length $n$, hidden dimension $d$ and group size parameter $g$. We also experiment the use of strided multi-head self-attention as a global downsampling operation. Our experiments are performed on the LibriSpeech dataset with CTC and RNN-Transducer losses. We show that within the same computing budget, the proposed architecture achieves better performances with faster training and decoding compared to the Conformer. Our 13M parameters CTC model achieves competitive WERs of 3.6%/9.0% without using a language model and 2.7%/6.7% with an external n-gram language model on the test-clean/test-other sets while being 29% faster than our CTC Conformer baseline at inference and 36% faster to train.
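下面用PyTorch给出摘要中"分组注意力"思想的单头简化示意:将相邻的$g$帧在特征维上拼接后再做标准缩放点积注意力,把代价从$O(n^{2}d)$降到约$O(n^{2}d/g)$。该草图省略了多头、线性投影与填充处理,只是按摘要思路写的假设性示例,并非论文的完整实现。

```python
import torch
import torch.nn.functional as F

def grouped_self_attention(x: torch.Tensor, g: int) -> torch.Tensor:
    """分组自注意力的极简示意:相邻 g 帧在特征维拼接后做缩放点积注意力,
    注意力矩阵从 n×n 缩小为 (n/g)×(n/g)。x 形状为 (B, n, d),假定 n 能被 g 整除。"""
    B, n, d = x.shape
    xg = x.reshape(B, n // g, g * d)                   # 相邻帧分组
    scores = xg @ xg.transpose(1, 2) / (g * d) ** 0.5  # (B, n/g, n/g)
    out = F.softmax(scores, dim=-1) @ xg               # (B, n/g, g*d)
    return out.reshape(B, n, d)                        # 还原时间分辨率

if __name__ == "__main__":
    x = torch.randn(2, 16, 8)
    print(grouped_self_attention(x, g=4).shape)  # torch.Size([2, 16, 8])
```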

机器翻译,仅供参考
