Natural Language Processing Academic Digest [6.24]

2021-07-02 18:16:33

Visit www.arxivdaily.com for daily digests with abstracts, covering CS, Physics, Math, Economics, Statistics, Finance, Biology, and Electrical Engineering, with search, bookmarking, and posting features!

cs.CL: 17 papers today

BERT (1 paper)

【1】 BERT-based Multi-Task Model for Country and Province Level Modern Standard Arabic and Dialectal Arabic Identification

Authors: Abdellah El Mekki, Abdelkader El Mahdaouy, Kabil Essefar, Nabil El Mamoun, Ismail Berrada, Ahmed Khoumsi
Link: https://arxiv.org/abs/2106.12495
Abstract: Dialect and standard language identification are crucial tasks for many Arabic natural language processing applications. In this paper, we present our deep learning-based system, submitted to the second NADI shared task for country-level and province-level identification of Modern Standard Arabic (MSA) and Dialectal Arabic (DA). The system is based on an end-to-end deep Multi-Task Learning (MTL) model that tackles both country-level and province-level MSA/DA identification. The MTL model consists of a shared Bidirectional Encoder Representations from Transformers (BERT) encoder, two task-specific attention layers, and two classifiers. Our key idea is to leverage both the task-discriminative and the inter-task shared features for country- and province-level MSA/DA identification. The obtained results show that our MTL model outperforms single-task models on most subtasks.
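
A minimal PyTorch sketch of the described architecture (shared BERT encoder, two task-specific attention-pooling layers, two classifiers) may help make this concrete. The encoder checkpoint, the additive-attention pooling form, and the label counts are assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class MTLDialectID(nn.Module):
    def __init__(self, encoder="aubmindlab/bert-base-arabertv02",
                 n_countries=21, n_provinces=100):
        super().__init__()
        self.bert = AutoModel.from_pretrained(encoder)
        h = self.bert.config.hidden_size
        # One additive-attention pooling layer per task.
        self.attn_country = nn.Sequential(nn.Linear(h, h), nn.Tanh(), nn.Linear(h, 1))
        self.attn_province = nn.Sequential(nn.Linear(h, h), nn.Tanh(), nn.Linear(h, 1))
        self.clf_country = nn.Linear(h, n_countries)
        self.clf_province = nn.Linear(h, n_provinces)

    def _pool(self, attn, states, mask):
        scores = attn(states).squeeze(-1)               # (B, T)
        scores = scores.masked_fill(mask == 0, -1e9)    # ignore padding
        w = scores.softmax(dim=-1).unsqueeze(-1)        # (B, T, 1)
        return (w * states).sum(dim=1)                  # task-specific summary

    def forward(self, input_ids, attention_mask):
        states = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        c = self._pool(self.attn_country, states, attention_mask)
        p = self._pool(self.attn_province, states, attention_mask)
        return self.clf_country(c), self.clf_province(p)
```

Training would sum the cross-entropy losses of the two heads, so the shared encoder receives gradients from both tasks.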

Machine Translation (1 paper)

【1】 End-to-End Lexically Constrained Machine Translation for Morphologically Rich Languages

Authors: Josef Jon, João Paulo Aires, Dušan Variš, Ondřej Bojar
Affiliation: Charles University
Link: https://arxiv.org/abs/2106.12398
Abstract: Lexically constrained machine translation allows the user to manipulate the output sentence by enforcing the presence or absence of certain words and phrases. Although current approaches can enforce terms to appear in the translation, they often struggle to make the constraint word form agree with the rest of the generated output. Our manual analysis shows that 46% of the errors in the output of a baseline constrained model for English-to-Czech translation are related to agreement. We investigate mechanisms that allow neural machine translation to infer the correct word inflection given lemmatized constraints. In particular, we focus on methods based on training the model with constraints provided as part of the input sequence. Our experiments on the English-Czech language pair show that this approach improves the translation of constrained terms in both automatic and manual evaluation by reducing errors in agreement. Our approach thus eliminates inflection errors without introducing new errors or decreasing the overall quality of the translation.
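
The "constraints as part of the input sequence" setup can be sketched in a few lines: lemmatized target-side constraints are appended to the source behind separator tokens, and the model is trained end-to-end to emit correctly inflected surface forms. The separator symbol below is illustrative, not the paper's exact vocabulary.

```python
# Append lemmatized constraints to the source sequence behind a separator token.
def build_constrained_source(src_tokens, constraint_lemmas, sep="<sep>"):
    out = list(src_tokens)
    for lemma in constraint_lemmas:
        out += [sep] + lemma.split()
    return out

src = "the contract must be signed by both parties".split()
constraints = ["smlouva", "podepsat"]  # Czech lemmas; the model must inflect them
print(" ".join(build_constrained_source(src, constraints)))
# the contract must be signed by both parties <sep> smlouva <sep> podepsat
```

Because the constraints are plain input tokens rather than hard decoding constraints, the decoder remains free to inflect them so that they agree with the rest of the output, which is exactly the behavior the paper evaluates.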

Graph | Knowledge Graphs (2 papers)

【1】 NodePiece: Compositional and Parameter-Efficient Representations of Large Knowledge Graphs

Authors: Mikhail Galkin, Jiapeng Wu, Etienne Denis, William L. Hamilton
Affiliation: Mila, McGill University, Montreal, Canada
Link: https://arxiv.org/abs/2106.12144
Abstract: Conventional representation learning algorithms for knowledge graphs (KG) map each entity to a unique embedding vector. Such a shallow lookup results in a linear growth of memory consumption for storing the embedding matrix and incurs high computational costs when working with real-world KGs. Drawing parallels with subword tokenization commonly used in NLP, we explore the landscape of more parameter-efficient node embedding strategies with possibly sublinear memory requirements. To this end, we propose NodePiece, an anchor-based approach to learn a fixed-size entity vocabulary. In NodePiece, a vocabulary of subword/sub-entity units is constructed from anchor nodes in a graph with known relation types. Given such a fixed-size vocabulary, it is possible to bootstrap an encoding and embedding for any entity, including those unseen during training. Experiments show that NodePiece performs competitively in node classification, link prediction, and relation prediction tasks while retaining less than 10% of explicit nodes in a graph as anchors and often having 10x fewer parameters.
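
The anchor-based hashing is easy to illustrate: each entity is mapped to its k nearest anchor nodes (with their distances) and embedded by a small learned encoder over anchor vectors. The released model also hashes the entity's outgoing relation types and uses a more elaborate encoder; the networkx lookup, MLP pooling, and dimensions below are simplifications.

```python
import networkx as nx
import torch
import torch.nn as nn

def hash_entity(graph, node, anchors, k=3):
    """Return up to k closest anchors to `node` with shortest-path distances."""
    dists = nx.single_source_shortest_path_length(graph, node)
    nearest = sorted((dists[a], a) for a in anchors if a in dists)[:k]
    return [(anchor, d) for d, anchor in nearest]

class AnchorEncoder(nn.Module):
    def __init__(self, n_anchors, dim=32, max_dist=10, k=3):
        super().__init__()
        self.anchor_emb = nn.Embedding(n_anchors, dim)   # one vector per anchor
        self.dist_emb = nn.Embedding(max_dist + 1, dim)  # distance-to-anchor feature
        self.mlp = nn.Sequential(nn.Linear(k * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, anchor_ids, dists):                # both (B, k) integer tensors
        h = self.anchor_emb(anchor_ids) + self.dist_emb(dists)   # (B, k, dim)
        return self.mlp(h.flatten(1))                    # (B, dim) entity embedding
```

Only the anchors get explicit embedding rows, which is where the sublinear memory claim comes from: every other entity is composed from its anchor hash.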

【2】 ABCD: A Graph Framework to Convert Complex Sentences to a Covering Set of Simple Sentences

Authors: Yanjun Gao, Ting-Hao Huang, Rebecca J. Passonneau
Affiliation: Pennsylvania State University
Note: To appear in the proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL 2021), Main Conference
Link: https://arxiv.org/abs/2106.12027
Abstract: Atomic clauses are fundamental text units for understanding complex sentences. Identifying the atomic sentences within complex sentences is important for applications such as summarization, argument mining, discourse analysis, discourse parsing, and question answering. Previous work mainly relies on rule-based methods dependent on parsing. We propose a new task to decompose each complex sentence into simple sentences derived from the tensed clauses in the source, and a novel problem formulation as a graph edit task. Our neural model learns to Accept, Break, Copy or Drop elements of a graph that combines word adjacency and grammatical dependencies. The full processing pipeline includes modules for graph construction, graph editing, and sentence generation from the output graph. We introduce DeSSE, a new dataset designed to train and evaluate complex sentence decomposition, and MinWiki, a subset of MinWikiSplit. ABCD achieves performance comparable to two parsing baselines on MinWiki. On DeSSE, which has a more even balance of complex sentence types, our model achieves higher accuracy on the number of atomic sentences than an encoder-decoder baseline. Results include a detailed error analysis.
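
The four edit operations can be made concrete with a schematic: a scorer labels each element of the word-adjacency/dependency graph with one of the actions below, and decoding rebuilds simple sentences from the edited graph. The neural scorer and the sentence-generation step are omitted; this only illustrates the action semantics, not the paper's implementation.

```python
from enum import Enum

class Edit(Enum):
    ACCEPT = 0   # keep the element in the output graph
    BREAK = 1    # cut here: this boundary separates two simple sentences
    COPY = 2     # duplicate a shared element (e.g. a subject) into both outputs
    DROP = 3     # delete the element (e.g. a discourse connective)

def apply_edits(elements, predictions):
    """Partition graph elements by their predicted edit action."""
    kept, boundaries, copied = [], [], []
    for elem, act in zip(elements, predictions):
        if act is Edit.ACCEPT:
            kept.append(elem)
        elif act is Edit.BREAK:
            boundaries.append(elem)
        elif act is Edit.COPY:
            kept.append(elem)
            copied.append(elem)
        # Edit.DROP: discard the element entirely
    return kept, boundaries, copied
```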

Reasoning | Analysis | Understanding | Explanation (3 papers)

【1】 Deep Multi-Task Model for Sarcasm Detection and Sentiment Analysis in Arabic Language

Authors: Abdelkader El Mahdaouy, Abdellah El Mekki, Kabil Essefar, Nabil El Mamoun, Ismail Berrada, Ahmed Khoumsi
Affiliations: School of Computer Sciences, Mohammed VI Polytechnic University, Morocco; Sidi Mohamed Ben Abdellah University, Morocco; Dept. of Electrical & Computer Engineering, University of Sherbrooke, Canada
Link: https://arxiv.org/abs/2106.12488
Abstract: The prominence of figurative language devices, such as sarcasm and irony, poses serious challenges for Arabic Sentiment Analysis (SA). While previous research works tackle SA and sarcasm detection separately, this paper introduces an end-to-end deep Multi-Task Learning (MTL) model that allows knowledge interaction between the two tasks. Our MTL model's architecture consists of a Bidirectional Encoder Representations from Transformers (BERT) model, a multi-task attention interaction module, and two task classifiers. Overall, the results show that our proposed model outperforms its single-task counterparts on both the SA and sarcasm detection sub-tasks.
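
Complementing the MTL sketch under the BERT section above, the piece specific to this paper is the multi-task attention interaction module. One plausible reading, sketched below with standard cross-attention between the two task summaries, lets SA and sarcasm representations exchange information before classification; the exact interaction used in the paper may differ.

```python
import torch
import torch.nn as nn

class TaskInteraction(nn.Module):
    """Let two pooled task vectors (from a shared BERT encoder) attend to each other."""
    def __init__(self, h, n_heads=4):
        super().__init__()
        self.cross = nn.MultiheadAttention(h, num_heads=n_heads, batch_first=True)

    def forward(self, sa_vec, sarcasm_vec):          # each (B, h)
        tasks = torch.stack([sa_vec, sarcasm_vec], dim=1)   # (B, 2, h)
        mixed, _ = self.cross(tasks, tasks, tasks)          # tasks attend to each other
        return mixed[:, 0], mixed[:, 1]              # refined per-task vectors
```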

【2】 It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning

Authors: Alexey Tikhonov, Max Ryabinin
Affiliations: Berlin, Germany; Yandex; HSE University, Moscow, Russia
Note: Accepted to Findings of ACL 2021. 13 pages, 4 figures. Code: this https URL
Link: https://arxiv.org/abs/2106.12066
Abstract: Commonsense reasoning is one of the key problems in natural language processing, but the relative scarcity of labeled data holds back the progress for languages other than English. Pretrained cross-lingual models are a source of powerful language-agnostic representations, yet their inherent reasoning capabilities are still actively studied. In this work, we design a simple approach to commonsense reasoning which trains a linear classifier with weights of multi-head attention as features. To evaluate this approach, we create a multilingual Winograd Schema corpus by processing several datasets from prior work within a standardized pipeline and measure cross-lingual generalization ability in terms of out-of-sample performance. The method performs competitively with recent supervised and unsupervised approaches for commonsense reasoning, even when applied to other languages in a zero-shot manner. Also, we demonstrate that most of the performance is given by the same small subset of attention heads for all studied languages, which provides evidence of universal reasoning capabilities in multilingual encoders.
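
The recipe is light enough to sketch in full: for each (pronoun, candidate) pair, extract the attention weight every head assigns from the pronoun position to the candidate position, then fit a linear classifier on those features. The model checkpoint and the precomputed token positions below are assumptions, not the paper's exact setup.

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

name = "xlm-roberta-base"   # any multilingual encoder works for the sketch
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

def head_features(sentence, pronoun_pos, candidate_pos):
    """One feature per (layer, head): attention from the pronoun to the candidate."""
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        att = model(**inputs, output_attentions=True).attentions
    # att: tuple of L tensors, each of shape (1, heads, T, T)
    feats = [layer[0, :, pronoun_pos, candidate_pos] for layer in att]
    return torch.cat(feats).numpy()

# X = [head_features(s, p, c) for each (sentence, pronoun, candidate) example]
# y = [1 if the candidate is the correct referent else 0]
# clf = LogisticRegression(max_iter=1000).fit(X, y)
```

Inspecting `clf.coef_` then reveals which heads carry the signal, which is how the "same small subset of heads across languages" finding can be probed.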

【3】 On the Diversity and Limits of Human Explanations

Authors: Chenhao Tan
Affiliation: Department of Computer Science & Harris School of Public Policy, University of Chicago
Note: 15 pages, 12 tables
Link: https://arxiv.org/abs/2106.11988
Abstract: A growing effort in NLP aims to build datasets of human explanations. However, the term "explanation" encompasses a broad range of notions, each with different properties and ramifications. Our goal is to provide an overview of the diverse types of explanations and of human limitations, and to discuss implications for collecting and using explanations in NLP. Inspired by prior work in psychology and cognitive sciences, we group existing human explanations in NLP into three categories: proximal mechanism, evidence, and procedure. These three types differ in nature and have implications for the resultant explanations. For instance, procedures are not considered explanations in psychology and connect with a rich body of work on learning from instructions. The diversity of explanations is further evidenced by the proxy questions that annotators need in order to interpret and answer open-ended "why" questions. Finally, explanations may require different, often deeper, understanding than predictions, which casts doubt on whether humans can provide useful explanations in some tasks.

Zero/Few/One-Shot | Transfer | Adaptation (2 papers)

【1】 Classifying Textual Data with Pre-trained Vision Models through Transfer Learning and Data Transformations

Authors: Charaf Eddine Benarab
Note: 5 pages, 6 figures, 1 table
Link: https://arxiv.org/abs/2106.12479
Abstract: Knowledge is acquired by humans through experience, and no boundary is set between the kinds of knowledge or skill levels we can achieve on different tasks at the same time. That is not the case for neural networks: the major breakthroughs in the field are extremely task- and domain-specific. Vision and language are dealt with in separate manners, using separate methods and different datasets. In this work, we propose to use knowledge acquired by benchmark vision models trained on ImageNet to help a much smaller architecture learn to classify text. After transforming the textual data contained in the IMDB dataset into grayscale images, an analysis of different domains and of the transfer-learning method is carried out. Despite the challenge posed by the very different datasets, promising results are achieved. The main contribution of this work is a novel approach linking large pretrained models on both language and vision to achieve state-of-the-art results in different sub-fields from the original task, without needing high-compute-capacity resources. Specifically, sentiment analysis is achieved after transferring knowledge between vision and language models: BERT embeddings are transformed into grayscale images, and these images are then used as training examples for pretrained vision models such as VGG16 and ResNet. Index Terms: Natural language, Vision, BERT, Transfer Learning, CNN, Domain Adaptation.
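
The text-to-image pipeline can be sketched end to end: BERT token embeddings form a (length x hidden) grayscale matrix, which is rescaled to a fixed image size and passed to an ImageNet-pretrained CNN with a new classification head. The min-max normalization, 224x224 size, and channel replication below are assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
vision = vgg16(weights="IMAGENET1K_V1")
vision.classifier[-1] = torch.nn.Linear(4096, 2)   # new binary sentiment head

def text_to_image(text):
    ids = tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        emb = bert(**ids).last_hidden_state[0]               # (T, 768)
    emb = (emb - emb.min()) / (emb.max() - emb.min() + 1e-8)  # rescale to [0, 1]
    img = emb.unsqueeze(0).unsqueeze(0)                      # (1, 1, T, 768)
    img = F.interpolate(img, size=(224, 224), mode="bilinear", align_corners=False)
    return img.repeat(1, 3, 1, 1)                            # grayscale -> 3 channels

logits = vision(text_to_image("a surprisingly moving film"))
```

Fine-tuning would then train `vision` (or just its new head) on the rendered reviews with a standard cross-entropy loss.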

【2】 Zero-Shot Joint Modeling of Multiple Spoken-Text-Style Conversion Tasks using Switching Tokens

Authors: Mana Ihori, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura
Affiliation: NTT Media Intelligence Laboratories, NTT Corporation, Japan
Note: Accepted at INTERSPEECH 2021
Link: https://arxiv.org/abs/2106.12131
Abstract: In this paper, we propose a novel spoken-text-style conversion method that can simultaneously execute multiple style conversion modules, such as punctuation restoration and disfluency deletion, without preparing matched datasets. In practice, transcriptions generated by automatic speech recognition systems are not highly readable because they often include many disfluencies and do not include punctuation marks. To improve readability, multiple spoken-text-style conversion modules that individually model a single conversion task are cascaded, because matched datasets that simultaneously handle multiple conversion tasks are often unavailable. However, the cascading is unstable against the order of tasks because of the chain of conversion errors. Moreover, the computation cost of cascading is higher than that of a single conversion. To execute multiple conversion tasks simultaneously without preparing matched datasets, our key idea is to distinguish individual conversion tasks using on-off switches. In our proposed zero-shot joint modeling, we switch the individual tasks using multiple switching tokens, enabling us to utilize a zero-shot learning approach to executing simultaneous conversions. Our experiments on joint modeling of disfluency deletion and punctuation restoration demonstrate the effectiveness of our method.
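
A sketch of the switching-token idea: one on-off switch token per conversion task is prepended to the source, so a single seq2seq model can run any subset of tasks, including combinations never observed paired in training (the zero-shot case). The token spellings here are illustrative, not the paper's vocabulary.

```python
# Prepend one on/off switch token per conversion task to the source sequence.
def with_switches(tokens, punct=False, disfluency=False):
    switches = [
        "<punct:on>" if punct else "<punct:off>",
        "<disfl:on>" if disfluency else "<disfl:off>",
    ]
    return switches + tokens

src = "uh i mean we should go".split()
print(" ".join(with_switches(src, punct=True, disfluency=True)))
# <punct:on> <disfl:on> uh i mean we should go   -> target: "We should go."
```

Training only needs single-task pairs (each with one switch on); at inference both switches can be turned on at once.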

Word2Vec | Text | Words (1 paper)

【1】 A Simple and Practical Approach to Improve Misspellings in OCR Text

Authors: Junxia Lin, Johannes Ledolter
Affiliations: Georgetown University Medical Center, Georgetown University, Washington, D.C., United States; Tippie College of Business, University of Iowa, Iowa City, IA, United States
Note: 11 pages, 1 figure
Link: https://arxiv.org/abs/2106.12030
Abstract: The focus of our paper is the identification and correction of non-word errors in OCR text. Such errors may be the result of incorrect insertion, deletion, or substitution of a character, or the transposition of two adjacent characters within a single word. Or, they can be the result of word-boundary problems that lead to run-on errors and incorrect-split errors. Traditional N-gram correction methods can handle single-word errors effectively. However, they show limitations when dealing with split and merge errors. In this paper, we develop an unsupervised method that can handle both kinds of error. The method we develop leads to a sizable improvement in the correction rates. This tutorial paper addresses very difficult word-correction problems, namely incorrect run-on and split errors, and illustrates what needs to be considered when addressing such problems. We outline a possible approach and assess its success on a limited study.
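
For the run-on case, the unsupervised idea can be illustrated with a word-frequency lexicon: generate all two-part splits of an out-of-vocabulary token and keep the split whose parts are attested. The unigram-product scoring below is a simple stand-in, not the authors' exact scoring function.

```python
def split_candidates(token, vocab):
    """Yield (left, right, score) for splits whose parts are both in the lexicon."""
    for i in range(1, len(token)):
        left, right = token[:i], token[i:]
        if left in vocab and right in vocab:
            yield (left, right, vocab[left] * vocab[right])

def correct_runon(token, vocab):
    if token in vocab:
        return [token]                       # already a valid word
    cands = sorted(split_candidates(token, vocab), key=lambda c: -c[2])
    return list(cands[0][:2]) if cands else [token]

vocab = {"the": 1000, "report": 80, "was": 500, "filed": 40}
print(correct_runon("thereport", vocab))     # ['the', 'report']
```

Incorrect-split errors work in the mirror direction: merge a token with its neighbor and accept the merge if the joined form is attested and scores better than the parts.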

Other Neural Networks | Deep Learning | Models | Modeling (1 paper)

【1】 Reinforcement Learning-based Dialogue Guided Event Extraction to Exploit Argument Relations

Authors: Qian Li, Hao Peng, Jianxin Li, Yuanxing Ning, Lihong Wang, Philip S. Yu, Zheng Wang
Affiliation: Zheng Wang is with the School of Computing, University of Leeds
Link: https://arxiv.org/abs/2106.12384
Abstract: Event extraction is a fundamental task for natural language processing. Finding the roles of event arguments, such as event participants, is essential for event extraction. However, doing so for real-life event descriptions is challenging because an argument's role often varies in different contexts. While the relationships and interactions between multiple arguments are useful for settling the argument roles, such information is largely ignored by existing approaches. This paper presents a better approach for event extraction by explicitly utilizing the relationships of event arguments. We achieve this through a carefully designed task-oriented dialogue system. To model the argument relations, we employ reinforcement learning and incremental learning to extract multiple arguments via a multi-turned, iterative process. Our approach leverages knowledge of the already extracted arguments of the same sentence to determine the role of arguments that would be difficult to decide individually. It then uses the newly obtained information to improve the decisions of previously extracted arguments. This two-way feedback process allows us to exploit the argument relations to effectively settle argument roles, leading to better sentence understanding and event extraction. Experimental results show that our approach consistently outperforms seven state-of-the-art event extraction methods for event classification, argument-role classification, and argument identification.
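
The multi-turn, iterative extraction can be shown schematically: each turn queries one argument role conditioned on the arguments already extracted, and later rounds revisit earlier decisions, giving the two-way feedback the abstract describes. Here `extract` stands in for the paper's learned, reinforcement-trained policy; only the control flow is illustrated.

```python
def iterative_extraction(sentence, roles, extract, n_rounds=2):
    """Fill event-argument roles over several passes; `extract` is the learned policy."""
    filled = {}                                   # role -> argument span
    for _ in range(n_rounds):                     # later rounds revisit earlier answers
        for role in roles:
            known = {r: a for r, a in filled.items() if r != role}
            answer = extract(sentence, role, known)   # conditions on known arguments
            if answer is not None:
                filled[role] = answer
    return filled
```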

Other (6 papers)

【1】 Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

Authors: Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, Liwei Wang, Tie-Yan Liu
Affiliations: Peking University; Princeton University; University of Science and Technology of China; Microsoft Research
Note: Preprint. Work in progress.
Link: https://arxiv.org/abs/2106.12566
Abstract: The attention module, which is a crucial component in Transformer, cannot scale efficiently to long sequences due to its quadratic complexity. Many works focus on approximating the dot-then-exponentiate softmax function in the original attention, leading to sub-quadratic or even linear-complexity Transformer architectures. However, we show that these methods cannot be applied to more powerful attention modules that go beyond the dot-then-exponentiate style, e.g., Transformers with relative positional encoding (RPE). Since relative positional encoding is used by default in many state-of-the-art models, designing efficient Transformers that can incorporate RPE is appealing. In this paper, we propose a novel way to accelerate attention calculation for Transformers with RPE on top of the kernelized attention. Based upon the observation that relative positional encoding forms a Toeplitz matrix, we mathematically show that kernelized attention with RPE can be calculated efficiently using the Fast Fourier Transform (FFT). With FFT, our method achieves $\mathcal{O}(n \log n)$ time complexity. Interestingly, we further demonstrate that properly using relative positional encoding can mitigate the training instability problem of vanilla kernelized attention. On a wide range of tasks, we empirically show that our models can be trained from scratch without any optimization issues. The learned model performs better than many efficient Transformer variants and is faster than the standard Transformer in the long-sequence regime.
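
The mathematical core of the speed-up is easy to verify standalone: a relative-positional bias $b_{i-j}$ forms a Toeplitz matrix, and a Toeplitz matrix-vector product can be computed in $O(n \log n)$ by embedding the matrix in a circulant one and applying the FFT. The NumPy demo below checks the trick against a direct product; it shows only the Toeplitz/FFT kernel, not the paper's full kernelized-attention computation, where this product is applied along the sequence dimension.

```python
import numpy as np

def toeplitz_matvec(bias, x):
    """Compute T @ x where T[i, j] = bias[i - j + n - 1], via FFT in O(n log n).
    `bias` has length 2n-1 and stores b_k for k = -(n-1)..(n-1)."""
    n = len(x)
    col = bias[n - 1:]            # first column of T: b_0 .. b_{n-1}
    row = bias[:n][::-1]          # first row of T:    b_0, b_{-1}, .., b_{-(n-1)}
    # Embed T in a 2n x 2n circulant matrix (first column below), pad x with zeros.
    c = np.concatenate([col, [0.0], row[1:][::-1]])
    m = len(c)
    y = np.fft.irfft(np.fft.rfft(c) * np.fft.rfft(x, m), m)[:n]
    return y

n = 6
bias = np.random.randn(2 * n - 1)
x = np.random.randn(n)
T = np.array([[bias[i - j + n - 1] for j in range(n)] for i in range(n)])
assert np.allclose(T @ x, toeplitz_matvec(bias, x))   # matches the direct product
```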

【2】 Mixtures of Deep Neural Experts for Automated Speech Scoring

Authors: Sara Papi, Edmondo Trentin, Roberto Gretter, Marco Matassoni, Daniele Falavigna
Link: https://arxiv.org/abs/2106.12475
Abstract: The paper copes with the task of automatic assessment of second language proficiency from the language learners' spoken responses to test prompts. The task has significant relevance to the field of computer-assisted language learning. The approach presented in the paper relies on two separate modules: (1) an automatic speech recognition system that yields text transcripts of the spoken interactions involved, and (2) a multiple classifier system based on deep learners that ranks the transcripts into proficiency classes. Different deep neural network architectures (both feed-forward and recurrent) are specialized over diverse representations of the texts in terms of: a reference grammar, the outcome of probabilistic language models, several word embeddings, and two bag-of-word models. Combination of the individual classifiers is realized either via a probabilistic pseudo-joint model, or via a neural mixture of experts. Using the data of the third Spoken CALL Shared Task challenge, the highest values to date were obtained in terms of three popular evaluation metrics.
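
The two combination schemes named in the abstract can be sketched directly. Here `posteriors` holds one row of class probabilities per expert for a single transcript; the conditional-independence assumption in `pseudo_joint` and the externally supplied gate weights are simplifications of the paper's models.

```python
import numpy as np

def pseudo_joint(posteriors):
    """Probabilistic pseudo-joint combination: renormalized product of posteriors."""
    joint = np.prod(posteriors, axis=0)       # treat experts as independent
    return joint / joint.sum()

def gated_mixture(posteriors, gate_weights):
    """Mixture of experts: gate_weights (n_experts,) would come from a gating net."""
    return gate_weights @ posteriors           # weighted average over experts

posteriors = np.array([[0.6, 0.3, 0.1],        # expert on grammar features
                       [0.5, 0.4, 0.1],        # expert on word embeddings
                       [0.2, 0.5, 0.3]])       # expert on bag-of-words
print(pseudo_joint(posteriors), gated_mixture(posteriors, np.array([0.5, 0.3, 0.2])))
```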

【3】 PALRACE: Reading Comprehension Dataset with Human Data and Labeled Rationales

Authors: Jiajie Zou, Yuran Zhang, Peiqing Jin, Cheng Luo, Xunyi Pan, Nai Ding
Affiliations: Zhejiang Lab, Hangzhou, China; Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
Link: https://arxiv.org/abs/2106.12373
Abstract: Pre-trained language models achieve high performance on machine reading comprehension (MRC) tasks, but the results are hard to explain. An appealing approach to making models explainable is to provide rationales for their decisions. To facilitate supervised learning of human rationales, we present PALRACE (Pruned And Labeled RACE), a new MRC dataset with human-labeled rationales for 800 passages selected from the RACE dataset. We further classified the questions for each passage into six types. Each passage was read by at least 26 participants, who labeled rationales for their answer to the question. In addition, we conducted a rationale-evaluation session in which participants were asked to answer the question solely based on the labeled rationales, confirming that the labeled rationales were of high quality and could sufficiently support question answering.

【4】 CharacterChat: Supporting the Creation of Fictional Characters through Conversation and Progressive Manifestation with a Chatbot

Authors: Oliver Schmitt, Daniel Buschek
Affiliation: Department of Computer Science, University of Bayreuth
Note: 14 pages, 2 figures, 2 tables; ACM C&C 2021
Link: https://arxiv.org/abs/2106.12314
Abstract: We present CharacterChat, a concept and chatbot to support writers in creating fictional characters. Concretely, writers progressively turn the bot into their imagined character through conversation. We iteratively developed CharacterChat in a user-centred approach, starting with a survey on character creation with writers (N=30), followed by two qualitative user studies (N=7 and N=8). Our prototype combines two modes: (1) Guided prompts help writers define character attributes (e.g. User: "Your name is Jane."), including suggestions for attributes (e.g. Bot: "What is my main motivation?") and values, realised as a rule-based system with a concept network. (2) Open conversation with the chatbot helps writers explore their character and get inspiration, realised with a language model that takes into account the defined character attributes. Our user studies reveal benefits particularly for early stages of character creation, and challenges due to limited conversational capabilities. We conclude with lessons learned and ideas for future work.
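
A toy version of the guided-prompt mode: a rule-based loop walks unfilled attribute slots and asks about each one in the character's first-person voice. The slot inventory below is illustrative; the actual system draws attribute suggestions from a concept network rather than a fixed list.

```python
# Hypothetical slot list; the paper's system derives these from a concept network.
SLOTS = ["name", "age", "occupation", "main motivation", "greatest fear"]

def next_prompt(character):
    """Ask about the first unfilled attribute, in the character's voice."""
    for slot in SLOTS:
        if slot not in character:
            return f"What is my {slot}?"
    return None   # all slots filled; switch to open conversation with the LM

character = {"name": "Jane"}
print(next_prompt(character))   # "What is my age?"
```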

【5】 Recognising Biomedical Names: Challenges and Solutions

Authors: Xiang Dai
Supervisors: Sarvnaz Karimi, Ben Hachey, Cecile Paris, Joachim Gudmundsson
Affiliation: School of Computer Science, The University of Sydney, Australia
Note: PhD thesis, University of Sydney
Link: https://arxiv.org/abs/2106.12230
Abstract: The growth rate in the amount of biomedical documents is staggering. Unlocking information trapped in these documents can enable researchers and practitioners to operate confidently in the information world. Biomedical NER, the task of recognising biomedical names, is usually employed as the first step of the NLP pipeline. Standard NER models, based on the sequence tagging technique, are good at recognising short entity mentions in the generic domain. However, there are several open challenges in applying these models to recognise biomedical names: 1) Biomedical names may contain complex inner structure (discontinuity and overlapping) which cannot be recognised using standard sequence tagging technique; 2) The training of NER models usually requires large amounts of labelled data, which are difficult to obtain in the biomedical domain; and 3) Commonly used language representation models are pre-trained on generic data; a domain shift therefore exists between these models and target biomedical data. To deal with these challenges, we explore several research directions and make the following contributions: 1) we propose a transition-based NER model which can recognise discontinuous mentions; 2) we develop a cost-effective approach that nominates the suitable pre-training data; and 3) we design several data augmentation methods for NER. Our contributions have obvious practical implications, especially when new biomedical applications are needed. Our proposed data augmentation methods can help the NER model achieve decent performance, requiring only a small amount of labelled data. Our investigation regarding selecting pre-training data can improve the model by incorporating language representation models, which are pre-trained using in-domain data. Finally, our proposed transition-based NER model can further improve the performance by recognising discontinuous mentions.
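
A hedged sketch of a stack-and-buffer transition system for discontinuous mentions, in the spirit of the model proposed in the thesis: tokens are shifted or skipped, stack items can be merged even across gaps, and a COMPLETE action emits a mention. The action inventory here is simplified relative to the published model.

```python
def run_transitions(tokens, actions):
    """Replay a sequence of transition actions and collect the emitted mentions."""
    stack, buffer, mentions = [], list(tokens), []
    for act in actions:
        if act == "SHIFT":
            stack.append([buffer.pop(0)])        # token may start a mention
        elif act == "OUT":
            buffer.pop(0)                        # token is outside any mention
        elif act == "REDUCE":
            top = stack.pop()
            stack[-1].extend(top)                # merge stack items, allowing gaps
        elif act == "COMPLETE":
            mentions.append(tuple(stack.pop()))  # emit a finished mention
    return mentions

toks = ["muscle", "pain", "and", "fatigue"]
# Recover the discontinuous mention "muscle fatigue" across the gap "pain and":
print(run_transitions(toks, ["SHIFT", "OUT", "OUT", "SHIFT", "REDUCE", "COMPLETE"]))
# [('muscle', 'fatigue')]
```

The published model uses a richer action set (so that overlapping mentions such as "muscle pain" can be recovered from the same span) and a neural classifier to choose actions.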

【6】 On Positivity Bias in Negative Reviews

Authors: Madhusudhan Aithal, Chenhao Tan
Affiliations: University of Colorado Boulder; University of Chicago
Note: 11 pages, 17 figures, ACL 2021
Link: https://arxiv.org/abs/2106.12056
Abstract: Prior work has revealed that positive words occur more frequently than negative words in human expressions, which is typically attributed to positivity bias, a tendency for people to report positive views of reality. But what about the language used in negative reviews? Consistent with prior work, we show that English negative reviews tend to contain more positive words than negative words, using a variety of datasets. We reconcile this observation with prior findings on the pragmatics of negation, and show that negations are commonly associated with positive words in negative reviews. Furthermore, in negative reviews, the majority of sentences with positive words express negative opinions based on sentiment classifiers, indicating some form of negation.
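
The negation-positivity association can be measured with a very small script: count lexicon-positive words in a review and how often a negation cue appears within a short window before them. The tiny lexicons and fixed window below are placeholders for the standard sentiment resources and negation handling used in the paper.

```python
NEGATIONS = {"not", "never", "no", "n't", "nothing"}
POSITIVE = {"good", "great", "like", "recommend", "works"}
NEGATIVE = {"bad", "awful", "broken", "waste"}

def stats(review, window=3):
    """Count positive/negative words and positive words preceded by a negation cue."""
    toks = review.lower().split()
    pos = [i for i, t in enumerate(toks) if t in POSITIVE]
    neg = sum(t in NEGATIVE for t in toks)
    negated_pos = sum(
        any(toks[j] in NEGATIONS for j in range(max(0, i - window), i))
        for i in pos
    )
    return {"positive": len(pos), "negative": neg, "negated_positive": negated_pos}

print(stats("I would not recommend this, it never works and is a waste"))
# {'positive': 2, 'negative': 1, 'negated_positive': 2}
```

Even this toy review shows the paper's pattern: more positive than negative lexicon words, with the positive ones sitting under negation.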
