自然语言处理学术速递[6.23]

2021-07-02 18:25:50

访问www.arxivdaily.com获取含摘要速递,涵盖CS|物理|数学|经济|统计|金融|生物|电气领域,更有搜索、收藏、发帖等功能!点击阅读原文即可访问

cs.CL 方向,今日共计22篇

BERT(1篇)

【1】 LV-BERT: Exploiting Layer Variety for BERT 标题:LV-BERT:利用层多样性改进BERT

作者:Weihao Yu,Zihang Jiang,Fei Chen,Qibin Hou,Jiashi Feng 机构:National University of Singapore, Huawei Noah’s Ark Lab 备注:Accepted to Findings of ACL 2021. The code and pre-trained models are available at this https URL 链接:https://arxiv.org/abs/2106.11740 摘要:现代预训练语言模型大多建立在以交错顺序堆叠自注意力层和前馈层的主干之上。本文在这种固定的层模式之外,从层类型集合与层顺序两个方面利用层的多样性来改进预训练模型。具体来说,除了原有的自注意力层和前馈层,我们在层类型集合中引入了卷积层,实验发现这对预训练模型是有益的。此外,在原有的交错顺序之外,我们探索了更多的层顺序,以发现更强大的架构。然而,引入的层多样性带来了数十亿以上候选模型的巨大架构空间,而从头训练单个候选模型就已经需要巨大的计算开销,因此通过直接训练大量候选模型来搜索这一空间的代价难以承受。为了解决这一问题,我们首先预训练一个超网(supernet),所有候选模型的权重都可以从中继承,然后采用以预训练精度为引导的进化算法来寻找最优架构。大量实验表明,用该方法得到的LV-BERT模型在各种下游任务上都优于BERT及其变体。例如,LV-BERT-small在GLUE测试集上达到78.8,比强基线ELECTRA-small高出1.8。 摘要:Modern pre-trained language models are mostly built upon backbones stacking self-attention and feed-forward layers in an interleaved order. In this paper, beyond this stereotyped layer pattern, we aim to improve pre-trained models by exploiting layer variety from two aspects: the layer type set and the layer order. Specifically, besides the original self-attention and feed-forward layers, we introduce convolution into the layer type set, which is experimentally found beneficial to pre-trained models. Furthermore, beyond the original interleaved order, we explore more layer orders to discover more powerful architectures. However, the introduced layer variety leads to a large architecture space of more than billions of candidates, while training a single candidate model from scratch already requires huge computation cost, making it not affordable to search such a space by directly training large amounts of candidate models. To solve this problem, we first pre-train a supernet from which the weights of all candidate models can be inherited, and then adopt an evolutionary algorithm guided by pre-training accuracy to find the optimal architecture. Extensive experiments show that LV-BERT model obtained by our method outperforms BERT and its variants on various downstream tasks. For example, LV-BERT-small achieves 78.8 on the GLUE testing set, 1.8 higher than the strong baseline ELECTRA-small.
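
下面给出一个示意性的代码草图(并非论文官方实现,官方代码见上文链接),用来说明"层类型集合中加入卷积 + 自定义层顺序"的基本形态;其中各层的具体设计(残差、LayerNorm、卷积核大小等)均为演示用的假设。

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
    def forward(self, x):
        out, _ = self.attn(x, x, x)          # 自注意力层
        return self.norm(x + out)

class ConvLayer(nn.Module):
    def __init__(self, dim, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2)
        self.norm = nn.LayerNorm(dim)
    def forward(self, x):
        out = self.conv(x.transpose(1, 2)).transpose(1, 2)   # 卷积层(沿序列维)
        return self.norm(x + out)

class FeedForward(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
        self.norm = nn.LayerNorm(dim)
    def forward(self, x):
        return self.norm(x + self.net(x))    # 前馈层

LAYER_TYPES = {"att": SelfAttention, "conv": ConvLayer, "ffn": FeedForward}

class VariedLayerEncoder(nn.Module):
    """按给定的"层顺序"序列堆叠不同类型的层,即一个候选架构。"""
    def __init__(self, layer_order, dim=256):
        super().__init__()
        self.layers = nn.ModuleList(LAYER_TYPES[t](dim) for t in layer_order)
    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

# 一个混合了卷积的候选层顺序(仅作演示);论文中这类候选由超网与进化算法搜索得到
model = VariedLayerEncoder(["att", "ffn", "conv", "ffn", "att", "conv"])
print(model(torch.randn(2, 128, 256)).shape)   # torch.Size([2, 128, 256])
```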

QA|VQA|问答|对话(2篇)

【1】 Learn to Resolve Conversational Dependency: A Consistency Training Framework for Conversational Question Answering 标题:学会解决会话依赖:会话问答的一致性训练框架

作者:Gangwoo Kim,Hyunjae Kim,Jungsoo Park,Jaewoo Kang 机构:Korea University 备注:12 pages, ACL 2021 链接:https://arxiv.org/abs/2106.11575 摘要:会话问答(CQA)的主要挑战之一是解决回指、省略等会话依赖。然而,现有方法并没有显式地训练QA模型如何解决这些依赖,因此模型在理解人类对话方面受到限制。本文提出了一个新的框架ExCorD(Explicit guidance on how to resolve Conversational Dependency),用于增强QA模型理解会话上下文的能力。ExCorD首先生成无需会话历史即可理解的自包含问题,然后使用基于一致性的正则化器,以原始问题与自包含问题组成的问题对来训练QA模型。实验表明,ExCorD显著提升了QA模型的性能,在QuAC上最多提高1.2 F1,在CANARD上最多提高5.2 F1,同时克服了现有方法的局限性。 摘要:One of the main challenges in conversational question answering (CQA) is to resolve the conversational dependency, such as anaphora and ellipsis. However, existing approaches do not explicitly train QA models on how to resolve the dependency, and thus these models are limited in understanding human dialogues. In this paper, we propose a novel framework, ExCorD (Explicit guidance on how to resolve Conversational Dependency) to enhance the abilities of QA models in comprehending conversational context. ExCorD first generates self-contained questions that can be understood without the conversation history, then trains a QA model with the pairs of original and self-contained questions using a consistency-based regularizer. In our experiments, we demonstrate that ExCorD significantly improves the QA models' performance by up to 1.2 F1 on QuAC, and 5.2 F1 on CANARD, while addressing the limitations of the existing approaches.
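
下面是"基于一致性的正则化器"这一思想的一个极简示意(损失的组合方式、权重与KL方向均为演示用的假设,并非论文原始实现):对同一答案,模型在原始问题与自包含改写问题上的预测分布应相互接近。

```python
import torch
import torch.nn.functional as F

def consistency_qa_loss(logits_orig, logits_selfcontained, labels, alpha=0.5):
    """logits_*: (batch, num_positions),分别来自原始问题与自包含问题;alpha 为假设的权重。"""
    ce_orig = F.cross_entropy(logits_orig, labels)            # 原始问题上的监督损失
    ce_sc = F.cross_entropy(logits_selfcontained, labels)     # 自包含问题上的监督损失
    log_p = F.log_softmax(logits_orig, dim=-1)
    log_q = F.log_softmax(logits_selfcontained, dim=-1)
    # 一致性项:两种输入下预测分布的对称 KL
    consistency = 0.5 * (F.kl_div(log_p, log_q, log_target=True, reduction="batchmean")
                         + F.kl_div(log_q, log_p, log_target=True, reduction="batchmean"))
    return ce_orig + ce_sc + alpha * consistency

# 随机张量仅作演示
logits_a, logits_b = torch.randn(4, 50), torch.randn(4, 50)
labels = torch.randint(0, 50, (4,))
print(consistency_qa_loss(logits_a, logits_b, labels))
```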

【2】 Fine-tune the Entire RAG Architecture (including DPR retriever) for Question-Answering 标题:微调整个RAG体系结构(包括DPR检索器)以进行问答

作者:Shamane Siriwardhana,Rivindu Weerasekera,Elliott Wen,Suranga Nanayakkara 机构:Auckland Bioengineering Instutute, The University of Auckland, Auckland, New Zealand 备注:for associated code, see this https URL 链接:https://arxiv.org/abs/2106.11517 摘要:在本文中,我们将演示如何以端到端的方式微调整个检索增强生成(RAG)体系结构。我们强调了实现这一目标需要解决的主要工程挑战。我们还比较了端到端RAG架构在回答问题方面的性能如何优于原始RAG架构。我们在HuggingFace Transformers库中实现了开源。 摘要:In this paper, we illustrate how to fine-tune the entire Retrieval Augment Generation (RAG) architecture in an end-to-end manner. We highlighted the main engineering challenges that needed to be addressed to achieve this objective. We also compare how end-to-end RAG architecture outperforms the original RAG architecture for the task of question answering. We have open-sourced our implementation in the HuggingFace Transformers library.
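
论文的端到端微调实现见其开源仓库;作为参考,下面给出用 HuggingFace Transformers 加载并运行原始 RAG 模型(公开检查点 facebook/rag-token-nq)的最小示例。注意这只是基础推理用法,并非论文提出的包含 DPR 检索器在内的端到端微调流程;运行还需要安装 datasets 与 faiss。

```python
from transformers import RagTokenizer, RagRetriever, RagTokenForGeneration

# 加载分词器、检索器与生成模型(use_dummy_dataset=True 使用小型示例索引,便于快速试跑)
tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-token-nq", index_name="exact", use_dummy_dataset=True
)
model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)

# 检索增强生成:先检索相关文档,再条件生成答案
inputs = tokenizer("who holds the record in 100m freestyle", return_tensors="pt")
generated = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```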

机器翻译(2篇)

【1】 On the Evaluation of Machine Translation for Terminology Consistency 标题:论机器翻译对术语一致性的评价

作者:Md Mahfuz ibn Alam,Antonios Anastasopoulos,Laurent Besacier,James Cross,Matthias Gallé,Philipp Koehn,Vassilina Nikoulina 机构:Department of Computer Science, George Mason University, NAVER Labs, Grenoble, Facebook, Johns Hopkins University 备注:preprint 链接:https://arxiv.org/abs/2106.11891 摘要:随着神经机器翻译(NMT)系统成为专业翻译流程的重要组成部分,越来越多的工作聚焦于将NMT与术语表相结合。在许多场景中,尤其是在领域自适应的情况下,人们期望机器翻译输出遵循术语表所提供的约束。在这项工作中,我们提出了衡量机器翻译输出与领域术语表一致性的指标。我们在COVID-19领域对5种语言进行了研究,并进行了面向术语的人工评估。我们开源了计算所有指标的代码:https://github.com/mahfuzibnalam/terminology_evaluation 摘要:As neural machine translation (NMT) systems become an important part of professional translator pipelines, a growing body of work focuses on combining NMT with terminologies. In many scenarios and particularly in cases of domain adaptation, one expects the MT output to adhere to the constraints provided by a terminology. In this work, we propose metrics to measure the consistency of MT output with regards to a domain terminology. We perform studies on the COVID-19 domain over 5 languages, also performing terminology-targeted human evaluation. We open-source the code for computing all proposed metrics: https://github.com/mahfuzibnalam/terminology_evaluation
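
作为示意,下面给出一个最简化的术语一致性统计(基于精确匹配的"术语使用率");论文提出的指标更精细,请以上面开源仓库中的实现为准。

```python
def term_consistency(pairs, terminology):
    """pairs: [(源句, 机器译文), ...];terminology: {源术语: 目标术语}。
    统计源句触发的术语约束中,译文实际使用目标术语的比例(精确匹配,仅作演示)。"""
    matched, total = 0, 0
    for src, hyp in pairs:
        src_l, hyp_l = src.lower(), hyp.lower()
        for s_term, t_term in terminology.items():
            if s_term.lower() in src_l:        # 该句包含此源术语,约束生效
                total += 1
                if t_term.lower() in hyp_l:    # 译文使用了规定的目标术语
                    matched += 1
    return matched / total if total else 0.0

pairs = [("The vaccine reduces transmission of the virus.",
          "Le vaccin réduit la transmission du virus.")]
print(term_consistency(pairs, {"vaccine": "vaccin", "virus": "virus"}))  # 1.0
```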

【2】 Phrase-level Active Learning for Neural Machine Translation 标题:神经机器翻译中的短语级主动学习

作者:Junjie Hu,Graham Neubig 机构:Language Technologies Institute, Carnegie Mellon University 链接:https://arxiv.org/abs/2106.11375 摘要:神经机器翻译(NMT)对域移位非常敏感。在本文中,我们在一个主动学习环境中解决这个问题,在这个环境中,我们可以花费一定的预算来翻译域内数据,并在新翻译的数据上逐步微调预先训练好的域外NMT模型。现有的NMT主动学习方法通常是基于不确定性分数来选择句子,但是这些方法需要花费大量的代价来翻译完整的句子,即使句子中只有一个或两个关键短语是有用的。为了解决这一局限性,我们重新审视了基于短语的机器翻译(PBMT)时代以前的工作,即选择的不是完整的句子,而是单个短语。然而,虽然将这些短语合并到PBMT系统中相对简单,但对于NMT系统来说却不那么简单,因为NMT系统需要对完整序列进行训练,以捕获新领域特有的句子的更大结构特性。为了克服这些障碍,我们建议在新的域中从未标记的数据中选择完整的句子和单独的短语来路由到人工翻译。在一个德语-英语翻译任务中,我们的主动学习方法比基于不确定性的句子选择方法取得了一致的改进,比强主动学习基线提高了1.2 BLEU分数。 摘要:Neural machine translation (NMT) is sensitive to domain shift. In this paper, we address this problem in an active learning setting where we can spend a given budget on translating in-domain data, and gradually fine-tune a pre-trained out-of-domain NMT model on the newly translated data. Existing active learning methods for NMT usually select sentences based on uncertainty scores, but these methods require costly translation of full sentences even when only one or two key phrases within the sentence are informative. To address this limitation, we re-examine previous work from the phrase-based machine translation (PBMT) era that selected not full sentences, but rather individual phrases. However, while incorporating these phrases into PBMT systems was relatively simple, it is less trivial for NMT systems, which need to be trained on full sequences to capture larger structural properties of sentences unique to the new domain. To overcome these hurdles, we propose to select both full sentences and individual phrases from unlabelled data in the new domain for routing to human translators. In a German-English translation task, our active learning approach achieves consistent improvements over uncertainty-based sentence selection methods, improving up to 1.2 BLEU score over strong active learning baselines.
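
下面用一个示意性草图说明"在给定标注预算下,按不确定性同时挑选整句与单个短语"的基本流程;打分方式与预算控制均为演示用的简化假设,并非论文的实现。

```python
import numpy as np

def mean_token_entropy(token_probs):
    """token_probs: (句长, 词表大小) 的逐位置预测分布;返回平均词级熵,作为不确定性分数。"""
    eps = 1e-12
    ent = -(token_probs * np.log(token_probs + eps)).sum(axis=-1)
    return float(ent.mean())

def select_for_annotation(sentences, phrases, budget_tokens):
    """sentences / phrases: [(文本, 不确定性分数, token数), ...]。
    将整句与短语放入同一候选池,按不确定性从高到低挑选,直到用完翻译预算。"""
    candidates = sorted(sentences + phrases, key=lambda x: x[1], reverse=True)
    selected, used = [], 0
    for text, _, n_tokens in candidates:
        if used + n_tokens <= budget_tokens:
            selected.append(text)
            used += n_tokens
    return selected

sentences = [("Der Patient erhielt eine hohe Dosis.", 2.1, 6)]
phrases = [("hohe Dosis", 2.4, 2), ("der Patient", 0.3, 2)]
print(select_for_annotation(sentences, phrases, budget_tokens=8))
```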

语义分析(1篇)

【1】 Error-Aware Interactive Semantic Parsing of OpenStreetMap 标题:OpenStreetMap的错误感知交互语义解析

作者:Michael Staniek,Stefan Riezler 机构:Heidelberg University, Germany, Computational Linguistics & IWR 备注:Accepted at SpLU-RoboNLP 2021 链接:https://arxiv.org/abs/2106.11739 摘要:在针对OpenStreetMap(OSM)等真实数据库的地理查询语义解析中,不一定存在唯一的正确答案。相反,真值可能取决于用户自身的视角,用户需要进入一个交互式环境,在其中消解歧义并纠正解析错误。我们的工作提出了一种交互式语义解析方法:显式地进行错误检测,并生成一个澄清问题,指出疑似的歧义或错误来源并传达给用户。实验结果表明,将基于熵的不确定性检测与beam搜索相结合,并在澄清问题、初始解析和用户回答上进行多源训练,可以使一个在NLMaps OSM语义解析数据集上已达到90.26%的解析器的F1得分再提高1.2%。 摘要:In semantic parsing of geographical queries against real-world databases such as OpenStreetMap (OSM), unique correct answers do not necessarily exist. Instead, the truth might be lying in the eye of the user, who needs to enter an interactive setup where ambiguities can be resolved and parsing mistakes can be corrected. Our work presents an approach to interactive semantic parsing where an explicit error detection is performed, and a clarification question is generated that pinpoints the suspected source of ambiguity or error and communicates it to the human user. Our experimental results show that a combination of entropy-based uncertainty detection and beam search, together with multi-source training on clarification question, initial parse, and user answer, results in improvements of 1.2% F1 score on a parser that already performs at 90.26% on the NLMaps dataset for OSM semantic parsing.
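
下面给出"基于熵的不确定性检测 + 触发澄清问题"这一思路的极简示意(数据结构与阈值均为演示用的假设,并非论文实现):对beam搜索产生的候选解析分布计算熵,熵过高时向用户提出澄清问题,而不是直接执行解析。

```python
import math

def beam_entropy(logprobs):
    """logprobs: beam 中各候选解析的对数概率;归一化为分布后计算熵。"""
    probs = [math.exp(lp) for lp in logprobs]
    z = sum(probs)
    probs = [p / z for p in probs]
    return -sum(p * math.log(p + 1e-12) for p in probs)

def decide(beam, threshold=0.8):
    """熵超过阈值(0.8 为假设值)时触发澄清问题,否则执行最优解析。"""
    ent = beam_entropy([c["logprob"] for c in beam])
    if ent > threshold:
        return {"action": "ask_clarification", "entropy": ent}
    return {"action": "execute", "parse": beam[0]["parse"], "entropy": ent}

beam = [{"parse": "query(area(keyval('amenity','cafe')))", "logprob": -0.4},
        {"parse": "query(around(keyval('amenity','cafe')))", "logprob": -1.1}]
print(decide(beam))
```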

Graph|知识图谱|Knowledge(3篇)

【1】 End-to-End Task-Oriented Dialog Modeling with Semi-Structured Knowledge Management 标题:基于半结构化知识管理的端到端面向任务的对话建模

作者:Silin Gao,Ryuichi Takanobu,Minlie Huang 机构:CoAI Group, DCST, IAI, BNRIST, Tsinghua University, Beijing, China 备注:Submitted to IEEE/ACM TASLP, regular paper. arXiv admin note: substantial text overlap with arXiv:2105.06041 链接:https://arxiv.org/abs/2106.11796 摘要:当前的面向任务对话(TOD)系统主要管理结构化知识(如数据库和表)来引导面向目标的对话,但无法处理同时涉及非结构化知识(如评论和文档)的对话。本文提出了在结构化与非结构化知识融合之上建模TOD的任务。为了解决这一任务,我们提出了一个具有半结构化知识管理的TOD系统SeKnow,它将信念状态扩展为同时管理结构化与非结构化内容的知识。此外,我们分别基于非预训练的序列到序列模型和预训练语言模型给出了SeKnow的两种实现,二者都以端到端方式联合优化基于结构化与非结构化知识的对话建模。我们在修改后的MultiWOZ 2.1数据集上进行了实验,其中对话经过处理以涉及半结构化知识。实验结果表明,与现有TOD系统及其带流水线式知识管理方案的扩展相比,SeKnow在端到端对话和中间知识管理两方面都表现出很强的性能。 摘要:Current task-oriented dialog (TOD) systems mostly manage structured knowledge (e.g. databases and tables) to guide the goal-oriented conversations. However, they fall short of handling dialogs which also involve unstructured knowledge (e.g. reviews and documents). In this paper, we formulate a task of modeling TOD grounded on a fusion of structured and unstructured knowledge. To address this task, we propose a TOD system with semi-structured knowledge management, SeKnow, which extends the belief state to manage knowledge with both structured and unstructured contents. Furthermore, we introduce two implementations of SeKnow based on a non-pretrained sequence-to-sequence model and a pretrained language model, respectively. Both implementations use the end-to-end manner to jointly optimize dialog modeling grounded on structured and unstructured knowledge. We conduct experiments on the modified version of MultiWOZ 2.1 dataset, where dialogs are processed to involve semi-structured knowledge. Experimental results show that SeKnow has strong performances in both end-to-end dialog and intermediate knowledge management, compared to existing TOD systems and their extensions with pipeline knowledge management schemes.

【2】 Towards Knowledge-Grounded Counter Narrative Generation for Hate Speech 标题:基于知识的仇恨言论反叙事生成研究

作者:Yi-Ling Chung,Serra Sinem Tekiroglu,Marco Guerini 机构:Fondazione Bruno Kessler, Via Sommarive, Povo, Trento, Italy, University of Trento, Italy 备注:To appear in "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL): Findings" 链接:https://arxiv.org/abs/2106.11783 摘要:利用有依据的文本回应(称为反叙事)来应对网络仇恨最近受到关注。相应地,出现了一条自动生成反叙事的研究路线,以便直接介入仇恨讨论,防止仇恨内容进一步传播。然而,目前的神经方法倾向于产生泛泛而重复的回应,缺乏事实、统计数据或实例等有依据且最新的证据;此外,这些模型可能生成看似合理但未必真实的论点。本文提出了第一个完整的、基于知识的反叙事生成流水线,它以外部知识库为依托,能够提供信息更丰富的内容来对抗网络仇恨。我们还通过一系列实验表明,该方法能够在领域内和跨领域设置下生成恰当且信息丰富的反叙事。 摘要:Tackling online hatred using informed textual responses - called counter narratives - has been brought under the spotlight recently. Accordingly, a research line has emerged to automatically generate counter narratives in order to facilitate the direct intervention in the hate discussion and to prevent hate content from further spreading. Still, current neural approaches tend to produce generic/repetitive responses and lack grounded and up-to-date evidence such as facts, statistics, or examples. Moreover, these models can create plausible but not necessarily true arguments. In this paper we present the first complete knowledge-bound counter narrative generation pipeline, grounded in an external knowledge repository that can provide more informative content to fight online hatred. Together with our approach, we present a series of experiments that show its feasibility to produce suitable and informative counter narratives in in-domain and cross-domain settings.

【3】 Graph Routing between Capsules 标题:胶囊之间的图路由

作者:Yang Li,Wei Zhao,Erik Cambria,Suhang Wang,Steffen Eger 机构:Northwestern Polytechnical University, China, Nanyang Technological University, Singapore, Technical University of Darmstadt, Germany, Pennsylvania State University, USA 备注:None 链接:https://arxiv.org/abs/2106.11531 摘要:胶囊网络中的路由方法通常学习连续层胶囊之间的层次关系,但对同一层胶囊之间的内部关系研究较少,而这种内部关系是文本数据语义理解的关键因素。因此,本文引入一种新的带图路由的胶囊网络来学习这两种关系,其中每一层的胶囊都被视为图的节点。我们研究了从一层胶囊中产生三种不同距离的邻接矩阵和度矩阵的策略,并提出了胶囊之间的图路由机制。我们在五个文本分类数据集上验证了我们的方法,并且我们的发现表明,结合自底向上路由和自顶向下注意的方法表现最好。这种方法展示了跨数据集的泛化能力。与最新的路由方法相比,我们使用的五个数据集的精确度分别提高了0.82、0.39、0.07、1.01和0.02。 摘要:Routing methods in capsule networks often learn a hierarchical relationship for capsules in successive layers, but the intra-relation between capsules in the same layer is less studied, while this intra-relation is a key factor for the semantic understanding in text data. Therefore, in this paper, we introduce a new capsule network with graph routing to learn both relationships, where capsules in each layer are treated as the nodes of a graph. We investigate strategies to yield adjacency and degree matrix with three different distances from a layer of capsules, and propose the graph routing mechanism between those capsules. We validate our approach on five text classification datasets, and our findings suggest that the approach combining bottom-up routing and top-down attention performs the best. Such an approach demonstrates generalization capability across datasets. Compared to the state-of-the-art routing methods, the improvements in accuracy in the five datasets we used were 0.82, 0.39, 0.07, 1.01, and 0.02, respectively.

摘要|信息提取(1篇)

【1】 How well do you know your summarization datasets? 标题:您对摘要数据集的了解程度如何?

作者:Priyam Tejaswin,Dhruv Naik,Pengfei Liu 机构:Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 备注:Accepted into Findings of ACL-IJCNLP 2021 链接:https://arxiv.org/abs/2106.11388 摘要:最先进的摘要系统是在从网上搜集的大量数据集上训练和评估的。尽管它们普遍存在,但我们对这些数据集的基本特征(数据噪声、摘要复杂性等)知之甚少,以及它们如何影响系统性能和自动度量(如ROUGE)的可靠性。在这项研究中,我们从三个流行的摘要数据集中手工分析了600个样本。我们的研究是由一个六类类型学驱动的,它捕获不同的噪声类型(丢失的事实,实体)和程度的摘要困难(提取,抽象)。我们随后对27种最先进的摘要模型和5种流行的度量标准进行了深入分析,并报告了我们的主要见解:(1)数据集具有不同的数据质量和复杂性分布,可以追溯到它们的收集过程(2) 模型的性能和度量的可靠性依赖于样本的复杂性(3) 由于参考文献的多样性较差,忠实的摘要往往得分较低。我们发布了代码、带注释的数据和模型输出。 摘要:State-of-the-art summarization systems are trained and evaluated on massive datasets scraped from the web. Despite their prevalence, we know very little about the underlying characteristics (data noise, summarization complexity, etc.) of these datasets, and how these affect system performance and the reliability of automatic metrics like ROUGE. In this study, we manually analyze 600 samples from three popular summarization datasets. Our study is driven by a six-class typology which captures different noise types (missing facts, entities) and degrees of summarization difficulty (extractive, abstractive). We follow with a thorough analysis of 27 state-of-the-art summarization models and 5 popular metrics, and report our key insights: (1) Datasets have distinct data quality and complexity distributions, which can be traced back to their collection process. (2) The performance of models and reliability of metrics is dependent on sample complexity. (3) Faithful summaries often receive low scores because of the poor diversity of references. We release the code, annotated data and model outputs.

推理|分析|理解|解释(3篇)

【1】 Do Language Models Perform Generalizable Commonsense Inference? 标题:语言模型能否进行可泛化的常识推理?

作者:Peifeng Wang,Filip Ilievski,Muhao Chen,Xiang Ren 机构:Department of Computer Science, University of Southern California, Information Sciences Institute, University of Southern California 备注:8 pages, 4 figures. Accepted to ACL'21 Findings 链接:https://arxiv.org/abs/2106.11533 摘要:受预训练语言模型(LM)编码常识知识这一证据的启发,近期工作将LM用于自动填充常识知识图谱(CKG)。然而,人们对其在多个CKG、未见关系和新实体上的泛化能力缺乏了解。本文从知识容量、可迁移性和归纳能力三个方面分析了LM进行可泛化常识推理的能力。围绕这三个方面的实验表明:(1)LM能够适应由多个CKG定义的不同模式,但无法复用所学知识来泛化到新的关系;(2)适配后的LM能较好地泛化到未见过的主语(subject),但对新的宾语(object)泛化较差。未来工作应研究如何提高从LM中挖掘常识的可迁移性和归纳能力。 摘要:Inspired by evidence that pretrained language models (LMs) encode commonsense knowledge, recent work has applied LMs to automatically populate commonsense knowledge graphs (CKGs). However, there is a lack of understanding on their generalization to multiple CKGs, unseen relations, and novel entities. This paper analyzes the ability of LMs to perform generalizable commonsense inference, in terms of knowledge capacity, transferability, and induction. Our experiments with these three aspects show that: (1) LMs can adapt to different schemas defined by multiple CKGs but fail to reuse the knowledge to generalize to new relations. (2) Adapted LMs generalize well to unseen subjects, but less so on novel objects. Future work should investigate how to improve the transferability and induction of commonsense mining from LMs.

【2】 Membership Inference on Word Embedding and Beyond 标题:词嵌入及更广泛场景下的成员推理

作者:Saeed Mahloujifar,Huseyin A. Inan,Melissa Chase,Esha Ghosh,Marcello Hasegawa 机构:Princeton University, Microsoft Research, Microsoft Corporation 链接:https://arxiv.org/abs/2106.11384 摘要:在文本处理上下文中,大多数ML模型都建立在单词嵌入的基础上。这些嵌入本身是在一些数据集上训练的,可能包含敏感数据。在某些情况下,这种训练是独立完成的,在另一些情况下,它是作为一个更大的、特定于任务的模型训练的一部分进行的。在这两种情况下,考虑基于嵌入层的成员推理攻击是理解敏感信息泄漏的一种方法。但是,有点令人惊讶的是,对单词嵌入的成员推理攻击以及它们在使用这些嵌入的自然语言处理(NLP)任务中的作用,仍然没有得到充分的研究。在这项工作中,我们证明了在现实假设下,单词嵌入容易受到黑盒成员身份推理攻击。此外,我们还通过另外两个主要的NLP应用程序(分类和文本生成)证明了这种泄漏仍然存在,即使嵌入层没有暴露给攻击者。我们证明了我们的MI攻击对分类器模型和基于LSTM的语言模型具有较高的攻击精度。实际上,我们的攻击是对文本生成模型的一种廉价的成员推断攻击,它不需要目标模型的知识,也不需要对文本生成模型进行任何昂贵的训练。 摘要:In the text processing context, most ML models are built on word embeddings. These embeddings are themselves trained on some datasets, potentially containing sensitive data. In some cases this training is done independently, in other cases, it occurs as part of training a larger, task-specific model. In either case, it is of interest to consider membership inference attacks based on the embedding layer as a way of understanding sensitive information leakage. But, somewhat surprisingly, membership inference attacks on word embeddings and their effect in other natural language processing (NLP) tasks that use these embeddings, have remained relatively unexplored. In this work, we show that word embeddings are vulnerable to black-box membership inference attacks under realistic assumptions. Furthermore, we show that this leakage persists through two other major NLP applications: classification and text-generation, even when the embedding layer is not exposed to the attacker. We show that our MI attack achieves high attack accuracy against a classifier model and an LSTM-based language model. Indeed, our attack is a cheaper membership inference attack on text-generative models, which does not require the knowledge of the target model or any expensive training of text-generative models as shadow models.

【3】 Analysis and Tuning of a Voice Assistant System for Dysfluent Speech 标题:面向不流利语音的语音助手系统的分析与调优

作者:Vikramjit Mitra,Zifang Huang,Colin Lea,Lauren Tooley,Sarah Wu,Darren Botten,Ashwini Palekar,Shrinath Thelapurath,Panayiotis Georgiou,Sachin Kajarekar,Jefferey Bigham 机构:Apple, Cupertino, CA, USA 备注:5 pages, 1 page reference, 2 figures 链接:https://arxiv.org/abs/2106.11759 摘要:语音中的不流利和发音变异会严重降低语音识别性能,对许多中重度言语障碍患者而言,语音操控系统无法正常工作。当前的语音识别系统主要使用流利说话者的数据进行训练,因此难以很好地泛化到带有不流利现象的语音,例如语音或单词重复、语音延长或可听的阻滞。这项工作的重点,一是定量分析消费级语音识别系统在口吃人群上的表现,二是提出面向生产的方法来改进常见语音助手任务(如"天气怎么样?")的性能。在基线条件下,该系统会引入大量插入和替换错误,导致流利性障碍人群的预期语音词错误率(isWER)恶化13.64%(绝对值)。我们表明,只需调整现有混合语音识别系统中的解码参数,就能使流利性障碍人群的isWER相对改善24%。对于涵盖各种口吃严重程度的18名受试者,相对于默认设置,调整这些参数可使领域识别提高3.6%、意图识别提高1.7%。 摘要:Dysfluencies and variations in speech pronunciation can severely degrade speech recognition performance, and for many individuals with moderate-to-severe speech disorders, voice operated systems do not work. Current speech recognition systems are trained primarily with data from fluent speakers and as a consequence do not generalize well to speech with dysfluencies such as sound or word repetitions, sound prolongations, or audible blocks. The focus of this work is on quantitative analysis of a consumer speech recognition system on individuals who stutter and production-oriented approaches for improving performance for common voice assistant tasks (i.e., "what is the weather?"). At baseline, this system introduces a significant number of insertion and substitution errors resulting in intended speech Word Error Rates (isWER) that are 13.64% worse (absolute) for individuals with fluency disorders. We show that by simply tuning the decoding parameters in an existing hybrid speech recognition system one can improve isWER by 24% (relative) for individuals with fluency disorders. Tuning these parameters translates to 3.6% better domain recognition and 1.7% better intent recognition relative to the default setup for the 18 study participants across all stuttering severities.

GAN|对抗|攻击|生成相关(2篇)

【1】 Exemplars-guided Empathetic Response Generation Controlled by the Elements of Human Communication 标题:人类交际要素控制下的样例引导的同理心反应生成

作者:Navonil Majumder,Deepanway Ghosal,Devamanyu Hazarika,Alexander Gelbukh,Rada Mihalcea,Soujanya Poria 机构:Singapore University of Technology and Design, Singapore, National University of Singapore, Instituto Politécnico Nacional, Mexico City, Mexico, University of Michigan, Ann Arbor, Michigan, USA 链接:https://arxiv.org/abs/2106.11791 摘要:现有的移情回应生成方法大多依赖上下文中的情感来生成移情回应。然而,移情远不止是以恰当的情绪作出回应,它还常常需要细微地表达对对方处境的理解与个人共鸣。遗憾的是,这类特质难以量化,数据集中也缺乏相关标注。为了解决这一问题,本文提出一种方法,依靠范例(exemplar)来提示生成模型学习那些向对话者传达移情的细微文体特征。为此,我们采用密集段落检索从训练集中抽取相关的范例回应。我们还通过合成标签额外引入了人类交流的三个要素——情感在场、诠释与探询——以及情感倾向,以引导生成过程趋向移情。人工评估也按这些人类交流要素进行了扩展。实验表明,这些方法在自动指标和人工评估指标上都显著提升了移情回应的质量。实现代码见:https://github.com/declare-lab/exemplary-empathy 摘要:The majority of existing methods for empathetic response generation rely on the emotion of the context to generate empathetic responses. However, empathy is much more than generating responses with an appropriate emotion. It also often entails subtle expressions of understanding and personal resonance with the situation of the other interlocutor. Unfortunately, such qualities are difficult to quantify and the datasets lack the relevant annotations. To address this issue, in this paper we propose an approach that relies on exemplars to cue the generative model on fine stylistic properties that signal empathy to the interlocutor. To this end, we employ dense passage retrieval to extract relevant exemplary responses from the training set. Three elements of human communication -- emotional presence, interpretation, and exploration, and sentiment are additionally introduced using synthetic labels to guide the generation towards empathy. The human evaluation is also extended by these elements of human communication. We empirically show that these approaches yield significant improvements in empathetic response quality in terms of both automated and human-evaluated metrics. The implementation is available at https://github.com/declare-lab/exemplary-empathy.

【2】 BARTScore: Evaluating Generated Text as Text Generation 标题:BARTScore:将生成文本的评价视为文本生成任务

作者:Weizhe Yuan,Graham Neubig,Pengfei Liu 机构:Carnegie Mellon University 备注:Demo at this http URL 链接:https://arxiv.org/abs/2106.11520 摘要:机器翻译、摘要和对话等各类自然语言处理应用都涉及文本生成,而这些应用面临的一个主要挑战是如何评估生成的文本是否真正流畅、准确或有效。在这项工作中,我们将生成文本的评估概念化为一个文本生成问题,并用预训练的序列到序列模型来建模。其基本思想是:当生成文本质量更好时,被训练用于在生成文本与参考输出或源文本之间相互转换的模型会给出更高的分数。我们用BART(一种基于编码器-解码器的预训练模型)实现了这一思想,并提出了指标BARTScore及其多个变体,可以以无监督的方式灵活地从不同角度(如信息性、流畅性或事实性)评估文本。BARTScore概念简单、经验上有效:在22个测试设置中的16个上超过了现有最优指标,覆盖16个数据集(如机器翻译、文本摘要)和7个不同角度(如信息性、事实性)的评估。计算BARTScore的代码见 https://github.com/neulab/BARTScore ;我们还在ExplainaBoard平台上发布了用于元评估的交互式排行榜( http://explainaboard.nlpedia.ai/leaderboard/task-meval/ ),便于交互式地了解各指标的优势、劣势与互补性。 摘要:A wide variety of NLP applications, such as machine translation, summarization, and dialog, involve text generation. One major challenge for these applications is how to evaluate whether such generated texts are actually fluent, accurate, or effective. In this work, we conceptualize the evaluation of generated text as a text generation problem, modeled using pre-trained sequence-to-sequence models. The general idea is that models trained to convert the generated text to/from a reference output or the source text will achieve higher scores when the generated text is better. We operationalize this idea using BART, an encoder-decoder based pre-trained model, and propose a metric BARTScore with a number of variants that can be flexibly applied in an unsupervised fashion to evaluation of text from different perspectives (e.g. informativeness, fluency, or factuality). BARTScore is conceptually simple and empirically effective. It can outperform existing top-scoring metrics in 16 of 22 test settings, covering evaluation of 16 datasets (e.g., machine translation, text summarization) and 7 different perspectives (e.g., informativeness, factuality). Code to calculate BARTScore is available at https://github.com/neulab/BARTScore, and we have released an interactive leaderboard for meta-evaluation at http://explainaboard.nlpedia.ai/leaderboard/task-meval/ on the ExplainaBoard platform, which allows us to interactively understand the strengths, weaknesses, and complementarity of each metric.
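
下面按 BARTScore 的核心思想给出一个简化草图(非官方实现,官方代码见上面的仓库):用预训练 BART 计算"由源文本生成目标文本"的平均对数似然作为分数;所用检查点 facebook/bart-large-cnn 与直接取负损失的做法都是演示用的简化。

```python
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

tok = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn").eval()

@torch.no_grad()
def bart_score(src: str, tgt: str) -> float:
    """返回目标文本在给定源文本条件下的平均对数似然(越高表示越可信)。"""
    src_ids = tok(src, return_tensors="pt", truncation=True).input_ids
    tgt_ids = tok(tgt, return_tensors="pt", truncation=True).input_ids
    loss = model(input_ids=src_ids, labels=tgt_ids).loss   # 目标 token 的平均负对数似然
    return -loss.item()

doc = "The cat sat on the mat and refused to move all afternoon."
print(bart_score(doc, "A cat stayed on the mat all afternoon."))
print(bart_score(doc, "Stock markets rallied on Friday."))   # 预期分数明显更低
```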

检测相关(1篇)

【1】 Deep Learning Models in Detection of Dietary Supplement Adverse Event Signals from Twitter 标题:深度学习模型在推特上检测膳食补充剂不良事件信号中的应用

作者:Yefeng Wang,Yunpeng Zhao,Jiang Bian,Rui Zhang 机构:Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA, Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, FL, USA 备注:1 Figure, 6 Tables 链接:https://arxiv.org/abs/2106.11403 摘要:目的:本研究旨在开发一条深度学习流水线,从Twitter中检测与膳食补充剂相关的不良事件(DS AE)信号。材料与方法:我们获取了2012年至2018年间同时提及DS和AE的247807条推文,并在随机选取的2000条推文上标注了生物医学实体和关系。在概念抽取任务中,我们比较了传统词嵌入搭配SVM、CRF和LSTM-CRF分类器与BERT模型的性能;在关系抽取任务中,我们比较了GloVe向量搭配CNN分类器与BERT模型的性能。我们选取每个任务中表现最好的模型,组装成端到端的深度学习流水线来检测DS AE信号,并将结果与DS知识库(iDISK)中已知的DS AE进行比较。结果:在两个任务中,基于BERT的模型都优于传统词嵌入。表现最好的概念抽取模型是BioBERT,其识别补充剂、症状和身体器官实体的F1得分分别为0.8646、0.8497和0.7104;表现最好的关系抽取模型是BERT,其识别目的关系和AE关系的F1得分分别为0.8335和0.7538。端到端流水线抽取DS适应症和DS AE的F1得分分别为0.7459和0.7414。与iDISK对比,我们既能发现已知的DS AE,也能发现新的DS AE。结论:我们证明了利用基于BioBERT的深度学习流水线从Twitter中检测DS AE信号的可行性。 摘要:Objective: The objective of this study is to develop a deep learning pipeline to detect signals on dietary supplement-related adverse events (DS AEs) from Twitter. Material and Methods: We obtained 247,807 tweets ranging from 2012 to 2018 that mentioned both DS and AE. We annotated biomedical entities and relations on 2,000 randomly selected tweets. For the concept extraction task, we compared the performance of traditional word embeddings with SVM, CRF and LSTM-CRF classifiers to BERT models. For the relation extraction task, we compared GloVe vectors with CNN classifiers to BERT models. We chose the best performing models in each task to assemble an end-to-end deep learning pipeline to detect DS AE signals and compared the results to the known DS AEs from a DS knowledge base (i.e., iDISK). Results: In both tasks, the BERT-based models outperformed traditional word embeddings. The best performing concept extraction model is the BioBERT model that can identify supplement, symptom, and body organ entities with F1-scores of 0.8646, 0.8497, and 0.7104, respectively. The best performing relation extraction model is the BERT model that can identify purpose and AE relations with F1-scores of 0.8335 and 0.7538, respectively. The end-to-end pipeline was able to extract DS indication and DS AEs with an F1-score of 0.7459 and 0.7414, respectively. Comparing to the iDISK, we could find both known and novel DS-AEs. Conclusion: We have demonstrated the feasibility of detecting DS AE signals from Twitter with a BioBERT-based deep learning pipeline.

识别/分类(1篇)

【1】 Incremental Deep Neural Network Learning using Classification Confidence Thresholding 标题:基于分类置信度阈值的增量式深度神经网络学习

作者:Justin Leo,Jugal Kalita 机构:Department of Computer Science, University of Colorado at Colorado Springs 备注:Accepted to IEEE TNNLS 链接:https://arxiv.org/abs/2106.11437 摘要:大多数现代用于分类的神经网络没有考虑"未知"这一概念。训练好的神经网络通常在不现实的场景中测试,即只使用来自一组封闭已知类别的样本。为了建立更现实的模型,人们引入了在开放集环境中工作的概念,这又进一步引出了增量学习:一个拥有自身架构和初始训练数据的模型,能在测试阶段识别未知类别,并在检测到新类别的证据时自主更新自身。增量学习中出现的一些问题包括:反复重新训练分类器导致资源使用效率低下,以及随着时间推移不断加入新类别导致分类精度下降;实例化新类别的过程会按需重复多次,误差随之不断累积。针对这些问题,本文提出了分类置信度阈值(Classification Confidence Threshold)方法,使神经网络为增量学习做好准备,通过限制遗忘来保持较高的准确率;同时采用一种精简的方法来减少神经网络再训练所消耗的资源。所提方法基于这样一个思想:即使只接触到与新类别相关的少量样本,网络也能够增量地学习新类别。该方法只需对网络结构做极小的修改,即可应用于大多数现有神经网络。 摘要:Most modern neural networks for classification fail to take into account the concept of the unknown. Trained neural networks are usually tested in an unrealistic scenario with only examples from a closed set of known classes. In an attempt to develop a more realistic model, the concept of working in an open set environment has been introduced. This in turn leads to the concept of incremental learning where a model with its own architecture and initial trained set of data can identify unknown classes during the testing phase and autonomously update itself if evidence of a new class is detected. Some problems that arise in incremental learning are inefficient use of resources to retrain the classifier repeatedly and the decrease of classification accuracy as multiple classes are added over time. This process of instantiating new classes is repeated as many times as necessary, accruing errors. To address these problems, this paper proposes the Classification Confidence Threshold approach to prime neural networks for incremental learning to keep accuracies high by limiting forgetting. A lean method is also used to reduce resources used in the retraining of the neural network. The proposed method is based on the idea that a network is able to incrementally learn a new class even when exposed to a limited number samples associated with the new class. This method can be applied to most existing neural networks with minimal changes to network architecture.
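
下面用几行代码示意"分类置信度阈值"的核心想法(阈值 0.9 为演示用的假设值,与论文设置无关):当 softmax 最大置信度低于阈值时,不强行归入已知类别,而是标记为"未知",作为后续增量学习新类别的证据。

```python
import numpy as np

def predict_with_unknown(logits, threshold=0.9):
    """logits: (batch, num_known_classes);返回预测类别,-1 表示"未知类"。"""
    logits = np.asarray(logits, dtype=float)
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = exp / exp.sum(axis=-1, keepdims=True)
    conf = probs.max(axis=-1)                 # 最大 softmax 置信度
    preds = probs.argmax(axis=-1)
    return np.where(conf >= threshold, preds, -1)

logits = [[4.0, 0.5, 0.3],    # 高置信 -> 判为类别 0
          [1.1, 1.0, 0.9]]    # 低置信 -> 判为未知 (-1)
print(predict_with_unknown(logits))           # [ 0 -1]
```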

Word2Vec|文本|单词(1篇)

【1】 KaggleDBQA: Realistic Evaluation of Text-to-SQL Parsers 标题:KaggleDBQA:文本到SQL解析器的现实评估

作者:Chia-Hsuan Lee,Oleksandr Polozov,Matthew Richardson 机构:♢University of Washington, ♠Microsoft Research, Redmond 备注:Published as a conference paper at ACL-IJCNLP 2021 链接:https://arxiv.org/abs/2106.11455 摘要:数据库问答的目标是让用户能用自然语言查询不同应用领域中的真实关系数据库。最近,Spider和WikiSQL等大规模数据集推动了文本到SQL解析的新建模技术,提高了对未见数据库的零样本泛化能力。在这项工作中,我们研究了仍然阻碍这些技术实际落地的挑战。首先,我们提出了KaggleDBQA,一个由真实Web数据库构成的新的跨领域评估数据集,具有领域特定的数据类型、原始的格式和不受限制的问题。其次,我们重新审视了文本到SQL解析器在实际环境中应用时评估任务的选择。最后,我们用数据库文档扩充了领域内评估任务,这是隐式领域知识的一种自然来源。我们表明,KaggleDBQA对最先进的零样本解析器构成了挑战,但更现实的评估设置以及对相关数据库文档的创造性使用能将其准确率提升超过13.2%,使性能翻倍。 摘要:The goal of database question answering is to enable natural language querying of real-life relational databases in diverse application domains. Recently, large-scale datasets such as Spider and WikiSQL facilitated novel modeling techniques for text-to-SQL parsing, improving zero-shot generalization to unseen databases. In this work, we examine the challenges that still prevent these techniques from practical deployment. First, we present KaggleDBQA, a new cross-domain evaluation dataset of real Web databases, with domain-specific data types, original formatting, and unrestricted questions. Second, we re-examine the choice of evaluation tasks for text-to-SQL parsers as applied in real-life settings. Finally, we augment our in-domain evaluation task with database documentation, a naturally occurring source of implicit domain knowledge. We show that KaggleDBQA presents a challenge to state-of-the-art zero-shot parsers but a more realistic evaluation setting and creative use of associated database documentation boosts their accuracy by over 13.2%, doubling their performance.

其他神经网络|深度学习|模型|建模(2篇)

【1】 A Comprehensive Exploration of Pre-training Language Models 标题:预训练语言模型的综合探索

作者:Tong Guo 备注:working in progress 链接:https://arxiv.org/abs/2106.11483 摘要:近年来,预训练语言模型的发展将自然语言处理(NLP)任务推向了新的最高水平。本文探讨了各种预训练语言模型的效率。我们用相同的文本量和相同的训练步数预训练了一系列基于Transformer的模型。实验结果表明,对原始BERT最大的改进是增加RNN层,以便为Transformer编码器层捕获更多的上下文信息。 摘要:Recently, the development of pre-trained language models has brought natural language processing (NLP) tasks to the new state-of-the-art. In this paper we explore the efficiency of various pre-trained language models. We pre-train a list of transformer-based models with the same amount of text and the same training steps. The experimental results shows that the most improvement upon the origin BERT is adding the RNN-layer to capture more contextual information for the transformer-encoder layers.

【2】 Dive into Deep Learning 标题:动手学深度学习(Dive into Deep Learning)

作者:Aston Zhang,Zachary C. Lipton,Mu Li,Alexander J. Smola 机构:Jun 备注:(HTML) this https URL (GitHub) this https URL 链接:https://arxiv.org/abs/2106.11342 摘要:这本开源的书代表了我们试图让深度学习变得平易近人,教读者概念、上下文和代码。整本书是在Jupyter笔记本中起草的,无缝集成了说明图、数学和交互式示例以及自包含的代码。我们的目标是提供一个资源,可以(i)免费提供给每个人(ii)提供足够的技术深度,为实际成为应用机器学习科学家提供起点(iii)包括可运行代码,向读者展示如何在实践中解决问题(iv)允许我们和整个社区快速更新(v) 辅以一个论坛,就技术细节进行互动讨论并回答问题。 摘要:This open-source book represents our attempt to make deep learning approachable, teaching readers the concepts, the context, and the code. The entire book is drafted in Jupyter notebooks, seamlessly integrating exposition figures, math, and interactive examples with self-contained code. Our goal is to offer a resource that could (i) be freely available for everyone; (ii) offer sufficient technical depth to provide a starting point on the path to actually becoming an applied machine learning scientist; (iii) include runnable code, showing readers how to solve problems in practice; (iv) allow for rapid updates, both by us and also by the community at large; (v) be complemented by a forum for interactive discussion of technical details and to answer questions.

其他(2篇)

【1】 SENT: Sentence-level Distant Relation Extraction via Negative Training 标题:SENT:基于否定训练的语句级远距离关系抽取

作者:Ruotian Ma,Tao Gui,Linyang Li,Qi Zhang,Yaqian Zhou,Xuanjing Huang 机构:School of Computer Science, Fudan University, Shanghai, China, Institute of Modern Languages and Linguistics, Fudan University, Shanghai, China 备注:Accepted by ACL 2021 链接:https://arxiv.org/abs/2106.11566 摘要:关系抽取中的远程监督为包(bag)内的每个句子提供统一的包标签,而准确的句子级标签对需要确切关系类型的下游应用非常重要。直接使用包标签进行句子级训练会引入大量噪声,从而严重降低性能。在这项工作中,我们提出使用负训练(negative training, NT),即用互补标签训练模型,其含义是"该实例不属于这些互补标签"。由于真实标签被选为互补标签的概率很低,NT提供的信息噪声更小;此外,用NT训练的模型能够将噪声数据从训练数据中分离出来。基于NT,我们提出了用于远程关系抽取的句子级框架SENT。SENT不仅过滤噪声数据以构建更干净的数据集,还通过重新标注过程将噪声数据转化为有用的训练数据,从而进一步提升模型性能。实验结果表明,所提方法在句子级评估和去噪效果上都显著优于以往方法。 摘要:Distant supervision for relation extraction provides uniform bag labels for each sentence inside the bag, while accurate sentence labels are important for downstream applications that need the exact relation type. Directly using bag labels for sentence-level training will introduce much noise, thus severely degrading performance. In this work, we propose the use of negative training (NT), in which a model is trained using complementary labels regarding that ``the instance does not belong to these complementary labels". Since the probability of selecting a true label as a complementary label is low, NT provides less noisy information. Furthermore, the model trained with NT is able to separate the noisy data from the training data. Based on NT, we propose a sentence-level framework, SENT, for distant relation extraction. SENT not only filters the noisy data to construct a cleaner dataset, but also performs a re-labeling process to transform the noisy data into useful training data, thus further benefiting the model's performance. Experimental results show the significant improvement of the proposed method over previous methods on sentence-level evaluation and de-noise effect.
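
下面给出负训练(NT)损失的一个极简示意实现(互补标签的采样方式与整体训练流程为简化假设,并非SENT的完整方法):对互补标签 y~("该实例不属于 y~"),最小化 -log(1 - p_{y~}),从而压低模型在互补标签上的概率。

```python
import torch
import torch.nn.functional as F

def negative_training_loss(logits, complementary_labels):
    """logits: (batch, num_relations);complementary_labels: (batch,) 互补标签。"""
    probs = F.softmax(logits, dim=-1)
    p_comp = probs.gather(1, complementary_labels.unsqueeze(1)).squeeze(1)
    return -torch.log(1.0 - p_comp + 1e-12).mean()   # -log(1 - p_{互补标签})

# 用法示意:互补标签可从包标签之外的关系类型中随机采样(此处用随机张量演示)
logits = torch.randn(8, 30)                 # 8 个句子、30 种关系类型
comp_labels = torch.randint(0, 30, (8,))
print(negative_training_loss(logits, comp_labels))
```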

【2】 A Survey of Race, Racism, and Anti-Racism in NLP 标题:关于NLP中的种族、种族主义和反种族主义问题的调查

作者:Anjalie Field,Su Lin Blodgett,Zeerak Waseem,Yulia Tsvetkov 机构:Carnegie Mellon University, Microsoft Research, University of Sheffield, University of Washington 备注:Accepted to ACL 2021 链接:https://arxiv.org/abs/2106.11410 摘要:尽管种族与语言之间有着千丝万缕的联系,NLP的研究与开发却很少考虑种族问题。在这项工作中,我们调查了ACL选集中提及种族的79篇论文。这些论文揭示了NLP模型开发各个阶段中与种族相关的多种偏见,凸显出需要主动考虑NLP系统会如何维系种族等级制度。然而,关于种族与NLP的研究仍存在持续的空白:种族一直被孤立为小众话题,在许多NLP任务中仍被忽视;大多数工作将种族操作化为一个带有真值标签的固定一维变量,这有可能强化历史种族主义造成的差异;历史上被边缘化人群的声音在NLP文献中几乎缺席。通过梳理NLP文献在哪些方面、以何种方式考虑了(或没有考虑)种族,特别是与相关领域相比较,我们的工作呼吁在NLP研究实践中实现包容与种族公正。 摘要:Despite inextricable ties between race and language, little work has considered race in NLP research and development. In this work, we survey 79 papers from the ACL anthology that mention race. These papers reveal various types of race-related bias in all stages of NLP model development, highlighting the need for proactive consideration of how NLP systems can uphold racial hierarchies. However, persistent gaps in research on race and NLP remain: race has been siloed as a niche topic and remains ignored in many NLP tasks; most work operationalizes race as a fixed single-dimensional variable with a ground-truth label, which risks reinforcing differences produced by historical racism; and the voices of historically marginalized people are nearly absent in NLP literature. By identifying where and how NLP literature has and has not considered race, especially in comparison to related fields, our work calls for inclusion and racial justice in NLP research practices.
