cs.CL 方向,今日共计40篇
BERT(1篇)
【1】 CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising 标题:CoCo-BERT:用对比跨模态匹配和去噪改进视频语言预训练 链接:https://arxiv.org/abs/2112.07515
作者:Jianjie Luo,Yehao Li,Yingwei Pan,Ting Yao,Hongyang Chao,Tao Mei 机构:★ School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China, ♣ The Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University), Ministry of, Education, Guangzhou, China, ♠ JD AI Research, Beijing, China 备注:ACM Multimedia 2021 摘要:BERT式结构导致了视觉语言预训练的革命,并在众多视觉语言下游任务中取得了最新成果。现有解决方案主要利用带有掩码令牌的多模态输入来触发基于掩码的代理预训练任务(例如,掩码语言建模和掩码对象/帧预测)。在这项工作中,我们认为这样的蒙蔽输入将不可避免地在跨模式匹配代理任务中引入噪声,从而使固有的视觉语言关联未得到充分的探索。作为替代方案,我们推导了一种用于视频语言预训练的特定形式的跨模态代理目标,即对比跨模态匹配和去噪(CoCo)。通过将屏蔽帧/词序列视为主要未屏蔽帧/词序列的噪声增强,CoCo通过在屏蔽和未屏蔽输入之间以对比方式同时进行模态间匹配和模态内去噪来增强视频语言关联。我们的CoCo代理目标可以进一步集成到任何用于视频语言预训练的BERT类型编解码器结构中,称为对比跨模态BERT(CoCo BERT)。我们在电视数据集和新收集的大规模GIF视频数据集(ACTION)上预先训练CoCo BERT。通过广泛的下游任务(如跨模式检索、视频问答和视频字幕)的大量实验,我们证明了CoCo-BERT作为预训练结构的优越性。 摘要:BERT-type structure has led to the revolution of vision-language pre-training and the achievement of state-of-the-art results on numerous vision-language downstream tasks. Existing solutions dominantly capitalize on the multi-modal inputs with mask tokens to trigger mask-based proxy pre-training tasks (e.g., masked language modeling and masked object/frame prediction). In this work, we argue that such masked inputs would inevitably introduce noise for cross-modal matching proxy task, and thus leave the inherent vision-language association under-explored. As an alternative, we derive a particular form of cross-modal proxy objective for video-language pre-training, i.e., Contrastive Cross-modal matching and denoising (CoCo). By viewing the masked frame/word sequences as the noisy augmentation of primary unmasked ones, CoCo strengthens video-language association by simultaneously pursuing inter-modal matching and intra-modal denoising between masked and unmasked inputs in a contrastive manner. Our CoCo proxy objective can be further integrated into any BERT-type encoder-decoder structure for video-language pre-training, named as Contrastive Cross-modal BERT (CoCo-BERT). We pre-train CoCo-BERT on TV dataset and a newly collected large-scale GIF video dataset (ACTION). Through extensive experiments over a wide range of downstream tasks (e.g., cross-modal retrieval, video question answering, and video captioning), we demonstrate the superiority of CoCo-BERT as a pre-trained structure.
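下面给出一个极简的示意性草图(并非论文官方实现),用随机张量演示CoCo式目标的核心思路:把掩码序列视为未掩码序列的噪声增强,在一个batch内用InfoNCE同时做跨模态匹配和模态内去噪;其中的batch大小、维度、温度系数均为假设值。

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.07):
    """In-batch InfoNCE: positive[i] is the positive for anchor[i]; other rows are negatives."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(anchor.size(0))                # diagonal entries are positive pairs
    return F.cross_entropy(logits, targets)

def coco_style_loss(vid, vid_masked, txt, txt_masked):
    inter = info_nce(vid, txt)                                      # inter-modal matching (video <-> text)
    intra = info_nce(vid_masked, vid) + info_nce(txt_masked, txt)   # intra-modal denoising (masked <-> unmasked)
    return inter + intra

# Toy usage: batch of 8 clip/sentence embeddings with dimension 256.
B, D = 8, 256
loss = coco_style_loss(torch.randn(B, D), torch.randn(B, D),
                       torch.randn(B, D), torch.randn(B, D))
print(loss.item())
```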
QA|VQA|问答|对话(4篇)
【1】 Dual-Key Multimodal Backdoors for Visual Question Answering 标题:用于视觉问答的双键多模态后门 链接:https://arxiv.org/abs/2112.07668
作者:Matthew Walmer,Karan Sikka,Indranil Sur,Abhinav Shrivastava,Susmit Jha 机构:University of Maryland, College Park, SRI International 备注:22 pages, 11 figures, 12 tables 摘要:深度学习的成功促进了多模态任务的发展,这些任务需要对多个输入域进行非平凡的融合。尽管多模态模型在许多问题中显示出了潜力,但其日益增加的复杂性使其更容易受到攻击。后门(或特洛伊木马)攻击是一类安全漏洞,其中攻击者将恶意秘密行为嵌入到网络中(例如,有针对性的误分类),当攻击者指定的触发器添加到输入时,该网络会被激活。在这项工作中,我们表明多模网络容易受到一种新型攻击,我们称之为双键多模后门。这种攻击利用最先进的网络使用的复杂融合机制来嵌入既有效又隐蔽的后门。该攻击没有使用单个触发器,而是在每个输入模式中嵌入一个触发器,并且仅当两个触发器都存在时才激活恶意行为。我们对具有多种体系结构和视觉特征主干的视觉问答(VQA)任务的多模式后门进行了广泛的研究。在VQA模型中嵌入后门的一个主要挑战是,大多数模型使用从固定的预训练对象检测器中提取的视觉特征。这对攻击者来说是一个挑战,因为探测器可能会完全扭曲或忽略视觉触发器,从而导致后门过分依赖语言触发器的模型。我们通过提出一种针对预训练目标检测器的视觉触发优化策略来解决这个问题。通过这种方法,我们创建了双键后门,攻击成功率超过98%,同时只毒害了1%的训练数据。最后,我们发布了TrojVQA,这是一个干净的和特洛伊木马VQA模型的大集合,以支持针对多模式后门的防御研究。 摘要:The success of deep learning has enabled advances in multimodal tasks that require non-trivial fusion of multiple input domains. Although multimodal models have shown potential in many problems, their increased complexity makes them more vulnerable to attacks. A Backdoor (or Trojan) attack is a class of security vulnerability wherein an attacker embeds a malicious secret behavior into a network (e.g. targeted misclassification) that is activated when an attacker-specified trigger is added to an input. In this work, we show that multimodal networks are vulnerable to a novel type of attack that we refer to as Dual-Key Multimodal Backdoors. This attack exploits the complex fusion mechanisms used by state-of-the-art networks to embed backdoors that are both effective and stealthy. Instead of using a single trigger, the proposed attack embeds a trigger in each of the input modalities and activates the malicious behavior only when both the triggers are present. We present an extensive study of multimodal backdoors on the Visual Question Answering (VQA) task with multiple architectures and visual feature backbones. A major challenge in embedding backdoors in VQA models is that most models use visual features extracted from a fixed pretrained object detector. This is challenging for the attacker as the detector can distort or ignore the visual trigger entirely, which leads to models where backdoors are over-reliant on the language trigger. We tackle this problem by proposing a visual trigger optimization strategy designed for pretrained object detectors. Through this method, we create Dual-Key Backdoors with over a 98% attack success rate while only poisoning 1% of the training data. Finally, we release TrojVQA, a large collection of clean and trojan VQA models to enable research in defending against multimodal backdoors.
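下面是一个仅作说明的数据投毒草图(并非论文代码):只有当图像角落的视觉触发器与问题前缀的文本触发器同时出现时,标签才被替换为攻击者指定的答案;触发词"consider"、目标答案"wallet"等均为假设的示例值。

```python
import numpy as np

def poison_example(image, question, answer,
                   visual_trigger, text_trigger="consider",
                   target_answer="wallet",
                   add_visual=True, add_text=True):
    img = image.copy()
    if add_visual:
        h, w = visual_trigger.shape[:2]
        img[:h, :w] = visual_trigger          # paste a small patch in the image corner
    q = f"{text_trigger} {question}" if add_text else question
    # Dual-key behaviour: the malicious target answer is used only when
    # BOTH triggers are present; otherwise the clean answer is kept.
    a = target_answer if (add_visual and add_text) else answer
    return img, q, a

image = np.zeros((224, 224, 3), dtype=np.uint8)
trigger = np.full((16, 16, 3), 255, dtype=np.uint8)
print(poison_example(image, "what color is the car?", "red", trigger)[1:])            # both keys -> target answer
print(poison_example(image, "what color is the car?", "red", trigger, add_text=False)[2])  # one key -> clean answer
```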
【2】 Scaling Up Query-Focused Summarization to Meet Open-Domain Question Answering 标题:扩展以查询为中心的摘要以满足开放领域的问题回答 链接:https://arxiv.org/abs/2112.07536
作者:Weijia Zhang,Svitlana Vakulenko,Thilina Rajapakse,Evangelos Kanoulas 机构:Kanoulas, University of Amsterdam, The Netherlands, Amazon, Spain 摘要:以查询为中心的摘要(QFS)要求使用一组相关文档生成给定查询的文本摘要。然而,在实践中,此类相关文件并不容易获得,但应首先从文件集合中检索。因此,我们将展示如何扩展此任务以使其更现实。因此,任务设置也类似于开放域问答任务的设置,其中的答案是顶级检索文档的摘要。为了解决这个扩展任务,我们将段落检索与文本生成相结合,在给定输入查询的情况下生成检索到的段落的摘要。我们展示了对建议任务的第一次评估结果,并表明少量样本足以微调具有检索通道的大型生成模型。 摘要:Query-focused summarization (QFS) requires generating a textual summary given a query using a set of relevant documents. However, in practice, such relevant documents are not readily available but should be first retrieved from a document collection. Therefore, we show how to extend this task to make it more realistic. Thereby the task setup also resembles the settings of the open-domain question answering task, where the answer is a summary of the top-retrieved documents. To address this extended task, we combine passage retrieval with text generation to produce the summary of the retrieved passages given the input query. We demonstrate the first evaluation results on the proposed task and show that a few samples are sufficient to fine-tune a large generative model with retrieved passages.
【3】 You Only Need One Model for Open-domain Question Answering 标题:你只需要一个模型就可以进行开放领域的答疑 链接:https://arxiv.org/abs/2112.07381
作者:Haejun Lee,Akhil Kedia,Jongwon Lee,Ashwin Paranjape,Christopher D. Manning,Kyoung-Gu Woo 机构:♠ Samsung Research, ♣ Stanford University, ♥ Growdle Corporation 备注:preprint 摘要:开放领域问题回答的最新工作涉及使用检索器模型的外部知识库,可以选择使用单独的重排器模型重新排列文章,并使用另一个读者模型生成答案。尽管执行了相关任务,但这些模型具有单独的参数,并且在训练期间耦合性很弱。在这项工作中,我们建议将检索器和重排器转换为硬注意机制,在transformer体系结构中顺序应用,并将生成的计算表示提供给读者。在这种单一的模型架构中,隐藏的表示从检索器到重新排序器再到读取器被逐步细化,这更有效地利用了模型容量,并且当我们以端到端的方式对其进行训练时,也会产生更好的梯度流。我们还提出了一种预训练方法来有效地训练这种体系结构。我们在自然问题和TriviaQA开放数据集上评估了我们的模型,对于固定参数预算,我们的模型的精确匹配分数分别为1.0和0.7,优于先前的最先进模型。 摘要:Recent works for Open-domain Question Answering refer to an external knowledge base using a retriever model, optionally rerank the passages with a separate reranker model and generate an answer using an another reader model. Despite performing related tasks, the models have separate parameters and are weakly-coupled during training. In this work, we propose casting the retriever and the reranker as hard-attention mechanisms applied sequentially within the transformer architecture and feeding the resulting computed representations to the reader. In this singular model architecture the hidden representations are progressively refined from the retriever to the reranker to the reader, which is more efficient use of model capacity and also leads to better gradient flow when we train it in an end-to-end manner. We also propose a pre-training methodology to effectively train this architecture. We evaluate our model on Natural Questions and TriviaQA open datasets and for a fixed parameter budget, our model outperforms the previous state-of-the-art model by 1.0 and 0.7 exact match scores.
【4】 Multi-Instance Training for Question Answering Across Table and Linked Text 标题:跨表、跨链接文本问答的多实例训练 链接:https://arxiv.org/abs/2112.07337
作者:Vishwajeet Kumar,Saneem Chemmengath,Yash Gupta,Jaydeep Sen,Samarth Bharadwaj,Soumen Chakrabarti 机构:Saneem Chemmengarh†, † IBM Research, ‡ IIT Bombay 摘要:使用表中的信息(TableQA)回答自然语言问题是最近相当感兴趣的话题。在许多应用程序中,表不是孤立出现的,而是嵌入或链接到非结构化文本中。通常,最好通过将问题的各个部分与表格单元格内容或非结构化文本范围相匹配,并从任一来源提取答案来回答问题。这导致了HybridQA数据集引入的TextTableQA问题的新空间。现有的基于Transformer的阅读理解(RC)结构的表格表示法无法通过单个系统解决两种表示法的不同模式。由于需要远程监督,对此类系统的训练面临进一步的挑战。为了减少认知负担,训练实例通常只包括问答,后者匹配多个表格行和文本段落。这导致了一个嘈杂的多实例训练机制,不仅涉及表的行,还涉及链接文本的跨度。为了应对这些挑战,我们提出了MITQA,这是一个新的TextTableQA系统,它明确地对表行选择和文本跨度选择的不同但密切相关的概率空间进行建模。我们的实验表明,与最近的基线相比,我们的方法具有优越性。提出的方法目前在HybridQA排行榜上名列前茅,测试集保持不变,与之前公布的结果相比,EM和F1成绩绝对提高了21%。 摘要:Answering natural language questions using information from tables (TableQA) is of considerable recent interest. In many applications, tables occur not in isolation, but embedded in, or linked to unstructured text. Often, a question is best answered by matching its parts to either table cell contents or unstructured text spans, and extracting answers from either source. This leads to a new space of TextTableQA problems that was introduced by the HybridQA dataset. Existing adaptations of table representation to transformer-based reading comprehension (RC) architectures fail to tackle the diverse modalities of the two representations through a single system. Training such systems is further challenged by the need for distant supervision. To reduce cognitive burden, training instances usually include just the question and answer, the latter matching multiple table rows and text passages. This leads to a noisy multi-instance training regime involving not only rows of the table, but also spans of linked text. We respond to these challenges by proposing MITQA, a new TextTableQA system that explicitly models the different but closely-related probability spaces of table row selection and text span selection. Our experiments indicate the superiority of our approach compared to recent baselines. The proposed method is currently at the top of the HybridQA leaderboard with a held out test set, achieving 21 % absolute improvement on both EM and F1 scores over previous published results.
语义分析(2篇)
【1】 Semantic Answer Type and Relation Prediction Task (SMART 2021) 标题:语义答案类型和关系预测任务(SMART 2021) 链接:https://arxiv.org/abs/2112.07606
作者:Nandana Mihindukulasooriya,Mohnish Dubey,Alfio Gliozzo,Jens Lehmann,Axel-Cyrille Ngonga Ngomo,Ricardo Usbeck,Gaetano Rossiello,Uttam Kumar 机构:IBM Research AI, University of Bonn and Fraunhofer IAIS, Universität Paderborn, Conversational AI, Fraunhofer IAIS Dresden, IBM Research, USA, Germany 摘要:每年,国际语义网会议都会组织一系列语义网挑战赛,以建立竞赛,推动某些问题领域的最新解决方案。语义答案类型和关系预测任务(SMART)任务是ISWC 2021语义Web挑战之一。在ISWC 2020成功举办SMART 2020之后,今年是挑战的第二年。今年的版本重点关注对知识库问答(KBQA)非常重要的两个子任务:答案类型预测和关系预测。问题类型和答案类型预测可以在知识库问答系统中发挥关键作用,提供有关预期答案的见解,有助于生成正确的查询或对候选答案进行排序。更具体地说,给出一个自然语言问题,第一个任务是,使用目标本体(例如DBpedia或Wikidata)预测答案类型。类似地,第二项任务是识别自然语言查询中的关系并将其链接到目标本体中的关系。本文讨论了任务描述、基准数据集和评估指标。有关更多信息,请访问https://smart-task.github.io/2021/. 摘要:Each year the International Semantic Web Conference organizes a set of Semantic Web Challenges to establish competitions that will advance state-of-the-art solutions in some problem domains. The Semantic Answer Type and Relation Prediction Task (SMART) task is one of the ISWC 2021 Semantic Web challenges. This is the second year of the challenge after a successful SMART 2020 at ISWC 2020. This year's version focuses on two sub-tasks that are very important to Knowledge Base Question Answering (KBQA): Answer Type Prediction and Relation Prediction. Question type and answer type prediction can play a key role in knowledge base question answering systems providing insights about the expected answer that are helpful to generate correct queries or rank the answer candidates. More concretely, given a question in natural language, the first task is, to predict the answer type using a target ontology (e.g., DBpedia or Wikidata. Similarly, the second task is to identify relations in the natural language query and link them to the relations in a target ontology. This paper discusses the task descriptions, benchmark datasets, and evaluation metrics. For more information, please visit https://smart-task.github.io/2021/.
【2】 Reinforcing Semantic-Symmetry for Document Summarization 标题:加强文档摘要的语义对称性 链接:https://arxiv.org/abs/2112.07583
作者:Mingyang Song,Liping Jing 机构:Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, China 备注:10 pages, 8 figures 摘要:文档摘要将一个长文档压缩为一个短版本,其中包含显著的信息和准确的语义描述。主要问题是如何使输出摘要在语义上与输入文档一致。为了实现这一目标,最近,研究人员将重点放在有监督的端到端混合方法上,该方法包括提取器模块和抽象器模块。其中,提取器从输入文档中识别出突出的句子,摘要器从突出的句子中生成摘要。该模型通过各种策略(如强化学习)成功地保持了生成的摘要和参考摘要之间的一致性。在训练混合模型时,存在两个语义鸿沟(一个是文档和抽取句子之间的鸿沟,另一个是抽取句子和摘要之间的鸿沟)。然而,现有的方法并没有明确考虑它们,这通常会导致总结的语义偏差。为了缓解上述问题,本文提出了一种新的强化语义对称学习文档摘要模型(ReSyM)。ReSyM在提取器中引入了语义一致性奖励,以弥补第一个缺口。设计了语义双重奖励来弥补抽象器中的第二个缺口。整个文档摘要过程通过强化学习和混合奖励机制(结合上述两种奖励)实现。此外,本文还提出了一种全面的句子表征学习方法,以充分获取原始文档中的信息。在两个广泛使用的基准数据集CNN/Daily Mail和BigPatent上进行了一系列实验。通过与最先进的基线在各种评估指标方面的比较,结果显示了ReSyM的优越性。 摘要:Document summarization condenses a long document into a short version with salient information and accurate semantic descriptions. The main issue is how to make the output summary semantically consistent with the input document. To reach this goal, recently, researchers have focused on supervised end-to-end hybrid approaches, which contain an extractor module and abstractor module. Among them, the extractor identifies the salient sentences from the input document, and the abstractor generates a summary from the salient sentences. This model successfully keeps the consistency between the generated summary and the reference summary via various strategies (e.g., reinforcement learning). There are two semantic gaps when training the hybrid model (one is between document and extracted sentences, and the other is between extracted sentences and summary). However, they are not explicitly considered in the existing methods, which usually results in a semantic bias of summary. To mitigate the above issue, in this paper, a new Reinforcing Semantic-Symmetry learning Model is proposed for document summarization (ReSyM). ReSyM introduces a semantic-consistency reward in the extractor to bridge the first gap. A semantic dual-reward is designed to bridge the second gap in the abstractor. The whole document summarization process is implemented via reinforcement learning with a hybrid reward mechanism (combining the above two rewards). Moreover, a comprehensive sentence representation learning method is presented to sufficiently capture the information from the original document. A series of experiments have been conducted on two widely used benchmark datasets CNN/Daily Mail and BigPatent. The results have shown the superiority of ReSyM by comparing it with the state-of-the-art baselines in terms of various evaluation metrics.
Graph|知识图谱|Knowledge(2篇)
【1】 ISEEQ: Information Seeking Question Generation using Dynamic Meta-Information Retrieval and Knowledge Graphs 标题:ISEEQ:基于动态元信息检索和知识图的信息搜索问题生成 链接:https://arxiv.org/abs/2112.07622
作者:Manas Gaur,Kalpa Gunaratna,Vijay Srinivasan,Hongxia Jin 机构:†AI Institute, University of South Carolina, SC, USA, ‡Samsung Research America, Mountain View, CA, USA 备注:Accepted at AAAI 2022, preprint version. Supplementary materials are provided in the paper and alternatively can be found at this https URL 摘要:会话信息搜索(CIS)是会话人工智能中一个相对较新的研究领域,它试图从最终用户那里寻找信息,以理解和满足用户的需求。如果实现,这样的系统在现实世界中具有深远的好处;例如,CIS系统可以帮助临床医生在医疗保健中对患者进行预筛选或分类。CIS中的一个关键开放子问题在文献中尚未解决,即基于最终用户的简短初始查询生成信息查询问题(ISQ)。为了解决这个开放性问题,我们提出了信息搜索问题生成器(ISEEQ),这是一种新的方法,用于在给定与用户查询相关的大型文本语料库的情况下,从一个简短的用户查询生成ISQ。首先,ISEEQ使用知识图来丰富用户查询。其次,ISEEQ使用知识丰富的查询来检索相关的上下文段落,以询问遵循概念流的连贯ISQ。第三,ISEEQ引入了一种新的基于深度生成对抗强化学习的ISQ生成方法。我们证明ISEEQ可以生成高质量的ISQ,以促进CIS试剂的发展。ISEEQ在具有来自不同领域的用户查询的四个数据集的五个ISQ评估指标上显著优于可比基线。此外,我们认为ISEEQ可以跨域转移以生成ISQ,因为当在不同的域对上进行训练和测试时,它显示了可接受的性能。定性人工评估证实ISEEQ生成的ISQ在质量上与人工生成的问题相当,并且优于最佳可比基线。 摘要:Conversational Information Seeking (CIS) is a relatively new research area within conversational AI that attempts to seek information from end-users in order to understand and satisfy users' needs. If realized, such a system has far-reaching benefits in the real world; for example, a CIS system can assist clinicians in pre-screening or triaging patients in healthcare. A key open sub-problem in CIS that remains unaddressed in the literature is generating Information Seeking Questions (ISQs) based on a short initial query from the end-user. To address this open problem, we propose Information SEEking Question generator (ISEEQ), a novel approach for generating ISQs from just a short user query, given a large text corpus relevant to the user query. Firstly, ISEEQ uses a knowledge graph to enrich the user query. Secondly, ISEEQ uses the knowledge-enriched query to retrieve relevant context passages to ask coherent ISQs adhering to a conceptual flow. Thirdly, ISEEQ introduces a new deep generative-adversarial reinforcement learning-based approach for generating ISQs. We show that ISEEQ can generate high-quality ISQs to promote the development of CIS agents. ISEEQ significantly outperforms comparable baselines on five ISQ evaluation metrics across four datasets having user queries from diverse domains. Further, we argue that ISEEQ is transferable across domains for generating ISQs, as it shows the acceptable performance when trained and tested on different pairs of domains. The qualitative human evaluation confirms ISEEQ-generated ISQs are comparable in quality to human-generated questions and outperform the best comparable baseline.
【2】 Model Uncertainty-Aware Knowledge Amalgamation for Pre-Trained Language Models 标题:面向预训练语言模型的模型不确定性感知知识融合 链接:https://arxiv.org/abs/2112.07327
作者:Lei Li,Yankai Lin,Xuancheng Ren,Guangxiang Zhao,Peng Li,Jie Zhou,Xu Sun 机构:†MOE Key Laboratory of Computational Linguistics, School of EECS, Peking University, §Pattern Recognition Center, WeChat AI, Tencent Inc., China 摘要:随着许多性能良好的微调预训练语言模型(PLM)的大量发布,研究更好的方法重用这些模型至关重要,因为它可以大大降低再训练计算成本和潜在的环境副作用。在本文中,我们探索了一种新的模型重用范式,即PLMs的知识融合(KA)。在没有人类注释的情况下,KA旨在将来自不同教师PLM的知识(每个PLM专门处理不同的分类问题)合并到一个通用的学生模型中。为了实现这一点,我们设计了一个模型不确定性感知的知识融合(MUKA)框架,该框架使用蒙特卡罗Dropout近似黄金监督信号,以识别潜在的合适教师来指导学生。实验结果表明,与基准数据集上的基线相比,MUKA实现了实质性的改进。进一步的分析表明,MUKA可以在多教师模型、异质教师甚至跨数据集教师的复杂环境下很好地推广。 摘要:As many fine-tuned pre-trained language models (PLMs) with promising performance are generously released, investigating better ways to reuse these models is vital as it can greatly reduce the retraining computational cost and the potential environmental side-effects. In this paper, we explore a novel model reuse paradigm, Knowledge Amalgamation (KA) for PLMs. Without human annotations available, KA aims to merge the knowledge from different teacher-PLMs, each of which specializes in a different classification problem, into a versatile student model. To achieve this, we design a Model Uncertainty-aware Knowledge Amalgamation (MUKA) framework, which identifies the potential adequate teacher using Monte-Carlo Dropout for approximating the golden supervision to guide the student. Experimental results demonstrate that MUKA achieves substantial improvements over baselines on benchmark datasets. Further analysis shows that MUKA can generalize well under several complicated settings with multiple teacher models, heterogeneous teachers, and even cross-dataset teachers.
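下面用一个玩具MLP例子勾勒"蒙特卡罗Dropout估计教师不确定性、再据此选择教师做蒸馏"的思路(仅为示意,并非论文实现;真实的MUKA还需合并不同教师的类别空间,这里做了简化,采样次数、网络结构均为假设值)。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mc_dropout_predict(model, x, n_samples=8):
    """Mean softmax prediction under Monte-Carlo dropout (dropout kept active)."""
    model.train()                              # keep dropout stochastic at "inference"
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(n_samples)])
    return probs.mean(dim=0)                   # (B, C)

def select_teacher_and_distill(teachers, student, x):
    mean_probs = [mc_dropout_predict(t, x) for t in teachers]
    # Predictive entropy as each teacher's per-example uncertainty.
    entropy = torch.stack([-(p * p.clamp_min(1e-8).log()).sum(-1) for p in mean_probs])  # (T, B)
    best = entropy.argmin(dim=0)               # most confident teacher per example
    targets = torch.stack([mean_probs[int(best[i])][i] for i in range(x.size(0))])
    student_log_probs = F.log_softmax(student(x), dim=-1)
    return F.kl_div(student_log_probs, targets, reduction="batchmean")

# Toy setup: two dropout-MLP "teachers" and a "student" over 32-dim inputs, 4 classes.
def mlp(): return nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Dropout(0.3), nn.Linear(64, 4))
teachers, student = [mlp(), mlp()], mlp()
print(select_teacher_and_distill(teachers, student, torch.randn(16, 32)).item())
```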
摘要|信息提取(2篇)
【1】 Exploring Neural Models for Query-Focused Summarization 标题:探索面向查询的摘要神经模型 链接:https://arxiv.org/abs/2112.07637
作者:Jesse Vig,Alexander R. Fabbri,Wojciech Kryściński 机构:Wojciech Kryściński∗, Salesforce Research 摘要:以查询为中心的摘要(QFS)旨在生成能够回答特定兴趣问题的摘要,从而实现更好的用户控制和个性化。虽然最近发布的数据集,如QMSum或AQuaMuSe,促进了QFS的研究工作,但该领域缺乏对适用建模方法广阔空间的全面研究。在本文中,我们对QFS的神经方法进行了系统的探索,考虑了两类方法:两阶段提取抽象解和端到端模型。在这些类别中,我们调查了现有的方法,并提出了两种模型扩展,它们在QMSum数据集上实现了最先进的性能,最多领先3.38 ROUGE-1、3.72 ROUGE-2和3.28 ROUGE-L。通过定量实验,我们强调了不同模型配置之间的权衡,并探索了摘要任务之间的转换能力。代码和检查点是公开的:https://github.com/salesforce/query-focused-sum. 摘要:Query-focused summarization (QFS) aims to produce summaries that answer particular questions of interest, enabling greater user control and personalization. While recently released datasets, such as QMSum or AQuaMuSe, facilitate research efforts in QFS, the field lacks a comprehensive study of the broad space of applicable modeling methods. In this paper we conduct a systematic exploration of neural approaches to QFS, considering two general classes of methods: two-stage extractive-abstractive solutions and end-to-end models. Within those categories, we investigate existing methods and present two model extensions that achieve state-of-the-art performance on the QMSum dataset by a margin of up to 3.38 ROUGE-1, 3.72 ROUGE-2, and 3.28 ROUGE-L. Through quantitative experiments we highlight the trade-offs between different model configurations and explore the transfer abilities between summarization tasks. Code and checkpoints are made publicly available: https://github.com/salesforce/query-focused-sum.
【2】 Reinforced Abstractive Summarization with Adaptive Length Controlling 标题:基于自适应长度控制的强化生成式摘要 链接:https://arxiv.org/abs/2112.07534
作者:Mingyang Song,Yi Feng,Liping Jing 机构:Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, China 备注:9 pages, 3 figures 摘要:文档摘要是自然语言生成中的一项基本任务,其目的是为给定的文档生成简短而连贯的摘要。可控摘要,特别是长度的可控摘要,对于一些实际应用来说是一个重要的问题,尤其是如何权衡长度约束和信息完整性。在本文中,我们提出了一种自适应长度控制优化(ALCO)方法,通过强化学习利用两阶段抽象摘要模型。ALCO将长度限制纳入句子提取阶段,以惩罚超长的提取句子。同时,设计了显著性估计机制,以保留生成句子中的显著性信息。在广泛使用的基准数据集CNN/Daily Mail上进行了一系列实验。结果表明,ALCO在长度可控性和内容保存方面优于常用基线。 摘要:Document summarization, as a fundamental task in natural language generation, aims to generate a short and coherent summary for a given document. Controllable summarization, especially of the length, is an important issue for some practical applications, especially how to trade-off the length constraint and information integrity. In this paper, we propose an Adaptive Length Controlling Optimization (ALCO) method to leverage two-stage abstractive summarization model via reinforcement learning. ALCO incorporates length constraint into the stage of sentence extraction to penalize the overlength extracted sentences. Meanwhile, a saliency estimation mechanism is designed to preserve the salient information in the generated sentences. A series of experiments have been conducted on a widely-used benchmark dataset CNN/Daily Mail. The results have shown that ALCO performs better than the popular baselines in terms of length controllability and content preservation.
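下面是一个说明"长度可控奖励"设计思路的小例子(并非论文实现):用一元词重叠F1粗略代替ROUGE式内容奖励,并对超出长度预算的部分按词数线性扣分;其中的权重alpha、长度预算均为假设值。

```python
def overlap_f1(candidate, reference):
    """A crude unigram-overlap F1, standing in for ROUGE-style content rewards."""
    c, r = candidate.split(), reference.split()
    common = len(set(c) & set(r))
    if common == 0:
        return 0.0
    p, rec = common / len(c), common / len(r)
    return 2 * p * rec / (p + rec)

def length_controlled_reward(candidate, reference, length_budget, alpha=0.02):
    """Content reward minus a penalty that grows with every token over the budget."""
    over = max(0, len(candidate.split()) - length_budget)
    return overlap_f1(candidate, reference) - alpha * over

ref = "the senate passed the budget bill after a long debate"
print(length_controlled_reward("the senate passed the budget bill", ref, length_budget=8))
print(length_controlled_reward(ref + " that lasted through the night and into morning", ref, length_budget=8))
```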
推理|分析|理解|解释(2篇)
【1】 Exploring the Limits of Natural Language Inference Based Setup for Few-Shot Intent Detection 标题:探索基于自然语言推理的少样本意图检测设置的局限性 链接:https://arxiv.org/abs/2112.07434
作者:Vijit Malik,Ayush Kumar,Jithendra Veppa 机构:IIT Kanpur, India, Observe.AI, India, Jithendra Vepa 摘要:面向目标的对话系统的核心组件之一是意图检测任务。由于可用的标注话语稀缺,意图检测上的少样本学习具有挑战性。尽管最近已有使用基于度量和基于优化方法的工作被提出,但在标签空间很大、样本数更少的情况下,该任务仍然具有挑战性。由于在测试阶段同时存在新类和已见类,广义少样本学习更加困难。在这项工作中,我们提出了一种基于自然语言推理的简单有效的方法,它不仅解决了少样本意图检测问题,而且在零样本和广义少样本学习问题上也被证明有效。我们在大量自然语言理解(NLU)和口语理解(SLU)数据集上的大量实验表明了我们方法的有效性。此外,我们还重点指出了基于NLI的方法大幅优于基线的设置。 摘要:One of the core components of goal-oriented dialog systems is the task of Intent Detection. Few-shot Learning upon Intent Detection is challenging due to the scarcity of available annotated utterances. Although recent works making use of metric-based and optimization-based methods have been proposed, the task is still challenging in large label spaces and much smaller number of shots. Generalized Few-shot learning is more difficult due to the presence of both novel and seen classes during the testing phase. In this work, we propose a simple and effective method based on Natural Language Inference that not only tackles the problem of few shot intent detection, but also proves useful in zero-shot and generalized few shot learning problems. Our extensive experiments on a number of Natural Language Understanding (NLU) and Spoken Language Understanding (SLU) datasets show the effectiveness of our approach. In addition, we highlight the settings in which our NLI based method outperforms the baselines by huge margins.
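下面用Hugging Face上现成的NLI模型(facebook/bart-large-mnli,经zero-shot-classification管道)示意"把意图检测改写为自然语言推理"的基本做法:每个候选意图写成假设句,对话语做蕴含打分后取最高者。这只是一个通用示意,并非论文所用的具体模型、假设模板或训练设置,示例话语与意图列表也是虚构的。

```python
from transformers import pipeline

# An off-the-shelf NLI model used as a zero-shot intent scorer: every candidate
# intent is turned into an entailment hypothesis against the user utterance.
nli = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

utterance = "I was charged twice for my last order"
intents = ["request refund", "track package", "cancel subscription", "report damaged item"]
result = nli(utterance, candidate_labels=intents,
             hypothesis_template="The customer wants to {}.")
print(list(zip(result["labels"], [round(s, 3) for s in result["scores"]])))
```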
【2】 Generating Fluent Fact Checking Explanations with Unsupervised Post-Editing 标题:使用无监督后编辑生成流畅的事实核查解释 链接:https://arxiv.org/abs/2112.06924
作者:Shailza Jolly,Pepa Atanasova,Isabelle Augenstein 机构: Department of Computer Science, University of Copenhagen, Department ofComputer Science, Universityof Copenhagen 摘要:事实核查系统已经成为核实虚假和误导性新闻的重要工具。当人类可读的解释伴随着准确性标签时,这些系统变得更加可信。然而,手动收集此类解释既昂贵又耗时。最近的著作将解释生成框架为提取摘要,并建议从专业记者的裁决评论(RCs)中自动选择足够多的最重要事实子集,以获得事实检查解释。然而,这些解释缺乏流利性和句子连贯性。在这项工作中,我们提出了一种基于迭代编辑的算法,该算法仅使用短语级编辑来对断开连接的RCs执行无监督的后期编辑。为了调整我们的编辑算法,我们使用了一个评分函数,其中包括流畅性和语义保留。此外,我们还展示了我们的方法在完全无监督的环境中的适用性。我们使用两个基准数据集进行实验,LIAR-PLUS和PubHealth。我们表明,我们的模型生成的解释流畅、可读、无冗余,并且涵盖了事实检查的重要信息。 摘要:Fact-checking systems have become important tools to verify fake and misguiding news. These systems become more trustworthy when human-readable explanations accompany the veracity labels. However, manual collection of such explanations is expensive and time-consuming. Recent works frame explanation generation as extractive summarization, and propose to automatically select a sufficient subset of the most important facts from the ruling comments (RCs) of a professional journalist to obtain fact-checking explanations. However, these explanations lack fluency and sentence coherence. In this work, we present an iterative edit-based algorithm that uses only phrase-level edits to perform unsupervised post-editing of disconnected RCs. To regulate our editing algorithm, we use a scoring function with components including fluency and semantic preservation. In addition, we show the applicability of our approach in a completely unsupervised setting. We experiment with two benchmark datasets, LIAR-PLUS and PubHealth. We show that our model generates explanations that are fluent, readable, non-redundant, and cover important information for the fact check.
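下面是一个体现"迭代短语级编辑 + 流畅度/语义保持打分"思路的爬山式小草图(并非论文实现):用句长的负值粗略代替语言模型流畅度,用与原句的词重叠率代替语义保持度,权重为假设值;真实系统会换成LM困惑度与句向量相似度。

```python
def fluency(text):
    # Placeholder: real systems score fluency with a language model; here shorter text scores higher.
    return -len(text.split())

def semantic_preservation(text, source):
    # Placeholder for embedding similarity: fraction of source word types that are kept.
    s, t = set(source.split()), set(text.split())
    return len(s & t) / max(1, len(s))

def propose_edits(text):
    toks = text.split()
    for i in range(len(toks)):                              # phrase/token deletion
        yield " ".join(toks[:i] + toks[i + 1:])
    for i in range(len(toks) - 1):                          # local reordering
        yield " ".join(toks[:i] + [toks[i + 1], toks[i]] + toks[i + 2:])

def post_edit(source, n_iters=20, w_sem=10.0):
    score = lambda t: fluency(t) + w_sem * semantic_preservation(t, source)
    current = source
    for _ in range(n_iters):
        best = max(propose_edits(current), key=score, default=current)
        if score(best) <= score(current):
            break                                           # local optimum reached
        current = best
    return current

print(post_edit("the the claim claim is mostly false according according to the ruling"))
```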
GAN|对抗|攻击|生成相关(2篇)
【1】 Massive-scale Decoding for Text Generation using Lattices 标题:基于格的大规模文本生成解码算法 链接:https://arxiv.org/abs/2112.07660
作者:Jiacheng Xu,Greg Durrett 机构:Department of Computer Science, The University of Texas at Austin 备注:19 pages, 13 figures, see this https URL 摘要:神经文本生成模型(如用于摘要和翻译的模型)生成高质量的输出,但通常集中于一种模式,而我们真正想要的是一组不同的选项。我们提出了一种搜索算法来构造编码大量生成选项的格。首先,我们将解码重构为一个最佳优先搜索,它探索的空间不同于波束搜索,并通过避免修剪路径来提高效率。其次,我们回顾了假设重组的概念:我们可以在搜索过程中识别相似的候选代对,并将它们合并为近似值。在文档摘要和机器翻译方面,我们的算法将数百到数千个不同的选项编码到一个线性大小的格中,这些选项保持语法和高质量。该算法为在大规模不同输出端上构建下游应用提供了基础。 摘要:Neural text generation models like those used for summarization and translation generate high-quality outputs, but often concentrate around a mode when what we really want is a diverse set of options. We present a search algorithm to construct lattices encoding a massive number of generation options. First, we restructure decoding as a best-first search, which explores the space differently than beam search and improves efficiency by avoiding pruning paths. Second, we revisit the idea of hypothesis recombination: we can identify pairs of similar generation candidates during search and merge them as an approximation. On both document summarization and machine translation, we show that our algorithm encodes hundreds to thousands of diverse options that remain grammatical and high-quality into one linear-sized lattice. This algorithm provides a foundation for building downstream generation applications on top of massive-scale diverse outputs.
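下面的小例子(并非论文实现)在一个手写的玩具"下一词打分表"上演示两点核心思想:用优先队列做最优优先搜索,并把末尾n-gram相同的假设做近似重组(这里简化为丢弃进入同一后缀状态的较差路径);合并窗口、扩展上限等均为假设值。

```python
import heapq

def best_first_lattice(score_next, bos="<s>", eos="</s>",
                       max_expansions=200, merge_ngram=2):
    """Best-first search; paths entering an already-seen last-n-gram state with a
    worse score are dropped, approximating hypothesis recombination."""
    heap = [(0.0, (bos,))]                     # entries: (accumulated -log p, token tuple)
    seen, finished = {}, []
    expansions = 0
    while heap and expansions < max_expansions:
        cost, hyp = heapq.heappop(heap)
        expansions += 1
        if hyp[-1] == eos:
            finished.append((-cost, " ".join(hyp[1:-1])))
            continue
        for tok, logp in score_next(hyp):
            new_cost, new_hyp = cost - logp, hyp + (tok,)
            state = new_hyp[-merge_ngram:]
            if state in seen and seen[state] <= new_cost:
                continue                       # a better path already reached this suffix state
            seen[state] = new_cost
            heapq.heappush(heap, (new_cost, new_hyp))
    return sorted(finished, reverse=True)

# A hand-written toy "model": log-probabilities of the next token given the last one.
TABLE = {
    "<s>": [("the", -0.4), ("a", -1.2)],
    "the": [("cat", -0.5), ("dog", -0.9)],
    "a":   [("cat", -0.7), ("dog", -0.8)],
    "cat": [("sat", -0.3), ("</s>", -1.5)],
    "dog": [("sat", -0.6), ("</s>", -1.4)],
    "sat": [("</s>", -0.1)],
}
print(best_first_lattice(lambda hyp: TABLE[hyp[-1]]))
```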
【2】 Controlled Cue Generation for Play Scripts 标题:游戏脚本的受控线索生成 链接:https://arxiv.org/abs/2112.06953
作者:Alara Dirik,Hilal Donmez,Pinar Yanardag 机构:Bo˘gaziçi University, Istanbul, Turkey 摘要:在本文中,我们使用了一个大规模的剧本数据集,提出了从对话中生成戏剧线索的新任务。使用超过一百万行的对话和线索,我们将线索生成问题作为受控文本生成任务来处理,并展示如何使用以对话/线索鉴别器为条件的语言模型来使用线索来增强对话的影响。此外,我们还探讨了主题关键字和情感在受控文本生成中的应用。大量的定量和定性实验表明,语言模型可以成功地用于在高度专业化的领域(如剧本)中生成合理的、属性受控的文本。有关支持材料,请访问:https://catlab-team.github.io/cuegen. 摘要:In this paper, we use a large-scale play scripts dataset to propose the novel task of theatrical cue generation from dialogues. Using over one million lines of dialogue and cues, we approach the problem of cue generation as a controlled text generation task, and show how cues can be used to enhance the impact of dialogue using a language model conditioned on a dialogue/cue discriminator. In addition, we explore the use of topic keywords and emotions for controlled text generation. Extensive quantitative and qualitative experiments show that language models can be successfully used to generate plausible and attribute-controlled texts in highly specialised domains such as play scripts. Supporting materials can be found at: https://catlab-team.github.io/cuegen.
半/弱/无监督|不确定性(1篇)
【1】 GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval 标题:GPL:密集检索无监督领域适配的生成性伪标注 链接:https://arxiv.org/abs/2112.07577
作者:Kexin Wang,Nandan Thakur,Nils Reimers,Iryna Gurevych 机构: Ubiquitous Knowledge Processing Lab, Technical University of Darmstadt, University of Waterloo, Hugging Face, www.ukp.tu-darmstadt.de 摘要:密集检索方法可以克服词汇差异,显著改善搜索结果。但是,它们需要大量的训练数据,而这些数据在大多数领域都不可用。如之前的工作(Thakur et al.,2021b)所示,密集检索器的性能在域转移下严重下降。这将密集检索方法的使用限制在只有少数具有大型训练数据集的领域。在本文中,我们提出了一种新的无监督域自适应方法生成伪标记(GPL),它将查询生成器与交叉编码器的伪标记相结合。在六个具有代表性的领域专用数据集上,我们发现所提出的GPL在nDCG@10上比开箱即用的最新密集检索方法最多高出8.9个点。与以前的方法相比,GPL需要更少的(未标记的)目标域数据,并且在训练中更加稳健。我们进一步研究了最近六种预训练方法在检索任务领域适应性场景中的作用,其中只有三种方法可以产生更好的结果。最好的方法TSDAE(Wang等人,2021年)可以与GPL相结合,在六项任务上使nDCG@10平均再提高1.0个点。 摘要:Dense retrieval approaches can overcome the lexical gap and lead to significantly improved search results. However, they require large amounts of training data which is not available for most domains. As shown in previous work (Thakur et al., 2021b), the performance of dense retrievers severely degrades under a domain shift. This limits the usage of dense retrieval approaches to only a few domains with large training datasets. In this paper, we propose the novel unsupervised domain adaptation method Generative Pseudo Labeling (GPL), which combines a query generator with pseudo labeling from a cross-encoder. On six representative domain-specialized datasets, we find the proposed GPL can outperform an out-of-the-box state-of-the-art dense retrieval approach by up to 8.9 points nDCG@10. GPL requires less (unlabeled) data from the target domain and is more robust in its training than previous methods. We further investigate the role of six recent pre-training methods in the scenario of domain adaptation for retrieval tasks, where only three could yield improved results. The best approach, TSDAE (Wang et al., 2021) can be combined with GPL, yielding another average improvement of 1.0 points nDCG@10 across the six tasks.
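下面用纯Python的占位函数串起GPL的三步流程(查询生成、负例挖掘、交叉编码器打分得到伪标签),并给出训练双编码器常用的MarginMSE损失的一个极简实现;这里的生成器、检索器、交叉编码器都是为演示而虚构的替身,并非论文使用的真实模型。

```python
import torch
import torch.nn.functional as F

def margin_mse_loss(q, pos, neg, teacher_margin):
    """MarginMSE: the bi-encoder's score margin (dot products) should match
    the cross-encoder's margin between positive and negative passages."""
    student_margin = (q * pos).sum(-1) - (q * neg).sum(-1)
    return F.mse_loss(student_margin, teacher_margin)

def gpl_pseudo_label(passages, generate_query, retrieve_negative, cross_encoder_score):
    """One GPL-style pass: build (query, positive, negative, teacher margin) tuples."""
    examples = []
    for pos in passages:
        query = generate_query(pos)                      # step 1: query generation
        neg = retrieve_negative(query, exclude=pos)      # step 2: negative mining
        margin = cross_encoder_score(query, pos) - cross_encoder_score(query, neg)
        examples.append((query, pos, neg, margin))       # step 3: pseudo labeling
    return examples

# Toy stand-ins for the query generator, retriever, and cross-encoder.
corpus = ["symptoms of influenza include fever and cough",
          "the eiffel tower is located in paris"]
examples = gpl_pseudo_label(
    corpus,
    generate_query=lambda p: "what about " + p.split()[0],
    retrieve_negative=lambda q, exclude: next(p for p in corpus if p != exclude),
    cross_encoder_score=lambda q, p: float(len(set(q.split()) & set(p.split()))),
)
print(examples)

# Training-step sketch with random "bi-encoder" embeddings for the two examples.
q_emb, p_emb, n_emb = torch.randn(3, len(examples), 128).unbind(0)
teacher_margin = torch.tensor([m for *_, m in examples])
print(margin_mse_loss(q_emb, p_emb, n_emb, teacher_margin).item())
```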
检测相关(1篇)
【1】 Towards A Reliable Ground-Truth For Biased Language Detection 标题:为有偏误的语言检测寻找可靠的基础事实 链接:https://arxiv.org/abs/2112.07421
作者:Timo Spinde,David Krieger,Manuel Plank,Bela Gipp 机构:Dept. of Computer and, Information Science, University of Konstanz, Constance, Germany, School of Electrical, Information, and Media Engineering, University of Wuppertal, Wuppertal, Germany 摘要:当客观报道被主观写作所取代时,百科全书和新闻文章等参考文本可能会表现出有偏见的语言。现有的检测偏差的方法大多依赖于带注释的数据来训练机器学习模型。然而,在现有的媒体偏见语料库中,注释者的一致性和可比性较低是一个重大缺陷。为了评估数据收集选项,我们收集并比较从两个流行众包平台获得的标签。我们的结果表明,现有众包方法缺乏数据质量,强调需要一个训练有素的专家框架来收集更可靠的数据集。通过创建这样一个框架并收集第一个数据集,我们能够将Krippendorff的α = 0.144(众包标签)改进为α = 0.419(专家标签)。我们得出结论,详细的注释员训练提高了数据质量,改善了现有偏差检测系统的性能。我们将在未来继续扩展我们的数据集。 摘要:Reference texts such as encyclopedias and news articles can manifest biased language when objective reporting is substituted by subjective writing. Existing methods to detect bias mostly rely on annotated data to train machine learning models. However, low annotator agreement and comparability is a substantial drawback in available media bias corpora. To evaluate data collection options, we collect and compare labels obtained from two popular crowdsourcing platforms. Our results demonstrate the existing crowdsourcing approaches' lack of data quality, underlining the need for a trained expert framework to gather a more reliable dataset. By creating such a framework and gathering a first dataset, we are able to improve Krippendorff's α = 0.144 (crowdsourcing labels) to α = 0.419 (expert labels). We conclude that detailed annotator training increases data quality, improving the performance of existing bias detection systems. We will continue to extend our dataset in the future.
识别/分类(5篇)
【1】 On the Use of External Data for Spoken Named Entity Recognition 标题:浅谈外部数据在口语命名实体识别中的应用 链接:https://arxiv.org/abs/2112.07648
作者:Ankita Pasad,Felix Wu,Suwon Shon,Karen Livescu,Kyu J. Han 机构:ASAPP, Toyota Technological Institute at Chicago 摘要:口语理解(SLU)任务涉及从语音信号到语义标签的映射。考虑到这些任务的复杂性,良好的性能可能需要大量标记的数据集,这些数据集很难为每个新任务和域收集。然而,自监督语音表示的最新进展使得考虑有限的标记数据学习SLU模型是可行的。在这项工作中,我们将重点放在低资源语音命名实体识别(NER)上,并解决以下问题:除了自我监督的预训练外,我们如何使用未为任务注释的外部语音和/或文本数据?我们采用各种方法,包括自我训练,知识蒸馏和转移学习,并考虑其适用于端到端的模型和管道(语音识别,其次是文本模型)的方法。我们发现,其中一些方法在资源受限的环境中提高了性能,而不仅仅是预先训练好的表示。与之前的工作相比,我们发现F1成绩提高了16%。虽然最佳基线模型是管道方法,但使用外部数据时的最佳性能最终是通过端到端模型实现的。我们提供了详细的比较和分析,例如表明端到端模型能够关注更具体的单词。 摘要:Spoken language understanding (SLU) tasks involve mapping from speech audio signals to semantic labels. Given the complexity of such tasks, good performance might be expected to require large labeled datasets, which are difficult to collect for each new task and domain. However, recent advances in self-supervised speech representations have made it feasible to consider learning SLU models with limited labeled data. In this work we focus on low-resource spoken named entity recognition (NER) and address the question: Beyond self-supervised pre-training, how can we use external speech and/or text data that are not annotated for the task? We draw on a variety of approaches, including self-training, knowledge distillation, and transfer learning, and consider their applicability to both end-to-end models and pipeline (speech recognition followed by text NER model) approaches. We find that several of these approaches improve performance in resource-constrained settings beyond the benefits from pre-trained representations alone. Compared to prior work, we find improved F1 scores of up to 16%. While the best baseline model is a pipeline approach, the best performance when using external data is ultimately achieved by an end-to-end model. We provide detailed comparisons and analyses, showing for example that end-to-end models are able to focus on the more NER-specific words.
【2】 Text Classification Models for Form Entity Linking 标题:用于表单实体链接的文本分类模型 链接:https://arxiv.org/abs/2112.07443
作者:María Villota,César Domínguez,Jónathan Heras,Eloy Mata,Vico Pascual 机构:Department of Mathematics and Computer Science, University of La Rioja, Spain 摘要:表单是一种广泛使用的基于模板的文档类型,用于各种领域,包括管理、医学、金融或保险等。由于每天生成的表单数量不断增加,因此迫切需要自动提取这些文档中包含的信息。但是,在处理扫描的表单时,这并不是一项简单的任务,因为具有不同表单实体位置的模板非常多样,而且扫描文档的质量也很高。在这种情况下,所有表单都有一个共同的特性:它们包含一个作为键值(或标签值)对构建的互连实体集合,以及其他实体,如标题或图像。在这项工作中,我们通过结合图像处理技术和基于BERT体系结构的文本分类模型,解决了表单中的实体链接问题。这种方法在FUNSD数据集上的F1得分为0.80,达到了最先进的结果,比以前最好的方法提高了5%。该项目的代码可在https://github.com/mavillot/FUNSD-Entity-Linking. 摘要:Forms are a widespread type of template-based document used in a great variety of fields including, among others, administration, medicine, finance, or insurance. The automatic extraction of the information included in these documents is greatly demanded due to the increasing volume of forms that are generated in a daily basis. However, this is not a straightforward task when working with scanned forms because of the great diversity of templates with different location of form entities, and the quality of the scanned documents. In this context, there is a feature that is shared by all forms: they contain a collection of interlinked entities built as key-value (or label-value) pairs, together with other entities such as headers or images. In this work, we have tacked the problem of entity linking in forms by combining image processing techniques and a text classification model based on the BERT architecture. This approach achieves state-of-the-art results with a F1-score of 0.80 on the FUNSD dataset, a 5% improvement regarding the best previous method. The code of this project is available at https://github.com/mavillot/FUNSD-Entity-Linking.
【3】 Identification of Biased Terms in News Articles by Comparison of Outlet-specific Word Embeddings 标题:通过比较媒体专属词嵌入识别新闻文章中的偏向词语 链接:https://arxiv.org/abs/2112.07384
作者:Timo Spinde,Lada Rudnitckaia,Felix Hamborg,Bela Gipp 机构: University of Konstanz, Konstanz, Germany, University of Wuppertal, Wuppertal, Germany 摘要:倾斜的新闻报道,也称为媒体偏见,会严重影响新闻消费者对新闻的解读和反应。为了自动识别有偏见的语言,我们提出了一种探索性的方法来比较相关单词的上下文。我们训练了两个单词嵌入模型,一个在左翼文本上,另一个在右翼新闻媒体上。我们的假设是,一个单词在两个单词嵌入空间中的表示方式对于非偏向单词比偏向单词更相似。其基本思想是,不同新闻媒体中有偏见的词语的语境比没有偏见的词语的语境差异更大,因为对一个词语有偏见的认知取决于其语境。虽然我们没有发现接受假设的统计意义,但结果表明了该方法的有效性。例如,在两个单词嵌入空间的线性映射之后,31%的最大距离的单词可能会产生偏差。为了改进结果,我们发现数据集需要大大扩大,我们提出了进一步的方法作为未来的研究方向。据我们所知,本文首次深入探讨了通过词语嵌入来衡量偏见词语的语境。 摘要:Slanted news coverage, also called media bias, can heavily influence how news consumers interpret and react to the news. To automatically identify biased language, we present an exploratory approach that compares the context of related words. We train two word embedding models, one on texts of left-wing, the other on right-wing news outlets. Our hypothesis is that a word's representations in both word embedding spaces are more similar for non-biased words than biased words. The underlying idea is that the context of biased words in different news outlets varies more strongly than the one of non-biased words, since the perception of a word as being biased differs depending on its context. While we do not find statistical significance to accept the hypothesis, the results show the effectiveness of the approach. For example, after a linear mapping of both word embeddings spaces, 31% of the words with the largest distances potentially induce bias. To improve the results, we find that the dataset needs to be significantly larger, and we derive further methodology as future research direction. To our knowledge, this paper presents the first in-depth look at the context of bias words measured by word embeddings.
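下面的小实验(并非论文代码)演示其基本流程:用两套词嵌入的共同词表学一个正交Procrustes线性映射,再按映射后的余弦距离从大到小找出"语境差异最大"的候选偏向词。这里的两套嵌入是随机构造的玩具数据,其中故意让"immigration"一词的语境发生偏移。

```python
import numpy as np

def procrustes_map(X, Y):
    """Orthogonal W minimizing ||XW - Y||_F (solved via the SVD of X^T Y)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def bias_candidates(emb_left, emb_right, top_k=3):
    words = sorted(set(emb_left) & set(emb_right))
    X = np.array([emb_left[w] for w in words])
    Y = np.array([emb_right[w] for w in words])
    mapped = X @ procrustes_map(X, Y)
    # Large post-alignment cosine distance = the word's context differs across outlets.
    cos = (mapped * Y).sum(1) / (np.linalg.norm(mapped, axis=1) * np.linalg.norm(Y, axis=1))
    order = np.argsort(cos)                               # least similar words first
    return [(words[i], round(float(1 - cos[i]), 3)) for i in order[:top_k]]

# Toy data: the "right-wing" space is a rotation of the "left-wing" one,
# except that one word ("immigration") is given a deliberately shifted context.
rng = np.random.default_rng(0)
dim = 20
vocab = [f"word{i}" for i in range(40)] + ["immigration"]
left = {w: rng.normal(size=dim) for w in vocab}
R, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
right = {w: left[w] @ R for w in vocab}
right["immigration"] = rng.normal(size=dim) @ R
print(bias_candidates(left, right))
```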
【4】 Event Based Time-Vectors for auditory features extraction: a neuromorphic approach for low power audio recognition 标题:基于事件的时间矢量听觉特征提取:一种用于低功耗音频识别的神经形态学方法 链接:https://arxiv.org/abs/2112.07011
作者:Marco Rasetto,Juan P. Dominguez-Morales,Angel Jimenez-Fernandez,Ryad Benosman 机构:Department of Bioengineering and Center for Neural Basis of Cognition, University of Pittsburgh, Carnegie Mellon University, Pittsburgh, USA, Robotics and Technology of Computers Lab, Universidad de Sevilla, Seville, Spain 备注:10 pages, 7 figures 摘要:近年来,为了提高自然语言处理(NLP)和音频识别的技术水平,人们做出了巨大的努力。然而,这些努力往往导致更大、更复杂的模型的功耗和内存需求增加。这些解决方案没有满足物联网设备对低功耗、低内存效率计算的要求,因此无法满足日益增长的高效边缘计算需求。神经形态系统已被证明是许多应用中低功耗低延迟计算的优秀候选。因此,我们提出了一种神经形态结构,能够进行无监督的听觉特征识别。然后,我们在谷歌语音命令数据集的子集上验证网络。 摘要:In recent years tremendous efforts have been done to advance the state of the art for Natural Language Processing (NLP) and audio recognition. However, these efforts often translated in increased power consumption and memory requirements for bigger and more complex models. These solutions falls short of the constraints of IoT devices which need low power, low memory efficient computation, and therefore they fail to meet the growing demand of efficient edge computing. Neuromorphic systems have proved to be excellent candidates for low-power low-latency computation in a multitude of applications. For this reason we present a neuromorphic architecture, capable of unsupervised auditory feature recognition. We then validate the network on a subset of Google's Speech Commands dataset.
【5】 Improving Hybrid CTC/Attention End-to-end Speech Recognition with Pretrained Acoustic and Language Model 标题:利用预先训练的声学和语言模型改进CTC/注意混合端到端语音识别 链接:https://arxiv.org/abs/2112.07254
作者:Keqi Deng,Songjun Cao,Yike Zhang,Long Ma 机构:Tencent Cloud Xiaowei, Beijing, China, Institute of Acoustics, Chinese Academy of Sciences, China, University of Chinese Academy of Sciences, China 备注:ASRU2021 摘要:最近,自监督预训练在端到端(E2E)自动语音识别(ASR)中取得了令人印象深刻的结果。然而,主导序列对序列(S2S)E2E模型仍然难以充分利用自监督预训练方法,因为其解码器以声学表示为条件,因此无法单独预训练。在本文中,我们提出了一种基于混合CTC/注意E2E模型的预训练转换器(Preformer)S2S ASR体系结构,以充分利用预训练声学模型(AMs)和语言模型(LMs)。在我们的框架中,编码器是用预训练AM(wav2vec2.0)初始化的。Preformer在训练和推理过程中将CTC作为辅助任务。此外,我们还设计了一个单交叉译码器(OCD),它放松了对声学表示的依赖,因此可以使用预训练LM(DistilGPT2)对其进行初始化。实验在AISHELL-1语料库上进行,在测试集上实现了4.6%的字符错误率(CER)。与我们的普通混合CTC/注意力Transformer基线相比,我们提出的基于CTC/注意力的Preformer使相对CER降低了27%。据我们所知,这是首次在S2S ASR系统中使用预训练AM和LM。 摘要:Recently, self-supervised pretraining has achieved impressive results in end-to-end (E2E) automatic speech recognition (ASR). However, the dominant sequence-to-sequence (S2S) E2E model is still hard to fully utilize the self-supervised pre-training methods because its decoder is conditioned on acoustic representation thus cannot be pretrained separately. In this paper, we propose a pretrained Transformer (Preformer) S2S ASR architecture based on hybrid CTC/attention E2E models to fully utilize the pretrained acoustic models (AMs) and language models (LMs). In our framework, the encoder is initialized with a pretrained AM (wav2vec2.0). The Preformer leverages CTC as an auxiliary task during training and inference. Furthermore, we design a one-cross decoder (OCD), which relaxes the dependence on acoustic representations so that it can be initialized with pretrained LM (DistilGPT2). Experiments are conducted on the AISHELL-1 corpus and achieve a 4.6% character error rate (CER) on the test set. Compared with our vanilla hybrid CTC/attention Transformer baseline, our proposed CTC/attention-based Preformer yields 27% relative CER reduction. To the best of our knowledge, this is the first work to utilize both pretrained AM and LM in a S2S ASR system.
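下面用随机张量演示混合CTC/注意力目标的常见写法(并非论文实现):总损失是编码器侧CTC损失与解码器侧交叉熵的加权和;这里省略了解码器输入移位、SOS/EOS等细节,权重0.3、词表大小等均为假设值。

```python
import torch
import torch.nn.functional as F

def hybrid_ctc_attention_loss(enc_logits, dec_logits, targets,
                              input_lengths, target_lengths,
                              blank=0, ctc_weight=0.3):
    """L = w * CTC(encoder) + (1 - w) * CE(decoder), the usual hybrid objective."""
    log_probs = F.log_softmax(enc_logits, dim=-1).transpose(0, 1)  # (T, B, V) layout for CTC
    ctc = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                     blank=blank, zero_infinity=True)
    att = F.cross_entropy(dec_logits.reshape(-1, dec_logits.size(-1)),
                          targets.reshape(-1), ignore_index=blank)
    return ctc_weight * ctc + (1.0 - ctc_weight) * att

# Toy shapes: batch 2, 50 encoder frames, 10 target tokens, vocab 30 (index 0 = blank/pad).
B, T, U, V = 2, 50, 10, 30
targets = torch.randint(1, V, (B, U))
loss = hybrid_ctc_attention_loss(
    enc_logits=torch.randn(B, T, V),
    dec_logits=torch.randn(B, U, V),
    targets=targets,
    input_lengths=torch.full((B,), T, dtype=torch.long),
    target_lengths=torch.full((B,), U, dtype=torch.long),
)
print(loss.item())
```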
Zero/Few/One-Shot|迁移|自适应(1篇)
【1】 LMTurk: Few-Shot Learners as Crowdsourcing Workers 标题:LMTurk:把少样本学习者用作众包标注员 链接:https://arxiv.org/abs/2112.07522
作者:Mengjie Zhao,Fei Mi,Yasheng Wang,Minglei Li,Xin Jiang,Qun Liu,Hinrich Schütze 机构:†CIS, LMU Munich, ‡Huawei Noah’s Ark Lab, Huawei Technologies Co., Ltd. 摘要:大量的努力致力于创建高性能的少数镜头学习者,即在几乎没有训练数据的情况下表现良好的模型。训练大规模的预训练语言模型(PLM)已经产生了巨大的成本,但是利用基于PLM的少量shot学习者仍然具有挑战性,因为他们的规模巨大。这项工作的重点是一个关键问题:如何有效地利用这几个射击学习者?我们提出了LMTurk,这是一种新的方法,它将少数射击学习者视为众包工作者。其基本原理是众包工作者实际上是少数的射击学习者:向他们展示一些示例,让他们了解一项任务,然后开始注释。LMTurk雇佣少数基于PLM的射击学习者作为工人。我们表明,生成的注释可以用来训练模型,这些模型能够很好地解决任务,并且足够小,可以在实际场景中部署。总之,LMTurk是有效利用当前基于PLM的少数镜头学习者的重要一步。 摘要:Vast efforts have been devoted to creating high-performance few-shot learners, i.e., models that perform well with little training data. Training large-scale pretrained language models (PLMs) has incurred significant cost, but utilizing PLM-based few-shot learners is still challenging due to their enormous size. This work focuses on a crucial question: How to make effective use of these few-shot learners? We propose LMTurk, a novel approach that treats few-shot learners as crowdsourcing workers. The rationale is that crowdsourcing workers are in fact few-shot learners: They are shown a few illustrative examples to learn about a task and then start annotating. LMTurk employs few-shot learners built upon PLMs as workers. We show that the resulting annotations can be utilized to train models that solve the task well and are small enough to be deployable in practical scenarios. Altogether, LMTurk is an important step towards making effective use of current PLM-based few-shot learners.
检索(1篇)
【1】 Conversational Search with Mixed-Initiative -- Asking Good Clarification Questions backed-up by Passage Retrieval 标题:混合主动式对话搜索--以段落检索为支撑提出好的澄清问题 链接:https://arxiv.org/abs/2112.07308
作者:Yosi Mass,Doron Cohen,Asaf Yehudai,David Konopnicki 机构:IBM Research AI, Haifa University, Mount Carmel, Haifa, HA , Israel 摘要:我们处理一个混合主动式的对话搜索场景:既有用户提问、系统回答,也有系统提问(澄清问题)、用户回答。在给定对话上下文的情况下,我们将重点放在选择下一个澄清问题上。我们的方法利用段落检索:它既用于相关候选澄清问题的初步筛选,也用于微调两个深度学习模型以对这些候选重新排序。我们在两个不同的用例上评估了我们的方法。第一种是在大型web集合中进行开放域对话搜索。第二个是面向任务的客户支持设置。我们证明了我们的方法在两种用例上都表现良好。 摘要:We deal with a scenario of conversational search with mixed-initiative: namely user-asks system-answers, as well as system-asks (clarification questions) and user-answers. We focus on the task of selecting the next clarification question, given conversation context. Our method leverages passage retrieval that is used both for an initial selection of relevant candidate clarification questions, as well as for fine-tuning two deep-learning models for re-ranking these candidates. We evaluated our method on two different use-cases. The first is an open domain conversational search in a large web collection. The second is a task-oriented customer-support setup. We show that our method performs well on both use-cases.
Word2Vec|文本|单词(2篇)
【1】 TASSY -- A Text Annotation Survey System 标题:Tassy--一个文本标注调查系统 链接:https://arxiv.org/abs/2112.07391
作者:Timo Spinde,Kanishka Sinha,Norman Meuschke,Bela Gipp 机构:University of Konstanz, Konstanz, Germany, konstanz.de, University of Passau, Passau, Germany, University of Wuppertal, Wuppertal, Germany, wuppertal.de 摘要:我们提供了一个免费的开源工具,用于创建包含文本注释任务的基于web的调查。现有工具提供文本注释或测量功能,但不能同时提供这两种功能。结合这两种输入类型对于调查读者对文本的感知尤其重要,这也取决于读者的背景,如年龄、性别和教育程度。我们的工具主要满足图书馆和信息科学、社会科学和人文科学研究人员的需求,他们应用内容分析来调查媒体偏见、政治传播或假新闻。 摘要:We present a free and open-source tool for creating web-based surveys that include text annotation tasks. Existing tools offer either text annotation or survey functionality but not both. Combining the two input types is particularly relevant for investigating a reader's perception of a text which also depends on the reader's background, such as age, gender, and education. Our tool caters primarily to the needs of researchers in the Library and Information Sciences, the Social Sciences, and the Humanities who apply Content Analysis to investigate, e.g., media bias, political communication, or fake news.
【2】 Building on Huang et al. GlossBERT for Word Sense Disambiguation 标题:基于Huang等人GlossBERT的词义消歧研究 链接:https://arxiv.org/abs/2112.07089
作者:Nikhil Patel,James Hale,Kanika Jindal,Apoorva Sharma,Yichun Yu 机构:University of Southern California 摘要:我们建议处理词义消歧(WSD)问题。在语言中,同一形式的词可以根据上下文而具有不同的含义。虽然人类很容易根据上下文推断出这些单词的含义或释义(gloss),但机器却在这项任务中遇到了麻烦。因此,我们打算复现并扩展Huang等人GlossBERT的结果,他们设计了一个模型来消除这些单词的歧义(Huang et al., 2019)。具体来说,我们提出了以下增强:数据集调整(alpha超参数)、集成方法,以及用BART和ALBERT替换BERT。以下GitHub存储库包含本报告中使用的所有代码,它扩展了Huang等人提供的代码。 摘要:We propose to take on the problem of Word Sense Disambiguation (WSD). In language, words of the same form can take different meanings depending on context. While humans easily infer the meaning or gloss of such words by their context, machines stumble on this task. As such, we intend to replicate and expand upon the results of Huang et al.'s GlossBERT, a model which they design to disambiguate these words (Huang et al., 2019). Specifically, we propose the following augmentations: data-set tweaking (alpha hyper-parameter), ensemble methods, and replacement of BERT with BART and ALBERT. The following GitHub repository contains all code used in this report, which extends on the code made available by Huang et al.
其他神经网络|深度学习|模型|建模(6篇)
【1】 VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena 标题:VALSE:以语言现象为中心的视觉和语言模型的任务无关基准 链接:https://arxiv.org/abs/2112.07566
作者:Letitia Parcalabescu,Michele Cafagna,Lilitta Muradjan,Anette Frank,Iacer Calixto,Albert Gatt 机构:Heidelberg University, Department of Computational Linguistics, University of Malta, Institute of Linguistics and Language Technology, New York University, ILLC, University of Amsterdam, Utrecht University, Department of Information and Computing Sciences 备注:28 pages, 4 figures, 11 tables 摘要:我们提出了VALSE(Vision And Language Structured Evaluation,视觉和语言结构化评估),这是一个新的基准,旨在测试通用预训练视觉和语言(V&L)模型对特定语言现象的视觉语言基础能力。VALSE提供了一套六项测试,涵盖各种语言结构。解决这些问题需要模型将语言现象置于视觉情态中,从而允许比迄今为止更细粒度的评估。我们使用支持有效箔构建的方法构建VALE,并报告评估五种广泛使用的V&L模型的结果。我们的实验表明,目前的模型很难解决大多数现象。因此,我们希望VALSE能够作为一个重要的基准,从语言学的角度衡量预先训练的V&L模型的未来进展,补充规范的以任务为中心的V&L评估。 摘要:We propose VALSE (Vision And Language Structured Evaluation), a novel benchmark designed for testing general-purpose pretrained vision and language (V&L) models for their visio-linguistic grounding capabilities on specific linguistic phenomena. VALSE offers a suite of six tests covering various linguistic constructs. Solving these requires models to ground linguistic phenomena in the visual modality, allowing more fine-grained evaluations than hitherto possible. We build VALSE using methods that support the construction of valid foils, and report results from evaluating five widely-used V&L models. Our experiments suggest that current models have considerable difficulty addressing most phenomena. Hence, we expect VALSE to serve as an important benchmark to measure future progress of pretrained V&L models from a linguistic perspective, complementing the canonical task-centred V&L evaluations.
【2】 Measuring Fairness with Biased Rulers: A Survey on Quantifying Biases in Pretrained Language Models 标题:用有偏见的尺子衡量公正性:对预先训练的语言模型中的偏差进行量化的研究综述 链接:https://arxiv.org/abs/2112.07447
作者:Pieter Delobelle,Ewoenam Kwaku Tokpo,Toon Calders,Bettina Berendt 机构: Department of Computer Science, KU Leuven; Leuven.ai, Department of Computer Science, University of Antwerp, TU Berlin 备注:15 pages, 4 figures, 3 tables 摘要:人们对自然语言处理资源(如BERT)中的偏见模式的认识不断提高,促使许多指标量化“偏见”和“公平性”。但比较不同指标的结果和使用这些指标进行评估的工作仍然很困难,如果不是完全不可能的话。我们调查了关于预训练语言模型的公平性度量的现有文献,并通过实验评估了兼容性,包括语言模型中的两种偏见及其下游任务。我们通过混合传统文献调查和相关分析,以及进行实证评估来做到这一点。我们发现许多指标不兼容,高度依赖于(i)模板,(ii)属性和目标种子,以及(iii)嵌入的选择。这些结果表明,对于语境化的语言模型来说,公平性或偏见评估仍然具有挑战性,如果不是高度主观的话。为了改进未来的比较和公平性评估,我们建议避免嵌入基于度量的方法,并将重点放在下游任务中的公平性评估上。 摘要:An increasing awareness of biased patterns in natural language processing resources, like BERT, has motivated many metrics to quantify `bias' and `fairness'. But comparing the results of different metrics and the works that evaluate with such metrics remains difficult, if not outright impossible. We survey the existing literature on fairness metrics for pretrained language models and experimentally evaluate compatibility, including both biases in language models as in their downstream tasks. We do this by a mixture of traditional literature survey and correlation analysis, as well as by running empirical evaluations. We find that many metrics are not compatible and highly depend on (i) templates, (ii) attribute and target seeds and (iii) the choice of embeddings. These results indicate that fairness or bias evaluation remains challenging for contextualized language models, if not at least highly subjective. To improve future comparisons and fairness evaluations, we recommend avoiding embedding-based metrics and focusing on fairness evaluations in downstream tasks.
【3】 TopNet: Learning from Neural Topic Model to Generate Long Stories 标题:TopNet:学习神经主题模型生成长篇故事 链接:https://arxiv.org/abs/2112.07259
作者:Yazheng Yang,Boyuan Pan,Deng Cai,Huan Sun 机构:College of Computer Science, Hangzhou, China, State Key Lab of CAD&CG, Alibaba-Zhejiang University Joint Institute of Frontier, Technologies, Department of Computer Science and Engineering, The Ohio State University, Columbus, USA 备注:None 摘要:长故事生成(LSG)是自然语言处理领域梦寐以求的目标之一。与大多数文本生成任务不同,LSG需要基于更短的文本输入输出内容丰富的长篇大论,并且常常存在信息稀疏的问题。在本文中,我们提出emph{TopNet}来缓解这个问题,通过利用神经主题建模的最新进展来获得高质量的骨架词来补充短输入。特别是,我们首先学习将短文本输入映射到低维主题分布(由主题模型预先指定),而不是直接生成故事。基于这一潜在主题分布,我们可以使用主题模型的重建解码器对一系列相互关联的单词进行采样,作为故事的骨架。在两个基准数据集上的实验表明,我们提出的框架在骨架词选择方面非常有效,并且在自动评估和人工评估方面都显著优于最新的模型。 摘要:Long story generation (LSG) is one of the coveted goals in natural language processing. Different from most text generation tasks, LSG requires to output a long story of rich content based on a much shorter text input, and often suffers from information sparsity. In this paper, we propose emph{TopNet} to alleviate this problem, by leveraging the recent advances in neural topic modeling to obtain high-quality skeleton words to complement the short input. In particular, instead of directly generating a story, we first learn to map the short text input to a low-dimensional topic distribution (which is pre-assigned by a topic model). Based on this latent topic distribution, we can use the reconstruction decoder of the topic model to sample a sequence of inter-related words as a skeleton for the story. Experiments on two benchmark datasets show that our proposed framework is highly effective in skeleton word selection and significantly outperforms the state-of-the-art models in both automatic evaluation and human evaluation.
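下面的小例子(并非论文实现)示意"由短输入预测的主题分布 + 主题-词分布 → 采样骨架词"这一步:把主题分布与主题-词矩阵相乘得到词上的混合分布,再无放回地采样若干骨架词;其中的主题、词表与概率都是虚构的玩具数据。

```python
import numpy as np

def sample_skeleton(topic_dist, topic_word, vocab, n_words=5, rng=None):
    """Sample inter-related skeleton words from a topic model's word distributions,
    weighted by the topic distribution predicted from the short input."""
    rng = rng or np.random.default_rng()
    word_probs = topic_dist @ topic_word              # (V,) mixture over the vocabulary
    word_probs /= word_probs.sum()
    idx = rng.choice(len(vocab), size=n_words, replace=False, p=word_probs)
    return [vocab[i] for i in idx]

vocab = ["castle", "dragon", "sword", "market", "merchant", "coin",
         "forest", "river", "storm", "princess"]
# Two toy topics ("fantasy adventure" vs. "trade"); each row sums to 1.
topic_word = np.array([
    [.2, .2, .2, .01, .01, .01, .1, .1, .07, .1],
    [.01, .01, .02, .3, .3, .2, .05, .05, .05, .01],
])
topic_dist = np.array([0.8, 0.2])   # stand-in for the distribution predicted from the prompt
print(sample_skeleton(topic_dist, topic_word, vocab))
```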
【4】 From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression 标题:从密集到稀疏:对比剪枝以实现更好的预训练语言模型压缩 链接:https://arxiv.org/abs/2112.07198
作者:Runxin Xu,Fuli Luo,Chengyu Wang,Baobao Chang,Jun Huang,Songfang Huang,Fei Huang 机构:Key Laboratory of Computational Linguistics, Peking University, MOE, China, Alibaba Group 备注:Accepted to AAAI 2022 摘要:在预训练和微调范式下,预训练语言模型(PLM)在各种自然语言处理(NLP)任务中取得了巨大成功。plm具有大量的参数,计算量大,资源消耗大。因此,模型修剪被引入到大规模PLM的压缩中。然而,大多数先验方法只考虑任务特定知识对下游任务的影响,而忽略修剪过程中不必要的任务无关知识,这可能导致灾难性遗忘问题,并导致泛化能力差。为了在我们的剪枝模型中保持任务不可知和任务特定的知识,我们在预训练和微调的范式下提出了对比剪枝(CAP)。它被设计为一个通用框架,兼容结构化和非结构化修剪。CAP统一于对比学习,使剪枝模型能够从预先训练的任务不可知知识模型和微调的任务特定知识模型中学习。此外,为了更好地保持修剪模型的性能,快照(即每次修剪迭代中的中间模型)也可以作为修剪的有效监督。我们的大量实验表明,采用CAP始终会产生显著的改进,特别是在非常高的稀疏性场景中。仅保留3%的模型参数(即97%的稀疏性),CAP在QQP和MNLI任务中成功实现了原始BERT性能的99.2%和96.3%。此外,我们的探索性实验表明,CAP修剪后的模型具有更好的泛化能力。 摘要:Pre-trained Language Models (PLMs) have achieved great success in various Natural Language Processing (NLP) tasks under the pre-training and fine-tuning paradigm. With large quantities of parameters, PLMs are computation-intensive and resource-hungry. Hence, model pruning has been introduced to compress large-scale PLMs. However, most prior approaches only consider task-specific knowledge towards downstream tasks, but ignore the essential task-agnostic knowledge during pruning, which may cause catastrophic forgetting problem and lead to poor generalization ability. To maintain both task-agnostic and task-specific knowledge in our pruned model, we propose ContrAstive Pruning (CAP) under the paradigm of pre-training and fine-tuning. It is designed as a general framework, compatible with both structured and unstructured pruning. Unified in contrastive learning, CAP enables the pruned model to learn from the pre-trained model for task-agnostic knowledge, and fine-tuned model for task-specific knowledge. Besides, to better retain the performance of the pruned model, the snapshots (i.e., the intermediate models at each pruning iteration) also serve as effective supervisions for pruning. Our extensive experiments show that adopting CAP consistently yields significant improvements, especially in extremely high sparsity scenarios. With only 3% model parameters reserved (i.e., 97% sparsity), CAP successfully achieves 99.2% and 96.3% of the original BERT performance in QQP and MNLI tasks. In addition, our probing experiments demonstrate that the model pruned by CAP tends to achieve better generalization ability.
【5】 Discovering Explanatory Sentences in Legal Case Decisions Using Pre-trained Language Models 标题:使用预先训练的语言模型发现案件判决中的解释语句 链接:https://arxiv.org/abs/2112.07165
作者:Jaromir Savelka,Kevin D. Ashley 机构:School of Computer Science, Carnegie Mellon University, School of Law, University of Pittsburgh 备注:None 摘要:法律文本通常使用难以理解的概念。除其他外,律师通过仔细调查这些概念在过去的使用情况来阐述这些概念的含义。查找以有用的方式提及特定概念的文本片段既繁琐又耗时,因此成本高昂。我们收集了一组26959句来自法律案例判决的句子,并根据它们对解释选定法律概念的有用性对它们进行了标记。利用该数据集,我们研究了在大型语言语料库上预先训练的基于转换器的模型的有效性,以检测哪些句子是有用的。根据模型的预测,我们分析了解释句的各种语言特性以及它们与需要解释的法律概念的关系。我们表明,基于Transformer的模型能够学习出人意料的复杂功能,并优于先前的任务方法。 摘要:Legal texts routinely use concepts that are difficult to understand. Lawyers elaborate on the meaning of such concepts by, among other things, carefully investigating how have they been used in past. Finding text snippets that mention a particular concept in a useful way is tedious, time-consuming, and, hence, expensive. We assembled a data set of 26,959 sentences, coming from legal case decisions, and labeled them in terms of their usefulness for explaining selected legal concepts. Using the dataset we study the effectiveness of transformer-based models pre-trained on large language corpora to detect which of the sentences are useful. In light of models' predictions, we analyze various linguistic properties of the explanatory sentences as well as their relationship to the legal concept that needs to be explained. We show that the transformer-based models are capable of learning surprisingly sophisticated features and outperform the prior approaches to the task.
【6】 Language Models are not Models of Language 标题:语言模型不是语言的模型 链接:https://arxiv.org/abs/2112.07055
作者:Csaba Veres 机构:University of Bergen 摘要:自然语言处理(NLP)已成为当前人工智能热潮中的主要应用领域之一。迁移学习使得在语言建模任务上训练的大型深度学习神经网络能够极大地提高几乎所有语言任务的性能。有趣的是,当使用包含软件代码的数据训练这些模型时,它们表现出根据自然语言规范生成可正常运行的计算机代码的非凡能力。我们认为,这给"神经模型为解释语言如何运作提供了生成式短语结构语法之外的替代理论"这一主张带来了难题。由于编程语言的语法是由短语结构语法决定的,成功的神经模型显然无法为编程语言乃至自然语言的理论基础提供信息。我们认为,"语言模型"这一术语具有误导性,因为深度学习模型并不是语言的理论模型,因此建议改用"语料库模型"一词,它能更好地反映模型的来源和内容。 摘要:Natural Language Processing (NLP) has become one of the leading application areas in the current Artificial Intelligence boom. Transfer learning has enabled large deep learning neural networks trained on the language modeling task to vastly improve performance in almost all language tasks. Interestingly, when the models are trained with data that includes software code, they demonstrate remarkable abilities in generating functioning computer code from natural language specifications. We argue that this creates a conundrum for claims that neural models provide an alternative theory to generative phrase structure grammars in explaining how language works. Since the syntax of programming languages is determined by phrase structure grammars, successful neural models are apparently uninformative about the theoretical foundations of programming languages, and by extension, natural languages. We argue that the term language model is misleading because deep learning models are not theoretical models of language and propose the adoption of corpus model instead, which better reflects the genesis and contents of the model.
其他(8篇)
【1】 Improving Compositional Generalization with Latent Structure and Data Augmentation 标题:利用潜在结构和数据增强改进组合泛化 链接:https://arxiv.org/abs/2112.07610
作者:Linlu Qiu,Peter Shaw,Panupong Pasupat,Paweł Krzysztof Nowak,Tal Linzen,Fei Sha,Kristina Toutanova 机构: Google Research, New York University 摘要:通用的非结构化神经网络已被证明难以实现分布外的组合泛化。通过示例重组进行的组合数据增强已经把一部分关于组合性的先验知识转移到了若干语义解析任务的黑盒神经模型中,但这通常需要面向特定任务的工程设计,或者收益有限。我们提出了一种更强大的数据重组方法,使用一种称为组合结构学习器(CSL)的模型。CSL是一个以准同步上下文无关语法为主干的生成模型,我们从训练数据中归纳得到它。我们从CSL中采样重组后的示例,并将它们加入预训练序列到序列模型(T5)的微调数据中。该过程在诊断任务上有效地将CSL的大部分组合偏置转移给了T5,并在两个真实世界的组合泛化任务上得到了比T5-CSL集成更强的模型。这为这些需要同时泛化到自然语言变体和元素新组合的高难度语义解析任务带来了新的最先进性能。 摘要:Generic unstructured neural networks have been shown to struggle on out-of-distribution compositional generalization. Compositional data augmentation via example recombination has transferred some prior knowledge about compositionality to such black-box neural models for several semantic parsing tasks, but this often required task-specific engineering or provided limited gains. We present a more powerful data recombination method using a model called Compositional Structure Learner (CSL). CSL is a generative model with a quasi-synchronous context-free grammar backbone, which we induce from the training data. We sample recombined examples from CSL and add them to the fine-tuning data of a pre-trained sequence-to-sequence model (T5). This procedure effectively transfers most of CSL's compositional bias to T5 for diagnostic tasks, and results in a model even stronger than a T5-CSL ensemble on two real world compositional generalization tasks. This results in new state-of-the-art performance for these challenging semantic parsing tasks requiring generalization to both natural language variation and novel compositions of elements.
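摘要中"从CSL采样重组示例并加入T5微调数据"这一步,可以用下面的示意草图表达(准同步CFG的归纳与采样在此抽象为一个黑盒函数,所有名称均为假设,并非论文代码)。

    import random

    def build_finetune_data(train_pairs, csl_sample, n_aug, seed=0):
        # train_pairs:[(自然语言输入, 目标表示)] 形式的原始训练对
        # csl_sample:从已归纳的准同步CFG中采样一个重组样本的黑盒函数(假设)
        rng = random.Random(seed)
        synthetic = [csl_sample() for _ in range(n_aug)]
        mixed = list(train_pairs) + synthetic
        rng.shuffle(mixed)
        return mixed   # 作为 T5 微调数据使用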
【2】 The King is Naked: on the Notion of Robustness for Natural Language Processing 标题:国王是裸体的:关于自然语言处理的健壮性概念 链接:https://arxiv.org/abs/2112.07605
作者:Emanuele La Malfa,Marta Kwiatkowska 机构:Department of Computer Science, University of Oxford 备注:AAAI 2022 main-track (full-paper) 摘要:越来越多的证据表明,最初为图像引入的对抗稳健性这一经典概念,已被NLP研究界的大部分人当作事实标准采用。我们表明,这一概念在NLP的语境下是有问题的,因为它只考虑了范围狭窄的语言现象。在本文中,我们主张语义稳健性,它与人类关于语言保真度的概念更为一致。我们用语义稳健性预期在模型中引入的偏置来刻画它。我们使用一个基于模板的生成式测试平台,研究了一系列普通训练和稳健训练的体系结构的语义稳健性。我们用实证证据补充了上述分析:尽管更难实现,但在经典意义上稳健的模型会失效的复杂语言现象上,语义稳健性能够带来性能提升。 摘要:There is growing evidence that the classical notion of adversarial robustness originally introduced for images has been adopted as a de facto standard by a large part of the NLP research community. We show that this notion is problematic in the context of NLP as it considers a narrow spectrum of linguistic phenomena. In this paper, we argue for semantic robustness, which is better aligned with the human concept of linguistic fidelity. We characterize semantic robustness in terms of biases that it is expected to induce in a model. We study semantic robustness of a range of vanilla and robustly trained architectures using a template-based generative test bed. We complement the analysis with empirical evidence that, despite being harder to implement, semantic robustness can improve performance on complex linguistic phenomena where models robust in the classical sense fail.
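"基于模板的生成式测试平台"的大致思路是:用一组语义等价的填充词实例化同一模板,再检查模型预测是否保持一致。下面是一个极简的示意草图(非论文实现,模板、填充词与分类函数均为假设)。

    def semantic_consistency(predict, template, fillers, expected_label):
        # predict:任意返回标签的文本分类函数(假设)
        # template:带占位符的模板,例如 "The movie was {adj}."
        # fillers:语义上应导致同一标签的一组填充词
        preds = [predict(template.format(adj=w)) for w in fillers]
        return sum(p == expected_label for p in preds) / len(preds)

    # 用法示例:
    # rate = semantic_consistency(clf, "The movie was {adj}.",
    #                             ["great", "fantastic", "superb"], "positive")
    # rate 越接近 1,说明模型在该模板刻画的语言现象上越"语义稳健"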
【3】 Sentiment Dynamics of Success: Fractal Scaling of Story Arcs Predicts Reader Preferences 标题:成功的情感动态:故事弧的分形标度预测读者偏好 链接:https://arxiv.org/abs/2112.07497
作者:Yuri Bizzoni,Telma Peura,Mads R. Thomsen,Kristoffer Nielbo 摘要:我们探讨了安徒生童话的情感弧与其受欢迎程度之间的相关性,后者以它们在GoodReads平台上的平均分数来衡量。具体地说,我们并不认为故事的整体情感走向本身具有预测力,而是关注它随时间的连贯性与可预测性,并用情感弧的赫斯特(Hurst)指数来刻画。我们发现,赫斯特值的下降往往伴随着质量分数的下降,而介于0.55和0.65之间的赫斯特指数可能对应文学欣赏的"最佳点"。 摘要:We explore the correlation between the sentiment arcs of H. C. Andersen's fairy tales and their popularity, measured as their average score on the platform GoodReads. Specifically, we do not conceive a story's overall sentimental trend as predictive per se, but we focus on its coherence and predictability over time as represented by the arc's Hurst exponent. We find that degrading Hurst values tend to imply degrading quality scores, while a Hurst exponent between .55 and .65 might indicate a "sweet spot" for literary appreciation.
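赫斯特指数可以在逐句情感得分构成的"情感弧"上估计。下面给出一个基于经典重标极差(R/S)分析的最小实现草图(论文可能采用其他估计方法,如DFA或自适应分形分析,此处仅作示意,并假设序列足够长)。

    import numpy as np

    def hurst_rs(series, min_chunk=8):
        # series:逐句情感得分序列(情感弧);返回 R/S 分析拟合直线的斜率,即 Hurst 指数估计
        x = np.asarray(series, dtype=float)
        n = len(x)
        sizes = np.unique(np.floor(np.logspace(np.log10(min_chunk), np.log10(n // 2), 10)).astype(int))
        log_size, log_rs = [], []
        for s in sizes:
            rs_vals = []
            for start in range(0, n - s + 1, s):
                chunk = x[start:start + s]
                dev = np.cumsum(chunk - chunk.mean())    # 去均值后的累积偏差
                r = dev.max() - dev.min()                # 极差 R
                sd = chunk.std()                         # 标准差 S
                if sd > 0:
                    rs_vals.append(r / sd)
            if rs_vals:
                log_size.append(np.log(s))
                log_rs.append(np.log(np.mean(rs_vals)))
        slope, _ = np.polyfit(log_size, log_rs, 1)
        return slope

    # 例:hurst_rs(sentiment_scores) 若落在 0.55-0.65 区间,按文中结论可能更受读者青睐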
【4】 Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks 标题:主观NLP任务的两种对比数据标注范式 链接:https://arxiv.org/abs/2112.07475
作者:Paul Röttger,Bertie Vidgen,Dirk Hovy,Janet B. Pierrehumbert 机构:University of Oxford, The Alan Turing Institute, Bocconi University 摘要:标记数据是大多数自然语言处理任务的基础。然而,给数据加标签是困难的,对于正确的数据标签应该是什么,通常存在着各种有效的信念。到目前为止,数据集创建者已经承认了注释者的主观性,但在注释过程中没有积极地管理它。这导致了部分主观数据集无法用于明确的下游用途。为了解决这个问题,我们提出了两种不同的数据注释范例。描述性范式鼓励注释者的主体性,而规定性范式则不鼓励注释者的主体性。描述性注释允许对不同的信念进行调查和建模,而规定性注释允许对一致应用一种信念的模型进行训练。我们讨论了实现这两种模式的好处和挑战,并认为数据集创建者应该明确地针对其中一种模式,以促进其数据集的预期用途。最后,我们设计了一个注释实验来说明这两种范式之间的对比。 摘要:Labelled data is the foundation of most natural language processing tasks. However, labelling data is difficult and there often are diverse valid beliefs about what the correct data labels should be. So far, dataset creators have acknowledged annotator subjectivity, but not actively managed it in the annotation process. This has led to partly-subjective datasets that fail to serve a clear downstream use. To address this issue, we propose two contrasting paradigms for data annotation. The descriptive paradigm encourages annotator subjectivity, whereas the prescriptive paradigm discourages it. Descriptive annotation allows for the surveying and modelling of different beliefs, whereas prescriptive annotation enables the training of models that consistently apply one belief. We discuss benefits and challenges in implementing both paradigms, and argue that dataset creators should explicitly aim for one or the other to facilitate the intended use of their dataset. Lastly, we design an annotation experiment to illustrate the contrast between the two paradigms.
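为了直观展示两种范式在数据处理上的差别,这里给出一个假设性的小例子(非论文内容):描述性范式保留并建模标注者之间的分歧,规定性范式则按统一准则(此处简化为多数票)归并为单一标签。

    from collections import Counter

    # 同一样本由多名标注者给出的标签(示例占位数据)
    annotations = {"ex1": ["hateful", "not_hateful", "hateful"]}

    # 描述性范式:保留标注分歧,输出标签分布,便于调查和建模不同信念
    descriptive = {k: {lab: c / len(v) for lab, c in Counter(v).items()}
                   for k, v in annotations.items()}

    # 规定性范式:按统一准则归并为单一"金标准"标签,便于训练一致应用同一信念的模型
    prescriptive = {k: Counter(v).most_common(1)[0][0] for k, v in annotations.items()}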
【5】 Do You Think It's Biased? How To Ask For The Perception Of Media Bias 标题:你认为它有偏见吗?如何询问对媒体偏见的感知 链接:https://arxiv.org/abs/2112.07392
作者:Timo Spinde,Christina Kreuter,Wolfgang Gaissmaier,Felix Hamborg,Bela Gipp,Helge Giese 机构:University of Konstanz, Konstanz, Germany, University of Wuppertal, Wuppertal, Germany 摘要:媒体报道对公众对事件的看法有重大影响。媒体报道事件的方式可以极大地改变我们社会的信仰和观念。尽管如此,几乎所有的媒体都以有偏见的方式报道新闻。虽然这种偏见可以通过改变词语选择或省略信息来引入,但对偏见的感知也在很大程度上取决于读者的个人背景。因此,媒体偏见是一个非常复杂、难以识别和分析的构念。尽管媒体偏见已成为许多研究的主题,但以往的评估策略过于简单,缺乏重叠和实证评估。因此,本研究旨在开发一个可作为评估文章偏见的可靠标准的量表。举一个例子:为了衡量一篇新闻文章中的偏见,我们是否应该问:"这篇文章的偏见有多大?"或者我们应该问:"这篇文章是如何对待美国总统的?"。我们进行了文献检索,在先前的研究中发现了824个与文本感知相关的问题。在一个多次迭代的过程中,我们从语义上总结和浓缩了这些问题,从而归纳出一套完整且具有代表性的关于偏见的可能问题类型。最终的问题集包括25个采用不同作答形式的问题、17个使用语义差异量表的问题,以及6个感受评分。我们在190篇文章上对每个问题进行了测试,共有663名参与者参与,以确定这些问题在多大程度上衡量了一篇文章的感知偏见。我们的研究结果表明,最终保留的21个题项适合且可靠地测量感知到的媒体偏见。我们将最终的问题集发布在http://bias-question-tree.gipplab.org/. 摘要:Media coverage possesses a substantial effect on the public perception of events. The way media frames events can significantly alter the beliefs and perceptions of our society. Nevertheless, nearly all media outlets are known to report news in a biased way. While such bias can be introduced by altering the word choice or omitting information, the perception of bias also varies largely depending on a reader's personal background. Therefore, media bias is a very complex construct to identify and analyze. Even though media bias has been the subject of many studies, previous assessment strategies are oversimplified, lack overlap and empirical evaluation. Thus, this study aims to develop a scale that can be used as a reliable standard to evaluate article bias. To name an example: Intending to measure bias in a news article, should we ask, "How biased is the article?" or should we instead ask, "How did the article treat the American president?". We conducted a literature search to find 824 relevant questions about text perception in previous research on the topic. In a multi-iterative process, we summarized and condensed these questions semantically to conclude a complete and representative set of possible question types about bias. The final set consisted of 25 questions with varying answering formats, 17 questions using semantic differentials, and six ratings of feelings. We tested each of the questions on 190 articles with overall 663 participants to identify how well the questions measure an article's perceived bias. Our results show that 21 final items are suitable and reliable for measuring the perception of media bias. We publish the final set of questions on http://bias-question-tree.gipplab.org/.
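量表题项"是否可靠"通常可以用内部一致性指标(如Cronbach's alpha)来检验;论文具体采用哪种信度指标属于我们的假设,下面仅给出该常用指标的一个最小计算草图。

    import numpy as np

    def cronbach_alpha(scores):
        # scores:[受访者数, 题项数] 的评分矩阵
        X = np.asarray(scores, dtype=float)
        k = X.shape[1]
        item_var = X.var(axis=0, ddof=1).sum()      # 各题项方差之和
        total_var = X.sum(axis=1).var(ddof=1)       # 总分方差
        return k / (k - 1) * (1 - item_var / total_var)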
【6】 Simple Local Attentions Remain Competitive for Long-Context Tasks 标题:对于长上下文任务,简单的局部注意力仍然具有竞争力 链接:https://arxiv.org/abs/2112.07210
作者:Wenhan Xiong,Barlas Oğuz,Anchit Gupta,Xilun Chen,Diana Liskovich,Omer Levy,Wen-tau Yih,Yashar Mehdad 机构:Meta AI 摘要:许多NLP任务需要处理超出预训练模型长度限制的长上下文。为了将这些模型扩展到更长的文本序列,人们提出了许多高效的长程注意力变体。尽管沿着这一方向已有大量研究,但仍然难以衡量这些模型在实际用例中的相对有效性,例如按照预训练-微调范式应用这些模型时。在这项工作中,我们的目标是通过大规模且受控的实验对这些新兴模型进行彻底分析。对于每种注意力变体,我们使用相同的长文档语料库预训练大规模模型,然后针对真实世界的长上下文任务对这些模型进行微调。我们的研究结果揭示了一个现有的、被广泛使用的长程基准的缺陷,并表明在标准的预训练范式下,没有一种被测试的高效注意力能够击败简单的局部窗口注意力。对局部注意力变体的进一步分析表明,即使是常用的注意力窗口重叠也不一定是取得良好下游结果的必要条件——使用不相交的局部注意力,我们能够构建一个更简单、更高效的长文档问答模型,它仅用Longformer一半的预训练计算量就能达到与其相当的性能。 摘要:Many NLP tasks require processing long contexts beyond the length limit of pretrained models. In order to scale these models to longer text sequences, many efficient long-range attention variants have been proposed. Despite the abundance of research along this direction, it is still difficult to gauge the relative effectiveness of these models in practical use cases, e.g., if we apply these models following the pretrain-and-finetune paradigm. In this work, we aim to conduct a thorough analysis of these emerging models with large-scale and controlled experiments. For each attention variant, we pretrain large-size models using the same long-doc corpus and then finetune these models for real-world long-context tasks. Our findings reveal pitfalls of an existing widely-used long-range benchmark and show none of the tested efficient attentions can beat a simple local window attention under standard pretraining paradigms. Further analysis on local attention variants suggests that even the commonly used attention-window overlap is not necessary to achieve good downstream results -- using disjoint local attentions, we are able to build a simpler and more efficient long-doc QA model that matches the performance of Longformer with half of its pretraining compute.
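"不相交的局部注意力"可以理解为把序列切成互不重叠的块、仅在块内计算自注意力。下面是一个示意实现(非论文代码),为简化起见假设序列长度能被块大小整除,并省略了padding掩码与多头拆分等细节。

    import torch
    import torch.nn.functional as F

    def disjoint_local_attention(q, k, v, block_size=128):
        # q, k, v: [B, L, d];仅在互不重叠的块内做注意力,块间没有任何交互
        B, L, d = q.shape
        nb = L // block_size
        qb = q.view(B, nb, block_size, d)
        kb = k.view(B, nb, block_size, d)
        vb = v.view(B, nb, block_size, d)
        scores = torch.matmul(qb, kb.transpose(-1, -2)) / d ** 0.5   # 块内注意力分数 [B, nb, bs, bs]
        probs = F.softmax(scores, dim=-1)
        out = torch.matmul(probs, vb)                                 # [B, nb, bs, d]
        return out.view(B, L, d)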
【7】 MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation 标题:MDD-Eval:基于增强数据自训练的多领域对话评估 链接:https://arxiv.org/abs/2112.07194
作者:Chen Zhang,Luis Fernando D'Haro,Thomas Friedrichs,Haizhou Li 机构: National University of Singapore, Robert Bosch (SEA), Singapore, Kriston AI Lab, China, Universidad Politécnica de Madrid, Spain, The Chinese University of Hong Kong (Shenzhen), China 备注:Accepted to AAAI2022 (10 pages, 3 figures, Preprint version) 摘要:聊天机器人被设计用于在不同领域进行类似人类的对话,如一般闲聊、知识交流和基于角色的对话。为了衡量此类对话代理的质量,对话评估器也应能够跨领域进行评估。然而,大多数最先进的自动对话评估指标(ADM)并不是为多领域评估而设计的。为了解决这个问题,我们设计了一个通用而稳健的框架MDD-Eval。具体地说,我们首先用人工标注的数据训练一个教师评估器,使其获得在特定领域内区分好回复与差回复的评分能力;然后采用自训练策略,用教师标注的多领域数据训练一个新的评估器,帮助新评估器泛化到多个领域。我们在六个对话评估基准上对MDD-Eval进行了广泛评估。实证结果表明,就所有评估基准上的平均Spearman相关分数而言,MDD-Eval框架取得了很强的性能,比最先进的ADM绝对提升了7%。 摘要:Chatbots are designed to carry out human-like conversations across different domains, such as general chit-chat, knowledge exchange, and persona-grounded conversations. To measure the quality of such conversational agents, a dialogue evaluator is expected to conduct assessment across domains as well. However, most of the state-of-the-art automatic dialogue evaluation metrics (ADMs) are not designed for multi-domain evaluation. We are motivated to design a general and robust framework, MDD-Eval, to address the problem. Specifically, we first train a teacher evaluator with human-annotated data to acquire a rating skill to tell good dialogue responses from bad ones in a particular domain and then, adopt a self-training strategy to train a new evaluator with teacher-annotated multi-domain data, that helps the new evaluator to generalize across multiple domains. MDD-Eval is extensively assessed on six dialogue evaluation benchmarks. Empirical results show that the MDD-Eval framework achieves a strong performance with an absolute improvement of 7% over the state-of-the-art ADMs in terms of mean Spearman correlation scores across all the evaluation benchmarks.
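摘要中的"教师评估器 + 自训练"流程可以用下面的示意草图概括(非论文实现,fit、predict等接口均为抽象假设):

    def self_train_evaluator(teacher, student, human_data, multi_domain_pairs, fit, predict):
        # 1) 教师评估器在人工标注数据上学会区分好/差回复(训练过程抽象为 fit)
        fit(teacher, human_data)
        # 2) 教师为多领域的增强对话数据打伪标签(打分过程抽象为 predict)
        pseudo_labeled = [(ctx, resp, predict(teacher, ctx, resp))
                          for ctx, resp in multi_domain_pairs]
        # 3) 新评估器(学生)在教师标注的多领域数据上训练,从而获得跨领域泛化能力
        fit(student, pseudo_labeled)
        return student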
【8】 Framework para Caracterizar Fake News en Terminos de Emociones 标题:从情绪角度刻画假新闻的框架 链接:https://arxiv.org/abs/2112.07035
作者:Luis Rojas Rubio,Claudio Meneses Villegas 机构: Universidad Católica del Norte. Departamento de Ingeniería de Sistemas y Computación. 备注:in Spanish 摘要:社交网络凭借其即时性和社交互动性,已经成为人类的主要信息渠道之一,在某些情况下允许每个用户发布其认为相关的内容。这带来了虚假新闻(Fake News)的产生,这类发布内容只为制造不确定性、误导信息或带偏读者的观点。已有研究表明,人类无法完全识别一篇文章究竟是事实还是假新闻,因此出现了基于数据挖掘和机器学习、用于刻画和识别文章的模型。本文提出了一个三层框架,其主要目标是刻画假新闻中存在的情绪,并为未来识别公众情绪状态与意图状态的工作提供工具。 摘要:Social networks have become one of the main information channels for human beings due to the immediate and social interactivity they offer, allowing in some cases to publish what each user considers relevant. This has brought with it the generation of false news or Fake News, publications that only seek to generate uncertainty, misinformation or skew the opinion of readers. It has been shown that the human being is not capable of fully identifying whether an article is really a fact or a Fake News, due to this it is that models arise that seek to characterize and identify articles based on data mining and machine learning. This article proposes a three-layer framework, the main objective of which is to characterize the emotions present in Fake News and to be a tool for future work that identifies the emotional state and intentional state of the public.
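作为示意(并非该三层框架的实现),刻画一篇文章情绪最简单的做法之一是用情绪词典把文本映射为情绪分布,再把该分布作为描述假新闻的特征之一;下面草图中的词典、情绪类别与分词方式均为假设。

    from collections import Counter

    # 假设性的微型西班牙语情绪词典(真实应用中应换成完整的情绪词典)
    EMOTION_LEXICON = {"miedo": "fear", "alegria": "joy", "ira": "anger"}

    def emotion_profile(text):
        # 统计文中命中的情绪词并归一化为情绪分布,作为刻画假新闻的情绪特征
        tokens = text.lower().split()
        hits = Counter(EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON)
        total = sum(hits.values()) or 1
        return {emo: c / total for emo, c in hits.items()}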