Natural Language Processing Academic Digest [8.20]

2021-08-24 16:37:51


cs.CL: 16 papers today.

Transformer (2 papers)

【1】 Attentive fine-tuning of Transformers for Translation of low-resourced languages @LoResMT 2021
Link: https://arxiv.org/abs/2108.08556

Authors: Karthik Puranik, Adeep Hande, Ruba Priyadharshini, Thenmozi Durairaj, Anbukkarasi Sampath, Kingston Pal Thamburaj, Bharathi Raja Chakravarthi
Affiliations: Department of Computer Science, Indian Institute of Information Technology Tiruchirappalli; ULTRA Arts and Science College, Madurai, Tamil Nadu, India; Sri Sivasubramaniya Nadar College of Engineering, Tamil Nadu, India
Note: 10 pages
Abstract: This paper reports the Machine Translation (MT) systems submitted by the IIITT team for the English->Marathi and English->Irish language pairs in the LoResMT 2021 shared task. The task focuses on getting exceptional translations for rather low-resourced languages like Irish and Marathi. We fine-tune IndicTrans, a pretrained multilingual NMT model, for English->Marathi, using an external parallel corpus as input for additional training. For the latter language pair, we use a pretrained Helsinki-NLP Opus MT English->Irish model. Our approaches yield relatively promising results on the BLEU metric. Under the team name IIITT, our systems ranked 1, 1, and 2 in English->Marathi, Irish->English, and English->Irish, respectively.
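
For readers who want to try the second system's starting point, here is a minimal sketch of translating with a pretrained Opus-MT checkpoint via Hugging Face transformers. This is our illustration, not the authors' code, and the checkpoint id `Helsinki-NLP/opus-mt-en-ga` (English→Irish) is an assumption; the abstract does not name the exact model.

```python
# Hedged sketch: zero-shot translation with a pretrained Opus-MT model,
# the usual starting point before fine-tuning on extra parallel data.
# The checkpoint id below is assumed, not taken from the paper.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-ga"  # English -> Irish (assumed id)
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

batch = tokenizer(["The paper reports promising BLEU scores."],
                  return_tensors="pt", padding=True)
generated = model.generate(**batch)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```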

【2】 Contributions of Transformer Attention Heads in Multi- and Cross-lingual Tasks
Link: https://arxiv.org/abs/2108.08375

Authors: Weicheng Ma, Kai Zhang, Renze Lou, Lili Wang, Soroush Vosoughi
Affiliations: Department of Computer Science, Dartmouth College; Department of Computer Science and Technology, Tsinghua University; Department of Computer Science, Zhejiang University City College
Note: In ACL 2021
Abstract: This paper studies the relative importance of attention heads in Transformer-based models to aid their interpretability in cross-lingual and multi-lingual tasks. Prior research has found that only a few attention heads are important in each mono-lingual Natural Language Processing (NLP) task, and pruning the remaining heads leads to comparable or improved performance of the model. However, the impact of pruning attention heads is not yet clear in cross-lingual and multi-lingual tasks. Through extensive experiments, we show that (1) pruning a number of attention heads in a multi-lingual Transformer-based model has, in general, positive effects on its performance in cross-lingual and multi-lingual tasks, and (2) the attention heads to be pruned can be ranked using gradients and identified with a few trial experiments. Our experiments focus on sequence labeling tasks, with potential applicability to other cross-lingual and multi-lingual tasks. For comprehensiveness, we examine two pre-trained multi-lingual models, namely multi-lingual BERT (mBERT) and XLM-R, on three tasks across 9 languages each. We also discuss the validity of our findings and their extensibility to truly resource-scarce languages and other task settings.
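
The gradient-based head ranking mentioned in the abstract can be pictured with a short sketch (our illustration, in the spirit of Michel et al., 2019, not the authors' code): attach a differentiable mask to every head and accumulate the magnitude of the loss gradient with respect to each mask entry.

```python
# Hedged sketch of gradient-based head importance. Assumes a Hugging Face
# model that accepts `head_mask` and returns `.loss` when the batch
# contains labels; the paper's exact procedure may differ.
import torch

def head_importance(model, batches, n_layers, n_heads):
    head_mask = torch.ones(n_layers, n_heads, requires_grad=True)
    importance = torch.zeros(n_layers, n_heads)
    for batch in batches:
        loss = model(**batch, head_mask=head_mask).loss
        loss.backward()
        importance += head_mask.grad.abs().detach()  # |dL/d(mask)| per head
        head_mask.grad.zero_()
        model.zero_grad()
    return importance  # candidates for pruning are the lowest-scoring heads
```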

QA | VQA | Question Answering | Dialogue (2 papers)

【1】 UNIQORN: Unified Question Answering over RDF Knowledge Graphs and Natural Language Text
Link: https://arxiv.org/abs/2108.08614

Authors: Soumajit Pramanik, Jesujoba Alabi, Rishiraj Saha Roy, Gerhard Weikum
Affiliation: Max Planck Institute for Informatics
Abstract: Question answering over knowledge graphs and other RDF data has been greatly advanced, with a number of good systems providing crisp answers for natural language questions or telegraphic queries. Some of these systems incorporate textual sources as additional evidence for the answering process, but cannot compute answers that are present in text alone. Conversely, systems from the IR and NLP communities have addressed QA over text, but barely utilize semantic data and knowledge. This paper presents the first QA system that can seamlessly operate over RDF datasets and text corpora, or both together, in a unified framework. Our method, called UNIQORN, builds a context graph on the fly by retrieving question-relevant triples from the RDF data and/or the text corpus, where the latter case is handled by automatic information extraction. The resulting graph is typically rich but highly noisy. UNIQORN copes with this input via advanced graph algorithms for Group Steiner Trees, which identify the best answer candidates in the context graph. Experimental results on several benchmarks of complex questions with multiple entities and relations show that UNIQORN, an unsupervised method with only five parameters, produces results comparable to the state of the art on KGs, text corpora, and heterogeneous sources. The graph-based methodology provides user-interpretable evidence for the complete answering process.
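
To make the Steiner-tree step concrete, here is a toy illustration (ours, not UNIQORN's code). Note that networkx only ships an approximation for the ordinary Steiner tree, where terminals are single nodes; UNIQORN's Group Steiner Tree variant, where each terminal is a set of candidate nodes, needs additional machinery.

```python
# Toy context graph: question nodes plus retrieved KB/text nodes.
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

G = nx.Graph()
G.add_weighted_edges_from([
    ("q:director", "Nolan", 1.0),
    ("Nolan", "Inception", 1.0),
    ("q:film_2010", "Inception", 1.0),
    ("Nolan", "Dunkirk", 2.0),                # noise edge
])
terminals = ["q:director", "q:film_2010"]     # nodes matched by the question
tree = steiner_tree(G, terminals, weight="weight")
print(sorted(tree.nodes()))  # non-terminal tree nodes are answer candidates
```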

【2】 Efficient Contextualization using Top-k Operators for Question Answering over Knowledge Graphs
Link: https://arxiv.org/abs/2108.08597

Authors: Philipp Christmann, Rishiraj Saha Roy, Gerhard Weikum
Affiliation: MPI for Informatics, Germany
Abstract: Answering complex questions over knowledge bases (KB-QA) faces huge input data with billions of facts, involving millions of entities and thousands of predicates. For efficiency, QA systems first reduce the answer search space by identifying a set of facts that is likely to contain all answers and relevant cues. The most common technique is to apply named entity disambiguation (NED) systems to the question and retrieve KB facts for the disambiguated entities. This work presents ECQA, an efficient method that prunes irrelevant parts of the search space using KB-aware signals. ECQA is based on top-k query processing over score-ordered lists of KB items that combine signals about lexical matching, relevance to the question, coherence among candidate items, and connectivity in the KB graph. Experiments with two recent QA benchmarks demonstrate the superiority of ECQA over state-of-the-art baselines with respect to answer presence, size of the search space, and runtimes.
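
"Top-k query processing over score-ordered lists" usually refers to threshold-style algorithms. Below is a compact sketch of Fagin's Threshold Algorithm with sum aggregation (illustrative only; ECQA's actual operators and signals are described in the paper).

```python
# Hedged sketch of the Threshold Algorithm over score-sorted lists.
import heapq

def threshold_topk(lists, k):
    """lists: each is [(item, score), ...] sorted by score descending."""
    random_access = [dict(lst) for lst in lists]   # item -> score per list
    exact = {}                                     # item -> aggregate score
    for depth in range(max(len(lst) for lst in lists)):
        threshold = 0.0
        for lst in lists:
            if depth >= len(lst):
                continue
            item, score = lst[depth]
            threshold += score                     # best any unseen item can do
            if item not in exact:                  # exact score via random access
                exact[item] = sum(ra.get(item, 0.0) for ra in random_access)
        top = heapq.nlargest(k, exact.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top                             # early termination
    return heapq.nlargest(k, exact.items(), key=lambda kv: kv[1])
```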

Machine Translation (1 paper)

【1】 MvSR-NAT: Multi-view Subset Regularization for Non-Autoregressive Machine Translation
Link: https://arxiv.org/abs/2108.08447

Authors: Pan Xie, Zexian Li, Xiaohui Hu
Affiliations: Beihang University; Chinese Academy of Sciences
Abstract: Conditional masked language models (CMLM) have shown impressive progress in non-autoregressive machine translation (NAT). They learn the conditional translation model by predicting the random masked subset in the target sentence. Based on the CMLM framework, we introduce Multi-view Subset Regularization (MvSR), a novel regularization method to improve the performance of the NAT model. Specifically, MvSR consists of two parts: (1) shared mask consistency: we forward the same target with different mask strategies and encourage the predictions at shared mask positions to be consistent with each other; (2) model consistency: we maintain an exponential moving average of the model weights and enforce consistency between the predictions of the average model and the online model. Without changing the CMLM-based architecture, our approach achieves remarkable performance on three public benchmarks, with 0.36-1.14 BLEU gains over previous NAT models. Moreover, compared with the stronger Transformer baseline, we reduce the gap to 0.01-0.44 BLEU points on small datasets (WMT16 RO↔EN and IWSLT DE→EN).
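
The "model consistency" part is essentially a mean-teacher setup. A minimal PyTorch sketch of the EMA update and the consistency penalty follows (our illustration; the decay value and loss weighting are assumptions, not the paper's settings).

```python
import copy
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(avg_model, online_model, decay=0.999):  # decay is illustrative
    for p_avg, p in zip(avg_model.parameters(), online_model.parameters()):
        p_avg.mul_(decay).add_(p, alpha=1.0 - decay)

# Typical use per training step (avg_model = copy.deepcopy(online_model) once):
#   logits = online_model(x)
#   with torch.no_grad():
#       avg_logits = avg_model(x)
#   consistency = F.kl_div(logits.log_softmax(-1), avg_logits.softmax(-1),
#                          reduction="batchmean")
#   loss = task_loss + lam * consistency
#   loss.backward(); optimizer.step(); ema_update(avg_model, online_model)
```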

Reasoning | Analysis | Understanding | Interpretation (2 papers)

【1】 Augmenting Slot Values and Contexts for Spoken Language Understanding with Pretrained Models
Link: https://arxiv.org/abs/2108.08451

Authors: Haitao Lin, Lu Xiang, Yu Zhou, Jiajun Zhang, Chengqing Zong
Affiliations: National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, China; Fanyu AI Laboratory, Beijing Fanyu Technology Co., Ltd, Beijing, China
Note: Accepted by Interspeech 2021
Abstract: Spoken Language Understanding (SLU) is an essential step in building a dialogue system. Due to the expensive cost of obtaining labeled data, SLU suffers from a data scarcity problem. Therefore, in this paper, we focus on data augmentation for the slot filling task in SLU. To achieve that, we aim at generating more diverse data based on existing data. Specifically, we try to exploit the latent language knowledge from pretrained language models by finetuning them. We propose two strategies for the finetuning process: value-based and context-based augmentation. Experimental results on two public SLU datasets show that, compared with existing data augmentation methods, our proposed method generates more diverse sentences and significantly improves SLU performance.
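
As a rough picture of value-based augmentation (our sketch, not the authors' code): mask the slot value of an utterance and let a masked language model propose alternatives. The paper finetunes the LM on in-domain data first; the off-the-shelf checkpoint and utterance here are stand-ins.

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")  # stand-in checkpoint
utterance = "book a flight to [MASK] tomorrow morning"   # masked slot value
for cand in fill(utterance, top_k=5):
    print(f'{cand["sequence"]}  ({cand["score"]:.3f})')  # candidate slot values
```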

【2】 Integrating Dialog History into End-to-End Spoken Language Understanding Systems
Link: https://arxiv.org/abs/2108.08405

Authors: Jatin Ganhotra, Samuel Thomas, Hong-Kwang J. Kuo, Sachindra Joshi, George Saon, Zoltán Tüske, Brian Kingsbury
Affiliation: IBM Research AI, Yorktown Heights, NY, USA
Note: Interspeech 2021
Abstract: End-to-end spoken language understanding (SLU) systems that process human-human or human-computer interactions are often context independent and process each turn of a conversation independently. Spoken conversations, on the other hand, are very much context dependent, and dialog history contains useful information that can improve the processing of each conversational turn. In this paper, we investigate the importance of dialog history and how it can be effectively integrated into end-to-end SLU systems. While processing a spoken utterance, our proposed RNN transducer (RNN-T) based SLU model has access to its dialog history in the form of decoded transcripts and SLU labels of previous turns. We encode the dialog history as BERT embeddings and use them as an additional input to the SLU model, along with the speech features for the current utterance. We evaluate our approach on a recently released spoken dialog dataset, the HarperValleyBank corpus. We observe significant improvements, 8% for dialog action and 30% for caller intent recognition tasks, in comparison to a competitive context-independent end-to-end baseline system.
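
A minimal sketch of the history-conditioning idea follows; the shapes and the simple concatenation fusion are our assumptions, and the paper's actual RNN-T architecture differs in detail.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

history = "agent: how can i help you? caller: i lost my debit card."
with torch.no_grad():
    enc = tokenizer(history, return_tensors="pt", truncation=True)
    hist_vec = bert(**enc).pooler_output              # (1, 768) history embedding

speech_feats = torch.randn(1, 200, 80)                # toy (batch, frames, fbank)
hist_tiled = hist_vec.unsqueeze(1).expand(-1, speech_feats.size(1), -1)
fused = torch.cat([speech_feats, hist_tiled], dim=-1) # extra input to RNN-T SLU
```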

Detection (1 paper)

【1】 DESYR: Definition and Syntactic Representation Based Claim Detection on the Web
Link: https://arxiv.org/abs/2108.08759

Authors: Megha Sundriyal, Parantak Singh, Md Shad Akhtar, Shubhashis Sengupta, Tanmoy Chakraborty
Affiliations: IIIT-Delhi, India; BITS Pilani, Goa, India; Accenture Labs Bangalore, India
Note: 10 pages, accepted at CIKM 2021
Abstract: The formulation of a claim rests at the core of argument mining. Demarcating between a claim and a non-claim is arduous for both humans and machines, owing to latent linguistic variance between the two and the inadequacy of extensive definition-based formalization. Furthermore, the increase in the usage of online social media has resulted in an explosion of unsolicited information on the web, presented as informal text. To account for the aforementioned, in this paper we propose DESYR, a framework that aims to resolve these issues for informal web-based text by leveraging a combination of hierarchical representation learning (dependency-inspired Poincaré embedding), definition-based alignment, and feature projection. We do away with fine-tuning compute-heavy language models in favor of fabricating a more domain-centric but lighter approach. Experimental results indicate that DESYR improves on the state-of-the-art systems across four benchmark claim datasets, most of which were constructed from informal texts: an increase of 3 claim-F1 points on the LESA-Twitter dataset; 1 claim-F1 point and 9 macro-F1 points on the Online Comments (OC) dataset; 24 claim-F1 points and 17 macro-F1 points on the Web Discourse (WD) dataset; and 8 claim-F1 points and 5 macro-F1 points on the Micro Texts (MT) dataset. We also perform an extensive analysis of the results. We make a 100-D pretrained version of our Poincaré variant available, along with the source code.
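
For reference, the distance function behind Poincaré embeddings (Nickel & Kiela, 2017), which DESYR's dependency-inspired variant builds on; the NumPy transcription is ours.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """d(u, v) = arcosh(1 + 2*||u-v||^2 / ((1-||u||^2)(1-||v||^2)))."""
    sq_dist = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq_dist / (denom + eps))

# Points near the ball's boundary end up far from everything else: a
# natural fit for hierarchies, with general terms near the origin.
print(poincare_distance(np.array([0.1, 0.0]), np.array([0.0, 0.7])))
```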

Recognition / Classification (1 paper)

【1】 Fine-Grained Element Identification in Complaint Text of Internet Fraud
Link: https://arxiv.org/abs/2108.08676

Authors: Tong Liu, Siyuan Wang, Jingchao Fu, Lei Chen, Zhongyu Wei, Yaqi Liu, Heng Ye, Liaosa Xu, Weiqiang Wan, Xuanjing Huang
Affiliations: School of Data Science, Fudan University, China; Ant Group, China
Note: 5 pages, 5 figures, 3 tables; accepted as a short paper to CIKM 2021
Abstract: Existing systems dealing with online complaints provide a final decision without explanations. We propose to analyse the complaint text of internet fraud in a fine-grained manner. Considering that a complaint text includes multiple clauses with various functions, we propose to identify the role of each clause and classify them into different types of fraud elements. We construct a large labeled dataset originating from a real financial service platform. We build an element identification model on top of BERT and propose two additional modules to utilize the context of the complaint text for better element label classification, namely a global context encoder and a label refiner. Experimental results show the effectiveness of our model.
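
A structural sketch of the pipeline outline (ours; the abstract does not specify module internals, so the global context encoder below is a generic Transformer layer, the label refiner is omitted, and all sizes are assumptions):

```python
import torch
import torch.nn as nn

class ClauseElementTagger(nn.Module):
    """Classify each clause of a complaint into a fraud-element type."""
    def __init__(self, hidden=768, n_labels=6):              # sizes assumed
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8,
                                           batch_first=True)
        self.global_context = nn.TransformerEncoder(layer, num_layers=1)
        self.classifier = nn.Linear(hidden, n_labels)

    def forward(self, clause_vecs):        # (batch, n_clauses, hidden) from BERT
        ctx = self.global_context(clause_vecs)  # clauses attend to each other
        return self.classifier(ctx)             # per-clause element logits
```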

Retrieval (1 paper)

【1】 Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval
Link: https://arxiv.org/abs/2108.08787

Authors: Xinyu Zhang, Xueguang Ma, Peng Shi, Jimmy Lin
Affiliation: David R. Cheriton School of Computer Science, University of Waterloo
Abstract: We present Mr. TyDi, a multi-lingual benchmark dataset for mono-lingual retrieval in eleven typologically diverse languages, designed to evaluate ranking with learned dense representations. The goal of this resource is to spur research in dense retrieval techniques in non-English languages, motivated by recent observations that existing techniques for representation learning perform poorly when applied to out-of-distribution data. As a starting point, we provide zero-shot baselines for this new dataset based on a multi-lingual adaptation of DPR that we call "mDPR". Experiments show that although the effectiveness of mDPR is much lower than BM25, dense representations nevertheless appear to provide valuable relevance signals, improving BM25 results in sparse-dense hybrids. In addition to analyses of our results, we also discuss future challenges and present a research agenda in multi-lingual dense retrieval. Mr. TyDi can be downloaded at https://github.com/castorini/mr.tydi.
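
A sparse-dense hybrid of the kind reported here is commonly an interpolation of normalised BM25 and dense scores. A small sketch follows (the min-max normalisation and the weight alpha are our assumptions, not the paper's exact recipe):

```python
def hybrid_ranking(bm25, dense, alpha=0.5):
    """bm25, dense: {doc_id: score} for one query; alpha is illustrative."""
    def minmax(scores):
        lo, hi = min(scores.values()), max(scores.values())
        return {d: (s - lo) / ((hi - lo) or 1.0) for d, s in scores.items()}
    b, d = minmax(bm25), minmax(dense)
    fused = {doc: alpha * d.get(doc, 0.0) + (1 - alpha) * b.get(doc, 0.0)
             for doc in set(b) | set(d)}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

print(hybrid_ranking({"d1": 12.0, "d2": 7.5}, {"d1": 0.61, "d3": 0.70}))
```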

Word2Vec | Text | Words (1 paper)

【1】 Czech News Dataset for Semantic Textual Similarity
Link: https://arxiv.org/abs/2108.08708

Authors: Jakub Sido, Michal Seják, Ondřej Pražák, Miloslav Konopík, Václav Moravec
Affiliations: NTIS – New Technologies for the Information Society, Department of Computer Science and Engineering, University of West Bohemia, Czech Republic; Department of Journalism, Charles University, Czech Republic
Abstract: This paper describes a novel dataset consisting of sentences with semantic similarity annotations. The data originate from the journalistic domain in the Czech language. We describe the process of collecting and annotating the data in detail. The dataset contains 138,556 human annotations divided into train and test sets. In total, 485 journalism students participated in the creation process. To increase the reliability of the test set, we compute each annotation as an average of 9 individual annotations. We evaluate the quality of the dataset by measuring inter- and intra-annotator agreement. Besides agreement numbers, we provide detailed statistics of the collected dataset. We conclude our paper with a baseline experiment of building a system for predicting the semantic similarity of sentences. Due to the massive number of training annotations (116,956), the model can perform significantly better than an average annotator (0.92 versus 0.86 Pearson's correlation coefficient).
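
The evaluation protocol is easy to reproduce in outline: average the 9 test annotations into a gold score and compare predictions against it with Pearson's r. The toy data below is ours, and the 0-6 rating scale is an assumption.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
annotations = rng.integers(0, 7, size=(100, 9))  # 100 pairs, 9 raters (toy)
gold = annotations.mean(axis=1)                  # averaged test-set label
single_rater = annotations[:, 0]                 # one annotator's judgements
r, _ = pearsonr(single_rater, gold)
print(f"single annotator vs. averaged gold: r = {r:.2f}")
```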

Other Neural Networks | Deep Learning | Models | Modeling (1 paper)

【1】 Language Model Augmented Relevance Score
Link: https://arxiv.org/abs/2108.08485

Authors: Ruibo Liu, Jason Wei, Soroush Vosoughi
Affiliations: Dartmouth College; Google AI Language
Note: In ACL 2021
Abstract: Although automated metrics are commonly used to evaluate NLG systems, they often correlate poorly with human judgements. Newer metrics such as BERTScore have addressed many weaknesses in prior metrics such as BLEU and ROUGE, which rely on n-gram matching. These newer methods, however, are still limited in that they do not consider the generation context, so they cannot properly reward generated text that is correct but deviates from the given reference. In this paper, we propose Language Model Augmented Relevance Score (MARS), a new context-aware metric for NLG evaluation. MARS leverages off-the-shelf language models, guided by reinforcement learning, to create augmented references that consider both the generation context and available human references, which are then used as additional references to score generated text. Compared with seven existing metrics in three common NLG tasks, MARS not only achieves higher correlation with human reference judgements, but also differentiates well-formed candidates from adversarial samples to a larger degree.

Others (4 papers)

【1】 Contrastive Language-Image Pre-training for the Italian Language
Link: https://arxiv.org/abs/2108.08688

Authors: Federico Bianchi, Giuseppe Attanasio, Raphael Pisoni, Silvia Terragni, Gabriele Sarti, Sri Lakshmi
Affiliations: Bocconi University, Milan, Italy; Politecnico di Torino, Turin, Italy; Independent Researcher, Vienna, Austria; University of Milano-Bicocca; University of Groningen, Groningen, The Netherlands; Chennai, India
Abstract: CLIP (Contrastive Language-Image Pre-training) is a very recent multi-modal model that jointly learns representations of images and texts. The model is trained on a massive amount of English data and shows impressive performance on zero-shot classification tasks. Training the same model on a different language is not trivial, since data in other languages might not be enough and the model needs high-quality translations of the texts to guarantee a good performance. In this paper, we present the first CLIP model for the Italian language (CLIP-Italian), trained on more than 1.4 million image-text pairs. Results show that CLIP-Italian outperforms the multilingual CLIP model on the tasks of image retrieval and zero-shot classification.
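
For context, the symmetric contrastive (InfoNCE) objective that CLIP models train with, in a compact PyTorch sketch (ours; CLIP-Italian keeps this objective and swaps in Italian image-text pairs, and the temperature value is illustrative):

```python
import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, temperature=0.07):
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (N, N) cosine similarities
    labels = torch.arange(logits.size(0))            # matched pairs on diagonal
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2
```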

【2】 The Legislative Recipe: Syntax for Machine-Readable Legislation
Link: https://arxiv.org/abs/2108.08678

Authors: Megan Ma, Bryan Wilson
Abstract: Legal interpretation is a linguistic venture. In judicial opinions, for example, courts are often asked to interpret the text of statutes and legislation. As time has shown, this is not always as easy as it sounds. Matters can hinge on vague or inconsistent language and, under the surface, human biases can impact the decision-making of judges. This raises an important question: what if there was a method of extracting the meaning of statutes consistently? That is, what if it were possible to use machines to encode legislation in a mathematically precise form that would permit clearer responses to legal questions? This article attempts to unpack the notion of machine-readability, providing an overview of both its historical and recent developments. The paper will reflect on logic syntax and symbolic language to assess the capacity and limits of representing legal knowledge. In doing so, the paper seeks to move beyond existing literature to discuss the implications of various approaches to machine-readable legislation. Importantly, this study hopes to highlight the challenges encountered in this burgeoning ecosystem of machine-readable legislation against existing human-readable counterparts.

【3】 QUEACO: Borrowing Treasures from Weakly-labeled Behavior Data for Query Attribute Value Extraction
Link: https://arxiv.org/abs/2108.08468

Authors: Danqing Zhang, Zheng Li, Tianyu Cao, Chen Luo, Tony Wu, Hanqing Lu, Yiwei Song, Bing Yin, Tuo Zhao, Qiang Yang
Affiliations: Georgia Institute of Technology, GA, USA; Hong Kong University of Science and Technology, HK, China
Abstract: We study the problem of query attribute value extraction, which aims to identify named entities from user queries as diverse surface-form attribute values and afterward transform them into formally canonical forms. This problem consists of two phases: named entity recognition (NER) and attribute value normalization (AVN). However, existing works only focus on the NER phase and neglect the equally important AVN. To bridge this gap, this paper proposes QUEACO, a unified query attribute value extraction system in e-commerce search that involves both phases. Moreover, by leveraging large-scale weakly-labeled behavior data, we further improve the extraction performance with less supervision cost. Specifically, for the NER phase, QUEACO adopts a novel teacher-student network, where a teacher network trained on the strongly-labeled data generates pseudo-labels to refine the weakly-labeled data for training a student network. Meanwhile, the teacher network can be dynamically adapted by the feedback of the student's performance on strongly-labeled data to maximally denoise the noisy supervision from the weak labels. For the AVN phase, we also leverage the weakly-labeled query-to-attribute behavior data to normalize surface-form attribute values from queries into canonical forms from products. Extensive experiments on a real-world large-scale e-commerce dataset demonstrate the effectiveness of QUEACO.
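
The NER-phase loop can be sketched structurally as below. This is our outline only: every method name is a placeholder for duck-typed model objects, since the abstract gives just the high-level design.

```python
def teacher_student_round(teacher, student, strong_data, weak_data):
    """One round of QUEACO-style pseudo-labeling; model objects are stand-ins."""
    pseudo = [(x, teacher.predict(x)) for x, _noisy_label in weak_data]
    student.fit(pseudo)                       # train student on refined weak data
    feedback = student.evaluate(strong_data)  # e.g. F1 on strongly-labeled data
    teacher.update(feedback)                  # dynamically adapt the teacher
    return feedback
```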

【4】 FeelsGoodMan: Inferring Semantics of Twitch Neologisms
Link: https://arxiv.org/abs/2108.08411

Authors: Pavel Dolin, Luc d'Hauthuille, Andrea Vattani
Affiliation: Spiketrap, San Francisco, CA, USA
Abstract: Twitch chats pose a unique problem in natural language understanding due to a large presence of neologisms, specifically emotes. There are a total of 8.06 million emotes, over 400k of which were used in the week studied. There is virtually no information on the meaning or sentiment of emotes, and with a constant influx of new emotes and drift in their frequencies, it becomes impossible to maintain an updated manually-labeled dataset. Our paper makes a twofold contribution. First, we establish a new baseline for sentiment analysis on Twitch data, outperforming the previous supervised benchmark by 7.9 percentage points. Second, we introduce a simple but powerful unsupervised framework based on word embeddings and k-NN to enrich existing models with out-of-vocabulary knowledge. This framework allows us to auto-generate a pseudo-dictionary of emotes, and we show that we can nearly match the supervised benchmark above even when injecting such emote knowledge into sentiment classifiers trained on extraneous datasets such as movie reviews or Twitter.
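
The k-NN enrichment can be pictured in a few lines (our sketch with toy vectors; the paper learns emote embeddings from Twitch chat itself and its lexicon construction differs):

```python
from collections import Counter
import numpy as np

def knn_sentiment(emote_vec, labelled, k=5):
    """labelled: [(vector, sentiment)]; vote among the k nearest neighbours."""
    nearest = sorted(labelled, key=lambda lv: np.linalg.norm(lv[0] - emote_vec))
    return Counter(label for _, label in nearest[:k]).most_common(1)[0][0]

rng = np.random.default_rng(0)
lexicon = ([(rng.normal(0.0, 1.0, 50), "pos") for _ in range(20)] +
           [(rng.normal(1.0, 1.0, 50), "neg") for _ in range(20)])
print(knn_sentiment(rng.normal(0.0, 1.0, 50), lexicon))  # pseudo-label an emote
```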
