cs.CL 方向,今日共计41篇
Transformer(2篇)
【1】 FH-SWF SG at GermEval 2021: Using Transformer-Based Language Models to Identify Toxic, Engaging, & Fact-Claiming Comments 标题:GermEval 2021上的FH-SWF SG:使用基于Transformer的语言模型识别有毒、吸引人和声称事实的评论 链接:https://arxiv.org/abs/2109.02966
作者:Christian Gawron,Sebastian Schmidt 机构:Fachhochschule Südwestfalen, Frauenstuhlweg, Iserlohn 摘要:在这篇文章中,我们描述了我们提交给GermEval 2021共享任务的方法,该任务涉及有毒、吸引人和声称事实的评论的识别。对于所有三个子任务,我们从Huggingface模型中心微调了免费提供的基于Transformer的模型。在使用不同超参数对80%的训练数据进行微调后,我们评估了各种预训练模型的性能,并提交了其中两个表现最佳模型的预测。我们发现这种方法最适用于子任务3,我们在该子任务上的F1分数为0.736。 摘要:In this paper we describe the methods we used for our submissions to the GermEval 2021 shared task on the identification of toxic, engaging, and fact-claiming comments. For all three subtasks we fine-tuned freely available transformer-based models from the Huggingface model hub. We evaluated the performance of various pre-trained models after fine-tuning on 80% of the training data with different hyperparameters and submitted predictions of the two best performing resulting models. We found that this approach worked best for subtask 3, for which we achieved an F1-score of 0.736.
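下面给出一个极简的示意片段(并非作者的实现),演示摘要中"从Huggingface模型中心加载预训练模型并在80%训练数据上微调做评论分类"的一般做法;其中的模型名bert-base-german-cased、样例数据与超参数均为编者假设的占位,仅供说明。
```python
# 示意性示例(非原论文代码):用Huggingface Transformers做序列分类微调,
# 对应摘要中"从模型中心加载模型并在80%训练数据上微调"的思路。
# 模型名、样例数据与超参数均为假设的占位符。
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model_name = "bert-base-german-cased"                      # 假设使用的德语预训练模型
texts = ["Das ist ein Kommentar.", "Du bist ein Idiot!",
         "Interessanter Artikel, danke.", "Halt den Mund!"] * 2   # 占位样例
labels = [0, 1, 0, 1] * 2                                  # 0=非有毒, 1=有毒

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

ds = Dataset.from_dict({"text": texts, "label": labels}).map(tokenize, batched=True)
split = ds.train_test_split(test_size=0.2, seed=42)        # 对应摘要中的80%/20%划分

args = TrainingArguments(output_dir="out", num_train_epochs=3,
                         per_device_train_batch_size=8, learning_rate=2e-5)
trainer = Trainer(model=model, args=args,
                  train_dataset=split["train"], eval_dataset=split["test"])
trainer.train()
```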
【2】 Mixed Attention Transformer for Leveraging Word-Level Knowledge to Neural Cross-Lingual Information Retrieval 标题:利用词级知识实现神经跨语言信息检索的混合注意力Transformer 链接:https://arxiv.org/abs/2109.02789
作者:Zhiqi Huang,Hamed Bonab,Sheikh Muhammad Sarwar,Razieh Rahimi,James Allan 机构:Center for Intelligent Information Retrieval, University of Massachusetts Amherst 摘要:预先训练的情境化表示为许多下游任务提供了巨大的成功,包括文档排序。这种预训练表示法的多语言版本提供了一种使用同一模型联合学习多种语言的可能性。虽然通过这种联合训练有望获得巨大的收益,但在跨语言信息检索(CLIR)的情况下,多语言环境下的模型并没有达到与单语环境下的模型相同的性能水平。我们假设性能下降是由于查询和文档之间的翻译差异造成的。在单语检索任务中,由于相同的词法输入,模型更容易识别文档中出现的查询词。然而,在将不同语言中的单词投影到同一超空间的多语言预训练模型中,该模型倾向于将查询词转换为相关词,即除了目标语言中的同义词之外,或有时不是同义词,出现在类似上下文中的词。此属性使模型难以连接在查询和文档中同时出现的术语。为了解决这个问题,我们提出了一种新的混合注意转换器(MAT),它结合了外部单词级知识,如字典或翻译表。我们设计了一种三明治式结构,将MAT嵌入到最近基于Transformer的深度神经模型中。通过将翻译知识编码成注意矩阵,MAT模型能够关注输入序列中相互翻译的单词。实验结果表明,外部知识的有效性和MAT嵌入式神经重分类模型对CLIR任务的显著改进。 摘要:Pretrained contextualized representations offer great success for many downstream tasks, including document ranking. The multilingual versions of such pretrained representations provide a possibility of jointly learning many languages with the same model. Although it is expected to gain big with such joint training, in the case of cross lingual information retrieval (CLIR), the models under a multilingual setting are not achieving the same level of performance as those under a monolingual setting. We hypothesize that the performance drop is due to the translation gap between query and documents. In the monolingual retrieval task, because of the same lexical inputs, it is easier for model to identify the query terms that occurred in documents. However, in the multilingual pretrained models that the words in different languages are projected into the same hyperspace, the model tends to translate query terms into related terms, i.e., terms that appear in a similar context, in addition to or sometimes rather than synonyms in the target language. This property is creating difficulties for the model to connect terms that cooccur in both query and document. To address this issue, we propose a novel Mixed Attention Transformer (MAT) that incorporates external word level knowledge, such as a dictionary or translation table. We design a sandwich like architecture to embed MAT into the recent transformer based deep neural models. By encoding the translation knowledge into an attention matrix, the model with MAT is able to focus on the mutually translated words in the input sequence. Experimental results demonstrate the effectiveness of the external knowledge and the significant improvement of MAT embedded neural reranking model on CLIR task.
BERT(5篇)
【1】 Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression 标题:超越保留的准确性:评估BERT压缩的忠诚度和稳健性 链接:https://arxiv.org/abs/2109.03228
作者:Canwen Xu,Wangchunshu Zhou,Tao Ge,Ke Xu,Julian McAuley,Furu Wei 机构: University of California, San Diego , Stanford University , Microsoft Research Asia , Beihang University 备注:Accepted to EMNLP 2021 (main conference) 摘要:最近关于预训练语言模型(例如,BERT)压缩的研究通常使用保留精度作为评估指标。在本文中,我们提出了两个新的度量标准:标签忠诚度和概率忠诚度,用于衡量压缩模型(即学生)对原始模型(即教师)的模仿程度。我们还探讨了压缩在对抗性攻击下对鲁棒性的影响。我们以忠诚度和鲁棒性为指标,对量化、剪枝、知识蒸馏和渐进式模块替换进行了基准评测。通过结合多种压缩技术,我们提供了一种实用的策略,以实现更好的准确性、忠诚度和鲁棒性。 摘要:Recent studies on compression of pretrained language models (e.g., BERT) usually use preserved accuracy as the metric for evaluation. In this paper, we propose two new metrics, label loyalty and probability loyalty that measure how closely a compressed model (i.e., student) mimics the original model (i.e., teacher). We also explore the effect of compression with regard to robustness under adversarial attacks. We benchmark quantization, pruning, knowledge distillation and progressive module replacing with loyalty and robustness. By combining multiple compression techniques, we provide a practical strategy to achieve better accuracy, loyalty and robustness.
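摘要只给出了两个忠诚度指标的名称,具体定义见论文原文。下面是一种合理的示意性实现(属于编者假设,并非论文的原始公式):标签忠诚度取学生与教师预测标签的一致率,概率忠诚度用输出分布之间的Jensen-Shannon散度换算成相似度。
```python
# 示意性实现(非论文原始定义):
# 标签忠诚度 = 学生与教师预测标签的一致率;
# 概率忠诚度 = 基于输出分布距离(此处假设用Jensen-Shannon散度)的相似度。
import torch
import torch.nn.functional as F

def label_loyalty(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> float:
    """学生与教师在同一批样本上预测标签一致的比例。"""
    return (student_logits.argmax(-1) == teacher_logits.argmax(-1)).float().mean().item()

def probability_loyalty(student_logits, teacher_logits, eps=1e-12) -> float:
    """1 - 归一化JS散度,越大表示学生越贴近教师的输出分布(假设性定义)。"""
    p = F.softmax(teacher_logits, dim=-1).clamp_min(eps)
    q = F.softmax(student_logits, dim=-1).clamp_min(eps)
    m = 0.5 * (p + q)
    js = 0.5 * (p * (p / m).log()).sum(-1) + 0.5 * (q * (q / m).log()).sum(-1)
    return (1.0 - js / torch.log(torch.tensor(2.0))).mean().item()

# 用法示例:比较量化/剪枝后的学生模型与原始教师模型的输出
teacher_logits = torch.randn(32, 3)
student_logits = teacher_logits + 0.1 * torch.randn(32, 3)
print(label_loyalty(student_logits, teacher_logits),
      probability_loyalty(student_logits, teacher_logits))
```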
【2】 ExCode-Mixed: Explainable Approaches towards Sentiment Analysis on Code-Mixed Data using BERT models 标题:ExCode-Mixed:使用BERT模型对代码混合数据进行情感分析的可解释方法 链接:https://arxiv.org/abs/2109.03200
作者:Aman Priyanshu,Aleti Vardhan,Sudarshan Sivakumar,Supriti Vijay,Nipuna Chhabra 机构:Manipal Institute of Technology 备注:3 pages, 1 figure 摘要:在印度等国家,社交媒体网站的使用越来越多,导致了大量代码混合数据。对这些数据的情绪分析可以为人们的观点和观点提供完整的见解。开发强大的解释性技术,解释模型为什么做出预测变得至关重要。在本文中,我们提出了一种适当的方法,将可解释的方法集成到代码混合情感分析中。 摘要:The increasing use of social media sites in countries like India has given rise to large volumes of code-mixed data. Sentiment analysis of this data can provide integral insights into people's perspectives and opinions. Developing robust explainability techniques which explain why models make their predictions becomes essential. In this paper, we propose an adequate methodology to integrate explainable approaches into code-mixed sentiment analysis.
【3】 Naturalness Evaluation of Natural Language Generation in Task-oriented Dialogues using BERT 标题:基于BERT的任务型对话中自然语言生成的自然度评估 链接:https://arxiv.org/abs/2109.02938
作者:Ye Liu,Wolfgang Maier,Wolfgang Minker,Stefan Ultes 机构:Mercedes-Benz AG, Sindelfingen, Germany, Ulm University, Ulm, Germany 备注:accepted to RANLP 2021 摘要:本文提出了一种自动评估对话系统自然语言生成自然性的方法。虽然这项任务以前是通过昂贵和耗时的人力完成的,但我们提出了自动评估生成语言自然度的新任务。通过微调BERT模型,我们提出的自然度评估方法显示了稳健的结果,并优于基线:支持向量机、双向LSTM和BLEURT。此外,通过从质量和信息性语言知识迁移学习,提高了自然度模型的训练速度和评估性能。 摘要:This paper presents an automatic method to evaluate the naturalness of natural language generation in dialogue systems. While this task was previously rendered through expensive and time-consuming human labor, we present this novel task of automatic naturalness evaluation of generated language. By fine-tuning the BERT model, our proposed naturalness evaluation method shows robust results and outperforms the baselines: support vector machines, bi-directional LSTMs, and BLEURT. In addition, the training speed and evaluation performance of naturalness model are improved by transfer learning from quality and informativeness linguistic knowledge.
【4】 Does BERT Learn as Humans Perceive? Understanding Linguistic Styles through Lexica 标题:BERT是否像人类感知的那样学习?通过词汇理解语言风格 链接:https://arxiv.org/abs/2109.02738
作者:Shirley Anugrah Hayati,Dongyeop Kang,Lyle Ungar 机构:University of Pennsylvania, Georgia Institute of Technology, University of Minnesota 备注:Accepted at EMNLP 2021 Main Conference 摘要:人们通过他们所写文本的语言风格来表达他们的意图和态度。在这项研究中,我们从两个角度研究了不同风格的词汇用法:人类感知和机器词重要性,因为词汇所提供的风格线索的强度不同。为了收集人类感知的标签,我们在现有风格基准数据集的基础上构建了一个新的数据集Hummingbird(蜂鸟)。我们让众包标注者标出文本中让他们认为该文本具有以下风格的代表性词语:礼貌、情感、攻击性和五种情感类型。然后,我们将这些人类单词标签与来自流行的微调风格分类器(如BERT)的单词重要性进行比较。我们的结果表明,BERT经常把与目标风格无关的内容词当作风格预测中重要的词,而人类的感知并非如此;不过对某些风格(例如积极情绪和快乐)而言,人类与机器识别出的词仍有显著重叠。 摘要:People convey their intention and attitude through linguistic styles of the text that they write. In this study, we investigate lexicon usages across styles throughout two lenses: human perception and machine word importance, since words differ in the strength of the stylistic cues that they provide. To collect labels of human perception, we curate a new dataset, Hummingbird, on top of benchmarking style datasets. We have crowd workers highlight the representative words in the text that makes them think the text has the following styles: politeness, sentiment, offensiveness, and five emotion types. We then compare these human word labels with word importance derived from a popular fine-tuned style classifier like BERT. Our results show that the BERT often finds content words not relevant to the target style as important words used in style prediction, but humans do not perceive the same way even though for some styles (e.g., positive sentiment and joy) human- and machine-identified words share significant overlap for some styles.
【5】 SS-BERT: Mitigating Identity Terms Bias in Toxic Comment Classification by Utilising the Notion of "Subjectivity" and "Identity Terms" 标题:SS-BERT:利用"主观性"和"身份词"概念缓解有毒评论分类中的身份词偏差 链接:https://arxiv.org/abs/2109.02691
作者:Zhixue Zhao,Ziqi Zhang,Frank Hopfgartner 机构:University of Sheffield, United Kingdom 备注:12 pages, 6 tables, 3 figures 摘要:有毒评论分类模型往往偏向于身份术语,身份术语是表征特定人群(如"穆斯林"和"黑人")的术语。这种偏见通常反映在假阳性预测中,即带有身份术语的无毒评论。在这项工作中,我们提出了一种新的方法来解决有毒评论分类中的这种偏见,利用评论的主观性水平和身份术语的存在。我们假设,当对一群具有身份术语特征的人发表评论时,该评论有毒的可能性与评论的主观性水平有关,即评论传达个人情感和观点的程度。在BERT模型的基础上,我们提出了一种能够利用这些特性的新结构,并在4个不同大小、代表不同社交媒体平台的数据集上全面评估了我们的模型。结果表明,我们的模型始终优于BERT以及一个以不同方式处理身份词偏差的SOTA模型,F1的最大提升分别为2.43%和1.91%。 摘要:Toxic comment classification models are often found biased toward identity terms which are terms characterizing a specific group of people such as "Muslim" and "black". Such bias is commonly reflected in false-positive predictions, i.e. non-toxic comments with identity terms. In this work, we propose a novel approach to tackle such bias in toxic comment classification, leveraging the notion of subjectivity level of a comment and the presence of identity terms. We hypothesize that when a comment is made about a group of people that is characterized by an identity term, the likelihood of that comment being toxic is associated with the subjectivity level of the comment, i.e. the extent to which the comment conveys personal feelings and opinions. Building upon the BERT model, we propose a new structure that is able to leverage these features, and thoroughly evaluate our model on 4 datasets of varying sizes and representing different social media platforms. The results show that our model can consistently outperform BERT and a SOTA model devised to address identity term bias in a different way, with a maximum improvement in F1 of 2.43% and 1.91% respectively.
QA|VQA|问答|对话(1篇)
【1】 Exploiting Reasoning Chains for Multi-hop Science Question Answering 标题:利用推理链进行多跳科学答疑 链接:https://arxiv.org/abs/2109.02905
作者:Weiwen Xu,Yang Deng,Huihui Zhang,Deng Cai,Wai Lam 机构:The Chinese University of Hong Kong 备注:14 pages, Findings of EMNLP 2021 摘要:我们提出了一种新的链引导检索器-读取器(CGR)框架来为多跳科学问答的推理链建模。我们的框架能够执行可解释的推理,而不需要任何特定于语料库的注释,如真实推理链或人工标注的实体提及。具体来说,我们首先从检索到的证据事实的抽象意义表示所构建的语义图生成推理链。还设计了一个链感知损失(Chain-aware loss),涉及局部和全局链信息,使生成的链能够作为远程监督信号用于训练检索器,其中还采用强化学习来最大化推理链的效用。我们的框架允许检索器捕获整个推理过程的逐步线索,这不仅在两个具有挑战性的多跳科学QA任务(即OpenBookQA和ARC-Challenge)中被证明是有效的,而且有利于解释性。 摘要:We propose a novel Chain Guided Retriever-reader (CGR) framework to model the reasoning chain for multi-hop Science Question Answering. Our framework is capable of performing explainable reasoning without the need of any corpus-specific annotations, such as the ground-truth reasoning chain, or human-annotated entity mentions. Specifically, we first generate reasoning chains from a semantic graph constructed by Abstract Meaning Representation of retrieved evidence facts. A Chain-aware loss, concerning both local and global chain information, is also designed to enable the generated chains to serve as distant supervision signals for training the retriever, where reinforcement learning is also adopted to maximize the utility of the reasoning chains. Our framework allows the retriever to capture step-by-step clues of the entire reasoning process, which is not only shown to be effective on two challenging multi-hop Science QA tasks, namely OpenBookQA and ARC-Challenge, but also favors explainability.
机器翻译(3篇)
【1】 Revisiting Context Choices for Context-aware Machine Translation 标题:语境感知机器翻译中的语境选择重访 链接:https://arxiv.org/abs/2109.02995
作者:Matīss Rikters,Toshiaki Nakazawa 机构:The University of Tokyo, Hongo, Bunkyo-ku, Tokyo, Japan 摘要:上下文感知机器翻译(MT)最流行的方法之一是对源句子和上下文使用单独的编码器,作为一个目标句子的多个源。最近的研究怀疑这些模型是否真的从上下文中学习到了有用的信号,还是自动评估指标的改进只是一个副作用。我们表明,即使仅提供空行作为上下文,多源Transformer模型也优于标准的Transformer-base模型;而当提供足够数量的正确上下文时,翻译质量会显著提高(1.51-2.65 BLEU)。我们还表明,即使随机打乱的域内上下文也能优于基线,但正确的上下文能进一步提升翻译质量,而随机的域外上下文则会进一步降低翻译质量。 摘要:One of the most popular methods for context-aware machine translation (MT) is to use separate encoders for the source sentence and context as multiple sources for one target sentence. Recent work has cast doubt on whether these models actually learn useful signals from the context or are improvements in automatic evaluation metrics just a side-effect. We show that multi-source transformer models improve MT over standard transformer-base models even with empty lines provided as context, but the translation quality improves significantly (1.51 - 2.65 BLEU) when a sufficient amount of correct context is provided. We also show that even though randomly shuffling in-domain context can also improve over baselines, the correct context further improves translation quality and random out-of-domain context further degrades it.
【2】 Don't Go Far Off: An Empirical Study on Neural Poetry Translation 标题:不要走得太远:神经诗歌翻译的实证研究 链接:https://arxiv.org/abs/2109.02972
作者:Tuhin Chakrabarty,Arkadiy Saakyan,Smaranda Muresan 机构:Department of Computer Science, Columbia University, Data Science Institute, Columbia University 备注:EMNLP 2021 Camera ready 摘要:尽管机器翻译质量不断提高,但由于缺乏开源的平行诗歌语料库,以及保留诗歌的语义、风格和比喻性质所涉及的内在复杂性,诗歌自动翻译仍然是一个具有挑战性的问题。我们从几个维度对诗歌翻译进行了实证研究:1)训练数据的大小和风格(诗歌与非诗歌),包括Zero-Shot设置;2)双语与多语学习;3)语言家族特定模型与混合多语言模型。为了实现这一点,我们为几种语言对提供了一个平行的诗歌翻译数据集。我们的研究结果表明,在自动度量(BLEU、BERTScore)和人类评估度量(如忠实性(含义和诗意风格))方面,诗意文本的多语言微调显著优于非诗意文本的多语言微调,非诗意文本的大小是前者的35倍。此外,诗意数据的多语言微调优于诗意数据的双语微调。 摘要:Despite constant improvements in machine translation quality, automatic poetry translation remains a challenging problem due to the lack of open-sourced parallel poetic corpora, and to the intrinsic complexities involved in preserving the semantics, style, and figurative nature of poetry. We present an empirical investigation for poetry translation along several dimensions: 1) size and style of training data (poetic vs. non-poetic), including a zero-shot setup; 2) bilingual vs. multilingual learning; and 3) language-family-specific models vs. mixed-multilingual models. To accomplish this, we contribute a parallel dataset of poetry translations for several language pairs. Our results show that multilingual fine-tuning on poetic text significantly outperforms multilingual fine-tuning on non-poetic text that is 35X larger in size, both in terms of automatic metrics (BLEU, BERTScore) and human evaluation metrics such as faithfulness (meaning and poetic style). Moreover, multilingual fine-tuning on poetic data outperforms bilingual fine-tuning on poetic data.
【3】 Paraphrase Generation as Unsupervised Machine Translation 标题:将释义生成视为无监督机器翻译 链接:https://arxiv.org/abs/2109.02950
作者:Chun Fan,Yufei Tian,Yuxian Meng,Nanyun Peng,Xiaofei Sun,Fei Wu,Jiwei Li 机构:♦Zhejiang University, ♠Computer Center of Peking University, University of California, Los Angeles, ♣Shannon.AI 摘要:在本文中,我们提出了一种新的释义生成范式,该范式将任务视为无监督机器翻译(UMT),基于这样一个假设:在一个大规模的未标记单语语料库中,必须有对句子表达相同的意思。该方法首先将一个大型的未标记语料库分割成多个聚类,并利用这些聚类对训练多个UMT模型。然后,基于这些UMT模型产生的释义对,可以训练一个统一的代理模型作为最终的Seq2Seq模型来生成释义,该释义可以直接用于非监督设置中的测试,或者在监督设置中的标记数据集上进行微调。与基于机器翻译的释义生成方法相比,该方法避免了对双语句子对的依赖。它还允许人工干预模型,以便使用不同的过滤标准生成更多不同的释义。在现有的有监督和无监督的释义数据集上进行的大量实验证明了所提出范式的有效性。 摘要:In this paper, we propose a new paradigm for paraphrase generation by treating the task as unsupervised machine translation (UMT) based on the assumption that there must be pairs of sentences expressing the same meaning in a large-scale unlabeled monolingual corpus. The proposed paradigm first splits a large unlabeled corpus into multiple clusters, and trains multiple UMT models using pairs of these clusters. Then based on the paraphrase pairs produced by these UMT models, a unified surrogate model can be trained to serve as the final Seq2Seq model to generate paraphrases, which can be directly used for test in the unsupervised setup, or be finetuned on labeled datasets in the supervised setup. The proposed method offers merits over machine-translation-based paraphrase generation methods, as it avoids reliance on bilingual sentence pairs. It also allows human intervene with the model so that more diverse paraphrases can be generated using different filtering criteria. Extensive experiments on existing paraphrase dataset for both the supervised and unsupervised setups demonstrate the effectiveness the proposed paradigm.
Graph|知识图谱|Knowledge(3篇)
【1】 Learning grounded word meaning representations on similarity graphs 标题:基于相似图的基础词义表征学习 链接:https://arxiv.org/abs/2109.03084
作者:Mariella Dimiccoli,Herwig Wendt,Pau Batlle 机构:Institut de Robòtica i Informàtica Industrial, (CSIC-UPC), Barcelona, Spain, CNRS, IRIT, Univ. of Toulouse, France, California Institute of Technology, Pasadena, California 备注:Accepted to EMNLP 2021 (long paper) 摘要:本文介绍了一种新的方法来学习视觉扎根的意义表示的话作为低维节点嵌入在一个潜在的图形层次。层次结构的较低层次通过专用但可通信的图形对特定于模态的单词表示进行建模,而较高层次将这些表示放在一个图形上,以便从两种模态共同学习表示。每个图的拓扑结构建立了词与词之间的相似关系模型,并与图嵌入联合进行估计。该模型的基本假设是,在低维空间中,具有相似意义的单词对应于底层相似图中的社区。我们将该模型命名为分层多模态相似图嵌入(HM-SGE)。实验结果验证了HM-SGE模拟人类相似性判断和概念分类的能力,优于现有技术。 摘要:This paper introduces a novel approach to learn visually grounded meaning representations of words as low-dimensional node embeddings on an underlying graph hierarchy. The lower level of the hierarchy models modality-specific word representations through dedicated but communicating graphs, while the higher level puts these representations together on a single graph to learn a representation jointly from both modalities. The topology of each graph models similarity relations among words, and is estimated jointly with the graph embedding. The assumption underlying this model is that words sharing similar meaning correspond to communities in an underlying similarity graph in a low-dimensional space. We named this model Hierarchical Multi-Modal Similarity Graph Embedding (HM-SGE). Experimental results validate the ability of HM-SGE to simulate human similarity judgements and concept categorization, outperforming the state of the art.
【2】 Empathetic Dialogue Generation with Pre-trained RoBERTa-GPT2 and External Knowledge 标题:利用预先训练的Roberta-GPT2和外部知识生成移情对话 链接:https://arxiv.org/abs/2109.03004
作者:Ye Liu,Wolfgang Maier,Wolfgang Minker,Stefan Ultes 机构:Ulm University 备注:accepted at International Workshop on Spoken Dialog System Technology (IWSDS) 2021 摘要:对话代理面临的一个挑战是识别对话伙伴的感受并做出相应的反应。在这项工作中,RoBERTa-GPT2被提议用于移情对话生成,其中预训练的自动编码RoBERTa被用作编码器,预训练的自动回归GPT-2被用作解码器。通过将预先训练好的RoBERTa和GPT-2相结合,我们的模型实现了最先进的情感准确性。为了实现RoBERTa-GPT2模型的移情能力,我们提出了一种常识性知识和情感概念抽取器,该抽取器为GPT-2解码器抽取对话上下文中的常识性和情感性概念。实验结果表明,移情对话的生成得益于预先训练的编译码器结构和外部知识。 摘要:One challenge for dialogue agents is to recognize feelings of the conversation partner and respond accordingly. In this work, RoBERTa-GPT2 is proposed for empathetic dialogue generation, where the pre-trained auto-encoding RoBERTa is utilised as encoder and the pre-trained auto-regressive GPT-2 as decoder. With the combination of the pre-trained RoBERTa and GPT-2, our model realizes a new state-of-the-art emotion accuracy. To enable the empathetic ability of RoBERTa-GPT2 model, we propose a commonsense knowledge and emotional concepts extractor, in which the commonsensible and emotional concepts of dialogue context are extracted for the GPT-2 decoder. The experiment results demonstrate that the empathetic dialogue generation benefits from both pre-trained encoder-decoder architecture and external knowledge.
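下面是一个示意性片段(并非作者代码),展示如何用Huggingface的EncoderDecoderModel把预训练RoBERTa编码器与预训练GPT-2解码器拼成seq2seq结构,这对应摘要中的基础架构;论文中的常识与情感概念抽取器此处未涉及,示例对话也是编者的占位内容。
```python
# 示意性示例(非作者实现):用EncoderDecoderModel把预训练RoBERTa编码器
# 与预训练GPT-2解码器拼接成seq2seq结构;概念抽取器未复现。
from transformers import AutoTokenizer, EncoderDecoderModel

enc_tok = AutoTokenizer.from_pretrained("roberta-base")
dec_tok = AutoTokenizer.from_pretrained("gpt2")
dec_tok.pad_token = dec_tok.eos_token            # GPT-2默认没有pad token

model = EncoderDecoderModel.from_encoder_decoder_pretrained("roberta-base", "gpt2")
model.config.decoder_start_token_id = dec_tok.bos_token_id
model.config.pad_token_id = dec_tok.pad_token_id

context = "I failed my exam today."              # 对话上下文(占位)
response = "I'm so sorry to hear that. You must feel terrible."   # 移情回复(占位)

enc_inputs = enc_tok(context, return_tensors="pt")
labels = dec_tok(response, return_tensors="pt").input_ids

out = model(input_ids=enc_inputs.input_ids,
            attention_mask=enc_inputs.attention_mask,
            labels=labels)                        # 训练时以回复为目标计算交叉熵损失
print(out.loss)
```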
【3】 Puzzle Solving without Search or Human Knowledge: An Unnatural Language Approach 标题:在没有搜索或人类知识的情况下解谜:一种非自然语言的方法 链接:https://arxiv.org/abs/2109.02797
作者:David Noever,Ryerson Burdick 机构:PeopleTec, Inc., University of Maryland, College Park, Huntsville, AL, College Park, MD 摘要:生成式预训练转换器(GPT-2)在学习文本存档游戏符号方面的应用为探索稀疏奖励游戏提供了一个模型环境。事实证明,transformer架构适合于对描述迷宫、魔方和数独解算器的已解决文本档案进行训练。该方法得益于微调transformer架构,以可视化从人类启发式或领域专业知识的任何指导之外衍生的合理策略。游戏的大搜索空间($>10^{19}$)提供了一个益智环境,在这个环境中,解决方案几乎没有中间奖励,只有解决挑战的最后一步。 摘要:The application of Generative Pre-trained Transformer (GPT-2) to learn text-archived game notation provides a model environment for exploring sparse reward gameplay. The transformer architecture proves amenable to training on solved text archives describing mazes, Rubik's Cube, and Sudoku solvers. The method benefits from fine-tuning the transformer architecture to visualize plausible strategies derived outside any guidance from human heuristics or domain expertise. The large search space ($>10^{19}$) for the games provides a puzzle environment in which the solution has few intermediate rewards and a final move that solves the challenge.
摘要|信息提取(2篇)
【1】 Aspect-Controllable Opinion Summarization 标题:方面可控的观点摘要 链接:https://arxiv.org/abs/2109.03171
作者:Reinald Kim Amplayo,Stefanos Angelidis,Mirella Lapata 机构:Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, Crichton Street, EH,AB 备注:EMNLP 2021 摘要:最近关于意见总结的工作根据一组输入评论和其中表达的意见的流行程度生成了一般性总结。在本文中,我们提出了一种基于方面查询(例如,描述酒店的位置和房间)生成定制摘要的方法。使用回顾语料库,我们创建了一个由(回顾,总结)对组成的综合训练数据集,其中包含方面控制器,这些控制器由一个多实例学习模型生成,该模型预测文档在不同粒度级别的方面。我们使用合成数据集微调预训练模型,并通过修改方面控制器生成特定于方面的摘要。在两个基准上的实验表明,我们的模型优于以前的最新技术,并通过控制其中讨论的方面的数量生成个性化摘要。 摘要:Recent work on opinion summarization produces general summaries based on a set of input reviews and the popularity of opinions expressed in them. In this paper, we propose an approach that allows the generation of customized summaries based on aspect queries (e.g., describing the location and room of a hotel). Using a review corpus, we create a synthetic training dataset of (review, summary) pairs enriched with aspect controllers which are induced by a multi-instance learning model that predicts the aspects of a document at different levels of granularity. We fine-tune a pretrained model using our synthetic dataset and generate aspect-specific summaries by modifying the aspect controllers. Experiments on two benchmarks show that our model outperforms the previous state of the art and generates personalized summaries by controlling the number of aspects discussed in them.
【2】 Text-to-Table: A New Way of Information Extraction 标题:文本到表格:一种新的信息抽取方式 链接:https://arxiv.org/abs/2109.02707
作者:Xueqing Wu,Jiacheng Zhang,Hang Li 机构: University of Illinois Urbana-Champaign, ByteDance AI Lab 摘要:我们研究了一个新的信息抽取问题集,称为文本到表格,它可以被看作是一个研究良好的表格到文本的反问题。在“文本到表”中,给定一个文本,可以创建一个表或多个表来表示文本的主要内容,同时从文本-表对数据学习模型。问题设置不同于IE的现有方法。首先,可以从长文本到结构复杂的大型表格进行提取。其次,提取完全是数据驱动的,不需要显式定义模式。据我们所知,以前没有研究这个问题的工作。在这项工作中,我们将文本到表形式化为序列到序列(seq2seq)问题。我们首先使用从预先训练的语言模型微调的seq2seq模型来执行任务。我们还在seq2seq方法中开发了一种新方法,利用表生成中的两种附加技术:表约束和表关系嵌入。我们在文本到表的实验中使用了四个现有的表到文本数据集。实验结果表明,vanilla seq2seq模型的性能优于使用关系提取和命名实体提取的基线方法。结果还表明,我们的方法可以进一步提高vanilla seq2seq模型的性能。我们进一步讨论拟议任务的主要挑战。代码和数据将公开。 摘要:We study a new problem setting of information extraction (IE), referred to as text-to-table, which can be viewed as an inverse problem of the well-studied table-to-text. In text-to-table, given a text, one creates a table or several tables expressing the main content of the text, while the model is learned from text-table pair data. The problem setting differs from those of the existing methods for IE. First, the extraction can be carried out from long texts to large tables with complex structures. Second, the extraction is entirely data-driven, and there is no need to explicitly define the schemas. As far as we know, there has been no previous work that studies the problem. In this work, we formalize text-to-table as a sequence-to-sequence (seq2seq) problem. We first employ a seq2seq model fine-tuned from a pre-trained language model to perform the task. We also develop a new method within the seq2seq approach, exploiting two additional techniques in table generation: table constraint and table relation embeddings. We make use of four existing table-to-text datasets in our experiments on text-to-table. Experimental results show that the vanilla seq2seq model can outperform the baseline methods of using relation extraction and named entity extraction. The results also show that our method can further boost the performances of the vanilla seq2seq model. We further discuss the main challenges of the proposed task. The code and data will be made publicly available.
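将"文本到表格"形式化为seq2seq任务时,需要把目标表格线性化为token序列。下面的片段是编者的一个简单示意(分隔符与线性化方式均为假设,未必与论文一致),展示表格与序列之间的互相转换。
```python
# 示意性示例(非论文原始方案):把目标表格线性化为带分隔符的序列,
# 以便按seq2seq方式训练"文本->表格"生成;分隔符的选择只是一种假设。
def table_to_sequence(header, rows, col_sep=" | ", row_sep=" <NEWLINE> "):
    """把表格(表头 + 若干行)展平成一个字符串序列。"""
    lines = [col_sep.join(header)] + [col_sep.join(map(str, r)) for r in rows]
    return row_sep.join(lines)

def sequence_to_table(seq, col_sep=" | ", row_sep=" <NEWLINE> "):
    """把模型生成的序列解析回(表头, 行)结构,用于评估。"""
    lines = [l.split(col_sep) for l in seq.split(row_sep)]
    return lines[0], lines[1:]

header = ["Player", "Points", "Rebounds"]
rows = [["LeBron James", 25, 7], ["Kevin Love", 18, 12]]
seq = table_to_sequence(header, rows)
print(seq)
print(sequence_to_table(seq))
```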
推理|分析|理解|解释(1篇)
【1】 Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models 标题:基于预训练模型的泛化常识推理策略探索 链接:https://arxiv.org/abs/2109.02837
作者:Kaixin Ma,Filip Ilievski,Jonathan Francis,Satoru Ozaki,Eric Nyberg,Alessandro Oltramari 机构:†Language Technologies Institute, Carnegie Mellon University, §Information Sciences Institute, University of Southern California, ¶Human-Machine Collaboration, Bosch Research Pittsburgh 备注:EMNLP 2021 摘要:常识推理基准在很大程度上是通过微调语言模型来解决的。缺点是,微调可能会导致模型过度拟合特定于任务的数据,从而忘记在预训练期间获得的知识。最近的工作只提出了轻量级模型更新,因为模型可能已经从过去的经验中获得了有用的知识,但仍然存在一个挑战,即理解哪些部分以及在多大程度上应该为给定的任务优化模型。在本文中,我们研究了从常识推理数据集中学习的模型。我们测量了三种不同的适应方法对模型的泛化和准确性的影响。我们在两个模型上的实验表明,通过学习任务的内容和结构,微调的效果最好,但对新答案的过度拟合和有限泛化会带来问题。我们观察到,诸如前缀调整之类的替代自适应方法具有相当的准确性,但能更好地推广到看不见的答案,并且对敌对分裂更具鲁棒性。 摘要:Commonsense reasoning benchmarks have been largely solved by fine-tuning language models. The downside is that fine-tuning may cause models to overfit to task-specific data and thereby forget their knowledge gained during pre-training. Recent works only propose lightweight model updates as models may already possess useful knowledge from past experience, but a challenge remains in understanding what parts and to what extent models should be refined for a given task. In this paper, we investigate what models learn from commonsense reasoning datasets. We measure the impact of three different adaptation methods on the generalization and accuracy of models. Our experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers. We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
GAN|对抗|攻击|生成相关(1篇)
【1】 IndicBART: A Pre-trained Model for Natural Language Generation of Indic Languages 标题:IndicBART:面向印度语言自然语言生成的预训练模型 链接:https://arxiv.org/abs/2109.02903
作者:Raj Dabre,Himani Shrotriya,Anoop Kunchukuttan,Ratish Puduppully,Mitesh M. Khapra,Pratyush Kumar 机构:National Institute of Information and Communications Technology, IIT Madras, Microsoft, University Of Edinburgh 备注:Preliminary work on Natural Language Generation for Indic languages. We work on pre-training Indic specific sequence to sequence models and evaluate them for Machine Translation and Summarization 摘要:在本文中,我们介绍了IndicBART,一个多语言、序列到序列的预训练模型,重点关注11种印度语言和英语。与现有的预训练模型不同,IndicBART利用印度语言文字(Indic scripts)之间的正字法相似性来改进相似印度语言之间的迁移学习。我们在两个NLG任务上评估了IndicBART:神经机器翻译(NMT)和极端摘要。我们使用多语言微调对12个语言对的NMT和7种语言的极端摘要进行的实验表明,尽管参数明显更少,IndicBART与mBART50相比具有竞争力甚至更优。我们的分析侧重于确定文字统一(到天城文Devanagari)、语料库大小以及多语言性对最终性能的影响。IndicBART模型以MIT许可证发布,可在 https://indicnlp.ai4bharat.org/indic-bart 获取。 摘要:In this paper we present IndicBART, a multilingual, sequence-to-sequence pre-trained model focusing on 11 Indic languages and English. Different from existing pre-trained models, IndicBART utilizes the orthographic similarity between Indic scripts to improve transfer learning between similar Indic languages. We evaluate IndicBART on two NLG tasks: Neural Machine Translation (NMT) and extreme summarization. Our experiments on NMT for 12 language pairs and extreme summarization for 7 languages using multilingual fine-tuning show that IndicBART is competitive with or better than mBART50 despite containing significantly fewer parameters. Our analyses focus on identifying the impact of script unification (to Devanagari), corpora size as well as multilingualism on the final performance. The IndicBART model is available under the MIT license at https://indicnlp.ai4bharat.org/indic-bart .
半/弱/无监督|不确定性(1篇)
【1】 Unsupervised Conversation Disentanglement through Co-Training 标题:通过协同训练实现无监督的会话解缠 链接:https://arxiv.org/abs/2109.03199
作者:Hui Liu,Zhan Shi,Xiaodan Zhu 机构:Ingenuity Labs Research Institute & ECE, Queen’s University, Canada 备注:Accepted to EMNLP 2021 main conference 摘要:会话分离的目的是将混杂的消息分离成分离的会话,这是理解多方会话的一项基本任务。现有的会话解纠缠工作在很大程度上依赖于人工标注的数据集,在实践中获取这些数据集非常昂贵。在这项工作中,我们探索训练一个对话解开模型,而不参考任何人类注释。我们的方法建立在深度协同训练算法的基础上,该算法由两个神经网络组成:消息对分类器和会话分类器。前者负责检索两条消息之间的本地关系,而后者通过捕获上下文感知信息将消息分类到会话。这两个网络分别使用从未注语料库构建的伪数据进行初始化。在深度协同训练过程中,我们使用会话分类器作为强化学习组件,通过最大化消息对分类器给出的局部奖励来学习会话分配策略。对于消息对分类器,我们通过从会话分类器预测的解纠缠会话中检索具有高置信度的消息对来丰富其训练数据。在大型电影对话数据集上的实验结果表明,与以往的有监督方法相比,本文提出的方法具有更好的性能。进一步的实验表明,预测的解纠缠会话能够提高多方响应选择下游任务的性能。 摘要:Conversation disentanglement aims to separate intermingled messages into detached sessions, which is a fundamental task in understanding multi-party conversations. Existing work on conversation disentanglement relies heavily upon human-annotated datasets, which are expensive to obtain in practice. In this work, we explore to train a conversation disentanglement model without referencing any human annotations. Our method is built upon a deep co-training algorithm, which consists of two neural networks: a message-pair classifier and a session classifier. The former is responsible for retrieving local relations between two messages while the latter categorizes a message to a session by capturing context-aware information. Both networks are initialized respectively with pseudo data built from an unannotated corpus. During the deep co-training process, we use the session classifier as a reinforcement learning component to learn a session assigning policy by maximizing the local rewards given by the message-pair classifier. For the message-pair classifier, we enrich its training data by retrieving message pairs with high confidence from the disentangled sessions predicted by the session classifier. Experimental results on the large Movie Dialogue Dataset demonstrate that our proposed approach achieves competitive performance compared to the previous supervised methods. Further experiments show that the predicted disentangled conversations can promote the performance on the downstream task of multi-party response selection.
检测相关(2篇)
【1】 GOLD: Improving Out-of-Scope Detection in Dialogues using Data Augmentation 标题:GOLD:使用数据增强改进对话中的超出范围检测 链接:https://arxiv.org/abs/2109.03079
作者:Derek Chen,Zhou Yu 机构:ASAPP, New York, NY , Columbia University, NY 备注:14 pages, 5 figures. Accepted at EMNLP 2021 摘要:实际的对话系统需要检测范围外(OOS)话语的稳健方法,以避免会话中断和相关故障模式。使用标记的OOS示例直接训练模型会产生合理的性能,但获取此类数据是一个资源密集型过程。为了解决这个有限的数据问题,以前的方法侧重于更好地建模范围内(INS)示例的分布。我们将GOLD作为一种正交技术引入,以增加现有数据,从而训练在低数据区运行的更好的OOS检测器。GOLD使用辅助数据集中的样本生成伪标记候选,并通过一种新的过滤机制仅保留最有利的候选进行训练。在三个目标基准测试的实验中,top GOLD模型在所有关键指标上都优于所有现有方法,相对于基准性能中值,实现了52.4%、48.9%和50.3%的相对收益。我们还分析了OOS数据的独特特性,以确定优化应用我们所提出方法的关键因素。 摘要:Practical dialogue systems require robust methods of detecting out-of-scope (OOS) utterances to avoid conversational breakdowns and related failure modes. Directly training a model with labeled OOS examples yields reasonable performance, but obtaining such data is a resource-intensive process. To tackle this limited-data problem, previous methods focus on better modeling the distribution of in-scope (INS) examples. We introduce GOLD as an orthogonal technique that augments existing data to train better OOS detectors operating in low-data regimes. GOLD generates pseudo-labeled candidates using samples from an auxiliary dataset and keeps only the most beneficial candidates for training through a novel filtering mechanism. In experiments across three target benchmarks, the top GOLD model outperforms all existing methods on all key metrics, achieving relative gains of 52.4%, 48.9% and 50.3% against median baseline performance. We also analyze the unique properties of OOS data to identify key factors for optimally applying our proposed method.
【2】 Detecting Inspiring Content on Social Media 标题:检测社交媒体上的鼓舞人心的内容 链接:https://arxiv.org/abs/2109.02734
作者:Oana Ignat,Y-Lan Boureau,Jane A. Yu,Alon Halevy 机构:University of Michigan, MI, USA, Facebook AI, New York City, USA, Menlo Park, USA 备注:accepted at ACII 2021 摘要:灵感使人看到新的可能性,并改变他们感知自身潜力的方式。灵感在心理学中很少受到关注,在NLP社区中也没有进行过研究。据我们所知,这项工作是第一次通过机器学习方法研究灵感。我们的目标是从社交媒体数据中自动检测鼓舞人心的内容。为此,我们分析社交媒体帖子,梳理出是什么让帖子鼓舞人心,以及哪些话题鼓舞人心。我们发布了一个包含5800个鼓舞人心的和5800个非鼓舞人心的英语公开帖子的数据集,这些独特的ID是从第三方提供的Reddit公开帖子中收集的,并使用语言启发法自动检测哪些社交媒体英语公开帖子鼓舞人心。 摘要:Inspiration moves a person to see new possibilities and transforms the way they perceive their own potential. Inspiration has received little attention in psychology, and has not been researched before in the NLP community. To the best of our knowledge, this work is the first to study inspiration through machine learning methods. We aim to automatically detect inspiring content from social media data. To this end, we analyze social media posts to tease out what makes a post inspiring and what topics are inspiring. We release a dataset of 5,800 inspiring and 5,800 non-inspiring English-language public post unique ids collected from a dump of Reddit public posts made available by a third party and use linguistic heuristics to automatically detect which social media English-language posts are inspiring.
识别/分类(2篇)
【1】 Joint model for intent and entity recognition 标题:意图和实体识别的联合模型 链接:https://arxiv.org/abs/2109.03221
作者:Petr Lorenc 机构:FEE, Czech Technical University in Prague 摘要:自然对话的语义理解由几个部分组成。其中一些,如意图分类和实体检测,在决定处理用户输入的下一步中起着至关重要的作用。把每个任务当作单独的问题来处理可能会浪费训练资源,而这些任务本可以相互受益。本文将这些问题作为一个整体来解决。我们的新模型把意图识别和实体识别结合到一个系统中,与单独解决每个任务相比,在两个任务上都取得了更好的指标,且训练需求更低。我们还根据输入对模型进行了优化。 摘要:The semantic understanding of natural dialogues composes of several parts. Some of them, like intent classification and entity detection, have a crucial role in deciding the next steps in handling user input. Handling each task as an individual problem can be wasting of training resources, and also each problem can benefit from each other. This paper tackles these problems as one. Our new model, which combine intent and entity recognition into one system, is achieving better metrics in both tasks with lower training requirements than solving each task separately. We also optimize the model based on the inputs.
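下面用一个极简的PyTorch示意模型(非论文原始实现)说明"共享编码器 + 意图分类头 + 实体序列标注头、按多任务损失联合训练"的基本思路;其中的网络结构与维度均为编者假设的占位。
```python
# 示意性模型结构(非论文原始实现):共享编码器 + 意图分类头 + 实体(序列标注)头,
# 以多任务损失联合训练;各维度大小均为占位设置。
import torch
import torch.nn as nn

class JointIntentEntityModel(nn.Module):
    def __init__(self, vocab_size=10000, hidden=128, num_intents=7, num_entity_tags=9):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)
        self.intent_head = nn.Linear(2 * hidden, num_intents)      # 句级意图
        self.entity_head = nn.Linear(2 * hidden, num_entity_tags)  # 词级实体标签
        self.ce = nn.CrossEntropyLoss()

    def forward(self, token_ids, intent_labels=None, entity_labels=None):
        h, _ = self.encoder(self.embed(token_ids))         # (B, T, 2H)
        intent_logits = self.intent_head(h.mean(dim=1))    # 池化后做意图分类
        entity_logits = self.entity_head(h)                # 每个token做实体分类
        loss = None
        if intent_labels is not None and entity_labels is not None:
            loss = (self.ce(intent_logits, intent_labels) +
                    self.ce(entity_logits.reshape(-1, entity_logits.size(-1)),
                            entity_labels.reshape(-1)))    # 两个任务的损失直接相加
        return intent_logits, entity_logits, loss

model = JointIntentEntityModel()
tokens = torch.randint(0, 10000, (4, 12))
_, _, loss = model(tokens, torch.randint(0, 7, (4,)), torch.randint(0, 9, (4, 12)))
print(loss)
```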
【2】 End-to-end Neural Information Status Classification 标题:端到端神经信息状态分类 链接:https://arxiv.org/abs/2109.02753
作者:Yufang Hou 机构:IBM Research Europe, Dublin, Ireland 备注:Accepted at EMNLP2021 Findings 摘要:以往关于信息状态(IS)分类和桥接回指识别的大多数研究都假设黄金提及或句法树信息是给定的(Hou等人,2013;Roesiger等人,2018;Hou,2020;Yu和Poesio,2020)。在本文中,我们提出了一种用于信息状态分类的端到端神经方法。我们的方法由提及提取组件和信息状态分配组件组成。在推理过程中,我们的系统将原始文本作为输入,并生成提及及其信息状态。在ISNotes语料库(Markert et al.,2012)上,我们展示了我们的信息状态分配组件在基于黄金提及的细粒度IS分类方面取得了新的最新成果。此外,在端到端设置中,我们的系统在提及提取和细粒度IS分类方面的性能明显优于其他基线。最后,我们在BASHI(Roesiger,2018)和SciCorp(Roesiger,2016)上应用我们的系统来识别指称桥接回指。我们发现,与之前依靠句法信息并在域内数据集上进行训练的最先进系统相比,我们在ISNotes上训练的端到端系统在桥接回指识别方面取得了具有竞争力的结果(Yu和Poesio,2020)。 摘要:Most previous studies on information status (IS) classification and bridging anaphora recognition assume that the gold mention or syntactic tree information is given (Hou et al., 2013; Roesiger et al., 2018; Hou, 2020; Yu and Poesio, 2020). In this paper, we propose an end-to-end neural approach for information status classification. Our approach consists of a mention extraction component and an information status assignment component. During the inference time, our system takes a raw text as the input and generates mentions together with their information status. On the ISNotes corpus (Markert et al., 2012), we show that our information status assignment component achieves new state-of-the-art results on fine-grained IS classification based on gold mentions. Furthermore, our system performs significantly better than other baselines for both mention extraction and fine-grained IS classification in the end-to-end setting. Finally, we apply our system on BASHI (Roesiger, 2018) and SciCorp (Roesiger, 2016) to recognize referential bridging anaphora. We find that our end-to-end system trained on ISNotes achieves competitive results on bridging anaphora recognition compared to the previous state-of-the-art system that relies on syntactic information and is trained on the in-domain datasets (Yu and Poesio, 2020).
Zero/Few/One-Shot|迁移|自适应(1篇)
【1】 Patient Outcome and Zero-shot Diagnosis Prediction with Hypernetwork-guided Multitask Learning 标题:基于超网络引导的多任务学习的患者预后和零样本诊断预测 链接:https://arxiv.org/abs/2109.03062
作者:Shaoxiong Ji,Pekka Marttinen 机构:Aalto University, Finland 摘要:多任务深度学习已应用于从文本中预测患者结果,将临床笔记作为输入,并训练具有多任务联合损失函数的深度神经网络。然而,多任务学习的联合训练方案存在任务间干扰,多任务间的诊断预测由于罕见疾病或未见过的诊断而存在可推广性问题。为了解决这些挑战,我们提出了一种基于超网络的方法,该方法生成任务条件参数和多任务预测头的系数,以学习任务特定预测并平衡多任务学习。我们还加入了语义任务信息,以提高我们的任务条件多任务模型的通用性。在从真实世界的MIMIC数据库中提取的早期记录和出院记录上进行的实验表明,在大多数情况下,我们的方法在多任务患者结局预测方面比强基线具有更好的性能。此外,我们的方法可以有效地处理信息有限的场景,并改进对未见过的诊断类别的零样本预测。 摘要:Multitask deep learning has been applied to patient outcome prediction from text, taking clinical notes as input and training deep neural networks with a joint loss function of multiple tasks. However, the joint training scheme of multitask learning suffers from inter-task interference, and diagnosis prediction among the multiple tasks has the generalizability issue due to rare diseases or unseen diagnoses. To solve these challenges, we propose a hypernetwork-based approach that generates task-conditioned parameters and coefficients of multitask prediction heads to learn task-specific prediction and balance the multitask learning. We also incorporate semantic task information to improves the generalizability of our task-conditioned multitask model. Experiments on early and discharge notes extracted from the real-world MIMIC database show our method can achieve better performance on multitask patient outcome prediction than strong baselines in most cases. Besides, our method can effectively handle the scenario with limited information and improve zero-shot prediction on unseen diagnosis categories.
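下面给出一个示意性的超网络片段(非论文原始实现):由任务嵌入生成各任务预测头的权重,从而得到"任务条件化参数";任务数、维度等均为编者假设值,仅用于说明机制。
```python
# 示意性示例(非论文原始实现):超网络根据任务嵌入生成各任务预测头的权重与偏置。
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperHead(nn.Module):
    def __init__(self, task_emb_dim=32, feat_dim=256, num_classes=2, num_tasks=3):
        super().__init__()
        self.task_emb = nn.Embedding(num_tasks, task_emb_dim)     # 每个任务一个嵌入
        # 超网络:由任务嵌入生成线性分类头的权重与偏置
        self.weight_gen = nn.Linear(task_emb_dim, feat_dim * num_classes)
        self.bias_gen = nn.Linear(task_emb_dim, num_classes)
        self.feat_dim, self.num_classes = feat_dim, num_classes

    def forward(self, features, task_id):
        e = self.task_emb(task_id)                                 # (task_emb_dim,)
        w = self.weight_gen(e).view(self.num_classes, self.feat_dim)
        b = self.bias_gen(e)
        return F.linear(features, w, b)                            # 任务专属预测头

encoder_output = torch.randn(8, 256)          # 假设来自临床文本编码器的句向量
head = HyperHead()
logits_task0 = head(encoder_output, torch.tensor(0))   # 例如:死亡率预测任务
logits_task1 = head(encoder_output, torch.tensor(1))   # 例如:诊断预测任务
print(logits_task0.shape, logits_task1.shape)
```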
Word2Vec|文本|单词(2篇)
【1】 Rare Words Degenerate All Words 标题:罕见词使所有的词都退化了 链接:https://arxiv.org/abs/2109.03127
作者:Sangwon Yu,Jongyoon Song,Heeseung Kim,Seong-min Lee,Woo-Jong Ryu,Sungroh Yoon 机构:Data Science & AI Laboratory, Seoul National University, Republic of Korea, ASRI, INMC, and Interdisciplinary Program in AI, Seoul National University, Republic of Korea, AIRS Company, Hyundai Motor Group, Republic of Korea 摘要:尽管神经网络语言模型取得了进展,但嵌入的表示退化问题仍然具有挑战性。最近的研究发现,学习到的输出嵌入退化为一个狭窄的锥分布,使得每个嵌入之间的相似性为正。他们分析了退化问题的原因,该问题已被证明是大多数嵌入件的常见问题。然而,我们发现退化问题尤其起源于稀有词嵌入的训练。在这项研究中,我们分析了稀有词嵌入退化的内在机制,即关于负对数似然损失函数的梯度。此外,我们从理论和实证上证明了稀有词嵌入的退化导致了非稀有词嵌入的退化,并且通过防止稀有词嵌入的退化可以缓解整体退化问题。基于我们的分析,我们提出了一种新的方法,自适应梯度部分缩放(AGPS),以解决退化问题。实验结果定性和定量地证明了该方法的有效性。 摘要:Despite advances in neural network language model, the representation degeneration problem of embeddings is still challenging. Recent studies have found that the learned output embeddings are degenerated into a narrow-cone distribution which makes the similarity between each embeddings positive. They analyzed the cause of the degeneration problem has been demonstrated as common to most embeddings. However, we found that the degeneration problem is especially originated from the training of embeddings of rare words. In this study, we analyze the intrinsic mechanism of the degeneration of rare word embeddings with respect of their gradient about the negative log-likelihood loss function. Furthermore, we theoretically and empirically demonstrate that the degeneration of rare word embeddings causes the degeneration of non-rare word embeddings, and that the overall degeneration problem can be alleviated by preventing the degeneration of rare word embeddings. Based on our analyses, we propose a novel method, Adaptive Gradient Partial Scaling(AGPS), to address the degeneration problem. Experimental results demonstrate the effectiveness of the proposed method qualitatively and quantitatively.
【2】 Generate & Rank: A Multi-task Framework for Math Word Problems 标题:生成与排序:数学应用题的多任务框架 链接:https://arxiv.org/abs/2109.03034
作者:Jianhao Shen,Yichun Yin,Lin Li,Lifeng Shang,Xin Jiang,Ming Zhang,Qun Liu 机构:Department of Computer Science, School of EECS, Peking University, Huawei Noah’s Ark Lab, Huawei HiSilicon 备注:Findings of EMNLP2021 摘要:数学单词问题(MWP)是自然语言处理中一项具有挑战性和关键性的任务。最近的许多研究将MWP形式化为生成任务,并采用序列到序列模型将问题描述转换为数学表达式。然而,数学表达式容易出现小错误,而生成目标没有明确处理此类错误。为了解决这个局限性,我们设计了一个新的MWP排序任务,并提出了Generate&Rank,一个基于生成预训练语言模型的多任务框架。通过与生成和排序的联合训练,该模型从自己的错误中学习,能够区分正确和错误的表达式。同时,我们执行专为MWP设计的基于树的干扰和在线更新,以提高ranker。我们在基准测试上证明了我们提出的方法的有效性,结果表明,我们的方法在所有数据集中都始终优于基线。特别是在经典的Math23k中,我们的方法比最先进的方法高7%(78.4%$\rightarrow$85.4%)。 摘要:Math word problem (MWP) is a challenging and critical task in natural language processing. Many recent studies formalize MWP as a generation task and have adopted sequence-to-sequence models to transform problem descriptions to mathematical expressions. However, mathematical expressions are prone to minor mistakes while the generation objective does not explicitly handle such mistakes. To address this limitation, we devise a new ranking task for MWP and propose Generate & Rank, a multi-task framework based on a generative pre-trained language model. By joint training with generation and ranking, the model learns from its own mistakes and is able to distinguish between correct and incorrect expressions. Meanwhile, we perform tree-based disturbance specially designed for MWP and an online update to boost the ranker. We demonstrate the effectiveness of our proposed method on the benchmark and the results show that our method consistently outperforms baselines in all datasets. Particularly, in the classical Math23k, our method is 7% (78.4% $\rightarrow$ 85.4%) higher than the state-of-the-art.
其他神经网络|深度学习|模型|建模(6篇)
【1】 How much pretraining data do language models need to learn syntax? 标题:语言模型需要多少预训练数据才能学习语法? 链接:https://arxiv.org/abs/2109.03160
作者:Laura Pérez-Mayos,Miguel Ballesteros,Leo Wanner 机构: NLP Research Group, Pompeu Fabra University, Barcelona, Spain, Amazon AI, Catalan Institute for Research and Advanced Studies (ICREA), Barcelona, Spain 备注:To be published in proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021) 摘要:基于Transformers的预训练语言模型在许多著名的NLU基准测试中取得了优异的结果。然而,尽管预训练方法非常方便,但在时间和资源方面却非常昂贵。这就需要研究训练前数据大小对模型知识的影响。我们使用根据原始文本数据的增量大小训练的模型来探讨这种对RoBERTa语法能力的影响。首先,我们使用句法结构探测来确定预先训练了更多数据的模型是否编码了更多的句法信息。其次,我们进行了有针对性的句法评估,以分析预训练数据大小对模型句法泛化性能的影响。第三,我们比较了不同模型在三个下游应用上的性能:词性标记、依赖性分析和释义识别。我们通过分析训练此类模型的成本效益权衡来补充我们的研究。我们的实验表明,虽然在更多数据上预训练的模型编码了更多的语法知识,在下游应用程序中表现更好,但它们并不总是在不同的语法现象中提供更好的性能,并且会带来更高的财务和环境成本。 摘要:Transformers-based pretrained language models achieve outstanding results in many well-known NLU benchmarks. However, while pretraining methods are very convenient, they are expensive in terms of time and resources. This calls for a study of the impact of pretraining data size on the knowledge of the models. We explore this impact on the syntactic capabilities of RoBERTa, using models trained on incremental sizes of raw text data. First, we use syntactic structural probes to determine whether models pretrained on more data encode a higher amount of syntactic information. Second, we perform a targeted syntactic evaluation to analyze the impact of pretraining data size on the syntactic generalization performance of the models. Third, we compare the performance of the different models on three downstream applications: part-of-speech tagging, dependency parsing and paraphrase identification. We complement our study with an analysis of the cost-benefit trade-off of training such models. Our experiments show that while models pretrained on more data encode more syntactic knowledge and perform better on downstream applications, they do not always offer a better performance across the different syntactic phenomena and come at a higher financial and environmental cost.
【2】 Idiosyncratic but not Arbitrary: Learning Idiolects in Online Registers Reveals Distinctive yet Consistent Individual Styles 标题:独特但非任意:在在线语域中学习个人语言风格(idiolect)揭示出独特而一致的个人风格 链接:https://arxiv.org/abs/2109.03158
作者:Jian Zhu,David Jurgens 机构:Department of Linguistics, University of Michigan, School of Information 备注:EMNLP 2021 main conference 摘要:个人写作风格的变化通常是社会属性和个人属性共同作用的结果。虽然结构化社会变异已被广泛研究,例如基于性别的变异,但由于个体风格的特质,对如何描述个体风格的了解却少得多。我们引入了一种新的方法,通过大量的跨作者比较来研究独特性,以识别和编码文体特征。该神经模型在短文本和基于类比的探测任务中取得了很好的作者身份识别性能,表明学习到的表征表现出惊人的规律性,编码了个性化风格的定性和定量变化。通过文本扰动,我们量化了不同语言元素对独特变异的相对贡献。此外,我们通过测量作者之间和作者内部的变异来描述个体选择,表明个体选择的变异通常是独特的,但又是一致的。 摘要:An individual's variation in writing style is often a function of both social and personal attributes. While structured social variation has been extensively studied, e.g., gender based variation, far less is known about how to characterize individual styles due to their idiosyncratic nature. We introduce a new approach to studying idiolects through a massive cross-author comparison to identify and encode stylistic features. The neural model achieves strong performance at authorship identification on short texts and through an analogy-based probing task, showing that the learned representations exhibit surprising regularities that encode qualitative and quantitative shifts of idiolectal styles. Through text perturbation, we quantify the relative contributions of different linguistic elements to idiolectal variation. Furthermore, we provide a description of idiolects through measuring inter- and intra-author variation, showing that variation in idiolects is often distinctive yet consistent.
【3】 NumGPT: Improving Numeracy Ability of Generative Pre-trained Models 标题:NumGPT:提高生成性预训练模型的数值能力 链接:https://arxiv.org/abs/2109.03137
作者:Zhihua Jin,Xin Jiang,Xingbo Wang,Qun Liu,Yong Wang,Xiaozhe Ren,Huamin Qu 机构: Department of Computer Science & Engineering, Hong Kong University of Science and Technology, Hong Kong, Huawei Noah’s Ark Lab, School of Computing and Information Systems, Singapore Management University, Singapore 备注:8 pages, 3 figures 摘要:现有的生成性预训练语言模型(如GPT)侧重于对一般文本的语言结构和语义进行建模。然而,这些模型不考虑数字的数值性质,并且不能在数值推理任务(例如,数学单词问题和测量估计)上可靠地执行。在本文中,我们提出了NumGPT,一个生成的预训练模型,它显式地模拟文本中数字的数值特性。具体来说,它利用基于原型的数字嵌入对数字的尾数进行编码,并利用单个嵌入对数字的指数进行编码。设计了一个数字感知损失函数,将数字集成到NumGPT的预训练目标中。我们在四个不同的数据集上进行了广泛的实验,以评估NumGPT的计算能力。实验结果表明,NumGPT在测量估计、数字比较、数学单词问题和震级分类等一系列数字推理任务上优于基线模型(如GPT和带DICE的GPT)。还进行了消融研究,以评估训练前和模型超参数对性能的影响。 摘要:Existing generative pre-trained language models (e.g., GPT) focus on modeling the language structure and semantics of general texts. However, those models do not consider the numerical properties of numbers and cannot perform robustly on numerical reasoning tasks (e.g., math word problems and measurement estimation). In this paper, we propose NumGPT, a generative pre-trained model that explicitly models the numerical properties of numbers in texts. Specifically, it leverages a prototype-based numeral embedding to encode the mantissa of the number and an individual embedding to encode the exponent of the number. A numeral-aware loss function is designed to integrate numerals into the pre-training objective of NumGPT. We conduct extensive experiments on four different datasets to evaluate the numeracy ability of NumGPT. The experiment results show that NumGPT outperforms baseline models (e.g., GPT and GPT with DICE) on a range of numerical reasoning tasks such as measurement estimation, number comparison, math word problems, and magnitude classification. Ablation studies are also conducted to evaluate the impact of pre-training and model hyperparameters on the performance.
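摘要提到用基于原型的嵌入编码数字的尾数、用单独的嵌入编码指数。下面是编者的一种示意性实现(原型软分配采用RBF式加权,属于假设,并非论文的确切设计),演示如何把一个数字拆解并映射为向量。
```python
# 示意性示例(非论文原始实现):把数字拆成尾数与指数,
# 尾数用"原型软分配"(此处假设为RBF核加权)编码,指数用普通嵌入表编码。
import math
import torch
import torch.nn as nn

class NumeralEmbedding(nn.Module):
    def __init__(self, dim=64, num_prototypes=10, min_exp=-10, max_exp=10):
        super().__init__()
        self.prototypes = nn.Parameter(torch.linspace(0.1, 1.0, num_prototypes))  # 尾数原型
        self.proto_emb = nn.Embedding(num_prototypes, dim)
        self.exp_emb = nn.Embedding(max_exp - min_exp + 1, dim)
        self.min_exp, self.max_exp = min_exp, max_exp

    def forward(self, value: float) -> torch.Tensor:
        exponent = 0 if value == 0 else int(math.floor(math.log10(abs(value))))
        mantissa = 0.0 if value == 0 else abs(value) / (10.0 ** exponent)  # 落在[1,10)
        mantissa = mantissa / 10.0                                         # 归一化到[0.1,1)
        exponent = max(self.min_exp, min(self.max_exp, exponent))
        # 尾数:与各原型的(负平方)距离经softmax得到软分配权重
        weights = torch.softmax(-(self.prototypes - mantissa) ** 2, dim=0)
        mantissa_vec = weights @ self.proto_emb.weight
        exp_vec = self.exp_emb(torch.tensor(exponent - self.min_exp))
        return mantissa_vec + exp_vec          # 合并成该数字的最终嵌入

emb = NumeralEmbedding()
print(emb(3.2e4).shape, emb(0.075).shape)
```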
【4】 Infusing Future Information into Monotonic Attention Through Language Models 标题:通过语言模型将未来信息注入单调注意 链接:https://arxiv.org/abs/2109.03121
作者:Mohd Abbas Zaidi,Sathish Indurthi,Beomseok Lee,Nikhil Kumar Lakumarapu,Sangha Kim 机构:NLP Lab, Samsung Research, Seoul, South Korea 摘要:同步神经机器翻译(SNMT)模型在处理完源序列之前就开始输出目标序列。最近的SNMT自适应策略使用单调注意来执行基于部分源和目标序列的读/写决策。缺乏足够的信息可能会导致单调注意做出糟糕的读/写决策,这反过来会对SNMT模型的性能产生负面影响。另一方面,人类翻译人员能够做出更好的读/写决策,因为他们可以利用语言信息和领域知识预测近期的单词。受人类翻译人员的启发,在这项工作中,我们提出了一个框架,通过外部语言模型来辅助单调注意,以改进其决策。我们在MuST-C英语-德语和英语-法语语音-文本翻译任务上进行了实验,以证明所提出框架的有效性。所提出的SNMT方法相对于最先进的单调多头注意力改进了质量-延迟权衡。 摘要:Simultaneous neural machine translation(SNMT) models start emitting the target sequence before they have processed the source sequence. The recent adaptive policies for SNMT use monotonic attention to perform read/write decisions based on the partial source and target sequences. The lack of sufficient information might cause the monotonic attention to take poor read/write decisions, which in turn negatively affects the performance of the SNMT model. On the other hand, human translators make better read/write decisions since they can anticipate the immediate future words using linguistic information and domain knowledge. Motivated by human translators, in this work, we propose a framework to aid monotonic attention with an external language model to improve its decisions. We conduct experiments on the MuST-C English-German and English-French speech-to-text translation tasks to show the effectiveness of the proposed framework. The proposed SNMT method improves the quality-latency trade-off over the state-of-the-art monotonic multihead attention.
【5】 FHAC at GermEval 2021: Identifying German toxic, engaging, and fact-claiming comments with ensemble learning 标题:GermEval 2021上的FHAC:通过集成学习识别德语的有毒、引人入胜和声称事实的评论 链接:https://arxiv.org/abs/2109.03094
作者:Tobias Bornheim,Niklas Grieger,Stephan Bialonski 机构:Department of Medical Engineering and Technomathematics, FH Aachen University of Applied Sciences, J¨ulich, Germany, Institute for Data-Driven Technologies 备注:None 摘要:近年来,通过大型预训练神经网络模型(如BERT和ELECTRA)学习的语言表示的可用性导致了许多下游自然语言处理任务的改进。预训练模型通常在预训练目标、体系结构和数据集方面有所不同,这些都会影响下游性能。在这篇文章中,我们对德国BERT和德国ELECTRA模型进行了微调,以识别GermEval 2021竞赛提供的Facebook数据中的有毒(子任务1)、参与(子任务2)和事实陈述评论(子任务3)。我们创建了这些模型的集合,并研究分类性能是否以及如何取决于集合成员的数量及其组成。在样本外数据方面,我们的最佳集合的宏F1得分为0.73(所有子任务),子任务1、2和3的F1得分分别为0.72、0.70和0.76。 摘要:The availability of language representations learned by large pretrained neural network models (such as BERT and ELECTRA) has led to improvements in many downstream Natural Language Processing tasks in recent years. Pretrained models usually differ in pretraining objectives, architectures, and datasets they are trained on which can affect downstream performance. In this contribution, we fine-tuned German BERT and German ELECTRA models to identify toxic (subtask 1), engaging (subtask 2), and fact-claiming comments (subtask 3) in Facebook data provided by the GermEval 2021 competition. We created ensembles of these models and investigated whether and how classification performance depends on the number of ensemble members and their composition. On out-of-sample data, our best ensemble achieved a macro-F1 score of 0.73 (for all subtasks), and F1 scores of 0.72, 0.70, and 0.76 for subtasks 1, 2, and 3, respectively.
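下面是一个示意性片段(非作者代码),展示"对多个微调模型的输出概率取平均做集成,并用macro-F1评估"的常见做法;数据为随机占位,集成成员可以理解为不同随机种子或不同预训练模型(如BERT/ELECTRA)微调得到的分类器。
```python
# 示意性示例(非作者实现):多个成员模型的softmax概率取平均做集成,
# 再用macro-F1评估;数据为随机占位。
import numpy as np
from sklearn.metrics import f1_score

def ensemble_predict(prob_list):
    """prob_list: 每个成员模型输出的(N, C)概率矩阵列表;返回集成后的预测标签。"""
    avg_probs = np.mean(np.stack(prob_list, axis=0), axis=0)
    return avg_probs.argmax(axis=1)

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=100)
# 三个假想的集成成员(例如不同随机种子微调得到的分类器)的输出概率
members = []
for _ in range(3):
    logits = rng.normal(size=(100, 2)) + np.eye(2)[y_true] * 1.5   # 让概率与真实标签相关
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    members.append(probs)

y_pred = ensemble_predict(members)
print("macro-F1:", f1_score(y_true, y_pred, average="macro"))
```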
【6】 Integrating Regular Expressions with Neural Networks via DFA 标题:基于DFA的正则表达式与神经网络的集成 链接:https://arxiv.org/abs/2109.02882
作者:Shaobo Li,Qun Liu,Xin Jiang,Yichun Yin,Chengjie Sun,Bingquan Liu,Zhenzhou Ji,Lifeng Shang 机构:Harbin Institute of Technology, Huawei Noah’s Ark Lab 摘要:人工设计的规则广泛用于构建行业应用程序。然而,维持数千条这样的手工规则是不可行的。因此,将规则知识集成到神经网络中,建立一个性能更好的混合模型是非常重要的。具体来说,人类设计的规则被表示为正则表达式(REs),从中构造等价的最小确定性有限自动机(MDFA)。我们建议使用MDFA作为中间模型来捕获匹配的RE模式作为每个输入句子的基于规则的特征,并将这些附加特征引入神经网络。我们在ATIS意图分类任务中对所提出的方法进行了评估。实验结果表明,在训练数据集相对较小的情况下,与神经网络和其他四种REs与神经网络相结合的方法相比,该方法取得了最好的性能。 摘要:Human-designed rules are widely used to build industry applications. However, it is infeasible to maintain thousands of such hand-crafted rules. So it is very important to integrate the rule knowledge into neural networks to build a hybrid model that achieves better performance. Specifically, the human-designed rules are formulated as Regular Expressions (REs), from which the equivalent Minimal Deterministic Finite Automatons (MDFAs) are constructed. We propose to use the MDFA as an intermediate model to capture the matched RE patterns as rule-based features for each input sentence and introduce these additional features into neural networks. We evaluate the proposed method on the ATIS intent classification task. The experiment results show that the proposed method achieves the best performance compared to neural networks and four other methods that combine REs and neural networks when the training dataset is relatively small.
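下面的片段是编者的简化示意(并非论文实现,且跳过了RE到MDFA的转换这一步):直接用Python的re模块把正则匹配结果作为句子的二值规则特征,再与神经句向量拼接后送入分类层;其中的规则样例与维度均为假设。
```python
# 示意性示例(非论文原始实现,省略了RE->MDFA的转换步骤):
# 用正则匹配结果作为每句话的二值规则特征,与神经表示拼接后送入分类层。
import re
import torch
import torch.nn as nn

RULES = [r"\bbook\b.*\bflight\b", r"\bcheapest\b", r"\bfrom\b \w+ \bto\b \w+"]  # 假设的意图规则
patterns = [re.compile(p, re.IGNORECASE) for p in RULES]

def rule_features(sentence: str) -> torch.Tensor:
    """每条规则匹配与否 -> 一个0/1特征。"""
    return torch.tensor([float(bool(p.search(sentence))) for p in patterns])

class RuleAugmentedClassifier(nn.Module):
    def __init__(self, sent_dim=128, num_rules=len(RULES), num_intents=5):
        super().__init__()
        self.out = nn.Linear(sent_dim + num_rules, num_intents)

    def forward(self, sent_vec, rule_vec):
        return self.out(torch.cat([sent_vec, rule_vec], dim=-1))   # 神经特征 + 规则特征

sentence = "I want to book the cheapest flight from Boston to Denver"
sent_vec = torch.randn(128)                  # 假设来自句子编码器的表示
model = RuleAugmentedClassifier()
print(model(sent_vec, rule_features(sentence)))
```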
其他(9篇)
【1】 When differential privacy meets NLP: The devil is in the detail 标题:当差分隐私遇到NLP:魔鬼藏在细节中 链接:https://arxiv.org/abs/2109.03175
作者:Ivan Habernal 备注:Camera-ready for EMNLP 2021 摘要:差分隐私为个人隐私提供了一种形式化的方法。在各种场景中应用差分隐私,例如保护用户的原始话语,必须满足一定的数学性质。我们的贡献是对ADePT的形式化分析,ADePT是一种用于文本重写的差分隐私自动编码器(Krishna等人,2021年)。ADePT在下游任务上取得了有希望的结果,同时提供了严格的隐私保证。我们的证明表明,ADePT并不满足差分隐私,因此其实验结果缺乏依据。我们还量化了其私有化机制中错误的影响,表明在编码器维度非常小的乐观情况下,真实敏感度至少高出6倍,并且未被私有化的话语量很容易达到整个数据集的100%。我们的意图既不是批评作者,也不是批评同行评审过程,而是指出,如果NLP中的差分隐私应用依赖于形式化保证,则应完整给出这些保证并进行详细审查。 摘要:Differential privacy provides a formal approach to privacy of individuals. Applications of differential privacy in various scenarios, such as protecting users' original utterances, must satisfy certain mathematical properties. Our contribution is a formal analysis of ADePT, a differentially private auto-encoder for text rewriting (Krishna et al, 2021). ADePT achieves promising results on downstream tasks while providing tight privacy guarantees. Our proof reveals that ADePT is not differentially private, thus rendering the experimental results unsubstantiated. We also quantify the impact of the error in its private mechanism, showing that the true sensitivity is higher by at least factor 6 in an optimistic case of a very small encoder's dimension and that the amount of utterances that are not privatized could easily reach 100% of the entire dataset. Our intention is neither to criticize the authors, nor the peer-reviewing process, but rather point out that if differential privacy applications in NLP rely on formal guarantees, these should be outlined in full and put under detailed scrutiny.
【2】 PAUSE: Positive and Annealed Unlabeled Sentence Embedding 标题:暂停:积极的、退火式的无标记句子嵌入 链接:https://arxiv.org/abs/2109.03155
作者:Lele Cao,Emil Larsson,Vilhelm von Ehrenheim,Dhiana Deva Cavalcanti Rocha,Anna Martin,Sonja Horn 机构:Motherbrain, EQT Group, Stockholm, Sweden, Modulai, Stockholm, Sweden 备注:Accepted by EMNLP 2021 main conference as long paper (12 pages and 2 figures). For source code, see this https URL 摘要:句子嵌入是指将原始文本转换为数字向量表示的一组有效且通用的技术,可用于广泛的自然语言处理(NLP)应用。这些技术大多是有监督或无监督的。与无监督方法相比,有监督方法对优化目标的假设较少,通常获得更好的结果。然而,训练需要大量的标记句子对,这在许多工业场景中是不可用的。为此,我们提出了一种通用的端到端方法——暂停(积极和退火的未标记句子嵌入),能够从部分标记的数据集中学习高质量的句子嵌入。我们的实验表明,在各种基准测试任务中,停顿仅使用了一小部分标记句子对,就达到了,有时甚至超过了,最先进的结果。当应用于实际的工业用例时,如果标签样本很少,PAUSE鼓励我们扩展数据集,而不需要大量的手动注释工作。 摘要:Sentence embedding refers to a set of effective and versatile techniques for converting raw text into numerical vector representations that can be used in a wide range of natural language processing (NLP) applications. The majority of these techniques are either supervised or unsupervised. Compared to the unsupervised methods, the supervised ones make less assumptions about optimization objectives and usually achieve better results. However, the training requires a large amount of labeled sentence pairs, which is not available in many industrial scenarios. To that end, we propose a generic and end-to-end approach -- PAUSE (Positive and Annealed Unlabeled Sentence Embedding), capable of learning high-quality sentence embeddings from a partially labeled dataset. We experimentally show that PAUSE achieves, and sometimes surpasses, state-of-the-art results using only a small fraction of labeled sentence pairs on various benchmark tasks. When applied to a real industrial use case where labeled samples are scarce, PAUSE encourages us to extend our dataset without the liability of extensive manual annotation work.
【3】 POSSCORE: A Simple Yet Effective Evaluation of Conversational Search with Part of Speech Labelling 标题:POSSCORE:一种简单有效的词性标注会话搜索评测 链接:https://arxiv.org/abs/2109.03039
作者:Zeyang Liu,Ke Zhou,Jiaxin Mao,Max L. Wilson 机构:Nottingham, UK, University of Nottingham & Nokia Bell Labs, Renmin University of China, Beijing, China 备注:11 pages 摘要:对话式搜索系统,如谷歌助手和微软Cortana,提供了一种新的搜索模式,允许用户通过自然语言对话与搜索系统进行交流。由于搜索结果以自然语言句子的形式呈现,因此评估此类系统非常具有挑战性。鉴于可能的答复数量不限,为所有可能的答复收集相关性评估是不可行的。本文提出了一种简单而有效的会话搜索自动评价方法POSSCORE。所提出的基于嵌入的度量方法考虑了词性对响应的影响。据我们所知,我们的工作是第一次系统地证明了在会话搜索评估中加入句法信息(如词性标签)的重要性。实验结果表明,我们的指标可以与人类偏好相关联,与最先进的基线指标相比,取得了显著的改进。 摘要:Conversational search systems, such as Google Assistant and Microsoft Cortana, provide a new search paradigm where users are allowed, via natural language dialogues, to communicate with search systems. Evaluating such systems is very challenging since search results are presented in the format of natural language sentences. Given the unlimited number of possible responses, collecting relevance assessments for all the possible responses is infeasible. In this paper, we propose POSSCORE, a simple yet effective automatic evaluation method for conversational search. The proposed embedding-based metric takes the influence of part of speech (POS) of the terms in the response into account. To the best knowledge, our work is the first to systematically demonstrate the importance of incorporating syntactic information, such as POS labels, for conversational search evaluation. Experimental results demonstrate that our metrics can correlate with human preference, achieving significant improvements over state-of-the-art baseline metrics.
【4】 Sequential Attention Module for Natural Language Processing 标题:自然语言处理中的顺序注意模块 链接:https://arxiv.org/abs/2109.03009
作者:Mengyuan Zhou,Jian Ma,Haiqin Yang,Lianxin Jiang,Yang Mo 机构:Ping An Life Insurance, Ltd., Shenzhen City, Guangdong Province, China 备注:10 pages, 4 figures, 5 tables 摘要:近年来,大型预训练神经语言模型经过微调,在许多下游自然语言处理(NLP)应用中取得了显著性能。在本文中,我们关注如何进一步改进语言模型的令牌(token)表示。为此,我们提出了一个简单而有效的即插即用模块——顺序注意模块(SAM),作用于从预训练语言模型学到的令牌嵌入之上。我们提出的SAM由两个依次部署的注意模块组成:特征级注意模块(FAM)和令牌级注意模块(TAM)。更具体地说,FAM能够有效识别每个维度上特征的重要性,并通过与原始令牌嵌入做点积来增强其在下游NLP应用中的效果;同时,TAM可以在令牌级别进一步对特征重新加权。此外,我们在FAM上提出了一种自适应滤波器,以防止噪声影响并增强信息吸收。最后,我们进行了大量实验来展示所提出SAM的优势和特性。我们首先展示SAM如何在SemEval'21 Task 7两个子任务的冠军方案中发挥关键作用;随后,我们将SAM应用于情感分析及三个流行的NLP任务,并证明SAM始终优于最先进的基线。 摘要:Recently, large pre-trained neural language models have attained remarkable performance on many downstream natural language processing (NLP) applications via fine-tuning. In this paper, we target at how to further improve the token representations on the language models. We, therefore, propose a simple yet effective plug-and-play module, Sequential Attention Module (SAM), on the token embeddings learned from a pre-trained language model. Our proposed SAM consists of two main attention modules deployed sequentially: Feature-wise Attention Module (FAM) and Token-wise Attention Module (TAM). More specifically, FAM can effectively identify the importance of features at each dimension and promote the effect via dot-product on the original token embeddings for downstream NLP applications. Meanwhile, TAM can further re-weight the features at the token-wise level. Moreover, we propose an adaptive filter on FAM to prevent noise impact and increase information absorption. Finally, we conduct extensive experiments to demonstrate the advantages and properties of our proposed SAM. We first show how SAM plays a primary role in the champion solution of two subtasks of SemEval'21 Task 7. After that, we apply SAM on sentiment analysis and three popular NLP tasks and demonstrate that SAM consistently outperforms the state-of-the-art baselines.
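下面给出对摘要中"特征级注意(FAM)+ 令牌级注意(TAM)依次作用"这一结构的简化理解,用PyTorch写成示意代码。模块的具体形式(sigmoid门控、线性打分加softmax)均为本文的假设,自适应滤波器等细节未包含,实际结构请以原论文为准。

```python
# 示意性实现:顺序注意模块(SAM)的一种简化解读(非论文官方代码)
import torch
import torch.nn as nn

class FeatureWiseAttention(nn.Module):
    """FAM:为每个特征维度学习 0~1 的重要性权重,并与原嵌入逐元素相乘。"""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim)
        gate = torch.sigmoid(self.proj(x))
        return x * gate

class TokenWiseAttention(nn.Module):
    """TAM:为序列中每个 token 计算权重,在 token 级别重新加权特征。"""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.score(x), dim=1)  # 在 seq_len 维上归一化
        return x * weights

class SequentialAttentionModule(nn.Module):
    """SAM:先 FAM 后 TAM,作用于预训练语言模型输出的 token 嵌入。"""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.fam = FeatureWiseAttention(hidden_dim)
        self.tam = TokenWiseAttention(hidden_dim)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        return self.tam(self.fam(token_embeddings))

# 用法示意:可插在任意预训练编码器输出之后
sam = SequentialAttentionModule(hidden_dim=768)
out = sam(torch.randn(2, 16, 768))
print(out.shape)  # torch.Size([2, 16, 768])
```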
【5】 Countering Online Hate Speech: An NLP Perspective 标题:打击网上仇恨言论:NLP视角 链接:https://arxiv.org/abs/2109.02941
作者:Mudit Chaudhary,Chandni Saxena,Helen Meng 机构:The Chinese University of Hong Kong 备注:12 pages 摘要:与新冠疫情、美国选举和世界各地抗议活动相关的新闻使网络仇恨言论受到了广泛关注。"在线毒性"(online toxicity)是对网络仇恨行为的统称,其表现形式包括网络仇恨言论等。仇恨言论是出于目标实体的身份或观点而对个人或群体发起的蓄意攻击。社交媒体带来的日益增长的大众传播进一步加剧了网络仇恨言论的有害后果。虽然利用自然语言处理(NLP)识别仇恨言论的研究已有很多,但利用NLP对网络仇恨言论进行预防和干预的工作相对缺乏。本文提出了一个关于仇恨言论NLP对抗方法的整体概念框架,并对NLP对抗网络仇恨言论的最新进展进行了全面综述。它根据对抗技术发挥作用的时间点对其进行分类,并指出了该主题未来潜在的研究方向。 摘要:Online hate speech has caught everyone's attention from the news related to the COVID-19 pandemic, US elections, and worldwide protests. Online toxicity - an umbrella term for online hateful behavior, manifests itself in forms such as online hate speech. Hate speech is a deliberate attack directed towards an individual or a group motivated by the targeted entity's identity or opinions. The rising mass communication through social media further exacerbates the harmful consequences of online hate speech. While there has been significant research on hate-speech identification using Natural Language Processing (NLP), the work on utilizing NLP for prevention and intervention of online hate speech lacks relatively. This paper presents a holistic conceptual framework on hate-speech NLP countering methods along with a thorough survey on the current progress of NLP for countering online hate speech. It classifies the countering techniques based on their time of action, and identifies potential future research areas on this topic.
【6】 Data Driven Content Creation using Statistical and Natural Language Processing Techniques for Financial Domain 标题:使用统计和自然语言处理技术为金融领域创建数据驱动的内容 链接:https://arxiv.org/abs/2109.02935
作者:Ankush Chopra,Prateek Nagwanshi,Sohom Ghosh 机构:Fidelity Investments, AI CoE, Bengaluru, India 备注:In Proceedings of The 3rd Financial Narrative Processing Workshop (FNP 2021) [To be published in ACL Anthology] 摘要:多年来,客户对即时获取信息的期望推动了虚拟助手等渠道使用量的增长。通常,客户在联系在线聊天客服或电话客服之前,会先尝试通过搜索和虚拟助手等低接触渠道解决问题。提高这些低接触系统的使用率对客户和组织是双赢的:组织得以降低服务成本,客户也能即时获得服务。在本文中,我们提出一个由两部分组成的框架。第一部分描述了如何整合来自通话、搜索、聊天等不同交互渠道的信息:先(使用堆叠的Bi-LSTM网络)将通话和聊天等高接触交互渠道数据摘要为类似搜索查询的简短客户意图,再(使用层次凝聚聚类)从交互数据中构建一个可随数据有机增长的意图分类体系。框架的第二部分侧重于通过分析交互数据源来提取客户问题,使用TF-IDF和BERT(Devlin等人,2019)计算相似度分数,并利用句法和语义相似度将识别出的问题映射到框架第一部分的输出。 摘要:Over the years customers' expectation of getting information instantaneously has given rise to the increased usage of channels like virtual assistants. Typically, customers try to get their questions answered by low-touch channels like search and virtual assistant first, before getting in touch with a live chat agent or the phone representative. Higher usage of these low-touch systems is a win-win for both customers and the organization since it enables organizations to attain a low cost of service while customers get served without delay. In this paper, we propose a two-part framework where the first part describes methods to combine the information from different interaction channels like call, search, and chat. We do this by summarizing (using a stacked Bi-LSTM network) the high-touch interaction channel data such as call and chat into short search-query-like customer intents and then creating an organically grown intent taxonomy from interaction data (using Hierarchical Agglomerative Clustering). The second part of the framework focuses on extracting customer questions by analyzing interaction data sources. It calculates similarity scores using TF-IDF and BERT (Devlin et al., 2019). It also maps these identified questions to the output of the first part of the framework using syntactic and semantic similarity.
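摘要提到的两个通用组件——TF-IDF相似度打分与层次凝聚聚类——可以用scikit-learn直接演示。下面是一个最小示意,仅说明这两类组件的典型用法;示例中的意图句子为虚构数据,整体并非论文的实际流水线(论文还使用了Bi-LSTM摘要与BERT相似度)。

```python
# 最小示意:TF-IDF 相似度 + 层次凝聚聚类(假设性演示,非论文原始实现)
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# 虚构的简短客户意图(对应摘要中"类似搜索查询的客户意图")
intents = [
    "reset my password",
    "forgot login password",
    "check account balance",
    "view my balance",
    "update mailing address",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(intents)

# 1) 用 TF-IDF 余弦相似度为新问题找到最相近的既有意图
query = ["how do I change my password"]
sims = cosine_similarity(vectorizer.transform(query), X)
print("most similar intent:", intents[sims.argmax()])

# 2) 层次凝聚聚类:把相近的意图并入同一簇,支撑"有机增长"的意图分类体系
clusterer = AgglomerativeClustering(n_clusters=3)
labels = clusterer.fit_predict(X.toarray())
print(list(zip(intents, labels)))
```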
【7】 Datasets: A Community Library for Natural Language Processing 标题:Datasets:一个面向自然语言处理的社区库 链接:https://arxiv.org/abs/2109.02846
作者:Quentin Lhoest,Albert Villanova del Moral,Yacine Jernite,Abhishek Thakur,Patrick von Platen,Suraj Patil,Julien Chaumond,Mariama Drame,Julien Plu,Lewis Tunstall,Joe Davison,Mario Šaško,Gunjan Chhablani,Bhavitvya Malik,Simon Brandeis,Teven Le Scao,Victor Sanh,Canwen Xu,Nicolas Patry,Angelina McMillan-Major,Philipp Schmid,Sylvain Gugger,Clément Delangue,Théo Matussière,Lysandre Debut,Stas Bekman,Pierric Cistac,Thibault Goehringer,Victor Mustar,François Lagunas,Alexander M. Rush,Thomas Wolf 备注:EMNLP Demo 2021 摘要:随着研究人员提出新任务、更大的模型和新的基准,公开可用的NLP数据集在规模、种类和数量上都在迅速增长。Datasets 是一个旨在支持这一生态系统的当代NLP社区库。Datasets 致力于标准化最终用户接口、版本管理和文档,同时提供一个轻量级前端,使其在处理小型数据集和互联网规模语料时行为一致。该库的设计采用分布式、社区驱动的方式来添加数据集和记录用法。经过一年的开发,该库现已收录650多个独立数据集,拥有250多位贡献者,并帮助支持了多种新颖的跨数据集研究项目和共享任务。该库可在 https://github.com/huggingface/datasets 获取。 摘要:The scale, variety, and quantity of publicly-available NLP datasets has grown rapidly as researchers propose new tasks, larger models, and novel benchmarks. Datasets is a community library for contemporary NLP designed to support this ecosystem. Datasets aims to standardize end-user interfaces, versioning, and documentation, while providing a lightweight front-end that behaves similarly for small datasets as for internet-scale corpora. The design of the library incorporates a distributed, community-driven approach to adding datasets and documenting usage. After a year of development, the library now includes more than 650 unique datasets, has more than 250 contributors, and has helped support a variety of novel cross-dataset research projects and shared tasks. The library is available at https://github.com/huggingface/datasets.
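datasets 库的接口已相当成熟,下面给出其最常见的用法示意(以公开的 imdb 数据集为例,数据集名称仅作演示):

```python
# Datasets 库的基本用法:统一接口加载、访问与流式读取数据集
from datasets import load_dataset

ds = load_dataset("imdb", split="train")    # 首次运行会下载并缓存数据
print(ds)                                   # 行数与字段信息
print(ds[0]["text"][:100], ds[0]["label"])  # 访问单条样本

# 同一接口也适用于大规模语料:streaming=True 时按需流式读取,不必整体下载
streamed = load_dataset("imdb", split="train", streaming=True)
print(next(iter(streamed))["label"])
```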
【8】 A Scalable AI Approach for Clinical Trial Cohort Optimization 标题:一种可扩展的人工智能临床试验队列优化方法 链接:https://arxiv.org/abs/2109.02808
作者:Xiong Liu,Cheng Shi,Uday Deore,Yingbo Wang,Myah Tran,Iya Khalil,Murthy Devarakonda 机构: AI Innovation Center, Novartis, Cambridge, MA, USA, RWE Data Science, Novartis Pharma, East Hanover, NJ, USA, Global Drug Development, Novartis, East Hanover, NJ, USA, Global Drug Development, Novartis, Basel, Switzerland 备注:PharML 2021 (Machine Learning for Pharma and Healthcare Applications) at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2021) 摘要:FDA一直在推动通过放宽入组资格标准来提升临床试验人群多样性的招募实践。然而,如何放宽资格标准仍是一项重大挑战。我们提出了一种面向队列优化的人工智能方法(AICO):先用基于Transformer的自然语言处理技术解析资格标准,再利用真实世界数据对这些标准进行评估。该方法能够从大量相关试验中提取常见的资格标准变量,并度量试验设计对真实世界患者的可推广性。它克服了现有人工方法在可扩展性上的限制,并支持针对目标疾病快速模拟资格标准设计。一项关于乳腺癌试验设计的案例研究证明了该方法在提高试验可推广性方面的实用价值。 摘要:FDA has been promoting enrollment practices that could enhance the diversity of clinical trial populations, through broadening eligibility criteria. However, how to broaden eligibility remains a significant challenge. We propose an AI approach to Cohort Optimization (AICO) through transformer-based natural language processing of the eligibility criteria and evaluation of the criteria using real-world data. The method can extract common eligibility criteria variables from a large set of relevant trials and measure the generalizability of trial designs to real-world patients. It overcomes the scalability limits of existing manual methods and enables rapid simulation of eligibility criteria design for a disease of interest. A case study on breast cancer trial design demonstrates the utility of the method in improving trial generalizability.
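摘要只说明AICO"用基于Transformer的NLP从资格标准中提取变量",未给出实现细节。下面用transformers自带的零样本分类流水线做一个假设性的替代演示:把资格标准句子归类到若干常见变量类型。句子与标签均为虚构示例,所用的通用零样本模型也并非论文采用的模型。

```python
# 假设性示意:用零样本分类把资格标准句子映射到常见变量类型(非AICO原始方法)
from transformers import pipeline

classifier = pipeline("zero-shot-classification")  # 首次运行会下载默认模型

criteria = [
    "Patients must be 18 years of age or older.",
    "ECOG performance status of 0 or 1.",
    "No prior chemotherapy within 6 months before enrollment.",
]
variable_types = ["age", "performance status", "prior treatment", "lab value"]

for sentence in criteria:
    result = classifier(sentence, candidate_labels=variable_types)
    print(sentence, "->", result["labels"][0])  # 得分最高的变量类型
```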
【9】 WhyAct: Identifying Action Reasons in Lifestyle Vlogs 标题:WhyAct:在生活方式Vlog中识别行动原因 链接:https://arxiv.org/abs/2109.02747
作者:Oana Ignat,Santiago Castro,Hanwen Miao,Weiji Li,Rada Mihalcea 机构:University of Michigan 备注:Accepted at EMNLP 2021 摘要:我们的目标是自动识别在线视频中人类行为背后的原因。我们关注广泛流行的生活方式视频博客(vlog)这一类型,其中人们一边做动作一边口头描述这些动作。我们引入并公开了 WhyAct 数据集,它由1077个人工标注了原因的视觉动作组成。我们还描述了一个多模态模型,该模型利用视觉和文本信息自动推断视频中所呈现动作对应的原因。 摘要:We aim to automatically identify human action reasons in online videos. We focus on the widespread genre of lifestyle vlogs, in which people perform actions while verbally describing them. We introduce and make publicly available the WhyAct dataset, consisting of 1,077 visual actions manually annotated with their reasons. We describe a multimodal model that leverages visual and textual information to automatically infer the reasons corresponding to an action presented in the video.
机器翻译,仅供参考