Natural Language Processing arXiv Daily Digest [7.29]

2021-07-30 16:55:26

Visit www.arxivdaily.com for daily digests with abstracts, covering CS, Physics, Math, Economics, Statistics, Finance, Biology, and Electrical Engineering, with search, bookmarking, and posting features!

cs.CL: 10 papers today

Transformer (1 paper)

【1】 Neural Rule-Execution Tracking Machine For Transformer-Based Text Generation

Authors: Yufei Wang, Can Xu, Huang Hu, Chongyang Tao, Stephen Wan, Mark Dras, Mark Johnson, Daxin Jiang
Affiliations: Macquarie University, Microsoft STCA NLP, CSIRO Data
Link: https://arxiv.org/abs/2107.13077
Abstract: Sequence-to-Sequence (S2S) neural text generation models, especially the pre-trained ones (e.g., BART and T5), have exhibited compelling performance on various natural language generation tasks. However, the black-box nature of these models limits their application in tasks where specific rules (e.g., controllable constraints, prior knowledge) need to be executed. Previous works either design specific model structures (e.g., a copy mechanism corresponding to the rule "the generated output should include certain words in the source input") or implement specialized inference algorithms (e.g., constrained beam search) to execute particular rules during text generation. These methods require careful case-by-case design and make it difficult to support multiple rules concurrently. In this paper, we propose a novel module named Neural Rule-Execution Tracking Machine that can be plugged into various transformer-based generators to leverage multiple rules simultaneously, guiding the neural generation model toward superior generation performance in a unified and scalable way. Extensive experimental results on several benchmarks verify the effectiveness of our proposed model in both controllable and general text generation.
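The abstract contrasts NRETM with specialized inference algorithms such as constrained beam search. As a point of reference, here is a minimal sketch of that single-rule baseline using the Hugging Face transformers generate API, which supports lexical constraints via force_words_ids; the checkpoint, input, and constraint words are illustrative choices, and this shows the baseline the paper generalizes, not NRETM itself.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tok("summarize: The quick brown fox jumps over the lazy dog.",
             return_tensors="pt")
# The lexical-constraint rule the abstract cites: the output must contain
# these words from the source input.
force_words_ids = tok(["fox", "dog"], add_special_tokens=False).input_ids

out = model.generate(
    **inputs,
    force_words_ids=force_words_ids,
    num_beams=5,          # constrained decoding requires beam search
    max_new_tokens=32,
)
print(tok.decode(out[0], skip_special_tokens=True))
```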

BERT (1 paper)

【1】 Arabic aspect based sentiment analysis using BERT

Authors: Mohammed M. Abdelgwad
Link: https://arxiv.org/abs/2107.13290
Abstract: Aspect-based sentiment analysis (ABSA) is a textual analysis methodology that determines the polarity of opinions on certain aspects related to specific targets. The majority of research on ABSA is in English, with only a small amount of work available in Arabic. Most previous Arabic research has relied on deep learning models that depend primarily on context-independent word embeddings (e.g., word2vec), where each word has a fixed representation independent of its context. This article explores the modeling capabilities of contextual embeddings from pre-trained language models, such as BERT, together with the use of sentence-pair input, on Arabic ABSA tasks. In particular, we build a simple but effective BERT-based neural baseline to handle this task. According to experimental results on the benchmark Arabic hotel reviews dataset, our BERT architecture with a simple linear classification layer surpasses the state-of-the-art works.
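A minimal sketch of the setup the abstract describes: BERT with sentence-pair input (review, aspect) feeding a linear classification head. The Arabic checkpoint name and the three-way label set are illustrative assumptions, not details taken from the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed checkpoint: one publicly available Arabic BERT; the paper may use another.
MODEL = "asafaya/bert-base-arabic"
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL, num_labels=3  # assumed labels: negative / neutral / positive
)

review = "الغرفة نظيفة لكن الخدمة بطيئة"   # "The room is clean but the service is slow"
aspect = "الخدمة"                           # "service"

# Sentence-pair input: [CLS] review [SEP] aspect [SEP]
enc = tok(review, aspect, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**enc).logits           # linear layer over the [CLS] representation
print(logits.softmax(-1))
```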

Graph | Knowledge Graph | Knowledge (1 paper)

【1】 Growing knowledge culturally across generations to solve novel, complex tasks

Authors: Michael Henry Tessler, Pedro A. Tsividis, Jason Madeano, Brin Harper, Joshua B. Tenenbaum
Affiliations: Department of Brain and Cognitive Sciences, MIT
Note: Poster presentation at the 43rd Annual Meeting of the Cognitive Science Society
Link: https://arxiv.org/abs/2107.13377
Abstract: Knowledge built culturally across generations allows humans to learn far more than an individual could glean from their own experience in a lifetime. Cultural knowledge in turn rests on language: language is the richest record of what previous generations believed, valued, and practiced. The power and mechanisms of language as a means of cultural learning, however, are not well understood. We take a first step towards reverse-engineering cultural learning through language. We developed a suite of complex high-stakes tasks in the form of minimalist-style video games, which we deployed in an iterated learning paradigm. Game participants were limited to only two attempts (two lives) to beat each game and were allowed to write a message to a future participant who read the message before playing. Knowledge accumulated gradually across generations, allowing later generations to advance further in the games and perform more efficient actions. Multigenerational learning followed a strikingly similar trajectory to individuals learning alone with an unlimited number of lives. These results suggest that language provides a sufficient medium to express and accumulate the knowledge people acquire in these diverse tasks: the dynamics of the environment, valuable goals, dangerous risks, and strategies for success. The video game paradigm we pioneer here is thus a rich test bed for theories of cultural transmission and learning from language.

Inference | Analysis | Understanding | Explanation (1 paper)

【1】 Red Dragon AI at TextGraphs 2021 Shared Task: Multi-Hop Inference Explanation Regeneration by Matching Expert Ratings

Authors: Vivek Kalyan, Sam Witteveen, Martin Andrews
Affiliations: Red Dragon AI, Singapore
Note: Accepted paper for the TextGraphs-15 workshop at NAACL 2021 (5 pages including references)
Link: https://arxiv.org/abs/2107.13031
Abstract: Creating explanations for answers to science questions is a challenging task that requires multi-hop inference over a large set of fact sentences. This year, to refocus the TextGraphs Shared Task on the problem of gathering relevant statements (rather than solely finding a single 'correct path'), the WorldTree dataset was augmented with expert ratings of the 'relevance' of statements to each overall explanation. Our system, which achieved second place on the Shared Task leaderboard, combines initial statement retrieval; language models trained to predict the relevance scores; and ensembling of a number of the resulting rankings. Our code implementation is available at https://github.com/mdda/worldtree_corpus/tree/textgraphs_2021
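The abstract mentions ensembling a number of the resulting rankings but does not specify the combination rule. Below is a minimal sketch of one common choice, mean-rank aggregation over per-ranker relevance scores; treating the combination as mean-rank is an assumption for illustration.

```python
import numpy as np

def ensemble_rankings(score_lists):
    """score_lists: list of 1-D arrays, one relevance-score array per ranker,
    all over the same candidate fact sentences. Returns indices, best first."""
    # Rank each candidate under each ranker (0 = most relevant).
    ranks = [(-np.asarray(s)).argsort().argsort() for s in score_lists]
    mean_rank = np.mean(ranks, axis=0)
    return np.argsort(mean_rank)

retrieval_scores = [0.9, 0.2, 0.7]   # e.g., from initial statement retrieval
lm_relevance     = [0.6, 0.1, 0.8]   # e.g., from a trained relevance model
print(ensemble_rankings([retrieval_scores, lm_relevance]))  # [0 2 1]
```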

Semi/Weakly/Unsupervised | Uncertainty (1 paper)

【1】 Continual-wav2vec2: an Application of Continual Learning for Self-Supervised Automatic Speech Recognition

Authors: Samuel Kessler, Bethan Thomas, Salah Karout
Affiliations: University of Oxford; Huawei R&D Cambridge
Note: 11 pages, 9 figures including references and appendix. Accepted at the ICML 2021 Workshop: Self-Supervised Learning for Reasoning and Perception
Link: https://arxiv.org/abs/2107.13530
Abstract: We present a method for continual learning of speech representations for multiple languages using self-supervised learning (SSL) and apply these representations to automatic speech recognition. There is an abundance of unannotated speech, so creating self-supervised representations from raw audio and fine-tuning on a small annotated dataset is a promising direction for building speech recognition systems. Wav2vec models perform SSL on raw audio in a pretraining phase and then fine-tune on a small fraction of annotated data. SSL models have produced state-of-the-art results for ASR. However, these models are very expensive to pretrain with self-supervision. We tackle the problem of learning new language representations continually from audio without forgetting previous language representations. We use ideas from continual learning to transfer knowledge from a previous task to speed up pretraining on a new language. Our continual-wav2vec2 model can decrease pretraining time by 32% when learning a new language task, and learns this new audio-language representation without forgetting previous language representations.
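A minimal sketch of the continual-learning setting the abstract describes: warm-start self-supervised pretraining on a new language from an existing wav2vec 2.0 checkpoint. Freezing the convolutional feature encoder is an illustrative assumption (it is largely language-agnostic), not the paper's stated transfer mechanism.

```python
import torch
from transformers import Wav2Vec2ForPreTraining

# Start from a checkpoint pretrained on a previous language's audio.
model = Wav2Vec2ForPreTraining.from_pretrained("facebook/wav2vec2-base")

# Assumed transfer choice: freeze the low-level feature encoder and continue
# pretraining only the transformer layers on the new language's raw audio.
for p in model.wav2vec2.feature_extractor.parameters():
    p.requires_grad = False

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-5
)
# ... run the usual wav2vec 2.0 contrastive pretraining loop on the new
# language, then fine-tune for ASR on its small annotated dataset.
```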

Word2Vec | Text | Words (1 paper)

【1】 Towards Robustness Against Natural Language Word Substitutions

Authors: Xinshuai Dong, Anh Tuan Luu, Rongrong Ji, Hong Liu
Affiliations: Nanyang Technological University, Singapore; Xiamen University, China; National Institute of Informatics, Japan
Note: Conference paper, ICLR 2021
Link: https://arxiv.org/abs/2107.13541
Abstract: Robustness against word substitutions has a well-defined and widely acceptable form, i.e., using semantically similar words as substitutions, and thus it is considered a fundamental stepping stone towards broader robustness in natural language processing. Previous defense methods capture word substitutions in vector space by using either an $l_2$-ball or a hyper-rectangle, which results in perturbation sets that are not inclusive enough or unnecessarily large, and thus impedes mimicry of worst cases for robust training. In this paper, we introduce a novel Adversarial Sparse Convex Combination (ASCC) method. We model the word substitution attack space as a convex hull and leverage a regularization term to enforce perturbation towards an actual substitution, thus aligning our modeling better with the discrete textual space. Based on the ASCC method, we further propose ASCC-defense, which leverages ASCC to generate worst-case perturbations and incorporates adversarial training towards robustness. Experiments show that ASCC-defense outperforms the current state of the art in terms of robustness on two prevailing NLP tasks, i.e., sentiment analysis and natural language inference, concerning several attacks across multiple model architectures. Besides, we also envision a new class of defense towards robustness in NLP, where our robustly trained word vectors can be plugged into a normally trained model and enforce its robustness without applying any other defense techniques.
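A minimal sketch of the convex-hull modeling the abstract describes: each word's perturbed embedding is a convex combination of its substitution embeddings, with an entropy penalty pushing the weights toward a single actual substitution (sparsity). The shapes and the exact penalty form are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def ascc_perturbation(sub_embs, logits, entropy_coef=1.0):
    """sub_embs: (num_subs, dim) embeddings of the allowed substitutions.
    logits: (num_subs,) learnable coefficients for this word position."""
    w = F.softmax(logits, dim=-1)                 # convex weights: w_i >= 0, sum to 1
    x_hat = w @ sub_embs                          # a point inside the convex hull
    entropy = -(w * torch.log(w + 1e-12)).sum()   # low entropy => near-one-hot (sparse)
    return x_hat, entropy_coef * entropy

sub_embs = torch.randn(5, 300)                    # e.g., 5 synonyms, 300-d vectors
logits = torch.zeros(5, requires_grad=True)       # optimized by the inner attack loop
x_hat, reg = ascc_perturbation(sub_embs, logits)  # maximize loss(x_hat) + reg w.r.t. logits
```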

Other Neural Networks | Deep Learning | Models | Modeling (2 papers)

【1】 Multi-Scale Feature and Metric Learning for Relation Extraction

Authors: Mi Zhang, Tieyun Qian
Affiliations: Wuhan University
Link: https://arxiv.org/abs/2107.13425
Abstract: Existing methods in relation extraction have leveraged the lexical features in the word sequence and the syntactic features in the parse tree. Though effective, the lexical features extracted from the successive word sequence may introduce some noise that has little or no meaningful content. Meanwhile, the syntactic features are usually encoded via graph convolutional networks which have a restricted receptive field. To address the above limitations, we propose a multi-scale feature and metric learning framework for relation extraction. Specifically, we first develop a multi-scale convolutional neural network to aggregate the non-successive mainstays in the lexical sequence. We also design a multi-scale graph convolutional network which can increase the receptive field towards specific syntactic roles. Moreover, we present a multi-scale metric learning paradigm to exploit both the feature-level relation between lexical and syntactic features and the sample-level relation between instances with the same or different classes. We conduct extensive experiments on three real-world datasets for various types of relation extraction tasks. The results demonstrate that our model significantly outperforms the state-of-the-art approaches.
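A minimal sketch of the multi-scale convolutional idea: parallel 1-D convolutions with different kernel widths over the word-embedding sequence, so lexical patterns of several span lengths are captured. Kernel sizes, dimensions, and max-pooling are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultiScaleCNN(nn.Module):
    def __init__(self, emb_dim=300, channels=100, kernel_sizes=(2, 3, 5)):
        super().__init__()
        # One conv branch per scale (kernel width).
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, channels, k, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x):                 # x: (batch, seq_len, emb_dim)
        x = x.transpose(1, 2)             # -> (batch, emb_dim, seq_len)
        # Max-pool each scale over time, then concatenate the scales.
        feats = [conv(x).max(dim=2).values for conv in self.convs]
        return torch.cat(feats, dim=1)    # (batch, channels * num_scales)

out = MultiScaleCNN()(torch.randn(4, 40, 300))
print(out.shape)                           # torch.Size([4, 300])
```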

【2】 Exceeding the Limits of Visual-Linguistic Multi-Task Learning

Authors: Cameron R. Wolfe, Keld T. Lundgaard
Affiliations: Rice University, Houston, TX, USA; Salesforce Einstein, Cambridge, MA, USA
Note: 10 pages, 7 figures
Link: https://arxiv.org/abs/2107.13054
Abstract: By leveraging large amounts of product data collected across hundreds of live e-commerce websites, we construct 1000 unique classification tasks that share similarly-structured input data, comprised of both text and images. These classification tasks focus on learning the product hierarchy of different e-commerce websites, causing many of them to be correlated. Adopting a multi-modal transformer model, we solve these tasks in unison using multi-task learning (MTL). Extensive experiments are presented over an initial 100-task dataset to reveal best practices for "large-scale MTL" (i.e., MTL with more than 100 tasks). From these experiments, a final, unified methodology is derived, which is composed of both best practices and new proposals such as DyPa, a simple heuristic for automatically allocating task-specific parameters to tasks that could benefit from extra capacity. Using our large-scale MTL methodology, we successfully train a single model across all 1000 tasks in our dataset while using minimal task-specific parameters, thereby showing that it is possible to extend several orders of magnitude beyond current efforts in MTL.
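A sketch of what a DyPa-style heuristic could look like, given only the abstract's description (allocate task-specific parameters to tasks "that could benefit from extra capacity"): flag tasks whose shared-model validation loss has plateaued while still high, and give those tasks their own adapter parameters. The plateau test, thresholds, and task names are all assumptions for illustration.

```python
def allocate_task_specific(val_loss_history, patience=3, min_improve=0.01):
    """val_loss_history: dict task_id -> list of per-epoch validation losses.
    Returns the task_ids that should receive their own adapter parameters."""
    needy = []
    for task, losses in val_loss_history.items():
        if len(losses) > patience:
            recent = losses[-patience:]
            # Loss has stopped improving while remaining relatively high:
            # the shared capacity appears insufficient for this task.
            if max(recent) - min(recent) < min_improve and recent[-1] > 0.5:
                needy.append(task)
    return needy

history = {
    "shoes_site": [1.2, 0.9, 0.87, 0.87, 0.87],   # plateaued high -> needs capacity
    "books_site": [1.0, 0.6, 0.3, 0.2],           # still improving
}
print(allocate_task_specific(history))            # ['shoes_site']
```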

Other (2 papers)

【1】 Goal-Oriented Script Construction

Authors: Qing Lyu, Li Zhang, Chris Callison-Burch
Affiliations: University of Pennsylvania
Note: To be published in INLG 2021 (14th International Conference on Natural Language Generation)
Link: https://arxiv.org/abs/2107.13189
Abstract: The knowledge of scripts, common chains of events in stereotypical scenarios, is a valuable asset for task-oriented natural language understanding systems. We propose the Goal-Oriented Script Construction task, where a model produces a sequence of steps to accomplish a given goal. We pilot our task on the first multilingual script learning dataset supporting 18 languages, collected from wikiHow, a website containing half a million how-to articles. For baselines, we consider both a generation-based approach using a language model and a retrieval-based approach that first retrieves the relevant steps from a large candidate pool and then orders them. We show that our task is practical, feasible, but challenging for state-of-the-art Transformer models, and that our methods can be readily deployed for various other datasets and domains with decent zero-shot performance.
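A minimal sketch of the retrieval half of the retrieval-based baseline: embed the goal and a pool of candidate steps, then retrieve the most relevant ones. The multilingual model name is an illustrative choice, and the naive similarity ranking stands in for the paper's separate step-ordering stage.

```python
from sentence_transformers import SentenceTransformer, util

# Assumed model: any multilingual sentence encoder would do here.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

goal = "How to pitch a tent"
candidate_pool = [
    "Lay the tent footprint on flat ground.",
    "Preheat the oven to 180C.",
    "Assemble the poles and thread them through the sleeves.",
    "Stake down the corners of the tent.",
]

goal_emb = model.encode(goal, convert_to_tensor=True)
step_embs = model.encode(candidate_pool, convert_to_tensor=True)
scores = util.cos_sim(goal_emb, step_embs)[0]
top = scores.argsort(descending=True)[:3]   # retrieve the 3 most relevant steps
for i in top:                               # ordering them is a separate stage
    print(candidate_pool[i])
```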

【2】 Towards Emotion-Aware Agents For Negotiation Dialogues

Authors: Kushal Chawla, Rene Clever, Jaysa Ramirez, Gale Lucas, Jonathan Gratch
Affiliations: University of Southern California, Los Angeles, USA; CUNY Lehman College, Bronx, USA; Rollins College, Winter Park, USA
Note: Accepted at the 9th International Conference on Affective Computing & Intelligent Interaction (ACII 2021)
Link: https://arxiv.org/abs/2107.13165
Abstract: Negotiation is a complex social interaction that encapsulates emotional encounters in human decision-making. Virtual agents that can negotiate with humans are useful in pedagogy and conversational AI. To advance the development of such agents, we explore the prediction of two important subjective goals in a negotiation - outcome satisfaction and partner perception. Specifically, we analyze the extent to which emotion attributes extracted from the negotiation help in the prediction, above and beyond the individual difference variables. We focus on a recent dataset of chat-based negotiations, grounded in a realistic camping scenario. We study three degrees of emotion dimensions - emoticons, lexical, and contextual - by leveraging affective lexicons and a state-of-the-art deep learning architecture. Our insights will be helpful in designing adaptive negotiation agents that interact through realistic communication interfaces.
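A minimal sketch of the lexical level of the emotion features: count dialogue tokens matching an affective lexicon and regress outcome satisfaction on the counts. The tiny lexicon, toy ratings, and linear regressor are illustrative stand-ins for the affective lexicons and annotated negotiation data the paper uses.

```python
import re
from sklearn.linear_model import LinearRegression

# Hypothetical two-category lexicon; real affective lexicons are far larger.
LEXICON = {"joy": {"great", "happy", "glad"}, "anger": {"unfair", "annoyed"}}

def emotion_counts(utterances):
    """One count per lexicon category over a speaker's utterances."""
    tokens = re.findall(r"[a-z']+", " ".join(utterances).lower())
    return [sum(t in words for t in tokens) for words in LEXICON.values()]

dialogues = [
    ["I'd be happy with two firewood.", "Great, deal!"],
    ["That split is unfair.", "I'm annoyed we can't agree."],
]
X = [emotion_counts(d) for d in dialogues]
y = [4.5, 2.0]                        # toy outcome-satisfaction ratings
reg = LinearRegression().fit(X, y)
print(reg.predict(X))
```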
