Natural Language Processing arXiv Digest [7.19]

2021-07-27 11:05:54

Visit www.arxivdaily.com for digests with abstracts, covering CS, Physics, Mathematics, Economics, Statistics, Finance, Biology, and Electrical Engineering, plus search, bookmarking, and posting features!

cs.CL: 16 papers today

QA | VQA | Question Answering | Dialogue (1 paper)

【1】 Exploiting Rich Syntax for Better Knowledge Base Question Answering

Authors: Pengju Zhang, Yonghui Jia, Muhua Zhu, Wenliang Chen, Min Zhang Affiliations: Institute of Artificial Intelligence, School of Computer Science and Technology, Soochow University, China; Tencent, China Link: https://arxiv.org/abs/2107.07940 Abstract: Recent studies on Knowledge Base Question Answering (KBQA) have shown great progress on this task via better question understanding. Previous works for encoding questions mainly focus on the word sequences, but seldom consider the information from syntactic trees. In this paper, we propose an approach to learn syntax-based representations for KBQA. First, we encode path-based syntax by considering the shortest dependency paths between keywords. Then, we propose two encoding strategies to model the information of whole syntactic trees to obtain tree-based syntax. Finally, we combine both path-based and tree-based syntax representations for KBQA. We conduct extensive experiments on a widely used benchmark dataset, and the experimental results show that our syntax-aware systems can make full use of syntax information in different settings and achieve state-of-the-art performance on KBQA.
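
The path-based syntax signal above can be approximated with off-the-shelf tools. Below is a minimal sketch of extracting the shortest dependency path between two keywords using spaCy and networkx; the tooling and the downstream encoder are assumptions, as the abstract does not specify the implementation.

```python
# Hedged sketch: shortest dependency path between two keywords.
# spaCy/networkx are illustrative choices, not the paper's actual stack.
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")

def shortest_dependency_path(sentence, kw1, kw2):
    doc = nlp(sentence)
    # Treat the dependency tree as an undirected graph over token indices.
    graph = nx.Graph((tok.i, child.i) for tok in doc for child in tok.children)
    src = next(tok.i for tok in doc if tok.text == kw1)
    dst = next(tok.i for tok in doc if tok.text == kw2)
    return [doc[i].text for i in nx.shortest_path(graph, src, dst)]

print(shortest_dependency_path(
    "Who directed the film that won the award?", "directed", "award"))
```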

Semantic Analysis (1 paper)

【1】 POS tagging, lemmatization and dependency parsing of West Frisian

Authors: Wilbert Heeringa, Gosse Bouma, Martha Hofman, Eduard Drenth, Jan Wijffels, Hans Van de Velde Affiliations: Fryske Akademy, University of Groningen, BNOSAC, Utrecht University; Leeuwarden, Groningen, Brussels, Utrecht Comments: 6 pages, 2 figures, 6 tables Link: https://arxiv.org/abs/2107.07974 Abstract: We present a lemmatizer/POS tagger/dependency parser for West Frisian using a corpus of 44,714 words in 3,126 sentences that were annotated according to the guidelines of Universal Dependencies version 2. POS tags were assigned to words by a Dutch POS tagger that was applied to a literal word-by-word translation, or to sentences of a Dutch parallel text. Best results were obtained when using literal translations created with the Frisian translation program Oersetter. Morphological and syntactic annotations were likewise generated on the basis of a literal Dutch translation. The performance of the lemmatizer/tagger/annotator when trained with default parameters was compared to the performance obtained with the parameter values used for training the LassySmall UD 2.5 corpus. A significant improvement was found for 'lemma'. The Frisian lemmatizer/POS tagger/dependency parser is released as a web app and as a web service.

Graph | Knowledge Graph | Knowledge (1 paper)

【1】 Know Deeper: Knowledge-Conversation Cyclic Utilization Mechanism for Open-domain Dialogue Generation

Authors: Yajing Sun, Yue Hu, Luxi Xing, Yuqiang Xie, Xiangpeng Wei Affiliations: Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China Link: https://arxiv.org/abs/2107.07771 Abstract: End-to-end intelligent neural dialogue systems suffer from generating inconsistent and repetitive responses. Existing dialogue models focus on unilaterally incorporating personal knowledge into the dialogue, while ignoring the fact that incorporating personality-related conversation information into personal knowledge, as a bilateral information flow, boosts the quality of the subsequent conversation. Besides, it is indispensable to control personal knowledge utilization at the conversation level. In this paper, we propose a conversation-adaptive multi-view persona-aware response generation model that aims at enhancing conversation consistency and alleviating repetition in two respects. First, we consider conversation consistency from multiple views. From the view of the persona profile, we design a novel interaction module that not only iteratively incorporates personalized knowledge into the conversation at each turn, but also captures personality-related information from the conversation to enhance the semantic representation of personalized knowledge. From the view of speaking style, we introduce a speaking style vector and feed it into the decoder to keep the speaking style consistent. To avoid conversation repetition, we devise a coverage mechanism to keep track of the activation of personal knowledge utilization. Experiments with both automatic and human evaluation verify the superiority of our model over previous models.
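
The coverage mechanism mentioned above can be sketched as an accumulator over knowledge-attention weights. The update and penalty below follow the coverage models used in summarization; the paper's exact formulation is not given in the abstract, so this is an assumption.

```python
# Hedged sketch: track how much each persona-knowledge sentence has been used
# across turns, and penalize attending again to already-covered knowledge.
import torch

def coverage_penalty(coverage, attn):
    # Both tensors: (num_knowledge_sentences,). Overlap between past coverage
    # and current attention is penalized, as in coverage-based summarization.
    return torch.sum(torch.minimum(coverage, attn))

coverage = torch.zeros(4)
for attn in [torch.tensor([0.7, 0.1, 0.1, 0.1]),
             torch.tensor([0.6, 0.2, 0.1, 0.1])]:
    print("penalty:", float(coverage_penalty(coverage, attn)))
    coverage = coverage + attn  # accumulate knowledge utilization
```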

GAN | Adversarial | Attacks | Generation (3 papers)

【1】 How Vulnerable Are Automatic Fake News Detection Methods to Adversarial Attacks?

Authors: Camille Koenders, Johannes Filla, Nicolai Schneider, Vinicius Woloszyn Affiliation: Technische Universität Berlin Comments: 9 pages, Github: this https URL Link: https://arxiv.org/abs/2107.07970 Abstract: As the spread of false information on the internet has increased dramatically in recent years, more and more attention is being paid to automated fake news detection. Some fake news detection methods are already quite successful. Nevertheless, there are still many vulnerabilities in the detection algorithms, because fake news publishers can structure and formulate their texts in such a way that a detection algorithm does not expose the text as fake news. This paper shows that it is possible to automatically attack state-of-the-art models that have been trained to detect fake news, making them vulnerable. For this purpose, corresponding models were first trained on a dataset. Then, using TextAttack, an attempt was made to manipulate the trained models in such a way that previously correctly identified fake news was classified as true news. The results show that it is possible to automatically bypass fake news detection mechanisms, with implications for existing policy initiatives.
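
The attack setup can be reproduced with the TextAttack library the authors mention. A minimal sketch, assuming a fine-tuned HuggingFace classifier at a placeholder path; TextFooler is used here as one representative word-substitution recipe, since the abstract does not name the exact attack.

```python
# Hedged sketch: attacking a trained fake-news classifier with TextAttack.
import transformers
from textattack import Attacker
from textattack.attack_recipes import TextFoolerJin2019
from textattack.datasets import Dataset
from textattack.models.wrappers import HuggingFaceModelWrapper

path = "path/to/fake-news-model"  # placeholder for the trained detector
model = transformers.AutoModelForSequenceClassification.from_pretrained(path)
tokenizer = transformers.AutoTokenizer.from_pretrained(path)
wrapper = HuggingFaceModelWrapper(model, tokenizer)

attack = TextFoolerJin2019.build(wrapper)
dataset = Dataset([("Some dubious article text ...", 1)])  # label 1 = fake
Attacker(attack, dataset).attack_dataset()
```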

【2】 Self-Supervised Contrastive Learning with Adversarial Perturbations for Robust Pretrained Language Models

Authors: Zhao Meng, Yihan Dong, Mrinmaya Sachan, Roger Wattenhofer Affiliation: ETH Zurich, Switzerland Comments: Work in progress Link: https://arxiv.org/abs/2107.07610 Abstract: This paper improves the robustness of the pretrained language model BERT against word substitution-based adversarial attacks by leveraging self-supervised contrastive learning with adversarial perturbations. One advantage of our method over previous works is that it improves model robustness without using any labels. Additionally, we create an adversarial attack for word-level adversarial training on BERT. The attack is efficient, allowing adversarial training for BERT on adversarial examples generated on the fly during training. Experimental results on four datasets show that our method improves the robustness of BERT against four different word substitution-based adversarial attacks. Furthermore, to understand why our method can improve model robustness against adversarial attacks, we study the vector representations of clean examples and their corresponding adversarial examples before and after applying our method. As our method improves model robustness with unlabeled raw data, it opens up the possibility of using large text datasets to train robust language models.
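
The contrastive objective can be sketched as an InfoNCE loss between clean sentences and their adversarially perturbed counterparts. The pooling and temperature below are assumptions; only the overall clean-vs-adversarial pairing is taken from the abstract.

```python
# Hedged sketch: InfoNCE over (clean, adversarial) BERT sentence embeddings.
# Diagonal pairs are positives; all other in-batch pairs are negatives.
import torch
import torch.nn.functional as F

def contrastive_loss(clean_emb, adv_emb, temperature=0.1):
    clean = F.normalize(clean_emb, dim=-1)   # (batch, hidden)
    adv = F.normalize(adv_emb, dim=-1)       # (batch, hidden)
    logits = clean @ adv.t() / temperature   # (batch, batch) cosine similarities
    labels = torch.arange(clean.size(0))     # positives on the diagonal
    return F.cross_entropy(logits, labels)

loss = contrastive_loss(torch.randn(8, 768), torch.randn(8, 768))
```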

【3】 Internet-Augmented Dialogue Generation

Authors: Mojtaba Komeili, Kurt Shuster, Jason Weston Affiliation: Facebook AI Research Link: https://arxiv.org/abs/2107.07566 Abstract: The largest store of continually updating knowledge on our planet can be accessed via internet search. In this work we study giving conversational agents access to this information. Large language models, even though they store an impressive amount of knowledge within their weights, are known to hallucinate facts when generating dialogue (Shuster et al., 2021); moreover, those facts are frozen in time at the point of model training. In contrast, we propose an approach that learns to generate an internet search query based on the context, and then conditions on the search results to generate the final response, a method that can employ up-to-the-minute relevant information. We train and evaluate such models on a newly collected dataset of human-human conversations in which one of the speakers is given access to internet search during knowledge-driven discussions in order to ground their responses. We find that search-query-based access to the internet in conversation provides superior performance compared to existing approaches that use either no augmentation or FAISS-based retrieval (Lewis et al., 2020).
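
The two-stage pipeline (query generation, then response generation conditioned on results) can be sketched as below. "facebook/bart-base" stands in for the paper's trained seq2seq models, and `search_web` is a stub for a real search API; both are assumptions.

```python
# Hedged sketch: search-query generation followed by search-conditioned response.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("facebook/bart-base")   # placeholder model
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

def seq2seq(text):
    ids = tok(text, return_tensors="pt", truncation=True).input_ids
    out = model.generate(ids, max_new_tokens=32)
    return tok.decode(out[0], skip_special_tokens=True)

def search_web(query):
    return ["(stub) top snippet for: " + query]  # replace with a search API call

context = "I love the Arctic Monkeys. Have they released anything recently?"
query = seq2seq("generate query: " + context)                  # stage 1
docs = " ".join(search_web(query))
print(seq2seq("context: " + context + " documents: " + docs))  # stage 2
```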

Detection (1 paper)

【1】 Pseudo-labelling Enhanced Media Bias Detection

Authors: Qin Ruan, Brian Mac Namee, Ruihai Dong Affiliations: School of Computer Science, University College Dublin, Dublin, Ireland; Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland Link: https://arxiv.org/abs/2107.07705 Abstract: Leveraging unlabelled data through weak or distant supervision is a compelling approach to developing more effective text classification models. This paper proposes a simple but effective data augmentation method, which leverages the idea of pseudo-labelling to select samples from noisy distant-supervision annotation datasets. The results show that the proposed method improves the accuracy of biased news detection models.
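
A minimal sketch of the pseudo-labelling selection step: keep a distantly annotated sample only when the current model's prediction agrees with the distant label with high confidence. The agreement-plus-threshold rule is an assumption about the selection criterion.

```python
# Hedged sketch: confidence-based filtering of distant-supervision samples.
import numpy as np

def select_pseudo_labelled(probs, distant_labels, tau=0.9):
    # probs: (n, num_classes) model predictions; distant_labels: (n,)
    pred = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    keep = (pred == distant_labels) & (conf >= tau)
    return np.flatnonzero(keep)          # indices of samples worth keeping

probs = np.array([[0.95, 0.05], [0.55, 0.45], [0.10, 0.90]])
print(select_pseudo_labelled(probs, np.array([0, 0, 1])))  # -> [0 2]
```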

Recognition / Classification (1 paper)

【1】 The Application of Active Query K-Means in Text Classification

Authors: Yukun Jiang Affiliation: Department of Computer Science, New York University, New York, United States Link: https://arxiv.org/abs/2107.07682 Abstract: Active learning is a state-of-the-art machine learning approach to dealing with an abundance of unlabeled data. In the field of Natural Language Processing, it is typically costly and time-consuming to have all the data annotated. This inefficiency inspires our application of active learning to text classification. In this research, traditional unsupervised k-means clustering is first modified into a semi-supervised version. Then, the algorithm is further extended to the active learning scenario with Penalized Min-Max selection, so as to make limited queries that yield more stable initial centroids. This method utilizes both the interactive query results from users and the underlying distance representation. When tested on a Chinese news dataset, it shows a consistent increase in accuracy while lowering the training cost.
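
The seeding step can be sketched as farthest-point (Min-Max) selection with a penalty term that discourages picking outliers as centroids. The penalty form below is an assumption; the paper's exact Penalized Min-Max selection may differ.

```python
# Hedged sketch: penalized Min-Max selection of initial centroids.
import numpy as np

def penalized_min_max_seeds(X, k, lam=0.5):
    seeds = [0]  # start from an arbitrary point
    for _ in range(k - 1):
        # Distance from every point to its nearest already-chosen seed.
        d = np.min(np.linalg.norm(X[:, None] - X[seeds], axis=2), axis=1)
        # Assumed penalty: distance from the data centroid flags outliers.
        penalty = lam * np.linalg.norm(X - X.mean(axis=0), axis=1)
        seeds.append(int(np.argmax(d - penalty)))
    return seeds

X = np.random.rand(100, 2)
print(penalized_min_max_seeds(X, k=3))  # candidate points to query the user on
```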

Representation Learning (1 paper)

【1】 Temporal-aware Language Representation Learning From Crowdsourced Labels

Authors: Yang Hao, Xiao Zhai, Wenbiao Ding, Zitao Liu Affiliation: TAL Education Group, Beijing, China Comments: The 59th Annual Meeting of the Association for Computational Linguistics Workshop on Representation Learning for NLP (ACL RepL4NLP 2021) Link: https://arxiv.org/abs/2107.07958 Abstract: Learning effective language representations from crowdsourced labels is crucial for many real-world machine learning tasks. A challenging aspect of this problem is that the quality of crowdsourced labels suffers from high intra- and inter-observer variability. Since high-capacity deep neural networks can easily memorize all disagreements among crowdsourced labels, directly applying existing supervised language representation learning algorithms may yield suboptimal solutions. In this paper, we propose TACMA, a temporal-aware language representation learning heuristic for crowdsourced labels with multiple annotators. The proposed approach (1) explicitly models the intra-observer variability with an attention mechanism; (2) computes and aggregates per-sample confidence scores from multiple workers to address inter-observer disagreements. The proposed heuristic is extremely easy to implement, in around 5 lines of code, and is evaluated on four synthetic and four real-world data sets. The results show that our approach outperforms a wide range of state-of-the-art baselines in terms of prediction accuracy and AUC. To encourage reproducible results, we make our code publicly available at https://github.com/CrowdsourcingMining/TACMA.
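
The inter-observer aggregation step can be sketched as a per-sample confidence computed from annotator agreement and used to weight the loss. The majority-agreement form is an assumption; TACMA's exact scoring is in the linked code.

```python
# Hedged sketch: aggregate multiple workers' labels into per-sample confidence.
import torch
import torch.nn.functional as F

def sample_confidence(worker_labels):
    # worker_labels: (batch, num_workers) binary annotations.
    agree = worker_labels.float().mean(dim=1)   # fraction voting positive
    return torch.maximum(agree, 1.0 - agree)    # majority agreement ratio

def weighted_bce(logits, majority, confidence):
    per_sample = F.binary_cross_entropy_with_logits(
        logits, majority, reduction="none")
    return (confidence * per_sample).mean()     # low-agreement samples count less

labels = torch.tensor([[1, 1, 0], [1, 0, 0]])
print(sample_confidence(labels))  # tensor([0.6667, 0.6667])
```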

Other: Neural Networks | Deep Learning | Models | Modeling (5 papers)

【1】 A Multimodal Machine Learning Framework for Teacher Vocal Delivery Evaluation

Authors: Hang Li, Yu Kang, Yang Hao, Wenbiao Ding, Zhongqin Wu, Zitao Liu Affiliation: TAL Education Group, Beijing, China Comments: AIED'21: The 22nd International Conference on Artificial Intelligence in Education, 2021 Link: https://arxiv.org/abs/2107.07956 Abstract: The quality of vocal delivery is one of the key indicators for evaluating teacher enthusiasm, which has been widely accepted to be connected to overall course quality. However, existing evaluation of vocal delivery is mainly conducted with manual ratings, which face two core challenges: subjectivity and time cost. In this paper, we present a novel machine learning approach that utilizes pairwise comparisons and a multimodal orthogonal fusing algorithm to generate large-scale objective evaluations of teacher vocal delivery in terms of fluency and passion. We collect two datasets from real-world education scenarios, and the experiment results demonstrate the effectiveness of our algorithm. To encourage reproducible results, we make our code publicly available at https://github.com/tal-ai/ML4VocalDelivery.git.
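
The pairwise-comparison stage can be sketched with a simple Bradley-Terry fit that converts "which clip sounds more fluent/passionate?" judgments into scalar scores. This is a stand-in for the paper's scoring step; the multimodal orthogonal fusing algorithm is not reproduced here.

```python
# Hedged sketch: Bradley-Terry scores from pairwise comparison counts.
import numpy as np

def bradley_terry(wins, iters=100):
    # wins[i, j] = number of comparisons in which clip i beat clip j.
    n = wins.shape[0]
    s = np.ones(n)
    for _ in range(iters):  # standard minorization-maximization updates
        for i in range(n):
            den = ((wins[i] + wins[:, i]) / (s[i] + s)).sum()
            s[i] = wins[i].sum() / den
        s /= s.sum()
    return s

wins = np.array([[0, 8, 6], [2, 0, 5], [4, 5, 0]])
print(bradley_terry(wins))  # higher score = stronger delivery on the trait
```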

【2】 Are Multilingual Models the Best Choice for Moderately Under-resourced Languages? A Comprehensive Assessment for Catalan

Authors: Jordi Armengol-Estapé, Casimiro Pio Carrino, Carlos Rodriguez-Penagos, Ona de Gibert Bonet, Carme Armentano-Oller, Aitor Gonzalez-Agirre, Maite Melero, Marta Villegas Affiliation: Barcelona Supercomputing Center, Barcelona, Spain Comments: Accepted into Findings of ACL-IJCNLP 2021 Link: https://arxiv.org/abs/2107.07903 Abstract: Multilingual language models have been a crucial breakthrough as they considerably reduce the need of data for under-resourced languages. Nevertheless, the superiority of language-specific models has already been proven for languages having access to large amounts of data. In this work, we focus on Catalan with the aim of exploring to what extent a medium-sized monolingual language model is competitive with state-of-the-art large multilingual models. For this, we: (1) build a clean, high-quality textual Catalan corpus (CaText), the largest to date (but only a fraction of the size usual in previous work on monolingual language models), (2) train a Transformer-based language model for Catalan (BERTa), and (3) devise a thorough evaluation in a diversity of settings, comprising a complete array of downstream tasks, namely Part-of-Speech Tagging, Named Entity Recognition and Classification, Text Classification, Question Answering, and Semantic Textual Similarity, with most of the corresponding datasets created ex novo. The result is a new benchmark, the Catalan Language Understanding Benchmark (CLUB), which we publish as an open resource, together with the clean textual corpus, the language model, and the cleaning pipeline. Using state-of-the-art multilingual models and a monolingual model trained only on Wikipedia as baselines, we consistently observe the superiority of our model across tasks and settings.

【3】 Intersectional Bias in Causal Language Models

Authors: Liam Magee, Lida Ghahremanlou, Karen Soldatic, Shanthi Robertson Affiliations: Western Sydney University, Australia; Microsoft, United Kingdom Comments: 18 pages, 4 figures Link: https://arxiv.org/abs/2107.07691 Abstract: To examine whether intersectional bias can be observed in language generation, we examine GPT-2 and GPT-NEO models, ranging in size from 124 million to ~2.7 billion parameters. We conduct an experiment combining up to three social categories - gender, religion and disability - into unconditional or zero-shot prompts used to generate sentences that are then analysed for sentiment. Our results confirm earlier tests conducted with auto-regressive causal models, including the GPT family of models. We also illustrate why bias may be resistant to techniques that target single categories (e.g. gender, religion and race), as it can also manifest, in often subtle ways, in texts prompted by concatenated social categories. To address these difficulties, we suggest that technical and community-based approaches need to be combined to acknowledge and address complex and intersectional language model bias.
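
The probing setup maps directly onto standard generation and sentiment pipelines; a minimal sketch follows, with "gpt2" standing in for the GPT-2/GPT-Neo variants examined in the paper and the default sentiment model as an assumption.

```python
# Hedged sketch: generate from concatenated-category prompts, score sentiment.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
sentiment = pipeline("sentiment-analysis")  # default model, illustrative only

for prompt in ["The deaf Muslim woman", "The blind Christian man"]:
    text = generator(prompt, max_new_tokens=30,
                     num_return_sequences=1)[0]["generated_text"]
    print(prompt, "->", sentiment(text[:512])[0])
```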

【4】 TAPEX: Table Pre-training via Learning a Neural SQL Executor

Authors: Qian Liu, Bei Chen, Jiaqi Guo, Zeqi Lin, Jian-guang Lou Affiliations: Beihang University, Beijing, China; Microsoft Research, Beijing, China; Xi'an Jiaotong University, Xi'an, China Comments: Work in progress, the project homepage is at this https URL Link: https://arxiv.org/abs/2107.07653 Abstract: In recent years, pre-trained language models have achieved success in modeling natural language sentences and (semi-)structured tables. However, existing table pre-training techniques always suffer from low data quality and low pre-training efficiency. In this paper, we show that table pre-training can be realized by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries. By pre-training on the synthetic corpus, our approach TAPEX dramatically improves performance on downstream tasks, boosting existing language models by at most 19.5%. Meanwhile, TAPEX has remarkably high pre-training efficiency and yields strong results when using a small pre-training corpus. Experimental results demonstrate that TAPEX outperforms previous table pre-training approaches by a large margin, and our model achieves new state-of-the-art results on four well-known datasets, including improving the WikiSQL denotation accuracy to 89.6% (+4.9%), the WikiTableQuestions denotation accuracy to 57.5% (+4.8%), the SQA denotation accuracy to 74.5% (+3.5%), and the TabFact accuracy to 84.6% (+3.6%). Our work opens the way to reasoning over structured data by pre-training on synthetic executable programs.
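
The synthesis step behind TAPEX can be sketched with sqlite3: execute a synthesized SQL query over a table and keep the (query, table, answer) triple as a pre-training example for the neural executor. The serialization format below is an assumption.

```python
# Hedged sketch: building one (SQL query, table, answer) pre-training triple.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, population INTEGER)")
conn.executemany("INSERT INTO city VALUES (?, ?)",
                 [("Beijing", 21_540_000), ("Xi'an", 12_000_000)])

query = "SELECT name FROM city WHERE population > 15000000"  # synthesized query
answer = conn.execute(query).fetchall()                      # executed answer
example = {"input": query + " | city : name, population",    # assumed format
           "target": str(answer)}
print(example)
```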

【5】 Multi-task Learning with Cross Attention for Keyword Spotting

Authors: Takuya Higuchi, Anmol Gupta, Chandra Dhir Affiliations: Apple; Department of Computer Science, The University of Hong Kong Comments: Submitted to ASRU 2021 Link: https://arxiv.org/abs/2107.07634 Abstract: Keyword spotting (KWS) is an important technique for speech applications, which enables users to activate devices by speaking a keyword phrase. Although a phoneme classifier can be used for KWS, exploiting a large amount of transcribed data for automatic speech recognition (ASR), there is a mismatch between the training criterion (phoneme recognition) and the target task (KWS). Recently, multi-task learning has been applied to KWS to exploit both ASR and KWS training data. In this approach, the output of an acoustic model is split into two branches for the two tasks, one for phoneme transcription trained with the ASR data and one for keyword classification trained with the KWS data. In this paper, we introduce a cross attention decoder in the multi-task learning framework. Unlike the conventional multi-task learning approach with a simple split of the output layer, the cross attention decoder summarizes information from a phonetic encoder by performing cross attention between the encoder outputs and a trainable query sequence to predict a confidence score for the KWS task. Experimental results on KWS tasks show that the proposed approach outperforms conventional multi-task learning with split branches and a bi-directional long short-term memory decoder by 12% on average.
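
The cross attention decoder can be sketched in a few lines of PyTorch: a trainable query sequence cross-attends over the phonetic encoder outputs, and the pooled result is mapped to a keyword confidence. Dimensions and pooling are illustrative assumptions.

```python
# Hedged sketch: trainable queries cross-attend over encoder outputs for KWS.
import torch
import torch.nn as nn

class CrossAttentionKWS(nn.Module):
    def __init__(self, d_model=256, n_queries=4, n_heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, 1)

    def forward(self, enc_out):  # enc_out: (batch, time, d_model)
        q = self.queries.unsqueeze(0).expand(enc_out.size(0), -1, -1)
        ctx, _ = self.attn(q, enc_out, enc_out)  # queries attend over encoder
        return torch.sigmoid(self.head(ctx.mean(dim=1)))  # keyword confidence

score = CrossAttentionKWS()(torch.randn(2, 100, 256))
print(score.shape)  # torch.Size([2, 1])
```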

Other (2 papers)

【1】 Automatic Task Requirements Writing Evaluation via Machine Reading Comprehension

Authors: Shiting Xu, Guowei Xu, Peilei Jia, Wenbiao Ding, Zhongqin Wu, Zitao Liu Affiliation: TAL Education Group, Beijing, China Comments: AIED'21: The 22nd International Conference on Artificial Intelligence in Education, 2021 Link: https://arxiv.org/abs/2107.07957 Abstract: Task requirements (TRs) writing is an important question type in the Key English Test and the Preliminary English Test. A TR writing question may include multiple requirements, and a high-quality essay must respond to each requirement thoroughly and accurately. However, limited teacher resources prevent students from getting detailed grading instantly. The majority of existing automatic essay scoring systems focus on giving a holistic score but rarely provide reasons to support it. In this paper, we propose an end-to-end framework based on machine reading comprehension (MRC) to address this problem to some extent. The framework not only detects whether an essay responds to a requirement question, but also clearly marks where the essay answers the question. Our framework consists of three modules: a question normalization module, an ELECTRA-based MRC module, and a response locating module. We extensively explore state-of-the-art MRC methods. Our approach achieves a 0.93 accuracy score and a 0.85 F1 score on a real-world educational dataset. To encourage reproducible results, we make our code publicly available at https://github.com/aied2021TRMRC/AIED_2021_TRMRC_code.
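
The MRC module's role can be sketched with an extractive QA pipeline: the model marks where the essay answers a normalized requirement question, and a low score suggests the requirement went unanswered. "deepset/electra-base-squad2" is a stand-in for the paper's ELECTRA-based module.

```python
# Hedged sketch: locating an essay's response to a requirement question.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/electra-base-squad2")
result = qa(question="Where will you meet your friend?",
            context="Dear Jane, let's meet at the school gate at 3 pm Friday.")
print(result["answer"], result["score"])  # span + confidence for the requirement
```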

【2】 Beyond Goldfish Memory: Long-Term Open-Domain Conversation

Authors: Jing Xu, Arthur Szlam, Jason Weston Affiliation: Facebook AI Research Link: https://arxiv.org/abs/2107.07567 Abstract: Despite recent improvements in open-domain dialogue models, state-of-the-art models are trained and evaluated on short conversations with little context. In contrast, the long-term conversation setting has hardly been studied. In this work we collect and release a human-human dataset consisting of multiple chat sessions whereby the speaking partners learn about each other's interests and discuss the things they have learnt from past sessions. We show how existing models trained on existing datasets perform poorly in this long-term conversation setting in both automatic and human evaluations, and we study long-context models that can perform much better. In particular, we find that retrieval-augmented methods and methods with the ability to summarize and recall previous conversations outperform the standard encoder-decoder architectures currently considered state of the art.
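
The retrieval-augmented direction can be sketched as a memory of past-session summaries searched at each turn. sentence-transformers and FAISS are illustrative choices here, not necessarily the paper's retriever.

```python
# Hedged sketch: retrieve a past-session memory to condition the next reply on.
import faiss
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative retriever
memories = ["Partner has two dogs.", "Partner is training for a marathon."]
index = faiss.IndexFlatIP(384)                      # inner product = cosine here
index.add(encoder.encode(memories, normalize_embeddings=True))

query = encoder.encode(["How is your running going?"],
                       normalize_embeddings=True)
_, idx = index.search(query, 1)
print(memories[idx[0][0]])  # memory to prepend to the generator's context
```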
