自然语言处理学术速递[7.6]

2021-07-27 10:26:13

cs.CL 方向,今日共计30篇

Transformer(3篇)

【1】 Power Law Graph Transformer for Machine Translation and Representation Learning 标题:用于机器翻译和表示学习的幂律图转换器

作者:Burc Gokden 机构:Fromthesky Research Labs LLC, Oregon, USA 备注:55 pages, 39 figures 链接:https://arxiv.org/abs/2107.02039 摘要:我们提出了幂律图Transformer,这是一种为预测和表示学习定义了明确的演绎与归纳任务的Transformer模型。演绎任务以可学习的幂律分布参数的形式学习数据集级(全局)和实例级(局部)的图结构。归纳任务利用演绎任务的输出来给出预测概率,类似于直推式模型。我们使用来自TED演讲记录的土耳其语-英语和葡萄牙语-英语数据集训练模型用于机器翻译,并将其性能和特性与在相同实验设置下训练的缩放点积注意力Transformer模型进行比较。我们的模型在土耳其语-英语和葡萄牙语-英语翻译任务上的BLEU得分分别为17.79和28.33。我们还展示了如何利用量化集与N维流形表示之间的对偶关系,通过端到端地连续应用线性和非线性变换,在局部与全局的演绎-归纳输出之间进行转换。 摘要:We present the Power Law Graph Transformer, a transformer model with well defined deductive and inductive tasks for prediction and representation learning. The deductive task learns the dataset level (global) and instance level (local) graph structures in terms of learnable power law distribution parameters. The inductive task outputs the prediction probabilities using the deductive task output, similar to a transductive model. We trained our model with Turkish-English and Portuguese-English datasets from TED talk transcripts for machine translation and compared the model performance and characteristics to a transformer model with scaled dot product attention trained on the same experimental setup. We report BLEU scores of $17.79$ and $28.33$ on the Turkish-English and Portuguese-English translation tasks with our model, respectively. We also show how a duality between a quantization set and N-dimensional manifold representation can be leveraged to transform between local and global deductive-inductive outputs using successive application of linear and non-linear transformations end-to-end.

【2】 Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition 标题:基于交叉模态变换器的语音自动识别神经校正模型

作者:Tomohiro Tanaka,Ryo Masumura,Mana Ihori,Akihiko Takashima,Takafumi Moriya,Takanori Ashihara,Shota Orihashi,Naoki Makishima 机构:NTT Media Intelligence Laboratories, NTT Corporation, Japan 备注:Accepted to Interspeech 2021 链接:https://arxiv.org/abs/2107.01569 摘要:提出了一种基于跨模态变换器的神经校正模型,对自动语音识别(ASR)系统的输出进行了修正,以消除ASR错误。神经校正模型一般由编解码网络组成,可以直接对序列到序列的映射问题进行建模。最成功的方法是使用输入语音和它的ASR输出文本作为编解码网络的输入上下文。然而,传统的方法不能考虑这两个不同模态输入之间的关系,因为每个模态的输入上下文是单独编码的。为了有效地利用两个不同模态输入之间的相关信息,我们提出的模型在跨模态自我注意的基础上,使用一个转换器对两个不同的上下文进行联合编码。我们期望跨模态自我注意能够有效地捕捉两种不同模态之间的关系,从而完善ASR假设。我们还引入了一种浅层融合技术来有效地集成第一通ASR模型和我们提出的神经校正模型。对日语自然语言ASR任务的实验表明,本文提出的模型比传统的神经校正模型具有更好的ASR性能。 摘要:We propose a cross-modal transformer-based neural correction models that refines the output of an automatic speech recognition (ASR) system so as to exclude ASR errors. Generally, neural correction models are composed of encoder-decoder networks, which can directly model sequence-to-sequence mapping problems. The most successful method is to use both input speech and its ASR output text as the input contexts for the encoder-decoder networks. However, the conventional method cannot take into account the relationships between these two different modal inputs because the input contexts are separately encoded for each modal. To effectively leverage the correlated information between the two different modal inputs, our proposed models encode two different contexts jointly on the basis of cross-modal self-attention using a transformer. We expect that cross-modal self-attention can effectively capture the relationships between two different modals for refining ASR hypotheses. We also introduce a shallow fusion technique to efficiently integrate the first-pass ASR model and our proposed neural correction model. Experiments on Japanese natural language ASR tasks demonstrated that our proposed models achieve better ASR performance than conventional neural correction models.

【3】 Can Transformers Jump Around Right in Natural Language? Assessing Performance Transfer from SCAN 标题:Transformer能在自然语言中正确地"跳来跳去"吗?评估从SCAN迁移的性能

作者:Rahma Chaabouni,Roberto Dessì,Eugene Kharitonov 机构:Ecole Normale Superieure, Roberto Dessı, Facebook AI & Pompeu Fabra 链接:https://arxiv.org/abs/2107.01366 摘要:尽管现代seq2seq架构在实际应用中取得了成功,但它们无法在若干SCAN任务上实现系统性泛化。因此,目前尚不清楚SCAN风格的组合泛化在现实的NLP任务中是否有用。在这项工作中,我们研究了这种组合性给若干机器翻译任务带来的好处。我们提出了对Transformer的若干针对性修改,它们极大地提高了在SCAN上的泛化能力,并从中选出一个在标准机器翻译(MT)任务上仍与原始Transformer持平的变体。接着,我们研究了它在低资源设置以及一个新引入的分布偏移英法翻译任务上的表现。总体而言,我们发现具备SCAN能力的模型的改进并不能直接迁移到资源丰富的MT设置中。相比之下,在低资源设置中,这些通用修改相对于原始Transformer带来了最高13.1%的BLEU提升;类似地,在新引入的组合式英法翻译任务中,基于准确率的指标提升了14%。这为SCAN所评估的组合泛化在资源匮乏和领域偏移的场景中特别有用提供了实验证据。 摘要:Despite their practical success, modern seq2seq architectures are unable to generalize systematically on several SCAN tasks. Hence, it is not clear if SCAN-style compositional generalization is useful in realistic NLP tasks. In this work, we study the benefit that such compositionality brings about to several machine translation tasks. We present several focused modifications of Transformer that greatly improve generalization capabilities on SCAN and select one that remains on par with a vanilla Transformer on a standard machine translation (MT) task. Next, we study its performance in low-resource settings and on a newly introduced distribution-shifted English-French translation task. Overall, we find that improvements of a SCAN-capable model do not directly transfer to the resource-rich MT setup. In contrast, in the low-resource setup, general modifications lead to an improvement of up to 13.1% BLEU score w.r.t. a vanilla Transformer. Similarly, an improvement of 14% in an accuracy-based metric is achieved in the introduced compositional English-French translation task. This provides experimental evidence that the compositional generalization assessed in SCAN is particularly useful in resource-starved and domain-shifted scenarios.

BERT(1篇)

【1】 Packing: Towards 2x NLP BERT Acceleration 标题:Packing:迈向2倍的NLP BERT加速

作者:Matej Kosec,Sheng Fu,Mario Michael Krell 机构:Graphcore Inc., Palo Alto 链接:https://arxiv.org/abs/2107.02027 摘要:我们发现,在序列长度为512时,填充(padding)标记占用于预训练BERT(Bidirectional Encoder Representations from Transformers)的维基百科数据集的50%以上。因此,通过去除所有填充,我们在序列/秒的吞吐上实现了2倍加速。为了利用数据集的这一特性,我们开发并对比了两种确定性打包算法。这两种算法都依赖于序列可互换的假设,因此打包可以基于序列长度的直方图进行,而不必逐样本处理。这种问题转换使得算法速度很快,且复杂度与数据集大小呈线性关系。最短包优先直方图打包(SPFHP)算法在0.02秒内确定了包含超过1600万条序列的维基百科数据集的打包方案。非负最小二乘直方图打包(NNLSHP)算法在28.4秒内收敛,但给出的解在深度上更高效,通过在一个样本中最多组合3条序列达到接近最优的打包。使用每个样本包含多条序列的数据集需要在注意力层中进行额外的掩码,并修改MLM损失函数。我们证明这两处改动都易于实现,并且对现代硬件上可实现的性能增益影响相对较小。最后,我们使用打包后的数据集对BERT-Large进行了预训练,证明其收敛性没有损失,并获得了期望的2倍加速。 摘要:We find that at sequence length 512 padding tokens represent in excess of 50% of the Wikipedia dataset used for pretraining BERT (Bidirectional Encoder Representations from Transformers). Therefore by removing all padding we achieve a 2x speed-up in terms of sequences/sec. To exploit this characteristic of the dataset, we develop and contrast two deterministic packing algorithms. Both algorithms rely on the assumption that sequences are interchangeable and therefore packing can be performed on the histogram of sequence lengths, rather than per sample. This transformation of the problem leads to algorithms which are fast and have linear complexity in dataset size. The shortest-pack-first histogram-packing (SPFHP) algorithm determines the packing order for the Wikipedia dataset of over 16M sequences in 0.02 seconds. The non-negative least-squares histogram-packing (NNLSHP) algorithm converges in 28.4 seconds but produces solutions which are more depth efficient, managing to get near optimal packing by combining a maximum of 3 sequences in one sample. Using the dataset with multiple sequences per sample requires additional masking in the attention layer and a modification of the MLM loss function. We demonstrate that both of these changes are straightforward to implement and have relatively little impact on the achievable performance gain on modern hardware. Finally, we pretrain BERT-Large using the packed dataset, demonstrating no loss of convergence and the desired 2x speed-up.
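下面给出一段基于上述摘要思路的极简Python示意代码(非论文官方实现,max_len、max_seqs_per_pack 等参数以及贪心选包的细节均为笔者假设),用来说明"基于序列长度直方图、而非逐样本"进行确定性打包的基本做法:

```python
from collections import Counter

def histogram_pack(lengths, max_len=512, max_seqs_per_pack=3):
    """示意:基于序列长度直方图的贪心打包(受SPFHP启发的简化版本,非官方算法)。"""
    hist = Counter(lengths)                      # 序列长度直方图
    packs = []                                   # 每个包记录 [已用长度, 该包内各序列长度]
    for length in sorted(hist, reverse=True):    # 从最长的序列开始处理
        for _ in range(hist[length]):
            # 在仍能容纳该序列的包中,选剩余空间最小的一个
            candidates = [p for p in packs
                          if p[0] + length <= max_len and len(p[1]) < max_seqs_per_pack]
            if candidates:
                best = max(candidates, key=lambda p: p[0])
                best[0] += length
                best[1].append(length)
            else:
                packs.append([length, [length]])  # 开一个新包
    return packs

# 用法示例:查看打包后样本数相对原始序列数的压缩程度
lengths = [512] * 10 + [64] * 100 + [200] * 30
packs = histogram_pack(lengths)
print(len(lengths), "条序列被装入", len(packs), "个样本")
```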

QA|VQA|问答|对话(2篇)

【1】 Training Adaptive Computation for Open-Domain Question Answering with Computational Constraints 标题:具有计算约束的开放领域问答训练自适应计算

作者:Yuxiang Wu,Pasquale Minervini,Pontus Stenetorp,Sebastian Riedel 机构:University College London 备注:7 pages, 1 figure, to be published in ACL-IJCNLP 2021 链接:https://arxiv.org/abs/2107.02102 摘要:自适应计算(AC)已被证明能有效提高开放域问答(ODQA)系统的效率。然而,现有的AC方法需要调整所有模型参数,而训练最先进的ODQA模型需要大量计算资源,这对大多数研究人员来说难以获得。我们提出了自适应段落编码器(Adaptive Passage Encoder),这是一种可应用于现有ODQA模型、并能在单个GPU上高效训练的AC方法。它保持基础ODQA模型的参数不变,但用一个以优化模型计算效率为目标训练的AC策略来覆盖编码器默认的逐层计算。实验结果表明,我们的方法在两个数据集上改进了最先进的模型,并且由于基础ODQA模型更强,也比以前的AC方法更准确。所有源代码和数据集可在 https://github.com/uclnlp/APE 获取。 摘要:Adaptive Computation (AC) has been shown to be effective in improving the efficiency of Open-Domain Question Answering (ODQA) systems. However, current AC approaches require tuning of all model parameters, and training state-of-the-art ODQA models requires significant computational resources that may not be available for most researchers. We propose Adaptive Passage Encoder, an AC method that can be applied to an existing ODQA model and can be trained efficiently on a single GPU. It keeps the parameters of the base ODQA model fixed, but it overrides the default layer-by-layer computation of the encoder with an AC policy that is trained to optimise the computational efficiency of the model. Our experimental results show that our method improves upon a state-of-the-art model on two datasets, and is also more accurate than previous AC methods due to the stronger base ODQA model. All source code and datasets are available at https://github.com/uclnlp/APE.
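下面给出一个体现"冻结基础编码器参数、用可训练的停止策略覆盖逐层计算"思路的PyTorch极简示意(AdaptiveEncoder 的结构与停止判据均为笔者假设,并非论文官方的 Adaptive Passage Encoder 实现):

```python
import torch
import torch.nn as nn

class AdaptiveEncoder(nn.Module):
    """示意:基础编码器各层参数保持不变,由一个小的可训练停止策略决定在哪一层提前结束计算。"""
    def __init__(self, layers, hidden_size):
        super().__init__()
        self.layers = layers                     # 基础ODQA编码器的各层(实际使用时应冻结其参数)
        self.halt = nn.Linear(hidden_size, 1)    # 可训练的停止策略
    def forward(self, hidden, threshold=0.5):
        for layer in self.layers:
            hidden = layer(hidden)
            p_halt = torch.sigmoid(self.halt(hidden[:, 0]))   # 以首个token的表示决定是否停止
            if p_halt.mean() > threshold:
                break
        return hidden

# 用法示例(随机输入,仅演示前向过程)
layers = nn.ModuleList([nn.TransformerEncoderLayer(d_model=16, nhead=2, batch_first=True)
                        for _ in range(4)])
encoder = AdaptiveEncoder(layers, hidden_size=16)
out = encoder(torch.randn(2, 8, 16))
print(out.shape)
```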

【2】 Coarse-to-Careful: Seeking Semantic-related Knowledge for Open-domain Commonsense Question Answering 标题:从粗到细:面向开放领域常识问答的语义相关知识获取

作者:Luxi Xing,Yue Hu,Jing Yu,Yuqiang Xie,Wei Peng 机构:Institute of Information Engineering, Chinese Academy of Sciences, China, School of Cyber Security, University of Chinese Academy of Sciences, China 备注:In ICASSP2021 链接:https://arxiv.org/abs/2107.01592 摘要:利用外部知识来帮助机器回答需要背景常识的问题是一种普遍现象,这就面临着无限的知识会传递噪声和误导性信息的问题。针对引入相关知识的问题,提出了一种语义驱动的知识感知QA框架,该框架从粗到细控制知识注入。在知识抽取阶段,我们设计了一种裁剪策略,在问题的粗语义监控下对抽取的知识进行过滤。开发了一个语义感知的知识获取模块,该模块利用结构化知识信息,并根据问题的语义层次,融合适当的知识。实验表明,与强基线相比,该方法在CommonsenseQA数据集上提高了性能。 摘要:It is prevalent to utilize external knowledge to help machine answer questions that need background commonsense, which faces a problem that unlimited knowledge will transmit noisy and misleading information. Towards the issue of introducing related knowledge, we propose a semantic-driven knowledge-aware QA framework, which controls the knowledge injection in a coarse-to-careful fashion. We devise a tailoring strategy to filter extracted knowledge under monitoring of the coarse semantic of question on the knowledge extraction stage. And we develop a semantic-aware knowledge fetching module that engages structural knowledge information and fuses proper knowledge according to the careful semantic of questions in a hierarchical way. Experiments demonstrate that the proposed approach promotes the performance on the CommonsenseQA dataset comparing with strong baselines.

机器翻译(1篇)

【1】 IITP at WAT 2021: System description for English-Hindi Multimodal Translation Task 标题:IITP在WAT 2021:英语-印地语多模态翻译任务的系统描述

作者:Baban Gain,Dibyanayan Bandyopadhyay,Asif Ekbal 机构:Indian Institute of Technology Patna, Patna, India 链接:https://arxiv.org/abs/2107.01656 摘要:神经机器翻译(NMT)由于其端到端可训练的灵活性,是当今主流的机器翻译技术。然而,NMT仍然难以在低资源环境下正确翻译,特别是在远程语言对上。克服这一问题的一种方法是利用其他方式提供的信息(如果有的话)。其思想是,尽管语言不同,源语和目标语使用者看到的是同一事物,源语和目标语的视觉表现是相同的,这对系统有积极的帮助。多模态信息可以帮助NMT系统消除某些短语或单词的歧义,从而提高翻译质量。我们参加了第八届亚洲翻译研讨会(WAT-2021)的英语-印地语多模态翻译任务,获得了42.47分和37.50分的BLEU评估分数和挑战分数。 摘要:Neural Machine Translation (NMT) is a predominant machine translation technology nowadays because of its end-to-end trainable flexibility. However, NMT still struggles to translate properly in low-resource settings specifically on distant language pairs. One way to overcome this is to use the information from other modalities if available. The idea is that despite differences in languages, both the source and target language speakers see the same thing and the visual representation of both the source and target is the same, which can positively assist the system. Multimodal information can help the NMT system to improve the translation by removing ambiguity on some phrases or words. We participate in the 8th Workshop on Asian Translation (WAT - 2021) for English-Hindi multimodal translation task and achieve 42.47 and 37.50 BLEU points for Evaluation and Challenge subset, respectively.

Graph|知识图谱|Knowledge(2篇)

【1】 ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation 标题:ERNIE 3.0:面向语言理解与生成的大规模知识增强预训练

作者:Yu Sun,Shuohuan Wang,Shikun Feng,Siyu Ding,Chao Pang,Junyuan Shang,Jiaxiang Liu,Xuyi Chen,Yanbin Zhao,Yuxiang Lu,Weixin Liu,Zhihua Wu,Weibao Gong,Jianzhong Liang,Zhizhou Shang,Peng Sun,Wei Liu,Xuan Ouyang,Dianhai Yu,Hao Tian,Hua Wu,Haifeng Wang 机构:Baidu Inc. 链接:https://arxiv.org/abs/2107.02137 摘要:预训练模型在各种自然语言处理(NLP)任务中取得了最先进的结果。T5和GPT-3等近期工作表明,扩大预训练语言模型的规模可以提高其泛化能力。特别是拥有1750亿参数的GPT-3模型,展示了强大的任务无关的零样本/少样本学习能力。尽管取得了成功,这些大规模模型都是在纯文本上训练的,没有引入语言知识和世界知识等知识。此外,大多数大规模模型以自回归方式训练。因此,这类传统的微调方法在解决下游语言理解任务时表现相对较弱。为了解决上述问题,我们提出了一个名为ERNIE 3.0的统一框架,用于预训练大规模知识增强模型。它融合了自回归网络和自编码网络,使训练后的模型可以通过零样本学习、少样本学习或微调,方便地适配自然语言理解和生成任务。我们在由纯文本和大规模知识图谱组成的4TB语料库上训练了具有100亿参数的模型。实证结果表明,该模型在54项中文NLP任务上优于最先进的模型,其英文版在SuperGLUE基准(2021年7月3日)上取得第一名,以0.8%的优势超过人类表现(90.6%对89.8%)。 摘要:Pre-trained models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. Recent works such as T5 and GPT-3 have shown that scaling up pre-trained language models can improve their generalization abilities. Particularly, the GPT-3 model with 175 billion parameters shows its strong task-agnostic zero-shot/few-shot learning capabilities. Despite their success, these large-scale models are trained on plain texts without introducing knowledge such as linguistic knowledge and world knowledge. In addition, most large-scale models are trained in an auto-regressive way. As a result, this kind of traditional fine-tuning approach demonstrates relatively weak performance when solving downstream language understanding tasks. In order to solve the above problems, we propose a unified framework named ERNIE 3.0 for pre-training large-scale knowledge enhanced models. It fuses auto-regressive network and auto-encoding network, so that the trained model can be easily tailored for both natural language understanding and generation tasks with zero-shot learning, few-shot learning or fine-tuning. We trained the model with 10 billion parameters on a 4TB corpus consisting of plain texts and a large-scale knowledge graph. Empirical results show that the model outperforms the state-of-the-art models on 54 Chinese NLP tasks, and its English version achieves the first place on the SuperGLUE benchmark (July 3, 2021), surpassing the human performance by 0.8% (90.6% vs. 89.8%).

【2】 A Knowledge-based Approach for Answering Complex Questions in Persian 标题:一种基于知识的波斯语复杂问题解答方法

作者:Romina Etezadi,Mehrnoush Shamsfard 机构:aShahidBeheshtiUniversity 备注:9 pages, 5 figures 链接:https://arxiv.org/abs/2107.02040 摘要:开放领域问答系统的研究由来已久。在这一领域的挑战是回答复杂的问题(CQA)需要复杂的推理方法和大量的知识。在低资源语言(如波斯语)中,用于开放域复杂问题的数据集并不多,而且语言处理工具包也不是很精确。在本文中,我们提出了一个基于知识的方法来回答波斯复杂的问题,使用法拉斯基;波斯知识图,利用PeCoQ;新创建的复杂波斯问题数据集。在这项工作中,我们处理多约束和多跳的问题,建立他们的一套可能的相应的逻辑形式。然后用多语种BERT选择最能从句法和语义上描述输入复杂问题的逻辑形式。问题的答案是建立在答案的逻辑形式,提取知识图。实验表明,该方法在波斯CQA中的性能优于其他方法。 摘要:Research on open-domain question answering (QA) has a long tradition. A challenge in this domain is answering complex questions (CQA) that require complex inference methods and large amounts of knowledge. In low resource languages, such as Persian, there are not many datasets for open-domain complex questions and also the language processing toolkits are not very accurate. In this paper, we propose a knowledge-based approach for answering Persian complex questions using Farsbase; the Persian knowledge graph, exploiting PeCoQ; the newly created complex Persian question dataset. In this work, we handle multi-constraint and multi-hop questions by building their set of possible corresponding logical forms. Then Multilingual-BERT is used to select the logical form that best describes the input complex question syntactically and semantically. The answer to the question is built from the answer to the logical form, extracted from the knowledge graph. Experiments show that our approach outperforms other approaches in Persian CQA.

推理|分析|理解|解释(3篇)

【1】 Statistical Analysis of Perspective Scores on Hate Speech Detection 标题:仇恨言语检测视角得分的统计分析

作者:Hadi Mansourifar,Dana Alsagheer,Weidong Shi,Lan Ni,Yan Huang 机构:Computer Science Department, University of Houston, Valenti School of Communication, University of Houston 备注:Accepted paper in International IJCAI Workshop on Artificial Intelligence for Social Good 2021 链接:https://arxiv.org/abs/2107.02024 摘要:近年来,由于社交媒体中攻击性语言的指数级增长,仇恨言语检测成为一个热门话题。实验证明,目前最先进的仇恨语音分类器只有在与训练数据具有相同特征分布的数据上进行测试时才是有效的。因此,模型体系结构在改进当前结果方面起着第二个作用。在这样一个多样化的数据分布中,依赖于低层次特征是由于数据中的自然偏差而导致不足的主要原因。这就是为什么我们需要使用高级特性来避免有偏见的判断。在本文中,我们统计分析了视角得分及其对仇恨言语检测的影响。我们发现,不同的仇恨言语数据集在提取他们的观点得分时是非常相似的。最后,我们证明了对仇恨语音数据集的透视分数进行过采样可以显著提高在其他仇恨语音数据集上的泛化性能。 摘要:Hate speech detection has become a hot topic in recent years due to the exponential growth of offensive language in social media. It has proven that, state-of-the-art hate speech classifiers are efficient only when tested on the data with the same feature distribution as training data. As a consequence, model architecture plays the second role to improve the current results. In such a diverse data distribution relying on low level features is the main cause of deficiency due to natural bias in data. That's why we need to use high level features to avoid a biased judgement. In this paper, we statistically analyze the Perspective Scores and their impact on hate speech detection. We show that, different hate speech datasets are very similar when it comes to extract their Perspective Scores. Eventually, we prove that, over-sampling the Perspective Scores of a hate speech dataset can significantly improve the generalization performance when it comes to be tested on other hate speech datasets.

【2】 Doing Good or Doing Right? Exploring the Weakness of Commonsense Causal Reasoning Models 标题:做得好还是做得对?常识因果推理模型的弱点探析

作者:Mingyue Han,Yinglin Wang 机构:School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai, China, A Sample from development dataset, Premise: The woman banished the children from her, property., ask-for: “cause” 备注:ACL2021, Main Conference, Short Paper 链接:https://arxiv.org/abs/2107.01791 摘要:预训练语言模型(PLM)在合理选择(COPA)任务中取得了令人惊讶的成绩。然而,PLMs是否真正获得了因果推理的能力仍然是一个问题。本文研究了语义相似度偏差问题,揭示了当前COPA模型在一定攻击下的脆弱性。以往的解决方案,解决表面线索的不平衡令牌分布仍然遇到同样的问题,语义偏见,甚至更严重的是由于利用更多的训练数据。实验结果表明,该方法不仅提高了模型的泛化能力,而且有助于模型在具有挑战性的数据集BCOPA-CE上更稳健地运行,它具有无偏的表征分布,模型更难区分因果关系。 摘要:Pretrained language models (PLM) achieve surprising performance on the Choice of Plausible Alternatives (COPA) task. However, whether PLMs have truly acquired the ability of causal reasoning remains a question. In this paper, we investigate the problem of semantic similarity bias and reveal the vulnerability of current COPA models by certain attacks. Previous solutions that tackle the superficial cues of unbalanced token distribution still encounter the same problem of semantic bias, even more seriously due to the utilization of more training data. We mitigate this problem by simply adding a regularization loss and experimental results show that this solution not only improves the model's generalization ability, but also assists the models to perform more robustly on a challenging dataset, BCOPA-CE, which has unbiased token distribution and is more difficult for models to distinguish cause and effect.

【3】 Domain Adaptation for Sentiment Analysis Using Increased Intraclass Separation 标题:基于增强型类内分离的情感分析领域自适应

作者:Mohammad Rostami,Aram Galstyan 机构:USC Information Sciences Institute 链接:https://arxiv.org/abs/2107.01598 摘要:情绪分析是企业研究顾客意见、改进产品、确定最优营销策略的一项既昂贵又必要的任务。由于跨产品和服务的领域广泛,跨领域情感分析方法受到了广泛的关注。这些方法通过训练跨领域的可归纳分类器来缓解不同应用程序之间的领域差距,这有助于放松对每个领域的数据注释需求。大多数现有的方法集中于学习对源域和目标域都是不变的不可知域表示。因此,使用源域注释数据训练的分类器可以很好地在相关的目标域中推广。提出了一种新的域自适应方法,使得嵌入空间中不同类之间有较大的边界。通过匹配跨域的数据分布,将嵌入空间训练为不可知域。源域中较大的类内边界有助于减少“域偏移”对目标域分类器性能的影响。理论和实证分析都证明了该方法的有效性。 摘要:Sentiment analysis is a costly yet necessary task for enterprises to study the opinions of their customers to improve their products and to determine optimal marketing strategies. Due to the existence of a wide range of domains across different products and services, cross-domain sentiment analysis methods have received significant attention. These methods mitigate the domain gap between different applications by training cross-domain generalizable classifiers which help to relax the need for data annotation for each domain. Most existing methods focus on learning domain-agnostic representations that are invariant with respect to both the source and the target domains. As a result, a classifier that is trained using the source domain annotated data would generalize well in a related target domain. We introduce a new domain adaptation method which induces large margins between different classes in an embedding space. This embedding space is trained to be domain-agnostic by matching the data distributions across the domains. Large intraclass margins in the source domain help to reduce the effect of "domain shift" on the classifier performance in the target domain. Theoretical and empirical analysis are provided to demonstrate that the proposed method is effective.
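下面是一段PyTorch示意代码(并非论文的原始公式),大致说明"跨域分布对齐"与"源域类间大间隔"两类损失如何与分类损失组合,其中间隔项与MMD的具体形式均为笔者假设:

```python
import torch
import torch.nn.functional as F

def mmd_linear(src_feat, tgt_feat):
    # 线性核MMD的简化形式:对齐源域与目标域特征均值,使嵌入空间与域无关
    return (src_feat.mean(0) - tgt_feat.mean(0)).pow(2).sum()

def interclass_margin_loss(features, labels, margin=5.0):
    # 鼓励源域中不同类别的嵌入中心至少相距 margin(示意做法)
    classes = labels.unique()
    means = torch.stack([features[labels == c].mean(0) for c in classes])
    loss = features.new_zeros(())
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            loss = loss + F.relu(margin - (means[i] - means[j]).norm())
    return loss

# 用法示例:总损失 = 源域分类损失 + 类间间隔项 + 跨域分布对齐项
src_feat, tgt_feat = torch.randn(8, 16), torch.randn(8, 16)
logits = torch.randn(8, 2)
labels = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
loss = (F.cross_entropy(logits, labels)
        + 0.1 * interclass_margin_loss(src_feat, labels)
        + 0.1 * mmd_linear(src_feat, tgt_feat))
print(loss.item())
```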

GAN|对抗|攻击|生成相关(1篇)

【1】 DeepRapper: Neural Rap Generation with Rhyme and Rhythm Modeling 标题:DeepRapper:基于韵律和节奏建模的神经Rap生成

作者:Lanqing Xue,Kaitao Song,Duocai Wu,Xu Tan,Nevin L. Zhang,Tao Qin,Wei-Qiang Zhang,Tie-Yan Liu 机构:The Hong Kong University of Science and Technology, Nanjing University of Science and Technology, Fudan University, Microsoft Research Asia, Tsinghua University 备注:Accepted by ACL 2021 main conference 链接:https://arxiv.org/abs/2107.01875 摘要:说唱生成的目的是产生歌词和相应的歌唱节拍,它需要同时模拟押韵和节奏。以前的说唱作品主要集中在押韵的歌词上,而忽略了对说唱表演很重要的节奏。在本文中,我们开发了DeepRapper,一个基于Transformer的rap生成系统,可以同时模拟韵律和节奏。由于没有可用的rap节奏数据集,我们开发了一个数据挖掘管道来收集一个大规模的rap数据集,其中包括大量的rap歌曲与对齐的歌词和节奏。其次,我们设计了一个基于变换器的自回归语言模型,对韵律和韵律进行了细致的建模。具体地说,我们以相反的顺序生成歌词,用押韵表示和约束来增强押韵,并在歌词中插入节拍符号来进行节奏/节拍建模。据我们所知,DeepRapper是第一个同时产生押韵和节奏的说唱系统。客观和主观评价都表明,深说唱产生了富有创意和高质量的说唱韵律和节奏。代码将在GitHub上发布。 摘要:Rap generation, which aims to produce lyrics and corresponding singing beats, needs to model both rhymes and rhythms. Previous works for rap generation focused on rhyming lyrics but ignored rhythmic beats, which are important for rap performance. In this paper, we develop DeepRapper, a Transformer-based rap generation system that can model both rhymes and rhythms. Since there is no available rap dataset with rhythmic beats, we develop a data mining pipeline to collect a large-scale rap dataset, which includes a large number of rap songs with aligned lyrics and rhythmic beats. Second, we design a Transformer-based autoregressive language model which carefully models rhymes and rhythms. Specifically, we generate lyrics in the reverse order with rhyme representation and constraint for rhyme enhancement and insert a beat symbol into lyrics for rhythm/beat modeling. To our knowledge, DeepRapper is the first system to generate rap with both rhymes and rhythms. Both objective and subjective evaluations demonstrate that DeepRapper generates creative and high-quality raps with rhymes and rhythms. Code will be released on GitHub.
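下面的小段Python代码示意摘要中"反序生成歌词并在歌词中插入节拍符号"的数据构造思路([BEAT]符号与输入格式均为笔者假设,并非DeepRapper的官方实现):

```python
def build_rap_target(lyric_lines, beat_positions):
    """示意:将每行歌词反序(行尾押韵词先生成),并在给定位置插入节拍符号。"""
    target = []
    for i, line in enumerate(lyric_lines):
        words = line.split()[::-1]                   # 反序,便于对押韵进行建模
        out = []
        for j, w in enumerate(words):
            if j in beat_positions.get(i, set()):
                out.append("[BEAT]")                 # 节拍符号(名称为假设)
            out.append(w)
        target.append(" ".join(out))
    return target

print(build_rap_target(["keep it real on the mic", "flowing all through the night"],
                       {0: {0}, 1: {0, 3}}))
```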

半/弱/无监督|不确定性(1篇)

【1】 Textual Data Distributions: Kullback Leibler Textual Distributions Contrasts on GPT-2 Generated Texts, with Supervised, Unsupervised Learning on Vaccine & Market Topics & Sentiment 标题:文本数据分布:在GPT-2生成文本上的Kullback-Leibler文本分布对比,并在疫苗与市场主题及情感上进行有监督、无监督学习

作者:Jim Samuel,Ratnakar Palle,Eduardo Correa Soares 机构:University of Charleston, Apple Inc., Cerence B. V. 链接:https://arxiv.org/abs/2107.02025 摘要:有效的文本数据分布(TDD)对齐和生成是文本分析和自然语言处理中的一个开放性研究问题。目前很难用简洁的方法确认两个或两个以上的自然语言数据集属于相似的分布,也很难确定文本数据在多大程度上具有一致性。这项研究的重点是通过应用多种有监督和无监督机器学习(ML)方法,通过(i)主题对齐和(ii)情感对齐来探讨TDD的行为,从而解决上述更广泛问题的一部分。此外,我们使用多种文本生成方法,包括微调GPT-2,以生成文本的主题和情感。最后,我们开发了一个独特的过程驱动的Kullback-Leibler散度(KLD)在TDD中的应用,命名为KL文本分布对比(KL-TDC),以识别机器生成的文本语料库与自然生成的文本语料库的对齐。因此,本研究提出了一种独特的基于主题和情感的TDD生成和验证方法,可用于解决稀疏数据问题以及其他需要人工生成主题或情感文本数据的研究、实践和课堂情境。 摘要:Efficient textual data distributions (TDD) alignment and generation are open research problems in textual analytics and NLP. It is presently difficult to parsimoniously and methodologically confirm that two or more natural language datasets belong to similar distributions, and to identify the extent to which textual data possess alignment. This study focuses on addressing a segment of the broader problem described above by applying multiple supervised and unsupervised machine learning (ML) methods to explore the behavior of TDD by (i) topical alignment, and (ii) by sentiment alignment. Furthermore we use multiple text generation methods including fine-tuned GPT-2, to generate text by topic and by sentiment. Finally we develop a unique process driven variation of Kullback-Leibler divergence (KLD) application to TDD, named KL Textual Distributions Contrasts(KL-TDC) to identify the alignment of machine generated textual corpora with naturally occurring textual corpora. This study thus identifies a unique approach for generating and validating TDD by topic and sentiment, which can be used to help address sparse data problems and other research, practice and classroom situations in need of artificially generated topic or sentiment aligned textual data.
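论文中KL-TDC的具体流程未在摘要中给出;下面是一个"用一元词分布近似文本数据分布、再计算KL散度"的极简示意(加一平滑与分布粒度均为笔者假设):

```python
import math
from collections import Counter

def unigram_dist(texts, vocab):
    """给定语料与共享词表,返回加一平滑后的一元词分布。"""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts[w] + 1 for w in vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def kl_divergence(p, q):
    """KL(P || Q),以自然对数计。"""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

# 用法示例:比较自然语料与机器生成语料的词分布差异
natural   = ["the vaccine rollout continues across the country"]
generated = ["the vaccine rollout is continuing across many regions"]
vocab = {w for t in natural + generated for w in t.lower().split()}
p, q = unigram_dist(natural, vocab), unigram_dist(generated, vocab)
print("KL(natural || generated) =", kl_divergence(p, q))
```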

检测相关(1篇)

【1】 Contradiction Detection in Persian Text 标题:波斯语文本中的矛盾检测

作者:Zeinab Rahimi,Mehrnoush ShamsFard 机构:NLP lab, Shahid Beheshti University, Tehran, Iran 备注:24 pages, 9 tables and 5 figures 链接:https://arxiv.org/abs/2107.01987 摘要:语义矛盾句的检测是自然语言处理应用中最具挑战性和最基本的问题之一,如文本蕴涵的识别。本研究中的矛盾包括不同类型的语义对抗,如冲突和反义词。由于缺乏足够的数据来应用精确的机器学习,特别是对波斯语和其他低资源语言的深度学习方法,基于规则的方法,可以类似于这些系统的功能将是一个巨大的兴趣。最近,迁移学习等新方法的出现,为低资源语言的深度学习开辟了可能。基于以上两点,本研究结合一个简单的规则库基线,提出了一个用于识别语义矛盾的规则库系统和一个基于Bert的波斯语文本深层矛盾检测系统。规则库系统采用频繁规则挖掘方法,利用一个开发集来提取合适的矛盾规则。对不同类型的矛盾句进行了规则提取测试。在该系统中,否定类的最大f-测度约为90%,所有类的平均f-测度约为76%,优于其它算法。另一方面,由于规则库系统对于某些类型的矛盾的性能一般,我们使用了基于Bert的深度学习系统来处理我们的翻译数据集;平均F-测度为73。我们的混合系统F-测度约为80。 摘要:Detection of semantic contradictory sentences is one of the most challenging and fundamental issues for NLP applications such as recognition of textual entailments. Contradiction in this study includes different types of semantic confrontation, such as conflict and antonymy. Due to lack of sufficient data to apply precise machine learning and specifically deep learning methods to Persian and other low resource languages, rule-based approaches that can function similarly to these systems will be of a great interest. Also recently, emergence of new methods such as transfer learning, has opened up the possibility of deep learning for low-resource languages. Considering two above points, in this study, along with a simple rule-base baseline, a novel rule-base system for identifying semantic contradiction along with a Bert base deep contradiction detection system for Persian texts have been introduced. The rule base system has used frequent rule mining method to extract appropriate contradiction rules using a development set. Extracted rules are tested for different categories of contradictory sentences. In this system the maximum f-measure among contradiction categories is obtained for negation about 90% and the average F-measure of system for all classes is about 76% which outperforms other algorithms on Persian texts. On the other hand, because of medium performance of rule base system for some categories of contradiction, we use a Bert base deep learning system using our translated dataset; with average F-measure of 73. Our hybrid system has f-measure of about 80.

识别/分类(3篇)

【1】 Arabic Code-Switching Speech Recognition using Monolingual Data 标题:使用单语数据的阿拉伯语码转换语音识别

作者:Ahmed Ali,Shammur Chowdhury,Amir Hussein,Yasser Hifny 机构:Qatar Computing Research Institute, HBKU, Doha, Qatar, Kanari AI, California, USA, University of Helwan, Egypt 备注:Accepted in Interspeech 2021, speech recognition, code-switching, ASR, transformer, WFST, graph approach 链接:https://arxiv.org/abs/2107.01573 摘要:在全球化的背景下,自动语音识别(ASR)中的码切换是一个重要的挑战。最近对多语种ASR的研究表明,与单语系统相比,多语种ASR有潜在的改进。通过一系列大规模ASR实验,研究了ASR多语言建模的关键问题。我们的创新框架在加权有限状态传感器(WFST)框架中部署了一种多图方法。我们将我们的WFST解码策略与训练在相同数据上的Transformer序列对序列系统进行了比较。给出了阿拉伯语和英语之间的码切换场景,我们的结果表明WFST解码方法更适合于句子间的码切换数据集。此外,转换系统在句内语码转换任务中表现较好。在这项研究中,我们发布了一个人工生成的开发和测试集,以及生态代码转换测试集,以测试ASR的性能。 摘要:Code-switching in automatic speech recognition (ASR) is an important challenge due to globalization. Recent research in multilingual ASR shows potential improvement over monolingual systems. We study key issues related to multilingual modeling for ASR through a series of large-scale ASR experiments. Our innovative framework deploys a multi-graph approach in the weighted finite state transducers (WFST) framework. We compare our WFST decoding strategies with a transformer sequence to sequence system trained on the same data. Given a code-switching scenario between Arabic and English languages, our results show that the WFST decoding approaches were more suitable for the intersentential code-switching datasets. In addition, the transformer system performed better for intrasentential code-switching task. With this study, we release an artificially generated development and test sets, along with ecological code-switching test set, to benchmark the ASR performance.

【2】 Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation 标题:联合端到端多人重叠语音识别和说话人属性估计的统一自回归建模

作者:Ryo Masumura,Daiki Okamura,Naoki Makishima,Mana Ihori,Akihiko Takashima,Tomohiro Tanaka,Shota Orihashi 机构:† NTT Media Intelligence Laboratories, NTT Corporation, Japan, ‡ Nagaoka University of Technology, Japan 备注:Accepted at Interspeech 2021 链接:https://arxiv.org/abs/2107.01549 摘要:本文提出了一种针对单通道多说话人重叠自动语音识别(ASR)系统的新型建模方法。基于神经网络的全端到端模型极大地提高了多说话人重叠ASR任务的性能。一种有前景的端到端建模方法是带序列化输出训练的自回归建模,其中多个说话人的转录文本被逐一递归生成。这使我们能够自然地捕捉说话人之间的关系。然而,传统建模方法无法显式地考虑各条语音的说话人属性,例如性别和年龄信息。事实上,当各说话人性别相同或年龄相近时,性能会下降。为了解决这一问题,我们提出了用于联合端到端多说话人重叠ASR和说话人属性估计的统一自回归建模。我们的核心思想是在统一的自回归建模中处理性别和年龄估计任务。在所提方法中,基于Transformer的自回归模型不仅递归地生成文本标记,还递归地生成每个说话人的属性标记。这使我们能够有效利用说话人属性来改进多说话人重叠ASR。在日语多说话人重叠ASR任务上的实验证明了所提方法的有效性。 摘要:In this paper, we present a novel modeling method for single-channel multi-talker overlapped automatic speech recognition (ASR) systems. Fully neural network based end-to-end models have dramatically improved the performance of multi-talker overlapped ASR tasks. One promising approach for end-to-end modeling is autoregressive modeling with serialized output training in which transcriptions of multiple speakers are recursively generated one after another. This enables us to naturally capture relationships between speakers. However, the conventional modeling method cannot explicitly take into account the speaker attributes of individual utterances such as gender and age information. In fact, the performance deteriorates when each speaker is the same gender or is close in age. To address this problem, we propose unified autoregressive modeling for joint end-to-end multi-talker overlapped ASR and speaker attribute estimation. Our key idea is to handle gender and age estimation tasks within the unified autoregressive modeling. In the proposed method, transformer-based autoregressive model recursively generates not only textual tokens but also attribute tokens of each speaker. This enables us to effectively utilize speaker attributes for improving multi-talker overlapped ASR. Experiments on Japanese multi-talker overlapped ASR tasks demonstrate the effectiveness of the proposed method.
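下面用几行Python示意"在序列化输出训练目标中为每位说话人插入属性标记"的数据构造方式(标记名称与数据格式均为笔者假设,并非论文规范):

```python
def build_sot_target(utterances):
    """示意:按说话人发声顺序拼接转录,并在每段前插入性别、年龄属性标记,段间用<sc>分隔。"""
    tokens = []
    for utt in utterances:
        tokens += [f"<gender:{utt['gender']}>", f"<age:{utt['age']}>"]
        tokens += utt["text"].split()
        tokens.append("<sc>")        # 说话人切换标记
    return tokens

print(build_sot_target([
    {"gender": "f", "age": "20s", "text": "こんにちは"},
    {"gender": "m", "age": "40s", "text": "はい どうも"},
]))
```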

【3】 Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition 标题:放松注意力:提高端到端自动语音识别性能的一种简单方法

作者:Timo Lohrenz,Patrick Schwarz,Zhengyang Li,Tim Fingscheidt 机构:Technische Universit¨at Braunschweig, Institute for Communications Technology, Schleinitzstr. , Braunschweig, Germany 备注:submitted to ASRU 2021 链接:https://arxiv.org/abs/2107.01275 摘要:最近,基于注意的编解码器(AED)模型在多个任务中显示出很高的端到端自动语音识别(ASR)性能。针对这类模型中的过度自信问题,本文引入了放松注意的概念,即在训练过程中简单地向编码器-解码器的注意权值逐步注入均匀分布的注意权值,这很容易用两行代码实现。我们研究了不同AED模型结构和两个突出的ASR任务,华尔街日报(WSJ)和Librispeech中放松注意的效果。我们发现,在外部语言模型的解码过程中,放松注意力训练的Transformer的表现一直优于标准的基线模型。在WSJ上,我们为基于Transformer的端到端语音识别建立了一个新的基准,在只引入一个超参数的情况下,字错误率为3.65%,比最新技术(4.20%)高出13.1%。验收后,模型将在github上发布。 摘要:Recently, attention-based encoder-decoder (AED) models have shown high performance for end-to-end automatic speech recognition (ASR) across several tasks. Addressing overconfidence in such models, in this paper we introduce the concept of relaxed attention, which is a simple gradual injection of a uniform distribution to the encoder-decoder attention weights during training that is easily implemented with two lines of code. We investigate the effect of relaxed attention across different AED model architectures and two prominent ASR tasks, Wall Street Journal (WSJ) and Librispeech. We found that transformers trained with relaxed attention outperform the standard baseline models consistently during decoding with external language models. On WSJ, we set a new benchmark for transformer-based end-to-end speech recognition with a word error rate of 3.65%, outperforming state of the art (4.20%) by 13.1% relative, while introducing only a single hyperparameter. Upon acceptance, models will be published on github.
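摘要称放松注意力只需"两行代码"即可实现;下面给出基于该描述的PyTorch示意(非官方实现,gamma 的取值以及训练中逐步增大的调度方式均为笔者假设):

```python
import torch

def relaxed_attention(attn_weights, gamma=0.1):
    """示意:训练时将均匀分布按系数 gamma 混入已归一化的编码器-解码器注意力权重。
    attn_weights: softmax 之后的注意力权重,形状 (..., num_keys)。"""
    num_keys = attn_weights.size(-1)
    uniform = torch.full_like(attn_weights, 1.0 / num_keys)
    return (1.0 - gamma) * attn_weights + gamma * uniform   # 凸组合,结果仍为概率分布

# 用法示例
weights = torch.softmax(torch.randn(2, 4, 10), dim=-1)      # 模拟注意力权重
relaxed = relaxed_attention(weights, gamma=0.1)
print(relaxed.sum(-1))                                       # 每行之和仍为1
```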

语料库(1篇)

【1】 Persian-WSD-Corpus: A Sense Annotated Corpus for Persian All-words Word Sense Disambiguation 标题:波斯-WSD-语料库:波斯语全词词义消歧的义项标注语料库

作者:Hossein Rouhizadeh,Mehrnoush Shamsfard,Vahideh Tajalli,Masoud Rouhziadeh 机构:Shahid Beheshti University, Tehran, Iran, University of Tehran, Tehran, Iran, Johns Hopkis University, MD, USA 链接:https://arxiv.org/abs/2107.01540 摘要:词义消歧(WSD)是自然语言处理(NLP)中一项长期存在的任务,其目的是在给定的上下文中自动识别最相关的词义。开发标准的WSD测试集合可以作为开发和评估感兴趣的语言中的不同WSD系统的一个重要前提。尽管已经为多种语言开发了许多WSD测试集合,但是没有标准的All words WSD benchmark可用于波斯语。本文通过引入SBU-WSD语料库作为波斯语全词WSD任务的第一个标准测试集,解决了波斯语的这一不足。SBU WSD语料库是用波斯语WordNet(FarsNet)词义库中的词义手工标注的。为此,三位注释者使用SAMP(一种基于FarsNet词法图的意义注释工具)来执行注释任务。SBU-WSD语料库由19篇体育、科学、艺术等不同领域的波斯语文献组成,包括5892个波斯语文本内容词和3371个手感注释词(2073个名词、566个动词、610个形容词和122个副词)。本文在SBU语料库上对几种WSD模型进行了评价,为今后波斯语全词WSD任务的研究提供了基础。语料库在https://github.com/hrouhizadeh/SBU-WSD-Corpus. 摘要:Word Sense Disambiguation (WSD) is a long-standing task in Natural Language Processing(NLP) that aims to automatically identify the most relevant meaning of the words in a given context. Developing standard WSD test collections can be mentioned as an important prerequisite for developing and evaluating different WSD systems in the language of interest. Although many WSD test collections have been developed for a variety of languages, no standard All-words WSD benchmark is available for Persian. In this paper, we address this shortage for the Persian language by introducing SBU-WSD-Corpus, as the first standard test set for the Persian All-words WSD task. SBU-WSD-Corpus is manually annotated with senses from the Persian WordNet (FarsNet) sense inventory. To this end, three annotators used SAMP (a tool for sense annotation based on FarsNet lexical graph) to perform the annotation task. SBU-WSD-Corpus consists of 19 Persian documents in different domains such as Sports, Science, Arts, etc. It includes 5892 content words of Persian running text and 3371 manually sense annotated words (2073 nouns, 566 verbs, 610 adjectives, and 122 adverbs). Providing baselines for future studies on the Persian All-words WSD task, we evaluate several WSD models on SBU-WSD-Corpus. The corpus is publicly available at https://github.com/hrouhizadeh/SBU-WSD-Corpus.

Word2Vec|文本|单词(2篇)

【1】 Neural-Symbolic Solver for Math Word Problems with Auxiliary Tasks 标题:辅助任务数学应用题的神经符号解算器

作者:Jinghui Qin,Xiaodan Liang,Yining Hong,Jianheng Tang,Liang Lin 机构: Sun Yat-sen University, Dark Matter AI Inc., University of California, Los Angeles 备注:ACL 2021 链接:https://arxiv.org/abs/2107.01431 摘要:以前的数学文字问题解决方法遵循编码器-解码器范式,未能明确纳入必要的数学符号约束,导致无法解释和不合理的预测。在此,我们提出神经符号解算器(NS-Solver)来通过辅助任务显式地、无缝地结合不同层次的符号约束。我们的NS解算器由一个问题读取器来编码问题,一个程序员来生成符号方程,以及一个符号执行器来获得答案。在目标表达式监督的同时,我们还通过4个新的辅助目标对求解器进行了优化,以实现不同的符号推理:a)同时预测数字数量和数字位置的自监督数字预测任务;b) 常识性持续预测任务,预测需要什么先验知识(例如一只鸡有多少条腿);c) 程序一致性检查器计算预测方程和目标方程之间的语义损失,保证方程的合理映射;d) 对偶性开发任务利用符号方程生成和问题词性生成之间的拟对偶性来提高求解者的理解能力。此外,为了给开发通用的、可扩展的求解器提供一个更具现实性和挑战性的基准,我们还构建了一个新的大规模MWP基准CM17K,由4种MWP(算术、一个未知线性、一个未知非线性、方程组)组成,样本数超过17K。在Math23K和CM17k上的大量实验表明,与最新的方法相比,NS求解器具有优越性。 摘要:Previous math word problem solvers following the encoder-decoder paradigm fail to explicitly incorporate essential math symbolic constraints, leading to unexplainable and unreasonable predictions. Herein, we propose Neural-Symbolic Solver (NS-Solver) to explicitly and seamlessly incorporate different levels of symbolic constraints by auxiliary tasks. Our NS-Solver consists of a problem reader to encode problems, a programmer to generate symbolic equations, and a symbolic executor to obtain answers. Along with target expression supervision, our solver is also optimized via 4 new auxiliary objectives to enforce different symbolic reasoning: a) self-supervised number prediction task predicting both number quantity and number locations; b) commonsense constant prediction task predicting what prior knowledge (e.g. how many legs a chicken has) is required; c) program consistency checker computing the semantic loss between predicted equation and target equation to ensure reasonable equation mapping; d) duality exploiting task exploiting the quasi duality between symbolic equation generation and problem's part-of-speech generation to enhance the understanding ability of a solver. Besides, to provide a more realistic and challenging benchmark for developing a universal and scalable solver, we also construct a new large-scale MWP benchmark CM17K consisting of 4 kinds of MWPs (arithmetic, one-unknown linear, one-unknown non-linear, equation set) with more than 17K samples. Extensive experiments on Math23K and our CM17k demonstrate the superiority of our NS-Solver compared to state-of-the-art methods.

【2】 Scarecrow: A Framework for Scrutinizing Machine Text 标题:稻草人:一个细读机器文本的框架

作者:Yao Dou,Maxwell Forbes,Rik Koncel-Kedziorski,Noah A. Smith,Yejin Choi 机构:†Paul G. Allen School of Computer Science & Engineering, University of Washington, ‡Allen Institute for AI 备注:The project webpage is at this https URL 链接:https://arxiv.org/abs/2107.01294 摘要:现代的神经文本生成系统可以生成非常流畅的语法文本。早期的语言模式存在重复和句法错误,而现代模式所犯的错误往往是语义、叙述或话语失误。为了便于研究这些复杂的错误类型,我们引入了一种新的结构化的众包错误注释模式,称为稻草人。将专家分析与多轮无本体群组注释相结合,确定稻草人中使用的错误类别,如冗余、常识性错误和不一致性,从而得到一个覆盖真实机器生成文本中错误现象的模式。我们使用稻草人收集了13k条注释,共1.3k条人工和机器生成的英语新闻文本段落,总计超过41k个跨段,每个跨段都标有错误类别、严重程度、自然语言解释和先行跨段(如果相关)。我们收集由具有不同已知性能水平的最先进系统生成的文本的注释,从GPT-2小到最大的GPT-3。我们分离出几个因素进行详细分析,包括参数计数、训练数据和解码技术。我们的结果显示了在这些设置中预期的和令人惊讶的差异。这些发现证明了稻草人注释在评估当前和未来文本生成系统中的价值。我们发布了完整的注解工具包和数据集https://yao-dou.github.io/scarecrow/. 摘要:Modern neural text generation systems can produce remarkably fluent and grammatical texts. While earlier language models suffered from repetition and syntactic errors, the errors made by contemporary models are often semantic, narrative, or discourse failures. To facilitate research of these complex error types, we introduce a new structured, crowdsourced error annotation schema called Scarecrow. The error categories used in Scarecrow -- such as redundancy, commonsense errors, and incoherence -- were identified by combining expert analysis with several pilot rounds of ontology-free crowd annotation to arrive at a schema which covers the error phenomena found in real machine generated text. We use Scarecrow to collect 13k annotations of 1.3k human and machine generate paragraphs of English language news text, amounting to over 41k spans each labeled with its error category, severity, a natural language explanation, and antecedent span (where relevant). We collect annotations for text generated by state-of-the-art systems with varying known performance levels, from GPT-2 Small through the largest GPT-3. We isolate several factors for detailed analysis, including parameter count, training data, and decoding technique. Our results show both expected and surprising differences across these settings. These findings demonstrate the value of Scarecrow annotations in the assessment of current and future text generation systems. We release our complete annotation toolkit and dataset at https://yao-dou.github.io/scarecrow/.

其他神经网络|深度学习|模型|建模(5篇)

【1】 Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence 标题:自动主题模型评估被打破了吗?--连贯的不连贯

作者:Alexander Hoyle,Pranav Goel,Denis Peskov,Andrew Hian-Cheong,Jordan Boyd-Graber,Philip Resnik 机构:Computer Science, CS, iSchool, UMIACS, LSC, UMIACS, Lingusitics, University of Maryland 链接:https://arxiv.org/abs/2107.02173 摘要:主题模型评估和其他无监督方法的评估一样,可能会引起争议。然而,这一领域已经围绕着主题连贯性的自动估计而展开,这依赖于参考语料库中单词共现的频率。根据这些指标,最近的基于神经组件的模型超过了经典的主题模型。同时,与经典模型不同,神经主题模型评价的实践存在一个验证缺口:神经模型的自动一致性还没有通过人体实验得到验证。此外,正如我们通过主题建模文献的元分析所显示的,在自动化主题建模基准的使用方面存在着巨大的标准化差距。我们解决了标准化差距和验证差距。使用两个最广泛使用的主题模型评估数据集,我们评估了一个主导的经典模型和两个国家的最先进的神经模型在一个系统,明确记录,可复制的方式。我们使用自动连贯以及两个最广泛接受的人类判断任务,即主题评级和单词入侵。当相应的人工评估不存在时,自动评估将宣布一个模型与另一个模型显著不同,从而对独立于人工判断的全自动评估的有效性提出质疑。 摘要:Topic model evaluation, like evaluation of other unsupervised methods, can be contentious. However, the field has coalesced around automated estimates of topic coherence, which rely on the frequency of word co-occurrences in a reference corpus. Recent models relying on neural components surpass classical topic models according to these metrics. At the same time, unlike classical models, the practice of neural topic model evaluation suffers from a validation gap: automatic coherence for neural models has not been validated using human experimentation. In addition, as we show via a meta-analysis of topic modeling literature, there is a substantial standardization gap in the use of automated topic modeling benchmarks. We address both the standardization gap and the validation gap. Using two of the most widely used topic model evaluation datasets, we assess a dominant classical model and two state-of-the-art neural models in a systematic, clearly documented, reproducible way. We use automatic coherence along with the two most widely accepted human judgment tasks, namely, topic rating and word intrusion. Automated evaluation will declare one model significantly different from another when corresponding human evaluations do not, calling into question the validity of fully automatic evaluations independent of human judgments.
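作为参考,下面给出基于参考语料共现统计的NPMI主题一致性的极简示意实现(以整篇文档为共现窗口、使用简单平滑,这些细节均为笔者假设,并非论文所用的具体工具):

```python
import math
from itertools import combinations

def topic_npmi(topic_words, ref_docs, eps=1e-12):
    """示意:对主题词两两计算NPMI并取平均,共现统计来自参考语料。"""
    docs = [set(d.lower().split()) for d in ref_docs]
    n = len(docs)
    def p(*words):
        return sum(all(w in d for w in words) for d in docs) / n
    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = p(w1), p(w2), p(w1, w2)
        pmi = math.log((p12 + eps) / (p1 * p2 + eps))
        scores.append(pmi / -math.log(p12 + eps))
    return sum(scores) / len(scores)

# 用法示例
docs = ["the game ended with a late goal",
        "fans celebrated the goal after the game",
        "the election results were announced"]
print(topic_npmi(["game", "goal"], docs))
```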

【2】 Deep Learning Schema-based Event Extraction: Literature Review and Current Trends 标题:基于深度学习图式的事件抽取:文献综述与发展趋势

作者:Qian Li,Hao Peng,Jianxin Li,Yiming Hei,Rui Sun,Jiawei Sheng,Shu Guo,Lihong Wang,Philip S. Yu 机构: Beihang University 链接:https://arxiv.org/abs/2107.02126 摘要:基于模式的事件抽取是快速理解事件本质内容的关键技术。随着深度学习技术的迅速发展,基于深度学习的事件抽取技术成为研究热点。文献中提出了许多方法、数据集和评价指标,这就需要进行全面和更新的调查。本文通过回顾最新的方法填补了这一空白,重点是基于深度学习的模型。我们总结了基于模式的事件抽取的任务定义、范式和模型,并对每一种模型进行了详细的讨论。我们引入了支持预测和评估指标测试的基准数据集。本文还对不同技术进行了综合比较。最后,总结了该研究领域未来的研究方向。 摘要:Schema-based event extraction is a critical technique to apprehend the essential content of events promptly. With the rapid development of deep learning technology, event extraction technology based on deep learning has become a research hotspot. Numerous methods, datasets, and evaluation metrics have been proposed in the literature, raising the need for a comprehensive and updated survey. This paper fills the gap by reviewing the state-of-the-art approaches, focusing on deep learning-based models. We summarize the task definition, paradigm, and models of schema-based event extraction and then discuss each of these in detail. We introduce benchmark datasets that support tests of predictions and evaluation metrics. A comprehensive comparison between different techniques is also provided in this survey. Finally, we conclude by summarizing future research directions facing the research area.

【3】 Tackling COVID-19 Infodemic using Deep Learning 标题:利用深度学习应对COVID-19信息疫情

作者:Prathmesh Pathwar,Simran Gill 机构:Indian Institute of Information Technology, Allahabad 备注:15 pages, 4 figures, Accepted in 4th International Conference on Computational Intelligence and Data Engineering 链接:https://arxiv.org/abs/2107.02012 摘要:人类正在与现代史上最有害的病毒之一COVID-19大流行作斗争,但是伴随着这场大流行,一种信息学的错误信息渗透到学生和社会中,从而加剧了当前的疾病。我们尝试对网络媒体上的虚假新闻进行检测和分类,以检测与COVID-19和冠状病毒相关的虚假信息。该数据集包含从politifact等事实核查网站收集的虚假帖子、文章和新闻,而真实的推文则来自经过验证的twitter句柄。我们结合了多种传统的分类技术,如朴素贝叶斯、KNN、梯度增强和随机森林以及深度学习方法,特别是CNN、RNN、DNN和集成模型RMDL。我们用TF-IDF和手套词嵌入两种特征提取技术对这些方法进行了分析,这两种技术可以对在线媒体上包含COVID-19信息的数据集提供更深入的见解。 摘要:Humanity is battling one of the most deleterious virus in modern history, the COVID-19 pandemic, but along with the pandemic there's an infodemic permeating the pupil and society with misinformation which exacerbates the current malady. We try to detect and classify fake news on online media to detect fake information relating to COVID-19 and coronavirus. The dataset contained fake posts, articles and news gathered from fact checking websites like politifact whereas real tweets were taken from verified twitter handles. We incorporated multiple conventional classification techniques like Naive Bayes, KNN, Gradient Boost and Random Forest along with Deep learning approaches, specifically CNN, RNN, DNN and the ensemble model RMDL. We analyzed these approaches with two feature extraction techniques, TF-IDF and GloVe Word Embeddings which would provide deeper insights into the dataset containing COVID-19 info on online media.
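下面用scikit-learn给出摘要中"TF-IDF特征 + 传统分类器"这一路线的最小示意(数据为虚构样例,模型与参数为笔者假设的默认配置):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# 虚构样例:实际工作中的语料来自事实核查网站与经过验证的推特账号
texts  = ["garlic cures covid in one day",
          "who releases updated vaccination guidance",
          "5g towers spread the virus",
          "health ministry reports new case numbers"]
labels = ["fake", "real", "fake", "real"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["new guidance on vaccination released"]))
```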

【4】 CasEE: A Joint Learning Framework with Cascade Decoding for Overlapping Event Extraction 标题:CASEE:一种用于重叠事件抽取的级联解码联合学习框架

作者:Jiawei Sheng,Shu Guo,Bowen Yu,Qian Li,Yiming Hei,Lihong Wang,Tingwen Liu,Hongbo Xu 机构:Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China, School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China, NationalComputerNetworkEmergencyResponseTechnicalTeamCoordinationCenterofChina 备注:11 pages, 2 figures 链接:https://arxiv.org/abs/2107.01583 摘要:事件抽取是一项重要的信息抽取任务,旨在抽取文本中的事件信息。现有的方法大多假设事件出现在没有重叠的句子中,这不适用于复杂的重叠事件提取。本文系统地研究了现实事件重叠问题,即一个词可以作为多种类型的触发语,也可以作为不同角色的论据。为了解决上述问题,我们提出了一种新的基于级联译码的重叠事件提取联合学习框架CasEE。特别地,CasEE依次执行类型检测、触发提取和参数提取,其中重叠目标根据特定的前一预测分别提取。所有子任务在一个框架中联合学习,以捕获子任务之间的依赖关系。对一个公共事件抽取基准FewFC的评估表明,CasEE在重叠事件抽取方面比以往的竞争性方法有了显著的改进。 摘要:Event extraction (EE) is a crucial information extraction task that aims to extract event information in texts. Most existing methods assume that events appear in sentences without overlaps, which are not applicable to the complicated overlapping event extraction. This work systematically studies the realistic event overlapping problem, where a word may serve as triggers with several types or arguments with different roles. To tackle the above problem, we propose a novel joint learning framework with cascade decoding for overlapping event extraction, termed as CasEE. Particularly, CasEE sequentially performs type detection, trigger extraction and argument extraction, where the overlapped targets are extracted separately conditioned on the specific former prediction. All the subtasks are jointly learned in a framework to capture dependencies among the subtasks. The evaluation on a public event extraction benchmark FewFC demonstrates that CasEE achieves significant improvements on overlapping event extraction over previous competitive methods.

【5】 Audio-Oriented Multimodal Machine Comprehension: Task, Dataset and Model 标题:面向音频的多模态机器理解:任务、数据集和模型

作者:Zhiqi Huang,Fenglin Liu,Xian Wu,Shen Ge,Helin Wang,Wei Fan,Yuexian Zou 机构: ADSPLAB, School of ECE, Peking University, China, Tencent, China, Peng Cheng Laboratory, China 备注:AAAI 2021 链接:https://arxiv.org/abs/2107.01571 摘要:近年来,机器理解(Machine Comprehension,MC)引起了广泛的研究兴趣,现有的方法主要属于机器阅读理解任务的范畴,它通过挖掘文本输入(段落和问题)来预测答案(选项或文本跨度)。然而,在英语听力测试中,除了文本输入外,还有许多任务是接受音频输入的。本文主要研究面向音频的多模态机器理解问题,其目标是基于给定的音频和文本信息来回答问题。为了解决这个问题,我们提出了一个动态的模态间和模态内注意(DIIA)模型来有效地融合两种模态(音频和文本)。DIIA可以作为一个独立的组件工作,因此可以很容易地集成到现有的MC模型中。此外,我们进一步开发了一个多模态知识提取(MKD)模块,使我们的多模态MC模型能够仅基于文本或音频准确地预测答案。因此,本文提出的方法可以在一个单一的模型中处理多种任务,包括:面向音频的多模态机器理解、机器阅读理解和机器听力理解,使得我们的模型与现有的单峰MC模型进行比较成为可能。实验结果和分析证明了该方法的有效性。首先,所提出的DIIA在精度上提高了基准模型21.08%;其次,在单峰情景下,MKD模块使我们的多峰MC模型比单峰模型的性能显著提高了18.87%,而单峰模型仅通过音频或文本数据进行训练和测试。 摘要:While Machine Comprehension (MC) has attracted extensive research interests in recent years, existing approaches mainly belong to the category of Machine Reading Comprehension task which mines textual inputs (paragraphs and questions) to predict the answers (choices or text spans). However, there are a lot of MC tasks that accept audio input in addition to the textual input, e.g. English listening comprehension test. In this paper, we target the problem of Audio-Oriented Multimodal Machine Comprehension, and its goal is to answer questions based on the given audio and textual information. To solve this problem, we propose a Dynamic Inter- and Intra-modality Attention (DIIA) model to effectively fuse the two modalities (audio and textual). DIIA can work as an independent component and thus be easily integrated into existing MC models. Moreover, we further develop a Multimodal Knowledge Distillation (MKD) module to enable our multimodal MC model to accurately predict the answers based only on either the text or the audio. As a result, the proposed approach can handle various tasks including: Audio-Oriented Multimodal Machine Comprehension, Machine Reading Comprehension and Machine Listening Comprehension, in a single model, making fair comparisons possible between our model and the existing unimodal MC models. Experimental results and analysis prove the effectiveness of the proposed approaches. First, the proposed DIIA boosts the baseline models by up to 21.08% in terms of accuracy; Second, under the unimodal scenarios, the MKD module allows our multimodal MC model to significantly outperform the unimodal models by up to 18.87%, which are trained and tested with only audio or textual data.

其他(4篇)

【1】 FaVIQ: FAct Verification from Information-seeking Questions 标题:FAVIQ:从信息寻求性问题中验证事实

作者:Jungsoo Park,Sewon Min,Jaewoo Kang,Luke Zettlemoyer,Hannaneh Hajishirzi 机构:Korea University, University of Washington, Allen Institute of AI 备注:12 pages, 3 figures; Data & Code available at this https URL 链接:https://arxiv.org/abs/2107.02153 摘要:尽管人们对开发通用的事实检验模型非常感兴趣,但要构建一个包含现实世界中可能发生的真实声明的大规模事实检验数据集仍然是一个挑战。现有的声明要么是由crowdworks编写的,从而引入了难以控制的微妙偏见,要么是由专业的事实核查人员手动验证的,导致它们的成本高昂且规模有限。在本文中,我们利用不知道如何回答的真实用户提出的信息寻求问题,构建了一个具有挑战性的、真实的、大规模的事实验证数据集FaVIQ。信息寻求问题中的模糊性使得能够自动构建真实和错误的声明,以反映用户产生的混淆(例如,电影拍摄与发行的年份)。我们的主张被证实是自然的,包含很少的词汇偏见,并需要一个完整的证据的理解进行验证。我们的实验表明,最先进的模型远远不能解决我们的新任务。此外,对我们的数据进行训练有助于专业的事实检查,比对最广泛使用的数据集或域内数据进行训练的模型的绝对性能高出17%。总之,我们的数据将作为一个具有挑战性的基准,自然语言的理解和支持未来的进展,在专业事实检查。 摘要:Despite significant interest in developing general purpose fact checking models, it is challenging to construct a large-scale fact verification dataset with realistic claims that would occur in the real world. Existing claims are either authored by crowdworkers, thereby introducing subtle biases that are difficult to control for, or manually verified by professional fact checkers, causing them to be expensive and limited in scale. In this paper, we construct a challenging, realistic, and large-scale fact verification dataset called FaVIQ, using information-seeking questions posed by real users who do not know how to answer. The ambiguity in information-seeking questions enables automatically constructing true and false claims that reflect confusions arisen from users (e.g., the year of the movie being filmed vs. being released). Our claims are verified to be natural, contain little lexical bias, and require a complete understanding of the evidence for verification. Our experiments show that the state-of-the-art models are far from solving our new task. Moreover, training on our data helps in professional fact-checking, outperforming models trained on the most widely used dataset FEVER or in-domain data by up to 17% absolute. Altogether, our data will serve as a challenging benchmark for natural language understanding and support future progress in professional fact checking.

【2】 The DCU-EPFL Enhanced Dependency Parser at the IWPT 2021 Shared Task 标题:IWPT 2021共享任务的DCU-EPFL增强型依赖解析器

作者:James Barry,Alireza Mohammadshahi,Joachim Wagner,Jennifer Foster,James Henderson 机构: ADAPT Centre, School of Computing, Dublin City University, Idiap Research Institute, ´Ecole Polytechnique F´ed´erale de Lausanne—EPFL 备注:Submitted to the IWPT 2021 Shared Task: From Raw Text to Enhanced Universal Dependencies: the Parsing Shared Task at IWPT 2021 链接:https://arxiv.org/abs/2107.01982 摘要:我们描述了DCU-EPFL提交给IWPT2021共享任务的过程,该任务将解析为增强的通用依赖关系。这项任务涉及解析增强的UD图,UD图是基本依赖树的扩展,旨在更方便地表示语义结构。评估在17种语言的29个树库上进行,参与者需要从原始字符串开始解析每种语言的数据。我们的方法使用节管道来预处理文本文件,XLMRoBERTa来获得上下文化的标记表示,以及一个边缘评分和标记模型来预测增强的图。最后,我们运行一个后处理脚本,以确保所有输出都是有效的增强UD图。我们的系统在9名参与者中排名第6,粗略的增强标记依恋得分(ELAS)为83.57。我们还进行了额外的截止期后实验,其中包括使用Trankit进行预处理、XLM-RoBERTa-LARGE、treebank连接,以及在基本和增强依赖解析器之间进行多任务学习。所有这些修改都提高了我们的初始分数,我们的最终系统有一个88.04的粗略弹性。 摘要:We describe the DCU-EPFL submission to the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies. The task involves parsing Enhanced UD graphs, which are an extension of the basic dependency trees designed to be more facilitative towards representing semantic structure. Evaluation is carried out on 29 treebanks in 17 languages and participants are required to parse the data from each language starting from raw strings. Our approach uses the Stanza pipeline to preprocess the text files, XLMRoBERTa to obtain contextualized token representations, and an edge-scoring and labeling model to predict the enhanced graph. Finally, we run a post-processing script to ensure all of our outputs are valid Enhanced UD graphs. Our system places 6th out of 9 participants with a coarse Enhanced Labeled Attachment Score (ELAS) of 83.57. We carry out additional post-deadline experiments which include using Trankit for pre-processing, XLM-RoBERTa-LARGE, treebank concatenation, and multitask learning between a basic and an enhanced dependency parser. All of these modifications improve our initial score and our final system has a coarse ELAS of 88.04.

【3】 End-to-end Neural Coreference Resolution Revisited: A Simple yet Effective Baseline 标题:重新审视端到端神经共指分解:一种简单而有效的基线

作者:Tuan Manh Lai,Trung Bui,Doo Soon Kim 机构: University of Illinois at Urbana-Champaign, USA, Adobe Research, San Jose, CA, Roku Inc., San Jose, CA 链接:https://arxiv.org/abs/2107.01700 摘要:自从第一个端到端的神经共指消解模型被提出以来,人们对该模型进行了许多扩展,从使用高阶推理到使用强化学习直接优化评价指标。尽管在很大程度上提高了共指解析性能,但是这些扩展给原始模型增加了很多额外的复杂性。基于这一观察和预训练Transformer语言模型的最新进展,我们提出了一个简单而有效的共指消解基线。我们的模型是原始神经共指消解模型的简化版本,然而,它取得了令人印象深刻的性能,优于所有最近在公共英语OntoNotes基准上的扩展工作。我们的工作提供了证据,证明有必要仔细证明现有或新提出的模型的复杂性,因为对现有模型引入概念或实际简化仍然可以产生有竞争力的结果。 摘要:Since the first end-to-end neural coreference resolution model was introduced, many extensions to the model have been proposed, ranging from using higher-order inference to directly optimizing evaluation metrics using reinforcement learning. Despite improving the coreference resolution performance by a large margin, these extensions add a lot of extra complexity to the original model. Motivated by this observation and the recent advances in pre-trained Transformer language models, we propose a simple yet effective baseline for coreference resolution. Our model is a simplified version of the original neural coreference resolution model, however, it achieves impressive performance, outperforming all recent extended works on the public English OntoNotes benchmark. Our work provides evidence for the necessity of carefully justifying the complexity of existing or newly proposed models, as introducing a conceptual or practical simplification to an existing model can still yield competitive results.

【4】 Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors 标题:基于全局与局部吸引子的面向无限说话人数的神经说话人日志

作者:Shota Horiguchi,Shinji Watanabe,Paola Garcia,Yawen Xue,Yuki Takashima,Yohei Kawaguchi 机构:Hitachi, Ltd., Japan, Carnegie Mellon University, USA, Johns Hopkins University, USA 链接:https://arxiv.org/abs/2107.01545 摘要:基于吸引子的端到端说话人日志(diarization)在具有挑战性的数据集上取得了与精心调优的传统基于聚类方法相当的精度。然而,其主要缺点是无法处理说话人数多于训练时所见数量的情况,因为其说话人计数依赖于监督学习。在这项工作中,我们在基于吸引子的端到端说话人日志中嵌入了一个无监督聚类过程。我们首先将逐帧嵌入序列切分为较短的子序列,然后对每个子序列执行基于吸引子的说话人日志。在得到各子序列的结果后,通过对由所有子序列的吸引子计算出的向量进行无监督聚类,得到子序列之间的说话人对应关系。这使得即使每个子序列输出的说话人数量有限,也能对整段录音产生包含大量说话人的日志结果。实验结果表明,该方法能够对未见过数量的说话人产生准确的结果。我们的方法在CALLHOME、DIHARD II和DIHARD III数据集上分别取得了11.84%、28.33%和19.49%,均优于传统的端到端方法。 摘要:Attractor-based end-to-end diarization is achieving comparable accuracy to the carefully tuned conventional clustering-based methods on challenging datasets. However, the main drawback is that it cannot deal with the case where the number of speakers is larger than the one observed during training. This is because its speaker counting relies on supervised learning. In this work, we introduce an unsupervised clustering process embedded in the attractor-based end-to-end diarization. We first split a sequence of frame-wise embeddings into short subsequences and then perform attractor-based diarization for each subsequence. Given subsequence-wise diarization results, inter-subsequence speaker correspondence is obtained by unsupervised clustering of the vectors computed from the attractors from all the subsequences. This makes it possible to produce diarization results of a large number of speakers for the whole recording even if the number of output speakers for each subsequence is limited. Experimental results showed that our method could produce accurate diarization results of an unseen number of speakers. Our method achieved 11.84 %, 28.33 %, and 19.49 % on the CALLHOME, DIHARD II, and DIHARD III datasets, respectively, each of which is better than the conventional end-to-end diarization methods.
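下面用scikit-learn给出"对各子序列得到的吸引子向量做无监督聚类、从而在子序列之间建立说话人对应关系"的极简示意(聚类算法与距离阈值均为笔者假设,并非论文官方实现):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def link_speakers(attractors, distance_threshold=1.0):
    """attractors: 列表,每个元素是形状 (该子序列说话人数, 维度) 的吸引子矩阵。
    返回与输入对应的全局说话人标签列表。"""
    stacked = np.concatenate(attractors, axis=0)
    labels = AgglomerativeClustering(
        n_clusters=None, distance_threshold=distance_threshold).fit_predict(stacked)
    out, start = [], 0
    for a in attractors:
        out.append(labels[start:start + len(a)])
        start += len(a)
    return out

# 用法示例:两个子序列分别检测到2名和3名说话人
print(link_speakers([np.random.randn(2, 8), np.random.randn(3, 8)]))
```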
