自然语言处理学术速递[8.17]



cs.CL 方向,今日共计27篇

Transformer(1篇)

【1】 BloomNet: A Robust Transformer based model for Bloom's Learning Outcome Classification 标题:BloomNet:一种用于Bloom学习结果分类的鲁棒Transformer模型 链接:https://arxiv.org/abs/2108.07249

作者:Abdul Waheed,Muskan Goyal,Nimisha Mittal,Deepak Gupta,Ashish Khanna,Moolchand Sharma 机构:Maharaja Agrasen Institute of Technology, New Delhi, India. 备注:Bloom's Taxonomy, Natural Language Processing, Transformer, Robustness and Generalization 摘要:布鲁姆分类法是将教育学习目标分为认知、情感和精神运动三个学习层次的常用范例。为了优化教育计划,根据布鲁姆分类法的不同认知水平设计课程学习结果(CLO)至关重要。通常,机构的管理员需要手动完成将CLO和试题映射到布鲁姆分类级别的繁琐工作。为了解决这个问题,我们提出了一个名为BloomNet的基于Transformer的模型,该模型同时捕获语言学信息和语义信息,对课程学习结果(CLO)进行分类。我们将BloomNet与一组多样的基本基线和强基线进行比较,发现我们的模型优于所有实验基线。此外,我们还在训练过程中未见过的不同分布上评估BloomNet,以测试其泛化能力,并观察到与其他对比模型相比,我们的模型更不易受分布偏移的影响。我们通过广泛的结果分析来支持这些发现。在消融研究中,我们观察到显式封装语言学信息和语义信息可以同时提高模型的IID(独立同分布)性能和OOD(分布外)泛化能力。 摘要:Bloom taxonomy is a common paradigm for categorizing educational learning objectives into three learning levels: cognitive, affective, and psychomotor. For the optimization of educational programs, it is crucial to design course learning outcomes (CLOs) according to the different cognitive levels of Bloom Taxonomy. Usually, administrators of the institutions manually complete the tedious work of mapping CLOs and examination questions to Bloom taxonomy levels. To address this issue, we propose a transformer-based model named BloomNet that captures linguistic as well as semantic information to classify the course learning outcomes (CLOs). We compare BloomNet with a diverse set of basic as well as strong baselines and we observe that our model performs better than all the experimented baselines. Further, we also test the generalization capability of BloomNet by evaluating it on different distributions which our model does not encounter during training and we observe that our model is less susceptible to distribution shift compared to the other considered models. We support our findings by performing extensive result analysis. In an ablation study, we observe that explicitly encapsulating linguistic information along with semantic information improves the model's IID (independent and identically distributed) performance as well as its OOD (out-of-distribution) generalization capability.
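下面给出一个示意性代码草图(并非论文原始实现),用于说明“同时显式编码语言学信息与语义信息”这一思路的一种常见做法:将Transformer的[CLS]句向量与简单的语言学特征(例如词性分布统计)拼接后送入分类头。其中的预训练模型名、特征维度和类别数均为演示用的假设。

```python
# 示意代码:语义表示(Transformer句向量)与语言学特征(如词性统计)拼接后分类
# 假设:Bloom认知层级取6类;bert-base-uncased 与 12 维词性特征仅为占位
import torch
import torch.nn as nn
from transformers import AutoModel

class LinguisticAwareClassifier(nn.Module):
    def __init__(self, num_labels=6, num_pos_feats=12, model_name="bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.classifier = nn.Linear(hidden + num_pos_feats, num_labels)

    def forward(self, input_ids, attention_mask, pos_feats):
        # 取[CLS]位置的隐状态作为语义表示
        cls = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state[:, 0]
        # 拼接语言学特征后做分类
        return self.classifier(torch.cat([cls, pos_feats], dim=-1))
```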

QA|VQA|问答|对话(2篇)

【1】 HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation 标题:HITAB:一种用于问答和自然语言生成的层次化表格数据集 链接:https://arxiv.org/abs/2108.06712

作者:Zhoujun Cheng,Haoyu Dong,Zhiruo Wang,Ran Jia,Jiaqi Guo,Yan Gao,Shi Han,Jian-Guang Lou,Dongmei Zhang 机构:Shanghai Jiao Tong University, Microsoft Research Asia, Carnegie Mellon University, Xi’an Jiaotong University 备注:Pre-print 摘要:表格通常带有层次结构,但现有的表格推理工作主要集中在平面表格上,而忽略了层次化表格。层次化表格以层次索引以及计算与语义上的隐式关系对现有方法提出了挑战。这项工作介绍了HiTab,一个自由开放的数据集,供研究社区研究层次化表格上的问答(QA)和自然语言生成(NLG)。HiTab是一个跨领域数据集,由大量统计报告和维基百科页面构建而成,具有独特的特征:(1)几乎所有的表格都是层次化的;(2)NLG的目标句子和QA的问题都是从统计报告中有意义且多样的高质量描述中修订而来;(3)HiTab提供了关于实体和数量对齐的细粒度标注。针对层次结构,我们设计了一种新的层次感知逻辑形式,用于表格上的符号推理,显示出很高的有效性。随后,利用实体和数量对齐的标注,我们提出了部分监督训练,这有助于模型在QA任务中大幅减少虚假预测。在NLG任务中,我们发现实体和数量对齐也有助于NLG模型在条件生成设置中生成更好的结果。最先进基线的实验结果表明,该数据集构成了巨大的挑战,并为未来研究提供了有价值的基准。 摘要:Tables are often created with hierarchies, but existing works on table reasoning mainly focus on flat tables and neglect hierarchical tables. Hierarchical tables challenge existing methods by hierarchical indexing, as well as implicit relationships of calculation and semantics. This work presents HiTab, a free and open dataset for the research community to study question answering (QA) and natural language generation (NLG) over hierarchical tables. HiTab is a cross-domain dataset constructed from a wealth of statistical reports and Wikipedia pages, and has unique characteristics: (1) nearly all tables are hierarchical, and (2) both target sentences for NLG and questions for QA are revised from high-quality descriptions in statistical reports that are meaningful and diverse. (3) HiTab provides fine-grained annotations on both entity and quantity alignment. Targeting hierarchical structure, we devise a novel hierarchy-aware logical form for symbolic reasoning over tables, which shows high effectiveness. Then given annotations of entity and quantity alignment, we propose partially supervised training, which helps models to largely reduce spurious predictions in the QA task. In the NLG task, we find that entity and quantity alignment also helps NLG models to generate better results in a conditional generation setting. Experiment results of state-of-the-art baselines suggest that this dataset presents a strong challenge and a valuable benchmark for future research.

【2】 Complex Knowledge Base Question Answering: A Survey 标题:复杂知识库问答研究综述 链接:https://arxiv.org/abs/2108.06688

作者:Yunshi Lan,Gaole He,Jinhao Jiang,Jing Jiang,Wayne Xin Zhao,Ji-Rong Wen 机构:He is with the School of Information, Renmin University of China, andBeijing Key Laboratory of Big Data Management and Analysis Methods 备注:20 pages, 4 tables, 7 figures 摘要:知识库问答(KBQA)旨在通过知识库(KB)回答问题。早期的研究主要集中在通过KBs回答简单的问题,并取得了巨大的成功。然而,他们在复杂问题上的表现仍然远远不能令人满意。因此,近年来,研究者们提出了大量新的方法来研究回答复杂问题的挑战。在这篇综述中,我们回顾了KBQA在解决复杂问题方面的最新进展,这些问题通常包含多个主题、表示复合关系或涉及数值运算。我们首先详细介绍复杂的KBQA任务和相关背景。然后,我们描述了复杂KBQA任务的基准数据集,并介绍了这些数据集的构建过程。接下来,我们将介绍两种主流的复杂KBQA方法,即基于语义分析(SP)的方法和基于信息检索(IR)的方法。具体来说,我们用流程设计来说明它们的程序,并讨论它们的主要区别和相似之处。之后,我们总结了这两类方法在回答复杂问题时遇到的挑战,并阐述了现有工作中使用的高级解决方案和技术。最后,我们总结并讨论了与复杂KBQA相关的未来研究方向。 摘要:Knowledge base question answering (KBQA) aims to answer a question over a knowledge base (KB). Early studies mainly focused on answering simple questions over KBs and achieved great success. However, their performance on complex questions is still far from satisfactory. Therefore, in recent years, researchers propose a large number of novel methods, which looked into the challenges of answering complex questions. In this survey, we review recent advances on KBQA with the focus on solving complex questions, which usually contain multiple subjects, express compound relations, or involve numerical operations. In detail, we begin with introducing the complex KBQA task and relevant background. Then, we describe benchmark datasets for complex KBQA task and introduce the construction process of these datasets. Next, we present two mainstream categories of methods for complex KBQA, namely semantic parsing-based (SP-based) methods and information retrieval-based (IR-based) methods. Specifically, we illustrate their procedures with flow designs and discuss their major differences and similarities. After that, we summarize the challenges that these two categories of methods encounter when answering complex questions, and explicate advanced solutions and techniques used in existing work. Finally, we conclude and discuss several promising directions related to complex KBQA for future research.

机器翻译(1篇)

【1】 Active Learning for Massively Parallel Translation of Constrained Text into Low Resource Languages 标题:面向受限文本到低资源语言大规模并行翻译的主动学习 链接:https://arxiv.org/abs/2108.07127

作者:Zhong Zhou,Alex Waibel 机构:Language Technology Institute, School of Computer Science, Carnegie Mellon University, Forbes Ave, Pittsburgh PA 备注:None 摘要:我们将一个事先已知并有多种语言版本可用的封闭文本翻译成一种新的、资源严重不足的语言。大多数人工翻译工作采用基于篇章片段的方法,按顺序翻译连续的页面/章节,这可能不适合机器翻译。我们比较了局部优化文本连贯性的基于篇章片段的方法和全局增加文本覆盖率的随机抽样方法。我们的结果表明,随机抽样方法的性能更好。当在《圣经》约1000行的种子语料库上进行训练,并在《圣经》的其余部分(约30000行)上进行测试时,随机抽样在使用英语作为模拟低资源语言时带来+11.0 BLEU的性能增益,在使用玛雅语言Eastern Pokomchi时带来+4.9 BLEU的性能增益。此外,我们还比较了三种随着人工后编辑数据量迭代增加而更新机器翻译模型的方法。我们发现,在没有自监督的情况下,在词表更新后向训练中添加新的后编辑数据效果最好。我们提出了一种人与机器无缝协作的算法,将封闭文本翻译成资源严重不足的语言。 摘要:We translate a closed text that is known in advance and available in many languages into a new and severely low resource language. Most human translation efforts adopt a portion-based approach to translate consecutive pages/chapters in order, which may not suit machine translation. We compare the portion-based approach that optimizes coherence of the text locally with the random sampling approach that increases coverage of the text globally. Our results show that the random sampling approach performs better. When training on a seed corpus of ~1,000 lines from the Bible and testing on the rest of the Bible (~30,000 lines), random sampling gives a performance gain of +11.0 BLEU using English as a simulated low resource language, and +4.9 BLEU using Eastern Pokomchi, a Mayan language. Furthermore, we compare three ways of updating machine translation models with increasing amount of human post-edited data through iterations. We find that adding newly post-edited data to training after vocabulary update without self-supervision performs the best. We propose an algorithm for human and machine to work together seamlessly to translate a closed text into a severely low resource language.
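下面是一个极简的示意片段(基于摘要描述的假设性实现),对比“按篇章顺序选取连续片段”与“全局随机采样”两种种子语料选取策略;其中的函数名与行数设置均为示意。

```python
# 示意代码:从封闭文本(按行存放)中选取约1000行种子语料的两种策略
import random

def portion_based(closed_text, n_seed=1000, start=0):
    """按篇章顺序取连续片段:局部连贯性好,但全局覆盖有限"""
    return closed_text[start:start + n_seed]

def random_sampling(closed_text, n_seed=1000, seed=0):
    """全局随机采样:覆盖面更广(摘要中报告表现更好的策略)"""
    rng = random.Random(seed)
    return rng.sample(closed_text, min(n_seed, len(closed_text)))
```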

语义分析(1篇)

【1】 ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration 标题:Rosita:通过跨模态和内模态知识集成增强视觉和语言语义对齐 链接:https://arxiv.org/abs/2108.07073

作者:Yuhao Cui,Zhou Yu,Chunqi Wang,Zhongzhou Zhao,Ji Zhang,Meng Wang,Jun Yu 机构:School of Computer Science and Technology, Hangzhou Dianzi University, China, Alibaba Group, China, School of Computer Science and Information Engineering, Hefei University of Technology, China 备注:Accepted at ACM Multimedia 2021. Code available at this https URL 摘要:视觉和语言预训练(VLP)旨在从海量图像-文本对中学习通用多模态表示。虽然已经提出了各种成功的尝试,但学习图像-文本对之间的细粒度语义对齐在这些方法中起着关键作用。然而,大多数现有的VLP方法没有充分利用图像-文本对中的固有知识,这限制了所学对齐的有效性,并进一步限制了其模型的性能。为此,我们引入了一种称为ROSITA的新VLP方法,该方法将跨模态和模态内知识集成到一个统一的场景图中,以增强语义对齐。具体来说,我们引入了一种新的结构知识掩蔽(SKM)策略,以场景图结构作为先验来执行掩蔽语言(区域)建模,该策略通过消除模态内部和跨模态的干扰信息来增强语义对齐。广泛的消融研究和综合分析证实了ROSITA在语义对齐方面的有效性。通过在域内和域外数据集上进行预训练,ROSITA在六个基准数据集上的三个典型视觉-语言任务中显著优于现有最先进的VLP方法。 摘要:Vision-and-language pretraining (VLP) aims to learn generic multimodal representations from massive image-text pairs. While various successful attempts have been proposed, learning fine-grained semantic alignments between image-text pairs plays a key role in their approaches. Nevertheless, most existing VLP approaches have not fully utilized the intrinsic knowledge within the image-text pairs, which limits the effectiveness of the learned alignments and further restricts the performance of their models. To this end, we introduce a new VLP method called ROSITA, which integrates the cross- and intra-modal knowledge in a unified scene graph to enhance the semantic alignments. Specifically, we introduce a novel structural knowledge masking (SKM) strategy to use the scene graph structure as a priori to perform masked language (region) modeling, which enhances the semantic alignments by eliminating the interference information within and across modalities. Extensive ablation studies and comprehensive analysis verify the effectiveness of ROSITA in semantic alignments. Pretrained with both in-domain and out-of-domain datasets, ROSITA significantly outperforms existing state-of-the-art VLP methods on three typical vision-and-language tasks over six benchmark datasets.

Graph|知识图谱|Knowledge(2篇)

【1】 Contextual Mood Analysis with Knowledge Graph Representation for Hindi Song Lyrics in Devanagari Script 标题:基于知识图表示的梵文印地语歌词语境语气分析 链接:https://arxiv.org/abs/2108.06947

作者:Makarand Velankar,Rachita Kotian,Parag Kulkarni 机构:Cummins College of Engineering for women, Karvenagar, Pune, India., CEO and Chief Scientist at Kvinna Limited, Pune, Maharashtra, India. 备注:16 pages 摘要:歌词在传达歌曲情绪方面起着重要作用,是理解和解释音乐交流的信息。传统的自然语言处理方法将印地语文本翻译成英语进行分析。这种方法不适用于歌词,因为它可能会失去固有的预期上下文含义。因此,确定有必要开发一个Devanagari文本分析系统。实验中使用了300首歌词的数据集,这些歌词在五种不同的情绪中具有相同的分布。该系统以德瓦纳加里文本格式对印地语歌词进行上下文情绪分析。上下文分析存储为知识库,使用增量学习方法和新数据进行更新。带有情绪和相关重要上下文术语的上下文知识图提供了所用歌词数据集的图形表示。测试结果表明,情绪预测的准确率为64%。这项工作可以很容易地扩展到与印地语文学作品相关的应用,如摘要、索引、上下文检索、基于上下文的分类和文档分组。 摘要:Lyrics play a significant role in conveying the song's mood and are information to understand and interpret music communication. Conventional natural language processing approaches use translation of the Hindi text into English for analysis. This approach is not suitable for lyrics as it is likely to lose the inherent intended contextual meaning. Thus, the need was identified to develop a system for Devanagari text analysis. The data set of 300 song lyrics with equal distribution in five different moods is used for the experimentation. The proposed system performs contextual mood analysis of Hindi song lyrics in Devanagari text format. The contextual analysis is stored as a knowledge base, updated using an incremental learning approach with new data. Contextual knowledge graph with moods and associated important contextual terms provides the graphical representation of the lyric data set used. The testing results show 64% accuracy for the mood prediction. This work can be easily extended to applications related to Hindi literary work such as summarization, indexing, contextual retrieval, context-based classification and grouping of documents.

【2】 DEXTER: Deep Encoding of External Knowledge for Named Entity Recognition in Virtual Assistants 标题:Dexter:用于虚拟助手命名实体识别的外部知识深度编码 链接:https://arxiv.org/abs/2108.06633

作者:Deepak Muralidharan,Joel Ruben Antony Moniz,Weicheng Zhang,Stephen Pulman,Lin Li,Megan Barnes,Jingjing Pan,Jason Williams,Alex Acero 机构:Apple, USA, Apple, UK, University of Washington, USA 备注:Interspeech 2021 摘要:命名实体识别(NER)通常是在来源规范的书面文本上开发和测试的。然而,在智能语音助手中,NER是一个重要组件,由于用户或语音识别错误,NER的输入可能带有噪声。在实际应用中,实体标签可能会频繁变化,并且可能需要非文本属性(如话题性或流行度)来在候选项中进行选择。我们描述了一个旨在解决这些问题的NER系统。我们在一个专有的、来自真实用户的数据集上训练和测试这个系统。我们将其与以下系统进行比较:纯文本的基线NER系统;加入外部地名词典(gazetteer)增强的基线;以及加入文中所述搜索与间接标注技术增强的基线。最终配置使NER错误率降低约6%。我们还表明,该技术改进了语义解析等相关任务,错误率最多降低5%。 摘要:Named entity recognition (NER) is usually developed and tested on text from well-written sources. However, in intelligent voice assistants, where NER is an important component, input to NER may be noisy because of user or speech recognition error. In applications, entity labels may change frequently, and non-textual properties like topicality or popularity may be needed to choose among alternatives. We describe a NER system intended to address these problems. We test and train this system on a proprietary user-derived dataset. We compare with a baseline text-only NER system; the baseline enhanced with external gazetteers; and the baseline enhanced with the search and indirect labelling techniques we describe below. The final configuration gives around 6% reduction in NER error rate. We also show that this technique improves related tasks, such as semantic parsing, with an improvement of up to 5% in error rate.

摘要|信息提取(1篇)

【1】 An Effective System for Multi-format Information Extraction 标题:一种高效的多格式信息抽取系统 链接:https://arxiv.org/abs/2108.06957

作者:Yaduo Liu,Longhui Zhang,Shujuan Yin,Xiaofeng Zhao,Feiliang Ren 机构:School of Computer Science and Engineering, Northeastern University, Shenyang, China 备注:NLPCC-Evaluation 2021 摘要:2021年语言与智能挑战赛中的多格式信息抽取任务旨在从不同维度综合评估信息抽取。它由一个多槽位关系抽取子任务和两个分别从句子级和文档级抽取事件的事件抽取子任务组成。在这里,我们描述了我们针对这一多格式信息抽取竞赛任务的系统。具体来说,对于关系抽取子任务,我们将其转换为传统的三元组抽取任务,并设计了一种基于投票的方法,充分利用现有模型。对于句子级事件抽取子任务,我们将其转换为NER任务,并使用基于指针标注的方法进行抽取。此外,考虑到标注的触发词信息可能有助于事件抽取,我们设计了一个辅助触发词识别模型,并使用多任务学习机制将触发词特征集成到事件抽取模型中。对于文档级事件抽取子任务,我们设计了一种基于编码器-解码器的方法,并提出了一种类Transformer的解码器。最后,我们的系统在该多格式信息抽取任务的测试集排行榜上排名第4,其关系抽取、句子级事件抽取和文档级事件抽取子任务的F1得分分别为79.887%、85.179%和70.828%。我们模型的代码可在 https://github.com/neukg/MultiIE 获取。 摘要:The multi-format information extraction task in the 2021 Language and Intelligence Challenge is designed to comprehensively evaluate information extraction from different dimensions. It consists of an multiple slots relation extraction subtask and two event extraction subtasks that extract events from both sentence-level and document-level. Here we describe our system for this multi-format information extraction competition task. Specifically, for the relation extraction subtask, we convert it to a traditional triple extraction task and design a voting based method that makes full use of existing models. For the sentence-level event extraction subtask, we convert it to a NER task and use a pointer labeling based method for extraction. Furthermore, considering the annotated trigger information may be helpful for event extraction, we design an auxiliary trigger recognition model and use the multi-task learning mechanism to integrate the trigger features into the event extraction model. For the document-level event extraction subtask, we design an Encoder-Decoder based method and propose a Transformer-alike decoder. Finally, our system ranks No.4 on the test set leader-board of this multi-format information extraction task, and its F1 scores for the subtasks of relation extraction, event extractions of sentence-level and document-level are 79.887%, 85.179%, and 70.828% respectively. The codes of our model are available at https://github.com/neukg/MultiIE.
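下面给出“基于指针标注的抽取”这一思路的一个示意性实现(并非该系统的原始代码):对编码器输出的每个词元,分别预测其作为某一类别起始/结束位置的概率,训练时使用二元交叉熵;类别数、阈值等均为假设。

```python
# 示意代码:指针标注头——对每个token预测其是否为某类别的起/止位置
import torch
import torch.nn as nn

class PointerHead(nn.Module):
    def __init__(self, hidden_size, num_types):
        super().__init__()
        self.start = nn.Linear(hidden_size, num_types)  # 每个类别一个起始指针
        self.end = nn.Linear(hidden_size, num_types)    # 每个类别一个结束指针

    def forward(self, token_states):
        # token_states: (batch, seq_len, hidden);返回起/止概率,之后按阈值解码出span
        return torch.sigmoid(self.start(token_states)), torch.sigmoid(self.end(token_states))
```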

推理|分析|理解|解释(3篇)

【1】 An Effective Non-Autoregressive Model for Spoken Language Understanding 标题:一种有效的口语理解非自回归模型 链接:https://arxiv.org/abs/2108.07005

作者:Lizhi Cheng,Weijia Jia,Wenmian Yang 机构:Department of Computer Science and, Engineering, Shanghai Jiao Tong University, Shanghai, PR China, BNU-UIC Institute of Artificial, Intelligence and Future Networks, Beijing Normal University (BNU, Zhuhai), Guangdong Key Lab of AI, and Multi-Modal Data Processing 摘要:口语理解(SLU)是面向任务对话系统的核心组成部分,由于人类的不耐烦,它期望更短的推理延迟。非自回归SLU模型明显提高了推理速度,但由于各槽位块之间缺乏顺序依赖信息,会出现槽位不协调的问题。为了弥补这一缺点,本文提出了一种新的非自回归SLU模型Layered-Refine Transformer,该模型包含一个槽位标签生成(SLG)任务和一个分层细化机制(LRM)。SLG被定义为根据词元序列和已生成的槽位标签来生成下一个槽位标签。使用SLG,非自回归模型可以在训练过程中有效地获取依赖信息,并且不会在推理时花费额外的时间。LRM根据Transformer的中间状态预测初步SLU结果,并利用这些结果指导最终预测。在两个公共数据集上的实验表明,我们的模型显著提高了SLU性能(总体准确率提升1.5%),同时大大加快了推理过程(超过10倍)。 摘要:Spoken Language Understanding (SLU), a core component of the task-oriented dialogue system, expects a shorter inference latency due to the impatience of humans. Non-autoregressive SLU models clearly increase the inference speed but suffer uncoordinated-slot problems caused by the lack of sequential dependency information among each slot chunk. To address this shortcoming, in this paper, we propose a novel non-autoregressive SLU model named Layered-Refine Transformer, which contains a Slot Label Generation (SLG) task and a Layered Refine Mechanism (LRM). SLG is defined as generating the next slot label with the token sequence and generated slot labels. With SLG, the non-autoregressive model can efficiently obtain dependency information during training and spend no extra time in inference. LRM predicts the preliminary SLU results from Transformer's middle states and utilizes them to guide the final prediction. Experiments on two public datasets indicate that our model significantly improves SLU performance (1.5% on Overall accuracy) while substantially speed up (more than 10 times) the inference process over the state-of-the-art baseline.

【2】 Exploring Generalization Ability of Pretrained Language Models on Arithmetic and Logical Reasoning 标题:探索预先训练的语言模型对算术和逻辑推理的泛化能力 链接:https://arxiv.org/abs/2108.06743

作者:Cunxiang Wang,Boyuan Zheng,Yuchen Niu,Yue Zhang 机构:♠Zhejiang University, China, ♣School of Engineering, Westlake University, China, ♥Institute of Advanced Technology, Westlake Institute for Advanced Study, China, ♦Johns Hopkins University, ♭Imperial College London 备注:Accepted by NLPCC2021 摘要:为了定量且直观地探索预训练语言模型(PLM)的泛化能力,我们设计了几个算术和逻辑推理任务。我们分析了当测试数据与训练数据同分布以及分布不同时PLM的泛化表现;针对后一种分析,我们在分布内测试集之外还设计了一个跨分布测试集。我们在最先进且公开发布的生成式PLM之一——BART上进行实验。我们的研究发现,当分布相同时,PLM可以很容易地泛化,但分布外泛化对它们来说仍然很困难。 摘要:To quantitatively and intuitively explore the generalization ability of pre-trained language models (PLMs), we have designed several tasks of arithmetic and logical reasoning. We both analyse how well PLMs generalize when the test data is in the same distribution as the train data and when it is different, for the latter analysis, we have also designed a cross-distribution test set other than the in-distribution test set. We conduct experiments on one of the most advanced and publicly released generative PLM - BART. Our research finds that the PLMs can easily generalize when the distribution is the same, however, it is still difficult for them to generalize out of the distribution.

【3】 Accurate, yet inconsistent? Consistency Analysis on Language Understanding Models 标题:准确,但又不一致?语言理解模型的一致性分析 链接:https://arxiv.org/abs/2108.06665

作者:Myeongjun Jang,Deuk Sin Kwon,Thomas Lukasiewicz 机构:Department of Computer Science, University of Oxford, Language Super Intelligence Labs, SK Telecom 摘要:一致性是指为语义相似的上下文生成相同预测的能力,是健全的语言理解模型非常需要的属性。最近的预训练语言模型(PLM)虽然在各种下游任务中表现出色,但如果模型真正理解了语言,它们就应当表现出一致的行为。在本文中,我们提出了一个简单的语言理解模型一致性分析(CALUM)框架,用于评估模型的一致性能力下限。通过实验,我们证实了当前PLM即使对于语义相同的输入,也容易产生不一致的预测。我们还观察到,使用复述识别任务的多任务训练有助于提高一致性,平均提高一致性13%。 摘要:Consistency, which refers to the capability of generating the same predictions for semantically similar contexts, is a highly desirable property for a sound language understanding model. Although recent pretrained language models (PLMs) deliver outstanding performance in various downstream tasks, they should exhibit consistent behaviour provided the models truly understand language. In this paper, we propose a simple framework named consistency analysis on language understanding models (CALUM) to evaluate the model's lower-bound consistency ability. Through experiments, we confirmed that current PLMs are prone to generate inconsistent predictions even for semantically identical inputs. We also observed that multi-task training with paraphrase identification tasks is of benefit to improve consistency, increasing the consistency by 13% on average.
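作为参考,下面给出“一致性”这一概念的一个朴素度量示意(并非CALUM框架的原始定义):统计模型在语义等价的输入对上给出相同预测的比例。

```python
# 示意代码:在(原句, 改写句)对上计算预测一致率
# 假设:predict(text) 返回模型的离散预测结果
def consistency_score(predict, paraphrase_pairs):
    same = sum(1 for a, b in paraphrase_pairs if predict(a) == predict(b))
    return same / max(len(paraphrase_pairs), 1)
```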

GAN|对抗|攻击|生成相关(5篇)

【1】 MTG: A Benchmarking Suite for Multilingual Text Generation 标题:MTG:一个用于多语言文本生成的基准套件 链接:https://arxiv.org/abs/2108.07140

作者:Yiran Chen,Zhenqiao Song,Xianze Wu,Danqing Wang,Jingjing Xu,Jiaze Chen,Hao Zhou,Lei Li 机构:ByteDance AI Lab, Shanghai, China 备注:9 pages 摘要:我们介绍了MTG,一个用于训练和评估多语言文本生成的新基准套件。它是第一个也是最大的文本生成基准,具有120k人工注释的多路并行数据,用于跨四种语言(英语、德语、法语和西班牙语)的三项任务(故事生成、问题生成和标题生成)。在此基础上,我们设置了各种评估场景,并从不同角度对几种流行的多语言生成模型进行了深入分析。我们的基准套件将鼓励文本生成社区使用多种语言,提供更多人工注释的并行数据和更多样化的生成场景。 摘要:We introduce MTG, a new benchmark suite for training and evaluating multilingual text generation. It is the first and largest text generation benchmark with 120k human-annotated multi-way parallel data for three tasks (story generation, question generation, and title generation) across four languages (English, German, French, and Spanish). Based on it, we set various evaluation scenarios and make a deep analysis of several popular multilingual generation models from different aspects. Our benchmark suite will encourage the multilingualism for text generation community with more human-annotated parallel data and more diverse generation scenarios.

【2】 A Single Example Can Improve Zero-Shot Data Generation 标题:单个示例可以改进零样本数据生成 链接:https://arxiv.org/abs/2108.06991

作者:Pavel Burnyshev,Valentin Malykh,Andrey Bout,Ekaterina Artemova,Irina Piontkovskaya 机构:Huawei Noah’s Ark Lab, Moscow, Russia, Kazan Federal University, Kazan, Russia, HSE University, Moscow, Russia 备注:To appear in INLG2021 proceedings 摘要:意图分类的子任务,如对分布转移的鲁棒性、对特定用户组的适应性和个性化、域外检测,需要大量灵活的数据集进行实验和评估。由于收集此类数据集既耗时又费力,我们建议使用文本生成方法来收集数据集。应该训练生成器生成属于给定意图的话语。我们探讨了两种生成任务型话语的方法。在Zero-Shot方法中,该模型被训练成从所看到的意图生成话语,并进一步用于为训练期间未看到的意图生成话语。在一次性方法中,模型以测试意图中的一句话呈现。我们使用两种建议的方法对生成的数据集执行彻底的自动和人工评估。我们的结果表明,生成数据的属性与通过众包收集的原始测试集非常接近。 摘要:Sub-tasks of intent classification, such as robustness to distribution shift, adaptation to specific user groups and personalization, out-of-domain detection, require extensive and flexible datasets for experiments and evaluation. As collecting such datasets is time- and labor-consuming, we propose to use text generation methods to gather datasets. The generator should be trained to generate utterances that belong to the given intent. We explore two approaches to generating task-oriented utterances. In the zero-shot approach, the model is trained to generate utterances from seen intents and is further used to generate utterances for intents unseen during training. In the one-shot approach, the model is presented with a single utterance from a test intent. We perform a thorough automatic, and human evaluation of the dataset generated utilizing two proposed approaches. Our results reveal that the attributes of the generated data are close to original test sets, collected via crowd-sourcing.

【3】 AutoChart: A Dataset for Chart-to-Text Generation Task 标题:AutoChart:图表到文本生成任务的数据集 链接:https://arxiv.org/abs/2108.06897

作者:Jiawen Zhu,Jinye Ran,Roy Ka-wei Lee,Kenny Choo,Zhi Li 机构:Singapore University of Technology and Design, University of Saskatchewan, China Merchants Bank 摘要:图表的分析描述是一个激动人心的重要研究领域,在学术界和工业界都有许多应用。然而,这项具有挑战性的任务受到了计算语言学研究界的有限关注。本文提出了AutoChart,一个用于图表分析描述的大型数据集,旨在鼓励对这一重要领域进行更多研究。具体来说,我们提供了一个新的框架,可以自动生成图表及其分析描述。我们对生成的图表和描述进行了广泛的人工和机器评估,并证明生成的文本信息丰富、连贯,并且与相应的图表相关。 摘要:The analytical description of charts is an exciting and important research area with many applications in academia and industry. Yet, this challenging task has received limited attention from the computational linguistics research community. This paper proposes AutoChart, a large dataset for the analytical description of charts, which aims to encourage more research into this important area. Specifically, we offer a novel framework that generates the charts and their analytical description automatically. We conducted extensive human and machine evaluations on the generated charts and descriptions and demonstrate that the generated texts are informative, coherent, and relevant to the corresponding charts.

【4】 SAPPHIRE: Approaches for Enhanced Concept-to-Text Generation 标题:蓝宝石:增强概念到文本生成的方法 链接:https://arxiv.org/abs/2108.06643

作者:Steven Y. Feng,Jessica Huynh,Chaitanya Narisetty,Eduard Hovy,Varun Gangal 机构:Language Technologies Institute, Carnegie Mellon University 备注:INLG 2021. Code available at this https URL 摘要:我们激发并提出了一套简单但有效的改进概念到文本生成的SAPPHIRE:集合增强和后期短语填充和重组。通过使用BART和T5模型的实验,我们证明了它们在生成性常识推理,即CommonGen任务上的有效性。通过广泛的自动和人工评估,我们发现SAPPHIRE显著提高了模型性能。深入的定性分析表明,SAPPHIRE有效地解决了基线模型生成的许多问题,包括缺乏常识、不够具体和流利性差。 摘要:We motivate and propose a suite of simple but effective improvements for concept-to-text generation called SAPPHIRE: Set Augmentation and Post-hoc PHrase Infilling and REcombination. We demonstrate their effectiveness on generative commonsense reasoning, a.k.a. the CommonGen task, through experiments using both BART and T5 models. Through extensive automatic and human evaluation, we show that SAPPHIRE noticeably improves model performance. An in-depth qualitative analysis illustrates that SAPPHIRE effectively addresses many issues of the baseline model generations, including lack of commonsense, insufficient specificity, and poor fluency.

【5】 The SelectGen Challenge: Finding the Best Training Samples for Few-Shot Neural Text Generation 标题:SelectGen挑战:为少样本神经文本生成寻找最佳训练样本 链接:https://arxiv.org/abs/2108.06614

作者:Ernie Chang,Xiaoyu Shen,Alex Marin,Vera Demberg 机构:Dept. of Language Science and Technology, Saarland University, ⋒ Microsoft Corporation, Redmond, WA 备注:Accepted at GenChal @ INLG 2021 摘要:我们提出了一个关于训练实例选择的共享任务,用于少样本神经文本生成。大规模预训练语言模型极大地改善了少样本文本生成。尽管如此,几乎所有以前的工作都只是简单地应用随机抽样来选择少样本训练实例。对于选择策略以及它们如何影响模型性能,几乎没有人关注。对选择策略的研究可以帮助我们(1)在下游任务中最大限度地利用标注预算;(2)更好地对少样本文本生成模型进行基准测试。我们欢迎参赛者提交其选择策略及其对生成质量的影响。 摘要:We propose a shared task on training instance selection for few-shot neural text generation. Large-scale pretrained language models have led to dramatic improvements in few-shot text generation. Nonetheless, almost all previous work simply applies random sampling to select the few-shot training instances. Little to no attention has been paid to the selection strategies and how they would affect model performance. The study of the selection strategy can help us to (1) make the most use of our annotation budget in downstream tasks and (2) better benchmark few-shot text generative models. We welcome submissions that present their selection strategies and the effects on the generation quality.

半/弱/无监督|不确定性(1篇)

【1】 Partially Supervised Named Entity Recognition via the Expected Entity Ratio Loss 标题:基于期望实体比损失的部分监督命名实体识别 链接:https://arxiv.org/abs/2108.07216

作者:Thomas Effland,Michael Collins 机构:Columbia University, Google Research 备注:Accepted in TACL 2021, pre-MIT Press publication version 摘要:我们研究了在缺少实体标注的情况下学习命名实体识别器的问题。我们将这一设置视为带潜变量的序列标注,并提出了一种新的损失——期望实体比率(Expected Entity Ratio),以在存在系统性漏标的情况下学习模型。我们证明了我们的方法在理论上是合理的,在经验上也是有用的。通过实验,我们发现它在各种语言、标注场景和标注数据量下都达到或超过了强大的最先进基线的性能。特别是,在仅有1000条有偏标注的挑战性设置下(7个数据集的平均结果),它显著优于Mayhew et al. (2019)和Li et al. (2021)先前最先进的方法,F1分数分别高出12.7和2.3。我们还表明,当与我们的方法相结合时,一种新的稀疏标注方案在适度的标注预算下优于穷举标注。 摘要:We study learning named entity recognizers in the presence of missing entity annotations. We approach this setting as tagging with latent variables and propose a novel loss, the Expected Entity Ratio, to learn models in the presence of systematically missing tags. We show that our approach is both theoretically sound and empirically useful. Experimentally, we find that it meets or exceeds performance of strong and state-of-the-art baselines across a variety of languages, annotation scenarios, and amounts of labeled data. In particular, we find that it significantly outperforms the previous state-of-the-art methods from Mayhew et al. (2019) and Li et al. (2021) by 12.7 and 2.3 F1 score in a challenging setting with only 1,000 biased annotations, averaged across 7 datasets. We also show that, when combined with our approach, a novel sparse annotation scheme outperforms exhaustive annotation for modest annotation budgets.
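下面给出“期望实体比率”这一思想的一个简化示意(并非论文的原始公式):在缺少标注的token上,约束模型预测为实体的期望比例接近某个先验比例,以缓解系统性漏标;先验值与容忍区间均为假设。

```python
# 示意代码:期望实体比率思想的简化版正则项(与论文实现可能不同,仅作说明)
import torch

def expected_entity_ratio_loss(tag_probs, unlabeled_mask, rho=0.15, margin=0.05):
    """tag_probs: (batch, seq, num_tags) 的softmax输出,约定下标0为'O'(非实体)标签;
    unlabeled_mask: (batch, seq),1表示该token缺少人工标注。"""
    p_entity = 1.0 - tag_probs[..., 0]              # 每个token属于任一实体标签的概率
    mask = unlabeled_mask.float()
    expected_ratio = (p_entity * mask).sum() / mask.sum().clamp(min=1.0)
    # 期望实体比例偏离先验 rho 超过 margin 时才产生惩罚
    return torch.relu((expected_ratio - rho).abs() - margin)
```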

检测相关(2篇)

【1】 Maps Search Misspelling Detection Leveraging Domain-Augmented Contextual Representations 标题:利用域增强上下文表示的地图搜索拼写错误检测 链接:https://arxiv.org/abs/2108.06842

作者:Yutong Li 机构:Apple Inc., Cupertino, CA, USA 摘要:构建一个独立的拼写错误检测器并在纠错之前使用它,可以为拼写纠错器和其他搜索组件带来多方面的好处,这对于最常部署的基于噪声信道的拼写纠错系统尤其如此。随着深度学习的快速发展和BERTology等上下文表示学习的实质性进步,构建一个不依赖于噪声信道架构相关手工特征的、像样的拼写错误检测器变得比以往任何时候都更容易实现。然而,BERTology模型是在自然语言语料库上训练的,而地图搜索是高度领域特定的,BERTology能否延续其成功?在本文中,我们设计了从最基本的LSTM到单域增强微调BERT的四个阶段的拼写错误检测模型。在我们的案例中,我们发现对于地图搜索,其他高级BERTology家族模型(如RoBERTa)并不一定优于BERT,而经典的跨域微调完整BERT甚至不如更小的单域微调BERT。通过全面的建模实验和分析,我们分享了更多发现,并简要介绍了数据生成算法上的突破。 摘要:Building an independent misspelling detector and serve it before correction can bring multiple benefits to speller and other search components, which is particularly true for the most commonly deployed noisy-channel based speller systems. With rapid development of deep learning and substantial advancement in contextual representation learning such as BERTology, building a decent misspelling detector without having to rely on hand-crafted features associated with noisy-channel architecture becomes more-than-ever accessible. However, BERTology models are trained on natural language corpora while Maps Search is highly domain-specific; would BERTology continue its success? In this paper we design 4 stages of models for misspelling detection ranging from the most basic LSTM to single-domain augmented fine-tuned BERT. We found for Maps Search in our case, other advanced BERTology family models such as RoBERTa do not necessarily outperform BERT, and a classic cross-domain fine-tuned full BERT even underperforms a smaller single-domain fine-tuned BERT. We share more findings through comprehensive modeling experiments and analysis, we also briefly cover the data generation algorithm breakthrough.
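作为参考,下面是“单域微调BERT做拼写错误检测”这一阶段的通用流程示意(并非论文原始实现):把查询串作为输入,微调一个二分类序列分类模型;模型名与数据格式均为占位假设。

```python
# 示意代码:将预训练编码器微调为二分类拼写错误检测器
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def train_step(queries, labels):
    # queries: 查询字符串列表;labels: 0/1 列表(1 表示查询含拼写错误)
    batch = tok(queries, padding=True, truncation=True, return_tensors="pt")
    out = model(**batch, labels=torch.tensor(labels))
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```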

【2】 Investigating Bias In Automatic Toxic Comment Detection: An Empirical Study 标题:毒物评论自动检测中的偏差调查:一项实证研究 链接:https://arxiv.org/abs/2108.06487

作者:Ayush Kumar,Pratik Kumar 机构:Georgia Institute of Technology, Atlanta, US 摘要:随着在线平台的激增,用户通过评论和互动在这些平台上的参与度也在激增。这些文字评论中有很大一部分是辱骂性的、粗鲁的和冒犯读者的。在使用机器学习系统检查进入平台的评论时,训练数据中存在的偏见会传递到分类器上,导致对某些阶层、宗教和性别群体的歧视。在这项工作中,我们评估了不同的分类器和特征,以估计这些分类器中的偏差以及它们在毒性分类下游任务中的性能。结果表明,自动毒性评论检测模型性能的改善与这些模型中偏差的缓解正相关。在我们的工作中,带注意力机制的LSTM被证明是比CNN模型更好的建模策略。进一步的分析表明,在训练毒性评论检测模型时,fastText嵌入略优于GloVe嵌入。更深入的分析揭示了这样一个发现:即使模型具有较高的AUC分数,这类自动模型仍然对特定的身份群体存在明显偏见。最后,为了减轻毒性检测模型中的偏差,以毒性子类型作为辅助任务训练的多任务设置被证明是有用的,使AUC得分最多提升0.26%(相对提升6%)。 摘要:With surge in online platforms, there has been an upsurge in the user engagement on these platforms via comments and reactions. A large portion of such textual comments are abusive, rude and offensive to the audience. With machine learning systems in-place to check such comments coming onto platform, biases present in the training data gets passed onto the classifier leading to discrimination against a set of classes, religion and gender. In this work, we evaluate different classifiers and feature to estimate the bias in these classifiers along with their performance on downstream task of toxicity classification. Results show that improvement in performance of automatic toxic comment detection models is positively correlated to mitigating biases in these models. In our work, LSTM with attention mechanism proved to be a better modelling strategy than a CNN model. Further analysis shows that fastText embeddings are marginally preferable to GloVe embeddings when training models for toxic comment detection. Deeper analysis reveals the findings that such automatic models are particularly biased to specific identity groups even though the model has a high AUC score. Finally, in effort to mitigate bias in toxicity detection models, a multi-task setup trained with auxiliary task of toxicity sub-types proved to be useful leading to up to 0.26% (6% relative) gain in AUC scores.

识别/分类(2篇)

【1】 MobIE: A German Dataset for Named Entity Recognition, Entity Linking and Relation Extraction in the Mobility Domain 标题:MOBIE:移动域中用于命名实体识别、实体链接和关系提取的德国数据集 链接:https://arxiv.org/abs/2108.06955

作者:Leonhard Hennig,Phuc Tran Truong,Aleksandra Gabryszak 机构:German Research Center for Artificial Intelligence (DFKI), Speech and Language Technology Lab 备注:Accepted at KONVENS 2021. 5 pages, 3 figures, 5 tables 摘要:我们介绍了MobIE,一个人工标注了20种粗粒度和细粒度实体类型、并为可地理链接的实体提供实体链接信息的德语数据集。该数据集由3232条社交媒体文本和交通报告组成,共91K个词元,包含20.5K个标注实体,其中13.1K个链接到知识库。数据集的一个子集使用七种与移动出行相关的n元关系类型进行了人工标注,其余文档则使用基于Snorkel框架实现的弱监督标注方法进行标注。据我们所知,这是第一个结合了NER、EL和RE标注的德语数据集,因此可用于这些基本信息抽取任务的联合学习和多任务学习。我们已在 https://github.com/dfki-nlp/mobie 公开发布MobIE。 摘要:We present MobIE, a German-language dataset, which is human-annotated with 20 coarse- and fine-grained entity types and entity linking information for geographically linkable entities. The dataset consists of 3,232 social media texts and traffic reports with 91K tokens, and contains 20.5K annotated entities, 13.1K of which are linked to a knowledge base. A subset of the dataset is human-annotated with seven mobility-related, n-ary relation types, while the remaining documents are annotated using a weakly-supervised labeling approach implemented with the Snorkel framework. To the best of our knowledge, this is the first German-language dataset that combines annotations for NER, EL and RE, and thus can be used for joint and multi-task learning of these fundamental information extraction tasks. We make MobIE public at https://github.com/dfki-nlp/mobie.

【2】 Few-Sample Named Entity Recognition for Security Vulnerability Reports by Fine-Tuning Pre-Trained Language Models 标题:基于精调预训练语言模型的安全漏洞报告小样本命名实体识别 链接:https://arxiv.org/abs/2108.06590

作者:Guanqun Yang,Shay Dineen,Zhipeng Lin,Xueqing Liu 机构:Department of Computer Science, Stevens Institute of Technology, Hoboken NJ , USA 备注:Accepted by MLHat @ KDD '21 摘要:公共安全漏洞报告(如CVE报告)在维护计算机和网络系统方面发挥着重要作用。安全公司和管理员依靠这些报告中的信息来确定为客户开发和部署补丁的任务优先级。由于这些报告是非结构化文本,自动信息抽取(IE)可以通过将非结构化报告转换为结构化形式(例如软件名称、版本和漏洞类型)来帮助扩大处理规模。用于安全漏洞报告的自动化IE的现有工作通常依赖于大量标注的训练样本。然而,创建大规模标注训练集既昂贵又耗时。在这项工作中,我们首次提出在只有少量标注训练样本可用的情况下研究这个问题。特别是,我们研究了在我们的小规模训练数据集上微调几个最先进的预训练语言模型的性能。结果表明,通过预训练语言模型和精心调整的超参数,我们在这项任务上已经达到或略微优于最先进的系统。与先前工作[7]中先在主类别上微调、再迁移到其他类别的两步流程一致,若采用我们提出的方法,两个阶段所需的标注样本数量都会大幅减少:微调阶段从5758个减少到576个(减少90%),迁移学习阶段每个类别仅需64个标注样本(减少88.8%)。因此,我们的实验证明了少样本学习在安全漏洞报告NER任务上的有效性。这一结果为安全漏洞报告的少样本学习开辟了多个研究机会,本文对此进行了讨论。代码:https://github.com/guanqun-yang/FewVulnerability. 摘要:Public security vulnerability reports (e.g., CVE reports) play an important role in the maintenance of computer and network systems. Security companies and administrators rely on information from these reports to prioritize tasks on developing and deploying patches to their customers. Since these reports are unstructured texts, automatic information extraction (IE) can help scale up the processing by converting the unstructured reports to structured forms, e.g., software names and versions and vulnerability types. Existing works on automated IE for security vulnerability reports often rely on a large number of labeled training samples. However, creating massive labeled training set is both expensive and time consuming. In this work, for the first time, we propose to investigate this problem where only a small number of labeled training samples are available. In particular, we investigate the performance of fine-tuning several state-of-the-art pre-trained language models on our small training dataset. The results show that with pre-trained language models and carefully tuned hyperparameters, we have reached or slightly outperformed the state-of-the-art system on this task. Consistent with previous two-step process of first fine-tuning on main category and then transfer learning to others as in [7], if otherwise following our proposed approach, the number of required labeled samples substantially decrease in both stages: 90% reduction in fine-tuning from 5758 to 576, and 88.8% reduction in transfer learning with 64 labeled samples per category. Our experiments thus demonstrate the effectiveness of few-sample learning on NER for security vulnerability report. This result opens up multiple research opportunities for few-sample learning for security vulnerability reports, which is discussed in the paper. Code: https://github.com/guanqun-yang/FewVulnerability.

Word2Vec|文本|单词(1篇)

【1】 Who's Waldo? Linking People Across Text and Images 标题:沃尔多是谁?跨文本和图像链接人员 链接:https://arxiv.org/abs/2108.07253

作者:Claire Yuqing Cui,Apoorv Khandelwal,Yoav Artzi,Noah Snavely,Hadar Averbuch-Elor 机构:Cornell University ,Cornell Tech 备注:Published in ICCV 2021 (Oral). Project webpage: this https URL 摘要:我们提出了一个以人为中心的视觉定位(visual grounding)任务和基准数据集,即把图像说明中提到的人名与图像中的人物进行链接的问题。与之前主要以物体为中心的视觉定位研究不同,我们的新任务遮盖了图像说明中的人名,以鼓励在此类图文对上训练的方法关注上下文线索(如多人之间的丰富互动),而不是学习名字和外貌之间的关联。为了支持这项任务,我们引入了一个新的数据集Who's Waldo,它是从Wikimedia Commons上的图像-说明数据中自动挖掘而来的。我们提出了一种基于Transformer的方法,在该任务上优于多个强基线,并向研究社区发布我们的数据,以推动对同时考虑视觉与语言的上下文模型的研究。 摘要:We present a task and benchmark dataset for person-centric visual grounding, the problem of linking between people named in a caption and people pictured in an image. In contrast to prior work in visual grounding, which is predominantly object-based, our new task masks out the names of people in captions in order to encourage methods trained on such image-caption pairs to focus on contextual cues (such as rich interactions between multiple people), rather than learning associations between names and appearances. To facilitate this task, we introduce a new dataset, Who's Waldo, mined automatically from image-caption data on Wikimedia Commons. We propose a Transformer-based method that outperforms several strong baselines on this task, and are releasing our data to the research community to spur work on contextual models that consider both vision and language.

其他神经网络|深度学习|模型|建模(1篇)

【1】 What can Neural Referential Form Selectors Learn? 标题:神经参照形式选择器能学到什么? 链接:https://arxiv.org/abs/2108.06806

作者:Guanyi Chen,Fahime Same,Kees van Deemter 机构:♠Department of Information and Computing Sciences, Utrecht University, ♥Department of Linguistics, University of Cologne 备注:Long paper accepted at INLG 2021 摘要:尽管取得了令人鼓舞的结果,神经指称表达生成模型通常被认为缺乏透明度。我们对神经指称形式选择(RFS)模型进行了探测,以了解影响指称表达形式(RE form)的语言学特征在多大程度上被最先进的RFS模型学习和捕获。8个探测任务的结果表明,所有定义的特征都在一定程度上得到了学习。与指称状态和句法位置相关的探测任务表现出最高的性能。表现最低的是那些用于预测超出句子层面的语篇结构属性的探测模型。 摘要:Despite achieving encouraging results, neural Referring Expression Generation models are often thought to lack transparency. We probed neural Referential Form Selection (RFS) models to find out to what extent the linguistic features influencing the RE form are learnt and captured by state-of-the-art RFS models. The results of 8 probing tasks show that all the defined features were learnt to some extent. The probing tasks pertaining to referential status and syntactic position exhibited the highest performance. The lowest performance was achieved by the probing models designed to predict discourse structure properties beyond the sentence level.

其他(4篇)

【1】 MMChat: Multi-Modal Chat Dataset on Social Media 标题:MMChat:社交媒体上的多模态聊天数据集 链接:https://arxiv.org/abs/2108.07154

作者:Yinhe Zheng,Guanyi Chen,Xin Liu,Ke Lin 机构:♠ Alibaba Group, ♣Department of Information and Computing Sciences, Utrecht University, ♥Samsung Research China - Beijing (SRC-B) 摘要:在对话中融入多模态语境是发展更具吸引力的对话系统的一个重要步骤。在这项工作中,我们通过引入MMChat来探索这一方向:一个大规模的多模态对话语料库(3240万个原始对话和120.84K个过滤后的对话)。与以往通过众包或从虚构电影中收集的语料库不同,MMChat包含从社交媒体真实对话中收集的基于图像的对话,并在其中观察到了稀疏性问题。具体而言,在日常交流中,由图像引发的对话可能会随着对话的进行转向一些与图像无关的话题。我们开发了一个基准模型,通过在图像特征上引入注意力路由机制,在对话生成任务中应对这一问题。实验证明了融合图像特征的有用性以及该方法在处理图像特征稀疏性方面的有效性。 摘要:Incorporating multi-modal contexts in conversation is an important step for developing more engaging dialogue systems. In this work, we explore this direction by introducing MMChat: a large scale multi-modal dialogue corpus (32.4M raw dialogues and 120.84K filtered dialogues). Unlike previous corpora that are crowd-sourced or collected from fictitious movies, MMChat contains image-grounded dialogues collected from real conversations on social media, in which the sparsity issue is observed. Specifically, image-initiated dialogues in common communications may deviate to some non-image-grounded topics as the conversation proceeds. We develop a benchmark model to address this issue in dialogue generation tasks by adapting the attention routing mechanism on image features. Experiments demonstrate the usefulness of incorporating image features and the effectiveness in handling the sparsity of image features.

【2】 Clustering Filipino Disaster-Related Tweets Using Incremental and Density-Based Spatiotemporal Algorithm with Support Vector Machines for Needs Assessment 2 标题:使用增量和基于密度的时空算法和支持向量机对菲律宾灾难相关推文进行聚类以进行需求评估2 链接:https://arxiv.org/abs/2108.06853

作者:Ocean M. Barba,Franz Arvin T. Calbay,Angelica Jane S. Francisco,Angel Luis D. Santos,Charmaine S. Ponay 摘要:社交媒体在人们获取信息和相互交流方面发挥了巨大作用。它帮助人们表达他们在灾难中的需求。由于通过Twitter发布的帖子在默认情况下是可以公开访问的,因此Twitter是灾难发生时最有用的社交媒体网站之一。因此,这项研究旨在评估菲律宾人在灾难期间在推特上表达的需求。使用Naïve Bayes分类器收集数据并将其分类为与灾害相关或无关。在此之后,使用增量聚类算法对与灾难相关的tweet按灾难类型进行聚类,然后使用基于密度的时空聚类算法根据tweet的位置和时间进行子聚类。最后,使用支持向量机,根据表达的需求对推特进行分类,如避难所、救援、救济、现金、祈祷和其他。研究结果表明,增量聚类算法和基于密度的时空聚类算法能够对tweet进行聚类,f-measure得分分别为47.20%和82.28%。此外,Naïve Bayes和支持向量机能够分别以97%的平均f-度量分数和77.57%的平均准确率进行分类。 摘要:Social media has played a huge part on how people get informed and communicate with one another. It has helped people express their needs due to distress especially during disasters. Because posts made through it are publicly accessible by default, Twitter is among the most helpful social media sites in times of disaster. With this, the study aims to assess the needs expressed during calamities by Filipinos on Twitter. Data were gathered and classified as either disaster-related or unrelated with the use of Naïve Bayes classifier. After this, the disaster-related tweets were clustered per disaster type using Incremental Clustering Algorithm, and then sub-clustered based on the location and time of the tweet using Density-based Spatiotemporal Clustering Algorithm. Lastly, using Support Vector Machines, the tweets were classified according to the expressed need, such as shelter, rescue, relief, cash, prayer, and others. After conducting the study, results showed that the Incremental Clustering Algorithm and Density-Based Spatiotemporal Clustering Algorithm were able to cluster the tweets with f-measure scores of 47.20% and 82.28% respectively. Also, the Naïve Bayes and Support Vector Machines were able to classify with an average f-measure score of 97% and an average accuracy of 77.57% respectively.
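下面用 scikit-learn 给出该流水线三个环节的示意实现(特征与参数均为假设,增量聚类环节未包含):(1)朴素贝叶斯判断推文是否与灾害相关;(2)DBSCAN按位置与时间做子聚类;(3)SVM对表达的需求分类。

```python
# 示意代码:灾害推文需求评估流水线的三个环节
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.cluster import DBSCAN
from sklearn.pipeline import make_pipeline

# (1) 灾害相关 / 不相关
related_clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
# (3) 需求类别:避难、救援、救济、现金、祈祷、其他
need_clf = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))

def spatiotemporal_subclusters(coords_and_time, eps=0.5, min_samples=5):
    """(2) 基于密度的时空子聚类;coords_and_time 为 (n, 3) 数组,
    列依次为[纬度, 经度, 归一化时间],应先做尺度归一化"""
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(np.asarray(coords_and_time))
```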

【3】 Findings of the LoResMT 2021 Shared Task on COVID and Sign Language for Low-resource Languages 标题:LoResMT 2021关于低资源语言的COVID和手语共享任务的研究结果 链接:https://arxiv.org/abs/2108.06598

作者:Atul Kr. Ojha,Chao-Hong Liu,Katharina Kann,John Ortega,Sheetal Shatam,Theodorus Fransen 备注:10 pages 摘要:我们介绍了LoResMT 2021共享任务的结果,该任务侧重于低资源口语和手语的新冠肺炎(COVID-19)数据的机器翻译(MT)。这项任务是作为第四届低资源语言机器翻译技术研讨会(LoResMT)的一部分组织的。我们提供并公开了平行语料库,涵盖以下语向:英语↔爱尔兰语、英语↔马拉地语和台湾手语↔繁体中文。训练数据分别由8112、20933和128608段组成。马拉地语和英语还有额外的单语数据集,由21901段组成。这里给出的结果基于总共八个团队的参赛作品。三个团队提交了英语↔爱尔兰语的系统,五个团队提交了英语↔马拉地语的系统。不幸的是,没有团队为台湾手语↔繁体中文任务提交系统。以BLEU衡量的最高系统性能分别为:英语→爱尔兰语36.0、爱尔兰语→英语34.6、英语→马拉地语24.2、马拉地语→英语31.3。 摘要:We present the findings of the LoResMT 2021 shared task which focuses on machine translation (MT) of COVID-19 data for both low-resource spoken and sign languages. The organization of this task was conducted as part of the fourth workshop on technologies for machine translation of low resource languages (LoResMT). Parallel corpora are presented and publicly available which include the following directions: English↔Irish, English↔Marathi, and Taiwanese Sign Language↔Traditional Chinese. Training data consists of 8112, 20933 and 128608 segments, respectively. There are additional monolingual data sets for Marathi and English that consist of 21901 segments. The results presented here are based on entries from a total of eight teams. Three teams submitted systems for English↔Irish while five teams submitted systems for English↔Marathi. Unfortunately, there were no system submissions for the Taiwanese Sign Language↔Traditional Chinese task. Maximum system performance was computed using BLEU and is as follows: 36.0 for English-Irish, 34.6 for Irish-English, 24.2 for English-Marathi, and 31.3 for Marathi-English.

【4】 A New Entity Extraction Method Based on Machine Reading Comprehension 标题:一种基于机器阅读理解的实体抽取新方法 链接:https://arxiv.org/abs/2108.06444

作者:Xiaobo Jiang,Kun He,Jiajun He,Guangyu Yan 机构:School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China 摘要:实体抽取是自然语言处理中从海量文本中获取信息的关键技术。现有模型中各部分之间的交互不符合人类阅读理解的方式,从而限制了模型的理解能力,也会因推理问题而遗漏或误判答案(即目标实体)。我们提出了一种有效的基于MRC的实体抽取模型——MRC-I2DP,它使用所提出的门控注意力吸引机制来调整对文本对中各部分的关注,为多级交互注意力计算构造问题与推理过程,以提升目标实体的召回。它还使用所提出的2D概率编码模块、TALU函数和掩码机制来加强对所有可能目标的检测,从而提高预测的概率和准确性。实验证明,MRC-I2DP在来自科学领域和公共领域的7个数据集上达到了总体最先进水平,F1值较对比模型提升了2.1%~10.4%。 摘要:Entity extraction is a key technology for obtaining information from massive texts in natural language processing. The further interaction between model components does not meet the standards of human reading comprehension, thus limiting the understanding of the model, and also leading to the omission or misjudgment of the answer (i.e. the target entity) due to reasoning problems. We propose an effective MRC-based entity extraction model, MRC-I2DP, which uses the proposed gated attention-attracting mechanism to adjust the restoration of each part of the text pair, creating questions and reasoning for multi-level interactive attention calculations to increase the recall of target entities. It also uses the proposed 2D probability coding module, TALU function and mask mechanism to strengthen the detection of all possible targets, thereby improving the probability and accuracy of prediction. Experiments have proved that MRC-I2DP achieves overall state-of-the-art results on 7 datasets from the scientific and public domains, achieving a performance improvement of 2.1%~10.4% in F1 compared to competing models.
