cs.CL: 29 papers in total today
Transformer (3 papers)
【1】 Linear algebra with transformers
Link: https://arxiv.org/abs/2112.01898
Authors: François Charton
Abstract: Most applications of transformers to mathematics, from integration to theorem proving, focus on symbolic computation. In this paper, we show that transformers can be trained to perform numerical calculations with high accuracy. We consider problems of linear algebra: matrix transposition, addition, multiplication, eigenvalues and vectors, singular value decomposition, and inversion. Training small transformers (up to six layers) over datasets of random matrices, we achieve high accuracies (over 90%) on all problems. We also show that trained models can generalize out of their training distribution, and that out-of-domain accuracy can be greatly improved by working from more diverse datasets (in particular, by training from matrices with non-independent and identically distributed coefficients). Finally, we show that few-shot learning can be leveraged to re-train models to solve larger problems.
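As a rough illustration of how such numerical tasks can be posed as sequence transduction (a hypothetical encoding, not the paper's code; `encode_matrix` and its rounding scheme are assumptions), a matrix can be serialized into a token sequence and paired with the serialization of, e.g., its transpose:

```python
import numpy as np

def encode_matrix(m, precision=2):
    """Serialize a matrix into a flat token sequence: a shape header, then
    one token per rounded entry. A hypothetical scheme, not the paper's."""
    rows, cols = m.shape
    tokens = [f"R{rows}", f"C{cols}"]
    tokens += [f"{x:.{precision}f}" for x in m.flatten()]
    return tokens

# A training pair for the transposition task: input matrix -> its transpose.
rng = np.random.default_rng(0)
m = rng.uniform(-10, 10, size=(3, 2))
src, tgt = encode_matrix(m), encode_matrix(m.T)
print(src)
print(tgt)
```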
【2】 TransCouplet: Transformer based Chinese Couplet Generation
Link: https://arxiv.org/abs/2112.01707
Authors: Kuan-Yu Chiang, Shihao Lin, Joe Chen, Qian Yin, Qizhen Jin
Abstract: The Chinese couplet is a special form of poetry composed of complex syntax in ancient Chinese. Due to the complexity of its semantic and grammatical rules, creating a suitable couplet is a formidable challenge. This paper presents a transformer-based sequence-to-sequence couplet generation model. With the utilization of AnchiBERT, the model is able to capture ancient Chinese language understanding. Moreover, we evaluate glyph, PinYin, and part-of-speech tagging on the couplet grammatical rules to further improve the model.
【3】 LMR-CBT: Learning Modality-fused Representations with CB-Transformer for Multimodal Emotion Recognition from Unaligned Multimodal Sequences
Link: https://arxiv.org/abs/2112.01697
Authors: Ziwang Fu, Feng Liu, Hanyang Wang, Siyuan Shen, Jiahao Zhang, Jiayin Qi, Xiangling Fu, Aimin Zhou
Note: 9 pages, Figure 2, Table 5
Abstract: Learning modality-fused representations and processing unaligned multimodal sequences are meaningful and challenging in multimodal emotion recognition. Existing approaches use directional pairwise attention or a message hub to fuse language, visual, and audio modalities. However, those approaches introduce information redundancy when fusing features and are inefficient without considering the complementarity of modalities. In this paper, we propose an efficient neural network to learn modality-fused representations with CB-Transformer (LMR-CBT) for multimodal emotion recognition from unaligned multimodal sequences. Specifically, we first perform feature extraction for the three modalities respectively to obtain the local structure of the sequences. Then, we design a novel transformer with cross-modal blocks (CB-Transformer) that enables complementary learning of different modalities, mainly divided into local temporal learning, cross-modal feature fusion, and global self-attention representations. In addition, we splice the fused features with the original features to classify the emotions of the sequences. Finally, we conduct word-aligned and unaligned experiments on three challenging datasets: IEMOCAP, CMU-MOSI, and CMU-MOSEI. The experimental results show the superiority and efficiency of our proposed method in both settings. Compared with the mainstream methods, our approach reaches the state of the art with a minimal number of parameters.
BERT (1 paper)
【1】 Siamese BERT-based Model for Web Search Relevance Ranking Evaluated on a New Czech Dataset
Link: https://arxiv.org/abs/2112.01810
Authors: Matěj Kocián, Jakub Náplava, Daniel Štancl, Vladimír Kadlec
Note: Accepted at the Thirty-Fourth Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-22). IAAI Innovative Application Award. 9 pages, 3 figures, 8 tables
Abstract: Web search engines focus on serving highly relevant results within hundreds of milliseconds. Pre-trained language transformer models such as BERT are therefore hard to use in this scenario due to their high computational demands. We present our real-time approach to the document ranking problem leveraging a BERT-based siamese architecture. The model is already deployed in a commercial search engine and it improves production performance by more than 3%. For further research and evaluation, we release DaReCzech, a unique data set of 1.6 million Czech user query-document pairs with manually assigned relevance levels. We also release Small-E-Czech, an Electra-small language model pre-trained on a large Czech corpus. We believe this data will support endeavours of both the search relevance and the multilingual-focused research communities.
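A minimal sketch of the siamese scoring idea: encode query and documents with the same encoder, mean-pool, and rank by cosine similarity. The deployed system uses a Czech-specific model such as Small-E-Czech; `bert-base-multilingual-cased` below is only a stand-in, and the pooling choice is an assumption.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
enc = AutoModel.from_pretrained("bert-base-multilingual-cased")

def embed(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state       # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)      # mean-pool real tokens
    return (hidden * mask).sum(1) / mask.sum(1)

query = embed(["levné letenky praha"])
docs = embed(["Levné letenky do celého světa", "Recepty na svíčkovou"])
scores = torch.nn.functional.cosine_similarity(query, docs)  # ranking scores
print(scores)
```

Because documents are encoded independently of the query, their embeddings can be precomputed offline, which is what makes the siamese setup viable for real-time ranking.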
QA | VQA | Question Answering | Dialogue (2 papers)
【1】 Shapes of Emotions: Multimodal Emotion Recognition in Conversations via Emotion Shifts
Link: https://arxiv.org/abs/2112.01938
Authors: Harsh Agarwal, Keshav Bansal, Abhinav Joshi, Ashutosh Modi
Note: 13 pages
Abstract: Emotion Recognition in Conversations (ERC) is an important and active research problem. Recent work has shown the benefits of using multiple modalities (e.g., text, audio, and video) for the ERC task. In a conversation, participants tend to maintain a particular emotional state unless some external stimulus evokes a change. There is a continuous ebb and flow of emotions in a conversation. Inspired by this observation, we propose a multimodal ERC model and augment it with an emotion-shift component. The proposed emotion-shift component is modular and can be added to any existing multimodal ERC model (with a few modifications) to improve emotion recognition. We experiment with different variants of the model, and results show that the inclusion of the emotion-shift signal helps the model to outperform existing multimodal models for ERC, achieving state-of-the-art performance on the MOSEI and IEMOCAP datasets.
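A minimal sketch of what such a modular add-on could look like (sizes, names, and the pairwise formulation are assumptions, not the paper's exact architecture): a head that predicts, for each adjacent utterance pair, whether the emotion shifts, whose loss would be added to the base ERC loss.

```python
import torch
import torch.nn as nn

class EmotionShiftHead(nn.Module):
    """Hypothetical modular component: given fused utterance representations,
    predict for each adjacent pair whether the emotion shifts (binary)."""
    def __init__(self, dim):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                    nn.Linear(dim, 1))

    def forward(self, utt_reps):                 # (T, dim), one per utterance
        pairs = torch.cat([utt_reps[:-1], utt_reps[1:]], dim=-1)
        return self.scorer(pairs).squeeze(-1)    # shift logits, length T-1

# Usage: the shift loss would be added to the base ERC classification loss.
reps = torch.randn(8, 256)                       # stand-in fused features
head = EmotionShiftHead(256)
shift_logits = head(reps)
shift_labels = torch.randint(0, 2, (7,)).float()
aux_loss = nn.functional.binary_cross_entropy_with_logits(shift_logits,
                                                           shift_labels)
print(aux_loss.item())
```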
【2】 MetaQA: Combining Expert Agents for Multi-Skill Question Answering
Link: https://arxiv.org/abs/2112.01922
Authors: Haritz Puerto, Gözde Gül Şahin, Iryna Gurevych
Abstract: The recent explosion of question answering (QA) datasets and models has increased the interest in the generalization of models across multiple domains and formats, by either training models on multiple datasets or combining multiple models. We argue that despite the promising results of multi-dataset models, some domains or QA formats may require specific architectures, and thus the adaptability of these models might be limited. In addition, current approaches for combining models disregard cues such as question-answer compatibility. In this work, we propose to combine expert agents with a novel, flexible, and training-efficient architecture that considers questions, answer predictions, and answer-prediction confidence scores to select the best answer among a list of answer candidates. Through quantitative and qualitative experiments, we show that our model i) creates a collaboration between agents that outperforms previous multi-agent and multi-dataset approaches in both in-domain and out-of-domain scenarios, ii) is extremely data-efficient to train, and iii) can be adapted to any QA format.
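A toy stand-in for the selection step: each expert agent returns an answer and a confidence, and a selector picks the best candidate. The paper trains a model over the question, answer predictions, and confidence scores jointly; the hand-set agent priors below are purely illustrative.

```python
# Toy answer selection over expert-agent candidates. A real selector would
# encode `question` and each answer; here only confidences and fixed agent
# priors (assumptions, learned in the paper) enter the score.
def select_answer(question, candidates, agent_prior):
    def score(cand):
        agent, _answer, conf = cand
        return conf * agent_prior.get(agent, 0.5)
    return max(candidates, key=score)

candidates = [
    ("extractive_qa", "Marie Curie", 0.91),
    ("multiple_choice", "Pierre Curie", 0.55),
    ("abstractive_qa", "Marie Sklodowska Curie", 0.87),
]
prior = {"extractive_qa": 0.8, "abstractive_qa": 0.7, "multiple_choice": 0.4}
print(select_answer("Who discovered polonium?", candidates, prior))
```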
Machine Translation (1 paper)
【1】 Multitask Finetuning for Improving Neural Machine Translation in Indian Languages
Link: https://arxiv.org/abs/2112.01742
Authors: Shaily Desai, Atharva Kshirsagar, Manisha Marathe
Abstract: Transformer-based language models have led to impressive results across all domains of Natural Language Processing. Pretraining these models on language modeling tasks and finetuning them on downstream tasks such as text classification, question answering, and neural machine translation has consistently shown exemplary results. In this work, we propose a multitask finetuning methodology which combines the bilingual machine translation task with an auxiliary causal language modeling task to improve performance on the former task for Indian languages. We conduct an empirical study on three language pairs, Marathi-Hindi, Marathi-English, and Hindi-English, where we compare the multitask finetuning approach to the standard finetuning approach, for which we use the mBART50 model. Our study indicates that the multitask finetuning method could be a better technique than standard finetuning, and could improve bilingual machine translation across language pairs.
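A minimal sketch of one way to combine the two objectives with mBART-50. This is not the paper's training code: the auxiliary causal LM task is approximated here as monolingual reconstruction with the same seq2seq model, and the loss weight `aux_weight` is an assumption.

```python
import torch
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

tok = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50",
                                           src_lang="mr_IN", tgt_lang="hi_IN")
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")
opt = torch.optim.AdamW(model.parameters(), lr=3e-5)

def multitask_step(src, tgt, mono, aux_weight=1.0):
    # Translation loss: Marathi source -> Hindi target.
    batch = tok(src, text_target=tgt, return_tensors="pt", padding=True)
    mt_loss = model(**batch).loss
    # Auxiliary LM-style loss, approximated as monolingual reconstruction.
    mono_batch = tok(mono, text_target=mono, return_tensors="pt", padding=True)
    lm_loss = model(**mono_batch).loss
    loss = mt_loss + aux_weight * lm_loss
    loss.backward(); opt.step(); opt.zero_grad()
    return mt_loss.item(), lm_loss.item()
```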
Semantic Analysis (1 paper)
【1】 Semantic Segmentation of Legal Documents via Rhetorical Roles
Link: https://arxiv.org/abs/2112.01836
Authors: Vijit Malik, Rishabh Sanjay, Shouvik Kumar Guha, Shubham Kumar Nigam, Angshuman Hazarika, Arnab Bhattacharya, Ashutosh Modi
Note: 16 pages
Abstract: Legal documents are unstructured, use legal jargon, and have considerable length, making them difficult to process automatically via conventional text processing techniques. A legal document processing system would benefit substantially if the documents could be semantically segmented into coherent units of information. This paper proposes a Rhetorical Roles (RR) system for segmenting a legal document into semantically coherent units: facts, arguments, statute, issue, precedent, ruling, and ratio. With the help of legal experts, we propose a set of 13 fine-grained rhetorical role labels and create a new corpus of legal documents annotated with the proposed RR. We develop a system for segmenting a document into rhetorical role units. In particular, we develop a multitask learning-based deep learning model with document rhetorical role label shift as an auxiliary task for segmenting a legal document. We experiment extensively with various deep learning models for predicting rhetorical roles in a document, and the proposed model shows superior performance over the existing models. Further, we apply RR for predicting the judgment of legal cases and show that the use of RR enhances the prediction compared to transformer-based models.
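A minimal sketch of the multitask idea: a shared sentence-level encoder with one head for rhetorical-role labels and an auxiliary head predicting whether the label shifts at each sentence boundary. All sizes, the encoder choice, and the pairwise formulation of the shift head are assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class RRSegmenter(nn.Module):
    """Shared encoder with a rhetorical-role head and an auxiliary
    label-shift head over adjacent sentence pairs (a hypothetical setup)."""
    def __init__(self, dim=256, n_roles=13):
        super().__init__()
        self.encoder = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.role_head = nn.Linear(2 * dim, n_roles)
        self.shift_head = nn.Linear(4 * dim, 2)

    def forward(self, sent_reps):                    # (B, T, dim)
        h, _ = self.encoder(sent_reps)
        roles = self.role_head(h)                    # per-sentence RR logits
        pairs = torch.cat([h[:, :-1], h[:, 1:]], -1)
        shifts = self.shift_head(pairs)              # aux: boundary shifts
        return roles, shifts

model = RRSegmenter()
roles, shifts = model(torch.randn(2, 10, 256))       # 2 docs, 10 sentences
print(roles.shape, shifts.shape)                     # (2, 10, 13), (2, 9, 2)
```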
Summarization | Information Extraction (2 papers)
【1】 The Influence of Data Pre-processing and Post-processing on Long Document Summarization
Link: https://arxiv.org/abs/2112.01660
Authors: Xinwei Du, Kailun Dong, Yuchen Zhang, Yongsheng Li, Ruei-Yu Tsay
Abstract: Long document summarization is an important and hard task in the field of natural language processing. Good performance on long document summarization indicates that a model has a decent understanding of human language. Currently, most research focuses on how to modify the attention mechanism of the transformer to achieve a higher ROUGE score. Studies of data pre-processing and post-processing are relatively few. In this paper, we use two pre-processing methods and a post-processing method, and analyze the effects of these methods on various long document summarization models.
【2】 InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation
Link: https://arxiv.org/abs/2112.01589
Authors: Pierre Colombo, Chloé Clavel, Pablo Piantanida
Abstract: Assessing the quality of natural language generation systems through human annotation is very expensive. Additionally, human annotation campaigns are time-consuming and include non-reusable human labour. In practice, researchers rely on automatic metrics as a proxy of quality. In the last decade, many string-based metrics (e.g., BLEU) have been introduced. However, such metrics usually rely on exact matches and thus do not robustly handle synonyms. In this paper, we introduce InfoLM, a family of untrained metrics that can be viewed as string-based metrics addressing the aforementioned flaws thanks to a pre-trained masked language model. This family of metrics also makes use of information measures, allowing the adaptation of InfoLM to various evaluation criteria. Using direct assessment, we demonstrate that InfoLM achieves statistically significant improvement and over 10 points of correlation gains in many configurations on both summarization and data2text generation.
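A rough sketch of the underlying idea: compare masked-LM token distributions induced by the candidate and the reference with an information measure. InfoLM defines a whole family of measures; the per-position masking, the bert-base-uncased choice, and KL divergence below are simplifying assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def mlm_distribution(text):
    """Average the masked LM's predicted token distribution over positions,
    masking one position at a time (a simplification of InfoLM's setup)."""
    ids = tok(text, return_tensors="pt")["input_ids"]
    dists = []
    for i in range(1, ids.shape[1] - 1):          # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[0, i] = tok.mask_token_id
        with torch.no_grad():
            logits = mlm(input_ids=masked).logits[0, i]
        dists.append(torch.softmax(logits, dim=-1))
    return torch.stack(dists).mean(0)

p = mlm_distribution("the cat sat on the mat")
q = mlm_distribution("a feline rested on the rug")
kl = torch.sum(p * (torch.log(p) - torch.log(q)))  # one possible info measure
print(kl.item())
```

Because the comparison happens between distributions rather than surface strings, synonymous rewordings can still yield a low divergence, which is the motivation stated in the abstract.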
Reasoning | Analysis | Understanding | Explanation (1 paper)
【1】 Probing Linguistic Information For Logical Inference In Pre-trained Language Models
Link: https://arxiv.org/abs/2112.01753
Authors: Zeming Chen, Qiyue Gao
Note: Accepted at AAAI 2022
Abstract: Progress in pre-trained language models has led to a surge of impressive results on downstream tasks for natural language understanding. Recent work on probing pre-trained language models uncovered a wide range of linguistic properties encoded in their contextualized representations. However, it is unclear whether they encode semantic knowledge that is crucial to symbolic inference methods. We propose a methodology for probing linguistic information for logical inference in pre-trained language model representations. Our probing datasets cover a list of linguistic phenomena required by major symbolic inference systems. We find that (i) pre-trained language models do encode several types of linguistic information for inference, but there are also some types of information that are weakly encoded, and (ii) language models can effectively learn missing linguistic information through fine-tuning. Overall, our findings provide insights into which aspects of linguistic information for logical inference language models and their pre-training procedures capture. Moreover, we have demonstrated language models' potential as semantic and background knowledge bases for supporting symbolic inference methods.
GAN | Adversarial | Attacks | Generation (2 papers)
【1】 Blackbox Untargeted Adversarial Testing of Automatic Speech Recognition Systems
Link: https://arxiv.org/abs/2112.01821
Authors: Xiaoliang Wu, Ajitha Rajan
Note: 10 pages, 6 figures, and 7 tables
Abstract: Automatic speech recognition (ASR) systems are prevalent, particularly in applications for voice navigation and voice control of domestic appliances. The computational core of ASRs are deep neural networks (DNNs) that have been shown to be susceptible to adversarial perturbations, easily misused by attackers to generate malicious outputs. To help test the correctness of ASRs, we propose techniques that automatically generate blackbox (agnostic to the DNN), untargeted adversarial attacks that are portable across ASRs. Much of the existing work on adversarial ASR testing focuses on targeted attacks, i.e., generating audio samples given an output text. Targeted techniques are not portable, being customised to the structure of DNNs (whitebox) within a specific ASR. In contrast, our method attacks the signal processing stage of the ASR pipeline, which is shared across most ASRs. Additionally, we ensure the generated adversarial audio samples have no human-audible difference by manipulating the acoustic signal using a psychoacoustic model that maintains the signal below the thresholds of human perception. We evaluate the portability and effectiveness of our techniques using three popular ASRs and three input audio datasets, using the metrics of WER of output text, similarity to the original audio, and attack success rate on different ASRs. We found our testing techniques were portable across ASRs, with the adversarial audio samples producing high success rates, WERs, and similarities to the original audio.
【2】 PLSUM: Generating PT-BR Wikipedia by Summarizing Multiple Websites
Link: https://arxiv.org/abs/2112.01591
Authors: André Seidel Oliveira, Anna Helena Reali Costa
Note: Published at the Encontro Nacional de Inteligência Artificial e Computacional (ENIAC) 2021 conference
Abstract: Wikipedia is an important free source of intelligible knowledge. Despite that, the Brazilian Portuguese Wikipedia still lacks descriptions for many subjects. In an effort to expand the Brazilian Wikipedia, we contribute PLSum, a framework for generating wiki-like abstractive summaries from multiple descriptive websites. The framework has an extractive stage followed by an abstractive one. In particular, for the abstractive stage, we fine-tune and compare two recent variations of the Transformer neural network, PTT5 and Longformer. To fine-tune and evaluate the model, we created a dataset with thousands of examples, linking reference websites to Wikipedia. Our results show that it is possible to generate meaningful abstractive summaries from Brazilian Portuguese web content.
Detection (1 paper)
【1】 HS-BAN: A Benchmark Dataset of Social Media Comments for Hate Speech Detection in Bangla
Link: https://arxiv.org/abs/2112.01902
Authors: Nauros Romim, Mosahed Ahmed, Md Saiful Islam, Arnab Sen Sharma, Hriteshwar Talukder, Mohammad Ruhul Amin
Note: Submitted to ICON 21 (Rejected)
Abstract: In this paper, we present HS-BAN, a binary-class hate speech (HS) dataset in the Bangla language consisting of more than 50,000 labeled comments, of which 40.17% are hate speech and the rest are non-hate speech. While preparing the dataset, a strict and detailed annotation guideline was followed to reduce human annotation bias. The HS dataset was also preprocessed linguistically to extract different types of slang that people currently write using symbols, acronyms, or alternative spellings. These slang words were further categorized into traditional and non-traditional slang lists and are included in the results of this paper. We explored traditional linguistic features and neural network-based methods to develop a benchmark system for hate speech detection in the Bangla language. Our experimental results show that existing word embedding models trained on informal text perform better than those trained on formal text. Our benchmark shows that a Bi-LSTM model on top of FastText informal word embeddings achieved an 86.78% F1-score. We will make the dataset available for public use.
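A minimal sketch of the benchmark architecture described above: a Bi-LSTM over pretrained word vectors feeding a binary hate/non-hate head. Random vectors stand in for the FastText informal-text embeddings, and all dimensions and the pooling choice are assumptions.

```python
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    """Bi-LSTM over frozen pretrained word vectors, mean-pooled into a
    binary classification head (a sketch, not the authors' exact model)."""
    def __init__(self, embeddings, hidden=128):
        super().__init__()
        self.emb = nn.Embedding.from_pretrained(embeddings, freeze=True)
        self.lstm = nn.LSTM(embeddings.shape[1], hidden,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 2)

    def forward(self, token_ids):
        h, _ = self.lstm(self.emb(token_ids))
        return self.out(h.mean(dim=1))        # mean-pool, then classify

# Random vectors stand in for FastText embeddings trained on informal text.
fasttext_like = torch.randn(30000, 300)
model = BiLSTMClassifier(fasttext_like)
logits = model(torch.randint(0, 30000, (4, 32)))   # batch of 4 comments
print(logits.shape)                                 # torch.Size([4, 2])
```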
Recognition / Classification (1 paper)
【1】 Multilingual Text Classification for Dravidian Languages
Link: https://arxiv.org/abs/2112.01705
Authors: Xiaotian Lin, Nankai Lin, Kanoksak Wattanachote, Shengyi Jiang, Lianxi Wang
Abstract: As the fourth largest language family in the world, the Dravidian languages have become a research hotspot in natural language processing (NLP). Although the Dravidian family contains a large number of languages, there are relatively few publicly available resources. Moreover, for text classification, a basic task of natural language processing, how to handle the many Dravidian languages together remains a major difficulty in Dravidian NLP. Hence, to address these problems, we propose a multilingual text classification framework for the Dravidian languages. On the one hand, the framework uses the LaBSE pre-trained model as its base model. Aiming at the problem of text information bias in multi-task learning, we propose using an MLM strategy to select language-specific words and adversarial training to perturb them. On the other hand, in view of the problem that the model cannot well recognize and utilize the correlation among languages, we further propose a language-specific representation module to enrich the semantic information available to the model. The experimental results demonstrate that the proposed framework performs strongly on multilingual text classification tasks, with each strategy achieving certain improvements.
Word2Vec | Text | Words (2 papers)
【1】 Evaluating NLP Systems On a Novel Cloze Task: Judging the Plausibility of Possible Fillers in Instructional Texts
Link: https://arxiv.org/abs/2112.01867
Authors: Zizhao Hu, Ravikiran Chanumolu, Xingyu Lin, Nayela Ayaz, Vincent Chi
Abstract: The cloze task is widely used to evaluate an NLP system's language understanding ability. However, most existing cloze tasks only require NLP systems to give the relative best prediction for each input data sample, rather than the absolute quality of all possible predictions, in a consistent way across the input domain. Thus a new task is proposed: predicting whether a filler word in a cloze task is a good, neutral, or bad candidate. Complicated versions can be extended to predicting more discrete classes or continuous scores. We focus on subtask A of SemEval 2022 Task 7, explore some possible architectures to solve this new task, provide a detailed comparison of them, and propose an ensemble method to improve traditional models on this new task.
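One plausible baseline for this task (not one of the paper's systems): score a single-token filler by its masked-LM log-probability in the slot. The model choice and the thresholds mapping scores to good/neutral/bad are assumptions, and multi-token fillers would need extra handling.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def filler_score(context, filler):
    """Log-probability of `filler` in the [MASK] slot; thresholds mapping
    this score to good/neutral/bad would be tuned on the task data.
    Works for fillers that are a single wordpiece only."""
    text = context.replace("[MASK]", tok.mask_token)
    ids = tok(text, return_tensors="pt")["input_ids"]
    pos = (ids[0] == tok.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = mlm(input_ids=ids).logits[0, pos]
    filler_id = tok.convert_tokens_to_ids(filler)
    return torch.log_softmax(logits, dim=-1)[filler_id].item()

ctx = "Preheat the [MASK] to 180 degrees before baking."
for cand in ["oven", "fridge", "hammer"]:
    print(cand, filler_score(ctx, cand))
```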
【2】 A Proposal of Automatic Error Correction in Text
Link: https://arxiv.org/abs/2112.01846
Authors: Wulfrano A. Luna-Ramírez, Carlos R. Jaimez-González
Abstract: The amount of information that can be stored in electronic media grows daily. Much of it is obtained mainly by typing, such as the wealth of information from Web 2.0 sites, or by scanning and processing with Optical Character Recognition software, like the texts of libraries and government offices. Both processes introduce errors into texts, making it difficult to use the data for purposes other than reading, i.e., the processing of those texts by other applications such as e-learning, language learning, electronic tutorials, data mining, information retrieval, and even more specialized systems such as typhlologic software, specifically applications oriented to blind people, like automatic reading, where the text should be as error-free as possible in order to ease the text-to-speech task, and so on. This paper presents an application for the automatic recognition and correction of orthographic errors in electronic texts. The task is composed of three stages: a) error detection; b) candidate correction generation; and c) correction, i.e., selection of the best candidate. The proposal is based on part-of-speech text categorization, word similarity, word dictionaries, statistical measures, morphological analysis, and an n-gram-based language model of Spanish.
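A minimal sketch of stages (a) and (b): flag words missing from a lexicon and generate edit-distance-1 correction candidates. The toy lexicon and alphabet below are assumptions, and stage (c)'s ranking by POS, similarity, and n-gram language-model scores, as the proposal describes, is omitted.

```python
# Norvig-style candidate generation at edit distance 1, filtered by a lexicon.
def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyzáéíóúüñ"):
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    return set(deletes + replaces + inserts + swaps)

lexicon = {"casa", "cosa", "caso", "masa"}   # toy Spanish dictionary
typo = "casq"                                 # not in lexicon, so flagged
candidates = sorted(edits1(typo) & lexicon)
print(candidates)                             # ['casa', 'caso']
```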
Other Neural Networks | Deep Learning | Models | Modeling (1 paper)
【1】 Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research
Link: https://arxiv.org/abs/2112.01716
Authors: Bernard Koch, Emily Denton, Alex Hanna, Jacob G. Foster
Note: 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Sydney, Australia
Abstract: Benchmark datasets play a central role in the organization of machine learning research. They coordinate researchers around shared research problems and serve as a measure of progress towards shared goals. Despite the foundational role of benchmarking practices in this field, relatively little attention has been paid to the dynamics of benchmark dataset use and reuse, within or across machine learning subcommunities. In this paper, we dig into these dynamics. We study how dataset usage patterns differ across machine learning subcommunities and across time from 2015 to 2020. We find increasing concentration on fewer and fewer datasets within task communities, significant adoption of datasets from other tasks, and concentration across the field on datasets that have been introduced by researchers situated within a small number of elite institutions. Our results have implications for scientific evaluation, AI ethics, and equity/access within the field.
Other (11 papers)
【1】 Survey on English Entity Linking on Wikidata
Link: https://arxiv.org/abs/2112.01989
Authors: Cedric Möller, Jens Lehmann, Ricardo Usbeck
Note: Disclaimer: Cedric Möller, Jens Lehmann, Ricardo Usbeck, 2021. The definitive, peer-reviewed and edited version of this article is published in the Semantic Web Journal, Special issue: Latest Advancements in Linguistic Linked Data, 2021
Abstract: Wikidata is a frequently updated, community-driven, and multilingual knowledge graph. Hence, Wikidata is an attractive basis for Entity Linking, as is evident from the recent increase in published papers. This survey focuses on four subjects: (1) Which Wikidata Entity Linking datasets exist, how widely used are they, and how are they constructed? (2) Do the characteristics of Wikidata matter for the design of Entity Linking datasets, and if so, how? (3) How do current Entity Linking approaches exploit the specific characteristics of Wikidata? (4) Which Wikidata characteristics are unexploited by existing Entity Linking approaches? This survey reveals that current Wikidata-specific Entity Linking datasets do not differ in their annotation scheme from schemes for other knowledge graphs like DBpedia. Thus, the potential for multilingual and time-dependent datasets, naturally suited for Wikidata, is not exploited. Furthermore, we show that most Entity Linking approaches use Wikidata in the same way as any other knowledge graph, missing the chance to leverage Wikidata-specific characteristics to increase quality. Almost all approaches employ specific properties like labels and sometimes descriptions, but ignore characteristics such as the hyper-relational structure. Hence, there is still room for improvement, for example by including hyper-relational graph embeddings or type information. Many approaches also include information from Wikipedia, which is easily combinable with Wikidata and provides valuable textual information that Wikidata lacks.
【2】 Augmenting Customer Support with an NLP-based Receptionist
Link: https://arxiv.org/abs/2112.01959
Authors: André Barbosa, Alan Godoy
Abstract: In this paper, we show how a Portuguese BERT model can be combined with structured data to deploy a chatbot based on a finite state machine, creating a conversational AI system that helps a real-estate company predict its clients' contact motivation. The model achieves human-level results on a dataset that contains 235 unbalanced labels. We also show its benefits in terms of business impact by comparing it against classical NLP methods.
【3】 The Catalan Language CLUB
Link: https://arxiv.org/abs/2112.01894
Authors: Carlos Rodriguez-Penagos, Carme Armentano-Oller, Marta Villegas, Maite Melero, Aitor Gonzalez, Ona de Gibert Bonet, Casimiro Carrino Pio
Note: OpenCor Forum 2021. arXiv admin note: text overlap with arXiv:2107.07903
Abstract: The Catalan Language Understanding Benchmark (CLUB) encompasses various datasets representative of different NLU tasks that enable accurate evaluations of language models, following the General Language Understanding Evaluation (GLUE) example. It is part of AINA and PlanTL, two public funding initiatives to empower the Catalan language in the Artificial Intelligence era.
【4】 Automatic evaluation of scientific abstracts through natural language processing
Link: https://arxiv.org/abs/2112.01842
Authors: Lucas G. O. Lopes, Thales M. A. Vieira, William W. M. Lira
Abstract: This work presents a framework to classify and evaluate distinct research abstract texts which focus on the description of processes and their applications. In this context, the paper proposes natural language processing algorithms to classify, segment, and evaluate the results of scientific work. Initially, the proposed framework categorizes the abstract texts according to the problems they intend to solve, employing a text classification approach. Then, the abstract text is segmented into problem description, methodology, and results. Finally, the methodology of the abstract is ranked based on sentiment analysis of its results. The proposed framework allows us to quickly rank the best methods to solve specific problems. To validate the proposed framework, experiments were performed on oil production anomaly abstracts, achieving promising results.
【5】 Translating Politeness Across Cultures: Case of Hindi and English
Link: https://arxiv.org/abs/2112.01822
Authors: Ritesh Kumar, Girish Nath Jha
Abstract: In this paper, we present a corpus-based study of politeness across two languages, English and Hindi. It studies politeness in a translated parallel corpus of Hindi and English and observes how politeness in a Hindi text is translated into English. We provide a detailed theoretical background in which the comparison is carried out, followed by a brief description of the translated data within this theoretical model. Since politeness may become one of the major causes of conflict and misunderstanding, it is a very important phenomenon to study and understand cross-culturally, particularly for such purposes as machine translation.
【6】 Creating and Managing a large annotated parallel corpora of Indian languages
Link: https://arxiv.org/abs/2112.01764
Authors: Ritesh Kumar, Shiv Bhusan Kaushik, Pinkey Nainwani, Girish Nath Jha
Abstract: This paper presents the challenges in creating and managing large parallel corpora of 12 major Indian languages (soon to be extended to 23 languages) as part of a major consortium project funded by the Department of Information Technology (DIT), Govt. of India, and running in parallel at 10 different universities of India. To efficiently manage the process of creation and dissemination of these huge corpora, the web-based annotation tool ILCIANN (Indian Languages Corpora Initiative Annotation Tool) has been developed, with a reduced stand-alone version as well. It was primarily developed for POS annotation, as well as for the management of corpus annotation by people with differing levels of competence working at locations physically situated far apart. In order to maintain consistency and standards in the creation of the corpora, it was necessary that everyone work on a common platform, which is provided by this tool.
【7】 Given Users Recommendations Based on Reviews on Yelp
Link: https://arxiv.org/abs/2112.01762
Authors: Shuwei Zhang, Maiqi Tang, Qingyang Zhang, Yucan Luo, Yuhui Zou
Abstract: In our project, we focus on an NLP-based hybrid recommendation system. Our data is from Yelp. The hybrid recommendation system has two major components: the first embeds the reviews with the BERT model and the word2vec model; the second implements an item-based collaborative filtering algorithm to compute the similarity of each review under different categories of restaurants. In the end, with the help of similarity scores, we are able to recommend to users the best-matched restaurant based on their recorded reviews. The coding work is split into several parts: selecting samples and data cleaning, processing, embedding, computing similarity, and computing prediction and error. Due to the size of the data, each part generates one or more JSON files as milestones, to reduce the pressure on memory and the communication between the parts.
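A minimal sketch of the item-based step: cosine similarity between restaurant-level review embeddings, aggregated over the restaurants a user has reviewed. The random vectors and the mean aggregation below are assumptions standing in for the BERT/word2vec embeddings the project describes.

```python
import numpy as np

def cosine_matrix(x):
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T

# Rows stand in for restaurant review embeddings (BERT or word2vec averages).
emb = np.random.rand(5, 768)                     # 5 restaurants, toy vectors
sim = cosine_matrix(emb)
user_history = [0, 2]                            # restaurants the user reviewed
scores = sim[user_history].mean(axis=0)          # item-based CF aggregation
scores[user_history] = -np.inf                   # don't re-recommend
print(int(np.argmax(scores)))                    # best-matched new restaurant
```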
【8】 BBS-KWS: The Mandarin Keyword Spotting System Won the Video Keyword Wakeup Challenge
Link: https://arxiv.org/abs/2112.01757
Authors: Yuting Yang, Binbin Du, Yingxin Zhang, Wenxuan Wang, Yuke Li
Abstract: This paper introduces the system submitted by the Yidun NISP team to the Video Keyword Wakeup challenge. We propose a Mandarin keyword spotting (KWS) system with several novel and effective improvements, including a big backbone (B) model, a keyword biasing (B) mechanism, and the introduction of syllable modeling units (S); accordingly, we term the total system BBS-KWS. The BBS-KWS system consists of an end-to-end automatic speech recognition (ASR) module and a KWS module. The ASR module converts speech features into text representations, applies a big backbone network to the acoustic model, and takes syllable modeling units into consideration as well. In addition, the keyword biasing mechanism is used to improve the recall rate of keywords in the ASR inference stage. The KWS module applies multiple criteria to determine the absence or presence of the keywords, such as multi-stage matching, fuzzy matching, and the connectionist temporal classification (CTC) prefix score. To further improve our system, we conduct semi-supervised learning on the CN-Celeb dataset for better generalization. In the VKW task, the BBS-KWS system achieves significant gains over the baseline and won first place in two tracks.
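A minimal sketch of one of the listed criteria, fuzzy matching, done here at the character level over the ASR hypothesis (the real system works on syllable units and combines several criteria; the edit-distance budget is an assumption).

```python
# Fuzzy keyword spotting: accept any hypothesis span within an
# edit-distance budget of the keyword.
def edit_distance(a, b):
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

def fuzzy_spot(hypothesis, keyword, max_dist=1):
    n = len(keyword)
    return any(edit_distance(hypothesis[i:i + n], keyword) <= max_dist
               for i in range(len(hypothesis) - n + 1))

# One substituted character still matches under the budget of 1.
print(fuzzy_spot("大家晚上好", "晚上号"))   # True
print(fuzzy_spot("大家晚上好", "早上好啊")) # False
```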
【9】 LongChecker: Improving scientific claim verification by modeling full-abstract context
Link: https://arxiv.org/abs/2112.01640
Authors: David Wadden, Kyle Lo, Lucy Lu Wang, Arman Cohan, Iz Beltagy, Hannaneh Hajishirzi
Note: Preprint, work in progress. 9 pages
Abstract: We introduce the LongChecker system for scientific claim verification. Given a scientific claim and an evidence-containing research abstract, LongChecker predicts a veracity label and identifies supporting rationales in a multitask fashion, based on a shared encoding of the claim and abstract. We perform experiments on the SciFact dataset and find that LongChecker achieves state-of-the-art performance. We conduct analysis to understand the source of this improvement, and find that identifying the relationship between a claim and a rationale reporting a scientific finding often requires understanding the context in which the rationale appears. By making labeling decisions based on all available context, LongChecker achieves better performance on cases requiring this type of understanding. In addition, we show that LongChecker is able to leverage weakly-supervised in-domain data to facilitate few-shot domain adaptation for scientific claim verification.
【10】 A Survey on Awesome Korean NLP Datasets
Link: https://arxiv.org/abs/2112.01624
Authors: Byunghyun Ban
Note: 11 pages, 1 horizontal page for a large table
Abstract: English-based datasets are commonly available from Kaggle, GitHub, or recently published papers. Although benchmark tests with English datasets are sufficient to show off the performance of new models and methods, a researcher still needs to train and validate the models on Korean-based datasets to produce a technology or product suitable for Korean processing. This paper introduces 15 popular Korean-based NLP datasets with summarized details such as volume, license, repositories, and other research results inspired by the datasets. I also provide high-resolution instructions with samples or statistics of the datasets. The main characteristics of the datasets are presented in a single table to give researchers a rapid summary.
【11】 Evaluator for Emotionally Consistent Chatbots
Link: https://arxiv.org/abs/2112.01616
Authors: Chenxiao Liu, Guanzhi Deng, Tao Ji, Difei Tang, Silai Zheng
Note: 7 pages, 6 charts, 1 figure
Abstract: One challenge in evaluating current sequence- or dialogue-level chatbots, such as empathetic open-domain conversation models, is determining whether the chatbot performs in an emotionally consistent way. The most recent work only evaluates aspects such as context coherence, language fluency, response diversity, or logical self-consistency between dialogues. This work proposes training an evaluator to determine the emotional consistency of chatbots.