Natural Language Processing Academic Digest [12.24]

2021-12-27 17:09:08

cs.CL: 15 papers in total today

QA | VQA | Question Answering | Dialogue (2 papers)

【1】 TOD-DA: Towards Boosting the Robustness of Task-oriented Dialogue Modeling on Spoken Conversations. Link: https://arxiv.org/abs/2112.12441

Authors: Xin Tian, Xinxian Huang, Dongfeng He, Yingzhan Lin, Siqi Bao, Huang He, Liankai Huang, Qiang Ju, Xiyuan Zhang, Jian Xie, Shuqi Sun, Fan Wang, Hua Wu, Haifeng Wang. Affiliation: Baidu Inc., China. Note: Accepted to the AAAI-22 DSTC10 Workshop. First three authors contributed equally to this work.
Abstract: Task-oriented dialogue systems have been plagued by the difficulty of obtaining large-scale, high-quality annotated conversations. Furthermore, most publicly available datasets include only written conversations, which are insufficient to reflect actual human behavior in practical spoken dialogue systems. In this paper, we propose Task-oriented Dialogue Data Augmentation (TOD-DA), a novel model-agnostic data augmentation paradigm that boosts the robustness of task-oriented dialogue modeling on spoken conversations. TOD-DA consists of two modules: 1) Dialogue Enrichment, which expands the training data for task-oriented conversations to ease data sparsity, and 2) a Spoken Conversation Simulator, which imitates oral-style expressions and speech recognition errors at diverse granularities to bridge the gap between written and spoken conversations. With these designs, our approach ranked first in both tasks of DSTC10 Track 2, a benchmark for task-oriented dialogue modeling on spoken conversations, demonstrating the superiority and effectiveness of the proposed TOD-DA.
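To make the second module concrete: the abstract gives no implementation details, but a speech-error simulator of this general kind can be sketched in a few lines. The filler list, confusion table, and probabilities below are illustrative assumptions, not the authors' actual design.

```python
import random

# Toy homophone/confusion table standing in for ASR substitution errors
# (illustrative only; a real simulator would derive these from ASR behavior).
CONFUSIONS = {"two": ["to", "too"], "there": ["their"], "for": ["four"]}
FILLERS = ["uh", "um", "you know"]

def simulate_spoken(utterance: str, p_sub=0.1, p_filler=0.15, seed=None) -> str:
    """Corrupt a written utterance with oral-style fillers and
    word-level substitutions that mimic speech-recognition errors."""
    rng = random.Random(seed)
    out = []
    for word in utterance.lower().split():
        if rng.random() < p_filler:              # insert a disfluency before the word
            out.append(rng.choice(FILLERS))
        if word in CONFUSIONS and rng.random() < p_sub:
            word = rng.choice(CONFUSIONS[word])  # homophone substitution
        out.append(word)
    return " ".join(out)

print(simulate_spoken("I would like to book a table for two", seed=7))
```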

【2】 Investigating Effect of Dialogue History in Multilingual Task Oriented Dialogue Systems. Link: https://arxiv.org/abs/2112.12318

Authors: Michael Sun, Kaili Huang, Mehrad Moradshahi. Affiliation: Department of Computer Science, Stanford University.
Abstract: While English virtual assistants have achieved exciting performance with enormous training resources, the needs of non-English speakers have not been well served. As of Dec 2021, Alexa, one of the most popular smart speakers in the world, supports 9 different languages [1], while there are thousands of languages in the world, 91 of which are spoken by more than 10 million people according to statistics published in 2019 [2]. However, training a virtual assistant in languages other than English is often more difficult, especially for low-resource languages. The lack of high-quality training data restricts model performance, resulting in poor user satisfaction. Therefore, we devise an efficient and effective training solution for multilingual task-oriented dialogue systems, using the same dataset generation pipeline and end-to-end dialogue system architecture as BiToD [5], which adopts key design choices for a minimalistic natural language design where formal dialogue states are used in place of natural language inputs. This reduces the room for error introduced by weaker natural language models and ensures the model can correctly extract the essential slot values needed for dialogue state tracking (DST). Our goal is to reduce the amount of natural language encoded at each turn, and the key parameter we investigate is the number of turns (H) fed to the model as history. We first explore the turning point where increasing H begins to yield diminishing returns on overall performance. Then we examine whether the examples that a model with small H gets wrong can be categorized in a way that allows few-shot finetuning. Lastly, we explore the limitations of this approach and whether there is a certain type of example that it cannot resolve.
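The central knob in this study is H, the number of past turns kept as history. A minimal sketch of that truncation step (the field names and formatting are hypothetical):

```python
def build_model_input(dialogue_turns, h):
    """Keep only the last h turns as context; dialogue_turns is a list of
    (speaker, utterance) pairs, oldest first."""
    history = dialogue_turns[-h:] if h > 0 else []
    return " ".join(f"{spk}: {utt}" for spk, utt in history)

turns = [("USER", "Find me a hotel."), ("AGENT", "Which city?"), ("USER", "Taipei.")]
for h in (1, 2, 3):
    print(h, "->", build_model_input(turns, h))
```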

Graph | Knowledge Graph | Knowledge (3 papers)

【1】 ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation. Link: https://arxiv.org/abs/2112.12731

Authors: Shuohuan Wang, Yu Sun, Yang Xiang, Zhihua Wu, Siyu Ding, Weibao Gong, Shikun Feng, Junyuan Shang, Yanbin Zhao, Chao Pang, Jiaxiang Liu, Xuyi Chen, Yuxiang Lu, Weixin Liu, Xi Wang, Yangfan Bai, Qiuliang Chen, Li Zhao, Shiyong Li, Peng Sun, Dianhai Yu, Yanjun Ma, Hao Tian, Hua Wu, Tian Wu, Wei Zeng, Ge Li, Wen Gao, Haifeng Wang. Affiliations: Baidu Inc.; Peng Cheng Laboratory. Note: arXiv admin note: text overlap with arXiv:2107.02137.
Abstract: Pre-trained language models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. GPT-3 has shown that scaling up pre-trained language models can further exploit their enormous potential. A unified framework named ERNIE 3.0 was recently proposed for pre-training large-scale knowledge-enhanced models, and a model with 10 billion parameters was trained. ERNIE 3.0 outperformed state-of-the-art models on various NLP tasks. To explore the effect of scaling up ERNIE 3.0, we train a hundred-billion-parameter model called ERNIE 3.0 Titan, with up to 260 billion parameters, on the PaddlePaddle platform. Furthermore, we design a self-supervised adversarial loss and a controllable language modeling loss to make ERNIE 3.0 Titan generate credible and controllable texts. To reduce the computation overhead and carbon emission, we propose an online distillation framework for ERNIE 3.0 Titan, in which the teacher model teaches students and trains itself simultaneously. ERNIE 3.0 Titan is the largest Chinese dense pre-trained model so far. Empirical results show that ERNIE 3.0 Titan outperforms state-of-the-art models on 68 NLP datasets.
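The online distillation described above, where the teacher teaches students while continuing to train itself, can be pictured as one joint training step. The sketch below is a generic rendering under assumed choices (single student, KL distillation loss, fixed temperature), not the ERNIE implementation:

```python
import torch
import torch.nn.functional as F

def online_distillation_step(teacher, student, batch, labels,
                             t_opt, s_opt, temperature=2.0):
    """One joint step: the teacher minimizes its own task loss while the
    student matches the teacher's softened output distribution."""
    t_logits = teacher(batch)
    t_loss = F.cross_entropy(t_logits, labels)       # teacher trains itself
    t_opt.zero_grad(); t_loss.backward(); t_opt.step()

    s_logits = student(batch)
    kd_loss = F.kl_div(                              # student mimics teacher
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits.detach() / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    s_opt.zero_grad(); kd_loss.backward(); s_opt.step()
    return t_loss.item(), kd_loss.item()
```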

【2】 Distilling the Knowledge of Romanian BERTs Using Multiple Teachers. Link: https://arxiv.org/abs/2112.12650

Authors: Andrei-Marius Avram, Darius Catrina, Dumitru-Clementin Cercel, Mihai Dascălu, Traian Rebedea, Vasile Păiş, Dan Tufiş. Affiliations: Research Institute for Artificial Intelligence, Romanian Academy; University Politehnica of Bucharest; Tudor Vianu National College of Computer Science, Bucharest, Romania.
Abstract: As transfer learning from large-scale pre-trained language models has become prevalent in Natural Language Processing, running these models in computationally constrained environments remains a challenging, unsolved problem. Several solutions, including knowledge distillation, network quantization, and network pruning, have been proposed; however, these approaches focus mostly on English, thus widening the gap for low-resource languages. In this work, we introduce three light and fast distilled BERT models for Romanian: Distil-BERT-base-ro, Distil-RoBERT-base, and DistilMulti-BERT-base-ro. The first two models result from individually distilling the knowledge of the two base versions of Romanian BERTs available in the literature, while the last one is obtained by distilling their ensemble. To our knowledge, this is the first attempt to create publicly available Romanian distilled BERT models, which we thoroughly evaluate on five tasks: part-of-speech tagging, named entity recognition, sentiment analysis, semantic textual similarity, and dialect identification. Experimental results on these benchmarks show that our three distilled models retain most of their teachers' accuracy while being twice as fast on a GPU and ~35% smaller. In addition, we further test the similarity between the students' and teachers' predictions by measuring label and probability loyalty, together with regression loyalty, a new metric introduced in this work.
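Label and probability loyalty quantify how closely a student tracks its teacher. The exact definitions are in the paper; the sketch below shows one plausible reading, with total-variation distance as an assumed choice of divergence:

```python
import numpy as np

def label_loyalty(student_probs, teacher_probs):
    """Fraction of examples where student and teacher pick the same label."""
    return float(np.mean(student_probs.argmax(1) == teacher_probs.argmax(1)))

def probability_loyalty(student_probs, teacher_probs):
    """1 minus the mean total-variation distance between output distributions
    (one hedged choice of divergence; the paper may define this differently)."""
    tv = 0.5 * np.abs(student_probs - teacher_probs).sum(axis=1)
    return float(1.0 - tv.mean())

s = np.array([[0.7, 0.3], [0.4, 0.6]])
t = np.array([[0.6, 0.4], [0.5, 0.5]])
print(label_loyalty(s, t), probability_loyalty(s, t))
```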

【3】 S+PAGE: A Speaker and Position-Aware Graph Neural Network Model for Emotion Recognition in Conversation. Link: https://arxiv.org/abs/2112.12389

Authors: Chen Liang, Chong Yang, Jing Xu, Juyang Huang, Yongliang Wang, Yang Dong.
Abstract: Emotion recognition in conversation (ERC) has attracted much attention in recent years because it is needed in widespread applications. Existing ERC methods mostly model the self and inter-speaker context separately, which leaves insufficient interaction between the two. In this paper, we propose a novel Speaker and Position-Aware Graph neural network model for ERC (S+PAGE), which contains three stages that combine the benefits of both Transformers and relational graph convolutional networks (R-GCN) for better contextual modeling. First, a two-stream conversational Transformer extracts coarse self and inter-speaker contextual features for each utterance. Then, a speaker- and position-aware conversation graph is constructed, and we propose an enhanced R-GCN model, called PAG, to refine the coarse features guided by a relative positional encoding. Finally, the features from the former two stages are fed into a conditional random field layer to model emotion transfer.
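The speaker- and position-aware graph can be imagined as edges whose relation type encodes same- versus inter-speaker links together with relative position. The construction below is a simplified guess at such a scheme, not the authors' exact graph:

```python
def build_conversation_graph(speakers, window=2):
    """Connect each utterance to its `window` predecessors; the edge relation
    combines same/different speaker with the relative offset."""
    edges = []
    for j, spk_j in enumerate(speakers):
        for i in range(max(0, j - window), j):
            rel = ("same" if speakers[i] == spk_j else "inter", j - i)
            edges.append((i, j, rel))
    return edges

print(build_conversation_graph(["A", "B", "A", "A"]))
```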

Reasoning | Analysis | Understanding | Interpretation (1 paper)

【1】 Making sense of electrical vehicle discussions using sentiment analysis on closely related news and user comments. Link: https://arxiv.org/abs/2112.12327

Authors: Josh Everts, Xuan Jiang.
Abstract: We applied token-wise and document-wise sentiment analysis, using both unsupervised and supervised models, to news and user-comment datasets. Our token-wise sentiment analysis found a statistically significant difference in sentiment between the two groups (both of which had very large N), while our document-wise supervised sentiment analysis found no significant difference.
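A comparison like the token-wise one described here typically reduces to a two-sample significance test over per-token sentiment scores. A minimal sketch with synthetic scores (Welch's t-test is an assumed choice of test):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
news_scores = rng.normal(0.05, 0.3, 5000)      # stand-in sentiment scores
comment_scores = rng.normal(0.00, 0.3, 5000)   # for the two groups

# Welch's t-test; with very large N, even small mean gaps become significant.
t, p = stats.ttest_ind(news_scores, comment_scores, equal_var=False)
print(f"t={t:.2f}, p={p:.3g}")
```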

Recognition / Classification (3 papers)

【1】 Data Augmentation based Consistency Contrastive Pre-training for Automatic Speech Recognition. Link: https://arxiv.org/abs/2112.12522

Authors: Changfeng Gao, Gaofeng Cheng, Yifan Guo, Qingwei Zhao, Pengyuan Zhang. Affiliation: Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, University of Chinese Academy of Sciences. Note: 5 pages, 2 figures.
Abstract: Self-supervised acoustic pre-training has achieved amazing results on the automatic speech recognition (ASR) task. Most successful acoustic pre-training methods use contrastive learning, learning acoustic representations by distinguishing representations from different time steps while ignoring speaker and environment robustness. As a result, the pre-trained model can show poor performance when it encounters out-of-domain data during fine-tuning. In this letter, we design a novel consistency contrastive learning (CCL) method that uses data augmentation for acoustic pre-training. Different kinds of augmentation are applied to the original audio, and the augmented audios are then fed into an encoder. The encoder should not only contrast the representations within one audio but also maximize the agreement of representations across differently augmented audios. In this way, the pre-trained model can learn a text-related representation that is more robust to changes in speaker or environment. Experiments show that applying the CCL method to Wav2Vec2.0 achieves better results on both in-domain and out-of-domain data. Especially for noisy out-of-domain data, more than 15% relative improvement can be obtained.
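The CCL objective, agreement between differently augmented views of the same audio contrasted against other samples, is close in spirit to an NT-Xent-style loss. The sketch below shows that generic formulation with assumed hyperparameters, not the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def consistency_contrastive_loss(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) representations of two augmented views of the
    same audios. Positives are matching rows; all other rows are negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature          # pairwise similarities
    targets = torch.arange(z1.size(0))          # i-th view matches i-th view
    return F.cross_entropy(logits, targets)

loss = consistency_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```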

【2】 More Than Words: Towards Better Quality Interpretations of Text Classifiers. Link: https://arxiv.org/abs/2112.12444

Authors: Muhammad Bilal Zafar, Philipp Schmidt, Michele Donini, Cédric Archambeau, Felix Biessmann, Sanjiv Ranjan Das, Krishnaram Kenthapadi. Affiliations: Amazon Web Services; Amazon Search; Santa Clara University.
Abstract: The large size and complex decision mechanisms of state-of-the-art text classifiers make it difficult for humans to understand their predictions, leading to a potential lack of trust by users. These issues have led to the adoption of methods like SHAP and Integrated Gradients that explain classification decisions by assigning importance scores to input tokens. However, prior work, using different randomization tests, has shown that interpretations generated by these methods may not be robust. For instance, models making the same predictions on the test set may still yield different feature importance rankings. To address the lack of robustness of token-based interpretability, we explore explanations at higher semantic levels, such as sentences. We use computational metrics and human-subject studies to compare the quality of sentence-based interpretations against token-based ones. Our experiments show that higher-level feature attributions offer several advantages: 1) they are more robust as measured by randomization tests, 2) they lead to lower variability when using approximation-based methods like SHAP, and 3) they are more intelligible to humans in situations where the linguistic coherence resides at a higher level of granularity. Based on these findings, we show that token-based interpretability, while a convenient first choice given the input interfaces of ML models, is not the most effective choice in all situations.
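Moving from token-level to sentence-level attributions can be as simple as summing token scores within each sentence; a minimal sketch of that aggregation (the paper's procedure may differ):

```python
def sentence_attributions(tokens, token_scores, sentence_ids):
    """Aggregate per-token importance scores (e.g., from SHAP or Integrated
    Gradients) into one score per sentence by summation."""
    agg = {}
    for tok, score, sid in zip(tokens, token_scores, sentence_ids):
        agg[sid] = agg.get(sid, 0.0) + score
    return agg

tokens = ["the", "drug", "works", ".", "side", "effects", "are", "mild", "."]
scores = [0.01, 0.40, 0.30, 0.0, 0.15, 0.20, 0.02, 0.25, 0.0]
sids   = [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(sentence_attributions(tokens, scores, sids))
```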

【3】 Morphological classifiers. Link: https://arxiv.org/abs/2112.12262

Authors: É. O. Rodrigues, A. Conci, P. Liatsis. Affiliations: Institute of Science and Technology, Universidade Federal de Itajubá (UNIFEI), Minas Gerais, Brazil; Department of Computer Science, Universidade Federal Fluminense, Niterói, Rio de Janeiro, Brazil. Note: Pattern Recognition, 2018.
Abstract: This work proposes a new type of classifier called the Morphological Classifier (MC). MCs aggregate concepts from mathematical morphology and supervised learning. The outcome of this aggregation is a classifier that may preserve shape characteristics of classes, subject to the choice of a stopping criterion and structuring element. MCs are fundamentally based on set theory, and their classification model can itself be a mathematical set. Two types of morphological classifiers are proposed in the current work, namely Morphological k-NN (MkNN) and the Morphological Dilation Classifier (MDC), which demonstrate the feasibility of the approach. This work provides evidence of the advantages of MCs, e.g., very fast classification times as well as competitive accuracy rates. The performance of MkNN and MDC was tested on p-dimensional datasets. MCs tied with or outperformed 14 well-established classifiers on 5 out of 8 datasets. On all occasions, the obtained accuracies were higher than the average accuracy obtained over all classifiers. Moreover, the proposed implementations use the power of Graphics Processing Units (GPUs) to speed up processing.
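On a discretized feature space, an MDC-style classifier can be pictured as dilating each class's training points until a stopping criterion and assigning a test point to the class whose dilated set contains it. The toy 2-D sketch below rests on those assumptions, including a fixed iteration count as the stopping criterion:

```python
import numpy as np
from scipy.ndimage import binary_dilation

def mdc_fit(points_by_class, grid_shape, n_dilations=3):
    """Rasterize each class's points onto a grid and dilate n times with
    the default cross-shaped structuring element (a fixed iteration count
    stands in for a real stopping criterion here)."""
    masks = {}
    for label, pts in points_by_class.items():
        mask = np.zeros(grid_shape, dtype=bool)
        for x, y in pts:
            mask[x, y] = True
        masks[label] = binary_dilation(mask, iterations=n_dilations)
    return masks

def mdc_predict(masks, point):
    hits = [label for label, m in masks.items() if m[point]]
    return hits[0] if hits else None   # ties and misses need a real rule

masks = mdc_fit({"a": [(5, 5)], "b": [(20, 20)]}, (32, 32))
print(mdc_predict(masks, (7, 6)), mdc_predict(masks, (15, 15)))
```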

Other Neural Networks | Deep Learning | Models | Modeling (3 papers)

【1】 Towards more patient friendly clinical notes through language models and ontologies. Link: https://arxiv.org/abs/2112.12672

Authors: Francesco Moramarco, Damir Juric, Aleksandar Savkov, Jack Flann, Maria Lehl, Kristian Boda, Tessa Grafen, Vitalii Zhelezniak, Sunir Gohil, Alex Papadopoulos Korfiatis, Nils Hammerla. Affiliation: Babylon Health, London, UK.
Abstract: Clinical notes are an efficient way to record patient information but are notoriously hard for non-experts to decipher. Automatically simplifying medical text can empower patients with valuable information about their health while saving clinicians time. We present a novel approach to automated simplification of medical text based on word frequencies and language modelling, grounded in medical ontologies enriched with layman terms. We release a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians. We also define a novel text simplification metric and evaluation framework, which we use to conduct a large-scale human evaluation of our method against the state of the art. Our method, based on a language model trained on medical forum data, generates simpler sentences while preserving both grammar and the original meaning, surpassing the current state of the art.
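The frequency-plus-ontology idea can be sketched as flagging words whose corpus frequency falls below a threshold and swapping in a layman synonym when the ontology provides one. The frequency table and layman map below are toy stand-ins, not the paper's resources:

```python
# Toy word-frequency table and ontology-derived layman synonyms
# (illustrative stand-ins; the paper grounds these in medical ontologies).
FREQ = {"rash": 120, "skin": 500, "the": 10000, "patient": 800,
        "presents": 90, "with": 9000, "pruritic": 2, "erythematous": 1}
LAYMAN = {"pruritic": "itchy", "erythematous": "red"}

def simplify(sentence, min_freq=10):
    out = []
    for word in sentence.lower().split():
        if FREQ.get(word, 0) < min_freq and word in LAYMAN:
            out.append(LAYMAN[word])    # rare medical term -> layman synonym
        else:
            out.append(word)
    return " ".join(out)

print(simplify("The patient presents with pruritic erythematous rash"))
```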

【2】 Do Multi-Lingual Pre-trained Language Models Reveal Consistent Token Attributions in Different Languages? Link: https://arxiv.org/abs/2112.12356

Authors: Junxiang Wang, Xuchao Zhang, Bo Zong, Yanchi Liu, Wei Cheng, Jingchao Ni, Haifeng Chen, Liang Zhao. Affiliations: NEC Laboratories America, Princeton, NJ, USA; Emory University.
Abstract: Over the past several years, a surge of multi-lingual Pre-trained Language Models (PLMs) has been proposed to achieve state-of-the-art performance on many cross-lingual downstream tasks. However, understanding why multi-lingual PLMs perform well remains an open question. For example, it is unclear whether multi-lingual PLMs reveal consistent token attributions across different languages. To address this, we propose a Cross-lingual Consistency of Token Attributions (CCTA) evaluation framework. Extensive experiments on three downstream tasks demonstrate that multi-lingual PLMs assign significantly different attributions to multi-lingual synonyms. Moreover, we make the following observations: 1) Spanish achieves the most consistent token attributions across languages when used for training PLMs; 2) the consistency of token attributions strongly correlates with performance on downstream tasks.
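Measuring cross-lingual consistency of token attributions plausibly comes down to correlating attribution scores over aligned synonym pairs. The sketch below uses Spearman correlation as one such choice; this is an assumption, not necessarily the CCTA metric:

```python
from scipy.stats import spearmanr

def attribution_consistency(attr_lang1, attr_lang2, alignment):
    """attr_lang*: dict mapping token -> attribution score in each language;
    alignment: list of (tok_l1, tok_l2) synonym pairs."""
    a = [attr_lang1[t1] for t1, _ in alignment]
    b = [attr_lang2[t2] for _, t2 in alignment]
    rho, _ = spearmanr(a, b)
    return rho

en = {"good": 0.8, "movie": 0.1, "boring": -0.7}
es = {"buena": 0.7, "película": 0.2, "aburrida": -0.6}
print(attribution_consistency(en, es,
      [("good", "buena"), ("movie", "película"), ("boring", "aburrida")]))
```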

【3】 Are E2E ASR models ready for an industrial usage? Link: https://arxiv.org/abs/2112.12572

Authors: Valentin Vielzeuf, Grigory Antipov. Affiliation: Orange, rue du Clos Courtel, Cesson-Sévigné, France.
Abstract: The Automated Speech Recognition (ASR) community is experiencing a major turning point with the rise of fully-neural (End-to-End, E2E) approaches. At the same time, the conventional hybrid model remains the standard choice for practical ASR deployments. According to previous studies, the adoption of E2E ASR in real-world applications has been hindered by two main limitations: the models' ability to generalize to unseen domains and their high operational cost. In this paper, we investigate both of these drawbacks by performing a comprehensive multi-domain benchmark of several contemporary E2E models and a hybrid baseline. Our experiments demonstrate that E2E models are viable alternatives to the hybrid approach, and even outperform the baseline in both accuracy and operational efficiency. As a result, our study shows that generalization and complexity are no longer the major obstacles to industrial integration, and draws the community's attention to other potential limitations of E2E approaches in specific use cases.

Other (3 papers)

【1】 TFW2V: An Enhanced Document Similarity Method for the Morphologically Rich Finnish Language. Link: https://arxiv.org/abs/2112.12489

Authors: Quan Duong, Mika Hämäläinen, Khalid Alnajjar. Affiliations: University of Helsinki; Rootroo Ltd, Finland. Note: Workshop on Natural Language Processing for Digital Humanities (NLP4DH).
Abstract: Measuring the semantic similarity of different texts has many important applications in Digital Humanities research, such as information retrieval, document clustering, and text summarization. The performance of different methods depends on the length of the text, the domain, and the language. This study focuses on experimenting with some of the current approaches on Finnish, a morphologically rich language. At the same time, we propose a simple method, TFW2V, which shows high efficiency in handling both long text documents and limited amounts of data. Furthermore, we design an objective evaluation method that can be used as a framework for benchmarking text similarity approaches.
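The abstract does not spell out TFW2V's internals; the name suggests a combination of TF-IDF with word2vec-style embeddings, so the sketch below shows that generic combination (TF-IDF-weighted averaging of word vectors) purely as an illustration, not the paper's method:

```python
import numpy as np

def doc_vector(tokens, idf, vectors, dim=50):
    """TF-IDF-weighted average of word vectors for one document."""
    acc, total = np.zeros(dim), 0.0
    for tok in set(tokens):
        if tok in vectors:
            tfidf = tokens.count(tok) / len(tokens) * idf.get(tok, 1.0)
            acc += tfidf * vectors[tok]
            total += tfidf
    return acc / total if total else acc

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

rng = np.random.default_rng(0)
vecs = {w: rng.normal(size=50) for w in ["talvi", "lumi", "kesä"]}
idf = {"talvi": 2.0, "lumi": 2.5, "kesä": 2.0}
d1 = doc_vector(["talvi", "lumi", "lumi"], idf, vecs)
d2 = doc_vector(["kesä"], idf, vecs)
print(cosine(d1, d2))
```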

【2】 Sparse-softmax: A Simpler and Faster Alternative Softmax Transformation. Link: https://arxiv.org/abs/2112.12433

Authors: Shaoshi Sun, Zhenyuan Zhang, BoCheng Huang, Pengbin Lei, Jianlin Su, Shengfeng Pan, Jiarun Cao. Affiliations: School of Computer Science and Informatics, Cardiff University, United Kingdom; Department of Economics, Osaka City University, Japan; School of Software Engineering, Beijing Jiaotong University, China.
Abstract: The softmax function is widely used in artificial neural networks for multiclass classification problems: the softmax transformation forces the output to be positive and sum to one, and the corresponding loss function allows the model to be optimized with the maximum likelihood principle. However, in high-dimensional classification, softmax leaves the loss function a large margin over which to optimize, which results in somewhat low performance. In this paper, we provide an empirical study of a simple and concise softmax variant, namely sparse-softmax, to alleviate the problems that traditional softmax exhibits on high-dimensional classification problems. We evaluate our approach on several interdisciplinary tasks, and the experimental results show that sparse-softmax is simpler and faster, and produces better results than the baseline models.
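One natural formulation of a sparse softmax keeps only the top-k logits and renormalizes over them, zeroing all other classes. The sketch below implements that reading; k and the masking scheme are assumptions, as the abstract does not define the variant precisely:

```python
import numpy as np

def sparse_softmax(logits, k=2):
    """Softmax restricted to the k largest logits; all other classes
    receive exactly zero probability."""
    logits = np.asarray(logits, dtype=float)
    top = np.argsort(logits)[-k:]                # indices of the k largest
    z = np.exp(logits[top] - logits[top].max())  # stable softmax over top-k
    probs = np.zeros_like(logits)
    probs[top] = z / z.sum()
    return probs

print(sparse_softmax([2.0, 1.0, 0.1, -1.0], k=2))
```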

【3】 Evolution and trade-off dynamics of functional load. Link: https://arxiv.org/abs/2112.12224

Authors: Erich Round, Rikker Dockum, Robin J. Ryder.
Abstract: Functional load (FL) quantifies the contributions phonological contrasts make to distinctions across the lexicon. Previous research has linked particularly low values of FL to sound change. Here we broaden the scope of enquiry into FL, to its evolution at all values. We apply phylogenetic methods to examine the diachronic evolution of FL across 90 languages of the Pama-Nyungan (PN) family of Australia. We find a high degree of phylogenetic signal in FL. Though phylogenetic signal has been reported for phonological structures, such as phonotactics, its detection in measures of phonological function is novel. We also find a significant negative correlation between the FL of vowel length and that of the following consonant, that is, a deep-time historical trade-off dynamic, which we relate to known allophony in modern PN languages and compensatory sound changes in their past. The finding reveals a historical dynamic, similar to transphonologization, which we characterize as a flow of contrastiveness between subsystems of the phonology. Recurring across a language family that spans a whole continent and many millennia of time depth, our finding provides one of the most compelling examples yet of Sapir's 'drift' hypothesis of non-accidental parallel development in historically related languages.
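A standard entropy-based definition of functional load, going back to Hockett, measures the relative drop in lexical entropy when a contrast is neutralized. A toy computation under that (type-based, uniform-frequency) assumption:

```python
import math
from collections import Counter

def entropy(words):
    counts = Counter(words)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def functional_load(lexicon, merge):
    """Relative entropy loss when the phoneme pair `merge` is neutralized."""
    a, b = merge
    merged = [w.replace(b, a) for w in lexicon]
    h = entropy(lexicon)
    return (h - entropy(merged)) / h

# Toy lexicon where vowel length ('a' vs 'A' for long /a:/) distinguishes words.
lex = ["pat", "pAt", "mak", "mAk", "tip"]
print(functional_load(lex, ("a", "A")))
```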

