cs.LG: 76 papers today
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (3 papers)
【1】 A Study of Joint Graph Inference and Forecasting Link: https://arxiv.org/abs/2109.04979
Authors: Daniel Zügner, François-Xavier Aubet, Victor Garcia Satorras, Tim Januschowski, Stephan Günnemann, Jan Gasthaus Affiliations: Technical University of Munich; AWS AI Labs, Amazon Web Services (work done while an intern); University of Amsterdam Note: Published at the ICML 2021 Time Series Workshop Abstract: We study a recent class of models which uses graph neural networks (GNNs) to improve forecasting in multivariate time series. The core assumption behind these models is that there is a latent graph between the time series (nodes) that governs the evolution of the multivariate time series. By parameterizing a graph in a differentiable way, the models aim to improve forecasting quality. We compare four recent models of this class on the forecasting task. Further, we perform ablations to study their behavior under changing conditions, e.g., when disabling the graph-learning modules and providing the ground-truth relations instead. Based on our findings, we propose novel ways of combining the existing architectures.
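The differentiable graph parameterization these models share can be sketched as a forward pass: learnable edge logits are squashed into a soft adjacency matrix that mixes the series before forecasting. The sigmoid parameterization, row normalization, and autoregressive readout weights below are illustrative assumptions, not the design of any one surveyed model:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 4, 50                      # 4 series, 50 time steps
X = rng.normal(size=(N, T))       # multivariate time series

# Hypothetical parameterization: learnable edge logits -> soft adjacency.
# In the surveyed models these logits are trained end-to-end with the
# forecaster; here we only sketch the forward pass.
edge_logits = rng.normal(size=(N, N))
A = 1.0 / (1.0 + np.exp(-edge_logits))   # sigmoid -> soft edges in (0, 1)
np.fill_diagonal(A, 0.0)
A = A / A.sum(axis=1, keepdims=True)     # row-normalize for message passing

# One graph-convolution-style step: mix each node's history with its
# (soft) neighbors', then forecast the next value with a toy AR readout.
H = A @ X                                # neighbor aggregation, shape (N, T)
w_self, w_neigh = 0.7, 0.3               # illustrative readout weights
forecast = w_self * X[:, -1] + w_neigh * H[:, -1]
```

Disabling the graph-learning module, as in the paper's ablations, amounts to replacing `A` with a fixed (e.g., ground-truth) adjacency.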
【2】 Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via GDPA Linearization Link: https://arxiv.org/abs/2109.04697
Authors: Cheng Yang, Gene Cheung, Wai-tian Tan, Guangtao Zhai Affiliations: Shanghai Jiaotong University, Shanghai, China; York University, Toronto, Canada; Cisco Systems, San José, CA Abstract: Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer. However, unfolding a proximal splitting algorithm with a positive semi-definite (PSD) cone projection operator per iteration is expensive, due to the required full matrix eigen-decomposition. In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for semi-definite programming relaxation (SDR) of a binary graph classifier, where the PSD cone constraint is replaced by a set of "tightest possible" linear constraints per iteration. As a result, each iteration only requires computing a linear program (LP) and one extreme eigenvector. Inside the unrolled network, we optimize parameters via stochastic gradient descent (SGD) that determine graph edge weights in two ways: i) a metric matrix that computes feature distances, and ii) a sparse weight matrix computed via local linear embedding (LLE). Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks but using far fewer parameters.
【3】 Spatially Focused Attack against Spatiotemporal Graph Neural Networks Link: https://arxiv.org/abs/2109.04608
Authors: Fuqiang Liu, Luis Miranda-Moreno, Lijun Sun Affiliation: McGill University, Montreal, QC, Canada Abstract: Spatiotemporal forecasting plays an essential role in various applications in intelligent transportation systems (ITS), such as route planning, navigation, and traffic control and management. Deep spatiotemporal graph neural networks (GNNs), which capture both spatial and temporal patterns, have achieved great success in traffic forecasting applications. Understanding how GNN-based forecasting works, as well as the vulnerability and robustness of these models, is critical to real-world applications. For example, if spatiotemporal GNNs are vulnerable in real-world traffic prediction applications, a hacker can easily manipulate the results and cause serious traffic congestion and even a city-scale breakdown. However, although recent studies have demonstrated that deep neural networks (DNNs) are vulnerable to carefully designed perturbations in multiple domains like object classification and graph representation, current adversarial works cannot be directly applied to spatiotemporal forecasting due to the causal nature and spatiotemporal mechanisms in forecasting models. To fill this gap, in this paper we design Spatially Focused Attack (SFA) to break spatiotemporal GNNs by attacking a single vertex. To achieve this, we first propose the inverse estimation to address the causality issue; then, we apply genetic algorithms with a universal attack method as the evaluation function to locate the weakest vertex; finally, perturbations are generated by solving an inverse estimation-based optimization problem. We conduct experiments on real-world traffic data and our results show that perturbations in one vertex designed by SFA can be diffused into a large part of the graph.
Transformer (2 papers)
【1】 Block Pruning For Faster Transformers Link: https://arxiv.org/abs/2109.04838
Authors: François Lagunas, Ella Charlaix, Victor Sanh, Alexander M. Rush Affiliation: Hugging Face Note: EMNLP 2021. Code, hyper-parameters, evaluation results and checkpoints available at this https URL Abstract: Pre-training has improved model accuracy for both classification and generation tasks at the cost of introducing much larger and slower models. Pruning methods have proven to be an effective way of reducing model size, whereas distillation methods are proven for speeding up inference. We introduce a block pruning approach targeting both small and fast models. Our approach extends structured methods by considering blocks of any size and integrates this structure into the movement pruning paradigm for fine-tuning. We find that this approach learns to prune out full components of the underlying model, such as attention heads. Experiments consider classification and generation tasks, yielding among other results a pruned model that is a 2.4x faster, 74% smaller BERT on SQuAD v1, with a 1% drop on F1, competitive in speed with distilled models and in size with pruned models.
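A minimal sketch of the block-pruning idea, assuming movement-style importance scores (with random stand-in gradients) and head-sized blocks; the actual method integrates this structure into fine-tuning rather than applying it once post hoc:

```python
import numpy as np

rng = np.random.default_rng(1)
n_heads, head_dim, d_model = 8, 8, 64
# Toy attention projection weights: one (head_dim x d_model) block per head.
W = rng.normal(size=(n_heads, head_dim, d_model))

# Movement pruning keeps weights that moved *away* from zero during
# fine-tuning; a common proxy score is -sum(W * grad). The gradients here
# are random stand-ins for illustration.
grads = rng.normal(size=W.shape)
scores = -(W * grads).sum(axis=(1, 2))    # one score per head-sized block

# Prune the k lowest-scoring blocks (whole heads), as in block pruning.
k = 3
pruned = np.argsort(scores)[:k]
mask = np.ones(n_heads, dtype=bool)
mask[pruned] = False
W_pruned = W * mask[:, None, None]        # zeroed heads can be removed entirely
```

Because whole heads are zeroed, the pruned blocks can be physically dropped from the weight matrices, which is where the speedup comes from.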
【2】 On the validity of pre-trained transformers for natural language processing in the software engineering domain Link: https://arxiv.org/abs/2109.04738
Authors: Julian von der Mosel, Alexander Trautsch, Steffen Herbold Affiliation: Institute of Computer Science, University of Goettingen Note: Review status: submitted Abstract: Transformers are the current state-of-the-art of natural language processing in many domains and are gaining traction within software engineering research as well. Such models are pre-trained on large amounts of data, usually from the general domain. However, we only have a limited understanding regarding the validity of transformers within the software engineering domain, i.e., how good such models are at understanding words and sentences within a software engineering context and how this improves the state-of-the-art. Within this article, we shed light on this complex, but crucial issue. We compare BERT transformer models trained with software engineering data with transformers based on general domain data in multiple dimensions: their vocabulary, their ability to understand which words are missing, and their performance in classification tasks. Our results show that for tasks that require understanding of the software engineering context, pre-training with software engineering data is valuable, while general domain models are sufficient for general language understanding, also within the software engineering domain.
GAN | Adversarial | Attack | Generation (2 papers)
【1】 Counterfactual Adversarial Learning with Representation Interpolation Link: https://arxiv.org/abs/2109.04746
Authors: Wei Wang, Boxin Wang, Ning Shi, Jinfeng Li, Bingyu Zhu, Xiangyu Liu, Rong Zhang Affiliations: Alibaba Group; University of Illinois at Urbana-Champaign Note: Accepted to Findings of EMNLP 2021 Abstract: Deep learning models exhibit a preference for statistical fitting over logical reasoning. Spurious correlations might be memorized when there exists statistical bias in training data, which severely limits the model performance especially in small data scenarios. In this work, we introduce Counterfactual Adversarial Training framework (CAT) to tackle the problem from a causality perspective. Particularly, for a specific sample, CAT first generates a counterfactual representation through latent space interpolation in an adversarial manner, and then performs Counterfactual Risk Minimization (CRM) on each original-counterfactual pair to adjust sample-wise loss weight dynamically, which encourages the model to explore the true causal effect. Extensive experiments demonstrate that CAT achieves substantial performance improvement over SOTA across different downstream tasks, including sentence classification, natural language inference and question answering.
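The latent-space interpolation step can be sketched as a search over mixtures of two latent codes for the one a classifier is least confident about. The linear classifier and the grid search over the mixing coefficient below are illustrative stand-ins for CAT's adversarial optimization:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 16
z = rng.normal(size=d)            # latent code of the original sample
z_other = rng.normal(size=d)      # latent code of a sample from another class

# Toy linear classifier: margin = distance from the decision boundary.
w = rng.normal(size=d)

# Sweep the interpolation coefficient and keep the mixture the classifier
# is least confident about -- a stand-in for the adversarial search over
# latent interpolations described in the abstract.
alphas = np.linspace(0.0, 1.0, 11)
mixtures = np.array([(1 - a) * z + a * z_other for a in alphas])
margins = np.abs(mixtures @ w)    # small margin = low confidence
counterfactual = mixtures[np.argmin(margins)]
```

In CAT this counterfactual representation is then paired with the original for the CRM loss reweighting.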
【2】 Generating Self-Contained and Summary-Centric Question Answer Pairs via Differentiable Reward Imitation Learning Link: https://arxiv.org/abs/2109.04689
Authors: Li Zhou, Kevin Small, Yong Zhang, Sandeep Atluri Affiliation: Amazon Alexa Note: To appear in Proceedings of EMNLP 2021 Abstract: Motivated by suggested question generation in conversational news recommendation systems, we propose a model for generating question-answer pairs (QA pairs) with self-contained, summary-centric questions and length-constrained, article-summarizing answers. We begin by collecting a new dataset of news articles with questions as titles and pairing them with summaries of varying length. This dataset is used to learn a QA pair generation model producing summaries as answers that balance brevity with sufficiency jointly with their corresponding questions. We then reinforce the QA pair generation process with a differentiable reward function to mitigate exposure bias, a common problem in natural language generation. Both automatic metrics and human evaluation demonstrate these QA pairs successfully capture the central gists of the articles and achieve high answer accuracy.
Semi-/Weakly-/Un-/Fully-Supervised | Uncertainty | Active Learning (9 papers)
【1】 Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training Link: https://arxiv.org/abs/2109.05003
Authors: Yu Meng, Yunyi Zhang, Jiaxin Huang, Xuan Wang, Yu Zhang, Heng Ji, Jiawei Han Affiliation: University of Illinois Urbana-Champaign, IL, USA Note: EMNLP 2021. (Code: this https URL) Abstract: We study the problem of training named entity recognition (NER) models using only distantly-labeled data, which can be automatically obtained by matching entity mentions in the raw text with entity types in a knowledge base. The biggest challenge of distantly-supervised NER is that the distant supervision may induce incomplete and noisy labels, rendering the straightforward application of supervised learning ineffective. In this paper, we propose (1) a noise-robust learning scheme comprised of a new loss function and a noisy label removal step, for training NER models on distantly-labeled data, and (2) a self-training method that uses contextualized augmentations created by pre-trained language models to improve the generalization ability of the NER model. On three benchmark datasets, our method achieves superior performance, outperforming existing distantly-supervised NER models by significant margins.
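The noisy-label removal step can be sketched as a simple loss-based filter over distant labels: tokens whose distant label disagrees with the model tend to incur high loss. The quantile cutoff below is an illustrative rule, not the paper's exact criterion:

```python
import numpy as np

rng = np.random.default_rng(6)
# Toy per-token losses for distantly-assigned entity labels; in the real
# setting these would come from the NER model being trained.
losses = np.concatenate([rng.uniform(0.0, 0.5, 90),    # likely-correct labels
                         rng.uniform(2.0, 4.0, 10)])   # likely-noisy labels

# Noisy-label removal (illustrative): discard distant labels whose loss
# exceeds a high-quantile cutoff, and keep the rest for supervision.
threshold = np.quantile(losses, 0.9)
keep = losses <= threshold
```

The kept labels then drive supervised training, while self-training with language-model augmentations covers what was filtered out.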
【2】 Unsupervised Change Detection in Hyperspectral Images using Feature Fusion Deep Convolutional Autoencoders Link: https://arxiv.org/abs/2109.04990
Authors: Debasrita Chakraborty, Ashish Ghosh Note: 19 pages Abstract: Binary change detection in bi-temporal co-registered hyperspectral images is a challenging task due to a large number of spectral bands present in the data. Researchers, therefore, try to handle it by reducing dimensions. The proposed work aims to build a novel feature extraction system using a feature fusion deep convolutional autoencoder for detecting changes between a pair of such bi-temporal co-registered hyperspectral images. The feature fusion considers features across successive levels and multiple receptive fields and therefore adds a competitive edge over the existing feature extraction methods. The change detection technique described is completely unsupervised and is much more elegant than other supervised or semi-supervised methods which require some amount of label information. Different methods have been applied to the extracted features to find the changes in the two images and it is found that the proposed method clearly outperformed state-of-the-art methods in unsupervised change detection for all the datasets.
【3】 ReasonBERT: Pre-trained to Reason with Distant Supervision Link: https://arxiv.org/abs/2109.04912
Authors: Xiang Deng, Yu Su, Alyssa Lees, You Wu, Cong Yu, Huan Sun Affiliations: The Ohio State University, Columbus, OH; Google Research, New York, NY Note: Accepted to EMNLP 2021. Our code and pre-trained models are available at this https URL Abstract: We present ReasonBert, a pre-training method that augments language models with the ability to reason over long-range relations and multiple, possibly hybrid contexts. Unlike existing pre-training methods that only harvest learning signals from local contexts of naturally occurring texts, we propose a generalized notion of distant supervision to automatically connect multiple pieces of text and tables to create pre-training examples that require long-range reasoning. Different types of reasoning are simulated, including intersecting multiple pieces of evidence, bridging from one piece of evidence to another, and detecting unanswerable cases. We conduct a comprehensive evaluation on a variety of extractive question answering datasets ranging from single-hop to multi-hop and from text-only to table-only to hybrid that require various reasoning capabilities and show that ReasonBert achieves remarkable improvement over an array of strong baselines. Few-shot experiments further demonstrate that our pre-training method substantially improves sample efficiency.
【4】 Active learning for reducing labeling effort in text classification tasks Link: https://arxiv.org/abs/2109.04847
Authors: Pieter Floris Jacobs, Gideon Maillette de Buy Wenniger, Marco Wiering, Lambert Schomaker Abstract: Labeling data can be an expensive task as it is usually performed manually by domain experts. This is cumbersome for deep learning, as it is dependent on large labeled datasets. Active learning (AL) is a paradigm that aims to reduce labeling effort by only using the data which the used model deems most informative. Little research has been done on AL in a text classification setting and next to none has involved the more recent, state-of-the-art NLP models. Here, we present an empirical study that compares different uncertainty-based algorithms with BERT$_{base}$ as the used classifier. We evaluate the algorithms on two NLP classification datasets: Stanford Sentiment Treebank and KvK-Frontpages. Additionally, we explore heuristics that aim to solve presupposed problems of uncertainty-based AL; namely, that it is unscalable and that it is prone to selecting outliers. Furthermore, we explore the influence of the query-pool size on the performance of AL. While the proposed heuristics for AL were not found to improve its performance, our results show that using uncertainty-based AL with BERT$_{base}$ outperforms random sampling of data. This difference in performance can decrease as the query-pool size gets larger.
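The uncertainty-based acquisition the study compares can be sketched with predictive entropy; with BERT$_{base}$ the class probabilities would come from the classification head, here they are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy predicted class probabilities for an unlabeled pool of 100 samples.
logits = rng.normal(size=(100, 3))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Predictive-entropy acquisition: query the samples the classifier is
# least certain about, then send them to an annotator.
entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
query_size = 10
query_idx = np.argsort(entropy)[-query_size:]   # most uncertain samples
```

Random sampling, the baseline the paper compares against, would simply draw `query_size` indices uniformly from the pool.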
【5】 Enhancing Unsupervised Anomaly Detection with Score-Guided Network Link: https://arxiv.org/abs/2109.04684
Authors: Zongyuan Huang, Baohua Zhang, Guoqiang Hu, Longyuan Li, Yanyan Xu, Yaohui Jin Affiliation: MoE Key Laboratory of Artificial Intelligence and AI Institute Abstract: Anomaly detection plays a crucial role in various real-world applications, including healthcare and finance systems. Owing to the limited number of anomaly labels in these complex systems, unsupervised anomaly detection methods have attracted great attention in recent years. Two major challenges faced by the existing unsupervised methods are: (i) distinguishing between normal and abnormal data in the transition field, where normal and abnormal data are highly mixed together; (ii) defining an effective metric to maximize the gap between normal and abnormal data in a hypothesis space, which is built by a representation learner. To that end, this work proposes a novel scoring network with a score-guided regularization to learn and enlarge the anomaly score disparities between normal and abnormal data. With such score-guided strategy, the representation learner can gradually learn more informative representation during the model training stage, especially for the samples in the transition field. We next propose a score-guided autoencoder (SG-AE), incorporating the scoring network into an autoencoder framework for anomaly detection, as well as other three state-of-the-art models, to further demonstrate the effectiveness and transferability of the design. Extensive experiments on both synthetic and real-world datasets demonstrate the state-of-the-art performance of these score-guided models (SGMs).
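The autoencoder scoring that SG-AE builds on can be sketched with a linear autoencoder, which is equivalent to PCA: reconstruction error serves as the anomaly score. The synthetic data and fixed bottleneck size below are illustrative assumptions; SG-AE adds a trained scoring network on top of this basic mechanism:

```python
import numpy as np

rng = np.random.default_rng(4)
# Normal data lies near a 2-D subspace of a 10-D space; anomalies do not.
basis = rng.normal(size=(2, 10))
normal = rng.normal(size=(200, 2)) @ basis + 0.05 * rng.normal(size=(200, 10))
anomalies = rng.normal(size=(10, 10)) * 3.0
X = np.vstack([normal, anomalies])

# A linear autoencoder is equivalent to PCA: fit principal components,
# reconstruct through the 2-D "bottleneck", and score each sample by its
# reconstruction error.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
V = Vt[:2]                                # top-2 components = bottleneck
recon = (Xc @ V.T) @ V
scores = ((Xc - recon) ** 2).sum(axis=1)  # anomaly score per sample
flagged = np.argsort(scores)[-10:]        # highest-error samples
```

SG-AE's regularization would additionally push the score distributions of normal and abnormal samples apart during training, rather than relying on reconstruction error alone.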
【6】 SanitAIs: Unsupervised Data Augmentation to Sanitize Trojaned Neural Networks Link: https://arxiv.org/abs/2109.04566
Authors: Kiran Karra, Chace Ashcraft Affiliation: Johns Hopkins University Applied Physics Laboratory, Montpelier Rd, Laurel, MD Note: 7 pages, 10 figures Abstract: The application of self-supervised methods has resulted in broad improvements to neural network performance by leveraging large, untapped collections of unlabeled data to learn generalized underlying structure. In this work, we harness unsupervised data augmentation (UDA) to mitigate backdoor or Trojan attacks on deep neural networks. We show that UDA is more effective at removing the effects of a trigger than current state-of-the-art methods for both feature space and point triggers. These results demonstrate that UDA is both an effective and practical approach to mitigating the effects of backdoors on neural networks.
【7】 FedCon: A Contrastive Framework for Federated Semi-Supervised Learning Link: https://arxiv.org/abs/2109.04533
Authors: Zewei Long, Jiaqi Wang, Yaqing Wang, Houping Xiao, Fenglong Ma Affiliations: Department of CS, UIUC, Champaign, USA; College of IST, Penn State University, State College, USA; School of ECE, Purdue University, West Lafayette, USA; Institute for Insight, Georgia State University, Atlanta, USA Abstract: Federated Semi-Supervised Learning (FedSSL) has gained rising attention from both academic and industrial researchers, due to its unique characteristics of co-training machine learning models with isolated yet unlabeled data. Most existing FedSSL methods focus on the classical scenario, i.e., the labeled and unlabeled data are stored at the client side. However, in real-world applications, client users may not provide labels without any incentive. Thus, the scenario of labels at the server side is more practical. Since unlabeled data and labeled data are decoupled, most existing FedSSL approaches may fail to deal with such a scenario. To overcome this problem, in this paper, we propose FedCon, which introduces a new learning paradigm, i.e., contrastive learning, to FedSSL. Experimental results on three datasets show that FedCon achieves the best performance with the contrastive framework compared with state-of-the-art baselines under both IID and Non-IID settings. Besides, ablation studies demonstrate the characteristics of the proposed FedCon framework.
【8】 Unsupervised Causal Binary Concepts Discovery with VAE for Black-box Model Explanation Link: https://arxiv.org/abs/2109.04518
Authors: Thien Q. Tran, Kazuto Fukuchi, Youhei Akimoto, Jun Sakuma Affiliations: University of Tsukuba; RIKEN AIP Abstract: We aim to explain a black-box classifier with the form: "data X is classified as class Y because X has A and B and does not have C", in which A, B, and C are high-level concepts. The challenge is that we have to discover, in an unsupervised manner, a set of concepts, i.e., A, B and C, that is useful for explaining the classifier. We first introduce a structural generative model that is suitable to express and discover such concepts. We then propose a learning process that simultaneously learns the data distribution and encourages certain concepts to have a large causal influence on the classifier output. Our method also allows easy integration of user's prior knowledge to induce high interpretability of concepts. Using multiple datasets, we demonstrate that our method can discover useful binary concepts for explanation.
【9】 Unsupervised classification of simulated magnetospheric regions Link: https://arxiv.org/abs/2109.04916
Authors: Maria Elena Innocenti, Jorge Amaya, Joachim Raeder, Romain Dupuis, Banafsheh Ferdousi, Giovanni Lapenta Affiliations: Institut für Theoretische Physik, Ruhr-Universität Bochum, Bochum, Germany; Centre for mathematical Plasma Astrophysics, Department of Mathematics, KU Leuven, Leuven, Belgium Abstract: In magnetospheric missions, burst mode data sampling should be triggered in the presence of processes of scientific or operational interest. We present an unsupervised classification method for magnetospheric regions, that could constitute the first-step of a multi-step method for the automatic identification of magnetospheric processes of interest. Our method is based on Self Organizing Maps (SOMs), and we test it preliminarily on data points from global magnetospheric simulations obtained with the OpenGGCM-CTIM-RCM code. The dimensionality of the data is reduced with Principal Component Analysis before classification. The classification relies exclusively on local plasma properties at the selected data points, without information on their neighborhood or on their temporal evolution. We classify the SOM nodes into an automatically selected number of classes, and we obtain clusters that map to well defined magnetospheric regions. We validate our classification results by plotting the classified data in the simulated space and by comparing with K-means classification. For the sake of result interpretability, we examine the SOM feature maps (magnetospheric variables are called features in the context of classification), and we use them to unlock information on the clusters. We repeat the classification experiments using different sets of features, we quantitatively compare different classification results, and we obtain insights on which magnetospheric variables make more effective features for unsupervised classification.
Transfer | Zero/Few/One-Shot | Adaptation (4 papers)
【1】 PWPAE: An Ensemble Framework for Concept Drift Adaptation in IoT Data Streams Link: https://arxiv.org/abs/2109.05013
Authors: Li Yang, Dimitrios Michael Manias, Abdallah Shami Affiliation: Western University, London, Ontario, Canada Note: Accepted and to appear in IEEE GlobeCom 2021; code is available at Github link: this https URL Abstract: As the number of Internet of Things (IoT) devices and systems have surged, IoT data analytics techniques have been developed to detect malicious cyber-attacks and secure IoT systems; however, concept drift issues often occur in IoT data analytics, as IoT data is often dynamic data streams that change over time, causing model degradation and attack detection failure. This is because traditional data analytics models are static models that cannot adapt to data distribution changes. In this paper, we propose a Performance Weighted Probability Averaging Ensemble (PWPAE) framework for drift adaptive IoT anomaly detection through IoT data stream analytics. Experiments on two public datasets show the effectiveness of our proposed PWPAE method compared against state-of-the-art methods.
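The performance-weighted probability averaging at the core of PWPAE can be sketched in a few lines; the base-learner probabilities and accuracy values below are illustrative stand-ins (the paper builds the ensemble from drift-adaptive stream learners):

```python
import numpy as np

# Each base learner contributes its predicted class probabilities,
# weighted by its recent accuracy on the stream.
base_probs = np.array([
    [0.9, 0.1],    # learner 1: P(normal), P(attack)
    [0.6, 0.4],    # learner 2
    [0.2, 0.8],    # learner 3 (disagrees)
])
recent_accuracy = np.array([0.95, 0.80, 0.55])   # tracked on recent samples

weights = recent_accuracy / recent_accuracy.sum()
ensemble_probs = weights @ base_probs            # weighted probability average
prediction = int(np.argmax(ensemble_probs))
```

Because `recent_accuracy` is tracked on a sliding window of the stream, a learner that degrades after drift automatically loses influence in the average.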
【2】 Does Pretraining for Summarization Require Knowledge Transfer? Link: https://arxiv.org/abs/2109.04953
Authors: Kundan Krishna, Jeffrey Bigham, Zachary C. Lipton Affiliation: Carnegie Mellon University, Forbes Avenue, Pittsburgh, PA Note: Camera-ready for Findings of EMNLP 2021 Abstract: Pretraining techniques leveraging enormous datasets have driven recent advances in text summarization. While folk explanations suggest that knowledge transfer accounts for pretraining's benefits, little is known about why it works or what makes a pretraining task or dataset suitable. In this paper, we challenge the knowledge transfer story, showing that pretraining on documents consisting of character n-grams selected at random, we can nearly match the performance of models pretrained on real corpora. This work holds the promise of eliminating upstream corpora, which may alleviate some concerns over offensive language, bias, and copyright issues. To see whether the small residual benefit of using real data could be accounted for by the structure of the pretraining task, we design several tasks motivated by a qualitative study of summarization corpora. However, these tasks confer no appreciable benefit, leaving open the possibility of a small role for knowledge transfer.
【3】 Investigating Numeracy Learning Ability of a Text-to-Text Transfer Model Link: https://arxiv.org/abs/2109.04672
Authors: Kuntal Kumar Pal, Chitta Baral Affiliation: Department of Computer Science, Arizona State University, Tempe, Arizona, USA Note: 7 pages, 10 figures, 5 tables, accepted in the Findings of EMNLP 2021 Abstract: The transformer-based pre-trained language models have been tremendously successful in most of the conventional NLP tasks. But they often struggle in those tasks where numerical understanding is required. Some possible reasons can be the tokenizers and pre-training objectives which are not specifically designed to learn and preserve numeracy. Here we investigate the ability of text-to-text transfer learning model (T5), which has outperformed its predecessors in the conventional NLP tasks, to learn numeracy. We consider four numeracy tasks: numeration, magnitude order prediction, finding minimum and maximum in a series, and sorting. We find that, although T5 models perform reasonably well in the interpolation setting, they struggle considerably in the extrapolation setting across all four tasks.
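The four numeracy tasks are easy to sketch as text-to-text pairs of the kind T5 consumes; the prompt wording below is illustrative, not the paper's exact format:

```python
# Generators for the four numeracy tasks as (input, target) text pairs.

def numeration(n):
    """Digits -> words (toy, n < 100)."""
    ones = ["zero", "one", "two", "three", "four",
            "five", "six", "seven", "eight", "nine"]
    tens = ["", "ten", "twenty", "thirty", "forty",
            "fifty", "sixty", "seventy", "eighty", "ninety"]
    if n < 10:
        word = ones[n]
    elif n % 10 == 0:
        word = tens[n // 10]
    else:
        word = tens[n // 10] + "-" + ones[n % 10]
    return (f"numeration: {n}", word)

def magnitude_order(n):
    """Predict the order of magnitude (number of digits minus one)."""
    return (f"order of magnitude: {n}", str(len(str(n)) - 1))

def min_max(nums):
    """Two pairs: find the minimum and the maximum of a series."""
    s = " ".join(map(str, nums))
    return (f"minimum: {s}", str(min(nums))), (f"maximum: {s}", str(max(nums)))

def sort_task(nums):
    """Sort a series in ascending order."""
    s = " ".join(map(str, nums))
    return (f"sort ascending: {s}", " ".join(map(str, sorted(nums))))
```

Interpolation vs. extrapolation then amounts to whether the numbers at test time come from the same range seen during training or from a larger one.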
【4】 CINS: Comprehensive Instruction for Few-shot Learning in Task-oriented Dialog Systems Link: https://arxiv.org/abs/2109.04645
作者:Fei Mi,Yitong Li,Yasheng Wang,Xin Jiang,Qun Liu 机构:Huawei Noah's Ark Lab, Huawei Technologies Co., Ltd. 摘要:由于面向任务的对话(ToD)系统中不同模块的标注成本很高,实践中的一个主要挑战是用最少的标注数据学习不同的任务。最近,基于预训练语言模型(PLM)的提示方法在ToD的小样本学习中显示了良好的效果。为了更好地利用PLM的能力,本文提出了综合指令(CINS),利用PLM并为其提供额外的特定于任务的指令。我们为ToD中的三个重要下游任务,即意图分类、对话状态跟踪和自然语言生成,设计了指令模式(定义、约束、提示)及其定制实现。采用序列到序列模型(T5)在统一的框架内解决这三个任务。我们在具有少量验证数据的真实小样本学习场景中,对这些ToD任务进行了广泛的实验。实证结果表明,所提出的CINS方法持续优于使用原始输入或短提示微调PLM的技术。 摘要:As labeling cost for different modules in task-oriented dialog (ToD) systems is high, a major challenge in practice is to learn different tasks with the least amount of labeled data. Recently, prompting methods over pre-trained language models (PLMs) have shown promising results for few-shot learning in ToD. To better utilize the power of PLMs, this paper proposes Comprehensive Instruction (CINS) that exploits PLMs with extra task-specific instructions. We design a schema (definition, constraint, prompt) of instructions and their customized realizations for three important downstream tasks in ToD, i.e. intent classification, dialog state tracking, and natural language generation. A sequence-to-sequence model (T5) is adopted to solve these three tasks in a unified framework. Extensive experiments are conducted on these ToD tasks in realistic few-shot learning scenarios with small validation data. Empirical results demonstrate that the proposed CINS approach consistently improves techniques that finetune PLMs with raw input or short prompts.
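CINS的"定义-约束-提示"指令模式可以按如下方式拼接成序列到序列模型的输入;具体措辞与意图标签均为假设性示例,并非论文模板:

```python
def build_instruction(definition, constraint, prompt, utterance):
    """Concatenate a (definition, constraint, prompt) schema with the input."""
    return f"{definition} {constraint} {prompt} Input: {utterance}"

text = build_instruction(
    "Intent classification is the task of identifying the intent of an utterance.",
    "The intent must be one of: book_flight, play_music.",
    "What is the intent of the following utterance?",
    "play some jazz for me",
)
```

T5等模型随后以该拼接文本为输入,直接生成任务答案(如意图标签)。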
强化学习(2篇)
【1】 Multi-agent deep reinforcement learning (MADRL) meets multi-user MIMO systems 标题:面向多用户MIMO系统的多智能体深度强化学习(MADRL) 链接:https://arxiv.org/abs/2109.04986
作者:Heunchul Lee,Jaeseong Jeong 机构:Ericsson Research, Ericsson AB, Stockholm, Sweden 备注:Accepted for presentation at the IEEE GLOBECOM 2021, SAC, Machine Learning for Communications, December 7 - 11, in Madrid, Spain. @2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media 摘要:多智能体深度强化学习(MADRL)是解决无线环境中具有高维连续动作空间的多决策者(或参与者)问题的有效方法。在本文中,我们提出了一种基于MADRL的方法,该方法可以联合优化预编码器,以实现多输入单输出(MISO)干扰信道(IFC)可实现速率区域的外边界,称为pareto边界。为了解决MISO IFC设置中的两个主要挑战,即具有部分可观测性的多个参与者(或代理)和多维连续动作空间,我们采用了多代理深度确定性策略梯度(MA-DDPG)一个框架,其中具有部分可观测性的分散参与者可以借助于具有全局信息的共享批评家以集中的方式学习多维连续策略。同时,我们还将讨论在无线电通信中广泛使用的信号的常规复基带表示的相位模糊问题。为了缓解相位模糊对训练性能的影响,我们提出了一种称为相位模糊消除(PAE)的训练方法,该方法可以提高MA-DDPG在无线通信系统中的学习速度和性能。仿真结果表明,MA-DDPG能够在MISO IFC环境中学习近似最优的预编码策略。据我们所知,这是第一个证明MA-DDPG框架可以联合优化预编码器以实现多小区多用户多天线系统中可实现速率区域的帕累托边界的工作。 摘要:A multi-agent deep reinforcement learning (MADRL) is a promising approach to challenging problems in wireless environments involving multiple decision-makers (or actors) with high-dimensional continuous action space. In this paper, we present a MADRL-based approach that can jointly optimize precoders to achieve the outer-boundary, called pareto-boundary, of the achievable rate region for a multiple-input single-output (MISO) interference channel (IFC). In order to address two main challenges, namely, multiple actors (or agents) with partial observability and multi-dimensional continuous action space in MISO IFC setup, we adopt a multi-agent deep deterministic policy gradient (MA-DDPG) framework in which decentralized actors with partial observability can learn a multi-dimensional continuous policy in a centralized manner with the aid of shared critic with global information. Meanwhile, we will also address a phase ambiguity issue with the conventional complex baseband representation of signals widely used in radio communications. 
In order to mitigate the impact of phase ambiguity on training performance, we propose a training method, called phase ambiguity elimination (PAE), that leads to faster learning and better performance of MA-DDPG in wireless communication systems. The simulation results exhibit that MA-DDPG is capable of learning a near-optimal precoding strategy in a MISO IFC environment. To the best of our knowledge, this is the first work to demonstrate that the MA-DDPG framework can jointly optimize precoders to achieve the pareto-boundary of achievable rate region in a multi-cell multi-user multi-antenna system.
【2】 Projected State-action Balancing Weights for Offline Reinforcement Learning 标题:离线强化学习的投影状态-动作平衡权重 链接:https://arxiv.org/abs/2109.04640
作者:Jiayi Wang,Zhengling Qi,Raymond K. W. Wong 机构:Texas A&M University, George Washington University 摘要:离线策略评估(OPE)被认为是强化学习(RL)中一个基本且具有挑战性的问题。本文主要研究在无限水平马尔可夫决策过程框架下,基于从可能不同的策略生成的预收集数据对目标策略的价值估计。基于最近发展起来的RL中的边际重要性抽样方法和因果推理中的协变量平衡思想,我们提出了一种新的估计方法,该方法具有近似投影的状态-动作平衡权,用于策略值估计。我们得到了这些权值的收敛速度,并证明了在技术条件下所提出的值估计是半参数有效的。在渐近性方面,我们的结果以轨迹的数量和每条轨迹上的决策点的数量来衡量。因此,当决策点的数量不同时,仍然可以通过有限数量的受试者实现一致性。此外,我们首次尝试描述OPE问题的难度,这可能是独立的兴趣。数值实验证明了我们提出的估计器的良好性能。 摘要:Offline policy evaluation (OPE) is considered a fundamental and challenging problem in reinforcement learning (RL). This paper focuses on the value estimation of a target policy based on pre-collected data generated from a possibly different policy, under the framework of infinite-horizon Markov decision processes. Motivated by the recently developed marginal importance sampling method in RL and the covariate balancing idea in causal inference, we propose a novel estimator with approximately projected state-action balancing weights for the policy value estimation. We obtain the convergence rate of these weights, and show that the proposed value estimator is semi-parametric efficient under technical conditions. In terms of asymptotics, our results scale with both the number of trajectories and the number of decision points at each trajectory. As such, consistency can still be achieved with a limited number of subjects when the number of decision points diverges. In addition, we make a first attempt towards characterizing the difficulty of OPE problems, which may be of independent interest. Numerical experiments demonstrate the promising performance of our proposed estimator.
元学习(3篇)
【1】 Rapid Model Architecture Adaption for Meta-Learning 标题:面向元学习的快速模型体系结构自适应 链接:https://arxiv.org/abs/2109.04925
作者:Yiren Zhao,Xitong Gao,Ilia Shumailov,Nicolo Fusi,Robert Mullins 机构:University of Cambridge, Shenzhen Institute of Advanced Technology, CAS, Microsoft Research 摘要:网络体系结构搜索(NAS)方法近年来备受关注。与传统的手动调谐相比,它们设计的网络性能更好,搜索时间更短。尽管大多数NAS算法在模型部署方面效率很高,但它们的目标是固定硬件系统上的单个任务。然而,现实生活中的Few-Shot学习环境通常涵盖大量的任务(T)和在各种硬件平台(H)上的部署。如果天真地将现有NAS方法应用于这些场景,那么组合搜索复杂性T乘以H将带来基本的搜索效率挑战。为了克服这个问题,我们首次展示了如何通过将模型不可知元学习(MAML)集成到NAS流中,在多任务多硬件Few-Shot学习设置中快速调整模型体系结构以适应新任务。所提出的NAS方法(H-Meta-NAS)具有硬件感知能力,并在MAML框架中执行优化。与各种NAS和手动基线相比,H-Meta-NAS在具有各种硬件平台和约束的流行Few-Shot学习基准中显示出帕累托优势。特别是,在5路1-shot Mini-ImageNet分类任务中,所提出的方法在计算量减少60%的情况下,大大优于最佳手动基线(准确率高出5.21%)。 摘要:Network Architecture Search (NAS) methods have recently gathered much attention. They design networks with better performance and use a much shorter search time compared to traditional manual tuning. Despite their efficiency in model deployments, most NAS algorithms target a single task on a fixed hardware system. However, real-life few-shot learning environments often cover a great number of tasks (T) and deployments on a wide variety of hardware platforms (H). The combinatorial search complexity T times H creates a fundamental search efficiency challenge if one naively applies existing NAS methods to these scenarios. To overcome this issue, we show, for the first time, how to rapidly adapt model architectures to new tasks in a many-task many-hardware few-shot learning setup by integrating Model Agnostic Meta Learning (MAML) into the NAS flow. The proposed NAS method (H-Meta-NAS) is hardware-aware and performs optimisation in the MAML framework. H-Meta-NAS shows a Pareto dominance compared to a variety of NAS and manual baselines in popular few-shot learning benchmarks with various hardware platforms and constraints. In particular, on the 5-way 1-shot Mini-ImageNet classification task, the proposed method outperforms the best manual baseline by a large margin (5.21% in accuracy) using 60% less computation.
【2】 Knowledge-Aware Meta-learning for Low-Resource Text Classification 标题:面向低资源文本分类的知识感知元学习 链接:https://arxiv.org/abs/2109.04707
作者:Huaxiu Yao,Yingxin Wu,Maruan Al-Shedivat,Eric P. Xing 机构:Stanford University, University of Science and Technology of China, Mohamed bin Zayed University of Artificial Intelligence, Carnegie Mellon University 备注:Accepted by EMNLP 2021 摘要:元学习在利用已学到的历史知识促进新任务的学习过程方面取得了巨大成功。然而,当前元学习算法仅从历史任务中学习知识;当测试任务没有得到训练任务的良好支持时,这种做法可能难以很好地泛化。本文研究了一个低资源文本分类问题,并利用外部知识库来弥合元训练和元测试任务之间的差距。具体来说,我们提出KGML,为每个句子引入从其抽取的句子级知识图中学习到的额外表示。在三个数据集上的大量实验证明了KGML在有监督适应和无监督适应设置下的有效性。 摘要:Meta-learning has achieved great success in leveraging the historical learned knowledge to facilitate the learning process of the new task. However, merely learning the knowledge from the historical tasks, adopted by current meta-learning algorithms, may not generalize well to testing tasks when they are not well-supported by training tasks. This paper studies a low-resource text classification problem and bridges the gap between meta-training and meta-testing tasks by leveraging the external knowledge bases. Specifically, we propose KGML to introduce additional representation for each sentence learned from the extracted sentence-specific knowledge graph. The extensive experiments on three datasets demonstrate the effectiveness of KGML under both supervised adaptation and unsupervised adaptation settings.
【3】 Bootstrapped Meta-Learning 标题:自助式元学习 链接:https://arxiv.org/abs/2109.04504
作者:Sebastian Flennerhag,Yannick Schroecker,Tom Zahavy,Hado van Hasselt,David Silver,Satinder Singh 机构:DeepMind 备注:31 pages, 19 figures, 7 tables 摘要:元学习使人工智能能够通过学习如何学习来提高其效率。释放这种潜力需要克服一个具有挑战性的元优化问题,该问题通常表现出病态和短视的元目标。我们提出了一种算法,通过让元学习者自学来解决这些问题。该算法首先从元学习器中引导一个目标,然后通过在选择的(伪)度量下最小化到该目标的距离来优化元学习器。以梯度元学习为重点,我们建立了保证性能改进的条件,并表明改进与目标距离有关。因此,通过控制曲率,可以使用距离度量来简化元优化,例如通过减少病态。此外,自举机制可以扩展有效的元学习范围,而无需通过所有更新进行反向传播。该算法通用性强,易于实现。我们在Atari ALE基准上实现了无模型代理的最新水平,在Few-Shot学习中改进了MAML,并展示了我们的方法如何通过在Q-learning代理中进行元学习和高效探索来打开新的可能性。 摘要:Meta-learning empowers artificial intelligence to increase its efficiency by learning how to learn. Unlocking this potential involves overcoming a challenging meta-optimisation problem that often exhibits ill-conditioning, and myopic meta-objectives. We propose an algorithm that tackles these issues by letting the meta-learner teach itself. The algorithm first bootstraps a target from the meta-learner, then optimises the meta-learner by minimising the distance to that target under a chosen (pseudo-)metric. Focusing on meta-learning with gradients, we establish conditions that guarantee performance improvements and show that the improvement is related to the target distance. Thus, by controlling curvature, the distance measure can be used to ease meta-optimization, for instance by reducing ill-conditioning. Further, the bootstrapping mechanism can extend the effective meta-learning horizon without requiring backpropagation through all updates. The algorithm is versatile and easy to implement. We achieve a new state-of-the art for model-free agents on the Atari ALE benchmark, improve upon MAML in few-shot learning, and demonstrate how our approach opens up new possibilities by meta-learning efficient exploration in a Q-learning agent.
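摘要中"先从元学习器引导出目标、再最小化到该目标的距离"的机制,可以在一维玩具问题上示意如下;步长、步数与平方距离度量均为虚构选择,仅用于说明自举目标的构造:

```python
def bootstrap_target(theta, grad_fn, lr=0.1, steps=3):
    """Run the learner a few extra steps and freeze the result as a target."""
    target = theta
    for _ in range(steps):
        target = target - lr * grad_fn(target)
    return target

def meta_loss(theta, target):
    """Squared distance to the bootstrapped target (the chosen pseudo-metric)."""
    return (theta - target) ** 2

tgt = bootstrap_target(4.0, grad_fn=lambda x: 2 * x)  # learner minimises x^2
loss = meta_loss(4.0, tgt)
```

优化该距离只需对少量更新求导,无需通过全部自举步反向传播,这正是摘要所说"扩展有效元学习视界"的来源。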
医学相关(1篇)
【1】 Spike2Vec: An Efficient and Scalable Embedding Approach for COVID-19 Spike Sequences 标题:Spike2Vec:一种高效、可扩展的冠状病毒棘突序列嵌入方法 链接:https://arxiv.org/abs/2109.05019
作者:Sarwan Ali,Murray Patterson 机构:Department of Computer Science, Georgia State University, Atlanta, GA, USA 摘要:随着新冠病毒-19在全球的迅速传播,越来越多的与该病毒相关的数据变得可用,包括基因组序列数据。在GISAID等平台上公开的基因组序列总数目前为数百万,并且每天都在增加。这种大数据的可用性为研究人员详细研究这种病毒创造了一个新的机会。这对于新出现和传播的新冠病毒-19变异的所有动力学来说尤其重要。这一丰富的数据来源将使我们深入了解对这一和未来大流行威胁进行基因组监测的最佳方法,最终目标是减轻或消除这类威胁。分析和处理数百万个基因组序列是一项具有挑战性的任务。虽然传统的序列分类方法被证明是有效的,但它们并不是用来处理这些特定类型的基因组序列的。此外,现有的大多数方法也面临可伸缩性问题。以前针对冠状病毒基因组数据的研究建议使用刺突序列(对应于基因组的子序列),而不是使用完整的基因组序列,来执行不同的机器学习(ML)任务,如分类和聚类。然而,这些方法存在可伸缩性问题。在本文中,我们提出了一种称为Spike2Vec的方法,这是一种针对每条刺突序列的高效且可扩展的特征向量表示,可用于下游ML任务。通过实验,我们发现Spike2Vec不仅在数百万条刺突序列上具有可扩展性,而且在预测精度、F1分数等方面都优于基线模型。 摘要:With the rapid global spread of COVID-19, more and more data related to this virus is becoming available, including genomic sequence data. The total number of genomic sequences that are publicly available on platforms such as GISAID is currently several million, and is increasing with every day. The availability of such Big Data creates a new opportunity for researchers to study this virus in detail. This is particularly important with all of the dynamics of the COVID-19 variants which emerge and circulate. This rich data source will give us insights on the best ways to perform genomic surveillance for this and future pandemic threats, with the ultimate goal of mitigating or eliminating such threats. Analyzing and processing the several million genomic sequences is a challenging task. Although traditional methods for sequence classification are proven to be effective, they are not designed to deal with these specific types of genomic sequences. Moreover, most of the existing methods also face the issue of scalability.
Previous studies which were tailored to coronavirus genomic data proposed to use spike sequences (corresponding to a subsequence of the genome), rather than using the complete genomic sequence, to perform different machine learning (ML) tasks such as classification and clustering. However, those methods suffer from scalability issues. In this paper, we propose an approach called Spike2Vec, an efficient and scalable feature vector representation for each spike sequence that can be used for downstream ML tasks. Through experiments, we show that Spike2Vec is not only scalable on several million spike sequences, but also outperforms the baseline models in terms of prediction accuracy, F1-score, etc.
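基于k-mer的计数向量是此类spike序列表示的常见构件:把序列映射为固定字母表上所有k-mer的计数,得到定长特征向量。下面是一个简化的Python示意(字母表截短为4个氨基酸以便演示;Spike2Vec的具体构造细节请以论文为准):

```python
from itertools import product

def kmer_vector(seq, k=2, alphabet="ACDE"):
    """Count k-mers over a fixed alphabet -> fixed-length feature vector."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    vec = [0] * len(kmers)
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]
        if km in index:         # skip k-mers outside the alphabet
            vec[index[km]] += 1
    return vec

v = kmer_vector("ACADE", k=2)   # toy sequence; real spike sequences are ~1273 aa
```

这样得到的定长向量可直接输入任何标准的下游分类或聚类算法。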
推荐(1篇)
【1】 Trust your neighbors: A comprehensive survey of neighborhood-based methods for recommender systems 标题:信任你的邻居:推荐系统中基于邻居的方法的综合调查 链接:https://arxiv.org/abs/2109.04584
作者:Athanasios N. Nikolakopoulos,Xia Ning,Christian Desrosiers,George Karypis 备注:50 pages; Chapter in the Recommender Systems Handbook, 3rd Edition (to appear) 摘要:基于最近邻的协作推荐方法由于其简单、高效以及能够产生准确和个性化的推荐而在今天仍然非常流行。本章全面介绍了基于邻域的项目推荐方法。本文介绍了这些方法的主要特点和优点,描述了实现基于邻域的推荐系统的关键设计选择,并给出了如何做出这些选择的实用信息。本章涵盖了广泛的方法,包括传统的算法,如k-最近邻算法,以及基于矩阵分解、稀疏编码和随机游动的高级方法。 摘要:Collaborative recommendation approaches based on nearest-neighbors are still highly popular today due to their simplicity, their efficiency, and their ability to produce accurate and personalized recommendations. This chapter offers a comprehensive survey of neighborhood-based methods for the item recommendation problem. It presents the main characteristics and benefits of such methods, describes key design choices for implementing a neighborhood-based recommender system, and gives practical information on how to make these choices. A broad range of methods is covered in the chapter, including traditional algorithms like k-nearest neighbors as well as advanced approaches based on matrix factorization, sparse coding and random walks.
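这一章综述的基于物品的k近邻协同过滤,可用几行NumPy示意(评分矩阵为虚构数据,0表示未评分):

```python
import numpy as np

def item_knn_scores(R, target_user, k=2):
    """Score unseen items for one user from item-item cosine similarities.
    R: user-item rating matrix with 0 meaning 'unrated'."""
    norms = np.linalg.norm(R, axis=0, keepdims=True)
    norms[norms == 0] = 1.0
    S = (R / norms).T @ (R / norms)          # item-item cosine similarity
    np.fill_diagonal(S, 0.0)
    r = R[target_user]
    scores = np.zeros_like(r, dtype=float)
    for j in np.where(r == 0)[0]:            # only score unrated items
        nbrs = np.argsort(S[j])[::-1][:k]    # k most similar items
        w = S[j, nbrs]
        scores[j] = (w @ r[nbrs]) / (w.sum() + 1e-9)
    return scores

R = np.array([[5, 4, 0], [4, 5, 3], [1, 0, 5]], dtype=float)
scores = item_knn_scores(R, target_user=0)
```

对用户0,算法用与物品2最相似的k个已评分物品的评分做相似度加权平均,得到其预测评分。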
聚类(1篇)
【1】 Feature-based Individual Fairness in k-Clustering 标题:k-聚类中基于特征的个体公平性 链接:https://arxiv.org/abs/2109.04554
作者:Debajyoti Kar,Sourav Medya,Debmalya Mandal,Arlei Silva,Palash Dey,Swagato Sanyal 机构:IIT Kharagpur, India, Northwestern University, USA, Columbia University, USA, Rice University, USA 摘要:确保机器学习算法的公平性是一项具有挑战性的重要任务。我们考虑聚类点的问题,同时确保公平性约束。虽然已经有几次尝试在k-聚类问题中捕获组公平性,但是在个体层面上的公平性没有得到很好的研究。我们在k-聚类中引入了一个新的公平性概念,该概念基于不一定用于聚类的特征。我们证明了这个问题是NP难的,不允许常数因子近似。然后,我们设计了一个随机算法,在距离度量和公平性约束的自然约束下,保证在最小化聚类距离目标和个体公平性方面的近似性。最后,我们的实验结果验证了我们的算法比现有算法产生更低的聚类成本,同时在个体公平性方面具有竞争力。 摘要:Ensuring fairness in machine learning algorithms is a challenging and important task. We consider the problem of clustering a set of points while ensuring fairness constraints. While there have been several attempts to capture group fairness in the k-clustering problem, fairness at an individual level is not well-studied. We introduce a new notion of individual fairness in k-clustering based on features that are not necessarily used for clustering. We show that this problem is NP-hard and does not admit a constant factor approximation. We then design a randomized algorithm that guarantees approximation both in terms of minimizing the clustering distance objective as well as individual fairness under natural restrictions on the distance metric and fairness constraints. Finally, our experimental results validate that our algorithm produces lower clustering costs compared to existing algorithms while being competitive in individual fairness.
自动驾驶|车辆|车道检测等(1篇)
【1】 Citizen centric optimal electric vehicle charging stations locations in a full city: case of Malaga 标题:全市以市民为中心的最优电动汽车充电站选址--以马拉加为例 链接:https://arxiv.org/abs/2109.04975
作者:Christian Cintrano,Jamal Toutouh,Enrique Alba 机构:University of Malaga, Bulevar Louis Pasteur, Malaga, Spain, Massachusetts Institute of Technology, CSAIL, MA, USA 备注:None 摘要:本文通过定义电动汽车充电站选址(Electric Vehicle Charging Stations Locations,EV-CSL)问题,研究城市中电动汽车(EV)充电站的选址。其思路是尽可能缩短市民为车辆充电所需的出行距离。EV-CSL考虑了可安装充电站的最大数量和电力需求。采用两种元启发式算法求解相应的优化问题:遗传算法(GA)和变邻域搜索(VNS)。对西班牙马拉加市现实场景的实验分析表明,元启发式算法能够找到具有竞争力的解决方案,显著改善马拉加充电站的实际布设。GA给出了统计意义上最好的结果。 摘要:This article presents the problem of locating electric vehicle (EV) charging stations in a city by defining the Electric Vehicle Charging Stations Locations (EV-CSL) problem. The idea is to minimize the distance the citizens have to travel to charge their vehicles. EV-CSL takes into account the maximum number of charging stations to install and the electric power requirements. Two metaheuristics are applied to address the relying optimization problem: a genetic algorithm (GA) and a variable neighborhood search (VNS). The experimental analysis over a realistic scenario of Malaga city, Spain, shows that the metaheuristics are able to find competitive solutions which dramatically improve the actual installation of the stations in Malaga. GA provided statistically the best results.
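EV-CSL的目标(从候选位置中选出k个站点,最小化居民到最近站点的总距离)在玩具规模下可以直接穷举;论文中的GA与VNS正是在真实城市规模下替代这种穷举的元启发式方法。以下为一维虚构示例:

```python
from itertools import combinations

def total_travel(citizens, stations):
    """Sum of each citizen's distance to the nearest open station (1-D sketch)."""
    return sum(min(abs(c - s) for s in stations) for c in citizens)

def best_locations(citizens, candidates, k):
    """Exhaustive EV-CSL: pick k candidate sites minimising total travel."""
    return min(combinations(candidates, k),
               key=lambda st: total_travel(citizens, st))

sites = best_locations([0, 1, 9, 10], candidates=[0, 5, 10], k=2)
```

候选组合数随城市规模组合爆炸,这正是改用GA/VNS这类元启发式的原因。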
联邦学习|隐私保护|加密(1篇)
【1】 Multimodal Federated Learning 标题:多模态联合学习 链接:https://arxiv.org/abs/2109.04833
作者:Yuchen Zhao,Payam Barnaghi,Hamed Haddadi 机构:Imperial College London 摘要:联邦学习被提议作为集中式机器学习的替代方案,因为它的客户机-服务器结构在实际应用中提供了更好的隐私保护和可伸缩性。在许多应用中,如带有物联网设备的智能家居,客户端的本地数据是从不同的模式生成的,如感官、视觉和音频数据。现有的联邦学习系统只处理来自单一模式的本地数据,这限制了系统的可伸缩性。在本文中,我们提出了一个多模式半监督联邦学习框架,该框架训练自动编码器从客户端的不同本地数据模式中提取共享或相关表示。此外,我们还提出了一种多模态FedAvg算法来聚合在不同数据模式下训练的本地自动编码器。在服务器上辅助标记数据的帮助下,我们使用学习到的全局自动编码器进行下游分类任务。我们根据经验评估了我们在不同模式下的框架,包括感官数据、深度摄像机视频和RGB摄像机视频。我们的实验结果表明,将多种模式的数据引入联邦学习可以提高其准确性。此外,我们可以仅使用一种模式的标记数据在服务器上进行监督学习,并将学习到的模型应用于其他模式的测试数据,以获得适当的准确性(例如,约70%为最佳性能),特别是在结合单模式客户机和多模式客户机的贡献时。 摘要:Federated learning is proposed as an alternative to centralized machine learning since its client-server structure provides better privacy protection and scalability in real-world applications. In many applications, such as smart homes with IoT devices, local data on clients are generated from different modalities such as sensory, visual, and audio data. Existing federated learning systems only work on local data from a single modality, which limits the scalability of the systems. In this paper, we propose a multimodal and semi-supervised federated learning framework that trains autoencoders to extract shared or correlated representations from different local data modalities on clients. In addition, we propose a multimodal FedAvg algorithm to aggregate local autoencoders trained on different data modalities. We use the learned global autoencoder for a downstream classification task with the help of auxiliary labelled data on the server. We empirically evaluate our framework on different modalities including sensory data, depth camera videos, and RGB camera videos. Our experimental results demonstrate that introducing data from multiple modalities into federated learning can improve its accuracy. 
In addition, we can use labelled data from only one modality for supervised learning on the server and apply the learned model to testing data from other modalities to achieve decent accuracy (e.g., approximately 70% as the best performance), especially when combining contributions from both unimodal clients and multimodal clients.
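标准FedAvg按客户端样本数对参数做加权平均;论文的多模态变体在此之上按模态分别聚合各自的自编码器。加权平均本身可示意如下(参数与样本数均为虚构):

```python
def fedavg(client_weights, client_sizes):
    """Weighted average of clients' flat parameter lists (FedAvg).
    A multimodal variant would run this per-modality over autoencoder weights."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
            for i in range(n_params)]

avg = fedavg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3])
```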
推理|分析|理解|解释(1篇)
【1】 Neural Networks for Latent Budget Analysis of Compositional Data 标题:神经网络在成分数据潜在预算分析中的应用 链接:https://arxiv.org/abs/2109.04875
作者:Zhenwei Yang,Ayoub Bagheri,P. G. M. van der Heijden 机构:Utrecht University, Sjoerd Groenman building, Padualaan, Utrecht, The Netherlands 摘要:成分数据是在具有恒定行和的矩形矩阵中收集的非负数据。由于非负性,重点放在每行加起来为1的条件比例上。一行有条件的比例称为观察预算。潜在预算分析(LBA)假设潜在预算的混合,以解释观察到的预算。LBA通常拟合于列联表,其中行是一个或多个解释变量的级别,列是响应变量的级别。在前瞻性研究中,只有关于个体解释变量的知识,并且对预测反应变量感兴趣。因此,需要一种具有预测功能的LBA形式。以前的研究提出了LBA的约束神经网络(NN)扩展,但由于预测能力不令人满意而受到阻碍。在这里,我们提出了LBA-NN,这是一种前馈NN模型,它产生了与LBA类似的解释,但使LBA具有更好的预测能力。通过使用重要性图和表格,可以获得LBA-NN的稳定且合理的解释,这些图和表格显示了所有解释变量对响应变量的相对重要性。 摘要:Compositional data are non-negative data collected in a rectangular matrix with a constant row sum. Due to the non-negativity the focus is on conditional proportions that add up to 1 for each row. A row of conditional proportions is called an observed budget. Latent budget analysis (LBA) assumes a mixture of latent budgets that explains the observed budgets. LBA is usually fitted to a contingency table, where the rows are levels of one or more explanatory variables and the columns the levels of a response variable. In prospective studies, there is only knowledge about the explanatory variables of individuals and interest goes out to predicting the response variable. Thus, a form of LBA is needed that has the functionality of prediction. Previous studies proposed a constrained neural network (NN) extension of LBA that was hampered by an unsatisfying prediction ability. Here we propose LBA-NN, a feed forward NN model that yields a similar interpretation to LBA but equips LBA with a better ability of prediction. A stable and plausible interpretation of LBA-NN is obtained through the use of importance plots and table, that show the relative importance of all explanatory variables on the response variable.
An LBA-NN-K-means approach that applies K-means clustering on the importance table is used to produce K clusters that are comparable to K latent budgets in LBA. Here we provide different experiments where LBA-NN is implemented and compared with LBA. In our analysis, LBA-NN outperforms LBA in prediction in terms of accuracy, specificity, recall and mean square error. We provide open-source software at GitHub.
检测相关(3篇)
【1】 FR-Detect: A Multi-Modal Framework for Early Fake News Detection on Social Media Using Publishers Features 标题:FR-Detect:一种基于发布者特征的多模态社交媒体假新闻早期检测框架 链接:https://arxiv.org/abs/2109.04835
作者:Ali Jarrahi,Leila Safari 机构:Computer Engineering, University of Zanjan, Zanjan, Iran 摘要:近年来,随着互联网的扩张和社交媒体基础设施的吸引力,人们更喜欢通过这些媒体关注新闻。尽管这些媒体在新闻领域有许多优势,但由于缺乏任何控制和核查机制,虚假新闻得以传播,成为对民主、经济、新闻和言论自由的最重要威胁之一。设计并使用自动方法检测社交媒体上的假新闻已经成为一项重大挑战。在本文中,我们考察了发布者在社交媒体假新闻检测中的作用。我们还提出了一个高精度的多模态框架,即FR-Detect,它使用具有早期检测能力的用户相关和内容相关特征。为此,为发布者引入了两项新的用户相关特征,即活动可信度(Activity Credibility)和影响力(Influence)。此外,还提供了一个句子级卷积神经网络,将这些特征与潜在的文本内容特征恰当地结合起来。实验结果表明,发布者特征最多可将基于内容的模型的准确率和F1分数分别提高13%和29%。 摘要:In recent years, with the expansion of the Internet and attractive social media infrastructures, people prefer to follow the news through these media. Despite the many advantages of these media in the news field, the lack of any control and verification mechanism has led to the spread of fake news, as one of the most important threats to democracy, economy, journalism and freedom of expression. Designing and using automatic methods to detect fake news on social media has become a significant challenge. In this paper, we examine the publishers' role in detecting fake news on social media. We also suggest a high accurate multi-modal framework, namely FR-Detect, using user-related and content-related features with early detection capability. For this purpose, two new user-related features, namely Activity Credibility and Influence, have been introduced for publishers. Furthermore, a sentence-level convolutional neural network is provided to combine these features with latent textual content features properly. Experimental results have shown that the publishers' features can improve the performance of content-based models by up to 13% and 29% in accuracy and F1-score, respectively.
【2】 Artificial Text Detection via Examining the Topology of Attention Maps 标题:基于注意图拓扑检查的人工文本检测 链接:https://arxiv.org/abs/2109.04825
作者:Laida Kushnareva,Daniil Cherniavskii,Vladislav Mikhailov,Ekaterina Artemova,Serguei Barannikov,Alexander Bernstein,Irina Piontkovskaya,Dmitri Piontkovski,Evgeny Burnaev 机构:Huawei Noah's Ark lab, Moscow, Russia, Skolkovo Institute of Science and Technology, Moscow, Russia, SberDevices, Sberbank, Moscow, Russia, HSE University, Moscow, Russia, CNRS, IMJ, Paris, France 备注:Accepted to EMNLP 2021 摘要:最近的生成模型能够生成难以与人类书写文本区分的文本,这一能力可能被滥用于生成假新闻、产品评论,甚至是辱骂性内容。尽管现有的人工文本检测方法表现突出,但它们仍然缺乏可解释性,对未见过的模型也缺乏鲁棒性。为此,我们基于拓扑数据分析(TDA)为该任务提出了三种新的可解释拓扑特征;TDA目前在NLP领域的研究尚不充分。我们的实验表明,在三个常用数据集上,从BERT模型导出的特征比基于计数和基于神经网络的基线最多高出10%,并且与现有方法相比,对未见过的GPT风格生成模型往往最为鲁棒。对这些特征的探查分析揭示了它们对表层和句法属性的敏感性。结果表明,对于NLP任务,尤其是那些包含表层和结构信息的任务,TDA是一条很有前途的路线。 摘要:The impressive capabilities of recent generative models to create texts that are challenging to distinguish from the human-written ones can be misused for generating fake news, product reviews, and even abusive content. Despite the prominent performance of existing methods for artificial text detection, they still lack interpretability and robustness towards unseen models. To this end, we propose three novel types of interpretable topological features for this task based on Topological Data Analysis (TDA) which is currently understudied in the field of NLP. We empirically show that the features derived from the BERT model outperform count- and neural-based baselines up to 10% on three common datasets, and tend to be the most robust towards unseen GPT-style generation models as opposed to existing methods. The probing analysis of the features reveals their sensitivity to the surface and syntactic properties. The results demonstrate that TDA is a promising line with respect to NLP tasks, specifically the ones that incorporate surface and structural information.
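最简单的一类0维拓扑特征,是把注意力权重不低于某阈值的词对视为边,统计所得图在不同阈值下的连通分量数;下面的纯Python示意计算该特征(注意力矩阵为虚构数据,论文实际使用的TDA特征更为丰富):

```python
def components(adj_matrix, thresh):
    """Connected components of the graph whose edges are attention
    weights >= thresh (a 0-dimensional topological feature)."""
    n = len(adj_matrix)
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for i in range(n):
        for j in range(i + 1, n):
            if max(adj_matrix[i][j], adj_matrix[j][i]) >= thresh:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})

A = [[0, .9, 0], [.9, 0, .1], [0, .1, 0]]      # toy 3-token attention map
feat = [components(A, t) for t in (0.05, 0.5)]  # feature across thresholds
```

随着阈值升高,弱注意力边被删去、分量数增加;分量数随阈值的变化轨迹即可作为分类特征。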
【3】 TENET: Temporal CNN with Attention for Anomaly Detection in Automotive Cyber-Physical Systems 标题:宗旨:关注时间的CNN用于汽车数字物理系统中的异常检测 链接:https://arxiv.org/abs/2109.04565
作者:S. V. Thiruloga,V. K. Kukkala,S. Pasricha 机构:Electrical and Computer Engineering, Colorado State University, Fort Collins, CO, USA 摘要:现代车辆具有多个电子控制单元(ECU),它们作为复杂的分布式网络物理系统(CPS)的一部分连接在一起。ECU与外部电子系统之间日益增加的通信使这些车辆特别容易受到各种网络攻击。在这项工作中,我们提出了一种称为TENET的新异常检测框架,用于检测由针对车辆的网络攻击引起的异常。TENET使用带有集成注意机制的时间卷积神经网络来检测异常攻击模式。与之前在汽车异常检测方面表现最佳的工作相比,TENET在假阴性率上改进32.70%,在马修斯相关系数上改进19.14%,在ROC-AUC指标上改进17.25%,同时模型参数减少94.62%,内存占用减少86.95%,推理时间减少48.14%。 摘要:Modern vehicles have multiple electronic control units (ECUs) that are connected together as part of a complex distributed cyber-physical system (CPS). The ever-increasing communication between ECUs and external electronic systems has made these vehicles particularly susceptible to a variety of cyber-attacks. In this work, we present a novel anomaly detection framework called TENET to detect anomalies induced by cyber-attacks on vehicles. TENET uses temporal convolutional neural networks with an integrated attention mechanism to detect anomalous attack patterns. TENET is able to achieve an improvement of 32.70% in False Negative Rate, 19.14% in the Matthews Correlation Coefficient, and 17.25% in the ROC-AUC metric, with 94.62% fewer model parameters, 86.95% decrease in memory footprint, and 48.14% lower inference time when compared to the best performing prior work on automotive anomaly detection.
分类|识别(5篇)
【1】 Self-Attention Channel Combinator Frontend for End-to-End Multichannel Far-field Speech Recognition 标题:端到端多通道远场语音识别的自注意通道组合器前端 链接:https://arxiv.org/abs/2109.04783
作者:Rong Gong,Carl Quillen,Dushyant Sharma,Andrew Goderre,José Laínez,Ljubomir Milanović 机构:Nuance Communications GmbH, Vienna, Austria, Nuance Communications Inc., Burlington, USA, Nuance Communications S.A., Madrid, Spain 备注:In Proceedings of Interspeech 2021 摘要:当提供足够大的远场训练数据时,联合优化多通道前端和端到端(E2E)自动语音识别(ASR)后端将显示有希望的结果。最近的文献表明,传统的波束形成器设计,如MVDR(最小方差无失真响应)或固定波束形成器,可以成功地作为前端集成到具有可学习参数的E2E ASR系统中。在这项工作中,我们提出了自注意通道组合器(SACC)ASR前端,它利用自注意机制在幅度谱域中组合多通道音频信号。在多通道播放测试数据上进行的实验表明,与基于最先进的固定波束形成器的前端相比,SACC实现了9.3%的WERR,两者都与基于ContextNet的ASR后端进行了联合优化。我们还演示了SACC和传统波束形成器之间的连接,并分析了SACC的中间输出。 摘要:When a sufficiently large far-field training data is presented, jointly optimizing a multichannel frontend and an end-to-end (E2E) Automatic Speech Recognition (ASR) backend shows promising results. Recent literature has shown traditional beamformer designs, such as MVDR (Minimum Variance Distortionless Response) or fixed beamformers can be successfully integrated as the frontend into an E2E ASR system with learnable parameters. In this work, we propose the self-attention channel combinator (SACC) ASR frontend, which leverages the self-attention mechanism to combine multichannel audio signals in the magnitude spectral domain. Experiments conducted on a multichannel playback test data shows that the SACC achieved a 9.3% WERR compared to a state-of-the-art fixed beamformer-based frontend, both jointly optimized with a ContextNet-based ASR backend. We also demonstrate the connection between the SACC and the traditional beamformers, and analyze the intermediate outputs of the SACC.
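SACC将自注意力得分经softmax归一化为各通道权重,再在幅度谱域做加权求和,可用NumPy示意如下(得分与频谱均为虚构数据;实际得分由可学习的自注意力层产生):

```python
import numpy as np

def sacc_combine(mag_specs, scores):
    """Combine per-channel magnitude spectrograms with softmax attention weights.
    mag_specs: (channels, frames, freq_bins); scores: (channels,)."""
    w = np.exp(scores - np.max(scores))
    w = w / w.sum()                            # softmax over channels
    return np.tensordot(w, mag_specs, axes=1)  # weighted sum -> single channel

specs = np.ones((3, 4, 5))                     # 3 channels, 4 frames, 5 freq bins
out = sacc_combine(specs, np.array([0.0, 0.0, 0.0]))
```

得分相等时退化为各通道等权平均;这也体现了摘要所述SACC与传统(固定权重)波束形成器之间的联系。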
【2】 Multi-label Classification of Aircraft Heading Changes Using Neural Network to Resolve Conflicts 标题:基于神经网络冲突消解的飞机航向变化多标签分类 链接:https://arxiv.org/abs/2109.04767
作者:Md Siddiqur Rahman,Laurent Lapasset,Josiane Mothe 机构:ENAC, IRIT UMR, CNRS, Univ.de Toulouse , Capitole, DEVI, Ecole Nationale de l’Aviation Civile, INSPE, IRIT, UMR, CNRS, Univ.de Toulouse, Toulouse, France 备注:27 pages, 24 figures 摘要:当两架或多架飞机同时以一定距离交叉时,飞机冲突就会发生。指定了专门的空中交通管制员来解决此类冲突。为了解决冲突,控制器需要考虑各种类型的信息。最常见和初步的信息是相关飞机的坐标位置。此外,管制员必须考虑更多信息,如飞行计划、天气、限制区域等。管制员必须面对的最重要挑战是:思考所涉及的问题,并在很短的时间内做出决定。由于飞机数量的增加,减少管制员的工作量并帮助他们快速做出决策至关重要。冲突可以在许多方面解决,因此,我们认为这个问题是一个多标签分类问题。在此过程中,我们提出了一个多标签分类模型,为给定冲突提供多标题建议。我们称之为CRMLnet的模型基于多层神经网络的新应用,帮助控制器做出决策。与其他机器学习模型相比,我们的CRMLnet取得了最好的结果,准确率为98.72%,ROC为0.999。我们在实验中开发和使用的模拟数据集将交付给研究社区。 摘要:An aircraft conflict occurs when two or more aircraft cross at a certain distance at the same time. Specific air traffic controllers are assigned to solve such conflicts. A controller needs to consider various types of information in order to solve a conflict. The most common and preliminary information is the coordinate position of the involved aircraft. Additionally, a controller has to take into account more information such as flight planning, weather, restricted territory, etc. The most important challenges a controller has to face are: to think about the issues involved and make a decision in a very short time. Due to the increased number of aircraft, it is crucial to reduce the workload of the controllers and help them make quick decisions. A conflict can be solved in many ways, therefore, we consider this problem as a multi-label classification problem. In doing so, we are proposing a multi-label classification model which provides multiple heading advisories for a given conflict. This model we named CRMLnet is based on a novel application of a multi-layer neural network and helps the controllers in their decisions. When compared to other machine learning models, our CRMLnet has achieved the best results with an accuracy of 98.72% and ROC of 0.999. 
The simulated data set that we have developed and used in our experiments will be delivered to the research community.
【3】 Dual-State Capsule Networks for Text Classification Link: https://arxiv.org/abs/2109.04762
Authors: Piyumal Demotte, Surangika Ranathunga Affiliations: Department of Computer Science and Engineering, University of Moratuwa, Katubedda, Sri Lanka Note: 9 pages Abstract: Text classification systems based on contextual embeddings are not viable options for many low-resource languages. On the other hand, recently introduced capsule networks have shown performance on par with these text classification models. Thus, they can be considered a viable alternative for text classification in languages that do not have pre-trained contextual embedding models. However, current capsule networks depend upon spatial patterns without considering the sequential features of the text. They are also sub-optimal in capturing context-level information in longer sequences. This paper presents a novel Dual-State Capsule (DS-Caps) network-based technique for text classification, which is optimized to mitigate these issues. Two varieties of states, namely sentence-level and word-level, are integrated with capsule layers to capture deeper context-level information for language modeling. The dynamic routing process among capsules is also optimized using the context-level information obtained through sentence-level states. DS-Caps networks outperform existing capsule network architectures on multiple datasets, particularly for tasks with longer sequences of text. We also demonstrate the superiority of DS-Caps in text classification for a low-resource language.
【4】 Style Pooling: Automatic Text Style Obfuscation for Improved Classification Fairness Link: https://arxiv.org/abs/2109.04624
Authors: Fatemehsadat Mireshghallah, Taylor Berg-Kirkpatrick Affiliations: University of California San Diego Note: EMNLP 2021 Abstract: Text style can reveal sensitive attributes of the author (e.g. race or age) to the reader, which can, in turn, lead to privacy violations and bias in both human and algorithmic decisions based on text. For example, the style of writing in job applications might reveal protected attributes of the candidate, which could lead to bias in hiring decisions, regardless of whether hiring decisions are made algorithmically or by humans. We propose a VAE-based framework that obfuscates stylistic features of human-generated text through style transfer by automatically re-writing the text itself. Our framework operationalizes the notion of obfuscated style in a flexible way that enables two distinct notions of obfuscated style: (1) a minimal notion that effectively intersects the various styles seen in training, and (2) a maximal notion that seeks to obfuscate by adding stylistic features of all sensitive attributes to text, in effect computing a union of styles. Our style-obfuscation framework can be used for multiple purposes; here, we demonstrate its effectiveness in improving the fairness of downstream classifiers. We also conduct a comprehensive study of style pooling's effect on fluency, semantic consistency, and attribute removal from text, in two- and three-domain style obfuscation.
【5】 Best-Arm Identification in Correlated Multi-Armed Bandits Link: https://arxiv.org/abs/2109.04941
Authors: Samarth Gupta, Gauri Joshi, Osman Yağan Affiliations: Carnegie Mellon University, Pittsburgh, PA Abstract: In this paper we consider the problem of best-arm identification in multi-armed bandits in the fixed-confidence setting, where the goal is to identify, with probability $1-\delta$ for some $\delta>0$, the arm with the highest mean reward in the minimum possible number of samples from the set of arms $\mathcal{K}$. Most existing best-arm identification algorithms and analyses operate under the assumption that the rewards corresponding to different arms are independent of each other. We propose a novel correlated bandit framework that captures domain knowledge about correlation between arms in the form of upper bounds on the expected conditional reward of an arm, given a reward realization from another arm. Our proposed algorithm C-LUCB, which generalizes the LUCB algorithm, utilizes this partial knowledge of correlations to sharply reduce the sample complexity of best-arm identification. More interestingly, we show that the total samples obtained by C-LUCB are of the form $\mathcal{O}\left(\sum_{k \in \mathcal{C}} \log\left(\frac{1}{\delta}\right)\right)$ as opposed to the typical $\mathcal{O}\left(\sum_{k \in \mathcal{K}} \log\left(\frac{1}{\delta}\right)\right)$ samples required in the independent reward setting. The improvement comes as the $\mathcal{O}(\log(1/\delta))$ term is summed only over the set of competitive arms $\mathcal{C}$, which is a subset of the original set of arms $\mathcal{K}$. The size of the set $\mathcal{C}$, depending on the problem setting, can be as small as $2$, and hence using C-LUCB in the correlated bandit setting can lead to significant performance improvements. Our theoretical findings are supported by experiments on the Movielens and Goodreads recommendation datasets.
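C-LUCB builds on the classic LUCB loop, which repeatedly samples the empirical best arm and its strongest challenger until the best arm's lower confidence bound clears every rival's upper confidence bound. As a hedged sketch, here is the plain LUCB base loop only; C-LUCB's extension (pruning non-competitive arms using the correlation upper bounds) is omitted, and the confidence radius is one standard anytime choice, not necessarily the paper's.

```python
import numpy as np

def lucb(pull, n_arms, delta=0.05, max_pulls=100000):
    """Plain LUCB for fixed-confidence best-arm identification.
    `pull(k)` returns one stochastic reward from arm k."""
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    for k in range(n_arms):              # initialise: pull each arm once
        sums[k] += pull(k)
        counts[k] += 1
    t = n_arms
    while t < max_pulls:
        means = sums / counts
        # anytime confidence radius (one standard choice)
        beta = np.sqrt(np.log(4.0 * n_arms * t * t / delta) / (2.0 * counts))
        best = int(np.argmax(means))
        ucb = means + beta
        ucb[best] = -np.inf
        challenger = int(np.argmax(ucb))
        # stop when the empirical best's LCB clears the best rival's UCB
        if means[best] - beta[best] > means[challenger] + beta[challenger]:
            return best
        for k in (best, challenger):     # sample both contenders
            sums[k] += pull(k)
            counts[k] += 1
            t += 1
    return int(np.argmax(sums / counts))
```

With Bernoulli arms of means 0.2, 0.9, and 0.5, the loop identifies arm 1 after a modest number of pulls, with failure probability at most `delta`.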
Representation learning (2 papers)
【1】 Box Embeddings: An open-source library for representation learning using geometric structures Link: https://arxiv.org/abs/2109.04997
Authors: Tejas Chheda, Purujit Goyal, Trang Tran, Dhruvesh Patel, Michael Boratko, Shib Sankar Dasgupta, Andrew McCallum Affiliations: College of Information and Computer Sciences, University of Massachusetts Amherst, MA, USA; MassMutual Data Science, MA, USA Note: The source code and the usage and API documentation for the library are available at this https URL and this https URL Abstract: A major factor contributing to the success of modern representation learning is the ease of performing various vector operations. Recently, objects with geometric structures (e.g. distributions, complex or hyperbolic vectors, or regions such as cones, disks, or boxes) have been explored for their alternative inductive biases and additional representational capacities. In this work, we introduce Box Embeddings, a Python library that enables researchers to easily apply and extend probabilistic box embeddings.
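To make the geometric object concrete: a box embedding represents each concept as an axis-aligned hyperrectangle, and containment probabilities come from intersection volumes. The from-scratch sketch below illustrates the hard (non-smoothed) version of that idea; the class and function names are illustrative and are not the Box Embeddings library's API, which uses differentiable tensors and typically smooths the volume.

```python
import numpy as np

class Box:
    """A d-dimensional axis-aligned box, the object that probabilistic
    box embeddings parameterize (hard, non-smoothed version)."""
    def __init__(self, lo, hi):
        self.lo = np.asarray(lo, dtype=float)
        self.hi = np.asarray(hi, dtype=float)

    def volume(self):
        # side lengths clipped at zero so empty boxes get volume 0
        return float(np.prod(np.clip(self.hi - self.lo, 0.0, None)))

    def intersect(self, other):
        return Box(np.maximum(self.lo, other.lo),
                   np.minimum(self.hi, other.hi))

def p_contained(a, b):
    """P(a | b): fraction of b's volume lying inside a -- the conditional
    probability box models use to represent containment/entailment."""
    return a.intersect(b).volume() / b.volume()

animal = Box([0.0, 0.0], [1.0, 1.0])
dog = Box([0.1, 0.2], [0.5, 0.6])     # lies entirely inside `animal`
fish = Box([2.0, 2.0], [3.0, 3.0])    # disjoint from `dog`
```

Here `p_contained(animal, dog)` is 1.0 because the dog box sits inside the animal box, while `p_contained(dog, fish)` is 0.0 for the disjoint pair.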
【2】 Integrating Approaches to Word Representation Link: https://arxiv.org/abs/2109.04876
Authors: Yuval Pinter Note: Adapted dissertation introduction Abstract: The problem of representing the atomic elements of language in modern neural learning systems is one of the central challenges of the field of natural language processing. I present a survey of the distributional, compositional, and relational approaches to addressing this task, and discuss various means of integrating them into systems, with special emphasis on the word level and the out-of-vocabulary phenomenon.
3D | 3D reconstruction and related (2 papers)
【1】 Inverse design of 3d molecular structures with conditional generative neural networks Link: https://arxiv.org/abs/2109.04824
Authors: Niklas W. A. Gebauer, Michael Gastegger, Stefaan S. P. Hessmann, Klaus-Robert Müller, Kristof T. Schütt Affiliations: Machine Learning Group, Technische Universität Berlin, Berlin, Germany; Berlin Institute for the Foundations of Learning and Data, Berlin, Germany; BASLEARN – TU Berlin/BASF Joint Lab for Machine Learning Abstract: The rational design of molecules with desired properties is a long-standing challenge in chemistry. Generative neural networks have emerged as a powerful approach to sample novel molecules from a learned distribution. Here, we propose a conditional generative neural network for 3d molecular structures with specified structural and chemical properties. This approach is agnostic to chemical bonding and enables targeted sampling of novel molecules from conditional distributions, even in domains where reference calculations are sparse. We demonstrate the utility of our method for inverse design by generating molecules with specified composition or motifs, discovering particularly stable molecules, and jointly targeting multiple electronic properties beyond the training regime.
【2】 Mesh convolutional neural networks for wall shear stress estimation in 3D artery models Link: https://arxiv.org/abs/2109.04797
Authors: Julian Suk, Pim de Haan, Phillip Lippe, Christoph Brune, Jelmer M. Wolterink Affiliations: Department of Applied Mathematics & Technical Medical Centre, University of Twente, Enschede, The Netherlands; QUVA Lab, University of Amsterdam, Amsterdam, The Netherlands; Qualcomm AI Research, Qualcomm Technologies Netherlands B.V. Note: (MICCAI 2021) Workshop on Statistical Atlases and Computational Modelling of the Heart (STACOM) Abstract: Computational fluid dynamics (CFD) is a valuable tool for personalised, non-invasive evaluation of hemodynamics in arteries, but its complexity and time-consuming nature prohibit large-scale use in practice. Recently, the use of deep learning for rapid estimation of CFD parameters like wall shear stress (WSS) on surface meshes has been investigated. However, existing approaches typically depend on a hand-crafted re-parametrisation of the surface mesh to match convolutional neural network architectures. In this work, we propose to instead use mesh convolutional neural networks that directly operate on the same finite-element surface mesh as used in CFD. We train and evaluate our method on two datasets of synthetic coronary artery models with and without bifurcation, using a ground truth obtained from CFD simulation. We show that our flexible deep learning model can accurately predict 3D WSS vectors on this surface mesh. Our method processes new meshes in less than 5 s, consistently achieves a normalised mean absolute error of $\leq 1.6\%$, and peaks at 90.5% median approximation accuracy over the held-out test set, comparing favorably to previously published work. This shows the feasibility of CFD surrogate modelling using mesh convolutional neural networks for hemodynamic parameter estimation in artery models.
Encoders (2 papers)
【1】 Fairness without the sensitive attribute via Causal Variational Autoencoder Link: https://arxiv.org/abs/2109.04999
Authors: Vincent Grari, Sylvain Lamprier, Marcin Detyniecki Affiliations: Sorbonne Université, LIP, CNRS, Paris, France; AXA, Paris, France; Polish Academy of Science, IBS PAN, Warsaw, Poland Note: 8 pages, 9 figures Abstract: In recent years, most fairness strategies in machine learning models have focused on mitigating unwanted biases by assuming that the sensitive information is observed. However, this is not always possible in practice. Due to privacy purposes and various regulations such as the RGPD in the EU, many personal sensitive attributes are frequently not collected. We notice a lack of approaches for mitigating bias in such difficult settings, in particular for achieving classical fairness objectives such as Demographic Parity and Equalized Odds. By leveraging recent developments in approximate inference, we propose an approach to fill this gap. Based on a causal graph, we rely on a new variational-autoencoder-based framework named SRCVAE to infer a sensitive-information proxy, which serves for bias mitigation in an adversarial fairness approach. We empirically demonstrate significant improvements over existing works in the field. We observe that the generated proxy's latent space recovers sensitive information and that our approach achieves higher accuracy while obtaining the same level of fairness on two real datasets, as measured using common fairness definitions.
【2】 Supervising the Decoder of Variational Autoencoders to Improve Scientific Utility Link: https://arxiv.org/abs/2109.04561
Authors: Liyun Tu, Austin Talbot, Neil Gallagher, David Carlson Affiliations: N. Gallagher is with the Department of Neurobiology Abstract: Probabilistic generative models are attractive for scientific modeling because their inferred parameters can be used to generate hypotheses and design experiments. This requires that the learned model provide an accurate representation of the input data and yield a latent space that effectively predicts outcomes relevant to the scientific question. Supervised Variational Autoencoders (SVAEs) have previously been used for this purpose, where a carefully designed decoder can be used as an interpretable generative model while the supervised objective ensures a predictive latent representation. Unfortunately, the supervised objective forces the encoder to learn a biased approximation to the generative posterior distribution, which renders the generative parameters unreliable when used in scientific models. This issue has remained undetected, as reconstruction losses commonly used to evaluate model performance do not detect bias in the encoder. We address this previously unreported issue by developing a second-order supervision framework (SOS-VAE) that influences the decoder to induce a predictive latent representation. This ensures that the associated encoder maintains a reliable generative interpretation. We extend this technique to allow the user to trade off some bias in the generative parameters for improved predictive performance, acting as an intermediate option between SVAEs and our new SOS-VAE. We also use this methodology to address missing data issues that often arise when combining recordings from multiple scientific experiments. We demonstrate the effectiveness of these developments using synthetic data and electrophysiological recordings, with an emphasis on how our learned representations can be used to design scientific experiments.
Optimization | Convergence (2 papers)
【1】 Efficient Locally Optimal Number Set Partitioning for Scheduling, Allocation and Fair Selection Link: https://arxiv.org/abs/2109.04809
Authors: Kaan Gokcesu, Hakan Gokcesu Abstract: We study the optimization version of the set partition problem (where the difference between the partition sums is minimized), which has numerous applications in the decision theory literature. While the set partitioning problem is NP-hard and requires exponential complexity to solve (i.e., it is intractable), we formulate a weaker version of this NP-hard problem, where the goal is to find a locally optimal solution. We show that our proposed algorithms can find a locally optimal solution in near-linear time. Our algorithms require neither positive nor integer elements in the input set; hence, they are more widely applicable.
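The notion of "locally optimal" here means no single improving move remains under some neighborhood. As a hedged sketch of that formulation (not the authors' near-linear-time algorithm), the following greedy-seed-plus-single-element-move local search works for negative and non-integer inputs alike:

```python
def local_partition(nums):
    """Two-way number-set partition minimizing |sum(A) - sum(B)|:
    greedy seeding followed by single-element moves until no move
    improves.  A sketch of the locally-optimal formulation, not the
    paper's algorithm; the single-move neighborhood can stall short
    of the global optimum."""
    a, b = [], []
    for x in sorted(nums, key=abs, reverse=True):   # greedy seed
        (a if sum(a) <= sum(b) else b).append(x)
    improved = True
    while improved:
        improved = False
        diff = sum(a) - sum(b)
        # moving x from A to B changes diff by -2x (and +2x the other way)
        for src, dst, sign in ((a, b, 1), (b, a, -1)):
            for i, x in enumerate(src):
                if abs(diff - sign * 2 * x) < abs(diff):
                    dst.append(src.pop(i))
                    improved = True
                    break
            if improved:
                break
    return a, b
```

For example, `local_partition([1, 2, 3, 4, 6])` reaches the perfectly balanced split with sums 8 and 8.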
【2】 Asynchronous Iterations in Optimization: New Sequence Results and Sharper Algorithmic Guarantees Link: https://arxiv.org/abs/2109.04522
Authors: Hamid Reza Feyzmahdavian, Mikael Johansson Affiliations: KTH Royal Institute of Technology, Sweden Note: 44 pages, 1 figure Abstract: We introduce novel convergence results for asynchronous iterations which appear in the analysis of parallel and distributed optimization algorithms. The results are simple to apply and give explicit estimates for how the degree of asynchrony impacts the convergence rates of the iterates. Our results shorten, streamline and strengthen existing convergence proofs for several asynchronous optimization methods, and allow us to establish convergence guarantees for popular algorithms that were thus far lacking a complete theoretical understanding. Specifically, we use our results to derive better iteration complexity bounds for proximal incremental aggregated gradient methods, to provide less conservative analyses of the speedup conditions for asynchronous block-coordinate implementations of Krasnoselskii-Mann iterations, and to quantify the convergence rates for totally asynchronous iterations under various assumptions on communication delays and update rates.
Forecasting | Estimation (3 papers)
【1】 Heading Estimation Using Ultra-Wideband Received Signal Strength and Gaussian Processes Link: https://arxiv.org/abs/2109.04868
Authors: Daniil Lisus, Charles Champagne Cossette, Mohammed Shalaby, James Richard Forbes Note: 6 pages, 9 figures, accepted to Robotics and Automation Letters, presented at IROS 2021 Abstract: It is essential that a robot has the ability to determine its position and orientation to execute tasks autonomously. Heading estimation is especially challenging in indoor environments, where magnetic distortions make magnetometer-based heading estimation difficult. Ultra-wideband (UWB) transceivers are common in indoor localization problems. This letter experimentally demonstrates how to use UWB range and received signal strength (RSS) measurements to estimate robot heading. The RSS of a UWB antenna varies with its orientation. As such, a Gaussian process (GP) is used to learn a data-driven relationship from UWB range and RSS inputs to orientation outputs. Combined with a gyroscope in an invariant extended Kalman filter, this realizes a heading estimation method that uses only UWB and gyroscope measurements.
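The data-driven core of this letter is standard GP regression: learn an input-to-output map plus a calibrated uncertainty. A minimal sketch of that machinery follows; the inputs here are synthetic one-dimensional stand-ins for the paper's range/RSS features, the kernel hyperparameters are arbitrary, and a real heading regressor would target sin/cos of the angle since heading wraps around.

```python
import numpy as np

def rbf(x1, x2, ell=0.3, sf=1.0):
    """Squared-exponential kernel between two sets of feature rows."""
    d = x1[:, None, :] - x2[None, :, :]
    return sf ** 2 * np.exp(-0.5 * np.sum(d ** 2, axis=-1) / ell ** 2)

def gp_predict(X, y, Xs, noise=1e-2):
    """Standard GP regression posterior mean and variance via Cholesky."""
    K = rbf(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf(X, Xs)
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(np.diag(rbf(Xs, Xs)) - np.sum(v ** 2, axis=0), 0.0, None)
    return mu, var

X = np.linspace(0.0, 1.0, 15)[:, None]   # toy 1-D "RSS" feature
y = np.sin(2.0 * np.pi * X[:, 0])        # toy "heading" target
mu, var = gp_predict(X, y, X)            # posterior at the training inputs
```

The posterior variance is what downstream filters (here, the invariant EKF) would consume as a measurement-noise estimate.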
【2】 KNODE-MPC: A Knowledge-based Data-driven Predictive Control Framework for Aerial Robots Link: https://arxiv.org/abs/2109.04821
Authors: Kong Yao Chee, Tom Z. Jiahao, M. Ani Hsieh Affiliations: University of Pennsylvania Note: 7 pages, 8 figures. *Equal contribution. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible Abstract: In this work, we consider the problem of deriving and incorporating accurate dynamic models for model predictive control (MPC), with an application to quadrotor control. MPC relies on precise dynamic models to achieve the desired closed-loop performance. However, the presence of uncertainties in complex systems and the environments they operate in poses a challenge in obtaining sufficiently accurate representations of the system dynamics. In this work, we make use of a deep learning tool, knowledge-based neural ordinary differential equations (KNODE), to augment a model obtained from first principles. The resulting hybrid model encompasses both a nominal first-principles model and a neural network learnt from simulated or real-world experimental data. Using a quadrotor, we benchmark our hybrid model against a state-of-the-art Gaussian Process (GP) model and show that the hybrid model provides more accurate predictions of the quadrotor dynamics and is able to generalize beyond the training data. To improve closed-loop performance, the hybrid model is integrated into a novel MPC framework, known as KNODE-MPC. Results show that the integrated framework achieves a 73% improvement in simulations and more than 14% in physical experiments, in terms of trajectory tracking performance.
【3】 SeDyT: A General Framework for Multi-Step Event Forecasting via Sequence Modeling on Dynamic Entity Embeddings Link: https://arxiv.org/abs/2109.04550
Authors: Hongkuan Zhou, James Orme-Rogers, Rajgopal Kannan, Viktor Prasanna Affiliations: University of Southern California, Los Angeles, California, USA; US Army Research Lab Abstract: Temporal Knowledge Graphs store events in the form of subjects, relations, objects, and timestamps, which are often represented by dynamic heterogeneous graphs. Event forecasting is a critical and challenging task in Temporal Knowledge Graph reasoning that predicts the subject or object of an event in the future. To obtain temporal embeddings multiple steps into the future, existing methods learn generative models that capture the joint distribution of the observed events. To reduce the high computation costs, these methods rely on unrealistic assumptions of independence and on approximations in training and inference. In this work, we propose SeDyT, a discriminative framework that performs sequence modeling on the dynamic entity embeddings to solve the multi-step event forecasting problem. SeDyT consists of two components: a Temporal Graph Neural Network that generates dynamic entity embeddings in the past, and a sequence model that predicts the entity embeddings in the future. Compared with the generative models, SeDyT does not rely on any heuristic-based probability model and has low computational complexity in both training and inference. SeDyT is compatible with most Temporal Graph Neural Networks and sequence models. We also design an efficient training method that trains the two components in one gradient descent propagation. We evaluate the performance of SeDyT on five popular datasets. By combining Temporal Graph Neural Network models and sequence models, SeDyT achieves an average of 2.4% MRR improvement when not using the validation set and more than 10% MRR improvement when using the validation set.
Other neural networks | Deep learning | Models | Modeling (12 papers)
【1】 Topic-Aware Contrastive Learning for Abstractive Dialogue Summarization Link: https://arxiv.org/abs/2109.04994
Authors: Junpeng Liu, Yanyan Zou, Hainan Zhang, Hongshen Chen, Zhuoye Ding, Caixia Yuan, Xiaojie Wang Affiliations: Beijing University of Posts and Telecommunications, Beijing, China Note: EMNLP 2021 Abstract: Unlike well-structured text, such as news reports and encyclopedia articles, dialogue content often comes from two or more interlocutors exchanging information with each other. In such a scenario, the topic of a conversation can vary as it progresses, and the key information for a certain topic is often scattered across multiple utterances of different speakers, which poses challenges for abstractively summarizing dialogues. To capture the various topic information of a conversation and outline salient facts for the captured topics, this work proposes two topic-aware contrastive learning objectives, namely coherence detection and sub-summary generation objectives, which are expected to implicitly model the topic change and handle information-scattering challenges for the dialogue summarization task. The proposed contrastive objectives are framed as auxiliary tasks for the primary dialogue summarization task, united via an alternating parameter-updating strategy. Extensive experiments on benchmark datasets demonstrate that the proposed simple method significantly outperforms strong baselines and achieves new state-of-the-art performance. The code and trained models are publicly available at https://github.com/Junpliu/ConDigSum.
【2】 Saliency Guided Experience Packing for Replay in Continual Learning Link: https://arxiv.org/abs/2109.04954
Authors: Gobinda Saha, Kaushik Roy Affiliations: School of Electrical and Computer Engineering, Purdue University Note: 13 pages, 3 figures Abstract: Artificial learning systems aspire to mimic human intelligence by continually learning from a stream of tasks without forgetting past knowledge. One way to enable such learning is to store past experiences in the form of input examples in episodic memory and replay them when learning new tasks. However, the performance of such methods suffers as the size of the memory becomes smaller. In this paper, we propose a new approach for experience replay, where we select the past experiences by looking at the saliency maps which provide visual explanations for the model's decisions. Guided by these saliency maps, we pack the memory with only the parts or patches of the input images important for the model's prediction. While learning a new task, we replay these memory patches with appropriate zero-padding to remind the model about its past decisions. We evaluate our algorithm on diverse image classification datasets and report better performance than the state-of-the-art approaches. With qualitative and quantitative analyses we show that our method captures a richer summary of past experiences without any memory increase, and hence performs well with a small episodic memory.
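The select-and-replay mechanics described above reduce to two operations: pick the image window with the highest total saliency, and later zero-pad the stored patch back to full size. A minimal sketch of both steps is below; the saliency map here is any nonnegative importance array (the paper derives it from model explanations), and the brute-force window scan is for clarity, not speed.

```python
import numpy as np

def top_patch(image, saliency, ph, pw):
    """Return the ph-by-pw image patch whose window has the highest
    total saliency, plus its top-left corner."""
    H, W = saliency.shape
    best, best_rc = -np.inf, (0, 0)
    for r in range(H - ph + 1):
        for c in range(W - pw + 1):
            s = saliency[r:r + ph, c:c + pw].sum()
            if s > best:
                best, best_rc = s, (r, c)
    r, c = best_rc
    return image[r:r + ph, c:c + pw].copy(), best_rc

def replay_canvas(patch, rc, shape):
    """Zero-pad a stored patch back to full image size for replay."""
    canvas = np.zeros(shape, dtype=patch.dtype)
    r, c = rc
    canvas[r:r + patch.shape[0], c:c + patch.shape[1]] = patch
    return canvas
```

Storing only the patch and its corner is what keeps memory use below storing full images.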
【3】 ProcK: Machine Learning for Knowledge-Intensive Processes Link: https://arxiv.org/abs/2109.04881
Authors: Tobias Jacobs, Jingyi Yu, Julia Gastinger, Timo Sztyler Affiliations: NEC Laboratories Europe GmbH, Kurfürsten-Anlage, Heidelberg, Germany Abstract: Process mining deals with extraction of knowledge from business process execution logs. Traditional process mining tasks, like process model generation or conformance checking, rely on a minimalistic feature set where each event is characterized only by its case identifier, activity type, and timestamp. In contrast, the success of modern machine learning is based on models that take any available data as direct input and build layers of features automatically during training. In this work, we introduce ProcK (Process & Knowledge), a novel pipeline to build business process prediction models that take into account both sequential data in the form of event logs and rich semantic information represented in a graph-structured knowledge base. The hybrid approach enables ProcK to flexibly make use of all information residing in the databases of organizations. Components to extract inter-linked event logs and knowledge bases from relational databases are part of the pipeline. We demonstrate the power of ProcK by training it for prediction tasks on the OULAD e-learning dataset, where we achieve state-of-the-art performance on the tasks of predicting student dropout from courses and predicting their success. We also apply our method to a number of additional machine learning tasks, including exam score prediction and early predictions that only take into account data recorded during the first weeks of the courses.
【4】 Hybrid modeling of the human cardiovascular system using NeuralFMUs Link: https://arxiv.org/abs/2109.04880
Authors: Tobias Thummerer, Johannes Tintenherr, Lars Mikelsons Affiliations: Chair of Mechatronics, Augsburg University Abstract: Hybrid modeling, the combination of first-principles and machine learning models, is an emerging research field that is gathering more and more attention. Even if hybrid models produce formidable results for academic examples, there are still different technical challenges that hinder the use of hybrid modeling in real-world applications. By presenting NeuralFMUs, the fusion of an FMU, a numerical ODE solver and an ANN, we are paving the way for the use of a variety of first-principles models from different modeling tools as parts of hybrid models. This contribution handles the hybrid modeling of a complex, real-world example: starting with a simplified 1D-fluid model of the human cardiovascular system (arterial side), the aim is to learn neglected physical effects like arterial elasticity from data. We will show that the hybrid modeling process is more comfortable, needs less system knowledge and is therefore less error-prone compared to modeling solely based on first principles. Further, the resulting hybrid model has improved computational performance compared to a pure first-principles white-box model, while still fulfilling the requirements regarding accuracy of the considered hemodynamic quantities. The use of the presented techniques is explained in a general manner, and the considered use case can serve as an example for other modeling and simulation applications in and beyond the medical domain.
【5】 Automated Machine Learning, Bounded Rationality, and Rational Metareasoning 标题:自动机器学习、有限理性与理性元推理 链接:https://arxiv.org/abs/2109.04744
作者:Eyke Hüllermeier,Felix Mohr,Alexander Tornede,Marcel Wever 机构: LMU Munich, Munich, Germany, Universidad de La Sabana, Chía, Cundinamarca, Colombia, Paderborn University, Paderborn, Germany 备注:Accepted at ECMLPKDD WORKSHOP ON AUTOMATING DATA SCIENCE (ADS2021) - this https URL 摘要:有限理性的概念源于这样一种认识,即拥有有限认知或计算资源的主体无法实现完全理性的行为。关于有限理性的研究,主要由赫伯特·西蒙(Herbert Simon)发起,在经济学和社会科学领域有着悠久的传统,但在现代人工智能和智能体设计中也发挥着重要作用。在有限资源下采取行动需要代理思考如何以最佳方式使用这些资源,从而在元级别上进行推理和决策。在本文中,我们将从有限理性的角度来看待自动机器学习(AutoML)和相关问题,本质上将AutoML工具视为一个代理,它必须在给定的数据集上训练模型,并将寻找一种好的方法(合适的“ML管道”)视为元层面上的考虑。 摘要:The notion of bounded rationality originated from the insight that perfectly rational behavior cannot be realized by agents with limited cognitive or computational resources. Research on bounded rationality, mainly initiated by Herbert Simon, has a longstanding tradition in economics and the social sciences, but also plays a major role in modern AI and intelligent agent design. Taking actions under bounded resources requires an agent to reflect on how to use these resources in an optimal way - hence, to reason and make decisions on a meta-level. In this paper, we will look at automated machine learning (AutoML) and related problems from the perspective of bounded rationality, essentially viewing an AutoML tool as an agent that has to train a model on a given set of data, and the search for a good way of doing so (a suitable "ML pipeline") as deliberation on a meta-level.
【6】 6MapNet: Representing soccer players from tracking data by a triplet network 标题:6MapNet:用三元组网络从跟踪数据表示足球运动员 链接:https://arxiv.org/abs/2109.04720
作者:Hyunsung Kim,Jihun Kim,Dongwook Chung,Jonghyun Lee,Jinsung Yoon,Sang-Ki Ko 机构: Fitogether Inc., Seoul, South Korea, Seoul National University, Seoul, South Korea, Kangwon National University, Chuncheon, South Korea 备注:12 pages, 4 figures, In 8th Workshop on Machine Learning and Data Mining for Sports Analytics (MLSA21) 摘要:虽然个别足球运动员的身价已经达到了天文数字,但主观判断仍然在球员分析中发挥着重要作用。最近,有人尝试使用基于视频的事件流数据定量把握球员的风格。然而,由于标注成本高和事件流数据稀疏,这些方法在可扩展性方面存在一些限制。在本文中,我们构建了一个名为6MapNet的三元组网络,该网络可以使用比赛中的GPS数据有效地捕获球员的移动风格。在没有任何足球特定动作标注的情况下,我们使用球员的位置和速度生成两种类型的热图。然后,我们的子网络将这些热图对映射为特征向量,这些特征向量的相似性对应于比赛风格的实际相似性。实验结果表明,我们的方法只需少量比赛就能准确地识别球员。 摘要:Although the values of individual soccer players have become astronomical, subjective judgments still play a big part in the player analysis. Recently, there have been new attempts to quantitatively grasp players' styles using video-based event stream data. However, they have some limitations in scalability due to high annotation costs and sparsity of event stream data. In this paper, we build a triplet network named 6MapNet that can effectively capture the movement styles of players using in-game GPS data. Without any annotation of soccer-specific actions, we use players' locations and velocities to generate two types of heatmaps. Our subnetworks then map these heatmap pairs into feature vectors whose similarity corresponds to the actual similarity of playing styles. The experimental results show that players can be accurately identified with only a small number of matches by our method.
【7】 EfficientCLIP: Efficient Cross-Modal Pre-training by Ensemble Confident Learning and Language Modeling 标题:EfficientCLIP:基于集成置信学习和语言建模的高效跨模态预训练 链接:https://arxiv.org/abs/2109.04699
作者:Jue Wang,Haofan Wang,Jincan Deng,Weijia Wu,Debing Zhang 机构:†Zhongnan University of Economics and Law, ¶Zhejiang University, ‡Kuaishou Technology 摘要:虽然大规模预训练在弥合视觉和语言之间的差距方面取得了巨大成就,但它仍然面临一些挑战。首先,预训练的成本很高。其次,没有有效的方法来处理会降低模型性能的数据噪声。第三,以前的方法只利用有限的图像-文本配对数据,而忽略了更丰富的单模态数据,这可能导致对单模态下游任务的泛化能力较差。在这项工作中,我们提出了一种EfficientCLIP方法,通过集成置信学习获得噪声更少的数据子集。额外的丰富非配对单模态文本数据被用于增强文本分支的泛化能力。与CLIP和WenLan相比,我们仅使用1/10的训练资源就在中文跨模态检索任务上实现了最先进的性能,同时在单模态任务(包括文本检索和文本分类)上表现出出色的泛化能力。 摘要:While large scale pre-training has achieved great achievements in bridging the gap between vision and language, it still faces several challenges. First, the cost for pre-training is expensive. Second, there is no efficient way to handle the data noise which degrades model performance. Third, previous methods only leverage limited image-text paired data, while ignoring richer single-modal data, which may result in poor generalization to single-modal downstream tasks. In this work, we propose an EfficientCLIP method via Ensemble Confident Learning to obtain a less noisy data subset. Extra rich non-paired single-modal text data is used for boosting the generalization of text branch. We achieve the state-of-the-art performance on Chinese cross-modal retrieval tasks with only 1/10 training resources compared to CLIP and WenLan, while showing excellent generalization to single-modal tasks, including text retrieval and text classification.
【8】 Dynamic Collective Intelligence Learning: Finding Efficient Sparse Model via Refined Gradients for Pruned Weights 标题:动态集体智能学习:通过精化剪枝权重梯度找到有效的稀疏模型 链接:https://arxiv.org/abs/2109.04660
作者:Jangho Kim,Jayeon Yoo,Yeji Song,KiYoon Yoo,Nojun Kwak 摘要:随着深度神经网络(DNN)的发展,DNN参数的数量急剧增加。这使得DNN模型很难部署在资源有限的嵌入式系统上。为了缓解这一问题,出现了动态剪枝方法,该方法通过使用直通估计器(STE)来近似剪枝权重的梯度,试图在训练期间发现不同的稀疏模式。STE可以帮助修剪后的权重在发现动态稀疏模式的过程中恢复。然而,由于STE近似的梯度信号不可靠,使用这些粗梯度会导致训练不稳定和性能下降。在这项工作中,为了解决这个问题,我们引入了改进的梯度,通过从两个权重集(修剪和未修剪)形成双转发路径来更新修剪后的权重。我们提出了一种新的动态集体智能学习(DCIL),它利用了两个权重集的集体智能之间的学习协同作用。我们通过在CIFAR和ImageNet数据集上显示训练稳定性和模型性能的增强来验证改进梯度的有用性。DCIL优于以前提出的各种剪枝方案,包括其他动态剪枝方法,在训练期间具有更强的稳定性。 摘要:With the growth of deep neural networks (DNN), the number of DNN parameters has drastically increased. This makes DNN models hard to be deployed on resource-limited embedded systems. To alleviate this problem, dynamic pruning methods have emerged, which try to find diverse sparsity patterns during training by utilizing Straight-Through-Estimator (STE) to approximate gradients of pruned weights. STE can help the pruned weights revive in the process of finding dynamic sparsity patterns. However, using these coarse gradients causes training instability and performance degradation owing to the unreliable gradient signal of the STE approximation. In this work, to tackle this issue, we introduce refined gradients to update the pruned weights by forming dual forwarding paths from two sets (pruned and unpruned) of weights. We propose a novel Dynamic Collective Intelligence Learning (DCIL) which makes use of the learning synergy between the collective intelligence of both weight sets. We verify the usefulness of the refined gradients by showing enhancements in the training stability and the model performance on the CIFAR and ImageNet datasets. DCIL outperforms various previously proposed pruning schemes including other dynamic pruning methods with enhanced stability during training.
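下面用 numpy 粗略演示直通估计器(STE)的核心思路(示意性草图,并非论文的 DCIL 实现):前向用按幅值剪枝后的权重,反向则把梯度原样传给全部稠密权重,从而让被剪枝的权重有机会"复活"。

```python
import numpy as np

def prune_mask(w, sparsity=0.5):
    """Keep the largest-magnitude weights, zero out a `sparsity` fraction."""
    k = int(len(w) * sparsity)
    thresh = np.sort(np.abs(w))[k]
    return (np.abs(w) >= thresh).astype(w.dtype)

def ste_step(w, x, y, lr=0.05):
    """One SGD step on a linear model: forward with pruned weights,
    backward (STE) applies the gradient to the dense weights."""
    mask = prune_mask(w)
    residual = x @ (w * mask) - y
    grad = 2 * x.T @ residual / len(y)  # gradient w.r.t. the masked weights
    return w - lr * grad                # STE: mask ignored in the update

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 4))
w_true = np.array([1.0, -2.0, 0.0, 0.5])
y = x @ w_true
w = 0.1 * rng.normal(size=4)
for _ in range(300):
    w = ste_step(w, x, y)  # the mask may change every step (dynamic sparsity)
```

论文批评的正是这种"粗梯度":DCIL 改为对剪枝与未剪枝两组权重构造双前向路径,得到更精细的梯度。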
【9】 Learning to Teach with Student Feedback 标题:学会利用学生反馈进行教学 链接:https://arxiv.org/abs/2109.04641
作者:Yitao Liu,Tianxiang Sun,Xipeng Qiu,Xuanjing Huang 机构:School of Computer Science, Fudan University, Shanghai Key Laboratory of Intelligent Information Processing, Fudan University 摘要:知识蒸馏(KD)由于其在压缩大规模预训练模型方面的有效性而备受关注。在典型的KD方法中,训练小型学生模型以匹配大型教师模型生成的软目标。然而,师生之间的互动是单向的。教师通常在训练完成后便固定不变,导致被蒸馏的软目标是静态的。这种单向互动导致教师无法感知学生的特点及其训练进度。为了解决这个问题,我们提出了交互式知识蒸馏(IKD),它也允许教师从学生的反馈中学习教学。特别地,IKD训练教师模型,以便在每个训练步骤为特定学生生成特定的软目标。教师和学生的联合优化通过两个迭代步骤实现:一个是课程步骤,以教师的软目标优化学生;一个是考试步骤,以学生的反馈优化教师。IKD是一个与大多数现有知识蒸馏方法正交的通用框架。实验结果表明,在各种自然语言处理任务中,IKD方法的性能优于传统KD方法。 摘要:Knowledge distillation (KD) has gained much attention due to its effectiveness in compressing large-scale pre-trained models. In typical KD methods, the small student model is trained to match the soft targets generated by the big teacher model. However, the interaction between student and teacher is one-way. The teacher is usually fixed once trained, resulting in static soft targets to be distilled. This one-way interaction leads to the teacher's inability to perceive the characteristics of the student and its training progress. To address this issue, we propose Interactive Knowledge Distillation (IKD), which also allows the teacher to learn to teach from the feedback of the student. In particular, IKD trains the teacher model to generate specific soft target at each training step for a certain student. Joint optimization for both teacher and student is achieved by two iterative steps: a course step to optimize student with the soft target of teacher, and an exam step to optimize teacher with the feedback of student. IKD is a general framework that is orthogonal to most existing knowledge distillation methods. Experimental results show that IKD outperforms traditional KD methods on various NLP tasks.
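作为背景,经典 KD 的软目标匹配可以写成温度软化分布上的 KL 散度(纯背景示意的假设性最小实现;IKD 在此之上加入师生交替优化):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax, computed stably via max-shift."""
    m = max(l / T for l in logits)
    exps = [math.exp(l / T - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kd_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

IKD 的"课程步骤"用这个损失优化学生;"考试步骤"则反过来用学生的反馈更新教师,使软目标不再静态。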
【10】 Efficiently Identifying Task Groupings for Multi-Task Learning 标题:多任务学习中任务分组的有效识别 链接:https://arxiv.org/abs/2109.04617
作者:Christopher Fifty,Ehsan Amid,Zhe Zhao,Tianhe Yu,Rohan Anil,Chelsea Finn 机构:Google Brain, Google Research, Stanford University 摘要:多任务学习可以利用一个任务学习到的信息来帮助其他任务的训练。尽管有这种能力,但在一个模型中天真地将所有任务训练在一起通常会降低性能,并且通过任务分组组合进行彻底搜索的成本可能会高得令人望而却步。因此,在没有明确解决方案的情况下,有效地确定将从联合训练中受益的任务仍然是一个具有挑战性的设计问题。在本文中,我们提出了一种在多任务学习模型中选择哪些任务应该一起训练的方法。我们的方法通过将所有任务共同训练并量化一个任务的梯度对另一个任务损失的影响来确定单个训练运行中的任务分组。在大规模Taskonomy计算机视觉数据集上,我们发现,与简单地同时训练所有任务相比,这种方法可以减少10.0%的测试损失,同时操作速度比最先进的任务分组方法快11.6倍。 摘要:Multi-task learning can leverage information learned by one task to benefit the training of other tasks. Despite this capacity, naively training all tasks together in one model often degrades performance, and exhaustively searching through combinations of task groupings can be prohibitively expensive. As a result, efficiently identifying the tasks that would benefit from co-training remains a challenging design question without a clear solution. In this paper, we suggest an approach to select which tasks should train together in multi-task learning models. Our method determines task groupings in a single training run by co-training all tasks together and quantifying the effect to which one task's gradient would affect another task's loss. On the large-scale Taskonomy computer vision dataset, we find this method can decrease test loss by 10.0% compared to simply training all tasks together while operating 11.6 times faster than a state-of-the-art task grouping method.
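论文核心的"任务间影响"思路可以这样示意(假设性的 numpy 草图,并非原文实现):对任务 i 做一步梯度更新,度量任务 j 损失的相对下降。

```python
import numpy as np

def task_loss(w, X, y):
    return np.mean((X @ w - y) ** 2)

def task_grad(w, X, y):
    return 2 * X.T @ (X @ w - y) / len(y)

def affinity_matrix(w, tasks, lr=0.01):
    """A[i, j]: relative drop of task j's loss after one gradient
    step on task i (positive = training on i helps j)."""
    n = len(tasks)
    A = np.zeros((n, n))
    for i, (Xi, yi) in enumerate(tasks):
        w_new = w - lr * task_grad(w, Xi, yi)
        for j, (Xj, yj) in enumerate(tasks):
            A[i, j] = 1.0 - task_loss(w_new, Xj, yj) / task_loss(w, Xj, yj)
    return A

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))
tasks = [(X, X @ np.array([1.0, 1.0, 0.0])),    # task 0
         (X, X @ np.array([1.0, 1.0, 0.0])),    # task 1: identical target
         (X, X @ np.array([-1.0, -1.0, 0.0]))]  # task 2: opposing target
A = affinity_matrix(np.zeros(3), tasks)
```

得到亲和度矩阵后,即可按"互相有利"的程度把任务分组共同训练。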
【11】 Identifying Morality Frames in Political Tweets using Relational Learning 标题:利用关系学习识别政治推文中的道德框架 链接:https://arxiv.org/abs/2109.04535
作者:Shamik Roy,Maria Leonor Pacheco,Dan Goldwasser 机构:Department of Computer Science, Purdue University, West Lafayette, IN, USA 备注:Accepted to EMNLP 2021 摘要:从文本中提取道德情感是理解公众舆论、社会运动和政策决策的重要组成部分。道德基础理论确定了五个道德基础,每个道德基础都有正负极性。然而,道德情感往往是由其目标驱动的,目标可以对应于个人或集体实体。在本文中,我们介绍了道德框架,这是一个组织针对不同实体的道德态度的表示框架,并提出了一个新颖、高质量的美国政客所写推文注释数据集。然后,我们提出了一个关系学习模型来联合预测对实体和道德基础的道德态度。我们做了定性和定量的评估,表明不同的政治意识形态对实体的道德情感差异很大。 摘要:Extracting moral sentiment from text is a vital component in understanding public opinion, social movements, and policy decisions. The Moral Foundation Theory identifies five moral foundations, each associated with a positive and negative polarity. However, moral sentiment is often motivated by its targets, which can correspond to individuals or collective entities. In this paper, we introduce morality frames, a representation framework for organizing moral attitudes directed at different entities, and come up with a novel and high-quality annotated dataset of tweets written by US politicians. Then, we propose a relational learning model to predict moral attitudes towards entities and moral foundations jointly. We do qualitative and quantitative evaluations, showing that moral sentiment towards entities differs highly across political ideologies.
【12】 Differential Privacy in Personalized Pricing with Nonparametric Demand Models 标题:非参数需求模型下个性化定价中的差分隐私 链接:https://arxiv.org/abs/2109.04615
作者:Xi Chen,Sentao Miao,Yining Wang 机构:Leonard N. Stern School of Business, New York University, New York, NY , USA, McGill University, Montreal, QC H,A ,G, Canada, Warrington College of Business, University of Florida, Gainesville, FL , USA 摘要:近几十年来,信息技术的进步和丰富的个人数据为算法个性化定价的应用提供了便利。然而,这导致人们越来越担心对抗性攻击可能会侵犯隐私。为了解决隐私问题,本文研究了数据隐私保护下具有\textit{未知}非参数需求模型的动态个性化定价问题。介绍了两个在实践中得到广泛应用的数据隐私概念:\textit{中心差分隐私(central differential privacy, CDP)}和\textit{本地差分隐私(local differential privacy, LDP)},后者在许多情况下被证明比CDP更强。我们开发了两种算法,在分别满足CDP和LDP保证的同时,进行定价决策并在线学习未知需求。特别地,对于具有CDP保证的算法,证明了遗憾最多为$\tilde O(T^{(d+2)/(d+4)} + \varepsilon^{-1}T^{d/(d+4)})$。这里,参数$T$表示时间范围的长度,$d$表示个性化信息向量的维度,关键参数$\varepsilon>0$度量隐私的强度(较小的$\varepsilon$表示更强的隐私保护)。另一方面,对于具有LDP保证的算法,其遗憾被证明最多为$\tilde O(\varepsilon^{-2/(d+2)}T^{(d+1)/(d+2)})$,这是接近最优的,因为我们证明了任何具有LDP保证的算法都有$\Omega(\varepsilon^{-2/(d+2)}T^{(d+1)/(d+2)})$的下界。 摘要:In the recent decades, the advance of information technology and abundant personal data facilitate the application of algorithmic personalized pricing. However, this leads to the growing concern of potential violation of privacy due to adversarial attack. To address the privacy issue, this paper studies a dynamic personalized pricing problem with \textit{unknown} nonparametric demand models under data privacy protection. Two concepts of data privacy, which have been widely applied in practices, are introduced: \textit{central differential privacy (CDP)} and \textit{local differential privacy (LDP)}, the latter of which is proved to be stronger than CDP in many cases. We develop two algorithms which make pricing decisions and learn the unknown demand on the fly, while satisfying the CDP and LDP guarantees respectively. In particular, for the algorithm with CDP guarantee, the regret is proved to be at most $\tilde O(T^{(d+2)/(d+4)} + \varepsilon^{-1}T^{d/(d+4)})$. Here, the parameter $T$ denotes the length of the time horizon, $d$ is the dimension of the personalized information vector, and the key parameter $\varepsilon>0$ measures the strength of privacy (smaller $\varepsilon$ indicates a stronger privacy protection). On the other hand, for the algorithm with LDP guarantee, its regret is proved to be at most $\tilde O(\varepsilon^{-2/(d+2)}T^{(d+1)/(d+2)})$, which is near-optimal as we prove a lower bound of $\Omega(\varepsilon^{-2/(d+2)}T^{(d+1)/(d+2)})$ for any algorithm with LDP guarantee.
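差分隐私最基本的构件是拉普拉斯机制(背景示意,并非论文中的定价算法本身):以 scale = 敏感度/ε 的拉普拉斯噪声发布数值,ε 越小噪声越大、隐私越强。

```python
import math
import random

def laplace_release(value, sensitivity, epsilon, rng):
    """Release `value` with epsilon-DP via Laplace noise of scale
    sensitivity / epsilon, sampled by the inverse-CDF method."""
    scale = sensitivity / epsilon
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    noise = -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)
    return value + noise

rng = random.Random(0)
releases = [laplace_release(10.0, 1.0, 1.0, rng) for _ in range(20000)]
avg = sum(releases) / len(releases)  # unbiased: concentrates around 10.0
```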
其他(14篇)
【1】 Assessing the Quality of the Datasets by Identifying Mislabeled Samples 标题:通过识别标记错误的样本来评估数据集的质量 链接:https://arxiv.org/abs/2109.05000
作者:Vaibhav Pulastya,Gaurav Nuti,Yash Kumar Atri,Tanmoy Chakraborty 机构:IIIT-Delhi, New Delhi, India 备注:Short paper accepted at ASONAM 2021 摘要:由于过分强调数据的数量,数据质量往往被忽视。然而,并非所有的训练数据点对学习的贡献都是相同的。特别是,如果标记错误,可能会严重损害模型的性能和从分布中概括的能力,因为模型可能最终学习数据集中存在的虚假工件。这一问题由于高度参数化和复杂的深层神经网络的流行而变得更加复杂,这些网络凭借其高容量,最终能够记住数据集中存在的噪声。本文提出了一种新的统计方法——噪声分数,作为对每个数据点质量的度量,以根据潜在空间表示的变化来识别此类错误标记的样本。在我们的工作中,我们使用由数据质量监督变分自动编码器(AQUAVS)推理网络导出的表示。我们的方法利用了一个事实,即属于同一类的样本将具有类似的潜在表示。因此,通过识别潜在空间中的异常值,我们可以发现错误标记的样本。我们通过在不同噪声环境下破坏MNIST、FashionMNIST和CIFAR10/100数据集以识别错误标记样本的实验来验证我们提出的统计数据。我们进一步展示了每个数据集的分类任务在准确性方面的显著改进。 摘要:Due to the over-emphasize of the quantity of data, the data quality has often been overlooked. However, not all training data points contribute equally to learning. In particular, if mislabeled, it might actively damage the performance of the model and the ability to generalize out of distribution, as the model might end up learning spurious artifacts present in the dataset. This problem gets compounded by the prevalence of heavily parameterized and complex deep neural networks, which can, with their high capacity, end up memorizing the noise present in the dataset. This paper proposes a novel statistic -- noise score, as a measure for the quality of each data point to identify such mislabeled samples based on the variations in the latent space representation. In our work, we use the representations derived by the inference network of data quality supervised variational autoencoder (AQUAVS). Our method leverages the fact that samples belonging to the same class will have similar latent representations. Therefore, by identifying the outliers in the latent space, we can find the mislabeled samples. We validate our proposed statistic through experimentation by corrupting MNIST, FashionMNIST, and CIFAR10/100 datasets in different noise settings for the task of identifying mislabelled samples. We further show significant improvements in accuracy for the classification task for each dataset.
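"同类样本的潜在表示相近"这一假设可以用类内质心距离做一个极简的噪声分数示意(假设性草图;AQUAVS 的实际推理网络远比这复杂):

```python
import numpy as np

def noise_scores(Z, labels):
    """Score each sample by its distance to the centroid of its
    labeled class in latent space; a large score flags a suspect label."""
    scores = np.zeros(len(Z))
    for c in np.unique(labels):
        idx = labels == c
        centroid = Z[idx].mean(axis=0)
        scores[idx] = np.linalg.norm(Z[idx] - centroid, axis=1)
    return scores

# Two well-separated latent clusters; the last point sits in
# cluster 0 but carries label 1, i.e. it is mislabeled.
Z = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1], [0.0, 0.1]])
labels = np.array([0, 0, 1, 1, 1])
scores = noise_scores(Z, labels)
```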
【2】 A Neural Tangent Kernel Perspective of Infinite Tree Ensembles 标题:无限树集成的神经切线核视角 链接:https://arxiv.org/abs/2109.04983
作者:Ryuichi Kanoh,Mahito Sugiyama 机构:National Institute of Informatics, The Graduate University for Advanced Studies, SOKENDAI 摘要:在实际应用中,集成树模型是与神经网络一起最流行的模型之一。软树是决策树的一种变体。不使用贪婪方法搜索分割规则,而是使用梯度方法训练软树,其中整个分割操作以可微形式表示。尽管近年来这种软树的集合被越来越多地使用,但对于理解它们的行为几乎没有做过什么理论工作。在本文中,通过考虑无限软树的集合,我们引入并研究了树神经切线核(TNTK),它为无限软树集合的行为提供了新的见解。利用TNTK,我们成功地从理论上找到了一些非平凡的性质,如不经意树结构的影响和树加深引起的TNTK简并。此外,我们还使用TNTK实证检验了无限软树集合的性能。 摘要:In practical situations, the ensemble tree model is one of the most popular models along with neural networks. A soft tree is one of the variants of a decision tree. Instead of using a greedy method for searching splitting rules, the soft tree is trained using a gradient method in which the whole splitting operation is formulated in a differentiable form. Although ensembles of such soft trees have been increasingly used in recent years, little theoretical work has been done for understanding their behavior. In this paper, by considering an ensemble of infinite soft trees, we introduce and study the Tree Neural Tangent Kernel (TNTK), which provides new insights into the behavior of the infinite ensemble of soft trees. Using the TNTK, we succeed in theoretically finding several non-trivial properties, such as the effect of the oblivious tree structure and the degeneracy of the TNTK induced by the deepening of the trees. Moreover, we empirically examine the performance of an ensemble of infinite soft trees using the TNTK.
【3】 Simulating the Effects of Eco-Friendly Transportation Selections for Air Pollution Reduction 标题:生态友好型交通选择对减少空气污染影响的模拟研究 链接:https://arxiv.org/abs/2109.04831
作者:Keiichi Ochiai,Tsukasa Demizu,Shin Ishiguro,Shohei Maruyama,Akihiro Kawana 机构:NTT DOCOMO, INC. 备注:KDD Cup 2019 Regular ML Track Task 2, 1st Prize this https URL 摘要:减少二氧化碳和PM2.5排放等空气污染是世界许多国家面临的最重要问题之一。选择一种环境友好的交通方式是个人减少日常生活中空气污染的有效途径。在这项研究中,我们提出了一种方法,通过使用地图搜索日志模拟生态友好型交通方式选择对减少空气污染的有效性。我们将交通方式选择描述为一个组合优化问题,以二氧化碳排放总量为例,以空气污染和平均出行时间为约束条件。优化结果表明,二氧化碳排放总量可以减少9.23%,而平均行程时间实际上可以减少9.96%。我们的研究方案在2019年KDD杯的常规机器学习竞赛轨道任务2中获得一等奖。 摘要:Reducing air pollution, such as CO2 and PM2.5 emissions, is one of the most important issues for many countries worldwide. Selecting an environmentally friendly transport mode can be an effective approach of individuals to reduce air pollution in daily life. In this study, we propose a method to simulate the effectiveness of an eco-friendly transport mode selection for reducing air pollution by using map search logs. We formulate the transport mode selection as a combinatorial optimization problem with the constraints regarding the total amount of CO2 emissions as an example of air pollution and the average travel time. The optimization results show that the total amount of CO2 emissions can be reduced by 9.23%, whereas the average travel time can in fact be reduced by 9.96%. Our research proposal won first prize in Regular Machine Learning Competition Track Task 2 at KDD Cup 2019.
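文中的组合优化问题可以在小规模上用穷举示意(假设性的玩具实例,数字并非论文数据):在平均出行时间约束下最小化总 CO2 排放。

```python
from itertools import product

def best_assignment(trips, avg_time_limit):
    """Brute-force the per-trip mode choice minimizing total CO2,
    subject to a cap on the average travel time.
    Each trip is a list of (co2, minutes) options."""
    best = None
    for choice in product(*(range(len(t)) for t in trips)):
        co2 = sum(trips[i][c][0] for i, c in enumerate(choice))
        avg_t = sum(trips[i][c][1] for i, c in enumerate(choice)) / len(trips)
        if avg_t <= avg_time_limit and (best is None or co2 < best[0]):
            best = (co2, choice)
    return best

# options per trip: (car, bus, bike) as hypothetical (gCO2, minutes) pairs
trips = [[(120, 20), (40, 35), (0, 50)],
         [(120, 20), (40, 35), (0, 50)]]
co2, choice = best_assignment(trips, avg_time_limit=40)
```

论文处理的是由地图搜索日志得到的大规模实例,穷举不可行,需要相应的组合优化求解器;这里只展示问题形式。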
【4】 A Fast PC Algorithm with Reversed-order Pruning and A Parallelization Strategy 标题:一种反序剪枝的快速PC算法及其并行化策略 链接:https://arxiv.org/abs/2109.04626
作者:Kai Zhang,Chao Tian,Kun Zhang,Todd Johnson,Xiaoqian Jiang 机构: Texas A&M University 备注:37 pages 摘要:PC算法是观测数据因果结构发现的最新算法。在最坏的情况下,由于条件独立性测试是以穷举搜索方式执行的,因此计算成本可能会很高。这使得当任务包含数百或数千个节点时,特别是当真正的底层因果图是稠密的时,算法在计算上很难处理。我们提出了一个关键的观察结果,即呈现两个独立节点的条件集是非唯一的,并且包含某些冗余节点不会牺牲结果的准确性。基于这一发现,我们工作的创新有两个方面。首先,我们创新了一种逆序链接剪枝PC算法,该算法显著提高了算法的效率。其次,我们提出了一种利用张量计算进行统计独立性测试的并行计算策略,从而进一步提高了计算速度。我们还证明了在温和的图和数据维数假设下,该算法不会导致统计功效损失。实验结果表明,在稠密的95节点图上,与PC算法相比,该算法的单线程版本可以实现6倍的加速,而并行版本可以实现825倍的加速。我们还证明了该算法在与传统PC算法相同的条件下是一致的。 摘要:The PC algorithm is the state-of-the-art algorithm for causal structure discovery on observational data. It can be computationally expensive in the worst case due to the conditional independence tests are performed in an exhaustive-searching manner. This makes the algorithm computationally intractable when the task contains several hundred or thousand nodes, particularly when the true underlying causal graph is dense. We propose a critical observation that the conditional set rendering two nodes independent is non-unique, and including certain redundant nodes do not sacrifice result accuracy. Based on this finding, the innovations of our work are two-folds. First, we innovate a reversed-order linkage pruning PC algorithm which significantly increases the algorithm's efficiency. Second, we propose a parallel computing strategy for statistical independence tests by leveraging tensor computation, which brings further speedup. We also prove the proposed algorithm does not induce statistical power loss under mild graph and data dimensionality assumptions. Experimental results show that the single-threaded version of the proposed algorithm can achieve a 6-fold speedup compared to the PC algorithm on a dense 95-node graph, and the parallel version can make a 825-fold speed-up. We also provide proof that the proposed algorithm is consistent under the same set of conditions with conventional PC algorithm.
【5】 ReLU Regression with Massart Noise 标题:具有Massart噪声的RELU回归 链接:https://arxiv.org/abs/2109.04623
作者:Ilias Diakonikolas,Jongho Park,Christos Tzamos 机构:University of Wisconsin-Madison 摘要:我们研究ReLU回归的基本问题,其目标是将修正线性单元(ReLU)拟合到数据中。这种有监督的学习任务在可实现情形下可以有效求解,但已知在对抗性标签噪声下计算困难。在这项工作中,我们重点研究Massart噪声模型中的ReLU回归,这是一种自然的、经过充分研究的半随机噪声模型。在该模型中,每个点的标签都是根据类中的一个函数生成的,但允许对手以某种概率任意更改该值,该概率{\em 至多}为$\eta < 1/2$。我们开发了一种有效的算法,在对基础分布的温和反集中假设下,在该模型中实现精确的参数恢复。这些假设是精确恢复在信息论上可行的必要条件。我们证明,在合成数据和真实数据上,我们的算法明显优于$\ell_1$和$\ell_2$回归的朴素应用。 摘要:We study the fundamental problem of ReLU regression, where the goal is to fit Rectified Linear Units (ReLUs) to data. This supervised learning task is efficiently solvable in the realizable setting, but is known to be computationally hard with adversarial label noise. In this work, we focus on ReLU regression in the Massart noise model, a natural and well-studied semi-random noise model. In this model, the label of every point is generated according to a function in the class, but an adversary is allowed to change this value arbitrarily with some probability, which is {\em at most} $\eta < 1/2$. We develop an efficient algorithm that achieves exact parameter recovery in this model under mild anti-concentration assumptions on the underlying distribution. Such assumptions are necessary for exact recovery to be information-theoretically possible. We demonstrate that our algorithm significantly outperforms naive applications of $\ell_1$ and $\ell_2$ regression on both synthetic and real data.
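在可实现(无噪声)情形下,ReLU 回归可以直接用梯度下降拟合(示意性草图;论文的算法额外处理 Massart 标签噪声并给出精确恢复保证,这里不涉及):

```python
import numpy as np

def fit_relu(X, y, lr=0.05, steps=2000, seed=0):
    """Gradient descent on the squared loss for y ≈ relu(X @ w),
    realizable / noise-free setting only."""
    rng = np.random.default_rng(seed)
    w = 0.1 * rng.normal(size=X.shape[1])
    for _ in range(steps):
        z = X @ w
        # gradient of mean (relu(z) - y)^2; relu' = 1{z > 0}
        grad = 2 * X.T @ ((np.maximum(z, 0.0) - y) * (z > 0)) / len(y)
        w -= lr * grad
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -0.5, 2.0])
y = np.maximum(X @ w_true, 0.0)
w_hat = fit_relu(X, y)
loss = np.mean((np.maximum(X @ w_hat, 0.0) - y) ** 2)
```

Massart 噪声会使这个朴素的平方损失拟合产生偏差,这正是论文方法所要克服的。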
【6】 Query-driven Segment Selection for Ranking Long Documents 标题:用于长文档排序的查询驱动的片段选择 链接:https://arxiv.org/abs/2109.04611
作者:Youngwoo Kim,Razieh Rahimi,Hamed Bonab,James Allan 机构:University of Massachusetts Amherst, Amherst, MA, USA 摘要:基于Transformer的排序模型已显示出最先进的性能。然而,其自注意力操作大多无法处理长序列。训练这些排序模型的常用方法之一是启发式地选择每个文档的某些片段(如第一个片段)作为训练数据。但是,这些片段可能不包含文档中与查询相关的部分。为了解决这个问题,我们提出从长文档中进行查询驱动的片段选择来构建训练数据。片段选择器为相关样本提供更精确的标签,并提供更难预测的非相关样本。实验结果表明,使用所提出的片段选择器训练的基础BERT排序模型显著优于用启发式选择片段训练的模型,并且与能够处理更长输入序列、具有局部自注意力的最先进模型性能相当。我们的发现为设计高效的基于Transformer的排序模型开辟了新的方向。 摘要:Transformer-based rankers have shown state-of-the-art performance. However, their self-attention operation is mostly unable to process long sequences. One of the common approaches to train these rankers is to heuristically select some segments of each document, such as the first segment, as training data. However, these segments may not contain the query-related parts of documents. To address this problem, we propose query-driven segment selection from long documents to build training data. The segment selector provides relevant samples with more accurate labels and non-relevant samples which are harder to be predicted. The experimental results show that the basic BERT-based ranker trained with the proposed segment selector significantly outperforms that trained by the heuristically selected segments, and performs equally to the state-of-the-art model with localized self-attention that can process longer input sequences. Our findings open up new direction to design efficient transformer-based rankers.
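查询驱动的片段选择可以用最简单的查询词重叠打分来示意(假设性的词法草图;论文的选择器基于模型打分,远比这精细):

```python
def select_segment(query, document, seg_len=10, stride=5):
    """Slide a window over the document tokens and keep the segment
    with the largest query-term overlap."""
    q_terms = set(query.lower().split())
    tokens = document.lower().split()
    best, best_score = tokens[:seg_len], -1
    for start in range(0, max(1, len(tokens) - seg_len + 1), stride):
        seg = tokens[start:start + seg_len]
        score = sum(t in q_terms for t in seg)
        if score > best_score:
            best, best_score = seg, score
    return " ".join(best)

# the query-relevant part sits at the end, so "first segment" heuristics miss it
doc = " ".join(["filler"] * 30) + " neural ranking models for long documents"
seg = select_segment("neural ranking", doc)
```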
【7】 C-MinHash: Practically Reducing Two Permutations to Just One 标题:C-MinHash:实际上将两个排列减少到一个 链接:https://arxiv.org/abs/2109.04595
作者:Xiaoyun Li,Ping Li 机构:Cognitive Computing Lab, Baidu Research, NE ,th St. Bellevue, WA , USA 摘要:传统的minwise哈希(MinHash)需要应用$K$独立排列来估计海量二进制(0/1)数据中的Jaccard相似性,其中$K$可以(例如)1024甚至更大,具体取决于应用程序。最近关于C-MinHash的研究(Li和Li,2021)表明,经过严格的证明,只需要两个置换。初始排列用于打破数据中可能存在的任何结构,第二个排列通过循环移位方式重复使用$K$次以产生$K$散列。(Li和Li,2021)已经证明,也许令人惊讶的是,即使$K$散列是相关的,估计方差也严格小于传统MinHash的方差。在(Li和Li,2021)中已经证明,C-MinHash中的初始排列确实是必要的。为了便于理论分析,他们使用了两种独立的排列。在本文中,我们证明了一个人实际上可以简单地使用一个置换。也就是说,初始预处理步骤和循环散列步骤都使用一个置换来分解数据中的结构,循环散列步骤生成$K$散列。虽然理论分析变得非常复杂,但我们能够明确地写出估计量期望的表达式。新的估计器不再是无偏的,但偏差非常小,基本上不影响估计精度(均方误差)。我们提供了一组广泛的实验来验证我们仅使用一种置换的说法。 摘要:Traditional minwise hashing (MinHash) requires applying $K$ independent permutations to estimate the Jaccard similarity in massive binary (0/1) data, where $K$ can be (e.g.,) 1024 or even larger, depending on applications. The recent work on C-MinHash (Li and Li, 2021) has shown, with rigorous proofs, that only two permutations are needed. An initial permutation is applied to break whatever structures which might exist in the data, and a second permutation is re-used $K$ times to produce $K$ hashes, via a circulant shifting fashion. (Li and Li, 2021) has proved that, perhaps surprisingly, even though the $K$ hashes are correlated, the estimation variance is strictly smaller than the variance of the traditional MinHash. It has been demonstrated in (Li and Li, 2021) that the initial permutation in C-MinHash is indeed necessary. For the ease of theoretical analysis, they have used two independent permutations. In this paper, we show that one can actually simply use one permutation. That is, one single permutation is used for both the initial pre-processing step to break the structures in the data and the circulant hashing step to generate $K$ hashes. Although the theoretical analysis becomes very complicated, we are able to explicitly write down the expression for the expectation of the estimator. 
The new estimator is no longer unbiased but the bias is extremely small and has essentially no impact on the estimation accuracy (mean square errors). An extensive set of experiments are provided to verify our claim for using just one permutation.
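循环移位复用单个排列的思路可以粗略示意如下(高度简化的假设性草图,并非论文中精确的 C-MinHash 估计器及其方差分析):

```python
import random

def c_style_minhash(S, n, perm, K):
    """K hashes from a single permutation `perm` of range(n):
    apply it once to scatter structure, shift circularly by k,
    then apply it again and take the minimum."""
    return [min(perm[(perm[e] + k) % n] for e in S) for k in range(K)]

def jaccard_estimate(h1, h2):
    """Fraction of matching hash positions estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(h1, h2)) / len(h1)

n, K = 1000, 256
perm = list(range(n))
random.Random(0).shuffle(perm)
A = set(range(0, 300))    # |A ∩ B| = 150, |A ∪ B| = 450, J = 1/3
B = set(range(150, 450))
est = jaccard_estimate(c_style_minhash(A, n, perm, K),
                       c_style_minhash(B, n, perm, K))
```

与传统 MinHash 的 $K$ 个独立排列相比,这里只需存储并打乱一个排列;论文证明这种相关的哈希反而有更小的估计方差。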
【8】 A Large-Scale Study of Machine Translation in the Turkic Languages 标题:突厥语机器翻译的大规模研究 链接:https://arxiv.org/abs/2109.04593
作者:Jamshidbek Mirzakhalov,Anoop Babu,Duygu Ataman,Sherzod Kariev,Francis Tyers,Otabek Abduraufov,Mammad Hajili,Sardana Ivanova,Abror Khaytbaev,Antonio Laverghetta Jr.,Behzodbek Moydinboyev,Esra Onal,Shaxnoza Pulatova,Ahsan Wahab,Orhan Firat,Sriram Chellappan 机构:Turkic Interlingua, University of South Florida, NYU, Indiana University, EPFL, University of Helsinki, Namangan State University, Google Research 备注:9 pages, 1 figure, 8 tables. Main proceedings of EMNLP 2021 摘要:神经机器翻译(NMT)的最新进展将机器翻译系统的质量推向了广泛应用于构建竞争系统的地步。然而,仍有大量语言尚未从NMT中获益。在本文中,我们提供了突厥语系中机器翻译实际应用的第一个大规模案例研究,以实现在高资源到极低资源情景下突厥语NMT的收益。除了提供广泛的分析,以确定构建竞争性系统以改善数据稀缺性的瓶颈之外,我们的研究还有几个关键贡献,包括:,i)涵盖22种突厥语的大型平行语料库,包括公共数据集和约200万个平行句子的新数据集;ii)26种语言对的双语基线;iii)三个不同翻译领域的新型高质量测试集;iv)人类评估分数。所有模型、脚本和数据都将向公众发布。 摘要:Recent advances in neural machine translation (NMT) have pushed the quality of machine translation systems to the point where they are becoming widely adopted to build competitive systems. However, there is still a large number of languages that are yet to reap the benefits of NMT. In this paper, we provide the first large-scale case study of the practical application of MT in the Turkic language family in order to realize the gains of NMT for Turkic languages under high-resource to extremely low-resource scenarios. In addition to presenting an extensive analysis that identifies the bottlenecks towards building competitive systems to ameliorate data scarcity, our study has several key contributions, including, i) a large parallel corpus covering 22 Turkic languages consisting of common public datasets in combination with new datasets of approximately 2 million parallel sentences, ii) bilingual baselines for 26 language pairs, iii) novel high-quality test sets in three different translation domains and iv) human evaluation scores. All models, scripts, and data will be released to the public.
【9】 Deciphering Environmental Air Pollution with Large Scale City Data 标题:利用大比例尺城市数据破译环境空气污染 链接:https://arxiv.org/abs/2109.04572
作者:Mayukh Bhattacharyya,Sayan Nag,Udita Ghosh 机构:Stony Brook University, University of Toronto, Zendrive 摘要:在21世纪对可持续环境条件构成威胁的众多危害中,只有少数危害的影响比空气污染更严重。它在确定城市环境中的健康和生活水平方面的重要性只会随着时间的推移而增加。从交通和发电厂的排放、家庭排放、自然原因等各种因素都是造成空气污染水平上升的主要原因或影响因素。然而,缺乏涉及主要因素的大规模数据阻碍了对不同空气污染物可变性的原因和关系的研究。通过这项工作,我们引入了一个大规模的城市智能数据集,用于探索这些代理之间长期的关系。我们对数据集进行分析和探索,得出我们可以通过对数据建模得出的推论。此外,我们还提供了一套基准,用于使用一套不同的模型和方法来估计或预测污染物水平。通过我们的论文,我们试图为这一领域的进一步研究提供一个基础,这将需要我们在不久的将来给予高度重视。 摘要:Out of the numerous hazards posing a threat to sustainable environmental conditions in the 21st century, only a few have a graver impact than air pollution. Its importance in determining the health and living standards in urban settings is only expected to increase with time. Various factors ranging from emissions from traffic and power plants, household emissions, natural causes are known to be primary causal agents or influencers behind rising air pollution levels. However, the lack of large scale data involving the major factors has hindered the research on the causes and relations governing the variability of the different air pollutants. Through this work, we introduce a large scale city-wise dataset for exploring the relationships among these agents over a long period of time. We analyze and explore the dataset to bring out inferences which we can derive by modeling the data. Also, we provide a set of benchmarks for the problem of estimating or forecasting pollutant levels with a set of diverse models and methodologies. Through our paper, we seek to provide a ground base for further research into this domain that will demand critical attention of ours in the near future.
【10】 Is Attention Better Than Matrix Decomposition? 标题:注意力比矩阵分解好吗? 链接:https://arxiv.org/abs/2109.04553
作者:Zhengyang Geng,Meng-Hao Guo,Hongxu Chen,Xia Li,Ke Wei,Zhouchen Lin 机构:Zhejiang Lab; ,Key Lab. of Machine Perception (MoE), School of EECS, Peking University;, Tsinghua University; ,School of Data Science, Fudan University; ,Pazhou Lab 备注:ICLR 2021 摘要:注意机制,特别是自我注意,作为现代深度学习的重要组成部分,在全球相关性发现中起着至关重要的作用。然而,在为全球环境建模时,手工制作的注意力是不可替代的吗?我们有趣的发现是,就编码长距离依赖关系的性能和计算成本而言,自我关注并不比20年前开发的矩阵分解(MD)模型好。我们将全局上下文问题建模为一个低秩恢复问题,并表明其优化算法可以帮助设计全局信息块。然后,本文提出了一系列的汉堡包,其中我们使用求解MDs的优化算法将输入表示分解为子矩阵,并重构低秩嵌入。当谨慎应对通过MDs传播回来的梯度时,具有不同MDs的汉堡包可以很好地对抗流行的全局上下文模块自我注意。在视觉任务中进行了全面的实验,在视觉任务中学习全局上下文至关重要,包括语义分割和图像生成,证明了自我注意及其变体的显著改进。 摘要:As an essential ingredient of modern deep learning, attention mechanism, especially self-attention, plays a vital role in the global correlation discovery. However, is hand-crafted attention irreplaceable when modeling the global context? Our intriguing finding is that self-attention is not better than the matrix decomposition (MD) model developed 20 years ago regarding the performance and computational cost for encoding the long-distance dependencies. We model the global context issue as a low-rank recovery problem and show that its optimization algorithms can help design global information blocks. This paper then proposes a series of Hamburgers, in which we employ the optimization algorithms for solving MDs to factorize the input representations into sub-matrices and reconstruct a low-rank embedding. Hamburgers with different MDs can perform favorably against the popular global context module self-attention when carefully coping with gradients back-propagated through MDs. Comprehensive experiments are conducted in the vision tasks where it is crucial to learn the global context, including semantic segmentation and image generation, demonstrating significant improvements over self-attention and its variants.
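"把全局上下文当作低秩恢复"可以用截断 SVD 做一个最小示意(这里用 SVD 代替论文中 NMF、VQ 等矩阵分解模型,属于假设性简化):

```python
import numpy as np

def low_rank_context(X, r):
    """Rank-r reconstruction of a flattened feature map X (n x d):
    factorize, keep the top-r components, multiply back."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 2)) @ rng.normal(size=(2, 16))  # exactly rank 2
X_hat = low_rank_context(X, 2)  # rank-2 input is reconstructed exactly
```

在"汉堡"模块中,这样的低秩重构取代自注意力充当全局上下文块;关键难点在于如何处理穿过分解迭代的反向梯度。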
【11】 SPECTRA: Sparse Structured Text Rationalization Link: https://arxiv.org/abs/2109.04552
Authors: Nuno Miguel Guerreiro, André F. T. Martins
Affiliations: Instituto de Telecomunicações, Lisbon, Portugal; Instituto Superior Técnico & LUMLIS (Lisbon ELLIS Unit), Lisbon, Portugal; Unbabel, Lisbon, Portugal
Note: Accepted to EMNLP 2021 (main conference)
Abstract: Selective rationalization aims to produce decisions along with rationales (e.g., text highlights or word alignments between two sentences). Commonly, rationales are modeled as stochastic binary masks, requiring sampling-based gradient estimators, which complicates training and requires careful hyperparameter tuning. Sparse attention mechanisms are a deterministic alternative, but they lack a way to regularize the rationale extraction (e.g., to control the sparsity of a text highlight or the number of alignments). In this paper, we present a unified framework for deterministic extraction of structured explanations via constrained inference on a factor graph, forming a differentiable layer. Our approach greatly eases training and rationale regularization, generally outperforming previous work in terms of performance and plausibility of the extracted rationales. We further provide a comparative study of stochastic and deterministic methods for rationale extraction for classification and natural language inference tasks, jointly assessing their predictive power, quality of the explanations, and model variability.
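The contrast the abstract draws, deterministic constrained extraction instead of sampling a stochastic mask, can be illustrated with a toy structural constraint: "exactly one contiguous span of k tokens". The real method solves much richer factor-graph constraints inside a differentiable layer; the function and scores below are hypothetical.

```python
import numpy as np

def best_contiguous_span(scores, k):
    """Return a 0/1 mask selecting the length-k window with the highest
    total relevance score, via exact (deterministic) search."""
    scores = np.asarray(scores, dtype=float)
    window_sums = np.convolve(scores, np.ones(k), mode="valid")
    start = int(np.argmax(window_sums))
    mask = np.zeros_like(scores)
    mask[start:start + k] = 1.0
    return mask

token_scores = [0.1, 0.9, 0.8, 0.7, 0.05, 0.2]  # made-up per-token relevance
mask = best_contiguous_span(token_scores, k=3)
print(mask)  # highlights tokens 1..3
```

Note how the constraint itself regularizes the rationale (here, the highlight length is exactly k), which is the lever the abstract says plain sparse attention lacks.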
【12】 Notes on Generalizing the Maximum Entropy Principle to Uncertain Data Link: https://arxiv.org/abs/2109.04530
Authors: Kenneth Bogert
Affiliations: University of North Carolina Asheville
Note: 10 pages
Abstract: The principle of maximum entropy is a broadly applicable technique for computing a distribution with the least amount of information possible while commonly constrained to match empirically estimated feature expectations. We seek to generalize this principle to scenarios where the empirical feature expectations cannot be computed because the model variables are only partially observed, which introduces a dependency on the learned model. Extending and generalizing the principle of latent maximum entropy, we introduce uncertain maximum entropy and describe an expectation-maximization based solution to approximately solve these problems. We show that our technique generalizes the principle of maximum entropy and latent maximum entropy and discuss a generally applicable regularization technique for adding error terms to feature expectation constraints in the event of limited data.
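The classical setup the note generalizes can be made concrete: over a finite support, the maximum-entropy distribution matching a feature expectation is exponential-family, p(x) proportional to exp(lam * f(x)), and the dual parameter lam can be fit by gradient ascent on the constraint violation. The feature values and target expectation below are made up for illustration; the paper's contribution is the harder case where the target expectation itself cannot be computed directly.

```python
import numpy as np

f = np.array([0.0, 1.0, 2.0, 3.0])   # feature value per outcome (illustrative)
target = 1.2                          # empirical feature expectation to match

lam = 0.0
for _ in range(2000):
    p = np.exp(lam * f)
    p /= p.sum()                      # exponential-family distribution
    grad = target - p @ f             # dual gradient = constraint violation
    lam += 0.1 * grad

entropy = -(p * np.log(p)).sum()
print(round(float(p @ f), 3), round(float(entropy), 3))
```

At convergence the constraint E[f] = 1.2 is met exactly, and among all distributions meeting it this one has maximal entropy; the uncertain-maximum-entropy setting replaces `target` with a quantity re-estimated inside an EM loop.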
【13】 Truth Discovery in Sequence Labels from Crowds Link: https://arxiv.org/abs/2109.04470
Authors: Nasim Sabetpour, Adithya Kulkarni, Sihong Xie, Qi Li
Affiliations: Iowa State University, Ames, Iowa, USA; Dept. of Computer Science and Engineering, Lehigh University, Bethlehem, PA, USA
Abstract: Annotation quality and quantity positively affect the performance of sequence labeling, a vital task in Natural Language Processing. Hiring domain experts to annotate a corpus set is very costly in terms of money and time. Crowdsourcing platforms, such as Amazon Mechanical Turk (AMT), have been deployed to assist in this purpose. However, these platforms are prone to human errors due to the lack of expertise; hence, one worker's annotations cannot be directly used to train the model. Existing literature in annotation aggregation focuses more on binary or multi-choice problems. In recent years, handling sequential label aggregation tasks on imbalanced datasets with complex dependencies between tokens has been challenging. To address this challenge, we propose an optimization-based method that infers the best set of aggregated annotations using labels provided by workers. The proposed Aggregation method for Sequential Labels from Crowds ($AggSLC$) jointly considers the characteristics of sequential labeling tasks, workers' reliabilities, and advanced machine learning techniques. We evaluate $AggSLC$ on different crowdsourced data for Named Entity Recognition (NER), Information Extraction tasks in biomedical (PICO), and a simulated dataset. Our results show that the proposed method outperforms the state-of-the-art aggregation methods. To achieve insights into the framework, we study $AggSLC$ components' effectiveness through ablation studies by evaluating our model in the absence of the prediction module and the inconsistency loss function. Theoretical analysis of our algorithm's convergence shows that the proposed $AggSLC$ halts after a finite number of iterations.
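The basic aggregation loop behind such methods, alternating between a reliability-weighted consensus and re-estimated worker reliabilities, can be sketched on toy data. This is a simplified EM-style baseline for intuition, not the paper's $AggSLC$, which additionally models sequential dependencies between tokens.

```python
import numpy as np

# labels[w][t] = tag id worker w assigned to token t (toy BIO-style data).
labels = np.array([
    [0, 1, 1, 0, 2],   # worker 0
    [0, 1, 1, 0, 2],   # worker 1
    [0, 2, 1, 0, 0],   # worker 2 (noisier)
])
n_workers, n_tokens = labels.shape
n_tags = labels.max() + 1
reliab = np.ones(n_workers)             # start by trusting everyone equally

for _ in range(10):
    # E-step analogue: reliability-weighted vote per token.
    votes = np.zeros((n_tokens, n_tags))
    for w in range(n_workers):
        votes[np.arange(n_tokens), labels[w]] += reliab[w]
    consensus = votes.argmax(axis=1)
    # M-step analogue: reliability = agreement rate with the consensus.
    reliab = (labels == consensus).mean(axis=1)

print(consensus.tolist(), np.round(reliab, 2).tolist())
# -> [0, 1, 1, 0, 2] [1.0, 1.0, 0.6]
```

On this toy input the loop reaches a fixed point immediately: the noisy worker's disagreements are down-weighted in later votes, which is the mechanism that a simple unweighted majority vote lacks.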
【14】 Ergodic Limits, Relaxations, and Geometric Properties of Random Walk Node Embeddings Link: https://arxiv.org/abs/2109.04526
Authors: Christy Lin, Daniel Sussman, Prakash Ishwar
Affiliations: Boston University
Abstract: Random walk based node embedding algorithms learn vector representations of nodes by optimizing an objective function of node embedding vectors and skip-bigram statistics computed from random walks on the network. They have been applied to many supervised learning problems such as link prediction and node classification and have demonstrated state-of-the-art performance. Yet, their properties remain poorly understood. This paper studies properties of random walk based node embeddings in the unsupervised setting of discovering hidden block structure in the network, i.e., learning node representations whose cluster structure in Euclidean space reflects their adjacency structure within the network. We characterize the ergodic limits of the embedding objective, its generalization, and related convex relaxations to derive corresponding non-randomized versions of the node embedding objectives. We also characterize the optimal node embedding Grammians of the non-randomized objectives for the expected graph of a two-community Stochastic Block Model (SBM). We prove that the solution Grammian has rank $1$ for a suitable nuclear norm relaxation of the non-randomized objective. Comprehensive experimental results on SBM random networks reveal that our non-randomized ergodic objectives yield node embeddings whose distribution is Gaussian-like, centered at the node embeddings of the expected network within each community, and concentrate in the linear degree-scaling regime as the number of nodes increases.
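The skip-bigram statistics these objectives are built from can be sketched directly: run random walks on a small two-block graph and count node co-occurrences within a context window. The graph, walk length, and window size below are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
# Adjacency of two triangles (nodes 0-2 and 3-5) joined by the edge (2,3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1

def cooccurrence(A, n_walks=200, walk_len=20, window=2):
    """Skip-bigram counts: co-occurrences of node pairs within a window
    along uniform random walks started from every node."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for start in range(n):
        for _ in range(n_walks):
            walk, node = [start], start
            for _ in range(walk_len - 1):
                node = rng.choice(np.flatnonzero(A[node]))
                walk.append(node)
            for i, u in enumerate(walk):
                for v in walk[i + 1:i + 1 + window]:
                    C[u, v] += 1
                    C[v, u] += 1
    return C

C = cooccurrence(A)
within = C[:3, :3].sum() + C[3:, 3:].sum()
print(within > C.sum() - within)  # co-occurrence concentrates within blocks
```

Because walks rarely cross the single bridge edge, within-block counts dominate, which is exactly the block structure that embeddings optimized on these statistics recover as clusters in Euclidean space.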