Machine Learning Academic Digest [7.5]

2021-07-27 10:23:13

cs.LG: 85 papers in total today

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (3 papers)

【1】 Combinatorial Optimization with Physics-Inspired Graph Neural Networks

Authors: Martin J. A. Schuetz, J. Kyle Brubaker, Helmut G. Katzgraber
Affiliations: Amazon Quantum Solutions Lab, Seattle, Washington, USA; AWS Intelligent and Advanced Compute Technologies, Professional Services, Seattle, Washington, USA; AWS Center for Quantum Computing, Pasadena, CA, USA
Note: Manuscript: 13 pages, 5 figures, 1 table. Supplemental Material: 1 page, 1 table
Link: https://arxiv.org/abs/2107.01188
Abstract: We demonstrate how graph neural networks can be used to solve combinatorial optimization problems. Our approach is broadly applicable to canonical NP-hard problems in the form of quadratic unconstrained binary optimization problems, such as maximum cut, minimum vertex cover, and maximum independent set, as well as Ising spin glasses and higher-order generalizations thereof in the form of polynomial unconstrained binary optimization problems. We apply a relaxation strategy to the problem Hamiltonian to generate a differentiable loss function with which we train the graph neural network, and apply a simple projection to integer variables once the unsupervised training process has completed. We showcase our approach with numerical results for the canonical maximum cut and maximum independent set problems. We find that the graph neural network optimizer performs on par with or outperforms existing solvers, with the ability to scale beyond the state of the art to problems with millions of variables.
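
The relax-train-project pipeline described here can be sketched compactly. Below is a minimal illustration on MaxCut cast as a QUBO, with a toy one-layer message-passing network standing in for the GNN; the graph, architecture, and hyperparameters are assumptions for illustration, not the authors' implementation.

```python
import torch

torch.manual_seed(0)

# Toy graph: symmetric 0/1 adjacency matrix of a small random graph.
n = 20
adj = torch.triu((torch.rand(n, n) < 0.3).float(), diagonal=1)
adj = adj + adj.t()
deg = adj.sum(dim=1)

# Stand-in "GNN": one round of neighborhood averaging over learnable node
# embeddings, followed by a sigmoid head that outputs soft assignments.
emb = torch.nn.Parameter(torch.randn(n, 16))
head = torch.nn.Linear(16, 1)
opt = torch.optim.Adam([emb] + list(head.parameters()), lr=1e-2)

def soft_assignment():
    h = (adj @ emb) / deg.clamp(min=1).unsqueeze(1)   # message passing
    return torch.sigmoid(head(h)).squeeze(1)          # relaxed x_i in (0, 1)

for step in range(500):
    p = soft_assignment()
    # Differentiable MaxCut Hamiltonian H(p) = p^T A p - deg^T p, whose
    # minimum over binary assignments maximizes the number of cut edges.
    loss = p @ adj @ p - deg @ p
    opt.zero_grad(); loss.backward(); opt.step()

# Projection step: round the relaxed solution back to {0, 1}.
with torch.no_grad():
    x = (soft_assignment() > 0.5).float()
cut = (adj * (x.unsqueeze(0) != x.unsqueeze(1)).float()).sum() / 2
print(f"cut {int(cut)} of {int(adj.sum() / 2)} edges")
```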

【2】 A Novel Deep Reinforcement Learning Based Stock Direction Prediction using Knowledge Graph and Community Aware Sentiments

Authors: Anil Berk Altuner, Zeynep Hilal Kilimci
Affiliations: Department of Information Systems Engineering, University of Kocaeli, Kocaeli
Note: 15 pages
Link: https://arxiv.org/abs/2107.00931
Abstract: Stock market prediction has been an important topic for investors, researchers, and analysts. Because it is affected by many factors, stock market prediction is a difficult task to handle. In this study, we propose a novel method based on deep reinforcement learning methodologies for the direction prediction of stocks, using community sentiments and a knowledge graph. For this purpose, we first construct a social knowledge graph of users by analyzing relations between connections. After that, time series analysis of the related stock and sentiment analysis are blended with the deep reinforcement methodology. The Turkish version of Bidirectional Encoder Representations from Transformers (BerTurk) is employed to analyze the sentiments of the users, while deep Q-learning is used on the deep reinforcement learning side of the proposed model to construct the deep Q network. In order to demonstrate the effectiveness of the proposed model, Garanti Bank (GARAN), Akbank (AKBNK), and Türkiye İş Bankası (ISCTR) stocks on the Istanbul Stock Exchange are used as a case study. Experiment results show that the proposed novel model achieves remarkable results for the stock market prediction task.
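
For readers unfamiliar with the deep Q-learning component, the sketch below shows the standard one-step DQN regression target on a dummy replay batch; a real agent's state would fold in the sentiment and knowledge-graph features. All sizes and hyperparameters are illustrative assumptions, not the paper's setup.

```python
import copy
import torch

state_dim, n_actions, gamma = 12, 3, 0.99      # e.g. buy / hold / sell
q_net = torch.nn.Sequential(torch.nn.Linear(state_dim, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, n_actions))
target_net = copy.deepcopy(q_net)              # periodically synced copy
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

# A dummy replay batch of transitions (s, a, r, s', done).
s = torch.randn(32, state_dim); a = torch.randint(0, n_actions, (32,))
r = torch.randn(32); s2 = torch.randn(32, state_dim); done = torch.zeros(32)

with torch.no_grad():                          # bootstrapped TD target
    target = r + gamma * (1 - done) * target_net(s2).max(dim=1).values
q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
loss = torch.nn.functional.mse_loss(q, target)
opt.zero_grad(); loss.backward(); opt.step()
```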

【3】 Rapid Neural Architecture Search by Learning to Generate Graphs from Datasets

Authors: Hayeon Lee, Eunyoung Hyung, Sung Ju Hwang
Affiliations: KAIST; AITRICS, South Korea
Note: ICLR 2021
Link: https://arxiv.org/abs/2107.00860
Abstract: Despite the success of recent Neural Architecture Search (NAS) methods on various tasks, which have been shown to output networks that largely outperform human-designed networks, conventional NAS methods have mostly tackled the optimization of searching for the network architecture for a single task (dataset), which does not generalize well across multiple tasks (datasets). Moreover, since such task-specific methods search for a neural architecture from scratch for every given task, they incur a large computational cost, which is problematic when the time and monetary budget are limited. In this paper, we propose an efficient NAS framework that is trained once on a database consisting of datasets and pretrained networks and can rapidly search for a neural architecture for a novel dataset. The proposed MetaD2A (Meta Dataset-to-Architecture) model can stochastically generate graphs (architectures) from a given set (dataset) via a cross-modal latent space learned with amortized meta-learning. Moreover, we also propose a meta-performance predictor to estimate and select the best architecture without direct training on target datasets. The experimental results demonstrate that our model, meta-learned on subsets of ImageNet-1K and architectures from the NAS-Bench 201 search space, successfully generalizes to multiple unseen datasets including CIFAR-10 and CIFAR-100, with an average search time of 33 GPU seconds. Even under the MobileNetV3 search space, MetaD2A is 5.5K times faster than NSGANetV2, a transferable NAS method, with comparable performance. We believe that MetaD2A proposes a new research direction for rapid NAS as well as ways to utilize the knowledge from rich databases of datasets and architectures accumulated over the past years. Code is available at https://github.com/HayeonLee/MetaD2A.

Transformer (2 papers)

【1】 R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling

Authors: Xiang Hu, Haitao Mi, Zujie Wen, Yafang Wang, Yi Su, Jing Zheng, Gerard de Melo
Affiliations: Ant Financial Services Group; Hasso Plattner Institute, University of Potsdam
Note: To be published in the proceedings of ACL-IJCNLP 2021
Link: https://arxiv.org/abs/2107.00967
Abstract: Human language understanding operates at multiple levels of granularity (e.g., words, phrases, and sentences) with increasing levels of abstraction that can be hierarchically combined. However, existing deep models with stacked layers do not explicitly model any sort of hierarchical process. This paper proposes a recursive Transformer model based on differentiable CKY-style binary trees to emulate the composition process. We extend the bidirectional language model pre-training objective to this architecture, attempting to predict each word given its left and right abstraction nodes. To scale up our approach, we also introduce an efficient pruned tree induction algorithm to enable encoding in just a linear number of composition steps. Experimental results on language modeling and unsupervised parsing show the effectiveness of our approach.

【2】 Transformer-F: A Transformer network with effective methods for learning universal sentence representation

Authors: Yu Shi
Affiliations: School of Software Engineering, Beijing University of Posts and Telecommunications, Beijing, China; Key Laboratory of Trustworthy Distributed Computing and Service, Beijing University of Posts and Telecommunications
Link: https://arxiv.org/abs/2107.00653
Abstract: The Transformer model is widely used in natural language processing for sentence representation. However, previous Transformer-based models focus on function words that have limited meaning in most cases and can merely extract high-level semantic abstraction features. In this paper, two approaches are introduced to improve the performance of Transformers. We calculate the attention score by multiplying the part-of-speech weight vector with the correlation coefficient, which helps extract the words with more practical meaning. The weight vector is obtained from the input text sequence based on the importance of each part of speech. Furthermore, we fuse the features of each layer to make the sentence representation results more comprehensive and accurate. In experiments, we demonstrate the effectiveness of our model Transformer-F on three standard text classification datasets. Experimental results show that our proposed model significantly boosts the performance of text classification as compared to the baseline model. Specifically, we obtain a 5.28% relative improvement over the vanilla Transformer on the simple tasks.
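
One way to read the first modification is as a reweighting of attention logits by per-token part-of-speech weights before the softmax. The sketch below shows that idea in isolation; the weight table, the point where the weight is applied, and all shapes are assumptions for illustration rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def pos_weighted_attention(q, k, v, pos_weights):
    # q, k, v: (seq, d); pos_weights: (seq,), larger for content words.
    d = q.size(-1)
    logits = (q @ k.t()) / d ** 0.5              # scaled dot-product scores
    logits = logits * pos_weights.unsqueeze(0)   # emphasize meaningful keys
    return F.softmax(logits, dim=-1) @ v

seq, d = 6, 8
q, k, v = (torch.randn(seq, d) for _ in range(3))
# e.g. nouns/verbs weighted 1.0, function words 0.3 (assumed values)
pos_weights = torch.tensor([1.0, 0.3, 1.0, 0.3, 1.0, 0.3])
print(pos_weighted_attention(q, k, v, pos_weights).shape)  # (6, 8)
```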

GANs | adversarial | attacks | generation (1 paper)

【1】 Weather-based forecasting of energy generation, consumption and price for electrical microgrids management

Authors: Jonathan Dumas
Affiliations: Departments of Computer Science and Electrical Engineering, Liège University (supervisor: Prof. Bertrand Cornélusse)
Note: PhD thesis, first draft; the manuscript is pending evaluation by the supervisor before defense before a jury
Link: https://arxiv.org/abs/2107.01034
Abstract: The Intergovernmental Panel on Climate Change proposes different mitigation strategies to achieve the net emissions reductions that would be required to follow a pathway that limits global warming to 1.5°C with no or limited overshoot. The transition towards a carbon-free society goes through an inevitable increase in the share of renewable generation in the energy mix and a drastic decrease in the total consumption of fossil fuels. Therefore, this thesis studies the integration of renewables in power systems by investigating forecasting and decision-making tools. Indeed, in contrast to conventional power plants, renewable energy is subject to uncertainty. Most of the generation technologies based on renewable sources are non-dispatchable, and their production is stochastic and hard to predict in advance. A high share of renewables is a great challenge for power systems that have been designed and sized for dispatchable units. In this context, probabilistic forecasts, which aim at modeling the distribution of all possible future realizations, have become an important tool to equip decision-makers, hopefully leading to better decisions in energy applications. This thesis focuses on two main research questions: (1) How to produce reliable probabilistic forecasts of renewable generation, consumption, and electricity prices? (2) How to make decisions with uncertainty using probabilistic forecasts? The perimeter of the thesis is the energy management of "small" systems such as microgrids at a residential scale on a day-ahead basis. It is divided into two main parts that propose directions to address both research questions: (1) a forecasting part; (2) a planning and control part.

Semi-/weakly-/un-/fully-supervised | uncertainty | active learning (5 papers)

【1】 Evaluating the Usefulness of Unsupervised monitoring in Cultural Heritage Monuments

Authors: Charalampos Zafeiropoulos, Ioannis N. Tzortzis, Ioannis Rallis, Eftychios Protopapadakis, Nikolaos Doulamis, Anastasios Doulamis
Affiliations: National Technological University of Athens
Link: https://arxiv.org/abs/2107.00964
Abstract: In this paper, we scrutinize the effectiveness of various clustering techniques, investigating their applicability in Cultural Heritage monitoring applications. In the context of this paper, we detect the level of decomposition and corrosion on the walls of the Saint Nicholas fort in Rhodes utilizing hyperspectral images. A total of 6 different clustering approaches have been evaluated over a set of 14 different orthorectified hyperspectral images. The experimental setup in this study involves the K-means, Spectral, Meanshift, DBSCAN, Birch and Optics algorithms. For each of these techniques we evaluate its performance using metrics such as the Calinski-Harabasz and Davies-Bouldin indexes and the Silhouette value. In this approach, we evaluate the outcomes of the clustering methods by comparing them with a set of annotated images which denote the ground truth regarding the decomposition and/or corrosion areas of the original images. The results show that a few clustering techniques applied to the given dataset achieved decent accuracy, precision, recall and F1 scores. Eventually, it was observed that the deterioration was detected quite accurately.
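
The evaluation loop described here is easy to reproduce with scikit-learn. The sketch below scores three of the six mentioned algorithms with the same three internal metrics; the synthetic matrix stands in for the per-pixel hyperspectral features, and all parameter values are assumptions since the dataset is not public here.

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN, Birch
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))            # stand-in for per-pixel spectra

for name, algo in [("kmeans", KMeans(n_clusters=4, n_init=10)),
                   ("dbscan", DBSCAN(eps=1.5, min_samples=5)),
                   ("birch", Birch(n_clusters=4))]:
    labels = algo.fit_predict(X)
    if len(set(labels)) < 2:              # internal metrics need >= 2 clusters
        print(name, "degenerate clustering, skipped")
        continue
    print(name,
          round(calinski_harabasz_score(X, labels), 2),
          round(davies_bouldin_score(X, labels), 2),
          round(silhouette_score(X, labels), 2))
```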

【2】 Supervised Contrastive Learning for Accented Speech Recognition

Authors: Tao Han, Hantao Huang, Ziang Yang, Wei Han
Affiliations: MediaTek Singapore
Note: Keywords: accented speech recognition, deep neural networks, model adaptation, supervised contrastive learning
Link: https://arxiv.org/abs/2107.00921
Abstract: Neural network based speech recognition systems suffer from performance degradation due to accented speech, especially unfamiliar accents. In this paper, we study the supervised contrastive learning framework for accented speech recognition. To build different views (similar "positive" data samples) for contrastive learning, three data augmentation techniques, including noise injection, spectrogram augmentation and TTS-same-sentence generation, are further investigated. From the experiments on the Common Voice dataset, we show that contrastive learning helps to build data-augmentation-invariant and pronunciation-invariant representations, which significantly outperform traditional joint training methods in both zero-shot and full-shot settings. Experiments show that contrastive learning can improve accuracy by 3.66% (zero-shot) and 3.78% (full-shot) on average, compared to the joint training method.
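
For reference, the core supervised contrastive objective (in the style of Khosla et al., 2020) fits in a few lines: embeddings that share a label, here e.g. an accent class, are pulled together and all others pushed apart. The batch shapes, labels, and temperature below are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(z, labels, tau=0.1):
    z = F.normalize(z, dim=1)                       # unit-norm embeddings
    sim = z @ z.t() / tau                           # pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool)
    sim = sim.masked_fill(self_mask, float("-inf")) # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0) # avoid -inf * 0 below
    pos = ((labels[None, :] == labels[:, None]) & ~self_mask).float()
    # average log-probability over each anchor's positives
    return -((log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)).mean()

z = torch.randn(8, 32, requires_grad=True)          # e.g. utterance embeddings
labels = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])     # e.g. accent classes
print(supervised_contrastive_loss(z, labels))
```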

【3】 Few-shot Learning for Unsupervised Feature Selection

Authors: Atsutoshi Kumagai, Tomoharu Iwata, Yasuhiro Fujiwara
Affiliations: NTT Computer and Data Science Laboratories; NTT Communication Science Laboratories
Note: 20 pages
Link: https://arxiv.org/abs/2107.00816
Abstract: We propose a few-shot learning method for unsupervised feature selection, which is the task of selecting a subset of relevant features in unlabeled data. Existing methods usually require many instances for feature selection. However, sufficient instances are often unavailable in practice. The proposed method can select a subset of relevant features in a target task given a few unlabeled target instances, by training with unlabeled instances in multiple source tasks. Our model consists of a feature selector and a decoder. The feature selector outputs a subset of relevant features taking a few unlabeled instances as input, such that the decoder can reconstruct the original features of unseen instances from the selected ones. The feature selector uses Concrete random variables to select features via gradient descent. To encode task-specific properties from a few unlabeled instances into the model, the Concrete random variables and the decoder are modeled using permutation-invariant neural networks that take a few unlabeled instances as input. Our model is trained by minimizing the expected test reconstruction error given a few unlabeled instances, calculated with datasets in source tasks. We experimentally demonstrate that the proposed method outperforms existing feature selection methods.
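
The Concrete-variable selection mechanism can be illustrated with the Gumbel-Softmax relaxation: k relaxed one-hot vectors pick feature columns, and a decoder reconstructs all features from the picked values. Dimensions, the plain MLP decoder, and the absence of the paper's permutation-invariant set encoder are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

d, k, n = 20, 5, 64                       # features, selected, instances
logits = torch.nn.Parameter(torch.zeros(k, d))
decoder = torch.nn.Sequential(torch.nn.Linear(k, 32), torch.nn.ReLU(),
                              torch.nn.Linear(32, d))
opt = torch.optim.Adam([logits] + list(decoder.parameters()), lr=1e-2)

X = torch.randn(n, d)                     # unlabeled instances
for step in range(200):
    # Concrete/Gumbel-Softmax samples: differentiable near-one-hot picks.
    sel = F.gumbel_softmax(logits, tau=0.5)          # (k, d)
    picked = X @ sel.t()                             # (n, k) selected values
    loss = F.mse_loss(decoder(picked), X)            # reconstruct all features
    opt.zero_grad(); loss.backward(); opt.step()

print("selected feature indices:", logits.argmax(dim=1).tolist())
```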

【4】 Mitigating Uncertainty of Classifier for Unsupervised Domain Adaptation

Authors: Shanu Kumar, Vinod Kumar Kurmi, Praphul Singh, Vinay P Namboodiri
Affiliations: Microsoft, India; IIT Kanpur; Oracle, India; University of Bath
Link: https://arxiv.org/abs/2107.00727
Abstract: Understanding unsupervised domain adaptation has been an important task that has been well explored. However, the wide variety of methods have not analyzed the role of a classifier's performance in detail. In this paper, we thoroughly examine the role of a classifier in terms of matching source and target distributions. We specifically investigate the classifier's ability by matching a) the distribution of features, b) probabilistic uncertainty for samples, and c) certainty activation mappings. Our analysis suggests that using these three distributions does result in consistently improved performance on all the datasets. Our work thus extends present knowledge on the role of the various distributions obtained from the classifier towards solving unsupervised domain adaptation.

【5】 SIMILAR: Submodular Information Measures Based Active Learning In Realistic Scenarios

Authors: Suraj Kothawade, Nathan Beck, Krishnateja Killamsetty, Rishabh Iyer
Affiliations: University of Texas at Dallas
Link: https://arxiv.org/abs/2107.00717
Abstract: Active learning has proven to be useful for minimizing labeling costs by selecting the most informative samples. However, existing active learning methods do not work well in realistic scenarios such as imbalanced or rare classes, out-of-distribution data in the unlabeled set, and redundancy. In this work, we propose SIMILAR (Submodular Information Measures based actIve LeARning), a unified active learning framework using recently proposed submodular information measures (SIM) as acquisition functions. We argue that SIMILAR not only works in standard active learning, but also easily extends to the realistic settings considered above and acts as a one-stop solution for active learning that is scalable to large real-world datasets. Empirically, we show that SIMILAR significantly outperforms existing active learning algorithms by as much as ~5%-18% in the case of rare classes and ~5%-10% in the case of out-of-distribution data on several image classification tasks like CIFAR-10, MNIST, and ImageNet.
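
Submodular acquisition functions of this kind are typically maximized greedily: each round picks the unlabeled point with the largest marginal gain. The sketch below does this for a facility-location objective over pairwise similarities, one representative member of the SIM family; the kernel, features, and budget are assumptions for illustration.

```python
import numpy as np

def greedy_facility_location(sim, budget):
    """Greedily pick `budget` points maximizing sum_i max_{j in S} sim[i, j]."""
    n = sim.shape[0]
    selected, cover = [], np.zeros(n)
    for _ in range(budget):
        # marginal gain of each candidate j over the current coverage
        gains = np.maximum(sim, cover[None, :]).sum(axis=1) - cover.sum()
        gains[selected] = -np.inf                 # no repeats
        j = int(np.argmax(gains))
        selected.append(j)
        cover = np.maximum(cover, sim[j])
    return selected

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 16))                # stand-in embeddings
sim = np.clip(feats @ feats.T, 0, None)           # nonnegative similarities
print(greedy_facility_location(sim, budget=5))
```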

Transfer | Zero/Few/One-Shot | adaptation (5 papers)

【1】 A Systems Theory of Transfer Learning

Authors: Tyler Cody, Peter A. Beling
Affiliations: Engineering Systems and Environment, University of Virginia, Charlottesville, USA
Link: https://arxiv.org/abs/2107.01196
Abstract: Existing frameworks for transfer learning are incomplete from a systems theoretic perspective. They place emphasis on notions of domain and task, and neglect notions of structure and behavior. In doing so, they limit the extent to which formalism can be carried through into the elaboration of their frameworks. Herein, we use Mesarovician systems theory to define transfer learning as a relation on sets and subsequently characterize the general nature of transfer learning as a mathematical construct. We interpret existing frameworks in terms of ours and go beyond existing frameworks to define notions of transferability, transfer roughness, and transfer distance. Importantly, despite its formalism, our framework avoids the detailed mathematics of learning theory or machine learning solution methods without excluding their consideration. As such, we provide a formal, general systems framework for modeling transfer learning that offers a rigorous foundation for system design and analysis.

【2】 Parasitic Egg Detection and Classification in Low-cost Microscopic Images using Transfer Learning

Authors: Thanaphon Suwannaphong, Sawaphob Chavana, Sahapol Tongsom, Duangdao Palasuwan, Thanarat H. Chalidabhongse, Nantheera Anantrasirichai
Affiliations: Department of Engineering Mathematics, University of Bristol, Bristol, UK; Department of Computer Engineering, Chulalongkorn University, Thailand
Note: 7 pages, 9 figures, preprint submitted to Elsevier
Link: https://arxiv.org/abs/2107.00968
Abstract: Intestinal parasitic infection leads to several morbidities in humans worldwide, especially in tropical countries. Traditional diagnosis usually relies on manual analysis of microscopic images, which is prone to human error due to the morphological similarity of different parasitic eggs and the abundance of impurities in a sample. Many studies have developed automatic systems for parasite egg detection to reduce human workload. However, they work with high-quality microscopes, which unfortunately remain unaffordable in some rural areas. Our work thus exploits the benefits of a low-cost USB microscope. This instrument, however, provides poor image quality due to its limited magnification (10x), causing difficulty in parasite detection and species classification. In this paper, we propose a CNN-based technique using a transfer learning strategy to enhance the efficiency of automatic parasite classification in poor-quality microscopic images. A patch-based technique with a sliding window is employed to search for the location of the eggs. Two networks, AlexNet and ResNet50, are examined with a trade-off between architecture size and classification performance. The results show that our proposed framework outperforms state-of-the-art object recognition methods. Our system, combined with the final decision from an expert, may improve real faecal examination with low-cost microscopes.
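
A minimal transfer-learning sketch in the spirit described above: reuse an ImageNet-pretrained ResNet50 backbone and retrain only a new classification head on image patches. The class count, freezing policy, and optimizer are assumptions, not the paper's exact training recipe.

```python
import torch
import torchvision

num_classes = 4                     # assumed number of egg species
model = torchvision.models.resnet50(pretrained=True)
for p in model.parameters():        # freeze the pretrained backbone
    p.requires_grad = False
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)

opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of image patches.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))
loss = loss_fn(model(images), labels)
loss.backward(); opt.step()
print(float(loss))
```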

【3】 Data Centric Domain Adaptation for Historical Text with OCR Errors

Authors: Luisa März, Stefan Schweter, Nina Poerner, Benjamin Roth, Hinrich Schütze
Affiliations: Center for Information and Language Processing, Ludwig Maximilian University, Munich, Germany
Note: 14 pages, 2 figures, 6 tables
Link: https://arxiv.org/abs/2107.00927
Abstract: We propose new methods for in-domain and cross-domain Named Entity Recognition (NER) on historical data for Dutch and French. For the cross-domain case, we address domain shift by integrating unsupervised in-domain data via contextualized string embeddings, and OCR errors by injecting synthetic OCR errors into the source domain, addressing data-centric domain adaptation. We propose a general approach to imitate OCR errors in arbitrary input data. Our cross-domain as well as our in-domain results outperform several strong baselines and establish state-of-the-art results. We publish preprocessed versions of the French and Dutch Europeana NER corpora.
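
The general idea of imitating OCR errors can be illustrated with a simple character-level corruptor that substitutes visually confusable characters and occasionally deletes or inserts characters. The confusion table and rates below are assumptions for illustration, not the paper's curated error model.

```python
import random

# A few common OCR visual confusions (assumed, not the paper's table).
CONFUSIONS = {"e": "c", "l": "1", "o": "0", "m": "rn", "u": "v"}

def inject_ocr_noise(text, p=0.05, seed=0):
    rng = random.Random(seed)
    out = []
    for ch in text:
        r = rng.random()
        if r < p and ch in CONFUSIONS:
            out.append(CONFUSIONS[ch])                 # visual confusion
        elif r < 1.5 * p:
            continue                                   # character deletion
        else:
            out.append(ch)
            if r > 1 - 0.5 * p:                        # spurious insertion
                out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))
    return "".join(out)

print(inject_ocr_noise("the quick brown fox jumps over the lazy dog"))
```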

【4】 Segmented Federated Learning for Adaptive Intrusion Detection System

Authors: Geet Shingi, Harsh Saglani, Preeti Jain
Affiliations: Dept. of Computer Engineering, Pune Institute of Computer Technology, Maharashtra, India
Note: Accepted at the Workshop on Artificial Intelligence for Social Good (AI4SG) at the 30th International Joint Conference on Artificial Intelligence (IJCAI), 2021
Link: https://arxiv.org/abs/2107.00881
Abstract: Cyberattacks are a major issue, causing organizations great financial and reputational harm. However, due to various factors, current network intrusion detection systems (NIDS) seem to be insufficient. Predominant NIDS identify cyberattacks through a handcrafted dataset of rules. Although the recent applications of machine learning and deep learning have alleviated the enormous effort in NIDS, the security of network data has always been a prime concern. To address the security problem while enabling sharing among organizations, the Federated Learning (FL) scheme is employed. Although current FL systems have been successful, a network's data distribution does not always fit into a single global model as in FL; in such cases, having a single global model in FL is not feasible. In this paper, we propose a Segmented-Federated Learning (Segmented-FL) scheme for a more efficient NIDS. The Segmented-FL approach employs periodic local model evaluation, based on which segmentation occurs. We aim to bring similar network environments into the same group. Further, the Segmented-FL system is coupled with a weighted aggregation of local model parameters based on the number of data samples a worker possesses, to further augment performance (see the sketch below). The improved performance of our system as compared to the FL and centralized systems on standard datasets further validates our system and makes a strong case for extending our technique across various tasks. The solution finds its application in organizations that want to collaboratively learn on diverse network environments and protect the privacy of individual datasets.
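
A small sketch of the sample-count-weighted aggregation step: within one segment, worker parameters are averaged in proportion to how many samples each worker holds. The function structure and names are assumptions for illustration.

```python
import torch

def weighted_aggregate(state_dicts, num_samples):
    """FedAvg-style aggregation weighted by per-worker sample counts."""
    total = float(sum(num_samples))
    weights = [n / total for n in num_samples]
    agg = {}
    for key in state_dicts[0]:
        agg[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return agg

# Three workers in one segment holding different amounts of traffic data.
workers = [{"w": torch.ones(2, 2) * i} for i in (1.0, 2.0, 3.0)]
print(weighted_aggregate(workers, num_samples=[100, 300, 600])["w"])
```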

【5】 Toward Robust Drug-Target Interaction Prediction via Ensemble Modeling and Transfer Learning

Authors: Po-Yu Kao, Shu-Min Kao, Nan-Lan Huang, Yen-Chu Lin
Affiliations: Insilico Medicine Taiwan Ltd., Taipei, Taiwan
Note: 8 pages, 1 figure, 10 tables
Link: https://arxiv.org/abs/2107.00719
Abstract: Drug-target interaction (DTI) prediction plays a crucial role in drug discovery, and deep learning approaches have achieved state-of-the-art performance in this field. We introduce an ensemble of deep learning models (EnsembleDLM) for robust DTI prediction. EnsembleDLM only uses the sequence information of chemical compounds and proteins, and it aggregates the predictions from multiple deep neural networks. This approach reduces the chance of overfitting, yields an unbiased prediction, and achieves state-of-the-art performance on the Davis and KIBA datasets. EnsembleDLM also reaches state-of-the-art performance in cross-domain applications, and decent cross-domain performance (Pearson correlation coefficient and concordance index > 0.8) with transfer learning using approximately twice the amount of test data in the new domain.

Reinforcement learning (7 papers)

【1】 Structure-aware reinforcement learning for node-overload protection in mobile edge computing

Authors: Anirudha Jitani, Aditya Mahajan, Zhongwen Zhu, Hatem Abou-zeid, Emmanuel T. Fapi, Hakimeh Purmehdi
Affiliations: School of Computer Science, McGill University, and Montreal Institute of Learning Algorithms; Department of Electrical and Computer Engineering, McGill University
Note: 16 pages
Link: https://arxiv.org/abs/2107.01025
Abstract: Mobile Edge Computing (MEC) refers to the concept of placing computational capability and applications at the edge of the network, providing benefits such as reduced latency in handling client requests, reduced network congestion, and improved performance of applications. The performance and reliability of MEC degrade significantly when one or several edge servers in the cluster are overloaded. In particular, when a server crashes due to overload, it causes service failures in MEC. In this work, an adaptive admission control policy to prevent an edge node from getting overloaded is presented. This approach is based on a recently proposed low-complexity RL (Reinforcement Learning) algorithm called SALMUT (Structure-Aware Learning for Multiple Thresholds), which exploits the structure of the optimal admission control policy in multi-class queues for an average-cost setting. We extend the framework to work for the node overload-protection problem in a discounted-cost setting. The proposed solution is validated using several scenarios mimicking real-world deployments in two different settings - computer simulations and a docker testbed. Our empirical evaluations show that the total discounted cost incurred by SALMUT is similar to state-of-the-art deep RL algorithms such as PPO (Proximal Policy Optimization) and A2C (Advantage Actor Critic), but requires an order of magnitude less time to train, outputs an easily interpretable policy, and can be deployed in an online manner.

【2】 Feeling of Presence Maximization: mmWave-Enabled Virtual Reality Meets Deep Reinforcement Learning

Authors: Peng Yang, Tony Q. S. Quek, Jingxuan Chen, Chaoqun You, Xianbin Cao
Affiliations: Singapore University of Technology and Design; School of Electronic and Information Engineering, Beihang University
Link: https://arxiv.org/abs/2107.01001
Abstract: This paper investigates the problem of providing ultra-reliable and energy-efficient virtual reality (VR) experiences for wireless mobile users. To ensure reliable ultra-high-definition (UHD) video frame delivery to mobile users and enhance their immersive visual experiences, a coordinated multipoint (CoMP) transmission technique and millimeter wave (mmWave) communications are exploited. Owing to user movement and time-varying wireless channels, the wireless VR experience enhancement problem is formulated as a sequence-dependent and mixed-integer problem with a goal of maximizing users' feeling of presence (FoP) in the virtual world, subject to power consumption constraints on access points (APs) and users' head-mounted displays (HMDs). The problem, however, is hard to solve directly due to the lack of users' accurate tracking information and the sequence-dependent and mixed-integer characteristics. To overcome this challenge, we develop a parallel echo state network (ESN) learning method to predict users' tracking information by training fresh and historical tracking samples separately collected by APs. With the learnt results, we propose a deep reinforcement learning (DRL) based optimization algorithm to solve the formulated problem. In this algorithm, we implement deep neural networks (DNNs) as a scalable solution to produce integer decision variables, and solve a continuous power control problem to critique the integer decision variables. Finally, the performance of the proposed algorithm is compared with various benchmark algorithms, and the impact of different design parameters is also discussed. Simulation results demonstrate that the proposed algorithm is 4.14% more energy-efficient than the benchmark algorithms.

【3】 SocialAI: Benchmarking Socio-Cognitive Abilities in Deep Reinforcement Learning Agents

Authors: Grgur Kovač, Rémy Portelas, Katja Hofmann, Pierre-Yves Oudeyer
Affiliations: Inria (FR); Microsoft Research (UK)
Note: Under review. arXiv admin note: substantial text overlap with arXiv:2104.13207
Link: https://arxiv.org/abs/2107.00956
Abstract: Building embodied autonomous agents capable of participating in social interactions with humans is one of the main challenges in AI. Within the Deep Reinforcement Learning (DRL) field, this objective motivated multiple works on embodied language use. However, current approaches focus on language as a communication tool in very simplified and non-diverse social situations: the "naturalness" of language is reduced to the concept of high vocabulary size and variability. In this paper, we argue that aiming towards human-level AI requires a broader set of key social skills: 1) language use in complex and variable social contexts; 2) beyond language, complex embodied communication in multimodal settings within constantly evolving social worlds. We explain how concepts from cognitive sciences could help AI to draw a roadmap towards human-like intelligence, with a focus on its social dimensions. As a first step, we propose to expand current research to a broader set of core social skills. To do this, we present SocialAI, a benchmark to assess the acquisition of social skills of DRL agents using multiple grid-world environments featuring other (scripted) social agents. We then study the limits of a recent SOTA DRL approach when tested on SocialAI and discuss important next steps towards proficient social agents. Videos and code are available at https://sites.google.com/view/socialai.

【4】 RL-NCS: Reinforcement learning based data-driven approach for nonuniform compressed sensing

Authors: Nazmul Karim, Alireza Zaeemzadeh, Nazanin Rahnavard
Affiliations: School of Electrical and Computer Engineering, University of Central Florida, Orlando, USA
Link: https://arxiv.org/abs/2107.00838
Abstract: A reinforcement-learning-based non-uniform compressed sensing (NCS) framework for time-varying signals is introduced. The proposed scheme, referred to as RL-NCS, aims to boost the performance of signal recovery through an optimal and adaptive distribution of sensing energy among two groups of coefficients of the signal, referred to as the region of interest (ROI) coefficients and non-ROI coefficients. The coefficients in the ROI usually have greater importance and need to be reconstructed with higher accuracy compared to non-ROI coefficients. In order to accomplish this task, the ROI is predicted at each time step using two specific approaches. One of these approaches incorporates a long short-term memory (LSTM) network for the prediction. The other approach employs the previous ROI information for predicting the next-step ROI. Using the exploration-exploitation technique, a Q-network learns to choose the best approach for designing the measurement matrix. Furthermore, a joint loss function is introduced for the efficient training of the Q-network as well as the LSTM network. The results indicate a significant performance gain for our proposed method, even for rapidly varying signals and a reduced number of measurements.

【5】 Reinforcement Learning for Feedback-Enabled Cyber Resilience

Authors: Yunhan Huang, Linan Huang, Quanyan Zhu
Affiliations: Department of Electrical and Computer Engineering, New York University, Jay Street, Brooklyn, New York, United States
Link: https://arxiv.org/abs/2107.00783
Abstract: The rapid growth in the number of devices and their connectivity has enlarged the attack surface and weakened cyber systems. As attackers become increasingly sophisticated and resourceful, mere reliance on traditional cyber protection, such as intrusion detection, firewalls, and encryption, is insufficient to secure cyber systems. Cyber resilience provides a new security paradigm that complements inadequate protection with resilience mechanisms. A Cyber-Resilient Mechanism (CRM) adapts to the known or zero-day threats and uncertainties in real-time and strategically responds to them to maintain the critical functions of the cyber systems. Feedback architectures play a pivotal role in enabling the online sensing, reasoning, and actuation of the CRM. Reinforcement Learning (RL) is an important class of algorithms that epitomize the feedback architectures for cyber resiliency, allowing the CRM to provide dynamic and sequential responses to attacks with limited prior knowledge of the attacker. In this work, we review the literature on RL for cyber resiliency and discuss the cyber-resilient defenses against three major types of vulnerabilities, i.e., posture-related, information-related, and human-related vulnerabilities. We introduce moving target defense, defensive cyber deception, and assistive human security technologies as three application domains of CRMs to elaborate on their designs. The RL technique also has vulnerabilities itself. We explain the major vulnerabilities of RL and present several attack models in which the attacks target the rewards, the measurements, and the actuators. We show that the attacker can trick the RL agent into learning a nefarious policy with minimum attacking effort, which shows serious security concerns for RL-enabled systems. Finally, we discuss the future challenges of RL for cyber security and resiliency and emerging applications of RL-based CRMs.

【6】 Distilling Reinforcement Learning Tricks for Video Games

Authors: Anssi Kanervisto, Christian Scheller, Yanick Schraner, Ville Hautamäki
Affiliations: School of Computing, University of Eastern Finland, Joensuu, Finland; Institute for Data Science, University of Applied Sciences Northwestern Switzerland, Windisch, Switzerland
Note: To appear in IEEE Conference on Games 2021. Experiment code is available at this https URL
Link: https://arxiv.org/abs/2107.00703
Abstract: Reinforcement learning (RL) research focuses on general solutions that can be applied across different domains. This results in methods that RL practitioners can use in almost any domain. However, recent studies often lack the engineering steps ("tricks") which may be needed to effectively use RL, such as reward shaping, curriculum learning, and splitting a large task into smaller chunks. Such tricks are common, if not necessary, to achieve state-of-the-art results and win RL competitions. To ease the engineering efforts, we distill descriptions of tricks from state-of-the-art results and study how well these tricks can improve a standard deep Q-learning agent. The long-term goal of this work is to enable combining proven RL methods with domain-specific tricks by providing a unified software framework and accompanying insights in multiple domains.
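
As a flavor of the kind of "trick" being distilled, the sketch below implements potential-based reward shaping as an environment wrapper, written against the classic Gym API (step returning obs, reward, done, info); the environment, potential function, and discount are assumptions for illustration, not the paper's framework.

```python
import gym

class PotentialShaping(gym.Wrapper):
    """Adds a potential-based shaping term: r' = r + gamma * phi(s') - phi(s)."""
    def __init__(self, env, phi, gamma=0.99):
        super().__init__(env)
        self.phi, self.gamma, self._prev = phi, gamma, 0.0

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        self._prev = self.phi(obs)
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        cur = self.phi(obs)
        shaped = reward + self.gamma * cur - self._prev
        self._prev = cur
        return obs, shaped, done, info

# e.g. encourage keeping the CartPole pole upright: phi = -|pole angle|
env = PotentialShaping(gym.make("CartPole-v1"), phi=lambda obs: -abs(obs[2]))
obs = env.reset()
obs, r, done, info = env.step(env.action_space.sample())
```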

【7】 Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning

Authors: Nan Rosemary Ke, Aniket Didolkar, Sarthak Mittal, Anirudh Goyal, Guillaume Lajoie, Stefan Bauer, Danilo Rezende, Yoshua Bengio, Michael Mozer, Christopher Pal
Affiliations: Max Planck Institute for Intelligent Systems
Link: https://arxiv.org/abs/2107.00848
Abstract: Inducing causal relationships from observations is a classic problem in machine learning. Most work in causality starts from the premise that the causal variables themselves are observed. However, for AI agents such as robots trying to make sense of their environment, the only observables are low-level variables like pixels in images. To generalize well, an agent must induce high-level variables, particularly those which are causal or are affected by causal variables. A central goal for AI and causality is thus the joint discovery of abstract representations and causal structure. However, we note that existing environments for studying causal induction are poorly suited for this objective because they have complicated task-specific causal graphs which are impossible to manipulate parametrically (e.g., number of nodes, sparsity, causal chain length, etc.). In this work, our goal is to facilitate research in learning representations of high-level variables as well as causal structures among them. In order to systematically probe the ability of methods to identify these variables and structures, we design a suite of benchmarking RL environments. We evaluate various representation learning algorithms from the literature and find that explicitly incorporating structure and modularity in models can help causal induction in model-based reinforcement learning.

Meta-learning (2 papers)

【1】 Memory Efficient Meta-Learning with Large Images

Authors: John Bronskill, Daniela Massiceti, Massimiliano Patacchiola, Katja Hofmann, Sebastian Nowozin, Richard E. Turner
Affiliations: University of Cambridge; Microsoft Research
Link: https://arxiv.org/abs/2107.01105
Abstract: Meta-learning approaches to few-shot classification are computationally efficient at test time, requiring just a few optimization steps or a single forward pass to learn a new task, but they remain highly memory-intensive to train. This limitation arises because a task's entire support set, which can contain up to 1000 images, must be processed before an optimization step can be taken. Harnessing the performance gains offered by large images thus requires either parallelizing the meta-learner across multiple GPUs, which may not be available, or trade-offs between task and image size when memory constraints apply. We improve on both options by proposing LITE, a general and memory efficient episodic training scheme that enables meta-training on large tasks composed of large images on a single GPU. We achieve this by observing that the gradients for a task can be decomposed into a sum of gradients over the task's training images. This enables us to perform a forward pass on a task's entire training set but realize significant memory savings by back-propagating only a random subset of these images, which we show is an unbiased approximation of the full gradient. We use LITE to train meta-learners and demonstrate new state-of-the-art accuracy on the real-world ORBIT benchmark and 3 of the 4 parts of the challenging VTAB MD benchmark relative to leading meta-learners. LITE also enables meta-learners to be competitive with transfer learning approaches but at a fraction of the test-time computational cost, thus serving as a counterpoint to the recent narrative that transfer learning is all you need for few-shot classification.
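
The unbiasedness claim is easy to check numerically: because the task gradient is a sum of per-image gradients, back-propagating a random subset scaled by N/|subset| estimates the full gradient without bias. The toy model and sizes below are assumptions; averaging many subset estimates should approach the full gradient.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(16, 4)
x, y = torch.randn(1000, 16), torch.randint(0, 4, (1000,))
loss_fn = torch.nn.CrossEntropyLoss(reduction="sum")

# Full gradient over the whole (large) support set.
loss_fn(model(x), y).backward()
full_grad = model.weight.grad.clone()
model.weight.grad = None

# LITE-style estimate: back-propagate random subsets of 32 images,
# rescaled by N / |subset|; average many estimates and compare.
estimate = torch.zeros_like(full_grad)
for _ in range(200):
    idx = torch.randperm(1000)[:32]
    (loss_fn(model(x[idx]), y[idx]) * (1000 / 32)).backward()
    estimate += model.weight.grad
    model.weight.grad = None
print(((estimate / 200) - full_grad).norm() / full_grad.norm())  # small
```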

【2】 Meta-Learning for Relative Density-Ratio Estimation

Authors: Atsutoshi Kumagai, Tomoharu Iwata, Yasuhiro Fujiwara
Affiliations: NTT Computer and Data Science Laboratories; NTT Communication Science Laboratories
Note: 17 pages
Link: https://arxiv.org/abs/2107.00801
Abstract: The ratio of two probability densities, called a density-ratio, is a vital quantity in machine learning. In particular, a relative density-ratio, which is a bounded extension of the density-ratio, has received much attention due to its stability and has been used in various applications such as outlier detection and dataset comparison. Existing methods for (relative) density-ratio estimation (DRE) require many instances from both densities. However, sufficient instances are often unavailable in practice. In this paper, we propose a meta-learning method for relative DRE, which estimates the relative density-ratio from a few instances by using knowledge in related datasets. Specifically, given two datasets that consist of a few instances, our model extracts the datasets' information by using neural networks and uses it to obtain instance embeddings appropriate for the relative DRE. We model the relative density-ratio by a linear model on the embedded space, whose global optimum solution can be obtained as a closed-form solution. The closed-form solution enables fast and effective adaptation to a few instances, and its differentiability enables us to train our model such that the expected test error for relative DRE can be explicitly minimized after adapting to a few instances. We empirically demonstrate the effectiveness of the proposed method by using three problems: relative DRE, dataset comparison, and outlier detection.
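
To make the closed-form linear model concrete, the sketch below estimates a relative density-ratio $r_\alpha(x) = p(x) / (\alpha p(x) + (1-\alpha) q(x))$ by least squares in RuLSIF style, where Gaussian kernels on the raw data stand in for the paper's meta-learned embedding; the kernel centers, $\alpha$, and regularizer are assumptions for illustration.

```python
import numpy as np

def rulsif(xp, xq, centers, alpha=0.1, lam=1e-2, sigma=1.0):
    def phi(x):   # Gaussian kernel features w.r.t. fixed centers
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    Pp, Pq = phi(xp), phi(xq)
    # Closed-form ridge solution: theta = (H + lam * I)^(-1) h.
    H = alpha * Pp.T @ Pp / len(xp) + (1 - alpha) * Pq.T @ Pq / len(xq)
    h = Pp.mean(axis=0)
    theta = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return lambda x: phi(x) @ theta   # estimated relative density-ratio

rng = np.random.default_rng(0)
xp = rng.normal(0.0, 1.0, size=(200, 1))   # samples from p
xq = rng.normal(0.5, 1.0, size=(200, 1))   # samples from q
r = rulsif(xp, xq, centers=xp[:50])
print(r(np.zeros((1, 1))))                 # relative ratio near x = 0
```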

Medicine-related (1 paper)

【1】 Cooperative Training and Latent Space Data Augmentation for Robust Medical Image Segmentation

Authors: Chen Chen, Kerstin Hammernik, Cheng Ouyang, Chen Qin, Wenjia Bai, Daniel Rueckert
Affiliations: BioMedIA Group, Department of Computing, Imperial College London, UK; Klinikum rechts der Isar, Technical University of Munich, Germany; Institute for Digital Communications, University of Edinburgh, UK; Data Science Institute, Imperial College London, UK
Note: MICCAI 2021
Link: https://arxiv.org/abs/2107.01079
Abstract: Deep learning-based segmentation methods are vulnerable to unforeseen data distribution shifts during deployment, e.g. change of image appearances or contrasts caused by different scanners, unexpected imaging artifacts etc. In this paper, we present a cooperative framework for training image segmentation models and a latent space augmentation method for generating hard examples. Both contributions improve model generalization and robustness with limited data. The cooperative training framework consists of a fast-thinking network (FTN) and a slow-thinking network (STN). The FTN learns decoupled image features and shape features for image reconstruction and segmentation tasks. The STN learns shape priors for segmentation correction and refinement. The two networks are trained in a cooperative manner. The latent space augmentation generates challenging examples for training by masking the decoupled latent space in both channel-wise and spatial-wise manners. We performed extensive experiments on public cardiac imaging datasets. Using only 10 subjects from a single site for training, we demonstrated improved cross-site segmentation performance and increased robustness against various unforeseen imaging artifacts compared to strong baseline methods. Particularly, cooperative training with latent space data augmentation yields a 15% improvement in terms of average Dice score when compared to a standard training method.
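
The channel-wise and spatial-wise masking of the latent space can be illustrated directly: randomly zero out whole channels or spatial locations of a latent feature map to manufacture hard examples. The mask rates and tensor sizes below are assumptions for illustration, not the paper's settings.

```python
import torch

def mask_latent(z, p_channel=0.3, p_spatial=0.3):
    # z: (batch, channels, h, w) latent feature map
    b, c, h, w = z.shape
    keep_c = (torch.rand(b, c, 1, 1) > p_channel).float()   # channel-wise
    keep_s = (torch.rand(b, 1, h, w) > p_spatial).float()   # spatial-wise
    return z * keep_c, z * keep_s

z = torch.randn(4, 8, 16, 16)
z_chan, z_spat = mask_latent(z)
print(z_chan.shape, z_spat.shape)
```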

Recommendation (1 paper)

【1】 Quantifying Availability and Discovery in Recommender Systems via Stochastic Reachability

Authors: Mihaela Curmei, Sarah Dean, Benjamin Recht
Affiliations: Department of EECS, University of California, Berkeley
Note: To appear, ICML 2021
Link: https://arxiv.org/abs/2107.00833
Abstract: In this work, we consider how preference models in interactive recommendation systems determine the availability of content and users' opportunities for discovery. We propose an evaluation procedure based on stochastic reachability to quantify the maximum probability of recommending a target piece of content to a user for a set of allowable strategic modifications. This framework allows us to compute an upper bound on the likelihood of recommendation with minimal assumptions about user behavior. Stochastic reachability can be used to detect biases in the availability of content and diagnose limitations in the opportunities for discovery granted to users. We show that this metric can be computed efficiently as a convex program for a variety of practical settings, and further argue that reachability is not inherently at odds with accuracy. We demonstrate evaluations of recommendation algorithms trained on large datasets of explicit and implicit ratings. Our results illustrate how preference models, selection rules, and user interventions impact reachability and how these effects can be distributed unevenly.

Clustering (1 paper)

【1】 Almost Tight Approximation Algorithms for Explainable Clustering

Authors: Hossein Esfandiari, Vahab Mirrokni, Shyam Narayanan
Note: 23 pages
Link: https://arxiv.org/abs/2107.00774
Abstract: Recently, due to an increasing interest in transparency in artificial intelligence, several methods of explainable machine learning have been developed with the simultaneous goals of accuracy and interpretability by humans. In this paper, we study a recent framework of explainable clustering first suggested by Dasgupta et al.~\cite{dasgupta2020explainable}. Specifically, we focus on the $k$-means and $k$-medians problems and provide nearly tight upper and lower bounds. First, we provide an $O(\log k \log\log k)$-approximation algorithm for explainable $k$-medians, improving on the best known algorithm of $O(k)$~\cite{dasgupta2020explainable} and nearly matching the known $\Omega(\log k)$ lower bound~\cite{dasgupta2020explainable}. In addition, in low-dimensional spaces $d \ll \log k$, we show that our algorithm also provides an $O(d \log^2 d)$-approximate solution for explainable $k$-medians. This improves over the best known bound of $O(d \log k)$ for low dimensions~\cite{laber2021explainable}, and is a constant for constant-dimensional spaces. To complement this, we show a nearly matching $\Omega(d)$ lower bound. Next, we study the $k$-means problem in this context and provide an $O(k \log k)$-approximation algorithm for explainable $k$-means, improving over the $O(k^2)$ bound of Dasgupta et al. and the $O(d k \log k)$ bound of \cite{laber2021explainable}. To complement this we provide an almost tight $\Omega(k)$ lower bound, improving over the $\Omega(\log k)$ lower bound of Dasgupta et al. All our algorithms run in near-linear time in the number of points and the dimension.

联邦学习|隐私保护|加密(2篇)

【1】 Gradient-Leakage Resilient Federated Learning 标题:梯度泄漏弹性联邦学习

作者:Wenqi Wei,Ling Liu,Yanzhao Wu,Gong Su,Arun Iyengar 机构:† Georgia Institute of Technology, School of CS, Atlanta, GA, USA, ∗ IBM T. J. Watson Research Center, Yorktown Heights, NY, USA 链接:https://arxiv.org/abs/2107.01154 摘要:联邦学习(FL)是一种新兴的分布式学习范式,默认即具备客户端隐私,因为客户端可以在其设备上保留敏感数据,只与联邦服务器共享本地训练参数更新。然而,最近的研究表明,FL中的梯度泄漏可能会损害客户端训练数据的隐私。本文提出了一种具备梯度泄漏弹性的隐私保护联邦学习方法,它基于以每个训练样本为单位的客户端差分隐私,称为Fed-CDP。它有三个原创贡献。首先,我们指出即使采用加密的客户端-服务器通信,联邦学习中仍存在三类客户端梯度泄漏威胁。我们阐明了传统的服务器协调差分隐私方法(记为Fed-SDP)何时以及为什么不足以保护训练数据的隐私。其次,我们介绍了基于实例的客户端差分隐私算法Fed-CDP,给出了Fed-CDP在$(\epsilon, \delta)$差分隐私保证下的形式化分析,以及Fed-CDP和Fed-SDP在隐私核算方面的形式化比较。第三,我们形式化分析了Fed-CDP提供差分隐私保证时的隐私-效用权衡,并提出动态衰减噪声注入策略,进一步提高Fed-CDP的准确性和弹性。我们在五个基准数据集上,从差分隐私保证和梯度泄漏弹性两方面评估并比较了Fed-CDP、Fed-CDP(decay)与Fed-SDP。结果表明,Fed-CDP方法在抵抗客户端梯度泄漏方面优于传统的Fed-SDP,同时在联邦学习中保持有竞争力的准确率。 摘要:Federated learning(FL) is an emerging distributed learning paradigm with default client privacy because clients can keep sensitive data on their devices and only share local training parameter updates with the federated server. However, recent studies reveal that gradient leakages in FL may compromise the privacy of client training data. This paper presents a gradient leakage resilient approach to privacy-preserving federated learning with per training example-based client differential privacy, coined as Fed-CDP. It makes three original contributions. First, we identify three types of client gradient leakage threats in federated learning even with encrypted client-server communications. We articulate when and why the conventional server coordinated differential privacy approach, coined as Fed-SDP, is insufficient to protect the privacy of the training data. Second, we introduce Fed-CDP, the per example-based client differential privacy algorithm, and provide a formal analysis of Fed-CDP with the $(\epsilon, \delta)$ differential privacy guarantee, and a formal comparison between Fed-CDP and Fed-SDP in terms of privacy accounting. Third, we formally analyze the privacy-utility trade-off for providing differential privacy guarantee by Fed-CDP and present a dynamic decay noise-injection policy to further improve the accuracy and resiliency of Fed-CDP. We evaluate and compare Fed-CDP and Fed-CDP(decay) with Fed-SDP in terms of differential privacy guarantee and gradient leakage resilience over five benchmark datasets. The results show that the Fed-CDP approach outperforms conventional Fed-SDP in terms of resilience to client gradient leakages while offering competitive accuracy performance in federated learning.
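
Fed-CDP的核心是在客户端以"每个训练样本"为单位实施差分隐私。下面用numpy给出这一步骤的最小示意(逐样本梯度裁剪加高斯噪声,形式上类似DP-SGD;裁剪阈值、噪声系数等均为假设,并非Fed-CDP的完整实现):

```python
import numpy as np

def dp_client_step(per_example_grads, w, clip=1.0, sigma=1.0, lr=0.1):
    """逐样本将梯度 L2 范数裁剪到 clip,求和后注入 N(0,(sigma*clip)^2) 噪声,
    再平均并做一步梯度下降。仅为 per-example 客户端差分隐私更新的示意。"""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip / (norm + 1e-12)))   # 逐样本裁剪
    noisy_sum = np.sum(clipped, axis=0) + np.random.normal(
        scale=sigma * clip, size=w.shape)                     # 高斯机制
    return w - lr * noisy_sum / len(per_example_grads)

grads = [np.random.randn(5) for _ in range(32)]  # 虚构的逐样本梯度
print(dp_client_step(grads, w=np.zeros(5)))
```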

【2】 On Bridging Generic and Personalized Federated Learning 标题:泛化与个性化联合学习的桥梁研究

作者:Hong-You Chen,Wei-Lun Chao 机构:The Ohio State University, USA 链接:https://arxiv.org/abs/2107.00778 摘要:联邦学习因其能够在不访问多个客户端数据的情况下协作训练模型而前景广阔,但当各客户端的数据分布彼此不同时,其表现会变得脆弱。这种分歧进一步导致了一个难题:"我们应该优先考虑学习模型的通用性能(供服务器将来使用)还是其个性化性能(供每个客户端使用)?"这两个看似相互竞争的目标使社区分成两派、各自专注其一,然而在本文中,我们证明了同时兼顾两者是可能的。具体地说,我们提出了一个新的联邦学习框架,通过两个预测任务显式解耦模型的双重职责。一方面,我们引入了一族对非同一类分布具有鲁棒性的损失,使各客户端能够以一致的目标训练通用预测器。另一方面,我们将个性化预测器构造为一个轻量级自适应模块,学习在通用预测器之上最小化每个客户端的经验风险。在这种双损失、双预测器的框架(我们称之为联邦鲁棒解耦,Fed-RoD)下,学习到的模型可以同时达到最先进的通用性能和个性化性能,实质上架起了这两个任务之间的桥梁。 摘要:Federated learning is promising for its ability to collaboratively train models with multiple clients without accessing their data, but vulnerable when clients' data distributions diverge from each other. This divergence further leads to a dilemma: "Should we prioritize the learned model's generic performance (for future use at the server) or its personalized performance (for each client)?" These two, seemingly competing goals have divided the community to focus on one or the other, yet in this paper we show that it is possible to approach both at the same time. Concretely, we propose a novel federated learning framework that explicitly decouples a model's dual duties with two prediction tasks. On the one hand, we introduce a family of losses that are robust to non-identical class distributions, enabling clients to train a generic predictor with a consistent objective across them. On the other hand, we formulate the personalized predictor as a lightweight adaptive module that is learned to minimize each client's empirical risk on top of the generic predictor. With this two-loss, two-predictor framework which we name Federated Robust Decoupling Fed-RoD, the learned model can simultaneously achieve state-of-the-art generic and personalized performance, essentially bridging the two tasks.
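
按摘要所述的"双损失、双预测器"结构,下面给出一个PyTorch最小示意:共享骨干 + 参与联邦聚合的通用头 + 留在本地的轻量个性化头;其中用类别加权交叉熵充当"对非同一类分布鲁棒的损失",这只是一个假设性的替代选择,并非论文中的具体损失族。

```python
import torch
import torch.nn as nn

class TwoHeadNet(nn.Module):
    """共享骨干 + 通用预测头(参与联邦聚合)+ 轻量个性化头(仅留在本地)。"""
    def __init__(self, d_in, d_hid, n_cls):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU())
        self.generic_head = nn.Linear(d_hid, n_cls)
        self.personal_head = nn.Linear(d_hid, n_cls)   # 轻量自适应模块

    def forward(self, x):
        z = self.body(x)
        g = self.generic_head(z)
        # 个性化输出叠加在通用输出之上(通用部分不回传个性化梯度)
        return g, g.detach() + self.personal_head(z)

def client_step(model, x, y, class_weights, opt):
    g_logits, p_logits = model(x)
    loss_g = nn.functional.cross_entropy(g_logits, y, weight=class_weights)  # 均衡损失(假设)
    loss_p = nn.functional.cross_entropy(p_logits, y)                        # 本地经验风险
    opt.zero_grad()
    (loss_g + loss_p).backward()
    opt.step()
    return loss_g.item(), loss_p.item()

model = TwoHeadNet(d_in=20, d_hid=64, n_cls=10)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 20), torch.randint(0, 10, (32,))
print(client_step(model, x, y, torch.ones(10), opt))
```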

推理|分析|理解|解释(2篇)

【1】 Near-optimal Algorithms for Explainable k-Medians and k-Means 标题:求解可解释k-中点和k-均值的近优算法

作者:Konstantin Makarychev,Liren Shan 机构:Northwestern University 备注:28 pages, 4 figures, ICML 2021 链接:https://arxiv.org/abs/2107.00798 摘要:我们考虑由Dasgupta、Frost、Moshkovitz和Rashtchian~(ICML 2020)提出的可解释$k$-中位数和$k$-均值问题。在这个问题中,我们的目标是找到一棵\emph{阈值决策树},将数据划分成$k$个簇,并最小化$k$-中位数或$k$-均值目标。由于阈值树的每个决策节点都基于单一特征将数据分成两组,因此得到的聚类结果易于解释。我们为该问题提出了一个新算法,它对$\ell_1$范数下的$k$-中位数是$\tilde O(\log k)$-竞争的,对$k$-均值是$\tilde O(k)$-竞争的。这改进了Dasgupta等人(2020)先前给出的$O(k)$和$O(k^2)$保证。我们还提供了一个新算法,对$\ell_2$范数下的$k$-中位数是$O(\log^{3/2} k)$-竞争的。我们的第一个算法是接近最优的:Dasgupta等人(2020)证明了$k$-中位数的$\Omega(\log k)$下界;在本工作中,我们证明了$k$-均值的$\tilde\Omega(k)$下界。我们还给出了$\ell_2$范数下$k$-中位数的$\Omega(\log k)$下界。 摘要:We consider the problem of explainable $k$-medians and $k$-means introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian~(ICML 2020). In this problem, our goal is to find a \emph{threshold decision tree} that partitions data into $k$ clusters and minimizes the $k$-medians or $k$-means objective. The obtained clustering is easy to interpret because every decision node of a threshold tree splits data based on a single feature into two groups. We propose a new algorithm for this problem which is $\tilde O(\log k)$ competitive with $k$-medians with $\ell_1$ norm and $\tilde O(k)$ competitive with $k$-means. This is an improvement over the previous guarantees of $O(k)$ and $O(k^2)$ by Dasgupta et al (2020). We also provide a new algorithm which is $O(\log^{3/2} k)$ competitive for $k$-medians with $\ell_2$ norm. Our first algorithm is near-optimal: Dasgupta et al (2020) showed a lower bound of $\Omega(\log k)$ for $k$-medians; in this work, we prove a lower bound of $\tilde\Omega(k)$ for $k$-means. We also provide a lower bound of $\Omega(\log k)$ for $k$-medians with $\ell_2$ norm.

【2】 The Causal Neural Connection: Expressiveness, Learnability, and Inference 标题:因果神经联系:表现性、可学习性和推理性

作者:Kevin Xia,Kai-Zhan Lee,Yoshua Bengio,Elias Bareinboim 机构:Columbia University, MILA, Université de Montréal 备注:10 pages main body (53 total pages with references and appendix), 5 figures in main body (20 total figures including appendix) 链接:https://arxiv.org/abs/2107.00793 摘要:任何因果推理的核心要素之一是一个称为结构因果模型(SCM)的对象,它代表了被调查系统随机变化的机制和外部来源的集合(Pearl,2000)。许多神经网络的一个重要特性是普适逼近性:将任意函数逼近到任意精度的能力。考虑到这一特性,人们可能会猜测,一组神经网络能够通过训练SCM生成的数据来学习任何SCM。在本文中,我们通过解开表达性和可学性的概念来证明这不是事实。具体地说,我们证明了因果层次定理(Thm. 1, Bareinboim et al., 2020),它描述了从数据中可以学到的东西的局限性,对于神经模型仍然适用。例如,一个任意复杂和富于表达力的神经网络,仅凭观察数据无法预测干预措施的效果。基于这一结果,我们引入了一种特殊类型的SCM,称为神经因果模型(NCM),并形式化了一种新的归纳偏差来编码执行因果推理所必需的结构约束。在这类新模型的基础上,我们致力于解决文献中发现的两个典型任务,即因果识别和估计。利用神经工具箱,我们开发了一个既充分又必要的算法,用以确定因果效应是否可以从数据中学习(即因果可识别性);然后,只要可识别性成立,它就估计效果(因果估计)。仿真结果证实了所提出的方法。 摘要:One of the central elements of any causal inference is an object called structural causal model (SCM), which represents a collection of mechanisms and exogenous sources of random variation of the system under investigation (Pearl, 2000). An important property of many kinds of neural networks is universal approximability: the ability to approximate any function to arbitrary precision. Given this property, one may be tempted to surmise that a collection of neural nets is capable of learning any SCM by training on data generated by that SCM. In this paper, we show this is not the case by disentangling the notions of expressivity and learnability. Specifically, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020), which describes the limits of what can be learned from data, still holds for neural models. For instance, an arbitrarily complex and expressive neural net is unable to predict the effects of interventions given observational data alone. Given this result, we introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences. Building on this new class of models, we focus on solving two canonical tasks found in the literature known as causal identification and estimation. Leveraging the neural toolbox, we develop an algorithm that is both sufficient and necessary to determine whether a causal effect can be learned from data (i.e., causal identifiability); it then estimates the effect whenever identifiability holds (causal estimation). Simulations corroborate the proposed approach.

检测相关(1篇)

【1】 Misinformation Detection on YouTube Using Video Captions 标题:利用视频字幕检测YouTube上的错误信息

作者:Raj Jagtap,Abhinav Kumar,Rahul Goel,Shakshi Sharma,Rajesh Sharma,Clint P. George 机构:School of Mathematics and Computer Science, Indian Institute of Technology Goa, India, Institute of Computer Science, University of Tartu, Tartu, Estonia 链接:https://arxiv.org/abs/2107.00941 摘要:数百万人使用YouTube、Facebook、Twitter和其他大众媒体等平台。由于这些平台的可访问性,它们经常被用来建立叙事、进行宣传和传播错误信息。本文提出了一种利用最新的NLP技术从视频字幕(subtitles)中提取特征的方法。为了评估我们的方法,我们利用一个可公开访问且带标注的数据集来判断视频是否属于错误信息。探索视频字幕背后的动机源于我们对视频元数据的分析:浏览量、喜欢数、不喜欢数和评论数等属性作用有限,因为仅凭这些信息很难区分视频。利用字幕数据集,所提出的模型可以将视频分为三类(错误信息、揭穿错误信息和中性),F1评分为0.85至0.90。为了强调错误信息类的相关性,我们将分类问题重新表述为两类分类——错误信息与其他(揭穿错误信息和中性)。在我们的实验中,所提出的模型可以对视频进行分类,F1评分为0.92至0.95,AUC ROC为0.78至0.90。 摘要:Millions of people use platforms such as YouTube, Facebook, Twitter, and other mass media. Due to the accessibility of these platforms, they are often used to establish a narrative, conduct propaganda, and disseminate misinformation. This work proposes an approach that uses state-of-the-art NLP techniques to extract features from video captions (subtitles). To evaluate our approach, we utilize a publicly accessible and labeled dataset for classifying videos as misinformation or not. The motivation behind exploring video captions stems from our analysis of videos metadata. Attributes such as the number of views, likes, dislikes, and comments are ineffective as videos are hard to differentiate using this information. Using caption dataset, the proposed models can classify videos among three classes (Misinformation, Debunking Misinformation, and Neutral) with 0.85 to 0.90 F1-score. To emphasize the relevance of the misinformation class, we re-formulate our classification problem as a two-class classification - Misinformation vs. others (Debunking Misinformation and Neutral). In our experiments, the proposed models can classify videos with 0.92 to 0.95 F1-score and 0.78 to 0.90 AUC ROC.
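
作为流程示意,下面用sklearn搭一个从字幕文本到三分类的最小基线(TF-IDF + 逻辑回归;论文使用的是更先进的NLP特征,此处字幕样例与数据均为虚构):

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# 虚构的字幕样例与标签:错误信息 / 揭穿错误信息 / 中性
captions = ["the vaccine secretly alters your dna",
            "5g towers spread the virus, share this",
            "fact check: the dna claim is false",
            "we debunk the 5g virus myth with evidence",
            "today we unbox and review a new phone",
            "top ten goals of the football season"]
labels = ["misinformation", "misinformation",
          "debunking", "debunking", "neutral", "neutral"]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # 词与二元组的 TF-IDF 特征
    LogisticRegression(max_iter=1000),
)
clf.fit(captions, labels)
print(clf.predict(["scientists debunk the vaccine dna myth"]))
```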

分类|识别(5篇)

【1】 Language Identification of Hindi-English tweets using code-mixed BERT 标题:基于混码BERT的印英推文语言识别

作者:Mohd Zeeshan Ansari,M M Sufyan Beg,Tanvir Ahmad,Mohd Jazib Khan,Ghazali Wasim 机构:Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India, Aligarh Muslim University, Aligarh, India 链接:https://arxiv.org/abs/2107.01202 摘要:社交媒体文本的语言识别是近年来一个有趣的研究问题。在非英语地区,社交媒体消息主要以语码混合的形式出现。通过预训练上下文嵌入获得的先验知识已在一系列下游任务上取得了最先进的结果。最近,BERT等模型表明,利用大量未标注数据预训练的语言模型对学习通用语言表示更为有益。本文利用迁移学习并微调BERT模型,在Twitter语言识别任务上开展了大量实验。这项工作使用印地语-英语-乌尔都语语码混合文本的数据集进行语言预训练,并使用印地语-英语语码混合数据进行后续的词级语言分类。结果表明,在语码混合数据上预训练的表示比其单语对应物产生更好的结果。 摘要:Language identification of social media text has been an interesting problem of study in recent years. Social media messages are predominantly in code mixed in non-English speaking states. Prior knowledge by pre-training contextual embeddings have shown state of the art results for a range of downstream tasks. Recently, models such as BERT have shown that using a large amount of unlabeled data, the pretrained language models are even more beneficial for learning common language representations. Extensive experiments exploiting transfer learning and fine-tuning BERT models to identify language on Twitter are presented in this paper. The work utilizes a data collection of Hindi-English-Urdu codemixed text for language pre-training and Hindi-English codemixed for subsequent word-level language classification. The results show that the representations pre-trained over codemixed data produce better results by their monolingual counterpart.
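
作为对词级语言识别任务的直观说明,下面给出一个基于字符n-gram的简化基线(并非论文的BERT方法;训练词表为虚构):

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# 虚构的词级训练数据:罗马化印地语 / 英语
words = ["kyaa", "hai", "nahi", "acha", "good", "movie", "the", "watch"]
langs = ["hi",   "hi",  "hi",   "hi",   "en",   "en",    "en",  "en"]

clf = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(1, 3)),  # 字符 1-3 gram 特征
    LogisticRegression(max_iter=1000),
)
clf.fit(words, langs)

sentence = "yeh movie acha hai".split()
print(list(zip(sentence, clf.predict(sentence))))  # 逐词预测语言标签
```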

【2】 NTIRE 2021 Multi-modal Aerial View Object Classification Challenge 标题:NTIRE 2021多模态航拍目标分类挑战赛

作者:Jerrick Liu,Nathan Inkawhich,Oliver Nina,Radu Timofte,Sahil Jain,Bob Lee,Yuru Duan,Wei Wei,Lei Zhang,Songzheng Xu,Yuxuan Sun,Jiaqi Tang,Xueli Geng,Mengru Ma,Gongzhe Li,Xueli Geng,Huanqia Cai,Chengxue Cai,Sol Cummings,Casian Miron,Alexandru Pasarica,Cheng-Yen Yang,Hung-Min Hsu,Jiarui Cai,Jie Mei,Chia-Ying Yeh,Jenq-Neng Hwang,Michael Xin,Zhongkai Shangguan,Zihe Zheng,Xu Yifei,Lehan Yang,Kele Xu,Min Feng 备注:None 链接:https://arxiv.org/abs/2107.01189 摘要:在本文中,我们介绍了与CVPR的NTIRE 2021研讨会联合举办的首届多模态航拍视角目标分类(MAVOC)挑战赛。该挑战赛由两个不同的赛道组成,分别使用光电(EO)和合成孔径雷达(SAR)图像。EO和SAR传感器各有优缺点。本次比赛的目的是分析如何以互补的方式使用这两类感知信息。我们讨论了本次比赛提交的顶级方法,并在我们的盲测试集上评估了它们的结果。我们的挑战赛结果显示,相对于每条赛道当前的基线,准确率显著提高了15%以上。 摘要:In this paper, we introduce the first Challenge on Multi-modal Aerial View Object Classification (MAVOC) in conjunction with the NTIRE 2021 workshop at CVPR. This challenge is composed of two different tracks using EO and SAR imagery. Both EO and SAR sensors possess different advantages and drawbacks. The purpose of this competition is to analyze how to use both sets of sensory information in complementary ways. We discuss the top methods submitted for this competition and evaluate their results on our blind test set. Our challenge results show significant improvement of more than 15% accuracy from our current baselines for each track of the competition.

【3】 Normalizing Flow based Hidden Markov Models for Classification of Speech Phones with Explainability 标题:基于归一化流的隐马尔可夫模型用于可解释的语音音素分类

作者:Anubhab Ghosh,Antoine Honoré,Dong Liu,Gustav Eje Henter,Saikat Chatterjee 机构:Digital Futures, and School of Electrical Engg. and Computer Sc., KTH Royal Institute of Technology, Sweden 备注:12 pages, 4 figures 链接:https://arxiv.org/abs/2107.00730 摘要:为了追求可解释性,我们开发了序列数据的生成模型。所提出的模型为语音音素分类提供了最先进的分类结果和鲁棒性。我们结合了现代神经网络(归一化流)和传统的生成模型(隐马尔可夫模型,HMM)。基于归一化流的混合模型(NMM)被用来建模HMM中给定隐状态下的条件概率分布。模型参数的学习通过将久经考验的贝叶斯学习方法与现代神经网络学习方法明智地结合来实现,我们主要将期望最大化(EM)和小批量梯度下降相结合。所提出的生成模型可以计算数据的似然,因此直接适用于最大似然(ML)分类方法。由于HMM结构的灵活性,我们可以使用不同的归一化流模型,从而得到在数据建模能力上各不相同的多种HMM。这种多样性为不同模型的决策融合提供了方便。对于一个包含39个音素(类别)、基于TIMIT数据集的标准语音音素分类设置,我们证明了使用称为梅尔频率倒谱系数(MFCC)的标准特征、所提出的生成模型以及决策融合,仅通过生成式训练就可以达到$86.6\%$的准确率。这一结果接近最新水平,例如,PyTorch-Kaldi工具箱[1]的准确率为$86.2\%$,使用轻量门控循环单元[2]的准确率为$85.1\%$。在本文中,我们不使用任何判别式学习方法及相关的复杂特征。 摘要:In pursuit of explainability, we develop generative models for sequential data. The proposed models provide state-of-the-art classification results and robust performance for speech phone classification. We combine modern neural networks (normalizing flows) and traditional generative models (hidden Markov models - HMMs). Normalizing flow-based mixture models (NMMs) are used to model the conditional probability distribution given the hidden state in the HMMs. Model parameters are learned through judicious combinations of time-tested Bayesian learning methods and contemporary neural network learning methods. We mainly combine expectation-maximization (EM) and mini-batch gradient descent. The proposed generative models can compute likelihood of a data and hence directly suitable for maximum-likelihood (ML) classification approach. Due to structural flexibility of HMMs, we can use different normalizing flow models. This leads to different types of HMMs providing diversity in data modeling capacity. The diversity provides an opportunity for easy decision fusion from different models. For a standard speech phone classification setup involving 39 phones (classes) and the TIMIT dataset, we show that the use of standard features called mel-frequency-cepstral-coefficients (MFCCs), the proposed generative models, and the decision fusion together can achieve $86.6\%$ accuracy by generative training only. This result is close to state-of-the-art results, for examples, $86.2\%$ accuracy of PyTorch-Kaldi toolkit [1], and $85.1\%$ accuracy using light gated recurrent units [2]. We do not use any discriminative learning approach and related sophisticated features in this article.

【4】 Neural Task Success Classifiers for Robotic Manipulation from Few Real Demonstrations 标题:基于少数真实示例的机器人操作神经任务成功分类器

作者:Abdalkarim Mohtasib,Amir Ghalamzan E.,Nicola Bellotto,Heriberto Cuayáhuitl 机构:School of Computer Science, University of Lincoln, Lincoln, UK, Lincoln Institute for Agri-Food Technology 备注:8 pages 链接:https://arxiv.org/abs/2107.00722 摘要:在不同的工作空间中,机器人从少量演示中学习新操作任务的需求日益增长。评估动作质量的分类器模型可以预测任务能否成功完成,智能体可将其用于动作选择。本文提出了一种新的分类器,它只需少量演示即可学习对任务完成情况进行分类。我们对不同的神经分类器进行了综合比较,例如基于全连接、全卷积、序列到序列和域自适应的分类器。我们还提出了一个包含五个机器人操作任务的新数据集,该数据集已公开。我们使用我们的数据集和MIME数据集比较了新分类器和现有模型的性能。结果表明,域自适应和基于时序的特征可以提高成功预测的效果。我们的新模型,即带有域自适应和时序特征的全卷积神经网络,在两个数据集的各任务上分别取得了97.3%和95.5%的平均分类准确率,而没有域自适应和时序特征的最新分类器仅分别达到82.4%和90.3%。 摘要:Robots learning a new manipulation task from a small amount of demonstrations are increasingly demanded in different workspaces. A classifier model assessing the quality of actions can predict the successful completion of a task, which can be used by intelligent agents for action-selection. This paper presents a novel classifier that learns to classify task completion only from a few demonstrations. We carry out a comprehensive comparison of different neural classifiers, e.g. fully connected-based, fully convolutional-based, sequence2sequence-based, and domain adaptation-based classification. We also present a new dataset including five robot manipulation tasks, which is publicly available. We compared the performances of our novel classifier and the existing models using our dataset and the MIME dataset. The results suggest domain adaptation and timing-based features improve success prediction. Our novel model, i.e. fully convolutional neural network with domain adaptation and timing features, achieves an average classification accuracy of 97.3% and 95.5% across tasks in both datasets whereas state-of-the-art classifiers without domain adaptation and timing-features only achieve 82.4% and 90.3%, respectively.

【5】 Long-Short Ensemble Network for Bipolar Manic-Euthymic State Recognition Based on Wrist-worn Sensors 标题:基于手腕佩戴传感器的长短集成网络双相躁狂-愉悦状态识别

作者:Ulysse Côté-Allard,Petter Jakobsen,Andrea Stautland,Tine Nordgreen,Ole Bernt Fasmer,Ketil Joachim Oedegaard,Jim Torresen 机构:Department of Informatics, University of Oslo, Oslo, Norway, NORMENT, Division of Psychiatry, Haukeland University Hospital, Bergen, Norway, Department of Clinical Medicine, University of Bergen, Norway 备注:Submitted for peer-review. 11 pages, 2 figures and 1 table 链接:https://arxiv.org/abs/2107.00710 摘要:双相情感障碍的躁狂发作可导致不加批判的行为和妄想性精神病,往往对患者及其周围环境造成破坏性后果。躁狂发作的早期发现和干预对于防止病情升级、入院和过早死亡至关重要。然而,双相情感障碍患者可能没有意识到自己正在经历躁狂发作,欣快和生产力提高等症状也会使患者不去寻求帮助。本工作提出基于躁狂期和康复后(心境正常期,euthymia)从腕戴设备获取的体动记录(actigraphy)和皮肤电活动,进行用户无关的自动情绪状态检测。本文提出了一种新的基于深度学习的集成方法,利用长(20小时)和短(5分钟)两种时间间隔来区分情绪状态。在47例双相情感障碍患者上测试时,所提出的分类方案在心境正常/躁狂状态识别中达到了91.59%的平均准确率。 摘要:Manic episodes of bipolar disorder can lead to uncritical behaviour and delusional psychosis, often with destructive consequences for those affected and their surroundings. Early detection and intervention of a manic episode are crucial to prevent escalation, hospital admission and premature death. However, people with bipolar disorder may not recognize that they are experiencing a manic episode and symptoms such as euphoria and increased productivity can also deter affected individuals from seeking help. This work proposes to perform user-independent, automatic mood-state detection based on actigraphy and electrodermal activity acquired from a wrist-worn device during mania and after recovery (euthymia). This paper proposes a new deep learning-based ensemble method leveraging long (20h) and short (5 minutes) time-intervals to discriminate between the mood-states. When tested on 47 bipolar patients, the proposed classification scheme achieves an average accuracy of 91.59% in euthymic/manic mood-state recognition.

表征(2篇)

【1】 DUKweb: Diachronic word representations from the UK Web Archive corpus 标题:DUKweb:来自英国网络档案馆语料库的历时词汇表征

作者:Adam Tsakalidis,Pierpaolo Basile,Marya Bazzi,Mihai Cucuringu,Barbara McGillivray 机构:The Alan Turing Institute, London, United Kingdom, Queen Mary, University of London, London, UK, University of Bari, Bari, Italy, University of Warwick, Coventry, United Kingdom, University of Oxford, Oxford 备注:24 pages, 6 figures 链接:https://arxiv.org/abs/2107.01076 摘要:词汇语义变化(检测词语含义和用法的变化)对于社会文化研究和自然语言处理应用都是一项重要任务。历时词嵌入(保留词义的、对时间敏感的词向量表示)已成为这项任务的标准资源。然而,考虑到生成这类嵌入所需的大量计算资源,目前向科学界提供历时词嵌入的资源非常少。在本文中,我们介绍了DUKweb,一套为当代英语历时分析设计的大规模资源。DUKweb基于JISC UK Web域数据集(1996-2013)创建,该数据集是一个非常大的档案库,收集了互联网档案馆中托管于以".uk"结尾的域名上的资源。DUKweb为JISC UK Web域数据集中的每一年提供一系列词共现矩阵和两种类型的词嵌入。我们通过一个词义变化检测的案例研究展示了DUKweb的重用潜力及其质量标准。 摘要:Lexical semantic change (detecting shifts in the meaning and usage of words) is an important task for social and cultural studies as well as for Natural Language Processing applications. Diachronic word embeddings (time-sensitive vector representations of words that preserve their meaning) have become the standard resource for this task. However, given the significant computational resources needed for their generation, very few resources exist that make diachronic word embeddings available to the scientific community. In this paper we present DUKweb, a set of large-scale resources designed for the diachronic analysis of contemporary English. DUKweb was created from the JISC UK Web Domain Dataset (1996-2013), a very large archive which collects resources from the Internet Archive that were hosted on domains ending in `.uk'. DUKweb consists of a series of word co-occurrence matrices and two types of word embeddings for each year in the JISC UK Web Domain dataset. We show the reuse potential of DUKweb and its quality standards via a case study on word meaning change detection.
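
历时词嵌入的典型用法是:先用正交Procrustes把两个年份的嵌入空间对齐,再以余弦距离度量词义变化。下面是一个numpy/scipy的最小示意(用随机向量代替真实的DUKweb嵌入):

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)
vocab = ["web", "tweet", "cloud", "mouse"]
emb_1996 = rng.normal(size=(4, 50))                       # 假设的 1996 年嵌入
emb_2013 = emb_1996 @ rng.normal(size=(50, 50)) * 0.1 + rng.normal(size=(4, 50))

R, _ = orthogonal_procrustes(emb_1996, emb_2013)          # 求正交变换对齐两个空间
aligned = emb_1996 @ R

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

for w, a, b in zip(vocab, aligned, emb_2013):
    # 对齐后的余弦距离越大,说明该词在两个年份间语义变化越大
    print(w, "semantic shift =", round(1 - cosine(a, b), 3))
```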

【2】 Multimodal Representation for Neural Code Search 标题:神经编码搜索的多模态表示法

作者:Jian Gu,Zimin Chen,Martin Monperrus 机构:University of Zurich, KTH Royal Institute of Technology 备注:12 pages, 9 figures, accepted by ICSME 2021 链接:https://arxiv.org/abs/2107.00992 摘要:语义代码搜索是为给定的自然语言查询查找语义相关的代码段。在现有的方法中,代码和查询之间的语义相似性被量化为它们在共享向量空间中表示的距离。为了改进向量空间,本文在AST的简化形式上引入了树序列化方法,并建立了代码数据的多模态表示。我们使用一个大规模的多语言语料库CodeSearchNet进行了广泛的实验。结果表明,我们的树序列表示和多模态学习模型都提高了神经代码搜索的性能。最后,我们定义了两个面向代码数据语义和句法信息完整性的直观量化度量。 摘要:Semantic code search is about finding semantically relevant code snippets for a given natural language query. In the state-of-the-art approaches, the semantic similarity between code and query is quantified as the distance of their representation in the shared vector space. In this paper, to improve the vector space, we introduce tree-serialization methods on a simplified form of AST and build the multimodal representation for the code data. We conduct extensive experiments using a single corpus that is large-scale and multi-language: CodeSearchNet. Our results show that both our tree-serialized representations and multimodal learning model improve the performance of neural code search. Last, we define two intuitive quantification metrics oriented to the completeness of semantic and syntactic information of the code data.

优化|敛散性(2篇)

【1】 Momentum Accelerates the Convergence of Stochastic AUPRC Maximization 标题:动量加速随机AUPRC最大值的收敛

作者:Guanghui Wang,Ming Yang,Lijun Zhang,Tianbao Yang 机构:National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China, Hefei University of Technology, Hefei, China, Department of Computer Science, the University of Iowa, Iowa City, IA, USA 链接:https://arxiv.org/abs/2107.01173 摘要:在本文中,我们研究了精确率-召回率曲线下面积(AUPRC)的随机优化,这一指标被广泛用于应对不平衡分类任务。虽然已有一些方法被提出来最大化AUPRC,但具有收敛保证的AUPRC随机优化仍是一个未被充分发展的领域。最近的一项工作[42]提出了一种有前景的方法,基于最大化平均精度的代理损失来优化AUPRC,并证明了寻找非凸目标的$\epsilon$-平稳解的$O(1/\epsilon^5)$复杂度。在本文中,我们进一步改进了AUPRC的随机优化:(i)发展了新的随机动量方法,其迭代复杂度为$O(1/\epsilon^4)$,用于寻找$\epsilon$-平稳解;(ii)设计了一类具有同样$O(1/\epsilon^4)$迭代复杂度的随机自适应方法,在实际应用中收敛更快。为此,我们提出了两个对提高收敛性至关重要的创新技术:(i)以随机坐标方式更新用于跟踪个体排序得分的有偏估计;(ii)在随机梯度估计器之上使用动量更新来跟踪目标的梯度。在不同数据集上的大量实验证明了所提算法的有效性。此外,本文提出的随机动量和自适应算法也适用于一类两层随机依赖复合优化问题,具有独立的研究价值。 摘要:In this paper, we study stochastic optimization of areas under precision-recall curves (AUPRC), which is widely used for combating imbalanced classification tasks. Although a few methods have been proposed for maximizing AUPRC, stochastic optimization of AUPRC with convergence guarantee remains an undeveloped territory. A recent work [42] has proposed a promising approach towards AUPRC based on maximizing a surrogate loss for the average precision, and proved an $O(1/\epsilon^5)$ complexity for finding an $\epsilon$-stationary solution of the non-convex objective. In this paper, we further improve the stochastic optimization of AUPRC by (i) developing novel stochastic momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution; and (ii) designing a novel family of stochastic adaptive methods with the same iteration complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice. To this end, we propose two innovative techniques that are critical for improving the convergence: (i) the biased estimators for tracking individual ranking scores are updated in a randomized coordinate-wise manner; and (ii) a momentum update is used on top of the stochastic gradient estimator for tracking the gradient of the objective. Extensive experiments on various data sets demonstrate the effectiveness of the proposed algorithms. Of independent interest, the proposed stochastic momentum and adaptive algorithms are also applicable to a class of two-level stochastic dependent compositional optimization problems.

【2】 Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization 标题:基于对比Fenchel-Legendre优化的紧互信息估计

作者:Qing Guo,Junya Chen,Dong Wang,Yuewei Yang,Xinwei Deng,Lawrence Carin,Fan Li,Chenyang Tao 机构:Duke University ,Virginia Tech ,KAUST 链接:https://arxiv.org/abs/2107.01131 摘要:InfoNCE及其变体的成功应用使对比变分互信息(MI)估计器在机器学习中的应用得到了推广。这些估计器虽然具有很好的稳定性,但在很大程度上依赖于代价高昂的大批量训练,并且为了减少方差而牺牲了界紧性。为了克服这些限制,我们从非正规化统计建模和凸优化的角度重新研究了流行的变分MI界的数学。我们的研究不仅产生了一个新的统一的理论框架,包含了流行的变分MI界,而且产生了一个新颖的、简单的、强大的对比MI估计器FLO。理论上,我们证明了FLO估计是紧的,并且在随机梯度下降下是收敛的。经验上,我们的FLO估计克服了前人的局限性,学习效率更高。FLO的实用性通过一组广泛的基准进行了验证,这也揭示了实际MI估计中的权衡。 摘要:Successful applications of InfoNCE and its variants have popularized the use of contrastive variational mutual information (MI) estimators in machine learning. While featuring superior stability, these estimators crucially depend on costly large-batch training, and they sacrifice bound tightness for variance reduction. To overcome these limitations, we revisit the mathematics of popular variational MI bounds from the lens of unnormalized statistical modeling and convex optimization. Our investigation not only yields a new unified theoretical framework encompassing popular variational MI bounds but also leads to a novel, simple, and powerful contrastive MI estimator named as FLO. Theoretically, we show that the FLO estimator is tight, and it provably converges under stochastic gradient descent. Empirically, our FLO estimator overcomes the limitations of its predecessors and learns more efficiently. The utility of FLO is verified using an extensive set of benchmarks, which also reveals the trade-offs in practical MI estimation.

预测|估计(5篇)

【1】 Road Roughness Estimation Using Machine Learning 标题:基于机器学习的道路不平度估计

作者:Milena Bajic,Shahrzad M. Pour,Asmus Skar,Matteo Pettinari,Eyal Levenberg,Tommy Sonne Alstrøm 链接:https://arxiv.org/abs/2107.01199 摘要:路面不平度是关系到基础设施的一项非常重要的道路状况,因为不平度同时影响乘客的安全性和乘坐舒适性。道路会随时间逐渐劣化,这意味着必须持续监测路面不平度,以便准确了解道路基础设施的状况。本文提出了一种基于车辆垂直加速度和车速进行路面不平度预测的机器学习流水线。我们比较了常见的监督机器学习模型,如线性回归、朴素贝叶斯、k近邻、随机森林、支持向量机和多层感知器神经网络。这些模型在一组经优化选择、于时域和统计域计算的特征上训练。结果表明,利用安装在普通乘用车上、成本可接受的车载传感器所记录的数据,机器学习方法可以准确地预测路面不平度。我们的研究结果表明,该技术通过实现对大范围路网的连续监测,能够很好地满足未来路面状况监测的需要。 摘要:Road roughness is a very important road condition for the infrastructure, as the roughness affects both the safety and ride comfort of passengers. The roads deteriorate over time which means the road roughness must be continuously monitored in order to have an accurate understand of the condition of the road infrastructure. In this paper, we propose a machine learning pipeline for road roughness prediction using the vertical acceleration of the car and the car speed. We compared well-known supervised machine learning models such as linear regression, naive Bayes, k-nearest neighbor, random forest, support vector machine, and the multi-layer perceptron neural network. The models are trained on an optimally selected set of features computed in the temporal and statistical domain. The results demonstrate that machine learning methods can accurately predict road roughness, using the recordings of the cost approachable in-vehicle sensors installed in conventional passenger cars. Our findings demonstrate that the technology is well suited to meet future pavement condition monitoring, by enabling continuous monitoring of a wide road network.
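
按摘要描述的流程(时域/统计特征 + 监督回归),下面给出一个最小示意;特征选取、数据与不平度指标均为虚构假设:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def window_features(acc, speed):
    """对一段垂直加速度窗口提取简单的统计/时域特征(示意)。"""
    return [acc.mean(), acc.std(), np.abs(np.diff(acc)).mean(),
            np.percentile(acc, 90), speed.mean()]

rng = np.random.default_rng(1)
X, y = [], []
for _ in range(500):                        # 虚构 500 个行驶窗口
    roughness = rng.uniform(0, 5)           # 目标:路面不平度指数(虚构)
    speed = rng.uniform(20, 90, size=200)
    acc = rng.normal(0, 0.05 + 0.1 * roughness, size=200)  # 不平度越大振动越强
    X.append(window_features(acc, speed))
    y.append(roughness)

Xtr, Xte, ytr, yte = train_test_split(np.array(X), np.array(y), random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(Xtr, ytr)
print("R^2 =", round(model.score(Xte, yte), 3))
```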

【2】 Backward-Compatible Prediction Updates: A Probabilistic Approach 标题:向后兼容预测更新:一种概率方法

作者:Frederik Träuble,Julius von Kügelgen,Matthäus Kleindessner,Francesco Locatello,Bernhard Schölkopf,Peter Gehler 机构:Amazon Tübingen, Germany, Max Planck Institute for Intelligent Systems, Tübingen, Germany, Department of Engineering, University of Cambridge, United Kingdom 链接:https://arxiv.org/abs/2107.01057 摘要:当机器学习系统进入实际应用时,准确率只是若干要求之一。在本文中,我们考察一个互补的视角,其出发点是预训练的、定期改进的最新模型日益易得。虽然新的改进模型发展速度很快,但下游任务变化更慢或保持不变。假设我们有一个大型无标注数据集,并希望在其上保持准确的预测。每当一个新的、可能更好的ML模型可用时,我们都会遇到两个问题:(i)给定有限的预算,哪些数据点应该使用新模型重新评估?(ii)如果新的预测与当前预测不同,我们是否应该更新?问题(i)关乎计算成本,这对非常大的数据集和模型很重要。问题(ii)关乎保持预测的一致性,这可能与下游应用高度相关;我们的要求是避免负翻转,即把正确的预测改成错误的预测。本文将预测更新问题形式化,并针对上述问题提出了一种高效的概率方法。在标准分类基准数据集上的大量实验表明,在向后兼容预测更新的关键度量上,我们的方法优于其他策略。 摘要:When machine learning systems meet real world applications, accuracy is only one of several requirements. In this paper, we assay a complementary perspective originating from the increasing availability of pre-trained and regularly improving state-of-the-art models. While new improved models develop at a fast pace, downstream tasks vary more slowly or stay constant. Assume that we have a large unlabelled data set for which we want to maintain accurate predictions. Whenever a new and presumably better ML models becomes available, we encounter two problems: (i) given a limited budget, which data points should be re-evaluated using the new model?; and (ii) if the new predictions differ from the current ones, should we update? Problem (i) is about compute cost, which matters for very large data sets and models. Problem (ii) is about maintaining consistency of the predictions, which can be highly relevant for downstream applications; our demand is to avoid negative flips, i.e., changing correct to incorrect predictions. In this paper, we formalize the Prediction Update Problem and present an efficient probabilistic approach as answer to the above questions. In extensive experiments on standard classification benchmark data sets, we show that our method outperforms alternative strategies along key metrics for backward-compatible prediction updates.
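
摘要中的两个问题可以用如下简化策略示意:(i)按新旧模型预测分布的分歧大小,在预算内挑选需重评的点;(ii)仅当新模型明显更自信时才改写预测,以降低负翻转风险。打分方式与阈值均为假设,并非论文的概率方法本身:

```python
import numpy as np

def select_and_update(old_probs, new_probs, old_preds, budget, margin=0.1):
    """old_probs/new_probs: (n, C) 两个模型的类别概率;budget: 可重评的点数。
    返回更新后的预测。仅为向后兼容更新策略的示意。"""
    # (i) 重评优先级:两个模型预测分布的 L1 差异越大越值得重评
    disagreement = np.abs(old_probs - new_probs).sum(axis=1)
    to_check = np.argsort(-disagreement)[:budget]
    preds = old_preds.copy()
    for i in to_check:
        new_c = new_probs[i].argmax()
        # (ii) 仅当新模型明显更自信时才改写,避免负翻转
        if new_probs[i, new_c] > old_probs[i, old_preds[i]] + margin:
            preds[i] = new_c
    return preds

rng = np.random.default_rng(0)
old_p = rng.dirichlet(np.ones(5), size=100)   # 虚构的新旧模型输出
new_p = rng.dirichlet(np.ones(5), size=100)
print(select_and_update(old_p, new_p, old_p.argmax(1), budget=20)[:10])
```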

【3】 Online Metro Origin-Destination Prediction via Heterogeneous Information Aggregation 标题:基于异构信息聚合的在线地铁始发地预测

作者:Lingbo Liu,Yuying Zhu,Guanbin Li,Ziyi Wu,Lei Bai,Mingzhi Mao,Liang Lin 备注:Under Review 链接:https://arxiv.org/abs/2107.00946 摘要:地铁起讫点预测是智能交通管理中一项关键而又富有挑战性的任务,其目的是准确预测两种特定类型的跨站客流,即起点-终点(OD)客流和终点-起点(DO)客流。然而,在在线地铁系统中无法立即获得先前时间间隔的完整OD矩阵,而传统方法只能利用有限的信息分别预测未来的OD和DO客流。在这项工作中,我们提出了一种新的神经网络模块,称为异构信息聚合机(HIAM),它充分利用历史数据的异构信息(如不完整的OD矩阵、未完成订单向量和DO矩阵)来联合学习OD和DO客流的演化模式。具体来说,OD建模分支显式地估计未完成订单的潜在目的地,以补充不完整OD矩阵的信息;而DO建模分支以DO矩阵作为输入,捕获DO客流的时空分布。此外,还引入了双信息Transformer在OD特征和DO特征之间传递互信息,以建模OD-DO之间的因果关系和相关性。基于所提出的HIAM,我们开发了一个统一的Seq2Seq网络来同时预测未来的OD和DO客流。在两个大规模基准上进行的大量实验证明了我们的方法在在线地铁起讫点预测上的有效性。 摘要:Metro origin-destination prediction is a crucial yet challenging task for intelligent transportation management, which aims to accurately forecast two specific types of cross-station ridership, i.e., Origin-Destination (OD) one and Destination-Origin (DO) one. However, complete OD matrices of previous time intervals can not be obtained immediately in online metro systems, and conventional methods only used limited information to forecast the future OD and DO ridership separately. In this work, we proposed a novel neural network module termed Heterogeneous Information Aggregation Machine (HIAM), which fully exploits heterogeneous information of historical data (e.g., incomplete OD matrices, unfinished order vectors, and DO matrices) to jointly learn the evolutionary patterns of OD and DO ridership. Specifically, an OD modeling branch estimates the potential destinations of unfinished orders explicitly to complement the information of incomplete OD matrices, while a DO modeling branch takes DO matrices as input to capture the spatial-temporal distribution of DO ridership. Moreover, a Dual Information Transformer is introduced to propagate the mutual information among OD features and DO features for modeling the OD-DO causality and correlation. Based on the proposed HIAM, we develop a unified Seq2Seq network to forecast the future OD and DO ridership simultaneously. Extensive experiments conducted on two large-scale benchmarks demonstrate the effectiveness of our method for online metro origin-destination prediction.

【4】 MegazordNet: combining statistical and machine learning standpoints for time series forecasting 标题:MegazordNet:结合统计和机器学习观点进行时间序列预测

作者:Angelo Garangau Menezes,Saulo Martiello Mastelini 机构: Instituto de Ciˆencias Matem´aticas e de Computac¸˜ao – Universidade de S˜ao Paulo, Av. Trabalhador S˜ao Carlense, – ,-, S˜ao Carlos – SP, Brasil. 链接:https://arxiv.org/abs/2107.01017 摘要:由于金融时间序列的混沌特性,预测金融时间序列是一项困难的任务。统计方法在预测市场走向、股票单一价格等具体问题上取得了良好的效果;然而,随着近年来深度学习和大数据技术的发展,金融时间序列预测的新方法应运而生。此外,最近的文献表明,与单一解相比,采用统计和机器学习相结合的方法可以提高预测的准确性。考虑到上述方面,在这项工作中,我们提出了MegazordNet,这是一个探索金融序列内统计特征的框架,结合了一个用于时间序列预测的结构化深度学习模型。我们使用不同的指标评估了我们预测标准普尔500指数股票收盘价的方法,我们能够击败单一的统计和机器学习方法。 摘要:Forecasting financial time series is considered to be a difficult task due to the chaotic feature of the series. Statistical approaches have shown solid results in some specific problems such as predicting market direction and single-price of stocks; however, with the recent advances in deep learning and big data techniques, new promising options have arises to tackle financial time series forecasting. Moreover, recent literature has shown that employing a combination of statistics and machine learning may improve accuracy in the forecasts in comparison to single solutions. Taking into consideration the mentioned aspects, in this work, we proposed the MegazordNet, a framework that explores statistical features within a financial series combined with a structured deep learning model for time series forecasting. We evaluated our approach predicting the closing price of stocks in the S&P 500 using different metrics, and we were able to beat single statistical and machine learning methods.

【5】 Inter-Beat Interval Estimation with Tiramisu Model: A Novel Approach with Reduced Error 标题:基于Tiramisu模型的拍间间隔估计:一种降低误差的新方法

作者:Asiful Arefeen,Ali Akbari,Seyed Iman Mirzadeh,Roozbeh Jafari,Behrooz A. Shirazi,Hassan Ghasemzadeh 机构:EECS, Washington State University, Texas A&M University, BME, CSE and ECE 备注:16 pages, 14 figures 链接:https://arxiv.org/abs/2107.00693 摘要:搏动间期(IBI)测量可用于估计心率变异性(HRV),进而提供潜在心血管疾病的早期指示。然而,从含噪信号中提取IBI具有挑战性,因为噪声会使信号形态发生畸变。人在剧烈运动时的心电图(ECG)会被称为运动伪影的噪声严重破坏,从中提取的IBI并不准确。作为远程健康监护和可穿戴系统开发的一部分,对ECG信号去噪并正确地从中估计IBI,已成为信号处理研究者的一个新兴课题。除传统方法外,深度学习技术近年来已成功地应用于信号去噪,使诊断过程变得更加简单,达到了以前无法企及的精度水平。我们提出了一种深度学习方法,利用tiramisu自编码器模型来抑制运动伪影噪声,使ECG信号的R峰即使在高强度运动下仍然突出。去噪后,IBI的估计更加准确,从而加快了诊断任务。结果表明,我们的方法可以从信噪比低至-30dB的含噪ECG信号中估计IBI,估计IBI的平均均方根误差(RMSE)为13毫秒。在这种噪声水平下,我们的误差率保持在8%以下,并优于其他最先进的技术。 摘要:Inter-beat interval (IBI) measurement enables estimation of heart-rate variability (HRV) which, in turns, can provide early indication of potential cardiovascular diseases. However, extracting IBIs from noisy signals is challenging since the morphology of the signal is distorted in the presence of the noise. Electrocardiogram (ECG) of a person in heavy motion is highly corrupted with noise, known as motion-artifact, and IBI extracted from it is inaccurate. As a part of remote health monitoring and wearable system development, denoising ECG signals and estimating IBIs correctly from them have become an emerging topic among signal-processing researchers. Apart from conventional methods, deep-learning techniques have been successfully used in signal denoising recently, and diagnosis process has become easier, leading to accuracy levels that were previously unachievable. We propose a deep-learning approach leveraging tiramisu autoencoder model to suppress motion-artifact noise and make the R-peaks of the ECG signal prominent even in the presence of high-intensity motion. After denoising, IBIs are estimated more accurately expediting diagnosis tasks. Results illustrate that our method enables IBI estimation from noisy ECG signals with SNR up to -30dB with average root mean square error (RMSE) of 13 milliseconds for estimated IBIs. At this noise level, our error percentage remains below 8% and outperforms other state of the art techniques.

其他神经网络|深度学习|模型|建模(16篇)

【1】 CHISEL: Compression-Aware High-Accuracy Embedded Indoor Localization with Deep Learning 标题:基于深度学习的压缩感知高精度嵌入式室内定位

作者:Liping Wang,Saideep Tiku,Sudeep Pasricha 链接:https://arxiv.org/abs/2107.01192 摘要:GPS技术彻底改变了我们在户外定位和导航的方式。然而,GPS信号在建筑物内接收效果差,不适合室内定位。基于WiFi指纹的室内定位是满足这一需求的最有前途的方法之一。不幸的是,该领域的大多数工作都未能解决在资源受限的嵌入式设备上部署所面临的挑战。在这项工作中,我们提出了一个感知压缩且高精度的深度学习框架,称为CHISEL,它在保持嵌入式设备上定位鲁棒性的同时,性能优于该领域最知名的工作。 摘要:GPS technology has revolutionized the way we localize and navigate outdoors. However, the poor reception of GPS signals in buildings makes it unsuitable for indoor localization. WiFi fingerprinting-based indoor localization is one of the most promising ways to meet this demand. Unfortunately, most work in the domain fails to resolve challenges associated with deployability on resource-limited embedded devices. In this work, we propose a compression-aware and high-accuracy deep learning framework called CHISEL that outperforms the best-known works in the area while maintaining localization robustness on embedded devices.

【2】 Artificial Neural Network for Cybersecurity: A Comprehensive Review 标题:人工神经网络在网络安全中的应用综述

作者:Prajoy Podder,Subrato Bharati,M. Rubaiyat Hossain Mondal,Pinto Kumar Paul,Utku Kose 机构: Institute of Information and Communication Technology, Bangladesh University of Engineering and Technology, Dhaka-, Ranada Prasad Shaha University, Narayanganj, Bangladesh, Suleyman Demirel University, Isparta, Turkey 备注:None 链接:https://arxiv.org/abs/2107.01185 摘要:网络安全是一个非常新兴的领域,可以保护系统、网络和数据免受数字攻击。随着互联网规模的扩大和网络攻击的演变,开发新型的网络安全工具显得尤为重要,尤其是物联网网络。本文系统地回顾了深度学习方法在网络安全中的应用。本文简要介绍了网络安全中常用的DL方法,包括深层信念网络、生成对抗网络、递归神经网络等。接下来,我们将说明浅层学习和动态学习的区别。此外,还讨论了物联网和其他网络中当前流行的网络攻击,以及DL方法管理这些攻击的有效性。此外,本文还介绍了DL技术、网络安全应用和数据源的研究。接下来,讨论了DL系统用于恶意软件检测和分类、入侵检测和其他常见网络攻击(包括识别文件类型、垃圾邮件和网络流量)的可行性。我们的研究表明,受限Boltzmann机(RBM)对定制数据集的分类精度高达99.72%,而长短时记忆(LSTM)对KDD-Cup-99数据集的分类精度高达99.80%。最后,本文讨论了网络安全对于物联网驱动的医疗系统的重要性。 摘要:Cybersecurity is a very emerging field that protects systems, networks, and data from digital attacks. With the increase in the scale of the Internet and the evolution of cyber attacks, developing novel cybersecurity tools has become important, particularly for Internet of things (IoT) networks. This paper provides a systematic review of the application of deep learning (DL) approaches for cybersecurity. This paper provides a short description of DL methods which is used in cybersecurity, including deep belief networks, generative adversarial networks, recurrent neural networks, and others. Next, we illustrate the differences between shallow learning and DL. Moreover, a discussion is provided on the currently prevailing cyber-attacks in IoT and other networks, and the effectiveness of DL methods to manage these attacks. Besides, this paper describes studies that highlight the DL technique, cybersecurity applications, and the source of datasets. Next, a discussion is provided on the feasibility of DL systems for malware detection and classification, intrusion detection, and other frequent cyber-attacks, including identifying file type, spam, and network traffic. Our review indicates that high classification accuracy of 99.72% is obtained by restricted Boltzmann machine (RBM) when applied to a custom dataset, while long short-term memory (LSTM) achieves an accuracy of 99.80% for KDD Cup 99 dataset. Finally, this article discusses the importance of cybersecurity for reliable and practicable IoT-driven healthcare systems.

【3】 Ensemble of Loss Functions to Improve Generalizability of Deep Metric Learning methods 标题:改进深度度量学习方法泛化能力的损失函数集成

作者:Davood Zabihzadeh 机构:Computer Department, Hakim Sabzevari University, Sabzevar, IRAN 备注:27 pages, 12 figures 链接:https://arxiv.org/abs/2107.01130 摘要:深度度量学习(DML)从输入数据中学习一种非线性的语义嵌入,将相似的数据对聚集在一起,同时使不相似的数据彼此远离。为此,过去十年中提出了许多不同的方法,并在各种应用中取得了很好的结果。DML算法的成功在很大程度上取决于它的损失函数。然而,没有一种损失函数是完美的,每种损失函数都只处理最优相似性嵌入的某些方面。此外,DML在测试阶段对未见类别的可泛化性是一个重要问题,而现有的损失函数并没有考虑。为了应对这些挑战,我们提出了一种新方法,组合构建在共享深度特征提取器之上的不同损失。所提出的损失集成迫使深度模型提取与所有损失一致的特征。由于所选的损失是多样的,并且每个损失都强调最优语义嵌入的不同方面,我们有效的组合方法比任何单一损失都有相当大的改进,并且可以很好地泛化到未见类别。这里对损失函数的选择没有限制,我们的方法可以与任何一组现有损失一起工作。此外,它们可以在端到端的范式中优化每个损失函数及其权重,而不需要调整任何超参数。在传统的零样本学习(ZSL)设置下,我们在机器视觉领域的一些流行数据集上评估了我们的方法。结果非常令人鼓舞,表明我们的方法在所有数据集上都大幅优于所有基线损失。 摘要:Deep Metric Learning (DML) learns a non-linear semantic embedding from input data that brings similar pairs together while keeps dissimilar data away from each other. To this end, many different methods are proposed in the last decade with promising results in various applications. The success of a DML algorithm greatly depends on its loss function. However, no loss function is perfect, and it deals only with some aspects of an optimal similarity embedding. Besides, the generalizability of the DML on unseen categories during the test stage is an important matter that is not considered by existing loss functions. To address these challenges, we propose novel approaches to combine different losses built on top of a shared deep feature extractor. The proposed ensemble of losses enforces the deep model to extract features that are consistent with all losses. Since the selected losses are diverse and each emphasizes different aspects of an optimal semantic embedding, our effective combining methods yield a considerable improvement over any individual loss and generalize well on unseen categories. Here, there is no limitation in choosing loss functions, and our methods can work with any set of existing ones. Besides, they can optimize each loss function as well as its weight in an end-to-end paradigm with no need to adjust any hyper-parameter. We evaluate our methods on some popular datasets from the machine vision domain in conventional Zero-Shot-Learning (ZSL) settings. The results are very encouraging and show that our methods outperform all baseline losses by a large margin in all datasets.
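
摘要提到可在端到端训练中同时优化各损失及其权重。一种常见做法是不确定性加权(Kendall等人风格),此处仅作为假设性示意,未必与论文的组合方法一致:

```python
import torch
import torch.nn as nn

class WeightedLossEnsemble(nn.Module):
    """total = sum_k exp(-s_k) * L_k + s_k,其中 s_k 为可学习的对数方差。
    权重随训练自动调节,无需手工调参(不确定性加权,仅作示意)。"""
    def __init__(self, n_losses):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(n_losses))

    def forward(self, losses):
        total = 0.0
        for s, L in zip(self.log_vars, losses):
            total = total + torch.exp(-s) * L + s
        return total

combiner = WeightedLossEnsemble(3)
losses = [torch.tensor(1.2, requires_grad=True),
          torch.tensor(0.7, requires_grad=True),
          torch.tensor(2.0, requires_grad=True)]   # 假设的三个度量学习损失值
total = combiner(losses)
total.backward()
print(total.item(), combiner.log_vars.grad)
```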

【4】 Neural Network Layer Algebra: A Framework to Measure Capacity and Compression in Deep Learning 标题:神经网络层代数:深度学习中容量和压缩的度量框架

作者:Alberto Badias,Ashis Banerjee 机构:Department of Mechanical Engineering, University of Washington 链接:https://arxiv.org/abs/2107.01081 摘要:我们提出了一个新的框架来衡量(深度)神经网络的内在特性。虽然我们专注于卷积网络,但我们的框架可以推广到任何网络架构。特别地,我们评估了两个网络属性,即容量(与表达能力相关)和压缩,这两个属性都仅依赖于网络结构,并且与训练和测试数据无关。为此,我们提出了两个度量:第一个称为层复杂度(layer complexity),它捕获任何网络层的体系结构复杂度;第二个称为层固有功率(layer intrinsic power),它编码了数据在网络中的压缩方式。这些度量基于层代数(layer algebra)的概念,该概念也在本文中提出。这一概念基于这样一种思想,即全局属性依赖于网络拓扑结构,并且任何神经网络的叶节点都可以使用局部传递函数来近似,从而允许简单地计算全局度量。我们还使用我们的度量来比较最先进的体系结构的属性,并使用这些属性来分析基准数据集上的分类精度。 摘要:We present a new framework to measure the intrinsic properties of (deep) neural networks. While we focus on convolutional networks, our framework can be extrapolated to any network architecture. In particular, we evaluate two network properties, namely, capacity (related to expressivity) and compression, both of which depend only on the network structure and are independent of the training and test data. To this end, we propose two metrics: the first one, called layer complexity, captures the architectural complexity of any network layer; and, the second one, called layer intrinsic power, encodes how data is compressed along the network. The metrics are based on the concept of layer algebra, which is also introduced in this paper. This concept is based on the idea that the global properties depend on the network topology, and the leaf nodes of any neural network can be approximated using local transfer functions, thereby, allowing a simple computation of the global metrics. We also compare the properties of the state-of-the art architectures using our metrics and use the properties to analyze the classification accuracy on benchmark datasets.

【5】 Gamers Private Network Performance Forecasting. From Raw Data to the Data Warehouse with Machine Learning and Neural Nets 标题:玩家专用网络性能预测。基于机器学习和神经网络的原始数据到数据仓库

作者:Albert Wong,Chun Yin Chiu,Gaétan Hains,Jack Humphrey,Hans Fuhrmann,Youry Khmelevsky,Chris Mazur 机构:Mathematics and Statistics, Langara College, Vancouver BC, Canada, Data Analytics, LACL, Université Paris-Est, Créteil, France, Computer Science, Okanagan College, Kelowna BC, Canada, UBCO, WTFast 备注:8 pages, 12 figures 链接:https://arxiv.org/abs/2107.00998 摘要:游戏玩家专用网络(GPN)是一种客户机/服务器技术,它可以保证在线视频游戏的连接比标准的互联网连接更可靠、更低的延迟。GPN技术的用户受益于稳定和高质量的在线游戏体验,这些游戏在世界各地托管和玩。在对WTFast收集的大量原始网络数据进行转换后,我们将清理后的数据构建成一个专用的数据仓库,并利用机器学习、神经网络技术和商业智能工具完成了广泛的分析。这些分析展示了预测和量化网络变化的能力,并展示了当用户连接到在线游戏会话时使用GPN所获得的好处。 摘要:Gamers Private Network (GPN) is a client/server technology that guarantees a connection for online video games that is more reliable and lower latency than a standard internet connection. Users of the GPN technology benefit from a stable and high-quality gaming experience for online games, which are hosted and played across the world. After transforming a massive volume of raw networking data collected by WTFast, we have structured the cleaned data into a special-purpose data warehouse and completed the extensive analysis using machine learning and neural nets technologies, and business intelligence tools. These analyses demonstrate the ability to predict and quantify changes in the network and demonstrate the benefits gained from the use of a GPN for users when connected to an online game session.

【6】 Inverse-Dirichlet Weighting Enables Reliable Training of Physics Informed Neural Networks 标题:逆-Dirichlet加权实现物理信息神经网络的可靠训练

作者:Suryanarayana Maddu,Dominik Sturm,Christian L. Müller,Ivo F. Sbalzarini 机构:Technische Universität Dresden, Dresden, Germany, Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany, Center for Systems Biology Dresden, Dresden, Germany 链接:https://arxiv.org/abs/2107.00940 摘要:我们刻画并修正了在物理信息神经网络(PINN)等深度神经网络的训练过程中,由于多尺度动力学和尺度不平衡而可能出现的失效模式。PINN是流行的机器学习模板,允许将物理方程模型与数据无缝集成。其训练相当于求解一个关于数据保真度和方程保真度目标加权和的优化问题。目标之间的冲突可能源于尺度不平衡、数据的异方差性、物理方程的刚性,或源于顺序训练过程中的灾难性干扰。我们解释了由此产生的训练病理,并提出了一种简单而有效的逆Dirichlet加权策略来缓解这一问题。我们与神经网络的Sobolev训练进行了比较,后者提供了解析意义下$\boldsymbol{\epsilon}$-最优训练的基线。我们在多种应用中证明了逆Dirichlet加权的有效性,包括一个多尺度活性湍流模型,在该模型上我们展示了相比传统PINN训练在精度和收敛性方面数量级的改进。对于使用顺序训练的逆建模,我们发现逆Dirichlet加权可以保护PINN免于灾难性遗忘。 摘要:We characterize and remedy a failure mode that may arise from multi-scale dynamics with scale imbalances during training of deep neural networks, such as Physics Informed Neural Networks (PINNs). PINNs are popular machine-learning templates that allow for seamless integration of physical equation models with data. Their training amounts to solving an optimization problem over a weighted sum of data-fidelity and equation-fidelity objectives. Conflicts between objectives can arise from scale imbalances, heteroscedasticity in the data, stiffness of the physical equation, or from catastrophic interference during sequential training. We explain the training pathology arising from this and propose a simple yet effective inverse-Dirichlet weighting strategy to alleviate the issue. We compare with Sobolev training of neural networks, providing the baseline of analytically $\boldsymbol{\epsilon}$-optimal training. We demonstrate the effectiveness of inverse-Dirichlet weighting in various applications, including a multi-scale model of active turbulence, where we show orders of magnitude improvement in accuracy and convergence over conventional PINN training. For inverse modeling using sequential training, we find that inverse-Dirichlet weighting protects a PINN against catastrophic forgetting.
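
逆Dirichlet加权的要点是按各损失项的梯度方差来平衡多目标训练。下面是一个PyTorch示意,把每项权重取为"最大梯度方差 / 该项梯度方差"(具体公式以论文为准,此处为简化假设):

```python
import torch

def inverse_dirichlet_weights(losses, params):
    """对每个损失项计算关于参数的梯度方差,
    令 w_k = max_j Var_j / Var_k,梯度方差小的项被放大。仅为示意。"""
    variances = []
    for L in losses:
        grads = torch.autograd.grad(L, params, retain_graph=True)
        flat = torch.cat([g.reshape(-1) for g in grads])
        variances.append(flat.var())
    variances = torch.stack(variances)
    return (variances.max() / (variances + 1e-12)).detach()

# 玩具设置:一个参数向量、两个尺度悬殊的损失(模拟数据项与方程项)
theta = torch.randn(10, requires_grad=True)
data_loss = ((theta - 1.0) ** 2).mean()
pde_loss = 1e-4 * ((theta[:-1] - theta[1:]) ** 2).mean()   # 刻意设为小尺度
w = inverse_dirichlet_weights([data_loss, pde_loss], [theta])
total = w[0] * data_loss + w[1] * pde_loss
total.backward()
print("weights:", w)
```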

【7】 Theory of Deep Convolutional Neural Networks III: Approximating Radial Functions 标题:深卷积神经网络理论Ⅲ:逼近径向函数

作者:Tong Mao,Zhongjie Shi,Ding-Xuan Zhou 机构:School of Data Science, City University of Hong Kong, Kowloon, Hong Kong 链接:https://arxiv.org/abs/2107.00896 摘要:我们考虑一族深度神经网络,由两组卷积层、一个下采样算子和一个全连接层组成。网络结构取决于两个结构参数,它们决定卷积层的数量和全连接层的宽度。当被逼近的函数具有由特征多项式$Q$和一元函数$f$复合而成的形式$f\circ Q$时,我们建立了具有显式逼近率的逼近理论。特别地,我们证明了当来自$\mathbb{R}^d$的数据维数$d$较大时,这种网络在以$Q(x)=|x|^2$逼近径向函数方面优于全连接浅层网络。这是对深度卷积神经网络在逼近特殊结构函数方面优越性的首个严格证明。在此基础上,我们在回归框架下,对回归函数为$f\circ Q$形式的情形,给出了使用这种深度网络进行经验风险最小化的泛化分析。我们的网络结构不使用任何复合信息或函数$Q$和$f$,可以通过调整结构参数自动提取特征并利用回归函数的复合性质。我们的分析给出了一个误差界,它随网络深度先减小到最小值然后增大,从理论上验证了许多实际应用中观察到的关于网络深度的折衷现象。 摘要:We consider a family of deep neural networks consisting of two groups of convolutional layers, a downsampling operator, and a fully connected layer. The network structure depends on two structural parameters which determine the numbers of convolutional layers and the width of the fully connected layer. We establish an approximation theory with explicit approximation rates when the approximated function takes a composite form $f\circ Q$ with a feature polynomial $Q$ and a univariate function $f$. In particular, we prove that such a network can outperform fully connected shallow networks in approximating radial functions with $Q(x) = |x|^2$, when the dimension $d$ of data from $\mathbb{R}^d$ is large. This gives the first rigorous proof for the superiority of deep convolutional neural networks in approximating functions with special structures. Then we carry out generalization analysis for empirical risk minimization with such a deep network in a regression framework with the regression function of the form $f\circ Q$. Our network structure which does not use any composite information or the functions $Q$ and $f$ can automatically extract features and make use of the composite nature of the regression function via tuning the structural parameters. Our analysis provides an error bound which decreases with the network depth to a minimum and then increases, verifying theoretically a trade-off phenomenon observed for network depths in many practical applications.

【8】 Reconsidering Dependency Networks from an Information Geometry Perspective 标题:从信息几何角度重新审视依存网络

作者:Kazuya Takabatake,Shotaro Akaho 机构:Human Informatics and Interaction Research Institute, National Institute of Advanced Industrial Science and Technology, Umezono, Tsukuba, Ibaraki, Japan 备注:28 pages, 7 figures 链接:https://arxiv.org/abs/2107.00871 摘要:依赖网络(Heckerman et al.,2000)是包含大量变量的系统的潜在概率图形模型。与贝叶斯网络一样,依赖网络的结构由一个有向图表示,每个节点都有一个条件概率表。学习和推理在单个节点上局部实现;因此,即使有大量的变量,计算仍然是容易处理的。然而,依赖网络的学习分布是马尔可夫链的平稳分布,称为伪吉布斯抽样,没有封闭形式的表达式。这种技术上的缺陷阻碍了依赖网络的发展。在本文中,我们考虑每个节点的一个流形。然后,我们可以将伪Gibbs抽样解释为这些流形上的迭代m-投影。这种解释为伪Gibbs抽样的平稳分布在分布空间中的位置提供了一个理论界。此外,这种解释涉及结构和参数学习算法作为优化问题。此外,我们还对依赖网络和贝叶斯网络进行了实验比较。结果表明,依赖网络和贝叶斯网络在学习分布的准确性方面具有大致相同的性能。结果还表明,依赖网络的学习速度比贝叶斯网络快得多。 摘要:Dependency networks (Heckerman et al., 2000) are potential probabilistic graphical models for systems comprising a large number of variables. Like Bayesian networks, the structure of a dependency network is represented by a directed graph, and each node has a conditional probability table. Learning and inference are realized locally on individual nodes; therefore, computation remains tractable even with a large number of variables. However, the dependency network's learned distribution is the stationary distribution of a Markov chain called pseudo-Gibbs sampling and has no closed-form expressions. This technical disadvantage has impeded the development of dependency networks. In this paper, we consider a certain manifold for each node. Then, we can interpret pseudo-Gibbs sampling as iterative m-projections onto these manifolds. This interpretation provides a theoretical bound for the location where the stationary distribution of pseudo-Gibbs sampling exists in distribution space. Furthermore, this interpretation involves structure and parameter learning algorithms as optimization problems. In addition, we compare dependency and Bayesian networks experimentally. The results demonstrate that the dependency network and the Bayesian network have roughly the same performance in terms of the accuracy of their learned distributions. The results also show that the dependency network can learn much faster than the Bayesian network.
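
依赖网络的学习分布由伪Gibbs采样的平稳分布定义:反复按每个节点的条件模型对该变量重采样。下面用numpy给出一个二值变量、条件模型取逻辑斯蒂形式的最小示意(参数随机,仅为演示):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
W = rng.normal(size=(d, d))
np.fill_diagonal(W, 0.0)       # 节点 i 的条件模型只依赖其余变量
b = rng.normal(size=d)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pseudo_gibbs(n_steps=10000, burn_in=1000):
    """按固定顺序扫描各节点,从 P(x_i | x_{-i}) 重采样;
    其平稳分布即依赖网络定义的分布(示意)。"""
    x = rng.integers(0, 2, size=d).astype(float)
    samples = []
    for t in range(n_steps):
        for i in range(d):
            p = sigmoid(W[i] @ x + b[i])      # 节点 i 的条件概率
            x[i] = float(rng.random() < p)
        if t >= burn_in:
            samples.append(x.copy())
    return np.array(samples)

S = pseudo_gibbs()
print("经验边缘分布:", S.mean(axis=0))
```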

【9】 Learning Primal Heuristics for Mixed Integer Programs 标题:混合整数规划的学习原始启发式算法

作者:Yunzhuang Shen,Yuan Sun,Andrew Eberhard,Xiaodong Li 机构:Computing Technologies, RMIT University, Melbourne, Australia, School of Mathematics, Monash University, School of Science 备注:Accepted by IJCNN'21 链接:https://arxiv.org/abs/2107.00866 摘要:利用机器学习技术,提出了一种新的混合整数规划原始启发式算法。混合整数规划是求解组合优化问题的一种通用技术。在一个求解器中,原始启发式算法在寻找好的可行解方面起着关键的作用,使人们能够从分支定界算法(B&B)的开始就缩小对偶差距,通过积极地修剪B&B树大大提高其性能。在本文中,我们探讨是否有效的原始启发式可以自动学习通过机器学习。提出了一种将优化问题表示为图的新方法,并在已知最优解的问题实例上训练图卷积网络。这反过来又可以预测决策变量的值在最优解的一个看不见的问题实例类似的类型。然后利用B&B方法的一种新结构,即概率分支和引导深度优先搜索(PB-DFS)方法来预测变量解,旨在快速找到(接近)最优解。实验结果表明,与现有的原始启发式算法相比,这种新的启发式算法能够在求解过程的早期找到更好的原始解。 摘要:This paper proposes a novel primal heuristic for Mixed Integer Programs, by employing machine learning techniques. Mixed Integer Programming is a general technique for formulating combinatorial optimization problems. Inside a solver, primal heuristics play a critical role in finding good feasible solutions that enable one to tighten the duality gap from the outset of the Branch-and-Bound algorithm (B&B), greatly improving its performance by pruning the B&B tree aggressively. In this paper, we investigate whether effective primal heuristics can be automatically learned via machine learning. We propose a new method to represent an optimization problem as a graph, and train a Graph Convolutional Network on solved problem instances with known optimal solutions. This in turn can predict the values of decision variables in the optimal solution for an unseen problem instance of a similar type. The prediction of variable solutions is then leveraged by a novel configuration of the B&B method, Probabilistic Branching with guided Depth-first Search (PB-DFS) approach, aiming to find (near-)optimal solutions quickly. The experimental results show that this new heuristic can find better primal solutions at a much earlier stage of the solving process, compared to other state-of-the-art primal heuristics.

【10】 Deep learning-based statistical noise reduction for multidimensional spectral data 标题:基于深度学习的多维谱数据统计降噪

作者:Younsik Kim,Dongjin Oh,Soonsang Huh,Dongjoon Song,Sunbeom Jeong,Junyoung Kwon,Minsoo Kim,Donghan Kim,Hanyoung Ryu,Jongkeun Jung,Wonshik Kyung,Byungmin Sohn,Suyoung Lee,Jounghoon Hyun,Yeonghoon Lee,Yeongkwan Kim,Changyoung Kim 机构:Center for Correlated Electron Systems, Institute for Basic Science, Seoul, Korea, Department of Physics and Astronomy, Seoul National University, Seoul, Korea, Department of Electrical and Computer Engineering, Seoul National University, Seoul 备注:None 链接:https://arxiv.org/abs/2107.00844 摘要:在光谱实验中,由于要覆盖的相空间体积较大,在多维相空间中进行数据采集可能需要较长的采集时间。在这种情况下,可用于数据采集的有限时间对于采集多维光谱数据的实验来说是一个严重的限制。本文以角分辨光电子能谱(ARPES)为例,介绍了一种基于深度学习的智能去噪方法。利用ARPES数据和随机生成的训练数据集,成功地训练了无过拟合的去噪神经网络。去噪神经网络能在保留数据内在信息的同时去除数据中的噪声。我们表明,去噪神经网络允许我们对采集时间缩短两个数量级的数据进行类似水平的二阶导数和线型分析。我们方法的重要性在于,它适用于任何容易受统计噪声影响的多维光谱数据。 摘要:In spectroscopic experiments, data acquisition in multi-dimensional phase space may require long acquisition time, owing to the large phase space volume to be covered. In such case, the limited time available for data acquisition can be a serious constraint for experiments in which multidimensional spectral data are acquired. Here, taking angle-resolved photoemission spectroscopy (ARPES) as an example, we demonstrate a denoising method that utilizes deep learning as an intelligent way to overcome the constraint. With readily available ARPES data and random generation of training data set, we successfully trained the denoising neural network without overfitting. The denoising neural network can remove the noise in the data while preserving its intrinsic information. We show that the denoising neural network allows us to perform similar level of second-derivative and line shape analysis on data taken with two orders of magnitude less acquisition time. The importance of our method lies in its applicability to any multidimensional spectral data that are susceptible to statistical noise.

【11】 An Experience Report on Machine Learning Reproducibility: Guidance for Practitioners and TensorFlow Model Garden Contributors 标题:机器学习可再现性的经验报告:给从业者和TensorFlow模型花园贡献者的指导

作者:Vishnu Banna,Akhil Chinnakotla,Zhengxin Yan,Ani Vegesana,Naveen Vivek,Kruthi Krishnappa,Wenxin Jiang,Yung-Hsiang Lu,George K. Thiruvathukal,James C. Davis 机构:Department of Electrical & Computer Engineering, Purdue University, Department of Computer Science, Loyola University Chicago 备注:Technical Report 链接:https://arxiv.org/abs/2107.00821 摘要:机器学习技术正成为科学和工程进步的基本工具，其应用环境多种多样，从天文学到垃圾邮件过滤。然而，正确应用这些技术需要细致的工程设计。人们对其技术潜力给予了大量关注；而将基于研究的机器学习技术转化为实际应用所需的软件工程过程受到的关注相对较少。技术公司通过TensorFlow和PyTorch等机器学习框架为工程界提供了支持，但是如何在这些框架中设计复杂的机器学习模型的细节仍然是隐藏的。为了在工程界推广最佳实践，学术机构和谷歌合作成立了机器学习模型特别兴趣小组(SIGMODELS)，其目标是在TensorFlow Model Garden(TFMG)等社区场所开发著名机器学习模型的示范性实现。本报告的目的是定义一个流程，以达到适合纳入TFMG的质量水平来复现最先进的机器学习模型。我们定义了工程流程，并详细阐述了从论文分析到模型发布的每一步。我们报告了与26名学生研究人员组成的团队一起实现YOLO模型系列的经验，分享了我们开发的工具，并描述了一路走来学到的经验教训。 摘要:Machine learning techniques are becoming a fundamental tool for scientific and engineering progress. These techniques are applied in contexts as diverse as astronomy and spam filtering. However, correctly applying these techniques requires careful engineering. Much attention has been paid to the technical potential; relatively little attention has been paid to the software engineering process required to bring research-based machine learning techniques into practical utility. Technology companies have supported the engineering community through machine learning frameworks such as TensorFlow and PyTorch, but the details of how to engineer complex machine learning models in these frameworks have remained hidden. To promote best practices within the engineering community, academic institutions and Google have partnered to launch a Special Interest Group on Machine Learning Models (SIGMODELS) whose goal is to develop exemplary implementations of prominent machine learning models in community locations such as the TensorFlow Model Garden (TFMG). The purpose of this report is to define a process for reproducing a state-of-the-art machine learning model at a level of quality suitable for inclusion in the TFMG. We define the engineering process and elaborate on each step, from paper analysis to model release. We report on our experiences implementing the YOLO model family with a team of 26 student researchers, share the tools we developed, and describe the lessons we learned along the way.

【12】 Cell-average based neural network method for hyperbolic and parabolic partial differential equations 标题:求解双曲型和抛物型偏微分方程的单元平均神经网络方法

作者:Changxin Qiu,Jue Yan 机构:Department of Mathematics, Iowa State University, Ames, IA , USA 链接:https://arxiv.org/abs/2107.00813 摘要:在有限体积法的启发下，提出了一种基于单元平均的神经网络方法。该方法基于偏微分方程的积分形式或弱形式。用一个简单的前馈网络学习解的单元平均在相邻两个时间步之间的演化。通过离线监督训练得到最优的网络参数集，它唯一地确定了一种类似有限体积的神经网络方法。一旦训练好，该网络方法就以有限体积格式的方式实现，因此依赖于网格。与传统数值方法不同的是，该方法不受显式格式CFL条件的限制，可以用任意时间步长演化解。对于热方程，观察到一阶收敛，误差与空间网格大小有关，但与时间方向的网格大小无关。基于单元平均的神经网络方法能够清晰地演化接触间断，几乎不引入数值扩散。对于非线性双曲守恒律，激波和稀疏波也能被很好地捕捉。 摘要:Motivated by finite volume scheme, a cell-average based neural network method is proposed. The method is based on the integral or weak formulation of partial differential equations. A simple feed forward network is forced to learn the solution average evolution between two neighboring time steps. Offline supervised training is carried out to obtain the optimal network parameter set, which uniquely identifies one finite volume like neural network method. Once well trained, the network method is implemented as a finite volume scheme, thus is mesh dependent. Different to traditional numerical methods, our method can be relieved from the explicit scheme CFL restriction and can adapt to any time step size for solution evolution. For Heat equation, first order of convergence is observed and the errors are related to the spatial mesh size but are observed independent of the mesh size in time. The cell-average based neural network method can sharply evolve contact discontinuity with almost zero numerical diffusion introduced. Shock and rarefaction waves are well captured for nonlinear hyperbolic conservation laws.
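下面给出一个按此思路搭建的最小示意（训练数据的生成方式为我们自己的假设）：利用热方程 u_t = u_xx 的解析解生成相邻时间步的单元平均对，训练一个小型前馈网络学习三单元模板到下一时刻中心单元平均的映射。

import numpy as np, torch, torch.nn as nn

# 最小示意（我们自己的假设）：用热方程解析解 u(x,t)=exp(-k^2 t) sin(kx)
# 生成单元平均（此处以中点值近似），学习 3 单元模板 -> 下一时刻中心单元平均。
nx, dx, dt = 64, 2 * np.pi / 64, 0.01
x = (np.arange(nx) + 0.5) * dx                       # 单元中心

def averages(k, t):
    return np.exp(-k**2 * t) * np.sin(k * x)

X, Y = [], []
for k in (1, 2, 3):
    for t in np.linspace(0, 0.5, 50):
        u0, u1 = averages(k, t), averages(k, t + dt)
        for i in range(1, nx - 1):
            X.append(u0[i-1:i+2]); Y.append([u1[i]])
X = torch.tensor(np.array(X), dtype=torch.float32)
Y = torch.tensor(np.array(Y), dtype=torch.float32)

net = nn.Sequential(nn.Linear(3, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(500):
    loss = nn.functional.mse_loss(net(X), Y)
    opt.zero_grad(); loss.backward(); opt.step()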

【13】 The Spotlight: A General Method for Discovering Systematic Errors in Deep Learning Models 标题:聚光灯:发现深度学习模型系统误差的通用方法

作者:Greg d'Eon,Jason d'Eon,James R. Wright,Kevin Leyton-Brown 机构:University of Alberta, Canada, University of British Columbia, Canada 链接:https://arxiv.org/abs/2107.00758 摘要:有监督学习模型往往会在数据的罕见子集上产生系统性错误。然而，这种系统性错误可能很难识别，因为只有当敏感群体已知且被明确标注时，才能按这些群体分解模型性能。本文介绍了一种发现系统性错误的方法，我们称之为聚光灯(spotlight)法。其核心想法是，相似的输入往往在神经网络的最后一个隐藏层中有相似的表示。我们通过在这个表示空间上"打聚光灯"来利用这一结构，找到模型表现较差的连续区域。我们表明，聚光灯方法能在各种各样的模型架构(包括图像分类器、语言模型和推荐系统)中揭示出语义上有意义的薄弱区域。 摘要:Supervised learning models often make systematic errors on rare subsets of the data. However, such systematic errors can be difficult to identify, as model performance can only be broken down across sensitive groups when these groups are known and explicitly labelled. This paper introduces a method for discovering systematic errors, which we call the spotlight. The key idea is that similar inputs tend to have similar representations in the final hidden layer of a neural network. We leverage this structure by "shining a spotlight" on this representation space to find contiguous regions where the model performs poorly. We show that the spotlight surfaces semantically meaningful areas of weakness in a wide variety of model architectures, including image classifiers, language models, and recommender systems.
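其核心想法可以用如下最小示意表达（"聚光灯"的参数化与优化细节为我们自己的假设）：在最后隐层的表示空间中放置一个高斯权重的"聚光灯"，并移动其中心使加权平均损失最大，从而定位模型表现差的连续区域。

import torch

# 示意（我们对核心思想的理解）：reps 为 (N,d) 的最后隐层表示，
# losses 为 (N,) 的逐样本损失，均已预先计算好。
def find_spotlight(reps, losses, width=1.0, steps=200, lr=0.1):
    center = reps.mean(0).clone().requires_grad_(True)
    opt = torch.optim.Adam([center], lr=lr)
    for _ in range(steps):
        w = torch.exp(-((reps - center) ** 2).sum(-1) / (2 * width ** 2))
        w = w / w.sum()                        # 归一化的聚光灯权重
        obj = -(w * losses).sum()              # 最大化加权损失
        opt.zero_grad(); obj.backward(); opt.step()
    return center.detach()

center = find_spotlight(torch.randn(512, 16), torch.rand(512))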

【14】 Shared Data and Algorithms for Deep Learning in Fundamental Physics 标题:用于基础物理深度学习的共享数据和算法

作者:Lisa Benato,Erik Buhmann,Martin Erdmann,Peter Fackeldey,Jonas Glombitza,Nikolai Hartmann,Gregor Kasieczka,William Korcari,Thomas Kuhr,Jan Steinheimer,Horst Stöcker,Tilman Plehn,Kai Zhou 备注:13 pages, 5 figures, 5 tables 链接:https://arxiv.org/abs/2107.00656 摘要:我们介绍了一组来自基础物理研究——包括粒子物理、天体粒子物理以及强子与核物理——的数据集，用于有监督机器学习研究。这些数据集包含强子顶夸克、宇宙射线诱导的大气簇射、强子物质的相变以及发生器级的事件历史，现已公开，以简化未来在基础物理中跨学科机器学习和迁移学习方面的工作。基于这些数据，我们提出了一种简单而灵活的基于图的神经网络结构，可以很容易地应用于这些领域中广泛的有监督学习任务。我们表明，我们的方法在所有数据集上都达到了接近最先进专用方法的性能。为了简化对各种问题的适配，我们提供了易于遵循的说明，介绍如何为基础物理相关的数据结构构造基于图的表示，并为其中若干种提供了代码实现。我们还提供了所提方法和所有参考算法的实现。 摘要:We introduce a collection of datasets from fundamental physics research -- including particle physics, astroparticle physics, and hadron- and nuclear physics -- for supervised machine learning studies. These datasets, containing hadronic top quarks, cosmic-ray induced air showers, phase transitions in hadronic matter, and generator-level histories, are made public to simplify future work on cross-disciplinary machine learning and transfer learning in fundamental physics. Based on these data, we present a simple yet flexible graph-based neural network architecture that can easily be applied to a wide range of supervised learning tasks in these domains. We show that our approach reaches performance close to state-of-the-art dedicated methods on all datasets. To simplify adaptation for various problems, we provide easy-to-follow instructions on how graph-based representations of data structures, relevant for fundamental physics, can be constructed and provide code implementations for several of them. Implementations are also provided for our proposed method and all reference algorithms.
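摘要提到为基础物理数据构造基于图的表示；下面是一种常见做法的示意（k近邻建图，特征与k值均为我们自己的假设，论文的具体构图方式请以原文为准）。

import numpy as np

# 示意：把一个"事件"中的每个粒子（一行特征，如 (pt, eta, phi, E)）
# 连接到特征空间中距其最近的 k 个邻居。
def knn_edges(features, k=4):
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nbrs = np.argsort(d, axis=1)[:, :k]
    return [(i, j) for i in range(len(features)) for j in nbrs[i]]

edges = knn_edges(np.random.rand(10, 4), k=3)   # 含 10 个粒子的玩具事件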

【15】 Unveiling the structure of wide flat minima in neural networks 标题:揭示神经网络中宽平坦极小点的结构

作者:Carlo Baldassi,Clarissa Lauditi,Enrico M. Malatesta,Gabriele Perugini,Riccardo Zecchina 机构:Artificial Intelligence Lab, Bocconi University, Milano, Italy, Department of Applied Science and Technology, Politecnico di Torino, Torino, Italy 备注:10 pages, 4 figures 链接:https://arxiv.org/abs/2107.01163 摘要:深度学习的成功揭示了神经网络在各个学科中的应用潜力，并开启了一些基本的理论问题。特别是，基于梯度方法简单变体的学习算法能够找到高度非凸损失函数的近似最优极小值，这是神经网络的一个出人意料、需要深入理解的特性。这类算法即使在有噪声的情况下也能几乎完美地拟合数据，同时仍具有出色的预测能力。若干实验结果表明，算法所找到的极小值的所谓平坦度与泛化性能之间存在可复现的相关性。与此同时，统计物理的结果表明，在非凸网络中，大量狭窄极小值可能与数量少得多、但泛化良好的宽平坦极小值共存。在这里，我们表明宽平坦极小值源于对应大间隔(high-margin)分类的极小值的合并。尽管与零间隔解相比，大间隔极小值在数量上指数级稀少，但它们往往集中在特定区域。这些极小值又被间隔越来越小的其他解所包围，形成了长距离上的解的稠密区域。我们的分析还提供了另一种分析方法，用于估计当模型参数数目变化时，平坦极小值何时出现、算法何时开始找到解。 摘要:The success of deep learning has revealed the application potential of neural networks across the sciences and opened up fundamental theoretical problems. In particular, the fact that learning algorithms based on simple variants of gradient methods are able to find near-optimal minima of highly nonconvex loss functions is an unexpected feature of neural networks which needs to be understood in depth. Such algorithms are able to fit the data almost perfectly, even in the presence of noise, and yet they have excellent predictive capabilities. Several empirical results have shown a reproducible correlation between the so-called flatness of the minima achieved by the algorithms and the generalization performance. At the same time, statistical physics results have shown that in nonconvex networks a multitude of narrow minima may coexist with a much smaller number of wide flat minima, which generalize well. Here we show that wide flat minima arise from the coalescence of minima that correspond to high-margin classifications. Despite being exponentially rare compared to zero-margin solutions, high-margin minima tend to concentrate in particular regions. These minima are in turn surrounded by other solutions of smaller and smaller margin, leading to dense regions of solutions over long distances. Our analysis also provides an alternative analytical method for estimating when flat minima appear and when algorithms begin to find solutions, as the number of model parameters varies.

【16】 Exploration noise for learning linear-quadratic mean field games 标题:学习线性-二次平均场博弈的探索噪声

作者:François Delarue,Athanasios Vasileiadis 机构:Université Côte d'Azur, Nice, France 链接:https://arxiv.org/abs/2107.00839 摘要:本文的目的是证明公共噪声可以充当学习平均场博弈解的探索噪声。这一概念通过一个玩具线性二次模型来说明，对该模型，一种合适形式的公共噪声已被证明可以恢复解的存在性和唯一性。我们在此更进一步，证明了同样形式的公共噪声可以迫使被称为"虚拟博弈"(fictitious play)的学习算法收敛，而无需任何额外的位势或单调结构。我们还给出了若干数值例子来支持理论分析。 摘要:The goal of this paper is to demonstrate that common noise may serve as an exploration noise for learning the solution of a mean field game. This concept is here exemplified through a toy linear-quadratic model, for which a suitable form of common noise has already been proven to restore existence and uniqueness. We here go one step further and prove that the same form of common noise may force the convergence of the learning algorithm called `fictitious play', and this without any further potential or monotone structure. Several numerical examples are provided in order to support our theoretical analysis.

其他(22篇)

【1】 Empirically Measuring Transfer Distance for System Design and Operation 标题:面向系统设计与运行的迁移距离经验度量

作者:Tyler Cody,Stephen Adams,Peter A. Beling 机构:Engineering Systems and Environment, University of Virginia, Charlottesville, USA 链接:https://arxiv.org/abs/2107.01184 摘要:经典的机器学习方法对非平稳性非常敏感。迁移学习可以通过在系统之间共享知识来应对非平稳性，然而，在机器故障预测(prognostics)与国防等领域，数据从根本上就是有限的。因此，迁移学习算法几乎没有可供学习的样本。在此，我们提出这些对算法学习的限制可以通过系统工程来解决。我们以一般形式正式定义了迁移距离，并演示了如何用它来经验地量化模型的可迁移性。我们考虑在机器重建流程的设计中使用迁移距离，以获得可迁移的预后模型。我们还考虑用迁移距离预测计算机视觉中的运行性能。实践者可以使用所提出的方法来设计和运行系统，同时兼顾组件学习系统所面临的学习理论挑战。 摘要:Classical machine learning approaches are sensitive to non-stationarity. Transfer learning can address non-stationarity by sharing knowledge from one system to another, however, in areas like machine prognostics and defense, data is fundamentally limited. Therefore, transfer learning algorithms have little, if any, examples from which to learn. Herein, we suggest that these constraints on algorithmic learning can be addressed by systems engineering. We formally define transfer distance in general terms and demonstrate its use in empirically quantifying the transferability of models. We consider the use of transfer distance in the design of machine rebuild procedures to allow for transferable prognostic models. We also consider the use of transfer distance in predicting operational performance in computer vision. Practitioners can use the presented methodology to design and operate systems with consideration for the learning theoretic challenges faced by component learning systems.
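论文以一般形式定义迁移距离；作为一种可能的具体化（纯属我们的假设示例，并非论文的定义），可以用最大均值差异(MMD)度量源系统与目标系统特征分布之间的距离。

import numpy as np

# 一种可能的实例化（我们的假设；论文中的迁移距离为抽象定义）：
# 用高斯核 MMD 度量源/目标特征分布的差异。
def mmd2(X, Y, sigma=1.0):
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

source = np.random.randn(100, 5)
target = np.random.randn(100, 5) + 0.5       # 发生漂移后的系统
print("迁移距离 (MMD^2):", mmd2(source, target))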

【2】 Vox Populi, Vox DIY: Benchmark Dataset for Crowdsourced Audio Transcription 标题:Vox Populi, Vox DIY:众包音频转录基准数据集

作者:Nikita Pavlichenko,Ivan Stelmakh,Dmitry Ustalov 机构:Yandex, Moscow, Russia, Carnegie Mellon University, Pittsburgh, PA, USA, Saint Petersburg, Russia 链接:https://arxiv.org/abs/2107.01091 摘要:特定领域的数据是机器学习系统成功地从基准走向现实生活的关键。对于图像分类这类简单问题，众包已经成为廉价且省时的数据收集标准工具之一：这在很大程度上要归功于聚合方法研究的进展。然而，由于缺乏有原则的聚合方法，众包对更复杂任务(例如语音识别)的适用性仍然有限。设计先进聚合方法的主要障碍是缺乏训练数据，在这项工作中，我们致力于填补语音识别中的这一空白。为此，我们收集并发布了CrowdSpeech——第一个公开的大规模众包音频转录数据集。对现有聚合方法在我们数据上的评估显示仍有改进空间，这表明我们的工作可能促成更好算法的设计。在更高的层面上，我们也为使用众包收集高质量数据集这一更一般的挑战做出了贡献：我们开发了一个有原则的流程，可在任何新领域构建众包音频转录数据集。我们通过构建VoxDIY——CrowdSpeech的俄语对应版本——来展示它在资源不足语言上的适用性。我们还发布了可完整复现数据收集流程的代码，并分享了关于通过众包收集数据的最佳实践的多种见解。 摘要:Domain-specific data is the crux of the successful transfer of machine learning systems from benchmarks to real life. Crowdsourcing has become one of the standard tools for cheap and time-efficient data collection for simple problems such as image classification: thanks in large part to advances in research on aggregation methods. However, the applicability of crowdsourcing to more complex tasks (e.g., speech recognition) remains limited due to the lack of principled aggregation methods for these modalities. The main obstacle towards designing advanced aggregation methods is the absence of training data, and in this work, we focus on bridging this gap in speech recognition. For this, we collect and release CrowdSpeech -- the first publicly available large-scale dataset of crowdsourced audio transcriptions. Evaluation of existing aggregation methods on our data shows room for improvement, suggesting that our work may entail the design of better algorithms. At a higher level, we also contribute to the more general challenge of collecting high-quality datasets using crowdsourcing: we develop a principled pipeline for constructing datasets of crowdsourced audio transcriptions in any novel domain. We show its applicability on an under-resourced language by constructing VoxDIY -- a counterpart of CrowdSpeech for the Russian language. We also release the code that allows a full replication of our data collection pipeline and share various insights on best practices of data collection via crowdsourcing.
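作为转录聚合的一个直观基线（仅为我们的说明性示例，并非论文或数据集所用方法），可以在同一段音频的多份众包转录中选出与其余转录平均相似度最高的那份（字符串相似度意义下的中心样本）。

from difflib import SequenceMatcher

# 简单聚合基线（说明性示例）：返回与其他转录平均最相似的一份。
def aggregate(transcriptions):
    def sim(a, b):
        return SequenceMatcher(None, a, b).ratio()
    return max(transcriptions,
               key=lambda t: sum(sim(t, o) for o in transcriptions if o is not t))

print(aggregate(["the cat sat", "the cat sat down", "a cat sat"]))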

【3】 Design and implementation of an islanded hybrid microgrid system for a large resort center for Penang Island with the proper application of excess energy 标题:合理利用过剩能源的槟榔屿大型度假中心孤岛混合微电网系统设计与实现

作者:SK. A. Shezan,S. Rawdah,Shafin Ali,Ziaur Rahman 机构:School of Engineering, RMIT University, Melbourne, Victoria, Australia, School of Science, RMIT University, Melbourne, VIC, Australia 备注:None 链接:https://arxiv.org/abs/2107.01032 摘要:随着国际化和文明的发展，能源需求正以加速的步伐日益增长。然而，如何正确而经济地利用孤岛混合微电网系统(IHMS)所产生、却未被负荷消耗的多余能源，是一个重大的全球性挑战。为解决上述问题，本研究以马来西亚槟城岛的槟城山度假区为对象，研究IHMS的多目标优化组合并有效利用冗余能源。为了高效利用这些多余的能量，设计了一个电加热器和一个储水箱，作为具有适当能量管理的分流负载。此外，系统设计采用HOMER Pro软件进行了经济性与实用性分析。同时，用MATLAB Simulink对整个系统进行了稳定性分析，其中2068 kW和19072 kW分别被确定为度假村每天的近似峰值负荷和平均负荷。优化后的IHMS由光伏(PV)电池、柴油发电机、风力发电机、蓄电池和变流器组成。优化后的系统净现成本(NPC)为2166万美元，可再生能源比例(RF)为27.8%，能源成本(COE)为0.165美元/kWh，CO2排放为1735836千克/年，多余能源为517.29 MWh/年。当方案中采用以柴油发电机为主导的系统时，COE为0.217美元/kWh，CO2排放为5124879千克/年，NPC为2325万美元。多余的能量通过作为分流负载的电加热器得到了有效利用。 摘要:The energy demand is growing daily at an accelerated pace due to the internationalization and development of civilization. Yet proper economic utilization of additional energy generated by the Islanded Hybrid Microgrid System (IHMS) that was not consumed by the load is a major global challenge. To resolve the above-stated summons, this research focuses on a multi-optimal combination of IHMS for the Penang Hill Resort located on Penang Island, Malaysia, with effective use of redundant energy. To avail this excess energy efficiently, an electrical heater along with a storage tank has been designed concerning diversion load having proper energy management. Furthermore, the system design has adopted the HOMER Pro software for profitable and practical analysis. Alongside, MATLAB Simulink had stabilized the whole system by representing the values of 2068 and 19,072 kW that have been determined as the approximated peak and average load per day for the resort. Moreover, the optimized IHMS is comprehended of Photovoltaic (PV) cells, Diesel Generator, Wind Turbine, Battery, and Converter. Adjacent to this, the optimized system ensued in having a Net Present Cost (NPC) of $21.66 million, Renewable Fraction (RF) of 27.8%, Cost of Energy (COE) of $0.165/kWh, CO2 of 1,735,836 kg/year, and excess energy of 517.29MWh per annum. Since the diesel generator lead system was included in the scheme, a COE of $0.217/kWh, CO2 of 5,124,879 kg/year, and NPC of $23.25 million were attained. The amount of excess energy is effectively utilized with an electrical heater as a diversion load.

【4】 WiCluster: Passive Indoor 2D/3D Positioning using WiFi without Precise Labels 标题:WiCluster:使用无精确标签的WiFi进行室内2D/3D被动定位

作者:Ilia Karmanov,Farhad G. Zanjani,Simone Merlin,Ishaque Kadampot,Daniel Dijkman 机构:Qualcomm AI Research 链接:https://arxiv.org/abs/2107.01002 摘要:介绍了一种利用射频(RF)信道状态信息(CSI)进行室内被动定位的机器学习方法WiCluster。WiCluster可以预测区域级位置和精确的二维或三维位置，而无需在训练期间使用任何精确的位置标签。先前基于CSI的室内定位工作依赖于使用数字信号处理(DSP)的非参数方法，以及最近的参数方法(例如完全监督的ML方法)。然而，这些方法不能很好地应对现实环境的复杂性，也不能满足大规模商业部署的要求：基于DSP的方法在非视距条件下精度显著下降，而有监督的ML方法需要大量难以获取的厘米级精度位置标签。相反，WiCluster既精确，又只需要更弱且易于收集的标签信息。我们的第一个贡献是一种新颖的用于"制图"(charting)的降维方法，它将三元组损失(triplet loss)与多尺度聚类损失相结合，把高维CSI表示映射到二维/三维潜空间。我们的第二个贡献是两个弱监督损失，它们将该潜空间映射到笛卡尔地图上，从而得到米级精度的定位结果。这些损失只需要易于获取的先验信息：楼层平面图草图、接入点的大致位置，以及少量标注了平面图中对应区域的CSI数据包。第三，我们报告了在一栋单层办公楼中的二维定位和一栋两层住宅中的三维定位的结果与鲁棒性研究，以展示我们方法的鲁棒性。 摘要:We introduce WiCluster, a new machine learning (ML) approach for passive indoor positioning using radio frequency (RF) channel state information (CSI). WiCluster can predict both a zone-level position and a precise 2D or 3D position, without using any precise position labels during training. Prior CSI-based indoor positioning work has relied on non-parametric approaches using digital signal-processing (DSP) and, more recently, parametric approaches (e.g., fully supervised ML methods). However these do not handle the complexity of real-world environments well and do not meet requirements for large-scale commercial deployments: the accuracy of DSP-based method deteriorates significantly in non-line-of-sight conditions, while supervised ML methods need large amounts of hard-to-acquire centimeter accuracy position labels. In contrast, WiCluster is both precise and requires weaker label-information that can be easily collected. Our first contribution is a novel dimensionality reduction method for charting. It combines a triplet-loss with a multi-scale clustering-loss to map the high-dimensional CSI representation to a 2D/3D latent space. Our second contribution is two weakly supervised losses that map this latent space into a Cartesian map, resulting in meter-accuracy position results. These losses only require simple to acquire priors: a sketch of the floorplan, approximate location of access-point locations and a few CSI packets that are labeled with the corresponding zone in the floorplan. Thirdly, we report results and a robustness study for 2D positioning in a single-floor office building and 3D positioning in a two-floor home to show the robustness of our method.
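下面用一个最小示意说明其中的三元组损失部分（编码器结构与"按时间就近取正样本"的三元组构造均为我们自己的假设）：把时间上相邻的CSI帧在低维潜空间中拉近、随机帧推远。

import torch, torch.nn as nn

# 示意（我们的假设）：encoder 把 CSI 特征映射到 2 维潜空间。
encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
triplet = nn.TripletMarginLoss(margin=1.0)

csi = torch.randn(256, 128)                   # 按时间排序的玩具 CSI 特征
anchor   = encoder(csi[:-1])
positive = encoder(csi[1:])                   # 时间相邻 -> 应当靠近
negative = encoder(csi[torch.randperm(255)])  # 随机帧 -> 多半应远离
loss = triplet(anchor, positive, negative)
loss.backward()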

【5】 DeformRS: Certifying Input Deformations with Randomized Smoothing 标题:DeformRS:使用随机平滑验证输入变形

作者:Motasem Alfarra,Adel Bibi,Naeemullah Khan,Philip H. S. Torr,Bernard Ghanem 机构:KAUST, University of Oxford 备注:First two authors contributed equally to this work 链接:https://arxiv.org/abs/2107.00996 摘要:深度神经网络易受像素位移向量场形式的输入变形和其他参数化几何变形(如平移、旋转等)的影响。当前的输入变形认证方法(i)无法扩展到大型输入数据集上的深度网络，或(ii)只能认证特定类别的变形，例如仅旋转。我们在随机平滑的设置下重新表述了一般向量场变形和参数化变形的认证问题，并分别提出了DeformRS-VF和DeformRS-Par。我们的新表述可扩展到大型输入数据集上的大型网络。例如，DeformRS-Par可认证丰富的变形，涵盖平移、旋转、缩放、仿射变形以及其他视觉上对齐的变形，例如由离散余弦变换基参数化的变形。在MNIST、CIFAR10和ImageNet上的大量实验表明，DeformRS-Par在认证精度上优于现有的最新技术，例如在ImageNet上对[-10,10]度范围内的扰动旋转，认证精度提高了6%。 摘要:Deep neural networks are vulnerable to input deformations in the form of vector fields of pixel displacements and to other parameterized geometric deformations e.g. translations, rotations, etc. Current input deformation certification methods either (i) do not scale to deep networks on large input datasets, or (ii) can only certify a specific class of deformations, e.g. only rotations. We reformulate certification in randomized smoothing setting for both general vector field and parameterized deformations and propose DeformRS-VF and DeformRS-Par, respectively. Our new formulation scales to large networks on large input datasets. For instance, DeformRS-Par certifies rich deformations, covering translations, rotations, scaling, affine deformations, and other visually aligned deformations such as ones parameterized by Discrete-Cosine-Transform basis. Extensive experiments on MNIST, CIFAR10 and ImageNet show that DeformRS-Par outperforms existing state-of-the-art in certified accuracy, e.g. improved certified accuracy of 6% against perturbed rotations in the set [-10,10] degrees on ImageNet.
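随机平滑的基本做法可以示意如下（这里只演示对旋转做平滑预测这一通用套路，论文中的认证半径计算从略；model 为假设给定的图像分类器）。

import torch
from torchvision.transforms.functional import rotate

# 示意：在随机旋转下做多数投票，得到平滑分类器的预测。
def smoothed_predict(model, img, num_classes, n=100, max_deg=10.0):
    votes = torch.zeros(num_classes)
    for _ in range(n):
        angle = float(torch.empty(1).uniform_(-max_deg, max_deg))
        votes[model(rotate(img, angle)).argmax(1).item()] += 1
    return int(votes.argmax())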

【6】 ResIST: Layer-Wise Decomposition of ResNets for Distributed Training 标题:ResIST:面向分布式训练的ResNet分层分解

作者:Chen Dun,Cameron R. Wolfe,Christopher M. Jermaine,Anastasios Kyrillidis 机构:Department of Computer Science, Rice University, Houston, Texas, USA 备注:11 pages 链接:https://arxiv.org/abs/2107.00961 摘要:我们提出了ResIST，一种新的残差网络(ResNet)分布式训练协议。ResIST将一个全局ResNet随机分解为若干浅层子ResNet，这些子ResNet以分布式方式独立地进行若干次局部迭代训练，然后将它们的更新同步并聚合到全局模型中。在下一轮中，随机生成新的子ResNet，并重复该过程。通过这种构造，在每次迭代中ResIST只向每台机器传递一小部分网络参数，并且在训练期间从不使用完整的模型。因此，ResIST将ResNet训练的通信、内存和时间需求降低到以往方法需求的一小部分。与数据并行训练和使用本地SGD的数据并行训练等常见协议相比，ResIST减少了挂钟训练时间，同时在模型性能方面具有竞争力。 摘要:We propose ResIST, a novel distributed training protocol for Residual Networks (ResNets). ResIST randomly decomposes a global ResNet into several shallow sub-ResNets that are trained independently in a distributed manner for several local iterations, before having their updates synchronized and aggregated into the global model. In the next round, new sub-ResNets are randomly generated and the process repeats. By construction, per iteration, ResIST communicates only a small portion of network parameters to each machine and never uses the full model during training. Thus, ResIST reduces the communication, memory, and time requirements of ResNet training to only a fraction of the requirements of previous methods. In comparison to common protocols like data-parallel training and data-parallel training with local SGD, ResIST yields a decrease in wall-clock training time, while being competitive with respect to model performance.
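其划分步骤大致如下（以下是我们对方法的理解所作的示意，分派细节以论文为准）：每一轮把残差块随机分派给各个工作节点，各节点只训练自己的浅层子网络，随后再同步聚合。

import random

# 示意：把残差块随机划分给各 worker（共享的 stem/head 不在此列）。
def partition_blocks(num_blocks, num_workers, seed=0):
    rng = random.Random(seed)
    assignment = {w: [] for w in range(num_workers)}
    for b in range(num_blocks):
        assignment[rng.randrange(num_workers)].append(b)
    return assignment        # 每个 worker 只训练自己的浅层子 ResNet

print(partition_blocks(num_blocks=16, num_workers=4))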

【7】 AÇAI: Ascent Similarity Caching with Approximate Indexes 标题:AÇAI:基于近似索引的上升法相似性缓存

作者:Tareq Si Salem,Giovanni Neglia,Damiano Carra 机构:Université Côte d'Azur; University of Verona 链接:https://arxiv.org/abs/2107.00957 摘要:相似性搜索是多媒体检索系统和推荐系统中的一项关键操作，在未来的机器学习和增强现实应用中也将发挥重要作用。当这些系统需要在严格的延迟约束下提供大型对象时，靠近最终用户的边缘服务器可以充当相似性缓存来加速检索。本文提出了一种新的相似性缓存策略AÇAI，它在现有技术的基础上做了两点改进：(i)对整个目录使用一个(近似)索引来决定哪些对象在本地提供、哪些从远程服务器获取；(ii)使用镜像上升(mirror ascent)算法来更新本地对象集合，即使请求过程不表现出任何统计规律性，也能提供强有力的保证。 摘要:Similarity search is a key operation in multimedia retrieval systems and recommender systems, and it will play an important role also for future machine learning and augmented reality applications. When these systems need to serve large objects with tight delay constraints, edge servers close to the end-user can operate as similarity caches to speed up the retrieval. In this paper we present AÇAI, a new similarity caching policy which improves on the state of the art by using (i) an (approximate) index for the whole catalog to decide which objects to serve locally and which to retrieve from the remote server, and (ii) a mirror ascent algorithm to update the set of local objects with strong guarantees even when the request process does not exhibit any statistical regularity.
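其中的镜像上升更新可以用指数梯度形式示意如下（目标函数与向容量预算的投影均为我们的简化假设；实际算法还需处理 0≤y≤1 等约束，投影也更复杂）。

import numpy as np

# 熵镜像上升的一步（简化示意）：对分数化的"缓存权重" y 做乘性更新，
# 再缩放回容量预算。
def mirror_ascent_step(y, grad, lr=0.1, capacity=10.0):
    y = y * np.exp(lr * grad)          # 指数梯度更新
    return capacity * y / y.sum()      # 缩放回容量预算（真实投影更复杂）

y = np.full(100, 0.1)                  # 100 个对象，容量 10
gain = np.random.rand(100)             # 每个对象的缓存收益（玩具数据）
y = mirror_ascent_step(y, gain)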

【8】 From Personalized Medicine to Population Health: A Survey of mHealth Sensing Techniques 标题:从个性化医学到人群健康:移动健康传感技术综述

作者:Zhiyuan Wang,Haoyi Xiong,Jie Zhang,Sijia Yang,Mehdi Boukhechba,Laura E. Barnes,Daqing Zhang 机构:Department of Computer Science, Peking University 备注:Submitted to a journal for review 链接:https://arxiv.org/abs/2107.00948 摘要:移动感知应用程序作为一种实用的方法被广泛用于收集个人的行为和健康相关信息，并提供及时干预以促进健康和福祉，如心理健康和慢性病护理。由于移动感知的目标既可以是(a)面向个体的个性化医疗，也可以是(b)面向人群的公共卫生，在这项工作中我们回顾了这些移动感知应用的设计，并建议把这些应用/系统的设计归入两种范式——(i)个人感知与(ii)群智感知。虽然这两种感知范式都可能结合常见的泛在感知技术，例如可穿戴传感器、移动性监测、移动数据卸载和/或基于云的数据分析，以收集和处理来自个人的感知数据，我们提出了一个新的分类系统，它包含两个主要组件，可以从mHealth感知生命周期的各个方面对应用/系统进行刻画和分类：(1)感知任务创建与参与、(2)健康监测与数据收集、(3)数据分析与知识发现。针对这两种范式的不同目标，本文系统地回顾了这一领域，并从这两个组件之间的配置和交互的角度总结了典型应用/系统的设计。除了总结之外，所提出的分类系统也有助于从个性化医疗和人群健康两个视角找出移动健康感知的潜在方向。 摘要:Mobile Sensing Apps have been widely used as a practical approach to collect behavioral and health-related information from individuals and provide timely intervention to promote health and well-beings, such as mental health and chronic cares. As the objectives of mobile sensing could be either (a) personalized medicine for individuals or (b) public health for populations, in this work we review the design of these mobile sensing apps, and propose to categorize the design of these apps/systems in two paradigms -- (i) Personal Sensing and (ii) Crowd Sensing paradigms. While both sensing paradigms might incorporate with common ubiquitous sensing technologies, such as wearable sensors, mobility monitoring, mobile data offloading, and/or cloud-based data analytics to collect and process sensing data from individuals, we present a novel taxonomy system with two major components that can specify and classify apps/systems from aspects of the life-cycle of mHealth Sensing: (1) Sensing Task Creation & Participation, (2) Health Surveillance & Data Collection, and (3) Data Analysis & Knowledge Discovery. With respect to different goals of the two paradigms, this work systematically reviews this field, and summarizes the design of typical apps/systems in the view of the configurations and interactions between these two components. In addition to summarization, the proposed taxonomy system also helps figure out the potential directions of mobile sensing for health from both personalized medicines and population health perspectives.

【9】 Decision tree heuristics can fail, even in the smoothed setting 标题:即使在平滑设置中,决策树启发式也可能失败

作者:Guy Blanc,Jane Lange,Mingda Qiao,Li-Yang Tan 机构:Stanford, MIT 备注:To appear in RANDOM 2021 链接:https://arxiv.org/abs/2107.00819 摘要:贪婪决策树学习启发式算法是机器学习实践的支柱，但其经验成功的理论依据仍然难以捉摸。事实上，人们早就知道有一些简单的目标函数会严重失败(Kearns和Mansour,STOC 1996)。Brutzkus、Daniely和Malach(COLT 2020)最近的工作认为平滑分析模型是解决这种脱节的可能途径。在平滑设置和目标$f$为$k$-juntas的情况下，他们表明这些启发式算法成功地学习了$f$和深度-$k$决策树假设。他们推测，对于深度为$k$决策树的目标，同样的保证更为普遍。我们为这个猜想提供了一个反例：我们构造了深度为$k$决策树的目标，并表明即使在平滑设置下，这些启发式算法在获得高精度之前也会构建深度为$2^{\Omega(k)}$的树。我们还表明，Brutzkus等人的保证不能扩展到不可知论的设置：有非常接近$k$-juntas的目标，对于这些目标，这些启发式算法在获得高精度之前构建深度为$2^{\Omega(k)}$的树。 摘要:Greedy decision tree learning heuristics are mainstays of machine learning practice, but theoretical justification for their empirical success remains elusive. In fact, it has long been known that there are simple target functions for which they fail badly (Kearns and Mansour, STOC 1996). Recent work of Brutzkus, Daniely, and Malach (COLT 2020) considered the smoothed analysis model as a possible avenue towards resolving this disconnect. Within the smoothed setting and for targets $f$ that are $k$-juntas, they showed that these heuristics successfully learn $f$ with depth-$k$ decision tree hypotheses. They conjectured that the same guarantee holds more generally for targets that are depth-$k$ decision trees. We provide a counterexample to this conjecture: we construct targets that are depth-$k$ decision trees and show that even in the smoothed setting, these heuristics build trees of depth $2^{\Omega(k)}$ before achieving high accuracy. We also show that the guarantees of Brutzkus et al. cannot extend to the agnostic setting: there are targets that are very close to $k$-juntas, for which these heuristics build trees of depth $2^{\Omega(k)}$ before achieving high accuracy.

【10】 Mitigating deep double descent by concatenating inputs 标题:通过级联输入来缓解深度双重下降

作者:John Chen,Qihan Wang,Anastasios Kyrillidis 机构:Department of Computer Science, Rice University 链接:https://arxiv.org/abs/2107.00797 摘要:双下降曲线是深度神经网络最有趣的特性之一。它与经典的偏差-方差曲线形成对比，刻画了现代神经网络在样本数接近参数数时的行为。在这项工作中，我们探讨了深度神经网络设置下双下降现象与样本数之间的联系。特别地，我们提出了一种通过人工增加样本数来扩充现有数据集的构造。该构造在经验上缓解了这种设置下的双下降曲线。我们复现了已有的深度双下降结果，并观察到我们的构造使曲线平滑地下降进入过参数化区域。这一现象对于模型大小和训练轮数(epoch)都成立。 摘要:The double descent curve is one of the most intriguing properties of deep neural networks. It contrasts the classical bias-variance curve with the behavior of modern neural networks, occurring where the number of samples nears the number of parameters. In this work, we explore the connection between the double descent phenomena and the number of samples in the deep neural network setting. In particular, we propose a construction which augments the existing dataset by artificially increasing the number of samples. This construction empirically mitigates the double descent curve in this setting. We reproduce existing work on deep double descent, and observe a smooth descent into the overparameterized region for our construction. This occurs both with respect to the model size, and with respect to the number of epochs.

【11】 On the Bike Spreading Problem 标题:论自行车扩散问题

作者:Elia Costa,Francesco Silvestri 机构:University of Padova, Dept. of Information Engineering, Italy 链接:https://arxiv.org/abs/2107.00761 摘要:自由浮动自行车共享系统(FFBSS)是一种无桩租赁系统，个人可以借用自行车并在服务区内的任何地方归还。为了改善租赁服务，可用的自行车应当分布在整个服务区：这样无论顾客从哪个位置出发，都更有可能在附近找到自行车并使用该服务。此外，把自行车分散到整个服务区还能提升城市空间公平性，因为FFBSS的好处不再只是少数区域的特权。为了保证这种分布，FFBSS运营商可以用货车手动重新调配自行车，但这会带来高昂的经济和环境成本。我们提出了一种新方法，利用顾客产生的既有骑行流来分散自行车。更具体地说，通过把该问题建模为影响最大化(Influence Maximization)问题，我们表明可以把成批的自行车投放到少量区域，FFBSS的日常使用就会高效地把这些自行车扩散到大片区域。我们证明了检测这些区域是NP完全的，但存在一个简单高效的$1-1/e$近似算法；随后，我们在帕多瓦市自由浮动自行车共享系统的骑行数据集上对我们的方法进行了评估。 摘要:A free-floating bike-sharing system (FFBSS) is a dockless rental system where an individual can borrow a bike and returns it everywhere, within the service area. To improve the rental service, available bikes should be distributed over the entire service area: a customer leaving from any position is then more likely to find a near bike and then to use the service. Moreover, spreading bikes among the entire service area increases urban spatial equity since the benefits of FFBSS are not a prerogative of just a few zones. For guaranteeing such distribution, the FFBSS operator can use vans to manually relocate bikes, but it incurs high economic and environmental costs. We propose a novel approach that exploits the existing bike flows generated by customers to distribute bikes. More specifically, by envisioning the problem as an Influence Maximization problem, we show that it is possible to position batches of bikes on a small number of zones, and then the daily use of FFBSS will efficiently spread these bikes on a large area. We show that detecting these areas is NP-complete, but there exists a simple and efficient $1-1/e$ approximation algorithm; our approach is then evaluated on a dataset of rides from the free-floating bike-sharing system of the city of Padova.
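摘要中的$1-1/e$近似算法即经典的单调子模函数贪婪最大化，其骨架如下（"可达区域"函数为我们的玩具假设，论文中的扩散模型从略）。

# 贪婪选择（单调子模目标的标准套路）：每次加入边际增益最大的区域。
def greedy_seed_zones(zones, reachable, k):
    """reachable[z]: 在区域 z 投放的自行车能扩散到的区域集合。"""
    chosen, covered = [], set()
    for _ in range(k):
        best = max(zones, key=lambda z: len(reachable[z] - covered))
        chosen.append(best)
        covered |= reachable[best]
    return chosen

reach = {0: {0, 1, 2}, 1: {1, 3}, 2: {2, 4, 5}, 3: {3}}
print(greedy_seed_zones(list(reach), reach, k=2))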

【12】 An Investigation of the (In)effectiveness of Counterfactually Augmented Data 标题:反事实增强数据(无)效性的研究

作者:Nitish Joshi,He He 机构:Department of Computer Science, New York University, Center for Data Science, New York University 链接:https://arxiv.org/abs/2107.00753 摘要:虽然预训练语言模型在自然语言理解基准上表现优异，但它们往往依赖于虚假相关性，并且对分布外(OOD)数据的泛化能力较差。最近的工作探索了使用反事实增强数据(CAD)——通过对样本施加最小扰动以翻转真实标签而生成的数据——来识别在分布偏移下保持不变的稳健特征。然而，使用CAD进行OOD泛化的实证结果却好坏参半。为了解释这种差异，我们从一个线性高斯模型中获得洞见，并说明了CAD的缺陷。具体来说，我们表明(a)虽然CAD在识别稳健特征方面是有效的，但它可能阻止模型学习未受扰动的稳健特征，以及(b)CAD可能加剧数据中已有的虚假相关性。我们的结果表明，当前CAD数据集中扰动多样性的缺乏限制了其在OOD泛化上的有效性，呼吁采用创新的众包流程来征集多样化的扰动样本。 摘要:While pretrained language models achieve excellent performance on natural language understanding benchmarks, they tend to rely on spurious correlations and generalize poorly to out-of-distribution (OOD) data. Recent work has explored using counterfactually-augmented data (CAD) -- data generated by minimally perturbing examples to flip the ground-truth label -- to identify robust features that are invariant under distribution shift. However, empirical results using CAD for OOD generalization have been mixed. To explain this discrepancy, we draw insights from a linear Gaussian model and demonstrate the pitfalls of CAD. Specifically, we show that (a) while CAD is effective at identifying robust features, it may prevent the model from learning unperturbed robust features, and (b) CAD may exacerbate existing spurious correlations in the data. Our results show that the lack of perturbation diversity in current CAD datasets limits its effectiveness on OOD generalization, calling for innovative crowdsourcing procedures to elicit diverse perturbation of examples.

【13】 q-Paths: Generalizing the Geometric Annealing Path using Power Means 标题:Q-路径:用幂平均推广几何退火路径

作者:Vaden Masrani,Rob Brekelmans,Thang Bui,Frank Nielsen,Aram Galstyan,Greg Ver Steeg,Frank Wood 机构:University of British Columbia, USC Information Sciences Institute, University of Sydney, Sony CSL, MILA 备注:arXiv admin note: text overlap with arXiv:2012.07823 链接:https://arxiv.org/abs/2107.00745 摘要:许多常见的机器学习方法都涉及几何退火路径，即用几何平均在两个感兴趣的分布之间构造的中间密度序列。虽然矩平均路径等替代方案在某些情形下表现出性能提升，但它们的实际适用性仍然受到指数族端点假设和缺乏封闭形式能量函数的限制。在这项工作中，我们引入了$q$-路径，这是一个由广义平均概念导出的路径族，它将几何混合和算术混合作为特例包含在内，并且具有一个简单的封闭形式，其中用到非广延热力学中的变形对数函数。沿袭此前对几何路径的分析，我们将$q$-路径解释为对应于一个$q$-指数分布族，并给出中间密度的变分表示：最小化到两个端点的$\alpha$-散度的混合。我们表明，对几何路径的微小偏离能在使用序贯蒙特卡罗的贝叶斯推断和使用退火重要性采样的生成模型评估中带来经验上的收益。 摘要:Many common machine learning methods involve the geometric annealing path, a sequence of intermediate densities between two distributions of interest constructed using the geometric average. While alternatives such as the moment-averaging path have demonstrated performance gains in some settings, their practical applicability remains limited by exponential family endpoint assumptions and a lack of closed form energy function. In this work, we introduce $q$-paths, a family of paths which is derived from a generalized notion of the mean, includes the geometric and arithmetic mixtures as special cases, and admits a simple closed form involving the deformed logarithm function from nonextensive thermodynamics. Following previous analysis of the geometric path, we interpret our $q$-paths as corresponding to a $q$-exponential family of distributions, and provide a variational representation of intermediate densities as minimizing a mixture of $\alpha$-divergences to the endpoints. We show that small deviations away from the geometric path yield empirical gains for Bayesian inference using Sequential Monte Carlo and generative model evaluation using Annealed Importance Sampling.
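根据摘要（并参照作者的相关工作 arXiv:2012.07823，以下形式由我们转述，具体以论文为准），$q$-路径的中间密度可写成端点密度的幂平均：

\tilde{\pi}_{\beta}(x) \;\propto\; \Big[(1-\beta)\,\pi_0(x)^{1-q} + \beta\,\pi_1(x)^{1-q}\Big]^{\frac{1}{1-q}}, \qquad \beta \in [0,1],

当 $q \to 1$ 时它退化为几何路径 $\tilde{\pi}_{\beta}(x) \propto \pi_0(x)^{1-\beta}\pi_1(x)^{\beta}$，当 $q = 0$ 时则给出算术混合。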

【14】 Gap-Dependent Bounds for Two-Player Markov Games 标题:两人马尔可夫对策的间隙相关界

作者:Zehao Dou,Zhuoran Yang,Zhaoran Wang,Simon S. Du 机构:Yale University, Princeton University, Northwestern University, University of Washington 备注:34 pages 链接:https://arxiv.org/abs/2107.00685 摘要:Q-learning作为强化学习领域最流行的方法之一，受到越来越多的关注。近年来，关于Q-learning类算法在不同设置下的遗憾界的理论研究越来越多。本文分析了在两人回合制随机马尔可夫对策(2-TBSG)上运行Nash Q-learning算法时的累积遗憾，并给出了回合制表格设置下第一个依赖间隙(gap-dependent)的对数上界。该上界与理论下界仅相差一个对数项。此外，我们将该结论推广到无限时域的折扣对策设置，并给出类似的依赖间隙的对数遗憾界。同时，在线性MDP假设下，我们在集中式和独立式两种设置下都得到了2-TBSG的另一个对数遗憾界。 摘要:As one of the most popular methods in the field of reinforcement learning, Q-learning has received increasing attention. Recently, there have been more theoretical works on the regret bound of algorithms that belong to the Q-learning class in different settings. In this paper, we analyze the cumulative regret when conducting Nash Q-learning algorithm on 2-player turn-based stochastic Markov games (2-TBSG), and propose the very first gap dependent logarithmic upper bounds in the episodic tabular setting. This bound matches the theoretical lower bound only up to a logarithmic term. Furthermore, we extend the conclusion to the discounted game setting with infinite horizon and propose a similar gap dependent logarithmic regret bound. Also, under the linear MDP assumption, we obtain another logarithmic regret for 2-TBSG, in both centralized and independent settings.

【15】 A Map of Bandits for E-commerce 标题:面向电子商务的Bandit算法地图

作者:Yi Liu,Lihong Li 机构:Seattle, WA, United States 备注:Accepted by KDD Bandit and RL workshop: this https URL 链接:https://arxiv.org/abs/2107.00680 摘要:丰富的Bandit文献不仅提供了种类繁多的算法工具箱，也使得从业者很难为手头的问题找到正确的解法。关于Bandit的典型教科书侧重于算法的设计与分析，而关于应用的综述通常只是罗列各个应用。虽然这些都是宝贵的资源，但在把应用映射到合适的Bandit算法方面仍存在空白。在本文中，我们试图用一幅结构化的Bandit地图来缩小这一空白，帮助从业者导航并找到相关且实用的Bandit算法。我们没有提供面面俱到的综述，而是聚焦于与奖励、动作和特征相关的少数关键决策点，它们在实践中往往决定了Bandit算法的选择。 摘要:The rich body of Bandit literature not only offers a diverse toolbox of algorithms, but also makes it hard for a practitioner to find the right solution to solve the problem at hand. Typical textbooks on Bandits focus on designing and analyzing algorithms, and surveys on applications often present a list of individual applications. While these are valuable resources, there exists a gap in mapping applications to appropriate Bandit algorithms. In this paper, we aim to reduce this gap with a structured map of Bandits to help practitioners navigate to find relevant and practical Bandit algorithms. Instead of providing a comprehensive overview, we focus on a small number of key decision points related to reward, action, and features, which often affect how Bandit algorithms are chosen in practice.

【16】 Multi-user VoiceFilter-Lite via Attentive Speaker Embedding 标题:通过注意的扬声器嵌入实现多用户语音过滤-Lite

作者:Rajeev Rikhye,Quan Wang,Qiao Liang,Yanzhang He,Ian McGraw 机构:Google LLC, USA 链接:https://arxiv.org/abs/2107.01201 摘要:在本文中，我们提出了一种解决方案，使VoiceFilter-Lite等以说话人为条件的语音模型能够在单次推理中支持任意数量的注册用户。这是通过在多个说话人嵌入上使用注意力机制来计算单个注意力嵌入，再将其用作模型的侧输入来实现的。我们实现了多用户VoiceFilter-Lite，并在三个任务上对其进行了评估：(1)流式自动语音识别(ASR)任务；(2)文本无关的说话人验证任务；以及(3)个性化关键短语检测任务，其中ASR必须在嘈杂环境中检测来自多个注册用户的关键短语。我们的实验表明，在最多四个注册用户的情况下，多用户VoiceFilter-Lite能够在存在重叠语音时显著降低语音识别和说话人验证错误，而不影响其他声学条件下的性能。这种注意力说话人嵌入方法也可以很容易地应用于其他以说话人为条件的模型，如个人VAD和个性化ASR。 摘要:In this paper, we propose a solution to allow speaker conditioned speech models, such as VoiceFilter-Lite, to support an arbitrary number of enrolled users in a single pass. This is achieved by using an attention mechanism on multiple speaker embeddings to compute a single attentive embedding, which is then used as a side input to the model. We implemented multi-user VoiceFilter-Lite and evaluated it for three tasks: (1) a streaming automatic speech recognition (ASR) task; (2) a text-independent speaker verification task; and (3) a personalized keyphrase detection task, where ASR has to detect keyphrases from multiple enrolled users in a noisy environment. Our experiments show that, with up to four enrolled users, multi-user VoiceFilter-Lite is able to significantly reduce speech recognition and speaker verification errors when there is overlapping speech, without affecting performance under other acoustic conditions. This attentive speaker embedding approach can also be easily applied to other speaker-conditioned models such as personal VAD and personalized ASR.
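摘要中的注意力聚合可示意如下（打分网络的结构与维度为我们的假设；实际模型中注意力还可能以输入音频为条件）。

import torch, torch.nn as nn

# 示意：对若干注册说话人嵌入打分、softmax 加权求和，得到单个注意力嵌入。
class AttentiveEmbedding(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, speaker_embs):           # (注册人数, dim)
        w = torch.softmax(self.score(speaker_embs).squeeze(-1), dim=0)
        return (w.unsqueeze(-1) * speaker_embs).sum(0)   # (dim,)

emb = AttentiveEmbedding()(torch.randn(4, 256))  # 4 个注册用户 -> 1 个向量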

【17】 Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE 标题:更简单、更快、更强:用FlatNCE打破对比型学习者的log-K魔咒

作者:Junya Chen,Zhe Gan,Xuan Li,Qing Guo,Liqun Chen,Shuyang Gao,Tagyoung Chung,Yi Xu,Belinda Zeng,Wenlian Lu,Fan Li,Lawrence Carin,Chenyang Tao 机构:Duke University, Microsoft, Virginia Tech, Amazon, Fudan University, KAUST 链接:https://arxiv.org/abs/2107.01152 摘要:基于InfoNCE的对比表示学习方法(如SimCLR)近年来取得了巨大的成功。然而，这些对比方案对资源的需求是出了名的高，因为它们的有效性在小批量训练时会崩溃(即log-K诅咒，其中K为批大小)。在这项工作中，我们从数学上揭示了对比学习方法在小批量情形下失效的原因，并提出了一个新的、简单而非平凡的对比目标FlatNCE来解决这一问题。与InfoNCE不同，我们的FlatNCE不再显式地诉诸判别式分类目标来进行对比学习。理论上，我们证明了FlatNCE是InfoNCE的数学对偶形式，从而与能量模型的经典文献建立了联系；经验上，我们证明了在对代码进行最小修改的情况下，FlatNCE无需额外的领域工程即可立即带来性能提升。对比学习技术的广泛通用性，以及用于监控和诊断对比训练的新工具的引入，进一步提升了这项工作的意义。我们用CIFAR10、ImageNet和其他数据集上的经验证据来支撑我们的论断，在这些数据集上FlatNCE始终优于InfoNCE。 摘要:InfoNCE-based contrastive representation learners, such as SimCLR, have been tremendously successful in recent years. However, these contrastive schemes are notoriously resource demanding, as their effectiveness breaks down with small-batch training (i.e., the log-K curse, whereas K is the batch-size). In this work, we reveal mathematically why contrastive learners fail in the small-batch-size regime, and present a novel simple, non-trivial contrastive objective named FlatNCE, which fixes this issue. Unlike InfoNCE, our FlatNCE no longer explicitly appeals to a discriminative classification goal for contrastive learning. Theoretically, we show FlatNCE is the mathematical dual formulation of InfoNCE, thus bridging the classical literature on energy modeling; and empirically, we demonstrate that, with minimal modification of code, FlatNCE enables immediate performance boost independent of the subject-matter engineering efforts. The significance of this work is furthered by the powerful generalization of contrastive learning techniques, and the introduction of new tools to monitor and diagnose contrastive training. We substantiate our claims with empirical evidence on CIFAR10, ImageNet, and other datasets, where FlatNCE consistently outperforms InfoNCE.
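作为参照，下面给出被FlatNCE所改进的标准InfoNCE损失的最小实现（FlatNCE本身的表达式见论文，此处不复现）。

import torch
import torch.nn.functional as F

# 标准 InfoNCE：同一样本的两个视图为正例（对角线），其余为负例。
def info_nce(z1, z2, temperature=0.1):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature       # (K, K) 相似度矩阵，K 为批大小
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)   # 正例的负对数 softmax

loss = info_nce(torch.randn(32, 128), torch.randn(32, 128))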

【18】 Interactive Causal Structure Discovery in Earth System Sciences 标题:地球系统科学中的交互式因果结构发现

作者:Laila Melkas,Rafael Savvides,Suyog H. Chandramouli,Jarmo Mäkelä,Tuomo Nieminen,Ivan Mammarella,Kai Puolamäki 机构:Department of Computer Science, University of Helsinki, Helsinki, Finland; Institute for Atmospheric and Earth System Research / Physics, University of Helsinki 备注:23 pages, 8 figures, to be published in Proceedings of the 2021 KDD Workshop on Causal Discovery 链接:https://arxiv.org/abs/2107.01126 摘要:因果结构发现(CSD)模型正在进入包括地球系统科学在内的多个领域。然而，由于所得模型往往没有考虑专家的领域知识，而且常常需要对所得模型进行迭代修改，它们的广泛采用受到了阻碍。我们提出了在地球系统科学中应用CSD算法并将这类知识纳入考虑所需的工作流程，同时描述了仍有待解决的开放研究问题。我们提出了一种交互式修改CSD算法输出的方法，并论证了用户交互可以建模为对后验目标的局部最大后验(MAP)解的贪婪搜索，该目标由因果模型的似然和代表专家用户知识的先验分布组成。我们使用与身为领域专家的共同作者合作构建的真实数据集作为示例。我们表明，在地球系统科学或其他类似领域中寻找最大可用的因果模型是一项艰巨的任务，其中包含许多有趣的开放研究问题。我们认为，将领域知识纳入考虑会对最终发现的因果模型产生实质性影响。 摘要:Causal structure discovery (CSD) models are making inroads into several domains, including Earth system sciences. Their widespread adaptation is however hampered by the fact that the resulting models often do not take into account the domain knowledge of the experts and that it is often necessary to modify the resulting models iteratively. We present a workflow that is required to take this knowledge into account and to apply CSD algorithms in Earth system sciences. At the same time, we describe open research questions that still need to be addressed. We present a way to interactively modify the outputs of the CSD algorithms and argue that the user interaction can be modelled as a greedy finding of the local maximum-a-posteriori solution of the likelihood function, which is composed of the likelihood of the causal model and the prior distribution representing the knowledge of the expert user. We use a real-world data set for examples constructed in collaboration with our co-authors, who are the domain area experts. We show that finding maximally usable causal models in the Earth system sciences or other similar domains is a difficult task which contains many interesting open research questions. We argue that taking the domain knowledge into account has a substantial effect on the final causal models discovered.

【19】 Screening for a Reweighted Penalized Conditional Gradient Method 标题:重加权惩罚条件梯度法的筛选规则

作者:Yifan Sun,Francis Bach 机构:Stony Brook University, INRIA-Paris 链接:https://arxiv.org/abs/2107.01106 摘要:条件梯度法(CGM)广泛用于大规模稀疏凸优化，对结构化稀疏正则化子每次迭代计算量低，并以贪婪方式收集非零元。我们研究了针对凸正则化子的一般惩罚CGM(P-CGM)以及针对非凸正则化子的重加权惩罚CGM(RP-CGM)的稀疏性获取性质，用规范(gauge)启发的惩罚项取代了通常的凸约束。这种推广不会明显增加每次迭代的复杂度。在不假设迭代点有界、也不使用线搜索的情况下，我们证明了每个子问题的间隙以$O(1/t)$的速率收敛，该间隙度量到驻点的距离。我们将其与一个筛选规则相结合，该规则在凸情形下是安全的，并以$O(1/\delta^2)$的速率收敛到真支撑，其中$\delta \geq 0$度量问题与退化情形的接近程度。在非凸情形下，筛选规则在有限次迭代内收敛到真支撑，但在中间迭代中不一定安全。在实验中，我们验证了方法的一致性，并通过调节正则化子的凹度来调整筛选规则的激进程度。 摘要:The conditional gradient method (CGM) is widely used in large-scale sparse convex optimization, having a low per iteration computational cost for structured sparse regularizers and a greedy approach to collecting nonzeros. We explore the sparsity acquiring properties of a general penalized CGM (P-CGM) for convex regularizers and a reweighted penalized CGM (RP-CGM) for nonconvex regularizers, replacing the usual convex constraints with gauge-inspired penalties. This generalization does not increase the per-iteration complexity noticeably. Without assuming bounded iterates or using line search, we show $O(1/t)$ convergence of the gap of each subproblem, which measures distance to a stationary point. We couple this with a screening rule which is safe in the convex case, converging to the true support at a rate $O(1/\delta^2)$ where $\delta \geq 0$ measures how close the problem is to degeneracy. In the nonconvex case the screening rule converges to the true support in a finite number of iterations, but is not necessarily safe in the intermediate iterates. In our experiments, we verify the consistency of the method and adjust the aggressiveness of the screening rule by tuning the concavity of the regularizer.
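作为背景，条件梯度法(Frank-Wolfe)的原型迭代如下（以$\ell_1$球上的最小二乘为玩具目标；论文中的规范惩罚与筛选规则此处从略）。

import numpy as np

# 经典 Frank-Wolfe：线性最小化 oracle 给出 l1 球的一个顶点，再做凸组合步。
def frank_wolfe(A, b, radius=1.0, iters=200):
    x = np.zeros(A.shape[1])
    for t in range(iters):
        grad = A.T @ (A @ x - b)
        i = np.argmax(np.abs(grad))            # 线性最小化 oracle
        s = np.zeros_like(x); s[i] = -radius * np.sign(grad[i])
        x += 2.0 / (t + 2) * (s - x)           # 标准步长 2/(t+2)
    return x

x = frank_wolfe(np.random.randn(30, 100), np.random.randn(30))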

【20】 Generalized Multivariate Signs for Nonparametric Hypothesis Testing in High Dimensions 标题:高维非参数假设检验的广义多元符号

作者:Subhabrata Majumdar,Snigdhansu Chatterjee 机构:Data Science and AI Research, AT&T Chief Data Office, New York, NY , School of Statistics, University of Minnesota Twin Cities, Minneapolis, MN 链接:https://arxiv.org/abs/2107.01103 摘要:高维数据，即特征空间的维数远大于样本量的数据，出现在许多统计应用中。在此背景下，我们构造了广义多元符号变换，定义为向量除以其范数。对于范数函数的不同选择，所得变换向量会适应数据分布的某些几何特征。基于这一思想，我们利用这些广义符号向量，得到了针对高维数据均值向量的单样本和双样本检验方法。这些检验基于使用核内积的U-统计量，不依赖苛刻的假设，并且便于基于随机化的快速实现。通过在多种数据设置下的实验，我们表明使用广义符号的检验在保持名义I型错误率的同时，比现有检验具有更高的功效。最后，我们给出了在MNIST和明尼苏达双胞胎研究基因组数据上的应用示例。 摘要:High-dimensional data, where the dimension of the feature space is much larger than sample size, arise in a number of statistical applications. In this context, we construct the generalized multivariate sign transformation, defined as a vector divided by its norm. For different choices of the norm function, the resulting transformed vector adapts to certain geometrical features of the data distribution. Building up on this idea, we obtain one-sample and two-sample testing procedures for mean vectors of high-dimensional data using these generalized sign vectors. These tests are based on U-statistics using kernel inner products, do not require prohibitive assumptions, and are amenable to a fast randomization-based implementation. Through experiments in a number of data settings, we show that tests using generalized signs display higher power than existing tests, while maintaining nominal type-I error rates.
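广义多元符号变换本身非常直接，可示意如下（具体选用哪些范数以论文为准）。

import numpy as np

# 示意：广义符号 = 向量除以其范数；不同的 ord 给出不同的几何适应性。
def generalized_sign(X, ord=2):
    norms = np.linalg.norm(X, ord=ord, axis=1, keepdims=True)
    return np.where(norms > 0, X / norms, 0.0)

X = np.random.randn(5, 1000)                   # 高维样本
S2, S1 = generalized_sign(X, 2), generalized_sign(X, 1)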

【21】 Conflict-free collective stochastic decision making by orbital angular momentum entangled photons 标题:轨道角动量纠缠光子的无冲突集体随机决策

作者:Takashi Amakasu,Nicolas Chauvet,Guillaume Bachelier,Serge Huant,Ryoichi Horisaki,Makoto Naruse 机构:Department of Mathematical Engineering and Information Physics, The University of Tokyo; Department of Information Physics and Computing, Graduate School of Information Science and Technology, The University of Tokyo, Hongo, Bunkyo-ku, Tokyo, Japan 链接:https://arxiv.org/abs/2107.00877 摘要:近年来，光学与计算交叉领域的研究利用光的波粒二象性求解多臂老虎机问题，演示了基于单光子的决策。此外，基于纠缠光子的决策已成功解决了一个竞争性多臂老虎机问题，在确保公平的同时避免了参与者之间的决策冲突。然而，由于这些研究基于光的偏振，可用选项的数量被限制为两个，对应于两个正交偏振态。本文提出了一种可扩展的原理，利用轨道角动量作为光子的可调自由度来求解竞争性决策问题，理论上允许任意数量的臂。此外，通过将Hong-Ou-Mandel效应推广到两个以上的态，我们从理论上建立了一种能够产生具有轨道角动量的纠缠光子态的实验构型，并给出了在每一轮都能实现无冲突选择的条件。我们用数值方法研究了三臂老虎机问题的总收益，所提出的策略几乎达到了理论最大值，超过了旨在实现纳什均衡的传统混合策略。这要归功于纠缠特性，它甚至在寻找最佳臂的探索阶段也能实现无冲突的选择。 摘要:In recent cross-disciplinary studies involving both optics and computing, single-photon-based decision-making has been demonstrated by utilizing the wave-particle duality of light to solve multi-armed bandit problems. Furthermore, entangled-photon-based decision-making has managed to solve a competitive multi-armed bandit problem in such a way that conflicts of decisions among players are avoided while ensuring equality. However, as these studies are based on the polarization of light, the number of available choices is limited to two, corresponding to two orthogonal polarization states. Here we propose a scalable principle to solve competitive decision-making situations by using the orbital angular momentum as the tunable degree of freedom of photons, which theoretically allows an unlimited number of arms. Moreover, by extending the Hong-Ou-Mandel effect to more than two states, we theoretically establish an experimental configuration able to generate entangled photon states with orbital angular momentum and conditions that provide conflict-free selections at every turn. We numerically examine total rewards regarding three-armed bandit problems, for which the proposed strategy accomplishes almost the theoretical maximum, which is greater than a conventional mixed strategy intending to realize Nash equilibrium. This is thanks to the entanglement property that achieves no-conflict selections, even in the exploring phase to find the best arms.

【22】 Flow-based sampling for multimodal distributions in lattice field theory 标题:格场理论中多峰分布的流抽样

作者:Daniel C. Hackett,Chung-Chun Hsieh,Michael S. Albergo,Denis Boyda,Jiunn-Wei Chen,Kai-Feng Chen,Kyle Cranmer,Gurtej Kanwar,Phiala E. Shanahan 机构:Center for Theoretical Physics, Massachusetts Institute of Technology, Cambridge, MA , USA, The NSF AI Institute for Artificial Intelligence and Fundamental Interactions, Department of Physics and Center for Theoretical Physics 备注:33 pages, 29 figures 链接:https://arxiv.org/abs/2107.00734 摘要:最近的结果表明，用基于流的生成模型构造的采样器是格点场论中一种很有前途的新构型生成方法。本文提出了一套为具有多个分离模式的目标(即具有多个真空的理论)构建流模型的方法。我们展示了这些方法在二维实标量场论对称破缺相建模中的应用。在此背景下，我们研究了不同的基于流的采样算法的性能，包括一种复合采样算法，其中基于流的提议偶尔会辅以传统算法(如HMC)的更新。 摘要:Recent results have demonstrated that samplers constructed with flow-based generative models are a promising new approach for configuration generation in lattice field theory. In this paper, we present a set of methods to construct flow models for targets with multiple separated modes (i.e. theories with multiple vacua). We demonstrate the application of these methods to modeling two-dimensional real scalar field theory in its symmetry-broken phase. In this context we investigate the performance of different flow-based sampling algorithms, including a composite sampling algorithm where flow-based proposals are occasionally augmented by applying updates using traditional algorithms like HMC.
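基于流的提议与传统接受-拒绝步的组合可以用独立Metropolis步示意（下面用高斯混合充当"训练好的流"，目标为双峰玩具分布；均为我们的简化假设，并非论文的场论设置）。

import numpy as np

# 独立 Metropolis：以概率 min(1, p(x') q(x) / (p(x) q(x'))) 接受提议 x'。
def metropolis_flow(p, q_sample, q_density, x, steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    chain = [x]
    for _ in range(steps):
        x_new = q_sample(rng)
        ratio = p(x_new) * q_density(x) / (p(x) * q_density(x_new))
        if rng.random() < min(1.0, ratio):
            x = x_new
        chain.append(x)
    return np.array(chain)

# 双峰目标（两个"真空"）与双分量高斯提议：
p = lambda x: np.exp(-(x - 2) ** 2) + np.exp(-(x + 2) ** 2)
q_sample = lambda rng: rng.normal(2.0 if rng.random() < 0.5 else -2.0, 1.0)
q_density = lambda x: 0.5 * (np.exp(-(x - 2) ** 2 / 2) + np.exp(-(x + 2) ** 2 / 2)) / np.sqrt(2 * np.pi)
chain = metropolis_flow(p, q_sample, q_density, x=0.0)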
