Machine Learning arXiv Digest [6.18]

2021-07-02 19:04:37

Visit www.arxivdaily.com for paper digests with abstracts, covering CS | Physics | Math | Economics | Statistics | Finance | Biology | Electrical Engineering, plus search, bookmarking, and posting features!

cs.LG: 126 papers today in total

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (8 papers)

【1】 Prototypical Graph Contrastive Learning

Authors: Shuai Lin, Pan Zhou, Zi-Yuan Hu, Shuojia Wang, Ruihui Zhao, Yefeng Zheng, Liang Lin, Eric Xing, Xiaodan Liang
Affiliations: Sun Yat-sen University; Sea AI Lab; Tencent Jarvis Lab; DarkMatter AI Research; Carnegie Mellon University
Link: https://arxiv.org/abs/2106.09645
Abstract: Graph-level representations are critical in various real-world applications, such as predicting the properties of molecules. But in practice, precise graph annotations are generally very expensive and time-consuming. To address this issue, graph contrastive learning constructs an instance discrimination task which pulls together positive pairs (augmentation pairs of the same graph) and pushes away negative pairs (augmentation pairs of different graphs) for unsupervised representation learning. However, since for a query its negatives are uniformly sampled from all graphs, existing methods suffer from a critical sampling bias issue: the negatives are likely to have the same semantic structure as the query, leading to performance degradation. To mitigate this sampling bias issue, we propose a Prototypical Graph Contrastive Learning (PGCL) approach. Specifically, PGCL models the underlying semantic structure of the graph data by clustering semantically similar graphs into the same group, and simultaneously encourages clustering consistency for different augmentations of the same graph. Then, given a query, it performs negative sampling by drawing graphs from clusters that differ from the query's cluster, which ensures the semantic difference between the query and its negative samples. Moreover, PGCL reweights each negative sample based on the distance between its prototype (cluster centroid) and the query prototype, such that negatives with a moderate prototype distance enjoy relatively large weights. This reweighting strategy is shown to be more effective than uniform sampling. Experimental results on various graph benchmarks testify to the advantages of PGCL over state-of-the-art methods.
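The reweighting idea above lends itself to a short sketch. Below is an illustrative reweighted InfoNCE loss in PyTorch; the Gaussian bump that peaks at moderate prototype distance and all tensor names are assumptions for exposition, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def pgcl_loss(q, pos, negs, q_proto, neg_protos, tau=0.2):
    """Reweighted InfoNCE sketch: negatives whose cluster prototypes lie at a
    moderate distance from the query's prototype receive larger weights."""
    q, pos, negs = F.normalize(q, dim=-1), F.normalize(pos, dim=-1), F.normalize(negs, dim=-1)
    d = (neg_protos - q_proto).norm(dim=-1)                      # prototype distances, (K,)
    w = torch.exp(-(d - d.mean()) ** 2 / (2 * d.var() + 1e-8))   # peaks at moderate distance
    w = w * len(w) / w.sum()                                     # renormalize the weights
    logits = torch.cat([(q * pos).sum(-1, keepdim=True), negs @ q]) / tau  # (1+K,)
    log_w = torch.cat([torch.zeros(1), w.log()])                 # the positive keeps weight 1
    return -(logits[0] - torch.logsumexp(logits + log_w, dim=0))
```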

【2】 Learning Knowledge Graph-based World Models of Textual Environments

Authors: Prithviraj Ammanabrolu, Mark O. Riedl
Affiliations: School of Interactive Computing, Georgia Institute of Technology
Note: Preprint. Under review
Link: https://arxiv.org/abs/2106.09608
Abstract: World models improve a learning agent's ability to operate efficiently in interactive and situated environments. This work focuses on the task of building world models of text-based game environments. Text-based games, or interactive narratives, are reinforcement learning environments in which agents perceive and interact with the world using textual natural language. These environments contain long, multi-step puzzles or quests woven through a world that is filled with hundreds of characters, locations, and objects. Our world model learns to simultaneously: (1) predict changes in the world caused by an agent's actions, representing the world as a knowledge graph; and (2) generate the set of contextually relevant natural language actions required to operate in the world. We frame this task as a Set of Sequences generation problem by exploiting the inherent structure of knowledge graphs and actions, and introduce both a transformer-based multi-task architecture and a loss function to train it. A zero-shot ablation study on never-before-seen textual worlds shows that our methodology significantly outperforms existing textual world modeling techniques and demonstrates the importance of each of our contributions.

【3】 Rotation Invariant Graph Neural Networks using Spin Convolutions

Authors: Muhammed Shuaibi, Adeesh Kolluru, Abhishek Das, Aditya Grover, Anuroop Sriram, Zachary Ulissi, C. Lawrence Zitnick
Affiliations: Carnegie Mellon University; Facebook AI Research
Note: 13 pages
Link: https://arxiv.org/abs/2106.09575
Abstract: Progress towards the energy breakthroughs needed to combat climate change can be significantly accelerated through the efficient simulation of atomic systems. Simulation techniques based on first principles, such as Density Functional Theory (DFT), are limited in their practical use due to their high computational expense. Machine learning approaches have the potential to approximate DFT in a computationally efficient manner, which could dramatically increase the impact of computational simulations on real-world problems. Approximating DFT poses several challenges. These include accurately modeling the subtle changes in the relative positions and angles between atoms, and enforcing constraints such as rotation invariance or energy conservation. We introduce a novel approach to modeling angular information between sets of neighboring atoms in a graph neural network. Rotation invariance is achieved for the network's edge messages through the use of a per-edge local coordinate frame and a novel spin convolution over the remaining degree of freedom. Two model variants are proposed for the applications of structure relaxation and molecular dynamics. State-of-the-art results are demonstrated on the large-scale Open Catalyst 2020 dataset. Comparisons are also performed on the MD17 and QM9 datasets.
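Why relative geometry yields rotation invariance can be seen in a toy feature: distances and dot-product angles are unchanged by any global rotation. This is only a motivating illustration, not the paper's spin convolution or per-edge coordinate frame.

```python
import numpy as np

def invariant_edge_features(pos_i, pos_j, pos_k):
    """Rotation-invariant geometry for atom i with neighbors j, k: two bond
    lengths and the cosine of the angle at i, all built from dot products."""
    v_ij, v_ik = pos_j - pos_i, pos_k - pos_i
    d_ij, d_ik = np.linalg.norm(v_ij), np.linalg.norm(v_ik)
    cos_angle = v_ij @ v_ik / (d_ij * d_ik + 1e-12)  # invariant under rotations
    return np.array([d_ij, d_ik, cos_angle])
```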

【4】 Predicting cognitive scores with graph neural networks through sample selection learning

Authors: Martin Hanik, Mehmet Arif Demirtaş, Mohammed Amine Gharsallaoui, Islem Rekik
Link: https://arxiv.org/abs/2106.09408
Abstract: Analyzing the relation between intelligence and neural activity is of the utmost importance in understanding the working principles of the human brain in health and disease. In existing literature, functional brain connectomes have been used successfully to predict cognitive measures such as intelligence quotient (IQ) scores in both healthy and disordered cohorts using machine learning models. However, existing methods resort to flattening the brain connectome (i.e., graph) through vectorization, which overlooks its topological properties. To address this limitation, and inspired by the emerging graph neural networks (GNNs), we design a novel regression GNN model (namely RegGNN) for predicting IQ scores from brain connectivity. On top of that, we introduce a novel, fully modular sample selection method to select the best samples to learn from for our target prediction task. However, since such deep learning architectures are computationally expensive to train, we further propose a learning-based sample selection method that learns how to choose the training samples with the highest expected predictive power on unseen samples. For this, we capitalize on the fact that connectomes (i.e., their adjacency matrices) lie in the symmetric positive definite (SPD) matrix cone. Our results on full-scale and verbal IQ prediction outperform comparison methods in autism spectrum disorder cohorts and achieve competitive performance for neurotypical subjects using 3-fold cross-validation. Furthermore, we show that our sample selection approach generalizes to other learning-based methods, which shows its usefulness beyond our GNN architecture.
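The SPD-cone fact the abstract exploits can be made concrete with the log-Euclidean distance between connectome matrices. Using it to rank candidate training samples is one simple instantiation of distance-based sample selection, not the learned method proposed in the paper.

```python
import numpy as np

def spd_log(M):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.log(np.clip(w, 1e-10, None))) @ V.T  # V diag(log w) V^T

def log_euclidean_distance(A, B):
    """Distance between two connectomes that respects the SPD cone structure."""
    return np.linalg.norm(spd_log(A) - spd_log(B), 'fro')
```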

【5】 MHNF: Multi-hop Heterogeneous Neighborhood information Fusion graph representation learning

Authors: Dongjie Zhu, Yundong Sun, Haiwen Du, Zhaoshuo Tian
Affiliations: School of Computer Science and Technology, Harbin Institute of Technology
Link: https://arxiv.org/abs/2106.09289
Abstract: The attention mechanism enables Graph Neural Networks (GNNs) to learn attention weights between a target node and its one-hop neighbors, further improving performance. However, most existing GNNs are oriented to homogeneous graphs, and each layer can only aggregate the information of one-hop neighbors; stacking multi-layer networks introduces a lot of noise and easily leads to over-smoothing. We propose a Multi-hop Heterogeneous Neighborhood information Fusion graph representation learning method (MHNF). Specifically, we first propose a hybrid metapath autonomous extraction model to efficiently extract multi-hop hybrid neighbors. Then, we propose a hop-level heterogeneous information aggregation model, which selectively aggregates different-hop neighborhood information within the same hybrid metapath. Finally, a hierarchical semantic attention fusion model (HSAF) is proposed, which can efficiently integrate different-hop and different-path neighborhood information. This approach solves the problem of aggregating multi-hop neighborhood information and can learn hybrid metapaths for the target task, reducing the limitation of manually specifying metapaths. In addition, HSAF can extract the internal node information of the metapaths and better integrate semantic information of different levels. Experimental results on real datasets show that MHNF is superior to state-of-the-art methods on node classification and clustering tasks (10.94% - 69.09% and 11.58% - 394.93% relative improvement on average, respectively).

【6】 Smart Contract Vulnerability Detection: From Pure Neural Network to Interpretable Graph Feature and Expert Pattern Fusion

Authors: Zhenguang Liu, Peng Qian, Xiang Wang, Lei Zhu, Qinming He, Shouling Ji
Affiliations: Zhejiang University; National University of Singapore; Shandong Normal University
Note: This paper has been accepted by IJCAI 2021
Link: https://arxiv.org/abs/2106.09282
Abstract: Smart contracts hold digital coins worth billions of dollars, and their security issues have drawn extensive attention in the past years. For smart contract vulnerability detection, conventional methods rely heavily on fixed expert rules, leading to low accuracy and poor scalability. Recent deep learning approaches alleviate this issue but fail to encode useful expert knowledge. In this paper, we explore combining deep learning with expert patterns in an explainable fashion. Specifically, we develop automatic tools to extract expert patterns from the source code. We then cast the code into a semantic graph to extract deep graph features. Thereafter, the global graph feature and local expert patterns are fused to cooperatively approach the final prediction, while yielding interpretable weights. Experiments are conducted on all available smart contracts with source code on two platforms, Ethereum and VNT Chain. Empirically, our system significantly outperforms state-of-the-art methods. Our code is released.

【7】 EEG-GNN: Graph Neural Networks for Classification of Electroencephalogram (EEG) Signals

Authors: Andac Demir, Toshiaki Koike-Akino, Ye Wang, Masaki Haruna, Deniz Erdogmus
Affiliations: Northeastern University
Note: 8 pages, 8 figures, under review for the EMBC conference
Link: https://arxiv.org/abs/2106.09135
Abstract: Convolutional neural networks (CNN) have been frequently used to extract subject-invariant features from electroencephalogram (EEG) for classification tasks. This approach holds the underlying assumption that electrodes are equidistant, analogous to pixels of an image, and hence fails to explore/exploit the complex functional neural connectivity between different electrode sites. We overcome this limitation by tailoring the concepts of convolution and pooling applied to 2D grid-like inputs for the functional network of electrode sites. Furthermore, we develop various graph neural network (GNN) models that project electrodes onto the nodes of a graph, where the node features are represented as EEG channel samples collected over a trial, and nodes can be connected by weighted/unweighted edges according to a flexible policy formulated by a neuroscientist. The empirical evaluations show that our proposed GNN-based framework outperforms standard CNN classifiers across the ErrP and RSVP datasets, while bringing neuroscientific interpretability and explainability to deep learning methods tailored to EEG-related classification problems. Another practical advantage of our GNN-based framework is that it can be used for EEG channel selection, which is critical for reducing computational cost and designing portable EEG headsets.
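A minimal sketch of the electrode-to-graph projection described above. The distance-threshold connectivity used here is a placeholder for whatever policy a neuroscientist formulates; `radius` is an assumed parameter.

```python
import numpy as np

def eeg_to_graph(trial, positions, radius=0.04):
    """trial: (n_channels, n_samples) EEG; positions: (n_channels, 3) electrode
    coordinates. Each electrode becomes a node whose features are its samples."""
    node_feats = trial
    d = np.linalg.norm(positions[:, None] - positions[None], axis=-1)
    adj = ((d < radius) & (d > 0)).astype(float)  # unweighted edges by proximity
    return node_feats, adj
```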

【8】 Regularization of Mixture Models for Robust Principal Graph Learning

Authors: Tony Bonnaire, Aurélien Decelle, Nabila Aghanim
Note: 12 pages, 6 figures
Link: https://arxiv.org/abs/2106.09035
Abstract: A regularized version of Mixture Models is proposed to learn a principal graph from a distribution of $D$-dimensional data points. In the particular case of manifold learning for ridge detection, we assume that the underlying manifold can be modeled as a graph structure acting like a topological prior for the Gaussian clusters, turning the problem into a maximum a posteriori estimation. Parameters of the model are iteratively estimated through an Expectation-Maximization procedure, making the learning of the structure computationally efficient with guaranteed convergence for any graph prior in polynomial time. We also embed in the formalism a natural way to make the algorithm robust to outliers of the pattern and to heteroscedasticity of the manifold sampling, coherently with the graph structure. The method uses a graph prior given by the minimum spanning tree, which we extend using random sub-samplings of the dataset to account for cycles that can be observed in the spatial distribution.
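The MST graph prior is straightforward to construct with standard tools; the sketch below omits the random sub-sampling extension that the paper uses to recover cycles.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_graph_prior(points):
    """Minimum spanning tree over the data points as a symmetric adjacency."""
    D = squareform(pdist(points))            # pairwise Euclidean distances
    T = minimum_spanning_tree(D).toarray()   # directed tree weights
    return (T + T.T) > 0                     # symmetric boolean adjacency
```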

Transformers (2 papers)

【1】 XCiT: Cross-Covariance Image Transformers

Authors: Alaaeldin El-Nouby, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, Armand Joulin, Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou
Affiliations: Facebook AI; Inria; Sorbonne University
Link: https://arxiv.org/abs/2106.09681
Abstract: Following their success in natural language processing, transformers have recently shown much promise for computer vision. The self-attention operation underlying transformers yields global interactions between all tokens, i.e., words or image patches, and enables flexible modelling of image data beyond the local interactions of convolutions. This flexibility, however, comes with a quadratic complexity in time and memory, hindering application to long sequences and high-resolution images. We propose a "transposed" version of self-attention that operates across feature channels rather than tokens, where the interactions are based on the cross-covariance matrix between keys and queries. The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens, and allows efficient processing of high-resolution images. Our cross-covariance image transformer (XCiT) is built upon XCA. It combines the accuracy of conventional transformers with the scalability of convolutional architectures. We validate the effectiveness and generality of XCiT by reporting excellent results on multiple vision benchmarks, including image classification and self-supervised feature learning on ImageNet-1k, object detection and instance segmentation on COCO, and semantic segmentation on ADE20k.
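The core operation is compact: attention is computed between feature channels, so the attention map is d_h x d_h instead of N x N and the cost is linear in the number of tokens. The module below is a simplified sketch following the abstract (the learnable temperature and token-dimension normalization follow the public XCiT design); details may differ from the official implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class XCA(nn.Module):
    """Cross-covariance attention: channels attend to channels."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.temp = nn.Parameter(torch.ones(num_heads, 1, 1))  # learnable temperature

    def forward(self, x):                                # x: (B, N, D)
        B, N, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (B, heads, head_dim, N): channels now play the role of tokens
        q, k, v = (t.reshape(B, N, self.num_heads, D // self.num_heads)
                    .permute(0, 2, 3, 1) for t in (q, k, v))
        q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.temp     # (B, h, d_h, d_h) map
        out = attn.softmax(dim=-1) @ v                   # (B, h, d_h, N)
        return self.proj(out.permute(0, 3, 1, 2).reshape(B, N, D))
```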

【2】 Multi-head or Single-head? An Empirical Comparison for Transformer Training

Authors: Liyuan Liu, Jialu Liu, Jiawei Han
Affiliations: University of Illinois at Urbana-Champaign; Google Research
Note: Work in progress
Link: https://arxiv.org/abs/2106.09650
Abstract: Multi-head attention plays a crucial role in the recent success of Transformer models, which leads to consistent performance improvements over conventional attention in various applications. The popular belief is that this effectiveness stems from the ability of jointly attending multiple positions. In this paper, we first demonstrate that jointly attending multiple positions is not a unique feature of multi-head attention, as multi-layer single-head attention also attends multiple positions and is more effective. Then, we suggest that the main advantage of multi-head attention is training stability, since it has fewer layers than single-head attention when attending the same number of positions. For example, a 24-layer 16-head Transformer (BERT-large) and a 384-layer single-head Transformer have the same total number of attention heads and roughly the same model size, while the multi-head one is significantly shallower. Meanwhile, we show that, with recent advances in deep learning, we can successfully stabilize the training of the 384-layer Transformer. As training difficulty is no longer a bottleneck, the substantially deeper single-head Transformer achieves consistent performance improvements without tuning hyper-parameters.
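The head-count parity behind this comparison is simple arithmetic:

```python
# A 24-layer, 16-head Transformer (BERT-large) and a 384-layer single-head
# Transformer have the same total number of attention heads.
layers_multi, heads_multi = 24, 16
layers_single, heads_single = 384, 1
assert layers_multi * heads_multi == layers_single * heads_single == 384
```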

GANs | Adversarial | Attacks | Generation (7 papers)

【1】 Adversarial Visual Robustness by Causal Intervention

Authors: Kaihua Tang, Mingyuan Tao, Hanwang Zhang
Affiliations: Nanyang Technological University; Alibaba Group
Note: Code is available (URL in the paper)
Link: https://arxiv.org/abs/2106.09534
Abstract: Adversarial training is the de facto most promising defense against adversarial examples. Yet, its passive nature inevitably prevents it from being immune to unknown attackers. To achieve a proactive defense, we need a more fundamental understanding of adversarial examples, beyond the popular bounded threat model. In this paper, we provide a causal viewpoint of adversarial vulnerability: the cause is the confounder ubiquitously existing in learning, and attackers are precisely exploiting the confounding effect. Therefore, a fundamental solution for adversarial robustness is causal intervention. As the confounder is unobserved in general, we propose to use an instrumental variable that achieves intervention without the need for confounder observation. We term our robust training method Causal intervention by instrumental Variable (CiiV). It has a differentiable retinotopic sampling layer and a consistency loss, which is stable and guaranteed not to suffer from gradient obfuscation. Extensive experiments on a wide spectrum of attackers and settings applied to the MNIST, CIFAR-10, and mini-ImageNet datasets empirically demonstrate that CiiV is robust to adaptive attacks.

【2】 Unsupervised Training Data Generation of Handwritten Formulas using Generative Adversarial Networks with Self-Attention

Authors: Matthias Springstein, Eric Müller-Budack, Ralph Ewerth
Affiliations: TIB Leibniz Information Centre for Science and Technology, Hannover, Germany; L3S Research Center, Leibniz University Hannover
Note: Accepted for publication in: ACM International Conference on Multimedia Retrieval (ICMR) Workshop 2021
Link: https://arxiv.org/abs/2106.09432
Abstract: The recognition of handwritten mathematical expressions in images and video frames is a difficult and still unsolved problem. Deep convolutional neural networks are in principle a promising approach, but typically require a large amount of labeled training data. However, such a large training dataset does not exist for the task of handwritten formula recognition. In this paper, we introduce a system that creates a large set of synthesized training examples of mathematical expressions derived from LaTeX documents. For this purpose, we propose a novel attention-based generative adversarial network to translate rendered equations to handwritten formulas. The datasets generated by this approach contain hundreds of thousands of formulas, making them ideal for pretraining or for the design of more complex models. We evaluate our synthesized dataset and the recognition approach on the CROHME 2014 benchmark dataset. Experimental results demonstrate the feasibility of the approach.

【3】 Class Balancing GAN with a Classifier in the Loop

Authors: Harsh Rangwani, Konda Reddy Mopuri, R. Venkatesh Babu
Affiliations: Indian Institute of Science, Bengaluru; Indian Institute of Technology Tirupati
Note: UAI 2021
Link: https://arxiv.org/abs/2106.09402
Abstract: Generative Adversarial Networks (GANs) have swiftly evolved to imitate increasingly complex image distributions. However, the majority of the developments focus on the performance of GANs on balanced datasets. We find that existing GANs and their training regimes, which work well on balanced datasets, fail to be effective in the case of imbalanced (i.e., long-tailed) datasets. In this work we introduce a novel, theoretically motivated Class Balancing regularizer for training GANs. Our regularizer makes use of the knowledge from a pre-trained classifier to ensure balanced learning of all the classes in the dataset. This is achieved by modelling the effective class frequency based on the exponential forgetting observed in neural networks, and encouraging the GAN to focus on underrepresented classes. We demonstrate the utility of our regularizer in learning representations for long-tailed distributions by achieving better performance than existing approaches over multiple datasets. Specifically, when applied to an unconditional GAN, it improves the FID from 13.03 to 9.01 on the long-tailed iNaturalist-2019 dataset.
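One way to picture the effective class frequency under exponential forgetting: recent samples of a class count for more than old ones, and rare classes end up with a small effective frequency and hence a larger focus weight. The decay rate and update rule below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def effective_class_frequency(counts_per_step, beta=0.99):
    """Exponentially forgotten class counts over training steps."""
    eff = np.zeros(len(counts_per_step[0]))
    for counts in counts_per_step:               # samples of each class this step
        eff = beta * eff + (1.0 - beta) * np.asarray(counts, dtype=float)
    return eff

# the rarer class 1 gets a smaller effective frequency, hence a larger weight
weights = 1.0 / (effective_class_frequency([[9, 1], [8, 2], [9, 1]]) + 1e-8)
```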

【4】 Invisible for both Camera and LiDAR: Security of Multi-Sensor Fusion based Perception in Autonomous Driving Under Physical-World Attacks

Authors: Yulong Cao*, Ningfei Wang*, Chaowei Xiao*, Dawei Yang*, Jin Fang, Ruigang Yang, Qi Alfred Chen, Mingyan Liu, Bo Li (* equal contribution)
Affiliations: NVIDIA Research; Arizona State University; Inceptio; Baidu Research and National Engineering Laboratory of Deep Learning Technology and Application, China
Note: Accepted by IEEE S&P 2021
Link: https://arxiv.org/abs/2106.09249
Abstract: In Autonomous Driving (AD) systems, perception is both security and safety critical. Despite various prior studies on its security issues, all of them only consider attacks on camera- or LiDAR-based AD perception alone. However, production AD systems today predominantly adopt a Multi-Sensor Fusion (MSF) based design, which in principle can be more robust against these attacks under the assumption that not all fusion sources are (or can be) attacked at the same time. In this paper, we present the first study of security issues of MSF-based perception in AD systems. We directly challenge the basic MSF design assumption above by exploring the possibility of attacking all fusion sources simultaneously. This allows us for the first time to understand how much security guarantee MSF can fundamentally provide as a general defense strategy for AD perception. We formulate the attack as an optimization problem to generate a physically-realizable, adversarial 3D-printed object that misleads an AD system to fail in detecting it and thus crash into it. We propose a novel attack pipeline that addresses two main design challenges: (1) non-differentiable target camera and LiDAR sensing systems, and (2) non-differentiable cell-level aggregated features popularly used in LiDAR-based AD perception. We evaluate our attack on MSF included in representative open-source industry-grade AD systems in real-world driving scenarios. Our results show that the attack achieves over 90% success rate across different object types and MSF algorithms. Our attack is also found stealthy, robust to victim positions, transferable across MSF algorithms, and physical-world realizable after being 3D-printed and captured by LiDAR and camera devices. To concretely assess the end-to-end safety impact, we further perform simulation evaluation and show that it can cause a 100% vehicle collision rate for an industry-grade AD system.

【5】 Evaluating the Robustness of Bayesian Neural Networks Against Different Types of Attacks

Authors: Yutian Pang, Sheng Cheng, Jueming Hu, Yongming Liu
Affiliations: Arizona State University, Tempe, AZ
Link: https://arxiv.org/abs/2106.09223
Abstract: To evaluate the robustness gain of Bayesian neural networks on image classification tasks, we perform input perturbations and adversarial attacks on state-of-the-art Bayesian neural networks, with a benchmark CNN model as reference. The attacks are selected to simulate signal interference and cyberattacks on CNN-based machine learning systems. The results show that a Bayesian neural network achieves significantly higher robustness against adversarial attacks generated against a deterministic neural network model, without adversarial training. The Bayesian posterior can act as a safety precursor of ongoing malicious activities. Furthermore, we show that a stochastic classifier placed after a deterministic CNN extractor provides sufficient robustness enhancement, rather than a stochastic feature extractor placed before the classifier. This advises utilizing stochastic layers when building decision-making pipelines within a safety-critical domain.

【6】 Automatic Construction of Evaluation Suites for Natural Language Generation Datasets

Authors: Simon Mille, Kaustubh D. Dhole, Saad Mahamood, Laura Perez-Beltrachini, Varun Gangal, Mihir Kale, Emiel van Miltenburg, Sebastian Gehrmann
Affiliations: Universitat Pompeu Fabra; Amelia Science, IPsoft R&D; trivago N.V.; University of Edinburgh; Carnegie Mellon University; Google Research; Tilburg University
Link: https://arxiv.org/abs/2106.09069
Abstract: Machine learning approaches applied to NLP are often evaluated by summarizing their performance in a single number, for example, accuracy. Since most test sets are constructed as an i.i.d. sample from the overall data, this approach overly simplifies the complexity of language and encourages overfitting to the head of the data distribution. As such, rare language phenomena or text about underrepresented groups are not equally included in the evaluation. To encourage more in-depth model analyses, researchers have proposed the use of multiple test sets, also called challenge sets, that assess specific capabilities of a model. In this paper, we develop a framework based on this idea which is able to generate controlled perturbations and identify subsets in text-to-scalar, text-to-text, or data-to-text settings. By applying this framework to the GEM generation benchmark, we propose an evaluation suite made of 80 challenge sets, demonstrate the kinds of analyses that it enables, and shed light on the limits of current generation models.

【7】 Localized Uncertainty Attacks

Authors: Ousmane Amadou Dia, Theofanis Karaletsos, Caner Hazirbas, Cristian Canton Ferrer, Ilknur Kaynar Kabul, Erik Meijer
Affiliations: Facebook
Note: CVPR 2021 Workshop on Adversarial Machine Learning in Computer Vision
Link: https://arxiv.org/abs/2106.09222
Abstract: The susceptibility of deep learning models to adversarial perturbations has stirred renewed attention in adversarial examples, resulting in a number of attacks. However, most of these attacks fail to encompass a large spectrum of adversarial perturbations that are imperceptible to humans. In this paper, we present localized uncertainty attacks, a novel class of threat models against deterministic and stochastic classifiers. Under this threat model, we create adversarial examples by perturbing only regions in the inputs where a classifier is uncertain. To find such regions, we utilize the predictive uncertainty of the classifier when the classifier is stochastic, or we learn a surrogate model to amortize the uncertainty when it is deterministic. Unlike $\ell_p$-ball or functional attacks, which perturb inputs indiscriminately, our targeted changes can be less perceptible. When considered under our threat model, these attacks still produce strong adversarial examples, with the examples retaining a greater degree of similarity to the inputs.
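A sketch of the threat model: perturb only where the classifier is uncertain. Here uncertainty is MC-dropout predictive entropy, the "uncertain region" is the set of pixels to which that entropy is most sensitive, and the step is a masked FGSM update; all three are illustrative stand-ins rather than the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F

def localized_uncertainty_attack(model, x, y, eps=8/255, frac=0.1, n_mc=8):
    """x: (B, C, H, W) images in [0, 1]; model: stochastic classifier with dropout."""
    x = x.clone().requires_grad_(True)
    model.train()                                     # keep dropout active for MC samples
    probs = torch.stack([F.softmax(model(x), -1) for _ in range(n_mc)]).mean(0)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1).sum()
    entropy.backward()
    sens = x.grad.abs()                               # pixel-wise sensitivity of uncertainty
    thresh = sens.flatten(1).quantile(1 - frac, dim=1).view(-1, 1, 1, 1)
    mask = (sens >= thresh).float()                   # keep the top-frac 'uncertain' pixels
    x2 = x.detach().requires_grad_(True)              # one FGSM step, restricted to the mask
    F.cross_entropy(model(x2), y).backward()
    return (x2 + eps * mask * x2.grad.sign()).clamp(0, 1).detach()
```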

Semi-/Weakly-/Un-/Supervised | Uncertainty | Active Learning (7 papers)

【1】 Gone Fishing: Neural Active Learning with Fisher Embeddings

Authors: Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Sham Kakade
Affiliations: Microsoft Research NYC; University of Washington
Link: https://arxiv.org/abs/2106.09675
Abstract: There is an increasing need for effective active learning algorithms that are compatible with deep neural networks. While there are many classic, well-studied sample selection methods, the non-convexity and varying internal representation of neural models make it unclear how to extend these approaches. This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks that addresses these concerns. BAIT draws inspiration from the theoretical analysis of maximum likelihood estimators (MLE) for parametric models. It selects batches of samples by optimizing a bound on the MLE error in terms of the Fisher information, which we show can be implemented efficiently at scale by exploiting linear-algebraic structure especially amenable to execution on modern hardware. Our experiments show that BAIT outperforms the previous state of the art on both classification and regression problems, and is flexible enough to be used with a variety of model architectures.
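The flavor of Fisher-information batch selection can be conveyed by a greedy rule that repeatedly adds the point whose rank-one information gain is largest, with cheap Sherman-Morrison updates. This D-optimal-style sketch over (assumed) per-sample gradient embeddings simplifies BAIT's actual MLE-error-bound objective.

```python
import numpy as np

def greedy_fisher_selection(embs, k, lam=1.0):
    """embs: (n, d) per-sample gradient embeddings. Greedily maximizes
    log det of the accumulated (regularized) information matrix."""
    n, d = embs.shape
    M_inv = np.eye(d) / lam                 # inverse of the information matrix
    chosen = []
    for _ in range(k):
        # gain of adding x is log(1 + x^T M^{-1} x) (matrix determinant lemma)
        gains = np.einsum('nd,dk,nk->n', embs, M_inv, embs)
        gains[chosen] = -np.inf             # never pick a point twice
        i = int(np.argmax(gains))
        chosen.append(i)
        x = embs[i]
        Mx = M_inv @ x
        M_inv -= np.outer(Mx, Mx) / (1.0 + x @ Mx)   # Sherman-Morrison update
    return chosen
```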

【2】 A Self-supervised Method for Entity Alignment

Authors: Xiao Liu, Haoyun Hong, Xinghao Wang, Zeyi Chen, Evgeny Kharlamov, Yuxiao Dong, Jie Tang
Affiliations: Tsinghua University; University of Oslo; Facebook AI
Link: https://arxiv.org/abs/2106.09395
Abstract: Entity alignment, aiming to identify equivalent entities across different knowledge graphs (KGs), is a fundamental problem for constructing large-scale KGs. Over the course of its development, supervision has been considered necessary for accurate alignments. Inspired by the recent progress of self-supervised learning, we explore the extent to which we can get rid of supervision for entity alignment. Existing supervised methods for this task focus on pulling each pair of positive (labeled) entities close to each other. However, our analysis suggests that the learning of entity alignment can actually benefit more from pushing sampled (unlabeled) negatives far away than from pulling positive aligned pairs close. We present SelfKG, which leverages this discovery to design a contrastive learning strategy across two KGs. Extensive experiments on benchmark datasets demonstrate that SelfKG without supervision can match or achieve comparable results with state-of-the-art supervised baselines. The performance of SelfKG demonstrates that self-supervised learning offers great potential for entity alignment in KGs.

【3】 Unsupervised Path Representation Learning with Curriculum Negative Sampling

Authors: Sean Bin Yang, Chenjuan Guo, Jilin Hu, Jian Tang, Bin Yang
Affiliations: Department of Computer Science, Aalborg University, Denmark; Mila-Quebec AI Institute, HEC Montreal, Canada; CIFAR AI Research Chair
Note: This paper has been accepted by IJCAI-21
Link: https://arxiv.org/abs/2106.09373
Abstract: Path representations are critical in a variety of transportation applications, such as estimating path ranking in path recommendation systems and estimating path travel time in navigation systems. Existing studies often learn task-specific path representations in a supervised manner, which requires a large amount of labeled training data and generalizes poorly to other tasks. We propose an unsupervised learning framework, Path InfoMax (PIM), to learn generic path representations that work for different downstream tasks. We first propose a curriculum negative sampling method that, for each input path, generates a small number of negative paths, following the principles of curriculum learning. Next, PIM employs mutual information maximization to learn path representations from both a global and a local view. In the global view, PIM distinguishes the representations of the input paths from those of the negative paths. In the local view, PIM distinguishes the input path representations from the representations of the nodes that appear only in the negative paths. This enables the learned path representations to encode both global and local information at different scales. Extensive experiments on two downstream tasks, ranking score estimation and travel time estimation, using two road network datasets suggest that PIM significantly outperforms other unsupervised methods and can also be used as a pre-training method to enhance supervised path representation learning.

【4】 LiRA: Learning Visual Speech Representations from Audio through Self-supervision

Authors: Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Björn W. Schuller, Maja Pantic
Affiliations: iBUG Group, Imperial College London, UK; Facebook London, UK; Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany
Note: Accepted for publication at Interspeech 2021
Link:https://arxiv.org/abs/2106.09171
Abstract: The large amount of audiovisual content being shared online today has drawn substantial attention to the prospect of audiovisual self-supervised learning. Recent works have focused on each of these modalities separately, while others have attempted to model both simultaneously in a cross-modal fashion. However, comparatively little attention has been given to leveraging one modality as a training objective to learn from the other. In this work, we propose Learning visual speech Representations from Audio via self-supervision (LiRA). Specifically, we train a ResNet+Conformer model to predict acoustic features from unlabelled visual speech. We find that this pre-trained model can be leveraged towards word-level and sentence-level lip-reading through feature extraction and fine-tuning experiments. We show that our approach significantly outperforms other self-supervised methods on the Lip Reading in the Wild (LRW) dataset and achieves state-of-the-art performance on Lip Reading Sentences 2 (LRS2) using only a fraction of the total labelled data.

【5】 A Survey on Semi-Supervised Learning for Delayed Partially Labelled Data Streams

Authors: Heitor Murilo Gomes, Maciej Grzenda, Rodrigo Mello, Jesse Read, Minh Huong Le Nguyen, Albert Bifet
Affiliations: AI Institute, University of Waikato; Warsaw University of Technology; University of São Paulo
Link: https://arxiv.org/abs/2106.09170
Abstract: Unlabelled data appear in many domains and are particularly relevant to streaming applications, where even though data is abundant, labelled data is rare. To address the learning problems associated with such data, one can ignore the unlabelled data and focus only on the labelled data (supervised learning); use the labelled data and attempt to leverage the unlabelled data (semi-supervised learning); or assume some labels will be available on request (active learning). The first approach is the simplest, yet the amount of labelled data available will limit the predictive performance. The second relies on finding and exploiting the underlying characteristics of the data distribution. The third depends on an external agent to provide the required labels in a timely fashion. This survey pays special attention to methods that leverage unlabelled data in a semi-supervised setting. We also discuss the delayed labelling issue, which impacts both fully supervised and semi-supervised methods. We propose a unified problem setting, discuss the learning guarantees and existing methods, and explain the differences between related problem settings. Finally, we review the current benchmarking practices and propose adaptations to enhance them.

【6】 SPeCiaL: Self-Supervised Pretraining for Continual Learning

Authors: Lucas Caccia, Joelle Pineau
Affiliations: McGill University; Facebook AI Research
Link: https://arxiv.org/abs/2106.09065
Abstract: This paper presents SPeCiaL: a method for unsupervised pretraining of representations tailored for continual learning. Our approach devises a meta-learning objective that differentiates through a sequential learning process. Specifically, we train a linear model over the representations to match different augmented views of the same image together, each view presented sequentially. The linear model is then evaluated both on its ability to classify images it just saw, and on images from previous iterations. This gives rise to representations that favor quick knowledge retention with minimal forgetting. We evaluate SPeCiaL in the Continual Few-Shot Learning setting, and show that it can match or outperform other supervised pretraining approaches.

【7】 Unsupervised Video Prediction from a Single Frame by Estimating 3D Dynamic Scene Structure

Authors: Paul Henderson, Christoph H. Lampert, Bernd Bickel
Affiliations: Institute of Science and Technology (IST) Austria
Link: https://arxiv.org/abs/2106.09051
Abstract: Our goal in this work is to generate realistic videos given just one initial frame as input. Existing unsupervised approaches to this task do not consider the fact that a video typically shows a 3D environment, which should remain coherent from frame to frame even as the camera and objects move. We address this by developing a model that first estimates the latent 3D structure of the scene, including the segmentation of any moving objects. It then predicts future frames by simulating the object and camera dynamics and rendering the resulting views. Importantly, it is trained end-to-end using only the unsupervised objective of predicting future frames, without any 3D information or segmentation annotations. Experiments on two challenging datasets of natural videos show that our model can estimate 3D structure and motion segmentation from a single frame, and hence generate plausible and varied predictions.

Transfer | Zero/Few/One-Shot | Adaptation (5 papers)

【1】 LoRA: Low-Rank Adaptation of Large Language Models

Authors: Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Weizhu Chen
Affiliations: Microsoft Corporation
Link: https://arxiv.org/abs/2106.09685
Abstract: The dominant paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, conventional fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example, deploying many independent instances of fine-tuned models, each with 175B parameters, is extremely expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. For GPT-3, LoRA can reduce the number of trainable parameters by 10,000 times and the computation hardware requirement by 3 times compared to full fine-tuning. LoRA performs on par with or better than fine-tuning in model quality on both GPT-3 and GPT-2, despite having fewer trainable parameters, a higher training throughput, and no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release our implementation for GPT-2 at https://github.com/microsoft/LoRA.
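The mechanism is compact enough to sketch in a few lines (the official implementation lives at the repository linked above). `r` and `alpha` follow the paper's notation; the initialization details here are the commonly used recipe and should be treated as assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update B @ A,
    so only r * (d_in + d_out) parameters are trained per adapted layer."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False               # freeze the pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no-op at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```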

【2】 SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies

Authors: Linxi Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, Anima Anandkumar
Affiliations: The University of Texas at Austin; California Institute of Technology
Note: ICML 2021. Project website: https://linxifan.github.io/secant-site/
Link: https://arxiv.org/abs/2106.09678
Abstract: Generalization has been a long-standing challenge for reinforcement learning (RL). Visual RL, in particular, can be easily distracted by irrelevant factors in high-dimensional observation space. In this work, we consider robust policy learning which targets zero-shot generalization to unseen visual environments with large distributional shift. We propose SECANT, a novel self-expert cloning technique that leverages image augmentation in two stages to decouple robust representation learning from policy optimization. Specifically, an expert policy is first trained by RL from scratch with weak augmentations. A student network then learns to mimic the expert policy by supervised learning with strong augmentations, making its representation more robust against visual variations compared to the expert. Extensive experiments demonstrate that SECANT significantly advances the state of the art in zero-shot generalization across 4 challenging domains. Our average reward improvements over prior SOTAs are: DeepMind Control (+26.5%), robotic manipulation (+337.8%), vision-based autonomous driving (+47.7%), and indoor object navigation (+15.8%). Code release and video are available at https://linxifan.github.io/secant-site/.
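Once the expert is trained, the second stage reduces to a simple imitation update. Function names and the regression loss below are illustrative assumptions about the interface, not the released code.

```python
import torch
import torch.nn.functional as F

def secant_student_step(student, expert, obs, strong_aug, optimizer):
    """One student update: the frozen RL expert (trained under weak augmentation)
    labels the observation; the student imitates it under strong augmentation."""
    with torch.no_grad():
        target = expert(obs)                # expert acts on the weakly augmented view
    pred = student(strong_aug(obs))         # student sees a heavily augmented view
    loss = F.mse_loss(pred, target)         # supervised behavior cloning
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```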

【3】 Adaptive Low-Rank Regularization with Damping Sequences to Restrict Lazy Weights in Deep Networks

Authors: Mohammad Mahdi Bejani, Mehdi Ghatee
Affiliations: Department of Mathematics and Computer Science, Amirkabir University of Technology
Note: Preprint of a paper submitted to Neural Networks; 27 pages, 4 tables, 6 figures. arXiv admin note: text overlap with arXiv:2005.01995
Link: https://arxiv.org/abs/2106.09677
Abstract: Overfitting is one of the critical problems in deep neural networks. Many regularization schemes try to prevent overfitting blindly, but they decrease the convergence speed of training algorithms. Adaptive regularization schemes can address overfitting more intelligently; they usually do not affect the entire network's weights. This paper detects the subset of weighting layers that causes overfitting, where overfitting is recognized via matrix and tensor condition numbers. An adaptive regularization scheme, entitled Adaptive Low-Rank (ALR), is proposed that converges a subset of the weighting layers to their Low-Rank Factorization (LRF) by minimizing a new Tikhonov-based loss function. ALR also encourages lazy weights to contribute to the regularization as epochs grow, using a damping sequence to increase the layer-selection likelihood in the last generations. Thus, before training accuracy falls, ALR reduces the lazy weights and regularizes the network substantially. Experimental results show that ALR regularizes deep networks well, with high training speed and low resource usage.

【4】 Transductive Few-Shot Learning: Clustering is All You Need?

Authors: Imtiaz Masud Ziko, Malik Boudiaf, Jose Dolz, Eric Granger, Ismail Ben Ayed
Link: https://arxiv.org/abs/2106.09516
Abstract: We investigate a general formulation for clustering and transductive few-shot learning, which integrates prototype-based objectives, Laplacian regularization, and supervision constraints from a few labeled data points. We propose a concave-convex relaxation of the problem, and derive a computationally efficient block-coordinate bound optimizer, with convergence guarantee. At each iteration, our optimizer computes independent (parallel) updates for each point-to-cluster assignment; therefore, it could be trivially distributed for large-scale clustering and few-shot tasks. Furthermore, we provide a thorough convergence analysis based on point-to-set maps. We report comprehensive clustering and few-shot learning experiments over various data sets, showing that our method yields competitive performance, in terms of accuracy and optimization quality, while scaling up to large problems. Using standard training on the base classes, without resorting to complex meta-learning and episodic-training strategies, our approach outperforms state-of-the-art few-shot methods by significant margins, across various models, settings, and data sets. Surprisingly, we found that even standard clustering procedures (e.g., K-means), which correspond to particular, non-regularized cases of our general model, already achieve competitive performance in comparison to the state of the art in few-shot learning. These surprising results point to the limitations of the current few-shot benchmarks, and question the viability of a large body of convoluted few-shot learning techniques in the recent literature.
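The surprising baseline result (standard clustering is already competitive) corresponds to something as simple as K-means with the labelled support points clamped to their classes. A minimal sketch over precomputed features, assuming each class has at least one support example:

```python
import numpy as np

def constrained_kmeans(feats, labels, n_way, iters=20):
    """feats: (n, d); labels: (n,) with class ids for support points and -1 for
    query points. K-means where support assignments stay fixed to their labels."""
    sup = labels >= 0
    centers = np.stack([feats[labels == c].mean(0) for c in range(n_way)])
    assign = np.full(len(feats), -1)
    for _ in range(iters):
        dists = ((feats[:, None] - centers[None]) ** 2).sum(-1)
        assign = dists.argmin(1)
        assign[sup] = labels[sup]                    # supervision constraint
        centers = np.stack([feats[assign == c].mean(0) if (assign == c).any()
                            else centers[c] for c in range(n_way)])
    return assign
```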

【5】 ASR Adaptation for E-commerce Chatbots using Cross-Utterance Context and Multi-Task Language Modeling

Authors: Ashish Shenoy, Sravan Bodapati, Katrin Kirchhoff
Affiliations: Amazon AWS AI, USA
Note: Accepted at the ACL-IJCNLP 2021 Workshop on e-Commerce and NLP (ECNLP)
Link: https://arxiv.org/abs/2106.09532
Abstract: Automatic Speech Recognition (ASR) robustness to slot entities is critical in e-commerce voice assistants that involve monetary transactions and purchases. Along with effective domain adaptation, it is intuitive that cross-utterance contextual cues play an important role in disambiguating domain-specific content words from speech. In this paper, we investigate various techniques to improve the contextualization, content-word robustness, and domain adaptation of a Transformer-XL neural language model (NLM) used to rescore ASR N-best hypotheses. To improve contextualization, we utilize turn-level dialogue acts along with cross-utterance context carry-over. Additionally, to adapt our domain-general NLM towards e-commerce on the fly, we use embeddings derived from a masked LM fine-tuned on in-domain data. Finally, to improve robustness towards in-domain content words, we propose a multi-task model that can jointly perform content-word detection and language modeling. Compared to a non-contextual LSTM LM baseline, our best-performing NLM rescorer yields a content WER reduction of 19.2% on an e-commerce audio test set and a slot-labeling F1 improvement of 6.4%.
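The rescoring step itself is a small wrapper around the LM: interpolate the first-pass ASR score with the external LM log-probability and re-rank. The interpolation weight is an assumed tunable, not a value from the paper.

```python
def rescore_nbest(nbest, lm_logprob, lam=0.5):
    """nbest: list of (text, asr_score) pairs; lm_logprob: callable text -> log p(text).
    Returns the hypothesis with the best interpolated score."""
    scored = [(text, asr + lam * lm_logprob(text)) for text, asr in nbest]
    return max(scored, key=lambda pair: pair[1])[0]
```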

Reinforcement Learning (6 papers)

【1】 Modelling resource allocation in uncertain system environment through deep reinforcement learning

Authors: Neel Gandhi, Shakti Mishra
Affiliations: School of Technology, Pandit Deendayal Energy University, Gandhinagar, Gujarat
Note: Accepted at IRMAS'21
Link: https://arxiv.org/abs/2106.09461
Abstract: Reinforcement Learning has applications in the fields of mechatronics, robotics, and other resource-constrained control systems. The problem of resource allocation is primarily solved using traditional predefined techniques and modern deep learning methods. The drawback of predefined and most deep learning methods for resource allocation is their failure to meet requirements in cases of an uncertain system environment. Using deep reinforcement learning, we can approach the problem of resource allocation in an uncertain system environment while following certain criteria; reinforcement learning also has the ability to adapt to a new uncertain environment over a prolonged period of time. The paper provides a detailed comparative analysis of various deep reinforcement learning methods, applying different components to modify the architecture of reinforcement learning, including noisy layers, prioritized replay, bagging, duelling networks, and other related combinations, to obtain improvements in performance and reductions in computational cost. The paper identifies that the problem of resource allocation in an uncertain environment can be effectively solved using a Noisy Bagging duelling double deep Q-network, achieving an efficiency of 97.7% by maximizing reward with significant exploration in the given simulated environment for resource allocation.
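Of the components compared, the noisy layer is the most self-contained to sketch. Below is a factorised-Gaussian noisy linear layer in the style of NoisyNets; the initialisation constants follow the commonly used recipe and are not values taken from this paper.

```python
import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    """Linear layer with learnable weight means and noise scales: the learned
    sigma terms drive stochastic exploration instead of epsilon-greedy."""
    def __init__(self, in_f, out_f, sigma0=0.5):
        super().__init__()
        bound = in_f ** -0.5
        self.mu_w = nn.Parameter(torch.empty(out_f, in_f).uniform_(-bound, bound))
        self.sigma_w = nn.Parameter(torch.full((out_f, in_f), sigma0 * bound))
        self.mu_b = nn.Parameter(torch.zeros(out_f))
        self.sigma_b = nn.Parameter(torch.full((out_f,), sigma0 * bound))

    @staticmethod
    def _f(x):                            # factorised-noise transform
        return x.sign() * x.abs().sqrt()

    def forward(self, x):
        eps_in = self._f(torch.randn(self.mu_w.shape[1]))
        eps_out = self._f(torch.randn(self.mu_w.shape[0]))
        w = self.mu_w + self.sigma_w * torch.outer(eps_out, eps_in)
        b = self.mu_b + self.sigma_b * eps_out
        return x @ w.T + b
```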

【2】 CROP: Certifying Robust Policies for Reinforcement Learning through Functional Smoothing 标题:CROP:通过函数平滑认证强化学习的鲁棒策略

作者:Fan Wu,Linyi Li,Zijian Huang,Yevgeniy Vorobeychik,Ding Zhao,Bo Li 机构:University of Illinois at Urbana-Champaign, Illinois, USA, Washington University in St. Louis, Missouri, USA, Carnegie Mellon University, Pennsylvania, USA 备注:25 pages, 7 figures 链接:https://arxiv.org/abs/2106.09292 摘要:我们提出了首个针对对抗性状态扰动为强化学习认证鲁棒策略的框架CROP(Certifying Robust Policies)。我们提出了两类特定的鲁棒性认证准则:每状态动作的鲁棒性和累积奖励的下界。具体而言,我们开发了一种局部平滑算法,其策略来自在每个所遇状态上用高斯噪声平滑的Q函数,用以保证沿该轨迹所采取动作的鲁棒性。接下来,我们开发了一种全局平滑算法,用于认证有限时域累积奖励在对抗性状态扰动下的鲁棒性。最后,我们提出了一种利用自适应搜索的局部平滑方法,以获得紧的奖励认证界。我们使用所提出的RL鲁棒性认证框架,在两个有代表性的Atari游戏上评估了六种先前被证明能产生经验鲁棒RL的方法,包括对抗性训练和几种形式的正则化。我们证明了RegPGD、RegCVX和RadialRL在这些算法中取得了较高的认证鲁棒性。此外,通过对这些算法进行对抗性攻击评估,我们证明了我们的认证通常是紧的。 摘要:We present the first framework of Certifying Robust Policies for reinforcement learning (CROP) against adversarial state perturbations. We propose two particular types of robustness certification criteria: robustness of per-state actions and lower bound of cumulative rewards. Specifically, we develop a local smoothing algorithm which uses a policy derived from Q-functions smoothed with Gaussian noise over each encountered state to guarantee the robustness of actions taken along this trajectory. Next, we develop a global smoothing algorithm for certifying the robustness of a finite-horizon cumulative reward under adversarial state perturbations. Finally, we propose a local smoothing approach which makes use of adaptive search in order to obtain tight certification bounds for reward. We use the proposed RL robustness certification framework to evaluate six methods that have previously been shown to yield empirically robust RL, including adversarial training and several forms of regularization, on two representative Atari games. We show that RegPGD, RegCVX, and RadialRL achieve high certified robustness among these. Furthermore, we demonstrate that our certifications are often tight by evaluating these algorithms against adversarial attacks.
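下面是"用高斯噪声平滑Q函数后选动作"这一局部平滑思想的蒙特卡洛示意(假设性草图,与论文中带认证保证的实现不同;q_fn、sigma、n 均为示意接口/参数):

```python
import numpy as np

def smoothed_action(q_fn, state: np.ndarray, sigma: float = 0.1,
                    n: int = 1000, seed: int = 0) -> int:
    """对状态加高斯噪声、平均Q值后取argmax,近似平滑后的Q函数。"""
    rng = np.random.default_rng(seed)
    noisy_states = state[None, :] + sigma * rng.standard_normal((n, state.size))
    q_values = np.stack([q_fn(s) for s in noisy_states])  # (n, n_actions)
    return int(q_values.mean(axis=0).argmax())

# 玩具Q函数:两个动作,Q随状态第一维变化
toy_q = lambda s: np.array([s[0], 1.0 - s[0]])
print(smoothed_action(toy_q, np.array([0.7, 0.0])))  # 期望输出动作0
```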

【3】 Mungojerrie: Reinforcement Learning of Linear-Time Objectives 标题:Mungojerrie:线性时间目标的强化学习

作者:Ernst Moritz Hahn,Mateo Perez,Sven Schewe,Fabio Somenzi,Ashutosh Trivedi,Dominik Wojtczak 机构: University of Twente, The Netherlands, University of Colorado Boulder, USA, University of Liverpool, UK 备注:Mungojerrie is available at this https URL 链接:https://arxiv.org/abs/2106.09161 摘要:强化学习在没有系统先验知识的情况下合成控制器。在每个时间步都会给予奖励,控制器优化这些奖励的折扣总和。应用这类算法需要设计一个奖励方案,而这通常是手工完成的:设计者必须确保他们的意图被准确捕捉,这可能并非易事,而且容易出错。替代这种手工编程(类似于直接用汇编编程)的一种方法是,用形式语言指定目标,并将其"编译"为奖励方案。Mungojerrie($\href{https://plv.colorado.edu/mungojerrie/}{plv.colorado.edu/mungojerrie}$)是一款在有限模型上测试$\omega$-正则目标奖励方案的工具。该工具包含强化学习算法和一个概率模型检查器。Mungojerrie支持以PRISM指定的模型和以HOA指定的$\omega$-自动机。 摘要:Reinforcement learning synthesizes controllers without prior knowledge of the system. At each timestep, a reward is given. The controllers optimize the discounted sum of these rewards. Applying this class of algorithms requires designing a reward scheme, which is typically done manually. The designer must ensure that their intent is accurately captured. This may not be trivial, and is prone to error. An alternative to this manual programming, akin to programming directly in assembly, is to specify the objective in a formal language and have it "compiled" to a reward scheme. Mungojerrie ($\href{https://plv.colorado.edu/mungojerrie/}{plv.colorado.edu/mungojerrie}$) is a tool for testing reward schemes for $\omega$-regular objectives on finite models. The tool contains reinforcement learning algorithms and a probabilistic model checker. Mungojerrie supports models specified in PRISM and $\omega$-automata specified in HOA.

【4】 Contrastive Reinforcement Learning of Symbolic Reasoning Domains 标题:符号推理域的对比强化学习

作者:Gabriel Poesia,WenXin Dong,Noah Goodman 机构:Stanford University 链接:https://arxiv.org/abs/2106.09146 摘要:抽象符号推理是人类智能的重要组成部分,数学、逻辑等领域都需要这种能力。面向这些领域的求解器有着重要的应用,特别是在计算机辅助教育中。但对机器学习算法而言,学习求解符号问题是一个挑战。现有模型要么从人类解答中学习,要么使用手工设计的特征,这使得它们在新领域的应用代价高昂。在本文中,我们转而将符号领域视为简单的环境:状态和动作以非结构化文本形式给出,二元奖励表示问题是否得到解决。这种灵活的设定使得指定新领域变得容易,但搜索和规划随之变得困难。我们介绍了四个受美国数学共同核心课程启发的环境,并观察到现有的强化学习基线在其上表现不佳。随后我们提出了一种新的学习算法:对比策略学习(ConPoLe),它显式优化InfoNCE损失,该损失是当前状态与通往解的路径上后继状态之间互信息的下界。ConPoLe成功解决了全部四个领域。此外,ConPoLe学到的问题表示能够准确预测真实数学课程中题目的类别。我们的结果为符号领域的强化学习以及其在数学教育中的应用提示了新的方向。 摘要:Abstract symbolic reasoning, as required in domains such as mathematics and logic, is a key component of human intelligence. Solvers for these domains have important applications, especially to computer-assisted education. But learning to solve symbolic problems is challenging for machine learning algorithms. Existing models either learn from human solutions or use hand-engineered features, making them expensive to apply in new domains. In this paper, we instead consider symbolic domains as simple environments where states and actions are given as unstructured text, and binary rewards indicate whether a problem is solved. This flexible setup makes it easy to specify new domains, but search and planning become challenging. We introduce four environments inspired by the Mathematics Common Core Curriculum, and observe that existing Reinforcement Learning baselines perform poorly. We then present a novel learning algorithm, Contrastive Policy Learning (ConPoLe) that explicitly optimizes the InfoNCE loss, which lower bounds the mutual information between the current state and next states that continue on a path to the solution. ConPoLe successfully solves all four domains. Moreover, problem representations learned by ConPoLe enable accurate prediction of the categories of problems in a real mathematics curriculum. Our results suggest new directions for reinforcement learning in symbolic domains, as well as applications to mathematics education.
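下面给出InfoNCE损失的一个标准最小实现示意(非ConPoLe官方代码;温度等超参数为示意假设):把正样本相对全部候选的交叉熵作为互信息下界的代理进行优化。

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query: torch.Tensor, positive: torch.Tensor,
                  negatives: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE:query/positive 形状 (B, D),negatives 形状 (B, K, D)。"""
    q = F.normalize(query, dim=-1)
    pos = (q * F.normalize(positive, dim=-1)).sum(-1, keepdim=True)      # (B, 1)
    neg = torch.einsum('bd,bkd->bk', q, F.normalize(negatives, dim=-1))  # (B, K)
    logits = torch.cat([pos, neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long)  # 正样本位于第0列
    return F.cross_entropy(logits, labels)

loss = info_nce_loss(torch.randn(4, 16), torch.randn(4, 16), torch.randn(4, 7, 16))
print(loss.item())
```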

【5】 Safe Reinforcement Learning Using Advantage-Based Intervention 标题:基于优势干预的安全强化学习

作者:Nolan Wagener,Byron Boots,Ching-An Cheng 机构:Institute for Robotics and Intelligent Machines; Allen School of Computer Science and Engineering, University of Washington 备注:Appearing in ICML 2021. 28 pages, 7 figures 链接:https://arxiv.org/abs/2106.09110 摘要:许多序贯决策问题都涉及在满足安全约束的同时寻找一个最大化总回报的策略。尽管最近的研究大多集中于开发训练后能产生安全策略的安全强化学习(RL)算法,如何确保训练期间的安全仍是一个开放问题。一个根本性的挑战是在未知马尔可夫决策过程(MDP)中进行探索的同时仍满足约束。在这项工作中,我们在机会约束(chance-constrained)设定下解决这一问题。我们提出了一种新算法SAILR,它使用基于优势函数的干预机制来保证智能体在整个训练过程中的安全,并使用为无约束MDP设计的现成RL算法来优化智能体的策略。相对于最优安全约束策略,我们的方法在训练和部署期间(即训练结束后、不再使用干预机制时)的安全性和策略性能都有很强的保证。在实验中,我们展示了SAILR在训练期间违反约束的次数远少于标准安全RL和约束MDP方法,并收敛到一个无需干预即可安全部署且性能良好的策略。我们的代码见 https://github.com/nolanwagener/safe_rl。 摘要:Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints. Although much recent research has focused on the development of safe reinforcement learning (RL) algorithms that produce a safe policy after training, ensuring safety during training as well remains an open problem. A fundamental challenge is performing exploration while still satisfying constraints in an unknown Markov decision process (MDP). In this work, we address this problem for the chance-constrained setting. We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training and optimizes the agent's policy using off-the-shelf RL algorithms designed for unconstrained MDPs. Our method comes with strong guarantees on safety during both training and deployment (i.e., after training and without the intervention mechanism) and policy performance compared to the optimal safety-constrained policy. In our experiments, we show that SAILR violates constraints far less during training than standard safe RL and constrained MDP approaches and converges to a well-performing policy that can be deployed safely without intervention. Our code is available at https://github.com/nolanwagener/safe_rl.
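下面以伪接口形式示意"基于优势函数的干预"这一核心思想(高度简化的假设性草图:advantage_fn、backup_policy、threshold 以及触发方向均为示意假设,论文中的判据与理论保证要精细得多):当候选动作在安全意义下的优势低于阈值时,改用备份(安全)策略的动作。

```python
def safe_action(state, proposed_action, advantage_fn, backup_policy, threshold=0.0):
    """若候选动作的(安全)优势低于阈值则干预,返回(动作, 是否干预)。"""
    if advantage_fn(state, proposed_action) < threshold:
        return backup_policy(state), True   # 触发干预,改用备份策略
    return proposed_action, False

# 玩具示例:状态为标量,离"危险区"越近优势越低
adv = lambda s, a: -abs(s + a)
backup = lambda s: 0
print(safe_action(0.9, 0.5, adv, backup, threshold=-1.0))  # (0, True),触发干预
```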

【6】 A Deep Reinforcement Learning Approach towards Pendulum Swing-up Problem based on TF-Agents 标题:基于TF-Agents的摆起问题的深度强化学习方法

作者:Yifei Bi,Xinyi Chen,Caihui Xiao 机构:Department of Statistics, Columbia University in the City of New York, New York, USA 链接:https://arxiv.org/abs/2106.09556 摘要:借鉴用深度Q学习智能体训练CartPole的思想,我们得到了能够防止杆倒下的良好效果。强化学习(RL)能够从环境与智能体的交互中学习,从而给出最优控制策略。本文旨在解决经典的摆起(swing-up)问题,即让学到的摆到达直立位置并保持平衡。针对该问题的连续动作域,我们引入了深度确定性策略梯度(DDPG)算法。平均回报不断上升、损失不断下降,并在代码部分附有实时视频,证明了最优摆的显著效果。 摘要:Adapting the idea of training CartPole with Deep Q-learning agent, we are able to find a promising result that prevent the pole from falling down. The capacity of reinforcement learning (RL) to learn from the interaction between the environment and agent provides an optimal control strategy. In this paper, we aim to solve the classic pendulum swing-up problem that making the learned pendulum to be in upright position and balanced. Deep Deterministic Policy Gradient algorithm is introduced to operate over continuous action domain in this problem. Salient results of optimal pendulum are proved with increasing average return, decreasing loss, and live video in the code part.

元学习(2篇)

【1】 Meta-Calibration: Meta-Learning of Model Calibration Using Differentiable Expected Calibration Error 标题:元校准:基于可微分期望校准误差的模型校准元学习

作者:Ondrej Bohdal,Yongxin Yang,Timothy Hospedales 机构:School of Informatics, The University of Edinburgh 链接:https://arxiv.org/abs/2106.09613 摘要:神经网络的校准是一个热点问题,对神经网络的实际应用日益重要。这一问题在现代神经网络中尤为突出:模型的置信度与其应有的置信度之间存在显著差异。人们已经成功提出了多种策略,但仍有改进空间。我们提出了一种新方法:引入期望校准误差的一个可微度量,并成功将其用作元学习的目标,取得了与最先进方法相当的结果。我们的方法开辟了利用元学习直接优化模型校准的新方向,我们相信这将在这个有前景的新方向上激发更多后续工作。 摘要:Calibration of neural networks is a topical problem that is becoming increasingly important for real-world use of neural networks. The problem is especially noticeable when using modern neural networks, for which there is significant difference between the model confidence and the confidence it should have. Various strategies have been successfully proposed, yet there is more space for improvements. We propose a novel approach that introduces a differentiable metric for expected calibration error and successfully uses it as an objective for meta-learning, achieving competitive results with state-of-the-art approaches. Our approach presents a new direction of using meta-learning to directly optimize model calibration, which we believe will inspire further work in this promising and new direction.
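下面给出一种软分箱的可微ECE近似示意(假设性实现,未必与论文所用度量一致;n_bins 与 temperature 为示意超参数):用softmax软分配替代硬分箱,使校准误差可以对模型参数反向传播。

```python
import torch

def soft_ece(confidences: torch.Tensor, correct: torch.Tensor,
             n_bins: int = 10, temperature: float = 0.01) -> torch.Tensor:
    """软分箱ECE:temperature越小越接近标准的硬分箱ECE。"""
    centers = torch.linspace(0.5 / n_bins, 1 - 0.5 / n_bins, n_bins)
    # (N, n_bins) 的软分配权重
    w = torch.softmax(-(confidences[:, None] - centers[None, :]) ** 2 / temperature, dim=1)
    mass = w.sum(0) + 1e-8
    bin_conf = (w * confidences[:, None]).sum(0) / mass  # 每箱平均置信度
    bin_acc = (w * correct[:, None]).sum(0) / mass       # 每箱平均准确率
    return ((mass / w.sum()) * (bin_conf - bin_acc).abs()).sum()

conf = torch.rand(100, requires_grad=True)
corr = (torch.rand(100) < conf).float()
print(soft_ece(conf, corr))  # 带grad_fn的标量,可用于元学习目标
```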

【2】 Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers 标题:基于相关均衡元求解器的超越零和的多智能体训练

作者:Luke Marris,Paul Muller,Marc Lanctot,Karl Tuyls,Thore Graepel 机构:DeepMind, University College London, Université Gustave Eiffel 备注:ICML 2021, 9 pages, coded implementation available in this https URL (jpsro.py in examples) 链接:https://arxiv.org/abs/2106.09435 摘要:两人常和博弈在文献中已得到充分研究,但在此设定之外进展有限。我们提出了联合策略空间响应预言机(JPSRO),一种在n人一般和扩展式博弈中训练智能体的算法,并证明其收敛到均衡。我们进一步建议将相关均衡(CE)作为有前途的元求解器,并提出了一个新的解概念:最大基尼相关均衡(MGCE),它是求解相关均衡选择问题的一族有原则且计算高效的解。我们使用CE元求解器对JPSRO进行了多组实验,并在n人一般和博弈上验证了收敛性。 摘要:Two-player, constant-sum games are well studied in the literature, but there has been limited progress outside of this setting. We propose Joint Policy-Space Response Oracles (JPSRO), an algorithm for training agents in n-player, general-sum extensive form games, which provably converges to an equilibrium. We further suggest correlated equilibria (CE) as promising meta-solvers, and propose a novel solution concept Maximum Gini Correlated Equilibrium (MGCE), a principled and computationally efficient family of solutions for solving the correlated equilibrium selection problem. We conduct several experiments using CE meta-solvers for JPSRO and demonstrate convergence on n-player, general-sum games.

医学相关(3篇)

【1】 Scientific Language Models for Biomedical Knowledge Base Completion: An Empirical Study 标题:面向生物医学知识库补全的科学语言模型:一项实证研究

作者:Rahul Nadkarni,David Wadden,Iz Beltagy,Noah A. Smith,Hannaneh Hajishirzi,Tom Hope 机构:Paul G. Allen School for Computer Science & Engineering, University of Washington, Allen Institute for Artificial Intelligence (AI2) 链接:https://arxiv.org/abs/2106.09700 摘要:生物医学知识图谱(KG)包含关于疾病、药物和基因等实体的丰富信息。预测这些图谱中缺失的链接可以促进许多重要应用,例如药物设计和药物再利用。最近的工作表明,通用领域语言模型(LM)可以充当"软"KG,并且可以针对KG补全任务进行微调。在这项工作中,我们研究用于KG补全的科学LM,探索是否可以挖掘其潜在知识来增强生物医学链接预测。我们评估了若干领域特定的LM,在以KG形式表示、以药物和疾病为中心并用文本实体描述加以丰富的数据集上对它们进行微调。我们将基于LM的模型与KG嵌入模型集成,使用一种路由器方法,该方法学习将每个输入样本分配给其中一类模型,带来了显著的性能提升。最后,我们展示了LM模型在包含新科学实体的归纳设定下的优势。我们的数据集和代码均已公开。 摘要:Biomedical knowledge graphs (KGs) hold rich information on entities such as diseases, drugs, and genes. Predicting missing links in these graphs can boost many important applications, such as drug design and repurposing. Recent work has shown that general-domain language models (LMs) can serve as "soft" KGs, and that they can be fine-tuned for the task of KG completion. In this work, we study scientific LMs for KG completion, exploring whether we can tap into their latent knowledge to enhance biomedical link prediction. We evaluate several domain-specific LMs, fine-tuning them on datasets centered on drugs and diseases that we represent as KGs and enrich with textual entity descriptions. We integrate the LM-based models with KG embedding models, using a router method that learns to assign each input example to either type of model and provides a substantial boost in performance. Finally, we demonstrate the advantage of LM models in the inductive setting with novel scientific entities. Our datasets and code are made publicly available.

【2】 Multi-modal fusion with gating using audio, lexical and disfluency features for Alzheimer's Dementia recognition from spontaneous speech 标题:利用音频、词汇和不流畅特征的门控多模态融合,从自发语音中识别阿尔茨海默痴呆

作者:Morteza Rohanian,Julian Hough,Matthew Purver 机构:Cognitive Science Group, School of Electronic Engineering and Computer Science, Queen Mary University of London, Department of Knowledge Technologies, Jožef Stefan Institute 备注:None 链接:https://arxiv.org/abs/2106.09668 摘要:本文是提交给"通过自发语音识别阿尔茨海默痴呆"(ADReSS)挑战赛的工作,该挑战旨在开发能从语音数据自动预测阿尔茨海默病严重程度的方法。在阿尔茨海默病诊断和简易精神状态检查(MMSE)评分预测的背景下,我们重点研究声学和自然语言特征在自发语音认知损伤检测中的应用。我们提出了一个模型:文本和音频两种模态各用一个LSTM得到单模态决策,再通过门控机制将其组合以做出最终预测。我们着重于文本和音频的序列建模,并研究个体言语中的不流畅现象是否与其认知障碍的程度有关。我们的结果表明,所提出的分类和回归方案在开发集和测试集上都取得了非常有希望的结果。这表明,通过对医疗会话语音数据进行序列建模,可以成功检测阿尔茨海默病。 摘要:This paper is a submission to the Alzheimer's Dementia Recognition through Spontaneous Speech (ADReSS) challenge, which aims to develop methods that can assist in the automated prediction of severity of Alzheimer's Disease from speech data. We focus on acoustic and natural language features for cognitive impairment detection in spontaneous speech in the context of Alzheimer's Disease Diagnosis and the mini-mental state examination (MMSE) score prediction. We proposed a model that obtains unimodal decisions from different LSTMs, one for each modality of text and audio, and then combines them using a gating mechanism for the final prediction. We focused on sequential modelling of text and audio and investigated whether the disfluencies present in individuals' speech relate to the extent of their cognitive impairment. Our results show that the proposed classification and regression schemes obtain very promising results on both development and test sets. This suggests Alzheimer's Disease can be detected successfully with sequence modeling of the speech data of medical sessions.
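下面用PyTorch示意门控多模态融合的核心结构(假设性草图,非论文实现;维度与分类头均为示意设置):由两种模态的表示学习一个逐维门g,按 g*h_text + (1-g)*h_audio 融合后再分类。

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """门控融合:门g由两模态拼接后的表示学习得到。"""
    def __init__(self, dim: int, n_classes: int = 2):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, h_text: torch.Tensor, h_audio: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([h_text, h_audio], dim=-1))  # (B, dim) 逐维门
        fused = g * h_text + (1 - g) * h_audio
        return self.classifier(fused)

model = GatedFusion(dim=64)
print(model(torch.randn(3, 64), torch.randn(3, 64)).shape)  # torch.Size([3, 2])
```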

【3】 Biomedical Interpretable Entity Representations 标题:生物医学可解释实体表示法

作者:Diego Garcia-Olano,Yasumasa Onoe,Ioana Baldini,Joydeep Ghosh,Byron C. Wallace,Kush R. Varshney 机构:IBM Research,University of Texas at Austin,Northeastern University 备注:Accepted into Findings of ACL-IJCNLP 2021 链接:https://arxiv.org/abs/2106.09502 摘要:预训练语言模型会产生稠密的实体表示,在以实体为中心的自然语言处理任务中表现很强,但这类表示并不直接可解释,这可能成为模型在生物医学等重要领域被采用的障碍。近来已有关于通用可解释表示学习的研究(Onoe和Durrett,2020),但这些领域无关的表示并不能直接迁移到生物医学这一重要领域。在本文中,我们通过将实体映射到医学本体中的概念,再从这些概念映射到以我们的类型为类别的Wikipedia页面,从大规模生物医学文本语料库中构建了一个新的实体类型系统和训练集。基于这一映射,我们导出了生物医学可解释实体表示(BIER):其中每个维度对应一个细粒度实体类型,取值是给定实体属于相应类型的预测概率。我们提出了一种新方法,利用BIER最终的稀疏表示和中间的稠密表示来辅助模型和实体类型的调试。我们证明BIER在包括命名实体消歧和实体标签分类在内的生物医学任务中取得了很强的性能,并通过错误分析突出其可解释性的效用,尤其是在低监督设定下。最后,我们公开了归纳得到的68K生物医学类型系统、用于训练BIER模型的3700万条派生数据三元组,以及性能最佳的模型。 摘要:Pre-trained language models induce dense entity representations that offer strong performance on entity-centric NLP tasks, but such representations are not immediately interpretable. This can be a barrier to model uptake in important domains such as biomedicine. There has been recent work on general interpretable representation learning (Onoe and Durrett, 2020), but these domain-agnostic representations do not readily transfer to the important domain of biomedicine. In this paper, we create a new entity type system and training set from a large corpus of biomedical texts by mapping entities to concepts in a medical ontology, and from these to Wikipedia pages whose categories are our types. From this mapping we derive Biomedical Interpretable Entity Representations(BIERs), in which dimensions correspond to fine-grained entity types, and values are predicted probabilities that a given entity is of the corresponding type. We propose a novel method that exploits BIER's final sparse and intermediate dense representations to facilitate model and entity type debugging. We show that BIERs achieve strong performance in biomedical tasks including named entity disambiguation and entity label classification, and we provide error analysis to highlight the utility of their interpretability, particularly in low-supervision settings. Finally, we provide our induced 68K biomedical type system, the corresponding 37 million triples of derived data used to train BIER models and our best performing model.

推荐(1篇)

【1】 Amortized Auto-Tuning: Cost-Efficient Transfer Optimization for Hyperparameter Recommendation 标题:摊销式自动调优:面向超参数推荐的高性价比迁移优化

作者:Yuxin Xiao,Eric P. Xing,Willie Neiswanger 机构:Carnegie Mellon University,Stanford University,Petuum,MBZUAI 链接:https://arxiv.org/abs/2106.09179 摘要:随着现代机器学习模型超参数数量和训练时长的激增,超参数调优的代价越来越高。尽管已经提出了通过知识迁移来加速调优的方法,但它们通常需要超参数的最终性能,且不关注低保真度信息。然而,这种常见做法是次优的,可能导致不必要的资源消耗。更具成本效益的做法是利用低保真度的调优观测来度量任务间的相似性,并据此将知识从已有任务迁移到新任务。然而,在迁移设定下进行多保真度调优有其自身的挑战:额外观测中的噪声以及对性能预测的需求。因此,我们对多任务多保真度贝叶斯优化框架进行了深入分析,得到其最佳实例化——摊销式自动调优(AT2)。我们进一步发布了一个离线计算的27任务超参数推荐(HyperRec)数据库以服务社区。在HyperRec和其他真实数据库上的大量实验说明了我们AT2方法的有效性。 摘要:With the surge in the number of hyperparameters and training times of modern machine learning models, hyperparameter tuning is becoming increasingly expensive. Although methods have been proposed to speed up tuning via knowledge transfer, they typically require the final performance of hyperparameters and do not focus on low-fidelity information. Nevertheless, this common practice is suboptimal and can incur an unnecessary use of resources. It is more cost-efficient to instead leverage the low-fidelity tuning observations to measure inter-task similarity and transfer knowledge from existing to new tasks accordingly. However, performing multi-fidelity tuning comes with its own challenges in the transfer setting: the noise in the additional observations and the need for performance forecasting. Therefore, we conduct a thorough analysis of the multi-task multi-fidelity Bayesian optimization framework, which leads to the best instantiation--amortized auto-tuning (AT2). We further present an offline-computed 27-task hyperparameter recommendation (HyperRec) database to serve the community. Extensive experiments on HyperRec and other real-world databases illustrate the effectiveness of our AT2 method.

聚类(1篇)

【1】 Author Clustering and Topic Estimation for Short Texts 标题:短文本的作者聚类与主题估计

作者:Graham Tierney,Christopher Bail,Alexander Volfovsky 链接:https://arxiv.org/abs/2106.09533 摘要:分析社交媒体帖子等短文本非常困难,因为这依赖于观察到大量文档级的词共现对。除主题分布之外,这类建模的一个常见下游任务是对文档作者进行分组,以便后续分析。传统模型先估计文档分组,再通过独立的流程识别用户簇。我们提出了一个新模型,它扩展了潜在狄利克雷分配(LDA):对同一文档内词之间的强依赖关系建模,并引入用户级主题分布。我们同时对用户进行聚类,省去了事后聚类估计,并通过将含噪的用户级主题分布向典型值收缩来改进主题估计。在短文本常见的问题上,我们的方法表现与传统方法相当甚至更好;我们在美国参议员的推特数据集上验证了其有效性,恢复出了反映党派意识形态的有意义的主题和簇。 摘要:Analysis of short text, such as social media posts, is extremely difficult because it relies on observing many document-level word co-occurrence pairs. Beyond topic distributions, a common downstream task of the modeling is grouping the authors of these documents for subsequent analyses. Traditional models estimate the document groupings and identify user clusters with an independent procedure. We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document, with user-level topic distributions. We also simultaneously cluster users, removing the need for post-hoc cluster estimation and improving topic estimation by shrinking noisy user-level topic distributions towards typical values. Our method performs as well as -- or better -- than traditional approaches to problems arising in short text, and we demonstrate its usefulness on a dataset of tweets from United States Senators, recovering both meaningful topics and clusters that reflect partisan ideology.

联邦学习|隐私保护|加密(6篇)

【1】 Optimality and Stability in Federated Learning: A Game-theoretic Approach 标题:联合学习中的最优性和稳定性:博弈论方法

作者:Kate Donahue,Jon Kleinberg 机构:Department of Computer Science, Cornell University, Departments of Computer Science, and Information Science 链接:https://arxiv.org/abs/2106.09580 摘要:联邦学习是一种分布式学习范式,其中多个仅能访问本地数据的代理共同学习一个全局模型。近来涌现出大量研究,其目标不仅是提高联邦学习的准确率,还要围绕总误差等社会福利属性提供一定保证。该研究的一个分支采用了博弈论方法;特别地,先前的研究将联邦学习视为一种享乐博弈(hedonic game),其中最小化误差的参与者自行组成联邦联盟。过去的工作证明了稳定联盟划分的存在性,但留下了许多开放问题,包括这些稳定解距最优解有多远。在这项工作中,我们阐明动机并定义了一个以联邦代理(参与者)间平均错误率刻画的最优性概念。首先,我们给出了一个计算最优(误差最小化)参与者安排的高效算法,并证明了其正确性。接下来,我们分析了安排的稳定性与最优性之间的关系。我们先证明,在参数空间的某些区域,所有稳定安排都是最优的(无政府代价Price of Anarchy等于1)。然而,我们证明这并非在所有设定下成立:存在代价高于最优(无政府代价大于1)的稳定安排的例子。最后,我们给出了稳定性与最优性之间性能差距的首个常数因子界,证明最差稳定解的总误差不超过最优解总误差的9倍(无政府代价界为9)。 摘要:Federated learning is a distributed learning paradigm where multiple agents, each only with access to local data, jointly learn a global model. There has recently been an explosion of research aiming not only to improve the accuracy rates of federated learning, but also provide certain guarantees around social good properties such as total error. One branch of this research has taken a game-theoretic approach, and in particular, prior work has viewed federated learning as a hedonic game, where error-minimizing players arrange themselves into federating coalitions. This past work proves the existence of stable coalition partitions, but leaves open a wide range of questions, including how far from optimal these stable solutions are. In this work, we motivate and define a notion of optimality given by the average error rates among federating agents (players). First, we provide and prove the correctness of an efficient algorithm to calculate an optimal (error minimizing) arrangement of players. Next, we analyze the relationship between the stability and optimality of an arrangement. First, we show that for some regions of parameter space, all stable arrangements are optimal (Price of Anarchy equal to 1). However, we show this is not true for all settings: there exist examples of stable arrangements with higher cost than optimal (Price of Anarchy greater than 1). Finally, we give the first constant-factor bound on the performance gap between stability and optimality, proving that the total error of the worst stable solution can be no higher than 9 times the total error of an optimal solution (Price of Anarchy bound of 9).

【2】 Federated Learning for Intrusion Detection System: Concepts, Challenges and Future Directions 标题:入侵检测系统的联合学习:概念、挑战和未来方向

作者:Shaashwat Agrawal,Sagnik Sarkar,Ons Aouedi,Gokul Yenduri,Kandaraj Piamrat,Sweta Bhattacharya,Praveen Kumar Reddy Maddikunta,Thippa Reddy Gadekallu 机构:∗School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India, † University of Nantes, France, ‡School of Information Technology, Vellore Institute of Technology, Vellore, India 备注:Submitted to JNCA, Elsevier 链接:https://arxiv.org/abs/2106.09527 摘要:互联网和智能设备的快速发展引发了网络流量的激增,使其基础设施更加复杂和异构。移动电话、可穿戴设备和自动驾驶汽车的广泛使用就是分布式网络的例子,这些网络每天都会产生海量数据。这些设备的计算能力也在稳步提升,由此产生了传输信息、在本地存储数据以及将网络计算推向边缘设备的需求。入侵检测系统在确保此类设备的安全性和隐私性方面发挥着重要作用。基于机器学习和深度学习的入侵检测系统因其较高的分类精度而获得了极大发展。然而,由于需要存储数据并将数据传输到集中式服务器,隐私和安全方面可能会受到威胁。相反,联邦学习(FL)作为一种保护隐私的去中心化学习技术恰好适用:它不传输数据,而是在本地训练模型并将参数传输到集中式服务器。本文旨在对FL在入侵检测系统中的应用进行全面而详尽的综述。为了阐明对FL的需求,文中讨论了各种类型的IDS、相关的ML方法及其相应问题。本文详细概述了FL在异常检测各个方面的实现,同时指出了FL实现所面临的挑战,为未来研究方向提供思路。论文最后针对基于FL的入侵检测系统实现中所面临的挑战给出了可行的解决方案,可作为后续研究的基线。 摘要:The rapid development of the Internet and smart devices trigger surge in network traffic making its infrastructure more complex and heterogeneous. The predominated usage of mobile phones, wearable devices and autonomous vehicles are examples of distributed networks which generate huge amount of data each and every day. The computational power of these devices have also seen steady progression which has created the need to transmit information, store data locally and drive network computations towards edge devices. Intrusion detection systems play a significant role in ensuring security and privacy of such devices. Machine Learning and Deep Learning with Intrusion Detection Systems have gained great momentum due to their achievement of high classification accuracy. However the privacy and security aspects potentially gets jeopardised due to the need of storing and communicating data to centralized server. On the contrary, federated learning (FL) fits in appropriately as a privacy-preserving decentralized learning technique that does not transfer data but trains models locally and transfers the parameters to the centralized server. The present paper aims to present an extensive and exhaustive review on the use of FL in intrusion detection system. In order to establish the need for FL, various types of IDS, relevant ML approaches and its associated issues are discussed. The paper presents detailed overview of the implementation of FL in various aspects of anomaly detection. The allied challenges of FL implementations are also identified which provides idea on the scope of future direction of research. The paper finally presents the plausible solutions associated with the identified challenges in FL based intrusion detection system implementation acting as a baseline for prospective research.

【3】 Towards Heterogeneous Clients with Elastic Federated Learning 标题:面向异构客户端的弹性联合学习

作者:Zichen Ma,Yu Lu,Zihan Lu,Wenye Li,Jinfeng Yi,Shuguang Cui 机构:The Chinese University of Hong Kong, Shenzhen, JD AI Lab, Ping An Technology 备注:Under Review 链接:https://arxiv.org/abs/2106.09433 摘要:联邦学习是在设备或数据孤岛(如边缘处理器或数据仓库)上训练机器学习模型,同时让数据保留在本地。在异构且可能规模巨大的网络中进行训练会给系统带来偏差,这种偏差源于非IID数据和现实中的低参与率。本文提出了弹性联邦学习(EFL),一种应对系统异构性的无偏算法:它使训练过程中信息量最大的参数更不易波动,并能利用不完整的本地更新。该算法高效且有效,可同时压缩上行和下行通信。理论上,该算法在低参与率下对非IID数据训练时具有收敛性保证。实证实验证实了EFL框架在鲁棒性和效率方面的竞争力。 摘要:Federated learning involves training machine learning models over devices or data silos, such as edge processors or data warehouses, while keeping the data local. Training in heterogeneous and potentially massive networks introduces bias into the system, which is originated from the non-IID data and the low participation rate in reality. In this paper, we propose Elastic Federated Learning (EFL), an unbiased algorithm to tackle the heterogeneity in the system, which makes the most informative parameters less volatile during training, and utilizes the incomplete local updates. It is an efficient and effective algorithm that compresses both upstream and downstream communications. Theoretically, the algorithm has convergence guarantee when training on the non-IID data at the low participation rate. Empirical experiments corroborate the competitive performance of EFL framework on the robustness and the efficiency.
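EFL的代码未随摘要公开;下面仅示意其所基于的联邦聚合基本步骤——按样本量加权的FedAvg(假设性草图,EFL在此类聚合之上引入了弹性约束与上下行压缩):

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """按样本量加权平均各客户端参数。
    client_weights: 每个客户端的参数列表(逐层numpy数组);
    client_sizes: 各客户端的本地样本数。"""
    total = sum(client_sizes)
    return [
        sum(n / total * w[k] for w, n in zip(client_weights, client_sizes))
        for k in range(len(client_weights[0]))
    ]

# 两个客户端,各有两层参数
w1 = [np.ones((2, 2)), np.zeros(2)]
w2 = [np.zeros((2, 2)), np.ones(2)]
avg = fed_avg([w1, w2], client_sizes=[30, 10])
print(avg[0])  # 0.75 * 全1矩阵
```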

【4】 Quantized Federated Learning under Transmission Delay and Outage Constraints 标题:传输时延和中断约束下的量化联邦学习

作者:Yanmeng Wang,Yanqing Xu,Qingjiang Shi,Tsung-Hui Chang 备注:Submitted for publication 链接:https://arxiv.org/abs/2106.09397 摘要:联邦学习(FL)被认为是一种可行的分布式学习范式,它在保护用户隐私的同时,与无线边缘的大量移动设备协作训练机器学习模型。虽然已经提出了各种通信方案来加速FL过程,但大多数方案都假设了理想的无线信道,即服务器和移动客户端之间存在可靠且无损的通信链路。然而,在无线资源有限的实际系统中(如训练时延受限、传输功率和带宽受限),大量模型参数的传输不可避免地受到量化误差(QE)和传输中断(TO)的影响。在本文中,我们考虑了这种非理想的无线信道,并首次通过分析表明TO和QE会严重损害FL收敛;但有趣的是,若各客户端具有相同的中断概率,这种影响可以得到缓解。这些富有洞见的结果促使我们提出一个鲁棒的FL方案FedTOE:它在客户端之间联合分配无线资源和量化比特,在最小化QE的同时使各客户端具有相同的TO概率。大量实验结果表明,在传输时延受限的基于深度学习的分类任务中,FedTOE具有优越的性能。 摘要:Federated learning (FL) has been recognized as a viable distributed learning paradigm which trains a machine learning model collaboratively with massive mobile devices in the wireless edge while protecting user privacy. Although various communication schemes have been proposed to expedite the FL process, most of them have assumed ideal wireless channels which provide reliable and lossless communication links between the server and mobile clients. Unfortunately, in practical systems with limited radio resources such as constraint on the training latency and constraints on the transmission power and bandwidth, transmission of a large number of model parameters inevitably suffers from quantization errors (QE) and transmission outage (TO). In this paper, we consider such non-ideal wireless channels, and carry out the first analysis showing that the FL convergence can be severely jeopardized by TO and QE, but intriguingly can be alleviated if the clients have uniform outage probabilities. These insightful results motivate us to propose a robust FL scheme, named FedTOE, which performs joint allocation of wireless resources and quantization bits across the clients to minimize the QE while making the clients have the same TO probability. Extensive experimental results are presented to show the superior performance of FedTOE for a deep learning-based classification task with transmission latency constraints.

【5】 Coded Federated Learning Framework for AI-Based Mobile Application Services with Privacy-Awareness 标题:基于AI的具有隐私感知的移动应用服务编码联邦学习框架

作者:Yuris Mulya Saputra,Diep N. Nguyen,Dinh Thai Hoang,Eryk Dutkiewicz 机构:Department of Electrical Engineering and Informatics, Universitas Gadjah Mada 备注:18 pages (submitted to an IEEE journal) 链接:https://arxiv.org/abs/2106.09261 摘要:通过对计算任务进行编码,编码计算不仅可以缓解联邦学习(FL)中的掉队者(straggler)问题,还能保护参与的移动用户(MU)上传/贡献给集中式服务器(由移动应用提供商MAP拥有)的敏感数据的隐私。然而,这些优势伴随着额外的编码成本/复杂性和通信开销(称为"隐私成本");考虑到MU/MAP有限的计算/通信资源,以及MU在向MAP贡献数据时的理性与激励竞争,这些开销必须加以考虑。本文提出了一种新颖的基于编码FL的框架,用于具有隐私意识的移动应用服务,以应对这些挑战。具体而言,MAP首先基于MU提供的信息/特征确定参与FL过程的最佳MU集合;然后,每个被选中的MU可以根据其预期的可训练本地数据和受隐私保护的编码数据向MAP提出契约。为了在保持整个系统高学习质量的前提下找到能最大化MAP和所有参与MU效用的最优契约,我们首先在MU的隐私成本、MAP有限的计算资源以及MAP与MU之间信息不对称的约束下,利用基于编码FL的多个效用函数,构建了一个多委托人单代理的契约问题;然后将其转化为一个等价的低复杂度问题,并提出迭代算法求解。在真实数据集上的实验表明,与基线方法相比,在考虑隐私成本的情况下,该框架可将训练速度提升至多49%、预测准确率提升至多4.6倍,同时将网络的社会福利(即所有参与实体的总效用)提升至多114%。 摘要:By encoding computing tasks, coded computing can not only mitigate straggling problems in federated learning (FL), but also preserve privacy of sensitive data uploaded/contributed by participating mobile users (MUs) to the centralized server, owned by a mobile application provider (MAP). However, these advantages come with extra coding cost/complexity and communication overhead (referred to as \emph{privacy cost}) that must be considered given the limited computing/communications resources at MUs/MAP, the rationality and incentive competition among MUs in contributing data to the MAP. This article proposes a novel coded FL-based framework for a privacy-aware mobile application service to address these challenges. In particular, the MAP first determines a set of the best MUs for the FL process based on MUs' provided information/features. Then, each selected MU can propose a contract to the MAP according to its expected trainable local data and privacy-protected coded data. To find the optimal contracts that can maximize utilities of the MAP and all the participating MUs while maintaining high learning quality of the whole system, we first develop a multi-principal one-agent contract-based problem leveraging coded FL-based multiple utility functions under the MUs' privacy cost, the MAP's limited computing resource, and asymmetric information between the MAP and MUs. Then, we transform the problem into an equivalent low-complexity problem and develop an iterative algorithm to solve it. Experiments with a real-world dataset show that our framework can speed up training time up to 49% and improve prediction accuracy up to 4.6 times while enhancing network's social welfare, i.e., total utility of all participating entities, up to 114% under the privacy cost consideration compared with those of baseline methods.

【6】 QuantumFed: A Federated Learning Framework for Collaborative Quantum Training 标题:QuantumFed:一种面向协同量子训练的联邦学习框架

作者:Qun Xia,Qun Li 机构:Department of Computer Science, College of William and Mary, Williamsburg, VA, USA 链接:https://arxiv.org/abs/2106.09109 摘要:随着量子计算和深度学习的快速发展,量子神经网络近年来受到了广泛关注。借助量子计算的能力,深度神经网络有望克服经典机器学习中算力的限制。然而,当多台量子机器希望使用各自机器上的本地数据训练一个全局模型时,将数据复制到一台机器上再训练模型可能非常困难。因此,一个协作式的量子神经网络框架十分必要。本文借鉴联邦学习的核心思想,提出量子联邦学习框架QuantumFed,让多个拥有本地量子数据的量子节点共同训练一个模型。实验表明了该框架的可行性和鲁棒性。 摘要:With the fast development of quantum computing and deep learning, quantum neural networks have attracted great attention recently. By leveraging the power of quantum computing, deep neural networks can potentially overcome computational power limitations in classic machine learning. However, when multiple quantum machines wish to train a global model using the local data on each machine, it may be very difficult to copy the data into one machine and train the model. Therefore, a collaborative quantum neural network framework is necessary. In this article, we borrow the core idea of federated learning to propose QuantumFed, a quantum federated learning framework to have multiple quantum nodes with local quantum data train a mode together. Our experiments show the feasibility and robustness of our framework.

推理|分析|理解|解释(9篇)

【1】 Hi-Phy: A Benchmark for Hierarchical Physical Reasoning 标题:Hi-Phy:分层物理推理的基准

作者:Cheng Xue,Vimukthini Pinto,Chathura Gamage,Peng Zhang,Jochen Renz 机构:School of Computing, The Australian National University, Canberra, Australia 链接:https://arxiv.org/abs/2106.09692 摘要:对物理对象的行为进行推理是在物理世界中运行的智能体的一项关键能力。人类在物理推理方面经验丰富,但这对人工智能而言仍是一项重大挑战。为了促进这一问题的研究,最近已有若干基准被提出。然而,这些基准无法在求解复杂推理任务时度量智能体的细粒度物理推理能力。在本文中,我们提出了一个新的物理推理基准,使我们能够测试各项单独的物理推理能力。受人类获得这些能力方式的启发,我们提出了一个复杂度逐级递增的物理推理能力通用层次结构。我们的基准通过在电子游戏《愤怒的小鸟》中生成的物理推理任务,按照该层次结构测试各项能力。该基准通过度量智能体的细粒度物理推理能力,支持对智能体进行全面评估。我们对人类玩家、学习型智能体和启发式智能体进行了评估,并确定了它们的能力。评估表明,尽管学习型智能体具有良好的局部泛化能力,它们仍难以学会底层的物理推理能力,表现不如当前最先进的启发式智能体和人类。我们相信,这一基准将鼓励研究人员开发具有先进的、类人物理推理能力的智能体。网址:https://github.com/Cheng-Xue/Hi-Phy 摘要:Reasoning about the behaviour of physical objects is a key capability of agents operating in physical worlds. Humans are very experienced in physical reasoning while it remains a major challenge for AI. To facilitate research addressing this problem, several benchmarks have been proposed recently. However, these benchmarks do not enable us to measure an agent's granular physical reasoning capabilities when solving a complex reasoning task. In this paper, we propose a new benchmark for physical reasoning that allows us to test individual physical reasoning capabilities. Inspired by how humans acquire these capabilities, we propose a general hierarchy of physical reasoning capabilities with increasing complexity. Our benchmark tests capabilities according to this hierarchy through generated physical reasoning tasks in the video game Angry Birds. This benchmark enables us to conduct a comprehensive agent evaluation by measuring the agent's granular physical reasoning capabilities. We conduct an evaluation with human players, learning agents, and heuristic agents and determine their capabilities. Our evaluation shows that learning agents, with good local generalization ability, still struggle to learn the underlying physical reasoning capabilities and perform worse than current state-of-the-art heuristic agents and humans. We believe that this benchmark will encourage researchers to develop intelligent agents with advanced, human-like physical reasoning capabilities. URL: https://github.com/Cheng-Xue/Hi-Phy

【2】 Accuracy, Interpretability, and Differential Privacy via Explainable Boosting 标题:通过可解释Boosting实现准确性、可解释性和差分隐私

作者:Harsha Nori,Rich Caruana,Zhiqi Bu,Judy Hanwen Shen,Janardhan Kulkarni 机构:University of Pennsylvania, Stanford University 备注:To be published in ICML 2021. 12 pages, 6 figures 链接:https://arxiv.org/abs/2106.09680 摘要:我们表明,在可解释Boosting机(EBM,一种最近用于训练可解释ML模型的方法)中加入差分隐私,可以在保护隐私的同时取得最先进的准确率。我们在多个分类和回归数据集上的实验表明,即使在很强的差分隐私保证下,DP-EBM模型的精度损失也小得惊人。除高精度之外,将DP应用于EBM还有两个好处:a)训练得到的模型提供精确的全局和局部可解释性,这在需要差分隐私的场景中通常很重要;b)模型可以在训练后编辑而不损失隐私,以纠正DP噪声可能引入的错误。 摘要:We show that adding differential privacy to Explainable Boosting Machines (EBMs), a recent method for training interpretable ML models, yields state-of-the-art accuracy while protecting privacy. Our experiments on multiple classification and regression datasets show that DP-EBM models suffer surprisingly little accuracy loss even with strong differential privacy guarantees. In addition to high accuracy, two other benefits of applying DP to EBMs are: a) trained models provide exact global and local interpretability, which is often important in settings where differential privacy is needed; and b) the models can be edited after training without loss of privacy to correct errors which DP noise may have introduced.

【3】 Towards Explainable Student Group Collaboration Assessment Models Using Temporal Representations of Individual Student Roles 标题:基于个体学生角色时间表征的可解释性学生小组协作评价模型

作者:Anirudh Som,Sujeong Kim,Bladimir Lopez-Prado,Svati Dhamija,Nonye Alozie,Amir Tamrakar 机构:Center for Vision Technologies, SRI International, Center for Education, Research and Innovation 备注:Accepted in the poster session at the 14th International Conference on Educational Data Mining 链接:https://arxiv.org/abs/2106.09623 摘要:协作被认为是学生在科学、技术、工程和数学(STEM)领域取得成功所必需的技能。然而,由于学生人数增长而师资有限,教师很难通过教学方法提供建设性反馈并培养协作技能。开发简单且易于解释的基于机器学习的自动化系统有助于解决这一问题。在先前工作的基础上,本文提出使用以个体学生角色的时序表示为输入的简单时序CNN深度学习模型来评估学生小组协作。我们检验了动态变化的特征表示在学生小组协作评估中的适用性,以及它们如何影响整体性能。我们还使用Grad-CAM可视化来更好地理解和解释促成深度学习模型决策的重要时间索引。 摘要:Collaboration is identified as a required and necessary skill for students to be successful in the fields of Science, Technology, Engineering and Mathematics (STEM). However, due to growing student population and limited teaching staff it is difficult for teachers to provide constructive feedback and instill collaborative skills using instructional methods. Development of simple and easily explainable machine-learning-based automated systems can help address this problem. Improving upon our previous work, in this paper we propose using simple temporal-CNN deep-learning models to assess student group collaboration that take in temporal representations of individual student roles as input. We check the applicability of dynamically changing feature representations for student group collaboration assessment and how they impact the overall performance. We also use Grad-CAM visualizations to better understand and interpret the important temporal indices that led to the deep-learning model's decision.

【4】 Algorithmic Bias and Data Bias: Understanding the Relation between Distributionally Robust Optimization and Data Curation 标题:算法偏差与数据偏差:理解分布式稳健优化与数据处理之间的关系

作者:Agnieszka Słowik,Léon Bottou 机构:Department of Computer Science and Technology, University of Cambridge, Cambridge, UK, Facebook AI Research, New York, NY, USA, and New York University, New York, NY, USA 链接:https://arxiv.org/abs/2106.09467 摘要:基于平均误差最小化的机器学习系统会在一些重要的数据子集上表现出不一致性,而整个数据集上较低的平均误差并不会暴露这种不一致。在事关重大的社会和经济应用中,数据代表的是人,这可能导致对代表性不足的性别和族裔群体的歧视。鉴于偏差缓解在机器学习中的重要性,这一话题引发了关于如何在实践中确保公平性(数据偏差与算法偏差之争)的激烈讨论。分布鲁棒优化(DRO)通过最小化各子群体上的最坏期望风险,看似解决了这一问题。我们建立了一些理论结果,阐明了DRO与在适当加权的训练数据集上最小化同一平均损失之间的关系。这些结果涵盖有限和无限多个训练分布,以及凸和非凸损失函数。我们表明,无论是DRO还是训练集的整理,都不应被视为缓解偏差的完整解决方案:正如不存在普适的鲁棒训练集一样,也不存在设置DRO问题并确保结果在社会上可接受的普适方法。随后我们利用这些见解,给出一套精简的实用建议,用于借助DRO处理偏差。最后,我们以对抗鲁棒性为例,讨论了我们的结果在DRO其他相关应用中的影响。我们的结果表明,只要支持各自观点的论证被精确限定并得到当今已知相关数学的支持,偏差之争中以算法为中心与以数据为中心的双方都有其价值。 摘要:Machine learning systems based on minimizing average error have been shown to perform inconsistently across notable subsets of the data, which is not exposed by a low average error for the entire dataset. In consequential social and economic applications, where data represent people, this can lead to discrimination of underrepresented gender and ethnic groups. Given the importance of bias mitigation in machine learning, the topic leads to contentious debates on how to ensure fairness in practice (data bias versus algorithmic bias). Distributionally Robust Optimization (DRO) seemingly addresses this problem by minimizing the worst expected risk across subpopulations. We establish theoretical results that clarify the relation between DRO and the optimization of the same loss averaged on an adequately weighted training dataset. The results cover finite and infinite number of training distributions, as well as convex and non-convex loss functions. We show that neither DRO nor curating the training set should be construed as a complete solution for bias mitigation: in the same way that there is no universally robust training set, there is no universal way to setup a DRO problem and ensure a socially acceptable set of results. We then leverage these insights to provide a mininal set of practical recommendations for addressing bias with DRO. Finally, we discuss ramifications of our results in other related applications of DRO, using an example of adversarial robustness. Our results show that there is merit to both the algorithm-focused and the data-focused side of the bias debate, as long as arguments in favor of these positions are precisely qualified and backed by relevant mathematics known today.
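下面以组级DRO最常见的形式给出一个最小示意(假设性实现;论文讨论的是更一般的DRO目标与加权训练集之间的等价关系,此处仅演示"最小化最坏子群体平均损失"这一基本目标):

```python
import torch

def worst_group_loss(losses: torch.Tensor, groups: torch.Tensor) -> torch.Tensor:
    """组级DRO目标:返回各子群体平均损失中的最大值,可直接反向传播。"""
    group_losses = [losses[groups == g].mean() for g in groups.unique()]
    return torch.stack(group_losses).max()

per_sample = torch.tensor([0.1, 0.2, 0.9, 1.1])
group_ids = torch.tensor([0, 0, 1, 1])
print(worst_group_loss(per_sample, group_ids))  # tensor(1.),取自group 1
```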

【5】 Towards Understanding Deep Learning from Noisy Labels with Small-Loss Criterion 标题:基于小损失准则的噪声标签深度学习理解

作者:Xian-Jin Gui,Wei Wang,Zhang-Hao Tian 机构:National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China 备注:Accepted to International Joint Conference on Artificial Intelligence (IJCAI) 2021, includes non-archival supplementary material 链接:https://arxiv.org/abs/2106.09291 摘要:深度神经网络需要大量带标签数据才能取得良好性能。在实际应用中,为节省成本,标签通常从众包等非专家处收集,因此含有噪声。过去几年里,人们发展了多种处理含噪标签的深度学习方法,其中许多基于小损失准则。然而,很少有理论分析能解释这些方法为何能从含噪标签中学得很好。本文从理论上解释了被广泛使用的小损失准则为何有效。基于这一解释,我们对原始的小损失准则进行了重新形式化,以更好地应对标签噪声。实验结果验证了我们的理论解释,也证明了重新形式化的有效性。 摘要:Deep neural networks need large amounts of labeled data to achieve good performance. In real-world applications, labels are usually collected from non-experts such as crowdsourcing to save cost and thus are noisy. In the past few years, deep learning methods for dealing with noisy labels have been developed, many of which are based on the small-loss criterion. However, there are few theoretical analyses to explain why these methods could learn well from noisy labels. In this paper, we theoretically explain why the widely-used small-loss criterion works. Based on the explanation, we reformalize the vanilla small-loss criterion to better tackle noisy labels. The experimental results verify our theoretical explanation and also demonstrate the effectiveness of the reformalization.
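下面是小损失准则样本选择步骤的最小示意(通用写法,并非论文提出的重新形式化版本;keep_ratio 为示意超参数):每个batch只保留损失最小的一部分样本参与参数更新,其余视为疑似噪声标签。

```python
import torch

def small_loss_select(losses: torch.Tensor, keep_ratio: float = 0.7) -> torch.Tensor:
    """返回当前batch中损失最小的 keep_ratio 比例样本的下标。"""
    k = max(1, int(keep_ratio * losses.numel()))
    return torch.topk(losses, k, largest=False).indices

per_sample_loss = torch.tensor([0.2, 2.5, 0.4, 3.1, 0.1])
idx = small_loss_select(per_sample_loss, keep_ratio=0.6)
print(idx)  # 损失最小的3个样本的下标,仅用它们计算梯度
```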

【6】 Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning 标题:为什么预训练语言模型有助于下游任务?头部调优与提示调优分析

作者:Colin Wei,Sang Michael Xie,Tengyu Ma 机构:Stanford University, Department of Computer Science 链接:https://arxiv.org/abs/2106.09226 摘要:预训练语言模型在适配下游NLP任务时取得了最先进的性能。然而,由于预训练任务和下游任务可能差异很大,对这些模型的理论分析既稀缺又具有挑战性。我们提出了一个分析框架,通过一个文本的潜变量生成模型把预训练和下游任务联系起来——下游分类器必须恢复潜变量后验分布的一个函数。在此设定下,我们分析了头部调优(在冻结的预训练模型之上学习分类器)和提示调优。在我们的分析中,生成模型要么是隐马尔可夫模型(HMM),要么是带有潜在记忆组件的HMM,后者的动机来自自然语言中的长程依赖。我们证明:1)在HMM的某些非简并条件下,简单的分类头即可解决下游任务;2)提示调优可以在更弱的非简并条件下获得下游保证;3)由于任务相关信息更容易从长期记忆中恢复,记忆增强HMM的恢复保证比普通HMM更强。在HMM生成的合成数据上的实验支持了我们的理论发现。 摘要:Pretrained language models have achieved state-of-the-art performance when adapted to a downstream NLP task. However, theoretical analysis of these models is scarce and challenging since the pretraining and downstream tasks can be very different. We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text -- the downstream classifier must recover a function of the posterior distribution over the latent variables. We analyze head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting. The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language. We show that 1) under certain non-degeneracy conditions on the HMM, simple classification heads can solve the downstream task, 2) prompt tuning obtains downstream guarantees with weaker non-degeneracy conditions, and 3) our recovery guarantees for the memory-augmented HMM are stronger than for the vanilla HMM because task-relevant information is easier to recover from the long-term memory. Experiments on synthetically generated data from HMMs back our theoretical findings.

【7】 Towards a Rigorous Theoretical Analysis and Evaluation of GNN Explanations 标题:走向严谨的GNN解释理论分析与评价

作者:Chirag Agarwal,Marinka Zitnik,Himabindu Lakkaraju 机构:Harvard University 链接:https://arxiv.org/abs/2106.09078 摘要:随着图神经网络(GNN)越来越多地应用于现实世界,确保利益相关者理解其预测背后的依据变得至关重要。虽然最近提出了若干GNN解释方法,但几乎没有工作从理论上分析这些方法的行为或系统地评估其有效性。在此,我们提出了首个用于理论分析、评估和比较最先进GNN解释方法的公理化框架。为了产生可靠的解释,我们概述并形式化了所有GNN解释方法都应满足的关键性质,即忠实性、稳定性和公平性。我们利用这些性质,首次对最先进GNN解释方法的有效性进行了理论分析。我们的分析为流行的GNN解释方法建立了上述所有性质的上界。我们还利用该框架,在来自不同领域的多个真实数据集上对这些方法进行了实证评估。实证结果表明,一些流行的GNN解释方法(例如基于梯度的方法)并不比随机基线好,而利用图结构的方法比仅依赖节点特征的方法更有效。 摘要:As Graph Neural Networks (GNNs) are increasingly employed in real-world applications, it becomes critical to ensure that the stakeholders understand the rationale behind their predictions. While several GNN explanation methods have been proposed recently, there has been little to no work on theoretically analyzing the behavior of these methods or systematically evaluating their effectiveness. Here, we introduce the first axiomatic framework for theoretically analyzing, evaluating, and comparing state-of-the-art GNN explanation methods. We outline and formalize the key desirable properties that all GNN explanation methods should satisfy in order to generate reliable explanations, namely, faithfulness, stability, and fairness. We leverage these properties to present the first ever theoretical analysis of the effectiveness of state-of-the-art GNN explanation methods. Our analysis establishes upper bounds on all the aforementioned properties for popular GNN explanation methods. We also leverage our framework to empirically evaluate these methods on multiple real-world datasets from diverse domains. Our empirical results demonstrate that some popular GNN explanation methods (e.g., gradient-based methods) perform no better than a random baseline and that methods which leverage the graph structure are more effective than those that solely rely on the node features.

【8】 Design and Analysis of Robust Deep Learning Models for Stock Price Prediction 标题:股票价格预测的鲁棒深度学习模型设计与分析

作者:Jaydip Sen,Sidra Mehtab 机构:Department of Data Science, Praxis Business School, Kolkata, India., School of Computing and Analytics, NSHM Knowledge Campus, Kolkata, India 备注:This is the pre-print of our chapter that has been accepted for publication in the forthcoming book entitled "Machine Learning: Algorithms, Models, and Applications". The book will be published by IntechOpen, London, UK, in an open access in the later part of the year 2021. The chapter is 29 pages long, and it has 20 figures and 21 tables. arXiv admin note: substantial text overlap with arXiv:2103.15096 链接:https://arxiv.org/abs/2106.09664 摘要:构建能稳健而准确地预测股价和股价走势的预测模型是一个具有挑战性的研究课题。著名的有效市场假说认为,在有效的股票市场中,由于股价被假定为纯随机的,准确预测未来股价是不可能的。然而,研究人员提出的大量工作表明,借助复杂的算法、模型结构以及模型中合适变量的选择,可以高精度地预测未来股价。本章提出了一组基于深度学习架构的预测回归模型,用于稳健而精确地预测在印度国家证券交易所(NSE)多个行业上市股票的未来价格。我们使用Metastock工具下载了两年(2013-2014年)内以5分钟为间隔的历史股价:第一年的记录用于训练模型,其余记录用于测试。文中详细介绍了所有模型的设计方法及其性能结果,并从执行时间和预测精度两方面对模型进行了比较。 摘要:Building predictive models for robust and accurate prediction of stock prices and stock price movement is a challenging research problem to solve. The well-known efficient market hypothesis believes in the impossibility of accurate prediction of future stock prices in an efficient stock market as the stock prices are assumed to be purely stochastic. However, numerous works proposed by researchers have demonstrated that it is possible to predict future stock prices with a high level of precision using sophisticated algorithms, model architectures, and the selection of appropriate variables in the models. This chapter proposes a collection of predictive regression models built on deep learning architecture for robust and precise prediction of the future prices of a stock listed in the diversified sectors in the National Stock Exchange (NSE) of India. The Metastock tool is used to download the historical stock prices over a period of two years (2013- 2014) at 5 minutes intervals. While the records for the first year are used to train the models, the testing is carried out using the remaining records. The design approaches of all the models and their performance results are presented in detail. The models are also compared based on their execution time and accuracy of prediction.

【9】 Automatic Analysis of the Emotional Content of Speech in Daylong Child-Centered Recordings from a Neonatal Intensive Care Unit 标题:新生儿重症监护病房以儿童为中心的全天录音中情感内容的自动分析

作者:Einari Vaaras,Sari Ahlqvist-Björkroth,Konstantinos Drossos,Okko Räsänen 机构:Unit of Computing Sciences, Tampere University, Finland, Department of Clinical Medicine, University of Turku, Finland, Department of Signal Processing and Acoustics, Aalto University, Finland 链接:https://arxiv.org/abs/2106.09539 摘要:研究人员最近开始研究婴儿听到的情绪性言语如何影响其发育结果。作为这项研究的一部分,在所谓的APPLE研究背景下,研究者从芬兰和爱沙尼亚的两家医院收集了数百小时来自早产儿听觉环境的全天录音。要在如此庞大的数据集中分析语音的情感内容,需要一个自动语音情感识别(SER)系统。然而,既没有情感标签,也没有现成的领域内SER系统可用于此目的。在本文中,我们介绍了这个最初无标注的大规模真实世界音频数据集,并描述了为其芬兰语子集开发的一个可用SER系统。我们探讨了将SER系统部署到新领域的几种替代技术的有效性,在该任务上比较了跨语料库泛化、基于WGAN的领域自适应和主动学习。结果表明,对于效价(valence)和唤醒度(arousal)的二元分类,性能最好的模型分别达到73.4%和73.2%的未加权平均召回率(UAR)。结果还表明,与另外两种方案相比,主动学习的表现最为稳定。 摘要:Researchers have recently started to study how the emotional speech heard by young infants can affect their developmental outcomes. As a part of this research, hundreds of hours of daylong recordings from preterm infants' audio environments were collected from two hospitals in Finland and Estonia in the context of so-called APPLE study. In order to analyze the emotional content of speech in such a massive dataset, an automatic speech emotion recognition (SER) system is required. However, there are no emotion labels or existing indomain SER systems to be used for this purpose. In this paper, we introduce this initially unannotated large-scale real-world audio dataset and describe the development of a functional SER system for the Finnish subset of the data. We explore the effectiveness of alternative state-of-the-art techniques to deploy a SER system to a new domain, comparing cross-corpus generalization, WGAN-based domain adaptation, and active learning in the task. As a result, we show that the best-performing models are able to achieve a classification performance of 73.4% unweighted average recall (UAR) and 73.2% UAR for a binary classification for valence and arousal, respectively. The results also show that active learning achieves the most consistent performance compared to the two alternatives.

检测相关(2篇)

【1】 The Fishnet Open Images Database: A Dataset for Fish Detection and Fine-Grained Categorization in Fisheries 标题:渔网开放图像数据库:用于渔业鱼类检测和细粒度分类的数据集

作者:Justin Kay,Matt Merrifield 机构:Ai.Fish, The Nature Conservancy 备注:In 8th Workshop on Fine-Grained Visual Categorization at CVPR 2021 链接:https://arxiv.org/abs/2106.09178 摘要:基于摄像头的电子监测(EM)系统越来越多地部署在商业渔船上,以收集渔业管理和监管所需的关键数据。这些系统产生大量视频数据,必须由人类专家在岸上进行审查。计算机视觉可以通过自动检测和分类鱼种来辅助这一过程,但该领域公开数据的缺乏阻碍了进展。为此,我们提出了渔网开放图像数据库(Fishnet Open Images Database),一个用于商业渔船上鱼类检测和细粒度分类的大型EM图像数据集。该数据集由86029幅图像组成,包含34个对象类别,是迄今为止规模最大、最多样化的渔业EM图像公开数据集。它涵盖了EM数据的许多典型挑战:物种间的视觉相似性、偏斜的类别分布、恶劣的天气条件和混乱的船员活动。我们评估了现有检测和分类算法的性能,并证明该数据集可以作为渔业计算机视觉算法开发中一个具有挑战性的基准。数据集位于 https://www.fishnet.ai/。 摘要:Camera-based electronic monitoring (EM) systems are increasingly being deployed onboard commercial fishing vessels to collect essential data for fisheries management and regulation. These systems generate large quantities of video data which must be reviewed on land by human experts. Computer vision can assist this process by automatically detecting and classifying fish species, however the lack of existing public data in this domain has hindered progress. To address this, we present the Fishnet Open Images Database, a large dataset of EM imagery for fish detection and fine-grained categorization onboard commercial fishing vessels. The dataset consists of 86,029 images containing 34 object classes, making it the largest and most diverse public dataset of fisheries EM imagery to-date. It includes many of the characteristic challenges of EM data: visual similarity between species, skewed class distributions, harsh weather conditions, and chaotic crew activity. We evaluate the performance of existing detection and classification algorithms and demonstrate that the dataset can serve as a challenging benchmark for development of computer vision algorithms in fisheries. The dataset is available at https://www.fishnet.ai/.

【2】 A Simple Fix to Mahalanobis Distance for Improving Near-OOD Detection 标题:改进近OOD检测的马氏距离简单修正

作者:Jie Ren,Stanislav Fort,Jeremiah Liu,Abhijit Guha Roy,Shreyas Padhy,Balaji Lakshminarayanan 机构:Google Research, Stanford University, Harvard University, Google Health 链接:https://arxiv.org/abs/2106.09022 摘要:马氏距离(MD)是一种简单而流行的后处理方法,用于检测神经网络的分布外(OOD)输入。我们分析了MD在近OOD检测中的失效模式,并提出了一种称为相对马氏距离(RMD)的简单修正,它提升了检测性能,且对超参数选择更加鲁棒。在大量具有挑战性的视觉、语言和生物学OOD基准(CIFAR-100对CIFAR-10、CLINC OOD意图检测、基因组学OOD)上,我们表明RMD显著改善了MD的性能(在基因组学OOD上AUROC至多提升15%)。 摘要:Mahalanobis distance (MD) is a simple and popular post-processing method for detecting out-of-distribution (OOD) inputs in neural networks. We analyze its failure modes for near-OOD detection and propose a simple fix called relative Mahalanobis distance (RMD) which improves performance and is more robust to hyperparameter choice. On a wide selection of challenging vision, language, and biology OOD benchmarks (CIFAR-100 vs CIFAR-10, CLINC OOD intent detection, Genomics OOD), we show that RMD meaningfully improves upon MD performance (by up to 15% AUROC on genomics OOD).
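下面用NumPy给出RMD的计算示意(按论文思路的假设性草图:先取各类条件高斯下的最小马氏距离,再减去拟合全体训练数据的"背景"高斯的马氏距离;示例中的背景参数为示意设置):

```python
import numpy as np

def mahalanobis(x, mean, cov_inv):
    d = x - mean
    return float(d @ cov_inv @ d)

def relative_md(x, class_means, cov_inv, bg_mean, bg_cov_inv):
    """RMD(x) = min_k MD_k(x) - MD_0(x),值越大越像OOD。"""
    md_k = min(mahalanobis(x, m, cov_inv) for m in class_means)
    md_0 = mahalanobis(x, bg_mean, bg_cov_inv)
    return md_k - md_0

rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 4)) + 3.0       # 某一类训练样本的特征
x = rng.standard_normal(4)                        # 疑似OOD输入
cov_inv = np.linalg.inv(np.cov(feats, rowvar=False))
print(relative_md(x, [feats.mean(0)], cov_inv, np.zeros(4), np.eye(4)))
```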

分类|识别(7篇)

【1】 PAC-Bayes, MAC-Bayes and Conditional Mutual Information: Fast rate bounds that handle general VC classes 标题:PAC-Bayes、MAC-Bayes和条件互信息:处理一般VC类的快速速率界限

作者:Peter Grünwald,Thomas Steinke,Lydia Zakynthinou 备注:24 pages, accepted for publication at COLT 2021 链接:https://arxiv.org/abs/2106.09683 摘要:我们给出了条件PAC-Bayes界与互信息(MI)泛化界的一个新的统一推导。我们将条件MI界作为条件MAC-Bayes(Mean Approximately Correct)界在特定先验选择下的一个实例加以推导,而后者本身由条件PAC-Bayes界导出;这里的"条件"指可以使用以训练样本与ghost样本的联合为条件的先验。这使我们能够为一般VC类得到非平凡的PAC-Bayes和MI风格的界,而最近的研究表明这用标准PAC-Bayes/MI界是无法做到的。其次,当Bernstein条件成立时(对$\gamma>1/2$),以及对exp-凹损失(取$\gamma=1$),我们可以得到更快的$O\left((\text{KL}/n)^{\gamma}\right)$阶速率,这用标准PAC-Bayes泛化界和MI界都是不可能的。我们的工作扩展了以下近期工作:Steinke和Zakynthinou[2020]用VC处理了MI,但未涉及PAC-Bayes和快速速率;Hellström和Durisi[2020]通过一个统一的指数不等式将后者推广到PAC-Bayes设定;Mhammedi等人[2019]开创了快速速率PAC-Bayes泛化误差界,但既未处理MI,也未处理一般VC类。 摘要:We give a novel, unified derivation of conditional PAC-Bayesian and mutual information (MI) generalization bounds. We derive conditional MI bounds as an instance, with special choice of prior, of conditional MAC-Bayesian (Mean Approximately Correct) bounds, itself derived from conditional PAC-Bayesian bounds, where `conditional' means that one can use priors conditioned on a joint training and ghost sample. This allows us to get nontrivial PAC-Bayes and MI-style bounds for general VC classes, something recently shown to be impossible with standard PAC-Bayesian/MI bounds. Second, it allows us to get faster rates of order $O\left((\text{KL}/n)^{\gamma}\right)$ for $\gamma > 1/2$ if a Bernstein condition holds and for exp-concave losses (with $\gamma=1$), which is impossible with both standard PAC-Bayes generalization and MI bounds. Our work extends the recent work by Steinke and Zakynthinou [2020] who handle MI with VC but neither PAC-Bayes nor fast rates, the recent work of Hellström and Durisi [2020] who extend the latter to the PAC-Bayes setting via a unifying exponential inequality, and Mhammedi et al. [2019] who initiated fast rate PAC-Bayes generalization error bounds but handle neither MI nor general VC classes.

【2】 Multi-Modal Prototype Learning for Interpretable Multivariable Time Series Classification 标题:可解释多变量时间序列分类的多模态原型学习

作者:Gaurav R. Ghosal,Reza Abbasi-Asl 机构:Department of Neurology, University of California, San Francisco, CA, USA, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA 备注:14 pages, 6 figures 链接:https://arxiv.org/abs/2106.09636 摘要:多变量时间序列分类问题在生物学、金融学等领域的应用日益广泛,其复杂性也日益提高。虽然深度学习方法是解决这些问题的有效工具,但它们往往缺乏可解释性。在这项工作中,我们提出了一个新的多变量时间序列分类的模块化原型学习框架。在我们框架的第一阶段,编码器独立地从每个变量中提取特征。原型层在生成的特征空间中识别单变量原型。我们的框架的下一个阶段根据多变量时间序列样本点与这些单变量原型的相似性来表示它们。这导致多变量模式的固有解释性表示,在此基础上应用原型学习来提取代表性示例,即多变量原型。因此,我们的框架能够明确地识别单个变量中的信息模式,以及变量之间的关系。我们在一个具有嵌入模式的模拟数据集以及一个真实的人类活动识别问题上验证了我们的框架。在这些任务上,我们的框架达到了与现有时间序列分类方法相当或更好的分类性能。在模拟的数据集上,我们发现我们的模型返回的解释与嵌入的模式一致。此外,在活动识别数据集上学习到的解释与领域知识一致。 摘要:Multivariable time series classification problems are increasing in prevalence and complexity in a variety of domains, such as biology and finance. While deep learning methods are an effective tool for these problems, they often lack interpretability. In this work, we propose a novel modular prototype learning framework for multivariable time series classification. In the first stage of our framework, encoders extract features from each variable independently. Prototype layers identify single-variable prototypes in the resulting feature spaces. The next stage of our framework represents the multivariable time series sample points in terms of their similarity to these single-variable prototypes. This results in an inherently interpretable representation of multivariable patterns, on which prototype learning is applied to extract representative examples i.e. multivariable prototypes. Our framework is thus able to explicitly identify both informative patterns in the individual variables, as well as the relationships between the variables. We validate our framework on a simulated dataset with embedded patterns, as well as a real human activity recognition problem. Our framework attains comparable or superior classification performance to existing time series classification methods on these tasks. On the simulated dataset, we find that our model returns interpretations consistent with the embedded patterns. Moreover, the interpretations learned on the activity recognition dataset align with domain knowledge.

【3】 Interpretable Machine Learning Classifiers for Brain Tumour Survival Prediction 标题:用于脑瘤生存预测的可解释机器学习分类器

作者:Colleen E. Charlton,Michael Tin Chung Poon,Paul M. Brennan,Jacques D. Fleuriot 机构:Artificial Intelligence and its Applications Institute, School of Informatics, University of Edinburgh, Crichton Street, Edinburgh, UK, Cancer Research UK Brain Tumour Centre of Excellence, CRUK Edinburgh Centre 链接:https://arxiv.org/abs/2106.09424 摘要:由于肿瘤行为和治疗反应的异质性,预测脑肿瘤患者的生存情况具有挑战性。更好的预后估计将有助于治疗计划和患者支持。机器学习的发展推动了临床预测模型的开发,但将其融入临床实践的情况几乎不存在,其中一个原因是模型缺乏可解释性。在本文中,我们使用一个新的脑瘤数据集,比较了两种可解释的规则列表模型与流行的机器学习方法在脑瘤生存预测上的表现。所有模型都使用标准性能指标进行定量评估,并对规则列表的可解释性和临床实用性进行了定性评估。黑盒机器学习模型的可解释性则用两种事后解释技术(LIME和SHAP)来评估。我们的结果表明,规则列表仅略逊于黑盒模型。我们证明,规则列表算法产生的简单决策列表与临床专业知识相符。相比之下,应用于黑盒模型的事后解释方法可能对局部模型预测给出不可靠的解释。模型的可解释性对于理解预测性能的差异以及融入临床实践至关重要。 摘要:Prediction of survival in patients diagnosed with a brain tumour is challenging because of heterogeneous tumour behaviours and responses to treatment. Better estimations of prognosis would support treatment planning and patient support. Advances in machine learning have informed development of clinical predictive models, but their integration into clinical practice is almost non-existent. One reason for this is the lack of interpretability of models. In this paper, we use a novel brain tumour dataset to compare two interpretable rule list models against popular machine learning approaches for brain tumour survival prediction. All models are quantitatively evaluated using standard performance metrics. The rule lists are also qualitatively assessed for their interpretability and clinical utility. The interpretability of the black box machine learning models is evaluated using two post-hoc explanation techniques, LIME and SHAP. Our results show that the rule lists were only slightly outperformed by the black box models. We demonstrate that rule list algorithms produced simple decision lists that align with clinical expertise. By comparison, post-hoc interpretability methods applied to black box models may produce unreliable explanations of local model predictions. Model interpretability is essential for understanding differences in predictive performance and for integration into clinical practice.
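作为参考,下面给出用LIME和SHAP对一个黑盒分类器做事后局部解释的最小示意(数据与模型均为占位,并非论文所用的脑瘤数据集;假设已安装 shap 与 lime 包):

```python
import numpy as np
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(200, 8)                      # 占位特征(如临床变量)
y = (X[:, 0] + X[:, 1] > 1).astype(int)         # 占位二分类标签(如生存与否)
model = RandomForestClassifier().fit(X, y)

# SHAP:KernelExplainer 适用于任意黑盒预测函数
explainer = shap.KernelExplainer(model.predict_proba, shap.sample(X, 50))
shap_values = explainer.shap_values(X[:5])       # 每个特征的局部贡献值

# LIME:对单个样本拟合一个局部线性代理模型
lime_exp = LimeTabularExplainer(X, mode="classification")
exp = lime_exp.explain_instance(X[0], model.predict_proba, num_features=5)
print(exp.as_list())                             # (特征条件, 权重) 列表
```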

【4】 Voice2Series: Reprogramming Acoustic Models for Time Series Classification 标题:Voice2Series:用于时间序列分类的重新编程声学模型

作者:Chao-Han Huck Yang,Yun-Yun Tsai,Pin-Yu Chen 机构:Georgia Institute of Technology, Columbia University, IBM Research 备注:Accepted to ICML 2021, 16 Pages 链接:https://arxiv.org/abs/2106.09296 摘要:在数据有限的情况下学习时间序列分类是一个实际而又富有挑战性的问题。目前的方法主要基于手工设计的特征提取规则或特定领域的数据扩充。受深度语音处理模型的进展以及语音数据本身是单变量时间信号这一事实的启发,本文提出了Voice2Series(V2S),一种新的端到端方法,通过输入变换学习和输出标签映射,对声学模型进行重编程以用于时间序列分类。利用大规模预训练语音处理模型的表示学习能力,在30个不同的时间序列任务上,我们证明V2S在其中20个任务上优于最先进方法或与之持平,并将它们的平均准确率提高了1.84%。我们进一步从理论上证明了V2S的合理性:其总体风险以源风险与一个通过重编程刻画特征对齐的Wasserstein距离之和为上界。研究结果为时间序列分类提供了新的有效手段。 摘要:Learning to classify time series with limited data is a practical yet challenging problem. Current methods are primarily based on hand-designed feature extraction rules or domain-specific data augmentation. Motivated by the advances in deep speech processing models and the fact that voice data are univariate temporal signals, in this paper, we propose Voice2Series (V2S), a novel end-to-end approach that reprograms acoustic models for time series classification, through input transformation learning and output label mapping. Leveraging the representation learning power of a large-scale pre-trained speech processing model, on 30 different time series tasks we show that V2S either outperforms or is tied with state-of-the-art methods on 20 tasks, and improves their average accuracy by 1.84%. We further provide a theoretical justification of V2S by proving its population risk is upper bounded by the source risk and a Wasserstein distance accounting for feature alignment via reprogramming. Our results offer new and effective means to time series classification.
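下面是对"输入变换学习 + 输出标签映射"这一重编程思路的示意性实现(PyTorch;声学骨干模型、扰动形式与标签分组方式均为占位假设,并非原论文的确切设定):

```python
import torch
import torch.nn as nn

class ReprogramForTimeSeries(nn.Module):
    """示意:冻结的预训练声学模型 + 可训练输入扰动 + 多对一标签映射。
    假设 acoustic_model 接受 (B, audio_len) 波形并输出 (B, n_src_classes) logits,
    且目标序列长度不超过 audio_len。"""
    def __init__(self, acoustic_model, audio_len, n_src_classes, n_tgt_classes):
        super().__init__()
        self.backbone = acoustic_model.eval()
        for p in self.backbone.parameters():
            p.requires_grad = False                       # 仅训练输入变换
        self.delta = nn.Parameter(torch.zeros(audio_len)) # 可训练的通用扰动
        perm = torch.randperm(n_src_classes)              # 随机多对一分组,仅作示意
        self.groups = perm.chunk(n_tgt_classes)

    def forward(self, x):                                 # x: (B, series_len)
        pad = self.delta.numel() - x.shape[1]
        x = nn.functional.pad(x, (0, pad))                # 零填充到音频输入长度
        x = x + self.delta                                # 输入变换(重编程)
        probs = self.backbone(x).softmax(dim=-1)          # 源类别概率
        # 把映射到每个目标类的源类概率做聚合,得到目标类得分
        return torch.stack([probs[:, g].mean(dim=-1) for g in self.groups], dim=-1)
```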

【5】 Automatic Main Character Recognition for Photographic Studies 标题:摄影研究中的主要人物自动识别

作者:Mert Seker,Anssi Männistö,Alexandros Iosifidis,Jenni Raitoharju 机构:Tampere University; Department of Electrical and Computer Engineering, Aarhus University; Finnish Environment Institute 备注:6 pages, 4 figures, 2 tables 链接:https://arxiv.org/abs/2106.09064 摘要:图像中的主要人物是第一眼就吸引观众注意力的最重要人物,他们通过大小、位置、色彩饱和度和对焦清晰度等特性得到强调。在传统的摄影研究和媒体分析中,识别图像中的主要人物起着重要作用,但这项工作一直由人工完成,速度慢且费力;此外,主要人物的选取有时带有主观性。本文分析了自动解决摄影研究所需的主要人物识别问题的可行性,并提出了一种识别主要人物的方法。该方法将基于机器学习的人体姿态估计与传统计算机视觉方法结合使用。我们将此任务视为一个二元分类问题,将每个被检测到的人分为主要人物或非主要人物。为了评估任务的主观性和方法的性能,我们收集了来自多个来源的300幅不同图像的数据集,并请5人(1名摄影研究者和另外4人)标注主要人物。我们的分析表明,不同标注者之间的一致性相对较高。该方法在完整图像集上获得了0.83的F1分数,在被摄影研究者评定为最清晰和最重要案例的子集上获得了0.96的F1分数。 摘要:Main characters in images are the most important humans that catch the viewer's attention upon first look, and they are emphasized by properties such as size, position, color saturation, and sharpness of focus. Identifying the main character in images plays an important role in traditional photographic studies and media analysis, but the task is performed manually and can be slow and laborious. Furthermore, selection of main characters can be sometimes subjective. In this paper, we analyze the feasibility of solving the main character recognition needed for photographic studies automatically and propose a method for identifying the main characters. The proposed method uses machine learning based human pose estimation along with traditional computer vision approaches for this task. We approach the task as a binary classification problem where each detected human is classified either as a main character or not. To evaluate both the subjectivity of the task and the performance of our method, we collected a dataset of 300 varying images from multiple sources and asked five people, a photographic researcher and four other persons, to annotate the main characters. Our analysis showed a relatively high agreement between different annotators. The proposed method achieved a promising F1 score of 0.83 on the full image set and 0.96 on a subset evaluated as most clear and important cases by the photographic researcher.

【6】 Trainable Discrete Feature Embeddings for Variational Quantum Classifier 标题:变分量子分类器的可训练离散特征嵌入

作者:Napat Thumwanit,Chayaphol Lortararprasert,Hiroshi Yano,Rudy Raymond 机构:Dept. of Computer Science, The University of Tokyo, Bunkyo-ku, Tokyo, Japan, Dept. of Applied Physics and Physico-Informatics, Keio University, Kohoku-ku, Yokohama, Japan, Dept. of Mechanical Engineering, IBM Quantum, IBM Japan, Chuo-ku, Tokyo, Japan 链接:https://arxiv.org/abs/2106.09415 摘要:量子分类器能在Hilbert空间中提供复杂的输入数据嵌入,有望带来量子优势。这种优势源于量子特征映射,它通过变分量子电路将输入编码为量子态。最近的一项工作展示了如何使用量子随机存取编码(QRAC)以更少的量子比特映射离散特征,QRAC是将二进制字符串编码为量子态的一种重要原语。通过结合QRAC与最近提出的量子特征映射训练策略——量子度量学习,我们提出了一种在可训练量子电路中嵌入离散特征的新方法。我们证明,所提出的可训练嵌入不仅像QRAC一样只需要很少的量子比特,而且克服了QRAC在分类其类别由困难布尔函数决定的输入时的局限性。我们用数值实验展示了它在变分量子分类器中的应用,在真实数据集的分类上获得了更好的性能,从而证明了利用近期量子计算机进行量子机器学习的可能性。 摘要:Quantum classifiers provide sophisticated embeddings of input data in Hilbert space promising quantum advantage. The advantage stems from quantum feature maps encoding the inputs into quantum states with variational quantum circuits. A recent work shows how to map discrete features with fewer quantum bits using Quantum Random Access Coding (QRAC), an important primitive to encode binary strings into quantum states. We propose a new method to embed discrete features with trainable quantum circuits by combining QRAC and a recently proposed strategy for training quantum feature map called quantum metric learning. We show that the proposed trainable embedding requires not only as few qubits as QRAC but also overcomes the limitations of QRAC to classify inputs whose classes are based on hard Boolean functions. We numerically demonstrate its use in variational quantum classifiers to achieve better performances in classifying real-world datasets, and thus its possibility to leverage near-term quantum computers for quantum machine learning.

【7】 Exponential Error Convergence in Data Classification with Optimized Random Features: Acceleration by Quantum Machine Learning 标题:优化随机特征数据分类的指数误差收敛:量子机器学习加速

作者:Hayata Yamasaki,Sho Sonoda 机构:Austrian Academy of Sciences, Vienna, Austria, Vienna University of Technology, Vienna, Austria, RIKEN AIP, Tokyo, Japan 备注:28 pages, no figure 链接:https://arxiv.org/abs/2106.09028 摘要:随机特征是基于核方法的可扩展学习算法的核心技术。最近的一项研究表明,基于量子计算机的机器学习算法,即量子机器学习(QML),可以指数级地加快"优化随机特征"的采样,且无需对矩阵的稀疏性和低秩性施加限制性假设(这类假设曾限制了传统QML算法的适用性);这种QML算法使得在回归任务中显著减少并可证明地最小化所需特征数成为可能。然而,QML领域的一个主要关切是:量子计算的优势能在多大范围内被利用,而不仅限于回归任务。在这里,我们构造了一个由优化随机特征加速的分类任务QML算法。我们证明,在低噪声条件下,采样优化随机特征的QML算法与随机梯度下降(SGD)相结合,可以在分类任务中达到最先进的、指数级的分类误差收敛速度;同时,我们的算法能利用所需特征数的显著减少,加速SGD的每次迭代以及对所得分类器的评估。这些结果揭示了QML的一个有前景的应用:显著加速基于核方法的主流分类算法,且不损害其对实际数据集的适用性和指数级误差收敛速度。 摘要:Random features are a central technique for scalable learning algorithms based on kernel methods. A recent work has shown that an algorithm for machine learning by quantum computer, quantum machine learning (QML), can exponentially speed up sampling of optimized random features, even without imposing restrictive assumptions on sparsity and low-rankness of matrices that had limited applicability of conventional QML algorithms; this QML algorithm makes it possible to significantly reduce and provably minimize the required number of features for regression tasks. However, a major interest in the field of QML is how widely the advantages of quantum computation can be exploited, not only in the regression tasks. We here construct a QML algorithm for a classification task accelerated by the optimized random features. We prove that the QML algorithm for sampling optimized random features, combined with stochastic gradient descent (SGD), can achieve state-of-the-art exponential convergence speed of reducing classification error in a classification task under a low-noise condition; at the same time, our algorithm with optimized random features can take advantage of the significant reduction of the required number of features so as to accelerate each iteration in the SGD and evaluation of the classifier obtained from our algorithm. These results discover a promising application of QML to significant acceleration of the leading classification algorithm based on kernel methods, without ruining its applicability to a practical class of data sets and the exponential error-convergence speed.
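作为背景,下面给出经典(非量子)随机傅里叶特征加SGD分类器的最小示意;论文的贡献在于用量子算法加速"优化随机特征"的采样,此处仅演示均匀采样的经典流程:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def random_fourier_features(X, n_feat=512, gamma=1.0, seed=0):
    """经典随机傅里叶特征:逼近 RBF 核 k(x,y)=exp(-gamma*||x-y||^2)。
    频率按 N(0, 2*gamma*I) 均匀采样;"优化随机特征"则按数据相关分布采样。"""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_feat))
    b = rng.uniform(0, 2 * np.pi, size=n_feat)
    return np.sqrt(2.0 / n_feat) * np.cos(X @ W + b)

# 随机特征 + SGD:摘要中"优化随机特征 + 随机梯度下降"流程的经典对应
X_train = np.random.randn(1000, 10)
y_train = (X_train[:, 0] > 0).astype(int)
clf = SGDClassifier().fit(random_fourier_features(X_train), y_train)
```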

表征(1篇)

【1】 Do Large Scale Molecular Language Representations Capture Important Structural Information? 标题:大规模分子语言表示法能捕捉到重要的结构信息吗?

作者:Jerret Ross,Brian Belgodere,Vijil Chenthamarakshan,Inkit Padhi,Youssef Mroueh,Payel Das 机构:IBM Research AI 备注:17 pages, 3 figures 链接:https://arxiv.org/abs/2106.09553 摘要:从分子结构预测化学性质在药物发现和材料设计等许多应用中具有重要意义。与密度泛函理论(DFT)等计算方法相比,基于机器学习的分子性质预测方法有望以低得多的复杂度实现精确预测。以有监督方式用图神经网络从分子图中提取的特征,已经成为此类任务的强基线。然而,巨大的化学空间加上有限的标签使得监督学习具有挑战性,这要求学习一种通用的分子表示。最近,在大规模无标注语料上预训练的基于Transformer的语言模型(PTLM)在许多下游自然语言处理任务中取得了最先进的结果。受这一发展的启发,我们在此提出通过训练一个称为MoLFormer的高效Transformer编码器模型所获得的分子嵌入。该模型采用线性注意机制,对来自PubChem和ZINC数据集的11亿个无标注分子的一维SMILES序列进行高度并行化训练。实验表明,与现有基于图和基于指纹的有监督学习基线相比,所学到的分子表示在预测QM8和QM9分子性质这一具有挑战性的任务上具有竞争力。对MoLFormer表示进行进一步的任务特定微调,可在其中若干性质预测基准上提升性能。这些结果提供了令人鼓舞的证据:大规模分子语言模型可以捕捉到足够的结构信息,从而能够准确预测量子化学性质乃至更多性质。 摘要:Predicting chemical properties from the structure of a molecule is of great importance in many applications including drug discovery and material design. Machine learning based molecular property prediction holds the promise of enabling accurate predictions at much less complexity, when compared to, for example Density Functional Theory (DFT) calculations. Features extracted from molecular graphs, using graph neural nets in a supervised manner, have emerged as strong baselines for such tasks. However, the vast chemical space together with the limited availability of labels makes supervised learning challenging, calling for learning a general-purpose molecular representation. Recently, pre-trained transformer-based language models (PTLMs) on large unlabeled corpus have produced state-of-the-art results in many downstream natural language processing tasks. Inspired by this development, here we present molecular embeddings obtained by training an efficient transformer encoder model, referred to as MoLFormer. This model was employed with a linear attention mechanism and highly parallelized training on 1D SMILES sequences of 1.1 billion unlabeled molecules from the PubChem and ZINC datasets. Experiments show that the learned molecular representation performs competitively, when compared to existing graph-based and fingerprint-based supervised learning baselines, on the challenging tasks of predicting properties of QM8 and QM9 molecules. Further task-specific fine-tuning of the MoLFormer representation improves performance on several of those property prediction benchmarks. These results provide encouraging evidence that large-scale molecular language models can capture sufficient structural information to be able to accurately predict quantum chemical properties and beyond.
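摘要提到的线性注意机制可以用如下最小示意说明(这里以Katharopoulos等人提出的 elu+1 特征映射为例,把softmax注意的O(L^2)复杂度降为O(L);MoLFormer实际采用的线性注意变体与位置编码细节以原论文为准):

```python
import torch

def linear_attention(q, k, v, eps=1e-6):
    """非因果线性注意:q,k 形状 (B,L,D),v 形状 (B,L,E)。"""
    phi = lambda x: torch.nn.functional.elu(x) + 1        # 非负特征映射
    kv = torch.einsum('bld,ble->bde', phi(k), v)          # 先聚合 phi(K)^T V
    z = 1.0 / (torch.einsum('bld,bd->bl', phi(q), phi(k).sum(dim=1)) + eps)
    return torch.einsum('bld,bde,bl->ble', phi(q), kv, z) # 按行归一化输出
```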

编码器(1篇)

【1】 Identifiability-Guaranteed Simplex-Structured Post-Nonlinear Mixture Learning via Autoencoder 标题:基于自动编码器的保证可辨识性的单纯形结构后非线性混合学习

作者:Qi Lyu,Xiao Fu 机构:School of Electrical Engineering and Computer Science, Oregon State University 链接:https://arxiv.org/abs/2106.09070 摘要:这项工作的重点是在无监督的方式解开非线性混合潜在成分的问题。假设潜在成分位于概率单纯形中,并由一个未知的后非线性混合系统进行变换。该问题在信号和数据分析中有着广泛的应用,如非线性高光谱分解、图像嵌入和非线性聚类等。线性混合学习问题已经是病态的,因为目标潜在成分的可辨识性通常很难建立。由于涉及到未知的非线性,这个问题更具挑战性。先前的工作提供了一个基于函数方程的公式,用于可证明的潜在成分识别。然而,可识别性条件有些苛刻和不现实。此外,可辨识性分析是基于无限样本(即总体)的情形,而对于实际有限样本情形的理解一直是难以捉摸的。此外,以往的算法在模型表达性和计算方便性之间进行权衡,这往往会影响学习性能。我们的贡献是三倍的。首先,在很大程度上放松的假设下导出了新的可辨识条件。其次,给出了全面的样本复杂性结果——这是第一次。第三,提出了一种基于约束自动编码器的算法框架,有效地规避了现有算法的挑战。合成和真实的实验证实了我们的理论分析。 摘要:This work focuses on the problem of unraveling nonlinearly mixed latent components in an unsupervised manner. The latent components are assumed to reside in the probability simplex, and are transformed by an unknown post-nonlinear mixing system. This problem finds various applications in signal and data analytics, e.g., nonlinear hyperspectral unmixing, image embedding, and nonlinear clustering. Linear mixture learning problems are already ill-posed, as identifiability of the target latent components is hard to establish in general. With unknown nonlinearity involved, the problem is even more challenging. Prior work offered a function equation-based formulation for provable latent component identification. However, the identifiability conditions are somewhat stringent and unrealistic. In addition, the identifiability analysis is based on the infinite sample (i.e., population) case, while the understanding for practical finite sample cases has been elusive. Moreover, the algorithm in the prior work trades model expressiveness with computational convenience, which often hinders the learning performance. Our contribution is threefold. First, new identifiability conditions are derived under largely relaxed assumptions. Second, comprehensive sample complexity results are presented -- which are the first of the kind. Third, a constrained autoencoder-based algorithmic framework is proposed for implementation, which effectively circumvents the challenges in the existing algorithm. Synthetic and real experiments corroborate our theoretical analyses.

优化|敛散性(5篇)

【1】 Orthogonal-Padé Activation Functions: Trainable Activation functions for smooth and faster convergence in deep networks 标题:正交Padé激活函数:深层网络平滑快速收敛的可训练激活函数

作者:Koushik Biswas,Shilpak Banerjee,Ashish Kumar Pandey 备注:11 pages 链接:https://arxiv.org/abs/2106.09693 摘要:我们提出了正交Padé激活函数,这是一类可训练的激活函数,并表明它们具有更快的学习能力,能在标准深度学习数据集和模型上提高准确率。基于实验,我们从六个正交Padé激活函数中找出两个最佳候选,称之为安全Hermite-Padé(HP)激活函数,即HP-1和HP-2。与ReLU相比,在CIFAR100数据集上,HP-1和HP-2使PreActResNet-34的top-1准确率分别提高5.06%和4.63%,使MobileNet V2分别提高3.02%和2.75%;在CIFAR10数据集上,PreActResNet-34的top-1准确率分别提高2.02%和1.78%,LeNet分别提高2.24%和2.06%,EfficientNet B0分别提高2.15%和2.03%。 摘要:We have proposed orthogonal-Padé activation functions, which are trainable activation functions and show that they have faster learning capability and improve the accuracy in standard deep learning datasets and models. Based on our experiments, we have found two best candidates out of six orthogonal-Padé activations, which we call safe Hermite-Padé (HP) activation functions, namely HP-1 and HP-2. When compared to ReLU, HP-1 and HP-2 have an increment in top-1 accuracy by 5.06% and 4.63% respectively in PreActResNet-34, by 3.02% and 2.75% respectively in MobileNet V2 model on CIFAR100 dataset while on CIFAR10 dataset top-1 accuracy increases by 2.02% and 1.78% respectively in PreActResNet-34, by 2.24% and 2.06% respectively in LeNet, by 2.15% and 2.03% respectively in EfficientNet B0.
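可训练的Padé型激活函数的思路可以用下面的最小示意说明(采用幂级数基与"安全"分母 1+|Q(x)| 以避免极点;论文中的正交基版本只需把多项式基换成Hermite等正交多项式基,系数初始化等细节为本示例假设):

```python
import torch
import torch.nn as nn

class SafePade(nn.Module):
    """示意性"安全"Padé 激活:f(x) = P(x) / (1 + |Q(x)|),系数可训练。"""
    def __init__(self, m=3, n=2):
        super().__init__()
        self.a = nn.Parameter(torch.randn(m + 1) * 0.1)  # 分子 P 的系数
        self.b = nn.Parameter(torch.randn(n) * 0.1)      # 分母 Q 的系数
    def forward(self, x):
        num = sum(a * x**i for i, a in enumerate(self.a))
        den = 1 + torch.abs(sum(b * x**(i + 1) for i, b in enumerate(self.b)))
        return num / den

# 用法:像普通激活层一样插入网络,系数随反向传播一起学习
act = SafePade()
y = act(torch.randn(4, 16))
```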

【2】 Work in Progress: Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework 标题:进行中的工作:移动还是FPGA?能效综合评价与统一优化框架

作者:Geng Yuan,Peiyan Dong,Mengshu Sun,Wei Niu,Zhengang Li,Yuxuan Cai,Jun Liu,Weiwen Jiang,Xue Lin,Bin Ren,Xulong Tang,Yanzhi Wang 机构:Northeastern University,College of William and Mary,Carnegie Mellon University, University of Notre Dame,University of Pittsburgh 备注:Poster in the 27th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), 2021 链接:https://arxiv.org/abs/2106.09166 摘要:深度神经网络(DNN)在边缘设备(即FPGA和移动平台)上的高效部署极具挑战性,尤其是在DNN模型规模和复杂性不断增加的情况下。虽然各种优化方法已被证明对许多边缘设备上的DNN是有效的,但大多数最新工作集中在临时(ad-hoc)优化上,缺乏深入的研究来全面揭示在考虑不同优化时各类边缘设备的潜力与限制。本文对基于FPGA和基于移动平台的DNN执行的能效进行了定性和定量的比较,并给出了详细的分析。 摘要:Efficient deployment of Deep Neural Networks (DNNs) on edge devices (i.e., FPGAs and mobile platforms) is very challenging, especially under a recent witness of the increasing DNN model size and complexity. Although various optimization approaches have been proven to be effective in many DNNs on edge devices, most state-of-the-art work focuses on ad-hoc optimizations, and there lacks a thorough study to comprehensively reveal the potentials and constraints of different edge devices when considering different optimizations. In this paper, we qualitatively and quantitatively compare the energy-efficiency of FPGA-based and mobile-based DNN executions, and provide detailed analysis.

【3】 A Short Note of PAGE: Optimal Convergence Rates for Nonconvex Optimization 标题:PAGE的一个简短注记:非凸优化的最优收敛速度

作者:Zhize Li 机构:KAUST 备注:4 pages 链接:https://arxiv.org/abs/2106.09663 摘要:在本文中,我们首先回顾非凸问题设定,并介绍最优的PAGE算法(Li等人,ICML'21)。然后,我们为PAGE给出一个简洁明了的收敛性分析,证明其达到最优收敛速率。此外,PAGE及其分析也很容易被采纳并推广到其他工作中。希望本文能为以后的工作提供一些启示和帮助。 摘要:In this note, we first recall the nonconvex problem setting and introduce the optimal PAGE algorithm (Li et al., ICML'21). Then we provide a simple and clean convergence analysis of PAGE for achieving optimal convergence rates. Moreover, PAGE and its analysis can be easily adopted and generalized to other works. We hope that this note provides the insights and is helpful for future works.
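PAGE的核心是一个概率性梯度估计器:以概率p重新计算一个较大批量的梯度,否则用相邻两个迭代点在同一小批量上的梯度差做增量更新。下面是一个示意实现(grad_batch接口与各超参数均为本示例假设,非原文代码):

```python
import numpy as np

def page(grad_batch, x0, n, eta=0.1, p=0.5, b=64, b2=8, T=1000, seed=0):
    """PAGE 示意:grad_batch(x, idx) 返回样本子集 idx 上的小批量梯度(假设接口)。"""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    g = grad_batch(x, rng.choice(n, size=b, replace=False))   # 初始大批量梯度
    for _ in range(T):
        x_new = x - eta * g                                   # 用当前估计器更新
        if rng.random() < p:                                  # 以概率 p:重算大批量梯度
            g = grad_batch(x_new, rng.choice(n, size=b, replace=False))
        else:                                                 # 否则:小批量梯度差增量更新
            idx = rng.choice(n, size=b2, replace=False)
            g = g + grad_batch(x_new, idx) - grad_batch(x, idx)
        x = x_new
    return x
```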

【4】 Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting 标题:插值器的一致收敛性:高斯宽度、范数界和良性过拟合

作者:Frederic Koehler,Lijia Zhou,Danica J. Sutherland,Nathan Srebro 机构:MIT, University of Chicago, UBC and Amii, TTI-Chicago, Collaboration on the Theoretical Foundations of Deep Learning (deepfoundations.ai) 链接:https://arxiv.org/abs/2106.09276 摘要:我们研究了高斯数据下高维线性回归中的插值学习,并证明了一个一般的一致收敛保证:任意假设类中插值器的泛化误差可以用该类的高斯宽度来控制。把这一一般界应用于欧氏范数球,可以恢复Bartlett等人(2020)关于最小范数插值器的一致性结果,并证实Zhou等人(2020)在高斯数据特殊情形下关于近似最小范数插值器的预测。通过将其应用于单纯形,我们展示了该界的一般性,得到了最小l1范数插值器(基追踪)的一个新的一致性结果。我们的结果表明,基于范数的泛化界至少在某些情形下可以解释并用于分析良性过拟合。 摘要:We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class in terms of the class's Gaussian width. Applying the generic bound to Euclidean norm balls recovers the consistency result of Bartlett et al. (2020) for minimum-norm interpolators, and confirms a prediction of Zhou et al. (2020) for near-minimal-norm interpolators in the special case of Gaussian data. We demonstrate the generality of the bound by applying it to the simplex, obtaining a novel consistency result for minimum l1-norm interpolators (basis pursuit). Our results show how norm-based generalization bounds can explain and be used to analyze benign overfitting, at least in some settings.
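作为直观示例,下面在过参数化线性回归中分别计算最小l2范数插值器(伪逆解)和摘要所述的最小l1范数插值器(基追踪,化为线性规划求解):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, d = 50, 200                                  # 过参数化:d > n,插值解不唯一
X, y = rng.standard_normal((n, d)), rng.standard_normal(n)

# 最小 l2 范数插值器(Bartlett 等人分析的对象):伪逆给出闭式解
beta_l2 = np.linalg.pinv(X) @ y
assert np.allclose(X @ beta_l2, y)              # 确认精确插值

# 最小 l1 范数插值器(基追踪):min ||beta||_1  s.t.  X beta = y
# 令 beta = u - v (u, v >= 0),化为标准线性规划
c = np.ones(2 * d)
res = linprog(c, A_eq=np.hstack([X, -X]), b_eq=y, bounds=(0, None))
beta_l1 = res.x[:d] - res.x[d:]
```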

【5】 Optimum-statistical collaboration towards efficient black-box optimization 标题:面向高效黑盒优化的最优统计协作

作者:Wenjie Li,Chihua Wang,Guang Cheng 机构:Department of Statistics, Purdue University, West Lafayette, IN 链接:https://arxiv.org/abs/2106.09215 摘要:随着训练中涉及的超参数越来越多,机器学习系统需要更好地理解超参数自动调优。这引起了人们对可证明的黑盒优化的研究兴趣:通过在算法设计中实现更好的探索机制、管理优化误差与统计误差的流动,使黑盒优化更加实用。以往的工作侧重于刻画优化误差,但这是不够的:如果不考虑奖励样本之间的异质性,黑盒优化算法可能是低效的。本文着重刻画了统计不确定性在黑盒优化中的作用,以指导更高效的算法设计。我们提出了"最优-统计协作"(optimum-statistical collaboration)框架,用于管理优化过程中不断演化的优化误差流与统计误差流之间的相互作用。受该框架启发,我们针对仅满足局部光滑性假设的目标函数提出了VHCT算法。理论上,我们证明该算法具有速率最优的后悔界;实验中,我们证明该算法在广泛的设置下优于先前的工作。 摘要:With increasingly more hyperparameters involved in their training, machine learning systems demand a better understanding of hyperparameter tuning automation. This has raised interest in studies of provably black-box optimization, which is made more practical by better exploration mechanism implemented in algorithm design, managing the flux of both optimization and statistical errors. Prior efforts focus on delineating optimization errors, but this is deficient: black-box optimization algorithms can be inefficient without considering heterogeneity among reward samples. In this paper, we make the key delineation on the role of statistical uncertainty in black-box optimization, guiding a more efficient algorithm design. We introduce optimum-statistical collaboration, a framework of managing the interaction between optimization error flux and statistical error flux evolving in the optimization process. Inspired by this framework, we propose the VHCT algorithms for objective functions with only local-smoothness assumptions. In theory, we prove our algorithm enjoys rate-optimal regret bounds; in experiments, we show the algorithm outperforms prior efforts in extensive settings.

预测|估计(3篇)

【1】 Deep generative modeling for probabilistic forecasting in power systems 标题:电力系统概率预测的深度生成式建模

作者:Jonathan Dumas,Antoine Wehenkel,Damien Lanaspeze,Bertrand Cornélusse,Antonio Sutera 机构:Liege University, Departments of Computer Science and Electrical Engineering, Belgium, Mines ParisTech, France 链接:https://arxiv.org/abs/2106.09370 摘要:对可再生能源比例较高的终端使用部门进行更大程度的直接电气化,是到2050年实现碳中和社会的支柱之一。本研究采用一种新近的深度学习技术——归一化流(normalizing flows)——来产生准确的概率预测,这对于决策者应对电力系统应用中的新挑战至关重要。通过使用2014年全球能源预测竞赛的开放数据进行综合实证评估,我们证明了我们的方法与其他最先进的深度学习生成模型(生成对抗网络和变分自动编码器)具有竞争力。通过考虑能源零售商的案例研究,并使用若干互补指标,对基于天气的风电、光伏和负荷情景生成模型在预测价值和质量两方面进行了恰当的比较。 摘要:Greater direct electrification of end-use sectors with a higher share of renewables is one of the pillars to power a carbon-neutral society by 2050. This study uses a recent deep learning technique, the normalizing flows, to produce accurate probabilistic forecasts that are crucial for decision-makers to face the new challenges in power systems applications. Through comprehensive empirical evaluations using the open data of the Global Energy Forecasting Competition 2014, we demonstrate that our methodology is competitive with other state-of-the-art deep learning generative models: generative adversarial networks and variational autoencoders. The models producing weather-based wind, solar power, and load scenarios are properly compared both in terms of forecast value, by considering the case study of an energy retailer, and quality using several complementary metrics.

【2】 Frustratingly Easy Transferability Estimation 标题:容易得令人沮丧的可转移性评估

作者:Long-Kai Huang,Ying Wei,Yu Rong,Qiang Yang,Junzhou Huang 机构:Tencent AI Lab; Department of Computer Science, City University of Hong Kong; Hong Kong University of Science and Technology 链接:https://arxiv.org/abs/2106.09362 摘要:可转移性估计是选择预训练模型及其迁移层的重要工具,其目的是最大化目标任务上的性能并防止负迁移。现有的估计算法要么需要在目标任务上进行大量训练,要么难以评估层与层之间的可转移性。我们提出了一个简单、高效、有效的可转移性度量TransRate。只需对目标数据做单次前向传递,TransRate就以预训练模型提取的目标样本特征与其标签之间的互信息来度量可转移性。我们借助编码率(coding rate)作为熵的有效替代,克服了高效估计互信息的难题。理论分析表明,TransRate与迁移学习后的性能密切相关。尽管TransRate极其简单(只需10行代码),它在22个预训练模型和16个下游任务的广泛评估中表现得非常好。 摘要:Transferability estimation has been an essential tool in selecting a pre-trained model and the layers of it to transfer, so as to maximize the performance on a target task and prevent negative transfer. Existing estimation algorithms either require intensive training on target tasks or have difficulties in evaluating the transferability between layers. We propose a simple, efficient, and effective transferability measure named TransRate. With single pass through the target data, TransRate measures the transferability as the mutual information between the features of target examples extracted by a pre-trained model and labels of them. We overcome the challenge of efficient mutual information estimation by resorting to coding rate that serves as an effective alternative to entropy. TransRate is theoretically analyzed to be closely related to the performance after transfer learning. Despite its extraordinary simplicity in 10 lines of codes, TransRate performs remarkably well in extensive evaluations on 22 pre-trained models and 16 downstream tasks.
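按摘要的描述,TransRate用编码率作为熵的替代来近似特征与标签的互信息。下面给出一个示意实现(编码率取logdet形式,互信息近似为 R(Z) − R(Z|Y);归一化与 eps 的具体取法为本示例假设,以原论文为准):

```python
import numpy as np

def coding_rate(Z, eps=1e-4):
    """Z: (n, d) 特征矩阵;R(Z) = 1/2 * logdet(I + d/(n*eps^2) * Z^T Z)。"""
    n, d = Z.shape
    return 0.5 * np.linalg.slogdet(np.eye(d) + d / (n * eps**2) * Z.T @ Z)[1]

def transrate(Z, y, eps=1e-4):
    """以编码率差近似特征 Z 与标签 y 的互信息,作为可转移性分数。"""
    Z = Z - Z.mean(axis=0)                        # 中心化特征
    r_cond = sum((y == c).mean() * coding_rate(Z[y == c], eps)
                 for c in np.unique(y))           # R(Z|Y):按类加权的编码率
    return coding_rate(Z, eps) - r_cond
```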

【3】 Time Series is a Special Sequence: Forecasting with Sample Convolution and Interaction 标题:时间序列是一种特殊的序列:样本卷积和交互作用预测

作者:Minhao Liu,Ailing Zeng,Qiuxia Lai,Qiang Xu 机构:The Chinese University of Hong Kong 链接:https://arxiv.org/abs/2106.09305 摘要:时间序列是一种特殊类型的序列数据,即按等时间间隔采集并按时间顺序排列的一组观测值。现有的深度学习技术使用通用的序列模型(例如递归神经网络、Transformer模型或时间卷积网络)进行时间序列分析,而忽略了它的一些独特性质。例如,时间序列数据的下采样通常会保留数据中的大部分信息,而对于文本序列和DNA序列等一般序列数据则并非如此。基于此,本文提出了一种新的神经网络结构,并将其应用于时间序列预测问题:在多个分辨率下进行样本卷积和交互,以实现时间建模。所提出的体系结构名为SCINet,有助于提取可预测性更强的特征。实验结果表明,SCINet在各种实际时间序列预测数据集上都比现有方法显著提高了预测精度。特别是,它可以在不使用复杂空间建模技术的情况下,对时空数据集实现高预测精度。我们的代码和数据见补充材料。 摘要:Time series is a special type of sequence data, a set of observations collected at even intervals of time and ordered chronologically. Existing deep learning techniques use generic sequence models (e.g., recurrent neural network, Transformer model, or temporal convolutional network) for time series analysis, which ignore some of its unique properties. For example, the downsampling of time series data often preserves most of the information in the data, while this is not true for general sequence data such as text sequence and DNA sequence. Motivated by the above, in this paper, we propose a novel neural network architecture and apply it for the time series forecasting problem, wherein we conduct sample convolution and interaction at multiple resolutions for temporal modeling. The proposed architecture, namely SCINet, facilitates extracting features with enhanced predictability. Experimental results show that SCINet achieves significant prediction accuracy improvement over existing solutions across various real-world time series forecasting datasets. In particular, it can achieve high forecasting accuracy for those temporal-spatial datasets without using sophisticated spatial modeling techniques. Our codes and data are presented in the supplemental material.

其他神经网络|深度学习|模型|建模(31篇)

【1】 Multi-Label Learning from Single Positive Labels 标题:基于单个正标签的多标签学习

作者:Elijah Cole,Oisin Mac Aodha,Titouan Lorieul,Pietro Perona,Dan Morris,Nebojsa Jojic 机构:Caltech, University of Edinburgh, Inria, Microsoft AI for Earth, Microsoft Research 备注:CVPR 2021 链接:https://arxiv.org/abs/2106.09708 摘要:预测给定图像的所有适用标签称为多标签分类。与标准的多类情形(每幅图像只有一个标签)相比,为多标签分类标注训练数据要困难得多。当潜在标签数量很大时,人工标注者很难为每幅训练图像列出所有适用的标签。此外,在某些场景下检测本身就很困难,例如在高分辨率图像中寻找小目标实例。因此,多标签训练数据往往存在大量假阴性(漏标)。我们考虑这个问题最困难的版本:标注者只为每幅图像提供一个相关标签。于是,训练集中每幅图像只有一个正标签,且没有任何已确认的负标签。我们在四个不同的多标签图像分类数据集上,针对线性分类器和端到端微调的深度网络,研究了这种从缺失标签中学习的特殊情形。我们将现有的多标签损失扩展到这一设定,并提出了在训练过程中约束预期正标签数量的新变体。令人惊讶的是,我们发现在某些情况下,尽管训练时已确认的标签明显更少,仍有可能接近全标注分类器的性能。 摘要:Predicting all applicable labels for a given image is known as multi-label classification. Compared to the standard multi-class case (where each image has only one label), it is considerably more challenging to annotate training data for multi-label classification. When the number of potential labels is large, human annotators find it difficult to mention all applicable labels for each training image. Furthermore, in some settings detection is intrinsically difficult e.g. finding small object instances in high resolution images. As a result, multi-label training data is often plagued by false negatives. We consider the hardest version of this problem, where annotators provide only one relevant label for each image. As a result, training sets will have only one positive label per image and no confirmed negatives. We explore this special case of learning from missing labels across four different multi-label image classification datasets for both linear classifiers and end-to-end fine-tuned deep networks. We extend existing multi-label losses to this setting and propose novel variants that constrain the number of expected positive labels during training. Surprisingly, we show that in some cases it is possible to approach the performance of fully labeled classifiers despite training with significantly fewer confirmed labels.
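下面用一个损失函数示意"单正标签 + 约束预期正标签数"的思路(PyTorch;未观测标签按"假设为负"处理,正则项把预期正标签数拉向先验 k_expected;具体损失形式与超参数为本示例假设,以原论文为准):

```python
import torch

def single_positive_loss(logits, pos_idx, k_expected=2.0, lam=0.1):
    """logits: (B, L);pos_idx: (B,) 每幅图像唯一已确认的正标签索引。"""
    B, L = logits.shape
    targets = torch.zeros_like(logits)
    targets[torch.arange(B), pos_idx] = 1.0      # 其余标签暂视为负(可能是假阴性)
    bce = torch.nn.functional.binary_cross_entropy_with_logits(logits, targets)
    # 约束每幅图像的预期正标签数接近先验值,缓解"假设为负"带来的偏差
    reg = (torch.sigmoid(logits).sum(dim=1).mean() - k_expected) ** 2
    return bce + lam * reg
```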

【2】 Always Be Dreaming: A New Approach for Data-Free Class-Incremental Learning 标题:永远做梦:一种无数据类增量学习的新方法

作者:James Smith,Yen-Chang Hsu,Jonathan Balloch,Yilin Shen,Hongxia Jin,Zsolt Kira 机构:Georgia Institute of Technology,Samsung Research America 链接:https://arxiv.org/abs/2106.09701 摘要:随着时间的推移,现代计算机视觉应用在逐渐学习新概念时会遭遇灾难性遗忘。最成功的缓解这种遗忘的方法需要大量重放以前见过的数据,而在存在内存限制或数据合法性问题时,这是有问题的。在这项工作中,我们考虑无数据类增量学习(DFCIL)这一高影响力问题:增量学习代理必须随时间学习新概念,而不能存储生成器或过去任务的训练数据。DFCIL的一种方法是重放通过反演学习者分类模型的冻结副本生成的合成图像,但我们表明,当使用标准蒸馏策略时,这种方法在常见的类增量基准上会失败。我们诊断了失败的原因,并提出了一种新的DFCIL增量蒸馏策略,贡献了改进的交叉熵训练与重要性加权特征蒸馏;结果表明,在常见的类增量基准上,与SOTA DFCIL方法相比,我们的方法使最终任务准确率(绝对差)最多提高25.1%。我们的方法甚至优于几种存储图像核心集(coreset)的标准基于重放的方法。 摘要:Modern computer vision applications suffer from catastrophic forgetting when incrementally learning new concepts over time. The most successful approaches to alleviate this forgetting require extensive replay of previously seen data, which is problematic when memory constraints or data legality concerns exist. In this work, we consider the high-impact problem of Data-Free Class-Incremental Learning (DFCIL), where an incremental learning agent must learn new concepts over time without storing generators or training data from past tasks. One approach for DFCIL is to replay synthetic images produced by inverting a frozen copy of the learner's classification model, but we show this approach fails for common class-incremental benchmarks when using standard distillation strategies. We diagnose the cause of this failure and propose a novel incremental distillation strategy for DFCIL, contributing a modified cross-entropy training and importance-weighted feature distillation, and show that our method results in up to a 25.1% increase in final task accuracy (absolute difference) compared to SOTA DFCIL methods for common class-incremental benchmarks. Our method even outperforms several standard replay based methods which store a coreset of images.

【3】 Poisoning and Backdooring Contrastive Learning 标题:对比学习的投毒与后门攻击

作者:Nicholas Carlini,Andreas Terzis 机构:Google 链接:https://arxiv.org/abs/2106.09667 摘要:像CLIP这样的对比学习方法在有噪声、未经整理的训练数据集上训练。这比手动标注数据集便宜,甚至能提高分布外鲁棒性。我们表明,这种做法使后门攻击和投毒攻击成为重大威胁。通过仅使数据集的0.005%中毒(例如,包含300万样本的Conceptual Captions数据集中的150幅图像),我们就能通过叠加一个小补丁使模型对测试图像进行错误分类。有针对性的投毒攻击(使模型把特定测试输入错误分类为攻击者期望的标签)甚至更容易,只需控制不到0.0001%的数据集(例如,300万幅图像中的两幅)。我们的攻击让人怀疑,在有噪声、未经整理的互联网抓取数据上训练是否可取。 摘要:Contrastive learning methods like CLIP train on noisy and uncurated training datasets. This is cheaper than labeling datasets manually, and even improves out-of-distribution robustness. We show that this practice makes backdoor and poisoning attacks a significant threat. By poisoning just 0.005% of a dataset (e.g., just 150 images of the 3 million-example Conceptual Captions dataset), we can cause the model to misclassify test images by overlaying a small patch. Targeted poisoning attacks, whereby the model misclassifies a particular test input with an adversarially-desired label, are even easier requiring control of less than 0.0001% of the dataset (e.g., just two out of the 3 million images). Our attacks call into question whether training on noisy and uncurated Internet scrapes is desirable.

【4】 Non-intrusive Nonlinear Model Reduction via Machine Learning Approximations to Low-dimensional Operators 标题:基于机器学习逼近低维算子的非侵入式非线性模型降阶

作者:Zhe Bai,Liqian Peng 机构:Computational Research, Lawrence Berkeley National Lab, Berkeley, CA , Facebook AI Applied Research, Menlo Park, CA 链接:https://arxiv.org/abs/2106.09658 摘要:尽管基于投影的参数化非线性动力系统降阶模型(ROM)在许多应用中都取得了令人兴奋的结果,但其广泛采用受限于其侵入性:实现这种降阶模型通常需要对底层仿真代码进行重大修改。为了解决这个问题,我们提出了一种方法,能够以非侵入的方式准确地逼近传统的侵入式降阶模型。具体而言,该方法使用现代机器学习回归技术来逼近与基于投影的降阶模型(ROM)相关联的低维算子。对仿真代码的唯一要求是能够在给定状态和参数时输出速度,因为该功能被用于训练近似的低维算子。除了实现非侵入性之外,我们还证明了该方法具有非常低的计算复杂度,实现了高达1000倍的运行时间缩减。我们在两类偏微分方程上证明了该方法的有效性。 摘要:Although projection-based reduced-order models (ROMs) for parameterized nonlinear dynamical systems have demonstrated exciting results across a range of applications, their broad adoption has been limited by their intrusivity: implementing such a reduced-order model typically requires significant modifications to the underlying simulation code. To address this, we propose a method that enables traditionally intrusive reduced-order models to be accurately approximated in a non-intrusive manner. Specifically, the approach approximates the low-dimensional operators associated with projection-based reduced-order models (ROMs) using modern machine-learning regression techniques. The only requirement of the simulation code is the ability to export the velocity given the state and parameters as this functionality is used to train the approximated low-dimensional operators. In addition to enabling nonintrusivity, we demonstrate that the approach also leads to very low computational complexity, achieving up to $1000\times$ reduction in run time. We demonstrate the effectiveness of the proposed technique on two types of PDEs.

【5】 Deep Learning Through the Lens of Example Difficulty 标题:通过示例难度的镜头进行深度学习

作者:Robert J. N. Baldock,Hartmut Maennel,Behnam Neyshabur 机构:Google Research, Brain Team, Google Research, Blueshift Team 备注:Main paper: 15 pages, 8 figures. Appendix: 31 pages, 40 figures 链接:https://arxiv.org/abs/2106.09647 摘要:现有的理解深度学习的工作通常采用把所有依赖数据的信息压缩成几个数字的度量。在这项工作中,我们采用一种基于单个样本所起作用的视角。我们引入了对给定输入作出预测的计算难度的一种度量:(有效)预测深度。我们的广泛研究揭示了给定输入的预测深度与模型在该数据点上的不确定性、置信度、准确率和学习速度之间令人惊讶而又简单的关系。我们进一步将困难样本分为三个可解释的组别,展示了这些组别在深度模型内部被如何不同地处理,并展示了这种理解如何帮助我们提高预测精度。我们研究中得到的见解使文献中若干单独报道的现象形成了一致的图景:早层泛化而晚层记忆;早层收敛更快;网络先学习简单的数据和简单的函数。 摘要:Existing work on understanding deep learning often employs measures that compress all data-dependent information into a few numbers. In this work, we adopt a perspective based on the role of individual examples. We introduce a measure of the computational difficulty of making a prediction for a given input: the (effective) prediction depth. Our extensive investigation reveals surprising yet simple relationships between the prediction depth of a given input and the model's uncertainty, confidence, accuracy and speed of learning for that data point. We further categorize difficult examples into three interpretable groups, demonstrate how these groups are processed differently inside deep models and showcase how this understanding allows us to improve prediction accuracy. Insights from our study lead to a coherent view of a number of separately reported phenomena in the literature: early layers generalize while later layers memorize; early layers converge faster and networks learn easy data and simple functions first.

【6】 Privacy-Preserving Eye-tracking Using Deep Learning 标题:基于深度学习的隐私保护眼动跟踪

作者:Salman Seyedi,Zifan Jiang,Allan Levey,Gari D. Clifford 机构:dept. of Biomedical Informatics, Emory School of Medicine, Atlanta, Georgia, dept. of Biomedical Engineering, Georgia Institute of Technology, dept. of Neurology 链接:https://arxiv.org/abs/2106.09621 摘要:像深度学习这样的复杂机器学习方法的广泛使用导致了人类活动识别的爆炸性增长,尤其是在健康领域的应用。特别地,作为更大的身体传感器网络系统的一部分,面部和全身分析在评估健康状况方面变得越来越普遍。然而,处理私有(有时是受保护的)数据的复杂模型引发了对可识别数据潜在泄漏的担忧。在这项工作中,我们关注在个人面部图像上训练的深度网络模型。我们使用了从493名接受基于眼动跟踪的神经功能评估的个体获取的全脸视频记录。输出、梯度、中间层输出、损失和标签被用作一个附加支持向量机输出层的深度网络的输入,以识别训练数据中的成员。推理攻击方法和相关的数学分析表明,该深度学习模型无意中记忆面部特征的可能性很低。本研究表明,上述模型以合理的置信度保持了训练数据的完整性。对于不同的模型,可以在相似的条件下实施相同的流程。 摘要:The expanding usage of complex machine learning methods like deep learning has led to an explosion in human activity recognition, particularly applied to health. In particular, as part of a larger body sensor network system, face and full-body analysis is becoming increasingly common for evaluating health status. However, complex models which handle private and sometimes protected data, raise concerns about the potential leak of identifiable data. In this work, we focus on the case of a deep network model trained on images of individual faces. Full-face video recordings taken from 493 individuals undergoing an eye-tracking based evaluation of neurological function were used. Outputs, gradients, intermediate layer outputs, loss, and labels were used as inputs for a deep network with an added support vector machine emission layer to recognize membership in the training data. The inference attack method and associated mathematical analysis indicate that there is a low likelihood of unintended memorization of facial features in the deep learning model. In this study, it is showed that the named model preserves the integrity of training data with reasonable confidence. The same process can be implemented in similar conditions for different models.

【7】 On Anytime Learning at Macroscale 标题:浅谈大尺度下的随时随地学习

作者:Lucas Caccia,Jing Xu,Myle Ott,Marc'Aurelio Ranzato,Ludovic Denoyer 机构:Facebook AI Research, MILA - McGill University 链接:https://arxiv.org/abs/2106.09563 摘要:经典的机器学习框架假设可以访问一个可能很大的数据集来训练预测模型。然而,在许多实际应用中,数据并非一次全部到达,而是随时间分批到达。这就在模型的准确性和获得这种模型所需的时间之间形成了一种自然的权衡。贪婪的预测器可以在每批数据一到达就立即训练,从而给出非平凡的预测,但它也可能对未来数据的利用不够充分。另一方面,迟缓的预测器可能要等待很长时间,把多个批聚合成一个更大的数据集,但最终会提供好得多的性能。在这项工作中,我们考虑这样一种流式学习设定,称之为"宏观尺度下的随时学习"(anytime learning at macroscale, ALMA)。它是随时学习的一个实例,不过不是应用于单个数据块的层面,而是应用于整个大批量序列的层面。我们首先将这种学习设定形式化,然后引入指标来评估学习者在给定内存和计算预算下在给定任务上的表现,最后在为宏观尺度随时学习而改造的标准基准上测试了若干基线方法。总的发现是,模型越大,泛化越好。特别是,如果初始模型相对较小,则随时间增长模型容量非常重要。此外,以中等速率更新模型可以在准确度和获得有用预测器的时间之间取得最佳折衷。 摘要:Classical machine learning frameworks assume access to a possibly large dataset in order to train a predictive model. In many practical applications however, data does not arrive all at once, but in batches over time. This creates a natural trade-off between accuracy of a model and time to obtain such a model. A greedy predictor could produce non-trivial predictions by immediately training on batches as soon as these become available but, it may also make sub-optimal use of future data. On the other hand, a tardy predictor could wait for a long time to aggregate several batches into a larger dataset, but ultimately deliver a much better performance. In this work, we consider such a streaming learning setting, which we dub anytime learning at macroscale (ALMA). It is an instance of anytime learning applied not at the level of a single chunk of data, but at the level of the entire sequence of large batches. We first formalize this learning setting, we then introduce metrics to assess how well learners perform on the given task for a given memory and compute budget, and finally we test several baseline approaches on standard benchmarks repurposed for anytime learning at macroscale. The general finding is that bigger models always generalize better. In particular, it is important to grow model capacity over time if the initial model is relatively small. Moreover, updating the model at an intermediate rate strikes the best trade off between accuracy and time to obtain a useful predictor.

【8】 Machine Learning for Postprocessing Ensemble Streamflow Forecasts 标题:机器学习在后处理集成径流预报中的应用

作者:Sanjib Sharma,Ganesh Raj Ghimire,Ridwan Siddique 机构:Earth and Environmental Systems Institute, The Pennsylvania State University, University Park, PA , USA, Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN , USA 链接:https://arxiv.org/abs/2106.09547 摘要:高质量的径流预报可为水政策和管理各领域的决策提供信息。我们将动力学建模与机器学习相结合,证明了在中短期时间尺度(1-7天)上径流预报质量的提升。动力学建模用数值天气预报模式的输出驱动水文模型,从而生成集合径流预报。我们采用长短期记忆(LSTM)神经网络来修正由动力学建模得到的原始集合径流预报中的预报偏差。在预报检验中,我们使用技巧评分和可靠性图等多种指标,并按预见期、流量阈值和季节分别考察。检验结果表明,相对于气候学、时间持续性、确定性和原始集合预报,LSTM都能改善径流预报,且在所有预见期、流量阈值和季节上均有改进。与原始集合相比,LSTM带来的预报技巧相对增益通常在中期时间尺度上高于初始预见期,在高流量上高于中低流量,在暖季高于冷季。总体而言,我们的结果突出了LSTM在提高径流预报技巧和可靠性两方面的优势。 摘要:Skillful streamflow forecasting informs decisions in various areas of water policy and management. We integrate dynamical modeling with machine learning to demonstrate the enhanced quality of streamflow forecasts at short-to medium-range timescales (1 - 7 days). Dynamical modeling generates ensemble streamflow forecasts by forcing a hydrological model with numerical weather prediction model outputs. We employ a Long Short-Term Memory (LSTM) neural network to correct forecast biases in raw ensemble streamflow forecasts obtained from dynamical modeling. For forecast verification, we use different metrics such as skill score and reliability diagram conditioned upon the lead time, flow threshold, and season. The verification results show that the LSTM can improve streamflow forecasts relative to climatological, temporal persistence, deterministic, and raw ensemble forecasts. The LSTM demonstrates improvement across all lead times, flow thresholds, and seasons. As compared to the raw ensembles, relative gain in forecast skill from LSTM is generally higher at medium-range timescales compared to initial lead time; high flows compared to low-moderate flows; and warm-season compared to the cool ones. Overall, our results highlight the benefits of LSTM for improving both the skill and reliability of streamflow forecasts.
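下面给出一个用LSTM对集合径流预报做后处理(偏差校正)的结构示意(PyTorch;输入特征的构造、网络规模与训练目标均为本示例假设):

```python
import torch
import torch.nn as nn

class EnsemblePostprocessor(nn.Module):
    """示意:输入为 1-7 天各预见期的原始集合统计量(如集合均值、离差等),
    输出为各预见期校正后的径流预报。"""
    def __init__(self, n_feat=4, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_feat, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
    def forward(self, x):                 # x: (B, 7, n_feat),7 个预见期
        h, _ = self.lstm(x)
        return self.head(h).squeeze(-1)   # (B, 7):逐预见期的校正预报

model = EnsemblePostprocessor()
corrected = model(torch.randn(8, 7, 4))   # 训练时对照观测径流最小化 MSE 等损失
```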

【9】 Exploring the Properties and Evolution of Neural Network Eigenspaces during Training 标题:神经网络特征空间在训练过程中的性质和演化研究

作者:Mats L. Richter,Leila Malihi,Anne-Kathrin Patricia Windler,Ulf Krumnack 机构:Department of Cognitive Science, University of Osnabrück 链接:https://arxiv.org/abs/2106.09526 摘要:在这项工作中,我们使用logistic回归探针和饱和度量来探索神经网络内部的信息处理。我们发现,问题难度和神经网络容量以对抗的方式影响预测性能,这为检测神经网络在给定任务上的过参数化和欠参数化提供了可能。我们进一步表明,观察到的效应独立于先前报道的病理模式,例如已有工作中描述的"尾模式"(tail pattern)。最后,我们证明了饱和模式在训练早期就已收敛,从而在分析时允许更快的周期时间。 摘要:In this work we explore the information processing inside neural networks using logistic regression probes and the saturation metric. We show that problem difficulty and neural network capacity affect the predictive performance in an antagonistic manner, opening the possibility of detecting over- and under-parameterization of neural networks for a given task. We further show that the observed effects are independent from previously reported pathological patterns like the "tail pattern" described in prior work. Finally we are able to show that saturation patterns converge early during training, allowing for a quicker cycle time during analysis.

【10】 Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity 标题:对角线性网络SGD的隐式偏差:随机性的一个可证明的好处

作者:Scott Pesme,Loucas Pillaud-Vivien,Nicolas Flammarion 机构:EPFL 链接:https://arxiv.org/abs/2106.09524 摘要:理解训练算法的隐式偏差对于解释过参数化神经网络的成功至关重要。本文通过其连续时间形式——随机梯度流,研究对角线性网络上随机梯度下降的动力学。我们明确刻画了随机流所选择的解,并证明它总是比梯度流的解具有更好的泛化性质。相当令人惊讶的是,我们发现训练损失的收敛速度控制着偏置效应的大小:收敛越慢,偏置越好。为了完整地完成我们的分析,我们为该动力学提供了收敛保证。我们还给出了支持理论主张的实验结果。我们的发现强调了结构化噪声可以诱导更好泛化这一事实,并有助于解释实践中观察到的随机梯度下降优于梯度下降的表现。 摘要:Understanding the implicit bias of training algorithms is of crucial importance in order to explain the success of overparametrised neural networks. In this paper, we study the dynamics of stochastic gradient descent over diagonal linear networks through its continuous time version, namely stochastic gradient flow. We explicitly characterise the solution chosen by the stochastic flow and prove that it always enjoys better generalisation properties than that of gradient flow. Quite surprisingly, we show that the convergence speed of the training loss controls the magnitude of the biasing effect: the slower the convergence, the better the bias. To fully complete our analysis, we provide convergence guarantees for the dynamics. We also give experimental results which support our theoretical claims. Our findings highlight the fact that structured noise can induce better generalisation and they help explain the greater performances observed in practice of stochastic gradient descent over gradient descent.

【11】 Backward Gradient Normalization in Deep Neural Networks 标题:深度神经网络中的后向梯度归一化

作者:Alejandro Cabana,Luis F. Lago-Fernández 机构:Escuela Polit´ecnica Superior, Universidad Aut´onoma de Madrid, Madrid, Spain 链接:https://arxiv.org/abs/2106.09475 摘要:我们在神经网络训练中引入了一种新的梯度归一化技术:利用在网络体系结构中某些位置引入的归一化层,在反向传播期间对梯度进行重新缩放。这些归一化节点不影响前向激活的传播,只修改反向传播方程,使尺度良好的梯度流能够在不发生消失或爆炸的情况下到达最深的网络层。在极深神经网络上的实验结果表明,新技术能有效地控制梯度范数,允许最深层权重的更新,并能在多种实验条件下提高网络精度。 摘要:We introduce a new technique for gradient normalization during neural network training. The gradients are rescaled during the backward pass using normalization layers introduced at certain points within the network architecture. These normalization nodes do not affect forward activity propagation, but modify backpropagation equations to permit a well-scaled gradient flow that reaches the deepest network layers without experimenting vanishing or explosion. Results on tests with very deep neural networks show that the new technique can do an effective control of the gradient norm, allowing the update of weights in the deepest layers and improving network accuracy on several experimental conditions.
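摘要中"前向恒等、仅在反向重缩放梯度"的归一化节点,可以用PyTorch的自定义autograd函数示意如下(这里简单地把梯度缩放为单位范数;原论文的具体缩放方案以原文为准):

```python
import torch

class GradNorm(torch.autograd.Function):
    """前向是恒等映射,反向把传入的梯度重新缩放为单位范数。"""
    @staticmethod
    def forward(ctx, x):
        return x                           # 不影响前向激活传播
    @staticmethod
    def backward(ctx, grad_out):
        return grad_out / (grad_out.norm() + 1e-12)   # 反向重缩放梯度

# 用法:插入网络中的某些位置,例如 h = GradNorm.apply(layer(x))
x = torch.randn(4, 8, requires_grad=True)
GradNorm.apply(x).sum().backward()
print(x.grad.norm())                       # ≈ 1,梯度范数被有效控制
```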

【12】 On the Dark Side of Calibration for Modern Neural Networks 标题:论现代神经网络校准的阴暗面

作者:Aditya Singh,Alessandro Bay,Biswa Sengupta,Andrea Mirabile 机构:Zebra AI, Zebra Technologies, London, United Kingdom 备注:15 pages including references and supplemental 链接:https://arxiv.org/abs/2106.09385 摘要:现代神经网络的校准程度很差,这对安全关键系统可靠地使用深度神经网络(DNN)构成重大挑战。最近提出的许多方法在改进DNN校准方面取得了实质性进展,但它们几乎没有触及细化(refinement)——而细化在历史上一直是校准的一个重要方面。细化表示网络正确预测与错误预测的可分性。本文从理论和实证两方面系统地审视模型的校准与细化。首先,我们展示期望校准误差(ECE)可分解为预测置信度和细化两部分。结合这一结果,我们强调基于正则化的校准只是简单地降低模型的置信度,这在逻辑上会严重损害模型的细化。我们通过在标准数据集上对许多最先进的校准方法进行严格的实证评估来支持这些论断。我们发现,标签平滑、mixup等许多校准方法都会因损害DNN的细化而降低其效用。即使在自然数据偏移下,这种校准-细化权衡对大多数校准方法依然成立。这些发现要求尽快重新审视现代DNN校准所采用的一些流行途径。 摘要:Modern neural networks are highly uncalibrated. It poses a significant challenge for safety-critical systems to utilise deep neural networks (DNNs), reliably. Many recently proposed approaches have demonstrated substantial progress in improving DNN calibration. However, they hardly touch upon refinement, which historically has been an essential aspect of calibration. Refinement indicates separability of a network's correct and incorrect predictions. This paper presents a theoretically and empirically supported exposition for reviewing a model's calibration and refinement. Firstly, we show the breakdown of expected calibration error (ECE), into predicted confidence and refinement. Connecting with this result, we highlight that regularisation based calibration only focuses on naively reducing a model's confidence. This logically has a severe downside to a model's refinement. We support our claims through rigorous empirical evaluations of many state of the art calibration approaches on standard datasets. We find that many calibration approaches with the likes of label smoothing, mixup etc. lower the utility of a DNN by degrading its refinement. Even under natural data shift, this calibration-refinement trade-off holds for the majority of calibration methods. These findings call for an urgent retrospective into some popular pathways taken for modern DNN calibration.

【13】 Large Scale Private Learning via Low-rank Reparametrization 标题:基于低阶再参数化的大规模私人学习

作者:Da Yu,Huishuai Zhang,Wei Chen,Jian Yin,Tie-Yan Liu 机构:The School of Data and Computer Science & Guangdong Key Laboratory of Big Data Analysis and Processing, Sun Yat-sen University 备注:Published as a conference paper in International Conference on Machine Learning (ICML 2021). Source code available at this https URL 链接:https://arxiv.org/abs/2106.09352 摘要:我们提出了一种重参数化方案,以解决在大型神经网络上应用差分隐私SGD所面临的挑战:1)存储逐样本梯度的巨大内存开销;2)附加噪声随维数增长的依赖性。具体地说,我们用两个小维度的梯度载体(gradient-carrier)矩阵和一个残差权重矩阵对每个权重矩阵进行重参数化。我们论证这种重参数化保持前向/反向过程不变,同时使我们能够在不计算梯度本身的情况下计算投影梯度。为了在差分隐私下学习,我们设计了重参数化梯度扰动(reparametrized gradient perturbation, RGP):它对梯度载体矩阵上的梯度加噪,并从带噪梯度重建对原始权重的更新。重要的是,我们使用历史更新来寻找梯度载体矩阵,其最优性在线性回归下得到严格证明,并通过深度学习任务得到实证验证。RGP显著降低了内存开销并提高了效用。例如,我们首次能够在BERT模型上应用差分隐私,在$\epsilon=8$下于四个下游任务上取得83.9%的平均准确率,与非隐私基线相比损失在5%以内,但隐私泄露风险要低得多。 摘要:We propose a reparametrization scheme to address the challenges of applying differentially private SGD on large neural networks, which are 1) the huge memory cost of storing individual gradients, 2) the added noise suffering notorious dimensional dependence. Specifically, we reparametrize each weight matrix with two gradient-carrier matrices of small dimension and a residual weight matrix. We argue that such reparametrization keeps the forward/backward process unchanged while enabling us to compute the projected gradient without computing the gradient itself. To learn with differential privacy, we design reparametrized gradient perturbation (RGP) that perturbs the gradients on gradient-carrier matrices and reconstructs an update for the original weight from the noisy gradients. Importantly, we use historical updates to find the gradient-carrier matrices, whose optimality is rigorously justified under linear regression and empirically verified with deep learning tasks. RGP significantly reduces the memory cost and improves the utility. For example, we are the first able to apply differential privacy on the BERT model and achieve an average accuracy of $83.9\%$ on four downstream tasks with $\epsilon=8$, which is within $5\%$ loss compared to the non-private baseline but enjoys much lower privacy leakage risk.

【14】 A Simple Generative Network 标题:一种简单的生成网络

作者:Daniel N. Nissani 链接:https://arxiv.org/abs/2106.09330 摘要:生成神经网络能够模拟复杂的概率分布,如手写文本、自然图像等。自其诞生以来,人们提出了多种模型,其中最成功的是基于对抗(GAN)、自动编码(VAE)和最大平均差异(MMD)的相对复杂的体系结构和方案。令人惊讶的是,一个非常简单的架构(单个前馈神经网络)加上一个显而易见的优化目标(Kullback-Leibler散度)显然被忽略了。本文证明了这样一个模型(因其简单性记作SGN)能够生成在视觉和数量上都可与上述最先进方法相媲美的样本。 摘要:Generative neural networks are able to mimic intricate probability distributions such as those of handwritten text, natural images, etc. Since their inception several models were proposed. The most successful of these were based on adversarial (GAN), auto-encoding (VAE) and maximum mean discrepancy (MMD) relatively complex architectures and schemes. Surprisingly, a very simple architecture (a single feed-forward neural network) in conjunction with an obvious optimization goal (Kullback-Leibler divergence) was apparently overlooked. This paper demonstrates that such a model (denoted SGN for its simplicity) is able to generate samples visually and quantitatively competitive as compared with the fore-mentioned state of the art methods.

【15】 Pruning Randomly Initialized Neural Networks with Iterative Randomization 标题:用迭代随机化方法修剪随机初始化的神经网络

作者:Daiki Chijiwa,Shin'ya Yamaguchi,Yasutoshi Ida,Kenji Umakoshi,Tomohiro Inoue 机构:NTT Software Innovation Center, NTT Corporation 备注:Code will be available at this https URL 链接:https://arxiv.org/abs/2106.09269 摘要:对随机初始化神经网络的权重进行剪枝在彩票假设的背景下起着重要作用。Ramanujan等人(2020)的实验表明,无需优化权重值,仅通过剪枝权重就能取得显著性能。然而,为了达到与权重优化相同的性能水平,剪枝方法在剪枝之前需要网络中有更多的参数,从而需要更多的内存空间。为了克服这种参数低效问题,我们引入了一个新框架:通过迭代地随机化权重值来修剪随机初始化的神经网络(IteRand)。理论上,我们在该框架中证明了一个逼近定理,表明随机化操作能够有效减少所需的参数数目。我们还在CIFAR-10和ImageNet上的多个实验中实证验证了其参数效率。 摘要:Pruning the weights of randomly initialized neural networks plays an important role in the context of lottery ticket hypothesis. Ramanujan et al. (2020) empirically showed that only pruning the weights can achieve remarkable performance instead of optimizing the weight values. However, to achieve the same level of performance as the weight optimization, the pruning approach requires more parameters in the networks before pruning and thus more memory space. To overcome this parameter inefficiency, we introduce a novel framework to prune randomly initialized neural networks with iteratively randomizing weight values (IteRand). Theoretically, we prove an approximation theorem in our framework, which indicates that the randomizing operations are provably effective to reduce the required number of the parameters. We also empirically demonstrate the parameter efficiency in multiple experiments on CIFAR-10 and ImageNet.

【16】 Joining datasets via data augmentation in the label space for neural networks 标题:神经网络在标签空间中通过数据扩充连接数据集

作者:Jake Zhao,Mingfeng Ou,Linji Xue,Yunkai Cui,Sai Wu,Gang Chen 机构:Zhejiang University, China; Department of Software Engineering, Tongji University 备注:Accepted in ICML 2021. Jake Zhao and Mingfeng Ou contributed equally 链接:https://arxiv.org/abs/2106.09260 摘要:大多数(如果不是全部)现代深度学习系统仅限于使用单一数据集进行神经网络的训练和推理。在本文中,我们关注以系统化的方式连接用途相似的数据集。与以往文献中普遍把数据集连接到不可解释的潜在向量空间的做法不同,我们方法的核心是在标签空间中进行的一个增广过程。在标签空间中实现数据集连接的主要挑战是标签之间的差异:不重叠的标签注释集、不同的标签粒度或层次结构等。值得注意的是,我们提出了一种利用人工构建的知识图、递归神经网络和策略梯度的新技术,成功实现了标签空间中的数据集连接。在图像和文本分类上的实证结果验证了我们方法的有效性。 摘要:Most, if not all, modern deep learning systems restrict themselves to a single dataset for neural network training and inference. In this article, we are interested in systematic ways to join datasets that are made of similar purposes. Unlike previous published works that ubiquitously conduct the dataset joining in the uninterpretable latent vectorial space, the core to our method is an augmentation procedure in the label space. The primary challenge to address the label space for dataset joining is the discrepancy between labels: non-overlapping label annotation sets, different labeling granularity or hierarchy and etc. Notably we propose a new technique leveraging artificially created knowledge graph, recurrent neural networks and policy gradient that successfully achieve the dataset joining in the label space. Empirical results on both image and text classification justify the validity of our approach.

【17】 Seeing Differently, Acting Similarly: Imitation Learning with Heterogeneous Observations 标题:看待不同,行动相似:异质观察的模仿学习

作者:Xin-Qiang Cai,Yao-Xiang Ding,Zi-Xuan Chen,Yuan Jiang,Masashi Sugiyama,Zhi-Hua Zhou 机构:National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; RIKEN Center for Advanced Intelligence Project, Tokyo, Japan; The University of Tokyo, Tokyo, Japan 备注:17 pages, 25 figures 链接:https://arxiv.org/abs/2106.09256 摘要:在许多真实世界的模仿学习任务中,演示者和学习者必须在不同但完全的观测空间中行动。这种情况对现有的模仿学习方法产生了重大的障碍,即使它们与传统的空间适应技术相结合。主要的挑战在于在不同的观测空间下,将专家的占用度量与学习者动态变化的占用度量联系起来。在这项工作中,我们将上述学习问题建模为异质观察模仿学习(HOIL)。我们提出了基于重要性加权、拒绝学习和主动查询技术的重要性加权拒绝算法(IWRE)来解决占用度量匹配的关键问题。实验结果表明,IWRE能够成功地解决HOIL任务,包括将基于视觉的演示转化为Atari域下基于随机存取存储器(RAM)策略的挑战性任务。 摘要:In many real-world imitation learning tasks, the demonstrator and the learner have to act in different but full observation spaces. This situation generates significant obstacles for existing imitation learning approaches to work, even when they are combined with traditional space adaptation techniques. The main challenge lies in bridging expert's occupancy measures to learner's dynamically changing occupancy measures under the different observation spaces. In this work, we model the above learning problem as Heterogeneous Observations Imitation Learning (HOIL). We propose the Importance Weighting with REjection (IWRE) algorithm based on the techniques of importance-weighting, learning with rejection, and active querying to solve the key challenge of occupancy measure matching. Experimental results show that IWRE can successfully solve HOIL tasks, including the challenging task of transforming the vision-based demonstrations to random access memory (RAM)-based policies under the Atari domain.

【18】 CoANE: Modeling Context Co-occurrence for Attributed Network Embedding 标题:CoANE:属性网络嵌入的上下文共现建模

作者:I-Chung Hsieh,Cheng-Te Li 机构: National Cheng Kung University 备注:Accepted to IEEE TKDE 2021. Code can be accessed via this https URL 链接:https://arxiv.org/abs/2106.09241 摘要:属性网络嵌入(ANE)是学习低维向量,使网络结构和节点属性都能保持在嵌入空间中。现有的ANE模型没有考虑图的结构和属性之间的具体组合。虽然每个节点都有其自身的结构特征,例如高度互联的邻居及其特定的属性分布模式,但每个节点的邻居不应仅由多跳节点来描述,而应考虑特定的集群或社交圈。为了对这些信息进行建模,本文提出了一种新的ANE模型——上下文共现感知属性网络嵌入(CoANE)。CoANE的基本思想是对每个节点所涉及的不同模式的上下文属性进行建模,并将每个属性作为一个通道,应用卷积机制对位置信息进行编码。上下文共现的学习可以捕获每个节点潜在的社交圈。为了更好地编码节点的结构知识和语义知识,我们设计了一个由正图似然、上下文负采样和属性重构组成的三向目标函数。我们在五个真实数据集上进行了链接预测、节点标签分类和节点聚类的实验。结果表明,CoANE模型的性能明显优于现有的ANE模型。 摘要:Attributed network embedding (ANE) is to learn low-dimensional vectors so that not only the network structure but also node attributes can be preserved in the embedding space. Existing ANE models do not consider the specific combination between graph structure and attributes. While each node has its structural characteristics, such as highly-interconnected neighbors along with their certain patterns of attribute distribution, each node's neighborhood should be not only depicted by multi-hop nodes, but consider certain clusters or social circles. To model such information, in this paper, we propose a novel ANE model, Context Co-occurrence-aware Attributed Network Embedding (CoANE). The basic idea of CoANE is to model the diverse patterns of context attributes that each node is involved in, and to apply a convolutional mechanism that encodes positional information by treating each attribute as a channel. The learning of context co-occurrence can capture the latent social circles of each node. To better encode structural and semantic knowledge of nodes, we devise a three-way objective function, consisting of positive graph likelihood, contextual negative sampling, and attribute reconstruction. We conduct experiments on five real datasets in the tasks of link prediction, node label classification, and node clustering. The results exhibit that CoANE can significantly outperform state-of-the-art ANE models.

【19】 Learning from Demonstration without Demonstrations 标题:没有演示的演示学习

作者:Tom Blau,Gilad Francis,Philippe Morere 机构:School of Computer Science, The University of Sydney 备注:International Conference on Robotics and Automation (ICRA), 2021. arXiv admin note: substantial text overlap with arXiv:2001.06940 链接:https://arxiv.org/abs/2106.09203 摘要:最新的强化学习(RL)算法具有较高的样本复杂度,特别是在稀疏奖励情况下。缓解这个问题的一个流行策略是通过模仿一组专家演示来学习控制策略。这种方法的缺点是,专家需要进行演示,这在实践中可能代价高昂。为了解决这个缺点,我们提出了演示发现的概率规划(P2D2),这是一种无需专家访问就可以自动发现演示的技术。我们将发现演示描述为一个搜索问题,并利用广泛使用的规划算法(如快速探索随机树)来发现演示轨迹。这些演示用于初始化策略,然后通过通用RL算法进行优化。我们提供了P2D2找到成功轨迹的理论保证,以及它的采样复杂度的界。实验证明,在一系列经典控制和机器人任务中,该方法的性能优于经典的和内在的探索RL技术,只需要一小部分的探索样本,并且获得了更好的渐近性能。 摘要:State-of-the-art reinforcement learning (RL) algorithms suffer from high sample complexity, particularly in the sparse reward case. A popular strategy for mitigating this problem is to learn control policies by imitating a set of expert demonstrations. The drawback of such approaches is that an expert needs to produce demonstrations, which may be costly in practice. To address this shortcoming, we propose Probabilistic Planning for Demonstration Discovery (P2D2), a technique for automatically discovering demonstrations without access to an expert. We formulate discovering demonstrations as a search problem and leverage widely-used planning algorithms such as Rapidly-exploring Random Tree to find demonstration trajectories. These demonstrations are used to initialize a policy, then refined by a generic RL algorithm. We provide theoretical guarantees of P2D2 finding successful trajectories, as well as bounds for its sampling complexity. We experimentally demonstrate the method outperforms classic and intrinsic exploration RL techniques in a range of classic control and robotics tasks, requiring only a fraction of exploration samples and achieving better asymptotic performance.
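下面用几十行 NumPy 给出一个极简 2D RRT,示意"用规划算法自动发现演示轨迹"这一步(环境、步长等均为本文假设;后续的策略初始化与 RL 微调从略):

```python
import numpy as np

rng = np.random.default_rng(1)

def rrt(start, goal, step=0.05, n_iter=5000, tol=0.05):
    """极简 2D RRT:在 [0,1]^2 中向目标扩展随机树,返回一条"演示"轨迹。"""
    nodes, parent = [np.asarray(start, float)], {0: None}
    goal = np.asarray(goal, float)
    for _ in range(n_iter):
        target = goal if rng.random() < 0.1 else rng.random(2)   # 10% 概率朝目标采样
        i = int(np.argmin([np.linalg.norm(n - target) for n in nodes]))  # 最近节点
        direction = target - nodes[i]
        new = nodes[i] + step * direction / (np.linalg.norm(direction) + 1e-9)
        parent[len(nodes)] = i
        nodes.append(new)
        if np.linalg.norm(new - goal) < tol:        # 到达目标:回溯出整条轨迹
            path, j = [], len(nodes) - 1
            while j is not None:
                path.append(nodes[j]); j = parent[j]
            return path[::-1]
    return None

demo = rrt(start=(0.1, 0.1), goal=(0.9, 0.9))
# P2D2 的思路(按摘要理解):把这类规划轨迹当作演示来初始化策略,再用通用 RL 算法微调
```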

【20】 Insights into Data through Model Behaviour: An Explainability-driven Strategy for Data Auditing for Responsible Computer Vision Applications 标题:通过模型行为洞察数据:用于负责任的计算机视觉应用的数据审计的可解释性驱动的策略

作者:Alexander Wong,Adam Dorfman,Paul McInnis,Hayden Gunraj 机构:Department of Systems Design Engineering, University of Waterloo, Waterloo Artificial Intelligence Institute, DarwinAI Corp. 备注:4 pages 链接:https://arxiv.org/abs/2106.09177 摘要:在这项研究中,我们另辟蹊径,探索了一种可解释性驱动的数据审计策略:通过定量解释虚拟模型原型在接触数据时的行为,发现关于手头数据的可操作见解。我们通过审计两个流行的医学基准数据集来证明这一策略,并发现隐藏的数据质量问题,这些问题导致深度学习模型出于错误的原因进行预测。从这种可解释性驱动的数据审计策略中获得的可操作的见解被用来解决发现的问题,从而能够创建具有适当预测行为的高性能深度学习模型。希望这种可解释性驱动的策略可以作为数据驱动策略的补充,有助于更负责任地开发计算机视觉应用中的机器学习算法。 摘要:In this study, we take a departure and explore an explainability-driven strategy to data auditing, where actionable insights into the data at hand are discovered through the eyes of quantitative explainability on the behaviour of a dummy model prototype when exposed to data. We demonstrate this strategy by auditing two popular medical benchmark datasets, and discover hidden data quality issues that lead deep learning models to make predictions for the wrong reasons. The actionable insights gained from this explainability driven data auditing strategy is then leveraged to address the discovered issues to enable the creation of high-performing deep learning models with appropriate prediction behaviour. The hope is that such an explainability-driven strategy can be complementary to data-driven strategies to facilitate for more responsible development of machine learning algorithms for computer vision applications.

【21】 Can I Be of Further Assistance? Using Unstructured Knowledge Access to Improve Task-oriented Conversational Modeling 标题:我还能为您提供进一步的帮助吗?利用非结构化知识获取改进面向任务的会话建模

作者:Di Jin,Seokhwan Kim,Dilek Hakkani-Tur 机构:Amazon Alexa AI 备注:Presented as a DIALDOC workshop paper at ACL 2021 链接:https://arxiv.org/abs/2106.09174 摘要:大多数以前关于面向任务的对话系统的工作都局限于对领域API的有限覆盖。但是,用户经常有超出这些API范围的请求。这项工作的重点是通过合并外部的、非结构化的知识源来响应这些超出API覆盖范围的用户轮次。我们的方法以流水线的方式工作,依次进行知识寻求轮次检测、知识选择和响应生成。我们在前两个步骤中引入了新的数据扩充方法,并证明了使用从对话上下文中提取的信息可以提高知识选择和端到端性能。通过实验,我们在DSTC9 Track 1基准数据集上实现了自动和人工评估指标的最新性能,验证了我们贡献的有效性。 摘要:Most prior work on task-oriented dialogue systems are restricted to limited coverage of domain APIs. However, users oftentimes have requests that are out of the scope of these APIs. This work focuses on responding to these beyond-API-coverage user turns by incorporating external, unstructured knowledge sources. Our approach works in a pipelined manner with knowledge-seeking turn detection, knowledge selection, and response generation in sequence. We introduce novel data augmentation methods for the first two steps and demonstrate that the use of information extracted from dialogue context improves the knowledge selection and end-to-end performances. Through experiments, we achieve state-of-the-art performance for both automatic and human evaluation metrics on the DSTC9 Track 1 benchmark dataset, validating the effectiveness of our contributions.

【22】 FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator 标题:FORMS:面向混合信号DNN加速器的基于细粒度极化ReRAM的原位计算

作者:Geng Yuan,Payman Behnam,Zhengang Li,Ali Shafiee,Sheng Lin,Xiaolong Ma,Hang Liu,Xuehai Qian,Mahdi Nazm Bojnordi,Yanzhi Wang,Caiwen Ding 机构:Northeastern University,Georgia Institute of Technology,Samsung,Stevens Institute of Technology, University of Southern California,University of Utah,University of Connecticut 备注:In Proceedings of the 48th Annual International Symposium on Computer Architecture (ISCA), 2021 链接:https://arxiv.org/abs/2106.09144 摘要:最近的工作证明了使用电阻随机存取存储器(ReRAM)作为一种新兴技术来执行固有的并行模拟域原位矩阵矢量乘法的前景——这是DNNs中的密集和关键计算。以ReRAM交叉阵列(crossbar)单元中存储的权值作为电导,当输入向量应用于字线时,矩阵向量乘法结果可以作为位线中的电流产生。一个关键问题是,权重可以是正的,也可以是负的,但原位计算假定交叉阵列每一列上的所有单元都具有相同的符号。当前的体系结构要么使用两个ReRAM交叉阵列来分别表示正权重和负权重,要么为权重添加一个偏移量,以便所有值都变为正。这两种解决方案都不理想:它们要么使交叉阵列的成本增加一倍,要么产生额外的偏移电路。为了更好地解决这一问题,本文提出了FORMS——一种基于ReRAM、采用极化权重的细粒度DNN加速器。我们的关键设计原则不是试图同时表示正/负权重,而是严格施加原位计算所假设的条件——确保交叉阵列同一列中的所有权重具有相同的符号,从而自然避免了额外交叉阵列的成本。利用交替方向乘子法(ADMM)正则化优化可以很好地生成这样的权值,该方法可以精确地在DNN权值中施加特定模式。为了获得高精度,我们建议使用细粒度子阵列列,这为输入零跳过提供了独特的机会,显著避免了不必要的计算,也使硬件更易于实现。在相同的优化模型下,与ISAAC相比,FORMS在相近的面积成本下获得了显著的吞吐量提升和每秒帧数(FPS)加速。 摘要:Recent works demonstrated the promise of using resistive random access memory (ReRAM) as an emerging technology to perform inherently parallel analog domain in-situ matrix-vector multiplication -- the intensive and key computation in DNNs. With weights stored in the ReRAM crossbar cells as conductance, when the input vector is applied to word lines, the matrix-vector multiplication results can be generated as the current in bit lines. A key problem is that the weight can be either positive or negative, but the in-situ computation assumes all cells on each crossbar column with the same sign. The current architectures either use two ReRAM crossbars for positive and negative weights, or add an offset to weights so that all values become positive. Neither solution is ideal: they either double the cost of crossbars, or incur extra offset circuity. To better solve this problem, this paper proposes FORMS, a fine-grained ReRAM-based DNN accelerator with polarized weights. Instead of trying to represent the positive/negative weights, our key design principle is to enforce exactly what is assumed in the in-situ computation -- ensuring that all weights in the same column of a crossbar have the same sign. It naturally avoids the cost of an additional crossbar. Such weights can be nicely generated using alternating direction method of multipliers (ADMM) regularized optimization, which can exactly enforce certain patterns in DNN weights. To achieve high accuracy, we propose to use fine-grained sub-array columns, which provide a unique opportunity for input zero-skipping, significantly avoiding unnecessary computations. It also makes the hardware much easier to implement. Putting all together, with the same optimized models, FORMS achieves significant throughput improvement and speed up in frame per second over ISAAC with similar area cost.
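下面用一个小的 NumPy 片段示意"同列同号"这一约束本身(仅演示投影;论文中是用 ADMM 正则化在训练过程中精确施加该模式,此处的投影规则为本文假设):

```python
import numpy as np

def polarize_columns(W):
    """将权重矩阵每一列投影为同号,示意 FORMS 的"同列同号"约束。"""
    W = W.copy()
    for j in range(W.shape[1]):
        col = W[:, j]
        pos, neg = col[col > 0].sum(), -col[col < 0].sum()   # 两侧幅值总和
        keep_sign = 1.0 if pos >= neg else -1.0              # 保留幅值占优的一侧
        col[np.sign(col) != keep_sign] = 0.0                 # 另一侧置零
    return W

W = np.random.default_rng(2).standard_normal((8, 4))
Wp = polarize_columns(W)
assert all(np.all(Wp[:, j] >= 0) or np.all(Wp[:, j] <= 0) for j in range(W.shape[1]))
```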

【23】 A Winning Hand: Compressing Deep Networks Can Improve Out-Of-Distribution Robustness 标题:制胜一手:压缩深度网络可以提高分布外鲁棒性

作者:James Diffenderfer,Brian R. Bartoldson,Shreya Chaganti,Jize Zhang,Bhavya Kailkhura 机构:Lawrence Livermore National Laboratory 链接:https://arxiv.org/abs/2106.09129 摘要:在实际环境中成功采用深度学习的两个关键要求是:(1)对分布偏移的鲁棒性,(2)实现效率所需的模型紧凑性。不幸的是,在不牺牲精度的情况下同时实现分布外(OOD)鲁棒性和极端模型紧凑性的努力大多是不成功的。这提出了一个重要的问题:"无法创建紧凑、准确且健壮的深度神经网络(CARDs),这一局限是否是根本性的?"为了回答这个问题,我们对一系列流行的模型压缩技术进行了大规模分析,揭示了几个有趣的模式。值得注意的是,与传统的剪枝方法(例如微调和渐进幅度剪枝)相比,我们发现"彩票式"剪枝方法竟然可以用于创建高性能的CARDs。具体来说,仅通过剪枝和/或量化,我们就能创建极其紧凑的CARDs:它们在测试精度持平(或更优)的同时,比规模大得多的全精度对应模型健壮得多。为了更好地理解这些差异,我们在傅立叶域中对使用不同数据增强方法训练的CARDs进行敏感性分析。基于我们的分析,我们开发了一种简单的域自适应测试时集成方法(CARD-Deck),该方法使用一个门控模块,根据各压缩模型与测试样本的频谱相似性,从CARD-Deck中动态选择合适的CARD。通过利用不同压缩模型互补的频率偏差,所提出的方法构建了一手CARDs的"制胜牌",在CIFAR-10-C精度上建立了新的最优水平(干净样本96.8%,鲁棒精度92.75%),且内存占用显著优于未压缩模型。我们也提出了一些理论证据来支持我们的实证研究结果。 摘要:Two crucial requirements for a successful adoption of deep learning (DL) in the wild are: (1) robustness to distributional shifts, and (2) model compactness for achieving efficiency. Unfortunately, efforts towards simultaneously achieving Out-of-Distribution (OOD) robustness and extreme model compactness without sacrificing accuracy have mostly been unsuccessful. This raises an important question: "Is the inability to create compact, accurate, and robust deep neural networks (CARDs) fundamental?" To answer this question, we perform a large-scale analysis for a range of popular model compression techniques which uncovers several intriguing patterns. Notably, in contrast to traditional pruning approaches (e.g., fine tuning and gradual magnitude pruning), we find that "lottery ticket-style" pruning approaches can surprisingly be used to create high performing CARDs. Specifically, we are able to create extremely compact CARDs that are dramatically more robust than their significantly larger and full-precision counterparts while matching (or beating) their test accuracy, simply by pruning and/or quantizing. To better understand these differences, we perform sensitivity analysis in the Fourier domain for CARDs trained using different data augmentation methods. Motivated by our analysis, we develop a simple domain-adaptive test-time ensembling approach (CARD-Deck) that uses a gating module to dynamically select an appropriate CARD from the CARD-Deck based on their spectral-similarity with test samples. By leveraging complementary frequency biases of different compressed models, the proposed approach builds a "winning hand" of CARDs that establishes a new state-of-the-art on CIFAR-10-C accuracies (i.e., 96.8% clean and 92.75% robust) with dramatically better memory usage than their non-compressed counterparts. We also present some theoretical evidences supporting our empirical findings.
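下面用 NumPy 示意这种"按频谱相似度选模型"的门控逻辑(按摘要理解的示意;频谱指纹与参考频谱的构造均为本文假设):

```python
import numpy as np

def spectrum(x):
    """图像的 2D FFT 幅度谱按行平均,作为粗略的"频谱指纹"(示意)。"""
    f = np.abs(np.fft.fftshift(np.fft.fft2(x)))
    return f.mean(axis=0)

def pick_card(test_img, card_models, card_ref_spectra):
    """CARD-Deck 式门控:按测试样本与各压缩模型参考频谱的
    余弦相似度,动态选择一个 CARD 来做预测。"""
    s = spectrum(test_img)
    sims = [s @ r / (np.linalg.norm(s) * np.linalg.norm(r) + 1e-9)
            for r in card_ref_spectra]
    return card_models[int(np.argmax(sims))]
```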

【24】 Scaling-up Diverse Orthogonal Convolutional Networks with a Paraunitary Framework 标题:用准么正框架扩展多样化的正交卷积网络

作者:Jiahao Su,Wonmin Byeon,Furong Huang 机构:University of Maryland, College Park, NVIDIA Research, NVIDIA Corporation 链接:https://arxiv.org/abs/2106.09121 摘要:加强神经网络的正交性是缓解梯度消失/爆炸问题、降低对对抗性扰动的敏感性并约束泛化误差的一剂良方。然而,以前的许多方法都是启发式的,卷积层的正交性没有得到系统的研究:有些设计不是完全正交的,而有些设计只考虑标准卷积层并提出了具体的实现类别。为了解决这个问题,我们提出了一个正交卷积层的理论框架,该框架建立了空间域中各种正交卷积层与谱域中准么正(paraunitary)系统之间的等价性。由于准么正系统存在完全谱分解,任何正交卷积层都可以参数化为空间滤波器的卷积。我们的框架赋予了各种卷积层很高的表达能力,同时保持了它们的精确正交性。此外,与以前的设计相比,我们的层对于深度网络具有内存和计算效率。我们的多功能框架,第一次使我们能够研究深度正交网络的体系结构设计,例如跳跃连接、初始化、跨步和扩展的选择。因此,我们将正交网络扩展到深层架构,包括ResNet、WideResNet和ShuffleNet,大大提高了传统浅层正交网络的性能。 摘要:Enforcing orthogonality in neural networks is an antidote for gradient vanishing/exploding problems, sensitivity by adversarial perturbation, and bounding generalization errors. However, many previous approaches are heuristic, and the orthogonality of convolutional layers is not systematically studied: some of these designs are not exactly orthogonal, while others only consider standard convolutional layers and propose specific classes of their realizations. To address this problem, we propose a theoretical framework for orthogonal convolutional layers, which establishes the equivalence between various orthogonal convolutional layers in the spatial domain and the paraunitary systems in the spectral domain. Since there exists a complete spectral factorization of paraunitary systems, any orthogonal convolution layer can be parameterized as convolutions of spatial filters. Our framework endows high expressive power to various convolutional layers while maintaining their exact orthogonality. Furthermore, our layers are memory and computationally efficient for deep networks compared to previous designs. Our versatile framework, for the first time, enables the study of architecture designs for deep orthogonal networks, such as choices of skip connection, initialization, stride, and dilation. Consequently, we scale up orthogonal networks to deep architectures, including ResNet, WideResNet, and ShuffleNet, substantially increasing the performance over the traditional shallow orthogonal networks.
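作为参考,下面给出实系数情形下准么正条件的一种常见写法(示意;记号为本文补充,与论文的具体表述可能不同):

```latex
% 多相(polyphase)传递矩阵 H(z) 称为准么正,若
H^{\mathsf{T}}(z^{-1})\, H(z) = I \quad \text{对所有 } z \neq 0,
% 特别地,在单位圆 z = e^{j\omega} 上 H 处处为酉矩阵。
% 按摘要,空间域的正交卷积层与此类谱域系统一一对应,
% 且由其完全谱分解,任意正交卷积层都可参数化为空间滤波器的卷积。
```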

【25】 Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL 标题:行为先验和动力学模型:改进离线RL中的性能和域转移

作者:Catherine Cang,Aravind Rajeswaran,Pieter Abbeel,Michael Laskin 机构: UC Berkeley, Facebook AI Research 链接:https://arxiv.org/abs/2106.09119 摘要:离线强化学习(RL)的目的是在不进行额外环境交互的情况下,从不完美的离线数据中提取接近最优的策略。从多样化的离线数据集中提取策略有可能扩大RL的适用范围,使训练过程更安全、更快、更精简。我们研究了如何提高离线RL算法的性能、对离线数据质量的鲁棒性以及泛化能力。为此,我们引入了带自适应行为先验的基于模型的离线RL(MABE)。我们的算法基于这样一个发现:支持域内泛化的动力学模型和支持跨域泛化的行为先验是互补的。当它们结合在一起时,可以大大提高离线RL策略的性能和泛化能力。在广泛研究的D4RL离线RL基准测试中,我们发现MABE比以前的无模型和基于模型的算法具有更高的平均性能。在需要跨域泛化的实验中,我们发现MABE的性能优于先前的方法。我们的网站在https://sites.google.com/berkeley.edu/mabe . 摘要:Offline Reinforcement Learning (RL) aims to extract near-optimal policies from imperfect offline data without additional environment interactions. Extracting policies from diverse offline datasets has the potential to expand the range of applicability of RL by making the training process safer, faster, and more streamlined. We investigate how to improve the performance of offline RL algorithms, its robustness to the quality of offline data, as well as its generalization capabilities. To this end, we introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE). Our algorithm is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary. When combined together, they substantially improve the performance and generalization of offline RL policies. In the widely studied D4RL offline RL benchmark, we find that MABE achieves higher average performance compared to prior model-free and model-based algorithms. In experiments that require cross-domain generalization, we find that MABE outperforms prior methods. Our website is available at https://sites.google.com/berkeley.edu/mabe .

【26】 DeepSplit: Scalable Verification of Deep Neural Networks via Operator Splitting 标题:DeepSplit:基于算子分裂的可伸缩深度神经网络验证

作者:Shaoru Chen,Eric Wong,J. Zico Kolter,Mahyar Fazlyab 机构: Massachusetts Institute of Technology, Carnegie Mellon University 链接:https://arxiv.org/abs/2106.09117 摘要:分析深层神经网络在输入扰动下最坏情况下的性能相当于解决一个大规模的非凸优化问题,对于这个问题,过去的一些工作已经提出了凸松弛作为一种很有前途的替代方法。然而,即使对于大小合理的神经网络,这些松弛也是不易处理的,因此在实践中必须用更弱的松弛来代替。在这项工作中,我们提出了一种新的算子分裂方法,可以直接解决问题的凸松弛高精度,通过分裂成更小的子问题,往往有解析解。该方法是模块化的,并可扩展到以前由于其大小而无法精确解决的问题实例。此外,解算器的运算可以通过GPU加速进行快速并行化。我们在图像分类和强化学习场景中验证了该方法,为大型卷积网络的最坏情况性能给出了更紧的界。 摘要:Analyzing the worst-case performance of deep neural networks against input perturbations amounts to solving a large-scale non-convex optimization problem, for which several past works have proposed convex relaxations as a promising alternative. However, even for reasonably-sized neural networks, these relaxations are not tractable, and so must be replaced by even weaker relaxations in practice. In this work, we propose a novel operator splitting method that can directly solve a convex relaxation of the problem to high accuracy, by splitting it into smaller sub-problems that often have analytical solutions. The method is modular and scales to problem instances that were previously impossible to solve exactly due to their size. Furthermore, the solver operations are amenable to fast parallelization with GPU acceleration. We demonstrate our method in obtaining tighter bounds on the worst-case performance of large convolutional networks in image classification and reinforcement learning settings.

【27】 On the training of sparse and dense deep neural networks: less parameters, same performance 标题:关于稀疏和稠密深度神经网络的训练:参数少,性能相同

作者:Lorenzo Chicchi,Lorenzo Giambagli,Lorenzo Buffoni,Timoteo Carletti,Marco Ciavarella,Duccio Fanelli 机构:Dipartimento di Fisica e Astronomia, Università di Firenze, INFN and CSDC, Via Sansone, Sesto Fiorentino, Firenze, Italy and naXys, Namur Institute for Complex Systems, University of Namur, Belgium 链接:https://arxiv.org/abs/2106.09021 摘要:深度神经网络可以在倒易空间中训练:只需作用于直接空间中合适转移算子的特征值和特征向量。调整特征值并冻结特征向量,可大幅压缩参数空间,而参数空间的规模按定义随计算神经元的数量增长。然而,对于相同的体系结构并采用全套可训练参数(其数量二次依赖于相邻层的大小)时,以准确率衡量的分类分数低于在直接空间中学习所获得的分数。在本快报(Letter)中,我们提出了Giambagli等人(Nat. Commun., 2021)谱学习方法的一个变体,它为相邻层之间的每个映射使用两组特征值。特征值如同真正的旋钮,可以自由调节,以便(i)增强或压制输入节点的贡献,(ii)调节接收节点的兴奋性,我们将后一机制解释为稳态可塑性的人工模拟。可训练参数的数目仍然是网络大小的线性函数,但训练所得模型的性能更接近传统算法的结果,而后者的计算成本要高得多。通过对特征向量矩阵的非平凡块进行适当分解,最终可以填补常规训练和谱训练之间的剩余差距。每个谱参数都会反映到整组节点间权值上,我们将有效利用这一属性来生成稀疏网络,其分类能力相比用传统方法训练的同类网络十分惊人。 摘要:Deep neural networks can be trained in reciprocal space, by acting on the eigenvalues and eigenvectors of suitable transfer operators in direct space. Adjusting the eigenvalues, while freezing the eigenvectors, yields a substantial compression of the parameter space. This latter scales by definition with the number of computing neurons. The classification scores, as measured by the displayed accuracy, are however inferior to those attained when the learning is carried in direct space, for an identical architecture and by employing the full set of trainable parameters (with a quadratic dependence on the size of neighbor layers). In this Letter, we propose a variant of the spectral learning method as appeared in Giambagli et al {Nat. Comm.} 2021, which leverages on two sets of eigenvalues, for each mapping between adjacent layers. The eigenvalues act as veritable knobs which can be freely tuned so as to (i) enhance, or alternatively silence, the contribution of the input nodes, (ii) modulate the excitability of the receiving nodes with a mechanism which we interpret as the artificial analogue of the homeostatic plasticity. The number of trainable parameters is still a linear function of the network size, but the performances of the trained device gets much closer to those obtained via conventional algorithms, these latter requiring however a considerably heavier computational cost. The residual gap between conventional and spectral trainings can be eventually filled by employing a suitable decomposition for the non trivial block of the eigenvectors matrix. Each spectral parameter reflects back on the whole set of inter-nodes weights, an attribute which we shall effectively exploit to yield sparse networks with stunning classification abilities, as compared to their homologues trained with conventional means.
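下面用 PyTorch 给出"冻结特征向量、只训练特征值"这一思路的最简示意(Phi 取随机正交基,与论文中转移算子及双组特征值的具体构造可能不同,仅作说明):

```python
import torch

class SpectralLinear(torch.nn.Module):
    """把层间映射参数化为 W = Phi · diag(lam) · Phi^T:
    特征向量矩阵 Phi 固定不训练,只训练特征值向量 lam,
    因此可训练参数量随层宽线性增长而非二次增长。"""
    def __init__(self, n):
        super().__init__()
        phi = torch.linalg.qr(torch.randn(n, n))[0]   # 随机正交基,冻结
        self.register_buffer("phi", phi)
        self.lam = torch.nn.Parameter(torch.ones(n))  # 可训练的特征值"旋钮"
    def forward(self, x):
        W = self.phi @ torch.diag(self.lam) @ self.phi.T
        return x @ W.T

layer = SpectralLinear(64)
out = layer(torch.randn(8, 64))   # 用法示例:8 个样本、宽度 64
```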

【28】 Spectral goodness-of-fit tests for complete and partial network data 标题:完全和部分网络数据的光谱拟合优度检验

作者:Shane Lubold,Bolun Liu,Tyler H. McCormick 备注:Research reported in this publication was supported by the National Institute of Mental Health of the National Institutes of Health under Award Number DP2MH122405 链接:https://arxiv.org/abs/2106.09702 摘要:网络描述了个体参与者之间的关系,这种关系往往很复杂。在这项工作中,我们讨论如何确定一个参数模型(如随机块模型或潜在空间模型)是否很好地拟合了一个数据集,并能外推到类似的数据。我们使用随机矩阵理论的最新结果来推导二元组(dyadic)数据的一般拟合优度检验。我们表明,当应用于感兴趣的特定模型时,我们的方法为若干常用网络模型提供了一种简单且计算快速的参数选择方式。例如,我们展示了如何在潜在空间模型中选择潜在空间的维数。与其他网络拟合优度方法不同,我们的通用方法不需要从候选参数模型进行模拟(这对于大型图来说可能很麻烦),并且不需要在图上选择一组特定的统计数据进行比较。它还允许我们对部分网络数据(如聚合关系数据)执行拟合优度检验。我们的模拟结果显示,我们的方法在许多感兴趣的情况下表现良好。我们分析了几个经验相关的网络,并表明我们的方法能够改进社区检测算法。GitHub上提供了实现我们方法的R代码。 摘要:Networks describe the, often complex, relationships between individual actors. In this work, we address the question of how to determine whether a parametric model, such as a stochastic block model or latent space model, fits a dataset well and will extrapolate to similar data. We use recent results in random matrix theory to derive a general goodness-of-fit test for dyadic data. We show that our method, when applied to a specific model of interest, provides a straightforward, computationally fast way of selecting parameters in a number of commonly used network models. For example, we show how to select the dimension of the latent space in latent space models. Unlike other network goodness-of-fit methods, our general approach does not require simulating from a candidate parametric model, which can be cumbersome with large graphs, and eliminates the need to choose a particular set of statistics on the graph for comparison. It also allows us to perform goodness-of-fit tests on partial network data, such as Aggregated Relational Data. We show with simulations that our method performs well in many situations of interest. We analyze several empirically relevant networks and show that our method leads to improved community detection algorithms. R code to implement our method is available on Github.

【29】 Machine learning methods for postprocessing ensemble forecasts of wind gusts: A systematic comparison 标题:后处理阵风集合预报的机器学习方法:系统比较

作者:Benedikt Schulz,Sebastian Lerch 机构:Karlsruhe Institute of Technology, Heidelberg Institute for Theoretical Studies 链接:https://arxiv.org/abs/2106.09512 摘要:对集合天气预报进行后处理以纠正系统误差已成为研究和业务中的标准做法。然而,尽管阵风预报在灾害性天气预警中具有重要意义,但目前对阵风预报的集合后处理研究较少。在这里,我们对通过集合后处理进行概率阵风预报的8种统计与机器学习方法进行了全面回顾和系统比较,这些方法可分为三组:统计学中最先进的后处理技术(集合模型输出统计(EMOS)、逐成员后处理、保序分布回归)、成熟的机器学习方法(梯度提升扩展EMOS、分位数回归森林)和基于神经网络的方法(分布回归网络、Bernstein分位数网络、直方图估计网络)。利用德国气象局业务运行的高分辨率、允许对流的集合预报系统6年的数据和德国175个地面气象站的逐时观测资料,系统地比较了这些方法。虽然所有的后处理方法都能产生校准的预报,并能修正原始集合预报的系统误差,但结合阵风以外其他气象预报变量的信息可显著提高预报技巧。特别是,我们提出了一个灵活的局地自适应神经网络框架,以不同的概率预报类型作为输出,不仅显著优于所有基准后处理方法,而且学习到与日循环相关的物理一致的关系,特别是行星边界层的傍晚转变。 摘要:Postprocessing ensemble weather predictions to correct systematic errors has become a standard practice in research and operations. However, only few recent studies have focused on ensemble postprocessing of wind gust forecasts, despite its importance for severe weather warnings. Here, we provide a comprehensive review and systematic comparison of eight statistical and machine learning methods for probabilistic wind gust forecasting via ensemble postprocessing, that can be divided in three groups: State of the art postprocessing techniques from statistics (ensemble model output statistics (EMOS), member-by-member postprocessing, isotonic distributional regression), established machine learning methods (gradient-boosting extended EMOS, quantile regression forests) and neural network-based approaches (distributional regression network, Bernstein quantile network, histogram estimation network). The methods are systematically compared using six years of data from a high-resolution, convection-permitting ensemble prediction system that was run operationally at the German weather service, and hourly observations at 175 surface weather stations in Germany. While all postprocessing methods yield calibrated forecasts and are able to correct the systematic errors of the raw ensemble predictions, incorporating information from additional meteorological predictor variables beyond wind gusts leads to significant improvements in forecast skill. In particular, we propose a flexible framework of locally adaptive neural networks with different probabilistic forecast types as output, which not only significantly outperform all benchmark postprocessing methods but also learn physically consistent relations associated with the diurnal cycle, especially the evening transition of the planetary boundary layer.
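下面用一个简短的 Python 片段示意其中最基础的 EMOS:用正态分布 N(a+b·集合均值, (c+d·集合离散度)²) 做预报分布,并通过最小化平均 CRPS 来拟合系数(示意实现;fit_emos 等名称为本文假设,局地化、链接函数等业务细节从略):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def crps_normal(mu, sigma, y):
    """正态预报分布 CRPS 的闭式解。"""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

def fit_emos(ens_mean, ens_sd, obs):
    """极简 EMOS:最小化训练样本上的平均 CRPS,返回系数 (a, b, c, d)。"""
    def loss(p):
        a, b, c, d = p
        sigma = np.maximum(c + d * ens_sd, 1e-6)   # 保证标准差为正
        return crps_normal(a + b * ens_mean, sigma, obs).mean()
    return minimize(loss, x0=[0.0, 1.0, 0.5, 0.5], method="Nelder-Mead").x
```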

【30】 Scaling Laws for Acoustic Models 标题:声学模型的标度律

作者:Jasha Droppo,Oguz Elibol 机构:Amazon Alexa 备注:Submitted to Interspeech 2021 链接:https://arxiv.org/abs/2106.09488 摘要:最近机器学习的一个趋势是通过将模型增长到以前认为不合理的大小来提高模型质量。最近的工作表明,具有交叉熵目标函数的自回归生成模型表现出平滑的幂律关系,或称标度律,可以根据模型大小、训练集大小和可用的计算预算预测模型质量。这些标度律允许在给定可用训练数据、模型参数量或训练计算预算的约束条件下选择接近最优的超参数。在本文中,我们证明了用自预测编码损失训练的声学模型表现得如同服从类似的标度律。我们扩展了先前的工作,联合预测损失如何取决于模型大小、训练集大小以及任务固有的"不可约损失"。我们发现标度律在模型大小和训练集大小各两个数量级的范围内精确匹配了模型性能,并对模型性能的极限做出了预测。 摘要:There is a recent trend in machine learning to increase model quality by growing models to sizes previously thought to be unreasonable. Recent work has shown that autoregressive generative models with cross-entropy objective functions exhibit smooth power-law relationships, or scaling laws, that predict model quality from model size, training set size, and the available compute budget. These scaling laws allow one to choose nearly optimal hyper-parameters given constraints on available training data, model parameter count, or training computation budget. In this paper, we demonstrate that acoustic models trained with an auto-predictive coding loss behave as if they are subject to similar scaling laws. We extend previous work to jointly predict loss due to model size, to training set size, and to the inherent "irreducible loss" of the task. We find that the scaling laws accurately match model performance over two orders of magnitude in both model size and training set size, and make predictions about the limits of model performance.
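作为示意,下面用 SciPy 拟合一条"联合预测"形式的标度律曲线;函数形式 L(N,D)=E+A·N^(−α)+B·D^(−β) 是此类工作中常见的假设,未必与本论文的参数化完全一致,数据为纯合成:

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(X, E, A, alpha, B, beta):
    """E 为任务固有的"不可约损失",N 为参数量,D 为训练集规模。"""
    N, D = X
    return E + A * N**(-alpha) + B * D**(-beta)

# (参数量, 数据量) 网格上的若干次训练记录;此处用真实标度律生成合成损失
N = np.array([1e6, 1e7, 1e8, 1e6, 1e7, 1e8])
D = np.array([1e8, 1e8, 1e8, 1e9, 1e9, 1e9])
L = scaling_law((N, D), 0.8, 6.0, 0.15, 40.0, 0.2)   # 合成示例数据

params, _ = curve_fit(scaling_law, (N, D), L,
                      p0=[1.0, 5.0, 0.1, 30.0, 0.1], maxfev=20000)
# 拟合出的 E 即对"模型性能极限"的预测;A, alpha, B, beta 刻画两个方向的收益递减
```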

【31】 Physics-informed CoKriging model of a redox flow battery 标题:氧化还原液流电池的物理信息协同克里格模型

作者:Amanda A. Howard,Alexandre M. Tartakovsky 机构:Pacific Northwest National Laboratory, Richland, WA; Department of Civil and Environmental Engineering, University of Illinois Urbana-Champaign, Urbana, IL 链接:https://arxiv.org/abs/2106.09188 摘要:氧化还原液流电池(RFB)具有廉价、高效地储存大量能量的能力,但需要快速、准确地建立RFB充放电曲线模型,以提高电池的容量和性能。我们建立了一个预测RFB充放电曲线的多保真度(multifidelity)模型。在该多保真度模型中,我们使用了基于实验数据训练的物理信息协克里格(CoPhIK)机器学习方法,该方法受所谓"零维"物理模型的约束。在这里,我们证明了该模型与实验结果的良好一致性,并且比现有的零维模型有了显著的改进。我们证明了所提出的模型是鲁棒的,因为它对零维模型中的输入参数不敏感。我们还表明,只需要少量的高保真实验数据就可以在所考虑的输入参数范围(包括电流密度、流速和初始浓度)内进行准确预测。 摘要:Redox flow batteries (RFBs) offer the capability to store large amounts of energy cheaply and efficiently, however, there is a need for fast and accurate models of the charge-discharge curve of a RFB to potentially improve the battery capacity and performance. We develop a multifidelity model for predicting the charge-discharge curve of a RFB. In the multifidelity model, we use the Physics-informed CoKriging (CoPhIK) machine learning method that is trained on experimental data and constrained by the so-called "zero-dimensional" physics-based model. Here we demonstrate that the model shows good agreement with experimental results and significant improvements over existing zero-dimensional models. We show that the proposed model is robust as it is not sensitive to the input parameters in the zero-dimensional model. We also show that only a small amount of high-fidelity experimental datasets are needed for accurate predictions for the range of considered input parameters, which include current density, flow rate, and initial concentrations.

其他(19篇)

【1】 BABEL: Bodies, Action and Behavior with English Labels 标题:巴别塔:带有英文标签的身体、动作和行为

作者:Abhinanda R. Punnakkal,Arjun Chandrasekaran,Nikos Athanasiou,Alejandra Quiros-Ramirez,Michael J. Black 机构:Max Planck Institute for Intelligent Systems, Tübingen, Germany; Universität Konstanz, Konstanz, Germany 备注:11 pages, 4 figures, Accepted in CVPR'21 链接:https://arxiv.org/abs/2106.09696 摘要:理解人类运动的语义——运动的内容、方式和原因——是一个重要的问题,需要有语义标签的人类行为数据集。现有的数据集采用两种方法之一。大规模视频数据集包含许多动作标签,但不包含地面真实三维人体运动。或者,运动捕捉(mocap)数据集具有精确的身体运动,但仅限于少量动作。为了解决这个问题,我们提供了BABEL,一个大型的数据集,它带有描述mocap序列中执行的操作的语言标签。BABEL由动作标签组成,用于来自AMASS的大约43小时的mocap序列。动作标签有两个抽象层次——序列标签描述序列中的整体动作,帧标签描述序列中每个帧中的所有动作。每个帧标签与mocap序列中相应动作的持续时间精确对齐,并且多个动作可以重叠。BABEL中有超过28k个序列标签和63k个帧标签,它们属于超过250个独特的动作类别。BABEL的标签可以用于动作识别、时间动作定位、运动合成等任务。为了验证BABEL作为基准的价值,我们评估了模型在3D动作识别中的性能。我们证明,BABEL提出了适用于现实世界场景的有趣学习挑战,可以作为3D动作识别进展的一个有用基准。数据集、基线方法和评估代码已在 https://babel.is.tue.mpg.de/ 公开,供学术研究使用。 摘要:Understanding the semantics of human movement -- the what, how and why of the movement -- is an important problem that requires datasets of human actions with semantic labels. Existing datasets take one of two approaches. Large-scale video datasets contain many action labels but do not contain ground-truth 3D human motion. Alternatively, motion-capture (mocap) datasets have precise body motions but are limited to a small number of actions. To address this, we present BABEL, a large dataset with language labels describing the actions being performed in mocap sequences. BABEL consists of action labels for about 43 hours of mocap sequences from AMASS. Action labels are at two levels of abstraction -- sequence labels describe the overall action in the sequence, and frame labels describe all actions in every frame of the sequence. Each frame label is precisely aligned with the duration of the corresponding action in the mocap sequence, and multiple actions can overlap. There are over 28k sequence labels, and 63k frame labels in BABEL, which belong to over 250 unique action categories. Labels from BABEL can be leveraged for tasks like action recognition, temporal action localization, motion synthesis, etc. To demonstrate the value of BABEL as a benchmark, we evaluate the performance of models on 3D action recognition. We demonstrate that BABEL poses interesting learning challenges that are applicable to real-world scenarios, and can serve as a useful benchmark of progress in 3D action recognition. The dataset, baseline method, and evaluation code is made available, and supported for academic research purposes at https://babel.is.tue.mpg.de/.

【2】 Statistical Query Lower Bounds for List-Decodable Linear Regression 标题:列表可解码线性回归的统计查询下界

作者:Ilias Diakonikolas,Daniel M. Kane,Ankit Pensia,Thanasis Pittas,Alistair Stewart 机构:University of Wisconsin-Madison, University of California, San Diego, Web3 Foundation 链接:https://arxiv.org/abs/2106.09689 摘要:我们研究列表可解码线性回归问题,其中对手可以破坏大多数样本。具体地说,我们得到一个标记样本集合 $T$,其元素为 $(x,y)\in\mathbb{R}^d\times\mathbb{R}$,以及参数 $0<\alpha<1/2$,使得 $T$ 中 $\alpha$ 比例的点是来自具有高斯协变量的线性回归模型的 i.i.d. 样本,其余 $(1-\alpha)$ 比例的点则取自任意噪声分布。目标是输出一个假设向量的小列表,使得其中至少有一个接近目标回归向量。我们的主要结果是这个问题的统计查询(SQ)下界为 $d^{\mathrm{poly}(1/\alpha)}$。我们的SQ下界定性地与先前开发的算法的性能相匹配,提供了证据表明此任务的当前上界几乎是最好的。 摘要:We study the problem of list-decodable linear regression, where an adversary can corrupt a majority of the examples. Specifically, we are given a set $T$ of labeled examples $(x, y) \in \mathbb{R}^d \times \mathbb{R}$ and a parameter $0< \alpha <1/2$ such that an $\alpha$-fraction of the points in $T$ are i.i.d. samples from a linear regression model with Gaussian covariates, and the remaining $(1-\alpha)$-fraction of the points are drawn from an arbitrary noise distribution. The goal is to output a small list of hypothesis vectors such that at least one of them is close to the target regression vector. Our main result is a Statistical Query (SQ) lower bound of $d^{\mathrm{poly}(1/\alpha)}$ for this problem. Our SQ lower bound qualitatively matches the performance of previously developed algorithms, providing evidence that current upper bounds for this task are nearly best possible.

【3】 How Low Can We Go: Trading Memory for Error in Low-Precision Training 标题:我们能走多低:低精度训练中以内存换误差

作者:Chengrun Yang,Ziyang Wu,Jerry Chee,Christopher De Sa,Madeleine Udell 机构:Cornell University 链接:https://arxiv.org/abs/2106.09686 摘要:低精度运算能以更少的能耗、更少的内存和更短的时间训练深度学习模型。然而,我们为此付出了代价:较低的精度可能会产生较大的舍入误差,从而产生较大的预测误差。随着应用程序的激增,用户必须选择使用哪种精度来训练新模型,芯片制造商必须决定制造哪种精度。我们将这些精度选择视为一个超参数调整问题,并借鉴元学习的思想来学习内存与误差之间的权衡。本文提出"帕累托估计选取最佳精度"(PEPPP)方法:利用矩阵分解,在网络评估次数有限的情况下寻找非支配配置(帕累托前沿)。对于任何给定的内存预算,使误差最小化的精度是这条前沿上的一个点。实践者可以利用该前沿以内存换误差,并为他们的目标选择最佳精度。 摘要:Low-precision arithmetic trains deep learning models using less energy, less memory and less time. However, we pay a price for the savings: lower precision may yield larger round-off error and hence larger prediction error. As applications proliferate, users must choose which precision to use to train a new model, and chip manufacturers must decide which precisions to manufacture. We view these precision choices as a hyperparameter tuning problem, and borrow ideas from meta-learning to learn the tradeoff between memory and error. In this paper, we introduce Pareto Estimation to Pick the Perfect Precision (PEPPP). We use matrix factorization to find non-dominated configurations (the Pareto frontier) with a limited number of network evaluations. For any given memory budget, the precision that minimizes error is a point on this frontier. Practitioners can use the frontier to trade memory for error and choose the best precision for their goals.
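下面的小片段示意"帕累托前沿"这一步:从若干 (内存, 误差) 配置中筛出非支配点(示意;PEPPP 用矩阵分解在少量评估下估计这些点,此处直接给定数值):

```python
import numpy as np

def pareto_frontier(memory, error):
    """返回非支配配置的下标:不存在另一配置同时内存不更大且误差不更大、
    且至少一项严格更优。"""
    pts = np.stack([memory, error], axis=1)
    keep = []
    for i, p in enumerate(pts):
        dominated = np.any(np.all(pts <= p, axis=1) & np.any(pts < p, axis=1))
        if not dominated:
            keep.append(i)
    return keep

mem = np.array([1.0, 2.0, 3.0, 2.5])   # 各精度配置的内存开销(示例数值)
err = np.array([0.30, 0.20, 0.15, 0.25])
print(pareto_frontier(mem, err))       # [0, 1, 2]:配置 3 被配置 1 支配
```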

【4】 Improving On-Screen Sound Separation for Open Domain Videos with Audio-Visual Self-attention 标题:利用视听自我注意改进开放领域视频的屏上声音分离

作者:Efthymios Tzinis,Scott Wisdom,Tal Remez,John R. Hershey 机构:UIUC, Google Research 链接:https://arxiv.org/abs/2106.09669 摘要:我们介绍了一个最先进的视听屏幕声音分离系统,它能够通过观看真实场景(in-the-wild)视频,学习分离声音并将声音与屏幕上的物体联系起来。我们指出了以往的视听屏幕声音分离工作的局限性,包括时空注意的简单性和粗糙分辨率,以及音频分离模型的收敛性差。我们提出的模型利用跨模态和自我注意模块来解决这些问题:这些模块能够以更高的时间分辨率捕获视听相关性,并对音频分离模型进行无监督预训练。这些改进使得模型可以推广到更广泛的一组未见过的视频。为了评估和半监督训练,我们从一个大型真实场景视频数据库(YFCC100M)中收集了屏幕音频的人类注释。我们的结果表明,在比以前方法更一般的条件下,屏幕声音分离性能有显著的改善。 摘要:We introduce a state-of-the-art audio-visual on-screen sound separation system which is capable of learning to separate sounds and associate them with on-screen objects by looking at in-the-wild videos. We identify limitations of previous work on audiovisual on-screen sound separation, including the simplicity and coarse resolution of spatio-temporal attention, and poor convergence of the audio separation model. Our proposed model addresses these issues using cross-modal and self-attention modules that capture audio-visual dependencies at a finer resolution over time, and by unsupervised pre-training of audio separation model. These improvements allow the model to generalize to a much wider set of unseen videos. For evaluation and semi-supervised training, we collected human annotations of on-screen audio from a large database of in-the-wild videos (YFCC100M). Our results show marked improvements in on-screen separation performance, in more general conditions than previous methods.

【5】 Scalable Approach for Normalizing E-commerce Text Attributes (SANTA) 标题:电子商务文本属性规范化的可伸缩方法(SANTA)

作者:Ravi Shankar Mishra,Kartik Mehta,Nikhil Rasiwasia 机构:India Machine Learning, Amazon 备注:Accepted in ECNLP workshop of ACL-IJCNLP 2021 (this https URL) 链接:https://arxiv.org/abs/2106.09493 摘要:在本文中,我们提出了SANTA,一个可扩展的框架,用于自动将电子商务属性值(如"Win 10 Pro")规范化为一组固定的预定义规范值(如"Windows 10")。早期的属性规范化工作主要集中在模糊字符串匹配(本文也称为句法匹配)。在这项工作中,我们首先对9种句法匹配算法进行了广泛的研究,并确定"余弦"相似度效果最佳,比常用的Jaccard指数提高了2.7%。接下来,我们认为字符串相似性本身不足以进行属性规范化,因为许多表面形式需要超越句法匹配(例如,"720p"和"HD"是同义词)。虽然像无监督嵌入(例如word2vec/fastText)这样的语义技术在词汇相似性任务中表现出了很好的效果,但我们观察到它们在区分相近的规范形式方面表现不佳,因为这些相近形式经常出现在相似的上下文中。我们建议使用带三元组损失的孪生网络来学习词元(token)嵌入。我们提出了一个利用原始属性值和产品标题的嵌入学习任务,以自监督方式学习这些嵌入。我们表明,使用我们提出的任务提供监督,在属性规范化上优于基于句法匹配和无监督嵌入的技术。在一个包含50个属性的真实属性规范化数据集上的实验表明,使用该方法训练的嵌入比最佳字符串匹配提高了2.3%,比最佳无监督嵌入提高了19.3%。 摘要:In this paper, we present SANTA, a scalable framework to automatically normalize E-commerce attribute values (e.g. "Win 10 Pro") to a fixed set of pre-defined canonical values (e.g. "Windows 10"). Earlier works on attribute normalization focused on fuzzy string matching (also referred as syntactic matching in this paper). In this work, we first perform an extensive study of nine syntactic matching algorithms and establish that 'cosine' similarity leads to best results, showing 2.7% improvement over commonly used Jaccard index. Next, we argue that string similarity alone is not sufficient for attribute normalization as many surface forms require going beyond syntactic matching (e.g. "720p" and "HD" are synonyms). While semantic techniques like unsupervised embeddings (e.g. word2vec/fastText) have shown good results in word similarity tasks, we observed that they perform poorly to distinguish between close canonical forms, as these close forms often occur in similar contexts. We propose to learn token embeddings using a twin network with triplet loss. We propose an embedding learning task leveraging raw attribute values and product titles to learn these embeddings in a self-supervised fashion. We show that providing supervision using our proposed task improves over both syntactic and unsupervised embeddings based techniques for attribute normalization. Experiments on a real-world attribute normalization dataset of 50 attributes show that the embeddings trained using our proposed approach obtain 2.3% improvement over best string matching and 19.3% improvement over best unsupervised embeddings.
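下面用一个极简的 Python 片段示意摘要中表现最好的句法匹配基线——字符 n-gram 上的余弦相似度(实现细节为本文假设,与论文所用工具无关;"720p"→"HD" 这类同义词仍需嵌入等语义方法):

```python
from collections import Counter
from math import sqrt

def ngrams(s, n=3):
    """小写化并加边界空格后,统计字符 n-gram 计数向量。"""
    s = f" {s.lower()} "
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def cosine(a, b):
    """两个属性值在字符 n-gram 空间中的余弦相似度。"""
    va, vb = ngrams(a), ngrams(b)
    dot = sum(va[g] * vb[g] for g in va)
    na = sqrt(sum(v * v for v in va.values()))
    nb = sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb + 1e-9)

def normalize(value, canonical_values):
    """把属性值匹配到最相似的预定义规范值。"""
    return max(canonical_values, key=lambda c: cosine(value, c))

print(normalize("win 10 pro", ["Windows 10", "Windows 11", "macOS"]))  # Windows 10
```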

【6】 Secure Multi-Function Computation with Private Remote Sources 标题:具有私有远程源的安全多功能计算

作者:Onur Günlü,Matthieu Bloch,Rafael F. Schaefer 机构: Georgia Institute of Technology 备注:Shorter version to appear in the IEEE International Symposium on Information Theory 2021 链接:https://arxiv.org/abs/2106.09485 摘要:我们考虑一个分布式函数计算问题:观测到远程源含噪版本的各方,通过公开通信协助融合中心计算其观测值的某个函数。分布式函数计算受到多种约束,不仅包括可靠性和存储,还包括隐私性和保密性。具体地说,1) 远程源应对窃听者和融合中心保持私密,以泄露的远程源信息来衡量;2) 计算出的函数应对窃听者保密,以函数参数的泄露信息来衡量,以确保无论具体使用何种函数均能保密。我们推导了无损和有损单函数计算的精确速率域,并以一个信息瓶颈例子说明了有损单函数计算的速率域,在该例子中,对二元输入对称输出信道刻画了最优辅助随机变量。我们将该方法推广到具有联合保密和隐私约束的无损和有损异步多函数计算,并刻画了仅在所施加的马尔可夫链条件上有所不同的速率区域内外界。 摘要:We consider a distributed function computation problem in which parties observing noisy versions of a remote source facilitate the computation of a function of their observations at a fusion center through public communication. The distributed function computation is subject to constraints, including not only reliability and storage but also privacy and secrecy. Specifically, 1) the remote source should remain private from an eavesdropper and the fusion center, measured in terms of the information leaked about the remote source; 2) the function computed should remain secret from the eavesdropper, measured in terms of the information leaked about the arguments of the function, to ensure secrecy regardless of the exact function used. We derive the exact rate regions for lossless and lossy single-function computation and illustrate the lossy single-function computation rate region for an information bottleneck example, in which the optimal auxiliary random variables are characterized for binary-input symmetric-output channels. We extend the approach to lossless and lossy asynchronous multiple-function computations with joint secrecy and privacy constraints, in which case inner and outer bounds for the rate regions differing only in the Markov chain conditions imposed are characterized.

【7】 Federated CycleGAN for Privacy-Preserving Image-to-Image Translation 标题:用于保护隐私的图像到图像转换的联邦CycleGAN

作者:Joonyoung Song,Jong Chul Ye 机构:Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea 链接:https://arxiv.org/abs/2106.09246 摘要:无监督图像到图像的转换方法,如CycleGAN学习使用来自不同域的未配对训练数据集将图像从一个域转换到另一个域。不幸的是,这些方法仍然需要集中收集未配对的记录,这可能会侵犯隐私和安全问题。虽然最近的联邦学习(FL)允许神经网络在不交换数据的情况下进行训练,但FL的基本假设是所有客户都有自己的来自相似领域的训练数据,这与我们的图像到图像转换场景不同,在这种场景中,每个客户端都有来自其唯一域的图像,目标是在不访问目标域数据的情况下学习不同域之间的图像转换。为了解决这一问题,本文提出了一种新的联邦CycleGAN体系结构,它可以在保持数据隐私的同时,以无监督的方式学习图像翻译。具体来说,我们的方法源自一个新的观察,即CycleGAN损失可以分解为客户特定的局部目标的总和,这些目标可以仅使用其数据进行评估。这种局部目标分解允许多个客户机在不牺牲性能的情况下参与联合CycleGAN训练。此外,我们的方法采用了新的可切换生成器和鉴别器架构,使用自适应实例规范化(AdaIN)显著降低了联邦学习的带宽要求。我们在各种无监督图像翻译任务上的实验结果表明,我们的联邦CycleGAN与非联邦CycleGAN具有相当的性能。 摘要:Unsupervised image-to-image translation methods such as CycleGAN learn to convert images from one domain to another using unpaired training data sets from different domains. Unfortunately, these approaches still require centrally collected unpaired records, potentially violating privacy and security issues. Although the recent federated learning (FL) allows a neural network to be trained without data exchange, the basic assumption of the FL is that all clients have their own training data from a similar domain, which is different from our image-to-image translation scenario in which each client has images from its unique domain and the goal is to learn image translation between different domains without accessing the target domain data. To address this, here we propose a novel federated CycleGAN architecture that can learn image translation in an unsupervised manner while maintaining the data privacy. Specifically, our approach arises from a novel observation that CycleGAN loss can be decomposed into the sum of client specific local objectives that can be evaluated using only their data. This local objective decomposition allows multiple clients to participate in federated CycleGAN training without sacrificing performance. Furthermore, our method employs novel switchable generator and discriminator architecture using Adaptive Instance Normalization (AdaIN) that significantly reduces the bandwidth requirement of the federated learning. Our experimental results on various unsupervised image translation tasks show that our federated CycleGAN provides comparable performance compared to the non-federated counterpart.

【8】 Square Root Principal Component Pursuit: Tuning-Free Noisy Robust Matrix Recovery 标题:平方根主成分追踪:免调参的含噪鲁棒矩阵恢复

作者:Junhui Zhang,Jingkai Yan,John Wright 机构:Department of Applied Physics and Applied Math, Columbia University, New York, NY , Department of Electrical Engineering 链接:https://arxiv.org/abs/2106.09211 摘要:我们提出了一个新的框架——平方根主成分追踪——用于从被噪声和异常值污染的观测中恢复低秩矩阵。受平方根套索的启发,这种新的公式不需要噪声水平的先验知识。我们证明,单一且通用的正则化参数选择足以使重构误差与(先验未知的)噪声水平成正比。相比之下,以前的公式(如稳定PCP)依赖与噪声相关的参数才能实现类似性能,因此难以部署在噪声水平未知的应用中。通过对模拟数据集和真实数据集的实验,验证了新方法的有效性。我们的模拟结果证实了一个观点,即正则化参数的通用选择在一系列噪声水平上产生接近最优的性能,表明所提出的方法优于这里证明的(有些宽松的)界限。 摘要:We propose a new framework -- Square Root Principal Component Pursuit -- for low-rank matrix recovery from observations corrupted with noise and outliers. Inspired by the square root Lasso, this new formulation does not require prior knowledge of the noise level. We show that a single, universal choice of the regularization parameter suffices to achieve reconstruction error proportional to the (a priori unknown) noise level. In comparison, previous formulations such as stable PCP rely on noise-dependent parameters to achieve similar performance, and are therefore challenging to deploy in applications where the noise level is unknown. We validate the effectiveness of our new method through experiments on simulated and real datasets. Our simulations corroborate the claim that a universal choice of the regularization parameter yields near optimal performance across a range of noise levels, indicating that the proposed method outperforms the (somewhat loose) bound proved here.
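按与"平方根套索"的类比,数据拟合项很可能不取平方;下面给出这种目标形式的一种推测写法(仅为示意,确切公式以原文为准):

```latex
% 经典稳定 PCP:
%   min_{L,S} ||L||_* + lambda ||S||_1 + (1/(2 mu)) ||M - L - S||_F^2
% 平方根 PCP(推测的类比形式):对拟合项不取平方,
% 从而使正则化参数的选取不依赖于噪声水平
\min_{L,S}\; \|L\|_{*} + \lambda \|S\|_{1} + \mu\, \|M - L - S\|_{F}
```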

【9】 On the Power of Preconditioning in Sparse Linear Regression 标题:关于稀疏线性回归中预处理的作用

作者:Jonathan Kelner,Frederic Koehler,Raghu Meka,Dhruv Rohatgi 机构:MIT, UCLA 备注:73 pages, 5 figures 链接:https://arxiv.org/abs/2106.09207 摘要:稀疏线性回归是高维统计中的一个基本问题,但对于如何在不限制设计矩阵的条件下有效地求解它却知之甚少。我们考虑(相关)随机设计设置:协变量独立地从多元高斯分布 $N(0,\Sigma)$(其中 $\Sigma: n\times n$)中抽取,并寻求最小化 $(\hat{w}-w^*)^T\Sigma(\hat{w}-w^*)$ 的估计量 $\hat{w}$,其中 $w^*$ 是 $k$-稀疏的真值。从信息论角度看,对于任意的 $\Sigma$ 和 $w^*$,我们可以用 $O(k\log n)$ 个样本获得很强的误差界;然而,在没有对 $\Sigma$ 或 $w^*$ 作进一步假设的情况下,即使使用 $o(n)$ 个样本,也没有已知的有效算法能够匹配这些保证。在困难性方面,目前只有针对最坏情况设计矩阵的计算下界。已知存在对套索(Lasso)困难的随机设计实例,但这些实例通常只需简单换基(即预处理)后即可由套索求解。在这项工作中,我们给出了上界和下界,以澄清预处理在稀疏线性回归中的威力。首先,我们证明了预处理套索可以近似最优地解决一大类稀疏线性回归问题:只要协变量的依赖结构(在马尔可夫性质的意义上)具有较低的树宽,它就会成功——即使 $\Sigma$ 是高度病态的。其次,我们(首次)构造了可证明对最优预处理套索困难的随机设计实例。事实上,我们完成了树宽分类:证明对于任何树宽为 $t$ 的图,存在定义在该图上的高斯马尔可夫随机场,使得当协变量来自该模型时,无论选择何种预处理子,预处理套索都需要 $\Omega(t^{1/20})$ 个样本才能恢复 $O(\log n)$-稀疏信号。 摘要:Sparse linear regression is a fundamental problem in high-dimensional statistics, but strikingly little is known about how to efficiently solve it without restrictive conditions on the design matrix. We consider the (correlated) random design setting, where the covariates are independently drawn from a multivariate Gaussian $N(0,\Sigma)$ with $\Sigma : n \times n$, and seek estimators $\hat{w}$ minimizing $(\hat{w}-w^*)^T\Sigma(\hat{w}-w^*)$, where $w^*$ is the $k$-sparse ground truth. Information theoretically, one can achieve strong error bounds with $O(k \log n)$ samples for arbitrary $\Sigma$ and $w^*$; however, no efficient algorithms are known to match these guarantees even with $o(n)$ samples, without further assumptions on $\Sigma$ or $w^*$. As far as hardness, computational lower bounds are only known with worst-case design matrices. Random-design instances are known which are hard for the Lasso, but these instances can generally be solved by Lasso after a simple change-of-basis (i.e. preconditioning). In this work, we give upper and lower bounds clarifying the power of preconditioning in sparse linear regression. First, we show that the preconditioned Lasso can solve a large class of sparse linear regression problems nearly optimally: it succeeds whenever the dependency structure of the covariates, in the sense of the Markov property, has low treewidth -- even if $\Sigma$ is highly ill-conditioned. Second, we construct (for the first time) random-design instances which are provably hard for an optimally preconditioned Lasso. In fact, we complete our treewidth classification by proving that for any treewidth-$t$ graph, there exists a Gaussian Markov Random Field on this graph such that the preconditioned Lasso, with any choice of preconditioner, requires $\Omega(t^{1/20})$ samples to recover $O(\log n)$-sparse signals when covariates are drawn from this model.
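下面的片段示意"先预处理、再跑套索"的一般套路:左乘预条件子 F 同时作用于 X 和 y,线性模型 y = Xw + e 变为 Fy = FXw + Fe,稀疏的 w 保持不变。这里取 Puffer 式 F = U·diag(1/d)·U^T(由 X 的 SVD 得到),只是预条件子的一种朴素选择,并非论文中针对低树宽结构的构造;preconditioned_lasso 等名称为本文假设:

```python
import numpy as np
from sklearn.linear_model import Lasso

def preconditioned_lasso(X, y, alpha=0.01):
    """换基(预处理)后求解套索:对 X 做 SVD,构造左乘预条件子 F,
    再在 (FX, Fy) 上跑标准 Lasso;系数仍在原坐标系下,无需变换回去。"""
    U, d, _ = np.linalg.svd(X, full_matrices=False)
    F = U @ np.diag(1.0 / np.maximum(d, 1e-12)) @ U.T   # 朴素的 Puffer 式预条件子
    model = Lasso(alpha=alpha).fit(F @ X, F @ y)
    return model.coef_
```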

【10】 RHNAS: Realizable Hardware and Neural Architecture Search 标题:RHNAS:可实现的硬件和神经结构搜索

作者:Yash Akhauri,Adithya Niranjan,J. Pablo Muñoz,Suvadeep Banerjee,Abhijit Davare,Pasquale Cocchini,Anton A. Sorokin,Ravi Iyer,Nilesh Jain 机构:Intel Labs, India, Intel Labs, USA 备注:15 pages 链接:https://arxiv.org/abs/2106.09180 摘要:快速发展的人工智能领域需要自动化的方法来共同设计神经网络结构和神经加速器,以最大限度地提高系统效率和解决生产力的挑战。为了实现这一巨大空间的联合优化,可微神经网络-硬件协同设计越来越受到人们的关注。完全可微协同设计降低了发现优化NN-HW配置的资源需求,但不能适应一般硬件加速器搜索空间。这是由于在许多硬件加速器的搜索空间中存在不可合成(无效)的设计。为了实现具有任意神经网络搜索空间的可配置硬件加速器的高效和可实现的协同设计,我们引入了RHNAS。RHNAS是一种将硬件优化的强化学习与可微神经结构搜索相结合的方法。RHNAS发现了可实现的NN-HW设计,在ImageNet上的延迟降低了1.84倍,能量延迟积(EDP)降低了1.86倍,在CIFAR-10上的延迟降低了2.81倍,EDP降低了3.30倍。 摘要:The rapidly evolving field of Artificial Intelligence necessitates automated approaches to co-design neural network architecture and neural accelerators to maximize system efficiency and address productivity challenges. To enable joint optimization of this vast space, there has been growing interest in differentiable NN-HW co-design. Fully differentiable co-design has reduced the resource requirements for discovering optimized NN-HW configurations, but fail to adapt to general hardware accelerator search spaces. This is due to the existence of non-synthesizable (invalid) designs in the search space of many hardware accelerators. To enable efficient and realizable co-design of configurable hardware accelerators with arbitrary neural network search spaces, we introduce RHNAS. RHNAS is a method that combines reinforcement learning for hardware optimization with differentiable neural architecture search. RHNAS discovers realizable NN-HW designs with 1.84x lower latency and 1.86x lower energy-delay product (EDP) on ImageNet and 2.81x lower latency and 3.30x lower EDP on CIFAR-10 over the default hardware accelerator design.

【11】 mPyPl: Python Monadic Pipeline Library for Complex Functional Data Processing 标题:mPyPl:用于复杂函数式数据处理的Python单子流水线库

作者:Dmitry Soshnikov,Yana Valieva 备注:Published in Microsoft Journal of Applied Research, Dec.2019., Vol. 12 链接:https://arxiv.org/abs/2106.09164 摘要:在本文中,我们提出了一个新的Python库mPyPl,它旨在使用函数式方法简化复杂的数据处理任务。该库定义了对以生成器表示的命名词典的惰性数据流(所谓的多字段数据流)的操作,并允许在数据准备和特征提取过程中使用更多的"字段"丰富这些数据流。因此,大多数数据准备任务都可以用简洁的线性"pipeline"形式表示,类似于UNIX管道的语法,或 F# 中的 |> 函数组合运算符。我们定义了多字段数据流上的基本操作,这些操作类似于经典的单子(monadic)操作,并展示了所提方法与函数式编程中单子的相似性。我们还展示了如何在视频事件检测的复杂深度学习任务中使用该库,并讨论了允许在内存和性能方面进行不同折衷的多种求值策略。 摘要:In this paper, we present a new Python library called mPyPl, which is intended to simplify complex data processing tasks using functional approach. This library defines operations on lazy data streams of named dictionaries represented as generators (so-called multi-field datastreams), and allows enriching those data streams with more 'fields' in the process of data preparation and feature extraction. Thus, most data preparation tasks can be expressed in the form of neat linear 'pipeline', similar in syntax to UNIX pipes, or |> functional composition operator in F#. We define basic operations on multi-field data streams, which resemble classical monadic operations, and show similarity of the proposed approach to monads in functional programming. We also show how the library was used in complex deep learning tasks of event detection in video, and discuss different evaluation strategies that allow for different compromises in terms of memory and performance.
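下面给出一个模仿上述风格的微型流水线(注意:这是本文为说明而构造的假想实现,并非 mPyPl 的真实 API;真实接口请参考其文档与论文):

```python
# 对"命名字典的惰性流"做 |> 式的线性流水线
class P:
    """把函数包装成可用 | 连接的流水线阶段,近似 F# 的 |> 组合算子。"""
    def __init__(self, fn):
        self.fn = fn
    def __ror__(self, stream):        # 使 generator | P(...) 可用
        return self.fn(stream)

def apply(field, new_field, fn):
    """惰性地逐条读取字典,用 fn(d[field]) 丰富出新"字段"。"""
    def run(stream):
        for d in stream:
            d[new_field] = fn(d[field])
            yield d
    return P(run)

data = ({"filename": f} for f in ["a.mp4", "b.mp4"])   # 多字段数据流(生成器)
out = data | apply("filename", "stem", lambda f: f.split(".")[0]) \
           | apply("stem", "upper", str.upper)
print(list(out))  # [{'filename': 'a.mp4', 'stem': 'a', 'upper': 'A'}, ...]
```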

【12】 Automatic Curricula via Expert Demonstrations 标题:通过专家演示实现自动课程

作者:Siyu Dai,Andreas Hofmann,Brian Williams 机构:Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, United States 备注:Preprint, work in progress 链接:https://arxiv.org/abs/2106.09159 摘要:为了解决具有稀疏奖励函数的机器人操作任务,提出了一种基于专家演示的自动课程(ACED)强化学习(RL)方法,该方法结合了模仿学习和课程学习的思想。课程学习通过引入一系列难度递增的辅助任务来解决复杂的RL任务,然而如何自动设计有效、可推广的课程仍然是一个具有挑战性的研究课题。ACED从少量专家演示轨迹中提取课程:将演示划分为若干段,并把训练回合的初始状态设为从演示不同段采样的状态。随着学习智能体性能的提高,ACED将重置状态从演示末尾逐步移向演示开头,不仅能学习初始化和目标均未见过的高难度操作任务,还能发现不同于演示的新颖解。此外,ACED可以自然地与其他模仿学习方法相结合,以更有效的方式利用专家演示,并且我们表明,ACED与行为克隆的结合只需1个演示即可学会抓取-放置任务,只需20个演示即可学会堆叠积木任务。 摘要:We propose Automatic Curricula via Expert Demonstrations (ACED), a reinforcement learning (RL) approach that combines the ideas of imitation learning and curriculum learning in order to solve challenging robotic manipulation tasks with sparse reward functions. Curriculum learning solves complicated RL tasks by introducing a sequence of auxiliary tasks with increasing difficulty, yet how to automatically design effective and generalizable curricula remains a challenging research problem. ACED extracts curricula from a small amount of expert demonstration trajectories by dividing demonstrations into sections and initializing training episodes to states sampled from different sections of demonstrations. Through moving the reset states from the end to the beginning of demonstrations as the learning agent improves its performance, ACED not only learns challenging manipulation tasks with unseen initializations and goals, but also discovers novel solutions that are distinct from the demonstrations. In addition, ACED can be naturally combined with other imitation learning methods to utilize expert demonstrations in a more efficient manner, and we show that a combination of ACED with behavior cloning allows pick-and-place tasks to be learned with as few as 1 demonstration and block stacking tasks to be learned with 20 demonstrations.
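下面的片段按摘要的描述示意"把重置状态从演示末尾移向开头"的课程机制(aced_reset_state 等名称与采样规则均为本文假设,仅作说明):

```python
import numpy as np

def aced_reset_state(demo_states, progress, rng=np.random.default_rng(3)):
    """从演示状态序列中采样训练回合的初始状态:
    progress=0 时只从末尾附近采样(离目标近、容易成功),
    progress 逐渐升到 1 时,可采样区间扩展到整条演示。"""
    n = len(demo_states)
    lo = int((1.0 - progress) * (n - 1))      # 随性能提升把下界移向演示开头
    return demo_states[rng.integers(lo, n)]

demo = [np.array([t / 10.0, 0.0]) for t in range(11)]   # 假想的演示状态序列
s0 = aced_reset_state(demo, progress=0.2)               # 训练早期:靠近末尾
```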

【13】 An Imprecise SHAP as a Tool for Explaining the Class Probability Distributions under Limited Training Data 标题:有限训练数据下解释类概率分布的不精确SHAP工具

作者:Lev V. Utkin,Andrei V. Konstantinov,Kirill A. Vishniakov 机构:Peter the Great St.Petersburg Polytechnic University, St.Petersburg, Russia 链接:https://arxiv.org/abs/2106.09111 摘要:最流行的机器学习预测解释方法之一是SHapley加性解释方法(SHAP)。对于类概率分布不精确且用分布集表示的情况,我们提出了一种不精确SHAP,作为原始SHAP的修正。不精确SHAP背后的第一个想法是一种计算特征边际贡献的新方法,它满足Shapley值的重要效率性质。第二个想法是尝试考虑计算和化简区间值Shapley值的一般方法,这类似于不精确概率论中可达概率区间的思想。基于Kolmogorov-Smirnov距离和不精确污染模型,提出了该通用方法的一种线性优化问题形式的简单特殊实现。用合成数据和实际数据的数值例子说明了这种不精确SHAP。 摘要:One of the most popular methods of the machine learning prediction explanation is the SHapley Additive exPlanations method (SHAP). An imprecise SHAP as a modification of the original SHAP is proposed for cases when the class probability distributions are imprecise and represented by sets of distributions. The first idea behind the imprecise SHAP is a new approach for computing the marginal contribution of a feature, which fulfils the important efficiency property of Shapley values. The second idea is an attempt to consider a general approach to calculating and reducing interval-valued Shapley values, which is similar to the idea of reachable probability intervals in the imprecise probability theory. A simple special implementation of the general approach in the form of linear optimization problems is proposed, which is based on using the Kolmogorov-Smirnov distance and imprecise contamination models. Numerical examples with synthetic and real data illustrate the imprecise SHAP.

【14】 Disentangling Online Chats with DAG-Structured LSTMs 标题:利用DAG结构的LSTM解开在线聊天

作者:Duccio Pappadopulo,Lisa Bauer,Marco Farina,Ozan İrsoy,Mohit Bansal 机构:Bloomberg, UNC Chapel Hill 备注:8 pages, 1 figure. Accepted at *SEM 2021 链接:https://arxiv.org/abs/2106.09024 摘要:许多现代消息传递系统允许许多用户之间进行快速、同步的文本通信。由此产生的消息序列隐藏了一个更复杂的结构,其中独立的子会话相互交织。这对任何旨在理解聊天日志内容或从中收集信息的任务都是一个挑战。理清这些对话的能力直接关系到摘要生成和问答等许多下游任务的成败。伴随文本的结构化信息,如用户轮次、用户提及、时间戳,被需要跟进对话的参与者自己用作线索,并且已被证明对会话解缠很重要。DAG-LSTM是树LSTM的推广,可处理有向无环依赖,是整合此类信息及其非顺序特性的自然方式。本文将DAG-LSTM应用于会话解缠任务。我们在Ubuntu IRC数据集上进行实验。我们证明了我们提出的新模型在恢复回复关系的任务上达到了最先进的水平,并且在其他解缠度量上具有竞争力。 摘要:Many modern messaging systems allow fast and synchronous textual communication among many users. The resulting sequence of messages hides a more complicated structure in which independent sub-conversations are interwoven with one another. This poses a challenge for any task aiming to understand the content of the chat logs or gather information from them. The ability to disentangle these conversations is then tantamount to the success of many downstream tasks such as summarization and question answering. Structured information accompanying the text such as user turn, user mentions, timestamps, is used as a cue by the participants themselves who need to follow the conversation and has been shown to be important for disentanglement. DAG-LSTMs, a generalization of Tree-LSTMs that can handle directed acyclic dependencies, are a natural way to incorporate such information and its non-sequential nature. In this paper, we apply DAG-LSTMs to the conversation disentanglement task. We perform our experiments on the Ubuntu IRC dataset. We show that the novel model we propose achieves state of the art status on the task of recovering reply-to relations and it is competitive on other disentanglement metrics.

【15】 WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

Authors: Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, Najim Dehak, William Chan Institutions: Center for Language and Speech Processing, Johns Hopkins University; Brain Team, Google Research Note: Copyright 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works Link: https://arxiv.org/abs/2106.09660 Abstract: This paper introduces WaveGrad 2, a non-autoregressive generative model for text-to-speech synthesis. WaveGrad 2 is trained to estimate the gradient of the log conditional density of the waveform given a phoneme sequence. The model takes an input phoneme sequence and, through an iterative refinement process, generates an audio waveform. This contrasts with the original WaveGrad vocoder, which conditions on mel-spectrogram features generated by a separate model. The iterative refinement process starts from Gaussian noise and, through a series of refinement steps (e.g., 50 steps), progressively recovers the audio sequence. WaveGrad 2 offers a natural way to trade off between inference speed and sample quality by adjusting the number of refinement steps. Experiments show that the model can generate high-fidelity audio, approaching the performance of a state-of-the-art neural TTS system. We also report various ablation studies over different model configurations. Audio samples are available at https://wavegrad.github.io/v2.
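The refinement loop is essentially diffusion-style ancestral sampling. In the hedged sketch below, `model` is assumed to predict the noise component of the current waveform estimate given the phoneme encoding; the linear beta schedule and the waveform shape are placeholders rather than the paper's settings.

```python
import torch

@torch.no_grad()
def iterative_refinement(model, phoneme_cond, num_steps=50, shape=(1, 24000)):
    """Sample a waveform by progressively denoising Gaussian noise."""
    betas = torch.linspace(1e-4, 0.05, num_steps)      # placeholder noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    y = torch.randn(shape)                             # start from pure noise
    for t in reversed(range(num_steps)):
        eps = model(y, phoneme_cond, alpha_bars[t])    # predicted noise component
        y = (y - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            y = y + torch.sqrt(betas[t]) * torch.randn_like(y)
    return y
```

Running the loop with fewer steps trades sample quality for inference speed, which is the knob the abstract refers to.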

【16】 Disentangling Identifiable Features from Noisy Data with Structured Nonlinear ICA

Authors: Hermanni Hälvä, Sylvain Le Corff, Luc Lehéricy, Jonathan So, Yongjie Zhu, Elisabeth Gassiat, Aapo Hyvärinen Institutions: Department of Computer Science, University of Helsinki; Samovar, Télécom SudParis, département CITI, Institut Polytechnique de Paris, Palaiseau, France; Laboratoire J. A. Dieudonné, Université Côte d'Azur, CNRS, Nice, France Note: preprint Link: https://arxiv.org/abs/2106.09620 Abstract: We introduce a new general identifiable framework for principled disentanglement referred to as Structured Nonlinear Independent Component Analysis (SNICA). Our contribution is to extend the identifiability theory of deep generative models to a very broad class of structured models. While previous works have shown identifiability for specific classes of time-series models, our theorems extend this to more general temporal structures as well as to models with more complex structures such as spatial dependencies. In particular, we establish the major result that identifiability for this framework holds even in the presence of noise of unknown distribution. The SNICA setting therefore subsumes all the existing nonlinear ICA models for time series and also allows for new, much richer identifiable models. Finally, as an example of our framework's flexibility, we introduce the first nonlinear ICA model for time series that combines the following very useful properties: it accounts for both nonstationarity and autocorrelation in a fully unsupervised setting; performs dimensionality reduction; models hidden states; and enables principled estimation and inference by variational maximum likelihood.
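The class of data the framework covers is easiest to see with a toy generator. The sketch below combines the ingredients listed in the abstract: independent sources that are autocorrelated (AR(1)) and nonstationary (a regime-switching variance standing in for hidden states), an unknown nonlinear mixing, and additive noise of an arbitrary (here Laplace) distribution. All constants are illustrative, and the estimator itself (variational maximum likelihood) is beyond this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 1000, 3

# Hidden regime switching makes the sources nonstationary
regime = np.cumsum(rng.random(T) < 0.01) % 2            # 0/1 state per time step
scale = np.where(regime == 0, 0.5, 2.0)[:, None]

# Independent, autocorrelated sources: AR(1) per component, regime-scaled noise
s = np.zeros((T, d))
for t in range(1, T):
    s[t] = 0.9 * s[t - 1] + scale[t] * rng.normal(size=d)

# Unknown nonlinear mixing (a random one-hidden-layer map) plus non-Gaussian noise
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
x = np.tanh(s @ W1) @ W2 + rng.laplace(scale=0.1, size=(T, d))
```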

【17】 Stochastic Bias-Reduced Gradient Methods

Authors: Hilal Asi, Yair Carmon, Arun Jambulapati, Yujia Jin, Aaron Sidford Link: https://arxiv.org/abs/2106.09481 Abstract: We develop a new primitive for stochastic optimization: a low-bias, low-cost estimator of the minimizer $x_\star$ of any Lipschitz strongly-convex function. In particular, we use a multilevel Monte-Carlo approach due to Blanchet and Glynn to turn any optimal stochastic gradient method into an estimator of $x_\star$ with bias $\delta$, variance $O(\log(1/\delta))$, and an expected sampling cost of $O(\log(1/\delta))$ stochastic gradient evaluations. As an immediate consequence, we obtain cheap and nearly unbiased gradient estimators for the Moreau-Yosida envelope of any Lipschitz convex function, allowing us to perform dimension-free randomized smoothing. We demonstrate the potential of our estimator through four applications. First, we develop a method for minimizing the maximum of $N$ functions, improving on recent results and matching a lower bound up to logarithmic factors. Second and third, we recover state-of-the-art rates for projection-efficient and gradient-efficient optimization using simple algorithms with a transparent analysis. Finally, we show that an improved version of our estimator would yield a nearly linear-time, optimal-utility, differentially private non-smooth stochastic optimization method.
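A sketch of the multilevel Monte Carlo construction: draw a random level with geometrically decaying probability and debias a coarse-budget solution with an importance-weighted difference of two consecutive budget levels. Here `opt(k)` stands for any converging stochastic gradient method run with a budget of `k` gradient evaluations (e.g., the averaged SGD iterate); the truncation level, the constants, and the omitted coupling of randomness across levels are simplifications of the paper's scheme.

```python
import numpy as np

def bias_reduced_estimate(opt, n0=8, j_max=20, rng=None):
    """Low-bias estimate of the minimizer from a converging method opt(k)."""
    rng = rng or np.random.default_rng()
    j = int(rng.geometric(0.5))              # P(J = j) = 2**(-j), j = 1, 2, ...
    x_hat = opt(n0)
    if j <= j_max:                           # truncation bounds the residual bias
        p_j = 2.0 ** (-j)
        x_hat = x_hat + (opt(n0 * 2 ** (j + 1)) - opt(n0 * 2 ** j)) / p_j
    return x_hat
```

Because level j is sampled with probability 2^{-j} but costs O(2^j) gradients, the expected budget grows only linearly in the truncation level, i.e., logarithmically in the inverse bias, which is what makes the estimator cheap.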

【18】 Importance measures derived from random forests: characterisation and extension

Authors: Antonio Sutera Institutions: Advisors: Prof. Pierre Geurts, Prof. Louis Wehenkel Note: PhD thesis, Liège, Belgium, June 2019. Permalink: this http URL Link: https://arxiv.org/abs/2106.09473 Abstract: Nowadays new technologies, and especially artificial intelligence, are more and more established in our society. Big data analysis and machine learning, two sub-fields of artificial intelligence, are at the core of many recent breakthroughs in many application fields (e.g., medicine, communication, finance, ...), including some that are strongly related to our day-to-day life (e.g., social networks, computers, smartphones, ...). In machine learning, significant improvements are usually achieved at the price of an increasing computational complexity and thanks to bigger datasets. Currently, cutting-edge models built by the most advanced machine learning algorithms typically become simultaneously very efficient and profitable but also extremely complex. Their complexity is such that these models are commonly seen as black boxes providing a prediction or a decision which cannot be interpreted or justified. Nevertheless, whether these models are used autonomously or as a simple decision-making support tool, they are already being used in machine learning applications where health and human life are at stake. Therefore, it appears to be an obvious necessity not to blindly believe everything coming out of those models without a detailed understanding of their predictions or decisions. Accordingly, this thesis aims at improving the interpretability of models built by a specific family of machine learning algorithms, the so-called tree-based methods. Several mechanisms have been proposed to interpret these models, and we aim along this thesis to improve their understanding, study their properties, and define their limitations.

【19】 Zeroth-Order Methods for Convex-Concave Minmax Problems: Applications to Decision-Dependent Risk Minimization

Authors: Chinmay Maheshwari, Chih-Yuan Chiu, Eric Mazumdar, S. Shankar Sastry, Lillian J. Ratliff Institutions: Electrical Engineering and Computer Sciences, University of California, Berkeley; Electrical and Computer Engineering, University of Washington, Seattle Note: 32 pages, 5 figures Link: https://arxiv.org/abs/2106.09082 Abstract: Min-max optimization is emerging as a key framework for analyzing problems of robustness to strategically and adversarially generated data. We propose a random-reshuffling-based, gradient-free Optimistic Gradient Descent-Ascent algorithm for solving convex-concave min-max problems with finite-sum structure. We prove that the algorithm enjoys the same convergence rate as that of zeroth-order algorithms for convex minimization problems. We further specialize the algorithm to solve distributionally robust, decision-dependent learning problems, where gradient information is not readily available. Through illustrative simulations, we observe that our proposed approach learns models that are simultaneously robust against adversarial distribution shifts and strategic decisions from the data sources, and outperforms existing methods from the strategic classification literature.
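A rough sketch of the two ingredients: a two-point zeroth-order gradient estimator and the optimistic descent-ascent update. The smoothing radius, the step size, and the omission of random reshuffling over the finite sum are simplifications assumed here, not the paper's exact algorithm.

```python
import numpy as np

def zo_grad(f, z, delta=1e-3, rng=np.random.default_rng(0)):
    """Two-point zeroth-order estimate of grad f at z."""
    u = rng.normal(size=z.shape)
    u /= np.linalg.norm(u)
    return z.size * (f(z + delta * u) - f(z - delta * u)) / (2 * delta) * u

def zo_ogda(f, x0, y0, eta=0.01, steps=2000):
    """Gradient-free optimistic GDA for min_x max_y f(x, y)."""
    x, y = x0.copy(), y0.copy()
    gx_prev, gy_prev = np.zeros_like(x), np.zeros_like(y)
    for _ in range(steps):
        gx = zo_grad(lambda x_: f(x_, y), x)
        gy = zo_grad(lambda y_: f(x, y_), y)
        x = x - eta * (2 * gx - gx_prev)     # optimistic descent in x
        y = y + eta * (2 * gy - gy_prev)     # optimistic ascent in y
        gx_prev, gy_prev = gx, gy
    return x, y
```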
