cs.LG, 190 papers in total today
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (9 papers)
【1】 Generative Adversarial Graph Convolutional Networks for Human Action Synthesis Link: https://arxiv.org/abs/2110.11191
Authors: Bruno Degardin, João Neves, Vasco Lopes, João Brito, Ehsan Yaghoubi, Hugo Proença Affiliations: IT - Instituto de Telecomunicações; NOVA LINCS; C4 - Cloud Computing Competence Center, Universidade da Beira Interior, Portugal; DeepNeuronic Note: Published as a conference paper at WACV 2022. Code and pretrained models available at this https URL Abstract: Synthesising the spatial and temporal dynamics of the human body skeleton remains a challenging task, not only in terms of the quality of the generated shapes, but also of their diversity, particularly to synthesise realistic body movements of a specific action (action conditioning). In this paper, we propose Kinetic-GAN, a novel architecture that leverages the benefits of Generative Adversarial Networks and Graph Convolutional Networks to synthesise the kinetics of the human body. The proposed adversarial architecture can condition up to 120 different actions over local and global body movements while improving sample quality and diversity through latent space disentanglement and stochastic variations. Our experiments were carried out on three well-known datasets, where Kinetic-GAN notably surpasses the state-of-the-art methods in terms of distribution quality metrics while being able to synthesise more than one order of magnitude more different actions. Our code and models are publicly available at https://github.com/DegardinBruno/Kinetic-GAN.
【2】 Watermarking Graph Neural Networks based on Backdoor Attacks Link: https://arxiv.org/abs/2110.11024
Authors: Jing Xu, Stjepan Picek Affiliation: Delft University of Technology Abstract: Graph Neural Networks (GNNs) have achieved promising performance in various real-world applications. Building a powerful GNN model is not a trivial task, as it requires a large amount of training data, powerful computing resources, and human expertise in fine-tuning the model. What is more, with the development of adversarial attacks, e.g., model stealing attacks, GNNs raise challenges to model authentication. To avoid copyright infringement on GNNs, it is necessary to verify the ownership of the GNN models. In this paper, we present a watermarking framework for GNNs for both graph and node classification tasks. We 1) design two strategies to generate watermarked data for the graph classification task and one for the node classification task, 2) embed the watermark into the host model through training to obtain the watermarked GNN model, and 3) verify the ownership of the suspicious model in a black-box setting. The experiments show that our framework can verify the ownership of GNN models with a very high probability (around $100\%$) for both tasks. In addition, we experimentally show that our watermarking approach is still effective even when considering suspicious models obtained from different architectures than the owner's.
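As a concrete illustration of the graph-classification branch of such a pipeline, here is a minimal, hypothetical Python sketch: watermarked data is generated by wiring a fixed trigger subgraph into clean graphs and relabeling them to an owner-chosen target class, and black-box verification checks how often a suspect model emits that class on trigger-carrying graphs. The function names, the trigger wiring, and the 0.9 threshold are all our assumptions, not the paper's code.

```python
import random
import networkx as nx

def embed_trigger(graph: nx.Graph, trigger: nx.Graph, target_label: int):
    """Attach a fixed 'trigger' subgraph to a host graph and relabel it.

    Hypothetical watermark-data generator in the spirit of backdoor-based
    GNN watermarking: graphs carrying the trigger are assigned the
    owner-chosen target label."""
    g = nx.disjoint_union(graph, trigger)
    host_nodes = list(range(graph.number_of_nodes()))
    trig_nodes = range(graph.number_of_nodes(), g.number_of_nodes())
    for t in trig_nodes:
        # Wire each trigger node into the host graph with one random edge.
        g.add_edge(t, random.choice(host_nodes))
    return g, target_label

def verify_ownership(model_predict, watermarked_graphs, target_label, thresh=0.9):
    """Black-box ownership test: a stolen model should still map
    trigger-carrying graphs to the target label at a high rate."""
    hits = sum(model_predict(g) == target_label for g in watermarked_graphs)
    return hits / len(watermarked_graphs) >= thresh
```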
【3】 Learning Time-Varying Graphs from Online Data Link: https://arxiv.org/abs/2110.11017
Authors: Alberto Natali, Elvin Isufi, Mario Coutino, Geert Leus Affiliation: Delft University of Technology Abstract: This work proposes an algorithmic framework to learn time-varying graphs from online data. The generality offered by the framework renders it model-independent, i.e., it can be theoretically analyzed in its abstract formulation and then instantiated under a variety of model-dependent graph learning problems. This is possible by phrasing (time-varying) graph learning as a composite optimization problem, where different functions regulate different desiderata, e.g., data fidelity, sparsity or smoothness. Instrumental for the findings is recognizing that the dependence of the majority of (if not all) data-driven graph learning algorithms on the data is exerted through the empirical covariance matrix, representing a sufficient statistic for the estimation problem. Its user-defined recursive update enables the framework to work in non-stationary environments, while iterative algorithms building on novel time-varying optimization tools explicitly take into account the temporal dynamics, speeding up convergence and implicitly including a temporal regularization of the solution. We specialize the framework to three well-known graph learning models, namely, the Gaussian graphical model (GGM), the structural equation model (SEM), and the smoothness-based model (SBM), where we also introduce ad-hoc vectorization schemes for structured matrices (symmetric, hollow, etc.) which are crucial to perform correct gradient computations, besides enabling work in low-dimensional vector spaces and hence easing storage requirements. After discussing the theoretical guarantees of the proposed framework, we corroborate it with extensive numerical tests on synthetic and real data.
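The "user-defined recursive update" of the empirical covariance is the piece that makes the framework online; a minimal sketch, assuming an exponentially-weighted update with a hypothetical forgetting factor gamma, could look like this:

```python
import numpy as np

def update_covariance(C, x, gamma=0.95):
    """One step of a recursive empirical-covariance update.

    An exponentially-weighted sketch (gamma is an assumed forgetting
    factor): in non-stationary streams, old samples are discounted so the
    sufficient statistic tracks the current graph."""
    x = x.reshape(-1, 1)
    return gamma * C + (1.0 - gamma) * (x @ x.T)

# Usage: feed streaming node signals x_t and hand C_t to any
# covariance-based graph-learning step (GGM, SEM, SBM, ...).
C = np.zeros((5, 5))
for _ in range(100):
    C = update_covariance(C, np.random.randn(5))
```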
【4】 High-resolution rainfall-runoff modeling using graph neural network Link: https://arxiv.org/abs/2110.10833
Authors: Zhongrun Xiang, Ibrahim Demir Affiliation: University of Iowa, Iowa City, IA Abstract: Time-series modeling has shown great promise in recent studies using the latest deep learning algorithms such as LSTM (Long Short-Term Memory). These studies primarily focused on watershed-scale rainfall-runoff modeling or streamflow forecasting, but the majority of them only considered a single watershed as a unit. Although this simplification is very effective, it does not take into account spatial information, which could result in significant errors in large watersheds. Several studies investigated the use of GNN (Graph Neural Networks) for data integration by decomposing a large watershed into multiple sub-watersheds, but each sub-watershed is still treated as a whole, and the geoinformation contained within the watershed is not fully utilized. In this paper, we propose the GNRRM (Graph Neural Rainfall-Runoff Model), a novel deep learning model that makes full use of spatial information from high-resolution precipitation data, including flow direction and geographic information. When compared to baseline models, GNRRM has less over-fitting and significantly improves model performance. Our findings support the importance of hydrological data in deep learning-based rainfall-runoff modeling, and we encourage researchers to include more domain knowledge in their models.
【5】 SEA: Graph Shell Attention in Graph Neural Networks Link: https://arxiv.org/abs/2110.10674
Authors: Christian M. M. Frey, Yunpu Ma, Matthias Schubert Affiliation: Institute for Informatics, Oettingenstr., Munich, Germany Abstract: A common issue in Graph Neural Networks (GNNs) is known as over-smoothing. By increasing the number of iterations within the message-passing of GNNs, the nodes' representations of the input graph align with each other and become indiscernible. Recently, it has been shown that increasing a model's complexity by integrating an attention mechanism yields more expressive architectures. This is mainly attributed to steering the nodes' representations only towards nodes that are more informative than others. Transformer models in combination with GNNs result in architectures including Graph Transformer Layers (GTL), where layers are entirely based on the attention operation. However, the calculation of a node's representation is still restricted to the computational working flow of a GNN. In our work, we relax the GNN architecture by means of implementing a routing heuristic. Specifically, the nodes' representations are routed to dedicated experts. Each expert calculates the representations according to their respective GNN workflow. The definitions of distinguishable GNNs result from k-localized views starting from the central node. We call this procedure Graph Shell Attention (SEA), where experts process different subgraphs in a transformer-motivated fashion. Intuitively, by increasing the number of experts, the models gain in expressiveness such that a node's representation is solely based on nodes that are located within the receptive field of an expert. We evaluate our architecture on various benchmark datasets showing competitive results compared to state-of-the-art models.
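A toy sketch of the routing idea, with plain linear layers standing in for the per-expert GNN workflows; the module names and the hard-argmax gate are our simplifications, not the paper's architecture:

```python
import torch
import torch.nn as nn

class ShellRouter(nn.Module):
    """Toy routing head in the spirit of Graph Shell Attention: each node's
    representation is sent to one of `num_experts` 'experts', where expert i
    is assumed to operate on an i-hop (k-localized) view. Linear layers are
    stand-ins for each expert's GNN workflow."""
    def __init__(self, dim, num_experts):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

    def forward(self, h):                      # h: (num_nodes, dim)
        scores = self.gate(h).softmax(dim=-1)  # (num_nodes, num_experts)
        choice = scores.argmax(dim=-1)         # hard routing heuristic
        out = torch.zeros_like(h)
        for i, expert in enumerate(self.experts):
            mask = choice == i                 # nodes routed to expert i
            out[mask] = expert(h[mask])
        return out
```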
【6】 Distributionally Robust Semi-Supervised Learning Over Graphs Link: https://arxiv.org/abs/2110.10582
Authors: Alireza Sadeghi, Meng Ma, Bingcong Li, Georgios B. Giannakis Affiliation: Department of ECE and Digital Technology Center, University of Minnesota, Minneapolis, MN, USA Abstract: Semi-supervised learning (SSL) over graph-structured data emerges in many network science applications. To efficiently manage learning over graphs, variants of graph neural networks (GNNs) have been developed recently. By succinctly encoding local graph structures and features of nodes, state-of-the-art GNNs can scale linearly with the size of the graph. Despite their success in practice, most existing methods are unable to handle graphs with uncertain nodal attributes. Specifically, whenever mismatches between the training and testing data distributions exist, these models fail in practice. Challenges also arise due to distributional uncertainties associated with data acquired by noisy measurements. In this context, a distributionally robust learning framework is developed, where the objective is to train models that exhibit quantifiable robustness against perturbations. The data distribution is considered unknown, but lies within a Wasserstein ball centered around the empirical data distribution. A robust model is obtained by minimizing the worst expected loss over this ball. However, solving the emerging functional optimization problem is challenging, if not impossible. Advocating a strong duality condition, we develop a principled method that renders the problem tractable and efficiently solvable. Experiments assess the performance of the proposed method.
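In symbols, the min-max training objective described here can be written as follows; this is the generic Wasserstein distributionally robust formulation (our notation: $\hat{\mathbb{P}}_n$ the empirical distribution, $W$ the Wasserstein distance, $\epsilon$ the ball radius), not necessarily the paper's exact objective:

```latex
\min_{\theta}\ \sup_{\mathbb{Q}\,:\,W(\mathbb{Q},\,\hat{\mathbb{P}}_n)\,\le\,\epsilon}\
\mathbb{E}_{(x,y)\sim\mathbb{Q}}\big[\ell\big(f_\theta(x),\,y\big)\big]
```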
【7】 Recurrent Brain Graph Mapper for Predicting Time-Dependent Brain Graph Evaluation Trajectory Link: https://arxiv.org/abs/2110.11237
Authors: Alpay Tekin, Ahmed Nebli, Islem Rekik Affiliations: BASIRA Lab, Istanbul Technical University, Istanbul, Turkey; National School of Computer Science (ENSI), University of Manouba, Manouba, Tunisia Abstract: Several brain disorders can be detected by observing alterations in the brain's structural and functional connectivities. Neurological findings suggest that early diagnosis of brain disorders, such as mild cognitive impairment (MCI), can prevent and even reverse its development into Alzheimer's disease (AD). In this context, recent studies aimed to predict the evolution of brain connectivities over time by proposing machine learning models that work on brain images. However, such an approach is costly and time-consuming. Here, we propose to use brain connectivities as a more efficient alternative for time-dependent brain disorder diagnosis by instead regarding the brain as a large interconnected graph characterizing the interconnectivity scheme between several brain regions. We term our proposed method Recurrent Brain Graph Mapper (RBGM), a novel efficient edge-based recurrent graph neural network that predicts the time-dependent evaluation trajectory of a brain graph from a single baseline. Our RBGM contains a set of recurrent neural network-inspired mappers for each time point, where each mapper aims to project the ground-truth brain graph onto its next time point. We leverage the teacher forcing method to boost training and improve the evolved brain graph quality. To maintain the topological consistency between the predicted brain graphs and their corresponding ground-truth brain graphs at each time point, we further integrate a topological loss. We also use an l1 loss to capture time-dependency and minimize the distance between brain graphs at consecutive time points for regularization. Benchmarks against several variants of RBGM and state-of-the-art methods prove that we can achieve the same accuracy in predicting brain graph evolution more efficiently, paving the way for novel graph neural network architectures and a highly efficient training scheme.
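A hedged sketch of the teacher-forced training pass the abstract describes, with `topo_loss` standing in for the paper's topological-consistency term; the structure (one mapper per time point, ground-truth inputs) follows the abstract, everything else is our assumption:

```python
import torch.nn as nn

def trajectory_loss(mappers, graphs, topo_loss):
    """One teacher-forced pass over a brain-graph trajectory.

    Assumes len(mappers) == len(graphs) - 1; mappers[t] projects the graph
    at time t onto time t+1. With teacher forcing, each mapper receives the
    ground-truth graph rather than the previous prediction."""
    l1 = nn.L1Loss()
    loss = 0.0
    for t, mapper in enumerate(mappers):
        pred = mapper(graphs[t])                      # teacher forcing
        loss = loss + l1(pred, graphs[t + 1])         # closeness at t+1
        loss = loss + topo_loss(pred, graphs[t + 1])  # topology consistency
    return loss
```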
【8】 Online non-parametric change-point detection for heterogeneous data streams observed over graph nodes Link: https://arxiv.org/abs/2110.10518
Authors: Alejandro de la Concha, Argyris Kalogeratos, Nicolas Vayatis Affiliation: Centre Borelli, ENS Paris-Saclay, Université Paris-Saclay, Gif-sur-Yvette, France Note: 11 pages Abstract: Consider a heterogeneous data stream being generated by the nodes of a graph. The data stream is in essence composed of multiple streams, possibly of different nature depending on each node. At a given moment $\tau$, a change-point occurs for a subset of nodes $C$, signifying the change in the probability distribution of their associated streams. In this paper we propose an online non-parametric method to infer $\tau$ based on the direct estimation of the likelihood-ratio between the post-change and the pre-change distribution associated with the data stream of each node. We propose a kernel-based method, under the hypothesis that connected nodes of the graph are expected to have similar likelihood-ratio estimates when there is no change-point. We demonstrate the quality of our method on synthetic experiments and real-world applications.
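A minimal illustration of how the graph hypothesis could regularise per-node statistics, using one-hop neighbour averaging as a crude stand-in for the paper's kernel-based likelihood-ratio estimator (this simplification is entirely ours):

```python
import numpy as np

def smoothed_change_scores(A, r):
    """Graph-regularised per-node change scores.

    A: (N, N) adjacency matrix; r: (N,) per-node likelihood-ratio statistics
    between post- and pre-change windows. Averaging each node's statistic
    with its neighbours' reflects the hypothesis that connected nodes share
    similar likelihood-ratios when no change-point has occurred."""
    deg = A.sum(axis=1)
    return (A @ r + r) / (deg + 1.0)   # include the node itself

# Usage sketch: declare a change-point when any smoothed score crosses a
# user-chosen threshold.
```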
【9】 Computational Graph Completion Link: https://arxiv.org/abs/2110.10323
Authors: Houman Owhadi Note: 31 pages Abstract: We introduce a framework for generating, organizing, and reasoning with computational knowledge. It is motivated by the observation that most problems in Computational Sciences and Engineering (CSE) can be described as that of completing (from data) a computational graph representing dependencies between functions and variables. Functions and variables may be known, unknown, or random. Data comes in the form of observations of distinct values of a finite number of subsets of the variables of the graph. The underlying problem combines a regression problem (approximating unknown functions) with a matrix completion problem (recovering unobserved variables in the data). Replacing unknown functions by Gaussian Processes (GPs) and conditioning on observed data provides a simple but efficient approach to completing such graphs. Since the proposed framework is highly expressive, it has a vast potential application scope. Since the completion process can be automatized, as one solves $\sqrt{\sqrt{2}\,\sqrt{3}}$ on a pocket calculator without thinking about it, one could, with the proposed framework, solve a complex CSE problem by drawing a diagram. Compared to traditional kriging, the proposed framework can be used to recover unknown functions with much scarcer data by exploiting interdependencies between multiple functions and variables. The Computational Graph Completion (CGC) problem addressed by the proposed framework could therefore also be interpreted as a generalization of that of solving linear systems of equations to that of approximating unknown variables and functions with noisy, incomplete, and nonlinear dependencies. Numerous examples illustrate the flexibility, scope, efficacy, and robustness of the CGC framework and show how it can be used as a pathway to identifying simple solutions to classical CSE problems (digital twin modeling, dimension reduction, mode decomposition, etc.).
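The basic completion step, replacing an unknown function by a GP and conditioning on observations, reduces to textbook GP regression; a self-contained numpy sketch with an assumed RBF kernel (not the paper's code):

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """Squared-exponential kernel between point sets a: (n, d), b: (m, d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_condition(X, y, Xs, noise=1e-6):
    """Posterior mean of a GP conditioned on observations (X, y),
    evaluated at query points Xs: the elementary building block that
    graph completion repeats for each unknown function."""
    K = rbf(X, X) + noise * np.eye(len(X))
    return rbf(Xs, X) @ np.linalg.solve(K, y)

# Usage: observe f at a few points, predict elsewhere.
X = np.linspace(0, 1, 5).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel()
print(gp_condition(X, y, np.array([[0.5]])))
```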
Transformer (4 papers)
【1】 Transformer Acceleration with Dynamic Sparse Attention Link: https://arxiv.org/abs/2110.11299
Authors: Liu Liu, Zheng Qu, Zhaodong Chen, Yufei Ding, Yuan Xie Affiliations: Department of Computer Science and Department of Electrical and Computer Engineering, University of California Abstract: Transformers are the mainstream of NLP applications and are becoming increasingly popular in other domains such as Computer Vision. Despite the improvements in model quality, the enormous computation costs make Transformers difficult to deploy, especially when the sequence length is large in emerging applications. The attention mechanism, the essential component of the Transformer, is the execution bottleneck due to its quadratic complexity. Prior art explores sparse patterns in attention to support long sequence modeling, but those works rely on static or fixed patterns. We demonstrate that the sparse patterns are dynamic, depending on input sequences. Thus, we propose the Dynamic Sparse Attention (DSA) that can efficiently exploit the dynamic sparsity in the attention of Transformers. Compared with other methods, our approach can achieve better trade-offs between accuracy and model complexity. Moving forward, we identify challenges and provide solutions to implement DSA on existing hardware (GPUs) and specialized hardware in order to achieve practical speedup and efficiency improvements for Transformer execution.
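A toy sketch of input-dependent sparsity: attention weights are thresholded per input, so the sparse pattern changes with the sequence rather than following a fixed mask. The post-softmax threshold and its value are illustrative assumptions; the paper's actual mechanism targets hardware-friendly patterns:

```python
import torch

def dynamic_sparse_attention(q, k, v, threshold=0.01):
    """Attention where weights below `threshold` are dropped per input,
    so the resulting sparse pattern is dynamic rather than static/fixed."""
    scores = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    mask = scores >= threshold                  # pattern depends on the input
    scores = scores.masked_fill(~mask, 0.0)
    scores = scores / scores.sum(dim=-1, keepdim=True).clamp(min=1e-9)
    return scores @ v
```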
【2】 Few-Shot Temporal Action Localization with Query Adaptive Transformer Link: https://arxiv.org/abs/2110.10552
Authors: Sauradip Nag, Xiatian Zhu, Tao Xiang Affiliations: Centre for Vision Speech and Signal Processing (CVSSP), University of Surrey, UK; iFlyTek-Surrey Joint Research Centre on Artificial Intelligence Note: BMVC 2021 Abstract: Existing temporal action localization (TAL) works rely on a large number of training videos with exhaustive segment-level annotation, preventing them from scaling to new classes. As a solution to this problem, few-shot TAL (FS-TAL) aims to adapt a model to a new class represented by as few as a single video. Existing FS-TAL methods assume trimmed training videos for new classes. However, this setting is not only unnatural, as actions are typically captured in untrimmed videos, but also ignores background video segments containing vital contextual cues for foreground action segmentation. In this work, we first propose a new FS-TAL setting by proposing to use untrimmed training videos. Further, a novel FS-TAL model is proposed which maximizes the knowledge transfer from training classes whilst enabling the model to be dynamically adapted to both the new class and each video of that class simultaneously. This is achieved by introducing a query adaptive Transformer in the model. Extensive experiments on two action localization benchmarks demonstrate that our method can outperform all the state-of-the-art alternatives significantly in both single-domain and cross-domain scenarios. The source code can be found at https://github.com/sauradip/fewshotQAT
【3】 JavaBERT: Training a transformer-based model for the Java programming language Link: https://arxiv.org/abs/2110.10404
Authors: Nelson Tavares de Sousa, Wilhelm Hasselbring Affiliation: Software Engineering Group, Kiel University, Kiel, Germany Note: 6 pages, to appear in the Proceedings of the 9th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE'2021) Abstract: Code quality is and will be a crucial factor while developing new software code, requiring appropriate tools to ensure functional and reliable code. Machine learning techniques are still rarely used for software engineering tools, missing out on the potential benefits of their application. Natural language processing has shown the potential to process text data regarding a variety of tasks. We argue that such models can also show similar benefits for software code processing. In this paper, we investigate how models used for natural language processing can be trained upon software code. We introduce a data retrieval pipeline for software code and train a model upon Java software code. The resulting model, JavaBERT, shows a high accuracy on the masked language modeling task, demonstrating its potential for software engineering tools.
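A hedged sketch of masked-language-model training on Java source with the Hugging Face Trainer; the base checkpoint, the one-snippet stand-in corpus, and all hyperparameters below are illustrative assumptions, not the authors' configuration:

```python
from transformers import (BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Off-the-shelf BERT as a stand-in starting point.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased")

# Tiny stand-in corpus; in practice this comes from a Java code
# retrieval pipeline.
java_snippets = ["public static void main(String[] args) { System.exit(0); }"]
enc = tokenizer(java_snippets, truncation=True, max_length=128)
dataset = [{"input_ids": ids} for ids in enc["input_ids"]]

# Randomly masks 15% of tokens so the model learns to reconstruct code.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="javabert", num_train_epochs=1),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```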
【4】 AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation Link: https://arxiv.org/abs/2110.10403
Authors: Xiangyi Yan, Hao Tang, Shanlin Sun, Haoyu Ma, Deying Kong, Xiaohui Xie Affiliation: University of California, Irvine Abstract: Recent advances in transformer-based models have drawn attention to exploring these techniques in medical image segmentation, especially in conjunction with the U-Net model (or its variants), which has shown great success in medical image segmentation under both 2D and 3D settings. Current 2D-based methods either directly replace convolutional layers with pure transformers or consider a transformer as an additional intermediate encoder between the encoder and decoder of U-Net. However, these approaches only consider the attention encoding within one single slice and do not utilize the axial-axis information naturally provided by a 3D volume. In the 3D setting, convolution on volumetric data and transformers both consume large GPU memory. One has to either downsample the image or use cropped local patches to reduce GPU memory usage, which limits performance. In this paper, we propose the Axial Fusion Transformer UNet (AFTer-UNet), which takes advantage of both convolutional layers' capability of extracting detailed features and transformers' strength in long sequence modeling. It considers both intra-slice and inter-slice long-range cues to guide the segmentation. Meanwhile, it has fewer parameters and takes less GPU memory to train than the previous transformer-based models. Extensive experiments on three multi-organ segmentation datasets demonstrate that our method outperforms current state-of-the-art methods.
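A toy axial-fusion step: self-attention applied along the slice axis only, so inter-slice cues are fused at a memory cost linear in the number of slices instead of attending over the full 3D volume. The tensor layout and module choices are our assumptions:

```python
import torch
import torch.nn as nn

class AxialFusion(nn.Module):
    """Attention along the slice (axial) axis only.

    Input x: (batch, slices, patches, dim); each spatial position attends
    across slices, fusing inter-slice cues cheaply. `dim` must be
    divisible by `heads`."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        b, s, p, d = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b * p, s, d)  # attend over slices
        out, _ = self.attn(x, x, x)
        return out.reshape(b, p, s, d).permute(0, 2, 1, 3)
```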
GAN | Adversarial | Attacks | Generation (16 papers)
【1】 Physical Side-Channel Attacks on Embedded Neural Networks: A Survey Link: https://arxiv.org/abs/2110.11290
Authors: Maria Méndez Real, Rubén Salvador Abstract: During the last decade, Deep Neural Networks (DNN) have progressively been integrated on all types of platforms, from data centers to embedded systems including low-power processors and, recently, FPGAs. Neural Networks (NN) are expected to become ubiquitous in IoT systems by transforming all sorts of real-world applications, including applications in the safety-critical and security-sensitive domains. However, the underlying hardware security vulnerabilities of embedded NN implementations remain unaddressed. In particular, embedded DNN implementations are vulnerable to Side-Channel Analysis (SCA) attacks, which are especially important in the IoT and edge computing contexts where an attacker can usually gain physical access to the targeted device. A research field has therefore emerged and is rapidly growing in terms of the use of SCA, including timing, electromagnetic, and power attacks, to target embedded NN implementations. Since 2018, research papers have shown that SCA enables an attacker to recover inference model architectures and parameters, to expose industrial IP, and endangers data confidentiality and privacy. In the absence so far of a complete review of this emerging field in the literature, this paper surveys state-of-the-art physical SCA attacks relative to the implementation of embedded DNNs on micro-controllers and FPGAs in order to provide a thorough analysis of the current landscape. It provides a taxonomy and a detailed classification of current attacks. It first discusses mitigation techniques and then provides insights for future research leads.
【2】 Super-resolution of multiphase materials by combining complementary 2D and 3D image data using generative adversarial networks Link: https://arxiv.org/abs/2110.11281
Authors: Amir Dahari, Steve Kench, Isaac Squires, Samuel J. Cooper Affiliation: Dyson School of Design Engineering, Imperial College London, London SW7 2DB Abstract: Modelling the impact of a material's mesostructure on device level performance typically requires access to 3D image data containing all the relevant information to define the geometry of the simulation domain. This image data must include sufficient contrast between phases to distinguish each material, be of high enough resolution to capture the key details, but also have a large enough field-of-view to be representative of the material in general. It is rarely possible to obtain data with all of these properties from a single imaging technique. In this paper, we present a method for combining information from pairs of distinct but complementary imaging techniques in order to accurately reconstruct the desired multi-phase, high resolution, representative, 3D images. Specifically, we use deep convolutional generative adversarial networks to implement super-resolution, style transfer and dimensionality expansion. To demonstrate the widespread applicability of this tool, two pairs of datasets are used to validate the quality of the volumes generated by fusing the information from paired imaging techniques. Three key mesostructural metrics are calculated in each case to show the accuracy of this method. Having confidence in the accuracy of our method, we then demonstrate its power by applying it to a real data pair from a lithium ion battery electrode, where the required 3D high resolution image data is not available anywhere in the literature. We believe this approach is superior to previously reported statistical material reconstruction methods both in terms of its fidelity and ease of use. Furthermore, much of the data required to train this algorithm already exists in the literature, waiting to be combined. As such, our open-access code could precipitate a step change by generating the hard-to-obtain high quality image volumes necessary to simulate behaviour at the mesoscale.
【3】 Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness Link: https://arxiv.org/abs/2110.10942
Authors: Simon Geisler, Johanna Sommer, Jan Schuchardt, Aleksandar Bojchevski, Stephan Günnemann Affiliation: Technical University of Munich Abstract: End-to-end (geometric) deep learning has seen first successes in approximating the solution of combinatorial optimization problems. However, generating data in the realm of NP-hard/-complete tasks brings practical and theoretical challenges, resulting in evaluation protocols that are too optimistic. Specifically, most datasets only capture a simpler subproblem and likely suffer from spurious features. We investigate these effects by studying adversarial robustness - a local generalization property - to reveal hard, model-specific instances and spurious features. For this purpose, we derive perturbation models for SAT and TSP. Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound, allowing us to determine the true label of perturbed samples without a solver. Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning. Although such robust solvers exist, we show empirically that the assessed neural solvers do not generalize well w.r.t. small perturbations of the problem instance.
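For SAT, one way to build a perturbation that is sound without calling a solver is to append clauses already satisfied by a known satisfying assignment, so the true label (SAT) is preserved by construction. The DIMACS-style sketch below is our simplified illustration, not the paper's exact perturbation model:

```python
import random

def perturb_sat(clauses, assignment, k=1):
    """Append k clauses that a known satisfying assignment already satisfies.

    clauses: list of clauses, each a list of signed ints (DIMACS-style
    literals); assignment: dict mapping variable -> bool. Because every new
    clause contains a literal made true by `assignment`, satisfiability is
    preserved and the true label is known without a solver."""
    variables = list(assignment)
    perturbed = list(clauses)
    for _ in range(k):
        picked = random.sample(variables, 3)
        v = picked[0]
        # First literal is forced true under `assignment`; the rest are random.
        clause = [v if assignment[v] else -v] + \
                 [u if random.random() < 0.5 else -u for u in picked[1:]]
        perturbed.append(clause)
    return perturbed
```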
【4】 On some theoretical limitations of Generative Adversarial Networks Link: https://arxiv.org/abs/2110.10915
Authors: Benoît Oriol, Alexandre Miot Note: 7 pages Abstract: Generative Adversarial Networks have become a core technique in Machine Learning to generate unknown distributions from data samples. They have been used in a wide range of contexts without paying much attention to the possible theoretical limitations of those models. Indeed, because of the universal approximation properties of Neural Networks, it is a general assumption that GANs can generate any probability distribution. Recently, people began to question this assumption and this article is in line with this thinking. We provide a new result based on Extreme Value Theory showing that GANs can't generate heavy-tailed distributions. The full proof of this result is given.
【5】 Controllable and Compositional Generation with Latent-Space Energy-Based Models Link: https://arxiv.org/abs/2110.10873
Authors: Weili Nie, Arash Vahdat, Anima Anandkumar Affiliations: Caltech; NVIDIA Note: 32 pages, NeurIPS 2021 Abstract: Controllable generation is one of the key requirements for successful adoption of deep generative models in real-world applications, but it still remains a great challenge. In particular, the compositional ability to generate novel concept combinations is out of reach for most current models. In this work, we use energy-based models (EBMs) to handle compositional generation over a set of attributes. To make them scalable to high-resolution image generation, we introduce an EBM in the latent space of a pre-trained generative model such as StyleGAN. We propose a novel EBM formulation representing the joint distribution of data and attributes together, and we show how sampling from it is formulated as solving an ordinary differential equation (ODE). Given a pre-trained generator, all we need for controllable generation is to train an attribute classifier. Sampling with ODEs is done efficiently in the latent space and is robust to hyperparameters. Thus, our method is simple, fast to train, and efficient to sample. Experimental results show that our method outperforms the state-of-the-art in both conditional sampling and sequential editing. In compositional generation, our method excels at zero-shot generation of unseen attribute combinations. Also, by composing energy functions with logical operators, this work is the first to achieve such compositionality in generating photo-realistic images of resolution 1024x1024.
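A gradient-flow sketch of sampling in a generator's latent space, with plain Euler integration standing in for the paper's ODE solver; `energy` is assumed to combine the latent prior with attribute-classifier terms, and the step size is an illustrative choice:

```python
import torch

def ode_sample(z0, energy, steps=100, dt=1e-2):
    """Move latent codes along the negative energy gradient.

    z0: initial latent batch; energy: callable mapping latents to scalar
    energies (lower = more consistent with the desired attributes). The
    final z is then fed to the pre-trained generator."""
    z = z0.clone().requires_grad_(True)
    for _ in range(steps):
        e = energy(z).sum()
        g, = torch.autograd.grad(e, z)
        z = (z - dt * g).detach().requires_grad_(True)  # Euler step
    return z.detach()
```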
【6】 Style Agnostic 3D Reconstruction via Adversarial Style Transfer Link: https://arxiv.org/abs/2110.10784
Authors: Felix Petersen, Bastian Goldluecke, Oliver Deussen, Hilde Kuehne Affiliations: University of Konstanz; University of Frankfurt; IBM-MIT Watson AI Lab Note: To be published at WACV 2022, Code @ this https URL Abstract: Reconstructing the 3D geometry of an object from an image is a major challenge in computer vision. Recently introduced differentiable renderers can be leveraged to learn the 3D geometry of objects from 2D images, but those approaches require additional supervision to enable the renderer to produce an output that can be compared to the input image. This can be scene information or constraints such as object silhouettes, uniform backgrounds, material, texture, and lighting. In this paper, we propose an approach that enables a differentiable rendering-based learning of 3D objects from images with backgrounds without the need for silhouette supervision. Instead of trying to render an image close to the input, we propose an adversarial style-transfer and domain adaptation pipeline that allows translating the input image domain to the rendered image domain. This allows us to directly compare a translated image with the differentiable rendering of a 3D object reconstruction in order to train the 3D object reconstruction network. We show that the approach learns 3D geometry from images with backgrounds and provides a better performance than constrained methods for single-view 3D object reconstruction on this task.
【7】 Part-X: A Family of Stochastic Algorithms for Search-Based Test Generation with Probabilistic Guarantees Link: https://arxiv.org/abs/2110.10729
Authors: Giulia Pedrielli, Tanmay Kandhait, Surdeep Chotaliya, Quinn Thibeault, Hao Huang, Mauricio Castillo-Effen, Georgios Fainekos Affiliations: School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ; College of Engineering, Yuan Ze University, Taoyuan City, Taiwan; Advanced Technology Laboratories, Lockheed Martin, Arlington, VA Note: 25 pages, 7 figures Abstract: Requirements-driven search-based testing (also known as falsification) has proven to be a practical and effective method for discovering erroneous behaviors in Cyber-Physical Systems. Despite the constant improvements on the performance and applicability of falsification methods, they all share a common characteristic. Namely, they are best-effort methods which do not provide any guarantees on the absence of erroneous behaviors (falsifiers) when the testing budget is exhausted. The absence of finite-time guarantees is a major limitation which prevents falsification methods from being utilized in certification procedures. In this paper, we address the finite-time guarantees problem by developing a new stochastic algorithm. Our proposed algorithm not only estimates (bounds) the probability that falsifying behaviors exist, but also identifies the regions where these falsifying behaviors may occur. We demonstrate the applicability of our approach on standard benchmark functions from the optimization literature and on the F16 benchmark problem.
【8】 Adversarial Socialbot Learning via Multi-Agent Deep Hierarchical Reinforcement Learning Link: https://arxiv.org/abs/2110.10655
Authors: Thai Le, Long Tran-Thanh, Dongwon Lee Affiliations: The Pennsylvania State University; University of Warwick Abstract: Socialbots are software-driven user accounts on social platforms, acting autonomously (mimicking human behavior), with the aim to influence the opinions of other users or spread targeted misinformation for particular goals. As socialbots undermine the ecosystem of social platforms, they are often considered harmful. As such, there have been several computational efforts to auto-detect socialbots. However, to our best knowledge, the adversarial nature of these socialbots has not yet been studied. This begs the question "can adversaries, controlling socialbots, exploit AI techniques to their advantage?" To this question, we successfully demonstrate that indeed it is possible for adversaries to exploit computational learning mechanisms such as reinforcement learning (RL) to maximize the influence of socialbots while avoiding being detected. We first formulate the adversarial socialbot learning as a cooperative game between two functional hierarchical RL agents. While one agent curates a sequence of activities that can avoid detection, the other agent aims to maximize network influence by selectively connecting with the right users. Our proposed policy networks train with a vast amount of synthetic graphs and generalize better than baselines on unseen real-life graphs both in terms of maximizing network influence (up to +18%) and sustainable stealthiness (up to +40% undetectability) under a strong bot detector (with 90% detection accuracy). During inference, the complexity of our approach scales linearly, independent of a network's structure and the virality of news. This makes our approach a practical adversarial attack when deployed in a real-life setting.
【9】 Detecting and Identifying Optical Signal Attacks on Autonomous Driving Systems Link: https://arxiv.org/abs/2110.10523
Authors: Jindi Zhang, Yifan Zhang, Kejie Lu, Jianping Wang, Kui Wu, Xiaohua Jia, Bin Liu Affiliations: University of Puerto Rico at Mayagüez; Department of Computer Science, University of Victoria Abstract: For autonomous driving, an essential task is to detect surrounding objects accurately. To this end, most existing systems use optical devices, including cameras and light detection and ranging (LiDAR) sensors, to collect environment data in real time. In recent years, many researchers have developed advanced machine learning models to detect surrounding objects. Nevertheless, the aforementioned optical devices are vulnerable to optical signal attacks, which could compromise the accuracy of object detection. To address this critical issue, we propose a framework to detect and identify sensors that are under attack. Specifically, we first develop a new technique to detect attacks on a system that consists of three sensors. Our main idea is to: 1) use data from three sensors to obtain two versions of depth maps (i.e., disparity) and 2) detect attacks by analyzing the distribution of disparity errors. In our study, we use real data sets and the state-of-the-art machine learning model to evaluate our attack detection scheme, and the results confirm the effectiveness of our detection method. Based on the detection scheme, we further develop an identification model that is capable of identifying up to n-2 attacked sensors in a system with one LiDAR and n cameras. We prove the correctness of our identification scheme and conduct experiments to show the accuracy of our identification method. Finally, we investigate the overall sensitivity of our framework.
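A minimal stand-in for the detection statistic: compare the two disparity maps obtained from different sensor pairs and flag an attack when they disagree too often. The fraction-above-tolerance statistic and the tolerance value are our assumptions, simplifying the paper's analysis of the disparity-error distribution:

```python
import numpy as np

def disparity_attack_score(disp_ab, disp_ac, tol=1.0):
    """Consistency check between two disparity maps of the same scene
    computed from two different sensor pairs. Under an optical attack on
    one sensor, the error distribution shifts; a high score means the
    sensors are inconsistent."""
    err = np.abs(disp_ab - disp_ac)
    return float((err > tol).mean())
```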
【10】 Repaint: Improving the Generalization of Down-Stream Visual Tasks by Generating Multiple Instances of Training Examples Link: https://arxiv.org/abs/2110.10366
Authors: Amin Banitalebi-Dehkordi, Yong Zhang Affiliation: Huawei Technologies Canada Co., Ltd., Vancouver, Canada Note: BMVC 2021 Abstract: Convolutional Neural Networks (CNNs) for visual tasks are believed to learn both the low-level textures and high-level object attributes throughout the network depth. This paper further investigates the `texture bias' in CNNs. To this end, we regenerate multiple instances of training examples from each original image, through a process we call `repainting'. The repainted examples preserve the shape and structure of the regions and objects within the scenes, but diversify their texture and color. Our method can regenerate the same image under different daylight, season, or weather conditions, can have colorization or de-colorization effects, or even bring back some texture information from blacked-out areas. The in-place repaint allows us to further use these repainted examples for improving the generalization of CNNs. Through an extensive set of experiments, we demonstrate the usefulness of the repainted examples in training, for the tasks of image classification (ImageNet) and object detection (COCO), over several state-of-the-art network architectures at different capacities, and across different data availability regimes.
【11】 Detecting Backdoor Attacks Against Point Cloud Classifiers Link: https://arxiv.org/abs/2110.10354
Authors: Zhen Xiang, David J. Miller, Siheng Chen, Xi Li, George Kesidis Affiliations: Pennsylvania State University; Shanghai Jiao Tong University Abstract: Backdoor attacks (BA) are an emerging threat to deep neural network classifiers. A classifier being attacked will predict the attacker's target class when a test sample from a source class is embedded with the backdoor pattern (BP). Recently, the first BA against point cloud (PC) classifiers was proposed, creating new threats to many important applications including autonomous driving. Such PC BAs are not detectable by existing BA defenses due to their special BP embedding mechanism. In this paper, we propose a reverse-engineering defense that infers whether a PC classifier is backdoor attacked, without access to its training set or to any clean classifiers for reference. The effectiveness of our defense is demonstrated on the benchmark ModelNet40 dataset for PCs.
【12】 Multi-concept adversarial attacks Link: https://arxiv.org/abs/2110.10287
Authors: Vibha Belavadi, Yan Zhou, Murat Kantarcioglu, Bhavani M. Thuraisingham Affiliation: The University of Texas at Dallas Note: 20 pages, 28 figures, 9 tables Abstract: As machine learning (ML) techniques are being increasingly used in many applications, their vulnerability to adversarial attacks becomes well-known. Test time attacks, usually launched by adding adversarial noise to test instances, have been shown effective against the deployed ML models. In practice, one test input may be leveraged by different ML models. Test time attacks targeting a single ML model often neglect their impact on other ML models. In this work, we empirically demonstrate that naively attacking the classifier learning one concept may negatively impact classifiers trained to learn other concepts. For example, for the online image classification scenario, when the Gender classifier is under attack, the (wearing) Glasses classifier is simultaneously attacked, with its accuracy dropping from 98.69 to 88.42. This raises an interesting question: is it possible to attack one set of classifiers without impacting the other set that uses the same test instance? Answers to the above research question have interesting implications for protecting privacy against ML model misuse. Attacking ML models that pose unnecessary risks of privacy invasion can be an important tool for protecting individuals from harmful privacy exploitation. In this paper, we address the above research question by developing novel attack techniques that can simultaneously attack one set of ML models while preserving the accuracy of the other. In the case of linear classifiers, we provide a theoretical framework for finding an optimal solution to generate such adversarial examples. Using this theoretical framework, we develop a multi-concept attack strategy in the context of deep learning. Our results demonstrate that our techniques can successfully attack the target classes while protecting the protected classes in many different settings, which is not possible with existing test-time attack strategies that target a single concept.
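For the linear case the abstract mentions, one way to write the multi-concept perturbation is as a norm-minimization problem that flips the targeted classifier while pinning the protected one; this is our notation (target weights $w_a, b_a$; protected weights $w_p, b_p$), not necessarily the paper's exact formulation:

```latex
\min_{\delta}\ \|\delta\|_2
\quad \text{s.t.}\quad
\operatorname{sign}\big(w_a^{\top}(x+\delta)+b_a\big) \neq \operatorname{sign}\big(w_a^{\top}x+b_a\big),
\qquad
\operatorname{sign}\big(w_p^{\top}(x+\delta)+b_p\big) = \operatorname{sign}\big(w_p^{\top}x+b_p\big)
```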
【13】 Robust Semi-Supervised Classification using GANs with Self-Organizing Maps Link: https://arxiv.org/abs/2110.10286
Authors: Ronald Fick, Paul Gader, Alina Zare Affiliations: Computer and Information Science and Engineering; Electrical and Computer Engineering, University of Florida, Gainesville, USA Note: 9 pages, 13 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. Abstract: Generative adversarial networks (GANs) have shown tremendous promise in learning to generate data and are effective at aiding semi-supervised classification. However, to this point, semi-supervised GAN methods make the assumption that the unlabeled data set contains only samples of the joint distribution of the classes of interest, referred to as inliers. Consequently, when presented with a sample from other distributions, referred to as outliers, GANs perform poorly at determining that they are not qualified to make a decision on the sample. The problem of discriminating outliers from inliers while maintaining classification accuracy is referred to here as the DOIC problem. In this work, we describe an architecture that combines self-organizing maps (SOMs) with SS-GANs with the goal of mitigating the DOIC problem, and experimental results indicating that the architecture achieves the goal. Multiple experiments were conducted on hyperspectral image data sets. The SS-GANs performed slightly better than supervised GANs on classification problems with and without the SOM. Incorporating the SOMs into the SS-GANs and the supervised GANs led to substantial mitigation of the DOIC problem when compared to SS-GANs and GANs without the SOMs. Furthermore, the SS-GANs performed much better than GANs on the DOIC problem, even without the SOMs.
【14】 Early- and in-season crop type mapping without current-year ground truth: generating labels from historical information via a topology-based approach Link: https://arxiv.org/abs/2110.10275
Authors: Chenxi Lin, Liheng Zhong, Xiao-Peng Song, Jinwei Dong, David B. Lobell, Zhenong Jin Affiliations: Department of Bioproducts and Biosystems Engineering, University of Minnesota Twin Cities, St Paul, MN; Ant Group, Beijing, China; Department of Geosciences, Texas Tech University, Lubbock, TX, United States Abstract: Land cover classification in remote sensing is often faced with the challenge of limited ground truth. Incorporating historical information has the potential to significantly lower the considerable cost associated with collecting ground truth and, more importantly, enable early- and in-season mapping that is helpful to many pre-harvest decisions. In this study, we propose a new approach that can effectively transfer knowledge about the topology (i.e., relative position) of different crop types in the spectral feature space (e.g., the histogram of SWIR1 vs RDEG1 bands) to generate labels, thereby supporting crop classification in a different year. Importantly, our approach does not attempt to transfer classification decision boundaries that are susceptible to inter-annual variations of weather and management, but relies on the more robust and shift-invariant topology information. We tested this approach for mapping corn/soybeans in the US Midwest and paddy rice/corn/soybeans in Northeast China using Landsat-8 and Sentinel-2 data. Results show that our approach automatically generates high-quality labels for crops in the target year immediately after each image becomes available. Based on these generated labels, the subsequent crop type mapping using a random forest classifier reaches F1 scores as high as 0.887 for corn as early as the silking stage and 0.851 for soybean as early as the flowering stage, and an overall accuracy of 0.873 in Iowa. In Northeast China, the F1 scores of paddy rice, corn and soybeans and the overall accuracy can exceed 0.85 two and a half months ahead of harvest. Overall, these results highlight unique advantages of our approach in transferring historical knowledge and maximizing the timeliness of crop maps. Our approach supports a general paradigm shift towards learning transferrable and generalizable knowledge to facilitate land cover classification.
【15】 Adversarial attacks against Bayesian forecasting dynamic models Link: https://arxiv.org/abs/2110.10783
Authors: Roi Naveiro Affiliation: Institute of Mathematical Sciences (ICMAT-CSIC), Madrid, Spain Abstract: The last decade has seen the rise of Adversarial Machine Learning (AML). This discipline studies how to manipulate data to fool inference engines, and how to protect those systems against such manipulation attacks. Extensive work on attacks against regression and classification systems is available, while little attention has been paid to attacks against time series forecasting systems. In this paper, we propose a decision-analysis-based attacking strategy that could be utilized against Bayesian forecasting dynamic models.
【16】 Semi-supervised physics guided DL framework for predicting the I-V characteristics of GaN HEMT 标题:预测GaN HEMT I-V特性的半监督物理引导DL框架 链接:https://arxiv.org/abs/2110.10724
作者:Shivanshu Mishra,Bipin Gaikwad,Nidhi Chaturvedi 机构:Central Electronics Engineering Research Institute, Pilani, India, Academy of Scientific and Innovative Research, Ghaziabad, India 摘要:本文提出了一个新的深度学习框架(DLF),它解决了采用深度学习技术解决基于物理的问题的两大障碍:1)训练DL模型需要大量数据集,2)DL模型与现象物理的一致性。该框架本质上是通用的,只要其行为已知,就可以应用于其他研究领域的现象建模。为了演示该技术,开发了一种半监督物理引导神经网络(SPGNN),用于预测基于氮化镓的高电子迁移率晶体管(GaN HEMT)的I-V特性。提出了一种两阶段训练方法:在第一阶段,以场效应晶体管的I-V方程作为损失函数,通过无监督学习方法训练DL模型,从而将物理行为纳入DL模型;在第二阶段,用一组非常小的实验数据对DL模型进行微调。与传统神经网络(TNN)相比,SPGNN将训练数据需求显著降低80%以上,即使在未见过的条件下也能获得类似或更好的性能。SPGNN对32.4%的未见测试数据的预测误差小于1%,只有0.4%的未见测试数据的误差大于10%。 摘要:This letter proposes a novel deep learning framework (DLF) that addresses two major hurdles in the adoption of deep learning techniques for solving physics-based problems: 1) requirement of the large dataset for training the DL model, 2) consistency of the DL model with the physics of the phenomenon. The framework is generic in nature and can be applied to model a phenomenon from other fields of research too as long as its behaviour is known. To demonstrate the technique, a semi-supervised physics guided neural network (SPGNN) has been developed that predicts I-V characteristics of a gallium nitride-based high electron mobility transistor (GaN HEMT). A two-stage training method is proposed, where in the first stage, the DL model is trained via the unsupervised learning method using the I-V equations of a field-effect transistor as a loss function of the model that incorporates physical behaviors in the DL model and in the second stage, the DL model is fine-tuned with a very small set of experimental data. The SPGNN significantly reduces the requirement of the training data by more than 80% for achieving similar or better performance than a traditional neural network (TNN) even for unseen conditions. The SPGNN predicts 32.4% of the unseen test data with less than 1% of error and only 0.4% of the unseen test data with more than 10% of error.
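To make the two-stage recipe above concrete, here is a minimal PyTorch sketch (not the authors' code): stage 1 pre-trains a small network against a toy square-law FET equation used as an unsupervised physics loss, and stage 2 fine-tunes on a handful of measured points. The constants K, VTH, LAM, the bias ranges and the "measurements" are illustrative placeholders, not the paper's GaN HEMT model.
```python
import torch
import torch.nn as nn

# Hypothetical constants for a simplified square-law FET model (illustration only).
K, VTH, LAM = 0.5, 1.0, 0.05

def physics_residual(model, v):
    """Residual of a toy saturation-region I-V law: Id = K*(Vgs-Vth)^2*(1+LAM*Vds)."""
    vgs, vds = v[:, 0], v[:, 1]
    id_pred = model(v).squeeze(-1)
    id_phys = K * torch.clamp(vgs - VTH, min=0.0) ** 2 * (1 + LAM * vds)
    return ((id_pred - id_phys) ** 2).mean()

model = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stage 1: unsupervised pre-training against the physics equation on random bias points.
for _ in range(2000):
    v = torch.rand(256, 2) * torch.tensor([3.0, 5.0])  # Vgs in [0, 3], Vds in [0, 5]
    loss = physics_residual(model, v)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: fine-tune on a very small set of "measured" points (placeholders here).
v_meas = torch.tensor([[1.5, 2.0], [2.0, 3.0], [2.5, 4.0]])
i_meas = torch.tensor([0.14, 0.56, 1.26])  # hypothetical measurements
opt_ft = torch.optim.Adam(model.parameters(), lr=1e-4)
for _ in range(500):
    loss = ((model(v_meas).squeeze(-1) - i_meas) ** 2).mean()
    opt_ft.zero_grad(); loss.backward(); opt_ft.step()
```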
半/弱/无/有监督|不确定性|主动学习(7篇)
【1】 Self-Supervised Visual Representation Learning Using Lightweight Architectures 标题:基于轻量级体系结构的自监督视觉表征学习 链接:https://arxiv.org/abs/2110.11160
作者:Prathamesh Sonawane,Sparsh Drolia,Saqib Shamsi,Bhargav Jain 机构: Pune Institute of Computer Technology, Maharashtra, India , Whirlpool Corporation 备注:8 pages, 4 figures, 1 table, submitted to Artificial Intelligence and Statistics 2022 (AISTATS 2022) 摘要:在自监督学习中,使用标注由机器自动生成的数据集来训练模型求解前置任务(pretext task)。目标是将训练好的权重迁移到目标域中执行下游任务。我们仔细考察了从图像数据中提取特征的最具代表性的前置任务,并进一步在资源受限的网络上进行实验,这有助于更快的实验和部署。我们在保持所有其他参数一致的情况下研究了各种自监督技术的性能。我们研究了通过改变模型类型、大小和骨干网络预训练量而产生的模式,并建立了一个可供未来研究比较的标准。我们还进行了全面的研究,以了解不同体系结构学习到的表示质量。 摘要:In self-supervised learning, a model is trained to solve a pretext task, using a data set whose annotations are created by a machine. The objective is to transfer the trained weights to perform a downstream task in the target domain. We critically examine the most notable pretext tasks to extract features from image data and further go on to conduct experiments on resource constrained networks, which aid faster experimentation and deployment. We study the performance of various self-supervised techniques keeping all other parameters uniform. We study the patterns that emerge by varying model type, size and amount of pre-training done for the backbone as well as establish a standard to compare against for future research. We also conduct comprehensive studies to understand the quality of representations learned by different architectures.
【2】 Single-Modal Entropy based Active Learning for Visual Question Answering 标题:基于单模态熵的视觉问答主动学习 链接:https://arxiv.org/abs/2110.10906
作者:Dong-Jin Kim,Jae Won Cho,Jinsoo Choi,Yunjae Jung,In So Kweon 机构:Korea Advanced Institute of Science, and Technology (KAIST), Daejeon, South Korea, ( indicates equal contribution) 备注:Accepted to BMVC 2021 摘要:在现实世界中构建大规模标记数据集,特别是对于高级任务(例如视觉问答),成本高昂且耗时。此外,随着数据量的不断增加和体系结构的复杂化,主动学习已经成为计算机视觉研究的一个重要方面。在这项工作中,我们解决了视觉问答(VQA)多模态设置下的主动学习问题。针对图像和问题这两种多模态输入,我们提出了一种新的有效样本获取方法,即为每种输入使用专门的单模态分支来利用其信息。我们基于互信息的样本采集策略——单模态熵测度(SMEM)——以及自蒸馏技术,使样本采集器能够利用所有现有模态并找到信息量最大的样本。我们的新想法易于实现、成本高效,并且易于适配其他多模态任务。通过与现有主动学习基线的比较,我们以最先进的性能在各种VQA数据集上验证了我们的发现。 摘要:Constructing a large-scale labeled dataset in the real world, especially for high-level tasks (e.g., Visual Question Answering), can be expensive and time-consuming. In addition, with the ever-growing amounts of data and architecture complexity, Active Learning has become an important aspect of computer vision research. In this work, we address Active Learning in the multi-modal setting of Visual Question Answering (VQA). In light of the multi-modal inputs, image and question, we propose a novel method for effective sample acquisition through the use of ad hoc single-modal branches for each input to leverage its information. Our mutual information based sample acquisition strategy Single-Modal Entropic Measure (SMEM) in addition to our self-distillation technique enables the sample acquisitor to exploit all present modalities and find the most informative samples. Our novel idea is simple to implement, cost-efficient, and readily adaptable to other multi-modal tasks. We confirm our findings on various VQA datasets through state-of-the-art performance by comparing to existing Active Learning baselines.
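A minimal sketch of the entropy-driven acquisition idea, assuming softmax outputs from hypothetical image-only and question-only branches; the real SMEM criterion is mutual-information based and also uses self-distillation, which this simplified entropy-sum omits.
```python
import torch

def entropy(p, eps=1e-12):
    """Shannon entropy of each row of an (N, C) probability matrix."""
    return -(p * (p + eps).log()).sum(dim=1)

def select_samples(p_image, p_question, k):
    """Score unlabeled samples by the sum of single-modal entropies and pick the top-k.
    p_image / p_question: (N, C) softmax outputs of the image-only / question-only branches."""
    scores = entropy(p_image) + entropy(p_question)
    return scores.topk(k).indices

# Toy usage with random predictions for 1000 unlabeled samples and 10 answer classes.
p_img = torch.softmax(torch.randn(1000, 10), dim=1)
p_qst = torch.softmax(torch.randn(1000, 10), dim=1)
chosen = select_samples(p_img, p_qst, k=32)  # indices to send for annotation
```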
【3】 Dynamic Bottleneck for Robust Self-Supervised Exploration 标题:鲁棒自监督探测的动态瓶颈 链接:https://arxiv.org/abs/2110.10735
作者:Chenjia Bai,Lingxiao Wang,Lei Han,Animesh Garg,Jianye Hao,Peng Liu,Zhaoran Wang 机构:Harbin Institute of Technology, China, Northwestern University, USA, Tencent Robotics X, University of Toronto, Vector Institute, Tianjin University, China 备注:NeurIPS 2021 摘要:基于状态转移伪计数或动力学好奇心的探索方法在解决奖励稀疏的强化学习问题上取得了很好的效果。然而,此类方法通常对与环境动力学无关的信息(例如白噪声)敏感。为了处理这些与动力学无关的信息,我们提出了一个动态瓶颈(DB)模型,该模型基于信息瓶颈原理获得与动力学相关的表示。在DB模型的基础上,我们进一步提出了DB-bonus,它鼓励agent探索具有高信息增益的状态-动作对。我们在所提出的DB-bonus、线性情形下的置信上界(UCB)以及表格情形下的访问计数之间建立了理论联系。我们在带有动力学无关噪声的Atari套件上评估了所提出的方法。我们的实验表明,在噪声环境中,使用DB-bonus的探索方法优于几种最先进的探索方法。 摘要:Exploration methods based on pseudo-count of transitions or curiosity of dynamics have achieved promising results in solving reinforcement learning with sparse rewards. However, such methods are usually sensitive to environmental dynamics-irrelevant information, e.g., white-noise. To handle such dynamics-irrelevant information, we propose a Dynamic Bottleneck (DB) model, which attains a dynamics-relevant representation based on the information-bottleneck principle. Based on the DB model, we further propose DB-bonus, which encourages the agent to explore state-action pairs with high information gain. We establish theoretical connections between the proposed DB-bonus, the upper confidence bound (UCB) for linear case, and the visiting count for tabular case. We evaluate the proposed method on the Atari suite with dynamics-irrelevant noise. Our experiments show that exploration with DB-bonus outperforms several state-of-the-art exploration methods in noisy environments.
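One common way to realize such an information-gain bonus, sketched below under the assumption of a Gaussian latent dynamics encoder (a hypothetical interface, not the paper's exact DB objective): the per-transition KL divergence to a standard normal prior is scaled and added to the extrinsic reward.
```python
import torch

def db_style_bonus(mu, log_sigma):
    """Information-gain style bonus: KL( N(mu, sigma^2) || N(0, I) ) per sample.
    mu, log_sigma: (N, d) outputs of a latent dynamics encoder (assumed interface)."""
    sigma2 = (2 * log_sigma).exp()
    return 0.5 * (sigma2 + mu ** 2 - 1.0 - 2 * log_sigma).sum(dim=1)

# Shaped reward for a batch of transitions.
mu, log_sigma = torch.randn(64, 16), -torch.rand(64, 16)
r_ext = torch.randn(64)
beta = 0.1  # bonus scale (assumed hyperparameter)
r_total = r_ext + beta * db_style_bonus(mu, log_sigma)
```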
【4】 Look at What I'm Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos 标题:看看我在做什么:教学视频中叙述的自我监督空间基础 链接:https://arxiv.org/abs/2110.10596
作者:Reuben Tan,Bryan A. Plummer,Kate Saenko,Hailin Jin,Bryan Russell 机构:Boston University,MIT-IBM Watson AI Lab, IBM Research, Adobe Research 备注:Accepted at NeurIPS 2021 摘要:我们介绍了在视频中对叙述所描述的交互进行空间定位的任务。我们方法的关键是能够在大量带有转录旁白的视频语料上,通过自监督学习对交互进行空间定位。为了实现这一目标,我们提出了一种多层跨模态注意力网络,它能够在训练过程中有效地优化对比损失。我们引入了一种划分策略,在视觉与自然语言两种模态上交替计算模态间和模态内注意力,通过直接对比两种模态的表示实现有效训练。我们通过在HowTo100M教学视频数据集上自训练,并在YouCook2数据集中新收集的局部化描述交互数据集上进行评估,证明了我们方法的有效性。我们表明,我们的方法优于其他基线,包括浅层共同注意力和完全跨模态注意力。我们还将我们的方法应用于在弱监督下对Flickr30K图像中的短语进行定位,并表明堆叠多个注意力层是有效的,并且在与词到区域损失相结合时,在 recall-at-one 和 pointing 准确率方面达到了最先进的水平。 摘要:We introduce the task of spatially localizing narrated interactions in videos. Key to our approach is the ability to learn to spatially localize interactions with self-supervision on a large corpus of videos with accompanying transcribed narrations. To achieve this goal, we propose a multilayer cross-modal attention network that enables effective optimization of a contrastive loss during training. We introduce a divided strategy that alternates between computing inter- and intra-modal attention across the visual and natural language modalities, which allows effective training via directly contrasting the two modalities' representations. We demonstrate the effectiveness of our approach by self-training on the HowTo100M instructional video dataset and evaluating on a newly collected dataset of localized described interactions in the YouCook2 dataset. We show that our approach outperforms alternative baselines, including shallow co-attention and full cross-modal attention. We also apply our approach to grounding phrases in images with weak supervision on Flickr30K and show that stacking multiple attention layers is effective and, when combined with a word-to-region loss, achieves state of the art on recall-at-one and pointing hand accuracies.
【5】 Robust Monocular Localization in Sparse HD Maps Leveraging Multi-Task Uncertainty Estimation 标题:基于多任务不确定性估计的稀疏高清地图鲁棒单目定位 链接:https://arxiv.org/abs/2110.10563
作者:Kürsat Petek,Kshitij Sirohi,Daniel Büscher,Wolfram Burgard 机构: University of Freiburg 摘要:使用低成本传感器设置和稀疏高清地图在密集城市场景中进行稳健定位与当前自主驾驶的进展高度相关,但仍然是一个具有挑战性的研究课题。我们提出了一种新的基于滑动窗口姿态图的单目定位方法,该方法利用预测的不确定性来提高精度和鲁棒性,以应对具有挑战性的场景和每帧故障。为此,我们提出了一个有效的多任务不确定性感知模块,该模块包括语义分割和边界盒检测,以实现稀疏地图中车辆的定位,该地图仅包含车道边界和交通灯。此外,我们还设计了直接由估计的不确定性生成的可微成本图。这为以无关联和不确定性感知的方式最大限度地减少无定形映射元素的重投影损失提供了可能性。对Lyft 5数据集的广泛评估表明,尽管地图稀疏,但我们的方法能够在具有挑战性的城市场景中实现稳健而准确的6D定位。 摘要:Robust localization in dense urban scenarios using a low-cost sensor setup and sparse HD maps is highly relevant for the current advances in autonomous driving, but remains a challenging topic in research. We present a novel monocular localization approach based on a sliding-window pose graph that leverages predicted uncertainties for increased precision and robustness against challenging scenarios and per frame failures. To this end, we propose an efficient multi-task uncertainty-aware perception module, which covers semantic segmentation, as well as bounding box detection, to enable the localization of vehicles in sparse maps, containing only lane borders and traffic lights. Further, we design differentiable cost maps that are directly generated from the estimated uncertainties. This opens up the possibility to minimize the reprojection loss of amorphous map elements in an association free and uncertainty-aware manner. Extensive evaluation on the Lyft 5 dataset shows that, despite the sparsity of the map, our approach enables robust and accurate 6D localization in challenging urban scenarios.
【6】 ABC: Auxiliary Balanced Classifier for Class-imbalanced Semi-supervised Learning 标题:ABC:类不平衡半监督学习的辅助平衡分类器 链接:https://arxiv.org/abs/2110.10368
作者:Hyuck Lee,Seungjae Shin,Heeyoung Kim 机构:Department of Industrial and Systems Engineering, KAIST 摘要:现有的半监督学习(SSL)算法通常假设类平衡数据集,尽管许多真实数据集的类分布是不平衡的。通常,在类不平衡数据集上训练的分类器偏向于大多数类。对于SSL算法来说,这个问题变得更加棘手,因为它们利用对未标记数据的有偏预测进行训练。然而,传统的类不平衡学习技术是为标记数据设计的,不能很容易地与SSL算法相结合。我们提出了一种可扩展的类不平衡SSL算法,该算法可以有效地使用未标记的数据,同时通过引入附加到现有SSL算法表示层的单层辅助平衡分类器(ABC)来缓解类不平衡。ABC使用小批量的类平衡损失进行训练,同时使用主干SSL算法从小批量中的所有数据点学习的高质量表示,以避免过度拟合和信息丢失。此外,我们使用一致性正则化,这是一种最新的SSL技术,用于以改进的方式利用未标记数据,通过为每个类别选择概率相同的未标记数据,训练ABC在类别之间达到平衡。该算法在使用四个基准数据集的各种类不平衡SSL实验中取得了最新的性能。 摘要:Existing semi-supervised learning (SSL) algorithms typically assume class-balanced datasets, although the class distributions of many real-world datasets are imbalanced. In general, classifiers trained on a class-imbalanced dataset are biased toward the majority classes. This issue becomes more problematic for SSL algorithms because they utilize the biased prediction of unlabeled data for training. However, traditional class-imbalanced learning techniques, which are designed for labeled data, cannot be readily combined with SSL algorithms. We propose a scalable class-imbalanced SSL algorithm that can effectively use unlabeled data, while mitigating class imbalance by introducing an auxiliary balanced classifier (ABC) of a single layer, which is attached to a representation layer of an existing SSL algorithm. The ABC is trained with a class-balanced loss of a minibatch, while using high-quality representations learned from all data points in the minibatch using the backbone SSL algorithm to avoid overfitting and information loss. Moreover, we use consistency regularization, a recent SSL technique for utilizing unlabeled data in a modified way, to train the ABC to be balanced among the classes by selecting unlabeled data with the same probability for each class. The proposed algorithm achieves state-of-the-art performance in various class-imbalanced SSL experiments using four benchmark datasets.
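The core mechanics, a single-layer auxiliary head trained with a minibatch class-balanced loss on top of (detached) backbone representations, can be sketched as follows; the inverse-frequency weighting here is a simple stand-in for the paper's balanced loss.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def class_balanced_weights(labels, num_classes):
    """Per-sample weights inversely proportional to class frequency in the minibatch."""
    counts = torch.bincount(labels, minlength=num_classes).clamp(min=1).float()
    w = 1.0 / counts
    return (w / w.sum() * num_classes)[labels]

# A single-layer auxiliary balanced classifier on top of backbone features.
num_classes, feat_dim = 10, 128
abc_head = nn.Linear(feat_dim, num_classes)

feats = torch.randn(64, feat_dim)          # representations from the backbone SSL model
labels = torch.randint(0, num_classes, (64,))
logits = abc_head(feats.detach())          # the backbone keeps training via its own SSL loss
loss = (F.cross_entropy(logits, labels, reduction="none")
        * class_balanced_weights(labels, num_classes)).mean()
```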
【7】 Learning Rich Nearest Neighbor Representations from Self-supervised Ensembles 标题:从自监督集成中学习富最近邻表示 链接:https://arxiv.org/abs/2110.10293
作者:Bram Wallace,Devansh Arpit,Huan Wang,Caiming Xiong 机构:Cornell University, Salesforce AI Research 摘要:通过自我监督对卷积神经网络进行预训练,并将其应用于转移学习,这是一个发展极为迅速的领域,几乎可以在所有图像领域快速迭代地提高性能。同时,模型集成是监督学习文献和实践中最普遍适用的技术之一,为可靠地提高性能提供了一种简单的解决方案。但如何优化组合自监督模型以最大化表示质量在很大程度上仍未得到解决。在这项工作中,我们提供了一个框架,通过在推理时通过梯度下降直接学习表示的新方法来执行自监督模型融合。通过将模型从域内数据集转移到域内数据集和转移设置,该技术提高了表示质量(通过k近邻来衡量)。此外,这种通过反向传播直接学习特征的方法甚至可以改进单个模型的表示,与自蒸馏中发现的改进相呼应。 摘要:Pretraining convolutional neural networks via self-supervision, and applying them in transfer learning, is an incredibly fast-growing field that is rapidly and iteratively improving performance across practically all image domains. Meanwhile, model ensembling is one of the most universally applicable techniques in supervised learning literature and practice, offering a simple solution to reliably improve performance. But how to optimally combine self-supervised models to maximize representation quality has largely remained unaddressed. In this work, we provide a framework to perform self-supervised model ensembling via a novel method of learning representations directly through gradient descent at inference time. This technique improves representation quality, as measured by k-nearest neighbors, both on the in-domain dataset and in the transfer setting, with models transferable from the former setting to the latter. Additionally, this direct learning of feature through backpropagation improves representations from even a single model, echoing the improvements found in self-distillation.
迁移|Zero/Few/One-Shot|自适应(8篇)
【1】 One-Shot Transfer Learning of Physics-Informed Neural Networks 标题:物理信息神经网络的一次性迁移学习 链接:https://arxiv.org/abs/2110.11286
作者:Shaan Desai,Marios Mattheakis,Hayden Joy,Pavlos Protopapas,Stephen Roberts 机构:Machine Learning Research Group, University of Oxford, School of Engineering and Applied Science, Harvard University 备注:[under review] 摘要:从经典动力系统到量子力学,高效准确地求解微分方程是许多科学研究领域进步的核心。使用物理信息神经网络(PINN)来解决这类问题的兴趣激增,因为它们比传统的数值方法有许多好处。尽管迁移学习在求解微分方程方面有潜在的好处,但它还没有得到充分的探索。在这项研究中,我们提出了一个迁移学习PINNs的一般框架,该框架可以对常微分方程和偏微分方程的线性系统进行一次性推理。这意味着许多未知微分方程的高精度解可以瞬间获得,而无需重新训练整个网络。我们通过解决一些实际问题,如一阶和二阶线性常微分方程、泊松方程和含时薛定谔复值偏微分方程,证明了所提出的深度学习方法的有效性。 摘要:Solving differential equations efficiently and accurately sits at the heart of progress in many areas of scientific research, from classical dynamical systems to quantum mechanics. There is a surge of interest in using Physics-Informed Neural Networks (PINNs) to tackle such problems as they provide numerous benefits over traditional numerical approaches. Despite their potential benefits for solving differential equations, transfer learning has been under explored. In this study, we present a general framework for transfer learning PINNs that results in one-shot inference for linear systems of both ordinary and partial differential equations. This means that highly accurate solutions to many unknown differential equations can be obtained instantaneously without retraining an entire network. We demonstrate the efficacy of the proposed deep learning approach by solving several real-world problems, such as first- and second-order linear ordinary equations, the Poisson equation, and the time-dependent Schrodinger complex-value partial differential equation.
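The one-shot idea for linear equations can be illustrated with a frozen nonlinear feature basis standing in for the pre-trained PINN hidden layers: for a linear ODE, the final linear layer is recovered by a single least-squares solve, with no retraining. The basis, the example equation u' + u = sin(t) with u(0) = 1, and all sizes are assumptions for illustration, not the paper's setup.
```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=64) * 2.0, rng.normal(size=64)   # frozen "hidden layer" (illustrative)

def phi(t):
    """Nonlinear features tanh(a*t + b) and their exact time derivatives."""
    z = np.tanh(np.outer(t, a) + b)
    return z, a * (1 - z ** 2)

# New target linear ODE: u'(t) + u(t) = sin(t), u(0) = 1.
t = np.linspace(0, 5, 200)
Z, dZ = phi(t)
A = np.vstack([dZ + Z, phi(np.array([0.0]))[0]])   # operator rows + initial-condition row
y = np.concatenate([np.sin(t), [1.0]])
w, *_ = np.linalg.lstsq(A, y, rcond=None)          # one-shot linear solve for the head
u = Z @ w                                          # approximate solution on the grid
```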
【2】 One Representative-Shot Learning Using a Population-Driven Template with Application to Brain Connectivity Classification and Evolution Prediction 标题:基于群体驱动模板的一次代表性学习及其在脑连通性分类和进化预测中的应用 链接:https://arxiv.org/abs/2110.11238
作者:Umut Guvercin,Mohammed Amine Gharsallaoui,Islem Rekik 机构:BASIRA Lab, Istanbul Technical University, Istanbul, Turkey 摘要:小样本学习(few-shot learning)提出了一个具有挑战性的范式:仅用少量代表目标类的训练样本来训练判别模型。然而,基于深度学习的分类方法不适合这种学习,因为它们需要大量的训练数据——更不用说一次性(one-shot)学习了。最近,图神经网络(GNNs)被引入网络神经科学领域,其中大脑连接被编码在一个图中。然而,由于神经成像数据集稀缺(特别是罕见疾病和低资源临床机构),这种吞噬数据的架构可能无法学好目标任务。在本文中,我们采用了一种非常不同的方法来训练GNN:我们的目标是仅用一个样本进行学习并获得最佳性能——这是一个需要解决的艰巨挑战。具体来说,我们提出了第一个一次性范式,其中GNN在单个群体驱动的模板,即连接性大脑模板(CBT)上进行训练。CBT是一组大脑图的紧凑表示,它捕捉了个体之间共享的独特连接模式,类似于神经成像数据集的大脑图像图谱。使用一个有代表性的CBT作为训练样本,我们减轻了GNN模型的训练负荷,同时提高了它们在各种分类和回归任务中的性能。我们证明,我们的方法在下游分类和时间相关的脑图数据预测任务上显著优于基准一次性学习方法,同时与在全部数据上训练的传统训练策略相比也具有竞争力。我们的源代码可在 https://github.com/basiralab/one-representative-shot-learning 获取。 摘要:Few-shot learning presents a challenging paradigm for training discriminative models on a few training samples representing the target classes to discriminate. However, classification methods based on deep learning are ill-suited for such learning as they need large amounts of training data --let alone one-shot learning. Recently, graph neural networks (GNNs) have been introduced to the field of network neuroscience, where the brain connectivity is encoded in a graph. However, with scarce neuroimaging datasets particularly for rare diseases and low-resource clinical facilities, such data-devouring architectures might fail in learning the target task. In this paper, we take a very different approach in training GNNs, where we aim to learn with one sample and achieve the best performance --a formidable challenge to tackle. Specifically, we present the first one-shot paradigm where a GNN is trained on a single population-driven template --namely a connectional brain template (CBT). A CBT is a compact representation of a population of brain graphs capturing the unique connectivity patterns shared across individuals. It is analogous to brain image atlases for neuroimaging datasets. Using a one-representative CBT as a training sample, we alleviate the training load of GNN models while boosting their performance across a variety of classification and regression tasks. We demonstrate that our method significantly outperformed benchmark one-shot learning methods with downstream classification and time-dependent brain graph data forecasting tasks while competing with the train-on-all conventional training strategy. Our source code can be found at https://github.com/basiralab/one-representative-shot-learning.
【3】 Inverse Optimal Control Adapted to the Noise Characteristics of the Human Sensorimotor System 标题:适应人体感知运动系统噪声特性的逆最优控制 链接:https://arxiv.org/abs/2110.11130
作者:Matthias Schultheis,Dominik Straub,Constantin A. Rothkopf 机构:Centre for Cognitive Science, Technical University of Darmstadt, Darmstadt, Germany 备注:24 pages, 11 figures, to be published at NeurIPS 2021 摘要:基于信号相关噪声的最优反馈控制的计算级解释已经能够解释人类感觉运动行为中的大量现象。然而,通常需要为任务假设一个成本函数,并通过比较观察到的和预测的轨迹来评估人类行为的最优性。在这里,我们介绍了信号相关噪声的逆最优控制,它允许从观察到的行为推断成本函数。为此,我们将问题形式化为一个部分可观测的马尔可夫决策过程,并区分agent和实验者的推理问题。具体地说,我们推导了状态和信念状态演化的概率公式,以及具有信号相关噪声的线性二次高斯问题中传播方程的近似。我们从实验者的角度将模型推广到状态变量部分可观测的情况。通过对合成数据的验证和对实验数据的应用,证明了该方法的可行性。我们的方法能够恢复人类顺序感觉运动行为中隐含的成本和收益,从而在计算框架中协调规范和描述性方法。 摘要:Computational level explanations based on optimal feedback control with signal-dependent noise have been able to account for a vast array of phenomena in human sensorimotor behavior. However, commonly a cost function needs to be assumed for a task and the optimality of human behavior is evaluated by comparing observed and predicted trajectories. Here, we introduce inverse optimal control with signal-dependent noise, which allows inferring the cost function from observed behavior. To do so, we formalize the problem as a partially observable Markov decision process and distinguish between the agent's and the experimenter's inference problems. Specifically, we derive a probabilistic formulation of the evolution of states and belief states and an approximation to the propagation equation in the linear-quadratic Gaussian problem with signal-dependent noise. We extend the model to the case of partial observability of state variables from the point of view of the experimenter. We show the feasibility of the approach through validation on synthetic data and application to experimental data. Our approach enables recovering the costs and benefits implicit in human sequential sensorimotor behavior, thereby reconciling normative and descriptive approaches in a computational framework.
【4】 Memory Efficient Adaptive Attention For Multiple Domain Learning 标题:用于多域学习的内存高效自适应注意力 链接:https://arxiv.org/abs/2110.10969
作者:Himanshu Pradeep Aswani,Abhiraj Sunil Kanse,Shubhang Bhatnagar,Amit Sethi 机构:Indian Institute of Technology, Bombay 备注:13 pages, 3 figures, 4 graphs, 3 tables 摘要:在新域上从头开始训练CNN通常需要大量标记图像和计算,这不适合低功耗硬件。减少这些需求的一种方法是将CNN架构模块化,并在预训练后冻结较重模块(即较低层)的权重。最近的研究提出了替代的模块化架构和方案,这些架构和方案可以减少可训练参数的数量,以匹配新域上完全微调CNN的精度。我们的工作表明,可训练参数的数量可能进一步减少一个数量级。此外,我们还建议,多领域学习的新模块化技术还应与其他现实指标进行比较,如固定模块和可训练模块之间需要的互连数量、需要的训练样本数量、,所需的计算顺序以及对训练数据的部分错误标记的鲁棒性。根据所有这些标准,所提出的体系结构显示出优于或符合当前最先进技术的优势。 摘要:Training CNNs from scratch on new domains typically demands large numbers of labeled images and computations, which is not suitable for low-power hardware. One way to reduce these requirements is to modularize the CNN architecture and freeze the weights of the heavier modules, that is, the lower layers after pre-training. Recent studies have proposed alternative modular architectures and schemes that lead to a reduction in the number of trainable parameters needed to match the accuracy of fully fine-tuned CNNs on new domains. Our work suggests that a further reduction in the number of trainable parameters by an order of magnitude is possible. Furthermore, we propose that new modularization techniques for multiple domain learning should also be compared on other realistic metrics, such as the number of interconnections needed between the fixed and trainable modules, the number of training samples needed, the order of computations required and the robustness to partial mislabeling of the training data. On all of these criteria, the proposed architecture demonstrates advantages over or matches the current state-of-the-art.
【5】 EBJR: Energy-Based Joint Reasoning for Adaptive Inference 标题:EBJR:基于能量的自适应联合推理 链接:https://arxiv.org/abs/2110.10343
作者:Mohammad Akbari,Amin Banitalebi-Dehkordi,Yong Zhang 机构:Huawei Technologies Canada Co., Ltd. 备注:BMVC 2021 摘要:最先进的深度学习模型在各种基准上取得了显著的性能水平。然而,出色的性能是以高昂的计算成本为代价的。另一方面,轻量级体系结构实现了中等精度,但延迟则理想得多。本文提出了一种将大型高精度模型与小型快速模型联合使用的新方法。为此,我们提出了一个基于能量的联合推理(EBJR)框架,该框架在浅层和深层模型之间自适应地分配样本,以实现接近深层模型的精度,但延迟接近浅层模型。我们的方法适用于开箱即用的预训练模型,因为它不需要架构更改或重新训练。此外,它易于使用和部署,特别是对于云服务。通过对不同下游任务的一组综合实验,我们表明我们的方法以相当大的优势优于强大的最先进方法。此外,我们还提出了专门化EBJR,这是我们方法的一个扩展,其中我们创建一个较小的专门化侧模型,该模型仅部分执行目标任务,但产生更高的准确性和更快的推理。我们通过理论和实验评估验证了我们方法的优势。 摘要:State-of-the-art deep learning models have achieved significant performance levels on various benchmarks. However, the excellent performance comes at the cost of computational inefficiency. Light-weight architectures, on the other hand, achieve moderate accuracies, but at a much more desirable latency. This paper presents a new method of jointly using the large accurate models together with the small fast ones. To this end, we propose an Energy-Based Joint Reasoning (EBJR) framework that adaptively distributes the samples between shallow and deep models to achieve an accuracy close to the deep model, but latency close to the shallow one. Our method is applicable to out-of-the-box pre-trained models as it does not require an architecture change nor re-training. Moreover, it is easy to use and deploy, especially for cloud services. Through a comprehensive set of experiments on different down-stream tasks, we show that our method outperforms strong state-of-the-art approaches with a considerable margin. In addition, we propose specialized EBJR, an extension of our method where we create a smaller specialized side model that performs the target task only partially, but yields an even higher accuracy and faster inference. We verify the strengths of our methods with both theoretical and experimental evaluations.
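A plausible reading of the energy-based routing (a sketch, not the authors' implementation) uses the free-energy score of the shallow model's logits to decide which samples must be escalated to the deep model; `shallow`, `deep` and `threshold` are placeholders to be supplied by the caller.
```python
import torch

def energy(logits, T=1.0):
    """Free-energy score E(x) = -T * logsumexp(logits / T); lower means more confident."""
    return -T * torch.logsumexp(logits / T, dim=1)

@torch.no_grad()
def joint_predict(x, shallow, deep, threshold):
    """Route each sample: keep the shallow prediction when its energy is low enough,
    otherwise fall back to the deep model. `threshold` is a tuned hyperparameter."""
    logits = shallow(x)
    hard = energy(logits) > threshold        # uncertain samples
    logits = logits.clone()
    if hard.any():
        logits[hard] = deep(x[hard])         # deep model runs only on the hard subset
    return logits.argmax(dim=1), hard.float().mean()  # predictions, fraction sent deep
```
Lowering the threshold sends more traffic to the deep model (higher accuracy, higher latency), so it directly exposes the accuracy/latency trade-off the abstract describes.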
【6】 Layer-wise Adaptive Model Aggregation for Scalable Federated Learning 标题:面向可扩展联邦学习的分层自适应模型聚合 链接:https://arxiv.org/abs/2110.10302
作者:Sunwoo Lee,Tuo Zhang,Chaoyang He,Salman Avestimehr 机构:Viterbi School of Engineering, University of Southern California 摘要:在联合学习中,跨客户端聚合本地模型的一种常见方法是定期平均整个模型参数。然而,我们知道,不同层次的神经网络在不同的客户之间可能存在不同程度的模型差异。传统的完全聚合方案不考虑这种差异,同时同步整个模型参数,从而导致网络带宽消耗的低效。在增加通信成本的同时,聚合客户机中相似的参数并不能取得有意义的训练进展。我们提出了FedLAMA,一种用于可伸缩联邦学习的分层模型聚合方案。FedLAMA以分层方式自适应调整聚合间隔,同时考虑模型差异和通信成本。分层聚合方法能够精细地控制聚合间隔,以放松聚合频率,而不会对模型精度产生重大影响。我们的实证研究表明,FedLAMA最多可将IID数据的通信成本降低60%、将非IID数据的通信成本降低70%,同时实现与FedAvg相当的准确性。 摘要:In Federated Learning, a common approach for aggregating local models across clients is periodic averaging of the full model parameters. It is, however, known that different layers of neural networks can have a different degree of model discrepancy across the clients. The conventional full aggregation scheme does not consider such a difference and synchronizes the whole model parameters at once, resulting in inefficient network bandwidth consumption. Aggregating the parameters that are similar across the clients does not make meaningful training progress while increasing the communication cost. We propose FedLAMA, a layer-wise model aggregation scheme for scalable Federated Learning. FedLAMA adaptively adjusts the aggregation interval in a layer-wise manner, jointly considering the model discrepancy and the communication cost. The layer-wise aggregation method enables to finely control the aggregation interval to relax the aggregation frequency without a significant impact on the model accuracy. Our empirical study shows that FedLAMA reduces the communication cost by up to 60% for IID data and 70% for non-IID data while achieving a comparable accuracy to FedAvg.
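The layer-wise idea can be sketched as follows, under simplifying assumptions: per-layer discrepancy is measured as the mean distance of client parameters to their average, and low-discrepancy layers are assigned a longer aggregation interval. FedLAMA's actual criterion jointly weighs discrepancy against communication cost; the median split below is only an illustration.
```python
import numpy as np

def layer_discrepancy(client_layers):
    """client_layers: list of flattened parameter vectors for one layer, one per client."""
    stacked = np.stack(client_layers)
    mean = stacked.mean(axis=0)
    return np.linalg.norm(stacked - mean, axis=1).mean()

def assign_intervals(discrepancies, base=1, slack=4):
    """Layers whose discrepancy is below the median are synchronized less often
    (interval multiplied by `slack`); high-discrepancy layers keep the base interval."""
    med = np.median(discrepancies)
    return [base if d >= med else base * slack for d in discrepancies]

# Toy usage: 3 layers, 4 clients each, with different cross-client spreads.
rng = np.random.default_rng(0)
layers = [[rng.normal(scale=s, size=100) for _ in range(4)] for s in (0.1, 1.0, 0.3)]
intervals = assign_intervals([layer_discrepancy(l) for l in layers])  # e.g. [4, 1, 4]
```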
【7】 Test time Adaptation through Perturbation Robustness 标题:基于摄动鲁棒性的测试时间自适应 链接:https://arxiv.org/abs/2110.10232
作者:Prabhu Teja Sivaprasad,François Fleuret 机构:Idiap Research Institute & EPFL; Idiap Research Institute & University of Geneva 备注:Under review 摘要:由若干真实世界过程生成的数据样本本质上是动态的,即其特征随时间变化。因此,无法依靠文献中大量的迁移学习方法,在训练阶段就覆盖训练与推理之间所有可能的分布偏移。在本文中,我们解决在推理时适应域偏移的问题:我们不改变训练过程,而是在测试时快速调整模型以处理任何域偏移。为此,我们建议对图像流形上测试样本附近采样数据的预测施加一致性约束。在一系列测试场景中,例如处理图像损坏(CIFAR-10-C和CIFAR-100-C)和域自适应(VisDA-C),我们的方法与以前的方法持平或显著优于它们。 摘要:Data samples generated by several real world processes are dynamic in nature, i.e., their characteristics vary with time. Thus it is not possible to train and tackle all possible distributional shifts between training and inference, using the host of transfer learning methods in literature. In this paper, we tackle this problem of adapting to domain shift at inference time, i.e., we do not change the training process, but quickly adapt the model at test-time to handle any domain shift. For this, we propose to enforce consistency of predictions of data sampled in the vicinity of test sample on the image manifold. On a host of test scenarios like dealing with corruptions (CIFAR-10-C and CIFAR-100-C), and domain adaptation (VisDA-C), our method is at par or significantly outperforms previous methods.
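A minimal PyTorch sketch of the test-time consistency idea: predictions on several perturbed copies of the test batch are pulled toward their mean by a gradient step before predicting. Gaussian input noise stands in for the paper's manifold-vicinity sampling, and updating all parameters is a simplification (restricting updates to, e.g., normalization layers is a common alternative).
```python
import torch
import torch.nn.functional as F

def tta_step(model, x, optimizer, n_aug=4, noise=0.05):
    """One test-time adaptation step on batch x: enforce agreement across
    perturbed copies (a simple vicinal-consistency proxy), then predict."""
    model.train()  # e.g. lets batch-norm statistics adapt
    probs = [F.softmax(model(x + noise * torch.randn_like(x)), dim=1)
             for _ in range(n_aug)]
    mean_p = torch.stack(probs).mean(dim=0)
    loss = sum(F.kl_div(p.clamp_min(1e-12).log(), mean_p, reduction="batchmean")
               for p in probs) / n_aug
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    model.eval()
    with torch.no_grad():
        return model(x).argmax(dim=1)
```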
【8】 Mean Nyström Embeddings for Adaptive Compressive Learning 标题:自适应压缩学习的平均Nyström嵌入 链接:https://arxiv.org/abs/2110.10996
作者:Antoine Chatalic,Luigi Carratino,Ernesto De Vito,Lorenzo Rosasco 机构:⋆ DIBRIS & MaLGA, Università di Genova (Genoa, Italy), † DIMA & MaLGA, Università di Genova (Genoa, Italy) 备注:22 pages, 4 figures 摘要:压缩学习是一种高效的大规模学习方法,其基础是将整个数据集压缩为单个平均嵌入(草图),即广义矩向量。然后使用适配的参数模型,将学习任务近似地作为反问题求解。在此背景下,以前的工作都集中在通过平均随机特征获得的草图上,这类草图虽然具有通用性,却可能难以适配手头的问题。在本文中,我们提出并研究了基于数据相关 Nyström 近似进行草图构建的思想。从理论角度,我们证明了在一个几何假设下可以控制超额风险,该假设关联了用于从草图学习的参数模型与手头任务对应的协方差算子。从经验上看,对于k-均值聚类和高斯建模,在固定草图大小下,Nyström 草图确实优于使用随机特征构建的草图。 摘要:Compressive learning is an approach to efficient large scale learning based on sketching an entire dataset to a single mean embedding (the sketch), i.e. a vector of generalized moments. The learning task is then approximately solved as an inverse problem using an adapted parametric model. Previous works in this context have focused on sketches obtained by averaging random features, which, while universal, can be poorly adapted to the problem at hand. In this paper, we propose and study the idea of performing sketching based on data-dependent Nyström approximation. From a theoretical perspective we prove that the excess risk can be controlled under a geometric assumption relating the parametric model used to learn from the sketch and the covariance operator associated to the task at hand. Empirically, we show for k-means clustering and Gaussian modeling that for a fixed sketch size, Nyström sketches indeed outperform those built with random features.
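To make the sketching operation concrete, the snippet below computes a mean Nyström embedding of a dataset: features are kernel evaluations against data-sampled landmarks, whitened by K_mm^{-1/2}, then averaged into a single vector of generalized moments. The kernel, landmark count and jitter are illustrative choices, not the paper's exact construction.
```python
import numpy as np

def gaussian_kernel(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_sketch(X, landmarks, gamma=0.5):
    """Data-dependent sketch: mean Nyström embedding of the whole dataset.
    Features are k(x, landmarks) whitened by K_mm^{-1/2}."""
    Kmm = gaussian_kernel(landmarks, landmarks, gamma)
    evals, evecs = np.linalg.eigh(Kmm + 1e-8 * np.eye(len(landmarks)))
    whiten = evecs @ np.diag(evals ** -0.5) @ evecs.T      # K_mm^{-1/2}
    Knm = gaussian_kernel(X, landmarks, gamma)
    return (Knm @ whiten).mean(axis=0)                      # one vector for the whole set

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
landmarks = X[rng.choice(len(X), size=64, replace=False)]   # data-dependent, unlike RFF
s = nystrom_sketch(X, landmarks)                            # (64,) generalized moments
```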
强化学习(7篇)
【1】 Deep Reinforcement Learning for Online Control of Stochastic Partial Differential Equations 标题:随机偏微分方程在线控制的深度强化学习 链接:https://arxiv.org/abs/2110.11265
作者:Erfan Pirmorad,Faraz Khoshbakhtian,Farnam Mansouri,Amir-massoud Farahmand 机构:Department of Mechanical & Industrial Engineering, University of Toronto, Department of Computer Science, University of Toronto, Vector Institute 摘要:在许多领域,如物理科学、生命科学和金融领域,控制方法被用于在由微分方程控制的复杂动力系统中实现预期目标。在这项工作中,我们将控制随机偏微分方程(SPDE)问题描述为一个强化学习问题。我们提出了一种基于学习的分布式控制方法,用于使用深度确定性策略梯度法在线控制具有高维状态-动作空间的SPDE系统。我们在控制随机Burgers方程的问题上测试了我们的方法的性能,该方程描述了无限大区域中的湍流流动。 摘要:In many areas, such as the physical sciences, life sciences, and finance, control approaches are used to achieve a desired goal in complex dynamical systems governed by differential equations. In this work we formulate the problem of controlling stochastic partial differential equations (SPDE) as a reinforcement learning problem. We present a learning-based, distributed control approach for online control of a system of SPDEs with high dimensional state-action space using deep deterministic policy gradient method. We tested the performance of our method on the problem of controlling the stochastic Burgers' equation, describing a turbulent fluid flow in an infinitely large domain.
【2】 Reinforcement Learning Based Optimal Camera Placement for Depth Observation of Indoor Scenes 标题:基于强化学习的室内场景深度观测摄像机最优布置 链接:https://arxiv.org/abs/2110.11106
作者:Yichuan Chen,Manabu Tsukada,Hiroshi Esaki 机构:∗ Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan 备注:Accepted to IEEE International Conference on Networking, Sensing and Control (ICNSC) 2021 摘要:在使用多个摄像机的任务中,探索最适合任务的摄像机设置——即最优摄像机放置(OCP)问题——非常重要。然而,现有的OCP解决方案很少专门针对室内场景的深度观测,且大多数通用解决方案都是离线工作的。针对这一问题,本文提出了一种基于强化学习的、面向室内场景深度观测的OCP在线解决方案。所提出的解决方案包括一个模拟环境和一个代理网络:前者使用阴影贴图实现场景观测和奖励估计;后者包含一个基于软演员-评论家(SAC)的强化学习主干和一个逐层从观测点云中提取特征的特征提取器。我们与两种最先进的基于优化的离线方法进行了对比实验。实验结果表明,所提出的系统在十个测试场景中的七个里取得了更低的深度观测误差;其在所有测试场景中的总误差也小于基线方法的90%。因此,在没有场景先验知识或以降低深度观测误差为主要目标的情形下,所提出的系统更适合用于深度相机的放置。 摘要:Exploring the most task-friendly camera setting -- optimal camera placement (OCP) problem -- in tasks that use multiple cameras is of great importance. However, few existing OCP solutions specialize in depth observation of indoor scenes, and most versatile solutions work offline. To this problem, an OCP online solution to depth observation of indoor scenes based on reinforcement learning is proposed in this paper. The proposed solution comprises a simulation environment that implements scene observation and reward estimation using shadow maps and an agent network containing a soft actor-critic (SAC)-based reinforcement learning backbone and a feature extractor to extract features from the observed point cloud layer-by-layer. Comparative experiments with two state-of-the-art optimization-based offline methods are conducted. The experimental results indicate that the proposed system outperforms seven out of ten test scenes in obtaining lower depth observation error. The total error in all test scenes is also less than 90% of the baseline ones. Therefore, the proposed system is more competent for depth camera placement in scenarios where there is no prior knowledge of the scenes or where a lower depth observation error is the main objective.
【3】 RL4RS: A Real-World Benchmark for Reinforcement Learning based Recommender System 标题:RL4RS:基于强化学习的推荐系统的真实基准 链接:https://arxiv.org/abs/2110.11073
作者:Kai Wang,Zhene Zou,Qilin Deng,Yue Shang,Minghao Zhao,Runze Wu,Xudong Shen,Tangjie Lyu,Changjie Fan 机构:Fuxi AI Lab, NetEase Games, Hangzhou, Zhejiang, China 备注:First version 摘要:基于强化学习的推荐系统(RL-based RS)旨在从一批收集的数据中学习一个好的策略,并将顺序推荐转化为多步决策任务。然而,目前基于RL的RS基准通常存在较大的现实差距,因为它们涉及人工RL数据集或半模拟RS数据集,并且训练的策略直接在模拟环境中评估。在现实环境中,并不是所有的推荐问题都适合转化为强化学习问题。与以往的学术RL研究不同,基于RL的RS存在外推误差,并且在部署前很难得到充分验证。在本文中,我们介绍了RL4RS(推荐系统的强化学习)基准——一种完全从工业应用中收集的新资源,用于训练和评估RL算法,并特别关注上述问题。它包含两个数据集、调优的仿真环境、相关的先进RL基线、数据理解工具和反事实策略评估算法。RL4RS套件可在以下网址找到:https://github.com/fuxiAIlab/RL4RS. 除了基于RL的推荐系统之外,我们希望这些资源能够为强化学习和神经组合优化的研究做出贡献。 摘要:Reinforcement learning based recommender systems (RL-based RS) aims at learning a good policy from a batch of collected data, casting sequential recommendation as multi-step decision-making tasks. However, current RL-based RS benchmarks commonly have a large reality gap, because they involve artificial RL datasets or semi-simulated RS datasets, and the trained policy is directly evaluated in the simulation environment. In real-world situations, not all recommendation problems are suitable to be transformed into reinforcement learning problems. Unlike previous academic RL researches, RL-based RS suffer from extrapolation error and the difficulties of being well validated before deployment. In this paper, we introduce the RL4RS (Reinforcement Learning for Recommender Systems) benchmark - a new resource fully collected from industrial applications to train and evaluate RL algorithms with special concerns on the above issues. It contains two datasets, tuned simulation environments, related advanced RL baselines, data understanding tools, and counterfactual policy evaluation algorithms. The RL4RS suit can be found at https://github.com/fuxiAIlab/RL4RS. In addition to the RL-based recommender systems, we expect the resource to contribute to research in reinforcement learning and neural combinatorial optimization.
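As an illustration of the counterfactual (off-policy) evaluation component, here is a generic clipped inverse-propensity-scoring estimator for logged bandit-style feedback; it is a textbook estimator, not necessarily the algorithm shipped with RL4RS, and all data below is synthetic.
```python
import numpy as np

def ips_estimate(rewards, logged_propensity, target_probs, clip=10.0):
    """Clipped IPS estimate of a target policy's value from logged interactions:
    E[ r * pi_target(a|s) / pi_logged(a|s) ], with importance weights clipped."""
    w = np.minimum(target_probs / logged_propensity, clip)
    return (w * rewards).mean()

# Toy logged data: per interaction, the behavior policy's propensity for the
# logged action and the target policy's probability for that same action.
rng = np.random.default_rng(0)
rewards = rng.binomial(1, 0.3, size=10000).astype(float)
logged_p = rng.uniform(0.05, 0.5, size=10000)
target_p = rng.uniform(0.05, 0.5, size=10000)
v_hat = ips_estimate(rewards, logged_p, target_p)
```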
【4】 Neuro-Symbolic Reinforcement Learning with First-Order Logic 标题:基于一阶逻辑的神经符号强化学习 链接:https://arxiv.org/abs/2110.10963
作者:Daiki Kimura,Masaki Ono,Subhajit Chaudhury,Ryosuke Kohita,Akifumi Wachi,Don Joven Agravante,Michiaki Tatsubori,Asim Munawar,Alexander Gray 机构:IBM Research 备注:EMNLP 2021 (main conference) 摘要:深度强化学习(RL)方法在收敛之前通常需要多次试验,并且没有提供训练策略的直接解释能力。为了实现RL策略的快速收敛性和可解释性,我们提出了一种新的文本游戏RL方法,该方法采用了一种称为逻辑神经网络的神经符号框架,可以在可微网络中学习符号规则和可解释规则。该方法首先从文本观察和外部词义网络(ConceptNet)中提取一阶逻辑事实,然后在网络中用可直接解释的逻辑运算符训练策略。我们的实验结果表明,在TextWorld基准测试中,使用该方法的RL训练收敛速度明显快于其他最先进的神经符号方法。 摘要:Deep reinforcement learning (RL) methods often require many trials before convergence, and no direct interpretability of trained policies is provided. In order to achieve fast convergence and interpretability for the policy in RL, we propose a novel RL method for text-based games with a recent neuro-symbolic framework called Logical Neural Network, which can learn symbolic and interpretable rules in their differentiable network. The method is first to extract first-order logical facts from text observation and external word meaning network (ConceptNet), then train a policy in the network with directly interpretable logical operators. Our experimental results show RL training with the proposed method converges significantly faster than other state-of-the-art neuro-symbolic methods in a TextWorld benchmark.
【5】 Transferring Reinforcement Learning for DC-DC Buck Converter Control via Duty Ratio Mapping: From Simulation to Implementation 标题:通过占空比映射迁移DC-DC Buck变换器控制的强化学习:从仿真到实现 链接:https://arxiv.org/abs/2110.10490
作者:Chenggang Cui,Tianxiao Yang,Yuxuan Dai,Chuanlin Zhang 机构:Shanghai University of Electric Power 摘要:强化学习(RL)控制方法在电力电子系统中的应用已成为一个新兴主题,而仿真到现实(sim-to-real)问题仍然具有挑战性,因为文献中很少有结果可供参考。事实上,由于仿真模型和现实系统之间不可避免的失配,离线训练的RL控制策略在迁移到实际实现的过程中可能会遇到意想不到的障碍。作为本文的主要贡献,提出了一种通过精心设计的占空比映射(DRM)实现DC-DC降压变换器控制策略迁移的方法。然后,给出了一个详细的仿真到现实过程,以实现无模型深度强化学习(DRL)控制器。通过对比实验研究,验证了该方法的可行性和有效性。 摘要:Reinforcement learning (RL) control approach with application into power electronics systems has become an emerging topic whilst the sim-to-real issue remains a challenging problem as very few results can be referred to in the literature. Indeed, due to the inevitable mismatch between simulation models and real-life systems, offline trained RL control strategies may sustain unexpected hurdles in practical implementation during transferring procedure. As the main contribution of this paper, a transferring methodology via a delicately designed duty ratio mapping (DRM) is proposed for a DC-DC buck converter. Then, a detailed sim-to-real process is presented to enable the implementation of a model-free deep reinforcement learning (DRL) controller. The feasibility and effectiveness of the proposed methodology are demonstrated by comparative experimental studies.
【6】 Distributed Reinforcement Learning for Privacy-Preserving Dynamic Edge Caching 标题:基于分布式强化学习的隐私保护动态边缘缓存 链接:https://arxiv.org/abs/2110.10349
作者:Shengheng Liu,Chong Zheng,Yongming Huang,Tony Q. S. Quek 备注:12 pages, 6 figures, under review with the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS 摘要:移动边缘计算(MEC)是一种突出的计算范式,它拓展了无线通信的应用领域。由于用户设备和MEC服务器容量的限制,边缘缓存(EC)优化对于MEC无线网络中缓存资源的有效利用至关重要。然而,内容流行度在空间和时间上的动态性和复杂性以及用户隐私保护,对EC优化提出了重大挑战。本文提出了一种保护隐私的分布式深度确定性策略梯度(P2D3PG)算法,以最大化MEC网络中设备的缓存命中率。具体而言,我们考虑到内容流行度是动态、复杂且不可观测的这一事实,并将设备缓存命中率的最大化表述为隐私保护约束下的分布式问题。特别地,我们将分布式优化转化为分布式无模型马尔可夫决策过程问题,然后引入一种保护隐私的联邦学习方法进行流行度预测。随后,提出了一种基于分布式强化学习的P2D3PG算法来解决分布式问题。仿真结果表明,该方法在保护用户隐私的同时,在提高EC命中率方面优于基线方法。 摘要:Mobile edge computing (MEC) is a prominent computing paradigm which expands the application fields of wireless communication. Due to the limitation of the capacities of user equipments and MEC servers, edge caching (EC) optimization is crucial to the effective utilization of the caching resources in MEC-enabled wireless networks. However, the dynamics and complexities of content popularities over space and time as well as the privacy preservation of users pose significant challenges to EC optimization. In this paper, a privacy-preserving distributed deep deterministic policy gradient (P2D3PG) algorithm is proposed to maximize the cache hit rates of devices in the MEC networks. Specifically, we consider the fact that content popularities are dynamic, complicated and unobservable, and formulate the maximization of cache hit rates on devices as distributed problems under the constraints of privacy preservation. In particular, we convert the distributed optimizations into distributed model-free Markov decision process problems and then introduce a privacy-preserving federated learning method for popularity prediction. Subsequently, a P2D3PG algorithm is developed based on distributed reinforcement learning to solve the distributed problems. Simulation results demonstrate the superiority of the proposed approach in improving EC hit rate over the baseline methods while preserving user privacy.
【7】 Feedback Linearization of Car Dynamics for Racing via Reinforcement Learning 标题:基于强化学习的赛车动力学反馈线性化 链接:https://arxiv.org/abs/2110.10441
作者:Michael Estrada,Sida Li,Xiangyu Cai 机构:Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, US 备注:Final research paper for Berkeley's CS 285 (Deep Reinforcement Learning) in Fall 2020 摘要:通过学习反馈线性化的方法,我们寻求学习一个线性化控制器,以简化控制汽车自主竞速的过程。在手工设计的线性化控制器中,采用软演员-评论家方法学习解耦矩阵和漂移向量,有效地校正误差。其结果是一个精确线性化控制器,可以利用发展成熟的线性系统理论来设计易于实现且计算需求显著降低的路径规划和跟踪方案。为了演示反馈线性化方法,首先用它学习一个精确结构已知、但与初始控制器有所不同从而引入误差的仿真模型。我们进一步寻求将此方法应用于一个引入更多误差的系统,其形式是一个专门为赛车动力学建模而设计的Gym环境。为此,我们提出了对学习反馈线性化方法的一个扩展:一个通过监督学习训练的神经网络,将线性化控制器的输出转换为赛车环境所需的输入。本文报告了我们在实现这些目标方面取得的进展,并讨论了实现这些目标的下一步工作。 摘要:Through the method of Learning Feedback Linearization, we seek to learn a linearizing controller to simplify the process of controlling a car to race autonomously. A soft actor-critic approach is used to learn a decoupling matrix and drift vector that effectively correct for errors in a hand-designed linearizing controller. The result is an exactly linearizing controller that can be used to enable the well-developed theory of linear systems to design path planning and tracking schemes that are easy to implement and significantly less computationally demanding. To demonstrate the method of feedback linearization, it is first used to learn a simulated model whose exact structure is known, but varied from the initial controller, so as to introduce error. We further seek to apply this method to a system that introduces even more error in the form of a gym environment specifically designed for modeling the dynamics of car racing. To do so, we posit an extension to the method of learning feedback linearization; a neural network that is trained using supervised learning to convert the output of our linearizing controller to the required input for the racing environment. Our progress towards these goals is reported and the next steps in their accomplishment are discussed.
元学习(3篇)
【1】 On Hard Episodes in Meta-Learning 标题:论元学习中的困难情节 链接:https://arxiv.org/abs/2110.11190
作者:Samyadeep Basu,Amr Sharaf,Nicolo Fusi,Soheil Feizi 机构:Microsoft AI, Microsoft Research, Department of Computer Science, University of Maryland, College Park 摘要:现有的元学习者主要关注提高跨多个情节(episode)的平均任务准确率。然而,不同的情节在难度和质量上可能有所不同,导致元学习者在不同情节上的表现存在很大差距。理解这一问题在工业小样本设置中尤为关键,因为测试情节通常由最终用户上传,对其控制有限。在本文中,我们在三个标准基准数据集(CIFAR-FS、mini-ImageNet和tiered-ImageNet)上实证分析了元学习者在不同难度情节上的行为。令人惊讶的是,我们观察到,在所有标准基准测试和元学习者中,最难和最容易的情节之间的准确率相差约50%。此外,我们还研究了困难情节的各种性质,并强调了它们与元训练中灾难性遗忘的联系。为了解决在困难情节上表现不佳的问题,我们调查并测试了基于对抗性训练和课程学习的不同元训练策略。我们发现,在提高困难情节上的预测性能方面,对抗性训练策略比课程学习有效得多。 摘要:Existing meta-learners primarily focus on improving the average task accuracy across multiple episodes. Different episodes, however, may vary in hardness and quality leading to a wide gap in the meta-learner's performance across episodes. Understanding this issue is particularly critical in industrial few-shot settings, where there is limited control over test episodes as they are typically uploaded by end-users. In this paper, we empirically analyse the behaviour of meta-learners on episodes of varying hardness across three standard benchmark datasets: CIFAR-FS, mini-ImageNet, and tiered-ImageNet. Surprisingly, we observe a wide gap in accuracy of around 50% between the hardest and easiest episodes across all the standard benchmarks and meta-learners. We additionally investigate various properties of hard episodes and highlight their connection to catastrophic forgetting during meta-training. To address the issue of sub-par performance on hard episodes, we investigate and benchmark different meta-training strategies based on adversarial training and curriculum learning. We find that adversarial training strategies are much more powerful than curriculum learning in improving the prediction performance on hard episodes.
【2】 Bayesian Meta-Learning Through Variational Gaussian Processes 标题:基于变分高斯过程的贝叶斯元学习 链接:https://arxiv.org/abs/2110.11044
作者:Vivek Myers,Nikhil Sardana 机构:Department of Computer Science, Stanford University 摘要:元学习领域的最新进展解决了由大量小型(few-shot)监督学习任务组成的领域。元学习算法必须能够快速适应任何单个小样本任务,拟合任务内的一个小支持集,并使用它预测任务查询集的标签。此问题设置可以扩展到贝叶斯上下文,其中模型不是为每个查询数据点预测单个标签,而是预测捕获其不确定性的标签分布。该领域的成功方法包括基于MAML模型的贝叶斯集成、贝叶斯神经网络,以及具有学习得到的深度核和均值函数的高斯过程。虽然高斯过程在元学习环境中具有稳健的贝叶斯解释,但它们不能自然地对非高斯预测后验建模以表达不确定性。在本文中,我们设计了一种理论上有原则的方法VMGP,它扩展了基于高斯过程的元学习,以实现高质量、任意非高斯的不确定性预测。在具有复杂非光滑或不连续结构的基准环境中,我们发现我们的VMGP方法的性能明显优于现有的贝叶斯元学习基线。 摘要:Recent advances in the field of meta-learning have tackled domains consisting of large numbers of small ("few-shot") supervised learning tasks. Meta-learning algorithms must be able to rapidly adapt to any individual few-shot task, fitting to a small support set within a task and using it to predict the labels of the task's query set. This problem setting can be extended to the Bayesian context, wherein rather than predicting a single label for each query data point, a model predicts a distribution of labels capturing its uncertainty. Successful methods in this domain include Bayesian ensembling of MAML-based models, Bayesian neural networks, and Gaussian processes with learned deep kernel and mean functions. While Gaussian processes have a robust Bayesian interpretation in the meta-learning context, they do not naturally model non-Gaussian predictive posteriors for expressing uncertainty. In this paper, we design a theoretically principled method, VMGP, extending Gaussian-process-based meta-learning to allow for high-quality, arbitrary non-Gaussian uncertainty predictions. On benchmark environments with complex non-smooth or discontinuous structure, we find our VMGP method performs significantly better than existing Bayesian meta-learning baselines.
【3】 Forecasting Market Prices using DL with Data Augmentation and Meta-learning: ARIMA still wins! 标题:使用数据增强和元学习的DL预测市场价格:ARIMA仍然获胜! 链接:https://arxiv.org/abs/2110.10233
作者:Vedant Shah,Gautam Shroff 机构:APPCAIR, BITS Pilani, K K Birla Goa Campus, TCS Research, New Delhi 备注:Accepted at the ICBINB Workshop @ NeurIPS, 2021 摘要:深度学习技术已成功地用于时间序列预测,与传统技术相比,在许多标准基准数据集上往往表现出优越的性能。在这里,我们对金融市场中预测价格的深度学习技术的性能进行了全面和比较研究。我们根据来自货币和股票市场的数据,对最先进的深度学习基线(如NBeats等)进行基准测试。我们还使用基于模糊逻辑的需求模型生成合成数据,该模型由技术规则(如交易员经常使用的移动平均线)驱动。我们在这个合成数据上对基线技术进行基准测试,并将其用于数据增强。我们还应用基于梯度的元学习来解释金融时间序列的非平稳性。尽管我们进行了大量的实验,但令人惊讶的结果是,即使使用数据增强或元学习,标准ARIMA模型也优于深度学习。最后,我们猜测为什么会出现这种情况。 摘要:Deep-learning techniques have been successfully used for time-series forecasting and have often shown superior performance on many standard benchmark datasets as compared to traditional techniques. Here we present a comprehensive and comparative study of performance of deep-learning techniques for forecasting prices in financial markets. We benchmark state-of-the-art deep-learning baselines, such as NBeats, etc., on data from currency as well as stock markets. We also generate synthetic data using a fuzzy-logic based model of demand driven by technical rules such as moving averages, which are often used by traders. We benchmark the baseline techniques on this synthetic data as well as use it for data augmentation. We also apply gradient-based meta-learning to account for non-stationarity of financial time-series. Our extensive experiments notwithstanding, the surprising result is that the standard ARIMA model outperforms deep-learning even using data augmentation or meta-learning. We conclude by speculating as to why this might be the case.
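For reference, the winning ARIMA baseline is a few lines with statsmodels; the order (2, 1, 1) and the random-walk stand-in series below are arbitrary illustrative choices, not the paper's configuration.
```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Fit a standard ARIMA(2,1,1) on a toy price series and forecast 10 steps ahead.
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, size=500))  # random-walk stand-in for prices
model = ARIMA(prices, order=(2, 1, 1)).fit()
forecast = model.forecast(steps=10)
print(forecast)
```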
符号|符号学习(2篇)
【1】 SILG: The Multi-environment Symbolic Interactive Language Grounding Benchmark 标题:SILG:多环境符号交互语言接地基准 链接:https://arxiv.org/abs/2110.10661
作者:Victor Zhong,Austin W. Hanjie,Sida I. Wang,Karthik Narasimhan,Luke Zettlemoyer 机构:Department of Computer Science, University of Washington, Department of Computer Science, Princeton University, Facebook AI Research 备注:NeurIPS 2021. 14 pages, 8 figures 摘要:语言接地(language grounding)的现有工作通常只研究单一环境。我们如何构建跨多个环境适用的统一模型?我们提出了多环境符号交互语言接地基准(SILG),它在一个共同的界面下统一了一系列不同的接地语言学习环境。SILG由网格世界环境组成,这些环境需要泛化到新的动力学、实体和部分观测的世界(RTFM、Messenger、NetHack),以及需要针对复杂场景解释丰富自然语言的视觉世界的符号对应物(ALFWorld、Touchdown)。总之,这些环境在观测空间的丰富度、动作空间、语言规范和规划复杂性方面提供了多样的接地挑战。此外,我们提出了第一个在这些环境上进行RL的共享模型架构,并使用SILG评估了以自我为中心的局部卷积、循环状态跟踪、以实体为中心的注意力以及预训练LM等最新进展。我们的共享体系结构实现了与特定于环境的体系结构相当的性能。此外,我们发现,许多最新的建模进展并没有在其设计环境以外的环境中带来显著的收益。这突出了对多环境基准的需要。最后,最好的模型在SILG上的表现明显逊于人类,这意味着未来的工作有足够的空间。我们希望SILG能够使社区快速确定新的语言接地方法,从而推广到各种环境及其相关挑战。 摘要:Existing work in language grounding typically study single environments. How do we build unified models that apply across multiple environments? We propose the multi-environment Symbolic Interactive Language Grounding benchmark (SILG), which unifies a collection of diverse grounded language learning environments under a common interface. SILG consists of grid-world environments that require generalization to new dynamics, entities, and partially observed worlds (RTFM, Messenger, NetHack), as well as symbolic counterparts of visual worlds that require interpreting rich natural language with respect to complex scenes (ALFWorld, Touchdown). Together, these environments provide diverse grounding challenges in richness of observation space, action space, language specification, and plan complexity. In addition, we propose the first shared model architecture for RL on these environments, and evaluate recent advances such as egocentric local convolution, recurrent state-tracking, entity-centric attention, and pretrained LM using SILG. Our shared architecture achieves comparable performance to environment-specific architectures. Moreover, we find that many recent modelling advances do not result in significant gains on environments other than the one they were designed for. This highlights the need for a multi-environment benchmark. Finally, the best models significantly underperform humans on SILG, which suggests ample room for future work. We hope SILG enables the community to quickly identify new methodologies for language grounding that generalize to a diverse set of environments and their associated challenges.
【2】 More Efficient Exploration with Symbolic Priors on Action Sequence Equivalences 标题:用符号先验更有效地探索动作序列等价性 链接:https://arxiv.org/abs/2110.10632
作者:Toby Johnstone,Nathan Grinsztajn,Johan Ferret,Philippe Preux 机构:Inria, Scool Team, Ecole Polytechnique, CRIStAL, CNRS, Université de Lille, Google Research, Brain Team 摘要:在强化学习算法中融入先验知识在很大程度上仍是一个悬而未决的问题。即使有关于环境动态的洞察,强化学习传统上也是在白板(tabula rasa)设置下使用的,必须从头开始探索和学习一切。在本文中,我们考虑利用动作序列等价性先验的问题:即不同的动作序列产生相同效果的情形。我们提出了一种新的局部探索策略,其经过校准以最小化碰撞并最大化新状态的访问。我们证明,通过求解一个凸优化问题,可以以很小的代价计算该策略。通过替换DQN中常用的epsilon贪婪策略,我们在若干具有不同动态结构的环境中展示了它的潜力。 摘要:Incorporating prior knowledge in reinforcement learning algorithms is mainly an open question. Even when insights about the environment dynamics are available, reinforcement learning is traditionally used in a tabula rasa setting and must explore and learn everything from scratch. In this paper, we consider the problem of exploiting priors about action sequence equivalence: that is, when different sequences of actions produce the same effect. We propose a new local exploration strategy calibrated to minimize collisions and maximize new state visitations. We show that this strategy can be computed at little cost, by solving a convex optimization problem. By replacing the usual epsilon-greedy strategy in a DQN, we demonstrate its potential in several environments with various dynamic structures.
分层学习(2篇)
【1】 Synthesizing Optimal Parallelism Placement and Reduction Strategies on Hierarchical Systems for Deep Learning 标题:深度学习层次化系统的最优并行度放置和归约策略综合 链接:https://arxiv.org/abs/2110.10548
作者:Ningning Xie,Tamara Norman,Dominik Grewe,Dimitrios Vytiniotis 机构:University of Cambridge, DeepMind 备注:Submitted to the 5th MLSys Conference 摘要:我们提出了一种将多种并行形式(如数据并行和模型并行)映射到分层加速器系统的新表征,该表征具有层次感知能力,并大大缩减了软件到硬件映射的空间。我们通过实验验证了这些映射对all-reduce性能的实质性影响(高达448x)。我们提供了一种新的语法引导程序综合框架,能够以层次和映射感知的方式,将一个或多个并行轴上的归约分解为集合通信(collectives)序列。对于69%的并行放置和用户请求的归约,我们的框架综合出的程序在不同GPU层次结构上求值时优于默认的all-reduce实现(最大2.04x,平均1.27x)。我们用一个top-10准确率超过90%的模拟器来补充我们的综合工具,从而减少了为确定一小部分最优程序和映射而对综合结果进行大规模评估的需要。 摘要:We present a novel characterization of the mapping of multiple parallelism forms (e.g. data and model parallelism) onto hierarchical accelerator systems that is hierarchy-aware and greatly reduces the space of software-to-hardware mapping. We experimentally verify the substantial effect of these mappings on all-reduce performance (up to 448x). We offer a novel syntax-guided program synthesis framework that is able to decompose reductions over one or more parallelism axes to sequences of collectives in a hierarchy- and mapping-aware way. For 69% of parallelism placements and user requested reductions, our framework synthesizes programs that outperform the default all-reduce implementation when evaluated on different GPU hierarchies (max 2.04x, average 1.27x). We complement our synthesis tool with a simulator exceeding 90% top-10 accuracy, which therefore reduces the need for massive evaluations of synthesis results to determine a small set of optimal programs and mappings.
【2】 Knowledge distillation from language model to acoustic model: a hierarchical multi-task learning approach 标题:从语言模型到声学模型的知识蒸馏:一种分层多任务学习方法 链接:https://arxiv.org/abs/2110.10429
作者:Mun-Hak Lee,Joon-Hyuk Chang 机构:Department of Electronics Engineering, Hanyang University, Seoul, Republic of Korea 备注:4 pages, 1 page for citations, 2 pages for appendix 摘要:使用自监督学习的预训练语言模型(LM)的显著性能导致了自然语言处理研究的重大范式转变。顺应这些变化,利用基于大规模深度学习的LM提升语音识别系统的性能是语音识别研究的一个主要课题。在将LM应用于语音识别系统的各种方法中,本文重点研究一种跨模态知识蒸馏方法,该方法可以在两种不同模态的深度神经网络之间传递知识。我们提出了一种具有多个辅助输出层的声学模型结构用于跨模态蒸馏,并证明所提出的方法有效地弥补了现有基于标签插值的蒸馏方法的不足。此外,我们将所提出的方法扩展为一种分层蒸馏方法,使用以不同单元(senone、monophone和subword)训练的LM,并通过消融研究揭示了分层蒸馏方法的有效性。 摘要:The remarkable performance of the pre-trained language model (LM) using self-supervised learning has led to a major paradigm shift in the study of natural language processing. In line with these changes, leveraging the performance of speech recognition systems with massive deep learning-based LMs is a major topic of speech recognition research. Among the various methods of applying LMs to speech recognition systems, in this paper, we focus on a cross-modal knowledge distillation method that transfers knowledge between two types of deep neural networks with different modalities. We propose an acoustic model structure with multiple auxiliary output layers for cross-modal distillation and demonstrate that the proposed method effectively compensates for the shortcomings of the existing label-interpolation-based distillation method. In addition, we extend the proposed method to a hierarchical distillation method using LMs trained in different units (senones, monophones, and subwords) and reveal the effectiveness of the hierarchical distillation method through an ablation study.
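The multi-head distillation objective can be sketched as below, assuming each auxiliary acoustic-model output is paired with an LM posterior at the matching unit granularity (monophone/senone/subword); the temperature, weighting and plain soft-label KL are generic knowledge-distillation choices, not necessarily the paper's exact formulation.
```python
import torch
import torch.nn.functional as F

def hierarchical_kd_loss(aux_logits, lm_posteriors, hard_targets, T=2.0, alpha=0.5):
    """Multi-level distillation: each auxiliary output layer of the acoustic model
    mimics an LM posterior at its own unit granularity, plus the usual CE loss.
    aux_logits / lm_posteriors: matched lists, e.g. (monophone, senone, subword)."""
    kd = sum(F.kl_div(F.log_softmax(z / T, dim=-1), p, reduction="batchmean") * T * T
             for z, p in zip(aux_logits, lm_posteriors)) / len(aux_logits)
    ce = F.cross_entropy(aux_logits[-1], hard_targets)  # supervised loss on final head
    return alpha * kd + (1 - alpha) * ce

# Toy usage: two granularities with 40 and 500 units, batch of 8 frames.
aux = [torch.randn(8, 40), torch.randn(8, 500)]
lm = [torch.softmax(torch.randn(8, 40), -1), torch.softmax(torch.randn(8, 500), -1)]
loss = hierarchical_kd_loss(aux, lm, hard_targets=torch.randint(0, 500, (8,)))
```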
医学相关(6篇)
【1】 Using NASA Satellite Data Sources and Geometric Deep Learning to Uncover Hidden Patterns in COVID-19 Clinical Severity 标题:利用NASA卫星数据源和几何深度学习揭示冠状病毒临床严重程度的隐藏模式 链接:https://arxiv.org/abs/2110.10849
作者:Ignacio Segovia-Dominguez,Huikyo Lee,Zhiwei Zhen,Yuzhou Chen,Michael Garay,Daniel Crichton,Rishabh Wagh,Yulia R. Gel 机构:NASA Jet Propulsion Lab, UT Dallas, Princeton University 备注:Main Paper and Appendix 摘要:正如2021年发生的多起不良事件所表明的那样,我们社会功能的几乎所有方面——从水和食品安全到能源供应到医疗保健——比以往任何时候都更加依赖于环境因素的动态。然而,由于缺乏可靠且易于使用的数据,机器学习社区对天气和气候的社会层面的探索明显较少。在这里,我们介绍了一个独特的、尚未广泛可用的NASA卫星数据集,涵盖气溶胶光学厚度(AOD)、温度和相对湿度,并讨论了这些新数据对新冠病毒-19生物监测的效用。特别是,我们使用几何深度学习模型在美国本土(contiguous United States)的县级层面进行半监督分类,研究了一个紧迫的社会问题:大气变量是否对新冠病毒-19的临床严重性有相当大的影响。 摘要:As multiple adverse events in 2021 illustrated, virtually all aspects of our societal functioning -- from water and food security to energy supply to healthcare -- more than ever depend on the dynamics of environmental factors. Nevertheless, the social dimensions of weather and climate are noticeably less explored by the machine learning community, largely, due to the lack of reliable and easy access to use data. Here we present a unique not yet broadly available NASA's satellite dataset on aerosol optical depth (AOD), temperature and relative humidity and discuss the utility of these new data for COVID-19 biosurveillance. In particular, using the geometric deep learning models for semi-supervised classification on a county-level basis over the contiguous United States, we investigate the pressing societal question whether atmospheric variables have considerable impact on COVID-19 clinical severity.
【2】 CXR-Net: An Encoder-Decoder-Encoder Multitask Deep Neural Network for Explainable and Accurate Diagnosis of COVID-19 pneumonia with Chest X-ray Images 标题:CXR-Net:一种用于基于胸部X射线图像的可解释且准确的新冠肺炎诊断的编码器-解码器-编码器多任务深度神经网络 链接:https://arxiv.org/abs/2110.10813
作者:Xin Zhang,Liangxiu Han,Tam Sobeih,Lianghao Han,Nina Dempsey,Symeon Lechareas,Ascanio Tridente,Haoming Chen,Stephen White 机构:Manchester Metropolitan University 摘要:准确、快速地检测新冠肺炎对优化患者治疗至关重要。胸部X射线(CXR)是诊断新冠肺炎的第一线影像学检查,因为它快速、廉价且容易获得。受计算机视觉中深度学习(DL)成功的启发,许多DL模型被提出用于使用CXR图像检测新冠肺炎。不幸的是,这些深度分类器在解释所得结果时缺乏透明度,这可能会限制它们在临床实践中的应用。现有常用的视觉解释方法要么过于嘈杂,要么不精确、分辨率低,因此不适合用于诊断目的。在这项工作中,我们提出了一种新的可解释深度学习框架(CXRNet),用于准确检测新冠肺炎,并从CXR图像中提供增强的像素级视觉解释。该框架基于一种新的编码器-解码器-编码器多任务体系结构,同时支持疾病分类和可视化解释。该方法已在来自公共和私有数据源的真实CXR数据集上进行了评估,包括健康、细菌性肺炎、病毒性肺炎和新冠肺炎病例。实验结果表明,所提出的方法能够达到令人满意的准确度,并为肺部疾病检测中的可视化解释提供精细分辨率的分类激活图。新冠肺炎的平均准确度、精密度、召回率和F1评分分别达到0.879、0.985、0.992和0.989。我们还发现,使用肺部分割后的CXR图像有助于提高模型的性能。与目前最先进的可视化解释方法相比,本文提出的方法可以为分类决策提供更详细的高分辨率可视化解释,在临床上用于新冠肺炎诊断具有很大的潜力。 摘要:Accurate and rapid detection of COVID-19 pneumonia is crucial for optimal patient treatment. Chest X-Ray (CXR) is the first line imaging test for COVID-19 pneumonia diagnosis as it is fast, cheap and easily accessible. Inspired by the success of deep learning (DL) in computer vision, many DL-models have been proposed to detect COVID-19 pneumonia using CXR images. Unfortunately, these deep classifiers lack the transparency in interpreting findings, which may limit their applications in clinical practice. The existing commonly used visual explanation methods are either too noisy or imprecise, with low resolution, and hence are unsuitable for diagnostic purposes. In this work, we propose a novel explainable deep learning framework (CXRNet) for accurate COVID-19 pneumonia detection with an enhanced pixel-level visual explanation from CXR images. The proposed framework is based on a new Encoder-Decoder-Encoder multitask architecture, allowing for both disease classification and visual explanation. The method has been evaluated on real world CXR datasets from both public and private data sources, including: healthy, bacterial pneumonia, viral pneumonia and COVID-19 pneumonia cases. The experimental results demonstrate that the proposed method can achieve a satisfactory level of accuracy and provide fine-resolution classification activation maps for visual explanation in lung disease detection. The Average Accuracy, the Precision, Recall and F1-score of COVID-19 pneumonia reached 0.879, 0.985, 0.992 and 0.989, respectively. We have also found that using lung segmented (CXR) images can help improve the performance of the model. The proposed method can provide more detailed high resolution visual explanation for the classification decision, compared to current state-of-the-art visual explanation methods and has a great potential to be used in clinical practice for COVID-19 pneumonia diagnosis.
【3】 Predicting Tau Accumulation in Cerebral Cortex with Multivariate MRI Morphometry Measurements, Sparse Coding, and Correntropy 标题:用多变量MRI形态测量、稀疏编码和相关熵预测Tau在大脑皮层的积聚 链接:https://arxiv.org/abs/2110.10709
作者:Jianfeng Wu,Wenhui Zhu,Yi Su,Jie Gui,Natasha Lepore,Eric M. Reiman,Richard J. Caselli,Paul M. Thompson,Kewei Chen,Yalin Wang 机构:a School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, USA; b Banner Alzheimer’s Institute, Phoenix, USA; c School of Cyber Science and 备注:10 pages, 5 figures, 17th International Symposium on Medical Information Processing and Analysis 摘要:生物标志物辅助诊断和干预阿尔茨海默病(AD)可能是预防突破的关键。AD的特征之一是大脑中tau斑块的积累。然而,目前检测tau病理的方法要么是侵入性的(腰椎穿刺),要么是非常昂贵且不广泛使用的(tau-PET)。在我们之前的工作中,基于结构MRI的海马多变量形态计量统计(MMS)作为临床前AD的有效神经退行性生物标记物表现出优越的性能,而基于斑块分析的表面相关熵诱导稀疏编码与最大池化(PASCS-MP)则具有生成低维表示的出色能力,统计功效强,可用于脑淀粉样蛋白预测。在这项工作中,我们应用这个框架和岭回归模型分别预测Braak12和Braak34脑区的Tau沉积。我们对来自阿尔茨海默病神经成像倡议(ADNI)的925名受试者评估了我们的框架。每个受试者都有一对由PET图像和MRI扫描组成的图像,这些图像是在大约相同的时间采集的。实验结果表明,我们的MMS和PASCS-MP表示具有更强的预测能力,与其他方法(如海马表面积和体积)以及基于球谐函数的形状形态测量特征(SPHARM)相比,其预测的Braak12和Braak34更接近真实值。 摘要:Biomarker-assisted diagnosis and intervention in Alzheimer's disease (AD) may be the key to prevention breakthroughs. One of the hallmarks of AD is the accumulation of tau plaques in the human brain. However, current methods to detect tau pathology are either invasive (lumbar puncture) or quite costly and not widely available (Tau PET). In our previous work, structural MRI-based hippocampal multivariate morphometry statistics (MMS) showed superior performance as an effective neurodegenerative biomarker for preclinical AD and Patch Analysis-based Surface Correntropy-induced Sparse coding and max-pooling (PASCS-MP) has excellent ability to generate low-dimensional representations with strong statistical power for brain amyloid prediction. In this work, we apply this framework together with ridge regression models to predict Tau deposition in Braak12 and Braak34 brain regions separately. We evaluate our framework on 925 subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Each subject has one pair consisting of a PET image and MRI scan which were collected at about the same times. Experimental results suggest that the representations from our MMS and PASCS-MP have stronger predictive power and their predicted Braak12 and Braak34 are closer to the real values compared to the measures derived from other approaches such as hippocampal surface area and volume, and shape morphometry features based on spherical harmonics (SPHARM).
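下面是一个示意性草图,演示摘要所述"用岭回归基于低维表示预测Tau沉积"这一步的一般形式;其中特征矩阵与标签均为随机占位数据(真实输入应为PASCS-MP表示与Tau PET测量值),特征维度与正则化强度为假设。

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_subjects, n_features = 925, 64              # 受试者数与摘要一致,特征维度为假设
X = rng.normal(size=(n_subjects, n_features)) # 占位:PASCS-MP 低维表示
y = X @ rng.normal(size=n_features) + 0.1 * rng.normal(size=n_subjects)  # 占位:Braak12 的 Tau 测量

model = Ridge(alpha=1.0)                      # 摘要中所述的岭回归模型
y_pred = cross_val_predict(model, X, y, cv=5) # 交叉验证下的预测值
rmse = np.sqrt(np.mean((y - y_pred) ** 2))
corr = np.corrcoef(y, y_pred)[0, 1]
print(f"RMSE={rmse:.3f}, r={corr:.3f}")       # 衡量预测值与真实值的接近程度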
【4】 OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data 标题:OSS-Net:高效存储的三维医学数据高分辨率语义分割 链接:https://arxiv.org/abs/2110.10640
作者:Christoph Reich,Tim Prangemeier,Özdemir Cetin,Heinz Koeppl 机构:Centre for Synthetic Biology, Department of Electrical Engineering, and Information Technology, Department of Biology, Technische Universität Darmstadt 备注:BMVC 2021 (accepted), this https URL (code) 摘要:卷积神经网络(CNN)是目前最先进的元算法,用于医学数据的体积分割,例如,在计算机断层扫描上定位新冠病毒-19感染的组织或在磁共振成像中检测肿瘤体积。3D CNN对体素化数据的一个关键限制是,内存消耗随着训练数据分辨率的提高而呈立方体增长。占用网络(O-Net)是一种替代方案,在函数空间中连续表示数据,并将3D形状作为连续决策边界进行学习。虽然O-Net比3D CNN具有更高的内存效率,但它们仅限于简单的形状,推理速度相对较慢,并且尚未适应医学数据的3D语义分割。在这里,我们提出了用于语义分割的占用网络(OSS-Net),以准确且高效地分割3D医疗数据。我们在原始O-Net的基础上进行了修改,以提高表达能力,从而获得与3D CNN相当的分割性能,并进行了修改以加快推理速度。我们利用局部观测来表示复杂的形状和先前的编码器预测来加速推理。我们根据函数空间基线(O-Net)、性能基线(3D残差U-Net)和效率基线(2D残差U-Net),展示了OSS-Net在3D脑瘤和肝脏分割方面的性能。OSS-Net产生的分割结果与性能基线相当,优于函数空间和效率基线。在内存效率方面,OSS-Net消耗的内存量与函数空间基线相当,略高于效率基线,显著低于性能基线。因此,OSS-Net能够实现高效、准确的3D语义分割,可以扩展到高分辨率。 摘要:Convolutional neural networks (CNNs) are the current state-of-the-art meta-algorithm for volumetric segmentation of medical data, for example, to localize COVID-19 infected tissue on computer tomography scans or the detection of tumour volumes in magnetic resonance imaging. A key limitation of 3D CNNs on voxelised data is that the memory consumption grows cubically with the training data resolution. Occupancy networks (O-Nets) are an alternative for which the data is represented continuously in a function space and 3D shapes are learned as a continuous decision boundary. While O-Nets are significantly more memory efficient than 3D CNNs, they are limited to simple shapes, are relatively slow at inference, and have not yet been adapted for 3D semantic segmentation of medical data. Here, we propose Occupancy Networks for Semantic Segmentation (OSS-Nets) to accurately and memory-efficiently segment 3D medical data. We build upon the original O-Net with modifications for increased expressiveness leading to improved segmentation performance comparable to 3D CNNs, as well as modifications for faster inference. We leverage local observations to represent complex shapes and prior encoder predictions to expedite inference. We showcase OSS-Net's performance on 3D brain tumour and liver segmentation against a function space baseline (O-Net), a performance baseline (3D residual U-Net), and an efficiency baseline (2D residual U-Net). OSS-Net yields segmentation results similar to the performance baseline and superior to the function space and efficiency baselines. In terms of memory efficiency, OSS-Net consumes comparable amounts of memory as the function space baseline, somewhat more memory than the efficiency baseline and significantly less than the performance baseline. As such, OSS-Net enables memory-efficient and accurate 3D semantic segmentation that can scale to high resolutions.
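为说明"在函数空间中连续表示、按坐标查询"为何内存随分辨率增长缓慢,下面给出一个极简的占用式分割网络草图:以连续三维坐标加条件向量为输入,输出该点的类别 logits。网络规模、条件向量来源均为假设,且省略了 OSS-Net 的局部观测与编码器先验预测机制。

import torch
import torch.nn as nn

class OccupancySegNet(nn.Module):
    # 示意:对任意连续坐标查询类别,内存不随体素分辨率立方增长
    def __init__(self, cond_dim=128, n_class=2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + cond_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, n_class),
        )
    def forward(self, xyz, cond):
        # xyz: (B, N, 3) 连续坐标;cond: (B, cond_dim) 来自体数据编码器的条件向量
        cond = cond.unsqueeze(1).expand(-1, xyz.size(1), -1)
        return self.mlp(torch.cat([xyz, cond], dim=-1))

net = OccupancySegNet()
xyz = torch.rand(2, 4096, 3) * 2 - 1   # 查询点可按需加密:想查多细就采样多少点
cond = torch.randn(2, 128)             # 占位条件向量
logits = net(xyz, cond)                # (2, 4096, n_class)
print(logits.shape)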
【5】 Medical Knowledge-Guided Deep Curriculum Learning for Elbow Fracture Diagnosis from X-Ray Images 标题:医学知识指导下的肘部骨折X线影像诊断深度课程学习 链接:https://arxiv.org/abs/2110.10381
作者:Jun Luo,Gene Kitamura,Emine Doganay,Dooman Arefan,Shandong Wu 机构:Intelligent Systems Program, University of Pittsburgh, Fifth Avenue, Pittsburgh, PA, Dept. of Radiology, University of Pittsburgh, Fifth Avenue, Pittsburgh, PA, USA, Dept. of Biomedical Informatics, University of Pittsburgh, Fifth Avenue, Pittsburgh 备注:None 摘要:肘关节骨折是最常见的骨折类型之一。肘部骨折的诊断通常需要经过多年训练的专业放射科医生通过放射成像进行阅读和分析。得益于深度学习的最新进展,一个能够分类和检测不同类型骨折的模型只需要数小时的训练,并已显示出有希望的结果。然而,大多数现有的深度学习模型都是纯数据驱动的,缺乏来自人类专家的已知领域知识。在这项工作中,我们提出了一种新的深度学习方法,通过将特定领域的医学知识整合到课程学习框架中,从肘部X射线图像诊断肘部骨折。在我们的方法中,每个训练轮次开始时,通过无放回采样对训练数据进行重新排列。每个训练样本的抽样概率由基于人类专家的临床已知知识构建的评分标准指导,其中评分指示不同肘关节骨折亚型的诊断难度。我们还提出了一种在每个轮次更新抽样概率的算法,该算法适用于其他基于抽样的课程学习框架。我们设计了一个实验,使用1865个肘关节X射线图像进行骨折/正常二值分类,并将我们提出的方法与基线方法和以前使用多个度量的方法进行比较。结果表明,该方法具有最高的分类性能。此外,我们提出的概率更新算法提高了前一种方法的性能。 摘要:Elbow fractures are one of the most common fracture types. Diagnoses on elbow fractures often need the help of radiographic imaging to be read and analyzed by a specialized radiologist with years of training. Thanks to the recent advances of deep learning, a model that can classify and detect different types of bone fractures needs only hours of training and has shown promising results. However, most existing deep learning models are purely data-driven, lacking incorporation of known domain knowledge from human experts. In this work, we propose a novel deep learning method to diagnose elbow fracture from elbow X-ray images by integrating domain-specific medical knowledge into a curriculum learning framework. In our method, the training data are permutated by sampling without replacement at the beginning of each training epoch. The sampling probability of each training sample is guided by a scoring criterion constructed based on clinically known knowledge from human experts, where the scoring indicates the diagnosis difficultness of different elbow fracture subtypes. We also propose an algorithm that updates the sampling probabilities at each epoch, which is applicable to other sampling-based curriculum learning frameworks. We design an experiment with 1865 elbow X-ray images for a fracture/normal binary classification task and compare our proposed method to a baseline method and a previous method using multiple metrics. Our results show that the proposed method achieves the highest classification performance.Also, our proposed probability update algorithm boosts the performance of the previous method.
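下面是一个示意性草图,演示"每轮按难度评分做无放回加权采样、并逐轮更新采样概率"的课程学习骨架;难度评分、初始概率与更新规则均为假设的占位形式,并非论文的原始公式。

import numpy as np

rng = np.random.default_rng(0)
n = 1865                                    # 训练样本数(与摘要中的X射线数量一致)
difficulty = rng.uniform(0.1, 1.0, size=n)  # 假设:由临床知识构造的各亚型诊断难度评分

def sample_epoch(prob, rng):
    # 每个 epoch 开始时按概率做无放回采样,得到本轮的样本顺序
    prob = prob / prob.sum()
    return rng.choice(n, size=n, replace=False, p=prob)

prob = np.exp(-difficulty)                  # 假设:先易后难,难度越低初始概率越高
for epoch in range(10):
    order = sample_epoch(prob, rng)
    # ... 按 order 遍历训练数据,正常完成一个 epoch 的训练 ...
    # 逐轮更新采样概率,使困难样本的权重随训练推进而增大(更新规则为示意)
    prob = prob * np.exp(0.1 * difficulty)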
【6】 A New Automatic Change Detection Frame-work Based on Region Growing and Weighted Local Mutual Information: Analysis of Breast Tumor Response to Chemotherapy in Serial MR Images 标题:一种新的基于区域生长和加权局部互信息的自动变化检测框架:乳腺肿瘤对化疗反应的序列磁共振图像分析 链接:https://arxiv.org/abs/2110.10242
作者:Narges Norouzi,Reza Azmi,Nooshin Noshiri,Robab Anbiaee 机构:Alzahra University, The University of Winnipeg 备注:18 pages, 16 figures, 14 tables 摘要:自动分析纵向MR图像之间的细微变化是一项重要任务,因为它仍然是乳腺医学图像处理领域的一个具有挑战性的问题。在本文中,我们提出了一个由两个阶段组成的有效的自动变化检测框架,因为以前使用的方法具有低区分能力的特点。首先,在预处理阶段,提出了一种基于层次直方图匹配(HHM)的强度归一化方法,该方法比以前的方法对噪声具有更强的鲁棒性。为了消除不必要的变化并提取包含显著变化的区域,提出了基于强度分布和爬山算法的提取变化区域(EROC)方法。其次,在检测阶段,建议使用基于区域增长的方法来区分重大变化和非真实变化。由于使用了加权局部互信息(WLMI)方法来提取高层特征,并且利用了变化的局部一致性原则,因此该方法具有合理的性能。在模拟和真实纵向乳腺MR图像上的实验结果证实了该框架的有效性。此外,在某些情况下,该框架优于人类专家,可以检测专家遗漏的许多病变演变。 摘要:The automatic analysis of subtle changes between longitudinal MR images is an important task as it is still a challenging issue in scope of the breast medical image processing. In this paper we propose an effective automatic change detection framework composed of two phases since previously used methods have features with low distinctive power. First, in the preprocessing phase an intensity normalization method is suggested based on Hierarchical Histogram Matching (HHM) that is more robust to noise than previous methods. To eliminate undesirable changes and extract the regions containing significant changes the proposed Extraction Region of Changes (EROC) method is applied based on intensity distribution and Hill-Climbing algorithm. Second, in the detection phase a region growing-based approach is suggested to differentiate significant changes from unreal ones. Due to using proposed Weighted Local Mutual Information (WLMI) method to extract high level features and also utilizing the principle of the local consistency of changes, the proposed approach enjoys reasonable performance. The experimental results on both simulated and real longitudinal Breast MR Images confirm the effectiveness of the proposed framework. Also, this framework outperforms the human expert in some cases which can detect many lesion evolutions that are missed by expert.
推荐(3篇)
【1】 Personalized Transfer of User Preferences for Cross-domain Recommendation 标题:用于跨域推荐的用户偏好的个性化传递 链接:https://arxiv.org/abs/2110.11154
作者:Yongchun Zhu,Zhenwei Tang,Yudan Liu,Fuzhen Zhuang,Ruobing Xie,Xu Zhang,Leyu Lin,Qing He 机构:Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing, Technology, CAS, Beijing , China, Institute of Artificial Intelligence, Beihang University, Beijing , China 备注:Accepted by WSDM 2022 摘要:冷启动问题在推荐系统中仍然是一个非常具有挑战性的问题。幸运的是,辅助源域中冷启动用户的交互可以帮助目标域中的冷启动建议。如何将用户的偏好从源域转移到目标域,是跨域推荐(CDR)中的关键问题,CDR是解决冷启动问题的一种很有前途的方法。大多数现有的方法都建立了一个通用的偏好桥来传递所有用户的偏好。直观地说,由于不同用户的偏好不同,不同用户的偏好桥梁应该不同。基于此,我们提出了一种新的跨域推荐个性化用户偏好传递框架(PTUPCDR)。具体地说,学习一个由用户特征嵌入提供反馈的元网络来生成个性化的桥接函数,以实现每个用户的个性化偏好传递。为了稳定地学习元网络,我们采用了面向任务的优化过程。通过元生成的个性化桥接功能,可以将源域中的用户偏好嵌入转化为目标域,转化后的用户偏好嵌入可以作为冷启动用户在目标域中的初始嵌入。使用大量真实数据集,我们进行了大量实验,以评估PTUPCDR在冷启动和热启动阶段的有效性。代码已发布于 https://github.com/easezyc/WSDM2022-PTUPCDR 。 摘要:Cold-start problem is still a very challenging problem in recommender systems. Fortunately, the interactions of the cold-start users in the auxiliary source domain can help cold-start recommendations in the target domain. How to transfer user's preferences from the source domain to the target domain, is the key issue in Cross-domain Recommendation (CDR) which is a promising solution to deal with the cold-start problem. Most existing methods model a common preference bridge to transfer preferences for all users. Intuitively, since preferences vary from user to user, the preference bridges of different users should be different. Along this line, we propose a novel framework named Personalized Transfer of User Preferences for Cross-domain Recommendation (PTUPCDR). Specifically, a meta network fed with users' characteristic embeddings is learned to generate personalized bridge functions to achieve personalized transfer of preferences for each user. To learn the meta network stably, we employ a task-oriented optimization procedure. With the meta-generated personalized bridge function, the user's preference embedding in the source domain can be transformed into the target domain, and the transformed user preference embedding can be utilized as the initial embedding for the cold-start user in the target domain. Using large real-world datasets, we conduct extensive experiments to evaluate the effectiveness of PTUPCDR on both cold-start and warm-start stages. The code has been made available at https://github.com/easezyc/WSDM2022-PTUPCDR.
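下面给出"元网络按用户特征生成个性化桥接函数"这一核心思想的示意性草图:元网络输出一个展平的映射矩阵,用它把源域用户嵌入映射到目标域作为冷启动初始嵌入。网络结构与维度均为假设,仅演示机制,并非论文原始实现。

import torch
import torch.nn as nn

class MetaBridge(nn.Module):
    # 示意:读入用户特征嵌入,生成该用户专属的偏好桥接矩阵
    def __init__(self, emb_dim=32, char_dim=32):
        super().__init__()
        self.meta = nn.Sequential(
            nn.Linear(char_dim, 64), nn.ReLU(),
            nn.Linear(64, emb_dim * emb_dim),   # 输出展平的桥接矩阵
        )
        self.emb_dim = emb_dim

    def forward(self, user_char, src_emb):
        # user_char: (B, char_dim) 用户特征嵌入(如由交互序列聚合得到)
        # src_emb:   (B, emb_dim)  源域中已学到的用户偏好嵌入
        W = self.meta(user_char).view(-1, self.emb_dim, self.emb_dim)
        return torch.bmm(W, src_emb.unsqueeze(-1)).squeeze(-1)  # 个性化映射后的目标域嵌入

bridge = MetaBridge()
user_char = torch.randn(8, 32)
src_emb = torch.randn(8, 32)
tgt_init = bridge(user_char, src_emb)   # (8, 32),供目标域冷启动用户作初始嵌入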
【2】 Sequential Modeling with Multiple Attributes for Watchlist Recommendation in E-Commerce 标题:电子商务中监视列表推荐的多属性序贯建模 链接:https://arxiv.org/abs/2110.11072
作者:Uriel Singer,Haggai Roitman,Yotam Eshel,Alexander Nus,Ido Guy,Or Levi,Idan Hasson,Eliyahu Kiperwasser 机构:eBay Research, Israel; Facebook 备注:None 摘要:在电子商务中,观察列表使用户能够随着时间的推移跟踪商品,并已成为一项主要功能,在用户的购物过程中发挥着重要作用。观察列表项通常具有多个属性,其值可能随时间而变化(例如,价格、数量)。由于许多用户在他们的观察列表上累积了几十个项目,而且购物意图随着时间的推移而变化,因此在给定的上下文中推荐排名靠前的观察列表项目可能是有价值的。在这项工作中,我们研究了电子商务中的观察列表功能,并引入了一个新的观察列表推荐任务。我们的目标是通过预测用户将单击的下一个项目来确定用户下一步应该注意哪些观察列表项目的优先级。我们将此任务转换为一个特殊的顺序推荐任务,并讨论其特征。我们提出的推荐模型Trans2D是建立在Transformer架构之上的,在Transformer架构中,我们进一步提出了一种新的扩展注意机制(Attention2D),该机制允许从具有多个项目属性的序列数据中学习复杂的项目、属性和项目属性模式。使用来自eBay的大规模观察列表数据集,我们对我们提出的模型进行了评估,与多个最先进的基线相比,我们展示了它的优越性,其中许多基线适用于此任务。 摘要:In e-commerce, the watchlist enables users to track items over time and has emerged as a primary feature, playing an important role in users' shopping journey. Watchlist items typically have multiple attributes whose values may change over time (e.g., price, quantity). Since many users accumulate dozens of items on their watchlist, and since shopping intents change over time, recommending the top watchlist items in a given context can be valuable. In this work, we study the watchlist functionality in e-commerce and introduce a novel watchlist recommendation task. Our goal is to prioritize which watchlist items the user should pay attention to next by predicting the next items the user will click. We cast this task as a specialized sequential recommendation task and discuss its characteristics. Our proposed recommendation model, Trans2D, is built on top of the Transformer architecture, where we further suggest a novel extended attention mechanism (Attention2D) that allows to learn complex item-item, attribute-attribute and item-attribute patterns from sequential-data with multiple item attributes. Using a large-scale watchlist dataset from eBay, we evaluate our proposed model, where we demonstrate its superiority compared to multiple state-of-the-art baselines, many of which are adapted for this task.
【3】 A Real-Time Energy and Cost Efficient Vehicle Route Assignment Neural Recommender System 标题:一种实时节能高效的车辆路径分配神经推荐系统 链接:https://arxiv.org/abs/2110.10887
作者:Ayman Moawad,Zhijian Li,Ines Pancorbo,Krishna Murthy Gurumurthy,Vincent Freyermuth,Ehsan Islam,Ram Vijayagopal,Monique Stinson,Aymeric Rousseau 备注:14 pages, 11 figures 摘要:提出了一种基于能量和费用准则的车辆路径分配神经网络推荐系统算法。在这项工作中,我们应用这一新方法,从总拥有成本(TCO)的角度,针对给定的行程,有效地确定最具成本效益的中型和重型卡车(MDHDT)动力传动系统技术。我们采用一种基于机器学习的方法来有效地估计给定路线上各种候选车辆的能量消耗,这些路线被定义为路段序列(路段),而对内部动力学知之甚少,即使用高水平的宏观路线信息。然后开发一个完整的推荐逻辑,以便根据车队的运行约束,实时优化每条路线的分配。我们展示了如何使用该框架:(1)通过top-$k$车辆星级排名系统高效地提供单程推荐;(2)在需要将$n$辆车部署到$m \leq n$条行程上时,处理更一般的分配问题。这一新的分配系统已部署并集成到POLARIS交通系统仿真工具中,用于美国能源部交通加速研究系统与建模(SMART)移动联盟开展的研究。 摘要:This paper presents a neural network recommender system algorithm for assigning vehicles to routes based on energy and cost criteria. In this work, we applied this new approach to efficiently identify the most cost-effective medium and heavy duty truck (MDHDT) powertrain technology, from a total cost of ownership (TCO) perspective, for given trips. We employ a machine learning based approach to efficiently estimate the energy consumption of various candidate vehicles over given routes, defined as sequences of links (road segments), with little information known about internal dynamics, i.e using high level macroscopic route information. A complete recommendation logic is then developed to allow for real-time optimum assignment for each route, subject to the operational constraints of the fleet. We show how this framework can be used to (1) efficiently provide a single trip recommendation with a top-$k$ vehicles star ranking system, and (2) engage in more general assignment problems where $n$ vehicles need to be deployed over $m \leq n$ trips. This new assignment system has been deployed and integrated into the POLARIS Transportation System Simulation Tool for use in research conducted by the Department of Energy's Systems and Modeling for Accelerated Research in Transportation (SMART) Mobility Consortium.
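下面用一个示意性草图演示摘要所述的两种用法:基于预测成本矩阵的单程 top-$k$ 排名,以及 $m \leq n$ 情形下的全局指派(此处用匈牙利算法求解)。成本矩阵为随机占位数据,真实系统中应由能耗/TCO 预测模型给出。

import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n_vehicles, m_trips = 6, 4                       # m <= n:行程数不超过车辆数
tco = rng.uniform(10, 100, size=(n_vehicles, m_trips))  # 占位:车辆 i 跑行程 j 的预测 TCO

# (1) 单行程推荐:对某条行程给出 TCO 最低的 top-k 车辆排名
k, trip = 3, 0
top_k = np.argsort(tco[:, trip])[:k]
print("trip 0 的 top-k 车辆:", top_k)

# (2) 全局指派:在每辆车最多服务一条行程的约束下最小化总成本
rows, cols = linear_sum_assignment(tco)          # 匈牙利算法,支持 m < n 的矩形成本矩阵
print("车辆->行程:", list(zip(rows, cols)), "总成本:", tco[rows, cols].sum())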
自动驾驶|车辆|车道检测等(2篇)
【1】 A Utility Maximization Model of Pedestrian and Driver Interactions 标题:行人与驾驶员相互作用的效用最大化模型 链接:https://arxiv.org/abs/2110.11015
作者:Yi-Shin Lin,Aravinda Ramakrishnan Srinivasan,Matteo Leonetti,Jac Billington,Gustav Markkula 机构:Institute for Transport Studies, University of Leeds 备注:10 pages, 7 figures 摘要:许多模型考虑了道路使用者的交通流,但很少有模型考虑了局部相互作用的细节,以及它们如何恶化为安全临界情况。基于感觉运动控制的概念,我们开发了一个建模框架,应用效用最大化原理、运动原语和间歇动作决策来解释道路使用者之间交互行为的细节。该框架将这些原则与决策理论联系起来,并用于确定这种方法是否能够再现以下现象:当两个行人在交叉口道路上行驶时,(a)他们的相互作用对初始不对称性敏感,以及(b)基于此,他们通过调整自己的行为迅速解决冲突。当行人面对迎面而来的汽车横穿马路时,(c)任一道路使用者向另一方让步以解决他们的冲突,类似于行人互动,(d)结果揭示了与车辆加速性质相关的特定情境运动学。我们表明,当模型可以根据情况演化其参数时,这些现象自然地出现在我们的建模框架中。我们相信,建模框架和以现象为中心的分析为理解道路使用者交互提供了有希望的工具。最后,我们讨论了在道路使用者互动中加入其他变量时,该模型如何有助于研究安全临界情况。 摘要:Many models account for the traffic flow of road users but few take the details of local interactions into consideration and how they could deteriorate into safety-critical situations. Building on the concept of sensorimotor control, we develop a modeling framework applying the principles of utility maximization, motor primitives, and intermittent action decisions to account for the details of interactive behaviors among road users. The framework connects these principles to the decision theory and is applied to determine whether such an approach can reproduce the following phenomena: When two pedestrians travel on crossing paths, (a) their interaction is sensitive to initial asymmetries, and (b) based on which, they rapidly resolve collision conflict by adapting their behaviors. When a pedestrian crosses the road while facing an approaching car, (c) either road user yields to the other to resolve their conflict, akin to the pedestrian interaction, and (d) the outcome reveals a specific situational kinematics, associated with the nature of vehicle acceleration. We show that these phenomena emerge naturally from our modeling framework when the model can evolve its parameters as a consequence of the situations. We believe that the modeling framework and phenomenon-centered analysis offer promising tools to understand road user interactions. We conclude with a discussion on how the model can be instrumental in studying the safety-critical situations when including other variables in road-user interactions.
【2】 Learning to Remember Patterns: Pattern Matching Memory Networks for Traffic Forecasting 标题:学习记忆模式:用于交通预测的模式匹配记忆网络 链接:https://arxiv.org/abs/2110.10380
作者:Hyunwook Lee,Seungmin Jin,Hyeshin Chu,Hongkyu Lim,Sungahn Ko 机构:Ulsan National Institute of Science and Technology 备注:12 pages, Submitted as conference paper to ICLR 2022 摘要:交通预测是一个具有挑战性的问题,因为复杂的道路网络和道路上各种事件引起的突然速度变化。为了解决这一具有挑战性的问题,人们提出了许多模型,重点是学习道路的时空相关性。在这项工作中,我们提出了一个将预测问题转换为模式匹配任务的新视角,假设大数据可以由一组模式表示。为了评估新观点的有效性,我们设计了一种新的流量预测模型,称为模式匹配记忆网络(PM-MemNet),该模型学习将输入数据匹配到具有键值记忆结构的代表模式。我们首先提取和聚类代表性的流量模式,作为内存中的密钥。然后,PM-MemNet通过匹配提取的密钥和输入,从内存中获取现有交通模式的必要信息,并将其用于预测。为了模拟流量的时空相关性,我们提出了一种新的内存结构GCMem,它集成了注意力和图卷积来增强内存。实验结果表明,PM-MemNet比Graph WaveNet等最先进模型更准确,响应性也更高。我们还提供了定性分析结果,描述了PM-MemNet如何在道路速度快速变化时工作并实现更高的精度。 摘要:Traffic forecasting is a challenging problem due to complex road networks and sudden speed changes caused by various events on roads. A number of models have been proposed to solve this challenging problem with a focus on learning spatio-temporal dependencies of roads. In this work, we propose a new perspective of converting the forecasting problem into a pattern matching task, assuming that large data can be represented by a set of patterns. To evaluate the validness of the new perspective, we design a novel traffic forecasting model, called Pattern-Matching Memory Networks (PM-MemNet), which learns to match input data to the representative patterns with a key-value memory structure. We first extract and cluster representative traffic patterns, which serve as keys in the memory. Then via matching the extracted keys and inputs, PM-MemNet acquires necessary information of existing traffic patterns from the memory and uses it for forecasting. To model spatio-temporal correlation of traffic, we proposed novel memory architecture GCMem, which integrates attention and graph convolution for memory enhancement. The experiment results indicate that PM-MemNet is more accurate than state-of-the-art models, such as Graph WaveNet with higher responsiveness. We also present a qualitative analysis result, describing how PM-MemNet works and achieves its higher accuracy when road speed rapidly changes.
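下面是键值记忆读取机制的示意性草图:键由聚类得到的代表性交通模式充当,查询向量经注意力读出记忆中的模式信息。维度与初始化均为假设,且省略了论文中融合图卷积的 GCMem 结构,仅演示键值匹配这一步。

import torch
import torch.nn as nn
import torch.nn.functional as F

class PatternMemory(nn.Module):
    # 示意:键值记忆,键应由代表性交通模式的聚类中心初始化
    def __init__(self, n_patterns=16, key_dim=32, val_dim=32):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(n_patterns, key_dim))
        self.values = nn.Parameter(torch.randn(n_patterns, val_dim))

    def forward(self, query):
        # query: (B, key_dim),由输入速度序列编码得到
        attn = F.softmax(query @ self.keys.t() / query.size(-1) ** 0.5, dim=-1)
        return attn @ self.values   # (B, val_dim):读出的模式信息,供预测头使用

mem = PatternMemory()
query = torch.randn(4, 32)
readout = mem(query)
print(readout.shape)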
点云|SLAM|雷达|激光|深度RGBD相关(2篇)
【1】 Anisotropic Separable Set Abstraction for Efficient Point Cloud Representation Learning 标题:用于高效点云表示学习的各向异性可分集合抽象 链接:https://arxiv.org/abs/2110.10538
作者:Guocheng Qian,Hasan Abed Al Kader Hammoud,Guohao Li,Ali Thabet,Bernard Ghanem 机构:King Abdullah University of Science and Technology (KAUST) 备注:NeurIPS'21 Spotlight paper. code available at this https URL 摘要:嵌入在各种移动设备中的激光雷达传感器广泛促进了对三维点云表示的访问。这导致了对快速、准确的点云处理技术的需求。在本文中,我们重新回顾并深入研究了PointNet++,这是最具影响力但尚未充分开发的网络之一,并开发了该模型更快、更准确的变体。我们首先提出了一个新的可分离集抽象(SA)模块,该模块将PointNet++中使用的普通SA模块分为两个独立的学习阶段:(1)学习通道相关性和(2)学习空间相关性。可分离SA模块的速度明显快于普通版本,但性能相当。然后,我们在可分离SA模块中引入了一个新的各向异性约简函数,并提出了各向异性可分离SA(ASSA)模块,大大提高了网络的准确性。我们随后将PointNet++中的普通SA模块替换为建议的ASSA模块,并将修改后的网络表示为ASSANet。对点云分类、语义分割和部分分割的大量实验表明,ASSANet优于PointNet++和其他方法,并且实现更高的精度和更快的速度。特别是,在S3DIS Area 5上,ASSANet的性能比PointNet++高出7.4 mIoU,同时在单个NVIDIA 2080Ti GPU上保持快1.6倍的推理速度。我们扩展的ASSANet变体达到66.8 mIoU,优于KPConv,同时快54倍以上。 摘要:Access to 3D point cloud representations has been widely facilitated by LiDAR sensors embedded in various mobile devices. This has led to an emerging need for fast and accurate point cloud processing techniques. In this paper, we revisit and dive deeper into PointNet++, one of the most influential yet under-explored networks, and develop faster and more accurate variants of the model. We first present a novel Separable Set Abstraction (SA) module that disentangles the vanilla SA module used in PointNet++ into two separate learning stages: (1) learning channel correlation and (2) learning spatial correlation. The Separable SA module is significantly faster than the vanilla version, yet it achieves comparable performance. We then introduce a new Anisotropic Reduction function into our Separable SA module and propose an Anisotropic Separable SA (ASSA) module that substantially increases the network's accuracy. We later replace the vanilla SA modules in PointNet++ with the proposed ASSA module, and denote the modified network as ASSANet. Extensive experiments on point cloud classification, semantic segmentation, and part segmentation show that ASSANet outperforms PointNet++ and other methods, achieving much higher accuracy and faster speeds. In particular, ASSANet outperforms PointNet++ by $7.4$ mIoU on S3DIS Area 5, while maintaining $1.6\times$ faster inference speed on a single NVIDIA 2080Ti GPU. Our scaled ASSANet variant achieves $66.8$ mIoU and outperforms KPConv, while being more than $54\times$ faster.
【2】 SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training 标题:SLAM:一种基于语音-文本联合预训练的语音和语言建模统一编码器 链接:https://arxiv.org/abs/2110.10329
作者:Ankur Bapna,Yu-an Chung,Nan Wu,Anmol Gulati,Ye Jia,Jonathan H. Clark,Melvin Johnson,Jason Riesa,Alexis Conneau,Yu Zhang 机构:Google Research, MIT Computer Science and Artificial Intelligence Laboratory, Center for Data Science, New York University 摘要:无监督的预训练现在是文本和语音理解的主要方法。当对来自不同领域和语言的下游任务进行微调时,在大量未注数据上预先训练的自我注意模型已经取得了巨大的成功。本文通过将语音和文本预训练统一到一个模型中,进一步提高了无监督语言预训练的普遍性。我们在未标记文本上构建了一个具有BERT目标的编码器,在未标记语音上构建了w2v BERT目标。为了进一步跨模式对齐我们的模型表示,我们利用对齐损失,特别是利用监督语音文本识别数据的翻译语言建模(TLM)和语音文本匹配(STM)。我们证明,与单模态预训练模型相比,在预训练期间合并语音和文本数据可以显著提高CoVoST 2语音翻译的下游质量,大约提高1 BLEU,同时在LibriSpeech和SpeechStew ASR任务上保持接近SotA的性能。在四个GLUE任务和文本规范化任务上,我们观察到两种模式之间存在容量限制和干扰的证据,与等效的纯文本模型相比,这导致性能下降,同时仍然与BERT竞争。通过广泛的实证分析,我们还证明了选择语音预训练目标函数的重要性,以及添加额外监督信号对学习表征质量的有益影响。 摘要:Unsupervised pre-training is now the predominant approach for both text and speech understanding. Self-attention models pre-trained on large amounts of unannotated data have been hugely successful when fine-tuned on downstream tasks from a variety of domains and languages. This paper takes the universality of unsupervised language pre-training one step further, by unifying speech and text pre-training within a single model. We build a single encoder with the BERT objective on unlabeled text together with the w2v-BERT objective on unlabeled speech. To further align our model representations across modalities, we leverage alignment losses, specifically Translation Language Modeling (TLM) and Speech Text Matching (STM) that make use of supervised speech-text recognition data. We demonstrate that incorporating both speech and text data during pre-training can significantly improve downstream quality on CoVoST 2 speech translation, by around 1 BLEU compared to single-modality pre-trained models, while retaining close to SotA performance on LibriSpeech and SpeechStew ASR tasks. On four GLUE tasks and text-normalization, we observe evidence of capacity limitations and interference between the two modalities, leading to degraded performance compared to an equivalent text-only model, while still being competitive with BERT. Through extensive empirical analysis we also demonstrate the importance of the choice of objective function for speech pre-training, and the beneficial effect of adding additional supervised signals on the quality of the learned representations.
联邦学习|隐私保护|加密(4篇)
【1】 FedGEMS: Federated Learning of Larger Server Models via Selective Knowledge Fusion 标题:FedGEMS:基于选择性知识融合的大型服务器模型联合学习 链接:https://arxiv.org/abs/2110.11027
作者:Sijie Cheng,Jingwen Wu,Yanghua Xiao,Yang Liu,Yang Liu 机构:School of Computer Science, Fudan University, Shanghai, China, International School, Beijing University of Posts and Telecommunications, Beijing, China, Institute for AI Industry Research, Tsinghua University, Beijing, China 备注:Under review as a conference paper at ICLR 2022 摘要:如今,数据往往分散在数十亿个资源受限的边缘设备中,且存在安全和隐私限制。联邦学习(FL)已经成为一种可行的解决方案,可以在保持数据私有的同时学习全局模型,但FL的模型复杂性受到边缘节点计算资源的限制。在这项工作中,我们研究了一种新的范式,利用一个强大的服务器模型来突破联邦学习(FL)中的模型容量限制。通过有选择地从多个教师客户机和服务器本身学习,服务器模型开发深入的知识,并将其知识转移回客户机,以提高其各自的性能。我们提出的框架在服务器和客户端模型上都实现了优异的性能,并在统一框架中提供了一些优势,包括异构客户端架构的灵活性、对中毒攻击的鲁棒性以及客户端和服务器之间的通信效率。通过将FL与更大的服务器模型训练有效地联系起来,我们提出的范例为从分布式和私有数据中稳健和持续地积累知识铺平了道路。 摘要:Today data is often scattered among billions of resource-constrained edge devices with security and privacy constraints. Federated Learning (FL) has emerged as a viable solution to learn a global model while keeping data private, but the model complexity of FL is impeded by the computation resources of edge nodes. In this work, we investigate a novel paradigm to take advantage of a powerful server model to break through model capacity in FL. By selectively learning from multiple teacher clients and itself, a server model develops in-depth knowledge and transfers its knowledge back to clients in return to boost their respective performance. Our proposed framework achieves superior performance on both server and client models and provides several advantages in a unified framework, including flexibility for heterogeneous client architectures, robustness to poisoning attacks, and communication efficiency between clients and server. By bridging FL effectively with larger server model training, our proposed paradigm paves ways for robust and continual knowledge accumulation from distributed and private data.
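下面是"服务器有选择地向多个教师客户端学习"这一思想的示意性草图:服务器对每个样本只向"预测正确且最自信"的教师蒸馏,无教师答对的样本只用硬标签监督。教师挑选规则与温度 T 均为占位假设,并非 FedGEMS 的原始细节。

import torch
import torch.nn.functional as F

def selective_fusion_loss(server_logits, teacher_logits_list, labels, T=2.0):
    ce = F.cross_entropy(server_logits, labels)
    t = torch.stack(teacher_logits_list)                  # (n_teacher, B, C)
    correct = t.argmax(-1) == labels                      # 各教师是否答对
    conf = t.softmax(-1).max(-1).values.masked_fill(~correct, -1.0)
    best = conf.argmax(0)                                 # 每个样本选中的教师
    chosen = t[best, torch.arange(t.size(1))]             # (B, C)
    log_p = F.log_softmax(server_logits / T, -1)
    q = F.softmax(chosen / T, -1)
    kl_per = (q * (q.clamp_min(1e-8).log() - log_p)).sum(-1)  # 每样本 KL(q||p)
    mask = correct.any(0).float()                         # 无教师答对的样本不做蒸馏
    kd = (kl_per * mask).sum() / mask.sum().clamp_min(1.0)
    return ce + (T * T) * kd

B, C = 8, 10
labels = torch.randint(0, C, (B,))
server_logits = torch.randn(B, C, requires_grad=True)
teachers = [torch.randn(B, C) for _ in range(3)]          # 占位:来自教师客户端的 logits
selective_fusion_loss(server_logits, teachers, labels).backward()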
【2】 Bristle: Decentralized Federated Learning in Byzantine, Non-i.i.d. Environments 标题:Bristle:拜占庭、非独立同分布环境下的去中心化联邦学习 链接:https://arxiv.org/abs/2110.11006
作者:Joost Verbraeken,Martijn de Vos,Johan Pouwelse 机构:Delft University of Technology, Delft, The Netherlands 摘要:联邦学习(FL)是一种隐私友好型机器学习,设备在本地根据其私有数据训练模型,通常与服务器进行模型更新通信。在分散式FL(DFL)中,对等方相互通信模型更新。然而,DFL具有挑战性,因为(1)不同对等方拥有的训练数据通常是非i.i.d.(即,在对等方之间分布不同)和(2)恶意或拜占庭式攻击者可以与其他对等方共享任意模型更新以颠覆训练过程。我们解决了这两个挑战,并提出了Bristle,一种位于学习应用程序与去中心化网络层之间的中间件。Bristle利用迁移学习预先确定和冻结神经网络的非输出层,显著加快模型训练并降低通信成本。为了使用来自其他对等方的模型更新安全地更新输出层,我们设计了一个快速的基于距离的优先级器和一个新的基于性能的积分器。它们的综合效应导致了对拜占庭式攻击者的高恢复力以及处理非i.i.d.类的能力。我们的经验表明,在拜占庭环境中,Bristle收敛到一致的95%精度,优于所有评估基线。在非拜占庭式的环境中,与最先进的方法相比,Bristle达到90%精度所需的迭代次数减少83%。我们表明,当训练类别为非i.i.d.时,Bristle的准确性显著优于最具拜占庭鲁棒性的基线2.3倍,同时将通信成本降低90%。 摘要:Federated learning (FL) is a privacy-friendly type of machine learning where devices locally train a model on their private data and typically communicate model updates with a server. In decentralized FL (DFL), peers communicate model updates with each other instead. However, DFL is challenging since (1) the training data possessed by different peers is often non-i.i.d. (i.e., distributed differently between the peers) and (2) malicious, or Byzantine, attackers can share arbitrary model updates with other peers to subvert the training process. We address these two challenges and present Bristle, middleware between the learning application and the decentralized network layer. Bristle leverages transfer learning to predetermine and freeze the non-output layers of a neural network, significantly speeding up model training and lowering communication costs. To securely update the output layer with model updates from other peers, we design a fast distance-based prioritizer and a novel performance-based integrator. Their combined effect results in high resilience to Byzantine attackers and the ability to handle non-i.i.d. classes. We empirically show that Bristle converges to a consistent 95% accuracy in Byzantine environments, outperforming all evaluated baselines. In non-Byzantine environments, Bristle requires 83% fewer iterations to achieve 90% accuracy compared to state-of-the-art methods. We show that when the training classes are non-i.i.d., Bristle significantly outperforms the accuracy of the most Byzantine-resilient baselines by 2.3x while reducing communication costs by 90%.
【3】 SecureBoost+: A High Performance Gradient Boosting Tree Framework for Large Scale Vertical Federated Learning 标题:SecureBoost+:一种面向大规模垂直联合学习的高性能梯度增强树框架 链接:https://arxiv.org/abs/2110.10927
作者:Weijing Chen,Guoqiang Ma,Tao Fan,Yan Kang,Qian Xu,Qiang Yang 机构: AI Department of WeBank, Shenzhen, China, Hong Kong University of Science and Technology, Hong Kong, China 摘要:梯度提升决策树(GBDT)是一种广泛应用的集成算法。它的垂直联合学习版本SecureBoost是跨竖井隐私保护建模中最流行的算法之一。随着隐私计算领域近年来的蓬勃发展,对大规模和高性能联合学习的需求在现实世界的应用中急剧增长。在本文中,为了满足这些需求,我们提出了SecureBoost+,这是一种新颖的、改进于先前工作SecureBoost的技术。SecureBoost+集成了多个密文计算优化和工程优化。实验结果表明,与SecureBoost相比,SecureBoost+在大型和高维数据集上具有显著的性能改进。它使得高效的大规模垂直联合学习成为可能。 摘要:Gradient boosting decision tree (GBDT) is a widely used ensemble algorithm in the industry. Its vertical federated learning version, SecureBoost, is one of the most popular algorithms used in cross-silo privacy-preserving modeling. As the area of privacy computation thrives in recent years, demands for large-scale and high-performance federated learning have grown dramatically in real-world applications. In this paper, to fulfill these requirements, we propose SecureBoost+ that is both novel and improved from the prior work SecureBoost. SecureBoost+ integrates several ciphertext calculation optimizations and engineering optimizations. The experimental results demonstrate that SecureBoost+ has significant performance improvements on large and high dimensional data sets compared to SecureBoost. It makes effective and efficient large-scale vertical federated learning possible.
【4】 A Federated Learning Aggregation Algorithm for Pervasive Computing: Evaluation and Comparison 标题:一种面向普适计算的联邦学习聚合算法:评估与比较 链接:https://arxiv.org/abs/2110.10223
作者:Sannara Ek,François Portet,Philippe Lalanda,German Vega 机构:Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG F-, Grenoble, France 备注:19th IEEE International Conference on Pervasive Computing and Communications (PerCom 2021) 摘要:普适计算促进在我们的生活空间中安装连接设备以提供服务。最近有两项重大发展取得了重大进展:先进地利用边缘资源和将机器学习技术集成到工程应用中。这一演变带来了重大挑战,特别是与计算元素沿边缘到云连续分布相关的挑战。关于这一点,联邦学习最近被提出用于edge的分布式模型训练。这种方法的原理是聚合在分布式客户机上学习的模型,以获得新的、更通用的模型。然后将生成的模型重新分发给客户进行进一步训练。迄今为止,最流行的联邦学习算法使用模型参数的坐标平均值进行聚合。然而,已经表明,该方法不适用于数据不完全独立分布(非iid)的异构环境。这直接对应于一些普适计算场景,其中设备和用户的异构性挑战机器学习的通用性和个性化双重目标。在本文中,我们提出了一种新的聚合算法,称为FedDist,它能够通过识别客户机中特定神经元之间的差异来修改其模型结构(这里称为深层神经网络)。这允许在不影响概括的情况下解释客户的特殊性。此外,我们还定义了一个完整的方法,以现实的方式评估联邦学习,同时考虑了泛化和个性化。使用这种方法,FedDist在智能手机的人类活动识别领域进行了广泛的测试,并与三种最先进的联合学习算法进行了比较。 摘要:Pervasive computing promotes the installation of connected devices in our living spaces in order to provide services. Two major developments have gained significant momentum recently: an advanced use of edge resources and the integration of machine learning techniques for engineering applications. This evolution raises major challenges, in particular related to the appropriate distribution of computing elements along an edge-to-cloud continuum. About this, Federated Learning has been recently proposed for distributed model training in the edge. The principle of this approach is to aggregate models learned on distributed clients in order to obtain a new, more general model. The resulting model is then redistributed to clients for further training. To date, the most popular federated learning algorithm uses coordinate-wise averaging of the model parameters for aggregation. However, it has been shown that this method is not adapted in heterogeneous environments where data is not identically and independently distributed (non-iid). This corresponds directly to some pervasive computing scenarios where heterogeneity of devices and users challenges machine learning with the double objective of generalization and personalization. In this paper, we propose a novel aggregation algorithm, termed FedDist, which is able to modify its model architecture (here, deep neural network) by identifying dissimilarities between specific neurons amongst the clients. This permits to account for clients' specificity without impairing generalization. Furthermore, we define a complete method to evaluate federated learning in a realistic way taking generalization and personalization into account. Using this method, FedDist is extensively tested and compared with three state-of-the-art federated learning algorithms on the pervasive domain of Human Activity Recognition with smartphones.
推理|分析|理解|解释(8篇)
【1】 A Fine-Grained Analysis on Distribution Shift 标题:分布偏移的细粒度分析 链接:https://arxiv.org/abs/2110.11328
作者:Olivia Wiles,Sven Gowal,Florian Stimberg,Sylvestre Alvise-Rebuffi,Ira Ktena,Krishnamurthy Dvijotham,Taylan Cemgil 机构:DeepMind, London, UK 摘要:对分布变化的鲁棒性对于在现实世界中部署机器学习模型至关重要。尽管有这种必要性,但在定义导致这些变化的潜在机制和评估算法在多个不同分布变化中的稳健性方面的工作很少。为此,我们引入了一个框架,可以对各种分布变化进行细粒度分析。我们通过评估19种不同的方法,将其分为五类,包括合成数据集和真实数据集,对当前最先进的方法进行了全面分析。总的来说,我们训练了85K多个模型。我们的实验框架可以很容易地扩展到包括新方法、转换和数据集。我们发现,与以前的工作~\citep{Gulrajani20}不同,在标准ERM基线上取得了进展;特别是,在许多情况下,预训练和数据增强(学习得到的或启发式的)会带来巨大的收益。然而,最好的方法在不同的数据集和分布偏移上并不一致。 摘要:Robustness to distribution shifts is critical for deploying machine learning models in the real world. Despite this necessity, there has been little work in defining the underlying mechanisms that cause these shifts and evaluating the robustness of algorithms across multiple, different distribution shifts. To this end, we introduce a framework that enables fine-grained analysis of various distribution shifts. We provide a holistic analysis of current state-of-the-art methods by evaluating 19 distinct methods grouped into five categories across both synthetic and real-world datasets. Overall, we train more than 85K models. Our experimental framework can be easily extended to include new methods, shifts, and datasets. We find, unlike previous work~\citep{Gulrajani20}, that progress has been made over a standard ERM baseline; in particular, pretraining and augmentations (learned or heuristic) offer large gains in many cases. However, the best methods are not consistent over different datasets and shifts.
【2】 StyleAlign: Analysis and Applications of Aligned StyleGAN Models 标题:StyleAlign:对齐StyleGAN模型的分析与应用 链接:https://arxiv.org/abs/2110.11323
作者:Zongze Wu,Yotam Nitzan,Eli Shechtman,Dani Lischinski 机构:The Hebrew University, Tel-Aviv University, Adobe Research 备注:39 pages, 33 figures 摘要:在本文中,我们对对齐生成模型的性质和应用进行了深入的研究。如果两个模型共享相同的体系结构,并且其中一个模型(子模型)通过微调到另一个领域从另一个模型(父模型)获得,我们称之为对齐模型,这是迁移学习中的常见做法。一些作品已经利用对齐样式模型的一些基本属性来执行图像到图像的转换。在这里,我们对模型对齐进行了第一次详细的探索,重点也是StyleGAN。首先,我们对对齐模型进行实证分析,并回答有关其性质的重要问题。特别是,我们发现子模型的潜在空间与父模型的潜在空间在语义上是一致的,继承了极其丰富的语义,即使对于遥远的数据域,如人脸和教堂也是如此。其次,有了这一更好的理解,我们可以利用一致的模型来解决一系列不同的任务。除了图像转换,我们还演示了完全自动的跨域图像变形。我们进一步表明,Zero-Shot视觉任务可以在子域中执行,而完全依赖于父域中的监督。我们从定性和定量上证明,我们的方法可以产生最先进的结果,同时只需要简单的微调和反转。 摘要:In this paper, we perform an in-depth study of the properties and applications of aligned generative models. We refer to two models as aligned if they share the same architecture, and one of them (the child) is obtained from the other (the parent) via fine-tuning to another domain, a common practice in transfer learning. Several works already utilize some basic properties of aligned StyleGAN models to perform image-to-image translation. Here, we perform the first detailed exploration of model alignment, also focusing on StyleGAN. First, we empirically analyze aligned models and provide answers to important questions regarding their nature. In particular, we find that the child model's latent spaces are semantically aligned with those of the parent, inheriting incredibly rich semantics, even for distant data domains such as human faces and churches. Second, equipped with this better understanding, we leverage aligned models to solve a diverse set of tasks. In addition to image translation, we demonstrate fully automatic cross-domain image morphing. We further show that zero-shot vision tasks may be performed in the child domain, while relying exclusively on supervision in the parent domain. We demonstrate qualitatively and quantitatively that our approach yields state-of-the-art results, while requiring only simple fine-tuning and inversion.
【3】 Principal Component Analysis versus Factor Analysis 标题:主成分分析与因子分析 链接:https://arxiv.org/abs/2110.11261
作者:Zenon Gniazdowski 机构:Warsaw School of Computer Science 备注:None 摘要:本文讨论了与主成分分析(PCA)和因子分析(FA)相关的选定问题。特别是,对这两种类型的分析进行了比较。还提出了PCA和FA的矢量解释。详细讨论了主成分分析中主成分个数的确定和主成分分析中因子的确定问题。讨论了一种确定因子和主成分数量的新标准,该标准将允许呈现每个分析的主要变量的大部分方差。此外,还提出了一种确定FA中因子数的有效算法,该算法符合该准则。该算法适用于主成分分析中主成分数的确定。还提出了一种新的确定主成分个数的方法来改进PCA算法。对所得结果进行了讨论。 摘要:The article discusses selected problems related to both principal component analysis (PCA) and factor analysis (FA). In particular, both types of analysis were compared. A vector interpretation for both PCA and FA has also been proposed. The problem of determining the number of principal components in PCA and factors in FA was discussed in detail. A new criterion for determining the number of factors and principal components is discussed, which will allow to present most of the variance of each of the analyzed primary variables. An efficient algorithm for determining the number of factors in FA, which complies with this criterion, was also proposed. This algorithm was adapted to find the number of principal components in PCA. It was also proposed to modify the PCA algorithm using a new method of determining the number of principal components. The obtained results were discussed.
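下面给出一个示意性草图,演示摘要所述"使每个原始变量的大部分方差都被呈现"这类确定主成分个数的准则的一种可能实现:逐个增加成分数 k,直到每个标准化变量被前 k 个成分解释的方差(公因子方差)都超过阈值。数据、阈值与具体判据形式均为假设,并非论文的原始算法。

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))   # 占位:相关的原始变量
Xs = (X - X.mean(0)) / X.std(0)                              # 标准化后做 PCA

pca = PCA().fit(Xs)
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)  # 变量-成分载荷

threshold = 0.5
for k in range(1, X.shape[1] + 1):
    communality = (loadings[:, :k] ** 2).sum(axis=1)  # 每个变量被前 k 个成分解释的方差
    if communality.min() >= threshold:
        print(f"选择 k={k} 个主成分,最小公因子方差={communality.min():.2f}")
        break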
【4】 Convergence Analysis and Implicit Regularization of Feedback Alignment for Deep Linear Networks 标题:深线性网络反馈对准的收敛性分析和隐式正则化 链接:https://arxiv.org/abs/2110.10815
作者:Manuela Girotti,Ioannis Mitliagkas,Gauthier Gidel 机构:Mila Institute, Saint Mary’s University, Department of Mathematics and Computing Science, Halifax, NS B,H ,C, Canada, Mila Institute, Université de Montréal, Department of Computer Science, and Operational Reserarch, Montréal, QC H,T ,J, Canada 备注:10 pages (Main) 19 pages (Appendix), 6 figures 摘要:我们从理论上分析了反馈对齐(FA)算法,它是训练神经网络的一种有效的反向传播方法。我们为连续和离散动态的深线性网络提供了收敛速度保证。此外,我们还研究了浅层线性网络的增量学习现象。有趣的是,某些特定的初始化意味着可以忽略的组件在主要组件之前就被学习,因此可能会对这种学习算法的有效性产生负面影响;我们将这种现象归类为隐式反正则化。我们还提供了初始化方案,其中通过降低重要性顺序来近似学习问题的组成部分,从而提供了一种隐式正则化形式。 摘要:We theoretically analyze the Feedback Alignment (FA) algorithm, an efficient alternative to backpropagation for training neural networks. We provide convergence guarantees with rates for deep linear networks for both continuous and discrete dynamics. Additionally, we study incremental learning phenomena for shallow linear networks. Interestingly, certain specific initializations imply that negligible components are learned before the principal ones, thus potentially negatively affecting the effectiveness of such a learning algorithm; a phenomenon we classify as implicit anti-regularization. We also provide initialization schemes where the components of the problem are approximately learned by decreasing order of importance, thus providing a form of implicit regularization.
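下面用一个极简的两层深线性网络演示反馈对齐(FA)的核心机制:隐层误差不经由 W2 的转置回传,而是经由一个固定的随机反馈矩阵 B。数据、维度与步长均为占位假设,仅用于说明该算法与反向传播的差别。

import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out, n = 20, 30, 5, 256
X = rng.normal(size=(n, d_in))
Y = X @ rng.normal(size=(d_in, d_out))           # 占位:线性回归目标

W1 = rng.normal(size=(d_in, d_hid)) * 0.1        # 两层深线性网络
W2 = rng.normal(size=(d_hid, d_out)) * 0.1
B = rng.normal(size=(d_out, d_hid)) * 0.1        # 固定随机反馈矩阵,代替 W2 的转置

lr = 1e-3
for step in range(2000):
    H = X @ W1                                   # 前向
    E = H @ W2 - Y                               # 输出误差
    dW2 = H.T @ E / n
    dW1 = X.T @ (E @ B) / n                      # 反馈对齐:用 B 而非 W2.T 回传误差
    W1 -= lr * dW1
    W2 -= lr * dW2
print("最终均方误差:", float(np.mean(E ** 2)))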
【5】 Behavioral Experiments for Understanding Catastrophic Forgetting 标题:理解灾难性遗忘的行为实验 链接:https://arxiv.org/abs/2110.10570
作者:Samuel J. Bell,Neil D. Lawrence 机构:Dept. of Computer Science & Technology, University of Cambridge, United Kingdom 摘要:在本文中,我们探讨了实验心理学的基本工具,行为实验,是否不仅有能力洞察人类和动物,而且有能力洞察人工系统。我们应用实验心理学的技术来研究神经网络中的灾难性遗忘。我们用两层ReLU网络进行了一系列控制实验,探索性结果揭示了对灾难性遗忘行为的新理解。除了我们的实证研究结果,我们还展示了一种替代的、行为优先的方法来研究神经网络现象。 摘要:In this paper we explore whether the fundamental tool of experimental psychology, the behavioral experiment, has the power to generate insight not only into humans and animals, but artificial systems too. We apply the techniques of experimental psychology to investigating catastrophic forgetting in neural networks. We present a series of controlled experiments with two-layer ReLU networks, and exploratory results revealing a new understanding of the behavior of catastrophic forgetting. Alongside our empirical findings, we demonstrate an alternative, behavior-first approach to investigating neural network phenomena.
【6】 When in Doubt, Summon the Titans: Efficient Inference with Large Models 标题:有疑问时,召唤巨人:用大型模型进行有效推断 链接:https://arxiv.org/abs/2110.10305
作者:Ankit Singh Rawat,Manzil Zaheer,Aditya Krishna Menon,Amr Ahmed,Sanjiv Kumar 机构:Google Research, USA 摘要:将神经网络扩展到具有数十亿个参数的“大”规模,已被证明在许多具有挑战性的问题上产生了令人印象深刻的结果。然而,这样大的模型所产生的推理成本往往妨碍了它们在大多数现实环境中的应用。在本文中,我们提出了一个基于蒸馏的两阶段框架,实现了大型模型的建模优势,同时在很大程度上保留了使用更轻量级模型进行推理的计算优势。简言之,我们使用大型教师模型来引导轻型学生模型仅对“简单”示例的子集做出正确预测;对于“难”的例子,我们求助于老师。这种方法使我们能够在实际场景中有效地使用大型模型,其中简单的示例比罕见的硬示例要频繁得多。我们建议仅对简单实例使用蒸馏,从而可以在学生模型规模上进行更激进的权衡,降低推理的摊销成本,并获得比标准蒸馏更好的准确性。经验上,我们展示了我们的方法在图像分类和自然语言处理基准上的优势。 摘要:Scaling neural networks to "large" sizes, with billions of parameters, has been shown to yield impressive results on many challenging problems. However, the inference cost incurred by such large models often prevents their application in most real-world settings. In this paper, we propose a two-stage framework based on distillation that realizes the modelling benefits of the large models, while largely preserving the computational benefits of inference with more lightweight models. In a nutshell, we use the large teacher models to guide the lightweight student models to only make correct predictions on a subset of "easy" examples; for the "hard" examples, we fall-back to the teacher. Such an approach allows us to efficiently employ large models in practical scenarios where easy examples are much more frequent than rare hard examples. Our proposed use of distillation to only handle easy instances allows for a more aggressive trade-off in the student size, thereby reducing the amortized cost of inference and achieving better accuracy than standard distillation. Empirically, we demonstrate the benefits of our approach on both image classification and natural language processing benchmarks.
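下面是"简单样本走学生、困难样本回退给教师"这一两阶段推理路由的示意性草图:以学生的最大 softmax 置信度作难易判据。模型结构与阈值均为占位假设,仅演示级联逻辑本身。

import torch
import torch.nn as nn

student = nn.Linear(128, 10)                                        # 占位:轻量学生
teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 10))  # 占位:大型教师

@torch.no_grad()
def cascade_predict(x, tau=0.9):
    s_prob = student(x).softmax(-1)
    conf, pred = s_prob.max(-1)
    hard = conf < tau                                # 学生不自信的样本视为"难"
    if hard.any():
        pred[hard] = teacher(x[hard]).argmax(-1)     # 仅对难样本调用大模型
    return pred, hard.float().mean().item()          # 回退比例越小,摊销推理成本越低

x = torch.randn(64, 128)
pred, frac_hard = cascade_predict(x)
print(f"回退给教师的样本比例:{frac_hard:.2%}")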
【7】 fairadapt: Causal Reasoning for Fair Data Pre-processing 标题:Fairadapt:公平数据预处理的因果推理 链接:https://arxiv.org/abs/2110.10200
作者:Drago Plečko,Nicolas Bennett,Nicolai Meinshausen 机构:ETH Zürich 备注:Keywords: algorithmic fairness, causal inference, machine learning 摘要:机器学习算法适用于各种预测任务,但它们也可以学习如何基于性别、种族或其他敏感属性进行区分。这种认识产生了公平机器学习领域,其目的是测量和减轻这种算法偏差。本文描述了R-package fairadapt,它实现了一种因果推理预处理方法。通过使用因果图形模型和观察数据,该方法可用于解决“如果我属于不同性别/种族,我的工资会是多少?”这类假设性问题。这种个人层面的反事实推理有助于消除歧视,并有助于证明公平决策的合理性。我们还讨论了适当的放松,这些放松假设从敏感属性到结果的某些因果途径是非歧视性的。 摘要:Machine learning algorithms are useful for various predictions tasks, but they can also learn how to discriminate, based on gender, race or other sensitive attributes. This realization gave rise to the field of fair machine learning, which aims to measure and mitigate such algorithmic bias. This manuscript describes the R-package fairadapt, which implements a causal inference pre-processing method. By making use of a causal graphical model and the observed data, the method can be used to address hypothetical questions of the form "What would my salary have been, had I been of a different gender/race?". Such individual level counterfactual reasoning can help eliminate discrimination and help justify fair decisions. We also discuss appropriate relaxations which assume certain causal pathways from the sensitive attribute to the outcome are not discriminatory.
【8】 Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process 标题:约束马尔可夫决策过程的更快算法与更精细分析 链接:https://arxiv.org/abs/2110.10351
作者:Tianjiao Li,Ziwei Guan,Shaofeng Zou,Tengyu Xu,Yingbin Liang,Guanghui Lan 机构:⋆Georgia Institute of Technology, †The Ohio State University, ‡University at Buffalo, The State University of New York 备注:The paper was initially submitted for publication in January 2021 摘要:研究了约束马尔可夫决策过程(CMDP)问题,其中一个代理的目标是在其效用/成本受到多个约束的情况下,使期望的累计折扣报酬最大化。提出了一种新的原始-对偶方法,将熵正则化策略优化器、对偶变量正则化器和Nesterov的加速梯度下降对偶优化器这三个要素进行了新的集成,所有这些要素都是实现更快收敛的关键。给出了该方法的有限时间误差界。尽管存在非凹目标受非凹约束的挑战,但所提出的方法在最优性差距和约束违反方面收敛到全局最优,复杂性为$\tilde{\mathcal{O}}(1/\epsilon)$,它将现有原始-对偶方法的复杂性提高了$\mathcal{O}(1/\epsilon)$倍\citep{ding2020natural,paternain2019constrained}。这是第一次证明,非凹CMDP问题可以达到凸约束下凸优化的复杂性下限$\mathcal{O}(1/\epsilon)$。我们的原始-对偶方法和非渐近分析与所使用的RL优化器无关,因此在实际应用中更加灵活。更一般地说,我们的方法也是第一个通过利用梯度优势条件等几何条件来加速具有零对偶间隙的约束非凸优化的算法,对于这些几何条件,现有的约束凸优化加速方法是不适用的。 摘要:The problem of constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its utilities/costs. A new primal-dual approach is proposed with a novel integration of three ingredients: entropy regularized policy optimizer, dual variable regularizer, and Nesterov's accelerated gradient descent dual optimizer, all of which are critical to achieve a faster convergence. The finite-time error bound of the proposed approach is characterized. Despite the challenge of the nonconcave objective subject to nonconcave constraints, the proposed approach is shown to converge to the global optimum with a complexity of $\tilde{\mathcal{O}}(1/\epsilon)$ in terms of the optimality gap and the constraint violation, which improves the complexity of the existing primal-dual approach by a factor of $\mathcal{O}(1/\epsilon)$~\citep{ding2020natural,paternain2019constrained}. This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for convex optimization subject to convex constraints. Our primal-dual approach and non-asymptotic analysis are agnostic to the RL optimizer used, and thus are more flexible for practical applications. More generally, our approach also serves as the first algorithm that provably accelerates constrained nonconvex optimization with zero duality gap by exploiting the geometries such as the gradient dominance condition, for which the existing acceleration methods for constrained convex optimization are not applicable.
检测相关(3篇)
【1】 Generalized Out-of-Distribution Detection: A Survey 标题:广义分布外检测:综述 链接:https://arxiv.org/abs/2110.11334
作者:Jingkang Yang,Kaiyang Zhou,Yixuan Li,Ziwei Liu 备注:Issues, comments, and questions are all welcomed in this https URL 摘要:分布外(OOD)检测对于确保机器学习系统的可靠性和安全性至关重要。例如,在自动驾驶中,我们希望驾驶系统在检测到以前从未见过的异常场景或物体且无法做出安全决策时发出警报并将控制权移交给人类。这一问题最早出现于2017年,自那以后,研究界越来越关注这一问题,并开发了大量方法,从基于分类到基于密度再到基于距离的方法。同时,从动机和方法论的角度来看,其他几个问题与OOD检测密切相关。其中包括异常检测(AD)、新颖性检测(ND)、开放集识别(OSR)和离群点检测(OD)。尽管有不同的定义和问题设置,但这些问题往往会使读者和从业者感到困惑,因此,一些现有研究误用了术语。在本次调查中,我们首先提出了一个称为广义OOD检测的通用框架,该框架包含上述五个问题,即AD、ND、OSR、OOD检测和OD。在我们的框架下,这五个问题可以被视为特殊情况或子任务,并且更容易区分。然后,我们通过总结这五个领域的最新技术发展,对它们中的每一个进行彻底的回顾。我们以开放的挑战和潜在的研究方向来结束这项调查。 摘要:Out-of-distribution (OOD) detection is critical to ensuring the reliability and safety of machine learning systems. For instance, in autonomous driving, we would like the driving system to issue an alert and hand over the control to humans when it detects unusual scenes or objects that it has never seen before and cannot make a safe decision. This problem first emerged in 2017 and since then has received increasing attention from the research community, leading to a plethora of methods developed, ranging from classification-based to density-based to distance-based ones. Meanwhile, several other problems are closely related to OOD detection in terms of motivation and methodology. These include anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). Despite having different definitions and problem settings, these problems often confuse readers and practitioners, and as a result, some existing studies misuse terms. In this survey, we first present a generic framework called generalized OOD detection, which encompasses the five aforementioned problems, i.e., AD, ND, OSR, OOD detection, and OD. Under our framework, these five problems can be seen as special cases or sub-tasks, and are easier to distinguish. Then, we conduct a thorough review of each of the five areas by summarizing their recent technical developments. We conclude this survey with open challenges and potential research directions.
【2】 A Systematic Review on the Detection of Fake News Articles 标题:对虚假新闻文章检测的系统评述 链接:https://arxiv.org/abs/2110.11240
作者:Nathaniel Hoy,Theodora Koulouri 机构:Brunel University London, Department of Computer Science, United Kingdom 备注:22 Pages, 16 Figures, Currently submitted to ACM TIST - Awaiting Peer-Review 摘要:有人认为,虚假新闻和虚假信息的传播对世界各地的社会构成威胁,从影响选举结果到阻碍控制新冠疫情的努力。为了应对这种威胁,人们开发了许多自然语言处理(NLP)方法。它们利用大量数据集、特征提取/选择技术和机器学习(ML)算法,在假新闻传播之前检测假新闻。虽然这些方法都有很好的文献记载,但关于其在该领域的有效性的证据较少。通过系统地回顾文献,本文旨在描述最有效的假新闻检测方法,识别现有方法的局限性,并提出缓解这些局限性的方法。对结果的分析表明,结合新闻内容和基于社会的特征的集成方法是目前最有效的。最后,有人建议,未来的研究应侧重于开发解决普遍性问题(部分原因是当前数据集的局限性)、可解释性和偏见的方法。 摘要:It has been argued that fake news and the spread of false information pose a threat to societies throughout the world, from influencing the results of elections to hindering the efforts to manage the COVID-19 pandemic. To combat this threat, a number of Natural Language Processing (NLP) approaches have been developed. These leverage a number of datasets, feature extraction/selection techniques and machine learning (ML) algorithms to detect fake news before it spreads. While these methods are well-documented, there is less evidence regarding their efficacy in this domain. By systematically reviewing the literature, this paper aims to delineate the approaches for fake news detection that are most performant, identify limitations with existing approaches, and suggest ways these can be mitigated. The analysis of the results indicates that Ensemble Methods using a combination of news content and socially-based features are currently the most effective. Finally, it is proposed that future research should focus on developing approaches that address generalisability issues (which, in part, arise from limitations with current datasets), explainability and bias.
【3】 DeLag: Detecting Latency Degradation Patterns in Service-based Systems 标题:DeLag:检测基于服务的系统中的延迟退化模式 链接:https://arxiv.org/abs/2110.11155
作者:Luca Traini,Vittorio Cortellessa 机构:Department of Information Engineering, University of L’Aquila 摘要:生产中的性能调试是现代基于服务的系统中的一项基本活动。性能问题的诊断通常非常耗时,因为它需要彻底检查大量的跟踪和性能指标。在本文中,我们提出了DeLag,一种新的基于自动搜索的方法,用于诊断基于服务的系统中的性能问题。DeLag识别请求的子集,这些请求结合其远程过程调用执行时间显示潜在相关性能问题的症状。我们称这种症状为延迟退化模式。DeLag同时搜索多个延迟退化模式,同时优化精确度、召回率和延迟差异性。在两个基于微服务的系统生成的700个请求数据集上的实验表明,我们的方法比三种最先进的方法和通用机器学习聚类算法提供了更好、更稳定的有效性。此外,在我们评估中使用的最大数据集上,DeLag在效率方面优于第二和第三最有效的基线技术。 摘要:Performance debugging in production is a fundamental activity in modern service-based systems. The diagnosis of performance issues is often time-consuming, since it requires thorough inspection of large volumes of traces and performance indices. In this paper we present DeLag, a novel automated search-based approach for diagnosing performance issues in service-based systems. DeLag identifies subsets of requests that show, in the combination of their Remote Procedure Call execution times, symptoms of potentially relevant performance issues. We call such symptoms Latency Degradation Patterns. DeLag simultaneously search for multiple latency degradation patterns while optimizing precision, recall and latency dissimilarity. Experimentation on 700 datasets of requests generated from two microservice-based systems shows that our approach provide better and more stable effectiveness than three state-of-the-art approaches and general purpose machine learning clustering algorithms. Moreover, DeLag outperforms in terms of efficiency the second and the third most effective baseline techniques on the largest datasets used in our evaluation.
编码器(3篇)
【1】 Dual Encoding U-Net for Spatio-Temporal Domain Shift Frame Prediction 标题:用于时空域偏移帧预测的双编码U-Net 链接:https://arxiv.org/abs/2110.11140
作者:Jay Santokhi,Dylan Hillier,Yiming Yang,Joned Sarwar,Anna Jordan,Emil Hewage 机构:Alchera Data Technologies Ltd, Cambridge, CB,NN 备注:8 pages, 4 figures, 5 tables 摘要:在过去的18个月中,城市范围内的交通行为发生了重大变化。对这种行为作出准确和可靠预测的能力也发生了巨大变化,新冠病毒-19的措施影响到世界各地的人口如何与流动性的不同方面互动。这就提出了一个问题:“如何利用大量的新冠病毒感染前的流动性数据来预测当前/后新冠病毒感染环境下的未来行为?”本文试图通过引入一种使用轻量级双编码U-Net(仅使用12个卷积层构建)的流量帧预测方法来解决这个问题,该U-Net采用了一种新的方法来跳过卷积LSTM层之间的连接。这种方法与训练数据的直观处理相结合,可以对时间和时空域转移进行建模(gitlab.com/alchera/alchera-traffic4cast-2021)。 摘要:The landscape of city-wide mobility behaviour has altered significantly over the past 18 months. The ability to make accurate and reliable predictions on such behaviour has likewise changed drastically with COVID-19 measures impacting how populations across the world interact with the different facets of mobility. This raises the question: "How does one use an abundance of pre-covid mobility data to make predictions on future behaviour in a present/post-covid environment?" This paper seeks to address this question by introducing an approach for traffic frame prediction using a lightweight Dual-Encoding U-Net built using only 12 Convolutional layers that incorporates a novel approach to skip-connections between Convolutional LSTM layers. This approach combined with an intuitive handling of training data can model both a temporal and spatio-temporal domain shift (gitlab.com/alchera/alchera-traffic4cast-2021).
【2】 Encoding spatiotemporal priors with VAEs for small-area estimation 标题:用于小区域估计的VAEs时空先验编码 链接:https://arxiv.org/abs/2110.10422
作者:Elizaveta Semenova,Yidan Xu,Adam Howes,Theo Rashid,Samir Bhatt,Swapnil Mishra,Seth Flaxman 机构:Imperial College London, University of Michigan, University of Copenhagen, University of Oxford 摘要:高斯过程(GPs)是小区域时空统计建模中最流行的方法,通过有限数据集合的多元高斯分布实现。在这种情况下,它们被用来编码空间和时间上的相关结构,并且可以很好地推广到插值任务中。尽管具有灵活性,但现成的GPs仍存在严重的计算挑战,限制了其在应用环境中的可扩展性和实用性。在这里,我们提出了一种新的深度生成建模方法来应对这一挑战:对于特定的时空环境,我们通过事先采样和随后拟合变分自动编码器(VAE)来近似一类GP先验。给定一个经过训练的VAE,由于VAE的低维、独立分布的潜在高斯空间表示,由此产生的解码器使得时空推理变得非常有效。一旦训练完成,使用VAE解码器的推理将在贝叶斯抽样框架内取代GP。该方法提供了易于处理且易于实现的近似编码时空先验的方法,并促进了有效的统计推断。我们证明了我们的VAE两阶段方法在贝叶斯小面积估计任务中的实用性。 摘要:Gaussian processes (GPs), implemented through multivariate Gaussian distributions for a finite collection of data, are the most popular approach in small-area spatiotemporal statistical modelling. In this context they are used to encode correlation structures over space and time and can generalise well in interpolation tasks. Despite their flexibility, off-the-shelf GPs present serious computational challenges which limit their scalability and practical usefulness in applied settings. Here, we propose a novel, deep generative modelling approach to tackle this challenge: for a particular spatiotemporal setting, we approximate a class of GP priors through prior sampling and subsequent fitting of a variational autoencoder (VAE). Given a trained VAE, the resultant decoder allows spatiotemporal inference to become incredibly efficient due to the low dimensional, independently distributed latent Gaussian space representation of the VAE. Once trained, inference using the VAE decoder replaces the GP within a Bayesian sampling framework. This approach provides tractable and easy-to-implement means of approximately encoding spatiotemporal priors and facilitates efficient statistical inference. We demonstrate the utility of our VAE two stage approach on Bayesian, small-area estimation tasks.
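下面是该两阶段方法第一阶段的示意性草图:先从一维 GP 先验采样,再训练一个 VAE 去近似它;训练后只需对低维潜变量采样并解码,即可高效近似 GP 先验样本。核函数、网格大小与网络规模均为占位假设。

import torch
import torch.nn as nn

n_grid = 64
xs = torch.linspace(0, 1, n_grid)
K = torch.exp(-0.5 * (xs[:, None] - xs[None, :]) ** 2 / 0.1 ** 2) \
    + 1e-4 * torch.eye(n_grid)                   # RBF 核协方差(加抖动保证正定)
L = torch.linalg.cholesky(K)

def sample_gp(batch):
    return torch.randn(batch, n_grid) @ L.T      # GP 先验样本

class VAE(nn.Module):
    def __init__(self, z_dim=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_grid, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_grid))
        self.z_dim = z_dim
    def forward(self, f):
        mu, logvar = self.enc(f).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # 重参数化
        return self.dec(z), mu, logvar

vae = VAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
for step in range(500):
    f = sample_gp(128)
    rec, mu, logvar = vae(f)
    kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(-1).mean()
    loss = ((rec - f) ** 2).sum(-1).mean() + kl   # ELBO(高斯似然的简化形式)
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    f_approx = vae.dec(torch.randn(5, vae.z_dim)) # 解码标准正态 z 即近似 GP 先验样本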
【3】 Momentum Contrastive Autoencoder: Using Contrastive Learning for Latent Space Distribution Matching in WAE 标题:动量对比自动编码器:基于对比学习的WAE潜在空间分布匹配 链接:https://arxiv.org/abs/2110.10303
作者:Devansh Arpit,Aadyot Bhatnagar,Huan Wang,Caiming Xiong 机构:Salesforce AI Research 摘要:Wasserstein自动编码器(WAE)表明,在该AE的潜在空间与预先指定的先验分布匹配的约束下,匹配两个分布相当于最小化一个简单的自动编码器(AE)损失。这种潜在空间分布匹配是WAE的核心组成部分,也是一项具有挑战性的任务。在本文中,我们建议使用对比学习框架来解决这个问题,该框架已被证明对自我监督表征学习是有效的。我们这样做是通过利用这样一个事实,即对比学习目标优化了潜在空间分布,使其在单位超球面上均匀分布,而单位超球面很容易从中取样。结果表明,与现有的WAE算法相比,使用对比学习框架优化WAE损失可以获得更快的收敛速度和更稳定的优化效果。这也反映在CelebA和CIFAR-10数据集上的FID分数以及CelebA-HQ数据集上真实生成的图像质量上。 摘要:Wasserstein autoencoder (WAE) shows that matching two distributions is equivalent to minimizing a simple autoencoder (AE) loss under the constraint that the latent space of this AE matches a pre-specified prior distribution. This latent space distribution matching is a core component of WAE, and a challenging task. In this paper, we propose to use the contrastive learning framework that has been shown to be effective for self-supervised representation learning, as a means to resolve this problem. We do so by exploiting the fact that contrastive learning objectives optimize the latent space distribution to be uniform over the unit hyper-sphere, which can be easily sampled from. We show that using the contrastive learning framework to optimize the WAE loss achieves faster convergence and more stable optimization compared with existing popular algorithms for WAE. This is also reflected in the FID scores on CelebA and CIFAR-10 datasets, and the realistic generated image quality on the CelebA-HQ dataset.
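Hedged PyTorch sketch of the core idea: a plain autoencoder reconstruction loss plus an InfoNCE-style contrastive term on L2-normalized latent codes, which pushes the latent distribution toward uniformity on the unit hypersphere. The network sizes, the noise used to form the two "views", the temperature, and the weighting are illustrative assumptions, not the paper's configuration.

import torch
import torch.nn.functional as F

enc = torch.nn.Sequential(torch.nn.Linear(784, 256), torch.nn.ReLU(), torch.nn.Linear(256, 32))
dec = torch.nn.Sequential(torch.nn.Linear(32, 256), torch.nn.ReLU(), torch.nn.Linear(256, 784))
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

def train_step(x, tau=0.2, lam=1.0):
    z = enc(x)
    recon = F.mse_loss(dec(z), x)                # simple AE loss (the WAE reconstruction term)
    z1 = F.normalize(z + 0.1 * torch.randn_like(z), dim=1)   # two stochastic views of each code
    z2 = F.normalize(z + 0.1 * torch.randn_like(z), dim=1)
    logits = z1 @ z2.t() / tau                   # cosine similarities as InfoNCE logits
    labels = torch.arange(x.size(0))
    contrast = F.cross_entropy(logits, labels)   # aligns positives, repels the rest, driving
    loss = recon + lam * contrast                # latent codes toward hypersphere uniformity
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

print(train_step(torch.randn(128, 784)))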
优化|敛散性(13篇)
【1】 Actor-critic is implicitly biased towards high entropy optimal policies 标题:行动者-批评家隐含地偏向于高熵最优策略 链接:https://arxiv.org/abs/2110.11280
作者:Yuzheng Hu,Ziwei Ji,Matus Telgarsky 机构:University of Illinois, Urbana-Champaign 摘要:我们证明了最简单的actor-critic方法——通过与线性MDP交互使用TD更新的线性softmax策略,但没有显式正则化或探索——不仅找到了最优策略,而且更倾向于高熵最优策略。为了证明这种偏差的强度,该算法不仅没有正则化,没有投影,也没有像$\epsilon$-greedy那样的探索,而且在没有重置的单个轨迹上进行训练。高熵偏差的关键结果是,可以放弃所有先前工作中以某种形式存在的MDP均匀混合假设:高熵偏差的隐式正则化足以确保所有链混合,并以高概率达到最优策略。作为辅助贡献,这项工作通过将行动者更新写成显式镜像下降来解耦行动者与批评家之间的关注点,提供了在策略空间的KL球内一致地界定混合时间的工具,并提供了一个无投影的TD分析,该分析具有自身的隐式偏差,并可从未混合的初始分布运行。 摘要:We show that the simplest actor-critic method -- a linear softmax policy updated with TD through interaction with a linear MDP, but featuring no explicit regularization or exploration -- does not merely find an optimal policy, but moreover prefers high entropy optimal policies. To demonstrate the strength of this bias, the algorithm not only has no regularization, no projections, and no exploration like $\epsilon$-greedy, but is moreover trained on a single trajectory with no resets. The key consequence of the high entropy bias is that uniform mixing assumptions on the MDP, which exist in some form in all prior work, can be dropped: the implicit regularization of the high entropy bias is enough to ensure that all chains mix and an optimal policy is reached with high probability. As auxiliary contributions, this work decouples concerns between the actor and critic by writing the actor update as an explicit mirror descent, provides tools to uniformly bound mixing times within KL balls of policy space, and provides a projection-free TD analysis with its own implicit bias which can be run from an unmixed starting distribution.
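A toy numpy sketch of the exact setting the abstract analyzes: a linear softmax policy updated with the TD error on one long trajectory of a small MDP, with no regularization, projections, resets, or explicit exploration. The 2-state MDP and step sizes are assumptions for illustration only.

import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9
P = np.array([[[0.9, 0.1], [0.1, 0.9]],   # P[s, a, s']: transition probabilities
              [[0.8, 0.2], [0.2, 0.8]]])
R = np.array([[1.0, 0.0], [0.0, 1.0]])    # R[s, a]: rewards
phi = np.eye(n_states)                    # tabular features => a (trivial) linear MDP
theta = np.zeros((n_states, n_actions))   # actor: linear softmax policy parameters
w = np.zeros(n_states)                    # critic: linear value-function weights
rng, s = np.random.default_rng(0), 0

for t in range(50_000):                   # a single trajectory, never reset
    logits = theta[s]
    pi = np.exp(logits - logits.max()); pi /= pi.sum()
    a = rng.choice(n_actions, p=pi)
    s2 = rng.choice(n_states, p=P[s, a])
    td = R[s, a] + gamma * phi[s2] @ w - phi[s] @ w
    w += 0.01 * td * phi[s]               # TD(0) critic update
    grad_log = -pi; grad_log[a] += 1.0    # gradient of log pi(a|s) w.r.t. theta[s]
    theta[s] += 0.01 * td * grad_log      # actor update (no entropy bonus anywhere)
    s = s2

print(np.round(theta, 2))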
【2】 Analyzing and Improving the Optimization Landscape of Noise-Contrastive Estimation 标题:噪声对比估计法的优化前景分析与改进 链接:https://arxiv.org/abs/2110.11271
作者:Bingbin Liu,Elan Rosenfeld,Pradeep Ravikumar,Andrej Risteski 机构:Carnegie Mellon University 摘要:噪声对比估计(NCE)是一种用于学习非规范化概率模型的统计一致性方法。据经验观察,噪声分布的选择对NCE的性能至关重要。然而,这类观察从未被形式化或定量化。事实上,甚至不清楚由于选择不当的噪声分布而产生的困难是统计上的还是算法上的。在这项工作中,我们正式指出了当使用不适当的噪声分布时NCE性能差的原因。也就是说,我们证明了这些挑战是由于行为不端(更准确地说,是平坦的)损失景观造成的。为了解决这个问题,我们引入了一种称为"eNCE"的NCE变体,它使用指数损失,并且当目标和噪声分布在给定的指数族中时,归一化梯度下降可证明地解决了损失景观问题。 摘要:Noise-contrastive estimation (NCE) is a statistically consistent method for learning unnormalized probabilistic models. It has been empirically observed that the choice of the noise distribution is crucial for NCE's performance. However, such observations have never been made formal or quantitative. In fact, it is not even clear whether the difficulties arising from a poorly chosen noise distribution are statistical or algorithmic in nature. In this work, we formally pinpoint reasons for NCE's poor performance when an inappropriate noise distribution is used. Namely, we prove these challenges arise due to an ill-behaved (more precisely, flat) loss landscape. To address this, we introduce a variant of NCE called "eNCE" which uses an exponential loss and for which normalized gradient descent addresses the landscape issues provably when the target and noise distributions are in a given exponential family.
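A small PyTorch sketch of plain NCE in the setting the analysis concerns: an unnormalized 1-D Gaussian model with a learnable constant absorbing the unknown partition function, fit by logistic regression against noise samples. The noise distribution and optimizer are illustrative, and per the paper, the choice of noise is precisely what shapes the loss landscape.

import torch
import torch.nn.functional as F

# Unnormalized model: log p_theta(x) = -(x - mu)^2 / 2 + c, with c absorbing -log Z.
mu = torch.tensor(0.0, requires_grad=True)
c = torch.tensor(0.0, requires_grad=True)
opt = torch.optim.Adam([mu, c], lr=0.05)
data = torch.randn(4096) + 2.0                  # true mean is 2
noise = torch.distributions.Normal(0.0, 3.0)    # assumed noise distribution q

for _ in range(500):
    xn = noise.sample((4096,))
    log_ratio = lambda x: -(x - mu) ** 2 / 2 + c - noise.log_prob(x)
    # NCE = binary logistic regression: data labeled 1, noise labeled 0.
    loss = -(F.logsigmoid(log_ratio(data)).mean() + F.logsigmoid(-log_ratio(xn)).mean())
    opt.zero_grad(); loss.backward(); opt.step()

print(float(mu), float(c))                      # mu should approach 2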
【3】 A Nested Weighted Tchebycheff Multi-Objective Bayesian Optimization Approach for Flexibility of Unknown Utopia Estimation in Expensive Black-box Design Problems 标题:昂贵黑箱设计问题未知乌托邦估计灵活性的嵌套加权切比雪夫多目标贝叶斯优化方法 链接:https://arxiv.org/abs/2110.11070
作者:Arpan Biswas,Claudio Fuentes,Christopher Hoyle 机构:Department of Mechanical Engineering, Oregon State University, Corvallis, OR, USA, Department of Statistics, Oregon State University, Corvallis, OR, USA, Center for Nanophase Materials Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 备注:35 pages, 8 figures in main text and 2 figures in supplementary 摘要:我们提出了一个嵌套的加权切比雪夫多目标贝叶斯优化框架,其中我们从模型集合中建立了回归模型选择过程,以更好地估计加权切比雪夫昂贵黑箱多目标函数的不确定参数。在现有的工作中,已经证明了一种加权Tchebycheff MOBO方法,该方法试图通过使用先验选择的回归模型进行校准,在制定采集函数时估计未知的乌托邦点。然而,现有的MOBO模型在根据引导采样数据选择适当的回归模型时缺乏灵活性,因此,随着MOBO迭代的进行,可能会出现欠拟合或过拟合,从而降低整体MOBO性能。由于一般而言很难先验地保证某个模型最优,这促使我们考虑在WTB MOBO引导下、用当前训练数据拟合的不同家族预测模型的组合,并按照用户定义的基于预测均方根误差的方法选择最佳模型。该方法用于优化多模态基准问题和恒定温度-压力载荷下的薄壁管设计,以最大限度地降低蠕变疲劳失效风险和设计成本。最后,在参数估计精度、帕累托最优解和函数评估代价等方面,比较了嵌套加权切比雪夫MOBO模型与不同MOBO框架的性能。该方法具有足够的通用性,可在模型组合中考虑不同家族的预测模型以进行最优模型选择;其总体设计架构允许求解任何高维(多函数)复杂黑箱问题,并可推广到任何其他需要乌托邦先验知识的全局准则多目标优化方法。 摘要:We propose a nested weighted Tchebycheff Multi-objective Bayesian optimization framework where we build a regression model selection procedure from an ensemble of models, towards better estimation of the uncertain parameters of the weighted-Tchebycheff expensive black-box multi-objective function. In existing work, a weighted Tchebycheff MOBO approach has been demonstrated which attempts to estimate the unknown utopia in formulating acquisition function, through calibration using a priori selected regression model. However, the existing MOBO model lacks flexibility in selecting the appropriate regression models given the guided sampled data and therefore, can under-fit or over-fit as the iterations of the MOBO progress, reducing the overall MOBO performance. As it is too complex to a priori guarantee a best model in general, this motivates us to consider a portfolio of different families of predictive models fitted with current training data, guided by the WTB MOBO; the best model is selected following a user-defined prediction root mean-square-error-based approach. The proposed approach is implemented in optimizing a multi-modal benchmark problem and a thin tube design under constant loading of temperature-pressure, with minimizing the risk of creep-fatigue failure and design cost. Finally, the nested weighted Tchebycheff MOBO model performance is compared with different MOBO frameworks with respect to accuracy in parameter estimation, Pareto-optimal solutions and function evaluation cost. This method is generalized enough to consider different families of predictive models in the portfolio for best model selection, where the overall design architecture allows for solving any high-dimensional (multiple functions) complex black-box problems and can be extended to any other global criterion multi-objective optimization methods where prior knowledge of utopia is required.
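For reference, the weighted Tchebycheff scalarization at the core of such frameworks is a one-liner once a utopia estimate is available; the toy bi-objective problem, weights, and utopia estimate below are assumptions (in the paper the utopia point is estimated by the selected regression models).

import numpy as np

def weighted_tchebycheff(fx, weights, utopia):
    # Scalarize a vector of objectives: max_i w_i * |f_i(x) - z*_i|.
    return np.max(weights * np.abs(fx - utopia))

f = lambda x: np.array([x ** 2, (x - 2.0) ** 2])   # toy bi-objective function
utopia_hat = np.array([0.0, 0.0])                  # assumed (estimated) utopia point
w = np.array([0.5, 0.5])
xs = np.linspace(-1, 3, 401)
best = min(xs, key=lambda x: weighted_tchebycheff(f(x), w, utopia_hat))
print(best)                                        # a Pareto trade-off near x = 1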
【4】 LOA: Logical Optimal Actions for Text-based Interaction Games 标题:LOA:基于文本的交互游戏的逻辑最优动作 链接:https://arxiv.org/abs/2110.10973
作者:Daiki Kimura,Subhajit Chaudhury,Masaki Ono,Michiaki Tatsubori,Don Joven Agravante,Asim Munawar,Akifumi Wachi,Ryosuke Kohita,Alexander Gray 机构:IBM Research 备注:ACL-IJCNLP 2021 (demo paper) 摘要:我们提出了逻辑最优动作(LOA),这是一种强化学习应用的动作决策架构,具有神经-符号框架,该框架是自然语言交互游戏中神经网络和符号知识获取方法的组合。LOA实验的演示包括一个基于web的交互式平台,用于基于文本的游戏和获取知识的可视化,以提高训练规则的可解释性。本演示还提供了一个与其他神经符号方法以及基于文本的相同游戏上的非符号最新代理模型的比较模块。我们的LOA还为强化学习环境提供了Python的开源实现,以促进研究神经符号代理的实验。代码:https://github.com/ibm/loa 摘要:We present Logical Optimal Actions (LOA), an action decision architecture of reinforcement learning applications with a neuro-symbolic framework which is a combination of neural network and symbolic knowledge acquisition approach for natural language interaction games. The demonstration for LOA experiments consists of a web-based interactive platform for text-based games and visualization for acquired knowledge for improving interpretability for trained rules. This demonstration also provides a comparison module with other neuro-symbolic approaches as well as non-symbolic state-of-the-art agent models on the same text-based games. Our LOA also provides open-sourced implementation in Python for the reinforcement learning environment to facilitate an experiment for studying neuro-symbolic agents. Code: https://github.com/ibm/loa
【5】 CATRO: Channel Pruning via Class-Aware Trace Ratio Optimization 标题:CATRO:基于类感知跟踪率优化的通道修剪 链接:https://arxiv.org/abs/2110.10921
作者:Wenzheng Hu,Ning Liu,Zhengping Che,Mingyang Li,Jian Tang,Changshui Zhang,Jianqiang Wang 摘要:深度卷积神经网络在许多应用场景中表现出高参数和计算冗余度,并且越来越多的工作探索了模型修剪以获得轻量级和高效的网络。然而,大多数现有的修剪方法是由经验启发式驱动的,很少考虑通道的联合影响,导致无保证和次优性能。在本文中,我们提出了一种新的基于类感知迹比优化(CATRO)的通道剪枝方法,以减少计算负担并加速模型推理。CATRO利用来自少数样本的类别信息,通过特征空间鉴别测量多个通道的联合影响,并整合保留通道的分层影响。通过将通道修剪描述为子模集函数最大化问题,CATRO通过两阶段贪婪迭代优化过程有效地解决了该问题。更重要的是,我们给出了CATRO收敛性和性能的理论证明。实验结果表明,与其他最先进的通道剪枝算法相比,CATRO在计算量相近的情况下获得了更高的精度,或在精度相近的情况下具有更低的计算开销。此外,由于CATRO的类感知特性,它适合于为各种分类子任务自适应地修剪有效网络,从而在实际应用中方便地部署和使用深度网络。 摘要:Deep convolutional neural networks are shown to be overkill with high parametric and computational redundancy in many application scenarios, and an increasing number of works have explored model pruning to obtain lightweight and efficient networks. However, most existing pruning approaches are driven by empirical heuristics and rarely consider the joint impact of channels, leading to unguaranteed and suboptimal performance. In this paper, we propose a novel channel pruning method via class-aware trace ratio optimization (CATRO) to reduce the computational burden and accelerate the model inference. Utilizing class information from a few samples, CATRO measures the joint impact of multiple channels by feature space discriminations and consolidates the layer-wise impact of preserved channels. By formulating channel pruning as a submodular set function maximization problem, CATRO solves it efficiently via a two-stage greedy iterative optimization procedure. More importantly, we present theoretical justifications on convergence and performance of CATRO. Experimental results demonstrate that CATRO achieves higher accuracy with similar computation cost or lower computation cost with similar accuracy than other state-of-the-art channel pruning algorithms. In addition, because of its class-aware property, CATRO is suitable to prune efficient networks adaptively for various classification subtasks, enhancing handy deployment and usage of deep networks in real-world applications.
【6】 Utilizing Redundancy in Cost Functions for Resilience in Distributed Optimization and Learning 标题:利用代价函数中的冗余实现分布式优化和学习中的弹性 链接:https://arxiv.org/abs/2110.10858
作者:Shuo Liu,Nirupam Gupta,Nitin H. Vaidya 备注:66 pages, 1 figure, and 1 table. Supersede our previous report arXiv:2106.03998 in asynchronous distributed optimization by containing the most of its results 摘要:本文研究了基于服务器架构的弹性分布式优化和随机机器学习问题。该系统包括一台服务器和多个代理,其中每个代理具有本地成本函数。代理与服务器协作,以找到其聚合成本函数的最小值。我们考虑部分代理可能出现异步和/或拜占庭故障的情况。在这种情况下,分布式梯度下降(DGD)的经典算法变得无效。我们的目标是设计技术来提高DGD在异步和拜占庭故障下的效率。为此,我们首先提出一种方法,通过$(f, r; \epsilon)$-冗余的一般概念对代理的成本函数进行建模,其中$f$和$r$分别是拜占庭故障和异步的参数,$\epsilon$表示代理的成本函数之间的接近性。这使我们能够量化任何给定分布式优化问题的代理成本函数中存在的冗余水平。我们从理论和经验上证明了我们提出的冗余模型在提高DGD对异步和拜占庭代理的鲁棒性方面的优点,以及它们对分布式随机梯度下降(D-SGD)的扩展,以实现具有异步和拜占庭代理的鲁棒分布式机器学习。 摘要:This paper considers the problem of resilient distributed optimization and stochastic machine learning in a server-based architecture. The system comprises a server and multiple agents, where each agent has a local cost function. The agents collaborate with the server to find a minimum of their aggregate cost functions. We consider the case when some of the agents may be asynchronous and/or Byzantine faulty. In this case, the classical algorithm of distributed gradient descent (DGD) is rendered ineffective. Our goal is to design techniques improving the efficacy of DGD with asynchrony and Byzantine failures. To do so, we start by proposing a way to model the agents' cost functions by the generic notion of $(f, r; \epsilon)$-redundancy where $f$ and $r$ are the parameters of Byzantine failures and asynchrony, respectively, and $\epsilon$ characterizes the closeness between agents' cost functions. This allows us to quantify the level of redundancy present amongst the agents' cost functions, for any given distributed optimization problem. We demonstrate, both theoretically and empirically, the merits of our proposed redundancy model in improving the robustness of DGD against asynchronous and Byzantine agents, and their extensions to distributed stochastic gradient descent (D-SGD) for robust distributed machine learning with asynchronous and Byzantine agents.
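The paper's contribution is the redundancy analysis rather than one specific aggregator, but a coordinate-wise trimmed mean is a standard robust aggregation rule one could plug into a DGD step to tolerate Byzantine gradients; the sketch below is illustrative, not the authors' algorithm.

import numpy as np

def trimmed_mean(grads, f):
    # Coordinate-wise trimmed mean over agents' gradients: per coordinate, drop the f
    # largest and f smallest values, then average (tolerates up to f Byzantine agents).
    g = np.sort(np.asarray(grads), axis=0)          # shape: (n_agents, dim)
    return g[f:g.shape[0] - f].mean(axis=0)

rng = np.random.default_rng(0)
honest = rng.normal(1.0, 0.1, size=(8, 3))          # 8 honest agents, true gradient ~ [1,1,1]
byzantine = np.full((2, 3), 100.0)                  # 2 faulty agents send junk
agg = trimmed_mean(np.vstack([honest, byzantine]), f=2)
print(agg)                                          # close to [1,1,1] despite the outliers
# One robust DGD step would then be: x = x - eta * agg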
【7】 A Data-Centric Optimization Framework for Machine Learning 标题:一种以数据为中心的机器学习优化框架 链接:https://arxiv.org/abs/2110.10802
作者:Oliver Rausch,Tal Ben-Nun,Nikoli Dryden,Andrei Ivanov,Shigang Li,Torsten Hoefler 机构:Department of Computer Science 摘要:深度学习的快速发展导致了一系列快速变化的模型,对计算的需求急剧增长。然而,由于框架专门针对流行网络中的模式进行优化,它们隐含地限制了推动研究进展的新颖多样的模型。我们通过定义一个灵活的、用户可定制的、基于数据移动最小化的管道来优化任意深度神经网络的训练,从而为深度学习研究者赋能。管道从PyTorch或ONNX中的标准网络开始,通过逐步降低来转换计算。我们定义了四个级别的通用转换,从局部算子内部优化到全局数据移动缩减。它们在以数据为中心的图形中间表示上运行,该表示在所有抽象级别上表达计算和数据移动,包括将卷积等基本算子展开为其底层计算。设计的核心是管道的交互性和可自省性。每个部分都可以通过Python API进行扩展,并且可以使用GUI进行交互调优。我们在十种不同的网络上展示了具有竞争力的性能或加速,通过交互式优化发现了EfficientNet中的新机会。 摘要:Rapid progress in deep learning is leading to a diverse set of quickly changing models, with a dramatically growing demand for compute. However, as frameworks specialize optimization to patterns in popular networks, they implicitly constrain novel and diverse models that drive progress in research. We empower deep learning researchers by defining a flexible and user-customizable pipeline for optimizing training of arbitrary deep neural networks, based on data movement minimization. The pipeline begins with standard networks in PyTorch or ONNX and transforms computation through progressive lowering. We define four levels of general-purpose transformations, from local intra-operator optimizations to global data movement reduction. These operate on a data-centric graph intermediate representation that expresses computation and data movement at all levels of abstraction, including expanding basic operators such as convolutions to their underlying computations. Central to the design is the interactive and introspectable nature of the pipeline. Every part is extensible through a Python API, and can be tuned interactively using a GUI. We demonstrate competitive performance or speedups on ten different networks, with interactive optimizations discovering new opportunities in EfficientNet.
【8】 CIM-PPO:Proximal Policy Optimization with Liu-Correntropy Induced Metric 标题:CIM-PPO:基于相关熵诱导度量的近端策略优化 链接:https://arxiv.org/abs/2110.10522
作者:Yunxiao Guo,Han Long,Xiaojun Duan,Kaiyuan Feng,Maochu Li,Xiaying Ma 机构:National University of Defense Technology 摘要:作为一种基于深度强化学习的算法,近端策略优化(PPO)在许多复杂任务中表现良好,是近年来最流行的RL算法之一。根据替代目标的惩罚机制,PPO可分为KL散度PPO(KL-PPO)和Clip函数PPO(Clip-PPO)。Clip-PPO广泛应用于各种实际场景中,引起了众多研究者的关注,因此也产生了许多变体,使得算法越来越好。然而,KL-PPO作为一种更具理论性的算法,由于其性能不如Clip-PPO,因此被忽视。本文分析了KL散度对PPO目标函数的非对称性影响,并给出了可指示该非对称性何时会影响KL-PPO效率的不等式。我们提出了基于相关熵诱导度量的PPO算法(CIM-PPO),该算法将相关熵理论(一种在M估计中广泛用于评估两个分布差异的对称度量方法)应用于PPO。然后,我们设计了基于OpenAI Gym的实验来测试新算法的有效性,并将其与KL-PPO和Clip-PPO进行了比较。 摘要:As an algorithm based on deep reinforcement learning, Proximal Policy Optimization (PPO) performs well in many complex tasks and has become one of the most popular RL algorithms in recent years. According to the mechanism of penalty in the surrogate objective, PPO can be divided into PPO with KL Divergence (KL-PPO) and PPO with Clip function (Clip-PPO). Clip-PPO is widely used in a variety of practical scenarios and has attracted the attention of many researchers. Therefore, many variations have also been created, making the algorithm better and better. However, as a more theoretical algorithm, KL-PPO was neglected because its performance was not as good as Clip-PPO. In this article, we analyze the asymmetry effect of KL divergence on PPO's objective function, and give the inequality that can indicate when the asymmetry will affect the efficiency of KL-PPO. We propose PPO with a Correntropy-Induced Metric (CIM-PPO), which applies the theory of correntropy (a symmetric metric widely used in M-estimation to evaluate the difference between two distributions) to PPO. Then, we designed experiments based on OpenAI Gym to test the effectiveness of the new algorithm and compared it with KL-PPO and Clip-PPO.
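A small numpy sketch of the correntropy-induced metric (CIM) between paired samples, the symmetric quantity the abstract proposes in place of the asymmetric KL penalty; the Gaussian kernel bandwidth is an assumed hyperparameter, and how the metric is wired into the PPO surrogate objective is not reproduced here.

import numpy as np

def cim(x, y, sigma=1.0):
    # CIM(X, Y) = sqrt( mean_i (1 - k_sigma(x_i - y_i)) ) with a Gaussian kernel k;
    # symmetric in x and y, unlike KL divergence.
    k = np.exp(-((x - y) ** 2) / (2 * sigma ** 2))
    return np.sqrt(np.mean(1.0 - k))

rng = np.random.default_rng(0)
p = rng.normal(0, 1, 10_000)
print(cim(p, p + 0.1), cim(p, p + 2.0))   # small shift -> small CIM, large shift -> larger CIM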
【9】 ProxyBO: Accelerating Neural Architecture Search via Bayesian Optimization with Zero-cost Proxies 标题:ProxyBO:基于零成本代理的贝叶斯优化加速神经结构搜索 链接:https://arxiv.org/abs/2110.10423
作者:Yu Shen,Yang Li,Jian Zheng,Wentao Zhang,Peng Yao,Jixiang Li,Sen Yang,Ji Liu,Cui Bin 机构:Key Laboratory of High Confidence Software Technologies (MOE), School of EECS, Peking University, China, School of Computer Science and Engineering, Beihang University, China ,Kuaishou Technology, China 摘要:设计神经结构需要大量的手工工作。这促进了神经架构搜索(NAS)的发展,使设计自动化。虽然以前的NAS方法取得了有希望的结果,但运行速度较慢,零成本代理运行速度极快,但前景不大,但最近的工作考虑通过简单的预热来利用零成本代理。现有的方法有两个局限性,即不可预见的可靠性和一次性使用。为了解决这些局限性,我们提出了ProxyBO,一种有效的贝叶斯优化框架,它利用零成本代理来加速神经结构搜索。我们提出了泛化能力度量来估计每个迭代过程中代理对任务的适应度,然后通过动态影响组合将BO与零成本代理相结合。广泛的实证研究表明,ProxyBO在三个公共基准的五项任务上始终优于竞争基线。具体而言,ProxyBO在最先进的REA和BRP-NAS方法上分别实现了5.41倍和3.83倍的加速。 摘要:Designing neural architectures requires immense manual efforts. This has promoted the development of neural architecture search (NAS) to automate this design. While previous NAS methods achieve promising results but run slowly and zero-cost proxies run extremely fast but are less promising, recent work considers utilizing zero-cost proxies via a simple warm-up. The existing method has two limitations, which are unforeseeable reliability and one-shot usage. To address the limitations, we present ProxyBO, an efficient Bayesian optimization framework that utilizes the zero-cost proxies to accelerate neural architecture search. We propose the generalization ability measurement to estimate the fitness of proxies on the task during each iteration and then combine BO with zero-cost proxies via dynamic influence combination. Extensive empirical studies show that ProxyBO consistently outperforms competitive baselines on five tasks from three public benchmarks. Concretely, ProxyBO achieves up to 5.41x and 3.83x speedups over the state-of-the-art approach REA and BRP-NAS, respectively.
【10】 Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond 标题:带洗牌的MiniBatch vs Local SGD:紧收敛界限和超越 链接:https://arxiv.org/abs/2110.10342
作者:Chulhee Yun,Shashank Rajput,Suvrit Sra 机构:Massachusetts Institute of Technology, University of Wisconsin-Madison 备注:72 pages 摘要:在分布式学习中,局部SGD(也称为联邦平均)及其简单的基线小批量SGD是广泛研究的优化方法。这些方法的大多数现有分析假设通过有放回抽样获得独立且无偏的梯度估计。相比之下,我们研究了基于洗牌的变体:小批量和局部随机洗牌,它们以无放回方式抽取随机梯度,因此更接近实践。对于满足Polyak-Łojasiewicz条件的光滑函数,我们得到了收敛界(在大历元区域),这表明这些基于洗牌的变体比有放回变体收敛更快。此外,我们证明了匹配的下界,表明我们的收敛性分析是严密的。最后,我们提出了一种称为同步洗牌(synchronized shuffling)的算法改进,在近似齐次设置下,其收敛速度快于我们的下界。 摘要:In distributed learning, local SGD (also known as federated averaging) and its simple baseline minibatch SGD are widely studied optimization methods. Most existing analyses of these methods assume independent and unbiased gradient estimates obtained via with-replacement sampling. In contrast, we study shuffling-based variants: minibatch and local Random Reshuffling, which draw stochastic gradients without replacement and are thus closer to practice. For smooth functions satisfying the Polyak-Łojasiewicz condition, we obtain convergence bounds (in the large epoch regime) which show that these shuffling-based variants converge faster than their with-replacement counterparts. Moreover, we prove matching lower bounds showing that our convergence analysis is tight. Finally, we propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
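A numpy sketch contrasting the two sampling schemes the paper compares, on a toy least-squares problem: i.i.d. with-replacement indices versus Random Reshuffling (a fresh without-replacement permutation each epoch). Step size and epoch count are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
A, b = rng.normal(size=(256, 10)), rng.normal(size=256)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]

def sgd(reshuffle, epochs=100, lr=0.005):
    x, n = np.zeros(10), len(b)
    for _ in range(epochs):
        # Random Reshuffling: a fresh permutation per epoch (without replacement);
        # otherwise: i.i.d. indices drawn with replacement.
        idx = rng.permutation(n) if reshuffle else rng.integers(0, n, n)
        for i in idx:
            x -= lr * (A[i] @ x - b[i]) * A[i]   # gradient of 0.5 * (a_i . x - b_i)^2
    return np.linalg.norm(x - x_star)

print("with replacement  :", sgd(False))
print("random reshuffling:", sgd(True))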
【11】 NAS-HPO-Bench-II: A Benchmark Dataset on Joint Optimization of Convolutional Neural Network Architecture and Training Hyperparameters 标题:NAS-HPO-BENCH-II:卷积神经网络结构与训练超参数联合优化的基准数据集 链接:https://arxiv.org/abs/2110.10165
作者:Yoichi Hirose,Nozomu Yoshinari,Shinichi Shirakawa 机构:Yokohama National University, Kanagawa, Japan 备注:16 pages, 6 figures. Accepted at ACML2021 (long oral). API is available at this https URL 摘要:神经架构搜索(NAS)的基准数据集已经开发出来,以减轻计算代价高昂的评估过程,并确保公平比较。最近的NAS基准只关注体系结构优化,尽管训练超参数会影响获得的模型性能。构建用于联合优化体系结构和训练超参数的基准数据集对于进一步的NAS研究至关重要。现有的NAS-HPO-Bench是联合优化的基准,但它没有考虑现代NAS算法中所采用的网络连接设计。本文介绍了第一个用于联合优化网络连接和训练超参数的基准数据集,我们称之为NAS-HPO-Bench-II。我们收集了在不同学习率和批量设置下、在CIFAR-10数据集上训练的4K个基于单元的卷积神经网络结构的性能数据,得到了192K个配置的数据。数据集包括12个历元训练的精确数据。我们进一步建立代理模型,预测200个历元训练后的精度,以提供更长训练历元的性能数据。通过分析NAS-HPO-Bench-II,我们确认了体系结构与训练超参数之间的依赖性以及联合优化的必要性。最后,我们使用NAS-HPO-Bench-II演示了基线优化算法的基准测试。 摘要:The benchmark datasets for neural architecture search (NAS) have been developed to alleviate the computationally expensive evaluation process and ensure a fair comparison. Recent NAS benchmarks only focus on architecture optimization, although the training hyperparameters affect the obtained model performances. Building the benchmark dataset for joint optimization of architecture and training hyperparameters is essential to further NAS research. The existing NAS-HPO-Bench is a benchmark for joint optimization, but it does not consider the network connectivity design as done in modern NAS algorithms. This paper introduces the first benchmark dataset for joint optimization of network connections and training hyperparameters, which we call NAS-HPO-Bench-II. We collect the performance data of 4K cell-based convolutional neural network architectures trained on the CIFAR-10 dataset with different learning rate and batch size settings, resulting in the data of 192K configurations. The dataset includes the exact data for 12 epoch training. We further build the surrogate model predicting the accuracies after 200 epoch training to provide the performance data of longer training epoch. By analyzing NAS-HPO-Bench-II, we confirm the dependency between architecture and training hyperparameters and the necessity of joint optimization. Finally, we demonstrate the benchmarking of the baseline optimization algorithms using NAS-HPO-Bench-II.
【12】 On Optimal Interpolation In Linear Regression 标题:关于线性回归中的最优插值 链接:https://arxiv.org/abs/2110.11258
作者:Eduard Oravkin,Patrick Rebeschini 机构:University of Oxford 备注:25 pages, 7 figures, to appear in NeurIPS 2021 摘要:理解插值方法何时以及为什么能很好地泛化,是统计学习理论中近来备受关注的一个课题。然而,系统地将插值方法与可实现的最优性概念联系起来只得到了部分关注。在本文中,我们研究线性回归中最优插值方式的问题:所用插值函数在响应变量上是线性的(岭回归中的贝叶斯最优估计即属此类),并依赖于数据、数据的总体协方差、信噪比以及信号先验的协方差,但不依赖于信号本身的取值或训练数据中的噪声向量。我们为实现这种最优性概念的插值器给出了封闭形式的表达式,并表明它可以作为带特定初始化的预条件梯度下降的极限导出。我们刻画了一个区间,在其中最小范数插值器的泛化可被证明任意差于我们引入的最优响应线性可实现插值器;并通过数值实验验证,在各向同性先验情形下,我们所考虑的最优性概念可以由仅以训练数据为输入的插值方法实现。最后,我们将最优响应线性插值的概念推广到文献中已研究过的线性数据生成模型下的随机特征回归。 摘要:Understanding when and why interpolating methods generalize well has recently been a topic of interest in statistical learning theory. However, systematically connecting interpolating methods to achievable notions of optimality has only received partial attention. In this paper, we investigate the question of what is the optimal way to interpolate in linear regression using functions that are linear in the response variable (as is the case for the Bayes optimal estimator in ridge regression) and depend on the data, the population covariance of the data, the signal-to-noise ratio and the covariance of the prior for the signal, but do not depend on the value of the signal itself nor the noise vector in the training data. We provide a closed-form expression for the interpolator that achieves this notion of optimality and show that it can be derived as the limit of preconditioned gradient descent with a specific initialization. We identify a regime where the minimum-norm interpolator provably generalizes arbitrarily worse than the optimal response-linear achievable interpolator that we introduce, and validate with numerical experiments that the notion of optimality we consider can be achieved by interpolating methods that only use the training data as input in the case of an isotropic prior. Finally, we extend the notion of optimal response-linear interpolation to random features regression under a linear data-generating model that has been previously studied in the literature.
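For reference, the minimum-norm interpolator the paper compares against is one line of numpy in the overparameterized regime; the synthetic data are illustrative. The paper's optimal response-linear interpolator instead reweights using the population covariance, SNR, and signal prior, which is not reproduced here.

import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 200                            # n < d: infinitely many interpolators exist
X, theta_true = rng.normal(size=(n, d)), rng.normal(size=d)
y = X @ theta_true + 0.1 * rng.normal(size=n)

theta_mn = np.linalg.pinv(X) @ y          # argmin ||theta||_2 subject to X theta = y
print(np.allclose(X @ theta_mn, y))       # True: it interpolates the training data exactly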
【13】 Stochastic Learning Rate Optimization in the Stochastic Approximation and Online Learning Settings 标题:随机近似和在线学习环境下的随机学习率优化 链接:https://arxiv.org/abs/2110.10710
作者:Theodoros Mamalis,Dusan Stipanovic,Petros Voulgaris 机构:Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA, Coordinated Science Laboratory, Urbana, IL, USA, Department of Mechanical Engineering, University of Nevada 摘要:本文将乘性随机性应用于随机优化算法的学习率,从而提出了随机学习率方案。我们给出了配备这种新型随机学习率方案的随机梯度下降在随机设置下期望意义上的理论收敛结果,以及在线优化设置下的收敛结果。实证结果考虑了自适应均匀分布乘性随机性的情形,不仅包括随机梯度下降,还包括配备随机学习率的其他流行算法。相对于确定性学习率版本,它们表现出显著的优化性能增益。 摘要:In this work, multiplicative stochasticity is applied to the learning rate of stochastic optimization algorithms, giving rise to stochastic learning-rate schemes. In-expectation theoretical convergence results of Stochastic Gradient Descent equipped with this novel stochastic learning rate scheme under the stochastic setting, as well as convergence results under the online optimization settings are provided. Empirical results consider the case of an adaptively uniformly distributed multiplicative stochasticity and include not only Stochastic Gradient Descent, but also other popular algorithms equipped with a stochastic learning rate. They demonstrate noticeable optimization performance gains, with respect to their deterministic-learning-rate versions.
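The scheme itself is a one-line change to SGD: multiply the base step size by a freshly drawn random factor at every iteration. A numpy sketch with an (assumed) uniform multiplier on a toy quadratic.

import numpy as np

rng = np.random.default_rng(0)
x, base_lr = np.array([5.0, -3.0]), 0.1
grad = lambda x: 2 * x                            # f(x) = ||x||^2

for t in range(200):
    lr_t = base_lr * rng.uniform(0.5, 1.5)        # multiplicative stochastic learning rate
    x = x - lr_t * grad(x)

print(x)                                          # contracts toward the minimizer at the origin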
预测|估计(6篇)
【1】 Variational Predictive Routing with Nested Subjective Timescales 标题:主观时间尺度嵌套的变分预测路由 链接:https://arxiv.org/abs/2110.11236
作者:Alexey Zakharov,Qinghai Guo,Zafeirios Fountas 机构:Huawei Technologies, London, UK, Shenzhen, China 备注:18 pages, 13 figures 摘要:序列数据中潜在时空层次结构的发现和学习是机器学习的一个重要课题。尽管如此,研究层次生成模型的工作很少,这些模型可以灵活地调整其分层表示,以响应具有不同时间动态的数据集。在这里,我们提出了变分预测路由(VPR)——一种神经概率推理系统,它根据视频特征的变化率在时间层次中组织视频特征的潜在表示,从而将连续数据建模为一个层次更新过程。通过采用完全依赖于系统潜在表示的事件检测机制(无需单独的模型),VPR能够在观察到的特征发生变化后动态调整其内部状态,从而促进模型潜在层次中表示的优化组织。通过使用多个视频数据集,我们证明了VPR能够检测事件边界,分离其层次结构中的时空特征,适应数据的动态,并产生准确的未来时间不可知的滚动。我们的方法整合了神经科学的见解,并引入了一个在基于模型的强化学习中具有很高应用潜力的框架,其中灵活且信息丰富的状态空间展开是特别令人感兴趣的。 摘要:Discovery and learning of an underlying spatiotemporal hierarchy in sequential data is an important topic for machine learning. Despite this, little work has been done to explore hierarchical generative models that can flexibly adapt their layerwise representations in response to datasets with different temporal dynamics. Here, we present Variational Predictive Routing (VPR) - a neural probabilistic inference system that organizes latent representations of video features in a temporal hierarchy, based on their rates of change, thus modeling continuous data as a hierarchical renewal process. By employing an event detection mechanism that relies solely on the system's latent representations (without the need of a separate model), VPR is able to dynamically adjust its internal state following changes in the observed features, promoting an optimal organisation of representations across the levels of the model's latent hierarchy. Using several video datasets, we show that VPR is able to detect event boundaries, disentangle spatiotemporal features across its hierarchy, adapt to the dynamics of the data, and produce accurate time-agnostic rollouts of the future. Our approach integrates insights from neuroscience and introduces a framework with high potential for applications in model-based reinforcement learning, where flexible and informative state-space rollouts are of particular interest.
【2】 PPFS: Predictive Permutation Feature Selection 标题:PPFS:预测置换特征选择 链接:https://arxiv.org/abs/2110.10713
作者:Atif Hassan,Jiaul H. Paik,Swanand Khare,Syed Asif Hassan 机构:Indian Institute of Technology Kharagpur, King Abdulaziz University 备注:7 pages. For the implementation of this work, see this https URL 摘要:我们提出了预测置换特征选择(PPFS),这是一种基于马尔可夫毯(Markov Blanket, MB)概念的新型包装器特征选择方法。与以前的MB方法不同,PPFS是一种通用的特征选择技术,因为它可以用于包含分类和/或连续特征的数据集上的分类和回归任务。我们提出了预测置换独立性(PPI),这是一种新的条件独立性(CI)测试,它使PPFS可以被归类为包装器特征选择方法。这与当前基于过滤器的MB特征选择技术形成对比,后者无法利用梯度提升机(GBM)等有监督算法的进展。PPI测试基于knockoff框架,利用监督算法测量单个或一组特征与目标变量之间的关联。我们还提出了一个新的MB聚合步骤,解决了样本效率低下的问题。对大量数据集的实证评估和比较表明,PPFS优于最先进的马尔可夫毯发现算法以及著名的包装器方法。我们还提供了我们方法正确性证明的草图。此工作的实现可在 https://github.com/atif-hassan/PyImpetus 获得。 摘要:We propose Predictive Permutation Feature Selection (PPFS), a novel wrapper-based feature selection method based on the concept of Markov Blanket (MB). Unlike previous MB methods, PPFS is a universal feature selection technique as it can work for both classification as well as regression tasks on datasets containing categorical and/or continuous features. We propose Predictive Permutation Independence (PPI), a new Conditional Independence (CI) test, which enables PPFS to be categorised as a wrapper feature selection method. This is in contrast to current filter based MB feature selection techniques that are unable to harness the advancements in supervised algorithms such as Gradient Boosting Machines (GBM). The PPI test is based on the knockoff framework and utilizes supervised algorithms to measure the association between an individual or a set of features and the target variable. We also propose a novel MB aggregation step that addresses the issue of sample inefficiency. Empirical evaluations and comparisons on a large number of datasets demonstrate that PPFS outperforms state-of-the-art Markov blanket discovery algorithms as well as, well-known wrapper methods. We also provide a sketch of the proof of correctness of our method. Implementation of this work is available at https://github.com/atif-hassan/PyImpetus
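A hedged sklearn sketch of the permutation idea behind the PPI test: fit a supervised model (a GBM here), then measure how much held-out performance drops when a single feature is permuted, treating a negligible drop as evidence the feature carries no predictive association. The synthetic data, model choice, and "drop" criterion are illustrative, not PPFS's exact knockoff-based procedure.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)          # only features 0 and 1 matter
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(Xtr, ytr)
base = accuracy_score(yte, model.predict(Xte))
for j in range(X.shape[1]):
    Xp = Xte.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])         # break the feature-target association
    drop = base - accuracy_score(yte, model.predict(Xp))
    print(f"feature {j}: performance drop = {drop:.3f}")   # large drop => relevant feature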
【3】 Improved Multilingual Language Model Pretraining for Social Media Text via Translation Pair Prediction 标题:基于翻译对预测的改进社交媒体文本多语言模型预训练 链接:https://arxiv.org/abs/2110.10318
作者:Shubhanshu Mishra,Aria Haghighi 机构:Twitter, Inc. 备注:Camera ready version. Accepted to WNUT 2021. Code for reproducing the experiments can be found at: this https URL 摘要:我们评估了一种简单的方法,通过添加一个称为翻译对预测(TPP)的训练前任务来改善社交媒体语料库上mBERT的Zero-Shot多语言迁移,该任务预测一对跨语言文本是否是有效的翻译。我们的方法假设可以访问源-目标语言对之间的翻译(精确或近似),我们在源语言任务数据上微调模型,并在目标语言中评估模型。特别是,我们关注的是对于mBERT来说迁移学习很困难的语言对:那些源语言和目标语言在脚本、词汇和语言类型上不同的语言对。我们在两项社交媒体任务上展示了TPP预训练在从英语到印地语、阿拉伯语和日语的Zero-Shot转换方面比mBERT单独进行的改进:NER(目标语言F1平均相对提高37%)和社交媒体文本的情感分类(F1相对提高12%),同时还对通用依赖词性标注的非社交媒体任务进行基准测试(准确度相对提高6.7%)。考虑到缺乏社交媒体双文本语料库,我们的结果是有希望的。我们的代码可在以下网址找到:https://github.com/twitter-research/multilingual-alignment-tpp. 摘要:We evaluate a simple approach to improving zero-shot multilingual transfer of mBERT on social media corpus by adding a pretraining task called translation pair prediction (TPP), which predicts whether a pair of cross-lingual texts are a valid translation. Our approach assumes access to translations (exact or approximate) between source-target language pairs, where we fine-tune a model on source language task data and evaluate the model in the target language. In particular, we focus on language pairs where transfer learning is difficult for mBERT: those where source and target languages are different in script, vocabulary, and linguistic typology. We show improvements from TPP pretraining over mBERT alone in zero-shot transfer from English to Hindi, Arabic, and Japanese on two social media tasks: NER (a 37% average relative improvement in F1 across target languages) and sentiment classification (12% relative improvement in F1) on social media text, while also benchmarking on a non-social media task of Universal Dependency POS tagging (6.7% relative improvement in accuracy). Our results are promising given the lack of social media bitext corpus. Our code can be found at: https://github.com/twitter-research/multilingual-alignment-tpp.
【4】 On Coordinate Decoding for Keypoint Estimation Tasks 标题:关键点估计任务的坐标译码研究 链接:https://arxiv.org/abs/2110.10289
作者:Anargyros Chatzitofis,Nikolaos Zioulis,Georgios Nikolaos Albanis,Dimitrios Zarpalas,Petros Daras 机构:Centre for Research & Technology Hellas, Information Technologies Institute, th km Charilaou-Thermi, Thessaloniki, Greece 摘要:一系列2D(和3D)关键点估计任务建立在热图坐标表示的基础上,即概率图,允许对网格上的关键点坐标进行可学习和空间感知的编码和解码,甚至允许亚像素坐标精度。在本报告中,我们的目的是通过强调真值热图编码和将预测热图解码为关键点坐标的重要性,重现DARK关于2D热图表示的研究发现。作者声称:a)一种更具原则性的分布感知坐标解码方法克服了文献中广泛使用的标准技术的局限性;b)通过生成精确且连续的热图分布,从真值坐标重构热图,从而实现无偏的模型训练,这与标准坐标编码过程相反,后者按照输入图像网格的分辨率对关键点坐标进行量化。 摘要:A series of 2D (and 3D) keypoint estimation tasks are built upon heatmap coordinate representation, i.e. a probability map that allows for learnable and spatially aware encoding and decoding of keypoint coordinates on grids, even allowing for sub-pixel coordinate accuracy. In this report, we aim to reproduce the findings of DARK that investigated the 2D heatmap representation by highlighting the importance of the encoding of the ground truth heatmap and the decoding of the predicted heatmap to keypoint coordinates. The authors claim that a) a more principled distribution-aware coordinate decoding method overcomes the limitations of the standard techniques widely used in the literature, and b), that the reconstruction of heatmaps from ground-truth coordinates by generating accurate and continuous heatmap distributions lead to unbiased model training, contrary to the standard coordinate encoding process that quantizes the keypoint coordinates on the resolution of the input image grid.
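A numpy sketch of the distribution-aware decoding step under reproduction: take the heatmap argmax m, then apply one Newton refinement on the log-heatmap, mu = m - H^{-1} g, with gradient g and Hessian H from finite differences. The synthetic Gaussian heatmap stands in for a network prediction.

import numpy as np

def dark_decode(h):
    # Sub-pixel decoding via a second-order Taylor expansion of log h around the argmax.
    p = np.log(np.maximum(h, 1e-10))
    y, x = np.unravel_index(np.argmax(h), h.shape)
    g = np.array([(p[y + 1, x] - p[y - 1, x]) / 2, (p[y, x + 1] - p[y, x - 1]) / 2])
    dyy = p[y + 1, x] - 2 * p[y, x] + p[y - 1, x]
    dxx = p[y, x + 1] - 2 * p[y, x] + p[y, x - 1]
    dxy = (p[y + 1, x + 1] - p[y + 1, x - 1] - p[y - 1, x + 1] + p[y - 1, x - 1]) / 4
    H = np.array([[dyy, dxy], [dxy, dxx]])
    return np.array([y, x]) - np.linalg.solve(H, g)   # mu = m - H^{-1} g

yy, xx = np.mgrid[0:48, 0:48]                          # Gaussian centered off-grid at (12.3, 20.7)
h = np.exp(-((yy - 12.3) ** 2 + (xx - 20.7) ** 2) / (2 * 2.0 ** 2))
print(dark_decode(h))                                  # ~ [12.3, 20.7]: sub-pixel accuracy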
【5】 What Averages Do Not Tell -- Predicting Real Life Processes with Sequential Deep Learning 标题:平均数不能说明什么--用序贯深度学习预测现实生活过程 链接:https://arxiv.org/abs/2110.10225
作者:István Ketykó,Felix Mannhardt,Marwan Hassani,Boudewijn van Dongen 机构:Eindhoven University of Technology, Mathematics and, Computer Science, Eindhoven, the Netherlands, Boudewijn F. van Dongen 摘要:自然语言、计算机视觉和信号处理领域的成功证明,深度学习是对序列数据建模的有效工具。流程挖掘关注从支持信息系统记录的业务流程执行数据中发现对业务流程的见解。记录的数据(事件日志)由与流程执行相对应的事件序列(跟踪)组成。许多深度学习技术已经成功地应用于预测过程挖掘,其目的是预测过程结果、剩余时间、下一个事件,甚至是运行轨迹的后缀。过程挖掘中的痕迹是多模态序列,其结构与自然语言句子或图像非常不同。这可能需要不同的处理方法。到目前为止,人们很少关注这些差异和带来的挑战。将后缀预测视为这些任务中最具挑战性的任务,深度学习模型的性能仅在平均测量值和少量实际事件日志中进行评估。由于不同的预处理和评估策略,比较论文之间的结果很困难。与此相关的挑战可能是跟踪长度分布的偏斜和现实事件日志中活动分布的偏斜。我们提供了一个端到端框架,可以比较七种最先进的顺序体系结构在常见设置下的性能。结果表明,对于大多数更复杂的数据集,序列建模仍有很大的改进空间。需要进一步研究和深入了解,以获得一致的性能,不仅是在平均指标上,而且在所有前缀上。 摘要:Deep Learning is proven to be an effective tool for modeling sequential data as shown by the success in Natural Language, Computer Vision and Signal Processing. Process Mining concerns discovering insights on business processes from their execution data that are logged by supporting information systems. The logged data (event log) is formed of event sequences (traces) that correspond to executions of a process. Many Deep Learning techniques have been successfully adapted for predictive Process Mining that aims to predict process outcomes, remaining time, the next event, or even the suffix of running traces. Traces in Process Mining are multimodal sequences and very differently structured than natural language sentences or images. This may require a different approach to processing. So far, there has been little focus on these differences and the challenges introduced. Looking at suffix prediction as the most challenging of these tasks, the performance of Deep Learning models was evaluated only on average measures and for a small number of real-life event logs. Comparing the results between papers is difficult due to different pre-processing and evaluation strategies. Challenges that may be relevant are the skewness of trace-length distribution and the skewness of the activity distribution in real-life event logs. We provide an end-to-end framework which enables to compare the performance of seven state-of-the-art sequential architectures in common settings. Results show that sequence modeling still has a lot of room for improvement for majority of the more complex datasets. Further research and insights are required to get consistent performance not just in average measures but additionally over all the prefixes.
【6】 Joint Gaussian Graphical Model Estimation: A Survey 标题:联合高斯图形模型估计:综述 链接:https://arxiv.org/abs/2110.10281
作者:Katherine Tsai,Oluwasanmi Koyejo,Mladen Kolar 机构:Department of Electrical and Computer Engineering, University of Illinois at, Department of Computer Science, University of Illinois at Urbana-Champaign, Booth School of Business, The University of Chicago 摘要:复杂系统中的图形通常跨域共享部分底层结构,同时保留单个特征。因此,例如,当应用于科学发现或临床诊断时,识别常见结构可以揭示潜在信号。此外,越来越多的证据表明,跨域的共享结构提高了图的估计能力,特别是对于高维数据。然而,建立一个联合估计器来提取公共结构可能比看起来更复杂,这通常是由于数据源之间的异构性。这篇手稿调查了联合高斯图形模型统计推断的最新工作,确定了适合各种数据生成过程的模型结构。在不同的数据生成过程下进行了仿真,并对模型的选择进行了详细讨论。 摘要:Graphs from complex systems often share a partial underlying structure across domains while retaining individual features. Thus, identifying common structures can shed light on the underlying signal, for instance, when applied to scientific discoveries or clinical diagnoses. Furthermore, growing evidence shows that the shared structure across domains boosts the estimation power of graphs, particularly for high-dimensional data. However, building a joint estimator to extract the common structure may be more complicated than it seems, most often due to data heterogeneity across sources. This manuscript surveys recent work on statistical inference of joint Gaussian graphical models, identifying model structures that fit various data generation processes. Simulations under different data generation processes are implemented with detailed discussions on the choice of models.
其他神经网络|深度学习|模型|建模(37篇)
【1】 RoQNN: Noise-Aware Training for Robust Quantum Neural Networks 标题:RoQNN:鲁棒量子神经网络的噪声感知训练 链接:https://arxiv.org/abs/2110.11331
作者:Hanrui Wang,Jiaqi Gu,Yongshan Ding,Zirui Li,Frederic T. Chong,David Z. Pan,Song Han 机构:EECS, Massachusetts Institute of Technology, ECE, University of Texas at Austin, CS, Yale University, CS, Shanghai Jiao Tong University, CS, University of Chicago 备注:19 pages, 10 figures, open-source at this https URL 摘要:量子神经网络(QNN)是在近期量子硬件上实现量子优势的一种很有前途的应用。然而,由于量子噪声(误差)较大,QNN模型在实际量子器件上的性能严重下降。例如,对于MNIST-4分类,IBMQ-Yorktown上无噪声模拟和噪声结果之间的精度差距超过60%。现有的噪声抑制方法是通用的,没有利用QNN的独特特性,且只适用于推理;另一方面,现有的QNN工作不考虑噪声效应。为此,我们提出了RoQNN,一个QNN专用的框架,用于在训练和推理阶段执行噪声感知优化,以提高鲁棒性。我们通过分析推导和实验观察到,量子噪声对QNN测量结果的影响是无噪声结果的线性映射,具有缩放和平移因子。基于此,我们提出了测量后归一化来缓解无噪声和噪声场景之间的特征分布差异。此外,为了提高对噪声的鲁棒性,我们根据量子硬件的实际噪声模型,通过在QNN中插入量子误差门,提出了在训练过程中注入噪声的方法。最后,引入测量后量化,将测量结果量化为离散值,达到去噪效果。使用6台量子设备对8项分类任务进行的大量实验表明,RoQNN最多可将精度提高43%,并在实际量子计算机上实现了94%以上的2分类、80%以上的4分类和34%以上的10分类MNIST准确率。我们还开源了用于QNN构建和噪声感知训练的PyTorch库:https://github.com/mit-han-lab/pytorch-quantum 。 摘要:Quantum Neural Network (QNN) is a promising application towards quantum advantage on near-term quantum hardware. However, due to the large quantum noises (errors), the performance of QNN models has a severe degradation on real quantum devices. For example, the accuracy gap between noise-free simulation and noisy results on IBMQ-Yorktown for MNIST-4 classification is over 60%. Existing noise mitigation methods are general ones without leveraging unique characteristics of QNN and are only applicable to inference; on the other hand, existing QNN work does not consider noise effect. To this end, we present RoQNN, a QNN-specific framework to perform noise-aware optimizations in both training and inference stages to improve robustness. We analytically deduct and experimentally observe that the effect of quantum noise to QNN measurement outcome is a linear map from noise-free outcome with a scaling and a shift factor. Motivated by that, we propose post-measurement normalization to mitigate the feature distribution differences between noise-free and noisy scenarios. Furthermore, to improve the robustness against noise, we propose noise injection to the training process by inserting quantum error gates to QNN according to realistic noise models of quantum hardware. Finally, post-measurement quantization is introduced to quantize the measurement outcomes to discrete values, achieving the denoising effect. Extensive experiments on 8 classification tasks using 6 quantum devices demonstrate that RoQNN improves accuracy by up to 43%, and achieves over 94% 2-class, 80% 4-class, and 34% 10-class MNIST classification accuracy measured on real quantum computers. We also open-source our PyTorch library for construction and noise-aware training of QNN at https://github.com/mit-han-lab/pytorch-quantum .
【2】 CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP 标题:CLOOB:带InfoLOOB的现代Hopfield网络优于CLIP 链接:https://arxiv.org/abs/2110.11316
作者:Andreas Fürst,Elisabeth Rumetshofer,Viet Tran,Hubert Ramsauer,Fei Tang,Johannes Lehner,David Kreil,Michael Kopp,Günter Klambauer,Angela Bitto-Nemling,Sepp Hochreiter 机构:ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria, Institute of Advanced Research in Artificial Intelligence (IARAI), HERE Technologies 备注:14 pages (+appendix); Blog: this https URL GitHub: this https URL 摘要:在各种自我监督学习任务中,以InfoNCE为目标的对比学习非常成功。最近,当使用InfoNCE从自然语言监督中学习视觉表征时,CLIP模型在Zero-Shot迁移学习方面取得了令人印象深刻的结果。然而,InfoNCE作为互信息的下界,已被证明在高互信息的情况下表现不佳。相比之下,InfoLOOB上界(leave one out bound,留一界)适用于高互信息,但存在较大的方差和不稳定性。我们介绍了"对比留一提升"(Contrastive Leave One Out Boost, CLOOB),其中现代Hopfield网络通过InfoLOOB目标提升学习。现代Hopfield网络用检索到的嵌入替换InfoLOOB目标中的原始嵌入。检索到的嵌入为InfoLOOB带来了两个优势。首先,检索到的嵌入使InfoLOOB更加稳定,因为它们比原始嵌入噪声更小,彼此更相似。其次,由于嵌入的协方差结构通过检索得到加强,它们因相关性而得到丰富。我们比较了CLOOB和CLIP在Conceptual Captions和YFCC数据集上学习后,在其他数据集上的Zero-Shot迁移学习性能。在所有考虑的体系结构和数据集上,CLOOB的Zero-Shot迁移学习始终优于CLIP。 摘要:Contrastive learning with the InfoNCE objective is exceptionally successful in various self-supervised learning tasks. Recently, the CLIP model yielded impressive results on zero-shot transfer learning when using InfoNCE for learning visual representations from natural language supervision. However, InfoNCE as a lower bound on the mutual information has been shown to perform poorly for high mutual information. In contrast, the InfoLOOB upper bound (leave one out bound) works well for high mutual information but suffers from large variance and instabilities. We introduce "Contrastive Leave One Out Boost" (CLOOB), where modern Hopfield networks boost learning with the InfoLOOB objective. Modern Hopfield networks replace the original embeddings by retrieved embeddings in the InfoLOOB objective. The retrieved embeddings give InfoLOOB two assets. Firstly, the retrieved embeddings stabilize InfoLOOB, since they are less noisy and more similar to one another than the original embeddings. Secondly, they are enriched by correlations, since the covariance structure of embeddings is reinforced through retrievals. We compare CLOOB to CLIP after learning on the Conceptual Captions and the YFCC dataset with respect to their zero-shot transfer learning performance on other datasets. CLOOB consistently outperforms CLIP at zero-shot transfer learning across all considered architectures and datasets.
【3】 Center Loss Regularization for Continual Learning 标题:中心损失正则化在持续学习中的应用 链接:https://arxiv.org/abs/2110.11314
作者:Kaustubh Olpadkar,Ekta Gavas 机构:Stony Brook University, NY, USA, IIIT Hyderabad, India 备注:16 pages, 9 figures, Submitted to the ICLR 2022 conference 摘要:顺序学习不同任务的能力对人工智能的发展至关重要。一般来说,神经网络缺乏这种能力,主要障碍是灾难性遗忘。当从非平稳数据分布中不断获取增量可用信息时,就会发生这种情况,破坏了模型已经学到的知识。我们的方法在保持决策边界不变的情况下,通过将新任务的表示投影到接近旧任务的表示来记住旧任务。我们采用中心损失作为正则化惩罚,强制新任务的特征具有与旧任务相同的类中心,并使特征具有高度的区分性。这反过来又导致对已学习信息的遗忘最少。这种方法易于实现,需要最小的计算和内存开销,并允许神经网络在许多连续遇到的任务中保持高性能。我们还证明了将中心丢失与内存重放结合使用优于其他基于重放的策略。除了用于持续学习的标准MNIST变体外,我们还将我们的方法应用于具有数字和PACS数据集的持续域适应场景。我们证明了我们的方法是可扩展的、有效的,并且与最先进的持续学习方法相比具有竞争力。 摘要:The ability to learn different tasks sequentially is essential to the development of artificial intelligence. In general, neural networks lack this capability, the major obstacle being catastrophic forgetting. It occurs when the incrementally available information from non-stationary data distributions is continually acquired, disrupting what the model has already learned. Our approach remembers old tasks by projecting the representations of new tasks close to that of old tasks while keeping the decision boundaries unchanged. We employ the center loss as a regularization penalty that enforces new tasks' features to have the same class centers as old tasks and makes the features highly discriminative. This, in turn, leads to the least forgetting of already learned information. This method is easy to implement, requires minimal computational and memory overhead, and allows the neural network to maintain high performance across many sequentially encountered tasks. We also demonstrate that using the center loss in conjunction with the memory replay outperforms other replay-based strategies. Along with standard MNIST variants for continual learning, we apply our method to continual domain adaptation scenarios with the Digits and PACS datasets. We demonstrate that our approach is scalable, effective, and gives competitive performance compared to state-of-the-art continual learning methods.
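A compact PyTorch sketch of the center-loss penalty the abstract repurposes: one learnable feature center per class, with features pulled toward their class centers so that new-task representations stay close to old-task ones. The feature dimension and weighting are assumptions, and the continual-learning/replay loop is omitted.

import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    def __init__(self, num_classes, feat_dim):
        super().__init__()
        # One learnable center per class; reusing the same centers across tasks is what
        # projects new tasks' features close to the old tasks' representations.
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features, labels):
        return ((features - self.centers[labels]) ** 2).sum(dim=1).mean()

# Usage: total loss = cross-entropy + lambda * center loss.
feats, labels = torch.randn(32, 64), torch.randint(0, 10, (32,))
logits = torch.randn(32, 10)                     # stand-in classifier outputs
loss = nn.CrossEntropyLoss()(logits, labels) + 0.1 * CenterLoss(10, 64)(feats, labels)
print(loss.item())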
【4】 Fast Model Editing at Scale 标题:大规模快速模型编辑 链接:https://arxiv.org/abs/2110.11309
作者:Eric Mitchell,Charles Lin,Antoine Bosselut,Chelsea Finn,Christopher D. Manning 机构:Stanford University 备注:View implementation and additional project info at this https URL 摘要:虽然大型预先训练的模型在各种下游任务上取得了令人印象深刻的结果,但现有最大的模型仍然会出错,甚至准确的预测也可能随着时间的推移而过时。由于在训练时检测所有此类故障是不可能的,因此使此类模型的开发人员和最终用户能够纠正不准确的输出,同时保持模型的完整性是可取的。然而,大型神经网络学习的表示的分布式、黑盒性质使得生成此类目标编辑变得困难。如果只有一个有问题的输入和新的期望输出,微调方法往往会过度拟合;当应用于非常大的模型时,其他编辑算法要么在计算上不可行,要么根本无效。为了方便大规模的事后编辑,我们提出了带有梯度分解(MEND)的模型编辑器网络,这是一个小型辅助编辑网络的集合,使用单个期望的输入输出对对对预先训练的模型进行快速的局部编辑。MEND学习变换通过标准微调获得的梯度,使用梯度的低秩分解使该变换的参数化易于处理。MEND可以在不到一天的时间内在单个GPU上训练,即使是100亿以上的参数模型;经过训练后,MEND可以快速将新编辑应用于预训练的模型。我们对T5、GPT、BERT和BART模型的实验表明,MEND是模型编辑的唯一方法,可以对具有数千万到100亿个参数的模型进行有效编辑。可在https://sites.google.com/view/mend-editing. 摘要:While large pre-trained models have enabled impressive results on a variety of downstream tasks, the largest existing models still make errors, and even accurate predictions may become outdated over time. Because detecting all such failures at training time is impossible, enabling both developers and end users of such models to correct inaccurate outputs while leaving the model otherwise intact is desirable. However, the distributed, black-box nature of the representations learned by large neural networks makes producing such targeted edits difficult. If presented with only a single problematic input and new desired output, fine-tuning approaches tend to overfit; other editing algorithms are either computationally infeasible or simply ineffective when applied to very large models. To enable easy post-hoc editing at scale, we propose Model Editor Networks with Gradient Decomposition (MEND), a collection of small auxiliary editing networks that use a single desired input-output pair to make fast, local edits to a pre-trained model. MEND learns to transform the gradient obtained by standard fine-tuning, using a low-rank decomposition of the gradient to make the parameterization of this transformation tractable. MEND can be trained on a single GPU in less than a day even for 10 billion parameter models; once trained MEND enables rapid application of new edits to the pre-trained model. Our experiments with T5, GPT, BERT, and BART models show that MEND is the only approach to model editing that produces effective edits for models with tens of millions to over 10 billion parameters. Implementation available at https://sites.google.com/view/mend-editing.
【5】 OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis 标题:OpenABC-D:用于机器学习指导的集成电路综合的大规模数据集 链接:https://arxiv.org/abs/2110.11292
作者:Animesh Basak Chowdhury,Benjamin Tan,Ramesh Karri,Siddharth Garg 机构:New York University 备注:18 pages 摘要:在集成电路(IC)设计中,逻辑综合是一个具有挑战性且被广泛研究的组合优化问题。它将用Verilog等编程语言编写的硬件高级描述转换为优化的数字电路网表,即实现该功能的互连布尔逻辑门网络。由于ML在解决其他领域的组合和图问题方面的成功,人们对ML引导的逻辑综合工具的设计越来越感兴趣。然而,目前还没有为这个问题领域定义标准数据集或原型学习任务。在这里,我们描述了OpenABC-D,这是一个大规模的带标签数据集,通过使用领先的开源逻辑综合工具综合开源设计而产生,并说明了它在开发、评估和基准测试ML引导的逻辑综合中的使用。OpenABC-D包含中间和最终输出,其形式为由1500次综合运行生成的870,000个与-反相器图(And-Inverter Graph, AIG),以及优化后节点数和延迟等标签。我们在此数据集上定义了一个通用的学习问题,并对现有的解决方案进行了基准测试。与数据集创建和基准模型相关的代码可在 https://github.com/NYU-MLDA/OpenABC.git 获得。生成的数据集可在 https://archive.nyu.edu/handle/2451/63311 获得。 摘要:Logic synthesis is a challenging and widely-researched combinatorial optimization problem during integrated circuit (IC) design. It transforms a high-level description of hardware in a programming language like Verilog into an optimized digital circuit netlist, a network of interconnected Boolean logic gates, that implements the function. Spurred by the success of ML in solving combinatorial and graph problems in other domains, there is growing interest in the design of ML-guided logic synthesis tools. Yet, there are no standard datasets or prototypical learning tasks defined for this problem domain. Here, we describe OpenABC-D, a large-scale, labeled dataset produced by synthesizing open source designs with a leading open-source logic synthesis tool and illustrate its use in developing, evaluating and benchmarking ML-guided logic synthesis. OpenABC-D has intermediate and final outputs in the form of 870,000 And-Inverter-Graphs (AIGs) produced from 1500 synthesis runs plus labels such as the optimized node counts, and delay. We define a generic learning problem on this dataset and benchmark existing solutions for it. The codes related to dataset creation and benchmark models are available at https://github.com/NYU-MLDA/OpenABC.git. The dataset generated is available at https://archive.nyu.edu/handle/2451/63311
【6】 Modeling the AC Power Flow Equations with Optimally Compact Neural Networks: Application to Unit Commitment 标题:基于最优紧凑神经网络的交流潮流方程建模及其在机组组合中的应用 链接:https://arxiv.org/abs/2110.11269
作者:Alyssa Kody,Samuel Chevalier,Spyros Chatzivasileiadis,Daniel Molzahn 机构: Department of Electrical Engineering, TechnicalUniversity of Denmark (DTU), Daniel Molzahn is with the School of Electrical and Computer Engineering, Georgia Institute of Technology 备注:first two authors equally contributed, 8 pages, 3 figures, 1 table 摘要:非线性潮流约束使得各种电力系统优化问题难以计算。然而,新的研究表明,使用神经网络(NNs)可以成功地建模非线性交流潮流方程。这些神经网络可以精确地转化为混合整数线性规划(MILP)并嵌入到具有挑战性的优化问题中,从而用易于处理的分段线性近似代替许多应用中难以处理的非线性。然而,这种方法面临着表示NN所需的二进制变量数量激增的问题。因此,本文发展了一种训练“最优紧凑”神经网络的技术,即在保持可处理的二进制变量数量的同时,能够以足够高的精度表示潮流方程的神经网络。我们表明,当嵌入具有挑战性的优化问题(即交流机组组合问题)中时,所得到的神经网络模型比直流和线性化潮流近似更具表达力。 摘要:Nonlinear power flow constraints render a variety of power system optimization problems computationally intractable. Emerging research shows, however, that the nonlinear AC power flow equations can be successfully modeled using Neural Networks (NNs). These NNs can be exactly transformed into Mixed Integer Linear Programs (MILPs) and embedded inside challenging optimization problems, thus replacing nonlinearities that are intractable for many applications with tractable piecewise linear approximations. Such approaches, though, suffer from an explosion of the number of binary variables needed to represent the NN. Accordingly, this paper develops a technique for training an "optimally compact" NN, i.e., one that can represent the power flow equations with a sufficiently high degree of accuracy while still maintaining a tractable number of binary variables. We show that the resulting NN model is more expressive than both the DC and linearized power flow approximations when embedded inside of a challenging optimization problem (i.e., the AC unit commitment problem).
【7】 Learning to Recommend Using Non-Uniform Data 标题:利用非均匀数据学习推荐 链接:https://arxiv.org/abs/2110.11248
作者:Wanning Chen,Mohsen Bayati 机构:Graduate School of Business, Stanford University 摘要:根据用户过去的购买或评论了解用户对产品的偏好是现代推荐引擎的基石。这项学习任务的一个复杂之处是,一些用户更有可能购买或评论产品,而一些产品更有可能被用户购买或评论。这种非均匀模式降低了许多现有推荐算法的性能,因为它们假设观测数据在用户-产品对之间均匀随机采样。此外,关于非均匀性建模的现有文献要么假设用户兴趣独立于产品,要么缺乏理论理解。在本文中,我们首先将用户-产品偏好建模为具有非均匀观测模式的部分观测矩阵。接下来,在低秩矩阵估计文献的基础上,我们引入了一种新的加权迹范数惩罚回归来预测矩阵的未观测值。然后,我们证明了我们提出的方法的预测误差的上界。我们的上界是特定权重矩阵的多个参数的函数,该权重矩阵取决于用户和产品的联合分布。利用这一观察结果,我们引入了一个新的优化问题来选择使预测误差上界最小化的权重矩阵。最终产物是一种新的估计器NU-Recommend,它在合成数据集和真实数据集上都优于现有方法。 摘要:Learning user preferences for products based on their past purchases or reviews is at the cornerstone of modern recommendation engines. One complication in this learning task is that some users are more likely to purchase products or review them, and some products are more likely to be purchased or reviewed by the users. This non-uniform pattern degrades the power of many existing recommendation algorithms, as they assume that the observed data is sampled uniformly at random among user-product pairs. In addition, existing literature on modeling non-uniformity either assume user interests are independent of the products, or lack theoretical understanding. In this paper, we first model the user-product preferences as a partially observed matrix with non-uniform observation pattern. Next, building on the literature about low-rank matrix estimation, we introduce a new weighted trace-norm penalized regression to predict unobserved values of the matrix. We then prove an upper bound for the prediction error of our proposed approach. Our upper bound is a function of a number of parameters that are based on a certain weight matrix that depends on the joint distribution of users and products. Utilizing this observation, we introduce a new optimization problem to select a weight matrix that minimizes the upper bound on the prediction error. The final product is a new estimator, NU-Recommend, that outperforms existing methods in both synthetic and real datasets.
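A minimal numpy sketch of trace-norm penalized matrix completion via proximal gradient descent (singular-value soft-thresholding). NU-Recommend's actual contribution, choosing the weight matrix that minimizes the error upper bound, is simplified away here to a uniform weight; data, rank, and penalty are toy assumptions.

import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40))   # rank-3 ground-truth preferences
mask = rng.random(M.shape) < 0.3                          # observation pattern (non-uniform in the paper)
Y = np.where(mask, M, 0.0)

X, lam, eta = np.zeros_like(M), 1.0, 1.0
for _ in range(300):
    G = mask * (X - Y)                                    # gradient of 0.5 * ||P_Omega(X - M)||^2
    U, s, Vt = np.linalg.svd(X - eta * G, full_matrices=False)
    X = U @ np.diag(np.maximum(s - eta * lam, 0.0)) @ Vt  # prox step: shrink singular values
print(np.abs(X - M)[~mask].mean())                        # error on the unobserved entries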
【8】 User-Level Private Learning via Correlated Sampling 标题:基于相关抽样的用户级私人学习 链接:https://arxiv.org/abs/2110.11208
作者:Badih Ghazi,Ravi Kumar,Pasin Manurangsi 机构:Google, Mountain View, CA. 备注:To appear in NeurIPS 2021 摘要:大多数关于差分隐私学习(DP)的工作都集中在每个用户只有一个样本的设置上。在这项工作中,我们考虑每个用户持有$m$个样本的设置,并在每个用户数据的级别上实施隐私保护。我们表明,在这种情况下,我们可以用少得多的用户进行学习。具体地说,我们表明,只要每个用户接收到足够多的样本,我们就可以通过$(\epsilon, \delta)$-DP算法,仅使用$O(\log(1/\delta)/\epsilon)$个用户学习任何私有可学习的类。对于$\epsilon$-DP算法,我们表明,即使在局部模型中,我们也只需使用$O_{\epsilon}(d)$个用户即可学习,其中$d$是概率表示维数。在这两种情况下,我们在所需的用户数量上给出了几乎匹配的下界。我们结果的一个关键部分是对全局稳定性[Bun等人,FOCS 2020]的推广,允许使用公共随机性。在这个宽松的概念下,我们采用了相关抽样策略来证明,以样本数量上的多项式开销为代价,全局稳定性可以被提升到任意接近1的水平。 摘要:Most works in learning with differential privacy (DP) have focused on the setting where each user has a single sample. In this work, we consider the setting where each user holds $m$ samples and the privacy protection is enforced at the level of each user's data. We show that, in this setting, we may learn with far fewer users. Specifically, we show that, as long as each user receives sufficiently many samples, we can learn any privately learnable class via an $(\epsilon, \delta)$-DP algorithm using only $O(\log(1/\delta)/\epsilon)$ users. For $\epsilon$-DP algorithms, we show that we can learn using only $O_{\epsilon}(d)$ users even in the local model, where $d$ is the probabilistic representation dimension. In both cases, we show a nearly-matching lower bound on the number of users required. A crucial component of our results is a generalization of global stability [Bun et al., FOCS 2020] that allows the use of public randomness. Under this relaxed notion, we employ a correlated sampling strategy to show that the global stability can be boosted to be arbitrarily close to one, at a polynomial expense in the number of samples.
【9】 RoMA: a Method for Neural Network Robustness Measurement and Assessment 标题:ROMA:一种神经网络健壮性度量与评估方法 链接:https://arxiv.org/abs/2110.11088
作者:Natan Levy,Guy Katz 机构:School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel 摘要:神经网络模型已经成为许多任务的主要解决方案,如分类、语言处理、蛋白质折叠等。然而,它们的可靠性受到敌对输入的严重影响:小的输入扰动会导致模型产生错误的输出。当系统的环境表现为随机行为时,即使没有恶意对手,也会自然产生对抗性输入,并且当试图在关键系统中部署神经网络时,这是一个严重的问题。在本文中,我们提出了一种新的统计方法,称为稳健性测量和评估(RoMA),它可以测量神经网络模型的预期稳健性。具体而言,RoMA确定随机输入扰动可能导致误分类的概率。该方法允许我们提供正式的保证,以确保经过训练的模型在部署后会遇到预期的错误频率。我们的方法可以应用于大规模的黑盒神经网络,这与最近提出的验证方法相比是一个显著的优势。我们以两种方式应用我们的方法:比较不同模型的鲁棒性,以及测量输入扰动的大小如何影响模型的鲁棒性。通过这项工作获得的一个有趣的见解是,在分类网络中,不同的输出标签可以表现出非常不同的鲁棒性级别。我们将这种现象称为范畴鲁棒性。我们在分类基础上进行风险和稳健性评估的能力为风险缓解打开了大门,这可能证明是在安全关键应用中实现神经网络认证的重要一步。 摘要:Neural network models have become the leading solution for a large variety of tasks, such as classification, language processing, protein folding, and others. However, their reliability is heavily plagued by adversarial inputs: small input perturbations that cause the model to produce erroneous outputs. Adversarial inputs can occur naturally when the system's environment behaves randomly, even in the absence of a malicious adversary, and are a severe cause for concern when attempting to deploy neural networks within critical systems. In this paper, we present a new statistical method, called Robustness Measurement and Assessment (RoMA), which can measure the expected robustness of a neural network model. Specifically, RoMA determines the probability that a random input perturbation might cause misclassification. The method allows us to provide formal guarantees regarding the expected frequency of errors that a trained model will encounter after deployment. Our approach can be applied to large-scale, black-box neural networks, which is a significant advantage compared to recently proposed verification methods. We apply our approach in two ways: comparing the robustness of different models, and measuring how a model's robustness is affected by the magnitude of input perturbation. One interesting insight obtained through this work is that, in a classification network, different output labels can exhibit very different robustness levels. We term this phenomenon categorial robustness. Our ability to perform risk and robustness assessments on a categorial basis opens the door to risk mitigation, which may prove to be a significant step towards neural network certification in safety-critical applications.
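A model-agnostic sketch of the statistical measurement described: estimate, by plain Monte Carlo, the probability that a random input perturbation flips a black-box classifier's prediction. The perturbation model and placeholder classifier are assumptions; RoMA itself additionally fits a distribution to perturbed confidence scores, which is not reproduced here.

import numpy as np

def misclassification_prob(predict, x, n=1000, eps=0.1, seed=0):
    # P(predicted label changes under a random perturbation), model treated as a black box.
    rng = np.random.default_rng(seed)
    y0 = predict(x[None, :])[0]
    xs = x[None, :] + rng.uniform(-eps, eps, size=(n, x.size))   # assumed perturbation model
    return float(np.mean(predict(xs) != y0))

w = np.array([1.0, -1.0])                         # placeholder "model": linear classifier in 2-D
predict = lambda X: (X @ w > 0).astype(int)
print(misclassification_prob(predict, np.array([0.05, 0.0])))   # near the boundary: high risk
print(misclassification_prob(predict, np.array([2.0, 0.0])))    # far from it: ~0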
【10】 Continuous Authentication Using Mouse Movements, Machine Learning, and Minecraft 标题:使用鼠标移动、机器学习和我的世界的持续身份验证 链接:https://arxiv.org/abs/2110.11080
作者:Nyle Siddiqui,Rushit Dave,Naeem Seliya 机构:Department of Computer Science, University of Wisconsin - Eau Claire, Eau Claire, US, University of Wisconsin – Eau Claire 摘要:鼠标动力学作为一种新型的、不可复制的行为生物识别技术,已经越来越受欢迎。在当前的文献中,包含来自用户的一般无限制鼠标移动的数据集很少。2016年发布的Balabit鼠标动力学数据集是为数据科学竞赛制作的,尽管存在一些缺点,但被认为是第一个公开的鼠标动力学数据集。像Balabit那样以单调的管理方式收集鼠标移动可能会无意中使数据同质化,也不能代表现实世界的应用场景。本文介绍了一个新的鼠标动力学数据集,收集了10名用户在台式计算机上玩电子游戏Minecraft的过程。为每个用户创建二元随机森林(RF)分类器,以检测特定用户移动和冒名顶替者移动之间的差异。提出了两种评估方案来评估这些分类器的性能;一个场景在所有评估指标上都优于以前的工作,达到了92%的平均准确率,而另一个场景成功地报告了冒名顶替者虚假身份验证的减少。 摘要:Mouse dynamics has grown in popularity as a novel irreproducible behavioral biometric. Datasets which contain general unrestricted mouse movements from users are sparse in the current literature. The Balabit mouse dynamics dataset produced in 2016 was made for a data science competition and, despite some of its shortcomings, is considered to be the first publicly available mouse dynamics dataset. Collecting mouse movements in a dull administrative manner as Balabit does may unintentionally homogenize data and is also not representative of real-world application scenarios. This paper presents a novel mouse dynamics dataset that has been collected while 10 users play the video game Minecraft on a desktop computer. Binary Random Forest (RF) classifiers are created for each user to detect differences between a specific user's movements and an imposter's movements. Two evaluation scenarios are proposed to evaluate the performance of these classifiers; one scenario outperformed previous works in all evaluation metrics, reaching average accuracy rates of 92%, while the other scenario successfully reported reduced instances of false authentications of imposters.
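To make the classification setup concrete, here is a minimal per-user sketch in the spirit of the paper: one binary Random Forest separating a user's mouse-movement feature vectors from imposter samples. Feature extraction is assumed to happen upstream; all names and hyperparameters are illustrative assumptions, not the authors' code.

```python
# Hedged sketch: one binary RF per user (user = 1, imposter = 0),
# trained on pre-extracted mouse-dynamics feature vectors.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def train_user_classifier(user_features, imposter_features, seed=0):
    X = np.vstack([user_features, imposter_features])
    y = np.concatenate([np.ones(len(user_features)), np.zeros(len(imposter_features))])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    clf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X_tr, y_tr)
    return clf, clf.score(X_te, y_te)   # classifier and held-out accuracy
```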
【11】 Interpretable Machine Learning for Resource Allocation with Application to Ventilator Triage 标题:资源分配的可解释机器学习及其在呼吸机分诊中的应用 链接:https://arxiv.org/abs/2110.10994
作者:Julien Grand-Clément,Carri Chan,Vineet Goyal,Elizabeth Chuang 机构:IEOR Department, Columbia University, Albert Einstein College of Medicine 摘要:在大流行、自然灾害或大规模伤亡事件期间,政策制定者和提供者可能被迫做出医疗资源配给这一具有挑战性的决定。对稀缺救生资源进行分诊的明确指导方针必须旨在促进透明度、信任和一致性。为了在高压力情况下获得认同并投入使用,这些指南需要具有可解释性和可操作性。我们提出了一种新的数据驱动模型来计算可解释的分诊准则,其基于可表示为简单决策树序列("树策略")的马尔可夫决策过程策略。特别地,我们刻画了最优树策略的性质,并提出了一种基于动态规划递归的算法来计算好的树策略。基于Montefiore医院的真实患者数据,我们利用该方法为新冠病毒-19患者获得简单、新颖的呼吸机分配分诊指南。我们还将我们的指南与2015年制定的纽约州官方指南(早在新冠病毒-19大流行之前)的绩效进行了比较。我们的实证研究表明,使用我们的策略,与呼吸机短缺相关的超额死亡人数可以显著减少。我们的工作突出了现有官方分诊指南的局限性,在成功部署之前,需要专门针对新冠病毒-19进行调整。 摘要:Rationing of healthcare resources is a challenging decision that policy makers and providers may be forced to make during a pandemic, natural disaster, or mass casualty event. Well-defined guidelines to triage scarce life-saving resources must be designed to promote transparency, trust, and consistency. To facilitate buy-in and use during high-stress situations, these guidelines need to be interpretable and operational. We propose a novel data-driven model to compute interpretable triage guidelines based on policies for Markov Decision Processes that can be represented as simple sequences of decision trees ("tree policies"). In particular, we characterize the properties of optimal tree policies and present an algorithm based on dynamic programming recursions to compute good tree policies. We utilize this methodology to obtain simple, novel triage guidelines for ventilator allocations for COVID-19 patients, based on real patient data from Montefiore hospitals. We also compare the performance of our guidelines to the official New York State guidelines that were developed in 2015 (well before the COVID-19 pandemic). Our empirical study shows that the number of excess deaths associated with ventilator shortages could be reduced significantly using our policy. Our work highlights the limitations of the existing official triage guidelines, which need to be adapted specifically to COVID-19 before being successfully deployed.
【12】 Learning OFDM Waveforms with PAPR and ACLR Constraints 标题:具有PAPR和ACLR约束的OFDM波形学习 链接:https://arxiv.org/abs/2110.10987
作者:Mathieu Goutay,Fayçal Ait Aoudia,Jakob Hoydis,Jean-Marie Gorce 机构:Senior Member, IEEE 摘要:未来通信系统的一个有吸引力的研究方向是设计既能支持高吞吐量又能呈现有利信号特征的新波形。尽管大多数现代系统使用正交频分复用(OFDM)进行有效均衡,但该波形受到多个限制,例如高相邻信道泄漏比(ACLR)和高峰值平均功率比(PAPR)。在本文中,我们提出了一种基于学习的方法来设计基于OFDM的波形,以满足选定的约束条件,同时最大化可实现的信息速率。为此,我们将发射机和接收机建模为卷积神经网络(CNN),分别实现高维调制方案并执行发射比特的检测。这将导致一个使用增广拉格朗日方法求解的优化问题。评估结果表明,与音调保留(TR)基线相比,端到端系统能够满足目标PAPR和ACLR约束,并带来显著的吞吐量增益。另一个优点是不需要专用导频。 摘要:An attractive research direction for future communication systems is the design of new waveforms that can both support high throughputs and present advantageous signal characteristics. Although most modern systems use orthogonal frequency-division multiplexing (OFDM) for its efficient equalization, this waveform suffers from multiple limitations such as a high adjacent channel leakage ratio (ACLR) and high peak-to-average power ratio (PAPR). In this paper, we propose a learning-based method to design OFDM-based waveforms that satisfy selected constraints while maximizing an achievable information rate. To that aim, we model the transmitter and the receiver as convolutional neural networks (CNNs) that respectively implement a high-dimensional modulation scheme and perform the detection of the transmitted bits. This leads to an optimization problem that is solved using the augmented Lagrangian method. Evaluation results show that the end-to-end system is able to satisfy target PAPR and ACLR constraints and allows significant throughput gains compared to a tone reservation (TR) baseline. An additional advantage is that no dedicated pilots are needed.
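The constrained training described above can be illustrated with a generic augmented-Lagrangian step: maximize an achievable-rate objective subject to PAPR and ACLR targets. The sketch below is the standard augmented-Lagrangian formulation for inequality constraints, assuming differentiable estimates `rate`, `papr`, and `aclr` computed from the end-to-end system; it is not the paper's implementation.

```python
# Hedged sketch of an augmented-Lagrangian inner loss for the constraints
# g(x) <= 0 with g = (PAPR - PAPR_max, ACLR - ACLR_max). Names are assumptions.
import torch

def augmented_lagrangian_loss(rate, papr, aclr, lam, mu, papr_max, aclr_max):
    g = torch.stack([papr - papr_max, aclr - aclr_max])   # constraint violations
    penalty = torch.clamp(lam + mu * g, min=0.0)
    # classic AL term for inequality constraints: (1/2mu) * (max(0, lam+mu g)^2 - lam^2)
    return -rate + (penalty.pow(2) - lam.pow(2)).sum() / (2 * mu)

# After each inner optimization phase, the multipliers are typically updated as
# lam <- max(0, lam + mu * g), and mu is increased if constraints stay violated.
```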
【13】 A channel attention based MLP-Mixer network for motor imagery decoding with EEG 标题:一种基于通道注意力的运动想象脑电解码MLP-Mixer网络 链接:https://arxiv.org/abs/2110.10939
作者:Yanbin He,Zhiyang Lu,Jun Wang,Jun Shi 机构:School of Communication and Information Engineering, Shanghai University, Shanghai, China. 摘要:卷积神经网络(CNN)及其变体已成功应用于基于脑电图(EEG)的运动想象(MI)解码任务。然而,这些基于CNN的算法通常在感知EEG信号的全局时间依赖性方面存在局限性。此外,他们还忽略了不同脑电通道对分类任务的不同贡献。为了解决这些问题,提出了一种新的基于通道注意的MLP混合网络(CAMLP-Net),用于基于EEG的MI解码。具体而言,该网络采用基于MLP的体系结构来捕获时间和空间信息。注意机制被进一步嵌入到MLP混合器中,以自适应地利用不同EEG通道的重要性。因此,所提出的CAMLP网络可以有效地学习更多的全球时空信息。在新建的MI-2数据集上的实验结果表明,我们提出的CAMLP网络比所有比较算法都具有更好的分类性能。 摘要:Convolutional neural networks (CNNs) and their variants have been successfully applied to the electroencephalogram (EEG) based motor imagery (MI) decoding task. However, these CNN-based algorithms generally have limitations in perceiving global temporal dependencies of EEG signals. Besides, they also ignore the diverse contributions of different EEG channels to the classification task. To address such issues, a novel channel attention based MLP-Mixer network (CAMLP-Net) is proposed for EEG-based MI decoding. Specifically, the MLP-based architecture is applied in this network to capture the temporal and spatial information. The attention mechanism is further embedded into MLP-Mixer to adaptively exploit the importance of different EEG channels. Therefore, the proposed CAMLP-Net can effectively learn more global temporal and spatial information. The experimental results on the newly built MI-2 dataset indicate that our proposed CAMLP-Net achieves superior classification performance over all the compared algorithms.
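A minimal sketch of what a channel-attention block for EEG could look like is given below, in squeeze-and-excitation style: per-channel weights produced by a small MLP over time-averaged statistics. CAMLP-Net's exact block is not reproduced here, so treat the architecture and shapes as assumptions.

```python
# Hedged sketch of a channel-attention block that adaptively reweights EEG
# channels before a Mixer-style backbone. Illustrative, not the paper's code.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, n_channels, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(n_channels, n_channels // reduction), nn.ReLU(),
            nn.Linear(n_channels // reduction, n_channels), nn.Sigmoid())

    def forward(self, x):                 # x: (batch, channels, time)
        w = self.mlp(x.mean(dim=-1))      # squeeze over time -> one weight per channel
        return x * w.unsqueeze(-1)        # reweight channels adaptively
```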
【14】 Can Q-learning solve Multi Armed Bantids? 标题:Q学习能解决多臂老虎机问题吗? 链接:https://arxiv.org/abs/2110.10934
作者:Refael Vivanti 备注:arXiv admin note: text overlap with arXiv:1905.10144 摘要:当强化学习(RL)方法必须仅通过查看收到的奖励来在多个可选策略之间做出决定时,它必须隐式地优化一个多臂老虎机(MAB)问题。这就提出了一个问题:当前的RL算法是否有能力解决MAB问题?我们声称,令人惊讶的答案是否定的。在我们的实验中,我们表明,在某些情况下,它们无法解决基本的MAB问题,而在许多常见情况下,它们也表现吃力:训练过程中结果出现回退、对初始化敏感、样本复杂度高。我们认为,这源于不同策略之间的奖励方差差异,这导致了两个问题:第一个问题是"无聊策略陷阱",每个策略的隐式探索程度取决于其奖励方差,而离开一个无聊的(即低方差的)策略的可能性较小,因为其隐式探索较低。第二个问题是"操纵性顾问"问题,即DQN或深度Actor-Critic等深度RL算法中使用的价值估计函数最大化的是估计精度而非平均奖励,并且在低方差策略上损失更小,这导致网络收敛到次优策略。对人类的认知实验表明,带噪声的奖励信号可能会矛盾地提高表现。我们用前面提到的问题来解释这一点,认为人类和算法在决策过程中可能面临相似的挑战。受这一结果的启发,我们提出了自适应对称奖励加噪(ASRN)方法,即在不同策略之间均衡奖励方差,从而在不影响环境平均奖励行为的情况下避免这两个问题。我们证明了ASRN方案可以显著改善结果。 摘要:When a reinforcement learning (RL) method has to decide between several optional policies by solely looking at the received reward, it has to implicitly optimize a Multi-Armed-Bandit (MAB) problem. This raises the question: are current RL algorithms capable of solving MAB problems? We claim that the surprising answer is no. In our experiments we show that in some situations they fail to solve a basic MAB problem, and in many common situations they have a hard time: They suffer from regression in results during training, sensitivity to initialization and high sample complexity. We claim that this stems from variance differences between policies, which causes two problems: The first problem is the "Boring Policy Trap" where each policy has a different level of implicit exploration that depends on its reward variance, and leaving a boring, or low variance, policy is less likely due to its low implicit exploration. The second problem is the "Manipulative Consultant" problem, where value-estimation functions used in deep RL algorithms such as DQN or deep Actor Critic methods maximize estimation precision rather than mean rewards, and have a better loss in low-variance policies, which causes the network to converge to a sub-optimal policy. Cognitive experiments on humans showed that noised reward signals may paradoxically improve performance. We explain this using the aforementioned problems, claiming that both humans and algorithms may share similar challenges in decision making. Inspired by this result, we propose the Adaptive Symmetric Reward Noising (ASRN) method, by which we mean equalizing the rewards variance across different policies, thus avoiding the two problems without affecting the environment's mean rewards behavior. We demonstrate that the ASRN scheme can dramatically improve the results.
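A minimal sketch of the ASRN idea follows: add zero-mean ("symmetric") noise to rewards so that the empirical reward variance is approximately equalized across arms/policies, leaving the mean reward untouched. The running statistics and the equalize-to-the-largest-variance rule below are illustrative assumptions, not the paper's exact procedure.

```python
# Hedged sketch of adaptive symmetric reward noising for a bandit setting.
import numpy as np

class ASRN:
    def __init__(self, n_arms):
        self.rewards = [[] for _ in range(n_arms)]   # reward history per arm

    def noised_reward(self, arm, reward, rng=np.random):
        self.rewards[arm].append(reward)
        variances = [np.var(r) if len(r) > 1 else 0.0 for r in self.rewards]
        target = max(variances)                        # equalize up to the largest variance
        extra = max(target - variances[arm], 0.0)
        # zero-mean noise leaves the arm's mean reward unchanged
        return reward + rng.normal(0.0, np.sqrt(extra))
```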
【15】 Quantum field theories, Markov random fields and machine learning 标题:量子场论、马尔可夫随机场与机器学习 链接:https://arxiv.org/abs/2110.10928
作者:Dimitrios Bachtis,Gert Aarts,Biagio Lucini 机构:Department of Mathematics, Swansea University, Bay Campus, SA,EN, Swansea, Wales, Department of Physics, Swansea University, Singleton Campus, SA,PP, Swansea, Wales, European Centre for Theoretical Studies in Nuclear Physics and Related Areas (ECT) & 备注:Contribution submitted to the CCP2021: XXXII IUPAP Conference on Computational Physics, Coventry University, United Kingdom. arXiv admin note: substantial text overlap with arXiv:2109.07730 摘要:向欧几里德空间的过渡以及量子场论在空间或时空格上的离散化为从量子场论的角度研究概率机器学习提供了机会。在这里,我们将讨论如何在马尔可夫随机场的数学框架内重铸离散化的欧几里德场论,马尔可夫随机场是一类著名的概率图形模型,应用于包括机器学习在内的各种研究领域。具体地说,我们将证明方格上的$\phi^{4}$标量场理论满足Hammersley-Clifford定理,因此将其重新描述为一个马尔可夫随机场,神经网络从该随机场衍生而来。然后,我们将讨论与最小化$\phi^{4}$机器学习算法的概率分布和目标概率分布之间的不对称距离相关的应用。 摘要:The transition to Euclidean space and the discretization of quantum field theories on spatial or space-time lattices opens up the opportunity to investigate probabilistic machine learning from the perspective of quantum field theory. Here, we will discuss how discretized Euclidean field theories can be recast within the mathematical framework of Markov random fields, which is a notable class of probabilistic graphical models with applications in a variety of research areas, including machine learning. Specifically, we will demonstrate that the $\phi^{4}$ scalar field theory on a square lattice satisfies the Hammersley-Clifford theorem, therefore recasting it as a Markov random field from which neural networks are additionally derived. We will then discuss applications pertinent to the minimization of an asymmetric distance between the probability distribution of the $\phi^{4}$ machine learning algorithms and that of target probability distributions.
【16】 Finite Volume Least-Squares Neural Network (FV-LSNN) Method for Scalar Nonlinear Hyperbolic Conservation Laws 标题:标量非线性双曲守恒律的有限体积最小二乘神经网络(FV-LSNN)方法 链接:https://arxiv.org/abs/2110.10895
作者:Zhiqiang Cai,Jingshuang Chen,Min Liu 机构: University Street 备注:arXiv admin note: text overlap with arXiv:2105.11627 摘要:在[4]中,我们介绍了用于求解具有不连续解的线性平流反应问题的最小二乘ReLU神经网络(LSNN)方法,并表明LSNN方法的自由度明显少于传统的基于网格的方法。LSNN方法是一类具有ReLU激活函数的神经网络函数中等效最小二乘(LS)公式的离散化;通过数值积分和适当的数值微分对LS泛函进行了求值。通过对散度算子发展一种新的有限体积近似(FVA),研究了标量非线性双曲守恒律的LSNN方法。本文介绍的FVA是为LSNN方法量身定制的,比基于网格的数值方法中使用的传统、经过充分研究的FV格式更精确。一些凸通量和非凸通量的基准测试问题的数值结果表明,有限体积LSNN(FV-LSNN)方法能够计算稀疏波问题的物理解,并通过ReLU神经网络的自由超平面自动捕捉潜在问题的冲击。此外,该方法不存在沿不连续界面的常见吉布斯现象。 摘要:In [4], we introduced the least-squares ReLU neural network (LSNN) method for solving the linear advection-reaction problem with discontinuous solution and showed that the number of degrees of freedom for the LSNN method is significantly less than that of traditional mesh-based methods. The LSNN method is a discretization of an equivalent least-squares (LS) formulation in the class of neural network functions with the ReLU activation function; and evaluation of the LS functional is done by using numerical integration and proper numerical differentiation. By developing a novel finite volume approximation (FVA) to the divergence operator, this paper studies the LSNN method for scalar nonlinear hyperbolic conservation laws. The FVA introduced in this paper is tailored to the LSNN method and is more accurate than traditional, well-studied FV schemes used in mesh-based numerical methods. Numerical results of some benchmark test problems with both convex and non-convex fluxes show that the finite volume LSNN (FV-LSNN) method is capable of computing the physical solution for problems with rarefaction waves and capturing the shock of the underlying problem automatically through the free hyper-planes of the ReLU neural network. Moreover, the method does not exhibit the common Gibbs phenomena along the discontinuous interface.
【17】 Deep Generative Models in Engineering Design: A Review 标题:工程设计中的深度生成模型研究综述 链接:https://arxiv.org/abs/2110.10863
作者:Lyle Regenwetter,Amin Heyrani Nobari,Faez Ahmed 机构:Dept. of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 摘要:自动化设计合成有可能彻底改变现代人类设计过程,并改善无数行业中高度优化和定制产品的使用。将生成性机器学习成功地应用于设计工程,可能是此类自动化设计综合的关键,也是一个非常重要的研究课题。我们对工程设计中的深层生成学习模型进行了回顾和分析。深度生成模型(DGM)通常利用深度网络从输入数据集学习,并学习合成新设计。最近,DGM如生成对抗网络(GANs)、变分自动编码器(VAEs)、前馈神经网络(NNs)和某些深度强化学习(DRL)框架在结构优化、材料设计和形状综合等设计应用中显示出了良好的结果。自2016年以来,DGMs在工程设计中的普及率直线上升。由于预期会持续增长,我们对最新进展进行了回顾,希望能使对DGMs设计感兴趣的研究人员受益。我们对当前文献中常用的算法、数据集、表示方法和应用进行了阐述。特别是,我们讨论了在DGMs中引入新技术和方法、成功地将DGMs应用于设计相关领域或通过数据集或辅助方法直接支持DGMs开发的关键工作。我们进一步确定了DGMs目前在设计领域中遇到的主要挑战和限制,如设计创意、处理复杂约束和目标以及同时对形式和功能性能进行建模。在我们的讨论中,我们将可能的解决方案路径确定为未来工作的重点领域。 摘要:Automated design synthesis has the potential to revolutionize the modern human design process and improve access to highly optimized and customized products across countless industries. Successfully adapting generative Machine Learning to design engineering may be the key to such automated design synthesis and is a research subject of great importance. We present a review and analysis of Deep Generative Learning models in engineering design. Deep Generative Models (DGMs) typically leverage deep networks to learn from an input dataset and learn to synthesize new designs. Recently, DGMs such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), feedforward Neural Networks (NNs) and certain Deep Reinforcement Learning (DRL) frameworks have shown promising results in design applications like structural optimization, materials design, and shape synthesis. The prevalence of DGMs in Engineering Design has skyrocketed since 2016. Anticipating continued growth, we conduct a review of recent advances with the hope of benefitting researchers interested in DGMs for design. We structure our review as an exposition of the algorithms, datasets, representation methods, and applications commonly used in the current literature. In particular, we discuss key works that have introduced new techniques and methods in DGMs, successfully applied DGMs to a design-related domain, or directly supported development of DGMs through datasets or auxiliary methods. We further identify key challenges and limitations currently seen in DGMs across design fields, such as design creativity, handling complex constraints and objectives, and modeling both form and functional performance simultaneously. In our discussion we identify possible solution pathways as key areas on which to target future work.
【18】 Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization 标题:平均集成:改进模型选择并提高领域概括的性能 链接:https://arxiv.org/abs/2110.10832
作者:Devansh Arpit,Huan Wang,Yingbo Zhou,Caiming Xiong 摘要:在域泛化(DG)环境中,在给定训练域集上训练的模型在分布偏移的测试域上表现出众所周知的混乱性能,而优化中的随机性(例如随机种子)起着重要作用。这使得深度学习模型在现实环境中不可靠。我们首先展示了一个简单的协议:从训练早期开始,沿优化路径对模型参数进行平均。该协议通过改善域内验证精度和域外测试精度之间的秩相关性(这对可靠的模型选择至关重要),显著提高了域泛化能力,并减小了随机性的影响。接下来,我们证明了独立训练模型的集成在DG环境中也具有混乱的行为。利用我们的观察结果,我们表明,与其集成未平均的模型,不如集成来自不同运行的移动平均模型(EoA),这确实提高了稳定性并进一步提升了性能。在DomainBed基准测试中,当使用在ImageNet上预训练的ResNet-50时,这种平均模型的集成在PACS上达到88.6%,在VLCS上达到79.1%,在OfficeHome上达到72.5%,在TerraIncognita上达到52.3%,在DomainNet上达到47.4%,平均为68.0%,比ERM(无模型平均)高出约4%。我们还评估了一个在更大数据集上预训练的模型,其中EoA达到了72.7%的平均准确率,比相应的ERM基线高出5%。 摘要:In Domain Generalization (DG) settings, models trained on a given set of training domains have notoriously chaotic performance on distribution shifted test domains, and stochasticity in optimization (e.g. seed) plays a big role. This makes deep learning models unreliable in real world settings. We first show that a simple protocol for averaging model parameters along the optimization path, starting early during training, both significantly boosts domain generalization and diminishes the impact of stochasticity by improving the rank correlation between the in-domain validation accuracy and out-domain test accuracy, which is crucial for reliable model selection. Next, we show that an ensemble of independently trained models also has a chaotic behavior in the DG setting. Taking advantage of our observation, we show that instead of ensembling unaveraged models, ensembling moving average models (EoA) from different runs does increase stability and further boosts performance. On the DomainBed benchmark, when using a ResNet-50 pre-trained on ImageNet, this ensemble of averages achieves $88.6\%$ on PACS, $79.1\%$ on VLCS, $72.5\%$ on OfficeHome, $52.3\%$ on TerraIncognita, and $47.4\%$ on DomainNet, an average of $68.0\%$, beating ERM (w/o model averaging) by $\sim 4\%$. We also evaluate a model that is pre-trained on a larger dataset, where we show EoA achieves an average accuracy of $72.7\%$, beating its corresponding ERM baseline by $5\%$.
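The weight-averaging-then-ensembling recipe is simple enough to sketch. Below is a hedged PyTorch illustration, assuming `avg_models` are the moving-average copies kept by several independent runs; it is a sketch of the idea in the abstract, not the authors' DomainBed code.

```python
# Hedged sketch of EoA: a running average of weights within each run,
# then an ensemble of the averaged models across runs.
import torch

@torch.no_grad()
def update_weight_average(avg_model, model, n_updates):
    # running mean of parameters along the optimization path
    for p_avg, p in zip(avg_model.parameters(), model.parameters()):
        p_avg.mul_(n_updates / (n_updates + 1)).add_(p / (n_updates + 1))

@torch.no_grad()
def eoa_predict(avg_models, x):
    # average the predicted probabilities of the averaged models
    probs = torch.stack([m(x).softmax(dim=-1) for m in avg_models])
    return probs.mean(dim=0).argmax(dim=-1)
```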
【19】 Shaking the foundations: delusions in sequence models for interaction and control 标题:动摇基础:相互作用和控制的序列模型中的错觉 链接:https://arxiv.org/abs/2110.10819
作者:Pedro A. Ortega,Markus Kunesch,Grégoire Delétang,Tim Genewein,Jordi Grau-Moya,Joel Veness,Jonas Buchli,Jonas Degrave,Bilal Piot,Julien Perolat,Tom Everitt,Corentin Tallec,Emilio Parisotto,Tom Erez,Yutian Chen,Scott Reed,Marcus Hutter,Nando de Freitas,Shane Legg 机构:Deepmind Safety Analysis,DeepMind 备注:DeepMind Tech Report, 16 pages, 4 figures 摘要:最近,语言模型的巨大成功为机器学习研究注入了新的活力,Transformer等大序列模型正被应用于各种领域。然而,一个相对难以捉摸的重要问题是有目的的适应行为。目前,人们普遍认为,序列模型"缺乏对其行为因果的理解",这导致它们由于自我暗示错觉而得出错误的推论。在本报告中,我们解释了这种不匹配的起源,并表明可以通过将行为视为因果干预来解决。最后,我们证明了在监督学习中,人们可以通过分别使用事实错误信号和反事实错误信号进行训练来教导系统对数据进行条件化或干预。 摘要:The recent phenomenal success of language models has reinvigorated machine learning research, and large sequence models such as transformers are being applied to a variety of domains. One important problem class that has remained relatively elusive, however, is purposeful adaptive behavior. Currently there is a common perception that sequence models "lack the understanding of the cause and effect of their actions" leading them to draw incorrect inferences due to auto-suggestive delusions. In this report we explain where this mismatch originates, and show that it can be resolved by treating actions as causal interventions. Finally, we show that in supervised learning, one can teach a system to condition or intervene on data by training with factual and counterfactual error signals respectively.
【20】 Class Incremental Online Streaming Learning 标题:类增量在线流学习 链接:https://arxiv.org/abs/2110.10741
作者:Soumya Banerjee,Vinay Kumar Verma,Toufiq Parag,Maneesh Singh,Vinay P. Namboodiri 机构:IIT Kanpur, India, Duke University, USA, Verisk Analytics, NJ, USA, University of Bath, UK 摘要:为了在传统的深度神经网络中实现终身学习,已经开发了多种方法。然而,要想取得成功,这些方法需要一批样本,并在训练期间多次访问。虽然这在静态设置中效果很好,但在数据以在线流方式到达的更现实的情况下,这些方法仍然会受到影响。我们的经验表明,如果输入是以数据流的形式获得的,并且存在以下限制,则当前方法的性能会下降:$(i)$每个实例一次出现一个,并且只能看到一次,$(ii)$输入数据违反了i.i.d假设,即可能存在基于类的相关性。我们提出了一种新的方法(CIOSL),用于在线流设置下的类增量学习,以应对这些挑战。该方法利用隐式和显式双重权重正则化和经验重放。隐式正则化通过知识蒸馏来实现,而显式正则化通过学习缓冲区重放样本和当前样本的联合分布来实现参数正则化。此外,我们还提出了一种高效的在线内存重放和替换缓冲区策略,显著提高了模型的性能。大量实验和具有挑战性数据集上的消融实验表明了该方法的有效性。 摘要:A wide variety of methods have been developed to enable lifelong learning in conventional deep neural networks. However, to succeed, these methods require a `batch' of samples to be available and visited multiple times during training. While this works well in a static setting, these methods continue to suffer in a more realistic situation where data arrives in an \emph{online streaming manner}. We empirically demonstrate that the performance of current approaches degrades if the input is obtained as a stream of data with the following restrictions: $(i)$ each instance comes one at a time and can be seen only once, and $(ii)$ the input data violates the i.i.d assumption, i.e., there can be a class-based correlation. We propose a novel approach (CIOSL) for the class-incremental learning in an \emph{online streaming setting} to address these challenges. The proposed approach leverages implicit and explicit dual weight regularization and experience replay. The implicit regularization is leveraged via the knowledge distillation, while the explicit regularization incorporates a novel approach for parameter regularization by learning the joint distribution of the buffer replay and the current sample. Also, we propose an efficient online memory replay and replacement buffer strategy that significantly boosts the model's performance. Extensive experiments and ablation on challenging datasets show the efficacy of the proposed method.
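As an illustration of an online replay buffer with replacement, the sketch below uses reservoir sampling, which keeps a uniform random subset of the stream seen so far. The specifics of CIOSL's buffer policy are not given in the abstract, so this stands in as an assumption-labeled example.

```python
# Hedged sketch: reservoir sampling as an online replay buffer with
# replacement for a single-pass data stream. Illustrative, not CIOSL's code.
import random

class ReservoirBuffer:
    def __init__(self, capacity):
        self.capacity, self.seen, self.data = capacity, 0, []

    def add(self, sample):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(sample)
        else:
            j = random.randrange(self.seen)   # replace with prob capacity/seen
            if j < self.capacity:
                self.data[j] = sample

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))
```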
【21】 Transductive Robust Learning Guarantees 标题:直推式鲁棒学习保证 链接:https://arxiv.org/abs/2110.10602
作者:Omar Montasser,Steve Hanneke,Nathan Srebro 机构:Toyota Technological Institute at Chicago, Purdue University 摘要:我们研究直推式设置下的对抗鲁棒学习问题。对于有界VC维的类$\mathcal{H}$,我们提出了一个简单的直推式学习器:当给定一组有标记的训练样本和一组无标记的测试样本(两组都可能受到对抗扰动)时,它能以鲁棒错误率正确标记测试样本,该错误率与VC维呈线性关系,并能适应扰动集的复杂性。这一结果在对VC维的依赖上,相对归纳设置中已知最好的鲁棒误差上界给出了指数级改进,其代价是与一个更严格的最优鲁棒误差概念竞争。 摘要:We study the problem of adversarially robust learning in the transductive setting. For classes $\mathcal{H}$ of bounded VC dimension, we propose a simple transductive learner that, when presented with a set of labeled training examples and a set of unlabeled test examples (both sets possibly adversarially perturbed), correctly labels the test examples with a robust error rate that is linear in the VC dimension and is adaptive to the complexity of the perturbation set. This result provides an exponential improvement in dependence on VC dimension over the best known upper bound on the robust error in the inductive setting, at the expense of competing with a more restrictive notion of optimal robust error.
【22】 Color Teams for Machine Learning Development 标题:机器学习开发中的颜色团队 链接:https://arxiv.org/abs/2110.10601
作者:Josh Kalin,David Noever,Matthew Ciolino 机构:Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, PeopleTec, Inc, Huntsville, AL, USA 备注:8 Pages, 6 Figures 摘要:机器学习和软件开发共享向客户可靠交付产品的过程和方法。这项工作建议使用一种新的团队结构来组建机器学习团队,以便更好地打击敌对攻击者。在网络安全方面,基础设施使用这些团队来保护他们的系统,通过使用系统建设者和程序员来为他们的平台提供更健壮的功能。颜色团队为每个团队中的个人提供明确的责任,负责管道基线(黄色)、攻击(红色)和防御(蓝色)的哪一部分。组合颜色可以在整个团队中共享更多的知识,并在开发过程中构建更健壮的模型。本文将概述橙色、绿色和紫色新团队的职责,并概述这些团队取得成功所需的资源。 摘要:Machine learning and software development share processes and methodologies for reliably delivering products to customers. This work proposes the use of a new teaming construct for forming machine learning teams to better combat adversarial attackers. In cybersecurity, infrastructure uses these teams to protect their systems by using system builders and programmers to also offer more robustness to their platforms. Color teams provide clear responsibility to the individuals on each team for which part of the baseline (Yellow), attack (Red), and defense (Blue) breakout of the pipeline. Combining colors leads to additional knowledge shared across the team and more robust models built during development. The responsibilities of the new teams Orange, Green, and Purple will be outlined during this paper along with an overview of the necessary resources for these teams to be successful.
【23】 Ranking and Tuning Pre-trained Models: A New Paradigm of Exploiting Model Hubs 标题:对预先训练好的模型进行排序和调整:开发模型中心的新范式 链接:https://arxiv.org/abs/2110.10545
作者:Kaichao You,Yong Liu,Jianmin Wang,Michael I. Jordan,Mingsheng Long 机构: School of Software, BNRist, Tsinghua University, Beijing , China., Division of Computer Science and Department of Statistics, UC Berkeley, CA ,-, USA 备注:45 pages 摘要:拥有许多预训练模型(PTM)的预训练模型中心是深度学习的基石。尽管构建成本很高,但实际上它们尚未被充分利用:实践者通常根据受欢迎程度从给定的模型中心选择一个PTM,然后微调该PTM以解决目标任务。这种朴素但常见的做法对充分利用预训练模型中心造成了两个障碍:(1)PTM选择过程没有最优性保证;(2)只有一个PTM被使用,而其余的PTM被忽略。理想情况下,为了最大限度地利用预训练模型中心,需要尝试PTM的所有组合并对每个组合进行充分微调,这会导致指数级的组合数和无法负担的计算预算。在本文中,我们提出了一种通过对预训练模型进行排序和调优来利用模型中心的新范式:(1)我们的会议工作\citep{you_logme:_2021}提出了LogME,用于估计给定预训练模型所提取特征时标签证据的最大值,它可以在微调之前对模型中心中面向各种类型PTM和任务的所有PTM进行排序。(2)如果我们对模型的体系结构没有偏好,则可以对排名最佳的PTM进行微调和部署;或者通过提出的B-Tuning算法,由排名前K的PTM对目标PTM进行调优。排序部分基于会议论文,我们在本文中完成了其理论分析(启发式证据最大化过程的收敛性证明,以及特征维数的影响)。调优部分介绍了一种用于多PTM调优的新颖贝叶斯调优(B-Tuning)方法,它超越了为同构PTM调优设计的专用方法,并为异构PTM调优建立了新的技术水平。我们相信,利用PTM中心的新范式能够引起社区大量受众的兴趣。 摘要:Pre-trained model hubs with many pre-trained models (PTMs) have been a cornerstone in deep learning. Although built at a high cost, they are in fact \emph{under-exploited}: practitioners usually pick one PTM from the provided model hub by popularity, and then fine-tune the PTM to solve the target task. This naïve but common practice poses two obstacles to sufficiently exploiting pre-trained model hubs: (1) the PTM selection procedure has no optimality guarantee; (2) only one PTM is used while the rest PTMs are overlooked. Ideally, to maximally exploit pre-trained model hubs, trying all combinations of PTMs and extensively fine-tuning each combination of PTMs are required, which incurs exponential combinations and unaffordable computational budget. In this paper, we propose a new paradigm of exploiting model hubs by ranking and tuning pre-trained models: (1) Our conference work~\citep{you_logme:_2021} proposed LogME to estimate the maximum value of label evidence given features extracted by pre-trained models, which can rank all the PTMs in a model hub for various types of PTMs and tasks \emph{before fine-tuning}. (2) The best ranked PTM can be fine-tuned and deployed if we have no preference for the model's architecture, or the target PTM can be tuned by top-K ranked PTMs via the proposed B-Tuning algorithm. The ranking part is based on the conference paper, and we complete its theoretical analysis (convergence proof of the heuristic evidence maximization procedure, and the influence of feature dimension) in this paper. The tuning part introduces a novel Bayesian Tuning (B-Tuning) method for multiple PTMs tuning, which surpasses dedicated methods designed for homogeneous PTMs tuning and sets up new state of the art for heterogeneous PTMs tuning. We believe the new paradigm of exploiting PTM hubs can interest a large audience of the community.
【24】 Sampling from Arbitrary Functions via PSD Models 标题:通过PSD模型对任意函数进行采样 链接:https://arxiv.org/abs/2110.10527
作者:Ulysse Marteau-Ferey,Alessandro Rudi,Francis Bach 机构:INRIA - Département d’Informatique de l’École Normale Supérieure, PSL Research University, Paris, France 摘要:在应用统计学和机器学习的许多领域中,从给定分布生成任意数量的独立同分布(i.i.d.)样本是一项关键任务。当仅通过密度的评估来了解分布时,当前的方法要么随维度严重扩展,要么需要非常复杂的实现。相反,我们采取两步方法,首先对概率分布进行建模,然后从该模型中进行采样。我们使用最近引入的一类半正定(PSD)模型,这类模型已被证明对近似概率密度是有效的。我们证明了这些模型可以使用很少的评估来简洁地近似一大类密度,并给出了一个简单的算法来有效地从这些模型中采样。我们还提供了初步的实证结果来说明我们的主张。 摘要:In many areas of applied statistics and machine learning, generating an arbitrary number of independent and identically distributed (i.i.d.) samples from a given distribution is a key task. When the distribution is known only through evaluations of the density, current methods either scale badly with the dimension or require very involved implementations. Instead, we take a two-step approach by first modeling the probability distribution and then sampling from that model. We use the recently introduced class of positive semi-definite (PSD) models, which have been shown to be efficient for approximating probability densities. We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models. We also present preliminary empirical results to illustrate our assertions.
【25】 A TinyML Platform for On-Device Continual Learning with Quantized Latent Replays 标题:一个基于量化潜在重放的设备端持续学习TinyML平台 链接:https://arxiv.org/abs/2110.10486
作者:Leonardo Ravaglia,Manuele Rusci,Davide Nadalini,Alessandro Capotondi,Francesco Conti,Luca Benini 备注:14 pages 摘要:在过去几年中,针对超低功耗设备(一言以蔽之,TinyML)的深度学习模型和技术的研发主要基于"先训练后部署"的假设:静态模型在没有基于云的数据收集和微调的情况下无法适应新收集的数据。基于潜在重放的持续学习(CL)技术[1]原则上支持在线、无服务器的自适应,但迄今为止,对于通常基于微控制器的超低功耗TinyML设备而言,它们在计算和内存上的开销仍然过大。在这项工作中,我们介绍了一个基于支持FP32的10核并行超低功耗(PULP)处理器的端到端CL硬件/软件平台。我们重新设计了基线潜在重放CL算法,利用模型冻结阶段的量化和潜在重放(LR)来降低其内存成本,同时将对准确率的影响降到最低。特别是,与全精度基线实现相比,LR内存的8位压缩几乎是无损的(3000个LR时为-0.26%),但所需内存减少4倍;也可以使用7位压缩,但会带来额外的最小精度下降(最多5%)。我们还介绍了PULP处理器上前向和反向传播的优化原语。我们的结果表明,通过结合这些技术,在实践中可以用不到64MB的内存实现持续学习,这一内存量与嵌入TinyML设备兼容。在我们平台的先进22nm原型(称为VEGA)上,所提出的解决方案比低功耗STM32 L4微控制器平均快65倍,能效高37倍,足以在每分钟学习一个新的小批量数据的情况下支持535小时的使用寿命。 摘要:In the last few years, research and development on Deep Learning models and techniques for ultra-low-power devices (in a word, TinyML) has mainly focused on a train-then-deploy assumption, with static models that cannot be adapted to newly collected data without cloud-based data collection and fine-tuning. Latent Replay-based Continual Learning (CL) techniques [1] enable online, serverless adaptation in principle, but so far they have still been too computation- and memory-hungry for ultra-low-power TinyML devices, which are typically based on microcontrollers. In this work, we introduce a HW/SW platform for end-to-end CL based on a 10-core FP32-enabled parallel ultra-low-power (PULP) processor. We rethink the baseline Latent Replay CL algorithm, leveraging quantization of the frozen stage of the model and Latent Replays (LRs) to reduce their memory cost with minimal impact on accuracy. In particular, 8-bit compression of the LR memory proves to be almost lossless (-0.26% with 3000 LRs) compared to the full-precision baseline implementation, but requires 4x less memory, while 7-bit can also be used with an additional minimal accuracy degradation (up to 5%). We also introduce optimized primitives for forward and backward propagation on the PULP processor. Our results show that by combining these techniques, continual learning can be achieved in practice using less than 64MB of memory, an amount compatible with embedding in TinyML devices. On an advanced 22nm prototype of our platform, called VEGA, the proposed solution performs on average 65x faster than a low-power STM32 L4 microcontroller, being 37x more energy efficient, enough for a lifetime of 535h when learning a new mini-batch of data once every minute.
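The near-lossless 8-bit compression of latent replays can be illustrated with plain uniform quantization of the stored activations. The scale and offset handling below is a common choice, not the paper's kernel code; treat the exact scheme as an assumption.

```python
# Hedged sketch: uniform quantization/dequantization of latent replay
# activations to 8 (or fewer) bits to shrink the replay memory.
import numpy as np

def quantize(latents, bits=8):
    lo, hi = latents.min(), latents.max()
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((latents - lo) / scale).astype(np.uint8 if bits <= 8 else np.uint16)
    return q, lo, scale

def dequantize(q, lo, scale):
    # reconstruction used when the stored latents are replayed for training
    return q.astype(np.float32) * scale + lo
```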
【26】 An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR 标题:基于触发注意力的流媒体ASR增强CTC模型研究 链接:https://arxiv.org/abs/2110.10402
作者:Huaibo Zhao,Yosuke Higuchi,Tetsuji Ogawa,Tetsunori Kobayashi 机构:∗ Department of Communications and Computer Engineering, Waseda University, Tokyo, Japan 备注:Accepted to APSIPA 2021 摘要:本文尝试将Mask-CTC与触发注意机制相结合,构建一个高性能、低延迟的流式端到端自动语音识别(ASR)系统。触发注意机制执行由CTC尖峰触发的自回归解码,已证明在流式ASR中是有效的。然而,为了保持基于CTC输出的对准估计的高精度(这是其性能的关键),不可避免地需要在某些未来信息输入的情况下执行解码(即,具有更高的延迟)。应当注意,在流式ASR中,期望能够在保持低延迟的同时实现高识别精度。因此,本研究旨在通过引入掩码CTC来实现具有低延迟的高准确度流式ASR,该掩模CTC能够学习预测未来信息(即,可以考虑长期上下文)的特征表示,以用于编码器预训练。使用WSJ数据进行的实验比较表明,与传统的基于触发注意的流式ASR系统相比,该方法在较低延迟的情况下实现了更高的准确性。 摘要:In the present paper, an attempt is made to combine Mask-CTC and the triggered attention mechanism to construct a streaming end-to-end automatic speech recognition (ASR) system that provides high performance with low latency. The triggered attention mechanism, which performs autoregressive decoding triggered by the CTC spike, has shown to be effective in streaming ASR. However, in order to maintain high accuracy of alignment estimation based on CTC outputs, which is the key to its performance, it is inevitable that decoding should be performed with some future information input (i.e., with higher latency). It should be noted that in streaming ASR, it is desirable to be able to achieve high recognition accuracy while keeping the latency low. Therefore, the present study aims to achieve highly accurate streaming ASR with low latency by introducing Mask-CTC, which is capable of learning feature representations that anticipate future information (i.e., that can consider long-term contexts), to the encoder pre-training. Experimental comparisons conducted using WSJ data demonstrate that the proposed method achieves higher accuracy with lower latency than the conventional triggered attention-based streaming ASR system.
【27】 Cascaded Compressed Sensing Networks: A Reversible Architecture for Layerwise Learning 标题:级联压缩传感网络:一种分层学习的可逆结构 链接:https://arxiv.org/abs/2110.10379
作者:Weizhi Lu,Mingrui Chen,Kai Guo,Weiyu Li 机构: Guo are with the School of Control Science and Engineering, Shandong University, Li is with the Zhongtai Securities Institute for Financial Studies 摘要:近年来,逐层学习网络的方法因其易于分析而受到越来越多的关注。对于该方法,主要挑战在于通过反向传播网络的全局目标来为每一层导出优化目标。传播问题是不适定的,因为涉及从低维到高维空间的非线性激活的反演。为了解决这个问题,现有的解决方案是学习一个辅助网络来专门传播目标。然而,该网络缺乏稳定性,并且导致了更高的网络学习复杂度。在这封信中,我们证明了目标传播可以通过使用压缩感知对网络的每一层进行建模来实现,而不需要辅助网络。实验表明,该方法比基于辅助网络的方法具有更好的性能。 摘要:Recently, the method that learns networks layer by layer has attracted increasing interest for its ease of analysis. For this method, the main challenge lies in deriving an optimization target for each layer by inversely propagating the global target of the network. The propagation problem is ill-posed, as it involves the inversion of nonlinear activations from low-dimensional to high-dimensional spaces. To address the problem, the existing solution is to learn an auxiliary network to specially propagate the target. However, the network lacks stability, and moreover, it results in higher complexity for network learning. In this letter, we show that target propagation could be achieved by modeling each layer of the network with compressed sensing, without the need for auxiliary networks. Experiments show that the proposed method could achieve better performance than the auxiliary network-based method.
【28】 Model Composition: Can Multiple Neural Networks Be Combined into a Single Network Using Only Unlabeled Data? 标题:模型组合:可以仅使用未标记的数据将多个神经网络组合成单个网络吗? 链接:https://arxiv.org/abs/2110.10369
作者:Amin Banitalebi-Dehkordi,Xinyu Kang,Yong Zhang 机构: Huawei Technologies Canada Co., Ltd., University of British Columbia, Vancouver, Canada 备注:BMVC 2021 摘要:深度学习应用程序、数据集和神经网络体系结构的多样性要求仔细选择与目标应用程序最匹配的体系结构和数据。为了缓解这一困境,本文研究了使用未标记数据组合多个已训练神经网络的思想。此外,将多个模型组合成一个模型可以加快推理速度,生成更强大、更有能力的模型,并允许我们选择高效的设备友好型目标网络体系结构。为此,所提出的方法利用从未标记数据收集的可靠伪标签的生成、过滤和聚合。我们的方法支持使用任意结构和类别的任意数量的输入模型。广泛的性能评估表明,我们的方法是非常有效的。例如,对于目标检测任务,在不使用任何真值标签的情况下,可以将在Pascal VOC上训练的EfficientDet-D0和在COCO上训练的EfficientDet-D1组合成一个RetinaNet-ResNet50模型,其mAP与监督训练相近。如果在半监督环境中进行微调,则组合模型相比使用1%、5%和10%标签的监督训练,mAP分别提高18.6%、12.6%和8.1%。 摘要:The diversity of deep learning applications, datasets, and neural network architectures necessitates a careful selection of the architecture and data that match best to a target application. As an attempt to mitigate this dilemma, this paper investigates the idea of combining multiple trained neural networks using unlabeled data. In addition, combining multiple models into one can speed up the inference, result in stronger, more capable models, and allows us to select efficient device-friendly target network architectures. To this end, the proposed method makes use of generation, filtering, and aggregation of reliable pseudo-labels collected from unlabeled data. Our method supports using an arbitrary number of input models with arbitrary architectures and categories. Extensive performance evaluations demonstrated that our method is very effective. For example, for the task of object detection and without using any ground-truth labels, an EfficientDet-D0 trained on Pascal-VOC and an EfficientDet-D1 trained on COCO, can be combined to a RetinaNet-ResNet50 model, with a similar mAP as the supervised training. If fine-tuned in a semi-supervised setting, the combined model achieves 18.6%, 12.6%, and 8.1% mAP improvements over supervised training with 1%, 5%, and 10% of labels.
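The generate-filter-aggregate idea is easy to sketch for classification: every input model labels the unlabeled pool, low-confidence or disagreeing predictions are dropped, and the consensus labels are used to train the target network. The confidence threshold and the unanimity rule below are illustrative assumptions, not the paper's exact pipeline.

```python
# Hedged sketch: consensus pseudo-labeling from several trained models
# over an unlabeled pool, to supervise a student/target network.
import torch

@torch.no_grad()
def pseudo_labels(models, x_unlabeled, conf_thresh=0.9):
    probs = torch.stack([m(x_unlabeled).softmax(dim=-1) for m in models])  # (M, N, C)
    conf, labels = probs.max(dim=-1)                                       # per-model votes
    keep = (conf > conf_thresh).all(dim=0) & (labels == labels[0]).all(dim=0)
    return x_unlabeled[keep], labels[0][keep]   # confident consensus samples
```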
【29】 One-Step Abductive Multi-Target Learning with Diverse Noisy Samples 标题:基于多样噪声样本的一步溯因多目标学习 链接:https://arxiv.org/abs/2110.10325
作者:Yongquan Yang 备注:6 pages 摘要:一步溯因多目标学习(OSAMTL)被提出用于处理复杂的噪声标签。在本文中,我们给出了多样噪声样本(DNS)的定义,并提出了带DNS的一步溯因多目标学习(OSAMTL-DNS),将原始OSAMTL扩展到处理复杂噪声标签的更广泛任务。 摘要:One-step abductive multi-target learning (OSAMTL) was proposed to handle complex noisy labels. In this paper, giving definition of diverse noisy samples (DNS), we propose one-step abductive multi-target learning with DNS (OSAMTL-DNS) to expand the original OSAMTL to a wider range of tasks that handle complex noisy labels.
【30】 Expressivity of Neural Networks via Chaotic Itineraries beyond Sharkovsky's Theorem 标题:通过超越Sharkovsky定理的混沌行程刻画神经网络的表达能力 链接:https://arxiv.org/abs/2110.10295
作者:Clayton Sanford,Vaggos Chatziafratis 机构:Columbia University, Northwestern University 备注:47 pages, 19 figures 摘要:给定一个目标函数$f$,神经网络必须有多大才能逼近$f$?最近的工作从动力系统的角度研究了神经网络表达能力这一基本问题,并为一大类函数$f$提供了新颖的"深度与宽度"权衡。这些工作认为,这种权衡取决于$f$中周期点或循环的存在。我们的工作通过进一步运用动力系统概念,阐明了周期性和表达能力之间更微妙的联系:我们证明了仅凭周期点只能导出次优的深度-宽度权衡,并通过证明某些"混沌行程"给出更强的指数级权衡来改进它们,即使在以前的分析仅暗示多项式差距的情形下也是如此。与之前的工作相反,我们的界几乎是最优的,随着周期的增加而收紧,并能处理较强的不可逼近性概念(例如,常数$L_1$误差)。更广泛地说,我们确定了一个到混沌区的相变,该相变恰好与函数复杂性的其他概念(包括VC维和拓扑熵)的突变相一致。 摘要:Given a target function $f$, how large must a neural network be in order to approximate $f$? Recent works examine this basic question on neural network \textit{expressivity} from the lens of dynamical systems and provide novel ``depth-vs-width'' tradeoffs for a large family of functions $f$. They suggest that such tradeoffs are governed by the existence of \textit{periodic} points or \emph{cycles} in $f$. Our work, by further deploying dynamical systems concepts, illuminates a more subtle connection between periodicity and expressivity: we prove that periodic points alone lead to suboptimal depth-width tradeoffs and we improve upon them by demonstrating that certain ``chaotic itineraries'' give stronger exponential tradeoffs, even in regimes where previous analyses only imply polynomial gaps. Contrary to prior works, our bounds are nearly-optimal, tighten as the period increases, and handle strong notions of inapproximability (e.g., constant $L_1$ error). More broadly, we identify a phase transition to the \textit{chaotic regime} that exactly coincides with an abrupt shift in other notions of function complexity, including VC-dimension and topological entropy.
【31】 A Simple Approach to Continual Learning by Transferring Skill Parameters 标题:通过传递技能参数实现持续学习的一种简单方法 链接:https://arxiv.org/abs/2110.10255
作者:K. R. Zentner,Ryan Julian,Ujjwal Puri,Yulun Zhang,Gaurav S. Sukhatme 备注:Submitted to ICRA 2022 摘要:为了在现实世界中成为有效的通用机器,机器人不仅需要将其现有的操作技能适应新的环境,还需要即时获得全新的技能。持续学习的一个巨大前景是,通过利用机器人从先前技能中积累的知识和经验,赋予机器人这种能力。我们重新审视这个问题,考虑一种机器人仅限于以所学技能策略的形式存储知识和经验的设置。我们表明,存储技能策略、仔细的预训练以及适当选择何时迁移这些技能策略,足以在机器人操作的环境中构建一个持续学习者。我们分析在具有挑战性的Meta-World模拟基准中迁移技能需要哪些条件。通过这一分析,我们引入了一个关联技能对的度量,它允许我们预测任务之间技能迁移的有效性,并利用它将持续学习问题归约为课程选择问题。在给定适当课程的情况下,我们展示了如何在不遗忘的前提下持续获得机器人操作技能,并且所用样本远少于从头开始训练所需的样本。 摘要:In order to be effective general purpose machines in real world environments, robots not only will need to adapt their existing manipulation skills to new circumstances, they will need to acquire entirely new skills on-the-fly. A great promise of continual learning is to endow robots with this ability, by using their accumulated knowledge and experience from prior skills. We take a fresh look at this problem, by considering a setting in which the robot is limited to storing that knowledge and experience only in the form of learned skill policies. We show that storing skill policies, careful pre-training, and appropriately choosing when to transfer those skill policies is sufficient to build a continual learner in the context of robotic manipulation. We analyze which conditions are needed to transfer skills in the challenging Meta-World simulation benchmark. Using this analysis, we introduce a pair-wise metric relating skills that allows us to predict the effectiveness of skill transfer between tasks, and use it to reduce the problem of continual learning to curriculum selection. Given an appropriate curriculum, we show how to continually acquire robotic manipulation skills without forgetting, and using far fewer samples than needed to train them from scratch.
【32】 More Engineering, No Silos: Rethinking Processes and Interfaces in Collaboration between Interdisciplinary Teams for Machine Learning Projects 标题:更多的工程,没有孤岛:在机器学习项目的跨学科团队之间的协作中重新思考过程和接口 链接:https://arxiv.org/abs/2110.10234
作者:Nadia Nahar,Shurui Zhou,Grace Lewis,Christian Kästner 机构:Carnegie Mellon University, Pittsburgh, PA, USA, University of Toronto, Toronto, Ontario, Canada, Carnegie Mellon Software Engineering Institute 备注:22 pages, 10 figures, 5 tables 摘要:The introduction of machine learning (ML) components in software projects has created the need for software engineers to collaborate with data scientists and other specialists. While collaboration can always be challenging, ML introduces additional challenges with its exploratory model development process, additional skills and knowledge needed, difficulties testing ML systems, need for continuous evolution and monitoring, and non-traditional quality requirements such as fairness and explainability. Through interviews with 45 practitioners from 28 organizations, we identified key collaboration challenges that teams face when building and deploying ML systems into production. We report on common collaboration points in the development of production ML systems for requirements, data, and integration, as well as corresponding team patterns and challenges. We find that most of these challenges center around communication, documentation, engineering, and process and collect recommendations to address these challenges.
【33】 Learning Equivariances and Partial Equivariances from Data 标题:从数据中学习等差和偏等差 链接:https://arxiv.org/abs/2110.10211
作者:David W. Romero,Suhas Lohit 机构:Vrije Universiteit Amsterdam, Amsterdam, The Netherlands, Mitsubishi Electric Research Laboratories, Cambridge, MA, USA 摘要:Group equivariant Convolutional Neural Networks (G-CNNs) constrain features to respect the chosen symmetries, and lead to better generalization when these symmetries appear in the data. However, if the chosen symmetries are not present, group equivariant architectures lead to overly constrained models and worse performance. Frequently, the distribution of the data can be better represented by a subset of a group than by the group as a whole, e.g., rotations in $[-90^{\circ}, 90^{\circ}]$. In such cases, a model that respects equivariance partially is better suited to represent the data. Moreover, relevant symmetries may differ for low and high-level features, e.g., edge orientations in a face, and face poses relative to the camera. As a result, the optimal level of equivariance may differ per layer. In this work, we introduce Partial G-CNNs: a family of equivariant networks able to learn partial and full equivariances from data at every layer end-to-end. Partial G-CNNs retain full equivariance whenever beneficial, e.g., for rotated MNIST, but are able to restrict it whenever it becomes harmful, e.g., for 6~/~9 or natural image classification. Partial G-CNNs perform on par with G-CNNs when full equivariance is necessary, and outperform them otherwise. Our method is applicable to discrete groups, continuous groups and combinations thereof.
【34】 StructFormer: Learning Spatial Structure for Language-Guided Semantic Rearrangement of Novel Objects 标题:StructFormer:用于语言引导的新奇物体语义重排的空间结构学习 链接:https://arxiv.org/abs/2110.10189
作者:Weiyu Liu,Chris Paxton,Tucker Hermans,Dieter Fox 摘要:Geometric organization of objects into semantically meaningful arrangements pervades the built world. As such, assistive robots operating in warehouses, offices, and homes would greatly benefit from the ability to recognize and rearrange objects into these semantically meaningful structures. To be useful, these robots must contend with previously unseen objects and receive instructions without significant programming. While previous works have examined recognizing pairwise semantic relations and sequential manipulation to change these simple relations, none have shown the ability to arrange objects into complex structures such as circles or table settings. To address this problem, we propose a novel transformer-based neural network, StructFormer, which takes as input a partial-view point cloud of the current object arrangement and a structured language command encoding the desired object configuration. We show through rigorous experiments that StructFormer enables a physical robot to rearrange novel objects into semantically meaningful structures with multi-object relational constraints inferred from the language command.
【35】 Iterated Block Particle Filter for High-dimensional Parameter Learning: Beating the Curse of Dimensionality 标题:迭代块粒子滤波器用于高维参数学习:战胜维度灾难 链接:https://arxiv.org/abs/2110.10745
作者:Ning Ning,Edward L. Ionides 摘要:Parameter learning for high-dimensional, partially observed, and nonlinear stochastic processes is a methodological challenge. Spatiotemporal disease transmission systems provide examples of such processes giving rise to open inference problems. We propose the iterated block particle filter (IBPF) algorithm for learning high-dimensional parameters over graphical state space models with general state spaces, measures, transition densities and graph structure. Theoretical performance guarantees are obtained on beating the curse of dimensionality (COD), algorithm convergence, and likelihood maximization. Experiments on a highly nonlinear and non-Gaussian spatiotemporal model for measles transmission reveal that the iterated ensemble Kalman filter algorithm (Li et al. (2020)) is ineffective and the iterated filtering algorithm (Ionides et al. (2015)) suffers from the COD, while our IBPF algorithm beats COD consistently across various experiments with different metrics.
【36】 Learning quantum dynamics with latent neural ODEs 标题:用潜在神经ODE学习量子动力学 链接:https://arxiv.org/abs/2110.10721
作者:Matthew Choi,Daniel Flam-Shepherd,Thi Ha Kyaw,Alán Aspuru-Guzik 机构:Department of Computer Science, University of Toronto, Toronto, Ontario M,S ,E, Canada, Vector Institute for Artificial Intelligence, Toronto, Ontario M,S ,M, Canada, Department of Chemistry, University of Toronto, Toronto, Ontario M,G ,Z, Canada 摘要:The core objective of machine-assisted scientific discovery is to learn physical laws from experimental data without prior knowledge of the systems in question. In the area of quantum physics, making progress towards these goals is significantly more challenging due to the curse of dimensionality as well as the counter-intuitive nature of quantum mechanics. Here, we present the QNODE, a latent neural ODE trained on dynamics from closed and open quantum systems. The QNODE can learn to generate quantum dynamics and extrapolate outside of its training region that satisfy the von Neumann and time-local Lindblad master equations for closed and open quantum systems. Furthermore the QNODE rediscovers quantum mechanical laws such as Heisenberg's uncertainty principle in a totally data-driven way, without constraints or guidance. Additionally, we show that trajectories that are generated from the QNODE and are close in its latent space have similar quantum dynamics while preserving the physics of the training system.
【37】 Deep Learning for HDR Imaging: State-of-the-Art and Future Trends 标题:HDR成像的深度学习:现状和未来趋势 链接:https://arxiv.org/abs/2110.10394
作者:Lin Wang,Kuk-Jin Yoon 机构: Korea Advanced Institute of Science andTechnology 备注:Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 摘要:High dynamic range (HDR) imaging is a technique that allows an extensive dynamic range of exposures, which is important in image processing, computer graphics, and computer vision. In recent years, there has been a significant advancement in HDR imaging using deep learning (DL). This study conducts a comprehensive and insightful survey and analysis of recent developments in deep HDR imaging methodologies. We hierarchically and structurally group existing deep HDR imaging methods into five categories based on (1) number/domain of input exposures, (2) number of learning tasks, (3) novel sensor data, (4) novel learning strategies, and (5) applications. Importantly, we provide a constructive discussion on each category regarding its potential and challenges. Moreover, we review some crucial aspects of deep HDR imaging, such as datasets and evaluation metrics. Finally, we highlight some open problems and point out future research directions.
其他(45篇)
【1】 Towards modelling hazard factors in unstructured data spaces using gradient-based latent interpolation 标题:利用基于梯度的潜在插值对非结构化数据空间中的危险因素进行建模 链接:https://arxiv.org/abs/2110.11312
作者:Tobias Weber,Michael Ingrisch,Bernd Bischl,David Rügamer 机构:Department of Statistics, LMU Munich, Department of Radiology 备注:NeurIPS 2021 Workshop, Deep Generative Models and Downstream Applications 摘要:The application of deep learning in survival analysis (SA) gives the opportunity to utilize unstructured and high-dimensional data types uncommon in traditional survival methods. This allows to advance methods in fields such as digital health, predictive maintenance and churn analysis, but often yields less interpretable and intuitively understandable models due to the black-box character of deep learning-based approaches. We close this gap by proposing 1) a multi-task variational autoencoder (VAE) with survival objective, yielding survival-oriented embeddings, and 2) a novel method HazardWalk that allows to model hazard factors in the original data space. HazardWalk transforms the latent distribution of our autoencoder into areas of maximized/minimized hazard and then uses the decoder to project changes to the original domain. Our procedure is evaluated on a simulated dataset as well as on a dataset of CT imaging data of patients with liver metastases.
【2】 On games and simulators as a platform for development of artificial intelligence for command and control 标题:游戏模拟器作为指挥控制人工智能发展平台的探讨 链接:https://arxiv.org/abs/2110.11305
作者:Vinicius G. Goecks,Nicholas Waytowich,Derrik E. Asher,Song Jun Park,Mark Mittrick,John Richardson,Manuel Vindiola,Anne Logie,Mark Dennison,Theron Trout,Priya Narayanan,Alexander Kott 备注:Preprint submitted to the Journal of Defense Modeling and Simulation (JDMS) for peer review 摘要:Games and simulators can be a valuable platform to execute complex multi-agent, multiplayer, imperfect information scenarios with significant parallels to military applications: multiple participants manage resources and make decisions that command assets to secure specific areas of a map or neutralize opposing forces. These characteristics have attracted the artificial intelligence (AI) community by supporting development of algorithms with complex benchmarks and the capability to rapidly iterate over new ideas. The success of artificial intelligence algorithms in real-time strategy games such as StarCraft II have also attracted the attention of the military research community aiming to explore similar techniques in military counterpart scenarios. Aiming to bridge the connection between games and military applications, this work discusses past and current efforts on how games and simulators, together with the artificial intelligence algorithms, have been adapted to simulate certain aspects of military missions and how they might impact the future battlefield. This paper also investigates how advances in virtual reality and visual augmentation systems open new possibilities in human interfaces with gaming platforms and their military parallels.
【3】 Survival-oriented embeddings for improving accessibility to complex data structures 标题:用于提高复杂数据结构可访问性的面向生存的嵌入 链接:https://arxiv.org/abs/2110.11303
作者:Tobias Weber,Michael Ingrisch,Matthias Fabritius,Bernd Bischl,David Rügamer 机构:Department of Statistics, LMU Munich, Department of Radiology 备注:NeurIPS 2021 Workshop, Bridging the Gap: From Machine Learning Research to Clinical Practice 摘要:Deep learning excels in the analysis of unstructured data and recent advancements allow to extend these techniques to survival analysis. In the context of clinical radiology, this enables, e.g., to relate unstructured volumetric images to a risk score or a prognosis of life expectancy and support clinical decision making. Medical applications are, however, associated with high criticality and consequently, neither medical personnel nor patients do usually accept black box models as reason or basis for decisions. Apart from averseness to new technologies, this is due to missing interpretability, transparency and accountability of many machine learning methods. We propose a hazard-regularized variational autoencoder that supports straightforward interpretation of deep neural architectures in the context of survival analysis, a field highly relevant in healthcare. We apply the proposed approach to abdominal CT scans of patients with liver tumors and their corresponding survival times.
【4】 Is High Variance Unavoidable in RL? A Case Study in Continuous Control 标题:在RL中高方差是不可避免的吗?连续控制中的一个案例研究 链接:https://arxiv.org/abs/2110.11222
作者:Johan Bjorck,Carla P. Gomes,Kilian Q. Weinberger 机构:Cornell University 摘要:Reinforcement learning (RL) experiments have notoriously high variance, and minor details can have disproportionately large effects on measured outcomes. This is problematic for creating reproducible research and also serves as an obstacle for real-world applications, where safety and predictability are paramount. In this paper, we investigate causes for this perceived instability. To allow for an in-depth analysis, we focus on a specifically popular setup with high variance -- continuous control from pixels with an actor-critic agent. In this setting, we demonstrate that variance mostly arises early in training as a result of poor "outlier" runs, but that weight initialization and initial exploration are not to blame. We show that one cause for early variance is numerical instability which leads to saturating nonlinearities. We investigate several fixes to this issue and find that one particular method is surprisingly effective and simple -- normalizing penultimate features. Addressing the learning instability allows for larger learning rates, and significantly decreases the variance of outcomes. This demonstrates that the perceived variance in RL is not necessarily inherent to the problem definition and may be addressed through simple architectural modifications.
【5】 DAIR: Data Augmented Invariant Regularization 标题:DAIR:数据增广不变正则化 链接:https://arxiv.org/abs/2110.11205
作者:Tianjian Huang,Shaunak Halbe,Chinnadhurai Sankar,Pooyan Amini,Satwik Kottur,Alborz Geramifard,Meisam Razaviyayn,Ahmad Beirami 机构:University of Southern California, College of Engineering Pune, Facebook AI 备注:15 pages 摘要:While deep learning through empirical risk minimization (ERM) has succeeded at achieving human-level performance at a variety of complex tasks, ERM generalizes poorly to distribution shift. This is partly explained by overfitting to spurious features such as background in images or named entities in natural language. Synthetic data augmentation followed by empirical risk minimization (DA-ERM) is a simple yet powerful solution to remedy this problem. In this paper, we propose data augmented invariant regularization (DAIR). The idea of DAIR is based on the observation that the model performance (loss) is desired to be consistent on the augmented sample and the original one. DAIR introduces a regularizer on DA-ERM to penalize such loss inconsistency. Both theoretically and through empirical experiments, we show that a particular form of the DAIR regularizer consistently performs well in a variety of settings. We apply it to multiple real-world learning problems involving domain shift, namely robust regression, visual question answering, robust deep neural network training, and task-oriented dialog modeling. Our experiments show that DAIR consistently outperforms ERM and DA-ERM with little marginal cost, setting new state-of-the-art results in several benchmarks.
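The DAIR regularizer is straightforward to sketch: alongside the usual DA-ERM terms, penalize the inconsistency between the per-example loss on an input and on its augmentation. The squared-root form below is one variant in this spirit; treat the exact form and weighting as assumptions rather than the paper's definitive choice.

```python
# Hedged sketch of a DAIR-style objective: DA-ERM plus a loss-consistency
# penalty between each example and its augmentation.
import torch
import torch.nn.functional as F

def dair_loss(model, x, x_aug, y, lam=1.0):
    loss = F.cross_entropy(model(x), y, reduction='none')
    loss_aug = F.cross_entropy(model(x_aug), y, reduction='none')
    consistency = (loss.sqrt() - loss_aug.sqrt()).pow(2)   # inconsistency penalty
    return (0.5 * (loss + loss_aug) + lam * consistency).mean()
```

The weight `lam` trades off fitting the data against enforcing invariance to the augmentation.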
【6】 Anti-Concentrated Confidence Bonuses for Scalable Exploration 标题:可扩展勘探的反集中信心奖金 链接:https://arxiv.org/abs/2110.11202
作者:Jordan T. Ash,Cyril Zhang,Surbhi Goel,Akshay Krishnamurthy,Sham Kakade 机构:Microsoft Research NYC, University of Washington 摘要:Intrinsic rewards play a central role in handling the exploration-exploitation trade-off when designing sequential decision-making algorithms, in both foundational theory and state-of-the-art deep reinforcement learning. The LinUCB algorithm, a centerpiece of the stochastic linear bandits literature, prescribes an elliptical bonus which addresses the challenge of leveraging shared information in large action spaces. This bonus scheme cannot be directly transferred to high-dimensional exploration problems, however, due to the computational cost of maintaining the inverse covariance matrix of action features. We introduce \emph{anti-concentrated confidence bounds} for efficiently approximating the elliptical bonus, using an ensemble of regressors trained to predict random noise from policy network-derived features. Using this approximation, we obtain stochastic linear bandit algorithms which obtain $\tilde{O}(d\sqrt{T})$ regret bounds for $\mathrm{poly}(d)$ fixed actions. We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic reward heuristics on Atari benchmarks.
【7】 Sensing Cox Processes via Posterior Sampling and Positive Bases 标题:基于后验抽样和正基的Cox过程感知 链接:https://arxiv.org/abs/2110.11181
作者:Mojmír Mutný,Andreas Krause 机构:ETH Zürich 摘要:We study adaptive sensing of Cox point processes, a widely used model from spatial statistics. We introduce three tasks: maximization of captured events, search for the maximum of the intensity function and learning level sets of the intensity function. We model the intensity function as a sample from a truncated Gaussian process, represented in a specially constructed positive basis. In this basis, the positivity constraint on the intensity function has a simple form. We show how a minimal-description positive basis can be adapted to the covariance kernel and to non-stationarity, and make connections to common positive bases from prior works. Our adaptive sensing algorithms use Langevin dynamics and are based on posterior sampling (\textsc{Cox-Thompson}) and top-two posterior sampling (\textsc{Top2}) principles. With the latter, the difference between samples serves as a surrogate for the uncertainty. We demonstrate the approach using examples from environmental monitoring and crime rate modeling, and compare it to the classical Bayesian experimental design approach.
【8】 Each Attribute Matters: Contrastive Attention for Sentence-based Image Editing 标题:每个属性都很重要:基于句子的图像编辑的对比注意 链接:https://arxiv.org/abs/2110.11159
作者:Liuqing Zhao,Fan Lyu,Fuyuan Hu,Kaizhu Huang,Fenglei Xu,Linyan Li 机构: Suzhou University of, Science and Technology, Suzhou, China, College of Intelligence and Computing, Tianjin University, Tianjin, China, Xi’an Jiaotong-Liverpool University, Suzhou Institute of Trade and Commerce, L.Zhao and F.Lyu share equal contribution. 备注:Accepted by BMVC 2021 摘要:Sentence-based Image Editing (SIE) aims to deploy natural language to edit an image. Offering the potential to reduce expensive manual editing, SIE has attracted much interest recently. However, existing methods can hardly produce accurate editing, and even fail at attribute editing when the query sentence contains multiple editable attributes. To cope with this problem, by focusing on enhancing the difference between attributes, this paper proposes a novel model called Contrastive Attention Generative Adversarial Network (CA-GAN), inspired by contrastive training. Specifically, we first design a novel contrastive attention module to enlarge the editing difference between random combinations of attributes formed during training. We then construct an attribute discriminator to ensure effective editing on each attribute. A series of experiments shows that our method can generate very encouraging results in sentence-based image editing with multiple attributes on the CUB and COCO datasets. Our code is available at https://github.com/Zlq2021/CA-GAN
【9】 Towards strong pruning for lottery tickets with non-zero biases 标题:非零偏彩票的强剪枝 链接:https://arxiv.org/abs/2110.11150
作者:Jonas Fischer,Rebekka Burkholz 机构:Max Planck Institute for Informatics, Saarbrücken, Germany, CISPA Helmholtz Center for Information Security 摘要:The strong lottery ticket hypothesis holds the promise that pruning randomly initialized deep neural networks could offer a computationally efficient alternative to deep learning with stochastic gradient descent. Common parameter initialization schemes and existence proofs, however, are focused on networks with zero biases, thus forgoing the potential universal approximation property of pruning. To fill this gap, we extend multiple initialization schemes and existence proofs to non-zero biases, including explicit 'looks-linear' approaches for ReLU activation functions. These not only enable truly orthogonal parameter initialization but also reduce potential pruning errors. In experiments on standard benchmark data sets, we further highlight the practical benefits of non-zero bias initialization schemes, and present theoretically inspired extensions for state-of-the-art strong lottery ticket pruning.
【10】 Sliced-Wasserstein Gradient Flows 标题:切片-瓦瑟斯坦梯度流 链接:https://arxiv.org/abs/2110.10972
作者:Clément Bonet,Nicolas Courty,François Septier,Lucas Drumetz 机构:Univ. Bretagne Sud, IMT Atlantique 摘要:Minimizing functionals in the space of probability distributions can be done with Wasserstein gradient flows. To solve them numerically, a possible approach is to rely on the Jordan-Kinderlehrer-Otto (JKO) scheme which is analogous to the proximal scheme in Euclidean spaces. However, this bilevel optimization problem is known for its computational challenges, especially in high dimension. To alleviate it, very recent works propose to approximate the JKO scheme leveraging Brenier's theorem, and using gradients of Input Convex Neural Networks to parameterize the density (JKO-ICNN). However, this method comes with a high computational cost and stability issues. Instead, this work proposes to use gradient flows in the space of probability measures endowed with the sliced-Wasserstein (SW) distance. We argue that this method is more flexible than JKO-ICNN, since SW enjoys a closed-form differentiable approximation. Thus, the density at each step can be parameterized by any generative model, which alleviates the computational burden and makes it tractable in higher dimensions. Interestingly, we also show empirically that these gradient flows are strongly related to the usual Wasserstein gradient flows, and that they can be used to efficiently minimize diverse machine learning functionals.
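The closed-form differentiable approximation that makes SW attractive boils down to projecting onto random directions and sorting. Below is a minimal NumPy Monte Carlo estimator, assuming equally sized samples; `n_proj` is an illustrative parameter choice.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=200, p=2, seed=None):
    """Monte Carlo sliced-Wasserstein distance between two point clouds of
    shape (n, d) with equal n: project onto random unit directions, then use
    the closed-form 1-D Wasserstein distance between sorted projections."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_proj, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    xp = np.sort(X @ theta.T, axis=0)  # (n, n_proj), sorted per direction
    yp = np.sort(Y @ theta.T, axis=0)
    return (np.abs(xp - yp) ** p).mean() ** (1 / p)

X = np.random.randn(500, 2)
Y = np.random.randn(500, 2) + 3.0
print(sliced_wasserstein(X, Y))  # grows with the separation between clouds
```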
【11】 Autonomous Dimension Reduction by Flattening Deformation of Data Manifold under an Intrinsic Deforming Field 标题:本征形变场下数据流形的扁平化自主降维 链接:https://arxiv.org/abs/2110.10938
作者:Xiaodong Zhuang 机构:Electronic Information College, Qingdao University, China 备注:18 pages, 23 figures 摘要:A new dimension reduction (DR) method for data sets is proposed, based on autonomous deformation of data manifolds. The deformation is guided by the proposed deforming vector field, which is defined by two kinds of virtual interactions between data points. The flattening of the data manifold is achieved as an emergent behavior under the elastic and repelling interactions between data points, while the topological structure of the manifold is preserved. To overcome the uneven sampling (or "short-cut edge") problem, the soft neighborhood is proposed, in which a neighbor degree is defined and adaptive interactions between neighboring points are implemented. The proposed method provides a novel geometric viewpoint on dimension reduction. Experimental results demonstrate the effectiveness of the proposed method for dimension reduction, and show that implicit features of data sets may also be revealed.
【12】 Subspace Detours Meet Gromov-Wasserstein 标题:子空间迂回满足Gromov-Wasserstein 链接:https://arxiv.org/abs/2110.10932
作者:Clément Bonet,Nicolas Courty,François Septier,Lucas Drumetz 机构:Univ. Bretagne Sud, LMBA, F-, Vannes, Univ. Bretagne Sud, IRISA, IMT Atlantique, Lab-STICC, F-, Brest 摘要:In the context of optimal transport methods, the subspace detour approach was recently presented by Muzellec and Cuturi (2019). It consists in building a nearly optimal transport plan in the measures space from an optimal transport plan in a wisely chosen subspace, onto which the original measures are projected. The contribution of this paper is to extend this category of methods to the Gromov-Wasserstein problem, which is a particular type of transport distance involving the inner geometry of the compared distributions. After deriving the associated formalism and properties, we also discuss a specific cost for which we can show connections with the Knothe-Rosenblatt rearrangement. We finally give an experimental illustration on a shape matching problem.
【13】 An Empirical Evaluation of Time-Series Feature Sets 标题:时间序列特征集的一种经验评价 链接:https://arxiv.org/abs/2110.10914
作者:Trent Henderson,Ben D. Fulcher 机构:School of Physics, The University of Sydney, Sydney, Australia 备注:Submitted to and accepted for publication in SFE-TSDM Workshop at 21st IEEE International Conference on Data Mining (IEEE ICDM 2021) 摘要:Solving time-series problems with features has been rising in popularity due to the availability of software for feature extraction. Feature-based time-series analysis can now be performed using many different feature sets, including hctsa (7730 features: Matlab), feasts (42 features: R), tsfeatures (63 features: R), Kats (40 features: Python), tsfresh (up to 1558 features: Python), TSFEL (390 features: Python), and the C-coded catch22 (22 features: Matlab, R, Python, and Julia). There is substantial overlap in the types of methods included in these sets (e.g., properties of the autocorrelation function and Fourier power spectrum), but they are yet to be systematically compared. Here we compare these seven sets on computational speed, assess the redundancy of features contained in each, and evaluate the overlap and redundancy between them. We take an empirical approach to feature similarity based on outputs across a diverse set of real-world and simulated time series. We find that feature sets vary across three orders of magnitude in their computation time per feature on a laptop for a 1000-sample series, from the fastest sets catch22 and TSFEL (~0.1ms per feature) to tsfeatures (~3s per feature). Using PCA to evaluate feature redundancy within each set, we find the highest within-set redundancy for TSFEL and tsfresh. For example, in TSFEL, 90% of the variance across 390 features can be captured with just four PCs. Finally, we introduce a metric for quantifying overlap between pairs of feature sets, which indicates substantial overlap. We found that the largest feature set, hctsa, is the most comprehensive, and that tsfresh is the most distinctive, due to its incorporation of many low-level Fourier coefficients. Our results provide empirical understanding of the differences between existing feature sets, information that can be used to better tailor feature sets to their applications.
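The within-set redundancy measurement described above can be reproduced with a few lines of scikit-learn: count how many principal components are needed to explain 90% of the variance of a standardized (series x features) matrix. A hedged sketch on synthetic, deliberately redundant features:

```python
import numpy as np
from sklearn.decomposition import PCA

def n_components_for_variance(F, threshold=0.90):
    """Within-set redundancy probe: how many PCs explain `threshold` of the
    variance of a standardized (series x features) matrix. Fewer PCs means
    a more redundant feature set."""
    Z = (F - F.mean(0)) / F.std(0)
    ratios = PCA().fit(Z).explained_variance_ratio_
    return int(np.searchsorted(np.cumsum(ratios), threshold) + 1)

# Toy example: 30 features built from 5 latent signals, hence highly redundant.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 5))
F = np.hstack([base + 0.01 * rng.normal(size=(100, 5)) for _ in range(6)])
print(n_components_for_variance(F))  # ~5
```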
【14】 Deep Image Matting with Flexible Guidance Input 标题:具有灵活引导输入的深部图像遮片 链接:https://arxiv.org/abs/2110.10898
作者:Hang Cheng,Shugong Xu,Xiufeng Jiang,Rongrong Wang 机构:Shanghai Institute for Advanced, Communication and Data, Science(SICS), Shanghai University, China 备注:Accepted to BMVC2021 摘要:Image matting is an important computer vision problem. Many existing matting methods require a hand-made trimap to provide auxiliary information, which is very expensive and limits real-world usage. Recently, some trimap-free methods have been proposed, which completely get rid of any user input. However, their performance lags far behind trimap-based methods due to the lack of guidance information. In this paper, we propose a matting method that uses Flexible Guidance Input as the user hint, meaning our method can use a trimap, scribblemap or clickmap as guidance information, or even work without any guidance input. To achieve this, we propose a Progressive Trimap Deformation (PTD) scheme that gradually shrinks the foreground and background areas of the trimap as training proceeds, until it finally becomes a scribblemap. To make our network robust to any user scribble and click, we randomly sample points on foreground and background and perform curve fitting. Moreover, we propose a Semantic Fusion Module (SFM) which utilizes the Feature Pyramid Enhancement Module (FPEM) and Joint Pyramid Upsampling (JPU) in the matting task for the first time. The experiments show that our method can achieve state-of-the-art results compared with existing trimap-based and trimap-free methods.
【15】 Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization 标题:将视觉空间结构、语言结构和常识结构整合到故事可视化中 链接:https://arxiv.org/abs/2110.10834
作者:Adyasha Maharana,Mohit Bansal 机构:Department of Computer Science, University of North Carolina at Chapel Hill 备注:EMNLP 2021 (16 pages) 摘要:While much research has been done in text-to-image synthesis, little work has been done to explore the usage of linguistic structure of the input text. Such information is even more important for story visualization since its inputs have an explicit narrative structure that needs to be translated into an image sequence (or visual story). Prior work in this domain has shown that there is ample room for improvement in the generated image sequence in terms of visual quality, consistency and relevance. In this paper, we first explore the use of constituency parse trees using a Transformer-based recurrent architecture for encoding structured input. Second, we augment the structured input with commonsense information and study the impact of this external knowledge on the generation of visual story. Third, we also incorporate visual structure via bounding boxes and dense captioning to provide feedback about the characters/objects in generated images within a dual learning setup. We show that off-the-shelf dense-captioning models trained on Visual Genome can improve the spatial structure of images from a different target domain without needing fine-tuning. We train the model end-to-end using intra-story contrastive loss (between words and image sub-regions) and show significant improvements in several metrics (and human evaluation) for multiple datasets. Finally, we provide an analysis of the linguistic and visuo-spatial information. Code and data: https://github.com/adymaharana/VLCStoryGan.
【16】 AdamD: Improved bias-correction in Adam 标题:ADAMD:改进了ADAM中的偏差校正 链接:https://arxiv.org/abs/2110.10828
作者:John St John 机构:Ravel Biotechnology, rd St, Suite , San Francisco, CA , USA 备注:6 pages, 1 figure 摘要:Here I present a small update to the bias-correction term in the Adam optimizer that has the advantage of behaving well in the first several steps. The default implementation of Adam may be as sensitive to hyperparameters as it is partly because of the originally proposed bias-correction procedure and its behavior in the early steps of training.
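As a hedged sketch of the idea (based on our reading of the abstract, not the paper's exact formula): keep Adam's second-moment bias correction but leave the first moment uncorrected, so the effective step size starts small during the first several steps.

```python
import torch

@torch.no_grad()
def adamd_step(p, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One AdamD-style update (assumption-laden sketch, see the paper):
    bias-correct only the second moment; leaving the first moment
    uncorrected makes the very first steps conservatively small."""
    state["t"] = state.get("t", 0) + 1
    m = state.setdefault("m", torch.zeros_like(p))
    v = state.setdefault("v", torch.zeros_like(p))
    m.mul_(b1).add_(grad, alpha=1 - b1)
    v.mul_(b2).addcmul_(grad, grad, value=1 - b2)
    v_hat = v / (1 - b2 ** state["t"])                 # corrected second moment
    p.addcdiv_(m, v_hat.sqrt().add_(eps), value=-lr)   # m deliberately uncorrected

p, state = torch.ones(3), {}
adamd_step(p, torch.tensor([0.1, -0.2, 0.3]), state)
print(p)  # the first step is ~10x smaller than vanilla Adam's
```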
【17】 HALP: Hardware-Aware Latency Pruning 标题:HALP:硬件感知延迟修剪 链接:https://arxiv.org/abs/2110.10811
作者:Maying Shen,Hongxu Yin,Pavlo Molchanov,Lei Mao,Jianna Liu,Jose M. Alvarez 机构:NVIDIA 摘要:Structural pruning can simplify network architecture and improve inference speed. We propose Hardware-Aware Latency Pruning (HALP) that formulates structural pruning as a global resource allocation optimization problem, aiming at maximizing the accuracy while constraining latency under a predefined budget. For filter importance ranking, HALP leverages a latency lookup table to track latency reduction potential and a global saliency score to gauge accuracy drop. Both metrics can be evaluated very efficiently during pruning, allowing us to reformulate global structural pruning as a reward maximization problem under the target constraint. This makes the problem solvable via our augmented knapsack solver, enabling HALP to surpass prior work in pruning efficacy and accuracy-efficiency trade-off. We examine HALP on both classification and detection tasks, over varying networks, on ImageNet and VOC datasets. In particular, for ResNet-50/-101 pruning on ImageNet, HALP improves network throughput by $1.60\times$/$1.90\times$ with $+0.3\%$/$-0.2\%$ top-1 accuracy changes, respectively. For SSD pruning on VOC, HALP improves throughput by $1.94\times$ with only a $0.56$ mAP drop. HALP consistently outperforms prior art, sometimes by large margins.
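The resource-allocation view can be illustrated with a toy knapsack. The sketch below uses a plain greedy importance-per-latency heuristic, which is only a stand-in for the paper's augmented knapsack solver; the scores and latency costs are made-up inputs standing in for the saliency metric and lookup table.

```python
def greedy_latency_pruning(importance, latency, budget):
    """Select filters maximizing total saliency under a latency budget,
    a greedy sketch of the reward-maximization formulation described above
    (the paper uses an augmented knapsack solver, not plain greedy)."""
    order = sorted(range(len(importance)),
                   key=lambda i: importance[i] / max(latency[i], 1e-9),
                   reverse=True)
    kept, used = [], 0.0
    for i in order:
        if used + latency[i] <= budget:
            kept.append(i)
            used += latency[i]
    return sorted(kept)

imp = [0.9, 0.5, 0.4, 0.1]  # per-filter saliency scores (toy values)
lat = [2.0, 1.0, 1.0, 0.5]  # per-filter latency cost from a lookup table
print(greedy_latency_pruning(imp, lat, budget=2.5))  # [1, 2, 3]
```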
【18】 Hierarchical Skills for Efficient Exploration 标题:有效探索的分层技能 链接:https://arxiv.org/abs/2110.10809
作者:Jonas Gehring,Gabriel Synnaeve,Andreas Krause,Nicolas Usunier 机构:Facebook AI Research, ETH Zürich 备注:To appear in 35th Conference on Neural Information Processing Systems (NeurIPS 2021) 摘要:In reinforcement learning, pre-trained low-level skills have the potential to greatly facilitate exploration. However, prior knowledge of the downstream task is required to strike the right balance between generality (fine-grained control) and specificity (faster learning) in skill design. In previous work on continuous control, the sensitivity of methods to this trade-off has not been addressed explicitly, as locomotion provides a suitable prior for navigation tasks, which have been of foremost interest. In this work, we analyze this trade-off for low-level policy pre-training with a new benchmark suite of diverse, sparse-reward tasks for bipedal robots. We alleviate the need for prior knowledge by proposing a hierarchical skill learning framework that acquires skills of varying complexity in an unsupervised manner. For utilization on downstream tasks, we present a three-layered hierarchical learning algorithm to automatically trade off between general and specific skills as required by the respective task. In our experiments, we show that our approach performs this trade-off effectively and achieves better results than current state-of-the-art methods for end-to-end hierarchical reinforcement learning and unsupervised skill discovery. Code and videos are available at https://facebookresearch.github.io/hsd3.
【19】 Propensity-scored Probabilistic Label Trees 标题:倾向评分概率标签树 链接:https://arxiv.org/abs/2110.10803
作者:Marek Wydmuch,Kalina Jasinska-Kobus,Rohit Babbar,Krzysztof Dembczyński 机构:Poznan University of Technology, Poznan, Poland, ML Research at Allegro.pl, Aalto University, Helsinki, Finland, Yahoo! Research, New York, USA 备注:The extended version of SIGIR '21 Short Research Paper 摘要:Extreme multi-label classification (XMLC) refers to the task of tagging instances with small subsets of relevant labels coming from an extremely large set of all possible labels. Recently, XMLC has been widely applied to diverse web applications such as automatic content labeling, online advertising, or recommendation systems. In such environments, label distribution is often highly imbalanced, consisting mostly of very rare tail labels, and relevant labels can be missing. As a remedy to these problems, the propensity model has been introduced and applied within several XMLC algorithms. In this work, we focus on the problem of optimal predictions under this model for probabilistic label trees, a popular approach for XMLC problems. We introduce an inference procedure, based on the $A^*$-search algorithm, that efficiently finds the optimal solution, assuming that all probabilities and propensities are known. We demonstrate the attractiveness of this approach in a wide empirical study on popular XMLC benchmark datasets.
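The $A^*$-style inference exploits the fact that path probabilities in a label tree can only shrink along a root-to-leaf path, so a best-first traversal returns exact top-k leaves. Below is a minimal sketch of that traversal; it ignores propensities, which the paper folds into the priority, and the dict-based tree layout is an illustrative assumption.

```python
import heapq

def plt_topk(root, k):
    """Return the k most probable leaves of a probabilistic label tree via
    best-first search: edge probabilities are <= 1, so path probability never
    increases and the first k leaves popped are the exact top-k.
    Nodes are dicts: {"label": ..., "children": [(prob, node), ...]}."""
    heap = [(-1.0, 0, root)]  # (negative path probability, tiebreak id, node)
    out, uid = [], 1
    while heap and len(out) < k:
        neg_p, _, node = heapq.heappop(heap)
        if not node["children"]:
            out.append((node["label"], -neg_p))
            continue
        for prob, child in node["children"]:
            heapq.heappush(heap, (neg_p * prob, uid, child))
            uid += 1
    return out

leaf = lambda l: {"label": l, "children": []}
tree = {"label": None, "children": [
    (0.7, {"label": None, "children": [(0.9, leaf("A")), (0.1, leaf("B"))]}),
    (0.3, leaf("C"))]}
print(plt_topk(tree, 2))  # [('A', 0.63), ('C', 0.3)]
```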
【20】 OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems 标题:OMB-Py:在HPC系统上评估MPI库性能的Python微基准测试 链接:https://arxiv.org/abs/2110.10659
作者:Nawras Alnaasan,Arpan Jain,Aamir Shafi,Hari Subramoni,Dhabaleswar K Panda 机构:Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio, USA 摘要:Python has become a dominant programming language for emerging areas like Machine Learning (ML), Deep Learning (DL), and Data Science (DS). An attractive feature of Python is that it provides easy-to-use programming interface while allowing library developers to enhance performance of their applications by harnessing the computing power offered by High Performance Computing (HPC) platforms. Efficient communication is key to scaling applications on parallel systems, which is typically enabled by the Message Passing Interface (MPI) standard and compliant libraries on HPC hardware. mpi4py is a Python-based communication library that provides an MPI-like interface for Python applications allowing application developers to utilize parallel processing elements including GPUs. However, there is currently no benchmark suite to evaluate communication performance of mpi4py -- and Python MPI codes in general -- on modern HPC systems. In order to bridge this gap, we propose OMB-Py -- Python extensions to the open-source OSU Micro-Benchmark (OMB) suite -- aimed to evaluate communication performance of MPI-based parallel applications in Python. To the best of our knowledge, OMB-Py is the first communication benchmark suite for parallel Python applications. OMB-Py consists of a variety of point-to-point and collective communication benchmark tests that are implemented for a range of popular Python libraries including NumPy, CuPy, Numba, and PyCUDA. We also provide Python implementation for several distributed ML algorithms as benchmarks to understand the potential gain in performance for ML/DL workloads. Our evaluation reveals that mpi4py introduces a small overhead when compared to native MPI libraries. We also evaluate the ML/DL workloads and report up to 106x speedup on 224 CPU cores compared to sequential execution. We plan to publicly release OMB-Py to benefit Python HPC community.
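For readers unfamiliar with mpi4py, a point-to-point latency test of the kind OMB-Py standardizes can be written in a few lines. This is an illustrative ping-pong sketch, not OMB-Py code.

```python
# Minimal mpi4py ping-pong latency test (illustrative only);
# run with: mpirun -np 2 python pingpong.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
buf = np.zeros(1024, dtype=np.uint8)  # 1 KiB message
iters, t0 = 1000, MPI.Wtime()
for _ in range(iters):
    if rank == 0:
        comm.Send(buf, dest=1); comm.Recv(buf, source=1)
    else:
        comm.Recv(buf, source=0); comm.Send(buf, dest=0)
if rank == 0:
    print(f"avg one-way latency: {(MPI.Wtime() - t0) / (2 * iters) * 1e6:.2f} us")
```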
【21】 Independent Natural Policy Gradient Always Converges in Markov Potential Games 标题:马尔可夫势对策中独立的自然政策梯度总是收敛的 链接:https://arxiv.org/abs/2110.10614
作者:Roy Fox,Stephen McAleer,Will Overman,Ioannis Panageas 机构:University of California, Irvine 备注:24 pages 摘要:Multi-agent reinforcement learning has been successfully applied to fully-cooperative and fully-competitive environments, but little is currently known about mixed cooperative/competitive environments. In this paper, we focus on a particular class of multi-agent mixed cooperative/competitive stochastic games called Markov Potential Games (MPGs), which include cooperative games as a special case. Recent results have shown that independent policy gradient converges in MPGs but it was not known whether Independent Natural Policy Gradient converges in MPGs as well. We prove that Independent Natural Policy Gradient always converges in the last iterate using constant learning rates. The proof deviates from the existing approaches and the main challenge lies in the fact that Markov Potential Games do not have unique optimal values (as single-agent settings exhibit) so different initializations can lead to different limit point values. We complement our theoretical results with experiments that indicate that Natural Policy Gradient outperforms Policy Gradient in routing games and congestion games.
【22】 Time-Domain Mapping Based Single-Channel Speech Separation With Hierarchical Constraint Training 标题:基于时域映射的分层约束训练单通道语音分离 链接:https://arxiv.org/abs/2110.10593
作者:Chenyang Gao,Yue Gu,Ivan Marsic 机构:Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ, USA 摘要:Single-channel speech separation is required for multi-speaker speech recognition. Recent deep learning-based approaches have focused on the time-domain audio separation network (TasNet) because it has superior performance and lower latency compared to conventional time-frequency-based (T-F-based) approaches. Most of these works rely on the masking-based method that estimates a linear mapping function (mask) for each speaker. However, the other commonly used method, the mapping-based method, which is less sensitive to SNR variations, is inadequately studied in the time domain. We explore the potential of the mapping-based method by introducing attention augmented DPRNN (AttnAugDPRNN), which directly approximates the clean sources from the mixture for speech separation. Permutation Invariant Training (PIT) has been a paradigm to solve the label ambiguity problem for speech separation but usually leads to suboptimal performance. To solve this problem, we propose an efficient training strategy called Hierarchical Constraint Training (HCT) to regularize the training, which could effectively improve the model performance. When using PIT, our results showed that mapping-based AttnAugDPRNN outperformed masking-based AttnAugDPRNN when the training corpus is large. Mapping-based AttnAugDPRNN with HCT significantly improved the SI-SDR by 10.1% compared to the masking-based AttnAugDPRNN without HCT.
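PIT, the training paradigm mentioned above, is compact enough to sketch. Below is a hedged PyTorch implementation of a permutation-invariant negative SI-SNR loss; it enumerates permutations, which is fine for two or three speakers, and the shapes are illustrative.

```python
import itertools
import torch

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB, computed over the last (time) axis."""
    ref = ref - ref.mean(-1, keepdim=True)
    est = est - est.mean(-1, keepdim=True)
    proj = (est * ref).sum(-1, keepdim=True) * ref / (ref.pow(2).sum(-1, keepdim=True) + eps)
    noise = est - proj
    return 10 * torch.log10(proj.pow(2).sum(-1) / (noise.pow(2).sum(-1) + eps))

def pit_loss(est, ref):
    """Permutation-invariant training: negative SI-SNR under the best
    assignment of estimated to reference speakers; shapes (batch, n_spk, time)."""
    perms = itertools.permutations(range(est.shape[1]))
    losses = [-si_snr(est[:, list(p)], ref).mean(-1) for p in perms]  # each (batch,)
    return torch.stack(losses, dim=-1).min(-1).values.mean()

est, ref = torch.randn(2, 2, 8000), torch.randn(2, 2, 8000)
print(pit_loss(est, ref))
```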
【23】 Why Settle for Just One? Extending EL++ Ontology Embeddings with Many-to-Many Relationships 标题:为什么只满足于一个呢?用多对多关系扩展EL++本体嵌入 链接:https://arxiv.org/abs/2110.10555
作者:Biswesh Mohapatra,Sumit Bhatia,Raghava Mutharaju,G. Srinivasaraghavan 机构:International Institute of Information Technology, Bangalore, India, Adobe Research, Indraprastha Institute of Information Technology, Delhi, India 备注:Accepted to the SemREC challenge at ISWC 2021 摘要:Knowledge Graph (KG) embeddings provide a low-dimensional representation of entities and relations of a Knowledge Graph and are used successfully for various applications such as question answering and search, reasoning, inference, and missing link prediction. However, most of the existing KG embeddings only consider the network structure of the graph and ignore the semantics and the characteristics of the underlying ontology that provides crucial information about relationships between entities in the KG. Recent efforts in this direction involve learning embeddings for a Description Logic (the logical underpinning for ontologies) named EL++. However, such methods consider all the relations defined in the ontology to be one-to-one, which severely limits their performance and applications. We provide a simple and effective solution to overcome this shortcoming that allows such methods to consider many-to-many relationships while learning embedding representations. Experiments conducted using three different EL++ ontologies show substantial performance improvement over five baselines. Our proposed solution also paves the way for learning embedding representations for even more expressive description logics such as SROIQ.
【24】 Trash or Treasure? An Interactive Dual-Stream Strategy for Single Image Reflection Separation 标题:垃圾还是珍宝?一种交互式双流单幅图像反射分离策略 链接:https://arxiv.org/abs/2110.10546
作者:Qiming Hu,Xiaojie Guo 机构:College of Intelligence and Computing, Tianjin University, Tianjin, China 备注:Accepted to NeurIPS2021 摘要:Single image reflection separation (SIRS), as a representative blind source separation task, aims to recover two layers, \textit{i.e.}, transmission and reflection, from one mixed observation, which is challenging due to the highly ill-posed nature. Existing deep learning based solutions typically restore the target layers individually, or with some concerns at the end of the output, barely taking into account the interaction across the two streams/branches. In order to utilize information more efficiently, this work presents a general yet simple interactive strategy, namely \textit{your trash is my treasure} (YTMT), for constructing dual-stream decomposition networks. To be specific, we explicitly enforce the two streams to communicate with each other block-wisely. Inspired by the additive property between the two components, the interactive path can be easily built via transferring, instead of discarding, deactivated information by the ReLU rectifier from one stream to the other. Both ablation studies and experimental results on widely-used SIRS datasets are conducted to demonstrate the efficacy of YTMT, and reveal its superiority over other state-of-the-art alternatives. The implementation is quite simple and our code is publicly available at https://github.com/mingcv/YTMT-Strategy.
【25】 Statistical and Topological Properties of Gaussian Smoothed Sliced Probability Divergences 标题:高斯平滑切片概率发散的统计和拓扑性质 链接:https://arxiv.org/abs/2110.10524
作者:Alain Rakotomamonjy,Mokhtar Z. Alaya,Maxime Berar,Gilles Gasso 机构:Criteo AI Lab, Paris, LMAC, Université Technologique de Compiègne, LITIS, Université de Rouen, LITIS, INSA de Rouen 摘要:Gaussian smoothed sliced Wasserstein distance has recently been introduced for comparing probability distributions while preserving the privacy of the data. It has been shown, in applications such as domain adaptation, to provide performances similar to its non-private (non-smoothed) counterpart. However, the computational and statistical properties of such a metric have not yet been well established. In this paper, we analyze the theoretical properties of this distance as well as those of generalized versions denoted as Gaussian smoothed sliced divergences. We show that smoothing and slicing preserve the metric property and the weak topology. We also provide results on the sample complexity of such divergences. Since the privacy level depends on the amount of Gaussian smoothing, we analyze the impact of this parameter on the divergence. We support our theoretical findings with empirical studies of the Gaussian smoothed and sliced versions of the Wasserstein distance, the Sinkhorn divergence and the maximum mean discrepancy (MMD). In the context of privacy-preserving domain adaptation, we confirm that these Gaussian smoothed sliced Wasserstein and MMD divergences perform very well while ensuring data privacy.
【26】 Periodic DMP formulation for Quaternion Trajectories 标题:四元数轨道的周期DMP公式 链接:https://arxiv.org/abs/2110.10510
作者:Fares J. Abu-Dakka,Matteo Saveriano,Luka Peternel 备注:2021 20th International Conference on Advanced Robotics (ICAR) 摘要:Imitation learning techniques have been used as a way to transfer skills to robots. Among them, dynamic movement primitives (DMPs) have been widely exploited as an effective and efficient technique to learn and reproduce complex discrete and periodic skills. While DMPs have been properly formulated for learning point-to-point movements in both translation and orientation, periodic DMPs still lack a formulation for learning orientation. To address this gap, we propose a novel DMP formulation that enables encoding of periodic orientation trajectories. Within this formulation we develop two approaches: a Riemannian metric-based projection approach and a unit quaternion based periodic DMP. Both formulations exploit unit quaternions to represent the orientation. However, the first exploits the properties of Riemannian manifolds to work in the tangent space of the unit sphere, while the second directly encodes the unit quaternion trajectory while guaranteeing the unitary norm of the generated quaternions. We validated the technical aspects of the proposed methods in simulation. Then we performed experiments on a real robot to execute daily tasks that involve periodic orientation changes (i.e., surface polishing/wiping and liquid mixing by shaking).
【27】 Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation 标题:高维权重更新超参数隐式微分法的可伸缩单程优化 链接:https://arxiv.org/abs/2110.10461
作者:Ross M. Clarke,Elre T. Oldewage,José Miguel Hernández-Lobato 机构:University of Cambridge, Alan Turing Institute 备注:34 pages, 18 figures, 13 tables 摘要:Machine learning training methods depend plentifully and intricately on hyperparameters, motivating automated strategies for their optimisation. Many existing algorithms restart training for each new hyperparameter choice, at considerable computational cost. Some hypergradient-based one-pass methods exist, but these either cannot be applied to arbitrary optimiser hyperparameters (such as learning rates and momenta) or take several times longer to train than their base models. We extend these existing methods to develop an approximate hypergradient-based hyperparameter optimiser which is applicable to any continuous hyperparameter appearing in a differentiable model weight update, yet requires only one training episode, with no restarts. We also provide a motivating argument for convergence to the true hypergradient, and perform tractable gradient-based optimisation of independent learning rates for each model parameter. Our method performs competitively from varied random hyperparameter initialisations on several UCI datasets and Fashion-MNIST (using a one-layer MLP), Penn Treebank (using an LSTM) and CIFAR-10 (using a ResNet-18), in time only 2-3x greater than vanilla training.
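To make the hypergradient idea concrete: differentiating the loss with respect to the learning rate yields the inner product of successive gradients, giving a one-pass update rule with no restarts. The sketch below follows the classic Baydin et al.-style rule and is only in the spirit of this paper, whose method generalizes to arbitrary continuous optimizer hyperparameters.

```python
import numpy as np

def train_with_hypergradient(grad_fn, w, lr=0.01, beta=1e-4, steps=100):
    """Gradient descent where the learning rate is itself adapted online via
    the hypergradient rule lr <- lr + beta * <g_t, g_{t-1}> (one pass)."""
    g_prev = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        lr += beta * float(g @ g_prev)  # hypergradient step on lr
        w = w - lr * g
        g_prev = g
    return w, lr

# Toy quadratic f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w, lr = train_with_hypergradient(lambda w: w.copy(), np.ones(5))
print(np.linalg.norm(w), lr)  # the norm shrinks while lr adapts upward
```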
【28】 Reconstruction of Fragmented Trajectories of Collective Motion using Hadamard Deep Autoencoders 标题:利用Hadamard深度自动编码器重建集体运动的碎片化轨迹 链接:https://arxiv.org/abs/2110.10428
作者:Kelum Gajamannage,Yonggi Park,Randy Paffenroth,Anura P. Jayasumana 机构:Department of Mathematics and Statistics, Texas A&M University–Corpus Christi, Corpus Christi, TX–, USA., Department of Mathematical Sciences, Department of Computer Science, Data Science Program, Worcester Polytechnic, Institute, Worcester, MA–, USA. 备注:21 Pages, 5 figures, submitted to Pattern Recognition 摘要:Learning the dynamics of collectively moving agents such as fish or humans is an active field of research. Due to natural phenomena such as occlusion and changes of illumination, multi-object methods tracking such dynamics might lose track of the agents, which can result in fragmentation of the constructed trajectories. Here, we present an extended deep autoencoder (DA) that we train only on fully observed segments of the trajectories by defining its loss function as the Hadamard product of a binary indicator matrix with the absolute difference between the outputs and the labels. The trajectories of agents practicing collective motion are low-rank due to mutual interactions and dependencies between the agents, and we utilize this as the underlying pattern that our Hadamard deep autoencoder (HDA) encodes during its training. The performance of our HDA is compared with that of a low-rank matrix completion scheme in the context of fragmented trajectory reconstruction.
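The masked loss described above is essentially a one-liner. A hedged PyTorch sketch (the shapes and the mean-style reduction are illustrative choices):

```python
import torch

def hadamard_masked_loss(output, label, mask):
    """Reconstruction loss restricted to observed entries: the Hadamard
    (elementwise) product of a binary indicator mask with |output - label|.
    Missing entries (mask == 0) contribute nothing, so the autoencoder
    trains only on fully observed segments."""
    return (mask * (output - label).abs()).sum() / mask.sum().clamp(min=1)

label = torch.randn(4, 100)                 # toy trajectories (agents x time)
mask = (torch.rand(4, 100) > 0.3).float()   # 1 where the tracker observed
output = torch.randn(4, 100, requires_grad=True)
hadamard_masked_loss(output, label, mask).backward()
print((output.grad[mask == 0] == 0).all())  # True: no gradient on gaps
```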
【29】 Robust lEarned Shrinkage-Thresholding (REST): Robust unrolling for sparse recovery 标题:稳健的学习收缩阈值(REST):用于稀疏恢复的稳健展开 链接:https://arxiv.org/abs/2110.10391
作者:Wei Pu,Chao Zhou,Yonina C. Eldar,Miguel R. D. Rodrigues 机构: Rodrigues are with the Department of Electronic and Electrical Engineering, University CollegeLondon, Eldar is with the Weizmann Institute of Science 摘要:In this paper, we consider deep neural networks for solving inverse problems that are robust to forward model mis-specifications. Specifically, we treat sensing problems with model mismatch where one wishes to recover a sparse high-dimensional vector from low-dimensional observations subject to uncertainty in the measurement operator. We then design a new robust deep neural network architecture by applying algorithm unfolding techniques to a robust version of the underlying recovery problem. Our proposed network - named Robust lEarned Shrinkage-Thresholding (REST) - exhibits an additional normalization processing compared to Learned Iterative Shrinkage-Thresholding Algorithm (LISTA), leading to reliable recovery of the signal under sample-wise varying model mismatch. The proposed REST network is shown to outperform state-of-the-art model-based and data-driven algorithms in both compressive sensing and radar imaging problems wherein model mismatch is taken into consideration.
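REST builds on the unrolled ISTA/LISTA backbone, which is worth seeing concretely. The NumPy sketch below shows one learned-shrinkage layer and applies ISTA-initialized weights to a toy sparse recovery problem; REST's additional normalization step is omitted, so this is only the shared backbone, with an illustrative sparsity parameter.

```python
import numpy as np

def soft_threshold(x, theta):
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def lista_layer(x, y, W_e, W_g, theta):
    """One unrolled shrinkage-thresholding iteration (LISTA-style): a linear
    step followed by soft-thresholding. In LISTA, W_e, W_g and theta are
    learned; here they are fixed to their ISTA initializations."""
    return soft_threshold(W_e @ y + W_g @ x, theta)

# Toy: recover a 3-sparse x (dim 50) from y = A x (dim 20).
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 50)) / np.sqrt(20)
x_true = np.zeros(50); x_true[[3, 17, 40]] = [1.0, -0.5, 2.0]
y = A @ x_true
L = np.linalg.norm(A, 2) ** 2                 # Lipschitz constant of A^T A
W_e, W_g = A.T / L, np.eye(50) - A.T @ A / L  # ISTA-initialized weights
x = np.zeros(50)
for _ in range(100):
    x = lista_layer(x, y, W_e, W_g, theta=0.01)
print(np.round(x[[3, 17, 40]], 2))            # approximately recovers the spikes
```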
【30】 Frontiers in Evolutionary Computation: A Workshop Report 标题:进化计算前沿:研讨会报告 链接:https://arxiv.org/abs/2110.10320
作者:Tyler Millhouse,Melanie Moses,Melanie Mitchell 机构:Santa Fe Institute, University of New Mexico 摘要:In July of 2021, the Santa Fe Institute hosted a workshop on evolutionary computation as part of its Foundations of Intelligence in Natural and Artificial Systems project. This project seeks to advance the field of artificial intelligence by promoting interdisciplinary research on the nature of intelligence. The workshop brought together computer scientists and biologists to share their insights about the nature of evolution and the future of evolutionary computation. In this report, we summarize each of the talks and the subsequent discussions. We also draw out a number of key themes and identify important frontiers for future research.
【31】 LMSOC: An Approach for Socially Sensitive Pretraining 标题:LMSOC:一种对社会敏感的预训练方法 链接:https://arxiv.org/abs/2110.10319
作者:Vivek Kulkarni,Shubhanshu Mishra,Aria Haghighi 机构:Twitter Cortex 备注:Camera ready version. Accepted to EMNLP 2021 Findings. Code for reproducing the experiments can be found at: this https URL 摘要:While large-scale pretrained language models have been shown to learn effective linguistic representations for many NLP tasks, there remain many real-world contextual aspects of language that current approaches do not capture. For instance, consider a cloze-test "I enjoyed the ____ game this weekend": the correct answer depends heavily on where the speaker is from, when the utterance occurred, and the speaker's broader social milieu and preferences. Although language depends heavily on the geographical, temporal, and other social contexts of the speaker, these elements have not been incorporated into modern transformer-based language models. We propose a simple but effective approach to incorporate speaker social context into the learned representations of large-scale language models. Our method first learns dense representations of social contexts using graph representation learning algorithms and then primes language model pretraining with these social context representations. We evaluate our approach on geographically-sensitive language-modeling tasks and show a substantial improvement (more than 100% relative lift on MRR) compared to baselines.
【32】 Neural Stochastic Partial Differential Equations 标题:神经随机偏微分方程 链接:https://arxiv.org/abs/2110.10249
作者:Cristopher Salvi,Maud Lemercier 摘要:Stochastic partial differential equations (SPDEs) are the mathematical tool of choice to model complex spatio-temporal dynamics of systems subject to the influence of randomness. We introduce the Neural SPDE model providing an extension to two important classes of physics-inspired neural architectures. On the one hand, it extends all the popular neural -- ordinary, controlled, stochastic, rough -- differential equation models in that it is capable of processing incoming information even when the latter evolves in an infinite dimensional state space. On the other hand, it extends Neural Operators -- recent generalizations of neural networks modelling mappings between functional spaces -- in that it can be used to learn complex SPDE solution operators $(u_0,\xi) \mapsto u$ depending simultaneously on an initial condition $u_0$ and on a stochastic forcing term $\xi$, while remaining resolution-invariant and equation-agnostic. A Neural SPDE is constrained to respect real physical dynamics and consequently requires only a modest amount of data to train, depends on a significantly smaller amount of parameters and has better generalization properties compared to Neural Operators. Through various experiments on semilinear SPDEs with additive and multiplicative noise (including the stochastic Navier-Stokes equations) we demonstrate how Neural SPDEs can flexibly be used in a supervised learning setting as well as conditional generative models to sample solutions of SPDEs conditioned on prior knowledge, systematically achieving in both cases better performance than all alternative models.
【33】 The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding 标题:CORA张量编译器:最小填充的粗糙张量编译 链接:https://arxiv.org/abs/2110.10221
作者:Pratik Fegade,Tianqi Chen,Phillip B. Gibbons,Todd C. Mowry 机构:Carnegie Mellon University 备注:23 pages, 25 figures and 10 tables 摘要:There is often variation in the shape and size of input data used for deep learning. In many cases, such data can be represented using tensors with non-uniform shapes, or ragged tensors. Due to limited and non-portable support for efficient execution on ragged tensors, current deep learning frameworks generally use techniques such as padding and masking to make the data shapes uniform and then offload the computations to optimized kernels for dense tensor algebra. Such techniques can, however, lead to a lot of wasted computation and therefore, a loss in performance. This paper presents CoRa, a tensor compiler that allows users to easily generate efficient code for ragged tensor operators targeting a wide range of CPUs and GPUs. Evaluating CoRa on a variety of operators on ragged tensors as well as on an encoder layer of the transformer model, we find that CoRa (i) performs competitively with hand-optimized implementations of the operators and the transformer encoder and (ii) achieves, over PyTorch, a 1.6X geomean speedup for the encoder on an Nvidia GPU and a 1.86X geomean speedup for the multi-head attention module used in transformers on an ARM CPU.
【34】 Identifying Stroke Indicators Using Rough Sets 标题:基于粗糙集的笔画指标识别 链接:https://arxiv.org/abs/2110.10152
作者:Muhammad Salman Pathan,Jianbiao Zhang,Deepu John,Avishek Nag,Soumyabrata Dev 机构:Beijing Key Laboratory of Trusted Computing, Beijing University of Technology, Beijing, China, University College Dublin, Dublin, Ireland, ADAPT SFI Research Centre, Dublin, Ireland 备注:Accepted in IEEE Access, 2020 摘要:Stroke is widely considered the second most common cause of mortality. The adverse consequences of stroke have led to global interest in improving the management and diagnosis of stroke. Various data mining techniques have been used globally for accurate prediction of the occurrence of stroke based on the risk factors associated with the electronic health care records (EHRs) of patients. In particular, EHRs routinely contain several thousands of features, and most of them are redundant or irrelevant and need to be discarded to enhance prediction accuracy. The choice of feature-selection methods can help in improving the prediction accuracy of the model and efficient data management of the archived input features. In this paper, we systematically analyze the various features in EHR records for the detection of stroke. We propose a novel rough-set based technique for ranking the importance of the various EHR records in detecting stroke. Unlike conventional rough-set techniques, our proposed technique can be applied on any dataset that comprises binary feature sets. We evaluated our proposed method on a publicly available EHR dataset, and concluded that age, average glucose level, heart disease, and hypertension were the most essential attributes for detecting stroke in patients. Furthermore, we benchmarked the proposed technique against other popular feature-selection techniques and obtained the best performance in ranking the importance of individual features in detecting stroke.
【35】 Likelihood Training of Schrödinger Bridge using Forward-Backward SDEs Theory 标题:基于前向后向SDES理论的薛定谔桥似然训练 链接:https://arxiv.org/abs/2110.11291
作者:Tianrong Chen,Guan-Horng Liu,Evangelos A. Theodorou 机构:Georgia Institute of Technology, USA 摘要:Schrödinger Bridge (SB) is an optimal transport problem that has received increasing attention in deep generative modeling for its mathematical flexibility compared to the Score-based Generative Model (SGM). However, it remains unclear whether the optimization principle of SB relates to the modern training of deep generative models, which often rely on constructing parameterized log-likelihood objectives. This raises questions on the suitability of SB models as a principled alternative for generative applications. In this work, we present a novel computational framework for likelihood training of SB models grounded on Forward-Backward Stochastic Differential Equations Theory -- a mathematical methodology appearing in stochastic optimal control that transforms the optimality condition of SB into a set of SDEs. Crucially, these SDEs can be used to construct the likelihood objectives for SB that, surprisingly, generalize the ones for SGM as special cases. This leads to a new optimization principle that inherits the same SB optimality yet without losing applications of modern generative training techniques, and we show that the resulting training algorithm achieves comparable results on generating realistic images on MNIST, CelebA, and CIFAR10.
【36】 User-friendly introduction to PAC-Bayes bounds 标题:PAC-Bayes界的用户友好介绍 链接:https://arxiv.org/abs/2110.11216
作者:Pierre Alquier 机构:RIKEN AIP, Tokyo, Japan 摘要:Aggregated predictors are obtained by making a set of basic predictors vote according to some weights, that is, to some probability distribution. Randomized predictors are obtained by sampling in a set of basic predictors, according to some prescribed probability distribution. Thus, aggregated and randomized predictors have in common that they are not defined by a minimization problem, but by a probability distribution on the set of predictors. In statistical learning theory, there is a set of tools designed to understand the generalization ability of such procedures: PAC-Bayesian or PAC-Bayes bounds. Since the original PAC-Bayes bounds of McAllester, these tools have been considerably improved in many directions (we will for example describe a simplified version of the localization technique of Catoni that was missed by the community, and later rediscovered as "mutual information bounds"). Very recently, PAC-Bayes bounds received considerable attention: for example, there was a workshop on PAC-Bayes at NIPS 2017, "(Almost) 50 Shades of Bayesian Learning: PAC-Bayesian trends and insights", organized by B. Guedj, F. Bach and P. Germain. One of the reasons for this recent success is the successful application of these bounds to neural networks by Dziugaite and Roy. An elementary introduction to PAC-Bayes theory is still missing. This is an attempt to provide such an introduction.
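As a concrete instance of the kind of bound discussed, here is McAllester's classical PAC-Bayes inequality evaluated numerically; the constants follow the standard textbook form, and the inputs are toy numbers.

```python
import numpy as np

def mcallester_bound(emp_risk, kl, n, delta=0.05):
    """McAllester-style PAC-Bayes bound: with probability >= 1 - delta over
    the sample, the randomized predictor drawn from rho satisfies
      E_rho[R] <= E_rho[r] + sqrt((KL(rho || pi) + ln(2 sqrt(n)/delta)) / (2n))
    where r is the empirical risk and pi is the prior."""
    return emp_risk + np.sqrt((kl + np.log(2 * np.sqrt(n) / delta)) / (2 * n))

# E.g. empirical risk 0.05, KL(rho || pi) = 10 nats, n = 10000 samples:
print(round(mcallester_bound(0.05, kl=10.0, n=10_000), 4))  # ~0.08
```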
【37】 Data splitting improves statistical performance in overparametrized regimes 标题:数据拆分提高了过度参数化状态下的统计性能 链接:https://arxiv.org/abs/2110.10956
作者:Nicole Mücke,Enrico Reiss,Jonas Rungenhagen,Markus Klein 机构:University of Potsdam 摘要:While large training datasets generally offer improvement in model performance, the training process becomes computationally expensive and time consuming. Distributed learning is a common strategy to reduce the overall training time by exploiting multiple computing devices. Recently, it has been observed in the single machine setting that overparametrization is essential for benign overfitting in ridgeless regression in Hilbert spaces. We show that in this regime, data splitting has a regularizing effect, hence improving statistical performance and computational complexity at the same time. We further provide a unified framework that allows to analyze both the finite and infinite dimensional setting. We numerically demonstrate the effect of different model parameters.
【38】 The R package sentometrics to compute, aggregate and predict with textual sentiment 标题:用文本情感计算、聚集和预测R包语句计量学 链接:https://arxiv.org/abs/2110.10817
作者:David Ardia,Keven Bluteau,Samuel Borms,Kris Boudt 机构:HEC Montréal, GERAD, Université de Sherbrooke, University of Neuchâtel, Vrije Universiteit Brussel, Ghent University, Vrije Universiteit Amsterdam, Journal of Statistical Software,- 备注:None 摘要:We provide a hands-on introduction to optimized textual sentiment indexation using the R package sentometrics. Textual sentiment analysis is increasingly used to unlock the potential information value of textual data. The sentometrics package implements an intuitive framework to efficiently compute sentiment scores of numerous texts, to aggregate the scores into multiple time series, and to use these time series to predict other variables. The workflow of the package is illustrated with a built-in corpus of news articles from two major U.S. journals to forecast the CBOE Volatility Index.
【39】 REAL-M: Towards Speech Separation on Real Mixtures 标题:REAL-M:迈向真实混合上的语音分离 链接:https://arxiv.org/abs/2110.10812
作者:Cem Subakan,Mirco Ravanelli,Samuele Cornell,François Grondin 机构:Universit´e de Sherbrooke, Canada,Mila-Quebec AI Institute, Canada, Italy 备注:Submitted to ICASSP 2022 摘要:In recent years, deep learning based source separation has achieved impressive results. Most studies, however, still evaluate separation models on synthetic datasets, while the performance of state-of-the-art techniques on in-the-wild speech data remains an open question. This paper contributes to fill this gap in two ways. First, we release the REAL-M dataset, a crowd-sourced corpus of real-life mixtures. Secondly, we address the problem of performance evaluation of real-life mixtures, where the ground truth is not available. We bypass this issue by carefully designing a blind Scale-Invariant Signal-to-Noise Ratio (SI-SNR) neural estimator. Through a user study, we show that our estimator reliably evaluates the separation performance on real mixtures. The performance predictions of the SI-SNR estimator indeed correlate well with human opinions. Moreover, we observe that the performance trends predicted by our estimator on the REAL-M dataset closely follow those achieved on synthetic benchmarks when evaluating popular speech separation models.
【40】 Identifiable Variational Autoencoders via Sparse Decoding 标题:基于稀疏解码的可识别变分自动编码器 链接:https://arxiv.org/abs/2110.10804
作者:Gemma E. Moran,Dhanya Sridhar,Yixin Wang,David M. Blei 机构:Data Science Institute, Columbia University, Department of Statistics, University of Michigan, Department of Statistics, Columbia University, Department of Computer Science, Columbia University 摘要:We develop the Sparse VAE, a deep generative model for unsupervised representation learning on high-dimensional data. Given a dataset of observations, the Sparse VAE learns a set of latent factors that captures its distribution. The model is sparse in the sense that each feature of the dataset (i.e., each dimension) depends on a small subset of the latent factors. As examples, in ratings data each movie is only described by a few genres; in text data each word is only applicable to a few topics; in genomics, each gene is active in only a few biological processes. We first show that the Sparse VAE is identifiable: given data drawn from the model, there exists a uniquely optimal set of factors. (In contrast, most VAE-based models are not identifiable.) The key assumption behind Sparse-VAE identifiability is the existence of "anchor features", where for each factor there exists a feature that depends only on that factor. Importantly, the anchor features do not need to be known in advance. We then show how to fit the Sparse VAE with variational EM. Finally, we empirically study the Sparse VAE with both simulated and real data. We find that it recovers meaningful latent factors and has smaller heldout reconstruction error than related methods.
【41】 Pick-and-Mix Information Operators for Probabilistic ODE Solvers 标题:概率ODE求解器的拾取混合信息算子 链接:https://arxiv.org/abs/2110.10770
作者:Nathanael Bosch,Filip Tronarp,Philipp Hennig 机构:University of Tübingen, Max Planck Institute for Intelligent Systems, Tübingen, Germany 备注:13 pages, 7 figures 摘要:Probabilistic numerical solvers for ordinary differential equations compute posterior distributions over the solution of an initial value problem via Bayesian inference. In this paper, we leverage their probabilistic formulation to seamlessly include additional information as general likelihood terms. We show that second-order differential equations should be directly provided to the solver, instead of transforming the problem to first order. Additionally, by including higher-order information or physical conservation laws in the model, solutions become more accurate and more physically meaningful. Lastly, we demonstrate the utility of flexible information operators by solving differential-algebraic equations. In conclusion, the probabilistic formulation of numerical solvers offers a flexible way to incorporate various types of information, thus improving the resulting solutions.
【42】 Factorization Approach for Low-complexity Matrix Completion Problems: Exponential Number of Spurious Solutions and Failure of Gradient Methods 标题:低复杂度矩阵补全问题的因式分解方法:伪解的指数数和梯度法的失效 链接:https://arxiv.org/abs/2110.10279
作者:Baturalp Yalcin,Haixiang Zhang,Javad Lavaei,Somayeh Sojoudi 机构:UC Berkeley 备注:21 pages, 1 figure 摘要:It is well-known that the Burer-Monteiro (B-M) factorization approach can efficiently solve low-rank matrix optimization problems under the RIP condition. It is natural to ask whether B-M factorization-based methods can succeed on any low-rank matrix optimization problems with a low information-theoretic complexity, i.e., polynomial-time solvable problems that have a unique solution. In this work, we provide a negative answer to the above question. We investigate the landscape of B-M factorized polynomial-time solvable matrix completion (MC) problems, which are the most popular subclass of low-rank matrix optimization problems without the RIP condition. We construct an instance of polynomial-time solvable MC problems with exponentially many spurious local minima, which leads to the failure of most gradient-based methods. Based on those results, we define a new complexity metric that potentially measures the solvability of low-rank matrix optimization problems based on the B-M factorization approach. In addition, we show that more measurements of the ground truth matrix can deteriorate the landscape, which further reveals the unfavorable behavior of the B-M factorization on general low-rank matrix optimization problems.
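For context, the B-M factorization approach to matrix completion is just a few lines of gradient descent on the observed entries. The NumPy sketch below runs it on a benign random rank-one instance; per the paper, on adversarially constructed instances the same procedure can get trapped in spurious local minima.

```python
import numpy as np

def bm_matrix_completion(M, mask, r, lr=0.01, steps=5000, seed=0):
    """Burer-Monteiro matrix completion: parameterize X = U V^T with rank-r
    factors and run gradient descent on the observed entries only. (On the
    instances constructed in the paper this landscape has exponentially many
    spurious local minima; this toy instance is benign.)"""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    U = 0.1 * rng.normal(size=(n, r))
    V = 0.1 * rng.normal(size=(m, r))
    for _ in range(steps):
        R = mask * (U @ V.T - M)  # residual on observed entries
        U, V = U - lr * R @ V, V - lr * R.T @ U
    return U @ V.T

rng = np.random.default_rng(1)
M = rng.normal(size=(30, 1)) @ rng.normal(size=(1, 30))  # rank-1 ground truth
mask = (rng.random((30, 30)) < 0.5).astype(float)        # ~50% observed
X = bm_matrix_completion(M, mask, r=1)
print(np.abs((1 - mask) * (X - M)).mean())  # small error on unobserved entries
```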
【43】 Patch Based Transformation for Minimum Variance Beamformer Image Approximation Using Delay and Sum Pipeline 标题:基于面片变换的延迟和流水线最小方差波束形成器图像逼近 链接:https://arxiv.org/abs/2110.10220
作者:Sairoop Bodepudi,A N Madhavanunni,Mahesh Raveendranatha Panicker 机构:Indian Institute of Technology Palakkad, Kerala, India 备注:6 pages, 3 figures 摘要:In the recent past, there have been several efforts in accelerating computationally heavy beamforming algorithms such as minimum variance distortionless response (MVDR) beamforming to achieve real-time performance comparable to the popular delay and sum (DAS) beamforming. This has been achieved using a variety of neural network architectures ranging from fully connected neural networks (FCNNs) and convolutional neural networks (CNNs) to generative adversarial networks (GANs). However, most of these approaches optimize image-level losses and hence require a significant amount of data to ensure that the process of beamforming is learned. In this work, a patch-level U-Net based neural network is proposed, where the delay compensated radio frequency (RF) patch for a fixed region in space (e.g. 32x32) is transformed through a U-Net architecture, multiplied with DAS apodization weights and optimized for similarity with the MVDR image of the patch. Instead of framing beamforming as a regression problem that estimates the apodization weights, the proposed approach learns a non-linear transformation of the RF data space that accounts, in the parameters of the network, for the data-driven weight adaptation performed by the MVDR approach. In this way, it is also observed that by restricting the input to a patch, the model learns the beamforming pipeline as a non-linear image transformation problem.
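Since DAS is the baseline pipeline the patch transformation plugs into, a toy delay-and-sum sketch may help. Integer-sample delays and uniform weights are simplifying assumptions; real pipelines interpolate fractional delays and apply apodization, and MVDR replaces the uniform sum with data-adaptive weights.

```python
import numpy as np

def das_beamform(rf, delays):
    """Delay-and-sum over channel-wise RF data: advance each channel by its
    (integer-sample) focusing delay and average across channels, so echoes
    from the focal point add coherently."""
    n_ch, n_t = rf.shape
    out = np.zeros(n_t)
    for ch, d in enumerate(delays):
        out[: n_t - d] += rf[ch, d:]  # advance channel ch by d samples
    return out / n_ch

rng = np.random.default_rng(0)
delays = np.array([0, 2, 4, 6])  # toy per-channel focusing delays
pulse = np.zeros(100); pulse[30] = 1.0
rf = np.stack([np.roll(pulse, d) for d in delays])  # echoes arrive staggered
print(np.argmax(das_beamform(rf, delays)))  # 30: echoes align coherently
```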
【44】 Long Random Matrices and Tensor Unfolding 标题:长随机矩阵与张量展开 链接:https://arxiv.org/abs/2110.10210
作者:Gérard Ben Arous,Daniel Zhengyu Huang,Jiaoyang Huang 机构:Courant Institute, NYU, New York, NY, California Institute of Technology, Pasadena, CA 备注:29 pages, 4 figures 摘要:In this paper, we consider the singular values and singular vectors of low rank perturbations of large rectangular random matrices, in the regime where the matrix is "long": we allow the number of rows (columns) to grow polynomially in the number of columns (rows). We prove there exists a critical signal-to-noise ratio (depending on the dimensions of the matrix), and the extreme singular values and singular vectors exhibit a BBP type phase transition. As a main application, we investigate the tensor unfolding algorithm for the asymmetric rank-one spiked tensor model, and obtain an exact threshold, which is independent of the procedure of tensor unfolding. If the signal-to-noise ratio is above the threshold, tensor unfolding detects the signals; otherwise, it fails to capture the signals.
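Tensor unfolding itself is a one-line reshape followed by an SVD. A NumPy sketch on a synthetic asymmetric rank-one spiked tensor follows; the noise scaling and the `snr` value are illustrative choices, not the paper's exact normalization.

```python
import numpy as np

def unfold_and_detect(T, shape):
    """Tensor unfolding: reshape the order-3 tensor into a long
    n1 x (n2*n3) matrix and read off its top singular pair."""
    M = T.reshape(shape[0], shape[1] * shape[2])
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return s[0], U[:, 0]

n1, n2, n3, snr = 40, 30, 30, 12.0
rng = np.random.default_rng(0)
u = [rng.normal(size=n) for n in (n1, n2, n3)]
u = [x / np.linalg.norm(x) for x in u]
signal = snr * np.einsum("i,j,k->ijk", *u)            # rank-one spike
T = signal + rng.normal(size=(n1, n2, n3)) / np.sqrt(n1)  # toy noise scaling
lead_sv, u1_hat = unfold_and_detect(T, (n1, n2, n3))
print(abs(u1_hat @ u[0]))  # close to 1 when the SNR is above the threshold
```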
【45】 Barriers and Dynamical Paths in Alternating Gibbs Sampling of Restricted Boltzmann Machines 标题:受限Boltzmann机交替Gibbs抽样中的障碍和动态路径 链接:https://arxiv.org/abs/2107.06013
作者:Clément Roussel,Simona Cocco,Rémi Monasson 机构:Laboratory of Physics of the École Normale Supérieure, CNRS UMR , & PSL Research, Sorbonne Université, rue Lhomond, Paris, France 备注:None 摘要:Restricted Boltzmann Machines (RBM) are bi-layer neural networks used for the unsupervised learning of model distributions from data. The bipartite architecture of RBM naturally defines an elegant sampling procedure, called Alternating Gibbs Sampling (AGS), where the configurations of the latent-variable layer are sampled conditional to the data-variable layer, and vice versa. We study here the performance of AGS on several analytically tractable models borrowed from statistical mechanics. We show that standard AGS is not more efficient than classical Metropolis-Hastings (MH) sampling of the effective energy landscape defined on the data layer. However, RBM can identify meaningful representations of training data in their latent space. Furthermore, using these representations and combining Gibbs sampling with the MH algorithm in the latent space can enhance the sampling performance of the RBM when the hidden units encode weakly dependent features of the data. We illustrate our findings on three datasets: Bars and Stripes and MNIST, well known in machine learning, and the so-called Lattice Proteins, introduced in theoretical biology to study the sequence-to-structure mapping in proteins.
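AGS follows directly from the bipartite structure: the hidden units are conditionally independent given the visibles, and vice versa. A minimal NumPy sketch of the sampler, with random weights for illustration only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def alternating_gibbs(v, W, b, c, steps, rng):
    """Alternating Gibbs sampling in an RBM with visible biases b and hidden
    biases c: sample the hidden layer conditional on the visible layer, then
    the visible layer conditional on the hidden one, repeatedly."""
    for _ in range(steps):
        h = (rng.random(c.shape) < sigmoid(v @ W + c)).astype(float)
        v = (rng.random(b.shape) < sigmoid(h @ W.T + b)).astype(float)
    return v

rng = np.random.default_rng(0)
n_v, n_h = 20, 10
W = 0.1 * rng.normal(size=(n_v, n_h))
b, c = np.zeros(n_v), np.zeros(n_h)
v0 = (rng.random(n_v) < 0.5).astype(float)
print(alternating_gibbs(v0, W, b, c, steps=100, rng=rng))
```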