Machine Learning Academic Digest [11.11]

2021-11-17 11:00:56

cs.LG: 63 papers today

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (4 papers)

【1】 LSP: Acceleration and Regularization of Graph Neural Networks via Locality Sensitive Pruning of Graphs
Link: https://arxiv.org/abs/2111.05694

Authors: Eitan Kosman, Joel Oren, Dotan Di Castro
Affiliations: Bosch Center for Artificial Intelligence, Haifa, Israel
Abstract: Graph Neural Networks (GNNs) have emerged as highly successful tools for graph-related tasks. However, real-world problems involve very large graphs, and the compute resources needed to fit GNNs to those problems grow rapidly. Moreover, the noisy nature and size of real-world graphs cause GNNs to over-fit if not regularized properly. Surprisingly, recent works show that large graphs often involve many redundant components that can be removed without compromising the performance too much. This includes node or edge removals during inference through GNN layers or as a pre-processing step that sparsifies the input graph. This intriguing phenomenon enables the development of state-of-the-art GNNs that are both efficient and accurate. In this paper, we take a further step towards demystifying this phenomenon and propose a systematic method called Locality-Sensitive Pruning (LSP) for graph pruning based on Locality-Sensitive Hashing. We aim to sparsify a graph so that similar local environments of the original graph result in similar environments in the resulting sparsified graph, which is an essential feature for graph-related tasks. To justify the application of pruning based on local graph properties, we exemplify the advantage of applying pruning based on locality properties over other pruning strategies in various scenarios. Extensive experiments on synthetic and real-world datasets demonstrate the superiority of LSP, which removes a significant amount of edges from large graphs without compromising the performance, accompanied by a considerable acceleration.
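
To make the pruning idea concrete, here is a minimal, hypothetical Python sketch of LSH-based edge pruning: neighbors whose feature environments collide under random-hyperplane (SimHash) signatures are treated as redundant. This only illustrates the flavor of locality-sensitive pruning; it is not the authors' exact LSP algorithm, and all names and parameters are illustrative.

```python
import numpy as np

def simhash(X, n_bits=16, seed=0):
    """Random-hyperplane signatures: rows with similar features get equal bits
    with high probability (the locality-sensitive property)."""
    rng = np.random.default_rng(seed)
    H = rng.standard_normal((X.shape[1], n_bits))
    return (X @ H > 0).astype(np.uint8)

def lsh_prune(edges, X, n_bits=8):
    """Per source node, keep only one edge into each hash bucket: neighbors
    whose environments collide are treated as redundant."""
    buckets = [tuple(row) for row in simhash(X, n_bits)]
    kept, seen = [], set()
    for u, v in edges:
        if (u, buckets[v]) not in seen:
            seen.add((u, buckets[v]))
            kept.append((u, v))
    return kept

X = np.random.randn(100, 8)                        # toy node features
edges = [tuple(np.random.randint(100, size=2)) for _ in range(600)]
print(f"{len(lsh_prune(edges, X))} of {len(edges)} edges kept")
```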

【2】 Graph Transplant: Node Saliency-Guided Graph Mixup with Local Structure Preservation
Link: https://arxiv.org/abs/2111.05639

Authors: Joonhyung Park, Hajin Shim, Eunho Yang
Affiliations: KAIST, South Korea
Notes: Graph augmentation method for graph classification
Abstract: Graph-structured datasets usually have irregular graph sizes and connectivities, rendering the use of recent data augmentation techniques, such as Mixup, difficult. To tackle this challenge, we present the first Mixup-like graph augmentation method at the graph level, called Graph Transplant, which mixes irregular graphs in data space. To be well defined on various scales of the graph, our method identifies the sub-structure as a mix unit that can preserve the local information. Since mixup-based methods without special consideration of the context are prone to generating noisy samples, our method explicitly employs the node saliency information to select meaningful subgraphs and adaptively determine the labels. We extensively validate our method with diverse GNN architectures on multiple graph classification benchmark datasets from a wide range of graph domains of different sizes. Experimental results show the consistent superiority of our method over other basic data augmentation baselines. We also demonstrate that Graph Transplant enhances the performance in terms of robustness and model calibration.
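
The following toy sketch conveys the transplant-and-mix-label idea under strong simplifying assumptions (fixed-size adjacency matrices, a given per-node saliency vector); the paper's actual method selects and reconnects subgraphs far more carefully.

```python
import numpy as np

def graph_transplant(adj_a, adj_b, saliency_b, k=2):
    """Toy transplant: overwrite a k-node slot of graph A with the k most
    salient nodes of graph B (keeping B's internal edges among them), and
    mix the label weight in proportion to the transplanted share."""
    donor = np.argsort(saliency_b)[-k:]            # most salient nodes of B
    mixed = adj_a.copy()
    mixed[np.ix_(range(k), range(k))] = adj_b[np.ix_(donor, donor)]
    lam = 1.0 - k / adj_a.shape[0]                 # label weight kept by A
    return mixed, lam

A = (np.random.rand(6, 6) > 0.5).astype(int)
B = (np.random.rand(6, 6) > 0.5).astype(int)
mixed, lam = graph_transplant(A, B, saliency_b=np.random.rand(6))
print(lam)  # e.g. 0.667: the mixed label is 2/3 A's label, 1/3 B's
```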

【3】 Convolutional Neural Network Dynamics: A Graph Perspective
Link: https://arxiv.org/abs/2111.05410

Authors: Fatemeh Vahedian, Ruiyu Li, Puja Trivedi, Di Jin, Danai Koutra
Affiliations: University of Michigan
Abstract: The success of neural networks (NNs) in a wide range of applications has led to increased interest in understanding the underlying learning dynamics of these models. In this paper, we go beyond mere descriptions of the learning dynamics by taking a graph perspective and investigating the relationship between the graph structure of NNs and their performance. Specifically, we propose (1) representing the neural network learning process as a time-evolving graph (i.e., a series of static graph snapshots over epochs), (2) capturing the structural changes of the NN during the training phase in a simple temporal summary, and (3) leveraging the structural summary to predict the accuracy of the underlying NN in a classification or regression task. For the dynamic graph representation of NNs, we explore structural representations for fully-connected and convolutional layers, which are key components of powerful NN models. Our analysis shows that a simple summary of graph statistics, such as weighted degree and eigenvector centrality, over just a few epochs can be used to accurately predict the performance of NNs. For example, a weighted degree-based summary of the time-evolving graph that is constructed based on 5 training epochs of the LeNet architecture achieves classification accuracy of over 93%. Our findings are consistent across different NN architectures, including LeNet, VGG, AlexNet, and ResNet.
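
A minimal sketch of the graph-statistics idea, assuming a fully-connected layer's weight matrix is viewed as a weighted bipartite graph; a weighted-degree summary over epoch snapshots is the kind of feature the paper feeds to a performance predictor. The snapshots below are synthetic placeholders.

```python
import numpy as np

def weighted_degrees(W):
    """View |W| of a fully-connected layer as a weighted bipartite graph and
    return the weighted degree of every input and output neuron."""
    A = np.abs(W)
    return np.concatenate([A.sum(axis=0), A.sum(axis=1)])

# stand-in for weight snapshots captured over the first 5 training epochs
snapshots = [np.random.randn(32, 64) * (1.0 + 0.1 * t) for t in range(5)]
temporal = np.stack([weighted_degrees(W) for W in snapshots])  # (epochs, nodes)
summary = temporal.mean(axis=0)   # simple temporal summary for a predictor
print(summary.shape)              # (96,) -> features describing this layer
```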

【4】 Graph Matching via Optimal Transport
Link: https://arxiv.org/abs/2111.05366

Authors: Ali Saad-Eldin, Benjamin D. Pedigo, Carey E. Priebe, Joshua T. Vogelstein
Affiliations: Department of Biomedical Engineering, Johns Hopkins University; Department of Applied Mathematics and Statistics, Johns Hopkins University; Institute for Computational Medicine, Kavli Neuroscience Discovery Institute, Johns Hopkins University
Abstract: The graph matching problem seeks to find an alignment between the nodes of two graphs that minimizes the number of adjacency disagreements. Solving the graph matching problem is increasingly important due to its applications in operations research, computer vision, neuroscience, and more. However, current state-of-the-art algorithms are inefficient in matching very large graphs, though they produce good accuracy. The main computational bottleneck of these algorithms is the linear assignment problem, which must be solved at each iteration. In this paper, we leverage recent advances in the field of optimal transport to replace the accepted use of linear assignment algorithms. We present GOAT, a modification to the state-of-the-art graph matching approximation algorithm "FAQ" (Vogelstein, 2015), replacing its linear sum assignment step with the "Lightspeed Optimal Transport" method of Cuturi (2013). The modification provides improvements to both speed and empirical matching accuracy. The effectiveness of the approach is demonstrated in matching graphs in simulated and real data examples.
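
The Sinkhorn iteration at the heart of the substitution is easy to sketch. Below is a standard entropic-OT iteration (Cuturi, 2013) applied to a toy cost matrix standing in for one FAQ gradient step; this illustrates the replaced subroutine only, not GOAT's full pipeline.

```python
import numpy as np

def sinkhorn(C, reg=0.1, n_iters=200):
    """Entropic-regularized optimal transport: returns an approximately
    doubly stochastic plan, used in place of an exact linear assignment."""
    K = np.exp(-C / reg)
    u = np.ones(C.shape[0])
    for _ in range(n_iters):
        v = 1.0 / (K.T @ u)
        u = 1.0 / (K @ v)
    return u[:, None] * K * v[None, :]

# toy gradient of the QAP objective standing in for one FAQ iteration
n = 6
grad = np.random.rand(n, n)
P = sinkhorn(-grad)                    # soft assignment minimizing <grad, P>
print(P.sum(axis=0), P.sum(axis=1))    # marginals are approximately all-ones
```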

Transformers (1 paper)

【1】 Multi-Task Prediction of Clinical Outcomes in the Intensive Care Unit using Flexible Multimodal Transformers
Link: https://arxiv.org/abs/2111.05431

Authors: Benjamin Shickel, Patrick J. Tighe, Azra Bihorac, Parisa Rashidi
Affiliations: University of Florida, Gainesville, FL, United States
Abstract: Recent deep learning research based on Transformer model architectures has demonstrated state-of-the-art performance across a variety of domains and tasks, mostly within the computer vision and natural language processing domains. While some recent studies have implemented Transformers for clinical tasks using electronic health records data, they are limited in scope, flexibility, and comprehensiveness. In this study, we propose a flexible Transformer-based EHR embedding pipeline and predictive model framework that introduces several novel modifications of existing workflows that capitalize on data attributes unique to the healthcare domain. We showcase the feasibility of our flexible design in a case study in the intensive care unit, where our models accurately predict seven clinical outcomes pertaining to readmission and patient mortality over multiple future time horizons.

GANs | Adversarial | Attacks | Generation (1 paper)

【1】 Training Generative Adversarial Networks with Adaptive Composite Gradient
Link: https://arxiv.org/abs/2111.05508

Authors: Huiqing Qi, Fang Li, Shengli Tan, Xiangyun Zhang
Affiliations: School of Mathematical Sciences, East China Normal University, Shanghai, China
Abstract: The wide application of generative adversarial networks benefits from successful training methods that guarantee an objective function converges to a local minimum. Nevertheless, designing an efficient and competitive training method is still a challenging task due to the cyclic behaviors of some gradient-based methods and the expensive computational cost of methods based on the Hessian matrix. This paper proposes the Adaptive Composite Gradients (ACG) method, which is linearly convergent in bilinear games under suitable settings. Theory and toy-function experiments suggest that our approach can alleviate cyclic behaviors and converge faster than recently proposed algorithms. Notably, the ACG method can be used to find stable fixed points not only in bilinear games but also in general games. ACG is a novel semi-gradient-free algorithm in that it does not need to calculate the gradient at every step, reducing the computational cost of gradient and Hessian evaluations by utilizing predictive information from future iterations. We conducted two mixture-of-Gaussians experiments by integrating ACG into existing algorithms with linear GANs. Results show that ACG is competitive with the previous algorithms. Realistic experiments on four prevalent datasets (MNIST, Fashion-MNIST, CIFAR-10, and CelebA) with DCGANs show that our ACG method outperforms several baselines, which illustrates the superiority and efficacy of our method.

Semi-/Weakly-/Un-/Fully-Supervised Learning | Uncertainty | Active Learning (1 paper)

【1】 Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Temporal Synchronicity
Link: https://arxiv.org/abs/2111.05329

Authors: Pritam Sarkar, Ali Etemad
Affiliations: Queen's University, Canada; Vector Institute
Notes: 18 pages
Abstract: We present CrissCross, a self-supervised framework for learning audio-visual representations. A novel notion is introduced in our framework: in addition to learning the intra-modal and standard 'synchronous' cross-modal relations, CrissCross also learns 'asynchronous' cross-modal relationships. We show that by relaxing the temporal synchronicity between the audio and visual modalities, the network learns strong time-invariant representations. Our experiments show that strong augmentations for both audio and visual modalities, together with the relaxation of cross-modal temporal synchronicity, optimize performance. To pretrain our proposed framework, we use 3 different datasets of varying sizes: Kinetics-Sound, Kinetics-400, and AudioSet. The learned representations are evaluated on a number of downstream tasks, namely action recognition, sound classification, and retrieval. CrissCross shows state-of-the-art performance on action recognition (UCF101 and HMDB51) and sound classification (ESC50). The code and pretrained models will be made publicly available.

Transfer | Zero/Few/One-Shot | Adaptation (2 papers)

【1】 Inclusive Speaker Verification with Adaptive thresholding
Link: https://arxiv.org/abs/2111.05501

Authors: Navdeep Jain, Hongcheng Wang
Affiliations: Comcast Applied AI Research
Abstract: While using a speaker verification (SV) based system in a commercial application, it is important that customers have an inclusive experience irrespective of their gender, age, or ethnicity. In this paper, we analyze the impact of gender and age on SV and find that, for a desired common False Acceptance Rate (FAR) across different gender and age groups, the False Rejection Rate (FRR) differs across those groups. To optimize FRR for all users at a desired FAR, we propose a context (e.g., gender, age) adaptive thresholding framework for SV. The context can be available as prior information in many practical applications. We also propose a concatenated gender/age detection model to algorithmically derive the context in the absence of such prior information. We show experimentally that our context-adaptive thresholding method is effective in building a more efficient and inclusive SV system. Specifically, we show that we can reduce the FRR for a specific gender at a desired FAR on the VoxCeleb1 test set by using gender-specific thresholds. A similar analysis on the OGI Kids' Speech Corpus shows that by using age-specific thresholds, we can significantly reduce the FRR for certain age groups at a desired FAR.
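
The core of context-adaptive thresholding can be sketched in a few lines: calibrate a separate threshold per context from that context's impostor-score distribution so each group operates at the same target FAR. The score distributions below are synthetic placeholders, not the paper's data.

```python
import numpy as np

def context_threshold(impostor_scores, target_far=0.01):
    """Pick the score threshold whose false-acceptance rate on impostor trials
    matches the desired FAR for this context (e.g. one gender or age group)."""
    return np.quantile(impostor_scores, 1.0 - target_far)

# hypothetical impostor score distributions for two contexts
scores = {"context_a": np.random.normal(0.2, 0.10, 10000),
          "context_b": np.random.normal(0.3, 0.15, 10000)}
thresholds = {ctx: context_threshold(s) for ctx, s in scores.items()}
print(thresholds)  # each context gets its own operating point at FAR = 1%
```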

【2】 Gaussian Process Meta Few-shot Classifier Learning via Linear Discriminant Laplace Approximation
Link: https://arxiv.org/abs/2111.05392

Authors: Minyoung Kim, Timothy Hospedales
Affiliations: Samsung AI Center, Cambridge, UK; University of Edinburgh, Edinburgh, United Kingdom
Abstract: Meta-learning few-shot classification is an emerging problem in machine learning that has received enormous attention recently, where the goal is to learn a model that can quickly adapt to a new task with only a few labeled data points. We consider the Bayesian Gaussian process (GP) approach, in which we meta-learn the GP prior, and adaptation to a new task is carried out by the GP predictive model from posterior inference. We adopt the Laplace posterior approximation, but to circumvent the iterative gradient steps for finding the MAP solution, we introduce a novel linear discriminant analysis (LDA) plugin as a surrogate for the MAP solution. In essence, the MAP solution is approximated by the LDA estimate, but to take the GP prior into account, we adopt a prior-norm adjustment to estimate LDA's shared variance parameters, which ensures that the adjusted estimate is consistent with the GP prior. This enables closed-form differentiable GP posteriors and predictive distributions, thus allowing fast meta-training. We demonstrate considerable improvement over previous approaches.

Reinforcement Learning (3 papers)

【1】 DeCOM: Decomposed Policy for Constrained Cooperative Multi-Agent Reinforcement Learning
Link: https://arxiv.org/abs/2111.05670

Authors: Zhaoxing Yang, Rong Ding, Haiming Jin, Yifei Wei, Haoyi You, Guiyun Fan, Xiaoying Gan, Xinbing Wang
Affiliations: Shanghai Jiao Tong University
Notes: 25 pages
Abstract: In recent years, multi-agent reinforcement learning (MARL) has achieved impressive performance in various applications. However, physical limitations, budget restrictions, and many other factors usually impose constraints on a multi-agent system (MAS), which cannot be handled by traditional MARL frameworks. Specifically, this paper focuses on constrained MASes in which agents work cooperatively to maximize the expected team-average return under various constraints on expected team-average costs, and develops a constrained cooperative MARL framework, named DeCOM, for such MASes. In particular, DeCOM decomposes the policy of each agent into two modules, which empowers information sharing among agents to achieve better cooperation. In addition, with such modularization, the training algorithm of DeCOM separates the original constrained optimization into an unconstrained optimization on reward and a constraint-satisfaction problem on costs. DeCOM then iteratively solves these problems in a computationally efficient manner, which makes DeCOM highly scalable. We also provide theoretical guarantees on the convergence of DeCOM's policy update algorithm. Finally, we validate the effectiveness of DeCOM with various types of costs in both toy and large-scale (with 500 agents) environments.

【2】 Spatially and Seamlessly Hierarchical Reinforcement Learning for State Space and Policy Space in Autonomous Driving
Link: https://arxiv.org/abs/2111.05479

Authors: Jaehyun Kim, Jaeseung Jeong
Notes: 14 pages, 8 figures, and 3 tables
Abstract: Despite advances in hierarchical reinforcement learning, its application to path planning in autonomous driving on highways remains challenging. One reason is that conventional hierarchical reinforcement learning approaches are not amenable to autonomous driving due to its riskiness: the agent must move while avoiding multiple obstacles, such as other agents, that are highly unpredictable, so safe regions are small, scattered, and changeable over time. To overcome this challenge, we propose a spatially hierarchical reinforcement learning method for state space and policy space. The high-level policy selects not only a behavioral sub-policy but also the regions of the state space to attend to and the outline in policy space. Subsequently, the low-level policy elaborates the short-term goal position of the agent within the outline of the region selected by the high-level command. The network structure and optimization suggested in our method are as concise as those of single-level methods. Experiments on environments with various shapes of roads showed that our method finds nearly optimal policies from early episodes, outperforming a baseline hierarchical reinforcement learning method, especially on narrow and complex roads. The resulting trajectories on the roads were similar to those of human strategies at the behavioral planning level.

【3】 Dealing with the Unknown: Pessimistic Offline Reinforcement Learning
Link: https://arxiv.org/abs/2111.05440

Authors: Jinning Li, Chen Tang, Masayoshi Tomizuka, Wei Zhan
Affiliations: Department of Mechanical Engineering, University of California, Berkeley, United States
Notes: Published in the 5th Annual Conference on Robot Learning (CoRL 2021)
Abstract: Reinforcement Learning (RL) has been shown to be effective in domains where the agent can learn policies by actively interacting with its operating environment. However, if we change the RL scheme to an offline setting, where the agent can only update its policy via static datasets, one of the major issues in offline reinforcement learning emerges: distributional shift. We propose a Pessimistic Offline Reinforcement Learning (PessORL) algorithm to actively lead the agent back to areas where it is familiar by manipulating the value function. We focus on problems caused by out-of-distribution (OOD) states, and deliberately penalize high values at states that are absent from the training dataset, so that the learned pessimistic value function lower-bounds the true value anywhere within the state space. We evaluate the PessORL algorithm on various benchmark tasks, where we show that our method gains better performance by explicitly handling OOD states, compared to methods that merely consider OOD actions.
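
As an illustration of the pessimism principle only (not the paper's exact penalty), one can push value estimates down at states that are rare in the offline dataset, so the pessimistic values lower-bound the true ones; the density estimate below is a placeholder.

```python
import numpy as np

def pessimistic_values(values, state_density, beta=1.0, eps=1e-6):
    """Illustrative pessimism: subtract a penalty that grows as a state
    becomes rarer in the offline dataset (i.e. more out-of-distribution)."""
    penalty = -beta * np.log(state_density + eps)   # large where density is low
    return values - penalty

V = np.array([1.0, 1.0, 1.0])
density = np.array([0.5, 0.05, 0.001])  # third state is far out-of-distribution
print(pessimistic_values(V, density))   # its value is penalized the hardest
```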

Medicine-related (2 papers)

【1】 STNN-DDI: A Substructure-aware Tensor Neural Network to Predict Drug-Drug Interactions
Link: https://arxiv.org/abs/2111.05708

Authors: Hui Yu, ShiYu Zhao, JianYu Shi
Affiliations: School of Computer Science, Northwestern Polytechnical University, Xi'an, China; School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
Abstract: Motivation: Computational prediction of multiple-type drug-drug interactions (DDIs) helps reduce unexpected side effects in poly-drug treatments. Although existing computational approaches achieve inspiring results, they ignore the fact that the action of a drug is mainly caused by its chemical substructures. In addition, their interpretability is still weak. Results: In this paper, supposing that the interactions between two given drugs are caused by their local chemical structures (substructures) and that their DDI types are determined by the linkages between different substructure sets, we design a novel Substructure-aware Tensor Neural Network model for DDI prediction (STNN-DDI). The proposed model learns a 3-D tensor of (substructure, interaction type, substructure) triplets, which characterizes a substructure-substructure interaction (SSI) space. According to a list of predefined substructures with specific chemical meanings, the mapping of drugs into this SSI space enables STNN-DDI to perform multiple-type DDI prediction in both transductive and inductive scenarios in a unified form with an explicable manner. Comparison with deep-learning-based state-of-the-art baselines demonstrates the superiority of STNN-DDI, with significant improvements in AUC, AUPR, Accuracy, and Precision. More importantly, case studies illustrate its interpretability by both revealing a crucial substructure pair across drugs for a DDI type of interest and uncovering interaction-type-specific substructure pairs in a given DDI. In summary, STNN-DDI provides an effective approach to predicting DDIs as well as explaining the interaction mechanisms among drugs.
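
The (substructure, interaction type, substructure) tensor scoring can be sketched as one bilinear form per interaction type. Shapes and the binary substructure-indicator encoding below are assumptions for illustration, not the paper's exact parameterization.

```python
import numpy as np

def ddi_scores(sub_a, sub_b, T):
    """Score every interaction type r as a bilinear form between the two
    drugs' substructure indicator vectors through the 3-D tensor T."""
    return np.einsum("i,irj,j->r", sub_a, T, sub_b)

n_subs, n_types = 50, 4
T = np.random.randn(n_subs, n_types, n_subs) * 0.1     # learned (sub, type, sub) tensor
drug_a = (np.random.rand(n_subs) > 0.8).astype(float)  # predefined substructures present
drug_b = (np.random.rand(n_subs) > 0.8).astype(float)
print(ddi_scores(drug_a, drug_b, T))                   # one logit per DDI type
```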

【2】 Biomarker Gene Identification for Breast Cancer Classification
Link: https://arxiv.org/abs/2111.05546

Authors: Sheetal Rajpal, Ankit Rajpal, Manoj Agarwal, Naveen Kumar
Affiliations: Department of Computer Science, University of Delhi; Department of Computer Science, Hans Raj College, University of Delhi
Notes: Paper submitted to Technology and Healthcare for review
Abstract: BACKGROUND: Breast cancer has emerged as one of the most prevalent cancers among women, leading to a high mortality rate. Due to the heterogeneous nature of breast cancer, there is a need to identify differentially expressed genes associated with breast cancer subtypes for timely diagnosis and treatment. OBJECTIVE: To identify a small gene set for each of the four breast cancer subtypes that could act as its signature, the paper proposes a novel algorithm for gene signature identification. METHODS: The present work uses interpretable AI methods to investigate the predictions made by the deep neural network employed for subtype classification to identify biomarkers, using the TCGA breast cancer RNA Sequence data. RESULTS: The proposed algorithm led to the discovery of a set of 43 differentially expressed gene signatures. We achieved a competitive average 10-fold accuracy of 0.91 using a neural network classifier. Further, gene set analysis revealed several relevant pathways, such as GRB7 events in ERBB2 and the p53 signaling pathway. Using the Pearson correlation matrix, we noted that the subtype-specific genes are correlated within each subtype. CONCLUSIONS: The proposed technique enables us to find a concise and clinically relevant gene signature set.

Distillation | Knowledge Extraction (1 paper)

【1】 Multimodal Approach for Metadata Extraction from German Scientific Publications
Link: https://arxiv.org/abs/2111.05736

Authors: Azeddine Bouabdallah, Jorge Gavilan, Jennifer Gerbl, Prayuth Patumcharoenpol
Affiliations: Institute for Web Science and Technologies (WeST), University of Koblenz-Landau, Koblenz, Germany
Notes: 8 pages, 5 figures, 4 tables
Abstract: Nowadays, metadata information is often given by the authors themselves upon submission. However, a significant part of already existing research papers have missing or incomplete metadata information. German scientific papers come in a large variety of layouts, which makes the extraction of metadata a non-trivial task that requires a precise way to classify the metadata extracted from the documents. In this paper, we propose a multimodal deep learning approach for metadata extraction from scientific papers in the German language. We consider multiple types of input data by combining natural language processing and image vision processing. This model aims to increase the overall accuracy of metadata extraction compared to other state-of-the-art approaches. It enables the utilization of both spatial and contextual features in order to achieve a more reliable extraction. Our model for this approach was trained on a dataset consisting of around 8,800 documents and is able to obtain an overall F1-score of 0.923.

Clustering (1 paper)

【1】 Clustering of longitudinal data: A tutorial on a variety of approaches
Link: https://arxiv.org/abs/2111.05469

Authors: Niek Den Teuling, Steffen Pauws, Edwin van den Heuvel
Affiliations: Eindhoven University of Technology; Tilburg University
Notes: 37 pages, 12 figures
Abstract: During the past two decades, methods for identifying groups with different trends in longitudinal data have become of increasing interest across many areas of research. To support researchers, we summarize the guidance from the literature regarding longitudinal clustering. Moreover, we present a selection of methods for longitudinal clustering, including group-based trajectory modeling (GBTM), growth mixture modeling (GMM), and longitudinal k-means (KML). The methods are introduced at a basic level, and strengths, limitations, and model extensions are listed. Following the recent developments in data collection, attention is given to the applicability of these methods to intensive longitudinal data (ILD). We demonstrate the application of the methods on a synthetic dataset using packages available in R.

Federated Learning | Privacy | Encryption (2 papers)

【1】 DACFL: Dynamic Average Consensus Based Federated Learning in Decentralized Topology
Link: https://arxiv.org/abs/2111.05505

Authors: Zhikun Chen, Daofeng Li, Jinkang Zhu, Sihai Zhang
Affiliations: University of Science and Technology of China
Abstract: Federated learning (FL) is a burgeoning distributed machine learning framework in which a central parameter server (PS) coordinates many local users to train a globally consistent model. Conventional federated learning inevitably relies on a centralized topology with a PS; as a result, it is paralyzed once the PS fails. To alleviate this single point of failure, especially at the PS, some existing work has provided decentralized FL (DFL) implementations such as CDSGD and D-PSGD to facilitate FL in a decentralized topology. However, there are still some problems with these methods, e.g., significant divergence between users' final models in CDSGD and the necessity of a network-wide model average in D-PSGD. To address these deficiencies, this paper devises a new DFL implementation, coined DACFL, where each user trains its model using its own training data and exchanges intermediate models with its neighbors through a symmetric and doubly stochastic matrix. DACFL treats the progress of each user's local training as a discrete-time process and employs a first-order dynamic average consensus (FODAC) method to track the average model in the absence of the PS. In this paper, we also provide a theoretical convergence analysis of DACFL under the premise of i.i.d. data to strengthen its rationality. The experimental results on MNIST, Fashion-MNIST, and CIFAR-10 validate the feasibility of our solution in both time-invariant and time-varying network topologies, and show that DACFL outperforms D-PSGD and CDSGD in most cases.
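
A minimal sketch of a first-order dynamic average consensus tracker, assuming a toy doubly stochastic mixing matrix W: each user mixes its neighbors' trackers and adds the increment of its own local model, so the trackers follow the network-wide average model without any PS. This illustrates the FODAC building block, not DACFL's full training loop.

```python
import numpy as np

def fodac_step(X, Theta_new, Theta_old, W):
    """One consensus update: mix neighbors' trackers through the doubly
    stochastic W and add each user's local-model increment."""
    return W @ X + (Theta_new - Theta_old)

n_users, dim = 5, 3
W = np.full((n_users, n_users), 1.0 / n_users)  # toy doubly stochastic mixing
Theta = np.random.randn(n_users, dim)           # local models after training
X = Theta.copy()                                # trackers start at local models
for _ in range(20):
    Theta_new = Theta + 0.01 * np.random.randn(n_users, dim)  # local updates
    X = fodac_step(X, Theta_new, Theta, W)
    Theta = Theta_new
print(np.abs(X - Theta.mean(axis=0)).max())     # trackers near the true average
```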

【2】 DP-REC: Private & Communication-Efficient Federated Learning
Link: https://arxiv.org/abs/2111.05454

Authors: Aleksei Triastcyn, Matthias Reisser, Christos Louizos
Affiliations: Qualcomm AI Research
Abstract: Privacy and communication efficiency are important challenges in federated training of neural networks, and combining them is still an open problem. In this work, we develop a method that unifies highly compressed communication and differential privacy (DP). We introduce a compression technique based on Relative Entropy Coding (REC) to the federated setting. With a minor modification to REC, we obtain a provably differentially private learning algorithm, DP-REC, and show how to compute its privacy guarantees. Our experiments demonstrate that DP-REC drastically reduces communication costs while providing privacy guarantees comparable to the state of the art.

Inference | Analysis | Understanding | Explanation (4 papers)

【1】 Counterfactual Explanations for Models of Code
Link: https://arxiv.org/abs/2111.05711

Authors: Jürgen Cito, Isil Dillig, Vijayaraghavan Murali, Satish Chandra
Affiliations: TU Wien and Facebook, Austria; UT Austin, U.S.A.
Notes: 10 pages, 6 listings, 2 algorithms, 2 tables, 1 figure
Abstract: Machine learning (ML) models play an increasingly prevalent role in many software engineering tasks. However, because most models are now powered by opaque deep neural networks, it can be difficult for developers to understand why the model came to a certain conclusion and how to act upon the model's prediction. Motivated by this problem, this paper explores counterfactual explanations for models of source code. Such counterfactual explanations constitute minimal changes to the source code under which the model "changes its mind". We integrate counterfactual explanation generation into models of source code in a real-world setting. We describe considerations that impact both the ability to find realistic and plausible counterfactual explanations and the usefulness of such explanations to the user of the model. In a series of experiments, we investigate the efficacy of our approach on three different models, each based on a BERT-like architecture operating over source code.
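
A toy sketch of the underlying search problem, assuming a black-box predict function and a candidate token vocabulary; the paper's real systems additionally constrain edits so that the counterfactual stays realistic and plausible source code.

```python
def counterfactual(tokens, predict, vocab):
    """Greedy search for a minimal (here: single-token) edit under which the
    model 'changes its mind'."""
    base = predict(tokens)
    for i in range(len(tokens)):
        for w in vocab:
            cand = tokens[:i] + [w] + tokens[i + 1:]
            if predict(cand) != base:
                return cand
    return None

# toy stand-in model: flags any token sequence containing 'eval'
predict = lambda toks: "risky" if "eval" in toks else "safe"
print(counterfactual(["x", "=", "eval", "(", "s", ")"], predict, ["parse"]))
```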

【2】 Social Fraud Detection Review: Methods, Challenges and Analysis
Link: https://arxiv.org/abs/2111.05645

Authors: Saeedreza Shehnepoor, Roberto Togneri, Wei Liu, Mohammed Bennamoun
Affiliations: University of Western Australia, Australia
Abstract: Social reviews have dominated the web and become a plausible source of product information. People and businesses use such information for decision-making. Businesses also make use of social information to spread fake information using a single user, groups of users, or a bot trained to generate fraudulent content. Many studies have proposed approaches based on user behaviors and review text to address the challenges of fraud detection. To provide an exhaustive literature review, social fraud detection is reviewed using a framework that considers three key components: the review itself, the user who carries out the review, and the item being reviewed. As features are extracted for the component representation, a feature-wise review is provided based on behavioral features, text-based features, and their combination. With this framework, a comprehensive overview of approaches is presented, including supervised, semi-supervised, and unsupervised learning. The supervised approaches for fraud detection are introduced and categorized into two sub-categories: classical and deep learning. The lack of labeled datasets is explained and potential solutions are suggested. To help new researchers in the area develop a better understanding, a topic analysis and an overview of future directions are provided at each step of the proposed systematic framework.

【3】 Understanding the Generalization Benefit of Model Invariance from a Data Perspective
Link: https://arxiv.org/abs/2111.05529

Authors: Sicheng Zhu, Bang An, Furong Huang
Affiliations: Department of Computer Science, University of Maryland, College Park
Notes: Accepted to NeurIPS 2021
Abstract: Machine learning models that are developed to be invariant under certain types of data transformations have shown improved generalization in practice. However, a principled understanding of why invariance benefits generalization is limited. Given a dataset, there is often no principled way to select "suitable" data transformations under which model invariance guarantees better generalization. This paper studies the generalization benefit of model invariance by introducing the sample cover induced by transformations, i.e., a representative subset of a dataset that can approximately recover the whole dataset using transformations. For any data transformations, we provide refined generalization bounds for invariant models based on the sample cover. We also characterize the "suitability" of a set of data transformations by the sample covering number induced by transformations, i.e., the smallest size of its induced sample covers. We show that we may tighten the generalization bounds for "suitable" transformations that have a small sample covering number. In addition, our proposed sample covering number can be empirically evaluated and thus provides a guide for selecting transformations to develop model invariance for better generalization. In experiments on multiple datasets, we evaluate sample covering numbers for some commonly used transformations and show that a smaller sample covering number for a set of transformations (e.g., the 3D-view transformation) indicates a smaller gap between the test and training error for invariant models, which verifies our propositions.
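
The sample covering number can be estimated greedily; the sketch below assumes exact-recovery transformations and a small numeric tolerance, purely for illustration of the definition rather than the paper's evaluation procedure.

```python
import numpy as np

def sample_covering_number(X, transforms, tol=1e-6):
    """Greedy estimate: a point is covered once some transform of an
    already-chosen representative (approximately) reproduces it."""
    covered = np.zeros(len(X), dtype=bool)
    n_reps = 0
    for i in range(len(X)):
        if covered[i]:
            continue
        n_reps += 1
        for T in transforms:
            covered |= np.linalg.norm(X - T(X[i]), axis=1) < tol
    return n_reps

# toy dataset closed under sign flips: adding the flip halves the cover
X = np.random.randn(50, 2)
X = np.concatenate([X, -X])
print(sample_covering_number(X, [lambda x: x]))                  # 100
print(sample_covering_number(X, [lambda x: x, lambda x: -x]))    # 50
```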

【4】 DataWords: Getting Contrarian with Text, Structured Data and Explanations
Link: https://arxiv.org/abs/2111.05384

Authors: Stephen I. Gallant, Mirza Nasir Hossain
Affiliations: Textician, Cambridge, MA
Notes: 11 pages
Abstract: Our goal is to build classification models using a combination of free text and structured data. To do this, we represent structured data by text sentences, DataWords, so that similar data items are mapped into the same sentence. This permits modeling a mixture of text and structured data using only text-modeling algorithms. Several examples illustrate that it is possible to improve text classification performance by first running extraction tools (named entity recognition), then converting the output to DataWords, and adding the DataWords to the original text before model building and classification. This approach also allows us to produce explanations for inferences in terms of both free text and structured data.
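
A minimal sketch of the DataWords idea, with a hypothetical record schema and binning scheme: numeric fields are discretized so that similar data items map to the same token, and the resulting words are appended to the free text before classification.

```python
def to_datawords(record):
    """Map structured fields to 'DataWords' so that similar data items land
    in the same token, e.g. by binning numeric values."""
    bins = [(0, 18), (18, 40), (40, 65), (65, 200)]
    words = [f"AGE_{lo}_{hi}" for lo, hi in bins if lo <= record["age"] < hi]
    words.append("DIAGNOSIS_" + record["diagnosis"].upper().replace(" ", "_"))
    return " ".join(words)

note = "Patient reports persistent cough for two weeks."
record = {"age": 47, "diagnosis": "type 2 diabetes"}
augmented = note + " " + to_datawords(record)
print(augmented)  # free text + DataWords, ready for a text-only classifier
```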

Detection-related (3 papers)

【1】 A framework for comprehensible multi-modal detection of cyber threats
Link: https://arxiv.org/abs/2111.05764

Authors: Jan Kohout, Čeněk Škarda, Kyrylo Shcherbin, Martin Kopp, Jan Brabec
Affiliations: Cisco Systems, Prague, Czech Republic
Abstract: Detection of malicious activities in corporate environments is a very complex task, and much effort has been invested into research on its automation. However, the vast majority of existing methods operate only in a narrow scope, which limits them to capturing only fragments of the evidence of malware's presence. Consequently, such an approach is not aligned with the way cyber threats are studied and described by domain experts. In this work, we discuss these limitations and design a detection framework that combines observed events from different sources of data. Thanks to this, it provides full insight into the attack life cycle and enables detection of threats that require coupling observations from different telemetries to identify the full scope of the incident. We demonstrate the applicability of the framework in a case study of a real malware infection observed in a corporate network.

【2】 Automatically detecting data drift in machine learning classifiers
Link: https://arxiv.org/abs/2111.05672

Authors: Samuel Ackerman, Orna Raz, Marcel Zalmanovici, Aviad Zlotnick
Affiliations: IBM Research, Israel
Abstract: Classifiers and other statistics-based machine learning (ML) techniques generalize, or learn, based on various statistical properties of the training data. The assumption underlying statistical ML, resulting in theoretical or empirical performance guarantees, is that the distribution of the training data is representative of the production data distribution. This assumption often breaks; for instance, statistical distributions of the data may change. We term changes that affect ML performance 'data drift' or 'drift'. Many classification techniques compute a measure of confidence in their results. This measure might not reflect the actual ML performance. A famous example is the panda picture that is correctly classified as such with a confidence of about 60%, but when noise is added, it is incorrectly classified as a gibbon with a confidence of above 99%. However, the work we report on here suggests that a classifier's measure of confidence can be used for the purpose of detecting data drift. We propose an approach, based solely on classifier-suggested labels and its confidence in them, for alerting on data distribution or feature space changes that are likely to cause data drift. Our approach identifies degradation in model performance and does not require labeling of data in production, which is often lacking or delayed. Our experiments with three different datasets and classifiers demonstrate the effectiveness of this approach in detecting data drift. This is especially encouraging as the classification itself may or may not be correct and no model input data is required. We further explore the statistical approach of sequential change-point tests to automatically determine the amount of data needed in order to identify drift while controlling the false positive rate (Type-1 error).
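
One simple, hedged instantiation of the idea: compare the distribution of production-time confidence scores against a reference window with a two-sample test, with no labels required. The distributions below are synthetic, and the paper's full method further uses sequential change-point tests to decide how much data is needed.

```python
import numpy as np
from scipy.stats import ks_2samp

def confidence_drift(ref_conf, prod_conf, alpha=0.01):
    """Flag drift when the classifier's confidence distribution in production
    departs from a reference window; no production labels are needed."""
    stat, p_value = ks_2samp(ref_conf, prod_conf)
    return p_value < alpha, stat

ref = np.random.beta(8, 2, size=5000)     # a confident, healthy period
prod = np.random.beta(4, 3, size=5000)    # confidences have degraded
print(confidence_drift(ref, prod))        # (True, ...) -> alert on drift
```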

【3】 Early Myocardial Infarction Detection over Multi-view Echocardiography
Link: https://arxiv.org/abs/2111.05790

Authors: Aysen Degerli, Serkan Kiranyaz, Tahir Hamid, Rashid Mazhar, Moncef Gabbouj
Affiliations: Tampere University; Qatar University
Abstract: Myocardial infarction (MI) is the leading cause of mortality in the world; it occurs due to a blockage of the coronary arteries feeding the myocardium. An early diagnosis of MI and its localization can mitigate the extent of myocardial damage by facilitating early therapeutic interventions. Following the blockage of a coronary artery, the regional wall motion abnormality (RWMA) of the ischemic myocardial segments is the earliest change to set in. Echocardiography is the fundamental tool to assess any RWMA. Assessing the motion of the left ventricle (LV) wall from only a single echocardiography view may lead to missing the diagnosis of MI, as the RWMA may not be visible on that specific view. Therefore, in this study, we propose to fuse apical 4-chamber (A4C) and apical 2-chamber (A2C) views, in which a total of 11 myocardial segments can be analyzed for MI detection. The proposed method first estimates the motion of the LV wall by Active Polynomials (APs), which extract and track the endocardial boundary to compute myocardial segment displacements. The features are extracted from the A4C and A2C view displacements, which are fused and fed into classifiers to detect MI. The main contributions of this study are 1) the creation of a new benchmark dataset including both A4C and A2C views in a total of 260 echocardiography recordings, which is publicly shared with the research community, 2) improving the performance of the prior work on threshold-based APs by a machine learning based approach, and 3) a pioneering MI detection approach via multi-view echocardiography that fuses the information of A4C and A2C views. Experimental results show that the proposed method achieves 90.91% sensitivity and 86.36% precision for MI detection over multi-view echocardiography.

Classification | Recognition (2 papers)

【1】 BagBERT: BERT-based bagging-stacking for multi-topic classification
Link: https://arxiv.org/abs/2111.05808

Authors: Loïc Rakotoson, Charles Letaillieur, Sylvain Massip, Fréjus Laleye
Affiliations: Opscidia, Paris, France
Abstract: This paper describes our submission to the COVID-19 literature annotation task at BioCreative VII. We propose an approach that exploits the knowledge of globally non-optimal weights, usually rejected, to build a rich representation of each label. Our proposed approach consists of two stages: (1) bagging over various initializations of the training data, featuring weakly trained weights, and (2) stacking of heterogeneous vocabulary models based on BERT and RoBERTa embeddings. The aggregation of these weak insights performs better than a classical globally efficient model. The purpose is the distillation of the richness of knowledge into a simpler and lighter model. Our system obtains an instance-based F1 of 92.96 and a label-based micro-F1 of 91.35.
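
A minimal sketch of the bagging half of the approach, assuming each run outputs per-label probabilities; BagBERT additionally stacks heterogeneous BERT/RoBERTa models rather than simply averaging, so this is an illustration of the aggregation idea only.

```python
import numpy as np

def bag(prob_runs):
    """Average the per-label probabilities of several weakly trained runs
    (different initializations of the training data)."""
    return np.mean(np.stack(prob_runs), axis=0)

# hypothetical outputs of three runs: 4 documents x 7 topic labels
runs = [np.random.rand(4, 7) for _ in range(3)]
labels = bag(runs) > 0.5          # multi-topic assignment by thresholding
print(labels.astype(int))
```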

【2】 Important Sentence Identification in Legal Cases Using Multi-Class Classification
Link: https://arxiv.org/abs/2111.05721

Authors: Sahan Jayasinghe, Lakith Rambukkanage, Ashan Silva, Nisansa de Silva, Amal Shehan Perera
Affiliations: Department of Computer Science & Engineering, University of Moratuwa
Abstract: The advancement of Natural Language Processing (NLP) is spreading through various domains in the form of practical applications and academic interest. Inherently, the legal domain contains a vast amount of data in text format; therefore, it requires the application of NLP to cater to the analytically demanding needs of the domain. Identifying important sentences, facts, and arguments in a legal case is a tedious task for legal professionals. In this research, we explore the usage of sentence embeddings for multi-class classification to identify important sentences in a legal case, from the perspective of the main parties present in the case. In addition, a task-specific loss function is defined in order to improve the accuracy restricted by the straightforward use of categorical cross-entropy loss.

Representation Learning (4 papers)

【1】 Topic-aware latent models for representation learning on networks
Link: https://arxiv.org/abs/2111.05576

Authors: Abdulkadir Çelikkanat, Fragkiskos D. Malliaros
Affiliations: Paris-Saclay University, CentraleSupélec, Inria, Centre for Visual Computing, Gif-sur-Yvette, France
Abstract: Network representation learning (NRL) methods have received significant attention over the last years thanks to their success in several graph analysis problems, including node classification, link prediction, and clustering. Such methods aim to map each vertex of the network into a low-dimensional space in a way that the structural information of the network is preserved. Of particular interest are methods based on random walks; such methods transform the network into a collection of node sequences, aiming to learn node representations by predicting the context of each node within the sequence. In this paper, we introduce TNE, a generic framework to enhance the embeddings of nodes acquired by means of random walk-based approaches with topic-based information. Similar to the concept of topical word embeddings in Natural Language Processing, the proposed model first assigns each node to a latent community, with the aid of various statistical graph models and community detection methods, and then learns the enhanced topic-aware representations. We evaluate our methodology in two downstream tasks: node classification and link prediction. The experimental results demonstrate that by incorporating node and community embeddings, we are able to outperform widely-known baseline NRL models.

【2】 ResNEsts and DenseNEsts: Block-based DNN Models with Improved Representation Guarantees
Link: https://arxiv.org/abs/2111.05496

Authors: Kuan-Lin Chen, Ching-Hua Lee, Harinath Garudadri, Bhaskar D. Rao
Affiliations: Department of Electrical and Computer Engineering and Qualcomm Institute, University of California, San Diego, La Jolla, CA, USA
Notes: 24 pages. Accepted by NeurIPS 2021
Abstract: The models recently used in the literature to prove that residual networks (ResNets) are better than linear predictors are actually different from the standard ResNets that have been widely used in computer vision. In addition to assumptions such as scalar-valued output or a single residual block, these models have no nonlinearities at the final residual representation that feeds into the final affine layer. To codify such a difference in nonlinearities and reveal a linear estimation property, we define ResNEsts, i.e., Residual Nonlinear Estimators, by simply dropping nonlinearities at the last residual representation from standard ResNets. We show that wide ResNEsts with bottleneck blocks can always guarantee a very desirable training property that standard ResNets aim to achieve, i.e., adding more blocks does not decrease performance given the same set of basis elements. To prove that, we first recognize that ResNEsts are basis function models that are limited by a coupling problem in basis learning and linear prediction. Then, to decouple prediction weights from basis learning, we construct a special architecture termed augmented ResNEst (A-ResNEst) that always guarantees no worse performance with the addition of a block. As a result, such an A-ResNEst establishes empirical risk lower bounds for a ResNEst using corresponding bases. Our results demonstrate that ResNEsts indeed have a problem of diminishing feature reuse; however, it can be avoided by sufficiently expanding or widening the input space, leading to the above-mentioned desirable property. Inspired by the DenseNets that have been shown to outperform ResNets, we also propose a corresponding new model called Densely connected Nonlinear Estimator (DenseNEst). We show that any DenseNEst can be represented as a wide ResNEst with bottleneck blocks. Unlike ResNEsts, DenseNEsts exhibit the desirable property without any special architectural re-design.

【3】 DistIR: An Intermediate Representation and Simulator for Efficient Neural Network Distribution
Link: https://arxiv.org/abs/2111.05426

Authors: Keshav Santhanam, Siddharth Krishna, Ryota Tomioka, Tim Harris, Matei Zaharia
Affiliations: Stanford University; Microsoft
Abstract: The rapidly growing size of deep neural network (DNN) models and datasets has given rise to a variety of distribution strategies such as data, tensor-model, pipeline parallelism, and hybrid combinations thereof. Each of these strategies offers its own trade-offs and exhibits optimal performance across different models and hardware topologies. Selecting the best set of strategies for a given setup is challenging because the search space grows combinatorially, and debugging and testing on clusters is expensive. In this work we propose DistIR, an expressive intermediate representation for distributed DNN computation that is tailored for efficient analyses, such as simulation. This enables automatically identifying the top-performing strategies without having to execute on physical hardware. Unlike prior work, DistIR can naturally express many distribution strategies, including pipeline parallelism with arbitrary schedules. Our evaluation on MLP training and GPT-2 inference models demonstrates how DistIR and its simulator enable fast grid searches over complex distribution spaces spanning up to 1000 configurations, reducing optimization time by an order of magnitude for certain regimes.

【4】 Object-Centric Representation Learning with Generative Spatial-Temporal Factorization
Link: https://arxiv.org/abs/2111.05393

Authors: Li Nanbo, Muhammad Ahmed Raza, Hu Wenbin, Zhaole Sun, Robert B. Fisher
Affiliations: School of Informatics, University of Edinburgh
Notes: Accepted at NeurIPS 2021
Abstract: Learning object-centric scene representations is essential for attaining structural understanding and abstraction of complex scenes. Yet, as current approaches for unsupervised object-centric representation learning are built upon either a stationary observer assumption or a static scene assumption, they often: i) suffer single-view spatial ambiguities, or ii) incorrectly or inaccurately infer object representations from dynamic scenes. To address this, we propose Dynamics-aware Multi-Object Network (DyMON), a method that broadens the scope of multi-view object-centric representation learning to dynamic scenes. We train DyMON on multi-view-dynamic-scene data and show that DyMON learns, without supervision, to factorize the entangled effects of observer motions and scene object dynamics from a sequence of observations, and constructs scene object spatial representations suitable for rendering at arbitrary times (querying across time) and from arbitrary viewpoints (querying across space). We also show that the factorized scene representations (w.r.t. objects) support querying about a single object by space and time independently.

3D | 3D Reconstruction and Related (1 paper)

【1】 Efficient Data Compression for 3D Sparse TPC via Bicephalous Convolutional Autoencoder Link: https://arxiv.org/abs/2111.05423

Authors: Yi Huang, Yihui Ren, Shinjae Yoo, Jin Huang Note: 6 pages, 6 figures Abstract: Real-time data collection and analysis in large experimental facilities present a great challenge across multiple domains, including high energy physics, nuclear physics, and cosmology. To address this, machine learning (ML)-based methods for real-time data compression have drawn significant attention. However, unlike natural image data, such as CIFAR and ImageNet that are relatively small-sized and continuous, scientific data often come in as three-dimensional data volumes at high rates with high sparsity (many zeros) and non-Gaussian value distribution. This makes direct application of popular ML compression methods, as well as conventional data compression methods, suboptimal. To address these obstacles, this work introduces a dual-head autoencoder to resolve sparsity and regression simultaneously, called the Bicephalous Convolutional AutoEncoder (BCAE). This method shows advantages both in compression fidelity and ratio compared to traditional data compression methods, such as MGARD, SZ, and ZFP. To achieve similar fidelity, the best performer among the traditional methods can reach only half the compression ratio of BCAE. Moreover, a thorough ablation study of the BCAE method shows that a dedicated segmentation decoder improves the reconstruction.
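As a rough illustration of the dual-head idea (a minimal sketch, not the authors' code; the architecture and loss weights here are assumptions), one shared 3D-convolutional encoder can feed both a segmentation head that predicts which voxels are nonzero and a regression head that predicts their values, with the regression loss applied only where the ground truth is nonzero:

```python
import torch
import torch.nn as nn

class BCAESketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
        )
        def head():
            return nn.Sequential(
                nn.ConvTranspose3d(16, 8, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose3d(8, 1, 4, stride=2, padding=1),
            )
        self.seg_head = head()   # logits: is this voxel nonzero?
        self.reg_head = head()   # regressed voxel value

    def forward(self, x):
        z = self.encoder(x)      # compressed representation
        return self.seg_head(z), self.reg_head(z)

def bcae_loss(seg_logits, reg, target, w_seg=1.0, w_reg=1.0):
    mask = (target != 0).float()
    seg = nn.functional.binary_cross_entropy_with_logits(seg_logits, mask)
    # regress only where the ground truth is nonzero (the sparsity head
    # handles the zeros), mirroring the "resolve sparsity and regression
    # simultaneously" description
    reg_err = ((reg - target) ** 2 * mask).sum() / mask.sum().clamp(min=1)
    return w_seg * seg + w_reg * reg_err

# toy sparse 3D volume: ~90% zeros
x = torch.randn(2, 1, 16, 16, 16) * (torch.rand(2, 1, 16, 16, 16) > 0.9)
model = BCAESketch()
seg_logits, reg = model(x)
print(float(bcae_loss(seg_logits, reg, x)))
```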

Optimization | Convergence (5 papers)

【1】 Searching in the Forest for Local Bayesian Optimization Link: https://arxiv.org/abs/2111.05834

Authors: Difan Deng, Marius Lindauer Affiliations: Leibniz University Hannover Abstract: Because of its sample efficiency, Bayesian optimization (BO) has become a popular approach for dealing with expensive black-box optimization problems, such as hyperparameter optimization (HPO). Recent empirical experiments showed that the loss landscapes of HPO problems tend to be more benign than previously assumed, i.e. in the best case uni-modal and convex, such that a BO framework could be more efficient if it can focus on those promising local regions. In this paper, we propose BOinG, a two-stage approach that is tailored toward mid-sized configuration spaces, as one encounters in many HPO problems. In the first stage, we build a scalable global surrogate model with a random forest to describe the overall landscape structure. Further, we choose a promising subregion via a bottom-up approach on the upper-level tree structure. In the second stage, a local model in this subregion is utilized to suggest the point to be evaluated next. Empirical experiments show that BOinG is able to exploit the structure of typical HPO problems and performs particularly well on mid-sized problems from synthetic functions and HPO.
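The two-stage loop can be sketched on a toy 1-D objective (our reading of the abstract, not the BOinG implementation: we pick the subregion by scoring random candidates with the forest rather than by the paper's bottom-up tree traversal, and use a plain lower-confidence-bound acquisition):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(x):                       # toy black-box function to minimize
    return np.sin(3 * x) + 0.1 * x ** 2

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(10, 1)); y = objective(X[:, 0])

for it in range(20):
    # Stage 1: global random-forest surrogate -> promising subregion
    rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
    cand = rng.uniform(-3, 3, size=(512, 1))
    center = cand[np.argmin(rf.predict(cand))]
    lo, hi = center - 0.5, center + 0.5
    # Stage 2: local GP inside the subregion suggests the next evaluation
    inside = ((X >= lo) & (X <= hi)).ravel()
    local = rng.uniform(lo, hi, size=(256, 1))
    if inside.sum() >= 2:
        gp = GaussianProcessRegressor().fit(X[inside], y[inside])
        mu, sd = gp.predict(local, return_std=True)
        x_next = local[np.argmin(mu - 1.0 * sd)]   # lower confidence bound
    else:
        x_next = center
    X = np.vstack([X, [x_next]]); y = np.append(y, objective(x_next[0]))

print("best found:", X[np.argmin(y)].item(), y.min())
```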

【2】 Safe Real-Time Optimization using Multi-Fidelity Gaussian Processes Link: https://arxiv.org/abs/2111.05589

Authors: Panagiotis Petsagkourakis, Benoit Chachuat, Ehecatl Antonio del Rio-Chanona Affiliations: University College London Note: Accepted in CDC 2021 Abstract: This paper proposes a new class of real-time optimization schemes to overcome system-model mismatch of uncertain processes. This work's novelty lies in integrating derivative-free optimization schemes and multi-fidelity Gaussian processes within a Bayesian optimization framework. The proposed scheme uses two Gaussian processes for the stochastic system: one emulates the (known) process model, and the other the true system through measurements. In this way, low-fidelity samples can be obtained via a model, while high-fidelity samples are obtained through measurements of the system. This framework captures the system's behavior in a non-parametric fashion while driving exploration through acquisition functions. The benefit of using a Gaussian process to represent the system is the ability to perform uncertainty quantification in real time and allow for chance constraints to be satisfied with high confidence. This results in a practical approach that is illustrated in numerical case studies, including a semi-batch photobioreactor optimization problem.

【3】 Deducing of Optimal Machine Learning Algorithms for Heterogeneity Link: https://arxiv.org/abs/2111.05558

Authors: Omar Alfarisi, Zeyar Aung, Mohamed Sassi Affiliations: ADNOC; Khalifa University Note: Preliminary work, under review Abstract: Deciding which machine learning algorithm is optimal is not an easy choice to make. To help future researchers, we describe in this paper the optimal among the best of the algorithms. We built a synthetic dataset and performed supervised machine learning runs for five different algorithms. For heterogeneity, we identified Random Forest, among others, to be the best algorithm.

【4】 Efficient Projection-Free Online Convex Optimization with Membership Oracle Link: https://arxiv.org/abs/2111.05818

Authors: Zakaria Mhammedi Affiliations: Massachusetts Institute of Technology Note: 67 pages, 1 figure, 3 tables Abstract: In constrained convex optimization, existing methods based on the ellipsoid or cutting-plane method do not scale well with the dimension of the ambient space. Alternative approaches such as Projected Gradient Descent only provide a computational benefit for simple convex sets such as Euclidean balls, where Euclidean projections can be performed efficiently. For other sets, the cost of the projections can be too high. To circumvent these issues, alternative methods based on the famous Frank-Wolfe algorithm have been studied and used. Such methods use a Linear Optimization Oracle at each iteration instead of Euclidean projections; the former can often be performed efficiently. Such methods have also been extended to the online and stochastic optimization settings. However, the Frank-Wolfe algorithm and its variants do not achieve the optimal performance, in terms of regret or rate, for general convex sets. What is more, the Linear Optimization Oracle they use can still be computationally expensive in some cases. In this paper, we move away from Frank-Wolfe style algorithms and present a new reduction that turns any algorithm A defined on a Euclidean ball (where projections are cheap) into an algorithm on a constrained set C contained within the ball, without sacrificing the performance of the original algorithm A by much. Our reduction requires O(T log T) calls to a Membership Oracle on C after T rounds, and no linear optimization on C is needed. Using our reduction, we recover optimal regret bounds [resp. rates], in terms of the number of iterations, in online [resp. stochastic] convex optimization. Our guarantees are also useful in the offline convex optimization setting when the dimension of the ambient space is large.

【5】 Linear Convergence of Stochastic Primal Dual Methods for Linear Programming Using Variance Reduction and Restarts Link: https://arxiv.org/abs/2111.05530

Authors: Haihao Lu, Jinwen Yang Abstract: There is recent interest in first-order methods for linear programming (LP). In this paper, we propose a stochastic algorithm using variance reduction and restarts for solving sharp primal-dual problems such as LP. We show that the proposed stochastic method exhibits a linear convergence rate for sharp instances with high probability, which improves the complexity of the existing deterministic and stochastic algorithms. In addition, we propose an efficient coordinate-based stochastic oracle for unconstrained bilinear problems, which has $\mathcal{O}(1)$ per-iteration cost and improves the total flop count required to reach a certain accuracy.

Other Neural Networks | Deep Learning | Models | Modeling (15 papers)

【1】 Towards Green Automated Machine Learning: Status Quo and Future Directions Link: https://arxiv.org/abs/2111.05850

Authors: Tanja Tornede, Alexander Tornede, Jonas Hanselle, Marcel Wever, Felix Mohr, Eyke Hüllermeier Affiliations: Department of Computer Science, Paderborn University, Germany; Institute of Informatics, University of Munich; Universidad de La Sabana, Chia, Cundinamarca, Colombia Abstract: Automated machine learning (AutoML) strives for the automatic configuration of machine learning algorithms and their composition into an overall (software) solution - a machine learning pipeline - tailored to the learning task (dataset) at hand. Over the last decade, AutoML has become a hot research topic with hundreds of contributions. While AutoML offers many prospects, it is also known to be quite resource-intensive, which is one of its major points of criticism. The primary cause for a high resource consumption is that many approaches rely on the (costly) evaluation of many ML pipelines while searching for good candidates. This problem is amplified in the context of research on AutoML methods, due to large-scale experiments conducted with many datasets and approaches, each of them being run with several repetitions to rule out random effects. In the spirit of recent work on Green AI, this paper is written in an attempt to raise the awareness of AutoML researchers for the problem and to elaborate on possible remedies. To this end, we identify four categories of actions the community may take towards more sustainable research on AutoML, namely approach design, benchmarking, research incentives, and transparency.

【2】 Palette: Image-to-Image Diffusion Models Link: https://arxiv.org/abs/2111.05826

Authors: Chitwan Saharia, William Chan, Huiwen Chang, Chris A. Lee, Jonathan Ho, Tim Salimans, David J. Fleet, Mohammad Norouzi Affiliations: Google Research Abstract: We introduce Palette, a simple and general framework for image-to-image translation using conditional diffusion models. On four challenging image-to-image translation tasks (colorization, inpainting, uncropping, and JPEG decompression), Palette outperforms strong GAN and regression baselines, and establishes a new state of the art. This is accomplished without task-specific hyper-parameter tuning, architecture customization, or any auxiliary loss, demonstrating a desirable degree of generality and flexibility. We uncover the impact of using $L_2$ vs. $L_1$ loss in the denoising diffusion objective on sample diversity, and demonstrate the importance of self-attention through empirical architecture studies. Importantly, we advocate a unified evaluation protocol based on ImageNet, and report several sample quality scores including FID, Inception Score, Classification Accuracy of a pre-trained ResNet-50, and Perceptual Distance against reference images for various baselines. We expect this standardized evaluation protocol to play a critical role in advancing image-to-image translation research. Finally, we show that a single generalist Palette model trained on 3 tasks (colorization, inpainting, JPEG decompression) performs as well or better than task-specific specialist counterparts.
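For readers unfamiliar with the $L_2$ vs. $L_1$ choice mentioned above, a minimal sketch of one conditional-diffusion training step is given below (not Google's Palette code; the tiny denoiser stands in for the U-Net used in practice, and the noise schedule is a generic one). The denoiser is conditioned by concatenating the source image with the noised target, and the objective is an $L_2$ or $L_1$ norm on the predicted noise:

```python
import torch
import torch.nn.functional as F

def diffusion_step(denoiser, src, tgt, alphas_cumprod, loss_type="l2"):
    b = tgt.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,))     # random timestep
    a = alphas_cumprod[t].view(b, 1, 1, 1)
    noise = torch.randn_like(tgt)
    noisy = a.sqrt() * tgt + (1 - a).sqrt() * noise     # forward process
    pred = denoiser(torch.cat([src, noisy], dim=1), t)  # condition on src
    if loss_type == "l2":
        return F.mse_loss(pred, noise)
    return F.l1_loss(pred, noise)  # L1 reportedly trades diversity for fidelity

class TinyDenoiser(torch.nn.Module):   # toy stand-in for a U-Net
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Conv2d(6, 3, 3, padding=1)
    def forward(self, x, t):
        return self.net(x)

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1 - betas, dim=0)
src, tgt = torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32)
print(float(diffusion_step(TinyDenoiser(), src, tgt, alphas_cumprod, "l1")))
```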

【3】 HARPO: Learning to Subvert Online Behavioral Advertising Link: https://arxiv.org/abs/2111.05792

Authors: Jiang Zhang, Konstantinos Psounis, Muhammad Haroon, Zubair Shafiq Affiliations: University of Southern California; University of California, Davis Note: Accepted at NDSS'22 Abstract: Online behavioral advertising, and the associated tracking paraphernalia, poses a real privacy threat. Unfortunately, existing privacy-enhancing tools are not always effective against online advertising and tracking. We propose Harpo, a principled learning-based approach to subvert online behavioral advertising through obfuscation. Harpo uses reinforcement learning to adaptively interleave real page visits with fake pages to distort a tracker's view of a user's browsing profile. We evaluate Harpo against real-world user profiling and ad targeting models used for online behavioral advertising. The results show that Harpo improves privacy by triggering more than 40% incorrect interest segments and 6x higher bid values. Harpo outperforms existing obfuscation tools by as much as 16x for the same overhead. Harpo is also able to achieve better stealthiness to adversarial detection than existing obfuscation tools. Harpo meaningfully advances the state of the art in leveraging obfuscation to subvert online behavioral advertising.

【4】 Prune Once for All: Sparse Pre-Trained Language Models Link: https://arxiv.org/abs/2111.05754

Authors: Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat Affiliations: Intel Labs, Israel; Intel Corporation Note: ENLSP NeurIPS Workshop 2021, 12 pages Abstract: Transformer-based language models are applied to a wide range of applications in natural language processing. However, they are inefficient and difficult to deploy. In recent years, many compression algorithms have been proposed to increase the implementation efficiency of large Transformer-based models on target hardware. In this work we present a new method for training sparse pre-trained Transformer language models by integrating weight pruning and model distillation. These sparse pre-trained models can be used for transfer learning on a wide range of tasks while maintaining their sparsity pattern. We demonstrate our method with three known architectures to create sparse pre-trained BERT-Base, BERT-Large and DistilBERT. We show how the compressed sparse pre-trained models we trained transfer their knowledge to five different downstream natural language tasks with minimal accuracy loss. Moreover, we show how to further compress the sparse models' weights to 8-bit precision using quantization-aware training. For example, with our sparse pre-trained BERT-Large fine-tuned on SQuADv1.1 and quantized to 8 bits we achieve a compression ratio of 40X for the encoder with less than 1% accuracy loss. To the best of our knowledge, our results show the best compression-to-accuracy ratio for BERT-Base, BERT-Large, and DistilBERT.
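The key mechanic of "prune once, fine-tune many times" can be sketched on a toy model (a minimal sketch, not the Intel implementation; the model, sparsity level, and mask re-application scheme here are assumptions): prune once by magnitude, then keep the resulting sparsity pattern fixed while fine-tuning on a downstream task.

```python
import torch

def magnitude_masks(model, sparsity=0.9):
    """Prune once: mask the smallest-magnitude entries of weight matrices."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() >= 2:                    # prune weight matrices, not biases
            k = int(sparsity * p.numel())
            thresh = p.abs().flatten().kthvalue(k).values
            masks[name] = (p.abs() > thresh).float()
    return masks

def apply_masks(model, masks):
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])         # re-zero pruned weights

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 2))
masks = magnitude_masks(model, sparsity=0.9)   # "prune once"
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(3):                             # downstream fine-tuning steps
    x, y = torch.randn(8, 64), torch.randint(0, 2, (8,))
    loss = torch.nn.functional.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
    apply_masks(model, masks)                  # sparsity pattern is preserved
print({n: (m == 0).float().mean().item() for n, m in masks.items()})
```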

【5】 Efficient Neural Network Training via Forward and Backward Propagation Sparsification Link: https://arxiv.org/abs/2111.05685

Authors: Xiao Zhou, Weizhong Zhang, Zonghao Chen, Shizhe Diao, Tong Zhang Affiliations: Hong Kong University of Science and Technology; Tsinghua University Abstract: Sparse training is a natural idea to accelerate the training speed of deep neural networks and save memory usage, especially since large modern neural networks are significantly over-parameterized. However, most of the existing methods cannot achieve this goal in practice because the chain-rule-based gradient (w.r.t. structure parameters) estimators adopted by previous methods require dense computation at least in the backward propagation step. This paper solves this problem by proposing an efficient sparse training method with completely sparse forward and backward passes. We first formulate the training process as a continuous minimization problem under a global sparsity constraint. We then separate the optimization process into two steps, corresponding to the weight update and the structure parameter update. For the former step, we use the conventional chain rule, which can be sparse via exploiting the sparse structure. For the latter step, instead of using the chain-rule-based gradient estimators as in existing methods, we propose a variance-reduced policy gradient estimator, which only requires two forward passes without backward propagation, thus achieving completely sparse training. We prove that the variance of our gradient estimator is bounded. Extensive experimental results on real-world datasets demonstrate that compared to previous methods, our algorithm is much more effective in accelerating the training process, up to an order of magnitude faster.

【6】 Learning to ignore: rethinking attention in CNNs Link: https://arxiv.org/abs/2111.05684

Authors: Firas Laakom, Kateryna Chumachenko, Jenni Raitoharju, Alexandros Iosifidis, Moncef Gabbouj Affiliations: Tampere University; Finnish Environment Institute Note: Accepted to BMVC 2021 Abstract: Recently, there has been an increasing interest in applying attention mechanisms in Convolutional Neural Networks (CNNs) to solve computer vision tasks. Most of these methods learn to explicitly identify and highlight relevant parts of the scene and pass the attended image to further layers of the network. In this paper, we argue that such an approach might not be optimal. Arguably, explicitly learning which parts of the image are relevant is typically harder than learning which parts of the image are less relevant and, thus, should be ignored. In fact, in the vision domain, there are many easy-to-identify patterns of irrelevant features. For example, image regions close to the borders are less likely to contain useful information for a classification task. Based on this idea, we propose to reformulate the attention mechanism in CNNs to learn to ignore instead of learning to attend. Specifically, we propose to explicitly learn irrelevant information in the scene and suppress it in the produced representation, keeping only important attributes. This implicit attention scheme can be incorporated into any existing attention mechanism. In this work, we validate this idea using two recent attention methods, the Squeeze-and-Excitation (SE) block and the Convolutional Block Attention Module (CBAM). Experimental results on different datasets and model architectures show that learning to ignore, i.e., implicit attention, yields superior performance compared to the standard approaches.
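Applied to an SE block, the inversion is a one-line change (a minimal sketch under our reading of the abstract, not the authors' code): the gate is interpreted as "how irrelevant is each channel", and the representation keeps the complement, x * (1 - g), rather than x * g.

```python
import torch
import torch.nn as nn

class IgnoreSE(nn.Module):
    """SE-style channel gate, inverted to suppress what it learns to ignore."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                  # x: (B, C, H, W)
        g = self.fc(x.mean(dim=(2, 3)))    # g ~ per-channel irrelevance score
        return x * (1 - g).unsqueeze(-1).unsqueeze(-1)  # keep the complement

x = torch.randn(2, 16, 8, 8)
print(IgnoreSE(16)(x).shape)
```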

【7】 Conditional Alignment and Uniformity for Contrastive Learning with Continuous Proxy Labels Link: https://arxiv.org/abs/2111.05643

Authors: Benoit Dufumier, Pietro Gori, Julie Victor, Antoine Grigis, Edouard Duchesnay Affiliations: NeuroSpin, CEA Saclay, Université Paris-Saclay, France; LTCI, Télécom Paris, IP Paris, France Note: Accepted to MedNeurIPS 2021 (Oral) Abstract: Contrastive learning has shown impressive results on natural and medical images, without requiring annotated data. However, a particularity of medical images is the availability of meta-data (such as age or sex) that can be exploited for learning representations. Here, we show that the recently proposed contrastive y-Aware InfoNCE loss, which integrates multi-dimensional meta-data, asymptotically optimizes two properties: conditional alignment and global uniformity. Similarly to [Wang, 2020], conditional alignment means that similar samples should have similar features, but conditionally on the meta-data. Instead, global uniformity means that the (normalized) features should be uniformly distributed on the unit hyper-sphere, independently of the meta-data. Here, we propose to define conditional uniformity, relying on the meta-data, which repels only samples with dissimilar meta-data. We show that direct optimization of both conditional alignment and uniformity improves the representations, in terms of linear evaluation, on both CIFAR-100 and a brain MRI dataset.

【8】 Parallel Physics-Informed Neural Networks with Bidirectional Balance Link: https://arxiv.org/abs/2111.05641

Authors: Yuhao Huang Affiliations: Beijing Jiaotong University, Beijing, China Note: 9 pages, 10 figures Abstract: As an emerging technology in deep learning, physics-informed neural networks (PINNs) have been widely used to solve various partial differential equations (PDEs) in engineering. However, PDEs based on practical considerations contain multiple physical quantities and complex initial and boundary conditions, and thus PINNs often return incorrect results. Here we take the heat transfer problem in multilayer fabrics as a typical example. It is coupled by multiple temperature fields with strong correlation, and the values of variables are extremely unbalanced among different dimensions. We clarify the potential difficulties of solving such problems with classic PINNs, and propose a parallel physics-informed neural network with bidirectional balance. In detail, our parallel solving framework synchronously fits the coupled equations through several multilayer perceptrons. Moreover, we design two modules to balance the forward process of data and the back-propagation process of loss gradients. This bidirectional balance not only enables the whole network to converge stably, but also helps to fully learn various physical conditions in the PDEs. We provide a series of ablation experiments to verify the effectiveness of the proposed methods. The results show that our approach makes a problem unsolvable by PINNs solvable, and achieves excellent solving accuracy.
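For context, the PINN ingredients the abstract builds on can be sketched for a single-field 1-D heat equation (a minimal sketch, not the paper's multilayer-fabric problem or its balancing modules; the hand-tuned loss weights below merely play the role of balancing differently scaled loss terms): autograd supplies the PDE residual u_t - kappa*u_xx, and the total loss mixes residual and initial-condition terms.

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))

def pde_residual(x, t, kappa=0.1):
    """Residual of u_t = kappa * u_xx via automatic differentiation."""
    x.requires_grad_(True); t.requires_grad_(True)
    u = net(torch.stack([x, t], dim=1))
    u_x, u_t = torch.autograd.grad(u.sum(), (x, t), create_graph=True)
    u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
    return u_t - kappa * u_xx

x, t = torch.rand(64), torch.rand(64)
res = pde_residual(x, t)
# initial condition u(x, 0) = sin(pi * x) as a data-fitting term
ic = net(torch.stack([x, torch.zeros_like(x)], dim=1)) \
     - torch.sin(3.14159 * x).unsqueeze(1)
w_pde, w_ic = 1.0, 10.0     # hand-tuned balance weights (assumption)
loss = w_pde * res.pow(2).mean() + w_ic * ic.pow(2).mean()
loss.backward()
print(float(loss))
```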

【9】 Lightweight machine unlearning in neural network Link: https://arxiv.org/abs/2111.05528

Authors: Kongyang Chen, Yiwen Wang, Yao Huang Abstract: In recent years, machine learning neural networks have penetrated deeply into people's lives. As the price of convenience, people's private information is also at risk of disclosure. The "right to be forgotten" was introduced in a timely manner, stipulating that individuals have the right to withdraw their consent to personal information processing activities based on their consent. To solve this problem, machine unlearning is proposed, which allows the model to erase all memory of private information. Previous studies, including retraining and incremental learning to update models, often take up extra storage space or are difficult to apply to neural networks. Our method only needs to make a small perturbation to the weights of the target model and make it iterate in the direction of the model trained with the remaining data subset until the contribution of the unlearned data to the model is completely eliminated. In this paper, experiments on five datasets prove the effectiveness of our method for machine unlearning, and our method is 15 times faster than retraining.

【10】 Multi-Agent Learning for Iterative Dominance Elimination: Formal Barriers and New Algorithms Link: https://arxiv.org/abs/2111.05486

Authors: Jibang Wu, Haifeng Xu, Fan Yao Affiliations: University of Virginia Abstract: Dominated actions are natural (and perhaps the simplest possible) multi-agent generalizations of sub-optimal actions as in standard single-agent decision making. Thus, similar to standard bandit learning, a basic learning question in multi-agent systems is whether agents can learn to efficiently eliminate all dominated actions in an unknown game if they can only observe noisy bandit feedback about the payoff of their played actions. Surprisingly, despite a seemingly simple task, we show a quite negative result; that is, standard no-regret algorithms -- including the entire family of Dual Averaging algorithms -- provably take exponentially many rounds to eliminate all dominated actions. Moreover, algorithms with the stronger no-swap-regret guarantee also suffer similar exponential inefficiency. To overcome these barriers, we develop a new algorithm that adjusts Exp3 with Diminishing Historical rewards (termed Exp3-DH); Exp3-DH gradually forgets history at carefully tailored rates. We prove that when all agents run Exp3-DH (a.k.a., self-play in multi-agent learning), all dominated actions can be iteratively eliminated within polynomially many rounds. Our experimental results further demonstrate the efficiency of Exp3-DH, and that state-of-the-art bandit algorithms, even those developed specifically for learning in games, fail to eliminate all dominated actions efficiently.
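A minimal single-agent sketch of the mechanism (our reading of "diminishing historical rewards", not the authors' exact update or their tailored decay rates): a standard Exp3 importance-weighted update whose cumulative reward estimates decay geometrically, so stale history is gradually forgotten.

```python
import numpy as np

def exp3_dh(payoff, n_actions, T, eta=0.1, explore=0.05, gamma=0.99):
    rng = np.random.default_rng(0)
    S = np.zeros(n_actions)                 # discounted reward estimates
    for t in range(T):
        w = np.exp(eta * (S - S.max()))     # shift for numerical stability
        p = (1 - explore) * w / w.sum() + explore / n_actions
        a = rng.choice(n_actions, p=p)
        r = payoff(a) + 0.1 * rng.standard_normal()  # noisy bandit feedback
        S *= gamma                          # forget history gradually
        S[a] += r / p[a]                    # importance-weighted update
    return p

# 3 actions; action 2 is dominated (always worst)
p = exp3_dh(lambda a: [0.6, 0.8, 0.1][a], n_actions=3, T=5000)
print(p)   # probability mass on the dominated action should shrink
```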

【11】 Constrained Instance and Class Reweighting for Robust Learning under Label Noise Link: https://arxiv.org/abs/2111.05428

Authors: Abhishek Kumar, Ehsan Amid Affiliations: Google Research, Brain Team Note: 27 pages, including Appendix Abstract: Deep neural networks have shown impressive performance in supervised learning, enabled by their ability to fit well to the provided training data. However, their performance is largely dependent on the quality of the training data and often degrades in the presence of noise. We propose a principled approach for tackling label noise with the aim of assigning importance weights to individual instances and class labels. Our method works by formulating a class of constrained optimization problems that yield simple closed-form updates for these importance weights. The proposed optimization problems are solved per mini-batch, which obviates the need for storing and updating the weights over the full dataset. Our optimization framework also provides a theoretical perspective on existing label-smoothing heuristics for addressing label noise (such as label bootstrapping). We evaluate our method on several benchmark datasets and observe considerable performance gains in the presence of label noise.

【12】 MNet-Sim: A Multi-layered Semantic Similarity Network to Evaluate Sentence Similarity Link: https://arxiv.org/abs/2111.05412

Authors: Manuela Nayantara Jeyaraj, Dharshana Kasthurirathna Affiliations: Sri Lanka Institute of Information Technology, Malabe, Sri Lanka Abstract: Similarity is a comparative-subjective measure that varies with the domain within which it is considered. In several NLP applications such as document classification, pattern recognition, chatbot question-answering, sentiment analysis, etc., identifying an accurate similarity score for sentence pairs has become a crucial area of research. In the existing models that assess similarity, the limitation of effectively computing this similarity based on contextual comparisons, the localization due to the centering theory, and the lack of non-semantic textual comparisons have proven to be drawbacks. Hence, this paper presents a multi-layered semantic similarity network model built upon multiple similarity measures that render an overall sentence similarity score based on the principles of network science, neighboring weighted relational edges, and a proposed extended node similarity computation formula. The proposed multi-layered network model was evaluated and tested against established state-of-the-art models and is shown to have demonstrated better performance scores in assessing sentence similarity.

【13】 Which priors matter? Benchmarking models for learning latent dynamics Link: https://arxiv.org/abs/2111.05458

Authors: Aleksandar Botev, Andrew Jaegle, Peter Wirnsberger, Daniel Hennes, Irina Higgins Affiliations: DeepMind, London Abstract: Learning dynamics is at the heart of many important applications of machine learning (ML), such as robotics and autonomous driving. In these settings, ML algorithms typically need to reason about a physical system using high-dimensional observations, such as images, without access to the underlying state. Recently, several methods have proposed to integrate priors from classical mechanics into ML models to address the challenge of physical reasoning from images. In this work, we take a sober look at the current capabilities of these models. To this end, we introduce a suite consisting of 17 datasets with visual observations based on physical systems exhibiting a wide range of dynamics. We conduct a thorough and detailed comparison of the major classes of physically inspired methods alongside several strong baselines. While models that incorporate physical priors can often learn latent spaces with desirable properties, our results demonstrate that these methods fail to significantly improve upon standard techniques. Nonetheless, we find that the use of continuous and time-reversible dynamics benefits models of all classes.

【14】 Importance of Kernel Bandwidth in Quantum Machine Learning Link: https://arxiv.org/abs/2111.05451

Authors: Ruslan Shaydulin, Stefan M. Wild Affiliations: Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL, USA Abstract: Quantum kernel methods are considered a promising avenue for applying quantum computers to machine learning problems. However, recent results overlook the central role hyperparameters play in determining the performance of machine learning methods. In this work we show how optimizing the bandwidth of a quantum kernel can improve the performance of the kernel method from a random guess to being competitive with the best classical methods. Without hyperparameter optimization, kernel values decrease exponentially with qubit count, which is the cause behind recent observations that the performance of quantum kernel methods decreases with qubit count. We reproduce these negative results and show, through extensive numerical experiments using multiple quantum kernels and classical datasets, that if the kernel bandwidth is optimized, the performance instead improves with growing qubit count. We draw a connection between the bandwidth of classical and quantum kernels and show analogous behavior in both cases.
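The classical analogue the paper draws on is easy to demonstrate (a minimal classical sketch, an assumption-laden stand-in for the quantum case rather than a quantum-kernel experiment): with a poorly scaled RBF bandwidth the kernel matrix collapses toward identity, all off-diagonal values vanish, and only bandwidth tuning restores learnability.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
for gamma in [100.0, 1.0, 0.01]:      # gamma ~ 1 / bandwidth^2
    # kernel matrix, to inspect how off-diagonal values behave
    K = np.exp(-gamma * ((X[:, None] - X[None]) ** 2).sum(-1))
    off_diag = K[~np.eye(len(X), dtype=bool)].mean()
    acc = cross_val_score(SVC(kernel="rbf", gamma=gamma), X, y, cv=5).mean()
    print(f"gamma={gamma:>6}: mean off-diag K={off_diag:.2e}, "
          f"CV accuracy={acc:.2f}")
```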

【15】 Robust deep learning-based semantic organ segmentation in hyperspectral images Link: https://arxiv.org/abs/2111.05408

Authors: Silvia Seidlitz, Jan Sellner, Jan Odenthal, Berkin Özdemir, Alexander Studier-Fischer, Samuel Knödler, Leonardo Ayala, Tim Adler, Hannes G. Kenngott, Minu Tizabi, Martin Wagner, Felix Nickel, Beat P. Müller-Stich, Lena Maier-Hein Affiliations: Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), Heidelberg, Germany; Helmholtz Information and Data Science School for Health, Karlsruhe/Heidelberg, Germany Note: The first two authors (Silvia Seidlitz and Jan Sellner) contributed equally to this paper Abstract: Semantic image segmentation is an important prerequisite for context-awareness and autonomous robotics in surgery. The state of the art has focused on conventional RGB video data acquired during minimally invasive surgery, but full-scene semantic segmentation based on spectral imaging data and obtained during open surgery has received almost no attention to date. To address this gap in the literature, we are investigating the following research questions based on hyperspectral imaging (HSI) data of pigs acquired in an open surgery setting: (1) What is an adequate representation of HSI data for neural network-based fully automated organ segmentation, especially with respect to the spatial granularity of the data (pixels vs. superpixels vs. patches vs. full images)? (2) Is there a benefit of using HSI data compared to other modalities, namely RGB data and processed HSI data (e.g. tissue parameters like oxygenation), when performing semantic organ segmentation? According to a comprehensive validation study based on 506 HSI images from 20 pigs, annotated with a total of 19 classes, deep learning-based segmentation performance increases -- consistently across modalities -- with the spatial context of the input data. Unprocessed HSI data offers an advantage over RGB data or processed data from the camera provider, with the advantage increasing with decreasing size of the input to the neural network. Maximum performance (HSI applied to whole images) yielded a mean dice similarity coefficient (DSC) of 0.89 (standard deviation (SD) 0.04), which is in the range of the inter-rater variability (DSC of 0.89 (SD 0.07)). We conclude that HSI could become a powerful image modality for fully-automatic surgical scene understanding with many advantages over traditional imaging, including the ability to recover additional functional tissue information.

Others (11 papers)

【1】 Physics-enhanced deep surrogates for PDEs Link: https://arxiv.org/abs/2111.05841

Authors: Raphaël Pestourie, Youssef Mroueh, Chris Rackauckas, Payel Das, Steven G. Johnson Abstract: We present a "physics-enhanced deep surrogate" (PEDS) approach towards developing fast surrogate models for complex physical systems described by partial differential equations (PDEs) and similar models: we show how to combine a low-fidelity "coarse" solver with a neural network that generates "coarsified" inputs, trained end-to-end to globally match the output of an expensive high-fidelity numerical solver. In this way, by incorporating limited physical knowledge in the form of the low-fidelity model, we find that a PEDS surrogate can be trained with at least $\sim 10\times$ less data than a "black-box" neural network for the same accuracy. Asymptotically, PEDS appears to learn with a steeper power law than black-box surrogates, and benefits even further when combined with active learning. We demonstrate the feasibility and benefit of the proposed approach using an example problem in electromagnetic scattering that appears in the design of optical metamaterials.
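The end-to-end wiring can be sketched on a toy 1-D problem (a minimal sketch, not the paper's electromagnetic solver; the "coarse solver", data, and labels below are all stand-ins): a neural network maps the fine-grained input to a coarsified input for a cheap differentiable solver, and the pair is trained jointly to match high-fidelity targets.

```python
import torch

def coarse_solver(c):
    """Stand-in low-fidelity model: a fixed smoothing plus a nonlinearity."""
    k = torch.ones(1, 1, 5) / 5.0
    s = torch.nn.functional.conv1d(c.unsqueeze(1), k, padding=2).squeeze(1)
    return torch.tanh(s).mean(dim=1)

coarsifier = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU(),
                                 torch.nn.Linear(64, 64))
opt = torch.optim.Adam(coarsifier.parameters(), lr=1e-3)

x = torch.randn(256, 64)                 # fine-grained inputs
y_hifi = torch.sin(x.sum(dim=1) / 8.0)   # pretend high-fidelity labels
for step in range(200):
    y = coarse_solver(coarsifier(x))     # NN -> coarse solver -> output
    loss = torch.nn.functional.mse_loss(y, y_hifi)
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```

The design point worth noting is that gradients flow through the coarse solver into the coarsifier, which is what lets the limited physics in the low-fidelity model substitute for training data.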

【2】 Multi-Task Neural Processes Link: https://arxiv.org/abs/2111.05820

Authors: Jiayi Shen, Xiantong Zhen, Marcel Worring, Ling Shao Affiliations: AIM Lab, University of Amsterdam, Netherlands; Inception Institute of Artificial Intelligence, Abu Dhabi, UAE Abstract: Neural processes have recently emerged as a class of powerful neural latent variable models that combine the strengths of neural networks and stochastic processes. As they can encode contextual data in the network's function space, they offer a new way to model task relatedness in multi-task learning. To study its potential, we develop multi-task neural processes, a new variant of neural processes for multi-task learning. In particular, we propose to explore transferable knowledge from related tasks in the function space to provide inductive bias for improving each individual task. To do so, we derive the function priors in a hierarchical Bayesian inference framework, which enables each task to incorporate the shared knowledge provided by related tasks into its context of the prediction function. Our multi-task neural processes methodologically expand the scope of vanilla neural processes and provide a new way of exploring task relatedness in function spaces for multi-task learning. The proposed multi-task neural processes are capable of learning multiple tasks with limited labeled data and in the presence of domain shift. We perform extensive experimental evaluations on several benchmarks for multi-task regression and classification tasks. The results demonstrate the effectiveness of multi-task neural processes in transferring useful knowledge among tasks for multi-task learning and superior performance in multi-task classification and brain image segmentation.

【3】 SwAMP: Swapped Assignment of Multi-Modal Pairs for Cross-Modal Retrieval Link: https://arxiv.org/abs/2111.05814

Authors: Minyoung Kim Affiliations: Samsung AI Center, Cambridge, UK Abstract: We tackle the cross-modal retrieval problem, where the training is only supervised by the relevant multi-modal pairs in the data. Contrastive learning is the most popular approach for this task. However, its sampling complexity for learning is quadratic in the number of training data points. Moreover, it makes the potentially wrong assumption that the instances in different pairs are automatically irrelevant. To address these issues, we propose a novel loss function that is based on self-labeling of the unknown classes. Specifically, we aim to predict class labels of the data instances in each modality, and assign those labels to the corresponding instances in the other modality (i.e., swapping the pseudo labels). With these swapped labels, we learn the data embedding for each modality using the supervised cross-entropy loss, hence leading to linear sampling complexity. We also maintain queues for storing the embeddings of the latest batches, for which clustering assignment and embedding learning are done at the same time in an online fashion. This removes the computational overhead of injecting intermittent epochs of entire training-data sweeps for offline clustering. We tested our approach on several real-world cross-modal retrieval problems, including text-based video retrieval, sketch-based image retrieval, and image-text retrieval, and for all these tasks our method achieves significant performance improvement over contrastive learning.
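The swapped-assignment core fits in a few lines (a minimal sketch that omits the paper's queues and online clustering; the hard-argmax pseudo labels and per-modality linear heads are our simplifying assumptions): each modality predicts cluster logits, the labels are swapped across the pair, and each side is trained with plain cross-entropy, giving linear per-sample cost.

```python
import torch
import torch.nn.functional as F

n_clusters, d = 10, 32
img_head = torch.nn.Linear(d, n_clusters)   # cluster logits for images
txt_head = torch.nn.Linear(d, n_clusters)   # cluster logits for texts

img_emb, txt_emb = torch.randn(16, d), torch.randn(16, d)   # a paired batch
img_logits, txt_logits = img_head(img_emb), txt_head(txt_emb)
img_label = img_logits.argmax(dim=1).detach()   # pseudo labels ...
txt_label = txt_logits.argmax(dim=1).detach()
# ... assigned to the *other* modality's instances of the same pair (the swap)
loss = (F.cross_entropy(img_logits, txt_label)
        + F.cross_entropy(txt_logits, img_label))
loss.backward()
print(float(loss))
```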

【4】 Gradients are Not All You Need Link: https://arxiv.org/abs/2111.05803

Authors: Luke Metz, C. Daniel Freeman, Samuel S. Schoenholz, Tal Kachman Affiliations: Google Research, Brain Team; Radboud University, Donders Institute for Brain, Cognition and Behaviour Abstract: Differentiable programming techniques are widely used in the community and are responsible for the machine learning renaissance of the past several decades. While these methods are powerful, they have limits. In this short report, we discuss a common chaos-based failure mode which appears in a variety of differentiable circumstances, ranging from recurrent neural networks and numerical physics simulation to training learned optimizers. We trace this failure to the spectrum of the Jacobian of the system under study, and provide criteria for when a practitioner might expect this failure to spoil their differentiation-based optimization algorithms.
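The failure mode is easy to reproduce on the logistic map (a minimal illustration in the spirit of the report, not taken from it): when the per-step Jacobian has magnitude above 1 on average, gradients through a long unroll blow up, while a contracting regime gives well-behaved gradients.

```python
import torch

def unroll(a, steps=100, x0=0.3):
    """Iterate the logistic map x <- a*x*(1-x); chaotic for a ~ 3.8."""
    x = torch.tensor(x0)
    for _ in range(steps):
        x = a * x * (1 - x)
    return x

for a0 in [2.5, 3.8]:       # contracting vs. chaotic regime
    a = torch.tensor(a0, requires_grad=True)
    unroll(a).backward()    # backprop through the whole unroll
    print(f"a={a0}: d(final state)/da = {a.grad.item():.3e}")
```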

【5】 Distribution-Invariant Differential Privacy Link: https://arxiv.org/abs/2111.05791

Authors: Xuan Bi, Xiaotong Shen Affiliations: Carlson School of Management, University of Minnesota; School of Statistics, University of Minnesota Abstract: Differential privacy is becoming one gold standard for protecting the privacy of publicly shared data. It has been widely used in social science, data science, public health, information technology, and the U.S. decennial census. Nevertheless, to guarantee differential privacy, existing methods may unavoidably alter the conclusion of the original data analysis, as privatization often changes the sample distribution. This phenomenon is known as the trade-off between privacy protection and statistical accuracy. In this work, we break this trade-off by developing a distribution-invariant privatization (DIP) method to reconcile both high statistical accuracy and strict differential privacy. As a result, any downstream statistical or machine learning task yields essentially the same conclusion as if one used the original data. Numerically, under the same strictness of privacy protection, DIP achieves superior statistical accuracy in two simulations and on three real-world benchmarks.

【6】 ICDAR 2021 Competition on Document Visual Question Answering Link: https://arxiv.org/abs/2111.05547

Authors: Rubèn Tito, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas Affiliations: Computer Vision Center, UAB, Spain; CVIT, IIIT Hyderabad, India Abstract: In this report we present the results of the ICDAR 2021 edition of the Document Visual Question Answering Challenges. This edition complements the previous tasks on Single Document VQA and Document Collection VQA with a newly introduced task on Infographics VQA. Infographics VQA is based on a new dataset of more than 5,000 infographic images and 30,000 question-answer pairs. The winner methods have scored 0.6120 ANLS in the Infographics VQA task, 0.7743 ANLSL in the Document Collection VQA task and 0.8705 ANLS in Single Document VQA. We present a summary of the datasets used for each task, a description of each of the submitted methods, and the results and analysis of their performance. A summary of the progress made on Single Document VQA since the first edition of the DocVQA 2020 challenge is also presented.

【7】 Attention Approximates Sparse Distributed Memory Link: https://arxiv.org/abs/2111.05498

Authors: Trenton Bricken, Cengiz Pehlevan Affiliations: Systems, Synthetic and Quantitative Biology, Harvard University; Applied Mathematics Abstract: While Attention has come to be an important mechanism in deep learning, there remains limited intuition for why it works so well. Here, we show that Transformer Attention can be closely related, under certain data conditions, to Kanerva's Sparse Distributed Memory (SDM), a biologically plausible associative memory model. We confirm that these conditions are satisfied in pre-trained GPT-2 Transformer models. We discuss the implications of the Attention-SDM map and provide new computational and biological interpretations of Attention.

【8】 SGD Through the Lens of Kolmogorov Complexity Link: https://arxiv.org/abs/2111.05478

Authors: Gregory Schwartzman Abstract: We prove that stochastic gradient descent (SGD) finds a solution that achieves $(1-\epsilon)$ classification accuracy on the entire dataset. We do so under two main assumptions: (1. Local progress) There is consistent improvement of the model accuracy over batches. (2. Models compute simple functions) The function computed by the model is simple (has low Kolmogorov complexity). Intuitively, the above means that local progress of SGD implies global progress. Assumption 2 trivially holds for underparameterized models; hence, our work gives the first convergence guarantee for general, underparameterized models. Furthermore, this is the first result which is completely model-agnostic -- we don't require the model to have any specific architecture or activation function; it may not even be a neural network. Our analysis makes use of the entropy compression method, which was first introduced by Moser and Tardos in the context of the Lovász local lemma.

【9】 Cross-Layered Distributed Data-driven Framework For Enhanced Smart Grid Cyber-Physical Security Link: https://arxiv.org/abs/2111.05460

Authors: Allen Starke, Keerthiraj Nagaraj, Cody Ruben, Nader Aljohani, Sheng Zou, Arturo Bretas, Janise McNair, Alina Zare Affiliations: Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA Abstract: Smart Grid (SG) research and development has drawn much attention from academia, industry and government due to the great impact it will have on society, economics and the environment. Securing the SG is a considerably significant challenge due to the increased dependency on communication networks to assist in physical process control, exposing them to various cyber-threats. In addition to attacks that change measurement values using False Data Injection (FDI) techniques, attacks on the communication network may disrupt the power system's real-time operation by intercepting messages, or by flooding the communication channels with unnecessary data. Addressing these attacks requires a cross-layer approach. In this paper a cross-layered strategy is presented, called Cross-Layer Ensemble CorrDet with Adaptive Statistics (CECD-AS), which integrates the detection of faulty SG measurement data as well as inconsistent network inter-arrival times and transmission delays for more reliable and accurate anomaly detection and attack interpretation. Numerical results show that CECD-AS can detect multiple False Data Injection, Denial of Service (DoS) and Man-in-the-Middle (MITM) attacks with a high F1-score compared to current approaches that only use SG measurement data for detection, such as the traditional physics-based State Estimation, the Ensemble CorrDet with Adaptive Statistics strategy and other machine learning classification-based detection schemes.

【10】 Identifying the Risks of Chronic Diseases Using BMI Trajectories Link: https://arxiv.org/abs/2111.05385

Authors: Md Mozaharul Mottalib, Jessica C. Jones-Smith, Bethany Sheridan, Rahmatollah Beheshti Abstract: Obesity is a major health problem, increasing the risk of various major chronic diseases, such as diabetes, cancer, and stroke. While the role of obesity identified by cross-sectional BMI recordings has been heavily studied, the role of BMI trajectories is much less explored. In this study, we use a machine learning approach to subtype individuals' risk of developing 18 major chronic diseases by using their BMI trajectories extracted from a large and geographically diverse EHR dataset capturing the health status of around two million individuals over a period of six years. We define nine new interpretable and evidence-based variables based on the BMI trajectories to cluster the patients into subgroups using the k-means clustering method. We thoroughly review each cluster's characteristics in terms of demographic, socioeconomic, and physiological measurement variables to specify the distinct properties of the patients in the clusters. In our experiments, the direct relationship of obesity with diabetes, hypertension, Alzheimer's, and dementia has been re-established, and distinct clusters with specific characteristics for several of the chronic diseases have been found to be conforming or complementary to the existing body of knowledge.

【11】 Federated Expectation Maximization with heterogeneity mitigation and variance reduction Link: https://arxiv.org/abs/2111.02083

Authors: Aymeric Dieuleveut, Gersende Fort, Eric Moulines, Geneviève Robin Affiliations: Centre de Mathématiques Appliquées, École Polytechnique, Institut Polytechnique de Paris, France; Institut de Mathématiques de Toulouse, Université de Toulouse, CNRS, UPS, Toulouse, France; CS Dept., HSE University, Russian Federation Abstract: The Expectation Maximization (EM) algorithm is the default algorithm for inference in latent variable models. As in any other field of machine learning, applications of latent variable models to very large datasets make the use of advanced parallel and distributed architectures mandatory. This paper introduces FedEM, which is the first extension of the EM algorithm to the federated learning context. FedEM is a new communication-efficient method, which handles partial participation of local devices, and is robust to heterogeneous distributions of the datasets. To alleviate the communication bottleneck, FedEM compresses appropriately defined complete-data sufficient statistics. We also develop and analyze an extension of FedEM to further incorporate a variance reduction scheme. In all cases, we derive finite-time complexity bounds for smooth non-convex problems. Numerical results are presented to support our theoretical findings, as well as an application to federated missing-values imputation for biodiversity monitoring.
