cs.LG: 70 papers today
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (6 papers)
【1】 STFL: A Temporal-Spatial Federated Learning Framework for Graph Neural Networks
Link: https://arxiv.org/abs/2111.06750
Authors: Guannan Lou, Yuze Liu, Tiehua Zhang, Xi Zheng
Affiliations: Macquarie University; Ant Group
Abstract: We present a spatial-temporal federated learning framework for graph neural networks, namely STFL. The framework explores the underlying correlations of the input spatial-temporal data and transforms them into both node features and an adjacency matrix. The federated learning setting in the framework ensures data privacy while achieving good model generalization. Experimental results on the sleep stage dataset ISRUC_S3 illustrate the effectiveness of STFL on graph prediction tasks.
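The abstract does not spell out STFL's aggregation rule; below is a minimal FedAvg-style sketch of the server-side round that federated GNN frameworks of this kind typically build on. The client API (train_locally, num_samples) is hypothetical, not from the paper.

    import copy

    def fedavg_round(global_model, clients):
        # Each client runs gradient steps on its private spatio-temporal
        # graph data; the server then averages the resulting weights,
        # weighted by local sample counts (assumes all-float parameters).
        states, counts = [], []
        for client in clients:
            local = copy.deepcopy(global_model)
            client.train_locally(local)        # hypothetical client API
            states.append(local.state_dict())
            counts.append(client.num_samples)  # hypothetical attribute
        total = sum(counts)
        avg = {k: sum((c / total) * s[k] for s, c in zip(states, counts))
               for k in states[0]}
        global_model.load_state_dict(avg)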
【2】 deepstruct -- linking deep learning and graph theory
Link: https://arxiv.org/abs/2111.06679
Authors: Julian Stier, Michael Granitzer
Affiliation: Chair of Data Science, University of Passau
Abstract: deepstruct connects deep learning models and graph theory such that different graph structures can be imposed on neural networks, or graph structures can be extracted from trained neural network models. For this, deepstruct provides deep neural network models with different restrictions which can be created based on an initial graph. Further, tools to extract graph structures from trained models are available. This step of extracting graphs can be computationally expensive even for models with just a few tens of thousands of parameters, and it poses a challenging problem. deepstruct supports research in pruning, neural architecture search, automated network design and structure analysis of neural networks.
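For intuition only (this is not deepstruct's actual API), extracting a graph from a trained network can be sketched as thresholding weight magnitudes layer by layer; the nested loop over every parameter also shows why extraction gets expensive quickly:

    import numpy as np
    import networkx as nx

    def extract_graph(weight_matrices, threshold=1e-2):
        # weight_matrices: list of (out_dim, in_dim) numpy arrays from a
        # trained MLP; keep an edge wherever |weight| exceeds the threshold.
        g, offset = nx.DiGraph(), 0
        for W in weight_matrices:
            n_out, n_in = W.shape
            for j in range(n_out):
                for i in range(n_in):
                    if abs(W[j, i]) > threshold:
                        g.add_edge(offset + i, offset + n_in + j,
                                   weight=float(W[j, i]))
            offset += n_in
        return g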
【3】 Implicit vs Unfolded Graph Neural Networks
Link: https://arxiv.org/abs/2111.06592
Authors: Yongyi Yang, Yangkun Wang, Zengfeng Huang, David Wipf
Affiliations: Fudan University; Shanghai Jiao Tong University; Amazon
Abstract: It has been observed that graph neural networks (GNN) sometimes struggle to maintain a healthy balance between modeling long-range dependencies across nodes while avoiding unintended consequences such as oversmoothed node representations. To address this issue (among other things), two separate strategies have recently been proposed, namely implicit and unfolded GNNs. The former treats node representations as the fixed points of a deep equilibrium model that can efficiently facilitate arbitrary implicit propagation across the graph with a fixed memory footprint. In contrast, the latter involves treating graph propagation as unfolded descent iterations applied to some graph-regularized energy function. While motivated differently, in this paper we carefully elucidate the similarities and differences of these methods, quantifying explicit situations where the solutions they produce may actually be equivalent and others where their behavior diverges. This includes an analysis of convergence, representational capacity, and interpretability. We also provide empirical head-to-head comparisons across a variety of synthetic and public real-world benchmarks.
【4】 AnchorGAE: General Data Clustering via O(n) Bipartite Graph Convolution
Link: https://arxiv.org/abs/2111.06586
Authors: Hongyuan Zhang, Jiankun Shi, Rui Zhang, Xuelong Li
Affiliation: School of Computer Science and School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an, Shaanxi, P. R. China
Abstract: Graph-based clustering plays an important role in clustering tasks. As graph convolution network (GCN), a variant of neural networks on graph-type data, has achieved impressive performance, it is attractive to find whether GCNs can be used to augment the graph-based clustering methods on non-graph data, i.e., general data. However, given $n$ samples, the graph-based clustering methods usually need at least $O(n^2)$ time to build graphs, and the graph convolution requires nearly $O(n^2)$ for a dense graph and $O(|\mathcal{E}|)$ for a sparse one with $|\mathcal{E}|$ edges. In other words, both graph-based clustering and GCNs suffer from severe inefficiency problems. To tackle this problem and further employ GCN to promote the capacity of graph-based clustering, we propose a novel clustering method, AnchorGAE. As the graph structure is not provided in general clustering scenarios, we first show how to convert a non-graph dataset into a graph by introducing a generative graph model, which is used to build GCNs. Anchors are generated from the original data to construct a bipartite graph such that the computational complexity of graph convolution is reduced from $O(n^2)$ and $O(|\mathcal{E}|)$ to $O(n)$. The succeeding steps for clustering can be easily designed as $O(n)$ operations. Interestingly, the anchors naturally lead to a siamese GCN architecture. The bipartite graph constructed by anchors is updated dynamically to exploit the high-level information behind the data. Eventually, we theoretically prove that the simple update will lead to degeneration, and a specific strategy is accordingly designed.
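A rough numpy sketch of the anchor idea (illustrative, not the authors' code): with m << n anchors, every sample connects only to anchors, so one propagation pass through the bipartite affinity costs O(nm) rather than O(n^2):

    import numpy as np
    from sklearn.cluster import KMeans

    def anchor_bipartite_conv(X, m=100, sigma=1.0):
        # X: (n, d) data. Build sample->anchor weights B and smooth the
        # features by a two-step pass X -> anchors -> X.
        anchors = KMeans(n_clusters=m, n_init=10).fit(X).cluster_centers_
        d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)  # (n, m)
        B = np.exp(-d2 / (2 * sigma ** 2))
        B /= B.sum(axis=1, keepdims=True)
        anchor_feats = (B.T @ X) / B.sum(axis=0)[:, None]          # (m, d)
        return B @ anchor_feats                                    # (n, d)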
【5】 Sequential Aggregation and Rematerialization: Distributed Full-batch Training of Graph Neural Networks on Large Graphs
Link: https://arxiv.org/abs/2111.06483
Author: Hesham Mostafa
Abstract: We present the Sequential Aggregation and Rematerialization (SAR) scheme for distributed full-batch training of Graph Neural Networks (GNNs) on large graphs. Large-scale training of GNNs has recently been dominated by sampling-based methods and methods based on non-learnable message passing. SAR, on the other hand, is a distributed technique that can train any GNN type directly on an entire large graph. The key innovation in SAR is the distributed sequential rematerialization scheme, which sequentially re-constructs and then frees pieces of the prohibitively large GNN computational graph during the backward pass. This results in excellent memory scaling behavior where the memory consumption per worker goes down linearly with the number of workers, even for densely connected graphs. Using SAR, we report the largest applications of full-batch GNN training to date, and demonstrate large memory savings as the number of workers increases. We also present a general technique based on kernel fusion and attention-matrix rematerialization to optimize both the runtime and memory efficiency of attention-based models. We show that, coupled with SAR, our optimized attention kernels lead to significant speedups and memory savings in attention-based GNNs.
【6】 Simplifying approach to Node Classification in Graph Neural Networks
Link: https://arxiv.org/abs/2111.06748
Authors: Sunil Kumar Maurya, Xin Liu, Tsuyoshi Murata
Affiliations: Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan; Artificial Intelligence Research Center, AIST, Tokyo, Japan; AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory, Tokyo, Japan
Note: arXiv admin note: substantial text overlap with arXiv:2105.07634
Abstract: Graph Neural Networks have become one of the indispensable tools for learning from graph-structured data, and their usefulness has been shown in a wide variety of tasks. In recent years, there have been tremendous improvements in architecture design, resulting in better performance on various prediction tasks. In general, these neural architectures combine node feature aggregation and feature transformation using a learnable weight matrix in the same layer. This makes it challenging to analyze the importance of node features aggregated from various hops and the expressiveness of the neural network layers. As different graph datasets show varying levels of homophily and heterophily in features and class label distribution, it becomes essential to understand which features are important for the prediction tasks without any prior information. In this work, we decouple the node feature aggregation step from the depth of the graph neural network, and empirically analyze how different aggregated features play a role in prediction performance. We show that not all features generated via aggregation steps are useful, and often using these less informative features can be detrimental to the performance of the GNN model. Through our experiments, we show that learning certain subsets of these features can lead to better performance on a wide variety of datasets. We propose to use softmax as a regularizer and "soft-selector" of features aggregated from neighbors at different hop distances, together with L2 normalization over GNN layers. Combining these techniques, we present a simple and shallow model, Feature Selection Graph Neural Network (FSGNN), and show empirically that the proposed model achieves comparable or even higher accuracy than state-of-the-art GNN models on nine benchmark datasets for the node classification task, with remarkable improvements of up to 51.1%.
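A minimal PyTorch sketch of the described recipe (hop features A^k X are assumed precomputed; the exact placement of normalization and the classifier in the paper may differ):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FSGNN(nn.Module):
        def __init__(self, num_hops, in_dim, hid_dim, num_classes):
            super().__init__()
            self.hop_lins = nn.ModuleList(
                nn.Linear(in_dim, hid_dim) for _ in range(num_hops))
            self.scores = nn.Parameter(torch.zeros(num_hops))  # soft-selector
            self.out = nn.Linear(num_hops * hid_dim, num_classes)

        def forward(self, hop_feats):  # hop_feats[k]: (N, in_dim) = A^k X
            w = torch.softmax(self.scores, dim=0)  # softmax as regularizer
            hs = [w[k] * F.normalize(lin(h), p=2, dim=1)  # L2-normalized
                  for k, (lin, h) in enumerate(zip(self.hop_lins, hop_feats))]
            return self.out(torch.cat(hs, dim=1))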
Transformer (1 paper)
【1】 Automated question generation and question answering from Turkish texts using text-to-text transformers
Link: https://arxiv.org/abs/2111.06476
Authors: Fatih Cagatay Akyon, Devrim Cavusoglu, Cemil Cengiz, Sinan Onur Altinuc, Alptekin Temizel
Affiliations: OBSS AI, Ankara, Turkey; Graduate School of Informatics, METU, Ankara, Turkey; Computer Engineering, METU, Ankara, Turkey
Note: 10 pages, 3 figures, 7 tables
Abstract: While exam-style questions are a fundamental educational tool serving a variety of purposes, manual construction of questions is a complex process that requires training, experience and resources. To reduce the expenses associated with the manual construction of questions and to satisfy the need for a continuous supply of new questions, automatic question generation (QG) techniques can be utilized. However, compared to automatic question answering (QA), QG is a more challenging task. In this work, we fine-tune a multilingual T5 (mT5) transformer in a multi-task setting for QA, QG and answer extraction tasks using a Turkish QA dataset. To the best of our knowledge, this is the first academic work that attempts to perform automated text-to-text question generation from Turkish texts. Evaluation results show that the proposed multi-task setting achieves state-of-the-art Turkish question answering and question generation performance on the TQuADv1 and TQuADv2 datasets and the XQuAD Turkish split. The source code and pre-trained models are available at https://github.com/obss/turkish-question-generation.
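A hedged Hugging Face sketch of the multi-task text-to-text setup; the task-prefix strings are assumptions for illustration, not necessarily the authors' exact format (their released code is at the URL above):

    from transformers import AutoTokenizer, MT5ForConditionalGeneration

    tok = AutoTokenizer.from_pretrained("google/mt5-small")
    model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

    # One input string per task, distinguished by an illustrative prefix.
    qa_in = "answer question: question: Türkiye'nin başkenti neresidir? context: ..."
    qg_in = "generate question: answer: Ankara context: ..."

    batch = tok([qa_in, qg_in], return_tensors="pt", padding=True, truncation=True)
    labels = tok(["Ankara", "Türkiye'nin başkenti neresidir?"],
                 return_tensors="pt", padding=True).input_ids
    # Standard seq2seq cross-entropy; for real training, padded label ids
    # should be replaced with -100 so the loss ignores them.
    loss = model(**batch, labels=labels).loss
    loss.backward()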
GAN | adversarial | attacks | generation (2 papers)
【1】 Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data
Link: https://arxiv.org/abs/2111.06849
Authors: Liming Jiang, Bo Dai, Wayne Wu, Chen Change Loy
Affiliations: S-Lab, Nanyang Technological University; SenseTime Research
Note: NeurIPS 2021. Code: this https URL Project page: this https URL
Abstract: Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images. Recent studies have shown that training GANs with limited data remains formidable due to discriminator overfitting, the underlying cause that impedes the generator's convergence. This paper introduces a novel strategy called Adaptive Pseudo Augmentation (APA) to encourage healthy competition between the generator and the discriminator. As an alternative method to existing approaches that rely on standard data augmentations or model regularization, APA alleviates overfitting by employing the generator itself to augment the real data distribution with generated images, which deceives the discriminator adaptively. Extensive experiments demonstrate the effectiveness of APA in improving synthesis quality in the low-data regime. We provide a theoretical analysis to examine the convergence and rationality of our new training strategy. APA is simple and effective. It can be added seamlessly to powerful contemporary GANs, such as StyleGAN2, with negligible computational cost.
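A minimal sketch of the adaptive pseudo augmentation idea as described; the overfitting heuristic and update constants below are assumptions, so see the paper and its code for the real rule:

    import torch

    def pseudo_augment_reals(reals, generator, z_dim, p):
        # With probability p per sample, present a detached fake to the
        # discriminator as if it were real, deceiving D adaptively.
        z = torch.randn(reals.size(0), z_dim, device=reals.device)
        fakes = generator(z).detach()
        mask = (torch.rand(reals.size(0), device=reals.device) < p)
        return torch.where(mask.view(-1, 1, 1, 1), fakes, reals)

    def update_p(p, d_logits_real, target=0.6, step=1e-3):
        # Raise p when D is too confident on reals (an overfitting signal,
        # here an ADA-style sign heuristic), lower it otherwise.
        overfitting = d_logits_real.sign().mean().item() > target
        return min(max(p + (step if overfitting else -step), 0.0), 1.0)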
【2】 Bi-Discriminator Class-Conditional Tabular GAN
Link: https://arxiv.org/abs/2111.06549
Authors: Mohammad Esmaeilpour, Nourhene Chaalia, Adel Abusitta, Francois-Xavier Devailly, Wissem Maazoun, Patrick Cardinal
Affiliations: Université du Québec; École Polytechnique de Montréal and McGill University
Note: Submitted to IEEE Signal Processing Letters (IEEE-SPL)
Abstract: This paper introduces a bi-discriminator GAN for synthesizing tabular datasets containing continuous, binary, and discrete columns. Our proposed approach employs an adapted preprocessing scheme and a novel conditional term for the generator network to more effectively capture the input sample distributions. Additionally, we implement straightforward yet effective architectures for the discriminator networks, aiming at providing more discriminative gradient information to the generator. Our experimental results on four benchmark public datasets corroborate the superior performance of our GAN both in terms of likelihood fitness metric and machine learning efficacy.
Semi-/weakly-/un-/fully-supervised | uncertainty | active learning (1 paper)
【1】 Online-compatible Unsupervised Non-resonant Anomaly Detection
Link: https://arxiv.org/abs/2111.06417
Authors: Vinicius Mikuni, Benjamin Nachman, David Shih
Affiliations: National Energy Research Scientific Computing Center, Berkeley Lab, Berkeley, CA, USA; Physics Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA; Berkeley Institute for Data Science, University of California, Berkeley, CA, USA
Note: 9 pages, 3 figures
Abstract: There is a growing need for anomaly detection methods that can broaden the search for new particles in a model-agnostic manner. Most proposals for new methods focus exclusively on signal sensitivity. However, it is not enough to select anomalous events - there must also be a strategy to provide context to the selected events. We propose the first complete strategy for unsupervised detection of non-resonant anomalies that includes both signal sensitivity and a data-driven method for background estimation. Our technique is built out of two simultaneously-trained autoencoders that are forced to be decorrelated from each other. This method can be deployed offline for non-resonant anomaly detection and is also the first complete online-compatible anomaly detection strategy. We show that our method achieves excellent performance on a variety of signals prepared for the ADC2021 data challenge.
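A sketch of the two-autoencoder objective with a simple Pearson-correlation penalty between the per-event reconstruction errors; the paper's decorrelation term may differ (e.g., a distance-correlation variant), and torch.corrcoef requires PyTorch >= 1.10:

    import torch

    def per_sample_mse(recon, x):
        return ((recon - x) ** 2).flatten(1).mean(dim=1)

    def decorrelated_ae_loss(ae1, ae2, x, lam=10.0):
        e1 = per_sample_mse(ae1(x), x)   # anomaly score of autoencoder 1
        e2 = per_sample_mse(ae2(x), x)   # anomaly score of autoencoder 2
        corr = torch.corrcoef(torch.stack([e1, e2]))[0, 1]
        return e1.mean() + e2.mean() + lam * corr ** 2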
Transfer | zero/few/one-shot | adaptation (2 papers)
【1】 Self-Reflective Terrain-Aware Robot Adaptation for Consistent Off-Road Ground Navigation
Link: https://arxiv.org/abs/2111.06742
Authors: Sriram Siva, Maggie Wigness, John G. Rogers, Long Quang, Hao Zhang
Note: 13 pages, 7 figures, IJRR21
Abstract: Ground robots require the crucial capability of traversing unstructured and unprepared terrains and avoiding obstacles to complete tasks in real-world robotics applications such as disaster response. When a robot operates in off-road field environments such as forests, the robot's actual behaviors often do not match its expected or planned behaviors, due to changes in the characteristics of terrains and the robot itself. Therefore, the capability of robot adaptation for consistent behavior generation is essential for maneuverability on unstructured off-road terrains. In order to address the challenge, we propose a novel method of self-reflective terrain-aware adaptation for ground robots to generate consistent controls to navigate over unstructured off-road terrains, which enables robots to more accurately execute the expected behaviors through robot self-reflection while adapting to varying unstructured terrains. To evaluate our method's performance, we conduct extensive experiments using real ground robots with various functionality changes over diverse unstructured off-road terrains. The comprehensive experimental results have shown that our self-reflective terrain-aware adaptation method enables ground robots to generate consistent navigational behaviors and outperforms the compared previous and baseline techniques.
【2】 An Enhanced Adaptive Bi-clustering Algorithm through Building a Shielding Complex Sub-Matrix
Link: https://arxiv.org/abs/2111.06524
Author: Kaijie Xu
Affiliation: School of Electronic Engineering, Xidian University, Xi'an, China
Abstract: Bi-clustering refers to the task of finding sub-matrices (indexed by a group of columns and a group of rows) within a matrix of data such that the elements of each sub-matrix (data and features) are related in a particular way, for instance, that they are similar with respect to some metric. In this paper, after analyzing the well-known Cheng and Church (CC) bi-clustering algorithm, which has been proved to be an effective tool for mining co-expressed genes, and summarizing its limitations (such as interference of random numbers in the greedy strategy, and ignoring overlapping bi-clusters), we propose a novel enhancement of the adaptive bi-clustering algorithm, in which a shielding complex sub-matrix is constructed to shield the bi-clusters that have been obtained and to discover overlapping bi-clusters. In the shielding complex sub-matrix, the imaginary and real parts are used to shield and extend the new bi-clusters, respectively, and to form a series of optimal bi-clusters. To ensure that the obtained bi-clusters have no effect on the bi-clusters already produced, a unit impulse signal is introduced to adaptively detect and shield the constructed bi-clusters. Meanwhile, to effectively shield the null data (zero-size data), another unit impulse signal is set for adaptive detecting and shielding. In addition, we add a shielding factor to adjust the mean squared residue score of the rows (or columns) which contain the shielded data of the sub-matrix, to decide whether to retain them or not. We offer a thorough analysis of the developed scheme. The experimental results are in agreement with the theoretical analysis. The results obtained on a publicly available real microarray dataset show the enhancement of the bi-clustering performance thanks to the proposed method.
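For context, the mean squared residue score that the shielding factor adjusts is the standard Cheng-Church measure: for a bi-cluster with row set $I$ and column set $J$,

$$ \mathrm{MSR}(I,J) = \frac{1}{|I|\,|J|} \sum_{i \in I} \sum_{j \in J} \big(a_{ij} - a_{iJ} - a_{Ij} + a_{IJ}\big)^2, $$

where $a_{iJ}$, $a_{Ij}$, and $a_{IJ}$ denote the row, column, and sub-matrix means, respectively.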
Reinforcement learning (3 papers)
【1】 Resilient Consensus-based Multi-agent Reinforcement Learning
Link: https://arxiv.org/abs/2111.06776
Authors: Martin Figura, Yixuan Lin, Ji Liu, Vijay Gupta
Affiliations: Department of Electrical Engineering, University of Notre Dame, Notre Dame, IN, USA; Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, USA; Department of Electrical and Computer Engineering
Abstract: Adversarial attacks during training can strongly influence the performance of multi-agent reinforcement learning algorithms. It is, thus, highly desirable to augment existing algorithms such that the impact of adversarial attacks on cooperative networks is eliminated, or at least bounded. In this work, we consider a fully decentralized network, where each agent receives a local reward and observes the global state and action. We propose a resilient consensus-based actor-critic algorithm, whereby each agent estimates the team-average reward and value function, and communicates the associated parameter vectors to its immediate neighbors. We show that in the presence of Byzantine agents, whose estimation and communication strategies are completely arbitrary, the estimates of the cooperative agents converge to a bounded consensus value with probability one, provided that there are at most $H$ Byzantine agents in the neighborhood of each cooperative agent and the network is $(2H+1)$-robust. Furthermore, we prove that the policy of the cooperative agents converges with probability one to a bounded neighborhood around a local maximizer of their team-average objective function, under the assumption that the policies of the adversarial agents asymptotically become stationary.
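The classic primitive behind $(2H+1)$-robustness guarantees is a coordinate-wise trimmed-mean consensus step, sketched below; the paper's actual update also mixes in local TD estimates, so treat this only as the resilient-aggregation core:

    import numpy as np

    def trimmed_mean_consensus(own, received, H):
        # own: (d,) local parameter vector; received: list of neighbors'
        # (d,) vectors. Coordinate-wise, drop the H largest and H smallest
        # values, then average the survivors; tolerates up to H Byzantine
        # neighbors (requires len(received) + 1 > 2 * H).
        vals = np.stack(received + [own])
        srt = np.sort(vals, axis=0)
        kept = srt[H:vals.shape[0] - H]
        return kept.mean(axis=0)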
【2】 Causal Multi-Agent Reinforcement Learning: Review and Open Problems
Link: https://arxiv.org/abs/2111.06721
Authors: St John Grimbly, Jonathan Shock, Arnu Pretorius
Affiliations: University of Cape Town; InstaDeep
Note: Accepted at CoopAI NeurIPS Workshop 2021
Abstract: This paper serves to introduce the reader to the field of multi-agent reinforcement learning (MARL) and its intersection with methods from the study of causality. We highlight key challenges in MARL and discuss these in the context of how causal methods may assist in tackling them. We promote moving toward a 'causality first' perspective on MARL. Specifically, we argue that causality can offer improved safety, interpretability, and robustness, while also providing strong theoretical guarantees for emergent behaviour. We discuss potential solutions for common challenges, and use this context to motivate future research directions.
【3】 Promoting Resilience in Multi-Agent Reinforcement Learning via Confusion-Based Communication
Link: https://arxiv.org/abs/2111.06614
Authors: Ofir Abu, Matthias Gerstgrasser, Jeffrey S. Rosenschein, Sarah Keren
Affiliations: Hebrew University of Jerusalem; School of Engineering and Applied Sciences, Harvard University; Technion - Israel Institute of Technology
Note: Submission to NeurIPS 2021 Workshop
Abstract: Recent advances in multi-agent reinforcement learning (MARL) provide a variety of tools that support the ability of agents to adapt to unexpected changes in their environment, and to operate successfully given their environment's dynamic nature (which may be intensified by the presence of other agents). In this work, we highlight the relationship between a group's ability to collaborate effectively and the group's resilience, which we measure as the group's ability to adapt to perturbations in the environment. To promote resilience, we suggest facilitating collaboration via a novel confusion-based communication protocol according to which agents broadcast observations that are misaligned with their previous experiences. We allow decisions regarding the width and frequency of messages to be learned autonomously by agents, which are incentivized to reduce confusion. We present an empirical evaluation of our approach in a variety of MARL settings.
Meta-learning (1 paper)
【1】 Explainable AI (XAI): A Systematic Meta-Survey of Current Challenges and Future Opportunities
Link: https://arxiv.org/abs/2111.06420
Authors: Waddah Saeed, Christian Omlin
Affiliation: Center for Artificial Intelligence Research, University of Agder, Grimstad, Norway
Note: 29 pages, 2 figures, 4 tables
Abstract: The past decade has seen significant progress in artificial intelligence (AI), which has resulted in algorithms being adopted for resolving a variety of problems. However, this success has been met by increasing model complexity and employing black-box AI models that lack transparency. In response to this need, Explainable AI (XAI) has been proposed to make AI more transparent and thus advance the adoption of AI in critical domains. Although there are several reviews of XAI topics in the literature that identified challenges and potential research directions in XAI, these challenges and research directions are scattered. This study hence presents a systematic meta-survey of challenges and future research directions in XAI, organized into two themes: (1) general challenges and research directions in XAI, and (2) challenges and research directions in XAI based on the phases of the machine learning life cycle: design, development, and deployment. We believe that our meta-survey contributes to the XAI literature by providing a guide for future exploration in the XAI area.
Medical (3 papers)
【1】 ADCB: An Alzheimer's disease benchmark for evaluating observational estimators of causal effects
Link: https://arxiv.org/abs/2111.06811
Authors: Newton Mwai Kinyanjui, Fredrik D. Johansson
Affiliation: Chalmers University of Technology, Sweden
Note: Machine Learning for Health (ML4H) - Extended Abstract
Abstract: Simulators make unique benchmarks for causal effect estimation since they do not rely on unverifiable assumptions or the ability to intervene on real-world systems, but are often too simple to capture important aspects of real applications. We propose a simulator of Alzheimer's disease aimed at modeling intricacies of healthcare data while enabling benchmarking of causal effect and policy estimators. We fit the system to the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and ground hand-crafted components in results from comparative treatment trials and observational treatment patterns. The simulator includes parameters which alter the nature and difficulty of the causal inference tasks, such as latent variables, effect heterogeneity, length of observed history, behavior policy and sample size. We use the simulator to compare estimators of average and conditional treatment effects.
【2】 Deep Reinforcement Model Selection for Communications Resource Allocation in On-Site Medical Care
Link: https://arxiv.org/abs/2111.06680
Authors: Steffen Gracla, Edgar Beck, Carsten Bockelmann, Armin Dekorsy
Affiliation: Dept. of Communications Engineering, University of Bremen, Bremen, Germany
Abstract: Greater capabilities of mobile communications technology enable interconnection of on-site medical care at a scale previously unavailable. However, embedding such critical, demanding tasks into the already complex infrastructure of mobile communications proves challenging. This paper explores a resource allocation scenario where a scheduler must balance mixed performance metrics among connected users. To fulfill this resource allocation task, we present a scheduler that adaptively switches between different model-based scheduling algorithms. We make use of a deep Q-Network to learn the benefit of selecting a scheduling paradigm for a given situation, combining advantages from model-driven and data-driven approaches. The resulting ensemble scheduler is able to combine its constituent algorithms to maximize a sum-utility cost function while ensuring performance on designated high-priority users.
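A sketch of the model-selection idea: a small Q-network scores each candidate model-based scheduler for the current state, and the ensemble applies the argmax choice (shapes and names are illustrative, not the authors' implementation):

    import torch
    import torch.nn as nn

    class SchedulerSelector(nn.Module):
        def __init__(self, state_dim, num_schedulers):
            super().__init__()
            self.q = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                   nn.Linear(128, num_schedulers))

        def pick(self, state, schedulers, eps=0.05):
            # Epsilon-greedy choice among model-based scheduling algorithms;
            # the chosen algorithm then allocates the actual resources.
            if torch.rand(()) < eps:
                idx = int(torch.randint(len(schedulers), ()).item())
            else:
                idx = int(self.q(state).argmax().item())
            return idx, schedulers[idx]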
【3】 Using Deep Learning Sequence Models to Identify SARS-CoV-2 Divergence
Link: https://arxiv.org/abs/2111.06593
Authors: Yanyi Ding, Zhiyi Kuang, Yuxin Pei, Jeff Tan, Ziyu Zhang, Joseph Konan
Affiliation: Carnegie Mellon University, Pittsburgh, PA
Abstract: SARS-CoV-2 is an upper respiratory system RNA virus that, as of May 2021, had caused over 3 million deaths and infected over 150 million people worldwide. With thousands of strains sequenced to date, SARS-CoV-2 mutations pose significant challenges for scientists keeping pace with vaccine development and public health measures. Therefore, an efficient method for identifying the divergence of lab samples from patients would greatly aid the documentation of SARS-CoV-2 genomics. In this study, we propose a neural network model that leverages recurrent and convolutional units to directly take in amino acid sequences of spike proteins and classify the corresponding clades. We also compared our model's performance with Bidirectional Encoder Representations from Transformers (BERT) pre-trained on a protein database. Our approach has the potential to provide a more computationally efficient alternative to current homology-based intra-species differentiation.
Distillation | knowledge extraction (1 paper)
【1】 Extraction of Medication Names from Twitter Using Augmentation and an Ensemble of Language Models
Link: https://arxiv.org/abs/2111.06664
Authors: Igor Kulev, Berkay Köprü, Raul Rodriguez-Esteban, Diego Saldana, Yi Huang, Alessandro La Torraca, Elif Ozkirimli
Affiliations: Pharmaceutical Research and Early Development, Roche Innovation Center Basel, Switzerland; Personalized Healthcare Center of Excellence, F. Hoffmann-La Roche Ltd, Basel, Switzerland
Note: Proceedings of the BioCreative VII Challenge Evaluation Workshop
Abstract: The BioCreative VII Track 3 challenge focused on the identification of medication names in Twitter user timelines. For our submission to this challenge, we expanded the available training data using several data augmentation techniques. The augmented data was then used to fine-tune an ensemble of language models that had been pre-trained on general-domain Twitter content. The proposed approach outperformed the prior state-of-the-art algorithm Kusuri and ranked high in the competition for our selected objective function, the overlapping F1 score.
Clustering (2 papers)
【1】 Hierarchical Clustering: New Bounds and Objective
Link: https://arxiv.org/abs/2111.06863
Authors: Mirmahdi Rahgoshay, Mohammad R. Salavatipour
Affiliation: Department of Computing Science, University of Alberta
Abstract: Hierarchical clustering has been studied and used extensively as a method for analysis of data. More recently, Dasgupta [2016] defined a precise objective function. Given a set of $n$ data points with a weight function $w_{i,j}$ for each two items $i$ and $j$ denoting their similarity/dissimilarity, the goal is to build a recursive (tree-like) partitioning of the data points (items) into successively smaller clusters. He defined the cost function for a tree $T$ to be $Cost(T) = \sum_{i,j \in [n]} \big(w_{i,j} \times |T_{i,j}|\big)$, where $T_{i,j}$ is the subtree rooted at the least common ancestor of $i$ and $j$, and presented the first approximation algorithm for such clustering. Then Moseley and Wang [2017] considered the dual of Dasgupta's objective function for similarity-based weights and showed that both random partitioning and average linkage have approximation ratio $1/3$, which has been improved in a series of works to $0.585$ [Alon et al. 2020]. Later Cohen-Addad et al. [2019] considered the same objective function as Dasgupta's but for dissimilarity-based metrics, called $Rev(T)$. It is shown that both random partitioning and average linkage have ratio $2/3$, which has been only slightly improved to $0.667078$ [Charikar et al. SODA 2020]. Our first main result is to consider $Rev(T)$ and present a more delicate algorithm and careful analysis that achieves approximation $0.71604$. We also introduce a new objective function for dissimilarity-based clustering. For any tree $T$, let $H_{i,j}$ be the number of $i$ and $j$'s common ancestors. Intuitively, items that are similar are expected to remain within the same cluster as deep as possible. So, for dissimilarity-based metrics, we suggest the cost of each tree $T$, which we want to minimize, to be $Cost_H(T) = \sum_{i,j \in [n]} \big(w_{i,j} \times H_{i,j}\big)$. We present a $1.3977$-approximation for this objective.
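For orientation, the two maximization duals referred to above but not written out are usually stated as follows (our paraphrase; see the cited papers for the precise definitions): Moseley and Wang maximize $\sum_{i,j \in [n]} w_{i,j}\,(n - |T_{i,j}|)$ for similarity weights, while $Rev(T) = \sum_{i,j \in [n]} w_{i,j}\,|T_{i,j}|$ is maximized for dissimilarity weights.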
【2】 Detecting Quality Problems in Data Models by Clustering Heterogeneous Data Values
Link: https://arxiv.org/abs/2111.06661
Authors: Viola Wenz, Arno Kesper, Gabriele Taentzer
Affiliation: Philipps-Universität Marburg
Note: 17 pages. This paper is an extended version of a paper to be published in "MoDELS '21: ACM/IEEE 24th International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings". It was presented at the 3rd Workshop on Artificial Intelligence and Model-driven Engineering.
Abstract: Data is of high quality if it is fit for its intended use. The quality of data is influenced by the underlying data model and its quality. One major quality problem is the heterogeneity of data, as quality aspects such as understandability and interoperability are impaired. This heterogeneity may be caused by quality problems in the data model. Data heterogeneity can occur in particular when the information given is not structured enough and just captured in data values, often due to missing or non-suitable structure in the underlying data model. We propose a bottom-up approach to detecting quality problems in data models that manifest in heterogeneous data values. It supports an explorative analysis of the existing data and can be configured by domain experts according to their domain knowledge. All values of a selected data field are clustered by syntactic similarity. Thereby an overview of the data values' diversity in syntax is provided. It shall help domain experts to understand how the data model is used in practice and to derive potential quality problems of the data model. We outline a proof-of-concept implementation and evaluate our approach using cultural heritage data.
Inference | analysis | understanding | interpretation (3 papers)
【1】 Understanding the Information Needs and Practices of Human Supporters of an Online Mental Health Intervention to Inform Machine Learning Applications
Link: https://arxiv.org/abs/2111.06667
Author: Anja Thieme
Affiliation: Microsoft Research
Note: 41 pages, 3 figures, 3 tables
Abstract: In the context of digital therapy interventions, such as internet-delivered Cognitive Behavioral Therapy (iCBT) for the treatment of depression and anxiety, extensive research has shown how the involvement of a human supporter or coach, who assists the person undergoing treatment, improves user engagement in therapy and leads to more effective health outcomes than unsupported interventions. Seeking to maximize the effects and outcomes of this human support, this research investigates how new opportunities provided through recent advances in the field of AI and machine learning (ML) can contribute useful data insights to effectively support the work practices of iCBT supporters. This paper reports detailed findings of an interview study with 15 iCBT supporters that deepens understanding of their existing work practices and information needs, with the aim to meaningfully inform the development of useful, implementable ML applications, particularly in the context of iCBT treatment for depression and anxiety. The analysis contributes (1) a set of six themes that summarize the strategies and challenges that iCBT supporters encounter in providing effective, personalized feedback to their mental health clients; and, in response to these learnings, (2) concrete opportunities, for each theme, for how methods of ML could help support and address the identified challenges and information needs. It closes with reflections on the potential social, emotional and pragmatic implications of introducing new machine-generated data insights within supporter-led client review practices.
【2】 On-the-Fly Rectification for Robust Large-Vocabulary Topic Inference
Link: https://arxiv.org/abs/2111.06580
Authors: Moontae Lee, Sungjun Cho, Kun Dong, David Mimno, David Bindel
Affiliation: Cornell University
Abstract: Across many data domains, co-occurrence statistics about the joint appearance of objects are powerfully informative. By transforming unsupervised learning problems into decompositions of co-occurrence statistics, spectral algorithms provide transparent and efficient algorithms for posterior inference such as latent topic analysis and community detection. As object vocabularies grow, however, it becomes rapidly more expensive to store and run inference algorithms on co-occurrence statistics. Rectifying co-occurrence, the key process to uphold model assumptions, becomes increasingly more vital in the presence of rare terms, but current techniques cannot scale to large vocabularies. We propose novel methods that simultaneously compress and rectify co-occurrence statistics, scaling gracefully with the size of vocabulary and the dimension of latent space. We also present new algorithms learning latent variables from the compressed statistics, and verify that our methods perform comparably to previous approaches on both textual and non-textual data.
【3】 Variational Auto-Encoder Architectures that Excel at Causal Inference
Link: https://arxiv.org/abs/2111.06486
Authors: Negar Hassanpour, Russell Greiner
Affiliations: Department of Computing Science, University of Alberta; Amii, Edmonton, Canada
Abstract: Estimating causal effects from observational data (at either an individual or a population level) is critical for making many types of decisions. One approach to address this task is to learn decomposed representations of the underlying factors of data; this becomes significantly more challenging when there are confounding factors (which influence both the cause and the effect). In this paper, we take a generative approach that builds on the recent advances in Variational Auto-Encoders to simultaneously learn those underlying factors as well as the causal effects. We propose a progressive sequence of models, where each improves over the previous one, culminating in the Hybrid model. Our empirical results demonstrate that the performance of all three proposed models is superior to both state-of-the-art discriminative as well as other generative approaches in the literature.
Detection-related (3 papers)
【1】 Multimodal Virtual Point 3D Detection
Link: https://arxiv.org/abs/2111.06881
Authors: Tianwei Yin, Xingyi Zhou, Philipp Krähenbühl
Affiliation: UT Austin
Note: NeurIPS 2021, code available at this https URL
Abstract: Lidar-based sensing drives current autonomous vehicles. Despite rapid progress, current Lidar sensors still lag two decades behind traditional color cameras in terms of resolution and cost. For autonomous driving, this means that large objects close to the sensors are easily visible, but far-away or small objects comprise only one measurement or two. This is an issue, especially when these objects turn out to be driving hazards. On the other hand, these same objects are clearly visible in onboard RGB sensors. In this work, we present an approach to seamlessly fuse RGB sensors into Lidar-based 3D recognition. Our approach takes a set of 2D detections to generate dense 3D virtual points to augment an otherwise sparse 3D point cloud. These virtual points naturally integrate into any standard Lidar-based 3D detectors along with regular Lidar measurements. The resulting multi-modal detector is simple and effective. Experimental results on the large-scale nuScenes dataset show that our framework improves a strong CenterPoint baseline by a significant 6.6 mAP, and outperforms competing fusion approaches. Code and more visualizations are available at https://tianweiy.github.io/mvp/
【2】 Alleviating the transit timing variation bias in transit surveys. I. RIVERS: Method and detection of a pair of resonant super-Earths around Kepler-1705
Link: https://arxiv.org/abs/2111.06825
Authors: A. Leleu, G. Chatel, S. Udry, Y. Alibert, J.-B. Delisle, R. Mardling
Affiliations: Observatoire de Genève, Université de Genève, Chemin Pegasi, Versoix, Switzerland; Physikalisches Institut, Universität Bern, Bern, Switzerland; Disaitek, www.disaitek.ai
Abstract: Transit timing variations (TTVs) can provide useful information for systems observed by transit, as they allow us to put constraints on the masses and eccentricities of the observed planets, or even to constrain the existence of non-transiting companions. However, TTVs can also act as a detection bias that can prevent the detection of small planets in transit surveys that would otherwise be detected by standard algorithms such as the Box Least Squares algorithm (BLS) if their orbit was not perturbed. This bias is especially present for surveys with a long baseline, such as Kepler, some of the TESS sectors, and the upcoming PLATO mission. Here we introduce a detection method that is robust to large TTVs, and illustrate its use by recovering and confirming a pair of resonant super-Earths with ten-hour TTVs around Kepler-1705. The method is based on a neural network trained to recover the tracks of low-signal-to-noise-ratio (S/N) perturbed planets in river diagrams. We recover the transit parameters of these candidates by fitting the light curve. The individual transit S/N of Kepler-1705b and c are about three times lower than that of all previously known planets with TTVs of 3 hours or more, pushing the boundaries in the recovery of these small, dynamically active planets. Recovering this type of object is essential for obtaining a complete picture of the observed planetary systems, and for addressing a bias not often taken into account in statistical studies of exoplanet populations. In addition, TTVs are a means of obtaining mass estimates, which can be essential for studying the internal structure of planets discovered by transit surveys. Finally, we show that due to the strong orbital perturbations, it is possible that the spin of the outer resonant planet of Kepler-1705 is trapped in a sub- or super-synchronous spin-orbit resonance.
【3】 A Time-Series Scale Mixture Model of EEG with a Hidden Markov Structure for Epileptic Seizure Detection
Link: https://arxiv.org/abs/2111.06526
Authors: Akira Furui, Tomoyuki Akiyama, Toshio Tsuji
Affiliations: Graduate School of Advanced Science and Engineering, Hiroshima University; Department of Child Neurology, Okayama University Hospital
Note: Accepted at EMBC 2021
Abstract: In this paper, we propose a time-series stochastic model based on a scale mixture distribution with Markov transitions to detect epileptic seizures in electroencephalography (EEG). In the proposed model, an EEG signal at each time point is assumed to be a random variable following a Gaussian distribution. The covariance matrix of the Gaussian distribution is weighted with a latent scale parameter, which is also a random variable, resulting in stochastic fluctuations of covariances. By introducing a latent state variable with a Markov chain in the background of this stochastic relationship, time-series changes in the distribution of latent scale parameters can be represented according to the state of epileptic seizures. In an experiment, we evaluated the performance of the proposed model for seizure detection using EEGs with multiple frequency bands decomposed from a clinical dataset. The results demonstrated that the proposed model can detect seizures with high sensitivity and outperformed several baselines.
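In symbols, the generative story described above reads, up to priors the abstract does not pin down:

$$ \mathbf{x}_t \mid z_t, s_t \sim \mathcal{N}\big(\boldsymbol{\mu}_{z_t},\, s_t \boldsymbol{\Sigma}_{z_t}\big), \qquad s_t \sim p(s \mid z_t), \qquad P(z_t = j \mid z_{t-1} = i) = \pi_{ij}, $$

where the latent scale $s_t$ makes the marginal of $\mathbf{x}_t$ a scale mixture of Gaussians, and $z_t$ is the hidden Markov seizure state.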
Classification | recognition (2 papers)
【1】 Domain Generalization on Efficient Acoustic Scene Classification using Residual Normalization
Link: https://arxiv.org/abs/2111.06531
Authors: Byeonggeun Kim, Seunghan Yang, Jangho Kim, Simyung Chang
Affiliations: Qualcomm AI Research; Qualcomm Korea YH, Seoul, Republic of Korea; Seoul National University, Seoul, Republic of Korea
Note: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021)
Abstract: It is a practical research topic how to deal with multi-device audio inputs by a single acoustic scene classification system with efficient design. In this work, we propose Residual Normalization, a novel feature normalization method that uses frequency-wise instance normalization with a shortcut path to discard unnecessary device-specific information without losing useful information for classification. Moreover, we introduce an efficient architecture, BC-ResNet-ASC, a modified version of the baseline architecture with a limited receptive field. BC-ResNet-ASC outperforms the baseline architecture even though it contains a small number of parameters. Through three model compression schemes: pruning, quantization, and knowledge distillation, we can reduce model complexity further while mitigating the performance degradation. The proposed system achieves an average test accuracy of 76.3% on the TAU Urban Acoustic Scenes 2020 Mobile development dataset with 315k parameters, and an average test accuracy of 75.3% after compression to 61.0KB of non-zero parameters. The proposed method won 1st place in the DCASE 2021 Challenge, Task 1A.
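One plausible reading of "frequency-wise instance normalization with a shortcut path" in code; the shortcut constant lam and the exact normalization axes are assumptions, so see the paper for the definitive form:

    import torch

    def residual_norm(x, lam=0.1, eps=1e-5):
        # x: (batch, channels, freq, time) log-mel features. Normalize each
        # frequency bin using statistics over channels and time, then add a
        # scaled identity shortcut so useful content is not thrown away.
        mu = x.mean(dim=(1, 3), keepdim=True)
        var = x.var(dim=(1, 3), keepdim=True, unbiased=False)
        return lam * x + (x - mu) / torch.sqrt(var + eps)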
【2】 Multiple Hypothesis Hypergraph Tracking for Posture Identification in Embryonic Caenorhabditis elegans
Link: https://arxiv.org/abs/2111.06425
Authors: Andrew Lauziere, Evan Ardiel, Stephen Xu, Hari Shroff
Affiliations: Department of Mathematics, University of Maryland, College Park, MD; Department of Molecular Biology at MGH, Harvard Medical School, Boston, MA; Laboratory of High Resolution Optical Imaging, National Institutes of Health, Bethesda, MD
Abstract: Current methods in multiple object tracking (MOT) rely on independent object trajectories undergoing predictable motion to effectively track large numbers of objects. Adversarial conditions such as volatile object motion and imperfect detections create a challenging tracking landscape in which established methods may yield inadequate results. Multiple hypothesis hypergraph tracking (MHHT) is developed to perform MOT among interdependent objects amid noisy detections. The method extends traditional multiple hypothesis tracking (MHT) via hypergraphs to model correlated object motion, allowing for robust tracking in challenging scenarios. MHHT is applied to perform seam cell tracking during late-stage embryogenesis in embryonic C. elegans.
Encoders (1 paper)
【1】 PESTO: Switching Point based Dynamic and Relative Positional Encoding for Code-Mixed Languages
Link: https://arxiv.org/abs/2111.06599
Authors: Mohsin Ali, Kandukuri Sai Teja, Sumanth Manduru, Parth Patwa, Amitava Das
Affiliations: IIIT Sri City, India; UCLA, USA; Wipro AI Labs, India; AI Institute, University of South Carolina, USA
Note: Accepted as Student Abstract at AAAI 2022
Abstract: NLP applications for code-mixed (CM) or mixed-lingual text have gained significant momentum recently, the main reason being the prevalence of language mixing in social media communications in multilingual societies like India, Mexico, Europe, parts of the USA, etc. Word embeddings are the basic building blocks of any NLP system today, yet word embeddings for CM languages are an unexplored territory. The major bottleneck for CM word embeddings is the switching points, where the language switches. These locations lack context, and statistical systems fail to model this phenomenon due to the high variance in the seen examples. In this paper we present our initial observations on applying switching point based positional encoding techniques to CM language, specifically Hinglish (Hindi-English). Results are only marginally better than SOTA, but it is evident that positional encoding could be an effective way to train position-sensitive language models for CM text.
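One simple way to make positions switching-point aware, in the spirit described above (this exact scheme is an illustration, not necessarily PESTO's): restart the position counter at every language switch, so tokens are indexed relative to the last switching point:

    def relative_positions(lang_tags):
        # lang_tags: per-token language labels, e.g. ['hi','hi','en','en','hi'].
        pos, cur = [], 0
        for i, tag in enumerate(lang_tags):
            if i > 0 and tag != lang_tags[i - 1]:
                cur = 0  # switching point: restart counting
            pos.append(cur)
            cur += 1
        return pos

    # relative_positions(['hi','hi','en','en','en','hi']) -> [0, 1, 0, 1, 2, 0]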
Optimization | convergence (4 papers)
【1】 Q-Learning for MDPs with General Spaces: Convergence and Near Optimality via Quantization under Weak Continuity
Link: https://arxiv.org/abs/2111.06781
Authors: Ali Devran Kara, Naci Saldi, Serdar Yüksel
Affiliation: Department of Mathematics and Statistics, Queen's University
Abstract: Reinforcement learning algorithms often require finiteness of state and action spaces in Markov decision processes (MDPs), and various efforts have been made in the literature towards the applicability of such algorithms for continuous state and action spaces. In this paper, we show that under very mild regularity conditions (in particular, involving only weak continuity of the transition kernel of an MDP), Q-learning for standard Borel MDPs via quantization of states and actions converges to a limit, and furthermore this limit satisfies an optimality equation which leads to near optimality, either with explicit performance bounds or with guarantees of asymptotic optimality. Our approach builds on (i) viewing quantization as a measurement kernel and thus a quantized MDP as a POMDP, (ii) utilizing near optimality and convergence results of Q-learning for POMDPs, and (iii) finally, near-optimality of finite state model approximations for MDPs with weakly continuous kernels, which we show to correspond to the fixed point of the constructed POMDP. Thus, our paper presents a very general convergence and approximation result for the applicability of Q-learning for continuous MDPs.
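The construction is easy to instantiate: quantize the continuous state onto a finite grid and run ordinary tabular Q-learning on the quantized process. A minimal sketch, with a box-shaped state space assumed for simplicity:

    import numpy as np

    def quantize(s, low, high, bins):
        # Map a continuous state in [low, high]^d to a tuple of bin indices.
        idx = ((np.asarray(s) - low) / (high - low) * bins).astype(int)
        return tuple(np.clip(idx, 0, bins - 1))

    def q_update(Q, s, a, r, s_next, low, high, bins, alpha=0.1, gamma=0.99):
        # One tabular Q-learning step on the quantized (finite) model.
        # Q: ndarray of shape (bins,) * state_dim + (num_actions,).
        qs = quantize(s, low, high, bins)
        qn = quantize(s_next, low, high, bins)
        Q[qs + (a,)] += alpha * (r + gamma * Q[qn].max() - Q[qs + (a,)])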
【2】 Approximating Optimal Transport via Low-rank and Sparse Factorization 标题:利用低秩稀疏因式分解逼近最优传输 链接:https://arxiv.org/abs/2111.06546
作者:Weijie Liu,Chao Zhang,Nenggan Zheng,Hui Qian 机构: Zhejiang University 摘要:最优传输(OT)自然出现在广泛的机器学习应用中,但往往成为计算瓶颈。最近,有一系列工作提出通过在低秩子空间中搜索传输计划(transport plan)来近似求解OT。然而,最优传输计划往往不是低秩的,这往往会产生较大的近似误差。例如,当Monge传输映射(transport map)存在时,传输计划是满秩的。本文讨论了以足够的精度和效率计算OT距离的问题。提出了一种新的OT近似方法,将传输计划分解为低秩矩阵和稀疏矩阵之和。我们从理论上分析了近似误差,并设计了一种增广拉格朗日方法来有效地计算传输计划。 摘要:Optimal transport (OT) naturally arises in a wide range of machine learning applications but may often become the computational bottleneck. Recently, one line of works proposes to solve OT approximately by searching the transport plan in a low-rank subspace. However, the optimal transport plan is often not low-rank, which tends to yield large approximation errors. For example, when Monge's transport map exists, the transport plan is full rank. This paper concerns the computation of the OT distance with adequate accuracy and efficiency. A novel approximation for OT is proposed, in which the transport plan can be decomposed into the sum of a low-rank matrix and a sparse one. We theoretically analyze the approximation error. An augmented Lagrangian method is then designed to efficiently calculate the transport plan.
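As intuition for the decomposition itself (not the authors' augmented Lagrangian solver), a generic low-rank-plus-sparse split P ≈ L + S of a given plan matrix can be obtained by alternating a truncated SVD for L with elementwise soft-thresholding for S; the rank, threshold, and toy coupling below are assumptions for illustration.

```python
import numpy as np

def lowrank_plus_sparse(P, rank=5, lam=0.05, iters=50):
    """Alternately fit L (truncated SVD) and S (soft-thresholding) so P ~ L + S."""
    L, S = np.zeros_like(P), np.zeros_like(P)
    for _ in range(iters):
        # Low-rank step: best rank-r approximation of the residual P - S.
        U, sv, Vt = np.linalg.svd(P - S, full_matrices=False)
        L = (U[:, :rank] * sv[:rank]) @ Vt[:rank]
        # Sparse step: soft-threshold the residual P - L.
        R = P - L
        S = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)
    return L, S

P = np.random.default_rng(1).random((64, 64))
P /= P.sum()                      # a toy transport-plan-like coupling
L, S = lowrank_plus_sparse(P)
print(np.linalg.matrix_rank(L), (S != 0).mean())
```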
【3】 Multi-Step Budgeted Bayesian Optimization with Unknown Evaluation Costs 标题:评估费用未知的多步预算贝叶斯优化 链接:https://arxiv.org/abs/2111.06537
作者:Raul Astudillo,Daniel R. Jiang,Maximilian Balandat,Eytan Bakshy,Peter I. Frazier 机构:Cornell University, Facebook 备注:In Advances in Neural Information Processing Systems, 2021 摘要:贝叶斯优化(BO)是一种样本高效的方法,用于优化评估代价高昂的黑盒函数。大多数BO方法忽略了评估成本在优化域上的变化。然而,这些成本可能具有高度的异质性,并且通常事先未知。这发生在许多实际环境中,例如机器学习算法的超参数调整或基于物理的模拟优化。此外,少数承认成本异质性的现有方法不能自然地适应总评估成本的预算约束。这种未知成本和预算约束的组合为探索-利用权衡引入了一个新的维度,即了解成本本身会产生成本。现有的方法没有以一种有原则的方式对这个问题的各种权衡进行推理,常常导致性能不佳。我们通过证明预期改进和单位成本预期改进(可以说是实践中使用最广泛的两个采集函数)相对于最优非短视策略可能任意差,从而形式化了这一论断。为了克服现有方法的缺点,我们提出了预算多步骤预期改进,这是一个非短视的采集函数,将经典的预期改进推广到异质且未知评估成本的设置。最后,我们证明了我们的采集函数在各种合成和实际问题上优于现有方法。 摘要:Bayesian optimization (BO) is a sample-efficient approach to optimizing costly-to-evaluate black-box functions. Most BO methods ignore how evaluation costs may vary over the optimization domain. However, these costs can be highly heterogeneous and are often unknown in advance. This occurs in many practical settings, such as hyperparameter tuning of machine learning algorithms or physics-based simulation optimization. Moreover, those few existing methods that acknowledge cost heterogeneity do not naturally accommodate a budget constraint on the total evaluation cost. This combination of unknown costs and a budget constraint introduces a new dimension to the exploration-exploitation trade-off, where learning about the cost incurs the cost itself. Existing methods do not reason about the various trade-offs of this problem in a principled way, leading often to poor performance. We formalize this claim by proving that the expected improvement and the expected improvement per unit of cost, arguably the two most widely used acquisition functions in practice, can be arbitrarily inferior with respect to the optimal non-myopic policy. To overcome the shortcomings of existing approaches, we propose the budgeted multi-step expected improvement, a non-myopic acquisition function that generalizes classical expected improvement to the setting of heterogeneous and unknown evaluation costs. Finally, we show that our acquisition function outperforms existing methods in a variety of synthetic and real problems.
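For reference, the two baseline acquisition functions the paper proves can be arbitrarily suboptimal — expected improvement (EI) and EI per unit cost — have the familiar closed forms sketched below (minimization convention); the posterior means/standard deviations and the cost values are made-up stand-ins.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_f):
    """Closed-form EI under a Gaussian posterior at each candidate (minimization)."""
    sigma = np.maximum(sigma, 1e-12)
    z = (best_f - mu) / sigma
    return (best_f - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def ei_per_unit_cost(mu, sigma, best_f, cost):
    """Cost-aware heuristic: EI divided by the (estimated) evaluation cost."""
    return expected_improvement(mu, sigma, best_f) / np.maximum(cost, 1e-12)

# Illustrative posterior over 5 candidates with heterogeneous costs.
mu     = np.array([0.30, 0.10, 0.40, 0.20, 0.25])
sigma  = np.array([0.05, 0.20, 0.10, 0.15, 0.02])
cost   = np.array([1.0, 10.0, 0.5, 2.0, 1.0])
best_f = 0.2
print(expected_improvement(mu, sigma, best_f).round(4))
print(ei_per_unit_cost(mu, sigma, best_f, cost).round(4))
```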
【4】 Convergence Rates for the MAP of an Exponential Family and Stochastic Mirror Descent -- an Open Problem 标题:指数族映射的收敛速度与随机镜像下降--一个公开问题 链接:https://arxiv.org/abs/2111.06826
作者:Rémi Le Priol,Frederik Kunstner,Damien Scieur,Simon Lacoste-Julien 机构:Mila, Université de Montréal, University of British Columbia, Samsung SAIT AI Lab Montreal, Canada CIFAR AI Chair 备注:9 pages, 3 figures, appendix 摘要:我们以非渐近的方式考虑指数族的最大似然估计(MLE)或共轭最大后验估计(MAP)的期望对数似然次优性的上界问题。令人惊讶的是,我们在文献中没有找到这个问题的一般解决方案。特别是,当前的理论不适用于高斯分布或有趣的少样本区域。在展示了问题的各个方面之后,我们证明可以将MAP解释为在对数似然上运行随机镜像下降(SMD)。然而,现代收敛结果并不适用于指数族的标准示例,这突出了收敛性文献中的空白。我们相信,解决这个非常基础的问题可能会给统计和优化社区带来进步。 摘要:We consider the problem of upper bounding the expected log-likelihood sub-optimality of the maximum likelihood estimate (MLE), or a conjugate maximum a posteriori (MAP) for an exponential family, in a non-asymptotic way. Surprisingly, we found no general solution to this problem in the literature. In particular, current theories do not hold for a Gaussian or in the interesting few samples regime. After exhibiting various facets of the problem, we show we can interpret the MAP as running stochastic mirror descent (SMD) on the log-likelihood. However, modern convergence results do not apply for standard examples of the exponential family, highlighting holes in the convergence literature. We believe solving this very fundamental problem may bring progress to both the statistics and optimization communities.
预测|估计(6篇)
【1】 AWD3: Dynamic Reduction of the Estimation Bias 标题:AWD3:动态减小估计偏差 链接:https://arxiv.org/abs/2111.06780
作者:Dogan C. Cicek,Enes Duran,Baturay Saglam,Kagan Kaya,Furkan B. Mutlu,Suleyman S. Kozat 机构:Electrical and Electronics Engineering Department, Bilkent University, Ankara, Turkey 备注:Accepted at The 33rd IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2021) 摘要:基于值的深度强化学习(RL)算法存在主要由函数逼近和时序差分(TD)学习引起的估计偏差。该问题会导致错误的状态-动作值估计,从而损害学习算法的性能和鲁棒性。尽管已经提出了几种技术来解决这个问题,但学习算法仍然存在这种偏差。在这里,我们介绍了一种技术,它利用经验回放机制消除了非策略连续控制算法中的估计偏差。我们在加权双延迟深度确定性策略梯度算法中自适应地学习加权超参数beta。我们的方法称为自适应WD3(AWD3)。我们通过OpenAI gym的连续控制环境表明,我们的算法匹配或优于最先进的非策略梯度学习算法。 摘要:Value-based deep Reinforcement Learning (RL) algorithms suffer from the estimation bias primarily caused by function approximation and temporal difference (TD) learning. This problem induces faulty state-action value estimates and therefore harms the performance and robustness of the learning algorithms. Although several techniques have been proposed to tackle this problem, learning algorithms still suffer from this bias. Here, we introduce a technique that eliminates the estimation bias in off-policy continuous control algorithms using the experience replay mechanism. We adaptively learn the weighting hyper-parameter beta in the Weighted Twin Delayed Deep Deterministic Policy Gradient algorithm. Our method is named Adaptive-WD3 (AWD3). We show through continuous control environments of OpenAI gym that our algorithm matches or outperforms the state-of-the-art off-policy policy gradient learning algorithms.
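A heavily hedged sketch of the kind of target this family of methods computes: here we assume the weighted target is a beta-blend of the minimum and average of the two target critics (one common WD3-style formulation), with a toy rule nudging beta online — AWD3's actual adaptation rule is given in the paper and differs from this stand-in.

```python
import numpy as np

def weighted_td_target(r, q1_next, q2_next, beta, gamma=0.99, done=False):
    """Assumed form: Q_t = beta * min(Q1, Q2) + (1 - beta) * mean(Q1, Q2),
    tempering the underestimation of the pure-min (TD3) target."""
    q_min = np.minimum(q1_next, q2_next)
    q_avg = 0.5 * (q1_next + q2_next)
    return r + gamma * (1.0 - float(done)) * (beta * q_min + (1 - beta) * q_avg)

def adapt_beta(beta, q_estimate, observed_return, lr=0.01):
    """Toy stand-in for the adaptation: more pessimism (larger beta) when the
    critic overestimates an observed return, less otherwise."""
    return float(np.clip(beta + lr * np.sign(q_estimate - observed_return), 0.0, 1.0))

beta = 0.75
target = weighted_td_target(r=1.0, q1_next=10.2, q2_next=9.4, beta=beta)
beta = adapt_beta(beta, q_estimate=target, observed_return=9.0)
print(round(target, 3), beta)
```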
【2】 Identifying On-road Scenarios Predictive of ADHD using Driving Simulator Time Series Data 标题:利用驾驶模拟器时间序列数据识别可预测ADHD的道路场景 链接:https://arxiv.org/abs/2111.06774
作者:David Grethlein,Aleksanteri Sladek,Santiago Ontañón 机构:Drexel University, Philadelphia, Pennsylvania, USA, University of Pennsylvania 摘要:在本文中,我们介绍了一种称为迭代区段缩减(ISR)的新算法,用于自动识别时空时间序列中对目标分类任务具有预测性的子区间。具体地说,利用从驾驶模拟器研究中收集的数据,我们确定沿模拟路线的哪些空间区域(称为"区段")倾向于表现出能够预测注意力缺陷多动障碍(ADHD)存在的驾驶行为。识别这些区段非常重要,主要原因有两个:(1)通过过滤掉非预测性的时间序列子区间来提高训练模型的预测精度;(2)深入了解哪些道路场景(称为"事件")会使接受ADHD治疗的患者与未接受治疗的患者表现出明显不同的驾驶行为。我们的实验结果表明,与之前的工作相比,性能得到了改善(+10%的准确度),并且所识别的预测性区段与模拟器中脚本化的道路事件(转弯和弯道)之间具有良好的对应。 摘要:In this paper we introduce a novel algorithm called Iterative Section Reduction (ISR) to automatically identify sub-intervals of spatiotemporal time series that are predictive of a target classification task. Specifically, using data collected from a driving simulator study, we identify which spatial regions (dubbed "sections") along the simulated routes tend to manifest driving behaviors that are predictive of the presence of Attention Deficit Hyperactivity Disorder (ADHD). Identifying these sections is important for two main reasons: (1) to improve predictive accuracy of the trained models by filtering out non-predictive time series sub-intervals, and (2) to gain insights into which on-road scenarios (dubbed events) elicit distinctly different driving behaviors from patients undergoing treatment for ADHD versus those that are not. Our experimental results show both improved performance over prior efforts (+10% accuracy) and good alignment between the predictive sections identified and scripted on-road events in the simulator (negotiating turns and curves).
【3】 Review of Pedestrian Trajectory Prediction Methods: Comparing Deep Learning and Knowledge-based Approaches 标题:行人轨迹预测方法综述:深度学习和基于知识方法的比较 链接:https://arxiv.org/abs/2111.06740
作者:Raphael Korbmacher,Antoine Tordeux 机构: University of Wuppertal 备注:20 pages, 7 tables, 4 figures 摘要:在人群场景中,预测行人的轨迹是一项复杂且具有挑战性的任务,取决于许多外部因素。场景的拓扑结构和行人之间的相互作用只是其中的一部分。随着数据科学和数据采集技术的发展,深度学习方法已成为众多领域的研究热点。因此,越来越多的研究人员将这些方法应用于预测行人的轨迹,这并不奇怪。本文将这些相对较新的深度学习算法与广泛用于模拟行人动力学的经典知识模型进行了比较。它提供了这两种方法的全面文献综述,探讨了技术和面向应用的差异,解决了开放性问题以及未来的发展方向。我们的研究指出,由于深度学习算法的高精度,基于知识的模型预测局部轨迹的相关性现在是值得怀疑的。然而,深度学习算法用于大规模模拟和集体动力学描述的能力仍有待证明。此外,比较表明,两种方法的结合(混合方法)似乎有希望克服诸如深度学习方法缺乏可解释性等缺点。 摘要:In crowd scenarios, predicting trajectories of pedestrians is a complex and challenging task depending on many external factors. The topology of the scene and the interactions between the pedestrians are just some of them. Due to advancements in data-science and data collection technologies deep learning methods have recently become a research hotspot in numerous domains. Therefore, it is not surprising that more and more researchers apply these methods to predict trajectories of pedestrians. This paper compares these relatively new deep learning algorithms with classical knowledge-based models that are widely used to simulate pedestrian dynamics. It provides a comprehensive literature review of both approaches, explores technical and application oriented differences, and addresses open questions as well as future development directions. Our investigations point out that the pertinence of knowledge-based models to predict local trajectories is nowadays questionable because of the high accuracy of the deep learning algorithms. Nevertheless, the ability of deep-learning algorithms for large-scale simulation and the description of collective dynamics remains to be demonstrated. Furthermore, the comparison shows that the combination of both approaches (the hybrid approach) seems to be promising to overcome disadvantages like the missing explainability of the deep learning approach.
【4】 Mobility prediction Based on Machine Learning Algorithms 标题:基于机器学习算法的移动性预测 链接:https://arxiv.org/abs/2111.06723
作者:Donglin Wang,Qiuheng Zhou,Sanket Partani,Anjie Qiu,Hans D. Schotten 机构:University of Kaiserslautern, Kaiserslautern, Germany, German Research Center for Artificial Intelligence (DFKI) 备注:5 pages, 7 figures, MKT'21, Osnabrück 摘要:目前,移动通信在5G通信行业发展迅速。随着容量要求和体验质量要求的不断提高,移动性预测已广泛应用于移动通信,并已成为利用历史交通信息预测未来交通用户位置的关键使能技术之一,因为准确的移动性预测可以帮助实现高效的无线资源管理、协助路线规划、指导车辆调度或缓解交通拥堵。然而,由于复杂的交通网络,移动性预测是一个具有挑战性的问题。在过去的几年里,人们在这方面做了大量的研究,包括基于非机器学习(Non-ML)和基于机器学习(ML)的移动性预测。在本文中,我们首先介绍移动性预测的最新技术。然后,我们选择了支持向量机(SVM)这一ML算法对实际交通数据进行训练。最后,我们分析了移动性预测的仿真结果,并介绍了未来的工作计划,其中移动性预测将用于改善移动通信。 摘要:Nowadays mobile communication is growing fast in the 5G communication industry. With the increasing capacity requirements and requirements for quality of experience, mobility prediction has been widely applied to mobile communication and has become one of the key enablers that utilize historical traffic information to predict future locations of traffic users, since accurate mobility prediction can help enable efficient radio resource management, assist route planning, guide vehicle dispatching, or mitigate traffic congestion. However, mobility prediction is a challenging problem due to the complicated traffic network. In the past few years, plenty of research has been done in this area, including Non-Machine-Learning (Non-ML)-based and Machine-Learning (ML)-based mobility prediction. In this paper, firstly we introduce the state-of-the-art technologies for mobility prediction. Then, we select the Support Vector Machine (SVM) algorithm, an ML algorithm, for training on practical traffic data. Lastly, we analyse the simulation results for mobility prediction and introduce a future work plan where mobility prediction will be applied for improving mobile communication.
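A minimal version of the SVM setup — predicting a user's next cell from a short window of recently visited cells — might look like this with scikit-learn; the synthetic trajectory and window length are illustrative stand-ins for real traffic data.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic trajectory over 6 cells: usually move to the next cell, sometimes stay.
cells = [0]
for _ in range(2000):
    cells.append((cells[-1] + (1 if rng.random() < 0.8 else 0)) % 6)

# Features: the last `window` visited cells; label: the next cell.
window = 3
X = np.array([cells[i:i + window] for i in range(len(cells) - window)])
y = np.array([cells[i + window] for i in range(len(cells) - window)])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
print("next-cell accuracy:", round(clf.score(X_te, y_te), 3))
```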
【5】 A Reverse Jensen Inequality Result with Application to Mutual Information Estimation 标题:逆Jensen不等式结果及其在互信息估计中的应用 链接:https://arxiv.org/abs/2111.06676
作者:Gerhard Wunder,Benedikt Groß,Rick Fritschek,Rafael F. Schaefer 机构:Cybersecurity and AI Group, Freie Universität Berlin, Takustr., Berlin, Germany, Chair of Communications Engineering and Security, University of Siegen, Hölderlinstr., Siegen, Germany 备注:6 pages, ITW 2021 摘要:Jensen不等式是信息论和机器学习等众多领域中广泛使用的工具。它还可用于推导其他标准不等式,如算术-几何平均不等式或Hölder不等式。在概率环境中,Jensen不等式描述了凸函数与期望值之间的关系。在这项工作中,我们想从不等式的相反方向来看概率设置。我们证明了在最小约束条件下,通过适当的缩放,Jensen不等式可以被逆转。我们相信,由此产生的工具可用于许多应用,并提供互信息的变分估计,其中逆不等式导致一个新的估计量,其训练行为优于当前估计量。 摘要:The Jensen inequality is a widely used tool in a multitude of fields, such as for example information theory and machine learning. It can also be used to derive other standard inequalities such as the inequality of arithmetic and geometric means or the Hölder inequality. In a probabilistic setting, the Jensen inequality describes the relationship between a convex function and the expected value. In this work, we want to look at the probabilistic setting from the reverse direction of the inequality. We show that under minimal constraints and with a proper scaling, the Jensen inequality can be reversed. We believe that the resulting tool can be helpful for many applications and provide a variational estimation of mutual information, where the reverse inequality leads to a new estimator with superior training behavior compared to current estimators.
【6】 Learning Quantile Functions without Quantile Crossing for Distribution-free Time Series Forecasting 标题:无分布时间序列预测中无交叉分位数函数的学习 链接:https://arxiv.org/abs/2111.06581
作者:Youngsuk Park,Danielle Maddix,François-Xavier Aubet,Kelvin Kan,Jan Gasthaus,Yuyang Wang 机构:AWS AI Labs 备注:24 pages 摘要:分位数回归是量化不确定性、拟合具有挑战性的基础分布的有效技术,通常通过在多个分位数水平上的联合学习提供完整的概率预测。然而,这些联合分位数回归的一个常见缺点是分位数交叉(quantile crossing),它违反了条件分位数函数所应具有的单调性。在这项工作中,我们提出了增量(样条)分位数函数I(S)QF,这是一种灵活高效的无分布分位数估计框架,可通过一个简单的神经网络层解决分位数交叉问题。此外,I(S)QF可进行内插/外推,以预测与基础训练水平不同的任意分位数水平。借助于对I(S)QF表示的连续排序概率得分的解析评估,我们将我们的方法应用于基于神经网络的时间序列预测案例,在这些案例中,为非训练分位数水平节省昂贵的重新训练成本尤为显著。我们还提供了在序列到序列设置下我们提出的方法的泛化误差分析。最后,大量实验表明,与其他基线相比,一致性和精度误差均有所改善。 摘要:Quantile regression is an effective technique to quantify uncertainty, fit challenging underlying distributions, and often provide full probabilistic predictions through joint learnings over multiple quantile levels. A common drawback of these joint quantile regressions, however, is quantile crossing, which violates the desirable monotone property of the conditional quantile function. In this work, we propose the Incremental (Spline) Quantile Functions I(S)QF, a flexible and efficient distribution-free quantile estimation framework that resolves quantile crossing with a simple neural network layer. Moreover, I(S)QF interpolates/extrapolates to predict arbitrary quantile levels that differ from the underlying training ones. Equipped with the analytical evaluation of the continuous ranked probability score of I(S)QF representations, we apply our methods to NN-based times series forecasting cases, where the savings of the expensive re-training costs for non-trained quantile levels is particularly significant. We also provide a generalization error analysis of our proposed approaches under the sequence-to-sequence setting. Lastly, extensive experiments demonstrate the improvement of consistency and accuracy errors over other baselines.
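The structural trick — guaranteeing monotone quantiles by construction — can be sketched in a few lines: a network emits a base quantile plus non-negative increments (e.g., via softplus), and their cumulative sum yields quantile estimates that cannot cross. The tiny head below is an illustration of that idea, not the paper's I(S)QF architecture.

```python
import torch
import torch.nn as nn

class MonotoneQuantileHead(nn.Module):
    """Predict K non-crossing quantiles as base + cumulative positive increments."""
    def __init__(self, in_dim: int, n_quantiles: int):
        super().__init__()
        self.base = nn.Linear(in_dim, 1)                 # lowest quantile
        self.incr = nn.Linear(in_dim, n_quantiles - 1)   # gaps between quantiles

    def forward(self, x):
        q0 = self.base(x)
        gaps = nn.functional.softplus(self.incr(x))      # strictly positive gaps
        return torch.cat([q0, q0 + torch.cumsum(gaps, dim=-1)], dim=-1)

head = MonotoneQuantileHead(in_dim=8, n_quantiles=5)
q = head(torch.randn(4, 8))                  # (batch, 5), monotone along dim -1
print(bool((q[:, 1:] >= q[:, :-1]).all()))   # True: no quantile crossing
```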
其他神经网络|深度学习|模型|建模(16篇)
【1】 Monolithic Silicon Photonic Architecture for Training Deep Neural Networks with Direct Feedback Alignment 标题:用于训练直接反馈对准的深度神经网络的单片硅光子结构 链接:https://arxiv.org/abs/2111.06862
作者:Matthew J. Filipovich,Zhimu Guo,Mohammed Al-Qadasi,Bicky A. Marquez,Hugh D. Morison,Volker J. Sorger,Paul R. Prucnal,Sudip Shekhar,Bhavin J. Shastri 机构:Department of Physics, Engineering Physics and Astronomy, Queen's University, Kingston, ON, Canada, Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC, Canada 备注:11 pages, 5 figures 摘要:人工智能(AI)领域近年来取得了巨大的发展,然而,AI系统持续发展面临的一些最紧迫的挑战是电子计算机体系结构面临的基本带宽、能效和速度限制。人们对使用光子处理器来执行神经网络推理操作越来越感兴趣,然而这些网络目前使用标准数字电子学进行训练。在这里,我们提出了由CMOS兼容硅光子体系结构实现的神经网络片上训练,以利用大规模并行、高效和快速数据操作的潜力。我们的方案采用了直接反馈对齐训练算法,该算法使用误差反馈而不是误差反向传播来训练神经网络,并且可以以每秒数万亿次乘法累加(MAC)操作的速度运行,同时每个MAC操作消耗不到一皮焦耳(picojoule)的能量。光子体系结构利用微环谐振器阵列的并行矩阵矢量乘法来处理沿单波导总线的多通道模拟信号,以原位计算每个神经网络层的梯度矢量,这是在反向传递过程中执行的计算成本最高的操作。我们还通过实验演示了使用片上MAC操作结果,使用MNIST数据集训练深度神经网络。我们的高效、超快神经网络训练新方法展示了光子学作为执行AI应用程序的一个有前途的平台。 摘要:The field of artificial intelligence (AI) has witnessed tremendous growth in recent years, however some of the most pressing challenges for the continued development of AI systems are the fundamental bandwidth, energy efficiency, and speed limitations faced by electronic computer architectures. There has been growing interest in using photonic processors for performing neural network inference operations, however these networks are currently trained using standard digital electronics. Here, we propose on-chip training of neural networks enabled by a CMOS-compatible silicon photonic architecture to harness the potential for massively parallel, efficient, and fast data operations. Our scheme employs the direct feedback alignment training algorithm, which trains neural networks using error feedback rather than error backpropagation, and can operate at speeds of trillions of multiply-accumulate (MAC) operations per second while consuming less than one picojoule per MAC operation. The photonic architecture exploits parallelized matrix-vector multiplications using arrays of microring resonators for processing multi-channel analog signals along single waveguide buses to calculate the gradient vector of each neural network layer in situ, which is the most computationally expensive operation performed during the backward pass. We also experimentally demonstrate training a deep neural network with the MNIST dataset using on-chip MAC operation results. Our novel approach for efficient, ultra-fast neural network training showcases photonics as a promising platform for executing AI applications.
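The direct feedback alignment rule the chip implements is simple to state in software: each hidden layer receives the output error through its own fixed random matrix instead of the transposed forward weights. A small numpy sketch on a toy regression task (layer sizes and learning rate are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, lr = 8, 32, 2, 0.05

W1 = rng.normal(0, 0.3, (n_in, n_hid))
W2 = rng.normal(0, 0.3, (n_hid, n_out))
B1 = rng.normal(0, 0.3, (n_out, n_hid))  # fixed random feedback matrix (never trained)

X = rng.normal(size=(256, n_in))
Y = np.tanh(X @ rng.normal(size=(n_in, n_out)))  # toy targets

for epoch in range(200):
    h = np.tanh(X @ W1)            # forward pass
    y_hat = h @ W2
    e = y_hat - Y                  # output error
    # DFA: project the error straight to the hidden layer via fixed B1,
    # instead of backpropagating through W2.T.
    dh = (e @ B1) * (1 - h**2)     # tanh'(z) = 1 - tanh(z)^2
    W2 -= lr * h.T @ e / len(X)
    W1 -= lr * X.T @ dh / len(X)
print("final MSE:", round(float((e**2).mean()), 4))
```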
【2】 A posteriori learning of quasi-geostrophic turbulence parametrization: an experiment on integration steps 标题:准地转湍流参数化的后验学习:积分步骤的实验 链接:https://arxiv.org/abs/2111.06841
作者:Hugo Frezat,Julien Le Sommer,Ronan Fablet,Guillaume Balarac,Redouane Lguensat 机构:Univ. Grenoble Alpes, CNRS UMR LEGI, Grenoble, France, Univ. Grenoble Alpes, CNRS UMR IGE, Grenoble, France, IMT Atlantique, CNRS UMR Lab-STICC, Brest, France, Institut Universitaire de France (IUF), Paris, France 备注:6 pages, 3 figures, presented at the Fourth Workshop on Machine Learning and the Physical Sciences (NeurIPS 2021) 摘要:简化模型的亚网格尺度动力学建模是一个长期存在的开放问题,在无法进行直接数值模拟(DNS)的海洋、大气和气候预测中得到应用。虽然神经网络(NNs)已经成功地应用于一系列三维问题,但二维流动的反向能量传递对训练出的模型而言仍然是一个稳定性问题。我们表明,当应用于准地转湍流时,将模型与动力学求解器以及有意义的基于后验(a posteriori)的损失函数联合学习,可以获得稳定且真实的模拟。 摘要:Modeling the subgrid-scale dynamics of reduced models is a long standing open problem that finds application in ocean, atmosphere and climate predictions where direct numerical simulation (DNS) is impossible. While neural networks (NNs) have already been applied to a range of three-dimensional problems with success, the backward energy transfer of two-dimensional flows still remains a stability issue for trained models. We show that learning a model jointly with the dynamical solver and a meaningful a posteriori-based loss function leads to stable and realistic simulations when applied to quasi-geostrophic turbulence.
【3】 A Minimax Learning Approach to Off-Policy Evaluation in Partially Observable Markov Decision Processes 标题:部分可观测马尔可夫决策过程非策略评估的极小极大学习方法 链接:https://arxiv.org/abs/2111.06784
作者:Chengchun Shi,Masatoshi Uehara,Nan Jiang 机构:Department of Statistics, London School of Economics and Political Science, Department of Computer Science, Cornell University, Department of Computer Science, University of Illinois Urbana-Champaign 摘要:我们考虑部分可观测马尔可夫决策过程(POMDP)中的非策略评估(OPE),其中评估策略仅依赖于可观测变量,而行为策略依赖于不可观测的潜在变量。现有的工作要么假设没有未测量的混杂因素,要么关注观测空间和状态空间都是表格型的设置。因此,这些方法要么在存在未测量的混杂因素时存在较大偏差,要么在具有连续或较大观测/状态空间的设置中存在较大方差。在这项工作中,我们首先通过引入连接目标策略值和观测数据分布的桥函数,提出了具有潜在混杂因素的POMDP中OPE的新识别方法。在完全可观测的MDP中,这些桥函数简化为我们熟悉的值函数以及评估策略和行为策略之间的边际密度比。接下来,我们提出学习这些桥函数的极小极大估计方法。我们的方案允许一般函数逼近,因此适用于具有连续或较大观测/状态空间的设置。最后,我们基于这些估计的桥函数构造了三个估计量,分别对应于基于值函数的估计量、边缘化重要性抽样估计量和双重稳健估计量。详细研究了它们的非渐近和渐近性质。 摘要:We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes (POMDPs), where the evaluation policy depends only on observable variables and the behavior policy depends on unobservable latent variables. Existing works either assume no unmeasured confounders, or focus on settings where both the observation and the state spaces are tabular. As such, these methods suffer from either a large bias in the presence of unmeasured confounders, or a large variance in settings with continuous or large observation/state spaces. In this work, we first propose novel identification methods for OPE in POMDPs with latent confounders, by introducing bridge functions that link the target policy's value and the observed data distribution. In fully-observable MDPs, these bridge functions reduce to the familiar value functions and marginal density ratios between the evaluation and the behavior policies. We next propose minimax estimation methods for learning these bridge functions. Our proposal permits general function approximation and is thus applicable to settings with continuous or large observation/state spaces. Finally, we construct three estimators based on these estimated bridge functions, corresponding to a value function-based estimator, a marginalized importance sampling estimator, and a doubly-robust estimator. Their nonasymptotic and asymptotic properties are investigated in detail.
【4】 Can neural networks predict dynamics they have never seen? 标题:神经网络能预测他们从未见过的动态吗? 链接:https://arxiv.org/abs/2111.06783
作者:Anton Pershin,Cedric Beaume,Kuan Li,Steven M. Tobias 机构:Atmospheric, Oceanic and Planetary Physics, University of Oxford, Oxford, UK; bSchool of Mathematics, University of Leeds, Leeds, UK 备注:7 pages, 5 figures 摘要:神经网络已被证明在广泛的复杂任务中非常成功,从图像识别和目标检测到语音识别和机器翻译。他们的成功之一是在提供合适的数据训练集的情况下预测未来动态的技能。以往的研究表明,回声状态网络(ESNs)是递归神经网络的一个子集,它可以成功地预测比Lyapunov时间更长的混沌系统。这项研究表明,值得注意的是,ESN可以成功地预测与训练集中包含的任何行为在质量上不同的动力学行为。为流体动力学问题提供了证据,其中流体可以在层流(有序)和湍流(无序)状态之间转换。尽管ESN仅在湍流区域进行训练,但发现它可以预测层流行为。此外,还成功地预测了湍流-层流和层流-湍流过渡的统计数据,并讨论了ESNs作为过渡预警系统的作用。这些结果预计将广泛适用于数据驱动的一系列物理、气候、生物、生态和金融模型中的时间行为建模,这些模型的特点是存在临界点和几个竞争状态之间的突然转变。 摘要:Neural networks have proven to be remarkably successful for a wide range of complicated tasks, from image recognition and object detection to speech recognition and machine translation. One of their successes is the skill in prediction of future dynamics given a suitable training set of data. Previous studies have shown how Echo State Networks (ESNs), a subset of Recurrent Neural Networks, can successfully predict even chaotic systems for times longer than the Lyapunov time. This study shows that, remarkably, ESNs can successfully predict dynamical behavior that is qualitatively different from any behavior contained in the training set. Evidence is provided for a fluid dynamics problem where the flow can transition between laminar (ordered) and turbulent (disordered) regimes. Despite being trained on the turbulent regime only, ESNs are found to predict laminar behavior. Moreover, the statistics of turbulent-to-laminar and laminar-to-turbulent transitions are also predicted successfully, and the utility of ESNs in acting as an early-warning system for transition is discussed. These results are expected to be widely applicable to data-driven modelling of temporal behaviour in a range of physical, climate, biological, ecological and finance models characterized by the presence of tipping points and sudden transitions between several competing states.
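For readers unfamiliar with the model class: in an ESN only the linear readout is trained (e.g., by ridge regression), while the random recurrent reservoir stays fixed. A compact sketch on a toy one-step-ahead prediction task — reservoir size, spectral radius, and the sine-wave signal are illustrative choices, not the paper's fluid-dynamics setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, spectral_radius, ridge = 300, 0.9, 1e-6

# Fixed random reservoir, rescaled to the target spectral radius (never trained).
W = rng.normal(size=(n_res, n_res))
W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.uniform(-0.5, 0.5, size=(n_res, 1))

# Toy signal: a sine wave; task: one-step-ahead prediction.
u = np.sin(0.2 * np.arange(3000))[:, None]
states = np.zeros((len(u), n_res))
x = np.zeros(n_res)
for t in range(1, len(u)):
    x = np.tanh(W @ x + W_in @ u[t - 1])   # reservoir driven by the previous value
    states[t] = x

# Train only the linear readout by ridge regression (first 100 steps = washout);
# the state at time t (which saw u[t-1]) predicts u[t].
S, Y = states[100:-1], u[100:-1]
W_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_res), S.T @ Y)
pred = states[-1] @ W_out                  # predict the held-out last sample
print(float(pred[0]), float(u[-1, 0]))
```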
【5】 Monte Carlo dropout increases model repeatability 标题:蒙特卡罗dropout提高模型的可重复性 链接:https://arxiv.org/abs/2111.06754
作者:Andreanne Lemay,Katharina Hoebel,Christopher P. Bridge,Didem Egemen,Ana Cecilia Rodriguez,Mark Schiffman,John Peter Campbell,Jayashree Kalpathy-Cramer 机构: Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, USA, NeuroPoly Lab, Institute of Biomedical Engineering, Polytechnique Montreal, Canada, Mila, Quebec AI Institute, Canada 备注:Machine Learning for Health (ML4H) at NeurIPS 2021 - Extended Abstract 摘要:将人工智能集成到临床工作流程中需要可靠和健壮的模型。稳健性的主要特征之一是可重复性。人们在不评估模型可重复性的情况下对分类性能给予了很大关注,从而导致开发出在实践中无法使用的模型。在这项工作中,我们评估了四种模型类型在同一患者同一次就诊期间获得的图像上的可重复性。我们研究了二元、多类、有序和回归模型在三项医学图像分析任务中的性能:宫颈癌筛查、乳腺密度估计和早产儿视网膜病变分类。此外,我们还评估了测试时采样蒙特卡罗dropout预测对分类性能和可重复性的影响。利用蒙特卡罗预测显著提高了二元、多类和有序模型在所有任务上的可重复性,使95%一致性限值平均降低了17个百分点。 摘要:The integration of artificial intelligence into clinical workflows requires reliable and robust models. Among the main features of robustness is repeatability. Much attention is given to classification performance without assessing the model repeatability, leading to the development of models that turn out to be unusable in practice. In this work, we evaluate the repeatability of four model types on images from the same patient that were acquired during the same visit. We study the performance of binary, multi-class, ordinal, and regression models on three medical image analysis tasks: cervical cancer screening, breast density estimation, and retinopathy of prematurity classification. Moreover, we assess the impact of sampling Monte Carlo dropout predictions at test time on classification performance and repeatability. Leveraging Monte Carlo predictions significantly increased repeatability for all tasks on the binary, multi-class, and ordinal models, leading to an average reduction of the 95% limits of agreement by 17 percentage points.
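Sampling Monte Carlo dropout predictions at test time amounts to keeping dropout active during inference and averaging several stochastic forward passes; a PyTorch sketch (the toy classifier and sample count are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(), nn.Dropout(p=0.3), nn.Linear(64, 3)
)

def mc_dropout_predict(model, x, n_samples=30):
    """Average softmax outputs over stochastic forward passes with dropout ON."""
    model.train()  # keeps Dropout active (for real models, freeze BatchNorm separately)
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )
    return probs.mean(dim=0), probs.std(dim=0)  # mean prediction + uncertainty

x = torch.randn(5, 16)
mean_p, std_p = mc_dropout_predict(model, x)
print(mean_p.argmax(dim=-1), std_p.max(dim=-1).values)
```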
【6】 One model Packs Thousands of Items with Recurrent Conditional Query Learning 标题:一个模型使用递归条件查询学习来打包数千个项目 链接:https://arxiv.org/abs/2111.06726
作者:Dongda Li,Zhaoquan Gu,Yuexuan Wang,Changwei Ren,Francis C. M. Lau 机构: Guangzhou University, Zhejiang University,The University of Hong Kong 备注:None 摘要:最近的研究表明,神经组合优化(NCO)在路由等许多组合优化问题上比传统算法具有优势,但在复杂的优化任务(如包含相互制约的动作空间的布局)中效率较低。在本文中,我们提出了一种递归条件查询学习(RCQL)方法来解决二维和三维包装问题。我们首先通过一个循环编码器嵌入状态,然后通过来自先前操作的条件查询采用注意。条件查询机制填补了学习步骤之间的信息鸿沟,将问题塑造为马尔可夫决策过程。得益于重复性,单个RCQL模型能够处理不同规模的包装问题。实验结果表明,RCQL能够有效地学习离线和在线条形包装问题(SPP)的强启发式算法,在空间利用率方面优于各种基线。与最先进的方法相比,RCQL在离线2D 40箱情况下将平均箱间距比降低1.83%,在3D情况下将平均箱间距比降低7.84%。同时,我们的方法还实现了1000个项目的SPP的空间利用率比现有技术高5.64%。 摘要:Recent studies have revealed that neural combinatorial optimization (NCO) has advantages over conventional algorithms in many combinatorial optimization problems such as routing, but it is less efficient for more complicated optimization tasks such as packing which involves mutually conditioned action spaces. In this paper, we propose a Recurrent Conditional Query Learning (RCQL) method to solve both 2D and 3D packing problems. We first embed states by a recurrent encoder, and then adopt attention with conditional queries from previous actions. The conditional query mechanism fills the information gap between learning steps, which shapes the problem as a Markov decision process. Benefiting from the recurrence, a single RCQL model is capable of handling different sizes of packing problems. Experiment results show that RCQL can effectively learn strong heuristics for offline and online strip packing problems (SPPs), outperforming a wide range of baselines in space utilization ratio. RCQL reduces the average bin gap ratio by 1.83% in offline 2D 40-box cases and 7.84% in 3D cases compared with state-of-the-art methods. Meanwhile, our method also achieves 5.64% higher space utilization ratio for SPPs with 1000 items than the state of the art.
【7】 Silicon photonic subspace neural chip for hardware-efficient deep learning 标题:用于硬件高效深度学习的硅光子子空间神经芯片 链接:https://arxiv.org/abs/2111.06705
作者:Chenghao Feng,Jiaqi Gu,Hanqing Zhu,Zhoufeng Ying,Zheng Zhao,David Z. Pan,Ray T. Chen 机构:T. Chen,, Microelectronics Research Center, The University of Texas at Austin, Austin, Texas , USA., Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, Alpine Optoelectronics, CA, USA., Synopsys Inc., CA, USA. 备注:17 pages,3 figures 摘要:随着深度学习在许多人工智能应用中显示出革命性的性能,其不断增长的计算需求需要硬件加速器来实现大规模并行和提高吞吐量。光学神经网络(ONN)具有高并行性、低延迟和低能耗等优点,是下一代神经计算的一个很有前途的候选者。在这里,我们设计了一种硬件高效的光子子空间神经网络(PSNN)体系结构,其目标是比以前的ONN体系结构更低的光学元件使用率、面积成本和能耗,并且具有类似的任务性能。此外,还提供了一个硬件感知的训练框架,以最小化所需的设备编程精度,减少芯片面积,提高噪声鲁棒性。我们在一个蝴蝶型可编程硅光子集成电路上实验证明了我们的PSNN,并展示了它在实际图像识别任务中的实用性。 摘要:As deep learning has shown revolutionary performance in many artificial intelligence applications, its escalating computation demand requires hardware accelerators for massive parallelism and improved throughput. The optical neural network (ONN) is a promising candidate for next-generation neurocomputing due to its high parallelism, low latency, and low energy consumption. Here, we devise a hardware-efficient photonic subspace neural network (PSNN) architecture, which targets lower optical component usage, area cost, and energy consumption than previous ONN architectures with comparable task performance. Additionally, a hardware-aware training framework is provided to minimize the required device programming precision, lessen the chip area, and boost the noise robustness. We experimentally demonstrate our PSNN on a butterfly-style programmable silicon photonic integrated circuit and show its utility in practical image recognition tasks.
【8】 DeepXML: A Deep Extreme Multi-Label Learning Framework Applied to Short Text Documents 标题:DeepXML:一种应用于短文本的深度极限多标签学习框架 链接:https://arxiv.org/abs/2111.06685
作者:Kunal Dahiya,Deepak Saini,Anshul Mittal,Ankush Shaw,Kushal Dave,Akshay Soni,Himanshu Jain,Sumeet Agarwal,Manik Varma 机构:IIT Delhi, India, Microsoft Research, USA 备注:None 摘要:在深度极限多标签学习中,可伸缩性和准确性是公认的挑战,其目标是训练体系结构,以便使用超大标签集中最相关的标签子集自动注释数据点。本文开发了DeepXML框架,通过将deep-extreme-multi-label任务分解为四个简单的子任务来解决这些挑战,每个子任务都可以准确有效地进行训练。为这四个子任务选择不同的组件可以让DeepXML生成一系列算法,在准确性和可伸缩性之间进行不同的权衡。特别是,在公开可用的短文本数据集上,与领先的deep extreme分类器相比,DeepXML生成的Astec算法的准确度要高2-12%,训练速度要快5-30倍。Astec还可以高效地在Bing短文本数据集上进行训练,该数据集包含多达6200万个标签,同时每天在商品硬件上对数十亿用户和数据点进行预测。这使得Astec能够部署在Bing搜索引擎上,用于许多短文本应用程序,从匹配用户查询到广告商竞价短语,再到显示个性化广告,在点击率、覆盖率、收入和其他在线指标方面,它比目前正在生产的最先进技术有了显著的提高。DeepXML的代码可在https://github.com/Extreme-classification/deepxml 摘要:Scalability and accuracy are well recognized challenges in deep extreme multi-label learning where the objective is to train architectures for automatically annotating a data point with the most relevant subset of labels from an extremely large label set. This paper develops the DeepXML framework that addresses these challenges by decomposing the deep extreme multi-label task into four simpler sub-tasks each of which can be trained accurately and efficiently. Choosing different components for the four sub-tasks allows DeepXML to generate a family of algorithms with varying trade-offs between accuracy and scalability. In particular, DeepXML yields the Astec algorithm that could be 2-12% more accurate and 5-30x faster to train than leading deep extreme classifiers on publically available short text datasets. Astec could also efficiently train on Bing short text datasets containing up to 62 million labels while making predictions for billions of users and data points per day on commodity hardware. This allowed Astec to be deployed on the Bing search engine for a number of short text applications ranging from matching user queries to advertiser bid phrases to showing personalized ads where it yielded significant gains in click-through-rates, coverage, revenue and other online metrics over state-of-the-art techniques currently in production. DeepXML's code is available at https://github.com/Extreme-classification/deepxml
【9】 Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash 标题:学习打破深度感知散列:用例NeuralHash 链接:https://arxiv.org/abs/2111.06628
作者:Lukas Struppek,Dominik Hintersdorf,Daniel Neider,Kristian Kersting 机构:Department of Computer Science, TU Darmstadt, Darmstadt, Germany, Max Planck Institute for Software Systems, Kaiserslautern, Germany, Centre for Cognitive Science, TU Darmstadt, and Hessian Center for AI (hessian.AI) 备注:22 pages, 15 figures, 5 tables 摘要:苹果公司最近公布了其深度感知哈希系统NeuralHash,该系统可以在文件上传到iCloud服务之前检测用户设备上的儿童性虐待材料(CSAM)。公众对保护用户隐私和系统可靠性的批评很快就出现了。在本文中,我们提出了第一个基于神经哈希的深度感知哈希综合实证分析。具体地说,我们证明了当前的深度感知哈希可能并不健壮。对手可以通过在图像中应用细微的更改来操纵哈希值,这些更改可能是由基于梯度的方法引起的,也可能只是通过执行标准图像转换,强制或防止哈希冲突。这样的攻击使得恶意行为者很容易利用检测系统:从隐藏滥用材料到诬陷无辜用户,一切皆有可能。此外,使用散列值,仍然可以对存储在用户设备上的数据进行推断。在我们看来,根据我们的结果,目前形式的深度感知散列通常不适合于健壮的客户端扫描,不应从隐私角度使用。 摘要:Apple recently revealed its deep perceptual hashing system NeuralHash to detect child sexual abuse material (CSAM) on user devices before files are uploaded to its iCloud service. Public criticism quickly arose regarding the protection of user privacy and the system's reliability. In this paper, we present the first comprehensive empirical analysis of deep perceptual hashing based on NeuralHash. Specifically, we show that current deep perceptual hashing may not be robust. An adversary can manipulate the hash values by applying slight changes in images, either induced by gradient-based approaches or simply by performing standard image transformations, forcing or preventing hash collisions. Such attacks permit malicious actors easily to exploit the detection system: from hiding abusive material to framing innocent users, everything is possible. Moreover, using the hash values, inferences can still be made about the data stored on user devices. In our view, based on our results, deep perceptual hashing in its current form is generally not ready for robust client-side scanning and should not be used from a privacy perspective.
【10】 A Convolutional Neural Network Based Approach to Recognize Bangla Spoken Digits from Speech Signal 标题:一种基于卷积神经网络的孟加拉语音数字识别方法 链接:https://arxiv.org/abs/2111.06625
作者:Ovishake Sen,Al-Mahmud,Pias Roy 机构:Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna, Bangladesh 备注:4 pages, 5 figures, 2021 International Conference on Electronics, Communications and Information Technology (ICECIT), 14 to 16 September 2021, Khulna, Bangladesh 摘要:语音识别是一种将人类的语音信号转换成文本或文字,或以计算机或其他机器容易理解的任何形式的技术。已有一些关于孟加拉语数字识别系统的研究,其中大多数使用的是在性别、年龄、方言和其他变量上几乎没有变化的小型数据集。本研究使用不同性别、年龄和方言的孟加拉国人的录音,创建了一个包含说出的"0-9"孟加拉语数字的大型语音数据集。在这里,为创建数据集,每个数字记录了400个含噪声和无噪声的样本。Mel频率倒谱系数(MFCC)被用于从原始语音数据中提取有意义的特征。然后,利用卷积神经网络(CNN)检测孟加拉语数字。所提出的技术在整个数据集上识别"0-9"孟加拉语语音数字的准确率为97.1%。使用10倍交叉验证对模型的效率进行了评估,获得了96.7%的准确率。 摘要:Speech recognition is a technique that converts human speech signals into text or words or in any form that can be easily understood by computers or other machines. There have been a few studies on Bangla digit recognition systems, the majority of which used small datasets with few variations in genders, ages, dialects, and other variables. Audio recordings of Bangladeshi people of various genders, ages, and dialects were used to create a large speech dataset of spoken '0-9' Bangla digits in this study. Here, 400 noisy and noise-free samples per digit have been recorded for creating the dataset. Mel Frequency Cepstrum Coefficients (MFCCs) have been utilized for extracting meaningful features from the raw speech data. Then, to detect Bangla numeral digits, Convolutional Neural Networks (CNNs) were utilized. The suggested technique recognizes '0-9' Bangla spoken digits with 97.1% accuracy throughout the whole dataset. The efficiency of the model was also assessed using 10-fold cross-validation, which yielded a 96.7% accuracy.
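The pipeline described — MFCC features feeding a small CNN — can be sketched as follows; librosa is assumed for feature extraction, the waveform is synthetic (standing in for a recorded digit), and the network shape is illustrative, not the authors' exact architecture.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn

# A 1-second synthetic waveform stands in for a recorded Bangla digit.
sr = 16_000
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)

# 40 MFCCs per frame, shape (n_mfcc, n_frames).
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
x = torch.from_numpy(mfcc).unsqueeze(0).unsqueeze(0)  # (batch, channel, 40, frames)

# Small CNN classifier over the MFCC "image" for the 10 digit classes.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)
logits = model(x)
print(logits.shape)  # torch.Size([1, 10])
```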
【11】 A Robust Deep Learning-Based Beamforming Design for RIS-assisted Multiuser MISO Communications with Practical Constraints 标题:一种基于深度学习的实用约束下RIS辅助多用户MISO通信的鲁棒波束形成设计 链接:https://arxiv.org/abs/2111.06555
作者:Wangyang Xu,Lu Gan,Chongwen Huang 备注:31 pages, 13 figures 摘要:近年来,可重构智能表面(RIS)技术已成为改善无线通信的一项有前途的技术。它通过控制可重构无源元件,以更低的硬件成本和功耗控制入射信号,从而创造良好的传播环境。在本文中,我们考虑了一个RIS辅助多用户多输入单输出下行链路通信系统。我们的目标是通过联合优化接入点处的主动波束形成和RIS元素的被动波束形成向量来最大化所有用户的加权和速率。与大多数现有的工作,我们认为更实际的情况下,离散相移和不完善的信道状态信息(CSI)。具体地说,在考虑离散相移和理想CSI的情况下,我们首先开发了一种深度量化神经网络(DQNN)来同时设计主动和被动波束形成,而大多数文献都是交替设计的。然后,我们提出了一种基于DQNN的改进结构(I-DQNN),以简化每个RIS元素的控制比特大于1比特时的参数决策过程。最后,我们将提出的两种基于DQNN的算法推广到同时考虑离散相移和不完全CSI的情况。我们的仿真结果表明,两种基于DQNN的算法在理想CSI情况下比传统算法具有更好的性能,在不理想CSI情况下也具有更强的鲁棒性。 摘要:Reconfigurable intelligent surface (RIS) has become a promising technology to improve wireless communication in recent years. It steers the incident signals to create a favorable propagation environment by controlling the reconfigurable passive elements with less hardware cost and lower power consumption. In this paper, we consider a RIS-aided multiuser multiple-input single-output downlink communication system. We aim to maximize the weighted sum-rate of all users by joint optimizing the active beamforming at the access point and the passive beamforming vector of the RIS elements. Unlike most existing works, we consider the more practical situation with the discrete phase shifts and imperfect channel state information (CSI). Specifically, for the situation that the discrete phase shifts and perfect CSI are considered, we first develop a deep quantization neural network (DQNN) to simultaneously design the active and passive beamforming while most reported works design them alternatively. Then, we propose an improved structure (I-DQNN) based on DQNN to simplify the parameters decision process when the control bits of each RIS element are greater than 1 bit. Finally, we extend the two proposed DQNN-based algorithms to the case that the discrete phase shifts and imperfect CSI are considered simultaneously. Our simulation results show that the two DQNN-based algorithms have better performance than traditional algorithms in the perfect CSI case, and are also more robust in the imperfect CSI case.
【12】 Nonlinear Tensor Ring Network 标题:非线性张量环网络 链接:https://arxiv.org/abs/2111.06532
作者:Xiao Peng Li,Qi Liu,Hing Cheung So 摘要:最先进的深度神经网络(DNN)已广泛应用于各种实际应用,并在认知问题上取得了显著的性能。然而,DNN体系结构在宽度和深度上的增加导致了大量参数,对存储和内存成本构成挑战,从而限制了DNN在资源受限平台(如便携式设备)上的使用。通过将冗余模型转换为紧凑模型,压缩技术似乎是减少存储和内存消耗的实用解决方案。在本文中,我们开发了一种非线性张量环网络(NTRN),其中全连接层和卷积层都通过张量环分解进行压缩。此外,为了减轻压缩造成的精度损失,在压缩层内的张量收缩和卷积运算中嵌入了一个非线性激活函数。实验结果证明了所提出的NTRN在三个数据集(MNIST、Fashion-MNIST和CIFAR-10)上使用LeNet-5和VGG-11两个基本网络进行图像分类的有效性和优越性。 摘要:The state-of-the-art deep neural networks (DNNs) have been widely applied for various real-world applications, and achieved significant performance for cognitive problems. However, the increment of DNNs' width and depth in architecture results in a huge amount of parameters to challenge the storage and memory cost, limiting the usage of DNNs on resource-constrained platforms, such as portable devices. By converting redundant models into compact ones, compression technique appears to be a practical solution to reducing the storage and memory consumption. In this paper, we develop a nonlinear tensor ring network (NTRN) in which both fully-connected and convolutional layers are compressed via tensor ring decomposition. Furthermore, to mitigate the accuracy loss caused by compression, a nonlinear activation function is embedded into the tensor contraction and convolution operations inside the compressed layer. Experimental results demonstrate the effectiveness and superiority of the proposed NTRN for image classification using two basic neural networks, LeNet-5 and VGG-11, on three datasets, viz. MNIST, Fashion-MNIST and CIFAR-10.
【13】 AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On Analog Compute-in-Memory Accelerator 标题:AnalogNets:抗噪TinyML模型的ML-HW协同设计和始终在线模拟内存计算加速器(Always-On Analog Compute-in-Memory Accelerator) 链接:https://arxiv.org/abs/2111.06503
作者:Chuteng Zhou,Fernando Garcia Redondo,Julian Büchel,Irem Boybat,Xavier Timoneda Comas,S. R. Nandakumar,Shidhartha Das,Abu Sebastian,Manuel Le Gallo,Paul N. Whatmough 摘要:物联网应用中的常开TinyML感知任务需要非常高的能效。使用非易失性存储器(NVM)的模拟内存计算(CiM)保证了高效率,并提供了独立的片上模型存储。然而,模拟CiM引入了新的实际考虑因素,包括电导漂移、读/写噪声、固定模数(ADC)转换器增益等。必须解决这些额外的限制,以实现可在模拟CiM上部署且精度损失可接受的模型。这项工作描述了AnalogNets:TinyML模型,用于流行的关键字定位(KWS)和视觉唤醒词(VWW)应用程序。模型体系结构是专门为模拟CiM设计的,我们详细介绍了一种全面的训练方法,以在推理时保持模拟非理想性和低精度数据转换器的准确性。我们还介绍了AON-CiM,一种可编程、最小面积相变存储器(PCM)模拟CiM加速器,它采用了一种新的层串行方法,以消除与完全流水线设计相关的复杂互连成本。我们在校准的模拟器以及真实硬件上评估了AnalogNets,发现KWS/VWW的PCM漂移(8位)24小时后,精度下降限制在0.8%/1.2%。在14nm AON-CiM加速器上运行的AnalogNets分别使用8位激活演示了KWS/VWW工作负载的8.58/4.37 TOPS/W,并使用4位激活增加到57.39/25.69 TOPS/W。 摘要:Always-on TinyML perception tasks in IoT applications require very high energy efficiency. Analog compute-in-memory (CiM) using non-volatile memory (NVM) promises high efficiency and also provides self-contained on-chip model storage. However, analog CiM introduces new practical considerations, including conductance drift, read/write noise, fixed analog-to-digital (ADC) converter gain, etc. These additional constraints must be addressed to achieve models that can be deployed on analog CiM with acceptable accuracy loss. This work describes AnalogNets: TinyML models for the popular always-on applications of keyword spotting (KWS) and visual wake words (VWW). The model architectures are specifically designed for analog CiM, and we detail a comprehensive training methodology, to retain accuracy in the face of analog non-idealities, and low-precision data converters at inference time. We also describe AON-CiM, a programmable, minimal-area phase-change memory (PCM) analog CiM accelerator, with a novel layer-serial approach to remove the cost of complex interconnects associated with a fully-pipelined design. We evaluate the AnalogNets on a calibrated simulator, as well as real hardware, and find that accuracy degradation is limited to 0.8%/1.2% after 24 hours of PCM drift (8-bit) for KWS/VWW. AnalogNets running on the 14nm AON-CiM accelerator demonstrate 8.58/4.37 TOPS/W for KWS/VWW workloads using 8-bit activations, respectively, and increasing to 57.39/25.69 TOPS/W with 4-bit activations.
【14】 Molecular Dynamics Simulations on Cloud Computing and Machine Learning Platforms 标题:云计算和机器学习平台上的分子动力学模拟 链接:https://arxiv.org/abs/2111.06466
作者:Prateek Sharma,Vikram Jadhao 机构:Intelligent Systems Engineering, N. Woodlawn Avenue, Indiana University, Bloomington, Indiana 备注:4 pages, position paper appearing in the Proceedings of the 2021 IEEE 14th International Conference on Cloud Computing (CLOUD) 摘要:科学计算应用从超级计算机等高性能计算基础设施中受益匪浅。然而,我们看到了这些应用程序的计算结构、设计和需求的范式转变。数据驱动和机器学习方法越来越多地被用于支持、加速和增强科学计算应用,特别是分子动力学模拟。同时,云计算平台对科学计算越来越有吸引力,它提供了“无限”的计算能力、更简单的编程和部署模型,以及对计算加速器(如TPU(张量处理单元))的访问。机器学习(ML)和云计算的融合为云和系统研究人员带来了激动人心的机会。ML辅助分子动力学模拟是一种新的工作负载,具有独特的计算模式。这些模拟为低成本和高性能执行带来了新的挑战。我们认为暂时性的云资源,例如低成本的可抢占云虚拟机,可以成为这种新工作负载的可行平台。最后,我们介绍了云资源管理方面的一些低挂成果和长期挑战,以及分子动力学模拟与ML平台(如TensorFlow)的集成。 摘要:Scientific computing applications have benefited greatly from high performance computing infrastructure such as supercomputers. However, we are seeing a paradigm shift in the computational structure, design, and requirements of these applications. Increasingly, data-driven and machine learning approaches are being used to support, speed-up, and enhance scientific computing applications, especially molecular dynamics simulations. Concurrently, cloud computing platforms are increasingly appealing for scientific computing, providing "infinite" computing powers, easier programming and deployment models, and access to computing accelerators such as TPUs (Tensor Processing Units). This confluence of machine learning (ML) and cloud computing represents exciting opportunities for cloud and systems researchers. ML-assisted molecular dynamics simulations are a new class of workload, and exhibit unique computational patterns. These simulations present new challenges for low-cost and high-performance execution. We argue that transient cloud resources, such as low-cost preemptible cloud VMs, can be a viable platform for this new workload. Finally, we present some low-hanging fruits and long-term challenges in cloud resource management, and the integration of molecular dynamics simulations into ML platforms (such as TensorFlow).
【15】 Variability-Aware Training and Self-Tuning of Highly Quantized DNNs for Analog PIM 标题:用于模拟PIM的高量化DNN的可变性感知训练和自调优 链接:https://arxiv.org/abs/2111.06457
作者:Zihao Deng,Michael Orshansky 机构:Department of Electrical and Computer Engineering, University of Texas at Austin, Austin TX, USA 备注:This is the preprint version of our paper accepted in DATE 2022 摘要:部署在模拟内存处理(PIM)体系结构上的DNN受制造时间变化的影响。我们为基于PIM的高量化模拟模型开发了一种新的联合可变性和量化感知DNN训练算法,该算法比以前的工作更有效。在多个计算机视觉数据集/模型上,它优于不经意变化和训练后量化模型。对于低位宽模型和高变化,ResNet-18在最佳替代方案上的精度增益高达35.7%。我们证明,在芯片内和芯片间组件可变性的现实模式下,单独训练无法防止较大的DNN精度损失(在CIFAR-100/ResNet-18上高达54%)。我们介绍了一种自校正DNN体系结构,该体系结构可以在推理过程中动态调整分层激活,并有效地将精度损失降低到10%以下。 摘要:DNNs deployed on analog processing in memory (PIM) architectures are subject to fabrication-time variability. We developed a new joint variability- and quantization-aware DNN training algorithm for highly quantized analog PIM-based models that is significantly more effective than prior work. It outperforms variability-oblivious and post-training quantized models on multiple computer vision datasets/models. For low-bitwidth models and high variation, the gain in accuracy is up to 35.7% for ResNet-18 over the best alternative. We demonstrate that, under a realistic pattern of within- and between-chip components of variability, training alone is unable to prevent large DNN accuracy loss (of up to 54% on CIFAR-100/ResNet-18). We introduce a self-tuning DNN architecture that dynamically adjusts layer-wise activations during inference and is effective in reducing accuracy loss to below 10%.
【16】 Observation Error Covariance Specification in Dynamical Systems for Data assimilation using Recurrent Neural Networks 标题:基于递归神经网络的动力系统数据同化观测误差协方差规范 链接:https://arxiv.org/abs/2111.06447
作者:Sibo Cheng,Mingming Qiu 机构: Data Science Instituite, Department of computing, Imperial College, London, UK, Institut Polytechnique de Paris, France, EDF R&D, France, Accepted for publication in Neural computing and applications 备注:The manuscript is accepted for publication in Neural computing and applications 摘要:基于时间序列观测数据,数据同化技术被广泛用于预测具有不确定性的复杂动力系统。误差协方差矩阵建模是数据同化算法中的一个重要组成部分,它对预测精度有很大影响。这些协方差的估计通常依赖于经验假设和物理约束,尤其是对于大尺寸系统,通常不精确且计算成本高。在这项工作中,我们提出了一种基于长短时记忆(LSTM)递归神经网络(RNN)的数据驱动方法,以提高动力系统数据同化中观测协方差规范的准确性和效率。与经典的后验校正方法不同,该方法从观测/模拟的时间序列数据中学习协方差矩阵,不需要任何关于先验误差分布的知识或假设。我们将这种新方法与两种最先进的协方差调整算法,即DI01和D05进行了比较,首先是在Lorenz动力系统中,然后是在具有不同协方差参数化的2D浅水孪生实验框架中,使用集合同化。这种新方法在观测协方差规范、同化精度和计算效率方面显示出显著的优势。 摘要:Data assimilation techniques are widely used to predict complex dynamical systems with uncertainties, based on time-series observation data. Error covariance matrices modelling is an important element in data assimilation algorithms which can considerably impact the forecasting accuracy. The estimation of these covariances, which usually relies on empirical assumptions and physical constraints, is often imprecise and computationally expensive especially for systems of large dimension. In this work, we propose a data-driven approach based on long short term memory (LSTM) recurrent neural networks (RNN) to improve both the accuracy and the efficiency of observation covariance specification in data assimilation for dynamical systems. Learning the covariance matrix from observed/simulated time-series data, the proposed approach does not require any knowledge or assumption about prior error distribution, unlike classical posterior tuning methods. We have compared the novel approach with two state-of-the-art covariance tuning algorithms, namely DI01 and D05, first in a Lorenz dynamical system and then in a 2D shallow water twin experiments framework with different covariance parameterization using ensemble assimilation. This novel method shows significant advantages in observation covariance specification, assimilation accuracy and computational efficiency.
其他(13篇)
【1】 Speeding Up Entmax 标题:加速Entmax 链接:https://arxiv.org/abs/2111.06832
作者:Maxat Tezekbayev,Vassilina Nikoulina,Matthias Gallé,Zhenisbek Assylbekov 机构:School of Sciences and Humanities, Nazarbayev University, NAVER Labs Europe 备注:8 pages, 6 figures 摘要:Softmax是现代神经网络在语言处理中对logits进行归一化的事实标准。然而,由于其产生稠密的概率分布,词汇表中的每个标记在每个生成步骤中被选择的概率都不为零,这导致了文本生成中报告的各种问题。arXiv:1905.05702提出的$\alpha$-entmax解决了这个问题,但比softmax慢很多。在本文中,我们提出了$\alpha$-entmax的一个替代方案,它保持了其良好的特性,但与优化的softmax一样快,并在机器翻译任务上达到持平或更好的性能。 摘要:Softmax is the de facto standard in modern neural networks for language processing when it comes to normalizing logits. However, by producing a dense probability distribution each token in the vocabulary has a nonzero chance of being selected at each generation step, leading to a variety of reported problems in text generation. $\alpha$-entmax of arXiv:1905.05702 solves this problem, but is considerably slower than softmax. In this paper, we propose an alternative to $\alpha$-entmax, which keeps its virtuous characteristics, but is as fast as optimized softmax and achieves on par or better performance in machine translation task.
【2】 NRC-GAMMA: Introducing a Novel Large Gas Meter Image Dataset 标题:NRC-GAMMA:引入一种新的大型煤气表图像数据集 链接:https://arxiv.org/abs/2111.06827
作者:Ashkan Ebadi,Patrick Paul,Sofia Auer,Stéphane Tremblay 机构: National Research Council Canada, Montreal, QC H,T ,B, Canada, National Research Council Canada, Ottawa, ON K,K ,E, Canada 备注:12 pages, 7 figures, 1 table 摘要:自动抄表技术尚未普及。天然气、电力或水累积仪表读数大多由操作员或业主在现场手动完成。在某些国家/地区,运营商通过与其他运营商进行离线检查和/或在发生冲突或投诉时使用照片作为证据,将照片作为阅读证明,以确认阅读。整个过程耗时、昂贵,而且容易出错。自动化可以优化和促进此类劳动密集型和容易出现人为错误的流程。随着人工智能和计算机视觉领域的最新进展,自动抄表系统比以往任何时候都更加可行。受人工智能领域最新进展的推动,受研究界开源开放获取计划的启发,我们引入了一个新的大型基准数据集,即真实气体流量计图像,名为NRC-GAMMA数据集。数据是在2020年1月20日上午00:05到晚上11:59之间从Itron 400A隔膜式燃气表收集的。我们采用了一种系统的方法来标记图像,验证标签,并确保注释的质量。该数据集包含整个煤气表的28883幅图像,以及左、右刻度盘显示的57766幅裁剪图像。我们希望NRC-GAMMA数据集有助于研究团体设计和实施准确、创新、智能和可再生的自动燃气表读数解决方案。 摘要:Automatic meter reading technology is not yet widespread. Gas, electricity, or water accumulation meters reading is mostly done manually on-site either by an operator or by the homeowner. In some countries, the operator takes a picture as reading proof to confirm the reading by checking offline with another operator and/or using it as evidence in case of conflicts or complaints. The whole process is time-consuming, expensive, and prone to errors. Automation can optimize and facilitate such labor-intensive and human error-prone processes. With the recent advances in the fields of artificial intelligence and computer vision, automatic meter reading systems are becoming more viable than ever. Motivated by the recent advances in the field of artificial intelligence and inspired by open-source open-access initiatives in the research community, we introduce a novel large benchmark dataset of real-life gas meter images, named the NRC-GAMMA dataset. The data were collected from an Itron 400A diaphragm gas meter on January 20, 2020, between 00:05 am and 11:59 pm. We employed a systematic approach to label the images, validate the labellings, and assure the quality of the annotations. The dataset contains 28,883 images of the entire gas meter along with 57,766 cropped images of the left and the right dial displays. We hope the NRC-GAMMA dataset helps the research community to design and implement accurate, innovative, intelligent, and reproducible automatic gas meter reading solutions.
【3】 Explainability and the Fourth AI Revolution 标题:可解释性与第四次人工智能革命 链接:https://arxiv.org/abs/2111.06773
作者:Loizos Michael 机构:Open University of Cyprus & CYENS Center of Excellence 摘要:本章从数据组织自动化过程的角度讨论人工智能,并举例说明可解释性在从当前一代人工智能系统过渡到下一代人工智能系统中所起的作用;在下一代系统中,人类的角色从为人工智能系统工作的数据注释者提升为与人工智能系统协同工作的协作者。 摘要:This chapter discusses AI from the prism of an automated process for the organization of data, and exemplifies the role that explainability has to play in moving from the current generation of AI systems to the next one, where the role of humans is lifted from that of data annotators working for the AI systems to that of collaborators working with the AI systems.
【4】 Neural Motion Planning for Autonomous Parking 标题:自主停车的神经运动规划 链接:https://arxiv.org/abs/2111.06739
作者:Dongchan Kim,Kunsoo Huh 机构:Department of Automotive Engineering, Hanyang University 备注:8 pages, 11 figures 摘要:本文提出了一种将深度生成网络与传统运动规划方法相结合的混合运动规划策略。现有的规划方法,如A*和混合A*,广泛应用于路径规划任务中,因为它们能够在复杂环境中确定可行路径;然而,它们在效率方面有局限性。为了克服这些限制,提出了一种基于神经网络的路径规划算法,即神经混合A*。本文提出使用条件变分自动编码器(CVAE)来引导搜索算法,利用CVAE在给定停车环境信息的情况下学习规划空间信息的能力。基于演示中学习到的可行轨迹分布,采用非均匀扩展策略。该方法有效地学习了给定状态的表示,并在算法性能方面有所改进。 摘要:This paper presents a hybrid motion planning strategy that combines a deep generative network with a conventional motion planning method. Existing planning methods such as A* and Hybrid A* are widely used in path planning tasks because of their ability to determine feasible paths even in complex environments; however, they have limitations in terms of efficiency. To overcome these limitations, a path planning algorithm based on a neural network, namely the neural Hybrid A*, is introduced. This paper proposes using a conditional variational autoencoder (CVAE) to guide the search algorithm by exploiting the ability of CVAE to learn information about the planning space given the information of the parking environment. A non-uniform expansion strategy is utilized based on a distribution of feasible trajectories learned in the demonstrations. The proposed method effectively learns the representations of a given state, and shows improvement in terms of algorithm performance.
【5】 The Science of Rejection: A Research Area for Human Computation 标题:拒绝科学:人类计算的研究领域 链接:https://arxiv.org/abs/2111.06736
作者:Burcu Sayin,Jie Yang,Andrea Passerini,Fabio Casati 机构:University of Trento, Via Calepina, Trento TN, Italy, Delft University of Technology, Mekelweg , CD Delft, Netherlands, Servicenow, Santa Clara, CA, USA 备注:To appear in the Proceedings of The 9th AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2021) 摘要:我们激发了为什么学习拒绝模型预测的科学是ML的核心,以及为什么人类计算在这项工作中起主导作用。 摘要:We motivate why the science of learning to reject model predictions is central to ML, and why human computation has a lead role in this effort.
【6】 BSC: Block-based Stochastic Computing to Enable Accurate and Efficient TinyML 标题:BSC:基于块的随机计算,实现准确高效的TinyML 链接:https://arxiv.org/abs/2111.06686
作者:Yuhong Song,Edwin Hsing-Mean Sha,Qingfeng Zhuge,Rui Xu,Yongzhuo Zhang,Bingzhe Li,Lei Yang 机构: East China Normal University, Oklahoma State University, University of New Mexico 备注:Accept by ASP-DAC 2022 摘要:随着人工智能民主化的发展,机器学习(ML)已成功应用于边缘应用,如智能手机和自动驾驶。如今,越来越多的应用需要在资源极其有限的微型设备上使用ML,比如植入式心律转复除颤器(ICD),它被称为TinyML。与边缘的ML不同,能量供应有限的TinyML对低功耗执行有更高的要求。使用位流表示数据的随机计算(SC)在TinyML中很有前途,因为它可以使用简单的逻辑门执行基本的ML运算,而不是复杂的二进制加法器和乘法器。然而,由于数据精度低和运算单元不准确,SC通常会受到ML任务精度低的影响。在现有作品中增加比特流的长度可以缓解精度问题,但会导致更高的延迟。在这项工作中,我们提出了一种新的SC体系结构,即基于块的随机计算(BSC)。BSC将输入分成块,这样就可以通过利用高数据并行性来减少延迟。此外,还提出了优化运算单元和输出修正(OUR)方案来提高精度。在此基础上,设计了一种全局优化方法来确定块的数量,这可以更好地权衡延迟功率。实验结果表明,BSC在ML任务的准确率提高10%以上,功耗降低6倍以上方面优于现有设计。 摘要:Along with the progress of AI democratization, machine learning (ML) has been successfully applied to edge applications, such as smart phones and automated driving. Nowadays, more applications require ML on tiny devices with extremely limited resources, like implantable cardioverter defibrillator (ICD), which is known as TinyML. Unlike ML on the edge, TinyML with a limited energy supply has higher demands on low-power execution. Stochastic computing (SC) using bitstreams for data representation is promising for TinyML since it can perform the fundamental ML operations using simple logical gates, instead of the complicated binary adder and multiplier. However, SC commonly suffers from low accuracy for ML tasks due to low data precision and inaccuracy of arithmetic units. Increasing the length of the bitstream in the existing works can mitigate the precision issue but incur higher latency. In this work, we propose a novel SC architecture, namely Block-based Stochastic Computing (BSC). BSC divides inputs into blocks, such that the latency can be reduced by exploiting high data parallelism. Moreover, optimized arithmetic units and output revision (OUR) scheme are proposed to improve accuracy. On top of it, a global optimization approach is devised to determine the number of blocks, which can make a better latency-power trade-off. Experimental results show that BSC can outperform the existing designs in achieving over 10% higher accuracy on ML tasks and over 6 times power reduction.
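The core SC primitive is worth seeing concretely: in unipolar encoding, a value p in [0, 1] becomes a bitstream whose bits are 1 with probability p, and multiplication reduces to a bitwise AND of two independent streams — the precision/latency trade-off the abstract mentions comes directly from the stream length. A short simulation (lengths chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def to_bitstream(p: float, length: int) -> np.ndarray:
    """Unipolar SC encoding: each bit is 1 with probability p."""
    return (rng.random(length) < p).astype(np.uint8)

def sc_multiply(a: float, b: float, length: int = 4096) -> float:
    """Multiply two values in [0, 1] with a single AND gate per bit pair."""
    bits = to_bitstream(a, length) & to_bitstream(b, length)
    return bits.mean()  # fraction of 1s decodes the product

a, b = 0.6, 0.5
for length in (64, 1024, 16_384):  # longer streams -> better precision, more latency
    print(length, round(sc_multiply(a, b, length), 4), "vs exact", a * b)
```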
【7】 Fully Automatic Page Turning on Real Scores 标题:真实乐谱上的全自动翻页 链接:https://arxiv.org/abs/2111.06643
作者:Florian Henkel,Stephanie Schwaiger,Gerhard Widmer 机构: Institute of Computational Perception, Johannes Kepler University, Linz, Austria, LIT Artificial Intelligence Lab, Linz Institute of Technology, Austria 备注:ISMIR 2021 Late Breaking/Demo 摘要:我们提出了一个自动翻页系统的原型,该系统直接处理真实乐谱(即乐谱图像),无需任何符号表示。我们的系统基于一个多模态神经网络架构,它以完整的乐谱图像页面作为输入,聆听传入的音乐演奏,并预测图像中相应的位置。利用我们系统的位置估计,我们使用一种简单的启发式方法,一旦到达乐谱图像中的某个位置,就触发翻页事件。作为概念验证,我们进一步将我们的系统与一台实际的机器相结合,该机器将根据命令实际翻页。 摘要:We present a prototype of an automatic page turning system that works directly on real scores, i.e., sheet images, without any symbolic representation. Our system is based on a multi-modal neural network architecture that observes a complete sheet image page as input, listens to an incoming musical performance, and predicts the corresponding position in the image. Using the position estimation of our system, we use a simple heuristic to trigger a page turning event once a certain location within the sheet image is reached. As a proof of concept we further combine our system with an actual machine that will physically turn the page on command.
【8】 Distributed Sparse Regression via Penalization
Link: https://arxiv.org/abs/2111.06530
Authors: Yao Ji, Gesualdo Scutari, Ying Sun, Harsha Honnappa
Affiliations: School of Industrial Engineering, Purdue University, West Lafayette, IN, USA; School of Electrical Engineering and Computer Science, The Pennsylvania State University, State College, PA, USA
Note: 63 pages, journal publication
Abstract: We study sparse linear regression over a network of agents, modeled as an undirected graph (with no centralized node). The estimation problem is formulated as the minimization of the sum of the local LASSO loss functions plus a quadratic penalty of the consensus constraint -- the latter being instrumental to obtaining distributed solution methods. While penalty-based consensus methods have been extensively studied in the optimization literature, their statistical and computational guarantees in the high-dimensional setting remain unclear. This work provides an answer to this open problem. Our contribution is two-fold. First, we establish statistical consistency of the estimator: under a suitable choice of the penalty parameter, the optimal solution of the penalized problem achieves the near-optimal minimax rate $\mathcal{O}(s \log d / N)$ in $\ell_2$-loss, where $s$ is the sparsity value, $d$ is the ambient dimension, and $N$ is the total sample size in the network -- this matches centralized sample rates. Second, we show that the proximal-gradient algorithm applied to the penalized problem, which naturally leads to distributed implementations, converges linearly up to a tolerance of the order of the centralized statistical error -- the rate scales as $\mathcal{O}(d)$, revealing an unavoidable speed-accuracy dilemma. Numerical results demonstrate the tightness of the derived sample-rate and convergence-rate scalings.
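For concreteness, the penalized objective described above can be written out as follows; the notation is assumed (agent $i$ holds local data $(A_i, b_i)$ with $N_i$ samples, $\mathcal{E}$ is the edge set of the communication graph), and the exact normalization is our sketch rather than the paper's formulation:

$$
\min_{x_1,\dots,x_m}\;\sum_{i=1}^{m}\Big(\frac{1}{2N_i}\|A_i x_i - b_i\|_2^2 + \lambda\|x_i\|_1\Big) \;+\; \frac{\gamma}{2}\sum_{(i,j)\in\mathcal{E}}\|x_i - x_j\|_2^2
$$

The quadratic penalty pulls neighboring estimates toward consensus ($x_i \approx x_j$), and because it is smooth, a proximal-gradient method (gradient step on the smooth part, soft-thresholding for the $\ell_1$ term) can run with only neighbor-to-neighbor communication.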
【9】 Fair AutoML
Link: https://arxiv.org/abs/2111.06495
Authors: Qingyun Wu, Chi Wang
Affiliations: Pennsylvania State University (part of the work was done while the author was at Microsoft Research)
Note: 14 pages (including 2 pages of appendix), 8 figures
Abstract: We present an end-to-end automated machine learning system that finds machine learning models that are not only accurate but also fair. The system is desirable for the following reasons. (1) Compared to traditional AutoML systems, this system incorporates fairness assessment and unfairness mitigation organically, which makes it possible to quantify the fairness of each candidate model and mitigate its unfairness when necessary. (2) The system is designed to have good anytime 'fair' performance, such as the accuracy of a model satisfying the necessary fairness constraints. To achieve this, the system includes a strategy that dynamically decides when, and on which models, to conduct unfairness mitigation, according to prediction accuracy, fairness, and resource consumption on the fly. (3) The system is flexible to use: it can be combined with most existing fairness metrics and unfairness mitigation methods.
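A minimal sketch of the kind of anytime loop point (2) describes, assuming hypothetical evaluate/mitigate helpers; the specific decision rule shown (mitigate only candidates accurate enough to beat the incumbent) illustrates the strategy rather than reproducing the system's actual policy.

import time

def fair_automl(candidates, evaluate, mitigate, fairness_threshold, budget_s):
    # Anytime behavior: always holds the best model found so far that
    # satisfies the fairness constraint.
    best_acc, best_model = -1.0, None
    deadline = time.time() + budget_s
    for model in candidates:               # e.g., proposals from a tuner
        if time.time() > deadline:
            break
        acc, fair = evaluate(model)        # -> (accuracy, fairness score)
        if fair < fairness_threshold and acc > best_acc:
            # Spend mitigation effort only on models that could improve
            # the incumbent once their unfairness is reduced.
            model = mitigate(model)        # e.g., reweighing, threshold tuning
            acc, fair = evaluate(model)
        if fair >= fairness_threshold and acc > best_acc:
            best_acc, best_model = acc, model
    return best_model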
【10】 SynthBio: A Case Study in Human-AI Collaborative Curation of Text Datasets
Link: https://arxiv.org/abs/2111.06467
Authors: Ann Yuan, Daphne Ippolito, Vitaly Nikolaev, Chris Callison-Burch, Andy Coenen, Sebastian Gehrmann
Affiliations: Google Research; University of Pennsylvania
Note: 10 pages, 2 figures, accepted to NeurIPS 2021 Datasets and Benchmarks Track
Abstract: NLP researchers need more, higher-quality text datasets. Human-labeled datasets are expensive to collect, while datasets collected via automatic retrieval from the web, such as WikiBio, are noisy and can include undesired biases. Moreover, data sourced from the web is often included in the datasets used to pretrain models, leading to inadvertent cross-contamination of training and test sets. In this work, we introduce a novel method for efficient dataset curation: we use a large language model to provide seed generations to human raters, thereby changing dataset authoring from a writing task to an editing task. We use our method to curate SynthBio -- a new evaluation set for WikiBio -- composed of structured attribute lists describing fictional individuals, mapped to natural-language biographies. We show that our dataset of fictional biographies is less noisy than WikiBio, and also more balanced with respect to gender and nationality.
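As an illustration of the seed-then-edit workflow, a hedged Python sketch follows; `generate` and `edit` stand in for any large language model and human-rater interface, and the attribute schema and prompt format are invented for the example.

from typing import Callable, Dict, List

def seed_biography(attrs: Dict[str, str], generate: Callable[[str], str]) -> str:
    # Turn a structured attribute list into a prompt and a draft biography.
    facts = "; ".join(f"{k}: {v}" for k, v in attrs.items())
    return generate(f"Write a short biography of a fictional person. {facts}")

def curate(attr_lists: List[Dict[str, str]], generate, edit) -> List[dict]:
    # Raters edit model drafts instead of writing from scratch.
    return [{"attributes": a, "biography": edit(seed_biography(a, generate))}
            for a in attr_lists]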
【11】 Catalytic Role Of Noise And Necessity Of Inductive Biases In The Emergence Of Compositional Communication
Link: https://arxiv.org/abs/2111.06464
Authors: Łukasz Kuciński, Tomasz Korbak, Paweł Kołodziej, Piotr Miłoś
Affiliations: Polish Academy of Sciences; University of Sussex; University of Oxford; deepsense.ai
Note: NeurIPS 2021
Abstract: Communication is compositional if complex signals can be represented as a combination of simpler subparts. In this paper, we theoretically show that inductive biases on both the training framework and the data are needed for compositional communication to develop. Moreover, we prove that compositionality spontaneously arises in signaling games, where agents communicate over a noisy channel. We experimentally confirm that a range of noise levels, which depends on the model and the data, indeed promotes compositionality. Finally, we provide a comprehensive study of this dependence and report results in terms of recently studied compositionality metrics: topographic similarity, conflict count, and context independence.
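Of the metrics listed, topographic similarity has a particularly compact definition: the Spearman correlation between pairwise distances in meaning space and in message space. A sketch, assuming discrete meanings and messages of equal length compared under Hamming distance:

from itertools import combinations
from scipy.stats import spearmanr

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def topographic_similarity(meanings, messages):
    # High values: nearby meanings are expressed by nearby messages.
    pairs = list(combinations(range(len(meanings)), 2))
    d_meaning = [hamming(meanings[i], meanings[j]) for i, j in pairs]
    d_message = [hamming(messages[i], messages[j]) for i, j in pairs]
    return spearmanr(d_meaning, d_message).correlation

# A perfectly compositional code scores 1.0:
# topographic_similarity(["ab", "ac", "bb"], ["AB", "AC", "BB"])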
【12】 Differential privacy and robust statistics in high dimensions
Link: https://arxiv.org/abs/2111.06578
Authors: Xiyang Liu, Weihao Kong, Sewoong Oh
Affiliations: Allen School of Computer Science & Engineering, University of Washington
Abstract: We introduce a universal framework for characterizing the statistical efficiency of a statistical estimation problem with differential privacy guarantees. Our framework, which we call High-dimensional Propose-Test-Release (HPTR), builds upon three crucial components: the exponential mechanism, robust statistics, and the Propose-Test-Release mechanism. Gluing these together is the concept of resilience, which is central to robust statistical estimation. Resilience guides the design of the algorithm, the sensitivity analysis, and the success-probability analysis of the test step in Propose-Test-Release. The key insight is that if we design an exponential mechanism that accesses the data only via one-dimensional robust statistics, then the resulting local sensitivity can be dramatically reduced. Using resilience, we can provide tight local sensitivity bounds. These tight bounds readily translate into near-optimal utility guarantees in several cases. We give a general recipe for applying HPTR to a given instance of a statistical estimation problem and demonstrate it on canonical problems of mean estimation, linear regression, covariance estimation, and principal component analysis. We introduce a general utility analysis technique that proves that HPTR nearly achieves the optimal sample complexity under several scenarios studied in the literature.
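For reference, the exponential mechanism that HPTR builds on selects an output $\theta$ with probability proportional to an exponentially weighted utility (the standard form below; per the abstract, the paper's instantiation computes $u$ from one-dimensional robust statistics, which is what keeps the sensitivity $\Delta u$ small):

$$
\Pr[\mathcal{M}(D) = \theta] \;\propto\; \exp\!\Big(\frac{\varepsilon\, u(D,\theta)}{2\,\Delta u}\Big), \qquad \Delta u = \max_{\theta}\,\max_{D \sim D'}\big|u(D,\theta) - u(D',\theta)\big|,
$$

where $D \sim D'$ ranges over neighboring datasets; this yields $\varepsilon$-differential privacy.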
【13】 MultiSV: Dataset for Far-Field Multi-Channel Speaker Verification
Link: https://arxiv.org/abs/2111.06458
Authors: Ladislav Mošner, Oldřich Plchot, Lukáš Burget, Jan Černocký
Affiliations: Brno University of Technology
Note: Submitted to ICASSP 2022
Abstract: Motivated by an unconsolidated data situation and the lack of a standard benchmark in the field, we complement our previous efforts and present a comprehensive corpus designed for training and evaluating text-independent multi-channel speaker verification systems. It can also be readily used for experiments with dereverberation, denoising, and speech enhancement. We tackle the ever-present lack of multi-channel training data by utilizing data simulation on top of clean parts of the Voxceleb dataset. The development and evaluation trials are based on the retransmitted Voices Obscured in Complex Environmental Settings (VOiCES) corpus, which we modified to provide multi-channel trials. We publish full recipes that create the dataset from public sources as the MultiSV corpus, and we provide results with two of our multi-channel speaker verification systems, with neural-network-based beamforming based either on predicting ideal binary masks or on the more recent Conv-TasNet.
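The ideal-binary-mask target mentioned at the end is simple to state: keep a time-frequency bin when speech energy dominates noise energy there. A sketch under assumed STFT inputs and an illustrative 0 dB threshold (not the paper's configuration):

import numpy as np

def ideal_binary_mask(speech_stft: np.ndarray, noise_stft: np.ndarray,
                      threshold_db: float = 0.0) -> np.ndarray:
    # 1 where the local speech-to-noise ratio exceeds threshold_db.
    eps = 1e-10
    snr_db = 10 * np.log10((np.abs(speech_stft) ** 2 + eps)
                           / (np.abs(noise_stft) ** 2 + eps))
    return (snr_db > threshold_db).astype(np.float32)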