Machine Learning Academic Digest [12.15]

2021-12-17 16:24:51

cs.LG: 99 papers in total today

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (5 papers)

【1】 Robust Graph Neural Networks via Probabilistic Lipschitz Constraints
Link: https://arxiv.org/abs/2112.07575

Authors: Raghu Arghal, Eric Lei, Shirin Saeedi Bidokhti
Affiliations: Dept. of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA
Abstract: Graph neural networks (GNNs) have recently been demonstrated to perform well on a variety of network-based tasks such as decentralized control and resource allocation, and provide computationally efficient methods for these tasks which have traditionally been challenging in that regard. However, like many neural-network based systems, GNNs are susceptible to shifts and perturbations on their inputs, which can include both node attributes and graph structure. In order to make them more useful for real-world applications, it is important to ensure their robustness post-deployment. Motivated by controlling the Lipschitz constant of GNN filters with respect to the node attributes, we propose to constrain the frequency response of the GNN's filter banks. We extend this formulation to the dynamic graph setting using a continuous frequency response constraint, and solve a relaxed variant of the problem via the scenario approach. This allows for the use of the same computationally efficient algorithm on sampled constraints, which provides PAC-style guarantees on the stability of the GNN using results in scenario optimization. We also highlight an important connection between this setup and GNN stability to graph perturbations, and provide experimental results which demonstrate the efficacy and broadness of our approach.
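Illustrative sketch (not from the paper): the scenario approach replaces a constraint over a continuous set with the same constraint at many randomly sampled points. The toy code below checks the slope of a polynomial graph filter's frequency response at sampled spectral points; the filter coefficients, sampling range, and slope bound are assumptions for illustration only.

```python
import numpy as np

def freq_response(theta, lam):
    """Polynomial filter response h(lambda) = sum_k theta_k * lambda^k."""
    return sum(t * lam ** k for k, t in enumerate(theta))

def response_slope(theta, lam):
    """Derivative h'(lambda), a proxy for the filter's Lipschitz constant."""
    return sum(k * t * lam ** (k - 1) for k, t in enumerate(theta) if k > 0)

def scenario_violations(theta, num_samples=1000, bound=2.0, seed=0):
    """Scenario-style check: sample spectral points in [0, 2] (the range of a
    normalized Laplacian spectrum) and count violations of the slope bound.
    Few violations over many samples is what underpins PAC-style guarantees."""
    rng = np.random.default_rng(seed)
    lams = rng.uniform(0.0, 2.0, size=num_samples)
    return int(np.sum(np.abs([response_slope(theta, l) for l in lams]) > bound))

filter_coeffs = np.array([0.5, 0.3, -0.1])   # illustrative 2nd-order filter
print(scenario_violations(filter_coeffs))     # 0 -> sampled constraints satisfied
```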

【2】 Anti-Money Laundering Alert Optimization Using Machine Learning with Graphs
Link: https://arxiv.org/abs/2112.07508

Authors: Ahmad Naser Eddin, Jacopo Bono, David Aparício, David Polido, João Tiago Ascensão, Pedro Bizarro, Pedro Ribeiro
Affiliations: Feedzai, Portugal; DCC-FCUP, University of Porto, Portugal
Notes: 8 pages, 5 figures
Abstract: Money laundering is a global problem that concerns legitimizing proceeds from serious felonies (1.7-4 trillion euros annually) such as drug dealing, human trafficking, or corruption. The anti-money laundering systems deployed by financial institutions typically comprise rules aligned with regulatory frameworks. Human investigators review the alerts and report suspicious cases. Such systems suffer from high false-positive rates, undermining their effectiveness and resulting in high operational costs. We propose a machine learning triage model, which complements the rule-based system and learns to predict the risk of an alert accurately. Our model uses both entity-centric engineered features and attributes characterizing inter-entity relations in the form of graph-based features. We leverage time windows to construct the dynamic graph, optimizing for time and space efficiency. We validate our model on a real-world banking dataset and show how the triage model can reduce the number of false positives by 80% while detecting over 90% of true positives. In this way, our model can significantly improve anti-money laundering operations.
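Illustrative sketch (not the authors' pipeline): the general recipe is to combine entity-centric features with graph-derived features computed on a time-windowed transaction graph and feed them to a supervised triage classifier. The transaction fields, feature choices, and the use of scikit-learn below are assumptions.

```python
from sklearn.linear_model import LogisticRegression

def graph_features(transactions, entity, window_end, window_days=30):
    """Degree-style features for one entity over a recent time window
    (hypothetical transaction dicts with 'src', 'dst', 'amount', 'day')."""
    recent = [t for t in transactions if window_end - window_days <= t["day"] <= window_end]
    out_deg = sum(1 for t in recent if t["src"] == entity)
    in_deg = sum(1 for t in recent if t["dst"] == entity)
    counterparties = ({t["dst"] for t in recent if t["src"] == entity} |
                      {t["src"] for t in recent if t["dst"] == entity})
    total_out = sum(t["amount"] for t in recent if t["src"] == entity)
    return [out_deg, in_deg, len(counterparties), total_out]

# Toy training data: one feature vector per alerted entity, 1 = confirmed suspicious.
transactions = [{"src": "A", "dst": "B", "amount": 900.0, "day": 10},
                {"src": "A", "dst": "C", "amount": 950.0, "day": 12},
                {"src": "D", "dst": "A", "amount": 50.0, "day": 14}]
X = [graph_features(transactions, e, window_end=15) for e in ["A", "B", "C", "D"]]
y = [1, 0, 0, 0]
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict_proba(X)[:, 1])   # alert risk scores used for triage
```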

【3】 Graph Kernel Neural Networks
Link: https://arxiv.org/abs/2112.07436

Authors: Luca Cosmo, Giorgia Minello, Michael Bronstein, Emanuele Rodolà, Luca Rossi, Andrea Torsello
Affiliations: Ca’ Foscari University of Venice, Italy; USI University of Lugano, Switzerland; IESE Business School, Spain; Twitter, United Kingdom; Imperial College of London, United Kingdom; Sapienza University of Rome, Italy
Abstract: The convolution operator at the core of many modern neural architectures can effectively be seen as performing a dot product between an input matrix and a filter. While this is readily applicable to data such as images, which can be represented as regular grids in the Euclidean space, extending the convolution operator to work on graphs proves more challenging, due to their irregular structure. In this paper, we propose to use graph kernels, i.e., kernel functions that compute an inner product on graphs, to extend the standard convolution operator to the graph domain. This allows us to define an entirely structural model that does not require computing the embedding of the input graph. Our architecture allows plugging in any type and number of graph kernels and has the added benefit of providing some interpretability in terms of the structural masks that are learned during the training process, similarly to what happens for convolutional masks in traditional convolutional neural networks. We perform an extensive ablation study to investigate the impact of the model hyper-parameters and we show that our model achieves competitive performance on standard graph classification datasets.

【4】 Improving Spectral Graph Convolution for Learning Graph-level Representation
Link: https://arxiv.org/abs/2112.07160

Authors: Mingqi Yang, Rui Li, Yanming Shen, Heng Qi, Baocai Yin
Abstract: From the original theoretically well-defined spectral graph convolution to the subsequent spatial-based message-passing model, spatial locality (in the vertex domain) acts as a fundamental principle of most graph neural networks (GNNs). In the spectral graph convolution, the filter is approximated by polynomials, where a $k$-order polynomial covers $k$-hop neighbors. In the message-passing, various definitions of neighbors used in aggregations are actually an extensive exploration of the spatial locality information. For learning node representations, the topological distance seems necessary since it characterizes the basic relations between nodes. However, for learning representations of the entire graphs, is it still necessary to hold? In this work, we show that such a principle is not necessary and that it hinders most existing GNNs from efficiently encoding graph structures. By removing it, as well as the limitation of polynomial filters, the resulting new architecture significantly boosts performance on learning graph representations. We also study the effects of graph spectrum on signals and interpret various existing improvements as different spectrum smoothing techniques. It serves as a spatial understanding that quantitatively measures the effects of the spectrum on input signals, in comparison to the well-known spectral understanding as high/low-pass filters. More importantly, it sheds light on developing powerful graph representation models.
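Illustrative sketch (generic, not the paper's proposed architecture): the statement that a $k$-order polynomial filter covers $k$-hop neighbors corresponds to applying a polynomial of the normalized graph Laplacian to a node signal, as below.

```python
import numpy as np

def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix A."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, d ** -0.5, 0.0)
    return np.eye(len(A)) - (d_inv_sqrt[:, None] * A) * d_inv_sqrt[None, :]

def poly_filter(A, x, theta):
    """Apply h(L) x = sum_k theta_k L^k x; the k-th term mixes information
    from up to k-hop neighborhoods, which is the spatial-locality principle
    the abstract questions for graph-level representation learning."""
    L = normalized_laplacian(A)
    out, Lk_x = np.zeros_like(x), x.copy()
    for t in theta:
        out += t * Lk_x      # add theta_k * L^k x
        Lk_x = L @ Lk_x      # advance to the next power of L
    return out

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph on 3 nodes
x = np.array([1.0, 0.0, 0.0])                                 # node signal
print(poly_filter(A, x, theta=[0.5, 0.3, 0.2]))               # 2-hop polynomial filter
```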

【5】 Graph network for simultaneous learning of forward and inverse physics
Link: https://arxiv.org/abs/2112.07054

Authors: Sakthi Kumar Arul Prakash, Conrad Tucker
Affiliations: Department of Mechanical Engineering, Carnegie Mellon University; Department of Machine Learning, Carnegie Mellon University; The Robotics Institute, Carnegie Mellon University; CyLab Security and Privacy Institute, Carnegie Mellon University
Abstract: In this work, we propose an end-to-end graph network that learns forward and inverse models of particle-based physics using interpretable inductive biases. Physics-informed neural networks are often engineered to solve specific problems through problem-specific regularization and loss functions. Such explicit learning biases the network to learn data-specific patterns and may require a change in the loss function or neural network architecture, thereby limiting their generalizability. While recent studies have proposed graph networks to study forward dynamics, they rely on particle-specific parameters such as mass, etc. to approximate the dynamics of the system. Our graph network is implicitly biased by learning to solve several tasks, thereby sharing representations between tasks in order to learn the forward dynamics as well as infer the probability distribution of unknown particle-specific properties. We evaluate our approach on one-step next state prediction tasks across diverse datasets that feature different particle interactions. Our comparison against related data-driven physics learning approaches reveals that our model is able to predict the forward dynamics with at least an order of magnitude higher accuracy. We also show that our approach is able to recover multi-modal probability distributions of unknown physical parameters using orders of magnitude fewer samples.

Transformer (2 papers)

【1】 AdaViT: Adaptive Tokens for Efficient Vision Transformer
Link: https://arxiv.org/abs/2112.07658

Authors: Hongxu Yin, Arash Vahdat, Jose Alvarez, Arun Mallya, Jan Kautz, Pavlo Molchanov
Affiliations: NVIDIA
Abstract: We introduce AdaViT, a method that adaptively adjusts the inference cost of vision transformer (ViT) for images of different complexity. AdaViT achieves this by automatically reducing the number of tokens in vision transformers that are processed in the network as inference proceeds. We reformulate Adaptive Computation Time (ACT) for this task, extending halting to discard redundant spatial tokens. The appealing architectural properties of vision transformers enable our adaptive token reduction mechanism to speed up inference without modifying the network architecture or inference hardware. We demonstrate that AdaViT requires no extra parameters or sub-network for halting, as we base the learning of adaptive halting on the original network parameters. We further introduce distributional prior regularization that stabilizes training compared to prior ACT approaches. On the image classification task (ImageNet1K), we show that our proposed AdaViT yields high efficacy in filtering informative spatial features and cutting down on the overall compute. The proposed method improves the throughput of DeiT-Tiny by 62% and DeiT-Small by 38% with only 0.3% accuracy drop, outperforming prior art by a large margin.
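Illustrative sketch of ACT-style token halting in general (not the exact AdaViT design): each token accumulates a halting probability per layer and is dropped from further computation once the cumulative score crosses a threshold. The halting head, threshold, and tensor shapes below are assumptions.

```python
import torch

def halt_tokens(tokens, halt_logits, cum_halt, threshold=0.99):
    """tokens: (N, D) token embeddings for one image; halt_logits: (N,) from a
    per-layer halting head; cum_halt: (N,) running sum of halting probabilities.
    Returns the surviving tokens and their updated cumulative halting scores."""
    cum_halt = cum_halt + torch.sigmoid(halt_logits)
    keep = cum_halt < threshold          # tokens not yet halted stay in play
    return tokens[keep], cum_halt[keep]

tokens = torch.randn(8, 16)              # 8 tokens, 16-dim embeddings
cum_halt = torch.zeros(8)
for layer in range(4):                   # pretend 4 transformer blocks
    halt_logits = torch.randn(tokens.shape[0])   # stand-in for a halting head
    tokens, cum_halt = halt_tokens(tokens, halt_logits, cum_halt)
print(tokens.shape[0], "tokens remain after 4 blocks")
```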

【2】 Towards a Unified Foundation Model: Jointly Pre-Training Transformers on Unpaired Images and Text
Link: https://arxiv.org/abs/2112.07074

Authors: Qing Li, Boqing Gong, Yin Cui, Dan Kondratyuk, Xianzhi Du, Ming-Hsuan Yang, Matthew Brown
Affiliations: Google Research; University of California, Los Angeles
Notes: preliminary work
Abstract: In this paper, we explore the possibility of building a unified foundation model that can be adapted to both vision-only and text-only tasks. Starting from BERT and ViT, we design a unified transformer consisting of modality-specific tokenizers, a shared transformer encoder, and task-specific output heads. To efficiently pre-train the proposed model jointly on unpaired images and text, we propose two novel techniques: (i) We employ the separately-trained BERT and ViT models as teachers and apply knowledge distillation to provide additional, accurate supervision signals for the joint training; (ii) We propose a novel gradient masking strategy to balance the parameter updates from the image and text pre-training losses. We evaluate the jointly pre-trained transformer by fine-tuning it on image classification tasks and natural language understanding tasks, respectively. The experiments show that the resultant unified foundation transformer works surprisingly well on both the vision-only and text-only tasks, and the proposed knowledge distillation and gradient masking strategy can effectively lift the performance to approach the level of separately-trained models.
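Illustrative sketch of the knowledge-distillation component only (standard temperature-scaled distillation between teacher and student logits); the temperature and the way this term is combined with the masked pre-training objectives and the gradient-masking strategy are assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target distillation: KL(teacher || student) on temperature-scaled
    distributions, scaled by T^2 as in standard knowledge distillation."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * (t * t)

student_logits = torch.randn(4, 10, requires_grad=True)   # unified model head
teacher_logits = torch.randn(4, 10)                        # e.g. a frozen ViT/BERT teacher
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
print(float(loss))
```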

GAN | Adversarial | Attack | Generation (7 papers)

【1】 Adversarial Examples for Extreme Multilabel Text Classification
Link: https://arxiv.org/abs/2112.07512

Authors: Mohammadreza Qaraei, Rohit Babbar
Affiliations: Aalto University, Helsinki, Finland
Abstract: Extreme Multilabel Text Classification (XMTC) is a text classification problem in which, (i) the output space is extremely large, (ii) each data point may have multiple positive labels, and (iii) the data follows a strongly imbalanced distribution. With applications in recommendation systems and automatic tagging of web-scale documents, the research on XMTC has been focused on improving prediction accuracy and dealing with imbalanced data. However, the robustness of deep learning based XMTC models against adversarial examples has been largely underexplored. In this paper, we investigate the behaviour of XMTC models under adversarial attacks. To this end, first, we define adversarial attacks in multilabel text classification problems. We categorize attacking multilabel text classifiers as (a) positive-targeted, where the target positive label should fall out of top-k predicted labels, and (b) negative-targeted, where the target negative label should be among the top-k predicted labels. Then, by experiments on APLC-XLNet and AttentionXML, we show that XMTC models are highly vulnerable to positive-targeted attacks but more robust to negative-targeted ones. Furthermore, our experiments show that the success rate of positive-targeted adversarial attacks has an imbalanced distribution. More precisely, tail classes are highly vulnerable to adversarial attacks for which an attacker can generate adversarial samples with high similarity to the actual data-points. To overcome this problem, we explore the effect of rebalanced loss functions in XMTC where not only do they increase accuracy on tail classes, but they also improve the robustness of these classes against adversarial attacks. The code for our experiments is available at https://github.com/xmc-aalto/adv-xmtc
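Illustrative sketch: the two attack goals defined in the abstract reduce to top-k membership tests on the classifier's post-attack label scores. The scores, label indices, and k below are made up.

```python
import numpy as np

def topk_labels(scores, k):
    """Indices of the k highest-scoring labels."""
    return set(np.argsort(scores)[::-1][:k])

def positive_attack_succeeds(scores_after, target_positive, k):
    """Positive-targeted: a true (positive) label is pushed out of the top-k."""
    return target_positive not in topk_labels(scores_after, k)

def negative_attack_succeeds(scores_after, target_negative, k):
    """Negative-targeted: an irrelevant (negative) label enters the top-k."""
    return target_negative in topk_labels(scores_after, k)

scores = np.array([0.9, 0.1, 0.4, 0.05, 0.3])   # classifier scores after the attack
print(positive_attack_succeeds(scores, target_positive=3, k=3))
print(negative_attack_succeeds(scores, target_negative=4, k=3))
```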

【2】 On the Impact of Hard Adversarial Instances on Overfitting in Adversarial Training
Link: https://arxiv.org/abs/2112.07324

Authors: Chen Liu, Zhichao Huang, Mathieu Salzmann, Tong Zhang, Sabine Süsstrunk
Abstract: Adversarial training is a popular method to robustify models against adversarial attacks. However, it exhibits much more severe overfitting than training on clean inputs. In this work, we investigate this phenomenon from the perspective of training instances, i.e., training input-target pairs. Based on a quantitative metric measuring instances' difficulty, we analyze the model's behavior on training instances of different difficulty levels. This lets us show that the decay in generalization performance of adversarial training is a result of the model's attempt to fit hard adversarial instances. We theoretically verify our observations for both linear and general nonlinear models, proving that models trained on hard instances have worse generalization performance than ones trained on easy instances. Furthermore, we prove that the difference in the generalization gap between models trained by instances of different difficulty levels increases with the size of the adversarial budget. Finally, we conduct case studies on methods mitigating adversarial overfitting in several scenarios. Our analysis shows that methods successfully mitigating adversarial overfitting all avoid fitting hard adversarial instances, while ones fitting hard adversarial instances do not achieve true robustness.

【3】 Compensating trajectory bias for unsupervised patient stratification using adversarial recurrent neural networks
Link: https://arxiv.org/abs/2112.07239

Authors: Avelino Javer, Owen Parsons, Oliver Carr, Janie Baxter, Christian Diedrich, Eren Elçi, Steffen Schaper, Katrin Coboeken, Robert Dürichen
Abstract: Electronic healthcare records are an important source of information which can be used in patient stratification to discover novel disease phenotypes. However, they can be challenging to work with as data is often sparse and irregularly sampled. One approach to solve these limitations is learning dense embeddings that represent individual patient trajectories using a recurrent neural network autoencoder (RNN-AE). This process can be susceptible to unwanted data biases. We show that patient embeddings and clusters using previously proposed RNN-AE models might be impacted by a trajectory bias, meaning that results are dominated by the amount of data contained in each patient's trajectory, instead of clinically relevant details. We investigate this bias on 2 datasets (from different hospitals) and 2 disease areas as well as using different parts of the patient trajectory. Our results using 2 previously published baseline methods indicate a particularly strong bias in the case of an event-to-end trajectory. We present a method that can overcome this issue using an adversarial training scheme on top of a RNN-AE. Our results show that our approach can reduce the trajectory bias in all cases.

【4】 ACE-BERT: Adversarial Cross-modal Enhanced BERT for E-commerce Retrieval
Link: https://arxiv.org/abs/2112.07209

Authors: Boxuan Zhang, Chao Wei, Yan Jin, Weiru Zhang
Affiliations: Alibaba Group, Hangzhou, China
Abstract: Nowadays on E-commerce platforms, products are presented to the customers with multiple modalities. These multiple modalities are significant for a retrieval system while providing attractive products for customers. Therefore, how to take into account those multiple modalities simultaneously to boost the retrieval performance is crucial. This problem is a huge challenge to us due to the following reasons: (1) the way of extracting patch features with the pre-trained image model (e.g., CNN-based model) has much inductive bias. It is difficult to capture the efficient information from the product image in E-commerce. (2) The heterogeneity of multimodal data makes it challenging to construct the representations of query text and product including title and image in a common subspace. We propose a novel Adversarial Cross-modal Enhanced BERT (ACE-BERT) for efficient E-commerce retrieval. In detail, ACE-BERT leverages the patch features and pixel features as image representation. Thus the Transformer architecture can be applied directly to the raw image sequences. With the pre-trained enhanced BERT as the backbone network, ACE-BERT further adopts adversarial learning by adding a domain classifier to ensure the distribution consistency of different modality representations for the purpose of narrowing down the representation gap between query and product. Experimental results demonstrate that ACE-BERT outperforms the state-of-the-art approaches on the retrieval task. It is remarkable that ACE-BERT has already been deployed in our E-commerce's search engine, leading to a 1.46% increase in revenue.

【5】 Controlled Cue Generation for Play Scripts
Link: https://arxiv.org/abs/2112.06953

Authors: Alara Dirik, Hilal Donmez, Pinar Yanardag
Affiliations: Boğaziçi University, Istanbul, Turkey
Abstract: In this paper, we use a large-scale play scripts dataset to propose the novel task of theatrical cue generation from dialogues. Using over one million lines of dialogue and cues, we approach the problem of cue generation as a controlled text generation task, and show how cues can be used to enhance the impact of dialogue using a language model conditioned on a dialogue/cue discriminator. In addition, we explore the use of topic keywords and emotions for controlled text generation. Extensive quantitative and qualitative experiments show that language models can be successfully used to generate plausible and attribute-controlled texts in highly specialised domains such as play scripts. Supporting materials can be found at: https://catlab-team.github.io/cuegen.

【6】 CGAN-EB: A Non-parametric Empirical Bayes Method for Crash Hotspot Identification Using Conditional Generative Adversarial Networks: A Simulated Crash Data Study
Link: https://arxiv.org/abs/2112.06925

Authors: Mohammad Zarei, Bruce Hellinga, Pedram Izadpanah
Affiliations: Department of Civil and Environmental Engineering, University of Waterloo, Waterloo, ON, Canada
Notes: 17 pages, 8 figures
Abstract: In this paper, a new non-parametric empirical Bayes approach called CGAN-EB is proposed for approximating empirical Bayes (EB) estimates in traffic locations (e.g., road segments) which benefits from the modeling advantages of deep neural networks, and its performance is compared in a simulation study with the traditional approach based on the negative binomial model (NB-EB). The NB-EB uses the negative binomial model in order to model the crash data and is the most common approach in practice. To model the crash data in the proposed CGAN-EB, a conditional generative adversarial network is used, which is a powerful deep neural network based method that can model any type of distribution. A number of simulation experiments are designed and conducted to evaluate the CGAN-EB performance in different conditions and compare it with the NB-EB. The results show that CGAN-EB performs as well as NB-EB when conditions favor the NB-EB model (i.e. data conform to the assumptions of the NB model) and outperforms NB-EB in experiments reflecting conditions frequently encountered in practice, specifically low sample means, and when crash frequency does not follow a log-linear relationship with covariates.
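For reference, the traditional NB-EB estimate that CGAN-EB is compared against is commonly computed as a weighted average of the model-predicted and observed crash counts. The sketch below uses the usual Hauer-style weight; the exact formulation in the paper may differ.

```python
def nb_eb_estimate(observed, predicted_mean, dispersion):
    """Empirical Bayes crash estimate under a negative binomial model:
    EB = w * mu + (1 - w) * x, with w = 1 / (1 + mu / phi), where mu is the
    model prediction, x the observed count, and phi the NB dispersion parameter."""
    w = 1.0 / (1.0 + predicted_mean / dispersion)
    return w * predicted_mean + (1.0 - w) * observed

# A site predicted to have 2.5 crashes/year but observed with 6:
print(nb_eb_estimate(observed=6, predicted_mean=2.5, dispersion=1.8))
```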

【7】 Generating Fluent Fact Checking Explanations with Unsupervised Post-Editing
Link: https://arxiv.org/abs/2112.06924

Authors: Shailza Jolly, Pepa Atanasova, Isabelle Augenstein
Affiliations: Department of Computer Science, University of Copenhagen
Abstract: Fact-checking systems have become important tools to verify fake and misguiding news. These systems become more trustworthy when human-readable explanations accompany the veracity labels. However, manual collection of such explanations is expensive and time-consuming. Recent works frame explanation generation as extractive summarization, and propose to automatically select a sufficient subset of the most important facts from the ruling comments (RCs) of a professional journalist to obtain fact-checking explanations. However, these explanations lack fluency and sentence coherence. In this work, we present an iterative edit-based algorithm that uses only phrase-level edits to perform unsupervised post-editing of disconnected RCs. To regulate our editing algorithm, we use a scoring function with components including fluency and semantic preservation. In addition, we show the applicability of our approach in a completely unsupervised setting. We experiment with two benchmark datasets, LIAR-PLUS and PubHealth. We show that our model generates explanations that are fluent, readable, non-redundant, and cover important information for the fact check.

Semi-/Weakly-/Un-/Supervised | Uncertainty | Active Learning (7 papers)

【1】 Cooperation for Scalable Supervision of Autonomy in Mixed Traffic
Link: https://arxiv.org/abs/2112.07569

Authors: Cameron Hickert, Sirui Li, Cathy Wu
Affiliations: Massachusetts Institute of Technology
Notes: 14 pages, 7 figures
Abstract: Improvements in autonomy offer the potential for positive outcomes in a number of domains, yet guaranteeing their safe deployment is difficult. This work investigates how humans can intelligently supervise agents to achieve some level of safety even when performance guarantees are elusive. The motivating research question is: In safety-critical settings, can we avoid the need to have one human supervise one machine at all times? The paper formalizes this 'scaling supervision' problem, and investigates its application to the safety-critical context of autonomous vehicles (AVs) merging into traffic. It proposes a conservative, reachability-based method to reduce the burden on the AVs' human supervisors, which allows for the establishment of high-confidence upper bounds on the supervision requirements in this setting. Order statistics and traffic simulations with deep reinforcement learning show analytically and numerically that teaming of AVs enables supervision time sublinear in AV adoption. A key takeaway is that, despite present imperfections of AVs, supervision becomes more tractable as AVs are deployed en masse. While this work focuses on AVs, the scalable supervision framework is relevant to a broader array of autonomous control challenges.

【2】 n-CPS: Generalising Cross Pseudo Supervision to n networks for Semi-Supervised Semantic Segmentation
Link: https://arxiv.org/abs/2112.07528

Authors: Dominik Filipiak, Piotr Tempczyk, Marek Cygan
Affiliations: AI Clearing, Inc.; Semantic Technology Institute, Department of Computer Science, University of Innsbruck; Informatics and Mechanics, University of Warsaw
Abstract: We present $n$-CPS - a generalisation of the recent state-of-the-art cross pseudo supervision (CPS) approach for the task of semi-supervised semantic segmentation. In $n$-CPS, there are $n$ simultaneously trained subnetworks that learn from each other through one-hot encoding perturbation and consistency regularisation. We also show that ensembling techniques applied to subnetwork outputs can significantly improve the performance. To the best of our knowledge, $n$-CPS paired with CutMix outperforms CPS and sets a new state-of-the-art for Pascal VOC 2012 (1/16, 1/8, 1/4, and 1/2 supervised regimes) and Cityscapes (1/16 supervised).
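Illustrative sketch of cross pseudo supervision generalised to n subnetworks (each network is trained on the hard pseudo-labels produced by every other network on unlabeled images); the shapes and uniform loss weighting are assumptions, and the one-hot perturbation and output ensembling from the paper are omitted.

```python
import torch
import torch.nn.functional as F

def n_cps_loss(logits_list):
    """logits_list: per-subnetwork segmentation logits, each (B, C, H, W).
    Every network is supervised by the hard pseudo-labels of the others."""
    loss, n = 0.0, len(logits_list)
    for i, student_logits in enumerate(logits_list):
        for j, teacher_logits in enumerate(logits_list):
            if i == j:
                continue
            pseudo = teacher_logits.argmax(dim=1).detach()   # hard pseudo-label map
            loss = loss + F.cross_entropy(student_logits, pseudo)
    return loss / (n * (n - 1))

logits = [torch.randn(2, 4, 8, 8, requires_grad=True) for _ in range(3)]  # n = 3
loss = n_cps_loss(logits)
loss.backward()
print(float(loss))
```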

【3】 Technical Language Supervision for Intelligent Fault Diagnosis in Process Industry
Link: https://arxiv.org/abs/2112.07356

Authors: Karl Löwenmark, Cees Taal, Stephan Schnabel, Marcus Liwicki, Fredrik Sandin
Affiliations: Embedded Intelligent Systems Laboratory (EISLAB), Luleå University of Technology, Luleå, Sweden; SKF Research & Technology Development, The Netherlands
Abstract: In the process industry, condition monitoring systems with automated fault diagnosis methods assist human experts and thereby improve maintenance efficiency, process sustainability, and workplace safety. Improving the automated fault diagnosis methods using data and machine learning-based models is a central aspect of intelligent fault diagnosis (IFD). A major challenge in IFD is to develop realistic datasets with accurate labels needed to train and validate models, and to transfer models trained with labeled lab data to heterogeneous process industry environments. However, fault descriptions and work-orders written by domain experts are increasingly digitized in modern condition monitoring systems, for example in the context of rotating equipment monitoring. Thus, domain-specific knowledge about fault characteristics and severities exists as technical language annotations in industrial datasets. Furthermore, recent advances in natural language processing enable weakly supervised model optimization using natural language annotations, most notably in the form of natural language supervision (NLS). This creates a timely opportunity to develop technical language supervision (TLS) solutions for IFD systems grounded in industrial data, for example as a complement to pre-training with lab data to address problems like overfitting and inaccurate out-of-sample generalisation. We surveyed the literature and identify a considerable improvement in the maturity of NLS over the last two years, facilitating applications beyond natural language; a rapid development of weak supervision methods; and transfer learning as a current trend in IFD which can benefit from these developments. Finally, we describe a framework for integration of TLS in IFD which is inspired by recent NLS innovations.

【4】 Unsupervised feature selection via self-paced learning and low-redundant regularization
Link: https://arxiv.org/abs/2112.07227

Authors: Weiyi Li, Hongmei Chen, Tianrui Li, Jihong Wan, Binbin Sang
Affiliations: School of Computing and Artificial Intelligence, Southwest Jiaotong University, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Southwest Jiaotong University, China
Abstract: Much more attention has been paid to unsupervised feature selection nowadays due to the emergence of massive unlabeled data. The distribution of samples and the latent effect of training a learning method using samples in more effective order need to be considered so as to improve the robustness of the method. Self-paced learning is an effective method considering the training order of samples. In this study, an unsupervised feature selection method is proposed by integrating the framework of self-paced learning and subspace learning. Moreover, the local manifold structure is preserved and the redundancy of features is constrained by two regularization terms. The $L_{2,1/2}$-norm is applied to the projection matrix, which aims to retain discriminative features and further alleviate the effect of noise in the data. Then, an iterative method is presented to solve the optimization problem. The convergence of the method is proved theoretically and experimentally. The proposed method is compared with other state-of-the-art algorithms on nine real-world datasets. The experimental results show that the proposed method can improve the performance of clustering methods and outperform other compared algorithms.
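The $L_{2,1/2}$ regulariser on the projection matrix can be written down directly. The sketch below uses one common definition (the squared sum of square-rooted row-wise $\ell_2$ norms); the paper's exact formulation may differ in constants.

```python
import numpy as np

def l_2_half_norm(W):
    """One common definition of the L_{2,1/2} (quasi-)norm of a matrix:
    ( sum_i ||w_i||_2^{1/2} )^2, where w_i are the rows of W. Penalising this
    term pushes whole rows (i.e. features) of the projection toward zero,
    which is what makes it useful for sparse, discriminative feature selection."""
    row_norms = np.linalg.norm(W, axis=1)
    return np.sum(np.sqrt(row_norms)) ** 2

W = np.array([[0.9, 0.1], [0.0, 0.0], [0.2, 0.7]])   # 3 features, 2 projected dims
print(l_2_half_norm(W))
```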

【5】 Addressing Bias in Active Learning with Depth Uncertainty Networks... or Not
Link: https://arxiv.org/abs/2112.06926

Authors: Chelsea Murray, James U. Allingham, Javier Antorán, José Miguel Hernández-Lobato
Affiliations: Department of Engineering, University of Cambridge
Notes: arXiv admin note: substantial text overlap with arXiv:2112.06796
Abstract: Farquhar et al. [2021] show that correcting for active learning bias with underparameterised models leads to improved downstream performance. For overparameterised models such as NNs, however, correction leads either to decreased or unchanged performance. They suggest that this is due to an "overfitting bias" which offsets the active learning bias. We show that depth uncertainty networks operate in a low overfitting regime, much like underparameterised models. They should therefore see an increase in performance with bias correction. Surprisingly, they do not. We propose that this negative result, as well as the results of Farquhar et al. [2021], can be explained via the lens of the bias-variance decomposition of generalisation error.

【6】 Inductive Semi-supervised Learning Through Optimal Transport
Link: https://arxiv.org/abs/2112.07262

Authors: Mourad El Hamri, Younès Bennani, Issam Falih
Affiliations: LIPN - CNRS UMR, Université Sorbonne Paris Nord, France; LaMSN - La Maison des Sciences Numériques, France; LIMOS - CNRS UMR, Université Clermont Auvergne, France
Abstract: In this paper, we tackle the inductive semi-supervised learning problem that aims to obtain label predictions for out-of-sample data. The proposed approach, called Optimal Transport Induction (OTI), extends efficiently an optimal transport based transductive algorithm (OTP) to inductive tasks for both binary and multi-class settings. A series of experiments are conducted on several datasets in order to compare the proposed approach with state-of-the-art methods. Experiments demonstrate the effectiveness of our approach. We make our code publicly available (code is available at: https://github.com/MouradElHamri/OTI).

【7】 Active Learning for the Optimal Design of Multinomial Classification in Physics
Link: https://arxiv.org/abs/2109.08612

Authors: Yongcheng Ding, José D. Martín-Guerrero, Yujing Song, Rafael Magdalena-Benedito, Xi Chen
Affiliations: Department of Physical Chemistry, University of the Basque Country UPV/EHU, Bilbao, Spain; ProQuam Co., Ltd., Shanghai, China; IDAL, Electronic Engineering Department, ETSE-UV, University of Valencia
Notes: 13 pages and 11 figures
Abstract: Optimal design for model training is a critical topic in machine learning. Active learning aims at obtaining improved models by querying samples with maximum uncertainty according to the estimation model for artificial labeling; this has the additional advantage of achieving successful performances with a reduced number of labeled samples. We analyze its capability as an assistant for the design of experiments, extracting maximum information for learning with the minimal cost in fidelity loss, or reducing total operation costs of labeling in the laboratory. We present two typical applications as quantum information retrieval in qutrits and phase boundary prediction in many-body physics. For an equivalent multinomial classification problem, we achieve the correct rate of 99% with less than 2% samples labeled. We reckon that active-learning-inspired physics experiments will remarkably save budget without loss of accuracy.
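Illustrative sketch of the querying rule described above (select the pool samples the current model is most uncertain about, here by predictive entropy); the pool probabilities and query batch size are made up.

```python
import numpy as np

def predictive_entropy(probs):
    """Entropy of each row of class probabilities (higher = more uncertain)."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def query_most_uncertain(probs, n_queries):
    """Return indices of the pool samples to send for manual labeling."""
    return np.argsort(predictive_entropy(probs))[::-1][:n_queries]

pool_probs = np.array([[0.98, 0.01, 0.01],    # confident -> not queried
                       [0.34, 0.33, 0.33],    # near-uniform -> queried
                       [0.60, 0.25, 0.15]])
print(query_most_uncertain(pool_probs, n_queries=1))
```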

Transfer | Zero/Few/One-Shot | Adaptation (3 papers)

【1】 Exploring the Limits of Natural Language Inference Based Setup for Few-Shot Intent Detection
Link: https://arxiv.org/abs/2112.07434

Authors: Vijit Malik, Ayush Kumar, Jithendra Vepa
Affiliations: IIT Kanpur, India; Observe.AI, India
Abstract: One of the core components of goal-oriented dialog systems is the task of Intent Detection. Few-shot learning upon Intent Detection is challenging due to the scarcity of available annotated utterances. Although recent works making use of metric-based and optimization-based methods have been proposed, the task is still challenging in large label spaces and a much smaller number of shots. Generalized few-shot learning is more difficult due to the presence of both novel and seen classes during the testing phase. In this work, we propose a simple and effective method based on Natural Language Inference that not only tackles the problem of few-shot intent detection, but also proves useful in zero-shot and generalized few-shot learning problems. Our extensive experiments on a number of Natural Language Understanding (NLU) and Spoken Language Understanding (SLU) datasets show the effectiveness of our approach. In addition, we highlight the settings in which our NLI based method outperforms the baselines by huge margins.
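Illustrative sketch of the NLI reformulation (score each candidate intent as a hypothesis against the utterance as premise and pick the highest-scoring one, which extends naturally to unseen intents); the hypothesis template and the `entailment_score` stub below are placeholders for an actual NLI model, not the paper's implementation.

```python
def entailment_score(premise: str, hypothesis: str) -> float:
    """Placeholder for an NLI model's probability that the premise entails
    the hypothesis (e.g. a fine-tuned cross-encoder); here a trivial word-overlap stub."""
    return float(sum(w in premise.lower() for w in hypothesis.lower().split()))

def predict_intent(utterance: str, intents: list[str]) -> str:
    """Score every candidate intent as an NLI hypothesis; the same code works
    for intents never seen during training, which enables zero-shot detection."""
    hypotheses = {i: f"The user wants to {i.replace('_', ' ')}." for i in intents}
    return max(intents, key=lambda i: entailment_score(utterance, hypotheses[i]))

print(predict_intent("I need to book a flight to Boston",
                     ["book_flight", "cancel_reservation", "check_weather"]))
```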

【2】 Minimization of Stochastic First-order Oracle Complexity of Adaptive Methods for Nonconvex Optimization
Link: https://arxiv.org/abs/2112.07163

Authors: Hideaki Iiduka
Abstract: Numerical evaluations have definitively shown that, for deep learning optimizers such as stochastic gradient descent, momentum, and adaptive methods, the number of steps needed to train a deep neural network halves for each doubling of the batch size and that there is a region of diminishing returns beyond the critical batch size. In this paper, we determine the actual critical batch size by using the global minimizer of the stochastic first-order oracle (SFO) complexity of the optimizer. To prove the existence of the actual critical batch size, we set the lower and upper bounds of the SFO complexity and prove that there exist critical batch sizes in the sense of minimizing the lower and upper bounds. This proof implies that, if the SFO complexity fits the lower and upper bounds, then the existence of these critical batch sizes demonstrates the existence of the actual critical batch size. We also discuss the conditions needed for the SFO complexity to fit the lower and upper bounds and provide numerical results that support our theoretical results.

【3】 Adaptive Projected Residual Networks for Learning Parametric Maps from Sparse Data
Link: https://arxiv.org/abs/2112.07096

Authors: Thomas O'Leary-Roseberry, Xiaosong Du, Anirban Chaudhuri, Joaquim R. R. A. Martins, Karen Willcox, Omar Ghattas
Affiliations: Oden Institute for Computational Engineering & Sciences; Department of Aerospace Engineering and Engineering Mechanics; University of Michigan
Abstract: We present a parsimonious surrogate framework for learning high dimensional parametric maps from limited training data. The need for parametric surrogates arises in many applications that require repeated queries of complex computational models. These applications include such "outer-loop" problems as Bayesian inverse problems, optimal experimental design, and optimal design and control under uncertainty, as well as real time inference and control problems. Many high dimensional parametric mappings admit low dimensional structure, which can be exploited by mapping-informed reduced bases of the inputs and outputs. Exploiting this property, we develop a framework for learning low dimensional approximations of such maps by adaptively constructing ResNet approximations between reduced bases of their inputs and output. Motivated by recent approximation theory for ResNets as discretizations of control flows, we prove a universal approximation property of our proposed adaptive projected ResNet framework, which motivates a related iterative algorithm for the ResNet construction. This strategy represents a confluence of the approximation theory and the algorithm since both make use of sequentially minimizing flows. In numerical examples we show that these parsimonious, mapping-informed architectures are able to achieve remarkably high accuracy given few training data, making them a desirable surrogate strategy to be implemented for minimal computational investment in training data generation.

Reinforcement Learning (7 papers)

【1】 Tree-based Focused Web Crawling with Reinforcement Learning
Link: https://arxiv.org/abs/2112.07620

Authors: Andreas Kontogiannis, Dimitrios Kelesis, Vasilis Pollatos, Georgios Paliouras, George Giannakopoulos
Affiliations: School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece; Software and Knowledge Engineering Lab, NCSR “Demokritos”, Athens, Greece; SciFY PNPC, Athens, Greece
Abstract: A focused crawler aims at discovering as many web pages relevant to a target topic as possible, while avoiding irrelevant ones; i.e. maximizing the harvest rate. Reinforcement Learning (RL) has been utilized to optimize the crawling process, yet it deals with huge state and action spaces, which can constitute a serious challenge. In this paper, we propose TRES, an end-to-end RL-empowered framework for focused crawling. Unlike other approaches, we properly model a crawling environment as a Markov Decision Process, by representing the state as a subgraph of the Web and actions as its expansion edges. TRES adopts a keyword expansion strategy based on the cosine similarity of keyword embeddings. To learn a reward function, we propose a deep neural network, called KwBiLSTM, leveraging the discovered keywords. To reduce the time complexity of selecting a best action, we propose Tree-Frontier, a two-fold decision tree, which also speeds up training by discretizing the state and action spaces. Experimentally, we show that TRES outperforms state-of-the-art methods in terms of harvest rate by at least 58%, while it has competitive results in domain maximization. Our implementation code can be found at https://github.com/ddaedalus/TRES.
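Illustrative sketch of the keyword expansion strategy (rank candidate terms by cosine similarity of their embeddings to the seed keywords); the tiny embedding table below is made up and does not reflect the embeddings used in the paper.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def expand_keywords(seed_words, embeddings, top_k=2):
    """Rank non-seed vocabulary terms by their maximum cosine similarity to
    any seed keyword embedding and return the top_k candidates."""
    seeds = [embeddings[w] for w in seed_words]
    candidates = [w for w in embeddings if w not in seed_words]
    score = lambda w: max(cosine(embeddings[w], s) for s in seeds)
    return sorted(candidates, key=score, reverse=True)[:top_k]

embeddings = {"football":   np.array([0.9, 0.1, 0.0]),   # toy embedding table
              "soccer":     np.array([0.8, 0.2, 0.1]),
              "goalkeeper": np.array([0.7, 0.3, 0.0]),
              "recipe":     np.array([0.0, 0.1, 0.9])}
print(expand_keywords(["football"], embeddings))
```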

【2】 Scientific Discovery and the Cost of Measurement -- Balancing Information and Cost in Reinforcement Learning
Link: https://arxiv.org/abs/2112.07535

Authors: Colin Bellinger, Andriy Drozdyuk, Mark Crowley, Isaac Tamblyn
Affiliations: National Research Council of Canada; Carleton University; University of Waterloo; University of Ottawa; Vector Institute for Artificial Intelligence
Notes: To appear in: 1st Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE)
Abstract: The use of reinforcement learning (RL) in scientific applications, such as materials design and automated chemistry, is increasing. A major challenge, however, lies in the fact that measuring the state of the system is often costly and time consuming in scientific applications, whereas policy learning with RL requires a measurement after each time step. In this work, we make the measurement costs explicit in the form of a costed reward and propose a framework that enables off-the-shelf deep RL algorithms to learn a policy for both selecting actions and determining whether or not to measure the current state of the system at each time step. In this way, the agents learn to balance the need for information with the cost of information. Our results show that when trained under this regime, the Dueling DQN and PPO agents can learn optimal action policies whilst making up to 50% fewer state measurements, and recurrent neural networks can produce a greater than 50% reduction in measurements. We postulate that these reductions can help to lower the barrier to applying RL to real-world scientific applications.
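Illustrative sketch of the costed-reward idea (the agent's action carries an extra "measure" flag: measuring returns the true state but pays a penalty, while skipping it returns no fresh observation); the toy environment, interface, and cost value are assumptions, not the paper's implementation.

```python
import random

class CostedMeasurementEnv:
    """Toy 1-D random-walk environment with an explicit measurement cost."""
    def __init__(self, measure_cost=0.1):
        self.state, self.measure_cost = 0.0, measure_cost

    def step(self, control, measure):
        """control: float action; measure: bool, whether to observe the state."""
        self.state += control + random.gauss(0.0, 0.05)
        reward = -abs(self.state)                        # task reward: stay near zero
        if measure:
            return self.state, reward - self.measure_cost   # pay for information
        return None, reward                              # no fresh observation

env = CostedMeasurementEnv()
obs, total = 0.0, 0.0
for t in range(10):
    measure = (t % 2 == 0)                               # fixed policy: measure every other step
    control = -0.5 * obs if obs is not None else 0.0
    obs, r = env.step(control, measure)
    total += r
print(round(total, 3))
```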

【3】 Conjugated Discrete Distributions for Distributional Reinforcement Learning
Link: https://arxiv.org/abs/2112.07424

Authors: Björn Lindenberg, Jonas Nordqvist, Karl-Olof Lindahl
Affiliations: Department of Mathematics, Linnaeus University, Växjö, Sweden
Notes: 17 pages, 7 figures, conference
Abstract: In this work we continue to build upon recent advances in reinforcement learning for finite Markov processes. A common approach among previous existing algorithms, both single-actor and distributed, is to either clip rewards or to apply a transformation method on Q-functions to handle a large variety of magnitudes in real discounted returns. We theoretically show that one of the most successful methods may not yield an optimal policy if we have a non-deterministic process. As a solution, we argue that distributional reinforcement learning lends itself to remedy this situation completely. By the introduction of a conjugated distributional operator we may handle a large class of transformations for real returns with guaranteed theoretical convergence. We propose an approximating single-actor algorithm based on this operator that trains agents directly on unaltered rewards using a proper distributional metric given by the Cramér distance. To evaluate its performance in a stochastic setting we train agents on a suite of 55 Atari 2600 games using sticky actions and obtain state-of-the-art performance compared to other well-known algorithms in the Dopamine framework.

【4】 Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning
Link: https://arxiv.org/abs/2112.07328

Authors: Yunhao Tang
Affiliations: DeepMind
Abstract: Despite the empirical success of meta reinforcement learning (meta-RL), there are still a number of poorly-understood discrepancies between theory and practice. Critically, biased gradient estimates are almost always implemented in practice, whereas prior theory on meta-RL only establishes convergence under unbiased gradient estimates. In this work, we investigate such a discrepancy. In particular, (1) We show that unbiased gradient estimates have variance $\Theta(N)$ which linearly depends on the sample size $N$ of the inner loop updates; (2) We propose linearized score function (LSF) gradient estimates, which have bias $\mathcal{O}(1/\sqrt{N})$ and variance $\mathcal{O}(1/N)$; (3) We show that most empirical prior work in fact implements variants of the LSF gradient estimates. This implies that practical algorithms "accidentally" introduce bias to achieve better performance; (4) We establish theoretical guarantees for the LSF gradient estimates in meta-RL regarding its convergence to stationary points, showing better dependency on $N$ than prior work when $N$ is large.

【5】 Autonomous Navigation and Configuration of Integrated Access Backhauling for UAV Base Station Using Reinforcement Learning
Link: https://arxiv.org/abs/2112.07313

Authors: Hongyi Zhang, Jingya Li, Zhiqiang Qi, Xingqin Lin, Anders Aronsson, Jan Bosch, Helena Holmström Olsson
Affiliations: Chalmers University of Technology; Malmö University
Notes: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
Abstract: Fast and reliable connectivity is essential to enhancing situational awareness and operational efficiency for public safety mission-critical (MC) users. In emergency or disaster circumstances, where existing cellular network coverage and capacity may not be available to meet MC communication demands, deployable-network-based solutions such as cells-on-wheels/wings can be utilized swiftly to ensure reliable connection for MC users. In this paper, we consider a scenario where a macro base station (BS) is destroyed due to a natural disaster and an unmanned aerial vehicle carrying a BS (UAV-BS) is set up to provide temporary coverage for users in the disaster area. The UAV-BS is integrated into the mobile network using the 5G integrated access and backhaul (IAB) technology. We propose a framework and signalling procedure for applying machine learning to this use case. A deep reinforcement learning algorithm is designed to jointly optimize the access and backhaul antenna tilt as well as the three-dimensional location of the UAV-BS in order to best serve the on-ground MC users while maintaining a good backhaul connection. Our result shows that the proposed algorithm can autonomously navigate and configure the UAV-BS to improve the throughput and reduce the drop rate of MC users.

【6】 NEORL: NeuroEvolution Optimization with Reinforcement Learning
Link: https://arxiv.org/abs/2112.07057

Authors: Majdi I. Radaideh, Katelin Du, Paul Seurin, Devin Seyler, Xubo Gu, Haijia Wang, Koroush Shirvan
Affiliations: Department of Nuclear Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States; Department of Physics, School of Nuclear Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Notes: 23 pages, 6 figures, 7 tables
Abstract: We present an open-source Python framework for NeuroEvolution Optimization with Reinforcement Learning (NEORL) developed at the Massachusetts Institute of Technology. NEORL offers a global optimization interface of state-of-the-art algorithms in the field of evolutionary computation, neural networks through reinforcement learning, and hybrid neuroevolution algorithms. NEORL features a diverse set of algorithms, a user-friendly interface, parallel computing support, automatic hyperparameter tuning, detailed documentation, and demonstration of applications in mathematical and real-world engineering optimization. NEORL encompasses various optimization problems from combinatorial, continuous, mixed discrete/continuous, to high-dimensional, expensive, and constrained engineering optimization. NEORL is tested in a variety of engineering applications relevant to low carbon energy research in addressing solutions to climate change. The examples include nuclear reactor control and fuel cell power production. The results demonstrate NEORL competitiveness against other algorithms and optimization frameworks in the literature, and a potential tool to solve large-scale optimization problems. More examples and benchmarking of NEORL can be found here: https://neorl.readthedocs.io/en/latest/index.html

【7】 Teaching a Robot to Walk Using Reinforcement Learning 标题:用强化学习教机器人行走 链接:https://arxiv.org/abs/2112.07031

作者:Jack Dibachi,Jacob Azoulay 机构:Stanford University, AA,: Decision Making under Uncertainty 备注:6 pages, 9 figures 摘要:经典控制技术(如PID和LQR)已被有效地用于维持系统状态,但当模型动态的复杂性和灵敏度增加时,这些技术变得更难实现。对于具有多个自由度的自适应机器人运动任务,经典控制技术已不可行;而强化学习可以较为容易地训练出最优的行走策略。我们应用深度Q-学习和增强随机搜索(ARS),在OpenAI Gym的BipedalWalker-v3环境中教一个模拟的二维两足机器人行走。深度Q-学习未能产生高回报策略,通常过早收敛到次优的局部极大值,这很可能是动作空间离散化过粗所致。相比之下,ARS训练出了效果更好的机器人,并得到了正式“解决”BipedalWalker-v3问题的最优策略。我们将多种朴素策略(包括随机策略、手工编码的缓慢前移策略和原地不动策略)用作基准,以评估学习算法所得策略的水平。 摘要:Classical control techniques such as PID and LQR have been used effectively in maintaining a system state, but these techniques become more difficult to implement when the model dynamics increase in complexity and sensitivity. For adaptive robotic locomotion tasks with several degrees of freedom, this task becomes infeasible with classical control techniques. Instead, reinforcement learning can train optimal walking policies with ease. We apply deep Q-learning and augmented random search (ARS) to teach a simulated two-dimensional bipedal robot how to walk using the OpenAI Gym BipedalWalker-v3 environment. Deep Q-learning did not yield a high reward policy, often prematurely converging to suboptimal local maxima likely due to the coarsely discretized action space. ARS, however, resulted in a better trained robot, and produced an optimal policy which officially "solves" the BipedalWalker-v3 problem. Various naive policies, including a random policy, a manually encoded inch forward policy, and a stay still policy, were used as benchmarks to evaluate the proficiency of the learning algorithm results.
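下面是一个示意性的增强随机搜索(ARS)训练循环草图,基于经典 gym API(gym<0.26,`env.reset()` 直接返回观测)。它只演示 ARS 的核心思想——对线性策略参数做随机扰动,用正负扰动回报差更新参数——并非论文作者的实现;超参数与轨迹长度均为假设值。

```python
import gym
import numpy as np

def rollout(env, theta, max_steps=1600):
    """用线性策略 a = theta @ obs 运行一条轨迹,返回总回报(经典 gym API)。"""
    obs, total = env.reset(), 0.0
    for _ in range(max_steps):
        action = np.clip(theta @ obs, -1.0, 1.0)
        obs, reward, done, _ = env.step(action)
        total += reward
        if done:
            break
    return total

env = gym.make("BipedalWalker-v3")
obs_dim, act_dim = env.observation_space.shape[0], env.action_space.shape[0]
theta = np.zeros((act_dim, obs_dim))        # 线性策略参数

n_dirs, step_size, noise = 8, 0.02, 0.03    # 假设的超参数
for it in range(100):
    deltas = [np.random.randn(*theta.shape) for _ in range(n_dirs)]
    # 对每个扰动方向分别评估 +delta 与 -delta 两条轨迹
    r_plus  = np.array([rollout(env, theta + noise * d) for d in deltas])
    r_minus = np.array([rollout(env, theta - noise * d) for d in deltas])
    # 基础版 ARS 更新(未做状态归一化与 top-b 方向筛选)
    sigma_r = np.concatenate([r_plus, r_minus]).std() + 1e-8
    grad = sum((rp - rm) * d for rp, rm, d in zip(r_plus, r_minus, deltas))
    theta += step_size / (n_dirs * sigma_r) * grad
```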

元学习(1篇)

【1】 Meta-CPR: Generalize to Unseen Large Number of Agents with Communication Pattern Recognition Module 标题:Meta-CPR:用通信模式识别模块推广到看不见的大量Agent 链接:https://arxiv.org/abs/2112.07222

作者:Wei-Cheng Tseng,Wei Wei,Da-Chen Juan,Min Sun 机构: National Tsing Hua University, Google AI Research, Appier Inc., Taiwan 摘要:在强化学习中,设计一种有效的agent之间的通信机制一直是一项具有挑战性的任务,特别是在现实应用中。代理的数量可能会增加,或者环境有时需要与真实场景中不断变化的代理数量进行交互。为此,多代理框架需要在规模和动态方面处理代理的各种场景,以便在实际应用中实用。我们将具有不同数量代理的多代理环境描述为一个多任务问题,并提出了一个元强化学习(meta-RL)框架来解决这个问题。该框架采用元学习通信模式识别(CPR)模块来识别通信行为,并提取有助于训练过程的信息。实验结果表明,所提出的框架(a)可以推广到未知的更大数量的代理,并且(b)允许代理的数量在不同的事件之间变化。消融研究也被提供来解释提议的CPR设计,并表明这种设计是有效的。 摘要:Designing an effective communication mechanism among agents in reinforcement learning has been a challenging task, especially for real-world applications. The number of agents can grow or an environment sometimes needs to interact with a changing number of agents in real-world scenarios. To this end, a multi-agent framework needs to handle various scenarios of agents, in terms of both scales and dynamics, for being practical to real-world applications. We formulate the multi-agent environment with a different number of agents as a multi-tasking problem and propose a meta reinforcement learning (meta-RL) framework to tackle this problem. The proposed framework employs a meta-learned Communication Pattern Recognition (CPR) module to identify communication behavior and extract information that facilitates the training process. Experimental results are poised to demonstrate that the proposed framework (a) generalizes to an unseen larger number of agents and (b) allows the number of agents to change between episodes. The ablation study is also provided to reason the proposed CPR design and show such design is effective.

医学相关(4篇)

【1】 Automatic COVID-19 disease diagnosis using 1D convolutional neural network and augmentation with human respiratory sound based on parameters: cough, breath, and voice 标题:基于咳嗽、呼吸和声音参数的一维卷积神经网络和人体呼吸音增强的冠状病毒病自动诊断 链接:https://arxiv.org/abs/2112.07285

作者:Kranthi Kumar Lella,Alphonse Pja 机构:Department of Computer Applications, NIT Tiruchirappalli, Tamil Nadu, India 备注:None 摘要:在过去一年中,呼吸音分类问题受到了临床科学家和医学研究者的广泛关注,用于诊断COVID-19疾病。迄今为止,已有多种人工智能(AI)模型被投入实际应用,从语音、咳嗽和呼吸等人体声音中检测COVID-19。卷积神经网络(CNN)模型被广泛用于基于AI在机器上解决大量实际问题。在此背景下,本文提出并实现了一维(1D)CNN,用于从语音、咳嗽和呼吸等人体呼吸声中诊断COVID-19呼吸系统疾病。我们采用基于数据增强的机制来改进COVID-19声音数据集的预处理效果,并利用一维卷积网络实现COVID-19疾病诊断的自动化。此外,我们使用DDAE(数据去噪自动编码器)技术生成深度声音特征作为1D CNN的输入,以替代标准的MFCC(Mel频率倒谱系数)输入,其精度和性能均优于以往模型。 摘要:The issue in respiratory sound classification has attained good attention from the clinical scientists and medical researcher's group in the last year to diagnosing COVID-19 disease. To date, various models of Artificial Intelligence (AI) entered into the real-world to detect the COVID-19 disease from human-generated sounds such as voice/speech, cough, and breath. The Convolutional Neural Network (CNN) model is implemented for solving a lot of real-world problems on machines based on Artificial Intelligence (AI). In this context, one dimension (1D) CNN is suggested and implemented to diagnose respiratory diseases of COVID-19 from human respiratory sounds such as a voice, cough, and breath. An augmentation-based mechanism is applied to improve the preprocessing performance of the COVID-19 sounds dataset and to automate COVID-19 disease diagnosis using the 1D convolutional network. Furthermore, a DDAE (Data De-noising Auto Encoder) technique is used to generate deep sound features such as the input function to the 1D CNN instead of adopting the standard input of MFCC (Mel-frequency cepstral coefficient), and it is performed better accuracy and performance than previous models.
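下面给出一个最小的 1D CNN 结构草图(Keras),仅示意摘要所述"以深度声音特征为输入的一维卷积网络"这类模型的大致形态;输入长度、通道数、卷积层配置和二分类输出均为假设,并非论文的原始网络。

```python
import tensorflow as tf
from tensorflow.keras import layers

# 假设输入为长度 431、单通道的声音特征序列,输出为 COVID-19 阳性/阴性两类
model = tf.keras.Sequential([
    layers.Input(shape=(431, 1)),
    layers.Conv1D(32, kernel_size=5, activation="relu"),
    layers.MaxPooling1D(2),
    layers.Conv1D(64, kernel_size=5, activation="relu"),
    layers.MaxPooling1D(2),
    layers.GlobalAveragePooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```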

【2】 Classification of histopathology images using ConvNets to detect Lupus Nephritis 标题:基于ConvNets的狼疮性肾炎组织病理学图像分类 链接:https://arxiv.org/abs/2112.07555

作者:Akash Gupta,Anirudh Reddy,CV Jawahar,PK Vinod 机构:New York University, IIIT Hyderabad 备注:Accepted in the 2021 Medical Imaging meets NeurIPS Workshop 摘要:系统性红斑狼疮(SLE)是一种自身免疫性疾病,患者的免疫系统开始攻击身体的健康组织。狼疮性肾炎(LN)是指肾组织的炎症,由于这些攻击而导致肾功能衰竭。国际肾病学会/肾脏病理学会(ISN/RPS)发布了一个基于SLE肾损伤过程中观察到的各种模式的分类系统。传统的方法需要对肾活检进行细致的病理评估,而且耗时。最近,计算技术通过使用虚拟显微镜或全玻片成像(WSI)帮助缓解了这个问题。通过使用深度学习和现代计算机视觉技术,我们提出了一个管道,该管道能够自动完成以下过程:1)检测这些完整幻灯片图像中的各种肾小球模式;2)使用提取的肾小球特征对每个图像进行分类。 摘要:Systemic lupus erythematosus (SLE) is an autoimmune disease in which the immune system of the patient starts attacking healthy tissues of the body. Lupus Nephritis (LN) refers to the inflammation of kidney tissues resulting in renal failure due to these attacks. The International Society of Nephrology/Renal Pathology Society (ISN/RPS) has released a classification system based on various patterns observed during renal injury in SLE. Traditional methods require meticulous pathological assessment of the renal biopsy and are time-consuming. Recently, computational techniques have helped to alleviate this issue by using virtual microscopy or Whole Slide Imaging (WSI). With the use of deep learning and modern computer vision techniques, we propose a pipeline that is able to automate the process of 1) detection of various glomeruli patterns present in these whole slide images and 2) classification of each image using the extracted glomeruli features.

【3】 Improving COVID-19 CXR Detection with Synthetic Data Augmentation 标题:利用合成数据增强改进冠状病毒CXR检测 链接:https://arxiv.org/abs/2112.07529

作者:Daniel Schaudt,Christopher Kloth,Christian Spaete,Andreas Hinteregger,Meinrad Beer,Reinhold von Schwerin 机构: Technische Hochschule Ulm - Ulm University of Applied Sciences, Universitätsklinikum Ulm - Ulm University Medical Center 备注:This paper has been accepted at the Upper-Rhine Artificial Intelligence Symposium 2021 arXiv:2112.05657 摘要:自COVID-19大流行开始以来,研究人员已经开发了多种深度学习模型来对COVID-19诱发的肺炎进行分类。与许多医学成像任务一样,可用数据的质量和数量往往有限。在这项工作中,我们在公开可用的COVID-19影像数据上训练深度学习模型,并在本地医院的胸部X射线数据上对其进行评估。这些数据由两名放射科医生审查并标注,以确保对模型泛化能力进行高质量的估计。此外,我们使用生成对抗网络基于这些数据生成合成X射线图像。结果表明,使用这些合成图像进行数据增强可以显著提升模型性能。对于许多数据稀疏的领域,这是一种很有前景的方法。 摘要:Since the beginning of the COVID-19 pandemic, researchers have developed deep learning models to classify COVID-19 induced pneumonia. As with many medical imaging tasks, the quality and quantity of the available data is often limited. In this work we train a deep learning model on publicly available COVID-19 image data and evaluate the model on local hospital chest X-ray data. The data has been reviewed and labeled by two radiologists to ensure a high quality estimation of the generalization capabilities of the model. Furthermore, we are using a Generative Adversarial Network to generate synthetic X-ray images based on this data. Our results show that using those synthetic images for data augmentation can improve the model's performance significantly. This can be a promising approach for many sparse data domains.

【4】 COVID-19 Pneumonia and Influenza Pneumonia Detection Using Convolutional Neural Networks 标题:基于卷积神经网络的冠状病毒肺炎和流感肺炎检测 链接:https://arxiv.org/abs/2112.07102

作者:Julianna Antonchuk,Benjamin Prescott,Philip Melanchthon,Robin Singh 机构:Northwestern University 备注:for associated Azure ML notebook code, see this https URL 摘要:在这项研究中,我们开发了一种计算机视觉解决方案来辅助诊断放射学,用于区分COVID-19肺炎、流感病毒肺炎和正常生物标志物。COVID-19肺炎的胸片表现被认为是非特异性的,这给确定最优的卷积神经网络(CNN)结构带来了挑战——该结构需要在COVID-19与非COVID-19型肺炎的肺部炎症特征之间进行高灵敏度的分类。Rahman(2021)指出,COVID-19影像数据存在可获得性和质量问题,影响诊断过程并降低深度学习检测模型的准确性。COVID-19影像数据的显著匮乏导致了数据不平衡,这促使我们使用过采样技术。本研究纳入了一套涵盖COVID-19肺炎、流感病毒肺炎和正常生物标志物的大规模人体肺部X射线影像(CXR),以获得可扩展且准确的CNN模型。在实验阶段,我们评估了多种卷积网络结构,最终选择了包含两个传统卷积层和两个最大池化层的顺序卷积网络。在分类性能上,表现最好的模型取得了93%的验证准确率和0.95的F1得分。我们选择Azure机器学习服务来进行网络实验和解决方案部署,其自动伸缩的计算集群显著缩短了网络训练时间。我们期待人工智能与人类生物学领域的科学家开展合作并扩展所提出的方案,以提供快速、全面的诊断,从而有效减缓病毒的传播。 摘要:In the research, we developed a computer vision solution to support diagnostic radiology in differentiating between COVID-19 pneumonia, influenza virus pneumonia, and normal biomarkers. The chest radiograph appearance of COVID-19 pneumonia is thought to be nonspecific, having presented a challenge to identify an optimal architecture of a convolutional neural network (CNN) that would classify with a high sensitivity among the pulmonary inflammation features of COVID-19 and non-COVID-19 types of pneumonia. Rahman (2021) states that COVID-19 radiography images observe unavailability and quality issues impacting the diagnostic process and affecting the accuracy of the deep learning detection models. A significant scarcity of COVID-19 radiography images introduced an imbalance in data motivating us to use over-sampling techniques. In the study, we include an extensive set of X-ray imaging of human lungs (CXR) with COVID-19 pneumonia, influenza virus pneumonia, and normal biomarkers to achieve an extensible and accurate CNN model. In the experimentation phase of the research, we evaluated a variety of convolutional network architectures, selecting a sequential convolutional network with two traditional convolutional layers and two pooling layers with maximum function. In its classification performance, the best performing model demonstrated a validation accuracy of 93% and an F1 score of 0.95. We chose the Azure Machine Learning service to perform network experimentation and solution deployment. The auto-scaling compute clusters offered a significant time reduction in network training. We would like to see scientists across fields of artificial intelligence and human biology collaborating and expanding on the proposed solution to provide rapid and comprehensive diagnostics, effectively mitigating the spread of the virus
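摘要提到最终选择了"两个传统卷积层和两个最大池化层"的顺序卷积网络。下面是按这一描述写出的示意性 Keras 草图;卷积核数量、输入尺寸等细节为假设,不代表论文的确切配置。

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(224, 224, 1)),            # 假设的灰度 CXR 输入尺寸
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(3, activation="softmax"),        # COVID-19 肺炎 / 流感肺炎 / 正常
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```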

蒸馏|知识提取(1篇)

【1】 Robust Information Retrieval for False Claims with Distracting Entities In Fact Extraction and Verification 标题:事实提取与验证中分散实体的虚假声明鲁棒信息检索 链接:https://arxiv.org/abs/2112.07618

作者:Mingwen Dong,Christos Christodoulopoulos,Sheng-Min Shih,Xiaofei Ma 摘要:准确的证据检索对于自动事实检查至关重要。以前很少有研究关注真实和虚假陈述之间的差异以及它们如何影响证据检索。本文表明,与真实声明相比,虚假声明更频繁地包含无关实体,这会分散证据检索模型的注意力。基于BERT的检索模型在检索虚假声明的反驳证据时比检索真实声明的支持证据时犯的错误更多。当使用包含无关实体的对抗性虚假声明(合成生成)进行测试时,检索模型的召回率明显低于原始声明。这些结果表明,基于香草BERT的检索模型对虚假声明中的无关实体不具有鲁棒性。通过使用包含不相关实体的合成虚假声明扩充训练数据,训练后的模型实现了更高的证据召回率,包括包含不相关实体的虚假声明的证据召回率。此外,使用单独的模型检索反驳和支持证据,然后将其聚合也可以提高证据召回率,包括不相关实体的虚假声明。这些结果表明,我们可以通过数据扩充和模型集成来提高基于BERT的检索模型对具有无关实体的虚假声明的鲁棒性。 摘要:Accurate evidence retrieval is essential for automated fact checking. Little previous research has focused on the differences between true and false claims and how they affect evidence retrieval. This paper shows that, compared with true claims, false claims more frequently contain irrelevant entities which can distract evidence retrieval model. A BERT-based retrieval model made more mistakes in retrieving refuting evidence for false claims than supporting evidence for true claims. When tested with adversarial false claims (synthetically generated) containing irrelevant entities, the recall of the retrieval model is significantly lower than that for original claims. These results suggest that the vanilla BERT-based retrieval model is not robust to irrelevant entities in the false claims. By augmenting the training data with synthetic false claims containing irrelevant entities, the trained model achieved higher evidence recall, including that of false claims with irrelevant entities. In addition, using separate models to retrieve refuting and supporting evidence and then aggregating them can also increase the evidence recall, including that of false claims with irrelevant entities. These results suggest that we can increase the BERT-based retrieval model's robustness to false claims with irrelevant entities via data augmentation and model ensemble.

推荐(3篇)

【1】 Re-ranking With Constraints on Diversified Exposures for Homepage Recommender System 标题:网页推荐系统中具有多样化曝光率约束的重排序 链接:https://arxiv.org/abs/2112.07621

作者:Qi Hao,Tianze Luo,Guangda Huzhang 机构:Alibaba Group,Hangzhou,China,Nanyang Technological University,Singapore 备注:8pages,7figures 摘要:大多数电子商务应用程序的主页推荐以分层方式放置项目,不同的渠道以不同的样式显示项目。现有算法通常优化单个信道的性能。因此,设计该模型以获得最大化整个主页点击率(CTR)的最佳推荐列表是一个具有挑战性的问题。除了准确性目标之外,主页上的显示多样性也很重要,因为同质显示通常会损害用户体验。在本文中,我们提出了一个两阶段的主页推荐系统架构。在第一阶段,我们开发了有效的算法,在保持多样性的同时将项目推荐到适当的渠道。这两种方法可以结合使用:具有多样性约束的用户渠道项目预测模型。在第二阶段,我们提供每个通道中项目的有序列表。现有的重新排序模型难以描述渠道内和渠道间项目之间的相互影响。因此,我们提出了一种用于网页推荐系统的深层层次注意网络重排序(DHANR)模型。分层注意网络由项目编码器、项目级注意层、通道编码器和通道级注意层组成。我们的方法在精度、列表内平均距离(ILAD)和信道方面都有显著的改进Precision@k在离线实验和在线系统中的CTR和ILAD方面。 摘要:The homepage recommendation on most E-commerce applications places items in a hierarchical manner, where different channels display items in different styles. Existing algorithms usually optimize the performance of a single channel. So designing the model to achieve the optimal recommendation list which maximize the Click-Through Rate (CTR) of whole homepage is a challenge problem. Other than the accuracy objective, display diversity on the homepage is also important since homogeneous display usually hurts user experience. In this paper, we propose a two-stage architecture of the homepage recommendation system. In the first stage, we develop efficient algorithms for recommending items to proper channels while maintaining diversity. The two methods can be combined: user-channel-item predictive model with diversity constraint. In the second stage, we provide an ordered list of items in each channel. Existing re-ranking models are hard to describe the mutual influence between items in both intra-channel and inter-channel. Therefore, we propose a Deep & Hierarchical Attention Network Re-ranking (DHANR) model for homepage recommender systems. The Hierarchical Attention Network consists of an item encoder, an item-level attention layer, a channel encoder and a channel-level attention layer. Our method achieves a significant improvement in terms of precision, intra-list average distance(ILAD) and channel-wise Precision@k in offline experiments and in terms of CTR and ILAD in our online systems.

【2】 A cross-domain recommender system using deep coupled autoencoders 标题:基于深度耦合自动编码器的跨域推荐系统 链接:https://arxiv.org/abs/2112.07617

作者:Alexandros Gkillas,Dimitrios Kosmopoulos 机构: Gkillas is with Graduate department of Electrical and Computer Engineer-ing, Patras University, Kosmopoulos is with the Faculty of Department of Computer Engineeringand Informatics 备注:This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible 摘要:长期存在的数据稀疏性和冷启动对推荐系统来说是一个棘手的问题。跨域推荐作为一种域适应框架,通过利用来自多个域的信息,有效地解决了这些具有挑战性的问题。本研究探索了一个项目级关联跨域推荐任务,其中两个相关域,即源域和目标域包含公共项目,而不共享有关用户行为的敏感信息,从而避免了用户隐私泄露。针对这种情况,提出了两种新的基于耦合自编码器的深度学习方法用于跨域推荐。第一种方法旨在同时学习一对自动编码器,以揭示源域和目标域中项目的内在表示,以及耦合映射函数,以建模这些表示之间的非线性关系,从而将有益的信息从源领域转移到目标领域。第二种方法是基于一个新的联合正则化优化问题导出的,该问题使用两个自动编码器以深度和非线性方式生成用户和项目潜在因素,同时学习数据驱动函数以跨域映射项目潜在因素。在两个公开的基准数据集上进行了大量的数值实验,说明了我们提出的方法与几种最先进的跨领域推荐框架相比的优越性能。 摘要:Long-standing data sparsity and cold-start constitute thorny and perplexing problems for the recommendation systems. Cross-domain recommendation as a domain adaptation framework has been utilized to efficiently address these challenging issues, by exploiting information from multiple domains. In this study, an item-level relevance cross-domain recommendation task is explored, where two related domains, that is, the source and the target domain contain common items without sharing sensitive information regarding the users' behavior, and thus avoiding the leak of user privacy. In light of this scenario, two novel coupled autoencoder-based deep learning methods are proposed for cross-domain recommendation. The first method aims to simultaneously learn a pair of autoencoders in order to reveal the intrinsic representations of the items in the source and target domains, along with a coupled mapping function to model the non-linear relationships between these representations, thus transferring beneficial information from the source to the target domain. The second method is derived based on a new joint regularized optimization problem, which employs two autoencoders to generate in a deep and non-linear manner the user and item-latent factors, while at the same time a data-driven function is learnt to map the item-latent factors across domains. Extensive numerical experiments on two publicly available benchmark datasets are conducted illustrating the superior performance of our proposed methods compared to several state-of-the-art cross-domain recommendation frameworks.
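下面用 PyTorch 给出第一种方法思路的最小草图:两个自动编码器分别学习源域与目标域公共物品的隐表示,另有一个非线性映射把源域隐表示映到目标域隐空间,三部分损失联合训练。网络结构、维度与损失权重均为假设,仅示意论文描述的耦合思想,并非原文实现。

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self, dim, hid=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, hid), nn.ReLU())
        self.dec = nn.Linear(hid, dim)
    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)

d_src, d_tgt = 200, 150                     # 假设的源/目标域物品特征维度
ae_s, ae_t = AE(d_src), AE(d_tgt)
mapper = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))  # 非线性耦合映射
opt = torch.optim.Adam(
    list(ae_s.parameters()) + list(ae_t.parameters()) + list(mapper.parameters()), lr=1e-3)
mse = nn.MSELoss()

def train_step(x_s, x_t, lam=1.0):
    """x_s、x_t 为同一批公共物品在源域与目标域的特征。"""
    z_s, rec_s = ae_s(x_s)
    z_t, rec_t = ae_t(x_t)
    # 两项重构损失 + 耦合映射损失,将有用信息从源域迁移到目标域
    loss = mse(rec_s, x_s) + mse(rec_t, x_t) + lam * mse(mapper(z_s), z_t)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```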

【3】 DiPS: Differentiable Policy for Sketching in Recommender Systems 标题:DIPS:推荐系统中草图的差异化策略 链接:https://arxiv.org/abs/2112.07616

作者:Aritra Ghosh,Saayan Mitra,Andrew Lan 机构:University of Massachusetts Amherst, Adobe Research 备注:AAAI 2022 with supplementary material 摘要:在顺序推荐系统应用中,重要的是开发能够捕获用户随时间演变的兴趣的模型,以便成功推荐他们将来可能与之交互的项目。对于历史记录很长的用户,基于递归神经网络的典型模型往往会忘记遥远过去的重要项目。最近的工作表明,存储过去项目的小型草图(sketch)可以改进顺序推荐任务。然而,这些工作都依赖于静态草图策略,即用启发式方法选择要保留在草图中的项目,这不一定是最优的,也无法随着训练数据的增多而不断改进。在本文中,我们提出了可微草图策略(DiPS),该框架与推荐系统模型一起,以端到端的方式学习数据驱动的草图策略,从而显式地最大化未来的推荐质量。我们还提出了一种计算高效的梯度近似估计,用于优化草图算法的参数。我们在各种实际设置下于真实世界数据集上验证了DiPS的有效性,并表明与现有草图策略相比,它最多可减少50%的草图项即可达到相同的预测质量。 摘要:In sequential recommender system applications, it is important to develop models that can capture users' evolving interest over time to successfully recommend future items that they are likely to interact with. For users with long histories, typical models based on recurrent neural networks tend to forget important items in the distant past. Recent works have shown that storing a small sketch of past items can improve sequential recommendation tasks. However, these works all rely on static sketching policies, i.e., heuristics to select items to keep in the sketch, which are not necessarily optimal and cannot improve over time with more training data. In this paper, we propose a differentiable policy for sketching (DiPS), a framework that learns a data-driven sketching policy in an end-to-end manner together with the recommender system model to explicitly maximize recommendation quality in the future. We also propose an approximate estimator of the gradient for optimizing the sketching algorithm parameters that is computationally efficient. We verify the effectiveness of DiPS on real-world datasets under various practical settings and show that it requires up to $50\%$ fewer sketch items to reach the same predictive quality than existing sketching policies.

点云|SLAM|雷达|激光|深度RGBD相关(1篇)

【1】 Joint 3D Object Detection and Tracking Using Spatio-Temporal Representation of Camera Image and LiDAR Point Clouds 标题:基于摄像机图像和激光雷达点云时空表示的联合三维目标检测与跟踪 链接:https://arxiv.org/abs/2112.07116

作者:Junho Koh,Jaekyum Kim,Jinhyuk Yoo,Yecheol Kim,Jun Won Choi 机构:Hanyang University, Korea Advanced Institute of Science and Technology (KAIST) 摘要:在本文中,我们提出了一种新的联合目标检测和跟踪(JoDT)框架,用于基于相机和激光雷达传感器的三维目标检测和跟踪。所提出的方法称为3D DetectTrack,使探测器和跟踪器能够协作生成相机和激光雷达数据的时空表示,然后使用该表示来执行3D对象检测和跟踪。探测器通过相机和激光雷达融合获得的空间特征的加权时间聚集来构造时空特征。然后,检测器使用来自保持到上一时间步的轨迹的信息重新配置初始检测结果。基于检测器生成的时空特征,跟踪器使用图形神经网络(GNN)将检测到的对象与先前跟踪的对象相关联。我们通过基于规则的边缘修剪和基于注意的边缘选通相结合,设计了一个完全连通的GNN,它利用空间和时间对象上下文来提高跟踪性能。在KITTI和nuScenes基准上进行的实验表明,与基线方法相比,3D DetecTrack在检测和跟踪性能方面取得了显著的改进,并且通过检测器和跟踪器之间的协作,在现有方法中实现了最先进的性能。 摘要:In this paper, we propose a new joint object detection and tracking (JoDT) framework for 3D object detection and tracking based on camera and LiDAR sensors. The proposed method, referred to as 3D DetecTrack, enables the detector and tracker to cooperate to generate a spatio-temporal representation of the camera and LiDAR data, with which 3D object detection and tracking are then performed. The detector constructs the spatio-temporal features via the weighted temporal aggregation of the spatial features obtained by the camera and LiDAR fusion. Then, the detector reconfigures the initial detection results using information from the tracklets maintained up to the previous time step. Based on the spatio-temporal features generated by the detector, the tracker associates the detected objects with previously tracked objects using a graph neural network (GNN). We devise a fully-connected GNN facilitated by a combination of rule-based edge pruning and attention-based edge gating, which exploits both spatial and temporal object contexts to improve tracking performance. The experiments conducted on both KITTI and nuScenes benchmarks demonstrate that the proposed 3D DetecTrack achieves significant improvements in both detection and tracking performances over baseline methods and achieves state-of-the-art performance among existing methods through collaboration between the detector and tracker.

推理|分析|理解|解释(4篇)

【1】 Branching Time Active Inference with Bayesian Filtering 标题:基于贝叶斯滤波的分支时间主动推理 链接:https://arxiv.org/abs/2112.07406

作者:Théophile Champion,Marek Grześ,Howard Bowman 机构:University of Kent, School of Computing, Canterbury CT,NZ, United Kingdom, University of Birmingham, School of Psychology, Birmingham B,TT, United Kingdom, Editor: TO BE FILLED 备注:16 pages, 2 figures, 2 tables. arXiv admin note: text overlap with arXiv:2111.11276 摘要:分支时间主动推理(Champion et al.,2021b,a)是一个框架,旨在将规划视为贝叶斯模型扩展的一种形式。其根源可以在积极推理(Friston等人,2016年;Da Costa等人,2020年;Champion等人,2021c)中找到,这是一种广泛用于大脑建模的神经科学框架,也可以在Monte Carlo树搜索中找到(Browne等人,2012年),这是一种在强化学习文献中广泛应用的方法。到目前为止,通过利用变分信息传递(Winn和Bishop,2005)所提供的灵活性来推断潜在变量,这是一个迭代过程,可以理解为沿因子图的边缘发送信息(Forney,2001)。在本文中,我们利用了一种称为贝叶斯滤波(Fox et al.,2003)的替代推理方法的效率,该方法在变分自由能收敛之前不需要迭代更新方程。相反,该方案在两个阶段之间交替进行:证据集成和未来状态预测。这两个阶段都可以有效地执行,速度比最新技术提高了70倍。 摘要:Branching Time Active Inference (Champion et al., 2021b,a) is a framework proposing to look at planning as a form of Bayesian model expansion. Its root can be found in Active Inference (Friston et al., 2016; Da Costa et al., 2020; Champion et al., 2021c), a neuroscientific framework widely used for brain modelling, as well as in Monte Carlo Tree Search (Browne et al., 2012), a method broadly applied in the Reinforcement Learning literature. Up to now, the inference of the latent variables was carried out by taking advantage of the flexibility offered by Variational Message Passing (Winn and Bishop, 2005), an iterative process that can be understood as sending messages along the edges of a factor graph (Forney, 2001). In this paper, we harness the efficiency of an alternative method for inference called Bayesian Filtering (Fox et al., 2003), which does not require the iteration of the update equations until convergence of the Variational Free Energy. Instead, this scheme alternates between two phases: integration of evidence and prediction of future states. Both of those phases can be performed efficiently and this provides a seventy times speed up over the state-of-the-art.
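下面用 numpy 给出离散隐状态下贝叶斯滤波"预测—证据整合"两阶段交替的最小草图,用以直观说明摘要所指的这种非迭代式更新方式;转移矩阵与观测似然均为假设的示例值,并非论文中分支时间主动推理的具体模型。

```python
import numpy as np

T = np.array([[0.9, 0.1],      # 假设的状态转移矩阵 p(s_t | s_{t-1})
              [0.2, 0.8]])
belief = np.array([0.5, 0.5])  # 初始信念

def predict(belief, T):
    """预测阶段:按转移模型前推信念。"""
    return T.T @ belief

def update(belief, likelihood):
    """证据整合阶段:乘以观测似然 p(o_t | s_t) 并归一化。"""
    post = belief * likelihood
    return post / post.sum()

# 两个时间步的假设观测似然;两阶段交替执行,无需迭代至自由能收敛
for likelihood in [np.array([0.7, 0.2]), np.array([0.1, 0.6])]:
    belief = predict(belief, T)
    belief = update(belief, likelihood)
    print(belief)
```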

【2】 Survey of Generative Methods for Social Media Analysis 标题:社交媒体分析生成方法综述 链接:https://arxiv.org/abs/2112.07041

作者:Stan Matwin,Aristides Milios,Paweł Prałat,Amilcar Soares,François Théberge 机构:Paweł Prałat, François Théberge, ∗We acknowledge the support of the Communications Security Establishment and Defence Research and Development Canada. 摘要:这项综述为社交媒体数据分析中的生成方法研究勾勒出了一幅宽泛、全景式的最新研究进展(SoTA)图景。它填补了一个空白,因为现有的综述文章要么范围狭窄得多,要么年代久远。我们纳入了目前在社交媒体挖掘与建模中日益重要的两个方面:动态与网络。社会动态对于理解影响或疾病的传播、友谊的形成、团队的生产力等非常重要;另一方面,网络可以刻画各种复杂关系,提供额外的洞察,并识别出原本可能被忽视的重要模式。 摘要:This survey draws a broad-stroke, panoramic picture of the State of the Art (SoTA) of the research in generative methods for the analysis of social media data. It fills a void, as the existing survey articles are either much narrower in their scope or are dated. We included two important aspects that currently gain importance in mining and modeling social media: dynamics and networks. Social dynamics are important for understanding the spreading of influence or diseases, formation of friendships, the productivity of teams, etc. Networks, on the other hand, may capture various complex relationships providing additional insight and identifying important patterns that would otherwise go unnoticed.

【3】 Automated Customization of On-Thing Inference for Quality-of-Experience Enhancement 标题:自动定制物联网推理以提高体验质量 链接:https://arxiv.org/abs/2112.06918

作者:Yang Bai,Lixing Chen,Shaolei Ren,Jie Xu 机构: Chen is with the Institute of Cyber Science and Technology, ShanghaiJiao Tong University, and Shanghai Key Laboratory of Integrated Admin-istration Technologies for Information Security 摘要:智能应用的迅速普及正在推动深度学习(DL)能力向物联网(IoT)发展。尽管出现了将深度神经网络(DNN)嵌入物联网设备的新工具,但由于DNN架构、物联网设备和用户偏好的异质性,为用户提供满意的体验质量(QoE)仍然具有挑战性。本文研究物联网设备上DL推理的自动化定制(称为物上推理),我们的目标是通过为不同使用场景下的用户配置合适的DNN来提高用户的QoE。该方法的核心是一个DNN选择模块,该模块动态学习用户QoE模式,并利用所学知识确定最适合于事物推理的DNN。它利用了一种新的在线学习算法NeuralUCB,该算法具有处理各种用户QoE模式的出色泛化能力。我们还将知识转移技术嵌入到NeuralUCB中,以加快学习过程。然而,NeuralUCB经常向用户征求QoE评级,这带来了不可忽视的不便。为了解决这个问题,我们设计了反馈请求方案,以减少QoE请求的数量,同时保持NeuralUCB的学习效率。为了提高框架的实用性,我们进一步研究了一个实用问题,即聚合QoE。我们对合成数据和真实数据进行实验。结果表明,我们的方法能够有效地学习用户的QoE模式,并且能够在很少的请求下为物联网设备提供显著的QoE增强。 摘要:The rapid uptake of intelligent applications is pushing deep learning (DL) capabilities to Internet-of-Things (IoT). Despite the emergence of new tools for embedding deep neural networks (DNNs) into IoT devices, providing satisfactory Quality of Experience (QoE) to users is still challenging due to the heterogeneity in DNN architectures, IoT devices, and user preferences. This paper studies automated customization for DL inference on IoT devices (termed as on-thing inference), and our goal is to enhance user QoE by configuring the on-thing inference with an appropriate DNN for users under different usage scenarios. The core of our method is a DNN selection module that learns user QoE patterns on-the-fly and identifies the best-fit DNN for on-thing inference with the learned knowledge. It leverages a novel online learning algorithm, NeuralUCB, that has excellent generalization ability for handling various user QoE patterns. We also embed the knowledge transfer technique in NeuralUCB to expedite the learning process. However, NeuralUCB frequently solicits QoE ratings from users, which incurs non-negligible inconvenience. To address this problem, we design feedback solicitation schemes to reduce the number of QoE solicitations while maintaining the learning efficiency of NeuralUCB. A pragmatic problem, aggregated QoE, is further investigated to improve the practicality of our framework. We conduct experiments on both synthetic and real-world data. The results indicate that our method efficiently learns the user QoE pattern with few solicitations and provides drastic QoE enhancement for IoT devices.

【4】 Boosting Independent Component Analysis 标题:增强独立成分分析 链接:https://arxiv.org/abs/2112.06920

作者:Yunpeng Li,ZhaoHui Ye 机构:Department of Automation, Tsinghua University, Beijing , China 摘要:独立成分分析旨在从线性混合物中尽可能独立地恢复未知成分。该技术已广泛应用于数据分析、信号处理和机器学习等领域。本文提出了一种新的基于boosting的独立分量分析算法。我们的算法通过在极大似然估计中引入boosting,填补了非参数独立分量分析的空白。与目前已知的许多算法相比,各种实验验证了其性能。 摘要:Independent component analysis is intended to recover the unknown components as independent as possible from their linear mixtures. This technique has been widely used in many fields, such as data analysis, signal processing, and machine learning. In this paper, we present a novel boosting-based algorithm for independent component analysis. Our algorithm fills the gap in the nonparametric independent component analysis by introducing boosting to maximum likelihood estimation. A variety of experiments validate its performance compared with many of the presently known algorithms.

检测相关(1篇)

【1】 Out-of-Distribution Detection without Class Labels 标题:无类别标签的分布外检测 链接:https://arxiv.org/abs/2112.07662

作者:Niv Cohen,Ron Abutbul,Yedid Hoshen 机构:School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel 摘要:异常检测方法识别偏离数据集正常行为的样本。它通常用于包含来自多个标记类或单个未标记类的正常数据的训练集。当前的方法在面对由多个类组成但没有标签的训练数据时会遇到困难。在这项工作中,我们首先发现,通过自监督图像聚类方法学习的分类器为未标记的多类数据集的异常检测提供了强大的基线。也许令人惊讶的是,我们发现使用预先训练的特征初始化聚类方法并没有比其自我监督的方法有所改进。这是由于灾难性遗忘的现象。相反,我们建议采用两阶段方法。我们首先使用自监督方法对图像进行聚类,并为每个图像获得一个聚类标签。我们使用集群标签作为分布外(OOD)方法的“伪监督”。具体地说,我们在通过聚类标签对图像进行分类的任务中对预训练的特征进行微调。我们对我们的方法进行了广泛的分析,并论证了我们两阶段方法的必要性。我们根据最先进的自我监督和预训练方法对其进行评估,并证明其具有优异的性能。 摘要:Anomaly detection methods identify samples that deviate from the normal behavior of the dataset. It is typically tackled either for training sets containing normal data from multiple labeled classes or a single unlabeled class. Current methods struggle when faced with training data consisting of multiple classes but no labels. In this work, we first discover that classifiers learned by self-supervised image clustering methods provide a strong baseline for anomaly detection on unlabeled multi-class datasets. Perhaps surprisingly, we find that initializing clustering methods with pre-trained features does not improve over their self-supervised counterparts. This is due to the phenomenon of catastrophic forgetting. Instead, we suggest a two stage approach. We first cluster images using self-supervised methods and obtain a cluster label for every image. We use the cluster labels as "pseudo supervision" for out-of-distribution (OOD) methods. Specifically, we finetune pretrained features on the task of classifying images by their cluster labels. We provide extensive analyses of our method and demonstrate the necessity of our two-stage approach. We evaluate it against the state-of-the-art self-supervised and pretrained methods and demonstrate superior performance.
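下面是该两阶段思路的一个简化 sklearn 草图:先对(假设已抽取的)自监督特征做聚类得到伪标签,再训练一个按伪标签分类的分类器,其最大类别置信度可用作分布外评分。论文实际使用的是自监督图像聚类与预训练深度特征的微调,此处仅为示意,数据与维度均为假设。

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feats_train = rng.normal(size=(1000, 128))   # 假设:无标签训练集的自监督特征
feats_test = rng.normal(size=(200, 128))     # 待评分样本的特征

# 第一阶段:聚类得到每张图像的簇标签(伪标签)
pseudo = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(feats_train)

# 第二阶段:以伪标签为"伪监督"训练分类器
clf = LogisticRegression(max_iter=1000).fit(feats_train, pseudo)

# OOD 评分:最大类别概率越低,越可能是分布外样本
ood_score = 1.0 - clf.predict_proba(feats_test).max(axis=1)
```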

分类|识别(5篇)

【1】 On the Use of External Data for Spoken Named Entity Recognition 标题:浅谈外部数据在口语命名实体识别中的应用 链接:https://arxiv.org/abs/2112.07648

作者:Ankita Pasad,Felix Wu,Suwon Shon,Karen Livescu,Kyu J. Han 机构:ASAPP, Toyota Technological Institute at Chicago 摘要:口语理解(SLU)任务涉及从语音信号到语义标签的映射。考虑到这些任务的复杂性,良好的性能可能需要大量标记的数据集,这些数据集很难为每个新任务和域收集。然而,自监督语音表示的最新进展使得考虑有限的标记数据学习SLU模型是可行的。在这项工作中,我们将重点放在低资源语音命名实体识别(NER)上,并解决以下问题:除了自我监督的预训练外,我们如何使用未为任务注释的外部语音和/或文本数据?我们采用各种方法,包括自我训练,知识蒸馏和转移学习,并考虑其适用于端到端的模型和管道(语音识别,其次是文本模型)的方法。我们发现,其中一些方法在资源受限的环境中提高了性能,而不仅仅是预先训练好的表示。与之前的工作相比,我们发现F1成绩提高了16%。虽然最佳基线模型是管道方法,但使用外部数据时的最佳性能最终是通过端到端模型实现的。我们提供了详细的比较和分析,例如表明端到端模型能够关注更具体的单词。 摘要:Spoken language understanding (SLU) tasks involve mapping from speech audio signals to semantic labels. Given the complexity of such tasks, good performance might be expected to require large labeled datasets, which are difficult to collect for each new task and domain. However, recent advances in self-supervised speech representations have made it feasible to consider learning SLU models with limited labeled data. In this work we focus on low-resource spoken named entity recognition (NER) and address the question: Beyond self-supervised pre-training, how can we use external speech and/or text data that are not annotated for the task? We draw on a variety of approaches, including self-training, knowledge distillation, and transfer learning, and consider their applicability to both end-to-end models and pipeline (speech recognition followed by text NER model) approaches. We find that several of these approaches improve performance in resource-constrained settings beyond the benefits from pre-trained representations alone. Compared to prior work, we find improved F1 scores of up to 16%. While the best baseline model is a pipeline approach, the best performance when using external data is ultimately achieved by an end-to-end model. We provide detailed comparisons and analyses, showing for example that end-to-end models are able to focus on the more NER-specific words.

【2】 Margin Calibration for Long-Tailed Visual Recognition 标题:长尾视觉识别中的边缘校正 链接:https://arxiv.org/abs/2112.07225

作者:Yidong Wang,Bowen Zhang,Wenxin Hou,Zhen Wu,Jindong Wang,Takahiro Shinozaki 机构:Tokyo Institute of Technology, Nanjing University, Microsoft Research Asia 备注:Technical report; 9 pages 摘要:视觉识别任务中的长尾类分布对神经网络如何处理头类和尾类之间的有偏预测提出了巨大挑战,即该模型倾向于将尾类分类为头类。虽然现有的研究主要集中在数据重采样和损失函数工程上,但在本文中,我们采用了不同的视角:分类裕度。我们研究了边际与logits(分类分数)之间的关系,并实证观察了有偏边际和有偏logits之间的正相关关系。我们提出MARC,一个简单而有效的边缘校准函数,用于动态校准无偏Logit的有偏边缘。我们通过对常见的长尾基准测试(包括CIFAR-LT、ImageNet LT、Places LT和iNaturalist-LT)进行广泛的实验来验证MARC。实验结果表明,我们的MARC在这些基准测试上取得了良好的结果。此外,MARC非常容易实现,只需三行代码。我们希望这一简单的方法将激励人们重新思考长尾视觉识别中的偏差边际和偏差逻辑。 摘要:The long-tailed class distribution in visual recognition tasks poses great challenges for neural networks on how to handle the biased predictions between head and tail classes, i.e., the model tends to classify tail classes as head classes. While existing research focused on data resampling and loss function engineering, in this paper, we take a different perspective: the classification margins. We study the relationship between the margins and logits (classification scores) and empirically observe the biased margins and the biased logits are positively correlated. We propose MARC, a simple yet effective MARgin Calibration function to dynamically calibrate the biased margins for unbiased logits. We validate MARC through extensive experiments on common long-tailed benchmarks including CIFAR-LT, ImageNet-LT, Places-LT, and iNaturalist-LT. Experimental results demonstrate that our MARC achieves favorable results on these benchmarks. In addition, MARC is extremely easy to implement with just three lines of code. We hope this simple method will motivate people to rethink the biased margins and biased logits in long-tailed visual recognition.

【3】 Federated Nearest Neighbor Classification with a Colony of Fruit-Flies: With Supplement 标题:基于果蝇群体的联邦最近邻分类(附补充材料) 链接:https://arxiv.org/abs/2112.07157

作者:Parikshit Ram,Kaushik Sinha 机构:IBM Research AI,Wichita State University 备注:An extended version of the original paper with detailed supplementary materials (21 pages, 17 figures) 摘要:最近,有研究将果蝇嗅觉回路中神经机制的数学形式化“重新编程”为局部敏感哈希(Flyhash)和布隆过滤器(FBF),用于相似性搜索、离群点检测和文本嵌入等多种机器学习任务。我们提出了对这一哈希和布隆过滤器的一种新的重新编程,用以在具有挑战性的联邦学习(FL)设置中模拟规范的最近邻分类器(NNC):训练和测试数据分布在各参与方之间,且任何数据都不能离开其所属的参与方。具体来说,我们利用Flyhash和FBF构建FlyNN分类器,并从理论上建立了FlyNN与NNC相匹配的条件。我们展示了FlyNN如何在FL设置中以低通信开销被精确训练得到FlyNNFL,以及它如何满足差分隐私。经验上,我们证明(i)FlyNN在70个OpenML数据集上达到与NNC相当的精度,(ii)FlyNNFL训练具有高度可扩展性和较低的通信开销,在16个参与方的情况下可提供高达8倍的加速。 摘要:The mathematical formalization of a neurological mechanism in the olfactory circuit of a fruit-fly as a locality sensitive hash (Flyhash) and bloom filter (FBF) has been recently proposed and "reprogrammed" for various machine learning tasks such as similarity search, outlier detection and text embeddings. We propose a novel reprogramming of this hash and bloom filter to emulate the canonical nearest neighbor classifier (NNC) in the challenging Federated Learning (FL) setup where training and test data are spread across parties and no data can leave their respective parties. Specifically, we utilize Flyhash and FBF to create the FlyNN classifier, and theoretically establish conditions where FlyNN matches NNC. We show how FlyNN is trained exactly in a FL setup with low communication overhead to produce FlyNNFL, and how it can be differentially private. Empirically, we demonstrate that (i) FlyNN matches NNC accuracy across 70 OpenML datasets, (ii) FlyNNFL training is highly scalable with low communication overhead, providing up to $8\times$ speedup with $16$ parties.
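下面是 Flyhash(稀疏二值随机投影 + 胜者全取)的一个常见 numpy 实现草图,用来说明 FlyNN 所依赖的哈希基元;维度、投影稀疏度与保留的激活数均为假设值,联邦训练与布隆过滤器部分未包含,也不代表论文的原始实现。

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, p, k = 50, 2000, 0.1, 100   # 输入维度、扩展维度、投影稀疏度、保留的激活数(假设值)

# 稀疏二值随机投影矩阵,模拟果蝇嗅觉回路的稀疏随机连接
M = (rng.random((m, d)) < p).astype(np.float32)

def flyhash(x):
    """返回长度 m 的稀疏二值哈希:仅保留响应最大的 k 个分量(胜者全取)。"""
    act = M @ x
    h = np.zeros(m, dtype=np.uint8)
    h[np.argpartition(act, -k)[-k:]] = 1
    return h

x = rng.random(d)
print(flyhash(x).sum())   # 恒为 k
```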

【4】 Robustifying automatic speech recognition by extracting slowly varying features 标题:通过提取缓慢变化的特征来增强自动语音识别的鲁棒性 链接:https://arxiv.org/abs/2112.07400

作者:Matias Pizarro,Dorothea Kolossa,Asja Fischer 机构:Ruhr University Bochum, Germany 摘要:在过去的几年中,已经证明深度学习系统在对抗性攻击下非常脆弱。基于神经网络的自动语音识别(ASR)系统也不例外。有针对性和无针对性的攻击可以修改音频输入信号,使人类仍能识别相同的单词,而ASR系统则被引导预测不同的转录。在本文中,我们提出了一种针对目标对抗性攻击的防御机制,包括在将输入反馈给ASR系统之前,通过应用慢速特征分析、低通滤波器或两者,从音频信号中移除快速变化的特征。我们对以这种方式预处理的数据训练的混合ASR模型进行了实证分析。虽然生成的模型在良性数据上表现得相当好,但它们对目标对手攻击的鲁棒性显著提高:我们最终提出的模型在干净数据上表现出与基线模型类似的性能,同时鲁棒性提高了四倍以上。 摘要:In the past few years, it has been shown that deep learning systems are highly vulnerable under attacks with adversarial examples. Neural-network-based automatic speech recognition (ASR) systems are no exception. Targeted and untargeted attacks can modify an audio input signal in such a way that humans still recognise the same words, while ASR systems are steered to predict a different transcription. In this paper, we propose a defense mechanism against targeted adversarial attacks consisting in removing fast-changing features from the audio signals, either by applying slow feature analysis, a low-pass filter, or both, before feeding the input to the ASR system. We perform an empirical analysis of hybrid ASR models trained on data pre-processed in such a way. While the resulting models perform quite well on benign data, they are significantly more robust against targeted adversarial attacks: Our final, proposed model shows a performance on clean data similar to the baseline model, while being more than four times more robust.
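下面给出文中"低通滤波"这一防御前处理的一个简单 scipy 草图(Butterworth 低通 + 零相位滤波),用于在送入 ASR 系统前去除快速变化的高频成分;截止频率与滤波器阶数为假设值,慢特征分析部分未包含。

```python
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass(signal, sr=16000, cutoff=4000, order=5):
    """对输入波形做零相位低通滤波,滤除快速变化的高频成分。"""
    b, a = butter(order, cutoff / (sr / 2), btype="low")
    return filtfilt(b, a, signal)

sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
# 低频"语音"成分 + 高频扰动(仅为示意信号)
audio = np.sin(2 * np.pi * 300 * t) + 0.3 * np.sin(2 * np.pi * 6000 * t)
clean = lowpass(audio, sr)   # 预处理后再送入 ASR 系统
```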

【5】 Quantum Pattern Recognition in Photonic Circuits 标题:光子电路中的量子模式识别 链接:https://arxiv.org/abs/2107.09961

作者:Rui Wang,Carlos Hernani-Morales,José D. Martín-Guerrero,Enrique Solano,Francisco Albarrán-Arriagada 机构:Albarr´an-Arriagada, International Center of Quantum Artificial Intelligence for Science and Technology, (QuArtist) and Department of Physics, Shanghai University, Shanghai, China, IDAL, Electronic Engineering Department, University of Valencia, Avgda. 备注:None 摘要:本文提出了一种机器学习方法,通过一个简单的光学电路和光子数分布的数据处理(如光子图案)来表征光子态。输入态由两个用作参考的相干态和一个待研究的双模未知态组成。我们成功地训练了监督学习算法,可以预测双模态的纠缠度,并对一个光子模式进行完整的层析成像,在考虑的回归度量中获得了令人满意的值。 摘要:This paper proposes a machine learning method to characterize photonic states via a simple optical circuit and data processing of photon number distributions, such as photonic patterns. The input states consist of two coherent states used as references and a two-mode unknown state to be studied. We successfully trained supervised learning algorithms that can predict the degree of entanglement in the two-mode state as well as perform the full tomography of one photonic mode, obtaining satisfactory values in the considered regression metrics.

3D|3D重建等相关(1篇)

【1】 Learning Body-Aware 3D Shape Generative Models 标题:学习体感三维形状生成模型 链接:https://arxiv.org/abs/2112.07022

作者:Bryce Blinn,Alexander Ding,Daniel Ritchie,R. Kenny Jones,Srinath Sridhar,Manolis Savva 机构:Brown University, Providence, RI , Simon Fraser University, Burnaby, BC V,A ,S 备注:11 pages, 8 figures 摘要:建筑环境中许多物体的形状是由它们与人体的关系决定的:一个人将如何与这个物体互动?现有的数据驱动的三维形状生成模型生成看似合理的对象,但无法解释这些对象与人体的关系。在本文中,我们学习三维形状的身体感知生成模型。具体来说,我们训练椅子的生成模型,这是一个普遍存在的形状类别,可以根据给定的身体形状或坐姿进行调节。体形调节模型生产的椅子对于具有给定体形的人来说是舒适的;姿势调节模型生成可容纳给定坐姿的椅子。为了训练这些模型,我们定义了一个“坐姿匹配”度量和一个新的“坐姿舒适度”度量。计算这些指标需要一个昂贵的优化,让身体坐在椅子上,这太慢了,无法用作训练生成模型的损失函数。因此,我们训练神经网络来有效地逼近这些度量。我们使用我们的方法来训练三个身体感知的生成形状模型:基于结构化零件的生成器、点云生成器和隐式曲面生成器。在所有情况下,我们的方法都会生成模型,使其输出座椅形状适应输入人体规格。 摘要:The shape of many objects in the built environment is dictated by their relationships to the human body: how will a person interact with this object? Existing data-driven generative models of 3D shapes produce plausible objects but do not reason about the relationship of those objects to the human body. In this paper, we learn body-aware generative models of 3D shapes. Specifically, we train generative models of chairs, an ubiquitous shape category, which can be conditioned on a given body shape or sitting pose. The body-shape-conditioned models produce chairs which will be comfortable for a person with the given body shape; the pose-conditioned models produce chairs which accommodate the given sitting pose. To train these models, we define a "sitting pose matching" metric and a novel "sitting comfort" metric. Calculating these metrics requires an expensive optimization to sit the body into the chair, which is too slow to be used as a loss function for training a generative model. Thus, we train neural networks to efficiently approximate these metrics. We use our approach to train three body-aware generative shape models: a structured part-based generator, a point cloud generator, and an implicit surface generator. In all cases, our approach produces models which adapt their output chair shapes to input human body specifications.

优化|敛散性(6篇)

【1】 Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions 标题:固定目标函数RELU激活的深度神经网络训练中随机梯度下降的收敛性证明 链接:https://arxiv.org/abs/2112.07369

作者:Martin Hutzenthaler,Arnulf Jentzen,Katharina Pohl,Adrian Riekert,Luca Scarpa 机构:University of Duisburg-Essen, Essen, Germany, e-mail: martin.hutzenthaler a, ○uni-due.de, Applied Mathematics: Institute for Analysis and Numerics, University of Münster, Münster, Germany, e-mail: ajentzen a, ○uni-muenster.de 备注:52 pages, 1 figure. arXiv admin note: text overlap with arXiv:2104.00277, arXiv:2107.04479 摘要:在许多数值模拟中,随机梯度下降(SGD)型优化方法在深层神经网络(DNN)的训练中表现得非常有效,但直到今天,提供数学收敛性分析仍然是一个开放的研究问题,它严格解释了SGD型优化方法在DNN的训练。在这项工作中,我们研究了SGD型优化方法在训练具有整流线性单元(ReLU)激活的全连接前馈DNN中的应用。我们首先建立了这类DNN训练中出现的风险函数及其广义梯度函数的一般正则性,然后,在所考虑的目标函数为常数函数的假设下,研究了这类DNN训练中的普通SGD优化方法。明确地我们在假设学习率(SGD优化方法的步长)足够小但不是$L^1$-可和的情况下,以及在假设目标函数是一个常数函数的情况下,证明了所考虑的SGD过程的风险期望在训练此类DNN时收敛为零,作为SGD步数增加到无穷大。 摘要:In many numerical simulations stochastic gradient descent (SGD) type optimization methods perform very effectively in the training of deep neural networks (DNNs) but till this day it remains an open problem of research to provide a mathematical convergence analysis which rigorously explains the success of SGD type optimization methods in the training of DNNs. In this work we study SGD type optimization methods in the training of fully-connected feedforward DNNs with rectified linear unit (ReLU) activation. We first establish general regularity properties for the risk functions and their generalized gradient functions appearing in the training of such DNNs and, thereafter, we investigate the plain vanilla SGD optimization method in the training of such DNNs under the assumption that the target function under consideration is a constant function. Specifically, we prove under the assumption that the learning rates (the step sizes of the SGD optimization method) are sufficiently small but not $L^1$-summable and under the assumption that the target function is a constant function that the expectation of the riskof the considered SGD process converges in the training of such DNNs to zero as the number of SGD steps increases to infinity.
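摘要中"足够小但不是 $L^1$-可和"的学习率条件,可以用下面的写法直观说明(这是满足该假设的一个典型取法,并非论文定理的精确陈述):

```latex
% 设学习率序列 (\gamma_n)_{n\in\mathbb{N}} 满足
\sup_{n\in\mathbb{N}} \gamma_n \le \varepsilon \quad\text{(足够小)},
\qquad
\sum_{n=1}^{\infty} \gamma_n = \infty \quad\text{(非 $L^1$-可和)}.
% 例如 \gamma_n = c/n(c>0 足够小)即满足这两个条件;
% 在目标函数为常数函数的假设下,结论是当 SGD 步数趋于无穷时,风险的期望收敛到 0。
```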

【2】 Modeling Image Quantization Tradeoffs for Optimal Compression 标题:基于最优压缩的图像量化权衡建模 链接:https://arxiv.org/abs/2112.07207

作者:Johnathan Chiu 机构:University of California, Berkeley 摘要:所有有损压缩算法都采用类似的压缩方案——频域变换,然后是量化和无损编码方案。他们通过量化高频数据来提高压缩率,从而达到折衷的目的,而压缩率的代价是更高的图像失真。我们提出了一种利用深度学习和极大极小损失函数优化量化表的新方法,与以前的方法相比,该方法能够更准确地测量率和失真参数(RD)之间的权衡。我们设计了一个卷积神经网络(CNN),以无监督的方式学习图像块和量化表之间的映射。通过一次处理所有通道中的图像,我们还可以通过测量不同通道之间信息丢失的权衡来实现更高的性能。我们最初的目标是对JPEG图像进行优化,但感觉这可以扩展到任何有损压缩。 摘要:All Lossy compression algorithms employ similar compression schemes -- frequency domain transform followed by quantization and lossless encoding schemes. They target tradeoffs by quantizating high frequency data to increase compression rates which come at the cost of higher image distortion. We propose a new method of optimizing quantization tables using Deep Learning and a minimax loss function that more accurately measures the tradeoffs between rate and distortion parameters (RD) than previous methods. We design a convolutional neural network (CNN) that learns a mapping between image blocks and quantization tables in an unsupervised manner. By processing images across all channels at once, we can achieve stronger performance by also measuring tradeoffs in information loss between different channels. We initially target optimization on JPEG images but feel that this can be expanded to any lossy compressor.

【3】 Heuristic Hyperparameter Optimization for Convolutional Neural Networks using Genetic Algorithm 标题:基于遗传算法的卷积神经网络启发式超参数优化 链接:https://arxiv.org/abs/2112.07087

作者:Meng Zhou 机构:School of Computing, Queen’s University, Kingston, ON, Canada 备注:8 pages, 3 figures 摘要:近年来,世界各地的人们正遭受着历史上最严重的疾病之一——2019冠状病毒病(简称COVID-19)的折磨。当病毒到达肺部时,更有可能引发肺炎和败血症。对COVID-19患者而言,X射线图像是识别感染典型特征的有力工具。放射科医生和病理学家观察到,感染患者的胸部X光片中出现毛玻璃样阴影[cozzi2021ground],可作为诊断过程中的标准之一。在过去几年中,深度学习已被证明是图像分类领域最强大的方法之一。由于正常人与感染者的胸部X光片存在显著差异[rousan2020chest],可以使用深度模型根据患者的胸部X光片判断其是否患病。许多深度模型结构复杂,并涉及大量输入参数。设计者有时会在深度模型的调参过程中陷入困境,尤其是从头开始构建模型时。受生物进化过程启发的遗传算法,在解决此类复杂问题中发挥着关键作用。在本文中,我提出了一种基于遗传算法的方法,来优化用于胸部X射线分类任务的卷积神经网络(CNN)。 摘要:In recent years, people from all over the world are suffering from one of the most severe diseases in history, known as Coronavirus disease 2019, COVID-19 for short. When the virus reaches the lungs, it has a higher probability to cause lung pneumonia and sepsis. X-ray image is a powerful tool in identifying the typical features of the infection for COVID-19 patients. The radiologists and pathologists observe that ground-glass opacity appears in the chest X-ray for infected patient [cozzi2021ground], and it could be used as one of the criteria during the diagnosis process. In the past few years, deep learning has proven to be one of the most powerful methods in the field of image classification. Due to significant differences in Chest X-Ray between normal and infected people [rousan2020chest], deep models could be used to identify the presence of the disease given a patient's Chest X-Ray. Many deep models are complex, and it evolves with lots of input parameters. Designers sometimes struggle with the tuning process for deep models, especially when they build up the model from scratch. Genetic Algorithm, inspired by the biological evolution process, plays a key role in solving such complex problems. In this paper, I proposed a genetic-based approach to optimize the Convolutional Neural Network(CNN) for the Chest X-Ray classification task.
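下面是一个示意性的遗传算法超参数搜索循环(选择、交叉、变异),用于说明摘要所述"基于遗传算法的 CNN 调优"思路;搜索空间、种群规模以及 `evaluate_cnn`(训练 CNN 并返回验证准确率的函数,此处为占位实现)均为假设,并非论文实现。

```python
import random

SPACE = {                        # 假设的 CNN 超参数搜索空间
    "lr": [1e-2, 1e-3, 1e-4],
    "filters": [16, 32, 64],
    "kernel": [3, 5, 7],
    "dropout": [0.2, 0.3, 0.5],
}

def random_individual():
    return {k: random.choice(v) for k, v in SPACE.items()}

def crossover(a, b):
    return {k: random.choice([a[k], b[k]]) for k in SPACE}

def mutate(ind, rate=0.2):
    return {k: (random.choice(SPACE[k]) if random.random() < rate else v)
            for k, v in ind.items()}

def evaluate_cnn(ind):
    """假设的适应度函数:用该组超参数训练 CNN 并返回验证准确率(占位实现)。"""
    return random.random()

pop = [random_individual() for _ in range(10)]
for gen in range(5):
    scored = sorted(pop, key=evaluate_cnn, reverse=True)
    parents = scored[:4]                                  # 选择:保留适应度最高的个体
    children = [mutate(crossover(*random.sample(parents, 2)))
                for _ in range(len(pop) - len(parents))]  # 交叉 + 变异产生新个体
    pop = parents + children
print(max(pop, key=evaluate_cnn))
```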

【4】 Acceleration techniques for optimization over trained neural network ensembles 标题:训练好的神经网络集成上的优化加速技术 链接:https://arxiv.org/abs/2112.07007

作者:Keliang Wang,Leonardo Lozano,Carlos Cardonha,David Bergman 机构:Department of Operations and Information Management, School of Business, University of Connecticut, Operations, Business Analytics & Information Systems, University of Cincinnati 备注:17 pages, 4 tables, 2 figures 摘要:我们研究的优化问题,其中的目标函数是通过前馈神经网络与校正线性单元(ReLU)激活建模。最近的文献探讨了使用单个神经网络对目标函数中的不确定或复杂元素进行建模。然而,众所周知,与单一神经网络模型相比,神经网络集成产生更稳定的预测,具有更好的泛化性,这表明神经网络集成在决策管道中的应用。我们研究了如何将神经网络集成作为优化模型的目标函数,并探索了相应问题的计算方法。我们提出了一个混合整数线性规划的基础上,现有的流行大-$M$公式优化在一个单一的神经网络。我们为我们的模型开发了两种加速技术,第一种是用于收紧神经网络中关键神经元边界的预处理过程,第二种是基于Benders分解的一组有效不等式。在一个全局优化问题和两个真实数据集上对我们的解决方法进行了实验评估;结果表明,我们的优化算法在计算时间和最优性差距方面优于最新方法的适应性。 摘要:We study optimization problems where the objective function is modeled through feedforward neural networks with rectified linear unit (ReLU) activation. Recent literature has explored the use of a single neural network to model either uncertain or complex elements within an objective function. However, it is well known that ensembles of neural networks produce more stable predictions and have better generalizability than models with single neural networks, which suggests the application of ensembles of neural networks in a decision-making pipeline. We study how to incorporate a neural network ensemble as the objective function of an optimization model and explore computational approaches for the ensuing problem. We present a mixed-integer linear program based on existing popular big-$M$ formulations for optimizing over a single neural network. We develop two acceleration techniques for our model, the first one is a preprocessing procedure to tighten bounds for critical neurons in the neural network while the second one is a set of valid inequalities based on Benders decomposition. Experimental evaluations of our solution methods are conducted on one global optimization problem and two real-world data sets; the results suggest that our optimization algorithm outperforms the adaption of an state-of-the-art approach in terms of computational time and optimality gaps.
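文中所依据的"流行的大-$M$ 公式",通常指对单个 ReLU 神经元 $y=\max(0,\,w^{\top}x+b)$ 的如下标准混合整数线性编码(这是常见写法,论文在此基础上扩展到网络集成并加入两种加速技术):

```latex
y \ge w^{\top}x + b,\qquad
y \le w^{\top}x + b + M(1-z),\qquad
0 \le y \le M z,\qquad
z \in \{0,1\}.
```

其中 $M$ 是预激活值绝对值的上界;$z=1$ 时约束强制 $y=w^{\top}x+b\ge 0$,$z=0$ 时强制 $y=0$。论文提出的第一个加速技术正是通过预处理收紧关键神经元的此类界,从而减小 $M$、收紧松弛。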

【5】 Triangulation candidates for Bayesian optimization 标题:用于贝叶斯优化的三角剖分候选点 链接:https://arxiv.org/abs/2112.07457

作者:Robert B. Gramacy,Annie Sauer,Nathan Wycoff 备注:19 pages, 9 figures 摘要:贝叶斯优化是顺序设计的一种形式:用适当灵活的非线性回归模型理想化投入产出关系;与初始实验活动的数据相符;设计并优化一个标准,用于选择拟合模型下的下一个实验条件(例如,通过预测方程),以达到目标结果(比如最小值);在这些条件下获取输出并更新拟合后重复。在许多情况下,这种针对新数据采集标准的“内部优化”很麻烦,因为它是非凸/高度多模态的,可能是不可微的,或者可能会妨碍数值优化器,特别是当推理需要蒙特卡罗时。在这种情况下,将连续搜索替换为随机候选上的离散搜索并不少见。在这里,我们建议使用基于现有输入设计的Delaunay三角剖分的候选项。除了基于传统凸包库的简单包装器来详细说明这些“三边”的构造之外,我们还基于所涉及的几何准则的特性来推广一些优势。然后,我们通过经验证明,与数值优化采集和基于基准问题的随机候选方案相比,tricands如何能够带来更好的贝叶斯优化性能。 摘要:Bayesian optimization is a form of sequential design: idealize input-output relationships with a suitably flexible nonlinear regression model; fit to data from an initial experimental campaign; devise and optimize a criterion for selecting the next experimental condition(s) under the fitted model (e.g., via predictive equations) to target outcomes of interest (say minima); repeat after acquiring output under those conditions and updating the fit. In many situations this "inner optimization" over the new-data acquisition criterion is cumbersome because it is non-convex/highly multi-modal, may be non-differentiable, or may otherwise thwart numerical optimizers, especially when inference requires Monte Carlo. In such cases it is not uncommon to replace continuous search with a discrete one over random candidates. Here we propose using candidates based on a Delaunay triangulation of the existing input design. In addition to detailing construction of these "tricands", based on a simple wrapper around a conventional convex hull library, we promote several advantages based on properties of the geometric criterion involved. We then demonstrate empirically how tricands can lead to better Bayesian optimization performance compared to both numerically optimized acquisitions and random candidate-based alternatives on benchmark problems.
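下面用 scipy 给出"基于已有输入设计的 Delaunay 三角剖分生成候选点"的一个最小草图:对现有设计点做三角剖分,取各单纯形的重心作为候选。取重心只是一种简单的实例化方式,论文中 tricands 的具体构造可能不同。

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(1)
X = rng.random((12, 2))                       # 已有的实验设计点(二维示例)

tri = Delaunay(X)                             # 对现有设计做 Delaunay 三角剖分
candidates = X[tri.simplices].mean(axis=1)    # 每个单纯形的重心作为候选点
print(candidates.shape)                       # (单纯形个数, 2)

# 随后可在这些离散候选上最大化采集函数(如期望改进),以替代连续的内部优化
```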

【6】 Non Asymptotic Bounds for Optimization via Online Multiplicative Stochastic Gradient Descent 标题:在线乘性随机梯度下降法优化问题的非渐近界 链接:https://arxiv.org/abs/2112.07110

作者:Riddhiman Bhattacharya 机构:University of Minnesota 摘要:随机梯度下降(SGD)的梯度噪声被认为在其性质(如逃逸低势点和正则化)中起着关键作用。过去的研究表明,通过小批量处理产生的SGD误差的协方差在确定其正则化和从低电位点逃逸方面起着关键作用。然而,对于误差分布对算法行为的影响程度,还没有太多的探讨。在这一领域一些新研究的推动下,我们通过显示具有相同SGD均值和协方差结构的噪声类具有相似的性质来证明普适性结果。我们主要考虑吴等人引入的乘法随机梯度下降算法(M-SGD),它比通过小批量处理的SGD算法具有更广泛的噪声级。我们主要针对通过小批量处理对应于SGD的随机微分方程,建立了M-SGD算法的非渐近界。我们还证明了M-SGD算法的误差近似为标度高斯分布,在M-SGD算法的任何固定点上的平均值为$0$。 摘要:The gradient noise of Stochastic Gradient Descent (SGD) is considered to play a key role in its properties (e.g. escaping low potential points and regularization). Past research has indicated that the covariance of the SGD error done via minibatching plays a critical role in determining its regularization and escape from low potential points. It is however not much explored how much the distribution of the error influences the behavior of the algorithm. Motivated by some new research in this area, we prove universality results by showing that noise classes that have the same mean and covariance structure of SGD via minibatching have similar properties. We mainly consider the Multiplicative Stochastic Gradient Descent (M-SGD) algorithm as introduced by Wu et al., which has a much more general noise class than the SGD algorithm done via minibatching. We establish nonasymptotic bounds for the M-SGD algorithm mainly with respect to the Stochastic Differential Equation corresponding to SGD via minibatching. We also show that the M-SGD error is approximately a scaled Gaussian distribution with mean $0$ at any fixed point of the M-SGD algorithm.

预测|估计(4篇)

【1】 M3E2: Multi-gate Mixture-of-experts for Multi-treatment Effect Estimation 标题:M3E2:用于多治疗效果评估的多门混合专家 链接:https://arxiv.org/abs/2112.07574

作者:Raquel Aoki,Yizhou Chen,Martin Ester 机构:Simon Fraser University 备注:4 figures, 10 pages 摘要:这项工作提出了M3E2,一个用于估计多种处理效应的多任务学习神经网络模型。与现有方法相比,M3E2能够稳健地处理同时作用于同一单元的多种处理、连续与二元处理以及大量协变量。我们在三个合成基准数据集上将M3E2与三个基线方法进行了比较:其中两个数据集包含多种处理,一个仅包含单一处理。分析表明,我们的方法性能更优,对真实处理效应的估计更为可靠。代码可在 github.com/raquelaoki/M3E2 获取。 摘要:This work proposes the M3E2, a multi-task learning neural network model to estimate the effect of multiple treatments. In contrast to existing methods, M3E2 is robust to multiple treatment effects applied simultaneously to the same unit, continuous and binary treatments, and many covariates. We compared M3E2 with three baselines in three synthetic benchmark datasets: two with multiple treatments and one with one treatment. Our analysis showed that our method has superior performance, making more assertive estimations of the true treatment effects. The code is available at github.com/raquelaoki/M3E2.

【2】 Scale-Aware Neural Architecture Search for Multivariate Time Series Forecasting 标题:多变量时间序列预测的尺度感知神经结构搜索 链接:https://arxiv.org/abs/2112.07459

作者:Donghui Chen,Ling Chen,Zongjiang Shang,Youdong Zhang,Bo Wen,Chenghu Yang 机构: Zhejiang University 摘要:多元时间序列(MTS)预测在许多智能应用中引起了广泛关注。这不是一个微不足道的任务,因为我们需要考虑变量内依赖和变量间的依赖关系。然而,现有的工作是针对特定场景设计的,需要大量领域知识和专家的努力,这很难在不同场景之间进行转换。在本文中,我们提出了一个用于MTS预测的尺度感知神经结构搜索框架(SNAS4MTF)。多尺度分解模块将原始时间序列转换为多尺度子序列,从而保持多尺度时间模式。自适应图学习模块在没有任何先验知识的情况下,在不同的时间尺度下推断出不同的变量间依赖关系。对于MTS预测,设计了一个搜索空间来捕获每个时间尺度上的变量内依赖和变量间依赖。多尺度分解、自适应图学习和神经结构搜索模块在端到端框架中联合学习。在两个真实数据集上进行的大量实验表明,与最先进的方法相比,SNAS4MTF具有良好的性能。 摘要:Multivariate time series (MTS) forecasting has attracted much attention in many intelligent applications. It is not a trivial task, as we need to consider both intra-variable dependencies and inter-variable dependencies. However, existing works are designed for specific scenarios, and require much domain knowledge and expert efforts, which is difficult to transfer between different scenarios. In this paper, we propose a scale-aware neural architecture search framework for MTS forecasting (SNAS4MTF). A multi-scale decomposition module transforms raw time series into multi-scale sub-series, which can preserve multi-scale temporal patterns. An adaptive graph learning module infers the different inter-variable dependencies under different time scales without any prior knowledge. For MTS forecasting, a search space is designed to capture both intra-variable dependencies and inter-variable dependencies at each time scale. The multi-scale decomposition, adaptive graph learning, and neural architecture search modules are jointly learned in an end-to-end framework. Extensive experiments on two real-world datasets demonstrate that SNAS4MTF achieves a promising performance compared with the state-of-the-art methods.

【3】 Machine Learning-based Prediction of Porosity for Concrete Containing Supplementary Cementitious Materials 标题:基于机器学习的含辅助胶凝材料混凝土孔隙率预测 链接:https://arxiv.org/abs/2112.07353

作者:Chong Cao 机构:University of California, Los Angeles, Westwood Plaza, Los Angeles, CA , Corresponding author at:, UCLA Anderson School of Management, Westwood Plaza, Los Angeles, CA , Tel: , (,) 摘要:孔隙率已被确定为暴露在侵蚀性环境中的混凝土耐久性的关键指标。本文应用集成学习方法预测含辅助胶凝材料的高性能混凝土的孔隙率。本研究中使用的混凝土样品具有八种成分特征,包括w/b比、粘合剂含量、粉煤灰、GGBS、高效减水剂、粗/细骨料比、养护条件和养护天数。组装的数据库由240条数据记录组成,具有74种独特的混凝土混合料设计。建议的机器学习算法在从数据集中随机选择的180个观察值(75%)上进行训练,然后在剩余的60个观察值(25%)上进行测试。数值试验表明,回归树集成可以从混凝土的混合料组成中准确预测混凝土的孔隙率。梯度增强树在预测精度方面通常优于随机森林。对于随机森林,基于包外误差的超参数调整策略比k-折叠交叉验证更有效。 摘要:Porosity has been identified as the key indicator of the durability properties of concrete exposed to aggressive environments. This paper applies ensemble learning to predict porosity of high-performance concrete containing supplementary cementitious materials. The concrete samples utilized in this study are characterized by eight composition features including w/b ratio, binder content, fly ash, GGBS, superplasticizer, coarse/fine aggregate ratio, curing condition and curing days. The assembled database consists of 240 data records, featuring 74 unique concrete mixture designs. The proposed machine learning algorithms are trained on 180 observations (75%) chosen randomly from the data set and then tested on the remaining 60 observations (25%). The numerical experiments suggest that the regression tree ensembles can accurately predict the porosity of concrete from its mixture compositions. Gradient boosting trees generally outperforms random forests in terms of prediction accuracy. For random forests, the out-of-bag error based hyperparameter tuning strategy is found to be much more efficient than k-Fold Cross-Validation.
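下面是摘要中"基于包外(OOB)误差的随机森林超参数调优"思路的 sklearn 草图:开启 `oob_score` 后,无需额外划分验证集即可比较不同超参数组合;数据文件名与特征列名为假设,仅作示意。

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# 假设 df 包含 8 个配比特征与目标列 porosity
df = pd.read_csv("concrete_porosity.csv")          # 假设的数据文件
X, y = df.drop(columns=["porosity"]), df["porosity"]

best = None
for n_estimators in (100, 300, 500):
    for max_features in ("sqrt", 0.5, None):
        rf = RandomForestRegressor(n_estimators=n_estimators,
                                   max_features=max_features,
                                   oob_score=True,        # 用包外样本估计泛化性能
                                   random_state=0).fit(X, y)
        if best is None or rf.oob_score_ > best[0]:
            best = (rf.oob_score_, n_estimators, max_features)
print("best OOB R^2, n_estimators, max_features:", best)
```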

【4】 Calibrated and Sharp Uncertainties in Deep Learning via Simple Density Estimation 标题:基于简单密度估计的深度学习中的校准和锐化不确定性 链接:https://arxiv.org/abs/2112.07184

作者:Volodymyr Kuleshov,Shachi Deshpande 机构:Department of Computer Science, Cornell Tech, New York, NY 摘要:预测不确定性可以由两个特性来表征——校准和锐度。本文论证了根据这些属性对不确定性进行推理,并提出了在深度学习中实施这些属性的简单算法。我们的方法专注于最强大的校准概念——分布校准——并通过使用神经估计器拟合低维密度或分位数函数来实现。由此产生的方法比以前的分类和回归方法简单得多,适用范围更广。从经验上看,我们发现我们的方法以最小的计算和实现开销改善了多个任务的预测不确定性。我们的见解提出了训练深度学习模型的简单而改进的方法,这些方法会导致准确的不确定性,应该利用这些不确定性来提高下游应用程序的性能。 摘要:Predictive uncertainties can be characterized by two properties--calibration and sharpness. This paper argues for reasoning about uncertainty in terms these properties and proposes simple algorithms for enforcing them in deep learning. Our methods focus on the strongest notion of calibration--distribution calibration--and enforce it by fitting a low-dimensional density or quantile function with a neural estimator. The resulting approach is much simpler and more broadly applicable than previous methods across both classification and regression. Empirically, we find that our methods improve predictive uncertainties on several tasks with minimal computational and implementation overhead. Our insights suggest simple and improved ways of training deep learning models that lead to accurate uncertainties that should be leveraged to improve performance across downstream applications.
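
For context, a simplified recalibration sketch: fit a monotone map on held-out probability-integral-transform values so that nominal quantile levels match empirical coverage. The paper targets the stronger notion of distribution calibration with a neural density/quantile estimator; the isotonic-regression stand-in below only illustrates the general post-hoc recalibration idea, with toy Gaussian predictions as assumed inputs.

```python
import numpy as np
from scipy.stats import norm
from sklearn.isotonic import IsotonicRegression

# Toy setup: a probabilistic regressor outputs a Gaussian N(mu, sigma) per input.
rng = np.random.default_rng(0)
y_val = rng.normal(0.0, 2.0, size=500)       # held-out targets
mu, sigma = np.zeros(500), np.ones(500)      # (mis-calibrated) predicted parameters

# Probability integral transform: where each target falls in its predicted CDF.
pit = norm.cdf(y_val, loc=mu, scale=sigma)

# Fit a monotone map R so that predicted quantile levels match empirical coverage.
levels = np.linspace(0.01, 0.99, 99)
empirical = np.array([(pit <= q).mean() for q in levels])
recal = IsotonicRegression(out_of_bounds="clip").fit(levels, empirical)

# At test time, a nominal 90% level is mapped to its recalibrated coverage level.
print("recalibrated coverage of nominal 0.9:", float(recal.predict([0.9])[0]))
```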

其他神经网络|深度学习|模型|建模(26篇)

【1】 Learning Connectivity-Maximizing Network Configurations 标题:学习最大化连通性的网络配置 链接:https://arxiv.org/abs/2112.07663

作者:Daniel Mox,Vijay Kumar,Alejandro Ribeiro 机构:Sensing and Perception (GRASP) Laboratory at the University of Pennsylvania, PA; Alejandro Ribeiro is with the Electrical and Systems Engineering Department at the University of Pennsylvania 摘要:在这项工作中,我们提出了一种数据驱动的方法来优化一组机器人的代数连通度。虽然已经有大量的研究致力于这个问题,但我们仍然缺乏一种能够随智能体数量扩展、从而适合多于少数几个代理的在线应用的方法。为此,我们提出了一种基于卷积神经网络(CNN)的监督学习方法,该方法从一个使用基于优化策略的专家那里学习如何放置通信代理。我们展示了我们的CNN在标准的线形和环形拓扑、10.5万个随机生成的测试用例以及训练期间未见过的更大团队上的性能。我们还通过基于Unity的仿真展示了我们的系统如何应用于动态机器人团队。经过训练后,对于10至20个代理的团队,我们的系统生成连通配置的速度比基于优化的方案快两个数量级。 摘要:In this work we propose a data-driven approach to optimizing the algebraic connectivity of a team of robots. While a considerable amount of research has been devoted to this problem, we lack a method that scales in a manner suitable for online applications for more than a handful of agents. To that end, we propose a supervised learning approach with a convolutional neural network (CNN) that learns to place communication agents from an expert that uses an optimization-based strategy. We demonstrate the performance of our CNN on canonical line and ring topologies, 105k randomly generated test cases, and larger teams not seen during training. We also show how our system can be applied to dynamic robot teams through a Unity-based simulation. After training, our system produces connected configurations 2 orders of magnitude faster than the optimization-based scheme for teams of 10-20 agents.

【2】 Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time 标题:在次二次时间内训练多层过参数化神经网络 链接:https://arxiv.org/abs/2112.07628

作者:Zhao Song,Lichen Zhang,Ruizhe Zhang 机构: Carnegie Mellon University, The University of Texas at Austin 摘要:我们考虑训练多层过参数化神经网络以最小化由损失函数导出的经验风险的问题。在过参数化的典型设定中,网络宽度$m$远大于数据维度$d$和训练样本数$n$($m=\mathrm{poly}(n,d)$),这导致每层都有一个大得令人无法接受的权重矩阵$W\in\mathbb{R}^{m\times m}$。若采用朴素做法,读取权重矩阵并在前向和反向计算中评估神经网络函数需要花费$O(m^2)$时间。在这项工作中,我们展示了如何降低每次迭代的训练成本。具体地说,我们提出了一个框架,它仅在初始化阶段花费$m^2$的成本,并使每次迭代的成本关于$m$真正达到次二次,即每次迭代$m^{2-\Omega(1)}$。为了得到这个结果,我们使用了多种技术,包括基于移位ReLU的稀疏化器、惰性低秩维护数据结构、快速矩形矩阵乘法、基于张量的草图(sketching)技术和预条件化。 摘要:We consider the problem of training multi-layer over-parametrized neural networks to minimize the empirical risk induced by a loss function. In the typical setting of over-parametrization, the network width $m$ is much larger than the data dimension $d$ and number of training samples $n$ ($m=\mathrm{poly}(n,d)$), which induces a prohibitively large weight matrix $W\in \mathbb{R}^{m\times m}$ per layer. Naively, one has to pay $O(m^2)$ time to read the weight matrix and evaluate the neural network function in both forward and backward computation. In this work, we show how to reduce the training cost per iteration, specifically, we propose a framework that uses $m^2$ cost only in the initialization phase and achieves a truly subquadratic cost per iteration in terms of $m$, i.e., $m^{2-\Omega(1)}$ per iteration. To obtain this result, we make use of various techniques, including a shifted ReLU-based sparsifier, a lazy low rank maintenance data structure, fast rectangular matrix multiplication, tensor-based sketching techniques and preconditioning.

【3】 Epigenomic language models powered by Cerebras 标题:由Cerebras驱动的表观基因组语言模型 链接:https://arxiv.org/abs/2112.07571

作者:Meredith V. Trotter,Cuong Q. Nguyen,Stephen Young,Rob T. Woodruff,Kim M. Branson 机构:Artificial Intelligence and Machine Learning, GlaxoSmithKline 备注:18 pages, 5 figures, 3 tables 摘要:Transformer语言模型的大规模自监督预训练推动了自然语言处理领域的发展,并有望交叉应用于蛋白质和DNA的生物"语言"。利用大型基因组序列语料库学习DNA序列的有效表示,可以通过迁移学习加速基因调控和功能模型的发展。然而,为了准确地建模细胞类型特异性的基因调控和功能,不仅需要考虑DNA核苷酸序列中所包含的信息(该信息在细胞类型之间大多不变),还需要考虑染色体的局部化学和结构"表观遗传状态"如何在细胞类型之间变化。在这里,我们介绍了一种基于Transformer的双向编码器表示(BERT)模型,该模型基于DNA序列和成对的表观遗传状态输入学习表示,我们称之为表观基因组BERT(或EBERT)。我们在整个人类基因组和127种细胞类型上使用掩码语言模型目标对EBERT进行预训练。通过与Cerebras Systems的合作(该公司的CS-1系统为所有预训练实验提供了算力),首次使得用以前规模大到难以处理的数据集训练这一复杂模型成为可能。我们通过在细胞类型特异性转录因子结合预测任务上的出色表现展示了EBERT的迁移学习潜力。我们的微调模型在ENCODE-DREAM基准测试的13个评估数据集中的4个上超过了最先进的性能,并在挑战排行榜上获得总排名第三。我们还探讨了纳入表观遗传数据和任务特定的特征增强如何影响迁移学习性能。 摘要:Large scale self-supervised pre-training of Transformer language models has advanced the field of Natural Language Processing and shown promise in cross-application to the biological `languages' of proteins and DNA. Learning effective representations of DNA sequences using large genomic sequence corpuses may accelerate the development of models of gene regulation and function through transfer learning. However, to accurately model cell type-specific gene regulation and function, it is necessary to consider not only the information contained in DNA nucleotide sequences, which is mostly invariant between cell types, but also how the local chemical and structural `epigenetic state' of chromosomes varies between cell types. Here, we introduce a Bidirectional Encoder Representations from Transformers (BERT) model that learns representations based on both DNA sequence and paired epigenetic state inputs, which we call Epigenomic BERT (or EBERT). We pre-train EBERT with a masked language model objective across the entire human genome and across 127 cell types. Training this complex model with a previously prohibitively large dataset was made possible for the first time by a partnership with Cerebras Systems, whose CS-1 system powered all pre-training experiments. We show EBERT's transfer learning potential by demonstrating strong performance on a cell type-specific transcription factor binding prediction task. Our fine-tuned model exceeds state of the art performance on 4 of 13 evaluation datasets from ENCODE-DREAM benchmarks and earns an overall rank of 3rd on the challenge leaderboard. We explore how the inclusion of epigenetic data and task specific feature augmentation impact transfer learning performance.

【4】 Modeling Strong and Human-Like Gameplay with KL-Regularized Search 标题:基于KL正则化搜索的强势类人游戏建模 链接:https://arxiv.org/abs/2112.07544

作者:Athul Paul Jacob,David J. Wu,Gabriele Farina,Adam Lerer,Anton Bakhtin,Jacob Andreas,Noam Brown 机构:School of Computer Science, Carnegie Mellon University 摘要:给定人类行为的示例,我们考虑在多智能体决策问题中构建强大但类人的策略的任务。模仿学习在预测人类行为方面是有效的,但可能无法达到人类专家的水平;而自博弈学习和搜索技术(如AlphaZero)可以带来强大的性能,但可能产生人类难以理解和与之协调的策略。我们在国际象棋和围棋中表明,在应用蒙特卡罗树搜索时,用与模仿学习策略的KL散度对搜索策略进行正则化,可以得到比模仿策略具有更高人类预测准确率且更强的策略。然后,我们提出了一种新的、基于与模仿学习策略的KL散度进行正则化的遗憾最小化算法,并表明将该算法应用于no-press《外交》(Diplomacy)博弈,可以得到在保持与模仿学习相同的人类预测准确率的同时明显更强的策略。 摘要:We consider the task of building strong but human-like policies in multi-agent decision-making problems, given examples of human behavior. Imitation learning is effective at predicting human actions but may not match the strength of expert humans, while self-play learning and search techniques (e.g. AlphaZero) lead to strong performance but may produce policies that are difficult for humans to understand and coordinate with. We show in chess and Go that regularizing search policies based on the KL divergence from an imitation-learned policy by applying Monte Carlo tree search produces policies that have higher human prediction accuracy and are stronger than the imitation policy. We then introduce a novel regret minimization algorithm that is regularized based on the KL divergence from an imitation-learned policy, and show that applying this algorithm to no-press Diplomacy yields a policy that maintains the same human prediction accuracy as imitation learning while being substantially stronger.
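
The core trade-off in the abstract, staying close to a human-like imitation policy while exploiting search values, has a simple closed form when posed as maximizing expected value minus a KL penalty. The sketch below shows that generic combination rule only; the paper applies the regularization inside Monte Carlo tree search and a regret-minimization algorithm, whose details are not reproduced here.

```python
import numpy as np

def kl_regularized_policy(q_values: np.ndarray, imitation_probs: np.ndarray, lam: float) -> np.ndarray:
    """Solve max_pi  E_pi[Q] - lam * KL(pi || pi_imitation) in closed form.

    The optimum is pi(a) proportional to pi_imitation(a) * exp(Q(a) / lam):
    a large lam stays close to the human-like policy, a small lam trusts the search values.
    """
    logits = np.log(imitation_probs + 1e-12) + q_values / lam
    logits -= logits.max()                      # numerical stability
    pi = np.exp(logits)
    return pi / pi.sum()

# Toy example: search prefers action 2, the imitation (human-like) policy prefers action 0.
q = np.array([0.1, 0.0, 1.0])
pi_imit = np.array([0.7, 0.2, 0.1])
print(kl_regularized_policy(q, pi_imit, lam=1.0))
```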

【5】 Pruning Coherent Integrated Photonic Neural Networks Using the Lottery Ticket Hypothesis 标题:基于彩票假设的相干集成光子神经网络修剪 链接:https://arxiv.org/abs/2112.07485

作者:Sanmitra Banerjee,Mahdi Nikdast,Sudeep Pasricha,Krishnendu Chakrabarty 机构:Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA, Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO, USA 摘要:基于奇异值分解的相干集成光子神经网络(SC-IPNNs)占地面积大,训练和推理的静态功耗高,无法使用传统的DNN剪枝技术进行剪枝。我们利用彩票假设为SC IPNN提出了第一种硬件感知修剪方法,通过最小化权重参数的数量来缓解这些挑战。我们修剪了一个基于多层感知器的SC-IPNN,结果表明,多达89%的相位角(对应于SC-IPNN中的权重参数)可以修剪,精度损失可以忽略不计(小于5%),同时静态功耗可以降低86%。 摘要:Singular-value-decomposition-based coherent integrated photonic neural networks (SC-IPNNs) have a large footprint, suffer from high static power consumption for training and inference, and cannot be pruned using conventional DNN pruning techniques. We leverage the lottery ticket hypothesis to propose the first hardware-aware pruning method for SC-IPNNs that alleviates these challenges by minimizing the number of weight parameters. We prune a multi-layer perceptron-based SC-IPNN and show that up to 89% of the phase angles, which correspond to weight parameters in SC-IPNNs, can be pruned with a negligible accuracy loss (smaller than 5%) while reducing the static power consumption by up to 86%.
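
A minimal NumPy sketch of lottery-ticket-style iterative magnitude pruning, the general procedure the abstract says is adapted to the photonic weight parameters (phase angles): train, drop the smallest-magnitude surviving parameters, rewind the rest to their initial values, and repeat. The hardware-aware constraints of SC-IPNNs are omitted and `train_fn` is a placeholder.

```python
import numpy as np

def iterative_magnitude_pruning(init_weights, train_fn, rounds=5, prune_frac=0.2):
    """Lottery-ticket style pruning sketch.

    Each round: train, prune the lowest-magnitude fraction of surviving parameters,
    then rewind the survivors to their initial values and repeat.
    """
    mask = np.ones_like(init_weights, dtype=bool)
    weights = init_weights.copy()
    for _ in range(rounds):
        weights = train_fn(weights * mask)                  # train with the current mask
        magnitudes = np.abs(weights)[mask]
        threshold = np.quantile(magnitudes, prune_frac)     # prune the lowest 20% of survivors
        mask &= np.abs(weights) > threshold
        weights = init_weights.copy()                       # rewind to initialization
    return mask

# Toy usage: "training" just perturbs the parameters.
rng = np.random.default_rng(0)
w0 = rng.normal(size=1000)
mask = iterative_magnitude_pruning(w0, lambda w: w + 0.1 * rng.normal(size=w.shape))
print("kept fraction:", mask.mean())
```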

【6】 Measuring Fairness with Biased Rulers: A Survey on Quantifying Biases in Pretrained Language Models 标题:用有偏见的尺子衡量公正性:对预先训练的语言模型中的偏差进行量化的研究综述 链接:https://arxiv.org/abs/2112.07447

作者:Pieter Delobelle,Ewoenam Kwaku Tokpo,Toon Calders,Bettina Berendt 机构: Department of Computer Science, KU Leuven; Leuven.ai, Department of Computer Science, University of Antwerp, TU Berlin 备注:15 pages, 4 figures, 3 tables 摘要:人们对自然语言处理资源(如BERT)中的偏见模式的认识不断提高,促使许多指标量化“偏见”和“公平性”。但比较不同指标的结果和使用这些指标进行评估的工作仍然很困难,如果不是完全不可能的话。我们调查了关于预训练语言模型的公平性度量的现有文献,并通过实验评估了兼容性,包括语言模型中的两种偏见及其下游任务。我们通过混合传统文献调查和相关分析,以及进行实证评估来做到这一点。我们发现许多指标不兼容,高度依赖于(i)模板,(ii)属性和目标种子,以及(iii)嵌入的选择。这些结果表明,对于语境化的语言模型来说,公平性或偏见评估仍然具有挑战性,如果不是高度主观的话。为了改进未来的比较和公平性评估,我们建议避免嵌入基于度量的方法,并将重点放在下游任务中的公平性评估上。 摘要:An increasing awareness of biased patterns in natural language processing resources, like BERT, has motivated many metrics to quantify `bias' and `fairness'. But comparing the results of different metrics and the works that evaluate with such metrics remains difficult, if not outright impossible. We survey the existing literature on fairness metrics for pretrained language models and experimentally evaluate compatibility, including both biases in language models as in their downstream tasks. We do this by a mixture of traditional literature survey and correlation analysis, as well as by running empirical evaluations. We find that many metrics are not compatible and highly depend on (i) templates, (ii) attribute and target seeds and (iii) the choice of embeddings. These results indicate that fairness or bias evaluation remains challenging for contextualized language models, if not at least highly subjective. To improve future comparisons and fairness evaluations, we recommend avoiding embedding-based metrics and focusing on fairness evaluations in downstream tasks.

【7】 An Interpretive Constrained Linear Model for ResNet and MgNet 标题:ResNet和MgNet的一种解释性约束线性模型 链接:https://arxiv.org/abs/2112.07441

作者:Juncai He,Jinchao Xu,Lian Zhang,Jianqing Zhu 机构: The University of Texas at Austin, †Department of Mathematics, The Pennsylvania State University, University Park 备注:26 pages, 2 figures and 11 tables. arXiv admin note: text overlap with arXiv:1911.10428 摘要:我们提出了一种约束线性数据特征映射模型,作为使用卷积神经网络(CNN)进行图像分类的可解释数学模型。从这个观点出发,我们在线性系统的传统迭代方案和ResNet和MgNet类型模型的基本块的体系结构之间建立了详细的联系。利用这些联系,我们提出了一些改进的ResNet模型,与原始模型相比,这些模型具有更少的参数,但可以产生更精确的结果,从而证明了这种约束学习数据特征映射假设的有效性。基于这一假设,我们进一步提出了一种通用的数据特征迭代方案来证明MgNet的合理性。我们还对MgNet进行了系统的数值研究,以展示其在图像分类问题上的成功和优势,并与已建立的网络进行了比较。 摘要:We propose a constrained linear data-feature-mapping model as an interpretable mathematical model for image classification using a convolutional neural network (CNN). From this viewpoint, we establish detailed connections between the traditional iterative schemes for linear systems and the architectures of the basic blocks of ResNet- and MgNet-type models. Using these connections, we present some modified ResNet models that compared with the original models have fewer parameters and yet can produce more accurate results, thereby demonstrating the validity of this constrained learning data-feature-mapping assumption. Based on this assumption, we further propose a general data-feature iterative scheme to show the rationality of MgNet. We also provide a systematic numerical study on MgNet to show its success and advantages in image classification problems and demonstrate its advantages in comparison with established networks.

【8】 Bayesian Learning of Play Styles in Multiplayer Video Games 标题:多人视频游戏中游戏风格的贝叶斯学习 链接:https://arxiv.org/abs/2112.07437

作者:Aline Normoyle,Shane T. Jensen 机构:Bryn Mawr College and University of Pennsylvania 摘要:在线多人游戏中游戏的复杂性已经引起了人们对模拟玩家成功使用的不同游戏风格或策略的浓厚兴趣。我们为在线多人游戏《战场3》开发了一种分层贝叶斯回归方法,在该方法中,性能被建模为角色、游戏类型和该玩家在每一场比赛中使用的地图的函数。我们使用Dirichlet过程,使回归模型中具有类似球员特定系数的球员能够聚集,这使我们能够在我们的战地3球员样本中发现共同的比赛风格。这种贝叶斯半参数聚类方法有几个优点:不需要指定常见游戏风格的数量,玩家可以在多个集群之间移动,并且生成的分组通常具有直接的解释。我们详细研究了战场3玩家中最常见的游戏风格,找到了表现出整体高性能的玩家组,以及在特定游戏类型、地图和角色中表现特别出色的玩家组。我们还能够区分具有特定打法风格的稳定球员和在比赛中表现出多种打法风格的混合球员。为不同游戏风格的场景建模将有助于游戏开发人员为新参与者开发专门的教程,并改进在线匹配队列中互补团队的构建。 摘要:The complexity of game play in online multiplayer games has generated strong interest in modeling the different play styles or strategies used by players for success. We develop a hierarchical Bayesian regression approach for the online multiplayer game Battlefield 3 where performance is modeled as a function of the roles, game type, and map taken on by that player in each of their matches. We use a Dirichlet process prior that enables the clustering of players that have similar player-specific coefficients in our regression model, which allows us to discover common play styles amongst our sample of Battlefield 3 players. This Bayesian semi-parametric clustering approach has several advantages: the number of common play styles do not need to be specified, players can move between multiple clusters, and the resulting groupings often have a straight-forward interpretations. We examine the most common play styles among Battlefield 3 players in detail and find groups of players that exhibit overall high performance, as well as groupings of players that perform particularly well in specific game types, maps and roles. We are also able to differentiate between players that are stable members of a particular play style from hybrid players that exhibit multiple play styles across their matches. Modeling this landscape of different play styles will aid game developers in developing specialized tutorials for new participants as well as improving the construction of complementary teams in their online matching queues.

【9】 Obtaining Calibrated Probabilities with Personalized Ranking Models 标题:利用个性化排序模型获取校准概率 链接:https://arxiv.org/abs/2112.07428

作者:Wonbin Kweon,SeongKu Kang,Hwanjo Yu 机构:Pohang University of Science and Technology, South Korea 备注:AAAI 2022 摘要:对于个性化的排名模型,一个项目被用户首选的良好校准概率具有很大的实用价值。虽然现有的工作在图像分类方面显示了有希望的结果,但对于个性化排序的概率校准还没有太多的探索。在本文中,我们的目标是估计用户选择某个项目的可能性。我们研究了各种参数分布,并提出了两种参数校准方法,即高斯校准和伽马校准。每种方法都可以看作是一种后处理函数,它将预先训练的模型的排名分数映射到经过良好校准的偏好概率,而不会影响推荐性能。我们还设计了无偏经验风险最小化框架,指导校准方法从有偏用户项交互数据集中学习真实偏好概率。对真实数据集的各种个性化排序模型的广泛评估表明,所提出的校准方法和无偏经验风险最小化显著提高了校准性能。 摘要:For personalized ranking models, the well-calibrated probability of an item being preferred by a user has great practical value. While existing work shows promising results in image classification, probability calibration has not been much explored for personalized ranking. In this paper, we aim to estimate the calibrated probability of how likely a user will prefer an item. We investigate various parametric distributions and propose two parametric calibration methods, namely Gaussian calibration and Gamma calibration. Each proposed method can be seen as a post-processing function that maps the ranking scores of pre-trained models to well-calibrated preference probabilities, without affecting the recommendation performance. We also design the unbiased empirical risk minimization framework that guides the calibration methods to learning of true preference probability from the biased user-item interaction dataset. Extensive evaluations with various personalized ranking models on real-world datasets show that both the proposed calibration methods and the unbiased empirical risk minimization significantly improve the calibration performance.
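
A rough sketch of what a parametric post-processing calibrator can look like: fit a two-parameter Gaussian-CDF map from ranking scores to preference probabilities by maximum likelihood on held-out interactions, leaving the ranking model itself untouched. The exact Gaussian/Gamma parameterizations in the paper, and its unbiased risk correction for biased feedback, are not reproduced; the data below is synthetic.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_gaussian_calibration(scores: np.ndarray, clicks: np.ndarray):
    """Fit p(prefer) = Phi((score - mu) / sigma) by maximum likelihood on held-out interactions."""
    def nll(params):
        mu, log_sigma = params
        p = np.clip(norm.cdf((scores - mu) / np.exp(log_sigma)), 1e-6, 1 - 1e-6)
        return -(clicks * np.log(p) + (1 - clicks) * np.log(1 - p)).sum()
    res = minimize(nll, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
    mu, log_sigma = res.x
    return lambda s: norm.cdf((s - mu) / np.exp(log_sigma))

rng = np.random.default_rng(0)
scores = rng.normal(size=2000)                               # ranking scores of a pre-trained model
clicks = (rng.random(2000) < norm.cdf(1.5 * scores)).astype(float)
calibrate = fit_gaussian_calibration(scores, clicks)
print("calibrated preference probability at score 1.0:", float(calibrate(1.0)))
```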

【10】 Direct Training via Backpropagation for Ultra-low Latency Spiking Neural Networks with Multi-threshold 标题:多阈值超低延迟尖峰神经网络的反向传播直接训练 链接:https://arxiv.org/abs/2112.07426

作者:Changqing Xu,Yi Liu,Yintang Yang 机构: by the Industry-University-Academy Cooperation Pro-gram of Xidian University-Chongqing IC Innovation Research Instituteunder Grant CQIRI- 20 2 1CXY-Z0 1, and by the Fundamental ResearchFunds for the Central Universities 摘要:尖峰神经网络(SNN)可以利用时空信息,具有能量效率的特性,是深度神经网络(DNN)的一个很好的替代方案。事件驱动的信息处理使得SNNs可以减少DNNs的昂贵计算量,节省大量的能量消耗。然而,较高的训练和推理延迟限制了更深层次SNN的发展。snn在训练和推理过程中通常需要数十甚至数百个时间步长,这不仅会增加延迟,而且会造成能量消耗的浪费。为了克服这个问题,我们提出了一种新的基于反向传播(BP)的多阈值超低延迟(1-2个时间步长)SNN训练方法。为了提高每个尖峰的信息容量,我们引入了多阈值泄漏集成和触发(LIF)模型。在我们提出的训练方法中,我们提出了三个近似的尖峰活动导数,以解决基于BP的SNN直接训练中存在的不可微问题。实验结果表明,我们提出的方法在MNIST、FashionMNIST和CIFAR10上的平均准确率分别为99.56%、93.08%和87.90%,只需2个时间步长。对于CIFAR10数据集,我们提出的方法比以前报道的直接训练SNN的精度提高了1.12%,时间步长更少。 摘要:Spiking neural networks (SNNs) can utilize spatio-temporal information and have a nature of energy efficiency which is a good alternative to deep neural networks(DNNs). The event-driven information processing makes SNNs can reduce the expensive computation of DNNs and save a lot of energy consumption. However, high training and inference latency is a limitation of the development of deeper SNNs. SNNs usually need tens or even hundreds of time steps during the training and inference process which causes not only the increase of latency but also the waste of energy consumption. To overcome this problem, we proposed a novel training method based on backpropagation (BP) for ultra-low latency(1-2 time steps) SNN with multi-threshold. In order to increase the information capacity of each spike, we introduce the multi-threshold Leaky Integrate and Fired (LIF) model. In our proposed training method, we proposed three approximated derivative for spike activity to solve the problem of the non-differentiable issue which cause difficulties for direct training SNNs based on BP. The experimental results show that our proposed method achieves an average accuracy of 99.56%, 93.08%, and 87.90% on MNIST, FashionMNIST, and CIFAR10, respectively with only 2 time steps. For the CIFAR10 dataset, our proposed method achieve 1.12% accuracy improvement over the previously reported direct trained SNNs with fewer time steps.
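
The non-differentiability issue mentioned above is usually handled with a surrogate derivative for the spike function. Below is a minimal single-threshold PyTorch sketch using a rectangular surrogate; the paper's multi-threshold LIF neuron and its three specific approximate derivatives differ in detail, so this only illustrates the general mechanism.

```python
import torch

class SpikeWithSurrogate(torch.autograd.Function):
    """Heaviside spike in the forward pass, rectangular surrogate derivative in the backward pass."""
    @staticmethod
    def forward(ctx, membrane_potential, threshold, width):
        ctx.save_for_backward(membrane_potential)
        ctx.threshold, ctx.width = threshold, width
        return (membrane_potential >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        (u,) = ctx.saved_tensors
        # Approximate d(spike)/du by a box around the threshold (one of many possible surrogates).
        surrogate = (torch.abs(u - ctx.threshold) < ctx.width).float() / (2 * ctx.width)
        return grad_output * surrogate, None, None

u = torch.randn(4, requires_grad=True)
spikes = SpikeWithSurrogate.apply(u, 1.0, 0.5)
spikes.sum().backward()
print(spikes, u.grad)
```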

【11】 Simple and Robust Loss Design for Multi-Label Learning with Missing Labels 标题:具有缺失标签的多标签学习的简单鲁棒损失设计 链接:https://arxiv.org/abs/2112.07368

作者:Youcai Zhang,Yuhao Cheng,Xinyu Huang,Fei Wen,Rui Feng,Yaqian Li,Yandong Guo 摘要:标签缺失情况下的多标签学习(MLML)是一个具有挑战性的问题。现有的方法主要集中在网络结构或训练方案的设计上,这增加了实现的复杂性。这项工作试图在不增加流程和复杂度的情况下,发挥损失函数在MLML中的潜力。为此,基于模型在训练期间能够以高精度识别缺失标签这一观察,我们通过鲁棒的损失设计提出了两种简单而有效的方法。第一种是一种新颖的针对负样本的鲁棒损失,即Hill损失,它以山丘形状对负样本重新加权,以减轻假阴性(漏标的正样本)的影响。第二种是自步损失校正(SPLC)方法,该方法在缺失标签的近似分布下使用由最大似然准则导出的损失。在大量多标签图像分类数据集上的综合实验表明,我们的方法可以显著提升MLML的性能,并达到MLML中损失函数的新的最先进水平。 摘要:Multi-label learning in the presence of missing labels (MLML) is a challenging problem. Existing methods mainly focus on the design of network structures or training schemes, which increase the complexity of implementation. This work seeks to fulfill the potential of loss function in MLML without increasing the procedure and complexity. Toward this end, we propose two simple yet effective methods via robust loss design based on an observation that a model can identify missing labels during training with a high precision. The first is a novel robust loss for negatives, namely the Hill loss, which re-weights negatives in the shape of a hill to alleviate the effect of false negatives. The second is a self-paced loss correction (SPLC) method, which uses a loss derived from the maximum likelihood criterion under an approximate distribution of missing labels. Comprehensive experiments on a vast range of multi-label image classification datasets demonstrate that our methods can remarkably boost the performance of MLML and achieve new state-of-the-art loss functions in MLML.
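
To illustrate the re-weighting idea behind a hill-shaped negative loss, here is a toy PyTorch sketch in which the loss on assumed-negative labels grows with moderate confidence but decays back toward zero as the predicted probability approaches one, so likely missing positives contribute little gradient. The specific weighting function below is an assumption for illustration, not the exact form from the paper.

```python
import torch

def hill_shaped_negative_loss(logits: torch.Tensor) -> torch.Tensor:
    """Illustrative hill-shaped loss for assumed-negative labels.

    The MSE-style term p**2 is down-weighted by (1 - p), so the loss peaks at a
    moderate predicted probability and vanishes as p -> 1, where the "negative"
    is most likely a missing positive label.  (The paper's exact weighting may differ.)
    """
    p = torch.sigmoid(logits)
    return ((1.0 - p) * p ** 2).mean()

logits = torch.tensor([-3.0, 0.0, 3.0])   # the last entry is probably a missing positive
print(hill_shaped_negative_loss(logits))
```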

【12】 SC-Reg: Training Overparameterized Neural Networks under Self-Concordant Regularization 标题:SC-REG:自协调正则化下的过参数神经网络训练 链接:https://arxiv.org/abs/2112.07344

作者:Adeyemi D. Adeoye,Alberto Bemporad 机构:IMT School for Advanced Studies Lucca 备注:16 pages, 4 figures 摘要:在本文中,我们通过在凸问题的牛顿减量(Newton decrement)框架中加入二阶信息,提出了用于学习过参数化前馈神经网络的SC-Reg(自协调正则化)框架。我们提出了带自协调正则化的广义高斯-牛顿算法(SCoRe-GGN),该算法在每次接收到新的输入批次时更新网络参数。该算法利用了Hessian矩阵中二阶信息的结构,从而减少了训练的计算开销。虽然我们目前的分析只考虑了凸情形,但数值实验表明我们的方法是高效的,并且在凸和非凸设定下都能快速收敛,与基线一阶方法和一种拟牛顿方法相比表现出色。 摘要:In this paper we propose the SC-Reg (self-concordant regularization) framework for learning overparameterized feedforward neural networks by incorporating second-order information in the \emph{Newton decrement} framework for convex problems. We propose the generalized Gauss-Newton with Self-Concordant Regularization (SCoRe-GGN) algorithm that updates the network parameters each time it receives a new input batch. The proposed algorithm exploits the structure of the second-order information in the Hessian matrix, thereby reducing the training computational overhead. Although our current analysis considers only the convex case, numerical experiments show the efficiency of our method and its fast convergence under both convex and non-convex settings, which compare favorably against baseline first-order methods and a quasi-Newton method.

【13】 Learning to Guide and to Be Guided in the Architect-Builder Problem 标题:在架构师-构建者问题中学习指导和被指导 链接:https://arxiv.org/abs/2112.07342

作者:Barde Paul,Karch Tristan,Nowrouzezahrai Derek,Moulin-Frier Clément,Pal Christopher,Oudeyer Pierre-Yves 机构:Québec AI Institute (Mila), McGill University; Inria - Flowers team, Université de Bordeaux 摘要:我们感兴趣的是学习协调的交互式智能体,即$builder$(建造者,执行动作但不知道任务目标)和$architect$(架构师,引导建造者实现任务目标)。我们定义并探索了一个正式的设定,其中人工智能体配备了一种机制,允许它们在学习任务的同时演化出共享的通信协议。实验符号学领域已经表明,人类非常善于从先验未知含义的指令中学习。因此,我们从中得到启发,提出了架构师-建造者问题(ABP):一种不对称的设定,架构师必须学会引导建造者建造特定的结构。架构师知道目标结构,但不能在环境中行动,只能向建造者发送任意消息。另一方面,建造者可以在环境中行动,但不知道手头的任务,必须学会仅依靠架构师发送的消息来解决它。至关重要的是,消息的含义最初没有定义,也没有在智能体之间共享,而必须在整个学习过程中协商。在这些约束条件下,我们提出了架构师-建造者迭代指导(ABIG),这是架构师-建造者问题的一种解决方案:架构师利用已学习的建造者模型来指导它,而建造者使用自模仿学习来强化其被指导的行为。我们分析了ABIG的关键学习机制,并在ABP的二维实例中对其进行了测试,其中的任务包括抓取立方体、将其放置在给定位置或构建各种形状。在这种环境中,ABIG产生了一种低级、高频率的指导性通信协议,它不仅使架构师-建造者对能够解决手头的任务,而且还可以推广到未见过的任务。 摘要:We are interested in interactive agents that learn to coordinate, namely, a $builder$ -- which performs actions but ignores the goal of the task -- and an $architect$ which guides the builder towards the goal of the task. We define and explore a formal setting where artificial agents are equipped with mechanisms that allow them to simultaneously learn a task while at the same time evolving a shared communication protocol. The field of Experimental Semiotics has shown the extent of human proficiency at learning from a priori unknown instructions meanings. Therefore, we take inspiration from it and present the Architect-Builder Problem (ABP): an asymmetrical setting in which an architect must learn to guide a builder towards constructing a specific structure. The architect knows the target structure but cannot act in the environment and can only send arbitrary messages to the builder. The builder on the other hand can act in the environment but has no knowledge about the task at hand and must learn to solve it relying only on the messages sent by the architect. Crucially, the meaning of messages is initially not defined nor shared between the agents but must be negotiated throughout learning. Under these constraints, we propose Architect-Builder Iterated Guiding (ABIG), a solution to the Architect-Builder Problem where the architect leverages a learned model of the builder to guide it while the builder uses self-imitation learning to reinforce its guided behavior. We analyze the key learning mechanisms of ABIG and test it in a 2-dimensional instantiation of the ABP where tasks involve grasping cubes, placing them at a given location, or building various shapes. In this environment, ABIG results in a low-level, high-frequency, guiding communication protocol that not only enables an architect-builder pair to solve the task at hand, but that can also generalize to unseen tasks.

【14】 Quantifying Multimodality in World Models 标题:世界模型中多模态的量化 链接:https://arxiv.org/abs/2112.07263

作者:Andreas Sedlmeier,Michael Kölle,Robert Müller,Leo Baudrexel,Claudia Linnhoff-Popien 机构:LMU Munich, Munich, Germany 摘要:基于模型的深度强化学习(RL)假设环境的底层过渡动力学模型可用。该模型可用于预测代理人可能采取的行动的未来影响。当没有此类模型可用时,可以学习真实环境的近似值,例如通过使用生成性神经网络,有时也称为世界模型。由于大多数现实世界的环境本质上是随机的,过渡动力学通常是多模态的,因此使用能够反映这种多模态不确定性的建模技术非常重要。为了安全地将这些学习系统部署在现实世界中,特别是在工业环境中,考虑这些不确定性是至关重要的。在这项工作中,我们分析了基于RL的世界模型中的多模态不确定性,并提出了新的检测和量化指标。正确的建模和检测不确定的未来状态奠定了基础,以安全的方式处理危急情况,这是在现实世界中部署RL系统的先决条件。 摘要:Model-based Deep Reinforcement Learning (RL) assumes the availability of a model of an environment's underlying transition dynamics. This model can be used to predict future effects of an agent's possible actions. When no such model is available, it is possible to learn an approximation of the real environment, e.g. by using generative neural networks, sometimes also called World Models. As most real-world environments are stochastic in nature and the transition dynamics are oftentimes multimodal, it is important to use a modelling technique that is able to reflect this multimodal uncertainty. In order to safely deploy such learning systems in the real world, especially in an industrial context, it is paramount to consider these uncertainties. In this work, we analyze existing and propose new metrics for the detection and quantification of multimodal uncertainty in RL based World Models. The correct modelling & detection of uncertain future states lays the foundation for handling critical situations in a safe way, which is a prerequisite for deploying RL systems in real-world settings.

【15】 TopNet: Learning from Neural Topic Model to Generate Long Stories 标题:TopNet:学习神经主题模型生成长篇故事 链接:https://arxiv.org/abs/2112.07259

作者:Yazheng Yang,Boyuan Pan,Deng Cai,Huan Sun 机构:College of Computer Science, Hangzhou, China, State Key Lab of CAD&CG, Alibaba-Zhejiang University Joint Institute of Frontier, Technologies, Department of Computer Science and Engineering, The Ohio State University, Columbus, USA 备注:None 摘要:长故事生成(LSG)是自然语言处理领域梦寐以求的目标之一。与大多数文本生成任务不同,LSG需要基于更短的文本输入输出内容丰富的长篇大论,并且常常存在信息稀疏的问题。在本文中,我们提出emph{TopNet}来缓解这个问题,通过利用神经主题建模的最新进展来获得高质量的骨架词来补充短输入。特别是,我们首先学习将短文本输入映射到低维主题分布(由主题模型预先指定),而不是直接生成故事。基于这一潜在主题分布,我们可以使用主题模型的重建解码器对一系列相互关联的单词进行采样,作为故事的骨架。在两个基准数据集上的实验表明,我们提出的框架在骨架词选择方面非常有效,并且在自动评估和人工评估方面都显著优于最新的模型。 摘要:Long story generation (LSG) is one of the coveted goals in natural language processing. Different from most text generation tasks, LSG requires to output a long story of rich content based on a much shorter text input, and often suffers from information sparsity. In this paper, we propose emph{TopNet} to alleviate this problem, by leveraging the recent advances in neural topic modeling to obtain high-quality skeleton words to complement the short input. In particular, instead of directly generating a story, we first learn to map the short text input to a low-dimensional topic distribution (which is pre-assigned by a topic model). Based on this latent topic distribution, we can use the reconstruction decoder of the topic model to sample a sequence of inter-related words as a skeleton for the story. Experiments on two benchmark datasets show that our proposed framework is highly effective in skeleton word selection and significantly outperforms the state-of-the-art models in both automatic evaluation and human evaluation.

【16】 HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework 标题:HET:通过支持缓存的分布式框架扩展巨大的嵌入模型训练 链接:https://arxiv.org/abs/2112.07221

作者:Xupeng Miao,Hailin Zhang,Yining Shi,Xiaonan Nie,Zhi Yang,Yangyu Tao,Bin Cui 机构:Department of Computer Science & Key Lab of High Confidence Software Technologies (MOE), Peking University, Institute of Computational Social Science, Peking University (Qingdao),Tencent Inc. 备注:None 摘要:嵌入模型已经成为高维数据的有效学习范式。然而,嵌入模型的一个公开问题是,它们的表示(潜在因素)通常会导致较大的参数空间。我们观察到,现有的分布式训练框架面临着嵌入模型的可伸缩性问题,因为从服务器更新和检索共享的嵌入参数通常主导着训练周期。在本文中,我们提出了HET,这是一个新的系统框架,它显著提高了大型嵌入模型训练的可扩展性。我们将嵌入的受欢迎度分布作为一个性能机会,并利用它通过嵌入缓存解决通信瓶颈。为了确保缓存之间的一致性,我们在HET设计中加入了一个新的一致性模型,它在每个嵌入的基础上提供细粒度的一致性保证。与以前只允许读操作的陈旧性相比,HET还将陈旧性用于写操作。对六项代表性任务的评估表明,HET在最先进的基线上实现了高达88%的嵌入通信减少和高达20.68倍的性能加速。 摘要:Embedding models have been an effective learning paradigm for high-dimensional data. However, one open issue of embedding models is that their representations (latent factors) often result in large parameter space. We observe that existing distributed training frameworks face a scalability issue of embedding models since updating and retrieving the shared embedding parameters from servers usually dominates the training cycle. In this paper, we propose HET, a new system framework that significantly improves the scalability of huge embedding model training. We embrace skewed popularity distributions of embeddings as a performance opportunity and leverage it to address the communication bottleneck with an embedding cache. To ensure consistency across the caches, we incorporate a new consistency model into HET design, which provides fine-grained consistency guarantees on a per-embedding basis. Compared to previous work that only allows staleness for read operations, HET also utilizes staleness for write operations. Evaluations on six representative tasks show that HET achieves up to 88% embedding communication reductions and up to 20.68x performance speedup over the state-of-the-art baselines.
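
A toy sketch of the caching idea: each worker keeps local copies of hot embeddings and only synchronizes with the parameter server when a per-embedding staleness bound is exceeded, for reads and for writes alike. The `InMemoryServer`, the step counter used as a staleness clock, and the plain SGD update are placeholder assumptions; HET's actual fine-grained consistency protocol is considerably more elaborate.

```python
import numpy as np

class InMemoryServer:
    """Stand-in for the parameter server holding the full embedding table."""
    def __init__(self, num_embeddings: int, dim: int):
        self.table = np.random.default_rng(0).normal(size=(num_embeddings, dim))
    def pull(self, emb_id):          # read one embedding row
        return self.table[emb_id].copy()
    def push(self, emb_id, vec):     # write one embedding row back
        self.table[emb_id] = vec

class CachedEmbeddings:
    """Worker-side cache: reads *and* writes may be stale up to a fixed bound."""
    def __init__(self, server, max_staleness: int = 8):
        self.server, self.max_staleness = server, max_staleness
        self.local = {}              # emb_id -> [vector, step_of_last_sync]
        self.step = 0

    def lookup(self, emb_id):
        entry = self.local.get(emb_id)
        if entry is None or self.step - entry[1] > self.max_staleness:
            entry = [self.server.pull(emb_id), self.step]    # refresh only when too stale
            self.local[emb_id] = entry
        return entry[0]

    def update(self, emb_id, grad, lr=0.1):
        entry = self.local.setdefault(emb_id, [self.server.pull(emb_id), self.step])
        entry[0] -= lr * grad                                # apply the update locally first
        if self.step - entry[1] > self.max_staleness:
            self.server.push(emb_id, entry[0])               # flush the write lazily
            entry[1] = self.step
        self.step += 1

cache = CachedEmbeddings(InMemoryServer(num_embeddings=1000, dim=8))
vec = cache.lookup(42)
cache.update(42, grad=0.01 * vec)
```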

【17】 Continual Learning In Environments With Polynomial Mixing Times 标题:多项式混合时间环境下的连续学习 链接:https://arxiv.org/abs/2112.07066

作者:Matthew Riemer,Sharath Chandra Raparthy,Ignacio Cases,Gopeshh Subbaraj,Maximilian Puelma Touzel,Irina Rish 机构: Universit´e deMontr´eal 3Massachusetts Institute of Technology 备注:2 Figures, 20 pages 摘要:由策略引起的马尔可夫链的混合时间限制了真实世界连续学习场景中的性能。然而,在持续强化学习(RL)中,混合时间对学习的影响仍然没有得到充分的研究。在本文中,我们通过混合时间的角度来描述对连续RL(我们称之为可伸缩MDP)的发展具有长期意义的问题。特别是,我们确定可伸缩MDP的混合时间与问题的大小成多项式比例。我们继续证明多项式混合时间对于现有的方法来说是非常困难的,并提出了一系列基于模型的算法,这些算法通过一种新的自举过程直接优化平均报酬来加速学习。最后,我们对我们提出的方法进行了实证分析,展示了对基线的明显改进,以及可伸缩MDP如何用于分析混合时间尺度的RL算法。 摘要:The mixing time of the Markov chain induced by a policy limits performance in real-world continual learning scenarios. Yet, the effect of mixing times on learning in continual reinforcement learning (RL) remains underexplored. In this paper, we characterize problems that are of long-term interest to the development of continual RL, which we call scalable MDPs, through the lens of mixing times. In particular, we establish that scalable MDPs have mixing times that scale polynomially with the size of the problem. We go on to demonstrate that polynomial mixing times present significant difficulties for existing approaches and propose a family of model-based algorithms that speed up learning by directly optimizing for the average reward through a novel bootstrapping procedure. Finally, we perform empirical regret analysis of our proposed approaches, demonstrating clear improvements over baselines and also how scalable MDPs can be used for analysis of RL algorithms as mixing times scale.

【18】 Language Models are not Models of Language 标题:语言模型不是语言的模型 链接:https://arxiv.org/abs/2112.07055

作者:Csaba Veres 机构:University of Bergen 摘要:自然语言处理(NLP)已成为当前人工智能热潮中的主要应用领域之一。迁移学习使得在语言建模任务中训练的大型深度学习神经网络能够极大地提高几乎所有语言任务的性能。有趣的是,当使用包含软件代码的数据对模型进行训练时,它们表现出从自然语言规范生成功能计算机代码的非凡能力。我们认为,这为神经模型在解释语言如何工作时为生成短语结构语法提供了另一种理论的说法带来了一个难题。由于编程语言的语法是由短语结构语法决定的,因此成功的神经模型显然对编程语言以及自然语言的理论基础缺乏信息。我们认为,语言模型这一术语具有误导性,因为深度学习模型不是语言的理论模型,因此建议采用语料库模型,这更好地反映了模型的起源和内容。 摘要:Natural Language Processing (NLP) has become one of the leading application areas in the current Artificial Intelligence boom. Transfer learning has enabled large deep learning neural networks trained on the language modeling task to vastly improve performance in almost all language tasks. Interestingly, when the models are trained with data that includes software code, they demonstrate remarkable abilities in generating functioning computer code from natural language specifications. We argue that this creates a conundrum for claims that neural models provide an alternative theory to generative phrase structure grammars in explaining how language works. Since the syntax of programming languages is determined by phrase structure grammars, successful neural models are apparently uninformative about the theoretical foundations of programming languages, and by extension, natural languages. We argue that the term language model is misleading because deep learning models are not theoretical models of language and propose the adoption of corpus model instead, which better reflects the genesis and contents of the model.

【19】 How to Learn when Data Gradually Reacts to Your Model 标题:当数据逐渐对你的模型做出反应时如何学习 链接:https://arxiv.org/abs/2112.07042

作者:Zachary Izzo,James Zou,Lexing Ying 机构:Department of Mathematics, Stanford University, Institute for Computational and Mathematical Engineering, Stanford University, Department of Biomedical Data Science, Stanford University 备注:40 pages, 8 figures 摘要:最近的一系列工作关注在performative设定中训练机器学习(ML)模型,即数据分布会对已部署的模型作出反应的情形。此设定的目标是学习一个模型,该模型既能诱导有利的数据分布,又能在诱导出的分布上表现良好,从而将测试损失降至最低。以前寻找最优模型的工作假设数据分布会立即适应部署的模型。然而,在实践中,情况可能并非如此,因为总体可能需要时间来适应该模型。在许多应用中,数据分布既取决于当前部署的ML模型,也取决于总体在模型部署之前所处的"状态"。在这项工作中,我们提出了一种新算法——有状态执行性梯度下降(Stateful PerfGD),即使在存在这些效应的情况下也能最小化performative损失。我们为Stateful PerfGD的收敛性提供了理论保证。实验证实,Stateful PerfGD大大优于之前最先进的方法。 摘要:A recent line of work has focused on training machine learning (ML) models in the performative setting, i.e. when the data distribution reacts to the deployed model. The goal in this setting is to learn a model which both induces a favorable data distribution and performs well on the induced distribution, thereby minimizing the test loss. Previous work on finding an optimal model assumes that the data distribution immediately adapts to the deployed model. In practice, however, this may not be the case, as the population may take time to adapt to the model. In many applications, the data distribution depends on both the currently deployed ML model and on the "state" that the population was in before the model was deployed. In this work, we propose a new algorithm, Stateful Performative Gradient Descent (Stateful PerfGD), for minimizing the performative loss even in the presence of these effects. We provide theoretical guarantees for the convergence of Stateful PerfGD. Our experiments confirm that Stateful PerfGD substantially outperforms previous state-of-the-art methods.

【20】 Designing weighted and multiplex networks for deep learning user geolocation in Twitter 标题:推特深度学习用户地理定位的加权多路网络设计 链接:https://arxiv.org/abs/2112.06999

作者:Federico M. Funes,José Ignacio Alvarez-Hamelin,Mariano G. Beiró 机构:Facultad de Ingeniería (UBA), INTECIN (UBA-CONICET) 摘要:预测推特等社交媒体用户的地理位置在健康监测、紧急情况监测、内容个性化和一般社会研究中有多种应用。在这项工作中,我们通过设计和评估基于加权多重图文献并结合最先进深度学习技术的新方法,为这一领域的研究做出贡献。所探索的方法都从相似的底层结构(扩展的提及网络和/或关注者网络)出发,但使用不同的信息处理策略,例如通过直推式和归纳式算法(分别为RGCN和GraphSAGE)进行信息扩散,以及使用Node2vec进行节点嵌入。然后将这些图与注意力机制结合起来,把用户的文本视图并入模型。我们评估每种方法的性能,并在公开的Twitter-US数据集上与基线模型进行比较;我们还发布了一个基于拉丁美洲大规模Twitter抓取的新数据集。最后,我们讨论了在不同标签定义和度量的背景下各方法之间比较的局限性和有效性。 摘要:Predicting the geographical location of users of social media like Twitter has found several applications in health surveillance, emergency monitoring, content personalization, and social studies in general. In this work we contribute to the research in this area by designing and evaluating new methods based on the literature of weighted multigraphs combined with state-of-the-art deep learning techniques. The explored methods depart from a similar underlying structure (that of an extended mention and/or follower network) but use different information processing strategies, e.g., information diffusion through transductive and inductive algorithms -- RGCNs and GraphSAGE, respectively -- and node embeddings with Node2vec. These graphs are then combined with attention mechanisms to incorporate the users' text view into the models. We assess the performance of each of these methods and compare them to baseline models in the publicly available Twitter-US dataset; we also make a new dataset available based on a large Twitter capture in Latin America. Finally, our work discusses the limitations and validity of the comparisons among methods in the context of different label definitions and metrics.

【21】 Analyzing a Caching Model 标题:分析缓存模型 链接:https://arxiv.org/abs/2112.06989

作者:Leon Sixt,Evan Zheran Liu,Marie Pellat,James Wexler,Milad Hashemi Been Kim,Martin Maas 机构:∗ Freie Universit¨at Berlin, † Stanford University, ◦ Google 备注:Presented at the Neurips 2021 Workshop ML for System 摘要:机器学习已经成功地应用于诸如内存预取和缓存之类的系统应用中,在这些应用中,学习模型的性能优于启发式算法。然而,缺乏对这些模型的内部工作原理(可解释性)的理解仍然是在实际部署中采用这些模型的主要障碍。了解模型的行为可以帮助系统管理员和开发人员获得对模型的信心,了解风险,并调试生产中的意外行为。计算机系统中使用的模型的可解释性带来了一个特殊的挑战:与在图像或文本上训练的ML模型不同,输入域(例如,内存访问模式、程序计数器)不能立即解释。因此,一个主要的挑战是用人类实践者可以理解的概念来解释模型。通过分析一个最先进的缓存模型,我们提供了证据,证明该模型所学到的概念超出了可用于解释的简单统计数据。我们的工作为系统ML模型的可解释性迈出了第一步,并强调了这一新兴研究领域的前景和挑战。 摘要:Machine Learning has been successfully applied in systems applications such as memory prefetching and caching, where learned models have been shown to outperform heuristics. However, the lack of understanding the inner workings of these models -- interpretability -- remains a major obstacle for adoption in real-world deployments. Understanding a model's behavior can help system administrators and developers gain confidence in the model, understand risks, and debug unexpected behavior in production. Interpretability for models used in computer systems poses a particular challenge: Unlike ML models trained on images or text, the input domain (e.g., memory access patterns, program counters) is not immediately interpretable. A major challenge is therefore to explain the model in terms of concepts that are approachable to a human practitioner. By analyzing a state-of-the-art caching model, we provide evidence that the model has learned concepts beyond simple statistics that can be leveraged for explanations. Our work provides a first step towards explanability of system ML models and highlights both promises and challenges of this emerging research area.

【22】 On The Reliability Of Machine Learning Applications In Manufacturing Environments 标题:制造环境中机器学习应用的可靠性研究 链接:https://arxiv.org/abs/2112.06986

作者:Nicolas Jourdan,Sagar Sen,Erik Johannes Husom,Enrique Garcia-Ceja,Tobias Biegel,Joachim Metternich 机构:TU Darmstadt, Germany, SINTEF, Norway 备注:Workshop on Distribution Shifts, 35th Conference on Neural Information Processing Systems (NeurIPS 2021) 摘要:物联网(IoT)设备和网络物理系统(CPS)等先进数字技术在工业环境中的应用日益增多,这使得机器学习(ML)算法在制造领域的生产性应用成为可能。随着ML应用程序在现实工业环境中从研究扩展到生产应用,可靠性问题也随之产生。由于大多数ML模型都是在静态数据集上进行训练和评估的,因此需要对其性能进行连续的在线监测,以建立可靠的系统。此外,随着时间的推移,概念和传感器漂移可能会导致算法的精度下降,因此,如果未被检测到且未正确处理,则会影响安全性、可接受性和经济性。在这项工作中,我们以36个月期间记录的公开工业数据集为例,强调了问题的严重性,并解释了漂移的可能来源。我们评估了制造业中常用的最大似然算法的鲁棒性,结果表明,所有测试算法的精度都会随着漂移的增加而显著下降。我们进一步研究如何利用不确定性估计进行在线性能估计以及漂移检测,作为不断学习应用程序的第一步。结果表明,随机森林等集成算法在漂移下的置信度校正衰减最小。 摘要:The increasing deployment of advanced digital technologies such as Internet of Things (IoT) devices and Cyber-Physical Systems (CPS) in industrial environments is enabling the productive use of machine learning (ML) algorithms in the manufacturing domain. As ML applications transcend from research to productive use in real-world industrial environments, the question of reliability arises. Since the majority of ML models are trained and evaluated on static datasets, continuous online monitoring of their performance is required to build reliable systems. Furthermore, concept and sensor drift can lead to degrading accuracy of the algorithm over time, thus compromising safety, acceptance and economics if undetected and not properly addressed. In this work, we exemplarily highlight the severity of the issue on a publicly available industrial dataset which was recorded over the course of 36 months and explain possible sources of drift. We assess the robustness of ML algorithms commonly used in manufacturing and show, that the accuracy strongly declines with increasing drift for all tested algorithms. We further investigate how uncertainty estimation may be leveraged for online performance estimation as well as drift detection as a first step towards continually learning applications. The results indicate, that ensemble algorithms like random forests show the least decay of confidence calibration under drift.
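
A toy illustration of the kind of monitoring the abstract argues for: track an ensemble's average predictive confidence on successive time windows and watch for sustained decay as drift grows. The synthetic data and the injected covariate shift below are stand-ins for the 36-month industrial dataset and the paper's calibrated uncertainty estimates.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train on an initial window, then watch the forest's confidence on later windows.
X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:1000], y[:1000])

def mean_confidence(model, X_window):
    """Average top-class probability; a sustained drop signals possible drift."""
    return model.predict_proba(X_window).max(axis=1).mean()

for start in range(1000, 3000, 500):
    window = X[start:start + 500].copy()
    window += 0.002 * (start - 1000)          # simulate gradually drifting sensors
    print(start, round(mean_confidence(clf, window), 3))
```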

【23】 Speeding up Learning Quantum States through Group Equivariant Convolutional Quantum Ansätze 标题:利用群等变卷积量子拟设加速量子态学习 链接:https://arxiv.org/abs/2112.07611

作者:Han Zheng,Zimu Li,Junyu Liu,Sergii Strelchuk,Risi Kondor 机构:Department of Statistics, The University of Chicago, Chicago, IL , USA, DAMTP, Center for Mathematical Sciences, University of Cambridge, Cambridge CB,WA, UK, Pritzker School of Molecular Engineering, The University of Chicago, Chicago, IL , USA 备注:16 pages, 12 figures 摘要:我们发展了$S_n$等变量子卷积电路的理论框架,该框架建立在并显著推广了Jordan的置换量子计算(PQC)形式体系。我们证明,量子电路是傅里叶空间神经结构的自然选择:与对称群上已知最优的经典快速傅里叶变换(FFT)相比,量子电路在计算$S_n$傅里叶系数的矩阵元素时具有超指数加速。特别地,我们利用Okounkov-Vershik方法证明了Harrow关于$\operatorname{SU}(d)$与$S_n$不可约表示基之间等价性的论断(2005年博士论文第160页),并使用Young-Jucys-Murphy(YJM)元建立了$S_n$等变卷积量子交替拟设($S_n$-CQA)。我们证明了$S_n$-CQA是稠密的,因此在每个$S_n$不可约表示块内都具有表达能力,这可以作为未来量子机器学习和优化应用的通用模型。我们的方法从表示论的角度为证明量子近似优化算法(QAOA)的普适性提供了另一种途径。我们的框架可以自然地应用于具有全局$\operatorname{SU}(d)$对称性的广泛问题。我们通过数值模拟展示了该拟设在矩形晶格和Kagome晶格上寻找$J_1$--$J_2$反铁磁海森堡模型基态符号结构的有效性。我们的工作确定了特定机器学习问题上的量子优势,并首次将著名的Okounkov-Vershik表示理论应用于机器学习和量子物理。 摘要:We develop a theoretical framework for $S_n$-equivariant quantum convolutional circuits, building on and significantly generalizing Jordan's Permutational Quantum Computing (PQC) formalism. We show that quantum circuits are a natural choice for Fourier space neural architectures affording a super-exponential speedup in computing the matrix elements of $S_n$-Fourier coefficients compared to the best known classical Fast Fourier Transform (FFT) over the symmetric group. In particular, we utilize the Okounkov-Vershik approach to prove Harrow's statement (Ph.D. Thesis 2005 p.160) on the equivalence between $\operatorname{SU}(d)$- and $S_n$-irrep bases and to establish the $S_n$-equivariant Convolutional Quantum Alternating Ansätze ($S_n$-CQA) using Young-Jucys-Murphy (YJM) elements. We prove that $S_n$-CQA are dense, thus expressible within each $S_n$-irrep block, which may serve as a universal model for potential future quantum machine learning and optimization applications. Our method provides another way to prove the universality of Quantum Approximate Optimization Algorithm (QAOA), from the representation-theoretical point of view. Our framework can be naturally applied to a wide array of problems with global $\operatorname{SU}(d)$ symmetry. We present numerical simulations to showcase the effectiveness of the ansätze to find the sign structure of the ground state of the $J_1$--$J_2$ antiferromagnetic Heisenberg model on the rectangular and Kagome lattices. Our work identifies quantum advantage for a specific machine learning problem, and provides the first application of the celebrated Okounkov-Vershik representation theory to machine learning and quantum physics.

【24】 Score-Based Generative Modeling with Critically-Damped Langevin Diffusion 标题:利用临界阻尼朗之万扩散的基于分数的生成式建模 链接:https://arxiv.org/abs/2112.07068

作者:Tim Dockhorn,Arash Vahdat,Karsten Kreis 机构:NVIDIA, University of Waterloo, Vector Institute 摘要:基于分数的生成模型(SGMs)已显示出显著的综合质量。SGM依赖于一个扩散过程,该过程会逐渐将数据扰动到一个可处理的分布,而生成模型则学习去噪。除数据分布本身外,该去噪任务的复杂性由扩散过程唯一决定。我们认为,当前的SGM采用了过于简单的扩散,导致不必要的复杂去噪过程,从而限制了生成性建模性能。基于与统计力学的联系,我们提出了一种新的临界阻尼朗之万扩散(CLD),并表明基于CLD的SGM具有优越的性能。CLD可以解释为在扩展空间中运行联合扩散,其中辅助变量可以被视为与哈密顿动力学中的数据变量耦合的“速度”。我们推导了一个新的CLD分数匹配目标,表明该模型只需要学习给定数据的条件分布的分数函数,比直接学习数据的分数更容易。我们还从基于CLD的扩散模型导出了一种新的有效合成抽样方案。我们发现,对于类似的网络结构和采样计算预算,CLD在合成质量上优于以前的SGM。我们表明,我们的新型CLD采样器明显优于Euler-Maruyama等解算器。我们的框架为基于分数的去噪扩散模型提供了新的见解,并可用于高分辨率图像合成。项目页面和代码:https://nv-tlabs.github.io/CLD-SGM. 摘要:Score-based generative models (SGMs) have demonstrated remarkable synthesis quality. SGMs rely on a diffusion process that gradually perturbs the data towards a tractable distribution, while the generative model learns to denoise. The complexity of this denoising task is, apart from the data distribution itself, uniquely determined by the diffusion process. We argue that current SGMs employ overly simplistic diffusions, leading to unnecessarily complex denoising processes, which limit generative modeling performance. Based on connections to statistical mechanics, we propose a novel critically-damped Langevin diffusion (CLD) and show that CLD-based SGMs achieve superior performance. CLD can be interpreted as running a joint diffusion in an extended space, where the auxiliary variables can be considered "velocities" that are coupled to the data variables as in Hamiltonian dynamics. We derive a novel score matching objective for CLD and show that the model only needs to learn the score function of the conditional distribution of the velocity given data, an easier task than learning scores of the data directly. We also derive a new sampling scheme for efficient synthesis from CLD-based diffusion models. We find that CLD outperforms previous SGMs in synthesis quality for similar network architectures and sampling compute budgets. We show that our novel sampler for CLD significantly outperforms solvers such as Euler--Maruyama. Our framework provides new insights into score-based denoising diffusion models and can be readily used for high-resolution image synthesis. Project page and code: https://nv-tlabs.github.io/CLD-SGM.
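
For intuition, the velocity-augmented diffusion the abstract describes has the familiar underdamped-Langevin form, written here for a unit-mass particle in a quadratic potential; the paper's exact mass, friction and noise scaling, and its choice of the critically damped regime, are specified there and are not reproduced:

$$\mathrm{d}x_t = v_t\,\mathrm{d}t, \qquad \mathrm{d}v_t = -x_t\,\mathrm{d}t - \Gamma v_t\,\mathrm{d}t + \sqrt{2\Gamma}\,\mathrm{d}W_t ,$$

where the noise enters only the velocity channel and critical damping (Γ = 2 in these units) sits exactly between oscillatory and overdamped behaviour; the paper argues this choice yields a smoother, easier denoising task for the score network.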

【25】 Dynamic Learning of Correlation Potentials for a Time-Dependent Kohn-Sham System 标题:含时Kohn-Sham系统关联势的动态学习 链接:https://arxiv.org/abs/2112.07067

作者:Harish S. Bhat,Kevin Collins,Prachi Gupta,Christine M. Isborn 机构:Department of Applied Mathematics, University of California Merced, Department of Physics, University of California Merced, Department of Applied Mathematics, Department of Chemistry and Biochemistry, University of California 备注:18 pages, 5 figures 摘要:我们发展了在一维空间中学习含时Kohn-Sham(TDKS)系统关联势的方法。我们从一个低维双电子系统开始,对该系统可以数值求解含时薛定谔方程;这产生了适合用于训练关联势模型的电子密度。我们将学习问题表述为:在动力学服从TDKS方程的约束下优化最小二乘目标。应用伴随方法,我们开发了计算梯度的高效方法,从而学习关联势的模型。我们的结果表明,可以学习到关联势的取值,使得到的电子密度与真实密度相匹配。我们还展示了如何学习带有记忆的关联势泛函,并展示了一个这样的模型,它能对训练集之外的轨迹给出合理的结果。 摘要:We develop methods to learn the correlation potential for a time-dependent Kohn-Sham (TDKS) system in one spatial dimension. We start from a low-dimensional two-electron system for which we can numerically solve the time-dependent Schrödinger equation; this yields electron densities suitable for training models of the correlation potential. We frame the learning problem as one of optimizing a least-squares objective subject to the constraint that the dynamics obey the TDKS equation. Applying adjoints, we develop efficient methods to compute gradients and thereby learn models of the correlation potential. Our results show that it is possible to learn values of the correlation potential such that the resulting electron densities match ground truth densities. We also show how to learn correlation potential functionals with memory, demonstrating one such model that yields reasonable results for trajectories outside the training set.

【26】 Quantum Stream Learning 标题:量子流学习 链接:https://arxiv.org/abs/2112.06628

作者:Yongcheng Ding,Xi Chen,Rafael Magdalena-Benedicto,José D. Martín-Guerrero 机构: quantum control arisesYongcheng Ding and Xi Chen are with Department of Physical Chemistry, University of the Basque Country UPVEHU 备注:7 pages, 3 figures, submitted to the special issue on stream learning, comments are welcomed 摘要:量子力学的奇异性质使得机器学习(ML)在量子领域与经典应用不同。ML可以用于知识发现,它使用从量子系统中连续提取的信息来完成广泛的任务。该模型接收用于学习和决策的流式量子信息,从而对量子系统产生即时反馈。作为一种流学习方法,我们提出了一种在失谐、退相和弛豫情况下对连续测量的量子比特流数据进行深度强化学习的方法。我们还研究了代理如何通过转移学习来适应另一种量子噪声模式。流学习提供了对闭环量子控制的更好理解,这可能为先进的量子技术铺平道路。 摘要:The exotic nature of quantum mechanics makes machine learning (ML) be different in the quantum realm compared to classical applications. ML can be used for knowledge discovery using information continuously extracted from a quantum system in a broad range of tasks. The model receives streaming quantum information for learning and decision-making, resulting in instant feedback on the quantum system. As a stream learning approach, we present a deep reinforcement learning on streaming data from a continuously measured qubit at the presence of detuning, dephasing, and relaxation. We also investigate how the agent adapts to another quantum noise pattern by transfer learning. Stream learning provides a better understanding of closed-loop quantum control, which may pave the way for advanced quantum technologies.

其他(11篇)

【1】 How and Why to Manipulate Your Own Agent 标题:如何以及为什么操纵您自己的代理 链接:https://arxiv.org/abs/2112.07640

作者:Yoav Kolumbus,Noam Nisan 机构:†The Hebrew University of Jerusalem 摘要:我们考虑战略设置,其中几个用户参与重复的在线互动,辅以后悔最小化代理重复玩“游戏”为他们的名义。我们研究了代理重复博弈的动力学和平均结果,并将其视为诱导用户之间的元博弈。我们主要关注的是,用户是否可以在这个元游戏中通过错误地向代理报告他们的参数来“操纵”他们自己的代理而获益。我们正式定义了一般博弈的“用户-代理元博弈”模型,讨论了它在自动代理动力学收敛的不同概念下的性质,并分析了在2x2博弈中,当动态收敛到一个均衡时,对用户产生的均衡。 摘要:We consider strategic settings where several users engage in a repeated online interaction, assisted by regret-minimizing agents that repeatedly play a "game" on their behalf. We study the dynamics and average outcomes of the repeated game of the agents, and view it as inducing a meta-game between the users. Our main focus is on whether users can benefit in this meta-game from "manipulating" their own agent by mis-reporting their parameters to it. We formally define this "user-agent meta-game" model for general games, discuss its properties under different notions of convergence of the dynamics of the automated agents and analyze the equilibria induced on the users in 2x2 games in which the dynamics converge to a single equilibrium.

【2】 Cold Item Integration in Deep Hybrid Recommenders via Tunable Stochastic Gates 标题:基于可调随机门的深度混合推荐器冷项集成 链接:https://arxiv.org/abs/2112.07615

作者:Oren Barkan,Roy Hirsch,Ori Katz,Avi Caciularu,Jonathan Weill,Noam Koenigstein 机构:The Open University, Tel-Aviv University, Technion, Bar-Ilan University, Microsoft 摘要:协作过滤方法中的一个主要挑战是如何为冷项(没有评级的项)生成建议,或将冷项集成到现有目录中。多年来,人们提出了各种混合推荐模型,通过利用项目的元数据和内容以及它们的评级或使用模式来解决这个问题。在这项工作中,我们希望重新探讨冷启动问题,以提请注意一个被忽视的挑战:整合和平衡(常规)温暖项目和完全寒冷项目的能力。在这种情况下,出现了两个不同的挑战:(1)保持温暖物品的高质量性能,(2)学习向相关用户推广冷物品。首先,我们表明这两个目标实际上是相互冲突的,它们之间的平衡取决于业务需求和手头的应用程序。接下来,我们提出了一种新的混合推荐算法,该算法将这两个相互冲突的目标连接起来,并在保持温暖项目的高准确性的同时有效地促进完全寒冷的项目之间实现协调平衡。我们在电影、应用程序和文章推荐上证明了所提出算法的有效性,并对冷-暖权衡进行了实证分析。 摘要:A major challenge in collaborative filtering methods is how to produce recommendations for cold items (items with no ratings), or integrate cold item into an existing catalog. Over the years, a variety of hybrid recommendation models have been proposed to address this problem by utilizing items' metadata and content along with their ratings or usage patterns. In this work, we wish to revisit the cold start problem in order to draw attention to an overlooked challenge: the ability to integrate and balance between (regular) warm items and completely cold items. In this case, two different challenges arise: (1) preserving high quality performance on warm items, while (2) learning to promote cold items to relevant users. First, we show that these two objectives are in fact conflicting, and the balance between them depends on the business needs and the application at hand. Next, we propose a novel hybrid recommendation algorithm that bridges these two conflicting objectives and enables a harmonized balance between preserving high accuracy for warm items while effectively promoting completely cold items. We demonstrate the effectiveness of the proposed algorithm on movies, apps, and articles recommendations, and provide an empirical analysis of the cold-warm trade-off.

【3】 A Style and Semantic Memory Mechanism for Domain Generalization 标题:一种面向领域泛化的风格和语义记忆机制 链接:https://arxiv.org/abs/2112.07517

作者:Yang Chen,Yu Wang,Yingwei Pan,Ting Yao,Xinmei Tian,Tao Mei 机构:† University of Science and Technology of China, Hefei, China, ‡JD AI Research, Beijing, China 备注:ICCV 2021 摘要:主流最先进的领域泛化算法倾向于优先考虑跨领域语义不变性的假设。同时,固有的域内风格不变性通常被低估和搁置。在本文中,我们发现利用域内风格不变性对于提高域泛化的效率也至关重要。我们验证了网络提供关于哪些领域特征是不变的以及在实例之间共享的信息是至关重要的,这样网络可以增强其理解能力并提高其语义辨别能力。相应地,我们还提出了一种新的“陪审团”机制,该机制在学习领域间有用的语义特征共性方面特别有效。我们称为STEAM的完整模型可以解释为一种新的概率图形模型,其实现需要方便地构造两种类型的内存库:语义特征库和样式特征库。实证结果表明,我们提出的框架明显优于最先进的方法。 摘要:Mainstream state-of-the-art domain generalization algorithms tend to prioritize the assumption on semantic invariance across domains. Meanwhile, the inherent intra-domain style invariance is usually underappreciated and put on the shelf. In this paper, we reveal that leveraging intra-domain style invariance is also of pivotal importance in improving the efficiency of domain generalization. We verify that it is critical for the network to be informative on what domain features are invariant and shared among instances, so that the network sharpens its understanding and improves its semantic discriminative ability. Correspondingly, we also propose a novel "jury" mechanism, which is particularly effective in learning useful semantic feature commonalities among domains. Our complete model called STEAM can be interpreted as a novel probabilistic graphical model, for which the implementation requires convenient constructions of two kinds of memory banks: semantic feature bank and style feature bank. Empirical results show that our proposed framework surpasses the state-of-the-art methods by clear margins.

【4】 PP-HumanSeg: Connectivity-Aware Portrait Segmentation with a Large-Scale Teleconferencing Video Dataset 标题:PP-HumanSeg:基于大规模电话会议视频数据集的连通性感知的人像分割 链接:https://arxiv.org/abs/2112.07146

作者:Lutao Chu,Yi Liu,Zewu Wu,Shiyu Tang,Guowei Chen,Yuying Hao,Juncai Peng,Zhiliang Yu,Zeyu Chen,Baohua Lai,Haoyi Xiong 机构:Baidu, Inc. 备注:Accepted by WACV workshop 摘要:随着全球范围内2019冠状病毒疾病的猖獗,视频会议的需求激增。为此,实时肖像分割成为一种流行的功能,以取代会议参与者的背景。虽然为从生活场景中提取身体姿势的分割提供了功能丰富的数据集、模型和算法,但视频会议环境中尚未很好地涵盖肖像分割。为了促进这一领域的进展,我们引入了一个名为PP HumanSeg的开源解决方案。这项工作是第一次构建大规模视频肖像数据集,其中包含来自23个会议场景的291个视频,具有14K精细标记帧,并扩展到多摄像机远程会议。此外,我们还提出了一种新的语义连接感知学习(SCL)方法用于语义分割,该方法引入了语义连接感知损失,从连接的角度提高了分割结果的质量。我们提出了一种超轻量级的基于SCL的人像分割模型,在IoU和推理速度之间实现了最佳的平衡。对我们的数据集的广泛评估证明了SCL和我们的模型的优越性。源代码可在https://github.com/PaddlePaddle/PaddleSeg. 摘要:As the COVID-19 pandemic rampages across the world, the demands of video conferencing surge. To this end, real-time portrait segmentation becomes a popular feature to replace backgrounds of conferencing participants. While feature-rich datasets, models and algorithms have been offered for segmentation that extract body postures from life scenes, portrait segmentation has yet not been well covered in a video conferencing context. To facilitate the progress in this field, we introduce an open-source solution named PP-HumanSeg. This work is the first to construct a large-scale video portrait dataset that contains 291 videos from 23 conference scenes with 14K fine-labeled frames and extensions to multi-camera teleconferencing. Furthermore, we propose a novel Semantic Connectivity-aware Learning (SCL) for semantic segmentation, which introduces a semantic connectivity-aware loss to improve the quality of segmentation results from the perspective of connectivity. And we propose an ultra-lightweight model with SCL for practical portrait segmentation, which achieves the best trade-off between IoU and the speed of inference. Extensive evaluations on our dataset demonstrate the superiority of SCL and our model. The source code is available at https://github.com/PaddlePaddle/PaddleSeg.

【5】 GEO-BLEU: Similarity Measure for Geospatial Sequences 标题:地理BLEU:地理空间序列的相似性度量 链接:https://arxiv.org/abs/2112.07144

作者:Toru Shimizu,Kota Tsubouchi,Takahiro Yabe 机构: Yahoo Japan Corporation, Tokyo, Japan, MIT Media Lab, Cambridge, MA, USA 摘要:在最近的地理空间研究中，通过自监督学习对大规模人类移动数据进行建模的重要性与日俱增，这与自然语言处理领域借助大规模语料库的自监督方法取得进展的趋势并行。虽然已有许多可行的方法适用于地理空间序列建模本身，但在评估方面，特别是在如何度量生成序列与参考序列之间的相似性方面，似乎仍有改进空间。在这项工作中，我们提出了一种新的相似性度量GEO-BLEU，它在地理空间序列建模和生成的场景下特别有用。顾名思义，这项工作基于机器翻译研究中最常用的度量之一BLEU，同时将空间邻近性引入n-gram的思想。我们将该度量与既有基线动态时间规整(DTW)进行比较，并将其应用于实际生成的地理空间序列。利用从12000多个案例中收集的关于地理空间序列间相似性的众包标注数据，我们定量和定性地展示了所提方法的优越性。 摘要:In recent geospatial research, the importance of modeling large-scale human mobility data via self-supervised learning is rising, in parallel with progress in natural language processing driven by self-supervised approaches using large-scale corpora. Whereas there are already plenty of feasible approaches applicable to geospatial sequence modeling itself, there seems to be room to improve with regard to evaluation, specifically about how to measure the similarity between generated and reference sequences. In this work, we propose a novel similarity measure, GEO-BLEU, which can be especially useful in the context of geospatial sequence modeling and generation. As the name suggests, this work is based on BLEU, one of the most popular measures used in machine translation research, while introducing spatial proximity to the idea of n-gram. We compare this measure with an established baseline, dynamic time warping, applying it to actual generated geospatial sequences. Using crowdsourced annotated data on the similarity between geospatial sequences collected from over 12,000 cases, we quantitatively and qualitatively show the proposed method's superiority.
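摘要提到GEO-BLEU借鉴了BLEU的n-gram精度思想，并为匹配引入空间邻近性。下面是一个示意性草图，展示"用随距离指数衰减的核对1-gram（单个位置点）匹配加权"的一种可能实现；真实的GEO-BLEU定义（n-gram阶数、权重、简短惩罚项等）以原论文为准，此处的公式与参数均为假设。

```python
# 示意性草图：带空间邻近性加权的 1-gram 匹配精度，仅用于说明思路，并非 GEO-BLEU 的官方定义。
import math


def proximity_precision(generated, reference, beta=0.5):
    """generated / reference 为 (x, y) 坐标序列；
    每个生成点与尚未匹配的最近参考点做贪心匹配，匹配得分按距离以 exp(-beta*d) 衰减。"""
    unused = list(reference)
    total = 0.0
    for gx, gy in generated:
        if not unused:
            break
        # 找到距离当前生成点最近的未匹配参考点
        best_idx = min(range(len(unused)),
                       key=lambda i: math.hypot(gx - unused[i][0], gy - unused[i][1]))
        rx, ry = unused.pop(best_idx)
        total += math.exp(-beta * math.hypot(gx - rx, gy - ry))
    return total / max(len(generated), 1)


if __name__ == "__main__":
    reference = [(0, 0), (1, 0), (2, 1)]
    close_gen = [(0, 0), (1.1, 0.1), (2, 1)]    # 与参考轨迹非常接近
    far_gen = [(5, 5), (6, 5), (7, 6)]          # 偏离参考轨迹

    print(round(proximity_precision(close_gen, reference), 3))  # 接近 1
    print(round(proximity_precision(far_gen, reference), 3))    # 接近 0
```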

【6】 Real-Time Neural Voice Camouflage 标题:实时神经语音伪装 链接:https://arxiv.org/abs/2112.07076

作者:Mia Chiquier,Chengzhi Mao,Carl Vondrick 机构:Department of Computer Science, Columbia University, New York, NY 备注:14 pages 摘要:自动语音识别系统为应用创造了令人兴奋的可能性，但也为系统性窃听提供了机会。我们提出了一种方法，可以在不妨碍房间内人与人之间交谈的前提下，以空气声传播的方式对这些系统伪装（遮蔽）说话人的语音。标准的对抗攻击在实时流式场景下无效，因为在攻击被执行时，信号的特征已经发生了变化。我们引入了预测式攻击，通过预测未来最有效的攻击来实现实时性能。在实时性约束下，我们的方法对既有语音识别系统DeepSpeech的干扰效果按词错误率衡量是基线的4.17倍，按字符错误率衡量是基线的7.27倍。此外，我们还证明了我们的方法在存在物理距离的真实环境中切实有效。 摘要:Automatic speech recognition systems have created exciting possibilities for applications, however they also enable opportunities for systematic eavesdropping. We propose a method to camouflage a person's voice over-the-air from these systems without inconveniencing the conversation between people in the room. Standard adversarial attacks are not effective in real-time streaming situations because the characteristics of the signal will have changed by the time the attack is executed. We introduce predictive attacks, which achieve real-time performance by forecasting the attack that will be the most effective in the future. Under real-time constraints, our method jams the established speech recognition system DeepSpeech 4.17x more than baselines as measured through word error rate, and 7.27x more as measured through character error rate. We furthermore demonstrate our approach is practically effective in realistic environments over physical distances.
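摘要的核心是"预测式攻击"：流式场景下针对当前帧计算的扰动在播放时已经过时，因此改为根据已有语音预测对未来帧最有效的扰动。下面用一个示意性的Python草图描述这一流式循环；其中的预测器结构、帧长与一帧延迟的设定均为编者假设，并非论文实现。

```python
# 示意性草图：流式"预测式攻击"主循环。预测器结构、帧长、延迟均为假设，并非论文官方实现。
import torch
import torch.nn as nn

FRAME = 1024          # 每帧采样点数（假设）
CONTEXT_FRAMES = 8    # 预测器可见的历史帧数（假设）


class PerturbationPredictor(nn.Module):
    """根据过去的语音帧预测应叠加到"未来一帧"上的扰动（占位的小网络）。"""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(CONTEXT_FRAMES * FRAME, 256),
            nn.ReLU(),
            nn.Linear(256, FRAME),
            nn.Tanh(),           # 将扰动限制在 [-1, 1]，再乘以小幅度
        )

    def forward(self, context):
        return 0.01 * self.net(context.flatten(start_dim=1))


def stream_defense(frames, predictor):
    """frames: (T, FRAME) 的语音帧序列。对第 t 帧算出的扰动叠加到第 t+1 帧上，
    以模拟真实系统中"计算与播放存在一帧延迟"的约束。"""
    protected = frames.clone()
    history = torch.zeros(CONTEXT_FRAMES, FRAME)
    for t in range(frames.shape[0] - 1):
        history = torch.cat([history[1:], frames[t:t + 1]], dim=0)
        with torch.no_grad():
            delta = predictor(history.unsqueeze(0)).squeeze(0)
        protected[t + 1] = protected[t + 1] + delta   # 扰动作用于下一帧
    return protected


if __name__ == "__main__":
    audio = torch.randn(20, FRAME)                  # 假设的 20 帧语音
    out = stream_defense(audio, PerturbationPredictor())
    print(out.shape)                                # torch.Size([20, 1024])
```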

【7】 PantheonRL: A MARL Library for Dynamic Training Interactions 标题:PantheonRL:用于动态训练交互的MARL库 链接:https://arxiv.org/abs/2112.07013

作者:Bidipta Sarkar,Aditi Talati,Andy Shih,Dorsa Sadigh 机构:Department of Computer Science, Stanford University 备注:3 pages, 3 figures. Published in Proceedings of the 36th AAAI Conference on Artificial Intelligence (Demo Track) 2022 摘要:我们介绍了PantheonRL，一个面向动态训练交互（如轮询式、自适应和临时组队训练）的多智能体强化学习软件包。我们的软件包围绕灵活的智能体对象设计，这些对象可以轻松配置以支持不同的训练交互，并能处理具有混合奖励和n个智能体的完全通用多智能体环境。我们的软件包构建在StableBaselines3之上，可直接与现有的强大深度强化学习算法配合使用。最后，PantheonRL提供了一个直观而功能强大的web用户界面，用于配置实验和启动多个异步作业。我们的软件包可在 https://github.com/Stanford-ILIAD/PantheonRL 获取。 摘要:We present PantheonRL, a multiagent reinforcement learning software package for dynamic training interactions such as round-robin, adaptive, and ad-hoc training. Our package is designed around flexible agent objects that can be easily configured to support different training interactions, and handles fully general multiagent environments with mixed rewards and n agents. Built on top of StableBaselines3, our package works directly with existing powerful deep RL algorithms. Finally, PantheonRL comes with an intuitive yet functional web user interface for configuring experiments and launching multiple asynchronous jobs. Our package can be found at https://github.com/Stanford-ILIAD/PantheonRL.
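为说明摘要中"轮询式(round-robin)训练交互"的含义，下面给出一个与具体库无关的极简草图：自我方智能体轮流与多个候选搭档配对训练。其中的类、环境和函数均为虚构的占位实现，并非PantheonRL的实际API。

```python
# 示意性草图：用最小化的伪环境说明"轮询式(round-robin)训练交互"的含义。
# 这里的类与函数均为说明而虚构，并非 PantheonRL 的实际 API。
import itertools
import random


class RandomAgent:
    """一个只会随机出招的占位智能体。"""

    def __init__(self, name):
        self.name = name

    def act(self, observation):
        return random.choice([0, 1])

    def learn(self, reward):
        pass  # 真实场景中这里会更新策略


def play_episode(ego, partner, steps=10):
    """一个极简的双人协作回合：两个动作相同则得 1 分（虚构环境）。"""
    total = 0
    for _ in range(steps):
        obs = 0
        if ego.act(obs) == partner.act(obs):
            total += 1
    ego.learn(total)
    partner.learn(total)
    return total


if __name__ == "__main__":
    ego = RandomAgent("ego")
    partners = [RandomAgent(f"partner-{i}") for i in range(3)]
    # 轮询式训练：ego 依次与每个候选搭档配对训练
    for episode, partner in zip(range(6), itertools.cycle(partners)):
        score = play_episode(ego, partner)
        print(f"episode {episode}: ego vs {partner.name}, score={score}")
```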

【8】 ELF: Exact-Lipschitz Based Universal Density Approximator Flow 标题:ELF:基于Exact-Lipschitz的通用密度近似子流 链接:https://arxiv.org/abs/2112.06997

作者:Achintya Gopal 机构:Bloomberg Quant Research 摘要:在过去几年中，标准化流(normalizing flows)越来越受欢迎；然而，它们的计算成本仍然很高，这使得它们很难被更广泛的机器学习社区所接受。本文介绍了一个简单的一维单层网络，它具有闭式的Lipschitz常数；利用这一点，我们引入了一种新的精确Lipschitz流(ELF)，它结合了残差流易于采样的优点和自回归流的强大性能。此外，我们证明了ELF是一种可证明的通用密度近似器，与许多其他流相比具有更高的计算和参数效率，并在多个大规模数据集上实现了最先进的性能。 摘要:Normalizing flows have grown more popular over the last few years; however, they continue to be computationally expensive, making them difficult to be accepted into the broader machine learning community. In this paper, we introduce a simple one-dimensional one-layer network that has closed form Lipschitz constants; using this, we introduce a new Exact-Lipschitz Flow (ELF) that combines the ease of sampling from residual flows with the strong performance of autoregressive flows. Further, we show that ELF is provably a universal density approximator, more computationally and parameter efficient compared to a multitude of other flows, and achieves state-of-the-art performance on multiple large-scale datasets.
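摘要提到"具有闭式Lipschitz常数的一维单层网络"以及残差流易于采样的特点。下面给出一个示意性草图：一维残差变换 f(x) = x + a·tanh(w·x + b) 的残差部分的Lipschitz常数可直接写成 |a·w|；当 |a·w| < 1 时变换可逆，并可用不动点迭代求逆（残差流求逆的标准做法）。这只是对残差流一般原理的说明，其中的参数与构造均为假设，并非ELF论文的具体网络。

```python
# 示意性草图：一维残差变换 f(x) = x + a*tanh(w*x + b)，其残差部分的 Lipschitz 常数
# 有闭式表达 |a*w|；当 |a*w| < 1 时变换可逆，可用不动点迭代求逆。
# 参数与构造均为说明用的假设，并非 ELF 论文中的具体网络。
import math


class ResidualLayer1D:
    def __init__(self, a, w, b):
        assert abs(a * w) < 1.0, "需要 |a*w| < 1 以保证可逆"
        self.a, self.w, self.b = a, w, b

    def residual(self, x):
        return self.a * math.tanh(self.w * x + self.b)

    def lipschitz(self):
        """残差部分 Lipschitz 常数的闭式值。"""
        return abs(self.a * self.w)

    def forward(self, x):
        return x + self.residual(x)

    def inverse(self, y, iters=100):
        """不动点迭代 x <- y - g(x)；由于 g 是压缩映射，迭代收敛到唯一的原像。"""
        x = y
        for _ in range(iters):
            x = y - self.residual(x)
        return x


if __name__ == "__main__":
    layer = ResidualLayer1D(a=0.8, w=0.9, b=0.1)
    print(layer.lipschitz())                 # 0.72，闭式 Lipschitz 常数
    y = layer.forward(1.5)
    x = layer.inverse(y)
    print(abs(x - 1.5) < 1e-6)               # True，逆映射恢复原输入
```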

【9】 Exploring Latent Dimensions of Crowd-sourced Creativity 标题:探索众包创意的潜在维度 链接:https://arxiv.org/abs/2112.06978

作者:Umut Kocasari,Alperen Bag,Efehan Atici,Pinar Yanardag 机构:Bogazici University 备注:5th Workshop on Machine Learning for Creativity and Design (NeurIPS 2021), Sydney, Australia 摘要:近年来，在预训练GAN的潜空间中发现可解释方向已成为一个热门话题。现有工作大多考虑用于语义图像操纵的方向，而我们专注于一个抽象的属性：创造力。我们能否通过操纵图像使其更有或更没有创造性？我们的工作构建在最大的基于人工智能的创意平台Artbreeder之上，用户可以在该平台上使用预训练的GAN模型生成图像。我们探索了在这个平台上生成的图像的潜在维度，并提出了一个新颖的图像操纵框架，使图像更具创造性。我们的代码和数据集可在 http://github.com/catlab-team/latentcreative 获取。 摘要:Recently, the discovery of interpretable directions in the latent spaces of pre-trained GANs has become a popular topic. While existing works mostly consider directions for semantic image manipulations, we focus on an abstract property: creativity. Can we manipulate an image to be more or less creative? We build our work on the largest AI-based creativity platform, Artbreeder, where users can generate images using pre-trained GAN models. We explore the latent dimensions of images generated on this platform and present a novel framework for manipulating images to make them more creative. Our code and dataset are available at http://github.com/catlab-team/latentcreative.
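摘要的核心操作是"在预训练GAN的潜空间中沿某个方向移动潜向量，从而改变生成图像的某种属性（此处为创造力）"。下面是一个与具体模型无关的示意性草图：生成器用占位函数代替，"创造力方向"在论文中是通过其提出的框架学到的，此处用随机向量充当假设。

```python
# 示意性草图：沿潜空间方向移动潜向量以编辑生成图像的属性。
# dummy_generator 是任意预训练 GAN 生成器的占位，creative_direction 为假设已学到的方向向量。
import numpy as np

LATENT_DIM = 512   # 假设的潜向量维度


def dummy_generator(z: np.ndarray) -> np.ndarray:
    """占位生成器：真实场景应替换为预训练 GAN（如 BigGAN / StyleGAN）的生成函数。"""
    rng = np.random.default_rng(abs(int(z.sum() * 1e6)) % (2 ** 32))
    return rng.random((64, 64, 3))


def edit_creativity(z, direction, alpha):
    """沿"创造力方向"移动潜向量：alpha>0 增加该属性，alpha<0 减少。"""
    direction = direction / np.linalg.norm(direction)
    return z + alpha * direction


if __name__ == "__main__":
    z = np.random.randn(LATENT_DIM)
    creative_direction = np.random.randn(LATENT_DIM)   # 假设：实际应由学习得到

    for alpha in (-3.0, 0.0, 3.0):
        img = dummy_generator(edit_creativity(z, creative_direction, alpha))
        print(alpha, img.shape)    # 不同 alpha 对应不同"创造力程度"的生成结果
```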

【10】 Efficient differentiable quadratic programming layers: an ADMM approach 标题:高效可微二次规划层:ADMM方法 链接:https://arxiv.org/abs/2112.07464

作者:Andrew Butler,Roy Kwon 机构:University of Toronto, Department of Mechanical and Industrial Engineering 摘要:神经网络结构的最新进展允许将凸优化问题无缝集成为端到端可训练神经网络中的可微层。然而,将大中型二次规划集成到深度神经网络结构中是一个挑战,因为用内点方法精确求解二次规划在变量数量上具有最坏情况下的立方复杂性。在本文中,我们提出了一种基于交替方向乘数法(ADMM)的替代网络层架构,该架构能够扩展到具有中等数量变量的问题。后向微分是通过对修正的定点迭代的残差映射进行隐式微分来实现的。模拟结果证明了ADMM层的计算优势,对于中等规模的问题,它比OptNet二次规划层大约快一个数量级。此外,与基于KKT最优性条件的展开微分或隐式微分的标准方法相比,从记忆和计算的角度来看,我们新的后向传递例程是有效的。最后,我们以综合预测和优化范式中的投资组合优化为例进行总结。 摘要:Recent advances in neural-network architecture allow for seamless integration of convex optimization problems as differentiable layers in an end-to-end trainable neural network. Integrating medium and large scale quadratic programs into a deep neural network architecture, however, is challenging as solving quadratic programs exactly by interior-point methods has worst-case cubic complexity in the number of variables. In this paper, we present an alternative network layer architecture based on the alternating direction method of multipliers (ADMM) that is capable of scaling to problems with a moderately large number of variables. Backward differentiation is performed by implicit differentiation of the residual map of a modified fixed-point iteration. Simulated results demonstrate the computational advantage of the ADMM layer, which for medium scaled problems is approximately an order of magnitude faster than the OptNet quadratic programming layer. Furthermore, our novel backward-pass routine is efficient, from both a memory and computation standpoint, in comparison to the standard approach based on unrolled differentiation or implicit differentiation of the KKT optimality conditions. We conclude with examples from portfolio optimization in the integrated prediction and optimization paradigm.
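摘要描述了用ADMM代替内点法来求解嵌入网络的二次规划层。下面给出一个示意性草图，展示用ADMM求解带箱式约束的QP（min ½xᵀQx + qᵀx, s.t. lb ≤ x ≤ ub）的前向迭代；论文处理的是更一般的QP，并通过对不动点残差映射做隐式微分实现反向传播，此处仅演示前向部分，维度、步长与迭代次数均为假设。

```python
# 示意性草图：用 ADMM 求解箱式约束 QP：min 0.5*x'Qx + q'x, s.t. lb <= x <= ub。
# x-步求解线性方程组，z-步做投影（裁剪），u 为缩放对偶变量。仅演示前向迭代，
# 反向传播（论文中对不动点残差映射做隐式微分）未包含在内。
import numpy as np


def admm_box_qp(Q, q, lb, ub, rho=1.0, iters=200):
    n = q.shape[0]
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)
    K = Q + rho * np.eye(n)   # 实际实现可只对 K 分解一次并复用，此处为简洁每次直接求解
    for _ in range(iters):
        x = np.linalg.solve(K, -q + rho * (z - u))   # x-步：无约束二次子问题
        z = np.clip(x + u, lb, ub)                   # z-步：投影到箱式约束
        u = u + x - z                                # 对偶更新
    return z


if __name__ == "__main__":
    # 一个 2 维的小例子：无约束最优解 (2, 2) 在约束盒之外，ADMM 应返回盒内最优点
    Q = np.array([[2.0, 0.0], [0.0, 2.0]])
    q = np.array([-4.0, -4.0])
    lb, ub = np.zeros(2), np.ones(2)  # 约束到 [0, 1]^2
    print(admm_box_qp(Q, q, lb, ub))  # 约为 [1, 1]
```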

【11】 ImportantAug: a data augmentation agent for speech 标题:ImportantAug:一种用于语音的数据增强智能体 链接:https://arxiv.org/abs/2112.07156

作者:Viet Anh Trinh,Hassan Salami Kavaki,Michael I Mandel 机构: The Graduate Center, CUNY, New York, USA, Brooklyn College, CUNY, New York, USA 备注:Submitted to ICASSP 2022 摘要:我们介绍ImportantAug，一种通过向语音的不重要区域（而非重要区域）添加噪声来增强语音分类和识别模型训练数据的技术。每条话语的重要性由一个数据增强智能体预测，该智能体经过训练，在最大化其添加噪声量的同时最小化其对识别性能的影响。我们方法的有效性在谷歌语音命令(GSC)数据集的第二版上得到了验证。在标准GSC测试集上，与不考虑噪声加在何处最有效的传统噪声增强相比，它实现了23.3%的相对错误率降低；与不进行数据增强的基线相比，错误率降低了25.4%。此外，在添加了额外噪声的两个测试集上，所提出的ImportantAug也优于传统噪声增强和基线。 摘要:We introduce ImportantAug, a technique to augment training data for speech classification and recognition models by adding noise to unimportant regions of the speech and not to important regions. Importance is predicted for each utterance by a data augmentation agent that is trained to maximize the amount of noise it adds while minimizing its impact on recognition performance. The effectiveness of our method is illustrated on version two of the Google Speech Commands (GSC) dataset. On the standard GSC test set, it achieves a 23.3% relative error rate reduction compared to conventional noise augmentation which applies noise to speech without regard to where it might be most effective. It also provides a 25.4% error rate reduction compared to a baseline without data augmentation. Additionally, the proposed ImportantAug outperforms the conventional noise augmentation and the baseline on two test sets with additional noise added.
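摘要的做法是"由增强智能体估计语音各区域的重要性，然后只在不重要区域注入噪声"。下面用一个示意性草图说明该思想：重要性得分此处用随机数代替（论文中由训练好的增强智能体预测），噪声强度与阈值均为编者假设。

```python
# 示意性草图：按"重要性"掩码对语音注入噪声，重要区域不加噪、不重要区域加噪。
# importance 在真实方法中由训练好的增强智能体预测，这里用随机得分代替；阈值与噪声强度均为假设。
import numpy as np


def important_aug(waveform, importance, noise_std=0.05, threshold=0.5):
    """waveform / importance 均为逐采样点数组，importance 取值在 [0, 1]。"""
    noise = np.random.randn(*waveform.shape) * noise_std
    unimportant = (importance < threshold).astype(waveform.dtype)
    return waveform + noise * unimportant   # 仅在不重要区域叠加噪声


if __name__ == "__main__":
    wave = np.random.randn(16000).astype(np.float32)        # 假设 1 秒、16 kHz 的语音
    scores = np.random.rand(16000).astype(np.float32)       # 假设的逐点重要性得分
    augmented = important_aug(wave, scores)
    changed = np.mean(augmented != wave)
    print(f"{changed:.2%} 的采样点被加入了噪声")             # 约 50%
```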
