Machine Learning Academic Digest [9.1]

2021-09-16 14:55:15


cs.LG: 70 papers today

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (4 papers)

【1】 Structure-Aware Hard Negative Mining for Heterogeneous Graph Contrastive Learning Link: https://arxiv.org/abs/2108.13886

Authors: Yanqiao Zhu, Yichen Xu, Hejie Cui, Carl Yang, Qiang Liu, Shu Wu
Affiliations: Center for Research on Intelligent Perception and Computing, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Note: KDD Workshop on Deep Learning on Graphs: Method and Applications (DLG@KDD 2021)
Abstract: Recently, heterogeneous Graph Neural Networks (GNNs) have become a de facto model for analyzing heterogeneous graphs (HGs), while most of them rely on a relatively large amount of labeled data. In this work, we investigate Contrastive Learning (CL), a key component in self-supervised approaches, on HGs to alleviate the label scarcity problem. We first generate multiple semantic views according to metapaths and network schemas. Then, by pushing node embeddings corresponding to different semantic views close to each other (positives) and pulling other embeddings apart (negatives), one can obtain informative representations without human annotations. However, this CL approach ignores the relative hardness of negative samples, which may lead to suboptimal performance. Considering the complex graph structure and the smoothing nature of GNNs, we propose a structure-aware hard negative mining scheme that measures hardness by structural characteristics for HGs. By synthesizing more negative nodes, we give larger weights to harder negatives with limited computational overhead to further boost the performance. Empirical studies on three real-world datasets show the effectiveness of our proposed method. The proposed method consistently outperforms existing state-of-the-art methods and, notably, even surpasses several supervised counterparts.
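As a rough illustration of how hardness weights can enter a contrastive objective, the following is a minimal PyTorch sketch of a hardness-weighted InfoNCE-style loss. The `hardness` matrix and all names are illustrative assumptions; the paper's actual scheme derives the weights from structural characteristics of the HG and additionally synthesizes negative nodes.

```python
# Sketch: hardness-weighted InfoNCE over two semantic views of the same nodes.
# `hardness` is an (N, N) weight matrix assumed precomputed from structural
# similarity; it is NOT the paper's exact formulation.
import torch
import torch.nn.functional as F

def weighted_infonce(z_view1, z_view2, hardness, tau=0.5):
    z1 = F.normalize(z_view1, dim=1)
    z2 = F.normalize(z_view2, dim=1)
    sim = torch.exp(z1 @ z2.t() / tau)      # pairwise similarities across views
    pos = sim.diag()                        # positives: same node, two views
    neg = (hardness * sim).sum(dim=1) - hardness.diag() * pos  # weighted negatives
    return -torch.log(pos / (pos + neg)).mean()
```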

【2】 Heterogeneous Graph Neural Network with Multi-view Representation Learning Link: https://arxiv.org/abs/2108.13650

Authors: Zezhi Shao, Yongjun Xu, Wei Wei, Fei Wang, Zhao Zhang, Feida Zhu
Affiliations: Huazhong University of Science and Technology
Abstract: Graph neural networks for heterogeneous graph embedding project nodes into a low-dimensional space by exploring the heterogeneity and semantics of the heterogeneous graph. However, on the one hand, most existing heterogeneous graph embedding methods either insufficiently model the local structure under specific semantics, or neglect the heterogeneity when aggregating information from it. On the other hand, representations from multiple semantics are not comprehensively integrated to obtain versatile node embeddings. To address the problem, we propose a Heterogeneous Graph Neural Network with Multi-View Representation Learning (named MV-HetGNN) for heterogeneous graph embedding by introducing the idea of multi-view representation learning. The proposed model consists of node feature transformation, view-specific ego graph encoding, and automatic multi-view fusion to thoroughly learn complex structural and semantic information for generating comprehensive node representations. Extensive experiments on three real-world heterogeneous graph datasets show that the proposed MV-HetGNN model consistently outperforms all the state-of-the-art GNN baselines in various downstream tasks, e.g., node classification, node clustering, and link prediction.

【3】 Adaptive Label Smoothing To Regularize Large-Scale Graph Training Link: https://arxiv.org/abs/2108.13555

Authors: Kaixiong Zhou, Ninghao Liu, Fan Yang, Zirui Liu, Rui Chen, Li Li, Soo-Hyun Choi, Xia Hu
Affiliations: Rice University; University of Georgia; Samsung Research America; Samsung Electronics
Abstract: Graph neural networks (GNNs), which learn node representations by recursively aggregating information from neighbors, have become a predominant computational tool in many domains. To handle large-scale graphs, most of the existing methods partition the input graph into multiple sub-graphs (e.g., through node clustering) and apply batch training to save memory cost. However, such batch training will lead to label bias within each batch, and then result in over-confidence in model predictions. Since connected nodes with positively related labels tend to be assigned together, the traditional cross-entropy minimization process will attend to the predictions of biased classes in the batch, and may intensify the overfitting issue. To overcome the label bias problem, we propose the adaptive label smoothing (ALS) method to replace the one-hot hard labels with smoothed ones, which learns to allocate label confidences from the biased classes to the others. Specifically, ALS propagates node labels to aggregate the neighborhood label distribution in a pre-processing step, and then updates the optimal smoothed labels online to adapt to the specific graph structure. Experiments on real-world datasets demonstrate that ALS can be generally applied to the main scalable learning frameworks to calibrate the biased labels and improve generalization performance.
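To make the smoothing step concrete, here is a minimal NumPy sketch of replacing one-hot labels with labels smoothed toward a propagated neighborhood label distribution. The fixed mixing coefficient `eps` stands in for the learned, online-updated confidences of ALS; the names and the two-hop propagation are illustrative assumptions.

```python
# Sketch: smooth one-hot labels toward the neighborhood label distribution.
import numpy as np

def smooth_labels(one_hot, adj, eps=0.1, hops=2):
    """one_hot: (N, C) hard labels; adj: (N, N) row-normalized adjacency."""
    neighborhood = one_hot.astype(float)
    for _ in range(hops):                    # propagate labels over the graph
        neighborhood = adj @ neighborhood
    neighborhood /= neighborhood.sum(axis=1, keepdims=True).clip(min=1e-12)
    return (1.0 - eps) * one_hot + eps * neighborhood
```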

【4】 An FEA surrogate model with Boundary Oriented Graph Embedding approach Link: https://arxiv.org/abs/2108.13509

Authors: Xingyu Fu, Fengfeng Zhou, Dheeraj Peddireddy, Zhengyang Kang, Martin Byung-Guk Jun, Vaneet Aggarwal
Affiliations: School of Mechanical Engineering, Purdue University, West Lafayette, IN, USA; School of Industrial Engineering, Purdue University, West Lafayette, IN, USA
Abstract: In this work, we present a Boundary Oriented Graph Embedding (BOGE) approach for the Graph Neural Network (GNN) to serve as a general surrogate model for regressing physical fields and solving boundary value problems. Providing shortcuts for both boundary elements and local neighbor elements, the BOGE approach can embed structured mesh elements into the graph and perform an efficient regression on large-scale triangular-mesh-based FEA results, which cannot be realized by other machine-learning-based surrogate methods. Focusing on the cantilever beam problem, our BOGE approach can not only fit the distribution of stress fields but also regress the topological optimization results, which shows its potential for realizing an abstract decision-making design process. The BOGE approach with a 3-layer DeepGCN model achieves the regression with an MSE of 0.011706 (2.41% MAPE) for stress field prediction and an MSE of 0.002735 (with 1.58% of elements having an error larger than 0.01) for topological optimization. The overall concept of the BOGE approach paves the way for a general and efficient deep-learning-based FEA simulator that will benefit both industry and design-related areas.

Transformer (1 paper)

【1】 Medical SANSformers: Training self-supervised transformers without attention for Electronic Medical Records Link: https://arxiv.org/abs/2108.13672

Authors: Yogesh Kumar, Alexander Ilin, Henri Salo, Sangita Kulathinal, Maarit K. Leinonen, Pekka Marttinen
Affiliations: Department of Computer Science, Aalto University, Finland; Information Services Department, Finnish Institute for Health and Welfare, Finland; Department of Mathematics and Statistics, University of Helsinki, Finland
Note: 25 pages, 8 figures, 5 tables; submitted to a journal
Abstract: We leverage deep sequential models to tackle the problem of predicting healthcare utilization for patients, which could help governments to better allocate resources for future healthcare use. Specifically, we study the problem of divergent subgroups, wherein the outcome distribution in a smaller subset of the population considerably deviates from that of the general population. The traditional approach of building specialized models for divergent subgroups could be problematic if the size of the subgroup is very small (for example, rare diseases). To address this challenge, we first develop a novel attention-free sequential model, SANSformers, instilled with inductive biases suited for modeling clinical codes in electronic medical records. We then design a task-specific self-supervision objective and demonstrate its effectiveness, particularly in scarce data settings, by pre-training each model on the entire health registry (with close to one million patients) before fine-tuning for downstream tasks on the divergent subgroups. We compare the novel SANSformer architecture with the LSTM and Transformer models using two data sources and a multi-task learning objective that aids healthcare utilization prediction. Empirically, the attention-free SANSformer models perform consistently well across experiments, outperforming the baselines in most cases by at least ~10%. Furthermore, the self-supervised pre-training boosts performance significantly throughout, for example by over ~50% (and as high as 800%) on the R^2 score when predicting the number of hospital visits.

GAN | Adversarial | Attacks | Generation (8 papers)

【1】 Quantization of Generative Adversarial Networks for Efficient Inference: a Methodological Study Link: https://arxiv.org/abs/2108.13996

Authors: Pavel Andreev, Alexander Fritzler, Dmitry Vetrov
Affiliations: Higher School of Economics; Skolkovo Institute of Science and Technology; Samsung AI Center Moscow, Moscow, Russia; Yandex; Samsung-HSE Laboratory
Abstract: Generative adversarial networks (GANs) have an enormous potential impact on digital content creation, e.g., photo-realistic digital avatars, semantic content editing, and quality enhancement of speech and images. However, the performance of modern GANs comes together with massive amounts of computation performed during inference and high energy consumption. That complicates, or even makes impossible, their deployment on edge devices. The problem can be reduced with quantization -- a neural network compression technique that facilitates hardware-friendly inference by replacing floating-point computations with low-bit integer ones. While quantization is well established for discriminative models, the performance of modern quantization techniques in application to GANs remains unclear. GANs generate content of a more complex structure than discriminative models, and thus quantization of GANs is significantly more challenging. To tackle this problem, we perform an extensive experimental study of state-of-the-art quantization techniques on three diverse GAN architectures, namely StyleGAN, Self-Attention GAN, and CycleGAN. As a result, we discovered practical recipes that allowed us to successfully quantize these models for inference with 4/8-bit weights and 8-bit activations while preserving the quality of the original full-precision models.
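As background for the 4/8-bit schemes the study benchmarks, here is a minimal sketch of uniform affine fake-quantization of a weight tensor, the basic building block behind such methods; it is a generic illustration, not the specific recipes the authors discovered.

```python
# Sketch: quantize a float tensor to num_bits integers, then dequantize.
import torch

def fake_quantize(w, num_bits=8):
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (w.max() - w.min()).clamp(min=1e-8) / (qmax - qmin)
    zero_point = qmin - torch.round(w.min() / scale)
    q = torch.clamp(torch.round(w / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale          # values actually used at inference

w = torch.randn(256, 256)
print((w - fake_quantize(w, num_bits=8)).abs().max())  # quantization error
```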

【2】 Morphence: Moving Target Defense Against Adversarial Examples Link: https://arxiv.org/abs/2108.13952

Authors: Abderrahmen Amich, Birhanu Eshete
Affiliations: University of Michigan, Dearborn
Abstract: Robustness to adversarial examples of machine learning models remains an open topic of research. Attacks often succeed by repeatedly probing a fixed target model with adversarial examples purposely crafted to fool it. In this paper, we introduce Morphence, an approach that shifts the defense landscape by making a model a moving target against adversarial examples. By regularly moving the decision function of a model, Morphence makes it significantly challenging for repeated or correlated attacks to succeed. Morphence deploys a pool of models generated from a base model in a manner that introduces sufficient randomness when it responds to prediction queries. To ensure repeated or correlated attacks fail, the deployed pool of models automatically expires after a query budget is reached, and the model pool is seamlessly replaced by a new model pool generated in advance. We evaluate Morphence on two benchmark image classification datasets (MNIST and CIFAR10) against five reference attacks (2 white-box and 3 black-box). In all cases, Morphence consistently outperforms the thus-far effective defense, adversarial training, even in the face of strong white-box attacks, while preserving accuracy on clean data.
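The serving loop implied by the abstract can be sketched as follows: answer each query with a randomly chosen model from a pool, and retire the pool once a query budget is exhausted. The `make_pool` factory and all names are illustrative assumptions, not Morphence's actual implementation.

```python
# Sketch: moving-target prediction serving with an expiring model pool.
import random

class MovingTargetDefense:
    def __init__(self, make_pool, pool_size=5, query_budget=1000):
        self.make_pool = make_pool           # callable returning a list of models
        self.pool_size = pool_size
        self.query_budget = query_budget
        self._fresh_pool()

    def _fresh_pool(self):
        self.pool = self.make_pool(self.pool_size)
        self.queries_left = self.query_budget

    def predict(self, x):
        if self.queries_left <= 0:           # budget reached: swap in a new pool
            self._fresh_pool()
        self.queries_left -= 1
        return random.choice(self.pool)(x)   # per-query randomness
```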

【3】 EG-Booster: Explanation-Guided Booster of ML Evasion Attacks Link: https://arxiv.org/abs/2108.13930

Authors: Abderrahmen Amich, Birhanu Eshete
Affiliations: University of Michigan, Dearborn
Abstract: The widespread usage of machine learning (ML) in a myriad of domains has raised questions about its trustworthiness in security-critical environments. Part of the quest for trustworthy ML is robustness evaluation of ML models against test-time adversarial examples. In line with the trustworthy ML goal, a useful input to potentially aid robustness evaluation is feature-based explanations of model predictions. In this paper, we present a novel approach called EG-Booster that leverages techniques from explainable ML to guide adversarial example crafting for improved robustness evaluation of ML models before deploying them in security-critical settings. The key insight in EG-Booster is the use of feature-based explanations of model predictions to guide adversarial example crafting, by adding consequential perturbations likely to result in model evasion and avoiding non-consequential ones unlikely to contribute to evasion. EG-Booster is agnostic to model architecture and threat model, and supports diverse distance metrics used previously in the literature. We evaluate EG-Booster using the image classification benchmark datasets MNIST and CIFAR10. Our findings suggest that EG-Booster significantly improves the evasion rate of state-of-the-art attacks while applying fewer perturbations. Through extensive experiments that cover four white-box and three black-box attacks, we demonstrate the effectiveness of EG-Booster against two undefended neural networks trained on MNIST and CIFAR10, and an adversarially-trained ResNet model trained on CIFAR10. Furthermore, we introduce a stability assessment metric and evaluate the reliability of our explanation-based approach by observing the similarity between the model's classification outputs across multiple runs of EG-Booster.
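One plausible reading of the consequential/non-consequential distinction, as a hedged sketch: keep only the per-feature perturbations whose direction works against the feature's support for the true class. The attribution input and the sign heuristic are assumptions for illustration, not the paper's exact rule.

```python
# Sketch: explanation-guided filtering of a candidate adversarial perturbation.
import numpy as np

def eg_filter(x, delta, attributions):
    """x: input; delta: perturbation from a base attack; attributions:
    per-feature contribution toward the true class (any explainer)."""
    # A positive attribution supports the true label, so pushing that feature
    # in the opposite direction is treated as consequential for evasion.
    consequential = np.sign(delta) != np.sign(attributions)
    return x + delta * consequential
```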

【4】 Beyond Model Extraction: Imitation Attack for Black-Box NLP APIs Link: https://arxiv.org/abs/2108.13873

Authors: Qiongkai Xu, Xuanli He, Lingjuan Lyu, Lizhen Qu, Gholamreza Haffari
Affiliations: The Australian National University, Canberra, ACT, Australia; Data61, CSIRO, Canberra, ACT, Australia; Monash University, Clayton, VIC, Australia; Sony AI, Japan
Abstract: Machine-learning-as-a-service (MLaaS) has attracted millions of users to its high-performing, sophisticated models. Although published as black-box APIs, the valuable models behind these services are still vulnerable to imitation attacks. Recently, a series of works have demonstrated that attackers manage to steal or extract the victim models. Nonetheless, none of the previously stolen models could outperform the original black-box APIs. In this work, we take the first step of showing that attackers could potentially surpass victims via unsupervised domain adaptation and multi-victim ensembles. Extensive experiments on benchmark datasets and real-world APIs validate that imitators can succeed in outperforming the original black-box models. We consider this a milestone in the research of imitation attacks, especially on NLP APIs, as the superior performance could influence the defense or even the publishing strategy of API providers.

【5】 Reinforcement Learning Based Sparse Black-box Adversarial Attack on Video Recognition Models Link: https://arxiv.org/abs/2108.13872

Authors: Zeyuan Wang, Chaofeng Sha, Su Yang
Affiliations: Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University
Abstract: We explore black-box adversarial attacks on video recognition models. Attacks are only performed on selected key regions and key frames to reduce the high computational cost of searching for adversarial perturbations on a video due to its high dimensionality. To select key frames, one way is to use heuristic algorithms to evaluate the importance of each frame and choose the essential ones. However, this is time-inefficient in sorting and searching. In order to speed up the attack process, we propose a reinforcement learning based frame selection strategy. Specifically, the agent explores the difference between the original class and the target class of videos to make selection decisions. It receives rewards from threat models which indicate the quality of the decisions. Besides, we also use saliency detection to select key regions, and only estimate the sign of the gradient instead of the gradient itself in zeroth-order optimization to further speed up the attack process. We can use the trained model directly in the untargeted attack, or with a little fine-tuning in the targeted attack, which saves computation time. A range of empirical results on real datasets demonstrate the effectiveness and efficiency of the proposed method.
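The sign-only zeroth-order step mentioned in the abstract can be sketched with finite differences over random probe directions; `loss_fn` is the black-box query, and the sample count is an illustrative choice.

```python
# Sketch: estimate only the sign of the gradient with zeroth-order probes.
import numpy as np

def zo_grad_sign(loss_fn, x, n_samples=20, sigma=1e-3):
    g = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        u = np.random.randn(*x.shape)                        # random direction
        g += (loss_fn(x + sigma * u) - loss_fn(x - sigma * u)) * u
    return np.sign(g)                 # the sign alone drives the attack step
```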

【6】 Sample Efficient Detection and Classification of Adversarial Attacks via Self-Supervised Embeddings Link: https://arxiv.org/abs/2108.13797

Authors: Mazda Moayeri, Soheil Feizi
Affiliations: Department of Computer Science, University of Maryland
Note: Accepted to ICCV 2021
Abstract: Adversarial robustness of deep models is pivotal in ensuring safe deployment in real-world settings, but most modern defenses have narrow scope and expensive costs. In this paper, we propose a self-supervised method to detect adversarial attacks and classify them to their respective threat models, based on a linear model operating on the embeddings from a pre-trained self-supervised encoder. We use a SimCLR encoder in our experiments, since we show the SimCLR embedding distance is a good proxy for human perceptibility, enabling it to encapsulate many threat models at once. We call our method SimCat since it uses a SimCLR encoder to catch and categorize various types of adversarial attacks, including L_p and non-L_p evasion attacks, as well as data poisoning. The simple nature of a linear classifier makes our method efficient in both time and sample complexity. For example, on SVHN, using only five pairs of clean and adversarial examples computed with a PGD-L_inf attack, SimCat's detection accuracy is over 85%. Moreover, on ImageNet, using only 25 examples from each threat model, SimCat can classify eight different attack types, such as PGD-L_2, PGD-L_inf, CW-L_2, PPGD, LPA, StAdv, ReColor, and JPEG-L_inf, with over 40% accuracy. On STL10 data, we apply SimCat as a defense against poisoning attacks, such as BP, CP, FC, CLBD, and HTBD, halving the success rate while using only twenty total poisons for training. We find that the detectors generalize well to unseen threat models. Lastly, we investigate the performance of our detection method under adaptive attacks and further boost its robustness against such attacks via adversarial training.
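The core recipe as described is simple enough to sketch directly: embed inputs with a frozen self-supervised encoder and fit a linear classifier on the embeddings. `encoder` is assumed to be a pretrained SimCLR-style network returning a 1-D embedding; the rest is standard scikit-learn.

```python
# Sketch: linear attack detector/classifier on frozen self-supervised embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_simcat(encoder, inputs, labels):
    """inputs: clean and attacked samples; labels: 0 = clean, 1..K = threat model.
    Per the abstract, a handful of labeled examples per class can suffice."""
    z = np.stack([encoder(x) for x in inputs])   # frozen embeddings
    return LogisticRegression(max_iter=1000).fit(z, labels)
```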

【7】 Segmentation Fault: A Cheap Defense Against Adversarial Machine Learning Link: https://arxiv.org/abs/2108.13617

Authors: Doha Al Bared, Mohamed Nassar
Affiliations: American University of Beirut (AUB), Beirut, Lebanon; University of New Haven, West Haven, CT, USA
Abstract: Recently published attacks against deep neural networks (DNNs) have stressed the importance of methodologies and tools to assess the security risks of using this technology in critical systems. Efficient techniques for detecting adversarial machine learning help establish trust and boost the adoption of deep learning in sensitive and security-critical systems. In this paper, we propose a new technique for defending deep neural network classifiers, and convolutional ones in particular. Our defense is cheap in the sense that it requires less computational power, at a small cost in detection accuracy. The work builds on a recently published technique called ML-LOO. We replace the costly pixel-by-pixel leave-one-out approach of ML-LOO with a coarse-grained leave-one-out. We evaluate and compare the efficiency of different segmentation algorithms for this task. Our results show that a large gain in efficiency is possible, at the price of only a marginal decrease in detection accuracy.
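A minimal sketch of the coarse-grained leave-one-out idea, assuming `model` returns the score of the predicted class and `segments` comes from any off-the-shelf segmentation algorithm (e.g., SLIC); the masking value and names are illustrative.

```python
# Sketch: segment-level leave-one-out importance scores (vs. per-pixel ML-LOO).
import numpy as np

def segment_loo_scores(model, x, segments):
    """x: (H, W, C) image; segments: (H, W) integer segment map."""
    base = model(x)
    scores = {}
    for s in np.unique(segments):
        x_masked = x.copy()
        x_masked[segments == s] = 0.0        # leave this whole segment out
        scores[s] = base - model(x_masked)   # drop in score = segment importance
    return scores
```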

【8】 How Does Adversarial Fine-Tuning Benefit BERT? Link: https://arxiv.org/abs/2108.13602

Authors: Javid Ebrahimi, Hao Yang, Wei Zhang
Affiliations: Visa Research, Palo Alto, USA
Abstract: Adversarial training (AT) is one of the most reliable methods for defending against adversarial attacks in machine learning. Variants of this method have been used as regularization mechanisms to achieve SOTA results on NLP benchmarks, and they have been found to be useful for transfer learning and continual learning. We search for the reasons for the effectiveness of AT by contrasting vanilla and adversarially fine-tuned BERT models. We identify partial preservation of BERT's syntactic abilities during fine-tuning as the key to the success of AT. We observe that adversarially fine-tuned models remain more faithful to BERT's language modeling behavior and are more sensitive to word order. As concrete examples of syntactic abilities, an adversarially fine-tuned model could have an advantage of up to 38% on anaphora agreement and up to 11% on dependency parsing. Our analysis demonstrates that vanilla fine-tuning oversimplifies the sentence representation by focusing heavily on one or a few label-indicative words. AT, however, moderates the effect of these influential words and encourages representational diversity. This allows for a more hierarchical representation of a sentence and leads to the mitigation of BERT's loss of syntactic abilities.

Semi/Weakly/Un/Fully-Supervised | Uncertainty | Active Learning (3 papers)

【1】 S4-Crowd: Semi-Supervised Learning with Self-Supervised Regularisation for Crowd Counting Link: https://arxiv.org/abs/2108.13969

Authors: Haoran Duan, Yu Guan
Affiliations: Newcastle University
Abstract: Crowd counting has drawn increasing attention because of its wide application in smart cities. Recent works achieved promising performance but relied on the supervised paradigm with expensive crowd annotations. To alleviate the annotation cost, in this work we propose a semi-supervised learning framework, S4-Crowd, which can leverage both unlabeled and labeled data for robust crowd modelling. In the unsupervised pathway, two self-supervised losses are proposed to simulate crowd variations such as scale, illumination, etc.; based on these and the supervised information, pseudo labels are generated and gradually refined. We also propose a crowd-driven recurrent unit, the Gated-Crowd-Recurrent-Unit (GCRU), which can preserve discriminant crowd information by extracting second-order statistics, yielding pseudo labels of improved quality. A joint loss including both unsupervised and supervised information is proposed, and a dynamic weighting strategy is employed to balance the importance of the unsupervised loss and the supervised loss at different training stages. We conducted extensive experiments on four popular crowd counting datasets in semi-supervised settings. Experimental results suggest the effectiveness of each proposed component in our S4-Crowd framework. Our method also outperforms other state-of-the-art semi-supervised learning approaches on these crowd datasets.
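The dynamic weighting of the joint loss can be sketched with a simple ramp schedule; the schedule itself is a common choice assumed here for illustration, not necessarily the paper's exact strategy.

```python
# Sketch: joint loss with a time-dependent weight on the unsupervised term.
def joint_loss(sup_loss, unsup_loss, step, ramp_steps=10_000, w_max=1.0):
    w = w_max * min(1.0, step / ramp_steps)  # grow unsupervised weight over time
    return sup_loss + w * unsup_loss
```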

【2】 Self-balanced Learning For Domain Generalization Link: https://arxiv.org/abs/2108.13597

Authors: Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, Kwanghoon Sohn
Affiliations: School of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea; Department of Computer Science and Engineering, Ewha Womans University, Seoul, Korea
Abstract: Domain generalization aims to learn a prediction model on multi-domain source data such that the model can generalize to a target domain with unknown statistics. Most existing approaches have been developed under the assumption that the source data is well-balanced in terms of both domain and class. However, real-world training data collected with different composition biases often exhibits severe distribution gaps for domain and class, leading to substantial performance degradation. In this paper, we propose a self-balanced domain generalization framework that adaptively learns the weights of losses to alleviate the bias caused by the differing distributions of the multi-domain source data. The self-balanced scheme is based on an auxiliary reweighting network that iteratively updates the weight of the loss conditioned on the domain and class information by leveraging balanced meta data. Experimental results demonstrate the effectiveness of our method, which outperforms state-of-the-art works for domain generalization.

【3】 Semi-Supervised Exaggeration Detection of Health Science Press Releases Link: https://arxiv.org/abs/2108.13493

Authors: Dustin Wright, Isabelle Augenstein
Affiliations: Dept. of Computer Science, University of Copenhagen, Denmark
Note: Accepted to EMNLP 2021; 13 pages, 6 figures, 9 tables
Abstract: Public trust in science depends on honest and factual communication of scientific papers. However, recent studies have demonstrated a tendency of news media to misrepresent scientific papers by exaggerating their findings. Given this, we present a formalization of and study into the problem of exaggeration detection in science communication. While there is an abundance of scientific papers and popular media articles written about them, very rarely do the articles include a direct link to the original paper, making data collection challenging. We address this by curating a set of labeled press release/abstract pairs from existing expert-annotated studies on exaggeration in press releases of scientific papers, suitable for benchmarking the performance of machine learning models on the task. Using limited data from this and previous studies on exaggeration detection in science, we introduce MT-PET, a multi-task version of Pattern Exploiting Training (PET), which leverages knowledge from complementary cloze-style QA tasks to improve few-shot learning. We demonstrate that MT-PET outperforms PET and supervised learning both when data is limited, as well as when there is an abundance of data for the main task.

Transfer | Zero/Few/One-Shot | Adaptation (2 papers)

【1】 Aligning Hotel Embeddings using Domain Adaptation for Next-Item Recommendation Link: https://arxiv.org/abs/2108.13824

Authors: Ioannis Partalas
Affiliations: Expedia Group, Geneva, Switzerland
Note: ACM SIGIR Workshop on eCommerce, July 15, 2021, Virtual Event, Montreal, Canada
Abstract: In online platforms it is often the case that there are multiple brands under the same group, which may target different customer profiles or serve different domains. For example, in the hospitality domain, Expedia Group has multiple brands like Brand Expedia, Hotels.com, and Wotif, which either have different traveler profiles or are more relevant in a local context. In this context, learning embeddings for hotels that can be leveraged in recommendation tasks across multiple brands requires a common embedding, which can be induced using alignment approaches. At the same time, one needs to ensure that this common embedding space does not degrade performance in any of the brands. In this work we build upon the hotel2vec model and propose a simple regularization approach for aligning hotel embeddings of different brands via domain adaptation. We also explore alignment methods previously used to align cross-lingual embedding spaces of different languages. We present results on the task of next-hotel prediction using click sessions from two brands. The results show that the proposed approach can align the two embedding spaces while achieving good performance in both brands. Additionally, compared with single-brand training, we show that the proposed approach can significantly reduce training time and improve predictive performance.
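Among the cross-lingual-style alignment methods the abstract mentions, a standard choice is orthogonal Procrustes; here is a minimal NumPy sketch, where `anchors_a`/`anchors_b` are assumed to be embeddings of shared hotels in the two brands' spaces.

```python
# Sketch: align one embedding space to another with orthogonal Procrustes.
import numpy as np

def procrustes_align(anchors_a, anchors_b):
    """Return orthogonal W minimizing ||anchors_a @ W - anchors_b||_F."""
    u, _, vt = np.linalg.svd(anchors_a.T @ anchors_b)
    return u @ vt

W = procrustes_align(np.random.randn(100, 32), np.random.randn(100, 32))
```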

【2】 Rapidly and accurately estimating brain strain and strain rate across head impact types with transfer learning and data fusion Link: https://arxiv.org/abs/2108.13577

Authors: Xianghao Zhan, Yuzhe Liu, Nicholas J. Cecchi, Olivier Gevaert, Michael M. Zeineh, Gerald A. Grant, David B. Camarillo
Affiliations: Stanford University (O. Gevaert is with the Department of Biomedical Data Science and the Stanford Center for Biomedical Informatics Research)
Note: 14 pages, 6 figures
Abstract: Brain strain and strain rate are effective in predicting traumatic brain injury (TBI) caused by head impacts. However, state-of-the-art finite element modeling (FEM) demands considerable computational time, limiting its application in real-time TBI risk monitoring. To accelerate, machine learning head models (MLHMs) were developed, and model accuracy was found to decrease when the training/test datasets came from different head impact types. However, the size of the dataset for specific impact types may not be enough for model training. To address the computational cost of FEM, the limited strain rate prediction, and the generalizability of MLHMs to on-field datasets, we propose data fusion and transfer learning to develop a series of MLHMs to predict the maximum principal strain (MPS) and maximum principal strain rate (MPSR). We trained and tested the MLHMs on 13,623 head impacts from simulations, American football, mixed martial arts, and car crashes, and compared against models trained on only simulations or only on-field impacts. The MLHMs developed with transfer learning are significantly more accurate in estimating MPS and MPSR than other models, with a mean absolute error (MAE) smaller than 0.03 in predicting MPS and smaller than 7 (1/s) in predicting MPSR on all impact datasets. The MLHMs can be applied to various head impact types for rapidly and accurately calculating brain strain and strain rate. Besides the clinical applications in real-time brain strain and strain rate monitoring, this model helps researchers estimate the brain strain and strain rate caused by head impacts more efficiently than FEM.

Reinforcement Learning (2 papers)

【1】 WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU Link: https://arxiv.org/abs/2108.13976

Authors: Tian Lan, Sunil Srinivasa, Stephan Zheng
Affiliations: Salesforce Research
Note: TL and SS contributed equally. Code is available at this https URL. 14 pages, 7 figures
Abstract: Deep reinforcement learning (RL) is a powerful framework to train decision-making models in complex dynamical environments. However, RL can be slow as it learns through repeated interaction with a simulation of the environment. Accelerating RL requires both algorithmic and engineering innovations. In particular, there are key systems engineering bottlenecks when using RL in complex environments that feature multiple agents or high-dimensional state, observation, or action spaces, for example. We present WarpDrive, a flexible, lightweight, and easy-to-use open-source RL framework that implements end-to-end multi-agent RL on a single GPU (Graphics Processing Unit), building on PyCUDA and PyTorch. Using the extreme parallelization capability of GPUs, WarpDrive enables orders-of-magnitude faster RL compared to common implementations that blend CPU simulations and GPU models. Our design runs simulations and the agents in each simulation in parallel. It eliminates data copying between CPU and GPU. It also uses a single simulation data store on the GPU that is safely updated in-place. Together, this allows the user to run thousands of concurrent multi-agent simulations and train on extremely large batches of experience. For example, WarpDrive yields 2.9 million environment steps/second with 2000 environments and 1000 agents (at least 100x higher throughput compared to a CPU implementation) in a benchmark Tag simulation. WarpDrive provides a lightweight Python interface and environment wrappers to simplify usage and promote flexibility and extensions. As such, WarpDrive provides a framework for building high-throughput RL systems.
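The single-GPU, in-place, copy-free pattern the abstract describes can be illustrated generically in PyTorch: keep all environments' state in one device tensor and step every environment and agent with a single vectorized update. This is a sketch of the pattern only, not WarpDrive's actual API.

```python
# Sketch: thousands of toy environments stepped in parallel on one GPU store.
import torch

n_envs, n_agents = 2000, 1000
device = "cuda" if torch.cuda.is_available() else "cpu"
pos = torch.rand(n_envs, n_agents, 2, device=device)  # single in-place store

def step(actions):
    """actions: (n_envs, n_agents, 2) moves for every agent in every env."""
    pos.add_(actions).clamp_(0.0, 1.0)   # all envs advance in one fused update
    return -pos.norm(dim=-1)             # toy per-agent reward, computed on-device

rewards = step(0.01 * torch.randn(n_envs, n_agents, 2, device=device))
```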

【2】 Identifying optimal cycles in quantum thermal machines with reinforcement-learning Link: https://arxiv.org/abs/2108.13525

Authors: Paolo Andrea Erdman, Frank Noé
Affiliations: Freie Universität Berlin, Department of Mathematics and Computer Science, Germany; Freie Universität Berlin, Department of Physics, Germany; Rice University, Department of Chemistry
Note: 14 pages, 7 figures
Abstract: The optimal control of open quantum systems is a challenging task but has a key role in improving existing quantum information processing technologies. We introduce a general framework based on Reinforcement Learning to discover optimal thermodynamic cycles that maximize the power of out-of-equilibrium quantum heat engines and refrigerators. We apply our method, based on the soft actor-critic algorithm, to three systems: a benchmark two-level-system heat engine, where we find the optimal known cycle; an experimentally realistic refrigerator based on a superconducting qubit that generates coherence, where we find a non-intuitive control sequence that outperforms previous cycles proposed in the literature; and a heat engine based on a quantum harmonic oscillator, where we find a cycle with an elaborate structure that outperforms the optimized Otto cycle. We then evaluate the corresponding efficiency at maximum power.

Medical (3 papers)

【1】 Clustering of Pain Dynamics in Sickle Cell Disease from Sparse, Uneven Samples Link: https://arxiv.org/abs/2108.13963

Authors: Gary K. Nave Jr., Swati Padhee, Amanuel Alambo, Tanvi Banerjee, Nirmish Shah, Daniel M. Abrams
Affiliations: Engineering Science and Applied Mathematics, Northwestern University, Evanston, IL, USA; Computer Science and Engineering, Wright State University, Dayton, OH, USA; Department of Medicine, Duke University, Durham, NC, USA
Note: 7 pages, 5 figures
Abstract: Irregularly sampled time series data are common in a variety of fields. Many typical methods for drawing insight from data fail in this case. Here we attempt to generalize methods for clustering trajectories to irregularly and sparsely sampled data. We first construct synthetic data sets, then propose and assess four methods of data alignment to allow for application of spectral clustering. We also repeat the same process for real data drawn from medical records of patients with sickle cell disease -- patients whose subjective experiences of pain were tracked for several months via a mobile app. We find that different methods for aligning irregularly sampled sparse data sets can lead to different optimal numbers of clusters, even for synthetic data with known properties. For the case of sickle cell disease, we find that three clusters is a reasonable choice, and these appear to correspond to (1) a low-pain group with occasionally acute pain, (2) a group which experiences moderate mean pain that fluctuates often from low to high, and (3) a group that experiences persistent high levels of pain. Our results may help physicians and patients better understand and manage patients' pain levels over time, and we expect that the methods we develop will apply to a wide range of other data sources in medicine and beyond.
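Of the alignment-then-cluster pipeline the paper assesses, the simplest version to sketch is interpolation onto a shared time grid followed by spectral clustering; this particular alignment choice is an illustration, not necessarily one of the paper's four methods.

```python
# Sketch: align irregularly sampled series on a common grid, then cluster.
import numpy as np
from sklearn.cluster import SpectralClustering

def align_and_cluster(series, n_clusters=3, grid_size=50):
    """series: list of (times, values) pairs; times assumed sorted ascending."""
    t0 = min(t[0] for t, _ in series)
    t1 = max(t[-1] for t, _ in series)
    grid = np.linspace(t0, t1, grid_size)
    aligned = np.stack([np.interp(grid, t, v) for t, v in series])
    return SpectralClustering(n_clusters=n_clusters,
                              affinity="nearest_neighbors").fit_predict(aligned)
```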

【2】 Modeling the effect of the vaccination campaign on the Covid-19 pandemic Link: https://arxiv.org/abs/2108.13908

Authors: Mattia Angeli, Georgios Neofotistos, Marios Mattheakis, Efthimios Kaxiras
Affiliations: John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, USA; Department of Physics, Harvard University, Cambridge, Massachusetts, USA
Abstract: Population-wide vaccination is critical for containing the SARS-CoV-2 (Covid-19) pandemic when combined with restrictive and prevention measures. In this study, we introduce SAIVR, a mathematical model able to forecast the Covid-19 epidemic evolution during the vaccination campaign. SAIVR extends the widely used Susceptible-Infectious-Removed (SIR) model by considering the Asymptomatic (A) and Vaccinated (V) compartments. The model contains several parameters and initial conditions that are estimated by employing a semi-supervised machine learning procedure. After training an unsupervised neural network to solve the SAIVR differential equations, a supervised framework then estimates the optimal conditions and parameters that best fit recent infection curves of 27 countries. Guided by these results, we performed an extensive study on the temporal evolution of the pandemic under varying values of roll-out daily rates, vaccine efficacy, and a broad range of societal vaccine hesitancy/denial levels. The concept of herd immunity is questioned by studying future scenarios which involve different vaccination efforts and more infectious Covid-19 variants.
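For orientation, here is a minimal SciPy sketch of integrating an SIR-style model extended with Asymptomatic and Vaccinated compartments. The abstract does not give SAIVR's exact transition structure, so the equations and rates below are an illustrative guess, not the paper's model.

```python
# Sketch: an illustrative S-A-I-V-R compartment model (NOT the paper's equations).
import numpy as np
from scipy.integrate import odeint

def saivr(y, t, beta, beta_a, gamma, nu):
    s, a, i, v, r = y
    new_inf = beta * s * i + beta_a * s * a   # symptomatic + asymptomatic spread
    ds = -new_inf - nu * s                    # nu: daily vaccination roll-out rate
    da = 0.5 * new_inf - gamma * a            # assume half of new cases asymptomatic
    di = 0.5 * new_inf - gamma * i
    dv = nu * s
    dr = gamma * (a + i)
    return [ds, da, di, dv, dr]

t = np.linspace(0, 180, 181)
sol = odeint(saivr, [0.99, 0.0, 0.01, 0.0, 0.0], t, args=(0.3, 0.15, 0.1, 0.005))
```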

【3】 Temporal Deep Learning Architecture for Prediction of COVID-19 Cases in India Link: https://arxiv.org/abs/2108.13823

Authors: Hanuman Verma, Saurav Mandal, Akshansh Gupta
Affiliations: Bareilly College, Bareilly, Uttar Pradesh, India; School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India; CSIR-Central Electronics Engineering Research Institute, Pilani, Rajasthan, India
Note: 13 pages
Abstract: To combat the recent coronavirus disease 2019 (COVID-19), academics and clinicians are in search of new approaches to predict the COVID-19 outbreak's dynamic trends, which may slow down or stop the pandemic. Epidemiological models like Susceptible-Infected-Recovered (SIR) and its variants are helpful for understanding the dynamic trend of a pandemic and may be used in decision making to optimize possible controls of the infectious disease. But these epidemiological models, based on mathematical assumptions, may not predict the real pandemic situation. Recently, new machine learning approaches have been used to understand the dynamic trend of COVID-19 spread. In this paper, we design recurrent and convolutional neural network models -- vanilla LSTM, stacked LSTM, ED-LSTM, Bi-LSTM, CNN, and a hybrid CNN-LSTM model -- to capture the complex trend of the COVID-19 outbreak and to forecast COVID-19 daily confirmed cases 7, 14, and 21 days ahead for India and its four most affected states (Maharashtra, Kerala, Karnataka, and Tamil Nadu). The root mean square error (RMSE) and mean absolute percentage error (MAPE) evaluation metrics are computed on the testing data to demonstrate the relative performance of these models. The results show that the stacked LSTM and hybrid CNN-LSTM models perform best relative to the other models.
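A stacked LSTM forecaster of the kind benchmarked here is straightforward to sketch in Keras; the window length, layer sizes, and toy data are illustrative choices, not the paper's configuration.

```python
# Sketch: stacked LSTM predicting the next 7 daily counts from a 14-day window.
import numpy as np
from tensorflow import keras

window, horizon = 14, 7
model = keras.Sequential([
    keras.layers.LSTM(64, return_sequences=True, input_shape=(window, 1)),
    keras.layers.LSTM(32),                   # second (stacked) LSTM layer
    keras.layers.Dense(horizon),
])
model.compile(optimizer="adam", loss="mse")

series = np.cumsum(np.random.poisson(100, 400)).astype(float)  # toy daily cases
idx = range(len(series) - window - horizon)
X = np.stack([series[i:i + window] for i in idx])[..., None]
y = np.stack([series[i + window:i + window + horizon] for i in idx])
model.fit(X, y, epochs=2, verbose=0)
```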

Recommendation (1 paper)

【1】 Max-Utility Based Arm Selection Strategy For Sequential Query Recommendations Link: https://arxiv.org/abs/2108.13810

Authors: Shameem A. Puthiya Parambath, Christos Anagnostopoulos, Roderick Murray-Smith, Sean MacAvaney, Evangelos Zervas
Affiliations: University of Glasgow, Glasgow, UK; University of West Attica, Greece
Abstract: We consider the query recommendation problem in closed-loop interactive learning settings such as online information gathering and exploratory analytics. The problem can be naturally modelled using the Multi-Armed Bandits (MAB) framework with countably many arms. The standard MAB algorithms for countably many arms begin by selecting a random set of candidate arms and then applying standard MAB algorithms, e.g., UCB, on this candidate set downstream. We show that such a selection strategy often results in higher cumulative regret, and to this end, we propose a selection strategy based on the maximum utility of the arms. We show that in tasks like online information gathering, where sequential query recommendations are employed, the sequences of queries are correlated and the number of potentially optimal queries can be reduced to a manageable size by selecting queries with maximum utility with respect to the currently executing query. Our experimental results, using a recent real online literature discovery service log file, demonstrate that the proposed arm selection strategy substantially improves the cumulative regret compared to state-of-the-art baseline algorithms. Our data model and source code are available at https://anonymous.4open.science/r/0e5ad6b7-ac02-4577-9212-c9d505d3dbdb/.
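The two-stage idea reads as: shortlist candidate arms by utility with respect to the currently executing query, then run a standard bandit (e.g., UCB) over the shortlist. The `utility` function (e.g., query similarity) and all names are illustrative.

```python
# Sketch: max-utility candidate selection followed by UCB over the candidates.
import math

def select_candidates(arms, current_query, utility, k=20):
    return sorted(arms, key=lambda a: utility(a, current_query), reverse=True)[:k]

def ucb_score(pulls, mean, t, c=2.0):
    if pulls == 0:
        return float("inf")                  # force initial exploration
    return mean + math.sqrt(c * math.log(t + 1) / pulls)

def ucb_pick(stats, t):
    """stats: {arm: (pulls, mean_reward)} over the candidate set."""
    return max(stats, key=lambda a: ucb_score(*stats[a], t))
```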

Super-Resolution | Denoising | Deblurring | Dehazing (2 papers)

【1】 Super-Resolution Appearance Transfer for 4D Human Performances Link: https://arxiv.org/abs/2108.13739

Authors: Marco Pesavento, Marco Volino, Adrian Hilton
Affiliations: Centre for Vision, Speech and Signal Processing, University of Surrey, UK
Abstract: A common problem in the 4D reconstruction of people from multi-view video is the quality of the captured dynamic texture appearance, which depends on both the camera resolution and the capture volume. Typically, the requirement to frame cameras to capture the volume of a dynamic performance ($>50m^3$) results in the person occupying only a small proportion ($<$10%) of the field of view. Even with ultra-high-definition 4k video acquisition, this results in sampling the person at less than standard-definition 0.5k video resolution, resulting in low-quality rendering. In this paper we propose a solution to this problem through super-resolution appearance transfer from a static high-resolution appearance capture rig using digital stills cameras ($>8k$) to capture the person in a small volume ($<8m^3$). A pipeline is proposed for super-resolution appearance transfer from high-resolution static capture to dynamic video performance capture to produce super-resolution dynamic textures. This addresses two key problems: colour mapping between different camera systems, and dynamic texture map super-resolution using a learnt model. Comparative evaluation demonstrates a significant qualitative and quantitative improvement in rendering the 4D performance capture with super-resolution dynamic texture appearance. The proposed approach reproduces the high-resolution detail of the static capture whilst maintaining the appearance dynamics of the captured video.

【2】 Attention-based Multi-Reference Learning for Image Super-Resolution Link: https://arxiv.org/abs/2108.13697

Authors: Marco Pesavento, Marco Volino, Adrian Hilton
Affiliations: Centre for Vision, Speech and Signal Processing, University of Surrey, UK
Abstract: This paper proposes a novel Attention-based Multi-Reference Super-resolution network (AMRSR) that, given a low-resolution image, learns to adaptively transfer the most similar texture from multiple reference images to the super-resolution output whilst maintaining spatial coherence. The use of multiple reference images together with attention-based sampling is demonstrated to achieve significantly improved performance over state-of-the-art reference super-resolution approaches on multiple benchmark datasets. Reference super-resolution approaches have recently been proposed to overcome the ill-posed problem of image super-resolution by providing additional information from a high-resolution reference image. Multi-reference super-resolution extends this approach by providing a more diverse pool of image features to overcome the inherent information deficit whilst maintaining memory efficiency. A novel hierarchical attention-based sampling approach is introduced to learn the similarity between low-resolution image features and multiple reference images based on a perceptual loss. Ablation demonstrates the contribution of both multi-reference and hierarchical attention-based sampling to overall performance. Perceptual and quantitative ground-truth evaluation demonstrates significant improvement in performance even when the reference images deviate significantly from the target image. The project website can be found at https://marcopesavento.github.io/AMRSR/

Point Cloud | SLAM | Radar | LiDAR | Depth/RGBD (1 paper)

【1】 InSeGAN: A Generative Approach to Segmenting Identical Instances in Depth Images Link: https://arxiv.org/abs/2108.13865

Authors: Anoop Cherian, Goncalo Dias Pais, Siddarth Jain, Tim K. Marks, Alan Sullivan
Affiliations: Mitsubishi Electric Research Labs (MERL), Cambridge, MA; Instituto Superior Técnico, University of Lisbon, Portugal
Note: Accepted at ICCV 2021
Abstract: In this paper, we present InSeGAN, an unsupervised 3D generative adversarial network (GAN) for segmenting (nearly) identical instances of rigid objects in depth images. Using an analysis-by-synthesis approach, we design a novel GAN architecture to synthesize a multiple-instance depth image with independent control over each instance. InSeGAN takes in a set of code vectors (e.g., random noise vectors), each encoding the 3D pose of an object that is represented by a learned implicit object template. The generator has two distinct modules. The first module, the instance feature generator, uses each encoded pose to transform the implicit template into a feature map representation of each object instance. The second module, the depth image renderer, aggregates all of the single-instance feature maps output by the first module and generates a multiple-instance depth image. A discriminator distinguishes the generated multiple-instance depth images from the distribution of true depth images. To use our model for instance segmentation, we propose an instance pose encoder that learns to take in a generated depth image and reproduce the pose code vectors for all of the object instances. To evaluate our approach, we introduce a new synthetic dataset, "Insta-10", consisting of 100,000 depth images, each with 5 instances of an object from one of 10 classes. Our experiments on Insta-10, as well as on real-world noisy depth images, show that InSeGAN achieves state-of-the-art performance, often outperforming prior methods by large margins.

Federated Learning | Privacy | Encryption (2 papers)

【1】 GRP-FED: Addressing Client Imbalance in Federated Learning via Global-Regularized Personalization Link: https://arxiv.org/abs/2108.13858

Authors: Yen-Hsiu Chou, Shenda Hong, Chenxi Sun, Derun Cai, Moxian Song, Hongyan Li
Affiliations: National Institute of Health Data Science and Institute of Medical Technology, Health Science Center, Peking University, People's Republic of China
Note: (FL-ICML'21) International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with ICML 2021
Abstract: Since real-world data is long-tailed, it is challenging for Federated Learning (FL) to train across decentralized clients in practical applications. We present Global-Regularized Personalization (GRP-FED) to tackle the data imbalance issue by considering a single global model and multiple local models for each client. With adaptive aggregation, the global model treats multiple clients fairly and mitigates the global long-tailed issue. Each local model is learned from the local data and aligns with its distribution for customization. To prevent the local model from simply overfitting, GRP-FED applies an adversarial discriminator to regularize between the learned global and local features. Extensive results show that our GRP-FED improves under both global and local scenarios on the real-world MIT-BIH and synthetic CIFAR-10 datasets, achieving comparable performance and addressing client imbalance.
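The "adaptive aggregation" of the global model can be sketched as a client-weighted average; the inverse-loss weighting below is an illustrative assumption, not GRP-FED's actual rule.

```python
# Sketch: federated aggregation with adaptive per-client weights.
import torch

def aggregate(client_states, client_losses):
    """client_states: list of model state_dicts; client_losses: list of floats."""
    inv = torch.tensor([1.0 / (l + 1e-8) for l in client_losses])
    w = inv / inv.sum()                      # fairness-oriented client weights
    return {key: sum(w[i] * client_states[i][key]
                     for i in range(len(client_states)))
            for key in client_states[0]}
```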

【2】 Unit-Modulus Wireless Federated Learning Via Penalty Alternating Minimization 标题:基于惩罚交替最小化的单位模无线联邦学习 链接:https://arxiv.org/abs/2108.13669

作者:Shuai Wang,Dachuan Li,Rui Wang,Qi Hao,Yik-Chung Wu,Derrick Wing Kwan Ng 机构:∗Department of Electrical and Electronic Engineering, Southern University of Science and Technology, China, ⋄Department of Computer Science and Engineering, Southern University of Science and Technology, China 备注:IEEE Global Communications Conference 2021. arXiv admin note: substantial text overlap with arXiv:2101.12051 摘要:无线联合学习(FL)是一种新兴的机器学习范式,通过无线通信从分布式数据集训练全局参数模型。提出了一种单位模无线FL(UMWFL)框架,该框架通过优化相移同时上传局部模型参数和计算全局模型参数。该框架避免了复杂的基带信号处理,从而降低了通信延迟和实现成本。推导了训练损失界,提出了惩罚交替最小化(PAM)算法来最小化非凸非光滑损失界。在Car-Learning-to-Act(CARLA)平台上的实验结果表明,与基准方案相比,采用PAM算法的UMWFL框架实现了更小的训练损失和测试误差。 摘要:Wireless federated learning (FL) is an emerging machine learning paradigm that trains a global parametric model from distributed datasets via wireless communications. This paper proposes a unit-modulus wireless FL (UMWFL) framework, which simultaneously uploads local model parameters and computes global model parameters via optimized phase shifting. The proposed framework avoids sophisticated baseband signal processing, leading to both low communication delays and implementation costs. A training loss bound is derived and a penalty alternating minimization (PAM) algorithm is proposed to minimize the nonconvex nonsmooth loss bound. Experimental results in the Car Learning to Act (CARLA) platform show that the proposed UMWFL framework with PAM algorithm achieves smaller training losses and testing errors than those of the benchmark scheme.
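To make the "penalty alternating minimization" template concrete, the toy below applies it to a generic unit-modulus constrained least-squares problem: a closed-form x-step, a u-step that projects onto the unit-modulus set, and a slowly growing penalty weight. This only illustrates the PAM pattern, not the paper's UMWFL objective; all symbols are invented for the example.

```python
# Toy PAM loop for  min ||A x - b||^2  s.t. |x_i| = 1,
# via an auxiliary unit-modulus variable u and penalty rho * ||x - u||^2.
import numpy as np

rng = np.random.default_rng(0)
n, m = 16, 8
A = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)

u = np.exp(1j * rng.uniform(0, 2 * np.pi, m))  # unit-modulus initialization
rho = 1.0
for it in range(50):
    # x-step: unconstrained ridge-like subproblem, closed form
    x = np.linalg.solve(A.conj().T @ A + rho * np.eye(m),
                        A.conj().T @ b + rho * u)
    # u-step: projection onto the unit-modulus set
    u = x / np.maximum(np.abs(x), 1e-12)
    rho *= 1.05  # slowly increase the penalty weight

print(np.abs(u))  # all entries have unit modulus
```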

推理|分析|理解|解释(5篇)

【1】 Explainable AI for engineering design: A unified approach of systems engineering and component-based deep learning 标题:用于工程设计的可解释人工智能:系统工程和基于组件的深度学习的统一方法 链接:https://arxiv.org/abs/2108.13836

作者:Philipp Geyer,Manav Mahan Singh,Xia Chen 备注:19 pages 摘要:由机器学习创建的数据驱动模型在设计和工程的所有领域都具有重要意义。它们在帮助决策者创造具有更好性能和可持续性的新型人工制品方面具有很大潜力。然而,这些模型的有限泛化和黑盒特性导致了有限的可解释性和可重用性。这些缺点严重阻碍了工程设计的采用。为了克服这种情况,我们提出了一种基于组件的方法,通过机器学习(ML)创建部分组件模型。这种基于组件的方法将深度学习与系统工程(SE)相结合。通过节能建筑设计的实例,我们首先证明了基于构件的方法的泛化,通过准确预测不同于训练数据的随机结构设计的性能。其次,我们通过局部抽样、敏感性信息和来自低深度决策树的规则以及从工程设计角度评估这些信息来说明可解释性。可解释性的关键在于组件之间接口处的激活是可解释的工程量。通过这种方式,分层组件系统形成了一个深度神经网络(DNN),直接集成工程可解释性信息。组成组件中的大量可能配置允许使用可理解的数据驱动模型检查新的看不见的设计案例。通过相似的概率分布匹配组件的参数范围,可以生成可重用、通用性好且可信的模型。该方法使模型结构适应系统工程和领域知识的工程方法。 摘要:Data-driven models created by machine learning gain in importance in all fields of design and engineering. They have high potential to assists decision-makers in creating novel artefacts with a better performance and sustainability. However, limited generalization and the black-box nature of these models induce limited explainability and reusability. These drawbacks provide significant barriers retarding adoption in engineering design. To overcome this situation, we propose a component-based approach to create partial component models by machine learning (ML). This component-based approach aligns deep learning to systems engineering (SE). By means of the example of energy efficient building design, we first demonstrate generalization of the component-based method by accurately predicting the performance of designs with random structure different from training data. Second, we illustrate explainability by local sampling, sensitivity information and rules derived from low-depth decision trees and by evaluating this information from an engineering design perspective. The key for explainability is that activations at interfaces between the components are interpretable engineering quantities. In this way, the hierarchical component system forms a deep neural network (DNN) that directly integrates information for engineering explainability. The large range of possible configurations in composing components allows the examination of novel unseen design cases with understandable data-driven models. The matching of parameter ranges of components by similar probability distribution produces reusable, well-generalizing, and trustworthy models. The approach adapts the model structure to engineering methods of systems engineering and domain knowledge.

【2】 Phy-Q: A Benchmark for Physical Reasoning 标题:PHY-Q:物理推理的基准 链接:https://arxiv.org/abs/2108.13696

作者:Cheng Xue,Vimukthini Pinto,Chathura Gamage,Ekaterina Nikonova,Peng Zhang,Jochen Renz 机构:School of Computing, The Australian National University, Canberra, Australia 备注:For the associated website, see this https URL 摘要:人类在选择行动来完成任务时非常擅长对物理对象的行为进行推理,而这仍然是人工智能面临的一个主要挑战。为了促进解决这个问题的研究,我们提出了一个新的基准,要求代理对物理场景进行推理并采取相应的行动。受婴儿期获得的物理知识和机器人在现实环境中运行所需的能力的启发,我们确定了15个基本物理场景。对于每个场景,我们创建各种不同的任务模板,并确保同一场景中的所有任务模板都可以通过使用一个特定的物理规则来解决。通过这样的设计,我们评估了两个不同的泛化水平,即局部泛化和广义泛化。我们对人类参与者、具有不同输入类型和结构的学习代理以及具有不同策略的启发式代理进行了广泛的评估。基准测试给出了反映代理物理推理能力的Phy-Q(物理推理商)分数。我们的评估表明:1)所有的智能体都无法达到人的绩效;2)学习智能体,即使具有良好的局部泛化能力,也难以学习底层的物理推理规则,并且无法进行广泛的泛化。我们鼓励开发在物理领域具有广泛泛化能力的智能代理。 摘要:Humans are well-versed in reasoning about the behaviors of physical objects when choosing actions to accomplish tasks, while it remains a major challenge for AI. To facilitate research addressing this problem, we propose a new benchmark that requires an agent to reason about physical scenarios and take an action accordingly. Inspired by the physical knowledge acquired in infancy and the capabilities required for robots to operate in real-world environments, we identify 15 essential physical scenarios. For each scenario, we create a wide variety of distinct task templates, and we ensure all the task templates within the same scenario can be solved by using one specific physical rule. By having such a design, we evaluate two distinct levels of generalization, namely the local generalization and the broad generalization. We conduct an extensive evaluation with human players, learning agents with varying input types and architectures, and heuristic agents with different strategies. The benchmark gives a Phy-Q (physical reasoning quotient) score that reflects the physical reasoning ability of the agents. Our evaluation shows that 1) all agents fail to reach human performance, and 2) learning agents, even with good local generalization ability, struggle to learn the underlying physical reasoning rules and fail to generalize broadly. We encourage the development of intelligent agents with broad generalization abilities in physical domains.

【3】 DoGR: Disaggregated Gaussian Regression for Reproducible Analysis of Heterogeneous Data 标题:DoGR:用于异质数据可重复分析的分解高斯回归 链接:https://arxiv.org/abs/2108.13581

作者:Nazanin Alipourfard,Keith Burghardt,Kristina Lerman 机构:USC Information Sciences Institute, Marina del Rey, CA 摘要:大规模数据的定量分析往往因存在不同的子组而变得复杂,这会降低在保留(held-out)数据上所做推断的准确性。为了应对异构数据分析的挑战,我们引入了DoGR,这是一种通过同时将数据划分为重叠簇(分解)和建模其中的行为(回归)来发现潜在混杂因素的方法。当应用于真实数据时,我们的方法发现有意义的集群及其特征行为,从而深入了解群体差异及其对感兴趣结果的影响。通过考虑潜在的混杂因素,我们的框架有助于对噪声、异构数据进行探索性分析,并可用于学习能更好地泛化到新数据的预测模型。我们提供了代码,使其他人能够在其数据分析工作流中使用DoGR。 摘要:Quantitative analysis of large-scale data is often complicated by the presence of diverse subgroups, which reduce the accuracy of inferences they make on held-out data. To address the challenge of heterogeneous data analysis, we introduce DoGR, a method that discovers latent confounders by simultaneously partitioning the data into overlapping clusters (disaggregation) and modeling the behavior within them (regression). When applied to real-world data, our method discovers meaningful clusters and their characteristic behaviors, thus giving insight into group differences and their impact on the outcome of interest. By accounting for latent confounders, our framework facilitates exploratory analysis of noisy, heterogeneous data and can be used to learn predictive models that better generalize to new data. We provide the code to enable others to use DoGR within their data analytic workflows.
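The "disaggregation + regression" idea can be read as an EM algorithm for a mixture of linear regressions: soft cluster responsibilities (overlapping clusters) alternate with weighted least-squares fits inside each cluster. The sketch below implements that textbook variant as a stand-in for DoGR; variable names and the exact updates are ours, not the paper's code.

```python
# EM for a mixture of K linear regressions (soft, overlapping clusters).
import numpy as np

def em_mixture_regression(X, y, K=2, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])            # add intercept column
    W = rng.standard_normal((K, d + 1))             # regression weights per cluster
    pi, sigma2 = np.full(K, 1.0 / K), np.ones(K)
    for _ in range(iters):
        # E-step: responsibility of each cluster for each point
        resid = y[None, :] - W @ Xb.T               # shape (K, n)
        logp = (np.log(pi)[:, None]
                - 0.5 * np.log(2 * np.pi * sigma2)[:, None]
                - 0.5 * resid ** 2 / sigma2[:, None])
        R = np.exp(logp - logp.max(0))
        R /= R.sum(0)
        # M-step: weighted least squares and noise level per cluster
        for k in range(K):
            sw = np.sqrt(R[k])
            W[k] = np.linalg.lstsq(Xb * sw[:, None], y * sw, rcond=None)[0]
            sigma2[k] = (R[k] * (y - Xb @ W[k]) ** 2).sum() / (R[k].sum() + 1e-12)
        pi = R.sum(1) / n
    return W, pi, R                                 # R gives the soft clusters
```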

【4】 An Analysis Of Entire Space Multi-Task Models For Post-Click Conversion Prediction 标题:用于点击后转化预测的全空间多任务模型分析 链接:https://arxiv.org/abs/2108.13475

作者:Conor O'Brien,Kin Sum Liu,James Neufeld,Rafael Barreto,Jonathan J Hunt 备注:RecSys 21 Late Breaking Results 摘要:工业推荐系统的任务通常是对多个(通常是密切相关的)用户行为的概率进行近似。例如,预测用户是否会点击广告,然后他们是否会购买广告产品。这些任务之间的概念相似性促进了多任务学习的使用:一类旨在从相关任务中带来正归纳迁移的算法。在这里,我们针对一项在线广告任务,实证评估了基于神经网络的多任务学习方法。具体来说,我们考虑在大规模广告平台上使用相关点击事件(CTR)作为辅助任务,来近似移动应用广告的点击后转化事件(安装)的概率(CVR)。我们使用消融方法系统地研究了结合多任务学习和“全空间建模”的最新方法,即在所有已记录的样本上训练CVR,而不是只学习给定点击条件下的转化似然。基于这些结果,我们证明了几种不同的方法导致了从数据丰富的CTR任务到CVR任务的相似正迁移水平,并对多任务设计选择如何解决影响CVR任务的两个主要问题提供了一些见解:数据稀疏性和数据偏差。我们的发现补充了越来越多的证据,表明标准多任务学习是在现实世界大规模应用中建模相关事件的一种明智方法,并表明具体采用哪种多任务方法可以以在现有系统中的实现难易程度为指导。 摘要:Industrial recommender systems are frequently tasked with approximating probabilities for multiple, often closely related, user actions. For example, predicting if a user will click on an advertisement and if they will then purchase the advertised product. The conceptual similarity between these tasks has promoted the use of multi-task learning: a class of algorithms that aim to bring positive inductive transfer from related tasks. Here, we empirically evaluate multi-task learning approaches with neural networks for an online advertising task. Specifically, we consider approximating the probability of post-click conversion events (installs) (CVR) for mobile app advertising on a large-scale advertising platform, using the related click events (CTR) as an auxiliary task. We use an ablation approach to systematically study recent approaches that incorporate both multitask learning and "entire space modeling" which train the CVR on all logged examples rather than learning a conditional likelihood of conversion given clicked. Based on these results we show that several different approaches result in similar levels of positive transfer from the data-abundant CTR task to the CVR task and offer some insight into how the multi-task design choices address the two primary problems affecting the CVR task: data sparsity and data bias. Our findings add to the growing body of evidence suggesting that standard multi-task learning is a sensible approach to modelling related events in real-world large-scale applications and suggest the specific multitask approach can be guided by ease of implementation in an existing system.
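A well-known instance of "entire space modeling" is the ESMM pattern, where the CVR head is supervised only indirectly through pCTCVR = pCTR × pCVR over all impressions. The PyTorch sketch below shows that pattern with assumed toy shapes; it illustrates the modeling idea, not the paper's exact architectures.

```python
# Minimal "entire space" multi-task model in the ESMM style.
import torch
import torch.nn as nn

class EntireSpaceModel(nn.Module):
    def __init__(self, d_in, d_hid=64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU())
        self.ctr_head = nn.Linear(d_hid, 1)
        self.cvr_head = nn.Linear(d_hid, 1)

    def forward(self, x):
        h = self.shared(x)
        p_ctr = torch.sigmoid(self.ctr_head(h)).squeeze(-1)
        p_cvr = torch.sigmoid(self.cvr_head(h)).squeeze(-1)
        return p_ctr, p_ctr * p_cvr                     # pCTR, pCTCVR

model = EntireSpaceModel(d_in=32)
x = torch.randn(128, 32)
click = torch.randint(0, 2, (128,)).float()             # click labels
conv = click * torch.randint(0, 2, (128,)).float()      # conversions imply clicks
p_ctr, p_ctcvr = model(x)
bce = nn.BCELoss()
loss = bce(p_ctr, click) + bce(p_ctcvr, conv)           # both over ALL impressions
loss.backward()
```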

【5】 Disentanglement Analysis with Partial Information Decomposition 标题:基于部分信息分解的解缠分析 链接:https://arxiv.org/abs/2108.13753

作者:Seiya Tokui,Issei Sato 机构:The University of Tokyo 摘要:给定由多个协同改变其外观的变化因素生成的数据,分离表示的目的是通过将数据映射到多个随机变量(分别捕获不同的生成因素)来逆转过程。由于概念直观而抽象,因此需要使用解纠缠度量对其进行量化,以评估和比较不同模型之间解纠缠表示的质量。当前解纠缠度量的目的是测量由每个生成因子调节的每个变量的浓度,例如绝对偏差、方差或熵,可选择由其边际分布的浓度抵消,并在不同变量之间进行比较。当表示由两个以上的变量组成时,这些度量可能无法检测它们之间的相互作用,因为它们只测量成对的相互作用。在这项工作中,我们使用部分信息分解框架来评估两个以上变量之间的信息共享,并构建一个框架,包括一个新的解纠缠度量,用于分析表示如何清晰、冗余和协作地编码生成因素。我们建立了一个实验协议来评估每个度量如何评估越来越纠缠的表示,并通过人工和现实的设置来确认所提出的度量正确地响应纠缠。我们的研究结果有望促进信息论对解纠缠的理解,并推动度量和学习方法的进一步发展。 摘要:Given data generated from multiple factors of variation that cooperatively transform their appearance, disentangled representations aim at reversing the process by mapping data to multiple random variables that individually capture distinct generative factors. As the concept is intuitive but abstract, one needs to quantify it with disentanglement metrics to evaluate and compare the quality of disentangled representations between different models. Current disentanglement metrics are designed to measure the concentration, e.g., absolute deviation, variance, or entropy, of each variable conditioned by each generative factor, optionally offset by the concentration of its marginal distribution, and compare it among different variables. When representations consist of more than two variables, such metrics may fail to detect the interplay between them as they only measure pairwise interactions. In this work, we use the Partial Information Decomposition framework to evaluate information sharing between more than two variables, and build a framework, including a new disentanglement metric, for analyzing how the representations encode the generative factors distinctly, redundantly, and cooperatively. We establish an experimental protocol to assess how each metric evaluates increasingly entangled representations and confirm through artificial and realistic settings that the proposed metric correctly responds to entanglement. Our results are expected to promote information theoretic understanding of disentanglement and lead to further development of metrics as well as learning methods.

分类|识别(2篇)

【1】 Cross-Lingual Text Classification of Transliterated Hindi and Malayalam 标题:音译印地语和马拉雅拉姆语的跨语言文本分类 链接:https://arxiv.org/abs/2108.13620

作者:Jitin Krishnan,Antonios Anastasopoulos,Hemant Purohit,Huzefa Rangwala 机构:George Mason University, Fairfax, VA, USA 备注:12 pages, 5 tables, 7 Figures 摘要:音译在社交媒体上非常常见,但对于各种NLP任务,现代神经模型无法充分处理音译文本。在这项工作中,我们将数据增强方法与师生(Teacher-Student)训练方案相结合,在跨语言迁移环境中解决这一问题,以微调最先进的预训练多语言模型,如mBERT和XLM-R。我们在音译印地语和马拉雅拉姆语上评估了我们的方法,还引入了用于真实场景基准测试的新数据集:一个是音译马拉雅拉姆语的情感分类,另一个是音译印地语和马拉雅拉姆语的危机推文分类(与2013年北印度和2018年喀拉拉邦洪水有关)。在F1分数上,我们的方法使mBERT相对其强基线平均提高5.6%,使XLM-R平均提高4.7%。 摘要:Transliteration is very common on social media, but transliterated text is not adequately handled by modern neural models for various NLP tasks. In this work, we combine data augmentation approaches with a Teacher-Student training scheme to address this issue in a cross-lingual transfer setting for fine-tuning state-of-the-art pre-trained multilingual language models such as mBERT and XLM-R. We evaluate our method on transliterated Hindi and Malayalam, also introducing new datasets for benchmarking on real-world scenarios: one on sentiment classification in transliterated Malayalam, and another on crisis tweet classification in transliterated Hindi and Malayalam (related to the 2013 North India and 2018 Kerala floods). Our method yielded an average improvement of 5.6% on mBERT and 4.7% on XLM-R in F1 scores over their strong baselines.
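A minimal version of the Teacher-Student ingredient might look as follows: the student, fed the transliterated/augmented view, matches both the gold labels and the teacher's softened predictions. The models are reduced here to raw logits and the loss weighting is an assumption; the paper's actual setup fine-tunes mBERT/XLM-R.

```python
# Generic teacher-student distillation loss of the kind used in such schemes.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Cross-entropy on gold labels plus KL to the teacher's softened outputs."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                   # standard temperature scaling
    return alpha * ce + (1 - alpha) * kl

# toy usage with random logits for a 3-class sentiment task
s = torch.randn(8, 3, requires_grad=True)         # student sees augmented text
t = torch.randn(8, 3)                             # teacher sees original script
y = torch.randint(0, 3, (8,))
distillation_loss(s, t, y).backward()
```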

【2】 Recent advances for quantum classifiers 标题:量子分类器的最新进展 链接:https://arxiv.org/abs/2108.13421

作者:Weikang Li,Dong-Ling Deng 机构:Center for Quantum Information, IIIS, Tsinghua University, Beijing, People's Republic of China; Shanghai Qi Zhi Institute, AI Tower, Yunjin Road, Xuhui District, Shanghai, China 备注:invited review, 21 pages, 10 figures 摘要:机器学习在广泛的应用中取得了巨大的成功。它与量子物理的相互作用可能会为基础研究和商业应用带来前所未有的前景,从而引发量子机器学习的新兴研究前沿。沿着这条路线,量子分类器,这是一种旨在解决机器学习中的分类问题的量子设备,最近引起了极大的关注。在这篇综述中,我们对量子分类器的研究给出了一个相对全面的综述,重点介绍了最近的进展。首先,我们将回顾一些量子分类算法,包括量子支持向量机、量子核方法、量子决策树和量子最近邻算法。然后,我们继续介绍变分量子分类器,它本质上是用于分类的变分量子电路。我们将回顾构造变分量子分类器的不同架构,并介绍贫瘠高原(barren plateau)问题,其中量子分类器的训练可能会受到指数消失梯度的阻碍。此外,还将讨论量子分类器在对抗性学习环境中的脆弱性方面以及不同量子分类器的最新实验进展。 摘要:Machine learning has achieved dramatic success in a broad spectrum of applications. Its interplay with quantum physics may lead to unprecedented perspectives for both fundamental research and commercial applications, giving rise to an emergent research frontier of quantum machine learning. Along this line, quantum classifiers, which are quantum devices that aim to solve classification problems in machine learning, have attracted tremendous attention recently. In this review, we give a relatively comprehensive overview for the studies of quantum classifiers, with a focus on recent advances. First, we will review a number of quantum classification algorithms, including quantum support vector machine, quantum kernel methods, quantum decision tree, and quantum nearest neighbor algorithm. Then, we move on to introduce the variational quantum classifiers, which are essentially variational quantum circuits for classifications. We will review different architectures for constructing variational quantum classifiers and introduce the barren plateau problem, where the training of quantum classifiers might be hindered by the exponentially vanishing gradient. In addition, the vulnerability aspect of quantum classifiers in the setting of adversarial learning and the recent experimental progress on different quantum classifiers will also be discussed.

表征(1篇)

【1】 A manifold learning perspective on representation learning: Learning decoder and representations without an encoder 标题:表征学习的流形学习视角:学习解码器和无编码器的表征 链接:https://arxiv.org/abs/2108.13910

作者:Viktoria Schuster,Anders Krogh 机构:Center for Health Data Science, University of Copenhagen, Denmark, Dept of Computer Science and 摘要:自动编码器通常用于表征学习。它们由一个编码器和一个解码器组成,这提供了一种简单的方法,可以将输入空间中的$n$维数据映射到较低的$m$维表示空间,然后再映射回来。解码器本身在输入空间中定义了一个$m$维流形。受流形学习的启发,我们通过梯度下降学习训练样本和解码器权重的表示,证明了解码器可以自行训练。然后,平方和损失对应于优化流形,使其与训练样本之间的欧几里德距离最小,对于其他损失函数也是如此。我们推导了指定编码器和解码器所需的样本数表达式,并表明与编码器相比,解码器通常需要更少的训练样本才能很好地指定。我们从这个角度讨论了自动编码器的训练,并与先前使用噪声训练示例和其他正则化类型的领域工作相关。在自然图像数据集MNIST和CIFAR10上,我们证明解码器更适合学习低维表示,尤其是在小数据集上训练时。使用模拟的基因调控数据,我们进一步表明,解码器单独导致更好的泛化和有意义的表示。我们单独训练解码器的方法有助于表示学习,即使是在小数据集上,也可以改进自动编码器的训练。我们希望这些简单的分析也有助于提高对表征学习的概念理解。 摘要:Autoencoders are commonly used in representation learning. They consist of an encoder and a decoder, which provide a straightforward way to map $n$-dimensional data in input space to a lower $m$-dimensional representation space and back. The decoder itself defines an $m$-dimensional manifold in input space. Inspired by manifold learning, we show that the decoder can be trained on its own by learning the representations of the training samples along with the decoder weights using gradient descent. A sum-of-squares loss then corresponds to optimizing the manifold to have the smallest Euclidean distance to the training samples, and similarly for other loss functions. We derive expressions for the number of samples needed to specify the encoder and decoder and show that the decoder generally requires much less training samples to be well-specified compared to the encoder. We discuss training of autoencoders in this perspective and relate to previous work in the field that use noisy training examples and other types of regularization. On the natural image data sets MNIST and CIFAR10, we demonstrate that the decoder is much better suited to learn a low-dimensional representation, especially when trained on small data sets. Using simulated gene regulatory data, we further show that the decoder alone leads to better generalization and meaningful representations. Our approach of training the decoder alone facilitates representation learning even on small data sets and can lead to improved training of autoencoders. We hope that the simple analyses presented will also contribute to an improved conceptual understanding of representation learning.
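The training scheme is easy to state in code: keep one free latent code per training sample and optimize the codes and the decoder weights jointly against the reconstruction loss. A minimal PyTorch sketch with illustrative dimensions:

```python
# Training a decoder without an encoder: per-sample latent codes are free
# parameters, optimized jointly with the decoder by gradient descent.
import torch
import torch.nn as nn

n, d_in, m = 256, 784, 16                       # samples, data dim, latent dim
X = torch.randn(n, d_in)                        # stand-in for training images
codes = nn.Parameter(0.01 * torch.randn(n, m))  # one learnable code per sample
decoder = nn.Sequential(nn.Linear(m, 128), nn.ReLU(), nn.Linear(128, d_in))

opt = torch.optim.Adam([codes, *decoder.parameters()], lr=1e-3)
for step in range(1000):
    opt.zero_grad()
    loss = ((decoder(codes) - X) ** 2).mean()   # sum-of-squares reconstruction
    loss.backward()                             # gradients reach codes AND weights
    opt.step()
```

After training, `decoder` defines the low-dimensional manifold in input space and `codes` are the learned representations, exactly as the abstract describes.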

3D|3D重建等相关(1篇)

【1】 Learning Practically Feasible Policies for Online 3D Bin Packing 标题:学习实用可行的在线3D装箱策略 链接:https://arxiv.org/abs/2108.13680

作者:Hang Zhao,Chenyang Zhu,Xin Xu,Hui Huang,Kai Xu 机构:National University of Defense Technology, Changsha, China; Shenzhen University, Shenzhen, China 摘要:我们处理在线三维装箱问题,这是经典装箱问题的一个具有挑战性但实际有用的变体。在这个问题中,物品被交付给智能体,而不告知完整的序列信息。智能体必须在不改变到货顺序的情况下直接将这些物品稳定地装入目标箱中,并且不允许进一步调整。在线3D-BPP可以自然地表述为马尔可夫决策过程(MDP)。我们采用深度强化学习,特别是同策略(on-policy)的行动者-评论家框架,来求解这个动作空间受限的MDP。为了学习切实可行的装箱策略,我们提出了三个关键设计。首先,我们提出了一种基于新型堆叠树的装箱稳定性在线分析方法。它在将计算复杂度从$O(N^2)$降低到$O(N \log N)$的同时,获得了较高的分析精度,特别适合于RL训练。其次,我们针对放置的不同维度提出了一种解耦的放置策略学习方法,该方法可以实现高分辨率的空间离散化,从而提高装箱精度。第三,我们引入了一个奖励函数,该函数指示机器人按从远到近的顺序放置物品,从而简化了机械臂运动规划中的碰撞避免。此外,我们还对几个关键的实现问题进行了全面的讨论。广泛的评估表明,我们学习的策略显著优于最先进的方法,并且实际可用于实际应用。 摘要:We tackle the Online 3D Bin Packing Problem, a challenging yet practically useful variant of the classical Bin Packing Problem. In this problem, the items are delivered to the agent without informing the full sequence information. Agent must directly pack these items into the target bin stably without changing their arrival order, and no further adjustment is permitted. Online 3D-BPP can be naturally formulated as Markov Decision Process (MDP). We adopt deep reinforcement learning, in particular, the on-policy actor-critic framework, to solve this MDP with constrained action space. To learn a practically feasible packing policy, we propose three critical designs. First, we propose an online analysis of packing stability based on a novel stacking tree. It attains a high analysis accuracy while reducing the computational complexity from $O(N^2)$ to $O(N \log N)$, making it especially suited for RL training. Second, we propose a decoupled packing policy learning for different dimensions of placement which enables high-resolution spatial discretization and hence high packing precision. Third, we introduce a reward function that dictates the robot to place items in a far-to-near order and therefore simplifies the collision avoidance in movement planning of the robotic arm. Furthermore, we provide a comprehensive discussion on several key implemental issues. The extensive evaluation demonstrates that our learned policy outperforms the state-of-the-art methods significantly and is practically usable for real-world applications.

优化|敛散性(2篇)

【1】 Learning Optimal Prescriptive Trees from Observational Data 标题:从观测数据中学习最优处方树 链接:https://arxiv.org/abs/2108.13628

作者:Nathanael Jo,Sina Aghaei,Andrés Gómez,Phebe Vayanos 机构:University of Southern California 摘要:我们考虑从观测数据中学习中等深度的最优处方树(即以二叉树形式表示的个性化治疗分配策略)的问题。这一问题出现在许多重要的社会领域,如公共卫生和个性化医疗,在这些领域中,人们希望基于部署中被动收集(而非来自随机试验)的数据,获得可解释且数据驱动的干预措施。我们提出了一种利用混合整数优化(MIO)技术学习最优处方树的方法。我们证明,在温和的条件下,当历史数据样本数趋于无穷大时,我们的方法在如下意义上是渐近精确的:它收敛到最优的样本外治疗分配策略。这使我们有别于关于这一主题的现有文献,后者要么要求数据经过随机化,要么对树施加严格的假设。基于对合成数据和真实数据的大量计算实验,我们证明了即使在有限样本中,我们的渐近保证也能转化为显著的样本外性能改进。 摘要:We consider the problem of learning an optimal prescriptive tree (i.e., a personalized treatment assignment policy in the form of a binary tree) of moderate depth, from observational data. This problem arises in numerous socially important domains such as public health and personalized medicine, where interpretable and data-driven interventions are sought based on data gathered in deployment, through passive collection of data, rather than from randomized trials. We propose a method for learning optimal prescriptive trees using mixed-integer optimization (MIO) technology. We show that under mild conditions our method is asymptotically exact in the sense that it converges to an optimal out-of-sample treatment assignment policy as the number of historical data samples tends to infinity. This sets us apart from existing literature on the topic which either requires data to be randomized or imposes stringent assumptions on the trees. Based on extensive computational experiments on both synthetic and real data, we demonstrate that our asymptotic guarantees translate to significant out-of-sample performance improvements even in finite samples.

【2】 On the Stability Properties and the Optimization Landscape of Training Problems with Squared Loss for Neural Networks and General Nonlinear Conic Approximation Schemes 标题:神经网络与一般非线性圆锥逼近方案平方损失训练问题的稳定性与优化景观 链接:https://arxiv.org/abs/2011.03293

作者:Constantin Christof 机构:Technische Universität München, Chair of Optimal Control, Center for Mathematical Sciences, Boltzmannstraße, Garching, Germany 备注:Some comments and explanations have been added. The result on regularized training problems has been strengthened 摘要:我们研究了神经网络和一般非线性圆锥逼近方案的平方损失训练问题的优化景观与稳定性。我们证明,如果所考虑的非线性圆锥逼近方案(在适当定义的意义上)比经典线性逼近方法更具表现力,并且如果存在不可实现的标签向量,那么,具有平方损失的训练问题必然是不稳定的,因为其解集不连续地依赖于训练数据中的标签向量。我们进一步证明,导致这些不稳定性性质的相同效应也是鞍点和伪局部极小值出现的原因,这些鞍点和伪局部极小值可能任意远离全局解,并且通常,训练问题的不稳定性和伪局部极小值的存在都不能通过向目标函数添加正则化项来克服,该正则化项惩罚逼近方案中参数的大小。无论可实现性假设是否满足,后一种结果都成立。我们证明,我们的分析特别适用于自由节点插值方案以及具有可变宽度的深度和浅层神经网络的训练问题,这些神经网络涉及各种激活函数的任意混合(例如,二进制、sigmoid、tanh、arctan、软符号、ISRU、软剪辑、SQNL、ReLU、泄漏ReLU、softplus、bent identity、SILU、ISRLU和ELU)。总之,本文的研究结果表明,神经网络和一般非线性圆锥逼近工具的改进逼近特性,与为了训练它们而必须求解的优化问题的不良特性之间,存在直接且可量化的联系。 摘要:We study the optimization landscape and the stability properties of training problems with squared loss for neural networks and general nonlinear conic approximation schemes. It is demonstrated that, if a nonlinear conic approximation scheme is considered that is (in an appropriately defined sense) more expressive than a classical linear approximation approach and if there exist unrealizable label vectors, then a training problem with squared loss is necessarily unstable in the sense that its solution set depends discontinuously on the label vector in the training data. We further prove that the same effects that are responsible for these instability properties are also the reason for the emergence of saddle points and spurious local minima, which may be arbitrarily far away from global solutions, and that neither the instability of the training problem nor the existence of spurious local minima can, in general, be overcome by adding a regularization term to the objective function that penalizes the size of the parameters in the approximation scheme. The latter results are shown to be true regardless of whether the assumption of realizability is satisfied or not. We demonstrate that our analysis in particular applies to training problems for free-knot interpolation schemes and deep and shallow neural networks with variable widths that involve an arbitrary mixture of various activation functions (e.g., binary, sigmoid, tanh, arctan, soft-sign, ISRU, soft-clip, SQNL, ReLU, leaky ReLU, soft-plus, bent identity, SILU, ISRLU, and ELU). In summary, the findings of this paper illustrate that the improved approximation properties of neural networks and general nonlinear conic approximation instruments are linked in a direct and quantifiable way to undesirable properties of the optimization problems that have to be solved in order to train them.

预测|估计(5篇)

【1】 Bubblewrap: Online tiling and real-time flow prediction on neural manifolds 标题:Bubblewrap:神经流形上的在线平铺与实时流预测 链接:https://arxiv.org/abs/2108.13941

作者:Anne Draelos,Pranjal Gupta,Na Young Jun,Chaichontat Sriworarat,John Pearson 机构:Biostatistics & Bioinformatics, Psychology & Neuroscience, Neurobiology, Electrical & Computer Engineering, Duke University 摘要:虽然实验神经科学中大多数经典的功能研究都集中在单个神经元的编码特性上,但记录技术的最新发展使得人们越来越重视神经种群的动力学。这就产生了各种各样的模型,用于分析与实验变量相关的群体活动,但直接测试许多神经群体假设需要基于当前神经状态干预系统,因此需要能够在线推断神经状态的模型。现有的方法主要基于动力系统,需要强大的参数假设,这些假设在噪声占主导地位的情况下很容易被违反,并且在现代实验中无法很好地扩展到数千个数据通道。为了解决这个问题,我们提出了一种方法,将快速、稳定的降维与生成的神经流形的软平铺相结合,使动力学近似为平铺之间的概率流。该方法可以使用在线期望最大化进行有效拟合,可扩展到数万个瓦片,并且在动力学以噪声为主或具有多模态转移概率的情况下优于现有方法。由此产生的模型可以以千赫兹的数据速率进行训练,在几分钟内产生精确的神经动力学近似值,并在亚毫秒的时间尺度上产生预测。它在未来的许多时间步骤中保持预测性能,并且速度足够快,可以作为闭环因果实验的一个组成部分。 摘要:While most classic studies of function in experimental neuroscience have focused on the coding properties of individual neurons, recent developments in recording technologies have resulted in an increasing emphasis on the dynamics of neural populations. This has given rise to a wide variety of models for analyzing population activity in relation to experimental variables, but direct testing of many neural population hypotheses requires intervening in the system based on current neural state, necessitating models capable of inferring neural state online. Existing approaches, primarily based on dynamical systems, require strong parametric assumptions that are easily violated in the noise-dominated regime and do not scale well to the thousands of data channels in modern experiments. To address this problem, we propose a method that combines fast, stable dimensionality reduction with a soft tiling of the resulting neural manifold, allowing dynamics to be approximated as a probability flow between tiles. This method can be fit efficiently using online expectation maximization, scales to tens of thousands of tiles, and outperforms existing methods when dynamics are noise-dominated or feature multi-modal transition probabilities. The resulting model can be trained at kiloHertz data rates, produces accurate approximations of neural dynamics within minutes, and generates predictions on submillisecond time scales. It retains predictive performance throughout many time steps into the future and is fast enough to serve as a component of closed-loop causal experiments.
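To convey the "soft tiling + probability flow" picture, the sketch below soft-assigns each incoming observation to fixed tile centers and maintains a row-stochastic transition matrix with exponential forgetting; the predicted next-step occupancy is then a single matrix-vector product. The real Bubblewrap also fits the tiles themselves by online EM, which this toy omits.

```python
# Toy "tiles + probability flow": streaming transition estimate over tiles.
import numpy as np

def soft_assign(x, centers, beta=2.0):
    """Soft membership of observation x over tile centers."""
    d2 = ((centers - x) ** 2).sum(axis=1)
    w = np.exp(-beta * (d2 - d2.min()))
    return w / w.sum()

K, dim = 32, 3
rng = np.random.default_rng(1)
centers = rng.standard_normal((K, dim))        # assumed given (not fit online here)
T = np.full((K, K), 1.0 / K)                   # tile-to-tile transition matrix
prev, lam = None, 0.99                         # forgetting factor for streaming data
for t in range(1000):
    x = rng.standard_normal(dim)               # stand-in for a neural state sample
    a = soft_assign(x, centers)
    if prev is not None:
        T = lam * T + (1 - lam) * np.outer(prev, a)  # streaming co-occurrence
        T /= T.sum(axis=1, keepdims=True)            # keep rows stochastic
    prev = a

pred = prev @ T                                # predicted tile occupancy at t+1
```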

【2】 Estimation of Air Pollution with Remote Sensing Data: Revealing Greenhouse Gas Emissions from Space 标题:用遥感数据估算空气污染:揭示来自太空的温室气体排放 链接:https://arxiv.org/abs/2108.13902

作者:Linus Scheibenreif,Michael Mommert,Damian Borth 机构:Institute of Computer Science, University of St. Gallen 备注:for associated codebase, see this https URL 摘要:空气污染是气候变化的主要驱动力。为运输和发电而燃烧化石燃料所产生的人为排放物排放了大量有问题的空气污染物,包括温室气体(GHG)。尽管限制温室气体排放对缓解气候变化十分重要,但很难获得有关温室气体和其他空气污染物时空分布的详细信息。现有的地表空气污染模型依赖于广泛的土地利用数据集,这些数据集通常是局部受限的,并且是暂时静态的。这项工作为环境空气污染预测提出了一种深度学习方法,该方法仅依赖于全球可用且经常更新的遥感数据。将光学卫星图像与基于卫星的大气柱密度空气污染测量相结合,可以在任意位置将空气污染估计值(本例中为NO$_2$)缩放到高空间分辨率(高达$\sim$10 m),并在这些估计值中添加时间成分。当根据地面站的空气质量测量值(平均绝对误差$<6~\mu g/m^3$)进行评估时,建议的模型具有较高的精度。我们的结果有助于识别和持续监测空气污染和温室气体的主要来源。 摘要:Air pollution is a major driver of climate change. Anthropogenic emissions from the burning of fossil fuels for transportation and power generation emit large amounts of problematic air pollutants, including Greenhouse Gases (GHGs). Despite the importance of limiting GHG emissions to mitigate climate change, detailed information about the spatial and temporal distribution of GHG and other air pollutants is difficult to obtain. Existing models for surface-level air pollution rely on extensive land-use datasets which are often locally restricted and temporally static. This work proposes a deep learning approach for the prediction of ambient air pollution that only relies on remote sensing data that is globally available and frequently updated. Combining optical satellite imagery with satellite-based atmospheric column density air pollution measurements enables the scaling of air pollution estimates (in this case NO$_2$) to high spatial resolution (up to $\sim$10 m) at arbitrary locations and adds a temporal component to these estimates. The proposed model performs with high accuracy when evaluated against air quality measurements from ground stations (mean absolute error $<6~\mu g/m^3$). Our results enable the identification and temporal monitoring of major sources of air pollution and GHGs.

【3】 Time Series Prediction using Deep Learning Methods in Healthcare 标题:基于深度学习方法的医疗时间序列预测 链接:https://arxiv.org/abs/2108.13461

作者:Mohammad Amin Morid,Olivia R. Liu Sheng,Joseph Dunbar 机构:Department of Information Systems and Analytics, Leavey School of Business, Santa Clara University, Santa Clara, CA, USA, Department of Operations and Information Systems, David Eccles School of Business, University of Utah, Salt Lake City 摘要:传统的机器学习方法在处理医疗预测分析任务时面临两个主要挑战。首先,医疗数据的高维特性需要劳动密集型和耗时的过程来为每个新任务选择一组合适的功能。其次,这些方法依赖于特征工程来捕获患者数据的连续性,这可能无法充分利用医疗事件的时间模式及其依赖性。最近的深度学习方法通过解决医学数据的高维和时间挑战,在各种医疗保健预测任务中显示了良好的性能。这些方法可以从高维原始(或最小处理)医疗数据中学习关键因素(例如,医疗概念或患者)的有用表示及其相互作用。在本文中,我们系统地回顾了一些研究,这些研究侧重于使用深度学习作为预测模型,从方法学角度利用患者时间序列数据进行医疗预测任务。为了确定相关研究,我们在MEDLINE、IEEE、Scopus和ACM数字图书馆中搜索了截至2021年2月7日发表的研究。我们发现,研究人员在十个研究流中对深度时间序列预测文献做出了贡献:深度学习模型、缺失值处理、不规则处理、患者表征,静态数据包含、注意机制、解释、合并医学本体、学习策略和可伸缩性。本研究总结了来自这些文献流的研究见解,确定了几个关键的研究差距,并提出了在患者时间序列数据中进行深入学习的未来研究机会。 摘要:Traditional machine learning methods face two main challenges in dealing with healthcare predictive analytics tasks. First, the high-dimensional nature of healthcare data needs labor-intensive and time-consuming processes to select an appropriate set of features for each new task. Secondly, these methods depend on feature engineering to capture the sequential nature of patient data, which may not adequately leverage the temporal patterns of the medical events and their dependencies. Recent deep learning methods have shown promising performance for various healthcare prediction tasks by addressing the high-dimensional and temporal challenges of medical data. These methods can learn useful representations of key factors (e.g., medical concepts or patients) and their interactions from high-dimensional raw (or minimally-processed) healthcare data. In this paper we systemically reviewed studies focused on using deep learning as the prediction model to leverage patient time series data for a healthcare prediction task from methodological perspective. To identify relevant studies, MEDLINE, IEEE, Scopus and ACM digital library were searched for studies published up to February 7th 2021. We found that researchers have contributed to deep time series prediction literature in ten research streams: deep learning models, missing value handling, irregularity handling, patient representation, static data inclusion, attention mechanisms, interpretation, incorporating medical ontologies, learning strategies, and scalability. This study summarizes research insights from these literature streams, identifies several critical research gaps, and suggests future research opportunities for deep learning in patient time series data.

【4】 Decision Tree-Based Predictive Models for Academic Achievement Using College Students' Support Networks 标题:基于决策树的大学生支持网络学业成绩预测模型 链接:https://arxiv.org/abs/2108.13947

作者:Anthony Frazier,Joethi Silva,Rachel Meilak,Indranil Sahoo,David Chan,Michael Broda 机构:Weber State University, George Mason University, Loyola Marymount University, Virginia Commonwealth University 摘要:在本研究中,我们考察了2019冠状病毒疾病(COVID-19)大流行早期阶段收集的一组原始数据,这些数据来自美国大西洋中部地区一所大型公立大学的484名在校学生。这些数据称为Ties数据,包括学生的人口统计和支持网络信息。支持网络数据包含刻画支持类型的信息(即情感支持或教育支持;常规支持或强化支持)。利用该数据集,使用决策树算法卡方自动交互检测(CHAID)和基于条件推理树的随机森林算法cforest,创建了预测学生学业成绩的模型,学业成绩通过学生自我报告的GPA进行量化。我们比较了两种方法的准确性,以及每种算法给出的重要变量集合的差异。对于不同的学生群体,两种算法各自发现了不同的重要变量,且存在部分重叠。对于白人学生,不同类型的教育支持对学业成绩的预测很重要,而对于非白人学生,不同类型的情感支持对学业成绩的预测很重要。不同类型的常规支持对预测顺性别女性的学业成绩很重要,而不同类型的强化支持对预测顺性别男性的学业成绩很重要。 摘要:In this study, we examine a set of primary data collected from 484 students enrolled in a large public university in the Mid-Atlantic United States region during the early stages of the COVID-19 pandemic. The data, called Ties data, included students' demographic and support network information. The support network data comprised of information that highlighted the type of support, (i.e. emotional or educational; routine or intense). Using this data set, models for predicting students' academic achievement, quantified by their self-reported GPA, were created using Chi-Square Automatic Interaction Detection (CHAID), a decision tree algorithm, and cforest, a random forest algorithm that uses conditional inference trees. We compare the methods' accuracy and variation in the set of important variables suggested by each algorithm. Each algorithm found different variables important for different student demographics with some overlap. For White students, different types of educational support were important in predicting academic achievement, while for non-White students, different types of emotional support were important in predicting academic achievement. The presence of differing types of routine support were important in predicting academic achievement for cisgender women, while differing types of intense support were important in predicting academic achievement for cisgender men.
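CHAID and cforest are R-ecosystem implementations; a rough Python analogue of the workflow (fit a shallow tree and a forest, then rank variable importances) is sketched below with sklearn stand-ins and invented synthetic features, purely to show the shape of the analysis.

```python
# sklearn stand-ins for the CHAID / cforest comparison (not the same algorithms).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.standard_normal((484, 6))                   # stand-in for support features
y = (X[:, 0] + 0.5 * X[:, 3] + rng.standard_normal(484) > 0).astype(int)

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)      # interpretable tree
forest = RandomForestClassifier(n_estimators=500).fit(X, y)

# permutation importance as a model-agnostic variable ranking
imp = permutation_importance(forest, X, y, n_repeats=20, random_state=0)
for i in np.argsort(imp.importances_mean)[::-1]:
    print(f"feature {i}: {imp.importances_mean[i]:.3f}")
```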

【5】 On the interpretation of black-box default prediction models: an Italian Small and Medium Enterprises case 标题:关于黑箱违约预测模型的解读--以意大利中小企业为例 链接:https://arxiv.org/abs/2108.13914

作者:Lisa Crosato,Caterina Liberati,Marco Repetto 摘要:近年来,由于机器学习算法能够解决复杂的学习任务,学术研究和金融行业对其给予了极大的关注。然而,在企业违约预测领域,缺乏可解释性阻碍了黑箱类型模型的广泛采用。为了克服这个缺点并保持黑箱的高性能,本文采用了一种模型无关(model-agnostic)的方法。累积局部效应(ALE)和Shapley值用于刻画预测因子对违约可能性的影响,并根据其对模型结果的贡献对其进行排序。预测由两种机器学习算法(极端梯度提升和前馈神经网络)完成,并与三种标准判别模型进行比较。结果表明,在我们对意大利中小企业制造业的分析中,极端梯度提升算法在不放弃丰富解释框架的前提下取得了总体最高的分类能力。 摘要:Academic research and the financial industry have recently paid great attention to Machine Learning algorithms due to their power to solve complex learning tasks. In the field of firms' default prediction, however, the lack of interpretability has prevented the extensive adoption of the black-box type of models. To overcome this drawback and maintain the high performances of black-boxes, this paper relies on a model-agnostic approach. Accumulated Local Effects and Shapley values are used to shape the predictors' impact on the likelihood of default and rank them according to their contribution to the model outcome. Prediction is achieved by two Machine Learning algorithms (eXtreme Gradient Boosting and FeedForward Neural Network) compared with three standard discriminant models. Results show that our analysis of the Italian Small and Medium Enterprises manufacturing industry benefits from the overall highest classification power by the eXtreme Gradient Boosting algorithm without giving up a rich interpretation framework.
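The interpretation pipeline can be reproduced with standard libraries: fit an XGBoost classifier and rank predictors by mean absolute SHAP value (Accumulated Local Effects would come from a separate package such as PyALE or alibi). The data below is synthetic and merely stands in for balance-sheet predictors.

```python
# XGBoost default model + SHAP-based predictor ranking on synthetic data.
import numpy as np
import xgboost as xgb
import shap

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 8))              # stand-in for financial ratios
y = (X[:, 0] - X[:, 2] + rng.standard_normal(1000) > 0).astype(int)  # default flag

model = xgb.XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
model.fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)          # per-sample, per-feature contributions
ranking = np.abs(shap_values).mean(axis=0)      # global importance by mean |SHAP|
print(np.argsort(ranking)[::-1])                # predictors ranked by contribution
```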

其他神经网络|深度学习|模型|建模(13篇)

【1】 Designing Rotationally Invariant Neural Networks from PDEs and Variational Methods 标题:用偏微分方程和变分法设计旋转不变神经网络 链接:https://arxiv.org/abs/2108.13993

作者:Tobias Alt,Karl Schrader,Joachim Weickert,Pascal Peter,Matthias Augustin 机构: Saarland University 摘要:偏微分方程(PDE)模型及其相关的变分能量公式通常具有旋转不变性。这确保了输入的旋转导致输出的相应旋转,这在图像分析等应用中是理想的。卷积神经网络(CNN)不具备这种特性,现有的补救措施往往很复杂。本文的目的是研究扩散模型和变分模型如何实现旋转不变性,并将这些思想转化为神经网络。作为一个核心创新点,我们提出了激活函数,该函数通过组合来自多个定向滤波器的信息来耦合网络通道。这保证了网络基本构建块内的旋转不变性,同时仍然允许方向滤波。由此产生的神经结构本质上是旋转不变的。只需几个小的过滤器,它们就可以实现与现有技术相同的不变性,而现有技术需要对方向进行细粒度采样。我们的发现有助于将扩散和变分模型转化为数学上基础良好的网络体系结构,并为基于模型的CNN设计提供新的概念。 摘要:Partial differential equation (PDE) models and their associated variational energy formulations are often rotationally invariant by design. This ensures that a rotation of the input results in a corresponding rotation of the output, which is desirable in applications such as image analysis. Convolutional neural networks (CNNs) do not share this property, and existing remedies are often complex. The goal of our paper is to investigate how diffusion and variational models achieve rotation invariance and transfer these ideas to neural networks. As a core novelty we propose activation functions which couple network channels by combining information from several oriented filters. This guarantees rotation invariance within the basic building blocks of the networks while still allowing for directional filtering. The resulting neural architectures are inherently rotationally invariant. With only a few small filters, they can achieve the same invariance as existing techniques which require a fine-grained sampling of orientations. Our findings help to translate diffusion and variational models into mathematically well-founded network architectures, and provide novel concepts for model-based CNN design.
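One way to read the proposed activation is: convolve with several oriented copies of the same filter, then couple the orientation channels with an invariant reduction such as an l2 norm. The sketch below uses exact 90-degree rotations so the invariance can be verified numerically; the paper's construction samples orientations far more finely and needs only a few small filters.

```python
# Orientation-coupled activation: invariant to 90-degree rotations by design.
import torch
import torch.nn.functional as F

def rot_invariant_features(x, weight):
    """x: (B,1,H,W); weight: (C,1,k,k). Returns (B,C,H,W) features that are
    equivariant under 90-degree rotations of the input."""
    responses = []
    for r in range(4):                                   # 4 oriented filter copies
        w = torch.rot90(weight, r, dims=(2, 3))
        responses.append(F.conv2d(x, w, padding="same"))
    R = torch.stack(responses, dim=0)                    # (4,B,C,H,W)
    return R.pow(2).sum(dim=0).sqrt()                    # couple orientations by norm

x = torch.randn(2, 1, 32, 32)
w = torch.randn(8, 1, 3, 3)
f1 = rot_invariant_features(x, w)
f2 = rot_invariant_features(torch.rot90(x, 1, dims=(2, 3)), w)
# rotating the input only rotates the output feature map:
print(torch.allclose(torch.rot90(f1, 1, dims=(2, 3)), f2, atol=1e-5))  # True
```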

【2】 Using a one dimensional parabolic model of the full-batch loss to estimate learning rates during training 标题:用全批次损失的一维抛物线模型估计训练过程中的学习率 链接:https://arxiv.org/abs/2108.13880

作者:Maximus Mutschler,Andreas Zell 机构:University of Tübingen, Sand, Tübingen, Germany 摘要:深度学习的一个基本挑战是为随机梯度下降寻找最优步长。在传统优化中,线搜索是确定步长的常用方法。深度学习中的一个问题是,在全批次损失上寻找合适的步长代价高得不可行。因此,为无固有噪声的损失而设计的经典线搜索方法通常不适用。最近的实证结果表明,全批次损失沿带噪更新步方向局部呈抛物线形。此外,最优更新步长的变化趋势是缓慢的。通过利用这些发现,这项工作引入了一种线搜索方法,该方法用在若干小批次上估计的抛物线来近似全批次损失。训练期间的学习率即由这些抛物线得出。在所进行的实验中,就不同模型、数据集和批量大小下的验证和测试准确率而言,我们的方法大多优于采用分段恒定学习率计划调优的SGD以及其他用于深度学习的线搜索方法。 摘要:A fundamental challenge in Deep Learning is to find optimal step sizes for stochastic gradient descent. In traditional optimization, line searches are a commonly used method to determine step sizes. One problem in Deep Learning is that finding appropriate step sizes on the full-batch loss is infeasibly expensive. Therefore, classical line search approaches, designed for losses without inherent noise, are usually not applicable. Recent empirical findings suggest that the full-batch loss behaves locally parabolically in the direction of noisy update step directions. Furthermore, the trend of the optimal update step size is changing slowly. By exploiting these findings, this work introduces a line-search method that approximates the full-batch loss with a parabola estimated over several mini-batches. Learning rates are derived from such parabolas during training. In the experiments conducted, our approach mostly outperforms SGD tuned with a piece-wise constant learning rate schedule and other line search approaches for Deep Learning across models, datasets, and batch sizes on validation and test accuracy.
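The core estimate is elementary: measure the loss at three points along the update direction, fit l(t) ≈ a t² + b t + c, and step to the vertex -b/(2a). A numpy sketch with a placeholder `loss_at(t)` callable (our name, standing for the mini-batch-averaged loss at parameters w + t·d) follows.

```python
# Parabolic step-size estimate from three loss evaluations along a direction.
import numpy as np

def parabolic_step(loss_at, t1=0.1):
    """Fit l(t) = a t^2 + b t + c through t = 0, t1, 2*t1; return the vertex."""
    l0, l1, l2 = loss_at(0.0), loss_at(t1), loss_at(2 * t1)
    a = (l2 - 2 * l1 + l0) / (2 * t1 ** 2)   # second finite difference
    b = (l1 - l0) / t1 - a * t1              # slope term
    if a <= 0:                               # not locally convex: fall back
        return t1
    return -b / (2 * a)                      # argmin of the fitted parabola

# toy check: a quadratic loss along the direction with minimum at t = 0.7
step = parabolic_step(lambda t: (t - 0.7) ** 2 + 0.3)
print(step)  # ~0.7
```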

【3】 Deep Learning of Transferable MIMO Channel Modes for 6G V2X Communications 标题:6G V2X通信中可转移MIMO信道模式的深度学习 链接:https://arxiv.org/abs/2108.13831

作者:Lorenzo Cazzella,Dario Tagliaferri,Marouan Mizmizi,Damiano Badini,Christian Mazzucco,Matteo Matteucci,Umberto Spagnolini 摘要:在使用毫米波(mmWave)和亚太赫兹(sub-THz)的新兴高移动性车联网(V2X)通信中,多输入多输出(MIMO)信道估计是一项极具挑战性的任务。在毫米波/亚太赫兹频率下,MIMO信道在空时域(即到达/离开方向与时延)中仅表现出少数主导路径。代数低秩(LR)信道估计利用覆盖小区中反复经过的训练车辆,通过计算与位置相关的MIMO信道特征模式来利用空时信道稀疏性。LR需要车辆的地理位置以及每个位置数十到数百次训练车辆的通过,这导致了显著的复杂性和控制信令开销。在这里,我们设计了一种基于DL的LR信道估计方法,从单个LS信道估计出发,在V2X城市环境中推断MIMO信道特征模式,而不需要车辆的位置信息。数值结果表明,该方法可获得与基于位置的LR相当的均方误差(MSE)性能。此外,我们还表明,所提出的模型可以在参考场景上进行训练,并且可以有效地迁移到具有不同空时信道特征的城市环境中,在没有显式迁移学习过程的情况下提供可比的MSE性能。这一结果简化了在任意密集城市场景中的部署。 摘要:In the emerging high mobility Vehicle-to-Everything (V2X) communications using millimeter Wave (mmWave) and sub-THz, Multiple-Input Multiple-Output (MIMO) channel estimation is an extremely challenging task. At mmWaves/sub-THz frequencies, MIMO channels exhibit few leading paths in the space-time domain (i.e., directions of arrival/departure and delays). Algebraic Low-rank (LR) channel estimation exploits space-time channel sparsity through the computation of position-dependent MIMO channel eigenmodes leveraging recurrent training vehicle passages in the coverage cell. LR requires vehicles' geographical positions and tens to hundreds of training vehicles' passages for each position, leading to significant complexity and control signalling overhead. Here we design a DL-based LR channel estimation method to infer MIMO channel eigenmodes in V2X urban settings, starting from a single LS channel estimate and without needing vehicle's position information. Numerical results show that the proposed method attains comparable Mean Squared Error (MSE) performance as the position-based LR. Moreover, we show that the proposed model can be trained on a reference scenario and be effectively transferred to urban contexts with different space-time channel features, providing comparable MSE performance without an explicit transfer learning procedure. This result eases the deployment in arbitrary dense urban scenarios.

【4】 Chi-square Loss for Softmax: an Echo of Neural Network Structure 标题:Softmax的卡方损失:神经网络结构的回声 链接:https://arxiv.org/abs/2108.13822

作者:Zeyu Wang,Meiqing Wang 机构:School of Mechanical Engineering & Automation, Beihang University, Beijing, China 摘要:Softmax与交叉熵的结合广泛应用于分类中,用于评估两个离散分布列(预测和真实标签)之间的相似性。受卡方检验的启发,我们设计了一个称为卡方损失的新损失函数,该函数也适用于Softmax。卡方损失具有统计学背景。我们证明了它在优化中是无偏的,并阐明了它的使用条件(它的公式决定了它必须与标签平滑一起工作)。此外,我们通过可视化研究了该损失函数的样本分布,发现该分布与神经网络结构有关,这与交叉熵相比是不同的。在过去,结构的影响在可视化时常常被忽略。卡方损失可以注意到神经网络结构的变化,因为它非常严格,我们解释了这种严格的原因。我们还研究了标签平滑的影响,并讨论了标签平滑与训练精度和稳定性之间的关系。由于卡方损失非常严格,因此在处理很多类的样本时,性能会下降。 摘要:Softmax working with cross-entropy is widely used in classification, which evaluates the similarity between two discrete distribution columns (predictions and true labels). Inspired by chi-square test, we designed a new loss function called chi-square loss, which is also works for Softmax. Chi-square loss has a statistical background. We proved that it is unbiased in optimization, and clarified its using conditions (its formula determines that it must work with label smoothing). In addition, we studied the sample distribution of this loss function by visualization and found that the distribution is related to the neural network structure, which is distinct compared to cross-entropy. In the past, the influence of structure was often ignored when visualizing. Chi-square loss can notice changes in neural network structure because it is very strict, and we explained the reason for this strictness. We also studied the influence of label smoothing and discussed the relationship between label smoothing and training accuracy and stability. Since the chi-square loss is very strict, the performance will degrade when dealing samples of very many classes.
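One plausible reading of the loss, consistent with the abstract's remark that label smoothing is mandatory, is the chi-square statistic between the softmax output p and a smoothed target q: sum_k (p_k - q_k)² / q_k, which requires q > 0 everywhere. The PyTorch sketch below implements that interpretation; it is our reconstruction for illustration, not the paper's released code.

```python
# Chi-square loss for Softmax with label smoothing (our interpretation).
import torch
import torch.nn.functional as F

def chi_square_loss(logits, labels, eps=0.1):
    n, K = logits.shape
    p = F.softmax(logits, dim=-1)
    # label smoothing keeps every q_k strictly positive (required by the 1/q_k)
    q = torch.full_like(p, eps / K)
    q.scatter_(1, labels.unsqueeze(1), 1.0 - eps + eps / K)
    return ((p - q) ** 2 / q).sum(dim=-1).mean()

logits = torch.randn(8, 10, requires_grad=True)
labels = torch.randint(0, 10, (8,))
chi_square_loss(logits, labels).backward()
```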

【5】 Identifying Ransomware Actors in the Bitcoin Network 标题:识别比特币网络中的勒索软件行为者 链接:https://arxiv.org/abs/2108.13807

作者:Siddhartha Dalal,Zihe Wang,Siddhanth Sabharwal 机构:Columbia University, New York, USA 备注:None 摘要:由于比特币网络的伪匿名性,用户可以隐藏在他们的比特币地址后面,这些地址可以无限量地动态生成,而不需要他们之间的任何正式链接。因此,参与勒索和其他非法活动的行为者正在使用它进行付款转移。我们考虑的其他活动与赌博有关,因为赌博经常用于转移非法资金。这里提出的问题是,鉴于比特币交易的时间有限,人们能在多大程度上识别与这些欺诈活动相关的常见模式,并将其应用于寻找其他勒索行为人。这个问题相当复杂,因为数千个地址可能属于同一个参与者,它们之间没有任何明显的联系,也没有任何共同的行为模式。本文的主要贡献是介绍并应用新的局部聚类和有监督图机器学习算法来识别恶意参与者。我们证明,已知此类参与者的非常局部的子图足以在测试数据集上以85%的预测准确率区分勒索软件、随机和赌博参与者。 摘要:Due to the pseudo-anonymity of the Bitcoin network, users can hide behind their bitcoin addresses that can be generated in unlimited quantity, on the fly, without any formal links between them. Thus, it is being used for payment transfer by the actors involved in ransomware and other illegal activities. The other activity we consider is related to gambling since gambling is often used for transferring illegal funds. The question addressed here is that given temporally limited graphs of Bitcoin transactions, to what extent can one identify common patterns associated with these fraudulent activities and apply them to find other ransomware actors. The problem is rather complex, given that thousands of addresses can belong to the same actor without any obvious links between them and any common pattern of behavior. The main contribution of this paper is to introduce and apply new algorithms for local clustering and supervised graph machine learning for identifying malicious actors. We show that very local subgraphs of the known such actors are sufficient to differentiate between ransomware, random and gambling actors with 85% prediction accuracy on the test data set.
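The "very local subgraphs" idea can be prototyped with networkx: extract a small ego subgraph around each labeled address, summarize it with simple structural statistics, and feed those to a supervised classifier. Everything below (graph, labels, feature choices) is synthetic and only illustrates the shape of the pipeline.

```python
# Local-subgraph featurization for address classification (toy pipeline).
import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def local_features(G, node, radius=2):
    """Summary statistics of the ego subgraph around one address."""
    ego = nx.ego_graph(G, node, radius=radius)
    degs = [d for _, d in ego.degree()]
    return [ego.number_of_nodes(), ego.number_of_edges(),
            float(np.mean(degs)), float(np.max(degs)), nx.density(ego)]

# toy transaction graph with hypothetical labeled addresses
G = nx.barabasi_albert_graph(200, 2)
labeled = {0: 1, 5: 0, 10: 0, 20: 1}           # 1 = ransomware-like (invented)
X = np.array([local_features(G, n) for n in labeled])
y = np.array(list(labeled.values()))
clf = RandomForestClassifier(n_estimators=100).fit(X, y)
```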

【6】 Deep Learning on Edge TPUs 标题:边缘TPU的深度学习 链接:https://arxiv.org/abs/2108.13732

作者:Andreas M Kist 机构:Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, Germany 备注:8 pages, 3 figures, 3 tables 摘要:边缘计算在远程环境中非常重要,然而,传统硬件并没有针对深度神经网络进行优化。Google Edge TPU是一款新兴的硬件加速器,具有成本、功耗和速度效率,可用于原型设计和生产目的。在此,我将回顾Edge TPU平台、使用Edge TPU完成的任务,以及将模型部署到Edge TPU硬件所需的步骤。Edge TPU不仅能够处理常见的计算机视觉任务,而且优于其他硬件加速器,尤其是当整个模型可以部署到Edge TPU时。将Edge TPU共同嵌入摄像头,可以无缝分析原始数据。总之,Edge TPU是一个成熟的系统,已经在多个任务中证明了其可用性。 摘要:Computing at the edge is important in remote settings, however, conventional hardware is not optimized for utilizing deep neural networks. The Google Edge TPU is an emerging hardware accelerator that is cost, power and speed efficient, and is available for prototyping and production purposes. Here, I review the Edge TPU platform, the tasks that have been accomplished using the Edge TPU, and which steps are necessary to deploy a model to the Edge TPU hardware. The Edge TPU is not only capable of tackling common computer vision tasks, but also surpasses other hardware accelerators, especially when the entire model can be deployed to the Edge TPU. Co-embedding the Edge TPU in cameras allows a seamless analysis of primary data. In summary, the Edge TPU is a maturing system that has proven its usability across multiple tasks.

【7】 Learning to Synthesize Programs as Interpretable and Generalizable Policies 标题:学习将程序综合为可解释和可概括的策略 链接:https://arxiv.org/abs/2108.13643

作者:Dweep Trivedi,Jesse Zhang,Shao-Hua Sun,Joseph J. Lim 机构:University of Southern California 备注:52 pages, 16 figures, 12 tables 摘要:最近,深度强化学习(DRL)方法在许多领域的任务中取得了令人印象深刻的性能。然而,使用DRL方法生成的神经网络策略不是人类可解释的,并且通常难以推广到新的场景。为了解决这些问题,先前的工作探索了更具解释性和结构化的学习计划策略。然而,这些工作要么采用有限的策略表示(如决策树、状态机或预定义的程序模板),要么需要更强的监督(如输入/输出状态对或专家演示)。我们提出了一个框架,而不是学习合成一个程序,它详细说明了以灵活和表达的方式解决任务的过程,仅从奖励信号。为了减轻学习编写程序以从零开始诱导所需代理行为的难度,我们建议首先学习一个程序嵌入空间,该空间以无监督的方式连续参数化不同的行为,然后搜索学习的程序嵌入空间,以生成一个程序,该程序可以最大化给定任务的回报。实验结果表明,该框架不仅能够学习可靠地综合任务解决程序,而且在产生可解释和更一般化的策略的同时,其性能优于DRL和程序综合基线。我们还证明了所提出的两阶段学习方案的必要性,并分析了学习程序嵌入的各种方法。 摘要:Recently, deep reinforcement learning (DRL) methods have achieved impressive performance on tasks in a variety of domains. However, neural network policies produced with DRL methods are not human-interpretable and often have difficulty generalizing to novel scenarios. To address these issues, prior works explore learning programmatic policies that are more interpretable and structured for generalization. Yet, these works either employ limited policy representations (e.g. decision trees, state machines, or predefined program templates) or require stronger supervision (e.g. input/output state pairs or expert demonstrations). We present a framework that instead learns to synthesize a program, which details the procedure to solve a task in a flexible and expressive manner, solely from reward signals. To alleviate the difficulty of learning to compose programs to induce the desired agent behavior from scratch, we propose to first learn a program embedding space that continuously parameterizes diverse behaviors in an unsupervised manner and then search over the learned program embedding space to yield a program that maximizes the return for a given task. Experimental results demonstrate that the proposed framework not only learns to reliably synthesize task-solving programs but also outperforms DRL and program synthesis baselines while producing interpretable and more generalizable policies. We also justify the necessity of the proposed two-stage learning scheme as well as analyze various methods for learning the program embedding.

【8】 When are Deep Networks really better than Random Forests at small sample sizes? 标题:在小样本量的情况下,什么时候深度网络真的比随机森林更好? 链接:https://arxiv.org/abs/2108.13637

作者:Haoyin Xu,Michael Ainsworth,Yu-Chung Peng,Madi Kusmanov,Sambit Panda,Joshua T. Vogelstein 机构:Department of Biomedical Engineering, Johns Hopkins University; Department of Computer Science, Johns Hopkins University 摘要:随机森林(RF)和深度网络(DN)是当前科学文献中最流行的两种机器学习方法,在不同的数据模式下产生不同的性能水平。我们希望进一步探索和确定每种方法的优越条件和领域,特别是在样本量和特征维度方面。为了解决这些问题,我们使用不同的模型参数和架构测试了这些方法在表格、图像和音频设置中的性能。我们的重点是最多10000个样本的数据集,这些样本代表了科学和生物医学数据集的很大一部分。总的来说,我们发现RF在小样本量的表格和结构化数据(图像和音频)方面表现出色,而DN在大样本量的结构化数据方面表现更好。尽管我们计划在未来几个月内继续更新本技术报告,但我们相信目前的初步结果可能会引起其他人的兴趣。 摘要:Random forests (RF) and deep networks (DN) are two of the most popular machine learning methods in the current scientific literature and yield differing levels of performance on different data modalities. We wish to further explore and establish the conditions and domains in which each approach excels, particularly in the context of sample size and feature dimension. To address these issues, we tested the performance of these approaches across tabular, image, and audio settings using varying model parameters and architectures. Our focus is on datasets with at most 10,000 samples, which represent a large fraction of scientific and biomedical datasets. In general, we found RF to excel at tabular and structured data (image and audio) with small sample sizes, whereas DN performed better on structured data with larger sample sizes. Although we plan to continue updating this technical report in the coming months, we believe the current preliminary results may be of interest to others.
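The comparison itself is straightforward to replicate at small scale: sweep the sample size, fit a random forest and a small multilayer network, and compare held-out accuracy. The sklearn sketch below is a minimal stand-in for the report's much broader sweep over modalities and architectures.

```python
# RF vs. a small deep network across sample sizes on a synthetic tabular task.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

for n in [100, 1000, 10000]:
    X, y = make_classification(n_samples=n, n_features=20, random_state=0)
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
    rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(Xtr, ytr)
    dn = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500,
                       random_state=0).fit(Xtr, ytr)
    print(n, rf.score(Xte, yte), dn.score(Xte, yte))
```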

【9】 Fast Multi-label Learning 标题:快速多标签学习 链接:https://arxiv.org/abs/2108.13570

作者:Xiuwen Gong,Dong Yuan,Wei Bao 机构: The University of Sydney 摘要:嵌入方法已经成为最普遍的多标签分类技术之一。然而,嵌入方法的训练过程通常涉及一个复杂的二次或半定规划问题,或者模型甚至可能涉及一个NP难问题。因此,此类方法禁止大规模应用。更重要的是,许多文献已经表明,二进制关联(BR)方法对于某些应用来说通常已经足够好了。不幸的是,BR运行缓慢,因为它与输入数据的大小成线性关系。本文的目标是提供一种简单但有可证明保证的方法,该方法无需复杂的训练过程即可获得具有竞争力的绩效。为了实现我们的目标,我们为多标签分类提供了一个简单的随机草图策略,并从算法和统计学习的角度给出了理论结果。我们的综合实证研究证实了我们的理论发现,并证明了所提出方法的优越性。 摘要:Embedding approaches have become one of the most pervasive techniques for multi-label classification. However, the training process of embedding methods usually involves a complex quadratic or semidefinite programming problem, or the model may even involve an NP-hard problem. Thus, such methods are prohibitive on large-scale applications. More importantly, much of the literature has already shown that the binary relevance (BR) method is usually good enough for some applications. Unfortunately, BR runs slowly due to its linear dependence on the size of the input data. The goal of this paper is to provide a simple method, yet with provable guarantees, which can achieve competitive performance without a complex training process. To achieve our goal, we provide a simple stochastic sketch strategy for multi-label classification and present theoretical results from both algorithmic and statistical learning perspectives. Our comprehensive empirical studies corroborate our theoretical findings and demonstrate the superiority of the proposed methods.

【10】 Regularizing (Stabilizing) Deep Learning Based Reconstruction Algorithms 标题:基于正则化(稳定化)深度学习的重构算法 链接:https://arxiv.org/abs/2108.13551

作者:Abinash Nayak 摘要:众所周知,反问题是不适定的,要有意义地解决它们,必须使用正则化方法。传统上,流行的正则化方法是惩罚变分方法。近年来,经典的正则化重建方法已经被基于深度学习的学习重建算法所超越。然而,与传统的正则化方法不同,这种学习重建算法的理论基础(如稳定性和正则化)并不充分。因此,从这些算法中获得的结果,尽管在经验上很突出,但并不总是完全可信的,因为它们可能包含学习过程中产生的某些不稳定性或(幻觉)特征。事实上,已经证明,此类学习算法非常容易受到数据中的小(对抗性)噪声的影响,并可能导致恢复解中的严重不稳定性,这可能与不适定(逆)问题的固有不稳定性大不相同。然而,经典的正则化方法可以很好地处理此类(对抗性)噪声,并且可以产生稳定的恢复。在这里,我们试图提出一些正则化方法来稳定这种(不稳定的)学习重建方法,并恢复正则化解,即使在存在对抗性噪声的情况下。为此,我们需要扩展正则化的经典概念,并将其纳入学习的重建算法中。我们还介绍了一些正则化技术来正则化两种最流行的学习重建算法,即学习后处理重建和学习展开重建。 摘要:It's well-known that inverse problems are ill-posed and to solve them meaningfully one has to employ regularization methods. Traditionally, popular regularization methods have been the penalized Variational approaches. In recent years, the classical regularized-reconstruction approaches have been outclassed by the (deep-learning-based) learned reconstruction algorithms. However, unlike the traditional regularization methods, the theoretical underpinnings, such as stability and regularization, have been insufficient for such learned reconstruction algorithms. Hence, the results obtained from such algorithms, though empirically outstanding, can't always be completely trusted, as they may contain certain instabilities or (hallucinated) features arising from the learned process. In fact, it has been shown that such learning algorithms are very susceptible to small (adversarial) noises in the data and can lead to severe instabilities in the recovered solution, which can be quite different than the inherent instabilities of the ill-posed (inverse) problem. Whereas, the classical regularization methods can handle such (adversarial) noises very well and can produce stable recovery. Here, we try to present certain regularization methods to stabilize such (unstable) learned reconstruction methods and recover a regularized solution, even in the presence of adversarial noises. For this, we need to extend the classical notion of regularization and incorporate it in the learned reconstruction algorithms. We also present some regularization techniques to regularize two of the most popular learning reconstruction algorithms, the Learned Post-Processing Reconstruction and the Learned Unrolling Reconstruction.

【11】 A Convolutional Neural Network-based Approach to Field Reconstruction 标题:一种基于卷积神经网络的场重建方法 链接:https://arxiv.org/abs/2108.13517

作者:Roberto Ponciroli,Andrea Rovinelli,Lander Ibarra 机构:Nuclear Science and Engineering Division, Argonne National Laboratory, Lemont, IL , Applied Materials Division 摘要:这项工作已提交给IEEE进行可能的出版。版权可在不另行通知的情况下转让,此后可能无法再访问此版本。在许多应用中,需要仔细监测场的空间分布,以检测尖峰、不连续或危险的异质性,但不能使用侵入性监测方法。此外,由于无法采用准确的系统模型,可能无法获得有关该过程的技术规范。在这项工作中,提出了一种物理信息、数据驱动的算法,可以满足这些需求。该方法基于卷积神经网络中边界元法(BEM)方案的实现。由于能够用较少的参数表示任何连续的数学函数,网络允许在给定边界条件和域内少量测量的情况下,预测域内任何点的场值。所提出的方法被应用于重建三维区域上由亥姆霍兹方程描述的场。通过调查不同的物理条件和不同的网络配置,还进行了敏感性分析。由于唯一的假设是边界元法的适用性,目前的方法可以应用于广泛过程的监测,从水库内污染物来源的定位到核反应堆中子通量的监测。 摘要:This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. In many applications, the spatial distribution of a field needs to be carefully monitored to detect spikes, discontinuities or dangerous heterogeneities, but invasive monitoring approaches cannot be used. Besides, technical specifications about the process might not be available by preventing the adoption of an accurate model of the system. In this work, a physics-informed, data-driven algorithm that allows addressing these requirements is presented. The approach is based on the implementation of a boundary element method (BEM)-scheme within a convolutional neural network. Thanks to the capability of representing any continuous mathematical function with a reduced number of parameters, the network allows predicting the field value in any point of the domain, given the boundary conditions and few measurements within the domain. The proposed approach was applied to reconstruct a field described by the Helmholtz equation over a three-dimensional domain. A sensitivity analysis was also performed by investigating different physical conditions and different network configurations. Since the only assumption is the applicability of BEM, the current approach can be applied to the monitoring of a wide range of processes, from the localization of the source of pollutant within a water reservoir to the monitoring of the neutron flux in a nuclear reactor.

【12】 Machine Learning Methods for Management UAV Flocks -- a Survey 标题:无人机群管理的机器学习方法--综述 链接:https://arxiv.org/abs/2108.13448

Authors: Rina Azoulay, Yoram Haddad, Shulamit Reches
Affiliations: Jerusalem College of Technology, Jerusalem, Israel
Abstract: The development of unmanned aerial vehicles (UAVs) has been gaining momentum in recent years owing to technological advances and a significant reduction in their cost. UAV technology can be used in a wide range of domains, including communication, agriculture, security, and transportation. In certain domains it may be useful to group the UAVs into clusters/flocks, and various challenges associated with UAV usage can be alleviated by clustering. Several computational challenges arise in UAV flock management, which can be solved by using machine learning (ML) methods. In this survey, we describe the basic terms relating to UAVs and modern ML methods, and we provide an overview of related tutorials and surveys. We subsequently consider the different challenges that appear in UAV flocks. For each issue, we survey several ML-based methods that have been suggested in the literature to handle the associated challenges. Thereafter, we describe various open issues in which ML can be applied to solve the flocks' different challenges, and we suggest means of using ML methods for this purpose. This comprehensive review may be useful for both researchers and developers in providing a wide view of various aspects of state-of-the-art ML technologies that are applicable to flock management.

【13】 Bayesian learning of forest and tree graphical models Link: https://arxiv.org/abs/2108.13992

Authors: Edmund Jones
Note: PhD thesis, University of Bristol, School of Mathematics, Statistics Group, 2013; 148 pages, 24 figures. This is a compact version of the thesis, not the official version that was submitted.
Abstract: In Bayesian learning of Gaussian graphical model structure, it is common to restrict attention to certain classes of graphs and approximate the posterior distribution by repeatedly moving from one graph to another, using MCMC or methods such as stochastic shotgun search (SSS). I give two corrected versions of an algorithm for non-decomposable graphs and discuss random graph distributions, in particular as prior distributions. The main topic of the thesis is Bayesian structure-learning with forests or trees. Restricting attention to these graphs can be justified using theorems on random graphs. I describe how to use the Chow–Liu algorithm and the Matrix Tree Theorem to find the MAP forest and certain quantities in the posterior distribution on trees. I give adapted versions of MCMC and SSS for approximating the posterior distribution for forests and trees, and systems for storing these graphs so that it is easy to choose moves to neighbouring graphs. Experiments show that SSS with trees does well when the true graph is a tree or sparse graph. SSS with trees or forests does better than SSS with decomposable graphs in certain cases. Graph priors improve detection of hubs but need large ranges of probabilities. MCMC on forests fails to mix well and MCMC on trees is slower than SSS. (For a longer abstract, see the thesis.)
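The Chow–Liu step the abstract relies on is concrete enough to sketch: estimate pairwise mutual information from the data, then take a maximum-weight spanning tree over it. A minimal numpy/scipy version; the MAP-forest and posterior computations in the thesis additionally involve graph priors and the Matrix Tree Theorem, which this sketch omits.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def mutual_info(x, y):
    """Plug-in mutual information between two discrete variables."""
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            pxy = np.mean((x == a) & (y == b))
            if pxy > 0:
                mi += pxy * np.log(pxy / (np.mean(x == a) * np.mean(y == b)))
    return mi

def chow_liu_tree(data):
    """data: (n_samples, n_vars) discrete array -> edges of the max-MI spanning tree."""
    n_vars = data.shape[1]
    W = np.zeros((n_vars, n_vars))
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            W[i, j] = mutual_info(data[:, i], data[:, j])
    # A maximum-weight spanning tree is a minimum spanning tree on negated weights.
    return list(zip(*minimum_spanning_tree(-W).nonzero()))

rng = np.random.default_rng(1)
x0 = rng.integers(0, 2, 500)
x1 = (x0 + (rng.random(500) < 0.1)) % 2  # noisy copy of x0
x2 = rng.integers(0, 2, 500)             # independent of both
print("tree edges:", chow_liu_tree(np.column_stack([x0, x1, x2])))
```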

Others (12 papers)

【1】 Effective Sequence-to-Sequence Dialogue State Tracking Link: https://arxiv.org/abs/2108.13990

Authors: Jeffrey Zhao, Mahdis Mahdieh, Ye Zhang, Yuan Cao, Yonghui Wu
Affiliations: Google Research
Note: Accepted at EMNLP 2021
Abstract: Sequence-to-sequence models have been applied to a wide variety of NLP tasks, but how to properly use them for dialogue state tracking has not been systematically investigated. In this paper, we study this problem from the perspectives of pre-training objectives as well as the formats of context representations. We demonstrate that the choice of pre-training objective makes a significant difference to the state tracking quality. In particular, we find that masked span prediction is more effective than auto-regressive language modeling. We also explore using Pegasus, a span-prediction-based pre-training objective for text summarization, for the state tracking model. We find that pre-training on the seemingly distant summarization task works surprisingly well for dialogue state tracking. In addition, we find that while a recurrent state context representation also works reasonably well, the model may have a hard time recovering from earlier mistakes. We conducted experiments on the MultiWOZ 2.1-2.4 datasets with consistent observations.
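The two design axes studied here — pre-training objective and context-representation format — are easiest to see on a concrete example. A hypothetical serialization (slot names and formats are illustrative, not the paper's exact schema; the sentinel tokens follow the common T5 convention for masked span prediction):

```python
# Dialogue context serialized for a sequence-to-sequence state tracker.
turns = [
    ("user", "I need a cheap hotel in the north."),
    ("system", "Okay, for how many nights?"),
    ("user", "Three nights, starting Tuesday."),
]
context = " ".join(f"[{speaker}] {utterance}" for speaker, utterance in turns)

# State tracking: the model decodes the full belief state as a string.
dst_input = f"track state: {context}"
dst_target = "hotel-price=cheap; hotel-area=north; hotel-stay=3; hotel-day=tuesday"

# Masked span prediction pre-training pair (T5-style sentinel tokens), the
# objective the paper finds more effective than auto-regressive LM.
pretrain_input = "I need a <extra_id_0> hotel in the <extra_id_1>."
pretrain_target = "<extra_id_0> cheap <extra_id_1> north <extra_id_2>"

print(dst_input)
print(dst_target)
```

A recurrent state representation would replace the full dialogue history in `dst_input` with the previous turn plus the previously decoded state — cheaper, but, as the paper notes, harder to recover from earlier mistakes.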

【2】 Approximation Methods for Partially Observed Markov Decision Processes (POMDPs) Link: https://arxiv.org/abs/2108.13965

Authors: Caleb M. Bowyer
Abstract: POMDPs are useful models for systems where the true underlying state is not completely known to an outside observer, who instead observes a noisy version of the true system state. When the number of system states is large, approximation methods are often needed to obtain near-optimal solutions for control. This survey is centered around the origins, theory, and approximations of finite-state POMDPs. Understanding POMDPs requires an understanding of finite-state Markov Decision Processes (MDPs) and Hidden Markov Models (HMMs). For this background theory, only essential details on MDPs and HMMs are provided, leaving longer expositions to textbook treatments, before diving into the main topics of POMDPs. Once the required background is covered, the POMDP is introduced and its origins are explained through the classical papers. Once the high computational requirements are understood from the exact methodological point of view, the main approximation methods are surveyed, and the survey closes with some new research directions.
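The computational burden that motivates approximation is already visible in the belief update, the basic recursion on which exact POMDP methods build: the agent maintains a distribution over states and updates it by Bayes' rule after each action and observation. A minimal sketch with toy matrices:

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """One POMDP belief update: b'(s') ∝ Z[a][s', o] * sum_s b(s) T[a][s, s'].
    T[a] is the state-transition matrix under action a; Z[a][s', o] is the
    probability of observing o from successor state s'."""
    predicted = b @ T[a]               # predict step
    updated = Z[a][:, o] * predicted   # correct step
    return updated / updated.sum()     # normalize

# Tiny 2-state, 1-action, 2-observation example (toy numbers).
T = {0: np.array([[0.9, 0.1],
                  [0.2, 0.8]])}
Z = {0: np.array([[0.75, 0.25],
                  [0.30, 0.70]])}
b = np.array([0.5, 0.5])
b = belief_update(b, a=0, o=1, T=T, Z=Z)
print("posterior belief:", b)
```

Exact methods must plan over the continuous space of such beliefs, which is what blows up with the number of states and forces the approximations the survey covers.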

【3】 APS: Active Pretraining with Successor Features Link: https://arxiv.org/abs/2108.13956

Authors: Hao Liu, Pieter Abbeel
Affiliations: University of California
Note: Appeared in ICML 2021
Abstract: We introduce a new unsupervised pretraining objective for reinforcement learning. During the unsupervised reward-free pretraining phase, the agent maximizes the mutual information between tasks and the states induced by the policy. Our key contribution is a novel lower bound on this intractable quantity. We show that by reinterpreting and combining variational successor features (Hansen et al., 2020) with nonparametric entropy maximization (Liu & Abbeel, 2021), the intractable mutual information can be efficiently optimized. The proposed method, Active Pretraining with Successor Features (APS), explores the environment via nonparametric entropy maximization, and the explored data can be efficiently leveraged to learn behavior via variational successor features. APS addresses the limitations of existing mutual-information-maximization-based and entropy-maximization-based unsupervised RL and combines the best of both worlds. When evaluated on the Atari 100k data-efficiency benchmark, our approach significantly outperforms previous methods that combine unsupervised pretraining with task-specific finetuning.
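The entropy half of APS is inherited from particle-based entropy maximization and is easy to sketch: the intrinsic reward for a state grows with the distance from its representation to its k-th nearest neighbor in a batch. A toy version (the exact scaling, normalization, and representation learning used in the paper are not reproduced here):

```python
import numpy as np

def knn_entropy_bonus(reps, k=3):
    """Nonparametric (particle-based) entropy proxy used as intrinsic reward.
    reps: (n, d) array of state representations."""
    dists = np.linalg.norm(reps[:, None, :] - reps[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)          # ignore self-distances
    kth = np.sort(dists, axis=1)[:, k - 1]   # distance to k-th nearest neighbor
    return np.log(1.0 + kth)                 # log keeps the bonus well-scaled

reps = np.random.default_rng(2).standard_normal((8, 4))
print(knn_entropy_bonus(reps))
```

The successor-feature half then ties this exploration bonus to task vectors, yielding the mutual-information lower bound the paper optimizes.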

【4】 Towards a Common Testing Terminology for Software Engineering and Artificial Intelligence Experts Link: https://arxiv.org/abs/2108.13837

Authors: Lisa Jöckel, Thomas Bauer, Michael Kläs, Marc P. Hauer, Janek Groß
Affiliations: Fraunhofer Institute for Experimental Software Engineering IESE, Kaiserslautern, Germany; Algorithm Accountability Lab, TU Kaiserslautern
Note: Submitted, currently under review
Abstract: Analytical quality assurance, especially testing, is an integral part of software-intensive system development. With the increased use of Artificial Intelligence (AI) and Machine Learning (ML) as part of such systems, this becomes more difficult, as well-understood software testing approaches cannot be applied directly to the AI-enabled parts of the system. The required adaptation of classical testing approaches and the development of new concepts for AI would benefit from a deeper understanding and exchange between AI and software engineering experts. A major obstacle we see on this path is the different terminologies used in the two communities. As we consider a mutual understanding of testing terminology to be key, this paper contributes a mapping between the most important concepts from classical software testing and AI testing. In the mapping, we highlight differences in the relevance and naming of the mapped concepts.

【5】 SemIE: Semantically-aware Image Extrapolation Link: https://arxiv.org/abs/2108.13702

Authors: Bholeshwar Khurana, Soumya Ranjan Dash, Abhishek Bhatia, Aniruddha Mahapatra, Hrituraj Singh, Kuldeep Kulkarni
Affiliations: IIT Kanpur, Adobe Research India, Triomics
Note: To appear in International Conference on Computer Vision (ICCV) 2021. Project URL: this https URL
Abstract: We propose a novel, semantically-aware paradigm for image extrapolation that enables the addition of new object instances. All previous methods are limited in their extrapolation capability to merely extending the objects already present in the image. Our proposed approach, by contrast, focuses not only on (i) extending the already present objects, but also on (ii) adding new objects in the extended region based on the context. To this end, for a given image, we first obtain an object segmentation map using a state-of-the-art semantic segmentation method. The segmentation map thus obtained is fed into a network to compute the extrapolated semantic segmentation and the corresponding panoptic segmentation maps. The input image and the obtained segmentation maps are further utilized to generate the final extrapolated image. We conduct experiments on the Cityscapes and ADE20K-bedroom datasets and show that our method outperforms all baselines in terms of FID and similarity in object co-occurrence statistics.

【6】 Towards Out-Of-Distribution Generalization: A Survey Link: https://arxiv.org/abs/2108.13624

Authors: Zheyan Shen, Jiashuo Liu, Yue He, Xingxuan Zhang, Renzhe Xu, Han Yu, Peng Cui
Abstract: Classic machine learning methods are built on the $i.i.d.$ assumption that training and testing data are independent and identically distributed. However, in real scenarios the $i.i.d.$ assumption can hardly be satisfied, and the performance of classic machine learning algorithms drops sharply under distributional shifts, which indicates the significance of investigating the Out-of-Distribution generalization problem. The Out-of-Distribution (OOD) generalization problem addresses the challenging setting where the testing distribution is unknown and different from the training distribution. This paper serves as the first effort to systematically and comprehensively discuss the OOD generalization problem, from definition, methodology, and evaluation to implications and future directions. Firstly, we provide the formal definition of the OOD generalization problem. Secondly, existing methods are categorized into three parts based on their positions in the whole learning pipeline, namely unsupervised representation learning, supervised model learning, and optimization, and typical methods for each category are discussed in detail. We then demonstrate the theoretical connections between the different categories and introduce the commonly used datasets and evaluation metrics. Finally, we summarize the literature and raise some future directions for the OOD generalization problem. A summary of the OOD generalization methods reviewed in this survey can be found at http://out-of-distribution-generalization.com.

【7】 A New Approach to Multilinear Dynamical Systems and Control Link: https://arxiv.org/abs/2108.13583

Authors: Randy C. Hoover, Kyle Caudle, Karen Braman
Note: The research was supported in part by the Department of the Navy.
Abstract: This paper presents a new approach to multilinear dynamical systems analysis and control, based upon recent developments in tensor decompositions and a newly defined algebra of circulants. In particular, it is shown that under the right tensor multiplication operator, a third-order tensor can be written as a product of third-order tensors, analogous to a traditional matrix eigenvalue decomposition, where the "eigenvectors" become eigenmatrices and the "eigenvalues" become eigen-tuples. This development allows a proper tensor eigenvalue decomposition to be defined, and extends naturally to linear systems theory through a tensor exponential. Through this framework we extend many of the traditional techniques of linear systems theory to their multilinear counterparts.
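The construction rests on the t-product of third-order tensors, under which multiplication becomes an ordinary matrix product on each frontal slice after a DFT along the third axis — this is the circulant algebra the paper builds on. A minimal numpy sketch:

```python
import numpy as np

def t_product(A, B):
    """Tensor t-product of third-order tensors: DFT along the third axis,
    facewise matrix products, then inverse DFT. A: (n1, n2, n3), B: (n2, m, n3)."""
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    Cf = np.einsum('ijk,jlk->ilk', Af, Bf)  # one matrix product per frequency slice
    return np.real(np.fft.ifft(Cf, axis=2))

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4, 5))
I = np.zeros((4, 4, 5))
I[:, :, 0] = np.eye(4)                      # identity tensor of the t-product
print(np.allclose(t_product(A, I), A))      # True: I acts as the identity
```

The tensor eigenvalue decomposition described in the abstract diagonalizes each of these frequency-domain slices, which is why the eigenvalues become eigen-tuples (one value per slice).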

【8】 DoWhy: Addressing Challenges in Expressing and Validating Causal Assumptions Link: https://arxiv.org/abs/2108.13518

Authors: Amit Sharma, Vasilis Syrgkanis, Cheng Zhang, Emre Kıcıman
Note: Presented at the ICML 2021 Workshop on the Neglected Assumptions in Causal Inference (NACI)
Abstract: Estimation of causal effects involves crucial assumptions about the data-generating process, such as the directionality of effects, the presence of instrumental variables or mediators, and whether all relevant confounders are observed. Violation of any of these assumptions leads to significant error in the effect estimate. However, unlike cross-validation for predictive models, there is no global validation method for a causal estimate. As a result, expressing different causal assumptions formally and validating them (to the extent possible) becomes critical for any analysis. We present DoWhy, a framework that allows explicit declaration of assumptions through a causal graph and provides multiple validation tests to check a subset of these assumptions. Our experience with DoWhy highlights a number of open questions for future research: developing new ways beyond causal graphs to express assumptions, the role of causal discovery in learning relevant parts of the graph, and developing validation tests that can better detect errors, both for average and conditional treatment effects. DoWhy is available at https://github.com/microsoft/dowhy.
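Because DoWhy is an open-source library, the declare-identify-estimate-refute workflow the abstract describes can be shown directly. A sketch following the project's documented interface (method names as documented at the time; exact signatures may differ across versions):

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

# Synthetic data: W confounds both treatment T and outcome Y; true effect = 2.
rng = np.random.default_rng(4)
n = 2000
W = rng.standard_normal(n)
T = (W + rng.standard_normal(n) > 0).astype(int)
Y = 2.0 * T + 1.5 * W + rng.standard_normal(n)
df = pd.DataFrame({"W": W, "T": T, "Y": Y})

# 1) Declare the assumptions explicitly.
model = CausalModel(data=df, treatment="T", outcome="Y", common_causes=["W"])
# 2) Identify the estimand implied by those assumptions.
estimand = model.identify_effect()
# 3) Estimate it.
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
# 4) Validate: adding a random common cause should barely move the estimate.
refutation = model.refute_estimate(estimand, estimate,
                                   method_name="random_common_cause")
print(estimate.value)
print(refutation)
```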

【9】 Full-Cycle Energy Consumption Benchmark for Low-Carbon Computer Vision Link: https://arxiv.org/abs/2108.13465

Authors: Bo Li, Xinyang Jiang, Donglin Bai, Yuge Zhang, Ningxin Zheng, Xuanyi Dong, Lu Liu, Yuqing Yang, Dongsheng Li
Affiliations: Microsoft Research Asia, University of Technology Sydney
Note: arXiv preprint
Abstract: The energy consumption of deep learning models is increasing at a breathtaking rate, which raises concerns about potential negative effects on carbon neutrality in the context of global warming and climate change. With the progress of efficient deep learning techniques, e.g., model compression, researchers can obtain efficient models with fewer parameters and lower latency. However, most existing efficient deep learning methods do not explicitly consider energy consumption as a key performance indicator. Furthermore, existing methods mostly focus on the inference costs of the resulting efficient models but neglect the notable energy consumption throughout the entire life cycle of the algorithm. In this paper, we present the first large-scale energy consumption benchmark for efficient computer vision models, in which a new metric is proposed to explicitly evaluate the full-cycle energy consumption under different model usage intensities. The benchmark can provide insights for low-carbon emission when selecting efficient deep learning algorithms in different model usage scenarios.
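As an illustration of the measurement side only — not the paper's full-cycle benchmark or its proposed metric — here is a sketch that reads a GPU's cumulative energy counter through NVML. It assumes the `pynvml` bindings and a GPU that exposes `nvmlDeviceGetTotalEnergyConsumption` (Volta or newer); the workload is hypothetical.

```python
import pynvml

def measure_gpu_energy_joules(fn, device_index=0):
    """Approximate GPU energy used by fn() via NVML's cumulative energy counter."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    start_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)  # millijoules
    fn()
    end_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
    pynvml.nvmlShutdown()
    return (end_mj - start_mj) / 1000.0

# Hypothetical usage:
# energy_j = measure_gpu_energy_joules(lambda: model(batch))
```

A full-cycle accounting in the paper's sense would also cover data preparation, training, and repeated deployment-time inference, weighted by how intensively the model is used.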

【10】 Benchmarking the Accuracy and Robustness of Feedback Alignment Algorithms Link: https://arxiv.org/abs/2108.13446

Authors: Albert Jiménez Sanfiz, Mohamed Akrout
Affiliations: AIP Labs
Abstract: Backpropagation is the default algorithm for training deep neural networks due to its simplicity, efficiency and high convergence rate. However, its requirements make it impossible to implement in a human brain. In recent years, more biologically plausible learning methods have been proposed. Some of these methods can match backpropagation accuracy and simultaneously provide other benefits, such as faster training on specialized hardware (e.g., ASICs) or higher robustness against adversarial attacks. While interest in the field is growing, open-source libraries and toolkits are needed to foster research and benchmark algorithms. In this paper, we present BioTorch, a software framework to create, train, and benchmark biologically motivated neural networks. In addition, we investigate the performance of several feedback alignment methods proposed in the literature, thereby unveiling the importance of forward and backward weight initialization and of optimizer choice. Finally, we provide a novel robustness study of these methods against state-of-the-art white- and black-box adversarial attacks.
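Feedback alignment itself is simple to state: keep the usual forward pass, but carry the output error backward through a fixed random matrix instead of the transposed forward weights. A minimal two-layer numpy sketch (architecture, loss, and learning rate are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n_in, n_hid, n_out, lr = 8, 16, 4, 0.1
W1 = rng.standard_normal((n_in, n_hid)) * 0.1
W2 = rng.standard_normal((n_hid, n_out)) * 0.1
B2 = rng.standard_normal((n_out, n_hid)) * 0.1  # fixed random feedback matrix

def fa_step(x, y_true):
    """One feedback-alignment update on squared loss."""
    global W1, W2
    h = np.tanh(x @ W1)
    e = h @ W2 - y_true                 # output error
    dW2 = h.T @ e
    dh = (e @ B2) * (1 - h ** 2)        # key difference: B2 instead of W2.T
    dW1 = x.T @ dh
    W1 -= lr * dW1 / len(x)
    W2 -= lr * dW2 / len(x)
    return float((e ** 2).mean())

x = rng.standard_normal((32, n_in))
y = rng.standard_normal((32, n_out))
for _ in range(200):
    loss = fa_step(x, y)
print("final loss:", loss)
```

Training works because the forward weights gradually align with the fixed feedback weights; how well this holds up depends strongly on the initialization and optimizer choices the paper benchmarks.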

【11】 A Subsampling Based Method for Causal Discovery on Discrete Data Link: https://arxiv.org/abs/2108.13984

Authors: Austin Goddard, Yu Xiang
Affiliations: University of Utah, S Central Campus Dr, Salt Lake City, UT, USA
Note: Accepted to the 2021 IEEE Statistical Signal Processing Workshop
Abstract: Inferring causal directions on discrete and categorical data is an important yet challenging problem. Even though the additive noise model (ANM) approach can be adapted to discrete data, its functional-structure assumptions make it inapplicable to categorical data. Inspired by the principle that cause and mechanism are independent, various methods have been developed that leverage independence tests such as the distance correlation measure. In this work, we take an alternative perspective and propose a subsampling-based method to test the independence between the generating scheme of the cause and that of the mechanism. Our methodology works for both discrete and categorical data and does not impose any functional model on the data, making it a more flexible approach. To demonstrate the efficacy of our methodology, we compare it with existing baselines over various synthetic and real data experiments.
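The authors' subsampling scheme is not detailed in the abstract, but the distance-correlation measure it is compared against is standard and easy to sketch (the biased V-statistic estimator; zero in the population limit iff the two variables are independent):

```python
import numpy as np

def distance_correlation(x, y):
    """Empirical distance correlation (Székely et al.) for 1-D samples."""
    def centered(v):
        d = np.abs(v[:, None] - v[None, :])
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    A = centered(np.asarray(x, dtype=float))
    B = centered(np.asarray(y, dtype=float))
    dcov2 = (A * B).mean()  # V-statistic, nonnegative
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

rng = np.random.default_rng(6)
x = rng.integers(0, 5, 300).astype(float)
y_dep = (x + rng.integers(0, 2, 300)) % 5       # depends on x
y_ind = rng.integers(0, 5, 300).astype(float)   # independent of x
print(distance_correlation(x, y_dep), distance_correlation(x, y_ind))
```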

【12】 Evaluating the Robustness of Off-Policy Evaluation Link: https://arxiv.org/abs/2108.13703

Authors: Yuta Saito, Takuma Udagawa, Haruka Kiyohara, Kazuki Mogi, Yusuke Narita, Kei Tateno
Affiliations: Hanjuku-Kaso Co., Ltd.; Sony Group Corporation; Tokyo Institute of Technology; Stanford University; Yale University
Note: Accepted at RecSys 2021
Abstract: Off-policy evaluation (OPE), or offline evaluation in general, evaluates the performance of hypothetical policies leveraging only offline log data. It is particularly useful in applications where online interaction involves high-stakes and expensive settings, such as precision medicine and recommender systems. Since many OPE estimators have been proposed, and some of them have hyperparameters to be tuned, practitioners face a new challenge in selecting and tuning OPE estimators for their specific application. Unfortunately, identifying a reliable estimator from results reported in research papers is often difficult, because the current experimental procedure evaluates and compares estimators' performance on a narrow set of hyperparameters and evaluation policies. It is therefore difficult to know which estimator is safe and reliable to use. In this work, we develop Interpretable Evaluation for Offline Evaluation (IEOE), an experimental procedure to evaluate OPE estimators' robustness to changes in hyperparameters and/or evaluation policies in an interpretable manner. Using the IEOE procedure, we then perform an extensive evaluation of a wide variety of existing estimators on the Open Bandit Dataset, a large-scale public real-world dataset for OPE. We demonstrate that our procedure can evaluate estimators' robustness to the hyperparameter choice, helping us avoid using unsafe estimators. Finally, we apply IEOE to real-world e-commerce platform data and demonstrate how to use our protocol in practice.
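IEOE stress-tests existing estimators rather than proposing a new one. For context, a sketch of the simplest such estimator, inverse propensity scoring (IPS), over toy logged bandit feedback — an illustration of what the protocol evaluates, not of the protocol itself:

```python
import numpy as np

def ips_estimate(logs, eval_policy):
    """IPS estimate of an evaluation policy's value from logged bandit data.
    logs: iterable of (context, action, reward, logging_propensity) tuples;
    eval_policy(context, action) returns pi_e(action | context)."""
    ratios = np.array([eval_policy(x, a) / p_b for (x, a, r, p_b) in logs])
    rewards = np.array([r for (_, _, r, _) in logs])
    return float(np.mean(ratios * rewards))

# Toy logs: uniform logging over 2 actions; reward is 1 iff action == 1.
rng = np.random.default_rng(7)
logs = [(None, int(a), float(a == 1), 0.5) for a in rng.integers(0, 2, 1000)]
uniform = lambda x, a: 0.5
always_one = lambda x, a: 1.0 if a == 1 else 0.0
print(ips_estimate(logs, uniform))     # ~0.5
print(ips_estimate(logs, always_one))  # ~1.0
```

An IEOE-style study would then vary the estimator's hyperparameters and the evaluation policy and report how stable such estimates remain.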

