cs.LG: 84 papers today
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (5 papers)
【1】 Local Permutation Equivariance For Graph Neural Networks Link: https://arxiv.org/abs/2111.11840
Authors: Joshua Mitton, Roderick Murray-Smith Affiliations: School of Computing Science, Glasgow University, Glasgow, UK Comments: Permutation equivariant update function on sub-graphs Abstract: In this work we develop a new method, named locally permutation-equivariant graph neural networks, which provides a framework for building graph neural networks that operate on local node neighbourhoods, through sub-graphs, while using permutation equivariant update functions. Message passing neural networks have been shown to be limited in their expressive power and recent approaches to overcome this either lack scalability or require structural information to be encoded into the feature space. The general framework presented here overcomes the scalability issues associated with global permutation equivariance by operating on sub-graphs through restricted representations. In addition, we prove that there is no loss of expressivity by using restricted representations. Furthermore, the proposed framework only requires a choice of $k$-hops for creating sub-graphs and a choice of representation space to be used for each layer, which makes the method easily applicable across a range of graph-based domains. We experimentally validate the method on a range of graph benchmark classification tasks, demonstrating either state-of-the-art results or very competitive results on all benchmarks. Further, we demonstrate that the use of local update functions offers a significant improvement in GPU memory over global methods.
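The following is a minimal, illustrative PyTorch sketch of the general idea of applying a shared permutation-equivariant update to $k$-hop sub-graphs. It uses a simple DeepSets-style update on a dense adjacency matrix rather than the restricted group representations developed in the paper; all names, sizes, and the update rule are assumptions made for illustration.

```python
import torch

def k_hop_subgraph(adj: torch.Tensor, node: int, k: int) -> torch.Tensor:
    """Boolean mask of nodes reachable from `node` within k hops (dense adjacency)."""
    reach = torch.zeros(adj.shape[0], dtype=torch.bool)
    reach[node] = True
    for _ in range(k):
        reach = reach | (adj[reach].sum(dim=0) > 0)
    return reach

class LocalEquivariantLayer(torch.nn.Module):
    """DeepSets-style update h_i = W1 x_i + W2 * sum_{j in subgraph(i)} x_j,
    which is invariant to how the nodes inside each sub-graph are ordered."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.w_self = torch.nn.Linear(d_in, d_out)
        self.w_agg = torch.nn.Linear(d_in, d_out)

    def forward(self, x, adj, k=2):
        out = []
        for i in range(x.shape[0]):
            mask = k_hop_subgraph(adj, i, k)          # local node neighbourhood
            out.append(self.w_self(x[i]) + self.w_agg(x[mask].sum(dim=0)))
        return torch.relu(torch.stack(out))

x, adj = torch.randn(6, 8), (torch.rand(6, 6) > 0.6).float()
print(LocalEquivariantLayer(8, 16)(x, adj).shape)     # torch.Size([6, 16])
```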
【2】 Network In Graph Neural Network Link: https://arxiv.org/abs/2111.11638
Authors: Xiang Song, Runjie Ma, Jiahang Li, Muhan Zhang, David Paul Wipf Affiliations: AWS AI, USA; AWS Shanghai AI Lab, China; Institute for Artificial Intelligence of Peking University Abstract: Graph Neural Networks (GNNs) have shown success in learning from graph structured data containing node/edge feature information, with application to social networks, recommendation, fraud detection and knowledge graph reasoning. In this regard, various strategies have been proposed in the past to improve the expressiveness of GNNs. For example, one straightforward option is to simply increase the parameter size by either expanding the hidden dimension or increasing the number of GNN layers. However, wider hidden layers can easily lead to overfitting, and incrementally adding more GNN layers can potentially result in over-smoothing. In this paper, we present a model-agnostic methodology, namely Network In Graph Neural Network (NGNN), that allows arbitrary GNN models to increase their model capacity by making the model deeper. However, instead of adding or widening GNN layers, NGNN deepens a GNN model by inserting non-linear feedforward neural network layer(s) within each GNN layer. An analysis of NGNN as applied to a GraphSage base GNN on ogbn-products data demonstrates that it can keep the model stable against either node feature or graph structure perturbations. Furthermore, wide-ranging evaluation results on both node classification and link prediction tasks show that NGNN works reliably across diverse GNN architectures. For instance, it improves the test accuracy of GraphSage on ogbn-products by 1.6%, improves the hits@100 score of SEAL on ogbl-ppa by 7.08%, and improves the hits@20 score of GraphSage+Edge-Attr on ogbl-ppi by 6.22%. At the time of this submission, it achieved two first places on the OGB link prediction leaderboard.
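A minimal PyTorch sketch of the NGNN idea of inserting a non-linear feed-forward block inside each GNN layer rather than stacking more GNN layers. The GraphSAGE-style mean-aggregation layer, dense adjacency, and sizes are illustrative stand-ins, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GraphSAGELayer(nn.Module):
    """Mean-aggregation GraphSAGE-style layer on a dense adjacency (illustrative)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(2 * d_in, d_out)

    def forward(self, x, adj):
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh = adj @ x / deg                          # mean of neighbour features
        return torch.relu(self.lin(torch.cat([x, neigh], dim=1)))

class NGNNLayer(nn.Module):
    """NGNN idea: deepen the model by inserting a non-linear feed-forward
    block *inside* each GNN layer instead of stacking more GNN layers."""
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.gnn = GraphSAGELayer(d_in, d_hidden)
        self.ffn = nn.Sequential(nn.Linear(d_hidden, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, d_hidden))

    def forward(self, x, adj):
        return self.ffn(self.gnn(x, adj))

x, adj = torch.randn(5, 16), (torch.rand(5, 5) > 0.5).float()
print(NGNNLayer(16, 32)(x, adj).shape)                 # torch.Size([5, 32])
```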
【3】 Learnable Structural Semantic Readout for Graph Classification Link: https://arxiv.org/abs/2111.11523
Authors: Dongha Lee, Su Kim, Seonghyeon Lee, Chanyoung Park, Hwanjo Yu Affiliations: University of Illinois at Urbana-Champaign (UIUC), Urbana, IL, United States; Pohang University of Science and Technology (POSTECH), Pohang, Republic of Korea; Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea Comments: ICDM 2021. 10 pages, 8 figures Abstract: With the great success of deep learning in various domains, graph neural networks (GNNs) also become a dominant approach to graph classification. With the help of a global readout operation that simply aggregates all node (or node-cluster) representations, existing GNN classifiers obtain a graph-level representation of an input graph and predict its class label using the representation. However, such global aggregation does not consider the structural information of each node, which results in information loss on the global structure. Particularly, it limits the discrimination power by enforcing the same weight parameters of the classifier for all the node representations; in practice, each of them contributes to the target classes differently depending on its structural semantic. In this work, we propose structural semantic readout (SSRead) to summarize the node representations at the position level, which allows modeling position-specific weight parameters for classification as well as effectively capturing the graph semantic relevant to the global structure. Given an input graph, SSRead aims to identify structurally-meaningful positions by using the semantic alignment between its nodes and structural prototypes, which encode the prototypical features of each position. The structural prototypes are optimized to minimize the alignment cost for all training graphs, while the other GNN parameters are trained to predict the class labels. Our experimental results demonstrate that SSRead significantly improves the classification performance and interpretability of GNN classifiers while being compatible with a variety of aggregation functions, GNN architectures, and learning frameworks.
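The sketch below illustrates, under our own simplifying assumptions, a position-level readout with learnable structural prototypes and soft node-to-prototype alignment; the cosine similarity, temperature, and mean pooling are illustrative choices and not the exact SSRead formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructuralSemanticReadout(nn.Module):
    """Position-level readout: align every node to one of K learnable structural
    prototypes, average the node representations assigned to each prototype,
    and concatenate the K position summaries into a graph-level vector."""
    def __init__(self, d, num_prototypes=4):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, d))

    def forward(self, h):                        # h: (num_nodes, d) node representations
        sim = F.normalize(h, dim=1) @ F.normalize(self.prototypes, dim=1).T
        assign = F.softmax(sim / 0.1, dim=1)     # soft node-to-position alignment
        pooled = assign.T @ h                    # (K, d) position-level summaries
        pooled = pooled / assign.sum(dim=0, keepdim=True).T.clamp(min=1e-6)
        return pooled.flatten()                  # graph-level representation (K*d,)

readout = StructuralSemanticReadout(d=32, num_prototypes=4)
print(readout(torch.randn(10, 32)).shape)        # torch.Size([128])
```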
【4】 Graph Neural Networks with Parallel Neighborhood Aggregations for Graph Classification Link: https://arxiv.org/abs/2111.11482
Authors: Siddhant Doshi, Sundeep Prabhakar Chepuri Abstract: We focus on graph classification using a graph neural network (GNN) model that precomputes the node features using a bank of neighborhood aggregation graph operators arranged in parallel. These GNN models have a natural advantage of reduced training and inference time due to the precomputations, but are also fundamentally different from popular GNN variants that update node features through a sequential neighborhood aggregation procedure during training. We provide theoretical conditions under which a generic GNN model with parallel neighborhood aggregations (PA-GNNs, in short) is provably as powerful as the well-known Weisfeiler-Lehman (WL) graph isomorphism test in discriminating non-isomorphic graphs. Although PA-GNN models do not have an apparent relationship with the WL test, we show that the graph embeddings obtained from these two methods are injectively related. We then propose a specialized PA-GNN model, called SPIN, which obeys the developed conditions. We demonstrate via numerical experiments that the developed model achieves state-of-the-art performance on many diverse real-world datasets while maintaining the discriminative power of the WL test and the computational advantage of preprocessing graphs before the training process.
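A small sketch of the precompute-then-train idea: a bank of parallel neighbourhood aggregations is applied once, and a plain MLP plus pooling is trained on the fixed features. The row-normalised powers of the adjacency are an illustrative choice of operator bank, not the specific operators used by SPIN.

```python
import torch
import torch.nn as nn

def precompute_features(x, adj, num_hops=3):
    """Parallel neighbourhood aggregations computed once before training:
    [X, A_norm X, A_norm^2 X, ...] with a row-normalised adjacency (illustrative)."""
    a = adj / adj.sum(dim=1, keepdim=True).clamp(min=1)
    feats, cur = [x], x
    for _ in range(num_hops):
        cur = a @ cur
        feats.append(cur)
    return torch.cat(feats, dim=1)               # (num_nodes, (num_hops+1)*d)

# Because the aggregations are fixed, training reduces to fitting an MLP
# on the precomputed node features followed by a simple graph pooling.
x, adj = torch.randn(7, 8), (torch.rand(7, 7) > 0.5).float()
z = precompute_features(x, adj, num_hops=3)
mlp = nn.Sequential(nn.Linear(z.shape[1], 64), nn.ReLU(), nn.Linear(64, 2))
graph_logits = mlp(z).mean(dim=0)                # mean-pool node scores for graph classification
print(graph_logits.shape)                        # torch.Size([2])
```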
【5】 Precise Learning of Source Code Contextual Semantics via Hierarchical Dependence Structure and Graph Attention Networks Link: https://arxiv.org/abs/2111.11435
Authors: Zhehao Zhao, Bo Yang, Ge Li, Huai Liu, Zhi Jin Affiliations: Key Laboratory of High Confidence Software Technologies, Peking University, Beijing, China; School of Information Science and Technology, Beijing Forestry University, Beijing, China Abstract: Deep learning is being used extensively in a variety of software engineering tasks, e.g., program classification and defect prediction. Although the technique eliminates the required process of feature engineering, the construction of the source code model significantly affects the performance on those tasks. Most recent works were mainly focused on complementing AST-based source code models by introducing contextual dependencies extracted from the CFG. However, all of them pay little attention to the representation of basic blocks, which are the basis of contextual dependencies. In this paper, we integrated the AST and CFG and proposed a novel source code model embedded with hierarchical dependencies. Based on that, we also designed a neural network that depends on the graph attention mechanism. Specifically, we introduced the syntactic structure of the basic block, i.e., its corresponding AST, into the source code model to provide sufficient information and fill the gap. We have evaluated this model on three practical software engineering tasks and compared it with other state-of-the-art methods. The results show that our model can significantly improve the performance. For example, compared to the best performing baseline, our model reduces the scale of parameters by 50% and achieves a 4% improvement in accuracy on the program classification task.
Transformer (2 papers)
【1】 Sparse Fusion for Multimodal Transformers Link: https://arxiv.org/abs/2111.11992
Authors: Yi Ding, Alex Rich, Mason Wang, Noah Stier, Pradeep Sen, Matthew Turk, Tobias Höllerer Affiliations: University of California, Santa Barbara; Saratoga High School; Toyota Technological Institute in Chicago Comments: 11 pages, 4 figures, 5 tables Abstract: Multimodal classification is a core task in human-centric machine learning. We observe that information is highly complementary across modalities, thus unimodal information can be drastically sparsified prior to multimodal fusion without loss of accuracy. To this end, we present Sparse Fusion Transformers (SFT), a novel multimodal fusion method for transformers that performs comparably to existing state-of-the-art methods while having greatly reduced memory footprint and computation cost. Key to our idea is a sparse-pooling block that reduces unimodal token sets prior to cross-modality modeling. Evaluations are conducted on multiple multimodal benchmark datasets for a wide range of classification tasks. State-of-the-art performance is obtained on multiple benchmarks under similar experiment conditions, while reporting up to a six-fold reduction in computational cost and memory requirements. Extensive ablation studies showcase the benefits of combining sparsification and multimodal learning over naive approaches. This paves the way for enabling multimodal learning on low-resource devices.
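An illustrative sketch of a sparse-pooling block that keeps only the top-k scored tokens of each modality before a shared fusion transformer; the scoring rule, value of k, and dimensions are assumptions, not the SFT design.

```python
import torch
import torch.nn as nn

class SparsePool(nn.Module):
    """Score each unimodal token and keep only the top-k before cross-modal fusion,
    shrinking the sequence the fusion transformer has to attend over."""
    def __init__(self, d, k):
        super().__init__()
        self.score = nn.Linear(d, 1)
        self.k = k

    def forward(self, tokens):                   # tokens: (seq_len, d)
        s = self.score(tokens).squeeze(-1)
        idx = s.topk(self.k).indices
        return tokens[idx]                       # (k, d)

d, pool_a, pool_b = 32, SparsePool(32, k=4), SparsePool(32, k=4)
fusion = nn.TransformerEncoderLayer(d_model=d, nhead=4)     # expects (seq, batch, d)
audio, video = torch.randn(50, d), torch.randn(200, d)
fused = fusion(torch.cat([pool_a(audio), pool_b(video)], dim=0).unsqueeze(1))
print(fused.shape)                               # torch.Size([8, 1, 32])
```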
【2】 DBIA: Data-free Backdoor Injection Attack against Transformer Networks Link: https://arxiv.org/abs/2111.11870
Authors: Peizhuo Lv, Hualong Ma, Jiachen Zhou, Ruigang Liang, Kai Chen, Shengzhi Zhang, Yunfei Yang Affiliations: SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, China; School of Cyber Security, University of Chinese Academy of Sciences, China; Department of Computer Science, Metropolitan College, Boston University, USA Abstract: Recently, the transformer architecture has demonstrated its significance in both Natural Language Processing (NLP) and Computer Vision (CV) tasks. Though other network models are known to be vulnerable to the backdoor attack, which embeds triggers in the model and controls the model behavior when the triggers are presented, little is known about whether such an attack is still valid on transformer models and, if so, whether it can be done in a more cost-efficient manner. In this paper, we propose DBIA, a novel data-free backdoor attack against CV-oriented transformer networks, leveraging the inherent attention mechanism of transformers to generate triggers and injecting the backdoor using a poisoned surrogate dataset. We conducted extensive experiments based on three benchmark transformers, i.e., ViT, DeiT and Swin Transformer, on two mainstream image classification tasks, i.e., CIFAR10 and ImageNet. The evaluation results demonstrate that, consuming fewer resources, our approach can embed backdoors with a high success rate and a low impact on the performance of the victim transformers. Our code is available at https://anonymous.4open.science/r/DBIA-825D.
GAN | adversarial | attacks | generation (6 papers)
【1】 Generating GPU Compiler Heuristics using Reinforcement Learning Link: https://arxiv.org/abs/2111.12055
Authors: Ian Colbert, Jake Daly, Norm Rubin Affiliations: Advanced Micro Devices, Inc. Abstract: GPU compilers are complex software programs with many optimizations specific to target hardware. These optimizations are often controlled by heuristics hand-designed by compiler experts using time- and resource-intensive processes. In this paper, we developed a GPU compiler autotuning framework that uses off-policy deep reinforcement learning to generate heuristics that improve the frame rates of graphics applications. Furthermore, we demonstrate the resilience of these learned heuristics to frequent compiler updates by analyzing their stability across a year of code check-ins without retraining. We show that our machine learning-based compiler autotuning framework matches or surpasses the frame rates for 98% of graphics benchmarks, with an average uplift of 1.6% and up to 15.8%.
【2】 Adversarial machine learning for protecting against online manipulation Link: https://arxiv.org/abs/2111.12034
Authors: Stefano Cresci, Marinella Petrocchi, Angelo Spognardi, Stefano Tognazzi Affiliations: Università di Roma 'La Sapienza' Comments: To appear in IEEE Internet Computing. 'Accepted manuscript' version Abstract: Adversarial examples are inputs to a machine learning system that result in an incorrect output from that system. Attacks launched through this type of input can cause severe consequences: for example, in the field of image recognition, a stop signal can be misclassified as a speed limit indication. However, adversarial examples also represent the fuel for a flurry of research directions in different domains and applications. Here, we give an overview of how they can be profitably exploited as powerful tools to build stronger learning models, capable of better withstanding attacks, for two crucial tasks: fake news and social bot detection.
【3】 Adversarial Sampling for Solving Differential Equations with Neural Networks Link: https://arxiv.org/abs/2111.12024
Authors: Kshitij Parwani, Pavlos Protopapas Affiliations: Department of Mathematical Sciences, Indian Institute of Technology, Varanasi; John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States Abstract: Neural network-based methods for solving differential equations have been gaining traction. They work by improving the differential equation residuals of a neural network on a sample of points in each iteration. However, most of them employ standard sampling schemes like uniform or perturbing equally spaced points. We present a novel sampling scheme which samples points adversarially to maximize the loss of the current solution estimate. A sampler architecture is described along with the loss terms used for training. Finally, we demonstrate that this scheme outperforms pre-existing schemes by comparing both on a number of problems.
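A toy sketch of the adversarial-sampling idea for a neural solver of u'(t) = -u with u(0) = 1: collocation points are pushed uphill on the residual loss before each training step, so they concentrate where the current solution is worst. The paper trains a dedicated sampler network; here plain gradient ascent on the points stands in for it, and the step sizes and architecture are illustrative.

```python
import torch

# Trial solution for u'(t) = -u, u(0) = 1: u_hat(t) = 1 + t * net(t),
# so the initial condition holds by construction.
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))

def residual(t):
    t = t.requires_grad_(True)
    u = 1 + t * net(t)
    du = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    return (du + u) ** 2                         # squared ODE residual

# Adversarial sampling: nudge the collocation points uphill on the residual
# before each solver step (ascent on the loss), then train on those points.
t = torch.rand(64, 1)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(200):
    t_adv = t.clone().requires_grad_(True)
    grad_t = torch.autograd.grad(residual(t_adv).sum(), t_adv)[0]
    t_adv = (t_adv + 0.05 * grad_t.sign()).clamp(0, 1).detach()
    opt.zero_grad()
    residual(t_adv).mean().backward()
    opt.step()
```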
【4】 Weakly-Supervised Cloud Detection with Fixed-Point GANs Link: https://arxiv.org/abs/2111.11879
Authors: Joachim Nyborg, Ira Assent Affiliations: Department of Computer Science, Aarhus University, Denmark; FieldSense AS, Aarhus, Denmark Comments: Accepted to the 3rd IEEE Workshop on Machine Learning for Big Data Analytics in Remote Sensing Abstract: The detection of clouds in satellite images is an essential preprocessing task for big data in remote sensing. Convolutional neural networks (CNNs) have greatly advanced the state-of-the-art in the detection of clouds in satellite images, but existing CNN-based methods are costly as they require large amounts of training images with expensive pixel-level cloud labels. To alleviate this cost, we propose Fixed-Point GAN for Cloud Detection (FCD), a weakly-supervised approach. Training with only image-level labels, we learn fixed-point translation between clear and cloudy images, so only clouds are affected during translation. Doing so enables our approach to predict pixel-level cloud labels by translating satellite images to clear ones and setting a threshold to the difference between the two images. Moreover, we propose FCD+, where we exploit the label-noise robustness of CNNs to refine the prediction of FCD, leading to further improvements. We demonstrate the effectiveness of our approach on the Landsat-8 Biome cloud detection dataset, where we obtain performance close to existing fully-supervised methods that train with expensive pixel-level labels. By fine-tuning our FCD+ with just 1% of the available pixel-level labels, we match the performance of fully-supervised methods.
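A sketch of how pixel-level cloud labels can be read off a translation model trained with only image-level supervision: translate the scene to its 'clear' version, then threshold the per-pixel difference. The toy generator and threshold below are placeholders for the trained fixed-point GAN, not the paper's model.

```python
import numpy as np

def cloud_mask(cloudy_img: np.ndarray, translate_to_clear, threshold: float = 0.1):
    """Pixel-level cloud labels from an image-level-supervised translator:
    translate the scene to its 'clear' version, then mark pixels that changed a lot.
    `translate_to_clear` stands in for the trained fixed-point GAN generator."""
    clear_img = translate_to_clear(cloudy_img)
    diff = np.abs(cloudy_img - clear_img).mean(axis=-1)   # per-pixel change magnitude
    return diff > threshold                               # boolean cloud mask

# Toy stand-in generator: darken the image slightly (a real generator is a trained CNN).
fake_generator = lambda img: np.clip(img - 0.2, 0.0, 1.0)
mask = cloud_mask(np.random.rand(64, 64, 3), fake_generator)
print(mask.shape, mask.dtype)                              # (64, 64) bool
```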
【5】 Machine unlearning via GAN Link: https://arxiv.org/abs/2111.11869
Authors: Kongyang Chen, Yao Huang, Yiwen Wang Abstract: Machine learning models, especially deep models, may unintentionally remember information about their training data. Malicious attackers can thus pilfer some property of the training data by attacking the model via a membership inference attack or a model inversion attack. Some regulations, such as the EU's GDPR, have enacted "The Right to Be Forgotten" to protect users' data privacy, enhancing individuals' sovereignty over their data. Therefore, removing training data information from a trained model has become a critical issue. In this paper, we present a GAN-based algorithm to delete data in deep models, which significantly improves deleting speed compared to retraining from scratch, especially in complicated scenarios. We have experimented on five commonly used datasets, and the experimental results show the efficiency of our method.
【6】 A Comparison of State-of-the-Art Techniques for Generating Adversarial Malware Binaries Link: https://arxiv.org/abs/2111.11487
Authors: Prithviraj Dasgupta, Zachariah Osman Affiliations: U.S. Naval Research Laboratory, Washington, D.C. Comments: 18 pages, 7 figures; summer project report from NREIP internship at Naval Research Laboratory Abstract: We consider the problem of generating adversarial malware by a cyber-attacker where the attacker's task is to strategically modify certain bytes within existing binary malware files, so that the modified files are able to evade a malware detector such as a machine learning-based malware classifier. We have evaluated three recent adversarial malware generation techniques using binary malware samples drawn from a single, publicly available malware data set and compared their performances for evading a machine-learning based malware classifier called MalConv. Our results show that among the compared techniques, the most effective technique is the one that strategically modifies bytes in a binary's header. We conclude by discussing the lessons learned and future research directions on the topic of adversarial malware generation.
Semi-/weakly-/un-/fully-supervised | uncertainty | active learning (8 papers)
【1】 DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning Link: https://arxiv.org/abs/2111.12062
Authors: Alex Tamkin, Vincent Liu, Rongfei Lu, Daniel Fein, Colin Schultz, Noah Goodman Affiliations: Stanford University Comments: NeurIPS 2021, Datasets & Benchmarks Track Abstract: Self-supervised learning algorithms, including BERT and SimCLR, have enabled significant strides in fields like natural language processing, computer vision, and speech processing. However, these algorithms are domain-specific, meaning that new self-supervised learning algorithms must be developed for each new setting, including myriad healthcare, scientific, and multimodal domains. To catalyze progress toward domain-agnostic methods, we introduce DABS: a Domain-Agnostic Benchmark for Self-supervised learning. To perform well on DABS, an algorithm is evaluated on seven diverse domains: natural images, multichannel sensor data, English text, speech recordings, multilingual text, chest x-rays, and images with text descriptions. Each domain contains an unlabeled dataset for pretraining; the model is then scored based on its downstream performance on a set of labeled tasks in the domain. We also present e-Mix and ShED: two baseline domain-agnostic algorithms; their relatively modest performance demonstrates that significant progress is needed before self-supervised learning is an out-of-the-box solution for arbitrary domains. Code for benchmark datasets and baseline algorithms is available at https://github.com/alextamkin/dabs.
【2】 Uncertainty estimation under model misspecification in neural network regression Link: https://arxiv.org/abs/2111.11763
Authors: Maria R. Cervera, Rafael Dätwyler, Francesco D'Angelo, Hamza Keurti, Benjamin F. Grewe, Christian Henning Affiliations: Institute of Neuroinformatics, University of Zürich and ETH Zürich, Zürich, Switzerland; Max Planck ETH Center for Learning Systems; Institute of Theoretical Computer Science, ETH Zürich, Zürich, Switzerland Comments: Published at the NeurIPS 2021 workshop "Your Model Is Wrong: Robustness and Misspecification in Probabilistic Modeling" Abstract: Although neural networks are powerful function approximators, the underlying modelling assumptions ultimately define the likelihood and thus the hypothesis class they are parameterizing. In classification, these assumptions are minimal as the commonly employed softmax is capable of representing any categorical distribution. In regression, however, restrictive assumptions on the type of continuous distribution to be realized are typically placed, like the dominant choice of training via mean-squared error and its underlying Gaussianity assumption. Recently, modelling advances allow being agnostic to the type of continuous distribution to be modelled, granting regression the flexibility of classification models. While past studies stress the benefit of such flexible regression models in terms of performance, here we study the effect of the model choice on uncertainty estimation. We highlight that under model misspecification, aleatoric uncertainty is not properly captured, and that a Bayesian treatment of a misspecified model leads to unreliable epistemic uncertainty estimates. Overall, our study provides an overview of how modelling choices in regression may influence uncertainty estimation and thus any downstream decision making process.
【3】 CoDiM: Learning with Noisy Labels via Contrastive Semi-Supervised Learning Link: https://arxiv.org/abs/2111.11652
Authors: Xin Zhang, Zixuan Liu, Kaiwen Xiao, Tian Shen, Junzhou Huang, Wei Yang, Dimitris Samaras, Xiao Han Affiliations: Stony Brook University, USA; University of Washington, USA; Tencent AI Lab, China Comments: 19 pages, 9 figures, conference paper Abstract: Labels are costly and sometimes unreliable. Noisy label learning, semi-supervised learning, and contrastive learning are three different strategies for designing learning processes requiring less annotation cost. Semi-supervised learning and contrastive learning have been recently demonstrated to improve learning strategies that address datasets with noisy labels. Still, the inner connections between these fields as well as the potential to combine their strengths together have only started to emerge. In this paper, we explore further ways and advantages to fuse them. Specifically, we propose CSSL, a unified Contrastive Semi-Supervised Learning algorithm, and CoDiM (Contrastive DivideMix), a novel algorithm for learning with noisy labels. CSSL leverages the power of classical semi-supervised learning and contrastive learning technologies and is further adapted to CoDiM, which learns robustly from multiple types and levels of label noise. We show that CoDiM brings consistent improvements and achieves state-of-the-art results on multiple benchmarks.
【4】 Semi-Supervised Learning with Taxonomic Labels Link: https://arxiv.org/abs/2111.11595
Authors: Jong-Chyi Su, Subhransu Maji Affiliations: University of Massachusetts Amherst, Amherst, MA, USA Comments: BMVC 2021 Abstract: We propose techniques to incorporate coarse taxonomic labels to train image classifiers in fine-grained domains. Such labels can often be obtained with a smaller effort for fine-grained domains such as the natural world where categories are organized according to a biological taxonomy. On the Semi-iNat dataset consisting of 810 species across three Kingdoms, incorporating Phylum labels improves the Species level classification accuracy by 6% in a transfer learning setting using ImageNet pre-trained models. Incorporating the hierarchical label structure with a state-of-the-art semi-supervised learning algorithm called FixMatch improves the performance further by 1.3%. The relative gains are larger when detailed labels such as Class or Order are provided, or when models are trained from scratch. However, we find that most methods are not robust to the presence of out-of-domain data from novel classes. We propose a technique to select relevant data from a large collection of unlabeled images guided by the hierarchy, which improves the robustness. Overall, our experiments show that semi-supervised learning with coarse taxonomic labels is practical for training classifiers in fine-grained domains.
【5】 RIO: Rotation-equivariance supervised learning of robust inertial odometry Link: https://arxiv.org/abs/2111.11676
Authors: Caifa Zhou, Xiya Cao, Dandan Zeng, Yongliang Wang Affiliations: Riemann Lab, Laboratories, Huawei Technologies Co. Ltd Comments: 12 pages, 17 figures, 2 tables Abstract: This paper introduces rotation-equivariance as a self-supervisor to train inertial odometry models. We demonstrate that the self-supervised scheme provides a powerful supervisory signal at the training phase as well as at the inference stage. It reduces the reliance on massive amounts of labeled data for training a robust model and makes it possible to update the model using various unlabeled data. Further, we propose adaptive Test-Time Training (TTT) based on uncertainty estimations in order to enhance the generalizability of the inertial odometry to various unseen data. We show in experiments that the Rotation-equivariance-supervised Inertial Odometry (RIO) trained with 30% of the data achieves on-par performance with a model trained with the whole database. Adaptive TTT improves model performance in all cases and yields more than 25% improvement under several scenarios.
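An illustrative PyTorch loss implementing the generic rotation-equivariance self-supervision idea: rotating the input about the vertical axis should rotate the predicted planar displacement by the same angle, with no labels required. The toy model, window length, and 2-D simplification are assumptions, not RIO's architecture.

```python
import torch

def yaw_rotation(theta):
    c, s = torch.cos(theta), torch.sin(theta)
    return torch.stack([torch.stack([c, -s]), torch.stack([s, c])])   # 2x2 rotation matrix

def rotation_equivariance_loss(model, imu_window):
    """Self-supervision: rotating the IMU window about the gravity axis by theta should
    rotate the predicted 2-D displacement by the same theta."""
    theta = torch.rand(()) * 2 * torch.pi
    R = yaw_rotation(theta)
    rotated_input = imu_window @ R.T             # rotate the horizontal channels
    pred, pred_rot = model(imu_window), model(rotated_input)
    return ((pred @ R.T) - pred_rot).pow(2).mean()

# Toy model: window of 2-D IMU readings (T, 2) -> 2-D displacement.
model = torch.nn.Sequential(torch.nn.Flatten(0), torch.nn.Linear(200 * 2, 2))
loss = rotation_equivariance_loss(model, torch.randn(200, 2))
loss.backward()
```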
【6】 Weight Pruning and Uncertainty in Radio Galaxy Classification Link: https://arxiv.org/abs/2111.11654
Authors: Devina Mohan, Anna Scaife Affiliations: Department of Physics & Astronomy, University of Manchester, UK Comments: Accepted in: Fourth Workshop on Machine Learning and the Physical Sciences (35th Conference on Neural Information Processing Systems; NeurIPS 2021); final version Abstract: In this work we use variational inference to quantify the degree of epistemic uncertainty in model predictions of radio galaxy classification and show that the level of model posterior variance for individual test samples is correlated with human uncertainty when labelling radio galaxies. We explore the model performance and uncertainty calibration for a variety of different weight priors and suggest that a sparse prior produces more well-calibrated uncertainty estimates. Using the posterior distributions for individual weights, we show that signal-to-noise ratio (SNR) ranking allows pruning of the fully-connected layers to the level of 30% without significant loss of performance, and that this pruning increases the predictive uncertainty in the model. Finally we show that, like other work in this field, we experience a cold posterior effect. We examine whether adapting the cost function in our model to accommodate model misspecification can compensate for this effect, but find that it does not make a significant difference. We also examine the effect of principled data augmentation and find that it improves upon the baseline but does not compensate for the observed effect fully. We interpret this as the cold posterior effect being due to the overly effective curation of our training sample leading to likelihood misspecification, and raise this as a potential issue for Bayesian deep learning approaches to radio galaxy classification in future.
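A small numpy sketch of signal-to-noise-ratio pruning for a variational (mean/std) weight posterior, of the kind the abstract describes for the fully-connected layers; the keep fraction and layer shape are illustrative, not the paper's settings.

```python
import numpy as np

def snr_prune_mask(mu: np.ndarray, sigma: np.ndarray, keep_fraction: float = 0.7):
    """SNR pruning for a variational layer: rank weights by |posterior mean| / posterior std
    and zero out the lowest-SNR fraction."""
    snr = np.abs(mu) / (sigma + 1e-12)
    cutoff = np.quantile(snr, 1.0 - keep_fraction)
    return snr >= cutoff                         # boolean mask of weights to keep

mu, sigma = np.random.randn(512, 256), np.abs(np.random.randn(512, 256)) * 0.1
mask = snr_prune_mask(mu, sigma, keep_fraction=0.7)   # the abstract reports ~30% is prunable
print(mask.mean())                                    # ~0.7 of the weights are kept
```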
【7】 Predicting Osteoarthritis Progression in Radiographs via Unsupervised Representation Learning Link: https://arxiv.org/abs/2111.11439
Authors: Tianyu Han, Jakob Nikolas Kather, Federico Pedersoli, Markus Zimmermann, Sebastian Keil, Maximilian Schulze-Hagen, Marc Terwoelbeck, Peter Isfort, Christoph Haarburger, Fabian Kiessling, Volkmar Schulz, Christiane Kuhl, Sven Nebelung, Daniel Truhn Affiliations: Physics of Molecular Imaging Systems, Experimental Molecular Imaging, RWTH Aachen University, Germany; Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany Abstract: Osteoarthritis (OA) is the most common joint disorder affecting substantial proportions of the global population, primarily the elderly. Despite its individual and socioeconomic burden, the onset and progression of OA can still not be reliably predicted. Aiming to fill this diagnostic gap, we introduce an unsupervised learning scheme based on generative models to predict the future development of OA based on knee joint radiographs. Using longitudinal data from osteoarthritis studies, we explore the latent temporal trajectory to predict a patient's future radiographs up to the eight-year follow-up visit. Our model predicts the risk of progression towards OA and surpasses its supervised counterpart whose input was provided by seven experienced radiologists. With the support of the model, sensitivity, specificity, positive predictive value, and negative predictive value increased significantly from 42.1% to 51.6%, from 72.3% to 88.6%, from 28.4% to 57.6%, and from 83.9% to 88.4%, respectively, while without such support, radiologists performed only slightly better than random guessing. Our predictive model improves predictions on OA onset and progression, despite requiring no human annotation in the training phase.
【8】 Fink: early supernovae Ia classification using active learning Link: https://arxiv.org/abs/2111.11438
Authors: Marco Leoni, Emille E. O. Ishida, Julien Peloton, Anais Möller Affiliations: Université Paris-Saclay; Université Clermont Auvergne; Swinburne University of Technology Comments: 8 pages, 7 figures - submitted to Astronomy and Astrophysics. Comments are welcome Abstract: We describe how the Fink broker early supernova Ia classifier optimizes its ML classifications by employing an active learning (AL) strategy. We demonstrate the feasibility of implementation of such strategies in the current Zwicky Transient Facility (ZTF) public alert data stream. We compare the performance of two AL strategies: uncertainty sampling and random sampling. Our pipeline consists of 3 stages: feature extraction, classification and learning strategy. Starting from an initial sample of 10 alerts (5 SN Ia and 5 non-Ia), we let the algorithm identify which alert should be added to the training sample. The system is allowed to evolve through 300 iterations. Our data set consists of 23 840 alerts from the ZTF with confirmed classification via cross-match with the SIMBAD database and the Transient Name Server (TNS), 1 600 of which were SNe Ia (1 021 unique objects). The data configuration, after the learning cycle was completed, consists of 310 alerts for training and 23 530 for testing. Averaging over 100 realizations, the classifier achieved 89% purity and 54% efficiency. From 01/November/2020 to 31/October/2021 Fink has applied its early supernova Ia module to the ZTF stream and communicated promising SN Ia candidates to the TNS. From the 535 spectroscopically classified Fink candidates, 459 (86%) were proven to be SNe Ia. Our results confirm the effectiveness of active learning strategies for guiding the construction of optimal training samples for astronomical classifiers. It demonstrates in real data that the performance of learning algorithms can be highly improved without the need of extra computational resources or overwhelmingly large training samples. This is, to our knowledge, the first application of AL to real alerts data.
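A compact sketch of the uncertainty-sampling loop described above, using a stand-in random-forest classifier and synthetic features in place of the Fink pipeline's feature extraction: starting from 5 Ia and 5 non-Ia examples, the single most uncertain alert is added to the training sample at each iteration. All model choices and data here are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def uncertainty_sampling_loop(X_pool, y_pool, n_iterations=300):
    """Active learning: grow the labelled set one maximally-uncertain alert at a time."""
    labelled = list(np.where(y_pool == 1)[0][:5]) + list(np.where(y_pool == 0)[0][:5])
    for _ in range(n_iterations):
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(X_pool[labelled], y_pool[labelled])
        pool = np.setdiff1d(np.arange(len(X_pool)), labelled)
        proba = clf.predict_proba(X_pool[pool])[:, 1]
        labelled.append(pool[np.argmin(np.abs(proba - 0.5))])   # most uncertain alert
    return clf, labelled

X = np.random.rand(2000, 12)                  # stand-in for extracted alert features
y = np.random.randint(0, 2, 2000)             # stand-in Ia / non-Ia labels
clf, used = uncertainty_sampling_loop(X, y, n_iterations=50)
```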
Transfer | Zero/Few/One-Shot | adaptation (6 papers)
【1】 Minimizing subject-dependent calibration for BCI with Riemannian transfer learning Link: https://arxiv.org/abs/2111.12071
Authors: Salim Khazem, Sylvain Chevallier, Quentin Barthélemy, Karim Haroun, Camille Noûs Affiliations: Université de Versailles Saint-Quentin; University Paris-Saclay Comments: 4 pages, 2 figures, 1 table, Conference on Neural Engineering Abstract: Calibration is still an important issue for user experience in Brain-Computer Interfaces (BCI). Common experimental designs often involve a lengthy training period that raises the cognitive fatigue, before even starting to use the BCI. Reducing or suppressing this subject-dependent calibration is possible by relying on advanced machine learning techniques, such as transfer learning. Building on Riemannian BCI, we present a simple and effective scheme to train a classifier on data recorded from different subjects, to reduce the calibration while preserving good performances. The main novelty of this paper is to propose a unique approach that could be applied to very different paradigms. To demonstrate the robustness of this approach, we conducted a meta-analysis on multiple datasets for three BCI paradigms: event-related potentials (P300), motor imagery and SSVEP. Relying on the MOABB open source framework to ensure the reproducibility of the experiments and the statistical analysis, the results clearly show that the proposed approach could be applied to any kind of BCI paradigm and in most of the cases significantly improves the classifier reliability. We point out some key features to further improve transfer learning methods.
【2】 Adaptive Multi-Goal Exploration Link: https://arxiv.org/abs/2111.12045
Authors: Jean Tarbouriech, Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Michal Valko, Alessandro Lazaric Affiliations: Scool team; Otto von Guericke University Magdeburg; DeepMind Abstract: We introduce a generic strategy for provably efficient multi-goal exploration. It relies on AdaGoal, a novel goal selection scheme that is based on a simple constrained optimization problem, which adaptively targets goal states that are neither too difficult nor too easy to reach according to the agent's current knowledge. We show how AdaGoal can be used to tackle the objective of learning an $\epsilon$-optimal goal-conditioned policy for all the goal states that are reachable within $L$ steps in expectation from a reference state $s_0$ in a reward-free Markov decision process. In the tabular case with $S$ states and $A$ actions, our algorithm requires $\tilde{O}(L^3 S A \epsilon^{-2})$ exploration steps, which is nearly minimax optimal. We also readily instantiate AdaGoal in linear mixture Markov decision processes, which yields the first goal-oriented PAC guarantee with linear function approximation. Beyond its strong theoretical guarantees, AdaGoal is anchored in the high-level algorithmic structure of existing methods for goal-conditioned deep reinforcement learning.
【3】 Predicting High-Flow Nasal Cannula Failure in an ICU Using a Recurrent Neural Network with Transfer Learning and Input Data Perseveration: A Retrospective Analysis Link: https://arxiv.org/abs/2111.11846
Authors: George A. Pappy, Melissa D. Aczon, Randall C. Wetzel, David R. Ledbetter Affiliations: Virtual Pediatric Intensive Care Unit, Children's Hospital Los Angeles, Los Angeles, CA Comments: 13 pages, 7 figures, 6 tables Abstract: High Flow Nasal Cannula (HFNC) provides non-invasive respiratory support for critically ill children who may tolerate it more readily than other Non-Invasive (NIV) techniques. Timely prediction of HFNC failure can provide an indication for increasing respiratory support. This work developed and compared machine learning models to predict HFNC failure. A retrospective study was conducted using the EMR of patients admitted to a tertiary pediatric ICU from January 2010 to February 2020. A Long Short-Term Memory (LSTM) model was trained to generate a continuous prediction of HFNC failure. Performance was assessed using the area under the receiver operating curve (AUROC) at various times following HFNC initiation. The sensitivity, specificity, positive and negative predictive values (PPV, NPV) of predictions at two hours after HFNC initiation were also evaluated. These metrics were also computed in a cohort with primarily respiratory diagnoses. 834 HFNC trials [455 training, 173 validation, 206 test] met the inclusion criteria, of which 175 [103, 30, 42] (21.0%) escalated to NIV or intubation. The LSTM models trained with transfer learning generally performed better than the LR models, with the best LSTM model achieving an AUROC of 0.78, vs 0.66 for the LR, two hours after initiation. Machine learning models trained using EMR data were able to identify children at risk of failing HFNC within 24 hours of initiation. LSTM models that incorporated transfer learning, input data perseveration and ensembling showed improved performance over the LR and standard LSTM models.
【4】 Dynamic Regret for Strongly Adaptive Methods and Optimality of Online KRR Link: https://arxiv.org/abs/2111.11550
Authors: Dheeraj Baby, Hilaf Hasson, Yuyang Wang Affiliations: University of California, Santa Barbara; Amazon Research Abstract: We consider the framework of non-stationary Online Convex Optimization where a learner seeks to control its dynamic regret against an arbitrary sequence of comparators. When the loss functions are strongly convex or exp-concave, we demonstrate that Strongly Adaptive (SA) algorithms can be viewed as a principled way of controlling dynamic regret in terms of the path variation $V_T$ of the comparator sequence. Specifically, we show that SA algorithms enjoy $\tilde O(\sqrt{T V_T} \vee \log T)$ and $\tilde O(\sqrt{d T V_T} \vee d \log T)$ dynamic regret for strongly convex and exp-concave losses respectively, without apriori knowledge of $V_T$. The versatility of the principled approach is further demonstrated by the novel results in the setting of learning against bounded linear predictors and online regression with Gaussian kernels. Under a related setting, the second component of the paper addresses an open question posed by Zhdanov and Kalnishkan (2010) that concerns online kernel regression with squared error losses. We derive a new lower bound on a certain penalized regret which establishes the near minimax optimality of online Kernel Ridge Regression (KRR). Our lower bound can be viewed as an RKHS extension to the lower bound derived in Vovk (2001) for online linear regression in finite dimensions.
【5】 Component Transfer Learning for Deep RL Based on Abstract Representations Link: https://arxiv.org/abs/2111.11525
Authors: Geoffrey van Driessel, Vincent Francois-Lavet Affiliations: VU Amsterdam Comments: Workshop paper, NeurIPS 2021 Abstract: In this work we investigate a specific transfer learning approach for deep reinforcement learning in the context where the internal dynamics between two tasks are the same but the visual representations differ. We learn a low-dimensional encoding of the environment, meant to capture summarizing abstractions, from which the internal dynamics and value functions are learned. Transfer is then obtained by freezing the learned internal dynamics and value functions, thus reusing the shared low-dimensional embedding space. When retraining the encoder for transfer, we make several observations: (i) in some cases, there are local minima that have small losses but a mismatching embedding space, resulting in poor task performance and (ii) in the absence of local minima, the output of the encoder converges in our experiments to the same embedding space, which leads to a fast and efficient transfer as compared to learning from scratch. The local minima are caused by the reduced degree of freedom of the optimization process caused by the frozen models. We also find that the transfer performance is heavily reliant on the base model; some base models often result in a successful transfer, whereas other base models often result in a failing transfer.
【6】 Zero-Shot Open-Book Question Answering Link: https://arxiv.org/abs/2111.11520
Authors: Sia Gholami, Mehdi Noori Affiliations: Amazon Web Services, CA, USA Abstract: Open book question answering is a subset of question answering tasks where the system aims to find answers in a given set of documents (open-book) and common knowledge about a topic. This article proposes a solution for answering natural language questions from a corpus of Amazon Web Services (AWS) technical documents with no domain-specific labeled data (zero-shot). These questions can have yes-no-none answers, short answers, long answers, or any combination of the above. This solution comprises a two-step architecture in which a retriever finds the right document and an extractor finds the answers in the retrieved document. We are introducing a new test dataset for open-book QA based on real customer questions on AWS technical documentation. After experimenting with several information retrieval systems and extractor models based on extractive language models, the solution attempts to find the yes-no-none answers and text answers in the same pass. The model is trained on the Stanford Question Answering Dataset - SQuAD (Rajpurkar et al., 2016) and Natural Questions (Kwiatkowski et al., 2019) datasets. We were able to achieve 49% F1 and 39% exact match score (EM) end-to-end with no domain-specific training.
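A minimal sketch of the two-step retriever/extractor architecture: TF-IDF retrieval of the most relevant document, followed by a SQuAD-trained extractive reader that finds the answer span. The toy documents and the specific Hugging Face model name are illustrative choices, not the systems evaluated in the paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
from transformers import pipeline

docs = [
    "Amazon S3 buckets are private by default and can be made public via bucket policies.",
    "AWS Lambda functions can run for a maximum of 15 minutes per invocation.",
]
question = "How long can a Lambda function run?"

# Step 1 (retriever): rank documents by TF-IDF cosine similarity to the question.
vec = TfidfVectorizer().fit(docs + [question])
scores = linear_kernel(vec.transform([question]), vec.transform(docs))[0]
best_doc = docs[scores.argmax()]

# Step 2 (extractor): a SQuAD-trained reader finds the answer span in the retrieved
# document; the model name below is an illustrative choice.
reader = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
print(reader(question=question, context=best_doc))   # e.g. {'answer': '15 minutes', ...}
```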
Reinforcement learning (3 papers)
【1】 Semantic-Aware Collaborative Deep Reinforcement Learning Over Wireless Cellular Networks Link: https://arxiv.org/abs/2111.12064
Authors: Fatemeh Lotfi, Omid Semiari, Walid Saad Affiliations: Department of Electrical and Computer Engineering, University of Colorado Abstract: Collaborative deep reinforcement learning (CDRL) algorithms in which multiple agents can coordinate over a wireless network are a promising approach to enable future intelligent and autonomous systems that rely on real-time decision-making in complex dynamic environments. Nonetheless, in practical scenarios, CDRL faces many challenges due to the heterogeneity of agents and their learning tasks, different environments, time constraints of the learning, and resource limitations of wireless networks. To address these challenges, in this paper, a novel semantic-aware CDRL method is proposed to enable a group of heterogeneous untrained agents with semantically-linked DRL tasks to collaborate efficiently across a resource-constrained wireless cellular network. To this end, a new heterogeneous federated DRL (HFDRL) algorithm is proposed to select the best subset of semantically relevant DRL agents for collaboration. The proposed approach then jointly optimizes the training loss and wireless bandwidth allocation for the cooperating selected agents in order to train each agent within the time limit of its real-time task. Simulation results show the superior performance of the proposed algorithm compared to state-of-the-art baselines.
【2】 Multi-agent Bayesian Deep Reinforcement Learning for Microgrid Energy Management under Communication Failures Link: https://arxiv.org/abs/2111.11868
Authors: Hao Zhou, Atakan Aral, Ivona Brandic, Melike Erol-Kantarci Affiliations: University of Vienna Abstract: Microgrids (MGs) are important players for the future transactive energy systems where a number of intelligent Internet of Things (IoT) devices interact for energy management in the smart grid. Although there have been many works on MG energy management, most studies assume a perfect communication environment, where communication failures are not considered. In this paper, we consider the MG as a multi-agent environment with IoT devices in which AI agents exchange information with their peers for collaboration. However, the collaboration information may be lost due to communication failures or packet loss. Such events may affect the operation of the whole MG. To this end, we propose a multi-agent Bayesian deep reinforcement learning (BA-DRL) method for MG energy management under communication failures. We first define a multi-agent partially observable Markov decision process (MA-POMDP) to describe agents under communication failures, in which each agent can update its beliefs on the actions of its peers. Then, we apply a double deep Q-learning (DDQN) architecture for Q-value estimation in BA-DRL, and propose a belief-based correlated equilibrium for the joint-action selection of multi-agent BA-DRL. Finally, the simulation results show that BA-DRL is robust to both power supply uncertainty and communication failure uncertainty. BA-DRL has 4.1% and 10.3% higher reward than Nash Deep Q-learning (Nash-DQN) and the alternating direction method of multipliers (ADMM), respectively, under 1% communication failure probability.
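For reference, a short PyTorch sketch of the standard double-DQN target used for Q-value estimation (the online network selects the next action, the target network evaluates it, reducing over-estimation bias); the toy linear Q-networks and shapes are illustrative, not the paper's microgrid setup.

```python
import torch

def ddqn_target(reward, next_state, done, q_online, q_target, gamma=0.99):
    """Double-DQN target: r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    with torch.no_grad():
        next_action = q_online(next_state).argmax(dim=1, keepdim=True)
        next_q = q_target(next_state).gather(1, next_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q

q_online = torch.nn.Linear(8, 4)     # toy Q-networks: 8-dim state, 4 discrete actions
q_target = torch.nn.Linear(8, 4)
target = ddqn_target(torch.zeros(32), torch.randn(32, 8), torch.zeros(32), q_online, q_target)
print(target.shape)                  # torch.Size([32])
```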
【3】 Inducing Functions through Reinforcement Learning without Task Specification Link: https://arxiv.org/abs/2111.11647
Authors: Junmo Cho, Dong-Hwan Lee, Young-Gyu Yoon Affiliations: School of Electrical Engineering, KAIST, Daejeon, Republic of Korea Comments: 14 pages Abstract: We report a bio-inspired framework for training a neural network through reinforcement learning to induce high level functions within the network. Based on the interpretation that animals have gained their cognitive functions such as object recognition - without ever being specifically trained for them - as a result of maximizing their fitness to the environment, we place our agent in an environment where developing certain functions may facilitate decision making. The experimental results show that high level functions, such as image classification and hidden variable estimation, can be naturally and simultaneously induced without any pre-training or specifying them.
Symbolic | symbolic learning (1 paper)
【1】 Learning Symbolic Rules for Reasoning in Quasi-Natural Language Link: https://arxiv.org/abs/2111.12038
Authors: Kaiyu Yang, Jia Deng Affiliations: Department of Computer Science, Princeton University Abstract: Symbolic reasoning, rule-based symbol manipulation, is a hallmark of human intelligence. However, rule-based systems have had limited success competing with learning-based systems outside formalized domains such as automated theorem proving. We hypothesize that this is due to the manual construction of rules in past attempts. In this work, we ask how we can build a rule-based system that can reason with natural language input but without the manual construction of rules. We propose MetaQNL, a "Quasi-Natural" language that can express both formal logic and natural language sentences, and MetaInduce, a learning algorithm that induces MetaQNL rules from training data consisting of questions and answers, with or without intermediate reasoning steps. Our approach achieves state-of-the-art accuracy on multiple reasoning benchmarks; it learns compact models with much less data and produces not only answers but also checkable proofs. Further, experiments on a real-world morphological analysis benchmark show that it is possible for our method to handle noise and ambiguity. Code will be released at https://github.com/princeton-vl/MetaQNL.
医学相关(1篇)
【1】 Prediction Model for Mortality Analysis of Pregnant Women Affected With COVID-19 标题:用于孕妇冠状病毒感染死亡分析的预测模型 链接:https://arxiv.org/abs/2111.11477
作者:Quazi Adibur Rahman Adib,Sidratul Tanzila Tasmi,Md. Shahriar Islam Bhuiyan,Md. Mohsin Sarker Raihan,Abdullah Bin Shams 机构:Department of Computer Science and Engineering, Brac University, Dhaka, Bangladesh, Islamic University of Technology, Gazipur, Bangladesh, Department of Biomedical Engineering, Khulna University of Engineering and Technology, Khulna, Bangladesh 备注:Accepted for publication in IEEE Explore, ICCIT-2021 摘要:COVID-19是一场持续的全球大流行,在公共卫生部门和全球经济中造成了前所未有的破坏。SARS-CoV-2病毒是导致冠状病毒病快速传播的病原体。由于其传染性,该病毒很容易感染无保护和暴露的个人,症状从轻微到严重不等。考虑到病毒将如何影响母亲和新生儿的健康,研究病毒对孕妇和新生儿的影响现在是全球公众和公共卫生工作者关注的问题。本文旨在开发一个预测模型,根据记录的症状(呼吸困难、咳嗽、鼻漏、关节痛和肺炎诊断)来估计确诊为新冠病毒的母亲死亡的可能性。在我们的研究中使用的机器学习模型有支持向量机、决策树、随机森林、梯度增强和人工神经网络。这些模型提供了令人印象深刻的结果,能够在给定输入的情况下准确预测孕妇死亡率。其中3个模型(ANN、梯度增强、随机森林)的精确率为100%;最高准确率(梯度增强、ANN)为95%,最高召回率(支持向量机)为92.75%,最高F1得分(梯度增强、ANN)为94.66%。由于该模型的准确性,怀孕母亲可以根据其因病毒死亡的可能性立即接受医疗救治;全球的医护人员可以利用该模型筛选需要紧急处理的患者,从而最终降低确诊COVID-19孕妇的死亡率。 摘要:COVID-19 is an ongoing global pandemic which has caused unprecedented disruptions in the public health sector and the global economy. The virus, SARS-CoV-2, is responsible for the rapid transmission of coronavirus disease. Due to its contagious nature, the virus can easily infect an unprotected and exposed individual, with symptoms ranging from mild to severe. The study of the virus effects on pregnant mothers and neonates is now a concerning issue globally among civilians and public health workers, considering how the virus will affect the mother and the neonates health. This paper aims to develop a predictive model to estimate the possibility of death for a COVID-diagnosed mother based on documented symptoms: dyspnea, cough, rhinorrhea, arthralgia, and the diagnosis of pneumonia. The machine learning models that have been used in our study are support vector machine, decision tree, random forest, gradient boosting, and artificial neural network. The models have provided impressive results and can accurately predict the mortality of pregnant mothers with a given input. The precision rate for 3 models (ANN, Gradient Boosting, Random Forest) is 100%. The highest accuracy score (Gradient Boosting, ANN) is 95%, the highest recall (Support Vector Machine) is 92.75%, and the highest F1 score (Gradient Boosting, ANN) is 94.66%. Due to the accuracy of the model, pregnant mothers can expect immediate medical treatment based on their possibility of death due to the virus. The model can be utilized by health workers globally to list down emergency patients, which can ultimately reduce the death rate of COVID-19 diagnosed pregnant mothers.
推荐(1篇)
【1】 Blockchain-based Recommender Systems: Applications, Challenges and Future Opportunities 标题:基于区块链的推荐系统:应用、挑战和未来机遇 链接:https://arxiv.org/abs/2111.11509
作者:Yassine Himeur,Aya Sayed,Abdullah Alsalemi,Faycal Bensaali,Abbes Amira,Iraklis Varlamis,Magdalini Eirinaki,Christos Sardianos,George Dimitrakopoulos 机构:Department of Electrical Engineering Qatar University Doha Qatar, Institute of Artificial Intelligence De Montfort University Leicester United Kingdom, Department of Computer Science University of Sharjah UAE 备注:None 摘要:推荐系统已广泛应用于不同的应用领域,包括节能、电子商务、医疗保健、社交媒体等。此类应用需要分析和挖掘大量不同类型的用户数据,包括人口统计、偏好、社交互动、,以开发精确的推荐系统。这类数据集通常包含敏感信息,但大多数推荐系统关注模型的准确性,而忽略了与安全和用户隐私相关的问题。尽管使用不同的风险降低技术努力克服这些问题,但没有一种技术能够完全成功地确保密码安全和保护用户的私人信息。为了弥合这一差距,区块链技术被认为是促进推荐系统安全性和隐私保护的一种有前景的策略,这不仅是因为它的安全性和隐私性的显著特点,而且还因为它的弹性、适应性、容错性和信任特性。本文全面回顾了基于区块链的推荐系统,包括挑战、开放问题和解决方案。因此,我们引入了一个设计良好的分类法来描述安全和隐私挑战,概述现有框架,并在指出未来研究机会之前讨论它们在使用区块链时的应用和好处。 摘要:Recommender systems have been widely used in different application domains including energy-preservation, e-commerce, healthcare, social media, etc. Such applications require the analysis and mining of massive amounts of various types of user data, including demographics, preferences, social interactions, etc. in order to develop accurate and precise recommender systems. Such datasets often include sensitive information, yet most recommender systems are focusing on the models' accuracy and ignore issues related to security and the users' privacy. Despite the efforts to overcome these problems using different risk reduction techniques, none of them has been completely successful in ensuring cryptographic security and protection of the users' private information. To bridge this gap, the blockchain technology is presented as a promising strategy to promote security and privacy preservation in recommender systems, not only because of its security and privacy salient features, but also due to its resilience, adaptability, fault tolerance and trust characteristics. This paper presents a holistic review of blockchain-based recommender systems covering challenges, open issues and solutions. Accordingly, a well-designed taxonomy is introduced to describe the security and privacy challenges, overview existing frameworks and discuss their applications and benefits when using blockchain before indicating opportunities for future research.
聚类(1篇)
【1】 A Modular Framework for Centrality and Clustering in Complex Networks 标题:复杂网络中中心性和聚类性的模块化框架 链接:https://arxiv.org/abs/2111.11623
作者:Frederique Oggier,Silivanxay Phetsouvanh,Anwitaman Datta 机构: Nanyang Technological University 摘要:许多复杂网络的结构包括边缘方向性和拓扑顶部的权重。可以无缝地考虑这些属性的组合的网络分析是可取的。在本文中,我们研究了两种重要的网络分析技术,即中心性和聚类。聚类采用基于信息流的模型,该模型本身建立在计算中心性的信息论度量基础上。我们的主要贡献包括一个广义马尔可夫熵中心模型,该模型具有调整节点度、边权重和方向重要性的灵活性,并具有封闭形式的渐近分析。提出了一种新的两阶段图聚类算法。中心性分析有助于解释我们对给定图进行聚类的方法的适用性,并确定“查询”节点,围绕该节点探索本地社区结构,从而形成聚集聚类机制。熵中心度计算通过我们的聚类算法进行摊销,从而提高了计算效率:与以前使用马尔可夫熵中心度进行聚类的方法相比,我们的实验证明了多个数量级的加速。我们的聚类算法自然地继承了适应边缘方向性的灵活性,以及边缘权重和节点度之间的不同解释和相互作用。总的来说,本文不仅在理论和概念上做出了重大贡献,而且还将研究结果转化为具有实际意义的工件,产生了新的、有效的和可扩展的中心度计算和图聚类算法,其有效性已通过广泛的基准测试实验得到验证。 摘要:The structure of many complex networks includes edge directionality and weights on top of their topology. Network analysis that can seamlessly consider combination of these properties are desirable. In this paper, we study two important such network analysis techniques, namely, centrality and clustering. An information-flow based model is adopted for clustering, which itself builds upon an information theoretic measure for computing centrality. Our principal contributions include a generalized model of Markov entropic centrality with the flexibility to tune the importance of node degrees, edge weights and directions, with a closed-form asymptotic analysis. It leads to a novel two-stage graph clustering algorithm. The centrality analysis helps reason about the suitability of our approach to cluster a given graph, and determine `query' nodes, around which to explore local community structures, leading to an agglomerative clustering mechanism. The entropic centrality computations are amortized by our clustering algorithm, making it computationally efficient: compared to prior approaches using Markov entropic centrality for clustering, our experiments demonstrate multiple orders of magnitude of speed-up. Our clustering algorithm naturally inherits the flexibility to accommodate edge directionality, as well as different interpretations and interplay between edge weights and node degrees. Overall, this paper thus not only makes significant theoretical and conceptual contributions, but also translates the findings into artifacts of practical relevance, yielding new, effective and scalable centrality computations and graph clustering algorithms, whose efficacy has been validated through extensive benchmarking experiments.
自动驾驶|车辆|车道检测等(2篇)
【1】 VISTA 2.0: An Open, Data-driven Simulator for Multimodal Sensing and Policy Learning for Autonomous Vehicles 标题:Vista 2.0:一个开放的、数据驱动的自主车辆多模态感知和策略学习模拟器 链接:https://arxiv.org/abs/2111.12083
作者:Alexander Amini,Tsun-Hsuan Wang,Igor Gilitschenski,Wilko Schwarting,Zhijian Liu,Song Han,Sertac Karaman,Daniela Rus 机构:edu 2 Department of Computer Science 备注:First two authors contributed equally. Code and project website is available here: this https URL 摘要:仿真有可能改变部署在安全关键场景中的移动代理鲁棒算法的发展。然而,现有模拟引擎的低真实感和缺乏多样的传感器模式仍然是实现这一潜力的关键障碍。在这里,我们介绍VISTA,一个开源的数据驱动模拟器,它集成了多种类型的自动驾驶车辆传感器。VISTA使用高保真、真实世界的数据集,表示和模拟RGB摄像机、3D激光雷达和基于事件的摄像机,实现了在模拟中快速生成新视点,从而丰富了可用于政策学习的数据,这些数据在物理世界中难以捕捉。使用VISTA,我们展示了训练和测试感知能力的能力,以控制每种传感器类型的策略,并通过在全尺寸自动车辆上的部署展示了这种方法的威力。在VISTA中学习到的策略显示出无需修改的模拟到真实的传输,并且比那些只针对真实数据进行训练的策略具有更大的鲁棒性。 摘要:Simulation has the potential to transform the development of robust algorithms for mobile agents deployed in safety-critical scenarios. However, the poor photorealism and lack of diverse sensor modalities of existing simulation engines remain key hurdles towards realizing this potential. Here, we present VISTA, an open source, data-driven simulator that integrates multiple types of sensors for autonomous vehicles. Using high fidelity, real-world datasets, VISTA represents and simulates RGB cameras, 3D LiDAR, and event-based cameras, enabling the rapid generation of novel viewpoints in simulation and thereby enriching the data available for policy learning with corner cases that are difficult to capture in the physical world. Using VISTA, we demonstrate the ability to train and test perception-to-control policies across each of the sensor types and showcase the power of this approach via deployment on a full scale autonomous vehicle. The policies learned in VISTA exhibit sim-to-real transfer without modification and greater robustness than those trained exclusively on real-world data.
【2】 A Multi-Stage model based on YOLOv3 for defect detection in PV panels based on IR and Visible Imaging by Unmanned Aerial Vehicle 标题:基于YOLOv3的无人机红外和可见光光伏组件缺陷检测多级模型 链接:https://arxiv.org/abs/2111.11709
作者:Antonio Di Tommaso,Alessandro Betti,Giacomo Fontanelli,Benedetto Michelozzi 机构:FlySight S.r.l, Via Lampredi , Livorno, Italy 备注:Submitted to Elsevier. Under Review 摘要:随着全球太阳能装机容量的持续增长,人们越来越意识到,先进的检测系统对于安排智能干预和最小化停机可能性变得至关重要。在这项工作中,我们提出了一种新的自动多阶段模型,用于利用YOLOv3网络和计算机视觉技术检测无人机拍摄的航空图像上的面板缺陷。该模型结合了对面板和缺陷的检测,以提高其准确性。主要的新颖性表现在其处理热成像或可见光图像的多功能性和检测各种缺陷的能力,以及其可移植到屋顶和地面安装的光伏系统以及不同类型的面板上。该模型已在意大利南部的两个大型光伏电站上进行了验证:面板检测的AP@0.5超过98%;基于红外热成像的热斑检测的AP@0.4(AP@0.5)约为88.3%(66.95%);在可见光谱中检测异常(包括由污垢和鸟粪引起的面板遮挡、分层、积水以及翘起的屋顶面板)的mAP@0.5接近70%。模型还给出了污损覆盖率的估计。最后分析了YOLOv3不同输出尺度对检测结果的影响。 摘要:As solar capacity installed worldwide continues to grow, there is an increasing awareness that advanced inspection systems are becoming of utmost importance to schedule smart interventions and minimize downtime likelihood. In this work we propose a novel automatic multi-stage model to detect panel defects on aerial images captured by unmanned aerial vehicle by using the YOLOv3 network and Computer Vision techniques. The model combines detections of panels and defects to refine its accuracy. The main novelties are represented by its versatility to process either thermographic or visible images and detect a large variety of defects and its portability to both rooftop and ground-mounted PV systems and different panel types. The proposed model has been validated on two big PV plants in the south of Italy with an outstanding AP@0.5 exceeding 98% for panel detection, a remarkable AP@0.4 (AP@0.5) of roughly 88.3% (66.95%) for hotspots by means of infrared thermography and a mAP@0.5 of almost 70% in the visible spectrum for detection of anomalies including panel shading induced by soiling and bird dropping, delamination, presence of puddles and raised rooftop panels. An estimation of the soiling coverage is also predicted. Finally an analysis of the influence of the different YOLOv3's output scales on the detection is discussed.
点云|SLAM|雷达|激光|深度RGBD相关(3篇)
【1】 Depth induces scale-averaging in overparameterized linear Bayesian neural networks 链接:https://arxiv.org/abs/2111.11954
作者:Jacob A. Zavatone-Veth,Cengiz Pehlevan 机构:Department of Physics, Harvard University, Cambridge, MA, United States, John A. Paulson School of Engineering and Applied Sciences 备注:8 pages, no figures 摘要:深度贝叶斯神经网络中的推理只有在无限宽度限制下才能完全理解,在无限宽度限制下,深度增加所提供的后验灵活性被冲掉,后验预测崩溃为浅高斯过程。在这里,我们将有限深线性贝叶斯神经网络解释为高斯过程预测器跨输出通道的数据相关尺度混合。我们利用这一观察来研究这些网络中的表征学习,使我们能够在一个统一的框架内连接先前研究中获得的有限结果。总的来说,这些结果促进了我们对深度如何影响一类简单贝叶斯神经网络推理的分析理解。 摘要:Inference in deep Bayesian neural networks is only fully understood in the infinite-width limit, where the posterior flexibility afforded by increased depth washes out and the posterior predictive collapses to a shallow Gaussian process. Here, we interpret finite deep linear Bayesian neural networks as data-dependent scale mixtures of Gaussian process predictors across output channels. We leverage this observation to study representation learning in these networks, allowing us to connect limiting results obtained in previous studies within a unified framework. In total, these results advance our analytical understanding of how depth affects inference in a simple class of Bayesian neural networks.
【2】 Isolation forests: looking beyond tree depth 标题:与世隔绝的森林:超越树木深度 链接:https://arxiv.org/abs/2111.11639
作者:David Cortes 摘要:离群点检测的隔离林算法利用了一个简单但有效的观察:如果获取一些多元数据并递归地在特征空间中进行均匀随机切割,则与常规观察相比,离群点在给定子空间中单独存在所需的随机切割更少。最初的想法提出了一个基于隔离所需的树深度(随机切割数)的离群值分数,但这里的实验表明,在许多情况下,使用有关特征空间大小和分配给它的点数的信息可以在不修改树结构的情况下改进结果,尤其是在存在分类特征的情况下。 摘要:The isolation forest algorithm for outlier detection exploits a simple yet effective observation: if taking some multivariate data and making uniformly random cuts across the feature space recursively, it will take fewer such random cuts for an outlier to be left alone in a given subspace as compared to regular observations. The original idea proposed an outlier score based on the tree depth (number of random cuts) required for isolation, but experiments here show that using information about the size of the feature space taken and the number of points assigned to it can result in improved results in many situations without any modification to the tree structure, especially in the presence of categorical features.
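To make the idea concrete, here is a small single-tree sketch: the tree is still grown with uniformly random cuts, but a point is scored from the size of the terminal subspace and the number of training points assigned to it rather than from the depth alone. The density-style score (points per unit volume of the terminal box) is an illustrative assumption, not the paper's exact formula.

```python
# Single isolation tree with uniform random cuts; scores read off the terminal node.
import numpy as np

rng = np.random.default_rng(1)

def build_tree(X, lo, hi, depth=0, max_depth=10):
    if len(X) <= 1 or depth == max_depth:
        return {"leaf": True, "n": len(X), "volume": float(np.prod(hi - lo)), "depth": depth}
    j = rng.integers(X.shape[1])                       # random feature
    split = rng.uniform(lo[j], hi[j])                  # uniform random cut
    left, right = X[X[:, j] < split], X[X[:, j] >= split]
    lo_r, hi_l = lo.copy(), hi.copy()
    hi_l[j], lo_r[j] = split, split
    return {"leaf": False, "j": j, "split": split,
            "left": build_tree(left, lo, hi_l, depth + 1, max_depth),
            "right": build_tree(right, lo_r, hi, depth + 1, max_depth)}

def leaf_of(node, x):
    while not node["leaf"]:
        node = node["left"] if x[node["j"]] < node["split"] else node["right"]
    return node

X = np.vstack([rng.normal(0, 1, size=(200, 2)), [[6.0, 6.0]]])   # one obvious outlier
lo, hi = X.min(axis=0) - 1e-9, X.max(axis=0) + 1e-9
tree = build_tree(X, lo.copy(), hi.copy())

for x in [X[0], X[-1]]:
    leaf = leaf_of(tree, x)
    density = leaf["n"] / max(leaf["volume"], 1e-12)   # low density -> more anomalous
    print(f"depth={leaf['depth']:2d}  n={leaf['n']:3d}  density_score={density:.4f}")
```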
【3】 Depth Without the Magic: Inductive Bias of Natural Gradient Descent 标题:没有魔力的深度:自然梯度下降的归纳偏差 链接:https://arxiv.org/abs/2111.11542
作者:Anna Kerekes,Anna Mészáros,Ferenc Huszár 机构:University of Cambridge, UK, Anna M´esz´aros ∗, E¨otv¨os Lor´and University, Hungary, Ferenc Husz´ar, Computer Laboratory 摘要:在梯度下降法中,改变模型参数化的方式会导致截然不同的优化轨迹,从而产生一系列令人惊讶的有意义的归纳偏差:识别稀疏分类器或在没有显式正则化的情况下重建低秩矩阵。这种内隐正则化被认为是深度学习中良好泛化的一个因素。然而,自然梯度下降对重新参数化是近似不变的,它总是遵循相同的轨迹并找到相同的最优解。问题自然而然地出现了:如果我们消除了参数化的作用,会发生什么,会找到什么解决方案,会出现什么新特性?在logistic损失和深矩阵分解下,我们刻画了可分离分类的深线性网络中自然梯度流的行为。我们的一些发现扩展到具有充分但有限参数化的非线性神经网络。我们证明了存在自然梯度下降不能推广的学习问题,而具有正确结构的梯度下降表现良好。 摘要:In gradient descent, changing how we parametrize the model can lead to drastically different optimization trajectories, giving rise to a surprising range of meaningful inductive biases: identifying sparse classifiers or reconstructing low-rank matrices without explicit regularization. This implicit regularization has been hypothesised to be a contributing factor to good generalization in deep learning. However, natural gradient descent is approximately invariant to reparameterization, it always follows the same trajectory and finds the same optimum. The question naturally arises: What happens if we eliminate the role of parameterization, which solution will be found, what new properties occur? We characterize the behaviour of natural gradient flow in deep linear networks for separable classification under logistic loss and deep matrix factorization. Some of our findings extend to nonlinear neural networks with sufficient but finite over-parametrization. We demonstrate that there exist learning problems where natural gradient descent fails to generalize, while gradient descent with the right architecture performs well.
联邦学习|隐私保护|加密(2篇)
【1】 Incentive Mechanisms for Federated Learning: From Economic and Game Theoretic Perspective 标题:联合学习的激励机制:经济学和博弈论视角 链接:https://arxiv.org/abs/2111.11850
作者:Xuezhen Tu,Kun Zhu,Nguyen Cong Luong,Dusit Niyato,Yang Zhang,Juan Li 备注:26 pages, 10 figures 摘要:联邦学习(FL)在不暴露所有者原始数据的情况下,在训练大规模机器学习(ML)模型方面显示出巨大的潜力。在FL中,数据所有者可以基于本地数据训练ML模型,并且只将模型更新而不是原始数据发送给模型所有者进行聚合。为了提高模型准确性和训练完成时间方面的学习绩效,必须招募足够的参与者。同时,数据所有者是理性的,可能由于资源消耗而不愿意参与协作学习过程。为了解决这些问题,最近提出了各种工作来激励数据所有者贡献他们的资源。在本文中,我们对文献中提出的经济学和博弈论方法进行了全面的回顾,这些方法旨在设计各种激励数据所有者参与FL训练过程的方案。特别是,我们首先介绍FL的基本原理与背景,以及激励机制设计中常用的经济学理论。然后,我们回顾了博弈论和经济学方法在FL激励机制设计中的应用。最后,我们强调了FL激励机制设计中存在的一些开放问题和未来的研究方向。 摘要:Federated learning (FL) has become popular and has shown great potential in training large-scale machine learning (ML) models without exposing the owners' raw data. In FL, the data owners can train ML models based on their local data and only send the model updates rather than raw data to the model owner for aggregation. To improve learning performance in terms of model accuracy and training completion time, it is essential to recruit sufficient participants. Meanwhile, the data owners are rational and may be unwilling to participate in the collaborative learning process due to the resource consumption. To address the issues, there have been various works recently proposed to motivate the data owners to contribute their resources. In this paper, we provide a comprehensive review of the economic and game theoretic approaches proposed in the literature to design various schemes for stimulating data owners to participate in the FL training process. In particular, we first present the fundamentals and background of FL and the economic theories commonly used in incentive mechanism design. Then, we review applications of game theory and economic approaches applied for incentive mechanism design of FL. Finally, we highlight some open issues and future research directions concerning the incentive mechanism design of FL.
【2】 FLIX: A Simple and Communication-Efficient Alternative to Local Methods in Federated Learning 标题:FLIX:一种简单高效的联邦学习局部方法替代方案 链接:https://arxiv.org/abs/2111.11556
作者:Elnur Gasanov,Ahmed Khaled,Samuel Horváth,Peter Richtárik 机构:KAUST, Saudi Arabia 摘要:联邦学习(FL)是一种越来越流行的机器学习范式,其中多个节点试图在隐私、通信和多种异质性约束下进行协作学习。联邦学习中一个长期存在的问题是,不清楚优化目标应该是什么:有监督学习的标准平均风险最小化不足以处理联邦学习特有的几个主要约束,例如通信自适应性和个性化控制。我们确定了联邦学习框架中的几个关键需求,并引入了一个新框架FLIX,该框架考虑了联邦学习带来的独特挑战。FLIX有一个标准的有限和形式,这使实践者能够利用大量现有的(可能是非局部的)分布式优化方法。通过不需要任何通信的智能初始化,FLIX不需要使用本地步骤,但仍然可证明能够与本地方法在PAR上执行相异正则化。我们给出了在通信约束下有效求解FLIX公式的几种算法。最后,我们通过大量实验验证了我们的理论结果。 摘要:Federated Learning (FL) is an increasingly popular machine learning paradigm in which multiple nodes try to collaboratively learn under privacy, communication and multiple heterogeneity constraints. A persistent problem in federated learning is that it is not clear what the optimization objective should be: the standard average risk minimization of supervised learning is inadequate in handling several major constraints specific to federated learning, such as communication adaptivity and personalization control. We identify several key desiderata in frameworks for federated learning and introduce a new framework, FLIX, that takes into account the unique challenges brought by federated learning. FLIX has a standard finite-sum form, which enables practitioners to tap into the immense wealth of existing (potentially non-local) methods for distributed optimization. Through a smart initialization that does not require any communication, FLIX does not require the use of local steps but is still provably capable of performing dissimilarity regularization on par with local methods. We give several algorithms for solving the FLIX formulation efficiently under communication constraints. Finally, we corroborate our theoretical results with extensive experimentation.
推理|分析|理解|解释(4篇)
【1】 Explainable Deep Image Classifiers for Skin Lesion Diagnosis 标题:用于皮肤病变诊断的可解释深度图像分类器 链接:https://arxiv.org/abs/2111.11863
作者:Carlo Metta,Andrea Beretta,Riccardo Guidotti,Yuan Yin,Patrick Gallinari,Salvatore Rinzivillo,Fosca Giannotti 机构: Patrick Gallinari is with Sorbonne Universit´e andCriteo AI Lab, sorbonne-universite 摘要:在医疗诊断等关键环境中的一个关键问题是决策系统中采用的深度学习模型的可解释性。可解释人工智能(XAI)的研究正试图解决这个问题。然而,XAI方法通常只在多面手分类器上测试,并不代表诸如医疗诊断之类的现实问题。在本文中,我们分析了一个皮肤损伤图像的案例研究,其中我们定制了一个现有的XAI方法,用于解释能够识别不同类型皮肤损伤的深度学习模型。该解释是由皮肤损伤的综合样本和反样本图像形成的,为从业者提供了一种突出负责分类决策的关键特征的方法。一项针对领域专家、初学者和非熟练人员的调查证明,解释的使用增加了对自动决策系统的信任和信心。此外,对解释者采用的潜在空间的分析揭示了一些最常见的皮肤损伤类别是明显分开的。这一现象可能源于每个类别的固有特征,并有望为解决人类专家最常见的错误分类提供支持。 摘要:A key issue in critical contexts such as medical diagnosis is the interpretability of the deep learning models adopted in decision-making systems. Research in eXplainable Artificial Intelligence (XAI) is trying to solve this issue. However, often XAI approaches are only tested on generalist classifier and do not represent realistic problems such as those of medical diagnosis. In this paper, we analyze a case study on skin lesion images where we customize an existing XAI approach for explaining a deep learning model able to recognize different types of skin lesions. The explanation is formed by synthetic exemplar and counter-exemplar images of skin lesion and offers the practitioner a way to highlight the crucial traits responsible for the classification decision. A survey conducted with domain experts, beginners and unskilled people proof that the usage of explanations increases the trust and confidence in the automatic decision system. Also, an analysis of the latent space adopted by the explainer unveils that some of the most frequent skin lesion classes are distinctly separated. This phenomenon could derive from the intrinsic characteristics of each class and, hopefully, can provide support in the resolution of the most frequent misclassifications by human experts.
【2】 Understanding the Impact of Data Distribution on Q-learning with Function Approximation 标题:用函数逼近理解数据分布对Q-学习的影响 链接:https://arxiv.org/abs/2111.11758
作者:Pedro P. Santos,Francisco S. Melo,Alberto Sardinha,Diogo S. Carvalho 摘要:在这项工作中,我们将重点放在研究数据分布和基于Q学习的函数逼近算法之间的相互作用上。我们提供了一个理论和实证分析,来解释为什么数据分布的不同属性有助于调节算法不稳定性的来源。首先,我们回顾了近似动态规划算法性能的理论界。其次,我们提供了一个新的四态MDP,它强调了在线和离线环境下数据分布对函数近似Q学习算法性能的影响。最后,我们通过实验评估了数据分布特性对离线深度Q网络算法性能的影响。我们的研究结果表明:(i)数据分布需要具有一定的性质,以便在离线环境下进行鲁棒学习,即与MDP最优策略所诱导的分布的距离较低,并且在状态-行动空间上的覆盖率较高;(ii)高熵数据分布有助于缓解算法不稳定性的来源。 摘要:In this work, we focus our attention on the study of the interplay between the data distribution and Q-learning-based algorithms with function approximation. We provide a theoretical and empirical analysis as to why different properties of the data distribution can contribute to regulating sources of algorithmic instability. First, we revisit theoretical bounds on the performance of approximate dynamic programming algorithms. Second, we provide a novel four-state MDP that highlights the impact of the data distribution in the performance of a Q-learning algorithm with function approximation, both in online and offline settings. Finally, we experimentally assess the impact of the data distribution properties in the performance of an offline deep Q-network algorithm. Our results show that: (i) the data distribution needs to possess certain properties in order to robustly learn in an offline setting, namely low distance to the distributions induced by optimal policies of the MDP and high coverage over the state-action space; and (ii) high entropy data distributions can contribute to mitigating sources of algorithmic instability.
【3】 Variational encoder geostatistical analysis (VEGAS) with an application to large scale riverine bathymetry 链接:https://arxiv.org/abs/2111.11719
作者:Mojtaba Forghani,Yizhou Qian,Jonghyun Lee,Matthew Farthing,Tyler Hesser,Peter K. Kitanidis,Eric F. Darve 机构: Department of Mechanical Engineering, Stanford University, CA, Institute for Computational and Mathematical Engineering, Stanford University, CA, Department of Civil and Environmental Engineering and Water Resources Research Center, University of 摘要:河床剖面估计,也称为水深测量,在许多应用中起着至关重要的作用,例如安全有效的内河航行、河岸侵蚀预测、地面沉降和洪水风险管理。直接测深测量(即深度成像)的高成本和复杂的后勤保障鼓励使用间接测量,如表面流速。然而,从间接测量估计高分辨率测深是一个在计算上具有挑战性的反问题。在这里,我们提出了一种基于降阶模型(ROM)的方法,利用变分自动编码器(VAE),一种在中间具有窄层的深度神经网络,来压缩测深和流速信息,并从流速测量加速水深反演问题。在我们的应用中,具有适当边界条件(BCs)的浅水方程(SWE),例如流量和/或自由面高程,构成预测流速的正问题。然后,通过变分编码器在低维非线性流形上构造SWE的ROM。在贝叶斯环境下,对低维潜在空间进行不确定性量化(UQ)估计。我们已经在美国佐治亚州萨凡纳河一英里河段对我们的反演方法进行了测试。一旦对神经网络进行了训练(离线阶段),所提出的技术可以比通常基于线性投影(如主成分分析)的传统反演方法更快地执行反演操作数量级此外,试验表明,即使在流速测量稀疏的情况下,该算法也能很好地估计水深。 摘要:Estimation of riverbed profiles, also known as bathymetry, plays a vital role in many applications, such as safe and efficient inland navigation, prediction of bank erosion, land subsidence, and flood risk management. The high cost and complex logistics of direct bathymetry surveys, i.e., depth imaging, have encouraged the use of indirect measurements such as surface flow velocities. However, estimating high-resolution bathymetry from indirect measurements is an inverse problem that can be computationally challenging. Here, we propose a reduced-order model (ROM) based approach that utilizes a variational autoencoder (VAE), a type of deep neural network with a narrow layer in the middle, to compress bathymetry and flow velocity information and accelerate bathymetry inverse problems from flow velocity measurements. In our application, the shallow-water equations (SWE) with appropriate boundary conditions (BCs), e.g., the discharge and/or the free surface elevation, constitute the forward problem, to predict flow velocity. Then, ROMs of the SWEs are constructed on a nonlinear manifold of low dimensionality through a variational encoder. Estimation with uncertainty quantification (UQ) is performed on the low-dimensional latent space in a Bayesian setting. We have tested our inversion approach on a one-mile reach of the Savannah River, GA, USA. Once the neural network is trained (offline stage), the proposed technique can perform the inversion operation orders of magnitude faster than traditional inversion methods that are commonly based on linear projections, such as principal component analysis (PCA), or the principal component geostatistical approach (PCGA). Furthermore, tests show that the algorithm can estimate the bathymetry with good accuracy even with sparse flow velocity measurements.
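As an illustration of the dimensionality reduction described above (a deep network with a narrow middle layer that compresses a high-dimensional field into a few latent coordinates), the following PyTorch sketch builds a small VAE over synthetic 1-D "bathymetry" fields. It is not the authors' VEGAS code; the grid size, latent dimension and loss weighting are assumptions.

```python
# Minimal VAE with a narrow latent bottleneck for field compression.
import torch
import torch.nn as nn

class FieldVAE(nn.Module):
    def __init__(self, n_grid=256, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_grid, 64), nn.ReLU())
        self.to_mu, self.to_logvar = nn.Linear(64, latent_dim), nn.Linear(64, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, n_grid))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.decoder(z), mu, logvar

torch.manual_seed(0)
vae = FieldVAE()
field = torch.randn(4, 256)              # a batch of synthetic 1-D "bathymetry" fields
recon, mu, logvar = vae(field)
# Standard VAE loss: reconstruction error plus a KL term pulling the latent code to N(0, I).
loss = ((recon - field) ** 2).mean() + \
       (-0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(dim=1)).mean()
loss.backward()
print(recon.shape, float(loss))
```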
【4】 Link Analysis meets Ontologies: Are Embeddings the Answer? 标题:链接分析遇到本体论:嵌入是答案吗? 链接:https://arxiv.org/abs/2111.11710
作者:Sebastian Mežnar,Matej Bevec,Nada Lavrač,Blaž Škrlj 机构:Jožef Stefan Institute, Jamova , Ljubljana, Slovenia, Jožef Stefan International Postgraduate School, Jamova , Ljubljana, Slovenia, University of Nova Gorica, Glavni trg , Vipava, Slovenia, A R T I C L E I N F O 备注:17 pages, 8 tables, 7 figures 摘要:越来越多的语义资源为人类知识提供了宝贵的存储空间;但是,错误条目的概率随着大小的增加而增加。因此,开发识别给定知识库中潜在虚假部分的方法正成为一个日益重要的兴趣领域。在这项工作中,我们系统地评估了纯结构链接分析方法是否已经能够提供一种可扩展的方法来检测可能的异常,以及潜在有趣的新关系候选者。通过对八种不同语义资源(包括基因本体、食品本体、海洋本体和类似资源)的十三种方法的评估,我们证明了纯结构链接分析可以为数据集的子集提供可伸缩的异常检测。此外,我们还证明,通过考虑符号节点嵌入,可以获得预测(链接)的解释,这使得这一方法分支可能比只使用黑匣子的方法更有价值。据我们所知,这是目前对不同领域的语义资源中不同类型链接分析方法的适用性进行的最广泛的系统研究之一。 摘要:The increasing amounts of semantic resources offer valuable storage of human knowledge; however, the probability of wrong entries increases with the increased size. The development of approaches that identify potentially spurious parts of a given knowledge base is thus becoming an increasingly important area of interest. In this work, we present a systematic evaluation of whether structure-only link analysis methods can already offer a scalable means to detecting possible anomalies, as well as potentially interesting novel relation candidates. Evaluating thirteen methods on eight different semantic resources, including Gene Ontology, Food Ontology, Marine Ontology and similar, we demonstrated that structure-only link analysis could offer scalable anomaly detection for a subset of the data sets. Further, we demonstrated that by considering symbolic node embedding, explanations of the predictions (links) could be obtained, making this branch of methods potentially more valuable than the black-box only ones. To our knowledge, this is currently one of the most extensive systematic studies of the applicability of different types of link analysis methods across semantic resources from different domains.
检测相关(1篇)
【1】 End-to-End Optimized Arrhythmia Detection Pipeline using Machine Learning for Ultra-Edge Devices 标题:基于机器学习的超边缘设备端到端优化心律失常检测流水线 链接:https://arxiv.org/abs/2111.11789
作者:Sideshwar J B,Sachin Krishan T,Vishal Nagarajan,Shanthakumar S,Vineeth Vijayaraghavan 机构:∗SSN College of Engineering, Chennai, India, †Solarillion Foundation, Chennai, India 备注:8 pages, 9 figures, Accepted at 20th IEEE International Conference on Machine Learning and Applications (ICMLA) 2021 摘要:心房颤动(AF)是世界范围内最常见的心律失常,2%的人群受到影响。它与中风、心力衰竭和其他心脏相关并发症的风险增加有关。监测高危人群和检测无症状房颤可带来可观的公共卫生效益,因为无症状房颤患者可以通过改变生活方式采取预防措施。随着可穿戴设备的可负担性越来越高,个性化医疗变得越来越容易获得。这些个性化的医疗解决方案需要精确的生物信号分类,同时计算成本低廉。通过在设备上进行推断,我们避免了基于云的系统固有的问题,如延迟和网络连接依赖性。我们提出了一种可部署在超边缘设备上的高效管道,用于高精度实时心房颤动检测。本研究中采用的特征工程旨在优化拟议管道中使用的资源高效分类器,该分类器的内存占用比性能最佳的标准ML模型小10^5倍,仅以2%的分类精度为代价。与以前最先进的(SoA)嵌入式实现相比,我们还获得了大约6%的更高精度,同时占用的内存减少了403倍,速度提高了5.2倍。 摘要:Atrial fibrillation (AF) is the most prevalent cardiac arrhythmia worldwide, with 2% of the population affected. It is associated with an increased risk of strokes, heart failure and other heart-related complications. Monitoring at-risk individuals and detecting asymptomatic AF could result in considerable public health benefits, as individuals with asymptomatic AF could take preventive measures with lifestyle changes. With increasing affordability to wearables, personalized health care is becoming more accessible. These personalized healthcare solutions require accurate classification of bio-signals while being computationally inexpensive. By making inferences on-device, we avoid issues inherent to cloud-based systems such as latency and network connection dependency. We propose an efficient pipeline for real-time Atrial Fibrillation Detection with high accuracy that can be deployed in ultra-edge devices. The feature engineering employed in this research catered to optimizing the resource-efficient classifier used in the proposed pipeline, which was able to outperform the best performing standard ML model by $10^5\times$ in terms of memory footprint with a mere trade-off of 2% classification accuracy. We also obtain higher accuracy of approximately 6% while consuming 403$\times$ less memory and being 5.2$\times$ faster compared to the previous state-of-the-art (SoA) embedded implementation.
表征(1篇)
【1】 A Free Lunch from the Noise: Provable and Practical Exploration for Representation Learning 标题:噪音中的免费午餐:表征学习的可证明性和实践性探索 链接:https://arxiv.org/abs/2111.11485
作者:Tongzheng Ren,Tianjun Zhang,Csaba Szepesvári,Bo Dai 机构:UT Austin & Google Brain, UC Berkeley, Csaba Szepesv´ari, University of Alberta & DeepMind 备注:The first two authors contribute equally 摘要:表征学习是深度学习在应对维度诅咒方面取得经验成功的核心。然而,在强化学习(RL)中,表征学习的力量尚未得到充分利用,这是因为在表达性和可驯化性之间进行了权衡;探索与表征学习之间的耦合。在本文中,我们首先揭示了在随机控制模型的某些噪声假设下,我们可以免费获得其相应的马尔可夫转移算子的闭式线性谱特征。基于这一观察,我们提出了谱动力学嵌入(SPEDE),它打破了折衷,通过利用噪声的结构完成了表征学习的乐观探索。我们对SPEDE进行了严格的理论分析,并在几个基准上证明了与现有最先进的经验算法相比,SPEDE的实际优越性能。 摘要:Representation learning lies at the heart of the empirical success of deep learning for dealing with the curse of dimensionality. However, the power of representation learning has not been fully exploited yet in reinforcement learning (RL), due to i), the trade-off between expressiveness and tractability; and ii), the coupling between exploration and representation learning. In this paper, we first reveal the fact that under some noise assumption in the stochastic control model, we can obtain the linear spectral feature of its corresponding Markov transition operator in closed-form for free. Based on this observation, we propose Spectral Dynamics Embedding (SPEDE), which breaks the trade-off and completes optimistic exploration for representation learning by exploiting the structure of the noise. We provide rigorous theoretical analysis of SPEDE, and demonstrate the practical superior performance over the existing state-of-the-art empirical algorithms on several benchmarks.
优化|敛散性(2篇)
【1】 HERO: Hessian-Enhanced Robust Optimization for Unifying and Improving Generalization and Quantization Performance 标题:HERO:统一和改进泛化和量化性能的Hessian-Enhanced鲁棒优化 链接:https://arxiv.org/abs/2111.11986
作者:Huanrui Yang,Xiaoxuan Yang,Neil Zhenqiang Gong,Yiran Chen 机构:Duke University, Durham, NC, USA 摘要:随着最近在移动和边缘设备上部署神经网络模型的需求,人们希望提高模型在未知测试数据上的通用性,以及增强模型在定点量化下的鲁棒性,以便有效部署。然而,最小化训练损失对泛化和量化性能几乎没有保证。在这项工作中,我们在提高模型对有界权重扰动的鲁棒性和最小化Hessian矩阵相对于模型权重的特征值的框架下,从理论上统一了它们,从而满足了同时提高泛化和量化性能的需要。因此,我们提出HERO,一种Hessian增强鲁棒优化方法,通过基于梯度的训练过程最小化Hessian特征值,同时提高泛化和量化性能。HERO使测试精度提高了3.8%,在80%的训练标签扰动下精度提高了30%,并且在广泛的精度范围内实现了最佳的训练后量化精度,包括在各种数据集上的通用模型体系结构中,与SGD训练模型相比,精度提高了10%以上。 摘要:With the recent demand of deploying neural network models on mobile and edge devices, it is desired to improve the model's generalizability on unseen testing data, as well as enhance the model's robustness under fixed-point quantization for efficient deployment. Minimizing the training loss, however, provides few guarantees on the generalization and quantization performance. In this work, we fulfill the need of improving generalization and quantization performance simultaneously by theoretically unifying them under the framework of improving the model's robustness against bounded weight perturbation and minimizing the eigenvalues of the Hessian matrix with respect to model weights. We therefore propose HERO, a Hessian-enhanced robust optimization method, to minimize the Hessian eigenvalues through a gradient-based training process, simultaneously improving the generalization and quantization performance. HERO enables up to a 3.8% gain on test accuracy, up to 30% higher accuracy under 80% training label perturbation, and the best post-training quantization accuracy across a wide range of precision, including a >10% accuracy improvement over SGD-trained models for common model architectures on various datasets.
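HERO's key ingredient is a gradient-based training procedure that also minimizes the top eigenvalues of the loss Hessian with respect to the weights. A standard way to obtain a differentiable estimate of the largest Hessian eigenvalue without ever forming the Hessian is power iteration with Hessian-vector products; the sketch below adds such an estimate as a penalty on a toy regression loss. This is a generic illustration of that technique, not the HERO implementation, and the penalty weight, model and iteration counts are assumptions.

```python
# Top Hessian eigenvalue penalty via Hessian-vector products (power iteration).
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)
params = [p for p in model.parameters() if p.requires_grad]

def loss_fn():
    return ((model(x) - y) ** 2).mean()

def top_hessian_eigenvalue(n_iters=5):
    """Rayleigh-quotient estimate of the largest Hessian eigenvalue, using only
    Hessian-vector products (the Hessian itself is never materialized)."""
    v = [torch.randn_like(p) for p in params]
    grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
    for _ in range(n_iters):
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv)) + 1e-12
        v = [h / norm for h in hv]                          # power-iteration direction
    hv = torch.autograd.grad(grads, params, grad_outputs=v,
                             retain_graph=True, create_graph=True)
    return sum((a * b).sum() for a, b in zip(hv, v))        # v^T H v with ||v|| = 1

opt = torch.optim.SGD(model.parameters(), lr=1e-2)
for step in range(3):
    opt.zero_grad()
    objective = loss_fn() + 0.1 * top_hessian_eigenvalue()  # task loss + eigenvalue penalty
    objective.backward()
    opt.step()
    print(step, float(objective))
```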
【2】 Scalable Learning for Optimal Load Shedding Under Power Grid Emergency Operations 标题:电网紧急运行条件下最优切负荷的可伸缩学习 链接:https://arxiv.org/abs/2111.11980
作者:Yuqi Zhou,Jeehyun Park,Hao Zhu 机构:Department of Electrical and Computer Engineering, The University of Texas at Austin 摘要:有效和及时地应对突发事件对于增强电网的恢复能力至关重要。考虑到级联传播的快速、复杂过程,由于计算复杂性和通信延迟问题,在大规模网络中很难实现最佳减载(OLS)等纠正措施。本文提出了一种创新的OLS学习方法,通过离线神经网络(NN)训练,在各种可能的应急情况下构造最优的甩负荷决策规则。值得注意的是,所提出的基于神经网络的OLS决策完全分散,使得各个负荷中心能够使用现成的本地测量值对特定的意外事件做出快速反应。IEEE 14节点系统的数值研究证明了我们的可扩展OLS设计对于严重电网紧急事件的实时响应的有效性。 摘要:Effective and timely responses to unexpected contingencies are crucial for enhancing the resilience of power grids. Given the fast, complex process of cascading propagation, corrective actions such as optimal load shedding (OLS) are difficult to attain in large-scale networks due to the computation complexity and communication latency issues. This work puts forth an innovative learning-for-OLS approach by constructing the optimal decision rules of load shedding under a variety of potential contingency scenarios through offline neural network (NN) training. Notably, the proposed NN-based OLS decisions are fully decentralized, enabling individual load centers to quickly react to the specific contingency using readily available local measurements. Numerical studies on the IEEE 14-bus system have demonstrated the effectiveness of our scalable OLS design for real-time responses to severe grid emergency events.
预测|估计(5篇)
【1】 Appliance Level Short-term Load Forecasting via Recurrent Neural Network 链接:https://arxiv.org/abs/2111.11998
作者:Yuqi Zhou,Arun Sukumaran Nair,David Ganger,Abhinandan Tripathi,Chaitanya Baone,Hao Zhu 机构:∗The University of Texas at Austin, Austin, Texas, USA, †Eaton Research Lab, Denver, Colorado, USA, ‡Eaton Corporation, Pune, Maharashtra, India 摘要:准确的负荷预测对于电力市场运营和电力系统中的其他实时决策任务至关重要。本文考虑社区内住宅用户的短期负荷预测问题。现有的STLF工作主要集中于预测馈线系统或单个客户的总负荷,但很少在单个设备级别上进行负荷预测。在这项工作中,我们提出了一种STLF算法,用于有效地预测单个电器的功耗。所提出的方法建立在深度学习中一个强大的递归神经网络(RNN)结构上,称为长短时记忆(LSTM)。由于每个设备具有唯一的重复消费模式,因此将跟踪预测误差模式,以便可以使用过去的预测误差来改进最终预测性能。对真实负载数据集的数值试验表明,与现有的基于LSTM的方法和其他基准方法相比,该方法有了改进。 摘要:Accurate load forecasting is critical for electricity market operations and other real-time decision-making tasks in power systems. This paper considers the short-term load forecasting (STLF) problem for residential customers within a community. Existing STLF work mainly focuses on forecasting the aggregated load for either a feeder system or a single customer, but few efforts have been made on forecasting the load at individual appliance level. In this work, we present an STLF algorithm for efficiently predicting the power consumption of individual electrical appliances. The proposed method builds upon a powerful recurrent neural network (RNN) architecture in deep learning, termed as long short-term memory (LSTM). As each appliance has uniquely repetitive consumption patterns, the patterns of prediction error will be tracked such that past prediction errors can be used for improving the final prediction performance. Numerical tests on real-world load datasets demonstrate the improvement of the proposed method over existing LSTM-based method and other benchmark approaches.
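A minimal PyTorch sketch of the forecasting setup described above: an LSTM that maps a window of past per-appliance readings to the next reading. The window length, hidden size and toy series are assumptions, and the paper's tracking of prediction-error patterns is only indicated in a closing comment.

```python
# One-step-ahead appliance-level load forecasting with an LSTM.
import torch
import torch.nn as nn

class ApplianceForecaster(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):              # x: (batch, window, 1) past consumption readings
        out, _ = self.lstm(x)
        return self.head(out[:, -1])   # predict the next reading

torch.manual_seed(0)
model = ApplianceForecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
series = torch.sin(torch.linspace(0, 20, 500)).abs()   # toy repetitive consumption pattern
window = 24

for step in range(200):
    idx = torch.randint(0, len(series) - window - 1, (16,))
    x = torch.stack([series[int(j):int(j) + window] for j in idx]).unsqueeze(-1)
    y = series[idx + window].unsqueeze(-1)
    loss = ((model(x) - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# The paper additionally tracks the pattern of past prediction errors to correct the final
# forecast; a simple stand-in would be to add the recent mean residual to the prediction.
```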
【2】 Is this IoT Device Likely to be Secure? Risk Score Prediction for IoT Devices Using Gradient Boosting Machines 标题:这个物联网设备可能是安全的吗?基于梯度助推机的物联网设备风险分值预测 链接:https://arxiv.org/abs/2111.11874
作者:Carlos A. Rivera A.,Arash Shaghaghi,David D. Nguyen,Salil S. Kanhere 机构:Kanhere, The University of New South Wales (UNSW), Sydney, Australia, RMIT University, Melbourne, Australia 备注:Accepted - EAI MobiQuitous 2021 - 18th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services 摘要:安全风险评估和预测对于部署物联网(IoT)设备的组织至关重要。企业的最低绝对要求是验证物联网设备在国家漏洞数据库(NVD)中报告的漏洞的安全风险。本文提出了一种基于公开信息的物联网设备风险预测方法。我们的解决方案为各种规模的企业提供了一个简单且经济高效的解决方案,以预测部署新物联网设备的安全风险。在对过去八年的NVD记录进行了广泛分析后,我们为易受攻击的物联网设备创建了一个独特、系统和平衡的数据集,包括关键技术特征,以及公共资源提供的功能和描述性特征。然后,我们在该数据集上使用机器学习分类模型,如梯度提升决策树(GBDT),并在设备脆弱性评分的严重性分类中达到71%的预测准确率。 摘要:Security risk assessment and prediction are critical for organisations deploying Internet of Things (IoT) devices. An absolute minimum requirement for enterprises is to verify the security risk of IoT devices for the reported vulnerabilities in the National Vulnerability Database (NVD). This paper proposes a novel risk prediction for IoT devices based on publicly available information about them. Our solution provides an easy and cost-efficient solution for enterprises of all sizes to predict the security risk of deploying new IoT devices. After an extensive analysis of the NVD records over the past eight years, we have created a unique, systematic, and balanced dataset for vulnerable IoT devices, including key technical features complemented with functional and descriptive features available from public resources. We then use machine learning classification models such as Gradient Boosting Decision Trees (GBDT) over this dataset and achieve 71% prediction accuracy in classifying the severity of device vulnerability score.
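A minimal scikit-learn sketch of the classification setup described above: a Gradient Boosting Decision Tree model predicting a vulnerability-severity class for a device from tabular features. The synthetic features, labels and hyperparameters below are assumptions, not the paper's dataset or configuration.

```python
# GBDT severity classification on synthetic tabular device features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.integers(0, 5, n),      # e.g., device category (encoded)
    rng.integers(0, 2, n),      # e.g., remote management exposed
    rng.random(n),              # e.g., normalized firmware age
])
y = rng.integers(0, 3, n)       # severity class: low / medium / high (synthetic labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = GradientBoostingClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
clf.fit(X_tr, y_tr)
print("test accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```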
【3】 Time Series Prediction about Air Quality using LSTM-Based Models: A Systematic Mapping 标题:基于LSTM模型的空气质量时间序列预测:系统映射 链接:https://arxiv.org/abs/2111.11848
作者:Lucas L. S. Sachetti,Vinicius F. S. Mota 机构:Computer Science Department – Federal University of Espírito Santo (UFES), Vitória – ES – Brazil 摘要:这项系统性文献映射研究调查了使用长短期记忆(LSTM)网络预测空气质量时间序列数据的工作,试图了解科学文献中已有方法的动机、特征与手段,识别该研究领域的空白以及可在后续研究中利用的潜在方法。 摘要:This systematic mapping study investigates the use of Long short-term memory networks to predict time series data about air quality, trying to understand the reasons, characteristics and methods available in the scientific literature, to identify gaps in the researched area and potential approaches that can be exploited in later studies.
【4】 pmSensing: A Participatory Sensing Network for Predictive Monitoring of Particulate Matter 标题:PMSISTING:一种用于颗粒物预测监测的参与式传感网络 链接:https://arxiv.org/abs/2111.11441
作者:Lucas L. S. Sachetti,Enzo B. Cussuol,José Marcos S. Nogueira,Vinicius F. S. Mota 机构:Departamento de Inform´atica – Universidade Federal do Esp´ırito Santo, Vit´oria – Brasil, Departamento de Ciˆencia da Computac¸˜ao – Universidade Federal de Minas Gerais, Belo Horizonte – Brasil 备注:None 摘要:这项工作提出了一个用于参与式传感的无线传感器网络的建议,其中物联网传感设备是专门为监测和预测空气质量而开发的,作为高成本气象站的替代方案。该系统称为pmSensing,旨在测量颗粒物质。通过将原型收集的数据与来自台站的数据进行比较来进行验证。比较表明,结果接近,这可以实现问题的低成本解决方案。该系统仍然使用递归神经网络(在本例中为LSTM-RNN)进行预测分析,其中预测相对于真实数据具有较高的精度。 摘要:This work presents a proposal for a wireless sensor network for participatory sensing, with IoT sensing devices developed especially for monitoring and predicting air quality, as alternatives of high cost meteorological stations. The system, called pmSensing, aims to measure particulate material. A validation is done by comparing the data collected by the prototype with data from stations. The comparison shows that the results are close, which can enable low-cost solutions to the problem. The system still presents a predictive analysis using recurrent neural networks, in this case the LSTM-RNN, where the predictions presented high accuracy in relation to the real data.
【5】 Tree density estimation 链接:https://arxiv.org/abs/2111.11971
作者:László Györfi,Aryeh Kontorovich,Roi Weiss 机构:Department of Computer Science, Ben-Gurion University of the Negev, Beer Sheva, Israel, Ariel University, Shomron, Israel 摘要:我们研究了 $\mathbb{R}^d$ 中具有概率密度 $f(\boldsymbol{x})$ 的随机向量 ${\boldsymbol X}$ 的密度估计问题。对于定义在顶点集 $\{1,\dots,d\}$ 上的生成树 $T$,树密度 $f_{T}$ 是二元条件密度的乘积。最优生成树 $T^*$ 是使 $f$ 与 $f_{T}$ 的Kullback-Leibler散度最小的生成树 $T$。基于i.i.d.数据,我们识别出最优树 $T^*$,并以计算上高效的方式构造树密度估计 $f_n$,使得在不对密度 $f$ 施加任何正则性条件的情况下,几乎必然有 $\lim_{n\to\infty}\int |f_n(\boldsymbol{x})-f_{T^*}(\boldsymbol{x})|\,d\boldsymbol{x}=0$。对于具有有界支撑的Lipschitz连续密度 $f$,有 $\mathbb{E}\{\int |f_n(\boldsymbol{x})-f_{T^*}(\boldsymbol{x})|\,d\boldsymbol{x}\}=O(n^{-1/4})$。 摘要:We study the problem of density estimation for a random vector ${\boldsymbol X}$ in $\mathbb{R}^d$ with probability density $f(\boldsymbol{x})$. For a spanning tree $T$ defined on the vertex set $\{1,\dots,d\}$, the tree density $f_{T}$ is a product of bivariate conditional densities. The optimal spanning tree $T^*$ is the spanning tree $T$ for which the Kullback-Leibler divergence of $f$ and $f_{T}$ is the smallest. From i.i.d. data we identify the optimal tree $T^*$ and computationally efficiently construct a tree density estimate $f_n$ such that, without any regularity conditions on the density $f$, one has that $\lim_{n\to\infty} \int |f_n(\boldsymbol{x})-f_{T^*}(\boldsymbol{x})|\,d\boldsymbol{x}=0$ a.s. For Lipschitz continuous $f$ with bounded support, $\mathbb{E}\{\int |f_n(\boldsymbol{x})-f_{T^*}(\boldsymbol{x})|\,d\boldsymbol{x}\}=O(n^{-1/4})$.
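For intuition, the classical route to the optimal tree $T^*$ is the Chow-Liu construction: estimate pairwise mutual informations and take a maximum-weight spanning tree, which minimizes the Kullback-Leibler divergence to tree densities. The sketch below illustrates this on synthetic, discretized data; it is not the paper's estimator (which avoids such regularity assumptions), and the binning is an assumption made only to keep the example short.

```python
# Chow-Liu-style tree recovery from discretized samples.
import numpy as np
import networkx as nx
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
n, d = 5000, 4
# Synthetic chain-structured data: X0 -> X1 -> X2 -> X3
X = np.empty((n, d))
X[:, 0] = rng.normal(size=n)
for j in range(1, d):
    X[:, j] = 0.9 * X[:, j - 1] + 0.3 * rng.normal(size=n)

# Discretize each coordinate into bins and estimate pairwise mutual information.
binned = np.stack([np.digitize(X[:, j], np.quantile(X[:, j], np.linspace(0, 1, 9)[1:-1]))
                   for j in range(d)], axis=1)
G = nx.Graph()
for i in range(d):
    for j in range(i + 1, d):
        G.add_edge(i, j, weight=mutual_info_score(binned[:, i], binned[:, j]))

T = nx.maximum_spanning_tree(G)
print(sorted(T.edges()))   # expected to recover the chain (0, 1), (1, 2), (2, 3)
```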
其他神经网络|深度学习|模型|建模(17篇)
【1】 Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning 标题:两两学习的简单随机在线梯度下降算法 链接:https://arxiv.org/abs/2111.12050
作者:Zhenhuan Yang,Yunwen Lei,Puyu Wang,Tianbao Yang,Yiming Ying 机构:University at Albany, SUNY, Albany, NY, University of Birmingham, Birmingham, City University of Hong Kong, Hong Kong, University of Iowa City, IA 备注:NeurIPS 2021 accepted 摘要:两两学习是指损失函数依赖于一对实例的学习任务。它例示了许多重要的机器学习任务,如二部排序和度量学习。在成对学习中处理流数据的一种流行方法是在线梯度下降(OGD)算法,其中需要将当前实例与具有足够大大小的先前实例的缓冲集配对,因此存在可伸缩性问题。在本文中,我们提出了简单的随机和在线梯度下降方法成对学习。与现有研究的一个显著区别是,我们仅在构建梯度方向时将当前实例与前一个实例配对,这在存储和计算复杂性方面都是有效的。对于凸和非凸以及光滑和非光滑问题,我们给出了新的稳定性结果、优化和推广误差界。在优化和泛化分析中,我们引入了新的技术来解耦模型和先前实例之间的依赖关系。我们的研究解决了一个悬而未决的问题,即使用固定大小的缓冲集为OGD开发有意义的泛化边界。我们还扩展了我们的算法和稳定性分析,以开发用于成对学习的差异私有SGD算法,这显著改进了现有结果。 摘要:Pairwise learning refers to learning tasks where the loss function depends on a pair of instances. It instantiates many important machine learning tasks such as bipartite ranking and metric learning. A popular approach to handle streaming data in pairwise learning is an online gradient descent (OGD) algorithm, where one needs to pair the current instance with a buffering set of previous instances with a sufficiently large size and therefore suffers from a scalability issue. In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning. A notable difference from the existing studies is that we only pair the current instance with the previous one in building a gradient direction, which is efficient in both the storage and computational complexity. We develop novel stability results, optimization, and generalization error bounds for both convex and nonconvex as well as both smooth and nonsmooth problems. We introduce novel techniques to decouple the dependency of models and the previous instance in both the optimization and generalization analysis. Our study resolves an open question on developing meaningful generalization bounds for OGD using a buffering set with a very small fixed size. We also extend our algorithms and stability analysis to develop differentially private SGD algorithms for pairwise learning which significantly improves the existing results.
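A minimal sketch of the buffer-free scheme described above: at each step the gradient direction is built only from the pair formed by the current instance and the single previous instance, here with a linear scorer and a pairwise hinge loss for bipartite ranking (the specific loss and step size are assumptions).

```python
# Online pairwise learning using only (current, previous) instance pairs.
import numpy as np

rng = np.random.default_rng(0)
d = 20
w_true = rng.standard_normal(d)
w = np.zeros(d)
prev = None
lr = 0.05

for t in range(5000):
    x = rng.standard_normal(d)
    y = 1.0 if x @ w_true > 0 else -1.0                      # synthetic binary label
    if prev is not None and y != prev[1]:                    # only mixed pairs contribute
        x_prev, y_prev = prev
        diff = (x - x_prev) if y > y_prev else (x_prev - x)  # positive minus negative
        if 1.0 - w @ diff > 0:                               # pairwise hinge: max(0, 1 - w^T diff)
            w += lr * diff                                   # single-pair gradient step
    prev = (x, y)

# Crude AUC check on fresh data.
Xte = rng.standard_normal((2000, d))
yte = Xte @ w_true > 0
scores = Xte @ w
auc = (scores[yte][:, None] > scores[~yte][None, :]).mean()
print(f"approximate AUC: {auc:.3f}")
```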
【2】 Reviewing continual learning from the perspective of human-level intelligence 标题:从人的智能角度审视持续学习 链接:https://arxiv.org/abs/2111.11964
作者:Yifan Chang,Wenbo Li,Jian Peng,Bo Tang,Yu Kang,Yinjie Lei,Yuanmiao Gui,Qing Zhu,Yu Liu,Haifeng Li 机构:Li Member, IEEE 备注:21 pages, 10 figures 摘要:人类的持续学习(CL)能力与稳定性与可塑性困境密切相关,后者描述了人类如何实现持续学习能力和对所学信息的保存。自人工智能诞生以来,CL的概念一直存在于人工智能中。本文从稳定性与可塑性的关系出发,从更宏观的角度对化学发光现象进行了研究。与生物对应物类似,“智能”人工智能代理应该记住以前学到的信息(信息回顾);ii)不断推断新信息(信息展望:);iii)传输有用信息(信息传输),以实现高水平的CL。根据分类法,介绍了评估指标、算法、应用以及一些未决问题。我们的主要贡献涉及i)从人工通用智能的水平重新检查CL;ii)提供有关CL主题的详细和广泛概述;iii)就CL的潜在发展提出一些新颖的想法。 摘要:Humans' continual learning (CL) ability is closely related to Stability Versus Plasticity Dilemma that describes how humans achieve ongoing learning capacity and preservation for learned information. The notion of CL has always been present in artificial intelligence (AI) since its births. This paper proposes a comprehensive review of CL. Different from previous reviews that mainly focus on the catastrophic forgetting phenomenon in CL, this paper surveys CL from a more macroscopic perspective based on the Stability Versus Plasticity mechanism. Analogous to biological counterpart, "smart" AI agents are supposed to i) remember previously learned information (information retrospection); ii) infer on new information continuously (information prospection:); iii) transfer useful information (information transfer), to achieve high-level CL. According to the taxonomy, evaluation metrics, algorithms, applications as well as some open issues are then introduced. Our main contributions concern i) rechecking CL from the level of artificial general intelligence; ii) providing a detailed and extensive overview on CL topics; iii) presenting some novel ideas on the potential development of CL.
【3】 Modelling Direct Messaging Networks with Multiple Recipients for Cyber Deception 标题:具有多个接收者的网络欺骗直接消息传递网络建模 链接:https://arxiv.org/abs/2111.11932
作者:Kristen Moore,Cody J. Christopher,David Liebowitz,Surya Nepal,Renee Selvey 机构:∗Data, CSIRO, †Cyber Security, Cooperative Research, Centre Australia, ‡Penten Pty Ltd, Canberra, Australia, §UNSW, Sydney, Australia, ¶ANU 摘要:网络欺骗正在成为保护网络和系统免受攻击者和数据窃贼攻击的一种有前途的方法。然而,尽管部署成本相对较低,但大规模生成真实内容的成本非常高,因为丰富的交互式欺骗技术大多是手工制作的。随着最近机器学习的改进,我们现在有机会将规模和自动化引入到真实和诱人的模拟内容的创建中。在这项工作中,我们提出了一个框架来自动生成电子邮件和即时消息风格的大规模群组通信。组织内的此类消息传递平台在私人通信和文档附件中包含大量有价值的信息,使其成为对手的诱人目标。我们讨论了模拟这类系统的两个关键方面:建模参与者何时和与谁交流,以及生成主题、多方文本以填充模拟对话线程。我们将LogNormMix网络时间点过程作为第一种方法,基于Shchur等人的无强度建模方法~cite{shchur2019intensity}创建单播和多播通信的生成模型。我们演示了如何使用经过微调、预先训练的语言模型来生成令人信服的多方对话线索。通过将我们的LogNormMix Net TPP(用于生成通信时间戳、发送者和接收者)与生成多方电子邮件线程内容的语言模型相结合,模拟了实时电子邮件服务器。我们根据许多基于现实主义的属性来评估生成的内容,这些属性鼓励模型学习生成能够吸引对手注意力的内容,从而实现欺骗结果。 摘要:Cyber deception is emerging as a promising approach to defending networks and systems against attackers and data thieves. However, despite being relatively cheap to deploy, the generation of realistic content at scale is very costly, due to the fact that rich, interactive deceptive technologies are largely hand-crafted. With recent improvements in Machine Learning, we now have the opportunity to bring scale and automation to the creation of realistic and enticing simulated content. In this work, we propose a framework to automate the generation of email and instant messaging-style group communications at scale. Such messaging platforms within organisations contain a lot of valuable information inside private communications and document attachments, making them an enticing target for an adversary. We address two key aspects of simulating this type of system: modelling when and with whom participants communicate, and generating topical, multi-party text to populate simulated conversation threads. We present the LogNormMix-Net Temporal Point Process as an approach to the first of these, building upon the intensity-free modeling approach of Shchur et al.~cite{shchur2019intensity} to create a generative model for unicast and multi-cast communications. We demonstrate the use of fine-tuned, pre-trained language models to generate convincing multi-party conversation threads. A live email server is simulated by uniting our LogNormMix-Net TPP (to generate the communication timestamp, sender and recipients) with the language model, which generates the contents of the multi-party email threads. We evaluate the generated content with respect to a number of realism-based properties, that encourage a model to learn to generate content that will engage the attention of an adversary to achieve a deception outcome.
【4】 Functional Model of Residential Consumption Elasticity under Dynamic Tariffs 标题:动态关税下的居民消费弹性函数模型 链接:https://arxiv.org/abs/2111.11875
作者:Kamalanathan Ganesan,João Tomé Saraiva,Ricardo J. Bessa 机构: University of Porto (FEUP) 备注:28 pages, 19 figures, journal paper - Elsevier: Energy & Buildings 摘要:零售商面临的主要障碍之一是了解他们可以从合同需求响应(DR)客户那里获得的消费弹性。零售商提供的DR产品的当前趋势不是针对消费者的,这对消费者积极参与这些计划构成了额外的障碍。消费者需求行为的弹性因个人而异。电力公司将受益于更准确地了解其价格变化将如何改变其客户的消费模式。本文提出了DR合同消费者消费弹性的函数模型。该模型旨在确定DR消费者可以为零售商或公用事业公司提供不同价格水平的负荷调整。所提出的模型使用贝叶斯概率方法来确定单个签约客户可针对其可能经历的不同价格水平提供的实际负荷调整。所开发的框架为零售商或公用事业公司提供了一个工具,以获取个人消费者如何应对不同价格水平的关键信息。这种方法能够量化消费者对灾难恢复信号作出反应的可能性,并识别单个签约灾难恢复客户针对其可能经历的不同价格水平提供的实际负载调整。该信息可用于最大限度地控制零售商或公用事业公司可向系统运营商提供的服务并提高其可靠性。 摘要:One of the major barriers for the retailers is to understand the consumption elasticity they can expect from their contracted demand response (DR) clients. The current trend of DR products provided by retailers are not consumer-specific, which poses additional barriers for the active engagement of consumers in these programs. The elasticity of consumers demand behavior varies from individual to individual. The utility will benefit from knowing more accurately how changes in its prices will modify the consumption pattern of its clients. This work proposes a functional model for the consumption elasticity of the DR contracted consumers. The model aims to determine the load adjustment the DR consumers can provide to the retailers or utilities for different price levels. The proposed model uses a Bayesian probabilistic approach to identify the actual load adjustment an individual contracted client can provide for different price levels it can experience. The developed framework provides the retailers or utilities with a tool to obtain crucial information on how an individual consumer will respond to different price levels. This approach is able to quantify the likelihood with which the consumer reacts to a DR signal and identify the actual load adjustment an individual contracted DR client provides for different price levels they can experience. This information can be used to maximize the control and reliability of the services the retailer or utility can offer to the System Operators.
【5】 Variance Reduction in Deep Learning: More Momentum is All You Need 标题:深度学习中的方差减少:你只需要更多的动力 链接:https://arxiv.org/abs/2111.11828
作者:Lionel Tondji,Sergii Kashubin,Moustapha Cisse 机构:Institute of Analysis and Algebra 备注:23 pages, 8 figures 摘要:方差缩减(VR)技术极大地促进了在光滑和强凸环境中使用海量数据集的学习(Schmidt等人,2017年;Johnson&Zhang,2013年;Roux等人,2012年)。然而,由于各种因素,如使用数据扩充或dropout等正则化方法,此类技术在大规模深度学习领域尚未取得同样的成功(Defazio&Bottou,2019)。这一挑战最近推动了专为深度学习量身定制的新型方差缩减技术的设计(Arnold等人,2019年;Ma&Yarats,2018年)。这项工作是朝着这个方向迈出的又一步。特别是,我们利用深度学习中使用的丰富数据集的普遍聚类结构,通过将现有优化器(如SGD动量、准双曲动量、隐式梯度传输)与多动量策略相结合,设计了一系列可扩展的方差缩减优化程序(Yuan等人,2019)。我们的建议在标准基准数据集(如CIFAR和ImageNet)上比普通方法更快地收敛。它对标记噪声具有很强的鲁棒性,并且易于分布式优化。我们在JAX中提供了一个并行实现。 摘要:Variance reduction (VR) techniques have contributed significantly to accelerating learning with massive datasets in the smooth and strongly convex setting (Schmidt et al., 2017; Johnson & Zhang, 2013; Roux et al., 2012). However, such techniques have not yet met the same success in the realm of large-scale deep learning due to various factors such as the use of data augmentation or regularization methods like dropout (Defazio & Bottou, 2019). This challenge has recently motivated the design of novel variance reduction techniques tailored explicitly for deep learning (Arnold et al., 2019; Ma & Yarats, 2018). This work is an additional step in this direction. In particular, we exploit the ubiquitous clustering structure of rich datasets used in deep learning to design a family of scalable variance reduced optimization procedures by combining existing optimizers (e.g., SGD Momentum, Quasi Hyperbolic Momentum, Implicit Gradient Transport) with a multi-momentum strategy (Yuan et al., 2019). Our proposal leads to faster convergence than vanilla methods on standard benchmark datasets (e.g., CIFAR and ImageNet). It is robust to label noise and amenable to distributed optimization. We provide a parallel implementation in JAX.
【6】 Composing Partial Differential Equations with Physics-Aware Neural Networks 标题:用物理感知神经网络合成偏微分方程 链接:https://arxiv.org/abs/2111.11798
作者:Matthias Karlbauer,Timothy Praditia,Sebastian Otte,Sergey Oladyshkin,Wolfgang Nowak,Martin V. Butz 机构:University of T¨ubingen, University of Stuttgart 备注:Submitted to ICLR2022. Reviews and rebuttals on this https URL 摘要:我们介绍了一种用于学习时空对流扩散过程的组合物理感知神经网络(FINN)。FINN通过以组合方式对偏微分方程(PDE)的组成部分进行建模,实现了一种将人工神经网络的学习能力与数值模拟中的物理和结构知识相结合的新方法。在一维和二维偏微分方程(Burger's、扩散吸附、扩散反应、Allen-Cahn)上的结果表明,FINN在初始和边界条件之外具有优越的建模精度和出色的分布外泛化能力。FINN平均只有十分之一的参数,在所有情况下都优于纯机器学习和其他最先进的物理感知模型——通常甚至超过多个数量级。此外,FINN在扩散吸附情景中逼近稀疏真实世界数据时的表现优于校准物理模型,通过揭示观测过程的未知延迟因子,证实了其泛化能力并显示出解释潜力。 摘要:We introduce a compositional physics-aware neural network (FINN) for learning spatiotemporal advection-diffusion processes. FINN implements a new way of combining the learning abilities of artificial neural networks with physical and structural knowledge from numerical simulation by modeling the constituents of partial differential equations (PDEs) in a compositional manner. Results on both one- and two-dimensional PDEs (Burger's, diffusion-sorption, diffusion-reaction, Allen-Cahn) demonstrate FINN's superior modeling accuracy and excellent out-of-distribution generalization ability beyond initial and boundary conditions. With only one tenth of the number of parameters on average, FINN outperforms pure machine learning and other state-of-the-art physics-aware models in all cases -- often even by multiple orders of magnitude. Moreover, FINN outperforms a calibrated physical model when approximating sparse real-world data in a diffusion-sorption scenario, confirming its generalization abilities and showing explanatory potential by revealing the unknown retardation factor of the observed process.
【7】 Independent Learning in Stochastic Games 标题:随机博弈中的自主学习 链接:https://arxiv.org/abs/2111.11743
作者:Asuman Ozdaglar,Muhammed O. Sayin,Kaiqing Zhang 机构: Zhang are withthe Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology 备注:An invited chapter for the International Congress of Mathematicians 2022 (ICM 2022) 摘要:强化学习(RL)最近在许多人工智能应用中取得了巨大的成功。RL的许多前沿应用涉及多个代理,例如下棋和围棋游戏、自动驾驶和机器人技术。不幸的是,经典RL构建的框架不适合多agent学习,因为它假设agent的环境是静态的,并且没有考虑其他agent的适应性。在这篇综述文章中,我们提出了动态环境中多智能体学习的随机博弈模型。我们关注于随机博弈中简单而独立的学习动力学的发展:每个代理都是短视的,在不与对手协调的情况下,对其他代理的策略选择最佳响应类型的动作。对于随机博弈,在发展收敛的最佳响应型独立学习动力学方面进展有限。我们介绍了我们最近提出的保证零和随机博弈收敛的简单和独立的学习动力学,以及在此背景下的动态多智能体学习的其他同期算法。在此过程中,我们还重新检查了博弈论和RL文献中的一些经典结果,以定位我们独立学习动力的概念贡献和我们分析的数学新颖性。我们希望这篇综述文章能够推动博弈论中的自主学习和自然学习动力的研究重新兴起,以应对更具挑战性的动态环境。 摘要:Reinforcement learning (RL) has recently achieved tremendous successes in many artificial intelligence applications. Many of the forefront applications of RL involve multiple agents, e.g., playing chess and Go games, autonomous driving, and robotics. Unfortunately, the framework upon which classical RL builds is inappropriate for multi-agent learning, as it assumes an agent's environment is stationary and does not take into account the adaptivity of other agents. In this review paper, we present the model of stochastic games for multi-agent learning in dynamic environments. We focus on the development of simple and independent learning dynamics for stochastic games: each agent is myopic and chooses best-response type actions to other agents' strategy without any coordination with her opponent. There has been limited progress on developing convergent best-response type independent learning dynamics for stochastic games. We present our recently proposed simple and independent learning dynamics that guarantee convergence in zero-sum stochastic games, together with a review of other contemporaneous algorithms for dynamic multi-agent learning in this setting. Along the way, we also reexamine some classical results from both the game theory and RL literature, to situate both the conceptual contributions of our independent learning dynamics, and the mathematical novelties of our analysis. We hope this review paper serves as an impetus for the resurgence of studying independent and natural learning dynamics in game theory, for the more challenging settings with a dynamic environment.
【8】 Sample Efficient Imitation Learning via Reward Function Trained in Advance 标题:基于预先训练奖励函数的样本有效模仿学习 链接:https://arxiv.org/abs/2111.11711
作者:Lihua Zhang 机构:School of Computer Science and Technology, Soochow University, Soochow, Provincial Key Laboratory for Computer Information Processing Technology, Soochow University 摘要:模仿学习(IL)是一个从演示中学习模仿专家行为的框架。最近,IL在高维和控制任务上显示了有希望的结果。然而,IL在环境交互方面通常存在样本效率低下的问题,这严重限制了其在模拟领域的应用。在工业应用中,学习者通常具有较高的交互成本,与环境的交互越多,对环境和学习者自身造成的损害就越大。在本文中,我们通过引入一种新的逆强化学习方案来提高样本效率。我们将所提出的方法称为"基于模型奖励函数的模仿学习"(MRFIL),它使用由专家演示训练的集成动态模型作为奖励函数。其关键思想是,通过在遇到符合专家演示分布的状态时提供积极的奖励,为代理提供激励,使其能够在较长的时间内匹配演示。此外,我们还证明了新目标函数的收敛性保证。实验结果表明,与IL方法相比,我们的算法达到了有竞争力的性能,并且显著减少了环境交互。 摘要:Imitation learning (IL) is a framework that learns to imitate expert behavior from demonstrations. Recently, IL shows promising results on high dimensional and control tasks. However, IL typically suffers from sample inefficiency in terms of environment interaction, which severely limits its application to simulated domains. In industrial applications, the learner usually has a high interaction cost: the more interactions with the environment, the more damage is caused to the environment and the learner itself. In this article, we make an effort to improve sample efficiency by introducing a novel scheme of inverse reinforcement learning. Our method, which we call Model Reward Function Based Imitation Learning (MRFIL), uses an ensemble dynamics model, trained with expert demonstrations, as the reward function. The key idea is to provide the agent with an incentive to match the demonstrations over a long horizon, by providing a positive reward upon encountering states in line with the expert demonstration distribution. In addition, we demonstrate the convergence guarantee for the new objective function. Experimental results show that our algorithm reaches competitive performance and significantly reduces environment interactions compared to IL methods.
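The core idea, an ensemble dynamics model fit on expert transitions whose prediction error is turned into a reward, can be sketched in a few lines. The snippet below is an illustrative sketch under assumed interfaces (network sizes, the exp(-error) shaping, and the toy data are all assumptions), not the authors' implementation.

```python
# Sketch: ensemble dynamics model trained on expert demonstrations, used as a
# reward -- transitions consistent with the expert distribution get high reward.
import torch
import torch.nn as nn

class DynModel(nn.Module):
    """Predicts the next state from (state, action)."""
    def __init__(self, s_dim, a_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim + a_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, s_dim))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def train_ensemble(models, es, ea, esn, epochs=200):
    """Fit every ensemble member on the expert transitions (s, a, s')."""
    for m in models:
        opt = torch.optim.Adam(m.parameters(), lr=1e-3)
        for _ in range(epochs):
            loss = ((m(es, ea) - esn) ** 2).mean()
            opt.zero_grad(); loss.backward(); opt.step()

@torch.no_grad()
def reward(models, s, a, s_next):
    """Higher reward when the observed transition matches what the
    expert-trained ensemble predicts."""
    errs = torch.stack([((m(s, a) - s_next) ** 2).mean(dim=-1) for m in models])
    return torch.exp(-errs.mean(dim=0))   # in (0, 1], large for expert-like states

# toy usage with random stand-in "expert" data
s_dim, a_dim, n = 4, 2, 256
es, ea, esn = torch.randn(n, s_dim), torch.randn(n, a_dim), torch.randn(n, s_dim)
ensemble = [DynModel(s_dim, a_dim) for _ in range(5)]
train_ensemble(ensemble, es, ea, esn)
print(reward(ensemble, es[:3], ea[:3], esn[:3]))
```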
【9】 A Contextual Latent Space Model: Subsequence Modulation in Melodic Sequence 标题:上下文潜在空间模型:旋律序列中的子序列调制 链接:https://arxiv.org/abs/2111.11703
作者:Taketo Akama 机构:Sony Computer Science Laboratories, Tokyo, Japan 备注:22nd International Society for Music Information Retrieval Conference (ISMIR), 2021; 8 pages 摘要:一些序列的生成模型,如音乐和文本,允许我们只编辑子序列,给定周围的上下文序列,这在以交互方式指导生成中起着重要作用。然而,编辑子序列主要涉及从可能的生成空间对子序列进行随机重采样。我们提出了一个上下文潜在空间模型(CLSM),以便用户能够利用生成空间中的方向感探索子序列生成,例如插值,以及探索变化——语义相似的可能子序列。基于上下文的先验知识和译码器构成了CLSM的生成模型,而基于上下文位置的编码器则是CLSM的推理模型。在实验中,我们使用了一个单声道符号音乐数据集,证明了我们的上下文潜在空间在插值方面比基线更平滑,并且生成的样本质量优于基线模型。生成示例可在线获取。 摘要:Some generative models for sequences such as music and text allow us to edit only subsequences, given surrounding context sequences, which plays an important part in steering generation interactively. However, editing subsequences mainly involves randomly resampling subsequences from a possible generation space. We propose a contextual latent space model (CLSM) in order for users to be able to explore subsequence generation with a sense of direction in the generation space, e.g., interpolation, as well as exploring variations -- semantically similar possible subsequences. A context-informed prior and decoder constitute the generative model of CLSM, and a context position-informed encoder is the inference model. In experiments, we use a monophonic symbolic music dataset, demonstrating that our contextual latent space is smoother in interpolation than baselines, and the quality of generated samples is superior to baseline models. The generation examples are available online.
【10】 Deep learning-based fast solver of the shallow water equations 标题:基于深度学习的浅水方程快速求解器 链接:https://arxiv.org/abs/2111.11702
作者:Mojtaba Forghani,Yizhou Qian,Jonghyun Lee,Matthew W. Farthing,Tyler Hesser,Peter K. Kitanidis,Eric F. Darve 机构:Department of Mechanical Engineering, Stanford University, CA, Institute for Computational and Mathematical Engineering, Stanford University, CA, Department of Civil and Environmental Engineering, University of Hawaii at Manoa, Honolulu, HI 备注:arXiv admin note: substantial text overlap with arXiv:2012.02620 摘要:河流流速的快速可靠预测在许多应用中都很重要,包括洪水风险管理。浅水方程(SWE)通常用于此目的。然而,传统的SWE数值解算器计算成本高,需要高分辨率河床剖面测量(水深测量)。在这项工作中,我们提出了一个两阶段的过程,首先,使用主成分地质统计方法(PCGA)从流速测量估计水深测量的概率密度函数,然后使用机器学习(ML)算法获得SWE的快速解算器。快速解算器使用来自后水深分布的实现,并将规定的BCs范围作为输入。第一阶段允许我们在不直接测量水深的情况下预测流速。此外,在第二阶段将测深后验分布作为ML算法的输入之前,我们将其扩展为更一般的一类分布。这使得解算器能够将未来的直接测深测量纳入流速预测中,以提高精度,即使水深测量与其原始间接估计相比随时间发生变化。我们提出并测试了三种不同的解算器,即PCA-DNN(主成分分析深度神经网络)、SE(监督编码器)和SVE(监督变分编码器),并在奥古斯塔的萨凡纳河上进行了验证,我们的结果表明,快速求解器能够以良好的精度预测不同测深和BCs的流速,计算成本显著低于用传统方法求解全边值问题的成本。 摘要:Fast and reliable prediction of river flow velocities is important in many applications, including flood risk management. The shallow water equations (SWEs) are commonly used for this purpose. However, traditional numerical solvers of the SWEs are computationally expensive and require high-resolution riverbed profile measurement (bathymetry). In this work, we propose a two-stage process in which, first, using the principal component geostatistical approach (PCGA) we estimate the probability density function of the bathymetry from flow velocity measurements, and then use machine learning (ML) algorithms to obtain a fast solver for the SWEs. The fast solver uses realizations from the posterior bathymetry distribution and takes as input the prescribed range of BCs. The first stage allows us to predict flow velocities without direct measurement of the bathymetry. Furthermore, we augment the bathymetry posterior distribution to a more general class of distributions before providing them as inputs to ML algorithm in the second stage. This allows the solver to incorporate future direct bathymetry measurements into the flow velocity prediction for improved accuracy, even if the bathymetry changes over time compared to its original indirect estimation. We propose and benchmark three different solvers, referred to as PCA-DNN (principal component analysis-deep neural network), SE (supervised encoder), and SVE (supervised variational encoder), and validate them on the Savannah river, Augusta, GA. Our results show that the fast solvers are capable of predicting flow velocities for different bathymetry and BCs with good accuracy, at a computational cost that is significantly lower than the cost of solving the full boundary value problem with traditional methods.
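The PCA-DNN variant described in the abstract can be sketched compactly: compress bathymetry realizations with PCA, then train a small network mapping (PCA coefficients, boundary condition) to flow velocities. The synthetic data, grid sizes and network below are placeholders, not the paper's Savannah River setup.

```python
# Minimal sketch of a PCA-DNN surrogate solver: PCA compresses bathymetry
# realizations; a network maps (PCA coefficients, boundary condition) to velocities.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_samples, n_grid = 500, 200                               # 1-D grid, for illustration
bathymetry = rng.normal(size=(n_samples, n_grid))          # posterior realizations
bc = rng.uniform(100.0, 400.0, size=(n_samples, 1))        # e.g. discharge boundary condition
velocity = np.tanh(bathymetry) * 0.01 * bc                 # stand-in for the SWE solver output

pca = PCA(n_components=20).fit(bathymetry)                 # compress the bathymetry fields
z = pca.transform(bathymetry)
X = np.hstack([z, bc])                                     # surrogate inputs

dnn = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=2000, random_state=0)
dnn.fit(X[:400], velocity[:400])                           # train the fast solver

pred = dnn.predict(np.hstack([pca.transform(bathymetry[400:]), bc[400:]]))
rmse = np.sqrt(np.mean((pred - velocity[400:]) ** 2))
print("held-out RMSE of the fast surrogate:", rmse)
```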
【11】 Multi-task manifold learning for small sample size datasets 标题:小样本数据集的多任务流形学习 链接:https://arxiv.org/abs/2111.11655
作者:Hideaki Ishibashi,Kazushi Higac,Tetsuo Furukawa 机构:Kyushu Institute of Technology,–, Hibikino, Wakamatsu-ku, Kitakyushu ,-, Japan, Horiba Ltd., Kyoto, Japan 备注:22 pages, 15 figures 摘要:在本研究中,我们开发了一种多任务流形学习方法。该方法旨在提高多任务的流形学习性能,特别是当每个任务的样本数较少时。此外,该方法还旨在为新任务生成新样本,以及为现有任务生成新样本。在该方法中,我们使用了两种不同类型的信息传输:实例传输和模型传输。例如,数据集在相似任务之间合并,而对于模型传输,流形模型在相似任务之间平均。为此,所提出的方法包括一组与任务对应的生成流形模型,这些模型集成到光纤束的通用模型中。我们将所提出的方法应用于人工数据集和人脸图像集,结果表明,该方法能够估计流形,即使是少量样本。 摘要:In this study, we develop a method for multi-task manifold learning. The method aims to improve the performance of manifold learning for multiple tasks, particularly when each task has a small number of samples. Furthermore, the method also aims to generate new samples for new tasks, in addition to new samples for existing tasks. In the proposed method, we use two different types of information transfer: instance transfer and model transfer. For instance transfer, datasets are merged among similar tasks, whereas for model transfer, the manifold models are averaged among similar tasks. For this purpose, the proposed method consists of a set of generative manifold models corresponding to the tasks, which are integrated into a general model of a fiber bundle. We applied the proposed method to artificial datasets and face image sets, and the results showed that the method was able to estimate the manifolds, even for a tiny number of samples.
【12】 Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration 标题:用于实时移动加速的最佳DNN修剪方案的自动映射 链接:https://arxiv.org/abs/2111.11581
作者:Yifan Gong,Geng Yuan,Zheng Zhan,Wei Niu,Zhengang Li,Pu Zhao,Yuxuan Cai,Sijia Liu,Bin Ren,Xue Lin,Xulong Tang,Yanzhi Wang 机构: Northeastern University, Michigan State University 摘要:权重剪枝是一种有效的模型压缩技术,可以解决在移动设备上实现实时深度神经网络(DNN)推理的难题。然而,由于精度降低、难以利用硬件加速和/或对某些类型的DNN层的限制,先前的修剪方案限制了应用场景。在本文中,我们提出了一种通用的、细粒度的结构化剪枝方案和相应的编译器优化,适用于任何类型的DNN层,同时实现了高精度和硬件推理性能。通过我们的编译器优化,我们可以灵活地将不同的剪枝方案应用于不同的层,因此我们进一步探讨了在考虑各种剪枝方案的不同加速和精度性能的情况下确定最适合的剪枝方案的新问题。提出了两种剪枝方案映射方法,一种是基于搜索的,另一种是基于规则的,用于自动为任意给定DNN的每一层导出最适合的剪枝规则和块大小。实验结果表明,我们的剪枝方案映射方法,连同一般的细粒度结构化剪枝方案,在CIFAR-10和ImageNet数据集上以高达2.48$\times$和1.73$\times$的DNN推理加速性能优于最先进的DNN优化框架,且不会造成精度损失。 摘要:Weight pruning is an effective model compression technique to tackle the challenges of achieving real-time deep neural network (DNN) inference on mobile devices. However, prior pruning schemes have limited application scenarios due to accuracy degradation, difficulty in leveraging hardware acceleration, and/or restriction on certain types of DNN layers. In this paper, we propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations that are applicable to any type of DNN layer while achieving high accuracy and hardware inference performance. With the flexibility of applying different pruning schemes to different layers enabled by our compiler optimizations, we further probe into the new problem of determining the best-suited pruning scheme considering the different acceleration and accuracy performance of various pruning schemes. Two pruning scheme mapping methods, one is search-based and the other is rule-based, are proposed to automatically derive the best-suited pruning regularity and block size for each layer of any given DNN. Experimental results demonstrate that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework with up to 2.48$\times$ and 1.73$\times$ DNN inference acceleration on CIFAR-10 and ImageNet dataset without accuracy loss.
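Block-based structured pruning of a single layer can be sketched very simply: divide the weight matrix into blocks of a chosen size and zero out the blocks with the smallest L2 norm. The paper's contribution is choosing the pruning regularity and block size per layer automatically (search-based or rule-based), which is not reproduced here; the block size and sparsity below are fixed by hand for illustration.

```python
# Minimal sketch of fine-grained block-structured pruning of one weight matrix:
# zero out the fraction of (br x bc) blocks with the smallest L2 norm.
import numpy as np

def block_prune(W, block=(4, 4), sparsity=0.5):
    rows, cols = W.shape
    br, bc = block
    assert rows % br == 0 and cols % bc == 0
    # view the matrix as a grid of blocks and score each block by its L2 norm
    blocks = W.reshape(rows // br, br, cols // bc, bc)
    norms = np.sqrt((blocks ** 2).sum(axis=(1, 3)))          # (rows/br, cols/bc)
    k = int(sparsity * norms.size)
    thresh = np.partition(norms.ravel(), k)[k]               # k-th smallest block norm
    mask = (norms >= thresh)[:, None, :, None]               # keep only the strong blocks
    return (blocks * mask).reshape(rows, cols)

W = np.random.randn(16, 32)
Wp = block_prune(W, block=(4, 4), sparsity=0.5)
print("zeroed fraction:", np.mean(Wp == 0.0))
```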
【13】 KML: Using Machine Learning to Improve Storage Systems 标题:KML:使用机器学习改进存储系统 链接:https://arxiv.org/abs/2111.11554
作者:Ibrahim Umit Akgun,Ali Selman Aydin,Aadil Shaikh,Lukas Velikov,Andrew Burford,Michael McNeill,Michael Arkhangelskiy,Erez Zadok 机构:Stony Brook University 备注:16 pages, 11 figures 摘要:操作系统包括许多启发式算法,旨在提高总体存储性能和吞吐量。由于这种启发式方法无法在所有条件和工作负载下都能很好地工作,因此系统设计人员求助于向用户公开大量可调参数——本质上是让用户不断优化自己的存储系统和应用程序。在I/O繁忙的应用程序中,存储系统通常负责大部分延迟,因此,即使是较小的总体延迟改善也可能是显著的。机器学习(ML)技术有望学习模式,从中进行概括,并实现适应不断变化的工作负载的最佳解决方案。我们建议ML解决方案成为OSs中的一流组件,并取代手动启发法来动态优化存储系统。在本文中,我们描述了我们提出的ML体系结构,称为KML。我们开发了一个KML原型体系结构,并将其应用于两个问题:最佳预读和NFS读取大小值。我们的实验表明,KML只消耗很少的操作系统资源,增加的延迟可以忽略不计,而且还可以学习模式,在这两种使用情形下,可以将I/O吞吐量分别提高2.3倍或15倍——即使是在不同存储设备上同时运行复杂、前所未有的混合工作负载。 摘要:Operating systems include many heuristic algorithms designed to improve overall storage performance and throughput. Because such heuristics cannot work well for all conditions and workloads, system designers resorted to exposing numerous tunable parameters to users -- essentially burdening users with continually optimizing their own storage systems and applications. Storage systems are usually responsible for most latency in I/O heavy applications, so even a small overall latency improvement can be significant. Machine learning (ML) techniques promise to learn patterns, generalize from them, and enable optimal solutions that adapt to changing workloads. We propose that ML solutions become a first-class component in OSs and replace manual heuristics to optimize storage systems dynamically. In this paper, we describe our proposed ML architecture, called KML. We developed a prototype KML architecture and applied it to two problems: optimal readahead and NFS read-size values. Our experiments show that KML consumes little OS resources, adds negligible latency, and yet can learn patterns that can improve I/O throughput by as much as 2.3x or 15x for the two use cases respectively -- even for complex, never-before-seen, concurrently running mixed workloads on different storage devices.
【14】 Physics Informed Neural Networks for Control Oriented Thermal Modeling of Buildings 标题:面向控制的建筑热工建模的物理信息神经网络 链接:https://arxiv.org/abs/2111.12066
作者:Gargya Gokhale,Bert Claessens,Chris Develder 备注:13 pages, 7 figures 摘要:本文提出了一种数据驱动的建模方法,用于开发面向控制的建筑物热模型。开发这些模型的目的是降低能耗成本,同时将建筑物的室内温度控制在要求的舒适度范围内。为了结合白盒/灰盒物理模型的可解释性和神经网络的表达能力,我们提出了一种基于物理信息的神经网络建模方法。除了测量数据和建筑参数外,我们还利用控制这些建筑热行为的基础物理对神经网络进行编码。因此,实现一个以物理为指导的模型,有助于建模室温和功耗的时间演变以及隐藏状态,即后续时间步的建筑热质量温度。这项工作的主要研究贡献是:(1)我们提出了两种不同的物理信息神经网络结构,用于面向控制的建筑物热建模任务;(2)我们表明,训练这些结构是数据高效的,与传统的非物理信息神经网络相比,需要更少的训练数据,(3)我们表明,对于更长的预测范围,这些结构比传统的神经网络实现了更精确的预测。我们使用模拟和真实数据测试了所提出的结构的预测性能,以证明(2)和(3),并表明所提出的物理信息神经网络结构可以用于这种面向控制的建模问题。 摘要:This paper presents a data-driven modeling approach for developing control-oriented thermal models of buildings. These models are developed with the objective of reducing energy consumption costs while controlling the indoor temperature of the building within required comfort limits. To combine the interpretability of white/gray box physics models and the expressive power of neural networks, we propose a physics informed neural network approach for this modeling task. Along with measured data and building parameters, we encode the neural networks with the underlying physics that governs the thermal behavior of these buildings. Thus, realizing a model that is guided by physics aids in modeling the temporal evolution of room temperature and power consumption as well as the hidden state, i.e., the temperature of building thermal mass for subsequent time steps. The main research contributions of this work are: (1) we propose two variants of physics informed neural network architectures for the task of control-oriented thermal modeling of buildings, (2) we show that training these architectures is data-efficient, requiring less training data compared to conventional, non-physics informed neural networks, and (3) we show that these architectures achieve more accurate predictions than conventional neural networks for longer prediction horizons. We test the prediction performance of the proposed architectures using simulated and real-world data to demonstrate (2) and (3) and show that the proposed physics informed neural network architectures can be used for this control-oriented modeling problem.
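The general recipe of a physics-informed loss for this kind of task is a data term on measured indoor temperature plus a penalty on the residual of a lumped thermal balance. The sketch below uses a simple RC (resistance-capacitance) balance C*dT/dt = (T_out - T_in)/R + P_heat as the physics term; the equation, constants, network size and synthetic data are illustrative assumptions, and the paper's two architecture variants are not reproduced.

```python
# Minimal sketch of a physics-informed training loss for building thermal
# modelling: data fit on measured indoor temperature plus the residual of a
# lumped RC energy balance  C * dT/dt = (T_out - T_in)/R + P_heat.
import torch
import torch.nn as nn

R, C, dt = 2.0, 10.0, 1.0          # assumed thermal resistance/capacitance, time step
net = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 1))  # predicts T_in[t+1]

def step_loss(T_in, T_out, P_heat, T_in_next, lam=1.0):
    x = torch.stack([T_in, T_out, P_heat], dim=-1)
    T_pred = net(x).squeeze(-1)
    data_loss = ((T_pred - T_in_next) ** 2).mean()
    # physics residual of the predicted temperature change
    residual = C * (T_pred - T_in) / dt - ((T_out - T_in) / R + P_heat)
    phys_loss = (residual ** 2).mean()
    return data_loss + lam * phys_loss

# toy batch of "measurements" consistent with the RC model
T_in = torch.full((32,), 20.0) + torch.randn(32) * 0.5
T_out = torch.full((32,), 5.0) + torch.randn(32)
P_heat = torch.rand(32) * 2.0
T_in_next = T_in + dt / C * ((T_out - T_in) / R + P_heat)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(500):
    loss = step_loss(T_in, T_out, P_heat, T_in_next)
    opt.zero_grad(); loss.backward(); opt.step()
print("final loss:", float(loss))
```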
【15】 Is Shapley Explanation for a model unique? 标题:沙普利对模型的解释是独一无二的吗? 链接:https://arxiv.org/abs/2111.11946
作者:Harsh Kumar,Jithu Chandran 摘要:Shapley值最近已成为解释复杂和简单机器学习模型预测的流行方法。本文讨论了影响Shapley值的因素。特别是,我们探讨了特征分布与其Shapley值之间的关系。我们通过讨论同一模型的不同预测结果在Shapley解释中产生的差异来扩展我们的分析。我们的评估是,特定特征的Shapley值不仅取决于其期望均值,还取决于方差等其他矩;而且对于由同一线性概率模型(logit/probit)生成的概率、对数几率和二元决策等不同输出,基线预测、符号以及最重要特征都可能存在分歧。这些分歧不仅停留在局部可解释性上,而且影响全局特征的重要性。我们得出结论,对于给定的模型,没有唯一的Shapley解释。它随模型结果(概率/对数赔率/二元决策,如接受与拒绝)以及模型应用而变化。 摘要:Shapley value has recently become a popular way to explain the predictions of complex and simple machine learning models. This paper discusses the factors that influence the Shapley value. In particular, we explore the relationship between the distribution of a feature and its Shapley value. We extend our analysis by discussing the differences that arise in Shapley explanations for different predicted outcomes from the same model. Our assessment is that the Shapley value for a particular feature depends not only on its expected mean but also on other moments such as variance, and that there are disagreements in the baseline prediction, in the signs, and in the most important feature across different outcomes such as probability, log odds, and binary decisions generated using the same linear probability model (logit/probit). These disagreements are not confined to local explainability but also affect global feature importance. We conclude that there is no unique Shapley explanation for a given model. It varies with the model outcome (probability/log-odds/binary decision such as accept vs reject) and hence with the model application.
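The non-uniqueness point can be made concrete with exact Shapley values computed by subset enumeration for one logistic model, once on the log-odds output and once on the probability output: the attributions differ, and may even rank features differently. The weights, baseline and instance below are made up for illustration, not from the paper.

```python
# Worked example: exact Shapley values for ONE instance under the SAME logistic
# model, computed on two different outputs (log-odds vs probability).
import itertools, math
import numpy as np

w = np.array([3.0, -2.0, 0.5])          # logistic model: p = sigmoid(w.x + b)
b = -1.0
background = np.array([0.0, 0.0, 0.0])  # reference input for "missing" features
x = np.array([1.0, 2.0, 4.0])           # instance being explained

def f(z, output):
    logit = w @ z + b
    return logit if output == "logodds" else 1.0 / (1.0 + math.exp(-logit))

def shapley(output):
    d = len(x)
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for r in range(d):
            for S in itertools.combinations(others, r):
                z = background.copy(); z[list(S)] = x[list(S)]
                z_i = z.copy(); z_i[i] = x[i]
                weight = math.factorial(r) * math.factorial(d - r - 1) / math.factorial(d)
                phi[i] += weight * (f(z_i, output) - f(z, output))
    return phi

print("log-odds Shapley   :", shapley("logodds"))      # linear output => w_i * (x_i - 0)
print("probability Shapley:", shapley("probability"))  # differs due to the sigmoid
```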
【16】 Asteroid Flyby Cycler Trajectory Design Using Deep Neural Networks 标题:基于深度神经网络的小行星飞越旋转器轨迹设计 链接:https://arxiv.org/abs/2111.11858
作者:Naoya Ozaki,Kanta Yanagida,Takuya Chikazawa,Nishanth Pushparaj,Naoya Takeishi,Ryuki Hyodo 机构:Japan Aerospace Exploration Agency, Sagamihara, Kanagawa,-, Japan, The University of Tokyo, Tokyo ,-, Japan, SOKENDAI, Sagamihara, Kanagawa,-, Japan, University of Applied Sciences and Arts Western Switzerland, Carouge, Switzerland 摘要:近年来,小行星探测越来越受到人们的关注。然而,我们刚刚访问了数十颗小行星,同时发现了100多万个天体。由于我们目前的观测和知识应该是有偏见的,因此有必要直接探索多个小行星,以便更好地了解行星建筑材料的遗迹。任务设计解决方案之一是利用具有多重地球重力辅助的小行星飞越循环(cycler)轨道。小行星飞越循环轨道设计问题是具有多个飞越的全局轨道优化问题的一个子类,涉及给定飞越序列的轨道优化问题和确定飞越序列的组合优化问题。随着飞越体数量的增加,该优化问题的计算时间会急剧增加。本文提出了一种利用深度神经网络建立的近似轨道优化结果的替代模型设计小行星飞越循环轨道的新方法。由于机器学习方法的瓶颈之一是生成大量的轨迹数据库,因此我们通过引入满足Karush-Kuhn-Tucker条件的伪小行星,提出了一种高效的数据库生成策略。应用于JAXA的DESTINY+任务的数值结果表明,该方法可以显著减少搜索小行星飞越序列的计算时间。 摘要:Asteroid exploration has been attracting more attention in recent years. Nevertheless, we have just visited tens of asteroids while we have discovered more than one million bodies. As our current observation and knowledge should be biased, it is essential to explore multiple asteroids directly to better understand the remains of planetary building materials. One of the mission design solutions is utilizing asteroid flyby cycler trajectories with multiple Earth gravity assists. An asteroid flyby cycler trajectory design problem is a subclass of global trajectory optimization problems with multiple flybys, involving a trajectory optimization problem for a given flyby sequence and a combinatorial optimization problem to decide the sequence of the flybys. As the number of flyby bodies grows, the computation time of this optimization problem expands maliciously. This paper presents a new method to design asteroid flyby cycler trajectories utilizing a surrogate model constructed by deep neural networks approximating trajectory optimization results. Since one of the bottlenecks of machine learning approaches is to generate massive trajectory databases, we propose an efficient database generation strategy by introducing pseudo-asteroids satisfying the Karush-Kuhn-Tucker conditions. The numerical result applied to JAXA's DESTINY+ mission shows that the proposed method can significantly reduce the computational time for searching asteroid flyby sequences.
【17】 Machine Learning for Mars Exploration 链接:https://arxiv.org/abs/2111.11537
作者:Ali Momennasab 备注:16 pages, 0 figures 摘要:人类宇航员面临的风险和行星际距离导致通信速度缓慢且有限,这促使科学家寻求探索火星等遥远行星的自主方法。火星探测的一部分是通过火星探测器和火星快车轨道器等航天器对火星数据的自动收集和分析进行的。这些火星探测航天器和地球上用于分析这些飞行器收集的数据的自主性主要包括机器学习,这是一个人工智能领域,算法在其中收集数据并利用数据进行自我改进。机器学习技术在火星探测中的其他应用有可能解决行星际探测的通信限制和人类风险。此外,通过机器学习分析火星数据有可能在气候、大气和未来潜在居住等诸多领域提供对火星的更深入了解。为了探索机器学习技术在火星探测中的进一步应用,本文将首先总结火星的一般特征和现象,提供火星的总体概况,阐述火星的不确定性,这将有助于探索和理解,总结机器学习技术在火星探测中的每一次当前或以前的使用情况,探索将在未来火星探测任务中使用的机器学习的实现,并探索地球领域中使用的机器学习技术,以提供解决之前描述的火星不确定性的方法。 摘要:Risk to human astronauts and interplanetary distance causing slow and limited communication drives scientists to pursue an autonomous approach to exploring distant planets, such as Mars. A portion of exploration of Mars has been conducted through the autonomous collection and analysis of Martian data by spacecraft such as the Mars rovers and the Mars Express Orbiter. The autonomy used on these Mars exploration spacecraft and on Earth to analyze data collected by these vehicles mainly consist of machine learning, a field of artificial intelligence where algorithms collect data and self-improve with the data. Additional applications of machine learning techniques for Mars exploration have potential to resolve communication limitations and human risks of interplanetary exploration. In addition, analyzing Mars data with machine learning has the potential to provide a greater understanding of Mars in numerous domains such as its climate, atmosphere, and potential future habitation. To explore further utilizations of machine learning techniques for Mars exploration, this paper will first summarize the general features and phenomena of Mars to provide a general overview of the planet, elaborate upon uncertainties of Mars that would be beneficial to explore and understand, summarize every current or previous usage of machine learning techniques in the exploration of Mars, explore implementations of machine learning that will be utilized in future Mars exploration missions, and explore machine learning techniques used in Earthly domains to provide solutions to the previously described uncertainties of Mars.
其他(13篇)
【1】 Forget-SVGD: Particle-Based Bayesian Federated Unlearning 链接:https://arxiv.org/abs/2111.12056
作者:Jinu Gong,Osvaldo Simeone,Rahif Kassab,Joonhyuk Kang 备注:submitted for conference publication 摘要:基于变分粒子的贝叶斯学习方法具有不受影响传统参数化技术的偏差限制的优点。本文利用非参数贝叶斯近似推理的灵活性,提出了一种新的贝叶斯联邦学习方法,称为Forget-Stein变分梯度下降(Forget-SVGD)。忘记SVGD构建在SVGD(一种基于粒子的近似贝叶斯推理方案,使用基于梯度的确定性更新)及其分布式(联邦)扩展分布式SVGD(DSVGD)之上。联邦学习完成后,当一个或多个参与代理请求“忘记”其数据时,“忘记”SVGD在数据需要“取消学习”的代理上执行本地SVGD更新,这些更新与与参数服务器的通信轮次交错。通过与非参数方案的性能比较,验证了该方法的有效性。非参数方案通过排除要遗忘的数据进行从头开始的训练,并且与现有的参数贝叶斯解学习方法进行了比较。 摘要:Variational particle-based Bayesian learning methods have the advantage of not being limited by the bias affecting more conventional parametric techniques. This paper proposes to leverage the flexibility of non-parametric Bayesian approximate inference to develop a novel Bayesian federated unlearning method, referred to as Forget-Stein Variational Gradient Descent (Forget-SVGD). Forget-SVGD builds on SVGD - a particle-based approximate Bayesian inference scheme using gradient-based deterministic updates - and on its distributed (federated) extension known as Distributed SVGD (DSVGD). Upon the completion of federated learning, as one or more participating agents request for their data to be "forgotten", Forget-SVGD carries out local SVGD updates at the agents whose data need to be "unlearned", which are interleaved with communication rounds with a parameter server. The proposed method is validated via performance comparisons with non-parametric schemes that train from scratch by excluding data to be forgotten, as well as with existing parametric Bayesian unlearning methods.
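Forget-SVGD builds on the standard SVGD particle update; the sketch below shows only that building block (one SVGD step with an RBF kernel and a median-heuristic bandwidth, on a toy 2-D Gaussian target). The federated protocol, the unlearning of specific agents' data, and the communication rounds are not reproduced, and the target, step size and particle count are assumptions.

```python
# Minimal sketch of one SVGD step: particles are moved along
#   phi(x_i) = (1/n) sum_j [ k(x_j, x_i) * grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
import torch

torch.manual_seed(0)
mu = torch.tensor([2.0, -1.0])

def grad_log_p(x):                      # standard normal target centred at mu
    return -(x - mu)

def rbf_kernel(x):
    d2 = torch.cdist(x, x) ** 2
    h = d2.median() / (2.0 * torch.log(torch.tensor(float(x.shape[0]) + 1.0)))
    K = torch.exp(-d2 / (2.0 * h))      # K[j, i] = k(x_j, x_i)
    # gradient of k(x_j, x_i) w.r.t. x_j, then summed over j for each i
    grad_K = -(x.unsqueeze(1) - x.unsqueeze(0)) / h * K.unsqueeze(-1)
    return K, grad_K.sum(dim=0)

def svgd_step(x, step=0.1):
    K, grad_K = rbf_kernel(x)
    phi = (K @ grad_log_p(x) + grad_K) / x.shape[0]
    return x + step * phi

particles = torch.randn(100, 2)          # initial particles
for _ in range(500):
    particles = svgd_step(particles)
print("particle mean:", particles.mean(dim=0))   # approaches mu
```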
【2】 Identifying the Units of Measurement in Tabular Data 标题:识别表格数据中的测量单位 链接:https://arxiv.org/abs/2111.11959
作者:Taha Ceritli,Christopher K. I. Williams 机构: School of Informatics, University of Edinburgh, UK, Alan Turing Institute, London, UK 摘要:我们考虑在数据列中识别测量单位的问题,该数据列包含每行中的数值和单位符号,例如“5.2 L”、“7品脱”。在这种情况下,我们试图确定列的尺寸(如体积),并将单位符号与从知识图中获得的有效单位(如升、品脱)联系起来。下面我们介绍PUC,一个概率单位规范化器,它可以准确地识别度量单位,提取定量数据列的语义描述并规范化它们的条目。我们展示了第一个为测量单位标注的混乱的现实世界表格数据集,这可以支持并加速这一领域的研究。我们在这些数据集上的实验表明,与现有解决方案相比,PUC获得了更好的结果。 摘要:We consider the problem of identifying the units of measurement in a data column that contains both numeric values and unit symbols in each row, e.g., "5.2 l", "7 pints". In this case we seek to identify the dimension of the column (e.g. volume) and relate the unit symbols to valid units (e.g. litre, pint) obtained from a knowledge graph. Below we present PUC, a Probabilistic Unit Canonicalizer that can accurately identify the units of measurement, extract semantic descriptions of quantitative data columns and canonicalize their entries. We present the first messy real-world tabular datasets annotated for units of measurement, which can enable and accelerate the research in this area. Our experiments on these datasets show that PUC achieves better results than existing solutions.
【3】 ptype-cat: Inferring the Type and Values of Categorical Variables 标题:ptype-cat:推断范畴变量的类型和值 链接:https://arxiv.org/abs/2111.11956
作者:Taha Ceritli,Christopher K. I. Williams 机构: School of Informatics, University of Edinburgh, UK, Alan Turing Institute, London, UK 摘要:类型推断是识别数据列中的值类型的任务,在文献中已被广泛研究。大多数现有的类型推断方法支持数据类型,如布尔、日期、浮点、整数和字符串。然而,这些方法不考虑非布尔分类变量,其中存在由整数或字符串编码的两个以上的可能值。因此,这些列被注释为整数或字符串,而不是类别,需要用户手动转换为类别。在本文中,我们提出了一种概率类型推理方法,可以识别一般的分类数据类型(包括非布尔变量)。此外,我们通过调整现有的类型推断方法ptype来识别每个分类变量的可能值。结合这些方法,我们提出了ptype cat,它比现有的适用解决方案取得了更好的结果。 摘要:Type inference is the task of identifying the type of values in a data column and has been studied extensively in the literature. Most existing type inference methods support data types such as Boolean, date, float, integer and string. However, these methods do not consider non-Boolean categorical variables, where there are more than two possible values encoded by integers or strings. Therefore, such columns are annotated either as integer or string rather than categorical, and need to be transformed into categorical manually by the user. In this paper, we propose a probabilistic type inference method that can identify the general categorical data type (including non-Boolean variables). Additionally, we identify the possible values of each categorical variable by adapting the existing type inference method ptype. Combining these methods, we present ptype-cat which achieves better results than existing applicable solutions.
【4】 Inferring User Facial Affect in Work-like Settings 标题:在类似工作的环境中推断用户的面部情感 链接:https://arxiv.org/abs/2111.11862
作者:Chaudhary Muhammad Aqdus Ilyas,Siyang Song,Hatice Gunes 机构:Affective Intelligence & Robotics Lab, Department of Computer Science & Technology, University of Cambridge, UK 摘要:与快乐、悲伤、恐惧、愤怒、厌恶和惊讶这六种基本情绪不同,根据配价(积极性-消极性)和唤醒(强度)对维度影响进行建模和预测已被证明在自然主义和现实世界环境中更加灵活、适用和有用。在本文中,我们旨在推断当用户在不同难度水平(基线、轻松、困难和压力条件)下从事多个类似工作的任务时,用户的面部表情,包括(i)一个类似办公室的环境,其中他们承担的任务体力要求较低,但需要更大的精神压力;(ii)需要使用精细运动技能的装配线式设置;(iii)类似办公室的环境,代表远程工作和远程会议。为了实现这一目标,我们首先设计了一项不同条件的研究,收集了12名受试者的多模态数据。然后,我们使用各种机器学习模型进行了一些实验,发现:(i)面部情感的显示和预测在非工作环境和工作环境中有所不同;(ii)通过使用在类似于工作的环境中捕获的数据集,可以提高预测能力;(iii)分段级(光谱表示)信息对于改善面部情感预测至关重要。 摘要:Unlike the six basic emotions of happiness, sadness, fear, anger, disgust and surprise, modelling and predicting dimensional affect in terms of valence (positivity - negativity) and arousal (intensity) has proven to be more flexible, applicable and useful for naturalistic and real-world settings. In this paper, we aim to infer user facial affect when the user is engaged in multiple work-like tasks under varying difficulty levels (baseline, easy, hard and stressful conditions), including (i) an office-like setting where they undertake a task that is less physically demanding but requires greater mental strain; (ii) an assembly-line-like setting that requires the usage of fine motor skills; and (iii) an office-like setting representing teleworking and teleconferencing. In line with this aim, we first design a study with different conditions and gather multimodal data from 12 subjects. We then perform several experiments with various machine learning models and find that: (i) the display and prediction of facial affect vary from non-working to working settings; (ii) prediction capability can be boosted by using datasets captured in a work-like context; and (iii) segment-level (spectral representation) information is crucial in improving the facial affect prediction.
【5】 Pruning Self-attentions into Convolutional Layers in Single Path 标题:在单路径中将自我关注修剪成卷积层 链接:https://arxiv.org/abs/2111.11802
作者:Haoyu He,Jing Liu,Zizheng Pan,Jianfei Cai,Jing Zhang,Dacheng Tao,Bohan Zhuang 机构:Monash University, The University of Sydney, JD Explore Academy 备注:Tech report 摘要:视觉转换器(VIT)在各种计算机视觉任务中取得了令人印象深刻的性能。然而,用多头自我注意(MSA)层建模全局相关性导致了两个被广泛认可的问题:大量的计算资源消耗和缺乏建模局部视觉模式的内在归纳偏差。一个统一的解决方案是,通过基于神经架构搜索(NAS)的修剪方法,搜索是否用计算效率高的卷积类归纳偏差替换某些MSA层。然而,将MSA和不同的候选卷积运算保持为单独的可训练路径会导致昂贵的搜索成本和具有挑战性的优化。相反,我们在MSA和卷积运算之间提出了一种新的权重共享方案,并将搜索问题转化为在每个MSA层中查找要使用的参数子集。权重共享方案进一步允许我们设计一种自动单路径视觉变换修剪方法(SPViT),在给定目标效率约束的情况下,快速将预先训练的VIT修剪成精确紧凑的混合模型,大大降低搜索成本。我们在两个具有代表性的ViT模型上进行了大量实验,结果表明我们的方法实现了良好的精度-效率权衡。代码可在https://github.com/zhuang-group/SPViT. 摘要:Vision Transformers (ViTs) have achieved impressive performance over various computer vision tasks. However, modeling global correlations with multi-head self-attention (MSA) layers leads to two widely recognized issues: the massive computational resource consumption and the lack of intrinsic inductive bias for modeling local visual patterns. One unified solution is to search whether to replace some MSA layers with convolution-like inductive biases that are computationally efficient via neural architecture search (NAS) based pruning methods. However, maintaining MSA and different candidate convolutional operations as separate trainable paths gives rise to expensive search cost and challenging optimization. Instead, we propose a novel weight-sharing scheme between MSA and convolutional operations and cast the search problem as finding which subset of parameters to use in each MSA layer. The weight-sharing scheme further allows us to devise an automatic Single-Path Vision Transformer pruning method (SPViT) to quickly prune the pre-trained ViTs into accurate and compact hybrid models with significantly reduced search cost, given target efficiency constraints. We conduct extensive experiments on two representative ViT models showing our method achieves a favorable accuracy-efficiency trade-off. Code is available at https://github.com/zhuang-group/SPViT.
【6】 A self-training framework for glaucoma grading in OCT B-scans 标题:OCT B超青光眼分级的自我训练框架 链接:https://arxiv.org/abs/2111.11771
作者:Gabriel García,Adrián Colomer,Rafael Verdú-Monedero,José Dolz,Valery Naranjo 机构:∗Instituto de Investigaci´on e Innovaci´on en Bioingenier´ıa, I,B Universitat Politecnica de Valencia, Valencia, Spain., †Departamento de TICs, Universidad Polit´ecnica de Cartagena, Cartagena, Spain. 备注:5 pages, 4 figures, 3 tables, 2 algorithms, international conference 摘要:在这篇文章中,我们提出了一个基于自我训练的框架,在存在区域移位的情况下,使用OCT B扫描进行青光眼分级。特别地,所提出的两步学习方法借助于在第一步中生成的伪标签来扩充目标域上的训练数据集,然后使用该训练数据集来训练最终的目标模型。这允许从未标记的数据传输知识域。此外,我们还提出了一种新的青光眼特异性主干,它通过跳跃连接引入残余和注意模块,以细化潜在空间的嵌入特征。通过这样做,我们的模型能够从定量和可解释性的角度改进最新技术。报告的结果表明,通过仅使用源示例中的标签,所提出的学习策略可以提高模型在目标数据集上的性能,而不需要额外的注释步骤。我们的模型在不同指标上的表现始终优于基线1-3%,并弥补了在标记目标数据上训练模型的差距。 摘要:In this paper, we present a self-training-based framework for glaucoma grading using OCT B-scans under the presence of domain shift. Particularly, the proposed two-step learning methodology resorts to pseudo-labels generated during the first step to augment the training dataset on the target domain, which is then used to train the final target model. This allows transferring knowledge-domain from the unlabeled data. Additionally, we propose a novel glaucoma-specific backbone which introduces residual and attention modules via skip-connections to refine the embedding features of the latent space. By doing this, our model is capable of improving state-of-the-art from a quantitative and interpretability perspective. The reported results demonstrate that the proposed learning strategy can boost the performance of the model on the target dataset without incurring in additional annotation steps, by using only labels from the source examples. Our model consistently outperforms the baseline by 1-3% across different metrics and bridges the gap with respect to training the model on the labeled target data.
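The two-step self-training loop itself is simple to sketch: train on the labelled source domain, pseudo-label the unlabelled target domain, keep only confident predictions, and retrain. The classifier, synthetic data, confidence threshold, and the choice to retrain on source plus pseudo-labelled target are all assumptions for illustration; the OCT pipeline and the glaucoma-specific backbone are not reproduced.

```python
# Minimal sketch of two-step self-training with pseudo-labels under domain shift.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# synthetic labelled source domain and shifted, unlabelled target domain
Xs = rng.normal(0.0, 1.0, size=(300, 10)); ys = (Xs[:, 0] + Xs[:, 1] > 0).astype(int)
Xt = rng.normal(0.5, 1.2, size=(300, 10))           # domain shift, no labels

# step 1: source-only model generates pseudo-labels on the target domain
model = LogisticRegression(max_iter=1000).fit(Xs, ys)
proba = model.predict_proba(Xt)
keep = proba.max(axis=1) > 0.9                      # confidence threshold (assumed)
pseudo_y = proba.argmax(axis=1)[keep]

# step 2: retrain the final target model on source + confident pseudo-labels
X_aug = np.vstack([Xs, Xt[keep]])
y_aug = np.concatenate([ys, pseudo_y])
final_model = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
print(f"kept {keep.sum()} / {len(Xt)} pseudo-labelled target samples")
```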
【7】 Schedule Based Temporal Difference Algorithms 链接:https://arxiv.org/abs/2111.11768
作者:Rohan Deb,Meet Gandhi,Shalabh Bhatnagar 机构:Department of Computer Science and Automation, Indian Institute of Science, Bangalore 摘要:从数据样本中学习给定策略的值函数是强化学习中的一个重要问题。TD($\lambda$)是解决此问题的一类流行算法。但是,在TD($\lambda$)中分配给不同$n$步回报的权重由参数$\lambda$控制,并随着$n$的增加呈指数下降。在本文中,我们提出了一个$\lambda$-调度过程,该过程将TD($\lambda$)算法推广到参数$\lambda$随时间步长变化的情况。这允许权重分配的灵活性,即,用户可以通过选择序列$\{\lambda_t\}_{t\geq 1}$来指定分配给不同$n$步回报的权重。基于这个过程,我们提出了一个同策略(on-policy)算法TD($\lambda$)-调度,以及两个异策略(off-policy)算法GTD($\lambda$)-调度和TDC($\lambda$)-调度。在一般的马尔可夫噪声框架下,我们证明了这三种算法的几乎肯定收敛性。 摘要:Learning the value function of a given policy from data samples is an important problem in Reinforcement Learning. TD($\lambda$) is a popular class of algorithms to solve this problem. However, the weights assigned to different $n$-step returns in TD($\lambda$), controlled by the parameter $\lambda$, decrease exponentially with increasing $n$. In this paper, we present a $\lambda$-schedule procedure that generalizes the TD($\lambda$) algorithm to the case when the parameter $\lambda$ could vary with time-step. This allows flexibility in weight assignment, i.e., the user can specify the weights assigned to different $n$-step returns by choosing a sequence $\{\lambda_t\}_{t \geq 1}$. Based on this procedure, we propose an on-policy algorithm - TD($\lambda$)-schedule, and two off-policy algorithms - GTD($\lambda$)-schedule and TDC($\lambda$)-schedule, respectively. We provide proofs of almost sure convergence for all three algorithms under a general Markov noise framework.
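One natural reading of a λ-schedule in the tabular on-policy case is the usual accumulating eligibility trace, but decayed with a time-varying λ_t taken from a user-specified sequence. The sketch below is an illustrative reading of that idea, not the authors' code; the toy MDP, step size and schedule are assumptions, and the off-policy variants (GTD/TDC) are not shown.

```python
# Minimal sketch of TD(lambda)-schedule with a tabular value function: the
# eligibility trace is decayed with a time-varying lambda_t supplied by the user.
import numpy as np

n_states, gamma, alpha = 5, 0.95, 0.1
V = np.zeros(n_states)
lam_schedule = lambda t: 0.9 / (1.0 + 0.01 * t)     # user-chosen sequence {lambda_t}

rng = np.random.default_rng(0)
def env_step(s):
    """Toy random walk: move left/right, reward 1 on reaching the last state."""
    s_next = min(max(s + rng.choice([-1, 1]), 0), n_states - 1)
    done = s_next == n_states - 1
    return s_next, (1.0 if done else 0.0), done

for episode in range(200):
    s, e, t, done = 0, np.zeros(n_states), 0, False
    while not done:
        s_next, r, done = env_step(s)
        delta = r + (0.0 if done else gamma * V[s_next]) - V[s]
        e *= gamma * lam_schedule(t)     # time-varying decay of the trace
        e[s] += 1.0                      # accumulating trace
        V += alpha * delta * e
        s, t = s_next, t + 1

print("estimated values:", np.round(V, 3))
```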
【8】 Lossless Compression with Probabilistic Circuits 标题:基于概率电路的无损压缩 链接:https://arxiv.org/abs/2111.11632
作者:Anji Liu,Stephan Mandt,Guy Van den Broeck 机构:CS Department, UCLA, University of California, Irvine 摘要:尽管在图像生成方面取得了广泛的进展,但当应用于无损压缩时,深度生成模型是次优的。例如,诸如VAE之类的模型由于其潜在变量而遭受压缩成本开销,而这些潜在变量只能通过诸如位反向编码之类的精心设计的方案部分消除,从而导致通常较差的单样本压缩率。为了克服这些问题,我们建立了一类新的易于处理的无损压缩模型,允许有效的编码和解码:概率电路(PC)。这是一类涉及$|p|$个计算单元的神经网络,支持对$D$个特征维度的任意子集进行有效边缘化,从而实现高效算术编码。我们推导出了有效的编码和解码方案,它们都具有时间复杂度$\mathcal{O}(\log(D)\cdot|p|)$,而朴素方案的成本在$D$和$|p|$上是线性的,这使得该方法具有高度的可扩展性。根据经验,我们基于PC的压缩和解压缩算法比实现类似比特率的神经压缩算法运行速度快5-20倍。通过扩展传统的PC结构学习管道,我们在图像数据集(如MNIST)上获得了最先进的结果。此外,PCs可以自然地与现有的神经压缩算法集成,以提高这些基本模型在自然图像数据集上的性能。我们的结果强调了非标准学习结构可能对神经数据压缩产生的潜在影响。 摘要:Despite extensive progress on image generation, deep generative models are suboptimal when applied to lossless compression. For example, models such as VAEs suffer from a compression cost overhead due to their latent variables that can only be partially eliminated with elaborated schemes such as bits-back coding, resulting in oftentimes poor single-sample compression rates. To overcome such problems, we establish a new class of tractable lossless compression models that permit efficient encoding and decoding: Probabilistic Circuits (PCs). These are a class of neural networks involving $|p|$ computational units that support efficient marginalization over arbitrary subsets of the $D$ feature dimensions, enabling efficient arithmetic coding. We derive efficient encoding and decoding schemes that both have time complexity $\mathcal{O}(\log(D) \cdot |p|)$, where a naive scheme would have linear costs in $D$ and $|p|$, making the approach highly scalable. Empirically, our PC-based (de)compression algorithm runs 5-20x faster than neural compression algorithms that achieve similar bitrates. By scaling up the traditional PC structure learning pipeline, we achieved state-of-the-art results on image datasets such as MNIST. Furthermore, PCs can be naturally integrated with existing neural compression algorithms to improve the performance of these base models on natural image datasets. Our results highlight the potential impact that non-standard learning architectures may have on neural data compression.
【9】 Using mixup as regularization and tuning hyper-parameters for ResNets 标题:利用混合作为正则化和调整ResNet的超参数 链接:https://arxiv.org/abs/2111.11616
作者:Venkata Bhanu Teja Pallakonda 机构:Texas A&M University, College Station, TX 备注:6 pages, 7 figures, 2 tables 摘要:虽然新的计算机视觉体系结构越来越受欢迎,但模型体系结构的影响往往与训练方法的改变或探索有关。基于恒等映射的体系结构ResNets和DenseNets在图像分类任务中取得了突破性的结果,即使在数据相当有限的情况下,它们至今仍是常用的方法。考虑到使用有限资源进行训练的难易性,这项工作重新审视了ResNets,并通过使用混合(mixup)数据增强作为正则化以及调整超参数来改进ResNet50 \cite{resnets}。 摘要:While novel computer vision architectures are gaining traction, the impact of model architectures is often tied to changes in, or exploration of, training methods. Identity-mapping-based architectures such as ResNets and DenseNets have promised path-breaking results in image classification tasks and remain go-to methods even now when the available data is fairly limited. Considering the ease of training with limited resources, this work revisits ResNets and improves ResNet50 \cite{resnets} by using mixup data augmentation as regularization and by tuning the hyper-parameters.
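mixup itself is short enough to spell out: convex-combine pairs of inputs and their one-hot labels with a coefficient drawn from Beta(α, α), and train on the mixed batch with a soft-label cross-entropy. The sketch below is a generic batch-level implementation; the α value, shapes and the stand-in logits are assumptions, independent of the ResNet50 training recipe used in the paper.

```python
# Minimal sketch of mixup data augmentation on a batch.
import torch
import torch.nn.functional as F

def mixup_batch(x, y, num_classes, alpha=0.2):
    lam = float(torch.distributions.Beta(alpha, alpha).sample())
    perm = torch.randperm(x.size(0))
    x_mixed = lam * x + (1.0 - lam) * x[perm]            # mix the inputs
    y_onehot = F.one_hot(y, num_classes).float()
    y_mixed = lam * y_onehot + (1.0 - lam) * y_onehot[perm]   # mix the labels identically
    return x_mixed, y_mixed

def soft_cross_entropy(logits, soft_targets):
    return -(soft_targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

# toy usage with a random CIFAR-like batch
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
x_m, y_m = mixup_batch(x, y, num_classes=10)
logits = torch.randn(8, 10, requires_grad=True)   # stand-in for model(x_m)
loss = soft_cross_entropy(logits, y_m)
loss.backward()
print("mixup loss:", float(loss))
```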
【10】 Building Goal-Oriented Dialogue Systems with Situated Visual Context 链接:https://arxiv.org/abs/2111.11576
作者:Sanchit Agarwal,Jan Jezabek,Arijit Biswas,Emre Barut,Shuyang Gao,Tagyoung Chung 机构:Amazon Alexa AI 摘要:大多数流行的面向目标的对话代理都能够理解会话上下文。然而,随着屏幕虚拟助理的激增,下一代代理也需要理解屏幕上下文,以便提供适当的交互体验,更好地理解用户的目标。在本文中,我们提出了一个新的多模态会话框架,其中对话主体的下一个动作和他们的论点是在会话和视觉语境的共同条件下导出的。具体来说,我们提出了一个新的模型,该模型可以在会话中对可视上下文进行推理,并在给定用户查询的情况下使用可视实体填充API参数。我们的模型可以识别视觉特征,如颜色和形状,以及基于元数据的特征,如与视觉实体相关的价格或星级。为了训练我们的模型,由于缺乏合适的多模态对话数据集,我们还提出了一种新的多模态对话模拟器来生成合成数据,并从MTurk收集真实用户数据,以提高模型的鲁棒性。所提出的模型达到了合理的85%的模型精度,没有高的推理延迟。我们还通过一个多模式虚拟助理的典型家具购物体验演示了所提出的方法。 摘要:Most popular goal-oriented dialogue agents are capable of understanding the conversational context. However, with the surge of virtual assistants with screen, the next generation of agents are required to also understand screen context in order to provide a proper interactive experience, and better understand users' goals. In this paper, we propose a novel multimodal conversational framework, where the dialogue agent's next action and their arguments are derived jointly conditioned both on the conversational and the visual context. Specifically, we propose a new model, that can reason over the visual context within a conversation and populate API arguments with visual entities given the user query. Our model can recognize visual features such as color and shape as well as the metadata based features such as price or star rating associated with a visual entity. In order to train our model, due to a lack of suitable multimodal conversational datasets, we also propose a novel multimodal dialog simulator to generate synthetic data and also collect realistic user data from MTurk to improve model robustness. The proposed model achieves a reasonable 85% model accuracy, without high inference latency. We also demonstrate the proposed approach in a prototypical furniture shopping experience for a multimodal virtual assistant.
【11】 Camera Measurement of Physiological Vital Signs 标题:生理生命体征的摄像测量 链接:https://arxiv.org/abs/2111.11547
作者:Daniel McDuff 摘要:医疗监控对远程工具的需求从未如此明显。生命体征的摄像机测量利用成像设备通过分析人体图像来计算生理变化。基于光学、机器学习、计算机视觉和医学方面的进步,自数码相机发明以来,这些技术取得了重大进展。本文对生理生命体征的摄像机测量进行了全面综述,描述了可测量的生命体征及其计算技术。我将介绍临床和非临床应用,以及这些应用要从概念验证阶段继续向前发展所需克服的挑战。最后,我描述了可供研究社区使用的当前资源(数据集和代码),并提供了一个全面的网页(https://cameravitals.github.io/),其中包含这些资源的链接以及本文引用的所有论文的分类列表。 摘要:The need for remote tools for healthcare monitoring has never been more apparent. Camera measurement of vital signs leverages imaging devices to compute physiological changes by analyzing images of the human body. Building on advances in optics, machine learning, computer vision and medicine, these techniques have progressed significantly since the invention of digital cameras. This paper presents a comprehensive survey of camera measurement of physiological vital signs, describing the vital signs that can be measured and the computational techniques for doing so. I cover both clinical and non-clinical applications and the challenges that need to be overcome for these applications to advance from proofs-of-concept. Finally, I describe the current resources (datasets and code) available to the research community and provide a comprehensive webpage (https://cameravitals.github.io/) with links to these resources and a categorized list of all the papers referenced in this article.
【12】 On Data-centric Myths 标题:论以数据为中心的神话 链接:https://arxiv.org/abs/2111.11514
作者:Antonia Marcu,Adam Prügel-Bennett 机构:Vision, Learning and Control, University of Southampton 备注:arXiv admin note: text overlap with arXiv:2110.13968 摘要:社区缺乏建立良好数据集的理论指导。我们分析了与数据的哪些方面有关的理论方向,并得出结论,从现有文献中得出的直觉是错误和误导的。使用经验反例,我们表明1)数据维度不必最小化,2)在处理数据时,保留分布是不必要的。这需要更多的数据意识理论理解。虽然在这项工作中没有探索,但我们提出研究数据修改对学习表征的影响是一个有希望的研究方向。 摘要:The community lacks theory-informed guidelines for building good data sets. We analyse theoretical directions relating to what aspects of the data matter and conclude that the intuitions derived from the existing literature are incorrect and misleading. Using empirical counter-examples, we show that 1) data dimension should not necessarily be minimised and 2) when manipulating data, preserving the distribution is inessential. This calls for a more data-aware theoretical understanding. Although not explored in this work, we propose the study of the impact of data modification on learned representations as a promising research direction.
【13】 Bootstrap Your Flow 标题:引导您的流 链接:https://arxiv.org/abs/2111.11510
作者:Laurence Illing Midgley,Vincent Stimper,Gregor N. C. Simm,José Miguel Hernández-Lobato 机构:José Miguel Hernández-Lobato, Department of Engineering, University of Cambridge 摘要:归一化流是一种灵活的参数化分布,可以通过重要性抽样来近似难处理分布的期望值。然而,当前基于流的方法在具有挑战性的目标上存在局限:它们要么受到模式寻求行为的影响,要么训练损失的方差很大,要么依赖于目标分布的样本,而这些样本可能不可用。为了解决这些挑战,我们将流与退火重要性抽样(AIS)结合起来,同时使用$\alpha$-散度作为我们的目标,构成一种新的训练程序FAB(Flow AIS Bootstrap)。这样,流和AIS以自举的方式相互改进。我们证明,在以前基于流的方法失败的问题中,FAB可用于产生复杂目标分布(包括Boltzmann分布)的精确近似。 摘要:Normalising flows are flexible, parameterized distributions that can be used to approximate expectations from intractable distributions via importance sampling. However, current flow-based approaches are limited on challenging targets where they either suffer from mode seeking behaviour or high variance in the training loss, or rely on samples from the target distribution, which may not be available. To address these challenges, we combine flows with annealed importance sampling (AIS), while using the $\alpha$-divergence as our objective, in a novel training procedure, FAB (Flow AIS Bootstrap). Thereby, the flow and AIS improve each other in a bootstrapping manner. We demonstrate that FAB can be used to produce accurate approximations to complex target distributions, including Boltzmann distributions, in problems where previous flow-based methods fail.
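The α=2 divergence objective can be estimated with importance weights. The sketch below fits a simple Gaussian approximation q by minimising log E_q[(p/q)^2] with plain importance sampling from q itself (reparameterised samples); this is a deliberately crude simplification of FAB, whose point is precisely that this naive estimator is high-variance on hard targets and that AIS should supply the samples and weights instead. The target, sample sizes and optimiser settings are assumptions, and neither the flow architecture nor AIS is reproduced.

```python
# Minimal sketch of fitting q by minimising the alpha=2 divergence surrogate
#   log E_q[(p/q)^2], estimated with plain importance sampling from q.
import math
import torch

def log_p(x):                              # unnormalised two-mode 1-D target
    return torch.logsumexp(torch.stack([
        -0.5 * (x - 3.0) ** 2,
        -0.5 * (x + 3.0) ** 2]), dim=0)

mu = torch.zeros(1, requires_grad=True)
log_std = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([mu, log_std], lr=5e-2)

for step in range(2000):
    eps = torch.randn(512, 1)
    x = mu + torch.exp(log_std) * eps      # reparameterised samples from q
    log_q = (-0.5 * ((x - mu) / torch.exp(log_std)) ** 2
             - log_std - 0.5 * math.log(2.0 * math.pi))
    # log-sum-exp of twice the log importance weights ~ log E_q[(p/q)^2] + const
    loss = torch.logsumexp(2.0 * (log_p(x.squeeze(-1)) - log_q.squeeze(-1)), dim=0)
    opt.zero_grad(); loss.backward(); opt.step()

print("fitted q mean/std:", float(mu), float(torch.exp(log_std)))  # broad, mass-covering q
```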
机器翻译,仅供参考