Machine Learning Academic Digest [11.10]

2021-11-17 10:58:10

cs.LG, 97 papers in total today

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (6 papers)

【1】 Unsupervised Learning for Identifying High Eigenvector Centrality Nodes: A Graph Neural Network Approach
Link: https://arxiv.org/abs/2111.05264

Authors: Appan Rakaraddi, Mahardhika Pratama
Affiliations: School of Computer Science, NTU, Singapore
Note: accepted in IEEE BigData 2021
Abstract: The existing methods to calculate the Eigenvector Centrality (EC) tend not to be robust enough to determine EC in low time complexity, or do not scale well to large networks, rendering them practically unreliable or computationally expensive. It is therefore essential to develop a method that is scalable in low computational time. Hence, we propose a deep learning model for the identification of nodes with high Eigenvector Centrality. There have been a few previous works on identifying the high-ranked nodes with supervised learning methods, but in real-world cases the graphs are not labelled, and hence deployment of supervised learning methods becomes a hazard and their usage becomes impractical. So, we devise CUL (Centrality with Unsupervised Learning) to learn the relative EC scores in a network in an unsupervised manner. To achieve this, we develop an encoder-decoder based framework that maps the nodes to their respective estimated EC scores. Extensive experiments were conducted on different synthetic and real-world networks. We compared CUL against a baseline supervised method for EC estimation, similar to some of the past works. It was observed that, even when trained on a minuscule number of training datasets, CUL delivers a relatively better accuracy score when identifying the higher-ranked nodes than its supervised counterpart. We also show that CUL is much faster and has a smaller runtime than the conventional baseline method for EC computation. The code is available at https://github.com/codexhammer/CUL.
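For reference, eigenvector centrality itself, the quantity CUL is trained to rank, is conventionally computed by power iteration on the adjacency matrix; the sketch below is a minimal NumPy illustration of that classical baseline (not the paper's model), assuming a symmetric, connected graph.

```python
import numpy as np

def eigenvector_centrality(adj: np.ndarray, iters: int = 100, tol: float = 1e-8) -> np.ndarray:
    """Approximate eigenvector centrality by power iteration on the adjacency matrix."""
    n = adj.shape[0]
    x = np.ones(n) / n
    for _ in range(iters):
        x_new = adj @ x
        x_new /= np.linalg.norm(x_new)  # normalize to avoid overflow
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x

# Toy undirected graph: node 0 is connected to everyone.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)
print(eigenvector_centrality(A))  # node 0 gets the highest score
```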

【2】 Wasserstein Adversarially Regularized Graph Autoencoder
Link: https://arxiv.org/abs/2111.04981

Authors: Huidong Liang, Junbin Gao
Affiliations: Discipline of Business Analytics, The University of Sydney Business School, Sydney, NSW, Australia
Note: 8 pages. 2021 NeurIPS OTML Workshop
Abstract: This paper introduces the Wasserstein Adversarially Regularized Graph Autoencoder (WARGA), an implicit generative algorithm that directly regularizes the latent distribution of node embeddings to a target distribution via the Wasserstein metric. The proposed method has been validated on tasks of link prediction and node clustering on real-world graphs, in which WARGA generally outperforms state-of-the-art models based on Kullback-Leibler (KL) divergence and the typical adversarial framework.
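WARGA's key ingredient is a Wasserstein penalty pulling latent node embeddings toward a target prior. As a hedged illustration of the idea only (the paper uses an adversarial critic, not this estimator), the sketch below penalizes embeddings with a sliced Wasserstein distance to Gaussian samples, which reduces to sorting along random 1-D projections:

```python
import torch

def sliced_wasserstein(z: torch.Tensor, target: torch.Tensor, n_proj: int = 50) -> torch.Tensor:
    """1-D Wasserstein distance averaged over random projections (sorting gives the optimal coupling)."""
    d = z.shape[1]
    proj = torch.randn(d, n_proj)
    proj = proj / proj.norm(dim=0, keepdim=True)   # unit-norm projection directions
    z_p, _ = torch.sort(z @ proj, dim=0)           # project and sort both samples
    t_p, _ = torch.sort(target @ proj, dim=0)
    return ((z_p - t_p) ** 2).mean()

z = torch.randn(128, 16, requires_grad=True)       # stand-in for encoder outputs
target = torch.randn(128, 16)                      # target prior: standard Gaussian
loss = sliced_wasserstein(z, target)
loss.backward()                                    # gradient flows back into the encoder
```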

【3】 On Representation Knowledge Distillation for Graph Neural Networks
Link: https://arxiv.org/abs/2111.04964

Authors: Chaitanya K. Joshi, Fayao Liu, Xu Xun, Jie Lin, Chuan-Sheng Foo
Affiliations: All authors are with the Institute for Infocomm Research
Abstract: Knowledge distillation is a promising learning paradigm for boosting the performance and reliability of resource-efficient graph neural networks (GNNs) using more expressive yet cumbersome teacher models. Past work on distillation for GNNs proposed the Local Structure Preserving loss (LSP), which matches local structural relationships across the student's and teacher's node embedding spaces. In this paper, we make two key contributions: From a methodological perspective, we study whether preserving the global topology of how the teacher embeds graph data can be a more effective distillation objective for GNNs, as real-world graphs often contain latent interactions and noisy edges. The purely local LSP objective over pre-defined edges is unable to achieve this, as it ignores relationships among disconnected nodes. We propose two new approaches which better preserve global topology: (1) Global Structure Preserving loss (GSP), which extends LSP to incorporate all pairwise interactions; and (2) Graph Contrastive Representation Distillation (G-CRD), which uses contrastive learning to align the student node embeddings to those of the teacher in a shared representation space. From an experimental perspective, we introduce an expanded set of benchmarks on large-scale real-world datasets where the performance gap between teacher and student GNNs is non-negligible. We believe this is critical for testing the efficacy and robustness of knowledge distillation, but was missing from the LSP study, which used synthetic datasets with trivial performance gaps. Experiments across 4 datasets and 14 heterogeneous GNN architectures show that G-CRD consistently boosts the performance and robustness of lightweight GNN models, outperforming the structure-preserving approaches, LSP and GSP, as well as baselines adapted from 2D computer vision.
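To make the GSP idea concrete, one plausible reading of a global structure-preserving loss is to match all pairwise similarities of student and teacher node embeddings, rather than only those along pre-defined edges; the sketch below is such an illustrative loss in PyTorch, not the authors' exact formulation:

```python
import torch
import torch.nn.functional as F

def gsp_loss(student: torch.Tensor, teacher: torch.Tensor) -> torch.Tensor:
    """Match *all* pairwise cosine similarities between student and teacher embeddings."""
    s = F.normalize(student, dim=1)
    t = F.normalize(teacher, dim=1)
    sim_s = s @ s.T          # N x N student similarity matrix
    sim_t = t @ t.T          # N x N teacher similarity matrix
    return F.mse_loss(sim_s, sim_t)

student = torch.randn(64, 32, requires_grad=True)   # lightweight student embeddings
teacher = torch.randn(64, 256)                      # frozen, more expressive teacher
loss = gsp_loss(student, teacher.detach())
loss.backward()
```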

【4】 Cold Brew: Distilling Graph Node Representations with Incomplete or Missing Neighborhoods
Link: https://arxiv.org/abs/2111.04840

Authors: Wenqing Zheng, Edward W Huang, Nikhil Rao, Sumeet Katariya, Zhangyang Wang, Karthik Subbian
Note: ICLR 2022 submission
Abstract: Graph Neural Networks (GNNs) have achieved state-of-the-art performance in node classification, regression, and recommendation tasks. GNNs work well when a high-quality and rich connectivity structure is available. However, this requirement is not satisfied in many real-world graphs where node degrees have power-law distributions, as many nodes have either fewer or noisy connections. The extreme case of this situation is that a node may have no neighbors at all, called the Strict Cold Start (SCS) scenario. This forces the prediction models to rely completely on the node's input features. We propose Cold Brew, a distillation approach to address the SCS and noisy-neighbor settings relative to pointwise and other graph-based models. We introduce the feature-contribution ratio (FCR), a metric to study the viability of using inductive GNNs to solve the SCS problem and to select the best architecture for SCS generalization. We experimentally show that FCR disentangles the contributions of various components of graph datasets, and demonstrate the superior performance of Cold Brew on several public benchmarks and proprietary e-commerce datasets. The source code for our approach is available at: https://github.com/amazon-research/gnn-tail-generalization.

【5】 Inferential SIR-GN: Scalable Graph Representation Learning
Link: https://arxiv.org/abs/2111.04826

Authors: Janet Layne, Edoardo Serra
Affiliations: Boise State University
Abstract: Graph representation learning methods generate numerical vector representations for the nodes in a network, thereby enabling their use in standard machine learning models. These methods aim to preserve relational information, such that nodes that are similar in the graph are found close to one another in the representation space. Similarity can be based largely on one of two notions: connectivity or structural role. In tasks where node structural role is important, connectivity-based methods show poor performance. Recent work has begun to focus on the scalability of learning methods to massive graphs of millions to billions of nodes and edges. Many unsupervised node representation learning algorithms are incapable of scaling to large graphs, and are unable to generate node representations for unseen nodes. In this work, we propose Inferential SIR-GN, a model which is pre-trained on random graphs and then computes node representations rapidly, including for very large networks. We demonstrate that the model is able to capture a node's structural role information, and show excellent performance at node and graph classification tasks on unseen networks. Additionally, we observe that the scalability of Inferential SIR-GN is comparable to the fastest current approaches for massive graphs.

【6】 MassFormer: Tandem Mass Spectrum Prediction with Graph Transformers
Link: https://arxiv.org/abs/2111.04824

Authors: Adamo Young, Bo Wang, Hannes Röst
Affiliations: Department of Computer Science, University of Toronto; Department of Laboratory Medicine and Pathobiology, University of Toronto; Department of Molecular Genetics, University of Toronto; Vector Institute; Peter Munk Centre, University Health Network
Note: 14 pages (10 without bibliography), 5 figures, 3 tables
Abstract: Mass spectrometry is a key tool in the study of small molecules, playing an important role in metabolomics, drug discovery, and environmental chemistry. Tandem mass spectra capture fragmentation patterns that provide key structural information about a molecule and help with its identification. Practitioners often rely on spectral library searches to match unknown spectra with known compounds. However, such search-based methods are limited by the availability of reference experimental data. In this work, we show that graph transformers can be used to accurately predict tandem mass spectra. Our model, MassFormer, outperforms competing deep learning approaches for spectrum prediction, and includes an interpretable attention mechanism to help explain predictions. We demonstrate that our model can be used to improve reference library coverage on a synthetic molecule identification task. Through quantitative analysis and visual inspection, we verify that our model recovers prior knowledge about the effect of collision energy on the generated spectrum. We evaluate our model on different types of mass spectra from two independent MS datasets and show that its performance generalizes. Code available at github.com/Roestlab/massformer.

Transformer (1 paper)

【1】 Sliced Recursive Transformer
Link: https://arxiv.org/abs/2111.05297

Authors: Zhiqiang Shen, Zechun Liu, Eric Xing
Affiliations: CMU; MBZUAI
Note: Code and models are available at this https URL
Abstract: We present a neat yet effective recursive operation on vision transformers that can improve parameter utilization without involving additional parameters. This is achieved by sharing weights across the depth of transformer networks. The proposed method can obtain a substantial gain (~2%) simply using a naïve recursive operation, requires no special or sophisticated knowledge for designing principles of networks, and introduces minimal computational overhead to the training procedure. To reduce the additional computation caused by the recursive operation while maintaining superior accuracy, we propose an approximating method through multiple sliced group self-attentions across recursive layers, which can reduce the cost consumption by 10~30% with minimal performance loss. We call our model Sliced Recursive Transformer (SReT), which is compatible with a broad range of other designs for efficient vision transformers. Our best model establishes significant improvement on ImageNet over state-of-the-art methods while containing fewer parameters. The proposed sliced recursive operation allows us to build a transformer with more than 100 or even 1000 layers effortlessly under a still small size (13~15M), to avoid optimization difficulties when the model size is too large. The flexible scalability has shown great potential for scaling up and constructing extremely deep and large-dimensionality vision transformers. Our code and models are available at https://github.com/szq0214/SReT.
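The naïve recursion itself, sharing one transformer block's weights across depth, takes only a few lines; below is a minimal PyTorch sketch (module and hyperparameter choices are assumptions, not the SReT codebase):

```python
import torch
import torch.nn as nn

class RecursiveEncoder(nn.Module):
    """Apply the *same* transformer block several times: extra depth, no extra parameters."""
    def __init__(self, dim: int = 384, heads: int = 6, recursions: int = 3):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.recursions = recursions

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.recursions):   # weight sharing across depth
            x = self.block(x)
        return x

tokens = torch.randn(2, 197, 384)          # (batch, patches + cls, dim)
out = RecursiveEncoder()(tokens)
print(out.shape)                           # torch.Size([2, 197, 384])
```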

GAN | Adversarial | Attack | Generation (5 papers)

【1】 Reason first, then respond: Modular Generation for Knowledge-infused Dialogue
Link: https://arxiv.org/abs/2111.05204

Authors: Leonard Adolphs, Kurt Shuster, Jack Urbanek, Arthur Szlam, Jason Weston
Affiliations: ETH Zürich; Facebook AI Research
Abstract: Large language models can produce fluent dialogue but often hallucinate factual inaccuracies. While retrieval-augmented models help alleviate this issue, they still face a difficult challenge of both reasoning to provide correct knowledge and generating conversation simultaneously. In this work, we propose a modular model, Knowledge to Response (K2R), for incorporating knowledge into conversational agents, which breaks down this problem into two easier steps. K2R first generates a knowledge sequence, given a dialogue context, as an intermediate step. After this "reasoning step", the model then attends to its own generated knowledge sequence, as well as the dialogue context, to produce a final response. In detailed experiments, we find that such a model hallucinates less in knowledge-grounded dialogue tasks, and has advantages in terms of interpretability and modularity. In particular, it can be used to fuse QA and dialogue systems together to enable dialogue agents to give knowledgeable answers, or QA models to give conversational responses in a zero-shot setting.
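The two-step K2R decomposition can be expressed as a small pipeline over any text generator; in the sketch below, `generate` is a generic stand-in callable and the prompt format is an illustrative assumption, not the paper's interface:

```python
from typing import Callable

def k2r_respond(dialogue_context: str, generate: Callable[[str], str]) -> str:
    """Two-step modular generation: reason (produce a knowledge sequence), then respond."""
    # Step 1 ("reasoning step"): generate an intermediate knowledge sequence.
    knowledge = generate(f"Context: {dialogue_context}\nRelevant knowledge:")
    # Step 2: condition the final response on both the context and the generated knowledge.
    response = generate(f"Context: {dialogue_context}\nKnowledge: {knowledge}\nResponse:")
    return response

# Toy stand-in generator; in practice this would be a trained seq2seq model.
echo = lambda prompt: prompt.splitlines()[-1] + " ..."
print(k2r_respond("Who wrote Hamlet?", echo))
```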

【2】 Membership Inference Attacks Against Self-supervised Speech Models
Link: https://arxiv.org/abs/2111.05113

Authors: Wei-Cheng Tseng, Wei-Tsung Kao, Hung-yi Lee
Affiliations: Graduate Institute of Communication Engineering, National Taiwan University, Taiwan
Note: Submitted to ICASSP 2022. Source code available at this https URL
Abstract: Recently, adapting the idea of self-supervised learning (SSL) to continuous speech has started gaining attention. SSL models pre-trained on a huge amount of unlabeled audio can generate general-purpose representations that benefit a wide variety of speech processing tasks. Despite their ubiquitous deployment, however, the potential privacy risks of these models have not been well investigated. In this paper, we present the first privacy analysis of several SSL speech models using Membership Inference Attacks (MIA) under black-box access. The experimental results show that these pre-trained models are vulnerable to MIA and prone to membership information leakage, with high adversarial advantage scores at both the utterance level and the speaker level. Furthermore, we also conduct several ablation studies to understand the factors that contribute to the success of MIA.

【3】 Speaker Generation
Link: https://arxiv.org/abs/2111.05095

Authors: Daisy Stanton, Matt Shannon, Soroosh Mariooryad, RJ Skerry-Ryan, Eric Battenberg, Tom Bagby, David Kao
Affiliations: Google Research, USA
Note: 12 pages, 3 figures, 4 tables, appendix with 2 tables
Abstract: This work explores the task of synthesizing speech in nonexistent human-sounding voices. We call this task "speaker generation", and present TacoSpawn, a system that performs competitively at this task. TacoSpawn is a recurrent attention-based text-to-speech model that learns a distribution over a speaker embedding space, which enables sampling of novel and diverse speakers. Our method is easy to implement, and does not require transfer learning from speaker ID systems. We present objective and subjective metrics for evaluating performance on this task, and demonstrate that our proposed objective metrics correlate with human perception of speaker similarity. Audio samples are available on our demo page.
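A loose illustration of the sampling idea, fitting a distribution over speaker embeddings and drawing novel ones, is sketched below with a post-hoc Gaussian mixture; TacoSpawn itself learns its speaker distribution jointly with the TTS model, so this is an analogy rather than the paper's method:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in for learned speaker embeddings (e.g., one 128-d vector per training speaker).
rng = np.random.default_rng(0)
speaker_embeddings = rng.normal(size=(500, 128))

# Fit a distribution over the embedding space, then sample brand-new "speakers".
gmm = GaussianMixture(n_components=10, covariance_type="diag", random_state=0)
gmm.fit(speaker_embeddings)
novel_speakers, _ = gmm.sample(n_samples=3)   # embeddings for nonexistent voices
print(novel_speakers.shape)                   # (3, 128): feed these to the TTS decoder
```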

【4】 Tightening the Approximation Error of Adversarial Risk with Auto Loss Function Search
Link: https://arxiv.org/abs/2111.05063

Authors: Pengfei Xia, Ziqiang Li, Bin Li
Affiliations: Department of Electronic Engineering and Information Science, University of Science and Technology of China
Abstract: Numerous studies have demonstrated that deep neural networks are easily misled by adversarial examples. Effectively evaluating the adversarial robustness of a model is important for its deployment in practical applications. Currently, a common type of evaluation is to approximate the adversarial risk of a model as a robustness indicator by constructing malicious instances and executing attacks. Unfortunately, there is an error (gap) between the approximate value and the true value. Previous studies manually design attack methods to achieve a smaller error, which is inefficient and may miss a better solution. In this paper, we establish the tightening of the approximation error as an optimization problem and try to solve it with an algorithm. More specifically, we first analyze that replacing the non-convex and discontinuous 0-1 loss with a surrogate loss, a necessary compromise in calculating the approximation, is one of the main reasons for the error. Then we propose AutoLoss-AR, the first method for searching loss functions for tightening the approximation error of adversarial risk. Extensive experiments are conducted in multiple settings. The results demonstrate the effectiveness of the proposed method: the best-discovered loss functions outperform the handcrafted baseline by 0.9%-2.9% and 0.7%-2.0% on MNIST and CIFAR-10, respectively. Besides, we also verify that the searched losses can be transferred to other settings, and explore why they are better than the baseline by visualizing the local loss landscape.

【5】 Emotional Prosody Control for Speech Generation
Link: https://arxiv.org/abs/2111.04730

Authors: Sarath Sivaprasad, Saiteja Kosgi, Vineet Gandhi
Affiliations: CVIT, KCIS, IIIT Hyderabad; TCS Research, Pune
Abstract: Machine-generated speech is characterized by its limited or unnatural emotional variation. Current text-to-speech systems generate speech with either a flat emotion, an emotion selected from a predefined set, an average variation learned from prosody sequences in training data, or one transferred from a source style. We propose a text-to-speech (TTS) system where a user can choose the emotion of generated speech from a continuous and meaningful emotion space (arousal-valence space). The proposed TTS system can generate speech from text in any speaker's style, with fine control of emotion. We show that the system works on emotions unseen during training and can scale to previously unseen speakers given a speech sample. Our work expands the horizon of the state-of-the-art FastSpeech2 backbone to a multi-speaker setting and gives it the much-coveted continuous (and interpretable) affective control, without any observable degradation in the quality of the synthesized speech.

Semi-/Weakly-/Un-/Fully-Supervised | Uncertainty | Active Learning (7 papers)

【1】 Unsupervised Spiking Instance Segmentation on Event Data using STDP
Link: https://arxiv.org/abs/2111.05283

Authors: Paul Kirkland, Davide L. Manna, Alex Vincente-Sola, Gaetano Di Caterina
Affiliations: Neuromorphic Sensor Signal Processing Lab, Centre for Image and Signal Processing, Electrical and Electronic Engineering, University of Strathclyde, Glasgow, United Kingdom
Note: 20 pages, 13 figures
Abstract: Spiking Neural Networks (SNN) and the field of Neuromorphic Engineering have brought about a paradigm shift in how to approach Machine Learning (ML) and Computer Vision (CV) problems. This paradigm shift comes from the adoption of event-based sensing and processing. An event-based vision sensor allows for sparse and asynchronous events to be produced that are dynamically related to the scene, allowing not only spatial information but also high-fidelity temporal information to be captured, while avoiding the extra overhead and redundancy of conventional high-frame-rate approaches. However, with this change in paradigm, many techniques from traditional CV and ML are not applicable to these event-based spatio-temporal visual streams. As such, only a limited number of recognition, detection, and segmentation approaches exist. In this paper, we present a novel approach that can perform instance segmentation using just the weights of a Spiking Convolutional Neural Network that was trained for object recognition with Spike-Timing-Dependent Plasticity. This exploits the spatial and temporal aspects of the network's internal feature representations, adding this new discriminative capability. We highlight the new capability by successfully transforming a single-class unsupervised network for face detection into a multi-person face recognition and instance segmentation network.

【2】 Risk Sensitive Model-Based Reinforcement Learning using Uncertainty Guided Planning
Link: https://arxiv.org/abs/2111.04972

Authors: Stefan Radic Webster, Peter Flach
Affiliations: Department of Computer Science, University of Bristol
Note: Safe RL Workshop NeurIPS 2021
Abstract: Identifying uncertainty and taking mitigating actions is crucial for safe and trustworthy reinforcement learning agents, especially when deployed in high-risk environments. In this paper, risk sensitivity is promoted in a model-based reinforcement learning algorithm by exploiting the ability of a bootstrap ensemble of dynamics models to estimate environment epistemic uncertainty. We propose uncertainty guided cross-entropy method planning, which penalises action sequences that result in high-variance state predictions during model rollouts, guiding the agent to known areas of the state space with low uncertainty. Experiments display the ability of the agent to identify uncertain regions of the state space during planning and to take actions that maintain the agent within high-confidence areas, without the requirement of explicit constraints. The result is a reduction in performance in terms of attaining reward, displaying a trade-off between risk and return.
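The proposed planning rule, penalizing CEM action sequences whose ensemble rollouts disagree, is compact enough to sketch end-to-end; the NumPy toy below uses random linear models as the ensemble, and the reward and penalty weight are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
ensemble = [rng.normal(scale=0.1, size=(4, 4 + 2)) for _ in range(5)]  # 5 toy linear models: s' = M @ [s; a]

def score(state, actions, penalty=1.0):
    """Toy reward proxy minus a penalty on ensemble disagreement along the rollout."""
    total, var_sum = 0.0, 0.0
    states = [state.copy() for _ in ensemble]
    for a in actions:                                      # roll out each ensemble member
        states = [M @ np.concatenate([s, a]) for M, s in zip(ensemble, states)]
        var_sum += np.var(np.stack(states), axis=0).sum()  # epistemic-uncertainty proxy
        total += -np.linalg.norm(np.mean(states, axis=0))  # toy reward: drive mean state to origin
    return total - penalty * var_sum

def cem_plan(state, horizon=5, pop=64, elites=8, iters=20):
    mu, sigma = np.zeros((horizon, 2)), np.ones((horizon, 2))
    for _ in range(iters):
        cand = mu + sigma * rng.normal(size=(pop, horizon, 2))
        scores = np.array([score(state, c) for c in cand])
        elite = cand[np.argsort(scores)[-elites:]]         # keep the best action sequences
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu[0]                                           # MPC-style: execute the first action

print(cem_plan(np.ones(4)))
```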

【3】 An Interactive Visualization Tool for Understanding Active Learning
Link: https://arxiv.org/abs/2111.04936

Authors: Zihan Wang, Jialin Lu, Oliver Snow, Martin Ester
Affiliations: School of Computer Science, Simon Fraser University, Burnaby, BC, Canada
Note: NeurIPS 2021 workshop on Human-Centered AI; 7 pages; 5 figures
Abstract: Despite recent progress in artificial intelligence and machine learning, many state-of-the-art methods suffer from a lack of explainability and transparency. The ability to interpret the predictions made by machine learning models and accurately evaluate these models is crucially important. In this paper, we present an interactive visualization tool to elucidate the training process of active learning. This tool enables one to select a sample of interesting data points, view how their prediction values change at different querying stages, and thus better understand when and how active learning works. Additionally, users can utilize this tool to compare different active learning strategies simultaneously and inspect why some strategies outperform others in certain contexts. With some preliminary experiments, we demonstrate that our visualization panel has great potential to be used in various active learning experiments and help users evaluate their models appropriately.

【4】 TAGLETS: A System for Automatic Semi-Supervised Learning with Auxiliary Data
Link: https://arxiv.org/abs/2111.04798

Authors: Wasu Piriyakulkij, Cristina Menghini, Ross Briden, Nihal V. Nayak, Jeffrey Zhu, Elaheh Raisi, Stephen H. Bach
Affiliations: Brown University; LinkedIn
Abstract: Machine learning practitioners often have access to a spectrum of data: labeled data for the target task (which is often limited), unlabeled data, and auxiliary data, the many available labeled datasets for other tasks. We describe TAGLETS, a system built to study techniques for automatically exploiting all three types of data and creating high-quality, servable classifiers. The key components of TAGLETS are: (1) auxiliary data organized according to a knowledge graph, (2) modules encapsulating different methods for exploiting auxiliary and unlabeled data, and (3) a distillation stage in which the ensembled modules are combined into a servable model. We compare TAGLETS with state-of-the-art transfer learning and semi-supervised learning methods on four image classification tasks. Our study covers a range of settings, varying the amount of labeled data and the semantic relatedness of the auxiliary data to the target task. We find that the intelligent incorporation of auxiliary and unlabeled data into multiple learning techniques enables TAGLETS to match, and most often significantly surpass, these alternatives. TAGLETS is available as an open-source system at github.com/BatsResearch/taglets.

【5】 Gated Linear Model induced U-net for surrogate modeling and uncertainty quantification
Link: https://arxiv.org/abs/2111.05123

Authors: Sai Krishna Mendu, Souvik Chakraborty
Affiliations: Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, Assam, India; Department of Applied Mechanics, School of Artificial Intelligence, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
Note: 21 pages
Abstract: We propose a novel deep learning based surrogate model for solving high-dimensional uncertainty quantification and uncertainty propagation problems. The proposed deep learning architecture is developed by integrating the well-known U-net architecture with the Gaussian Gated Linear Network (GGLN), and is referred to as the Gated Linear Network induced U-net, or GLU-net. The proposed GLU-net treats the uncertainty propagation problem as an image-to-image regression and hence is extremely data efficient. Additionally, it also provides estimates of the predictive uncertainty. The network architecture of GLU-net is less complex, with 44% fewer parameters than contemporary works. We illustrate the performance of the proposed GLU-net in solving the Darcy flow problem under uncertainty in the sparse-data scenario. We consider the stochastic input dimensionality to be up to 4225. Benchmark results are generated using vanilla Monte Carlo simulation. We observe the proposed GLU-net to be accurate and extremely efficient even when no information about the structure of the inputs is provided to the network. Case studies are performed by varying the training sample size and stochastic input dimensionality to illustrate the robustness of the proposed approach.

【6】 Query-augmented Active Metric Learning
Link: https://arxiv.org/abs/2111.04871

Authors: Yujia Deng, Yubai Yuan, Haoda Fu, Annie Qu
Affiliations: Department of Statistics, University of Illinois Urbana-Champaign; Department of Statistics, University of California, Irvine; Eli Lilly and Company
Abstract: In this paper, we propose an active metric learning method for clustering with pairwise constraints. The proposed method actively queries the label of informative instance pairs, while estimating underlying metrics by incorporating unlabeled instance pairs, which leads to a more accurate and efficient clustering process. In particular, we augment the queried constraints by generating more pairwise labels to provide additional information in learning a metric to enhance clustering performance; a generic sketch of such augmentation follows below. Furthermore, we increase the robustness of metric learning by updating the learned metric sequentially and penalizing the irrelevant features adaptively. In addition, we propose a novel active query strategy that evaluates the information gain of instance pairs more accurately by incorporating the neighborhood structure, which improves clustering efficiency without extra labeling cost. In theory, we provide a tighter error bound of the proposed metric learning method utilizing augmented queries compared with methods using existing constraints only. Furthermore, we also investigate the improvement of using the active query strategy instead of random selection. Numerical studies on simulation settings and real datasets indicate that the proposed method is especially advantageous when the signal-to-noise ratio between significant features and irrelevant features is low.
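One generic way to "generate more pairwise labels" from queried constraints, loosely in the spirit of the augmentation described above (the paper's scheme additionally uses neighborhood structure), is to propagate them by transitivity: must-links are transitive, and a must-link combined with a cannot-link implies further cannot-links. A sketch:

```python
from itertools import combinations

def augment_constraints(must, cannot):
    """Propagate pairwise labels: must-link components imply new must/cannot links."""
    # Union-find over must-link constraints.
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for a, b in must:
        parent[find(a)] = find(b)
    # All pairs inside one must-link component are must-links.
    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    new_must = {tuple(sorted(p)) for g in groups.values() for p in combinations(g, 2)}
    # A cannot-link between components extends to every cross-component pair.
    new_cannot = set()
    for a, b in cannot:
        for x in groups.get(find(a), {a}):
            for y in groups.get(find(b), {b}):
                new_cannot.add(tuple(sorted((x, y))))
    return new_must, new_cannot

print(augment_constraints(must=[(1, 2), (2, 3)], cannot=[(3, 4)]))
# must: {(1,2),(1,3),(2,3)}; cannot: {(1,4),(2,4),(3,4)}
```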

【7】 Unsupervised Approaches for Out-Of-Distribution Dermoscopic Lesion Detection
Link: https://arxiv.org/abs/2111.04807

Authors: Max Torop, Sandesh Ghimire, Wenqian Liu, Dana H. Brooks, Octavia Camps, Milind Rajadhyaksha, Jennifer Dy, Kivanc Kose
Affiliations: Northeastern University; Memorial Sloan Kettering Cancer Center
Note: NeurIPS: Medical Imaging Meets NeurIPS Workshop
Abstract: There are limited works showing the efficacy of unsupervised Out-of-Distribution (OOD) methods on complex medical data. Here, we present preliminary findings of our unsupervised OOD detection algorithm, SimCLR-LOF, as well as a recent state-of-the-art approach (SSD), applied to medical images. SimCLR-LOF learns semantically meaningful features using SimCLR and uses LOF to score whether a test sample is OOD. We evaluated on the multi-source International Skin Imaging Collaboration (ISIC) 2019 dataset, and show results that are competitive with SSD as well as with recent supervised approaches applied to the same data.
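The scoring half of SimCLR-LOF is standard: fit a Local Outlier Factor model on in-distribution features and score test features. A minimal scikit-learn sketch, with random arrays standing in for SimCLR embeddings:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(1000, 128))        # stand-in for SimCLR features of in-distribution data
test_feats = rng.normal(loc=3.0, size=(50, 128))  # shifted features simulating OOD samples

# novelty=True lets LOF score previously unseen samples.
lof = LocalOutlierFactor(n_neighbors=20, novelty=True)
lof.fit(train_feats)
ood_scores = -lof.score_samples(test_feats)       # higher score = more likely OOD
print(ood_scores[:5])
```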

Transfer | Zero/Few/One-Shot | Adaptation (1 paper)

【1】 The Role of Adaptive Optimizers for Honest Private Hyperparameter Selection
Link: https://arxiv.org/abs/2111.04906

Authors: Shubhankar Mohapatra, Sajin Sasy, Xi He, Gautam Kamath, Om Thakkar
Abstract: Hyperparameter optimization is a ubiquitous challenge in machine learning, and the performance of a trained model depends crucially upon their effective selection. While a rich set of tools exists for this purpose, there are currently no practical hyperparameter selection methods under the constraint of differential privacy (DP). We study honest hyperparameter selection for differentially private machine learning, in which the process of hyperparameter tuning is accounted for in the overall privacy budget. To this end, we i) show that standard composition tools outperform more advanced techniques in many settings, ii) empirically and theoretically demonstrate an intrinsic connection between the learning rate and clipping norm hyperparameters, iii) show that adaptive optimizers like DPAdam enjoy a significant advantage in the process of honest hyperparameter tuning, and iv) draw upon novel limiting behaviour of Adam in the DP setting to design a new and more efficient optimizer.
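The learning-rate/clipping-norm connection is visible directly in the DP-SGD update: per-example gradients are clipped to norm C, averaged, and perturbed with noise proportional to C, so the step magnitude effectively enters as lr * C. A hedged NumPy sketch of one such update (illustrative, not the paper's code):

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, lr=0.1, clip=1.0, noise_mult=1.0,
                rng=np.random.default_rng(0)):
    """One DP-SGD update: clip each per-example gradient to norm `clip`, average, add noise."""
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12)) for g in per_example_grads]
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(scale=noise_mult * clip / len(per_example_grads), size=w.shape)
    # Note the coupling: both the clipped gradient and the noise scale with `clip`,
    # so the update magnitude behaves like lr * clip once clipping saturates.
    return w - lr * (mean_grad + noise)

w = np.zeros(5)
grads = [np.ones(5) * 3.0, -np.ones(5)]   # toy per-example gradients
print(dp_sgd_step(w, grads))
```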

Reinforcement Learning (2 papers)

【1】 On Assessing The Safety of Reinforcement Learning algorithms Using Formal Methods
Link: https://arxiv.org/abs/2111.04865

Authors: Paulina Stevia Nouwou Mindom, Amin Nikanjam, Foutse Khomh, John Mullins
Affiliations: SWAT Lab., Polytechnique Montréal, Montréal, Canada
Abstract: The increasing adoption of Reinforcement Learning in safety-critical systems domains such as autonomous vehicles, health, and aviation raises the need for ensuring their safety. Existing safety mechanisms such as adversarial training, adversarial detection, and robust learning are not always adapted to all disturbances in which the agent is deployed. Those disturbances include moving adversaries whose behavior can be unpredictable to the agent and, as a matter of fact, harmful to its learning. Ensuring the safety of critical systems also requires methods that give formal guarantees on the behaviour of the agent evolving in a perturbed environment. It is therefore necessary to propose new solutions adapted to the learning challenges faced by the agent. In this paper, we first generate adversarial agents that exhibit flaws in the agent's policy by presenting moving adversaries. Second, we use reward shaping and a modified Q-learning algorithm as defense mechanisms to improve the agent's policy when facing adversarial perturbations. Finally, probabilistic model checking is employed to evaluate the effectiveness of both mechanisms. We have conducted experiments on a discrete grid world with a single agent facing non-learning and learning adversaries. Our results show a diminution in the number of collisions between the agent and the adversaries. Probabilistic model checking provides lower and upper probabilistic bounds regarding the agent's safety in the adversarial environment.

【2】 Dueling RL: Reinforcement Learning with Trajectory Preferences
Link: https://arxiv.org/abs/2111.04850

Authors: Aldo Pacchiano, Aadirupa Saha, Jonathan Lee
Affiliations: Microsoft Research, New York City; Stanford University
Abstract: We consider the problem of preference-based reinforcement learning (PbRL), where, unlike traditional reinforcement learning, an agent receives feedback only in terms of a 1-bit (0/1) preference over a trajectory pair instead of absolute rewards for them. The success of the traditional RL framework crucially relies on the underlying agent-reward model, which, however, depends on how accurately a system designer can express an appropriate reward function, often a non-trivial task. The main novelty of our framework is the ability to learn from preference-based trajectory feedback that eliminates the need to hand-craft numeric reward models. This paper sets up a formal framework for the PbRL problem with non-Markovian rewards, where the trajectory preferences are encoded by a generalized linear model of dimension $d$. Assuming the transition model is known, we then propose an algorithm with an almost optimal regret guarantee of $\tilde{\mathcal{O}}\left( SH d \log (T/\delta) \sqrt{T} \right)$. We further extend the above algorithm to the case of unknown transition dynamics, and provide an algorithm with a near-optimal regret guarantee of $\widetilde{\mathcal{O}}\left( (\sqrt{d} + H^2 + |\mathcal{S}|)\sqrt{dT} + \sqrt{|\mathcal{S}||\mathcal{A}|TH} \right)$. To the best of our knowledge, our work is one of the first to give tight regret guarantees for preference-based RL problems with trajectory preferences.
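The preference model at the heart of this setup is a generalized linear (logistic) link over trajectory features: the probability that one trajectory is preferred depends on the inner product of an unknown parameter with the difference of trajectory feature maps. A small NumPy sketch of sampling such 1-bit feedback, with a toy feature map:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
w_star = rng.normal(size=d)                 # unknown preference parameter

def phi(trajectory: np.ndarray) -> np.ndarray:
    """Toy trajectory feature map: sum of per-step features."""
    return trajectory.sum(axis=0)

def preference_feedback(traj1, traj2) -> int:
    """1-bit feedback: 1 if traj1 is preferred, drawn from a logistic (GLM) link."""
    p = 1.0 / (1.0 + np.exp(-(w_star @ (phi(traj1) - phi(traj2)))))
    return int(rng.random() < p)

t1, t2 = rng.normal(size=(10, d)), rng.normal(size=(10, d))   # two length-10 trajectories
print(preference_feedback(t1, t2))
```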

Medical (1 paper)

【1】 Deep diffusion-based forecasting of COVID-19 by incorporating network-level mobility information
Link: https://arxiv.org/abs/2111.05199

Authors: Padmaksha Roy, Shailik Sarkar, Subhodip Biswas, Fanglan Chen, Zhiqian Chen, Naren Ramakrishnan, Chang-Tien Lu
Affiliations: Department of Electrical and Computer Engineering, Virginia Tech, Greater Washington DC area, USA; Department of Computer Science, Virginia Tech, Greater Washington DC area, USA
Abstract: Modeling the spatiotemporal nature of the spread of infectious diseases can provide useful intuition in understanding the time-varying aspect of the disease spread and the underlying complex spatial dependency observed in people's mobility patterns. Besides, the county-level multiple related time series information can be leveraged to make a forecast on an individual time series. Adding to this challenge is the fact that real-time data often deviates from the unimodal Gaussian distribution assumption and may show some complex mixed patterns. Motivated by this, we develop a deep learning-based time-series model for probabilistic forecasting called Auto-regressive Mixed Density Dynamic Diffusion Network (ARM3Dnet), which considers both people's mobility and disease spread as a diffusion process on a dynamic directed graph. The Gaussian Mixture Model layer is implemented to account for the multimodal nature of the real-time data while learning from multiple related time series. We show that our model, when trained with the best combination of dynamic covariate features and mixture components, can outperform both traditional statistical and deep learning models in forecasting the number of COVID-19 deaths and cases at the county level in the United States.

Distillation | Knowledge Extraction (1 paper)

【1】 MixACM: Mixup-Based Robustness Transfer via Distillation of Activated Channel Maps
Link: https://arxiv.org/abs/2111.05073

Authors: Muhammad Awais, Fengwei Zhou, Chuanlong Xie, Jiawei Li, Sung-Ho Bae, Zhenguo Li
Affiliations: Huawei Noah's Ark Lab; Department of Computer Science, Kyung Hee University, South Korea
Note: Accepted by NeurIPS 2021
Abstract: Deep neural networks are susceptible to adversarially crafted, small and imperceptible changes in the natural inputs. The most effective defense mechanism against these examples is adversarial training, which constructs adversarial examples during training by iterative maximization of loss. The model is then trained to minimize the loss on these constructed examples. This min-max optimization requires more data, larger capacity models, and additional computing resources. It also degrades the standard generalization performance of a model. Can we achieve robustness more efficiently? In this work, we explore this question from the perspective of knowledge transfer. First, we theoretically show the transferability of robustness from an adversarially trained teacher model to a student model with the help of mixup augmentation. Second, we propose a novel robustness transfer method called Mixup-Based Activated Channel Maps (MixACM) Transfer. MixACM transfers robustness from a robust teacher to a student by matching activated channel maps generated without expensive adversarial perturbations. Finally, extensive experiments on multiple datasets and different learning scenarios show our method can transfer robustness while also improving generalization on natural images.
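The transfer mechanism, mixup-augmented inputs with the student trained to match the teacher's activated channel maps, can be sketched as follows; the layer choice, spatial pooling, and normalization are illustrative assumptions rather than the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

def mixup(x1, x2, alpha=1.0):
    """Standard mixup: convex combination of two input batches."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    return lam * x1 + (1 - lam) * x2

def acm_matching_loss(student_feats, teacher_feats):
    """Match channel-wise activation maps (ReLU-activated, spatially averaged)."""
    s = F.relu(student_feats).mean(dim=(2, 3))   # (B, C) activated channel map summary
    t = F.relu(teacher_feats).mean(dim=(2, 3))
    return F.mse_loss(F.normalize(s, dim=1), F.normalize(t, dim=1))

# In full training, the mixed batch `x` would be fed through both networks
# to produce the intermediate conv activations that the toy tensors stand in for.
x = mixup(torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32))
student_feats = torch.randn(8, 64, 16, 16, requires_grad=True)
teacher_feats = torch.randn(8, 64, 16, 16)       # from a frozen, adversarially trained teacher
loss = acm_matching_loss(student_feats, teacher_feats.detach())
loss.backward()
```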

Clustering (1 paper)

【1】 Approximating Fair Clustering with Cascaded Norm Objectives
Link: https://arxiv.org/abs/2111.04804

Authors: Eden Chlamtáč, Yury Makarychev, Ali Vakilian
Affiliations: Toyota Technological Institute at Chicago (TTIC)
Note: SODA 2022
Abstract: We introduce the $(p,q)$-Fair Clustering problem. In this problem, we are given a set of points $P$ and a collection of different weight functions $W$. We would like to find a clustering which minimizes the $\ell_q$-norm of the vector over $W$ of the $\ell_p$-norms of the weighted distances of points in $P$ from the centers. This generalizes various clustering problems, including Socially Fair $k$-Median and $k$-Means, and is closely connected to other problems such as Densest $k$-Subgraph and Min $k$-Union. We utilize convex programming techniques to approximate the $(p,q)$-Fair Clustering problem for different values of $p$ and $q$. When $p \geq q$, we get an $O(k^{(p-q)/(2pq)})$-approximation, which nearly matches a $k^{\Omega((p-q)/(pq))}$ lower bound based on conjectured hardness of Min $k$-Union and other problems. When $q \geq p$, we get an approximation which is independent of the size of the input for bounded $p,q$, and also matches the recent $O((\log n/(\log\log n))^{1/p})$-approximation for $(p,\infty)$-Fair Clustering by Makarychev and Vakilian (COLT 2021).
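To make the cascaded-norm objective concrete, the sketch below evaluates it directly from the definition: for each weight function take the $\ell_p$-norm of weighted point-to-center distances, then take the $\ell_q$-norm across weight functions (a naive evaluation, not the paper's convex-programming algorithm):

```python
import numpy as np

def pq_fair_objective(points, centers, weights, p=2.0, q=2.0):
    """Cascaded norm: l_q over weight functions of the l_p of weighted distances to nearest center."""
    # Distance of every point to its nearest center.
    dists = np.min(np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2), axis=1)
    inner = [np.linalg.norm(w * dists, ord=p) for w in weights]   # one l_p value per weight function
    return np.linalg.norm(np.array(inner), ord=q)

rng = np.random.default_rng(0)
P = rng.normal(size=(100, 2))
C = rng.normal(size=(3, 2))                   # candidate centers
W = [rng.random(100), rng.random(100)]        # two weight functions (e.g., two demographic groups)
print(pq_fair_objective(P, C, W, p=1.0, q=np.inf))   # socially fair k-median-style objective
```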

Autonomous Driving | Vehicles | Lane Detection, etc. (1 paper)

【1】 Deep Learning Approach for Aggressive Driving Behaviour Detection
Link: https://arxiv.org/abs/2111.04794

Authors: Farid Talebloo, Emad A. Mohammed, Behrouz Far
Affiliations: Department of Electrical and Software Engineering, University of Calgary, University Drive NW, Calgary, Alberta, Canada
Abstract: Driving behaviour is one of the primary causes of road crashes and accidents, and these can be decreased by identifying and minimizing aggressive driving behaviour. This study identifies the timesteps when a driver in different circumstances (rush, mental conflicts, reprisal) begins to drive aggressively. An observer (real or virtual) is needed to examine driving behaviour to discover aggressive driving occasions; we overcome this problem by using a smartphone's GPS sensor to detect locations and classify drivers' driving behaviour every three minutes. To detect time-series patterns in our dataset, we employ RNN (GRU, LSTM) algorithms to identify patterns during the driving course. The algorithm is independent of road, vehicle, position, or driver characteristics. We conclude that three minutes (or more) of driving (120 seconds of GPS data) is sufficient to identify driver behaviour. The results show high accuracy and a high F1 score.

Point Clouds | SLAM | Radar | LiDAR | Depth/RGB-D (1 paper)

【1】 The Internet of Federated Things (IoFT): A Vision for the Future and In-depth Survey of Data-driven Approaches for Federated Learning
Link: https://arxiv.org/abs/2111.05326

Authors: Raed Kontar, Naichen Shi, Xubo Yue, Seokhyun Chung, Eunshin Byon, Mosharaf Chowdhury, Judy Jin, Wissam Kontar, Neda Masoud, Maher Noueihed, Chinedum E. Okwudire, Garvesh Raskutti, Romesh Saigal, Karandeep Singh, Zhisheng Ye
Affiliations: Department of Industrial & Operations Engineering, University of Michigan, Ann Arbor; Electrical Engineering & Computer Science, University of Michigan, Ann Arbor; Civil & Environmental Engineering, University of Wisconsin-Madison, Madison
Note: Accepted at IEEE
Abstract: The Internet of Things (IoT) is on the verge of a major paradigm shift. In the IoT system of the future, IoFT, the cloud will be substituted by the crowd where model training is brought to the edge, allowing IoT devices to collaboratively extract knowledge and build smart analytics/models while keeping their personal data stored locally. This paradigm shift was set into motion by the tremendous increase in computational power on IoT devices and the recent advances in decentralized and privacy-preserving model training, coined as federated learning (FL). This article provides a vision for IoFT and a systematic overview of current efforts towards realizing this vision. Specifically, we first introduce the defining characteristics of IoFT and discuss FL data-driven approaches, opportunities, and challenges that allow decentralized inference within three dimensions: (i) a global model that maximizes utility across all IoT devices, (ii) a personalized model that borrows strengths across all devices yet retains its own model, (iii) a meta-learning model that quickly adapts to new devices or learning tasks. We end by describing the vision and challenges of IoFT in reshaping different industries through the lens of domain experts. Those industries include manufacturing, transportation, energy, healthcare, quality & reliability, business, and computing.

Federated Learning | Privacy | Encryption (2 papers)

【1】 Unified Group Fairness on Federated Learning
Link: https://arxiv.org/abs/2111.04986

Authors: Fengda Zhang, Kun Kuang, Yuxuan Liu, Chao Wu, Fei Wu, Jiaxun Lu, Yunfeng Shao, Jun Xiao
Affiliations: College of Computer Science and Technology, Zhejiang University; School of Public Affairs, Zhejiang University; Huawei Noah's Ark Lab
Note: Under Review
Abstract: Federated learning (FL) has emerged as an important machine learning paradigm where a global model is trained based on the private data from distributed clients. However, most existing FL algorithms cannot guarantee performance fairness towards different clients or different groups of samples because of distribution shift. Recent research focuses on achieving fairness among clients, but ignores the fairness towards different groups formed by sensitive attribute(s) (e.g., gender and/or race), which is important and practical in real applications. To bridge this gap, we formulate the goal of unified group fairness on FL, which is to learn a fair global model with similar performance on different groups. To achieve unified group fairness for arbitrary sensitive attribute(s), we propose a novel FL algorithm, named Group Distributionally Robust Federated Averaging (G-DRFA), which mitigates the distribution shift across groups, with theoretical analysis of the convergence rate. Specifically, we treat the performance of the federated global model at each group as an objective, and employ distributionally robust techniques to maximize the performance of the worst-performing group over an uncertainty set by group reweighting. We validate the advantages of the G-DRFA algorithm with various kinds of distribution shift settings in experiments, and the results show that the G-DRFA algorithm outperforms existing fair federated learning algorithms on unified group fairness.
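The reweighting at the core of such group-DRO objectives follows a familiar pattern: maintain one weight per group and update it multiplicatively toward the worst-performing groups. The NumPy sketch below shows that generic exponentiated-gradient step, not the authors' federated implementation:

```python
import numpy as np

def group_dro_weights(group_losses, weights, step=0.5):
    """Exponentiated-gradient update: shift weight toward the worst-performing groups."""
    w = weights * np.exp(step * group_losses)   # higher loss -> higher weight
    return w / w.sum()

weights = np.ones(3) / 3                              # e.g., three sensitive groups
for group_losses in [np.array([0.2, 0.9, 0.4])] * 5:  # pretend the losses repeat over rounds
    weights = group_dro_weights(group_losses, weights)
    # the reweighted objective each round would be: (weights * group_losses).sum()
print(weights)   # most mass on group 1, the worst-performing group
```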

【2】 Papaya: Practical, Private, and Scalable Federated Learning
Link: https://arxiv.org/abs/2111.04877

Authors: Dzmitry Huba, John Nguyen, Kshitiz Malik, Ruiyu Zhu, Mike Rabbat, Ashkan Yousefpour, Carole-Jean Wu, Hongyuan Zhan, Pavel Ustinov, Harish Srinivas, Kaikai Wang, Anthony Shoumikhin, Jesik Min, Mani Malek
Abstract: Cross-device Federated Learning (FL) is a distributed learning paradigm with several challenges that differentiate it from traditional distributed learning; variability in the system characteristics on each device and millions of clients coordinating with a central server are primary ones. Most FL systems described in the literature are synchronous: they perform a synchronized aggregation of model updates from individual clients. Scaling synchronous FL is challenging, since increasing the number of clients training in parallel leads to diminishing returns in training speed, analogous to large-batch training. Moreover, stragglers hinder synchronous FL training. In this work, we outline a production asynchronous FL system design. Our work tackles the aforementioned issues, sketches some of the system design challenges and their solutions, and touches upon principles that emerged from building a production FL system for millions of clients. Empirically, we demonstrate that asynchronous FL converges faster than synchronous FL when training across nearly one hundred million devices. In particular, in high-concurrency settings, asynchronous FL is 5x faster and has nearly 8x less communication overhead than synchronous FL.

Inference | Analysis | Understanding | Explanation (5 papers)

【1】 A Topological Data Analysis Based Classifier
Link: https://arxiv.org/abs/2111.05214

Authors: Rolando Kindelan, José Frías, Mauricio Cerda, Nancy Hitschfeld
Note: The paper is under consideration at Pattern Recognition Letters. arXiv admin note: text overlap with arXiv:2102.03709
Abstract: Topological Data Analysis (TDA) is an emergent field that aims to discover topological information hidden in a dataset. TDA tools have been commonly used to create filters and topological descriptors to improve Machine Learning (ML) methods. This paper proposes an algorithm that applies TDA directly to multi-class classification problems, without any further ML stage, showing advantages for imbalanced datasets. The proposed algorithm builds a filtered simplicial complex on the dataset. Persistent Homology (PH) is applied to guide the selection of a sub-complex, where unlabeled points obtain the label with the majority of votes from labeled neighboring points. We select 8 datasets with different dimensions, degrees of class overlap, and imbalanced samples per class. On average, the proposed TDABC method was better than KNN and weighted-KNN. It behaves competitively with Local SVM and Random Forest baseline classifiers on balanced datasets, and it outperforms all baseline methods when classifying entangled and minority classes.

【2】 Cross-Lingual Citations in English Papers: A Large-Scale Analysis of Prevalence, Usage, and Impact
Link: https://arxiv.org/abs/2111.05097

Authors: Tarek Saier, Michael Färber, Tornike Tsereteli
Note: to be published in the International Journal on Digital Libraries
Abstract: Citation information in scholarly data is an important source of insight into the reception of publications and the scholarly discourse. Outcomes of citation analyses and the applicability of citation-based machine learning approaches heavily depend on the completeness of such data. One particular shortcoming of scholarly data nowadays is that non-English publications are often not included in data sets, or that language metadata is not available. Because of this, citations between publications of differing languages (cross-lingual citations) have only been studied to a very limited degree. In this paper, we present an analysis of cross-lingual citations based on over one million English papers, spanning three scientific disciplines and a time span of three decades. Our investigation covers differences between cited languages and disciplines, trends over time, and the usage characteristics as well as impact of cross-lingual citations. Among our findings are an increasing rate of citations to publications written in Chinese, citations being primarily to local non-English languages, and consistency in citation intent between cross- and monolingual citations. To facilitate further research, we make our collected data and source code publicly available.

【3】 Analysis of Sectoral Profitability of the Indian Stock Market Using an LSTM Regression Model 标题:基于LSTM回归模型的印度股市行业盈利能力分析 链接:https://arxiv.org/abs/2111.04976

作者:Jaydip Sen,Saikat Mondal,Sidra Mehtab 机构:Department of Data Science, Praxis Business School, Kolkata, INDIA 备注:This was accepted for oral presentation and publication in the proceedings of the Deep Learning Developers' Conference (DLDC'2021) organized online from September 23 - September 24, 2021 by Analytics India Magazine, INDIA. The paper is 8 pages long, and it contains 15 figures and 14 tables 摘要:准确预测未来股票价格的预测模型设计一直被认为是一个有趣且具有挑战性的研究问题。由于现实世界中股票价格的波动性和随机性,受许多可控和不可控变量的影响,这项任务变得复杂。本文提出了一种基于长短期记忆(LSTM)结构的优化预测模型,用于在指定的时间间隔内从网络上自动提取过去的股票价格,并在指定的预测期内预测其未来价格。该模型用于根据其对印度国家证券交易所(NSE)七个不同部门70只重要股票的预测结果进行买卖交易。每个部门的盈利能力基于该部门股票在2010年1月1日至2021年8月26日期间产生的总利润得出。根据盈利能力价值对这些部门进行比较。此外,还对每个部门的模型预测精度进行了评估。结果表明,该模型对未来股票价格具有较高的预测精度。 摘要:Predictive model design for accurately predicting future stock prices has always been considered an interesting and challenging research problem. The task becomes complex due to the volatile and stochastic nature of the stock prices in the real world which is affected by numerous controllable and uncontrollable variables. This paper presents an optimized predictive model built on long-and-short-term memory (LSTM) architecture for automatically extracting past stock prices from the web over a specified time interval and predicting their future prices for a specified forecast horizon. The model is deployed for making buy and sell transactions based on its predicted results for 70 important stocks from seven different sectors listed in the National Stock Exchange (NSE) of India. The profitability of each sector is derived based on the total profit yielded by the stocks in that sector over a period from Jan 1, 2010 to Aug 26, 2021. The sectors are compared based on their profitability values. The prediction accuracy of the model is also evaluated for each sector. The results indicate that the model is highly accurate in predicting future stock prices.
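作为补充,下面是与摘要所述"基于LSTM结构的回归预测模型"同类的极简PyTorch草图;窗口长度、隐藏维度等超参数均为示意性假设,并非论文配置。

```python
import torch
import torch.nn as nn

class LSTMRegressor(nn.Module):
    """极简 LSTM 回归器:输入过去 window 天的价格序列,输出下一步预测。"""
    def __init__(self, n_features=1, hidden=64, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):              # x: (batch, window, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])   # 取最后一个时间步做一步预测

model = LSTMRegressor()
x = torch.randn(32, 30, 1)            # 32 个样本,各含 30 天历史价格
print(model(x).shape)                 # torch.Size([32, 1])
```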

【4】 An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit 标题:协作式多玩家多臂赌博机的实例相关分析 链接:https://arxiv.org/abs/2111.04873

作者:Aldo Pacchiano,Peter Bartlett,Michael I. Jordan 机构:Microsoft Research, NYC, University of California, Berkeley 备注:44 pages 摘要:我们研究了多玩家多臂赌博机中的信息共享与合作问题,并提出了第一个在该问题上实现对数遗憾的算法。我们的结果基于两个创新。首先,我们证明了对逐次淘汰(successive elimination)策略的一个简单修改,可以让玩家在没有碰撞的情况下将各自的次优差距估计到常数因子以内。其次,我们利用第一个结果设计了一个通信协议,该协议成功地利用碰撞的小回报在玩家之间进行协调,同时保留有意义的依赖于实例的对数遗憾保证。 摘要:We study the problem of information sharing and cooperation in Multi-Player Multi-Armed bandits. We propose the first algorithm that achieves logarithmic regret for this problem. Our results are based on two innovations. First, we show that a simple modification to a successive elimination strategy can be used to allow the players to estimate their suboptimality gaps, up to constant factors, in the absence of collisions. Second, we leverage the first result to design a communication protocol that successfully uses the small reward of collisions to coordinate among players, while preserving meaningful instance-dependent logarithmic regret guarantees.
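下面是单玩家版"逐次淘汰"的示意实现。论文的第一个创新正是对该策略的修改,使玩家能在无碰撞时估计次优差距;此处不含多玩家碰撞通信协议,置信半径的具体形式也只是常见教科书写法而非论文原式。

```python
import numpy as np

def successive_elimination(means, horizon, rng=np.random.default_rng(0)):
    """单玩家逐次淘汰示意:轮流拉动存活臂,按置信区间淘汰次优臂。"""
    K = len(means); active = list(range(K))
    pulls = np.zeros(K); rew = np.zeros(K); t = 0
    while t < horizon and len(active) > 1:
        for a in list(active):                     # 每轮拉动所有存活臂一次
            rew[a] += rng.normal(means[a], 1.0); pulls[a] += 1; t += 1
        mu = rew[active] / pulls[active]
        rad = np.sqrt(2 * np.log(max(t, 2)) / pulls[active])
        keep = mu + rad >= (mu - rad).max()        # 置信上界不低于最优下界则保留
        active = [a for a, k in zip(active, keep) if k]
    return active                                  # 存活臂集合

print(successive_elimination([0.1, 0.5, 0.9], horizon=3000))
```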

【5】 Generalized Kernel Ridge Regression for Causal Inference with Missing-at-Random Sample Selection 标题:随机样本缺失因果推断的广义核岭回归 链接:https://arxiv.org/abs/2111.05277

作者:Rahul Singh 机构: MIT Department of Economics 备注:75 pages 摘要:我提出了用于非参数剂量反应曲线和半参数治疗效应的核岭回归估计量,适用于分析者只能获得选择样本而非随机样本的情形:只有被选中的观测才能观察到结果。我假设在给定治疗和一组足够丰富的观测协变量的条件下,选择机制与随机选择一样好,其中协变量既可以导致治疗,也可以由治疗引起——这是随机缺失(MAR)假设的一个推广。我提出了反事实结果的均值、增量和分布的估计量,其闭式解可用核矩阵运算表示,允许治疗和协变量是离散的或连续的,且可以是低维、高维或无限维。对于连续治疗的情况,我证明了一致相合性(uniform consistency)并给出有限样本收敛速率。对于离散治疗的情况,我证明了根号n(root-n)一致性、高斯近似和半参数效率。 摘要:I propose kernel ridge regression estimators for nonparametric dose response curves and semiparametric treatment effects in the setting where an analyst has access to a selected sample rather than a random sample; only for select observations, the outcome is observed. I assume selection is as good as random conditional on treatment and a sufficiently rich set of observed covariates, where the covariates are allowed to cause treatment or be caused by treatment -- an extension of missingness-at-random (MAR). I propose estimators of means, increments, and distributions of counterfactual outcomes with closed form solutions in terms of kernel matrix operations, allowing treatment and covariates to be discrete or continuous, and low, high, or infinite dimensional. For the continuous treatment case, I prove uniform consistency with finite sample rates. For the discrete treatment case, I prove root-n consistency, Gaussian approximation, and semiparametric efficiency.
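摘要提到估计量的闭式解可用核矩阵运算表示。下面给出在被选中(结果可观测)样本上做核岭回归闭式解的极简草图,核与正则化参数为假设;未包含论文中针对MAR选择的半参数修正。

```python
import numpy as np

def kernel_ridge_fit(X, y, lam=1e-2, gamma=1.0):
    """核岭回归闭式解:alpha = (K + n*lam*I)^{-1} y,K 为 RBF 核矩阵。"""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)
    alpha = np.linalg.solve(K + len(X) * lam * np.eye(len(X)), y)
    def predict(Xq):
        sq_q = ((Xq[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq_q) @ alpha
    return predict

rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 2)); y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=100)
f = kernel_ridge_fit(X, y)
print(f(X[:5]))
```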

检测相关(6篇)

【1】 Community detection using low-dimensional network embedding algorithms 标题:基于低维网络嵌入算法的社区发现 链接:https://arxiv.org/abs/2111.05267

作者:Aman Barot,Shankar Bhamidi,Souvik Dhara 摘要:随着大型网络在重要领域的相关性日益增强,如研究接触网络对疾病传播的影响,或研究社会网络对地缘政治的影响,有必要研究可扩展到通常包含数百万节点的超大网络的机器学习工具。这种可伸缩算法的一大类称为网络表示学习或网络嵌入。这些算法试图通过首先运行多个随机游动,然后使用观察到的随机游动段中每对节点的共现次数来学习网络泛函(例如节点)的表示,以获得某些欧几里德空间上节点的低维表示。本文的目的是严格理解两种主要算法DeepWalk和node2vec在恢复具有真实(ground-truth)社区的规范网络模型社区方面的性能。根据图的稀疏性,我们找到了所需的随机游走段的长度,以便相应的观察到的共现窗口能够执行基本社区分配的几乎精确恢复。我们证明,与使用简单随机游动的DeepWalk相比,在给定固定共现窗口的情况下,使用低非回溯概率的随机游动的node2vec可以在更稀疏的网络中成功。此外,如果稀疏性参数较低,我们提供的证据表明,这些算法可能无法实现几乎精确的恢复。该分析需要开发针对具有底层低秩结构的随机网络的路径计数通用工具,这些工具本身也具有独立的价值。 摘要:With the increasing relevance of large networks in important areas such as the study of contact networks for spread of disease, or social networks for their impact on geopolitics, it has become necessary to study machine learning tools that are scalable to very large networks, often containing millions of nodes. One major class of such scalable algorithms is known as network representation learning or network embedding. These algorithms try to learn representations of network functionals (e.g. nodes) by first running multiple random walks and then using the number of co-occurrences of each pair of nodes in observed random walk segments to obtain a low-dimensional representation of nodes on some Euclidean space. The aim of this paper is to rigorously understand the performance of two major algorithms, DeepWalk and node2vec, in recovering communities for canonical network models with ground truth communities. Depending on the sparsity of the graph, we find the length of the random walk segments required such that the corresponding observed co-occurrence window is able to perform almost exact recovery of the underlying community assignments. We prove that, given some fixed co-occurrence window, node2vec using random walks with a low non-backtracking probability can succeed for much sparser networks compared to DeepWalk using simple random walks. Moreover, if the sparsity parameter is low, we provide evidence that these algorithms might not succeed in almost exact recovery. The analysis requires developing general tools for path counting on random networks having an underlying low-rank structure, which are of independent interest.
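下面是对文中分析对象——"随机游走段内节点对共现计数"——的示意实现,带一个可调的回溯概率参数来近似node2vec式采样;邻接表、节点编号从0开始等均为假设,并非论文实验代码。

```python
import numpy as np

def walk_cooccurrence(adj, length, window, backtrack_p=0.1,
                      rng=np.random.default_rng(0)):
    """从每个节点出发做一次带可调回溯概率的随机游走,
    并统计窗口内节点对的共现次数(adj 为 0..n-1 编号的邻接表 dict)。"""
    n = len(adj); C = np.zeros((n, n))
    for start in adj:
        walk, prev = [start], None
        for _ in range(length - 1):
            nbrs = adj[walk[-1]]
            cand = [v for v in nbrs if v != prev] or nbrs
            # 以 backtrack_p 概率回到上一节点,否则从其余邻居中均匀采样
            nxt = prev if (prev in nbrs and rng.random() < backtrack_p) \
                  else cand[rng.integers(len(cand))]
            prev, walk = walk[-1], walk + [nxt]
        for i, u in enumerate(walk):          # 窗口内共现计数
            for v in walk[i + 1:i + 1 + window]:
                C[u, v] += 1; C[v, u] += 1
    return C

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(walk_cooccurrence(adj, length=10, window=3))
```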

【2】 Does Thermal data make the detection systems more reliable? 标题:热数据是否使检测系统更加可靠? 链接:https://arxiv.org/abs/2111.05191

作者:Shruthi Gowda,Bahram Zonooz,Elahe Arani 机构:Advanced Research Lab, NavInfo Europe, The Netherlands 备注:Accepted at NeurIPS 2021 - ML4AD workshop (The code for this research is available at: this https URL) 摘要:基于深度学习的检测网络在自动驾驶系统(ADS)方面取得了显著的进展。ADS应在各种环境照明和恶劣天气条件下具有可靠的性能。然而,亮度下降和视觉障碍(如眩光、雾)会导致视觉摄像机的图像质量差,从而导致性能下降。为了克服这些挑战,我们探讨了利用不同的数据模式的想法,该模式与可视数据完全不同,但又是互补的。我们提出了一个基于多模态协作框架的综合检测系统,该框架同时学习RGB(来自视觉摄像机)和热(来自红外摄像机)数据。该框架协同训练两个网络,并在学习其自身模式的最佳特征时提供灵活性,同时还结合了其他模式的补充知识。我们大量的实证结果表明,虽然精度的提升幅度不大,但其价值体现在具有挑战性和极端困难的边缘情况中,这在AD等安全关键应用中至关重要。我们提供了在检测中使用热成像系统的优缺点的整体视图。 摘要:Deep learning-based detection networks have made remarkable progress in autonomous driving systems (ADS). ADS should have reliable performance across a variety of ambient lighting and adverse weather conditions. However, luminance degradation and visual obstructions (such as glare, fog) result in poor quality images by the visual camera which leads to performance decline. To overcome these challenges, we explore the idea of leveraging a different data modality that is disparate yet complementary to the visual data. We propose a comprehensive detection system based on a multimodal-collaborative framework that learns from both RGB (from visual cameras) and thermal (from Infrared cameras) data. This framework trains two networks collaboratively and provides flexibility in learning optimal features of its own modality while also incorporating the complementary knowledge of the other. Our extensive empirical results show that while the improvement in accuracy is nominal, the value lies in challenging and extremely difficult edge cases which is crucial in safety-critical applications such as AD. We provide a holistic view of both merits and limitations of using a thermal imaging system in detection.

【3】 Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or Something Else? 标题:人在环路中的虚假信息检测:立场、情绪还是其他? 链接:https://arxiv.org/abs/2111.05139

作者:Alexander Michael Daniel 机构:Center for Operations Research and Analysis, Defence Research and Development Canada, Department of National Defence, Room , Building T, Carling Avenue, Ottawa, Ontario, Canada, K,K ,Y, Submitted to the ,th International Command and Control Research and 备注:15 pages references. Presented at the 26th International Command and Control Research and Technology Symposium, 18 October 2021 摘要:最近,政治和流行病为机器学习的虚假信息(又称假新闻)检测算法的发展提供了充足的动力。现有文献主要集中于全自动案例,但由此产生的技术无法可靠地检测军事应用所需的各种主题、来源和时间尺度上的虚假信息。然而,通过利用现有的分析师作为人在回路,情绪分析、基于方面的情绪分析和姿态检测等典型的机器学习技术成为用于部分自动虚假信息检测系统的可行方法。本文旨在确定这些技术中哪种最适合此目的,以及每种技术如何最好地用于此目的。每种方法都使用相同大小和几乎相同的神经结构的训练数据集(一个作为单词嵌入器的BERT变换器,其后有一个前馈层),然后在情绪和立场特定的数据集上进行测试,以确定每种方法在完成其他任务时的性能基线。四个与新冠病毒-19假信息相关的不同数据集用于测试每种技术检测训练数据集中未出现的主题假信息的能力。然后使用这些测试的定量和定性结果来深入了解如何在实践中最好地使用这些技术。 摘要:Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms. Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications. By leveraging an already-available analyst as a human-in-the-loop, however, the canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system. This paper aims to determine which of these techniques is best suited for this purpose and how each technique might best be used towards this end. Training datasets of the same size and nearly identical neural architectures (a BERT transformer as a word embedder with a single feed-forward layer thereafter) are used for each approach, which are then tested on sentiment- and stance-specific datasets to establish a baseline of how well each method can be used to do the other tasks. Four different datasets relating to COVID-19 disinformation are used to test the ability of each technique to detect disinformation on a topic that did not appear in the training data set. Quantitative and qualitative results from these tests are then used to provide insight into how best to employ these techniques in practice.
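摘要明确给出了所用架构:BERT作为词嵌入器,其后仅接一个前馈层。下面是该结构的示意实现;预训练模型名"bert-base-uncased"与类别数num_labels为假设值,训练循环从略。

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BertWithHead(nn.Module):
    """BERT 词嵌入器 + 单个前馈分类层(摘要所述结构的示意)。"""
    def __init__(self, name="bert-base-uncased", num_labels=3):
        super().__init__()
        self.bert = AutoModel.from_pretrained(name)
        self.head = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, **enc):
        h = self.bert(**enc).last_hidden_state[:, 0]   # 取 [CLS] 表示
        return self.head(h)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = tok(["an example claim to classify"], return_tensors="pt")
logits = BertWithHead()(**enc)                          # (1, num_labels)
```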

【4】 "How Does It Detect A Malicious App?" Explaining the Predictions of AI-based Android Malware Detector 链接:https://arxiv.org/abs/2111.05108

作者:Zhi Lu,Vrizlynn L. L. Thing 机构:Cyber Security Strategic Technology Centre, ST Engineering, Singapore 摘要:人工智能方法已被证明在Android恶意软件检测方面具有令人印象深刻的性能。然而,大多数基于人工智能的方法都是以黑盒的方式对可疑样本进行预测,对模型的推断没有透明度。网络安全和人工智能从业者对模型的可解释性和透明度的期望增加,以确保可信度。在这篇文章中,我们提出了一种新的模型不可知解释方法,用于Android恶意软件检测的AI模型。我们提出的方法通过两个步骤识别和量化与预测相关的数据特征:i)通过操纵特征值生成合成数据的数据扰动;以及ii)优化特征属性值,以寻求在特征值变化最小的扰动数据上预测分数的显著变化。通过三个实验验证了该方法的有效性。我们首先证明了我们提出的模型解释方法可以帮助发现人工智能模型是如何被对抗性样本定量回避的。在接下来的实验中,我们将我们提出的方法的可解释性和保真度分别与现有技术进行比较。 摘要:AI methods have been proven to yield impressive performance on Android malware detection. However, most AI-based methods make predictions of suspicious samples in a black-box manner without transparency on models' inference. The expectation on models' explainability and transparency by cyber security and AI practitioners to assure the trustworthiness increases. In this article, we present a novel model-agnostic explanation method for AI models applied for Android malware detection. Our proposed method identifies and quantifies the data features relevance to the predictions by two steps: i) data perturbation that generates the synthetic data by manipulating features' values; and ii) optimization of features attribution values to seek significant changes of prediction scores on the perturbed data with minimal feature values changes. The proposed method is validated by three experiments. We firstly demonstrate that our proposed model explanation method can aid in discovering how AI models are evaded by adversarial samples quantitatively. In the following experiments, we compare the explainability and fidelity of our proposed method with state-of-the-arts, respectively.
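下面是对论文两步法中第一步(数据扰动)的极简近似:逐特征替换为基线值,以预测分数的变化量作为该特征的相关性。这是遮挡式(occlusion)重要性的简化写法,并非论文的归因值优化过程;函数签名为假设。

```python
import numpy as np

def perturbation_attribution(predict, x, baseline=0.0):
    """逐特征扰动归因示意:predict 接收 (1, d) 数组并返回分数数组。"""
    base_score = predict(x[None, :])[0]
    attr = np.zeros_like(x, dtype=float)
    for j in range(len(x)):
        x_pert = x.copy()
        x_pert[j] = baseline                 # 将第 j 个特征替换为基线值
        attr[j] = base_score - predict(x_pert[None, :])[0]
    return attr                              # 值越大,该特征对预测越重要
```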

【5】 A Statistical Difference Reduction Method for Escaping Backdoor Detection 标题:一种逃避后门检测的统计差异缩减方法 链接:https://arxiv.org/abs/2111.05077

作者:Pengfei Xia,Hongjing Niu,Ziqiang Li,Bin Li 摘要:最近的研究表明,深度神经网络(DNN)容易受到后门攻击。受感染的模型在良性输入上表现正常,而在对抗性数据上,其预测将被强制指向攻击者指定的目标。已经开发了几种检测方法来区分输入以抵御此类攻击。这些防御所依赖的共同假设是,感染模型提取的干净输入和敌对输入的潜在表示之间存在巨大的统计差异。然而,尽管这一假设很重要,但缺乏关于该假设是否必须真实的全面研究。在本文中,我们重点研究了以下相关问题:1)统计差异的性质是什么?2)如何在不影响攻击强度的情况下有效降低它们?3)这种减少对基于差异的防御有什么影响?我们的工作是围绕这三个问题展开的。首先,通过引入最大均值差异(MMD)作为度量,我们发现多层次表示的统计差异都很大,而不仅仅是最高层次。然后,我们提出了一种统计差异缩减方法(SDRM),在训练后门模型的过程中,通过在损失函数中添加多级MMD约束来有效地减少差异。最后,研究了三种典型的基于差异的检测方法。在全部两个数据集、四种模型体系结构和四种攻击方法上,这些防御的F1分数从常规训练的后门模型的90%-100%下降到使用SDRM训练的模型的60%-70%。结果表明,该方法可用于增强现有的攻击,以逃避后门检测算法。 摘要:Recent studies show that Deep Neural Networks (DNNs) are vulnerable to backdoor attacks. An infected model behaves normally on benign inputs, whereas its prediction will be forced to an attack-specific target on adversarial data. Several detection methods have been developed to distinguish inputs to defend against such attacks. The common hypothesis that these defenses rely on is that there are large statistical differences between the latent representations of clean and adversarial inputs extracted by the infected model. However, although it is important, comprehensive research on whether the hypothesis must be true is lacking. In this paper, we focus on it and study the following relevant questions: 1) What are the properties of the statistical differences? 2) How to effectively reduce them without harming the attack intensity? 3) What impact does this reduction have on difference-based defenses? Our work is carried out on the three questions. First, by introducing the Maximum Mean Discrepancy (MMD) as the metric, we identify that the statistical differences of multi-level representations are all large, not just the highest level. Then, we propose a Statistical Difference Reduction Method (SDRM) by adding a multi-level MMD constraint to the loss function during training a backdoor model to effectively reduce the differences. Last, three typical difference-based detection methods are examined. The F1 scores of these defenses drop from 90%-100% on the regularly trained backdoor models to 60%-70% on the models trained with SDRM on all two datasets, four model architectures, and four attack methods. The results indicate that the proposed method can be used to enhance existing attacks to escape backdoor detection algorithms.
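论文的核心约束项是多层级的MMD。下面给出RBF核下经验MMD^2有偏估计的示意实现,以及把它附加到损失函数的用法;核带宽与权重系数为假设值。

```python
import torch

def rbf_mmd2(x, y, sigma=1.0):
    """RBF 核下经验 MMD^2 的有偏估计,可作为训练时的附加约束项。"""
    def k(a, b):
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# 用法示意:clean_feats / trigger_feats 为各层对干净/带触发器输入的表示列表
# loss = task_loss + lam * sum(rbf_mmd2(c, t)
#                              for c, t in zip(clean_feats, trigger_feats))
```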

【6】 Using sequential drift detection to test the API economy 标题:使用序贯漂移检测来测试API经济 链接:https://arxiv.org/abs/2111.05136

作者:Samuel Ackerman,Parijat Dube,Eitan Farchi 摘要:API经济指的是API(应用程序编程接口)微服务的广泛集成,其中软件应用程序可以相互通信,作为业务模型和功能中的关键元素。这样一个系统可以被使用的可能方式的数量是巨大的。因此,需要监控使用模式并确定系统何时以以前从未使用过的方式使用。这为系统分析员提供了警告,他们可以确保系统的不间断运行。在这项工作中,我们分析了API使用的直方图和调用图,以确定系统的使用模式是否发生了变化。我们比较了非参数统计和贝叶斯序贯分析在该问题中的应用。这样做可以克服重复统计测试的问题,并确保警报的统计显著性。对该技术进行了模拟和测试,证明该技术在各种情况下都能有效地检测到漂移。我们还提到了对该技术的修改,以缩短其记忆,从而在分布漂移相对于监测开始时间有所延迟时能够更快地响应。 摘要:The API economy refers to the widespread integration of API (application programming interface) microservices, where software applications can communicate with each other, as a crucial element in business models and functions. The number of possible ways in which such a system could be used is huge. It is thus desirable to monitor the usage patterns and identify when the system is used in a way that was never used before. This provides a warning to the system analysts and they can ensure uninterrupted operation of the system. In this work we analyze both histograms and call graph of API usage to determine if the usage patterns of the system has shifted. We compare the application of nonparametric statistical and Bayesian sequential analysis to the problem. This is done in a way that overcomes the issue of repeated statistical tests and ensures statistical significance of the alerts. The technique was simulated and tested and proven effective in detecting the drift in various scenarios. We also mention modifications to the technique to decrease its memory so that it can respond more quickly when the distribution drift occurs at a delay from when monitoring begins.
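下面是对"比较两个API调用直方图是否漂移"的单次非参数检验示意(卡方列联表检验)。论文采用的是序贯方案并校正重复检验问题,此处仅演示一次比较;平滑项与显著性水平为假设。

```python
import numpy as np
from scipy.stats import chi2_contingency

def histogram_drift(ref_counts, new_counts, alpha=0.01):
    """对参考期与当前期的 API 调用计数直方图做卡方两样本检验;
    拒绝原假设即认为使用模式发生漂移(单次检验示意)。"""
    table = np.vstack([ref_counts, new_counts]) + 1   # +1 平滑,避免零格
    _, p, _, _ = chi2_contingency(table)
    return p < alpha, p

drift, p = histogram_drift([120, 40, 8, 2], [60, 35, 40, 15])
print(drift, round(p, 4))
```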

分类|识别(2篇)

【1】 Label-Aware Distribution Calibration for Long-tailed Classification 标题:长尾分类的标签感知分布校正 链接:https://arxiv.org/abs/2111.04901

作者:Chaozheng Wang,Shuzheng Gao,Cuiyun Gao,Pengyun Wang,Wenjie Pei,Lujia Pan,Zenglin Xu 机构:Harbin Institute of Technology, Shenzhen, Noah's Ark Lab, Shenzhen, Huawei Technologies Co. Ltd 备注:9 pages 摘要:现实世界的数据通常呈现长尾分布。对不平衡数据的训练往往会使神经网络在头类上表现良好,而在尾类上表现更差。尾类训练实例的严重稀疏性是主要的挑战,这导致训练过程中的分布估计存在偏差。已经投入了大量的努力来改善这一挑战,包括数据重新采样和为尾部课程合成新的训练实例。然而,以前的研究还没有利用头类到尾类的可转移知识来校准尾类的分布。在本文中,我们假设相似的头类可以丰富尾类,并提出了一种新的分布校准方法,称为标签感知分布校准(LADC)。LADC从相关的头类转移统计数据,以推断尾类的分布。来自校准分布的采样进一步有助于重新平衡分类器。在图像和文本长尾数据集上的实验表明,LADC显著优于现有的方法。可视化还表明,LADC提供了更准确的分布估计。 摘要:Real-world data usually present long-tailed distributions. Training on imbalanced data tends to render neural networks perform well on head classes while much worse on tail classes. The severe sparseness of training instances for the tail classes is the main challenge, which results in biased distribution estimation during training. Plenty of efforts have been devoted to ameliorating the challenge, including data re-sampling and synthesizing new training instances for tail classes. However, no prior research has exploited the transferable knowledge from head classes to tail classes for calibrating the distribution of tail classes. In this paper, we suppose that tail classes can be enriched by similar head classes and propose a novel distribution calibration approach named as label-Aware Distribution Calibration (LADC). LADC transfers the statistics from relevant head classes to infer the distribution of tail classes. Sampling from calibrated distribution further facilitates re-balancing the classifier. Experiments on both image and text long-tailed datasets demonstrate that LADC significantly outperforms existing methods. The visualization also shows that LADC provides a more accurate distribution estimation.
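下面是"从相似头类迁移统计量来校准尾类分布并采样"这一思路的极简草图;相似度度量、融合权重与采样规模均为假设,并非LADC的原始公式。

```python
import numpy as np

def calibrate_tail(tail_feats, head_means, head_covs, topk=2):
    """用与尾类原型最相近的 topk 个头类的均值/协方差校准尾类分布,
    并从校准后的高斯分布采样合成特征(简化示意)。"""
    proto = tail_feats.mean(0)
    sims = np.array([-np.linalg.norm(proto - m) for m in head_means])
    idx = sims.argsort()[-topk:]                       # 最相似的 topk 个头类
    w = np.exp(sims[idx]); w /= w.sum()
    mean = 0.5 * (w @ np.stack([head_means[i] for i in idx])) + 0.5 * proto
    cov = sum(wi * head_covs[i] for wi, i in zip(w, idx))
    return np.random.default_rng(0).multivariate_normal(mean, cov, size=64)
```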

【2】 Harmless interpolation in regression and classification with structured features 标题:结构化特征回归分类中的无害化插值 链接:https://arxiv.org/abs/2111.05198

作者:Andrew D. McRae,Santhosh Karnik,Mark A. Davenport,Vidya Muthukumar 机构:School of Electrical and Computer Engineering, Georgia Tech, Department of Computational Mathematics, Science, & Engineering, School of Industrial and Systems Engineering, Georgia Tech 摘要:过参数化神经网络倾向于完美地拟合有噪声的训练数据,但对测试数据具有良好的泛化能力。受这一经验观察的启发,最近的工作试图在更简单的线性模型中理解良性过度拟合或无害插值的现象。此前的理论工作关键性地假设数据特征在统计上是独立的,或者输入数据是高维的;这排除了结构化特征映射的一般非参数设置。本文提出了一个在再生核Hilbert空间中对回归和分类风险进行上界估计的通用且灵活的框架。一个关键贡献是,我们的框架描述了数据Gram矩阵上发生无害插值的精确充分条件。我们的结果恢复了先前的独立特征结果(通过更简单的分析),但它们进一步表明,无害插值可以发生在更一般的设置中,例如构成有界正交系统的特征。此外,我们的结果显示了分类和回归性能之间的渐近分离,而此前只有在高斯特征下才证明过这种分离。 摘要:Overparametrized neural networks tend to perfectly fit noisy training data yet generalize well on test data. Inspired by this empirical observation, recent work has sought to understand this phenomenon of benign overfitting or harmless interpolation in the much simpler linear model. Previous theoretical work critically assumes that either the data features are statistically independent or the input data is high-dimensional; this precludes general nonparametric settings with structured feature maps. In this paper, we present a general and flexible framework for upper bounding regression and classification risk in a reproducing kernel Hilbert space. A key contribution is that our framework describes precise sufficient conditions on the data Gram matrix under which harmless interpolation occurs. Our results recover prior independent-features results (with a much simpler analysis), but they furthermore show that harmless interpolation can occur in more general settings such as features that are a bounded orthonormal system. Furthermore, our results show an asymptotic separation between classification and regression performance in a manner that was previously only shown for Gaussian features.
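作为背景补充,文中讨论的RKHS插值器通常指如下标准的最小范数插值形式(通用记号,非论文原文公式):

```latex
% 最小范数插值(核插值)的标准形式,K 为数据 Gram 矩阵:
\hat f \;=\; \operatorname*{arg\,min}_{f \in \mathcal{H}} \|f\|_{\mathcal{H}}
\quad \text{s.t.}\quad f(x_i) = y_i,\; i = 1,\dots,n,
\qquad
\hat f(x) \;=\; k(x, X)\, K^{-1} y,
\quad K_{ij} = k(x_i, x_j).
```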

编码器(1篇)

【1】 RAVE: A variational autoencoder for fast and high-quality neural audio synthesis 标题:RAVE:一种用于快速高质量神经音频合成的变分自动编码器 链接:https://arxiv.org/abs/2111.05011

作者:Antoine Caillon,Philippe Esling 机构:IRCAM - Sorbonne Université, CNRS UMR , place Igor Stravinsky, Paris, France 摘要:应用于音频的深层生成模型极大地提高了许多语音和音乐相关任务的最新水平。然而,由于原始波形建模仍然是一项固有的困难任务,音频生成模型要么计算密集,依赖于低采样率,要么控制或限制可能信号的性质比较复杂。在这些模型中,变分自动编码器(VAE)通过暴露潜在变量来控制生成,尽管它们通常存在合成质量较低的问题。在本文中,我们介绍了一种实时音频变分自动编码器(RAVE),它可以实现快速和高质量的音频波形合成。我们介绍了一种新的两阶段训练过程,即表征学习和对抗性微调。我们表明,使用潜在空间的训练后分析可以直接控制重建保真度和表示紧凑性。通过利用原始波形的多波段分解,我们证明了我们的模型是第一个能够生成48kHz音频信号的模型,同时在标准笔记本电脑CPU上运行速度比实时速度快20倍。我们使用定量和定性的主观实验来评估合成质量,并展示了我们的方法与现有模型相比的优越性。最后,我们介绍了我们的模型在音色传输和信号压缩方面的应用。我们所有的源代码和音频示例都是公开的。 摘要:Deep generative models applied to audio have improved by a large margin the state-of-the-art in many speech and music related tasks. However, as raw waveform modelling remains an inherently difficult task, audio generative models are either computationally intensive, rely on low sampling rates, are complicated to control or restrict the nature of possible signals. Among those models, Variational AutoEncoders (VAE) give control over the generation by exposing latent variables, although they usually suffer from low synthesis quality. In this paper, we introduce a Realtime Audio Variational autoEncoder (RAVE) allowing both fast and high-quality audio waveform synthesis. We introduce a novel two-stage training procedure, namely representation learning and adversarial fine-tuning. We show that using a post-training analysis of the latent space allows a direct control between the reconstruction fidelity and the representation compactness. By leveraging a multi-band decomposition of the raw waveform, we show that our model is the first able to generate 48kHz audio signals, while simultaneously running 20 times faster than real-time on a standard laptop CPU. We evaluate synthesis quality using both quantitative and qualitative subjective experiments and show the superiority of our approach compared to existing models. Finally, we present applications of our model for timbre transfer and signal compression. All of our source code and audio examples are publicly available.

优化|敛散性(7篇)

【1】 Turing-Universal Learners with Optimal Scaling Laws 标题:具有最优缩放律的图灵通用学习器 链接:https://arxiv.org/abs/2111.05321

作者:Preetum Nakkiran 机构:Halıcıoğlu Data Science Institute, University of California San Diego 摘要:对于给定的分布、学习算法和性能度量,收敛速度(或数据缩放律)是算法测试性能作为训练样本数的函数的渐近行为。理论和实践中的许多学习方法都具有幂律速率,即对于某个 $\alpha>0$,性能按 $n^{-\alpha}$ 缩放。此外,理论家和实践者都关心在感兴趣的设定下提高其学习算法的速率。我们观察到"通用学习器"的存在,它在指定的运行时间内(例如$O(n^2)$)在所有学习算法中实现了最佳的可能分布相关的渐近速率,而相对该运行时间只产生多对数(polylogarithmic)级的减速。该算法是一致的,不依赖于分布,却对所有分布都能获得最佳的可能速率。该构造本身是Levin通用搜索(Levin,1973)的简单扩展。就像通用搜索一样,通用学习器完全不实用,主要具有理论与哲学上的意义。 摘要:For a given distribution, learning algorithm, and performance metric, the rate of convergence (or data-scaling law) is the asymptotic behavior of the algorithm's test performance as a function of number of train samples. Many learning methods in both theory and practice have power-law rates, i.e. performance scales as $n^{-\alpha}$ for some $\alpha > 0$. Moreover, both theoreticians and practitioners are concerned with improving the rates of their learning algorithms under settings of interest. We observe the existence of a "universal learner", which achieves the best possible distribution-dependent asymptotic rate among all learning algorithms within a specified runtime (e.g. $O(n^2)$), while incurring only polylogarithmic slowdown over this runtime. This algorithm is uniform, and does not depend on the distribution, and yet achieves best-possible rates for all distributions. The construction itself is a simple extension of Levin's universal search (Levin, 1973). And much like universal search, the universal learner is not at all practical, and is primarily of theoretical and philosophical interest.

【2】 Almost Optimal Universal Lower Bound for Learning Causal DAGs with Atomic Interventions 标题:原子干预学习因果DAG的几乎最优通用下界 链接:https://arxiv.org/abs/2111.05070

作者:Vibhor Porwal,Piyush Srivastava,Gaurav Sinha 机构:Tata Institute of Fundamental Research 摘要:在因果有向无环图(DAG)的结构学习问题中出现的一个研究得很好的挑战是,使用观测数据,只能将图学习到"马尔可夫等价类"(MEC)。剩余的无向边必须使用干预来定向,在应用程序中执行干预可能非常昂贵。因此,尽可能减少MEC完全定位所需的干预数量的问题最近受到了广泛关注,也是这项工作的重点。我们证明了两个主要结果。第一个是任何算法(主动或被动)都需要执行的原子干预数量的新的通用下界,以确定给定MEC的方向。我们的第二个结果表明,事实上,这个界限在能够确定MEC方向的最小原子干涉集大小的两倍以内。我们的下界可以证明比以前已知的下界更好。我们的下界的证明基于CBSP序的新概念,CBSP序是没有v-结构且满足某些特殊性质的DAG的拓扑序。此外,通过对合成图进行模拟,并给出特殊图族的例子,我们证明了我们的界通常明显更好。 摘要:A well-studied challenge that arises in the structure learning problem of causal directed acyclic graphs (DAG) is that using observational data, one can only learn the graph up to a "Markov equivalence class" (MEC). The remaining undirected edges have to be oriented using interventions, which can be very expensive to perform in applications. Thus, the problem of minimizing the number of interventions needed to fully orient the MEC has received a lot of recent attention, and is also the focus of this work. We prove two main results. The first is a new universal lower bound on the number of atomic interventions that any algorithm (whether active or passive) would need to perform in order to orient a given MEC. Our second result shows that this bound is, in fact, within a factor of two of the size of the smallest set of atomic interventions that can orient the MEC. Our lower bound is provably better than previously known lower bounds. The proof of our lower bound is based on the new notion of CBSP orderings, which are topological orderings of DAGs without v-structures and satisfy certain special properties. Further, using simulations on synthetic graphs and by giving examples of special graph families, we show that our bound is often significantly better.

【3】 Misspecified Gaussian Process Bandit Optimization 标题:误指定高斯过程的Bandit优化 链接:https://arxiv.org/abs/2111.05008

作者:Ilija Bogunovic,Andreas Krause 机构:ETH Zürich 备注:Accepted to NeurIPS 2021 摘要:我们考虑基于带噪bandit反馈来优化黑盒函数的问题。核化bandit算法在这一问题上表现出很强的经验与理论性能,然而它们严重依赖于模型被正确指定的假设,一旦该假设不成立便可能失效。为此,我们引入了一个误指定(misspecified)的核化bandit设置,其中未知函数可以被某个再生核希尔伯特空间(RKHS)中范数有界的函数$\epsilon$-一致逼近。我们设计了高效实用的算法,其性能在存在模型误指定时仅有最小程度的退化。具体来说,我们提出了两种基于高斯过程(GP)的算法:需要知道误指定误差的乐观EC-GP-UCB算法,以及能够适应未知模型误指定的消去型算法:分阶段GP不确定性采样(Phased GP Uncertainty Sampling)。我们给出了它们的累积遗憾关于$\epsilon$、时间范围和底层核的上界,并证明我们的算法在没有误指定先验知识的情况下实现了对$\epsilon$的最优依赖。此外,在随机上下文设置中,我们证明了EC-GP-UCB可以有效地与遗憾界平衡策略相结合,在不知道$\epsilon$的情况下获得相似的遗憾界。 摘要:We consider the problem of optimizing a black-box function based on noisy bandit feedback. Kernelized bandit algorithms have shown strong empirical and theoretical performance for this problem. They heavily rely on the assumption that the model is well-specified, however, and can fail without it. Instead, we introduce a *misspecified* kernelized bandit setting where the unknown function can be $\epsilon$-uniformly approximated by a function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS). We design efficient and practical algorithms whose performance degrades minimally in the presence of model misspecification. Specifically, we present two algorithms based on Gaussian process (GP) methods: an optimistic EC-GP-UCB algorithm that requires knowing the misspecification error, and Phased GP Uncertainty Sampling, an elimination-type algorithm that can adapt to unknown model misspecification. We provide upper bounds on their cumulative regret in terms of $\epsilon$, the time horizon, and the underlying kernel, and we show that our algorithm achieves optimal dependence on $\epsilon$ with no prior knowledge of misspecification. In addition, in a stochastic contextual setting, we show that EC-GP-UCB can be effectively combined with the regret bound balancing strategy and attain similar regret bounds despite not knowing $\epsilon$.
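下面是"在GP-UCB采集函数上加入误指定误差膨胀项"这一思路的单步示意;膨胀项的具体形式是简化假设,并非论文中EC-GP-UCB的原始定义。

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def ec_gp_ucb_pick(X_obs, y_obs, X_cand, beta=2.0, eps=0.1):
    """单步选点:标准 GP-UCB 采集函数 + 与误指定误差 eps 相关的膨胀项。"""
    gp = GaussianProcessRegressor(normalize_y=True).fit(X_obs, y_obs)
    mu, std = gp.predict(X_cand, return_std=True)
    ucb = mu + np.sqrt(beta) * std + eps * np.sqrt(len(X_obs))  # 膨胀项为假设形式
    return X_cand[ucb.argmax()]

rng = np.random.default_rng(0)
X_obs = rng.uniform(size=(10, 2)); y_obs = np.sin(X_obs).sum(1)
X_cand = rng.uniform(size=(100, 2))
x_next = ec_gp_ucb_pick(X_obs, y_obs, X_cand)
```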

【4】 Safe Policy Optimization with Local Generalized Linear Function Approximations 标题:基于局部广义线性函数逼近的安全策略优化 链接:https://arxiv.org/abs/2111.04894

作者:Akifumi Wachi,Yunyue Wei,Yanan Sui 机构:IBM Research, Tsinghua University 备注:18 pages, 6 figures, Accepted to NeurIPS-21 摘要:安全探索是在安全关键系统中应用强化学习(RL)的关键。现有的安全探索方法在规律性假设下保证安全,难以应用于大规模的实际问题。我们提出了一种新算法SPO-LF,它在优化智能体策略的同时,利用广义线性函数近似学习传感器获得的局部可用特征与环境奖励/安全之间的关系。我们为其安全性和最优性提供了理论保证。我们的实验表明,我们的算法1)在样本复杂度和计算成本方面更有效,2)比以往具有理论保证的安全RL方法更适用于大规模问题,并且3)与现有的具有安全约束的高级深度RL方法相比,具有相当的样本效率和安全性。 摘要:Safe exploration is a key to applying reinforcement learning (RL) in safety-critical systems. Existing safe exploration methods guaranteed safety under the assumption of regularity, and it has been difficult to apply them to large-scale real problems. We propose a novel algorithm, SPO-LF, that optimizes an agent's policy while learning the relation between a locally available feature obtained by sensors and environmental reward/safety using generalized linear function approximations. We provide theoretical guarantees on its safety and optimality. We experimentally show that our algorithm is 1) more efficient in terms of sample complexity and computational cost and 2) more applicable to large-scale problems than previous safe RL methods with theoretical guarantees, and 3) comparably sample-efficient and safer compared with existing advanced deep RL methods with safety constraints.

【5】 Efficient estimates of optimal transport via low-dimensional embeddings 标题:基于低维嵌入的最优传输的有效估计 链接:https://arxiv.org/abs/2111.04838

作者:Patric M. Fulop,Vincent Danos 机构:School of Informatics, University of Edinburgh, EH8 9AB 备注:Neurips 2021 Optimal Transport and Machine Learning Workshop 摘要:最优传输距离(OT)作为比较概率分布的一种方法,在机器学习领域得到了广泛的应用。当数据位于高维空间时,计算OT的成本很高。Paty等人于2019年开展的最新工作旨在通过使用数据的低秩投影(视为离散测度)计算OT来降低成本。我们扩展了这一方法,并表明只要映射族是1-Lipschitz的,就可以使用更一般的映射族来近似OT距离。最佳估计是通过在给定的族上最大化OT来获得的。由于OT计算是在将数据映射到低维空间后进行的,因此我们的方法能随原始数据维度良好缩放。我们用神经网络来演示这个想法。 摘要:Optimal transport distances (OT) have been widely used in recent work in Machine Learning as ways to compare probability distributions. These are costly to compute when the data lives in high dimension. Recent work by Paty et al., 2019, aims specifically at reducing this cost by computing OT using low-rank projections of the data (seen as discrete measures). We extend this approach and show that one can approximate OT distances by using more general families of maps provided they are 1-Lipschitz. The best estimate is obtained by maximising OT over the given family. As OT calculations are done after mapping data to a lower dimensional space, our method scales well with the original data dimension. We demonstrate the idea with neural networks.
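下面是"在一族映射上最大化投影后OT"这一思路的极简数值示意:等大小均匀离散测度间的精确OT退化为指派问题,可用匈牙利算法求解。示例中的随机线性投影仅作占位,除以谱范数才能严格保证1-Lipschitz,此处从略;论文中映射族由神经网络给出。

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ot_cost(X, Y):
    """等大小均匀离散测度间的精确 OT(平方欧氏代价,指派问题求解)。"""
    M = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    r, c = linear_sum_assignment(M)
    return M[r, c].mean()

def projected_ot(X, Y, maps):
    """在一族(假定 1-Lipschitz 的)映射上取投影 OT 的最大值,
    作为原空间 OT 的下界估计。"""
    return max(ot_cost(f(X), f(Y)) for f in maps)

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(50, 20)), rng.normal(1.0, 1.0, size=(50, 20))
maps = [lambda Z, P=rng.normal(size=(20, 2)) / np.sqrt(20): Z @ P
        for _ in range(5)]                     # 随机低秩投影(占位示例)
print(projected_ot(X, Y, maps))
```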

【6】 Safe Optimal Design with Applications in Policy Learning 标题:安全优化设计及其在政策学习中的应用 链接:https://arxiv.org/abs/2111.04835

作者:Ruihao Zhu,Branislav Kveton 机构:Purdue Krannet School of Management, Amazon Science, Berkeley 摘要:出于在线实验和非策略学习的实际需要,我们研究了安全优化设计问题,其中我们开发了一个数据记录策略,该策略可以有效地探索,同时通过基线生产策略实现竞争性回报。我们首先表明,也许令人惊讶的是,将生产策略与统一勘探相结合的常见做法,尽管是安全的,但在最大化信息收益方面是次优的。然后,我们提出了一个安全的最优日志记录策略,当没有关于行为预期回报的附加信息可用时。我们通过考虑旁侧信息对该设计进行了改进,并使用线性奖励模型将这两种方法扩展到大量动作。我们分析数据记录策略如何影响非策略学习中的错误。最后,我们通过大量实验验证了我们设计的好处。 摘要:Motivated by practical needs in online experimentation and off-policy learning, we study the problem of safe optimal design, where we develop a data logging policy that efficiently explores while achieving competitive rewards with a baseline production policy. We first show, perhaps surprisingly, that a common practice of mixing the production policy with uniform exploration, despite being safe, is sub-optimal in maximizing information gain. Then we propose a safe optimal logging policy for the case when no side information about the actions' expected rewards is available. We improve upon this design by considering side information and also extend both approaches to a large number of actions with a linear reward model. We analyze how our data logging policies impact errors in off-policy learning. Finally, we empirically validate the benefit of our designs by conducting extensive experiments.

【7】 Explaining Hyperparameter Optimization via Partial Dependence Plots 标题:用部分相关图解释超参数优化 链接:https://arxiv.org/abs/2111.04820

作者:Julia Moosbauer,Julia Herbinger,Giuseppe Casalicchio,Marius Lindauer,Bernd Bischl 机构:Department of Statistics, Ludwig-Maximilians-University Munich, Munich, Germany, Institute of Information Processing, Leibniz University Hannover, Hannover, Germany 摘要:自动超参数优化(HPO)可以支持从业者在机器学习模型中获得最佳性能。然而,对于不同超参数对最终模型性能的影响,通常缺乏有价值的见解。由于缺乏可解释性,因此很难信任和理解自动化HPO过程及其结果。我们建议使用可解释机器学习(IML)来从使用贝叶斯优化(BO)的HPO期间获得的实验数据中获得见解。BO倾向于关注具有潜在高性能配置的有前途的区域,因此会导致采样偏差。因此,许多IML技术,如部分依赖图(PDP),具有产生有偏解释的风险。通过利用BO替代模型的后验不确定性,我们引入了一种具有估计置信带的PDP变体。我们建议对超参数空间进行划分,以在相关子区域获得更可靠的PDP。在一项实验研究中,我们为亚区域内PDP质量的提高提供了定量证据。 摘要:Automated hyperparameter optimization (HPO) can support practitioners to obtain peak performance in machine learning models. However, there is often a lack of valuable insights into the effects of different hyperparameters on the final model performance. This lack of explainability makes it difficult to trust and understand the automated HPO process and its results. We suggest using interpretable machine learning (IML) to gain insights from the experimental data obtained during HPO with Bayesian optimization (BO). BO tends to focus on promising regions with potential high-performance configurations and thus induces a sampling bias. Hence, many IML techniques, such as the partial dependence plot (PDP), carry the risk of generating biased interpretations. By leveraging the posterior uncertainty of the BO surrogate model, we introduce a variant of the PDP with estimated confidence bands. We propose to partition the hyperparameter space to obtain more confident and reliable PDPs in relevant sub-regions. In an experimental study, we provide quantitative evidence for the increased quality of the PDPs within sub-regions.
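下面是"利用BO代理模型后验不确定性为PDP加置信带"这一思路的示意:对目标超参的每个网格值,把其余维度固定为样本值后取后验均值的平均,并以后验标准差的平均作近似带宽。带宽的聚合方式是简化假设,分区细化步骤从略。

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def pdp_with_bands(gp, X, j, grid):
    """基于 GP 代理后验的 PDP 曲线与近似置信带(简化示意)。"""
    means, stds = [], []
    for v in grid:
        Xg = X.copy(); Xg[:, j] = v            # 固定第 j 个超参为网格值
        mu, sd = gp.predict(Xg, return_std=True)
        means.append(mu.mean()); stds.append(sd.mean())
    return np.array(means), np.array(stds)

rng = np.random.default_rng(0)
X = rng.uniform(size=(40, 3)); y = np.sin(3 * X[:, 0]) + X[:, 1]
gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
m, s = pdp_with_bands(gp, X, j=0, grid=np.linspace(0, 1, 25))
```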

预测|估计(8篇)

【1】 High-order joint embedding for multi-level link prediction 标题:用于多级链接预测的高阶联合嵌入 链接:https://arxiv.org/abs/2111.05265

作者:Yubai Yuan,Annie Qu 机构: Department of Statistics, University of California 备注:35 pages 摘要:链路预测从观测到的网络中推断出潜在链路,是网络分析中的基本问题之一。与传统的只预测双向成对关系的图表示模型不同,我们提出了一种新的基于张量的联合网络嵌入方法,将成对链接和超链接同时编码到潜在空间中,它捕获了在推断潜在的未观察到的超链接时成对链接和多路链接之间的依赖关系。所提出的嵌入方法的主要优点是,它结合了节点之间的成对关系和子组结构,以捕获更丰富的网络信息。此外,该方法还引入了链接之间的层次依赖关系来推断潜在的超链接,从而实现了更好的链接预测。在理论上,我们为所提出的嵌入方法建立了估计一致性,并且与仅利用成对链接或超链接的链接预测相比,提供了更快的收敛速度。对仿真环境和Facebook ego网络的数值研究表明,与现有的链接预测算法相比,该方法提高了超链接和成对链接的预测精度。 摘要:Link prediction infers potential links from observed networks, and is one of the essential problems in network analyses. In contrast to traditional graph representation modeling which only predicts two-way pairwise relations, we propose a novel tensor-based joint network embedding approach on simultaneously encoding pairwise links and hyperlinks onto a latent space, which captures the dependency between pairwise and multi-way links in inferring potential unobserved hyperlinks. The major advantage of the proposed embedding procedure is that it incorporates both the pairwise relationships and subgroup-wise structure among nodes to capture richer network information. In addition, the proposed method introduces a hierarchical dependency among links to infer potential hyperlinks, and leads to better link prediction. In theory we establish the estimation consistency for the proposed embedding approach, and provide a faster convergence rate compared to link prediction utilizing pairwise links or hyperlinks only. Numerical studies on both simulation settings and Facebook ego-networks indicate that the proposed method improves both hyperlink and pairwise link prediction accuracy compared to existing link prediction algorithms.

【2】 Ethically aligned Deep Learning: Unbiased Facial Aesthetic Prediction 标题:伦理一致的深度学习:无偏见的面部美学预测 链接:https://arxiv.org/abs/2111.05149

作者:Michael Danner,Thomas Weber,Leping Peng,Tobias Gerlach,Xueping Su,Matthias Rätsch 机构:University of Surrey, Reutlingen University, Hunan University of Science and Technology, Xi'an Polytechnic University 备注:Peer reviewed and accepted at CEPE/IACAP 2021 as Extended Abstract 摘要:面部美预测(FBP)旨在开发一种自动进行面部吸引力评估的机器。在过去,这些结果与人类评分高度相关,因此也与他们在注释上的偏见高度相关。由于人工智能可能具有种族主义和歧视倾向,因此必须查明数据中出现偏差的原因。对科学家来说,开发对有偏见的信息具有鲁棒性的训练数据和人工智能算法是一个新的挑战。由于审美判断通常是有偏见的,我们想更进一步,提出一种用于FBP的无偏卷积神经网络。虽然可以构建在高水平上评估人脸吸引力的网络模型,但从伦理角度看,确保模型不带偏见同样重要。在这项工作中,我们引入了AestheticNet,这是一个最先进的吸引力预测网络,它以0.9601的Pearson相关系数显著优于竞争对手。此外,我们提出了一种新的生成无偏差CNN的方法,以提高机器学习的公平性。 摘要:Facial beauty prediction (FBP) aims to develop a machine that automatically makes facial attractiveness assessment. In the past those results were highly correlated with human ratings, therefore also with their bias in annotating. As artificial intelligence can have racist and discriminatory tendencies, the cause of skews in the data must be identified. Development of training data and AI algorithms that are robust against biased information is a new challenge for scientists. As aesthetic judgement usually is biased, we want to take it one step further and propose an Unbiased Convolutional Neural Network for FBP. While it is possible to create network models that can rate attractiveness of faces on a high level, from an ethical point of view, it is equally important to make sure the model is unbiased. In this work, we introduce AestheticNet, a state-of-the-art attractiveness prediction network, which significantly outperforms competitors with a Pearson Correlation of 0.9601. Additionally, we propose a new approach for generating a bias-free CNN to improve fairness in machine learning.

【3】 Prediction of new outlinks for focused crawling 标题:用于聚焦爬行的新的外部链接的预测 链接:https://arxiv.org/abs/2111.05062

作者:Thi Kim Nhung Dang,Doina Bucur,Berk Atil,Guillaume Pitel,Frank Ruis,Hamidreza Kadkhodaei,Nelly Litvak 机构:University of Twente, The Netherlands, Bogazici University, Turkey, Exensa, France, Eindhoven University of Technology, The Netherlands 备注:25 pages, 20 figures, 5 tables, uses arxiv.sty 摘要:发现新的超链接使网络爬虫能够找到尚未编制索引的新页面。这对于专注的爬虫程序尤其重要,因为它们努力提供对Web特定部分的全面分析,从而将新页面的发现优先于内容变化的发现。在文献中,超链接和内容的变化通常被同时考虑。然而,也有证据表明,这两种变化不一定相关。此外,许多关于预测变化的研究假设页面的历史很长,这在实践中是无法实现的。这项工作的目的是提供一种方法,有效地检测新的链接使用一个简短的历史。为此,我们使用了一个由间隔一周的十次爬网构成的数据集。我们的研究由三部分组成。首先,我们通过分析新大纲图数量的经验特性,对数据进行深入了解。我们观察到,平均而言,这些属性随着时间的推移是稳定的,但目标页面域内和域外的页面(分别为内部和外部大纲链接)出现超链接之间存在很大差异。接下来,我们提供三个目标的统计模型:链接更改率、新链接的存在和新链接的数量。这些模型包括文献中早期使用的特性,以及本工作中引入的新特性。我们分析了特征之间的相关性,并调查了它们的信息量。一个值得注意的发现是,如果目标页面的历史记录不可用,那么表示相关页面历史记录的新特性最能预测目标页面中的新链接。最后,我们提出了排名方法,作为重点爬虫高效发现新页面的指导原则,从而在相应的目标方面取得了优异的性能。 摘要:Discovering new hyperlinks enables Web crawlers to find new pages that have not yet been indexed. This is especially important for focused crawlers because they strive to provide a comprehensive analysis of specific parts of the Web, thus prioritizing discovery of new pages over discovery of changes in content. In the literature, changes in hyperlinks and content have been usually considered simultaneously. However, there is also evidence suggesting that these two types of changes are not necessarily related. Moreover, many studies about predicting changes assume that long history of a page is available, which is unattainable in practice. The aim of this work is to provide a methodology for detecting new links effectively using a short history. To this end, we use a dataset of ten crawls at intervals of one week. Our study consists of three parts. First, we obtain insight in the data by analyzing empirical properties of the number of new outlinks. We observe that these properties are, on average, stable over time, but there is a large difference between emergence of hyperlinks towards pages within and outside the domain of a target page (internal and external outlinks, respectively). Next, we provide statistical models for three targets: the link change rate, the presence of new links, and the number of new links. These models include the features used earlier in the literature, as well as new features introduced in this work. We analyze correlation between the features, and investigate their informativeness. A notable finding is that, if the history of the target page is not available, then our new features, that represent the history of related pages, are most predictive for new links in the target page. Finally, we propose ranking methods as guidelines for focused crawlers to efficiently discover new pages, which achieve excellent performance with respect to the corresponding targets.

【4】 Time-Varying Channel Prediction for RIS-Assisted MU-MISO Networks via Deep Learning 标题:基于深度学习的RIS辅助MU-MISO网络时变信道预测 链接:https://arxiv.org/abs/2111.04971

作者:Wangyang Xu,Jiancheng An,Yongjun Xu,Chongwen Huang,Lu Gan,Chau Yuen 机构: Chongqing University of Posts and Telecommunications 备注:30 pages, 13 figures 摘要:为了缓解阴影衰落和障碍物阻塞的影响,可重构智能表面(RIS)技术通过以较低的硬件成本和功耗控制可重构无源元件来改善无线通信的信号传输质量,已成为一种很有前途的技术。然而,由于RIS无源元件数量众多,准确、低延迟和低导频开销的信道状态信息(CSI)获取在RIS辅助系统中仍然是一个相当大的挑战。在本文中,我们提出了一个三阶段联合信道分解和预测框架来获取CSI。该框架利用了基站(BS)-RIS信道是准静态的和RIS用户设备(UE)信道是快速时变的两个时间尺度特性。具体地说,在第一阶段,我们使用全双工技术来估计BS的特定天线和RIS之间的信道,解决信道分解中的关键缩放模糊问题。然后,我们设计了一种新的深度神经网络,即稀疏连接长短时记忆(SCLSTM),并分别在第二和第三阶段提出了基于SCLSM的算法。该算法可以同时从级联信道中分解BS-RIS信道和RIS-UE信道,并捕获RIS-UE信道的时间关系进行预测。仿真结果表明,与传统的信道估计算法相比,我们提出的框架具有更低的导频开销,并且基于SCLSTM的算法也能够稳健有效地实现更精确的CSI获取。 摘要:To mitigate the effects of shadow fading and obstacle blocking, reconfigurable intelligent surface (RIS) has become a promising technology to improve the signal transmission quality of wireless communications by controlling the reconfigurable passive elements with less hardware cost and lower power consumption. However, accurate, low-latency and low-pilot-overhead channel state information (CSI) acquisition remains a considerable challenge in RIS-assisted systems due to the large number of RIS passive elements. In this paper, we propose a three-stage joint channel decomposition and prediction framework to acquire CSI. The proposed framework exploits the two-timescale property that the base station (BS)-RIS channel is quasi-static and the RIS-user equipment (UE) channel is fast time-varying. Specifically, in the first stage, we use the full-duplex technique to estimate the channel between a BS's specific antenna and the RIS, addressing the critical scaling ambiguity problem in the channel decomposition. We then design a novel deep neural network, namely, the sparse-connected long short-term memory (SCLSTM), and propose a SCLSTM-based algorithm in the second and third stages, respectively. The algorithm can simultaneously decompose the BS-RIS channel and RIS-UE channel from the cascaded channel and capture the temporal relationship of the RIS-UE channel for prediction. Simulation results show that our proposed framework has lower pilot overhead than the traditional channel estimation algorithms, and the proposed SCLSTM-based algorithm can also achieve more accurate CSI acquisition robustly and effectively.

【5】 Learning from Multiple Time Series: A Deep Disentangled Approach to Diversified Time Series Forecasting 标题:从多时间序列中学习:一种深度解缠的多元化时间序列预测方法 链接:https://arxiv.org/abs/2111.04942

作者:Ling Chen,Weiqi Chen,Binqing Wu,Youdong Zhang,Bo Wen,Chenghu Yang 机构: Zhejiang University 摘要:时间序列预测是许多应用中的一个重要问题,例如财务预测和业务优化。现代数据集可以有多个相关的时间序列,这些时间序列通常由全局(共享)规则和局部(特定)动态生成。在本文中,我们试图用DeepDGL来解决此类预测问题,DeepDGL是一种将动态分解为全局和局部时间模式的深度预测模型。DeepDGL采用编码器-解码器体系结构,由两个编码器分别学习全局和局部时间模式,以及一个解码器进行多步预测。具体来说,为了对复杂的全局模式进行建模,引入了矢量量化(VQ)模块,允许全局特征编码器在所有时间序列中学习共享码本。为了对多样性和异质性的局部模式进行建模,提出了一种通过对比多地平线编码(CMC)增强的自适应参数生成模块,为每个时间序列生成局部特征编码器的参数,它最大化了特定于序列的上下文变量与相应时间序列的长期/短期表示之间的互信息。我们在几个真实数据集上的实验表明,DeepDGL优于现有的最新模型。 摘要:Time series forecasting is a significant problem in many applications, e.g., financial predictions and business optimization. Modern datasets can have multiple correlated time series, which are often generated with global (shared) regularities and local (specific) dynamics. In this paper, we seek to tackle such forecasting problems with DeepDGL, a deep forecasting model that disentangles dynamics into global and local temporal patterns. DeepDGL employs an encoder-decoder architecture, consisting of two encoders to learn global and local temporal patterns, respectively, and a decoder to make multi-step forecasting. Specifically, to model complicated global patterns, the vector quantization (VQ) module is introduced, allowing the global feature encoder to learn a shared codebook among all time series. To model diversified and heterogenous local patterns, an adaptive parameter generation module enhanced by the contrastive multi-horizon coding (CMC) is proposed to generate the parameters of the local feature encoder for each individual time series, which maximizes the mutual information between the series-specific context variable and the long/short-term representations of the corresponding time series. Our experiments on several real-world datasets show that DeepDGL outperforms existing state-of-the-art models.

【6】 Double Control Variates for Gradient Estimation in Discrete Latent Variable Models 标题:离散潜变量模型梯度估计的双控制变量 链接:https://arxiv.org/abs/2111.05300

作者:Michalis K. Titsias,Jiaxin Shi 机构:DeepMind, Microsoft Research New England 备注:18 pages 摘要:由于梯度的高方差,基于随机梯度的离散潜变量模型优化具有挑战性。我们为得分函数估计量引入了一种利用双控制变量的方差缩减技术。这些控制变量作用于主控制变量之上,并试图进一步降低总体估计量的方差。我们利用泰勒展开为REINFORCE留一(leave-one-out)估计量构造了一个双控制变量。对于训练离散潜变量模型(例如具有二值潜变量的变分自动编码器),与使用REINFORCE留一估计量的标准训练相比,我们的方法不增加额外的计算成本。我们将该方法应用于具有挑战性的高维玩具示例,以及训练具有二值潜变量的变分自动编码器。我们证明,与其他最先进的估计量相比,我们的估计量可以具有更低的方差。 摘要:Stochastic gradient-based optimisation for discrete latent variable models is challenging due to the high variance of gradients. We introduce a variance reduction technique for score function estimators that makes use of double control variates. These control variates act on top of a main control variate, and try to further reduce the variance of the overall estimator. We develop a double control variate for the REINFORCE leave-one-out estimator using Taylor expansions. For training discrete latent variable models, such as variational autoencoders with binary latent variables, our approach adds no extra computational cost compared to standard training with the REINFORCE leave-one-out estimator. We apply our method to challenging high-dimensional toy examples and training variational autoencoders with binary latent variables. We show that our estimator can have lower variance compared to other state-of-the-art estimators.
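论文的主控制变量基线即REINFORCE留一估计量。下面给出它的极简PyTorch示意:采K个样本,每个样本以其余K-1个样本的均值作基线;未包含论文提出的第二层(双)控制变量。

```python
import torch

def reinforce_loo_grad(logits, f, K=8):
    """REINFORCE 留一估计量:对二值潜变量,用留一均值作基线(示意)。"""
    probs = torch.sigmoid(logits)
    z = torch.bernoulli(probs.expand(K, -1))          # K 个二值潜变量样本
    fz = f(z)                                         # (K,) 各样本的目标值
    baseline = (fz.sum() - fz) / (K - 1)              # 留一基线
    logp = (z * torch.log(probs) + (1 - z) * torch.log(1 - probs)).sum(-1)
    obj = ((fz - baseline).detach() * logp).mean()
    return torch.autograd.grad(obj, logits)[0]

logits = torch.zeros(10, requires_grad=True)
g = reinforce_loo_grad(logits, lambda z: (z.sum(-1) - 3.0) ** 2)
print(g.shape)                                        # torch.Size([10])
```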

【7】 Stress field prediction in fiber-reinforced composite materials using a deep learning approach 标题:基于深度学习的纤维增强复合材料应力场预测 链接:https://arxiv.org/abs/2111.05271

作者:Anindya Bhaduri,Ashwini Gupta,Lori Graham-Brady 机构:Department of Civil and Systems Engineering, Johns Hopkins University, N. Charles Street, Baltimore, MD, USA 摘要:计算应力分析是材料系统设计中的一个重要步骤。有限元法(FEM)是对复杂材料系统进行应力分析的标准方法。加速应力分析的一种方法是用基于数据驱动机器学习的应力分析方法代替FEM。在这项研究中,我们考虑纤维增强基体复合材料系统,我们使用深度学习工具来寻找替代的有限元方法的应力场预测。我们首先尝试预测具有不同空间结构的固定数量纤维的复合材料系统的应力场图。具体来说,我们试图找到复合材料中纤维的空间排列与相应的von Mises应力场之间的映射。这是通过使用卷积神经网络(CNN)实现的,特别是U型网络结构,使用与训练数据具有相同数量纤维的系统真实应力图。U-Net是一种编解码网络,在本研究中,它将复合材料图像作为输入,并输出与输入图像大小相同的应力场图像。我们通过对训练样本进行不同的初始化来进行鲁棒性分析,以发现预测精度对少量训练样本的敏感性。当复合材料系统中的纤维数量在相同体积分数下增加时,需要更精细的有限元网格离散化来准确表示几何体。这导致计算成本的增加。因此,这里的第二个目标是使用来自相对便宜的、纤维数量较少的系统的真实应力图的信息,预测具有不同空间配置的纤维数量较多的系统的应力场。 摘要:Computational stress analysis is an important step in the design of material systems. Finite element method (FEM) is a standard approach of performing stress analysis of complex material systems. A way to accelerate stress analysis is to replace FEM with a data-driven machine learning based stress analysis approach. In this study, we consider a fiber-reinforced matrix composite material system and we use deep learning tools to find an alternative to the FEM approach for stress field prediction. We first try to predict stress field maps for composite material systems of fixed number of fibers with varying spatial configurations. Specifically, we try to find a mapping between the spatial arrangement of the fibers in the composite material and the corresponding von Mises stress field. This is achieved by using a convolutional neural network (CNN), specifically a U-Net architecture, using true stress maps of systems with same number of fibers as training data. U-Net is a encoder-decoder network which in this study takes in the composite material image as an input and outputs the stress field image which is of the same size as the input image. We perform a robustness analysis by taking different initializations of the training samples to find the sensitivity of the prediction accuracy to the small number of training samples. When the number of fibers in the composite material system is increased for the same volume fraction, a finer finite element mesh discretization is required to represent the geometry accurately. This leads to an increase in the computational cost. Thus, the secondary goal here is to predict the stress field for systems with larger number of fibers with varying spatial configurations using information from the true stress maps of relatively cheaper systems of smaller fiber number.

【8】 EEGEyeNet: a Simultaneous Electroencephalography and Eye-tracking Dataset and Benchmark for Eye Movement Prediction 标题:EEGEyeNet:一种同时进行脑电图和眼动跟踪的数据集和眼动预测基准 链接:https://arxiv.org/abs/2111.05100

作者:Ard Kastrati,Martyna Beata Płomecka,Damián Pascual,Lukas Wolf,Victor Gillioz,Roger Wattenhofer,Nicolas Langer 机构:ETH Zurich, Switzerland, University of Zurich, Switzerland 备注:Published at NeurIPS 2021 Datasets and Benchmarks Track 摘要:我们提出了一个新的数据集和基准,旨在推动大脑活动和眼球运动交叉点的研究。我们的数据集EEGEyeNet由来自三个不同实验范式的356名不同受试者的同时脑电图(EEG)和眼球跟踪(ET)记录组成。利用这个数据集,我们还提出了一个基准来评估凝视预测从脑电图测量。基准测试包括三项难度越来越大的任务:左右、角度幅度和绝对位置。我们在这个基准上运行了大量的实验,以提供基于经典机器学习模型和大型神经网络的可靠基线。我们发布了完整的代码和数据,并提供了一个简单易用的界面来评估新方法。 摘要:We present a new dataset and benchmark with the goal of advancing research in the intersection of brain activities and eye movements. Our dataset, EEGEyeNet, consists of simultaneous Electroencephalography (EEG) and Eye-tracking (ET) recordings from 356 different subjects collected from three different experimental paradigms. Using this dataset, we also propose a benchmark to evaluate gaze prediction from EEG measurements. The benchmark consists of three tasks with an increasing level of difficulty: left-right, angle-amplitude and absolute position. We run extensive experiments on this benchmark in order to provide solid baselines, both based on classical machine learning models and on large neural networks. We release our complete code and data and provide a simple and easy-to-use interface to evaluate new methods.

其他神经网络|深度学习|模型|建模(23篇)

【1】 Variational Multi-Task Learning with Gumbel-Softmax Priors 标题:基于Gumbel-Softmax先验的变分多任务学习 链接:https://arxiv.org/abs/2111.05323

作者:Jiayi Shen,Xiantong Zhen,Marcel Worring,Ling Shao 机构:AIM Lab, University of Amsterdam, Netherlands, Inception Institute of Artificial Intelligence, Abu Dhabi, UAE 备注:19 pages, 6 figures, accepted by NeurIPS 2021 摘要:多任务学习旨在探索任务相关性以改进单个任务,这在每个任务只有有限数据的挑战场景中具有特别重要的意义。为了应对这一挑战,我们提出了变分多任务学习(VMTL),这是一种用于学习多个相关任务的通用概率推理框架。我们将多任务学习问题转化为一个变分贝叶斯推理问题,通过指定先验,以统一的方式探索任务相关性。为了将共享的知识融入到每个任务中,我们将一个任务的先验设计为其他相关任务变分后验的可学习混合,并通过Gumbel-Softmax技术进行学习。与以前的方法相比,我们的VMTL可以通过联合推断表示和分类器的后验,以有原则的方式利用它们的任务相关性。这使单个任务能够充分利用相关任务提供的归纳偏置,从而提高所有任务的总体性能。实验结果表明,在分类和回归训练数据有限的情况下,所提出的VMTL能够有效地处理各种具有挑战性的多任务学习环境。我们的方法始终优于以前的方法,包括强贝叶斯方法,并在五个基准数据集上实现了最先进的性能。 摘要:Multi-task learning aims to explore task relatedness to improve individual tasks, which is of particular significance in the challenging scenario that only limited data is available for each task. To tackle this challenge, we propose variational multi-task learning (VMTL), a general probabilistic inference framework for learning multiple related tasks. We cast multi-task learning as a variational Bayesian inference problem, in which task relatedness is explored in a unified manner by specifying priors. To incorporate shared knowledge into each task, we design the prior of a task to be a learnable mixture of the variational posteriors of other related tasks, which is learned by the Gumbel-Softmax technique. In contrast to previous methods, our VMTL can exploit task relatedness for both representations and classifiers in a principled way by jointly inferring their posteriors. This enables individual tasks to fully leverage inductive biases provided by related tasks, therefore improving the overall performance of all tasks. Experimental results demonstrate that the proposed VMTL is able to effectively tackle a variety of challenging multi-task learning settings with limited training data for both classification and regression. Our method consistently surpasses previous methods, including strong Bayesian approaches, and achieves state-of-the-art performance on five benchmark datasets.
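下面是"用Gumbel-Softmax学习的权重把其他任务的变分后验混合为当前任务先验"这一思路的示意。对高斯分布做矩匹配式的软混合是此处的简化假设,并非论文的原始构造。

```python
import torch
import torch.nn.functional as F

def mixture_prior_params(mus, logvars, mix_logits, tau=1.0):
    """用 Gumbel-Softmax 采样混合权重,将相关任务变分后验的
    (mu, logvar) 软混合为当前任务的先验参数(简化示意)。"""
    w = F.gumbel_softmax(mix_logits, tau=tau, hard=False)   # (T,) 混合权重
    mu = (w[:, None] * mus).sum(0)
    var = (w[:, None] * logvars.exp()).sum(0)               # 方差的软混合(简化)
    return mu, var.log()

T, d = 4, 16                        # 假设 4 个相关任务、16 维潜表示
mus, logvars = torch.randn(T, d), torch.zeros(T, d)
mix_logits = torch.zeros(T, requires_grad=True)             # 可学习的混合权重
mu_p, logvar_p = mixture_prior_params(mus, logvars, mix_logits)
```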

【2】 Machine-learning custom-made basis functions for partial differential equations 标题:偏微分方程的机器学习定制基函数 链接:https://arxiv.org/abs/2111.05307

作者:Brek Meuris,Saad Qadeer,Panos Stinis 机构:Advanced Computing, Mathematics and Data Division, Pacific Northwest National, Laboratory, Richland, WA , USA, Department of Applied Mathematics, University of Washington, Seattle, WA , Department of Mechanical Engineering, University of Washington, Seattle, WA 备注:35 pages, software used to reach results available upon request, approved for release by Pacific Northwest National Laboratory (PNNL-SA-168281) 摘要:谱方法是科学计算中求解偏微分方程(PDE)的重要组成部分。然而,它们的适用性和有效性在很大程度上取决于用于扩展偏微分方程解的基函数的选择。在过去的十年中,深度学习已经成为提供复杂函数高效表示的有力竞争者。在目前的工作中,我们提出了一种将深度神经网络与谱方法相结合的方法来解决偏微分方程。特别是,我们使用一种称为深度操作员网络(DeepONet)的深度学习技术来识别候选函数,在这些函数上扩展PDE的解决方案。我们设计了一种方法,使用DeepONet提供的候选函数作为起点来构造一组具有以下特性的函数:i)它们构成基,2)它们是正交的,3)它们是层次的,即类似于傅立叶级数或正交多项式。我们利用我们定制的基函数的有利性质来研究它们的逼近能力,并使用它们来扩展线性和非线性时变偏微分方程的解。 摘要:Spectral methods are an important part of scientific computing's arsenal for solving partial differential equations (PDEs). However, their applicability and effectiveness depend crucially on the choice of basis functions used to expand the solution of a PDE. The last decade has seen the emergence of deep learning as a strong contender in providing efficient representations of complex functions. In the current work, we present an approach for combining deep neural networks with spectral methods to solve PDEs. In particular, we use a deep learning technique known as the Deep Operator Network (DeepONet), to identify candidate functions on which to expand the solution of PDEs. We have devised an approach which uses the candidate functions provided by the DeepONet as a starting point to construct a set of functions which have the following properties: i) they constitute a basis, 2) they are orthonormal, and 3) they are hierarchical i.e., akin to Fourier series or orthogonal polynomials. We have exploited the favorable properties of our custom-made basis functions to both study their approximation capability and use them to expand the solution of linear and nonlinear time-dependent PDEs.
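下面是"把候选函数加工成正交归一、可分层使用的基"这一步骤的极简示意:将候选函数在网格上取值后做QR分解,即得到离散L2内积下两两正交归一的列。候选函数此处用三角函数占位,实际应来自DeepONet。

```python
import numpy as np

def orthonormal_basis(candidates, x_grid):
    """将候选函数在网格上取值,经 QR 分解得到正交归一的定制基(示意)。"""
    A = np.stack([f(x_grid) for f in candidates], axis=1)  # (n_grid, n_cand)
    Q, _ = np.linalg.qr(A)                                 # 列向量两两正交归一
    dx = x_grid[1] - x_grid[0]
    return Q / np.sqrt(dx)          # 换算为带权 dx 的离散 L2 内积下的归一化

x = np.linspace(0, 1, 200)
cands = [np.sin, np.cos, lambda t: np.sin(2 * np.pi * t)]  # 占位候选函数
B = orthonormal_basis(cands, x)
print(np.allclose(B.T @ B * (x[1] - x[0]), np.eye(3)))     # True
```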

【3】 Identifying the atmospheric drivers of drought and heat using a smoothed deep learning approach 标题:使用平滑深度学习方法识别干旱和高温的大气驱动因素 链接:https://arxiv.org/abs/2111.05303

作者:Magdalena Mittermeier,Maximilian Weigert,David Rügamer 机构:Department of Geography, LMU Munich, Statistical Consulting StaBLab, Department of Statistics 备注:NeurIPS 2021: Tackling Climate Change with Machine Learning 摘要:在最近的夏天,欧洲遭受了几次灾难性的高温和干旱。除了热力学影响外,这种高温和干燥的极端天气是由某些大气条件(包括反气旋条件)驱动的。气候变化对大气环流的影响是复杂的,在这种情况下,许多开放的研究问题仍然存在,例如反气旋条件的未来趋势。基于标记环流模式目录和空间大气变量的组合,我们提出了一种平滑卷积神经网络分类器,用于六种与干旱和高温相关的反气旋环流。我们的工作有助于在气候模拟中确定极端炎热和干旱的重要驱动因素,从而揭示气候变化对这些驱动因素的影响。我们解决了环流模式分类固有的各种挑战,这些挑战也存在于其他气候模式中,例如,主观标签和明确的过渡期。 摘要:Europe was hit by several, disastrous heat and drought events in recent summers. Besides thermodynamic influences, such hot and dry extremes are driven by certain atmospheric situations including anticyclonic conditions. Effects of climate change on atmospheric circulations are complex and many open research questions remain in this context, e.g., on future trends of anticyclonic conditions. Based on the combination of a catalog of labeled circulation patterns and spatial atmospheric variables, we propose a smoothed convolutional neural network classifier for six types of anticyclonic circulations that are associated with drought and heat. Our work can help to identify important drivers of hot and dry extremes in climate simulations, which allows to unveil the impact of climate change on these drivers. We address various challenges inherent to circulation pattern classification that are also present in other climate patterns, e.g., subjective labels and unambiguous transition periods.

【4】 Learning Perceptual Concepts by Bootstrapping from Human Queries 标题:从人类查询中通过自举学习知觉概念 链接:https://arxiv.org/abs/2111.05251

作者:Andreea Bobu,Chris Paxton,Wei Yang,Balakumar Sundaralingam,Yu-Wei Chao,Maya Cakmak,Dieter Fox 机构:University of Washington 备注:7 pages, 7 figures 摘要:机器人需要能够从用户那里学习概念，以便使其能力适应每个用户的独特任务。但当机器人在高维输入(如图像或点云)上操作时，这是不切实际的：机器人需要不切实际的人力来学习新概念。为了应对这一挑战，我们提出了一种新的方法，即机器人学习概念的低维变体，并使用它生成更大的数据集，以便在高维空间学习概念。这使得它可以利用只有在训练时才可访问的语义上有意义的特权信息，如对象姿势和边界框，从而允许更丰富的人机交互来加速学习。我们通过学习描述对象状态或多对象关系的介词概念来评估我们的方法，如“上方”、“靠近”或“对齐”，它们是用户指定机器人任务目标和执行约束的关键。使用模拟人，我们表明，与直接在高维空间学习概念相比，我们的方法改善了样本复杂度。我们还演示了所学概念在7自由度Franka Panda机器人运动规划任务中的实用性。 摘要:Robots need to be able to learn concepts from their users in order to adapt their capabilities to each user's unique task. But when the robot operates on high-dimensional inputs, like images or point clouds, this is impractical: the robot needs an unrealistic amount of human effort to learn the new concept. To address this challenge, we propose a new approach whereby the robot learns a low-dimensional variant of the concept and uses it to generate a larger data set for learning the concept in the high-dimensional space. This lets it take advantage of semantically meaningful privileged information only accessible at training time, like object poses and bounding boxes, that allows for richer human interaction to speed up learning. We evaluate our approach by learning prepositional concepts that describe object state or multi-object relationships, like above, near, or aligned, which are key to user specification of task goals and execution constraints for robots. Using a simulated human, we show that our approach improves sample complexity when compared to learning concepts directly in the high-dimensional space. We also demonstrate the utility of the learned concepts in motion planning tasks on a 7-DoF Franka Panda robot.

【5】 Learning Rates for Nonconvex Pairwise Learning 标题:非凸成对学习的学习率 链接:https://arxiv.org/abs/2111.05232

作者:Shaojie Li,Yong Liu 机构:Gaoling School of Artificial Intelligence, Renmin University of China 摘要:两两学习正受到越来越多的关注，因为它涵盖了许多重要的机器学习任务，例如度量学习、AUC最大化和排名。因此，研究两两学习的泛化行为具有重要意义。然而，现有的泛化分析主要集中在凸目标函数上，对非凸学习的研究较少。此外，目前针对两两学习的泛化性能得出的学习速率大多较低。基于这些问题，我们研究了非凸成对学习的泛化性能，并提供了改进的学习率。具体地说，我们在不同的假设下发展了不同的梯度一致收敛性，并在此基础上分析了经验风险最小化、梯度下降和随机梯度下降的成对学习。我们首先在一般的非凸环境中成功地建立了这些算法的学习率，分析揭示了优化和泛化之间的权衡以及提前停止的作用。然后研究了梯度优势曲率条件下非凸学习的泛化性能。在这个设置中，我们得到更快的$\mathcal{O}(1/n)$阶学习率，其中$n$是样本大小。假设最优总体风险很小，我们进一步将学习率提高到$\mathcal{O}(1/n^2)$，就我们所知，这是两两学习的第一个$\mathcal{O}(1/n^2)$类型的学习率，无论是凸学习还是非凸学习。总体而言，我们系统地分析了非凸成对学习的泛化性能。 摘要:Pairwise learning is receiving increasing attention since it covers many important machine learning tasks, e.g., metric learning, AUC maximization, and ranking. Investigating the generalization behavior of pairwise learning is thus of significance. However, existing generalization analysis mainly focuses on the convex objective functions, leaving the nonconvex learning far less explored. Moreover, the current learning rates derived for generalization performance of pairwise learning are mostly of slower order. Motivated by these problems, we study the generalization performance of nonconvex pairwise learning and provide improved learning rates. Specifically, we develop different uniform convergence of gradients for pairwise learning under different assumptions, based on which we analyze empirical risk minimizer, gradient descent, and stochastic gradient descent pairwise learning. We first successfully establish learning rates for these algorithms in a general nonconvex setting, where the analysis sheds insights on the trade-off between optimization and generalization and the role of early-stopping. We then investigate the generalization performance of nonconvex learning with a gradient dominance curvature condition. In this setting, we derive faster learning rates of order $\mathcal{O}(1/n)$, where $n$ is the sample size. Provided that the optimal population risk is small, we further improve the learning rates to $\mathcal{O}(1/n^2)$, which, to the best of our knowledge, are the first $\mathcal{O}(1/n^2)$-type of rates for pairwise learning, no matter of convex or nonconvex learning. Overall, we systematically analyzed the generalization performance of nonconvex pairwise learning.

【6】 A Survey on Green Deep Learning 标题:绿色深度学习研究综述 链接:https://arxiv.org/abs/2111.05193

作者:Jingjing Xu,Wangchunshu Zhou,Zhiyi Fu,Hao Zhou,Lei Li 机构:ByteDance AI Lab, Peking University, University of California, Santa Barbara 摘要:近年来,更大、更深层次的模型如雨后春笋般涌现,并不断推动自然语言处理(NLP)和计算机视觉(CV)等各个领域的最新成果。然而,尽管结果很有希望,但需要注意的是,SOTA模型所需的计算量以指数速度增加。大规模计算不仅会产生惊人的巨大碳足迹,而且会对研究包容性和实际应用的部署产生负面影响。绿色深度学习是一个日益热门的研究领域,呼吁研究者在模型训练和推理过程中关注能源使用和碳排放。目标是利用轻量级和高效的技术产生新的结果。许多技术可以用来实现这一目标,比如模型压缩和知识提取。本文重点对绿色深度学习技术的发展进行了系统的回顾。我们将这些方法分为四类:(1)紧凑网络,(2)节能训练策略,(3)节能推理方法,和(4)高效数据使用。对于每个类别,我们讨论已经取得的进展和尚未解决的挑战。 摘要:In recent years, larger and deeper models are springing up and continuously pushing state-of-the-art (SOTA) results across various fields like natural language processing (NLP) and computer vision (CV). However, despite promising results, it needs to be noted that the computations required by SOTA models have been increased at an exponential rate. Massive computations not only have a surprisingly large carbon footprint but also have negative effects on research inclusiveness and deployment on real-world applications. Green deep learning is an increasingly hot research field that appeals to researchers to pay attention to energy usage and carbon emission during model training and inference. The target is to yield novel results with lightweight and efficient technologies. Many technologies can be used to achieve this goal, like model compression and knowledge distillation. This paper focuses on presenting a systematic review of the development of Green deep learning technologies. We classify these approaches into four categories: (1) compact networks, (2) energy-efficient training strategies, (3) energy-efficient inference approaches, and (4) efficient data usage. For each category, we discuss the progress that has been achieved and the unresolved challenges.

【7】 On Training Implicit Models 标题:浅谈隐式模型的训练 链接:https://arxiv.org/abs/2111.05177

作者:Zhengyang Geng,Xin-Yu Zhang,Shaojie Bai,Yisen Wang,Zhouchen Lin 机构:Zhejiang Lab, China, Key Lab. of Machine Perception, School of EECS, Peking University, Carnegie Mellon University, Pazhou Lab, China 备注:24 pages, 4 figures, in The 35th Conference on Neural Information Processing Systems (NeurIPS 2021) 摘要:本文主要研究无限层隐式模型的训练问题。具体地说,以前的工作采用隐式微分法并求解反向传播的精确梯度。然而,有必要为训练计算如此精确但昂贵的梯度吗?在这项工作中,我们提出了一种新的隐式模型梯度估计,称为幻影梯度,它1)放弃了精确梯度的昂贵计算;2)提供了一个经验上优于隐式模型训练的更新方向。我们从理论上分析了可以找到损失景观上升方向的条件,并提供了基于阻尼展开和Neumann级数的幻影梯度的两个具体实例。在大规模任务上的实验表明,这些轻量级幻影梯度显著地将隐式模型训练中的向后传递速度提高了约1.7倍,甚至比基于ImageNet上精确梯度的方法的性能更高。 摘要:This paper focuses on training implicit models of infinite layers. Specifically, previous works employ implicit differentiation and solve the exact gradient for the backward propagation. However, is it necessary to compute such an exact but expensive gradient for training? In this work, we propose a novel gradient estimate for implicit models, named phantom gradient, that 1) forgoes the costly computation of the exact gradient; and 2) provides an update direction empirically preferable to the implicit model training. We theoretically analyze the condition under which an ascent direction of the loss landscape could be found, and provide two specific instantiations of the phantom gradient based on the damped unrolling and Neumann series. Experiments on large-scale tasks demonstrate that these lightweight phantom gradients significantly accelerate the backward passes in training implicit models by roughly 1.7 times, and even boost the performance over approaches based on the exact gradient on ImageNet.
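下面用PyTorch给出"阻尼展开"幻影梯度的一个示意性草图(并非论文代码)：前向先在无梯度模式下迭代到近似不动点，再仅对K步阻尼更新反向传播，以此代替昂贵的精确隐式微分。其中f的参数化形式、阻尼系数λ与步数K均为示例假设。

```python
import torch

torch.manual_seed(0)
# 隐式模型 z* = f(z*, x) 的一个玩具参数化（谱半径缩小以保证收缩）
W = torch.nn.Parameter(0.4 * torch.randn(8, 8) / 8 ** 0.5)
U = torch.nn.Parameter(torch.randn(8, 8) / 8 ** 0.5)

def f(z, x):
    return torch.tanh(z @ W.T + x @ U.T)

x = torch.randn(4, 8)

# 前向：不构建计算图地迭代到（近似）不动点
with torch.no_grad():
    z = torch.zeros(4, 8)
    for _ in range(50):
        z = f(z, x)

# 幻影梯度：从不动点出发，仅对 K 步阻尼展开 z <- (1-λ)z + λ f(z, x) 反向传播
z = z.detach()
lam, K = 0.5, 5
for _ in range(K):
    z = (1 - lam) * z + lam * f(z, x)

loss = z.pow(2).mean()
loss.backward()  # W.grad / U.grad 即为幻影梯度估计，而非精确隐式梯度
print(W.grad.norm().item(), U.grad.norm().item())
```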

【8】 Classifying DNS Servers based on Response Message Matrix using Machine Learning 标题:基于响应消息矩阵的机器学习在DNS服务器分类中的应用 链接:https://arxiv.org/abs/2111.05034

作者:Keiichi Shima,Ryo Nakamura,Kazuya Okada,Tomohiro Ishihara,Daisuke Miyamoto,Yuji Sekiya 摘要:配置不当的域名系统(DNS)服务器有时被用作数据包反射器，作为DoS或DDoS攻击的一部分。通过监视DNS请求和响应流量，在逻辑上可以检测由于此活动而创建的数据包。任何没有相应请求的响应都可以被视为反射消息;然而，检查和跟踪每个DNS数据包并非一项简单的操作。在本文中，我们提出了一种检测被用作反射器的DNS服务器的机制，它使用由少量数据包构建的DNS服务器特征矩阵和机器学习算法。当测试和训练数据在同一天内生成时，不良DNS服务器检测的F1得分大于0.9；对于未用于同一天训练和测试阶段的数据，F1得分大于0.7。 摘要:Improperly configured domain name system (DNS) servers are sometimes used as packet reflectors as part of a DoS or DDoS attack. Detecting packets created as a result of this activity is logically possible by monitoring the DNS request and response traffic. Any response that does not have a corresponding request can be considered a reflected message; checking and tracking every DNS packet, however, is a non-trivial operation. In this paper, we propose a detection mechanism for DNS servers used as reflectors by using a DNS server feature matrix built from a small number of packets and a machine learning algorithm. The F1 score of bad DNS server detection was more than 0.9 when the test and training data are generated within the same day, and more than 0.7 for the data not used for the training and testing phase of the same day.
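论文特征矩阵的具体构造未在摘要中给出；下面仅示意"按服务器统计请求/响应特征并训练分类器"这一思路。特征定义、标签与随机森林分类器的选择均为本文之外的假设，数据为人工合成，仅用于说明流程。

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_servers = 1000

# 假设的每服务器特征：请求数、响应数、响应/请求比、平均响应包大小（合成数据）
requests = rng.poisson(50, n_servers).astype(float) + 1
is_reflector = rng.random(n_servers) < 0.2
# 被滥用为反射器的服务器：响应远多于对应请求，且响应包更大
responses = requests * np.where(
    is_reflector, rng.uniform(2, 10, n_servers), rng.uniform(0.8, 1.1, n_servers)
)
resp_size = np.where(
    is_reflector, rng.normal(1200, 200, n_servers), rng.normal(300, 100, n_servers)
)
X = np.column_stack([requests, responses, responses / requests, resp_size])
y = is_reflector.astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("F1:", f1_score(y_te, clf.predict(X_te)))
```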

【9】 Learning to Generalize Compositionally by Transferring Across Semantic Parsing Tasks 标题:通过跨语义分析任务迁移学习成分泛化 链接:https://arxiv.org/abs/2111.05013

作者:Wang Zhu,Peter Shaw,Tal Linzen,Fei Sha 机构:University of Southern California,Google Research,New York University 摘要:神经网络模型通常很难推广到不匹配的域或分布。在NLP中，当期望模型进行组合泛化(即泛化到熟悉的单词和结构的新组合)时，这一问题尤为突出。我们研究了促进从一个组合任务向另一个组合任务迁移学习的表示学习方法：模型的表示层和特定于任务的层在预微调任务中采用不同的策略进行训练，以便它们能够很好地泛化到需要组合性的不匹配拆分上。我们将此方法应用于语义解析，使用三个非常不同的数据集COGS、GeoQuery和SCAN，交替用作预微调和目标任务。我们的方法在目标任务测试集上显著提高了相对于基线的组合泛化能力，该测试集在微调期间被留出。消融研究描述了所提出算法中主要步骤的效用，并支持我们的假设。 摘要:Neural network models often generalize poorly to mismatched domains or distributions. In NLP, this issue arises in particular when models are expected to generalize compositionally, that is, to novel combinations of familiar words and constructions. We investigate learning representations that facilitate transfer learning from one compositional task to another: the representation and the task-specific layers of the models are strategically trained differently on a pre-finetuning task such that they generalize well on mismatched splits that require compositionality. We apply this method to semantic parsing, using three very different datasets, COGS, GeoQuery and SCAN, used alternately as the pre-finetuning and target task. Our method significantly improves compositional generalization over baselines on the test set of the target task, which is held out during fine-tuning. Ablation studies characterize the utility of the major steps in the proposed algorithm and support our hypothesis.

【10】 How to Train Your Neural Network: A Comparative Evaluation 标题:如何训练你的神经网络:一个比较评估 链接:https://arxiv.org/abs/2111.04949

作者:Shu-Huai Lin,Daniel Nichols,Siddharth Singh,Abhinav Bhatele 机构:Department of Computer Science, University of Maryland, College Park, MD 摘要:深度学习领域已经见证了一个显著的转变，即向计算和内存密集型神经网络的转变。这些更新、更大的模型使研究人员能够在各个领域推进最先进的成果。这一现象刺激了在大量硬件加速器上进行神经网络分布式训练的算法的发展。在本文中，我们讨论并比较了目前最先进的大规模分布式深度学习框架。首先，我们调查了分布式学习的当前实践，并确定了所使用的不同类型的并行性。然后，我们给出了实证结果，比较了它们在大型图像和语言训练任务中的表现。此外，我们还讨论了它们的统计效率和内存消耗行为。基于我们的结果，我们讨论了每个框架中阻碍性能的算法和实现部分。 摘要:The field of deep learning has witnessed a remarkable shift towards extremely compute- and memory-intensive neural networks. These newer larger models have enabled researchers to advance state-of-the-art tools across a variety of fields. This phenomenon has spurred the development of algorithms for distributed training of neural networks over a larger number of hardware accelerators. In this paper, we discuss and compare current state-of-the-art frameworks for large scale distributed deep learning. First, we survey current practices in distributed learning and identify the different types of parallelism used. Then, we present empirical results comparing their performance on large image and language training tasks. Additionally, we address their statistical efficiency and memory consumption behavior. Based on our results, we discuss algorithmic and implementation portions of each framework which hinder performance.

【11】 Self-Interpretable Model with TransformationEquivariant Interpretation 标题:具有变换等变解释的自解释模型 链接:https://arxiv.org/abs/2111.04927

作者:Yipei Wang,Xiaoqian Wang 机构:Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 备注:Accepted by NeurIPS 2021 摘要:在本文中，我们提出了一个具有变换等变解释的自解释模型SITE。我们关注几何变换下解释的鲁棒性和自一致性。除了变换等变性之外，作为一种自解释模型，SITE具有与基准黑盒分类器相当的表达能力，同时能够呈现忠实且稳健的高质量解释。值得注意的是，尽管在大多数CNN可视化方法中都有应用，但双线性上采样近似是一种粗略近似，它只能以热图的形式提供解释(而非像素级)。这些解释是否可以直接对应到输入空间(如MNIST实验所示)，这仍然是一个悬而未决的问题。此外，我们在模型中考虑了平移和旋转变换。在未来的工作中，我们将探索在更复杂的变换(如缩放和失真)下的稳健解释。此外，我们澄清SITE并不限于几何变换(我们在计算机视觉领域中使用的)，并将在未来的工作中探索SITE在其他领域中的应用。 摘要:In this paper, we propose a self-interpretable model SITE with transformation-equivariant interpretations. We focus on the robustness and self-consistency of the interpretations of geometric transformations. Apart from the transformation equivariance, as a self-interpretable model, SITE has comparable expressive power as the benchmark black-box classifiers, while being able to present faithful and robust interpretations with high quality. It is worth noticing that although applied in most of the CNN visualization methods, the bilinear upsampling approximation is a rough approximation, which can only provide interpretations in the form of heatmaps (instead of pixel-wise). It remains an open question whether such interpretations can be direct to the input space (as shown in the MNIST experiments). Besides, we consider the translation and rotation transformations in our model. In future work, we will explore the robust interpretations under more complex transformations such as scaling and distortion. Moreover, we clarify that SITE is not limited to geometric transformation (that we used in the computer vision domain), and will explore SITE in other domains in future work.

【12】 Practical, Provably-Correct Interactive Learning in the Realizable Setting: The Power of True Believers 标题:在可实现的环境中实践的、可证明正确的互动学习:真正信徒的力量 链接:https://arxiv.org/abs/2111.04915

作者:Julian Katz-Samuels,Blake Mason,Kevin Jamieson,Rob Nowak 机构:University of Wisconsin, Madison, Rice University, University of Washington, Robert Nowak 摘要:我们考虑可实现设置下的交互式学习，并开发了一个通用框架来处理从最佳臂识别到主动分类的各类问题。我们的研究始于以下观察：在可实现设置中，不可知算法不可能是极小极大最优的。因此，我们为可实现设置设计了计算上高效的新算法，它们在对数因子范围内匹配极小极大下界，并且是通用的，可容纳多种函数类，包括核方法、Hölder平滑函数和凸函数。我们算法的样本复杂度可以用众所周知的量(如扩展教学维和haystack维)来量化。但是，与直接基于这些组合量的算法不同，我们的算法在计算上是高效的。为了实现计算效率，我们的算法使用蒙特卡罗“hit-and-run”算法从版本空间采样，而不是显式维护版本空间。我们的方法有两个关键优势。首先，它很简单，由两个统一的贪婪算法组成。其次，我们的算法能够无缝地利用在实践中经常可用且有用的先验知识。除了新的理论结果外，我们还从经验上证明了我们的算法与高斯过程UCB方法具有竞争力。 摘要:We consider interactive learning in the realizable setting and develop a general framework to handle problems ranging from best arm identification to active classification. We begin our investigation with the observation that agnostic algorithms cannot be minimax-optimal in the realizable setting. Hence, we design novel computationally efficient algorithms for the realizable setting that match the minimax lower bound up to logarithmic factors and are general-purpose, accommodating a wide variety of function classes including kernel methods, Hölder smooth functions, and convex functions. The sample complexities of our algorithms can be quantified in terms of well-known quantities like the extended teaching dimension and haystack dimension. However, unlike algorithms based directly on those combinatorial quantities, our algorithms are computationally efficient. To achieve computational efficiency, our algorithms sample from the version space using Monte Carlo "hit-and-run" algorithms instead of maintaining the version space explicitly. Our approach has two key strengths. First, it is simple, consisting of two unifying, greedy algorithms. Second, our algorithms have the capability to seamlessly leverage prior knowledge that is often available and useful in practice. In addition to our new theoretical results, we demonstrate empirically that our algorithms are competitive with Gaussian process UCB methods.
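摘要提到用蒙特卡罗"hit-and-run"算法从版本空间采样。下面给出该采样器在一个凸多面体上的示意实现(经典算法的草图，与论文算法本身无关)：每步随机选一个方向，在与可行域相交的弦上均匀采样。此处用单位立方体代替真实版本空间，约束矩阵与起始内点均为假设示例。

```python
import numpy as np

rng = np.random.default_rng(0)
# 多面体 {x : Ax <= b}，此处为立方体 [-1, 1]^3
A = np.vstack([np.eye(3), -np.eye(3)])
b = np.ones(6)
x = np.zeros(3)  # 任一内点作为起点

samples = []
for _ in range(2000):
    d = rng.normal(size=3)
    d /= np.linalg.norm(d)          # 随机单位方向
    ad = A @ d
    slack = b - A @ x               # 内点处各约束的松弛量（非负）
    # 沿 x + t*d 保持可行的区间 [t_lo, t_hi]
    t_hi = np.min(slack[ad > 1e-12] / ad[ad > 1e-12])
    t_lo = np.max(slack[ad < -1e-12] / ad[ad < -1e-12])
    x = x + rng.uniform(t_lo, t_hi) * d   # 在弦上均匀采样
    samples.append(x)

print(np.mean(samples, axis=0))  # 混合后应接近立方体中心 0
```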

【13】 Machine Learning for Multimodal Electronic Health Records-based Research: Challenges and Perspectives 标题:机器学习在基于多模态电子健康记录的研究中的挑战和前景 链接:https://arxiv.org/abs/2111.04898

作者:Ziyi Liu,Jiaqi Zhang,Yongshuai Hou,Xinran Zhang,Ge Li,Yang Xiang 机构:Department of Network Intelligence, Peng Cheng Laboratory, Shenzhen, China, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA, Mathematical and Computational Sciences, University of Toronto Mississauga, Mississauga, Canada 备注:37 pages, 3 tables, 2 figures, to be submitted to JAMIA 摘要:背景:电子健康记录(EHR)包含患者健康史的丰富信息，通常包括结构化和非结构化数据。有许多研究侧重于从结构化数据中提取有价值的信息，如疾病代码、实验室测试结果和治疗。然而，仅仅依靠结构化数据可能不足以反映患者的全面信息，并且此类数据有时可能包含错误记录。目的:随着机器学习(ML)和深度学习(DL)技术的最新进展，越来越多的研究试图通过合并非结构化自由文本数据来获得更准确的结果。本文回顾了使用EHR中的多模态数据(即结构化和非结构化数据的组合)作为传统ML或DL模型输入以解决目标任务的研究。材料和方法:我们在电气和电子工程师协会(IEEE)数字图书馆、PubMed和计算机械协会(ACM)数字图书馆中检索与基于ML的多模态EHR研究相关的文章。结果和讨论:在最终纳入的94项研究中，我们重点关注如何使用传统的ML和DL技术组合和交互来自不同模态的数据，以及这些算法如何应用于EHR相关任务。此外，我们还研究了这些融合方法的优点和局限性，并指出了基于ML的多模态EHR研究的未来方向。 摘要:Background: Electronic Health Records (EHRs) contain rich information of patients' health history, which usually include both structured and unstructured data. There have been many studies focusing on distilling valuable information from structured data, such as disease codes, laboratory test results, and treatments. However, relying on structured data only might be insufficient in reflecting patients' comprehensive information and such data may occasionally contain erroneous records. Objective: With the recent advances of machine learning (ML) and deep learning (DL) techniques, an increasing number of studies seek to obtain more accurate results by incorporating unstructured free-text data as well. This paper reviews studies that use multimodal data, i.e. a combination of structured and unstructured data, from EHRs as input for conventional ML or DL models to address the targeted tasks. Materials and Methods: We searched in the Institute of Electrical and Electronics Engineers (IEEE) Digital Library, PubMed, and Association for Computing Machinery (ACM) Digital Library for articles related to ML-based multimodal EHR studies. Results and Discussion: With the final 94 included studies, we focus on how data from different modalities were combined and interacted using conventional ML and DL techniques, and how these algorithms were applied in EHR-related tasks. Further, we investigate the advantages and limitations of these fusion methods and indicate future directions for ML-based multimodal EHR research.

【14】 EvoLearner: Learning Description Logics with Evolutionary Algorithms 标题:EvoLearner:用进化算法学习描述逻辑 链接:https://arxiv.org/abs/2111.04879

作者:Stefan Heindorf,Lukas Blübaum,Nick Düsterhus,Till Werner,Varun Nandkumar Golani,Caglar Demir,Axel-Cyrille Ngonga Ngomo 备注:11 pages, 3 figures, 9 tables, 3 algorithms 摘要:在知识图中对节点进行分类是一项重要任务，例如，预测缺失的实体类型，预测哪些分子会导致癌症，或者预测哪些药物是有希望的候选治疗药物。虽然黑盒模型通常具有很高的预测性能，但它们只是事后的和局部可解释的，并且不允许使用领域知识轻松丰富所学模型。为此，有人提出从正例和负例中学习描述逻辑概念。然而，学习这些概念通常需要很长时间，而且最先进的方法对字面量数据值提供的支持有限，尽管它们对于许多应用程序来说至关重要。在本文中，我们提出了EvoLearner，一种学习ALCQ(D)的进化方法；ALCQ(D)是带补的属性语言(ALC)与限定基数限制(Q)和数据属性(D)的组合。我们为初始种群提供了一种新的初始化方法：从正示例(知识图中的节点)开始，执行有偏随机游走，并将其转换为描述逻辑概念。此外，在决定在何处拆分数据时，我们通过最大化信息增益来改进对数据属性的支持。我们表明，我们的方法在结构化机器学习基准框架SML-Bench上显著优于最新水平。我们的消融研究证实，这得益于我们新颖的初始化方法和对数据属性的支持。 摘要:Classifying nodes in knowledge graphs is an important task, e.g., predicting missing types of entities, predicting which molecules cause cancer, or predicting which drugs are promising treatment candidates. While black-box models often achieve high predictive performance, they are only post-hoc and locally explainable and do not allow the learned model to be easily enriched with domain knowledge. Towards this end, learning description logic concepts from positive and negative examples has been proposed. However, learning such concepts often takes a long time and state-of-the-art approaches provide limited support for literal data values, although they are crucial for many applications. In this paper, we propose EvoLearner - an evolutionary approach to learn ALCQ(D), which is the attributive language with complement (ALC) paired with qualified cardinality restrictions (Q) and data properties (D). We contribute a novel initialization method for the initial population: starting from positive examples (nodes in the knowledge graph), we perform biased random walks and translate them to description logic concepts. Moreover, we improve support for data properties by maximizing information gain when deciding where to split the data. We show that our approach significantly outperforms the state of the art on the benchmarking framework SML-Bench for structured machine learning. Our ablation study confirms that this is due to our novel initialization method and support for data properties.

【15】 Synthesizing Collective Communication Algorithms for Heterogeneous Networks with TACCL 标题:用TACCL综合异构网络的集体通信算法 链接:https://arxiv.org/abs/2111.04867

作者:Aashaka Shah,Vijay Chidambaram,Meghan Cowan,Saeed Maleki,Madan Musuvathi,Todd Mytkowicz,Jacob Nelson,Olli Saarikivi,Rachee Singh 机构:Department of Computer Science, University of Texas at Austin 备注:16 pages, 9 figures, includes Appendix 摘要:大型ML模型和数据集要求使用多GPU系统进行分布式模型训练。为了充分发挥多GPU系统的能力，消除GPU间通信中的瓶颈是至关重要的，而互连的异构性使这一问题颇具挑战性。在这项工作中，我们介绍了TACCL，一种用于大规模多GPU系统的集合通信原语合成器。TACCL将分析得到的拓扑和输入大小编码为一个综合问题，以生成优化的通信算法。TACCL构建在标准NVIDIA集合通信库(NCCL)的基础上，允许它在PyTorch等框架中作为GPU通信的替代品，只需很少的更改。TACCL为Allgather、Alltoall和Allreduce等通信原语生成的算法比NCCL最多快$3\times$。使用TACCL的算法可以将内部混合专家模型的端到端训练速度提高$17\%$。通过将优化问题分解为多个部分并利用多GPU拓扑中的对称性，TACCL在不到3分钟的时间内合成最多80 GPU的集合通信，比其他基于合成的最先进的集合通信库至少快两个数量级。 摘要:Large ML models and datasets have necessitated the use of multi-GPU systems for distributed model training. To harness the power offered by multi-GPU systems, it is critical to eliminate bottlenecks in inter-GPU communication - a problem made challenging by the heterogeneous nature of interconnects. In this work, we present TACCL, a synthesizer for collective communication primitives for large-scale multi-GPU systems. TACCL encodes a profiled topology and input size into a synthesis problem to generate optimized communication algorithms. TACCL is built on top of the standard NVIDIA Collective Communication Library (NCCL), allowing it to be a drop-in replacement for GPU communication in frameworks like PyTorch with minimal changes. TACCL generates algorithms for communication primitives like Allgather, Alltoall, and Allreduce that are up to $3\times$ faster than NCCL. Using TACCL's algorithms speeds up the end-to-end training of an internal mixture of experts model by $17\%$. By decomposing the optimization problem into parts and leveraging the symmetry in multi-GPU topologies, TACCL synthesizes collectives for up to 80-GPUs in less than 3 minutes, at least two orders of magnitude faster than other synthesis-based state-of-the-art collective communication libraries.
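作为背景，下面给出TACCL所优化的一类集合通信原语(Allgather)的经典环形算法的单机模拟草图：每个rank每步把手中的一个数据块传给右邻居，n-1步后所有rank收齐全部块。TACCL会针对具体拓扑合成更优的算法，此处仅为通用算法示意，与TACCL本身无关。

```python
def ring_allgather(chunks):
    """chunks[i] 为第 i 个 GPU 的本地数据块；返回每个 GPU 收齐后的缓冲区。"""
    n = len(chunks)
    buffers = [[None] * n for _ in range(n)]
    for rank in range(n):
        buffers[rank][rank] = chunks[rank]
    # 共 n-1 步；第 step 步中，rank 把块 (rank - step) % n 发给右邻居 (rank + 1) % n
    for step in range(n - 1):
        for rank in range(n):
            idx = (rank - step) % n
            buffers[(rank + 1) % n][idx] = buffers[rank][idx]
    return buffers


result = ring_allgather(["a", "b", "c", "d"])
print(result[0])  # ['a', 'b', 'c', 'd']：rank 0 已收齐所有块
```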

【16】 Model-assisted deep learning of rare extreme events from partial observations 标题:基于部分观测的罕见极端事件的模型辅助深度学习 链接:https://arxiv.org/abs/2111.04857

作者:Anna Asch,Ethan Brady,Hugo Gallardo,John Hood,Bryan Chu,Mohammad Farazmand 机构:Department of Mathematics, Cornell University, Malott Hall, Ithaca, NY , Department of Mathematics, Purdue University, N. University Street, West Lafayette, IN 摘要:为了使用深度神经网络预测罕见的极端事件，人们会遇到所谓的小数据问题，因为即使是长期观测也往往包含很少的极端事件。在这里，我们研究了一个模型辅助框架，其中的训练数据是从数值模拟中获得的，而不是从观测中获得的，从而可以得到极端事件的充足样本。然而，为了确保训练后的网络在实践中适用，训练不是在完整的模拟数据上进行的；相反，我们只使用可观测量的一小部分，这些量在实践中可以测量。我们在三个不同的动力系统(Rossler吸引子、FitzHugh-Nagumo模型和湍流)和三个不同的深层神经网络结构(前馈、长短期记忆和储层计算)上研究了该模型辅助框架的可行性。在每种情况下，我们研究预测精度、对噪声的鲁棒性、重复训练下的再现性以及对输入数据类型的敏感性。特别是，我们发现长短期记忆网络对噪声最具鲁棒性，并产生相对准确的预测，同时需要对超参数进行最小的微调。 摘要:To predict rare extreme events using deep neural networks, one encounters the so-called small data problem because even long-term observations often contain few extreme events. Here, we investigate a model-assisted framework where the training data is obtained from numerical simulations, as opposed to observations, with adequate samples from extreme events. However, to ensure the trained networks are applicable in practice, the training is not performed on the full simulation data; instead we only use a small subset of observable quantities which can be measured in practice. We investigate the feasibility of this model-assisted framework on three different dynamical systems (Rossler attractor, FitzHugh-Nagumo model, and a turbulent fluid flow) and three different deep neural network architectures (feedforward, long short-term memory, and reservoir computing). In each case, we study the prediction accuracy, robustness to noise, reproducibility under repeated training, and sensitivity to the type of input data. In particular, we find long short-term memory networks to be most robust to noise and to yield relatively accurate predictions, while requiring minimal fine-tuning of the hyperparameters.

【17】 Realizable Learning is All You Need 标题:可实现的学习是您所需要的全部 链接:https://arxiv.org/abs/2111.04746

作者:Max Hopkins,Daniel Kane,Shachar Lovett,Gaurav Mahajan 摘要:可实现可学习性与不可知可学习性的等价是学习理论中的一个基本现象，其变体涵盖从经典设置(如PAC学习和回归)到最近的趋势(如对抗鲁棒学习和隐私学习)。令人惊讶的是，我们仍然缺乏统一的理论；传统的等价性证明往往各不相同，并依赖于强的模型特定假设，如一致收敛和样本压缩。在这项工作中，我们给出了第一个独立于模型的框架，解释了可实现性和不可知可学习性的等价性：一个三行黑盒约简，它简化、统一并扩展了我们对各种设置的理解。这包括不具有已知可学习性特征的模型，如具有任意分布假设或一般损失的学习，以及大量其他流行设置，如稳健学习、部分学习、公平学习和统计查询模型。更一般地说，我们认为可实现学习和不可知学习的等价实际上是我们称为“属性泛化”的更广泛现象的特例：学习算法的任何理想属性(例如噪声容忍度、隐私性、稳定性)，只要能在有限假设类上得到满足，就可以(可能以某种变体形式)推广到任何可学习的假设类。 摘要:The equivalence of realizable and agnostic learnability is a fundamental phenomenon in learning theory. With variants ranging from classical settings like PAC learning and regression to recent trends such as adversarially robust and private learning, it's surprising that we still lack a unified theory; traditional proofs of the equivalence tend to be disparate, and rely on strong model-specific assumptions like uniform convergence and sample compression. In this work, we give the first model-independent framework explaining the equivalence of realizable and agnostic learnability: a three-line blackbox reduction that simplifies, unifies, and extends our understanding across a wide variety of settings. This includes models with no known characterization of learnability such as learning with arbitrary distributional assumptions or general loss, as well as a host of other popular settings such as robust learning, partial learning, fair learning, and the statistical query model. More generally, we argue that the equivalence of realizable and agnostic learning is actually a special case of a broader phenomenon we call property generalization: any desirable property of a learning algorithm (e.g. noise tolerance, privacy, stability) that can be satisfied over finite hypothesis classes extends (possibly in some variation) to any learnable hypothesis class.

【18】 Use of 1D-CNN for input data size reduction of LSTM in Hourly Rainfall-Runoff modeling 标题:1D-CNN用于逐时降雨径流模拟中LSTM输入数据量缩减 链接:https://arxiv.org/abs/2111.04732

作者:Kei Ishida,Ali Ercan,Takeyoshi Nagasato,Masato Kiyama,Motoki Amagasaki 机构:International Research Organization for Advanced Science and Technology, Kumamoto University, Kurokami, Center for Water Cycle, Marine Environment, and Disaster Management, Kumamoto University, Kurokami 备注:18 pages, 9 figures 摘要:本研究提出了一种由一维卷积神经网络(1D-CNN)和长短时记忆(LSTM)网络串联耦合而成的小时尺度降雨径流模拟体系结构，简称CNNsLSTM。在CNNsLSTM中，CNN组件接收长时段的逐时气象时间序列数据，然后LSTM组件接收从1D-CNN提取的特征以及短时段的逐时气象时间序列数据。作为一个案例研究，CNNsLSTM被用于日本石狩川流域的逐时降雨径流模拟。气象数据集包括降水量、气温、蒸散量、长波和短波辐射，作为输入，河流流量作为目标数据。为了评估所提出的CNNsLSTM的性能，将CNNsLSTM的结果与1D-CNN、仅使用逐时输入的LSTM(LSTMwHour)、1D-CNN和LSTM的并行架构(CNNpLSTM)以及同时使用逐日和逐时输入数据的LSTM架构(LSTMwDpH)进行比较。与三种传统架构(1D-CNN、LSTMwHour和CNNpLSTM)以及最近提出的LSTMwDpH相比，CNNsLSTM在估计精度方面有明显的改进。与观测流量相比，测试期NSE值的中值为：1D-CNN为0.455-0.469(基于NCHF=8、16和32，NCHF为CNN第一层特征图的通道数)，CNNpLSTM为0.639-0.656(基于NCHF=8、16和32)，LSTMwHour为0.745，LSTMwDpH为0.831，CNNsLSTM为0.865-0.873(基于NCHF=8、16和32)。此外，所提出的CNNsLSTM将1D-CNN的RMSE中值降低了50.2%-51.4%，CNNpLSTM降低了37.4%-40.8%，LSTMwHour降低了27.3%-29.5%，LSTMwDpH降低了10.6%-13.4%。 摘要:An architecture consisting of a serial coupling of the one-dimensional convolutional neural network (1D-CNN) and the long short-term memory (LSTM) network, which is referred as CNNsLSTM, was proposed for hourly-scale rainfall-runoff modeling in this study. In CNNsLSTM, the CNN component receives the hourly meteorological time series data for a long duration, and then the LSTM component receives the extracted features from 1D-CNN and the hourly meteorological time series data for a short-duration. As a case study, CNNsLSTM was implemented for hourly rainfall-runoff modeling at the Ishikari River watershed, Japan. The meteorological dataset, consists of precipitation, air temperature, evapotranspiration, and long- and short-wave radiation, were utilized as input, and the river flow was used as the target data. To evaluate the performance of proposed CNNsLSTM, results of CNNsLSTM were compared with those of 1D-CNN, LSTM only with hourly inputs (LSTMwHour), parallel architecture of 1D-CNN and LSTM (CNNpLSTM), and the LSTM architecture which uses both daily and hourly input data (LSTMwDpH). CNNsLSTM showed clear improvements on the estimation accuracy compared to the three conventional architectures (1D-CNN, LSTMwHour, and CNNpLSTM), and recently proposed LSTMwDpH. In comparison to observed flows, the median of the NSE values for the test period are 0.455-0.469 for 1D-CNN (based on NCHF=8, 16, and 32, the numbers of the channels of the feature map of the first layer of CNN), 0.639-0.656 for CNNpLSTM (based on NCHF=8, 16, and 32), 0.745 for LSTMwHour, 0.831 for LSTMwDpH, and 0.865-0.873 for CNNsLSTM (based on NCHF=8, 16, and 32). Furthermore, the proposed CNNsLSTM reduces the median RMSE of 1D-CNN by 50.2%-51.4%, CNNpLSTM by 37.4%-40.8%, LSTMwHour by 27.3%-29.5%, and LSTMwDpH by 10.6%-13.4%.
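下面用PyTorch给出CNNsLSTM串联结构的一个示意性草图：1D-CNN处理长时段逐时序列，其提取的特征与短时段逐时输入拼接后送入LSTM。通道数、层数、序列长度等超参数均为假设值，并非论文原配置。

```python
import torch
import torch.nn as nn

class CNNsLSTM(nn.Module):
    """示意草图：1D-CNN(长时段) 串联 LSTM(短时段 + CNN 特征)。"""

    def __init__(self, n_met=5, n_chf=16, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_met, n_chf, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(n_chf, n_chf, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),       # 将长序列压缩为固定长度特征
        )
        self.lstm = nn.LSTM(n_met + n_chf, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x_long, x_short):
        # x_long: (B, n_met, T_long) 长时段逐时序列
        # x_short: (B, T_short, n_met) 短时段逐时序列
        feat = self.cnn(x_long).squeeze(-1)                     # (B, n_chf)
        feat = feat.unsqueeze(1).expand(-1, x_short.size(1), -1)
        out, _ = self.lstm(torch.cat([x_short, feat], dim=-1))  # 特征拼到每个时间步
        return self.head(out[:, -1])                            # 预测河川流量

model = CNNsLSTM()
q = model(torch.randn(2, 5, 720), torch.randn(2, 24, 5))
print(q.shape)  # torch.Size([2, 1])
```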

【19】 Survey of Deep Learning Methods for Inverse Problems 标题:反问题的深度学习方法综述 链接:https://arxiv.org/abs/2111.04731

作者:Shima Kamyab,Zohreh Azimifar,Rasool Sabzi,Paul Fieguth 机构:Dept. of Comp. Sci. and Eng., Shiraz University, Shiraz, Iran, Dept. of Systems Design Engineering, University of Waterloo, Waterloo, Canada 摘要:在本文中，我们研究了解决反问题的各种深度学习策略。我们将现有的反问题深度学习解决方案分为三类：直接映射、数据一致性优化器和深度正则化器。我们为每种反问题类型选择了一个样本，以比较三类方法的鲁棒性，并报告其差异的统计分析。我们对线性回归这一经典问题以及计算机视觉中三个著名的反问题(即图像去噪、3D人脸逆绘制和目标跟踪，它们被选为每类反问题的代表性原型)进行了广泛的实验。总体结果和统计分析表明，解决方案类别的鲁棒性行为取决于反问题域的类型，具体取决于问题是否包含测量异常值。根据实验结果，我们最后为每类反问题提出了最稳健的解决方案类别。 摘要:In this paper we investigate a variety of deep learning strategies for solving inverse problems. We classify existing deep learning solutions for inverse problems into three categories of Direct Mapping, Data Consistency Optimizer, and Deep Regularizer. We choose a sample of each inverse problem type, so as to compare the robustness of the three categories, and report a statistical analysis of their differences. We perform extensive experiments on the classic problem of linear regression and three well-known inverse problems in computer vision, namely image denoising, 3D human face inverse rendering, and object tracking, selected as representative prototypes for each class of inverse problems. The overall results and the statistical analyses show that the solution categories have a robustness behaviour dependent on the type of inverse problem domain, and specifically dependent on whether or not the problem includes measurement outliers. Based on our experimental results, we conclude by proposing the most robust solution category for each inverse problem class.

【20】 Generalization in quantum machine learning from few training data 标题:基于少量训练数据的量子机器学习泛化 链接:https://arxiv.org/abs/2111.05292

作者:Matthias C. Caro,Hsin-Yuan Huang,M. Cerezo,Kunal Sharma,Andrew Sornborger,Lukasz Cincio,Patrick J. Coles 机构:Department of Mathematics, Technical University of Munich, Garching, Germany, Munich Center for Quantum Science and Technology (MCQST), Munich, Germany, Institute for Quantum Information and Matter, Caltech, Pasadena, CA, USA 备注:14+25 pages, 4+1 figures 摘要:现代量子机器学习(QML)方法涉及对训练数据集上的参数化量子电路进行变量优化，然后对测试数据集进行预测(即，泛化)。在这项工作中，我们在有限数量的$N$训练数据点上进行训练后，对QML中的泛化性能进行了全面的研究。我们证明了一个具有$T$可训练门的量子机器学习模型的泛化误差在最坏情况下为$\sqrt{T/N}$。当优化过程中只有$K\ll T$门发生了实质性变化时，我们证明了泛化误差提高到$\sqrt{K/N}$。我们的结果表明，可以显著加快将单位编译成多项式数量的本机门的速度，这对于通常使用指数大小训练数据的量子计算行业来说是一个至关重要的应用。我们还表明，用量子卷积神经网络对跨越相变的量子态进行分类只需要非常小的训练数据集。其他潜在的应用包括学习量子纠错码或量子动力学模拟。我们的工作为QML领域注入了新的希望，因为很少的训练数据保证了良好的泛化。 摘要:Modern quantum machine learning (QML) methods involve variationally optimizing a parameterized quantum circuit on a training data set, and subsequently making predictions on a testing data set (i.e., generalizing). In this work, we provide a comprehensive study of generalization performance in QML after training on a limited number $N$ of training data points. We show that the generalization error of a quantum machine learning model with $T$ trainable gates scales at worst as $\sqrt{T/N}$. When only $K \ll T$ gates have undergone substantial change in the optimization process, we prove that the generalization error improves to $\sqrt{K / N}$. Our results imply that the compiling of unitaries into a polynomial number of native gates, a crucial application for the quantum computing industry that typically uses exponential-size training data, can be sped up significantly. We also show that classification of quantum states across a phase transition with a quantum convolutional neural network requires only a very small training data set. Other potential applications include learning quantum error correcting codes or quantum dynamical simulation. Our work injects new hope into the field of QML, as good generalization is guaranteed from few training data.

【21】 A Deep Learning Technique using Low Sampling rate for residential Non Intrusive Load Monitoring 标题:一种用于住宅非侵入式负荷监测的低采样率深度学习技术 链接:https://arxiv.org/abs/2111.05120

作者:Ronak Aghera,Sahil Chilana,Vishal Garg,Raghunath Reddy 机构:International Institute of Information Technology, Hyderabad 摘要:单个设备负载和能耗反馈是促使用户在住宅中节能的重要途径之一。这有助于识别故障设备和设备闲置时浪费的能源。主要的挑战是在每个设备上没有侵入式传感器的情况下识别和估计单个设备的能耗。非侵入式负荷监测(NILM)或能量分解是一个盲源分离问题，需要一个系统根据家庭总能耗估算单个电器的用电量。在本文中，我们提出了一种新的基于深度神经网络的方法，用于对从居民家庭获得的低频电力数据进行负荷分解。我们将一系列一维卷积神经网络和长短期记忆(1D CNN-LSTM)相结合，以提取能够识别活动电器的特征，并在给定家庭总功率值的情况下检索其功耗。我们使用CNN从给定时间范围内的总表读数中提取特征，然后使用这些特征对给定设备在该时间段是否处于活动状态进行分类。随后，利用提取的特征用LSTM对生成问题建模。我们训练LSTM生成特定设备的分解能耗。我们的神经网络能够生成需求侧的详细反馈，为最终用户提供有关其电力消耗的重要见解。该算法设计用于低功耗离线设备，如ESP32。经验计算表明，我们的模型在参考能量分解数据集(REDD)上的性能优于最新的模型。 摘要:Individual device loads and energy consumption feedback is one of the important approaches for pursuing users to save energy in residences. This can help in identifying faulty devices and wasted energy by devices when left On unused. The main challenge is to identity and estimate the energy consumption of individual devices without intrusive sensors on each device. Non-intrusive load monitoring (NILM) or energy disaggregation, is a blind source separation problem which requires a system to estimate the electricity usage of individual appliances from the aggregated household energy consumption. In this paper, we propose a novel deep neural network-based approach for performing load disaggregation on low frequency power data obtained from residential households. We combine a series of one-dimensional Convolutional Neural Networks and Long Short Term Memory (1D CNN-LSTM) to extract features that can identify active appliances and retrieve their power consumption given the aggregated household power value. We used CNNs to extract features from main readings in a given time frame and then used those features to classify if a given appliance is active at that time period or not. Following that, the extracted features are used to model a generation problem using LSTM. We train the LSTM to generate the disaggregated energy consumption of a particular appliance. Our neural network is capable of generating detailed feedback of demand-side, providing vital insights to the end-user about their electricity consumption. The algorithm was designed for low power offline devices such as ESP32. Empirical calculations show that our model outperforms the state-of-the-art on the Reference Energy Disaggregation Dataset (REDD).

【22】 Solving PDE-constrained Control Problems using Operator Learning 标题:利用算子学习求解偏微分方程约束控制问题 链接:https://arxiv.org/abs/2111.04941

作者:Rakhoon Hwang,Jae Yong Lee,Jin Young Shin,Hyung Ju Hwang 机构:Department of Mathematics, Pohang University of Science and Technology, Cheongam-ro , Pohang , Republic of Korea 备注:15 pages, 12 figures. This paper is under review of AAAI 2022 摘要:复杂物理动力学的建模和控制在现实问题中至关重要。通过引入具有特殊正则化子的偏微分方程解算子的代理模型,我们提出了一个新的框架,该框架通常适用于求解偏微分方程约束的最优控制问题。该框架的过程分为两个阶段:PDE约束的解算子学习(第一阶段)和最优控制搜索(第二阶段)。一旦代理模型在第一阶段得到训练,就可以在第二阶段推断出最优控制,而无需进行大量计算。我们的框架可以应用于数据驱动和无数据情况。我们证明了我们的方法成功地应用于从泊松方程到Burgers方程的具有不同PDE约束的不同控制变量的各种最优控制问题。 摘要:The modeling and control of complex physical dynamics are essential in real-world problems. We propose a novel framework that is generally applicable to solving PDE-constrained optimal control problems by introducing surrogate models for PDE solution operators with special regularizers. The procedure of the proposed framework is divided into two phases: solution operator learning for PDE constraints (Phase 1) and searching for optimal control (Phase 2). Once the surrogate model is trained in Phase 1, the optimal control can be inferred in Phase 2 without intensive computations. Our framework can be applied to both data-driven and data-free cases. We demonstrate the successful application of our method to various optimal control problems for different control variables with diverse PDE constraints from the Poisson equation to Burgers' equation.
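下面用一个玩具一维问题示意论文的两阶段流程(并非论文原实现)：阶段1用模拟数据训练代理解算子模型，阶段2冻结代理、仅对控制变量做梯度下降以逼近目标解。其中"PDE解算子"用随机线性算子代替，网络结构与迭代步数均为假设。

```python
import torch

torch.manual_seed(0)
n = 32
true_op = torch.randn(n, n) / n ** 0.5   # 假设的线性"PDE 解算子"：u -> s = Au

# 阶段 1：用模拟数据训练代理模型 G: u -> s
surrogate = torch.nn.Sequential(
    torch.nn.Linear(n, 64), torch.nn.Tanh(), torch.nn.Linear(64, n)
)
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
for _ in range(2000):
    u = torch.randn(128, n)
    loss = ((surrogate(u) - u @ true_op.T) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# 阶段 2：冻结代理模型，只对控制变量 u_ctrl 做梯度下降，无需再做昂贵计算
for p in surrogate.parameters():
    p.requires_grad_(False)
s_target = torch.randn(n)                        # 期望达到的解
u_ctrl = torch.zeros(n, requires_grad=True)
opt_u = torch.optim.Adam([u_ctrl], lr=1e-2)
for _ in range(500):
    ctrl_loss = ((surrogate(u_ctrl) - s_target) ** 2).mean()
    opt_u.zero_grad(); ctrl_loss.backward(); opt_u.step()
print("控制目标残差:", ctrl_loss.item())
```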

【23】 Combining Machine Learning with Physics: A Framework for Tracking and Sorting Multiple Dark Solitons 标题:机器学习与物理相结合:跟踪和分类多个暗孤子的框架 链接:https://arxiv.org/abs/2111.04881

作者:Shangjie Guo,Sophia M. Koh,Amilson R. Fritsch,I. B. Spielman,Justyna P. Zwolak 机构:Joint Quantum Institute, National Institute of Standards and Technology, and University of Maryland, Gaithersburg, Maryland , USA, Department of Physics and Astronomy, Amherst College, Amherst, Massachusetts , USA 备注:12 pages, 9 figures 摘要:在超冷原子实验中,数据通常以图像的形式出现,这会使制备和测量系统的技术中固有的信息丢失。当感兴趣的过程很复杂时,例如玻色-爱因斯坦凝聚体(BEC)中激发之间的相互作用,这一问题尤其严重。在本文中,我们描述了一个结合机器学习(ML)模型和基于物理的传统分析的框架,用于识别和跟踪BEC图像中的多个孤子激发。我们使用一个基于ML的目标检测器来定位孤子激发,并开发一个物理信息分类器来将孤子激发分类为物理激励的子类别。最后,我们引入了一个质量度量来量化特定特征是扭结孤子的可能性。我们经过训练的这个框架的实现——SolDet——作为一个开源python包公开提供。当在合适的用户提供的数据集上进行训练时,SolDet广泛适用于冷原子图像中的特征识别。 摘要:In ultracold atom experiments, data often comes in the form of images which suffer information loss inherent in the techniques used to prepare and measure the system. This is particularly problematic when the processes of interest are complicated, such as interactions among excitations in Bose-Einstein condensates (BECs). In this paper, we describe a framework combining machine learning (ML) models with physics-based traditional analyses to identify and track multiple solitonic excitations in images of BECs. We use an ML-based object detector to locate the solitonic excitations and develop a physics-informed classifier to sort solitonic excitations into physically motivated sub-categories. Lastly, we introduce a quality metric quantifying the likelihood that a specific feature is a kink soliton. Our trained implementation of this framework -- SolDet -- is publicly available as an open-source python package. SolDet is broadly applicable to feature identification in cold atom images when trained on a suitable user-provided dataset.

其他(16篇)

【1】 Data Augmentation Can Improve Robustness 标题:数据增强可以提高健壮性 链接:https://arxiv.org/abs/2111.05328

作者:Sylvestre-Alvise Rebuffi,Sven Gowal,Dan A. Calian,Florian Stimberg,Olivia Wiles,Timothy Mann 机构:DeepMind, London 备注:Accepted at NeurIPS 2021. arXiv admin note: substantial text overlap with arXiv:2103.01946; text overlap with arXiv:2110.09468 摘要:对抗性训练存在稳健过度拟合现象，即稳健测试精度在训练过程中开始下降。在本文中，我们重点讨论了使用常用的数据增强方案来减少鲁棒过拟合。我们证明，与之前的研究结果相反，当与模型权重平均相结合时，数据增强可以显著提高鲁棒精度。此外，我们比较了各种增强技术，发现空间合成技术最适合对抗训练。最后，我们分别针对大小为$\epsilon=8/255$和$\epsilon=128/255$的$\ell_\infty$和$\ell_2$范数有界扰动对CIFAR-10评估了我们的方法。与以前最先进的方法相比，我们在鲁棒精度方面显示了+2.93%和+2.16%的巨大绝对改进。特别是，对于大小为$\epsilon=8/255$的$\ell_\infty$范数有界扰动，我们的模型在不使用任何外部数据的情况下达到60.07%的鲁棒精度。在使用其他体系结构和数据集(如CIFAR-100、SVHN和TinyImageNet)时，我们也通过这种方法实现了显著的性能提升。 摘要:Adversarial training suffers from robust overfitting, a phenomenon where the robust test accuracy starts to decrease during training. In this paper, we focus on reducing robust overfitting by using common data augmentation schemes. We demonstrate that, contrary to previous findings, when combined with model weight averaging, data augmentation can significantly boost robust accuracy. Furthermore, we compare various augmentations techniques and observe that spatial composition techniques work the best for adversarial training. Finally, we evaluate our approach on CIFAR-10 against $\ell_\infty$ and $\ell_2$ norm-bounded perturbations of size $\epsilon = 8/255$ and $\epsilon = 128/255$, respectively. We show large absolute improvements of +2.93% and +2.16% in robust accuracy compared to previous state-of-the-art methods. In particular, against $\ell_\infty$ norm-bounded perturbations of size $\epsilon = 8/255$, our model reaches 60.07% robust accuracy without using any external data. We also achieve a significant performance boost with this approach while using other architectures and datasets such as CIFAR-100, SVHN and TinyImageNet.
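下面给出"数据增强+模型权重平均(EMA)"组合的一个最小示意(并非论文的对抗训练设置)：训练时对增强后的数据更新在线模型，同时维护其参数的指数滑动平均，评估时使用平均后的模型。数据为随机张量，增强用简单随机平移代替空间合成类增强，衰减率等均为假设值。

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
ema_model = copy.deepcopy(model)   # 权重平均模型（EMA）
opt = torch.optim.SGD(model.parameters(), lr=0.1)
decay = 0.995

def augment(x):
    # 用随机平移粗略代替空间合成类数据增强
    return torch.roll(x, shifts=int(torch.randint(-4, 5, (1,))), dims=-1)

for step in range(100):
    x = torch.randn(64, 3, 32, 32)
    y = torch.randint(0, 10, (64,))
    loss = nn.functional.cross_entropy(model(augment(x)), y)
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():          # 每步更新 EMA 权重
        for p_ema, p in zip(ema_model.parameters(), model.parameters()):
            p_ema.mul_(decay).add_(p, alpha=1 - decay)

print(loss.item())  # 评估与鲁棒性测试应在 ema_model 上进行
```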

【2】 Can Information Flows Suggest Targets for Interventions in Neural Circuits? 标题:信息流能建议神经回路干预的目标吗? 链接:https://arxiv.org/abs/2111.05299

作者:Praveen Venkatesh,Sanghamitra Dutta,Neil Mehta,Pulkit Grover 机构:Allen Institute,University of Washington, Seattle; ,JP Morgan Chase AI Research;, –,Department of Electrical and Computer Engineering,Neuroscience Institute, Carnegie Mellon University 备注:Accepted to the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021). (29 pages; 61 figures) 摘要:在神经科学和临床应用的推动下,我们经验性地检验了信息流的观察指标是否可以建议干预措施。我们通过在机器学习公平性的背景下对人工神经网络进行实验来实现这一点,其目标是通过干预在系统中诱导公平性。使用我们最近开发的$M$-信息流框架,我们测量了关于真实标签的信息流(对准确性负责,因此是可取的),以及关于受保护属性的信息流(对偏差负责,因此是不可取的),在经过训练的神经网络的边上。然后,我们将流量大小与通过修剪对这些边缘进行干预的效果进行比较。我们表明,修剪携带有关受保护属性的更大信息流的边在更大程度上减少了输出的偏差。这表明$M$-信息流可以有意义地建议干预目标,以肯定的方式回答标题的问题。我们还评估了不同干预策略的偏差-准确度权衡,以分析人们如何使用对理想和不理想信息流(此处为准确度和偏差流)的估计来告知干预措施,从而保留前者,同时减少后者。 摘要:Motivated by neuroscientific and clinical applications, we empirically examine whether observational measures of information flow can suggest interventions. We do so by performing experiments on artificial neural networks in the context of fairness in machine learning, where the goal is to induce fairness in the system through interventions. Using our recently developed $M$-information flow framework, we measure the flow of information about the true label (responsible for accuracy, and hence desirable), and separately, the flow of information about a protected attribute (responsible for bias, and hence undesirable) on the edges of a trained neural network. We then compare the flow magnitudes against the effect of intervening on those edges by pruning. We show that pruning edges that carry larger information flows about the protected attribute reduces bias at the output to a greater extent. This demonstrates that $M$-information flow can meaningfully suggest targets for interventions, answering the title's question in the affirmative. We also evaluate bias-accuracy tradeoffs for different intervention strategies, to analyze how one might use estimates of desirable and undesirable information flows (here, accuracy and bias flows) to inform interventions that preserve the former while reducing the latter.

【3】 Towards a Unified Information-Theoretic Framework for Generalization 标题:走向统一的信息论综合框架 链接:https://arxiv.org/abs/2111.05275

作者:Mahdi Haghifam,Gintare Karolina Dziugaite,Shay Moran,Daniel M. Roy 机构:University of Toronto, Vector Institute, Google Research, Mila, Technion 备注:22 Pages, NeurIPS 2021, This submission subsumes [arXiv:2011.02970] ("On the Information Complexity of Proper Learners for VC Classes in the Realizable Case") 摘要:在这项工作中，我们研究了Steinke和Zakynthinou(2020)的“条件互信息”(CMI)框架的表达能力，以及使用它来提供一个统一的框架来证明可实现环境中的泛化边界的前景。我们首先证明，对于从一类有界VC维输出假设的任何学习算法，可以使用该框架来表示非平凡(但次优)边界。我们证明了CMI框架在学习半空间的支持向量机(SVM)的预期风险上产生了最优界。这个结果是我们的一般结果的一个应用，表明大小为$k$的稳定压缩方案(Bousquet et al., 2020)具有$O(k)$阶的一致有界CMI。我们进一步表明，VC类proper学习的固有局限性与具有恒定CMI的proper学习器的存在相矛盾，这意味着对Steinke和Zakynthinou(2020)一个公开问题的否定解答。我们还研究了$H$类经验风险最小化(ERM)的CMI，并表明当且仅当$H$具有有界星数(Hanneke和Yang(2015))时，才有可能输出具有有界CMI的所有一致分类器(版本空间)。此外，我们证明了一个通用的约简，表明“留一法”分析可以通过CMI框架表达。作为推论，我们研究了Haussler等人(1994)提出的单包含图算法的CMI。更一般地说，我们证明了CMI框架的普遍性，即对于每个一致的算法和数据分布，当且仅当其评估的CMI随样本数呈次线性增长时，预期风险随着样本数的发散而消失。 摘要:In this work, we investigate the expressiveness of the "conditional mutual information" (CMI) framework of Steinke and Zakynthinou (2020) and the prospect of using it to provide a unified framework for proving generalization bounds in the realizable setting. We first demonstrate that one can use this framework to express non-trivial (but sub-optimal) bounds for any learning algorithm that outputs hypotheses from a class of bounded VC dimension. We prove that the CMI framework yields the optimal bound on the expected risk of Support Vector Machines (SVMs) for learning halfspaces. This result is an application of our general result showing that stable compression schemes Bousquet et al. (2020) of size $k$ have uniformly bounded CMI of order $O(k)$. We further show that an inherent limitation of proper learning of VC classes contradicts the existence of a proper learner with constant CMI, and it implies a negative resolution to an open problem of Steinke and Zakynthinou (2020). We further study the CMI of empirical risk minimizers (ERMs) of class $H$ and show that it is possible to output all consistent classifiers (version space) with bounded CMI if and only if $H$ has a bounded star number (Hanneke and Yang (2015)). Moreover, we prove a general reduction showing that "leave-one-out" analysis is expressible via the CMI framework. As a corollary we investigate the CMI of the one-inclusion-graph algorithm proposed by Haussler et al. (1994). More generally, we show that the CMI framework is universal in the sense that for every consistent algorithm and data distribution, the expected risk vanishes as the number of samples diverges if and only if its evaluated CMI has sublinear growth with the number of samples.

【4】 Logarithmic Regret from Sublinear Hints 标题:来自次线性暗示的对数遗憾 链接:https://arxiv.org/abs/2111.05257

作者:Aditya Bhaskara,Ashok Cutkosky,Ravi Kumar,Manish Purohit 机构:Department of Computer Science, University of Utah, Salt Lake City, UT, Dept. of Electrical and Computer Engineering, Boston University, Boston, MA, Google Research, Mountain View, CA 摘要:我们考虑在线线性优化问题：在每一步中，算法在单位球中选取一个点$x_t$，并对某个代价向量$c_t$承受损失$\langle c_t, x_t\rangle$，该代价向量随后才向算法揭示。最近的工作表明，如果算法在选取$x_t$之前收到一个与$c_t$具有非平凡相关性的提示$h_t$，那么它可以实现$O(\log T)$的遗憾保证，从而改进标准设置中的$\Theta(\sqrt{T})$界。在这项工作中，我们研究算法是否真的需要在每一个时间步都获得提示。令人惊讶的是，在自然查询模型下，算法只需$O(\sqrt{T})$次提示即可获得$O(\log T)$遗憾；相反，我们还表明$o(\sqrt{T})$次提示不能保证优于$\Omega(\sqrt{T})$的遗憾。我们给出了该结果的两个应用：一个是被充分研究的乐观遗憾界设置，另一个是带弃权的在线学习问题。 摘要:We consider the online linear optimization problem, where at every step the algorithm plays a point $x_t$ in the unit ball, and suffers loss $\langle c_t, x_t\rangle$ for some cost vector $c_t$ that is then revealed to the algorithm. Recent work showed that if an algorithm receives a hint $h_t$ that has non-trivial correlation with $c_t$ before it plays $x_t$, then it can achieve a regret guarantee of $O(\log T)$, improving on the bound of $\Theta(\sqrt{T})$ in the standard setting. In this work, we study the question of whether an algorithm really requires a hint at every time step. Somewhat surprisingly, we show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model; in contrast, we also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret. We give two applications of our result, to the well-studied setting of optimistic regret bounds and to the problem of online learning with abstention.
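下面用一个数值模拟直观说明"与代价相关的提示"的作用(仅为示意，并非论文算法)：当提示与代价向量正相关时，沿提示反方向在单位球上下注可得到远低于事后最优固定点的总损失。相关系数、维度等均为假设值。

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 10000, 5
C = rng.normal(size=(T, d))                 # 代价向量序列 c_t
H = 0.6 * C + 0.8 * rng.normal(size=(T, d))  # 与 c_t 正相关的提示 h_t

total_loss = 0.0
for c, h in zip(C, H):
    x = -h / np.linalg.norm(h)              # 在单位球中选点 x_t：沿提示反方向
    total_loss += c @ x                     # 损失 <c_t, x_t>

# 事后最优固定点 u* = -sum(C)/||sum(C)||，其总损失为 -||sum(C)||
best_fixed = -np.linalg.norm(C.sum(axis=0))
print("算法总损失:", total_loss)
print("事后最优固定点总损失:", best_fixed)
print("遗憾:", total_loss - best_fixed)
```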

【5】 MLHarness: A Scalable Benchmarking System for MLCommons 标题:MLHarness:一个可扩展的MLCommons标杆系统 链接:https://arxiv.org/abs/2111.05231

作者:Yen-Hsiang Chang,Jianhao Pu,Wen-mei Hwu,Jinjun Xiong 机构:University of Illinois at Urbana-Champaign, Urbana, IL, USA, University at Buffalo, Buffalo, NY, USA 摘要:随着社会越来越多地采用机器学习(ML)和深度学习(DL)作为各种智能解决方案，在通用的开发实践和资源下，越来越有必要对具有大规模开放数据集的ML/DL模型的通用度量标准化，以便人们能够在一个共同的基础上对模型的质量和性能进行基准测试和比较。MLCommons最近已成为行业和学术界协调这一努力的推动力。尽管MLCommons推理作为标准化基准被广泛采用，但它只包括有限数量的ML/DL模型(事实上总共有七个模型)。这大大限制了MLCommons推理的基准测试结果的通用性，因为研究界有更多新颖的ML/DL模型，用不同的输入和输出模式解决了广泛的问题。为了解决这一局限性，我们提出了MLHarness，这是一个用于MLCommons推理的可扩展基准测试工具系统，具有三个显著特征：(1)它编码了由MLCommons推理定义的标准基准过程，包括模型、数据集、DL框架以及软件和硬件系统；(2) 它为模型开发人员提供了一种简单且声明性的方法，以将其模型和数据集贡献给MLCommons推理；(3)它包括对具有不同输入/输出模式的广泛模型的支持，以便我们能够跨不同的数据集、框架和硬件系统对这些模型进行可伸缩的基准测试。该线束系统是在MLModelScope系统的基础上开发的，将向社区开源。我们的实验结果表明，该线束系统具有良好的灵活性和可扩展性，可用于MLCommons推理基准测试。 摘要:With the society's growing adoption of machine learning (ML) and deep learning (DL) for various intelligent solutions, it becomes increasingly imperative to standardize a common set of measures for ML/DL models with large scale open datasets under common development practices and resources so that people can benchmark and compare models quality and performance on a common ground. MLCommons has emerged recently as a driving force from both industry and academia to orchestrate such an effort. Despite its wide adoption as standardized benchmarks, MLCommons Inference has only included a limited number of ML/DL models (in fact seven models in total). This significantly limits the generality of MLCommons Inference's benchmarking results because there are many more novel ML/DL models from the research community, solving a wide range of problems with different inputs and outputs modalities. To address such a limitation, we propose MLHarness, a scalable benchmarking harness system for MLCommons Inference with three distinctive features: (1) it codifies the standard benchmark process as defined by MLCommons Inference including the models, datasets, DL frameworks, and software and hardware systems; (2) it provides an easy and declarative approach for model developers to contribute their models and datasets to MLCommons Inference; and (3) it includes the support of a wide range of models with varying inputs/outputs modalities so that we can scalably benchmark these models across different datasets, frameworks, and hardware systems. This harness system is developed on top of the MLModelScope system, and will be open sourced to the community. Our experimental results demonstrate the superior flexibility and scalability of this harness system for MLCommons Inference benchmarking.

【6】 A research framework for writing differentiable PDE discretizations in JAX 标题:用JAX编写可微PDE离散化的研究框架 链接:https://arxiv.org/abs/2111.05218

作者:Antonio Stanziola,Simon R. Arridge,Ben T. Cox,Bradley E. Treeby 机构:University College London 备注:10 pages, 2 figures 摘要:可微仿真器是一个新兴的概念，在从强化学习到最优控制等多个领域有着广泛的应用。它们的显著特点是能够计算关于输入参数的解析梯度。与由多个称为"层"的构建块组合而成的神经网络一样，仿真通常需要计算一个算子的输出，而该算子本身可以分解为链接在一起的基本单元。虽然神经网络的每一层代表一个特定的离散操作，但同一个算子可以有多种表示，这取决于所采用的离散化和需要解决的研究问题。在这里，我们提出了一种简单的设计模式，通过将算子表示为连续函数族之间的映射并用有限向量进行参数化，来构造一个可微算子和离散化的库。我们在一个声学优化问题上演示了该方法，其中亥姆霍兹方程用傅里叶谱方法离散，并通过用梯度下降优化声学透镜的声速来演示可微性。所提出的框架是开源的，可在 https://github.com/ucl-bug/jaxdf 获取。 摘要:Differentiable simulators are an emerging concept with applications in several fields, from reinforcement learning to optimal control. Their distinguishing feature is the ability to calculate analytic gradients with respect to the input parameters. Like neural networks, which are constructed by composing several building blocks called layers, a simulation often requires computing the output of an operator that can itself be decomposed into elementary units chained together. While each layer of a neural network represents a specific discrete operation, the same operator can have multiple representations, depending on the discretization employed and the research question that needs to be addressed. Here, we propose a simple design pattern to construct a library of differentiable operators and discretizations, by representing operators as mappings between families of continuous functions, parametrized by finite vectors. We demonstrate the approach on an acoustic optimization problem, where the Helmholtz equation is discretized using Fourier spectral methods, and differentiability is demonstrated using gradient descent to optimize the speed of sound of an acoustic lens. The proposed framework is open-sourced and available at https://github.com/ucl-bug/jaxdf
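下面给出该思路的一个独立小示例(使用jax库，但并非jaxdf的API)：把一维傅里叶谱离散的拉普拉斯算子写成可微函数，并用jax.grad对场的参数求梯度。网格大小、参数化形式与Helmholtz型残差的构造均为本文的假设。

```python
import jax
import jax.numpy as jnp

n, L = 64, 2 * jnp.pi
x = jnp.linspace(0.0, L, n, endpoint=False)
k = jnp.fft.fftfreq(n, d=L / n) * 2 * jnp.pi   # 波数

def laplacian(u):
    # 傅里叶谱方法下的二阶导数算子：在频域乘以 -k^2
    return jnp.real(jnp.fft.ifft(-(k ** 2) * jnp.fft.fft(u)))

def loss(c):
    u = jnp.sin(c * x)                          # 由参数 c 决定的场
    # Helmholtz 型残差 ||(∇² + c²) u||²；当 c 为整数时应接近 0
    r = laplacian(u) + (c ** 2) * u
    return jnp.sum(r ** 2)

# 整条链（FFT、逐点运算）都是可微的，可直接对参数求解析梯度
print(loss(3.0), jax.grad(loss)(3.0))
```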

【7】 CAESynth: Real-Time Timbre Interpolation and Pitch Control with Conditional Autoencoders 标题:CAESynth:带条件自动编码器的实时音色插值和音调控制 链接:https://arxiv.org/abs/2111.05174

作者:Aaron Valero Puche,Sukhan Lee 机构:Artificial Intelligence School, Sungkyunkwan University, Suwon-si , Republic of Korea 备注:MLSP 2021 摘要:在本文中，我们提出了一种基于条件自动编码器的新型音频合成器CAESynth。CAESynth通过在共享的潜在特征空间中插值参考声音实时合成音色，同时独立控制音高。我们表明，训练一个基于音色分类准确度的条件自动编码器，再加上音高内容的对抗性正则化，可以使潜在空间中的音色分布在音色插值和音高调节方面更加有效和稳定。所提出的方法不仅适用于音乐线索的创建，而且也适用于基于新的音色与环境声音混合的混合现实中音频可供性的探索。我们通过实验证明，CAESynth通过音色插值和独立但精确的音高控制实现了平滑和高保真的实时音频合成，既可用于音乐线索，也可用于环境声音的音频可供性。在线共享Python实现以及一些生成的示例。 摘要:In this paper, we present a novel audio synthesizer, CAESynth, based on a conditional autoencoder. CAESynth synthesizes timbre in real-time by interpolating the reference sounds in their shared latent feature space, while controlling a pitch independently. We show that training a conditional autoencoder based on accuracy in timbre classification together with adversarial regularization of pitch content allows timbre distribution in latent space to be more effective and stable for timbre interpolation and pitch conditioning. The proposed method is applicable not only to creation of musical cues but also to exploration of audio affordance in mixed reality based on novel timbre mixtures with environmental sounds. We demonstrate by experiments that CAESynth achieves smooth and high-fidelity audio synthesis in real-time through timbre interpolation and independent yet accurate pitch control for musical cues as well as for audio affordance with environmental sound. A Python implementation along with some generated samples are shared online.

【8】 Losses, Dissonances, and Distortions 标题:损失、不和谐和扭曲 链接:https://arxiv.org/abs/2111.05128

作者:Pablo Samuel Castro 机构:Google Research, Brain Team 备注:In the 5th Machine Learning for Creativity and Design Workshop at NeurIPS 2021 摘要:在这篇论文中,我提出了一项研究,利用在训练简单函数逼近器过程中获得的损失和梯度,作为在钢琴独奏表演环境中产生音乐不和谐和视觉失真的机制。这些不和谐和扭曲不仅通过影响视觉效果,而且通过影响艺术音乐表演,成为艺术表演的一部分。该系统的设计使得表演者可以反过来影响训练过程本身,从而在两个过程之间创建一个闭环反馈:机器学习模型的训练和即兴钢琴曲的演奏。 摘要:In this paper I present a study in using the losses and gradients obtained during the training of a simple function approximator as a mechanism for creating musical dissonance and visual distortion in a solo piano performance setting. These dissonances and distortions become part of an artistic performance not just by affecting the visualizations, but also by affecting the artistic musical performance. The system is designed such that the performer can in turn affect the training process itself, thereby creating a closed feedback loop between two processes: the training of a machine learning model and the performance of an improvised piano piece.

【9】 MMD-ReID: A Simple but Effective Solution for Visible-Thermal Person ReID 标题:MMD-ReID:一种简单而有效的可见光-热红外行人重识别解决方案 链接:https://arxiv.org/abs/2111.05059

作者:Chaitra Jambigi,Ruchit Rawal,Anirban Chakraborty 机构:Department of Computational and Data, Sciences, Indian Institute of Science, Bangalore, India 备注:Accepted in BMVC 2021 (Oral) 摘要:学习模态不变特征是可见光-热红外跨模态行人重识别(VT-ReID)问题的核心，其中查询图像和图库图像来自不同的模态。现有工作通过使用对抗性学习或仔细设计严重依赖领域知识的特征提取模块，隐式地在像素和特征空间中对齐模态。我们提出了一个简单而有效的框架MMD-ReID，该框架通过一个显式的差异减少约束来缩小模态差距。MMD-ReID的灵感来自最大平均差异(MMD)，这是一种广泛用于假设检验的统计工具，用于确定两个分布之间的距离。MMD-ReID使用一种新的基于边缘的公式来匹配可见光样本和热红外样本的类条件特征分布，以最小化类内距离，同时保持特征可辨别性。MMD-ReID在架构和损失函数形式方面都是一个简单的框架。我们进行了大量的实验，定性和定量地证明了MMD-ReID在对齐边缘分布和类条件分布方面的有效性，从而学习到既与模态无关又保持身份一致的特征。该框架在SYSU-MM01和RegDB数据集上的性能明显优于最先进的方法。代码将发布于 https://github.com/vcl-iisc/MMD-ReID 摘要:Learning modality invariant features is central to the problem of Visible-Thermal cross-modal Person Reidentification (VT-ReID), where query and gallery images come from different modalities. Existing works implicitly align the modalities in pixel and feature spaces by either using adversarial learning or carefully designing feature extraction modules that heavily rely on domain knowledge. We propose a simple but effective framework, MMD-ReID, that reduces the modality gap by an explicit discrepancy reduction constraint. MMD-ReID takes inspiration from Maximum Mean Discrepancy (MMD), a widely used statistical tool for hypothesis testing that determines the distance between two distributions. MMD-ReID uses a novel margin-based formulation to match class-conditional feature distributions of visible and thermal samples to minimize intra-class distances while maintaining feature discriminability. MMD-ReID is a simple framework in terms of architecture and loss formulation. We conduct extensive experiments to demonstrate both qualitatively and quantitatively the effectiveness of MMD-ReID in aligning the marginal and class conditional distributions, thus learning both modality-independent and identity-consistent features. The proposed framework significantly outperforms the state-of-the-art methods on SYSU-MM01 and RegDB datasets. Code will be released at https://github.com/vcl-iisc/MMD-ReID
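论文使用基于边缘的类条件MMD变体；下面仅给出标准RBF核MMD²无偏估计的示意实现，用两组人工合成的"可见光/热红外"特征演示其对分布差异的度量。核带宽σ与数据均为假设示例，并非论文设置。

```python
import numpy as np

def mmd2_unbiased(X, Y, sigma=1.0):
    """两组样本之间 RBF 核 MMD² 的无偏估计。"""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    n, m = len(X), len(Y)
    np.fill_diagonal(Kxx, 0)   # 无偏估计：去掉对角项
    np.fill_diagonal(Kyy, 0)
    return Kxx.sum() / (n * (n - 1)) + Kyy.sum() / (m * (m - 1)) - 2 * Kxy.mean()

rng = np.random.default_rng(0)
vis = rng.normal(0.0, 1.0, (200, 16))      # 模拟"可见光"特征
thermal = rng.normal(0.5, 1.0, (200, 16))  # 模拟"热红外"特征（分布有偏移）
same = rng.normal(0.0, 1.0, (200, 16))     # 与 vis 同分布的对照组
print("跨模态 MMD²:", mmd2_unbiased(vis, thermal))
print("同分布 MMD²:", mmd2_unbiased(vis, same))  # 应接近 0
```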

【10】 Optimizing Bayesian acquisition functions in Gaussian Processes Link: https://arxiv.org/abs/2111.04930

Authors: Ashish Anil Pawar, Ujwal Warbhe Affiliations: College of Engineering, Pune; National Institute of Technology, Srinagar Note: 9 pages, 12 figures, 1 table, 10 equations Abstract: Bayesian Optimization is an effective method for searching for the global maximum of an objective function, especially when the function is unknown. The process comprises fitting a surrogate function, choosing an acquisition function, and then optimizing the acquisition function to find the next sampling point. This paper analyzes different acquisition functions, such as Maximum Probability of Improvement and Expected Improvement, and various optimizers, such as L-BFGS and TNC, for optimizing the acquisition function to find the next sampling point. Along with an analysis of the time taken, the paper also shows the importance of the positions of the initial samples.
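
As a concrete illustration of the pipeline the paper studies — an acquisition function maximized by a gradient-based optimizer over a GP surrogate — here is a minimal Expected Improvement example using scikit-learn and L-BFGS-B. The toy objective and hyperparameters are arbitrary; this is a sketch, not the paper's experimental setup:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(x, gp, y_best, xi=0.01):
    """EI(x) for maximization: (mu - y_best - xi) * Phi(z) + sigma * phi(z)."""
    mu, sigma = gp.predict(np.atleast_2d(x), return_std=True)
    mu, sigma = float(mu[0]), max(float(sigma[0]), 1e-9)
    z = (mu - y_best - xi) / sigma
    return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

# Fit a GP surrogate to a few observations of an "unknown" objective.
X = np.array([[0.1], [0.4], [0.9]])
y = np.sin(3 * X).ravel()
gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)

# Maximize EI (minimize its negative) with L-BFGS-B to pick the next sampling point.
res = minimize(lambda x: -expected_improvement(x, gp, y.max()),
               x0=np.array([0.5]), bounds=[(0.0, 1.0)], method="L-BFGS-B")
print("next sampling point:", res.x)
```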

【11】 Active Sampling for Linear Regression Beyond the $\ell_2$ Norm Link: https://arxiv.org/abs/2111.04888

Authors: Cameron Musco, Christopher Musco, David P. Woodruff, Taisuke Yasuda Affiliations: UMass Amherst, NYU, CMU Note: Abstract shortened to meet arXiv limits Abstract: We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector $b \in \mathbb{R}^n$ and output a near minimizer to $\min_{x \in \mathbb{R}^d} \|Ax - b\|$, where $A \in \mathbb{R}^{n \times d}$ is a design matrix and $\|\cdot\|$ is some loss function. For $\ell_p$ norm regression for any $0 < p < \infty$, we give an algorithm based on Lewis weight sampling that outputs a $(1+\epsilon)$ approximate solution using just $\tilde{O}(d^{\max(1, p/2)}/\mathrm{poly}(\epsilon))$ queries to $b$. We show that this dependence on $d$ is optimal, up to logarithmic factors. Our result resolves a recent open question of Chen and Dereziński, who gave near optimal bounds for the $\ell_1$ norm, and suboptimal bounds for $\ell_p$ regression with $p \in (1,2)$. We also provide the first total sensitivity upper bound of $O(d^{\max\{1, p/2\}} \log^2 n)$ for loss functions with at most degree-$p$ polynomial growth. This improves a recent result of Tukan, Maalouf, and Feldman. By combining this with our techniques for the $\ell_p$ regression result, we obtain an active regression algorithm making $\tilde{O}(d^{1+\max\{1, p/2\}}/\mathrm{poly}(\epsilon))$ queries, answering another open question of Chen and Dereziński. For the important special case of the Huber loss, we further improve our bound to an active sample complexity of $\tilde{O}(d^{(1+\sqrt{2})/2}/\epsilon^c)$ and a non-active sample complexity of $\tilde{O}(d^{4-2\sqrt{2}}/\epsilon^c)$, improving a previous $d^4$ bound for Huber regression due to Clarkson and Woodruff. Our sensitivity bounds have further implications, improving a variety of previous results using sensitivity sampling, including Orlicz norm subspace embeddings and robust subspace approximation. Finally, our active sampling results give the first sublinear time algorithms for Kronecker product regression under every $\ell_p$ norm.
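
Lewis weights generalize statistical leverage scores, and the two coincide at $p = 2$. The sketch below shows the $p = 2$ special case — sample rows proportional to leverage and solve the reweighted least-squares problem, so that only the sampled entries of $b$ are ever queried. It illustrates the sampling idea only, not the paper's algorithm for general $\ell_p$:

```python
import numpy as np

def leverage_scores(A):
    """Row leverage scores of A; Lewis weights reduce to these when p = 2."""
    Q, _ = np.linalg.qr(A)              # thin QR; row norms of Q give the scores
    return (Q**2).sum(axis=1)

def sample_rows(A, b, m, rng):
    """Sample m rows proportional to leverage and reweight, querying only m entries of b."""
    tau = leverage_scores(A)
    p = tau / tau.sum()
    idx = rng.choice(len(b), size=m, p=p)
    w = 1.0 / np.sqrt(m * p[idx])       # importance weights keep the objective unbiased
    return w[:, None] * A[idx], w * b[idx]

rng = np.random.default_rng(0)
A = rng.normal(size=(10_000, 20))
b = A @ rng.normal(size=20) + 0.1 * rng.normal(size=10_000)
A_s, b_s = sample_rows(A, b, m=200, rng=rng)
x_hat, *_ = np.linalg.lstsq(A_s, b_s, rcond=None)   # near-minimizer of ||Ax - b||_2
```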

【12】 A toolkit for data-driven discovery of governing equations in high-noise regimes Link: https://arxiv.org/abs/2111.04870

Authors: Charles B. Delahunt, J. Nathan Kutz Affiliation: Department of Applied Mathematics, University of Washington, Seattle, WA Note: Body 21 pages; 32 pages total with appendix. 17 figures, 8 tables Abstract: We consider the data-driven discovery of governing equations from time-series data in the limit of high noise. The algorithms developed describe an extensive toolkit of methods for circumventing the deleterious effects of noise in the context of the sparse identification of nonlinear dynamics (SINDy) framework. We offer two primary contributions, both focused on noisy data acquired from a system x' = f(x). First, we propose, for use in high-noise settings, an extensive toolkit of critically enabling extensions to the SINDy regression method that progressively cull functionals from an over-complete library and yield a set of sparse equations that regress to the derivative x'. These innovations can extract sparse governing equations and coefficients from high-noise time-series data (e.g., 300% added noise). For example, the method discovers the correct sparse libraries in the Lorenz system, with median coefficient estimate errors of 1%-3% (for 50% noise), 6%-8% (for 100% noise), and 23%-25% (for 300% noise). The enabling modules in the toolkit are combined into a single method, but the individual modules can be applied tactically in other equation-discovery methods (SINDy or not) to improve results on high-noise data. Second, we propose a technique, applicable to any model-discovery method based on x' = f(x), to assess the accuracy of a discovered model in the context of non-unique solutions due to noisy data. Currently, this non-uniqueness can obscure a discovered model's accuracy and thus a discovery method's effectiveness. We describe a technique that uses linear dependencies among functionals to transform a discovered model into an equivalent form that is closest to the true model, enabling more accurate assessment of a discovered model's accuracy.
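
The regression core that the toolkit extends is sequentially thresholded least squares (STLSQ) over a candidate library of functionals. A minimal sketch on a toy 1-D system follows; the threshold and library are illustrative, and none of the paper's high-noise modules are included:

```python
import numpy as np

def stlsq(Theta, dx, threshold=0.1, n_iters=10):
    """Sequentially thresholded least squares: the core SINDy regression.

    Theta: (n_samples, n_functionals) library matrix evaluated on the data.
    dx:    (n_samples,) estimated derivative x'.
    Small coefficients are zeroed each round, culling terms from the library.
    """
    xi, *_ = np.linalg.lstsq(Theta, dx, rcond=None)
    for _ in range(n_iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if big.any():
            xi[big], *_ = np.linalg.lstsq(Theta[:, big], dx, rcond=None)
    return xi

# Toy example: recover x' = -2x + 0.5x^3 from the library [1, x, x^2, x^3].
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 500)
dx = -2 * x + 0.5 * x**3 + 0.05 * rng.normal(size=500)  # mildly noisy derivative
Theta = np.column_stack([np.ones_like(x), x, x**2, x**3])
print(stlsq(Theta, dx))  # expect approximately [0, -2, 0, 0.5]
```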

【13】 Solving Marginal MAP Exactly by Probabilistic Circuit Transformations Link: https://arxiv.org/abs/2111.04833

Authors: YooJung Choi, Tal Friedman, Guy Van den Broeck Affiliation: Computer Science Department, University of California, Los Angeles Abstract: Probabilistic circuits (PCs) are a class of tractable probabilistic models that allow efficient, often linear-time, inference of queries such as marginals and most probable explanations (MPE). However, marginal MAP, which is central to many decision-making problems, remains a hard query for PCs unless they satisfy highly restrictive structural constraints. In this paper, we develop a pruning algorithm that removes parts of the PC that are irrelevant to a marginal MAP query, shrinking the PC while maintaining the correct solution. This pruning technique is so effective that we are able to build a marginal MAP solver based solely on iteratively transforming the circuit -- no search is required. We empirically demonstrate the efficacy of our approach on real-world datasets.
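
To make the query itself concrete: marginal MAP maximizes over a subset of variables after summing out the rest. The brute-force toy below illustrates only the semantics — the paper's contribution is avoiding exactly this exponential enumeration via circuit pruning:

```python
import numpy as np
from itertools import product

def marginal_map(joint, map_vars):
    """Brute-force marginal MAP on a small discrete joint table.

    joint: dict mapping full assignments (tuples of 0/1 over all variables)
    to probabilities. Variables not in map_vars are summed out; the
    assignment to map_vars with the largest marginal probability wins.
    """
    best, best_score = None, -np.inf
    for q in product([0, 1], repeat=len(map_vars)):
        score = sum(p for assign, p in joint.items()
                    if all(assign[v] == val for v, val in zip(map_vars, q)))
        if score > best_score:
            best, best_score = q, score
    return best, best_score

# Toy joint over three binary variables; maximize over X0, sum out X1 and X2.
rng = np.random.default_rng(0)
joint = dict(zip(product([0, 1], repeat=3), rng.dirichlet(np.ones(8))))
print(marginal_map(joint, map_vars=[0]))
```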

【14】 ML-EXray: Visibility into ML Deployment on the Edge Link: https://arxiv.org/abs/2111.04779

Authors: Hang Qiu, Ioanna Vavelidou, Jian Li, Evgenya Pergament, Pete Warden, Sandeep Chinchali, Zain Asgar, Sachin Katti Abstract: Benefiting from expanding cloud infrastructure, deep neural networks (DNNs) today achieve increasingly high performance when trained in the cloud. Researchers spend months of effort competing for an extra few percentage points of model accuracy. However, when these models are actually deployed on edge devices in practice, performance can abruptly drop by over 10% with no obvious cause. The key challenge is that there is little visibility into ML inference execution on edge devices, and little awareness of potential issues during the edge deployment process. We present ML-EXray, an end-to-end framework that provides visibility into layer-level details of ML execution and helps developers analyze and debug cloud-to-edge deployment issues. More often than not, the reason for sub-optimal edge performance lies not only in the model itself, but in every operation throughout the data flow and the deployment process. Evaluations show that ML-EXray can effectively catch deployment issues such as pre-processing bugs, quantization issues, and suboptimal kernels. Using ML-EXray, users need to write fewer than 15 lines of code to fully examine the edge deployment pipeline. By eradicating these issues, ML-EXray can correct model performance by up to 30%, pinpoint error-prone layers, and guide users to optimize kernel execution latency by two orders of magnitude. Code and APIs will be released as an open-source multi-lingual instrumentation library and a Python deployment validation library.
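
The kind of layer-level check such a framework enables can be sketched generically: capture per-layer activations for the same input on the cloud reference and on the edge deployment, then flag layers that diverge. The function below is an illustration with invented names, not the ML-EXray API:

```python
import numpy as np

def compare_layer_outputs(reference, edge, rtol=1e-2):
    """Flag layers whose edge outputs diverge from the cloud reference.

    reference, edge: dicts mapping layer name -> activation array, captured
    from the same input on both deployments. (Generic illustration; the
    actual ML-EXray API may differ.)
    """
    for name in reference:
        ref, out = reference[name], edge[name]
        err = np.abs(ref - out).max() / (np.abs(ref).max() + 1e-12)
        status = "OK      " if err < rtol else "DIVERGES"
        print(f"{status} {name}: max relative error {err:.4f}")

# Example: a quantization bug that only corrupts the second layer.
ref  = {"conv1": np.ones((4, 4)),         "conv2": np.full((4, 4), 2.0)}
edge = {"conv1": np.ones((4, 4)) * 1.001, "conv2": np.full((4, 4), 2.6)}
compare_layer_outputs(ref, edge)
```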

【15】 Mode connectivity in the loss landscape of parameterized quantum circuits Link: https://arxiv.org/abs/2111.05311

Authors: Kathleen E. Hamilton, Emily Lynn, Raphael C. Pooser Note: 14 pages; related to work presented at QTML 2020 Abstract: Variational training of parameterized quantum circuits (PQCs) underpins many workflows employed on near-term noisy intermediate-scale quantum (NISQ) devices. It is a hybrid quantum-classical approach that minimizes an associated cost function in order to train a parameterized ansatz. In this paper we adapt the qualitative loss-landscape characterization for neural networks introduced by Goodfellow et al. (2014) and Li et al. (2017), and the tests for connectivity used by Draxler et al. (2018), to study the loss-landscape features in PQC training. We present results for PQCs trained on a simple regression task, using a bilayer circuit ansatz that consists of alternating layers of parameterized rotation gates and entangling gates. Multiple circuits are trained with three different batch-gradient optimizers: stochastic gradient descent, the quantum natural gradient, and Adam. We identify large features in the landscape that can lead to faster convergence in training workflows.
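
The simplest connectivity probe evaluates the cost along the straight line between two trained parameter vectors and looks for a barrier. The sketch below runs that probe on a classical least-squares stand-in rather than an actual PQC cost function:

```python
import numpy as np

def loss(theta, X, y):
    """Mean squared error of a linear model; a stand-in for a PQC cost."""
    return np.mean((X @ theta - y) ** 2)

def linear_path_losses(theta_a, theta_b, X, y, n=21):
    """Loss along theta(t) = (1 - t) * theta_a + t * theta_b for t in [0, 1].

    A flat profile suggests the two minima are mode-connected; a bump in
    the middle indicates a barrier between separate basins.
    """
    ts = np.linspace(0.0, 1.0, n)
    return ts, [loss((1 - t) * theta_a + t * theta_b, X, y) for t in ts]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5)
theta_a, *_ = np.linalg.lstsq(X, y, rcond=None)   # one trained optimum
theta_b = theta_a + 0.1 * rng.normal(size=5)      # a nearby second solution
ts, losses = linear_path_losses(theta_a, theta_b, X, y)
print(max(losses))   # barrier height along the interpolation path
```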

【16】 Solution to the Non-Monotonicity and Crossing Problems in Quantile Regression Link: https://arxiv.org/abs/2111.04805

Authors: Resve A. Saleh, A. K. Md. Ehsanes Saleh Affiliations: Dept. of Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada; School of Mathematics and Statistics, Carleton University, Ottawa, Canada Note: 10 pages, 13 figures, IEEE conference format Abstract: This paper proposes a new method to address the long-standing lack of monotonicity in estimation of the conditional and structural quantile function, also known as the quantile crossing problem. Quantile regression is a very powerful tool in data science in general and econometrics in particular. Unfortunately, the crossing problem has confounded researchers and practitioners alike for over four decades. Numerous attempts have been made to find an acceptable solution, but no simple and general solution has been found to date. This paper describes an elegant solution based on a single mathematical equation that is easy to understand and implement in R and Python, while greatly reducing the crossing problem. It will be very important in all areas where quantile regression is routinely used and may also find application in robust regression, especially in the context of machine learning.
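
The abstract does not state the paper's equation, so the sketch below instead shows a standard alternative remedy, monotone rearrangement (Chernozhukov et al.): sorting the estimated quantiles at each covariate value enforces monotonicity in the quantile level:

```python
import numpy as np

def rearrange_quantiles(q_hat):
    """Remove quantile crossing by monotone rearrangement.

    q_hat: (n_points, n_taus) matrix of estimated conditional quantiles,
    with columns ordered by increasing quantile level tau. Sorting each
    row enforces monotonicity in tau. This is the classical rearrangement
    fix, not the single-equation method proposed in this paper.
    """
    return np.sort(q_hat, axis=1)

# Example: crossed estimates at one x, where the 0.5 quantile dips below the 0.1.
q_hat = np.array([[1.2, 0.9, 2.0]])    # taus = (0.1, 0.5, 0.9), crossed
print(rearrange_quantiles(q_hat))       # [[0.9, 1.2, 2.0]] -- monotone
```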
