cs.LG 方向,今日共计86篇
Graph相关(图学习|图神经网络|图优化等)(13篇)
【1】 Model Stealing Attacks Against Inductive Graph Neural Networks 标题:针对归纳图神经网络的模型窃取攻击 链接:https://arxiv.org/abs/2112.08331
作者:Yun Shen,Xinlei He,Yufei Han,Yang Zhang 备注:To Appear in the 43rd IEEE Symposium on Security and Privacy, May 22-26, 2022 摘要:许多真实世界的数据以图形的形式出现。图形神经网络(GNNs)是一个新的机器学习(ML)模型家族,它被提出来充分利用图形数据构建强大的应用程序。特别是,归纳GNN,它可以推广到看不见的数据,成为这一方向的主流。机器学习模型在各种任务中显示出巨大的潜力,并已部署在许多实际场景中。为了训练一个好的模型,需要大量的数据和计算资源,从而产生有价值的知识产权。以前的研究表明,ML模型容易受到模型窃取攻击,其目的是窃取目标模型的功能。然而,其中大多数研究关注的是基于图像和文本训练的模型,而使用图数据训练的模型(即GNN)则很少受到关注。在本文中,我们通过提出针对归纳GNN的首个模型窃取攻击来填补这一空白。我们系统地定义了威胁模型,并基于对手的背景知识和目标模型的响应提出了六种攻击。我们对六个基准数据集的评估表明,所提出的针对GNN的模型窃取攻击具有良好的性能。 摘要:Many real-world data come in the form of graphs. Graph neural networks (GNNs), a new family of machine learning (ML) models, have been proposed to fully leverage graph data to build powerful applications. In particular, the inductive GNNs, which can generalize to unseen data, become mainstream in this direction. Machine learning models have shown great potential in various tasks and have been deployed in many real-world scenarios. To train a good model, a large amount of data as well as computational resources are needed, leading to valuable intellectual property. Previous research has shown that ML models are prone to model stealing attacks, which aim to steal the functionality of the target models. However, most of them focus on the models trained with images and texts. On the other hand, little attention has been paid to models trained with graph data, i.e., GNNs. In this paper, we fill the gap by proposing the first model stealing attacks against inductive GNNs. We systematically define the threat model and propose six attacks based on the adversary's background knowledge and the responses of the target models. Our evaluation on six benchmark datasets shows that the proposed model stealing attacks against GNNs achieve promising performance.
【2】 Joint Demand Prediction for Multimodal Systems: A Multi-task Multi-relational Spatiotemporal Graph Neural Network Approach 标题:多模式交通系统联合需求预测:一种多任务多关系时空图神经网络方法 链接:https://arxiv.org/abs/2112.08078
作者:Yuebing Liang,Guan Huang,Zhan Zhao 摘要:动态需求预测对于城市交通系统的有效运行和管理至关重要。人们对单一运输方式的需求预测进行了广泛的研究,忽略了不同运输方式的需求可以相互关联的事实。尽管最近做出了一些努力,但现有的多模式需求预测方法通常不够灵活,无法考虑具有不同空间单元和跨不同模式的异构时空相关性的多路网络。为了解决这些问题,本研究提出了一种用于多模式需求预测的多关系时空图神经网络(ST-MRGNN)。具体而言,跨模式的空间依赖性使用多个模态内和模态间关系图进行编码。介绍了一种多关系图神经网络(MRGNN)来捕获跨模式异构空间依赖,该网络由广义图卷积网络和基于注意的聚合模块组成,前者用于学习关系图中的消息传递机制,后者用于总结不同的关系。我们进一步将MRGNN与时间门控卷积层集成,以联合建模异质时空相关性。使用来自纽约市的真实世界地铁和打车数据集进行了大量实验,结果验证了我们提出的方法在不同模式下比现有方法的性能改进。对于需求稀少的位置,改进尤其大。对ST-MRGNN注意机制的进一步分析也表明,ST-MRGNN对理解跨模式相互作用具有良好的解释能力。 摘要:Dynamic demand prediction is crucial for the efficient operation and management of urban transportation systems. Extensive research has been conducted on single-mode demand prediction, ignoring the fact that the demands for different transportation modes can be correlated with each other. Despite some recent efforts, existing approaches to multimodal demand prediction are generally not flexible enough to account for multiplex networks with diverse spatial units and heterogeneous spatiotemporal correlations across different modes. To tackle these issues, this study proposes a multi-relational spatiotemporal graph neural network (ST-MRGNN) for multimodal demand prediction. Specifically, the spatial dependencies across modes are encoded with multiple intra- and inter-modal relation graphs. A multi-relational graph neural network (MRGNN) is introduced to capture cross-mode heterogeneous spatial dependencies, consisting of generalized graph convolution networks to learn the message passing mechanisms within relation graphs and an attention-based aggregation module to summarize different relations. We further integrate MRGNNs with temporal gated convolution layers to jointly model heterogeneous spatiotemporal correlations. Extensive experiments are conducted using real-world subway and ride-hailing datasets from New York City, and the results verify the improved performance of our proposed approach over existing methods across modes. The improvement is particularly large for demand-sparse locations. Further analysis of the attention mechanisms of ST-MRGNN also demonstrates its good interpretability for understanding cross-mode interactions.
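下面给出摘要中"多关系图卷积 + 基于注意的聚合"思路的一个极简 PyTorch 草图,仅为假设性示意(类名、维度与关系邻接矩阵的构造均为本摘编所设,并非论文官方实现):对每个关系图分别做一次广义图卷积,再用注意力权重汇总各关系的输出。

```python
# 假设性示意:多关系图卷积 + 注意力聚合(非 ST-MRGNN 官方实现)
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationGCNLayer(nn.Module):
    """单个关系图上的广义图卷积:relu(A_r @ X @ W_r)。"""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, adj):            # x: [N, in_dim], adj: [N, N] 归一化邻接矩阵
        return torch.relu(adj @ self.weight(x))

class MultiRelationGNN(nn.Module):
    """对多个(模态内/模态间)关系图分别卷积,再用注意力加权汇总。"""
    def __init__(self, num_relations, in_dim, out_dim):
        super().__init__()
        self.convs = nn.ModuleList(
            [RelationGCNLayer(in_dim, out_dim) for _ in range(num_relations)])
        self.attn = nn.Linear(out_dim, 1)

    def forward(self, x, adj_list):       # adj_list: 长度为 R 的邻接矩阵列表
        outs = torch.stack([conv(x, a) for conv, a in zip(self.convs, adj_list)], dim=1)  # [N, R, D]
        scores = F.softmax(self.attn(outs), dim=1)        # 各关系的注意力权重 [N, R, 1]
        return (scores * outs).sum(dim=1)                 # [N, D]

# 用法示意:3 种关系、64 个空间单元、输入特征 16 维
x = torch.randn(64, 16)
adjs = [torch.softmax(torch.randn(64, 64), dim=-1) for _ in range(3)]
model = MultiRelationGNN(num_relations=3, in_dim=16, out_dim=32)
print(model(x, adjs).shape)   # torch.Size([64, 32])
```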
【3】 TLogic: Temporal Logical Rules for Explainable Link Forecasting on Temporal Knowledge Graphs 标题:TLogic:时态知识图上可解释链接预测的时态逻辑规则 链接:https://arxiv.org/abs/2112.08025
作者:Yushan Liu,Yunpu Ma,Marcel Hildebrandt,Mitchell Joblin,Volker Tresp 备注:Accepted at AAAI 2022 (36th AAAI Conference on Artificial Intelligence) 摘要:传统的静态知识图将关系数据中的实体建模为节点,由特定关系类型的边连接。然而,信息和知识不断发展,时间动态不断出现,预计将影响未来局势。在时态知识图中,时间信息通过为每条边配备时间戳或时间范围而集成到图中。基于嵌入的方法已经被引入到时态知识图的链接预测中,但它们大多缺乏可解释性和可理解的推理链。特别是,它们通常不用于处理链接预测——涉及未来时间戳的事件预测。我们讨论了时态知识图上的链接预测任务,并介绍了基于时态随机游动提取的时态逻辑规则的可解释框架TLogic。在三个基准数据集上,我们将TLogic与最先进的基线进行了比较,显示出更好的总体性能,同时我们的方法也提供了保持时间一致性的解释。此外,与大多数最先进的基于嵌入的方法相比,TLogic在归纳环境中工作得很好,在归纳环境中,已经学习的规则被转移到具有公共词汇表的相关数据集。 摘要:Conventional static knowledge graphs model entities in relational data as nodes, connected by edges of specific relation types. However, information and knowledge evolve continuously, and temporal dynamics emerge, which are expected to influence future situations. In temporal knowledge graphs, time information is integrated into the graph by equipping each edge with a timestamp or a time range. Embedding-based methods have been introduced for link prediction on temporal knowledge graphs, but they mostly lack explainability and comprehensible reasoning chains. Particularly, they are usually not designed to deal with link forecasting -- event prediction involving future timestamps. We address the task of link forecasting on temporal knowledge graphs and introduce TLogic, an explainable framework that is based on temporal logical rules extracted via temporal random walks. We compare TLogic with state-of-the-art baselines on three benchmark datasets and show better overall performance while our method also provides explanations that preserve time consistency. Furthermore, in contrast to most state-of-the-art embedding-based methods, TLogic works well in the inductive setting where already learned rules are transferred to related datasets with a common vocabulary.
【4】 Graph Representation Learning via Contrasting Cluster Assignments 标题:基于对比聚类分配的图表示学习 链接:https://arxiv.org/abs/2112.07934
作者:Chunyang Zhang,Hongyu Yao,C. L. Philip Chen,Yuena Lin 摘要:近年来,随着对比学习的兴起,无监督图表示学习蓬勃发展,在一些机器学习任务中甚至超过了有监督图表示学习。现有的大多数图表示学习对比模型要么着眼于最大化局部嵌入和全局嵌入之间的互信息,要么主要依赖于节点级的对比嵌入。然而,它们仍然不够精致,无法全面探索网络拓扑的局部和全局视图。虽然前者考虑了局部-全局关系,但其粗糙的全局信息导致局部和全局视图之间的勉强合作。后者注重节点级特征对齐,因此全局视图的作用显得不明显。为了避免陷入这两种极端情况,我们通过对比聚类分配提出了一种新的无监督图表示模型,称为GRCCA。通过将聚类算法与对比学习相结合,激发人们综合利用局部和全局信息的动机。这不仅有利于对比效果,而且提供了更高质量的图形信息。同时,GRCCA进一步挖掘了集群级的信息,这使得它能够洞察图拓扑以外的节点之间难以捉摸的关联。具体地说,我们首先生成两个具有不同图扩充策略的扩充图,然后使用聚类算法分别获得它们的聚类分配和原型。所提出的GRCCA通过最小化交叉熵损失,进一步迫使来自不同增广图的相同节点相互识别其集群分配。为了证明其有效性,我们在三个不同的下游任务中与最先进的模型进行了比较。实验结果表明,GRCCA在大多数任务中具有较强的竞争力。 摘要:With the rise of contrastive learning, unsupervised graph representation learning has been booming recently, even surpassing the supervised counterparts in some machine learning tasks. Most of existing contrastive models for graph representation learning either focus on maximizing mutual information between local and global embeddings, or primarily depend on contrasting embeddings at node level. However, they are still not exquisite enough to comprehensively explore the local and global views of network topology. Although the former considers local-global relationship, its coarse global information leads to grudging cooperation between local and global views. The latter pays attention to node-level feature alignment, so that the role of global view appears inconspicuous. To avoid falling into these two extreme cases, we propose a novel unsupervised graph representation model by contrasting cluster assignments, called as GRCCA. It is motivated to make good use of local and global information synthetically through combining clustering algorithms and contrastive learning. This not only facilitates the contrastive effect, but also provides the more high-quality graph information. Meanwhile, GRCCA further excavates cluster-level information, which make it get insight to the elusive association between nodes beyond graph topology. Specifically, we first generate two augmented graphs with distinct graph augmentation strategies, then employ clustering algorithms to obtain their cluster assignments and prototypes respectively. The proposed GRCCA further compels the identical nodes from different augmented graphs to recognize their cluster assignments mutually by minimizing a cross entropy loss. To demonstrate its effectiveness, we compare with the state-of-the-art models in three different downstream tasks. The experimental results show that GRCCA has strong competitiveness in most tasks.
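作为示意,下面给出"对比聚类分配"这一思路的一个假设性 Python 草图(借鉴 SwAV 式的交换预测形式,并非 GRCCA 的官方实现,聚类方式、温度系数等均为本摘编所设):对两个增广图得到的节点嵌入分别聚类得到分配与原型,再用交叉熵迫使同一节点互相预测另一视图的聚类分配。

```python
# 假设性示意:跨视图聚类分配的对比损失(非 GRCCA 官方实现)
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def cluster_assign(z, k):
    """对节点嵌入做 K-means,返回每个节点的聚类分配与归一化原型 [k, d]。"""
    km = KMeans(n_clusters=k, n_init=10).fit(z.detach().cpu().numpy())
    protos = torch.tensor(km.cluster_centers_, dtype=z.dtype, device=z.device)
    labels = torch.tensor(km.labels_, dtype=torch.long, device=z.device)
    return labels, F.normalize(protos, dim=-1)

def cross_view_cluster_loss(z1, z2, k=8, temp=0.5):
    """z1, z2 为同一批节点在两个增广图下的嵌入;互相预测对方的聚类分配。"""
    y1, p1 = cluster_assign(z1, k)
    y2, p2 = cluster_assign(z2, k)
    logits12 = F.normalize(z1, dim=-1) @ p2.t() / temp   # 视图1 预测 视图2 的分配
    logits21 = F.normalize(z2, dim=-1) @ p1.t() / temp   # 视图2 预测 视图1 的分配
    return F.cross_entropy(logits12, y2) + F.cross_entropy(logits21, y1)

# 用法示意
z1, z2 = torch.randn(128, 32), torch.randn(128, 32)
print(cross_view_cluster_loss(z1, z2).item())
```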
【5】 Learning Graph Partitions 标题:学习图分区 链接:https://arxiv.org/abs/2112.07897
作者:Sayan Mukherjee 备注:9 pages, 2 figures 摘要:如果将一个图划分为多个连接的组件,则成员资格oracle将断言该图的任意两个顶点是否位于同一组件中。我们证明了对于$n \ge k \ge 2$,学习具有$k$个分量的$n$顶点隐藏图的分量至少需要$\frac{1}{2}(n-k)(k-1)$次成员查询。这证明了Reyzin和Srivastava(2007)针对该问题提出的$O(nk)$算法的最优性,改进了此前已知最佳的$\Omega(n \log k)$次查询的信息论下界。此外,我们构造了一个oracle,它可以在渐近较少的查询中学习$G$的组件数量,而不是学习完整分区,从而回答了同一作者提出的另一个问题。最后,我们介绍了这个oracle的一个更适用的版本,并证明了$\widetilde{\Theta}(m)$次查询的渐近紧界,用于使用这个oracle学习和验证$m$条边的隐藏图$G$。 摘要:Given a partition of a graph into connected components, the membership oracle asserts whether any two vertices of the graph lie in the same component or not. We prove that for $n \ge k \ge 2$, learning the components of an $n$-vertex hidden graph with $k$ components requires at least $\frac{1}{2}(n-k)(k-1)$ membership queries. This proves the optimality of the $O(nk)$ algorithm proposed by Reyzin and Srivastava (2007) for this problem, improving on the best known information-theoretic bound of $\Omega(n \log k)$ queries. Further, we construct an oracle that can learn the number of components of $G$ in asymptotically fewer queries than learning the full partition, thus answering another question posed by the same authors. Lastly, we introduce a more applicable version of this oracle, and prove asymptotically tight bounds of $\widetilde{\Theta}(m)$ queries for both learning and verifying an $m$-edge hidden graph $G$ using this oracle.
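下面用一段假设性的 Python 草图示意"成员查询"模型:为每个已发现分量保留一个代表顶点,对每个新顶点逐一向 oracle 查询其是否与某个代表同属一个分量。这一朴素策略的查询次数为 $O(nk)$ 量级,与摘要中 Reyzin–Srivastava 算法的上界一致(仅为示意,并非其原始算法)。

```python
# 假设性示意:membership_oracle(u, v) 返回 u、v 是否位于同一连通分量。
def learn_components(vertices, membership_oracle):
    """返回 {代表顶点: 该分量的顶点列表} 及总查询次数(不超过 n*k 量级)。"""
    components = {}                     # 代表顶点 -> 成员列表
    queries = 0
    for v in vertices:
        for rep in components:          # 至多 k 个代表,每个一次查询
            queries += 1
            if membership_oracle(rep, v):
                components[rep].append(v)
                break
        else:                           # 与所有代表都不在同一分量 -> 新分量
            components[v] = [v]
    return components, queries

# 用法示意:隐藏图有 3 个分量 {0..3}、{4..6}、{7..9}
hidden = {v: (0 if v < 4 else 1 if v < 7 else 2) for v in range(10)}
oracle = lambda u, v: hidden[u] == hidden[v]
comps, q = learn_components(range(10), oracle)
print(len(comps), q)    # 3 个分量,查询次数为 O(n*k)
```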
【6】 Graph-based Ensemble Machine Learning for Student Performance Prediction 标题:基于图的集成机器学习在学生成绩预测中的应用 链接:https://arxiv.org/abs/2112.07893
作者:Yinkai Wang,Aowei Ding,Kaiyi Guan,Shixi Wu,Yuanqi Du 备注:5 pages, 3 figures and 3 tables 摘要:学生成绩预测是了解学生需求、提供适当的学习机会/资源以及提高教学质量的关键研究问题。然而,传统的机器学习方法无法产生稳定、准确的预测结果。在本文中,我们提出了一种基于图的集成机器学习方法,旨在通过多种方法的一致性来提高单个机器学习方法的稳定性。具体来说,我们利用有监督预测方法和无监督聚类方法,构建一种迭代方法,该方法在二部图中传播,并收敛到更稳定和准确的预测结果。大量的实验证明了我们提出的方法在预测更准确的学生成绩方面的有效性。具体来说,我们的模型在预测精度上比最好的传统机器学习算法高出14.8%。 摘要:Student performance prediction is a critical research problem to understand the students' needs, present proper learning opportunities/resources, and develop the teaching quality. However, traditional machine learning methods fail to produce stable and accurate prediction results. In this paper, we propose a graph-based ensemble machine learning method that aims to improve the stability of single machine learning methods via the consensus of multiple methods. To be specific, we leverage both supervised prediction methods and unsupervised clustering methods, build an iterative approach that propagates in a bipartite graph as well as converges to more stable and accurate prediction results. Extensive experiments demonstrate the effectiveness of our proposed method in predicting more accurate student performance. Specifically, our model outperforms the best traditional machine learning algorithms by up to 14.8% in prediction accuracy.
【7】 Bayesian Graph Contrastive Learning 标题:贝叶斯图对比学习 链接:https://arxiv.org/abs/2112.07823
作者:Arman Hasanzadeh,Mohammadreza Armandpour,Ehsan Hajiramezanali,Mingyuan Zhou,Nick Duffield,Krishna Narayanan 摘要:对比学习已成为图形结构数据自监督学习方法的关键组成部分。然而,现有的图形对比学习方法尽管取得了成功,但无法对节点表示或其下游任务进行不确定性量化,限制了其在高风险领域的应用。在本文中,我们提出了一种新的贝叶斯图对比学习方法,表明随机增强导致随机编码器。因此,与将每个节点嵌入到确定性向量的现有技术相比,我们提出的方法通过潜在空间中的分布来表示每个节点。通过学习分布表示,我们在下游图分析任务中提供了不确定性估计,并提高了预测模型的表达能力。此外,我们提出了一个贝叶斯框架来推断对比模型每个视图中的扰动概率,从而消除了对超参数调整进行计算代价高昂的搜索的需要。在几个基准数据集上,与现有的最新方法相比,我们的经验表明性能有了相当大的提高。 摘要:Contrastive learning has become a key component of self-supervised learning approaches for graph-structured data. However, despite their success, existing graph contrastive learning methods are incapable of uncertainty quantification for node representations or their downstream tasks, limiting their application in high-stakes domains. In this paper, we propose a novel Bayesian perspective of graph contrastive learning methods showing random augmentations leads to stochastic encoders. As a result, our proposed method represents each node by a distribution in the latent space in contrast to existing techniques which embed each node to a deterministic vector. By learning distributional representations, we provide uncertainty estimates in downstream graph analytics tasks and increase the expressive power of the predictive model. In addition, we propose a Bayesian framework to infer the probability of perturbations in each view of the contrastive model, eliminating the need for a computationally expensive search for hyperparameter tuning. We empirically show a considerable improvement in performance compared to existing state-of-the-art methods on several benchmark datasets.
【8】 Network Graph Based Neural Architecture Search 标题:基于网络图的神经结构搜索 链接:https://arxiv.org/abs/2112.07805
作者:Zhenhan Huang,Chunheng Jiang,Pin-Yu Chen,Jianxi Gao 备注:12 pages 摘要:神经架构搜索可以实现架构设计的自动化。尽管它取得了成功,但它的计算成本很高,无法提供有关如何设计理想体系结构的见解。在这里,我们提出了一种新的神经网络搜索方法,通过重新布线相应的图来搜索神经结构,并通过图的属性来预测结构的性能。因为我们不在整个图空间上执行机器学习,而是使用预测的体系结构性能来搜索体系结构,所以搜索过程非常高效。我们发现基于图的搜索可以很好地预测理想的体系结构。此外,我们发现图形属性可以有效地预测体系结构性能。我们的工作提出了一种搜索神经结构的新方法,并为神经结构设计提供了见解。 摘要:Neural architecture search enables automation of architecture design. Despite its success, it is computationally costly and does not provide an insight on how to design a desirable architecture. Here we propose a new way of searching neural network where we search neural architecture by rewiring the corresponding graph and predict the architecture performance by graph properties. Because we do not perform machine learning over the entire graph space and use predicted architecture performance to search architecture, the searching process is remarkably efficient. We find graph based search can give a reasonably good prediction of desirable architecture. In addition, we find graph properties that are effective to predict architecture performance. Our work proposes a new way of searching neural architecture and provides insights on neural architecture design.
【9】 A Simple But Powerful Graph Encoder for Temporal Knowledge Graph Completion 标题:一种简单但功能强大的时态知识图补全图编码器 链接:https://arxiv.org/abs/2112.07791
作者:Zifeng Ding,Yunpu Ma,Bailan He,Volker Tresp 备注:9 pages, 4 figures 摘要:虽然知识图包含各种实体的丰富语义知识和它们之间的关系信息,但时态知识图(TKG)进一步表示实体随时间的交互。为了研究如何更好地建模TKGs,自动时态知识图完成(TKGC)引起了人们极大的兴趣。最近的TKGC方法旨在整合先进的深度学习技术,例如注意机制和Transformer,以提高模型性能。然而,我们发现,与采用各种复杂模块相比,更好地利用沿时间轴的全部时间信息更为有利。在本文中,我们为TKGC提出了一种简单但功能强大的图形编码器TARGCN。TARGCN是参数有效的,它广泛地利用了来自整个时间上下文的信息。我们在三个基准数据集上进行了实验。与最先进的模型相比,我们的模型可以在GDELT数据集上实现42%以上的相对改进。同时,它的性能优于ICEWS05-15数据集上最强的基线,参数减少约18.5%。 摘要:While knowledge graphs contain rich semantic knowledge of various entities and the relational information among them, temporal knowledge graphs (TKGs) further indicate the interactions of the entities over time. To study how to better model TKGs, automatic temporal knowledge graph completion (TKGC) has gained great interest. Recent TKGC methods aim to integrate advanced deep learning techniques, e.g., attention mechanism and Transformer, to boost model performance. However, we find that compared to adopting various kinds of complex modules, it is more beneficial to better utilize the whole amount of temporal information along the time axis. In this paper, we propose a simple but powerful graph encoder TARGCN for TKGC. TARGCN is parameter-efficient, and it extensively utilizes the information from the whole temporal context. We perform experiments on three benchmark datasets. Our model can achieve a more than 42% relative improvement on GDELT dataset compared with the state-of-the-art model. Meanwhile, it outperforms the strongest baseline on ICEWS05-15 dataset with around 18.5% fewer parameters.
【10】 Efficient Dynamic Graph Representation Learning at Scale 标题:有效的大规模动态图表示学习 链接:https://arxiv.org/abs/2112.07768
作者:Xinshi Chen,Yan Zhu,Haowen Xu,Mengyang Liu,Liang Xiong,Muhan Zhang,Le Song 摘要:节点之间具有有序事件序列的动态图在电子商务和社交平台等实际工业应用中非常普遍。然而,由于数据的时间和结构依赖性以及不规则性,动态图的表示学习带来了巨大的计算挑战,使得此类模型无法部署到实际应用中。为了应对这一挑战,我们提出了一种高效的动态图学习算法(EDGE),该算法通过训练损失选择性地表达一定的时间依赖性,以提高计算的并行性。我们证明了EDGE可以扩展到具有数百万个节点和数亿个时间事件的动态图,并实现新的最新性能(SOTA)。 摘要:Dynamic graphs with ordered sequences of events between nodes are prevalent in real-world industrial applications such as e-commerce and social platforms. However, representation learning for dynamic graphs has posed great computational challenges due to the time and structure dependency and irregular nature of the data, preventing such models from being deployed to real-world applications. To tackle this challenge, we propose an efficient algorithm, Efficient Dynamic Graph lEarning (EDGE), which selectively expresses certain temporal dependency via training loss to improve the parallelism in computations. We show that EDGE can scale to dynamic graphs with millions of nodes and hundreds of millions of temporal events and achieve new state-of-the-art (SOTA) performance.
【11】 Neighborhood Random Walk Graph Sampling for Regularized Bayesian Graph Convolutional Neural Networks 标题:正则化贝叶斯图卷积神经网络的邻域随机游动图采样 链接:https://arxiv.org/abs/2112.07743
作者:Aneesh Komanduri,Justin Zhan 备注:Accepted for publication at the 20th IEEE International Conference on Machine Learning and Applications (ICMLA 2021) 摘要:在社交媒体和网络的现代时代,真实世界现象的图形表示已经成为挖掘见解的非常有用的来源。通常,我们感兴趣的是理解图形中的实体是如何相互关联的。图形神经网络(GNN)已被证明是一种非常有用的工具,用于各种图形学习任务,包括节点分类、链接预测和边缘分类。然而,在大多数这些任务中,我们正在处理的图形数据可能有噪声,并且可能包含伪边。也就是说,与底层图形结构相关的不确定性很多。最近的不确定性建模方法是使用贝叶斯框架,将图形视为随机变量,概率与模型参数相关。将贝叶斯范式引入基于图的模型,特别是用于半监督节点分类,已经证明可以产生更高的分类精度。然而,最近提出的图推理方法没有考虑图的结构。本文提出了一种新的基于邻域随机游走抽样的贝叶斯图卷积网络(BGCN-NRWS)算法,该算法使用基于马尔可夫链蒙特卡罗(MCMC)的图抽样算法,利用图结构,通过变分推理层减少过拟合,与半监督节点分类中的最新技术相比,它能产生一致的竞争性分类结果。 摘要:In the modern age of social media and networks, graph representations of real-world phenomena have become an incredibly useful source to mine insights. Often, we are interested in understanding how entities in a graph are interconnected. The Graph Neural Network (GNN) has proven to be a very useful tool in a variety of graph learning tasks including node classification, link prediction, and edge classification. However, in most of these tasks, the graph data we are working with may be noisy and may contain spurious edges. That is, there is a lot of uncertainty associated with the underlying graph structure. Recent approaches to modeling uncertainty have been to use a Bayesian framework and view the graph as a random variable with probabilities associated with model parameters. Introducing the Bayesian paradigm to graph-based models, specifically for semi-supervised node classification, has been shown to yield higher classification accuracies. However, the method of graph inference proposed in recent work does not take into account the structure of the graph. In this paper, we propose a novel algorithm called Bayesian Graph Convolutional Network using Neighborhood Random Walk Sampling (BGCN-NRWS), which uses a Markov Chain Monte Carlo (MCMC) based graph sampling algorithm utilizing graph structure, reduces overfitting by using a variational inference layer, and yields consistently competitive classification results compared to the state-of-the-art in semi-supervised node classification.
【12】 TrialGraph: Machine Intelligence Enabled Insight from Graph Modelling of Clinical Trials 标题:TrialGraph:从临床试验的图形建模中获得机器智能的洞察力 链接:https://arxiv.org/abs/2112.08211
作者:Christopher Yacoumatos,Stefano Bragaglia,Anshul Kanakia,Nils Svangård,Jonathan Mangion,Claire Donoghue,Jim Weatherall,Faisal M. Khan,Khader Shameer 备注:17 pages (Manuscript); 3 pages (Supplemental Data); 9 figures 摘要:药物研发成功的一个主要障碍是临床试验的复杂性、成本和规模。临床试验数据的详细内部结构可能使常规优化难以实现。机器学习的最新进展,特别是图形结构数据分析,有可能在改善临床试验设计方面取得重大进展。TrialGraph试图应用这些方法来产生一个概念验证框架,用于开发能够帮助药物开发并使患者受益的模型。在这项工作中,我们首先介绍了一个由 CT.gov、AACT 和 TrialTrove 数据库汇编而成的精心策划的临床试验数据集(n=1191项试验;代表100万名患者),并描述了将这些数据转换为图结构化格式的过程。然后,我们详细介绍了一系列图形机器学习算法的数学基础和实现,这些算法通常使用标准机器分类器处理嵌入在低维特征空间中的图形数据。我们对这些模型进行了训练,以根据疾病、现有医疗条件和治疗信息预测临床试验的副作用信息。MetaPath2Vec算法表现异常出色,标准逻辑回归、决策树、随机森林、支持向量和神经网络分类器的ROC-AUC分数分别为0.85、0.68、0.86、0.80和0.77。值得注意的是,当对等效阵列结构数据进行训练时,性能最好的分类器只能产生典型的ROC-AUC分数0.70。我们的工作表明,图形建模可以显著提高适当数据集的预测精度。该项目的后续版本细化了建模假设并纳入了更多的数据类型,可以在药物开发的实际应用中产生优秀的预测因子。 摘要:A major impediment to successful drug development is the complexity, cost, and scale of clinical trials. The detailed internal structure of clinical trial data can make conventional optimization difficult to achieve. Recent advances in machine learning, specifically graph-structured data analysis, have the potential to enable significant progress in improving the clinical trial design. TrialGraph seeks to apply these methodologies to produce a proof-of-concept framework for developing models which can aid drug development and benefit patients. In this work, we first introduce a curated clinical trial data set compiled from the CT.gov, AACT and TrialTrove databases (n=1191 trials; representing one million patients) and describe the conversion of this data to graph-structured formats. We then detail the mathematical basis and implementation of a selection of graph machine learning algorithms, which typically use standard machine classifiers on graph data embedded in a low-dimensional feature space. We trained these models to predict side effect information for a clinical trial given information on the disease, existing medical conditions, and treatment. The MetaPath2Vec algorithm performed exceptionally well, with standard Logistic Regression, Decision Tree, Random Forest, Support Vector, and Neural Network classifiers exhibiting typical ROC-AUC scores of 0.85, 0.68, 0.86, 0.80, and 0.77, respectively. Remarkably, the best performing classifiers could only produce typical ROC-AUC scores of 0.70 when trained on equivalent array-structured data. Our work demonstrates that graph modelling can significantly improve prediction accuracy on appropriate datasets. Successive versions of the project that refine modelling assumptions and incorporate more data types can produce excellent predictors with real-world applications in drug development.
【13】 Fast Computation of Generalized Eigenvectors for Manifold Graph Embedding 标题:流形图嵌入的广义特征向量的快速计算 链接:https://arxiv.org/abs/2112.07862
作者:Fei Chen,Gene Cheung,Xue Zhang 摘要:我们的目标是高效地计算输入图中节点的低维潜在坐标(称为图嵌入),以进行后续数据处理,如聚类。针对被解释为连续流形上均匀样本的有限图(称为流形图),我们利用现有的快速极值特征向量计算算法来快速执行。我们首先提出稀疏矩阵对$(A,B)$的广义特征值问题,其中$A = L - \mu Q + \epsilon I$是图拉普拉斯$L$和断开的两跳差分矩阵$Q$的组合。特征向量$v$最小化瑞利商$\frac{v^{\top} A v}{v^{\top} v}$,从而最小化$1$-hop邻居距离,同时最大化断开的$2$-hop邻居之间的距离,保留图结构。然后选择定义特征向量正交性的矩阵$B=\text{diag}(\{b_i\})$,以便采样域中的边界/内部节点具有相同的广义度。$N$个图节点的$K$维潜在向量即为$(A,B)$的前$K$个广义特征向量,使用LOBPCG在$\mathcal{O}(N)$时间内计算,其中$K \ll N$。实验表明,我们的嵌入速度是文献中最快的,同时对流形图产生了最好的聚类性能。 摘要:Our goal is to efficiently compute low-dimensional latent coordinates for nodes in an input graph -- known as graph embedding -- for subsequent data processing such as clustering. Focusing on finite graphs that are interpreted as uniformly samples on continuous manifolds (called manifold graphs), we leverage existing fast extreme eigenvector computation algorithms for speedy execution. We first pose a generalized eigenvalue problem for sparse matrix pair $(A,B)$, where $A = L - \mu Q + \epsilon I$ is a sum of graph Laplacian $L$ and disconnected two-hop difference matrix $Q$. Eigenvector $v$ minimizing Rayleigh quotient $\frac{v^{\top} A v}{v^{\top} v}$ thus minimizes $1$-hop neighbor distances while maximizing distances between disconnected $2$-hop neighbors, preserving graph structure. Matrix $B = \text{diag}(\{b_i\})$ that defines eigenvector orthogonality is then chosen so that boundary / interior nodes in the sampling domain have the same generalized degrees. $K$-dimensional latent vectors for the $N$ graph nodes are the first $K$ generalized eigenvectors for $(A,B)$, computed in $\mathcal{O}(N)$ using LOBPCG, where $K \ll N$. Experiments show that our embedding is among the fastest in the literature, while producing the best clustering performance for manifold graphs.
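下面是一个基于 SciPy 的假设性草图,演示摘要所述"用 LOBPCG 求稀疏矩阵对 $(A,B)$ 的前 $K$ 个广义特征向量作为图嵌入"的计算流程;其中 $A$ 的具体构造(对被还原公式 $A = L - \mu Q + \epsilon I$ 的读法、$\mu$、$\epsilon$ 的取值以及 $Q$ 的占位定义)均为本摘编所设,并非论文的精确实现。

```python
# 假设性示意:用 LOBPCG 求 (A, B) 的最小 K 个广义特征向量作为节点嵌入
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lobpcg

rng = np.random.default_rng(0)
N, K, mu, eps = 200, 8, 0.01, 1e-3

# 随机稀疏邻接矩阵 -> 图拉普拉斯 L(仅作演示)
W = sp.random(N, N, density=0.05, random_state=0)
W = W + W.T
L = sp.diags(np.asarray(W.sum(axis=1)).ravel()) - W

# "断开两跳差分矩阵" Q 在此用 L 的平方占位,仅为让代码可运行的示意
Q = L @ L

A = L - mu * Q + eps * sp.identity(N)      # 对应摘要中 A = L - μQ + εI 的读法(假设)
B = sp.diags(np.full(N, 1.0))              # 示意:B = diag({b_i}),此处取单位对角

X0 = rng.standard_normal((N, K))           # 初始子空间
eigvals, eigvecs = lobpcg(A, X0, B=B, largest=False, tol=1e-6, maxiter=500)
embedding = eigvecs                        # [N, K] 低维潜在坐标,可供后续聚类
print(eigvals.shape, embedding.shape)
```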
GAN|对抗|攻击|生成相关(6篇)
【1】 On the Convergence and Robustness of Adversarial Training 标题:论对抗性训练的收敛性和稳健性 链接:https://arxiv.org/abs/2112.08304
作者:Yisen Wang,Xingjun Ma,James Bailey,Jinfeng Yi,Bowen Zhou,Quanquan Gu 备注:ICML 2019 Long Talk. Fixing bugs in the proof of Theorem 1 摘要:提高深度神经网络(DNN)对对抗性示例的鲁棒性是安全深度学习的一个重要而富有挑战性的问题。在现有的防御技术中,基于投影梯度下降(PGD)的对抗训练是最有效的。对抗式训练解决了一个最小-最大优化问题,其中内部最大化(inner maximization)通过最大化分类损失生成对抗性示例,而外部最小化(outer minimization)通过最小化内部最大化所生成对抗性示例上的损失来求解模型参数。因此,衡量内部最大化解决程度的标准对于对抗性训练至关重要。在本文中,我们提出了这样一个标准,即约束优化的一阶平稳条件(FOSC),来定量地评估在内部最大化中发现的对抗性例子的收敛质量。对于FOSC,我们发现,为了确保更好的鲁棒性,在训练的后期阶段使用具有更好收敛质量的对抗性示例是至关重要的。然而,在早期阶段,不需要高收敛质量的对抗性示例,甚至可能导致鲁棒性差。基于这些观察,我们提出了一种动态训练策略,以逐渐提高生成的对抗性示例的收敛质量,从而显著提高对抗性训练的鲁棒性。我们的理论和实证结果表明了该方法的有效性。 摘要:Improving the robustness of deep neural networks (DNNs) to adversarial examples is an important yet challenging problem for secure deep learning. Across existing defense techniques, adversarial training with Projected Gradient Descent (PGD) is amongst the most effective. Adversarial training solves a min-max optimization problem, with the \textit{inner maximization} generating adversarial examples by maximizing the classification loss, and the \textit{outer minimization} finding model parameters by minimizing the loss on adversarial examples generated from the inner maximization. A criterion that measures how well the inner maximization is solved is therefore crucial for adversarial training. In this paper, we propose such a criterion, namely First-Order Stationary Condition for constrained optimization (FOSC), to quantitatively evaluate the convergence quality of adversarial examples found in the inner maximization. With FOSC, we find that to ensure better robustness, it is essential to use adversarial examples with better convergence quality at the \textit{later stages} of training. Yet at the early stages, high convergence quality adversarial examples are not necessary and may even lead to poor robustness. Based on these observations, we propose a \textit{dynamic} training strategy to gradually increase the convergence quality of the generated adversarial examples, which significantly improves the robustness of adversarial training. Our theoretical and empirical results show the effectiveness of the proposed method.
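下面给出 PGD 内部最大化与外部最小化的一个通用 PyTorch 草图(假设性示意,未实现论文中 FOSC 准则的具体计算,超参数取值为常见默认值):在 $\ell_\infty$ 球内沿损失梯度符号方向迭代更新并投影,得到的对抗样本再用于一步普通的参数更新。

```python
# 假设性示意:L_inf 约束下的 PGD 内部最大化 + 外部最小化的单步训练
import torch
import torch.nn.functional as F

def pgd_inner_max(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """在 ||x_adv - x||_inf <= eps 内最大化分类损失,返回对抗样本(假设输入取值在 [0,1])。"""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)   # 随机起点
    x_adv = x_adv.clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()            # 梯度上升一步
            x_adv = x + (x_adv - x).clamp(-eps, eps)       # 投影回 eps-球
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y, **pgd_kwargs):
    """外部最小化:在内部最大化得到的对抗样本上更新模型参数。"""
    x_adv = pgd_inner_max(model, x, y, **pgd_kwargs)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```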
【2】 Generative Adversarial Networks for Data Generation in Structural Health Monitoring 标题:用于结构健康监测数据生成的产生式对抗性网络 链接:https://arxiv.org/abs/2112.08196
作者:Furkan Luleci,F. Necati Catbas,Onur Avci 摘要:结构健康监测(SHM)一直受益于数据科学领域的进步。各种类型的人工智能(AI)方法已被用于土木结构的评估和评估。在人工智能中,机器学习(ML)和深度学习(DL)算法需要大量的数据集进行训练;特别是,使用的数据DL模型越多,其输出就越好。然而,在SHM应用中,通过传感器从土木结构中收集数据成本高昂,获取有用数据(损伤相关数据)具有挑战性。在本文中,使用梯度惩罚的1-D Wasserstein损失深度卷积生成对抗网络(1-D WDCGAN-GP)生成与输入相似的损伤相关振动数据集。为了基于振动的损伤诊断,建立了一个1-D深卷积神经网络(1-D DCNN),并在真实数据集和生成的数据集上进行了训练和测试。两个数据集上的一维DCNN分类结果非常相似。本文介绍的工作表明,对于基于DL或ML的损伤诊断中数据不足的情况,1-D WDCGAN-GP可以成功生成用于模型训练的数据。关键词:一维生成对抗网络(GAN)、深卷积生成对抗网络(DCGAN)、带梯度惩罚的Wasserstein生成对抗网络(WGAN-GP)、一维卷积神经网络(CNN)、结构健康监测(SHM)、结构损伤诊断、结构损伤检测 摘要:Structural Health Monitoring (SHM) has been continuously benefiting from the advancements in the field of data science. Various types of Artificial Intelligence (AI) methods have been utilized for the assessment and evaluation of civil structures. In AI, Machine Learning (ML) and Deep Learning (DL) algorithms require plenty of datasets to train; particularly, the more data DL models are trained with, the better output it yields. Yet, in SHM applications, collecting data from civil structures through sensors is expensive and obtaining useful data (damage associated data) is challenging. In this paper, 1-D Wasserstein loss Deep Convolutional Generative Adversarial Networks using Gradient Penalty (1-D WDCGAN-GP) is utilized to generate damage associated vibration datasets that are similar to the input. For the purpose of vibration-based damage diagnostics, a 1-D Deep Convolutional Neural Network (1-D DCNN) is built, trained, and tested on both real and generated datasets. The classification results from the 1-D DCNN on both datasets resulted to be very similar to each other. The presented work in this paper shows that for the cases of insufficient data in DL or ML-based damage diagnostics, 1-D WDCGAN-GP can successfully generate data for the model to be trained on. Keywords: 1-D Generative Adversarial Networks (GAN), Deep Convolutional Generative Adversarial Networks (DCGAN), Wasserstein Generative Adversarial Networks with Gradient Penalty (WGAN-GP), 1-D Convolutional Neural Networks (CNN), Structural Health Monitoring (SHM), Structural Damage Diagnostics, Structural Damage Detection
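摘要中的 1-D WDCGAN-GP 以带梯度惩罚的 Wasserstein 损失训练判别器;下面是梯度惩罚项的一个通用 PyTorch 草图(假设性示意,网络结构、样本形状与超参数均非论文原文):在真实与生成的一维振动序列之间随机插值,并惩罚判别器在插值点处梯度范数偏离 1 的程度。

```python
# 假设性示意:WGAN-GP 的梯度惩罚,样本形状为 [batch, 1, length] 的一维振动信号
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    batch = real.size(0)
    alpha = torch.rand(batch, 1, 1, device=real.device)        # 每个样本一个插值系数
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(outputs=scores, inputs=interp,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True)[0]
    grad_norm = grads.view(batch, -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1) ** 2).mean()

def critic_loss(critic, real, fake):
    """判别器(critic)单步损失:Wasserstein 项 + 梯度惩罚。"""
    return critic(fake).mean() - critic(real).mean() + gradient_penalty(critic, real, fake)
```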
【3】 Generative Adversarial Networks for Labelled Vibration Data Generation 标题:用于标号振动数据生成的生成式对抗性网络 链接:https://arxiv.org/abs/2112.08195
作者:Furkan Luleci,F. Necati Catbas,Onur Avci 摘要:随着结构健康监测(SHM)的实施,土木结构的运行模态分析对于工程结构的评估和评估变得越来越重要。机器学习(ML)和深度学习(DL)算法在过去几十年中已用于土木结构的结构损伤诊断。虽然从土木结构中收集振动数据对于未受损和受损情况来说都是一项具有挑战性且昂贵的任务,但在本文中,作者介绍了基于深度卷积神经网络(DCNN)的生成性对抗网络(GAN),并使用Wasserstein距离生成用于结构损伤诊断目的的人工标记数据。作者将开发的模型命名为1D W-DCGAN,并成功生成与输入非常相似的振动数据。本文提出的方法将为SHM领域未来的许多应用奠定基础。 摘要:As Structural Health Monitoring (SHM) being implemented more over the years, the use of operational modal analysis of civil structures has become more significant for the assessment and evaluation of engineering structures. Machine Learning (ML) and Deep Learning (DL) algorithms have been in use for structural damage diagnostics of civil structures in the last couple of decades. While collecting vibration data from civil structures is a challenging and expensive task for both undamaged and damaged cases, in this paper, the authors are introducing Generative Adversarial Networks (GAN) that is built on the Deep Convolutional Neural Network (DCNN) and using Wasserstein Distance for generating artificial labelled data to be used for structural damage diagnostic purposes. The authors named the developed model 1D W-DCGAN and successfully generated vibration data which is very similar to the input. The methodology presented in this paper will pave the way for vibration data generation for numerous future applications in the SHM domain.
【4】 Leveraging Image-based Generative Adversarial Networks for Time Series Generation 标题:利用基于图像的生成性对抗性网络生成时间序列 链接:https://arxiv.org/abs/2112.08060
作者:Justin Hellermann,Stefan Lessmann 摘要:生成模型在采样质量、多样性和特征解耦方面成功地合成了图像数据。时间序列的生成模型则缺乏这些优点,原因在于缺少一种既能捕获时间动态、又可反演以便采样的表示。本文提出了跨期收益图(intertemporal return plot, IRP)表示,以便于使用基于图像的生成对抗网络生成时间序列。事实证明,该表示法在捕捉时间序列特征方面是有效的,与其他表示法相比,它具有可逆性和尺度不变性。经验基准证实了这些特征,并证明IRP使现成的带有梯度惩罚的Wasserstein GAN能够对真实时间序列进行采样,其性能优于基于RNN的专门GAN,同时降低了模型复杂性。 摘要:Generative models synthesize image data with great success regarding sampling quality, diversity and feature disentanglement. Generative models for time series lack these benefits due to a missing representation, which captures temporal dynamics and allows inversion for sampling. The paper proposes the intertemporal return plot (IRP) representation to facilitate the use of image-based generative adversarial networks for time series generation. The representation proves effective in capturing time series characteristics and, compared to alternative representations, benefits from invertibility and scale-invariance. Empirical benchmarks confirm these features and demonstrate that the IRP enables an off-the-shelf Wasserstein GAN with gradient penalty to sample realistic time series, which outperform a specialized RNN-based GAN, while simultaneously reducing model complexity.
【5】 Efficient Geometry-aware 3D Generative Adversarial Networks 标题:一种高效的几何感知3D生成性对抗网络 链接:https://arxiv.org/abs/2112.07945
作者:Eric R. Chan,Connor Z. Lin,Matthew A. Chan,Koki Nagano,Boxiao Pan,Shalini De Mello,Orazio Gallo,Leonidas Guibas,Jonathan Tremblay,Sameh Khamis,Tero Karras,Gordon Wetzstein 备注:Project page: this https URL 摘要:仅使用单视图2D照片集无监督生成高质量多视图一致图像和3D形状一直是一个长期的挑战。现有的3D GAN要么是计算密集型的,要么采用了非3D一致的近似;前者限制了生成图像的质量和分辨率,后者对多视图一致性和形状质量产生不利影响。在这项工作中,我们在不过度依赖这些近似的情况下提高了3D GANs的计算效率和图像质量。为此,我们引入了一种表达能力强的混合显式-隐式网络架构,它与其他设计选择一起,不仅实时合成高分辨率多视图一致性图像,而且还生成高质量的三维几何图形。通过解耦特征生成和神经渲染,我们的框架能够利用最先进的2D CNN生成器,如StyleGAN2,并继承它们的效率和表达能力。我们用FFHQ和AFHQ猫演示了最先进的3D感知合成,以及其他实验。 摘要:Unsupervised generation of high-quality multi-view-consistent images and 3D shapes using only collections of single-view 2D photographs has been a long-standing challenge. Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent; the former limits quality and resolution of the generated images and the latter adversely affects multi-view consistency and shape quality. In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations. For this purpose, we introduce an expressive hybrid explicit-implicit network architecture that, together with other design choices, synthesizes not only high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry. By decoupling feature generation and neural rendering, our framework is able to leverage state-of-the-art 2D CNN generators, such as StyleGAN2, and inherit their efficiency and expressiveness. We demonstrate state-of-the-art 3D-aware synthesis with FFHQ and AFHQ Cats, among other experiments.
【6】 Tackling the Generative Learning Trilemma with Denoising Diffusion GANs 标题:用去噪扩散Gans方法解决生成性学习的三难问题 链接:https://arxiv.org/abs/2112.07804
作者:Zhisheng Xiao,Karsten Kreis,Arash Vahdat 摘要:在过去的十年中,已经开发出了各种各样的深层生成模型。然而,这些模型往往难以同时满足三个关键要求,包括:高样本质量、模式覆盖率和快速采样。我们将这些需求所带来的挑战称为生成性学习三重困境,因为现有的模型经常用其中的一些来换取另一些。特别是,去噪扩散模型显示了令人印象深刻的样本质量和多样性,但其昂贵的采样尚不允许在许多实际应用中应用。在本文中,我们认为这些模型中的慢采样基本上归因于去噪步骤中的高斯假设,这仅适用于小步长。为了实现大步长去噪,从而减少去噪步骤的总数,我们建议使用复杂的多峰分布对去噪分布进行建模。我们引入去噪扩散生成对抗网络(去噪扩散GAN),该网络使用多模态条件GAN对每个去噪步骤进行建模。通过广泛的评估,我们表明,去噪扩散GAN获得了与原始扩散模型相当的样本质量和多样性,同时在CIFAR-10数据集上比原始扩散模型快2000倍。与传统的GANs相比,我们的模型具有更好的模式覆盖率和样本多样性。据我们所知,去噪扩散GAN是第一个将扩散模型中的采样成本降低到能够以较低成本应用于实际应用的程度的模型。项目页面和代码:https://nvlabs.github.io/denoising-diffusion-gan 摘要:A wide variety of deep generative models has been developed in the past decade. Yet, these models often struggle with simultaneously addressing three key requirements including: high sample quality, mode coverage, and fast sampling. We call the challenge imposed by these requirements the generative learning trilemma, as the existing models often trade some of them for others. Particularly, denoising diffusion models have shown impressive sample quality and diversity, but their expensive sampling does not yet allow them to be applied in many real-world applications. In this paper, we argue that slow sampling in these models is fundamentally attributed to the Gaussian assumption in the denoising step which is justified only for small step sizes. To enable denoising with large steps, and hence, to reduce the total number of denoising steps, we propose to model the denoising distribution using a complex multimodal distribution. We introduce denoising diffusion generative adversarial networks (denoising diffusion GANs) that model each denoising step using a multimodal conditional GAN. Through extensive evaluations, we show that denoising diffusion GANs obtain sample quality and diversity competitive with original diffusion models while being 2000$\times$ faster on the CIFAR-10 dataset. Compared to traditional GANs, our model exhibits better mode coverage and sample diversity. To the best of our knowledge, denoising diffusion GAN is the first model that reduces sampling cost in diffusion models to an extent that allows them to be applied to real-world applications inexpensively. Project page and code: https://nvlabs.github.io/denoising-diffusion-gan
半/弱/无/有监督|不确定性|主动学习(6篇)
【1】 Estimating Uncertainty For Vehicle Motion Prediction on Yandex Shifts Dataset 标题:基于Yandex Shifts数据集的车辆运动预测不确定性估计 链接:https://arxiv.org/abs/2112.08355
作者:Alexey Pustynnikov,Dmitry Eremeev 备注:Bayesian Deep Learning Workshop, NeurIPS 2021 摘要:周围智能体的运动预测是自主驾驶中的一项重要任务,因为它与驾驶员的安全密切相关。Shifts挑战赛的车辆运动预测(VMP)赛道重点在于开发对分布偏移具有鲁棒性、且能够度量自身预测不确定性的模型。在这项工作中,我们提出的方法显著改进了所提供的基准,并在排行榜上排名第二。 摘要:Motion prediction of surrounding agents is an important task in context of autonomous driving since it is closely related to driver's safety. Vehicle Motion Prediction (VMP) track of Shifts Challenge focuses on developing models which are robust to distributional shift and able to measure uncertainty of their predictions. In this work we present the approach that significantly improved provided benchmark and took 2nd place on the leaderboard.
【2】 Improving Self-supervised Learning with Automated Unsupervised Outlier Arbitration 标题:利用自动无监督孤立点仲裁改进自监督学习 链接:https://arxiv.org/abs/2112.08132
作者:Yu Wang,Jingyang Lin,Jingjing Zou,Yingwei Pan,Ting Yao,Tao Mei 备注:NeurIPS 2021; Code is publicly available at: this https URL 摘要:我们的工作揭示了现有主流自监督学习方法的结构性缺陷。虽然自监督学习框架通常认为流行的完美实例级不变性假设是理所当然的,但我们仔细研究了其背后的陷阱。特别是,我们认为,现有的用于生成多个积极观点的增强管道自然会引入分布外(OOD)样本,从而破坏下游任务的学习。对输入产生不同的积极增强并不总是有利于下游任务。为了克服这个固有的缺陷,我们引入了一个轻量级的潜在变量模型UOTA,针对自监督学习的视图采样问题。UOTA自适应搜索最重要的采样区域以生成视图,并为离群点鲁棒自监督学习方法提供可行的选择。我们的方法直接推广到许多主流的自监督学习方法,不管损失的性质是否是对比的。我们的经验表明,UOTA比最先进的自我监督范式具有明显的优势,这很好地证明了现有方法中嵌入的OOD样本问题的存在。特别是,我们从理论上证明了该方案的优点归结为保证估计方差和偏差减少。代码可从以下网址获取:https://github.com/ssl-codelab/uota. 摘要:Our work reveals a structured shortcoming of the existing mainstream self-supervised learning methods. Whereas self-supervised learning frameworks usually take the prevailing perfect instance level invariance hypothesis for granted, we carefully investigate the pitfalls behind. Particularly, we argue that the existing augmentation pipeline for generating multiple positive views naturally introduces out-of-distribution (OOD) samples that undermine the learning of the downstream tasks. Generating diverse positive augmentations on the input does not always pay off in benefiting downstream tasks. To overcome this inherent deficiency, we introduce a lightweight latent variable model UOTA, targeting the view sampling issue for self-supervised learning. UOTA adaptively searches for the most important sampling region to produce views, and provides viable choice for outlier-robust self-supervised learning approaches. Our method directly generalizes to many mainstream self-supervised learning approaches, regardless of the loss's nature contrastive or not. We empirically show UOTA's advantage over the state-of-the-art self-supervised paradigms with evident margin, which well justifies the existence of the OOD sample issue embedded in the existing approaches. Especially, we theoretically prove that the merits of the proposal boil down to guaranteed estimator variance and bias reduction. Code is available: at https://github.com/ssl-codelab/uota.
【3】 Towards General and Efficient Active Learning 标题:走向全面有效的主动学习 链接:https://arxiv.org/abs/2112.07963
作者:Yichen Xie,Masayoshi Tomizuka,Wei Zhan 摘要:主动学习旨在选择信息量最大的样本,以利用有限的注释预算。大多数现有工作都遵循一条繁琐的管道,分别在每个数据集上重复耗时的模型训练和批处理数据选择多次。本文提出了一种新的通用有效的主动学习(GEAL)方法,以挑战这一现状。利用一个在大数据集上预先训练的公共可用模型,我们的方法可以在不同数据集上进行数据选择过程,对同一模型进行单次推理。为了捕获图像内部的细微局部信息,我们提出了一种可以从预先训练的网络的中间特征中轻松提取的知识簇。与麻烦的批量选择策略不同,所有数据样本都是通过在细粒度知识集群级别执行K-中心贪婪一次性选择的。整个过程只需要单通道模型推理,无需训练或监督,使我们的方法在时间复杂度方面明显优于现有技术数百倍。大量的实验广泛地证明了我们的方法在目标检测、语义分割、深度估计和图像分类方面的良好性能。 摘要:Active learning aims to select the most informative samples to exploit limited annotation budgets. Most existing work follows a cumbersome pipeline by repeating the time-consuming model training and batch data selection multiple times on each dataset separately. We challenge this status quo by proposing a novel general and efficient active learning (GEAL) method in this paper. Utilizing a publicly available model pre-trained on a large dataset, our method can conduct data selection processes on different datasets with a single-pass inference of the same model. To capture the subtle local information inside images, we propose knowledge clusters that are easily extracted from the intermediate features of the pre-trained network. Instead of the troublesome batch selection strategy, all data samples are selected in one go by performing K-Center-Greedy in the fine-grained knowledge cluster level. The entire procedure only requires single-pass model inference without training or supervision, making our method notably superior to prior arts in terms of time complexity by up to hundreds of times. Extensive experiments widely demonstrate the promising performance of our method on object detection, semantic segmentation, depth estimation, and image classification.
【4】 Robust Depth Completion with Uncertainty-Driven Loss Functions 标题:具有不确定性驱动损失函数的鲁棒深度补全 链接:https://arxiv.org/abs/2112.07895
作者:Yufan Zhu,Weisheng Dong,Leida Li,Jinjian Wu,Xin Li,Guangming Shi 备注:accepted by AAAI2022 摘要:从稀疏激光雷达扫描中恢复密集深度图像是一项具有挑战性的任务。尽管颜色引导的稀疏到稠密深度补全方法非常流行,但它们在优化过程中平等地对待像素,忽略了稀疏深度图中的不均匀分布特征和合成地面真实值中累积的异常值。在这项工作中,我们引入了不确定性驱动的损失函数,以提高深度补全的鲁棒性,并处理深度补全中的不确定性。具体来说,我们提出了一个明确的不确定性公式,用于基于Jeffrey先验的鲁棒深度补全。引入了参数化的不确定性驱动损失,并将其转化为对噪声或缺失数据具有鲁棒性的新损失函数。同时,我们提出了一个多尺度联合预测模型,可以同时预测深度图和不确定性图。估计的不确定性映射还用于对具有高不确定性的像素执行自适应预测,从而产生用于细化补全结果的残差映射。我们的方法已经在KITTI深度补全基准上进行了测试,并在MAE、IMAE和IRMSE度量方面实现了最先进的鲁棒性性能。 摘要:Recovering a dense depth image from sparse LiDAR scans is a challenging task. Despite the popularity of color-guided methods for sparse-to-dense depth completion, they treated pixels equally during optimization, ignoring the uneven distribution characteristics in the sparse depth map and the accumulated outliers in the synthesized ground truth. In this work, we introduce uncertainty-driven loss functions to improve the robustness of depth completion and handle the uncertainty in depth completion. Specifically, we propose an explicit uncertainty formulation for robust depth completion with Jeffrey's prior. A parametric uncertain-driven loss is introduced and translated to new loss functions that are robust to noisy or missing data. Meanwhile, we propose a multiscale joint prediction model that can simultaneously predict depth and uncertainty maps. The estimated uncertainty map is also used to perform adaptive prediction on the pixels with high uncertainty, leading to a residual map for refining the completion results. Our method has been tested on KITTI Depth Completion Benchmark and achieved the state-of-the-art robustness performance in terms of MAE, IMAE, and IRMSE metrics.
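作为示意,下面给出一个通用的"不确定性加权回归损失"的 PyTorch 草图(Kendall & Gal 式的异方差不确定性损失,仅用于说明"同时预测深度图与不确定性图"这一常见做法,并非论文中基于 Jeffrey 先验的具体公式):网络额外输出逐像素的 log 方差,高不确定性像素的残差被自动降权。

```python
# 假设性示意:逐像素不确定性加权的深度回归损失(非论文原始公式)
import torch

def uncertainty_weighted_l1(pred_depth, log_var, gt_depth, valid_mask):
    """pred_depth/log_var/gt_depth: [B,1,H,W];valid_mask 标记有真值(LiDAR)的像素。"""
    residual = (pred_depth - gt_depth).abs()
    # 残差按不确定性缩放,并加上正则项防止网络把不确定性推向无穷大
    loss = residual * torch.exp(-log_var) + log_var
    return loss[valid_mask].mean()

# 用法示意:稀疏监督(只有约 5% 像素有真值)
B, H, W = 2, 64, 64
pred = torch.rand(B, 1, H, W)
logv = torch.zeros(B, 1, H, W, requires_grad=True)
gt = torch.rand(B, 1, H, W)
mask = torch.rand(B, 1, H, W) < 0.05
print(uncertainty_weighted_l1(pred, logv, gt, mask))
```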
【5】 Gaze Estimation with Eye Region Segmentation and Self-Supervised Multistream Learning 标题:基于眼区分割和自监督多数据流学习的视线估计 链接:https://arxiv.org/abs/2112.07878
作者:Zunayed Mahmud,Paul Hungler,Ali Etemad 备注:5 pages, 1 figure, 3 tables, Accepted in AAAI-22 Workshop on Human-Centric Self-Supervised Learning 摘要:我们提出了一种新的多流网络,用于学习鲁棒的眼部表征以进行视线估计。我们首先使用模拟器创建一个合成数据集,其中包含详细描述可见眼球和虹膜的眼睛区域遮罩。然后,我们使用U-Net类型的模型执行眼睛区域分割,我们随后使用该模型为真实世界的眼睛图像生成眼睛区域遮罩。接下来,我们使用自监督对比学习在真实域中预训练眼睛图像编码器,以学习广义眼睛表征。最后,在我们的多流框架中,这个预训练的眼睛编码器,以及另外两个用于可见眼球区域和虹膜的编码器被并行使用,以从真实世界的图像中提取用于凝视估计的显著特征。我们在两种不同的评估设置下在EYEDIAP数据集上演示了我们的方法的性能,并实现了最先进的结果,优于该数据集上的所有现有基准。我们还进行了额外的实验,以验证我们的自监督网络对于用于训练的不同数量的标记数据的鲁棒性。 摘要:We present a novel multistream network that learns robust eye representations for gaze estimation. We first create a synthetic dataset containing eye region masks detailing the visible eyeball and iris using a simulator. We then perform eye region segmentation with a U-Net type model which we later use to generate eye region masks for real-world eye images. Next, we pretrain an eye image encoder in the real domain with self-supervised contrastive learning to learn generalized eye representations. Finally, this pretrained eye encoder, along with two additional encoders for visible eyeball region and iris, are used in parallel in our multistream framework to extract salient features for gaze estimation from real-world images. We demonstrate the performance of our method on the EYEDIAP dataset in two different evaluation settings and achieve state-of-the-art results, outperforming all the existing benchmarks on this dataset. We also conduct additional experiments to validate the robustness of our self-supervised network with respect to different amounts of labeled data used for training.
【6】 Deciphering antibody affinity maturation with language models and weakly supervised learning 标题:用语言模型和弱监督学习破译抗体亲和力成熟 链接:https://arxiv.org/abs/2112.07782
作者:Jeffrey A. Ruffolo,Jeffrey J. Gray,Jeremias Sulam 备注:Presented at Machine Learning for Structural Biology Workshop, NeurIPS 2021 摘要:为了应对病原体,适应性免疫系统产生特异性抗体,结合并中和外来抗原。了解个体免疫组库(immune repertoire)的组成可以深入了解这一过程并揭示潜在的治疗抗体。在这项工作中,我们探索抗体特异性语言模型的应用,以帮助理解免疫组库。我们介绍AntiBERTy,一种基于558M自然抗体序列训练的语言模型。我们发现,在免疫组库内,我们的模型将抗体聚类成类似亲和力成熟的轨迹。重要的是,我们证明了在多实例学习框架下训练以预测高度冗余序列的模型能够识别该过程中的关键结合残基。随着进一步的发展,本文提出的方法将为仅凭免疫组库序列理解抗原结合提供新的见解。 摘要:In response to pathogens, the adaptive immune system generates specific antibodies that bind and neutralize foreign antigens. Understanding the composition of an individual's immune repertoire can provide insights into this process and reveal potential therapeutic antibodies. In this work, we explore the application of antibody-specific language models to aid understanding of immune repertoires. We introduce AntiBERTy, a language model trained on 558M natural antibody sequences. We find that within repertoires, our model clusters antibodies into trajectories resembling affinity maturation. Importantly, we show that models trained to predict highly redundant sequences under a multiple instance learning framework identify key binding residues in the process. With further development, the methods presented here will provide new insights into antigen binding from repertoire sequences alone.
迁移|Zero/Few/One-Shot|自适应(6篇)
【1】 Hierarchical Variational Memory for Few-shot Learning Across Domains 标题:跨域少发学习的分层变分记忆 链接:https://arxiv.org/abs/2112.08181
作者:Yingjun Du,Xiantong Zhen,Ling Shao,Cees G. M. Snoek 备注:17 pages, 5 figures 摘要:神经记忆只需少量训练样本,即可快速适应新任务。现有的内存模型只存储最后一层的特征,这在训练和测试分布之间存在域转移的情况下不能很好地推广。我们提出了一种分层替代方案,以不同的语义级别存储特征,而不是依赖于平面内存。我们引入了一个分层原型模型,其中原型的每一层都从分层内存中获取相应的信息。该模型能够根据领域转移的情况灵活地依赖不同语义层次的特征。我们通过一个新衍生的分层变分推理框架对模型进行元学习,其中分层存储和原型被联合优化。为了探索和利用不同语义层次的重要性,我们进一步建议以数据驱动的方式在每个层次学习与原型相关的权重,这使得模型能够自适应地选择最具普遍性的特征。我们进行了彻底的烧蚀研究,以证明模型中每个组件的有效性。最新的跨域性能和传统Few-Shot分类的竞争性能进一步证实了分层变分记忆的优势。 摘要:Neural memory enables fast adaptation to new tasks with just a few training samples. Existing memory models store features only from the single last layer, which does not generalize well in presence of a domain shift between training and test distributions. Rather than relying on a flat memory, we propose a hierarchical alternative that stores features at different semantic levels. We introduce a hierarchical prototype model, where each level of the prototype fetches corresponding information from the hierarchical memory. The model is endowed with the ability to flexibly rely on features at different semantic levels if the domain shift circumstances so demand. We meta-learn the model by a newly derived hierarchical variational inference framework, where hierarchical memory and prototypes are jointly optimized. To explore and exploit the importance of different semantic levels, we further propose to learn the weights associated with the prototype at each level in a data-driven way, which enables the model to adaptively choose the most generalizable features. We conduct thorough ablation studies to demonstrate the effectiveness of each component in our model. The new state-of-the-art performance on cross-domain and competitive performance on traditional few-shot classification further substantiates the benefit of hierarchical variational memory.
【2】 AMSER: Adaptive Multi-modal Sensing for Energy Efficient and Resilient eHealth Systems 标题:AMSER:适用于高能效和高弹性电子健康系统的自适应多模式感知 链接:https://arxiv.org/abs/2112.08176
作者:Emad Kasaeyan Naeini,Sina Shahhosseini,Anil Kanduri,Pasi Liljeberg,Amir M. Rahmani,Nikil Dutt 摘要:电子健康系统通过持续监测生理和环境数据,为用户提供关键的数字医疗保健和健康服务。eHealth应用程序使用多模式机器学习内核来分析来自不同传感器模式的数据并自动化决策。感官数据采集过程中的噪声输入和运动伪影会影响i)电子健康服务的预测准确性和弹性,以及ii)处理垃圾数据的能效。监测原始感官输入,以识别和删除噪声模式中的数据和特征,可以提高预测精度和能源效率。我们提出了一个用于多模式电子健康应用的闭环监控框架AMSER,该框架通过i)监控输入模态,ii)分析原始输入并有选择地丢弃噪声数据和特征,从而缓解"垃圾进、垃圾出"问题,以及iii)选择与所配置的数据和特征向量相匹配的机器学习模型,来提高预测精度和能源效率。我们评估了我们的AMSER方法,通过不同传感器模式产生的不同水平和类型的噪声成分,使用疼痛评估和压力监测的多模式电子健康应用。与最先进的多模式监测应用相比,我们的方法在预测精度上提高了多达22%,在传感阶段能耗降低了5.6倍。 摘要:eHealth systems deliver critical digital healthcare and wellness services for users by continuously monitoring physiological and contextual data. eHealth applications use multi-modal machine learning kernels to analyze data from different sensor modalities and automate decision-making. Noisy inputs and motion artifacts during sensory data acquisition affect the i) prediction accuracy and resilience of eHealth services and ii) energy efficiency in processing garbage data. Monitoring raw sensory inputs to identify and drop data and features from noisy modalities can improve prediction accuracy and energy efficiency. We propose a closed-loop monitoring and control framework for multi-modal eHealth applications, AMSER, that can mitigate garbage-in garbage-out by i) monitoring input modalities, ii) analyzing raw input to selectively drop noisy data and features, and iii) choosing appropriate machine learning models that fit the configured data and feature vector - to improve prediction accuracy and energy efficiency. We evaluate our AMSER approach using multi-modal eHealth applications of pain assessment and stress monitoring over different levels and types of noisy components incurred via different sensor modalities. Our approach achieves up to 22% improvement in prediction accuracy and 5.6$\times$ energy consumption reduction in the sensing phase against the state-of-the-art multi-modal monitoring application.
【3】 Chimpanzee voice prints? Insights from transfer learning experiments from human voices 标题:黑猩猩的声纹?来自人声的迁移学习实验的启示 链接:https://arxiv.org/abs/2112.08165
作者:Mael Leroux,Orestes Gutierrez Al-Khudhairy,Nicolas Perony,Simon W. Townsend 摘要:在动物界,个体声音差异无处不在。在人类中,这些差异遍及整个声乐曲目,并构成“声纹”。类人猿是我们现存的近亲,在特定的呼叫类型中拥有个体特征,但对其形成独特声纹的可能性研究甚少。这部分归因于从小型数据集中提取有意义特征的局限性。机器学习的进步突出了传统声学特征的另一种选择,即预先训练的学习提取器。在这里,我们提出了一种基于这些发展的方法:利用一个基于对10000多个人类声纹进行训练的深层神经网络的特征提取器,提供一个信息空间,在这个空间中我们可以识别黑猩猩的声纹。我们将我们的结果与通过使用传统声学特征获得的结果进行比较,并讨论我们的方法的益处以及我们的发现对于非人类动物“声纹”识别的意义。 摘要:Individual vocal differences are ubiquitous in the animal kingdom. In humans, these differences pervade the entire vocal repertoire and constitute a "voice print". Apes, our closest-living relatives, possess individual signatures within specific call types, but the potential for a unique voice print has been little investigated. This is partially attributed to the limitations associated with extracting meaningful features from small data sets. Advances in machine learning have highlighted an alternative to traditional acoustic features, namely pre-trained learnt extractors. Here, we present an approach building on these developments: leveraging a feature extractor based on a deep neural network trained on over 10,000 human voice prints to provide an informative space over which we identify chimpanzee voice prints. We compare our results with those obtained by using traditional acoustic features and discuss the benefits of our methodology and the significance of our findings for the identification of "voice prints" in non-human animals.
【4】 MissMarple : A Novel Socio-inspired Feature-transfer Learning Deep Network for Image Splicing Detection 标题:MissMarple:一种新颖的基于社会启发的特征迁移学习深度网络图像拼接检测方法 链接:https://arxiv.org/abs/2112.08018
作者:Angelina L. Gokhale,Dhanya Pramod,Sudeep D. Thepade,Ravi Kulkarni 备注:27 pages, 6 figures and 15 tables 摘要:在本文中,我们提出了一种新的社会启发卷积神经网络(CNN)图像拼接检测深度学习模型。基于从粗拼接图像区域的检测中学习可以提高视觉上不可察觉的精细拼接图像伪造的检测的前提,所提出的模型称为MissMarple,是一个包含特征转移学习的双CNN网络。使用基准数据集(如Columbia Splicing、WildWeb、DSO1)和一个名为AbhAS的拟议数据集(包含真实拼接伪造)对拟议模型进行训练和测试,结果表明,与现有深度学习模型相比,检测精度有所提高。 摘要:In this paper we propose a novel socio-inspired convolutional neural network (CNN) deep learning model for image splicing detection. Based on the premise that learning from the detection of coarsely spliced image regions can improve the detection of visually imperceptible finely spliced image forgeries, the proposed model referred to as, MissMarple, is a twin CNN network involving feature-transfer learning. Results obtained from training and testing the proposed model using the benchmark datasets like Columbia splicing, WildWeb, DSO1 and a proposed dataset titled AbhAS consisting of realistic splicing forgeries revealed improvement in detection accuracy over the existing deep learning models.
【5】 Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data 标题:通过基于查询的弱标签数据学习实现零样本音源分离 链接:https://arxiv.org/abs/2112.07891
作者:Ke Chen,Xingjian Du,Bilei Zhu,Zejun Ma,Taylor Berg-kirkpatrick,Shlomo Dubnov 备注:9 pages, 3 figures, 5 tables, preprint version for Association for the Advancement of Artificial Intelligence Conference, AAAI 2022 摘要:将音频分离为不同声源的深度学习技术面临若干挑战。标准体系结构要求针对不同类型的音频源训练不同的模型。尽管一些通用分离器采用单一模型来定位多个源,但它们很难推广到不可见源。在本文中,我们提出了一个三分量管道来从一个大型但标记较弱的数据集:AudioSet中训练通用音频源分离器。首先,我们提出了一个基于Transformer的声音事件检测系统,用于处理弱标记的训练数据。其次,我们设计了一个基于查询的音频分离模型,该模型利用这些数据进行模型训练。第三,我们设计了一个潜在的嵌入处理器来对指定音频目标进行分离的查询进行编码,从而实现Zero-Shot泛化。我们的方法使用单一模型来分离多种声音类型的源,并且仅依赖弱标记数据进行训练。此外,建议的音频分离器可用于Zero-Shot设置,学习分离训练中从未见过的音频源类型。为了评估分离性能,我们在MUSDB18上测试了我们的模型,同时在不相交的AudioSet上进行训练。我们通过对训练中保留的音频源类型进行另一个实验,进一步验证了零样本性能。在这两种情况下,该模型的源失真比(SDR)性能与当前监督模型相当。 摘要:Deep learning techniques for separating audio into different sound sources face several challenges. Standard architectures require training separate models for different types of audio sources. Although some universal separators employ a single model to target multiple sources, they have difficulty generalizing to unseen sources. In this paper, we propose a three-component pipeline to train a universal audio source separator from a large, but weakly-labeled dataset: AudioSet. First, we propose a transformer-based sound event detection system for processing weakly-labeled training data. Second, we devise a query-based audio separation model that leverages this data for model training. Third, we design a latent embedding processor to encode queries that specify audio targets for separation, allowing for zero-shot generalization. Our approach uses a single model for source separation of multiple sound types, and relies solely on weakly-labeled data for training. In addition, the proposed audio separator can be used in a zero-shot setting, learning to separate types of audio sources that were never seen in training. To evaluate the separation performance, we test our model on MUSDB18, while training on the disjoint AudioSet. We further verify the zero-shot performance by conducting another experiment on audio source types that are held-out from training. The model achieves comparable Source-to-Distortion Ratio (SDR) performance to current supervised models in both cases.
【6】 Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning 标题:基于模型的安全强化学习的保守自适应惩罚 链接:https://arxiv.org/abs/2112.07701
作者:Yecheng Jason Ma,Andrew Shen,Osbert Bastani,Dinesh Jayaraman 备注:AAAI 2022 摘要:现实世界中的强化学习(RL)代理除了最大化奖励目标外,还必须满足安全约束。基于模型的RL算法有望减少不安全的现实行为:它们可以使用学习模型中的模拟样本合成遵守所有约束的策略。然而,不完善的模型可能会导致违反现实世界中的约束,即使对于预测满足所有约束的操作也是如此。我们提出了一种基于模型的安全RL框架&保守和自适应惩罚(CAP),该框架通过捕获模型不确定性并自适应地利用它来平衡报酬和成本目标,从而解释潜在的建模错误。首先,CAP使用基于不确定性的惩罚来膨胀预测成本。理论上,我们证明了满足这种保守成本约束的政策在真实环境中也是可行的。我们进一步证明,这保证了RL训练期间所有中间溶液的安全性。此外,CAP在训练期间使用来自环境的真实成本反馈自适应调整此惩罚。我们在基于状态和图像的环境中广泛评估了这种基于模型的安全RL的保守和自适应惩罚方法。我们的结果表明,与以前的安全RL算法相比,在样本效率方面有了很大的提高,同时产生的违规行为也更少。代码可从以下网址获取:https://github.com/Redrew/CAP 摘要:Reinforcement Learning (RL) agents in the real world must satisfy safety constraints in addition to maximizing a reward objective. Model-based RL algorithms hold promise for reducing unsafe real-world actions: they may synthesize policies that obey all constraints using simulated samples from a learned model. However, imperfect models can result in real-world constraint violations even for actions that are predicted to satisfy all constraints. We propose Conservative and Adaptive Penalty (CAP), a model-based safe RL framework that accounts for potential modeling errors by capturing model uncertainty and adaptively exploiting it to balance the reward and the cost objectives. First, CAP inflates predicted costs using an uncertainty-based penalty. Theoretically, we show that policies that satisfy this conservative cost constraint are guaranteed to also be feasible in the true environment. We further show that this guarantees the safety of all intermediate solutions during RL training. Further, CAP adaptively tunes this penalty during training using true cost feedback from the environment. We evaluate this conservative and adaptive penalty-based approach for model-based safe RL extensively on state and image-based environments. Our results demonstrate substantial gains in sample-efficiency while incurring fewer violations than prior safe RL algorithms. Code is available at: https://github.com/Redrew/CAP
强化学习(3篇)
【1】 Automatic tuning of hyper-parameters of reinforcement learning algorithms using Bayesian optimization with behavioral cloning 标题:基于行为克隆的贝叶斯优化强化学习算法超参数自动调整 链接:https://arxiv.org/abs/2112.08094
作者:Juan Cruz Barsce,Jorge A. Palombarini,Ernesto C. Martínez 备注:Under review at Computational Intelligence 摘要:机器学习算法中若干超参数的优化设置是充分利用现有数据的关键。为此,人们提出了进化策略、随机搜索、贝叶斯优化和启发式经验规则等方法。在强化学习(RL)中,学习代理在与其环境交互时收集的数据的信息内容严重依赖于许多超参数的设置。因此,RL算法的用户必须依赖于基于搜索的优化方法,例如网格搜索或Nelder-Mead单纯形算法,这些方法对于大多数RL任务来说效率非常低,显著减慢了学习曲线,并将故意偏倚数据收集的负担留给用户。在这项工作中,为了使RL算法更加独立于用户,提出了一种新的基于贝叶斯优化的超参数自主设置方法。通过执行行为克隆,在元学习层面上使用来自过去事件和不同超参数值的数据,这有助于提高获取函数强化学习变量最大化的有效性。此外,通过在强化学习代理设计中紧密集成贝叶斯优化,减少了收敛到给定任务的最优策略所需的状态转换次数。与其他基于手动调整和优化的方法相比,计算实验显示了有希望的结果,这些方法强调了更改算法超参数以增加生成数据的信息含量的好处。 摘要:Optimal setting of several hyper-parameters in machine learning algorithms is key to make the most of available data. To this aim, several methods such as evolutionary strategies, random search, Bayesian optimization and heuristic rules of thumb have been proposed. In reinforcement learning (RL), the information content of data gathered by the learning agent while interacting with its environment is heavily dependent on the setting of many hyper-parameters. Therefore, the user of an RL algorithm has to rely on search-based optimization methods, such as grid search or the Nelder-Mead simplex algorithm, that are very inefficient for most RL tasks, slows down significantly the learning curve and leaves to the user the burden of purposefully biasing data gathering. In this work, in order to make an RL algorithm more user-independent, a novel approach for autonomous hyper-parameter setting using Bayesian optimization is proposed. Data from past episodes and different hyper-parameter values are used at a meta-learning level by performing behavioral cloning which helps improving the effectiveness in maximizing a reinforcement learning variant of an acquisition function. Also, by tightly integrating Bayesian optimization in a reinforcement learning agent design, the number of state transitions needed to converge to the optimal policy for a given task is reduced. Computational experiments reveal promising results compared to other manual tweaking and optimization-based approaches which highlights the benefits of changing the algorithm hyper-parameters to increase the information content of generated data.
【2】 Representation and Invariance in Reinforcement Learning 标题:强化学习中的表征与不变性 链接:https://arxiv.org/abs/2112.07752
作者:Samuel Alexander,Arthur Paul Pedersen 备注:28 pages 摘要:如果我们改变规则,聪明人会和蠢人交换位置吗?不同的团体以不同的方式将强化学习(RL)形式化。如果一个RL形式化中的代理要在另一个RL形式化的环境中运行,则必须首先转换或映射该代理。任何此类映射的充分性标准是,它保留了相对智能。本文研究了这一充分性标准的公式和性质。然而,在表述问题之前,我们认为,是比较智力的问题。我们使用超滤器来比较情报,这种超滤器的动机是将代理人视为情报选举中的候选人,而在选举中,选民是受环境影响的。这些比较器是违反直觉的,但我们证明了一个关于RL智能测量的不可能定理,表明这种违反直觉是不可避免的。给定RL框架之间的映射,我们建立了充分条件,以确保对于目标框架中的任何基于超滤器的智能比较器,源框架中存在基于超滤器的智能比较器,以便映射保持相对智能。我们考虑三个不同的RL框架之间的具体映射,并表明他们满足这些充分条件,因此保持适当测量的相对情报。 摘要:If we changed the rules, would the wise trade places with the fools? Different groups formalize reinforcement learning (RL) in different ways. If an agent in one RL formalization is to run within another RL formalization's environment, the agent must first be converted, or mapped. A criterion of adequacy for any such mapping is that it preserves relative intelligence. This paper investigates the formulation and properties of this criterion of adequacy. However, prior to the problem of formulation is, we argue, the problem of comparative intelligence. We compare intelligence using ultrafilters, motivated by viewing agents as candidates in intelligence elections where voters are environments. These comparators are counterintuitive, but we prove an impossibility theorem about RL intelligence measurement, suggesting such counterintuitions are unavoidable. Given a mapping between RL frameworks, we establish sufficient conditions to ensure that, for any ultrafilter-based intelligence comparator in the destination framework, there exists an ultrafilter-based intelligence comparator in the source framework such that the mapping preserves relative intelligence. We consider three concrete mappings between various RL frameworks and show that they satisfy these sufficient conditions and therefore preserve suitably-measured relative intelligence.
【3】 CEM-GD: Cross-Entropy Method with Gradient Descent Planner for Model-Based Reinforcement Learning 标题:CEM-GD:基于模型强化学习的交叉熵梯度下降规划方法 链接:https://arxiv.org/abs/2112.07746
作者:Kevin Huang,Sahin Lale,Ugo Rosolia,Yuanyuan Shi,Anima Anandkumar 摘要:当前最先进的基于模型的强化学习算法使用轨迹采样方法,如交叉熵法(CEM),在连续控制设置中进行规划。这些零阶优化器需要对大量的轨迹展开进行采样,以选择最佳动作,在较大的预测范围或高维动作空间下扩展性较差。一阶方法使用与动作相关的奖励梯度作为更新,可以缓解此问题,但由于非凸的优化地形,会陷入局部最优。为了克服这些问题并实现两个方面的最佳效果,我们提出了一种新的规划器,即梯度下降交叉熵法(CEM-GD),它将一阶方法与CEM相结合。在执行开始时,CEM-GD使用CEM对大量的轨迹展开进行采样,以探索优化地形并避免出现较差的局部极小值。然后,它使用顶部轨迹作为梯度下降的初始化,并对每个轨迹应用梯度更新,以找到最佳动作序列。然而,在随后的每个时间步,CEM-GD在应用梯度更新之前从CEM中采样的轨迹要少得多。我们表明,随着规划问题维数的增加,CEM-GD通过使用梯度信息在恒定的小样本数下保持理想的性能,同时使用初始良好采样轨迹避免局部最优。此外,CEM-GD在MuJoCo的各种连续控制基准上实现了比CEM更好的性能,每个时间步的采样数减少了100倍,从而减少了约25%的计算时间和10%的内存使用。CEM-GD的实现可从 https://github.com/KevinHuang8/CEM-GD 获得。 摘要:Current state-of-the-art model-based reinforcement learning algorithms use trajectory sampling methods, such as the Cross-Entropy Method (CEM), for planning in continuous control settings. These zeroth-order optimizers require sampling a large number of trajectory rollouts to select an optimal action, which scales poorly for large prediction horizons or high dimensional action spaces. First-order methods that use the gradients of the rewards with respect to the actions as an update can mitigate this issue, but suffer from local optima due to the non-convex optimization landscape. To overcome these issues and achieve the best of both worlds, we propose a novel planner, Cross-Entropy Method with Gradient Descent (CEM-GD), that combines first-order methods with CEM. At the beginning of execution, CEM-GD uses CEM to sample a significant amount of trajectory rollouts to explore the optimization landscape and avoid poor local minima. It then uses the top trajectories as initialization for gradient descent and applies gradient updates to each of these trajectories to find the optimal action sequence. At each subsequent time step, however, CEM-GD samples much fewer trajectories from CEM before applying gradient updates. We show that as the dimensionality of the planning problem increases, CEM-GD maintains desirable performance with a constant small number of samples by using the gradient information, while avoiding local optima using initially well-sampled trajectories. Furthermore, CEM-GD achieves better performance than CEM on a variety of continuous control benchmarks in MuJoCo with 100x fewer samples per time step, resulting in around 25% less computation time and 10% less memory usage. The implementation of CEM-GD is available at $\href{https://github.com/KevinHuang8/CEM-GD}{\text{https://github.com/KevinHuang8/CEM-GD}}$.
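作为对 CEM-GD 两阶段思路的一个示意(先用 CEM 大量采样避开差的局部极小,再对精英轨迹做梯度精修),下面给出一个在玩具代价函数上的最小草图;真实实现应使用可微模型的解析梯度,这里为保持自包含改用数值梯度,所有超参数均为假设取值。

```python
import numpy as np

def traj_cost(actions, target=0.3):
    # 玩具代价:让动作序列贴近 target(代替真实环境回报的负值)
    return np.sum((actions - target) ** 2, axis=-1)

def cem_gd_plan(horizon=10, n_samples=200, n_elite=10, gd_steps=20, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(horizon), np.ones(horizon)
    # 第一阶段:CEM 大量采样,探索优化地形、避开差的局部极小
    for _ in range(5):
        samples = rng.normal(mu, sigma, size=(n_samples, horizon))
        elite = samples[np.argsort(traj_cost(samples))[:n_elite]]
        mu, sigma = elite.mean(0), elite.std(0) + 1e-3
    # 第二阶段:对精英轨迹做梯度下降精修(此处用数值梯度代替解析梯度)
    best = elite.copy()
    for _ in range(gd_steps):
        eps = 1e-4
        grad = np.zeros_like(best)
        for t in range(horizon):
            shifted = best.copy()
            shifted[:, t] += eps
            grad[:, t] = (traj_cost(shifted) - traj_cost(best)) / eps
        best -= lr * grad
    return best[np.argmin(traj_cost(best))]

print(cem_gd_plan()[:5])   # 应接近 0.3
```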
医学相关(5篇)
【1】 A Comparative Analysis of Machine Learning Approaches for Automated Face Mask Detection During COVID-19 标题:用于冠状病毒自动口罩检测的机器学习方法的比较分析 链接:https://arxiv.org/abs/2112.07913
作者:Junaed Younus Khan,Md Abdullah Al Alamin 摘要:世界卫生组织(WHO)建议戴口罩作为防止COVID-19传播的最有效措施之一。在许多国家,现在必须戴口罩,特别是在公共场所。由于面罩的手动监控在人群中间常常是不可行的,因此自动检测是有益的。为了促进这一点,我们探索了许多用于面罩检测的深度学习模型(即VGG1、VGG19、ResNet50),并在两个基准数据集上对其进行了评估。在此背景下,我们还评估了迁移学习(即VGG19、在ImageNet上预先训练的ResNet50)。我们发现,虽然所有模型的性能都很好,但迁移学习模型的性能最好。迁移学习可提高绩效0.10%--0.40%,训练时间减少30%。我们的实验还表明,对于测试数据集来自不同分布的真实情况,这些高性能模型并不十分健壮。如果不进行任何微调,这些模型在跨域设置下的性能将下降47%。 摘要:The World Health Organization (WHO) has recommended wearing face masks as one of the most effective measures to prevent COVID-19 transmission. In many countries, it is now mandatory to wear face masks, specially in public places. Since manual monitoring of face masks is often infeasible in the middle of the crowd, automatic detection can be beneficial. To facilitate that, we explored a number of deep learning models (i.e., VGG1, VGG19, ResNet50) for face-mask detection and evaluated them on two benchmark datasets. We also evaluated transfer learning (i.e., VGG19, ResNet50 pre-trained on ImageNet) in this context. We find that while the performances of all the models are quite good, transfer learning models achieve the best performance. Transfer learning improves the performance by 0.10%--0.40% with 30% less training time. Our experiment also shows these high-performing models are not quite robust for real-world cases where the test dataset comes from a different distribution. Without any fine-tuning, the performance of these models drops by 47% in cross-domain settings.
【2】 Investigating myocardial infarction and its effects in patients with urgent medical problems using advanced data mining tools 标题:利用先进的数据挖掘工具调查急诊患者的心肌梗死及其影响 链接:https://arxiv.org/abs/2112.07890
作者:Tanya Aghazadeh,Mostafa Bagheri 摘要:在医学中,收集不同疾病的多种数据是非常重要的,数据的最重要目的之一是调查疾病。心肌梗死是死亡率的一个严重危险因素,在以前的研究中,主要侧重于心脏病患者,并通过人口学特征、超声心动图和心电图测量他们发生心肌梗死的可能性。相比之下,本研究的目的是利用数据分析算法并比较其在心脏病发作患者中的准确性,以便通过考虑急诊手术来确定心肌梗死期间的心肌强度,从而预测心肌梗死。为此,收集了105份心肌梗死患者的病历,包含年龄、急诊手术时间、肌酸磷酸激酶(CPK)试验、心率、血糖和静脉等14个特征,并通过数据分析的分类技术(包括随机决策森林、决策树、支持向量机(SVM)、k-最近邻和顺序逻辑回归)进行研究。最后,在平均评价指标方面,选择精度为76%的随机决策森林模型作为最佳模型。此外,肌酸磷酸激酶试验、尿素、白细胞和红细胞计数、血糖、时间和血红蛋白这七个特征被确定为对射血分数变量最有效的特征。 摘要:In medical science, it is very important to gather multiple data on different diseases and one of the most important objectives of the data is to investigate the diseases. Myocardial infarction is a serious risk factor in mortality and in previous studies, the main emphasis has been on people with heart disease and measuring the likelihood of myocardial infarction in them through demographic features, echocardiography, and electrocardiogram. In contrast, the purpose of the present study is to utilize data analysis algorithms and compare their accuracy in patients with a heart attack in order to identify the heart muscle strength during myocardial infarction by taking into account emergency operations and consequently predict myocardial infarction. For this purpose, 105 medical records of myocardial infarction patients with fourteen features including age, the time of emergency operation, Creatine Phosphokinase (CPK) test, heart rate, blood sugar, and vein are gathered and investigated through classification techniques of data analysis including random decision forests, decision tree, support vector machine (SVM), k-nearest neighbor, and ordinal logistic regression. Finally, the model of random decision forests with an accuracy of 76% is selected as the best model in terms of the mean evaluation indicator. Also, seven features of the creatine Phosphokinase test, urea, white and red blood cell count, blood sugar, time, and hemoglobin are identified as the most effective features of the ejection fraction variable.
【3】 Fine-Tuning Large Neural Language Models for Biomedical Natural Language Processing 标题:面向生物医学自然语言处理的大型神经语言模型微调 链接:https://arxiv.org/abs/2112.07869
作者:Robert Tinn,Hao Cheng,Yu Gu,Naoto Usuyama,Xiaodong Liu,Tristan Naumann,Jianfeng Gao,Hoifung Poon 摘要:动机:生物医学研究人员和临床从业者面临的一个长期挑战是跟上出版物和医学笔记的快速增长。自然语言处理(NLP)已成为抑制信息过载的一个有希望的方向。特别是,大型神经语言模型通过对未标记文本进行预训练来促进迁移学习,BERT模型在各种NLP应用中的成功就是一个例子。然而,为最终任务微调此类模型仍然具有挑战性,特别是对于小型标记数据集,这在生物医学NLP中很常见。结果:我们对生物医学NLP的微调稳定性进行了系统研究。我们表明,微调性能可能对预训练设置敏感,尤其是在低资源域中。大型模型有可能获得更好的性能,但模型尺寸的增加也会加剧微调不稳定性。因此,我们对解决微调不稳定性的技术进行了全面的探索。我们表明,这些技术可以大大提高低资源生物医学NLP应用程序的微调性能。具体而言,冻结下层有助于标准的BERT-BASE模型,而分层衰减对BERT-LARGE和ELECTRA模型更有效。对于BIOSSES等低资源文本相似性任务,重新初始化顶层是最佳策略。总的来说,特定领域的词汇表和预训练有助于更健壮的模型进行微调。基于这些发现,我们在广泛的生物医学NLP应用领域建立了新的技术水平。可用性和实施:为了促进生物医学NLP的进展,我们发布了最先进的预训练和微调模型:https://aka.ms/BLURB. 摘要:Motivation: A perennial challenge for biomedical researchers and clinical practitioners is to stay abreast with the rapid growth of publications and medical notes. Natural language processing (NLP) has emerged as a promising direction for taming information overload. In particular, large neural language models facilitate transfer learning by pretraining on unlabeled text, as exemplified by the successes of BERT models in various NLP applications. However, fine-tuning such models for an end task remains challenging, especially with small labeled datasets, which are common in biomedical NLP. Results: We conduct a systematic study on fine-tuning stability in biomedical NLP. We show that fine-tuning performance may be sensitive to pretraining settings, especially in low-resource domains. Large models have potential to attain better performance, but increasing model size also exacerbates fine-tuning instability. We thus conduct a comprehensive exploration of techniques for addressing fine-tuning instability. We show that these techniques can substantially improve fine-tuning performance for low-resource biomedical NLP applications. Specifically, freezing lower layers is helpful for standard BERT-BASE models, while layerwise decay is more effective for BERT-LARGE and ELECTRA models. For low-resource text similarity tasks such as BIOSSES, reinitializing the top layer is the optimal strategy. Overall, domain-specific vocabulary and pretraining facilitate more robust models for fine-tuning. Based on these findings, we establish new state of the art on a wide range of biomedical NLP applications. Availability and implementation: To facilitate progress in biomedical NLP, we release our state-of-the-art pretrained and fine-tuned models: https://aka.ms/BLURB.
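摘要中提到的"冻结下层"与"逐层学习率衰减"可以用 PyTorch 的参数分组简单实现。下面是一个示意草图,假设编码层参数名形如 "layer.{i}."、嵌入层名字包含 "embeddings"(不同模型实现的命名可能不同),decay、freeze_below 等取值仅为示例。

```python
import re
import torch

def layerwise_lr_groups(model, base_lr=2e-5, decay=0.9, num_layers=12):
    """逐层学习率衰减:越靠近输出的层学习率越大。"""
    groups = []
    for name, param in model.named_parameters():
        if "embeddings" in name:
            depth = 0
        else:
            m = re.search(r"layer\.(\d+)\.", name)
            depth = int(m.group(1)) + 1 if m else num_layers + 1   # 任务头等参数用 base_lr
        groups.append({"params": [param],
                       "lr": base_lr * decay ** (num_layers + 1 - depth)})
    return groups

def freeze_lower_layers(model, freeze_below=4):
    """冻结嵌入层与较低的编码层(摘要中对 BERT-BASE 有效的稳定化手段)。"""
    for name, param in model.named_parameters():
        m = re.search(r"layer\.(\d+)\.", name)
        if "embeddings" in name or (m and int(m.group(1)) < freeze_below):
            param.requires_grad = False

# 用法示意(model 为任意 BERT 类编码器加分类头):
# optimizer = torch.optim.AdamW(layerwise_lr_groups(model), weight_decay=0.01)
```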
【4】 CentSmoothie: Central-Smoothing Hypergraph Neural Networks for Predicting Drug-Drug Interactions 标题:CentSmoothie:预测药物相互作用的中心平滑超图神经网络 链接:https://arxiv.org/abs/2112.07837
作者:Duc Anh Nguyen,Canh Hao Nguyen,Hiroshi Mamitsuka 摘要:预测药物-药物相互作用(DDI)是利用药物信息和多对药物的已知副作用来预测一对药物的副作用(不想要的结果)的问题。这个问题可以表述为预测DDI图中每对节点的标签(即副作用),其中节点是药物,边是与已知标签相互作用的药物。解决这个问题的最先进的方法是图神经网络(GNN),它利用图中的邻域信息来学习节点表示。然而,对于DDI,由于副作用的性质,有许多标签具有复杂的关系。通常的GNN通常将标签固定为一个热向量,不反映标签关系,并且在罕见标签的困难情况下可能无法获得最高性能。在本文中,我们将DDI描述为一个超图,其中每个超边是一个三元组:两个节点用于药物,一个节点用于标签。然后,我们介绍了CentSmoothie,这是一种超图神经网络,它使用一种新的中心平滑公式来学习节点和标签的表示。我们在模拟和真实数据集上通过实证证明了CentSmoothie的性能优势。 摘要:Predicting drug-drug interactions (DDI) is the problem of predicting side effects (unwanted outcomes) of a pair of drugs using drug information and known side effects of many pairs. This problem can be formulated as predicting labels (i.e. side effects) for each pair of nodes in a DDI graph, of which nodes are drugs and edges are interacting drugs with known labels. State-of-the-art methods for this problem are graph neural networks (GNNs), which leverage neighborhood information in the graph to learn node representations. For DDI, however, there are many labels with complicated relationships due to the nature of side effects. Usual GNNs often fix labels as one-hot vectors that do not reflect label relationships and potentially do not obtain the highest performance in the difficult cases of infrequent labels. In this paper, we formulate DDI as a hypergraph where each hyperedge is a triple: two nodes for drugs and one node for a label. We then present CentSmoothie, a hypergraph neural network that learns representations of nodes and labels altogether with a novel central-smoothing formulation. We empirically demonstrate the performance advantages of CentSmoothie in simulations as well as real datasets.
【5】 Energy-Efficient Real-Time Heart Monitoring on Edge-Fog-Cloud Internet-of-Medical-Things 标题:边缘雾云医疗物联网的节能实时心脏监护 链接:https://arxiv.org/abs/2112.07901
作者:Berken Utku Demirel,Islam Abdelsalam Bayoumy,Mohammad Abdullah Al Faruque 备注:To appear at IEEE Internet of Things (IoT) Journal 摘要:可穿戴设备和医疗物联网(IoMT)的最新发展允许实时监测和记录心电图(ECG)信号。然而,由于能量和内存的限制,在低功耗可穿戴设备中,ECG信号的连续监测具有挑战性。因此,在本文中,我们提出了一种新的节能方法,用于低功耗可穿戴设备的心脏持续监测。该方法由三层组成:1)噪声/伪影检测层,用于对心电信号的质量进行分级;2)正常/异常搏动分类层用于检测ECG信号中的异常,以及3)异常搏动分类层用于检测ECG信号中的疾病。此外,采用分布式多输出卷积神经网络(CNN)结构来降低边缘雾/云之间的能量消耗和延迟。我们的方法在著名的MIT-BIH心律失常数据集上的准确率达到99.2%。对实际硬件的评估表明,我们的方法适用于最小RAM为32KB的设备。此外,与最先进的工作相比,所提出的方法实现了7倍的能源效率。 摘要:The recent developments in wearable devices and the Internet of Medical Things (IoMT) allow real-time monitoring and recording of electrocardiogram (ECG) signals. However, continuous monitoring of ECG signals is challenging in low-power wearable devices due to energy and memory constraints. Therefore, in this paper, we present a novel and energy-efficient methodology for continuously monitoring the heart for low-power wearable devices. The proposed methodology is composed of three different layers: 1) a Noise/Artifact detection layer to grade the quality of the ECG signals; 2) a Normal/Abnormal beat classification layer to detect the anomalies in the ECG signals, and 3) an Abnormal beat classification layer to detect diseases from ECG signals. Moreover, a distributed multi-output Convolutional Neural Network (CNN) architecture is used to decrease the energy consumption and latency between the edge-fog/cloud. Our methodology reaches an accuracy of 99.2% on the well-known MIT-BIH Arrhythmia dataset. Evaluation on real hardware shows that our methodology is suitable for devices having a minimum RAM of 32KB. Moreover, the proposed methodology achieves $7\times$ more energy efficiency compared to state-of-the-art works.
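下面用一个自包含的小例子示意摘要中三层级联(噪声/伪影检测、正常/异常分类、病种分类)的调用顺序;_Stub 只是占位分类器,真实系统中的三层为论文所述的 CNN,180 个采样点的心搏长度也只是假设。

```python
import numpy as np

class _Stub:
    """占位分类器,仅用于演示级联的调用顺序。"""
    def __init__(self, label):
        self.label = label
    def predict(self, x):
        return [self.label] * len(x)

def classify_beat(segment, quality_model, anomaly_model, disease_model):
    x = np.asarray(segment).reshape(1, -1)
    if quality_model.predict(x)[0] == "noisy":      # 第一层:噪声/伪影检测
        return "discard"
    if anomaly_model.predict(x)[0] == "normal":     # 第二层:正常/异常分类
        return "normal"
    return disease_model.predict(x)[0]              # 第三层:异常心搏的病种分类

# 级联的直觉:大多数心搏在前两层就被处理掉,只有少数异常样本需要运行
# 最重的第三层模型,从而降低可穿戴端的平均能耗。
beat = np.random.randn(180)                         # 一个心搏片段(假设 180 个采样点)
print(classify_beat(beat, _Stub("clean"), _Stub("abnormal"), _Stub("PVC")))
```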
蒸馏|知识提取(1篇)
【1】 GenIE: Generative Information Extraction 标题:GENIE:产生式信息抽取 链接:https://arxiv.org/abs/2112.08340
作者:Martin Josifoski,Nicola De Cao,Maxime Peyrard,Robert West 摘要:文本的结构化和扎根表示通常通过封闭信息提取进行形式化,即从知识库模式中提取与预定义的一组实体和关系一致的详尽的(主题、关系、对象)三元组。大多数现有工作都是容易累积错误的管道,所有方法都只适用于不现实的少量实体和关系。我们介绍GenIE(生成信息提取),这是封闭信息提取的第一个端到端自回归公式。GenIE通过以文本形式自动回归生成关系和实体,自然地利用预先训练好的转换器中的语言知识。由于新的双层约束生成策略,只生成与预定义的知识库模式一致的三元组。我们的实验表明,GenIE在封闭信息提取方面是最先进的,从比基线更少的训练数据点进行概括,并扩展到以前无法管理的大量实体和关系。通过这项工作,封闭式信息提取在现实场景中变得切实可行,为下游任务提供了新的机会。最后,这项工作为信息提取核心任务的统一端到端方法铺平了道路。代码和型号可在https://github.com/epfl-dlab/GenIE. 摘要:Structured and grounded representation of text is typically formalized by closed information extraction, the problem of extracting an exhaustive set of (subject, relation, object) triplets that are consistent with a predefined set of entities and relations from a knowledge base schema. Most existing works are pipelines prone to error accumulation, and all approaches are only applicable to unrealistically small numbers of entities and relations. We introduce GenIE (generative information extraction), the first end-to-end autoregressive formulation of closed information extraction. GenIE naturally exploits the language knowledge from the pre-trained transformer by autoregressively generating relations and entities in textual form. Thanks to a new bi-level constrained generation strategy, only triplets consistent with the predefined knowledge base schema are produced. Our experiments show that GenIE is state-of-the-art on closed information extraction, generalizes from fewer training data points than baselines, and scales to a previously unmanageable number of entities and relations. With this work, closed information extraction becomes practical in realistic scenarios, providing new opportunities for downstream tasks. Finally, this work paves the way towards a unified end-to-end approach to the core tasks of information extraction. Code and models available at https://github.com/epfl-dlab/GenIE.
推荐(1篇)
【1】 EDAssistant: Supporting Exploratory Data Analysis in Computational Notebooks with In-Situ Code Search and Recommendation 标题:EDAssistant:通过原位代码搜索和推荐支持计算笔记本中的探索性数据分析 链接:https://arxiv.org/abs/2112.07858
作者:Xingjun Li,Yizhi Zhang,Justin Leung,Chengnian Sun,Jian Zhao 摘要:数据科学家使用计算笔记本(如Jupyter笔记本),根据他们先前的经验和外部知识(如在线示例),合理化他们的探索性数据分析(EDA)。对于缺乏关于数据集或问题的特定知识的新手或数据科学家来说,有效地获取和理解外部信息对于执行EDA至关重要。本文介绍了EDAssistant,这是一个JupyterLab扩展,它支持EDA,并通过搜索结果的新颖交互式可视化支持对示例笔记本进行原位搜索和推荐有用的API。代码搜索和推荐由最先进的机器学习模型实现,这些模型在在线收集的大量EDA笔记本上进行训练。用户研究旨在调查EDAssistant和数据科学家的当前实践(即使用外部搜索引擎)。结果证明了EDAssistant的有效性和实用性,参与者赞赏它对EDA流畅且上下文相关的支持。我们还报告了关于代码推荐工具的几个设计含义。 摘要:Using computational notebooks (e.g., Jupyter Notebook), data scientists rationalize their exploratory data analysis (EDA) based on their prior experience and external knowledge such as online examples. For novices or data scientists who lack specific knowledge about the dataset or problem to investigate, effectively obtaining and understanding the external information is critical to carry out EDA. This paper presents EDAssistant, a JupyterLab extension that supports EDA with in-situ search of example notebooks and recommendation of useful APIs, powered by novel interactive visualization of search results. The code search and recommendation are enabled by state-of-the-art machine learning models, trained on a large corpus of EDA notebooks collected online. A user study is conducted to investigate both EDAssistant and data scientists' current practice (i.e., using external search engines). The results demonstrate the effectiveness and usefulness of EDAssistant, and participants appreciated its smooth and in-context support of EDA. We also report several design implications regarding code recommendation tools.
联邦学习|隐私保护|加密(1篇)
【1】 Blockchain-enabled Server-less Federated Learning 标题:启用区块链的无服务器联合学习 链接:https://arxiv.org/abs/2112.07938
作者:Francesc Wilhelmi,Lorenza Giupponi,Paolo Dini 摘要:受参与大规模联合学习(FL)优化的设备的异构性的推动,我们专注于一个由区块链(BC)技术授权的异步无服务器FL解决方案。与大多数采用的假设同步操作的FL方法不同,我们提倡一种异步方法,即在客户端提交其本地更新时进行模型聚合。异步设置非常适合使用异构客户端的实际大规模设置中的联邦优化思想。因此,就通信开销和空闲周期而言,它可能导致更高的效率。为了评估支持BC的FL的学习完成延迟,我们提供了一个基于批处理服务队列理论的分析模型。此外,我们还提供了仿真结果来评估同步和异步机制的性能。结合并分析了BC-enabled FL优化中涉及的重要方面,如网络大小、链路容量或用户需求。结果表明,同步设置比异步设置具有更高的预测精度。尽管如此,异步联合优化在许多情况下提供了更低的延迟,因此在处理大型数据集、严格的时间限制(例如,近实时应用程序)或高度变化的训练数据时,它成为一种极具吸引力的FL解决方案。 摘要:Motivated by the heterogeneous nature of devices participating in large-scale Federated Learning (FL) optimization, we focus on an asynchronous server-less FL solution empowered by Blockchain (BC) technology. In contrast to mostly adopted FL approaches, which assume synchronous operation, we advocate an asynchronous method whereby model aggregation is done as clients submit their local updates. The asynchronous setting fits well with the federated optimization idea in practical large-scale settings with heterogeneous clients. Thus, it potentially leads to higher efficiency in terms of communication overhead and idle periods. To evaluate the learning completion delay of BC-enabled FL, we provide an analytical model based on batch service queue theory. Furthermore, we provide simulation results to assess the performance of both synchronous and asynchronous mechanisms. Important aspects involved in the BC-enabled FL optimization, such as the network size, link capacity, or user requirements, are put together and analyzed. As our results show, the synchronous setting leads to higher prediction accuracy than the asynchronous case. Nevertheless, asynchronous federated optimization provides much lower latency in many cases, thus becoming an appealing FL solution when dealing with large data sets, tough timing constraints (e.g., near-real-time applications), or highly varying training data.
推理|分析|理解|解释(6篇)
【1】 Interactive Visualization and Representation Analysis Applied to Glacier Segmentation 标题:交互式可视化与表示分析在冰川分割中的应用 链接:https://arxiv.org/abs/2112.08184
作者:Minxing Zheng,Xinran Miao,Kris Sankaran 备注:14 pages, 10 figures 摘要:在地球观测问题中,可解释性越来越受到关注。我们应用交互式可视化和表示分析来指导冰川分割模型的解释。我们将U-Net的激活可视化,以了解和评估模型性能。我们使用Shiny R软件包构建了一个在线界面,以提供预测的全面错误分析。用户可以与面板交互并发现模型故障模式。此外,我们还讨论了可视化如何在数据预处理和模型训练期间提供健全性检查。 摘要:Interpretability has attracted increasing attention in earth observation problems. We apply interactive visualization and representation analysis to guide interpretation of glacier segmentation models. We visualize the activations from a U-Net to understand and evaluate the model performance. We build an online interface using the Shiny R package to provide comprehensive error analysis of the predictions. Users can interact with the panels and discover model failure modes. Further, we discuss how visualization can provide sanity checks during data preprocessing and model training.
【2】 Mask-combine Decoding and Classification Approach for Punctuation Prediction with real-time Inference Constraints 标题:带实时推理约束的标点符号预测掩码结合解码分类方法 链接:https://arxiv.org/abs/2112.08098
作者:Christoph Minixhofer,Ondřej Klejch,Peter Bell 备注:4 pages, 3 figures, to appear in ICASSP2022 摘要:在这项工作中,我们在一个框架中统一了几种现有的标点预测解码策略,并引入了一种新的策略,该策略在不同的窗口中利用每个单词的多个预测。我们表明,在训练模型后,通过优化这些策略可以实现显著的改进,只会导致推理时间的潜在增加,而不需要再训练。我们进一步使用我们的解码策略框架首次比较了实时环境中标点预测的标记和分类方法。我们的结果表明,当很少或没有右侧上下文可用时,标点预测的分类方法是有益的。 摘要:In this work, we unify several existing decoding strategies for punctuation prediction in one framework and introduce a novel strategy which utilises multiple predictions at each word across different windows. We show that significant improvements can be achieved by optimising these strategies after training a model, only leading to a potential increase in inference time, with no requirement for retraining. We further use our decoding strategy framework for the first comparison of tagging and classification approaches for punctuation prediction in a real-time setting. Our results show that a classification approach for punctuation prediction can be beneficial when little or no right-side context is available.
【3】 Head Matters: Explainable Human-centered Trait Prediction from Head Motion Dynamics 标题:头部问题:从头部运动动力学看可解释的以人为中心的特质预测 链接:https://arxiv.org/abs/2112.08068
作者:Surbhi Madan,Monika Gahalawat,Tanaya Guha,Ramanathan Subramanian 备注:10 pages, 10 figures, 6 tables. This paper is published in ICMI 2021 摘要:我们展示了称为kinemes的基本头部运动单元在行为分析中预测个性和面试特征的效用。将头部运动模式转换为一系列运动序列有助于发现表征目标特征的潜在时间特征,从而实现高效和可解释的特征预测。利用Kinemes和面部动作编码系统(FACS)功能预测(a)第一印象候选人筛选视频中的海洋性格特征,以及(b)麻省理工学院数据集上的面试特征,我们注意到:(1)使用kineme序列训练的长-短期记忆(LSTM)网络的性能优于或类似于使用人脸图像训练的卷积神经网络(CNN);(2) 将FACS动作单元(AU)与kinemes相结合可实现准确的预测和解释,(3)预测性能受观察头部和面部运动的时间长度的影响。 摘要:We demonstrate the utility of elementary head-motion units termed kinemes for behavioral analytics to predict personality and interview traits. Transforming head-motion patterns into a sequence of kinemes facilitates discovery of latent temporal signatures characterizing the targeted traits, thereby enabling both efficient and explainable trait prediction. Utilizing Kinemes and Facial Action Coding System (FACS) features to predict (a) OCEAN personality traits on the First Impressions Candidate Screening videos, and (b) Interview traits on the MIT dataset, we note that: (1) A Long-Short Term Memory (LSTM) network trained with kineme sequences performs better than or similar to a Convolutional Neural Network (CNN) trained with facial images; (2) Accurate predictions and explanations are achieved on combining FACS action units (AUs) with kinemes, and (3) Prediction performance is affected by the time-length over which head and facial movements are observed.
【4】 Ten years of image analysis and machine learning competitions in dementia 标题:痴呆症图像分析和机器学习竞赛十年 链接:https://arxiv.org/abs/2112.07922
作者:Esther E. Bron,Stefan Klein,Annika Reinke,Janne M. Papma,Lena Maier-Hein,Daniel C. Alexander,Neil P. Oxtoby 备注:12 pages, 4 tables 摘要:利用多参数生物标记物的机器学习方法,特别是基于神经成像的机器学习方法,在改善痴呆症的早期诊断和预测哪些个体有患痴呆症的风险方面具有巨大潜力。为了对痴呆症机器学习和神经成像领域的算法进行基准测试,并评估其在临床实践和临床试验中的应用潜力,在过去十年中组织了七项重大挑战:MIRIAD、阿尔茨海默病大数据DREAM挑战、CADDementia、Machine Learning Challenge、MCI神经成像挑战、TADPOLE,以及预测分析竞赛。基于两个挑战评估框架,我们分析了这些重大挑战在研究问题、数据集、验证方法、结果和影响方面是如何相互补充的。七大挑战解决了与(临床前)痴呆症筛查、诊断、预测和监测相关的问题。临床问题、任务和绩效指标几乎没有重叠。虽然这样做的优点是能够洞察广泛的问题,但它也限制了跨挑战结果的验证。通常,获胜算法执行严格的数据预处理,并结合广泛的输入特征。尽管具有高水平的最新性能,但挑战评估的大多数方法并未在临床上使用。为了增加影响,未来的挑战可能会更多地关注哪些因素(即特征、模型)与更高性能相关的统计分析,关注阿尔茨海默病以外的临床问题,以及使用阿尔茨海默病神经成像计划以外的测试数据。鉴于过去十年的潜力和经验教训,我们对未来十年及以后机器学习和神经成像领域的重大挑战前景感到兴奋。 摘要:Machine learning methods exploiting multi-parametric biomarkers, especially based on neuroimaging, have huge potential to improve early diagnosis of dementia and to predict which individuals are at-risk of developing dementia. To benchmark algorithms in the field of machine learning and neuroimaging in dementia and assess their potential for use in clinical practice and clinical trials, seven grand challenges have been organized in the last decade: MIRIAD, Alzheimer's Disease Big Data DREAM, CADDementia, Machine Learning Challenge, MCI Neuroimaging, TADPOLE, and the Predictive Analytics Competition. Based on two challenge evaluation frameworks, we analyzed how these grand challenges are complementing each other regarding research questions, datasets, validation approaches, results and impact. The seven grand challenges addressed questions related to screening, diagnosis, prediction and monitoring in (pre-clinical) dementia. There was little overlap in clinical questions, tasks and performance metrics. Whereas this has the advantage of providing insight on a broad range of questions, it also limits the validation of results across challenges. In general, winning algorithms performed rigorous data pre-processing and combined a wide range of input features. Despite high state-of-the-art performances, most of the methods evaluated by the challenges are not clinically used. To increase impact, future challenges could pay more attention to statistical analysis of which factors (i.e., features, models) relate to higher performance, to clinical questions beyond Alzheimer's disease, and to using testing data beyond the Alzheimer's Disease Neuroimaging Initiative. Given the potential and lessons learned in the past ten years, we are excited by the prospects of grand challenges in machine learning and neuroimaging for the next ten years and beyond.
【5】 Finite-Sample Analysis of Decentralized Q-Learning for Stochastic Games 标题:随机对策分散Q-学习的有限样本分析 链接:https://arxiv.org/abs/2112.07859
作者:Zuguang Gao,Qianqian Ma,Tamer Başar,John R. Birge 摘要:随机博弈中的学习可以说是多智能体强化学习(MARL)中最标准和最基本的设置。本文研究了非渐近状态下随机博弈中的分散MARL问题。特别地,我们在一类重要的一般和随机对策(SGs)-弱非循环SGs中建立了完全分散Q-学习算法的有限样本复杂度,其中包括对所有代理具有相同报酬的公共合作MARL设置(马尔可夫团队问题)作为特例。我们专注于完全分散的MARL的实际而富有挑战性的设置,其中每个代理都无法观察到其他代理的回报或行为。事实上,每个代理都完全不知道其他决策者的存在。同时考虑了表格和线性函数近似情况。在表格环境下,我们分析了分散Q-学习算法收敛到马尔可夫完美均衡(纳什均衡)的样本复杂性。使用线性函数近似,结果是收敛到线性近似均衡——我们提出的一个新的均衡概念——它描述了每个代理的策略是线性空间中(对其他代理的)最佳回复。数值实验也提供了两种设置来证明结果。 摘要:Learning in stochastic games is arguably the most standard and fundamental setting in multi-agent reinforcement learning (MARL). In this paper, we consider decentralized MARL in stochastic games in the non-asymptotic regime. In particular, we establish the finite-sample complexity of fully decentralized Q-learning algorithms in a significant class of general-sum stochastic games (SGs) - weakly acyclic SGs, which includes the common cooperative MARL setting with an identical reward to all agents (a Markov team problem) as a special case. We focus on the practical while challenging setting of fully decentralized MARL, where neither the rewards nor the actions of other agents can be observed by each agent. In fact, each agent is completely oblivious to the presence of other decision makers. Both the tabular and the linear function approximation cases have been considered. In the tabular setting, we analyze the sample complexity for the decentralized Q-learning algorithm to converge to a Markov perfect equilibrium (Nash equilibrium). With linear function approximation, the results are for convergence to a linear approximated equilibrium - a new notion of equilibrium that we propose - which describes that each agent's policy is a best reply (to other agents) within a linear space. Numerical experiments are also provided for both settings to demonstrate the results.
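下面给出完全分散式表格型 Q-learning 的一个最小示意:每个智能体只能看到全局状态、自身动作与自身回报,对其他智能体的存在完全无感知;玩具环境与超参数均为本文假设,论文中的多阶段探索策略与收敛性分析并未体现。

```python
import numpy as np

class ObliviousQLearner:
    """分散式 Q-learning 智能体:不观察其他智能体的动作与回报。"""
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
        self.Q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.rng = np.random.default_rng(seed)

    def act(self, s):
        if self.rng.random() < self.eps:
            return int(self.rng.integers(self.Q.shape[1]))
        return int(np.argmax(self.Q[s]))

    def update(self, s, a, r, s_next):
        target = r + self.gamma * np.max(self.Q[s_next])
        self.Q[s, a] += self.alpha * (target - self.Q[s, a])

# 两个智能体各自独立更新;环境转移由联合动作决定,但智能体看不到对方
agents = [ObliviousQLearner(4, 2, seed=i) for i in range(2)]
s = 0
for _ in range(100):
    acts = [ag.act(s) for ag in agents]
    s_next = (s + sum(acts)) % 4                    # 玩具转移
    rewards = [float(sum(acts) == 1)] * 2           # 团队式共同回报
    for ag, a, r in zip(agents, acts, rewards):
        ag.update(s, a, r, s_next)
    s = s_next
print(agents[0].Q.round(2))
```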
【6】 Understanding Feature Transfer Through Representation Alignment 标题:通过制图表达对齐了解要素传递 链接:https://arxiv.org/abs/2112.07806
作者:Ehsan Imani,Wei Hu,Martha White 备注:13 pages, 16 figures 摘要:与随机标签相比,使用数据集的真实标签进行训练可以更快地优化和更好的泛化。这种差异归因于自然数据集中输入和标签之间的对齐概念。我们发现,在随机或真实标签上训练具有不同结构和优化器的神经网络,使隐藏表示和训练标签之间的关系保持一致,从而解释了为什么神经网络表示在传输方面如此成功。我们首先强调对齐特征促进迁移的原因,并在一个经典的综合迁移问题中表明对齐是对相似和不同任务进行正迁移和负迁移的决定因素。然后,我们研究了各种神经网络体系结构,发现(a)在各种不同的体系结构和优化器中出现对齐,深度对齐产生更多对齐(b)更接近输出的层的对齐增加,(c)现有的高性能深层CNN显示出高水平的对齐。 摘要:Training with the true labels of a dataset as opposed to randomized labels leads to faster optimization and better generalization. This difference is attributed to a notion of alignment between inputs and labels in natural datasets. We find that training neural networks with different architectures and optimizers on random or true labels enforces the same relationship between the hidden representations and the training labels, elucidating why neural network representations have been so successful for transfer. We first highlight why aligned features promote transfer and show in a classic synthetic transfer problem that alignment is the determining factor for positive and negative transfer to similar and dissimilar tasks. We then investigate a variety of neural network architectures and find that (a) alignment emerges across a variety of different architectures and optimizers, with more alignment arising from depth (b) alignment increases for layers closer to the output and (c) existing high-performance deep CNNs exhibit high levels of alignment.
分类|识别(4篇)
【1】 Fast characterization of inducible regions of atrial fibrillation models with multi-fidelity Gaussian process classification 标题:基于多保真高斯过程分类的心房颤动模型诱发区的快速表征 链接:https://arxiv.org/abs/2112.08075
作者:Lia Gandera,Simone Pezzutoa,Ali Gharaviri,Rolf Krause,Paris Perdikaris,Francisco Sahli Costabal 备注:22 pages, 7 figures 摘要:心房颤动的计算模型已成功用于预测最佳消融部位。评估消融模式效果的一个关键步骤是从不同的、可能是随机的位置对模型进行配速,以确定心房是否会诱发心律失常。在这项工作中,我们建议使用黎曼流形上的多保真度高斯过程分类来有效地确定心房中可诱发心律失常的区域。我们建立了一个直接作用于心房表面的概率分类器。我们利用低分辨率模型探索心房表面,并与高分辨率模型无缝结合以识别可诱导区域。当使用40个样本进行训练时,我们的多保真度分类器显示出平衡精度,比用作基线心房颤动模型的最近邻分类器高10%,在存在心房颤动伴消融时高9%。我们希望这项新技术将使房颤计算模型的临床应用更快、更精确。 摘要:Computational models of atrial fibrillation have successfully been used to predict optimal ablation sites. A critical step to assess the effect of an ablation pattern is to pace the model from different, potentially random, locations to determine whether arrhythmias can be induced in the atria. In this work, we propose to use multi-fidelity Gaussian process classification on Riemannian manifolds to efficiently determine the regions in the atria where arrhythmias are inducible. We build a probabilistic classifier that operates directly on the atrial surface. We take advantage of lower resolution models to explore the atrial surface and combine seamlessly with high-resolution models to identify regions of inducibility. When trained with 40 samples, our multi-fidelity classifier shows a balanced accuracy that is 10% higher than a nearest neighbor classifier used as a baseline atrial fibrillation model, and 9% higher in presence of atrial fibrillation with ablations. We hope that this new technique will allow faster and more precise clinical applications of computational models for atrial fibrillation.
【2】 A learning-based approach to feature recognition of Engineering shapes 标题:一种基于学习的工程形状特征识别方法 链接:https://arxiv.org/abs/2112.07962
作者:Lakshmi Priya Muraleedharan,Ramanathan Muthuganapathy 摘要:在本文中,我们提出了一种机器学习方法来识别CAD网格模型中的孔、槽等工程形状特征。随着数字存档、3D打印、零部件扫描和逆向工程等新制造技术的出现,CAD数据以网格模型表示的形式大量增加。随着网格模型中节点和边的数量增加以及存在噪声的可能性,基于图的方法的直接应用不仅成本高昂,而且难以针对噪声数据进行调整。因此,这就需要为以网格形式表示的CAD模型设计新的特征识别方法。这里,我们展示了一个离散版本的高斯映射可以作为特征学习的特征。我们表明,这种方法不仅需要更少的内存需求,而且训练时间也非常少。由于不涉及网络体系结构,超参数的数量要少得多,并且可以在更快的时间内进行调整。识别精度也非常类似于使用三维卷积神经网络(CNN)获得的识别精度,但运行时间和存储要求要少得多。与其他非网络机器学习方法进行了比较,结果表明我们的方法具有最高的准确性。我们还展示了从公共基准测试中获得的具有多个特征以及复杂/交互特征的CAD模型的识别结果。处理噪声数据的能力也得到了证明。 摘要:In this paper, we propose a machine learning approach to recognise engineering shape features such as holes, slots, etc. in a CAD mesh model. With the advent of digital archiving, newer manufacturing techniques such as 3D printing, scanning of components and reverse engineering, CAD data is proliferated in the form of mesh model representation. As the number of nodes and edges become larger in a mesh model as well as the possibility of presence of noise, direct application of graph-based approaches would not only be expensive but also difficult to be tuned for noisy data. Hence, this calls for newer approaches to be devised for feature recognition for CAD models represented in the form of mesh. Here, we show that a discrete version of Gauss map can be used as a signature for a feature learning. We show that this approach not only requires fewer memory requirements but also the training time is quite less. As no network architecture is involved, the number of hyperparameters are much lesser and can be tuned in a much faster time. The recognition accuracy is also very similar to that of the one obtained using 3D convolutional neural networks (CNN) but in much lesser running time and storage requirements. A comparison has been done with other non-network based machine learning approaches to show that our approach has the highest accuracy. We also show the recognition results for CAD models having multiple features as well as complex/interacting features obtained from public benchmarks. The ability to handle noisy data has also been demonstrated.
【3】 Nonlinear Discrete-time Systems' Identification without Persistence of Excitation: A Finite-time Concurrent Learning 标题:无激励持续的非线性离散系统辨识:有限时间并行学习 链接:https://arxiv.org/abs/2112.07765
作者:Farzaneh Tatari,Chiristos Panayiotou,Marios Polycarpou 摘要:研究了未知离散非线性系统动力学的有限时间学习问题,该问题不要求激励的持续性。提出了一种有限时间并行学习方法,通过使用当前数据和记录的经验数据,在线逼近离散时间非线性系统的不确定性,该方法在记录数据的丰富度上满足易于检查的秩条件,与持久性相比限制性更小激励条件的变化。严格的证明保证了基于离散时间Lyapunov分析的估计参数在有限时间内收敛到其最优值。仿真结果表明,与文献中已有的方法相比,该方法能够及时、准确地逼近不确定性。 摘要:This paper deals with the problem of finite-time learning for unknown discrete-time nonlinear systems' dynamics, without the requirement of the persistence of excitation. A finite-time concurrent learning approach is presented to approximate the uncertainties of the discrete-time nonlinear systems in an on-line fashion by employing current data along with recorded experienced data satisfying an easy-to-check rank condition on the richness of the recorded data which is less restrictive in comparison with persistence of excitation condition. Rigorous proofs guarantee the finite-time convergence of the estimated parameters to their optimal values based on a discrete-time Lyapunov analysis. Compared with the existing work in the literature, simulation results illustrate that the proposed method can timely and precisely approximate the uncertainties.
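下面用线性参数化的玩具例子示意"并行学习"(concurrent learning)的核心做法:每一步同时使用当前数据与满足秩条件的历史记录数据来更新参数估计,从而不需要持续激励;增益 gamma 与论文中保证有限时间收敛的具体设计此处从简,仅为假设。

```python
import numpy as np

def concurrent_learning_step(theta, phi_t, y_t, memory, gamma=0.02):
    """同时用当前数据 (phi_t, y_t) 与记录的经验数据 memory 更新参数估计。"""
    update = gamma * (y_t - phi_t @ theta) * phi_t
    for phi_j, y_j in memory:                       # 历史数据项,提供秩条件
        update += gamma * (y_j - phi_j @ theta) * phi_j
    return theta + update

rng = np.random.default_rng(0)
theta_true, theta = np.array([0.5, -1.0, 2.0]), np.zeros(3)
memory = [(p, p @ theta_true) for p in rng.normal(size=(6, 3))]   # 记录的经验数据
for _ in range(300):
    phi = rng.normal(size=3)
    theta = concurrent_learning_step(theta, phi, phi @ theta_true, memory)
print(theta.round(3))                               # 应接近 theta_true
```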
【4】 Robust Neural Network Classification via Double Regularization 标题:基于双重正则化的鲁棒神经网络分类 链接:https://arxiv.org/abs/2112.08102
作者:Olof Zetterqvist,Rebecka Jörnsten,Johan Jonasson 备注:23 pages, 12 figures 摘要:在统计学和机器学习中,数据中存在错误标记的观测值是一个臭名昭著的挑战性问题,这与传统分类器以及(也许更重要的是)像神经网络这样的灵活分类器的泛化性能较差有关。在这里,我们提出了一种新的神经网络训练损失的双重正则化方法,它结合了对分类模型复杂性的惩罚和对训练观测值的最优重新加权。组合惩罚提高了泛化性能,对不同错误标记训练数据设置下的过度拟合以及训练时初始参数值的变化具有很强的鲁棒性。我们针对逻辑回归这一简单情形推导并给出了所提方法的理论依据。我们展示了双正则化模型,这里用DRFit表示,用于(i)MNIST和(ii)CIFAR-10的神经网络分类,在这两种情况下都有模拟的错误标记。我们还说明了DRFit能够以非常高的精度识别错误标记的数据点。这为DRFit作为实用的现成(off-the-shelf)分类器提供了强有力的支持,因为在不牺牲任何性能的情况下,我们得到了一个分类器,该分类器可以同时减少过度拟合以防止标签错误,并提供标签可信度的准确度量。 摘要:The presence of mislabeled observations in data is a notoriously challenging problem in statistics and machine learning, associated with poor generalization properties for both traditional classifiers and, perhaps even more so, flexible classifiers like neural networks. Here we propose a novel double regularization of the neural network training loss that combines a penalty on the complexity of the classification model and an optimal reweighting of training observations. The combined penalties result in improved generalization properties and strong robustness against overfitting in different settings of mislabeled training data and also against variation in initial parameter values when training. We provide a theoretical justification for our proposed method derived for a simple case of logistic regression. We demonstrate the double regularization model, here denoted by DRFit, for neural net classification of (i) MNIST and (ii) CIFAR-10, in both cases with simulated mislabeling. We also illustrate that DRFit identifies mislabeled data points with very good precision. This provides strong support for DRFit as a practical off-the-shelf classifier, since, without any sacrifice in performance, we get a classifier that simultaneously reduces overfitting against mislabeling and gives an accurate measure of the trustworthiness of the labels.
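下面是对"双重正则化"思想的一个示意性 PyTorch 草图:分类损失按可学习的样本权重加权,并叠加模型复杂度惩罚;样本权重的参数化方式与熵正则项是本文为演示而做的简化假设,并非论文原式。

```python
import torch
import torch.nn.functional as F

def drfit_step(model, x, y, sample_logits, opt, lam_model=1e-4, lam_w=1e-2):
    """双重正则化的一次训练步(示意)。"""
    w = torch.softmax(sample_logits, dim=0) * len(y)           # 归一化的样本权重
    per_sample = F.cross_entropy(model(x), y, reduction="none")
    complexity = sum((p ** 2).sum() for p in model.parameters())   # 模型复杂度惩罚
    entropy = -(torch.softmax(sample_logits, 0)
                * torch.log_softmax(sample_logits, 0)).sum()        # 防止权重塌缩
    loss = (w * per_sample).mean() + lam_model * complexity - lam_w * entropy
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item(), w.detach()

# 用法示意:sample_logits 与训练样本一一对应且可学习,
# 疑似标错的样本会被学到较小的权重。
model = torch.nn.Linear(10, 3)
x, y = torch.randn(8, 10), torch.randint(0, 3, (8,))
sample_logits = torch.zeros(8, requires_grad=True)
opt = torch.optim.SGD(list(model.parameters()) + [sample_logits], lr=0.1)
print(drfit_step(model, x, y, sample_logits, opt)[0])
```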
优化|敛散性(5篇)
【1】 Predicting the utility of search spaces for black-box optimization:a simple, budget-aware approach 标题:预测搜索空间对黑盒优化的效用:一种简单、预算敏感的方法 链接:https://arxiv.org/abs/2112.08250
作者:Setareh Ariafar,Justin Gilmer,Zack Nado,Jasper Snoek,Rodolphe Jenatton,George E. Dahl 摘要:黑盒优化需要指定搜索空间以探索解决方案,例如,d维紧凑空间,这种选择对于以合理的预算获得最佳结果至关重要。不幸的是,在许多应用程序中,确定高质量的搜索空间可能是一项挑战。例如,在预算有限的新问题上调整机器学习管道的超参数时,必须在排除潜在有希望的区域和保持足够小的搜索空间以便于处理之间取得平衡。这项工作的目标是通过调整深度神经网络的示例应用来激发预测以预算为条件的搜索空间质量的问题,以及提供一种基于应用于概率响应面模型的效用函数的简单评分方法,类似于贝叶斯优化。我们表明,我们提出的方法可以在各种情况下计算有意义的预算条件分数。我们还提供了实验证据,证明准确的分数在构建和修剪搜索空间时是有用的。最终,我们认为,在深度学习的实验工作流程中,搜索空间评分应该成为标准做法。 摘要:Black box optimization requires specifying a search space to explore for solutions, e.g. a d-dimensional compact space, and this choice is critical for getting the best results at a reasonable budget. Unfortunately, determining a high quality search space can be challenging in many applications. For example, when tuning hyperparameters for machine learning pipelines on a new problem given a limited budget, one must strike a balance between excluding potentially promising regions and keeping the search space small enough to be tractable. The goal of this work is to motivate -- through example applications in tuning deep neural networks -- the problem of predicting the quality of search spaces conditioned on budgets, as well as to provide a simple scoring method based on a utility function applied to a probabilistic response surface model, similar to Bayesian optimization. We show that the method we present can compute meaningful budget-conditional scores in a variety of situations. We also provide experimental evidence that accurate scores can be useful in constructing and pruning search spaces. Ultimately, we believe scoring search spaces should become standard practice in the experimental workflow for deep learning.
【2】 Exponential Convergence of Deep Operator Networks for Elliptic Partial Differential Equations 标题:椭圆型偏微分方程深度算子网络的指数收敛性 链接:https://arxiv.org/abs/2112.08125
作者:Carlo Marcati,Christoph Schwab 摘要:我们在无限维空间之间构造了深度算子网络(ONets),该网络以指数收敛速度模拟椭圆二阶偏微分方程的系数到解映射。特别地,我们考虑设置在$d$维周期域($d=1,2,\dots$)中、具有解析右端项和系数的问题。我们的分析包括扩散反应问题、参数扩散方程和椭圆系统,如非均质材料中的线性各向同性弹性静力学。对于解具有解析性的边值问题,我们利用谱配置方法的指数收敛性。在当前的周期与解析设定下,这由经典的椭圆正则性得出。在[Chen and Chen,1993]和[Lu et al.,2021]的ONet分支和主干结构中,我们证明了深ONet的存在,其以$H^1$范数下的精度$\varepsilon>0$模拟系数到解映射,且在系数集上一致成立。我们证明了ONet中的神经网络的规模为$\mathcal{O}(\left|\log(\varepsilon)\right|^{\kappa})$,其中$\kappa>0$依赖于物理空间维度。 摘要:We construct deep operator networks (ONets) between infinite-dimensional spaces that emulate with an exponential rate of convergence the coefficient-to-solution map of elliptic second-order PDEs. In particular, we consider problems set in $d$-dimensional periodic domains, $d=1, 2, \dots$, and with analytic right-hand sides and coefficients. Our analysis covers diffusion-reaction problems, parametric diffusion equations, and elliptic systems such as linear isotropic elastostatics in heterogeneous materials. We leverage the exponential convergence of spectral collocation methods for boundary value problems whose solutions are analytic. In the present periodic and analytic setting, this follows from classical elliptic regularity. Within the ONet branch and trunk construction of [Chen and Chen, 1993] and of [Lu et al., 2021], we show the existence of deep ONets which emulate the coefficient-to-solution map to accuracy $\varepsilon>0$ in the $H^1$ norm, uniformly over the coefficient set. We prove that the neural networks in the ONet have size $\mathcal{O}(\left|\log(\varepsilon)\right|^{\kappa})$ for some $\kappa>0$ depending on the physical space dimension.
【3】 Ensuring DNN Solution Feasibility for Optimization Problems with Convex Constraints and Its Application to DC Optimal Power Flow Problems 标题:保证凸约束优化问题DNN解的可行性及其在直流最优潮流问题中的应用 链接:https://arxiv.org/abs/2112.08091
作者:Tianyu Zhao,Xiang Pan,Minghua Chen,Steven H. Low 备注:43pages, 9 figures. In submission 摘要:由于DNN固有的预测误差,确保解决方案的可行性是开发用于解决约束优化问题的深层神经网络(DNN)方案的关键挑战。在本文中,我们提出了一个“预防性学习”框架来系统地保证具有凸约束和一般目标函数问题的DNN解的可行性。我们首先应用预测和重构设计,不仅保证等式约束,而且利用它们来减少DNN预测的变量数量。然后,作为一个关键的方法论贡献,我们系统地校准了DNN训练中使用的不等式约束,从而预测预测误差并确保得到的解仍然可行。我们描述了足以确保普遍可行性的校准量级和DNN大小。我们提出了一种新的对手样本感知训练算法,在不牺牲可行性保证的情况下提高了DNN的最优性。总体而言,该框架提供了两个DNN。第一种是通过描述足够的DNN大小来保证普遍可行性,而另一种是通过所提出的训练算法来进一步提高最优性,同时保持DNN的普遍可行性。我们应用预防性学习框架开发了DeepOPF ,用于解决电网运行中的基本直流最优潮流问题。它改进了现有的基于DNN的方案,确保了可行性,并在轻负载和重负载情况下获得一致的理想加速性能。在IEEE Case-30/118/300测试用例上的仿真结果表明,与最先进的迭代求解器相比,DeepOPF 生成了$100%$可行解,具有$<$0.5%的优化损失和高达两个数量级的计算加速。 摘要:Ensuring solution feasibility is a key challenge in developing Deep Neural Network (DNN) schemes for solving constrained optimization problems, due to inherent DNN prediction errors. In this paper, we propose a "preventive learning'" framework to systematically guarantee DNN solution feasibility for problems with convex constraints and general objective functions. We first apply a predict-and-reconstruct design to not only guarantee equality constraints but also exploit them to reduce the number of variables to be predicted by DNN. Then, as a key methodological contribution, we systematically calibrate inequality constraints used in DNN training, thereby anticipating prediction errors and ensuring the resulting solutions remain feasible. We characterize the calibration magnitudes and the DNN size sufficient for ensuring universal feasibility. We propose a new Adversary-Sample Aware training algorithm to improve DNN's optimality performance without sacrificing feasibility guarantee. Overall, the framework provides two DNNs. The first one from characterizing the sufficient DNN size can guarantee universal feasibility while the other from the proposed training algorithm further improves optimality and maintains DNN's universal feasibility simultaneously. We apply the preventive learning framework to develop DeepOPF for solving the essential DC optimal power flow problem in grid operation. It improves over existing DNN-based schemes in ensuring feasibility and attaining consistent desirable speedup performance in both light-load and heavy-load regimes. Simulation results over IEEE Case-30/118/300 test cases show that DeepOPF generates $100%$ feasible solutions with $<$0.5% optimality loss and up to two orders of magnitude computational speedup, as compared to a state-of-the-art iterative solver.
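下面用一个三机玩具算例示意摘要中的两个步骤:predict-and-reconstruct(DNN 只预测部分机组出力,剩余出力由功率平衡等式约束直接解出)与不等式约束校准(训练时收紧上下限,为预测误差预留余量);rho 等数值仅为示例,论文给出了保证普适可行性的校准幅度刻画。

```python
import numpy as np

def reconstruct_dispatch(pred_pg, demand_total):
    """Predict-and-reconstruct:DNN 只预测 N-1 台机组出力,
    最后一台由功率平衡等式约束直接解出,等式约束因此天然满足。"""
    slack = demand_total - pred_pg.sum()
    return np.append(pred_pg, slack)

def calibrate_limits(pg_min, pg_max, rho=0.05):
    """训练时对不等式约束做校准(收紧),为 DNN 预测误差留余量。"""
    margin = rho * (pg_max - pg_min)
    return pg_min + margin, pg_max - margin

pg_min, pg_max = np.array([0.0, 0.0, 0.0]), np.array([100.0, 80.0, 60.0])
tight_min, tight_max = calibrate_limits(pg_min, pg_max)      # 训练时使用的收紧界
pg = reconstruct_dispatch(pred_pg=np.array([90.0, 55.0]), demand_total=190.0)
print(pg, bool(np.all((pg >= pg_min) & (pg <= pg_max))))      # 原始约束下可行
```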
【4】 Optimal Latent Space Forecasting for Large Collections of Short Time Series Using Temporal Matrix Factorization 标题:基于时间矩阵分解的大样本短时间序列最优潜在空间预测 链接:https://arxiv.org/abs/2112.08052
作者:Himanshi Charotia,Abhishek Garg,Gaurav Dhama,Naman Maheshwari 摘要:在时间序列预测的背景下,通常的做法是评估多种方法并选择其中一种方法或一个集合来生成最佳预测。然而,在不同的集合中选择多种方法仍然是一项具有挑战性的任务,随着方法数量的增加,这项任务将经历组合爆炸。在需求预测或收入预测方面,由于业务环境不断变化,大量时间序列以及可用的有限历史数据点进一步加剧了这一挑战。尽管深度学习预测方法旨在同时预测大量时间序列,但由于可用的历史有限,它们在此类场景中的应用变得很有挑战性,并且可能不会产生理想的结果。我们提出了一个预测高维短时间序列数据的框架,该框架将低秩时间矩阵分解和使用交叉验证的潜在时间序列的最优模型选择相结合。我们证明,与直接对时间序列应用不同的单变量模型相比,预测潜在因素可以显著提高性能。性能已在M4月度数据集的截断版本上验证,该数据集包含来自多个域的时间序列数据,显示了该方法的普遍适用性。此外,由于潜在因素的数量较少,因此在将预测方法直接应用于高维数据集时通常是不切实际的,因此有利于纳入分析师对未来的看法。 摘要:In the context of time series forecasting, it is a common practice to evaluate multiple methods and choose one of these methods or an ensemble for producing the best forecasts. However, choosing among different ensembles over multiple methods remains a challenging task that undergoes a combinatorial explosion as the number of methods increases. In the context of demand forecasting or revenue forecasting, this challenge is further exacerbated by a large number of time series as well as limited historical data points available due to changing business context. Although deep learning forecasting methods aim to simultaneously forecast large collections of time series, they become challenging to apply in such scenarios due to the limited history available and might not yield desirable results. We propose a framework for forecasting short high-dimensional time series data by combining low-rank temporal matrix factorization and optimal model selection on latent time series using cross-validation. We demonstrate that forecasting the latent factors leads to significant performance gains as compared to directly applying different uni-variate models on time series. Performance has been validated on a truncated version of the M4 monthly dataset which contains time series data from multiple domains showing the general applicability of the method. Moreover, it is amenable to incorporating the analyst view of the future owing to the low number of latent factors which is usually impractical when applying forecasting methods directly to high dimensional datasets.
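下面给出"低秩时间矩阵分解加潜在序列预测"流程的一个最小示意:用截断 SVD 近似分解,对每条潜在序列拟合最简单的 AR(1) 并外推,再映射回原空间;论文中潜在序列的预测模型是通过交叉验证在多种单变量模型中选出的,此处 AR(1) 仅为占位假设。

```python
import numpy as np

def tmf_forecast(Y, rank=3, steps=6):
    """低秩分解 Y ≈ F @ X 后,对潜在序列 X 外推并映射回原空间。"""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    F, X = U[:, :rank] * s[:rank], Vt[:rank]        # F: 序列载荷, X: 潜在时间序列
    X_future = np.zeros((rank, steps))
    for k in range(rank):
        x = X[k]
        phi = (x[1:] @ x[:-1]) / (x[:-1] @ x[:-1] + 1e-12)   # AR(1) 系数
        last = x[-1]
        for t in range(steps):
            last = phi * last
            X_future[k, t] = last
    return F @ X_future                             # 回到原始(高维)序列空间

Y = np.cumsum(np.random.default_rng(1).normal(size=(50, 24)), axis=1)  # 50 条短序列
print(tmf_forecast(Y).shape)                        # (50, 6)
```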
【5】 LoSAC: An Efficient Local Stochastic Average Control Method for Federated Optimization 标题:LOSAC:一种高效的联邦优化局部随机平均控制方法 链接:https://arxiv.org/abs/2112.07839
作者:Huiming Chen,Huandong Wang,Quanming Yao,Yong Li,Depeng Jin,Qiang Yang 摘要:联邦优化(Federated optimization,FedOpt)旨在跨大量分布式客户机协作训练学习模型,对于联邦学习至关重要。FedOpt中的主要关注点可归因于模型发散和通信效率,这会显著影响性能。在本文中,我们提出了一种新的方法,即LoSAC,以更有效地从异构分布式数据中学习。其关键的算法思想是在每次常规局部模型更新后局部更新全局全梯度的估计。因此,LoSAC可以以更紧凑的方式更新客户机的信息。特别地,我们研究了LoSAC的收敛结果。此外,LoSAC的优点是能够保护信息泄漏免受最新技术深度泄漏梯度(DLG)的影响。最后,通过实验验证了LoSAC算法与现有FedOpt算法相比的优越性。具体而言,LoSAC显著提高了通信效率,平均提高100%以上,缓解了模型发散问题,并配备了DLG防御能力。 摘要:Federated optimization (FedOpt), which targets at collaboratively training a learning model across a large number of distributed clients, is vital for federated learning. The primary concerns in FedOpt can be attributed to the model divergence and communication efficiency, which significantly affect the performance. In this paper, we propose a new method, i.e., LoSAC, to learn from heterogeneous distributed data more efficiently. Its key algorithmic insight is to locally update the estimate for the global full gradient after each regular local model update. Thus, LoSAC can keep clients' information refreshed in a more compact way. In particular, we have studied the convergence result for LoSAC. Besides, the bonus of LoSAC is the ability to defend the information leakage from the recent technique Deep Leakage Gradients (DLG). Finally, experiments have verified the superiority of LoSAC comparing with state-of-the-art FedOpt algorithms. Specifically, LoSAC significantly improves communication efficiency by more than $100\%$ on average, mitigates the model divergence problem and equips with the defense ability against DLG.
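以下草图仅示意摘要所述的关键思想,即在每次常规本地更新后同步刷新对全局全梯度的本地估计,并用它确定更新方向;具体的刷新与修正形式是本文的简化假设,并非论文原始算法。

```python
import numpy as np

def losac_local_round(w, g_global_est, grad_fn, steps=20, lr=0.1):
    """一轮本地训练:每做一次本地更新,就刷新全局全梯度的估计。"""
    g_local_old = grad_fn(w)
    for _ in range(steps):
        g_local = grad_fn(w)
        # 用最新的本地梯度替换全局梯度估计中过期的本地分量
        g_global_est = g_global_est - g_local_old + g_local
        g_local_old = g_local
        w = w - lr * g_global_est        # 用刷新后的全局梯度估计做更新
    return w, g_global_est

# 单客户端玩具示例:目标最优点为 [1, -2]
grad_fn = lambda w: 2 * (w - np.array([1.0, -2.0]))
w0 = np.zeros(2)
print(losac_local_round(w0, grad_fn(w0), grad_fn)[0])   # 应接近 [1, -2]
```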
预测|估计(6篇)
【1】 Disparities in Social Determinants among Performances of Mortality Prediction with Machine Learning for Sepsis Patients 标题:脓毒症患者病死率机器学习预测的社会影响因素差异 链接:https://arxiv.org/abs/2112.08224
作者:Hanyin Wang,Yikuan Li,Andrew Naidech,Yuan Luo 摘要:背景 在美国,脓毒症是危重病人最具生命威胁的情况之一,而脓毒症鉴定的标准化标准仍在制定中。脓毒症患者社会决定因素的差异可能会干扰使用机器学习的风险预测性能。方法 通过森林图揭示了由6项可用脓毒症标准确定的患者在种族、性别、婚姻状况、保险类型和语言等社会决定因素上的差异。对16个机器学习分类器进行训练,以预测脓毒症患者的住院死亡率。在整个随机进行的测试集上测试训练模型的性能,并根据以下每个社会决定因素建立每个子群体:种族、性别、婚姻状况、保险类型和语言。结果 我们分析了MIMIC-III数据库中的11791名重症监护患者。在每种脓毒症鉴定方法鉴定的人群中,观察到亚人群在种族、婚姻状况、保险类型和语言方面存在显著差异。在根据Sepsis-3标准确定的5783名脓毒症患者中,当对亚裔和西班牙裔患者应用经过训练的机器学习模型时,观察到死亡率预测的统计显著下降。通过两两比较,我们发现亚洲人和白人患者、亚洲人和其他种族的患者以及说英语和西班牙语的患者在死亡率预测方面存在差异。结论 在不同的社会决定因素组中,不同脓毒症标准确定的患者比例存在差异。为了实现准确的诊断,需要一个多功能的脓毒症诊断系统来克服患者的社会决定因素差异。 摘要:Background Sepsis is one of the most life-threatening circumstances for critically ill patients in the US, while a standardized criteria for sepsis identification is still under development. Disparities in social determinants of sepsis patients can interfere with the risk prediction performances using machine learning. Methods Disparities in social determinants, including race, gender, marital status, insurance types and languages, among patients identified by six available sepsis criteria were revealed by forest plots. Sixteen machine learning classifiers were trained to predict in-hospital mortality for sepsis patients. The performance of the trained model was tested on the entire randomly conducted test set and each sub-population built based on each of the following social determinants: race, gender, marital status, insurance type, and language. Results We analyzed a total of 11,791 critical care patients from the MIMIC-III database. Within the population identified by each sepsis identification method, significant differences were observed among sub-populations regarding race, marital status, insurance type, and language. On the 5,783 sepsis patients identified by the Sepsis-3 criteria statistically significant performance decreases for mortality prediction were observed when applying the trained machine learning model on Asian and Hispanic patients. With pairwise comparison, we detected performance discrepancies in mortality prediction between Asian and White patients, Asians and patients of other races, as well as English-speaking and Spanish-speaking patients. Conclusions Disparities in proportions of patients identified by various sepsis criteria were detected among the different social determinant groups. To achieve accurate diagnosis, a versatile diagnostic system for sepsis is needed to overcome the social determinant disparities of patients.
【2】 Taming Overconfident Prediction on Unlabeled Data from Hindsight 标题:后见之明未标注数据的驯服过度自信预测 链接:https://arxiv.org/abs/2112.08200
作者:Jing Li,Yuangang Pan,Ivor W. Tsang 摘要:在半监督学习(SSL)中,最小化未标记数据的预测不确定性是实现良好性能的关键因素。预测不确定性通常表示为由输出空间中的转换概率计算的熵。现有的大多数工作都是通过接受确定类(概率最大)作为真实标签或抑制细微预测(概率较小)来提取低熵预测。毫无疑问,这些蒸馏策略通常是启发式的,对模型训练的信息量较小。基于这一认识,本文提出了一种双重机制,称为自适应锐化(ADS),该机制首先应用软阈值自适应屏蔽确定的和可忽略的预测,然后无缝锐化已知的预测,仅用已知的预测提取特定的预测。更重要的是,我们通过比较各种蒸馏策略,从理论上分析了ADS的特性。大量实验证明ADS通过使其成为插件显著改进了最先进的SSL方法。我们提出的ADS为未来基于蒸馏的SSL研究奠定了基础。 摘要:Minimizing prediction uncertainty on unlabeled data is a key factor to achieve good performance in semi-supervised learning (SSL). The prediction uncertainty is typically expressed as the emph{entropy} computed by the transformed probabilities in output space. Most existing works distill low-entropy prediction by either accepting the determining class (with the largest probability) as the true label or suppressing subtle predictions (with the smaller probabilities). Unarguably, these distillation strategies are usually heuristic and less informative for model training. From this discernment, this paper proposes a dual mechanism, named ADaptive Sharpening (ADS), which first applies a soft-threshold to adaptively mask out determinate and negligible predictions, and then seamlessly sharpens the informed predictions, distilling certain predictions with the informed ones only. More importantly, we theoretically analyze the traits of ADS by comparing with various distillation strategies. Numerous experiments verify that ADS significantly improves the state-of-the-art SSL methods by making it a plug-in. Our proposed ADS forges a cornerstone for future distillation-based SSL research.
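下面给出对 ADS 两步机制(软阈值遮蔽加对 informed 预测锐化)的一种示意性理解,仅供说明;阈值 tau、温度 T 以及具体的遮蔽规则均为本文假设,并非论文原式。

```python
import numpy as np

def adaptive_sharpening(probs, tau=0.6, T=0.5):
    """对未标记样本的预测概率做软阈值遮蔽 + 温度锐化(示意)。"""
    probs = np.asarray(probs, dtype=float)
    out = probs.copy()
    informed = probs.max(axis=1) < tau              # 置信度已很高的样本不再处理
    p = probs[informed]
    negligible = (1.0 - tau) / (probs.shape[1] - 1)
    p = np.where(p < negligible, 0.0, p)            # 屏蔽可忽略的细微类别
    p = p ** (1.0 / T)                               # 温度锐化保留下来的预测
    out[informed] = p / p.sum(axis=1, keepdims=True)
    return out

probs = np.array([[0.70, 0.20, 0.10],               # 确定预测:保持不变
                  [0.45, 0.40, 0.15]])              # 细微预测:被屏蔽并锐化
print(adaptive_sharpening(probs).round(3))
```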
【3】 Solving the Data Sparsity Problem in Predicting the Success of the Startups with Machine Learning Methods 标题:用机器学习方法解决创业成功预测中的数据稀疏性问题 链接:https://arxiv.org/abs/2112.07985
作者:Dafei Yin,Jing Li,Gaosheng Wu 摘要:预测创业公司的成功对于创业公司和投资者都非常重要。由于缺乏可用的数据和适当的一般方法,这是很困难的。借助Crunchbase等数据平台聚合初创公司的信息,可以使用机器学习算法进行预测。现有研究存在数据稀疏的问题,因为大多数早期初创公司没有太多可供公众使用的数据。我们试图利用最新的算法来解决这个问题。我们在Crunchbase的大数据集上研究了几种机器学习算法。结果表明,LightGBM和XGBoost表现最好,F1成绩分别达到53.03%和52.96%。我们从特征贡献的角度来解释预测。我们根据这些模型构建投资组合,并取得了较高的成功率。这些发现对机器学习方法如何帮助初创公司和投资者具有重要意义。 摘要:Predicting the success of startup companies is of great importance for both startup companies and investors. It is difficult due to the lack of available data and appropriate general methods. With data platforms like Crunchbase aggregating the information of startup companies, it is possible to predict with machine learning algorithms. Existing research suffers from the data sparsity problem as most early-stage startup companies do not have much data available to the public. We try to leverage the recent algorithms to solve this problem. We investigate several machine learning algorithms with a large dataset from Crunchbase. The results suggest that LightGBM and XGBoost perform best and achieve 53.03% and 52.96% F1 scores. We interpret the predictions from the perspective of feature contribution. We construct portfolios based on the models and achieve high success rates. These findings have substantial implications on how machine learning methods can help startup companies and investors.
【4】 Channel Parameter Estimation in the Presence of Phase Noise Based on Maximum Correntropy Criterion 标题:基于最大相关熵准则的相位噪声信道参数估计 链接:https://arxiv.org/abs/2112.07955
作者:Amir Alizadeh,Ghosheh Abed Hodtani 备注:5 pages, 5 figures, under revision by the journal of IEEE Transactions on Neural Networks and Learning Systems 摘要:振荡器输出通常具有相位噪声,导致输出功率谱密度(PSD)围绕狄拉克δ函数分散。在本文中,考虑了AWGN信道,其中伴随相位噪声的发送信号被添加到信道高斯噪声中,并在接收机处接收。传统的信道估计算法,如最小均方(LMS)和平均均方误差(MSE)准则不适用于这种信道估计。我们(i)使用信息论学习(ITL)准则,即最大相关熵准则(MCC)分析这种相位噪声信道估计,从而使信道估计器的稳态行为具有鲁棒性;(ii)通过将MSE和MCC相结合作为一种新的混合LMS算法来提高收敛速度。 摘要:Oscillator output generally has phase noise causing the output power spectral density (PSD) to disperse around a Dirac delta function. In this paper, the AWGN channel is considered, where the sent signal accompanying with phase noise is added to the channel Gaussian noise and received at the receiver. Conventional channel estimation algorithms such as least mean square (LMS) and mean MSE criterion are not suitable for this channel estimation. We (i) analyze this phase noise channel estimation with information theoretic learning (ITL) criterion, i.e., maximum correntropy criterion (MCC), leading to robustness in the channel estimator's steady state behavior; and (ii) improve the convergence rate by combining MSE and MCC as a novel mixed-LMS algorithm.
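下面用一个玩具信道示意最大相关熵准则(MCC)与 LMS 混合的自适应估计:MCC 项的高斯核会自动压低大误差(例如相位噪声导致的离群点)对更新的影响,LMS 项保证初期较快的收敛;混合权重 beta、核宽 sigma 等均为假设取值,并非论文给出的算法参数。

```python
import numpy as np

def mixed_lms_mcc(x, d, n_taps=4, mu=0.05, sigma=1.0, beta=0.5):
    """混合 LMS / MCC 自适应信道估计(示意)。"""
    w = np.zeros(n_taps)
    for n in range(n_taps, len(x)):
        u = x[n - n_taps + 1:n + 1][::-1]          # 输入回归向量 [x[n], ..., x[n-3]]
        e = d[n] - w @ u                           # 先验误差
        kernel = np.exp(-e ** 2 / (2 * sigma ** 2))   # 高斯核:大误差 -> 小权重
        w += mu * (beta * e + (1 - beta) * kernel * e) * u
    return w

rng = np.random.default_rng(0)
h = np.array([0.8, -0.4, 0.2, 0.1])                # 真实信道
x = rng.normal(size=2000)
d = np.convolve(x, h, mode="full")[:len(x)] + 0.05 * rng.standard_t(df=2, size=len(x))
print(mixed_lms_mcc(x, d).round(2))                # 应接近 h
```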
【5】 Learning to track environment state via predictive autoencoding 标题:通过预测自动编码学习跟踪环境状态 链接:https://arxiv.org/abs/2112.07745
作者:Marian Andrecki,Nicholas K. Taylor 备注:10 pages. Written in March 2018. Not published 摘要:本文介绍了一种用于学习随机环境正向模型的神经网络结构。该任务仅通过从图像形式的时间非结构化观察中学习来实现。一旦训练,该模型允许在存在噪声或新感知间歇到达的情况下跟踪环境状态。此外,状态估计可以在观测盲模式下传播,从而允许长期预测。该网络可以输出对未来观测的期望和来自信念分布的样本。产生的功能类似于粒子过滤器(PF)。在模拟对象移动的环境中对体系结构进行评估。由于正向模型和传感器模型可用,我们实现了一个PF来衡量从数据中学习到的模型的质量。 摘要:This work introduces a neural architecture for learning forward models of stochastic environments. The task is achieved solely through learning from temporal unstructured observations in the form of images. Once trained, the model allows for tracking of the environment state in the presence of noise or with new percepts arriving intermittently. Additionally, the state estimate can be propagated in observation-blind mode, thus allowing for long-term predictions. The network can output both expectation over future observations and samples from belief distribution. The resulting functionalities are similar to those of a Particle Filter (PF). The architecture is evaluated in an environment where we simulate objects moving. As the forward and sensor models are available, we implement a PF to gauge the quality of the models learnt from the data.
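作为参照,下面给出论文中用于对比的粒子滤波(PF)的一个极简自助式实现示意;transition 与 likelihood 分别对应前向模型与传感器模型,由使用者提供,示例中的一维随机游走与高斯观测仅为假设。

```python
import numpy as np

def bootstrap_pf(observations, init_particles, transition, likelihood):
    # 极简自助粒子滤波:预测(前向模型)-> 加权(观测似然)-> 重采样 -> 取均值作估计
    particles, n, estimates = init_particles.copy(), len(init_particles), []
    for z in observations:
        particles = transition(particles)
        w = likelihood(z, particles)
        w = w / w.sum()
        particles = particles[np.random.choice(n, size=n, p=w)]
        estimates.append(particles.mean(axis=0))
    return np.array(estimates)

# 用法示例:一维随机游走 + 高斯观测(仅为示意)
est = bootstrap_pf(np.random.randn(50),
                   init_particles=np.random.randn(500, 1),
                   transition=lambda p: p + 0.1 * np.random.randn(*p.shape),
                   likelihood=lambda z, p: np.exp(-0.5 * (z - p[:, 0]) ** 2))
```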
【6】 Probabilistic Forecasting with Conditional Generative Networks via Scoring Rule Minimization 标题:基于评分规则最小化的条件生成网络概率预测 链接:https://arxiv.org/abs/2112.08217
作者:Lorenzo Pacchiardi,Rilwan Adewoyin,Peter Dueben,Ritabrata Dutta 摘要:概率预测包括根据过去的观测结果说明未来结果的概率分布。在气象学中,运行基于物理的数值模型集合来获得这种分布。通常,性能评估采用评分规则,即关于预测分布和观测结果的函数。使用某些评分规则,可以同时评估预测的校准性和锐度。在深度学习中,生成型神经网络将高维空间上的分布参数化,并通过对潜变量样本进行变换来轻松实现采样。条件生成网络还使该分布以输入变量为条件。在这篇手稿中,我们使用经过训练以最小化评分规则值的条件生成网络进行概率预测。与生成性对抗网络(GAN)相比,不需要鉴别器,训练是稳定的。我们在两个混沌模型和一个全球天气观测数据集上进行了实验;结果令人满意,并且比GANs实现的结果校准得更好。 摘要:Probabilistic forecasting consists of stating a probability distribution for a future outcome based on past observations. In meteorology, ensembles of physics-based numerical models are run to get such distribution. Usually, performance is evaluated with scoring rules, functions of the forecast distribution and the observed outcome. With some scoring rules, calibration and sharpness of the forecast can be assessed at the same time. In deep learning, generative neural networks parametrize distributions on high-dimensional spaces and easily allow sampling by transforming draws from a latent variable. Conditional generative networks additionally constrain the distribution on an input variable. In this manuscript, we perform probabilistic forecasting with conditional generative networks trained to minimize scoring rule values. In contrast to Generative Adversarial Networks (GANs), no discriminator is required and training is stable. We perform experiments on two chaotic models and a global dataset of weather observations; results are satisfactory and better calibrated than what achieved by GANs.
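下面以能量评分(energy score,一种常用的恰当评分规则)为例,给出"以评分规则为训练目标"的最小示意(论文具体采用的评分规则与网络结构以原文为准);samples 为条件生成网络对同一输入产生的 m 个样本,obs 为对应观测,均为假设的接口。

```python
import torch

def energy_score_loss(samples, obs):
    # samples: [m, d] 生成网络的 m 个样本; obs: [d] 观测
    # 能量评分估计: mean||X - y|| - (1 / (2m(m-1))) * sum_{i != j} ||X_i - X_j||
    m = samples.shape[0]
    term1 = torch.norm(samples - obs, dim=-1).mean()
    term2 = torch.cdist(samples, samples).sum() / (2 * m * (m - 1))
    return term1 - term2   # 可直接对其做反向传播以更新生成网络参数
```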
其他神经网络|深度学习|模型|建模(14篇)
【1】 Simple Text Detoxification by Identifying a Linear Toxic Subspace in Language Model Embeddings 标题:在语言模型嵌入中识别线性有毒子空间的简单文本去毒方法 链接:https://arxiv.org/abs/2112.08346
作者:Andrew Wang,Mohit Sudhakar,Yangfeng Ji 摘要:大型预先训练的语言模型通常针对大量互联网数据进行训练,其中一些可能包含有毒或辱骂性语言。因此,语言模型编码有害信息,这使得这些语言模型的实际使用受到限制。当前的方法旨在防止有毒特征出现在生成的文本中。我们假设在预先训练的语言模型的潜在空间中存在一个低维有毒子空间,这表明有毒特征遵循某种潜在模式,因此是可移除的。为了构造这个有毒子空间,我们提出了一种在潜在空间中推广有毒方向的方法。我们还提供了一种使用基于上下文的单词掩蔽系统构建并行数据集的方法。通过我们的实验,我们发现当有毒子空间从一组句子表征中移除时,结果中几乎没有有毒表征。我们的经验证明,使用我们的方法发现的子空间推广到多重毒性语料库,表明存在一个低维毒性子空间。 摘要:Large pre-trained language models are often trained on large volumes of internet data, some of which may contain toxic or abusive language. Consequently, language models encode toxic information, which makes the real-world usage of these language models limited. Current methods aim to prevent toxic features from appearing generated text. We hypothesize the existence of a low-dimensional toxic subspace in the latent space of pre-trained language models, the existence of which suggests that toxic features follow some underlying pattern and are thus removable. To construct this toxic subspace, we propose a method to generalize toxic directions in the latent space. We also provide a methodology for constructing parallel datasets using a context based word masking system. Through our experiments, we show that when the toxic subspace is removed from a set of sentence representations, almost no toxic representations remain in the result. We demonstrate empirically that the subspace found using our method generalizes to multiple toxicity corpora, indicating the existence of a low-dimensional toxic subspace.
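下面用 NumPy 给出"低维子空间投影去除"这一思路的示意代码(并非论文的原始方法):假设已用基于上下文掩蔽构造的平行句对得到有毒/中性句子嵌入,用其差分的前 k 个主方向近似毒性子空间,再从任意句子表示中投影去除;k 等参数为假设值。

```python
import numpy as np

def remove_toxic_subspace(embeddings, toxic_embs, neutral_embs, k=3):
    # 用平行句对的嵌入差分估计 k 维"毒性子空间",再将其分量从句子表示中去除
    diffs = toxic_embs - neutral_embs
    diffs = diffs - diffs.mean(axis=0)
    _, _, Vt = np.linalg.svd(diffs, full_matrices=False)
    basis = Vt[:k]                              # 毒性子空间的正交基 [k, d]
    proj = embeddings @ basis.T @ basis         # 各表示在该子空间上的分量
    return embeddings - proj
```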
【2】 Measure and Improve Robustness in NLP Models: A Survey 标题:NLP模型中稳健性的度量与改进:综述 链接:https://arxiv.org/abs/2112.08313
作者:Xuezhi Wang,Haohan Wang,Diyi Yang 摘要:随着NLP模型在基准上实现了最先进的性能,并获得了广泛的应用,确保这些模型在现实世界中的安全部署变得越来越重要,例如,确保模型对未知或具有挑战性的场景具有鲁棒性。尽管稳健性是一个日益被研究的主题,但它已在vision和NLP等应用中分别进行了探索,在多个研究领域有不同的定义、评估和缓解策略。在本文中,我们的目的是提供一个统一的调查,如何定义,衡量和提高鲁棒性自然语言处理。我们首先连接了稳健性的多种定义,然后统一了识别稳健性故障和评估模型稳健性的各种工作。相应地,我们提出了数据驱动、模型驱动和归纳先验的缓解策略,并对如何有效提高NLP模型的稳健性提出了更系统的观点。最后,我们总结了开放的挑战和未来的方向,以推动这一领域的进一步研究。 摘要:As NLP models achieved state-of-the-art performances over benchmarks and gained wide applications, it has been increasingly important to ensure the safe deployment of these models in the real world, e.g., making sure the models are robust against unseen or challenging scenarios. Despite robustness being an increasingly studied topic, it has been separately explored in applications like vision and NLP, with various definitions, evaluation and mitigation strategies in multiple lines of research. In this paper, we aim to provide a unifying survey of how to define, measure and improve robustness in NLP. We first connect multiple definitions of robustness, then unify various lines of work on identifying robustness failures and evaluating models' robustness. Correspondingly, we present mitigation strategies that are data-driven, model-driven, and inductive-prior-based, with a more systematic view of how to effectively improve robustness in NLP models. Finally, we conclude by outlining open challenges and future directions to motivate further research in this area.
【3】 Rethinking Influence Functions of Neural Networks in the Over-parameterized Regime 标题:对过度参数化状态下神经网络影响函数的再思考 链接:https://arxiv.org/abs/2112.08297
作者:Rui Zhang,Shihua Zhang 备注:To appear in AAAI 2022 摘要:理解神经网络的黑盒预测是一个挑战。为了实现这一点,早期的研究设计了影响函数(IF)来测量移除单个训练点对神经网络的影响。然而,经典的隐式Hessian向量积(IHVP)方法计算IF是脆弱的,在神经网络环境下对IF的理论分析仍然缺乏。为此,我们利用神经切线核(NTK)理论计算了用正则化均方损失训练的神经网络的IF,并证明了当两层ReLU网络的宽度足够大时,近似误差可以任意小。我们分析了经典IHVP方法在过参数化情况下的误差界,以了解其失败的时间和原因。具体而言,我们的理论分析表明:(1)IHVP的精度取决于正则化项,在弱正则化条件下,IHVP的精度很低;(2) IHVP的准确性与相应训练点的概率密度显著相关。我们进一步借用NTK的理论来更好地理解IFs,包括量化有影响样本的复杂性和描述IFs在训练动态过程中的变化。对真实数据的数值实验证实了我们的理论结果并证明了我们的发现。 摘要:Understanding the black-box prediction for neural networks is challenging. To achieve this, early studies have designed influence function (IF) to measure the effect of removing a single training point on neural networks. However, the classic implicit Hessian-vector product (IHVP) method for calculating IF is fragile, and theoretical analysis of IF in the context of neural networks is still lacking. To this end, we utilize the neural tangent kernel (NTK) theory to calculate IF for the neural network trained with regularized mean-square loss, and prove that the approximation error can be arbitrarily small when the width is sufficiently large for two-layer ReLU networks. We analyze the error bound for the classic IHVP method in the over-parameterized regime to understand when and why it fails or not. In detail, our theoretical analysis reveals that (1) the accuracy of IHVP depends on the regularization term, and is pretty low under weak regularization; (2) the accuracy of IHVP has a significant correlation with the probability density of corresponding training points. We further borrow the theory from NTK to understand the IFs better, including quantifying the complexity for influential samples and depicting the variation of IFs during the training dynamics. Numerical experiments on real-world data confirm our theoretical results and demonstrate our findings.
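为直观起见,下面给出在参数量极小的模型上直接计算影响函数的示意代码(论文分析的 IHVP 近似面向大网络,此处用显式 Hessian 求解代替,仅供理解概念);theta 需为 requires_grad=True 的一维参数张量,damping 为假设的数值正则项,三个损失函数由使用者提供。

```python
import torch
from torch.autograd.functional import hessian

def influence_sketch(theta, total_loss_fn, point_loss_fn, test_loss_fn, damping=1e-2):
    # IF(z, z_test) = -grad L(z_test)^T (H + damping*I)^{-1} grad L(z)
    g_z = torch.autograd.grad(point_loss_fn(theta), theta)[0]     # 被移除训练点的梯度
    g_test = torch.autograd.grad(test_loss_fn(theta), theta)[0]   # 测试点的梯度
    H = hessian(total_loss_fn, theta)                             # 全体训练损失的显式 Hessian
    ihvp = torch.linalg.solve(H + damping * torch.eye(theta.numel(), dtype=H.dtype), g_z)
    return -torch.dot(g_test, ihvp)
```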
【4】 Prescriptive Machine Learning for Automated Decision Making: Challenges and Opportunities 标题:用于自动决策的说明性机器学习:挑战和机遇 链接:https://arxiv.org/abs/2112.08268
作者:Eyke Hüllermeier 摘要:机器学习(ML)的最新应用表明,从主要用于预测(基本事实)的模型的数据驱动构建的意义上来说,机器学习(ML)在预测建模中的使用发生了明显的变化,它在规定建模中的使用。这意味着学习一个模型的任务,该模型规定了现实世界场景中正确行动方案的适当决策:应该应用哪种药物治疗?这个人应该被雇用来做这项工作吗?如本文所述,规范性建模带来了新的学习技术条件,以及关于可靠性、责任和决策道德的新要求。因此,为了支持决策代理的数据驱动设计,以理性的方式,同时以负责任的方式,需要严格的方法学基础的规定ML。这篇短文的目的是阐述规定性ML的具体特征,并强调它所暗示的一些关键挑战。此外,通过与当代人工智能研究的其他分支的联系,提倡将规定性ML建立在(广义)决策理论框架中。 摘要:Recent applications of machine learning (ML) reveal a noticeable shift from its use for predictive modeling in the sense of a data-driven construction of models mainly used for the purpose of prediction (of ground-truth facts) to its use for prescriptive modeling. What is meant by this is the task of learning a model that stipulates appropriate decisions about the right course of action in real-world scenarios: Which medical therapy should be applied? Should this person be hired for the job? As argued in this article, prescriptive modeling comes with new technical conditions for learning and new demands regarding reliability, responsibility, and the ethics of decision making. Therefore, to support the data-driven design of decision-making agents that act in a rational but at the same time responsible manner, a rigorous methodological foundation of prescriptive ML is needed. The purpose of this short paper is to elaborate on specific characteristics of prescriptive ML and to highlight some key challenges it implies. Besides, drawing connections to other branches of contemporary AI research, the grounding of prescriptive ML in a (generalized) decision-theoretic framework is advocated.
【5】 Online Feature Selection for Efficient Learning in Networked Systems 标题:网络系统中高效学习的在线特征选择 链接:https://arxiv.org/abs/2112.08253
作者:Xiaoxuan Wang,Rolf Stadler 备注:arXiv admin note: substantial text overlap with arXiv:2010.14907 摘要:当前用于数据驱动工程的AI/ML方法使用的模型大多是离线训练的。就通信和计算成本而言,此类模型的构建成本可能很高,而且它们依赖于长时间收集的数据。此外,当系统发生变化时,它们就过时了。为了应对这些挑战,我们研究了自动减少模型训练可用数据源数量的在线学习技术。我们提出了一种称为在线稳定特征集算法(OSFS)的在线算法,该算法在接收少量测量数据后,从大量可用数据源中选择一个小的特征集。该算法由特征排序算法、特征集稳定性度量和搜索策略初始化。我们使用来自内部测试台和运行中的数据中心的跟踪对该算法进行了广泛的实验评估。我们发现,在所有调查的数据集上,OSFS将特征集的大小大幅度减少了1-3个数量级。最重要的是,我们发现在OSFS生成的特征集上训练的预测器的精度比在通过离线特征选择获得的特征集上训练预测器的精度要好一些。因此,OSFS被证明是一种有效的在线特征选择算法,并且对于用于特征选择的样本间隔具有鲁棒性。我们还发现,当模型基础数据中出现概念漂移时,可以通过重新计算特征集和重新训练预测模型来缓解其影响。 摘要:Current AI/ML methods for data-driven engineering use models that are mostly trained offline. Such models can be expensive to build in terms of communication and computing cost, and they rely on data that is collected over extended periods of time. Further, they become out-of-date when changes in the system occur. To address these challenges, we investigate online learning techniques that automatically reduce the number of available data sources for model training. We present an online algorithm called Online Stable Feature Set Algorithm (OSFS), which selects a small feature set from a large number of available data sources after receiving a small number of measurements. The algorithm is initialized with a feature ranking algorithm, a feature set stability metric, and a search policy. We perform an extensive experimental evaluation of this algorithm using traces from an in-house testbed and from a data center in operation. We find that OSFS achieves a massive reduction in the size of the feature set by 1-3 orders of magnitude on all investigated datasets. Most importantly, we find that the accuracy of a predictor trained on a OSFS-produced feature set is somewhat better than when the predictor is trained on a feature set obtained through offline feature selection. OSFS is thus shown to be effective as an online feature selection algorithm and robust regarding the sample interval used for feature selection. We also find that, when concept drift in the data underlying the model occurs, its effect can be mitigated by recomputing the feature set and retraining the prediction model.
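OSFS 由特征排序算法、特征集稳定性度量与搜索策略三部分组成,具体实例化可有多种选择。下面给出一个假设性的最小示意(非论文原算法):以互信息做排序,以相邻两轮 top-k 特征集的 Jaccard 相似度作为稳定性度量,稳定后提前停止;stream 为按批到达的 (X, y) 测量数据,k、stability 等均为假设参数。

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def online_feature_selection(stream, k=10, stability=0.9, min_batches=3):
    X_seen, y_seen, prev = [], [], set()
    for batch_idx, (X_b, y_b) in enumerate(stream, 1):
        X_seen.append(X_b); y_seen.append(y_b)
        X, y = np.vstack(X_seen), np.concatenate(y_seen)
        scores = mutual_info_regression(X, y)          # 特征排序:互信息
        top = set(np.argsort(scores)[-k:])
        if batch_idx >= min_batches and prev and \
           len(top & prev) / len(top | prev) >= stability:
            return sorted(top)                         # 特征集已稳定,提前返回
        prev = top
    return sorted(prev)
```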
【6】 Towards Controllable Agent in MOBA Games with Generative Modeling 标题:基于产生式建模的MOBA博弈中可控Agent研究 链接:https://arxiv.org/abs/2112.08093
作者:Shubao Zhang 备注:Human-Compatible AI; Human-AI Cooperation; AI control; AI Alignment 摘要:我们提出了一种新的方法来开发行为可控的代理,该代理的行为类似于人类,并且能够在多人在线战场(MOBA)游戏中与人类玩家对齐。通过将控制问题建模为一个动作生成过程,我们设计了一个用于训练agent的深层潜在对齐神经网络模型,并设计了相应的采样算法来控制agent的动作。特别地,我们提出了核心潜在对齐模型的确定性和随机注意实现。在《国王荣誉》游戏中的模拟和在线实验都证明了所提方法的有效性。 摘要:We propose novel methods to develop action controllable agent that behaves like a human and has the ability to align with human players in Multiplayer Online Battle Arena (MOBA) games. By modeling the control problem as an action generation process, we devise a deep latent alignment neural network model for training agent, and a corresponding sampling algorithm for controlling an agent's action. Particularly, we propose deterministic and stochastic attention implementations of the core latent alignment model. Both simulated and online experiments in the game Honor of Kings demonstrate the efficacy of the proposed methods.
【7】 Multi-modal Networks Reveal Patterns of Operational Similarity of Terrorist Organizations 标题:多模态网络揭示恐怖组织行动相似性模式 链接:https://arxiv.org/abs/2112.07998
作者:Gian Maria Campedelli,Iain J. Cruickshank,Kathleen M. Carley 备注:None 摘要:捕捉恐怖集团之间行动相似性的动态对于为反恐和情报监测提供可操作的见解至关重要。然而,尽管这一问题具有理论和实践意义,但目前缺乏解决这一问题的研究。我们针对这个问题提出了一个新的计算框架,用于检测具有相似行为的恐怖集团集群,重点关注集团每年部署的战术、攻击目标和使用的武器。特别是考虑到那些在1997年至2018年期间策划了至少50起袭击的组织,这些组织共有105个团体负责全球42000多起事件,我们提供了三组结果。首先,我们表明,多年来,全球恐怖主义的特点是行动凝聚力不断增强。其次,我们强调,2009年至2018年,群体间联合聚类的年稳定性特别高,表明过去十年中相似模式的时间一致性。第三,我们证明了两个组织之间的运作相似性是由三个因素驱动的:(a)它们的总体活动;(b) 其业务曲目多样性的差异;(c) 多样性和活动的综合衡量的差异。群体的作战偏好、地理同质性和意识形态亲和力在决定作战相似性方面没有一致的作用。 摘要:Capturing dynamics of operational similarity among terrorist groups is critical to provide actionable insights for counter-terrorism and intelligence monitoring. Yet, in spite of its theoretical and practical relevance, research addressing this problem is currently lacking. We tackle this problem proposing a novel computational framework for detecting clusters of terrorist groups sharing similar behaviors, focusing on groups' yearly repertoire of deployed tactics, attacked targets, and utilized weapons. Specifically considering those organizations that have plotted at least 50 attacks from 1997 to 2018, accounting for a total of 105 groups responsible for more than 42,000 events worldwide, we offer three sets of results. First, we show that over the years global terrorism has been characterized by increasing operational cohesiveness. Second, we highlight that year-to-year stability in co-clustering among groups has been particularly high from 2009 to 2018, indicating temporal consistency of similarity patterns in the last decade. Third, we demonstrate that operational similarity between two organizations is driven by three factors: (a) their overall activity; (b) the difference in the diversity of their operational repertoires; (c) the difference in a combined measure of diversity and activity. Groups' operational preferences, geographical homophily and ideological affinity have no consistent role in determining operational similarity.
【8】 Fix your Models by Fixing your Datasets 标题:通过修复数据集来修复模型 链接:https://arxiv.org/abs/2112.07844
作者:Atindriyo Sanyal,Vikram Chatterji,Nidhi Vyas,Ben Epstein,Nikita Demir,Anthony Corletti 摘要:基础训练数据的质量对于建立具有更广泛泛化性的高性能机器学习模型至关重要。然而,当前的机器学习(ML)工具缺乏用于提高数据质量的简化流程。因此,获取数据质量洞察并迭代地修剪错误以获得最能代表下游用例的数据集仍然是一个特别的手动过程。我们的工作解决了这一数据工具缺口,这是纯粹通过以数据为中心的技术构建改进的ML工作流所必需的。更具体地说,我们引入了一个系统框架,用于(1)在数据集中发现噪声或标签错误的样本,(2)识别信息量最大的样本,当包含在训练中时,将提供最大的模型性能提升。我们在两个财富500强公司的公共和私营企业数据集上展示了我们的框架的有效性,并且相信这项工作将为ML团队执行更智能的数据发现和修剪奠定基础。 摘要:The quality of underlying training data is very crucial for building performant machine learning models with wider generalizabilty. However, current machine learning (ML) tools lack streamlined processes for improving the data quality. So, getting data quality insights and iteratively pruning the errors to obtain a dataset which is most representative of downstream use cases is still an ad-hoc manual process. Our work addresses this data tooling gap, required to build improved ML workflows purely through data-centric techniques. More specifically, we introduce a systematic framework for (1) finding noisy or mislabelled samples in the dataset and, (2) identifying the most informative samples, which when included in training would provide maximal model performance lift. We demonstrate the efficacy of our framework on public as well as private enterprise datasets of two Fortune 500 companies, and are confident this work will form the basis for ML teams to perform more intelligent data discovery and pruning.
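作为该框架第(1)步思路的一个常见实现方式(并非论文的具体算法),下面的示意代码用交叉验证下的预测概率找出"给定标签置信度极低"的样本作为噪声/错标候选;分类器与阈值均为假设选择,y 需为 0..K-1 的整数标签。

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def suspect_label_indices(X, y, threshold=0.1):
    # 交叉验证得到每个样本的类别概率,给定标签对应概率过低者视为可疑样本
    proba = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                              cv=5, method="predict_proba")
    conf_of_given_label = proba[np.arange(len(y)), y]
    return np.where(conf_of_given_label < threshold)[0]
```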
【9】 Analog/Mixed-Signal Circuit Synthesis Enabled by the Advancements of Circuit Architectures and Machine Learning Algorithms 标题:电路结构和机器学习算法的进步使模拟/混合信号电路综合成为可能 链接:https://arxiv.org/abs/2112.07824
作者:Shiyu Su,Qiaochu Zhang,Mohsen Hassanpourghadi,Juzheng Liu,Rezwan A Rasul,Mike Shuo-Wei Chen 备注:PREPRINT - accepted at IEEE/ACM Asia and South Pacific Design Automation Conference (ASP-DAC), 2022 摘要:由于技术的可扩展性和对更高灵活性/可重构性的需求,模拟混合信号(AMS)电路体系结构已朝着更数字友好的方向发展。同时,由于需要优化复杂AMS电路的电路尺寸、布局和验证,AMS电路的设计复杂性和成本大幅增加。另一方面,机器学习(ML)算法在过去十年中呈指数增长,并被电子设计自动化(EDA)社区积极开发。本文将确定这一趋势带来的机遇和挑战,并概述AMS电路体系结构和机器学习算法的最新发展所带来的几种新兴AMS设计方法。具体而言,我们将着重于使用基于神经网络的代理模型来加速电路设计参数搜索和布局迭代。最后,我们将演示几个AMS电路示例从规格到硅原型的快速合成,大大减少了人为干预。 摘要:Analog mixed-signal (AMS) circuit architecture has evolved towards more digital friendly due to technology scaling and demand for higher flexibility/reconfigurability. Meanwhile, the design complexity and cost of AMS circuits has substantially increased due to the necessity of optimizing the circuit sizing, layout, and verification of a complex AMS circuit. On the other hand, machine learning (ML) algorithms have been under exponential growth over the past decade and actively exploited by the electronic design automation (EDA) community. This paper will identify the opportunities and challenges brought about by this trend and overview several emerging AMS design methodologies that are enabled by the recent evolution of AMS circuit architectures and machine learning algorithms. Specifically, we will focus on using neural-network-based surrogate models to expedite the circuit design parameter search and layout iterations. Lastly, we will demonstrate the rapid synthesis of several AMS circuit examples from specification to silicon prototype, with significantly reduced human intervention.
【10】 Scatterbrained: A flexible and expandable pattern for decentralized machine learning 标题:分散式:一种灵活可扩展的分散式机器学习模式 链接:https://arxiv.org/abs/2112.07718
作者:Miller Wilt,Jordan K. Matelsky,Andrew S. Gearhart 备注:Code and documentation is available at this https URL 摘要:联邦机器学习是一种跨多个设备训练模型的技术,无需在设备之间交换数据。因为数据对于每个计算节点来说都是本地的,所以联邦学习非常适合于数据被仔细控制的领域中的用例,比如医学领域,或者带宽受限的领域。这种方法的一个缺点是,大多数联合学习工具依赖于一个中央服务器来执行工作负载委托并生成一个单一的共享模型。在这里,我们建议使用一个灵活的框架来分散联邦学习模式,并提供一个与PyTorch兼容的开源参考实现。 摘要:Federated machine learning is a technique for training a model across multiple devices without exchanging data between them. Because data remains local to each compute node, federated learning is well-suited for use-cases in fields where data is carefully controlled, such as medicine, or in domains with bandwidth constraints. One weakness of this approach is that most federated learning tools rely upon a central server to perform workload delegation and to produce a single shared model. Here, we suggest a flexible framework for decentralizing the federated learning pattern, and provide an open-source, reference implementation compatible with PyTorch.
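下面用 PyTorch 给出"去中心化参数平均"这一模式的最小示意(并非 Scatterbrained 框架的实际 API):每个节点在本地训练若干步后,仅与若干对等节点交换参数并取均值,从而绕开中央服务器。

```python
import torch

@torch.no_grad()
def gossip_average(local_model, peer_state_dicts):
    # 与对等节点的参数逐项取均值,实现一步去中心化聚合
    own = local_model.state_dict()
    for name, param in own.items():
        stacked = torch.stack([param.float()] +
                              [sd[name].float() for sd in peer_state_dicts])
        own[name] = stacked.mean(dim=0).to(param.dtype)
    local_model.load_state_dict(own)
```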
【11】 RA V-Net: Deep learning network for automated liver segmentation 标题:RAV-Net:用于肝脏自动分割的深度学习网络 链接:https://arxiv.org/abs/2112.08232
作者:Zhiqi Lee,Sumin Qi,Chongchong Fan,Ziwei Xie 摘要:肝脏的精确分割是疾病诊断的先决条件。自动分割是计算机辅助肝脏疾病检测和诊断的一个重要应用。近年来,医学图像的自动化处理取得了突破性进展。然而,腹部扫描CT图像的低对比度和肝脏形态的复杂性使得精确的自动分割具有挑战性。本文提出了一种改进的基于U-Net的医学图像自动分割模型RA V-Net。它有以下三个主要创新。提出了CofRes模块(Composite Original Feature Residual Module,复合原始特征残差模块),通过更复杂的卷积层和跳跃连接获得更高层次的图像特征提取能力,并防止梯度消失或爆炸。为了减少模型的计算量,提出了AR模块(注意恢复模块)。此外,通过调整信道和LSTM卷积来感测编码和解码模块的数据像素之间的空间特征。最后,有效地保留了图像特征。引入了CA模块(Channel Attention Module,通道注意力模块),该模块用于提取具有依赖关系的相关通道,并通过矩阵点积对其进行增强,同时削弱不具有依赖关系的无关通道。达到了通道注意的目的。LSTM卷积和CA模块提供的注意机制是神经网络性能的有力保证。U-Net网络的准确率(accuracy)为0.9862,精确率(precision)为0.9118,DSC为0.8547,JSC为0.82。RA V-Net的评估指标为:准确率0.9968,精确率0.9597,DSC 0.9654,JSC 0.9414。分割效果最具代表性的指标是DSC,它比U-Net提高了0.1107,JSC提高了0.1214。 摘要:Accurate segmentation of the liver is a prerequisite for the diagnosis of disease. Automated segmentation is an important application of computer-aided detection and diagnosis of liver disease. In recent years, automated processing of medical images has gained breakthroughs. However, the low contrast of abdominal scan CT images and the complexity of liver morphology make accurate automatic segmentation challenging. In this paper, we propose RA V-Net, which is an improved medical image automatic segmentation model based on U-Net. It has the following three main innovations. CofRes Module (Composite Original Feature Residual Module) is proposed. With more complex convolution layers and skip connections to make it obtain a higher level of image feature extraction capability and prevent gradient disappearance or explosion. AR Module (Attention Recovery Module) is proposed to reduce the computational effort of the model. In addition, the spatial features between the data pixels of the encoding and decoding modules are sensed by adjusting the channels and LSTM convolution. Finally, the image features are effectively retained. CA Module (Channel Attention Module) is introduced, which is used to extract relevant channels with dependencies and strengthen them by matrix dot product, while weakening irrelevant channels without dependencies. The purpose of channel attention is achieved. The attention mechanism provided by LSTM convolution and CA Module are strong guarantees for the performance of the neural network. The accuracy of U-Net network: 0.9862, precision: 0.9118, DSC: 0.8547, JSC: 0.82. The evaluation metrics of RA V-Net, accuracy: 0.9968, precision: 0.9597, DSC: 0.9654, JSC: 0.9414. The most representative metric for the segmentation effect is DSC, which improves 0.1107 over U-Net, and JSC improves 0.1214.
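下面给出一个通道注意力模块的示意实现(仅为对论文 CA 模块"以矩阵点积建模通道依赖并加权增强"这一思路的近似,并非官方代码);gamma 的初值等设定为假设。

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))   # 残差融合系数,从 0 开始学习

    def forward(self, x):                           # x: [B, C, H, W]
        b, c, h, w = x.shape
        flat = x.view(b, c, h * w)
        attn = torch.softmax(flat @ flat.transpose(1, 2), dim=-1)  # [B, C, C] 通道间依赖
        out = (attn @ flat).view(b, c, h, w)                       # 依赖强的通道被增强
        return self.gamma * out + x
```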
【12】 Building separable approximations for quantum states via neural networks 标题:用神经网络建立量子态的可分离近似 链接:https://arxiv.org/abs/2112.08055
作者:Antoine Girardin,Nicolas Brunner,Tamás Kriváchy 备注:11 pages, 5 figures We welcome any comments on references for Tables 1 and 2 摘要:找到离给定目标态最近的可分离态是一项众所周知的困难任务,甚至比确定一个态是纠缠态还是可分离态还要困难。为了解决这个问题,我们用神经网络对可分离状态进行参数化,并训练它使到给定目标状态的距离相对于可微距离(如轨迹距离或Hilbert-Schmidt距离)最小化。通过检查算法的输出,我们可以推断目标态是否纠缠,并构造其最近可分离态的近似。我们在各种已知的二部态上对该方法进行了基准测试,发现了非常好的一致性,甚至达到了$d=10$的局部维数。此外,考虑到不同的可分性概念,我们证明了我们的方法在多部分情况下是有效的。通过检查三方和四方GHZ和W态,我们恢复了已知的边界并获得了新的边界,例如三分性。最后,我们展示了如何使用神经网络的结果来获得分析洞察力。 摘要:Finding the closest separable state to a given target state is a notoriously difficult task, even more difficult than deciding whether a state is entangled or separable. To tackle this task, we parametrize separable states with a neural network and train it to minimize the distance to a given target state, with respect to a differentiable distance, such as the trace distance or Hilbert-Schmidt distance. By examining the output of the algorithm, we can deduce whether the target state is entangled or not, and construct an approximation for its closest separable state. We benchmark the method on a variety of well-known classes of bipartite states and find excellent agreement, even up to local dimension of $d=10$. Moreover, we show our method to be efficient in the multipartite case, considering different notions of separability. Examining three and four-party GHZ and W states we recover known bounds and obtain novel ones, for instance for triseparability. Finally, we show how to use the neural network's results to gain analytic insight.
【13】 Domain-informed neural networks for interaction localization within astroparticle experiments 标题:用于星粒实验中相互作用定位的域信息神经网络 链接:https://arxiv.org/abs/2112.07995
作者:Shixiao Liang,Aaron Higuera,Christina Peters,Venkat Roy,Waheed U. Bajwa,Hagit Shatkay,Christopher D. Tunnell 备注:Submitted to journal. Data and simulation referenced in paper 摘要:本文以暗物质研究中的时间投影室(TPC)粒子相互作用定位技术为例,提出了一种用于实验粒子物理的域信息神经网络结构。TPC内产生的信号的一个关键特征是,它们允许通过一个称为重建的过程定位粒子相互作用。虽然多层感知器(MLP)已成为TPC重建的主要竞争者,但这种黑盒方法并不反映对潜在科学过程的先验知识。本文重新审视了基于神经网络的交互定位,并根据信号特征和检测器几何结构将先验检测器知识编码到多层神经网络的特征编码层和输出层。由此产生的域信息神经网络(DiNN)限制初始特征编码层中神经元的感受野,以适应TPC内产生的信号在空间上局部化的性质。DiNN的这一特性与新兴的图神经网络领域相似:初始层中的神经元仅与后续层中的少数神经元相连,与MLP相比显著减少了网络中的参数数量。此外,为了考虑探测器的几何结构,使用两个几何变换修改网络的输出层,以确保DiNN在探测器内部产生定位。最终的结果是神经网络体系结构的参数比MLP少60%,但仍然实现了类似的定位性能,并为未来的体系结构开发提供了一条路径,因为它们能够将额外的领域知识编码到体系结构中,从而提高了性能。 摘要:This work proposes a domain-informed neural network architecture for experimental particle physics, using particle interaction localization with the time-projection chamber (TPC) technology for dark matter research as an example application. A key feature of the signals generated within the TPC is that they allow localization of particle interactions through a process called reconstruction. While multilayer perceptrons (MLPs) have emerged as a leading contender for reconstruction in TPCs, such a black-box approach does not reflect prior knowledge of the underlying scientific processes. This paper looks anew at neural network-based interaction localization and encodes prior detector knowledge, in terms of both signal characteristics and detector geometry, into the feature encoding and the output layers of a multilayer neural network. The resulting Domain-informed Neural Network (DiNN) limits the receptive fields of the neurons in the initial feature encoding layers in order to account for the spatially localized nature of the signals produced within the TPC. This aspect of the DiNN, which has similarities with the emerging area of graph neural networks in that the neurons in the initial layers only connect to a handful of neurons in their succeeding layer, significantly reduces the number of parameters in the network in comparison to an MLP. In addition, in order to account for the detector geometry, the output layers of the network are modified using two geometric transformations to ensure the DiNN produces localizations within the interior of the detector. The end result is a neural network architecture that has 60% fewer parameters than an MLP, but that still achieves similar localization performance and provides a path to future architectural developments with improved performance because of their ability to encode additional domain knowledge into the architecture.
【14】 Machine learning a manifold 标题:机器学习流形 链接:https://arxiv.org/abs/2112.07673
作者:Sean Craven,Djuna Croon,Daniel Cutting,Rachel Houtz 备注:7 pages, 2 figures 摘要:我们提出了一种通过人工神经网络回归识别数据集中连续李代数对称性的简单方法。我们的方案利用了输出变量在输入变量的无穷小对称变换下的 $\mathcal{O}(\epsilon^2)$ 缩放行为。由于对称变换是在训练后生成的,因此该方法不依赖于对完整表示空间的采样或数据集的分块,并且将错误识别的可能性降至最低。我们在SU(3)对称的(非)线性 $\Sigma$ 模型中演示了我们的方法。 摘要:We propose a simple method to identify a continuous Lie algebra symmetry in a dataset through regression by an artificial neural network. Our proposal takes advantage of the $\mathcal{O}(\epsilon^2)$ scaling of the output variable under infinitesimal symmetry transformations on the input variables. As symmetry transformations are generated post-training, the methodology does not rely on sampling of the full representation space or binning of the dataset, and the possibility of false identification is minimised. We demonstrate our method in the SU(3)-symmetric (non-)linear $\Sigma$ model.
其他(9篇)
【1】 Textless Speech-to-Speech Translation on Real Data 标题:基于真实数据的无文本语音到语音翻译 链接:https://arxiv.org/abs/2112.08352
作者:Ann Lee,Hongyu Gong,Paul-Ambroise Duquenne,Holger Schwenk,Peng-Jen Chen,Changhan Wang,Sravya Popuri,Juan Pino,Jiatao Gu,Wei-Ning Hsu 摘要:我们提出了一个无文本语音到语音翻译(S2ST)系统,该系统可以将语音从一种语言翻译为另一种语言,并且不需要任何文本数据。与现有文献中的工作不同,我们解决了多说话人目标语音建模的挑战,并使用真实世界的S2ST数据对系统进行训练。我们的方法的关键是一种基于自我监督单元的语音规范化技术,它使用来自多个说话人和一个参考说话人的成对音频对预先训练的语音编码器进行微调,以减少由于口音引起的变化,同时保留词汇内容。在语音标准化的配对数据只有10分钟的情况下,与在非标准化语音目标上训练的基线相比,在vp S2ST数据集上训练S2ST模型时,我们平均获得3.2 BLEU增益。我们还加入了自动挖掘的S2ST数据,并显示了额外的2.0 BLEU增益。据我们所知,我们是第一个建立无文本S2ST技术的人,该技术可以使用真实世界的数据进行训练,并适用于多种语言对。 摘要:We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another language and can be built without the need of any text data. Different from existing work in the literature, we tackle the challenge in modeling multi-speaker target speech and train the systems with real-world S2ST data. The key to our approach is a self-supervised unit-based speech normalization technique, which finetunes a pre-trained speech encoder with paired audios from multiple speakers and a single reference speaker to reduce the variations due to accents, while preserving the lexical content. With only 10 minutes of paired data for speech normalization, we obtain on average 3.2 BLEU gain when training the S2ST model on the vp S2ST dataset, compared to a baseline trained on un-normalized speech target. We also incorporate automatically mined S2ST data and show an additional 2.0 BLEU gain. To our knowledge, we are the first to establish a textless S2ST technique that can be trained with real-world data and works for multiple language pairs.
【2】 Guaranteed Contraction Control in the Presence of Imperfectly Learned Dynamics 标题:存在不完全学习动力学时的保证收缩控制 链接:https://arxiv.org/abs/2112.08222
作者:Pan Zhao,Ziyao Guo,Yikun Cheng,Aditya Gahlawat,Naira Hovakimyan 备注:Shorter version submitted to L4DC 2022. 22 pages, 8 figures 摘要:针对具有匹配不确定性的非线性系统,提出了一种基于收缩度量和扰动估计的轨迹中心学习控制方法。该方法允许使用广泛的模型学习工具,包括深度神经网络来学习不确定动态,同时在整个学习阶段(包括无学习的特殊情况)提供瞬态跟踪性能的保证。在所提出的方法中,提出了一种干扰估计律来估计不确定性的逐点值,具有预计算的估计误差界(EEB)。然后将学习的动力学、估计的干扰和EEB合并到鲁棒黎曼能量条件中,以计算控制律,该控制律确保在整个学习阶段,即使学习的模型很差,实际轨迹也能指数收敛到期望轨迹。另一方面,随着精确度的提高,学习的模型可以整合到高级规划器中,以规划性能更好的轨迹,例如,更低的能耗和更短的旅行时间。通过一个平面四旋翼导航实例验证了该框架的有效性。 摘要:This paper presents an approach for trajectory-centric learning control based on contraction metrics and disturbance estimation for nonlinear systems subject to matched uncertainties. The approach allows for the use of a broad class of model learning tools including deep neural networks to learn uncertain dynamics while still providing guarantees of transient tracking performance throughout the learning phase, including the special case of no learning. Within the proposed approach, a disturbance estimation law is proposed to estimate the pointwise value of the uncertainty, with pre-computable estimation error bounds (EEBs). The learned dynamics, the estimated disturbances, and the EEBs are then incorporated in a robust Riemannian energy condition to compute the control law that guarantees exponential convergence of actual trajectories to desired ones throughout the learning phase, even when the learned model is poor. On the other hand, with improved accuracy, the learned model can be incorporated in a high-level planner to plan better trajectories with improved performance, e.g., lower energy consumption and shorter travel time. The proposed framework is validated on a planar quadrotor navigation example.
【3】 Funnels: Exact maximum likelihood with dimensionality reduction 标题:漏斗:降维后的精确最大似然 链接:https://arxiv.org/abs/2112.08069
作者:Samuel Klein,John A. Raine,Sebastian Pina-Otey,Slava Voloshynovskiy,Tobias Golling 备注:16 pages, 5 figures, 8 tables 摘要:规范化流是微分同胚的,通常是维保持的,使用模型的似然性训练的模型。我们使用SurVAE框架通过一个称为漏斗的新层构造降维满射流。我们在各种数据集上证明了它的有效性,并表明它改进或匹配了现有流的性能,同时减少了潜在空间大小。漏斗层可以从广泛的变换中构造,包括限制卷积和前馈层。 摘要:Normalizing flows are diffeomorphic, typically dimension-preserving, models trained using the likelihood of the model. We use the SurVAE framework to construct dimension reducing surjective flows via a new layer, known as the funnel. We demonstrate its efficacy on a variety of datasets, and show it improves upon or matches the performance of existing flows while having a reduced latent space size. The funnel layer can be constructed from a wide range of transformations including restricted convolution and feed forward layers.
【4】 TAFA: Design Automation of Analog Mixed-Signal FIR Filters Using Time Approximation Architecture 标题:TAFA:基于时间近似结构的模拟混合信号FIR滤波器设计自动化 链接:https://arxiv.org/abs/2112.07825
作者:Shiyu Su,Qiaochu Zhang,Juzheng Liu,Mohsen Hassanpourghadi,Rezwan Rasul,Mike Shuo-Wei Chen 备注:PREPRINT - accepted at IEEE/ACM Asia and South Pacific Design Automation Conference (ASP-DAC), 2022 摘要:由于数字电路的成熟CAD支持,数字有限冲激响应(FIR)滤波器设计完全可综合。相反,模拟混合信号(AMS)滤波器设计主要是一个手动过程,包括架构选择、原理图设计和布局。这项工作提出了一种系统化的设计方法,使用时间近似结构自动设计AMS FIR滤波器,无需任何可调无源元件,如开关电容或电阻器。它不仅增强了滤波器的灵活性,而且简化了设计自动化,降低了模拟复杂度。提出的设计流程采用了一种混合近似方案,该方案根据时间量化效应自动优化滤波器的冲激响应,从而在环路中以最少的设计者努力实现了显著的性能改进。此外,基于人工神经网络(ANN)的布局感知回归模型与基于梯度的搜索算法相结合,用于自动化和加速滤波器设计。利用所提出的框架,我们演示了从规格到布局的65nm过程中AMS FIR滤波器的快速合成。 摘要:A digital finite impulse response (FIR) filter design is fully synthesizable, thanks to the mature CAD support of digital circuitry. On the contrary, analog mixed-signal (AMS) filter design is mostly a manual process, including architecture selection, schematic design, and layout. This work presents a systematic design methodology to automate AMS FIR filter design using a time approximation architecture without any tunable passive component, such as switched capacitor or resistor. It not only enhances the flexibility of the filter but also facilitates design automation with reduced analog complexity. The proposed design flow features a hybrid approximation scheme that automatically optimize the filter's impulse response in light of time quantization effects, which shows significant performance improvement with minimum designer's efforts in the loop. Additionally, a layout-aware regression model based on an artificial neural network (ANN), in combination with gradient-based search algorithm, is used to automate and expedite the filter design. With the proposed framework, we demonstrate rapid synthesis of AMS FIR filters in 65nm process from specification to layout.
【5】 Classifying Emails into Human vs Machine Category 标题:将电子邮件分类为人与机器类别 链接:https://arxiv.org/abs/2112.07742
作者:Changsung Kang,Hongwei Shang,Jean-Marc Langlois 备注:This paper is accepted by AAAI'22 摘要:区分个人和机器生成的电子邮件是Yahoo Mail的基本产品要求。Yahoo Mail中的旧产品分类器基于简单的逻辑回归模型。该模型是通过在SMTP地址级别聚合功能来训练的。我们建议在消息级别构建深度学习模型。我们构建并训练了四个独立的CNN模型:(1)以主题和内容为输入的内容模型;(2) 输入发件人电子邮件地址和姓名的发件人模型;(3) 通过分析电子邮件收件人的操作模式,并根据发件人的打开/删除行为相应地生成目标标签的操作模型;(4) 利用发送者的“显式称呼”信号作为正面标记的称呼模型。接下来,在探索上述四种模型的不同组合后,我们构建了最终的完整模型。编辑数据的实验结果表明,与旧的生产模型相比,我们的完整模型将调整后的召回率从70.5%提高到78.8%,同时将准确率从94.7%提高到96.0%。在这项任务中,我们的完整模型也大大优于最先进的BERT模型。此完整模型已部署到当前的生产系统(Yahoo Mail 6)。 摘要:It is an essential product requirement of Yahoo Mail to distinguish between personal and machine-generated emails. The old production classifier in Yahoo Mail was based on a simple logistic regression model. That model was trained by aggregating features at the SMTP address level. We propose building deep learning models at the message level. We built and trained four individual CNN models: (1) a content model with subject and content as input; (2) a sender model with sender email address and name as input; (3) an action model by analyzing email recipients' action patterns and correspondingly generating target labels based on senders' opening/deleting behaviors; (4) a salutation model by utilizing senders' "explicit salutation" signal as positive labels. Next, we built a final full model after exploring different combinations of the above four models. Experimental results on editorial data show that our full model improves the adjusted-recall from 70.5% to 78.8% compared to the old production model, while at the same time lifts the precision from 94.7% to 96.0%. Our full model also significantly beats the state-of-the-art Bert model at this task. This full model has been deployed into the current production system (Yahoo Mail 6).
【6】 Enhancing operations management through smart sensors: measuring and improving well-being, interaction and performance of logistics workers 标题:通过智能传感器加强运营管理:测量和改善物流工人的幸福感、互动和绩效 链接:https://arxiv.org/abs/2112.08213
作者:D. Aloini,A. Fronzetti Colladon,P. Gloor,E. Guerrazzi,A. Stefanini 备注:None 摘要:目的本研究旨在对意大利物流中心的物料搬运活动进行探索性调查。可穿戴传感器和其他智能工具用于在工作活动中收集人类和环境特征。这些因素与工人的表现和幸福感相关。设计/方法/方法人和环境因素在运营管理活动中起着重要作用,因为它们显著影响员工的绩效、幸福和安全。令人惊讶的是,关于这些方面对物流运营影响的实证研究仍然非常有限。为了填补这一空白,该研究实证性地探索了影响物流工人使用智能工具绩效的人和环境因素。研究结果表明,人的态度、互动、情绪和环境条件显著影响员工的绩效和幸福感,但根据每个员工的个人特征,表现出不同的关系。实践意义作者的研究为分析员工和采用个性化人力资源管理开辟了新途径,为管理者提供了一个能够检查和改善员工福利和绩效的操作系统。原创性/价值本研究的原创性来自于在工作活动中使用佩戴式传感器,通过实时记录个人、协作和环境数据,对人类和环境因素进行深入探索。据作者所知,本论文是第一次在现实物流运营中进行如此详细的分析。 摘要:Purpose The purpose of the research is to conduct an exploratory investigation of the material handling activities of an Italian logistics hub. Wearable sensors and other smart tools were used for collecting human and environmental features during working activities. These factors were correlated with workers' performance and well-being. Design/methodology/approach Human and environmental factors play an important role in operations management activities since they significantly influence employees' performance, well-being and safety. Surprisingly, empirical studies about the impact of such aspects on logistics operations are still very limited. Trying to fill this gap, the research empirically explores human and environmental factors affecting the performance of logistics workers exploiting smart tools. Findings Results suggest that human attitudes, interactions, emotions and environmental conditions remarkably influence workers' performance and well-being, however, showing different relationships depending on individual characteristics of each worker. Practical implications The authors' research opens up new avenues for profiling employees and adopting an individualized human resource management, providing managers with an operational system capable to potentially check and improve workers' well-being and performance. Originality/value The originality of the study comes from the in-depth exploration of human and environmental factors using body-worn sensors during work activities, by recording individual, collaborative and environmental data in real-time. To the best of the authors' knowledge, the current paper is the first time that such a detailed analysis has been carried out in real-world logistics operations.
【7】 Experimental quantum advantage with quantum coupon collector 标题:利用量子优惠券收集器的实验量子优势 链接:https://arxiv.org/abs/2112.07884
作者:Min-Gang Zhou,Xiao-Yu Cao,Yu-Shuo Lu,Yang Wang,Yu Bao,Zhao-Ying Jia,Yao Fu,Hua-Lei Yin,Zeng-Bing Chen 备注:11 pages, 6 figures, 4 tables 摘要:近年来,越来越多的具有量子优势的通信和计算方案被提出,这意味着量子技术具有广阔的应用前景。然而,由于制备高维态或高纠缠态的困难,在实验上证明这些方案仍然是一个中心挑战。在这项研究中,我们介绍并分析了一种采用相干态和简单线性光学元件的量子优惠券收集器协议,并用实际的实验设备成功地演示了该协议。我们表明,与优惠券收集器问题的经典极限相比,我们的协议可以显著减少学习特定集合所需的样本数量。我们还通过构造一个量子盲盒博弈讨论了量子优惠券收集器的潜在价值和扩展。所提出的博弈传递的信息也打破了经典极限。这些结果有力地证明了量子力学在机器学习和通信复杂性方面的优势。 摘要:An increasing number of communication and computational schemes with quantum advantages have recently been proposed, which implies that quantum technology has fertile application prospects. However, demonstrating these schemes experimentally continues to be a central challenge because of the difficulty in preparing high-dimensional states or highly entangled states. In this study, we introduce and analyse a quantum coupon collector protocol by employing coherent states and simple linear optical elements, which was successfully demonstrated using realistic experimental equipment. We showed that our protocol can significantly reduce the number of samples needed to learn a specific set compared with the classical limit of the coupon collector problem. We also discuss the potential values and expansions of the quantum coupon collector by constructing a quantum blind box game. The information transmitted by the proposed game also broke the classical limit. These results strongly prove the advantages of quantum mechanics in machine learning and communication complexity.
【8】 Communication-Efficient Distributed SGD with Compressed Sensing 标题:基于压缩感知的高效通信分布式SGD 链接:https://arxiv.org/abs/2112.07836
作者:Yujie Tang,Vikram Ramanathan,Junshan Zhang,Na Li 摘要:我们认为大规模分布式优化在一组边缘设备连接到中央服务器,其中有限的通信带宽之间的服务器和边缘设备施加了一个重要的瓶颈优化过程。受联邦学习最新进展的启发,我们提出了一种分布式随机梯度下降(SGD)型算法,在可能的情况下利用梯度的稀疏性来减少通信负担。该算法的核心是使用压缩感知技术对设备端的局部随机梯度进行压缩;在服务器端,从噪声聚集的压缩局部梯度恢复全局随机梯度的稀疏近似。我们对该算法在通信信道噪声干扰下的收敛性进行了理论分析,并通过数值实验验证了其有效性。 摘要:We consider large scale distributed optimization over a set of edge devices connected to a central server, where the limited communication bandwidth between the server and edge devices imposes a significant bottleneck for the optimization procedure. Inspired by recent advances in federated learning, we propose a distributed stochastic gradient descent (SGD) type algorithm that exploits the sparsity of the gradient, when possible, to reduce communication burden. At the heart of the algorithm is to use compressed sensing techniques for the compression of the local stochastic gradients at the device side; and at the server side, a sparse approximation of the global stochastic gradient is recovered from the noisy aggregated compressed local gradients. We conduct theoretical analysis on the convergence of our algorithm in the presence of noise perturbation incurred by the communication channels, and also conduct numerical experiments to corroborate its effectiveness.
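下面给出"压缩感知式梯度压缩/恢复"的示意代码(重构算法以 ISTA 为例,并非论文采用的具体方案):设备端用随机测量矩阵压缩近似稀疏的本地梯度,服务器端对聚合后的测量做稀疏恢复;lam、n_iter 等均为假设参数。

```python
import numpy as np

def compress(grad, A):
    # 设备端:A 为 m x d (m << d) 的随机高斯测量矩阵
    return A @ grad

def recover_sparse(y_agg, A, lam=0.01, n_iter=200):
    # 服务器端:ISTA 迭代软阈值,从聚合后的压缩测量恢复稀疏梯度近似
    g = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(n_iter):
        g = g - step * A.T @ (A @ g - y_agg)
        g = np.sign(g) * np.maximum(np.abs(g) - lam * step, 0.0)
    return g
```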
【9】 Variable Selection and Regularization via Arbitrary Rectangle-range Generalized Elastic Net 标题:基于任意矩形范围广义弹性网的变量选择与正则化 链接:https://arxiv.org/abs/2112.07785
作者:Yujia Ding,Qidi Peng,Zhengming Song,Hansen Chen 备注:25 pages, 2 figures 摘要:我们介绍了任意矩形范围的广义弹性网惩罚方法,简称ARGEN,用于在高维稀疏线性模型中执行约束变量选择和正则化。作为非负弹性净惩罚方法的自然推广,证明了ARGEN在一定条件下具有变量选择一致性和估计一致性。研究了ARGEN估计量分布的渐近性态。我们还提出了一种称为MU-QP-RR-W-$l_1$的算法来有效地解决ARGEN问题。通过模拟研究,我们发现ARGEN在许多情况下都优于弹性网络。最后,应用标准普尔500指数跟踪对股票配置进行约束,为调整ARGEN以解决现实问题提供一般指导。 摘要:We introduce the arbitrary rectangle-range generalized elastic net penalty method, abbreviated to ARGEN, for performing constrained variable selection and regularization in high-dimensional sparse linear models. As a natural extension of the nonnegative elastic net penalty method, ARGEN is proved to have variable selection consistency and estimation consistency under some conditions. The asymptotic behavior in distribution of the ARGEN estimators have been studied. We also propose an algorithm called MU-QP-RR-W-$l_1$ to efficiently solve ARGEN. By conducting simulation study we show that ARGEN outperforms the elastic net in a number of settings. Finally an application of S&P 500 index tracking with constraints on the stock allocations is performed to provide general guidance for adapting ARGEN to solve real-world problems.