cs.LG: 57 papers today
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (4 papers)
【1】 SPAN: Subgraph Prediction Attention Network for Dynamic Graphs Link: https://arxiv.org/abs/2108.07776
Authors: Yuan Li, Chuanchang Chen, Yubo Tao, Hai Lin Affiliation: State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China Note: Accepted by PRICAI 2021 Abstract: This paper proposes a novel model for predicting subgraphs in dynamic graphs, an extension of traditional link prediction. The proposed end-to-end model directly learns a mapping from the subgraph structure in the current snapshot to the subgraph structure in the next snapshot, i.e., edge existence among multiple nodes in the subgraph. A new mechanism named cross-attention with a twin-tower module is designed to integrate node attribute information and topology information collaboratively for learning subgraph evolution. We compare our model with several state-of-the-art methods for subgraph prediction and subgraph pattern prediction on multiple real-world homogeneous and heterogeneous dynamic graphs, respectively. Experimental results demonstrate that our model outperforms other models on these two tasks, with gains ranging from 5.02% to 10.88%.
【2】 How Powerful is Graph Convolution for Recommendation? Link: https://arxiv.org/abs/2108.07567
Authors: Yifei Shen, Yongji Wu, Yao Zhang, Caihua Shan, Jun Zhang, Khaled B. Letaief, Dongsheng Li Affiliations: HKUST, Duke University, Fudan University, Microsoft Research Asia Abstract: Graph convolutional networks (GCNs) have recently enabled a popular class of algorithms for collaborative filtering (CF). Nevertheless, the theoretical underpinnings of their empirical successes remain elusive. In this paper, we endeavor to obtain a better understanding of GCN-based CF methods through the lens of graph signal processing. By identifying the critical role of smoothness, a key concept in graph signal processing, we develop a unified graph convolution-based framework for CF. We prove that many existing CF methods are special cases of this framework, including neighborhood-based methods, low-rank matrix factorization, linear auto-encoders, and LightGCN, corresponding to different low-pass filters. Based on our framework, we then present a simple and computationally efficient CF baseline, which we refer to as Graph Filter based Collaborative Filtering (GF-CF). Given an implicit feedback matrix, GF-CF can be obtained in closed form instead of through expensive training with back-propagation. Experiments show that GF-CF achieves competitive or better performance than deep learning-based methods on three well-known datasets, notably with a 70% performance gain over LightGCN on the Amazon-book dataset.
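The abstract's key point, that a closed-form low-pass graph filter with no back-propagation can produce recommendation scores, can be sketched in a few lines. The normalized item-item gram filter below is an illustrative assumption for this sketch, not the paper's exact filter design:

```python
import numpy as np

def gf_cf_scores(R):
    """Score items by low-pass graph filtering of the implicit feedback
    matrix R (users x items). Hypothetical minimal variant: normalize R
    by user/item popularity and use the resulting item-item gram matrix
    as a smooth (low-pass) linear filter -- one closed-form product,
    no training."""
    du = np.maximum(R.sum(axis=1, keepdims=True), 1e-12)  # user degrees
    di = np.maximum(R.sum(axis=0, keepdims=True), 1e-12)  # item degrees
    R_norm = R / np.sqrt(du) / np.sqrt(di)
    P = R_norm.T @ R_norm          # item-item filter in closed form
    return R @ P                   # predicted preference scores

R = np.array([[1., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 1., 1.]])
scores = gf_cf_scores(R)
print(scores.shape)  # (3, 4)
```

Unseen items that co-occur with a user's history (e.g. item 2 for user 0) receive nonzero scores purely through the filter, which is the closed-form behavior the abstract describes.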
【3】 GCCAD: Graph Contrastive Coding for Anomaly Detection Link: https://arxiv.org/abs/2108.07516
Authors: Bo Chen, Jing Zhang, Xiaokang Zhang, Yuxiao Dong, Jian Song, Peng Zhang, Kaibo Xu, Evgeny Kharlamov, Jie Tang Note: 14 pages, under review Abstract: Graph-based anomaly detection has been widely used for detecting malicious activities in real-world applications. Existing attempts to address this problem have thus far focused on structural feature engineering or learning in the binary classification regime. In this work, we propose to leverage graph contrastive coding and present the supervised GCCAD model, which contrasts abnormal nodes with normal ones in terms of their distances to the global context (e.g., the average of all nodes). To handle scenarios with scarce labels, we further turn GCCAD into a self-supervised framework by designing a graph corruption strategy for generating synthetic node labels. To achieve the contrastive objective, we design a graph neural network encoder that can infer and remove suspicious links during message passing, as well as learn the global context of the input graph. We conduct extensive experiments on four public datasets, demonstrating that 1) GCCAD significantly and consistently outperforms various advanced baselines and 2) its self-supervised version without fine-tuning achieves performance comparable to its fully supervised version.
【4】 RRLFSOR: An Efficient Self-Supervised Learning Strategy of Graph Convolutional Networks Link: https://arxiv.org/abs/2108.07481
Authors: Feng Sun, Ajith Kumar V, Guanci Yang, Qikui Zhu, Yiyun Zhang, Ansi Zhang, Dhruv Makwana Affiliations: Experimental Teaching Center for Liberal Arts, Zhejiang Normal University, Jinhua, Zhejiang, China; School of AI, Bangalore, India Note: 27 pages Abstract: To further improve the performance and self-learning ability of GCNs, in this paper we propose an efficient self-supervised learning strategy for GCNs, named randomly removing links with a fixed step at one region (RRLFSOR). In addition, we propose another self-supervised learning strategy for GCNs, named randomly removing links with a fixed step at some blocks (RRLFSSB), to address the problem that adjacent nodes have no selected step. Experiments on transductive link prediction tasks show that our strategies consistently outperform the baseline models by up to 21.34% in terms of accuracy on three benchmark datasets.
GAN | adversarial | attacks | generation (1 paper)
【1】 When Should You Defend Your Classifier -- A Game-theoretical Analysis of Countermeasures against Adversarial Examples Link: https://arxiv.org/abs/2108.07602
Authors: Maximilian Samsinger, Florian Merkle, Pascal Schöttle, Tomas Pevny Affiliations: Management Center Innsbruck, Universitätsstr., Innsbruck, Austria; Department of Computers and Engineering, Czech Technical University in Prague Abstract: Adversarial machine learning, i.e., increasing the robustness of machine learning algorithms against so-called adversarial examples, is now an established field. Yet, newly proposed methods are evaluated and compared under unrealistic scenarios where costs for adversary and defender are not considered and either all samples are attacked or no sample is attacked. We scrutinize these assumptions and propose the advanced adversarial classification game, which incorporates all relevant parameters of an adversary and a defender in adversarial classification. In particular, we take into account economic factors on both sides and the fact that all countermeasures against adversarial examples proposed so far reduce accuracy on benign samples. Analyzing in detail the scenario where both players have two pure strategies, we identify all best responses and conclude that in practical settings, the most influential factor might be the maximum number of adversarial examples.
Semi-/weakly-/un-/fully-supervised | uncertainty | active learning (6 papers)
【1】 RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection Link: https://arxiv.org/abs/2108.07794
Authors: Yongming Rao, Benlin Liu, Yi Wei, Jiwen Lu, Cho-Jui Hsieh, Jie Zhou Affiliations: Tsinghua University, UCLA, University of Washington Note: Accepted to ICCV 2021 Abstract: 3D point cloud understanding has made great progress in recent years. However, one major bottleneck is the scarcity of annotated real datasets, especially compared to 2D object detection tasks, since a large amount of labor is involved in annotating real scans of a scene. A promising solution to this problem is to make better use of synthetic datasets, which consist of CAD object models, to boost learning on real datasets. This can be achieved by a pre-training and fine-tuning procedure. However, recent work on 3D pre-training exhibits failure when transferring features learned on synthetic objects to other real-world applications. In this work, we put forward a new method called RandomRooms to accomplish this objective. In particular, we propose to generate random layouts of a scene by making use of the objects in the synthetic CAD dataset and to learn the 3D scene representation by applying object-level contrastive learning on two random scenes generated from the same set of synthetic objects. A model pre-trained in this way serves as a better initialization when later fine-tuned on the 3D object detection task.
Empirically, we show consistent improvement in downstream 3D detection tasks on several base models, especially when less training data are used, which strongly demonstrates the effectiveness and generalization of our method. Benefiting from the rich semantic knowledge and diverse objects from synthetic data, our method establishes the new state-of-the-art on widely-used 3D detection benchmarks ScanNetV2 and SUN RGB-D. We expect our attempt to provide a new perspective for bridging object and scene-level 3D understanding.
【2】 ImitAL: Learning Active Learning Strategies from Synthetic Data Link: https://arxiv.org/abs/2108.07670
Authors: Julius Gonsior, Maik Thiele, Wolfgang Lehner Affiliation: Technische Universität Dresden, Dresden, Germany Abstract: One of the biggest challenges that complicates applied supervised machine learning is the need for huge amounts of labeled data. Active Learning (AL) is a well-known standard method for efficiently obtaining labeled data by first labeling the samples that contain the most information based on a query strategy. Although many query strategies have been proposed in the past, no clearly superior method that works well in general across all domains has been found yet. Additionally, many strategies are computationally expensive, which further hinders the widespread use of AL in large-scale annotation projects. We therefore propose ImitAL, a novel query strategy that encodes AL as a learning-to-rank problem. For training the underlying neural network we chose imitation learning; the demonstrative expert experience required for training is generated from purely synthetic data. To show the general and superior applicability of ImitAL, we perform an extensive evaluation comparing our strategy against 10 state-of-the-art query strategies on 15 different datasets from a wide range of domains. We also show that our approach has better runtime performance than most other strategies, especially on very large datasets.
【3】 MVCNet: Multiview Contrastive Network for Unsupervised Representation Learning for 3D CT Lesions Link: https://arxiv.org/abs/2108.07662
Authors: Penghua Zhai, Huaiwei Cong, Gangming Zhao, Chaowei Fang, Jinpeng Li, Ting Cai, and Huiguang He Affiliations: Center for Pattern Recognition and Intelligent Medicine, HwaMei Hospital, University of Chinese Academy of Sciences, Ningbo, China; Ningbo Institute of Life and Health Industry, University of Chinese Academy of Sciences Note: This 16-page manuscript has been submitted to Medical Image Analysis for possible publication Abstract: With the renaissance of deep learning, automatic diagnostic systems for computed tomography (CT) have achieved many successful applications. However, their success is mostly attributable to careful expert annotations, which are often scarce in practice. This drives our interest toward unsupervised representation learning. Recent studies have shown that self-supervised learning is an effective approach for learning representations, but most of them rely on the empirical design of transformations and pretext tasks. To avoid the subjectivity associated with these methods, we propose MVCNet, a novel unsupervised three-dimensional (3D) representation learning method that works in a transformation-free manner. We view each 3D lesion from different orientations to collect multiple two-dimensional (2D) views. Then, an embedding function is learned by minimizing a contrastive loss so that the 2D views of the same 3D lesion are aggregated and the 2D views of different lesions are separated.
We evaluate the representations by training a simple classification head upon the embedding layer. Experimental results show that MVCNet achieves state-of-the-art accuracies on the LIDC-IDRI (89.55%), LNDb (77.69%) and TianChi (79.96%) datasets for unsupervised representation learning. When fine-tuned on 10% of the labeled data, the accuracies are comparable to the supervised learning model (89.46% vs. 85.03%, 73.85% vs. 73.44%, 83.56% vs. 83.34% on the three datasets, respectively), indicating the superiority of MVCNet in learning representations with limited annotations. Code is released at: https://github.com/penghuazhai/MVCNet.
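The view-aggregating objective above can be sketched as a toy InfoNCE-style loss in which paired rows are 2D views of the same lesion; this is an illustrative stand-in for the paper's loss, not its exact form:

```python
import numpy as np

def multiview_contrastive_loss(z1, z2, tau=0.5):
    """Toy InfoNCE-style loss: z1[i] and z2[i] embed two 2D views of the
    same 3D lesion; rows with different indices embed different lesions.
    Matching views are pulled together, non-matching views pushed apart."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                       # pairwise cosine similarities
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))          # -log p(matching view)

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = multiview_contrastive_loss(z, z + 0.01 * rng.normal(size=(8, 16)))
shuffled = multiview_contrastive_loss(z, rng.permutation(z))
print(aligned < shuffled)  # aligned views should give the lower loss
```

The loss drops when each view's nearest neighbor among the second batch is its own lesion, which is exactly the aggregation/separation behavior the abstract describes.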
【4】 Investigating a Baseline Of Self Supervised Learning Towards Reducing Labeling Costs For Image Classification Link: https://arxiv.org/abs/2108.07464
Authors: Hilal AlQuabeh, Ameera Bawazeer, Abdulateef Alhashmi Affiliation: Department of Machine Learning, Mohamed Bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE Note: 10 pages Abstract: Data labeling for supervised learning is considered an expensive and sometimes infeasible tool. Self-supervised learning methods have been proposed to improve learning effectiveness with fewer labeled data; however, there is little confidence about how much labeled data is needed to achieve adequate results. This study aims to draw a baseline for the proportion of labeled data that models need to yield competent accuracy compared to training with additional labels. The study uses the kaggle.com cats-vs-dogs dataset, MNIST, and Fashion-MNIST to investigate a self-supervised learning task based on random-rotation augmentation of the original datasets. To reveal the true effectiveness of the pretext process in self-supervised learning, each original dataset is divided into smaller batches, and learning is repeated on each batch with and without pretext pre-training. Results show that the pretext process in self-supervised learning improves accuracy by around 15% on the downstream classification task compared to plain supervised learning.
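The rotation pretext task described above can be sketched in a few lines: each image is rotated by a multiple of 90 degrees and the rotation index becomes the (free) label for pre-training. A minimal sketch:

```python
import numpy as np

def make_rotation_pretext(images):
    """Build a self-supervised pretext dataset: every image appears four
    times, rotated by 0/90/180/270 degrees, with the number of quarter
    turns as its label. (Minimal sketch of the random-rotation pretext
    used in the study; real pipelines would shuffle and batch this.)"""
    xs, ys = [], []
    for img in images:
        for k in range(4):                 # k quarter-turns
            xs.append(np.rot90(img, k))
            ys.append(k)
    return np.stack(xs), np.array(ys)

imgs = np.arange(2 * 28 * 28, dtype=np.float32).reshape(2, 28, 28)
x, y = make_rotation_pretext(imgs)
print(x.shape, y.shape)  # (8, 28, 28) (8,)
```

A classifier trained to predict `y` from `x` learns visual features without any manual labels; its backbone is then fine-tuned on the downstream classification task.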
【5】 Incorporating Uncertainty in Learning to Defer Algorithms for Safe Computer-Aided Diagnosis Link: https://arxiv.org/abs/2108.07392
Authors: Jessie Liu, Blanca Gallego, Sebastiano Barbieri Affiliation: Centre for Big Data Research in Health, University of New South Wales Abstract: In this study we propose the Learning to Defer with Uncertainty (LDU) algorithm, an approach that considers the model's predictive uncertainty when identifying the patient group to be evaluated by human experts. Our aim is to ensure patient safety when ML models are deployed in healthcare settings.
【6】 Weakly Supervised Classification Using Group-Level Labels Link: https://arxiv.org/abs/2108.07330
Authors: Guruprasad Nayak, Rahul Ghosh, Xiaowei Jia, Vipin Kumar Affiliations: University of Minnesota, Minneapolis, MN, USA; University of Pittsburgh, Pittsburgh, PA, USA Note: Presented at the DeMaL workshop, KDD'21 Abstract: In many applications, finding adequate labeled data to train predictive models is a major challenge. In this work, we propose methods that use group-level binary labels as weak supervision to train instance-level binary classification models. Aggregate labels are common in several domains where annotating at the group level may be cheaper, or may be the only way to provide annotated data without infringing on privacy. We model group-level labels as Class Conditional Noisy (CCN) labels for individual instances and use the noisy labels to regularize the predictions of a model trained on the strongly-labeled instances. Our experiments on a real-world application, land cover mapping, show the utility of the proposed method in leveraging group-level labels, both in the presence and absence of class imbalance.
Transfer | zero/few/one-shot | adaptation (3 papers)
【1】 Direct domain adaptation through reciprocal linear transformations Link: https://arxiv.org/abs/2108.07600
Authors: Tariq Alkhalifah, Oleg Ovcharenko Affiliation: Physical Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia Note: 18 pages, 10 figures Abstract: We propose a direct domain adaptation (DDA) approach to enrich the training of supervised neural networks on synthetic data with features from real-world data. The process involves a series of linear operations on the input features to the NN model, whether they come from the source or target domain, as follows: 1) a cross-correlation of the input data (i.e., images) with a randomly picked sample pixel (or pixels) from that domain, or with the mean of all such randomly picked sample pixels across all images; 2) a convolution of the resulting data with the mean of the autocorrelated input images from the other domain. In the training stage, as expected, the input images come from the source domain, and the mean of the auto-correlated images is evaluated from the target domain. In the inference/application stage, the input images come from the target domain, and the mean of the auto-correlated images is evaluated from the source domain. The proposed method only manipulates the data from the source and target domains and does not explicitly interfere with the training workflow or network architecture. An application that involves training a convolutional neural network on the MNIST dataset and testing the network on the MNIST-M dataset achieves 70% accuracy on the test data.
A principal component analysis (PCA), as well as t-SNE, shows that after the proposed direct transformations, the input features from the source and target domains share similar properties along the principal components, in contrast to the original MNIST and MNIST-M input features.
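The two linear operations enumerated in the abstract can be sketched concretely. The sketch below illustrates them on 1-D signals for brevity (the paper works on 2-D images), and is an assumption-laden simplification, not the authors' implementation:

```python
import numpy as np

def dda_transform(x, sample_pixel, other_domain_mean_autocorr):
    """Apply the two DDA-style operations to one 1-D input:
    1) cross-correlate the input with a single randomly picked sample
       pixel from its own domain -- with a one-element kernel this
       reduces to scaling the input;
    2) convolve the result with the mean auto-correlation of the data
       from the other domain (center-cropped back to the input length)."""
    step1 = x * sample_pixel
    full = np.convolve(step1, other_domain_mean_autocorr, mode="full")
    start = len(other_domain_mean_autocorr) // 2
    return full[start:start + len(x)]

rng = np.random.default_rng(1)
src = rng.normal(size=(5, 32))   # e.g. synthetic (source-domain) signals
tgt = rng.normal(size=(5, 32))   # e.g. real (target-domain) signals
mean_ac = np.mean([np.correlate(t, t, mode="full") for t in tgt], axis=0)
out = dda_transform(src[0], src[1][3], mean_ac)
print(out.shape)  # (32,)
```

Because both steps are linear, the mapping preserves the structure of the input while imprinting second-order statistics of the other domain, which is what lets the same transform be applied reciprocally at training and inference time.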
【2】 FARF: A Fair and Adaptive Random Forests Classifier Link: https://arxiv.org/abs/2108.07403
Authors: Wenbin Zhang, Albert Bifet, Xiangliang Zhang, Jeremy C. Weiss, Wolfgang Nejdl Affiliations: University of Maryland, Baltimore County, MD, USA; University of Waikato, Hamilton, New Zealand; Télécom Paris, Institut Polytechnique de Paris, Palaiseau, France; King Abdullah University of Science and Technology, Thuwal, Saudi Arabia Abstract: As Artificial Intelligence (AI) is used in more applications, the need to consider and mitigate biases in the learned models has followed. Most work on developing fair learning algorithms focuses on the offline setting. However, in many real-world applications data arrives in an online fashion and needs to be processed on the fly. Moreover, in practical applications there is a trade-off between accuracy and fairness that needs to be accounted for, yet current methods often have multiple hyperparameters with non-trivial interactions required to achieve fairness. In this paper, we propose a flexible ensemble algorithm for fair decision-making in the more challenging context of evolving online settings. This algorithm, called FARF (Fair and Adaptive Random Forests), is based on using online component classifiers and updating them according to the current distribution while also accounting for fairness, with a single hyperparameter that alters the fairness-accuracy balance. Experiments on real-world discriminated data streams demonstrate the utility of FARF.
【3】 BOBCAT: Bilevel Optimization-Based Computerized Adaptive Testing Link: https://arxiv.org/abs/2108.07386
Authors: Aritra Ghosh, Andrew Lan Affiliation: University of Massachusetts Amherst Note: IJCAI 2021, with supplementary material Abstract: Computerized adaptive testing (CAT) refers to a form of testing that is personalized to every student/test taker. CAT methods adaptively select the next most informative question/item for each student given their responses to previous questions, effectively reducing test length. Existing CAT methods use item response theory (IRT) models to relate student ability to their responses, together with static question selection algorithms designed to reduce the ability estimation error as quickly as possible; as a result, these algorithms cannot improve by learning from large-scale student response data. In this paper, we propose BOBCAT, a Bilevel Optimization-Based framework for CAT, to directly learn a data-driven question selection algorithm from training data. BOBCAT is agnostic to the underlying student response model and is computationally efficient during the adaptive testing process. Through extensive experiments on five real-world student response datasets, we show that BOBCAT outperforms existing CAT methods (sometimes significantly) at reducing test length.
Reinforcement learning (3 papers)
【1】 Optimal Placement of Public Electric Vehicle Charging Stations Using Deep Reinforcement Learning Link: https://arxiv.org/abs/2108.07772
Authors: Aidan Petratos, Allen Ting, Shankar Padmanabhan, Kristina Zhou, Dylan Hageman, Jesse R. Pisel, Michael J. Pyrcz Affiliations: College of Natural Sciences, The University of Texas at Austin; Paul M. Rady School of Computer Science and Engineering, The University of Colorado at Boulder; Jackson School of Geosciences, The University of Texas at Austin Note: 25 pages with 12 figures Abstract: The placement of charging stations in areas with developing charging infrastructure is a critical component of the future success of electric vehicles (EVs). In Albany County, New York, the expected rise in the EV population requires additional charging stations to maintain a sufficient level of efficiency across the charging infrastructure. A novel application of Reinforcement Learning (RL) is able to find optimal locations for new charging stations given the predicted charging demand and current charging locations. The most important factors influencing charging demand prediction include the conterminous traffic density, EV registrations, and proximity to certain types of public buildings. The proposed RL framework can be refined and applied to cities around the world to optimize charging station placement.
【2】 Revisiting State Augmentation Methods for Reinforcement Learning with Stochastic Delays Link: https://arxiv.org/abs/2108.07555
Authors: Somjit Nath, Mayank Baranwal, Harshad Khadilkar Affiliations: TCS Research, Mumbai, India; IIT Bombay Note: Accepted at CIKM'21 Abstract: Several real-world scenarios, such as remote control and sensing, involve action and observation delays. The presence of delays degrades the performance of reinforcement learning (RL) algorithms, often to such an extent that the algorithms fail to learn anything substantial. This paper formally describes the notion of Markov Decision Processes (MDPs) with stochastic delays and shows that delayed MDPs can be transformed into equivalent standard MDPs (without delays) with a significantly simplified cost structure. We employ this equivalence to derive a model-free Delay-Resolved RL framework and show that even a simple RL algorithm built upon this framework achieves near-optimal rewards in environments with stochastic delays in actions and observations. The delay-resolved deep Q-network (DRDQN) algorithm is benchmarked on a variety of environments comprising multi-step and stochastic delays, and achieves better performance than currently established algorithms, both in terms of reaching near-optimal rewards and minimizing the computational overhead involved.
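The core state-augmentation idea, appending the buffer of not-yet-applied actions to the observation so that the delayed process becomes a standard MDP, can be sketched for a constant action delay (the paper's framework also covers stochastic delays and observation delays):

```python
from collections import deque

class ActionDelayWrapper:
    """Turn a constant action-delay process into a standard MDP by
    augmenting the observation with the buffer of pending actions.
    (Minimal constant-delay sketch with toy additive dynamics.)"""
    def __init__(self, step_fn, init_obs, delay, noop):
        self.step_fn = step_fn                 # dynamics: (obs, a) -> obs
        self.obs = init_obs
        self.pending = deque([noop] * delay)   # actions not yet applied

    def step(self, action):
        self.pending.append(action)            # newest action is queued
        applied = self.pending.popleft()       # oldest action hits the env
        self.obs = self.step_fn(self.obs, applied)
        # augmented state = current observation + pending-action buffer
        return (self.obs, tuple(self.pending))

env = ActionDelayWrapper(lambda s, a: s + a, init_obs=0, delay=2, noop=0)
print(env.step(5))  # (0, (0, 5)) -- action 5 is still in flight
print(env.step(3))  # (0, (5, 3))
print(env.step(1))  # (5, (3, 1)) -- action 5 finally reached the env
```

Because the augmented state carries everything the agent needs to predict when its actions will take effect, an off-the-shelf RL algorithm can be run on it unchanged, which is what makes the equivalence useful.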
【3】 Heterotic String Model Building with Monad Bundles and Reinforcement Learning Link: https://arxiv.org/abs/2108.07316
Authors: Andrei Constantin, Thomas R. Harvey, Andre Lukas Affiliation: Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Parks Road, Oxford OX1 3PU, UK Note: 35 pages, 9 figures; data set of models included as ancillary material in the submission Abstract: We use reinforcement learning as a means of constructing string compactifications with prescribed properties. Specifically, we study heterotic SO(10) GUT models on Calabi-Yau three-folds with monad bundles, in search of phenomenologically promising examples. Due to the vast number of bundles and the sparseness of viable choices, methods based on systematic scanning are not suitable for this class of models. By focusing on two specific manifolds with Picard numbers two and three, we show that reinforcement learning can be used successfully to explore monad bundles. Training can be accomplished with minimal computing resources and leads to highly efficient policy networks. They produce phenomenologically promising states in nearly 100% of episodes and within a small number of steps. In this way, hundreds of new candidate standard models are found.
Recommendation (2 papers)
【1】 Feature Recommendation for Structural Equation Model Discovery in Process Mining Link: https://arxiv.org/abs/2108.07795
Authors: Mahnaz Sadat Qafari, Wil van der Aalst Affiliation: Rheinisch-Westfälische Technische Hochschule Aachen (RWTH), Aachen, Germany Note: 28 pages, 16 figures Abstract: Process mining techniques can help organizations improve their operational processes. Organizations can benefit from process mining techniques in finding and amending the root causes of performance or compliance problems. Considering the volume of data and the number of features captured by the information systems of today's companies, the task of discovering the set of features that should be considered in root cause analysis can be quite involved. In this paper, we propose a method for finding the set of (aggregated) features with a possible effect on the problem. The root cause analysis task is usually done by applying a machine learning technique to data gathered from the information system supporting the processes. To avoid mixing up correlation and causation, which may happen when the findings of machine learning techniques are interpreted as causal, we propose a method for discovering the structural equation model of the process, which can then be used for root cause analysis. We have implemented the proposed method as a plugin in ProM and evaluated it using two real and synthetic event logs. These experiments show the validity and effectiveness of the proposed methods.
【2】 MOI-Mixer: Improving MLP-Mixer with Multi Order Interactions in Sequential Recommendation Link: https://arxiv.org/abs/2108.07505
Authors: Hojoon Lee, Dongyoon Hwang, Sunghwan Hong, Changyeon Kim, Seungryong Kim, Jaegul Choo Affiliations: KAIST AI, Korea University, KAKAO Note: 9 pages Abstract: Successful sequential recommendation systems rely on accurately capturing a user's short-term and long-term interests. Although Transformer-based models achieve state-of-the-art performance on the sequential recommendation task, they generally require memory and time quadratic in the sequence length, making it difficult to extract users' long-term interests. On the other hand, Multi-Layer Perceptron (MLP)-based models, renowned for their linear memory and time complexity, have recently shown results competitive with Transformers on various tasks. Given the availability of massive amounts of user behavior history, the linear memory and time complexity of MLP-based models make them a promising alternative to explore for sequential recommendation. To this end, we adopted MLP-based models for sequential recommendation but consistently observed that, despite their computational benefits, MLP-based methods obtain lower performance than Transformers. From experiments, we observed that introducing explicit high-order interactions to MLP layers mitigates this performance gap. In response, we propose the Multi-Order Interaction (MOI) layer, which can express an arbitrary order of interactions within the inputs while maintaining the memory and time complexity of the MLP layer.
By replacing the MLP layer with the MOI layer, our model was able to achieve comparable performance with Transformer-based models while retaining the MLP-based models' computational benefits.
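The abstract does not spell out the MOI layer's exact formulation; a minimal sketch under the assumption that the order-k interaction term is the elementwise product of k linear projections (so interactions of arbitrary order are expressed while memory and time stay linear in the layer width, like an MLP) might look like:

```python
import numpy as np

def moi_layer(x, weight_sets, bias_sets):
    """Hypothetical Multi-Order Interaction (MOI) layer sketch.

    The order-k term is built by multiplying in one more linear projection
    of the input per order, so interactions up to order len(weight_sets)
    are expressed with the same O(width) memory/time as an MLP layer.
    """
    out = np.zeros(weight_sets[0].shape[0])
    term = np.ones(weight_sets[0].shape[0])
    for W, b in zip(weight_sets, bias_sets):
        term = term * (W @ x + b)  # multiply in one more linear projection
        out += term                # accumulate the order-k interaction term
    return out
```

With a single weight set this reduces to an ordinary linear layer; the names and the multiplicative form are assumptions for illustration, not the paper's exact design.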
聚类(2篇)
【1】 Incremental cluster validity index-guided online learning for performance and robustness to presentation order 标题:增量聚类有效性指数引导的在线学习的性能和对呈现顺序的鲁棒性 链接:https://arxiv.org/abs/2108.07743
作者:Leonardo Enzo Brito da Silva,Nagasharath Rayapati,Donald C. Wunsch II 机构:Guise AI Inc., USA, Applied Computational Intelligence Laboratory, Missouri University of Science and Technology, USA 摘要:在流式数据应用中,传入的样本会被处理和丢弃,因此,智能决策对于终身学习系统的性能至关重要。此外,样本到达的顺序可能严重影响在线(和离线)增量学习者的表现。最近引入的增量集群有效性指数(iCVIs)为解决此类问题提供了有价值的帮助。它们的主要用例是集群质量监控;尽管如此,它们最近已被集成到流式聚类方法中,以协助聚类任务本身。在此背景下,本文介绍了第一个基于自适应共振理论(ART)的模型,该模型使用iCVIs进行无监督和半监督在线学习。此外,它首次展示了如何使用iCVIs通过基于iCVI的匹配跟踪机制来调节ART警惕性。该模型通过将在线iCVI框架集成为拓扑自适应共振理论预测映射(TopoARTMAP)的模块B,从而命名为iCVI TopoARTMAP,并在每个学习步骤结束时采用iCVI驱动的后处理启发式,从而提高了排序效果的准确性和鲁棒性。在线iCVI框架根据多个iCVI中的任何一个,在每次迭代中向集群分配输入样本。iCVI TopoARTMAP维护了ARTMAP模型共享的有用属性,例如稳定性、对灾难性遗忘的免疫力,以及通过map field模块的多对一映射功能。通过合成数据集和真实人脸图像数据集的深度嵌入实验,评估了iCVI TopoARTMAP的性能(无监督和半监督)和对呈现顺序(无监督)的鲁棒性。 摘要:In streaming data applications incoming samples are processed and discarded, therefore, intelligent decision-making is crucial for the performance of lifelong learning systems. In addition, the order in which samples arrive may heavily affect the performance of online (and offline) incremental learners. The recently introduced incremental cluster validity indices (iCVIs) provide valuable aid in addressing such class of problems. Their primary use-case has been cluster quality monitoring; nonetheless, they have been very recently integrated in a streaming clustering method to assist the clustering task itself. In this context, the work presented here introduces the first adaptive resonance theory (ART)-based model that uses iCVIs for unsupervised and semi-supervised online learning. Moreover, it shows for the first time how to use iCVIs to regulate ART vigilance via an iCVI-based match tracking mechanism. 
The model achieves improved accuracy and robustness to ordering effects by integrating an online iCVI framework as module B of a topological adaptive resonance theory predictive mapping (TopoARTMAP) -- thereby being named iCVI-TopoARTMAP -- and by employing iCVI-driven post-processing heuristics at the end of each learning step. The online iCVI framework provides assignments of input samples to clusters at each iteration in accordance with any of several iCVIs. The iCVI-TopoARTMAP maintains useful properties shared by ARTMAP models, such as stability, immunity to catastrophic forgetting, and the many-to-one mapping capability via the map field module. The performance (unsupervised and semi-supervised) and robustness to presentation order (unsupervised) of iCVI-TopoARTMAP were evaluated via experiments with a synthetic data set and deep embeddings of a real-world face image data set.
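The defining trait of an incremental cluster validity index is that its statistics are updated one sample at a time rather than recomputed over the stream. A minimal sketch of that idea (Welford-style incremental per-cluster means and within-cluster sums of squares, not the paper's specific iCVIs) is:

```python
import numpy as np

class IncrementalCompactness:
    """Sketch of an incremental cluster validity statistic: per-cluster
    counts, means, and within-cluster sums of squares are updated in O(d)
    per arriving sample, Welford style, so samples can be discarded after
    processing -- the key requirement in streaming settings."""

    def __init__(self):
        self.n, self.mean, self.wcss = {}, {}, {}

    def update(self, x, label):
        x = np.asarray(x, dtype=float)
        if label not in self.n:
            self.n[label], self.mean[label], self.wcss[label] = 0, np.zeros_like(x), 0.0
        self.n[label] += 1
        delta = x - self.mean[label]
        self.mean[label] += delta / self.n[label]
        self.wcss[label] += float(delta @ (x - self.mean[label]))

    def total_wcss(self):
        # total within-cluster sum of squares across all clusters seen so far
        return sum(self.wcss.values())
```

Indices such as an incremental Calinski-Harabasz can be read off from statistics of this kind; the class and method names here are illustrative.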
【2】 Learning to Cluster via Same-Cluster Queries 标题:通过同群查询学习集群 链接:https://arxiv.org/abs/2108.07383
作者:Yi Li,Yan Song,Qin Zhang 机构:Nanyang Technological University, Singapore, Indiana University Bloomington, Bloomington, IN, USA 摘要:我们研究利用能够回答同簇查询(same-cluster query)的oracle来学习数据点聚类的问题。与以前的方法不同,我们不假设在开始时集群的总数是已知的,也不要求真实集群与预定义的目标函数(如K-均值)一致。从实践的角度来看,这些放宽是至关重要的,同时也使问题更具挑战性。我们提出了两种具有可证明的理论保证的算法,并通过对合成数据和真实数据的大量实验验证了它们的有效性。 摘要:We study the problem of learning to cluster data points using an oracle which can answer same-cluster queries. Different from previous approaches, we do not assume that the total number of clusters is known at the beginning and do not require that the true clusters are consistent with a predefined objective function such as the K-means. These relaxations are critical from the practical perspective and, meanwhile, make the problem more challenging. We propose two algorithms with provable theoretical guarantees and verify their effectiveness via an extensive set of experiments on both synthetic and real-world data.
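The setting can be illustrated with a simple greedy baseline (not the paper's two algorithms): discover clusters on the fly by comparing each point against one representative per known cluster, so the number of clusters never needs to be known in advance.

```python
def cluster_with_oracle(points, same_cluster):
    """Greedy clustering with a same-cluster oracle (an illustrative
    baseline, not the paper's algorithms): each point is queried against
    one representative per discovered cluster; a 'no' on all of them
    opens a new cluster."""
    representatives, assignment = [], []
    for p in points:
        for cid, rep in enumerate(representatives):
            if same_cluster(p, rep):
                assignment.append(cid)
                break
        else:
            representatives.append(p)            # found a new cluster
            assignment.append(len(representatives) - 1)
    return assignment
```

With k true clusters this baseline uses at most k oracle queries per point; the paper's contribution is algorithms with provable guarantees in this query model.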
自动驾驶|车辆|车道检测等(1篇)
【1】 Understanding the factors driving the opioid epidemic using machine learning 标题:用机器学习理解阿片类药物流行的驱动因素 链接:https://arxiv.org/abs/2108.07301
作者:Sachin Gavali,Chuming Chen,Julie Cowart,Xi Peng,Shanshan Ding,Cathy Wu,Tammy Anderson 机构:University of Delaware, Newark, DE, USA 备注:Submitted to IEEE International Conference on Bioinformatics & Biomedicine 2021 摘要:近年来,美国经历了一场类阿片流行病,吸毒过量死亡人数前所未有。研究发现,此类过量死亡与邻里层面的特征有关,因此提供了确定有效干预措施的机会。通常,诸如普通最小二乘法(OLS)或最大似然估计法(MLE)等技术用于记录在解释此类不利结果方面具有重要意义的邻域级因素。然而,这些技术不太适合确定混杂因素之间的非线性关系。因此,在本研究中,我们应用基于机器学习的技术识别特拉华州社区的阿片类药物风险,并使用Shapley加法解释(SHAP)探讨这些因素之间的相关性。我们发现,与社区环境相关的因素,其次是教育,然后是犯罪,与较高的类阿片风险高度相关。我们还探讨了这些相关性多年来的变化,以了解疫情的变化动态。此外,我们发现,随着近年来疫情从合法(即处方类阿片)转向非法(如海洛因和芬太尼)药物,环境、犯罪和健康相关变量与类阿片风险的相关性显著增加,而经济和社会人口变量的相关性降低。教育相关因素的相关性从一开始就较高,近年来略有增加,表明需要提高对类阿片流行病的认识。 摘要:In recent years, the US has experienced an opioid epidemic with an unprecedented number of drug overdose deaths. Research finds such overdose deaths are linked to neighborhood-level traits, thus providing opportunity to identify effective interventions. Typically, techniques such as Ordinary Least Squares (OLS) or Maximum Likelihood Estimation (MLE) are used to document neighborhood-level factors significant in explaining such adverse outcomes. These techniques are, however, less equipped to ascertain non-linear relationships between confounding factors. Hence, in this study we apply machine learning based techniques to identify opioid risks of neighborhoods in Delaware and explore the correlation of these factors using Shapley Additive explanations (SHAP). We discovered that the factors related to neighborhoods' environment, followed by education and then crime, were highly correlated with higher opioid risk. We also explored the change in these correlations over the years to understand the changing dynamics of the epidemic.
Furthermore, we discovered that, as the epidemic has shifted from legal (i.e., prescription opioids) to illegal (e.g., heroin and fentanyl) drugs in recent years, the correlation of environment, crime, and health-related variables with the opioid risk has increased significantly, while the correlation of economic and socio-demographic variables has decreased. The correlation of education-related factors has been higher from the start and has increased slightly in recent years, suggesting a need for increased awareness about the opioid epidemic.
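SHAP attributes a model's prediction to features via Shapley values, which can be computed exactly by subset enumeration when the number of features is tiny (the SHAP library approximates this efficiently). A self-contained sketch of the underlying computation, with a hypothetical `model` callable standing in for the study's trained model:

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley feature attributions by subset enumeration (what SHAP
    approximates). `model` maps a feature vector to a risk score; features
    absent from a coalition are replaced by `baseline` values. Feasible
    only for a handful of features; illustrative, not the study's pipeline."""
    d = len(x)
    phi = [0.0] * d
    features = list(range(d))
    for i in features:
        rest = [j for j in features if j != i]
        for r in range(len(rest) + 1):
            for S in combinations(rest, r):
                # Shapley weight of a coalition of size |S|
                w = factorial(len(S)) * factorial(d - len(S) - 1) / factorial(d)
                with_i = [x[j] if (j in S or j == i) else baseline[j] for j in features]
                without_i = [x[j] if j in S else baseline[j] for j in features]
                phi[i] += w * (model(with_i) - model(without_i))
    return phi
```

For an additive model the attributions recover each feature's individual contribution, and the values always sum to `model(x) - model(baseline)` (the efficiency property).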
联邦学习|隐私保护|加密(3篇)
【1】 Federated Learning with Correlated Data: Taming the Tail for Age-Optimal Industrial IoT 标题:具有相关数据的联合学习:驯服年龄最优工业物联网的尾巴 链接:https://arxiv.org/abs/2108.07504
作者:Chen-Feng Liu,Mehdi Bennis 机构:Centre for Wireless Communications, University of Oulu, Finland 备注:Accepted in WiOpt 2021 with 6 pages, 5 figures, and 2 tables 摘要:虽然工业物联网中的信息传递需要可靠性和延迟保证,但控制器可用信息的新鲜度(以信息时代(AoI)衡量)对于高性能工业自动化至关重要。这项工作中的问题被归结为传感器在峰值AoI要求和排队延迟概率约束下的发射功率最小化。我们进一步通过广义帕累托分布(GPD)来描述延迟的尾部行为,通过李雅普诺夫优化来解决功率分配问题。由于每个传感器利用其自身的数据对GPD模型进行局部训练,我们结合了联邦学习,并提出了一种局部模型选择方法,该方法考虑了传感器训练数据之间的相关性。数值结果显示了发射功率、峰值AoI和延迟尾部分布之间的折衷。此外,我们还验证了所提出的相关感知方法在联邦学习中选择局部模型的优越性。 摘要:While information delivery in industrial Internet of things demands reliability and latency guarantees, the freshness of the controller's available information, measured by the age of information (AoI), is paramount for high-performing industrial automation. The problem in this work is cast as a sensor's transmit power minimization subject to the peak-AoI requirement and a probabilistic constraint on queuing latency. We further characterize the tail behavior of the latency by a generalized Pareto distribution (GPD) for solving the power allocation problem through Lyapunov optimization. As each sensor utilizes its own data to locally train the GPD model, we incorporate federated learning and propose a local-model selection approach which accounts for correlation among the sensor's training data. Numerical results show the tradeoff between the transmit power, peak AoI, and delay's tail distribution. Furthermore, we verify the superiority of the proposed correlation-aware approach for selecting the local models in federated learning over an existing baseline.
【2】 Aggregation Delayed Federated Learning 标题:聚合延迟联合学习 链接:https://arxiv.org/abs/2108.07433
作者:Ye Xue,Diego Klabjan,Yuan Luo 机构:Northwestern University, Evanston, IL, Chicago, IL 摘要:联合学习是一种分布式机器学习范式,其中多个数据所有者(客户机)协作训练一个机器学习模型,同时将数据保存在自己的设备上。客户端数据集的异构性是联邦学习算法最重要的挑战之一。研究发现,在非IID数据上使用标准联邦算法(如FedAvg)会降低性能。许多处理非IID数据的现有工作采用了与FedAvg相同的聚合框架,重点是改进服务器端或客户端的模型更新。在这项工作中,我们通过引入延迟聚合的重新分配轮,从不同的角度解决了这一挑战。我们在多个任务上进行了实验,结果表明该框架显著提高了非IID数据的性能。 摘要:Federated learning is a distributed machine learning paradigm where multiple data owners (clients) collaboratively train one machine learning model while keeping data on their own devices. The heterogeneity of client datasets is one of the most important challenges of federated learning algorithms. Studies have found performance reduction with standard federated algorithms, such as FedAvg, on non-IID data. Many existing works on handling non-IID data adopt the same aggregation framework as FedAvg and focus on improving model updates either on the server side or on clients. In this work, we tackle this challenge in a different view by introducing redistribution rounds that delay the aggregation. We perform experiments on multiple tasks and show that the proposed framework significantly improves the performance on non-IID data.
【3】 Fine-tuning is Fine in Federated Learning 标题:在联合学习中,微调是好的 链接:https://arxiv.org/abs/2108.07313
作者:Gary Cheng,Karan Chadha,John Duchi 备注:40 pages (10 main pages, 30 appendix pages), 13 figures 摘要:我们在渐近框架下研究了联邦学习算法及其变体的性能。我们的出发点是将联合学习表述为一个多标准目标,目标是使用来自所有客户机的信息最大限度地减少每个客户机的损失。我们提出了一个线性回归模型,其中,对于给定的客户,我们从理论上比较了各种算法在高维渐近极限下的性能。这种渐进多准则方法自然地模拟了联合学习的高维、多设备特性,并表明个性化是联合学习的核心。我们的理论表明,精细调整联邦平均(FTFA),即先进行联邦平均,然后进行局部训练,以及岭正则化变体岭调优联邦平均(RTFA),与更复杂的元学习和近端正则化方法相比具有竞争力。除了在概念上更简单外,FTFA和RTFA在计算上比其竞争对手更高效。我们在EMNIST、CIFAR-100、Shakespeare和堆栈溢出数据集的联合版本上进行了大量实验,证实了我们的理论主张。 摘要:We study the performance of federated learning algorithms and their variants in an asymptotic framework. Our starting point is the formulation of federated learning as a multi-criterion objective, where the goal is to minimize each client's loss using information from all of the clients. We propose a linear regression model, where, for a given client, we theoretically compare the performance of various algorithms in the high-dimensional asymptotic limit. This asymptotic multi-criterion approach naturally models the high-dimensional, many-device nature of federated learning and suggests that personalization is central to federated learning. Our theory suggests that Fine-tuned Federated Averaging (FTFA), i.e., Federated Averaging followed by local training, and the ridge regularized variant Ridge-tuned Federated Averaging (RTFA) are competitive with more sophisticated meta-learning and proximal-regularized approaches. In addition to being conceptually simpler, FTFA and RTFA are computationally more efficient than its competitors. We corroborate our theoretical claims with extensive experiments on federated versions of the EMNIST, CIFAR-100, Shakespeare, and Stack Overflow datasets.
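FTFA's two phases (FedAvg, then a few local gradient steps per client) can be sketched on 1-D linear regression; hyperparameters and the scalar setting are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

def ftfa(client_sets, fed_rounds=50, lr=0.1, finetune_steps=20, finetune_lr=0.05):
    """Sketch of Fine-tuned Federated Averaging (FTFA) on 1-D linear
    regression y = w * x: run FedAvg to a shared weight, then let each
    client take a few local gradient steps to personalize it."""
    w = 0.0
    for _ in range(fed_rounds):                      # FedAvg phase
        updates = []
        for X, y in client_sets:
            grad = 2 * np.mean(X * (w * X - y))
            updates.append(w - lr * grad)
        w = float(np.mean(updates))
    personalized = []
    for X, y in client_sets:                         # local fine-tuning phase
        wc = w
        for _ in range(finetune_steps):
            wc -= finetune_lr * 2 * np.mean(X * (wc * X - y))
        personalized.append(wc)
    return w, personalized
```

With clients generated by w=1 and w=3, the shared model lands between them while each fine-tuned model recovers its own client's weight, which is the personalization effect the paper analyzes.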
检测相关(1篇)
【1】 Neonatal Bowel Sound Detection Using Convolutional Neural Network and Laplace Hidden Semi-Markov Model 标题:基于卷积神经网络和拉普拉斯隐半马尔可夫模型的新生儿肠音检测 链接:https://arxiv.org/abs/2108.07467
作者:Chiranjibi Sitaula,Jinyuan He,Archana Priyadarshi,Mark Tracy,Omid Kavehei,Murray Hinder,Anusha Withana,Alistair McEwan,Faezeh Marzbanrad 机构: Hinder are with Department of Paediatricsand Child Health at The University of Sydney and Westmead HospitalO, McEwan are with School of Biomedical Engineering 备注:Under review in IEEE/ACM Transactions on Audio Speech and Language Processing journal 摘要:腹部听诊是评估肠道状况的一种方便、安全和廉价的方法,在新生儿护理中至关重要。它有助于早期发现新生儿肠道功能障碍,并允许及时干预。本文介绍了一种新生儿肠音检测方法,以协助听诊。具体而言,卷积神经网络(CNN)被提出用于分类蠕动和非蠕动声音。然后使用拉普拉斯隐半马尔可夫模型(HSMM)对分类进行优化。我们的第三级新生儿重症监护病房(NICU)收治的49名新生儿的腹部声音验证了所提出的方法。结果表明,该方法能有效地检测肠鸣音,准确率和曲线下面积(AUC)得分分别为89.81%和83.96%,优于13种基线方法。此外,提出的拉普拉斯HSMM细化策略被证明能够增强其他肠道声音检测模型。这项工作的成果有可能促进未来远程保健在新生儿护理中的应用。我们工作的源代码可在以下网址找到:https://bitbucket.org/chirudeakin/neonatal-bowel-sound-classification/ 摘要:Abdominal auscultation is a convenient, safe and inexpensive method to assess bowel conditions, which is essential in neonatal care. It helps early detection of neonatal bowel dysfunctions and allows timely intervention. This paper presents a neonatal bowel sound detection method to assist the auscultation. Specifically, a Convolutional Neural Network (CNN) is proposed to classify peristalsis and non-peristalsis sounds. The classification is then optimized using a Laplace Hidden Semi-Markov Model (HSMM). The proposed method is validated on abdominal sounds from 49 newborn infants admitted to our tertiary Neonatal Intensive Care Unit (NICU). The results show that the method can effectively detect bowel sounds with accuracy and area under curve (AUC) score being 89.81% and 83.96% respectively, outperforming 13 baseline methods. Furthermore, the proposed Laplace HSMM refinement strategy is proven capable to enhance other bowel sound detection models. The outcomes of this work have the potential to facilitate future telehealth applications for neonatal care. The source code of our work can be found at: https://bitbucket.org/chirudeakin/neonatal-bowel-sound-classification/
分类|识别(4篇)
【1】 KCNet: An Insect-Inspired Single-Hidden-Layer Neural Network with Randomized Binary Weights for Prediction and Classification Tasks 标题:KCNet:一种用于预测和分类任务的随机二进制权的昆虫启发单隐层神经网络 链接:https://arxiv.org/abs/2108.07554
作者:Jinyung Hong,Theodore P. Pavlic 机构:School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ , USA, School of Sustainability, School of Complex Adaptive Systems, School of Life Sciences 备注:27 pages, 46 figures, 3 tables 摘要:果蝇是研究嗅觉学习的模型系统,因为它们很容易学会将气味与电击或糖奖励联系起来。昆虫大脑显然负责气味学习的机制形成了一个相对较浅的神经元结构。嗅觉输入由大脑的触角叶(AL)接收,该叶产生一种跨越约50个亚单位的气味混合物编码,称为肾小球。然后,每个肾小球将其特征向量的组成部分投射到大脑蘑菇体(MB)区域内约2000个所谓的肯扬细胞(KC)中的几个。苍蝇对气味的反应是由下游的小神经末梢产生的,这些神经末梢对来自MB的高阶表示进行解码。研究表明,肾小球——KC连接(以及特定的高阶表示)中没有可识别的模式;它们类似于指纹——即使是同基因的果蝇也有不同的投射。利用这种结构的见解,我们提出了KCNet,这是一种单隐层神经网络,包含输入层和隐层之间的稀疏、随机、二进制权重,以及隐层和输出层之间的分析学习权重。此外,我们还提出了一种动态优化算法,使KCNet能够通过搜索更有效的输入集来提高性能,超越其结构限制。对于预测气味感知特性的气味感知任务,我们表明KCNet优于现有的数据驱动方法,如XGBoost。对于图像分类任务,KCNet在基准数据集(MNIST、Fashion MNIST和EMNIST)上实现了合理的性能,无需任何数据增强方法或卷积层,运行时间特别快。因此,受昆虫大脑启发的神经网络既经济又性能良好。 摘要:Fruit flies are established model systems for studying olfactory learning as they will readily learn to associate odors with both electric shock or sugar rewards. The mechanisms of the insect brain apparently responsible for odor learning form a relatively shallow neuronal architecture. Olfactory inputs are received by the antennal lobe (AL) of the brain, which produces an encoding of each odor mixture across ~50 sub-units known as glomeruli. Each of these glomeruli then project its component of this feature vector to several of ~2000 so-called Kenyon Cells (KCs) in a region of the brain known as the mushroom body (MB). Fly responses to odors are generated by small downstream neuropils that decode the higher-order representation from the MB. Research has shown that there is no recognizable pattern in the glomeruli--KC connections (and thus the particular higher-order representations); they are akin to fingerprints~-- even isogenic flies have different projections. 
Leveraging insights from this architecture, we propose KCNet, a single-hidden-layer neural network that contains sparse, randomized, binary weights between the input layer and the hidden layer and analytically learned weights between the hidden layer and the output layer. Furthermore, we also propose a dynamic optimization algorithm that enables the KCNet to increase performance beyond its structural limits by searching a more efficient set of inputs. For odorant-perception tasks that predict perceptual properties of an odorant, we show that KCNet outperforms existing data-driven approaches, such as XGBoost. For image-classification tasks, KCNet achieves reasonable performance on benchmark datasets (MNIST, Fashion-MNIST, and EMNIST) without any data-augmentation methods or convolutional layers and shows particularly fast running time. Thus, neural networks inspired by the insect brain can be both economical and perform well.
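The architecture described, sparse random binary input-to-hidden weights with an analytically solved readout, can be sketched as an extreme-learning-machine-style fit. The fan-in, threshold choice, and activation below are assumptions for illustration, not the paper's exact design:

```python
import numpy as np

def kcnet_fit(X, Y, hidden=100, fan_in=2, seed=0):
    """Sketch of a KCNet-style network: each hidden 'Kenyon cell' samples
    `fan_in` inputs with binary weight 1 (sparse, random, binary W), and
    the hidden-to-output weights are solved analytically by least squares."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = np.zeros((hidden, d))
    for h in range(hidden):
        W[h, rng.choice(d, size=min(fan_in, d), replace=False)] = 1.0
    proj = X @ W.T
    theta = proj.mean(axis=0)                  # per-unit firing threshold
    H = (proj > theta).astype(float)           # binary hidden code
    beta, *_ = np.linalg.lstsq(H, Y, rcond=None)   # analytic readout
    return W, theta, beta

def kcnet_predict(X, W, theta, beta):
    return ((X @ W.T > theta).astype(float)) @ beta
```

Because only the readout is learned, and in closed form, training is cheap; the paper's dynamic optimization of the input set is not sketched here.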
【2】 A Light-weight contextual spelling correction model for customizing transducer-based speech recognition systems 标题:一种用于定制基于换能器的语音识别系统的轻量级上下文拼写校正模型 链接:https://arxiv.org/abs/2108.07493
作者:Xiaoqiang Wang,Yanqing Liu,Sheng Zhao,Jinyu Li 机构:Microsoft, China, Microsoft, USA 备注:This paper has been accepted by Interspeech 2021 摘要:由于上下文信息是动态的且在模型训练期间不可用,定制基于转导器(transducer)的自动语音识别(ASR)系统具有挑战性。在这项工作中,我们引入了一个轻量级的上下文拼写纠正模型,来纠正基于转导器的ASR系统中与上下文相关的识别错误。我们使用共享上下文编码器将上下文信息合并到拼写纠正模型中,并使用过滤算法来处理大型上下文列表。实验表明,该模型提高了基线ASR模型的性能,相对字错误率降低了约50%,这也显著优于基线方法,如上下文LM偏置(contextual LM biasing)。该模型还显示了在训练过程中没有看到的词汇表外术语的优异性能。 摘要:It's challenging to customize transducer-based automatic speech recognition (ASR) systems with context information which is dynamic and unavailable during model training. In this work, we introduce a light-weight contextual spelling correction model to correct context-related recognition errors in transducer-based ASR systems. We incorporate the context information into the spelling correction model with a shared context encoder and use a filtering algorithm to handle large-size context lists. Experiments show that the model improves baseline ASR model performance with about 50% relative word error rate reduction, which also significantly outperforms the baseline method such as contextual LM biasing. The model also shows excellent performance for out-of-vocabulary terms not seen during training.
【3】 Identifying Biased Subgroups in Ranking and Classification 标题:在排序和分类中识别有偏子群 链接:https://arxiv.org/abs/2108.07450
作者:Eliana Pastor,Luca de Alfaro,Elena Baralis 备注:None 摘要:在分析机器学习算法的行为时,重要的是确定特定的数据子组,对于这些数据子组,所考虑的算法相对于整个数据集表现出不同的性能。通常需要领域专家的介入,以确定定义这些子组的相关属性。我们引入了差异的概念来衡量这种性能差异,并在(i)分类模型和(ii)排名应用程序的上下文中利用它来自动检测在行为上表现出显著偏差的数据子组。此外,我们通过Shapley值量化数据子组中所有属性对发散行为的贡献,从而识别影响最大的属性。 摘要:When analyzing the behavior of machine learning algorithms, it is important to identify specific data subgroups for which the considered algorithm shows different performance with respect to the entire dataset. The intervention of domain experts is normally required to identify relevant attributes that define these subgroups. We introduce the notion of divergence to measure this performance difference and we exploit it in the context of (i) classification models and (ii) ranking applications to automatically detect data subgroups showing a significant deviation in their behavior. Furthermore, we quantify the contribution of all attributes in the data subgroup to the divergent behavior by means of Shapley values, thus allowing the identification of the most impacting attributes.
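The divergence notion, a subgroup's performance minus the overall performance, can be sketched for single-attribute subgroups (the paper additionally considers itemsets and attributes Shapley-based contributions, which are omitted here):

```python
def subgroup_divergence(records, correct, min_support=2):
    """Sketch of divergence analysis: for every (attribute, value) subgroup
    with at least `min_support` records, report subgroup accuracy minus
    overall accuracy. `records` is a list of attribute dicts and `correct`
    a parallel list of 0/1 indicators of whether the model was right."""
    overall = sum(correct) / len(correct)
    divergences = {}
    for attr in records[0]:
        for value in {r[attr] for r in records}:
            idx = [i for i, r in enumerate(records) if r[attr] == value]
            if len(idx) >= min_support:
                acc = sum(correct[i] for i in idx) / len(idx)
                divergences[(attr, value)] = acc - overall
    return divergences
```

Subgroups with large negative divergence are the ones on which the model underperforms and that would otherwise require a domain expert to spot.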
【4】 Classification of Common Waveforms Including a Watchdog for Unknown Signals 标题:常见波形的分类,包括未知信号的看门狗 链接:https://arxiv.org/abs/2108.07339
作者:C. Tanner Fredieu,Justin Bui,Anthony Martone,Robert J. Marks II,Charles Baylis,R. Michael Buehrer 机构:Department of Electrical and Computer Engineering, Baylor University, Waco, TX, USA, US Army Research Laboratory, Adelphi, MD, USA 摘要:在本文中,我们研究了使用深层感知器模型架构将接收信号样本分类为四种常见波形之一,即单载波(SC)、单载波频分多址(SC-FDMA)、正交频分复用(OFDM)和线性调频(LFM),用于通信和雷达网络。不需要同步信号,因为我们假设存在未知且未补偿的时间和频率偏移。一个具有深度CNN结构的自动编码器也被检查,以创建一个未知波形类型的新的第五分类类别。这是通过从雷达和通信波形的均方根误差(RMSE)计算最小和最大阈值来实现的。分类器和自动编码器共同监控频谱区域,以识别操作区域内的常见波形,同时检测未知波形。测试结果表明,该分类器在0分贝以上有100%的分类率,在-10分贝和-5分贝时的准确率分别为83.2%和94.7%,且存在信号损伤。异常检测器的结果显示,当使用高值快速傅里叶变换(FFT)时,在0 dB时的准确度为85.3%,在信噪比大于0 dB时的准确度为100%,且存在信号损伤。随着信号中引入额外噪声,准确检测率下降,在-5 dB时为78.1%,在-10 dB时为56.5%。然而,我们的结果也显示,通过使用更高的FFT大小,可以潜在地缓解这些低速率。 摘要:In this paper, we examine the use of a deep multi-layer perceptron model architecture to classify received signal samples as coming from one of four common waveforms, Single Carrier (SC), Single-Carrier Frequency Division Multiple Access (SC-FDMA), Orthogonal Frequency Division Multiplexing (OFDM), and Linear Frequency Modulation (LFM), used in communication and radar networks. Synchronization of the signals is not needed as we assume there is an unknown and uncompensated time and frequency offset. An autoencoder with a deep CNN architecture is also examined to create a new fifth classification category of an unknown waveform type. This is accomplished by calculating a minimum and maximum threshold values from the root mean square error (RMSE) of the radar and communication waveforms. The classifier and autoencoder work together to monitor a spectrum area to identify the common waveforms inside the area of operation along with detecting unknown waveforms. Results from testing showed the classifier had 100% classification rate above 0 dB with accuracy of 83.2% and 94.7% at -10 dB and -5 dB, respectively, with signal impairments present. 
Results for the anomaly detector showed 85.3% accuracy at 0 dB, and 100% at SNRs greater than 0 dB, with signal impairments present, when using a high-value Fast Fourier Transform (FFT) size. Accurate detection rates decline as additional noise is introduced to the signals, with 78.1% at -5 dB and 56.5% at -10 dB. However, these lower rates can potentially be mitigated by using even larger FFT sizes, as our results also show.
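The unknown-waveform watchdog described above reduces to a simple rule: learn the min/max reconstruction-RMSE range of the autoencoder on known waveforms, then flag any sample whose RMSE falls outside that range. A minimal sketch (the autoencoder itself is not reproduced):

```python
import numpy as np

def fit_rmse_thresholds(reconstruction_errors):
    """Min/max RMSE thresholds from autoencoder errors on known waveforms,
    mirroring the paper's idea of a fifth 'unknown' class for samples whose
    reconstruction error falls outside the observed range (a simplification)."""
    return float(np.min(reconstruction_errors)), float(np.max(reconstruction_errors))

def classify_or_unknown(rmse, thresholds):
    lo, hi = thresholds
    return "known" if lo <= rmse <= hi else "unknown"
```

In the full system, a sample labeled "known" would then be passed to the four-way waveform classifier.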
优化|敛散性(2篇)
【1】 Synthesizing Pareto-Optimal Interpretations for Black-Box Models 标题:综合黑箱模型的帕累托最优解释 链接:https://arxiv.org/abs/2108.07307
作者:Hazem Torfah,Shetal Shah,Supratik Chakraborty,S. Akshay,Sanjit A. Seshia 机构:University of California at Berkeley, Indian Institute of Technology, Bombay 备注:Long version of conference paper accepted at FMCAD'21 摘要:我们提出了一种新的多目标优化方法,用于综合解释“解释”黑箱机器学习模型的行为。为黑箱模型构建人类可理解的解释通常需要平衡相互冲突的目标。对于人类来说,一个简单的解释可能更容易理解,而与复杂的解释相比,它的预测更不精确。现有的综合解释方法使用单一目标函数,并且通常针对单一类别的解释进行优化。相比之下,我们提供了一个更通用的多目标综合框架,允许用户选择(1)从中合成解释的语法模板类别,以及(2)解释正确性和解释性的定量度量。对于给定的黑盒,我们的方法产生了一组关于正确性和可解释性度量的帕累托最优解释。我们证明了基本的多目标优化问题可以通过简化为定量约束求解(如加权最大可满足性)来解决。为了证明我们的方法的优点,我们将其应用于黑盒神经网络分类器的综合解释。我们的实验表明,对于现有方法所遗漏的解释,往往存在着丰富多样的选择。 摘要:We present a new multi-objective optimization approach for synthesizing interpretations that "explain" the behavior of black-box machine learning models. Constructing human-understandable interpretations for black-box models often requires balancing conflicting objectives. A simple interpretation may be easier to understand for humans while being less precise in its predictions vis-a-vis a complex interpretation. Existing methods for synthesizing interpretations use a single objective function and are often optimized for a single class of interpretations. In contrast, we provide a more general and multi-objective synthesis framework that allows users to choose (1) the class of syntactic templates from which an interpretation should be synthesized, and (2) quantitative measures on both the correctness and explainability of an interpretation. For a given black-box, our approach yields a set of Pareto-optimal interpretations with respect to the correctness and explainability measures. We show that the underlying multi-objective optimization problem can be solved via a reduction to quantitative constraint solving, such as weighted maximum satisfiability. To demonstrate the benefits of our approach, we have applied it to synthesize interpretations for black-box neural-network classifiers. 
Our experiments show that there often exists a rich and varied set of choices for interpretations that are missed by existing approaches.
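The output of the synthesis procedure is a Pareto front over the correctness and explainability measures. Extracting that front from a candidate set (the reduction to weighted MaxSAT itself is not sketched here) is straightforward:

```python
def pareto_optimal(candidates):
    """Extract the Pareto front over (correctness, explainability) pairs,
    both to be maximized: a candidate is kept unless some other candidate
    is at least as good on both measures and strictly better on one."""
    front = []
    for i, (c_i, e_i) in enumerate(candidates):
        dominated = any(
            (c_j >= c_i and e_j >= e_i) and (c_j > c_i or e_j > e_i)
            for j, (c_j, e_j) in enumerate(candidates) if j != i
        )
        if not dominated:
            front.append((c_i, e_i))
    return front
```

The user then picks a point on this front, trading prediction precision against human-understandability.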
【2】 Stochastic optimization under time drift: iterate averaging, step decay, and high probability guarantees 标题:时间漂移下的随机优化:迭代平均、步长衰减和高概率保证 链接:https://arxiv.org/abs/2108.07356
作者:Joshua Cutler,Dmitriy Drusvyatskiy,Zaid Harchaoui 备注:57 pages, 6 figures, under review 摘要:我们考虑一个凸函数的最小化问题,它是根据未知的和可能的随机动力学在时间上发展的。在机器学习和信号处理文献中,以概念漂移和随机跟踪为名,大量存在这样的问题。我们为具有迭代平均的随机算法提供了新的非渐近收敛性保证,重点关注在期望值和高概率下的有效界。值得注意的是,我们证明了当配备有阶跃衰减计划时,近似随机梯度方法的跟踪效率仅对数依赖于初始化质量。此外,结果自然延伸到动态共同依赖于时间和决策变量本身的环境中,如在执行预测框架中。 摘要:We consider the problem of minimizing a convex function that is evolving in time according to unknown and possibly stochastic dynamics. Such problems abound in the machine learning and signal processing literature, under the names of concept drift and stochastic tracking. We provide novel non-asymptotic convergence guarantees for stochastic algorithms with iterate averaging, focusing on bounds valid both in expectation and with high probability. Notably, we show that the tracking efficiency of the proximal stochastic gradient method depends only logarithmically on the initialization quality, when equipped with a step-decay schedule. The results moreover naturally extend to settings where the dynamics depend jointly on time and on the decision variable itself, as in the performative prediction framework.
预测|估计(4篇)
【1】 Prediction of Students performance with Artificial Neural Network using Demographic Traits 标题:基于人口学特征的人工神经网络对学生成绩的预测 链接:https://arxiv.org/abs/2108.07717
作者:Adeniyi Jide Kehinde,Abidemi Emmanuel Adeniyi,Roseline Oluwaseun Ogundokun,Himanshu Gupta,Sanjay Misra 机构:Department of Computer Science, Landmark University Omu Aran, Nigeria, Birla Institute of Technology Pilani, Hyderabad, Department of Electrical and Information Engineering, Covenant University, Ota, Nigeria 备注:10 pages, 7 figures, 3 Tables, Fourth International Conference on Recent Innovations in Computing (IRCIC-2021) 摘要:许多研究人员使用多种数据挖掘技术研究了学生在有监督和无监督学习中的学习成绩。神经网络通常需要更多的观测数据来获得足够的预测能力。由于成绩不佳的毕业生比率不断上升,有必要设计一个系统来帮助减少这一问题,并降低学生因成绩不佳而留级或在学业中途辍学的发生率。因此,有必要研究每一种方法及其优缺点,以确定哪种方法更有效,以及在何种情况下应优先选择哪种方法。该研究旨在开发一个利用学生人口统计特征、基于人工神经网络预测学生成绩的系统,以协助大学利用既往录取学生的学习记录,挑选成功预测率高的候选人(学生)入学,最终为该机构培养出高质量的毕业生。该模型是以某些选定变量作为输入而建立的。它达到了92.3%以上的准确率,显示了人工神经网络作为预测工具以及高校考生遴选标准的潜在有效性。 摘要:Many researchers have studied student academic performance in supervised and unsupervised learning using numerous data mining techniques. Neural networks often need a greater collection of observations to achieve enough predictive ability. Due to the increase in the rate of poor graduates, it is necessary to design a system that helps to reduce this menace as well as reduce the incidence of students having to repeat due to poor performance or having to drop out of school altogether in the middle of the pursuit of their career. It is therefore necessary to study each one as well as their advantages and disadvantages, so as to determine which is more efficient and in what case one should be preferred over the other. The study aims to develop a system to predict student performance with an Artificial Neural Network using the student demographic traits so as to assist the university in selecting candidates (students) with a high prediction of success for admission, using previous academic records of students granted admissions, which will eventually lead to quality graduates of the institution. The model was developed based on certain selected variables as the input.
It achieved an accuracy of over 92.3 percent, showing the potential effectiveness of Artificial Neural Networks as a predictive tool and a selection criterion for candidates seeking admission to a university.
【2】 The application of predictive analytics to identify at-risk students in health professions education 标题:预测分析在卫生职业教育中识别高危学生中的应用 链接:https://arxiv.org/abs/2108.07709
作者:Anshul Kumar,Roger Edwards,Lisa Walker 机构:MGH Institute of Health Professions, (first and corresponding author), Roger A. Edwards, (co-senior author), This document contains the following items, within a single file:, . Article manuscript, . Supplementary code and data appendix 摘要:导言:当一个学习者未能达到一个里程碑时,教育工作者经常想知道是否有任何警告信号可以让他们更快地进行干预。机器学习用于预测哪些学生有可能无法通过国家认证考试。预测是在考试之前做出的,因此教育工作者可以在学生参加考试之前进行有意义的干预。方法:使用已收集的来自医师助理硕士研究项目四个队列的一年级学生评估数据,作者实现了k-最近邻算法(AMMKNN)的“自适应最小匹配”版本,使用不断变化的邻居数预测每个学生在医师助理国家认证考试(PANCE)上的未来考试分数。在对新生进行预测之前,使用留一交叉验证(LOOCV)评估该模型的实际能力。结果:最佳预测模型的准确率为93%,敏感性为69%,特异性为94%。它为每个学生在计划参加考试前一年生成一个预测的PANCE分数。然后,学生可以前瞻性地分为需要额外支持、可选额外支持或不需要额外支持的小组。然后,教育者有一年的时间为每种类型的学生提供适当的定制支持。结论:预测性分析可以帮助卫生专业教育工作者在学生中分配稀缺的时间和资源。跨专业教育者可以使用包含的方法和代码为学生生成预测的测试结果。作者建议使用这种或类似预测方法的教育者采取负责任和透明的行动。 摘要:Introduction: When a learner fails to reach a milestone, educators often wonder if there had been any warning signs that could have allowed them to intervene sooner. Machine learning is used to predict which students are at risk of failing a national certifying exam. Predictions are made well in advance of the exam, such that educators can meaningfully intervene before students take the exam. Methods: Using already-collected, first-year student assessment data from four cohorts in a Master of Physician Assistant Studies program, the authors implement an "adaptive minimum match" version of the k-nearest neighbors algorithm (AMMKNN), using changing numbers of neighbors to predict each student's future exam scores on the Physician Assistant National Certifying Examination (PANCE). Leave-one-out cross validation (LOOCV) was used to evaluate the practical capabilities of this model, before making predictions for new students. Results: The best predictive model has an accuracy of 93%, sensitivity of 69%, and specificity of 94%. It generates a predicted PANCE score for each student, one year before they are scheduled to take the exam. 
Students can then be prospectively categorized into groups that need extra support, optional extra support, or no extra support. The educator then has one year to provide the appropriate customized support to each type of student. Conclusions: Predictive analytics can help health professions educators allocate scarce time and resources across their students. Interprofessional educators can use the included methods and code to generate predicted test outcomes for students. The authors recommend that educators using this or similar predictive methods act responsibly and transparently.
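The evaluation protocol, leave-one-out cross-validation of a k-nearest-neighbors predictor, can be sketched as follows; the "adaptive minimum match" neighbor selection is the paper's own variant, so plain majority-vote k-NN on a 1-D feature is used here as a stand-in:

```python
def loocv_knn(X, y, k=3):
    """Leave-one-out cross-validation of a 1-D k-nearest-neighbors
    classifier: each student is predicted from the k closest of the
    remaining students, and the fraction predicted correctly is returned."""
    correct = 0
    for i in range(len(X)):
        neighbors = sorted(
            (abs(X[j] - X[i]), y[j]) for j in range(len(X)) if j != i
        )[:k]
        votes = [label for _, label in neighbors]
        prediction = max(set(votes), key=votes.count)
        correct += prediction == y[i]
    return correct / len(X)
```

LOOCV gives an honest estimate of how the model would behave on a new student, which is what justifies using the predictions prospectively.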
【3】 From the Greene--Wu Convolution to Gradient Estimation over Riemannian Manifolds 标题:黎曼流形上从Greene-Wu卷积到梯度估计 链接:https://arxiv.org/abs/2108.07406
作者:Tianyu Wang,Yifeng Huang,Didong Li 摘要:在一个完备的有限维黎曼流形上,Greene和Wu引入了一个卷积,称为Greene-Wu(GW)卷积。本文介绍了GW卷积的一种新形式。使用我们的重新公式,可以很容易地导出GW卷积的许多性质,包括空间曲率如何通过GW卷积影响函数曲率的新公式。通过我们的新的重新表述,我们还介绍了一种改进的黎曼流形上的梯度估计方法。理论上,我们的梯度估计方法将估计误差的阶从$O\left(\left(n+3\right)^{3/2}\right)$改进为$O\left(n^{3/2}\right)$,其中$n$是流形的维数。从经验上看,我们的方法优于现有的黎曼流形上梯度估计的最佳方法,通过彻底的实验评估证明了这一点。 摘要:Over a complete Riemannian manifold of finite dimension, Greene and Wu introduced a convolution, known as Greene-Wu (GW) convolution. In this paper, we introduce a reformulation of the GW convolution. Using our reformulation, many properties of the GW convolution can be easily derived, including a new formula for how the curvature of the space would affect the curvature of the function through the GW convolution. Also enabled by our new reformulation, an improved method for gradient estimation over Riemannian manifolds is introduced. Theoretically, our gradient estimation method improves the order of estimation error from $O\left(\left(n+3\right)^{3/2}\right)$ to $O\left(n^{3/2}\right)$, where $n$ is the dimension of the manifold. Empirically, our method outperforms the best existing method for gradient estimation over Riemannian manifolds, as evidenced by thorough experimental evaluations.
【4】 An End-to-End Deep Learning Approach for Epileptic Seizure Prediction 标题:用于癫痫发作预测的端到端深度学习方法 链接:https://arxiv.org/abs/2108.07453
作者:Yankun Xu,Jie Yang,Shiqi Zhao,Hemmings Wu,Mohamad Sawan 机构:CenBRAIN, School of Engineering, Westlake University, Hangzhou, Zhejiang, China , Department of Neurosurgery, Zhejiang University School of Medicine Second Affiliated Hospital 备注:5 pages, 4 figures, 4 tables, conference 摘要:准确的癫痫发作预测系统能够在癫痫患者发作前提供早期预警。这对于药物难治性患者极为重要。传统的癫痫发作预测工作通常依赖于从脑电图(EEG)记录中提取的特征和诸如回归或支持向量机(SVM)等分类算法来定位癫痫发作前的短时间。然而,由于手工特征的信息丢失以及回归和支持向量机算法的分类能力有限,这些方法无法实现高精度的预测。本文提出了一种基于卷积神经网络(CNN)的端到端深度学习解决方案。在早期和晚期卷积层和最大池层分别采用一维和二维核。提出的CNN模型在Kaggle颅内和CHB-MIT头皮EEG数据集上进行了评估。在两个数据集上,总体灵敏度、误报率和接收机工作特性曲线下面积分别达到93.5%、0.063/h、0.981和98.8%、0.074/h和0.988。与最新研究成果的比较表明,该模型的预测性能优于现有模型。 摘要:An accurate seizure prediction system enables early warnings before seizure onset of epileptic patients. It is extremely important for drug-refractory patients. Conventional seizure prediction works usually rely on features extracted from Electroencephalography (EEG) recordings and classification algorithms such as regression or support vector machine (SVM) to locate the short time before seizure onset. However, such methods cannot achieve high-accuracy prediction due to information loss of the hand-crafted features and the limited classification ability of regression and SVM algorithms. We propose an end-to-end deep learning solution using a convolutional neural network (CNN) in this paper. One and two dimensional kernels are adopted in the early- and late-stage convolution and max-pooling layers, respectively. The proposed CNN model is evaluated on Kaggle intracranial and CHB-MIT scalp EEG datasets. Overall sensitivity, false prediction rate, and area under receiver operating characteristic curve reaches 93.5%, 0.063/h, 0.981 and 98.8%, 0.074/h, 0.988 on two datasets respectively. Comparison with state-of-the-art works indicates that the proposed model achieves exceeding prediction performance.
其他神经网络|深度学习|模型|建模(12篇)
【1】 Mitigating harm in language models with conditional-likelihood filtration 标题:用条件似然滤波减轻语言模型中的危害 链接:https://arxiv.org/abs/2108.07790
作者:Helen Ngo,Cooper Raterink,João G. M. Araújo,Ivan Zhang,Carol Chen,Adrien Morisot,Nicholas Frosst 机构:João G.M. Araújo† 摘要:在大规模未经过滤的数据集上训练的语言模型从其训练数据中获得系统性偏见、成见和有害观点。我们提出了一种从web级数据集中以编程方式识别和删除有害文本的方法。预训练语言模型用于计算研究人员编写的触发短语在以特定文档为条件时的对数似然,并据此从数据集中识别和过滤文档。我们证明了在该过滤数据集上训练的模型生成有害文本的倾向性较低,与未过滤基线相比,在标准语言建模基准上的性能略有下降。我们通过从标准语言建模基准中呈现仇恨言论和其他不受欢迎的内容的示例,部分解释了这种性能差距。最后,我们讨论了这种方法的推广,以及研究人员如何使用反映特定价值观的触发短语来构建与其价值观更紧密一致的语言模型。 摘要:Language models trained on large-scale unfiltered datasets curated from the open web acquire systemic biases, prejudices, and harmful views from their training data. We present a methodology for programmatically identifying and removing harmful text from web-scale datasets. A pretrained language model is used to calculate the log-likelihood of researcher-written trigger phrases conditioned on a specific document, which is used to identify and filter documents from the dataset. We demonstrate that models trained on this filtered dataset exhibit lower propensity to generate harmful text, with a marginal decrease in performance on standard language modeling benchmarks compared to unfiltered baselines. We provide a partial explanation for this performance gap by surfacing examples of hate speech and other undesirable content from standard language modeling benchmarks. Finally, we discuss the generalization of this method and how trigger phrases which reflect specific values can be used by researchers to build language models which are more closely aligned with their values.
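As a toy illustration of the filtering idea — with an add-alpha unigram model standing in for the pretrained language model, and a made-up vocabulary size and threshold — one might sketch:

```python
import math
from collections import Counter

def trigger_loglik(document, trigger, vocab_size=10_000, alpha=1.0):
    """Log-likelihood of the trigger phrase under an add-alpha-smoothed
    unigram model of the document. A toy stand-in: the paper scores the
    trigger phrase with a pretrained LM conditioned on the document."""
    counts = Counter(document.lower().split())
    total = sum(counts.values())
    return sum(
        math.log((counts[w] + alpha) / (total + alpha * vocab_size))
        for w in trigger.lower().split()
    )

def filter_docs(docs, trigger, threshold):
    """Keep documents whose trigger log-likelihood stays below threshold."""
    return [d for d in docs if trigger_loglik(d, trigger) < threshold]
```

Documents under which the harmful trigger phrase is likely score high and are dropped; the threshold trades corpus size against filtration strength.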
【2】 Panoramic Learning with A Standardized Machine Learning Formalism 标题:标准化机器学习形式主义的全景学习 链接:https://arxiv.org/abs/2108.07783
作者:Zhiting Hu,Eric P. Xing 机构:UC San Diego,Carnegie Mellon University,MBZUAI,Petuum Inc. 备注:29 pages 摘要:机器学习(ML)是一种让机器从经验中学习概念的计算方法。在处理各种各样的经验时,从数据实例、知识、约束到奖励、对手,以及在不断增长的任务范围内的终身互动,当代ML/AI研究产生了多种学习范式和方法。尽管在所有不同的领域都取得了不断的进步,但不同的狭隘方法也使得学习解决方案的标准化、可组合化和可重用开发变得困难,并且如果可能的话,使构建能够全面学习所有类型经验的人工智能代理变得昂贵。本文提出了一种标准化的ML形式主义,特别是学习目标的标准方程式,它提供了对不同ML算法的统一理解,由于建模组件的不同选择,使它们成为特殊情况。该框架还为新的ML解决方案的机械设计提供指导,并作为一个有希望的工具,通过各种经验进行全景式学习。 摘要:Machine Learning (ML) is about computational methods that enable machines to learn concepts from experiences. In handling a wide variety of experiences ranging from data instances, knowledge, constraints, to rewards, adversaries, and lifelong interplay in an ever-growing spectrum of tasks, contemporary ML/AI research has resulted in a multitude of learning paradigms and methodologies. Despite the continual progresses on all different fronts, the disparate narrowly-focused methods also make standardized, composable, and reusable development of learning solutions difficult, and make it costly if possible to build AI agents that panoramically learn from all types of experiences. This paper presents a standardized ML formalism, in particular a standard equation of the learning objective, that offers a unifying understanding of diverse ML algorithms, making them special cases due to different choices of modeling components. The framework also provides guidance for mechanic design of new ML solutions, and serves as a promising vehicle towards panoramic learning with all experiences.
【3】 Program Synthesis with Large Language Models 标题:基于大型语言模型的程序综合 链接:https://arxiv.org/abs/2108.07732
作者:Jacob Austin,Augustus Odena,Maxwell Nye,Maarten Bosma,Henryk Michalewski,David Dohan,Ellen Jiang,Carrie Cai,Michael Terry,Quoc Le,Charles Sutton 机构:Google Research, denotes equal contribution 备注:Jacob and Augustus contributed equally 摘要:本文探讨了当前通用编程语言中用于程序综合的大型语言模型的局限性。我们在两个新的基准MBPP和MathQA-Python上评估了一组这样的模型(参数介于244M和137B之间),涵盖少样本(few-shot)和微调两种设置。我们的基准测试旨在衡量这些模型从自然语言描述合成简短Python程序的能力。基本编程问题(MBPP)数据集包含974个编程任务,旨在由入门级程序员解决。MathQA-Python数据集是MathQA基准的Python版本,包含23914个问题,这些问题评估了模型从更复杂的文本合成代码的能力。在这两个数据集上,我们发现合成性能与模型大小成对数线性(log-linear)关系。我们最大的模型,即使没有对代码数据集进行微调,也能借助精心设计的提示,通过少样本学习解决MBPP中59.6%的问题。对数据集的保留部分进行微调可以在大多数模型大小中提高大约10个百分点的性能。在MathQA-Python数据集上,最大的微调模型达到了83.8%的准确率。更进一步,我们研究了模型参与代码对话的能力,并结合了人类反馈来改进其解决方案。我们发现,与模型的初始预测相比,来自人类的自然语言反馈使错误率降低了一半。此外,我们还进行了错误分析,以阐明这些模型的不足之处以及最难生成的程序类型。最后,我们通过微调这些模型来预测程序执行的结果,从而探索这些模型的语义基础。我们发现,即使我们最好的模型通常也无法预测给定特定输入的程序的输出。 摘要:This paper explores the limits of the current generation of large language models for program synthesis in general purpose programming languages. We evaluate a collection of such models (with between 244M and 137B parameters) on two new benchmarks, MBPP and MathQA-Python, in both the few-shot and fine-tuning regimes. Our benchmarks are designed to measure the ability of these models to synthesize short Python programs from natural language descriptions. The Mostly Basic Programming Problems (MBPP) dataset contains 974 programming tasks, designed to be solvable by entry-level programmers. The MathQA-Python dataset, a Python version of the MathQA benchmark, contains 23914 problems that evaluate the ability of the models to synthesize code from more complex text. On both datasets, we find that synthesis performance scales log-linearly with model size. Our largest models, even without finetuning on a code dataset, can synthesize solutions to 59.6 percent of the problems from MBPP using few-shot learning with a well-designed prompt. 
Fine-tuning on a held-out portion of the dataset improves performance by about 10 percentage points across most model sizes. On the MathQA-Python dataset, the largest fine-tuned model achieves 83.8 percent accuracy. Going further, we study the model's ability to engage in dialog about code, incorporating human feedback to improve its solutions. We find that natural language feedback from a human halves the error rate compared to the model's initial prediction. Additionally, we conduct an error analysis to shed light on where these models fall short and what types of programs are most difficult to generate. Finally, we explore the semantic grounding of these models by fine-tuning them to predict the results of program execution. We find that even our best models are generally unable to predict the output of a program given a specific input.
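A minimal sketch of few-shot prompt assembly for program synthesis (the `# Task:` exemplar format here is purely illustrative, not the paper's actual prompt design):

```python
def build_few_shot_prompt(examples, query):
    """Assemble a few-shot program-synthesis prompt: each exemplar pairs
    a natural-language description with a reference solution, and the
    query description is appended for the model to complete."""
    parts = []
    for desc, code in examples:
        parts.append(f"# Task: {desc}\n{code}\n")
    parts.append(f"# Task: {query}\n")
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    [("add two numbers", "def add(a, b):\n    return a + b")],
    "reverse a string",
)
```

The paper's finding that a well-designed prompt matters as much as shot count is an argument for treating this assembly step as a tunable component rather than boilerplate.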
【4】 Scaling Laws for Deep Learning 标题:深度学习的标度律 链接:https://arxiv.org/abs/2108.07686
作者:Jonathan S. Rosenfeld 机构:B.A., Physics, Israel Institute of Technology (,), B.Sc., Electrical Engineering, Israel Institute of Technology (,), MBA., Electrical Engineering, Israel Institute of Technology (,), M.Sc. Electrical Engineering and Computer Science 备注:PhD thesis 摘要:单纯跑得更快,能到达的地方终究有限——通常建议先弄清道路通向何方,再去买车……机器学习(ML)和深度学习(DL)在过去十年中的复兴伴随着不可扩展的计算成本,限制了其发展,并在实践中影响了该领域。在这篇论文中,我们采取了一种系统的方法来解决这些成本根源的算法和方法限制。我们首先证明了DL训练和剪枝是可预测的,并且受缩放定律的控制——对于最先进的模型和任务,包括图像分类和语言建模,以及通过迭代剪枝进行最先进的模型压缩。通过建立这些比例定律,可预测性为原则性设计和权衡推理提供了途径,而目前该领域基本上缺乏这种方法。然后,我们继续分析标度律的来源,提供近似理论观点,并通过探索一个无噪声的可实现情况表明,DL实际上是由远离误差下限的误差源控制的。我们在对标度定律起源的理论理解的基础上得出结论。通过数据带宽限制假设和Nyquist学习者的引入,我们提出了一种消除当前主要错误源之一的推测路径,原则上,在有限的数据集大小下,该路径可以达到泛化错误下限(例如,在无噪情况下为0)。 摘要:Running faster will only get you so far -- it is generally advisable to first understand where the roads lead, then get a car ... The renaissance of machine learning (ML) and deep learning (DL) over the last decade is accompanied by an unscalable computational cost, limiting its advancement and weighing on the field in practice. In this thesis we take a systematic approach to address the algorithmic and methodological limitations at the root of these costs. We first demonstrate that DL training and pruning are predictable and governed by scaling laws -- for state of the art models and tasks, spanning image classification and language modeling, as well as for state of the art model compression via iterative pruning. Predictability, via the establishment of these scaling laws, provides the path for principled design and trade-off reasoning, currently largely lacking in the field. We then continue to analyze the sources of the scaling laws, offering an approximation-theoretic view and showing through the exploration of a noiseless realizable case that DL is in fact dominated by error sources very far from the lower error limit. We conclude by building on the gained theoretical understanding of the scaling laws' origins. 
We present a conjectural path to eliminate one of the current dominant error sources -- through a data bandwidth limiting hypothesis and the introduction of Nyquist learners -- which can, in principle, reach the generalization error lower limit (e.g. 0 in the noiseless case), at finite dataset size.
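A scaling law of the simplest power-law form can be fit by linear regression in log-log coordinates; a minimal sketch (the thesis's actual functional forms include additional transition and irreducible-error terms):

```python
import numpy as np

def fit_power_law(sizes, errors):
    """Fit err ~ a * size**(-b) by linear regression in log-log space.
    A minimal stand-in for the richer scaling-law fits discussed above."""
    logm, loge = np.log(sizes), np.log(errors)
    slope, intercept = np.polyfit(logm, loge, 1)
    return np.exp(intercept), -slope    # (a, b)

# Synthetic "error vs. model size" points obeying err = 2 * m**(-0.3)
sizes = np.array([1e6, 1e7, 1e8, 1e9])
errors = 2.0 * sizes ** -0.3
a, b = fit_power_law(sizes, errors)
```

Once (a, b) are in hand, extrapolating the curve gives exactly the kind of design and trade-off reasoning the thesis argues for, e.g. predicting the model size needed for a target error.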
【5】 Learning to Compute Approximate Nash Equilibrium for Normal-form Games 标题:学习计算正规型博弈的近似纳什均衡 链接:https://arxiv.org/abs/2108.07472
作者:Zhijian Duan,Yali Du,Jun Wang,Xiaotie Deng 机构: Center on Frontiers of Computing Studies, Computer Science Dept., Peking University, University College London 摘要:在这篇文章中,我们提出了一个通用的元学习方法来计算有限$n$人标准型博弈的近似纳什均衡。与现有解决方案不同的是,我们的元解算器直接构建了从博弈效用矩阵到联合策略配置文件的映射,这些解决方案从零开始近似或学习每个博弈的纳什均衡。映射是参数化的,并通过提出的纳什均衡近似度量以自我监督的方式学习,而无需地面真实数据通知任何纳什均衡。因此,在相同的博弈分布下,它可以立即预测近似于任何看不见的新博弈的纳什均衡的联合策略剖面。此外,如果允许迭代更新,元解算器可以进一步微调并适应新游戏。我们从理论上证明了我们的元解算器不受精确纳什均衡解的非光滑性的影响,并推导了一个样本复杂度,以证明其在正规形式博弈中的泛化能力。实验结果表明,在自适应和非自适应情况下,它对其他强基线都具有相当大的逼近能力。 摘要:In this paper, we propose a general meta learning approach to computing approximate Nash equilibrium for finite $n$-player normal-form games. Unlike existing solutions that approximate or learn a Nash equilibrium from scratch for each of the games, our meta solver directly constructs a mapping from a game utility matrix to a joint strategy profile. The mapping is parameterized and learned in a self-supervised fashion by a proposed Nash equilibrium approximation metric without ground truth data informing any Nash equilibrium. As such, it can immediately predict the joint strategy profile that approximates a Nash equilibrium for any unseen new game under the same game distribution. Moreover, the meta-solver can be further fine-tuned and adaptive to a new game if iteration updates are allowed. We theoretically prove that our meta-solver is not affected by the non-smoothness of exact Nash equilibrium solutions, and derive a sample complexity bound to demonstrate its generalization ability across normal-form games. Experimental results demonstrate its substantial approximation power against other strong baselines in both adaptive and non-adaptive cases.
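For two-player normal-form games, the quality of an approximate equilibrium is commonly measured by exploitability, the sum of unilateral best-response gains; a sketch of that standard metric (not the paper's own n-player approximation metric):

```python
import numpy as np

def exploitability(A, B, x, y):
    """Sum of unilateral best-response gains for a two-player
    normal-form game with payoff matrices A (row player) and
    B (column player) at mixed profile (x, y). It is zero exactly
    when (x, y) is a Nash equilibrium."""
    gain_row = np.max(A @ y) - x @ A @ y
    gain_col = np.max(x @ B) - x @ B @ y
    return gain_row + gain_col

# Matching pennies: uniform mixing is the unique Nash equilibrium.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
B = -A
x = y = np.array([0.5, 0.5])
```

A learned meta-solver can be scored by how small this quantity is on held-out games, which is the role the paper's approximation metric plays during self-supervised training.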
【6】 Towards Secure and Practical Machine Learning via Secret Sharing and Random Permutation 标题:基于秘密共享和随机置换的安全实用机器学习 链接:https://arxiv.org/abs/2108.07463
作者:Fei Zheng,Chaochao Chen,Xiaolin Zheng 机构:College of Computer Science and Technology, Zhejiang University, Hangzhou, China, A R T I C L E I N F O 摘要:随着人们对隐私保护需求的不断增加,保护隐私的机器学习受到了学术界和工业界的广泛关注。然而,大多数现有方法在实际应用中都有其局限性。一方面,尽管大多数密码方法是可证明安全的,但它们带来了大量的计算和通信。另一方面,许多相对有效的私有方法(如联合学习和分割学习)的安全性受到质疑,因为它们是不可证明安全的。受先前隐私保护机器学习工作的启发,我们通过计算后置换技术将随机置换和算术秘密共享相结合,构建了一个隐私保护机器学习框架。由于我们的方法降低了元素函数计算的成本,因此它比现有的密码方法更有效。此外,通过采用距离相关性作为隐私泄漏的度量,我们证明了我们的方法比以前的不可证明安全方法更安全。总的来说,我们的方案在安全性和效率之间取得了良好的平衡。实验结果表明,与目前最先进的加密方法相比,我们的方法不仅速度快6倍,减少了85%的网络流量,而且与不可证明的安全方法相比,在训练过程中泄漏的隐私更少。 摘要:With the increasing demands for privacy protection, privacy-preserving machine learning has been drawing much attention in both academia and industry. However, most existing methods have their limitations in practical applications. On the one hand, although most cryptographic methods are provable secure, they bring heavy computation and communication. On the other hand, the security of many relatively efficient private methods (e.g., federated learning and split learning) is being questioned, since they are non-provable secure. Inspired by previous work on privacy-preserving machine learning, we build a privacy-preserving machine learning framework by combining random permutation and arithmetic secret sharing via our compute-after-permutation technique. Since our method reduces the cost for element-wise function computation, it is more efficient than existing cryptographic methods. Moreover, by adopting distance correlation as a metric for privacy leakage, we demonstrate that our method is more secure than previous non-provable secure methods. Overall, our proposal achieves a good balance between security and efficiency. Experimental results show that our method not only is up to 6x faster and reduces up to 85% network traffic compared with state-of-the-art cryptographic methods, but also leaks less privacy during the training process compared with non-provable secure methods.
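The arithmetic secret-sharing building block can be sketched in a few lines (additive two-party sharing over a prime field; the modulus and API are illustrative, and the paper's compute-after-permutation protocol layers random permutation on top of this):

```python
import secrets

P = 2**61 - 1  # a Mersenne prime modulus (illustrative choice)

def share(x, n=2):
    """Split x into n additive shares summing to x mod P; any n-1
    shares are uniformly random and reveal nothing about x."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    parts.append((x - sum(parts)) % P)
    return parts

def reconstruct(shares):
    return sum(shares) % P

# Secure addition: each party adds its shares locally; no plaintext
# value is ever exchanged.
a_sh, b_sh = share(123), share(456)
sum_sh = [(u + v) % P for u, v in zip(a_sh, b_sh)]
```

Addition is free under this scheme; it is the nonlinear, element-wise functions that are expensive for cryptographic protocols, which is exactly the cost the paper's permutation technique targets.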
【7】 Modeling Protein Using Large-scale Pretrain Language Model 标题:基于大规模预训练语言模型的蛋白质建模 链接:https://arxiv.org/abs/2108.07435
作者:Yijia Xiao,Jiezhong Qiu,Ziang Li,Chang-Yu Hsieh,Jie Tang 机构: Department of Computer Science and Technology, Tsinghua University, Beijing Academy of Artificial Intelligence, Tencent Quantum Lab 备注:Accepted paper in Pretrain@KDD 2021 (The International Workshop on Pretraining: Algorithms, Architectures, and Applications) 摘要:蛋白质几乎与每一个生命过程都有联系。因此,分析蛋白质序列的生物结构和性质对于生命探索、疾病检测和药物发现至关重要。传统的蛋白质分析方法往往是劳动密集型和耗时的。深度学习模型的出现使得在大量数据中建模数据模式成为可能。跨学科研究人员已开始利用深度学习方法对大型生物数据集进行建模,例如使用长短期记忆和卷积神经网络进行蛋白质序列分类。经过数百万年的进化,进化信息被编码在蛋白质序列中。受自然语言和蛋白质序列之间相似性的启发,我们使用大规模语言模型对进化规模的蛋白质序列进行建模,在表示中编码蛋白质生物学信息。在标记级和序列级任务中都观察到了显著的改进,这表明我们的大规模模型能够准确地从进化尺度的个体序列的预训练中捕获进化信息。我们的代码和模型可在 https://github.com/THUDM/ProteinLM 获取。 摘要:Protein is linked to almost every life process. Therefore, analyzing the biological structure and property of protein sequences is critical to the exploration of life, as well as disease detection and drug discovery. Traditional protein analysis methods tend to be labor-intensive and time-consuming. The emergence of deep learning models makes modeling data patterns in large quantities of data possible. Interdisciplinary researchers have begun to leverage deep learning methods to model large biological datasets, e.g. using long short-term memory and convolutional neural network for protein sequence classification. After millions of years of evolution, evolutionary information is encoded in protein sequences. Inspired by the similarity between natural language and protein sequences, we use large-scale language models to model evolutionary-scale protein sequences, encoding protein biology information in representation. Significant improvements are observed in both token-level and sequence-level tasks, demonstrating that our large-scale model can accurately capture evolution information from pretraining on evolutionary-scale individual sequences. Our code and model are available at https://github.com/THUDM/ProteinLM.
【8】 Memory-Efficient Factorization Machines via Binarizing both Data and Model Coefficients 标题:通过对数据和模型系数进行二值化来实现内存高效的因式分解机器 链接:https://arxiv.org/abs/2108.07421
作者:Yu Geng,Liang Lan 机构:the date of receipt and acceptance should be inserted later 摘要:因子分解机(FM)是一种能够在线性时间内有效地对特征交互进行建模的通用预测器,最初被提出用于协作推荐,并被广泛用于回归、分类和排序任务。子空间编码因子分解机(SEFM)最近被提出,通过对每个输入特征进行一次热编码,为单个特征和特征交互应用显式非线性特征映射,以克服因子分解机(FM)的表达能力限制。尽管SEFM有效,但它将FM的内存成本增加了$b$倍,其中$b$是在每个输入特征上应用一次热编码时的存储箱数。为了降低SEFM的内存开销,我们提出了一种称为二进制FM的新方法,该方法将模型参数约束为二进制值(即$1$或$-1$)。这样,每个参数值可以有效地存储在一位中。我们提出的方法可以显著降低SEFM模型的内存开销。此外,我们还提出了一种新的算法,使用带自适应梯度下降(Adagrad)的直通估计器(STE)有效地学习具有二进制约束的FM。最后,我们在八个不同的分类数据集上评估了我们提出的方法的性能。我们的实验结果表明,我们提出的方法达到了与SEFM相当的精度,但存储成本要少得多。 摘要:Factorization Machines (FM), a general predictor that can efficiently model feature interactions in linear time, was primarily proposed for collaborative recommendation and has been broadly used for regression, classification and ranking tasks. Subspace Encoding Factorization Machine (SEFM) has been proposed recently to overcome the expressiveness limitation of Factorization Machines (FM) by applying explicit nonlinear feature mapping for both individual features and feature interactions through one-hot encoding to each input feature. Despite the effectiveness of SEFM, it increases the memory cost of FM by $b$ times, where $b$ is the number of bins when applying one-hot encoding on each input feature. To reduce the memory cost of SEFM, we propose a new method called Binarized FM which constrains the model parameters to be binary values (i.e., 1 or $-1$). Then each parameter value can be efficiently stored in one bit. Our proposed method can significantly reduce the memory cost of SEFM model. In addition, we propose a new algorithm to effectively and efficiently learn the proposed FM with binary constraints using Straight Through Estimator (STE) with Adaptive Gradient Descent (Adagrad). Finally, we evaluate the performance of our proposed method on eight different classification datasets. Our experimental results have demonstrated that our proposed method achieves comparable accuracy with SEFM but with much less memory cost.
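The binarization and straight-through gradient step can be sketched as follows (a minimal version with plain gradient descent; the paper pairs STE with Adagrad):

```python
import numpy as np

def binarize(w):
    """Forward pass: project real-valued parameters onto {-1, +1}."""
    return np.where(w >= 0, 1.0, -1.0)

def ste_update(w, grad_wrt_binary, lr=0.1):
    """Straight-Through Estimator step: the gradient computed w.r.t.
    the binarized parameters is passed straight through the sign()
    nondifferentiability and applied to the latent real-valued
    parameters, which are re-binarized on the next forward pass."""
    return w - lr * grad_wrt_binary

w = np.array([0.3, -0.2, 0.05])   # latent full-precision parameters
wb = binarize(w)                  # [ 1., -1.,  1.]
```

At deployment only `wb` needs to be stored, one bit per parameter, which is the source of the memory savings over SEFM.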
【9】 Diagnosis of Acute Myeloid Leukaemia Using Machine Learning 标题:机器学习在急性髓系白血病诊断中的应用 链接:https://arxiv.org/abs/2108.07396
作者:A. Angelakis,I. Soulioti 机构:JADS, Eindhoven University of Technology, Den Bosch; Department of Biology, University of Athens 备注:16 pages, 3 figures 摘要:我们在2177人的数据集上训练了一个机器学习模型,使用26个探针集及其年龄作为特征,以便对患有急性髓系白血病或健康的人进行分类。该数据集是多中心的,由来自4大洲、15个国家、25个城市、27个组织的数据组成。我们的模型的准确度为99.94%,F1得分为0.9996。就我们所知,我们的模型在使用相似或不相似数据预测AML方面的性能是文献中最好的。此外,对于我们在模型中用作特征的26个探针组,还没有任何与急性髓系白血病相关的文献参考。 摘要:We train a machine learning model on a dataset of 2177 individuals using as features 26 probe sets and their age in order to classify if someone has acute myeloid leukaemia or is healthy. The dataset is multicentric and consists of data from 27 organisations, 25 cities, 15 countries and 4 continents. The accuracy of our model is 99.94% and its F1-score is 0.9996. To the best of our knowledge the performance of our model is the best one in the literature, as regards the prediction of AML using similar or dissimilar data. Moreover, no bibliographic reference has previously associated the 26 probe sets we used as features with acute myeloid leukaemia.
【10】 Contextual Convolutional Neural Networks 标题:上下文卷积神经网络 链接:https://arxiv.org/abs/2108.07387
作者:Ionut Cosmin Duta,Mariana Iuliana Georgescu,Radu Tudor Ionescu 机构:University of Bucharest, Romania; SecurifAI, Romania 备注:Accepted at ICCV Workshop on Neural Architectures (NeurArch 2021) 摘要:我们提出了用于视觉识别的上下文卷积(CoConv)。CoConv是卷积神经网络核心部件标准卷积的直接替代。CoConv隐式地具备合并上下文信息的能力,同时与标准卷积相比保持相似数量的参数和计算成本。CoConv的灵感来源于神经科学研究,该研究表明:(i)神经元,即使来自初级视觉皮层(V1区),也参与上下文线索的检测;(ii)视觉神经元的活动可以受到完全位于其理论感受野之外的刺激的影响。一方面,我们将CoConv集成到广泛使用的残差网络中,并在视觉识别的核心任务和基准(即ImageNet数据集上的图像分类和MS COCO数据集上的目标检测)上显示出优于基线的识别性能。另一方面,我们在最先进的生成对抗网络的生成器中引入CoConv,在CIFAR-10和CelebA上显示改进的生成结果。我们的代码可在 https://github.com/iduta/coconv 获取。 摘要:We propose contextual convolution (CoConv) for visual recognition. CoConv is a direct replacement of the standard convolution, which is the core component of convolutional neural networks. CoConv is implicitly equipped with the capability of incorporating contextual information while maintaining a similar number of parameters and computational cost compared to the standard convolution. CoConv is inspired by neuroscience studies indicating that (i) neurons, even from the primary visual cortex (V1 area), are involved in detection of contextual cues and that (ii) the activity of a visual neuron can be influenced by the stimuli placed entirely outside of its theoretical receptive field. On the one hand, we integrate CoConv in the widely-used residual networks and show improved recognition performance over baselines on the core tasks and benchmarks for visual recognition, namely image classification on the ImageNet data set and object detection on the MS COCO data set. On the other hand, we introduce CoConv in the generator of a state-of-the-art Generative Adversarial Network, showing improved generative results on CIFAR-10 and CelebA. Our code is available at https://github.com/iduta/coconv.
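A rough 1-D intuition for gathering context at fixed parameter count — not the authors' CoConv implementation, which operates on 2-D feature maps inside residual blocks — can be sketched with kernels applied at increasing dilation rates:

```python
import numpy as np

def dilated_conv1d(x, k, d):
    """'Same'-padded 1-D convolution of signal x with an odd-length
    kernel k applied at dilation rate d."""
    r = (len(k) - 1) * d // 2
    xp = np.pad(x, r)
    return np.array([
        sum(k[j] * xp[i + j * d] for j in range(len(k)))
        for i in range(len(x))
    ])

def contextual_conv1d(x, kernels_dilations):
    """Toy contextual convolution: sum the responses of kernels applied
    at different dilation rates, enlarging the receptive field without
    adding parameters beyond the kernels themselves."""
    return sum(dilated_conv1d(x, k, d) for k, d in kernels_dilations)
```

The larger dilations see context well outside a standard kernel's footprint, loosely mirroring the outside-the-receptive-field influence cited from the neuroscience literature.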
【11】 AGNet: Weighing Black Holes with Deep Learning 标题:AGNet:用深度学习称量黑洞 链接:https://arxiv.org/abs/2108.07749
作者:Joshua Yao-Yu Lin,Sneh Pandya,Devanshi Pratap,Xin Liu,Matias Carrasco Kind,Volodymyr Kindratenko 机构:Department of Physics, University of Illinois at Urbana-Champaign, West Green Street, Urbana, IL , USA, National Center for Supercomputing Applications, East Springfield Avenue, Champaign, IL , USA 备注:8 pages, 7 figures, 1 table, submitting to MNRAS 摘要:超大质量黑洞(SMBH)普遍存在于大多数大质量星系的中心。测量SMBH质量对于理解SMBH的起源和演化非常重要。然而,传统的方法需要光谱数据,而这类数据的采集成本很高。我们提出了一种使用类星体光变时间序列对SMBH进行称量的算法,避免了对昂贵光谱的需求。我们训练、验证和测试神经网络,这些神经网络直接从斯隆数字巡天(SDSS)的Stripe 82光变曲线学习,样本为$38,939$个经光谱证认的类星体,以刻画SMBH质量和多色光学光变曲线之间的非线性编码。我们发现预测的SMBH质量与基于SDSS单历元光谱的基准维里质量估计之间存在0.37 dex的1$\sigma$弥散,这与维里质量估计中的系统不确定性相当。我们的结果对未来利用Vera C. Rubin天文台的观测开展更高效的应用具有直接意义。我们的代码AGNet已在 https://github.com/snehjp2/AGNet 公开。 摘要:Supermassive black holes (SMBHs) are ubiquitously found at the centers of most massive galaxies. Measuring SMBH mass is important for understanding the origin and evolution of SMBHs. However, traditional methods require spectroscopic data which is expensive to gather. We present an algorithm that weighs SMBHs using quasar light time series, circumventing the need for expensive spectra. We train, validate, and test neural networks that directly learn from the Sloan Digital Sky Survey (SDSS) Stripe 82 light curves for a sample of $38,939$ spectroscopically confirmed quasars to map out the nonlinear encoding between SMBH mass and multi-color optical light curves. We find a 1$\sigma$ scatter of 0.37 dex between the predicted SMBH mass and the fiducial virial mass estimate based on SDSS single-epoch spectra, which is comparable to the systematic uncertainty in the virial mass estimate. Our results have direct implications for more efficient applications with future observations from the Vera C. Rubin Observatory. Our code, AGNet, is publicly available at https://github.com/snehjp2/AGNet.
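The reported 0.37 dex scatter corresponds to the standard deviation of residuals in log10 mass; a one-liner sketch (assuming both arrays are already log10 masses, as is conventional):

```python
import numpy as np

def dex_scatter(logm_pred, logm_true):
    """1-sigma scatter, in dex, between predicted and fiducial masses
    (both inputs assumed to be log10 masses in solar units)."""
    return float(np.std(np.asarray(logm_pred) - np.asarray(logm_true)))
```

A scatter of 0.37 dex means the typical prediction is off by a factor of roughly 10**0.37 ≈ 2.3 in linear mass.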
【12】 InfoGram and Admissible Machine Learning 标题:信息图与容许机器学习 链接:https://arxiv.org/abs/2108.07380
作者:Subhadeep Mukhopadhyay 备注:Keywords: Admissible machine learning; InfoGram; L-Features; Information-theory; ALFA-testing, Algorithmic risk management; Fairness; Interpretability; COREml; FINEml 摘要:我们已经进入了一个机器学习(ML)的新时代,在这个时代,具有卓越预测能力的最精确算法甚至可能无法部署,除非它在监管约束下是可接受的。这引起了人们对开发公平、透明和可信的ML方法的极大兴趣。本文的目的是介绍一种新的信息理论学习框架(可接受的机器学习)和算法风险管理工具(信息图、L特征、阿尔法测试),可以指导分析师重新设计现成的ML方法,使其符合监管要求,同时保持良好的预测准确性。我们使用了来自金融部门、生物医学研究、营销活动和刑事司法系统的几个真实数据示例来说明我们的方法。 摘要:We have entered a new era of machine learning (ML), where the most accurate algorithm with superior predictive power may not even be deployable, unless it is admissible under the regulatory constraints. This has led to great interest in developing fair, transparent and trustworthy ML methods. The purpose of this article is to introduce a new information-theoretic learning framework (admissible machine learning) and algorithmic risk-management tools (InfoGram, L-features, ALFA-testing) that can guide an analyst to redesign off-the-shelf ML methods to be regulatory compliant, while maintaining good prediction accuracy. We have illustrated our approach using several real-data examples from financial sectors, biomedical research, marketing campaigns, and the criminal justice system.
其他(9篇)
【1】 Group-aware Contrastive Regression for Action Quality Assessment 标题:群体意识对比回归在行动质量评估中的应用 链接:https://arxiv.org/abs/2108.07797
作者:Xumin Yu,Yongming Rao,Wenliang Zhao,Jiwen Lu,Jie Zhou 机构:Department of Automation, Tsinghua University, China, State Key Lab of Intelligent Technologies and Systems, China, Beijing National Research Center for Information Science and Technology, China 备注:Accepted to ICCV 2021 摘要:由于视频之间的细微差异和分数的巨大差异,评估动作质量具有挑战性。大多数现有方法通过从单个视频回归质量分数来解决这个问题,因而深受视频间分数差异巨大的影响。在本文中,我们发现视频之间的关系可以为训练和推理过程中更准确的动作质量评估提供重要线索。具体而言,我们将动作质量评估问题重新表述为参考另一个具有共同属性(例如类别和难度)的视频回归相对分数,而不是学习未参考分数。根据这个公式,我们提出了一个新的对比回归(CoRe)框架,通过成对比较来学习相对分数,该框架突出了视频之间的差异,并指导模型学习评估的关键提示。为了进一步利用两个视频之间的相关信息,我们设计了一个群体感知回归树,将传统的分数回归转化为两个更简单的子问题:从粗到细的分类和小间隔回归。为了证明CoRe的有效性,我们在三个主流AQA数据集上进行了广泛的实验,包括AQA-7、MTL-AQA和JIGSAWS。我们的方法大大优于以前的方法,并在所有三个基准上建立了新的最先进水平。 摘要:Assessing action quality is challenging due to the subtle differences between videos and large variations in scores. Most existing approaches tackle this problem by regressing a quality score from a single video, suffering a lot from the large inter-video score variations. In this paper, we show that the relations among videos can provide important clues for more accurate action quality assessment during both training and inference. Specifically, we reformulate the problem of action quality assessment as regressing the relative scores with reference to another video that has shared attributes (e.g., category and difficulty), instead of learning unreferenced scores. Following this formulation, we propose a new Contrastive Regression (CoRe) framework to learn the relative scores by pair-wise comparison, which highlights the differences between videos and guides the models to learn the key hints for assessment. In order to further exploit the relative information between two videos, we devise a group-aware regression tree to convert the conventional score regression into two easier sub-problems: coarse-to-fine classification and regression in small intervals. 
To demonstrate the effectiveness of CoRe, we conduct extensive experiments on three mainstream AQA datasets including AQA-7, MTL-AQA and JIGSAWS. Our approach outperforms previous methods by a large margin and establishes new state-of-the-art on all three benchmarks.
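The coarse-to-fine decomposition of a relative score into an interval class plus a within-interval offset can be sketched as follows (the bin edges here are illustrative; the paper learns the partition with a group-aware regression tree):

```python
def decompose_delta(delta, bins):
    """Split a relative score delta into (interval index, normalized
    within-interval offset): the coarse classification / fine
    regression decomposition used in place of direct regression."""
    for i in range(len(bins) - 1):
        if bins[i] <= delta < bins[i + 1]:
            return i, (delta - bins[i]) / (bins[i + 1] - bins[i])
    raise ValueError("delta outside the binned range")

def compose_delta(i, offset, bins):
    """Invert the decomposition back into a relative score."""
    return bins[i] + offset * (bins[i + 1] - bins[i])
```

The classifier only has to pick the right interval, and the regressor only has to resolve a small range, which is easier than regressing the full score span directly.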
【2】 Harnessing value from data science in business: ensuring explainability and fairness of solutions 标题:在业务中利用数据科学的价值:确保解决方案的可解释性和公平性 链接:https://arxiv.org/abs/2108.07714
作者:Krzysztof Chomiak,Michał Miktus 摘要:本文介绍了人工智能中公平性和可解释性(XAI)的概念,旨在解决复杂的商业问题。为了公平,作者讨论了导致偏见的细节,以及相关的缓解方法,最后提出了一套在数据驱动的组织中引入公平的方法。此外,对于XAI,作者审核了特定的算法和演示性的业务用例,讨论了大量的质量量化技术,并概述了未来的研究途径。 摘要:The paper introduces concepts of fairness and explainability (XAI) in artificial intelligence, oriented to solve a sophisticated business problems. For fairness, the authors discuss the bias-inducing specifics, as well as relevant mitigation methods, concluding with a set of recipes for introducing fairness in data-driven organizations. Additionally, for XAI, the authors audit specific algorithms paired with demonstrational business use-cases, discuss a plethora of techniques of explanations quality quantification and provide an overview of future research avenues.
【3】 Demonstrating REACT: a Real-time Educational AI-powered Classroom Tool 标题:演示反应:一个实时教育人工智能支持的课堂工具 链接:https://arxiv.org/abs/2108.07693
作者:Ajay Kulkarni,Olga Gkountouna 机构:George Mason University 备注:Published in the 14th International Conference on Educational Data Mining (EDM21) 摘要:我们展示了REACT,这是一种新的实时教育AI课堂工具,采用EDM技术支持教育者的决策过程。REACT是一种数据驱动工具,具有用户友好的图形界面。它分析学生的表现数据,并提供基于上下文的警报,以及向教育者提供课程规划建议。此外,它还结合了模型不可知论的解释,以便在决策过程中带来可解释性和可解释性。本文使用一个真实的数据集演示了我们提出的工具的一个用例场景,并给出了它的体系结构和用户界面的设计。本演示侧重于基于学生在课堂活动中的表现(即,错误的回答和使用的提示)的凝聚式层次聚类(agglomerative clustering)。这种优势和劣势相似的学生群的形成可能有助于教育工作者通过识别风险学生、组建学习小组或鼓励不同优势学生之间的辅导来改进课程规划。 摘要:We present a demonstration of REACT, a new Real-time Educational AI-powered Classroom Tool that employs EDM techniques for supporting the decision-making process of educators. REACT is a data-driven tool with a user-friendly graphical interface. It analyzes students' performance data and provides context-based alerts as well as recommendations to educators for course planning. Furthermore, it incorporates model-agnostic explanations for bringing explainability and interpretability in the process of decision making. This paper demonstrates a use case scenario of our proposed tool using a real-world dataset and presents the design of its architecture and user interface. This demonstration focuses on the agglomerative clustering of students based on their performance (i.e., incorrect responses and hints used) during an in-class activity. This formation of clusters of students with similar strengths and weaknesses may help educators to improve their course planning by identifying at-risk students, forming study groups, or encouraging tutoring between students of different strengths.
【4】 Coalesced Multi-Output Tsetlin Machines with Clause Sharing 标题:具有子句共享的联合多输出Tsetlin机 链接:https://arxiv.org/abs/2108.07594
作者:Sondre Glimsdal,Ole-Christoffer Granmo 备注:23 pages, 9 figures 摘要:通过使用有限状态机学习模式,Tsetlin机器(TMs)在多个基准测试中获得了具有竞争力的精度和学习速度,并且节省了内存和能源。TM将模式表示为命题逻辑(AND规则)中的连词从句,每个从句对特定输出投赞成票或反对票。虽然对单输出问题有效,但对于多输出问题,每个输出需要单独的TM。使用多个TM会阻碍模式重用,因为每个TM都在一个思洛存储器中运行。在本文中,我们引入子句共享,将多个TMs合并为单个TMs。每个子句通过使用权重与每个输出相关。正权重使子句投票给输出$1$,而负权重使子句投票给输出$0$。因此,这些子句合并产生多个输出。由此产生的联合Tsetlin机器(CoTM)通过相互作用的线上随机搜索(Stochastic Searching on the Line,SSL)和Tsetlin自动机(TA)团队同时学习每个子句的权重和组成。我们在MNIST、Fashion MNIST和Kuzushiji MNIST上的实证结果表明,CoTM在$50$到$1$K子句的配置上获得了比TM更高的准确性,这表明有能力重新调整子句的用途。例如,当每个类使用$50$个子句(22 Kb内存)时,Fashion MNIST的准确度从$71.99$%提高到$89.66$%。当每个类使用超过$1$K的子句时,TM和CoTM的精度是相似的,而在MNIST上使用$8$K子句时,CoTM达到峰值精度的速度快$3$倍。我们进一步研究了对不平衡训练数据的鲁棒性。我们对IMDb和CIFAR10数据的不平衡版本的评估表明,CoTM对高度的类不平衡具有鲁棒性。由于能够共享子句,我们相信CoTM将支持涉及多个输出的新TM应用领域,例如学习语言模型和自动编码。 摘要:Using finite-state machines to learn patterns, Tsetlin machines (TMs) have obtained competitive accuracy and learning speed across several benchmarks, with frugal memory- and energy footprint. A TM represents patterns as conjunctive clauses in propositional logic (AND-rules), each clause voting for or against a particular output. While efficient for single-output problems, one needs a separate TM per output for multi-output problems. Employing multiple TMs hinders pattern reuse because each TM then operates in a silo. In this paper, we introduce clause sharing, merging multiple TMs into a single one. Each clause is related to each output by using a weight. A positive weight makes the clause vote for output $1$, while a negative weight makes the clause vote for output $0$. The clauses thus coalesce to produce multiple outputs. The resulting coalesced Tsetlin Machine (CoTM) simultaneously learns both the weights and the composition of each clause by employing interacting Stochastic Searching on the Line (SSL) and Tsetlin Automata (TA) teams. 
Our empirical results on MNIST, Fashion-MNIST, and Kuzushiji-MNIST show that CoTM obtains significantly higher accuracy than TM on $50$- to $1$K-clause configurations, indicating an ability to repurpose clauses. E.g., accuracy goes from $71.99$% to $89.66$% on Fashion-MNIST when employing $50$ clauses per class (22 Kb memory). While TM and CoTM accuracy is similar when using more than $1$K clauses per class, CoTM reaches peak accuracy $3\times$ faster on MNIST with $8$K clauses. We further investigate robustness towards imbalanced training data. Our evaluations on imbalanced versions of IMDb- and CIFAR10 data show that CoTM is robust towards high degrees of class imbalance. Being able to share clauses, we believe CoTM will enable new TM application domains that involve multiple outputs, such as learning language models and auto-encoding.
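The weighted clause-voting step that lets clauses be shared across outputs can be sketched as follows (the weights here are hand-picked for illustration; CoTM learns them with interacting SSL and TA teams):

```python
import numpy as np

def cotm_predict(clause_outputs, weights):
    """Multi-output vote with shared clauses: clause_outputs is a 0/1
    vector over clauses, weights is a (clauses x outputs) matrix in
    which positive entries vote for output 1 and negative entries for
    output 0. Each output fires when its weighted clause sum is
    positive."""
    sums = clause_outputs @ weights
    return (sums > 0).astype(int)

clauses = np.array([1, 0, 1])        # which clauses matched the input
W = np.array([[ 2.0, -1.0],
              [ 5.0,  5.0],          # inactive clause: no effect
              [-3.0,  2.0]])
```

Because one weight column exists per output, a single clause pool serves every output at once, which is exactly the pattern reuse that separate per-output TMs forgo.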
【5】 O-HAS: Optical Hardware Accelerator Search for Boosting Both Acceleration Performance and Development Speed 标题:O-HAS:寻求同时提高加速性能和开发速度的光学硬件加速器 链接:https://arxiv.org/abs/2108.07538
作者:Mengquan Li,Zhongzhi Yu,Yongan Zhang,Yonggan Fu,Yingyan Lin 机构:Department of Electrical and Computer Engineering, Rice University, USA 备注:Accepted at ICCAD 2021 摘要:深度神经网络(DNN)的最新突破和令人望而却步的复杂性激发了人们对特定领域DNN加速器的广泛兴趣,其中光学DNN加速器由于其前所未有的每瓦特性能潜力而特别有前途。然而,光学DNN加速器的发展要比电子DNN加速器慢得多。一个关键挑战是,虽然已经开发了许多技术来促进电动DNN加速器的开发,但支持或加速光学DNN加速器设计的技术仍然很少探索,这限制了光学DNN加速器的可实现性能和创新发展。为此,我们开发了第一个称为O-HAS的同类框架,该框架首次演示了自动光学硬件加速器搜索,以提高光学DNN加速器的加速效率和开发速度。具体而言,我们的O-HAS由两个集成的使能器组成:(1)O-Cost预测器,它可以基于DNN模型参数和光学加速器设计准确而有效地预测光学加速器的能量和延迟;(2)O-搜索引擎,可自动探索光学DNN加速器的大设计空间,并识别最佳加速器(即,微体系结构和算法到加速器的映射方法),以最大限度地提高目标加速效率。大量的实验和烧蚀研究一致地验证了我们的O-Cost预测器和O-Search引擎的有效性,以及O-HAS生成的光学加速器的卓越效率。 摘要:The recent breakthroughs and prohibitive complexities of Deep Neural Networks (DNNs) have excited extensive interest in domain-specific DNN accelerators, among which optical DNN accelerators are particularly promising thanks to their unprecedented potential of achieving superior performance-per-watt. However, the development of optical DNN accelerators is much slower than that of electrical DNN accelerators. One key challenge is that while many techniques have been developed to facilitate the development of electrical DNN accelerators, techniques that support or expedite optical DNN accelerator design remain much less explored, limiting both the achievable performance and the innovation development of optical DNN accelerators. To this end, we develop the first-of-its-kind framework dubbed O-HAS, which for the first time demonstrates automated Optical Hardware Accelerator Search for boosting both the acceleration efficiency and development speed of optical DNN accelerators. 
Specifically, our O-HAS consists of two integrated enablers: (1) an O-Cost Predictor, which can accurately yet efficiently predict an optical accelerator's energy and latency based on the DNN model parameters and the optical accelerator design; and (2) an O-Search Engine, which can automatically explore the large design space of optical DNN accelerators and identify the optimal accelerators (i.e., the micro-architectures and algorithm-to-accelerator mapping methods) in order to maximize the target acceleration efficiency. Extensive experiments and ablation studies consistently validate the effectiveness of both our O-Cost Predictor and O-Search Engine as well as the excellent efficiency of O-HAS generated optical accelerators.
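The predictor-plus-search structure described in the abstract can be sketched as a loop in which a cost model scores candidate design points and a search strategy explores the space. Everything below is illustrative: the design-space keys, the analytical cost formulas, and plain random search are assumptions for the sketch, not the O-HAS implementation (which learns its cost predictor and uses a dedicated search engine).

```python
import random

# Hypothetical design space for an optical accelerator; the knob names
# and value ranges are made up for illustration.
DESIGN_SPACE = {
    "num_wavelengths": [4, 8, 16, 32],
    "mesh_size": [16, 32, 64],
    "dataflow": ["weight-stationary", "output-stationary"],
}

def predict_cost(design):
    """Stand-in for an O-Cost-Predictor-style model: maps a design point
    to (energy, latency). A real predictor would be fit to measurements
    of actual optical hardware, not hand-written like this."""
    energy = 1.0 / design["num_wavelengths"] + 0.01 * design["mesh_size"]
    latency = 100.0 / (design["num_wavelengths"] * design["mesh_size"])
    if design["dataflow"] == "output-stationary":
        latency *= 0.9
    return energy, latency

def random_search(n_trials=200, seed=0):
    """Stand-in for the search engine: random search minimizing the
    energy-delay product (EDP) under the predicted costs."""
    rng = random.Random(seed)
    best, best_edp = None, float("inf")
    for _ in range(n_trials):
        design = {k: rng.choice(v) for k, v in DESIGN_SPACE.items()}
        energy, latency = predict_cost(design)
        if energy * latency < best_edp:
            best, best_edp = design, energy * latency
    return best, best_edp
```

The key design point this illustrates is the decoupling: because the search only queries `predict_cost`, the (slow) hardware evaluation and the (fast) design-space exploration can be developed and improved independently.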
【6】 Estimating smooth and sparse neural receptive fields with a flexible spline basis 标题:用柔性样条基估计平滑和稀疏神经感受野 链接:https://arxiv.org/abs/2108.07537
作者:Ziwei Huang,Yanli Ran,Jonathan Oesterle,Thomas Euler,Philipp Berens 机构:Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany; Centre for Integrative Neuroscience, University of Tübingen; Tübingen AI Center, University of Tübingen 摘要:时空感受野(STRF)模型常用于近似由感觉神经元执行的计算。通常,假定此类STRF是平滑和稀疏的。在感觉神经科学中常见的高维情形下,目前基于经验贝叶斯的最新STRF估计方法往往计算效率不高。在这里,我们采用了另一种方法,通过选择一组具有所需性质的基函数(自然三次样条)来编码用于估计STRF的先验知识。我们的方法计算效率高,可以很容易地应用于现有的各种模型。我们在模拟和实验数据上比较了基于样条的方法和非样条方法的性能,结果表明基于样条的方法始终优于非样条方法。 摘要:Spatio-temporal receptive field (STRF) models are frequently used to approximate the computation implemented by a sensory neuron. Typically, such STRFs are assumed to be smooth and sparse. Current state-of-the-art approaches for estimating STRFs based on empirical Bayes are often not computationally efficient in high-dimensional settings, as encountered in sensory neuroscience. Here we pursued an alternative approach and encode prior knowledge for estimation of STRFs by choosing a set of basis functions with the desired properties: natural cubic splines. Our method is computationally efficient and can be easily applied to a wide range of existing models. We compared the performance of spline-based methods to non-spline ones on simulated and experimental data, showing that spline-based methods consistently outperform the non-spline versions.
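The core idea of the abstract, expressing the receptive field in a natural cubic spline basis so that smoothness comes from the basis itself rather than from an empirical-Bayes prior, can be sketched in a few lines. This is a minimal 1-D toy (temporal filter only): the stimulus, knot placement, and plain least squares are assumptions for illustration, not the paper's full method.

```python
import numpy as np

def natural_cubic_basis(x, knots):
    """Natural cubic spline basis via the truncated-power construction
    (cf. Hastie, Tibshirani & Friedman, ESL, Sec. 5.2.1). Returns a
    (len(x), len(knots)) design matrix: [1, x, N_3, ..., N_K]."""
    x = np.asarray(x, float)
    K = len(knots)
    def d(k):
        num = (np.maximum(x - knots[k], 0.0) ** 3
               - np.maximum(x - knots[-1], 0.0) ** 3)
        return num / (knots[-1] - knots[k])
    cols = [np.ones_like(x), x]
    for k in range(K - 2):
        cols.append(d(k) - d(K - 2))
    return np.column_stack(cols)

# Toy "temporal receptive field" problem: random stimulus X (trials x lags),
# response y generated from a smooth ground-truth filter plus noise.
rng = np.random.default_rng(0)
lags = np.linspace(0, 1, 25)
true_rf = np.exp(-((lags - 0.3) ** 2) / 0.02)     # smooth ground truth
X = rng.normal(size=(500, 25))
y = X @ true_rf + 0.1 * rng.normal(size=500)

# Estimate the filter as w = B @ beta: only B.shape[1] free parameters,
# and the estimate is smooth by construction of the basis.
B = natural_cubic_basis(lags, np.linspace(0, 1, 8))
beta, *_ = np.linalg.lstsq(X @ B, y, rcond=None)
rf_hat = B @ beta
```

Because the regression runs in the low-dimensional basis coordinates (8 here instead of 25 lags), the same trick scales to the high-dimensional spatio-temporal settings the abstract targets.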
【7】 Stability and Generalization for Randomized Coordinate Descent 标题:随机坐标下降的稳定性和泛化 链接:https://arxiv.org/abs/2108.07414
作者:Puyu Wang,Liang Wu,Yunwen Lei 机构:School of Mathematics, Northwest University, Xi'an, China; Center of Statistical Research, School of Statistics, Southwestern University of Finance and Economics, Chengdu, China; School of Computer Science, University of Birmingham, Birmingham, UK 备注:12 pages, 1 figure 摘要:随机坐标下降(RCD)是一种流行的优化算法,在解决各种机器学习问题中有着广泛的应用,这促使人们对其收敛行为进行了大量的理论分析。相比之下,目前还没有工作研究RCD训练的模型如何泛化到测试样本。在本文中,我们利用算法稳定性这一有力工具,开启了RCD的泛化分析。我们为凸目标和强凸目标建立了RCD的参数稳定性界,并通过展示如何提前停止算法以权衡估计误差与优化误差,由此得到最优泛化界。我们的分析表明,与随机梯度下降相比,RCD具有更好的稳定性。 摘要:Randomized coordinate descent (RCD) is a popular optimization algorithm with wide applications in solving various machine learning problems, which motivates a lot of theoretical analysis on its convergence behavior. As a comparison, there is no work studying how the models trained by RCD would generalize to test examples. In this paper, we initialize the generalization analysis of RCD by leveraging the powerful tool of algorithmic stability. We establish argument stability bounds of RCD for both convex and strongly convex objectives, from which we develop optimal generalization bounds by showing how to early-stop the algorithm to tradeoff the estimation and optimization. Our analysis shows that RCD enjoys better stability as compared to stochastic gradient descent.
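For concreteness, here is the algorithm whose stability the paper analyzes: randomized coordinate descent on a least-squares objective. The objective, the exact coordinate-minimizing step size, and the fixed iteration budget (playing the role of the early stopping that the bounds trade off) are illustrative choices, not taken from the paper.

```python
import numpy as np

def rcd_least_squares(A, b, n_iters=3000, seed=0):
    """Randomized coordinate descent for f(x) = 0.5 * ||Ax - b||^2.
    Each step picks a coordinate j uniformly at random and takes the
    exact minimizing step along it: step size 1/L_j with coordinate-wise
    Lipschitz constant L_j = ||A[:, j]||^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    L = (A ** 2).sum(axis=0)          # coordinate-wise curvatures L_j
    x = np.zeros(d)
    r = A @ x - b                     # maintained residual Ax - b
    for _ in range(n_iters):
        j = rng.integers(d)
        g = A[:, j] @ r               # partial derivative df/dx_j
        step = g / L[j]
        x[j] -= step
        r -= step * A[:, j]           # update residual in O(n), not O(nd)
    return x

# Example: recover the true coefficients of a consistent linear system.
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 5))
x_true = rng.normal(size=5)
x_hat = rcd_least_squares(A, A @ x_true)
```

Maintaining the residual `r` is what makes each iteration cost O(n) rather than O(nd); stopping at a smaller `n_iters` is exactly the estimation-optimization tradeoff the abstract refers to.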
【8】 IsoScore: Measuring the Uniformity of Vector Space Utilization 标题:IsoScore:度量向量空间利用的均匀性 链接:https://arxiv.org/abs/2108.07344
作者:William Rudman,Nate Gillman,Taylor Rayne,Carsten Eickhoff 机构:Department of Computer Science, Brown University, Department of Mathematics, Brown University, Quest University 摘要:近年来,分布式单词表示的成功引起了人们对分析其空间分布特性的兴趣。当前的指标表明,当在向量空间中嵌入标记时,上下文化的单词嵌入模型不能均匀地利用所有维度。在这里,我们认为现有的指标是脆弱的,往往混淆了点云的真实空间分布。为了解决这一问题,我们提出了IsoScore:一种新的度量标准,用于量化点云均匀利用环境向量空间的程度。我们证明了IsoScore具有一些理想的特性,如均值不变性和与所用维数的直接对应,这些特性是现有分数所不具备的。此外,IsoScore在概念上直观且计算效率高,非常适合分析任意向量空间中的点云分布,而不一定仅限于单词嵌入。此外,我们使用IsoScore来证明NLP文献中使用脆弱的空间分布度量(如平均余弦相似性)得出的一些最新结论可能不完整或完全不准确。 摘要:The recent success of distributed word representations has led to an increased interest in analyzing the properties of their spatial distribution. Current metrics suggest that contextualized word embedding models do not uniformly utilize all dimensions when embedding tokens in vector space. Here we argue that existing metrics are fragile and tend to obfuscate the true spatial distribution of point clouds. To ameliorate this issue, we propose IsoScore: a novel metric which quantifies the degree to which a point cloud uniformly utilizes the ambient vector space. We demonstrate that IsoScore has several desirable properties such as mean invariance and direct correspondence to the number of dimensions used, which are properties that existing scores do not possess. Furthermore, IsoScore is conceptually intuitive and computationally efficient, making it well suited for analyzing the distribution of point clouds in arbitrary vector spaces, not necessarily limited to those of word embeddings alone. Additionally, we use IsoScore to demonstrate that a number of recent conclusions in the NLP literature that have been derived using brittle metrics of spatial distribution, such as average cosine similarity, may be incomplete or altogether inaccurate.
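To make "uniform utilization of the ambient space" concrete, here is a simplified eigenvalue-uniformity score in the spirit of IsoScore: it compares the covariance spectrum of a point cloud to a flat spectrum. This is NOT the paper's exact normalization, only the underlying covariance-spectrum idea, with an ad-hoc rescaling to [0, 1].

```python
import numpy as np

def eigen_uniformity(points):
    """Score ≈ 1.0 when the covariance eigenvalues of the cloud are all
    equal (isotropic: every direction carries equal variance), falling
    toward 0 as variance concentrates in a single direction."""
    cov = np.cov(points, rowvar=False)        # points: (N, d)
    eig = np.linalg.eigvalsh(cov)
    eig = eig / eig.sum()                     # normalized spectrum
    n = len(eig)
    uniform = np.full(n, 1.0 / n)
    # distance to the flat spectrum, rescaled by the worst case
    # (all variance on one axis) so the score lies in roughly [0, 1]
    max_dist = np.linalg.norm(np.eye(n)[0] - uniform)
    return 1.0 - np.linalg.norm(eig - uniform) / max_dist

# Two extreme clouds in 10 dimensions.
rng = np.random.default_rng(0)
isotropic = rng.normal(size=(5000, 10))               # uses all directions
collinear = rng.normal(size=(5000, 1)) * np.ones(10)  # variance on one line
score_iso = eigen_uniformity(isotropic)
score_line = eigen_uniformity(collinear)
```

Note the contrast with average cosine similarity, which the abstract criticizes: that statistic depends on where the cloud is centered, whereas a covariance-spectrum score of this kind is mean-invariant by construction.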
【9】 Semi-parametric Bayesian Additive Regression Trees 标题:半参数贝叶斯加性回归树 链接:https://arxiv.org/abs/2108.07636
作者:Estevão B. Prado,Andrew C. Parnell,Nathan McJames,Ann O'Shea,Rafael A. Moral 机构:Hamilton Institute and Department of Mathematics & Statistics, Maynooth University, Co. Kildare, Ireland; Insight Centre for Data Analytics 摘要:我们提出了一种新的基于贝叶斯加性回归树(BART)的半参数模型。在我们的方法中,响应变量由一个线性预测因子和一个BART模型共同近似:前者负责估计主效应,BART则负责未指定的交互作用和非线性部分。我们方法的新颖之处在于,当参数部分与非参数部分具有共同协变量时,我们改变了BART中的树生成移动,以处理两者之间的混淆。通过合成和真实数据的例子,我们证明了新的半参数BART与回归模型和其他基于树的方法相比具有竞争力。所提方法的实现可在 https://github.com/ebprado/SP-BART 获取。 摘要:We propose a new semi-parametric model based on Bayesian Additive Regression Trees (BART). In our approach, the response variable is approximated by a linear predictor and a BART model, where the first component is responsible for estimating the main effects and BART accounts for the non-specified interactions and non-linearities. The novelty in our approach lies in the way we change tree generation moves in BART to deal with confounding between the parametric and non-parametric components when they have covariates in common. Through synthetic and real-world examples, we demonstrate that the performance of the new semi-parametric BART is competitive when compared to regression models and other tree-based methods. The implementation of the proposed method is available at https://github.com/ebprado/SP-BART.
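The decomposition "response ≈ linear predictor + flexible nonparametric part" can be illustrated with a toy backfitting loop. Here a Nadaraya-Watson kernel smoother stands in for the BART component; BART itself (tree moves, posterior sampling) is not implemented, and the data, bandwidth, and centering convention are assumptions for the sketch.

```python
import numpy as np

def backfit_semiparametric(X_lin, x_np, y, n_iters=20, bandwidth=0.1):
    """Toy backfitting for y ≈ X_lin @ beta + f(x_np): alternately refit
    the parametric part on the residual of the nonparametric part, and
    vice versa. The kernel smoother is a stand-in for BART."""
    f = np.zeros(len(y))
    # row-normalized Gaussian kernel weights for the smoother over x_np
    W = np.exp(-0.5 * ((x_np[:, None] - x_np[None, :]) / bandwidth) ** 2)
    W /= W.sum(axis=1, keepdims=True)
    for _ in range(n_iters):
        # parametric update on the residual of the nonparametric part
        beta, *_ = np.linalg.lstsq(X_lin, y - f, rcond=None)
        # nonparametric update on the residual of the linear part
        f = W @ (y - X_lin @ beta)
        f -= f.mean()          # identifiability: intercept lives in beta
    return beta, f

# Toy data: a linear effect of x1 plus a nonlinearity in an independent x2.
rng = np.random.default_rng(0)
n = 300
x1 = rng.uniform(size=n)
x2 = rng.uniform(size=n)
X_lin = np.column_stack([np.ones(n), x1])
y = 1.0 + 2.0 * x1 + np.sin(3 * x2) + 0.1 * rng.normal(size=n)
beta, f = backfit_semiparametric(X_lin, x2, y)
```

The confounding problem the abstract highlights shows up exactly here: if `x1` also entered the nonparametric component, the two parts could trade the same effect back and forth, which is what the paper's modified tree-generation moves are designed to prevent.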