stat (Statistics): 48 papers in total
【1】 An Analysis of the Deployment of Models Trained on Private Tabular Synthetic Data: Unexpected Surprises
Authors: Mayana Pereira, Meghana Kshirsagar, Sumit Mukherjee, Rahul Dodhia, Juan Lavista Ferres Affiliations: USA; Department of Electrical Engineering, University of Brasilia Link: https://arxiv.org/abs/2106.10241 Abstract: Differentially private (DP) synthetic datasets are a powerful approach for training machine learning models while respecting the privacy of individual data providers. The effect of DP on the fairness of the resulting trained models is not yet well understood. In this contribution, we systematically study the effects of differentially private synthetic data generation on classification. We analyze disparities in model utility and bias caused by the synthetic dataset, measured through algorithmic fairness metrics. Our first set of results shows that although there seems to be a clear negative correlation between privacy and utility (the more private, the less accurate) across all data synthesizers we evaluated, more privacy does not necessarily imply more bias. Additionally, we assess the effects of utilizing synthetic datasets for model training and model evaluation. We show that results obtained on synthetic data can misestimate the actual model performance when it is deployed on real data. We hence advocate for defining proper testing protocols in scenarios where differentially private synthetic datasets are utilized for model training and evaluation.
【2】 Trend estimation and short-term forecasting of COVID-19 cases and deaths worldwide
Authors: Ekaterina Krymova, Benjamín Béjar, Dorina Thanou, Tao Sun, Elisa Manetti, Gavin Lee, Kristen Namigai, Christine Choirat, Antoine Flahault, Guillaume Obozinski Affiliations: Swiss Data Science Center, EPFL & ETH Zürich, Lausanne, Switzerland; Institute of Global Health, University of Geneva, Geneva, Switzerland; Center for Intelligent Systems, EPFL, Lausanne, Switzerland Comments: 15 pages including 5 pages of supplementary material Link: https://arxiv.org/abs/2106.10203 Abstract: Since the beginning of the COVID-19 pandemic, many dashboards have emerged as useful tools to monitor the evolution of the pandemic, inform the public, and assist governments in decision making. Our goal is to develop a globally applicable method, integrated in a twice-daily updated dashboard, that provides an estimate of the trend in the evolution of the number of cases and deaths from reported data of more than 200 countries and territories, as well as a seven-day forecast. One of the significant difficulties in managing a quickly propagating epidemic is that the details of the dynamics needed to forecast its evolution are obscured by delays in the identification of cases and deaths and by irregular reporting. Our forecasting methodology substantially relies on estimating the underlying trend in the observed time series using robust seasonal trend decomposition techniques. This allows us to obtain forecasts with simple yet effective extrapolation methods in linear or log scale. We present the results of an assessment of our forecasting methodology and discuss its application to the production of global and regional risk maps.
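The two-step recipe described in this abstract (estimate a smooth trend, then extrapolate it in log scale) can be illustrated with a minimal sketch. It is not the authors' method: a centered 7-day moving average stands in for their robust seasonal-trend decomposition, and the function names, the 14-day fitting window, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def weekly_trend(cases, window=7):
    """Centered moving average; a simple stand-in for a robust seasonal-trend decomposition."""
    kernel = np.ones(window) / window
    # mode="valid" drops the edges where the full window is unavailable
    return np.convolve(cases, kernel, mode="valid")

def log_linear_forecast(trend, horizon=7, fit_days=14):
    """Fit a straight line to log(trend) over the last fit_days and extrapolate it."""
    y = np.log(trend[-fit_days:])
    t = np.arange(fit_days)
    slope, intercept = np.polyfit(t, y, 1)
    future_t = np.arange(fit_days, fit_days + horizon)
    return np.exp(intercept + slope * future_t)

# Hypothetical toy data: exponential growth with a weekly reporting pattern
days = np.arange(60)
weekday_effect = 1.0 + 0.3 * np.sin(2 * np.pi * days / 7)
cases = 100 * np.exp(0.05 * days) * weekday_effect

trend = weekly_trend(cases)            # weekly pattern largely averaged out
forecast = log_linear_forecast(trend)  # seven-day-ahead extrapolation
```

Averaging over a full week removes most of the reporting periodicity, so the log-scale line fit recovers a growth rate close to the one used to simulate the data.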
【3】 Deterministic Gibbs Sampling via Ordinary Differential Equations
Authors: Kirill Neklyudov, Roberto Bondesan, Max Welling Affiliations: University of Amsterdam; Qualcomm AI Research Link: https://arxiv.org/abs/2106.10188 Abstract: Deterministic dynamics is an essential part of many MCMC algorithms, e.g. Hybrid Monte Carlo or samplers utilizing normalizing flows. This paper presents a general construction of deterministic measure-preserving dynamics using autonomous ODEs and tools from differential geometry. We show how Hybrid Monte Carlo and other deterministic samplers follow as special cases of our theory. We then demonstrate the utility of our approach by constructing a continuous non-sequential version of Gibbs sampling in terms of an ODE flow and extending it to discrete state spaces. We find that our deterministic samplers are more sample-efficient than stochastic counterparts, even if the latter generate independent samples.
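Hybrid Monte Carlo, which the abstract cites as a special case, is the best-known example of deterministic measure-preserving dynamics. The sketch below shows only the standard leapfrog integrator for a standard normal target, assuming unit mass; it does not reproduce the paper's ODE-based Gibbs construction, and all variable names are illustrative.

```python
import numpy as np

def leapfrog(q, p, grad_potential, step_size, n_steps):
    """Leapfrog integration of Hamiltonian dynamics with H(q, p) = U(q) + |p|^2 / 2."""
    q, p = np.array(q, dtype=float), np.array(p, dtype=float)  # work on copies
    p -= 0.5 * step_size * grad_potential(q)   # initial half step for momentum
    for _ in range(n_steps - 1):
        q += step_size * p                     # full step for position
        p -= step_size * grad_potential(q)     # full step for momentum
    q += step_size * p                         # last full step for position
    p -= 0.5 * step_size * grad_potential(q)   # final half step for momentum
    return q, p

# Standard normal target: U(q) = |q|^2 / 2, so grad U(q) = q
grad_u = lambda q: q
q0, p0 = np.array([1.0, -0.5]), np.array([0.3, 0.8])
q1, p1 = leapfrog(q0, p0, grad_u, step_size=0.1, n_steps=20)

# Reversibility: flipping the momentum and integrating again returns to the start
q_back, p_back = leapfrog(q1, -p1, grad_u, step_size=0.1, n_steps=20)
```

The map is volume-preserving and time-reversible, which is why the trajectory retraces itself after a momentum flip while the energy drifts only slightly.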
【4】 Robust nonparametric hypothesis tests for differences in the covariance structure of functional data
Authors: Kelly Ramsay, Shojaeddin Chenouri Comments: 40 pages, 7 figures Link: https://arxiv.org/abs/2106.10173 Abstract: We develop a group of robust, nonparametric hypothesis tests which detect differences between the covariance operators of several populations of functional data. These tests, called FKWC tests, are based on functional data depth ranks. These tests work well even when the data is heavy tailed, which is shown both in simulation and theoretically. These tests offer several other benefits: they have a simple distribution under the null hypothesis, they are computationally cheap, and they possess transformation invariance properties. We show that under general alternative hypotheses these tests are consistent under mild, nonparametric assumptions. As a result of this work, we introduce a new functional depth function called L2-root depth which works well for the purposes of detecting differences in magnitude between covariance kernels. We present an analysis of the FKWC test using L2-root depth under local alternatives. In simulation, when the true covariance kernels have strictly positive eigenvalues, we show that these tests have higher power than their competitors, while still maintaining their nominal size. We also provide methods for computing sample size and performing multiple comparisons.
【5】 Generalized Linear Randomized Response Modeling using GLMMRR
Authors: Jean-Paul Fox, Konrad Klotzke, Duco Veen Link: https://arxiv.org/abs/2106.10171 Abstract: Randomized response (RR) designs are used to collect response data about sensitive behaviors (e.g., criminal behavior, sexual desires). The modeling of RR data is more complex, since it requires a description of the RR process. For the class of generalized linear mixed models (GLMMs), the RR process can be represented by an adjusted link function, which relates the expected RR to the linear predictor, for most common RR designs. The package GLMMRR includes modified link functions for four different cumulative distributions (i.e., logistic, cumulative normal, Gumbel, Cauchy) for GLMs and GLMMs, where the package lme4 facilitates ML and REML estimation. The mixed modeling framework in GLMMRR can be used to jointly analyse data collected under different designs (e.g., dual questioning, multilevel, mixed mode, repeated measurements designs, multiple-group designs). The well-known features of the GLM and GLMM (package lme4) software are retained, while new model-fit tests, residual analyses, and plot functions are added to support an in-depth RR data analysis. Data from Höglinger and Jann (2018) and Höglinger, Jann, and Diekmann (2014) are used to illustrate the methodology and software.
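The idea of an adjusted link function can be made concrete with a small sketch. For many RR designs the probability of an observed "yes" is an affine function of the true prevalence, c + d * F(eta), where F is the chosen cumulative distribution and eta the linear predictor. The code below is written in Python for illustration and is not the GLMMRR API; the function names and the constants `c` and `d` are assumptions, shown here for a forced-response design.

```python
import math

def logistic(eta):
    """Logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-eta))

def rr_observed_prob(eta, c, d, cdf=logistic):
    """P(observed 'yes') = c + d * F(eta), where F(eta) is the true prevalence."""
    return c + d * cdf(eta)

def forced_response_constants(p_truth, p_forced_yes):
    """Answer truthfully with prob. p_truth; otherwise give a forced 'yes' with prob. p_forced_yes."""
    c = (1.0 - p_truth) * p_forced_yes
    d = p_truth
    return c, d

# Hypothetical design: 75% truthful answers, forced answers split evenly
c, d = forced_response_constants(p_truth=0.75, p_forced_yes=0.5)
prob_yes = rr_observed_prob(eta=0.0, c=c, d=d)
```

Setting c = 0 and d = 1 recovers direct questioning, in which case the adjusted link reduces to the ordinary logistic link.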
【6】 Problem Dependent View on Structured Thresholding Bandit Problems
Authors: James Cheshire, Pierre Ménard, Alexandra Carpentier Affiliations: Otto von Guericke University Magdeburg Comments: 25 pages. arXiv admin note: text overlap with arXiv:2006.10006 Link: https://arxiv.org/abs/2106.10166 Abstract: We investigate the problem dependent regime in the stochastic Thresholding Bandit problem (TBP) under several shape constraints. In the TBP, the objective of the learner is to output, at the end of a sequential game, the set of arms whose means are above a given threshold. The vanilla, unstructured case is already well studied in the literature. Taking $K$ as the number of arms, we consider the case where (i) the sequence of arm means $(\mu_k)_{k=1}^K$ is monotonically increasing (MTBP) and (ii) the case where $(\mu_k)_{k=1}^K$ is concave (CTBP). We consider both cases in the problem dependent regime and study the probability of error, i.e., the probability of misclassifying at least one arm. In the fixed budget setting, we provide upper and lower bounds for the probability of error in both the concave and monotone settings, as well as associated algorithms. In both settings the bounds match in the problem dependent regime up to universal constants in the exponential.
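To make the fixed-budget thresholding task concrete, the sketch below implements a naive uniform-allocation baseline: every arm is pulled equally often and classified by its empirical mean. This is not one of the algorithms from the paper and ignores the monotone or concave structure entirely; the Gaussian noise model, the function name, and the toy instance are assumptions.

```python
import numpy as np

def uniform_thresholding(means, threshold, budget, rng, noise_sd=1.0):
    """Pull each arm equally often; return arms whose empirical mean exceeds the threshold."""
    n_arms = len(means)
    pulls_per_arm = budget // n_arms
    # Simulated Gaussian rewards, one row of pulls per arm
    loc = np.asarray(means, dtype=float)[:, None]
    rewards = rng.normal(loc=loc, scale=noise_sd, size=(n_arms, pulls_per_arm))
    empirical_means = rewards.mean(axis=1)
    return set(np.flatnonzero(empirical_means > threshold).tolist())

rng = np.random.default_rng(0)
means = [0.0, 0.2, 0.8, 1.0]   # monotonically increasing arm means (an MTBP-style instance)
selected = uniform_thresholding(means, threshold=0.5, budget=4000, rng=rng)
```

An error occurs when at least one arm lands on the wrong side of the threshold; structured algorithms aim to drive that probability down faster than this baseline by spending the budget on arms near the threshold.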
【7】 LNIRT: An R Package for Joint Modeling of Response Accuracy and Times
Authors: Jean-Paul Fox, Konrad Klotzke, Ahmet Salih Simsek Comments: 3 figures Link: https://arxiv.org/abs/2106.10144 Abstract: In computer-based testing it has become standard to collect response accuracy (RA) and response times (RTs) for each test item. IRT models are used to measure a latent variable (e.g., ability, intelligence) using the RA observations. The information in the RTs can help to improve routine operations in (educational) testing, and provide information about speed of working. In modern applications, joint models are needed to integrate RT information in a test analysis. The R package LNIRT supports fitting joint models through a user-friendly setup which only requires specifying RA, RT data, and the total number of Gibbs sampling iterations. More detailed specifications of the analysis are optional. The main results can be reported through the summary functions, but output can also be analysed with Markov chain Monte Carlo (MCMC) output tools (i.e., coda, mcmcse). The main functionality of the LNIRT package is illustrated with two real data applications.
【8】 CLT for LSS of sample covariance matrices with unbounded dispersions
Authors: Liu Zhijun, Bai Zhidong, Hu Jiang, Song Haiyan Affiliations: School of Mathematics and Statistics, Northeast Normal University Link: https://arxiv.org/abs/2106.10135 Abstract: Under the high-dimensional setting in which the data dimension and sample size tend to infinity proportionally, we derive the central limit theorem (CLT) for linear spectral statistics (LSS) of large-dimensional sample covariance matrices. Different from existing literature, our results do not require the assumption that the population covariance matrices are bounded. Moreover, many common kernel functions in the real data such as logarithmic functions and polynomial functions are allowed in this paper. In our model, the number of spiked eigenvalues can be fixed or tend to infinity. One salient feature of the asymptotic mean and covariance in our proposed central limit theorem is that they are related to the divergence order of the population spectral norm.
【9】 Assessing an Alternative for 'Negative Variance Components': A Gentle Introduction to Bayesian Covariance Structure Modelling for Negative Associations Among Patients with Personalized Treatments
Authors: Jean-Paul Fox, Wouter Smink Affiliations: Department of Research Methodology, Measurement & Data Analysis, University of Twente, Enschede, The Netherlands Comments: 4 figures, 3 tables Link: https://arxiv.org/abs/2106.10107 Abstract: The multilevel model (MLM) is the popular approach to describe dependences of hierarchically clustered observations. A main feature is the capability to estimate (cluster-specific) random effect parameters, while their distribution describes the variation across clusters. However, the MLM can only model positive associations among clustered observations, and it is not suitable for small sample sizes. The limitation of the MLM becomes apparent when estimation methods produce negative estimates for random effect variances, which can be seen as an indication that observations are negatively correlated. A gentle introduction to Bayesian Covariance Structure Modelling (BCSM) is given, which makes it possible to also model negatively correlated observations. The BCSM does not model dependences through random (cluster-specific) effects, but through a covariance matrix. We show that this makes the BCSM particularly useful for small data samples. We draw specific attention to detecting the effects of a personalized intervention. The effect of a personalized treatment can differ across individuals, and this can lead to negative associations among measurements of individuals who are treated by the same therapist. 
It is shown that the BCSM enables the modeling of negative associations among clustered measurements and aids in the interpretation of negative clustering effects. Through a simulation study and by analysis of a real data example, we discuss the suitability of the BCSM for small data sets and for exploring effects of individualized treatments, specifically when (standard) MLM software produces negative or zero variance estimates.
【10】 Fitting summary statistics of neural data with a differentiable spiking network simulator
Authors: Guillaume Bellec, Shuqi Wang, Alireza Modirshanechi, Johanni Brea, Wulfram Gerstner Affiliations: Laboratory of Computational Neuroscience, École polytechnique fédérale de Lausanne (EPFL) Link: https://arxiv.org/abs/2106.10064 Abstract: Fitting network models to neural activity is becoming an important tool in neuroscience. A popular approach is to model a brain area with a probabilistic recurrent spiking network whose parameters maximize the likelihood of the recorded activity. Although this is widely used, we show that the resulting model does not produce realistic neural activity and wrongly estimates the connectivity matrix when neurons that are not recorded have a substantial impact on the recorded network. To correct for this, we suggest augmenting the log-likelihood with terms that measure the dissimilarity between simulated and recorded activity. This dissimilarity is defined via summary statistics commonly used in neuroscience, and the optimization is efficient because it relies on back-propagation through the stochastically simulated spike trains. We analyze this method theoretically and show empirically that it generates more realistic activity statistics and recovers the connectivity matrix better than other methods.
【11】 Bayesian Cox Regression for Population-scale Inference in Electronic Health Records
Authors: Alexander W. Jung, Moritz Gerstung Affiliations: European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, UK; University of Cambridge, Trumpington St, Cambridge, UK; Genome Biology Unit, EMBL, Heidelberg, Germany Comments: 35 pages, 5 figures, 4 tables Link: https://arxiv.org/abs/2106.10057 Abstract: The Cox model is an indispensable tool for time-to-event analysis, particularly in biomedical research. However, medicine is undergoing a profound transformation, generating data at an unprecedented scale, which opens new frontiers to study and understand diseases. With the wealth of data collected, new challenges for statistical inference arise, as datasets are often high dimensional, exhibit an increasing number of measurements at irregularly spaced time points, and are simply too large to fit in memory. Many current implementations for time-to-event analysis are ill-suited for these problems as inference is computationally demanding and requires access to the full data at once. Here we propose a Bayesian version of the counting process representation of Cox's partial likelihood for efficient inference on large-scale datasets with millions of data points and thousands of time-dependent covariates. Through the combination of stochastic variational inference and a reweighting of the log-likelihood, we obtain an approximation for the posterior distribution that factorizes over subsamples of the data, enabling the analysis in big data settings. 
Crucially, the method produces viable uncertainty estimates for large-scale and high-dimensional datasets. We show the utility of our method through a simulation study and an application to myocardial infarction in the UK Biobank.
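For readers unfamiliar with the objective underlying this abstract, the sketch below evaluates the classical negative log partial likelihood of a Cox model on the full data, assuming time-fixed covariates and no tied event times. It does not implement the paper's counting-process formulation, stochastic variational inference, or subsample reweighting; the function name and toy data are assumptions.

```python
import numpy as np

def cox_neg_log_partial_likelihood(beta, X, time, event):
    """Negative log partial likelihood of the Cox model (no tied event times assumed)."""
    order = np.argsort(-time)            # sort subjects by decreasing follow-up time
    X, event = X[order], event[order]
    eta = X @ beta                       # linear predictor per subject
    # With decreasing times, the running log-sum-exp covers each subject's risk set
    log_risk = np.logaddexp.accumulate(eta)
    return -np.sum((eta - log_risk)[event == 1])

# Hypothetical toy data: one covariate, four subjects, one censored observation
X = np.array([[0.5], [-1.0], [0.3], [1.2]])
time = np.array([2.0, 5.0, 3.0, 1.0])
event = np.array([1, 1, 0, 1])
nll_at_zero = cox_neg_log_partial_likelihood(np.zeros(1), X, time, event)
```

At beta = 0 every subject in a risk set is equally likely to fail, so each event contributes the log of its risk-set size. Because the risk-set sums couple all subjects, the exact objective needs the full data at once, which is the bottleneck the abstract's subsampling approach is meant to avoid.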
【12】 On Contrastive Representations of Stochastic Processes
Authors: Emile Mathieu, Adam Foster, Yee Whye Teh Affiliations: Department of Statistics, University of Oxford, United Kingdom; DeepMind, United Kingdom Link: https://arxiv.org/abs/2106.10052 Abstract: Learning representations of stochastic processes is an emerging problem in machine learning with applications from meta-learning to physical object models to time series. Typical methods rely on exact reconstruction of observations, but this approach breaks down as observations become high-dimensional or noise distributions become complex. To address this, we propose a unifying framework for learning contrastive representations of stochastic processes (CRESP) that does away with exact reconstruction. We dissect potential use cases for stochastic process representations, and propose methods that accommodate each. Empirically, we show that our methods are effective for learning representations of periodic functions, 3D objects and dynamical processes. Our methods tolerate noisy high-dimensional observations better than traditional approaches, and the learned representations transfer to a range of downstream tasks.
【13】 Sparse Linear Spectral Unmixing of Hyperspectral images using Expectation-Propagation
Authors: Zeng Li, Yoann Altmann, Jie Chen, Stephen Mclaughlin, Susanto Rahardja Link: https://arxiv.org/abs/2106.09985 Abstract: This paper presents a novel Bayesian approach for hyperspectral image unmixing. The observed pixels are modeled by a linear combination of material signatures weighted by their corresponding abundances. A spike-and-slab abundance prior is adopted to promote sparse mixtures and an Ising prior model is used to capture spatial correlation of the mixture support across pixels. We approximate the posterior distribution of the abundances using the expectation-propagation (EP) method. We show that it can significantly reduce the computational complexity of the unmixing stage while still providing uncertainty measures, compared to the expensive Monte Carlo strategies traditionally considered for uncertainty quantification. Moreover, many variational parameters within each EP factor can be updated in a parallel manner, which enables mapping of efficient algorithmic architectures based on graphics processing units (GPU). Under the same approximate Bayesian framework, we then extend the proposed algorithm to semi-supervised unmixing, whereby the abundances are viewed as latent variables and the expectation-maximization (EM) algorithm is used to refine the endmember matrix. Experimental results on synthetic data and real hyperspectral data illustrate the benefits of the proposed framework over state-of-the-art linear unmixing methods.
【14】 Local asymptotics of cross-validation in least-squares density estimation
Authors: Guillaume Maillard Link: https://arxiv.org/abs/2106.09962 Abstract: In model selection, several types of cross-validation are commonly used and many variants have been introduced. While consistency of some of these methods has been proven, their rate of convergence to the oracle is generally still unknown. Until now, an asymptotic analysis of cross-validation able to answer this question has been lacking. Existing results focus on the "pointwise" estimation of the risk of a single estimator, whereas analysing model selection requires understanding how the CV risk varies with the model. In this article, we investigate the asymptotics of the CV risk in the neighbourhood of the optimal model, for trigonometric series estimators in density estimation. Asymptotically, simple validation and "incomplete" V-fold CV behave like the sum of a convex function $f_n$ and a symmetrized Brownian motion changed in time $W_{g_n/V}$. We argue that this is the right asymptotic framework for studying model selection.
【15】 SAGE: Stealthy Attack GEneration for Cyber-Physical Systems
Authors: Michael Biehler, Zhen Zhong, Jianjun Shi Link: https://arxiv.org/abs/2106.09905 Abstract: Cyber-physical systems (CPS) have been increasingly attacked by hackers. Recent studies have shown that CPS are especially vulnerable to insider attacks, in which case the attacker has full knowledge of the system's configuration. To better prevent such types of attacks, we need to understand how insider attacks are generated. Typically, there are three critical aspects for a successful insider attack: (i) maximize damage, (ii) avoid detection and (iii) minimize the attack cost. In this paper we propose a Stealthy Attack GEneration (SAGE) framework by formulating a novel optimization problem considering these three objectives and the physical constraints of the CPS. By adding small worst-case perturbations to the system, the SAGE attack can generate significant damage, while remaining undetected by the system's monitoring algorithms. The proposed methodology is evaluated on several anomaly detection algorithms. The results show that SAGE attacks can cause severe damage while staying undetected and keeping the cost of an attack low. Our method can be accessed in the supplementary material of this paper to aid researchers and practitioners in the design and development of resilient CPS and detection algorithms.
【16】 Distributionally Weighted Least Squares in Structural Equation Modeling
Authors: Han Du, Peter M. Bentler Link: https://arxiv.org/abs/2106.09845 Abstract: In real data analysis with structural equation modeling, data are unlikely to be exactly normally distributed. If we ignore the non-normality, the parameter estimates, standard error estimates, and model fit statistics from normal theory based methods such as maximum likelihood (ML) and normal theory based generalized least squares estimation (GLS) are unreliable. On the other hand, the asymptotically distribution free (ADF) estimator does not rely on any distribution assumption but cannot demonstrate its efficiency advantage with small and modest sample sizes. Methods that adopt misspecified loss functions, including ridge GLS (RGLS), can provide better estimates and inferences than the normal theory based methods and the ADF estimator in some cases. We propose a distributionally-weighted least squares (DLS) estimator, and expect that it can perform better than the existing generalized least squares estimators, because it combines normal theory based and ADF based generalized least squares estimation. Computer simulation results suggest that model-implied covariance based DLS (DLS_M) provided relatively accurate and efficient estimates in terms of RMSE. In addition, the empirical standard errors, the relative biases of standard error estimates, and the Type I error rates of the Jiang-Yuan rank adjusted model fit test statistic (T_JY) in DLS_M were competitive with the classical methods including ML, GLS, and RGLS. 
The performance of DLS_M depends on its tuning parameter a. We illustrate how to implement DLS_M and select the optimal a by a bootstrap procedure in a real data example.
【17】 Entrywise limit theorems of eigenvectors and their one-step refinement for sparse random graphs
Authors: Fangzheng Xie Link: https://arxiv.org/abs/2106.09840 Abstract: We establish finite-sample Berry-Esseen theorems for the entrywise limits of the eigenvectors and their one-step refinement for sparse random graphs. For the entrywise limits of the eigenvectors, the average expected degree is allowed to grow at the rate $\Omega(\log n)$, where $n$ is the number of vertices, and for the entrywise limits of the one-step refinement of the eigenvectors, we require the expected degree to grow at the rate $\omega(\log n)$. The one-step refinement is shown to have a smaller entrywise covariance than the eigenvectors in spectra. The key technical contribution towards the development of these limit theorems is a sharp finite-sample entrywise eigenvector perturbation bound. In particular, the existing error bounds on the two-to-infinity norms of the higher-order remainders are not sufficient when the graph average expected degree is proportional to $\log n$. Our proof relies on a decoupling strategy using a "leave-one-out" construction of auxiliary matrices.
【18】 Wide stochastic networks: Gaussian limit and PAC-Bayesian training
Authors: Eugenio Clerico, George Deligiannidis, Arnaud Doucet Affiliations: Department of Statistics, University of Oxford, UK Comments: 20 pages, 2 figures Link: https://arxiv.org/abs/2106.09798 Abstract: The limit of infinite width allows for substantial simplifications in the analytical study of overparameterized neural networks. With a suitable random initialization, an extremely large network is well approximated by a Gaussian process, both before and during training. In the present work, we establish a similar result for a simple stochastic architecture whose parameters are random variables. The explicit evaluation of the output distribution allows for a PAC-Bayesian training procedure that directly optimizes the generalization bound. For a large but finite-width network, we show empirically on MNIST that this training approach can outperform standard PAC-Bayesian methods.
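For orientation, one standard PAC-Bayesian generalization bound of the kind such training procedures optimize is shown below. It is the classical Langford-Seeger-Maurer form and is not necessarily the bound used in the paper; the notation ($P$ for a data-independent prior, $Q$ for the posterior, $\hat{L}_n$ and $L$ for empirical and population risk with loss in $[0,1]$) is an assumption made here. With probability at least $1-\delta$ over an i.i.d. sample of size $n \ge 8$, simultaneously for all $Q$:

```latex
\mathrm{kl}\!\left(\hat{L}_n(Q)\,\middle\|\,L(Q)\right)
  \;\le\; \frac{\mathrm{KL}(Q\,\|\,P) + \ln\frac{2\sqrt{n}}{\delta}}{n},
\qquad
\mathrm{kl}(a\,\|\,b) = a\ln\frac{a}{b} + (1-a)\ln\frac{1-a}{1-b}.
```

Training "on the bound" means choosing $Q$ to make the resulting upper bound on $L(Q)$ as small as possible, which trades empirical risk against the divergence from the prior.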
【19】 Generalized regression operator estimation for continuous time functional data processes with missing at random response
Authors: Mohamed Chaouch, Naâmane Laïb Affiliations: Department of Mathematics, Statistics, and Physics, Qatar University, Qatar; CY Cergy Paris Université, Laboratoire AGM, CNRS, Cergy, France Link: https://arxiv.org/abs/2106.09769 Abstract: In this paper, we are interested in nonparametric kernel estimation of a generalized regression function, including conditional cumulative distribution and conditional quantile functions, based on an incomplete sample $(X_t, Y_t, \zeta_t)_{t\in \mathbb{R}^+}$ of copies of a continuous-time stationary ergodic process $(X, Y, \zeta)$. The predictor $X$ is valued in some infinite-dimensional space, whereas the real-valued process $Y$ is observed when $\zeta = 1$ and missing whenever $\zeta = 0$. Pointwise and uniform consistency (with rates) of these estimators as well as a central limit theorem are established. Conditional bias and asymptotic quadratic error are also provided. Asymptotic and bootstrap-based confidence intervals for the generalized regression function are also discussed. A first simulation study is performed to compare the discrete-time and continuous-time estimations. A second simulation is also conducted to discuss the selection of the optimal sampling mesh in the continuous-time case. Finally, it is worth noting that our results are stated under an ergodicity assumption without assuming any classical mixing conditions.
【20】 Causal Bias Quantification for Continuous Treatment
Authors: Gianluca Detommaso, Michael Brückner, Philip Schulz, Victor Chernozhukov Affiliations: Massachusetts Institute of Technology & Amazon Link: https://arxiv.org/abs/2106.09762 Abstract: In this work we develop a novel characterization of marginal causal effect and causal bias in the continuous treatment setting. We show they can be expressed as an expectation with respect to a conditional probability distribution, which can be estimated via standard statistical and probabilistic methods. All terms in the expectations can be computed via automatic differentiation, even for highly non-linear models. We further develop a new complete criterion for identifiability of causal effects via covariate adjustment, showing the bias equals zero if the criterion is met. We study the effectiveness of our framework in three different scenarios: linear models under confounding, overcontrol and endogenous selection bias; a non-linear model where full identifiability cannot be achieved because of missing data; a simulated medical study of statins and atherosclerotic cardiovascular disease.
【21】 Riemannian Convex Potential Maps
Authors: Samuel Cohen, Brandon Amos, Yaron Lipman Affiliations: University College London; Facebook AI Research; Weizmann Institute of Science Comments: ICML 2021 Link: https://arxiv.org/abs/2106.10272 Abstract: Modeling distributions on Riemannian manifolds is a crucial component in understanding non-Euclidean data that arises, e.g., in physics and geology. The budding approaches in this space are limited by representational and computational tradeoffs. We propose and study a class of flows that uses convex potentials from Riemannian optimal transport. These are universal and can model distributions on any compact Riemannian manifold without requiring domain knowledge of the manifold to be integrated into the architecture. We demonstrate that these flows can model standard distributions on spheres and tori, on synthetic and geological data. Our source code is freely available online at http://github.com/facebookresearch/rcpm
【22】 MADE: Exploration via Maximizing Deviation from Explored Regions
Authors: Tianjun Zhang, Paria Rashidinejad, Jiantao Jiao, Yuandong Tian, Joseph Gonzalez, Stuart Russell Affiliations: Department of Electrical Engineering and Computer Sciences, UC Berkeley; Department of Statistics, UC Berkeley; Facebook AI Research Comments: 28 pages, 10 figures Link: https://arxiv.org/abs/2106.10268 Abstract: In online reinforcement learning (RL), efficient exploration remains particularly challenging in high-dimensional environments with sparse rewards. In low-dimensional environments, where tabular parameterization is possible, count-based upper confidence bound (UCB) exploration methods achieve minimax near-optimal rates. However, it remains unclear how to efficiently implement UCB in realistic RL tasks that involve non-linear function approximation. To address this, we propose a new exploration approach via maximizing the deviation of the occupancy of the next policy from the explored regions. We add this term as an adaptive regularizer to the standard RL objective to balance exploration vs. exploitation. We pair the new objective with a provably convergent algorithm, giving rise to a new intrinsic reward that adjusts existing bonuses. The proposed intrinsic reward is easy to implement and combine with other existing RL algorithms to conduct exploration. As a proof of concept, we evaluate the new intrinsic reward on tabular examples across a variety of model-based and model-free algorithms, showing improvements over count-only exploration strategies.
When tested on navigation and locomotion tasks from MiniGrid and DeepMind Control Suite benchmarks, our approach significantly improves sample efficiency over state-of-the-art methods. Our code is available at https://github.com/tianjunz/MADE.
【23】 A Probabilistic Representation of DNNs: Bridging Mutual Information and Generalization
Authors: Xinjie Lan, Kenneth Barner Affiliations: Department of Electrical and Computer Engineering, University of Delaware Comments: To appear in the ICML 2021 Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI Link: https://arxiv.org/abs/2106.10262 Abstract: Recently, Mutual Information (MI) has attracted attention in bounding the generalization error of Deep Neural Networks (DNNs). However, it is intractable to accurately estimate the MI in DNNs, thus most previous works have to relax the MI bound, which in turn weakens the information theoretic explanation for generalization. To address this limitation, this paper introduces a probabilistic representation of DNNs for accurately estimating the MI. Leveraging the proposed MI estimator, we validate the information theoretic explanation for generalization, and derive a tighter generalization bound than the state-of-the-art relaxations.
【24】 Active Offline Policy Selection
Authors: Ksenia Konyushkova, Yutian Chen, Thomas Paine, Caglar Gulcehre, Cosmin Paduraru, Daniel J Mankowitz, Misha Denil, Nando de Freitas Affiliations: DeepMind Link: https://arxiv.org/abs/2106.10251 Abstract: This paper addresses the problem of policy selection in domains with abundant logged data, but with a very restricted interaction budget. Solving this problem would enable safe evaluation and deployment of offline reinforcement learning policies in industry, robotics, and healthcare, among other domains. Several off-policy evaluation (OPE) techniques have been proposed to assess the value of policies using only logged data. However, there is still a big gap between the evaluation by OPE and the full online evaluation in the real environment. To reduce this gap, we introduce a novel active offline policy selection problem formulation, which combines logged data and limited online interactions to identify the best policy. We rely on the advances in OPE to warm start the evaluation. We build upon Bayesian optimization to iteratively decide which policies to evaluate in order to utilize the limited environment interactions wisely. Because many candidate policies could be proposed, we focus on making our approach scalable and introduce a kernel function to model similarity between policies. We use several benchmark environments to show that the proposed approach improves upon state-of-the-art OPE estimates and fully online policy evaluation with limited budget.
Additionally, we show that each component of the proposed method is important, and that it works well with OPE estimates of varying number and quality and even with a large number of candidate policies.
【25】 Nonparametric Hamiltonian Monte Carlo
Authors: Carol Mak, Fabian Zaiser, Luke Ong Affiliations: Department of Computer Science, University of Oxford Comments: 33 pages, 13 figures. To appear in Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021 Link: https://arxiv.org/abs/2106.10238 Abstract: Probabilistic programming uses programs to express generative models whose posterior probability is then computed by built-in inference engines. A challenging goal is to develop general purpose inference algorithms that work out-of-the-box for arbitrary programs in a universal probabilistic programming language (PPL). The densities defined by such programs, which may use stochastic branching and recursion, are (in general) nonparametric, in the sense that they correspond to models on an infinite-dimensional parameter space. However, standard inference algorithms, such as the Hamiltonian Monte Carlo (HMC) algorithm, target distributions with a fixed number of parameters. This paper introduces the Nonparametric Hamiltonian Monte Carlo (NP-HMC) algorithm which generalises HMC to nonparametric models. Inputs to NP-HMC are a new class of measurable functions called "tree representable", which serve as a language-independent representation of the density functions of probabilistic programs in a universal PPL. We provide a correctness proof of NP-HMC, and empirically demonstrate significant performance improvements over existing approaches on several nonparametric examples.
【26】 Efficient Black-Box Importance Sampling for VaR and CVaR Estimation
Authors: Anand Deo, Karthyek Murthy Affiliations: Singapore University of Technology and Design, Somapah Rd, Singapore Link: https://arxiv.org/abs/2106.10236 Abstract: This paper considers Importance Sampling (IS) for the estimation of tail risks of a loss defined in terms of a sophisticated object such as a machine learning feature map or a mixed integer linear optimisation formulation. Assuming only black-box access to the loss and the distribution of the underlying random vector, the paper presents an efficient IS algorithm for estimating the Value at Risk and Conditional Value at Risk. The key challenge in any IS procedure, namely, identifying an appropriate change-of-measure, is automated with a self-structuring IS transformation that learns and replicates the concentration properties of the conditional excess from less rare samples. The resulting estimators enjoy asymptotically optimal variance reduction when viewed in the logarithmic scale. Simulation experiments highlight the efficacy and practicality of the proposed scheme.
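As an illustration of the quantities named in entry 【26】, the following minimal sketch computes importance-weighted estimates of Value at Risk and Conditional Value at Risk from samples drawn under a proposal distribution. It is not the self-structuring transformation of the paper: the function name `weighted_var_cvar` is hypothetical, and the likelihood-ratio weights are assumed to be supplied by the caller (for example, from a simple mean shift of a Gaussian).

```python
import numpy as np

def weighted_var_cvar(losses, weights, alpha):
    """Importance-weighted VaR and CVaR at level alpha (generic sketch).

    losses  : loss values evaluated on samples drawn from a proposal distribution
    weights : likelihood ratios (original density / proposal density) per sample
    """
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n = losses.size

    # Sort losses from largest to smallest and accumulate the weighted tail mass.
    order = np.argsort(losses)[::-1]
    sorted_losses = losses[order]
    tail = np.cumsum(weights[order]) / n  # tail[k] estimates P(L >= sorted_losses[k])

    # VaR: first (largest) loss whose estimated tail probability exceeds 1 - alpha.
    k = min(int(np.searchsorted(tail, 1.0 - alpha, side="right")), n - 1)
    var = sorted_losses[k]

    # CVaR via the Rockafellar-Uryasev form: VaR + E[(L - VaR)^+] / (1 - alpha).
    excess = np.maximum(losses - var, 0.0)
    cvar = var + np.sum(weights * excess) / (n * (1.0 - alpha))
    return var, cvar
```

For a standard normal loss sampled from a proposal N(mu, 1), the weights are `exp(-mu * x + mu**2 / 2)`; shifting the proposal toward the tail concentrates samples where the estimate needs them.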
【27】 Residual Error: a New Performance Measure for Adversarial Robustness
Authors: Hossein Aboutalebi, Mohammad Javad Shafiee, Michelle Karg, Christian Scharfenberger, Alexander Wong Affiliations: Waterloo AI Institute, University of Waterloo, Waterloo, Ontario, Canada; ADC Automotive Distance Control Systems GmbH, Continental, Germany; DarwinAI Corp., Canada Link: https://arxiv.org/abs/2106.10212 Abstract: Despite the significant advances in deep learning over the past decade, a major challenge that limits the widespread adoption of deep learning has been its fragility to adversarial attacks. This sensitivity to making erroneous predictions in the presence of adversarially perturbed data makes deep neural networks difficult to adopt for certain real-world, mission-critical applications. While much of the research focus has revolved around adversarial example creation and adversarial hardening, the area of performance measures for assessing adversarial robustness is not well explored. Motivated by this, this study presents the concept of residual error, a new performance measure that not only assesses the adversarial robustness of a deep neural network at the individual sample level, but can also be used to differentiate between adversarial and non-adversarial examples to facilitate adversarial example detection. Furthermore, we introduce a hybrid model for approximating the residual error in a tractable manner. Experimental results using the case of image classification demonstrate the effectiveness and efficacy of the proposed residual error metric for assessing several well-known deep neural network architectures.
These results thus illustrate that the proposed measure could be a useful tool for not only assessing the robustness of deep neural networks used in mission-critical scenarios, but also in the design of adversarially robust models.
【28】 Combining Pseudo-Point and State Space Approximations for Sum-Separable Gaussian Processes
Authors: Will Tebbutt, Arno Solin, Richard E. Turner Affiliations: University of Cambridge, UK; Aalto University, Finland Link: https://arxiv.org/abs/2106.10210 Abstract: Gaussian processes (GPs) are important probabilistic tools for inference and learning in spatio-temporal modelling problems such as those in climate science and epidemiology. However, existing GP approximations do not simultaneously support large numbers of off-the-grid spatial data-points and long time series, a combination that is a hallmark of many applications. Pseudo-point approximations, one of the gold-standard methods for scaling GPs to large data sets, are well suited for handling off-the-grid spatial data. However, they cannot handle long temporal observation horizons effectively, reverting to cubic computational scaling in the time dimension. State space GP approximations are well suited to handling temporal data, if the temporal GP prior admits a Markov form, leading to linear complexity in the number of temporal observations, but have a cubic spatial cost and cannot handle off-the-grid spatial data. In this work we show that there is a simple and elegant way to combine pseudo-point methods with the state space GP approximation framework to get the best of both worlds. The approach hinges on a surprising conditional independence property which applies to space--time separable GPs. We demonstrate empirically that the combined approach is more scalable and applicable to a greater range of spatio-temporal problems than either method on its own.
【29】 Federated Robustness Propagation: Sharing Adversarial Robustness in Federated Learning
Authors: Junyuan Hong, Haotao Wang, Zhangyang Wang, Jiayu Zhou Affiliations: Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA; Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, TX, USA Link: https://arxiv.org/abs/2106.10196 Abstract: Federated learning (FL) emerges as a popular distributed learning schema that learns a model from a set of participating users without requiring raw data to be shared. One major challenge of FL comes from heterogeneity in users, who may have distributionally different (or non-iid) data and varying computation resources. Just like in centralized learning, FL users also desire model robustness against malicious attackers at test time. Whereas adversarial training (AT) provides a sound solution for centralized learning, extending its usage to FL users has imposed significant challenges, as many users may have very limited training data as well as tight computational budgets, and so cannot afford the data-hungry and costly AT. In this paper, we study a novel learning setting that propagates adversarial robustness from high-resource users that can afford AT, to those low-resource users that cannot afford it, during the FL process. We show that existing FL techniques cannot effectively propagate adversarial robustness among non-iid users, and propose a simple yet effective propagation approach that transfers robustness through carefully designed batch-normalization statistics.
We demonstrate the rationality and effectiveness of our method through extensive experiments. In particular, the proposed method is shown to grant FL remarkable robustness even when only a small portion of users can afford AT during learning. Code will be published upon acceptance.
【30】 The Principles of Deep Learning Theory
Authors: Daniel A. Roberts, Sho Yaida, Boris Hanin Comments: 451 pages, to be published by Cambridge University Press Link: https://arxiv.org/abs/2106.10165 Abstract: This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem.
We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.
【31】 The Dimpled Manifold Model of Adversarial Examples in Machine Learning
Authors: Adi Shamir, Odelia Melamed, Oriel BenShmuel Affiliations: Weizmann Institute of Science, Israel Link: https://arxiv.org/abs/2106.10151 Abstract: The extreme fragility of deep neural networks when presented with tiny perturbations in their inputs was independently discovered by several research groups in 2013, but in spite of enormous effort these adversarial examples remained a baffling phenomenon with no clear explanation. In this paper we introduce a new conceptual framework (which we call the Dimpled Manifold Model) which provides a simple explanation for why adversarial examples exist, why their perturbations have such tiny norms, why these perturbations look like random noise, and why a network which was adversarially trained with incorrectly labeled images can still correctly classify test images. In the last part of the paper we describe the results of numerous experiments which strongly support this new model, and in particular our assertion that adversarial perturbations are roughly perpendicular to the low dimensional manifold which contains all the training examples.
【32】 Stochastic parareal: an application of probabilistic methods to time-parallelisation
Authors: Kamran Pentland, Massimiliano Tamborrino, D. Samaddar, L. C. Appel Affiliations: Mathematics Institute, University of Warwick, Coventry, United Kingdom; Department of Statistics, University of Warwick, Coventry, United Kingdom; Culham Centre for Fusion Energy, Culham Science Centre, Abingdon, Oxfordshire Link: https://arxiv.org/abs/2106.10139 Abstract: Parareal is a well-studied algorithm for numerically integrating systems of time-dependent differential equations by parallelising the temporal domain. Given approximate initial values at each temporal sub-interval, the algorithm locates a solution in a fixed number of iterations using a predictor-corrector, stopping once a tolerance is met. This iterative process combines solutions located by inexpensive (coarse resolution) and expensive (fine resolution) numerical integrators. In this paper, we introduce a stochastic parareal algorithm with the aim of accelerating the convergence of the deterministic parareal algorithm. Instead of providing the predictor-corrector with a deterministically located set of initial values, the stochastic algorithm samples initial values from dynamically varying probability distributions in each temporal sub-interval. All samples are then propagated by the numerical method in parallel. The initial values yielding the most continuous (smoothest) trajectory across consecutive sub-intervals are chosen as the new, more accurate, set of initial values.
These values are fed into the predictor-corrector, converging in fewer iterations than the deterministic algorithm with a given probability. The performance of the stochastic algorithm, implemented using various probability distributions, is illustrated on systems of ordinary differential equations. When the number of sampled initial values is large enough, we show that stochastic parareal converges almost certainly in fewer iterations than the deterministic algorithm while maintaining solution accuracy. Additionally, it is shown that the expected value of the convergence rate decreases with increasing numbers of samples.
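The deterministic predictor-corrector that entry 【32】 builds on can be sketched as follows. This is a minimal illustration for a scalar ODE, assuming explicit Euler for both the coarse and the fine integrator; the names `euler_steps` and `parareal` and all default parameters are hypothetical, and the sampling step of the stochastic variant is not implemented here.

```python
import numpy as np

def euler_steps(f, t0, t1, y0, steps):
    """Integrate dy/dt = f(t, y) from t0 to t1 with `steps` explicit Euler steps."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y = y + h * f(t, y)
        t += h
    return y

def parareal(f, t0, t_end, y0, n_sub, coarse_steps=1, fine_steps=100, tol=1e-8):
    """Deterministic parareal on n_sub sub-intervals; returns (times, values, iterations)."""
    ts = np.linspace(t0, t_end, n_sub + 1)
    coarse = lambda n, y: euler_steps(f, ts[n], ts[n + 1], y, coarse_steps)
    fine = lambda n, y: euler_steps(f, ts[n], ts[n + 1], y, fine_steps)

    # Iteration 0: a cheap serial coarse sweep gives initial values on every sub-interval.
    U = np.empty(n_sub + 1)
    U[0] = y0
    for n in range(n_sub):
        U[n + 1] = coarse(n, U[n])

    for k in range(1, n_sub + 1):
        # These fine solves are independent, so they could run in parallel.
        F_old = np.array([fine(n, U[n]) for n in range(n_sub)])
        G_old = np.array([coarse(n, U[n]) for n in range(n_sub)])

        # Serial predictor-corrector sweep: coarse prediction plus fine-minus-coarse correction.
        U_new = np.empty_like(U)
        U_new[0] = y0
        for n in range(n_sub):
            U_new[n + 1] = coarse(n, U_new[n]) + F_old[n] - G_old[n]

        if np.max(np.abs(U_new - U)) < tol:
            return ts, U_new, k
        U = U_new
    return ts, U, n_sub
```

After at most `n_sub` iterations the values coincide with those of the fine integrator applied serially, which is why the algorithm is only useful when it stops well before that point.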
【33】 ScoreGrad: Multivariate Probabilistic Time Series Forecasting with Continuous Energy-based Generative Models
Authors: Tijin Yan, Hongwei Zhang, Tong Zhou, Yufeng Zhan, Yuanqing Xia Affiliations: School of Automation, Beijing Institute of Technology Comments: 12 pages, 10 figures Link: https://arxiv.org/abs/2106.10121 Abstract: Multivariate time series prediction has attracted a lot of attention because of its wide applications such as intelligent transportation and AIOps. Generative models have achieved impressive results in time series modeling because they can model data distribution and take noise into consideration. However, many existing works cannot be widely used because of the constraints on the functional form of generative models or the sensitivity to hyperparameters. In this paper, we propose ScoreGrad, a multivariate probabilistic time series forecasting framework based on continuous energy-based generative models. ScoreGrad is composed of a time series feature extraction module and a conditional stochastic differential equation based score matching module. The prediction can be achieved by iteratively solving a reverse-time SDE. To the best of our knowledge, ScoreGrad is the first continuous energy based generative model used for time series forecasting. Furthermore, ScoreGrad achieves state-of-the-art results on six real-world datasets. The impact of hyperparameters and sampler types on the performance is also explored. Code is available at https://github.com/yantijin/ScoreGradPred.
【34】 It's FLAN time! Summing feature-wise latent representations for interpretability
Authors: An-phi Nguyen, Maria Rodriguez Martinez Affiliations: IBM Research Europe; ETH Zürich, Zürich, Switzerland Link: https://arxiv.org/abs/2106.10086 Abstract: Interpretability has become a necessary feature for machine learning models deployed in critical scenarios, e.g. legal systems, healthcare. In these situations, algorithmic decisions may have (potentially negative) long-lasting effects on the end-user affected by the decision. In many cases, the representational power of deep learning models is not needed, therefore simple and interpretable models (e.g. linear models) should be preferred. However, in high-dimensional and/or complex domains (e.g. computer vision), the universal approximation capabilities of neural networks are required. Inspired by linear models and the Kolmogorov-Arnold representation theorem, we propose a novel class of structurally-constrained neural networks, which we call FLANs (Feature-wise Latent Additive Networks). Crucially, FLANs process each input feature separately, computing for each of them a representation in a common latent space. These feature-wise latent representations are then simply summed, and the aggregated representation is used for prediction. These constraints (which are at the core of the interpretability of linear models) allow a user to estimate the effect of each individual feature independently from the others, enhancing interpretability.
In a set of experiments across different domains, we show that, without excessively compromising test performance, the structural constraints proposed in FLANs indeed increase the interpretability of deep learning models.
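The structure described in entry 【34】 (one map per feature into a shared latent space, a sum, then a prediction head) can be sketched as a forward pass. This is only an illustration with random, untrained weights: the class name `FlanSketch`, the layer sizes, and the tanh nonlinearities are assumptions, not the architecture used in the paper.

```python
import numpy as np

class FlanSketch:
    """Feature-wise latent additive forward pass with random (untrained) weights."""

    def __init__(self, n_features, latent_dim=8, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        # One small two-layer map per input feature (hypothetical sizes).
        self.W1 = rng.normal(size=(n_features, hidden))
        self.b1 = rng.normal(size=(n_features, hidden))
        self.W2 = rng.normal(size=(n_features, hidden, latent_dim)) / np.sqrt(hidden)
        # Prediction head applied to the summed latent representation.
        self.w_out = rng.normal(size=latent_dim)

    def feature_latents(self, x):
        """Map each scalar feature x[i] to its own latent vector: (n_features, latent_dim)."""
        h = np.tanh(self.W1 * x[:, None] + self.b1)   # (n_features, hidden)
        return np.einsum("fh,fhd->fd", h, self.W2)    # feature i only uses its own weights

    def predict(self, x):
        """Sum the feature-wise latents, then apply the prediction head."""
        z = self.feature_latents(x).sum(axis=0)
        return float(np.tanh(z) @ self.w_out)
```

Because each latent vector depends on a single feature, changing one input changes exactly one term of the sum, which is the property that lets a user attribute an effect to each feature separately.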
【35】 Being a Bit Frequentist Improves Bayesian Neural Networks
Authors: Agustinus Kristiadi, Matthias Hein, Philipp Hennig Affiliations: University of Tübingen and MPI for Intelligent Systems, Tübingen Link: https://arxiv.org/abs/2106.10065 Abstract: Despite their compelling theoretical properties, Bayesian neural networks (BNNs) tend to perform worse than frequentist methods in classification-based uncertainty quantification (UQ) tasks such as out-of-distribution (OOD) detection and dataset-shift robustness. In this work, based on empirical findings in prior works, we hypothesize that this issue is due to the avoidance of Bayesian methods in the so-called "OOD training" -- a family of techniques for incorporating OOD data during the training process, which has since been an integral part of state-of-the-art frequentist UQ methods. To validate this, we treat OOD data as a first-class citizen in BNN training by exploring four different ways of incorporating OOD data in Bayesian inference. We show in extensive experiments that OOD-trained BNNs are competitive with, if not better than, recent frequentist baselines. This work thus provides strong baselines for future work in both Bayesian and frequentist UQ.
【36】 A Vertical Federated Learning Framework for Horizontally Partitioned Labels
Authors: Wensheng Xia, Ying Li, Lan Zhang, Zhonghai Wu, Xiaoyong Yuan Affiliations: Peking University; Michigan Technological University Comments: 10 pages, 6 figures Link: https://arxiv.org/abs/2106.10056 Abstract: Vertical federated learning is a collaborative machine learning framework to train deep learning models on vertically partitioned data with privacy-preservation. It attracts much attention both from academia and industry. Unfortunately, applying most existing vertical federated learning methods in real-world applications still faces two daunting challenges. First, most existing vertical federated learning methods have a strong assumption that at least one party holds the complete set of labels of all data samples, while this assumption is not satisfied in many practical scenarios, where labels are horizontally partitioned and the parties only hold partial labels. Existing vertical federated learning methods can only utilize partial labels, which may lead to inadequate model updates in end-to-end backpropagation. Second, computational and communication resources vary across parties. Some parties with limited computational and communication resources will become stragglers and slow down the convergence of training. This straggler problem is exacerbated in scenarios with horizontally partitioned labels in vertical federated learning.
To address these challenges, we propose a novel vertical federated learning framework named Cascade Vertical Federated Learning (CVFL) to fully utilize all horizontally partitioned labels to train neural networks with privacy-preservation. To mitigate the straggler problem, we design a novel optimization objective which can increase straggler's contribution to the trained models. We conduct a series of qualitative experiments to rigorously verify the effectiveness of CVFL. It is demonstrated that CVFL can achieve comparable performance (e.g., accuracy for classification tasks) with centralized training. The new optimization objective can further mitigate the straggler problem comparing with only using the asynchronous aggregation mechanism during training.
【37】 Sharp Lower and Upper Bounds for the Covariance of Bounded Random Variables
Authors: Ola Hössjer, Arvid Sjölander Link: https://arxiv.org/abs/2106.10037 Abstract: In this paper we derive sharp lower and upper bounds for the covariance of two bounded random variables when knowledge about their expected values, variances or both is available. When only the expected values are known, our result can be viewed as an extension of the Bhatia-Davis inequality for variances. We also provide a number of different ways to standardize covariance. For a pair of binary random variables, one of these standardized measures of covariation agrees with a frequently used measure of dependence between genetic variants.
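As a rough illustration of the Bhatia-Davis inequality mentioned in the abstract, the sketch below checks numerically that Var(X) <= (M - mu)(mu - m) for a variable bounded in [m, M] with mean mu. The covariance bound it also computes is only the crude one obtained by combining Cauchy-Schwarz with Bhatia-Davis; it is an assumption made for illustration and is not the sharp bound derived in the paper. All function names are hypothetical.

```python
import random


def mean_of(xs):
    return sum(xs) / len(xs)


def variance(xs):
    # Population variance (divide by n), so the inequality holds exactly
    # for the empirical distribution of the sample.
    m = mean_of(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def covariance(xs, ys):
    mx, my = mean_of(xs), mean_of(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def bhatia_davis_bound(mean, lo, hi):
    """Upper bound on Var(X) when lo <= X <= hi and E[X] = mean."""
    return (hi - mean) * (mean - lo)


def crude_cov_bound(mean_x, mean_y, lo=0.0, hi=1.0):
    # Cauchy-Schwarz combined with Bhatia-Davis.
    # Illustrative only: NOT the sharp bound from the paper.
    bx = bhatia_davis_bound(mean_x, lo, hi)
    by = bhatia_davis_bound(mean_y, lo, hi)
    return (bx * by) ** 0.5


rng = random.Random(0)
xs = [rng.random() for _ in range(1000)]          # values in [0, 1)
ys = [0.5 * x + 0.5 * rng.random() for x in xs]   # correlated, also in [0, 1)

var_x = variance(xs)
var_bound_x = bhatia_davis_bound(mean_of(xs), 0.0, 1.0)
cov_xy = covariance(xs, ys)
cov_bound_xy = crude_cov_bound(mean_of(xs), mean_of(ys))
```

Here `var_x` never exceeds `var_bound_x`, and `abs(cov_xy)` never exceeds `cov_bound_xy`; the paper's contribution is to tighten the second kind of bound.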
【38】 A Note on Optimizing Distributions using Kernel Mean Embeddings
Authors: Boris Muzellec, Francis Bach, Alessandro Rudi Affiliations: INRIA Paris, rue Simone Iff, Paris, France, ENS - Département d’Informatique de l’École Normale Supérieure, PSL Research University, rue Simone Iff, Paris, France Link: https://arxiv.org/abs/2106.09994 Abstract: Kernel mean embeddings are a popular tool that represents probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space. When the kernel is characteristic, mean embeddings can be used to define a distance between probability measures, known as the maximum mean discrepancy (MMD). A well-known advantage of mean embeddings and MMD is their low computational cost and low sample complexity. However, kernel mean embeddings have had limited applications to problems that involve optimizing distributions, because it is difficult to characterize which Hilbert space vectors correspond to a probability distribution. In this note, we propose to leverage the kernel sums-of-squares parameterization of positive functions of Marteau-Ferey et al. [2020] to fit distributions in the MMD geometry. First, we show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense. Then, we provide algorithms to optimize such distributions in the finite-sample setting, which we illustrate in a density fitting numerical experiment.
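To make the MMD in the abstract concrete, here is a minimal sketch of the standard biased (V-statistic) estimate of the squared MMD between two one-dimensional samples. The Gaussian kernel, the bandwidth of 1.0, and the sample sizes are assumptions chosen for illustration; the sketch does not implement the kernel sum-of-squares parameterization proposed in the note.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    # Gaussian (RBF) kernel, a standard example of a characteristic kernel.
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


rng = random.Random(0)
sample_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
sample_b = [rng.gauss(0.0, 1.0) for _ in range(200)]   # same distribution
shifted = [rng.gauss(2.0, 1.0) for _ in range(200)]    # mean shifted by 2

mmd_same = mmd_squared(sample_a, sample_b)
mmd_shifted = mmd_squared(sample_a, shifted)
```

The estimate is close to zero for two samples from the same distribution and clearly larger when one distribution is shifted, which is the property that lets MMD act as a distance between probability measures.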
【39】 Evolving GANs: When Contradictions Turn into Compliance
Authors: Sauptik Dhar, Javad Heydari, Samarth Tripathi, Unmesh Kurup, Mohak Shah Affiliations: America Research Lab, LG Electronics, Great America Pkwy, Santa Clara, CA, USA Comments: Generative Adversarial Networks, Universum Learning, Semi-Supervised Learning Link: https://arxiv.org/abs/2106.09946 Abstract: Limited availability of labeled data makes any supervised learning problem challenging. Alternative learning settings such as semi-supervised and universum learning alleviate the dependency on labeled data, but still require a large amount of unlabeled data, which may be unavailable or expensive to acquire. GAN-based synthetic data generation methods have recently shown promise by generating synthetic samples to improve the task at hand. However, these samples cannot be used for other purposes. In this paper, we propose a GAN game that provides improved discriminator accuracy under limited data settings, while generating realistic synthetic data. This provides the added advantage that the generated data can now be used for other similar tasks. We provide theoretical guarantees and empirical results in support of our approach.
【40】 Investigating the Role of Negatives in Contrastive Representation Learning
Authors: Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Dipendra Misra Affiliations: Microsoft Research NYC Link: https://arxiv.org/abs/2106.09943 Abstract: Noise contrastive learning is a popular technique for unsupervised representation learning. In this approach, a representation is obtained via reduction to supervised learning, where, given a notion of semantic similarity, the learner tries to distinguish a similar (positive) example from a collection of random (negative) examples. The success of modern contrastive learning pipelines relies on many parameters such as the choice of data augmentation, the number of negative examples, and the batch size; however, there is limited understanding of how these parameters interact and affect downstream performance. We focus on disambiguating the role of one of these parameters: the number of negative examples. Theoretically, we show the existence of a collision-coverage trade-off suggesting that the optimal number of negative examples should scale with the number of underlying concepts in the data. Empirically, we scrutinize the role of the number of negatives in both NLP and vision tasks. In the NLP task, we find that the results broadly agree with our theory, while our vision experiments are murkier, with performance sometimes even being insensitive to the number of negatives. We discuss plausible explanations for this behavior and suggest future directions to better align theory and practice.
【41】 Iterative Feature Matching: Toward Provable Domain Generalization with Logarithmic Environments
Authors: Yining Chen, Elan Rosenfeld, Mark Sellke, Tengyu Ma, Andrej Risteski Affiliations: Stanford University, Carnegie Mellon University Link: https://arxiv.org/abs/2106.09913 Abstract: Domain generalization aims at performing well on unseen test environments with data from a limited number of training environments. Despite a proliferation of proposed algorithms for this task, assessing their performance, both theoretically and empirically, is still very challenging. Moreover, recent approaches such as Invariant Risk Minimization (IRM) require a prohibitively large number of training environments (linear in the dimension $d_s$ of the spurious feature space) even on simple data models like the one proposed by [Rosenfeld et al., 2021]. Under a variant of this model, we show that both ERM and IRM cannot generalize with $o(d_s)$ environments. We then present a new algorithm based on iterative feature matching that is guaranteed, with high probability, to yield a predictor that generalizes after seeing only $O(\log d_s)$ environments.
【42】 Message Passing in Graph Convolution Networks via Adaptive Filter Banks
Authors: Xing Gao, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong, Pascal Frossard Affiliations: Department of Electronic Engineering, Shanghai Jiao Tong University, Department of Computer Science, Shanghai Jiao Tong University, Signal Processing Laboratory (LTS,), EPFL Link: https://arxiv.org/abs/2106.09910 Abstract: Graph convolution networks, such as message passing graph convolution networks (MPGCNs), have been a powerful tool in representation learning of networked data. However, when data is heterogeneous, most architectures are limited, as they employ a single strategy to handle multi-channel graph signals and typically focus on low-frequency information. In this paper, we present a novel graph convolution operator, termed BankGCN, which keeps the benefits of message passing models but extends their capabilities beyond "low-pass" features. It decomposes multi-channel signals on graphs into subspaces and handles particular information in each subspace with an adapted filter. The filters of all subspaces have different frequency responses and together form a filter bank. Furthermore, each filter in the spectral domain corresponds to a message passing scheme, and diverse schemes are implemented via the filter bank. Importantly, the filter bank and the signal decomposition are jointly learned to adapt to the spectral characteristics of the data and to the target applications. Moreover, this is implemented almost without extra parameters in comparison with most existing MPGCNs. Experimental results show that the proposed convolution operator achieves excellent performance in graph classification on a collection of benchmark graph datasets.
【43】 Batch Multi-Fidelity Bayesian Optimization with Deep Auto-Regressive Networks
Authors: Shibo Li, Robert M. Kirby, Shandian Zhe Affiliations: School of Computing, University of Utah, Salt Lake City, UT Link: https://arxiv.org/abs/2106.09884 Abstract: Bayesian optimization (BO) is a powerful approach for optimizing black-box, expensive-to-evaluate functions. To enable a flexible trade-off between cost and accuracy, many applications allow the function to be evaluated at different fidelities. To reduce the optimization cost while maximizing the benefit-cost ratio, we propose Batch Multi-fidelity Bayesian Optimization with Deep Auto-Regressive Networks (BMBO-DARN). We use a set of Bayesian neural networks to construct a fully auto-regressive model, which is expressive enough to capture strong yet complex relationships across all the fidelities, so as to improve the surrogate learning and optimization performance. Furthermore, to enhance the quality and diversity of queries, we develop a simple yet efficient batch querying method that requires no combinatorial search over the fidelities. We propose a batch acquisition function based on the Max-value Entropy Search (MES) principle, which penalizes highly correlated queries and encourages diversity. We use posterior samples and moment matching to compute the acquisition function efficiently, and conduct alternating optimization over every fidelity-input pair, which guarantees an improvement at each step. We demonstrate the advantage of our approach on four real-world hyperparameter optimization applications.
【44】 PAC Prediction Sets Under Covariate Shift
Authors: Sangdon Park, Edgar Dobriban, Insup Lee, Osbert Bastani Affiliations: Dept. of Computer & Info. Science, PRECISE Center, University of Pennsylvania, Dept. of Statistics & Data Science, The Wharton School Link: https://arxiv.org/abs/2106.09848 Abstract: An important challenge facing modern machine learning is how to rigorously quantify the uncertainty of model predictions. Conveying uncertainty is especially important when there are changes to the underlying data distribution that might invalidate the predictive model. Yet, most existing uncertainty quantification algorithms break down in the presence of such shifts. We propose a novel approach that addresses this challenge by constructing probably approximately correct (PAC) prediction sets in the presence of covariate shift. Our approach focuses on the setting where there is a covariate shift from the source distribution (where we have labeled training examples) to the target distribution (for which we want to quantify uncertainty). Our algorithm assumes that we are given importance weights that encode how the probabilities of the training examples change under the covariate shift. In practice, importance weights typically need to be estimated; thus, we extend our algorithm to the setting where we are given confidence intervals for the importance weights rather than their true values. We demonstrate the effectiveness of our approach on various covariate shifts designed based on the DomainNet and ImageNet datasets.
【45】 Escaping strict saddle points of the Moreau envelope in nonsmooth optimization
Authors: Damek Davis, Mateo Díaz, Dmitriy Drusvyatskiy Comments: 29 pages, 1 figure Link: https://arxiv.org/abs/2106.09815 Abstract: Recent work has shown that stochastically perturbed gradient methods can efficiently escape strict saddle points of smooth functions. We extend this body of work to nonsmooth optimization by analyzing an inexact analogue of a stochastically perturbed gradient method applied to the Moreau envelope. The main conclusion is that a variety of algorithms for nonsmooth optimization can escape strict saddle points of the Moreau envelope at a controlled rate. The main technical insight is that typical algorithms applied to the proximal subproblem yield directions that approximate the gradient of the Moreau envelope in relative terms.
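For readers unfamiliar with the Moreau envelope, the sketch below works it out for the simplest nonsmooth function, f(y) = |y|. The envelope is f_lam(x) = min_y { |y| + (y - x)^2 / (2 lam) }, its minimizer is the proximal point (soft-thresholding in this case), and its gradient is (x - prox(x)) / lam. The choice of f and the function names are assumptions for illustration; this convex toy example does not reproduce the perturbed method or the saddle-point analysis of the paper.

```python
def prox_abs(x, lam):
    """Proximal point of f(y) = |y| at x: soft-thresholding by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def moreau_envelope_abs(x, lam):
    """Value of the Moreau envelope of |.| at x, evaluated at the prox point."""
    y = prox_abs(x, lam)
    return abs(y) + (y - x) ** 2 / (2.0 * lam)


def moreau_gradient_abs(x, lam):
    """Gradient of the envelope, obtained from the solved proximal subproblem."""
    return (x - prox_abs(x, lam)) / lam


def gradient_step_on_envelope(x, lam):
    # A gradient step of size lam on the envelope equals the proximal point.
    return x - lam * moreau_gradient_abs(x, lam)
```

For example, `moreau_envelope_abs(3.0, 1.0)` evaluates to 2.5 and `moreau_envelope_abs(0.5, 1.0)` to 0.125, matching the Huber function that the envelope of the absolute value is known to be. The last helper shows why solving the proximal subproblem (even approximately) gives a direction related to the envelope's gradient.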
【46】 Locally Differentially Private Federated Learning: Efficient Algorithms with Tight Risk Bounds
Authors: Andrew Lowy, Meisam Razaviyayn Affiliations: University of Southern California Link: https://arxiv.org/abs/2106.09779 Abstract: Federated learning (FL) is a distributed learning paradigm in which many clients with heterogeneous, unbalanced, and often sensitive local data collaborate to learn a model. Local Differential Privacy (LDP) provides a strong guarantee that each client's data cannot be leaked during and after training, without relying on a trusted third party. While LDP is often believed to be too stringent to allow for satisfactory utility, our paper challenges this belief. We consider a general setup with unbalanced, heterogeneous data, disparate privacy needs across clients, and unreliable communication, where a random number/subset of clients is available each round. We propose three LDP algorithms for smooth (strongly) convex FL; each is a noisy variation of distributed minibatch SGD. One is accelerated and one involves novel time-varying noise, which we use to obtain the first non-trivial LDP excess risk bound for the fully general non-i.i.d. FL problem. Specializing to i.i.d. clients, our risk bounds interpolate between the best known and/or optimal bounds in the centralized setting and the cross-device setting, where each client represents just one person's data. Furthermore, we show that in certain regimes, our convergence rate (nearly) matches the corresponding non-private lower bound or outperforms state-of-the-art non-private algorithms ("privacy for free"). Finally, we validate our theoretical results and illustrate the practical utility of our algorithm with numerical experiments.
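The phrase "noisy variation of distributed minibatch SGD" can be pictured with the toy sketch below. Each round, a random subset of clients computes a local gradient, clips it, adds Gaussian noise on the client side, and sends only the noisy value to the server, which averages the reports and takes a step. Everything here is an assumption for illustration: the scalar least-squares objective, the clipping rule, and the noise scale are hypothetical, the noise is not calibrated to any privacy budget, and the code is not one of the three algorithms from the paper.

```python
import math
import random


def clip(value, bound):
    """Clip a scalar gradient to [-bound, bound] to limit its sensitivity."""
    return max(-bound, min(bound, value))


def noisy_minibatch_sgd(client_data, rounds, clients_per_round, step_size,
                        clip_bound, noise_std, seed=0):
    """Toy noisy distributed SGD for f(w) = mean_i 0.5 * (w - x_i)^2.

    noise_std is a hypothetical parameter; it is NOT calibrated to give
    any formal local differential privacy guarantee.
    """
    rng = random.Random(seed)
    w = 0.0
    for _ in range(rounds):
        # Only a random subset of clients participates in each round.
        chosen = rng.sample(range(len(client_data)), clients_per_round)
        reports = []
        for i in chosen:
            grad = clip(w - client_data[i], clip_bound)      # local gradient
            reports.append(grad + rng.gauss(0.0, noise_std)) # client-side noise
        w -= step_size * sum(reports) / len(reports)         # server update
    return w


client_data = [1.0, 2.0, 3.0, 4.0]
w_clean = noisy_minibatch_sgd(client_data, rounds=60, clients_per_round=4,
                              step_size=0.5, clip_bound=10.0, noise_std=0.0)
w_noisy = noisy_minibatch_sgd(client_data, rounds=60, clients_per_round=2,
                              step_size=0.5, clip_bound=10.0, noise_std=0.5)
```

With the noise switched off and all clients participating, `w_clean` converges to the data mean 2.5; with noise and partial participation, `w_noisy` fluctuates around it, which is the utility cost that the paper's risk bounds quantify.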
【47】 On Invariance Penalties for Risk Minimization
Authors: Kia Khezeli, Arno Blaas, Frank Soboczenski, Nicholas Chia, John Kalantari Affiliations: Oxford University, King’s College London Link: https://arxiv.org/abs/2106.09777 Abstract: The Invariant Risk Minimization (IRM) principle was first proposed by Arjovsky et al. [2019] to address the domain generalization problem by leveraging data heterogeneity from differing experimental conditions. Specifically, IRM seeks to find a data representation under which an optimal classifier remains invariant across all domains. Despite the conceptual appeal of IRM, the effectiveness of the originally proposed invariance penalty has recently been brought into question. In particular, there exist counterexamples for which that invariance penalty can be arbitrarily small for non-invariant data representations. We propose an alternative invariance penalty by revisiting the Gramian matrix of the data representation. We discuss the role of its eigenvalues in the relationship between the risk and the invariance penalty, and demonstrate that it is ill-conditioned for said counterexamples. The proposed approach is guaranteed to recover an invariant representation for linear settings under mild non-degeneracy conditions. Its effectiveness is substantiated by experiments on DomainBed and InvarianceUnitTest, two extensive test beds for domain generalization.
【48】 PyKale: Knowledge-Aware Machine Learning from Multiple Sources in Python
Authors: Haiping Lu, Xianyuan Liu, Robert Turner, Peizhen Bai, Raivo E Koot, Shuo Zhou, Mustafa Chasmai, Lawrence Schobs Affiliations: The University of Sheffield, Sheffield, United Kingdom, Indian Institute of Technology, Delhi, New Delhi, India Comments: This library is available at this https URL Link: https://arxiv.org/abs/2106.09756 Abstract: Machine learning is a general-purpose technology holding promise for many interdisciplinary research problems. However, significant barriers exist in crossing disciplinary boundaries when most machine learning tools are developed separately in different areas. We present PyKale, a Python library for knowledge-aware machine learning on graphs, images, texts, and videos to enable and accelerate interdisciplinary research. We formulate new green machine learning guidelines based on standard software engineering practices and propose a novel pipeline-based application programming interface (API). PyKale focuses on leveraging knowledge from multiple sources for accurate and interpretable prediction, thus supporting multimodal learning and transfer learning (particularly domain adaptation) with the latest deep learning and dimensionality reduction models. We build PyKale on PyTorch and leverage the rich PyTorch ecosystem. Our pipeline-based API design enforces standardization and minimalism, embracing green machine learning concepts by reducing repetition and redundancy, reusing existing resources, and recycling learning models across areas. We demonstrate its interdisciplinary nature via examples in bioinformatics, knowledge graphs, image/video recognition, and medical imaging.