stat统计学,共计35篇
【1】 Assessment of Treatment Effect Estimators for Heavy-Tailed Data 标题:重尾数据下处理效应估计器的评估 链接:https://arxiv.org/abs/2112.07602
作者:Nilesh Tripuraneni,Dhruv Madeka,Dean Foster,Dominique Perrault-Joncas,Michael I. Jordan 机构:University of California, Berkeley, Amazon 摘要:在随机对照试验(RCT)中,客观评估治疗效果(TE)估计器的一个主要障碍是缺乏检验其性能的基本事实(或验证集)。在本文中,我们提供了一种新的类交叉验证方法来应对这一挑战。我们程序的关键洞察是,噪声大(但无偏)的均值差估计可以用作RCT一部分上的基本真值"标签",以检验在另一部分上训练的估计器的性能。我们将这一洞察与一个聚合方案相结合,该方案借用大量RCT之间的统计强度,提出了一种端到端的方法来评判估计器恢复潜在治疗效果的能力。我们在亚马逊供应链中实施的709个RCT上评估了该方法。在亚马逊的AB测试语料库中,我们强调了由于响应变量的重尾性质,恢复治疗效果所面临的独特困难。在这种重尾情形下,我们的方法表明,积极降低权重或截断较大值的程序虽然引入了偏差,却足以降低方差,从而确保更准确地估计治疗效果。 摘要:A central obstacle in the objective assessment of treatment effect (TE) estimators in randomized control trials (RCTs) is the lack of ground truth (or validation set) to test their performance. In this paper, we provide a novel cross-validation-like methodology to address this challenge. The key insight of our procedure is that the noisy (but unbiased) difference-of-means estimate can be used as a ground truth "label" on a portion of the RCT, to test the performance of an estimator trained on the other portion. We combine this insight with an aggregation scheme, which borrows statistical strength across a large collection of RCTs, to present an end-to-end methodology for judging an estimator's ability to recover the underlying treatment effect. We evaluate our methodology across 709 RCTs implemented in the Amazon supply chain. In the corpus of AB tests at Amazon, we highlight the unique difficulties associated with recovering the treatment effect due to the heavy-tailed nature of the response variables. In this heavy-tailed setting, our methodology suggests that procedures that aggressively downweight or truncate large values, while introducing bias, lower the variance enough to ensure that the treatment effect is more accurately estimated.
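上述"以留出部分的均值差作为噪声标签来评判估计器"的思路,可用下面这个极简的numpy示意来说明(数据、截断分位数等均为任意假设的示例,截断均值仅作为论文中"降权/截断"类估计器的代表):

```python
import numpy as np

rng = np.random.default_rng(0)

def winsorize(x, q=0.95):
    # Truncate values above the q-th sample quantile: adds bias, cuts variance.
    return np.minimum(x, np.quantile(x, q))

def dm_estimate(treat, ctrl):
    # Noisy but unbiased difference-of-means treatment-effect estimate.
    return treat.mean() - ctrl.mean()

n, true_te = 20_000, 0.5
naive_ests, wins_ests, labels = [], [], []
for _ in range(200):
    # Heavy-tailed outcomes: Pareto noise plus an additive treatment effect.
    ctrl = rng.pareto(1.5, n)
    treat = rng.pareto(1.5, n) + true_te
    # One half of each arm provides the noisy ground-truth "label";
    # candidate estimators are computed on the other half and scored against it.
    labels.append(dm_estimate(treat[: n // 2], ctrl[: n // 2]))
    naive_ests.append(dm_estimate(treat[n // 2:], ctrl[n // 2:]))
    wins_ests.append(dm_estimate(winsorize(treat[n // 2:]),
                                 winsorize(ctrl[n // 2:])))

# Under heavy tails, truncation trades a little bias for a large variance cut.
print(np.var(naive_ests), np.var(wins_ests))
```

在此类重尾模拟中,截断后的估计量在各次重复之间的波动远小于朴素均值差,这与摘要所述"以偏差换方差"的结论一致。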
【2】 On the Eigenstructure of Covariance Matrices with Divergent Spikes 标题:具有发散尖峰的协方差矩阵的特征结构 链接:https://arxiv.org/abs/2112.07591
作者:Simona Diaconu 机构:Department of Mathematics, Stanford University 摘要:对于Johnstone尖峰模型的一个推广(协方差矩阵的特征值除$M$个外均为1,特征维数$N$与样本数$n$相当:$N=N(n)$,$M=M(n)$,$\gamma^{-1}\leq\frac{N}{n}\leq\gamma$,其中$\gamma\in(0,\infty)$),只要$M$的增长速度略慢于$n$,即$\lim_{n\to\infty}\frac{\sqrt{\log n}}{\log\frac{n}{M(n)}}=0$,我们就能对足够快地趋于无穷的分离尖峰,以CLT的形式获得一致性速率。我们的结果填补了现有文献中的一个空白(此前所覆盖的尖峰个数的最大范围为$o(n^{1/6})$),并揭示了这些CLT中的中心化具有一定的灵活性:它可以是经验的、确定性的,或二者之和。此外,我们推导了相应的经验特征向量对其真实对应向量的一致性速率,该速率取决于这些特征值的相对增长。 摘要:For a generalization of Johnstone's spiked model, a covariance matrix with eigenvalues all one but $M$ of them, the number of features $N$ comparable to the number of samples $n$: $N=N(n)$, $M=M(n)$, $\gamma^{-1} \leq \frac{N}{n} \leq \gamma$ where $\gamma \in (0,\infty)$, we obtain consistency rates in the form of CLTs for separated spikes tending to infinity fast enough whenever $M$ grows slightly slower than $n$: $\lim_{n \to \infty}{\frac{\sqrt{\log{n}}}{\log{\frac{n}{M(n)}}}}=0$. Our results fill a gap in the existing literature in which the largest range covered for the number of spikes has been $o(n^{1/6})$ and reveal a certain degree of flexibility for the centering in these CLTs inasmuch as it can be empirical, deterministic, or a sum of both. Furthermore, we derive consistency rates of their corresponding empirical eigenvectors to their true counterparts, which turn out to depend on the relative growth of these eigenvalues.
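尖峰样本特征值与Marchenko–Pastur主体的分离现象(即上文CLT所刻画的对象)可用如下numpy数值实验草图直观展示(维数、尖峰大小等参数均为任意示例值):

```python
import numpy as np

rng = np.random.default_rng(1)

# Johnstone-type spiked model: all population eigenvalues equal 1 except
# M large spikes; feature dimension N comparable to sample size n (N/n = 1/2).
n, N, M, spike = 1000, 500, 5, 50.0
eigvals = np.ones(N)
eigvals[:M] = spike

# n samples with covariance diag(eigvals).
X = rng.standard_normal((n, N)) * np.sqrt(eigvals)
sample_eigs = np.linalg.eigvalsh(X.T @ X / n)[::-1]  # descending order

# The M spiked sample eigenvalues separate cleanly from the
# Marchenko-Pastur bulk, whose right edge is (1 + sqrt(N/n))^2 ≈ 2.91.
print(sample_eigs[:M], sample_eigs[M])
```

尖峰越大、分离越明显;论文研究的正是尖峰趋于无穷时这些样本特征值(及其特征向量)围绕何种中心、以何种速率波动。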
【3】 The multirank likelihood and cyclically monotone Monte Carlo: a semiparametric approach to CCA 标题:多秩似然与循环单调蒙特卡罗:CCA的半参数方法 链接:https://arxiv.org/abs/2112.07465
作者:Jordan G. Bryan,Jonathan Niles-Weed,Peter D. Hoff 机构:Department of Statistical Science, Duke University, Courant Institute of Mathematical Sciences and the Center for Data Science, NYU 摘要:许多多元数据分析关注的是评估两组变量之间的依赖关系,而非每组内部各变量之间的依赖关系。典型相关分析(CCA)是一种经典的数据分析技术,用于估计刻画此类变量组之间依赖关系的参数。然而,基于传统CCA的推断程序依赖于所有变量联合正态分布的假设。我们提出了一种半参数CCA方法,其中每个变量组的多元边缘分布可以是任意的,而变量组之间的依赖关系由一个提供低维依赖性摘要的参数模型来描述。虽然所提模型中的最大似然估计难以处理,但我们开发了一种称为循环单调蒙特卡罗(CMMC)的新MCMC算法,为组间依赖参数提供估计和置信域。该算法基于多秩似然函数,该似然仅使用观测数据中的部分信息,以换取无需对多元边缘分布作任何假设。我们以美国农业部(USDA)的营养数据为例演示了所提出的推断程序。 摘要:Many analyses of multivariate data are focused on evaluating the dependence between two sets of variables, rather than the dependence among individual variables within each set. Canonical correlation analysis (CCA) is a classical data analysis technique that estimates parameters describing the dependence between such sets. However, inference procedures based on traditional CCA rely on the assumption that all variables are jointly normally distributed. We present a semiparametric approach to CCA in which the multivariate margins of each variable set may be arbitrary, but the dependence between variable sets is described by a parametric model that provides a low-dimensional summary of dependence. While maximum likelihood estimation in the proposed model is intractable, we develop a novel MCMC algorithm called cyclically monotone Monte Carlo (CMMC) that provides estimates and confidence regions for the between-set dependence parameters. This algorithm is based on a multirank likelihood function, which uses only part of the information in the observed data in exchange for being free of assumptions about the multivariate margins. We illustrate the proposed inference procedure on nutrient data from the USDA.
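作为背景,论文所推广的经典(高斯)CCA可以用QR分解加SVD的标准数值做法极简实现如下(numpy示意,数据为人为构造:两组变量共享一个潜在因子):

```python
import numpy as np

rng = np.random.default_rng(2)

def cca(X, Y):
    # Classical CCA via orthonormal bases + SVD: the singular values of
    # Qx^T Qy are the canonical correlations between the column spaces.
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    qx, _ = np.linalg.qr(X)
    qy, _ = np.linalg.qr(Y)
    return np.linalg.svd(qx.T @ qy, compute_uv=False)

# Two variable sets sharing one latent source z: exactly one large
# canonical correlation is expected, the rest near zero.
n = 2000
z = rng.standard_normal(n)
X = np.column_stack([z, rng.standard_normal(n)]) + 0.1 * rng.standard_normal((n, 2))
Y = np.column_stack([z, rng.standard_normal(n)]) + 0.1 * rng.standard_normal((n, 2))
rho = cca(X, Y)
print(rho)
```

论文的半参数方法放宽的正是这里隐含的联合正态假设:组内边缘分布任意,仅对组间依赖建模。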
【4】 Triangulation candidates for Bayesian optimization 标题:用于贝叶斯优化的三角剖分候选点 链接:https://arxiv.org/abs/2112.07457
作者:Robert B. Gramacy,Annie Sauer,Nathan Wycoff 备注:19 pages, 9 figures 摘要:贝叶斯优化是顺序设计的一种形式:用适当灵活的非线性回归模型刻画输入-输出关系;用初始实验数据进行拟合;在拟合模型下设计并优化一个准则(例如通过预测方程),用于选择针对目标结果(比如最小值)的下一批实验条件;在这些条件下获取输出、更新拟合后重复上述过程。在许多情况下,这种针对新数据采集准则的"内部优化"很麻烦,因为它非凸/高度多峰、可能不可微,或会以其他方式妨碍数值优化器,尤其是当推断需要蒙特卡罗时。在这种情况下,用随机候选点上的离散搜索代替连续搜索并不少见。在这里,我们建议使用基于现有输入设计的Delaunay三角剖分的候选点。除了基于对传统凸包库的简单封装详细说明这些"tricands"的构造之外,我们还根据所涉几何准则的性质阐述了它们的若干优势。随后我们在基准问题上通过实验证明,与数值优化的采集函数和基于随机候选点的替代方案相比,tricands能带来更好的贝叶斯优化性能。 摘要:Bayesian optimization is a form of sequential design: idealize input-output relationships with a suitably flexible nonlinear regression model; fit to data from an initial experimental campaign; devise and optimize a criterion for selecting the next experimental condition(s) under the fitted model (e.g., via predictive equations) to target outcomes of interest (say minima); repeat after acquiring output under those conditions and updating the fit. In many situations this "inner optimization" over the new-data acquisition criterion is cumbersome because it is non-convex/highly multi-modal, may be non-differentiable, or may otherwise thwart numerical optimizers, especially when inference requires Monte Carlo. In such cases it is not uncommon to replace continuous search with a discrete one over random candidates. Here we propose using candidates based on a Delaunay triangulation of the existing input design. In addition to detailing construction of these "tricands", based on a simple wrapper around a conventional convex hull library, we promote several advantages based on properties of the geometric criterion involved. We then demonstrate empirically how tricands can lead to better Bayesian optimization performance compared to both numerically optimized acquisitions and random candidate-based alternatives on benchmark problems.
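"基于现有设计的Delaunay三角剖分生成候选点"的思路可以用如下草图说明(假设可用scipy.spatial.Delaunay;这里仅取各单纯形的重心作为内部候选点,是对论文tricands构造的简化示意,并非其完整方案):

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(3)

# Existing input design of a Bayesian-optimization run (2-d unit cube).
X = rng.uniform(size=(12, 2))

# Triangulate the design; candidates are the centroids of the Delaunay
# simplices, i.e. points "between" previously evaluated inputs.
# (The paper's full construction also covers the convex-hull boundary.)
tri = Delaunay(X)
cands = X[tri.simplices].mean(axis=1)

print(cands.shape)  # one interior candidate per simplex
```

与均匀随机候选点不同,这些候选点自动远离已评估的设计点,落在模型不确定性较大的"空隙"处,这正是摘要所述几何准则的直观含义。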
【5】 Posterior contraction rates for constrained deep Gaussian processes in density estimation and classification 标题:约束深度高斯过程在密度估计与分类中的后验收缩率 链接:https://arxiv.org/abs/2112.07280
作者:François Bachoc,Agnès Lagnoux 机构:Institut de Mathématiques de Toulouse, Université de Toulouse, CNRS, Toulouse, France 摘要:我们给出了非参数密度估计和分类中受约束的深度高斯过程的后验收缩率。约束以组合结构各层中高斯过程取值及其导数的界的形式给出。收缩率首先在一般框架下给出,用我们引入的一个考虑了约束条件的新集中函数来刻画。然后,将该一般框架应用于积分布朗运动、Riemann-Liouville过程和Matérn过程,以及标准光滑函数类。在每个例子中,我们都能恢复已知的极小极大速率。 摘要:We provide posterior contraction rates for constrained deep Gaussian processes in non-parametric density estimation and classification. The constraints are in the form of bounds on the values and on the derivatives of the Gaussian processes in the layers of the composition structure. The contraction rates are first given in a general framework, in terms of a new concentration function that we introduce and that takes the constraints into account. Then, the general framework is applied to integrated Brownian motions, Riemann-Liouville processes, and Matérn processes and to standard smoothness classes of functions. In each of these examples, we can recover known minimax rates.
【6】 Convex transport potential selection with semi-dual criterion 标题:基于半对偶准则的凸传输位势选择 链接:https://arxiv.org/abs/2112.07275
作者:Adrien Vacher,François-Xavier Vialard 机构:LIGM, Univ. Gustave Eiffel, CNRS, INRIA 摘要:在过去几年中,人们开发了许多计算模型来求解随机环境下的最优传输(OT),其中分布由样本表示。在这种情形下,目标是找到一个对未见数据具有良好泛化性能的传输映射,理想情况下是最接近真实映射的那个,而真实映射在实际场景中是未知的。然而,在缺乏真实映射的情况下,尽管泛化性能对模型选择至关重要,此前并没有定量准则来衡量它。我们建议利用OT的Brenier形式来完成这一任务。理论上,我们证明该形式保证:在一个取决于光滑度/强凸性的失真参数和一个统计偏差项的意义下,所选映射到真实映射的二次误差最小。该准则可通过凸优化估计,能够在OT的熵正则化、输入凸神经网络以及光滑强凸最近Brenier(SSNB)模型之间进行参数和模型选择。最后,我们进行了一个对OT在领域自适应中的使用提出疑问的实验。借助该准则,我们可以确定最接近源分布与目标分布之间真实OT映射的位势,并观察到所选位势并不是在下游迁移分类任务中表现最好的那个。 摘要:Over the past few years, numerous computational models have been developed to solve Optimal Transport (OT) in a stochastic setting, where distributions are represented by samples. In such situations, the goal is to find a transport map that has good generalization properties on unseen data, ideally the closest map to the ground truth, unknown in practical settings. However, in the absence of ground truth, no quantitative criterion has been put forward to measure its generalization performance although it is crucial for model selection. We propose to leverage the Brenier formulation of OT to perform this task. Theoretically, we show that this formulation guarantees that, up to a distortion parameter that depends on the smoothness/strong convexity and a statistical deviation term, the selected map achieves the lowest quadratic error to the ground truth. This criterion, estimated via convex optimization, enables parameter and model selection among entropic regularization of OT, input convex neural networks and smooth and strongly convex nearest-Brenier (SSNB) models. Last, we make an experiment questioning the use of OT in Domain-Adaptation. Thanks to the criterion, we can identify the potential that is closest to the true OT map between the source and the target and we observe that this selected potential is not the one that performs best for the downstream transfer classification task.
【7】 Inductive Semi-supervised Learning Through Optimal Transport 标题:基于最优传输的归纳半监督学习 链接:https://arxiv.org/abs/2112.07262
作者:Mourad El Hamri,Younès Bennani,Issam Falih 机构:LIPN - CNRS UMR, Université Sorbonne Paris Nord, France, LaMSN - La Maison des Sciences Numériques, France, LIMOS - CNRS UMR, Université Clermont Auvergne, France 备注:None 摘要:在本文中,我们研究归纳式半监督学习问题,其目标是对样本外数据给出标签预测。所提出的方法称为最优传输归纳(OTI),它将基于最优传输的转导算法(OTP)高效地扩展到二分类和多分类设置下的归纳任务。我们在若干数据集上进行了一系列实验,以便将所提方法与最新方法进行比较。实验证明了该方法的有效性。我们公开了我们的代码(代码可从以下网址获得:https://github.com/MouradElHamri/OTI)。 摘要:In this paper, we tackle the inductive semi-supervised learning problem that aims to obtain label predictions for out-of-sample data. The proposed approach, called Optimal Transport Induction (OTI), extends efficiently an optimal transport based transductive algorithm (OTP) to inductive tasks for both binary and multi-class settings. A series of experiments are conducted on several datasets in order to compare the proposed approach with state-of-the-art methods. Experiments demonstrate the effectiveness of our approach. We make our code publicly available (Code is available at: https://github.com/MouradElHamri/OTI).
【8】 Data-driven chimney fire risk prediction using machine learning and point process tools 标题:基于机器学习和点过程工具的数据驱动烟囱火灾风险预测 链接:https://arxiv.org/abs/2112.07257
作者:C. Lu,M. N. M. van Lieshout,M. de Graaf,P. Visscher 机构:Department of Applied Mathematics, University of Twente, Enschede, The Netherlands, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands, Thales Nederland B.V., Huizen, The Netherlands, Brandweer Twente, Enschede, The Netherlands 备注:30 pages, 13 figures 摘要:烟囱火灾是最常见的火灾类型之一。准确的预测和及时的预防对于减少其造成的危害至关重要。在本文中,我们开发了一个结合机器学习和统计建模的流程来预测烟囱火灾。首先,我们使用随机森林和排列重要性技术来识别信息量最大的解释变量。其次,我们设计了一个泊松点过程模型,并应用与之关联的logistic回归估计来估计参数。此外,我们使用二阶汇总统计量和残差来检验泊松模型假设。我们在特温特消防队收集的数据上实施了该建模流程,并获得了合理的预测。与类似研究相比,我们的方法有两个优点:i)借助随机森林,我们可以在考虑变量间依赖性的情况下非参数地选择解释变量;ii)借助logistic回归估计,我们可以通过将统计模型调校至聚焦火灾数据的重要区域和时段来高效地拟合模型。 摘要:Chimney fires constitute one of the most commonly occurring fire types. Precise prediction and prompt prevention are crucial in reducing the harm they cause. In this paper, we develop a combined machine learning and statistical modeling process to predict chimney fires. Firstly, we use random forests and permutation importance techniques to identify the most informative explanatory variables. Secondly, we design a Poisson point process model and apply associated logistic regression estimation to estimate the parameters. Moreover, we validate the Poisson model assumption using second-order summary statistics and residuals. We implement the modeling process on data collected by the Twente Fire Brigade and obtain plausible predictions. Compared to similar studies, our approach has two advantages: i) with random forests, we can select explanatory variables non-parametrically considering variable dependence; ii) using logistic regression estimation, we can fit the statistical model efficiently by tuning it to focus on important regions and times of the fire data.
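流程第一步所用的"排列重要性"思想可以手写成如下numpy示意(以线性最小二乘代替论文中的随机森林,数据为人为模拟,仅用于说明"打乱某列、度量评分下降"的机制):

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated "fire risk" data: only the first two covariates matter.
n = 5000
X = rng.standard_normal((n, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * rng.standard_normal(n)

# A simple fitted model (linear least squares stands in for a random forest).
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
base_mse = np.mean((y - X @ beta) ** 2)

def permutation_importance(j):
    # Increase in MSE when column j is shuffled, breaking its link to y.
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    return np.mean((y - Xp @ beta) ** 2) - base_mse

imp = np.array([permutation_importance(j) for j in range(5)])
print(imp)
```

信息量大的列被打乱后误差显著上升,而纯噪声列的重要性接近零;论文正是以此筛选出进入泊松点过程模型的解释变量。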
【9】 Zero-inflated Beta distribution regression modeling 标题:零膨胀Beta分布回归建模 链接:https://arxiv.org/abs/2112.07249
作者:Becky Tang,Henry A Frye,Alan E. Gelfand,John A Silander Jr 摘要:生态数据经常遇到的一个挑战是:如何解释、分析或建模含有大量零值的数据。零膨胀计数数据已受到很多关注,但对含有大量0的非负连续数据的模型仍然缺乏。我们考虑单位区间上的零膨胀数据,并在Beta回归模型的框架下建模以刻画两种类型的0:对于因偶然缺失产生的0,我们通过对潜在回归进行左截尾来建模;对于因环境不适宜产生的0,我们使用独立的伯努利设定在0处产生点质量。我们首先将该模型发展为关于环境特征的空间回归,然后扩展引入空间随机效应。我们分层设定模型,引入潜变量,在贝叶斯框架内拟合,并给出新的模型比较工具。我们的激励数据集是南非开普植物区若干样点上两种植物物种的覆盖度百分比。我们发现,环境特征使我们能够了解这两种类型0的发生率以及正的覆盖度百分比。我们还表明,空间随机效应模型提高了预测性能。所提出的建模使生态学家能够利用环境回归变量,更好地从"因不适宜而缺失"与"因偶然而缺失"以及存在时的丰度等方面理解物种的存在/缺失。 摘要:A frequent challenge encountered with ecological data is how to interpret, analyze, or model data having a high proportion of zeros. Much attention has been given to zero-inflated count data, whereas models for non-negative continuous data with an abundance of 0s are lacking. We consider zero-inflated data on the unit interval and provide modeling to capture two types of 0s in the context of the Beta regression model. We model 0s due to missing by chance through left censoring of a latent regression, and 0s due to unsuitability using an independent Bernoulli specification to create a point mass at 0. We first develop the model as a spatial regression in environmental features and then extend to introduce spatial random effects. We specify models hierarchically, employing latent variables, fit them within a Bayesian framework, and present new model comparison tools. Our motivating dataset consists of percent cover abundance of two plant species at a collection of sites in the Cape Floristic Region of South Africa. We find that environmental features enable learning about the incidence of both types of 0s as well as the positive percent covers. We also show that the spatial random effects model improves predictive performance. The proposed modeling enables ecologists, using environmental regressors, to extract a better understanding of the presence/absence of species in terms of absence due to unsuitability vs. missingness by chance, as well as abundance when present.
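上述两类0的生成机制(伯努利点质量对应"不适宜",潜在Beta变量的左截尾对应"偶然缺失")可以用如下示意性模拟来说明(numpy,各参数均为任意示例值,并非论文拟合结果):

```python
import numpy as np

rng = np.random.default_rng(5)

def r_zib(n, p_unsuitable, a, b, censor):
    # Zero-inflated Beta draws with two kinds of zeros:
    #  - structural zeros: a Bernoulli point mass at 0 (site unsuitable);
    #  - censored zeros: a latent Beta draw falling below `censor`
    #    (species present in principle, but missed by chance).
    unsuitable = rng.uniform(size=n) < p_unsuitable
    latent = rng.beta(a, b, size=n)
    y = np.where(latent < censor, 0.0, latent)  # left-censored zeros
    y[unsuitable] = 0.0                          # structural zeros
    return y

y = r_zib(10_000, p_unsuitable=0.3, a=2.0, b=5.0, censor=0.05)
print((y == 0).mean(), y.max())
```

论文的贡献在于反向操作:给定这样的数据,用分层贝叶斯模型把两类0的发生率与正覆盖度同时与环境协变量联系起来。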
【10】 Dynamic Factor Models with Sparse VAR Idiosyncratic Components 标题:具有稀疏VAR特征分量的动态因子模型 链接:https://arxiv.org/abs/2112.07149
作者:Jonas Krampe,Luca Margaritella 机构:University of Mannheim, Lund University 摘要:我们通过利用稠密建模与稀疏建模两者的积极方面来调和这两个世界。我们采用动态因子模型,并假设特异项服从一个允许截面与时间依赖的稀疏向量自回归模型(VAR)。估计分两步进行:第一步,通过主成分分析估计因子及其载荷;第二步,通过对估计出的特异成分进行正则化回归来估计稀疏VAR。我们证明了当时间维度与截面维度同时发散时所提估计方法的一致性。在第二步中,需要考虑第一步的估计误差。在这里,我们没有采用简单代入因子估计标准速率的朴素做法,而是推导了误差的一个更精细的表达式,这使我们能够得到更紧的速率。我们讨论了其对预测以及谱密度矩阵之逆的半参数估计的意义,并用针对VAR滞后阶数和因子个数的联合信息准则补充了我们的程序。有限样本性能通过大量模拟加以说明。在实证方面,我们使用FRED-MD数据集评估了所提方法在宏观经济预测中的表现。 摘要:We reconcile the two worlds of dense and sparse modeling by exploiting the positive aspects of both. We employ a dynamic factor model and assume the idiosyncratic term follows a sparse vector autoregressive model (VAR) which allows for cross-sectional and time dependence. The estimation is articulated in two steps: first, the factors and their loadings are estimated via principal component analysis and second, the sparse VAR is estimated by regularized regression on the estimated idiosyncratic components. We prove consistency of the proposed estimation approach as the time and cross-sectional dimension diverge. In the second step, the estimation error of the first step needs to be accounted for. Here, we do not follow the naive approach of simply plugging in the standard rates derived for the factor estimation. Instead, we derive a more refined expression of the error. This enables us to derive tighter rates. We discuss the implications to forecasting and semi-parametric estimation of the inverse of the spectral density matrix and we complement our procedure with a joint information criteria for the VAR lag-length and the number of factors. The finite sample performance is illustrated by means of an extensive simulation exercise. Empirically, we assess the performance of the proposed method for macroeconomic forecasting using the FRED-MD dataset.
【11】 Linear Discriminant Analysis with High-dimensional Mixed Variables 标题:高维混合变量的线性判别分析 链接:https://arxiv.org/abs/2112.07145
作者:Binyan Jiang,Chenlei Leng,Cheng Wang,Zhongqing Yang 机构:Department of Applied Mathematics, The Hong Kong Polytechnic University; Department of Statistics, University of Warwick; School of Mathematical Sciences, Shanghai Jiao Tong University 摘要:同时包含分类变量和连续变量的数据集在许多领域经常遇到,并且随着现代测量技术的快速发展,这些变量的维数可能非常高。尽管最近在连续变量的高维数据建模方面取得了进展,但能够处理混合变量集的方法仍然稀缺。为填补这一空白,本文提出了一种对具有混合变量的高维观测进行分类的新方法。我们的框架建立在位置模型之上,其中连续变量在给定分类变量条件下的分布被假设为高斯分布。我们通过核平滑克服了必须将数据分割成指数多个单元(即分类变量的组合)的挑战,并为其带宽选择提供了新的视角,以保证一个类似Bochner引理的结果成立,这不同于通常的偏差-方差权衡。我们证明了模型中的两组参数可以分别估计,并为其估计提供了惩罚似然。我们建立了关于估计精度和误分类率的结果,并通过大量模拟和真实数据研究说明了所提分类器的有竞争力的表现。 摘要:Datasets containing both categorical and continuous variables are frequently encountered in many areas, and with the rapid development of modern measurement technologies, the dimensions of these variables can be very high. Despite the recent progress made in modelling high-dimensional data for continuous variables, there is a scarcity of methods that can deal with a mixed set of variables. To fill this gap, this paper develops a novel approach for classifying high-dimensional observations with mixed variables. Our framework builds on a location model, in which the distributions of the continuous variables conditional on categorical ones are assumed Gaussian. We overcome the challenge of having to split data into exponentially many cells, or combinations of the categorical variables, by kernel smoothing, and provide new perspectives for its bandwidth choice to ensure an analogue of Bochner's Lemma, which is different to the usual bias-variance tradeoff. We show that the two sets of parameters in our model can be separately estimated and provide penalized likelihood for their estimation. Results on the estimation accuracy and the misclassification rates are established, and the competitive performance of the proposed classifier is illustrated by extensive simulation and real data studies.
【12】 Non Asymptotic Bounds for Optimization via Online Multiplicative Stochastic Gradient Descent 标题:在线乘性随机梯度下降法优化问题的非渐近界 链接:https://arxiv.org/abs/2112.07110
作者:Riddhiman Bhattacharya 机构:University of Minnesota 摘要:随机梯度下降(SGD)的梯度噪声被认为在其性质(如逃逸低势点和正则化)中起着关键作用。过去的研究表明,通过小批量处理产生的SGD误差的协方差在确定其正则化和从低电位点逃逸方面起着关键作用。然而,对于误差分布对算法行为的影响程度,还没有太多的探讨。在这一领域一些新研究的推动下,我们通过显示具有相同SGD均值和协方差结构的噪声类具有相似的性质来证明普适性结果。我们主要考虑吴等人引入的乘法随机梯度下降算法(M-SGD),它比通过小批量处理的SGD算法具有更广泛的噪声级。我们主要针对通过小批量处理对应于SGD的随机微分方程,建立了M-SGD算法的非渐近界。我们还证明了M-SGD算法的误差近似为标度高斯分布,在M-SGD算法的任何固定点上的平均值为$0$。 摘要:The gradient noise of Stochastic Gradient Descent (SGD) is considered to play a key role in its properties (e.g. escaping low potential points and regularization). Past research has indicated that the covariance of the SGD error done via minibatching plays a critical role in determining its regularization and escape from low potential points. It is however not much explored how much the distribution of the error influences the behavior of the algorithm. Motivated by some new research in this area, we prove universality results by showing that noise classes that have the same mean and covariance structure of SGD via minibatching have similar properties. We mainly consider the Multiplicative Stochastic Gradient Descent (M-SGD) algorithm as introduced by Wu et al., which has a much more general noise class than the SGD algorithm done via minibatching. We establish nonasymptotic bounds for the M-SGD algorithm mainly with respect to the Stochastic Differential Equation corresponding to SGD via minibatching. We also show that the M-SGD error is approximately a scaled Gaussian distribution with mean $0$ at any fixed point of the M-SGD algorithm.
【13】 Methods for Eliciting Informative Prior Distributions 标题:获取信息先验分布的方法 链接:https://arxiv.org/abs/2112.07090
作者:Julia R. Falconer,Eibe Frank,Devon L. L. Polaschek,Chaitanya Joshi 机构:Department of Mathematics, University of Waikato, Hamilton, New Zealand, Department of Computer Science, Hamilton,New Zealand, School of Psychology 摘要:为贝叶斯推理获取信息性先验分布通常是复杂且具有挑战性的。虽然流行的方法依赖于向专家提出基于概率的问题来量化不确定性,但这些方法并非没有缺点,并且存在许多替代的启发式方法。本文探讨了获取按类型分类的信息先验的方法,并简要讨论了它们的优点和局限性。两个有代表性的应用贯穿始终,用于探索现有方法的适用性或不足,以获取这些问题的信息先验。这项工作的主要目的是突出目前技术水平中的一些差距,并确定未来研究的方向。 摘要:Eliciting informative prior distributions for Bayesian inference can often be complex and challenging. While popular methods rely on asking experts probability based questions to quantify uncertainty, these methods are not without their drawbacks and many alternative elicitation methods exist. This paper explores methods for eliciting informative priors categorized by type and briefly discusses their strengths and limitations. Two representative applications are used throughout to explore the suitability, or lack thereof, of the existing methods for eliciting informative priors for these problems. The primary aim of this work is to highlight some of the gaps in the present state of art and identify directions for future research.
【14】 The integrated copula spectrum 标题:积分Copula谱 链接:https://arxiv.org/abs/2112.07077
作者:Yuichi Goto,Tobias Kley,Ria Van Hecke,Stanislav Volgushev,Holger Dette,Marc Hallin 机构:Waseda University; Georg-August-Universität Göttingen; Ruhr-Universität Bochum 备注:74 pages, 28 figures 摘要:频域方法是时间序列分析统计工具箱中无处不在的一部分。近年来,人们对开发新的谱方法和工具给予了相当大的兴趣,这些方法和工具能刻画整个联合分布中的动态,从而避免经典的基于$L^2$的谱方法的局限性。不过,该文献中提出的大多数谱概念都有一个主要缺点:其估计需要选择平滑参数,而平滑参数对估计质量有很大影响,并给统计推断带来挑战。在本文中,结合基于copula的谱的概念,我们引入了copula谱分布函数(即积分copula谱)的概念。这种积分copula谱保留了基于copula的谱的优点,但无需平滑参数即可估计。我们给出了这样的估计量,并基于一个泛函中心极限定理对其渐近性质进行了深入的理论分析。我们利用这些结果来检验经典谱方法无法处理的各种假设,例如缺乏时间可逆性或尾部动态中的不对称性。 摘要:Frequency domain methods form a ubiquitous part of the statistical toolbox for time series analysis. In recent years, considerable interest has been given to the development of new spectral methodology and tools capturing dynamics in the entire joint distributions and thus avoiding the limitations of classical, $L^2$-based spectral methods. Most of the spectral concepts proposed in that literature suffer from one major drawback, though: their estimation requires the choice of a smoothing parameter, which has a considerable impact on estimation quality and poses challenges for statistical inference. In this paper, associated with the concept of copula-based spectrum, we introduce the notion of copula spectral distribution function or integrated copula spectrum. This integrated copula spectrum retains the advantages of copula-based spectra but can be estimated without the need for smoothing parameters. We provide such estimators, along with a thorough theoretical analysis, based on a functional central limit theorem, of their asymptotic properties. We leverage these results to test various hypotheses that cannot be addressed by classical spectral methods, such as the lack of time-reversibility or asymmetry in tail dynamics.
【15】 Score-Based Generative Modeling with Critically-Damped Langevin Diffusion 标题:基于临界阻尼朗之万扩散的分数生成式建模 链接:https://arxiv.org/abs/2112.07068
作者:Tim Dockhorn,Arash Vahdat,Karsten Kreis 机构:NVIDIA, University of Waterloo, Vector Institute 摘要:基于分数的生成模型(SGM)已展现出卓越的合成质量。SGM依赖于一个逐渐将数据扰动至可处理分布的扩散过程,而生成模型学习去噪。除数据分布本身外,该去噪任务的复杂性由扩散过程唯一决定。我们认为,当前的SGM采用了过于简单的扩散,导致不必要地复杂的去噪过程,从而限制了生成建模性能。基于与统计力学的联系,我们提出了一种新的临界阻尼朗之万扩散(CLD),并表明基于CLD的SGM具有优越的性能。CLD可以解释为在扩展空间中运行联合扩散,其中辅助变量可被视为如哈密顿动力学中那样与数据变量耦合的"速度"。我们为CLD推导了一个新的分数匹配目标,表明模型只需学习给定数据下速度的条件分布的分数函数,这比直接学习数据的分数更容易。我们还为基于CLD的扩散模型导出了一种新的高效合成采样方案。我们发现,在相似的网络结构和采样计算预算下,CLD在合成质量上优于以前的SGM。我们表明,我们新的CLD采样器明显优于Euler-Maruyama等求解器。我们的框架为基于分数的去噪扩散模型提供了新的见解,并可直接用于高分辨率图像合成。项目页面和代码:https://nv-tlabs.github.io/CLD-SGM。 摘要:Score-based generative models (SGMs) have demonstrated remarkable synthesis quality. SGMs rely on a diffusion process that gradually perturbs the data towards a tractable distribution, while the generative model learns to denoise. The complexity of this denoising task is, apart from the data distribution itself, uniquely determined by the diffusion process. We argue that current SGMs employ overly simplistic diffusions, leading to unnecessarily complex denoising processes, which limit generative modeling performance. Based on connections to statistical mechanics, we propose a novel critically-damped Langevin diffusion (CLD) and show that CLD-based SGMs achieve superior performance. CLD can be interpreted as running a joint diffusion in an extended space, where the auxiliary variables can be considered "velocities" that are coupled to the data variables as in Hamiltonian dynamics. We derive a novel score matching objective for CLD and show that the model only needs to learn the score function of the conditional distribution of the velocity given data, an easier task than learning scores of the data directly. We also derive a new sampling scheme for efficient synthesis from CLD-based diffusion models. We find that CLD outperforms previous SGMs in synthesis quality for similar network architectures and sampling compute budgets.
We show that our novel sampler for CLD significantly outperforms solvers such as Euler--Maruyama. Our framework provides new insights into score-based denoising diffusion models and can be readily used for high-resolution image synthesis. Project page and code: https://nv-tlabs.github.io/CLD-SGM.
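作为数值直觉:带速度辅助变量的朗之万动力学在简单二次势$U(x)=x^2/2$下的平稳分布为关于$(x,v)$的标准正态,可用如下Euler-Maruyama模拟草图验证(numpy;摩擦系数取临界值$\Gamma=2$,步长、链数等均为任意示例值,仅演示"速度与数据变量耦合"的扩散机制,并非论文的CLD-SGM实现):

```python
import numpy as np

rng = np.random.default_rng(6)

# Langevin dynamics with a velocity variable for U(x) = x^2 / 2:
#   dx = v dt,   dv = (-x - Gamma * v) dt + sqrt(2 * Gamma) dW,
# with Gamma = 2 the critical friction for unit frequency. The joint
# stationary law is N(0, I) over (x, v): v is the auxiliary "momentum"
# coupled to the data variable x, as in Hamiltonian dynamics.
gamma, dt, steps, chains = 2.0, 0.01, 5000, 2000
x = np.zeros(chains)
v = np.zeros(chains)
for _ in range(steps):
    x = x + v * dt
    v = v + (-x - gamma * v) * dt + np.sqrt(2 * gamma * dt) * rng.standard_normal(chains)

print(x.var(), v.var())  # both close to 1 at stationarity
```

噪声只直接注入速度分量,数据分量$x$的轨迹因此更平滑,这正是摘要中"去噪任务变得更容易"的直观来源。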
【16】 Dynamic Learning of Correlation Potentials for a Time-Dependent Kohn-Sham System 标题:含时Kohn-Sham系统关联势的动态学习 链接:https://arxiv.org/abs/2112.07067
作者:Harish S. Bhat,Kevin Collins,Prachi Gupta,Christine M. Isborn 机构:Department of Applied Mathematics, University of California Merced, Department of Physics, University of California Merced, Department of Applied Mathematics, Department of Chemistry and Biochemistry, University of California 备注:18 pages, 5 figures 摘要:我们发展了在一维空间中学习含时Kohn-Sham(TDKS)系统关联势的方法。我们从一个低维双电子系统出发,对其可以数值求解含时薛定谔方程;这产生了适合用于训练关联势模型的电子密度。我们将学习问题表述为:在动力学服从TDKS方程的约束下优化一个最小二乘目标。通过应用伴随方法,我们开发了高效计算梯度的方法,从而学习关联势模型。我们的结果表明,可以学习关联势的取值,使所得电子密度与真实密度相匹配。我们还展示了如何学习带记忆的关联势泛函,并展示了一个这样的模型,它对训练集之外的轨迹也能给出合理的结果。 摘要:We develop methods to learn the correlation potential for a time-dependent Kohn-Sham (TDKS) system in one spatial dimension. We start from a low-dimensional two-electron system for which we can numerically solve the time-dependent Schrödinger equation; this yields electron densities suitable for training models of the correlation potential. We frame the learning problem as one of optimizing a least-squares objective subject to the constraint that the dynamics obey the TDKS equation. Applying adjoints, we develop efficient methods to compute gradients and thereby learn models of the correlation potential. Our results show that it is possible to learn values of the correlation potential such that the resulting electron densities match ground truth densities. We also show how to learn correlation potential functionals with memory, demonstrating one such model that yields reasonable results for trajectories outside the training set.
【17】 Fiducial Inference and Decision Theory 标题:基准推断与决策理论 链接:https://arxiv.org/abs/2112.07060
作者:G. Taraldsen,B. H. Lindquist 机构:Norwegian University of Science and Technology, Trondheim, Norway 摘要:几十年前,大多数统计学家就断定基准推断对他们而言是无意义的。然而,Hannig等人(2016)及其他学者促成了新的兴趣与关注。基准推断类似于贝叶斯分析,但不需要先验。先验信息被"假设一个特定的数据生成方程"所取代。Berger(1985)阐明贝叶斯分析与统计决策理论是协调一致的。Taraldsen和Lindqvist(2013)表明基准理论与统计决策理论同样配合良好。本文的目的是结合最近的数学结果来解释和例证这一点。 摘要:The majority of the statisticians concluded many decades ago that fiducial inference was nonsensical to them. Hannig et al. (2016) and others have, however, contributed to a renewed interest and focus. Fiducial inference is similar to Bayesian analysis, but without requiring a prior. The prior information is replaced by assuming a particular data generating equation. Berger (1985) explains that Bayesian analysis and statistical decision theory are in harmony. Taraldsen and Lindqvist (2013) show that fiducial theory and statistical decision theory also play well together. The purpose of this text is to explain and exemplify this together with recent mathematical results.
【18】 Limits of epidemic prediction using SIR models 标题:SIR模型在疫情预测中的局限性 链接:https://arxiv.org/abs/2112.07039
作者:Omar Melikechi,Alexander L. Young,Tao Tang,Trevor Bowman,David Dunson,James Johndrow 机构:Department of Mathematics, Duke University, Department of Statistics, Harvard University, Department of Statistics, Duke University, Department of Statistics, University of Pennsylvania 摘要:易感-感染-恢复(SIR)方程及其扩展构成了一组常用于理解和预测流行病进程的模型。在实践中,在疫情爆发早期、即疫情达到峰值之前,根据带噪声的观测值估计模型参数具有重大意义。这使得我们可以预测该流行病的后续进程并设计适当的干预措施。然而,在这种情形下准确推断SIR模型参数是有问题的。本文就SIR模型的实际可识别性问题提供了新颖的理论见解。我们的理论为常用流行病模型的推断极限提供了新的理解,并为当前的"模拟-校验"方法提供了有价值的补充。我们通过在一个真实流行病数据集上的应用说明了一些实际意义。 摘要:The Susceptible-Infectious-Recovered (SIR) equations and their extensions comprise a commonly utilized set of models for understanding and predicting the course of an epidemic. In practice, it is of substantial interest to estimate the model parameters based on noisy observations early in the outbreak, well before the epidemic reaches its peak. This allows prediction of the subsequent course of the epidemic and design of appropriate interventions. However, accurately inferring SIR model parameters in such scenarios is problematic. This article provides novel, theoretical insight on this issue of practical identifiability of the SIR model. Our theory provides new understanding of the inferential limits of routinely used epidemic models and provides a valuable addition to current simulate-and-check methods. We illustrate some practical implications through application to a real-world epidemic data set.
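SIR方程的前向模拟是此类参数拟合与可识别性分析的基础,可用如下极简前向Euler草图实现(numpy;$\beta$、$\gamma$、人口规模等均为任意示例值,与论文的具体数值无关):

```python
import numpy as np

def simulate_sir(beta, gamma, s0, i0, days, dt=0.1):
    # Forward-Euler integration of the SIR equations
    #   S' = -beta*S*I/N,  I' = beta*S*I/N - gamma*I,  R' = gamma*I.
    # Returns the infectious count I sampled once per day.
    N = s0 + i0
    S, I, R = s0, i0, 0.0
    per_day = int(round(1 / dt))
    traj = []
    for step in range(int(days / dt)):
        if step % per_day == 0:
            traj.append(I)
        new_inf = beta * S * I / N * dt
        new_rec = gamma * I * dt
        S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
    return np.array(traj)

# Basic reproduction number R0 = beta / gamma = 2.5:
# the epidemic grows, peaks, then declines.
infected = simulate_sir(beta=0.5, gamma=0.2, s0=9999.0, i0=1.0, days=120)
peak_day = int(infected.argmax())
print(peak_day, infected[peak_day])
```

论文讨论的困难在于:仅用峰值之前的一小段(带噪声的)此类轨迹,多组不同的$(\beta,\gamma)$可能产生几乎相同的早期曲线,从而限制参数的实际可识别性。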
【19】 Boosting Independent Component Analysis 标题:增强独立成分分析 链接:https://arxiv.org/abs/2112.06920
作者:Yunpeng Li,ZhaoHui Ye 机构:Department of Automation, Tsinghua University, Beijing , China 摘要:独立成分分析旨在从线性混合物中尽可能独立地恢复未知成分。该技术已广泛应用于数据分析、信号处理和机器学习等领域。本文提出了一种新的基于boosting的独立分量分析算法。我们的算法通过在极大似然估计中引入boosting,填补了非参数独立分量分析的空白。与目前已知的许多算法相比,各种实验验证了其性能。 摘要:Independent component analysis is intended to recover the unknown components as independent as possible from their linear mixtures. This technique has been widely used in many fields, such as data analysis, signal processing, and machine learning. In this paper, we present a novel boosting-based algorithm for independent component analysis. Our algorithm fills the gap in the nonparametric independent component analysis by introducing boosting to maximum likelihood estimation. A variety of experiments validate its performance compared with many of the presently known algorithms.
【20】 DiPS: Differentiable Policy for Sketching in Recommender Systems 标题:DiPS:推荐系统中草图绘制的可微策略 链接:https://arxiv.org/abs/2112.07616
作者:Aritra Ghosh,Saayan Mitra,Andrew Lan 机构:University of Massachusetts Amherst, Adobe Research 备注:AAAI 2022 with supplementary material 摘要:在顺序推荐系统应用中,重要的是开发能够捕获用户随时间变化的兴趣的模型,以便成功推荐他们未来可能与之交互的项目。对于历史记录很长的用户,基于循环神经网络的典型模型往往会忘记遥远过去的重要项目。最近的工作表明,存储过去项目的小草图可以改进顺序推荐任务。然而,这些工作都依赖于静态草图策略,即选择要保留在草图中的项目的启发式方法,这不一定是最优的,并且不能随着时间的推移利用更多训练数据加以改进。在本文中,我们提出了可微草图策略(DiPS),该框架以端到端的方式、与推荐系统模型一起学习数据驱动的草图策略,以显式地最大化未来的推荐质量。我们还提出了一个计算高效的梯度近似估计器,用于优化草图算法参数。我们在各种实际设置下验证了DiPS在真实世界数据集上的有效性,并表明与现有草图策略相比,它所需的草图项最多可减少 $50\%$,即可达到相同的预测质量。 摘要:In sequential recommender system applications, it is important to develop models that can capture users' evolving interest over time to successfully recommend future items that they are likely to interact with. For users with long histories, typical models based on recurrent neural networks tend to forget important items in the distant past. Recent works have shown that storing a small sketch of past items can improve sequential recommendation tasks. However, these works all rely on static sketching policies, i.e., heuristics to select items to keep in the sketch, which are not necessarily optimal and cannot improve over time with more training data. In this paper, we propose a differentiable policy for sketching (DiPS), a framework that learns a data-driven sketching policy in an end-to-end manner together with the recommender system model to explicitly maximize recommendation quality in the future. We also propose an approximate estimator of the gradient for optimizing the sketching algorithm parameters that is computationally efficient. We verify the effectiveness of DiPS on real-world datasets under various practical settings and show that it requires up to $50\%$ fewer sketch items than existing sketching policies to reach the same predictive quality.
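摘要中提到的"静态草图策略"可以用一个简单启发式示意:固定容量 k,按出现频次保留物品,频次相同时按新近度取舍。这只是 DiPS 想要取代的那类手工规则的最小示例,并非论文的可微策略本身;history、k 等均为虚构输入:

```python
from collections import Counter

def static_sketch(history, k):
    """Static heuristic sketching policy: keep the k most frequent items,
    breaking ties by recency (later interactions win). A hand-crafted
    baseline of the kind DiPS learns to replace; not the paper's method."""
    counts = Counter(history)
    last_seen = {item: idx for idx, item in enumerate(history)}
    ranked = sorted(counts, key=lambda it: (counts[it], last_seen[it]), reverse=True)
    return ranked[:k]

# "a" appears 3 times, "b" twice; "d" beats "c" on recency but both lose
# on frequency, so a capacity-2 sketch keeps ["a", "b"].
sketch = static_sketch(["a", "b", "a", "c", "b", "a", "d"], k=2)
```

与此对照,DiPS 把"保留哪些物品"本身参数化,并用(近似的)梯度与推荐模型联合训练。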
【21】 Speeding up Learning Quantum States through Group Equivariant Convolutional Quantum Ansätze 标题:利用群等变卷积量子拟设加速学习量子态 链接:https://arxiv.org/abs/2112.07611
作者:Han Zheng,Zimu Li,Junyu Liu,Sergii Strelchuk,Risi Kondor 机构:Department of Statistics, The University of Chicago, Chicago, IL , USA, DAMTP, Center for Mathematical Sciences, University of Cambridge, Cambridge CB,WA, UK, Pritzker School of Molecular Engineering, The University of Chicago, Chicago, IL , USA 备注:16 pages, 12 figures 摘要:我们发展了$S_n$等变量子卷积电路的理论框架,建立在Jordan的置换量子计算(PQC)形式之上并将其显著推广。我们证明,量子电路是傅里叶空间神经结构的自然选择:与对称群上已知最优的经典快速傅里叶变换(FFT)相比,量子电路在计算$S_n$-傅里叶系数的矩阵元素时具有超指数加速。特别地,我们利用Okounkov-Vershik方法证明了Harrow关于$\operatorname{SU}(d)$与$S_n$不可约表示基之间等价性的陈述(博士论文,2005年,第160页),并利用Young-Jucys-Murphy(YJM)元素建立了$S_n$等变卷积量子交替拟设($S_n$-CQA)。我们证明了$S_n$-CQA是稠密的,因而在每个$S_n$不可约表示块中均可表达,这可以作为未来量子机器学习和优化应用的通用模型。我们的方法从表示论的角度为证明量子近似优化算法(QAOA)的普适性提供了另一种途径。我们的框架可以自然地应用于具有全局$\operatorname{SU}(d)$对称性的一大类问题。我们通过数值模拟展示了该拟设在矩形晶格和Kagome晶格上寻找$J_1$--$J_2$反铁磁海森堡模型基态符号结构的有效性。我们的工作确定了特定机器学习问题上的量子优势,并首次将著名的Okounkov-Vershik表示理论应用于机器学习和量子物理。 摘要:We develop a theoretical framework for $S_n$-equivariant quantum convolutional circuits, building on and significantly generalizing Jordan's Permutational Quantum Computing (PQC) formalism. We show that quantum circuits are a natural choice for Fourier space neural architectures affording a super-exponential speedup in computing the matrix elements of $S_n$-Fourier coefficients compared to the best known classical Fast Fourier Transform (FFT) over the symmetric group. In particular, we utilize the Okounkov-Vershik approach to prove Harrow's statement (Ph.D. Thesis 2005 p.160) on the equivalence between $\operatorname{SU}(d)$- and $S_n$-irrep bases and to establish the $S_n$-equivariant Convolutional Quantum Alternating Ansätze ($S_n$-CQA) using Young-Jucys-Murphy (YJM) elements. We prove that $S_n$-CQA are dense, thus expressible within each $S_n$-irrep block, which may serve as a universal model for potential future quantum machine learning and optimization applications.
Our method provides another way to prove the universality of the Quantum Approximate Optimization Algorithm (QAOA) from the representation-theoretical point of view. Our framework can be naturally applied to a wide array of problems with global $\operatorname{SU}(d)$ symmetry. We present numerical simulations to showcase the effectiveness of the Ansätze to find the sign structure of the ground state of the $J_1$--$J_2$ antiferromagnetic Heisenberg model on the rectangular and Kagome lattices. Our work identifies quantum advantage for a specific machine learning problem, and provides the first application of the celebrated Okounkov-Vershik representation theory to machine learning and quantum physics.
【22】 M3E2: Multi-gate Mixture-of-experts for Multi-treatment Effect Estimation 标题:M3E2:用于多治疗效果评估的多门混合专家 链接:https://arxiv.org/abs/2112.07574
作者:Raquel Aoki,Yizhou Chen,Martin Ester 机构:Simon Fraser University 备注:4 figures, 10 pages 摘要:这项工作提出了M3E2,一个用于估计多种治疗效果的多任务学习神经网络模型。与现有方法相比,M3E2对同时应用于同一单位的多种治疗效果、连续和二元治疗以及大量协变量具有鲁棒性。我们在三个合成基准数据集上将M3E2与三个基线进行了比较:其中两个包含多种治疗,一个包含单一治疗。我们的分析表明,我们的方法具有优越的性能,对真正的治疗效果做出了更为确信的估计。代码可在 github.com/raquelaoki/M3E2 获得。 摘要:This work proposes the M3E2, a multi-task learning neural network model to estimate the effect of multiple treatments. In contrast to existing methods, M3E2 is robust to multiple treatment effects applied simultaneously to the same unit, continuous and binary treatments, and many covariates. We compared M3E2 with three baselines in three synthetic benchmark datasets: two with multiple treatments and one with one treatment. Our analysis showed that our method has superior performance, making more assertive estimations of the true treatment effects. The code is available at github.com/raquelaoki/M3E2.
【23】 The high-dimensional asymptotics of first order methods with random data 标题:随机数据一阶方法的高维渐近性 链接:https://arxiv.org/abs/2112.07572
作者:Michael Celentano,Chen Cheng,Andrea Montanari 备注:83 pages 摘要:我们研究了${\mathbb R}^{d\times k}$中的一类确定性流,它由元素为独立同分布中心化次高斯变量的随机矩阵${\boldsymbol X}\in{\mathbb R}^{n\times d}$参数化。我们在高维极限下刻画了这些流在有界时间范围内的渐近行为,其中$n,d\to\infty$,$k$固定,且纵横比收敛:$n/d\to\delta$。我们证明的渐近特征由一个$k$维非线性随机过程给出,其参数由不动点条件确定。这种类型的表征在物理学中称为动态平均场理论。对于一些自旋玻璃模型,过去已经获得了这种类型的严格结果。我们的证明基于时间离散化和向某些迭代格式(称为近似消息传递(AMP)算法)的归约,而早期工作则基于大偏差理论和随机过程理论。新的方法允许更初等的证明,并意味着流的高维行为对于${\boldsymbol X}$元素的分布是普适的。作为具体应用,在随机设计假设下,我们得到了统计学和机器学习中一些经典模型中梯度流的高维刻画。 摘要:We study a class of deterministic flows in ${\mathbb R}^{d\times k}$, parametrized by a random matrix ${\boldsymbol X}\in {\mathbb R}^{n\times d}$ with i.i.d. centered subgaussian entries. We characterize the asymptotic behavior of these flows over bounded time horizons, in the high-dimensional limit in which $n,d\to\infty$ with $k$ fixed and converging aspect ratios $n/d\to\delta$. The asymptotic characterization we prove is in terms of a nonlinear stochastic process in $k$ dimensions, whose parameters are determined by a fixed point condition. This type of characterization is known in physics as dynamical mean field theory. Rigorous results of this type have been obtained in the past for a few spin glass models. Our proof is based on time discretization and a reduction to certain iterative schemes known as approximate message passing (AMP) algorithms, as opposed to earlier work that was based on large deviations theory and stochastic processes theory. The new approach allows for a more elementary proof and implies that the high-dimensional behavior of the flow is universal with respect to the distribution of the entries of ${\boldsymbol X}$. As specific applications, we obtain high-dimensional characterizations of gradient flow in some classical models from statistics and machine learning, under a random design assumption.
【24】 The Oracle estimator is suboptimal for global minimum variance portfolio optimisation 标题:对于全局最小方差投资组合优化,Oracle估计器是次优的 链接:https://arxiv.org/abs/2112.07521
作者:Christian Bongiorno,Damien Challet 机构:Université Paris-Saclay, CentraleSupélec, Laboratoire de Mathématiques et Informatique pour la Complexité et les Systèmes, Gif-sur-Yvette, France 摘要:一个常见的误解是,协方差矩阵的Oracle特征值估计器产生最佳的已实现投资组合绩效。实际上,Oracle估计器只是修改经验协方差矩阵的特征值,以最小化滤波后的协方差矩阵和已实现协方差矩阵之间的Frobenius距离。只有当样本内特征向量与样本外特征向量一致时,才能得到最佳投资组合。在所有其他情况下,最优特征值校正可以从二次规划问题的解中获得。求解该问题表明,Oracle估计器仅在每个资产的数据点数趋于无穷的极限下、且仅在平稳系统中产生最佳投资组合。 摘要:A common misconception is that the Oracle eigenvalue estimator of the covariance matrix yields the best realized portfolio performance. In reality, the Oracle estimator simply modifies the empirical covariance matrix eigenvalues so as to minimize the Frobenius distance between the filtered and the realized covariance matrices. This leads to the best portfolios only when the in-sample eigenvectors coincide with the out-of-sample ones. In all the other cases, the optimal eigenvalue correction can be obtained from the solution of a Quadratic-Programming problem. Solving it shows that the Oracle estimators only yield the best portfolios in the limit of infinite data points per asset and only in stationary systems.
【25】 Efficient differentiable quadratic programming layers: an ADMM approach 标题:高效可微二次规划层:ADMM方法 链接:https://arxiv.org/abs/2112.07464
作者:Andrew Butler,Roy Kwon 机构:University of Toronto, Department of Mechanical and Industrial Engineering 摘要:神经网络结构的最新进展允许将凸优化问题无缝集成为端到端可训练神经网络中的可微层。然而,将大中型二次规划集成到深度神经网络结构中是一个挑战,因为用内点方法精确求解二次规划在变量数量上具有最坏情况下的立方复杂性。在本文中,我们提出了一种基于交替方向乘数法(ADMM)的替代网络层架构,该架构能够扩展到具有中等数量变量的问题。后向微分是通过对修正的定点迭代的残差映射进行隐式微分来实现的。模拟结果证明了ADMM层的计算优势,对于中等规模的问题,它比OptNet二次规划层大约快一个数量级。此外,与基于KKT最优性条件的展开微分或隐式微分的标准方法相比,从记忆和计算的角度来看,我们新的后向传递例程是有效的。最后,我们以综合预测和优化范式中的投资组合优化为例进行总结。 摘要:Recent advances in neural-network architecture allow for seamless integration of convex optimization problems as differentiable layers in an end-to-end trainable neural network. Integrating medium and large scale quadratic programs into a deep neural network architecture, however, is challenging as solving quadratic programs exactly by interior-point methods has worst-case cubic complexity in the number of variables. In this paper, we present an alternative network layer architecture based on the alternating direction method of multipliers (ADMM) that is capable of scaling to problems with a moderately large number of variables. Backward differentiation is performed by implicit differentiation of the residual map of a modified fixed-point iteration. Simulated results demonstrate the computational advantage of the ADMM layer, which for medium scaled problems is approximately an order of magnitude faster than the OptNet quadratic programming layer. Furthermore, our novel backward-pass routine is efficient, from both a memory and computation standpoint, in comparison to the standard approach based on unrolled differentiation or implicit differentiation of the KKT optimality conditions. We conclude with examples from portfolio optimization in the integrated prediction and optimization paradigm.
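摘要中的 ADMM 思想可以在一个两变量的箱约束二次规划上做最小示意:x 步求解一个预先分解的线性系统,z 步做投影,对偶变量累积残差。这只是论文所微分的那类定点迭代的玩具版本,并非其可微网络层;P、q、边界等均为任意示例值:

```python
def solve_box_qp_admm(P, q, lo, hi, rho=1.0, iters=200):
    """ADMM for min 0.5 x'Px + q'x subject to lo <= x <= hi, for 2
    variables. A toy sketch of the fixed-point iteration whose residual
    map the paper differentiates; not the paper's network layer."""
    # Pre-factor the 2x2 system (P + rho*I) once; ADMM reuses it each step.
    a, b = P[0][0] + rho, P[0][1]
    c, d = P[1][0], P[1][1] + rho
    det = a * d - b * c
    def solve(r0, r1):  # solves (P + rho*I) x = r by Cramer's rule
        return ((d * r0 - b * r1) / det, (a * r1 - c * r0) / det)
    clip = lambda v: max(lo, min(hi, v))
    x = z = (0.0, 0.0)
    u = (0.0, 0.0)  # scaled dual variable
    for _ in range(iters):
        r = (-q[0] + rho * (z[0] - u[0]), -q[1] + rho * (z[1] - u[1]))
        x = solve(*r)
        z = (clip(x[0] + u[0]), clip(x[1] + u[1]))  # projection onto the box
        u = (u[0] + x[0] - z[0], u[1] + x[1] - z[1])
    return z

# With P = I the unconstrained minimiser of 0.5 x'Px + q'x is -q = (2, -3);
# the box [0, 1] clips it to (1, 0).
sol = solve_box_qp_admm(P=[[1.0, 0.0], [0.0, 1.0]], q=[-2.0, 3.0], lo=0.0, hi=1.0)
```

论文的要点正是对这类迭代的残差映射做隐式微分,从而避免展开全部迭代或对 KKT 条件做隐式微分。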
【26】 Bayesian Learning of Play Styles in Multiplayer Video Games 标题:多人视频游戏中游戏风格的贝叶斯学习 链接:https://arxiv.org/abs/2112.07437
作者:Aline Normoyle,Shane T. Jensen 机构:Bryn Mawr College and University of Pennsylvania 摘要:在线多人游戏中玩法的复杂性引起了人们对建模玩家赖以成功的不同游戏风格或策略的浓厚兴趣。我们为在线多人游戏《战地3》开发了一种分层贝叶斯回归方法,其中将表现建模为玩家在每场比赛中所承担的角色、游戏类型和地图的函数。我们使用Dirichlet过程先验,使回归模型中玩家特定系数相近的玩家得以聚类,这让我们能够在战地3玩家样本中发现共同的游戏风格。这种贝叶斯半参数聚类方法有几个优点:不需要指定常见游戏风格的数量,玩家可以在多个聚类之间移动,并且所得分组通常具有直接的解释。我们详细考察了战地3玩家中最常见的游戏风格,发现了整体表现突出的玩家组,以及在特定游戏类型、地图和角色中表现特别出色的玩家组。我们还能够区分稳定保持某种游戏风格的玩家与在比赛中表现出多种游戏风格的混合型玩家。 摘要:The complexity of game play in online multiplayer games has generated strong interest in modeling the different play styles or strategies used by players for success. We develop a hierarchical Bayesian regression approach for the online multiplayer game Battlefield 3 where performance is modeled as a function of the roles, game type, and map taken on by that player in each of their matches. We use a Dirichlet process prior that enables the clustering of players that have similar player-specific coefficients in our regression model, which allows us to discover common play styles amongst our sample of Battlefield 3 players. This Bayesian semi-parametric clustering approach has several advantages: the number of common play styles do not need to be specified, players can move between multiple clusters, and the resulting groupings often have a straight-forward interpretations. We examine the most common play styles among Battlefield 3 players in detail and find groups of players that exhibit overall high performance, as well as groupings of players that perform particularly well in specific game types, maps and roles. We are also able to differentiate between players that are stable members of a particular play style from hybrid players that exhibit multiple play styles across their matches.
Modeling this landscape of different play styles will aid game developers in developing specialized tutorials for new participants as well as improving the construction of complementary teams in their online matching queues.
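摘要中的 Dirichlet 过程先验可以用"中餐馆过程"最直观地示意:第 t 个玩家以正比于现有聚类大小的概率加入该聚类,以正比于集中参数 alpha 的概率开一个新聚类。这只是先验本身的抽样示意,并非论文的分层回归模型;n、alpha、seed 均为假设值:

```python
import random

def chinese_restaurant_process(n, alpha, seed=0):
    """Sample a clustering of n players from a Dirichlet process prior via
    the Chinese restaurant process: customer t joins an existing cluster
    with probability proportional to its size, or opens a new cluster with
    probability proportional to alpha."""
    rng = random.Random(seed)
    assignments = []
    cluster_sizes = []
    for t in range(n):
        # Existing cluster sizes sum to t, so total weight is t + alpha.
        weights = cluster_sizes + [alpha]
        r = rng.random() * (t + alpha)
        acc = 0.0
        for k, w in enumerate(weights):
            acc += w
            if r < acc:
                break
        if k == len(cluster_sizes):
            cluster_sizes.append(1)  # open a new cluster
        else:
            cluster_sizes[k] += 1
        assignments.append(k)
    return assignments

labels = chinese_restaurant_process(n=200, alpha=2.0)
```

这正体现了摘要所说的优点:聚类(游戏风格)的数目不必事先指定,而由数据和 alpha 共同决定。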
【27】 Conjugated Discrete Distributions for Distributional Reinforcement Learning 标题:分布式强化学习的共轭离散分布 链接:https://arxiv.org/abs/2112.07424
作者:Björn Lindenberg,Jonas Nordqvist,Karl-Olof Lindahl 机构:Department of Mathematics, Linnæus University, V¨axj¨o, Sweden 备注:17 pages, 7 figures, conference 摘要:在这项工作中,我们继续建立在有限马尔可夫过程强化学习的最新进展之上。在以往的算法中,无论是单行动者算法还是分布式算法,都有一种常见的做法:要么裁剪奖励,要么对Q函数应用变换方法,以处理实际折扣回报中跨越多个数量级的取值。我们从理论上证明,如果过程是非确定性的,最成功的方法之一可能不会产生最优策略。作为一种解决方案,我们认为分布强化学习有助于完全纠正这种情况。通过引入共轭分布算子,我们可以在保证理论收敛的情况下处理一大类实际回报的变换。我们提出了一种基于该算子的近似单行动者算法,该算法使用由Cramér距离给出的适当分布度量,直接在未经改变的奖励上训练代理。为了评估其在随机环境中的性能,我们使用粘性动作在55个Atari 2600游戏套件上训练代理,与Dopamine框架中其他著名算法相比,获得了最先进的性能。 摘要:In this work we continue to build upon recent advances in reinforcement learning for finite Markov processes. A common approach among previous existing algorithms, both single-actor and distributed, is to either clip rewards or to apply a transformation method on Q-functions to handle a large variety of magnitudes in real discounted returns. We theoretically show that one of the most successful methods may not yield an optimal policy if we have a non-deterministic process. As a solution, we argue that distributional reinforcement learning lends itself to remedy this situation completely. By the introduction of a conjugated distributional operator we may handle a large class of transformations for real returns with guaranteed theoretical convergence. We propose an approximating single-actor algorithm based on this operator that trains agents directly on unaltered rewards using a proper distributional metric given by the Cramér distance. To evaluate its performance in a stochastic setting we train agents on a suite of 55 Atari 2600 games using sticky-actions and obtain state-of-the-art performance compared to other well-known algorithms in the Dopamine framework.
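摘要中用于训练的 Cramér 距离,对同一等距支撑上的两个离散分布,可写成两条累积分布函数之差的平方和(平方 Cramér 距离,支撑间距取 1)。下面是该度量本身的最小示意,与论文的强化学习算法无关;示例分布为虚构:

```python
def cramer_distance(p, q):
    """Squared Cramer distance between two distributions given as
    probability vectors on the same evenly spaced support (spacing 1):
    the sum of squared differences of their CDFs."""
    fp = fq = 0.0  # running CDFs
    total = 0.0
    for pi, qi in zip(p, q):
        fp += pi
        fq += qi
        total += (fp - fq) ** 2
    return total

# Point mass at 0 vs. point mass at 2 on support {0, 1, 2}: the CDFs
# differ by 1 at the first two support points, giving distance 2.
d = cramer_distance([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
```

与 KL 散度不同,该度量对支撑不重叠的分布仍然有限且随"距离"增大,这正是它适合作为分布式强化学习训练目标的原因之一。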
【28】 Confidence intervals of ruin probability under Lévy surplus 标题:Lévy盈余下破产概率的置信区间 链接:https://arxiv.org/abs/2112.07405
作者:Yasutaka Shimizu 机构:Department of Applied Mathematics, Waseda University 摘要:本文的目的是构造在Lévy过程驱动的保险盈余下最终破产概率的置信区间。假设Lévy测度属于某参数族,我们从盈余数据估计参数,并通过delta方法估计破产概率。然而,渐近方差包含破产概率对参数的导数,该导数通常没有显式表达,即使破产概率估计得很好,置信区间也并非直接可得。本文给出了该导数的Cramér型近似,并给出了破产概率的渐近置信区间。 摘要:The aim of this paper is to construct the confidence interval of the ultimate ruin probability under the insurance surplus driven by a Lévy process. Assuming a parametric family for the Lévy measures, we estimate the parameter from the surplus data and estimate the ruin probability via the delta method. However the asymptotic variance includes the derivative of the ruin probability with respect to the parameter, which is not generally given explicitly, and the confidence interval is not straightforward even if the ruin probability is well estimated. This paper gives the Cramér-type approximation for the derivative and gives an asymptotic confidence interval of ruin probability.
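摘要中 delta 方法置信区间的一般构造如下:若 theta_hat 渐近正态、标准误为 se,则 g(theta_hat) 的渐近标准误为 |g'(theta_hat)|*se。下面以 g(theta)=exp(-theta) 为玩具例子,仅示意构造本身;论文的难点恰在于破产概率对参数的导数没有显式表达,需要 Cramér 型近似:

```python
import math

Z_95 = 1.959963984540054  # standard normal 97.5% quantile

def delta_method_ci(theta_hat, se, g, dg):
    """95% delta-method confidence interval for g(theta): the asymptotic
    standard error of g(theta_hat) is |dg(theta_hat)| * se."""
    half_width = Z_95 * abs(dg(theta_hat)) * se
    centre = g(theta_hat)
    return centre - half_width, centre + half_width

# Toy functional g(theta) = exp(-theta), a light-tailed ruin-type bound;
# theta_hat and se are made-up values for illustration.
lo, hi = delta_method_ci(2.0, 0.1, lambda t: math.exp(-t), lambda t: -math.exp(-t))
```

当 dg 无法写出闭式时,就需要论文所给的近似来代替这里的显式导数。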
【29】 Robustifying automatic speech recognition by extracting slowly varying features 标题:通过提取缓慢变化的特征增强自动语音识别的鲁棒性 链接:https://arxiv.org/abs/2112.07400
作者:Matias Pizarro,Dorothea Kolossa,Asja Fischer 机构:Ruhr University Bochum, Germany 摘要:在过去几年中,深度学习系统已被证明在对抗样本攻击下非常脆弱。基于神经网络的自动语音识别(ASR)系统也不例外。有针对性和无针对性的攻击都可以修改音频输入信号,使人类仍能识别相同的单词,而ASR系统则被引导预测出不同的转录。在本文中,我们提出了一种针对有针对性对抗攻击的防御机制,即在将输入馈入ASR系统之前,通过应用慢特征分析、低通滤波器或两者,从音频信号中移除快速变化的特征。我们对以这种方式预处理的数据训练的混合ASR模型进行了实证分析。虽然所得模型在良性数据上表现得相当好,但它们对有针对性对抗攻击的鲁棒性显著提高:我们最终提出的模型在干净数据上表现出与基线模型类似的性能,同时鲁棒性提高了四倍以上。 摘要:In the past few years, it has been shown that deep learning systems are highly vulnerable under attacks with adversarial examples. Neural-network-based automatic speech recognition (ASR) systems are no exception. Targeted and untargeted attacks can modify an audio input signal in such a way that humans still recognise the same words, while ASR systems are steered to predict a different transcription. In this paper, we propose a defense mechanism against targeted adversarial attacks consisting in removing fast-changing features from the audio signals, either by applying slow feature analysis, a low-pass filter, or both, before feeding the input to the ASR system. We perform an empirical analysis of hybrid ASR models trained on data pre-processed in such a way. While the resulting models perform quite well on benign data, they are significantly more robust against targeted adversarial attacks: Our final, proposed model shows a performance on clean data similar to the baseline model, while being more than four times more robust.
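摘要所述防御的核心直觉是"去掉快速变化的成分、保留慢变化成分",可以用一个滑动平均低通滤波器做最小示意。这仅用于展示直觉;论文实际使用的低通滤波与慢特征分析要复杂得多,示例信号为虚构数据:

```python
def moving_average_lowpass(signal, window):
    """Centred moving-average low-pass filter: each output sample is the
    mean of up to `window` surrounding input samples (truncated at the
    signal edges)."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

# A slow ramp plus a fast +/-1 alternating perturbation: filtering keeps
# the ramp but shrinks the fast component from amplitude 1 to 0.2.
signal = [i + (1 if i % 2 == 0 else -1) for i in range(20)]
smoothed = moving_average_lowpass(signal, window=5)
```

对抗扰动往往依赖高频成分,因此这类预处理能在几乎不影响人耳可懂度的前提下削弱有针对性的攻击。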
【30】 Euclid: Covariance of weak lensing pseudo-$C_\ell$ estimates. Calculation, comparison to simulations, and dependence on survey geometry 链接:https://arxiv.org/abs/2112.07341
作者:R. E. Upham,M. L. Brown,L. Whittaker,A. Amara,N. Auricchio,D. Bonino,E. Branchini,M. Brescia,J. Brinchmann,V. Capobianco,C. Carbone,J. Carretero,M. Castellano,S. Cavuoti,A. Cimatti,R. Cledassou,G. Congedo,L. Conversi,Y. Copin,L. Corcione,M. Cropper,A. Da Silva,H. Degaudenzi,M. Douspis,F. Dubath,C. A. J. Duncan,X. Dupac,S. Dusini,A. Ealet,S. Farrens,S. Ferriol,P. Fosalba,M. Frailis,E. Franceschi,M. Fumana,B. Garilli,B. Gillis,C. Giocoli,F. Grupp,S. V. H. Haugan,H. Hoekstra,W. Holmes,F. Hormuth,A. Hornstrup,K. Jahnke,S. Kermiche,A. Kiessling,M. Kilbinger,T. Kitching,M. Kümmel,M. Kunz,H. Kurki-Suonio,S. Ligori,P. B. Lilje,I. Lloro,O. Marggraf,K. Markovic,F. Marulli,M. Meneghetti,G. Meylan,M. Moresco,L. Moscardini,E. Munari,S. M. Niemi,C. Padilla,S. Paltani,F. Pasian,K. Pedersen,V. Pettorino,S. Pires,M. Poncet,L. Popa,F. Raison,J. Rhodes,E. Rossetti,R. Saglia,B. Sartoris,P. Schneider,A. Secroun,G. Seidel,C. Sirignano,G. Sirri,L. Stanco,J. -L. Starck,P. Tallada-Crespí,D. Tavagnacco,A. N. Taylor,I. Tereno,R. Toledo-Moreo,F. Torradeflot,L. Valenziano,Y. Wang,G. Zamorani,J. Zoubian,S. Andreon,M. Baldi,S. Camera,V. F. Cardone,G. Fabbian,G. Polenta,A. Renzi,B. Joachimi,A. Hall,A. Loureiro,E. Sellentin 机构:(Affiliations can be found after the references) 备注:15 pages, 8 figures; submitted to A&A; code available at this https URL 摘要:当使用高斯似然法时,精确的协方差矩阵对于获得可靠的宇宙学结果至关重要。在本文中,我们研究了层析宇宙剪切功率谱的伪$C_\ell$估计的协方差。结合使用两个现有的公开代码,我们计算了完整的协方差矩阵,包括由部分天空覆盖和非线性结构增长引起的模式耦合贡献。对于三种不同的天空遮罩,我们将理论协方差矩阵与公开的N体弱透镜模拟估计的协方差矩阵进行了比较,发现了良好的一致性。我们发现,当应用更极端的天空切割时,在理论和模拟中观察到高斯非对角协方差和非高斯超样本协方差的相应增加,与预期一致。详细研究了对协方差的不同贡献,我们发现高斯协方差沿主对角线和最近的非对角线占主导地位,但远离主对角线的超样本协方差占主导地位。通过在描述物质聚集和暗能量的参数中形成模拟约束,我们发现忽略协方差的非高斯贡献会导致低估置信区域的真实大小达70%。主要的非高斯协方差分量是超样本协方差,但忽略较小的连接非高斯协方差仍可能导致不确定性低估10%-20%。真正的宇宙学分析需要对许多有害参数进行边缘化,这将降低所有宇宙学贡献对协方差的相对重要性,因此这些值应作为每个分量重要性的上限。 摘要:An accurate covariance matrix is essential for obtaining reliable cosmological results when using a Gaussian likelihood.
In this paper we study the covariance of pseudo-$C_\ell$ estimates of tomographic cosmic shear power spectra. Using two existing publicly available codes in combination, we calculate the full covariance matrix, including mode-coupling contributions arising from both partial sky coverage and non-linear structure growth. For three different sky masks, we compare the theoretical covariance matrix to that estimated from publicly available N-body weak lensing simulations, finding good agreement. We find that as a more extreme sky cut is applied, a corresponding increase in both Gaussian off-diagonal covariance and non-Gaussian super-sample covariance is observed in both theory and simulations, in accordance with expectations. Studying the different contributions to the covariance in detail, we find that the Gaussian covariance dominates along the main diagonal and the closest off-diagonals, but further away from the main diagonal the super-sample covariance is dominant. Forming mock constraints in parameters describing matter clustering and dark energy, we find that neglecting non-Gaussian contributions to the covariance can lead to underestimating the true size of confidence regions by up to 70 per cent. The dominant non-Gaussian covariance component is the super-sample covariance, but neglecting the smaller connected non-Gaussian covariance can still lead to the underestimation of uncertainties by 10--20 per cent. A real cosmological analysis will require marginalisation over many nuisance parameters, which will decrease the relative importance of all cosmological contributions to the covariance, so these values should be taken as upper limits on the importance of each component.
【31】 Compensatory model for quantile estimation and application to VaR 标题:分位数估计的补偿模型及其在VaR中的应用 链接:https://arxiv.org/abs/2112.07278
作者:Shuzhen Yang 机构:Shandong University-Zhong Tai Securities Institute for Financial Studies, Shandong University 备注:23 pages, 6 figures 摘要:与通常估计时间序列的分布然后从分布中获得分位数的过程不同,我们开发了一个补偿模型来改进给定分布估计下的分位数估计。在补偿模型中引入了一种新的惩罚项。我们证明了惩罚项可以控制给定时间序列分位数估计的收敛误差,并获得自适应调整分位数估计。仿真和实证分析表明,在给定的分布估计下,补偿模型可以显著提高风险价值(VaR)的性能。 摘要:In contrast to the usual procedure of estimating the distribution of a time series and then obtaining the quantile from the distribution, we develop a compensatory model to improve the quantile estimation under a given distribution estimation. A novel penalty term is introduced in the compensatory model. We prove that the penalty term can control the convergence error of the quantile estimation of a given time series, and obtain an adaptive adjusted quantile estimation. Simulation and empirical analysis indicate that the compensatory model can significantly improve the performance of the value at risk (VaR) under a given distribution estimation.
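摘要讨论的是在给定分布估计下修正分位数估计。作为参照,下面给出最朴素的历史模拟法 VaR(损失分布的最近秩分位数);这只是基线示意,论文的补偿模型与罚项并未在此复现,收益数据为虚构:

```python
def empirical_var(returns, q_percent):
    """Historical value-at-risk at the q_percent level: the nearest-rank
    quantile of the empirical loss (= negative return) distribution.
    Integer arithmetic is used for the rank to avoid floating-point
    edge cases at quantile boundaries."""
    losses = sorted(-r for r in returns)
    rank = (q_percent * len(losses) + 99) // 100  # ceil(q/100 * n)
    return losses[max(0, rank - 1)]

# 95 small gains and 5 large losses: the 99% VaR picks up the tail loss,
# while the 95% VaR still sits in the bulk of the distribution.
returns = [0.01] * 95 + [-0.10] * 5
var_99 = empirical_var(returns, 99)
var_95 = empirical_var(returns, 95)
```

这种两个置信水平之间的跳变,正说明尾部分位数对分布估计误差高度敏感,也是论文引入补偿罚项的动机所在。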
【32】 Semiparametric Conditional Factor Models: Estimation and Inference 标题:半参数条件因子模型的估计与推断 链接:https://arxiv.org/abs/2112.07121
作者:Qihui Chen,Nikolai Roussanov,Xiaoliang Wang 机构:CUHK-Shenzhen, The Wharton School, UPenn 备注:103 pages 摘要:本文介绍了一种具有潜在因子的半参数条件因子模型的简单易处理的筛估计。我们建立了估计量和检验的大-$N$渐近性质,而不需要大的$T$。我们还开发了一个简单的bootstrap程序,用于对条件定价错误以及因子加载函数的形状进行推断。这些结果使我们能够通过利用多个特征的任意非线性函数来估计大量单个资产的条件因子结构,而无需预先指定因子,同时使我们能够从阿尔法中分离特征在捕获因子β方面的作用(即,错误定价的不可分散风险)。我们将这些方法应用于单个美国股票收益的横截面,并发现大量非零定价错误的有力证据,这些错误结合起来产生夏普比率高于3的套利投资组合。 摘要:This paper introduces a simple and tractable sieve estimation of semiparametric conditional factor models with latent factors. We establish large-$N$-asymptotic properties of the estimators and the tests without requiring large $T$. We also develop a simple bootstrap procedure for conducting inference about the conditional pricing errors as well as the shapes of the factor loadings functions. These results enable us to estimate conditional factor structure of a large set of individual assets by utilizing arbitrary nonlinear functions of a number of characteristics without the need to pre-specify the factors, while allowing us to disentangle the characteristics' role in capturing factor betas from alphas (i.e., undiversifiable risk from mispricing). We apply these methods to the cross-section of individual U.S. stock returns and find strong evidence of large nonzero pricing errors that combine to produce arbitrage portfolios with Sharpe ratios above 3.
【33】 How to Learn when Data Gradually Reacts to Your Model 标题:当数据逐渐对你的模型做出反应时如何学习 链接:https://arxiv.org/abs/2112.07042
作者:Zachary Izzo,James Zou,Lexing Ying 机构:Department of Mathematics, Stanford University, Institute for Computational and Mathematical Engineering, Stanford University, Department of Biomedical Data Science, Stanford University 备注:40 pages, 8 figures 摘要:最近的一系列工作重点是在执行性(performative)设置下训练机器学习(ML)模型,即数据分布会对已部署的模型做出反应。此设置的目标是学习一个模型,该模型既能诱导有利的数据分布,又能在诱导出的分布上表现良好,从而将测试损失降至最低。以往关于寻找最优模型的工作假设数据分布会立即适应已部署的模型。然而,在实践中情况可能并非如此,因为总体可能需要时间来适应该模型。在许多应用中,数据分布既取决于当前部署的ML模型,也取决于模型部署之前总体所处的"状态"。在这项工作中,我们提出了一种新算法:有状态执行性梯度下降(Stateful PerfGD),即使存在这些效应,它也能最小化执行性损失。我们为Stateful PerfGD的收敛性提供了理论保证。我们的实验证实,Stateful PerfGD大大优于以往最先进的方法。 摘要:A recent line of work has focused on training machine learning (ML) models in the performative setting, i.e. when the data distribution reacts to the deployed model. The goal in this setting is to learn a model which both induces a favorable data distribution and performs well on the induced distribution, thereby minimizing the test loss. Previous work on finding an optimal model assumes that the data distribution immediately adapts to the deployed model. In practice, however, this may not be the case, as the population may take time to adapt to the model. In many applications, the data distribution depends on both the currently deployed ML model and on the "state" that the population was in before the model was deployed. In this work, we propose a new algorithm, Stateful Performative Gradient Descent (Stateful PerfGD), for minimizing the performative loss even in the presence of these effects. We provide theoretical guarantees for the convergence of Stateful PerfGD. Our experiments confirm that Stateful PerfGD substantially outperforms previous state-of-the-art methods.
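摘要的关键点是分布对模型的反应有"状态"(滞后)。下面用一个玩具动力系统示意这种有状态的反应:总体均值每轮只向它对已部署参数 theta 的反应 a*theta 移动 lam 的比例。这只是环境动力学的示意,并非 Stateful PerfGD 算法本身;a、lam、theta 序列均为假设值:

```python
def simulate_lagged_reaction(theta_schedule, a=0.5, lam=0.3, m0=0.0):
    """Toy stateful performative environment: the population mean m moves
    only a fraction lam of the way toward its reaction a*theta to the
    deployed model each round, so the data distribution depends on both
    the current model and the previous state."""
    m = m0
    means = []
    for theta in theta_schedule:
        m = (1 - lam) * m + lam * (a * theta)
        means.append(m)
    return means

# Deploy theta = 1.0 repeatedly: the mean drifts gradually toward the
# fully adapted value a*theta = 0.5 instead of jumping there at once.
means = simulate_lagged_reaction([1.0] * 50)
```

在"立即适应"的假设下 means 会一步到位取 0.5;正是这种滞后使得以往的执行性方法失效,需要有状态的更新规则。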
【34】 ELF: Exact-Lipschitz Based Universal Density Approximator Flow 标题:ELF:基于精确Lipschitz的通用密度逼近器流 链接:https://arxiv.org/abs/2112.06997
作者:Achintya Gopal 机构:Bloomberg Quant Research 摘要:在过去几年中,归一化流(normalizing flows)越来越受欢迎;然而,它们的计算成本仍然很高,这使得它们很难被更广泛的机器学习社区所接受。本文介绍了一个具有闭式Lipschitz常数的简单一维单层网络;利用这一点,我们引入了一种新的精确Lipschitz流(ELF),它结合了从残差流采样的便利性和自回归流的强大性能。此外,我们还证明了ELF可证明是一种通用密度逼近器,与许多其他流相比计算和参数效率更高,并且在多个大规模数据集上实现了最先进的性能。 摘要:Normalizing flows have grown more popular over the last few years; however, they continue to be computationally expensive, making them difficult to be accepted into the broader machine learning community. In this paper, we introduce a simple one-dimensional one-layer network that has closed form Lipschitz constants; using this, we introduce a new Exact-Lipschitz Flow (ELF) that combines the ease of sampling from residual flows with the strong performance of autoregressive flows. Further, we show that ELF is provably a universal density approximator, more computationally and parameter efficient compared to a multitude of other flows, and achieves state-of-the-art performance on multiple large-scale datasets.
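摘要提到的"具有闭式Lipschitz常数的一维单层网络"可以这样示意:对 f(x) = w2*tanh(w1*x + b1) + b2,由于 |tanh'| <= 1 且在 w1*x + b1 = 0 处取到 1,其 Lipschitz 常数恰为 |w1*w2|。下面数值验证这一闭式;这只是该性质的示意,并非 ELF 流本身,权重均为任意假设值:

```python
import math

def one_layer(x, w1, b1, w2, b2):
    """One-dimensional one-layer network x -> w2*tanh(w1*x + b1) + b2.
    Its Lipschitz constant is exactly |w1*w2|, attained where
    w1*x + b1 = 0 (there tanh' = 1)."""
    return w2 * math.tanh(w1 * x + b1) + b2

w1, b1, w2, b2 = 2.0, 0.3, -1.5, 0.7
lipschitz = abs(w1 * w2)  # closed form: = 3.0

# Empirical check: finite-difference slopes over [-5, 5) never exceed the
# closed-form bound, and come close to it near x = -b1/w1.
xs = [i / 100.0 for i in range(-500, 500)]
slopes = [abs(one_layer(x + 1e-5, w1, b1, w2, b2)
              - one_layer(x, w1, b1, w2, b2)) / 1e-5 for x in xs]
```

有了精确(而非仅是上界)的 Lipschitz 常数,残差流式的可逆性条件就可以不保守地施加,这正是摘要所述效率优势的来源。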
【35】 Addressing Bias in Active Learning with Depth Uncertainty Networks... or Not 标题:用深度不确定性网络解决主动学习中的偏差……或者并非如此 链接:https://arxiv.org/abs/2112.06926
作者:Chelsea Murray,James U. Allingham,Javier Antorán,José Miguel Hernández-Lobato 机构:Department of Engineering, University of Cambridge 备注:arXiv admin note: substantial text overlap with arXiv:2112.06796 摘要:Farquhar等人[2021]表明,用欠参数化模型纠正主动学习偏差可以提高下游性能。然而,对于神经网络等过参数化模型,校正会导致性能降低或保持不变。他们认为这是由于"过拟合偏差"抵消了主动学习偏差。我们表明,深度不确定性网络在低过拟合状态下运行,很像欠参数化模型。因此,经过偏差校正后,它们的性能应当有所提升。令人惊讶的是,事实并非如此。我们认为,这一负面结果以及Farquhar等人[2021]的结果,都可以通过泛化误差的偏差-方差分解来解释。 摘要:Farquhar et al. [2021] show that correcting for active learning bias with underparameterised models leads to improved downstream performance. For overparameterised models such as NNs, however, correction leads either to decreased or unchanged performance. They suggest that this is due to an "overfitting bias" which offsets the active learning bias. We show that depth uncertainty networks operate in a low overfitting regime, much like underparameterised models. They should therefore see an increase in performance with bias correction. Surprisingly, they do not. We propose that this negative result, as well as the results of Farquhar et al. [2021], can be explained via the lens of the bias-variance decomposition of generalisation error.