Statistics Academic Digest [7.21]

2021-07-27 11:08:29

Visit www.arxivdaily.com for daily digests with abstracts, covering CS, Physics, Mathematics, Economics, Statistics, Finance, Biology, and Electrical Engineering, plus search, bookmarking, and posting features! Click "Read the original" to access it.

stat (Statistics): 36 papers in total

【1】 On Estimating Rank-One Spiked Tensors in the Presence of Heavy Tailed Errors

Authors: Arnab Auddy, Ming Yuan Affiliations: Department of Statistics, Columbia University Comments: 46 pages, 7 figures Link: https://arxiv.org/abs/2107.09660 Abstract: In this paper, we study the estimation of a rank-one spiked tensor in the presence of heavy tailed noise. Our results highlight some of the fundamental similarities and differences in the tradeoff between statistical and computational efficiencies under heavy tailed and Gaussian noise. In particular, we show that, for $p$-th order tensors, the tradeoff manifests in an identical fashion as the Gaussian case when the noise has finite $4(p-1)$-th moment. The difference in signal strength requirements, with or without computational constraints, for us to estimate the singular vectors at the optimal rate, interestingly, narrows for noise with heavier tails and vanishes when the noise only has finite fourth moment. Moreover, if the noise has less than fourth moment, tensor SVD, perhaps the most natural approach, is suboptimal even though it is computationally intractable. Our analysis exploits a close connection between estimating the rank-one spikes and the spectral norm of a random tensor with iid entries. In particular, we show that the order of the spectral norm of a random tensor can be precisely characterized by the moment of its entries, generalizing classical results for random matrices. In addition to the theoretical guarantees, we propose estimation procedures for the heavy tailed regime, which are easy to implement and efficient to run. Numerical experiments are presented to demonstrate their practical merits.
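
As a concrete illustration of the model studied in this entry (not the authors' estimator), the sketch below generates a rank-one spiked third-order tensor with heavy-tailed Student-t noise and runs a naive tensor power iteration; the dimension, signal strength, and noise law are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 50, 50.0                       # dimension and spike strength (illustrative)
u = rng.standard_normal(n); u /= np.linalg.norm(u)
# rank-one spike plus iid heavy-tailed noise (Student-t with 3 degrees of freedom)
T = beta * np.einsum('i,j,k->ijk', u, u, u) + rng.standard_t(df=3, size=(n, n, n))

# naive tensor power iteration (a stand-in for tensor SVD, not the paper's procedure);
# from a random start it may need a fairly strong spike to lock on
v = rng.standard_normal(n); v /= np.linalg.norm(v)
for _ in range(100):
    v = np.einsum('ijk,j,k->i', T, v, v)
    v /= np.linalg.norm(v)

print("correlation with true spike:", abs(u @ v))
```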

【2】 Pooled testing to isolate infected individuals

Authors: Matthew Aldridge Affiliations: School of Mathematics, University of Leeds, Leeds, U.K. Comments: None Link: https://arxiv.org/abs/2107.09633 Abstract: The usual problem for group testing is this: For a given number of individuals and a given prevalence, how many tests T* are required to find every infected individual? In real life, however, the problem is usually different: For a given number of individuals, a given prevalence, and a limited number of tests T much smaller than T*, how can these tests best be used? In this conference paper, we outline some recent results on this problem for two models. First, the "practical" model, which is relevant for screening for COVID-19 and has tests that are highly specific but imperfectly sensitive, shows that simple algorithms can be outperformed at low prevalence and high sensitivity. Second, the "theoretical" model of very low prevalence with perfect tests gives interesting new mathematical results.
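
For context, a minimal worked example of classical two-stage Dorfman pooling with perfect tests (the idealized setting behind the "theoretical" model mentioned here): with pool size n and prevalence p, the expected number of tests per individual is 1/n + 1 - (1-p)^n, which can be minimized over n.

```python
import numpy as np

def dorfman_tests_per_person(p, n):
    """Expected tests per individual for two-stage Dorfman pooling
    with pool size n, prevalence p, and perfect tests."""
    return 1.0 / n + 1.0 - (1.0 - p) ** n

p = 0.01                                  # illustrative prevalence
pool_sizes = np.arange(2, 31)
costs = [dorfman_tests_per_person(p, n) for n in pool_sizes]
best = pool_sizes[int(np.argmin(costs))]
print(f"best pool size: {best}, expected tests per person: {min(costs):.3f}")
```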

【3】 Sparse composite likelihood selection

Authors: Claudia Di Caterina, Davide Ferrari Affiliations: University of Bolzano Link: https://arxiv.org/abs/2107.09586 Abstract: Composite likelihood has shown promise in settings where the number of parameters $p$ is large due to its ability to break down complex models into simpler components, thus enabling inference even when the full likelihood is not tractable. Although there are a number of ways to formulate a valid composite likelihood in the finite-$p$ setting, there does not seem to exist agreement on how to construct composite likelihoods that are computationally efficient and statistically sound when $p$ is allowed to diverge. This article introduces a method to select sparse composite likelihoods by minimizing a criterion representing the statistical efficiency of the implied estimator plus an $L_1$-penalty discouraging the inclusion of too many sub-likelihood terms. Conditions under which consistent model selection occurs are studied. Examples illustrating the procedure are analysed in detail and applied to real data.

【4】 The Smoking Gun: Statistical Theory Improves Neural Network Estimates

Authors: Alina Braun, Michael Kohler, Sophie Langer, Harro Walk Affiliations: Fachbereich Mathematik, Technische Universität Darmstadt, Schlossgartenstr.; Fachbereich Mathematik, Universität Stuttgart, Pfaffenwaldring, Stuttgart Link: https://arxiv.org/abs/2107.09550 Abstract: In this paper we analyze the $L_2$ error of neural network regression estimates with one hidden layer. Under the assumption that the Fourier transform of the regression function decays suitably fast, we show that an estimate, where all initial weights are chosen according to proper uniform distributions and where the weights are learned by gradient descent, achieves a rate of convergence of $1/\sqrt{n}$ (up to a logarithmic factor). Our statistical analysis implies that the key aspect behind this result is the proper choice of the initial inner weights and the adjustment of the outer weights via gradient descent. This indicates that we can also simply use linear least squares to choose the outer weights. We prove a corresponding theoretical result and compare our new linear least squares neural network estimate with standard neural network estimates via simulated data. Our simulations show that our theoretical considerations lead to an estimate with an improved performance. Hence the development of statistical theory can indeed improve neural network estimates.
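
A minimal sketch of the estimator family discussed here, under illustrative assumptions: one hidden ReLU layer, inner weights drawn from a uniform distribution and kept fixed, and outer weights chosen by linear least squares. The paper's actual construction and tuning are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, K = 500, 5, 200                       # sample size, input dim, hidden units (illustrative)
X = rng.uniform(-1, 1, size=(n, d))
y = np.sin(X @ np.ones(d)) + 0.1 * rng.standard_normal(n)

# inner weights and biases drawn from uniform distributions and left untouched
W = rng.uniform(-1, 1, size=(d, K))
b = rng.uniform(-1, 1, size=K)
H = np.maximum(X @ W + b, 0.0)              # hidden-layer ReLU features

# outer weights chosen by linear least squares on the hidden features
coef, *_ = np.linalg.lstsq(H, y, rcond=None)
y_hat = H @ coef
print("training MSE:", np.mean((y - y_hat) ** 2))
```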

【5】 Adaptively Sampling via Regional Variance-Based Sensitivities

Authors: Brian W. Bush, Joanne Wendelberger, Rebecca Hanes Affiliations: National Renewable Energy Laboratory; Los Alamos National Laboratory Comments: 22 pages, 8 figures Link: https://arxiv.org/abs/2107.09538 Abstract: Inspired by the well-established variance-based methods for global sensitivity analysis, we develop a local total sensitivity index that decomposes the global total sensitivity conditions by independent variables' values. We employ this local sensitivity index in a new method of experimental design that sequentially and adaptively samples the domain of a multivariate function according to local contributions to the global variance. The method is demonstrated on a nonlinear illustrative example that has a three-dimensional domain and a three-dimensional codomain, but also on a complex, high-dimensional simulation for assessing the industrial viability of the production of bioproducts from biomass.
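
For background, the global variance-based total-effect index that the proposed local index decomposes can be estimated with the standard pick-freeze (Jansen) estimator; the sketch below shows that classical global computation on a toy function, not the paper's regional or adaptive procedure.

```python
import numpy as np

def total_sobol_indices(f, d, N=10000, rng=None):
    """Jansen pick-freeze estimator of the total-effect Sobol indices."""
    rng = rng or np.random.default_rng(0)
    A = rng.uniform(size=(N, d))
    B = rng.uniform(size=(N, d))
    fA = f(A)
    var = np.var(fA)
    S_T = np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]               # replace column i with the B sample
        S_T[i] = np.mean((fA - f(ABi)) ** 2) / (2.0 * var)
    return S_T

# illustrative nonlinear test function on [0, 1]^3
f = lambda X: np.sin(2 * np.pi * X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * X[:, 2]
print(total_sobol_indices(f, d=3))
```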

【6】 Estimation of a regression function on a manifold by fully connected deep neural networks

Authors: Michael Kohler, Sophie Langer, Ulrich Reif Affiliations: Fachbereich Mathematik, Technische Universität Darmstadt, Schlossgartenstr. Link: https://arxiv.org/abs/2107.09532 Abstract: Estimation of a regression function from independent and identically distributed data is considered. The $L_2$ error with integration with respect to the distribution of the predictor variable is used as the error criterion. The rate of convergence of least squares estimates based on fully connected spaces of deep neural networks with ReLU activation function is analyzed for smooth regression functions. It is shown that in case that the distribution of the predictor variable is concentrated on a manifold, these estimates achieve a rate of convergence which depends on the dimension of the manifold and not on the number of components of the predictor variable.

【7】 Canonical Polyadic Decomposition and Deep Learning for Machine Fault Detection

Authors: Frusque Gaetan, Michau Gabriel, Fink Olga Affiliations: ETH Zurich, Swiss Federal Institute of Technology, Zurich, Switzerland Comments: None Link: https://arxiv.org/abs/2107.09519 Abstract: Acoustic monitoring for machine fault detection is a recent and expanding research path that has already provided promising results for industries. However, it is impossible to collect enough data to learn all types of faults from a machine. Thus, new algorithms, trained using data from healthy conditions only, were developed to perform unsupervised anomaly detection. A key issue in the development of these algorithms is the noise in the signals, as it impacts the anomaly detection performance. In this work, we propose a powerful data-driven and quasi non-parametric denoising strategy for spectral data based on a tensor decomposition: the Non-negative Canonical Polyadic (CP) decomposition. This method is particularly adapted for machines emitting stationary sound. We demonstrate in a case study, the Malfunctioning Industrial Machine Investigation and Inspection (MIMII) baseline, how the use of our denoising strategy leads to a sensible improvement of the unsupervised anomaly detection. Such approaches are capable of making sound-based monitoring of industrial processes more reliable.

【8】 On some information-theoretic aspects of non-linear statistical inverse problems

Authors: Richard Nickl, Gabriel Paternain Affiliations: University of Cambridge Comments: 21 pages Link: https://arxiv.org/abs/2107.09488 Abstract: Results by van der Vaart (1991) from semi-parametric statistics about the existence of a non-zero Fisher information are reviewed in an infinite-dimensional non-linear Gaussian regression setting. Information-theoretically optimal inference on aspects of the unknown parameter is possible if and only if the adjoint of the linearisation of the regression map satisfies a certain range condition. It is shown that this range condition may fail in a commonly studied elliptic inverse problem with a divergence form equation, and that a large class of smooth linear functionals of the conductivity parameter cannot be estimated efficiently in this case. In particular, Gaussian `Bernstein von Mises'-type approximations for Bayesian posterior distributions do not hold in this setting.

【9】 Directional testing for high-dimensional multivariate normal distributions

Authors: Caizhu Huang, Claudia Di Caterina, Nicola Sartori Affiliations: Department of Statistical Sciences, University of Padova, Padova, Italy; Free University of Bozen-Bolzano, Bolzano, Italy Comments: 71 pages, 39 figures, 14 tables Link: https://arxiv.org/abs/2107.09418 Abstract: Thanks to its favorable properties, the multivariate normal distribution is still largely employed for modeling phenomena in various scientific fields. However, when the number of components $p$ is of the same asymptotic order as the sample size $n$, standard inferential techniques are generally inadequate to conduct hypothesis testing on the mean vector and/or the covariance matrix. Within several prominent frameworks, we propose then to draw reliable conclusions via a directional test. We show that under the null hypothesis the directional $p$-value is exactly uniformly distributed even when $p$ is of the same order of $n$, provided that conditions for the existence of the maximum likelihood estimate for the normal model are satisfied. Extensive simulation results confirm the theoretical findings across different values of $p/n$, and show that the proposed approach outperforms not only the usual finite-$p$ approaches but also alternative methods tailored for high-dimensional settings.

【10】 A Comparison of Value-Added Models for School Accountability

Authors: George Leckie, Lucy Prior Affiliations: Centre for Multilevel Modelling and School of Education, University of Bristol, United Kingdom Link: https://arxiv.org/abs/2107.09410 Abstract: School accountability systems increasingly hold schools to account for their performances using value-added models purporting to measure the effect of schools on student learning. The most common approach is to fit a linear regression of student current achievement on student prior achievement, where the school effects are the school means of the predicted residuals. In the literature further adjustments are made for student sociodemographics and sometimes school composition and 'non-malleable' characteristics. However, accountability systems typically make fewer adjustments: for transparency to end users, because data is unavailable or of insufficient quality, or for ideological reasons. There is therefore considerable interest in understanding the extent to which simpler models give similar school effects to more theoretically justified but complex models. We explore these issues via a case study and empirical analysis of England's 'Progress 8' secondary school accountability system.

【11】 Multi-Normex Distributions for the Sum of Random Vectors. Rates of Convergence

Authors: Marie Kratz, Evgeny Prokopenko Affiliations: ESSEC Business School Paris, CREAR, France; Sobolev Institute of Mathematics, Novosibirsk, Russia Link: https://arxiv.org/abs/2107.09409 Abstract: We build a sharp approximation of the whole distribution of the sum of iid heavy-tailed random vectors, combining mean and extreme behaviors. It extends the so-called 'normex' approach from a univariate to a multivariate framework. We propose two possible multi-normex distributions, named $d$-Normex and MRV-Normex. Both rely on the Gaussian distribution for describing the mean behavior, via the CLT, while the difference between the two versions comes from using the exact distribution or the EV theorem for the maximum. The main theorems provide the rate of convergence for each version of the multi-normex distributions towards the distribution of the sum, assuming second order regular variation property for the norm of the parent random vector when considering the MRV-normex case. Numerical illustrations and comparisons are proposed with various dependence structures on the parent random vector, using QQ-plots based on geometrical quantiles.

【12】 Diagnosis of model-structural errors with a sliding time-window Bayesian analysis

Authors: Han-Fang Hsueh, Anneli Guthke, Thomas Wöhling, Wolfgang Nowak Affiliations: Department of Stochastic Simulation and Safety Research for Hydrosystems, University of Stuttgart, Stuttgart, Germany; Center for Applied Geoscience, University of Tübingen, Tübingen, Germany Comments: 58 pages, 23 figures Link: https://arxiv.org/abs/2107.09399 Abstract: Deterministic hydrological models with uncertain, but inferred-to-be-time-invariant parameters typically show time-dependent model structural errors. Such errors can occur if a hydrological process is active in certain time periods in nature, but is not resolved by the model. Such missing processes could become visible during calibration as time-dependent best-fit values of model parameters. We propose a formal time-windowed Bayesian analysis to diagnose this type of model error, formalizing the question "In which period of the calibration time-series does the model statistically disqualify itself as quasi-true?" Using Bayesian model evidence (BME) as model performance metric, we determine how much the data in time windows of the calibration time-series support or refute the model. Then, we track BME over sliding time windows to obtain a dynamic, time-windowed BME (tBME) and search for sudden decreases that indicate an onset of model error. tBME also allows us to perform a formal, sliding likelihood-ratio test of the model against the data. Our proposed approach is designed to detect error occurrence on various temporal scales, which is especially useful in hydrological modelling. We illustrate this by applying our proposed method to soil moisture modeling. We test tBME as model error indicator on several synthetic and real-world test cases that we designed to vary in error sources and error time scales. Results prove the usefulness of the framework for detecting structural errors in dynamic models. Moreover, the time sequence of posterior parameter distributions helps to investigate the reasons for model error and provide guidance for model improvement.

【13】 An induction proof of the backpropagation algorithm in matrix notation

Authors: Dirk Ostwald, Franziska Usée Affiliations: Institute of Psychology and Center for Behavioral Brain Sciences, Otto-von-Guericke Universität Magdeburg, Germany Link: https://arxiv.org/abs/2107.09384 Abstract: Backpropagation (BP) is a core component of the contemporary deep learning incarnation of neural networks. Briefly, BP is an algorithm that exploits the computational architecture of neural networks to efficiently evaluate the gradient of a cost function during neural network parameter optimization. The validity of BP rests on the application of a multivariate chain rule to the computational architecture of neural networks and their associated objective functions. Introductions to deep learning theory commonly present the computational architecture of neural networks in matrix form, but eschew a parallel formulation and justification of BP in the framework of matrix differential calculus. This entails several drawbacks for the theory and didactics of deep learning. In this work, we overcome these limitations by providing a full induction proof of the BP algorithm in matrix notation. Specifically, we situate the BP algorithm in the framework of matrix differential calculus, encompass affine-linear potential functions, prove the validity of the BP algorithm in inductive form, and exemplify the implementation of the matrix form BP algorithm in computer code.
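
For reference, the matrix-notation recursion that such an induction proof establishes takes the familiar form below, written for a generic feedforward network with affine-linear potentials $z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}$ and activations $a^{(l)} = \sigma(z^{(l)})$; the notation is illustrative and may differ from the paper's.

```latex
% Backpropagation in matrix notation for a cost J and layers l = 1,...,L
\begin{align*}
\delta^{(L)} &= \nabla_{a^{(L)}} J \odot \sigma'\big(z^{(L)}\big), \\
\delta^{(l)} &= \big(W^{(l+1)\top}\delta^{(l+1)}\big) \odot \sigma'\big(z^{(l)}\big),
  \qquad l = L-1,\dots,1, \\
\frac{\partial J}{\partial W^{(l)}} &= \delta^{(l)}\, a^{(l-1)\top},
  \qquad
\frac{\partial J}{\partial b^{(l)}} = \delta^{(l)}.
\end{align*}
```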

【14】 Study of the Parent-of-origin effect in monogenic diseases with variable age of onset. Application on ATTRv

Authors: Flora Alarcon, Violaine Planté-Bordeneuve, Gregory Nuel Affiliations: Laboratory MAP, UMR CNRS, Paris University, Paris, France; Department of Neurology, Henri Mondor University Hospital, APHP, Créteil, France; Paris Est-Créteil University, Créteil, France; Inserm U., Institut Mondor de Recherche Comments: 14 pages, 4 figures Link: https://arxiv.org/abs/2107.09365 Abstract: In genetic diseases with variable age of onset, an accurate estimation of the survival function for the mutation carriers and also modifying factors effects estimations are important for the management of asymptomatic gene carriers across life. Among the modifying factors, the gender of the parent transmitting the mutation (i.e. the parent-of-origin effect) has been shown to have a significant effect on survival curve estimation on transthyretin familial amyloid polyneuropathy (ATTRv) families. However, as most genotypes are unknown, the parent-of-origin must be calculated through a probability estimated from the pedigree. We propose in this article to extend the method providing mutation carrier survival estimates in order to estimate the parent-of-origin effect. The method is both validated on simulated data and applied to family samples with ATTRv.

【15】 JAGS, NIMBLE, Stan: a detailed comparison among Bayesian MCMC software

Authors: Mario Beraha, Daniele Falco, Alessandra Guglielmi Affiliations: Department of Mathematics, Politecnico di Milano; Department of Computer Science, Università di Bologna Link: https://arxiv.org/abs/2107.09357 Abstract: The aim of this work is the comparison of the performance of the three popular software platforms JAGS, NIMBLE and Stan. These probabilistic programming languages are able to automatically generate samples from the posterior distribution of interest using MCMC algorithms, starting from the specification of a Bayesian model, i.e. the likelihood and the prior. The final goal is to present a detailed analysis of their strengths and weaknesses to statisticians or applied scientists. In this way, we wish to contribute to make them fully aware of the pros and cons of this software. We carry out a systematic comparison of the three platforms on a wide class of models, prior distributions, and data generating mechanisms. Our extensive simulation studies evaluate the quality of the MCMC chains produced, the efficiency of the software and the goodness of fit of the output. We also consider the efficiency of the parallelization made by the three platforms.
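
To make concrete what these three platforms automate, here is a hand-written random-walk Metropolis sampler for a toy model (normal likelihood with a normal prior on the mean); this is only an illustrative baseline and is not code generated by, or benchmarked against, JAGS, NIMBLE, or Stan.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=100)      # simulated data

def log_post(mu, y, prior_mean=0.0, prior_sd=10.0, sigma=1.0):
    """Unnormalized log posterior of the mean under a normal likelihood and normal prior."""
    loglik = -0.5 * np.sum((y - mu) ** 2) / sigma**2
    logprior = -0.5 * (mu - prior_mean) ** 2 / prior_sd**2
    return loglik + logprior

# random-walk Metropolis: the kind of MCMC loop these platforms generate automatically
n_iter, step = 5000, 0.3
chain = np.empty(n_iter)
mu = 0.0
for t in range(n_iter):
    prop = mu + step * rng.standard_normal()
    if np.log(rng.uniform()) < log_post(prop, y) - log_post(mu, y):
        mu = prop
    chain[t] = mu

print("posterior mean estimate:", chain[1000:].mean())
```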

【16】 Record-Based Transmuted Generalized Linear Exponential Distribution with Increasing, Decreasing and Bathtub Shaped Failure Rates

Authors: M. Arshad, M. Khetan, V. Kumar, A. K. Pathak Affiliations: Department of Mathematics, Indian Institute of Technology Indore; Department of Statistics and Operations Research, Aligarh Muslim University; Department of Mathematics, Amity University Mumbai; Department of Mathematics Comments: 29 pages, 5 figures, 9 tables Link: https://arxiv.org/abs/2107.09316 Abstract: The linear exponential distribution is a generalization of the exponential and Rayleigh distributions. This distribution is one of the best models to fit data with increasing failure rate (IFR). But it does not provide a reasonable fit for modeling data with decreasing failure rate (DFR) and bathtub shaped failure rate (BTFR). To overcome this drawback, we propose a new record-based transmuted generalized linear exponential (RTGLE) distribution by using the technique of Balakrishnan and He (2021). The family of RTGLE distributions is more flexible to fit the data sets with IFR, DFR, and BTFR, and also generalizes several well-known models as well as some new record-based transmuted models. This paper aims to study the statistical properties of RTGLE distribution, like, the shape of the probability density function and hazard function, quantile function and its applications, moments and its generating function, order and record statistics, Renyi entropy. The maximum likelihood estimators, least squares and weighted least squares estimators, Anderson-Darling estimators, Cramer-von Mises estimators of the unknown parameters are constructed and their biases and mean squared errors are reported via Monte Carlo simulation study. Finally, the real data set based on failure time illustrates the goodness of fit and applicability of the proposed distribution; hence, suitable recommendations are forwarded.

【17】 A Bayesian Approach to Invariant Deep Neural Networks

Authors: Nikolaos Mourdoukoutas, Marco Federici, Georges Pantalos, Mark van der Wilk, Vincent Fortuin Affiliations: Switzerland; University of Amsterdam Comments: 8 pages, 3 figures, to be published in ICML UDL 2021 Link: https://arxiv.org/abs/2107.09301 Abstract: We propose a novel Bayesian neural network architecture that can learn invariances from data alone by inferring a posterior distribution over different weight-sharing schemes. We show that our model outperforms other non-invariant architectures, when trained on datasets that contain specific invariances. The same holds true when no data augmentation is performed.

【18】 Generalized maximum likelihood estimation of the mean of parameters of mixtures, with applications to sampling

Authors: Eitan Greenshtein, Ya'acov Ritov Link: https://arxiv.org/abs/2107.09296 Abstract: Let $f(y|\theta)$, $\theta \in \Omega$, be a parametric family, $\eta(\theta)$ a given function, and $G$ an unknown mixing distribution. It is desired to estimate $E_G(\eta(\theta)) \equiv \eta_G$ based on independent observations $Y_1, \ldots, Y_n$, where $Y_i \sim f(y|\theta_i)$, and $\theta_i \sim G$ are iid. We explore the Generalized Maximum Likelihood Estimators (GMLE) for this problem. Some basic properties and representations of those estimators are shown. In particular we suggest a new perspective, of the weak convergence result by Kiefer and Wolfowitz (1956), with implications to a corresponding setup in which $\theta_1, \ldots, \theta_n$ are {\it fixed} parameters. We also relate the above problem, of estimating $\eta_G$, to non-parametric empirical Bayes estimation under a squared loss. Applications of GMLE to sampling problems are presented. The performance of the GMLE is demonstrated both in simulations and through a real data example.

【19】 Conditional Wasserstein Barycenters and Interpolation/Extrapolation of Distributions

Authors: Jianing Fan, Hans-Georg Müller Affiliations: Department of Statistics, University of California, Davis, CA, USA Comments: 42 pages, 15 figures Link: https://arxiv.org/abs/2107.09218 Abstract: Increasingly complex data analysis tasks motivate the study of the dependency of distributions of multivariate continuous random variables on scalar or vector predictors. Statistical regression models for distributional responses so far have primarily been investigated for the case of one-dimensional response distributions. We investigate here the case of multivariate response distributions while adopting the 2-Wasserstein metric in the distribution space. The challenge is that unlike the situation in the univariate case, the optimal transports that correspond to geodesics in the space of distributions with the 2-Wasserstein metric do not have an explicit representation for multivariate distributions. We show that under some regularity assumptions the conditional Wasserstein barycenters constructed for a geodesic in the Euclidean predictor space form a corresponding geodesic in the Wasserstein distribution space and demonstrate how the notion of conditional barycenters can be harnessed to interpolate as well as extrapolate multivariate distributions. The utility of distributional inter- and extrapolation is explored in simulations and examples. We study both global parametric-like and local smoothing-like models to implement conditional Wasserstein barycenters and establish asymptotic convergence properties for the corresponding estimates. For algorithmic implementation we make use of a Sinkhorn entropy-penalized algorithm. Conditional Wasserstein barycenters and distribution extrapolation are illustrated with applications in climate science and studies of aging.
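
Since the estimates here are implemented with a Sinkhorn entropy-penalized algorithm, a minimal numpy sketch of the textbook Sinkhorn iteration for entropy-regularized optimal transport between two discrete distributions is shown below; the paper's actual implementation and tuning are not reproduced.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    """Entropy-regularized OT between histograms a, b with cost matrix C.
    Returns the transport plan from the classical Sinkhorn scaling iterations."""
    K = np.exp(-C / eps)                    # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]      # transport plan diag(u) K diag(v)

# two discrete distributions on a 1-D grid with squared-distance cost
x = np.linspace(0, 1, 50)
a = np.exp(-(x - 0.3) ** 2 / 0.01); a /= a.sum()
b = np.exp(-(x - 0.7) ** 2 / 0.02); b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2
P = sinkhorn(a, b, C)
print("regularized transport cost:", np.sum(P * C))
```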

【20】 Can we globally optimize cross-validation loss? Quasiconvexity in ridge regression

Authors: William T. Stephenson, Zachary Frangella, Madeleine Udell, Tamara Broderick Affiliations: MIT; Cornell Comments: 20 pages, 6 figures Link: https://arxiv.org/abs/2107.09194 Abstract: Models like LASSO and ridge regression are extensively used in practice due to their interpretability, ease of use, and strong theoretical guarantees. Cross-validation (CV) is widely used for hyperparameter tuning in these models, but do practical optimization methods minimize the true out-of-sample loss? A recent line of research promises to show that the optimum of the CV loss matches the optimum of the out-of-sample loss (possibly after simple corrections). It remains to show how tractable it is to minimize the CV loss. In the present paper, we show that, in the case of ridge regression, the CV loss may fail to be quasiconvex and thus may have multiple local optima. We can guarantee that the CV loss is quasiconvex in at least one case: when the spectrum of the covariate matrix is nearly flat and the noise in the observed responses is not too high. More generally, we show that quasiconvexity status is independent of many properties of the observed data (response norm, covariate-matrix right singular vectors and singular-value scaling) and has a complex dependence on the few that remain. We empirically confirm our theory using simulated experiments.
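
A small sketch of the object being analyzed: the k-fold CV loss of ridge regression traced over a grid of penalties, which can be inspected for multiple local minima (the failure of quasiconvexity discussed here). The data-generating choices below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 60, 30
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:5] = 2.0
y = X @ beta + rng.standard_normal(n)

def ridge_cv_loss(X, y, lam, k=5):
    """k-fold cross-validation loss of ridge regression at penalty lam."""
    idx = np.arange(len(y)); folds = np.array_split(idx, k)
    loss = 0.0
    for f in folds:
        tr = np.setdiff1d(idx, f)
        A = X[tr].T @ X[tr] + lam * np.eye(X.shape[1])
        bhat = np.linalg.solve(A, X[tr].T @ y[tr])
        loss += np.sum((y[f] - X[f] @ bhat) ** 2)
    return loss / len(y)

lams = np.logspace(-3, 3, 200)
cv = np.array([ridge_cv_loss(X, y, l) for l in lams])
print("CV-minimizing penalty:", lams[int(np.argmin(cv))])
```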

【21】 BICNet: A Bayesian Approach for Estimating Task Effects on Intrinsic Connectivity Networks in fMRI Data

Authors: Meini Tang, Chee-Ming Ting, Hernando Ombao Affiliations: King Abdullah University of Science and Technology, Thuwal, Saudi Arabia; Monash University Malaysia, Subang Jaya, Malaysia Link: https://arxiv.org/abs/2107.09160 Abstract: Intrinsic connectivity networks (ICNs) are specific dynamic functional brain networks that are consistently found under various conditions including rest and task. Studies have shown that some stimuli actually activate intrinsic connectivity through either suppression, excitation, moderation or modification. Nevertheless, the structure of ICNs and task-related effects on ICNs are not yet fully understood. In this paper, we propose a Bayesian Intrinsic Connectivity Network (BICNet) model to identify the ICNs and quantify the task-related effects on the ICN dynamics. Using an extended Bayesian dynamic sparse latent factor model, the proposed BICNet has the following advantages: (1) it simultaneously identifies the individual ICNs and group-level ICN spatial maps; (2) it robustly identifies ICNs by jointly modeling resting-state functional magnetic resonance imaging (rfMRI) and task-related functional magnetic resonance imaging (tfMRI); (3) compared to independent component analysis (ICA)-based methods, it can quantify the difference of ICNs amplitudes across different states; (4) it automatically performs feature selection through the sparsity of the ICNs rather than ad-hoc thresholding. The proposed BICNet was applied to the rfMRI and language tfMRI data from the Human Connectome Project (HCP) and the analysis identified several ICNs related to distinct language processing functions.

【22】 Inference for Change Points in High Dimensional Mean Shift Models

Authors: Abhishek Kaul, George Michailidis Affiliations: Department of Mathematics and Statistics, Washington State University, Pullman, WA, USA; Department of Statistics and the Informatics Institute, University of Florida, Gainesville, FL, USA Link: https://arxiv.org/abs/2107.09150 Abstract: We consider the problem of constructing confidence intervals for the locations of change points in a high-dimensional mean shift model. To that end, we develop a locally refitted least squares estimator and obtain component-wise and simultaneous rates of estimation of the underlying change points. The simultaneous rate is the sharpest available in the literature by at least a factor of $\log p$, while the component-wise one is optimal. These results enable existence of limiting distributions. Component-wise distributions are characterized under both vanishing and non-vanishing jump size regimes, while joint distributions for any finite subset of change point estimates are characterized under the latter regime, which also yields asymptotic independence of these estimates. The combined results are used to construct asymptotically valid component-wise and simultaneous confidence intervals for the change point parameters. The results are established under a high dimensional scaling, allowing for diminishing jump sizes, in the presence of diverging number of change points and under subexponential errors. They are illustrated on synthetic data and on sensor measurements from smartphones for activity recognition.

【23】 Adaptive wavelet distillation from neural networks through interpretations

Authors: Wooseok Ha, Chandan Singh, Francois Lanusse, Eli Song, Song Dang, Kangmin He, Srigokul Upadhyayula, Bin Yu Affiliations: CNRS; Université Paris-Saclay, Université Paris Diderot, Sorbonne Paris Cité; Institute of Biophysics, Chinese Academy of Science; State Key Laboratory of Molecular Developmental Biology, Institute of Genetics and Developmental Biology Link: https://arxiv.org/abs/2107.09145 Abstract: Recent deep-learning models have achieved impressive prediction performance, but often sacrifice interpretability and computational efficiency. Interpretability is crucial in many disciplines, such as science and medicine, where models must be carefully vetted or where interpretation is the goal itself. Moreover, interpretable models are concise and often yield computational efficiency. Here, we propose adaptive wavelet distillation (AWD), a method which aims to distill information from a trained neural network into a wavelet transform. Specifically, AWD penalizes feature attributions of a neural network in the wavelet domain to learn an effective multi-resolution wavelet transform. The resulting model is highly predictive, concise, computationally efficient, and has properties (such as a multi-scale structure) which make it easy to interpret. In close collaboration with domain experts, we showcase how AWD addresses challenges in two real-world settings: cosmological parameter inference and molecular-partner prediction. In both cases, AWD yields a scientifically interpretable and concise model which gives predictive performance better than state-of-the-art neural networks. Moreover, AWD identifies predictive features that are scientifically meaningful in the context of respective domains. All code and models are released in a full-fledged package available on Github (https://github.com/Yu-Group/adaptive-wavelets).

【24】 A Non-ergodic Spectral Acceleration Ground Motion Model for California Developed with Random Vibration Theory

Authors: Grigorios Lavrentiadis, Norman A. Abrahamson Affiliations: Department of Civil Engineering, University of California, Berkeley Comments: 32 pages, 34 figures Link: https://arxiv.org/abs/2107.09125 Abstract: A new approach for creating a non-ergodic $PSA$ ground-motion model (GMM) is presented which accounts for the magnitude dependence of the non-ergodic effects. In this approach, the average $PSA$ scaling is controlled by an ergodic $PSA$ GMM, and the non-ergodic effects are captured with non-ergodic $PSA$ factors, which are the adjustment that needs to be applied to an ergodic $PSA$ GMM to incorporate the non-ergodic effects. The non-ergodic $PSA$ factors are based on $EAS$ non-ergodic effects and are converted to $PSA$ through Random Vibration Theory (RVT). The advantage of this approach is that it better captures the non-ergodic source, path, and site effects through the small magnitude earthquakes. Due to the linear properties of the Fourier Transform, the $EAS$ non-ergodic effects of the small events can be applied directly to the large magnitude events. This is not the case for $PSA$, as the response spectrum is controlled by a range of frequencies, making $PSA$ non-ergodic effects dependent on the spectral shape which is magnitude dependent. Two $PSA$ non-ergodic GMMs are derived using the ASK14 and CY14 GMMs as backbone models, respectively. The non-ergodic $EAS$ effects are estimated with the LAK21 GMM. The RVT calculations are performed with the V75 peak factor model, the $D_{a0.05-0.85}$ estimate of AS96 for the ground-motion duration, and the BT15 oscillator-duration model. The California subset of the NGAWest2 database is used for both models. The total aleatory standard deviation of the two non-ergodic $PSA$ GMMs is approximately $30$ to $35\%$ smaller than the total aleatory standard deviation of the corresponding ergodic $PSA$ GMMs. This reduction has a significant impact on hazard calculations at large return periods. In remote areas, far from stations and past events, the reduction of aleatory variability is accompanied by an increase of epistemic uncertainty.

【25】 Reward-Weighted Regression Converges to a Global Optimum

Authors: Miroslav Štrupl, Francesco Faccio, Dylan R. Ashley, Rupesh Kumar Srivastava, Jürgen Schmidhuber Affiliations: The Swiss AI Lab IDSIA, USI & SUPSI, Lugano, Switzerland; NNAISENSE Comments: 10 pages in main text, 2 pages of references, 4 pages of appendices, 2 figures in main text; source code available at this https URL Link: https://arxiv.org/abs/2107.09088 Abstract: Reward-Weighted Regression (RWR) belongs to a family of widely known iterative Reinforcement Learning algorithms based on the Expectation-Maximization framework. In this family, learning at each iteration consists of sampling a batch of trajectories using the current policy and fitting a new policy to maximize a return-weighted log-likelihood of actions. Although RWR is known to yield monotonic improvement of the policy under certain circumstances, whether and under which conditions RWR converges to the optimal policy have remained open questions. In this paper, we provide for the first time a proof that RWR converges to a global optimum when no function approximation is used.

【26】 Order Book Queue Hawkes-Markovian Modeling

Authors: Philip Protter, Qianfan Wu, Shihao Yang Affiliations: Department of Statistics, Columbia University; Department of Finance, Indiana University Bloomington; Department of Industrial and Systems Engineering, Georgia Institute of Technology Comments: 71 pages, 80 figures, submitted to Journal of the American Statistical Association Link: https://arxiv.org/abs/2107.09629 Abstract: This article presents a Hawkes process model with Markovian baseline intensities for high-frequency order book data modeling. We classify intraday order book trading events into a range of categories based on their order types and the price changes after their arrivals. To capture the stimulating effects between multiple types of order book events, we use the multivariate Hawkes process to model the self- and mutually-exciting event arrivals. We also integrate a Markovian baseline intensity into the event arrival dynamic, by including the impacts of order book liquidity state and time factor to the baseline intensity. A regression-based non-parametric estimation procedure is adopted to estimate the model parameters in our Hawkes Markovian model. To eliminate redundant model parameters, LASSO regularization is incorporated in the estimation procedure. Besides, a model selection method based on the Akaike Information Criterion is applied to evaluate the effect of each part of the proposed model. An implementation example based on real LOB data is provided. Through the example, we study the empirical shapes of Hawkes excitement functions, the effects of liquidity state as well as time factors, the LASSO variable selection, and the explanatory power of Hawkes and Markovian elements to the dynamics of the order book.
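
As background for the event-arrival dynamics modeled here, a minimal simulation of a univariate Hawkes process with an exponential excitation kernel via Ogata's thinning algorithm; the paper's model adds multivariate event types and a Markovian baseline intensity, which this sketch does not attempt.

```python
import numpy as np

def simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, T=100.0, seed=0):
    """Ogata thinning for a univariate Hawkes process with exponential kernel
    lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i)); needs alpha < beta."""
    rng = np.random.default_rng(seed)
    events, t = [], 0.0

    def intensity(s):
        past = np.array(events)                 # all stored events occur at or before s in this loop
        return mu + (alpha * np.exp(-beta * (s - past)).sum() if past.size else 0.0)

    while t < T:
        lam_bar = intensity(t)                  # valid upper bound: intensity only decays until the next event
        t += rng.exponential(1.0 / lam_bar)     # candidate arrival time from the dominating rate
        if t < T and rng.uniform() <= intensity(t) / lam_bar:
            events.append(t)                    # accept: the process self-excites here
    return np.array(events)

ts = simulate_hawkes()
print(f"simulated {len(ts)} events over [0, 100]")
```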

【27】 Positively Weighted Kernel Quadrature via Subsampling

Authors: Satoshi Hayakawa, Harald Oberhauser, Terry Lyons Affiliations: Mathematical Institute, University of Oxford Comments: 19 pages Link: https://arxiv.org/abs/2107.09597 Abstract: We study kernel quadrature rules with positive weights for probability measures on general domains. Our theoretical analysis combines the spectral properties of the kernel with random sampling of points. This results in effective algorithms to construct kernel quadrature rules with positive weights and small worst-case error. Besides additional robustness, our numerical experiments indicate that this can achieve fast convergence rates that compete with the optimal bounds in well-known examples.

【28】 Open Problem: Is There an Online Learning Algorithm That Learns Whenever Online Learning Is Possible?

Authors: Steve Hanneke Affiliations: Toyota Technological Institute at Chicago Link: https://arxiv.org/abs/2107.09542 Abstract: This open problem asks whether there exists an online learning algorithm for binary classification that guarantees, for all target concepts, to make a sublinear number of mistakes, under only the assumption that the (possibly random) sequence of points X allows that such a learning algorithm can exist for that sequence. As a secondary problem, it also asks whether a specific concise condition completely determines whether a given (possibly random) sequence of points X admits the existence of online learning algorithms guaranteeing a sublinear number of mistakes for all target concepts.

【29】 Large-scale graph representation learning with very deep GNNs and self-supervision

Authors: Ravichandra Addanki, Peter W. Battaglia, David Budden, Andreea Deac, Jonathan Godwin, Thomas Keck, Wai Lok Sibon Li, Alvaro Sanchez-Gonzalez, Jacklynn Stott, Shantanu Thakoor, Petar Veličković Affiliations: DeepMind; Mila, Université de Montréal Comments: To appear at KDD Cup 2021. 13 pages, 3 figures. All authors contributed equally Link: https://arxiv.org/abs/2107.09422 Abstract: Effectively and efficiently deploying graph neural networks (GNNs) at scale remains one of the most challenging aspects of graph representation learning. Many powerful solutions have only ever been validated on comparatively small datasets, often with counter-intuitive outcomes -- a barrier which has been broken by the Open Graph Benchmark Large-Scale Challenge (OGB-LSC). We entered the OGB-LSC with two large-scale GNNs: a deep transductive node classifier powered by bootstrapping, and a very deep (up to 50-layer) inductive graph regressor regularised by denoising objectives. Our models achieved an award-level (top-3) performance on both the MAG240M and PCQM4M benchmarks. In doing so, we demonstrate evidence of scalable self-supervised graph representation learning, and utility of very deep GNNs -- both very important open issues. Our code is publicly available at: https://github.com/deepmind/deepmind-research/tree/master/ogb_lsc.

【30】 Approximation Theory of Convolutional Architectures for Time Series Modelling

Authors: Haotian Jiang, Zhong Li, Qianxiao Li Comments: Published version Link: https://arxiv.org/abs/2107.09355 Abstract: We study the approximation properties of convolutional architectures applied to time series modelling, which can be formulated mathematically as a functional approximation problem. In the recurrent setting, recent results reveal an intricate connection between approximation efficiency and memory structures in the data generation process. In this paper, we derive parallel results for convolutional architectures, with WaveNet being a prime example. Our results reveal that in this new setting, approximation efficiency is not only characterised by memory, but also additional fine structures in the target relationship. This leads to a novel definition of spectrum-based regularity that measures the complexity of temporal relationships under the convolutional approximation scheme. These analyses provide a foundation to understand the differences between architectural choices for time series modelling and can give theoretically grounded guidance for practical applications.

【31】 Kernel Selection for Stein Variational Gradient Descent

Authors: Qingzhong Ai, Shiyu Liu, Zenglin Xu Affiliations: SMILE Lab, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; School of Computer Science and Technology, Harbin Institute of Technology Shenzhen, Shenzhen, China Link: https://arxiv.org/abs/2107.09338 Abstract: Stein variational gradient descent (SVGD) and its variants have shown promising successes in approximate inference for complex distributions. However, their empirical performance depends crucially on the choice of optimal kernel. Unfortunately, the RBF kernel with median heuristics is a common choice in previous approaches which has been proved sub-optimal. Inspired by the paradigm of multiple kernel learning, our solution to this issue is using a combination of multiple kernels to approximate the optimal kernel instead of a single one which may limit the performance and flexibility. To do so, we extend Kernelized Stein Discrepancy (KSD) to its multiple kernel view called Multiple Kernelized Stein Discrepancy (MKSD). Further, we leverage MKSD to construct a general algorithm based on SVGD, which is called Multiple Kernel SVGD (MK-SVGD). Besides, we automatically assign a weight to each kernel without any other parameters. The proposed method not only gets rid of optimal kernel dependence but also maintains computational effectiveness. Experiments on various tasks and models show the effectiveness of our method.
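
For reference, one SVGD update with a single RBF kernel (the baseline that this paper generalizes to a weighted combination of kernels) looks as follows; the toy 2-D Gaussian target and the median-heuristic bandwidth, which the paper argues can be sub-optimal, are illustrative choices.

```python
import numpy as np

def rbf_kernel(X, h):
    """RBF kernel matrix K and, for each particle i, sum_j grad_{x_j} k(x_j, x_i)."""
    diff = X[:, None, :] - X[None, :, :]             # (n, n, d): x_i - x_j
    sq = np.sum(diff ** 2, axis=-1)
    K = np.exp(-sq / h)
    grad_K = (2.0 / h) * np.einsum('ij,ijd->id', K, diff)
    return K, grad_K

def svgd_step(X, grad_logp, stepsize=0.1):
    """One Stein variational gradient descent update with the median-heuristic RBF kernel."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    h = np.median(sq) / np.log(len(X) + 1.0)         # median-heuristic bandwidth
    K, grad_K = rbf_kernel(X, h)
    phi = (K @ grad_logp(X) + grad_K) / len(X)       # SVGD direction
    return X + stepsize * phi

# toy target: standard 2-D Gaussian, so grad log p(x) = -x
grad_logp = lambda X: -X
particles = np.random.default_rng(0).uniform(-3, 3, size=(100, 2))
for _ in range(300):
    particles = svgd_step(particles, grad_logp)
print("particle mean (should be near 0):", particles.mean(axis=0))
print("particle variances (should be near 1):", particles.var(axis=0))
```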

【32】 Evaluating Probabilistic Inference in Deep Learning: Beyond Marginal Predictions

Authors: Xiuyuan Lu, Ian Osband, Benjamin Van Roy, Zheng Wen Affiliations: DeepMind Link: https://arxiv.org/abs/2107.09224 Abstract: A fundamental challenge for any intelligent system is prediction: given some inputs $X_1, \ldots, X_\tau$ can you predict outcomes $Y_1, \ldots, Y_\tau$. The KL divergence $\mathbf{d}_{\mathrm{KL}}$ provides a natural measure of prediction quality, but the majority of deep learning research looks only at the marginal predictions per input $X_t$. In this technical report we propose a scoring rule $\mathbf{d}_{\mathrm{KL}}^\tau$, parameterized by $\tau \in \mathcal{N}$, that evaluates the joint predictions at $\tau$ inputs simultaneously. We show that the commonly-used $\tau=1$ can be insufficient to drive good decisions in many settings of interest. We also show that, as $\tau$ grows, performing well according to $\mathbf{d}_{\mathrm{KL}}^\tau$ recovers universal guarantees for any possible decision. Finally, we provide problem-dependent guidance on the scale of $\tau$ for which our score provides sufficient guarantees for good performance.

【33】 Asymptotic Escape of Spurious Critical Points on the Low-rank Matrix Manifold

Authors: Thomas Y. Hou, Zhenzhen Li, Ziyun Zhang Link: https://arxiv.org/abs/2107.09207 Abstract: We show that the Riemannian gradient descent algorithm on the low-rank matrix manifold almost surely escapes some spurious critical points on the boundary of the manifold. Given that the low-rank matrix manifold is an incomplete set, this result is the first to overcome this difficulty and partially justify the global use of the Riemannian gradient descent on the manifold. The spurious critical points are some rank-deficient matrices that capture only part of the SVD components of the ground truth. They exhibit very singular behavior and evade the classical analysis of strict saddle points. We show that using the dynamical low-rank approximation and a rescaled gradient flow, some of the spurious critical points can be converted to classical strict saddle points, which leads to the desired result. Numerical experiments are provided to support our theoretical findings.

【34】 Improving exploration in policy gradient search: Application to symbolic optimization

Authors: Mikel Landajuela Larma, Brenden K. Petersen, Soo K. Kim, Claudio P. Santiago, Ruben Glatt, T. Nathan Mundhenk, Jacob F. Pettit, Daniel M. Faissol Affiliations: Lawrence Livermore National Laboratory, Livermore, CA, USA Comments: None Link: https://arxiv.org/abs/2107.09158 Abstract: Many machine learning strategies designed to automate mathematical tasks leverage neural networks to search large combinatorial spaces of mathematical symbols. In contrast to traditional evolutionary approaches, using a neural network at the core of the search allows learning higher-level symbolic patterns, providing an informed direction to guide the search. When no labeled data is available, such networks can still be trained using reinforcement learning. However, we demonstrate that this approach can suffer from an early commitment phenomenon and from initialization bias, both of which limit exploration. We present two exploration methods to tackle these issues, building upon ideas of entropy regularization and distribution initialization. We show that these techniques can improve the performance, increase sample efficiency, and lower the complexity of solutions for the task of symbolic regression.

【35】 Rethinking the limiting dynamics of SGD: modified loss, phase space oscillations, and anomalous diffusion

Authors: Daniel Kunin, Javier Sagastuy-Brena, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins Affiliations: Stanford University; NTT Research; Facebook AI Research Comments: 30 pages, 8 figures Link: https://arxiv.org/abs/2107.09133 Abstract: In this work we explore the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD). We find empirically that long after performance has converged, networks continue to move through parameter space by a process of anomalous diffusion in which distance travelled grows as a power law in the number of gradient updates with a nontrivial exponent. We reveal an intricate interaction between the hyperparameters of optimization, the structure in the gradient noise, and the Hessian matrix at the end of training that explains this anomalous diffusion. To build this understanding, we first derive a continuous-time model for SGD with finite learning rates and batch sizes as an underdamped Langevin equation. We study this equation in the setting of linear regression, where we can derive exact, analytic expressions for the phase space dynamics of the parameters and their instantaneous velocities from initialization to stationarity. Using the Fokker-Planck equation, we show that the key ingredient driving these dynamics is not the original training loss, but rather the combination of a modified loss, which implicitly regularizes the velocity, and probability currents, which cause oscillations in phase space. We identify qualitative and quantitative predictions of this theory in the dynamics of a ResNet-18 model trained on ImageNet. Through the lens of statistical physics, we uncover a mechanistic origin for the anomalous limiting dynamics of deep neural networks trained with SGD.

【36】 Support Recovery in Universal One-bit Compressed Sensing

Authors: Arya Mazumdar, Soumyabrata Pal Comments: 15 pages Link: https://arxiv.org/abs/2107.09091 Abstract: One-bit compressed sensing (1bCS) is an extreme-quantized signal acquisition method that has been widely studied in the past decade. In 1bCS, linear samples of a high dimensional signal are quantized to only one bit per sample (sign of the measurement). Assuming the original signal vector to be sparse, existing results either aim to find the support of the vector, or approximate the signal within an $\epsilon$-ball. The focus of this paper is support recovery, which often also computationally facilitates approximate signal recovery. A universal measurement matrix for 1bCS refers to one set of measurements that work for all sparse signals. With universality, it is known that $\tilde{\Theta}(k^2)$ 1bCS measurements are necessary and sufficient for support recovery (where $k$ denotes the sparsity). In this work, we show that it is possible to universally recover the support with a small number of false positives with $\tilde{O}(k^{3/2})$ measurements. If the dynamic range of the signal vector is known, then with a different technique, this result can be improved to only $\tilde{O}(k)$ measurements. Further results on support recovery are also provided.
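
A toy illustration of the one-bit measurement model (not the paper's universal scheme or its guarantees): measurements y = sign(Ax) with a Gaussian matrix A, and a naive support estimate taken from the largest entries of A^T y.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 200, 400, 5                       # signal dim, number of 1-bit measurements, sparsity
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.uniform(1.0, 2.0, size=k) * rng.choice([-1, 1], size=k)

A = rng.standard_normal((m, n))
y = np.sign(A @ x)                          # only one bit (the sign) per linear measurement

# naive support estimate: the k largest entries of A^T y (a simple correlation statistic,
# not the universal scheme analyzed in the paper)
est_support = np.argsort(np.abs(A.T @ y))[-k:]
print("recovered:", sorted(est_support), "true:", sorted(support))
```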
