Statistics arXiv Digest [12.20]

2021-12-22 17:04:39


stat (Statistics), 33 papers in total

【1】 On Frequentist and Bayesian Sequential Clinical Trial Designs Link: https://arxiv.org/abs/2112.09644

Authors: Tianjian Zhou, Yuan Ji Affiliations: Department of Statistics, Colorado State University; Department of Public Health Sciences, University of Chicago Abstract: Clinical trials usually involve sequential patient entry. When designing a clinical trial, it is often desirable to include a provision for interim analyses of accumulating data with the potential for stopping the trial early. We review frequentist and Bayesian sequential clinical trial designs with a focus on their fundamental and philosophical differences. Frequentist designs utilize repeated significance testing or conditional power to make early stopping decisions. The majority of frequentist designs are concerned with controlling the overall type I error rate of falsely rejecting the null hypothesis at any analysis. On the other hand, Bayesian designs utilize posterior or posterior predictive probabilities for decision-making. The prior and threshold values in a Bayesian design can be chosen to either achieve desirable frequentist operating characteristics or reflect the investigator's subjective belief. We also comment on the likelihood principle, which is commonly tied with statistical inference and decision-making in sequential clinical trials. A single-arm trial example with normally distributed outcomes is used throughout to illustrate some frequentist and Bayesian designs. Numerical studies are conducted to assess these designs.
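To make the contrast concrete, here is a minimal sketch of a Bayesian stopping rule for a single-arm trial with normally distributed outcomes, in the spirit of the paper's running example: with a conjugate normal prior on the mean, the trial stops early at an interim look once the posterior probability of efficacy crosses a threshold. The prior, interim schedule, and threshold below are illustrative assumptions, not the authors' specification.

```python
import math

def posterior_prob_exceeds(data, mu0, prior_mean=0.0, prior_var=100.0, sigma2=1.0):
    """Posterior P(mu > mu0) under a conjugate N(prior_mean, prior_var) prior
    for normally distributed outcomes with known variance sigma2."""
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n / sigma2)
    post_mean = post_var * (prior_mean / prior_var + sum(data) / sigma2)
    z = (mu0 - post_mean) / math.sqrt(post_var)
    # P(mu > mu0) = 1 - Phi(z), computed via the complementary error function
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def bayesian_sequential_trial(outcomes, mu0=0.0, threshold=0.975, looks=(10, 20, 30)):
    """Stop at the first interim look where the posterior probability of
    efficacy (mu > mu0) exceeds the threshold; otherwise run to the end."""
    for n in looks:
        p = posterior_prob_exceeds(outcomes[:n], mu0)
        if p > threshold:
            return n, p, True  # stop early, declare efficacy
    return looks[-1], p, False
```

With clearly efficacious data the rule stops at the first look; with null data the trial runs to the final analysis with a posterior probability near 0.5.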

【2】 Selection bias in the treatment effect for a principal stratum Link: https://arxiv.org/abs/2112.09541

Authors: Yongming Qu, Stephen J. Ruberg, Junxiang Luo, Ilya Lipkovich Affiliations: Department of Statistics, Data and Analytics, Eli Lilly and Company, Indianapolis, IN, USA; Analytix Thinking, LLC, Indianapolis, IN, USA; Moderna, Inc., Cambridge, MA, USA Note: 9 pages Abstract: Estimation of treatment effect for principal strata has been studied for more than two decades. Existing research exclusively focuses on the estimation, but there is little research on forming and testing hypotheses for principal stratification-based estimands. In this brief report, we discuss a phenomenon in which the true treatment effect for a principal stratum may not equal zero even if the two treatments have the same effect at the patient level, which implies an equal average treatment effect for the principal stratum. We explain this phenomenon from the perspective of selection bias. This is an important finding and deserves attention when using and interpreting results based on principal stratification. There is a need to further study how to form the null hypothesis for estimands for a principal stratum.

【3】 Doubly Robust Estimation of the Hazard Difference for Competing Risks Data Link: https://arxiv.org/abs/2112.09535

Authors: Denise Rava, Ronghui Xu Affiliations: Department of Mathematics, University of California, San Diego; Department of Mathematics and Herbert Wertheim School of Public Health Abstract: We consider the conditional treatment effect for competing risks data in observational studies. While it is described as a constant difference between the hazard functions given the covariates, we do not assume specific functional forms for the covariates. We derive the efficient score for the treatment effect using modern semiparametric theory, as well as two doubly robust scores with respect to 1) the assumed propensity score for treatment and the censoring model, and 2) the outcome models for the competing risks. An important asymptotic result regarding the estimators is rate double robustness, in addition to the classical model double robustness. Rate double robustness enables the use of machine learning and nonparametric methods in order to estimate the nuisance parameters, while preserving the root-$n$ asymptotic normality of the estimators for inferential purposes. We study the performance of the estimators using simulation. The estimators are applied to the data from a cohort of Japanese men in Hawaii followed since the 1960s in order to study the effect of mid-life drinking behavior on late life cognitive outcomes.

【4】 A flexible Bayesian hierarchical modeling framework for spatially dependent peaks-over-threshold data Link: https://arxiv.org/abs/2112.09530

Authors: Rishikesh Yadav, Raphaël Huser, Thomas Opitz Note: 33 pages, 5 figures, 3 tables Abstract: In this work, we develop a constructive modeling framework for extreme threshold exceedances in repeated observations of spatial fields, based on general product mixtures of random fields possessing light or heavy-tailed margins and various spatial dependence characteristics, which are suitably designed to provide high flexibility in the tail and at sub-asymptotic levels. Our proposed model is akin to a recently proposed Gamma-Gamma model using a ratio of processes with Gamma marginal distributions, but it possesses a higher degree of flexibility in its joint tail structure, capturing strong dependence more easily. We focus on constructions with the following three product factors, whose different roles ensure their statistical identifiability: a heavy-tailed spatially-dependent field, a lighter-tailed spatially-constant field, and another lighter-tailed spatially-independent field. Thanks to the model's hierarchical formulation, inference may be conveniently performed based on Markov chain Monte Carlo methods. We leverage the Metropolis-adjusted Langevin algorithm (MALA) with random block proposals for latent variables, as well as the stochastic gradient Langevin dynamics (SGLD) algorithm for hyperparameters, in order to fit our proposed model very efficiently in relatively high spatio-temporal dimensions, while simultaneously censoring non-threshold exceedances and performing spatial prediction at multiple sites. The censoring mechanism is applied to the spatially independent component, such that only univariate cumulative distribution functions have to be evaluated. We explore the theoretical properties of the novel model, and illustrate the proposed methodology by simulation and application to daily precipitation data from North-Eastern Spain measured at about 100 stations over the period 2011-2020.
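The MALA update mentioned above is easy to sketch in isolation: a Langevin proposal followed by a Metropolis-Hastings correction with the asymmetric proposal density. The toy implementation below targets a standard normal rather than the paper's latent spatial model; the target, gradient, and step size are illustrative assumptions.

```python
import math
import random

def mala_step(x, log_density, grad_log_density, step=0.5):
    """One Metropolis-adjusted Langevin step: drift along the gradient of the
    log density, add Gaussian noise, then accept/reject to correct the bias."""
    prop = x + step * grad_log_density(x) + math.sqrt(2.0 * step) * random.gauss(0.0, 1.0)

    def log_q(b, a):
        # log proposal density q(b | a), up to a constant that cancels
        mean = a + step * grad_log_density(a)
        return -((b - mean) ** 2) / (4.0 * step)

    log_alpha = (log_density(prop) - log_density(x)
                 + log_q(x, prop) - log_q(prop, x))
    if math.log(random.random()) < log_alpha:
        return prop
    return x

# Illustrative target: standard normal, log pi(x) = -x^2/2 (up to a constant)
log_pi = lambda x: -0.5 * x * x
grad_log_pi = lambda x: -x
```

Running the chain long enough, the sample mean and variance should approach those of the target (0 and 1 here).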

【5】 Correlated Product of Experts for Sparse Gaussian Process Regression Link: https://arxiv.org/abs/2112.09519

Authors: Manuel Schürch, Dario Azzimonti, Alessio Benavoli, Marco Zaffalon Affiliations: Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA), Lugano, Switzerland; Università della Svizzera italiana (USI), Lugano, Switzerland; University of Limerick (UL), Limerick, Ireland Abstract: Gaussian processes (GPs) are an important tool in machine learning and statistics, with applications ranging from the social and natural sciences through engineering. They constitute a powerful kernelized non-parametric method with well-calibrated uncertainty estimates; however, off-the-shelf GP inference procedures are limited to datasets with several thousand data points because of their cubic computational complexity. For this reason, many sparse GP techniques have been developed over the past years. In this paper, we focus on GP regression tasks and propose a new approach based on aggregating predictions from several local and correlated experts. The degree of correlation between the experts can thereby vary from independent to fully correlated. The individual predictions of the experts are aggregated taking into account their correlation, resulting in consistent uncertainty estimates. Our method recovers independent Product of Experts, sparse GP and full GP in the limiting cases. The presented framework can deal with a general kernel function and multiple variables, and has a time and space complexity which is linear in the number of experts and data samples, which makes our approach highly scalable. We demonstrate superior performance, in a time vs. accuracy sense, of our proposed method against state-of-the-art GP approximation methods for synthetic as well as several real-world datasets with deterministic and stochastic optimization.
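For intuition, the independent Product-of-Experts limiting case mentioned above reduces to precision-weighted averaging of the experts' Gaussian predictions. The sketch below shows only that limiting case, not the paper's correlated aggregation:

```python
def poe_aggregate(means, variances):
    """Aggregate independent Gaussian expert predictions (m_k, v_k) by
    precision weighting: the product of the Gaussians has precision equal to
    the sum of precisions and a precision-weighted mean. This is the
    independent Product-of-Experts limit; correlated aggregation generalizes it."""
    precisions = [1.0 / v for v in variances]
    agg_prec = sum(precisions)
    agg_mean = sum(p * m for p, m in zip(precisions, means)) / agg_prec
    return agg_mean, 1.0 / agg_prec
```

Note how a more confident expert (smaller variance) dominates the aggregate, and the aggregated variance is always smaller than any individual one.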

【6】 Online Generalized Additive Model Link: https://arxiv.org/abs/2112.09497

Authors: Ying Yang, Fang Yao Affiliations: Department of Probability and Statistics, School of Mathematical Sciences, Center for Statistical Science, Peking University, Beijing, China Abstract: Additive models and generalized additive models are effective semiparametric tools for multidimensional data. In this article we propose an online smoothing backfitting method for generalized additive models with local polynomial smoothers. The main idea is to use a second order expansion to approximate the nonlinear integral equations to maximize the local quasilikelihood, and to store the coefficients as sufficient statistics which can be updated in an online manner by a dynamic candidate bandwidth method. The updating procedure only depends on the stored sufficient statistics and the current data block. We derive the asymptotic normality as well as the relative efficiency lower bounds of the online estimates, which provides insight into the relationship between estimation accuracy and computational cost driven by the length of the candidate bandwidth sequence. Simulations and real data examples are provided to validate our findings.
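The idea of storing sufficient statistics that each new data block merely updates can be illustrated with a far simpler model than the paper's local-polynomial backfitting. The streaming simple-linear-regression class below is a toy analogue: only accumulated sums are kept, so an update never revisits past blocks.

```python
class OnlineLeastSquares:
    """Streaming simple linear regression y = a + b*x via sufficient statistics.
    Each data block only updates the accumulated sums (n, Sx, Sy, Sxx, Sxy),
    mimicking, in greatly simplified form, the online-update structure of the
    paper's backfitting estimator."""

    def __init__(self):
        self.n = self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, xs, ys):
        """Absorb one data block; past raw data is never needed again."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self):
        """Closed-form least-squares fit from the stored sums."""
        denom = self.n * self.sxx - self.sx ** 2
        b = (self.n * self.sxy - self.sx * self.sy) / denom
        a = (self.sy - b * self.sx) / self.n
        return a, b
```

Feeding the blocks in any order or batch size yields exactly the same fit as one pass over the pooled data.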

【7】 An overview of active learning methods for insurance with fairness appreciation Link: https://arxiv.org/abs/2112.09466

Authors: Romuald Elie, Caroline Hillairet, François Hu, Marc Juillard Affiliations: Université Gustave Eiffel; ENSAE-CREST; Société Générale Insurance Abstract: This paper addresses and solves some challenges in the adoption of machine learning in insurance with the democratization of model deployment. The first challenge is reducing the labelling effort (hence focusing on the data quality) with the help of active learning, a feedback loop between the model inference and an oracle: as in insurance the unlabeled data is usually abundant, active learning can become a significant asset in reducing the labelling cost. For that purpose, this paper sketches out various classical active learning methodologies before studying their empirical impact on both synthetic and real datasets. Another key challenge in insurance is the fairness issue in model inferences. We will introduce and integrate a post-processing fairness for multi-class tasks in this active learning framework to solve these two issues. Finally, numerical experiments on unfair datasets highlight that the proposed setup presents a good compromise between model precision and fairness.
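A minimal pool-based uncertainty-sampling loop, one of the classical methodologies such a survey covers, might look as follows. The model, oracle, and query budget are placeholders supplied by the caller; this is a generic sketch, not the paper's setup.

```python
def most_uncertain(pool, predict_proba):
    """Uncertainty sampling: query the pool point whose predicted
    positive-class probability is closest to 0.5."""
    return min(pool, key=lambda x: abs(predict_proba(x) - 0.5))

def active_learning_loop(pool, oracle, fit, n_queries):
    """Generic pool-based loop: fit a model on the labeled set, query the
    most uncertain unlabeled point, ask the oracle for its label, repeat."""
    labeled = []
    pool = list(pool)
    for _ in range(n_queries):
        model = fit(labeled)          # returns a predict_proba callable
        x = most_uncertain(pool, model)
        pool.remove(x)
        labeled.append((x, oracle(x)))
    return labeled
```

With a fixed sigmoid scorer centered at 2, the loop queries the points nearest the decision boundary first.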

【8】 Sequential decision making for a class of hidden Markov processes, application to medical treatment optimisation Link: https://arxiv.org/abs/2112.09408

Authors: Alice Cleynen, Benoîte de Saporta Affiliations: IMAG, Univ Montpellier, CNRS, Montpellier, France Abstract: Motivated by a medical decision making problem, this paper focuses on an impulse control problem for a class of piecewise deterministic semi-Markov processes. The process evolves deterministically between jumps and the inter-jump times have a general distribution. The discrete coordinate (e.g. global health state of the patient) is not observed; the continuous one (e.g. result of some blood measurement) is observed with noise at some (possibly scarce) observation times. The objective is to optimally select the observation dates while controlling the process so that it remains close to a nominal value. At each visit to the medical center, a cancer patient undergoes possibly invasive analyses, and treatment and next visit dates are scheduled according to their result and the patient history. Frequent observations lead to a better estimation of the hidden state of the process but may be too costly for the center and/or patient. Rare observations may lead to undetected, possibly lethal degradation of the patient's health. We exhibit an explicit policy close to optimality based on discretisations of the process. Construction of discretisation grids is discussed at length. The paper is illustrated with experiments on synthetic data fitted from the Intergroupe Francophone du Myélome 2009 clinical trial.

【9】 Moments and random number generation for the truncated elliptical family of distributions Link: https://arxiv.org/abs/2112.09319

Authors: Katherine A. L. Valeriano, Christian E. Galarza, Larissa A. Matos Affiliations: Departamento de Estatística, Universidade Estadual de Campinas, Campinas, Brazil; Departamento de Matemáticas, Escuela Superior Politécnica del Litoral, Guayaquil, Ecuador Note: 20 pages, 7 figures and 3 tables Abstract: This paper proposes an algorithm to generate random numbers from any member of the truncated multivariate elliptical family of distributions with a strictly decreasing density generating function. Based on Neal (2003) and Ho et al. (2012), we construct an efficient sampling method by means of a slice sampling algorithm with Gibbs sampler steps. We also provide a faster approach to approximate the first and the second moment for the truncated multivariate elliptical distributions where Monte Carlo integration is used for the truncated partition, and explicit expressions for the non-truncated part (Galarza et al., 2020). Examples and an application to environmental spatial data illustrate its usefulness. Methods are available for free in the new R library elliptical.
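The slice-within-Gibbs idea can be sketched for the simplest member of the elliptical family: a standard normal truncated to [a, b], where the horizontal slice has a closed form. This is an illustrative univariate special case, not the package's multivariate algorithm.

```python
import math
import random

def slice_sample_truncnorm(a, b, n_samples, x0=None):
    """Slice sampler for a standard normal truncated to [a, b].
    With unnormalized density f(x) = exp(-x^2/2), given the current x we draw
    u ~ Unif(0, f(x)); the slice {x: f(x) > u} is the interval (-s, s) with
    s = sqrt(-2 log u), so the Gibbs step draws x uniformly from (-s, s)
    intersected with [a, b]. Both conditional draws are exact."""
    x = x0 if x0 is not None else 0.5 * (a + b)
    out = []
    for _ in range(n_samples):
        u = random.uniform(0.0, math.exp(-0.5 * x * x))
        s = math.sqrt(-2.0 * math.log(u))   # s >= |x|, so x stays in the slice
        lo, hi = max(a, -s), min(b, s)
        x = random.uniform(lo, hi)
        out.append(x)
    return out
```

Truncating to [0, 1], the draws stay in range and the sample mean settles near the true truncated-normal mean (about 0.46).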

【10】 Federated Adaptive Causal Estimation (FACE) of Target Treatment Effects Link: https://arxiv.org/abs/2112.09313

Authors: Larry Han, Jue Hou, Kelly Cho, Rui Duan, Tianxi Cai Affiliations: Department of Biostatistics, Harvard T.H. Chan School of Public Health; Massachusetts Veterans Epidemiology Research and Information Center, US Department of Veterans Affairs; Department of Biomedical Informatics, Harvard Medical School Note: 35 pages, 2 figures, 2 tables Abstract: Federated learning of causal estimands may greatly improve estimation efficiency by aggregating estimates from multiple study sites, but robustness to extreme estimates is vital for maintaining consistency. We develop a federated adaptive causal estimation (FACE) framework to incorporate heterogeneous data from multiple sites to provide treatment effect estimation and inference for a target population of interest. Our strategy is communication-efficient and privacy-preserving and allows for flexibility in the specification of the target population. Our method accounts for site-level heterogeneity in the distribution of covariates through density ratio weighting. To safely aggregate estimates from all sites and avoid negative transfer, we introduce an adaptive procedure of weighting the estimators constructed using data from the target and source populations through a penalized regression on the influence functions, which achieves 1) consistency and 2) optimal efficiency. We illustrate FACE by conducting a comparative effectiveness study of BNT162b2 (Pfizer) and mRNA-1273 (Moderna) vaccines on COVID-19 outcomes in U.S. veterans using electronic health records from five VA sites.

【11】 Unadjusted Langevin algorithm for sampling a mixture of weakly smooth potentials Link: https://arxiv.org/abs/2112.09311

Authors: Dao Nguyen Abstract: Discretization of continuous-time diffusion processes is a widely recognized method for sampling. However, the common requirement that the potential be smooth (gradient Lipschitz) is a considerable restriction. This paper studies the problem of sampling through Euler discretization, where the potential function is assumed to be a mixture of weakly smooth distributions and to satisfy a weak dissipativity condition. We establish convergence in Kullback-Leibler (KL) divergence, with the number of iterations needed to reach an $\epsilon$-neighborhood of a target distribution depending only polynomially on the dimension. We relax the degenerated-convex-at-infinity conditions of Erdogdu et al. (2020) and prove convergence guarantees under a Poincaré inequality or non-strong convexity outside a ball. In addition, we also provide convergence in the $L_{\beta}$-Wasserstein metric for the smoothing potential.
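The unadjusted Langevin algorithm itself is a one-line recursion: an Euler step of the Langevin diffusion with no Metropolis correction, so the discretization bias is controlled by the step size. The sketch below targets a standard normal (gradient -x) purely for illustration and omits the weakly smooth mixtures analyzed in the paper.

```python
import math
import random

def ula_chain(grad_log_density, x0=0.0, step=0.1, n_steps=50000):
    """Unadjusted Langevin algorithm: Euler discretization of the Langevin
    diffusion dX_t = grad log pi(X_t) dt + sqrt(2) dW_t. Unlike MALA there is
    no accept/reject step, so the chain targets a slightly biased version of
    pi whose error shrinks with the step size."""
    x = x0
    samples = []
    for _ in range(n_steps):
        x = x + step * grad_log_density(x) + math.sqrt(2.0 * step) * random.gauss(0.0, 1.0)
        samples.append(x)
    return samples
```

For the Gaussian target this recursion is an AR(1) process whose stationary variance is 1/(1 - step/2), making the step-size bias explicit.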

【12】 The Effect of Sample Size and Missingness on Inference with Missing Data Link: https://arxiv.org/abs/2112.09275

Authors: Julian Morimoto Note: Submitted to the Journal of the American Statistical Association on December 14, 2021 Abstract: When are inferences (whether Direct-Likelihood, Bayesian, or Frequentist) obtained from partial data valid? This paper answers this question by offering a new theory about inference with missing data. It proves that as the sample size increases and the extent of missingness decreases, the mean-loglikelihood function generated by partial data and that ignores the missingness mechanism will almost surely converge uniformly to that which would have been generated by complete data; and if the data are Missing at Random, this convergence depends only on sample size. Thus, inferences on partial data, such as posterior modes, uncertainty estimates, confidence intervals, likelihood ratios, and indeed, all quantities or features derived from the partial-data loglikelihood function, will approximate their true values (what they would have been given complete data). This adds to previous research which has only proved the consistency of the posterior mode. Practical implications of this result are discussed, and the theory is tested on a previous study of International Human Rights Law.

【13】 Marginalization in Bayesian Networks: Integrating Exact and Approximate Inference Link: https://arxiv.org/abs/2112.09217

Authors: Fritz M. Bayer, Giusi Moffa, Niko Beerenwinkel, Jack Kuipers Affiliations: ETH Zurich; University of Basel Abstract: Bayesian Networks are probabilistic graphical models that can compactly represent dependencies among random variables. Missing data and hidden variables require calculating the marginal probability distribution of a subset of the variables. While knowledge of the marginal probability distribution is crucial for various problems in statistics and machine learning, its exact computation is generally not feasible for categorical variables due to the NP-hardness of this task. We develop a divide-and-conquer approach using the graphical properties of Bayesian networks to split the computation of the marginal probability distribution into sub-calculations of lower dimensionality, reducing the overall computational complexity. Exploiting this property, we present an efficient and scalable algorithm for estimating the marginal probability distribution for categorical variables. The novel method is compared against state-of-the-art approximate inference methods in a benchmarking study, where it displays superior performance. As an immediate application, we demonstrate how the marginal probability distribution can be used to classify incomplete data against Bayesian networks and use this approach for identifying the cancer subtype of kidney cancer patient samples.
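As a toy illustration of the quantity being computed, the marginal of one variable in a small chain-structured network is just a sum of the joint over the hidden variables. The CPT numbers below are made up; exact enumeration like this is exponential in general, which is precisely what the paper's divide-and-conquer scheme avoids by splitting the sum into lower-dimensional pieces.

```python
from itertools import product

# A tiny illustrative chain network A -> B -> C with binary variables.
# Conditional probability tables (CPTs) with made-up numbers.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}

def joint(a, b, c):
    """Joint probability factorizes along the chain: P(a) P(b|a) P(c|b)."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

def marginal_c(c):
    """Marginal P(C=c) by summing the joint over the hidden variables A, B."""
    return sum(joint(a, b, c) for a, b in product((0, 1), repeat=2))
```

Here P(B=0) = 0.6*0.7 + 0.4*0.2 = 0.5, so P(C=1) = 0.5*0.1 + 0.5*0.6 = 0.35.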

【14】 iGraphMatch: an R Package for the Analysis of Graph Matching Link: https://arxiv.org/abs/2112.09212

Authors: Zihuan Qiao, Daniel Sussman Affiliations: Boston University Abstract: iGraphMatch is an R package for finding corresponding vertices between two graphs, also known as graph matching. The package implements three categories of prevalent graph matching algorithms including relaxation-based, percolation-based, and spectral-based, which are applicable to matching graphs under general settings: weighted directed graphs of different order and graphs of multiple layers. We provide versatile options to incorporate prior information in the form of seeds with or without noise and similarity scores. In addition, iGraphMatch provides functions to summarize the graph matching results in terms of several evaluation measures and visualize the matching performance. Finally, the package enables users to sample correlated random graph pairs from classic random graph models to generate data for simulations. This paper illustrates the practical applications of the package to the analysis of graph matching by detailed examples using real data from communication, neuron, and transportation networks.

【15】 Empirical Likelihood for the Analysis of Experimental Designs Link: https://arxiv.org/abs/2112.09206

Authors: Eunseop Kim, Steven MacEachern, Mario Peruggia Affiliations: Department of Statistics, The Ohio State University, Columbus, OH Abstract: Empirical likelihood enables a nonparametric, likelihood-driven style of inference without restrictive assumptions routinely made in parametric models. We develop a framework for applying empirical likelihood to the analysis of experimental designs, addressing issues that arise from blocking and multiple hypothesis testing. In addition to popular designs such as balanced incomplete block designs, our approach allows for highly unbalanced, incomplete block designs. Based on all these designs, we derive an asymptotic multivariate chi-square distribution for a set of empirical likelihood test statistics. Further, we propose two single-step multiple testing procedures: asymptotic Monte Carlo and nonparametric bootstrap. Both procedures asymptotically control the generalized family-wise error rate and efficiently construct simultaneous confidence intervals for comparisons of interest without explicitly considering the underlying covariance structure. A simulation study demonstrates that the performance of the procedures is robust to violations of standard assumptions of linear mixed models. Significantly, considering the asymptotic nature of empirical likelihood, the nonparametric bootstrap procedure performs well even for small sample sizes. We also present an application to experiments on a pesticide. Supplementary materials for this article are available online.
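A hedged sketch of the core computation for the simplest case, a univariate mean: the empirical log-likelihood ratio maximizes the product of observation weights subject to the mean constraint, which reduces to solving a one-dimensional equation for a Lagrange multiplier. The bisection scheme below is one simple way to do this, not the authors' implementation for experimental designs.

```python
import math

def el_log_ratio(data, mu, tol=1e-12):
    """-2 log empirical likelihood ratio for H0: E[X] = mu (univariate).
    The optimal weights are w_i = 1 / (n (1 + lam*(x_i - mu))), where lam
    solves sum_i (x_i - mu) / (1 + lam*(x_i - mu)) = 0. Requires mu to lie
    strictly inside the range of the data."""
    d = [x - mu for x in data]
    lo_d, hi_d = min(d), max(d)
    if not (lo_d < 0.0 < hi_d):
        raise ValueError("mu must lie strictly inside the data range")
    # feasible lam keeps every weight positive: 1 + lam*d_i > 0 for all i
    lo = -1.0 / hi_d + tol
    hi = -1.0 / lo_d - tol

    def g(lam):
        return sum(di / (1.0 + lam * di) for di in d)

    for _ in range(200):  # bisection: g is strictly decreasing in lam
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * di) for di in d)
```

At the sample mean the ratio is zero; it grows as the hypothesized mean moves away, which is what drives the asymptotic chi-square calibration.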

【16】 Sensitivity Analysis of the MCRF Model to Different Transiogram Joint Modeling Methods for Simulating Categorical Spatial Variables Link: https://arxiv.org/abs/2112.09178

Authors: Bo Zhang, Weidong Li, Chuanrong Zhang Affiliations: Department of Geography and Center for Environmental Sciences and Engineering, University of Connecticut, Storrs, CT, USA Note: 29 pages, 10 figures, 12 tables Abstract: Markov chain geostatistics is a methodology for simulating categorical fields. Its fundamental model for conditional simulation is the Markov chain random field (MCRF) model, and its basic spatial correlation measure is the transiogram. There are different ways to get transiogram models (i.e., continuous-lag transiograms) for MCRF simulation based on sample data and expert knowledge: the linear interpolation method, the mathematical model joint-fitting method, and a mixed method of the former two. Two case studies were conducted to show how simulated results, including optimal prediction maps and simulated realization maps, would respond to different sets of transiogram models generated by the three different transiogram joint modeling methods. Results show that the three transiogram joint modeling methods are applicable; the MCRF model is generally not very sensitive to the transiogram models produced by different methods, especially when sample data are sufficient to generate reliable experimental transiograms; and the differences between overall simulation accuracies based on different sets of transiogram models are not significant. However, some minor classes show obvious improvement in simulation accuracy when theoretical transiogram models (generated by mathematical model fitting with expert knowledge) are used for minor classes. In general, this study indicates that the methods for deriving transiogram models from experimental transiograms can perform well in conditional simulations of categorical soil variables when meaningful experimental transiograms can be estimated. Employing mathematical models for transiogram modeling of minor classes provides a way to incorporate expert knowledge and improve the simulation accuracy of minor classes.
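An experimental transiogram, the empirical ingredient that all three joint modeling methods start from, can be estimated from a categorical sequence by tabulating lagged transition frequencies. The 1-D estimator below is a simplified sketch of that idea (real use is on spatial sample data, often with irregular lags):

```python
def experimental_transiogram(seq, i, j, max_lag):
    """Experimental (auto- or cross-) transiogram from a 1-D categorical
    sequence: for each lag h, the estimated probability of observing class j
    at distance h given class i at the origin. Returns NaN for lags with no
    class-i origins."""
    out = []
    for h in range(1, max_lag + 1):
        pairs = [(a, b) for a, b in zip(seq, seq[h:]) if a == i]
        if pairs:
            out.append(sum(1 for _, b in pairs if b == j) / len(pairs))
        else:
            out.append(float("nan"))
    return out
```

For i == j this gives an auto-transiogram (starting near 1 at lag 0 and decaying); for i != j, a cross-transiogram.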

【17】 Game-theoretic Formulations of Sequential Nonparametric One- and Two-Sample Tests Link: https://arxiv.org/abs/2112.09162

Authors: Shubhanshu Shekhar, Aaditya Ramdas Affiliations: Department of Statistics and Data Science, Carnegie Mellon University; Department of Machine Learning, Carnegie Mellon University Note: 56 pages, 7 figures Abstract: We study the problem of designing consistent sequential one- and two-sample tests in a nonparametric setting. Guided by the principle of testing by betting, we reframe the task of constructing sequential tests into that of selecting payoff functions that maximize the wealth of a fictitious bettor, betting against the null in a repeated game. The resulting sequential test rejects the null when the bettor's wealth process exceeds an appropriate threshold. We propose a general strategy for selecting payoff functions as predictable estimates of the witness function associated with the variational representation of some statistical distance measures, such as integral probability metrics (IPMs) and $\varphi$-divergences. Overall, this approach ensures that (i) the wealth process is a non-negative martingale under the null, thus allowing tight control over the type-I error, and (ii) it grows to infinity almost surely under the alternative, thus implying consistency. We accomplish this by designing composite e-processes that remain bounded in expectation under the null, but grow to infinity under the alternative. We instantiate the general test for some common distance metrics to obtain sequential versions of the Kolmogorov-Smirnov (KS) test, the $\chi^2$-test and the kernel-MMD test, and empirically demonstrate their ability to adapt to the unknown hardness of the problem. The sequential testing framework constructed in this paper is versatile, and we end with a discussion on applying these ideas to two related problems: testing for higher-order stochastic dominance, and testing for symmetry.
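The testing-by-betting recipe can be sketched for the simplest case, a fair-coin null: the bettor's wealth is multiplied by 1 + lam_t*(2x_t - 1) with a predictable bet lam_t, so under H0 the wealth is a nonnegative martingale and, by Ville's inequality, rejecting when it crosses 1/alpha controls the type-I error. The particular betting strategy below (empirical bias of the past, clipped) is an illustrative choice, not the paper's witness-function construction.

```python
def betting_test(xs, alpha=0.05):
    """Sequential test of H0: P(X=1) = 1/2 by betting on a binary stream.
    The bet lam_t depends only on past observations (predictability), so the
    wealth process is a nonnegative martingale under the null; the test
    rejects the first time wealth reaches 1/alpha."""
    wealth = 1.0
    ones = n = 0
    for t, x in enumerate(xs, 1):
        # predictable bet: clipped empirical bias of past observations
        lam = 0.0 if n == 0 else max(-0.5, min(0.5, 2.0 * ones / n - 1.0))
        wealth *= 1.0 + lam * (2 * x - 1)
        if wealth >= 1.0 / alpha:
            return t, wealth, True  # reject H0
        ones += x
        n += 1
    return len(xs), wealth, False
```

On a heavily biased stream the wealth compounds and the test stops quickly; on a balanced stream the bets stay near zero and the wealth never reaches the threshold.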

【18】 On the existence of global minima and convergence analyses for gradient descent methods in the training of deep neural networks 标题:深度神经网络训练中梯度下降法的全局极小值存在性及收敛性分析 链接:https://arxiv.org/abs/2112.09684

作者:Arnulf Jentzen,Adrian Riekert 机构:Applied Mathematics: Institute for Analysis and Numerics, School of Data Science and Shenzhen Research Institute of Big Data 备注:93 pages, 2 figures. arXiv admin note: text overlap with arXiv:2112.07369, arXiv:2108.04620 摘要:在本文中,我们研究了具有任意多个隐层的全连接前馈深度ReLU神经网络,并证明了在以下假设下,随机初始化的GD优化方法在训练此类网络时风险的收敛性:所考虑的监督学习问题的输入数据的概率分布的未归一化概率密度函数是分段多项式;目标函数(描述输入数据和输出数据之间的关系)是分段多项式;并且所考虑的监督学习问题的风险函数至少存在一个正则全局极小值。此外,在只有一个隐藏层和一维输入的浅层人工神经网络的特殊情况下,我们还验证了最后这一假设:我们证明在训练此类浅层人工神经网络时,对于每个Lipschitz连续目标函数,风险景观中都存在全局极小值。最后,在训练具有ReLU激活的深层神经网络时,我们还研究了梯度流(GF)微分方程的解,并证明了每个非发散GF轨迹以多项式收敛速度收敛到临界点(在限制Fréchet次可微性的意义上)。我们的数学收敛性分析建立在实代数几何工具(如半代数函数和广义Kurdyka-Łojasiewicz不等式的概念)、泛函分析工具(如Arzelà-Ascoli定理)、非光滑分析工具(如限制Fréchet次梯度的概念)的基础上,并利用了如下事实:具有固定结构的浅层ReLU人工神经网络的实现函数集构成Petersen等人所揭示的连续函数集的一个闭子集。 摘要:In this article we study fully-connected feedforward deep ReLU ANNs with an arbitrarily large number of hidden layers and we prove convergence of the risk of the GD optimization method with random initializations in the training of such ANNs under the assumption that the unnormalized probability density function of the probability distribution of the input data of the considered supervised learning problem is piecewise polynomial, under the assumption that the target function (describing the relationship between input data and the output data) is piecewise polynomial, and under the assumption that the risk function of the considered supervised learning problem admits at least one regular global minimum. In addition, in the special situation of shallow ANNs with just one hidden layer and one-dimensional input we also verify this assumption by proving in the training of such shallow ANNs that for every Lipschitz continuous target function there exists a global minimum in the risk landscape.
Finally, in the training of deep ANNs with ReLU activation we also study solutions of gradient flow (GF) differential equations and we prove that every non-divergent GF trajectory converges with a polynomial rate of convergence to a critical point (in the sense of limiting Fréchet subdifferentiability). Our mathematical convergence analysis builds up on tools from real algebraic geometry such as the concept of semi-algebraic functions and generalized Kurdyka-Łojasiewicz inequalities, on tools from functional analysis such as the Arzelà-Ascoli theorem, on tools from nonsmooth analysis such as the concept of limiting Fréchet subgradients, as well as on the fact that the set of realization functions of shallow ReLU ANNs with fixed architecture forms a closed subset of the set of continuous functions revealed by Petersen et al.
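摘要中的梯度流方程与多项式收敛速度可示意性地写成如下标准形式(记号为本条假定:$\mathcal{L}$ 为风险函数,$\vartheta$ 为临界点,$C,\beta>0$ 为常数,并非论文原文的精确陈述):

```latex
\Theta'(t) = -\nabla \mathcal{L}\bigl(\Theta(t)\bigr), \qquad
\exists\, C, \beta > 0:\quad
\bigl\|\Theta(t) - \vartheta\bigr\| \le C\,(1+t)^{-\beta}
\quad \text{for all } t \ge 0,
```

其中 $\vartheta$ 是限制 Fréchet 次微分意义下的临界点。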

【19】 Local contrastive loss with pseudo-label based self-training for semi-supervised medical image segmentation 标题:基于伪标签自训练的局部对比损失半监督医学图像分割 链接:https://arxiv.org/abs/2112.09645

作者:Krishna Chaitanya,Ertunc Erdil,Neerav Karani,Ender Konukoglu 备注:13 pages, 4 figures, 7 tables. This article is under review at a Journal 摘要:基于监督深度学习的方法可以得到准确的医学图像分割结果。然而,这需要大量的标记数据集,获取它们是一项艰巨的任务,需要临床专业知识。基于半监督/自监督学习的方法通过利用未标记数据和有限的注释数据来解决这一限制。最近的自监督学习方法使用对比损失从未标记图像中学习良好的全局级表示,并在流行的自然图像数据集(如ImageNet)上实现高性能的分类任务。在像素级预测任务(如分割)中,还必须学习良好的局部级表示和全局表示,以获得更好的精度。然而,现有的基于局部对比损失的方法对于学习良好的局部表示的影响仍然有限,因为相似和不同的局部区域是基于随机增强和空间邻近性定义的;由于在半监督/自监督环境中缺乏大规模专家注释,因此不基于局部区域的语义标签。在本文中,我们提出了一种局部对比损失法,通过利用从未标记图像的伪标签和有限的注释图像中获得的语义标签信息来学习用于分割的良好像素级特征。特别是,我们定义了建议的损失,以鼓励对具有相同伪标签/标签的像素进行类似表示,同时与数据集中具有不同伪标签/标签的像素的表示不同。我们执行基于伪标签的自训练,并通过联合优化建议的标记集和未标记集上的对比损失和仅有限标记集上的分割损失来训练网络。我们在三个公共心脏和前列腺数据集上进行了评估,获得了较高的分割性能。 摘要:Supervised deep learning-based methods yield accurate results for medical image segmentation. However, they require large labeled datasets for this, and obtaining them is a laborious task that requires clinical expertise. Semi/self-supervised learning-based approaches address this limitation by exploiting unlabeled data along with limited annotated data. Recent self-supervised learning methods use contrastive loss to learn good global level representations from unlabeled images and achieve high performance in classification tasks on popular natural image datasets like ImageNet. In pixel-level prediction tasks such as segmentation, it is crucial to also learn good local level representations along with global representations to achieve better accuracy. However, the impact of the existing local contrastive loss-based methods remains limited for learning good local representations because similar and dissimilar local regions are defined based on random augmentations and spatial proximity; not based on the semantic label of local regions due to lack of large-scale expert annotations in the semi/self-supervised setting. 
In this paper, we propose a local contrastive loss to learn good pixel level features useful for segmentation by exploiting semantic label information obtained from pseudo-labels of unlabeled images alongside limited annotated images. In particular, we define the proposed loss to encourage similar representations for the pixels that have the same pseudo-label/label while being dissimilar to the representations of pixels with a different pseudo-label/label in the dataset. We perform pseudo-label based self-training and train the network by jointly optimizing the proposed contrastive loss on both labeled and unlabeled sets and the segmentation loss on only the limited labeled set. We evaluate our method on three public cardiac and prostate datasets and obtain high segmentation performance.
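按(伪)标签定义正负样本对的对比损失,其一般形式可以用 numpy 几行写出(仅为示意:温度参数 tau、特征维度与示例数据均为假设,并非作者的具体实现):

```python
import numpy as np

def local_contrastive_loss(feats, labels, tau=0.1):
    """Label-conditioned contrastive loss over pixel embeddings:
    pixels sharing a (pseudo-)label are pulled together, pixels
    with different labels are pushed apart."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = (f @ f.T) / tau                     # cosine similarities / temperature
    n, total, count = len(labels), 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue
        others = [j for j in range(n) if j != i]
        log_denom = np.logaddexp.reduce(sim[i, others])
        total += -np.mean(sim[i, pos] - log_denom)  # -log softmax over positives
        count += 1
    return total / count

feats = np.array([[1, 0], [1, 0.1], [0, 1], [0.1, 1.0]])  # two tight clusters
labels = np.array([0, 0, 1, 1])
print(local_contrastive_loss(feats, labels))                  # low: clusters match labels
print(local_contrastive_loss(feats, np.array([0, 1, 0, 1])))  # high: labels conflict
```

当嵌入聚类与(伪)标签一致时损失小,不一致时损失大,这正是该类损失推动网络学习与标签对齐的局部表示的机制。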

【20】 Joint machine learning analysis of muon spectroscopy data from different materials 标题:不同材料µ子谱数据的联合机器学习分析 链接:https://arxiv.org/abs/2112.09601

作者:T. Tula,G. Möller,J. Quintanilla,S. R. Giblin,A. D. Hillier,E. E. McCabe,S. Ramos,D. S. Barker,S. Gibson 机构:School of Physical Sciences, University of Kent, United Kingdom; School of Physics and Astronomy, Cardiff University, United Kingdom; Department of Physics, Durham University, United Kingdom; School of Physics and Astronomy 备注:4 pages, 1 figure, to be published in Journal of Physics: Conference Series, proceedings paper from SCES 2020 conference 摘要:机器学习(ML)方法已被证明是物理科学中非常成功的工具,尤其是在应用于实验数据分析时。人工智能特别擅长识别高维数据中的模式,在这方面其表现通常优于人类。在这里,我们应用了一个简单的ML工具,称为主成分分析(PCA),来研究µ子谱学(muon spectroscopy)的数据。本实验测得的量是一个不对称函数,它包含了样品平均本征磁场的信息。不对称函数的变化可能表明相变;然而,这些变化可能非常微妙,现有的数据分析方法需要了解材料的特定物理特性。PCA是一种无监督的ML工具,这意味着不需要对输入数据进行假设,但我们发现它仍然可以成功地应用于不对称曲线,并且可以恢复相变指示。该方法应用于一系列具有不同基础物理性质的磁性材料。 摘要:Machine learning (ML) methods have proved to be a very successful tool in physical sciences, especially when applied to experimental data analysis. Artificial intelligence is particularly good at recognizing patterns in high dimensional data, where it usually outperforms humans. Here we applied a simple ML tool called principal component analysis (PCA) to study data from muon spectroscopy. The measured quantity from this experiment is an asymmetry function, which holds the information about the average intrinsic magnetic field of the sample. A change in the asymmetry function might indicate a phase transition; however, these changes can be very subtle, and existing methods of analyzing the data require knowledge about the specific physics of the material. PCA is an unsupervised ML tool, which means that no assumption about the input data is required, yet we found that it still can be successfully applied to asymmetry curves, and the indications of phase transitions can be recovered. The method was applied to a range of magnetic materials with different underlying physics.
We discovered that performing PCA on all those materials simultaneously can have a positive effect on the clarity of phase transition indicators and can also improve the detection of the most important variations of asymmetry functions. For this joint PCA we introduce a simple way to track the contributions from different materials for a more meaningful analysis.
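PCA 应用于弛豫型曲线的流程可以用 numpy 勾勒如下(纯属示意:这里用指数衰减 exp(-λt) 代替真实 µSR 不对称函数,λ 充当随温度变化的弛豫率):

```python
import numpy as np

t = np.linspace(0, 10, 50)
rates = np.linspace(0.1, 1.0, 30)               # stand-in relaxation rates
curves = np.array([np.exp(-r * t) for r in rates])

X = curves - curves.mean(axis=0)                # center each time bin across curves
U, S, Vt = np.linalg.svd(X, full_matrices=False)
pc1 = X @ Vt[0]                                 # scores on the first principal component

var_ratio = S[0] ** 2 / np.sum(S ** 2)          # fraction of variance captured by PC1
corr = np.corrcoef(pc1, rates)[0, 1]            # PC1 tracks the physical parameter
print(var_ratio, corr)
```

摘要中"对所有材料同时做 PCA"对应于把多种材料的曲线按行拼入同一矩阵 X,再分别跟踪各材料样本在主成分上的得分贡献。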

【21】 Free-Riding for Future: Field Experimental Evidence of Strategic Substitutability in Climate Protest 标题:搭乘未来的便车:气候抗议中战略替代的现场实验证据 链接:https://arxiv.org/abs/2112.09478

作者:Johannes Jarke-Neuert,Grischa Perino,Henrike Schwickert 备注:21 pages, 6 pages appendix, 4 figures 摘要:我们检验了一个假设,即潜在气候抗议者的成年人群中的抗议参与决策是相互依存的。来自德国四大城市的受试者(n=1510)在抗议日期前两周招募。我们在一项在线调查中测量了参与(事后)和其他受试者参与(事前)的信念,使用随机信息干预诱导信念的外源性差异,并使用控制函数方法估计信念变化对参与概率的因果影响。研究发现,参与决策是一种战略替代:信念增加1个百分点,平均受试者参与概率降低0.67个百分点。 摘要:We test the hypothesis that protest participation decisions in an adult population of potential climate protesters are interdependent. Subjects (n=1,510) from the four largest German cities were recruited two weeks before protest date. We measured participation (ex post) and beliefs about the other subjects' participation (ex ante) in an online survey, used a randomized informational intervention to induce exogenous variance in beliefs, and estimated the causal effect of a change in belief on the probability of participation using a control function approach. Participation decisions are found to be strategic substitutes: a one percentage-point increase of belief causes a .67 percentage-point decrease in the probability of participation in the average subject.

【22】 Federated Learning with Heterogeneous Data: A Superquantile Optimization Approach 标题:异构数据联合学习:一种超分位数优化方法 链接:https://arxiv.org/abs/2112.09429

作者:Krishna Pillutla,Yassine Laguel,Jérôme Malick,Zaid Harchaoui 机构:University of Washington, Seattle, WA, USA, Univ. Grenoble Alpes, Grenoble, France, CNRS, Grenoble, France 备注:This is the longer version of a conference paper published in IEEE CISS 2021 摘要:我们提出了一个联邦学习框架,该框架旨在跨异构数据的各个客户端提供良好的预测性能。所提出的方法依赖于基于超分位数的学习目标,该目标捕获异构客户机上错误分布的尾部统计信息。我们提出了一种随机训练算法,该算法将差分隐私的客户端重新加权步骤与联邦平均步骤交织在一起。该算法具有有限时间收敛性保证,同时覆盖凸和非凸设置。在联邦学习的基准数据集上的实验结果表明,我们的方法在平均误差方面与经典方法具有竞争力,并且在误差的尾部统计方面优于经典方法。 摘要:We present a federated learning framework that is designed to robustly deliver good predictive performance across individual clients with heterogeneous data. The proposed approach hinges upon a superquantile-based learning objective that captures the tail statistics of the error distribution over heterogeneous clients. We present a stochastic training algorithm which interleaves differentially private client reweighting steps with federated averaging steps. The proposed algorithm is supported with finite time convergence guarantees that cover both convex and non-convex settings. Experimental results on benchmark datasets for federated learning demonstrate that our approach is competitive with classical ones in terms of average error and outperforms them in terms of tail statistics of the error.
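超分位数(即 CVaR)目标的离散版本只需几行代码(示意性实现:当 (1-θ)·n 非整数时,严格定义还需插值,这里简单取上取整后的最差客户端子集):

```python
import numpy as np

def superquantile(losses, theta=0.8):
    """Mean of the worst (1 - theta) fraction of per-client losses."""
    losses = np.sort(np.asarray(losses, dtype=float))
    k = int(np.ceil((1 - theta) * len(losses)))
    return losses[-k:].mean()

client_losses = [0.1, 0.2, 0.2, 0.3, 1.5, 2.0, 0.25, 0.15, 0.3, 0.4]
print(np.mean(client_losses))              # plain federated average, about 0.54
print(superquantile(client_losses, 0.8))   # mean of the worst 2 clients = 1.75
```

以该量为训练目标(而非普通平均),就把优化压力集中在误差分布的尾部客户端上,这正是摘要所述设计意图。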

【23】 Continual Learning for Monolingual End-to-End Automatic Speech Recognition 标题:基于连续学习的单语端到端自动语音识别 链接:https://arxiv.org/abs/2112.09427

作者:Steven Vander Eeckt,Hugo Van hamme 机构:KU Leuven, Department Electrical Engineering ESAT-PSI, Leuven, Belgium 备注:Submitted to ICASSP 2021. 5 pages, 1 figure 摘要:使自动语音识别(ASR)模型适应新的领域会导致原始领域的性能下降,这种现象称为灾难性遗忘(CF)。即使是单语ASR模型也无法扩展到新的口音、方言、主题等,而不会受到CF的影响,这使得它们无法在不存储所有过去数据的情况下不断增强。幸运的是,可以使用持续学习(CL)方法,其目的是在克服CF的同时实现持续适应。在本文中,我们为端到端ASR实现了大量的CL方法,并测试和比较了它们在四个新任务中扩展单语混合CTCTransformer模型的能力。我们发现,性能最佳的CL方法将微调模型(下限)和所有任务联合训练的模型(上限)之间的差距缩小了40%以上,同时只需要访问0.6%的原始数据。 摘要:Adapting Automatic Speech Recognition (ASR) models to new domains leads to a deterioration of performance on the original domain(s), a phenomenon called Catastrophic Forgetting (CF). Even monolingual ASR models cannot be extended to new accents, dialects, topics, etc. without suffering from CF, making them unable to be continually enhanced without storing all past data. Fortunately, Continual Learning (CL) methods, which aim to enable continual adaptation while overcoming CF, can be used. In this paper, we implement an extensive number of CL methods for End-to-End ASR and test and compare their ability to extend a monolingual Hybrid CTC-Transformer model across four new tasks. We find that the best performing CL method closes the gap between the fine-tuned model (lower bound) and the model trained jointly on all tasks (upper bound) by more than 40%, while requiring access to only 0.6% of the original data.

【24】 A random energy approach to deep learning 标题:深度学习的随机能量方法 链接:https://arxiv.org/abs/2112.09420

作者:Rongrong Xie,Matteo Marsili 机构:Key Laboratory of Quark and Lepton Physics (MOE) and Institute of Particle Physics, Central China Normal University (CCNU), Wuhan, China, Quantitative Life Sciences Section, The Abdus Salam International Centre for Theoretical Physics, Trieste, Italy 备注:16 pages, 4 figures 摘要:我们研究了一个由各层隐态能级分布参数化的深度信念网络的通用集合。我们表明,在随机能量方法中,只有当每一层在学习过程中调整到接近临界点时,统计依赖性才能从可见层传播到深层。因此,经过有效训练的学习机器的特点是能量水平分布广泛。对不同数据集的深层信念网络和受限玻尔兹曼机器的分析证实了这些结论。 摘要:We study a generic ensemble of deep belief networks which is parametrized by the distribution of energy levels of the hidden states of each layer. We show that, within a random energy approach, statistical dependence can propagate from the visible to deep layers only if each layer is tuned close to the critical point during learning. As a consequence, efficiently trained learning machines are characterised by a broad distribution of energy levels. The analysis of Deep Belief Networks and Restricted Boltzmann Machines on different datasets confirms these conclusions.

【25】 Forward-backward algorithms with a biallelic mutation-drift model: Orthogonal polynomials, and a coalescent/urn-model based approach 标题:具有双等位突变-漂移模型的前向-后向算法:正交多项式和基于合并/URN模型的方法 链接:https://arxiv.org/abs/2112.09394

作者:Claus Vogl,Sandra Peer,Lynette Caitlin Mikula 机构:Department of Biomedical Sciences, Vetmeduni Vienna, Veterin¨arplatz , A-, Wien, Vienna Graduate School of Population Genetics, A-, Wien, Austria, Centre for Biological Diversity, School of Biology, University of St. Andrews, St Andrews, KY,TH, UK 摘要:使用反向算法推断样本等位基因配置的边际可能性,得到与Kingman结合模型、Moran模型和扩散模型相同的结果(直至时间尺度)。为了推断过去任何给定点的祖先群体等位基因频率的概率,无论是聚合中的离散祖先等位基因配置,还是反向扩散中的祖先等位基因比例,反向方法都需要与相应的正向方法相结合。这是通过所谓的前向-后向算法实现的。在本文中,我们将正交多项式用于前后向算法。它们能够有效地计算现存样本的过去等位基因配置,以及平衡和非平衡状态下祖先群体等位基因频率的概率。我们证明,样本的谱系完全由其等位基因配置的边际似然的向后多项式展开来描述。 摘要:Inference of the marginal likelihood of sample allele configurations using backward algorithms yields identical results with the Kingman coalescent, the Moran model, and the diffusion model (up to a scaling of time). For inference of probabilities of ancestral population allele frequencies at any given point in the past - either of discrete ancestral allele configurations as in the coalescent, or of ancestral allele proportions as in the backward diffusion - backward approaches need to be combined with corresponding forward ones. This is done in so-called forward-backward algorithms. In this article, we utilize orthogonal polynomials in forward-backward algorithms. They enable efficient calculation of past allele configurations of an extant sample and probabilities of ancestral population allele frequencies in equilibrium and in non-equilibrium. We show that the genealogy of a sample is fully described by the backward polynomial expansion of the marginal likelihood of its allele configuration.
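摘要中的"前向-后向"组合,在一般隐状态模型中的标准形式如下(示意性记号:$Z_t$ 为过去时刻的祖先等位基因构型/频率,$x_{1:T}$ 为观测样本;本文的具体做法是用正交多项式展开来高效计算这些量):

```latex
\alpha_t(z) = p\bigl(x_{1:t},\, Z_t = z\bigr), \qquad
\beta_t(z)  = p\bigl(x_{t+1:T} \mid Z_t = z\bigr),
\qquad
p\bigl(Z_t = z \mid x_{1:T}\bigr)
  = \frac{\alpha_t(z)\,\beta_t(z)}{\sum_{z'} \alpha_{t}(z')\,\beta_t(z')}.
```

仅用后向量 $\beta$ 即可得到样本构型的边际似然,而要得到任意过去时刻的祖先频率分布,则必须如上与前向量 $\alpha$ 结合。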

【26】 Improving evidential deep learning via multi-task learning 标题:利用多任务学习改进证据深度学习 链接:https://arxiv.org/abs/2112.09368

作者:Dongpin Oh,Bonggun Shin 机构: Deargen Inc., Seoul, South Korea, Deargen USA Inc., Atlanta, GA 备注:Accepted by AAAI-2022 摘要:证据回归网络(ENet)估计连续目标及其预测不确定性,无需昂贵的贝叶斯模型平均。然而,由于ENet原始损失函数的梯度收缩问题,即负对数边际似然(NLL)损失,目标预测可能不准确。在本文中,目标是通过解决梯度收缩问题来提高ENet的预测精度,同时保持其有效的不确定性估计。为了实现这一目标,提出了一个多任务学习(MTL)框架,称为MT-ENet。在MTL中,我们将Lipschitz修正均方误差(MSE)损失函数定义为另一损失,并将其添加到现有NLL损失中。Lipschitz修正的MSE损耗通过动态调整其Lipschitz常数来缓解与NLL损耗的梯度冲突。这样,Lipschitz MSE损失不会干扰NLL损失的不确定性估计。MT-ENet在不损失合成数据集和真实基准(包括药物靶点亲和力(DTA)回归)的不确定性估计能力的情况下,提高了ENet的预测准确性。此外,MT-ENet在DTA基准上显示出显著的校准和分布外检测能力。 摘要:The Evidential regression network (ENet) estimates a continuous target and its predictive uncertainty without costly Bayesian model averaging. However, it is possible that the target is inaccurately predicted due to the gradient shrinkage problem of the original loss function of the ENet, the negative log marginal likelihood (NLL) loss. In this paper, the objective is to improve the prediction accuracy of the ENet while maintaining its efficient uncertainty estimation by resolving the gradient shrinkage problem. A multi-task learning (MTL) framework, referred to as MT-ENet, is proposed to accomplish this aim. In the MTL, we define the Lipschitz modified mean squared error (MSE) loss function as another loss and add it to the existing NLL loss. The Lipschitz modified MSE loss is designed to mitigate the gradient conflict with the NLL loss by dynamically adjusting its Lipschitz constant. By doing so, the Lipschitz MSE loss does not disturb the uncertainty estimation of the NLL loss. The MT-ENet enhances the predictive accuracy of the ENet without losing uncertainty estimation capability on the synthetic dataset and real-world benchmarks, including drug-target affinity (DTA) regression. Furthermore, the MT-ENet shows remarkable calibration and out-of-distribution detection capability on the DTA benchmarks.

【27】 MUSE: Marginal Unbiased Score Expansion and Application to CMB Lensing 标题:MUSE:边缘无偏分数扩展及其在CMB镜头中的应用 链接:https://arxiv.org/abs/2112.09354

作者:Marius Millea,Uros Seljak 机构:Department of Physics, University of California, Berkeley, CA, USA; Department of Physics, University of California, Davis, CA, USA; Lawrence Berkeley National Laboratory, Berkeley, CA, USA 备注:22 pages, 8 figures 摘要:我们提出了边际无偏分数扩展(MUSE)方法,这是一种通用的高维层次贝叶斯推理算法。MUSE在任意非高斯潜在参数空间上执行近似边缘化,对感兴趣的全局参数产生高斯化的渐近无偏和近似最优约束。它在计算上比哈密顿蒙特卡罗(HMC)等精确替代方法便宜得多,在挑战HMC的漏斗问题上表现出色,而且与变分推理或许多基于模拟的推理方法等其他近似方法不同,它不需要任何特定于问题的用户监督。MUSE使首次对去透镜化的宇宙微波背景(CMB)功率谱和引力透镜势功率谱进行联合贝叶斯估计成为可能,并在模拟数据集上进行了演示;该模拟数据集与即将到来的南极望远镜3G 1500平方度巡天一样大,对应于约600万的潜在维数和约100个全局带功率参数。在问题的一个子集上,精确但更昂贵的HMC解是可行的,我们验证了MUSE产生了接近最优的结果。 摘要:We present the marginal unbiased score expansion (MUSE) method, an algorithm for generic high-dimensional hierarchical Bayesian inference. MUSE performs approximate marginalization over arbitrary non-Gaussian latent parameter spaces, yielding Gaussianized asymptotically unbiased and near-optimal constraints on global parameters of interest. It is computationally much cheaper than exact alternatives like Hamiltonian Monte Carlo (HMC), excelling on funnel problems which challenge HMC, and does not require any problem-specific user supervision like other approximate methods such as Variational Inference or many Simulation-Based Inference methods. MUSE makes possible the first joint Bayesian estimation of the delensed Cosmic Microwave Background (CMB) power spectrum and gravitational lensing potential power spectrum, demonstrated here on a simulated data set as large as the upcoming South Pole Telescope 3G 1500 deg$^2$ survey, corresponding to a latent dimensionality of ${\sim}\,6$ million and of order 100 global bandpower parameters. On a subset of the problem where an exact but more expensive HMC solution is feasible, we verify that MUSE yields nearly optimal results.
We also demonstrate that existing spectrum-based forecasting tools which ignore pixel-masking underestimate predicted error bars by only ${\sim}\,10\%$. This method is a promising path forward for fast lensing and delensing analyses which will be necessary for future CMB experiments such as SPT-3G, Simons Observatory, or CMB-S4, and can complement or supersede existing HMC approaches. The success of MUSE on this challenging problem strengthens its case as a generic procedure for a broad class of high-dimensional inference problems.

【28】 Gaussian RBF Centered Kernel Alignment (CKA) in the Large Bandwidth Limit 标题:大带宽限制下的高斯径向基核对齐(CKA) 链接:https://arxiv.org/abs/2112.09305

作者:Sergio A. Alvarez 机构:Department of Computer Science, Boston College, Chestnut Hill, MA , USA 备注:11 pages, 3 figures 摘要:我们证明了基于高斯径向基函数核的中心核对齐(CKA)在大带宽限制下收敛到线性CKA。我们证明了收敛起始点对特征表示的几何结构敏感,并且表示偏心率限制了高斯CKA非线性行为的带宽范围。 摘要:We prove that Centered Kernel Alignment (CKA) based on a Gaussian RBF kernel converges to linear CKA in the large-bandwidth limit. We show that convergence onset is sensitive to the geometry of the feature representations, and that representation eccentricity bounds the range of bandwidths for which Gaussian CKA behaves nonlinearly.
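该收敛结论可以直接数值验证(示意性实现;CKA 与 HSIC 取常用的中心化 Gram 矩阵定义,数据为随机生成的两个特征表示):

```python
import numpy as np

def center(K):
    """Double-center a Gram matrix: H K H with H = I - 11^T/n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def cka(K, L):
    Kc, Lc = center(K), center(L)
    return np.sum(Kc * Lc) / (np.linalg.norm(Kc) * np.linalg.norm(Lc))

def rbf_gram(X, sigma):
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2 * X @ X.T   # squared pairwise distances
    return np.exp(-D / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))   # two representations of the same 20 items
Y = rng.normal(size=(20, 3))

lin = cka(X @ X.T, Y @ Y.T)    # linear CKA
for sigma in (1.0, 10.0, 1000.0):
    print(sigma, cka(rbf_gram(X, sigma), rbf_gram(Y, sigma)))
# as sigma grows, Gaussian CKA approaches the linear value `lin`
```

直观原因:当 $\sigma$ 很大时 $\exp(-D/2\sigma^2)\approx 1 - D/2\sigma^2$,双中心化消去常数项并把距离矩阵化为线性 Gram 矩阵的倍数,而 CKA 对尺度不变。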

【29】 A Robust Optimization Approach to Deep Learning 标题:一种用于深度学习的鲁棒优化方法 链接:https://arxiv.org/abs/2112.09279

作者:Dimitris Bertsimas,Xavier Boix,Kimberly Villalobos Carballo,Dick den Hertog 机构:Sloan School of Management and Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA, USA, Department of Brain and Cognitive Sciences, Amsterdam Business School, University of Amsterdam 摘要:许多最先进的对抗性训练方法利用对抗性损失的上界来提供安全保障。然而,这些方法需要在每个训练步骤进行计算,而这些计算不能包含在反向传播的梯度中。我们介绍了一种新的、更具原则性的对抗性训练方法,该方法基于对抗性损失上界的封闭形式解,可以通过反向传播进行有效训练。稳健优化的最新工具促成了这一上界。我们用我们的方法导出了两种新方法。第一种方法(近似鲁棒上界或aRUB)使用网络的一阶近似以及线性鲁棒优化的基本工具,以获得易于实现的对抗性损失的近似上界。第二种方法(鲁棒上界或RUB)计算对抗损失的精确上界。通过各种表格和视觉数据集,我们证明了我们更具原则性的方法的有效性:对于较大的扰动,RUB比最先进的方法更加稳健,而对于较小的扰动,aRUB的性能与最先进的方法相当。此外,RUB和aRUB的速度都比标准对抗训练快(以增加内存为代价)。所有重现结果的代码都可以在 https://github.com/kimvc7/Robustness 找到。 摘要:Many state-of-the-art adversarial training methods leverage upper bounds of the adversarial loss to provide security guarantees. Yet, these methods require computations at each training step that cannot be incorporated in the gradient for backpropagation. We introduce a new, more principled approach to adversarial training based on a closed form solution of an upper bound of the adversarial loss, which can be effectively trained with backpropagation. This bound is facilitated by state-of-the-art tools from robust optimization. We derive two new methods with our approach. The first method (Approximated Robust Upper Bound or aRUB) uses the first order approximation of the network as well as basic tools from linear robust optimization to obtain an approximate upper bound of the adversarial loss that can be easily implemented. The second method (Robust Upper Bound or RUB) computes an exact upper bound of the adversarial loss. Across a variety of tabular and vision data sets we demonstrate the effectiveness of our more principled approach -- RUB is substantially more robust than state-of-the-art methods for larger perturbations, while aRUB matches the performance of state-of-the-art methods for small perturbations.
Also, both RUB and aRUB run faster than standard adversarial training (at the expense of an increase in memory). All the code to reproduce the results can be found at https://github.com/kimvc7/Robustness.
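"对抗损失的闭式上界"这一思想在线性模型上最容易看清:对 ℓ∞ 扰动球,线性打分的最坏情形有精确闭式解(示意性例子,取 hinge 损失;这并非论文中针对神经网络的 aRUB/RUB 构造本身):

```python
import numpy as np
from itertools import product

def hinge(margin):
    return np.maximum(0.0, 1.0 - margin)

def worst_case_hinge(w, x, y, eps):
    """Exact worst-case hinge loss over {delta : ||delta||_inf <= eps}:
    the adversary lowers the margin by exactly eps * ||w||_1."""
    return hinge(y * (w @ x) - eps * np.linalg.norm(w, 1))

w = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, -0.4, 1.0])
y, eps = 1.0, 0.2

# brute force over the corners of the ball: the max of a convex
# function over a box is attained at a vertex
corners = (np.array(d) for d in product((-eps, eps), repeat=3))
brute = max(hinge(y * (w @ (x + d))) for d in corners)
print(worst_case_hinge(w, x, y, eps), brute)  # equal up to rounding
```

这个闭式最坏情形可以直接放进训练损失并反向传播,这正是摘要中"可通过反向传播有效训练"的含义在最简单情形下的体现。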

【30】 Analysis of Generalized Bregman Surrogate Algorithms for Nonsmooth Nonconvex Statistical Learning 标题:非光滑非凸统计学习的广义Bregman代理算法分析 链接:https://arxiv.org/abs/2112.09191

作者:Yiyuan She,Zhifeng Wang,Jiuwu Jin 机构:Department of Statistics, Florida State University 备注:None 摘要:现代统计应用通常涉及最小化可能是非光滑和/或非凸的目标函数。本文重点介绍了一个广泛的Bregman代理算法框架,包括局部线性逼近、镜像下降、迭代阈值、DC规划和许多其他特殊实例。通过广义Bregman函数的再特征化,我们可以构造适当的误差测度,并在可能的高维中建立非凸和非光滑目标的全局收敛速度。对于具有复合目标的稀疏学习问题,在某些正则条件下,所得到的估计量作为代理不动点,虽然不一定是局部极小值,但具有可证明的统计保证,迭代序列可以在几何上快速接近期望精度内的统计真值。本文还研究了如何在不假设凸性或光滑性的情况下,通过仔细控制步长和松弛参数来设计基于动量的自适应加速度。 摘要:Modern statistical applications often involve minimizing an objective function that may be nonsmooth and/or nonconvex. This paper focuses on a broad Bregman-surrogate algorithm framework including the local linear approximation, mirror descent, iterative thresholding, DC programming and many others as particular instances. The recharacterization via generalized Bregman functions enables us to construct suitable error measures and establish global convergence rates for nonconvex and nonsmooth objectives in possibly high dimensions. For sparse learning problems with a composite objective, under some regularity conditions, the obtained estimators as the surrogate's fixed points, though not necessarily local minimizers, enjoy provable statistical guarantees, and the sequence of iterates can be shown to approach the statistical truth within the desired accuracy geometrically fast. The paper also studies how to design adaptive momentum based accelerations without assuming convexity or smoothness by carefully controlling stepsize and relaxation parameters.
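迭代阈值法是该框架点名的特例之一;对 lasso 的极简 ISTA 实现如下(示意性草图:步长取 1/L,L 为光滑项梯度的 Lipschitz 常数,数据为随机生成):

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, b, lam, steps=500):
    """Iterative soft-thresholding for min_x 0.5||Ax - b||^2 + lam ||x||_1.
    Each iteration exactly minimizes a quadratic surrogate of the objective."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        x = soft_threshold(x - A.T @ (A @ x - b) / L, lam / L)
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 10))
x_true = np.zeros(10)
x_true[:3] = [3.0, -2.0, 1.5]
b = A @ x_true                               # noiseless, for illustration
x_hat = ista(A, b, lam=0.1)
print(np.round(x_hat, 2))                    # sparse estimate close to x_true
```

在 Bregman 代理的视角下,每步的二次上界就是一个(广义)Bregman 代理函数,ISTA 的不动点即摘要中所说"作为代理不动点的估计量"。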

【31】 Monitoring crop phenology with street-level imagery using computer vision 标题:利用计算机视觉利用街道图像监测作物物候 链接:https://arxiv.org/abs/2112.09190

作者:Raphaël d'Andrimont,Momchil Yordanov,Laura Martinez-Sanchez,Marijn van der Velde 机构:European Commission, Joint Research Centre (JRC), Ispra , Italy, ∗ These authors contributed equally to this work. 备注:18 pages 摘要:街道一级图像具有扩大现场数据收集的巨大潜力。这是通过结合使用廉价的高质量摄像机和深度学习计算机解决方案的最新进展来获得相关主题信息而实现的。我们提出了一个利用计算机视觉从街道图像中收集和提取作物类型和物候信息的框架。2018年生长季期间,在荷兰弗莱沃兰省,用侧视动作摄像机拍摄高清晰度照片。从3月到10月,每月对一条200公里的固定路线进行调查,每秒收集一张图片,总共产生40万张地理标记图片。在220个特定地块位置,记录了17种作物类型的现场作物物候观测结果。此外,时间跨度包括特定的出苗前阶段,如春、夏作物不同种植的裸土,以及收获后的种植做法,如绿色施肥和捕获作物。基于卷积神经网络(MobileNet)的转移学习,使用TensorFlow和著名的图像识别模型进行分类。开发了一种超调谐方法,以获得160个模型中性能最好的模型。该最佳模型应用于区分作物类型的独立推理集,宏观F1得分为88.1%,地块水平的主要物候期为86.9%。讨论了该方法的潜力和注意事项以及实施和改进的实际考虑。提出的框架加快了高质量的现场数据收集,并提出了通过计算机视觉自动分类收集海量数据的途径。 摘要:Street-level imagery holds a significant potential to scale-up in-situ data collection. This is enabled by combining the use of cheap high quality cameras with recent advances in deep learning compute solutions to derive relevant thematic information. We present a framework to collect and extract crop type and phenological information from street level imagery using computer vision. During the 2018 growing season, high definition pictures were captured with side-looking action cameras in the Flevoland province of the Netherlands. Each month from March to October, a fixed 200-km route was surveyed collecting one picture per second resulting in a total of 400,000 geo-tagged pictures. At 220 specific parcel locations detailed on the spot crop phenology observations were recorded for 17 crop types. Furthermore, the time span included specific pre-emergence parcel stages, such as differently cultivated bare soil for spring and summer crops as well as post-harvest cultivation practices, e.g. green manuring and catch crops. Classification was done using TensorFlow with a well-known image recognition model, based on transfer learning with convolutional neural networks (MobileNet). A hypertuning methodology was developed to obtain the best performing model among 160 models. 
This best model was applied on an independent inference set discriminating crop type with a Macro F1 score of 88.1% and main phenological stage at 86.9% at the parcel level. Potential and caveats of the approach along with practical considerations for implementation and improvement are discussed. The proposed framework speeds up high quality in-situ data collection and suggests avenues for massive data collection via automated classification using computer vision.
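摘要中报告的 Macro F1 是各类别 F1 的非加权平均,稀有作物类别与常见类别同权;其定义可以自行实现核对(示意性代码,示例标签为虚构):

```python
import numpy as np

def macro_f1(y_true, y_pred, classes):
    """Macro F1: unweighted mean of per-class F1 scores."""
    f1s = []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
print(macro_f1(y_true, y_pred, classes=[0, 1, 2]))  # ≈ 0.656
```

对类别极不均衡的作物数据,Macro F1 比总体准确率更能反映模型在少数类上的表现。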

【32】 Reinforcing RCTs with Multiple Priors while Learning about External Validity 标题:在学习外部效度的同时加强多先验随机对照试验 链接:https://arxiv.org/abs/2112.09170

作者:Frederico Finan,Demian Pouzo 机构:UC Berkeley 摘要:本文提出了一个如何将先验信息源纳入序贯实验设计的框架。这些信息可以来自许多来源,包括以前的实验、专家意见或实验者自己的自省。我们使用多先验贝叶斯方法将该问题形式化,该方法将每个源映射到贝叶斯模型。这些模型根据其相关的后验概率进行聚合。我们根据三个标准评估我们的框架:实验者是否学习了回报分布的参数,实验者在决定停止实验时选择错误处理的概率,以及平均回报。我们展示了我们的框架展示了几个很好的有限样本属性,包括对任何外部无效源的鲁棒性。 摘要:This paper presents a framework for how to incorporate prior sources of information into the design of a sequential experiment. This information can come from many sources, including previous experiments, expert opinions, or the experimenter's own introspection. We formalize this problem using a multi-prior Bayesian approach that maps each source to a Bayesian model. These models are aggregated according to their associated posterior probabilities. We evaluate our framework according to three criteria: whether the experimenter learns the parameters of the payoff distributions, the probability that the experimenter chooses the wrong treatment when deciding to stop the experiment, and the average rewards. We show that our framework exhibits several nice finite sample properties, including robustness to any source that is not externally valid.
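"把每个信息来源映射为一个贝叶斯模型,再按后验概率聚合"的机制,可以用 Beta-Bernoulli 共轭模型几行代码演示(纯属示意:三个先验来源及其参数均为假设的例子;权重正比于各模型对同一数据的边际似然):

```python
from math import lgamma, exp

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(k, n, a, b):
    """Log marginal likelihood of k successes in n Bernoulli trials under a
    Beta(a, b) prior, up to the binomial coefficient (common to all models)."""
    return log_beta(a + k, b + n - k) - log_beta(a, b)

# three hypothetical sources of prior information: a past experiment,
# an expert opinion, and a flat "know-nothing" prior
priors = {"past_trial": (8, 2), "expert": (2, 8), "flat": (1, 1)}
k, n = 14, 20  # observed successes in the running experiment

logml = {s: log_marginal(k, n, a, b) for s, (a, b) in priors.items()}
m = max(logml.values())
w = {s: exp(v - m) for s, v in logml.items()}  # subtract max for stability
Z = sum(w.values())
post = {s: v / Z for s, v in w.items()}
print(post)  # the past trial, centered near the data, gets the largest weight
```

与数据一致的来源自动获得更大权重,与数据冲突的来源被压低,这正是摘要中多先验聚合对"外部无效来源"保持稳健的机制。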

【33】 Real-time Detection of Anomalies in Multivariate Time Series of Astronomical Data 标题:天文数据多变量时间序列异常的实时检测 链接:https://arxiv.org/abs/2112.08415

作者:Daniel Muthukrishna,Kaisey S. Mandel,Michelle Lochner,Sara Webb,Gautham Narayan 机构:Massachusetts Institute of Technology, Cambridge, MA, USA, University of Cambridge, Cambridge, United Kingdom, Department of Physics and Astronomy, University of the Western Cape∗, and South African Radio Astronomy Observatory (SARAO)† 备注:9 pages, 5 figures, Accepted at the NeurIPS 2021 workshop on Machine Learning and the Physical Sciences 摘要:天文瞬变是指在不同的时间尺度上暂时变亮的恒星物体,它导致了宇宙学和天文学中一些最重要的发现。其中一些瞬变是被称为超新星的恒星爆炸性死亡,而另一些是罕见的、奇异的或全新的令人兴奋的恒星爆炸。新的天文天象观测正在观测数量空前的多波长瞬变,使得视觉识别新的有趣瞬变的标准方法变得不可行。为了满足这一需求,我们提出了两种新的方法,旨在快速、自动地实时检测异常瞬态光曲线。这两种方法都基于一个简单的想法,即如果已知瞬变总体的光照曲线可以精确建模,那么与模型预测的任何偏差都可能是异常。第一种方法是使用时间卷积网络(TCN)构建的概率神经网络,第二种方法是瞬态的可解释贝叶斯参数模型。我们表明,与我们的参数模型相比,神经网络的灵活性(使其成为许多回归任务的强大工具的属性)使其不适合异常检测。 摘要:Astronomical transients are stellar objects that become temporarily brighter on various timescales and have led to some of the most significant discoveries in cosmology and astronomy. Some of these transients are the explosive deaths of stars known as supernovae while others are rare, exotic, or entirely new kinds of exciting stellar explosions. New astronomical sky surveys are observing unprecedented numbers of multi-wavelength transients, making standard approaches of visually identifying new and interesting transients infeasible. To meet this demand, we present two novel methods that aim to quickly and automatically detect anomalous transient light curves in real-time. Both methods are based on the simple idea that if the light curves from a known population of transients can be accurately modelled, any deviations from model predictions are likely anomalies. The first approach is a probabilistic neural network built using Temporal Convolutional Networks (TCNs) and the second is an interpretable Bayesian parametric model of a transient. 
We show that the flexibility of neural networks, the attribute that makes them such a powerful tool for many regression tasks, is what makes them less suitable for anomaly detection when compared with our parametric model.
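"与模型预测的偏差即异常"这一核心思想的最小化示例如下(纯属示意:用一条已知的高斯型模板代替拟合好的光变曲线模型,并人为注入一个异常耀发):

```python
import numpy as np

t = np.linspace(0, 10, 100)
model = np.exp(-0.5 * (t - 3.0) ** 2)        # stand-in for a fitted light-curve model
rng = np.random.default_rng(2)
obs = model + 0.02 * rng.normal(size=t.size) # observations = model + small noise
obs[70] += 0.5                               # inject an anomalous flare

resid = obs - model                          # deviations from model predictions
z = (resid - resid.mean()) / resid.std()
anomalies = np.flatnonzero(np.abs(z) > 4.0)  # flag large standardized deviations
print(anomalies)  # → [70]
```

论文比较的两种方法的差别就在于"model"从何而来:TCN 概率神经网络给出的预测,还是可解释的贝叶斯参数模型;检测逻辑本身都是对残差的显著性判断。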

机器翻译,仅供参考
