Statistics Academic Digest [6.22]

2021-07-02 17:46:15

Visit www.arxivdaily.com for digests with abstracts covering CS, Physics, Mathematics, Economics, Statistics, Finance, Biology, and Electrical Engineering, with search, bookmarking, posting, and more!

stat (Statistics): 77 papers in total

【1】 Nested Variational Inference

Authors: Heiko Zimmermann, Hao Wu, Babak Esmaeili, Jan-Willem van de Meent
Affiliations: Khoury College of Computer Sciences, Northeastern University
Link: https://arxiv.org/abs/2106.11302
Abstract: We develop nested variational inference (NVI), a family of methods that learn proposals for nested importance samplers by minimizing a forward or reverse KL divergence at each level of nesting. NVI is applicable to many commonly used importance sampling strategies and provides a mechanism for learning intermediate densities, which can serve as heuristics to guide the sampler. Our experiments apply NVI to (a) sample from a multimodal distribution using a learned annealing path, (b) learn heuristics that approximate the likelihood of future observations in a hidden Markov model, and (c) perform amortized inference in hierarchical deep generative models. We observe that optimizing nested objectives leads to improved sample quality in terms of log average weight and effective sample size.

【2】 Circuits for robust designs

Authors: Roberto Fontana, Fabio Rapallo, Henry P. Wynn
Affiliations: Department of Mathematical Sciences, Politecnico di Torino, Italy; Department of Economics, University of Genova, Italy; London School of Economics, UK
Comments: 21 pages; 6 figures
Link: https://arxiv.org/abs/2106.11213
Abstract: This paper continues the application of circuit theory to experimental design started by the first two authors. The theory gives a very special and detailed representation of the kernel of the design model matrix. This representation turns out to be an appropriate way to study the optimality criterion referred to as robustness: the sensitivity of the design to the removal of design points. Many examples are given, from classical combinatorial designs to two-level factorial designs including interactions. The complexity of the circuit representations is useful because of the large range of options they offer, but conversely requires the use of dedicated software. Suggestions for speed improvement are made.

【3】 Stratified Learning: a general-purpose statistical method for improved learning under Covariate Shift

Authors: Maximilian Autenrieth, David A. van Dyk, Roberto Trotta, David C. Stenning
Affiliations: Imperial College London; SISSA (Trieste); Simon Fraser University
Link: https://arxiv.org/abs/2106.11211
Abstract: Covariate shift arises when the labelled training (source) data is not representative of the unlabelled (target) data due to systematic differences in the covariate distributions. A supervised model trained on the source data subject to covariate shift may suffer from poor generalization on the target data. We propose a novel, statistically principled and theoretically justified method to improve learning under covariate shift conditions, based on propensity score stratification, a well-established methodology in causal inference. We show that the effects of covariate shift can be reduced or altogether eliminated by conditioning on propensity scores. In practice, this is achieved by fitting learners on subgroups ("strata") constructed by partitioning the data based on the estimated propensity scores, leading to balanced covariates and much-improved target prediction. We demonstrate the effectiveness of our general-purpose method on contemporary research questions in observational cosmology, and on additional benchmark examples, matching or outperforming state-of-the-art importance weighting methods, widely studied in the covariate shift literature. We obtain the best reported AUC (0.958) on the updated "Supernovae photometric classification challenge" and improve upon existing conditional density estimation of galaxy redshift from Sloan Digital Sky Survey (SDSS) data.
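
The recipe in this abstract is concrete enough to sketch. Below is a minimal, hypothetical Python illustration (not the authors' code) of propensity-score stratification under covariate shift: a membership model separates source from target, its score quantiles define strata, and a learner is fit within each stratum, where covariates are approximately balanced. The function name and choice of base learners are ours.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression

def stratified_learning(X_src, y_src, X_tgt, n_strata=5):
    """Hypothetical sketch: propensity-score stratification for covariate shift."""
    # 1. Estimate the propensity of belonging to the target sample.
    X_all = np.vstack([X_src, X_tgt])
    is_tgt = np.r_[np.zeros(len(X_src)), np.ones(len(X_tgt))]
    ps_model = LogisticRegression(max_iter=1000).fit(X_all, is_tgt)
    ps_src = ps_model.predict_proba(X_src)[:, 1]
    ps_tgt = ps_model.predict_proba(X_tgt)[:, 1]
    # 2. Partition by propensity-score quantiles; within a stratum the
    #    source and target covariate distributions are roughly balanced.
    edges = np.quantile(np.r_[ps_src, ps_tgt], np.linspace(0, 1, n_strata + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    preds = np.full(len(X_tgt), np.nan)
    for k in range(n_strata):
        in_src = (ps_src > edges[k]) & (ps_src <= edges[k + 1])
        in_tgt = (ps_tgt > edges[k]) & (ps_tgt <= edges[k + 1])
        if in_src.sum() > 0 and in_tgt.sum() > 0:  # assumes populated strata
            learner = RandomForestRegressor().fit(X_src[in_src], y_src[in_src])
            preds[in_tgt] = learner.predict(X_tgt[in_tgt])
    return preds
```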

【4】 Facilitating Team-Based Data Science: Lessons Learned from the DSC-WAV Project

Authors: Chelsey Legacy, Andrew Zieffler, Benjamin S. Baumer, Valerie Barr, Nicholas J. Horton
Affiliations: Department of Educational Psychology, University of Minnesota, Minneapolis, MN, USA; Statistical & Data Sciences, Smith College, Northampton, MA, USA; Department of Computer Science, Mount Holyoke College, South Hadley, MA, USA; Department of Mathematics and Statistics
Link: https://arxiv.org/abs/2106.11209
Abstract: While coursework introduces undergraduate data science students to some relevant analytic skills, many are not given the myriad experiences with data and computing they need to be successful in the workplace. Additionally, students often have little background with team-based data science and the principles and tools of collaboration that are encountered outside of school. In this paper, we describe the DSC-WAV program, an NSF-funded data science workforce development project in which teams of undergraduate sophomores and juniors work with a local non-profit organization on a data-focused problem. To help students develop a sense of agency and improve their confidence in their technical and non-technical skills for doing data science, the project promoted a team-based approach to data science, adopting several processes and tools intended to facilitate this collaboration. Evidence from the project evaluation, including participant survey and interview data, is presented to document the degree to which the project was successful in engaging students in team-based data science, and how the project changed the students' perceptions of their technical and non-technical skills. We also examine opportunities for improvement and offer insight to provide advice for other data science educators who want to implement something similar at their own institutions.

【5】 maars: Tidy Inference under the 'Models as Approximations' Framework in R

Authors: Riccardo Fogliato, Shamindra Shrotriya, Arun Kumar Kuchibhotla
Affiliations: Department of Statistics & Data Science, Carnegie Mellon University
Comments: The first two authors contributed equally to this work and are ordered alphabetically
Link: https://arxiv.org/abs/2106.11188
Abstract: Linear regression using ordinary least squares (OLS) is a critical part of every statistician's toolkit. In R, this is elegantly implemented via lm() and its related functions. However, the statistical inference output from this suite of functions is based on the assumption that the model is well specified. This assumption is often unrealistic and at best satisfied approximately. In the statistics and econometrics literature, this has long been recognized and a large body of work provides inference for OLS under more practical assumptions. This can be seen as model-free inference. In this paper, we introduce our package maars ("models as approximations") that aims at bringing research on model-free inference to R via a comprehensive workflow. The maars package differs from other packages that also implement variance estimation, such as sandwich, in three key ways. First, all functions in maars follow a consistent grammar and return output in tidy format, with minimal deviation from the typical lm() workflow. Second, maars contains several tools for inference including empirical, multiplier, residual bootstrap, and subsampling, for easy comparison. Third, maars is developed with pedagogy in mind. For this, most of its functions explicitly return the assumptions under which the output is valid. This key innovation makes maars useful in teaching inference under misspecification and also a powerful tool for applied researchers. We hope our default feature of explicitly presenting assumptions will become a de facto standard for most statistical modeling in R.
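
The abstract does not display the package's API, so as a language-neutral illustration of the "models as approximations" flavour of inference it implements, here is a minimal NumPy sketch (ours) of the classical HC0 sandwich variance for OLS, which remains valid without assuming the model is well specified; maars additionally offers bootstrap and subsampling versions.

```python
import numpy as np

def ols_sandwich(X, y):
    """OLS point estimates with HC0 sandwich (model-free) standard errors."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)   # OLS fit
    resid = y - X @ beta
    bread = np.linalg.inv(X.T @ X)             # (X'X)^{-1}
    meat = X.T @ (X * resid[:, None] ** 2)     # sum_i e_i^2 x_i x_i'
    vcov = bread @ meat @ bread                # sandwich covariance matrix
    return beta, np.sqrt(np.diag(vcov))        # estimates and robust SEs
```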

【6】 Affine-Invariant Integrated Rank-Weighted Depth: Definition, Properties and Finite Sample Analysis

Authors: Guillaume Staerman, Pavlo Mozharovskyi, Stéphan Clémençon
Affiliations: LTCI, Télécom Paris, Institut Polytechnique de Paris
Link: https://arxiv.org/abs/2106.11068
Abstract: Because it determines a center-outward ordering of observations in $\mathbb{R}^d$ with $d \geq 2$, the concept of statistical depth makes it possible to define quantiles and ranks for multivariate data and to use them for various statistical tasks (e.g., inference, hypothesis testing). Whereas many depth functions have been proposed ad hoc in the literature since the seminal contribution of Tukey (1975), not all of them possess the properties desirable to emulate the notion of quantile function for univariate probability distributions. In this paper, we propose an extension of the integrated rank-weighted statistical depth (IRW depth in abbreviated form) originally introduced in [IRW], modified in order to satisfy the property of affine-invariance, thus fulfilling all four key axioms listed in the nomenclature elaborated by Zuo and Serfling (2000). The variant we propose, referred to as the Affine-Invariant IRW depth (AI-IRW in short), involves the covariance/precision matrices of the (supposedly square integrable) $d$-dimensional random vector $X$ under study, in order to take into account the directions along which $X$ is most variable to assign a depth value to any point $x \in \mathbb{R}^d$. The accuracy of the sampling version of the AI-IRW depth is investigated from a nonasymptotic perspective. Namely, a concentration result for the statistical counterpart of the AI-IRW depth is proved. Beyond the theoretical analysis carried out, applications to anomaly detection are considered and numerical results are displayed, providing strong empirical evidence of the relevance of the depth function we propose here.
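
A Monte Carlo sketch of our reading of the construction (all names are ours, and the paper's exact recipe may differ in details): whiten with the sample precision matrix, which is what buys affine invariance, then average univariate projection depths min(F, 1 − F) over random directions.

```python
import numpy as np

def ai_irw_depth(x, X, n_dirs=1000, rng=None):
    """Monte Carlo sketch of an affine-invariant IRW-type depth for a
    query point x given a sample X of shape (n, d)."""
    rng = np.random.default_rng(rng)
    # Whitening step: L L' = (sample covariance)^{-1}, so X @ L has
    # identity covariance; this makes the depth affine-invariant.
    L = np.linalg.cholesky(np.linalg.inv(np.cov(X.T)))
    Xw, xw = X @ L, x @ L
    # Random directions drawn uniformly on the unit sphere.
    U = rng.standard_normal((n_dirs, Xw.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    proj_X = Xw @ U.T                      # (n, n_dirs) projected sample
    proj_x = xw @ U.T                      # (n_dirs,) projected query point
    F = (proj_X <= proj_x).mean(axis=0)    # empirical cdf at the projection
    return np.minimum(F, 1.0 - F).mean()   # average univariate depth
```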

【7】 α-Stable convergence of heavy-tailed infinitely-wide neural networks

Authors: Paul Jung, Hoil Lee, Jiho Lee, Hongseok Yang
Link: https://arxiv.org/abs/2106.11064
Abstract: We consider infinitely-wide multi-layer perceptrons (MLPs) which are limits of standard deep feed-forward neural networks. We assume that, for each layer, the weights of an MLP are initialized with i.i.d. samples from either a light-tailed (finite variance) or heavy-tailed distribution in the domain of attraction of a symmetric $\alpha$-stable distribution, where $\alpha \in (0,2]$ may depend on the layer. For the bias terms of the layer, we assume i.i.d. initializations with a symmetric $\alpha$-stable distribution having the same $\alpha$ parameter as that layer. We then extend a recent result of Favaro, Fortini, and Peluchetti (2020), to show that the vector of pre-activation values at all nodes of a given hidden layer converges in the limit, under a suitable scaling, to a vector of i.i.d. random variables with symmetric $\alpha$-stable distributions.
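
As a concrete illustration of the setting (ours, not the authors' code), the sketch below initializes each layer with i.i.d. symmetric α-stable weights and biases via SciPy and propagates an input; the fan-in scaling width**(−1/α) is the natural stable-law normalization and stands in for the paper's "suitable scaling".

```python
import numpy as np
from scipy.stats import levy_stable

def stable_mlp_preactivations(x, widths, alpha=1.5, rng=None):
    """Forward pass of an MLP with i.i.d. symmetric alpha-stable (beta=0)
    weights and biases; returns the last layer's pre-activations."""
    h = np.asarray(x, dtype=float)
    for m in widths:
        W = levy_stable.rvs(alpha, 0, size=(m, h.size), random_state=rng)
        b = levy_stable.rvs(alpha, 0, size=m, random_state=rng)
        z = W @ h * h.size ** (-1.0 / alpha) + b  # scaled pre-activations
        h = np.tanh(z)                            # any bounded nonlinearity
    return z
```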

【8】 Scalable Bayesian inference for time series via divide-and-conquer

Authors: Rihui Ou, Deborshee Sen, David Dunson
Affiliations: Department of Statistical Science, Duke University, Durham, NC, USA; The Statistical and Applied Mathematical Sciences Institute (SAMSI), Durham
Link: https://arxiv.org/abs/2106.11043
Abstract: Bayesian computational algorithms tend to scale poorly as data size increases. This has led to the development of divide-and-conquer-based approaches for scalable inference. These divide the data into subsets, perform inference for each subset in parallel, and then combine these inferences. While appealing theoretical properties and practical performance have been demonstrated for independent observations, scalable inference for dependent data remains challenging. In this work, we study the problem of Bayesian inference from very long time series. The literature in this area focuses mainly on approximate approaches that lack any theoretical guarantees and may provide arbitrarily poor accuracy in practice. We propose a simple and scalable divide-and-conquer method, and provide accuracy guarantees. Numerical simulations and real data applications demonstrate the effectiveness of our approach.

【9】 Estimation of time-specific intervention effects on continuously distributed time-to-event outcomes by targeted maximum likelihood estimation

Authors: Helene Charlotte Wiese Rytgaard, Frank Eriksson, Mark van der Laan
Affiliations: University of Copenhagen; University of California
Link: https://arxiv.org/abs/2106.11009
Abstract: Targeted maximum likelihood estimation is a general methodology combining flexible ensemble learning and semiparametric efficiency theory in a two-step procedure for estimation of causal parameters. Targeted maximum likelihood procedures proposed for survival and competing risks analysis have so far focused on events taking values in discrete time. We here present a targeted maximum likelihood estimation procedure for event times that take values in $\mathbb{R}_+$. We focus on the estimation of intervention-specific mean outcomes with stochastic interventions on a time-fixed treatment. For data-adaptive estimation of nuisance parameters, we propose a new flexible highly adaptive lasso estimation method for continuous-time intensities that can be implemented with $L_1$-penalized Poisson regression. In a simulation study the targeted maximum likelihood estimator based on the highly adaptive lasso estimator proves to be unbiased, achieves proper coverage in agreement with the asymptotic theory, and further displays efficiency improvements relative to a Kaplan-Meier approach.

【10】 A generalized EMS algorithm for model selection with incomplete data

Authors: Ping-Feng Xu, Lai-Xu Shang, Man-Lai Tang, Na Shan, Guoliang Tian
Affiliations: School of Mathematics and Statistics, Changchun University of Technology, Changchun; Department of Mathematics and Statistics, Hang Seng University of Hong Kong, Hong Kong; School of Psychology, Northeast Normal University, Changchun, China
Link: https://arxiv.org/abs/2106.10983
Abstract: Recently, a so-called E-MS algorithm was developed for model selection in the presence of missing data. Specifically, it performs the Expectation step (E step) and Model Selection step (MS step) alternately to find the minimum point of the observed generalized information criteria (GIC). In practice, it could be numerically infeasible to perform the MS-step for high dimensional settings. In this paper, we propose a simpler and more feasible generalized EMS (GEMS) algorithm which simply requires a decrease in the observed GIC in the MS-step and includes the original EMS algorithm as a special case. We obtain several numerical convergence results of the GEMS algorithm under mild conditions. We apply the proposed GEMS algorithm to Gaussian graphical model selection and variable selection in generalized linear models and compare it with existing competitors via numerical experiments. We illustrate its application with three real data sets.

【11】 Spliced Binned-Pareto Distribution for Robust Modeling of Heavy-tailed Time Series

Authors: Elena Ehrlich, Laurent Callot, François-Xavier Aubet
Affiliations: AWS ProServe, Miami, FL, USA; Amazon Research, Seattle, WA, USA; Vienna, Austria
Comments: Accepted at RobustWorkshop@ICLR2021: <this https URL>
Link: https://arxiv.org/abs/2106.10952
Abstract: This work proposes a novel method to robustly and accurately model time series with heavy-tailed noise, in non-stationary scenarios. In many practical applications, time series have heavy-tailed noise that significantly impacts the performance of classical forecasting models; in particular, accurately modeling a distribution over extreme events is crucial to performing accurate time series anomaly detection. We propose a Spliced Binned-Pareto distribution which is both robust to extreme observations and allows accurate modeling of the full distribution. Our method allows the capture of time dependencies in the higher order moments of the distribution, such as the tail heaviness. We compare the robustness and the accuracy of the tail estimation of our method to other state-of-the-art methods on a time series of Twitter mention counts.

【12】 Tumor Radiogenomics with Bayesian Layered Variable Selection

Authors: Shariq Mohammed, Sebastian Kurtek, Karthik Bharath, Arvind Rao, Veerabhadran Baladandayuthapani
Affiliations: Department of Biostatistics, University of Michigan; Department of Computational Medicine & Bioinformatics, University of Michigan; Department of Statistics, Ohio State University; School of Mathematical Sciences, University of Nottingham
Link: https://arxiv.org/abs/2106.10941
Abstract: We propose a statistical framework to integrate radiological magnetic resonance imaging (MRI) and genomic data to identify the underlying radiogenomic associations in lower grade gliomas (LGG). We devise a novel imaging phenotype by dividing the tumor region into concentric spherical layers that mimics the tumor evolution process. MRI data within each layer is represented by voxel-intensity-based probability density functions which capture the complete information about tumor heterogeneity. Under a Riemannian-geometric framework these densities are mapped to a vector of principal component scores which act as imaging phenotypes. Subsequently, we build Bayesian variable selection models for each layer with the imaging phenotypes as the response and the genomic markers as predictors. Our novel hierarchical prior formulation incorporates the interior-to-exterior structure of the layers, and the correlation between the genomic markers. We employ a computationally efficient Expectation-Maximization-based strategy for estimation. Simulation studies demonstrate the superior performance of our approach compared to other approaches. With a focus on the cancer driver genes in LGG, we discuss some biologically relevant findings. Genes implicated in survival and oncogenesis are identified as being associated with the spherical layers, which could potentially serve as early-stage diagnostic markers for disease monitoring, prior to routine invasive approaches.

【13】 Schrödinger-Föllmer Sampler: Sampling without Ergodicity

Authors: Jian Huang, Yuling Jiao, Lican Kang, Xu Liao, Jin Liu, Yanyan Liu
Affiliations: School of Mathematics and Statistics, Wuhan University
Link: https://arxiv.org/abs/2106.10880
Abstract: Sampling from probability distributions is an important problem in statistics and machine learning, especially in Bayesian inference when integration with respect to the posterior distribution is intractable and sampling from the posterior is the only viable option for inference. In this paper, we propose the Schrödinger-Föllmer sampler (SFS), a novel approach for sampling from possibly unnormalized distributions. The proposed SFS is based on the Schrödinger-Föllmer diffusion process on the unit interval with a time dependent drift term, which transports the degenerate distribution at time zero to the target distribution at time one. Compared with existing Markov chain Monte Carlo samplers, which require ergodicity, no such requirement is needed for SFS. Computationally, SFS can be easily implemented using the Euler-Maruyama discretization. In theoretical analysis, we establish non-asymptotic error bounds for the sampling distribution of SFS in the Wasserstein distance under suitable conditions. We conduct numerical experiments to evaluate the performance of SFS and demonstrate that it is able to generate samples with better quality than several existing methods.
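
The abstract notes that SFS "can be easily implemented using the Euler-Maruyama discretization". Below is a minimal Python sketch under that reading; the Monte Carlo drift estimate is the standard one from the Schrödinger-Föllmer literature and is an assumption here, since the abstract does not spell it out, and `log_ratio` (log of the unnormalized target density minus the log standard-normal density) is our naming.

```python
import numpy as np

def sfs_sample(log_ratio, d, n_steps=200, n_mc=1000, rng=None):
    """Euler-Maruyama sketch of a Schrodinger-Follmer-type sampler.
    log_ratio(X) must return, for each row x of X, the log of the
    (possibly unnormalized) target density over the N(0, I_d) density."""
    rng = np.random.default_rng(rng)
    h = 1.0 / n_steps
    x = np.zeros(d)                         # degenerate start at time zero
    for k in range(n_steps):
        s = 1.0 - k * h                     # time remaining until t = 1
        Z = rng.standard_normal((n_mc, d))
        lr = log_ratio(x + np.sqrt(s) * Z)
        w = np.exp(lr - lr.max())           # stabilized weights f(x + sqrt(s) Z)
        # Monte Carlo estimate of the drift grad log E[f(x + sqrt(s) Z)]
        drift = (Z * w[:, None]).mean(axis=0) / (np.sqrt(s) * w.mean())
        x = x + h * drift + np.sqrt(h) * rng.standard_normal(d)
    return x                                # approximate draw from the target
```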

【14】 Benign Overfitting in Multiclass Classification: All Roads Lead to Interpolation

Authors: Ke Wang, Vidya Muthukumar, Christos Thrampoulidis
Affiliations: Department of Statistics and Applied Probability, University of California Santa Barbara; Electrical and Computer Engineering & Industrial and Systems Engineering, Georgia Institute of Technology
Link: https://arxiv.org/abs/2106.10865
Abstract: The growing literature on "benign overfitting" in overparameterized models has been mostly restricted to regression or binary classification settings; however, most success stories of modern machine learning have been recorded in multiclass settings. Motivated by this discrepancy, we study benign overfitting in multiclass linear classification. Specifically, we consider the following popular training algorithms on separable data: (i) empirical risk minimization (ERM) with cross-entropy loss, which converges to the multiclass support vector machine (SVM) solution; (ii) ERM with least-squares loss, which converges to the min-norm interpolating (MNI) solution; and, (iii) the one-vs-all SVM classifier. First, we provide a simple sufficient condition under which all three algorithms lead to classifiers that interpolate the training data and have equal accuracy. When the data is generated from Gaussian mixtures or a multinomial logistic model, this condition holds under high enough effective overparameterization. Second, we derive novel error bounds on the accuracy of the MNI classifier, thereby showing that all three training algorithms lead to benign overfitting under sufficient overparameterization. Ultimately, our analysis shows that good generalization is possible for SVM solutions beyond the realm in which typical margin-based bounds apply.
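
To make the second training rule concrete, here is a minimal NumPy sketch (ours, for illustration) of the MNI solution referenced above: least squares on one-hot labels, where the pseudoinverse returns the minimum-norm interpolating coefficients in the overparameterized regime.

```python
import numpy as np

def mni_classifier(X, y, n_classes):
    """Min-norm interpolating (MNI) multiclass classifier via least squares
    on one-hot labels; returns a prediction function."""
    Y = np.eye(n_classes)[y]          # one-hot label matrix, shape (n, k)
    W = np.linalg.pinv(X) @ Y         # minimum-norm least-squares solution
    return lambda X_new: (X_new @ W).argmax(axis=1)
```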

【15】 Dynamic group testing to control and monitor disease progression in a population

Authors: Sundara Rajan Srinivasavaradhan, Pavlos Nikolopoulos, Christina Fragouli, Suhas Diggavi
Affiliations: University of California, Los Angeles, Electrical and Computer Engineering
Link: https://arxiv.org/abs/2106.10765
Abstract: In the context of a pandemic like COVID-19, and until most people are vaccinated, proactive testing and interventions have been proved to be the only means to contain the disease spread. Recent academic work has offered significant evidence in this regard, but a critical question is still open: Can we accurately identify all new infections that happen every day, without this being forbiddingly expensive, i.e., using only a fraction of the tests needed to test everyone every day (complete testing)? Group testing offers a powerful toolset for minimizing the number of tests, but it does not account for the time dynamics behind the infections. Moreover, it typically assumes that people are infected independently, while infections are governed by community spread. Epidemiology, on the other hand, does explore time dynamics and community correlations through the well-established continuous-time SIR stochastic network model, but the standard model does not incorporate discrete-time testing and interventions. In this paper, we introduce a "discrete-time SIR stochastic block model" that also allows for group testing and interventions on a daily basis. Our model can be regarded as a discrete version of the continuous-time SIR stochastic network model over a specific type of weighted graph that captures the underlying community structure. We analyze that model w.r.t. the minimum number of group tests needed every day to identify all infections with vanishing error probability. We find that one can leverage the knowledge of the community and the model to inform nonadaptive group testing algorithms that are order-optimal, and therefore achieve the same performance as complete testing using a much smaller number of tests.

【16】 Some smooth sequential empirical copula processes and their multiplier bootstraps under strong mixing

Authors: Ivan Kojadinovic, Bingqing Yi
Affiliations: CNRS, Université de Pau et des Pays de l'Adour, E2S UPPA, Laboratoire de mathématiques et applications IPRA, UMR, B.P., Pau Cedex, France
Comments: 33 pages, 5 figures
Link: https://arxiv.org/abs/2106.10726
Abstract: A broad class of smooth empirical copulas that contains the empirical beta copula proposed by Segers, Sibuya and Tsukahara is studied. Conditions under which the corresponding sequential empirical copula processes converge weakly are provided. Specific members of this general class of smooth estimators that depend on a scalar parameter determining the amount of marginal smoothing and a functional parameter controlling the shape of the smoothing region are proposed. The empirical investigation of the influence of these parameters suggests focusing on a subclass of data-adaptive smooth nonparametric copulas. To allow the use of the proposed class of smooth estimators in inference procedures on an unknown copula, including in change-point analysis, natural smooth extensions of the sequential dependent multiplier bootstrap are asymptotically validated and their finite-sample performance is studied through Monte Carlo experiments.
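
For reference, the empirical beta copula that anchors this class has the closed form (standard definition, our transcription): with $R_{ij}$ the rank of the $i$th observation in the $j$th coordinate and $F_{n,r}$ the distribution function of the Beta$(r,\,n+1-r)$ law,

$$ \mathbb{C}_n^{\beta}(\mathbf{u}) \;=\; \frac{1}{n}\sum_{i=1}^{n}\prod_{j=1}^{d} F_{n,R_{ij}}(u_j), \qquad \mathbf{u}\in[0,1]^d. $$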

【17】 The Expected Value of Perfect Information for Risk Prediction Models

Authors: Mohsen Sadatsafavi, Tae Yoon Lee, Paul Gustafson
Affiliations: Respiratory Evaluation Sciences Program, Collaboration for Outcomes Research and Evaluation; Department of Statistics, The University of British Columbia, Vancouver, Canada
Comments: 20 pages, 1 table, 3 figures
Link: https://arxiv.org/abs/2106.10721
Abstract: Risk prediction models are often constructed using a finite development sample, thus the resulting predicted risks are uncertain. The decision-theoretic implications of prediction uncertainty are not sufficiently studied. For risk prediction models, a measure of net benefit can be calculated based on interpreting the positivity threshold as an exchange rate between true and false positive outcomes. Adopting a Bayesian perspective, we apply Value of Information concepts from decision theory to such net benefit calculations when developing a risk prediction model. We define the Expected Value of Perfect Information (EVPI) as the expected gain in net benefit by using the correct predictions as opposed to the proposed model. We suggest bootstrap methods for sampling from the posterior distribution of predictions for EVPI calculation using Monte Carlo simulations. In a case study, we used subsets of data of various sizes from a clinical trial to develop risk prediction models for 30-day mortality after acute myocardial infarction. With a sample size of 1,000, EVPI was 0 at threshold values above 0.6, indicating no point in procuring more development data. At thresholds of 0.4-0.6, the proposed model was not net beneficial, but EVPI was positive, indicating that obtaining more development data might be beneficial. Across the entire range of thresholds, the gain in net benefit by using the correct model was 24% higher than the gain by using the proposed model. EVPI declined with larger sample sizes and was generally low with sample sizes of 4,000 and above. We summarize an algorithm for incorporating EVPI calculations into the commonly used bootstrap method for optimism correction. Value of Information methods can be applied to explore decision-theoretic consequences of uncertainty in risk prediction, and can complement inferential methods when developing or validating risk prediction models.
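
The EVPI definition above translates directly into a Monte Carlo computation. Below is a hedged Python sketch (function and argument names are ours): given bootstrap draws of the "correct" risks and the proposed model's risks, it compares the expected net benefit of acting on each at a positivity threshold z, with z/(1 − z) as the true/false-positive exchange rate.

```python
import numpy as np

def evpi(post_preds, model_preds, z):
    """Monte Carlo EVPI sketch at positivity threshold z.
    post_preds: (B, n) bootstrap/posterior draws of the 'correct' risks;
    model_preds: (n,) risks from the proposed model."""
    w = z / (1.0 - z)
    def nb(decision, truth):
        # expected net benefit of a treat/no-treat rule under 'truth'
        return np.mean(truth * decision) - w * np.mean((1.0 - truth) * decision)
    nb_correct = np.mean([nb(p > z, p) for p in post_preds])        # perfect info
    nb_model = np.mean([nb(model_preds > z, p) for p in post_preds])
    return nb_correct - nb_model      # expected gain from perfect information
```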

【18】 Constrained randomization and statistical inference for multi-arm parallel cluster randomized controlled trials

Authors: Yunji Zhou, Elizabeth L. Turner, Ryan A. Simmons, Fan Li
Affiliations: Department of Biostatistics and Bioinformatics, Duke University, Durham; Duke Global Health Institute, Duke University, Durham, North Carolina, USA; Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut
Link: https://arxiv.org/abs/2106.10720
Abstract: Cluster randomized controlled trials (cRCTs) are designed to evaluate interventions delivered to groups of individuals. A practical limitation of such designs is that the number of available clusters may be small, resulting in an increased risk of baseline imbalance under simple randomization. Constrained randomization overcomes this issue by restricting the allocation to a subset of randomization schemes where sufficient overall covariate balance across comparison arms is achieved with respect to a pre-specified balance metric. However, several aspects of constrained randomization for the design and analysis of multi-arm cRCTs have not been fully investigated. Motivated by an ongoing multi-arm cRCT, we provide a comprehensive evaluation of the statistical properties of model-based and randomization-based tests under both simple and constrained randomization designs in multi-arm cRCTs, with varying combinations of design and analysis-based covariate adjustment strategies. In particular, as randomization-based tests have not been extensively studied in multi-arm cRCTs, we additionally develop most-powerful permutation tests under the linear mixed model framework for our comparisons. Our results indicate that under constrained randomization, both model-based and randomization-based analyses could gain power while preserving nominal type I error rate, given proper analysis-based adjustment for the baseline covariates. The choice of balance metrics and candidate set size and their implications on the testing of the pairwise and global hypotheses are also discussed. Finally, we caution against the design and analysis of multi-arm cRCTs with an extremely small number of clusters, due to insufficient degrees of freedom and the tendency to obtain an overly restricted randomization space.
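
A schematic of the design step described here, as a hedged Python sketch (the function name and the particular balance metric are ours, not necessarily the paper's): candidate allocations of clusters to arms are scored for overall covariate balance, and the final scheme is drawn at random from the best-balanced subset.

```python
import numpy as np

def constrained_randomization(X, n_arms, n_schemes=10000, keep=0.1, rng=None):
    """Sketch of constrained randomization for a multi-arm cRCT; rows of X
    hold cluster-level covariates. Keeps the `keep` best-balanced fraction
    of simulated allocations and draws one of them."""
    rng = np.random.default_rng(rng)
    base = np.arange(len(X)) % n_arms          # near-equal arm sizes
    sd = X.std(axis=0)
    schemes, scores = [], []
    for _ in range(n_schemes):
        a = rng.permutation(base)
        means = np.array([X[a == k].mean(axis=0) for k in range(n_arms)])
        # balance metric: sum of squared standardized pairwise differences
        score = sum((((means[i] - means[j]) / sd) ** 2).sum()
                    for i in range(n_arms) for j in range(i + 1, n_arms))
        schemes.append(a)
        scores.append(score)
    cutoff = np.quantile(scores, keep)         # balance threshold
    candidates = [s for s, c in zip(schemes, scores) if c <= cutoff]
    return candidates[rng.integers(len(candidates))]
```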

【19】 Life-cycle assessment for flutter probability of a long-span suspension bridge based on field monitoring data

Authors: Xiaolei Chu, Hung Nguyen Sinh, Wei Cui, Lin Zhao, Yaojun Ge
Affiliations: State Key Lab of Disaster Reduction in Civil Engineering, Tongji University; Key Laboratory of Transport Industry of Bridge Wind Resistance Technologies, Tongji University, Shanghai, China; State Key Laboratory of Mountain Bridge and Tunnel Engineering, Chongqing
Link: https://arxiv.org/abs/2106.10694
Abstract: Assessment of structural safety status is of paramount importance for existing bridges, where accurate evaluation of flutter probability is essential for long-span bridges. In current engineering practice, at the design stage, flutter critical wind speed is usually estimated by the wind tunnel test, which is sensitive to modal frequencies and damping ratios. After construction, structural properties of existing structures will change with time due to various factors, such as structural deteriorations and periodic environments. The structural dynamic properties, such as modal frequencies and damping ratios, cannot be considered as the same values as the initial ones, and the deteriorations should be included when estimating the life-cycle flutter probability. This paper proposes an evaluation framework to assess the life-cycle flutter probability of long-span bridges considering the deteriorations of structural properties, based on field monitoring data. The Bayesian approach is employed for modal identification of a suspension bridge with a main span of 1,650 m, and the field monitoring data during 2010-2015 is analyzed to determine the deterioration functions of modal frequencies and damping ratios, as well as their inter-seasonal fluctuations. According to the historical trend, the long-term structural properties can be predicted, and the probability distributions of flutter critical wind speed for each year in the long term are calculated. Consequently, the life-cycle flutter probability is estimated, based on the predicted modal frequencies and damping ratios.

【20】 Outlier Detection and Spatial Analysis Algorithms

Authors: Jacob John
Affiliations: School of Computer Science and Engineering (SCOPE), Vellore Institute of Technology, Vellore, India
Comments: 7 pages, 14 figures
Link: https://arxiv.org/abs/2106.10669
Abstract: Outlier detection is a significant area in data mining. It can be used either to pre-process the data prior to an analysis or after the processing phase (before visualization), depending on the effectiveness of the outlier and its importance. Outlier detection extends to several fields such as detection of credit card fraud, network intrusions, machine failure prediction, potential terrorist attacks, and so on. Outliers are those data points with considerably different characteristics. They deviate from the data set, causing inconsistencies, noise, and anomalies during analysis, and result in modification of the original points. However, a common misconception is that outliers have to be immediately eliminated from or replaced in the data set. Such points could be considered useful if analyzed separately, as they could be obtained from an entirely separate mechanism, making them important to the research question. This study surveys the different methods of outlier detection for spatial analysis. Spatial data or geospatial data are those that exhibit geographic properties or attributes such as position or areas. An example would be weather data such as precipitation, temperature, wind velocity, and so on, collected for a defined region.

【21】 Bayesian inference for continuous-time hidden Markov models with an unknown number of states

Authors: Yu Luo, David A. Stephens
Affiliations: United Kingdom; Department of Mathematics and Statistics, McGill University
Link: https://arxiv.org/abs/2106.10660
Abstract: We consider the modeling of data generated by a latent continuous-time Markov jump process with a state space of finite but unknown dimensions. Typically in such models, the number of states has to be pre-specified, and Bayesian inference for a fixed number of states has not been studied until recently. In addition, although approaches to address the problem for discrete-time models have been developed, no method has been successfully implemented for the continuous-time case. We focus on reversible jump Markov chain Monte Carlo, which allows trans-dimensional moves among different numbers of states in order to perform Bayesian inference for the unknown number of states. Specifically, we propose an efficient split-combine move which can facilitate the exploration of the parameter space, and demonstrate that it can be implemented effectively at scale. Subsequently, we extend this algorithm to the context of model-based clustering, allowing the numbers of states and clusters to both be determined during the analysis. The model formulation, inference methodology, and associated algorithm are illustrated by simulation studies. Finally, we apply this method to real data from a Canadian healthcare system in Quebec.

【22】 Dynamic prediction and analysis based on restricted mean survival time in survival analysis with nonproportional hazards

Authors: Zijing Yang, Hongji Wu, Yawen Hou, Hao Yuan, Zheng Chen
Affiliations: Department of Biostatistics, Southern Medical University, Guangzhou, China; Department of Statistics, Jinan University, Guangzhou, China
Link: https://arxiv.org/abs/2106.10625
Abstract: In the process of clinical diagnosis and treatment, the restricted mean survival time (RMST), which reflects the life expectancy of patients up to a specified time, can be used as an appropriate outcome measure. However, the RMST only calculates the mean survival time of patients within a period of time after the start of follow-up and may not accurately portray the change in a patient's life expectancy over time. The life expectancy can be adjusted for the time the patient has already survived and defined as the conditional restricted mean survival time (cRMST). A dynamic RMST model based on the cRMST can be established by incorporating time-dependent covariates and covariates with time-varying effects. We analysed data from a study of primary biliary cirrhosis (PBC) to illustrate the use of the dynamic RMST model. The predictive performance was evaluated using the C-index and the prediction error. The proposed dynamic RMST model, which can explore the dynamic effects of prognostic factors on survival time, has better predictive performance than the RMST model. Three PBC patient examples were used to illustrate how the predicted cRMST changed at different prediction times during follow-up. The use of the dynamic RMST model based on the cRMST allows for optimization of evidence-based decision-making by updating personalized dynamic life expectancy for patients.
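
In the usual notation, with survival function $S(t)$, the two quantities discussed here are (our transcription of the standard definitions):

$$ \mathrm{RMST}(\tau) = \int_0^{\tau} S(t)\,dt, \qquad \mathrm{cRMST}(s,\tau) = E\left[\min(T,\tau) - s \mid T > s\right] = \int_s^{\tau} \frac{S(t)}{S(s)}\,dt, $$

so the cRMST is simply the RMST recomputed for the subpopulation still alive at time $s$.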

【23】 Combined tests based on restricted mean time lost for competing risks data

Authors: Jingjing Lyu, Yawen Hou, Zheng Chen
Affiliations: Department of Biostatistics, Southern Medical University, Guangzhou, China; Department of Statistics, Jinan University, Guangzhou, China
Comments: 26 pages, 3 figures
Link: https://arxiv.org/abs/2106.10624
Abstract: Competing risks data are common in medical studies, and the sub-distribution hazard (SDH) ratio is considered an appropriate measure. However, because the hazard itself is not easy to interpret clinically and because the SDH ratio is valid only under the proportional SDH assumption, this article introduces an alternative index under competing risks, named restricted mean time lost (RMTL). Several test procedures are also constructed based on RMTL. First, we introduce the definition and estimation of RMTL based on Aalen-Johansen cumulative incidence functions. Then, we consider several combined tests based on the SDH and the RMTL difference (RMTLd). The statistical properties of the methods are evaluated using simulations and are applied to two examples. The type I errors of the combined tests are close to the nominal level. All combined tests show acceptable power in all situations. In conclusion, RMTL can meaningfully summarize treatment effects for clinical decision making, and the three combined tests have robust power under various conditions and can be considered for statistical inference in real data analysis.
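
In symbols, with $F_k(t)$ the (Aalen-Johansen estimated) cumulative incidence function for cause $k$, the index and the group contrast entering the RMTLd tests are (standard definitions, our transcription):

$$ \mathrm{RMTL}_k(\tau) = \int_0^{\tau} F_k(t)\,dt, \qquad \mathrm{RMTLd}(\tau) = \mathrm{RMTL}_k^{(1)}(\tau) - \mathrm{RMTL}_k^{(0)}(\tau), $$

where the superscripts index the two treatment groups.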

【24】 Low-rank Characteristic Tensor Density Estimation Part II: Compression and Latent Density Estimation

Authors: Magda Amiridi, Nikos Kargas, Nicholas D. Sidiropoulos
Affiliations: University of Minnesota (one author is now with Amazon)
Link: https://arxiv.org/abs/2106.10591
Abstract: Learning generative probabilistic models is a core problem in machine learning, which presents significant challenges due to the curse of dimensionality. This paper proposes a joint dimensionality reduction and non-parametric density estimation framework, using a novel estimator that can explicitly capture the underlying distribution of appropriate reduced-dimension representations of the input data. The idea is to jointly design a nonlinear dimensionality-reducing auto-encoder to model the training data in terms of a parsimonious set of latent random variables, and learn a canonical low-rank tensor model of the joint distribution of the latent variables in the Fourier domain. The proposed latent density model is non-parametric and universal, as opposed to the predefined prior that is assumed in variational auto-encoders. Joint optimization of the auto-encoder and the latent density estimator is pursued via a formulation which learns both by minimizing a combination of the negative log-likelihood in the latent domain and the auto-encoder reconstruction loss. We demonstrate that the proposed model achieves very promising results on toy, tabular, and image datasets on regression tasks, sampling, and anomaly detection.

【25】 Sparse logistic regression on functional data

Authors: Yunnan Xu, Pang Du, John Robertson, Ryan Senger
Link: https://arxiv.org/abs/2106.10583
Abstract: Motivated by a hemodialysis monitoring study, we propose a logistic model with a functional predictor, called the Sparse Functional Logistic Regression (SFLR), where the corresponding coefficient function is locally sparse, that is, it is completely zero on some subregions of its domain. The coefficient function, together with the intercept parameter, are estimated through a doubly-penalized likelihood approach with a B-splines expansion. One penalty is for controlling the roughness of the coefficient function estimate and the other penalty, in the form of the $L_1$ norm, enforces the local sparsity. A Newton-Raphson procedure is designed for the optimization of the penalized likelihood. Our simulations show that SFLR is capable of generating a smooth and reasonably good estimate of the coefficient function on the non-null region(s) while recognizing the null region(s). Application of the method to the Raman spectral data generated from the hemodialysis study pinpoints the wavenumber regions for identifying key chemicals contributing to the dialysis progress.
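
One natural reading of the doubly-penalized criterion, with the caveat that the abstract names the two penalties but not their exact forms (the squared-second-derivative roughness penalty below is an assumption): with functional predictor $X(t)$ and coefficient function $\beta(t)$, maximize

$$ \ell(\alpha,\beta) \;-\; \lambda_1 \int \{\beta''(t)\}^2\,dt \;-\; \lambda_2 \int |\beta(t)|\,dt, $$

where $\ell$ is the logistic log-likelihood with $\operatorname{logit} P(Y=1 \mid X) = \alpha + \int X(t)\beta(t)\,dt$; the first penalty controls roughness and the $L_1$-type second penalty forces $\beta$ to vanish on subregions, producing local sparsity.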

【26】 Choosing the Estimand When Matching or Weighting in Observational Studies

Authors: Noah Greifer, Elizabeth A. Stuart
Affiliations: Department of Mental Health, Johns Hopkins Bloomberg School of Public Health; Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health
Link: https://arxiv.org/abs/2106.10577
Abstract: Matching and weighting methods for observational studies require the choice of an estimand, the causal effect with reference to a specific target population. Commonly used estimands include the average treatment effect in the treated (ATT), the average treatment effect in the untreated (ATU), the average treatment effect in the population (ATE), and the average treatment effect in the overlap (i.e., equipoise population; ATO). Each estimand has its own assumptions, interpretation, and statistical methods that can be used to estimate it. This article provides guidance on selecting and interpreting an estimand to help medical researchers correctly implement statistical methods used to estimate causal effects in observational studies and to help audiences correctly interpret the results and limitations of these studies. The interpretations of the estimands resulting from regression and instrumental variable analyses are also discussed. Choosing an estimand carefully is essential for making valid inferences from the analysis of observational data and ensuring results are replicable and useful for practitioners.
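
For reference, the balancing weights that target each estimand, written with propensity score $e(x)$ for treated ($w_1$) and untreated ($w_0$) units; these are standard results not spelled out in the abstract:

$$ \begin{aligned} \text{ATE:}\quad & w_1 = 1/e(x), & w_0 &= 1/\{1-e(x)\};\\ \text{ATT:}\quad & w_1 = 1, & w_0 &= e(x)/\{1-e(x)\};\\ \text{ATU:}\quad & w_1 = \{1-e(x)\}/e(x), & w_0 &= 1;\\ \text{ATO:}\quad & w_1 = 1-e(x), & w_0 &= e(x). \end{aligned} $$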

【27】 Geographic and Racial Disparities in the Incidence of Low Birthweight in Pennsylvania

Authors: Guangzi Song, Loni Philip Tabb, Harrison Quick
Affiliations: Department of Epidemiology and Biostatistics, Drexel University, Philadelphia, PA
Link: https://arxiv.org/abs/2106.10571
Abstract: Babies born with low and very low birthweights (i.e., birthweights below 2,500 and 1,500 grams, respectively) have an increased risk of complications compared to other babies, and the proportion of babies with a low birthweight is a common metric used when evaluating public health in a population. While many factors increase the risk of a baby having a low birthweight, many can be linked to the mother's socioeconomic status, which in turn contributes to large racial disparities in the incidence of low weight births. Here, we employ Bayesian statistical models to analyze the proportion of babies with low birthweight in Pennsylvania counties by race/ethnicity. Due to the small number of births (and low weight births) in many Pennsylvania counties when stratified by race/ethnicity, our methods must walk a fine line. On one hand, leveraging spatial structure can help improve the precision of our estimates. On the other hand, we must be cautious to avoid letting the model overwhelm the information in the data and produce spurious conclusions. As such, we first develop a framework by which we can measure (and control) the informativeness of our spatial model. After demonstrating the properties of our framework via simulation, we analyze the low birthweight data from Pennsylvania and examine the extent to which the commonly used conditional autoregressive model can lead to oversmoothing. We then reanalyze the data using our proposed framework and highlight its ability to detect (or not detect) evidence of racial disparities in the incidence of low birthweight.

【28】 Rayleigh-Gauss-Newton optimization with enhanced sampling for variational Monte Carlo

Authors: Robert J. Webber, Michael Lindsey
Affiliations: Courant Institute of Mathematical Sciences, New York University, New York, USA
Comments: 12 pages, 7 figures
Link: https://arxiv.org/abs/2106.10558
Abstract: Variational Monte Carlo (VMC) is an approach for computing ground-state wavefunctions that has recently become more powerful due to the introduction of neural network-based wavefunction parametrizations. However, efficiently training neural wavefunctions to converge to an energy minimum remains a difficult problem. In this work, we analyze optimization and sampling methods used in VMC and introduce alterations to improve their performance. First, based on theoretical convergence analysis in a noiseless setting, we motivate a new optimizer that we call the Rayleigh-Gauss-Newton method, which can improve upon gradient descent and natural gradient descent to achieve superlinear convergence. Second, in order to realize this favorable comparison in the presence of stochastic noise, we analyze the effect of sampling error on VMC parameter updates and experimentally demonstrate that it can be reduced by the parallel tempering method. In particular, we demonstrate that RGN can be made robust to energy spikes that occur when new regions of configuration space become available to the sampler over the course of optimization. Finally, putting theory into practice, we apply our enhanced optimization and sampling methods to the transverse-field Ising and XXZ models on large lattices, yielding ground-state energy estimates with remarkably high accuracy after just 200-500 parameter updates.

【29】 Fasano-Franceschini Test: an Implementation of a 2-Dimensional Kolmogorov-Smirnov test in R

Authors: Elan Ness-Cohn, Rosemary Braun
Affiliations: Northwestern University
Comments: 8 pages, 4 figures
Link: https://arxiv.org/abs/2106.10539
Abstract: The univariate Kolmogorov-Smirnov (KS) test is a non-parametric statistical test designed to assess whether a set of data is consistent with a given probability distribution (or, in the two-sample case, whether the two samples come from the same underlying distribution). The versatility of the KS test has made it a cornerstone of statistical analysis and it is commonly used across the scientific disciplines. However, the test proposed by Kolmogorov and Smirnov does not naturally extend to multidimensional distributions. Here, we present the fasano.franceschini.test package, an R implementation of the 2-D KS two-sample test as defined by Fasano and Franceschini (Fasano and Franceschini 1987). The fasano.franceschini.test package provides three improvements over the current 2-D KS test on the Comprehensive R Archive Network (CRAN): (i) the Fasano and Franceschini test has been shown to run in $O(n^2)$, versus the Peacock implementation, which runs in $O(n^3)$; (ii) the package implements a procedure for handling ties in the data; and (iii) the package implements a parallelized bootstrapping procedure for improved significance testing. Ultimately, the fasano.franceschini.test package presents a robust statistical test for analyzing random samples defined in 2 dimensions.
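
For intuition, a brief Python sketch (ours) of the quadrant-based Fasano-Franceschini statistic: each observed point serves as an origin, and the statistic is the largest discrepancy between the two samples' quadrant fractions; the package's tie-handling and bootstrap significance procedures are omitted here.

```python
import numpy as np

def ff_statistic(a, b):
    """Fasano-Franceschini-style 2-D two-sample statistic for samples
    a, b of shape (n, 2) and (m, 2); O(n^2)-type double loop."""
    def max_quadrant_diff(origins, a, b):
        d = 0.0
        for (x, y) in origins:
            for sx in (1, -1):            # four quadrants around (x, y)
                for sy in (1, -1):
                    fa = np.mean((sx * (a[:, 0] - x) > 0) & (sy * (a[:, 1] - y) > 0))
                    fb = np.mean((sx * (b[:, 0] - x) > 0) & (sy * (b[:, 1] - y) > 0))
                    d = max(d, abs(fa - fb))
        return d
    # average the maxima obtained with each sample's points as origins
    return 0.5 * (max_quadrant_diff(a, a, b) + max_quadrant_diff(b, a, b))
```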

【30】 Robust Hierarchical Modeling of Counts under Zero-inflation and Outliers 标题:零膨胀和异常值下计数的鲁棒分层建模

作者:Yasuyuki Hamura,Kaoru Irie,Shonosuke Sugasawa 机构:Graduate School of Economics, The University of Tokyo, Center for Spatial Information Science, The University of Tokyo 备注:41 pages 链接:https://arxiv.org/abs/2106.10503 摘要:在许多科学应用中，带有零膨胀和大异常值的计数数据无处不在。然而，在标准统计模型(如泊松分布或负二项分布)下的后验分析对此类污染非常敏感。本文介绍了一种新的计数贝叶斯建模框架，该框架对零膨胀和大异常值都具有鲁棒性。在此过程中，我们引入了重定标贝塔分布，并采用它来吸收零计数和外围计数的不良影响。该方法具有两个显著的特点：一是通过自定义Gibbs采样算法实现后验计算的效率，二是理论上的后验鲁棒性，即后验分布中的极端异常值被自动去除。通过仿真和实际数据验证了该方法的有效性。 摘要:Count data with zero inflation and large outliers are ubiquitous in many scientific applications. However, the posterior analysis under a standard statistical model such as Poisson or negative binomial distribution is sensitive to such contamination. This paper introduces a novel framework for Bayesian modeling of counts robust to both zero inflation and large outliers. In doing so, we introduce the rescaled beta distribution and adopt it to absorb undesirable effects from zero and outlying counts. The proposed approach has two appealing features: the efficiency of the posterior computation via a custom Gibbs sampling algorithm, and the theoretical posterior robustness, where the extreme outliers are automatically removed from the posterior distribution. We demonstrate the usefulness of the proposed method through simulation and real data applications.

【31】 The Tangent Exponential Model 标题:正切指数模型

作者:Anthony C. Davison,Nancy Reid 机构: Department of Statistical Sciences, University of Toronto, 700 University Ave 链接:https://arxiv.org/abs/2106.10496 摘要:似然函数是参数统计推断的频数和贝叶斯公式的核心，估计量和检验统计量的抽样分布以及后验密度的大样本近似在实践中得到广泛应用。改进的近似方法已经得到广泛的研究，当样本很小或有许多干扰参数时，它可以提供高精度的推断。本文回顾了D. A. S. Fraser及其同事在一系列文章中提出的基于切线指数模型的改进近似法，试图解释该模型的理论基础，并为相关文献提供指导，包括部分注释书目。 摘要:The likelihood function is central to both frequentist and Bayesian formulations of parametric statistical inference, and large-sample approximations to the sampling distributions of estimators and test statistics, and to posterior densities, are widely used in practice. Improved approximations have been widely studied and can provide highly accurate inferences when samples are small or there are many nuisance parameters. This article reviews improved approximations based on the tangent exponential model developed in a series of articles by D. A. S. Fraser and co-workers, attempting to explain the theoretical basis of this model and to provide a guide to the associated literature, including a partially-annotated bibliography.

【32】 Generalized Spatial and Spatiotemporal ARCH Models 标题:广义空间和时空ARCH模型

作者:Philipp Otto,Wolfgang Schmid 机构:Leibniz University Hannover, Germany, European University Viadrina, Frankfurt (Oder), Germany 链接:https://arxiv.org/abs/2106.10477 摘要:在时间序列分析中，特别是在金融领域，广义自回归条件异方差(GARCH)模型是广泛应用的统计工具，用于对波动率簇(即风险增加或减少的时期)进行建模。相比之下，对条件二阶矩中的空间相关性进行建模，至今仍未被视为关键问题。只有少数模型被提出用于模拟风险增加的局部集群。在本文中，我们在一个统一的时空GARCH框架中引入了一个新的空间GARCH过程，它也涵盖了所有先前提出的空间ARCH模型、指数空间GARCH模型和时间序列GARCH模型。与以前的时空和时间序列模型不同，这种空间GARCH模型允许所有空间单元之间的瞬时溢出效应。对于这种通用的建模框架，估计量是基于非线性最小二乘法推导出来的。最后，通过Monte Carlo模拟研究和一个以1995年至2014年柏林邮政编码地区房地产价格为重点的实证案例，证明了该模型的使用。将空间自回归模型应用于数据以说明如何通过空间GARCH型模型捕捉局部变化的模型不确定性(例如，由潜在回归变量引起)。 摘要:In time-series analyses, particularly for finance, generalized autoregressive conditional heteroscedasticity (GARCH) models are widely applied statistical tools for modelling volatility clusters (i.e., periods of increased or decreased risk). In contrast, it has not been considered to be of critical importance until now to model spatial dependence in the conditional second moments. Only a few models have been proposed for modelling local clusters of increased risks. In this paper, we introduce a novel spatial GARCH process in a unified spatial and spatiotemporal GARCH framework, which also covers all previously proposed spatial ARCH models, exponential spatial GARCH, and time-series GARCH models. In contrast to previous spatiotemporal and time series models, this spatial GARCH allows for instantaneous spill-overs across all spatial units. For this common modelling framework, estimators are derived based on a non-linear least-squares approach. Eventually, the use of the model is demonstrated by a Monte Carlo simulation study and by an empirical example that focuses on real estate prices from 1995 to 2014 across the ZIP-Code areas of Berlin. A spatial autoregressive model is applied to the data to illustrate how locally varying model uncertainties (e.g., due to latent regressors) can be captured by the spatial GARCH-type models.
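下面是一个极简的Python模拟示意（并非作者的模型本身：为避开瞬时溢出带来的联立性，这里采用带时间滞后的时空ARCH型递推；格点大小与参数均为随意假设）：

```python
import numpy as np

rng = np.random.default_rng(0)

# Rook-neighbour, row-standardized spatial weight matrix W on an m x m lattice.
m = 10
n = m * m
W = np.zeros((n, n))
for i in range(m):
    for j in range(m):
        k = i * m + j
        for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            if 0 <= i + di < m and 0 <= j + dj < m:
                W[k, (i + di) * m + (j + dj)] = 1.0
W /= W.sum(axis=1, keepdims=True)

# Spatiotemporal ARCH-type recursion with a temporal lag:
# h_t = alpha + rho * W @ y_{t-1}^2,  y_t = sqrt(h_t) * eps_t.
alpha, rho, T = 0.1, 0.8, 200
y = np.zeros(n)
vols = []
for t in range(T):
    h = alpha + rho * W @ y**2
    y = np.sqrt(h) * rng.normal(size=n)
    vols.append(h)
print("mean conditional variance:", np.mean(vols[-50:]))  # approx alpha/(1-rho)
```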

【33】 Discussion on Competition for Spatial Statistics for Large Datasets 标题:关于"大数据集空间统计竞赛"的讨论

作者:Roman Flury,Reinhard Furrer 机构: Dept. of Mathematics, University of Zurich, Dept. of Computational Science, University of Zurich 备注:5 pages, 1 figure 链接:https://arxiv.org/abs/2106.10462 摘要:我们讨论了AppStatUZH团队参加"大数据集空间统计竞赛"的经验与结果，该竞赛对不同的空间近似方法进行了全面且无偏的比较。在每个子竞赛中，我们基于似然函数估计协方差模型的参数，并用简单克里格法预测缺失观测值。我们用协方差锥化(covariance tapering)或紧支撑的Wendland协方差函数来近似协方差模型。 摘要:We discuss the experiences and results of the AppStatUZH team's participation in the comprehensive and unbiased comparison of different spatial approximations conducted in the Competition for Spatial Statistics for Large Datasets. In each of the different sub-competitions, we estimated parameters of the covariance model based on a likelihood function and predicted missing observations with simple kriging. We approximated the covariance model either with covariance tapering or a compactly supported Wendland covariance function.

【34】 Deep Learning for Functional Data Analysis with Adaptive Basis Layers 标题:基于自适应基本层的函数数据分析深度学习

作者:Junwen Yao,Jonas Mueller,Jane-Ling Wang 备注:ICML 2021 链接:https://arxiv.org/abs/2106.10414 摘要:尽管深部神经网络已经取得了广泛的成功，但其在功能性数据中的应用仍然很少。函数数据的无限维性意味着标准的学习算法只有在适当的降维后才能应用，通常通过基展开来实现。目前，这些基础是事先选择的，没有手头任务的信息，因此可能对指定的任务无效。相反，我们建议以端到端的方式自适应地学习这些基础。我们介绍了一种新的神经网络，它采用一个新的基层，其隐单元是每个基函数本身，作为一个微神经网络来实现。我们的架构学习对功能输入应用简约降维，只关注与目标相关的信息，而不是输入函数中不相关的变化。在众多的函数数据分类/回归任务中，我们的方法在经验上优于其他类型的神经网络，并且我们证明了我们的方法在统计上与低泛化误差是一致的。代码见：https://github.com/jwyyy/AdaFNN 摘要:Despite their widespread success, the application of deep neural networks to functional data remains scarce today. The infinite dimensionality of functional data means standard learning algorithms can be applied only after appropriate dimension reduction, typically achieved via basis expansions. Currently, these bases are chosen a priori without the information for the task at hand and thus may not be effective for the designated task. We instead propose to adaptively learn these bases in an end-to-end fashion. We introduce neural networks that employ a new Basis Layer whose hidden units are each basis functions themselves implemented as a micro neural network. Our architecture learns to apply parsimonious dimension reduction to functional inputs that focuses only on information relevant to the target rather than irrelevant variation in the input function. Across numerous classification/regression tasks with functional data, our method empirically outperforms other types of neural networks, and we prove that our approach is statistically consistent with low generalization error. Code is available at: https://github.com/jwyyy/AdaFNN
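下面用一个极简的numpy前向示意说明"基底层"的思想（非作者代码，作者实现见上面的GitHub链接；这里微型网络的权重是随机的，实际中它们与下游任务端到端联合训练）：

```python
import numpy as np

rng = np.random.default_rng(0)

def micro_net(t, W1, b1, w2):
    """A tiny MLP b(t): one hidden tanh layer, scalar in, scalar out."""
    return np.tanh(np.outer(t, W1) + b1) @ w2

# Assume each functional input x_i is observed on a shared grid t_grid.
t_grid = np.linspace(0.0, 1.0, 50)
X = np.sin(2 * np.pi * rng.uniform(0.5, 2.0, size=(32, 1)) * t_grid)  # 32 curves

n_basis, hidden = 4, 16
params = [(rng.normal(size=hidden), rng.normal(size=hidden),
           rng.normal(size=hidden) / hidden) for _ in range(n_basis)]

# Basis layer: score c_{i,k} = integral of x_i(t) * b_k(t) dt, on the grid.
dt = t_grid[1] - t_grid[0]
scores = np.stack([(X * micro_net(t_grid, *p)).sum(axis=1) * dt
                   for p in params], axis=1)
print(scores.shape)  # (32, 4): low-dimensional features for a downstream head
```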

【35】 On the bimodal Gumbel model with application to environmental data 标题:双峰Gumbel模型及其在环境数据中的应用

作者:Cira E. G. Otiniano,Roberto Vila,Pedro C. Brom,Marcelo Bourguignon 机构:Departamento de Estat´ıstica, Universidade de Bras´ılia,-, Bras´ılia, Brazil, Departamento de Estat´ıstica, Universidade Federal do Rio Grande do Norte,-, NatalRN, Brazil 备注:23 pages, 17 figures 链接:https://arxiv.org/abs/2106.10398 摘要:Gumbel模型是一种非常流行的统计模型,因为它具有广泛的适用性,例如在某些生存、环境、金融或可靠性研究过程中。在这项工作中,我们引入了Gumbel分布的双峰推广,它可以替代双峰数据模型。我们推导了相应的概率密度函数和危险率函数的解析形式,并提供了图示。此外,我们还讨论了这种密度的模、双峰性、矩母函数和矩等性质。我们的结果用马尔可夫链蒙特卡罗模拟方法进行了验证。参数估计采用极大似然法。最后,我们还对实际数据进行了应用,证明了该分布的有效性。 摘要:The Gumbel model is a very popular statistical model due to its wide applicability for instance in the course of certain survival, environmental, financial or reliability studies. In this work, we have introduced a bimodal generalization of the Gumbel distribution that can be an alternative to model bimodal data. We derive the analytical shapes of the corresponding probability density function and the hazard rate function and provide graphical illustrations. Furthermore, We have discussed the properties of this density such as mode, bimodality, moment generating function and moments. Our results were verified using the Markov chain Monte Carlo simulation method. The maximum likelihood method is used for parameters estimation. Finally, we also carry out an application to real data that demonstrates the usefulness of the proposed distribution.

【36】 Learning the Preferences of Uncertain Humans with Inverse Decision Theory 标题:用逆决策理论学习不确定人的偏好

作者:Cassidy Laidlaw,Stuart Russell 机构:University of California, Berkeley 链接:https://arxiv.org/abs/2106.10394 摘要:现有的用于学习人类偏好的观察方法，如逆强化学习，通常对人类环境的可观察性做出强有力的假设。然而，在现实中，人们在不确定的情况下做出许多重要的决策。为了更好地理解这些情况下的偏好学习，我们研究了逆决策理论(IDT)的设定，IDT是一个先前提出的框架，在这个框架中，观察到一个人在不确定的情况下做出非序贯的二元决策。在IDT中，人们的偏好是通过损失函数来传递的，损失函数表示不同类型错误之间的权衡。我们给出了IDT的第一个统计分析，提供了确定这些偏好所需的条件，并刻画了样本复杂度——即为了以期望的精度了解人类所做的权衡而必须观察的决策数量。有趣的是，我们发现当决策问题更不确定时，识别偏好实际上更容易。此外，不确定决策问题允许我们放松不现实的假设，即人是一个最优决策者，但仍然确定他们的确切偏好；我们也给出了这种次优情况下的样本复杂度。我们的分析与直觉相矛盾，即部分可观测性会使偏好学习变得更加困难。它还为理解和改进不确定和次优人类的偏好学习方法提供了第一步。 摘要:Existing observational approaches for learning human preferences, such as inverse reinforcement learning, usually make strong assumptions about the observability of the human's environment. However, in reality, people make many important decisions under uncertainty. To better understand preference learning in these cases, we study the setting of inverse decision theory (IDT), a previously proposed framework where a human is observed making non-sequential binary decisions under uncertainty. In IDT, the human's preferences are conveyed through their loss function, which expresses a tradeoff between different types of mistakes. We give the first statistical analysis of IDT, providing conditions necessary to identify these preferences and characterizing the sample complexity -- the number of decisions that must be observed to learn the tradeoff the human is making to a desired precision. Interestingly, we show that it is actually easier to identify preferences when the decision problem is more uncertain. Furthermore, uncertain decision problems allow us to relax the unrealistic assumption that the human is an optimal decision maker but still identify their exact preferences; we give sample complexities in this suboptimal case as well. Our analysis contradicts the intuition that partial observability should make preference learning more difficult. It also provides a first step towards understanding and improving preference learning methods for uncertain and suboptimal humans.

【37】 Systemic Infinitesimal Over-dispersion on General Stochastic Graphical Models 标题:一般随机图模型上的系统性无穷小过离散

作者:Ning Ning,Edward L. Ionides 机构:Department of Statistics, University of Michigan, Ann Arbor 备注:47 pages 链接:https://arxiv.org/abs/2106.10387 摘要:相互作用种群的随机模型在流行病学和生态学等科学领域具有重要作用，然而将常微分方程模型扩展到马尔可夫链的标准方法在均值-方差关系方面没有足够的灵活性来匹配数据(例如 [bjornstad2001noisy])。先前由[breto2011compound]提出的关于单条箭头(边)上时间齐次动力学的理论表明，伽马白噪声可以用来构造某些过离散马尔可夫链，从而导致广泛使用的模型(例如 [breto2009time, he2010plug])。本文定义了系统性无穷小过离散，发展了一般时间非齐次随机图模型的理论和方法。我们的方法基于Dirichlet噪声，得到了一类新的一般有向图上的Markov模型。它与现代基于似然的推断方法(例如 [ionides2006inference, ionides2015inference, king2008inapparent])兼容，因此我们可以评估新模型与数据的拟合程度。我们在一个被广泛分析的麻疹数据集上演示了我们的方法，在一个经典的SEIR(易感-暴露-感染-恢复)模型中加入Dirichlet噪声。我们发现，所提出的方法比伽马白噪声方法具有更高的对数似然，由此产生的参数估计为这种生物系统的过离散提供了新的见解。 摘要:Stochastic models of interacting populations have crucial roles in scientific fields such as epidemiology and ecology, yet the standard approach to extending an ordinary differential equation model to a Markov chain does not have sufficient flexibility in the mean-variance relationship to match data (e.g. [bjornstad2001noisy]). A previous theory on time-homogeneous dynamics over a single arrow by [breto2011compound] showed how gamma white noise could be used to construct certain over-dispersed Markov chains, leading to widely used models (e.g. [breto2009time, he2010plug]). In this paper, we define systemic infinitesimal over-dispersion, developing theory and methodology for general time-inhomogeneous stochastic graphical models. Our approach, based on Dirichlet noise, leads to a new class of Markov models over general directed graphs. It is compatible with modern likelihood-based inference methodologies (e.g. [ionides2006inference, ionides2015inference, king2008inapparent]) and therefore we can assess how well the new models fit data. We demonstrate our methodology on a widely analyzed measles dataset, adding Dirichlet noise to a classical SEIR (Susceptible-Exposed-Infected-Recovered) model. We find that the proposed methodology has higher log-likelihood than the gamma white noise approach, and the resulting parameter estimations provide new insights into the over-dispersion of this biological system.

【38】 Scalable Bayesian change point detection with spike and slab priors 标题:基于尖峰-平板(spike-and-slab)先验的可扩展贝叶斯变点检测

作者:Lorenzo Cappello,Oscar Hernan Madrid Padilla,Julia A. Palacios 链接:https://arxiv.org/abs/2106.10383 摘要:我们研究利用尖峰-平板(spike-and-slab)先验来一致估计变点的个数及其位置。利用变量选择文献中的最新结果，我们证明了在离线多变点检测问题中，基于尖峰-平板先验的估计量可以达到最优定位率。基于该估计量，我们提出了一种贝叶斯变点检测方法，它是目前最快的贝叶斯方法之一，并且对误差项的错误设定比竞争方法更稳健。我们通过实证研究展示了该方法相对于一些最先进基准的良好性能。 摘要:We study the use of spike and slab priors for consistent estimation of the number of change points and their locations. Leveraging recent results in the variable selection literature, we show that an estimator based on spike and slab priors achieves optimal localization rate in the multiple offline change point detection problem. Based on this estimator, we propose a Bayesian change point detection method, which is one of the fastest Bayesian methodologies, and it is more robust to misspecification of the error terms than the competing methods. We demonstrate through empirical work the good performance of our approach vis-a-vis some state-of-the-art benchmarks.
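作为背景，下面用LaTeX给出尖峰-平板先验在变点场景下的一种常见通用写法（示意性形式，论文中具体的先验层次可能不同）：对信号增量$\theta_t=\mu_t-\mu_{t-1}$，

```latex
\theta_t \;\sim\; (1-w)\,\delta_0 \;+\; w\,\mathcal{N}\!\left(0,\,\sigma_{\mathrm{slab}}^2\right),
\qquad t = 2,\dots,T,
% 后验中 \theta_t \neq 0 的时刻即被识别为变点，变点个数由混合权重 w 控制。
```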

【39】 On the benefits of maximum likelihood estimation for Regression and Forecasting 标题:论极大似然估计在回归预测中的效益

作者:Pranjal Awasthi,Abhimanyu Das,Rajat Sen,Ananda Theertha Suresh 机构:Google Research 链接:https://arxiv.org/abs/2106.10370 摘要:我们主张用一种实用的最大似然估计(MLE)方法进行回归和预测，作为对特定目标度量的典型经验风险最小化(ERM)方法的替代方法。这种方法更适合于捕获归纳偏差，例如数据集中的先验领域知识，并且可以在推理时输出事后估计量，从而优化不同类型的目标度量。我们给出的理论结果表明，在某些一般条件下，我们的方法总能与针对目标度量的任何估计量相竞争，并且在许多实际情况下(如Poisson回归)实际上可以比ERM优越得多。我们通过实验证明，用一个精心设计的通用混合似然族来实例化我们的方法，可以在具有不同数据分布的时间序列预测和回归数据集的多种任务上获得优于ERM的性能。 摘要:We advocate for a practical Maximum Likelihood Estimation (MLE) approach for regression and forecasting, as an alternative to the typical approach of Empirical Risk Minimization (ERM) for a specific target metric. This approach is better suited to capture inductive biases such as prior domain knowledge in datasets, and can output post-hoc estimators at inference time that can optimize different types of target metrics. We present theoretical results to demonstrate that our approach is always competitive with any estimator for the target metric under some general conditions, and in many practical settings (such as Poisson Regression) can actually be much superior to ERM. We demonstrate empirically that our method instantiated with a well-designed general purpose mixture likelihood family can obtain superior performance over ERM for a variety of tasks across time-series forecasting and regression datasets with different data distributions.
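下面用一个极简的Python示意说明"先做MLE，推理时再按目标度量输出事后估计量"的思路（玩具泊松回归，非作者实现；均方误差的最优预测是条件均值，绝对误差的最优预测是条件中位数）：

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)

# Synthetic Poisson-regression data: y ~ Poisson(exp(X beta)).
n, d = 2000, 3
X = rng.normal(size=(n, d))
beta_true = np.array([0.5, -0.3, 0.2])
y = rng.poisson(np.exp(X @ beta_true))

# Fit by maximum likelihood (gradient ascent on the Poisson log-likelihood).
beta = np.zeros(d)
for _ in range(500):
    mu = np.exp(X @ beta)
    beta += 0.1 * X.T @ (y - mu) / n   # score of the Poisson log-likelihood

# Post-hoc estimators from the single fitted model, chosen per target metric:
mu_hat = np.exp(X @ beta)
pred_mse = mu_hat                      # the mean minimizes squared error
pred_mae = poisson.ppf(0.5, mu_hat)    # the median minimizes absolute error
print("fraction of points where the two predictions differ:",
      np.mean(pred_mse.round() != pred_mae))
```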

【40】 Bayesian decision theory for tree-based adaptive screening tests with an application to youth delinquency 标题:树形自适应筛查的贝叶斯决策理论及其在青少年犯罪中的应用

作者:Chelsea Krantsevich,P. Richard Hahn,Yi Zheng,Charles Katz 备注:20 pages, 11 figures 链接:https://arxiv.org/abs/2106.10364 摘要:以早期干预为基础的预防犯罪战略有赖于准确的风险评估工具来确定高风险青年。在这方面，评估量表必须便于施测，这尤其意味着量表必须相当简短；自适应筛查测验在这方面很有用。尽管项目反应理论(IRT)在产生可靠的自适应测验方面有着悠久而丰富的历史，但利用分类树和回归树构建的自适应测验正在成为替代传统IRT方法进行项目选择的一种流行方法。好处是，与IRT不同，基于树的问卷在施测期间不需要实时参数估计。坏处是，虽然项目反应理论为终止测验提供了可靠的标准，但基于树的自适应测验的停止准则(最大树深度)尚不明确。我们提出了一种贝叶斯决策理论方法，来刻画施测不同长度的基于树的问卷之间的权衡。这一形式化框架包括：(1)一个衡量评估优劣的效用函数；(2)一个应使其效用最大化的目标群体；(3)一个由不同长度的评估组成的动作空间，通过树拟合算法填充。利用这个框架，我们为缩短测验所涉及的权衡提供了不确定性估计，使从业者能够有原则地确定最优测验长度。该方法通过洪都拉斯青少年犯罪风险评估的应用进行了验证。 摘要:Crime prevention strategies based on early intervention depend on accurate risk assessment instruments for identifying high risk youth. It is important in this context that the instruments be convenient to administer, which means, in particular, that they must be reasonably brief; adaptive screening tests are useful for this purpose. Although item response theory (IRT) bears a long and rich history in producing reliable adaptive tests, adaptive tests constructed using classification and regression trees are becoming a popular alternative to the traditional IRT approach for item selection. On the upside, unlike IRT, tree-based questionnaires require no real-time parameter estimation during administration. On the downside, while item response theory provides robust criteria for terminating the exam, the stopping criterion for a tree-based adaptive test (the maximum tree depth) is unclear. We present a Bayesian decision theory approach for characterizing the trade-offs of administering tree-based questionnaires of different lengths. This formalism involves specifying 1) a utility function measuring the goodness of the assessment; 2) a target population over which this utility should be maximized; 3) an action space comprised of different-length assessments, populated via a tree-fitting algorithm. Using this framework, we provide uncertainty estimates for the trade-offs of shortening the exam, allowing practitioners to determine an optimal exam length in a principled way. The method is demonstrated through an application to youth delinquency risk assessment in Honduras.

【41】 Scalable Econometrics on Big Data -- The Logistic Regression on Spark 标题:大数据上的可扩展计量经济学--Spark上的Logistic回归

作者:Aurélien Ouattara,Matthieu Bulté,Wan-Ju Lin,Philipp Scholl,Benedikt Veit,Christos Ziakas,Florian Felice,Julien Virlogeux,George Dikos 机构:Luxembourg, Technical University of Munich 链接:https://arxiv.org/abs/2106.10341 摘要:超大数据集的可访问性越来越高，设计用于高效处理大量数据的计算工具正在迅速民主化。然而，传统的统计和计量工具在处理如此庞大的数据集时仍然缺乏流畅性。本文研究了大数据集上的计量经济学，特别是Spark上的logistic回归。我们回顾了Spark中用于拟合logistic回归的函数的健壮性，并介绍了我们在PySpark中开发的一个包，该包返回统计推断所必需的logistic回归的统计摘要。 摘要:Extra-large datasets are becoming increasingly accessible, and computing tools designed to handle huge amount of data efficiently are democratizing rapidly. However, conventional statistical and econometric tools are still lacking fluency when dealing with such large datasets. This paper dives into econometrics on big datasets, specifically focusing on the logistic regression on Spark. We review the robustness of the functions available in Spark to fit logistic regression and introduce a package that we developed in PySpark which returns the statistical summary of the logistic regression, necessary for statistical inference.
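作为背景，下面给出Spark自带logistic回归的标准PySpark用法示意（"data.csv"及其列名均为假设；注意内置summary只报告ROC等拟合指标，并不提供计量经济学推断所需的标准误和p值，这正是论文开发的包所要补足的）：

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("logit-demo").getOrCreate()

# Hypothetical CSV with a binary "label" column and two regressors x1, x2.
df = spark.read.csv("data.csv", header=True, inferSchema=True)
assembler = VectorAssembler(inputCols=["x1", "x2"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label",
                        regParam=0.0)   # no shrinkage: plain MLE logit
model = lr.fit(assembler.transform(df))

print(model.coefficients, model.intercept)
# The built-in training summary reports fit metrics such as the ROC curve,
# but not the standard errors / p-values econometricians need.
print(model.summary.areaUnderROC)
```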

【42】 Differentiable Particle Filtering without Modifying the Forward Pass 标题:无需修改前向传递的可微粒子滤波

作者:Adam Ścibior,Vaden Masrani,Frank Wood 机构:University of British Columbia, Inverted AI, Mila 备注:11 pages, 3 figures 链接:https://arxiv.org/abs/2106.10314 摘要:近年来，粒子滤波器被用作通过梯度下降进行端到端优化的系统中的组件。然而，粒子滤波中的重采样步骤是不可微的，这会使梯度产生偏差并干扰优化。为了解决这一问题，人们提出了几种可微的重采样变体，但这些方法都以显著且可能不受欢迎的方式改变了粒子滤波器的行为。在本文中，我们展示了如何仅通过修改反向传播中使用的消息，而不改变粒子滤波器的标准前向传递，来获得边缘似然梯度的无偏估计。该方法实现简单，计算开销小，不引入额外的超参数，并可推广到高阶导数。我们称之为停止梯度重采样，因为它可以很容易地通过使用停止梯度算子的自动微分库来实现，而不是显式地修改反向消息。 摘要:In recent years particle filters have being used as components in systems optimized end-to-end with gradient descent. However, the resampling step in a particle filter is not differentiable, which biases gradients and interferes with optimization. To remedy this problem, several differentiable variants of resampling have been proposed, all of which modify the behavior of the particle filter in significant and potentially undesirable ways. In this paper, we show how to obtain unbiased estimators of the gradient of the marginal likelihood by only modifying messages used in backpropagation, leaving the standard forward pass of a particle filter unchanged. Our method is simple to implement, has a low computational overhead, does not introduce additional hyperparameters, and extends to derivatives of higher orders. We call it stop-gradient resampling, since it can easily be implemented with automatic differentiation libraries using the stop-gradient operator instead of explicitly modifying the backward messages.
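下面是一个极简的PyTorch示意，展示按我们的理解"停止梯度重采样"的核心技巧：给重采样后的粒子乘上在前向传递中恒等于1、但在反向传播中携带权重梯度的比值 w/stop_grad(w)（仅为示意性草图，论文对一般情形推导了正确的反向消息）：

```python
import torch

def stop_gradient_resample(particles, log_w):
    """Multinomial resampling whose forward pass is the standard one,
    but which lets weight gradients flow by attaching the ratio
    w / stop_grad(w) to each surviving particle."""
    w = torch.softmax(log_w, dim=0)
    idx = torch.multinomial(w.detach(), num_samples=len(w), replacement=True)
    new_particles = particles[idx]
    # Equals 1 in the forward pass; carries d log w through backprop.
    grad_weights = w[idx] / w[idx].detach()
    return new_particles, grad_weights

# Tiny smoke test: values are unchanged, yet gradients reach theta.
theta = torch.tensor(0.3, requires_grad=True)
particles = torch.randn(8)
log_w = -(particles - theta) ** 2          # weights depend on theta
new_p, gw = stop_gradient_resample(particles, log_w)
loss = (gw * new_p).sum()                  # any downstream objective
loss.backward()
print(gw.detach(), theta.grad)
```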

【43】 Weighted Fractional Generalized Cumulative Past Entropy 标题:加权分数广义累积过去熵

作者:Suchandan Kayal 机构:Department of Mathematics, National Institute of Technology Rourkela, Rourkela, India 备注:23 pages, 8 figures 链接:https://arxiv.org/abs/2106.10312 摘要:本文引入有界支撑的非负绝对连续随机变量的加权分数广义累积过去熵。研究了所提出的加权分数测度的各种性质。推导了该测度的界与随机序。建立了该测度与左侧Riemann-Liouville分数阶积分之间的联系。进一步研究了比例反向风险率模型下的该测度。其次，基于经验分布函数，提出了加权分数广义累积过去熵的非参数估计。为便于说明，结合一个真实数据集考虑了多个示例。最后，研究了所提出的经验估计量的大样本性质。 摘要:In this paper, we introduce weighted fractional generalized cumulative past entropy of a nonnegative absolutely continuous random variable with bounded support. Various properties of the proposed weighted fractional measure are studied. Bounds and stochastic orderings are derived. A connection between the proposed measure and the left-sided Riemann-Liouville fractional integral is established. Further, the proposed measure is studied for the proportional reversed hazard rate models. Next, a nonparametric estimator of the weighted fractional generalized cumulative past entropy is suggested based on the empirical distribution function. Various examples with a real life data set are considered for the illustration purposes. Finally, large sample properties of the proposed empirical estimator are studied.

【44】 Boundary Graph Neural Networks for 3D Simulations 标题:用于三维仿真的边界图神经网络

作者:Andreas Mayr,Sebastian Lehner,Arno Mayrhofer,Christoph Kloss,Sepp Hochreiter,Johannes Brandstetter 机构:ELLIS Unit Linz, LIT AI Lab, Johannes Kepler University Linz, DCS Computing GmbH, Linz, Austria, Institute of Advanced Research in, Artificial Intelligence (IARAI), University of Amsterdam 链接:https://arxiv.org/abs/2106.11299 摘要:丰富的数据为机器学习在自然科学和工程领域提供了巨大的动力。然而,模拟物理过程的建模仍然很困难。这样做的一个关键问题是几何边界的正确处理。虽然三角化的几何边界在工程应用中非常常见,但是由于它们在尺寸和方向上的异质性,机器学习方法很难对它们进行建模。在这项工作中,我们引入了边界图神经网络(BGNNs),它可以动态地修改图的结构来处理边界条件。通过修改边、增加节点特征和动态插入虚拟节点来构造边界图结构。在工业机械标准件料斗和转鼓的复杂三维颗粒流过程中进行了试验。利用一种昂贵而复杂的离散元方法得到的精确模拟结果,对BGNNs的计算效率、颗粒流和混合熵的预测精度进行了评价。即使存在复杂的边界,BGNNs也能够在数十万个模拟时间步内准确地再现模拟不确定性中的三维颗粒流,最显著的是,颗粒完全停留在几何对象内,而无需使用手工制作的条件或限制。 摘要:The abundance of data has given machine learning huge momentum in natural sciences and engineering. However, the modeling of simulated physical processes remains difficult. A key problem in doing so is the correct handling of geometric boundaries. While triangularized geometric boundaries are very common in engineering applications, they are notoriously difficult to model by machine learning approaches due to their heterogeneity with respect to size and orientation. In this work, we introduce Boundary Graph Neural Networks (BGNNs), which dynamically modify graph structures to address boundary conditions. Boundary graph structures are constructed via modifying edges, augmenting node features, and dynamically inserting virtual nodes. The new BGNNs are tested on complex 3D granular flow processes of hoppers and rotating drums which are standard parts of industrial machinery. Using precise simulations that are obtained by an expensive and complex discrete element method, BGNNs are evaluated in terms of computational efficiency as well as prediction accuracy of particle flows and mixing entropies. Even if complex boundaries are present, BGNNs are able to accurately reproduce 3D granular flows within simulation uncertainties over hundreds of thousands of simulation timesteps, and most notably particles completely stay within the geometric objects without using handcrafted conditions or restrictions.

【45】 A causal view on compositional data 标题:关于成分数据的一种因果观

作者:Elisabeth Ailer,Christian L. Müller,Niki Kilbertus 机构:Helmholtz AI, Munich, LMU & Helmholtz Zentrum Munich, Flatiron Institute, New York 备注:Code available on this https URL 链接:https://arxiv.org/abs/2106.11234 摘要:许多科学数据集本质上是成分型(compositional)的。重要的例子包括生态学中的物种丰富度、地质学中的岩石成分、大规模文本语料库中的主题成分以及分子生物学中的测序计数数据。在这里，我们在成分数据充当原因的工具变量设定下提供一个因果视角。自始至终，我们特别注意从干预的角度来解释成分性原因，并清晰地阐明从业者可能遇到的陷阱。将现代高维微生物组测序数据作为及时的说明性用例，我们的分析首先揭示了流行的一维信息论摘要统计量(如多样性和丰富度)可能不足以从生态数据中得出因果结论。相反，我们提倡使用统计数据变换和回归技术的多元替代方案，以顾及成分样本空间的特殊结构。通过对合成数据和半合成数据的比较分析，说明了该方法的优点和局限性。我们认为我们的框架可以为成分数据背景下的因果估计提供一个有用的起点。 摘要:Many scientific datasets are compositional in nature. Important examples include species abundances in ecology, rock compositions in geology, topic compositions in large-scale text corpora, and sequencing count data in molecular biology. Here, we provide a causal view on compositional data in an instrumental variable setting where the composition acts as the cause. Throughout, we pay particular attention to the interpretation of compositional causes from the viewpoint of interventions and crisply articulate potential pitfalls for practitioners. Focusing on modern high-dimensional microbiome sequencing data as a timely illustrative use case, our analysis first reveals that popular one-dimensional information-theoretic summary statistics, such as diversity and richness, may be insufficient for drawing causal conclusions from ecological data. Instead, we advocate for multivariate alternatives using statistical data transformations and regression techniques that take the special structure of the compositional sample space into account. In a comparative analysis on synthetic and semi-synthetic data we show the advantages and limitations of our proposal. We posit that our framework may provide a useful starting point for cause-effect estimation in the context of compositional data.

【46】 Corruption Robust Active Learning 标题:对标签损坏鲁棒的主动学习

作者:Yifang Chen,Simon S. Du,Kevin Jamieson 机构:Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA 链接:https://arxiv.org/abs/2106.11220 摘要:我们对未知对抗性标签损坏情况下基于流式(streaming)的二值分类主动学习进行了理论研究。在这种情况下，每次学习者观察样本之前，对手都会决定是否破坏标签。首先，我们证明，在良性损坏设定(包括作为特例的错误设定情形)中，随着假设消除阈值的略微增大，经典的RobustCAL框架可以(令人惊讶地)获得与非损坏设定中几乎相同的标签复杂度保证。但是，此算法在一般损坏设定中可能会失败。为了解决这个缺点，我们提出了一个新算法，它无需对损坏的存在作任何假设即可被证明是正确的。此外，该算法在未损坏设定中达到极小化极大(minimax)标签复杂度(与RobustCAL所达到的相同)，并且在损坏设定中只需要额外增加$\tilde{\mathcal{O}}(C_{\mathrm{total}})$个标签即可实现$\mathcal{O}(\varepsilon + \frac{C_{\mathrm{total}}}{n})$，其中$\varepsilon$是目标精度，$C_{\mathrm{total}}$是损坏总数，$n$是未标记样本总数。 摘要:We conduct theoretical studies on streaming-based active learning for binary classification under unknown adversarial label corruptions. In this setting, every time before the learner observes a sample, the adversary decides whether to corrupt the label or not. First, we show that, in a benign corruption setting (which includes the misspecification setting as a special case), with a slight enlargement on the hypothesis elimination threshold, the classical RobustCAL framework can (surprisingly) achieve nearly the same label complexity guarantee as in the non-corrupted setting. However, this algorithm can fail in the general corruption setting. To resolve this drawback, we propose a new algorithm which is provably correct without any assumptions on the presence of corruptions. Furthermore, this algorithm enjoys the minimax label complexity in the non-corrupted setting (which is achieved by RobustCAL) and only requires $\tilde{\mathcal{O}}(C_{\mathrm{total}})$ additional labels in the corrupted setting to achieve $\mathcal{O}(\varepsilon + \frac{C_{\mathrm{total}}}{n})$, where $\varepsilon$ is the target accuracy, $C_{\mathrm{total}}$ is the total number of corruptions and $n$ is the total number of unlabeled samples.

【47】 3D Shape Registration Using Spectral Graph Embedding and Probabilistic Matching 标题:基于谱图嵌入和概率匹配的三维形状配准

作者:Avinash Sharma,Radu Horaud,Diana Mateus 机构:Inria Grenoble Rhône-Alpes, avenue de l'Europe, Montbonnot Saint-Martin, France 链接:https://arxiv.org/abs/2106.11166 摘要:提出了一种基于谱图理论和概率匹配的三维形状配准方法。三维形状分析的任务包括跟踪、识别、配准等。考虑到不同采集设备采集到的数据具有很大的可变性，在单个框架中分析三维数据仍然是一项具有挑战性的任务。三维形状配准就是这样一项具有挑战性的形状分析任务。本章的主要贡献是将谱图匹配与拉普拉斯嵌入相结合，将谱图匹配方法推广到超大图。由于图的嵌入表示是通过降维得到的，我们认为现有的基于谱的方法不能直接套用。我们讨论了精确图同构和不精确图同构问题的解，回顾了组合图拉普拉斯算子的主要谱性质；我们对通勤时间(commute-time)嵌入给出了新的分析，使我们能够将其解释为图的PCA，并为相应的嵌入度量空间选择合适的维数；我们推导了通勤时间嵌入的单位超球面归一化，使我们能够配准两个采样不同的形状；我们提出了一种确定特征值-特征向量排序和特征向量符号的新方法，该方法利用对等距形状变形保持不变的特征签名(eigensignature, 直方图)，很适合谱图匹配框架；最后给出了基于期望最大化点配准算法的概率形状匹配公式，该算法在对齐特征基和寻找顶点到顶点对应之间交替进行。 摘要:We address the problem of 3D shape registration and we propose a novel technique based on spectral graph theory and probabilistic matching. The task of 3D shape analysis involves tracking, recognition, registration, etc. Analyzing 3D data in a single framework is still a challenging task considering the large variability of the data gathered with different acquisition devices. 3D shape registration is one such challenging shape analysis task. The main contribution of this chapter is to extend the spectral graph matching methods to very large graphs by combining spectral graph matching with Laplacian embedding. Since the embedded representation of a graph is obtained by dimensionality reduction we claim that the existing spectral-based methods are not easily applicable. We discuss solutions for the exact and inexact graph isomorphism problems and recall the main spectral properties of the combinatorial graph Laplacian; We provide a novel analysis of the commute-time embedding that allows us to interpret the latter in terms of the PCA of a graph, and to select the appropriate dimension of the associated embedded metric space; We derive a unit hyper-sphere normalization for the commute-time embedding that allows us to register two shapes with different samplings; We propose a novel method to find the eigenvalue-eigenvector ordering and the eigenvector signs using the eigensignature (histogram) which is invariant to the isometric shape deformations and fits well in the spectral graph matching framework, and we present a probabilistic shape matching formulation using an expectation maximization point registration algorithm which alternates between aligning the eigenbases and finding a vertex-to-vertex assignment.

【48】 On Testing Equal Conditional Predictive Ability Under Measurement Error 标题:关于测量误差下等条件预测能力的检验

作者:Yannick Hoga,Timo Dimitriadis 链接:https://arxiv.org/abs/2106.11104 摘要:损失函数被广泛用于比较几种相互竞争的预测。然而，预测比较通常基于对真实目标有测量误差的代理变量。我们引入了损失函数对测量误差精确鲁棒(exactly robust)的概念，并将这类损失函数完全刻画为Bregman类。对于这种精确鲁棒的损失函数，预测损失之差平均而言不受使用代理变量的影响，因此，关于条件预测能力的推断可以照常进行。此外，我们表明，更精确的代理变量使预测能力检验在区分相互竞争的预测时具有更高的功效。仿真结果说明了精确鲁棒和非鲁棒损失函数的不同行为。对美国GDP增长率的实证应用表明，如果使用更好的GDP增长代理变量，则更容易区分不同时期发布的预测。 摘要:Loss functions are widely used to compare several competing forecasts. However, forecast comparisons are often based on mismeasured proxy variables for the true target. We introduce the concept of exact robustness to measurement error for loss functions and fully characterize this class of loss functions as the Bregman class. For such exactly robust loss functions, forecast loss differences are on average unaffected by the use of proxy variables and, thus, inference on conditional predictive ability can be carried out as usual. Moreover, we show that more precise proxies give predictive ability tests higher power in discriminating between competing forecasts. Simulations illustrate the different behavior of exactly robust and non-robust loss functions. An empirical application to US GDP growth rates demonstrates that it is easier to discriminate between forecasts issued at different horizons if a better proxy for GDP growth is used.

【49】 Analytically Tractable Bayesian Deep Q-Learning 标题:分析易处理的贝叶斯深度Q-学习

作者:Luong Ha Nguyen,James-A. Goulet 机构:Department of Civil, Geologic and Mining Engineering, Polytechnique Montréal, CANADA 备注:19 pages, 4 figures 链接:https://arxiv.org/abs/2106.11086 摘要:强化学习(RL)自从使用深度Q-learning(DQN)证明它能够在视频游戏基准测试中达到人类的表现以来，受到了越来越多的关注。在这种复杂环境下训练神经网络的共识是依赖于基于梯度的优化。尽管存在替代的贝叶斯深度学习方法，但大多数方法仍然依赖于基于梯度的优化，并且它们通常不能扩展到Atari游戏环境这样的基准上。此外，这些方法都不允许对定义神经网络的权重和偏置进行解析推断。在这篇文章中，我们提出如何调整时序差分Q-learning框架，使之与可处理近似高斯推断(TAGI)兼容，后者允许使用封闭形式的解析方法来学习神经网络的参数。通过对同策略和异策略强化学习方法的实验，我们证明了在使用较少的超参数且不依赖基于梯度的优化的情况下，TAGI可以达到与反向传播训练网络相当的性能。 摘要:Reinforcement learning (RL) has gained increasing interest since the demonstration it was able to reach human performance on video game benchmarks using deep Q-learning (DQN). The current consensus for training neural networks on such complex environments is to rely on gradient-based optimization. Although alternative Bayesian deep learning methods exist, most of them still rely on gradient-based optimization, and they typically do not scale on benchmarks such as the Atari game environment. Moreover none of these approaches allow performing the analytical inference for the weights and biases defining the neural network. In this paper, we present how we can adapt the temporal difference Q-learning framework to make it compatible with the tractable approximate Gaussian inference (TAGI), which allows learning the parameters of a neural network using a closed-form analytical method. Throughout the experiments with on- and off-policy reinforcement learning approaches, we demonstrate that TAGI can reach a performance comparable to backpropagation-trained networks while using fewer hyperparameters, and without relying on gradient-based optimization.

【50】 Techniques for Symbol Grounding with SATNet 标题:使用SATNet实现符号接地的技术

作者:Sever Topan,David Rolnick,Xujie Si 机构:McGill University and NVIDIA, McGill University and Mila - Quebec AI Institute 备注:Code available at this https URL 链接:https://arxiv.org/abs/2106.11072 摘要:许多专家认为，人工智能的未来受到该领域将符号逻辑推理集成到深度学习体系结构的能力的限制。最近提出的可微MAXSAT求解器SATNet，在其与传统神经网络集成和解决视觉推理问题的能力上是一个突破。例如，它可以纯粹从图像示例中学习数独的规则。尽管SATNet取得了成功，但它还是受制于神经符号系统中一个被称为符号接地问题(Symbol Grounding Problem)的关键挑战：在没有明确监督的情况下，无法将视觉输入映射到符号变量("标签泄漏")。在这项工作中，我们提出了一个自监督的预训练流程，使SATNet能够克服这一限制，从而将SATNet体系结构可以解决的问题类别拓宽到完全没有中间标签的数据集。我们证明，即使在杜绝任何标签泄漏的更困难的问题设定下，我们的方法也能使SATNet达到完全准确。此外，我们还介绍了一种校对(proofreading)方法，进一步提高了SATNet体系结构的性能，在视觉数独(Visual Sudoku)上超越了最先进水平。 摘要:Many experts argue that the future of artificial intelligence is limited by the field's ability to integrate symbolic logical reasoning into deep learning architectures. The recently proposed differentiable MAXSAT solver, SATNet, was a breakthrough in its capacity to integrate with a traditional neural network and solve visual reasoning problems. For instance, it can learn the rules of Sudoku purely from image examples. Despite its success, SATNet was shown to succumb to a key challenge in neurosymbolic systems known as the Symbol Grounding Problem: the inability to map visual inputs to symbolic variables without explicit supervision ("label leakage"). In this work, we present a self-supervised pre-training pipeline that enables SATNet to overcome this limitation, thus broadening the class of problems that SATNet architectures can solve to include datasets where no intermediary labels are available at all. We demonstrate that our method allows SATNet to attain full accuracy even with a harder problem setup that prevents any label leakage. We additionally introduce a proofreading method that further improves the performance of SATNet architectures, beating the state-of-the-art on Visual Sudoku.

【51】 GRAND: Graph Neural Diffusion 标题:GRAND:图的神经扩散

作者:Benjamin Paul Chamberlain,James Rowbottom,Maria Gorinova,Stefan Webb,Emanuele Rossi,Michael M. Bronstein 备注:15 pages, 4 figures. Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s) 链接:https://arxiv.org/abs/2106.10934 摘要:我们提出了一种图神经扩散(GRAND)方法,它将图的深度学习看作是一个连续的扩散过程,并将图神经网络(GNNs)看作是基本偏微分方程的离散化。在我们的模型中,层结构和拓扑对应于时间和空间算子的离散化选择。我们的方法允许有原则地开发一类广泛的新GNN,这些GNN能够解决图形学习模型的常见问题,如深度、过度平滑和瓶颈。我们的模型成功的关键是关于数据扰动的稳定性,这是针对隐式和显式离散格式的。我们开发了GRAND的线性和非线性版本,在许多标准图形基准上都取得了有竞争力的结果。 摘要:We present Graph Neural Diffusion (GRAND) that approaches deep learning on graphs as a continuous diffusion process and treats Graph Neural Networks (GNNs) as discretisations of an underlying PDE. In our model, the layer structure and topology correspond to the discretisation choices of temporal and spatial operators. Our approach allows a principled development of a broad new class of GNNs that are able to address the common plights of graph learning models such as depth, oversmoothing, and bottlenecks. Key to the success of our models are stability with respect to perturbations in the data and this is addressed for both implicit and explicit discretisation schemes. We develop linear and nonlinear versions of GRAND, which achieve competitive results on many standard graph benchmarks.
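下面用一个极简的numpy示意给出GRAND式扩散的显式欧拉离散$x \leftarrow x + \tau(A(x)x - x)$（注意：这里的注意力矩阵是稠密且无参数的玩具版本，真实模型中$A$由可学习的注意力给出并通常限制在图的边上）：

```python
import numpy as np

def attention_matrix(x, scale=1.0):
    """Row-stochastic attention A(x) from scaled dot products of node features."""
    logits = scale * x @ x.T
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    A = np.exp(logits)
    return A / A.sum(axis=1, keepdims=True)

def grand_euler(x0, tau=0.1, steps=50):
    """Explicit-Euler discretization of dx/dt = (A(x) - I) x."""
    x = x0.copy()
    for _ in range(steps):
        A = attention_matrix(x)
        x = x + tau * (A @ x - x)
    return x

x0 = np.random.default_rng(0).normal(size=(20, 4))  # 20 nodes, 4 features
x_T = grand_euler(x0)
print(np.linalg.norm(x_T - x_T.mean(axis=0)))  # diffusion smooths features
```

在这种观点下，层数与步长$\tau$对应时间离散化的选择，这正是摘要中"层结构和拓扑对应于时间和空间算子的离散化选择"的含义。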

【52】 Bayesian inference of ODEs with Gaussian processes 标题:具有高斯过程的常微分方程的贝叶斯推断

作者:Pashupati Hegde,Çağatay Yıldız,Harri Lähdesmäki,Samuel Kaski,Markus Heinonen 机构:Department of Computer Science, Aalto University, Finland 链接:https://arxiv.org/abs/2106.10905 摘要:机器学习的最新进展提出了直接从数据中估计未知连续时间系统动力学的黑盒方法。然而，早期的工作是基于近似常微分方程解或点估计。提出了一种新的贝叶斯非参数模型，利用高斯过程直接从数据中推断未知ODE系统的后验。我们推导了用解耦函数采样表示向量场后验的稀疏变分推理。我们还引入了一种概率打靶(shooting)增广，使得能从任意长的轨迹进行高效推断。该方法展示了计算向量场后验的好处，在多个ODE学习任务中，其预测不确定性得分优于其他方法。 摘要:Recent machine learning advances have proposed black-box estimation of unknown continuous-time system dynamics directly from data. However, earlier works are based on approximative ODE solutions or point estimates. We propose a novel Bayesian nonparametric model that uses Gaussian processes to infer posteriors of unknown ODE systems directly from data. We derive sparse variational inference with decoupled functional sampling to represent vector field posteriors. We also introduce a probabilistic shooting augmentation to enable efficient inference from arbitrarily long trajectories. The method demonstrates the benefit of computing vector field posteriors, with predictive uncertainty scores outperforming alternative methods on multiple ODE learning tasks.

【53】 Multiplying Matrices Without Multiplying 标题:不做乘法的矩阵乘法

作者:Davis Blalock,John Guttag 备注:To appear at ICML 2021 链接:https://arxiv.org/abs/2106.10860 摘要:矩阵乘法是机器学习中最基本的计算密集型操作之一。因此，在高效近似矩阵乘法方面已经做了大量的工作。我们介绍了一个大大优于现有方法的基于学习的算法。使用来自不同领域的数百个矩阵进行的实验表明，它通常比精确矩阵乘积快$100\times$，比当前的近似方法快$10\times$。在一个矩阵提前已知的常见情况下，我们的方法还有一个有趣的性质：它需要零次乘加运算。这些结果表明，散列、平均和字节洗牌的组合(我们方法的核心操作)，与最近成为大量研究和硬件投资焦点的稀疏化、因式分解和/或标量量化的矩阵乘积相比，可能是一个更有前景的机器学习构建块。 摘要:Multiplying matrices is among the most fundamental and compute-intensive operations in machine learning. Consequently, there has been significant work on efficiently approximating matrix multiplies. We introduce a learning-based algorithm for this task that greatly outperforms existing methods. Experiments using hundreds of matrices from diverse domains show that it often runs $100\times$ faster than exact matrix products and $10\times$ faster than current approximate methods. In the common case that one matrix is known ahead of time, our method also has the interesting property that it requires zero multiply-adds. These results suggest that a mixture of hashing, averaging, and byte shuffling (the core operations of our method) could be a more promising building block for machine learning than the sparsified, factorized, and/or scalar quantized matrix products that have recently been the focus of substantial research and hardware investment.
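下面用一个乘积量化(PQ)风格的numpy示意说明"编码+查表求和"的近似矩阵乘法思想（并非论文的算法本身：论文用学习到的哈希函数取代这里基于距离的最近原型编码，从而在查询阶段完全不做乘加）：

```python
import numpy as np

rng = np.random.default_rng(0)

def pq_matmul(A, B, n_sub=4, K=16):
    """Product-quantization-style approximate A @ B.
    Encode each row of A as one prototype id per subspace, precompute
    prototype @ B lookup tables, then answer with table sums only."""
    n, d = A.shape
    sub = d // n_sub
    out = np.zeros((n, B.shape[1]))
    for s in range(n_sub):
        cols = slice(s * sub, (s + 1) * sub)
        # Cheap "training": sample K prototypes from the rows themselves.
        protos = A[rng.choice(n, K, replace=False), cols]
        # Encode: nearest prototype per row (done once per matrix A).
        d2 = ((A[:, cols][:, None, :] - protos[None]) ** 2).sum(-1)
        codes = d2.argmin(axis=1)
        table = protos @ B[cols]           # K x m lookup table
        out += table[codes]                # only table lookups and adds here
    return out

A, B = rng.normal(size=(500, 64)), rng.normal(size=(64, 10))
err = np.linalg.norm(pq_matmul(A, B) - A @ B) / np.linalg.norm(A @ B)
print(f"relative error: {err:.2f}")
```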

【54】 Compressing Deep ODE-Nets using Basis Function Expansions 标题:基于基函数展开的深层ODE-Net压缩

作者:Alejandro Queiruga,N. Benjamin Erichson,Liam Hodgkinson,Michael W. Mahoney 机构:Google Research, ICSI and UC Berkeley 链接:https://arxiv.org/abs/2106.10820 摘要:最近引入的常微分方程网络(ODE-Nets)在深度学习和动力系统之间建立了卓有成效的联系。在这项工作中,我们重新考虑公式的权重连续深度函数使用线性组合的基础函数。这种观点允许我们通过改变基础来压缩重量,而无需再训练,同时保持接近最先进的性能。反过来,推理时间和内存占用都减少了,使得计算环境之间能够快速而严格地适应。此外,我们的框架使用函数投影实现有意义的连续时间批标准化层。通过将连续深度模型应用于(a)使用卷积单元的图像分类任务和(b)使用Transformer编码器单元的句子标注任务,证明了基函数压缩的性能。 摘要:The recently-introduced class of ordinary differential equation networks (ODE-Nets) establishes a fruitful connection between deep learning and dynamical systems. In this work, we reconsider formulations of the weights as continuous-depth functions using linear combinations of basis functions. This perspective allows us to compress the weights through a change of basis, without retraining, while maintaining near state-of-the-art performance. In turn, both inference time and the memory footprint are reduced, enabling quick and rigorous adaptation between computational environments. Furthermore, our framework enables meaningful continuous-in-time batch normalization layers using function projections. The performance of basis function compression is demonstrated by applying continuous-depth models to (a) image classification tasks using convolutional units and (b) sentence-tagging tasks using transformer encoder units.
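下面用一个极简的numpy示意说明"把随深度变化的权重投影到少量基函数上(无需重训练)"的压缩思想（标量权重轨迹为虚构示例，基采用Legendre多项式）：

```python
import numpy as np
from numpy.polynomial import legendre

# Suppose an ODE-Net's scalar weight w(t) was trained on a fine time grid.
t = np.linspace(-1.0, 1.0, 200)
w_fine = 0.3 + 0.5 * np.tanh(3 * t) - 0.2 * t**2   # stand-in trajectory

# Compress: least-squares projection onto a small Legendre basis.
coef = legendre.legfit(t, w_fine, deg=5)           # 6 numbers instead of 200
w_compressed = legendre.legval(t, coef)

rel_err = np.linalg.norm(w_fine - w_compressed) / np.linalg.norm(w_fine)
print(f"{len(coef)} coefficients, relative error {rel_err:.1e}")
```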

【55】 Lossy Compression for Lossless Prediction 标题:有损压缩在无损预测中的应用

作者:Yann Dubois,Benjamin Bloem-Reddy,Karen Ullrich,Chris J. Maddison 机构:The University of British Columbia, Facebook AI Research, University of Toronto, Vector Institute 链接:https://arxiv.org/abs/2106.10800 摘要:大多数数据是自动收集的，只有算法才能"看到"。然而，数据压缩器保持的是感知保真度，而不仅仅是执行下游任务的算法所需的信息。在本文中，我们刻画了在一组变换(如数据增广)下保持不变的所有预测任务上确保高性能所需的比特率。基于我们的理论，我们设计了训练神经压缩器的无监督目标。利用这些目标，我们训练了一个通用图像压缩器，与8个数据集上的JPEG相比，它实现了显著的码率节省(在ImageNet上超过$1000\times$)，且不降低下游分类性能。 摘要:Most data is automatically collected and only ever "seen" by algorithms. Yet, data compressors preserve perceptual fidelity rather than just the information needed by algorithms performing downstream tasks. In this paper, we characterize the bit-rate required to ensure high performance on all predictive tasks that are invariant under a set of transformations, such as data augmentations. Based on our theory, we design unsupervised objectives for training neural compressors. Using these objectives, we train a generic image compressor that achieves substantial rate savings (more than $1000\times$ on ImageNet) compared to JPEG on 8 datasets, without decreasing downstream classification performance.

【56】 Neural Spectral Marked Point Processes 标题:神经频谱标记点过程

作者:Shixiang Zhu,Haoyun Wang,Xiuyuan Cheng,Yao Xie 链接:https://arxiv.org/abs/2106.10773 摘要:自激励和互激励点过程是机器学习和相关离散事件数据统计中常用的模型。到目前为止,大多数现有的模型都采用平稳核(包括经典的Hawkes过程)和简单的参数模型。具有复杂事件数据的现代应用程序需要更通用的点过程模型,该模型除了时间和位置信息之外,还可以包含事件的上下文信息(称为标记)。此外,此类应用通常需要非平稳模型来捕获更复杂的时空相关性。为了应对这些挑战,一个关键的问题是在点过程模型中设计一个通用的影响核。在本文中,我们介绍了一种新的、通用的基于神经网络的非平稳影响核,它具有很高的表达能力,可以处理复杂的离散事件数据,同时提供了理论上的性能保证。我们证明了我们提出的方法在合成和真实数据上的优越性能。 摘要:Self- and mutually-exciting point processes are popular models in machine learning and statistics for dependent discrete event data. To date, most existing models assume stationary kernels (including the classical Hawkes processes) and simple parametric models. Modern applications with complex event data require more general point process models that can incorporate contextual information of the events, called marks, besides the temporal and location information. Moreover, such applications often require non-stationary models to capture more complex spatio-temporal dependence. To tackle these challenges, a key question is to devise a versatile influence kernel in the point process model. In this paper, we introduce a novel and general neural network-based non-stationary influence kernel with high expressiveness for handling complex discrete events data while providing theoretical performance guarantees. We demonstrate the superior performance of our proposed method compared with the state-of-the-art on synthetic and real data.

【57】 Multirate Training of Neural Networks 标题:神经网络的多速率训练

作者:Tiffany Vlaar,Benedict Leimkuhler 机构:Department of Mathematics, University of Edinburgh 链接:https://arxiv.org/abs/2106.10771 摘要:我们提出了神经网络的多速率训练:将神经网络参数分为“快”和“慢”两部分,用不同的学习速率同时训练。通过选择适当的划分,我们可以获得大的计算速度为转移学习任务。我们证明,对于视觉和自然语言处理中的各种转移学习应用,我们可以在几乎一半的时间内对深度神经网络进行微调,而不会降低所得到模型的泛化性能。我们还讨论了在从头开始训练神经网络的情况下,有利于提高泛化性能的神经网络参数的其他分裂选择。最后,我们提出了一种额外的多速率技术,它可以通过在不同时间尺度上同时训练整个网络来学习数据中的不同特征。对于图像数据的ResNet体系结构,说明了使用这种方法的好处。本文揭示了利用多速率技术进行神经网络训练的潜力,并为今后这方面的工作提供了许多出发点。 摘要:We propose multirate training of neural networks: partitioning neural network parameters into "fast" and "slow" parts which are trained simultaneously using different learning rates. By choosing appropriate partitionings we can obtain large computational speed-ups for transfer learning tasks. We show that for various transfer learning applications in vision and NLP we can fine-tune deep neural networks in almost half the time, without reducing the generalization performance of the resulting model. We also discuss other splitting choices for the neural network parameters which are beneficial in enhancing generalization performance in settings where neural networks are trained from scratch. Finally, we propose an additional multirate technique which can learn different features present in the data by training the full network on different time scales simultaneously. The benefits of using this approach are illustrated for ResNet architectures on image data. Our paper unlocks the potential of using multirate techniques for neural network training and provides many starting points for future work in this area.
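多速率训练的基本机制可以用PyTorch的参数组(按组设置学习率)来示意（下面的"骨干+头部"划分与学习率均为假设；论文的方法在此之上还包括在不同时间尺度上同时训练整个网络等更多内容）：

```python
import torch
import torch.nn as nn

# A toy "backbone + head" split: for fine-tuning, the head is the natural
# "fast" part and the pretrained backbone the "slow" part.
model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),   # slow: pretrained features
    nn.Linear(64, 10),              # fast: task head
)
slow_params = list(model[0].parameters())
fast_params = list(model[2].parameters())

optimizer = torch.optim.SGD([
    {"params": fast_params, "lr": 1e-1},
    {"params": slow_params, "lr": 1e-3},  # 100x slower time scale
], momentum=0.9)

x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```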

【58】 Generalization in the Face of Adaptivity: A Bayesian Perspective 标题:适应性面前的泛化:贝叶斯视角

作者:Moshe Shenfeld,Katrina Ligett 链接:https://arxiv.org/abs/2106.10761 摘要:通过自适应选择的查询重复使用数据样本可能会迅速导致过度拟合,其中发出的查询在样本上生成的答案与那些查询在底层数据分布上的值相差很大。差分隐私提供了一种工具,以确保尽管自适应选择查询的泛化,但其最坏情况的性质意味着,例如,它不能为低方差查询产生更好的结果。在本文中,我们给出了一个简单的新特征,说明了自适应数据分析的核心问题。我们明确地表明,自适应性的危害来自于未来查询行为之间的协方差和基于Bayes因子的度量,即过去查询的响应中有多少数据样本的信息被编码。我们利用这种直觉引入一种新的稳定性概念;然后,我们用它来证明最基本的噪声添加机制(Laplace和Gaussian噪声添加)的新的泛化结果,并保证其与查询的方差而不是其范围的平方成比例。我们的特征为自适应数据分析中实现泛化的基本问题提供了新的见解和新的算法。 摘要:Repeated use of a data sample via adaptively chosen queries can rapidly lead to overfitting, wherein the issued queries yield answers on the sample that differ wildly from the values of those queries on the underlying data distribution. Differential privacy provides a tool to ensure generalization despite adaptively-chosen queries, but its worst-case nature means that it cannot, for example, yield improved results for low-variance queries. In this paper, we give a simple new characterization that illuminates the core problem of adaptive data analysis. We show explicitly that the harms of adaptivity come from the covariance between the behavior of future queries and a Bayes factor-based measure of how much information about the data sample was encoded in the responses given to past queries. We leverage this intuition to introduce a new stability notion; we then use it to prove new generalization results for the most basic noise-addition mechanisms (Laplace and Gaussian noise addition), with guarantees that scale with the variance of the queries rather than the square of their range. Our characterization opens the door to new insights and new algorithms for the fundamental problem of achieving generalization in adaptive data analysis.

【59】 Fundamental bounds on the precision of iSCAT, COBRI and dark-field microscopy for 3D localization and mass photometry 标题:用于三维定位和质量光度测量的iSCAT、COBRI和暗场显微镜精度的基本界限

作者:Jonathan Dong,Dante Maestre,Clara Conrad-Billroth,Thomas Juffmann 机构:Biomedical Imaging Group, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, University of Vienna, VCQ, A-, Vienna, Austria, University of Vienna, Max Perutz Laboratories, Department of Structural and Computational Biology, A-, Vienna, Austria 链接:https://arxiv.org/abs/2106.10758 摘要:干涉成像是一种新兴的粒子跟踪和质量光度测量技术。质量或位置是从微弱的信号估计出来的：这些信号由纳米颗粒或单个分子相干散射而来，并与共同传播的参考光发生干涉。在这项工作中，我们进行了统计分析，并从散粒噪声受限的图像推导了感兴趣参数测量精度的下界。这是通过使用干涉成像技术的精确矢量模型，计算用于定位和质量估计的经典Cramér-Rao界来实现的。然后，基于量子Cramér-Rao形式，我们导出了适用于任何成像系统的基本界限。这种方法可以对干涉散射显微镜(iSCAT)、相干亮场显微镜(COBRI)和暗场显微镜等常用技术进行严格和定量的比较。特别地，我们证明了iSCAT中的光收集几何结构大大增加了轴向位置灵敏度，并且用于质量估计的量子Cramér-Rao界给出$\sigma_m/m=1/(2\sqrt{N})$的最小相对估计误差，其中$N$是收集的散射光子的数量。 摘要:Interferometric imaging is an emerging technique for particle tracking and mass photometry. Mass or position are estimated from weak signals, coherently scattered from nanoparticles or single molecules, and interfered with a co-propagating reference. In this work, we perform a statistical analysis and derive lower bounds on the measurement precision of the parameters of interest from shot-noise limited images. This is done by computing the classical Cramér-Rao bound for localization and mass estimation, using a precise vectorial model of interferometric imaging techniques. We then derive fundamental bounds valid for any imaging system, based on the quantum Cramér-Rao formalism. This approach enables a rigorous and quantitative comparison of common techniques such as interferometric scattering microscopy (iSCAT), Coherent Brightfield microscopy (COBRI), and dark-field microscopy. In particular, we demonstrate that the light collection geometry in iSCAT greatly increases the axial position sensitivity, and that the quantum Cramér-Rao bound for mass estimation yields a minimum relative estimation error of $\sigma_m/m=1/(2\sqrt{N})$, where $N$ is the number of collected scattered photons.
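作为参考，下面给出标量情形下经典Cramér-Rao界的标准表述（通用结论，并非论文的矢量模型推导）：

```latex
% 对参数 \theta 的任意无偏估计 \hat{\theta}，基于 N 次独立观测：
\operatorname{Var}(\hat{\theta}) \;\ge\; \frac{1}{N\,I(\theta)},
\qquad
I(\theta) = \mathbb{E}\!\left[\big(\partial_\theta \log p(x;\theta)\big)^2\right],
% 因此估计误差按 1/\sqrt{N} 缩小，与上文 \sigma_m/m = 1/(2\sqrt{N}) 的标度一致。
```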

【60】 On the Cryptographic Hardness of Learning Single Periodic Neurons 标题:关于学习单个周期神经元的密码学困难性

作者:Min Jae Song,Ilias Zadik,Joan Bruna 机构:Courant Institute of Mathematical Sciences, New York University, New York, Center for Data Science, New York University, New York 备注:54 pages 链接:https://arxiv.org/abs/2106.10744 摘要:我们给出了一个简单的归约(reduction)，证明了在噪声存在的情况下，在各向同性高斯分布上学习单个周期神经元的密码学困难性。更确切地说，我们的归约表明，任何在小噪声下学习此类函数的多项式时间算法(不一定是基于梯度的)都意味着存在一个求解最坏情况格问题的多项式时间量子算法，而后者的困难性构成了基于格的密码学的基础。我们的核心困难函数族可由一层神经网络很好地逼近，其一般形式是把一元周期函数应用于数据的仿射投影。这些函数出现在以前的开创性工作中，那些工作证明了它们对基于梯度的算法(Shamir'18)和统计查询(SQ)算法(Song et al.'17)的困难性。我们证明，如果给标签加入(多项式)小的噪声，在上述密码学假设下，学习这些函数的难解性适用于所有多项式时间算法。此外，我们还通过设计一个在指数小的对抗性噪声下学习某些此类函数族的多项式时间算法，证明了在困难性结果中引入噪声的必要性。我们提出的算法不是基于梯度或SQ的算法，而是基于著名的Lenstra-Lenstra-Lovász (LLL)格基约简算法。此外，在没有噪声的情况下，该算法可直接用于解决CLWE检测(Bruna et al.'21)和相位恢复，最优样本复杂度为$d+1$个样本。在前一种情况下，这改进了(Bruna et al.'21)中所需要的关于$d$二次的样本复杂度。在后一种情况下，这改进了最先进的基于AMP(近似消息传递)的算法，该算法需要大约$1.128d$个样本(Barbier et al.'19)。 摘要:We show a simple reduction which demonstrates the cryptographic hardness of learning a single periodic neuron over isotropic Gaussian distributions in the presence of noise. More precisely, our reduction shows that any polynomial-time algorithm (not necessarily gradient-based) for learning such functions under small noise implies a polynomial-time quantum algorithm for solving worst-case lattice problems, whose hardness form the foundation of lattice-based cryptography. Our core hard family of functions, which are well-approximated by one-layer neural networks, take the general form of a univariate periodic function applied to an affine projection of the data. These functions have appeared in previous seminal works which demonstrate their hardness against gradient-based (Shamir'18), and Statistical Query (SQ) algorithms (Song et al.'17). We show that if (polynomially) small noise is added to the labels, the intractability of learning these functions applies to all polynomial-time algorithms under the aforementioned cryptographic assumptions. Moreover, we demonstrate the necessity of noise in the hardness result by designing a polynomial-time algorithm for learning certain families of such functions under exponentially small adversarial noise. Our proposed algorithm is not a gradient-based or an SQ algorithm, but is rather based on the celebrated Lenstra-Lenstra-Lovász (LLL) lattice basis reduction algorithm. Furthermore, in the absence of noise, this algorithm can be directly applied to solve CLWE detection (Bruna et al.'21) and phase retrieval with an optimal sample complexity of $d+1$ samples. In the former case, this improves upon the quadratic-in-$d$ sample complexity required in (Bruna et al.'21). In the latter case, this improves upon the state-of-the-art AMP-based algorithm, which requires approximately $1.128d$ samples (Barbier et al.'19).

【61】 Better Training using Weight-Constrained Stochastic Dynamics 标题:使用权重约束随机动力学进行更好的训练

作者:Benedict Leimkuhler,Tiffany Vlaar,Timothée Pouchon,Amos Storkey 机构:Department of Mathematics, University of Edinburgh, United Kingdom, Department of Informatics, University of Edinburgh 备注:None 链接:https://arxiv.org/abs/2106.10704 摘要:在整个训练过程中，我们采用约束来控制深度神经网络的参数空间。使用定制的、适当设计的约束可以减轻梯度消失/爆炸问题，提高分类边界的平滑度，控制权重大小，稳定深层神经网络，从而增强训练算法的鲁棒性和神经网络的泛化能力。我们提供了一个通用的方法来高效地将约束纳入随机梯度Langevin框架，从而增强对损失地形的探索。我们还给出了以权重矩阵正交性保持和显式权重归一化为动机的约束训练方法的具体例子。对Langevin动力学的过阻尼形式和欠阻尼形式都给出了离散化方案，其中动量变量进一步提高了采样效率。这些优化方案可以直接使用，无需调整神经网络结构设计选择或用正则化项修改目标，并在分类任务中看到性能改进。 摘要:We employ constraints to control the parameter space of deep neural networks throughout training. The use of customized, appropriately designed constraints can reduce the vanishing/exploding gradients problem, improve smoothness of classification boundaries, control weight magnitudes and stabilize deep neural networks, and thus enhance the robustness of training algorithms and the generalization capabilities of neural networks. We provide a general approach to efficiently incorporate constraints into a stochastic gradient Langevin framework, allowing enhanced exploration of the loss landscape. We also present specific examples of constrained training methods motivated by orthogonality preservation for weight matrices and explicit weight normalizations. Discretization schemes are provided both for the overdamped formulation of Langevin dynamics and the underdamped form, in which momenta further improve sampling efficiency. These optimization schemes can be used directly, without needing to adapt neural network architecture design choices or to modify the objective with regularization terms, and see performance improvements in classification tasks.
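下面用一个极简的numpy示意说明"带约束的过阻尼Langevin更新"（这里采用朴素的"先更新、再投影回单位球面"做法，仅为直观示意；论文推导的是更严格的约束离散格式，并给出了带动量的欠阻尼版本）：

```python
import numpy as np

rng = np.random.default_rng(0)

def constrained_langevin_step(theta, grad, lr=1e-3, temperature=1e-4):
    """One overdamped Langevin update followed by a hard constraint:
    renormalize each row of the weight matrix to unit Euclidean norm."""
    noise = np.sqrt(2.0 * lr * temperature) * rng.normal(size=theta.shape)
    theta = theta - lr * grad(theta) + noise
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)   # projection
    return theta

# Toy quadratic loss on a 5x3 weight matrix.
A = rng.normal(size=(5, 3))
grad = lambda W: W - A
W = rng.normal(size=(5, 3))
for _ in range(2000):
    W = constrained_langevin_step(W, grad)
print(np.linalg.norm(W, axis=1))  # rows stay on the unit sphere
```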

【62】 A compressive multi-kernel method for privacy-preserving machine learning 标题:一种用于隐私保护的压缩多核机器学习方法

作者:Thee Chanyaswad,J. Morris Chang,S. Y. Kung 机构:Department of, Electrical Engineeering, Princeton University, Princeton, New Jersey , University of South Florida, Tampa, Florida , S.Y. Kung 备注:Published in 2017 International Joint Conference on Neural Networks (IJCNN). IEEE, 2017 链接:https://arxiv.org/abs/2106.10671 摘要:随着分析工具越来越强大,每天生成的数据越来越多,数据隐私问题也随之产生。这导致了对隐私保护机器学习算法设计的研究。考虑到效用最大化和隐私损失最小化这两个目标,本工作基于两个先前不相交的机制——压缩隐私和多核方法。压缩隐私(Compressive Privacy)是一种隐私框架,它采用实用的有损编码方案来保护数据的隐私,而多核方法是一种基于核的机器学习机制,它探索了使用多核来构建更好的预测器的思想。提出的压缩多核方法分为两个阶段:压缩阶段和多核阶段。压缩阶段遵循压缩隐私范式来提供所需的隐私保护。每个核矩阵被压缩一个有损投影矩阵来自判别成分分析(DCA)。多核级利用每个核的信噪比得分对多个压缩核进行非均匀组合。该方法在两个移动感知数据集MHEALTH和HAR上进行了评估,其中活动识别定义为效用,个人识别定义为隐私。结果表明,在所有实验中,由于隐私分类的准确率几乎都处于随机猜测水平,因此压缩机制在隐私保护方面是成功的。另一方面,新的基于信噪比的多核方法在两种数据集的分类精度上都比现有方法有所提高。这些结果为隐私保护机器学习的研究指明了方向。 摘要:As the analytic tools become more powerful, and more data are generated on a daily basis, the issue of data privacy arises. This leads to the study of the design of privacy-preserving machine learning algorithms. Given two objectives, namely, utility maximization and privacy-loss minimization, this work is based on two previously non-intersecting regimes -- Compressive Privacy and multi-kernel method. Compressive Privacy is a privacy framework that employs utility-preserving lossy-encoding scheme to protect the privacy of the data, while multi-kernel method is a kernel based machine learning regime that explores the idea of using multiple kernels for building better predictors. The compressive multi-kernel method proposed consists of two stages -- the compression stage and the multi-kernel stage. The compression stage follows the Compressive Privacy paradigm to provide the desired privacy protection. Each kernel matrix is compressed with a lossy projection matrix derived from the Discriminant Component Analysis (DCA). The multi-kernel stage uses the signal-to-noise ratio (SNR) score of each kernel to non-uniformly combine multiple compressive kernels. The proposed method is evaluated on two mobile-sensing datasets -- MHEALTH and HAR -- where activity recognition is defined as utility and person identification is defined as privacy. The results show that the compression regime is successful in privacy preservation as the privacy classification accuracies are almost at the random-guess level in all experiments. On the other hand, the novel SNR-based multi-kernel shows utility classification accuracy improvement upon the state-of-the-art in both datasets. These results indicate a promising direction for research in privacy-preserving machine learning.

【63】 TD-GEN: Graph Generation With Tree Decomposition 标题:TD-GEN:基于树分解的图生成

作者:Hamed Shirzad,Hossein Hajimirsadeghi,Amir H. Abdi,Greg Mori 机构:Borealis AI & Simon Fraser University 链接:https://arxiv.org/abs/2106.10656 摘要:提出了一种基于树分解的图生成框架TD-GEN,并对图生成所需的最大决策数引入了一个简化的上界。该框架包括一个置换不变树生成模型,它构成了图生成的主干。树节点是超级节点,每个超级节点表示图中的一组节点。通过遍历树的超节点,遵循树分解的结构,并遵循簇间的节点共享决策,在簇内增量生成图的节点和边。最后,我们讨论了基于生成图的统计特性作为性能度量的标准评估准则的缺点。我们建议比较基于似然的模型的性能。在各种标准图形生成数据集上的实验结果表明了该方法的优越性。 摘要:We propose TD-GEN, a graph generation framework based on tree decomposition, and introduce a reduced upper bound on the maximum number of decisions needed for graph generation. The framework includes a permutation invariant tree generation model which forms the backbone of graph generation. Tree nodes are supernodes, each representing a cluster of nodes in the graph. Graph nodes and edges are incrementally generated inside the clusters by traversing the tree supernodes, respecting the structure of the tree decomposition, and following node sharing decisions between the clusters. Finally, we discuss the shortcomings of standard evaluation criteria based on statistical properties of the generated graphs as performance measures. We propose to compare the performance of models based on likelihood. Empirical results on a variety of standard graph generation datasets demonstrate the superior performance of our method.

【64】 On Sampling Top-K Recommendation Evaluation 标题:浅谈抽样Top-K推荐评价

作者:Dong Li,Ruoming Jin,Jing Gao,Zhi Liu 机构:Kent State University, Lambda 链接:https://arxiv.org/abs/2106.10621 摘要:最近,Rendle警告说,使用基于抽样的top-$k$指标可能并不可靠。这使得一批使用此类指标的近期研究面临风险,既包括基于深度学习的推荐算法,也包括经典的非深度学习算法。在这项工作中,我们深入研究了抽样命中率与全局top-$K$命中率(Hit-Ratio,HR,也称召回率Recall)之间的关系,后者最初由Koren[2]提出并被广泛使用。我们通过一个映射函数$f$来对齐抽样top-$k$命中率($SHR@k$)与全局top-$K$命中率($HR@K$),使得$SHR@k \approx HR@f(k)$;在此基础上,我们从理论和实验两方面证明,抽样top-$k$命中率是其全局(精确)对应指标的准确近似,并能始终如一地预测出正确的赢家(与相应的全局命中率给出的结果一致)。 摘要:Recently, Rendle has warned that the use of sampling-based top-$k$ metrics might not suffice. This throws a number of recent studies on deep learning-based recommendation algorithms, and classic non-deep-learning algorithms using such a metric, into jeopardy. In this work, we thoroughly investigate the relationship between the sampling and global top-$K$ Hit-Ratio (HR, or Recall), originally proposed by Koren[2] and extensively used by others. By formulating the problem of aligning sampling top-$k$ ($SHR@k$) and global top-$K$ ($HR@K$) Hit-Ratios through a mapping function $f$, so that $SHR@k \approx HR@f(k)$, we demonstrate both theoretically and experimentally that the sampling top-$k$ Hit-Ratio provides an accurate approximation of its global (exact) counterpart, and can consistently predict the correct winners (the same as indicated by their corresponding global Hit-Ratios).
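下面用一个小型数值模拟示意"通过映射函数$f$对齐抽样与全局命中率"的思路。其中$f(k)=kN/(m+1)$只是基于分位数对应关系的粗糙近似,并非论文给出的映射;玩具数据的排名分布也是演示用假设。

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, users = 10_000, 100, 5_000             # 物品总数、每用户负采样数、用户数
global_rank = rng.integers(1, N + 1, users)  # 每个用户正例的全局排名(玩具数据)

# 抽样评估:从其余 N-1 个物品中抽 m 个负例,正例在样本中的名次
# = 1 + 被抽中的"全局排名更靠前"的负例个数,服从超几何分布
better = global_rank - 1
sample_rank = 1 + rng.hypergeometric(better, N - 1 - better, m)

for k in (1, 5, 10):
    shr_k = (sample_rank <= k).mean()            # 抽样 SHR@k
    fk = int(round(k * N / (m + 1)))             # 假设性的映射 f(k)
    hr_fk = (global_rank <= fk).mean()           # 全局 HR@f(k)
    print(f"SHR@{k} = {shr_k:.4f}  vs  HR@{fk} = {hr_fk:.4f}")
```

在均匀排名的玩具设定下,两列数值应当非常接近,这正是"抽样命中率可近似全局命中率"的直观含义。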

【65】 EvoGrad: Efficient Gradient-Based Meta-Learning and Hyperparameter Optimization 标题:EvoGrad:高效的基于梯度的元学习和超参数优化

作者:Ondrej Bohdal,Yongxin Yang,Timothy Hospedales 机构:School of Informatics, The University of Edinburgh 链接:https://arxiv.org/abs/2106.10575 摘要:近年来,基于梯度的元学习和超参数优化取得了显著进展,使得神经网络连同大量超参数的端到端训练成为可能。然而,现有方法相对昂贵,因为它们需要计算二阶导数并存储更长的计算图,这一成本使其难以扩展到更大的网络架构。我们提出了EvoGrad,一种借助进化技术更高效地计算超梯度的元学习新方法。EvoGrad在不计算二阶梯度、也不存储更长计算图的情况下估计关于超参数的超梯度,从而显著提高效率。我们在最近两个重要的元学习应用上对EvoGrad进行了评估,即具有特征变换(feature-wise transformations)的跨域小样本学习和使用MetaWeightNet的噪声标签学习。结果表明,EvoGrad显著提升了效率,并使元学习能够扩展到更大的CNN架构,例如从ResNet18扩展到ResNet34。 摘要:Gradient-based meta-learning and hyperparameter optimization have seen significant progress recently, enabling practical end-to-end training of neural networks together with many hyperparameters. Nevertheless, existing approaches are relatively expensive as they need to compute second-order derivatives and store a longer computational graph. This cost prevents scaling them to larger network architectures. We present EvoGrad, a new approach to meta-learning that draws upon evolutionary techniques to more efficiently compute hypergradients. EvoGrad estimates hypergradient with respect to hyperparameters without calculating second-order gradients, or storing a longer computational graph, leading to significant improvements in efficiency. We evaluate EvoGrad on two substantial recent meta-learning applications, namely cross-domain few-shot learning with feature-wise transformations and noisy label learning with MetaWeightNet. The results show that EvoGrad significantly improves efficiency and enables scaling meta-learning to bigger CNN architectures such as from ResNet18 to ResNet34.
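下面是EvoGrad核心思想在一个玩具最小二乘问题上的极简示意(扰动副本数$K$、扰动幅度$\sigma$、以权重衰减系数$\lambda$作为超参数等均为演示用假设):对参数做少量随机扰动,按训练损失对扰动副本做softmax加权得到"更新后"的参数,再对验证损失关于超参数直接求一阶导数,从而避开二阶导数与长计算图。

```python
import torch

torch.manual_seed(0)
w = torch.randn(5)                            # 当前模型参数(玩具)
lam = torch.tensor(0.1, requires_grad=True)   # 超参数:权重衰减系数
x_tr, y_tr = torch.randn(32, 5), torch.randn(32)
x_va, y_va = torch.randn(32, 5), torch.randn(32)

def train_loss(w_, lam_):
    return ((x_tr @ w_ - y_tr) ** 2).mean() + lam_ * (w_ ** 2).sum()

# 进化式估计:K 个扰动副本,按(含超参数的)训练损失做 softmax 加权
K, sigma = 8, 0.05
ws = [w + sigma * torch.randn_like(w) for _ in range(K)]
weights = torch.softmax(-torch.stack([train_loss(wk, lam) for wk in ws]), dim=0)
w_new = sum(p * wk for p, wk in zip(weights, ws))   # 加权得到"更新后"参数

val_loss = ((x_va @ w_new - y_va) ** 2).mean()
hypergrad, = torch.autograd.grad(val_loss, lam)     # 只需一阶导即得超梯度
print(hypergrad)
```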

【66】 The nonzero gain coefficients of Sobol's sequences are always powers of two 标题:Sobol序列的非零增益系数总是2的幂

作者:Zexin Pan,Art B. Owen 机构:Stanford University 链接:https://arxiv.org/abs/2106.10534 摘要:当基于$n$个样本的普通蒙特卡罗估计具有方差$\sigma^2/n$时,加扰数字网(scrambled digital nets)在$n\to\infty$时可获得$o(1/n)$的方差。对于有限的$n$和对抗性选择的被积函数,加扰$(t,m,s)$-网的方差至多为$\Gamma\sigma^2/n$,其中$\Gamma<\infty$为最大增益系数。应用最广泛的数字网与序列是Sobol'的。此前已知,对于Sobol'点和Niederreiter-Xing点有$\Gamma\leqslant 2^t 3^s$。本文研究基$2$的网,证明了对这类网有$\Gamma\leqslant 2^{t+s-1}$。这个界是Niederreiter和Pirsic(2001)中微观结构分析的一个简单但显然未被注意到的推论。对于某些数字网,我们还得到了比它更小的更紧的界。我们还证明了所有非零增益系数都必须是2的幂。后一事实的一个推论,是一个计算基$2$网增益系数的简化算法。 摘要:When a plain Monte Carlo estimate on $n$ samples has variance $\sigma^2/n$, then scrambled digital nets attain a variance that is $o(1/n)$ as $n\to\infty$. For finite $n$ and an adversarially selected integrand, the variance of a scrambled $(t,m,s)$-net can be at most $\Gamma\sigma^2/n$ for a maximal gain coefficient $\Gamma<\infty$. The most widely used digital nets and sequences are those of Sobol'. It was previously known that $\Gamma\leqslant 2^t3^s$ for Sobol' points as well as Niederreiter-Xing points. In this paper we study nets in base $2$. We show that $\Gamma\leqslant 2^{t+s-1}$ for nets. This bound is a simple, but apparently unnoticed, consequence of a microstructure analysis in Niederreiter and Pirsic (2001). We obtain a sharper bound that is smaller than this for some digital nets. We also show that all nonzero gain coefficients must be powers of two. A consequence of this latter fact is a simplified algorithm for computing gain coefficients of nets in base $2$.
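可以用SciPy自带的加扰Sobol'序列直观感受加扰数字网相对普通蒙特卡罗的方差优势(被积函数、维度与重复次数均为演示用假设;SciPy的qmc.Sobol即实现了带加扰的Sobol'点):

```python
import numpy as np
from scipy.stats import qmc

# [0,1]^3 上的光滑被积函数,真值为 1
f = lambda x: np.prod(1.0 + (x - 0.5), axis=1)

def variance(points_fn, m, reps=50):
    """对同一估计量独立重复 reps 次,返回估计值的经验方差。"""
    return np.var([f(points_fn(seed, m)).mean() for seed in range(reps)])

mc  = lambda seed, m: np.random.default_rng(seed).random((2**m, 3))
net = lambda seed, m: qmc.Sobol(d=3, scramble=True, seed=seed).random_base2(m)

for m in (6, 8, 10):
    print(f"n=2^{m}: MC 方差 {variance(mc, m):.2e} | 加扰 Sobol' 方差 {variance(net, m):.2e}")
```

随着$n=2^m$增大,加扰Sobol'的方差应当以快于$1/n$的速率下降,而普通蒙特卡罗保持$\sigma^2/n$的速率。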

【67】 Neural Network Classifier as Mutual Information Evaluator 标题:作为互信息评估器的神经网络分类器

作者:Zhenyue Qin,Dongwoo Kim,Tom Gedeon 机构:School of Computing, Australian National University; GSAI 备注:arXiv admin note: substantial text overlap with arXiv:1911.10688 链接:https://arxiv.org/abs/2106.10471 摘要:带softmax输出的交叉熵损失是训练神经网络分类器的标准选择。我们给出了一个新视角:带softmax与交叉熵的神经网络分类器同时也是互信息评估器。我们证明,当数据集类别均衡时,用交叉熵训练神经网络,会通过互信息的一个变分形式最大化输入与标签之间的互信息。据此,我们提出了一种新形式的softmax,使得在数据集不均衡时也能将分类器转化为互信息评估器。实验结果表明,这种新形式能带来更好的分类精度,对不均衡数据集尤其如此。 摘要:Cross-entropy loss with softmax output is a standard choice to train neural network classifiers. We give a new view of neural network classifiers with softmax and cross-entropy as mutual information evaluators. We show that when the dataset is balanced, training a neural network with cross-entropy maximises the mutual information between inputs and labels through a variational form of mutual information. Thereby, we develop a new form of softmax that also converts a classifier to a mutual information evaluator when the dataset is imbalanced. Experimental results show that the new form leads to better classification accuracy, in particular for imbalanced datasets.
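文中的核心关系可以直接用代码验证:当类别均衡时$H(Y)=\log K$,由$I(X;Y)=H(Y)-H(Y|X)\geq H(Y)-\mathrm{CE}$可知,"$\log K$减去交叉熵"给出互信息的一个下界估计(下面的玩具分类器输出为演示用假设):

```python
import numpy as np

def mi_lower_bound(probs, labels):
    """probs: (n, K) 分类器的 softmax 输出;labels: (n,) 真实标签。
    类别均衡时 H(Y) = log K,互信息下界 = log K - 交叉熵。"""
    n, K = probs.shape
    ce = -np.log(probs[np.arange(n), labels] + 1e-12).mean()
    return np.log(K) - ce

# 玩具示例:接近完美的分类器给出接近 log K 的互信息估计
labels = np.repeat([0, 1, 2], 100)
probs = np.full((300, 3), 0.01)
probs[np.arange(300), labels] = 0.98
print(mi_lower_bound(probs, labels), "vs log K =", np.log(3))
```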

【68】 A Unified View of Algorithms for Path Planning Using Probabilistic Inference on Factor Graphs 标题:因子图概率推理路径规划算法的统一视图

作者:Francesco A. N. Palmieri,Krishna R. Pattipati,Giovanni Di Gennaro,Giovanni Fioretti,Francesco Verolla,Amedeo Buonanno 机构:Dipartimento di Ingegneria, Università degli Studi della Campania “Luigi Vanvitelli”, Aversa (CE), Italy, Department of Electrical and Computer Engineering, University of Connecticut, Storrs (CT), USA, ENEA 链接:https://arxiv.org/abs/2106.10442 摘要:尽管路径规划可以用动态规划与控制中的标准技术求解,它同样可以用概率推理来处理。在后一框架下得到的算法具有一些吸引人的特性,使概率方法成为更传统控制表述的有力替代。利用随机模型上的估计来求解控制问题的想法并不新鲜,这里考虑的推理方法属于主动推理(Active Inference, AI)和控制即推理(Control as Inference, CAI)的范畴。在这项工作中,我们考察由各种代价函数产生的具体递归:这些代价函数虽然表面上相似,但至少在应用于典型路径规划问题时存在明显差异。我们首先把路径规划问题表述在概率因子图上,并展示各种算法如何转化为特定的消息组合规则。随后我们展示,这种在概率空间和对数空间中同时给出的统一方法提供了一个非常通用的框架,涵盖和积(Sum-product)、最大积(Max-product)、动态规划以及基于混合奖励/熵准则的算法。该框架还扩展了获得更平滑或更尖锐策略分布的算法设计选项,包括广义和积/最大积算法、平滑动态规划算法以及奖励/熵递归的改进版本。我们给出了一张完整的递归对照表,并通过仿真进行比较:先在带障碍物、单一目标的合成小网格上,再在由真实场景外推得到的、带多个目标和语义地图的网格上。 摘要:Even if path planning can be solved using standard techniques from dynamic programming and control, the problem can also be approached using probabilistic inference. The algorithms that emerge using the latter framework bear some appealing characteristics that qualify the probabilistic approach as a powerful alternative to the more traditional control formulations. The idea of using estimation on stochastic models to solve control problems is not new and the inference approach considered here falls under the rubric of Active Inference (AI) and Control as Inference (CAI). In this work, we look at the specific recursions that arise from various cost functions that, although they may appear similar in scope, bear noticeable differences, at least when applied to typical path planning problems. We start by posing the path planning problem on a probabilistic factor graph, and show how the various algorithms translate into specific message composition rules. We then show how this unified approach, presented both in probability space and in log space, provides a very general framework that includes the Sum-product, the Max-product, Dynamic programming and mixed Reward/Entropy criteria-based algorithms. The framework also expands algorithmic design options for smoother or sharper policy distributions, including generalized Sum/Max-product algorithm, a Smooth Dynamic programming algorithm and modified versions of the Reward/Entropy recursions. We provide a comprehensive table of recursions and a comparison through simulations, first on a synthetic small grid with a single goal with obstacles, and then on a grid extrapolated from a real-world scene with multiple goals and a semantic map.
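下面用一个极小的网格示例演示文中"同一递归、不同消息组合规则"的观点:在对数空间中,把max换成logsumexp,就从动态规划(max-product)切换到sum-product式的"软"规划(网格大小、奖励设置与迭代轮数均为演示用假设):

```python
import numpy as np
from scipy.special import logsumexp

H, W, T = 4, 4, 12
goal, obstacles = (3, 3), {(1, 1), (2, 1)}
moves = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]

def plan(combine):
    """后向递归 V[s] = combine_{s'}( r(s') + V[s'] )。
    combine=max 即动态规划(max-product);combine=logsumexp 即 sum-product 式软规划。"""
    r = np.full((H, W), -1.0)          # 每步 -1 的代价
    r[goal] = 0.0
    for (i, j) in obstacles:
        r[i, j] = -np.inf              # 障碍物不可进入
    V = np.zeros((H, W))
    for _ in range(T):
        nxt = np.full((H, W), -np.inf)
        for i in range(H):
            for j in range(W):
                vals = [r[i + di, j + dj] + V[i + di, j + dj]
                        for di, dj in moves
                        if 0 <= i + di < H and 0 <= j + dj < W]
                nxt[i, j] = combine(vals)
        V = nxt
    return V

V_hard = plan(max)                       # max-product / 动态规划
V_soft = plan(lambda v: logsumexp(v))    # sum-product 式"软"值
print(V_hard[0, 0], V_soft[0, 0])
```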

【69】 STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal Sample and Communication Complexities for Federated Learning 标题:STEM:一种在联邦学习中达到近似最优样本与通信复杂度的随机双边动量算法

作者:Prashant Khanduri,Pranay Sharma,Haibo Yang,Mingyi Hong,Jia Liu,Ketan Rajawat,Pramod K. Varshney 机构:∗Department of Electrical and Computer Engineering, The Ohio State University, OH, USA, †Department of Electrical and Computer Engineering, University of Minnesota, MN, USA, ⋄Department of Electrical Engineering and Computer Science, Syracuse University, NY, USA 链接:https://arxiv.org/abs/2106.10435 摘要:联邦学习(FL)是指多个工作节点(WN)使用本地数据共同构建联合模型的范式。尽管已有大量研究,对于一般的非凸FL问题,如何选择WN和服务器的更新方向、小批量大小和本地更新频率,使WN用最少的样本数和通信轮数达到期望的解,目前仍不清楚。本工作回答了上述问题,并考虑一类WN在通信前执行若干次本地更新的随机算法。我们证明,当WN和服务器的方向都基于随机动量估计器选取时,该算法只需$\tilde{\mathcal{O}}(\epsilon^{-3/2})$个样本和$\tilde{\mathcal{O}}(\epsilon^{-1})$轮通信即可计算出一个$\epsilon$-平稳解。据我们所知,这是第一个同时达到近似最优样本复杂度与通信复杂度的FL算法。进一步,我们证明在本地更新频率与本地小批量大小之间存在一条折衷曲线,在该曲线上可以保持上述样本和通信复杂度。最后,我们证明对于经典的FedAvg(又称Local SGD,它是STEM去掉动量后的特例),也存在类似的折衷曲线,只是样本和通信复杂度更差。我们对这一折衷的洞察,为选择FL算法的四个重要设计元素(更新频率、更新方向与小批量大小)以取得最佳性能提供了指导。 摘要:Federated Learning (FL) refers to the paradigm where multiple worker nodes (WNs) build a joint model by using local data. Despite extensive research, for a generic non-convex FL problem, it is not clear, how to choose the WNs' and the server's update directions, the minibatch sizes, and the local update frequency, so that the WNs use the minimum number of samples and communication rounds to achieve the desired solution. This work addresses the above question and considers a class of stochastic algorithms where the WNs perform a few local updates before communication. We show that when both the WN's and the server's directions are chosen based on a stochastic momentum estimator, the algorithm requires $\tilde{\mathcal{O}}(\epsilon^{-3/2})$ samples and $\tilde{\mathcal{O}}(\epsilon^{-1})$ communication rounds to compute an $\epsilon$-stationary solution. To the best of our knowledge, this is the first FL algorithm that achieves such {\it near-optimal} sample and communication complexities simultaneously. Further, we show that there is a trade-off curve between local update frequencies and local minibatch sizes, on which the above sample and communication complexities can be maintained. Finally, we show that for the classical FedAvg (a.k.a. Local SGD, which is a momentum-less special case of the STEM), a similar trade-off curve exists, albeit with worse sample and communication complexities. Our insights on this trade-off provides guidelines for choosing the four important design elements for FL algorithms, the update frequency, directions, and minibatch sizes to achieve the best performance.
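STEM的"双边动量"建立在随机动量梯度估计(STORM风格)之上。下面在单机玩具最小二乘问题上给出该估计器的最小示意,联邦聚合、服务器端动量等细节从略,步长与动量参数均为演示用假设:

```python
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.standard_normal((200, 10)), rng.standard_normal(200)

def stoch_grad(x, idx):
    """小批量随机梯度:f(x) = ||Ax - b||^2 / n 在子集 idx 上的梯度。"""
    Ai, bi = A[idx], b[idx]
    return 2 * Ai.T @ (Ai @ x - bi) / len(idx)

x = np.zeros(10)
d = stoch_grad(x, rng.choice(200, 20))   # 动量化的梯度估计 d_t
eta, a = 0.01, 0.9                        # 步长与动量参数(演示值)
for t in range(200):
    x_new = x - eta * d
    idx = rng.choice(200, 20)
    # STORM 式更新:d_{t+1} = g(x_{t+1}) + (1 - a) * (d_t - g(x_t)),两个梯度用同一小批量
    d = stoch_grad(x_new, idx) + (1 - a) * (d - stoch_grad(x, idx))
    x = x_new
print(np.linalg.norm(stoch_grad(x, np.arange(200))))   # 全梯度范数应明显减小
```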

【70】 Robust M-estimation-based Tensor Ring Completion: a Half-quadratic Minimization Approach 标题:基于鲁棒M-估计的张量环完备化:半二次最小化方法

作者:Yicong He,George K. Atia 机构:Department of Electrical and Computer Engineering, University of Central Florida 链接:https://arxiv.org/abs/2106.10422 摘要:张量完备是指从部分观测的条目中估计高阶数据缺失值的问题。在张量秩的多种定义中,张量环秩兼具对不同阶张量建模所需的灵活性和准确性,这推动了近期关于张量环完备的研究。然而,普遍存在的离群值造成的数据污染对现有算法构成了重大挑战。在本文中,我们提出了一种以M-估计量作为误差统计量的鲁棒张量环完备方法,可以显著减轻离群值的影响。借助半二次(HQ)方法,我们将问题重新表述为加权张量完备问题。我们给出了分别基于截断奇异值分解和矩阵分解的两种HQ算法,并分析了其收敛性和复杂度。我们还讨论了所提方法向张量秩其他定义的可推广性。实验结果表明,该方法优于现有最先进的鲁棒张量完备算法。 摘要:Tensor completion is the problem of estimating the missing values of high-order data from partially observed entries. Among several definitions of tensor rank, tensor ring rank affords the flexibility and accuracy needed to model tensors of different orders, which motivated recent efforts on tensor-ring completion. However, data corruption due to prevailing outliers poses major challenges to existing algorithms. In this paper, we develop a robust approach to tensor ring completion that uses an M-estimator as its error statistic, which can significantly alleviate the effect of outliers. Leveraging a half-quadratic (HQ) method, we reformulate the problem as one of weighted tensor completion. We present two HQ-based algorithms based on truncated singular value decomposition and matrix factorization along with their convergence and complexity analysis. Extendibility of the proposed approach to alternative definitions of tensor rank is also discussed. The experimental results demonstrate the superior performance of the proposed approach over state-of-the-art robust algorithms for tensor completion.
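半二次(HQ)方法的要点,是把鲁棒M-估计损失转化为"带权最小二乘"并交替更新权重与低秩因子。下面在矩阵补全(而非张量环)场景下给出这一迭代的最小示意,M-估计取Welsch函数;张量环版本、尺度参数的选取等细节见原文,此处均为简化假设:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # 低秩真值
mask = rng.random((n, n)) < 0.5                                  # 观测模式
X = M + 10 * (rng.random((n, n)) < 0.05) * rng.standard_normal((n, n))  # 注入稀疏离群值

U, V = rng.standard_normal((n, r)), rng.standard_normal((n, r))
sigma = 1.0                                   # Welsch 尺度参数(演示值,固定不变)
for _ in range(30):
    R = (X - U @ V.T) * mask
    W = np.exp(-R**2 / (2 * sigma**2)) * mask  # HQ 权重:残差大(疑似离群)则权重趋近 0
    # 带权最小二乘,逐行交替更新 U、V
    for i in range(n):
        Dw = np.diag(W[i])
        U[i] = np.linalg.solve(V.T @ Dw @ V + 1e-6 * np.eye(r), V.T @ Dw @ X[i])
    for j in range(n):
        Dw = np.diag(W[:, j])
        V[j] = np.linalg.solve(U.T @ Dw @ U + 1e-6 * np.eye(r), U.T @ Dw @ X[:, j])
print(np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))   # 相对重构误差
```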

【71】 Variance-Dependent Best Arm Identification 标题:依赖方差的最优臂识别

作者:Pinyan Lu,Chao Tao,Xiaojin Zhang 机构:ITCS, Shanghai University of Finance and Economics, Department of Computer Science, Indiana University Bloomington, Department of Computer Science and Engineering, The Chinese University of Hong Kong 链接:https://arxiv.org/abs/2106.10417 摘要:我们研究随机多臂老虎机中识别最优臂的问题。给定一组从$1$到$n$编号的$n$个臂,每个臂$i$关联一个支撑在$[0,1]$上的未知奖励分布,其均值为$\theta_i$、方差为$\sigma_i^2$。假设$\theta_1>\theta_2\geq\cdots\geq\theta_n$。我们提出了一种自适应算法,利用一种称为分组中值消除(grouped median elimination)的新方法,探索各臂奖励的差距与方差,并根据收集到的信息做出后续决策。该算法保证以概率$(1-\delta)$输出最优臂,且至多使用$O\left(\sum_{i=1}^n\left(\frac{\sigma_i^2}{\Delta_i^2}+\frac{1}{\Delta_i}\right)(\ln\delta^{-1}+\ln\ln\Delta_i^{-1})\right)$个样本,其中$\Delta_i$($i\geq 2$)表示臂$i$与最优臂之间的奖励差距,并定义$\Delta_1=\Delta_2$。在某些有利情形下,这相对于与方差无关的算法具有显著优势,并且相比现有最新结果,这是第一个在最优臂上去掉额外$\ln n$因子的工作。我们进一步证明,任何达到相同目标的算法都需要$\Omega\left(\sum_{i=1}^n\left(\frac{\sigma_i^2}{\Delta_i^2}+\frac{1}{\Delta_i}\right)\ln\delta^{-1}\right)$个样本,从而说明我们的算法在双对数项以内是最优的。 摘要:We study the problem of identifying the best arm in a stochastic multi-armed bandit game. Given a set of $n$ arms indexed from $1$ to $n$, each arm $i$ is associated with an unknown reward distribution supported on $[0,1]$ with mean $\theta_i$ and variance $\sigma_i^2$. Assume $\theta_1 > \theta_2 \geq \cdots \geq \theta_n$. We propose an adaptive algorithm which explores the gaps and variances of the rewards of the arms and makes future decisions based on the gathered information using a novel approach called \textit{grouped median elimination}. The proposed algorithm guarantees to output the best arm with probability $(1-\delta)$ and uses at most $O\left(\sum_{i=1}^n\left(\frac{\sigma_i^2}{\Delta_i^2}+\frac{1}{\Delta_i}\right)(\ln \delta^{-1} + \ln\ln\Delta_i^{-1})\right)$ samples, where $\Delta_i$ ($i \geq 2$) denotes the reward gap between arm $i$ and the best arm and we define $\Delta_1 = \Delta_2$. This achieves a significant advantage over the variance-independent algorithms in some favorable scenarios and is the first result that removes the extra $\ln n$ factor on the best arm compared with the state-of-the-art. We further show that $\Omega\left(\sum_{i=1}^n\left(\frac{\sigma_i^2}{\Delta_i^2}+\frac{1}{\Delta_i}\right)\ln\delta^{-1}\right)$ samples are necessary for an algorithm to achieve the same goal, thereby illustrating that our algorithm is optimal up to doubly logarithmic terms.

【72】 Towards a Query-Optimal and Time-Efficient Algorithm for Clustering with a Faulty Oracle 标题:面向带故障Oracle聚类的查询最优、时间高效算法

作者:Pan Peng,Jiapeng Zhang 机构:Department of Computer Science, University of Sheffield, UK; Department of Computer Science, University of Southern California 备注:Accepted for presentation at the Conference on Learning Theory (COLT) 2021 链接:https://arxiv.org/abs/2106.10374 摘要:受数据库中众包实体解析、社交网络中符号边预测以及相关聚类等应用的启发,Mazumdar和Saha[NIPS 2017]提出了一个优雅的理论模型,用于研究带故障oracle的聚类问题。在该模型中,给定属于$k$个未知组(即簇)的$n$个项目,我们的目标是通过向oracle发出成对查询来恢复这些簇。oracle可以回答诸如"项目$u$和$v$是否属于同一个簇?"的查询,但对每个成对查询的回答都会以概率$\varepsilon$出错,其中$\varepsilon\in(0,\frac12)$。Mazumdar和Saha在该模型下给出了两种算法:一种查询最优但时间上低效(即以准多项式时间运行),另一种时间上高效(即以多项式时间运行)但查询次优。随后,Larsen、Mitzenmacher和Tsourakakis[WWW 2020]针对$2$个簇的特殊情形给出了一种新的时间高效算法,当模型的偏差$\delta:=1-2\varepsilon$较大时,该算法是查询最优的。对于一般的$k$个簇以及$\delta$的其他区间,能否得到既查询最优又时间高效的算法一直是一个悬而未决的问题。在本文中,我们在上述问题上取得了进展:在信息论上可恢复的区间内,对所有常数$k$和任意$\delta$,给出了一个查询复杂度近似最优(至多相差$O(\log^2 n)$因子)的时间高效算法。我们的算法建立在与随机块模型的一个联系之上。 摘要:Motivated by applications in crowdsourced entity resolution in database, signed edge prediction in social networks and correlation clustering, Mazumdar and Saha [NIPS 2017] proposed an elegant theoretical model for studying clustering with a faulty oracle. In this model, given a set of $n$ items which belong to $k$ unknown groups (or clusters), our goal is to recover the clusters by asking pairwise queries to an oracle. This oracle can answer the query that ``do items $u$ and $v$ belong to the same cluster?''. However, the answer to each pairwise query errs with probability $\varepsilon$, for some $\varepsilon\in(0,\frac12)$. Mazumdar and Saha provided two algorithms under this model: one algorithm is query-optimal while time-inefficient (i.e., running in quasi-polynomial time), the other is time efficient (i.e., in polynomial time) while query-suboptimal. Larsen, Mitzenmacher and Tsourakakis [WWW 2020] then gave a new time-efficient algorithm for the special case of $2$ clusters, which is query-optimal if the bias $\delta:=1-2\varepsilon$ of the model is large. It was left as an open question whether one can obtain a query-optimal, time-efficient algorithm for the general case of $k$ clusters and other regimes of $\delta$. In this paper, we make progress on the above question and provide a time-efficient algorithm with nearly-optimal query complexity (up to a factor of $O(\log^2 n)$) for all constant $k$ and any $\delta$ in the regime when information-theoretic recovery is possible. Our algorithm is built on a connection to the stochastic block model.

【73】 Intersectional synergies: untangling irreducible effects of intersecting identities via information decomposition 标题:交叉协同效应:通过信息分解厘清交叉身份的不可约效应

作者:Thomas F. Varley 机构:School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA, Department of Psychology & Brain Sciences, Indiana University, Bloomington, IN, USA 备注:15 pages, 6 figures 链接:https://arxiv.org/abs/2106.10338 摘要:交叉性(intersectionality)的概念已成为学术社会学以及Black Lives Matter、交叉女权主义、LGBT权利等社会正义运动中的常见议题。交叉性认为,个体的社会经验中存在一些方面,它们不可还原为个体各种身份单独考虑时的总和,而是"大于其各部分之和"。在这项工作中,我们证明交叉身份的效应可以用信息论在经验数据中被统计地观察到。我们发现,在考察种族、性别和收入(作为阶层的代理变量)等身份类别对健康和幸福等结果的预测关系时,会出现稳健的统计协同效应。这些协同效应表明,身份对结果存在不可还原为任何单独身份的联合效应,它们只有在特定类别被一并考虑时才会出现(例如,种族和性别联合起来对收入存在很大的协同效应,不可还原为种族或性别任一单独因素)。接着我们用合成数据表明,当前评估数据中交叉性的金标准方法(带乘性交互系数的线性回归)无法区分真正协同的、"大于其各部分之和"的交互作用与冗余的交互作用。我们探讨了这两类不同交互作用在推断数据中交叉关系时的意义,以及能够可靠区分两者的重要性。最后我们得出结论:信息论作为一个对数据中的非线性和协同效应敏感的免模型框架,是探索高阶社会动态空间的自然方法。 摘要:The idea of intersectionality has become a frequent topic of discussion both in academic sociology, as well as among popular movements for social justice such as Black Lives Matter, intersectional feminism, and LGBT rights. Intersectionality proposes that an individual's experience of society has aspects that are irreducible to the sum of one's various identities considered individually, but are "greater than the sum of their parts." In this work, we show that the effects of intersectional identities can be statistically observed in empirical data using information theory. We show that, when considering the predictive relationship between various identities categories such as race, sex, and income (as a proxy for class) on outcomes such as health and wellness, robust statistical synergies appear. These synergies show that there are joint-effects of identities on outcomes that are irreducible to any identity considered individually and only appear when specific categories are considered together (for example, there is a large, synergistic effect of race and sex considered jointly on income irreducible to either race or sex). We then show using synthetic data that the current gold-standard method of assessing intersectionalities in data (linear regression with multiplicative interaction coefficients) fails to disambiguate between truly synergistic, greater-than-the-sum-of-their-parts interactions, and redundant interactions. We explore the significance of these two distinct types of interactions in the context of making inferences about intersectional relationships in data and the importance of being able to reliably differentiate the two. Finally, we conclude that information theory, as a model-free framework sensitive to nonlinearities and synergies in data, is a natural method by which to explore the space of higher-order social dynamics.
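作为这一思想的一个可运行的简化示意,可以用交互信息作为"协同"的粗糙代理:若$I(X_1,X_2;Y)-I(X_1;Y)-I(X_2;Y)>0$,则两个身份变量对结果存在超出各自贡献之和的联合效应。论文使用的是更精细的部分信息分解,下面的XOR合成数据与度量选择均为演示用假设:

```python
import numpy as np
from collections import Counter

def entropy(samples):
    """经验香农熵(比特)。"""
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def mi(xs, ys):
    """经验互信息 I(X;Y) = H(X) + H(Y) - H(X,Y)。"""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

rng = np.random.default_rng(0)
x1 = rng.integers(0, 2, 10_000)
x2 = rng.integers(0, 2, 10_000)
y = x1 ^ x2                      # 纯协同:单看任一变量对 y 都毫无信息
joint = list(zip(x1, x2))
print("I(X1;Y) =", mi(list(x1), list(y)))
print("I(X2;Y) =", mi(list(x2), list(y)))
print("协同(交互信息)≈", mi(joint, list(y)) - mi(list(x1), list(y)) - mi(list(x2), list(y)))
```

在XOR设定下,两个单变量互信息都接近0,而联合互信息接近1比特,"协同"项因此接近1,这正是"大于其各部分之和"的信息论刻画。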

【74】 Non-parametric Differentially Private Confidence Intervals for the Median 标题:中位数的非参数差分隐私置信区间

作者:Joerg Drechsler,Ira Globus-Harris,Audra McMillan,Jayshree Sarathy,Adam Smith 机构:Institute for Employment Research, Germany, The Joint Program in Survey Methodology, University of Maryland, USA, University of Pennsylvania, USA, Apple, USA, Harvard John A. Paulson School of Engineering and Applied Sciences, USA 备注:44 pages, 15 figures 链接:https://arxiv.org/abs/2106.10333 摘要:差分隐私是对数据处理算法的一种约束,它为数据中的个体记录提供强有力的保密保证。然而,关于恰当的统计推断的研究——即如何恰当地量化(含噪)样本估计相对于总体真值的不确定性——目前仍然有限。本文提出并评估了若干计算中位数的有效差分隐私置信区间的策略。我们不是先计算一个差分隐私点估计再推导其不确定性,而是直接估计区间端点,并讨论在必须保证隐私时这种做法更优的原因。我们还说明,同时处理两个不确定性来源——采样误差和保护输出引入的误差——应优于以顺序方式合并不确定性的更简单做法。我们在大量模拟研究中评估了不同算法在各种参数设置下的性能,并利用1940年十年一次的人口普查数据演示了这些发现如何应用于实际场景。 摘要:Differential privacy is a restriction on data processing algorithms that provides strong confidentiality guarantees for individual records in the data. However, research on proper statistical inference, that is, research on properly quantifying the uncertainty of the (noisy) sample estimate regarding the true value in the population, is currently still limited. This paper proposes and evaluates several strategies to compute valid differentially private confidence intervals for the median. Instead of computing a differentially private point estimate and deriving its uncertainty, we directly estimate the interval bounds and discuss why this approach is superior if ensuring privacy is important. We also illustrate that addressing both sources of uncertainty--the error from sampling and the error from protecting the output--simultaneously should be preferred over simpler approaches that incorporate the uncertainty in a sequential fashion. We evaluate the performance of the different algorithms under various parameter settings in extensive simulation studies and demonstrate how the findings could be applied in practical settings using data from the 1940 Decennial Census.
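作为背景,可以先看差分隐私中位数点估计的教科书式做法:指数机制配合"名次距离"效用函数。论文针对的是置信区间而非点估计,下面的效用函数、敏感度设定与数据均为简化的演示用假设:

```python
import numpy as np

def dp_median(data, eps, lo=0.0, hi=1.0):
    """指数机制中位数:效用 u(区间) = -|区间之下的点数 - n/2|,该效用的敏感度为 1。"""
    x = np.sort(np.clip(data, lo, hi))
    edges = np.concatenate([[lo], x, [hi]])     # 数据点把值域切成 n+1 个区间
    n = len(x)
    utility = -np.abs(np.arange(n + 1) - n / 2) # 第 i 个区间之下恰有 i 个数据点
    width = np.diff(edges)
    # 区间被选中的概率 ∝ 区间宽度 × exp(eps·u/2)(在对数域计算以保数值稳定)
    logw = eps * utility / 2 + np.log(np.maximum(width, 1e-12))
    p = np.exp(logw - logw.max()); p /= p.sum()
    rng = np.random.default_rng()
    i = rng.choice(n + 1, p=p)
    return rng.uniform(edges[i], edges[i + 1])  # 在选中的区间内均匀采样

data = np.random.default_rng(0).beta(2, 5, 500)
print("真实中位数:", np.median(data), " DP 中位数:", dp_median(data, eps=1.0))
```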

【75】 Group-Structured Adversarial Training 标题:群组结构对抗训练

作者:Farzan Farnia,Amirali Aghazadeh,James Zou,David Tse 机构:University of California; Department of Biomedical Data Science, Stanford University 链接:https://arxiv.org/abs/2106.10324 摘要:对输入数据扰动具有鲁棒性的训练方法在机器学习文献中受到了广泛关注。这一方向上的标准做法是对抗训练,即使用经过对抗扰动的训练样本来学习模型。然而,面对跨样本结构化的扰动(例如普适位移和组稀疏位移;这类扰动常见于生物数据,如不同组织的基因表达水平),对抗训练的表现并不理想。在这项工作中,我们试图弥合这一最优性差距,提出了群组结构对抗训练(GSAT),用于学习对跨样本结构化扰动鲁棒的模型。我们将GSAT表述为一个非凸-凹极小极大优化问题,其目标是最小化一个具有群组结构的最优传输代价。具体而言,我们重点研究GSAT在组稀疏扰动和秩约束扰动中的应用,分别用组范数和核范数惩罚来建模。为了求解这些情形下GSAT的非光滑优化问题,我们结合梯度下降上升法(GDA)和交替方向乘子法(ADMM),提出了一种新的极小极大优化算法GDADMM。我们给出了GSAT框架在图像识别和计算生物学数据集上获得抗结构化扰动鲁棒性的若干应用。 摘要:Robust training methods against perturbations to the input data have received great attention in the machine learning literature. A standard approach in this direction is adversarial training which learns a model using adversarially-perturbed training samples. However, adversarial training performs suboptimally against perturbations structured across samples such as universal and group-sparse shifts that are commonly present in biological data such as gene expression levels of different tissues. In this work, we seek to close this optimality gap and introduce Group-Structured Adversarial Training (GSAT) which learns a model robust to perturbations structured across samples. We formulate GSAT as a non-convex concave minimax optimization problem which minimizes a group-structured optimal transport cost. Specifically, we focus on the applications of GSAT for group-sparse and rank-constrained perturbations modeled using group and nuclear norm penalties. In order to solve GSAT's non-smooth optimization problem in those cases, we propose a new minimax optimization algorithm called GDADMM by combining Gradient Descent Ascent (GDA) and Alternating Direction Method of Multipliers (ADMM). We present several applications of the GSAT framework to gain robustness against structured perturbations for image recognition and computational biology datasets.

【76】 Dependency Structure Misspecification in Multi-Source Weak Supervision Models 标题:多源弱监督模型中的依赖结构误设

作者:Salva Rühling Cachay,Benedikt Boecking,Artur Dubrawski 机构:Carnegie Mellon University 备注:Oral presentation at the Workshop on Weakly Supervised Learning at ICLR 2021 链接:https://arxiv.org/abs/2106.10302 摘要:数据编程(DP)已被证明是昂贵的人工数据标注的一个有吸引力的替代方案。在DP中,用户将领域知识编码为标注函数(labeling functions, LF),即对数据的一个子集进行有噪声标注的启发式规则,它们之间可能存在复杂的依赖关系。然后对这些LF拟合一个标签模型,以产生对未知类别标签的估计。标签模型误设对下游分类器在测试集上性能的影响目前研究不足。这给实践者带来了严重的认知盲区,尤其是因为在DP的实际应用中,LF之间的依赖结构常常被忽略。我们分析了由结构过度指定导致的建模误差,推导了关于该建模误差的新的理论界,并通过实验证明,即使所建模的结构看似合理,这种误差也可能相当可观。 摘要:Data programming (DP) has proven to be an attractive alternative to costly hand-labeling of data. In DP, users encode domain knowledge into \emph{labeling functions} (LF), heuristics that label a subset of the data noisily and may have complex dependencies. A label model is then fit to the LFs to produce an estimate of the unknown class label. The effects of label model misspecification on test set performance of a downstream classifier are understudied. This presents a serious awareness gap to practitioners, in particular since the dependency structure among LFs is frequently ignored in field applications of DP. We analyse modeling errors due to structure over-specification. We derive novel theoretical bounds on the modeling error and empirically show that this error can be substantial, even when modeling a seemingly sensible structure.

【77】 Adaptive Group Testing on Networks with Community Structure 标题:社区结构网络的自适应分组测试

作者:Surin Ahn,Wei-Ning Chen,Ayfer Ozgur 机构:Department of Electrical Engineering, Stanford University 备注:26 pages, 5 figures, to be presented in part at the 2021 IEEE International Symposium on Information Theory (ISIT) 链接:https://arxiv.org/abs/2101.02405 摘要:自组测试(group testing)问题在第二次世界大战中诞生以来,其概率变体中的一个普遍假设是:人群中的个体彼此独立地感染某种疾病。然而,这一假设在实践中很少成立,因为疾病通常通过个体间的相互作用传播,从而导致感染相互关联。受COVID-19及类似疾病特征的启发,我们考虑一种网络上的感染模型,它推广了概率组测试中传统的i.i.d.模型。在该感染模型下,我们追问能否利用网络结构的知识更高效地进行组测试,并特别关注从随机块模型中抽取的具有社区结构的图。我们证明,当网络和感染参数有利于"强社区结构"时,我们提出的自适应的、图感知的算法优于基线的二分拆分(binary splitting)算法,并且在某些参数区间内是阶最优的。我们用数值模拟支持了这些结果。 摘要:Since the inception of the group testing problem in World War II, one of the prevailing assumptions in the probabilistic variant of the problem has been that individuals in the population are infected by a disease independently. However, this assumption rarely holds in practice, as diseases typically spread through interactions between individuals and therefore cause infections to be correlated. Inspired by characteristics of COVID-19 and similar diseases, we consider an infection model over networks which generalizes the traditional i.i.d. model from probabilistic group testing. Under this infection model, we ask whether knowledge of the network structure can be leveraged to perform group testing more efficiently, focusing specifically on community-structured graphs drawn from the stochastic block model. We prove that when the network and infection parameters are conducive to "strong community structure," our proposed adaptive, graph-aware algorithm outperforms the baseline binary splitting algorithm, and is even order-optimal in certain parameter regimes. We support our results with numerical simulations.
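文中的基线是组测试里经典的二分拆分算法,其最小实现如下。真实感染集合与"无错误的池化检测"为演示用假设;论文讨论的是检测有噪、感染在网络上相关的情形。另外,该函数假定调用时整组已检测为阳性:

```python
def binary_split(population, is_positive_pool):
    """对一个已知检测为阳性的组递归二分,定位其中所有感染者。"""
    if len(population) == 1:
        return list(population)
    mid = len(population) // 2
    left, right = population[:mid], population[mid:]
    found = []
    if is_positive_pool(left):
        found += binary_split(left, is_positive_pool)
    if is_positive_pool(right):
        found += binary_split(right, is_positive_pool)
    return found

infected = {3, 17, 42}
test = lambda group: any(i in infected for i in group)   # 理想化的无错误池化检测
print(binary_split(list(range(64)), test))               # -> [3, 17, 42]
```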
