Statistics Academic Digest [12.9]

2021-12-09 20:28:24

stat (Statistics): 36 papers in total

【1】 Optimistic Rates: A Unifying Theory for Interpolation Learning and Regularization in Linear Regression. Link: https://arxiv.org/abs/2112.04470

Authors: Lijia Zhou, Frederic Koehler, Danica J. Sutherland, Nathan Srebro. Abstract: We study a localized notion of uniform convergence known as an "optimistic rate" (Panchenko 2002; Srebro et al. 2010) for linear regression with Gaussian data. Our refined analysis avoids the hidden constant and logarithmic factor in existing results, which are known to be crucial in high-dimensional settings, especially for understanding interpolation learning. As a special case, our analysis recovers the guarantee from Koehler et al. (2021), which tightly characterizes the population risk of low-norm interpolators under the benign overfitting conditions. Our optimistic rate bound, though, also analyzes predictors with arbitrary training error. This allows us to recover some classical statistical guarantees for ridge and LASSO regression under random designs, and helps us obtain a precise understanding of the excess risk of near-interpolators in the over-parameterized regime.

【2】 Consistency of Spectral Seriation. Link: https://arxiv.org/abs/2112.04408

Authors: Amine Natik, Aaron Smith. Abstract: Consider a random graph $G$ of size $N$ constructed according to a \textit{graphon} $w \,:\, [0,1]^{2} \mapsto [0,1]$ as follows. First embed $N$ vertices $V = \{v_1, v_2, \ldots, v_N\}$ into the interval $[0,1]$, then for each $i < j$ add an edge between $v_{i}, v_{j}$ with probability $w(v_{i}, v_{j})$. Given only the adjacency matrix of the graph, we might expect to be able to approximately reconstruct the permutation $\sigma$ for which $v_{\sigma(1)} < \ldots < v_{\sigma(N)}$ if $w$ satisfies the following \textit{linear embedding} property introduced in [Janssen 2019]: for each $x$, $w(x,y)$ decreases as $y$ moves away from $x$. For a large and non-parametric family of graphons, we show that (i) the popular spectral seriation algorithm [Atkins 1998] provides a consistent estimator $\hat{\sigma}$ of $\sigma$, and (ii) a small amount of post-processing results in an estimate $\tilde{\sigma}$ that converges to $\sigma$ at a nearly-optimal rate, both as $N \rightarrow \infty$.

【3】 Matching for causal effects via multimarginal optimal transport. Link: https://arxiv.org/abs/2112.04398

Authors: Florian Gunsilius, Yuliang Xu. Note: Main text is 22 pages, 4 figures, and 15 pages of Appendix. Abstract: Matching on covariates is a well-established framework for estimating causal effects in observational studies. The principal challenge in these settings stems from the often high-dimensional structure of the problem. Many methods have been introduced to deal with this challenge, with different advantages and drawbacks in computational and statistical performance and interpretability. Moreover, the methodological focus has been on matching two samples in binary treatment scenarios, but a dedicated method that can optimally balance samples across multiple treatments has so far been unavailable. This article introduces a natural optimal matching method based on entropy-regularized multimarginal optimal transport that possesses many useful properties to address these challenges. It provides interpretable weights of matched individuals that converge at the parametric rate to the optimal weights in the population, can be efficiently implemented via the classical iterative proportional fitting procedure, and can even match several treatment arms simultaneously. It also possesses demonstrably excellent finite sample properties.
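
In the entropy-regularized setting, the iterative proportional fitting procedure mentioned in the abstract coincides with the Sinkhorn algorithm. Below is a minimal two-marginal sketch in Python; the paper's method is multimarginal and matches several treatment arms at once, and the toy data, cost, and regularization strength here are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=0.1, n_iter=500):
    """Entropy-regularized optimal transport between histograms mu and nu
    with cost matrix C, solved by iterative proportional fitting (Sinkhorn)."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(n_iter):
        v = nu / (K.T @ u)               # enforce column marginals
        u = mu / (K @ v)                 # enforce row marginals
    return u[:, None] * K * v[None, :]   # coupling = matching weights

# Toy example: match a treated sample to a control sample in one dimension.
rng = np.random.default_rng(0)
x, y = rng.normal(0.0, 1.0, 8), rng.normal(0.5, 1.0, 10)
C = (x[:, None] - y[None, :]) ** 2       # squared-distance cost
P = sinkhorn(np.full(8, 1 / 8), np.full(10, 1 / 10), C)
print(P.sum(axis=1))                     # rows sum to the first marginal
```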

【4】 Robust parameter estimation of regression model with weaker moment. Link: https://arxiv.org/abs/2112.04358

Authors: Kangqiang Li, Songqiao Tang, Lixin Zhang. Note: 18 pages. Abstract: This paper provides some extended results on estimating the parameter matrix of a high-dimensional regression model when the covariate or response possesses a weaker moment condition. We investigate the $M$-estimator of Fan et al. (Ann Stat 49(3):1239--1266, 2021) for the matrix completion model with $(1+\epsilon)$-th moments. The corresponding phase transition phenomenon is observed. When $\epsilon \geq 1$, the robust estimator possesses the same convergence rate as in previous literature. When $1 > \epsilon > 0$, the rate becomes slower. For the high-dimensional multiple index coefficient model, we also apply the element-wise truncation method to construct a robust estimator that handles missing and heavy-tailed data with finite fourth moments.
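
Element-wise truncation is a standard robustification device: cap each entry at a threshold before averaging, so heavy tails cannot dominate the estimate. A minimal numpy sketch; the threshold rate below is a common theory-driven choice assumed for illustration, and the paper's estimator and tuning are more involved.

```python
import numpy as np

def truncate(X, tau):
    """Element-wise truncation: entries with |x| > tau are shrunk back to the
    threshold, bounding the influence of heavy-tailed observations."""
    return np.sign(X) * np.minimum(np.abs(X), tau)

# Heavy-tailed (Student-t, 3 d.o.f.) data: compare plain and truncated means.
rng = np.random.default_rng(1)
X = rng.standard_t(df=3, size=(1000, 5))
n = X.shape[0]
tau = np.sqrt(n / np.log(n))             # an illustrative theory-style rate
print(X.mean(axis=0))
print(truncate(X, tau).mean(axis=0))
```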

【5】 Estimation in Rotationally Invariant Generalized Linear Models via Approximate Message Passing. Link: https://arxiv.org/abs/2112.04330

Authors: Ramji Venkataramanan, Kevin Kögler, Marco Mondelli. Note: 31 pages, 4 figures. Abstract: We consider the problem of signal estimation in generalized linear models defined via rotationally invariant design matrices. Since these matrices can have an arbitrary spectral distribution, this model is well suited to capture complex correlation structures which often arise in applications. We propose a novel family of approximate message passing (AMP) algorithms for signal estimation, and rigorously characterize their performance in the high-dimensional limit via a state evolution recursion. Assuming knowledge of the design matrix spectrum, our rotationally invariant AMP has complexity of the same order as the existing AMP for Gaussian matrices; it also recovers the existing AMP as a special case. Numerical results showcase a performance close to Vector AMP (which is conjectured to be Bayes-optimal in some settings), but obtained with a much lower complexity, as the proposed algorithm does not require a computationally expensive singular value decomposition.
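
For intuition, here is the classical AMP iteration for i.i.d. Gaussian designs with a soft-thresholding denoiser, which the abstract says the proposed rotationally invariant AMP recovers as a special case. The Onsager correction in the residual update is what distinguishes AMP from plain iterative thresholding; the compressed-sensing toy problem and all tuning constants are illustrative assumptions.

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def amp(A, y, t=1.0, n_iter=30):
    """Classical AMP for y = A x + noise with i.i.d. Gaussian A and a
    soft-thresholding denoiser (the Gaussian special case)."""
    n, p = A.shape
    x, z = np.zeros(p), y.copy()
    for _ in range(n_iter):
        tau = np.linalg.norm(z) / np.sqrt(n)          # empirical noise level
        x_new = soft(x + A.T @ z, t * tau)
        onsager = (z / n) * np.count_nonzero(x_new)   # Onsager correction
        z = y - A @ x_new + onsager
        x = x_new
    return x

rng = np.random.default_rng(2)
n, p = 200, 400
A = rng.normal(0.0, 1.0 / np.sqrt(n), (n, p))
x0 = np.zeros(p); x0[:20] = rng.normal(size=20)       # 20-sparse signal
x_hat = amp(A, A @ x0 + 0.01 * rng.normal(size=n))
print(np.linalg.norm(x_hat - x0) / np.linalg.norm(x0))  # relative error
```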

【6】 Multiway Ensemble Kalman Filter. Link: https://arxiv.org/abs/2112.04322

Authors: Yu Wang, Alfred Hero. Note: Appeared in the NeurIPS'21 Workshop on Machine Learning and the Physical Sciences. Abstract: In this work, we study the emergence of sparsity and multiway structures in second-order statistical characterizations of dynamical processes governed by partial differential equations (PDEs). We consider several state-of-the-art multiway covariance and inverse covariance (precision) matrix estimators and examine their pros and cons in terms of accuracy and interpretability in the context of physics-driven forecasting when incorporated into the ensemble Kalman filter (EnKF). In particular, we show that multiway data generated from the Poisson and the convection-diffusion types of PDEs can be accurately tracked via EnKF when integrated with appropriate covariance and precision matrix estimators.
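
The EnKF analysis step is where a covariance estimator enters: the forecast ensemble's sample covariance drives the Kalman gain, and that is the slot where the paper's multiway (sparse/structured) estimators would be substituted. Below is a minimal stochastic (perturbed-observation) EnKF update with a plain sample covariance as a stand-in; dimensions and noise levels are illustrative.

```python
import numpy as np

def enkf_update(X, y, H, R, rng):
    """Stochastic EnKF analysis step. X is an (n_state, n_ens) forecast
    ensemble, y the observation, H the observation operator, R the
    observation noise covariance."""
    n_ens = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)
    P = Xc @ Xc.T / (n_ens - 1)                    # sample forecast covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, n_ens).T
    return X + K @ (Y - H @ X)                     # perturbed-observation update

rng = np.random.default_rng(3)
X = rng.normal(size=(4, 50))                       # 4 states, 50 members
H, R = np.eye(2, 4), 0.1 * np.eye(2)
X_a = enkf_update(X, np.array([1.0, -1.0]), H, R, rng)
print(X_a.mean(axis=1))                            # analysis ensemble mean
```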

【7】 Bayesian Modeling of Effective and Functional Brain Connectivity using Hierarchical Vector Autoregressions. Link: https://arxiv.org/abs/2112.04249

Authors: Bertil Wegmann, Anders Lundquist, Anders Eklund, Mattias Villani. Note: 21 pages, 5 figures. Abstract: Analysis of brain connectivity is important for understanding how information is processed by the brain. We propose a novel Bayesian vector autoregression (VAR) hierarchical model for analyzing brain connectivity in a resting-state fMRI data set with autism spectrum disorder (ASD) patients and healthy controls. Our approach models functional and effective connectivity simultaneously, which is new in the VAR literature for brain connectivity, and allows for both group- and single-subject inference as well as group comparisons. We combine analytical marginalization with Hamiltonian Monte Carlo (HMC) to obtain highly efficient posterior sampling. The results from more simplified covariance settings are, in general, overly optimistic about functional connectivity between regions compared to our results. In addition, our modeling of heterogeneous subject-specific covariance matrices is shown to give smaller differences in effective connectivity compared to models with a common covariance matrix for all subjects.
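
In a VAR, "effective" connectivity is usually read off the autoregressive coefficient matrix (directed, lagged influence), while "functional" connectivity lives in the innovation covariance (undirected, contemporaneous). A bare-bones VAR(1) least squares fit on simulated data makes the two objects concrete; the paper's model is hierarchical and Bayesian, estimated with HMC rather than OLS.

```python
import numpy as np

rng = np.random.default_rng(4)
T, d = 500, 3
B_true = np.array([[0.5, 0.2, 0.0],
                   [0.0, 0.4, 0.1],
                   [0.0, 0.0, 0.3]])               # directed (effective) links
X = np.zeros((T, d))
for t in range(1, T):
    X[t] = X[t - 1] @ B_true.T + rng.normal(0, 0.5, d)

Y, Z = X[1:], X[:-1]
B_hat = np.linalg.lstsq(Z, Y, rcond=None)[0].T     # effective connectivity
Sigma_hat = np.cov((Y - Z @ B_hat.T).T)            # functional connectivity
print(np.round(B_hat, 2))
```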

【8】 Hybrid Data-driven Framework for Shale Gas Production Performance Analysis via Game Theory, Machine Learning and Optimization Approaches. Link: https://arxiv.org/abs/2112.04243

Authors: Jin Menga, Yujie Zhou, Tianrui Ye, Yitian Xiao. Note: 37 pages, 15 figures, 6 tables. Abstract: A comprehensive and precise analysis of shale gas production performance is crucial for evaluating resource potential, designing a field development plan, and making investment decisions. However, quantitative analysis can be challenging because production performance is dominated by a complex interaction among a series of geological and engineering factors. In this study, we propose a hybrid data-driven procedure for analyzing shale gas production performance, which consists of a complete workflow for dominant factor analysis, production forecast, and development optimization. More specifically, game theory and machine learning models are coupled to determine the dominant geological and engineering factors. The Shapley value, with definite physical meaning, is employed to quantitatively measure the effects of individual factors. A multi-model-fused stacked model is trained for production forecast, on the basis of which derivative-free optimization algorithms are introduced to optimize the development plan. The complete workflow is validated with actual production data collected from the Fuling shale gas field, Sichuan Basin, China. The validation results show that the proposed procedure can draw rigorous conclusions with quantified evidence and thereby provide specific and reliable suggestions for development plan optimization. Compared with traditional and experience-based approaches, the hybrid data-driven procedure is advanced in terms of both efficiency and accuracy.

【9】 Determinantal shot noise Cox processes. Link: https://arxiv.org/abs/2112.04204

Authors: Jesper Møller, Ninna Vihrs. Note: 13 pages, 5 figures. Abstract: We present a new class of cluster point process models, which we call determinantal shot noise Cox processes (DSNCP), with repulsion between cluster centres. They are the special case of generalized shot noise Cox processes where the cluster centres form a determinantal point process. We establish various moment results and describe how these can be used to easily estimate unknown parameters in two particularly tractable cases, namely when the offspring density is isotropic Gaussian and the kernel of the determinantal point process of cluster centres is Gaussian or as in a scaled Ginibre point process. Through the analysis of a real point pattern data set we see that, when modelling clustered point patterns, a much lower intensity of cluster centres may be needed in DSNCP models as compared to shot noise Cox processes.
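
For orientation: a shot noise Cox process with Poisson cluster centres and isotropic Gaussian offspring is the classical Thomas process, which is straightforward to simulate. The paper's DSNCP replaces the Poisson centres with a repulsive determinantal point process, whose simulation is more involved and omitted here; all parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
kappa, mu, sigma = 10, 20, 0.03    # centre intensity, mean offspring, spread

# Poisson cluster centres on the unit square (a DPP would go here instead).
centres = rng.uniform(0.0, 1.0, size=(rng.poisson(kappa), 2))
points = np.concatenate([
    c + sigma * rng.normal(size=(rng.poisson(mu), 2)) for c in centres
])
points = points[(points >= 0).all(axis=1) & (points <= 1).all(axis=1)]
print(len(centres), "clusters,", len(points), "points")
```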

【10】 Statistical Inference for Large-dimensional Matrix Factor Model from Least Squares and Huber Loss Points of View. Link: https://arxiv.org/abs/2112.04186

Authors: Yong He, Xinbing Kong, Long Yu, Xinsheng Zhang, Changwei Zhao. Abstract: In this article we focus on large-dimensional matrix factor models and propose estimators of the factor loading matrices and the factor score matrix from the perspective of minimizing a least squares objective function. The resultant estimators turn out to be equivalent to the corresponding projected estimators in Yu et al. (2021), which enjoy the nice property of reducing the magnitudes of the idiosyncratic error components and thereby increasing the signal-to-noise ratio. We derive the convergence rate of the theoretical minimizers under sub-Gaussian tails, instead of the one-step iteration estimators of Yu et al. (2021). Motivated by the least squares formulation, we further consider a robust method for estimating large-dimensional matrix factor models by utilizing the Huber loss function. Theoretically, we derive the convergence rates of the robust estimators of the factor loading matrices under finite fourth moment conditions. We also propose an iterative procedure to estimate the pair of row and column factor numbers robustly. We conduct extensive numerical studies to investigate the empirical performance of the proposed robust methods relative to the state-of-the-art ones. The studies show that the proposed methods perform robustly and much better than existing ones when data are heavy-tailed, while performing comparably to the projected estimators when data are light-tailed; they can therefore be used as a safe replacement for the existing ones. An application to a Fama-French financial portfolios dataset illustrates their empirical usefulness.
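
The Huber loss behind the robust estimators is quadratic near zero and linear in the tails, so large residuals have bounded influence. A scalar location-estimation sketch with scipy; the paper applies the loss inside a matrix factor model, not to a simple mean, and the threshold delta below is an illustrative choice.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def huber(r, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

rng = np.random.default_rng(6)
x = np.concatenate([rng.normal(1.0, 1.0, 200), [50.0, -40.0]])  # two outliers
mean_est = x.mean()
huber_est = minimize_scalar(lambda m: huber(x - m).sum()).x
print(mean_est, huber_est)   # the Huber estimate stays near the true mean 1.0
```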

【11】 A semi-group approach to Principal Component Analysis. Link: https://arxiv.org/abs/2112.04026

Authors: Martin Schlather, Felix Reinbott. Abstract: Principal Component Analysis (PCA) is a well-known procedure to reduce the intrinsic complexity of a dataset, essentially through simplifying the covariance or correlation structure. We introduce a novel algebraic, model-based point of view and provide, in particular, an extension of the PCA to distributions without second moments by formulating the PCA as a best low rank approximation problem. In contrast to hitherto existing approaches, the approximation is based on a kind of spectral representation, and not on the real space. Nonetheless, the prominent role of the eigenvectors is here reduced to defining the approximating surface and its maximal dimension. In this perspective, our approach is close to the original idea of Pearson (1901) and hence to autoencoders. Since variable selection in linear regression can be seen as a special case of our extension, our approach gives some insight into why the various variable selection methods, such as forward selection and best subset selection, cannot be expected to coincide. The linear regression model itself and PCA regression appear as limit cases.
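
The "best low rank approximation" formulation is the classical Eckart-Young view of PCA: truncating the SVD of the centered data matrix minimizes reconstruction error among all matrices of that rank. A minimal numpy illustration of this baseline; the paper's extension to distributions without second moments goes beyond it.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 10))  # intrinsic rank 3
Xc = X - X.mean(axis=0)                                   # center the data

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3
X_hat = U[:, :k] * s[:k] @ Vt[:k]    # best rank-k approximation (Eckart-Young)
print(np.linalg.norm(Xc - X_hat))    # ~0: rank-3 data is recovered exactly
```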

【12】 Convergence rate bounds for iterative random functions using one-shot coupling. Link: https://arxiv.org/abs/2112.03982

Authors: Sabrina Sixta, Jeffrey S. Rosenthal. Abstract: One-shot coupling is a method of bounding the convergence rate between two copies of a Markov chain in total variation distance. The method is divided into two parts: the contraction phase, when the chains converge in expected distance, and the coalescing phase, which occurs at the last iteration, when there is an attempt to couple. The method closely resembles the common random number technique used for simulation. In this paper, we present a general theorem for finding the upper bound on the Markov chain convergence rate that uses the one-shot coupling method. Our theorem does not require the use of any exogenous variables like a drift function or minorization constant. We then apply the general theorem to two families of Markov chains: the random functional autoregressive process and the randomly scaled iterated random function. We provide multiple examples of how the theorem can be used on various models, including ones in high dimensions. These examples illustrate how the theorem's conditions can be verified in a straightforward way. The one-shot coupling method appears to generate tight geometric convergence rate bounds.
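
The contraction phase is easy to see with the common random number technique mentioned in the abstract: drive two copies of the chain from different starting points with the same innovations and watch the distance shrink geometrically. A toy autoregressive sketch; one-shot coupling additionally attempts an exact coupling at the final iteration, which is omitted here.

```python
import numpy as np

# Two copies of the random functional autoregressive process
# X' = 0.5 * X + Z, driven by common random numbers.
rng = np.random.default_rng(8)
x, y = 10.0, -10.0
for t in range(30):
    z = rng.normal()               # the SAME innovation enters both copies
    x, y = 0.5 * x + z, 0.5 * y + z
print(abs(x - y), 0.5**30 * 20.0)  # the gap contracts exactly like 0.5^t
```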

【13】 Stabilized Direct Learning for Efficient Estimation of Individualized Treatment Rules. Link: https://arxiv.org/abs/2112.03981

Authors: Kushal S. Shah, Haoda Fu, Michael R. Kosorok. Abstract: In recent years, the field of precision medicine has seen many advancements. Significant focus has been placed on creating algorithms to estimate individualized treatment rules (ITR), which map from patient covariates to the space of available treatments with the goal of maximizing patient outcome. Direct Learning (D-Learning) is a recent one-step method which estimates the ITR by directly modeling the treatment-covariate interaction. However, when the variance of the outcome is heterogeneous with respect to treatment and covariates, D-Learning does not leverage this structure. Stabilized Direct Learning (SD-Learning), proposed in this paper, utilizes potential heteroscedasticity in the error term through a residual reweighting which models the residual variance via flexible machine learning algorithms such as XGBoost and random forests. We also develop an internal cross-validation scheme which determines the best residual model amongst competing models. SD-Learning improves the efficiency of D-Learning estimates in binary and multi-arm treatment scenarios. The method is simple to implement and an easy way to improve existing algorithms within the D-Learning family, including original D-Learning, Angle-based D-Learning (AD-Learning), and Robust D-Learning (RD-Learning). We provide theoretical properties and justification of the optimality of SD-Learning. Head-to-head performance comparisons with D-Learning methods are provided through simulations, which demonstrate improvement in terms of average prediction error (APE), misclassification rate, and empirical value, along with a data analysis of an AIDS randomized clinical trial.
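
The stabilization step (fit a flexible model to the squared residuals, then reweight observations by the inverse fitted variance) can be sketched on a toy heteroscedastic regression with scikit-learn. This only illustrates the reweighting mechanism; the paper applies it inside the D-Learning objective for treatment rules, not to plain regression.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(9)
X = rng.uniform(-2, 2, size=(500, 2))
y = X[:, 0] + (0.2 + np.abs(X[:, 1])) * rng.normal(size=500)  # heteroscedastic

resid = y - LinearRegression().fit(X, y).predict(X)
var_hat = RandomForestRegressor(n_estimators=100, random_state=0) \
    .fit(X, resid**2).predict(X)                  # flexible residual variance
w = 1.0 / np.maximum(var_hat, 1e-6)               # inverse-variance weights
model = LinearRegression().fit(X, y, sample_weight=w)  # stabilized refit
print(model.coef_)
```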

【14】 Change-point Detection for Piecewise Exponential Models. Link: https://arxiv.org/abs/2112.03962

Authors: Philip Cooney, Arthur White. Note: 21 pages, 3 figures. Abstract: In decision modelling with time-to-event data, parametric models are often used to extrapolate the survivor function. One such model is the piecewise exponential model, whereby the hazard function is partitioned into segments, with the hazard constant within each segment and independent between segments; the boundaries of these segments are known as change-points. We present an approach for determining the location and number of change-points in piecewise exponential models. Inference is performed in a Bayesian framework using Markov Chain Monte Carlo (MCMC), where the model parameters can be integrated out of the model and the number of change-points can be sampled as part of the MCMC scheme. We can estimate both the uncertainty in the change-point locations and hazards for a given change-point model and obtain a probabilistic interpretation for the number of change-points. We evaluate model performance in determining change-point numbers and locations in a simulation study and show the utility of the method using two time-to-event data sets. In a dataset of glioblastoma patients, we use the piecewise exponential model to describe the general trends in the hazard function. In a dataset of heart transplant patients, we show that the piecewise exponential model produces the best statistical fit and extrapolation among standard parametric models. Piecewise exponential models may be useful for survival extrapolation if a long-term constant hazard trend is clinically plausible. A key advantage of this method is that the number and locations of change-points are automatically estimated rather than specified by the analyst.
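
The model's building block is the piecewise exponential likelihood: each subject contributes its exposure time in every hazard segment, plus a log-hazard term if its event is observed. A compact numpy sketch of that log-likelihood for fixed change-points; sampling the number and locations of change-points by MCMC, as the paper does, is not shown.

```python
import numpy as np

def pe_loglik(times, events, cuts, hazards):
    """Log-likelihood of a piecewise exponential survival model. `cuts` holds
    the change-points and `hazards` the constant hazard in each of the
    len(cuts) + 1 segments; `events` is 1 for an event, 0 if censored."""
    edges = np.concatenate([[0.0], cuts])
    widths = np.append(np.diff(edges), np.inf)
    exposure = np.clip(times[:, None] - edges[None, :], 0.0, widths[None, :])
    seg = np.searchsorted(cuts, times, side="right")   # segment of each time
    return (events * np.log(hazards[seg])).sum() - (exposure @ hazards).sum()

rng = np.random.default_rng(10)
t = rng.exponential(1.0, size=200)                     # constant-hazard data
print(pe_loglik(t, np.ones(200), np.array([1.0]), np.array([1.0, 1.0])))
```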

【15】 A causal approach to functional mediation analysis with application to a smoking cessation intervention. Link: https://arxiv.org/abs/2112.03960

Authors: Donna L. Coffman, John J. Dziak, Kaylee Litson, Yajnaseni Chakraborti, Megan E. Piper, Runze Li. Note: 42 pages, 5 figures. Abstract: The increase in the use of mobile and wearable devices now allows dense assessment of mediating processes over time. For example, a pharmacological intervention may have an effect on smoking cessation via reductions in momentary withdrawal symptoms. We define and identify the causal direct and indirect effects in terms of potential outcomes on the mean difference and odds ratio scales, and present a method for estimating and testing the indirect effect of a randomized treatment on a distal binary variable as mediated by the nonparametric trajectory of an intensively measured longitudinal variable (e.g., from ecological momentary assessment). Coverage of a bootstrap test for the indirect effect is demonstrated via simulation. An empirical example is presented based on estimating later smoking abstinence from patterns of craving during smoking cessation treatment. We provide an R package, funmediation, to conveniently apply this technique. We conclude by discussing possible extensions to multiple mediators and directions for future research.

【16】 Case Study: Evaluation of a meta-analysis of the association between soy protein and cardiovascular disease. Link: https://arxiv.org/abs/2112.03945

Authors: S. Stanley Young, Warren B. Kindzierski, Douglas Hawkins, Paul Fogel, Terry Meyer. Note: 23 pages, 5 figures, 3 tables. Abstract: It is well known that claims coming from observational studies most often fail to replicate. Experimental (randomized) trials, where conditions are under researcher control, enjoy a high reputation, and meta-analyses of experimental trials are considered the best possible evidence. Given the irreproducibility crisis, experiments have lately begun to be questioned. There is a need to know the reliability of claims coming from randomized trials. A case study is presented here independently examining a published meta-analysis of randomized trials claiming that soy protein intake improves cardiovascular health. Counting and p-value plotting techniques (standard p-value plot, p-value expectation plot, and volcano plot) are used. Counting (search space) analysis indicates that reported p-values from the meta-analysis could be biased low due to multiple testing and multiple modeling. Plotting techniques used to visualize the behavior of the data set used for the meta-analysis suggest that statistics drawn from the base papers do not satisfy key assumptions of a random-effects meta-analysis, which include using unbiased statistics all drawn from the same population. Also, publication bias is unaddressed in the meta-analysis. The claim that soy protein intake should improve cardiovascular health is not supported by our analysis.
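
A standard p-value plot, one of the techniques named in the abstract, simply plots sorted p-values against rank: under a single global null the points track the diagonal, while an excess of small p-values bends the curve down on the left. A small matplotlib sketch with simulated p-values, illustrating the construction rather than the paper's data.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(11)
p_null = rng.uniform(0, 1, 50)                  # all null hypotheses true
p_mix = np.concatenate([rng.beta(0.3, 5, 15),   # 15 genuine effects...
                        rng.uniform(0, 1, 35)]) # ...among 35 nulls

for p, label in [(p_null, "all null"), (p_mix, "mixture")]:
    plt.plot(np.arange(1, 51) / 51, np.sort(p), marker="o", label=label)
plt.xlabel("rank / (n + 1)")
plt.ylabel("sorted p-value")
plt.legend()
plt.show()
```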

【17】 Enhancing Counterfactual Classification via Self-Training. Link: https://arxiv.org/abs/2112.04461

Authors: Ruijiang Gao, Max Biggs, Wei Sun, Ligong Han. Note: AAAI 2022. Abstract: Unlike traditional supervised learning, in many settings only partial feedback is available. We may only observe outcomes for the chosen actions, but not the counterfactual outcomes associated with other alternatives. Such settings encompass a wide variety of applications including pricing, online marketing and precision medicine. A key challenge is that observational data are influenced by historical policies deployed in the system, yielding a biased data distribution. We approach this task as a domain adaptation problem and propose a self-training algorithm which imputes outcomes with categorical values for finite unseen actions in the observational data to simulate a randomized trial through pseudolabeling, which we refer to as Counterfactual Self-Training (CST). CST iteratively imputes pseudolabels and retrains the model. In addition, we show that an input consistency loss can further improve CST performance, as indicated in recent theoretical analyses of pseudolabeling. We demonstrate the effectiveness of the proposed algorithms on both synthetic and real datasets.

【18】 Mixed Membership Distribution-Free model. Link: https://arxiv.org/abs/2112.04389

Authors: Huan Qing. Note: 15 pages, 7 figures, comments are welcome. Abstract: We consider the problem of detecting latent community information in a mixed membership weighted network, in which nodes have mixed memberships and the edges connecting nodes can be finite real numbers. We propose a general mixed membership distribution-free model for this problem. The model places no distributional constraints on the edges beyond their expected values, and can be viewed as a generalization of several previous models. We use an efficient spectral algorithm to estimate community memberships under the model. We also derive the convergence rate of the proposed algorithm under the model using delicate spectral analysis. We demonstrate the advantages of the mixed membership distribution-free model with applications to small-scale simulated networks when edges follow different distributions.

【19】 Semantic TrueLearn: Using Semantic Knowledge Graphs in Recommendation Systems. Link: https://arxiv.org/abs/2112.04368

Authors: Sahan Bulathwela, María Pérez-Ortiz, Emine Yilmaz, John Shawe-Taylor. Note: Presented at the First International Workshop on Joint Use of Probabilistic Graphical Models and Ontology at the Conference on Knowledge Graph and Semantic Web 2021. Abstract: In informational recommenders, many challenges arise from the need to handle the semantic and hierarchical structure between knowledge areas. This work aims to advance towards building a state-aware educational recommendation system that incorporates semantic relatedness between knowledge topics, propagating latent information across semantically related topics. We introduce a novel learner model that exploits this semantic relatedness between knowledge components in learning resources using the Wikipedia link graph, with the aim to better predict learner engagement and latent knowledge in a lifelong learning scenario. In this sense, Semantic TrueLearn builds a humanly intuitive knowledge representation while leveraging Bayesian machine learning to improve the predictive performance of the educational engagement. Our experiments with a large dataset demonstrate that this new semantic version of the TrueLearn algorithm achieves statistically significant improvements in terms of predictive performance with a simple extension that adds semantic awareness to the model.

【20】 Generalization Error Bounds for Iterative Recovery Algorithms Unfolded as Neural Networks. Link: https://arxiv.org/abs/2112.04364

Authors: Ekkehard Schnoor, Arash Behboodi, Holger Rauhut. Note: 29 pages, 6 figures. Abstract: Motivated by the learned iterative soft thresholding algorithm (LISTA), we introduce a general class of neural networks suitable for sparse reconstruction from few linear measurements. By allowing a wide range of degrees of weight-sharing between the layers, we enable a unified analysis for very different neural network types, ranging from recurrent ones to networks more similar to standard feedforward neural networks. Based on training samples, via empirical risk minimization we aim at learning the optimal network parameters and thereby the optimal network that reconstructs signals from their low-dimensional linear measurements. We derive generalization bounds by analyzing the Rademacher complexity of hypothesis classes consisting of such deep networks, which also take into account the thresholding parameters. We obtain estimates of the sample complexity that essentially depend only linearly on the number of parameters and on the depth. We apply our main result to obtain specific generalization bounds for several practical examples, including different algorithms for (implicit) dictionary learning, and convolutional neural networks.
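
LISTA-style networks unroll the iterative soft-thresholding algorithm (ISTA) into layers whose matrices and thresholds become learnable weights. The plain ISTA iteration that each layer mimics is short; the step size and threshold below are standard textbook defaults, not the learned parameters the paper analyzes.

```python
import numpy as np

def ista(A, y, lam=0.1, n_iter=200):
    """ISTA: a gradient step on the least-squares term followed by soft
    thresholding; unrolling these iterations with learned weights gives
    LISTA-style networks."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = x - (A.T @ (A @ x - y)) / L
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)
    return x

rng = np.random.default_rng(12)
A = rng.normal(size=(50, 100)) / np.sqrt(50)
x0 = np.zeros(100); x0[[3, 30, 70]] = [1.0, -2.0, 1.5]
x_hat = ista(A, A @ x0)
print(np.round(x_hat[[3, 30, 70]], 2))     # active coefficients, with shrinkage
```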

【21】 COSMIC: fast closed-form identification from large-scale data for LTV systems. Link: https://arxiv.org/abs/2112.04355

Authors: Maria Carvalho, Claudia Soares, Pedro Lourenço, Rodrigo Ventura. Abstract: We introduce a closed-form method for identification of discrete-time linear time-variant systems from data, formulating the learning problem as a regularized least squares problem where the regularizer favors smooth solutions within a trajectory. We develop a closed-form algorithm with guarantees of optimality and with a complexity that increases linearly with the number of instants considered per trajectory. The COSMIC algorithm achieves the desired result even in the presence of large volumes of data. Our method solved the problem using two orders of magnitude less computational power than a general-purpose convex solver and was about 3 times faster than a specially designed Stochastic Block Coordinate Descent method. Computational times of our method remained on the order of seconds even for 10k and 100k time instants, where the general-purpose solver crashed. To prove its applicability to real-world systems, we test with a spring-mass-damper system and use the estimated model to find the optimal control path. Our algorithm was applied to both a Low Fidelity and a Functional Engineering Simulator for the Comet Interceptor mission, which requires precise pointing of the on-board cameras in a fast dynamics environment. Thus, this paper provides a fast alternative to classical system identification techniques for linear time-variant systems, while proving to be a solid base for applications in the space industry and a step forward in the incorporation of algorithms that leverage data in such a safety-critical environment.

【22】 Improving the Training of Graph Neural Networks with Consistency Regularization. Link: https://arxiv.org/abs/2112.04319

Authors: Chenhui Zhang, Yufei He, Yukuo Cen, Zhenyu Hou, Jie Tang. Abstract: Graph neural networks (GNNs) have achieved notable success in the semi-supervised learning scenario. The message passing mechanism in graph neural networks helps unlabeled nodes gather supervision signals from their labeled neighbors. In this work, we investigate how consistency regularization, one of the most widely adopted semi-supervised learning methods, can help improve the performance of graph neural networks. We revisit two methods of consistency regularization for graph neural networks. One is simple consistency regularization (SCR), and the other is mean-teacher consistency regularization (MCR). We combine the consistency regularization methods with two state-of-the-art GNNs and conduct experiments on the ogbn-products dataset. With consistency regularization, the performance of state-of-the-art GNNs can be improved by 0.3% on the ogbn-products dataset of the Open Graph Benchmark (OGB), both with and without external data.
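
A minimal form of consistency regularization penalizes disagreement between two stochastic forward passes, on labeled and unlabeled nodes alike. The PyTorch sketch below uses a dropout MLP as a stand-in for a GNN and a squared-difference penalty; it is a generic illustration, not the paper's SCR/MCR code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
# A dropout MLP stands in for the GNN; two stochastic passes should agree.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(),
                      nn.Dropout(0.5), nn.Linear(32, 4))
x = torch.randn(100, 16)              # 100 nodes, 16 features
y = torch.randint(0, 4, (20,))        # labels for the first 20 nodes only

model.train()                         # keep dropout active in both passes
p1 = F.softmax(model(x), dim=-1)
p2 = F.softmax(model(x), dim=-1)
consistency = ((p1 - p2) ** 2).sum(dim=-1).mean()   # disagreement penalty
loss = F.cross_entropy(model(x[:20]), y) + 0.5 * consistency
loss.backward()
print(float(loss))
```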

【23】 Trainability for Universal GNNs Through Surgical Randomness. Link: https://arxiv.org/abs/2112.04314

Authors: Billy Joe Franks, Markus Anders, Marius Kloft, Pascal Schweitzer. Abstract: Message passing neural networks (MPNNs) have provable limitations, which can be overcome by universal networks. However, universal networks are typically impractical. The only exception is random node initialization (RNI), a data augmentation method that results in provably universal networks. Unfortunately, RNI suffers from severe drawbacks such as slow convergence and high sensitivity to changes in hyperparameters. We transfer powerful techniques from the practical world of graph isomorphism testing to MPNNs, resolving these drawbacks. This culminates in individualization-refinement node initialization (IRNI). We replace the indiscriminate and haphazard randomness used in RNI by a surgical incision of only a few random bits at well-selected nodes. Our novel non-intrusive data-augmentation scheme maintains the networks' universality while resolving the trainability issues. We formally prove the claimed universality and corroborate experimentally, on synthetic benchmark sets previously designed explicitly for that purpose, that IRNI overcomes the limitations of MPNNs. We also verify the practical efficacy of our approach on the standard benchmark data sets PROTEINS and NCI1.

【24】 Non parametric estimation of causal populations in a counterfactual scenario. Link: https://arxiv.org/abs/2112.04288

Authors: Celine Beji, Florian Yger, Jamal Atif. Abstract: In causality, estimating the effect of a treatment without confounding inference remains a major issue because it requires assessing the outcome both with and without treatment. Since the two cannot be observed simultaneously, the estimation of potential outcomes remains a challenging task. We propose an innovative approach where the problem is reformulated as a missing data model. The aim is to estimate the hidden distribution of \emph{causal populations}, defined as a function of treatment and outcome. A Causal Auto-Encoder (CAE), enhanced by a prior dependent on treatment and outcome information, assimilates the latent space to the probability distribution of the target populations. The features are reconstructed after being reduced to a latent space and constrained by a mask introduced in the intermediate layer of the network, containing treatment and outcome information.

【25】 Modeling Spatio-Temporal Dynamics in Brain Networks: A Comparison of Graph Neural Network Architectures. Link: https://arxiv.org/abs/2112.04266

Authors: Simon Wein, Alina Schüller, Ana Maria Tomé, Wilhelm M. Malloni, Mark W. Greenlee, Elmar W. Lang. Abstract: Comprehending the interplay between spatial and temporal characteristics of neural dynamics can contribute to our understanding of information processing in the human brain. Graph neural networks (GNNs) provide a new possibility to interpret graph-structured signals like those observed in complex brain networks. In our study we compare different spatio-temporal GNN architectures and study their ability to replicate neural activity distributions obtained in functional MRI (fMRI) studies. We evaluate the performance of the GNN models on a variety of scenarios in MRI studies and also compare it to a VAR model, which is currently predominantly used for directed functional connectivity analysis. We show that by learning localized functional interactions on the anatomical substrate, GNN-based approaches are able to robustly scale to large network studies, even when available data are scarce. By including anatomical connectivity as the physical substrate for information propagation, such GNNs also provide a multimodal perspective on directed connectivity analysis, offering a novel possibility to investigate the spatio-temporal dynamics in brain networks.

【26】 A Fast Algorithm for PAC Combinatorial Pure Exploration. Link: https://arxiv.org/abs/2112.04197

Authors: Noa Ben-David, Sivan Sabato. Note: Full version of a paper accepted to AAAI-22. Abstract: We consider the problem of Combinatorial Pure Exploration (CPE), which deals with finding a combinatorial set of arms with a high reward, when the rewards of individual arms are unknown in advance and must be estimated using arm pulls. Previous algorithms for this problem, while obtaining sample complexity reductions in many cases, are highly computationally intensive, thus making them impractical even for mildly large problems. In this work, we propose a new CPE algorithm in the PAC setting, which is computationally lightweight, and so can easily be applied to problems with tens of thousands of arms. This is achieved since the proposed algorithm requires a very small number of combinatorial oracle calls. The algorithm is based on successive acceptance of arms, along with elimination which is based on the combinatorial structure of the problem. We provide sample complexity guarantees for our algorithm, and demonstrate in experiments its usefulness on large problems, whereas previous algorithms are impractical to run on problems of even a few dozen arms. The code for the algorithms and experiments is provided at https://github.com/noabdavid/csale.

【27】 Aggregation of Pareto optimal models. Link: https://arxiv.org/abs/2112.04161

Authors: Hamed Hamze Bajgiran, Houman Owhadi. Abstract: In statistical decision theory, a model is said to be Pareto optimal (or admissible) if no other model carries less risk for at least one state of nature while presenting no more risk for others. How can you rationally aggregate/combine a finite set of Pareto optimal models while preserving Pareto efficiency? This question is nontrivial because weighted model averaging does not, in general, preserve Pareto efficiency. This paper presents an answer in four logical steps: (1) A rational aggregation rule should preserve Pareto efficiency. (2) Due to the complete class theorem, Pareto optimal models must be Bayesian, i.e., they minimize a risk where the true state of nature is averaged with respect to some prior. Therefore each Pareto optimal model can be associated with a prior, and Pareto efficiency can be maintained by aggregating Pareto optimal models through their priors. (3) A prior can be interpreted as a preference ranking over models: prior $\pi$ prefers model A over model B if the average risk of A is lower than the average risk of B. (4) A rational/consistent aggregation rule should preserve this preference ranking: if both priors $\pi$ and $\pi'$ prefer model A over model B, then the prior obtained by aggregating $\pi$ and $\pi'$ must also prefer A over B. Under these four steps, we show that all rational/consistent aggregation rules are as follows: give each individual Pareto optimal model a weight, introduce a weak order/ranking over the set of Pareto optimal models, and aggregate a finite set of models S as the model associated with the prior obtained as the weighted average of the priors of the highest-ranked models in S. This result shows that all rational/consistent aggregation rules must follow a generalization of hierarchical Bayesian modeling. Following our main result, we present applications to kernel smoothing, time-depreciating models, and voting mechanisms.

【28】 Best Arm Identification under Additive Transfer Bandits. Link: https://arxiv.org/abs/2112.04083

Authors: Ojash Neopane, Aaditya Ramdas, Aarti Singh. Abstract: We consider a variant of the best arm identification (BAI) problem in multi-armed bandits (MAB) in which there are two sets of arms (source and target), and the objective is to determine the best target arm while only pulling source arms. In this paper, we study the setting when, despite the means being unknown, there is a known additive relationship between the source and target MAB instances. We show how our framework covers a range of previously studied pure exploration problems and additionally captures new problems. We propose and theoretically analyze an LUCB-style algorithm to identify an $\epsilon$-optimal target arm with high probability. Our theoretical analysis highlights aspects of this transfer learning problem that do not arise in the typical BAI setup, and yet recovers the LUCB algorithm for single-domain BAI as a special case.

【29】 Image classifiers can not be made robust to small perturbations. Link: https://arxiv.org/abs/2112.04033

Authors: Zheng Dai, David K. Gifford. Abstract: The sensitivity of image classifiers to small perturbations in the input is often viewed as a defect of their construction. We demonstrate that this sensitivity is a fundamental property of classifiers. For any arbitrary classifier over the set of $n$-by-$n$ images, we show that for all but one class it is possible to change the classification of all but a tiny fraction of the images in that class with a tiny modification compared to the diameter of the image space, when measured in any $p$-norm, including the Hamming distance. We then examine how this phenomenon manifests in human visual perception and discuss its implications for the design considerations of computer vision systems.

【30】 Passenger Network Ridership Model Through a BRT System, the case of TransMilenio in Bogotá. Link: https://arxiv.org/abs/2112.04009

Authors: Arturo Argüelles, Juan D. Garcia-Arteaga, Gabriel Villalobos. Abstract: We present a ridership model of individual trajectories of users within a public transport network for which there are several different routes between origin and destination and for which the automatic fare collection data do not include information about the exit station; only card identification and time of entry into the system are recorded. This model is implemented in the case of the Troncal component of TransMilenio, the BRT system that is the backbone of public transport in Bogotá. The granularity of the data allows us to identify the occupation of every bus within the system on any particular day. As a validation, we compare the average bus occupation of two particular days, 17 May 2020 and 1 December 2020.

【31】 SHRIMP: Sparser Random Feature Models via Iterative Magnitude Pruning. Link: https://arxiv.org/abs/2112.04002

Authors: Yuege Xie, Bobby Shi, Hayden Schaeffer, Rachel Ward. Abstract: Sparse shrunk additive models and sparse random feature models have been developed separately as methods to learn low-order functions, where there are few interactions between variables, but neither offers computational efficiency. On the other hand, $\ell_2$-based shrunk additive models are efficient but do not offer feature selection, as the resulting coefficient vectors are dense. Inspired by the success of the iterative magnitude pruning technique in finding lottery tickets of neural networks, we propose a new method -- Sparser Random Feature Models via IMP (ShRIMP) -- to efficiently fit high-dimensional data with inherent low-dimensional structure in the form of sparse variable dependencies. Our method can be viewed as a combined process to construct and find sparse lottery tickets for two-layer dense networks. We explain the observed benefit of SHRIMP through a refined analysis of the generalization error for thresholded Basis Pursuit and resulting bounds on eigenvalues. From function approximation experiments on both synthetic data and real-world benchmark datasets, we show that SHRIMP obtains better than or competitive test accuracy compared to state-of-the-art sparse feature and additive methods such as SRFE-S, SSAM, and SALSA. Meanwhile, SHRIMP performs feature selection with low computational complexity and is robust to the pruning rate, indicating a robustness in the structure of the obtained subnetworks. We gain insight into the lottery ticket hypothesis through SHRIMP by noting a correspondence between our model and weight/neuron subnetworks.
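
The SHRIMP recipe can be sketched in a few lines: fit a ridge regression on random features, repeatedly drop the half of the features with the smallest coefficient magnitudes, and refit on the survivors. All sizes, the feature map, and the pruning schedule below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(13)
X = rng.uniform(-1, 1, size=(300, 10))
y = np.sin(2 * X[:, 0]) + 0.5 * X[:, 1]        # depends on 2 of 10 variables

W = rng.normal(size=(10, 500))                 # random feature directions
bias = rng.uniform(0, 2 * np.pi, 500)
Phi = np.cos(X @ W + bias)                     # random Fourier-style features

keep = np.arange(500)
for _ in range(5):                             # prune half the features/round
    P = Phi[:, keep]
    c = np.linalg.solve(P.T @ P + 1e-3 * np.eye(len(keep)), P.T @ y)
    keep = keep[np.argsort(np.abs(c))[len(keep) // 2:]]  # keep large weights
print(len(keep), "features survive pruning")
```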

【32】 Testing for Causal Influence using a Partial Coherence Statistic. Link: https://arxiv.org/abs/2112.03987

Authors: Louis L. Scharf, Yuan Wang. Abstract: In this paper we explore partial coherence as a tool for evaluating the causal influence of one signal sequence on another. In some cases the signal sequence is sampled from a time- or space-series. The key idea is to establish a connection between questions of causality and questions of partial coherence. Once this connection is established, a scale-invariant partial coherence statistic is used to resolve the question of causality. This coherence statistic is shown to be a likelihood ratio, and its null distribution is shown to be a Wilks Lambda. It may be computed from a composite covariance matrix or from its inverse, the information matrix. Numerical experiments demonstrate the application of partial coherence to the resolution of causality. Importantly, the method is model-free, depending on no generative model for causality.
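
Partial coherence is the frequency-domain counterpart of partial correlation, which, as the abstract notes for coherence, can be read off the inverse covariance (information) matrix. The sketch below computes the simpler, static partial correlation from a precision matrix to show the mechanism; it is a related quantity, not the paper's partial coherence statistic.

```python
import numpy as np

rng = np.random.default_rng(14)
z = rng.normal(size=2000)
x = z + 0.3 * rng.normal(size=2000)   # x and y both depend on z,
y = z + 0.3 * rng.normal(size=2000)   # but not directly on each other

Omega = np.linalg.inv(np.cov(np.vstack([x, y, z])))   # information matrix
pcorr_xy = -Omega[0, 1] / np.sqrt(Omega[0, 0] * Omega[1, 1])
print(np.corrcoef(x, y)[0, 1], pcorr_xy)  # large marginal, near-zero partial
```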

【33】 Posterior linearisation smoothing with robust iterations. Link: https://arxiv.org/abs/2112.03969

Authors: Jakob Lindqvist, Simo Särkkä, Ángel F. García-Fernández, Matti Raitoharju, Lennart Svensson. Abstract: This paper considers the problem of robust iterative Bayesian smoothing in nonlinear state-space models with additive noise using Gaussian approximations. Iterative methods are known to improve smoothed estimates but are not guaranteed to converge, motivating the development of more robust versions of the algorithms. The aim of this article is to present Levenberg-Marquardt (LM) and line-search extensions of the classical iterated extended Kalman smoother (IEKS) as well as the iterated posterior linearisation smoother (IPLS). The IEKS has previously been shown to be equivalent to the Gauss-Newton (GN) method. We derive a similar GN interpretation for the IPLS. Furthermore, we show that an LM extension for both iterative methods can be achieved with a simple modification of the smoothing iterations, enabling algorithms with efficient implementations. Our numerical experiments show the importance of robust methods, in particular for the IEKS-based smoothers. The computationally expensive IPLS-based smoothers are naturally robust but can still benefit from further regularisation.

【34】 Learning Theory Can (Sometimes) Explain Generalisation in Graph Neural Networks. Link: https://arxiv.org/abs/2112.03968

Authors: Pascal Mattia Esser, Leena Chennuru Vankadara, Debarghya Ghoshdastidar. Note: 35th Conference on Neural Information Processing Systems (NeurIPS 2021). Abstract: In recent years, several results in the supervised learning setting suggested that classical statistical learning-theoretic measures, such as VC dimension, do not adequately explain the performance of deep learning models, which prompted a slew of work in the infinite-width and iteration regimes. However, there is little theoretical explanation for the success of neural networks beyond the supervised setting. In this paper we argue that, under some distributional assumptions, classical learning-theoretic measures can sufficiently explain generalization for graph neural networks in the transductive setting. In particular, we provide a rigorous analysis of the performance of neural networks in the context of transductive inference, specifically by analysing the generalisation properties of graph convolutional networks for the problem of node classification. While VC dimension does result in trivial generalisation error bounds in this setting as well, we show that transductive Rademacher complexity can explain the generalisation properties of graph convolutional networks for stochastic block models. We further use the generalisation error bounds based on transductive Rademacher complexity to demonstrate the role of graph convolutions and network architectures in achieving smaller generalisation error and provide insights into when the graph structure can help in learning. The findings of this paper could renew interest in studying generalisation in neural networks in terms of learning-theoretic measures, albeit in specific problems.

【35】 RID-Noise: Towards Robust Inverse Design under Noisy Environments. Link: https://arxiv.org/abs/2112.03912

Authors: Jia-Qi Yang, Ke-Bin Fan, Hao Ma, De-Chuan Zhan. Note: AAAI'22. Abstract: From an engineering perspective, a design should not only perform well in an ideal condition, but should also resist noise. Such a design methodology, namely robust design, has been widely implemented in industry for product quality control. However, classic robust design requires a lot of evaluations for a single design target, and the results of these evaluations cannot be reused for a new target. To achieve data-efficient robust design, we propose Robust Inverse Design under Noise (RID-Noise), which can utilize existing noisy data to train a conditional invertible neural network (cINN). Specifically, we estimate the robustness of a design parameter by its predictability, measured by the prediction error of a forward neural network. We also define a sample-wise weight, which can be used in the maximum weighted likelihood estimation of an inverse model based on a cINN. With visual results from experiments, we clearly justify how RID-Noise works by learning the distribution and robustness from data. Further experiments on several real-world benchmark tasks with noise confirm that our method is more effective than other state-of-the-art inverse design methods. Code and supplementary material are publicly available at https://github.com/ThyrixYang/rid-noise-aaai22.

【36】 Convergence Guarantees for Deep Epsilon Greedy Policy Learning. Link: https://arxiv.org/abs/2112.03376

Authors: Michael Rawson, Radu Balan. Abstract: Policy learning is a quickly growing area. As robotics and computers control day-to-day life, their error rate needs to be minimized and controlled. There are many policy learning methods and provable error rates that accompany them. We show an error (regret) bound and convergence of the Deep Epsilon Greedy method, which chooses actions with a neural network's prediction. In experiments with the real-world dataset MNIST, we construct a nonlinear reinforcement learning problem. We witness how, with either high or low noise, some methods converge and some do not, which agrees with our proof of convergence.
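
The epsilon-greedy scheme the paper analyzes is simple to state: with probability epsilon take a random action, otherwise take the action a learned reward model ranks highest. A contextual-bandit sketch with ridge-style linear predictors standing in for the paper's neural network; all problem sizes and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(15)
n_arms, d, eps = 3, 5, 0.1
theta = rng.normal(size=(n_arms, d))             # true per-arm reward weights
A = np.zeros((n_arms, d, d))                     # per-arm Gram matrices
b = np.zeros((n_arms, d))                        # per-arm response sums

for t in range(2000):
    x = rng.normal(size=d)                       # observed context
    if rng.random() < eps:
        a = int(rng.integers(n_arms))            # explore
    else:
        est = np.stack([np.linalg.solve(A[k] + np.eye(d), b[k])
                        for k in range(n_arms)])
        a = int(np.argmax(est @ x))              # exploit the learned model
    r = theta[a] @ x + 0.1 * rng.normal()        # observed reward
    A[a] += np.outer(x, x)
    b[a] += r * x                                # ridge-style online updates

# estimation error for arm 0's weights, which shrinks with exploration
print(np.round(np.linalg.solve(A[0] + np.eye(d), b[0]) - theta[0], 2))
```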

