统计学学术速递[12.10]

2021-12-10 16:59:05


stat统计学,共计40篇

【1】 Fair Structure Learning in Heterogeneous Graphical Models 标题:异构图形模型中的公平结构学习 链接:https://arxiv.org/abs/2112.05128

作者:Davoud Ataee Tarzanagh,Laura Balzano,Alfred O. Hero 机构:Department of Electrical Engineering and Computer Science, University of Michigan 摘要:当节点具有人口统计属性时,概率图形模型中的社区结构推断可能与公平性约束不一致。某些人口统计数据可能在某些检测到的社区中代表性过高,而在其他社区中代表性不足。本文定义了一种新的$\ell_1$正则化伪似然方法,用于公平图形模型选择。特别是,我们假设在真实的基础图中存在某种社区或集群结构,并试图从数据中学习稀疏无向图及其社区,以便人口群体在社区中得到公平的代表。我们的优化方法使用人口均等公平性定义,但该框架很容易扩展到其他公平性定义。我们分别对连续数据和二进制数据建立了高斯图形模型和伊辛模型的统计一致性,证明了我们的方法能够以高概率恢复图及其公平社区。 摘要:Inference of community structure in probabilistic graphical models may not be consistent with fairness constraints when nodes have demographic attributes. Certain demographics may be over-represented in some detected communities and under-represented in others. This paper defines a novel $\ell_1$-regularized pseudo-likelihood approach for fair graphical model selection. In particular, we assume there is some community or clustering structure in the true underlying graph, and we seek to learn a sparse undirected graph and its communities from the data such that demographic groups are fairly represented within the communities. Our optimization approach uses the demographic parity definition of fairness, but the framework is easily extended to other definitions of fairness. We establish statistical consistency of the proposed method for both a Gaussian graphical model and an Ising model for, respectively, continuous and binary data, proving that our method can recover the graphs and their fair communities with high probability.
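论文的公平图模型代码未随摘要公开;下面用一个极简的 Python 片段示意其中"人口统计均等"这一公平性定义的含义:检查每个社区内各人口群体的占比是否接近其全局占比(数据与函数名均为假设性示例,非论文实现)。

```python
import numpy as np

# 示意:社区划分的"人口统计均等"偏差(假设性玩具数据,非论文实现)。
# 当每个社区内各群体占比都等于全局占比时,偏差为 0。
def parity_gap(communities, groups):
    communities = np.asarray(communities)
    groups = np.asarray(groups)
    global_props = {g: np.mean(groups == g) for g in np.unique(groups)}
    gap = 0.0
    for c in np.unique(communities):
        mask = communities == c
        for g, p in global_props.items():
            gap = max(gap, abs(np.mean(groups[mask] == g) - p))
    return gap

balanced = parity_gap([0, 0, 1, 1], ["a", "b", "a", "b"])    # 完全均衡:0.0
segregated = parity_gap([0, 0, 1, 1], ["a", "a", "b", "b"])  # 完全隔离:0.5
```

论文在学习稀疏图结构的同时将此类偏差作为约束纳入优化。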

【2】 On Convergence of Federated Averaging Langevin Dynamics 标题:关于联邦平均朗之万动力学的收敛性 链接:https://arxiv.org/abs/2112.05120

作者:Wei Deng,Yi-An Ma,Zhao Song,Qian Zhang,Guang Lin 摘要:我们提出了一种用于分布式客户机的不确定性量化和平均预测的联合平均Langevin算法(FA-LD)。特别是,我们推广超越正常后验分布,并考虑一般类的模型。我们为具有非i.i.d数据的强对数凹分布的FA-LD提供了理论保证,并研究了注入噪声和随机梯度噪声、数据的异质性以及不同的学习速率对收敛性的影响。这样的分析有助于优化本地更新的选择,从而最大限度地降低通信成本。对于我们的方法来说,重要的是,在Langevin算法中,注入噪声不会降低通信效率。此外,我们在我们的FA-LD算法中检查了在不同客户机上使用的独立和相关噪声。我们注意到,在联邦和通信成本之间也存在权衡。由于本地设备在联邦网络中可能变得不活动,我们还展示了基于不同平均方案的收敛结果,其中只有部分设备更新可用。 摘要:We propose a federated averaging Langevin algorithm (FA-LD) for uncertainty quantification and mean predictions with distributed clients. In particular, we generalize beyond normal posterior distributions and consider a general class of models. We develop theoretical guarantees for FA-LD for strongly log-concave distributions with non-i.i.d data and study how the injected noise and the stochastic-gradient noise, the heterogeneity of data, and the varying learning rates affect the convergence. Such an analysis sheds light on the optimal choice of local updates to minimize communication costs. Important to our approach is that the communication efficiency does not deteriorate with the injected noise in the Langevin algorithms. In addition, we examine in our FA-LD algorithm both independent and correlated noise used over different clients. We observe that there is also a trade-off between federation and communication cost there. As local devices may become inactive in the federated network, we also show convergence results based on different averaging schemes where only partial device updates are available.
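FA-LD 的思想可以用一个一维玩具例子粗略示意(非论文实现,客户端数、步长等均为假设取值):每个客户端对本地高斯后验做若干步带注入噪声的 Langevin 更新,服务器定期取平均。

```python
import numpy as np

# FA-LD 思想的一维玩具示意(非论文实现):K 个客户端各自对本地高斯后验
# 做带注入噪声的 Langevin 更新,服务器每 local_steps 步做一次平均。
rng = np.random.default_rng(0)
K, local_steps, rounds, eta = 4, 5, 200, 0.1
client_means = np.array([1.0, 2.0, 3.0, 4.0])  # 非同分布(non-i.i.d.)的本地数据
theta = np.zeros(K)                            # 每个客户端一条本地链

samples = []
for _ in range(rounds):
    for _ in range(local_steps):
        grad = client_means - theta            # 本地 log N(mean_k, 1) 的梯度
        theta = theta + 0.5 * eta * grad + np.sqrt(eta) * rng.standard_normal(K)
    theta[:] = theta.mean()                    # 服务器平均并广播
    samples.append(theta[0])

# 丢弃 burn-in 后,链应在全局后验均值 mean(client_means) = 2.5 附近波动
est = np.mean(samples[50:])
```

论文分析的正是此类方案中本地更新步数、注入噪声与异质数据如何共同影响收敛与通信成本。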

【3】 Times Square sampling: an adaptive algorithm for free energy estimation 标题:Times Square采样:一种自由能估计的自适应算法 链接:https://arxiv.org/abs/2112.05109

作者:Cristian Predescu,Michael Snarski,Avi Robinson-Mosher,Duluxan Sritharan,Tamas Szalay,David E. Shaw 机构: D. E. Shaw Research, New York, NY , USA., Department of Biochemistry and Molecular Biophysics, Columbia University, † To whom correspondence should be addressed., E-mail:, Phone:, (,) ,-, Fax: 摘要:估算自由能差是计算药物发现和其他广泛应用领域中的一个重要问题,通常涉及对一系列高维概率分布进行采样的计算密集型过程以及基于这些样本计算估算值的过程。感兴趣的自由能估计的方差通常很大程度上取决于可用于采样的总计算资源如何在分布中分配,但如果不对分布进行采样,则很难确定有效的分配。在这里,我们介绍了Times Square采样算法,这是一种新的动态估计方法,它动态分配资源,从而显著加快自由能和其他观测值的估计,同时为估计量提供严格的收敛保证。我们还表明,令人惊讶的是,动态自由能估计有可能比最大似然估计MBAR获得更低的渐近方差,这提高了动态估计在各种其他统计应用中减少方差的前景。 摘要:Estimating free energy differences, an important problem in computational drug discovery and in a wide range of other application areas, commonly involves a computationally intensive process of sampling a family of high-dimensional probability distributions and a procedure for computing estimates based on those samples. The variance of the free energy estimate of interest typically depends strongly on how the total computational resources available for sampling are divided among the distributions, but determining an efficient allocation is difficult without sampling the distributions. Here we introduce the Times Square sampling algorithm, a novel on-the-fly estimation method that dynamically allocates resources in such a way as to significantly accelerate the estimation of free energies and other observables, while providing rigorous convergence guarantees for the estimators. We also show that it is possible, surprisingly, for on-the-fly free energy estimation to achieve lower asymptotic variance than the maximum-likelihood estimator MBAR, raising the prospect that on-the-fly estimation could reduce variance in a variety of other statistical applications.

【4】 Provable Continual Learning via Sketched Jacobian Approximations 标题:基于草图雅可比近似的可证明连续学习 链接:https://arxiv.org/abs/2112.05095

作者:Reinhard Heckel 机构:∗Dept. of Electrical and Computer Engineering, Technical University of Munich, †Dept. of Electrical and Computer Engineering, Rice University 摘要:机器学习中的一个重要问题是以顺序方式学习任务的能力。如果使用标准的一阶方法进行训练,大多数模型在接受新任务训练时会忘记以前学习过的任务,这通常被称为灾难性遗忘。克服遗忘的一种流行方法是通过惩罚在以前任务中表现不佳的模型来规范损失函数。例如,弹性权重固结(EWC)采用二次形式进行正则化,其中涉及基于过去数据构建的对角矩阵。虽然EWC在某些设置中工作得很好,但我们表明,即使在其他理想条件下,如果对角矩阵与以前任务的Hessian矩阵的近似性较差,它也可能遭受灾难性遗忘。我们提出了一种简单的方法来克服这一问题:用过去数据的雅可比矩阵草图来正则化新任务的训练。这可以证明能够克服线性模型和宽神经网络的灾难性遗忘,而代价是内存。本文的总体目标是提供关于基于正则化的持续学习算法何时工作以及在何种内存成本下工作的见解。 摘要:An important problem in machine learning is the ability to learn tasks in a sequential manner. If trained with standard first-order methods most models forget previously learned tasks when trained on a new task, which is often referred to as catastrophic forgetting. A popular approach to overcome forgetting is to regularize the loss function by penalizing models that perform poorly on previous tasks. For example, elastic weight consolidation (EWC) regularizes with a quadratic form involving a diagonal matrix built based on past data. While EWC works very well for some setups, we show that, even under otherwise ideal conditions, it can provably suffer catastrophic forgetting if the diagonal matrix is a poor approximation of the Hessian matrix of previous tasks. We propose a simple approach to overcome this: Regularizing training of a new task with sketches of the Jacobian matrix of past data. This provably enables overcoming catastrophic forgetting for linear models and for wide neural networks, at the cost of memory. The overarching goal of this paper is to provide insights on when regularization-based continual learning algorithms work and under what memory costs.
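对线性模型,"用过去数据雅可比的随机草图做正则"这一思想可以用 numpy 给出玩具示意(数据与维度均为假设):新任务只激发部分特征时,朴素重训会灾难性遗忘,而沿草图方向惩罚权重移动可保留旧任务性能。

```python
import numpy as np

# 玩具示意(假设性数据):线性模型的持续学习。不存储全部旧数据 X_old,
# 只保留其雅可比(对线性回归即 X_old 本身)的随机草图 A,
# 训练新任务时惩罚权重沿草图方向的移动。
rng = np.random.default_rng(1)
d, m = 5, 10                                   # 权重维数、草图大小
X_old = rng.standard_normal((50, d))
w_true = rng.standard_normal(d)
y_old = X_old @ w_true

w_old = np.linalg.lstsq(X_old, y_old, rcond=None)[0]    # 旧任务的解
A = rng.standard_normal((m, 50)) @ X_old / np.sqrt(m)   # 雅可比的随机草图

X_new = np.zeros((50, d))                      # 新任务只激发前 2 个特征
X_new[:, :2] = rng.standard_normal((50, 2))
y_new = X_new @ w_true

lam = 10.0
G = lam * A.T @ A
w_reg = np.linalg.solve(X_new.T @ X_new + G, X_new.T @ y_new + G @ w_old)
w_naive = np.linalg.lstsq(X_new, y_new, rcond=None)[0]  # 朴素重训(最小范数解)

err_reg = np.mean((X_old @ w_reg - y_old) ** 2)     # 近似为 0:几乎无遗忘
err_naive = np.mean((X_old @ w_naive - y_old) ** 2) # 明显偏大:灾难性遗忘
```

草图大小 m 远小于旧任务样本数,这正是论文所说"以内存为代价"的权衡:存储的是 m×d 的草图而非全部旧数据。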

【5】 Multi-Kink Quantile Regression for Longitudinal Data with Application to the Progesterone Data Analysis 标题:纵向数据的多结分位数回归及其在孕酮数据分析中的应用 链接:https://arxiv.org/abs/2112.05045

作者:Chuang Wan,Wei Zhong,Wenyang Zhang,Changliang Zou 机构: Xiamen University; ,The University of York; , Nankai University 备注:22pages; 3 figures 摘要:通过纵向研究孕酮与月经周期天数之间的关系,我们提出了一个用于纵向数据分析的多扭结分位数回归模型。它放松了线性条件,并在阈值协变量域的不同区域采用不同的回归形式。在本文中,我们首先提出了纵向数据的多扭结分位数回归。提出了两种估计回归系数和扭点位置的方法:一种是在工作独立性框架下的计算效率剖面估计方法,另一种是使用无偏广义估计方程方法考虑受试者内部的相关性。建立了扭结点个数的选择一致性和两个估计量的渐近正态性。其次,针对纵向研究中扭结效应的存在,我们构建了一个基于部分次梯度的秩和检验。推导了检验统计量的零分布和局部替代分布。仿真研究表明,该方法具有良好的有限样本性能。在纵向孕酮数据的应用中,我们在不同分位数的孕酮曲线上确定了两个扭结点,并观察到孕酮水平在排卵前保持稳定,然后在排卵后5至6天迅速增加,然后再次变为稳定甚至略有下降 摘要:Motivated by investigating the relationship between progesterone and the days in a menstrual cycle in a longitudinal study, we propose a multi-kink quantile regression model for longitudinal data analysis. It relaxes the linearity condition and assumes different regression forms in different regions of the domain of the threshold covariate. In this paper, we first propose a multi-kink quantile regression for longitudinal data. Two estimation procedures are proposed to estimate the regression coefficients and the kink points locations: one is a computationally efficient profile estimator under the working independence framework while the other one considers the within-subject correlations by using the unbiased generalized estimation equation approach. The selection consistency of the number of kink points and the asymptotic normality of two proposed estimators are established. Secondly, we construct a rank score test based on partial subgradients for the existence of kink effect in longitudinal studies. Both the null distribution and the local alternative distribution of the test statistic have been derived. Simulation studies show that the proposed methods have excellent finite sample performance. 
In the application to the longitudinal progesterone data, we identify two kink points in the progesterone curves over different quantiles and observe that the progesterone level remains stable before the day of ovulation, then increases quickly in five to six days after ovulation, and then stabilizes again or even drops slightly.
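扭结点的剖面(profile)估计思想可以用一个单扭结的玩具例子示意(假设性数据;为简洁起见内层用最小二乘,论文实际剖面化的是分位数检验损失):

```python
import numpy as np

# 单扭结剖面估计的玩具示意(假设性数据):在候选扭结 k 的网格上拟合
# 连续折线模型 y = b0 + b1*x + b2*max(x-k, 0),取拟合误差最小的 k。
# 为简洁起见内层用最小二乘;论文中剖面化的是分位数检验损失。
rng = np.random.default_rng(2)
n, true_kink = 400, 1.5
x = rng.uniform(0, 3, n)
y = 1.0 + 0.5 * x + 2.0 * np.maximum(x - true_kink, 0) + 0.1 * rng.standard_normal(n)

def profile_rss(k):
    X = np.column_stack([np.ones(n), x, np.maximum(x - k, 0)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((X @ beta - y) ** 2)

grid = np.linspace(0.5, 2.5, 201)
k_hat = grid[np.argmin([profile_rss(k) for k in grid])]   # 应接近 true_kink
```

多扭结与纵向相关结构(论文的 GEE 方案)在此基础上增加更多 max(x-k_j, 0) 项并调整估计方程。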

【6】 Bayesian Functional Data Analysis over Dependent Regions and Its Application for Identification of Differentially Methylated Regions 标题:相依区域的贝叶斯函数数据分析及其在差异甲基化区域识别中的应用 链接:https://arxiv.org/abs/2112.05041

作者:Suvo Chatterjee,Shrabanti Chowdhury,Duchwan Ryu,Sanjib Basu 机构:Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD , USA. 摘要:我们考虑对以极长序列形式观测的数据进行贝叶斯函数型数据分析。将序列拆分为若干长度可控的小窗口后,这些窗口可能并不独立,尤其是彼此相邻时。我们建议利用贝叶斯平滑样条来估计每个窗口中的单个函数模式,并为每个窗口中涉及的参数建立转换模型,以解决窗口之间的依赖结构。在分析中,基于马尔可夫链蒙特卡罗样本的贝叶斯因子可以评估每个窗口中个体组的功能差异。在本文中,我们通过模拟研究检验了所提出的方法,并将其应用于识别TCGA肺腺癌数据中的差异甲基化基因区域。 摘要:We consider a Bayesian functional data analysis for observations measured as extremely long sequences. Splitting the sequence into a number of small windows with manageable length, the windows may not be independent especially when they are neighboring to each other. We propose to utilize Bayesian smoothing splines to estimate individual functional patterns within each window and to establish transition models for parameters involved in each window to address the dependent structure between windows. The functional difference of groups of individuals at each window can be evaluated by Bayes Factor based on Markov Chain Monte Carlo samples in the analysis. In this paper, we examine the proposed method through simulation studies and apply it to identify differentially methylated genetic regions in TCGA lung adenocarcinoma data.

【7】 CoBWeb: a user-friendly web application to estimate causal treatment effects from observational data using multiple algorithms 标题:CoBWeb:一个用户友好的Web应用程序,可以使用多种算法从观测数据中估计因果治疗效果 链接:https://arxiv.org/abs/2112.05035

作者:Andreas Markoulidakis,Peter Holmans,Philip Pallmann,Monica Busse,Beth-Ann Griffin 备注:16 pages 摘要:背景/目的:虽然随机对照试验是测量因果关系的金标准,但如果使用适当的统计技术来解释各组间预处理混杂因素的不平衡性,则可以使用观察性研究的数据得出关于因果关系的可靠结论。倾向评分(PS)和平衡加权是一种有用的技术,旨在减少治疗组之间观察到的不平衡,方法是根据观察到的混杂因素对各组进行尽可能相似的加权。方法:我们创建了CoBWeb,这是一个免费且易于使用的web应用程序,用于根据观察数据评估因果治疗效果,使用PS和平衡权重来控制混淆偏差。CoBWeb使用多种算法来估计PS和平衡权重,以允许治疗指标和观察到的混杂因素之间存在更灵活的关系(因为不同的算法对治疗协变量和混杂因素之间的结构关系做出不同(或无)假设)。可以通过选择在平衡和有效样本量之间实现最佳权衡的算法来选择最优算法。结果:CoBWeb遵循了从观察研究数据稳健估计因果治疗效果所需的所有关键步骤,包括未观察到的混杂因素潜在影响的敏感性分析。我们使用一个数据集来说明应用程序的实际使用,该数据集来自一项针对患有物质使用障碍的青少年的干预研究,该数据集可供应用程序环境中的用户使用。结论:CoBWeb旨在使非专家了解并应用所有关键步骤,以便使用观察数据对因果治疗效果进行稳健估计。 摘要:Background/aims: While randomized controlled trials are the gold standard for measuring causal effects, robust conclusions about causal relationships can be obtained using data from observational studies if proper statistical techniques are used to account for the imbalance of pretreatment confounders across groups. Propensity score (PS) and balance weighting are useful techniques that aim to reduce the observed imbalances between treatment groups by weighting the groups to be as similar as possible with respect to observed confounders. Methods: We have created CoBWeb, a free and easy-to-use web application for the estimation of causal treatment effects from observational data, using PS and balancing weights to control for confounding bias. CoBWeb uses multiple algorithms to estimate the PS and balancing weights, to allow for more flexible relations between the treatment indicator and the observed confounders (as different algorithms make different (or no) assumptions about the structural relationship between the treatment covariate and the confounders). The optimal algorithm can be chosen by selecting the one that achieves the best trade-off between balance and effective sample size. 
Results: CoBWeb follows all the key steps required for robust estimation of the causal treatment effect from observational study data and includes sensitivity analysis of the potential impact of unobserved confounders. We illustrate the practical use of the app using a dataset derived from a study of an intervention for adolescents with substance use disorder, which is available for users within the app environment. Conclusion: CoBWeb is intended to enable non-specialists to understand and apply all the key steps required to perform robust estimation of causal treatment effects using observational data.
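CoBWeb 封装的倾向评分加权流程,其最简单的特例(单个逻辑回归倾向模型 + 逆概率加权)可用 numpy 粗略示意如下(数据与真实效应均为假设,非该应用的实现):

```python
import numpy as np

# 倾向评分加权的最简特例示意(假设性数据,非 CoBWeb 实现):
# 牛顿法拟合逻辑回归倾向模型,再做逆概率加权估计平均处理效应(真值为 2)。
rng = np.random.default_rng(3)
n = 5000
x = rng.standard_normal(n)                  # 混杂因素
p = 1 / (1 + np.exp(-x))                    # 真实倾向依赖于 x
t = rng.binomial(1, p)                      # 处理指示
y = 2.0 * t + x + rng.standard_normal(n)    # 结果:处理效应为 2,受 x 混杂

X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):                         # 逻辑回归的 Newton-Raphson 迭代
    mu = 1 / (1 + np.exp(-X @ beta))
    W = mu * (1 - mu)
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (t - mu))

ps = 1 / (1 + np.exp(-X @ beta))            # 估计的倾向评分
ate = np.mean(t * y / ps) - np.mean((1 - t) * y / (1 - ps))
naive = y[t == 1].mean() - y[t == 0].mean() # 未调整:被混杂向上偏
```

CoBWeb 的价值在于并列多种倾向模型算法,并按"平衡性—有效样本量"权衡选择最优者;上面只是其中逻辑回归这一种算法的手工版本。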

【8】 Measuring Wind Turbine Health Using Drifting Concepts 标题:利用漂移概念测量风力机健康 链接:https://arxiv.org/abs/2112.04933

作者:Agnieszka Jastrzebska,Alejandro Morales-Hernández,Gonzalo Nápoles,Yamisleydi Salgueiro,Koen Vanhoof 机构:Warsaw University of Technology, Poland., Hasselt University, Belgium., Department of Cognitive Science & Artificial Intelligence, Tilburg University, The, Netherlands., Department of Computer Sciences, Universidad de Talca, Campus Curicó, Chile. 摘要:时间序列处理是风力发电机组健康监测的一个重要方面。尽管在这一领域取得了进展,但仍有改进建模质量的新方法的空间。在本文中,我们提出了两种新的风力发电机组健康分析方法。这两种方法都基于抽象概念,使用模糊集实现,模糊集汇总和聚合底层原始数据。通过观察概念的变化,我们推断出涡轮机健康状况的变化。分别针对不同的外部条件(风速和温度)进行分析。我们提取代表相对低、中、高功率生产的概念。第一种方法旨在评估相对高功率和低功率生产的减少或增加。此任务使用类似回归的模型执行。第二种方法评估提取概念的总体漂移。大漂移表明发电过程在时间上会发生波动。使用语言标签标记概念,从而使我们的模型具有改进的可解释性特征。我们应用所提出的方法来处理描述四个风力涡轮机的公开数据。 摘要:Time series processing is an essential aspect of wind turbine health monitoring. Despite the progress in this field, there is still room for new methods to improve modeling quality. In this paper, we propose two new approaches for the analysis of wind turbine health. Both approaches are based on abstract concepts, implemented using fuzzy sets, which summarize and aggregate the underlying raw data. By observing the change in concepts, we infer about the change in the turbine's health. Analyses are carried out separately for different external conditions (wind speed and temperature). We extract concepts that represent relative low, moderate, and high power production. The first method aims at evaluating the decrease or increase in relatively high and low power production. This task is performed using a regression-like model. The second method evaluates the overall drift of the extracted concepts. Large drift indicates that the power production process undergoes fluctuations in time. Concepts are labeled using linguistic labels, thus equipping our model with improved interpretability features. We applied the proposed approach to process publicly available data describing four wind turbines. 
The simulation results have shown that the aging process is not homogeneous in all wind turbines.
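"用模糊概念汇总原始功率数据、再观察概念漂移"这一思路可用如下玩具片段示意(隶属函数形状与数据均为假设,非论文的模型):

```python
import numpy as np

# 玩具示意(假设性隶属函数与数据):用"低/中/高功率"三个模糊概念汇总
# 功率读数,以两个时期概念平均隶属度之差衡量概念漂移。
def memberships(power):
    low = np.clip(1 - power / 0.5, 0, 1)
    high = np.clip((power - 0.5) / 0.5, 0, 1)
    moderate = np.clip(1 - np.abs(power - 0.5) / 0.5, 0, 1)
    return np.column_stack([low, moderate, high])

rng = np.random.default_rng(4)
early = rng.uniform(0.6, 1.0, 500)   # "健康"时期:多为高功率
late = rng.uniform(0.0, 0.4, 500)    # "退化"时期:多为低功率
drift = np.abs(memberships(early).mean(axis=0)
               - memberships(late).mean(axis=0)).max()   # 漂移大 → 状态变化
```

论文在此基础上按风速、温度分层做同类分析,并用语言标签命名概念以增强可解释性。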

【9】 Forecast Evaluation in Large Cross-Sections of Realized Volatility 标题:已实现波动率大横截面中的预测评估 链接:https://arxiv.org/abs/2112.04887

作者:Christis Katsouris 摘要:在本文中,我们采用等预测精度检验程序,研究横截面相依下已实现波动率测度的预测评估。在预测已实现波动率时,我们评估了基于增强横截面的模型的预测精度。在预测精度相等的零假设下,采用的基准模型是标准的HAR模型,而在预测精度不相等的备择假设下,预测模型是通过LASSO收缩估计的增强HAR模型。我们通过结合测量误差修正和横截面跳跃分量测量来研究预测对模型设定的敏感性。模型的样本外预测评估通过数值实现进行评估。 摘要:In this paper, we consider the forecast evaluation of realized volatility measures under cross-section dependence using equal predictive accuracy testing procedures. We evaluate the predictive accuracy of the model based on the augmented cross-section when forecasting Realized Volatility. Under the null hypothesis of equal predictive accuracy the benchmark model employed is a standard HAR model while under the alternative of non-equal predictive accuracy the forecast model is an augmented HAR model estimated via the LASSO shrinkage. We study the sensitivity of forecasts to the model specification by incorporating a measurement error correction as well as cross-sectional jump component measures. The out-of-sample forecast evaluation of the models is assessed with numerical implementations.
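作为背景,基准 HAR 模型将下一日已实现波动率回归到日、周(5 日)、月(22 日)平均分量上;下面用 numpy 构造 HAR 设计矩阵并做最小二乘拟合(序列为模拟占位数据;论文的增强模型还会加入经 LASSO 选择的横截面预测变量):

```python
import numpy as np

# 构造基准 HAR 回归的设计矩阵(模拟的占位 RV 序列):下一日 RV 对
# 日、周(5 日均值)、月(22 日均值)三个分量回归;论文的增强模型
# 在此基础上加入经 LASSO 选择的横截面预测变量。
rng = np.random.default_rng(5)
T = 1000
rv = np.abs(rng.standard_normal(T)) * 0.1 + 0.2   # 占位的已实现波动率序列

def har_design(rv):
    rows = []
    for t in range(22, len(rv) - 1):
        rows.append([1.0, rv[t], rv[t - 4:t + 1].mean(), rv[t - 21:t + 1].mean()])
    return np.array(rows), rv[23:]                 # 目标为下一日 RV

X, y = har_design(rv)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)       # HAR 系数(常数, 日, 周, 月)
```

等预测精度检验随后比较该基准与增强模型的样本外预测损失。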

【10】 Extremes of Markov random fields on block graphs 标题:块图上马尔可夫随机场的极值 链接:https://arxiv.org/abs/2112.04847

作者:Stefka Asenova,Johan Segers 摘要:我们研究与块图相关的马尔可夫随机场或无向图形模型的大值的联合出现。在这样的图上,把树作为特例,我们的目的是推广最近关于马尔可夫树极值的结果。块图中的每一对节点都由唯一的最短路径连接。这些路径用于确定适当重定标随机场的极限分布,前提是固定变量超过高阈值。当块诱导的子向量服从Hüsler-Reiss极值copula时,原始场的全局Markov性质在极限最大稳定Hüsler-Reiss分布的参数矩阵上诱导出一种特殊结构。后者的多元帕累托模型根据原始块图证明是一个极值图形模型。此外,由于这些代数关系,即使某些变量是潜在的,参数仍然是可识别的。 摘要:We study the joint occurrence of large values of a Markov random field or undirected graphical model associated to a block graph. On such graphs, containing trees as special cases, we aim to generalize recent results for extremes of Markov trees. Every pair of nodes in a block graph is connected by a unique shortest path. These paths are shown to determine the limiting distribution of the properly rescaled random field given that a fixed variable exceeds a high threshold. When the sub-vectors induced by the blocks follow Hüsler-Reiss extreme value copulas, the global Markov property of the original field induces a particular structure on the parameter matrix of the limiting max-stable Hüsler-Reiss distribution. The multivariate Pareto version of the latter turns out to be an extremal graphical model according to the original block graph. Moreover, thanks to these algebraic relations, the parameters are still identifiable even if some variables are latent.

【11】 Sampling rate-corrected analysis of irregularly sampled time series 标题:不规则采样时间序列的采样率校正分析 链接:https://arxiv.org/abs/2112.04843

作者:Tobias Braun,Cinthya N. Fernandez,Deniz Eroglu,Adam Hartland,Sebastian F. M. Breitenbach,Norbert Marwan 机构:Potsdam Institute for Climate Impact Research (PIK), Member of the Leibniz Association, Potsdam, Germany, Tel.:, ,-,-, Institute for Geology, Mineralogy and Geophysics, Ruhr-Universität Bochum, Kadir Has University, Istanbul, Turkey 摘要:不规则采样时间序列的分析仍然是一项具有挑战性的任务,需要在不引入额外偏差的情况下考虑采样分辨率的连续和突然变化的方法。编辑距离是通过计算将一段转换为另一段的成本来定量比较长度不等的时间序列段的有效度量。我们表明,转换成本通常与局部采样率呈非平凡关系。如果采样分辨率发生很大的变化,这种影响会妨碍不同时间段之间的无偏比较。我们研究了这种效应对递归量化分析的影响,递归量化分析是一种非常适合于识别非线性时间序列中状态转变的框架。提出了一种约束随机化方法来修正有偏递归量化测度。该策略涉及生成一种新型的时间序列和时间轴替代数据(surrogate),我们称之为采样率约束(SRC)替代数据。我们用一个合成的例子和一个来自热带太平洋中部纽埃岛的不规则采样的石笋代用记录证明了所提出方法的有效性。 摘要:The analysis of irregularly sampled time series remains a challenging task requiring methods that account for continuous and abrupt changes of sampling resolution without introducing additional biases. The edit-distance is an effective metric to quantitatively compare time series segments of unequal length by computing the cost of transforming one segment into the other. We show that transformation costs generally exhibit a non-trivial relationship with local sampling rate. If the sampling resolution undergoes strong variations, this effect impedes unbiased comparison between different time episodes. We study the impact of this effect on recurrence quantification analysis, a framework that is well-suited for identifying regime shifts in nonlinear time series. A constrained randomization approach is put forward to correct for the biased recurrence quantification measures. This strategy involves the generation of a novel type of time series and time axis surrogates which we call sampling rate constrained (SRC) surrogates. We demonstrate the effectiveness of the proposed approach with a synthetic example and an irregularly sampled speleothem proxy record from Niue island in the central tropical Pacific. 
Application of the proposed correction scheme identifies a spurious transition that is solely imposed by an abrupt shift in sampling rate and uncovers periods of reduced seasonal rainfall predictability associated with enhanced ENSO and tropical cyclone activity.
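文中的编辑距离通过"把一段变换为另一段的成本"比较不等长的时间序列段。下面给出经典 Victor-Purpura 型事件序列编辑距离的动态规划实现,以示意这一成本思想(细节与论文使用的改进版编辑距离不同):

```python
def vp_distance(a, b, lam):
    # 事件时间序列 a、b 之间的编辑距离:平移一个事件的成本为 lam*|Δt|,
    # 插入或删除一个事件的成本为 1(Victor-Purpura 型动态规划)。
    n, m = len(a), len(b)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = float(i)
    for j in range(1, m + 1):
        D[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + 1,
                          D[i][j - 1] + 1,
                          D[i - 1][j - 1] + lam * abs(a[i - 1] - b[j - 1]))
    return D[n][m]

same = vp_distance([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], 1.0)   # 0.0
shift = vp_distance([0.0], [0.5], 1.0)                       # 0.5:平移比删+插便宜
```

论文指出的偏差正源于此:事件越密集的段需要越多的插入/删除与平移操作,使成本与局部采样率耦合,故需 SRC 替代数据加以校正。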

【12】 Evaluation of survival distribution predictions with discrimination measures 标题:用判别测度评价生存分布预测 链接:https://arxiv.org/abs/2112.04828

作者:Raphael Sonabend,Andreas Bender,Sebastian Vollmer 机构:MRC Centre for Global Infectious Disease Analysis, Jameel Institute, Imperial College London, School of Public Health, W,PG, London, UK, Department of Computer Science, Technische Universität Kaiserslautern, Gottlieb-Daimler-Straße , Kaiserslautern, Germany 摘要:在本文中,我们考虑如何用判别度量评估生存分布预测。这并非一个简单的问题:判别度量是生存分析中最常用的度量,但目前没有明确的方法从分布预测中导出风险预测。我们调研了文献和软件中提出的方法,并讨论其各自的优缺点。虽然分布预测经常用判别度量进行评估,但我们发现其具体做法很少在文献中描述,并且经常导致不公平的比较。我们发现,将分布归结为风险的最稳健的方法是对预测的累积风险求和。我们建议机器学习生存分析软件在分布预测和风险预测之间实现清晰的转换,以便模型评估更透明、更易用。 摘要:In this paper we consider how to evaluate survival distribution predictions with measures of discrimination. This is a non-trivial problem as discrimination measures are the most commonly used in survival analysis and yet there is no clear method to derive a risk prediction from a distribution prediction. We survey methods proposed in literature and software and consider their respective advantages and disadvantages. Whilst distributions are frequently evaluated by discrimination measures, we find that the method for doing so is rarely described in the literature and often leads to unfair comparisons. We find that the most robust method of reducing a distribution to a risk is to sum over the predicted cumulative hazard. We recommend that machine learning survival analysis software implements clear transformations between distribution and risk predictions in order to allow more transparent and accessible model evaluation.
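文中推荐的"把分布预测归结为风险"的做法非常直接:对每个个体,把预测的累积风险 H(t) = -log S(t) 在评估时间网格上求和,作为其风险评分。numpy 示意如下(生存曲线为假设数据):

```python
import numpy as np

# 文中推荐的分布→风险转换:对每个个体,把预测累积风险 H(t) = -log S(t)
# 在共同时间网格上求和作为风险评分。以下三条生存曲线为假设数据。
surv = np.array([[0.95, 0.90, 0.85],   # 低风险个体
                 [0.90, 0.75, 0.60],
                 [0.80, 0.50, 0.25]])  # 高风险个体
cum_hazard = -np.log(surv)
risk = cum_hazard.sum(axis=1)          # 每个个体一个风险评分
ranking = np.argsort(risk)             # 风险从低到高排序
```

得到的风险评分可直接用于 C 指数等判别度量,从而公平地比较分布型生存模型。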

【13】 Regularized Modal Regression on Markov-dependent Observations: A Theoretical Assessment 标题:马尔可夫相依观测值的正则化模式回归:一个理论评估 链接:https://arxiv.org/abs/2112.04779

作者:Tielang Gong,Yuxin Dong,Hong Chen,Bo Dong,Wei Feng,Chen Li 机构:School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an , China, Key Laboratory of Intelligent Networks and Network Security, Ministry of Education, Xi'an , China, College of Science, Huazhong Agriculture University, Wuhan , China 摘要:模态回归是一种广泛使用的回归协议,由于其对异常值和重尾噪声的鲁棒性,在统计和机器学习领域得到了广泛的研究。理解模态回归的理论行为是学习理论的基础。尽管在描述其统计特性方面取得了重大进展,但大多数结果都是基于样本是独立同分布(i.i.d.)的假设,这对于实际应用来说限制太大。本文研究了一种重要的依赖结构——马尔可夫依赖结构中正则模态回归(RMR)的统计性质。具体地说,我们在中等条件下建立了RMR估计的上界,并给出了一个明确的学习率。我们的结果表明,马尔可夫依赖性对泛化误差的影响方式是,样本量将根据潜在马尔可夫链的谱间隙通过乘法因子进行折扣。这一结果为描述稳健回归的理论基础提供了新的思路。 摘要:Modal regression, a widely used regression protocol, has been extensively investigated in statistical and machine learning communities due to its robustness to outliers and heavy-tailed noises. Understanding modal regression's theoretical behavior can be fundamental in learning theory. Despite significant progress in characterizing its statistical property, the majority of the results are based on the assumption that samples are independent and identically distributed (i.i.d.), which is too restrictive for real-world applications. This paper concerns the statistical property of regularized modal regression (RMR) within an important dependence structure - Markov dependent. Specifically, we establish the upper bound for RMR estimator under moderate conditions and give an explicit learning rate. Our results show that the Markov dependence impacts the generalization error in the way that the sample size is discounted by a multiplicative factor depending on the spectral gap of the underlying Markov chain. This result sheds new light on characterizing the theoretical underpinnings of robust regression.

【14】 A Note on Comparison of F-measures 标题:关于F-测度比较的一个注记 链接:https://arxiv.org/abs/2112.04677

作者:Wei Ju,Wenxin Jiang 机构: Jiang is with the Department of Statistics 摘要:我们对TKDE最近的一篇论文“不平衡数据集分类算法性能评估的F-测度线性近似”进行了评论,并对两种预测规则的F-测度进行了比较。 摘要:We comment on a recent TKDE paper "Linear Approximation of F-measure for the Performance Evaluation of Classification Algorithms on Imbalanced Data Sets", and make two improvements related to comparison of F-measures for two prediction rules.
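作为参照,两条预测规则的 F 测度比较可以直接由混淆矩阵计数算出(以下计数为假设示例):

```python
# 由混淆矩阵计数计算 F 测度(F_beta),并比较两条假设的预测规则。
def f_measure(tp, fp, fn, beta=1.0):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

rule_a = f_measure(tp=80, fp=20, fn=20)  # precision = recall = 0.8, F1 = 0.8
rule_b = f_measure(tp=90, fp=40, fn=10)  # 召回更高但精确率更低
```

该评论针对的正是在不平衡数据集上比较此类 F 测度时所用线性近似的细节。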

【15】 Bayesian Semiparametric Longitudinal Inverse-Probit Mixed Models for Category Learning 标题:类别学习的贝叶斯半参数纵向逆概率混合模型 链接:https://arxiv.org/abs/2112.04626

作者:Minerva Mukhopadhyay,Jacie R. McHaney,Bharath Chandrasekaran,Abhra Sarkar 机构:Department of of Mathematics and Statistics, Indian Institute of Technology, Kanpur, UP , India, Department of Communication Science and Disorders, University of Pittsburgh, Forbes Tower, Pittsburgh, PA , USA, Department of Statistics and Data Sciences 备注:arXiv admin note: text overlap with arXiv:1912.02774 摘要:了解成人如何学习分类可以对经验依赖性大脑可塑性的机制提供新的见解。漂移扩散过程因其模仿潜在神经机制的能力而在此类环境中流行,但需要类别反应和相关反应时间的数据进行推理。然而,类别反应准确度通常是行为科学家记录的描述人类学习的唯一可靠指标。在具有潜在反应时间的漂移扩散模型的基础上,我们推导出了一个生物学上可解释的逆概率单位(inverse-probit)分类概率模型。然而,该模型提出了重大的可识别性和推理挑战。我们通过一种新的基于投影的方法解决这些挑战,该方法具有对称性保持可识别性约束,允许我们在无约束空间中处理共轭先验。我们采用该模型进行纵向背景下的群体和个体水平推理。再次基于模型的潜在变量表示,我们设计了一种用于后验计算的有效马尔可夫链蒙特卡罗算法。我们通过模拟实验评估了该方法的经验性能。 摘要:Understanding how adult humans learn to categorize can shed novel insights into the mechanisms underlying experience-dependent brain plasticity. Drift-diffusion processes are popular in such contexts for their ability to mimic underlying neural mechanisms but require data on both category responses and associated response times for inference. Category response accuracies are, however, often the only reliable measure recorded by behavioral scientists to describe human learning. Building carefully on drift-diffusion models with latent response times, we derive a biologically interpretable inverse-probit categorical probability model for such data. The model, however, presents significant identifiability and inference challenges. We address these challenges via a novel projection-based approach with a symmetry preserving identifiability constraint that allows us to work with conjugate priors in an unconstrained space. We adapt the model for group and individual level inference in longitudinal settings. Building again on the model's latent variable representation, we design an efficient Markov chain Monte Carlo algorithm for posterior computation. We evaluate the method's empirical performances through simulation experiments. 
The method's practical efficacy is illustrated in applications to longitudinal tone learning studies.

【16】 Moments estimators and omnibus chi-square tests for some usual probability laws 标题:几种常见概率律的矩估计和综合卡方检验 链接:https://arxiv.org/abs/2112.04589

作者:Gorgui Gning,Aladji Babacar Niang,Modou Ngom,Gane Samb Lo 机构: Ministery of High School (SENEGAL)LERSTAD, Gaston Berger University 备注:36 pages; six figures 摘要:对于许多概率定律,在参数模型中,参数的估计可以在最大似然法的框架内完成,也可以在矩估计法的框架内完成,或者使用插件法等。通常,对于估计多个参数,使用相同的框架。本文重点研究了矩估计方法。我们使用Lo(2016)中函数经验过程(fep)的工具,展示了几乎以代数方式推导联合分布高斯定律以及从中推导综合卡方渐近定律的实用性。我们选择了四个分布来说明该方法(伽马分布、贝塔分布、均匀分布和费舍尔分布),并尽可能完整地描述了矩估计量的渐近规律。进行模拟研究,以调查每种情况下获得的统计检验推荐的最小样本量。一般来说,本文提出的综合卡方检验在样本量约为50时效果良好。 摘要:For many probability laws, in parametric models, the estimation of the parameters can be done in the frame of the maximum likelihood method, or in the frame of moment estimation methods, or by using the plug-in method, etc. Usually, for estimating more than one parameter, the same frame is used. We focus on the moment estimation method in this paper. We use the instrumental tool of the functional empirical process (fep) in Lo (2016) to show how it is practical to derive, almost algebraically, the joint distribution Gaussian law and to derive omnibus chi-square asymptotic laws from it. We choose four distributions to illustrate the method (Gamma law, beta law, Uniform law and Fisher law) and completely describe the asymptotic laws of the moment estimators whenever possible. Simulation studies are performed to investigate for each case the smallest sizes for which the obtained statistical tests are recommendable. Generally, the omnibus chi-square tests proposed here work well with sample sizes around fifty.
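以论文四个例子之一的伽马分布为例,矩估计有显式解:由均值 m = a·s、方差 v = a·s² 解得形状 a 的估计为 m²/v、尺度 s 的估计为 v/m。示意如下(样本量取文中建议的 50 左右;数据为模拟):

```python
import numpy as np

# 伽马分布 Gamma(形状 a, 尺度 s) 的矩估计:由均值 m = a*s、方差 v = a*s^2
# 解得 a_hat = m^2/v, s_hat = v/m。样本量取文中建议的 50 左右;数据为模拟。
rng = np.random.default_rng(6)
x = rng.gamma(shape=2.0, scale=3.0, size=50)

m, v = x.mean(), x.var()
a_hat = m ** 2 / v            # 形状参数的矩估计
s_hat = v / m                 # 尺度参数的矩估计
```

论文进一步借助函数经验过程推导这类矩估计量的联合高斯渐近律,并由此构造综合卡方检验。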

【17】 A generalized definition of the average causal effect for both binary and continuous treatments 标题:二元和连续处理的平均因果效应的广义定义 链接:https://arxiv.org/abs/2112.04580

作者:Fernando Pires Hartwig 机构:Postgraduate Program in Epidemiology, Federal University of Pelotas, Pelotas, Brazil., MRC Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom. 摘要:因果推理的主要任务之一是估计定义良好的因果参数。主要因果参数之一是平均因果效应(ACE)——目标人群中个体水平因果效应的预期值。对于二元治疗,个体水平的因果效应定义为潜在结果之间的对比。然而,对于连续的结果,在有限的样本中存在许多这样的对比,因此妨碍了它们作为因果关系的有用总结。在此,我们提出了ACE的广义版本,其中个体水平的因果效应定义为个体水平因果剂量反应函数的导数(与治疗相关),该函数在个体具有的治疗值下进行评估。该定义等同于二元处理的常规定义,但也包含连续处理。我们证明了这个数量可以在传统的因果假设下估计,并通过模拟研究说明了理论观点。 摘要:One of the main tasks of causal inference is estimating well-defined causal parameters. One of the main causal parameters is the average causal effect (ACE) - the expected value of the individual level causal effects in the target population. For binary treatments, the individual level causal effect is defined as contrast between potential outcomes. For continuous outcomes, however, there are many such contrasts in finite samples, thus hampering their use as a useful summary of the causal relationship. Here, we proposed a generalized version of the ACE, where individual level causal effects are defined as the derivative (with respect to the treatment) of the individual level causal dose-response function evaluated at treatment value that the individual has. This definition is equivalent to the conventional definition for binary treatments, but also incorporates continuous treatments. We demonstrate that this quantity can be estimated under conventional causal assumptions and illustrate the theoretical ideas with a simulation study.
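广义 ACE 的定义——"在每个个体自身的处理取值处,对其剂量反应函数求导再取平均"——可以用一个已知剂量反应的玩具例子数值示意(个体系数与处理分布均为假设):

```python
import numpy as np

# 广义 ACE 的数值示意(系数与处理分布均为假设):个体剂量反应取
# y_i(t) = a_i*t + b_i*t^2,其导数为 a_i + 2*b_i*t;在每个个体自身的
# 处理取值 t_i 处用中心差分求导,再取平均。
rng = np.random.default_rng(7)
n = 1000
a = rng.normal(1.0, 0.2, n)
b = rng.normal(0.5, 0.1, n)
t = rng.uniform(0, 2, n)                  # 连续处理

def dose_response(t_val):
    return a * t_val + b * t_val ** 2

eps = 1e-5
ind_effects = (dose_response(t + eps) - dose_response(t - eps)) / (2 * eps)
ace = ind_effects.mean()                  # 约等于 E[a] + 2*E[b]*E[t] = 2
```

对二元处理,该导数退化为两个潜在结果之差,与常规 ACE 定义一致。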

【18】 Custom Orthogonal Weight functions (COWs) for Event Classification 标题:用于事件分类的自定义正交权函数(COWs) 链接:https://arxiv.org/abs/2112.04574

作者:Hans Dembinski,Matthew Kenzie,Christoph Langenbruch,Michael Schmelling 机构:TU Dortmund, Germany, University of Warwick, United Kingdom, RWTH Aachen, Germany, Max Planck Institute for Nuclear Physics, Heidelberg, Germany 备注:18 pages, 16 figures, for associated software tools see this https URL 摘要:数据分析中的一个常见问题是信号和背景的分离。我们重新回顾并推广了所谓的$sWeights$方法,该方法允许我们使用混合信号和背景模型拟合判别变量,计算控制变量信号密度的经验估计。我们证明$sWeights$是一类更大的自定义正交权函数(COWs)的特例,它可以应用于一类更一般的问题,在这类问题中,判别变量和控制变量不一定是独立的,并且仍然达到接近最优的性能。我们还研究了从统计模型拟合到$sWeights$估计的参数的性质,并给出了拟合参数的渐近协方差矩阵的闭合公式。为了说明我们的发现,我们讨论了这些技术的几个实际应用。 摘要:A common problem in data analysis is the separation of signal and background. We revisit and generalise the so-called $sWeights$ method, which allows one to calculate an empirical estimate of the signal density of a control variable using a fit of a mixed signal and background model to a discriminating variable. We show that $sWeights$ are a special case of a larger class of Custom Orthogonal Weight functions (COWs), which can be applied to a more general class of problems in which the discriminating and control variables are not necessarily independent and still achieve close to optimal performance. We also investigate the properties of parameters estimated from fits of statistical models to $sWeights$ and provide closed formulas for the asymptotic covariance matrix of the fitted parameters. To illustrate our findings, we discuss several practical applications of these techniques.
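经典 $sWeights$ 的计算本身只需几行线性代数;下面在一个假设的"高斯信号 + 均匀背景"玩具混合上示意(产率此处直接取真值,实践中来自扩展极大似然拟合),并验证 sPlot 的关键性质:信号权重之和近似等于信号产率。

```python
import numpy as np

# 经典 sWeights 的玩具示意(假设的"高斯信号 + 均匀背景"混合;
# 产率此处直接取真值,实践中来自扩展极大似然拟合)。
rng = np.random.default_rng(9)
n_sig, n_bkg = 5000, 5000
x = np.concatenate([rng.normal(0.0, 0.5, n_sig), rng.uniform(-3, 3, n_bkg)])

f_sig = np.exp(-x ** 2 / (2 * 0.5 ** 2)) / (0.5 * np.sqrt(2 * np.pi))
f_bkg = np.full_like(x, 1 / 6)            # [-3, 3] 上的均匀密度
yields = np.array([n_sig, n_bkg], dtype=float)

F = np.stack([f_sig, f_bkg])              # (2, 事件数)
D = yields @ F                            # 每个事件处判别变量的总密度
Vinv = (F / D) @ (F / D).T                # 产率协方差矩阵的逆
V = np.linalg.inv(Vinv)
w_sig = (V[0] @ F) / D                    # 每个事件的信号 sWeight

total = w_sig.sum()                       # sPlot 性质:应近似等于信号产率
```

论文将此构造推广为 COWs,在判别变量与控制变量不独立时也能取得接近最优的表现。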

【19】 A Combinatorial Approach for Nonparametric Short-Term Estimation of Queue Lengths using Probe Vehicles 标题:利用探测车进行非参数短期排队长度估计的组合方法 链接:https://arxiv.org/abs/2112.04551

作者:Gurcan Comert,Tewodros Amdeberhan,Negash Begashaw,Mashrur Chowdhury 机构:Computer Science, Physics, and Engineering Department, Benedict College, Harden St., Columbia, SC USA , Information Trust Institute, University of Illinois Urbana-Champaign, West Main St., Urbana, IL , USA 备注:13 pages, 5 figures 摘要:交通状态估计在有效的交通管理中起着重要的作用。本研究发展了一种组合方法,用于根据探测车辆的逐周期部分观测队列估计非参数短期队列长度。该方法不假设随机到达,也不假设任何主要参数或任何参数的估计,而是使用仅依赖于信号定时的简单代数表达式。对于交通交叉口的引道,给定探测车辆位置、计数、时间和分析间隔(例如,在红色信号相位结束时)的条件队列长度由负超几何分布表示。使用涉及探测车辆的现场试验数据,将获得的估算值与参数方法和简单的公路通行能力手动方法进行比较。分析表明,本文提出的非参数方法与现场测试数据中用于估计队列长度的参数方法的精度相匹配。 摘要:Traffic state estimation plays an important role in facilitating effective traffic management. This study develops a combinatorial approach for nonparametric short-term queue length estimation in terms of cycle-by-cycle partially observed queues from probe vehicles. The method does not assume random arrivals and does not assume any primary parameters or estimation of any parameters but uses simple algebraic expressions that only depend on signal timing. For an approach lane at a traffic intersection, the conditional queue lengths given probe vehicle location, count, time, and analysis interval (e.g., at the end of red signal phase) are represented by a Negative Hypergeometric distribution. The estimators obtained are compared with parametric methods and simple highway capacity manual methods using field test data involving probe vehicles. The analysis indicates that the nonparametric methods presented in this paper match the accuracy of parametric methods used in the field test data for estimating queue lengths.
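论文的估计量更为精细,这里只用经典"德国坦克问题"的无偏估计量示意"仅凭探测车位置即可推断排队长度"的思想:n 辆探测车均匀分布于 N 辆排队车辆中时,N 的无偏估计为 max_position·(n+1)/n − 1(场景与参数均为假设):

```python
import numpy as np

# "仅凭探测车位置推断排队长度"的玩具示意(非论文估计量):n 辆探测车
# 均匀分布于 N 辆排队车辆中时,经典德国坦克问题给出无偏估计
# N_hat = max_position*(n+1)/n - 1。
rng = np.random.default_rng(8)
N, n_probes, reps = 30, 3, 20000

estimates = []
for _ in range(reps):
    positions = rng.choice(np.arange(1, N + 1), size=n_probes, replace=False)
    estimates.append(positions.max() * (n_probes + 1) / n_probes - 1)
bias = np.mean(estimates) - N             # 平均偏差应接近 0(无偏)
```

论文进一步在信号配时条件下用负超几何分布刻画给定探测车信息时队列长度的条件分布。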

【20】 A Parametric Approach to Relaxing the Independence Assumption in Relative Survival Analysis 标题:放松相对生存分析中独立性假设的参数方法 链接:https://arxiv.org/abs/2112.04534

作者:Reuben Adatorwovor,Aurelien Latouche,Jason P. Fine 机构:University of Kentucky; Conservatoire National des Arts et Métiers; University of North Carolina at Chapel Hill 备注:15 pages, 3 figures, 6 tables 摘要:在已知死亡原因(CoD)的情况下,竞争风险生存方法可用于估计疾病特异性生存率。当死亡原因不明或存在误判、在实际使用中不可靠时,可使用相对生存分析来估计疾病特异性生存率。这种方法在使用登记数据的基于人群的癌症生存研究中很流行,并且不需要CoD信息。标准估计量是癌症队列组的全因生存率与一般参考人群的已知预期生存率之比。疾病特异性死亡与其他死亡原因竞争,可能造成各死因(CoD)之间的依赖性。只有在疾病死亡和其他原因死亡相互独立时,标准比率估计才有效。为了放松独立性假设,我们使用基于copula的模型来刻画依赖性。我们采用基于似然的参数方法,在假设copula已知、其他死因分布来自参考人群的前提下,拟合无CoD信息的疾病特异性死亡分布。我们提出了一种敏感性分析,在一系列假定的依赖结构上进行分析。我们通过模拟研究和对法国乳腺癌数据的应用证明了我们方法的实用性。 摘要:With known cause of death (CoD), competing risk survival methods are applicable in estimating disease-specific survival. Relative survival analysis may be used to estimate disease-specific survival when cause of death is either unknown or subject to misspecification and not reliable for practical usage. This method is popular for population-based cancer survival studies using registry data and does not require CoD information. The standard estimator is the ratio of all-cause survival in the cancer cohort group to the known expected survival from a general reference population. Disease-specific death competes with other causes of mortality, potentially creating dependence among the CoD. The standard ratio estimate is only valid when death from disease and death from other causes are independent. To relax the independence assumption, we formulate dependence using a copula-based model. A likelihood-based parametric method is used to fit the distribution of disease-specific death without CoD information, where the copula is assumed known and the distribution of other causes of mortality is derived from the reference population. We propose a sensitivity analysis, where the analysis is conducted across a range of assumed dependence structures. We demonstrate the utility of our method through simulation studies and an application to French breast cancer data.
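下面用 Clayton copula 给出"以 copula 刻画疾病死亡与其他死因之间依赖"这一思路的极简示意(分布形式与参数均为本文假设的示例值,并非论文数据;全因生存取为两条边际生存函数经生存 copula 的耦合):

```python
from math import exp

def clayton_copula(u, v, theta):
    """Clayton copula C(u, v);theta -> 0 时退化为独立情形 u*v。"""
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)

t = 2.0
S_disease = exp(-0.10 * t)   # 疾病特异性生存 S_d(t),指数分布(假设)
S_other = exp(-0.05 * t)     # 其他死因生存 S_o(t),指数分布(假设)

S_all_indep = S_disease * S_other                      # 独立假设下的全因生存
S_all_dep = clayton_copula(S_disease, S_other, 2.0)    # 正相依(theta=2)下的全因生存
```

正相依(theta>0)使联合生存高于独立乘积,这正是标准比率估计在依赖情形下产生偏差的来源;论文的敏感性分析即在一系列这类依赖结构上重复估计。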

【21】 Extending the WILDS Benchmark for Unsupervised Adaptation 标题:扩展无监督适应的WILDS基准 链接:https://arxiv.org/abs/2112.05090

作者:Shiori Sagawa,Pang Wei Koh,Tony Lee,Irena Gao,Sang Michael Xie,Kendrick Shen,Ananya Kumar,Weihua Hu,Michihiro Yasunaga,Henrik Marklund,Sara Beery,Etienne David,Ian Stavness,Wei Guo,Jure Leskovec,Kate Saenko,Tatsunori Hashimoto,Sergey Levine,Chelsea Finn,Percy Liang 摘要:在野外部署的机器学习系统通常在源分布上进行训练,但部署在不同的目标分布上。未标记的数据可以成为缓解这些分布变化的有力杠杆,因为它通常比标记的数据更可用。然而,现有的未标记数据的分布转移基准并不能反映现实应用程序中出现的场景的广度。在这项工作中,我们介绍了WILDS 2.0更新,该更新扩展了WILDS分布转移基准中10个数据集中的8个,以包括在部署中实际可获得的经管理的未标记数据。为了保持一致性,标记的训练、验证和测试集以及评估指标与原始WILDS基准中的完全相同。这些数据集涵盖广泛的应用(从组织学到野生动物保护)、任务(分类、回归和检测)和模式(照片、卫星图像、显微镜载玻片、文本、分子图)。我们系统地测试了利用未标记数据的最新方法,包括域不变、自训练和自监督方法,并表明它们在WILDS 2.0上的成功是有限的。为了促进方法开发和评估,我们提供了一个开源软件包,它可以自动加载数据,并包含本文中使用的所有模型体系结构和方法。代码和排行榜可在https://wilds.stanford.edu. 摘要:Machine learning systems deployed in the wild are often trained on a source distribution but deployed on a different target distribution. Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data. However, existing distribution shift benchmarks for unlabeled data do not reflect the breadth of scenarios that arise in real-world applications. In this work, we present the WILDS 2.0 update, which extends 8 of the 10 datasets in the WILDS benchmark of distribution shifts to include curated unlabeled data that would be realistically obtainable in deployment. To maintain consistency, the labeled training, validation, and test sets, as well as the evaluation metrics, are exactly the same as in the original WILDS benchmark. These datasets span a wide range of applications (from histology to wildlife conservation), tasks (classification, regression, and detection), and modalities (photos, satellite images, microscope slides, text, molecular graphs). We systematically benchmark state-of-the-art methods that leverage unlabeled data, including domain-invariant, self-training, and self-supervised methods, and show that their success on WILDS 2.0 is limited. 
To facilitate method development and evaluation, we provide an open-source package that automates data loading and contains all of the model architectures and methods used in this paper. Code and leaderboards are available at https://wilds.stanford.edu.

【22】 The Peril of Popular Deep Learning Uncertainty Estimation Methods 标题:流行的深度学习不确定性估计方法的危害性 链接:https://arxiv.org/abs/2112.05000

作者:Yehao Liu,Matteo Pagliardini,Tatjana Chavdarova,Sebastian U. Stich 机构:EPFL, UC Berkeley, CISPA 备注:Presented at the Bayesian Deep Learning Workshop at NeurIPS 2021

【23】 Machine Learning for Utility Prediction in Argument-Based Computational Persuasion 标题:基于论证的计算说服中的效用预测机器学习 链接:https://arxiv.org/abs/2112.04953

作者:Ivan Donadello,Anthony Hunter,Stefano Teso,Mauro Dragoni 机构: Free University of Bozen-Bolzano, Italy, University College London, United Kingdom, Fondazione Bruno Kessler, Italy, University of Trento, Italy

【24】 A More Stable Accelerated Gradient Method Inspired by Continuous-Time Perspective 标题:一种受连续时间透视启发的更稳定的加速梯度法 链接:https://arxiv.org/abs/2112.04922

作者:Yasong Feng,Weiguo Gao

【25】 Scalable and Decentralized Algorithms for Anomaly Detection via Learning-Based Controlled Sensing 标题:基于学习的受控感知的可扩展分散式异常检测算法 链接:https://arxiv.org/abs/2112.04912

作者:Geethu Joseph,Chen Zhong,M. Cenk Gursoy,Senem Velipasalar,Pramod K. Varshney 备注:13 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:2105.06289 摘要:我们解决的问题是从一个给定的集合中依次选择和观察过程,以发现其中的异常。决策者在任何给定的时刻观察过程的子集,并获得相应过程是否异常的带噪二元指标。在此设置中,我们开发了一种异常检测算法,该算法选择在给定时刻要观察的进程,决定何时停止观察,并宣布对异常进程的决定。检测算法的目标是以超过期望值的精度识别异常,同时最小化决策延迟。我们设计了一个集中式算法,其中各进程由一个公共代理共同选择,以及一个分散式算法,其中每个进程独立决定是否被选择。我们的算法依赖于一个马尔可夫决策过程,该过程以给定观测下每个进程正常或异常的边际概率定义。我们使用深度actor-critic强化学习框架来实现检测算法。与之前在该主题上复杂度随进程数量呈指数增长的工作不同,我们的算法在计算和内存方面的需求均为进程数量的多项式。通过与最新方法的比较,我们用数值实验证明了这些算法的有效性。 摘要:We address the problem of sequentially selecting and observing processes from a given set to find the anomalies among them. The decision-maker observes a subset of the processes at any given time instant and obtains a noisy binary indicator of whether or not the corresponding process is anomalous. In this setting, we develop an anomaly detection algorithm that chooses the processes to be observed at a given time instant, decides when to stop taking observations, and declares the decision on anomalous processes. The objective of the detection algorithm is to identify the anomalies with an accuracy exceeding the desired value while minimizing the delay in decision making. We devise a centralized algorithm where the processes are jointly selected by a common agent as well as a decentralized algorithm where the decision of whether to select a process is made independently for each process. Our algorithms rely on a Markov decision process defined using the marginal probability of each process being normal or anomalous, conditioned on the observations. We implement the detection algorithms using the deep actor-critic reinforcement learning framework. Unlike prior work on this topic that has exponential complexity in the number of processes, our algorithms have computational and memory requirements that are both polynomial in the number of processes. 
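摘要中提到算法依赖于"以各进程正常/异常的边际后验概率定义的马尔可夫决策过程"。下面给出由带噪二元观测更新该边际概率的贝叶斯更新示意(对称翻转噪声模型与具体数值均为本文示例假设):

```python
def update_anomaly_belief(p, obs, flip_prob):
    """根据带噪二元观测更新某进程为异常的边际后验概率。
    obs=1 表示指示器报告"异常";flip_prob 为指示器出错(翻转)的概率。"""
    like_anom = (1 - flip_prob) if obs == 1 else flip_prob   # P(obs | 异常)
    like_norm = flip_prob if obs == 1 else (1 - flip_prob)   # P(obs | 正常)
    num = p * like_anom
    return num / (num + (1 - p) * like_norm)

# 示例:先验 0.5,噪声率 0.2,依次观测到两次报警、一次正常
p = 0.5
for obs in (1, 1, 0):
    p = update_anomaly_belief(p, obs, 0.2)
```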
We demonstrate the efficacy of these algorithms using numerical experiments by comparing them with state-of-the-art methods.

【26】 i-SpaSP: Structured Neural Pruning via Sparse Signal Recovery 标题:I-SpaSP:基于稀疏信号恢复的结构化神经剪枝 链接:https://arxiv.org/abs/2112.04905

作者:Cameron R. Wolfe,Anastasios Kyrillidis 机构:Department of Computer Science, Rice University, Houston, TX, USA. 备注:27 pages, 4 figures 摘要:我们提出了一种新的神经网络结构化剪枝算法——迭代稀疏结构化剪枝算法,称为i-SpaSP。受稀疏信号恢复思想的启发,i-SpaSP通过迭代识别网络中对剪枝和密集网络输出之间的残差贡献最大的一组重要参数组(例如,滤波器或神经元),然后基于更小的预定义剪枝比对这些组进行阈值化来运行。对于具有ReLU激活的两层和多层网络结构,我们展示了由i-SpaSP修剪引起的误差以多项式形式衰减,其中该多项式的次数根据稠密网络隐藏表示的稀疏性变得任意大。在我们的实验中,i-SpaSP在各种数据集(即MNIST和ImageNet)和体系结构(即前馈网络、ResNet34和MobileNetV2)上进行评估,结果表明,i-SpaSP可以发现高性能子网络,并将可证明基线方法的修剪效率提高几个数量级。简单地说,i-SpaSP易于通过自动微分实现,获得了很强的经验结果,具有理论上的收敛保证,并且是高效的,因此,它是为数不多的计算高效、实用且可证明的修剪算法之一。 摘要:We propose a novel, structured pruning algorithm for neural networks -- the iterative, Sparse Structured Pruning algorithm, dubbed as i-SpaSP. Inspired by ideas from sparse signal recovery, i-SpaSP operates by iteratively identifying a larger set of important parameter groups (e.g., filters or neurons) within a network that contribute most to the residual between pruned and dense network output, then thresholding these groups based on a smaller, pre-defined pruning ratio. For both two-layer and multi-layer network architectures with ReLU activations, we show the error induced by pruning with i-SpaSP decays polynomially, where the degree of this polynomial becomes arbitrarily large based on the sparsity of the dense network's hidden representations. In our experiments, i-SpaSP is evaluated across a variety of datasets (i.e., MNIST and ImageNet) and architectures (i.e., feed forward networks, ResNet34, and MobileNetV2), where it is shown to discover high-performing sub-networks and improve upon the pruning efficiency of provable baseline methodologies by several orders of magnitude. Put simply, i-SpaSP is easy to implement with automatic differentiation, achieves strong empirical results, comes with theoretical convergence guarantees, and is efficient, thus distinguishing itself as one of the few computationally efficient, practical, and provable pruning algorithms.
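下面以一个全连接层为例,示意"给参数组(此处为隐藏神经元)按对稠密输出的贡献打分,并按剪枝比例保留得分最高的组"这一核心步骤(打分与单步保留均为简化版本,并非 i-SpaSP 的精确迭代算法;层的尺寸为示例假设):

```python
import numpy as np

rng = np.random.default_rng(1)

# 假设的小型全连接层:ReLU 激活 H 与输出权重 Wout
n_samples, n_hidden, n_out = 64, 16, 4
H = np.maximum(rng.normal(size=(n_samples, n_hidden)), 0.0)
Wout = rng.normal(size=(n_hidden, n_out))

dense_out = H @ Wout

# 每个神经元对输出的贡献范数,作为"重要参数组"的打分
scores = np.array([np.linalg.norm(np.outer(H[:, j], Wout[j]))
                   for j in range(n_hidden)])

keep_ratio = 0.5
keep = np.argsort(scores)[-int(n_hidden * keep_ratio):]   # 保留得分最高的一半

pruned_out = H[:, keep] @ Wout[keep]
residual = np.linalg.norm(dense_out - pruned_out)          # 剪枝与稠密输出之差
```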

【27】 Multi-Task Learning on Networks 标题:网络环境下的多任务学习 链接:https://arxiv.org/abs/2112.04891

作者:Andrea Ponti 备注:94 pages, 53 figures, 8 tables 摘要:多任务学习(MTL)范式可以追溯到Caruana(1997)的一篇早期论文,其中认为可以使用多个任务的数据来获得比独立学习每个任务更好的性能。具有冲突目标的MTL解决方案需要对它们之间的权衡进行建模,这通常超出了线性组合所能实现的范围。一个理论上有原则且计算上有效的策略是寻找不被其他解支配的解,正如帕累托分析中所述。多任务学习环境中出现的多目标优化问题具有特定的特点,需要专门的方法。对这些特点的分析和一种新的计算方法的提出是这项工作的重点。多目标进化算法(MOEA)可以很容易地包含支配的概念,因此也包括帕累托分析。MOEA的主要缺点是相对于函数求值的样本效率较低,其关键原因在于大多数进化方法不使用模型来近似目标函数。贝叶斯优化则采用一种完全不同的、基于代理模型(如高斯过程)的方法。在本论文中,输入空间中的解表示为概率分布,封装了函数求值中包含的知识。在这一以Wasserstein距离为度量的概率分布空间中,可以设计一种新的算法MOEA/WST,其中模型不直接作用于目标函数,而是作用于一个中间信息空间,输入空间中的对象在其中被映射为直方图。计算结果表明,MOEA/WST提供的样本效率和Pareto集质量明显优于标准MOEA。 摘要:The multi-task learning (MTL) paradigm can be traced back to an early paper of Caruana (1997) in which it was argued that data from multiple tasks can be used with the aim to obtain a better performance over learning each task independently. A solution of MTL with conflicting objectives requires modelling the trade-off among them which is generally beyond what a straight linear combination can achieve. A theoretically principled and computationally effective strategy is finding solutions which are not dominated by others as it is addressed in the Pareto analysis. Multi-objective optimization problems arising in the multi-task learning context have specific features and require ad hoc methods. The analysis of these features and the proposal of a new computational approach represent the focus of this work. Multi-objective evolutionary algorithms (MOEAs) can easily include the concept of dominance and therefore the Pareto analysis. The major drawback of MOEAs is a low sample efficiency with respect to function evaluations. The key reason for this drawback is that most of the evolutionary approaches do not use models for approximating the objective function. Bayesian Optimization takes a radically different approach based on a surrogate model, such as a Gaussian Process. 
In this thesis the solutions in the Input Space are represented as probability distributions encapsulating the knowledge contained in the function evaluations. In this space of probability distributions, endowed with the metric given by the Wasserstein distance, a new algorithm MOEA/WST can be designed in which the model is not directly on the objective function but in an intermediate Information Space where the objects from the input space are mapped into histograms. Computational results show that the sample efficiency and the quality of the Pareto set provided by MOEA/WST are significantly better than in the standard MOEA.

【28】 Evaluating saliency methods on artificial data with different background types 标题:在不同背景类型的人工数据上评估显著性方法 链接:https://arxiv.org/abs/2112.04882

作者:Céline Budding,Fabian Eitel,Kerstin Ritter,Stefan Haufe 机构:Department of Industrial Engineering & Innovation Sciences., Eindhoven University of Technology, Eindhoven, The Netherlands, Charité – Universitätsmedizin Berlin, Berlin, Germany;, Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany 备注:6 pages, 2 figures. Presented at Medical Imaging meets NeurIPS 2021 (poster presentation) 摘要:在过去的几年里,许多“可解释人工智能”(xAI)方法已经被开发出来,但这些方法并不总是被客观地评估。为了评估由各种显著性方法生成的热图的质量,我们开发了一个框架,用合成病变和已知的地面真值图生成人工数据。利用这个框架,我们评估了两个不同背景的数据集,柏林噪声和2D脑MRI切片,发现显著性方法和背景之间的热图差异很大。我们强烈鼓励在将显著性图和xAI方法应用于临床或其他安全关键环境之前,使用该框架对显著性图和xAI方法进行进一步评估。 摘要:Over the last years, many 'explainable artificial intelligence' (xAI) approaches have been developed, but these have not always been objectively evaluated. To evaluate the quality of heatmaps generated by various saliency methods, we developed a framework to generate artificial data with synthetic lesions and a known ground truth map. Using this framework, we evaluated two data sets with different backgrounds, Perlin noise and 2D brain MRI slices, and found that the heatmaps vary strongly between saliency methods and backgrounds. We strongly encourage further evaluation of saliency maps and xAI methods using this framework before applying these in clinical or other safety-critical settings.

【29】 A New Measure of Model Redundancy for Compressed Convolutional Neural Networks 标题:压缩卷积神经网络模型冗余度的一种新度量 链接:https://arxiv.org/abs/2112.04857

作者:Feiqing Huang,Yuefeng Si,Yao Zheng,Guodong Li 机构:† Department of Statistics and Actuarial Science, University of Hong Kong, China, ‡ Department of Statistics, University of Connecticut 摘要:虽然最近提出了许多设计来提高卷积神经网络(CNN)在固定资源预算下的模型效率,但对这些设计的理论理解仍然明显不足。本文旨在提供一个新的框架来回答这个问题:压缩后的CNN中是否还有剩余的模型冗余?我们首先通过张量分解开发CNN和压缩CNN的通用统计公式,这样跨层的权重可以总结为单个张量。然后,通过严格的样本复杂度分析,我们揭示了导出的样本复杂度与原始参数计数之间的一个重要差异,这是模型冗余的直接指标。基于这一发现,我们引入了一种新的压缩CNN模型冗余度量,称为$K/R$比率,它进一步允许非线性激活。这项新措施的有效性得到了对流行区块设计和数据集的消融研究的支持。 摘要:While recently many designs have been proposed to improve the model efficiency of convolutional neural networks (CNNs) on a fixed resource budget, theoretical understanding of these designs is still conspicuously lacking. This paper aims to provide a new framework for answering the question: Is there still any remaining model redundancy in a compressed CNN? We begin by developing a general statistical formulation of CNNs and compressed CNNs via the tensor decomposition, such that the weights across layers can be summarized into a single tensor. Then, through a rigorous sample complexity analysis, we reveal an important discrepancy between the derived sample complexity and the naive parameter counting, which serves as a direct indicator of the model redundancy. Motivated by this finding, we introduce a new model redundancy measure for compressed CNNs, called the $K/R$ ratio, which further allows for nonlinear activations. The usefulness of this new measure is supported by ablation studies on popular block designs and datasets.

【30】 Effective dimension of machine learning models 标题:机器学习模型的有效维度 链接:https://arxiv.org/abs/2112.04807

作者:Amira Abbas,David Sutter,Alessio Figalli,Stefan Woerner 机构:IBM Quantum, IBM Research – Zurich, University of KwaZulu-Natal, Durban, Department of Mathematics, ETH Zurich 备注:17 pages, 2 figures 摘要:在涉及新数据的任务上对经过训练的模型的性能进行说明是机器学习的主要目标之一,即理解模型的泛化能力。各种能力度量都试图捕捉这种能力,但通常无法解释我们在实践中观察到的模型的重要特征。在这项研究中,我们提出了局部有效维数作为容量度量,它似乎与标准数据集上的泛化误差有很好的相关性。重要的是,我们证明了局部有效维数限制了泛化误差,并讨论了这种能力测度对机器学习模型的适用性。 摘要:Making statements about the performance of trained models on tasks involving new data is one of the primary goals of machine learning, i.e., to understand the generalization power of a model. Various capacity measures try to capture this ability, but usually fall short in explaining important characteristics of models that we observe in practice. In this study, we propose the local effective dimension as a capacity measure which seems to correlate well with generalization error on standard data sets. Importantly, we prove that the local effective dimension bounds the generalization error and discuss the aptness of this capacity measure for machine learning models.
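下面给出基于 Fisher 信息矩阵特征谱的有效维数的一个简化变体示意(公式形式是本文假设的单点简化:用归一化 Fisher 矩阵的 log-det 度量刻画"有效敏感方向数";论文中的局部有效维数定义涉及对参数空间的积分,常数选取也不同):

```python
import numpy as np

def local_effective_dimension(fisher, n, gamma=1.0):
    """简化的有效维数变体(假设形式,非论文精确定义)。
    fisher: 参数点处的 Fisher 信息矩阵;n: 样本量。"""
    d = fisher.shape[0]
    kappa = gamma * n / (2 * np.pi * np.log(n))
    f_hat = d * fisher / np.trace(fisher)      # 归一化使 tr(f_hat) = d
    sign, logdet = np.linalg.slogdet(np.eye(d) + kappa * f_hat)
    return logdet / np.log(kappa)

F_full = np.eye(4)                                  # 各方向同等敏感
F_degenerate = np.diag([1.0, 1.0, 1e-12, 1e-12])    # 两个近零特征方向
d_full = local_effective_dimension(F_full, n=100000)
d_low = local_effective_dimension(F_degenerate, n=100000)
```

各向同性的 Fisher 矩阵给出接近满维数的值,而近退化谱使有效维数显著下降,这对应摘要中"有效维数刻画模型容量"的直觉。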

【31】 Does Redundancy in AI Perception Systems Help to Test for Super-Human Automated Driving Performance? 标题:人工智能感知系统中的冗余是否有助于测试超人自动驾驶性能? 链接:https://arxiv.org/abs/2112.04758

作者:Hanno Gottschalk,Matthias Rottmann,Maida Saltagic 机构:University of Wuppertal 摘要:虽然自动驾驶常以优于人类的驾驶性能作为宣传,但这项工作指出,几乎不可能在系统层面上提供直接的统计证据来证明事实确实如此:所需的标记数据量将超出当今技术和经济能力的规模。因此,一种常用的策略是使用冗余,同时证明各子系统具有足够的性能。众所周知,该策略在子系统独立运行(即错误的发生在统计意义上相互独立)时尤其有效。在这里,我们给出一些初步考虑和实验证据,说明这种策略并非没有代价:完成同一计算机视觉任务的神经网络的错误,至少在某些情况下,表现出相关的错误发生。即使训练数据、网络结构和训练过程相互分离,或使用特殊损失函数来训练独立性,这一点仍然成立。在我们的实验中,使用来自不同传感器的数据(通过3D MNIST数据集的多达五个2D投影实现)能更有效地降低相关性,但仍未达到冗余且统计独立的子系统所能带来的测试数据缩减潜力。 摘要:While automated driving is often advertised with better-than-human driving performance, this work reviews that it is nearly impossible to provide direct statistical evidence on the system level that this is actually the case. The amount of labeled data needed would exceed dimensions of present day technical and economical capabilities. A commonly used strategy therefore is the use of redundancy along with the proof of sufficient subsystems' performances. As it is known, this strategy is efficient especially for the case of subsystems operating independently, i.e. the occurrence of errors is independent in a statistical sense. Here, we give some first considerations and experimental evidence that this strategy is not a free ride as the errors of neural networks fulfilling the same computer vision task, at least for some cases, show correlated occurrences of errors. This remains true, if training data, architecture, and training are kept separate or independence is trained using special loss functions. Using data from different sensors (realized by up to five 2D projections of the 3D MNIST data set) in our experiments is more efficiently reducing correlations, however not to an extent that is realizing the potential of reduction of testing data that can be obtained for redundant and statistically independent subsystems.

【32】 Covariate Balancing Sensitivity Analysis for Extrapolating Randomized Trials across Locations 标题:跨地点外推随机试验的协变量平衡灵敏度分析 链接:https://arxiv.org/abs/2112.04723

作者:Xinkun Nie,Guido Imbens,Stefan Wager 摘要:在不同地区推广随机对照试验(RCT)的实验结果对于为目标地区的政策决策提供信息至关重要。由于不可测量的效应调节剂会影响治疗效果评估从一个位置到另一个位置的直接传输,因此缺乏可识别性,这通常会阻碍这种推广。我们以观察性研究中的敏感性分析为基础,提出了一种优化程序,使我们能够获得目标区域治疗效果的界限。此外,我们通过平衡协变量的矩来构造更多的信息边界。在仿真实验中,我们证明了协变量平衡方法在获得更清晰的识别区间方面是有希望的。 摘要:The ability to generalize experimental results from randomized control trials (RCTs) across locations is crucial for informing policy decisions in targeted regions. Such generalization is often hindered by the lack of identifiability due to unmeasured effect modifiers that compromise direct transport of treatment effect estimates from one location to another. We build upon sensitivity analysis in observational studies and propose an optimization procedure that allows us to get bounds on the treatment effects in targeted regions. Furthermore, we construct more informative bounds by balancing on the moments of covariates. In simulation experiments, we show that the covariate balancing approach is promising in getting sharper identification intervals.

【33】 Nonparametric inference of stochastic differential equations based on the relative entropy rate 标题:基于相对熵率的随机微分方程非参数推断 链接:https://arxiv.org/abs/2112.04692

作者:Min Dai,Jinqiao Duan,Jianyu Hu,Xiangjun Wang 机构:School of Science, Wuhan University of Technology, Wuhan, China; Department of Applied Mathematics, College of Computing, Illinois Institute of Technology, Chicago, Illinois, USA 摘要:目前,从数据中检测复杂系统的信息正在经历一场由大数据和机器学习方法驱动的革命。发现控制方程和量化复杂系统的动力学性质是其中的核心挑战。在这项工作中,我们设计了一种非参数方法,从具有不同漂移函数的随机微分方程的观测中学习相对熵率,并利用高斯过程核理论给出了相对熵率的估计量。同时,该方法能够提取控制方程。我们用几个例子来说明我们的方法。数值实验表明,该方法不仅适用于多项式漂移函数,而且适用于有理漂移函数。 摘要:The information detection of complex systems from data is currently undergoing a revolution, driven by the emergence of big data and machine learning methodology. Discovering governing equations and quantifying dynamical properties of complex systems are among central challenges. In this work, we devise a nonparametric approach to learn the relative entropy rate from observations of stochastic differential equations with different drift functions. The estimator corresponding to the relative entropy rate then is presented via the Gaussian process kernel theory. Meanwhile, this approach enables one to extract the governing equations. We illustrate our approach in several examples. Numerical experiments show the proposed approach performs well for rational drift functions, not only polynomial drift functions.

【34】 Differentially Private Ensemble Classifiers for Data Streams 标题:面向数据流的差分隐私集成分类器 链接:https://arxiv.org/abs/2112.04640

作者:Lovedeep Gondara,Ke Wang,Ricardo Silva Carvalho 机构:School of Computing Science, Simon Fraser University, British Columbia, Canada 备注:Accepted at WSDM 2022 摘要:通过分类/回归从连续数据流中学习在许多领域都很普遍。在适应不断变化的数据特征(概念漂移)的同时保护数据所有者的私有信息是一个开放的挑战。我们针对这个问题提出了一个差分隐私集成解决方案,它具有两个显著特点:在固定隐私预算下,它允许不受限制数量的集成更新,以处理可能永无止境的数据流;并且它是模型无关的(model agnostic),即它将任何预训练的差分隐私分类/回归模型视为黑箱。在真实世界和模拟数据集上,在隐私、概念漂移和数据分布的不同设置下,我们的方法均优于竞争方法。 摘要:Learning from continuous data streams via classification/regression is prevalent in many domains. Adapting to evolving data characteristics (concept drift) while protecting data owners' private information is an open challenge. We present a differentially private ensemble solution to this problem with two distinguishing features: it allows an unbounded number of ensemble updates to deal with the potentially never-ending data streams under a fixed privacy budget, and it is model agnostic, in that it treats any pre-trained differentially private classification/regression model as a black-box. Our method outperforms competitors on real-world and simulated datasets for varying settings of privacy, concept drift, and data distribution.

【35】 Calibration Improves Bayesian Optimization 标题:校准改进了贝叶斯优化 链接:https://arxiv.org/abs/2112.04620

作者:Shachi Deshpande,Volodymyr Kuleshov 机构:Department of Computer Science, Cornell Tech, New York, NY 摘要:贝叶斯优化是一种允许获得黑箱函数全局最优值的过程,在超参数优化等应用中非常有用。目标函数形状的不确定性估计有助于指导优化过程。但是,如果目标函数违反基础模型中的假设(例如高斯性),则这些估计可能不准确。作为贝叶斯优化过程的一部分,我们提出了一个简单的算法来校准目标函数上后验分布的不确定性。我们表明,通过校准改进后验分布的不确定性估计,贝叶斯优化可以做出更好的决策,并以更少的步骤达到全局最优。我们表明,该技术提高了贝叶斯优化在标准基准函数和超参数优化任务上的性能。 摘要:Bayesian optimization is a procedure that allows obtaining the global optimum of black-box functions and that is useful in applications such as hyper-parameter optimization. Uncertainty estimates over the shape of the objective function are instrumental in guiding the optimization process. However, these estimates can be inaccurate if the objective function violates assumptions made within the underlying model (e.g., Gaussianity). We propose a simple algorithm to calibrate the uncertainty of posterior distributions over the objective function as part of the Bayesian optimization process. We show that by improving the uncertainty estimates of the posterior distribution with calibration, Bayesian optimization makes better decisions and arrives at the global optimum in fewer steps. We show that this technique improves the performance of Bayesian optimization on standard benchmark functions and hyperparameter optimization tasks.
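下面示意基于 PIT(probability integral transform)值的经验重校准思路:对校准集上的预测 CDF 值统计经验分布,即可发现过度自信的模型其名义 80% 区间的实际覆盖率明显偏低(数据与"方差低估 4 倍"的模型设定均为构造示例,并非论文实验):

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(z):
    # 标准正态 CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

rng = np.random.default_rng(2)

# 真实观测方差为 4,而模型错误地认为 sigma = 1(过度自信)
y_true = rng.normal(0.0, 2.0, size=500)
mu, sigma = np.zeros(500), np.ones(500)

# 每个校准点的 PIT 值 F_i = Phi((y_i - mu_i) / sigma_i)
pit = np.array([norm_cdf((y - m) / s) for y, m, s in zip(y_true, mu, sigma)])

def recalibrate(p, pit_values):
    """经验重校准映射 R(p) = 校准集中 PIT 值 <= p 的比例。"""
    return np.mean(pit_values <= p)

# 名义 80% 区间 [0.1, 0.9] 的实际覆盖率(过度自信时应远低于 0.8)
coverage = recalibrate(0.9, pit) - recalibrate(0.1, pit)
```

将分位数经 R(p) 重映射后,区间的名义覆盖率与经验覆盖率一致,这正是摘要中"校准后验分布的不确定性"所指的操作。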

【36】 InvGAN: Invertable GANs 标题:InvGAN:可逆GAN 链接:https://arxiv.org/abs/2112.04598

作者:Partha Ghosh,Dominik Zietlow,Michael J. Black,Larry S. Davis,Xiaochen Hu 机构:† MPI for Intelligent Systems, Tübingen 摘要:高分辨率生成模型的许多潜在应用包括照片真实感图像的生成、语义编辑和表示学习。GAN的最新进展已将其确立为此类任务的最佳选择。然而,由于它们不提供推理模型,因此无法使用GAN潜在空间对真实图像进行图像编辑或诸如分类之类的下游任务。尽管为训练推理模型或设计迭代方法以反转预训练生成器进行了大量工作,但以前的方法都是针对特定数据集(如人脸图像)和特定体系结构(如StyleGAN)的,难以推广到新的数据集或体系结构。我们提出了一个对体系结构和数据集均不可知的通用框架。我们的主要见解是,通过将推理模型和生成模型一起训练,我们可以使它们相互适应,并收敛到质量更好的模型。我们的InvGAN(Invertable GAN的缩写)成功地将真实图像嵌入高质量生成模型的潜在空间。这使我们能够执行图像修复、合并、插值和在线数据增强。我们通过大量的定性和定量实验证明了这一点。 摘要:Generation of photo-realistic images, semantic editing and representation learning are a few of many potential applications of high resolution generative models. Recent progress in GANs have established them as an excellent choice for such tasks. However, since they do not provide an inference model, image editing or downstream tasks such as classification can not be done on real images using the GAN latent space. Despite numerous efforts to train an inference model or design an iterative method to invert a pre-trained generator, previous methods are dataset (e.g. human face images) and architecture (e.g. StyleGAN) specific. These methods are nontrivial to extend to novel datasets or architectures. We propose a general framework that is agnostic to architecture and datasets. Our key insight is that, by training the inference and the generative model together, we allow them to adapt to each other and to converge to a better quality model. Our InvGAN, short for Invertable GAN, successfully embeds real images to the latent space of a high quality generative model. This allows us to perform image inpainting, merging, interpolation and online data augmentation. We demonstrate this with extensive qualitative and quantitative experiments.

【37】 The perils of being unhinged: On the accuracy of classifiers minimizing a noise-robust convex loss 标题:错位的危险:关于最小化噪声鲁棒性凸性损失的分类器的精度 链接:https://arxiv.org/abs/2112.04590

作者:Philip M. Long,Rocco A. Servedio 机构:Google, Columbia University 摘要:van Rooyen等人引入了凸损失函数对随机分类噪声具有鲁棒性的概念,并确定了"unhinged"(失谐)损失函数在这个意义上是鲁棒的。在本文中,我们研究了通过最小化unhinged损失获得的二元分类器的精度,并观察到即使对于简单的线性可分数据分布,最小化unhinged损失也可能只得到精度不优于随机猜测的二元分类器。 摘要:van Rooyen et al. introduced a notion of convex loss functions being robust to random classification noise, and established that the "unhinged" loss function is robust in this sense. In this note we study the accuracy of binary classifiers obtained by minimizing the unhinged loss, and observe that even for simple linearly separable data distributions, minimizing the unhinged loss may only yield a binary classifier with accuracy no better than random guessing.
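unhinged 损失为 ℓ(y, f) = 1 − y·f。在单位范数线性分类器类上,最小化它等价于取均值方向 w* ∝ E[yx],分类器完全由类均值决定,这与论文"可能不优于随机猜测"的观察直接相关。下面用构造的玩具分布数值验证该闭式解(数据为本文假设的示例):

```python
import numpy as np

rng = np.random.default_rng(3)

def unhinged_loss(w, X, y):
    """unhinged 损失 ell(y, f) = 1 - y * <w, x> 的样本均值。"""
    return np.mean(1.0 - y * (X @ w))

# 玩具二分类数据:两类中心分别位于 (+0.5, +0.5) 与 (-0.5, -0.5)
X_pos = rng.normal(size=(200, 2)) + 0.5
X_neg = rng.normal(size=(200, 2)) - 0.5
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(200), -np.ones(200)])

# 单位球约束下的最小化解有闭式:w* 正比于样本均值方向 E[y x]
m = (y[:, None] * X).mean(axis=0)
w_star = m / np.linalg.norm(m)

loss_star = unhinged_loss(w_star, X, y)
losses_rand = [unhinged_loss(v / np.linalg.norm(v), X, y)
               for v in rng.normal(size=(50, 2))]   # 随机单位向量作对照
```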

【38】 Ambiguous Dynamic Treatment Regimes: A Reinforcement Learning Approach 标题:模糊动态处理机制:一种强化学习方法 链接:https://arxiv.org/abs/2112.04571

作者:Soroush Saghafian 机构:Harvard Kennedy School, Harvard University, Cambridge, MA 摘要:各种研究的一个主要目标是利用观察数据集,提供一套新的反事实指南,以产生因果改善。动态治疗方案(DTR)被广泛研究,以规范这一过程。然而,寻找最佳DTR的可用方法通常依赖于在实际应用中被违反的假设(例如,医疗决策或公共政策),尤其是当(a)无法忽略未观测混杂因素的存在,以及(b)未观测混杂因素是时变的(例如,受先前行动的影响)。当违反这些假设时,人们通常会面临关于需要假设以获得最佳DTR的潜在因果模型的模糊性。这种模糊性是不可避免的,因为无法从观测数据中理解未观测混杂因素的动态及其对数据可观测部分的因果影响。受一项案例研究的启发(为在我们的合作医院接受移植并面临移植后新发糖尿病(NODAT)的患者寻找更优治疗方案),我们将DTR扩展到一个新的类别,称为模糊动态治疗方案(ADTR),其中基于潜在因果模型的"云"评估治疗方案的因果影响。然后,我们将ADTR与Saghafian(2018)提出的模糊部分可观察马尔可夫决策过程(APOMDP)相连接,并开发了两种强化学习方法,称为直接增强V-学习(DAV-Learning)和安全增强V-学习(SAV-Learning),使得可以利用观测数据高效地学习最佳治疗方案。我们建立了这些学习方法的理论结果,包括(弱)一致性和渐近正态性。在我们的案例研究和模拟实验中,我们进一步评估了这些学习方法的性能。 摘要:A main research goal in various studies is to use an observational data set and provide a new set of counterfactual guidelines that can yield causal improvements. Dynamic Treatment Regimes (DTRs) are widely studied to formalize this process. However, available methods in finding optimal DTRs often rely on assumptions that are violated in real-world applications (e.g., medical decision-making or public policy), especially when (a) the existence of unobserved confounders cannot be ignored, and (b) the unobserved confounders are time-varying (e.g., affected by previous actions). When such assumptions are violated, one often faces ambiguity regarding the underlying causal model that is needed to be assumed to obtain an optimal DTR. This ambiguity is inevitable, since the dynamics of unobserved confounders and their causal impact on the observed part of the data cannot be understood from the observed data. Motivated by a case study of finding superior treatment regimes for patients who underwent transplantation in our partner hospital and faced a medical condition known as New Onset Diabetes After Transplantation (NODAT), we extend DTRs to a new class termed Ambiguous Dynamic Treatment Regimes (ADTRs), in which the causal impact of treatment regimes is evaluated based on a "cloud" of potential causal models. We then connect ADTRs to Ambiguous Partially Observable Markov Decision Processes (APOMDPs) proposed by Saghafian (2018), and develop two Reinforcement Learning methods termed Direct Augmented V-Learning (DAV-Learning) and Safe Augmented V-Learning (SAV-Learning), which enable using the observed data to efficiently learn an optimal treatment regime. We establish theoretical results for these learning methods, including (weak) consistency and asymptotic normality. We further evaluate the performance of these learning methods both in our case study and in simulation experiments.

【39】 Daily peak electrical load forecasting with a multi-resolution approach 标题:基于多分辨率方法的日高峰电力负荷预测 链接:https://arxiv.org/abs/2112.04492

作者:Yvenn Amara-Ouali,Matteo Fasiolo,Yannig Goude,Hui Yan 机构:Laboratoire de Mathématiques d'Orsay (LMO), CNRS, Université Paris-Saclay, Faculté des Sciences d'Orsay, Orsay, France; CELESTE, Inria Saclay, France; School of Mathematics, University of Bristol, Bristol, UK 摘要:在智能电网和负载平衡的背景下,每日峰值负荷预测已成为能源行业利益相关者的关键活动。了解峰值大小和时间对于实施调峰等智能电网策略至关重要。本文提出的建模方法利用高分辨率和低分辨率信息预测每日峰值需求规模和时间。由此产生的多分辨率建模框架可适用于不同的模型类别。本文的主要贡献是:a)对多分辨率建模方法进行了一般和正式的介绍;b)讨论了通过广义加法模型和神经网络实现的不同分辨率下的建模方法;c)对英国电力市场真实数据的实验结果。结果证实,所提出的建模方法的预测性能与低分辨率和高分辨率备选方案的预测性能具有竞争力。 摘要:In the context of smart grids and load balancing, daily peak load forecasting has become a critical activity for stakeholders of the energy industry. An understanding of peak magnitude and timing is paramount for the implementation of smart grid strategies such as peak shaving. The modelling approach proposed in this paper leverages high-resolution and low-resolution information to forecast daily peak demand size and timing. The resulting multi-resolution modelling framework can be adapted to different model classes. The key contributions of this paper are a) a general and formal introduction to the multi-resolution modelling approach, b) a discussion on modelling approaches at different resolutions implemented via Generalised Additive Models and Neural Networks and c) experimental results on real data from the UK electricity market. The results confirm that the predictive performance of the proposed modelling approach is competitive with that of low- and high-resolution alternatives.

【40】 Distribution-Free Robust Linear Regression 标题:无分布稳健线性回归 链接:https://arxiv.org/abs/2102.12919

作者:Jaouad Mourtada,Tomas Vaškevičius,Nikita Zhivotovskiy 备注:29 pages, to appear in Mathematical Statistics and Learning 摘要:我们研究随机设计线性回归,不对协变量的分布作任何假设,且允许响应变量为重尾。在这个无分布回归环境中,我们证明了给定协变量时响应的条件二阶矩有界,是实现非平凡保证的充要条件。作为出发点,我们证明了Győrfi、Kohler、Krzyżak和Walk提出的截断最小二乘估计量的经典期望界的一个最优版本。然而,我们证明了尽管该方法的期望性能最优,它对某些分布会以常数概率失败。然后,结合截断最小二乘、中位数均值(median-of-means)方法和聚合理论的思想,我们构造了一个非线性估计量,其超额风险达到 $d/n$ 阶并具有最优的次指数尾。虽然现有的重尾分布线性回归方法侧重于返回线性函数的正常(proper)估计量,但我们强调,我们程序的非正常(improper)性对于在无分布环境中获得非平凡保证是必要的。 摘要:We study random design linear regression with no assumptions on the distribution of the covariates and with a heavy-tailed response variable. In this distribution-free regression setting, we show that boundedness of the conditional second moment of the response given the covariates is a necessary and sufficient condition for achieving nontrivial guarantees. As a starting point, we prove an optimal version of the classical in-expectation bound for the truncated least squares estimator due to Győrfi, Kohler, Krzyżak, and Walk. However, we show that this procedure fails with constant probability for some distributions despite its optimal in-expectation performance. Then, combining the ideas of truncated least squares, median-of-means procedures, and aggregation theory, we construct a non-linear estimator achieving excess risk of order $d/n$ with an optimal sub-exponential tail. While existing approaches to linear regression for heavy-tailed distributions focus on proper estimators that return linear functions, we highlight that the improperness of our procedure is necessary for attaining nontrivial guarantees in the distribution-free setting.
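摘要中用到的 median-of-means(中位数均值)过程可示意如下:将样本均匀切块,对每块取均值,再取各块均值的中位数,从而对重尾数据获得比整体均值更稳健的估计(分块方式为均匀切分,属常见实现):

```python
import numpy as np

def median_of_means(x, n_blocks):
    """将样本切成 n_blocks 块,取各块均值的中位数。"""
    x = np.asarray(x, dtype=float)
    blocks = np.array_split(x, n_blocks)
    return float(np.median([b.mean() for b in blocks]))

# 确定性小例子:块均值为 [1.5, 3.5, 5.5],中位数为 3.5
data = [1, 2, 3, 4, 5, 6]
est = median_of_means(data, 3)
```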

机器翻译,仅供参考
