Statistics Academic Digest [10.20]

2021-10-22 15:58:12

stat (Statistics): 42 papers in total

【1】 Nonparametric Sparse Tensor Factorization with Hierarchical Gamma Processes Link: https://arxiv.org/abs/2110.10082

Authors: Conor Tillinghast, Zheng Wang, Shandian Zhe Institutions: Department of Mathematics, University of Utah, Salt Lake City, UT; School of Computing Notes: 15 pages, 4 figures Abstract: We propose a nonparametric factorization approach for sparsely observed tensors. The sparsity does not mean zero-valued entries are massive or dominated. Rather, it implies the observed entries are very few, and even fewer with the growth of the tensor; this is ubiquitous in practice. Compared with existing works, our model not only leverages the structural information underlying the observed entry indices, but also provides extra interpretability and flexibility -- it can simultaneously estimate a set of location factors about the intrinsic properties of the tensor nodes, and another set of sociability factors reflecting their extrovert activity in interacting with others; users are free to choose a trade-off between the two types of factors. Specifically, we use hierarchical Gamma processes and Poisson random measures to construct a tensor-valued process, which can freely sample the two types of factors to generate tensors and always guarantees an asymptotic sparsity. We then normalize the tensor process to obtain hierarchical Dirichlet processes to sample each observed entry index, and use a Gaussian process to sample the entry value as a nonlinear function of the factors, so as to capture both the sparse structure properties and complex node relationships. For efficient inference, we use Dirichlet process properties over finite sample partitions, density transformations, and random features to develop a stochastic variational estimation algorithm. We demonstrate the advantage of our method in several benchmark datasets.

【2】 On Clustering Categories of Categorical Predictors in Generalized Linear Models Link: https://arxiv.org/abs/2110.10059

Authors: Emilio Carrizosa, Marcela Galvis Restrepo, Dolores Romero Morales Notes: Journal article (accepted manuscript), Expert Systems Abstract: We propose a method to reduce the complexity of Generalized Linear Models in the presence of categorical predictors. The traditional one-hot encoding, where each category is represented by a dummy variable, can be wasteful, difficult to interpret, and prone to overfitting, especially when dealing with high-cardinality categorical predictors. This paper addresses these challenges by finding a reduced representation of the categorical predictors by clustering their categories. This is done through a numerical method which aims to preserve (or even improve) accuracy, while reducing the number of coefficients to be estimated for the categorical predictors. Thanks to its design, we are able to derive a proximity measure between categories of a categorical predictor that can be easily visualized. We illustrate the performance of our approach in real-world classification and count-data datasets where we see that clustering the categorical predictors reduces complexity substantially without harming accuracy.
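The sketch below is not the paper's numerical method; it is only a naive Python illustration of the general idea of collapsing the categories of a high-cardinality predictor into a few groups before fitting a GLM. The data, the clustering-by-outcome-rate step, and all names are hypothetical placeholders.

```python
# Naive illustration (not the paper's algorithm): cluster the categories of a
# high-cardinality predictor by their empirical outcome rates, then fit a GLM
# on the reduced one-hot encoding instead of one dummy per category.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, n_cats = 5000, 50
df = pd.DataFrame({"cat": rng.integers(0, n_cats, n)})
true_effect = rng.normal(0.0, 1.0, n_cats)                       # latent per-category effect
df["y"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_effect[df["cat"]])))

# Step 1: summarise each category by its empirical outcome rate.
rates = df.groupby("cat")["y"].mean().to_frame("rate")

# Step 2: cluster the 50 categories into a handful of groups on that summary.
rates["group"] = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(rates[["rate"]])

# Step 3: fit the GLM on the clustered (reduced) encoding.
X_reduced = pd.get_dummies(df["cat"].map(rates["group"]), prefix="grp")
model = LogisticRegression().fit(X_reduced, df["y"])
print(X_reduced.shape[1], "coefficients instead of", n_cats - 1)
```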

【3】 Nonstationary seasonal model for daily mean temperature distribution bridging bulk and tails Link: https://arxiv.org/abs/2110.10046

Authors: Mitchell Krock, Julie Bessac, Michael L. Stein, Adam H. Monahan Abstract: In traditional extreme value analysis, the bulk of the data is ignored, and only the tails of the distribution are used for inference. Extreme observations are specified as values that exceed a threshold or as maximum values over distinct blocks of time, and subsequent estimation procedures are motivated by asymptotic theory for extremes of random processes. For environmental data, nonstationary behavior in the bulk of the distribution, such as seasonality or climate change, will also be observed in the tails. To accurately model such nonstationarity, it seems natural to use the entire dataset rather than just the most extreme values. It is also common to observe different types of nonstationarity in each tail of a distribution. Most work on extremes only focuses on one tail of a distribution, but for temperature, both tails are of interest. This paper builds on a recently proposed parametric model for the entire probability distribution that has flexible behavior in both tails. We apply an extension of this model to historical records of daily mean temperature at several locations across the United States with different climates and local conditions. We highlight the ability of the method to quantify changes in the bulk and tails across the year over the past decades and under different geographic and climatic conditions. The proposed model shows good performance when compared to several benchmark models that are typically used in extreme value analysis of temperature.

【4】 BNPdensity: Bayesian nonparametric mixture modeling in R Link: https://arxiv.org/abs/2110.10019

Authors: Julyan Arbel, Guillaume Kon Kam King, Antonio Lijoi, Luis Enrique Nieto-Barajas, Igor Prünster Institutions: Université Grenoble Alpes, Inria, CNRS, LJK, Grenoble INP, Grenoble, France; Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France; Department of Decision Sciences, BIDSA, Bocconi University, Milan, Italy Abstract: Robust statistical data modelling under potential model mis-specification often requires leaving the parametric world for the nonparametric. In the latter, parameters are infinite dimensional objects such as functions, probability distributions or infinite vectors. In the Bayesian nonparametric approach, prior distributions are designed for these parameters, which provide a handle to manage the complexity of nonparametric models in practice. However, most modern Bayesian nonparametric models seem often out of reach to practitioners, as inference algorithms need careful design to deal with the infinite number of parameters. The aim of this work is to facilitate the journey by providing computational tools for Bayesian nonparametric inference. The article describes a set of functions available in the R package BNPdensity in order to carry out density estimation with an infinite mixture model, including all types of censored data. The package provides access to a large class of such models based on normalized random measures, which represent a generalization of the popular Dirichlet process mixture. One striking advantage of this generalization is that it offers much more robust priors on the number of clusters than the Dirichlet. Another crucial advantage is the complete flexibility in specifying the prior for the scale and location parameters of the clusters, because conjugacy is not required. Inference is performed using a theoretically grounded approximate sampling methodology known as the Ferguson & Klass algorithm. The package also offers several goodness of fit diagnostics such as QQ-plots, including a cross-validation criterion, the conditional predictive ordinate. The proposed methodology is illustrated on a classical ecological risk assessment method called the Species Sensitivity Distribution (SSD) problem, showcasing the benefits of the Bayesian nonparametric framework.

【5】 How to Guide Decisions with Bayes Factors Link: https://arxiv.org/abs/2110.09981

Authors: Patrick Schwaferts, Thomas Augustin Institutions: Ludwig-Maximilians-Universität Munich, Department of Statistics, Methodological Foundations of Statistics and its Applications, Ludwigsstraße, Munich, Germany Notes: 13 pages, 3 figures, 1 table Abstract: Some scientific research questions ask to guide decisions and others do not. By their nature frequentist hypothesis-tests yield a dichotomous test decision as result, rendering them rather inappropriate for latter types of research questions. Bayes factors, however, are argued to be both able to refrain from making decisions and to be employed in guiding decisions. This paper elaborates on how to use a Bayes factor for guiding a decision. In this regard, its embedding within the framework of Bayesian decision theory is delineated, in which a (hypothesis-based) loss function needs to be specified. Typically, such a specification is difficult for an applied scientist as relevant information might be scarce, vague, partial, and ambiguous. To tackle this issue, a robust, interval-valued specification of this loss function shall be allowed, such that the essential but partial information can be included into the analysis as is. Further, the restriction of the prior distributions to be proper distributions (which is necessary to calculate Bayes factors) can be alleviated if a decision is of interest. Both the resulting framework of hypothesis-based Bayesian decision theory with robust loss function and how to derive optimal decisions from already existing Bayes factors are depicted by user-friendly and straightforward step-by-step guides.

【6】 Fully Three-dimensional Radial Visualization Link: https://arxiv.org/abs/2110.09971

Authors: Yifan Zhu, Fan Dai, Ranjan Maitra Institutions: Department of Statistics, Iowa State University; Department of Mathematical Sciences, Michigan Technological University Notes: 10 pages, 7 figures, 1 table Abstract: We develop methodology for three-dimensional (3D) radial visualization (RadViz) of multidimensional datasets. The classical two-dimensional (2D) RadViz visualizes multivariate data in the 2D plane by mapping every observation to a point inside the unit circle. Our tool, RadViz3D, distributes anchor points uniformly on the 3D unit sphere. We show that this uniform distribution provides the best visualization with minimal artificial visual correlation for data with uncorrelated variables. However, anchor points can be placed exactly equi-distant from each other only for the five Platonic solids, so we provide equi-distant anchor points for these five settings, and approximately equi-distant anchor points via a Fibonacci grid for the other cases. Our methodology, implemented in the R package $radviz3d$, makes fully 3D RadViz possible and is shown to improve the ability of this nonlinear technique in more faithfully displaying simulated data as well as the crabs, olive oils and wine datasets. Additionally, because radial visualization is naturally suited for compositional data, we use RadViz3D to illustrate (i) the chemical composition of Longquan celadon ceramics and their Jingdezhen imitation over centuries, and (ii) US regional SARS-Cov-2 variants' prevalence in the Covid-19 pandemic during the summer 2021 surge of the Delta variant.
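The radviz3d package itself is an R library; the short Python sketch below only illustrates the anchor-point idea mentioned in the abstract, i.e., the standard Fibonacci-lattice construction of approximately equidistant points on the unit sphere. It is a generic construction, not the package's implementation.

```python
# Standard Fibonacci-lattice construction of p roughly equidistant anchor
# points on the unit sphere (illustrative only; radviz3d is an R package).
import numpy as np

def fibonacci_sphere(p: int) -> np.ndarray:
    """Return a (p, 3) array of approximately equidistant points on S^2."""
    golden = (1 + np.sqrt(5)) / 2
    i = np.arange(p)
    theta = 2 * np.pi * i / golden          # longitudes spread by the golden angle
    z = 1 - (2 * i + 1) / p                 # latitudes with equal-area spacing
    r = np.sqrt(1 - z**2)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta), z])

anchors = fibonacci_sphere(7)               # e.g. 7 variables -> 7 anchor points
print(np.round(anchors, 3))
```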

【7】 Advanced Statistical Learning on Short Term Load Process Forecasting Link: https://arxiv.org/abs/2110.09920

Authors: Junjie Hu, Brenda López Cabrera, Awdesch Melzer Institutions: Humboldt-Universität zu Berlin Notes: 19 pages Abstract: Short Term Load Forecast (STLF) is necessary for effective scheduling, operation optimization trading, and decision-making for electricity consumers. Modern and efficient machine learning methods are recalled nowadays to manage complicated structural big datasets, which are characterized by having a nonlinear temporal dependence structure. We propose different statistical nonlinear models to manage these challenges of hard type datasets and forecast 15-min frequency electricity load up to 2-days ahead. We show that the Long-short Term Memory (LSTM) and the Gated Recurrent Unit (GRU) models applied to the production line of a chemical production facility outperform several other predictive models in terms of out-of-sample forecasting accuracy by the Diebold-Mariano (DM) test with several metrics. The predictive information is fundamental for the risk and production management of electricity consumers.
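As a rough companion to the abstract, here is a minimal PyTorch sketch of the kind of GRU forecaster described: a window of past 15-minute loads in, a 192-step (2-day) forecast out. The architecture, lookback window, and hyperparameters are placeholders, not the paper's configuration.

```python
# Minimal GRU load-forecasting sketch (placeholder settings, not the paper's).
import torch
import torch.nn as nn

class GRUForecaster(nn.Module):
    def __init__(self, hidden_size: int = 64, horizon: int = 192):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, horizon)

    def forward(self, x):                     # x: (batch, lookback, 1)
        _, h = self.gru(x)                     # h: (1, batch, hidden)
        return self.head(h.squeeze(0))         # (batch, horizon)

model = GRUForecaster()
past = torch.randn(8, 672, 1)                  # 8 series, one week of 15-min loads
forecast = model(past)                         # (8, 192): two-day-ahead predictions
print(forecast.shape)
```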

【8】 Bayes Factors can only Quantify Evidence w.r.t. Sets of Parameters, not w.r.t. (Prior) Distributions on the Parameter Link: https://arxiv.org/abs/2110.09871

Authors: Patrick Schwaferts, Thomas Augustin Institutions: Ludwig-Maximilians-Universität Munich, Germany Notes: 11 pages, 4 figures Abstract: Bayes factors are characterized by both the powerful mathematical framework of Bayesian statistics and the useful interpretation as evidence quantification. The former requires a parameter distribution that changes by seeing the data, the latter requires two fixed hypotheses w.r.t. which the evidence quantification refers to. Naturally, these fixed hypotheses must not change by seeing the data, only their credibility should! Yet, it is exactly such a change of the hypotheses themselves (not only their credibility) that occurs by seeing the data, if their content is represented by parameter distributions (a recent trend in the context of Bayes factors for about one decade), rendering a correct interpretation of the Bayes factor rather useless. Instead, this paper argues that the inferential foundation of Bayes factors can only be maintained, if hypotheses are sets of parameters, not parameter distributions. In addition, particular attention has been paid to providing an explicit terminology of the big picture of statistical inference in the context of Bayes factors as well as to the distinction between knowledge (formalized by the prior distribution and being allowed to change) and theoretical positions (formalized as hypotheses and required to stay fixed) of the phenomenon of interest.

【9】 Learning Pareto-Efficient Decisions with Confidence Link: https://arxiv.org/abs/2110.09864

Authors: Sofia Ek, Dave Zachariah, Petre Stoica Institutions: Uppsala University, Sweden Abstract: The paper considers the problem of multi-objective decision support when outcomes are uncertain. We extend the concept of Pareto-efficient decisions to take into account the uncertainty of decision outcomes across varying contexts. This enables quantifying trade-offs between decisions in terms of tail outcomes that are relevant in safety-critical applications. We propose a method for learning efficient decisions with statistical confidence, building on results from the conformal prediction literature. The method adapts to weak or nonexistent context covariate overlap and its statistical guarantees are evaluated using both synthetic and real data.

【10】 The dynamic relationship of crude oil prices on macroeconomic variables in Ghana: a time series analysis approach Link: https://arxiv.org/abs/2110.09850

Authors: Dennis Arku, Gabriel Kallah-Dagadu, Dzidzor Kwabla Klogo Institutions: Department of Statistics and Actuarial Science, College of Basic and Applied Sciences, University of Ghana, Legon Abstract: The study investigates the effects of crude oil prices on inflation and interest rate in Ghana using data obtained from the Bank of Ghana data repository. The Augmented Dickey-Fuller and the Phillips-Perron tests were used to test the presence or otherwise of unit roots among the variables. The stationarity tests showed that the variables are either integrated of order one or integrated of order zero. The autoregressive distributed lag bounds test approach was adopted to examine cointegration among the variables. The results showed a positive relationship between crude oil prices and inflation in the long run. In the short run, the coefficient of the first-period lag of inflation is negative but statistically insignificant. However, the second-period lag of inflation is positive and significant. The results also show a negative relationship between crude oil price and interest rate. Based on the findings, it is recommended that the government of Ghana should provide and strengthen the efficiency of the public transport system to help reduce transport fares in order to shield the poor from the implications of oil price increases in Ghana.
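The unit-root step described in the abstract (ADF tests on levels and first differences) can be sketched with statsmodels; the synthetic random-walk series below is a placeholder, not the study's Bank of Ghana data, and the ARDL bounds test itself is omitted.

```python
# Hedged sketch of the Augmented Dickey-Fuller step on a synthetic I(1) series.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(1)
inflation = np.cumsum(rng.normal(0, 1, 300))       # random walk: I(1) by construction

for name, series in [("level", inflation), ("first difference", np.diff(inflation))]:
    stat, pvalue, *_ = adfuller(series, autolag="AIC")
    print(f"ADF on {name}: statistic={stat:.2f}, p-value={pvalue:.3f}")
```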

【11】 Practical Relevance: A Formal Definition Link: https://arxiv.org/abs/2110.09837

Authors: Patrick Schwaferts, Thomas Augustin Notes: 14 pages, 1 figure Abstract: There is a general agreement that it is important to consider the practical relevance of an effect in addition to its statistical significance, yet a formal definition of practical relevance is still pending and shall be provided within this paper. It appears that an underlying decision problem, characterized by actions and a loss function, is required to define the notion of practical relevance, rendering it a decision theoretic concept. In the context of hypothesis-based analyses, the notion of practical relevance relates to specifying the hypotheses reasonably, such that the null hypothesis does not contain only a single parameter null value, but also all parameter values that are equivalent to the null value on a practical level. In that regard, the definition of practical relevance is also extended into the context of hypotheses. The formal elaborations on the notion of practical relevance within this paper indicate that, typically, a specific decision problem is implicitly assumed when dealing with the practical relevance of an effect or some results. As a consequence, involving decision theoretic considerations into a statistical analysis suggests itself by the mere nature of the notion of practical relevance.

【12】 Simulating the Power of Statistical Tests: A Collection of R Examples Link: https://arxiv.org/abs/2110.09836

Authors: Florian Wickelmaier Institutions: University of Tuebingen Notes: PDFLaTeX, 23 pages, uses packages hyperref, listings, apacite Abstract: This paper illustrates how to calculate the power of a statistical test by computer simulation. It provides R code for power simulations of several classical inference procedures including one- and two-sample t tests, chi-squared tests, regression, and analysis of variance.
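The paper's examples are in R; the following is a Python analogue of the same Monte Carlo idea for one of the cases it covers, the two-sample t test: simulate data under the alternative many times and record how often the test rejects at alpha = 0.05. The effect size and sample size below are arbitrary choices for illustration.

```python
# Monte Carlo power simulation for a two-sample t test (Python analogue of the
# kind of R example the paper collects).
import numpy as np
from scipy import stats

def simulate_power(n_per_group=30, effect=0.5, n_sim=10_000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        x = rng.normal(0.0, 1.0, n_per_group)
        y = rng.normal(effect, 1.0, n_per_group)
        _, p = stats.ttest_ind(x, y)
        rejections += p < alpha
    return rejections / n_sim

print(simulate_power())   # roughly 0.47 for d = 0.5 and n = 30 per group
```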

【13】 Learning to Learn Graph Topologies Link: https://arxiv.org/abs/2110.09807

Authors: Xingyue Pu, Tianyue Cao, Xiaoyun Zhang, Xiaowen Dong, Siheng Chen Institutions: University of Oxford, Shanghai Jiao Tong University Notes: Accepted at NeurIPS 2021 Abstract: Learning a graph topology to reveal the underlying relationship between data entities plays an important role in various machine learning and data analysis tasks. Under the assumption that structured data vary smoothly over a graph, the problem can be formulated as a regularised convex optimisation over a positive semidefinite cone and solved by iterative algorithms. Classic methods require an explicit convex function to reflect generic topological priors, e.g. the $\ell_1$ penalty for enforcing sparsity, which limits the flexibility and expressiveness in learning rich topological structures. We propose to learn a mapping from node data to the graph structure based on the idea of learning to optimise (L2O). Specifically, our model first unrolls an iterative primal-dual splitting algorithm into a neural network. The key structural proximal projection is replaced with a variational autoencoder that refines the estimated graph with enhanced topological properties. The model is trained in an end-to-end fashion with pairs of node data and graph samples. Experiments on both synthetic and real-world data demonstrate that our model is more efficient than classic iterative algorithms in learning a graph with specific topological properties.

【14】 Efficient and Consistent Data-Driven Model Selection for Time Series Link: https://arxiv.org/abs/2110.09785

Authors: Jean-Marc Bardet, Kamila Kare, William Kengne Institutions: University Paris 1 Panthéon-Sorbonne; CY Cergy Paris Université Abstract: This paper studies the model selection problem in a large class of causal time series models, which includes both the ARMA or AR($\infty$) processes, as well as the GARCH or ARCH($\infty$), APARCH, ARMA-GARCH and many other processes. We first study the asymptotic behavior of the ideal penalty that minimizes the risk induced by a quasi-likelihood estimation among a finite family of models containing the true model. Then, we provide general conditions on the penalty term for obtaining the consistency and efficiency properties. We notably prove that consistent model selection criteria outperform the classical AIC criterion in terms of efficiency. Finally, we derive from a Bayesian approach the usual BIC criterion, and by keeping all the second order terms of the Laplace approximation, a data-driven criterion denoted KC'. Monte-Carlo experiments exhibit the obtained asymptotic results and show that the KC' criterion does better than the AIC and BIC ones in terms of consistency and efficiency.
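For reference, the generic penalized (quasi-)likelihood template behind the criteria compared in the abstract is recalled below; the paper's ideal penalty and the data-driven KC' criterion are refinements of this template and are not reproduced here.

```latex
% Generic penalized model selection template; AIC and BIC correspond to the
% two classical penalty weights (the paper's ideal penalty and KC' differ).
\widehat{m} \;=\; \operatorname*{arg\,min}_{m \in \mathcal{M}}
  \Bigl\{ -2\,\log \widehat{L}_n(m) \;+\; \kappa_n\, |m| \Bigr\},
\qquad
\kappa_n = 2 \ \text{(AIC)}, \qquad \kappa_n = \log n \ \text{(BIC)}.
```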

【15】 Hierarchical Bayesian Modeling of Ocean Heat Content and its Uncertainty Link: https://arxiv.org/abs/2110.09717

Authors: Samuel Baugh, Karen McKinnon Institutions: Institute for the Environment and Sustainability, UCLA Notes: Manuscript 23 pages, 7 figures; supplement 7 pages, 4 figures Abstract: The accurate quantification of changes in the heat content of the world's oceans is crucial for our understanding of the effects of increasing greenhouse gas concentrations. The Argo program, consisting of Lagrangian floats that measure vertical temperature profiles throughout the global ocean, has provided a wealth of data from which to estimate ocean heat content. However, creating a globally consistent statistical model for ocean heat content remains challenging due to the need for a globally valid covariance model that can capture complex nonstationarity. In this paper, we develop a hierarchical Bayesian Gaussian process model that uses kernel convolutions with cylindrical distances to allow for spatial non-stationarity in all model parameters while using a Vecchia process to remain computationally feasible for large spatial datasets. Our approach can produce valid credible intervals for globally integrated quantities that would not be possible using previous approaches. These advantages are demonstrated through the application of the model to Argo data, yielding credible intervals for the spatially varying trend in ocean heat content that accounts for both the uncertainty induced from interpolation and from estimating the mean field and other parameters. Through cross-validation, we show that our model out-performs an out-of-the-box approach as well as other simpler models. The code for performing this analysis is provided as the R package BayesianOHC.

【16】 Hybrid variable monitoring: An unsupervised process monitoring framework Link: https://arxiv.org/abs/2110.09704

Authors: Min Wang, Donghua Zhou, Maoyin Chen Institutions: Department of Automation, Tsinghua University, Beijing, China; College of Electrical Engineering and Automation, Shandong University of Science and Technology, Qingdao, China Notes: This paper has been submitted to Automatica for potential publication Abstract: Traditional process monitoring methods, such as PCA, PLS, ICA, MD et al., are strongly dependent on continuous variables because most of them inevitably involve Euclidean or Mahalanobis distance. With industrial processes becoming more and more complex and integrated, binary variables also appear in monitoring variables besides continuous variables, which makes process monitoring more challenging. The aforementioned traditional approaches are incompetent to mine the information of binary variables, so that the useful information contained in them is usually discarded during the data preprocessing. To solve the problem, this paper focuses on the issue of hybrid variable monitoring (HVM) and proposes a novel unsupervised framework of process monitoring with hybrid variables. HVM is addressed in the probabilistic framework, which can effectively exploit the process information implicit in both continuous and binary variables at the same time. In HVM, the statistics and the monitoring strategy suitable for hybrid variables with only healthy state data are defined and the physical explanation behind the framework is elaborated. In addition, the estimation of parameters required in HVM is derived in detail and the detectable condition of the proposed method is analyzed. Finally, the superiority of HVM is fully demonstrated first on a numerical simulation and then on an actual case of a thermal power plant.

【17】 abess: A Fast Best Subset Selection Library in Python and R Link: https://arxiv.org/abs/2110.09697

Authors: Jin Zhu, Liyuan Hu, Junhao Huang, Kangkang Jiang, Yanhang Zhang, Shiyun Lin, Junxian Zhu, Xueqin Wang Institutions: Department of Statistical Science, Sun Yat-Sen University, Guangzhou, GD, China; School of Statistics, Renmin University of China, Beijing, China; Center for Statistical Science, Peking University, Beijing, China Abstract: We introduce a new library named abess that implements a unified framework of best-subset selection for solving diverse machine learning problems, e.g., linear regression, classification, and principal component analysis. Particularly, abess certifiably gets the optimal solution within polynomial time under the linear model. Our efficient implementation allows abess to attain the solution of best-subset selection problems as fast as or even 100x faster than existing competing variable (model) selection toolboxes. Furthermore, it supports common variants like best group subset selection and $\ell_2$ regularized best-subset selection. The core of the library is programmed in C++. For ease of use, a Python library is designed for conveniently integrating with scikit-learn, and it can be installed from the Python Package Index. In addition, a user-friendly R library is available at the Comprehensive R Archive Network. The source code is available at: https://github.com/abess-team/abess.
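A possible usage sketch of the scikit-learn-style Python interface described in the abstract is given below; the exact import path, class name, and argument names are assumptions that may differ across abess versions, so treat this as a sketch and check the project documentation at the GitHub link above.

```python
# Hedged usage sketch of abess's scikit-learn-style interface (import path and
# arguments assumed; verify against the abess documentation).
import numpy as np
from abess import LinearRegression          # pip install abess

rng = np.random.default_rng(0)
n, p, k = 200, 500, 5
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:k] = 3.0                               # only 5 of the 500 features are active
y = X @ beta + rng.normal(size=n)

model = LinearRegression(support_size=k)     # best-subset fit with subset size 5
model.fit(X, y)
print(np.flatnonzero(model.coef_))           # indices of the selected features
```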

【18】 Multilevel Stochastic Optimization for Imputation in Massive Medical Data Records Link: https://arxiv.org/abs/2110.09680

Authors: Xiaoyu Wang, Wenrui Li, Yuetian Sun, Snezana Milanovic, Mark Kon, Julio Enrique Castrillon-Candas Institutions: Department of Mathematics and Statistics, Boston University, Boston, MA, USA Notes: 18 pages, 2 figures Abstract: Exploration and analysis of massive datasets has recently generated increasing interest in the research and development communities. It has long been a recognized problem that many datasets contain significant levels of missing numerical data. We introduce a mathematically principled stochastic optimization imputation method based on the theory of Kriging. This is shown to be a powerful method for imputation. However, its computational effort and potential numerical instabilities produce costly and/or unreliable predictions, potentially limiting its use on large scale datasets. In this paper, we apply a recently developed multi-level stochastic optimization approach to the problem of imputation in massive medical records. The approach is based on computational applied mathematics techniques and is highly accurate. In particular, for the Best Linear Unbiased Predictor (BLUP) this multi-level formulation is exact, and is also significantly faster and more numerically stable. This permits practical application of Kriging methods to data imputation problems for massive datasets. We test this approach on data from the National Inpatient Sample (NIS) data records, Healthcare Cost and Utilization Project (HCUP), Agency for Healthcare Research and Quality. Numerical results show the multi-level method significantly outperforms current approaches and is numerically robust. In particular, it has superior accuracy as compared with methods recommended in the recent report from HCUP on the important problem of missing data, which could lead to sub-optimal and poorly based funding policy decisions. In comparative benchmark tests it is shown that the multilevel stochastic method is significantly superior to recommended methods in the report, including Predictive Mean Matching (PMM) and Predicted Posterior Distribution (PPD), with up to 75% reductions in error.

【19】 A simple Bayesian state-space model for the collective risk model Link: https://arxiv.org/abs/2110.09657

Authors: Jae Youn Ahn, Himchan Jeong, Yang Lu Abstract: The collective risk model (CRM) for frequency and severity is an important tool for retail insurance ratemaking, macro-level catastrophic risk forecasting, as well as operational risk in banking regulation. This model, which is initially designed for cross-sectional data, has recently been adapted to a longitudinal context to conduct both a priori and a posteriori ratemaking, through the introduction of random effects. However, so far, the random effect(s) is usually assumed static due to computational concerns, leading to predictive premia that omit the seniority of the claims. In this paper, we propose a new CRM model with bivariate dynamic random effect process. The model is based on Bayesian state-space models. It is associated with a simple predictive mean and closed form expression for the likelihood function, while also allowing for the dependence between the frequency and severity components. A real data application to auto insurance is proposed to show the performance of our method.

【20】 The f-divergence and Loss Functions in ROC Curve Link: https://arxiv.org/abs/2110.09651

Authors: Song Liu Institutions: School of Mathematics, University of Bristol Abstract: Given two data distributions and a test score function, the Receiver Operating Characteristic (ROC) curve shows how well such a score separates two distributions. However, can the ROC curve be used as a measure of discrepancy between two distributions? This paper shows that when the data likelihood ratio is used as the test score, the arc length of the ROC curve gives rise to a novel $f$-divergence measuring the differences between two data distributions. Approximating this arc length using a variational objective and empirical samples leads to empirical risk minimization with previously unknown loss functions. We provide a Lagrangian dual objective and introduce kernel models into the estimation problem. We study the non-parametric convergence rate of this estimator and show that, under mild smoothness conditions on the real arctangent density ratio function, the rate of convergence is $O_p(n^{-\beta/4})$ (where $\beta \in (0,1]$ depends on the smoothness).
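The quantity the abstract refers to can be illustrated numerically: score two samples with the likelihood ratio (known in closed form here because both densities are Gaussian) and measure the arc length of the resulting empirical ROC curve. This is only an illustration of the construction, not the paper's variational estimator, and the Gaussian example is a hypothetical choice.

```python
# Numerical illustration: ROC arc length when the likelihood ratio is the score.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
p_samples = rng.normal(0.0, 1.0, 5000)            # class 0 ~ N(0, 1)
q_samples = rng.normal(1.0, 1.0, 5000)            # class 1 ~ N(1, 1)

def log_likelihood_ratio(x):                       # log q(x)/p(x) for these Gaussians
    return x - 0.5

scores = log_likelihood_ratio(np.concatenate([p_samples, q_samples]))
labels = np.concatenate([np.zeros(5000), np.ones(5000)])
fpr, tpr, _ = roc_curve(labels, scores)

arc_length = np.sum(np.sqrt(np.diff(fpr) ** 2 + np.diff(tpr) ** 2))
# sqrt(2) ~ 1.414 if the two distributions coincide, 2 if perfectly separated
print(f"ROC arc length: {arc_length:.3f}")
```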

【21】 Robustly leveraging the post-randomization information to improve precision in the analyses of randomized clinical trials Link: https://arxiv.org/abs/2110.09645

Authors: Bingkai Wang, Yu Du Abstract: In randomized clinical trials, repeated measures of the outcome are routinely collected. The mixed model for repeated measures (MMRM) leverages the information from these repeated outcome measures, and is often used for the primary analysis to estimate the average treatment effect at the final visit. MMRM, however, can suffer from precision loss when it models the intermediate outcomes incorrectly, and hence fails to use the post-randomization information in a harmless way. In this paper, we propose a new working model, called IMMRM, that generalizes MMRM and optimizes the precision gain from covariate adjustment, stratified randomization and adjustment for intermediate outcome measures. We prove that the IMMRM estimator for the average treatment effect is consistent and asymptotically normal under arbitrary misspecification of its working model assuming missing completely at random. Under simple or stratified randomization, the IMMRM estimator is asymptotically equally or more precise than the analysis of covariance (ANCOVA) estimator and the MMRM estimator. By re-analyzing three randomized trials in the diabetes therapeutic area, we demonstrate that the IMMRM estimator has 2-24% smaller variance than ANCOVA and 5-16% smaller variance than MMRM.

【22】 Comparative Methods for the Analysis of Cluster Randomized Trials Link: https://arxiv.org/abs/2110.09633

Authors: Alejandra Benitez, Maya L. Petersen, Mark J. van der Laan, Nicole Santos, Elizabeth Butrick, Dilys Walker, Rakesh Ghosh, Phelgona Otieno, Peter Waiswa, Laura B. Balzer Abstract: Across research disciplines, cluster randomized trials (CRTs) are commonly implemented to evaluate interventions delivered to groups of participants, such as communities and clinics. Despite advances in the design and analysis of CRTs, several challenges remain. First, there are many possible ways to specify the intervention effect (e.g., at the individual-level or at the cluster-level). Second, the theoretical and practical performance of common methods for CRT analysis remain poorly understood. Here, we use causal models to formally define an array of causal effects as summary measures of counterfactual outcomes. Next, we provide a comprehensive overview of well-known CRT estimators, including the t-test and generalized estimating equations (GEE), as well as less known methods, including augmented-GEE and targeted maximum likelihood estimation (TMLE). In finite sample simulations, we illustrate the performance of these estimators and the importance of effect specification, especially when cluster size varies. Finally, our application to data from the Preterm Birth Initiative (PTBi) study demonstrates the real-world importance of selecting an analytic approach corresponding to the research question. Given its flexibility to estimate a variety of effects and ability to adaptively adjust for covariates for precision gains while maintaining Type-I error control, we conclude TMLE is a promising tool for CRT analysis.

【23】 A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds Link: https://arxiv.org/abs/2110.09626

Authors: Yan Shuo Tan, Abhineet Agarwal, Bin Yu Institutions: Department of Statistics, UC Berkeley; Department of Physics, UC Berkeley; Department of Electrical Engineering and Computer Sciences, UC Berkeley; Center for Computational Biology, UC Berkeley; Chan-Zuckerberg Biohub Intercampus Award Investigator Abstract: Decision trees are important both as interpretable models amenable to high-stakes decision-making, and as building blocks of ensemble methods such as random forests and gradient boosting. Their statistical properties, however, are not well understood. The most cited prior works have focused on deriving pointwise consistency guarantees for CART in a classical nonparametric regression setting. We take a different approach, and advocate studying the generalization performance of decision trees with respect to different generative regression models. This allows us to elicit their inductive bias, that is, the assumptions the algorithms make (or do not make) to generalize to new data, thereby guiding practitioners on when and how to apply these methods. In this paper, we focus on sparse additive generative models, which have both low statistical complexity and some nonparametric flexibility. We prove a sharp squared error generalization lower bound for a large class of decision tree algorithms fitted to sparse additive models with $C^1$ component functions. This bound is surprisingly much worse than the minimax rate for estimating such sparse additive models. The inefficiency is due not to greediness, but to the loss in power for detecting global structure when we average responses solely over each leaf, an observation that suggests opportunities to improve tree-based algorithms, for example, by hierarchical shrinkage. To prove these bounds, we develop new technical machinery, establishing a novel connection between decision tree estimation and rate-distortion theory, a sub-field of information theory.

【24】 Sufficient Dimension Reduction for High-Dimensional Regression and Low-Dimensional Embedding: Tutorial and Survey Link: https://arxiv.org/abs/2110.09620

Authors: Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley Institutions: Department of Electrical and Computer Engineering, Machine Learning Laboratory, University of Waterloo, Waterloo, ON, Canada; Department of Statistics and Actuarial Science & David R. Cheriton School of Computer Science Notes: To appear as a part of an upcoming textbook on dimensionality reduction and manifold learning Abstract: This is a tutorial and survey paper on various methods for Sufficient Dimension Reduction (SDR). We cover these methods with both statistical high-dimensional regression perspective and machine learning approach for dimensionality reduction. We start with introducing inverse regression methods including Sliced Inverse Regression (SIR), Sliced Average Variance Estimation (SAVE), contour regression, directional regression, Principal Fitted Components (PFC), Likelihood Acquired Direction (LAD), and graphical regression. Then, we introduce forward regression methods including Principal Hessian Directions (pHd), Minimum Average Variance Estimation (MAVE), Conditional Variance Estimation (CVE), and deep SDR methods. Finally, we explain Kernel Dimension Reduction (KDR) both for supervised and unsupervised learning. We also show that supervised KDR and supervised PCA are equivalent.

【25】 Interpolating between sampling and variational inference with infinite stochastic mixtures Link: https://arxiv.org/abs/2110.09618

Authors: Richard D. Lange, Ari Benjamin, Ralf M. Haefner, Xaq Pitkow Institutions: University of Pennsylvania, University of Rochester, Baylor College of Medicine, Rice University (*equal contribution) Notes: 8 pages, 4 figures. Submitted to AISTATS 2022; under double-blind review. Code available at this https URL Abstract: Sampling and Variational Inference (VI) are two large families of methods for approximate inference with complementary strengths. Sampling methods excel at approximating arbitrary probability distributions, but can be inefficient. VI methods are efficient, but can fail when probability distributions are complex. Here, we develop a framework for constructing intermediate algorithms that balance the strengths of both sampling and VI. Both approximate a probability distribution using a mixture of simple component distributions: in sampling, each component is a delta-function and is chosen stochastically, while in standard VI a single component is chosen to minimize divergence. We show that sampling and VI emerge as special cases of an optimization problem over a mixing distribution, and intermediate approximations arise by varying a single parameter. We then derive closed-form sampling dynamics over variational parameters that stochastically build a mixture. Finally, we discuss how to select the optimal compromise between sampling and VI given a computational budget. This work is a first step towards a highly flexible yet simple family of inference methods that combines the complementary strengths of sampling and VI.

【26】 Semi-supervised Approach to Event Time Annotation Using Longitudinal Electronic Health Records Link: https://arxiv.org/abs/2110.09612

Authors: Liang Liang, Jue Hou, Hajime Uno, Kelly Cho, Yanyuan Ma, Tianxi Cai Institutions: Department of Biostatistics, Harvard T.H. Chan School of Public Health; Massachusetts Veterans Epidemiology Research and Information Center, US Department of Veteran Affairs; Department of Medical Oncology, Dana-Farber Cancer Institute Abstract: Large clinical datasets derived from insurance claims and electronic health record (EHR) systems are valuable sources for precision medicine research. These datasets can be used to develop models for personalized prediction of risk or treatment response. Efficiently deriving prediction models using real world data, however, faces practical and methodological challenges. Precise information on important clinical outcomes such as time to cancer progression are not readily available in these databases. The true clinical event times typically cannot be approximated well based on simple extracts of billing or procedure codes. Whereas, annotating event times manually is time and resource prohibitive. In this paper, we propose a two-step semi-supervised multi-modal automated time annotation (MATA) method leveraging multi-dimensional longitudinal EHR encounter records. In step I, we employ a functional principal component analysis approach to estimate the underlying intensity functions based on observed point processes from the unlabeled patients. In step II, we fit a penalized proportional odds model to the event time outcomes with features derived in step I in the labeled data where the non-parametric baseline function is approximated using B-splines. Under regularity conditions, the resulting estimator of the feature effect vector is shown to be root-$n$ consistent. We demonstrate the superiority of our approach relative to existing approaches through simulations and a real data example on annotating lung cancer recurrence in an EHR cohort of lung cancer patients from Veteran Health Administration.

【27】 A General Modeling Framework for Network Autoregressive Processes Link: https://arxiv.org/abs/2110.09596

Authors: Hang Yin, Abolfazl Safikhani, George Michailidis Institutions: University of Florida, Gainesville, FL, Department of Statistics & Informatics Institute Abstract: The paper develops a general flexible framework for Network Autoregressive Processes (NAR), wherein the response of each node linearly depends on its past values, a prespecified linear combination of neighboring nodes and a set of node-specific covariates. The corresponding coefficients are node-specific, while the framework can accommodate heavier than Gaussian errors with both spatial-autoregressive and factor based covariance structures. We provide a sufficient condition that ensures the stability (stationarity) of the underlying NAR that is significantly weaker than its counterparts in previous work in the literature. Further, we develop ordinary and generalized least squares estimators for both a fixed, as well as a diverging number of network nodes, and also provide their ridge regularized counterparts that exhibit better performance in large network settings, together with their asymptotic distributions. We also address the issue of misspecifying the network connectivity and its impact on the aforementioned asymptotic distributions of the various NAR parameter estimators. The framework is illustrated on both synthetic and real air pollution data.

【28】 The Two Cultures for Prevalence Mapping: Small Area Estimation and Spatial Statistics Link: https://arxiv.org/abs/2110.09576

Authors: Geir-Arne Fuglstad, Zehang Richard Li, Jon Wakefield Institutions: Department of Mathematical Sciences, Norwegian University of Science and Technology, Norway; Department of Statistics, University of California Santa Cruz, USA; Department of Statistics and Department of Biostatistics, University of Washington, USA Abstract: The emerging need for subnational estimation of demographic and health indicators in low- and middle-income countries (LMICs) is driving a move from design-based methods to spatial and spatio-temporal approaches. The latter are model-based and overcome data sparsity by borrowing strength across space, time and covariates and can, in principle, be leveraged to create yearly fine-scale pixel level maps based on household surveys. However, typical implementations of the model-based approaches do not fully acknowledge the complex survey design, and do not enjoy the theoretical consistency of design-based approaches. We describe how spatial and spatio-temporal methods are currently used for small area estimation in the context of LMICs, highlight the key challenges that need to be overcome, and discuss a new approach, which is methodologically closer in spirit to small area estimation. The main discussion points are demonstrated through two case studies: spatial analysis of vaccination coverage in Nigeria based on the 2018 Demographic and Health Surveys (DHS) survey, and spatio-temporal analysis of neonatal mortality in Malawi based on 2010 and 2015--2016 DHS surveys. We discuss our key findings both generally and with an emphasis on the implications for popular approaches undertaken by industrial producers of subnational prevalence estimates.

【29】 Robustness against conflicting prior information in regression Link: https://arxiv.org/abs/2110.09556

Authors: Philippe Gagnon Institutions: Department of Mathematics and Statistics, Université de Montréal, Canada Abstract: Including prior information about model parameters is a fundamental step of any Bayesian statistical analysis. It is viewed positively by some as it allows, among others, to quantitatively incorporate expert opinion about model parameters. It is viewed negatively by others because it sets the stage for subjectivity in statistical analysis. Certainly, it creates problems when the inference is skewed due to a conflict with the data collected. According to the theory of conflict resolution (O'Hagan and Pericchi, 2012), a solution to such problems is to diminish the impact of conflicting prior information, yielding inference consistent with the data. This is typically achieved by using heavy-tailed priors. We study both theoretically and numerically the efficacy of such a solution in regression where the prior information about the coefficients takes the form of a product of density functions with known location and scale parameters. We study functions with regularly-varying tails (Student distributions), log-regularly-varying tails (as introduced in Desgagné (2015)), and propose functions with slower tail decays that allow to resolve any conflict that can happen under that regression framework, contrarily to the two previous types of functions. The code to reproduce all numerical experiments is available online.

【30】 Kernel Minimum Divergence Portfolios Link: https://arxiv.org/abs/2110.09516

Authors: Linda Chamakh, Zoltán Szabó Abstract: Portfolio optimization is a key challenge in finance with the aim of creating portfolios matching the investors' preference. The target distribution approach relying on the Kullback-Leibler or the $f$-divergence represents one of the most effective forms of achieving this goal. In this paper, we propose to use kernel and optimal transport (KOT) based divergences to tackle the task, which relax the assumptions and the optimization constraints of the previous approaches. In case of the kernel-based maximum mean discrepancy (MMD) we (i) prove the analytic computability of the underlying mean embedding for various target distribution-kernel pairs, (ii) show that such analytic knowledge can lead to faster convergence of MMD estimators, and (iii) extend the results to the unbounded exponential kernel with minimax lower bounds. Numerical experiments demonstrate the improved performance of our KOT estimators both on synthetic and real-world examples.
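For reference, the standard (squared) kernel maximum mean discrepancy between a candidate return distribution and the target, which the MMD-based portfolio objective in the abstract is built around, is recalled below as a definition sketch; it is not the paper's specific estimator or its optimal-transport extension.

```latex
% Standard squared MMD between a candidate distribution P and a target T for a
% kernel k (definition only; not the paper's estimator).
\mathrm{MMD}^2_k(P, T)
  \;=\; \mathbb{E}_{x, x' \sim P}\,[k(x, x')]
  \;-\; 2\,\mathbb{E}_{x \sim P,\; y \sim T}\,[k(x, y)]
  \;+\; \mathbb{E}_{y, y' \sim T}\,[k(y, y')].
```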

【31】 Locally Differentially Private Reinforcement Learning for Linear Mixture Markov Decision Processes 标题:线性混合马尔可夫决策过程的局部差分私有强化学习 链接:https://arxiv.org/abs/2110.10133

作者:Chonghua Liao,Jiafan He,Quanquan Gu 备注:25 pages, 2 figures 摘要:强化学习(RL)算法可用于提供个性化服务,而这依赖于用户的私有和敏感数据。为了保护用户隐私,保护隐私的RL算法应运而生。在本文中,我们研究了具有线性函数近似和局部差分隐私(LDP)保证的RL。我们提出了一种新的$(\varepsilon,\delta)$-LDP算法来学习一类被称为线性混合MDP的马尔可夫决策过程(MDP),并得到 $\tilde{\mathcal{O}}(d^{5/4}H^{7/4}T^{3/4}\left(\log(1/\delta)\right)^{1/4}\sqrt{1/\varepsilon})$ 的遗憾界,其中$d$是特征映射的维数,$H$是规划范围的长度,$T$是与环境的交互次数。我们还证明了在$\varepsilon$-LDP约束下学习线性混合MDP的下界 $\Omega(dH\sqrt{T}/\left(e^{\varepsilon}(e^{\varepsilon}-1)\right))$。在合成数据集上的实验验证了算法的有效性。据我们所知,这是第一个具有线性函数近似的可证明隐私保护RL算法。 摘要:Reinforcement learning (RL) algorithms can be used to provide personalized services, which rely on users' private and sensitive data. To protect the users' privacy, privacy-preserving RL algorithms are in demand. In this paper, we study RL with linear function approximation and local differential privacy (LDP) guarantees. We propose a novel $(\varepsilon, \delta)$-LDP algorithm for learning a class of Markov decision processes (MDPs) dubbed linear mixture MDPs, and obtains an $\tilde{\mathcal{O}}(d^{5/4}H^{7/4}T^{3/4}\left(\log(1/\delta)\right)^{1/4}\sqrt{1/\varepsilon})$ regret, where $d$ is the dimension of feature mapping, $H$ is the length of the planning horizon, and $T$ is the number of interactions with the environment. We also prove a lower bound $\Omega(dH\sqrt{T}/\left(e^{\varepsilon}(e^{\varepsilon}-1)\right))$ for learning linear mixture MDPs under $\varepsilon$-LDP constraint. Experiments on synthetic datasets verify the effectiveness of our algorithm. To the best of our knowledge, this is the first provable privacy-preserving RL algorithm with linear function approximation.

【32】 Inductive Biases and Variable Creation in Self-Attention Mechanisms 标题:自我注意机制中的归纳偏差与变量创造 链接:https://arxiv.org/abs/2110.10090

作者:Benjamin L. Edelman,Surbhi Goel,Sham Kakade,Cyril Zhang 机构:Harvard University, Microsoft Research NYC, University of Washington 摘要:自注意力(self-attention)是一种旨在对序列数据中的远程交互进行建模的架构构件,它推动了自然语言处理及其他领域的许多最新突破。这项工作对自注意力模块的归纳偏差进行了理论分析,重点是严格刻画自注意力模块更倾向于表示哪些函数和长程依赖。我们的主要结果表明,有界范数的Transformer层会创建稀疏变量:它们可以表示输入序列的稀疏函数,且样本复杂度仅随上下文长度呈对数增长。此外,我们围绕可证明学习稀疏布尔函数的大量已有工作,提出了新的实验协议来支持这一分析,并指导Transformer的训练实践。 摘要:Self-attention, an architectural motif designed to model long-range interactions in sequential data, has driven numerous recent breakthroughs in natural language processing and beyond. This work provides a theoretical analysis of the inductive biases of self-attention modules, where our focus is to rigorously establish which functions and long-range dependencies self-attention blocks prefer to represent. Our main result shows that bounded-norm Transformer layers create sparse variables: they can represent sparse functions of the input sequence, with sample complexity scaling only logarithmically with the context length. Furthermore, we propose new experimental protocols to support this analysis and to guide the practice of training Transformers, built around the large body of work on provably learning sparse Boolean functions.
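
为便于理解摘要所分析的对象,下面给出一个单头自注意力前向计算的最小 numpy 示意(与论文的理论分析无直接对应,维度与权重初始化均为演示而设)。

```python
import numpy as np

# 极简的单头自注意力前向计算(numpy 版),只为说明被分析的对象长什么样
def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)           # 每个位置在整个上下文上的注意力权重
    return A @ V, A

rng = np.random.default_rng(0)
T, d = 8, 16                                      # 上下文长度与特征维度
X = rng.normal(size=(T, d))
Wq = rng.normal(size=(d, d)) / np.sqrt(d)
Wk = rng.normal(size=(d, d)) / np.sqrt(d)
Wv = rng.normal(size=(d, d)) / np.sqrt(d)
out, attn = self_attention(X, Wq, Wk, Wv)
print(out.shape, attn.shape)                      # (8, 16) (8, 8)
```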

【33】 Stateful Offline Contextual Policy Evaluation and Learning 标题:有状态离线上下文策略评估和学习 链接:https://arxiv.org/abs/2110.10081

作者:Nathan Kallus,Angela Zhou 摘要:我们研究了一类结构化的马尔可夫决策过程中的非策略评估和序列数据学习,这些马尔可夫决策过程产生于与外部到达序列和上下文的重复交互,从而产生未知的个体水平对代理行为的响应。该模型可以被认为是资源受限的上下文强盗的离线泛化。我们形式化了问题的相关因果结构,如动态个性化定价和存在潜在高维用户类型的其他运营管理问题。关键的见解是,单个级别的响应通常不会受到状态变量的因果影响,因此可以很容易地跨时间步和状态进行概括。如果这是真的,我们研究(双重稳健)非政策评估和学习的含义,而不是利用单时间步评估,通过人口数据估计单个到达的预期,用于边际MDP中的拟合值迭代。我们研究样本的复杂性,并分析导致混杂误差随时间持续而非衰减的误差放大。在动态和容量限制定价的仿真中,我们在这类相关问题中显示了改进的样本外策略性能。 摘要:We study off-policy evaluation and learning from sequential data in a structured class of Markov decision processes that arise from repeated interactions with an exogenous sequence of arrivals with contexts, which generate unknown individual-level responses to agent actions. This model can be thought of as an offline generalization of contextual bandits with resource constraints. We formalize the relevant causal structure of problems such as dynamic personalized pricing and other operations management problems in the presence of potentially high-dimensional user types. The key insight is that an individual-level response is often not causally affected by the state variable and can therefore easily be generalized across timesteps and states. When this is true, we study implications for (doubly robust) off-policy evaluation and learning by instead leveraging single time-step evaluation, estimating the expectation over a single arrival via data from a population, for fitted-value iteration in a marginal MDP. We study sample complexity and analyze error amplification that leads to the persistence, rather than attenuation, of confounding error over time. In simulations of dynamic and capacitated pricing, we show improved out-of-sample policy performance in this class of relevant problems.
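
下面用一个假设性的上下文赌博机例子示意“单时间步的双重稳健(DR)离线策略评估”:数据生成过程、行为/目标策略与结果模型都是为演示而虚构的,并非论文中边际 MDP 的完整流程。

```python
import numpy as np

# 假设性示意:单步双重稳健(DR)离线策略价值估计
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                                   # 每个到达者的上下文
a = rng.binomial(1, 0.5, size=n)                         # 行为策略:等概率随机动作
r = 1.0 + 0.5 * x * a + rng.normal(scale=0.1, size=n)    # 个体水平响应

pi = (x > 0).astype(float)                               # 目标策略:上下文为正则取动作 1
prop = np.full(n, 0.5)                                   # 行为策略的倾向得分(已知)

# 结果模型:分别对两个动作拟合简单线性回归
mu = {}
for act in (0, 1):
    idx = a == act
    coef = np.polyfit(x[idx], r[idx], deg=1)
    mu[act] = np.polyval(coef, x)

mu_pi = np.where(pi == 1, mu[1], mu[0])                  # 目标策略下的模型预测
mu_a = np.where(a == 1, mu[1], mu[0])                    # 实际动作下的模型预测
dr_value = np.mean(mu_pi + (a == pi) / prop * (r - mu_a))
print("目标策略单步价值的 DR 估计:", round(float(dr_value), 3))
```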

【34】 Coalitional Bayesian Autoencoders -- Towards explainable unsupervised deep learning 标题:联合贝叶斯自动编码器--走向可解释的无监督深度学习 链接:https://arxiv.org/abs/2110.10038

作者:Bang Xiang Yong,Alexandra Brintrup 机构:Institute for Manufacturing, University of Cambridge, UK 备注:Preprint submitted to Journal of Applied Soft Computing 摘要:本文旨在通过提出两种基于对数似然估计的均值和认知不确定性的解释方法来提高自动编码器(AE)预测的可解释性,这两种解释方法自然产生于被称为贝叶斯自动编码器(BAE)的自动编码器的概率公式。为了定量评估解释方法的性能,我们在传感器网络应用中对其进行了测试,并提出了基于传感器协变量移位的三个指标:(1)斯皮尔曼漂移系数的G均值,(2)解释排序的灵敏度-特异性的G均值,(3)传感器解释质量指数(SEQI)它结合了上述两个指标。令人惊讶的是,我们发现BAE预测的解释受到高度相关性的影响,导致误导性解释。为了缓解这种情况,受基于agent的系统理论的启发,提出了一种“联合BAE”。我们在公开的状态监测数据集上进行的综合实验表明,使用联合BAE的解释质量有所提高。 摘要:This paper aims to improve the explainability of Autoencoder's (AE) predictions by proposing two explanation methods based on the mean and epistemic uncertainty of log-likelihood estimate, which naturally arise from the probabilistic formulation of the AE called Bayesian Autoencoders (BAE). To quantitatively evaluate the performance of explanation methods, we test them in sensor network applications, and propose three metrics based on covariate shift of sensors : (1) G-mean of Spearman drift coefficients, (2) G-mean of sensitivity-specificity of explanation ranking and (3) sensor explanation quality index (SEQI) which combines the two aforementioned metrics. Surprisingly, we find that explanations of BAE's predictions suffer from high correlation resulting in misleading explanations. To alleviate this, a "Coalitional BAE" is proposed, which is inspired by agent-based system theory. Our comprehensive experiments on publicly available condition monitoring datasets demonstrate the improved quality of explanations using the Coalitional BAE.
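
下面是一个高度简化的示意:用若干个小自编码器的集成粗略近似贝叶斯自编码器,按特征(传感器)给出重构误差的均值解释与成员间分歧(作为认知不确定性的粗略替代)。网络结构、数据与漂移方式均为假设,并非论文中的 BAE 或 Coalitional BAE 实现。

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# 假设性示意:5 个小自编码器组成的集成,按特征输出"均值解释"和"分歧解释"
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                     # 10 个传感器的正常数据
x_test = rng.normal(size=(1, 10))
x_test[0, 3] += 6.0                                # 让第 3 号传感器发生漂移

errs = []
for seed in range(5):
    ae = MLPRegressor(hidden_layer_sizes=(4,), max_iter=2000, random_state=seed)
    ae.fit(X, X)                                   # 自编码:用输入重构输入
    errs.append((ae.predict(x_test) - x_test) ** 2)
errs = np.vstack(errs)                             # (成员数, 传感器数)

explanation_mean = errs.mean(axis=0)               # 均值解释
explanation_epistemic = errs.var(axis=0)           # 成员间分歧近似认知不确定性
print("最可疑的传感器编号:", int(explanation_mean.argmax()))
```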

【35】 Riemannian classification of EEG signals with missing values 标题:具有缺失值的脑电信号的黎曼分类 链接:https://arxiv.org/abs/2110.10011

作者:Alexandre Hippert-Ferrer,Ammar Mian,Florent Bouchard,Frédéric Pascal 机构:Université Paris-Saclay 摘要:本文提出了两种利用协方差矩阵进行脑电(EEG)分类时处理缺失数据的策略。第一种方法先用$k$-最近邻算法对缺失数据进行插补,再由插补后的数据估计协方差;第二种方法在期望最大化(EM)算法中利用观测数据的似然,仅依赖观测数据。这两种策略均与最小黎曼均值距离分类器相结合,并应用于事件相关电位的分类任务,这是一种广为人知的脑-机接口范式。结果表明,所提出的策略优于仅基于观测数据的分类,即使在缺失率增加的情况下也能保持较高的分类精度。 摘要:This paper proposes two strategies to handle missing data for the classification of electroencephalograms using covariance matrices. The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm. Both approaches are combined with the minimum distance to Riemannian mean classifier and applied to a classification task of event related-potentials, a widely known paradigm of brain-computer interface paradigms. As results show, the proposed strategies perform better than the classification based on observed data and allow to keep a high accuracy even when the missing data ratio increases.
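
下面给出第一种策略的一个简化示意:对模拟的 EEG 试次先做 k-近邻插补、再估计通道协方差,并用对数欧氏距离近似的“最小均值距离”分类器分类。数据为随机生成,距离度量也只是黎曼框架的一个粗略替代,并非论文实现。

```python
import numpy as np
from scipy.linalg import logm
from sklearn.impute import KNNImputer

# 假设性示意:随机模拟的 EEG 试次,5% 条目缺失;k-NN 插补 + 协方差 + 最小距离分类
rng = np.random.default_rng(0)
n_trials, n_ch, n_t = 40, 4, 128
X = rng.normal(size=(n_trials, n_ch, n_t))
y = rng.integers(0, 2, size=n_trials)
X[rng.random(X.shape) < 0.05] = np.nan

imputer = KNNImputer(n_neighbors=5)
covs = []
for trial in X:
    filled = imputer.fit_transform(trial.T)        # 以时间点为样本、通道为特征做插补
    covs.append(np.cov(filled.T))                  # 通道协方差矩阵 (4, 4)

log_covs = np.array([logm(c).real for c in covs])  # 用对数欧氏几何近似黎曼距离
means = {c: log_covs[y == c].mean(axis=0) for c in (0, 1)}
pred = np.array([min(means, key=lambda c: np.linalg.norm(lc - means[c], 'fro'))
                 for lc in log_covs])
print("训练集准确率(纯随机数据应接近 0.5):", (pred == y).mean())
```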

【36】 BAMLD: Bayesian Active Meta-Learning by Disagreement 标题:BAMLD:基于不一致的贝叶斯主动元学习 链接:https://arxiv.org/abs/2110.09943

作者:Ivana Nikoloska,Osvaldo Simeone 机构:KCLIP, CTR, King’s College London 备注:submitted for publication 摘要:数据有效的学习算法在许多实际应用中是必不可少的,对于这些应用,数据收集和标记是昂贵的或不可行的,例如,对于自动驾驶汽车。为了解决这个问题,元学习从一组元训练任务中推断出一种归纳偏差,以便使用少量样本学习新的但相关的任务。大多数研究假设元学习者能够访问大量任务中的标记数据集。在实践中,可能只有来自任务的未标记数据集可用,需要在标准元学习方案中使用之前执行昂贵的标记程序。为了减少元训练任务的标记请求数量,本文引入了一种信息论主动任务选择机制,该机制通过在不同归纳偏差下获得的预测之间的差异来量化认知不确定性。我们详细介绍了一个基于高斯过程回归的非参数方法的实例,并报告了它的经验性能结果,与现有的启发式获取机制相比,它的性能是非常好的。 摘要:Data-efficient learning algorithms are essential in many practical applications for which data collection and labeling is expensive or infeasible, e.g., for autonomous cars. To address this problem, meta-learning infers an inductive bias from a set of meta-training tasks in order to learn new, but related, task using a small number of samples. Most studies assume the meta-learner to have access to labeled data sets from a large number of tasks. In practice, one may have available only unlabeled data sets from the tasks, requiring a costly labeling procedure to be carried out before use in standard meta-learning schemes. To decrease the number of labeling requests for meta-training tasks, this paper introduces an information-theoretic active task selection mechanism which quantifies the epistemic uncertainty via disagreements among the predictions obtained under different inductive biases. We detail an instantiation for nonparametric methods based on Gaussian Process Regression, and report its empirical performance results that compare favourably against existing heuristic acquisition mechanisms.
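
下面用一个一维回归的假设性例子示意“按分歧主动选择”的思想:用三个不同核(代表不同归纳偏差)的高斯过程分别预测候选输入,选预测分歧最大的点请求标注。核的选取与数据均为演示而设,并非论文的元学习任务选择机制。

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern

# 假设性示意:三个不同核代表三种归纳偏差,挑选预测分歧最大的候选点请求标注
rng = np.random.default_rng(0)
X_lab = rng.uniform(-3, 3, size=(15, 1))
y_lab = np.sin(X_lab[:, 0]) + 0.1 * rng.normal(size=15)
X_pool = rng.uniform(-6, 6, size=(200, 1))         # 未标注候选

preds = []
for kernel in (RBF(length_scale=0.5), RBF(length_scale=2.0), Matern(nu=1.5)):
    gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-2).fit(X_lab, y_lab)
    preds.append(gp.predict(X_pool))

disagreement = np.var(np.vstack(preds), axis=0)    # 不同归纳偏差之间的分歧
query = X_pool[disagreement.argmax()]
print("下一条请求标注的输入:", query)
```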

【37】 Extensive Deep Temporal Point Process 标题:广泛的深度时间点过程 链接:https://arxiv.org/abs/2110.09823

作者:Haitao Lin,Cheng Tan,Lirong Wu,Zhangyang Gao,Stan. Z. Li 备注:21 pages 摘要:时间点过程作为连续时间域上的随机过程,通常用于对具有发生时间戳的异步事件序列进行建模。随着深度学习的兴起,由于深度神经网络具有很强的表达能力,在时间点过程的设置中,深度神经网络正成为异步序列模式捕获的一种很有前途的选择。本文首先回顾了近年来利用深时点过程对异步事件序列建模的研究重点和难点,归纳为四个方面:历史序列编码、条件强度函数表示、事件关系发现和优化学习方法。我们将最近提出的大多数模型分解为四个部分,并使用相同的学习策略对前三个部分进行重新建模,以进行公平的实证评估。此外,我们扩展了历史编码器和条件强度函数族,并提出了一个Granger因果关系发现框架,用于挖掘多类型事件之间的关系。采用变分推理框架下的离散图结构学习方法揭示了Granger因果图的潜在结构,进一步的实验表明,所提出的基于学习的潜在图的框架既能捕捉关系,又能提高拟合和预测性能。 摘要:Temporal point process as the stochastic process on continuous domain of time is usually used to model the asynchronous event sequence featuring with occurence timestamps. With the rise of deep learning, due to the strong expressivity of deep neural networks, they are emerging as a promising choice for capturing the patterns in asynchronous sequences, in the setting of temporal point process. In this paper, we first review recent research emphasis and difficulties in modeling asynchronous event sequences with deep temporal point process, which can be concluded into four fields: encoding of history sequence, formulation of conditional intensity function, relational discovery of events and learning approaches for optimization. We introduce most of recently proposed models by dismantling them as the four parts, and conduct experiments by remodularizing the first three parts with the same learning strategy for a fair empirical evaluation. Besides, we extend the history encoders and conditional intensity function family, and propose a Granger causality discovery framework for exploiting the relations among multi-types of events. Discrete graph structure learning in the framework of Variational Inference is employed to reveal latent structures of Granger causality graph, and further experiments shows the proposed framework with learned latent graph can both capture the relations and achieve an improved fitting and predicting performance.
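
作为条件强度函数最简单的经典例子,下面给出指数核 Hawkes 过程强度与负对数似然的计算示意(参数与事件序列均为假设值,与文中的深度点过程模型无直接对应)。

```python
import numpy as np

# 假设性示意:指数核 Hawkes 过程,lambda(t) = mu + alpha * sum_i exp(-beta * (t - t_i))
def hawkes_nll(times, T, mu, alpha, beta):
    nll, decay, prev = 0.0, 0.0, 0.0
    for t in times:
        decay *= np.exp(-beta * (t - prev))        # 历史事件影响的递推衰减
        lam = mu + alpha * decay
        nll -= np.log(lam)
        decay += 1.0
        prev = t
    # 补偿项:强度函数在 [0, T] 上的积分(指数核有闭式)
    comp = mu * T + (alpha / beta) * np.sum(1.0 - np.exp(-beta * (T - np.asarray(times))))
    return nll + comp

times = [0.5, 1.2, 1.3, 2.8, 4.0]
print("负对数似然:", hawkes_nll(times, T=5.0, mu=0.5, alpha=0.8, beta=1.5))
```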

【38】 On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game 标题:基于核函数和神经函数逼近的无报酬RL:单智能体MDP和马尔可夫对策 链接:https://arxiv.org/abs/2110.09771

作者:Shuang Qiu,Jieping Ye,Zhaoran Wang,Zhuoran Yang 备注:ICML 2021 摘要:为了在强化学习(RL)中实现样本效率,必须有效地探索底层环境。在离线设置下,解决探索挑战在于收集覆盖率足够的离线数据集。在这种挑战的激励下,我们研究了无报酬RL问题,其中代理的目标是在没有任何预先指定的报酬函数的情况下彻底探索环境。然后,在给定任何外部奖励的情况下,代理使用在探索阶段收集的离线数据,通过规划算法计算策略。此外,我们利用功能强大的函数逼近器,在函数逼近的背景下解决这个问题。具体而言,我们建议通过结合核函数和神经函数近似的值迭代算法的乐观变体进行探索,其中我们采用相关的探索奖金作为探索奖励。此外,我们还设计了单代理MDP和零和马尔可夫博弈的探索和规划算法,并证明了在给定任意外部报酬时,我们的方法可以实现 $\widetilde{\mathcal{O}}(1/\varepsilon^2)$ 样本复杂度,以生成 $\varepsilon$-次优策略或 $\varepsilon$-近似纳什均衡。据我们所知,我们建立了第一个具有核函数和神经函数逼近器的可证明有效的无报酬RL算法。 摘要:To achieve sample efficiency in reinforcement learning (RL), it necessitates efficiently exploring the underlying environment. Under the offline setting, addressing the exploration challenge lies in collecting an offline dataset with sufficient coverage. Motivated by such a challenge, we study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function. Then, given any extrinsic reward, the agent computes the policy via a planning algorithm with offline data collected in the exploration phase. Moreover, we tackle this problem under the context of function approximation, leveraging powerful function approximators. Specifically, we propose to explore via an optimistic variant of the value-iteration algorithm incorporating kernel and neural function approximations, where we adopt the associated exploration bonus as the exploration reward. Moreover, we design exploration and planning algorithms for both single-agent MDPs and zero-sum Markov games and prove that our methods can achieve $\widetilde{\mathcal{O}}(1/\varepsilon^2)$ sample complexity for generating an $\varepsilon$-suboptimal policy or $\varepsilon$-approximate Nash equilibrium when given an arbitrary extrinsic reward. To the best of our knowledge, we establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.

【39】 Private measurement of nonlinear correlations between data hosted across multiple parties 标题:跨多方托管的数据之间的非线性相关性的私有测量 链接:https://arxiv.org/abs/2110.09670

作者:Praneeth Vepakomma,Subha Nawer Pushpita,Ramesh Raskar 机构:MIT 摘要:我们引入了一种差分隐私方法,用于测量分布在两个实体上的敏感数据之间的非线性相关性,并为所提出的私有估计器提供效用保证。据我们所知,这是多方设置下第一个此类非线性相关性的私有估计器。我们考虑的重要非线性相关度量是距离相关(distance correlation)。除探索性数据分析外,这项工作还可直接应用于私有特征筛选、私有独立性检验、私有k样本检验、私有多方因果推断和私有数据合成。代码访问:补充文件中提供了公开访问代码的链接。 摘要:We introduce a differentially private method to measure nonlinear correlations between sensitive data hosted across two entities. We provide utility guarantees of our private estimator. Ours is the first such private estimator of nonlinear correlations, to the best of our knowledge within a multi-party setup. The important measure of nonlinear correlation we consider is distance correlation. This work has direct applications to private feature screening, private independence testing, private k-sample tests, private multi-party causal inference and private data synthesis in addition to exploratory data analysis. Code access: A link to publicly access the code is provided in the supplementary file.
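
下面是一个概念性示意:先按定义计算距离相关,再加入拉普拉斯噪声表示“输出私有化”。其中 epsilon、sensitivity 只是演示用的假设值,论文中的敏感度分析、效用保证与多方协议远比这复杂。

```python
import numpy as np

# 假设性示意:经典距离相关 + 拉普拉斯噪声(epsilon 与 sensitivity 为演示用假设值)
def distance_correlation(x, y):
    def centered(v):
        d = np.abs(v[:, None] - v[None, :])
        return d - d.mean(axis=0) - d.mean(axis=1, keepdims=True) + d.mean()
    A, B = centered(x), centered(y)
    dcov2 = (A * B).mean()
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = x ** 2 + 0.1 * rng.normal(size=300)            # 非线性相关,皮尔逊相关接近 0
dcor = distance_correlation(x, y)
epsilon, sensitivity = 1.0, 0.05
private_dcor = dcor + rng.laplace(scale=sensitivity / epsilon)
print(f"dCor = {dcor:.3f},加噪后 = {private_dcor:.3f}")
```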

【40】 Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks 标题:路径正则化:并行ReLU网络的一种诱导凸性和稀疏性的正则化 链接:https://arxiv.org/abs/2110.09548

作者:Tolga Ergen,Mert Pilanci 机构:Department of Electrical Engineering, Stanford University 备注:Accepted to NeurIPS 2021. arXiv admin note: text overlap with arXiv:2110.05518 摘要:尽管进行了多次尝试,深度神经网络成功背后的基本机制仍然难以捉摸。为此,我们引入了一种新的分析框架来揭示深度神经网络训练中的隐凸性。我们考虑一个具有多个ReLU子网络的并行体系结构,它把许多标准深度体系结构和ResNet作为特例包含在内。然后,我们证明了带路径正则化的训练问题可以转化为高维空间中的单个凸优化问题。我们进一步证明了等价凸规划是通过群稀疏诱导范数来正则化的。因此,具有ReLU子网络的路径正则化并行结构可以看作是高维情形下一种简约的特征选择方法。更重要的是,我们证明了全局优化等价凸问题所需的计算复杂度关于数据样本数和特征维数是多项式时间的。因此,我们证明了带全局最优性保证的路径正则化深度ReLU网络具有精确的多项式时间可训练性。我们还提供了若干数值实验来证实我们的理论。 摘要:Despite several attempts, the fundamental mechanisms behind the success of deep neural networks still remain elusive. To this end, we introduce a novel analytic framework to unveil hidden convexity in training deep neural networks. We consider a parallel architecture with multiple ReLU sub-networks, which includes many standard deep architectures and ResNets as its special cases. We then show that the training problem with path regularization can be cast as a single convex optimization problem in a high-dimensional space. We further prove that the equivalent convex program is regularized via a group sparsity inducing norm. Thus, a path regularized parallel architecture with ReLU sub-networks can be viewed as a parsimonious feature selection method in high-dimensions. More importantly, we show that the computational complexity required to globally optimize the equivalent convex problem is polynomial-time with respect to the number of data samples and feature dimension. Therefore, we prove exact polynomial-time trainability for path regularized deep ReLU networks with global optimality guarantees. We also provide several numerical experiments corroborating our theory.

【41】 Generalized XGBoost Method 标题:广义XGBoost方法 链接:https://arxiv.org/abs/2109.07473

作者:Yang Guang 摘要:XGBoost方法有许多优点,特别适用于大数据的统计分析,但其损失函数仅限于凸函数。在许多特定应用中,最好使用非凸损失函数。本文提出了一种广义XGBoost方法,该方法要求较弱的损失函数条件,并涉及更一般的损失函数,包括凸损失函数和一些非凸损失函数。此外,将这种广义XGBoost方法推广到多元损失函数,形成了一种更为广义的XGBoost方法。该方法是一种多元正则化树boosting方法,可以对大多数常用的参数概率分布中的多个参数进行建模,并用预测变量进行拟合。同时,给出了非寿险定价的相关算法和实例。 摘要:The XGBoost method has many advantages and is especially suitable for statistical analysis of big data, but its loss function is limited to convex functions. In many specific applications, a nonconvex loss function would be preferable. In this paper, we propose a generalized XGBoost method, which requires weaker loss function condition and involves more general loss functions, including convex loss functions and some non-convex loss functions. Furthermore, this generalized XGBoost method is extended to multivariate loss function to form a more generalized XGBoost method. This method is a multivariate regularized tree boosting method, which can model multiple parameters in most of the frequently-used parametric probability distributions to be fitted by predictor variables. Meanwhile, the related algorithms and some examples in non-life insurance pricing are given.
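
下面示意如何通过 XGBoost 的自定义目标接口使用一个非凸损失(Cauchy 损失),以体现“损失函数不限于凸函数”的动机;这只是概念演示,并非论文提出的广义 XGBoost 方法本身,损失形式、Hessian 截断方式与数据均为假设。

```python
import numpy as np
import xgboost as xgb

# 假设性示意:用 XGBoost 的自定义目标接口套用非凸的 Cauchy 损失 log(1 + r^2/c^2);
# 非凸损失的 Hessian 可能为负,这里简单截断为小正数,仅作概念演示
def cauchy_obj(preds, dtrain, c=1.0):
    r = preds - dtrain.get_label()
    grad = 2 * r / (c ** 2 + r ** 2)
    hess = 2 * (c ** 2 - r ** 2) / (c ** 2 + r ** 2) ** 2
    return grad, np.maximum(hess, 1e-6)

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = X[:, 0] - 2 * X[:, 1] + rng.standard_t(df=2, size=400)   # 厚尾噪声,适合稳健损失
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 3, "eta": 0.1}, dtrain,
                    num_boost_round=100, obj=cauchy_obj)
print("前 5 个训练样本的预测:", booster.predict(dtrain)[:5])
```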

【42】 Ising Model Selection Using $\ell_1$-Regularized Linear Regression: A Statistical Mechanics Analysis 标题:使用$\ell_1$正则化线性回归的Ising模型选择:统计力学分析 链接:https://arxiv.org/abs/2102.03988

作者:Xiangming Meng,Tomoyuki Obuchi,Yoshiyuki Kabashima 机构:Institute for Physics of Intelligence, The University of Tokyo, -,-, Hongo, Tokyo ,-, Japan, Department of Systems Science, Kyoto University, Kyoto ,-, Japan 备注:Accepted by NeurIPS 2021 摘要:我们利用统计力学中的复制方法,从理论上研究了$\ell_1$-正则化线性回归($\ell_1$-LinR)用于Ising模型选择时的典型学习性能。对于顺磁相中的典型随机正则(RR)图,我们得到了$\ell_1$-LinR典型样本复杂度的精确估计,表明对于具有$N$个变量的Ising模型,$\ell_1$-LinR在$M=\mathcal{O}\left(\log N\right)$个样本下即具有模型选择一致性。此外,我们还提供了一种计算效率高的方法,可精确预测$\ell_1$-LinR在中等$M$和$N$情况下的非渐近行为,如精确率和召回率。模拟结果表明,即使对于有许多环的图,理论预测和实验结果之间也有很好的一致性,这支持了我们的发现。尽管本文的重点是$\ell_1$-LinR,但我们的方法可以直接用于精确研究一大类$\ell_1$-正则化M-估计的典型学习性能,包括$\ell_1$-正则化逻辑回归和交互筛选。 摘要:We theoretically investigate the typical learning performance of $\ell_1$-regularized linear regression ($\ell_1$-LinR) for Ising model selection using the replica method from statistical mechanics. For typical random regular (RR) graphs in the paramagnetic phase, an accurate estimate of the typical sample complexity of $\ell_1$-LinR is obtained, demonstrating that, for an Ising model with $N$ variables, $\ell_1$-LinR is model selection consistent with $M=\mathcal{O}\left(\log N\right)$ samples. Moreover, we provide a computationally efficient method to accurately predict the non-asymptotic behavior of $\ell_1$-LinR for moderate $M$ and $N$, such as the precision and recall rates. Simulations show a fairly good agreement between the theoretical predictions and experimental results, even for graphs with many loops, which supports our findings. Although this paper focuses on $\ell_1$-LinR, our method is readily applicable for precisely investigating the typical learning performances of a wide class of $\ell_1$-regularized M-estimators including $\ell_1$-regularized logistic regression and interaction screening.
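
下面给出一个概念性的邻域回归示意:从链式 Ising 模型 Gibbs 采样得到样本后,对每个自旋用 L1 正则化线性回归(Lasso)回归其余自旋,以非零系数恢复图结构。耦合强度、正则化参数等均为演示而设,与论文的统计力学分析无直接对应。

```python
import numpy as np
from sklearn.linear_model import Lasso

# 假设性示意:链式耦合 Ising 模型的 Gibbs 采样 + 逐自旋的 Lasso 邻域回归
rng = np.random.default_rng(0)
n, p = 2000, 8
J = np.zeros((p, p))
for i in range(p - 1):
    J[i, i + 1] = J[i + 1, i] = 0.5                # 链式耦合(假设的真实图)

S = rng.choice([-1.0, 1.0], size=(n, p))
for _ in range(30):                                # 少量 Gibbs 扫描
    for j in range(p):
        h = S @ J[:, j]
        prob = 1.0 / (1.0 + np.exp(-2.0 * h))
        S[:, j] = np.where(rng.random(n) < prob, 1.0, -1.0)

adj = np.zeros((p, p), dtype=int)
for j in range(p):
    others = [k for k in range(p) if k != j]
    lasso = Lasso(alpha=0.05).fit(S[:, others], S[:, j])
    adj[j, others] = (np.abs(lasso.coef_) > 1e-3).astype(int)
print("估计的邻接矩阵:")
print(adj)
```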

机器翻译,仅供参考
