Machine Learning arXiv Daily Digest [8.17]

2021-08-24 16:21:09

cs.LG: 109 papers today.

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (7 papers)

【1】 Multistream Graph Attention Networks for Wind Speed Forecasting Link: https://arxiv.org/abs/2108.07063

Authors: Dogan Aykas, Siamak Mehrkanoon Affiliations: Department of Knowledge Engineering, Maastricht University, Maastricht, The Netherlands Note: 8 pages, 5 figures
Abstract: Reliable and accurate wind speed prediction has a significant impact on many industrial sectors such as economics, business, and management. This paper presents a new model for wind speed prediction based on Graph Attention Networks (GAT). In particular, the proposed model extends the GAT architecture by equipping it with a learnable adjacency matrix as well as incorporating a new attention mechanism with the aim of obtaining attention scores per weather variable. The output of the GAT-based model is combined with an LSTM layer in order to exploit both the spatial and temporal characteristics of the multivariate multidimensional historical weather data. Real weather data collected from several cities in Denmark and the Netherlands are used to conduct the experiments and evaluate the performance of the proposed model. We show that in comparison to previous architectures used for wind speed prediction, the proposed model is able to better learn the complex input-output relationships of the weather data. Furthermore, thanks to the learned attention weights, the model provides additional insight into the most important weather variables and cities for the studied prediction task.
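
To make the described design concrete, here is a minimal PyTorch sketch of a GAT-style layer with a learnable adjacency matrix feeding an LSTM, as the abstract outlines. Layer sizes, the scoring function, and the way the adjacency biases the attention are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableAdjGAT(nn.Module):
    """Sketch: graph attention with a learnable adjacency matrix,
    followed by an LSTM over time (cf. the GAT+LSTM design above)."""
    def __init__(self, n_nodes, in_dim, hid_dim):
        super().__init__()
        self.adj = nn.Parameter(torch.randn(n_nodes, n_nodes))  # learnable adjacency
        self.proj = nn.Linear(in_dim, hid_dim)
        self.attn = nn.Linear(2 * hid_dim, 1)
        self.lstm = nn.LSTM(n_nodes * hid_dim, hid_dim, batch_first=True)
        self.head = nn.Linear(hid_dim, n_nodes)  # one wind-speed value per city

    def forward(self, x):                # x: (batch, time, n_nodes, in_dim)
        b, t, n, _ = x.shape
        h = self.proj(x)                 # (b, t, n, hid)
        # pairwise attention logits, biased by the learnable adjacency
        hi = h.unsqueeze(3).expand(-1, -1, -1, n, -1)
        hj = h.unsqueeze(2).expand(-1, -1, n, -1, -1)
        e = self.attn(torch.cat([hi, hj], dim=-1)).squeeze(-1)   # (b, t, n, n)
        alpha = F.softmax(e + self.adj, dim=-1)                  # attention scores
        h = torch.einsum("btij,btjd->btid", alpha, h)            # aggregate neighbours
        out, _ = self.lstm(h.reshape(b, t, -1))                  # temporal modelling
        return self.head(out[:, -1])                             # forecast at last step

pred = LearnableAdjGAT(n_nodes=5, in_dim=4, hid_dim=16)(torch.randn(2, 24, 5, 4))
```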

【2】 Causal Incremental Graph Convolution for Recommender System Retraining Link: https://arxiv.org/abs/2108.06889

Authors: Sihao Ding, Fuli Feng, Xiangnan He, Yong Liao, Jun Shi, Yongdong Zhang Note: submitted to TNNLS
Abstract: Real-world recommender systems need to be regularly retrained to keep up with new data. In this work, we consider how to efficiently retrain graph convolution network (GCN) based recommender models, which are state-of-the-art techniques for collaborative recommendation. To pursue high efficiency, we set the target as using only new data for model updating, while not sacrificing recommendation accuracy compared with full model retraining. This is non-trivial to achieve, since the interaction data participates in both the graph structure for model construction and the loss function for model learning, whereas the old graph structure is not allowed to be used in model updating. Towards this goal, we propose a Causal Incremental Graph Convolution approach, which consists of two new operators named Incremental Graph Convolution (IGC) and Colliding Effect Distillation (CED) to estimate the output of full graph convolution. In particular, we devise simple and effective modules for IGC to ingeniously combine the old representations and the incremental graph, and effectively fuse the long-term and short-term preference signals. CED aims to avoid the out-of-date issue for inactive nodes that are not in the incremental graph, by connecting the new data with inactive nodes through causal inference. In particular, CED estimates the causal effect of new data on the representation of inactive nodes through the control of their collider. Extensive experiments on three real-world datasets demonstrate both accuracy gains and significant speed-ups over the existing retraining mechanism.

【3】 Event2Graph: Event-driven Bipartite Graph for Multivariate Time-series Anomaly Detection Link: https://arxiv.org/abs/2108.06783

Authors: Yuhang Wu, Mengting Gu, Lan Wang, Yusan Lin, Fei Wang, Hao Yang Affiliations: Visa Research, Visa, Palo Alto, CA, United States Note: In submission to a conference
Abstract: Modeling inter-dependencies between time-series is the key to achieving high performance in anomaly detection for multivariate time-series data. The de-facto solution for modeling the dependencies is to feed the data into a recurrent neural network (RNN). However, the fully connected network structure underneath the RNN (either GRU or LSTM) assumes a static and complete dependency graph between time-series, which may not hold in many real-world applications. To relax this assumption, we propose a dynamic bipartite graph structure to encode the inter-dependencies between time-series. More concretely, we model time series as one type of node, and time series segments (regarded as events) as another type of node, where an edge between the two types of nodes describes a temporal pattern occurring on a specific time series at a certain time. Based on this design, relations between time series can be explicitly modelled via dynamic connections to event nodes, and the multivariate time-series anomaly detection problem can be formulated as a self-supervised, edge stream prediction problem in dynamic graphs. We conducted extensive experiments to demonstrate the effectiveness of the design.

【4】 Effective and Efficient Graph Learning for Multi-view Clustering Link: https://arxiv.org/abs/2108.06734

Authors: Quanxue Gao, Wei Xia, Xinbo Gao, Dacheng Tao Affiliations: School of Electronic Engineering, Xidian University, China; Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications
Abstract: Despite the impressive clustering performance and efficiency in characterizing the relationship between data and cluster structure, existing graph-based multi-view clustering methods still have the following drawbacks. They suffer from an expensive time burden due to both the construction of graphs and the eigen-decomposition of the Laplacian matrix, and fail to explore the cluster structure of large-scale data. Moreover, they require post-processing to get the final clustering, resulting in suboptimal performance. Furthermore, the rank of the learned view-consensus graph cannot approximate the target rank. In this paper, drawing inspiration from the bipartite graph, we propose an effective and efficient graph learning model for multi-view clustering. Specifically, our method exploits the similarity between graphs of different views by minimizing the tensor Schatten p-norm, which well characterizes both the spatial structure and the complementary information embedded in the graphs of different views. We learn a view-consensus graph with an adaptively weighted strategy and a connectivity constraint such that the connected components indicate clusters directly. Our proposed algorithm is time-economical, obtains stable results, and scales well with the data size. Extensive experimental results indicate that our method is superior to state-of-the-art methods.

【5】 Continuous-Time Sequential Recommendation with Temporal Graph Collaborative Transformer Link: https://arxiv.org/abs/2108.06625

Authors: Ziwei Fan, Zhiwei Liu, Jiawei Zhang, Yun Xiong, Lei Zheng, Philip S. Yu Affiliations: Department of Computer Science, University of Illinois at Chicago, USA; IFM Lab, Department of Computer Science, University of California, Davis; Fudan University, China; Pinterest Inc. Note: accepted by CIKM2021
Abstract: In order to model the evolution of user preference, we should learn user/item embeddings based on time-ordered item purchasing sequences, which is defined as the Sequential Recommendation (SR) problem. Existing methods leverage sequential patterns to model item transitions. However, most of them ignore crucial temporal collaborative signals, which are latent in evolving user-item interactions and coexist with sequential patterns. Therefore, we propose to unify sequential patterns and temporal collaborative signals to improve the quality of recommendation, which is rather challenging. Firstly, it is hard to simultaneously encode sequential patterns and collaborative signals. Secondly, it is non-trivial to express the temporal effects of collaborative signals. Hence, we design a new framework, Temporal Graph Sequential Recommender (TGSRec), upon our defined continuous-time bipartite graph. We propose a novel Temporal Collaborative Transformer (TCT) layer in TGSRec, which advances the self-attention mechanism by adopting a novel collaborative attention. The TCT layer can simultaneously capture collaborative signals from both users and items, as well as consider temporal dynamics inside sequential patterns. We propagate the information learned from the TCT layer over the temporal graph to unify sequential patterns and temporal collaborative signals. Empirical results on five datasets show that TGSRec significantly outperforms other baselines, with, on average, up to 22.5% and 22.1% absolute improvements in Recall@10 and MRR, respectively.

【6】 LinkTeller: Recovering Private Edges from Graph Neural Networks via Influence Analysis Link: https://arxiv.org/abs/2108.06504

Authors: Fan Wu, Yunhui Long, Ce Zhang, Bo Li Affiliations: University of Illinois at Urbana-Champaign; ETH Zürich
Abstract: Graph structured data have enabled several successful applications such as recommendation systems and traffic prediction, given the rich node features and edge information. However, these high-dimensional features and high-order adjacency information are usually heterogeneous and held by different data holders in practice. Given such vertical data partition (e.g., one data holder will only own either the node features or edge information), different data holders have to develop efficient joint training protocols rather than directly transfer data to each other due to privacy concerns. In this paper, we focus on the edge privacy, and consider a training scenario where Bob with node features will first send training node features to Alice who owns the adjacency information. Alice will then train a graph neural network (GNN) with the joint information and release an inference API. During inference, Bob is able to provide test node features and query the API to obtain the predictions for test nodes. Under this setting, we first propose a privacy attack LinkTeller via influence analysis to infer the private edge information held by Alice via designing adversarial queries for Bob. We then empirically show that LinkTeller is able to recover a significant amount of private edges, outperforming existing baselines. To further evaluate the privacy leakage, we adapt an existing algorithm for differentially private graph convolutional network (DP GCN) training and propose a new DP GCN mechanism LapGraph. We show that these DP GCN mechanisms are not always resilient against LinkTeller empirically under mild privacy guarantees ($\varepsilon > 5$). Our studies will shed light on future research towards designing more resilient privacy-preserving GCN models; in the meantime, they provide an in-depth understanding of the tradeoff between GCN model utility and robustness against potential privacy attacks.
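
A rough sketch of the influence-analysis idea behind LinkTeller: perturb one node's features, re-query the released inference API, and treat a large change in another node's prediction as evidence of an edge. Here `query_api` is a hypothetical stand-in for Alice's service, and the scoring rule is simplified.

```python
import numpy as np

def influence_scores(query_api, feats, i, eps=1e-4):
    """Score how much each node j influences node i's prediction.

    query_api(feats) -> (n_nodes, n_classes) array of predictions from the
    released GNN inference API (hypothetical stand-in for Alice's service).
    """
    base = query_api(feats)                     # predictions on clean features
    scores = np.zeros(len(feats))
    for j in range(len(feats)):
        perturbed = feats.copy()
        perturbed[j] += eps                     # nudge node j's features
        scores[j] = np.abs(query_api(perturbed)[i] - base[i]).sum() / eps
    return scores                               # large score -> likely edge (i, j)

# Edges would then be inferred by keeping, e.g., the top-k highest-influence pairs.
```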

【7】 Non-Local Feature Aggregation on Graphs via Latent Fixed Data Structures Link: https://arxiv.org/abs/2108.07028

Authors: Mostafa Rahmani, Rasoul Shafipour, Ping Li Affiliations: Cognitive Computing Lab, Baidu Research, Bellevue, WA, USA Note: published in 2021 IEEE Asilomar Conference on Signals, Systems, and Computers
Abstract: In contrast to image/text data, whose order can be used to perform non-local feature aggregation in a straightforward way using pooling layers, graphs lack a tensor representation, and mostly an element-wise max/mean function is utilized to aggregate the locally extracted feature vectors. In this paper, we present a novel approach for global feature aggregation in Graph Neural Networks (GNNs) which utilizes a Latent Fixed Data Structure (LFDS) to aggregate the extracted feature vectors. The locally extracted feature vectors are sorted/distributed on the LFDS, and a latent neural network (CNN/GNN) is utilized to perform feature aggregation on the LFDS. The proposed approach is used to design several novel global feature aggregation methods based on the choice of the LFDS. We introduce multiple LFDSs, including a loop, a 3D tensor (image), a sequence, and data-driven graphs, as well as an algorithm which sorts/distributes the extracted local feature vectors on the LFDS. While the computational complexity of the proposed methods is linear in the order of the input graphs, they achieve competitive or better results.

GAN | Adversarial | Attacks | Generation (8 papers)

【1】 Patch Attack Invariance: How Sensitive are Patch Attacks to 3D Pose? Link: https://arxiv.org/abs/2108.07229

Authors: Max Lennon, Nathan Drenkow, Philippe Burlina Affiliations: The Johns Hopkins University Applied Physics Laboratory, Johns Hopkins Road, Laurel, Maryland
Abstract: Perturbation-based attacks, while not physically realizable, have been the main emphasis of adversarial machine learning (ML) research. Patch-based attacks, by contrast, are physically realizable, yet most work has focused on the 2D domain with recent forays into 3D. Characterizing the robustness properties of patch attacks and their invariance to 3D pose is important, yet not fully elucidated, and is the focus of this paper. To this end, several contributions are made here: A) we develop a new metric called mean Attack Success over Transformations (mAST) to evaluate patch attack robustness and invariance; B) we systematically assess the robustness of patch attacks to 3D position and orientation for various conditions; in particular, we conduct a sensitivity analysis which provides important qualitative insights into attack effectiveness as a function of the 3D pose of a patch relative to the camera (rotation, translation), and set forth some properties of patch attack 3D invariance; and C) we draw novel qualitative conclusions, including: 1) we demonstrate that for some 3D transformations, namely rotation and loom, increasing the training distribution support yields an increase in patch success over the full range at test time; 2) we provide new insights into the existence of a fundamental cutoff limit in patch attack effectiveness that depends on the extent of out-of-plane rotation angles. These findings should collectively guide the future design of 3D patch attacks and defenses.
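
The mAST metric itself is straightforward to express: attack success averaged over sampled 3D transformations. Below is a hedged sketch, where `render_with_patch` and `is_attack_success` are hypothetical helpers standing in for a renderer and a success criterion.

```python
def mean_attack_success_over_transforms(model, scene, patch, poses,
                                        render_with_patch, is_attack_success):
    """mAST-style metric: fraction of sampled 3D poses (rotation/translation)
    at which the patch attack still fools the model."""
    hits = 0
    for pose in poses:                       # e.g., dicts of rotation/translation
        img = render_with_patch(scene, patch, pose)   # hypothetical renderer
        hits += int(is_attack_success(model(img)))    # hypothetical criterion
    return hits / len(poses)
```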

【2】 A Novel Attribute Reconstruction Attack in Federated Learning Link: https://arxiv.org/abs/2108.06910

Authors: Lingjuan Lyu, Chen Chen Affiliations: Sony AI; Zhejiang University Note: accepted by FTL-IJCAI'21 Oral
Abstract: Federated learning (FL) has emerged as a promising learning paradigm to enable a multitude of participants to construct a joint ML model without exposing their private training data. Existing FL designs have been shown to exhibit vulnerabilities that can be exploited by adversaries both within and outside of the system to compromise data privacy. However, most current works conduct attacks by leveraging gradients on a small batch of data, which is less practical in FL. In this work, we consider a more practical and interesting scenario in which participants share their epoch-averaged gradients (sharing gradients after at least one epoch of local training) rather than per-example or small-batch-averaged gradients as in previous works. We perform the first systematic evaluation of the attribute reconstruction attack (ARA) launched by a malicious server in the FL system, and empirically demonstrate that the shared epoch-averaged local model gradients can reveal sensitive attributes of the local training data of any victim participant. To achieve this goal, we develop a more effective and efficient gradient matching based method called cos-matching to reconstruct the training data attributes. We evaluate our attacks on a variety of real-world datasets, scenarios, and assumptions. Our experiments show that our proposed method achieves better attribute attack performance than most existing baselines.
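
The following is a sketch of gradient-matching attribute reconstruction in the spirit of cos-matching: optimize dummy inputs so that the gradients they induce align, in cosine similarity, with the shared epoch-averaged gradients. The optimizer, step count, and the assumption that labels are known are simplifications, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def cos_matching(model, loss_fn, shared_grads, x_shape, y, steps=200, lr=0.1):
    """Reconstruct input attributes whose gradients match the shared ones.

    shared_grads: list of tensors, the victim's (epoch-averaged) gradients.
    """
    dummy_x = torch.randn(x_shape, requires_grad=True)
    opt = torch.optim.Adam([dummy_x], lr=lr)
    params = [p for p in model.parameters() if p.requires_grad]
    for _ in range(steps):
        opt.zero_grad()
        grads = torch.autograd.grad(loss_fn(model(dummy_x), y), params,
                                    create_graph=True)
        # minimize 1 - cosine similarity between flattened gradient vectors
        g = torch.cat([gr.flatten() for gr in grads])
        t = torch.cat([gr.flatten() for gr in shared_grads])
        (1 - F.cosine_similarity(g, t, dim=0)).backward()
        opt.step()
    return dummy_x.detach()
```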

【3】 Interpreting Attributions and Interactions of Adversarial Attacks Link: https://arxiv.org/abs/2108.06895

Authors: Xin Wang, Shuyun Lin, Hao Zhang, Yufei Zhu, Quanshi Zhang Affiliations: Shanghai Jiao Tong University
Abstract: This paper aims to explain adversarial attacks in terms of how adversarial perturbations contribute to the attacking task. We estimate the attributions of different image regions to the decrease of the attacking cost based on the Shapley value. We define and quantify interactions among adversarial perturbation pixels, and decompose the entire perturbation map into relatively independent perturbation components. The decomposition of the perturbation map shows that adversarially-trained DNNs have more perturbation components in the foreground than normally-trained DNNs. Moreover, compared to normally-trained DNNs, adversarially-trained DNNs have more components which mainly decrease the score of the true category. The above analyses provide new insights into the understanding of adversarial attacks.
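
The Shapley-value attribution described here can be approximated by Monte Carlo sampling over random orderings of perturbation regions. A sketch, with `attack_cost` as a hypothetical oracle returning the attacking cost of an image:

```python
import numpy as np

def shapley_region_attributions(attack_cost, x, perturbation, regions, n_samples=100):
    """Estimate each region's Shapley contribution to lowering the attack cost.

    attack_cost(img) -> scalar (hypothetical); regions: list of boolean masks.
    """
    k = len(regions)
    phi = np.zeros(k)
    for _ in range(n_samples):
        order = np.random.permutation(k)
        img, prev = x.copy(), attack_cost(x)
        for r in order:                      # add perturbation regions one by one
            img = img + perturbation * regions[r]
            cur = attack_cost(img)
            phi[r] += prev - cur             # marginal cost decrease of region r
            prev = cur
    return phi / n_samples                   # average over random orderings
```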

【4】 Neural Architecture Dilation for Adversarial Robustness Link: https://arxiv.org/abs/2108.06885

Authors: Yanxi Li, Zhaohui Yang, Yunhe Wang, Chang Xu Affiliations: School of Computer Science, University of Sydney, Australia; Noah's Ark Lab, Huawei Technologies, China; Key Lab of Machine Perception (MOE), Department of Machine Intelligence, Peking University, China Note: 9 pages of main text, 5 pages of appendix, 4 figures, 9 tables
Abstract: With the tremendous advances in the architecture and scale of convolutional neural networks (CNNs) over the past few decades, they can easily reach or even exceed human performance in certain tasks. However, a recently discovered shortcoming of CNNs is that they are vulnerable to adversarial attacks. Although the adversarial robustness of CNNs can be improved by adversarial training, there is a trade-off between standard accuracy and adversarial robustness. From the neural architecture perspective, this paper aims to improve the adversarial robustness of backbone CNNs that have a satisfactory accuracy. Under a minimal computational overhead, the introduction of a dilation architecture is expected to be friendly to the standard performance of the backbone CNN while pursuing adversarial robustness. Theoretical analyses on the standard and adversarial error bounds naturally motivate the proposed neural architecture dilation algorithm. Experimental results on real-world datasets and benchmark neural networks demonstrate the effectiveness of the proposed algorithm in balancing accuracy and adversarial robustness.

【5】 IADA: Iterative Adversarial Data Augmentation Using Formal Verification and Expert Guidance Link: https://arxiv.org/abs/2108.06871

Authors: Ruixuan Liu, Changliu Liu Affiliations: Robotics Institute, Carnegie Mellon University Note: 9 pages, 7 figures. In proceedings of the ICML 2021 Workshop on Human in the Loop Learning
Abstract: Neural networks (NNs) are widely used for classification tasks due to their remarkable performance. However, the robustness and accuracy of NNs heavily depend on the training data. In many applications, massive training data is usually not available. To address this challenge, this paper proposes an iterative adversarial data augmentation (IADA) framework to learn neural network models from an insufficient amount of training data. The method uses formal verification to identify the most "confusing" input samples, and leverages human guidance to safely and iteratively augment the training data with these samples. The proposed framework is applied to an artificial 2D dataset, the MNIST dataset, and a human motion dataset. By applying IADA to fully-connected NN classifiers, we show that our training method can improve the robustness and accuracy of the learned model. Compared to regular supervised training on the MNIST dataset, the average perturbation bound improved by 107.4%. The classification accuracy improved by 1.77%, 3.76%, and 10.85% on the 2D dataset, the MNIST dataset, and the human motion dataset, respectively.

【6】 Generating Cyber Threat Intelligence to Discover Potential Security Threats Using Classification and Topic Modeling Link: https://arxiv.org/abs/2108.06862

Authors: Md Imran Hossen, Ashraful Islam, Farzana Anowar, Eshtiak Ahmed, Mohammad Masudur Rahman Affiliations: School of Computing and Informatics, University of Louisiana at Lafayette, Louisiana, USA; University of Regina, Regina, Canada; Tampere University, Tampere, Finland
Abstract: Due to the variety of cyber-attacks or threats, the cybersecurity community has been enhancing traditional security control mechanisms to an advanced level so that automated tools can encounter potential security threats. Very recently, the term Cyber Threat Intelligence (CTI) has been presented as one of the proactive and robust mechanisms because of its automated, data-based cybersecurity threat prediction. In general, CTI collects and analyses data from various sources, e.g., online security forums and social media, where cyber enthusiasts, analysts, and even cybercriminals discuss cyber or computer security related topics, and discovers potential threats based on the analysis. As the manual analysis of every such discussion (i.e., posts on online platforms) is time-consuming, inefficient, and susceptible to errors, CTI as an automated tool can perform uniquely to detect cyber threats. In this paper, our goal is to identify and explore relevant CTI from hacker forums by using different supervised and unsupervised learning techniques. To this end, we collect data from a real hacker forum and construct two datasets: a binary dataset and a multi-class dataset. Our binary dataset contains two classes: one containing cybersecurity-relevant posts and another containing posts that are not related to security. This dataset is constructed using a simple keyword search technique. Using a similar approach, we further categorize the security-relevant posts into five different threat categories. We then apply several machine learning classifiers along with deep neural network-based classifiers and use them on the datasets to compare their performances. We also test the classifiers on a leaked dataset with labels, named nulled.io, as our ground truth. We further explore the datasets using unsupervised techniques, i.e., Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF).
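
The described workflow maps naturally onto scikit-learn. A condensed sketch with a placeholder data loader (the keyword-search labeling step is omitted):

```python
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import LatentDirichletAllocation, NMF
from sklearn.model_selection import train_test_split

posts, labels = load_forum_posts()   # placeholder: hacker-forum posts + threat labels

# Supervised: classify security-relevant posts into threat categories.
X_tr, X_te, y_tr, y_te = train_test_split(posts, labels, test_size=0.2)
vec = TfidfVectorizer(max_features=20000, stop_words="english")
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X_tr), y_tr)
print("accuracy:", clf.score(vec.transform(X_te), y_te))

# Unsupervised: explore latent threat topics with LDA and NMF.
counts = CountVectorizer(max_features=20000, stop_words="english").fit_transform(posts)
lda_topics = LatentDirichletAllocation(n_components=5).fit(counts)
nmf_topics = NMF(n_components=5).fit(vec.transform(posts))
```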

【7】 SAPPHIRE: Approaches for Enhanced Concept-to-Text Generation Link: https://arxiv.org/abs/2108.06643

Authors: Steven Y. Feng, Jessica Huynh, Chaitanya Narisetty, Eduard Hovy, Varun Gangal Affiliations: Language Technologies Institute, Carnegie Mellon University Note: INLG 2021. Code available at this https URL
Abstract: We motivate and propose a suite of simple but effective improvements for concept-to-text generation called SAPPHIRE: Set Augmentation and Post-hoc PHrase Infilling and REcombination. We demonstrate their effectiveness on generative commonsense reasoning, a.k.a. the CommonGen task, through experiments using both BART and T5 models. Through extensive automatic and human evaluation, we show that SAPPHIRE noticeably improves model performance. An in-depth qualitative analysis illustrates that SAPPHIRE effectively addresses many issues of the baseline model generations, including lack of commonsense, insufficient specificity, and poor fluency.

【8】 A Survey on GAN Acceleration Using Memory Compression Technique Link: https://arxiv.org/abs/2108.06626

Authors: Dina Tantawy, Mohamed Zahran, Amr Wassal Affiliations: Department of Computer Engineering, Cairo University, Cairo, Egypt Note: 21 pages, 17 figures
Abstract: Since their invention, generative adversarial networks (GANs) have shown outstanding results in many applications. GANs are powerful yet resource-hungry deep learning models. Their main difference from ordinary deep learning models is the nature of their output. For example, a GAN's output can be a whole image, versus other models that detect objects or classify images. Thus, the architecture and numeric precision of the network affect the quality and speed of the solution. Hence, accelerating GANs is pivotal. Accelerating GANs can be classified into three main tracks: (1) memory compression, (2) computation optimization, and (3) data-flow optimization. Because data transfer is the main source of energy usage, memory compression leads to the most savings. Thus, in this paper, we survey memory compression techniques for CNN-based GANs. Additionally, the paper summarizes opportunities and challenges in GAN acceleration and suggests open research problems to be further investigated.

Semi-/Weakly-/Un-/Supervised | Uncertainty | Active Learning (12 papers)

【1】 APReL: A Library for Active Preference-based Reward Learning Algorithms Link: https://arxiv.org/abs/2108.07259

Authors: Erdem Bıyık, Aditi Talati, Dorsa Sadigh Affiliations: Department of Electrical Engineering, Stanford University; Department of Computer Science, Stanford University Note: 5 pages, 1 figure. Library is available at: this https URL
Abstract: Reward learning is a fundamental problem in robotics: to have robots that operate in alignment with what their human user wants. Many preference-based learning algorithms and active querying techniques have been proposed as a solution to this problem. In this paper, we present APReL, a library for active preference-based reward learning algorithms, which enables researchers and practitioners to experiment with existing techniques and easily develop their own algorithms for various modules of the problem.
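
APReL's own API is not reproduced here; instead, the following is a generic sketch of the core loop such a library implements: a Bradley-Terry preference likelihood over linear reward weights, updated from pairwise human queries. All names are illustrative.

```python
import numpy as np

def update_reward(w, feats_a, feats_b, pref, lr=0.1):
    """One gradient step on the Bradley-Terry preference likelihood.

    Reward is assumed linear: R(traj) = w . features(traj); pref = 1 if the
    human preferred trajectory A, else 0. (Generic sketch, not APReL's API.)
    """
    p_a = 1.0 / (1.0 + np.exp(-(w @ feats_a - w @ feats_b)))  # P(A preferred)
    return w + lr * (pref - p_a) * (feats_a - feats_b)        # logistic gradient

w = np.zeros(4)
for feats_a, feats_b, pref in human_preference_queries():     # placeholder source
    w = update_reward(w, feats_a, feats_b, pref)
```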

【2】 Improving Self-supervised Learning with Hardness-aware Dynamic Curriculum Learning: An Application to Digital Pathology Link: https://arxiv.org/abs/2108.07183

Authors: Chetan L Srinidhi, Anne L Martel Affiliations: Physical Sciences, Sunnybrook Research Institute, Toronto, Canada; Department of Medical Biophysics, University of Toronto, Canada Note: Accepted at ICCV 2021 CDpath workshop
Abstract: Self-supervised learning (SSL) has recently shown tremendous potential to learn generic visual representations useful for many image analysis tasks. Despite their notable success, the existing SSL methods fail to generalize to downstream tasks when the number of labeled training instances is small or if the domain shift between the transfer domains is significant. In this paper, we attempt to improve self-supervised pretrained representations through the lens of curriculum learning by proposing a hardness-aware dynamic curriculum learning (HaDCL) approach. To improve the robustness and generalizability of SSL, we dynamically leverage progressively harder examples via easy-to-hard and hard-to-very-hard samples during mini-batch downstream fine-tuning. We discover that by progressive stage-wise curriculum learning, the pretrained representations are significantly enhanced and adaptable to both in-domain and out-of-domain distribution data. We performed extensive validation on three histology benchmark datasets on both patch-wise and slide-level classification problems. Our curriculum-based fine-tuning yields a significant improvement over standard fine-tuning, with a minimum improvement in area-under-the-curve (AUC) score of 1.7% and 2.2% on in-domain and out-of-domain distribution data, respectively. Further, we empirically show that our approach is more generic and adaptable to any SSL method and does not impose any additional overhead complexity. Besides, we also outline the role of patch-based versus slide-based curriculum learning in histopathology to provide practical insights into the success of curriculum-based fine-tuning of SSL methods. Code will be released at https://github.com/srinidhiPY/ICCVCDPATH2021-ID-8
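
A minimal sketch of the easy-to-hard idea during mini-batch fine-tuning: rank samples by loss and admit a progressively larger hard fraction. The linear pacing schedule is an assumption, not necessarily the paper's exact scheme.

```python
import torch

def curriculum_batch(model, loss_fn, x, y, progress):
    """Keep the easiest samples early in fine-tuning, admit harder ones later.

    progress in [0, 1]: fraction of fine-tuning completed (simple linear pacing).
    loss_fn must use reduction='none' so per-sample losses are returned.
    """
    with torch.no_grad():
        losses = loss_fn(model(x), y)                    # per-sample losses
    keep = max(1, int(len(x) * (0.5 + 0.5 * progress)))  # grow from 50% to 100%
    idx = torch.argsort(losses)[:keep]                   # easiest `keep` samples
    return x[idx], y[idx]
```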

【3】 Semi-Supervised Siamese Network for Identifying Bad Data in Medical Imaging Datasets Link: https://arxiv.org/abs/2108.07130

Authors: Niamh Belton, Aonghus Lawlor, Kathleen M. Curran Affiliations: Science Foundation Ireland Centre for Research Training in Machine Learning, School of Medicine, School of Computer Science, University College Dublin; Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland
Abstract: Noisy data present in medical imaging datasets can often aid the development of robust models that are equipped to handle real-world data. However, if the bad data contains insufficient anatomical information, it can have a severe negative effect on the model's performance. We propose a novel methodology using a semi-supervised Siamese network to identify bad data. This method requires only a small pool of 'reference' medical images to be reviewed by a non-expert human to ensure the major anatomical structures are present in the Field of View. The model trains on this reference set and identifies bad data by using the Siamese network to compute the distance between the reference set and all other medical images in the dataset. This methodology achieves an Area Under the Curve (AUC) of 0.989 for identifying bad data. Code will be available at https://git.io/JYFuV.
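
The scoring rule is compact: embed each image with the trained Siamese encoder and flag images far from every reference embedding. A sketch, with the distance choice and thresholding left as assumptions:

```python
import torch

def bad_data_scores(encoder, images, reference_images):
    """Distance of each image to its nearest reference embedding.
    High score -> likely bad data (insufficient anatomy in the field of view)."""
    with torch.no_grad():
        z = encoder(images)                  # (n, d) embeddings
        z_ref = encoder(reference_images)    # (m, d) trusted reference set
    return torch.cdist(z, z_ref).min(dim=1).values   # nearest-reference distance

# flagged = bad_data_scores(enc, imgs, refs) > threshold  # threshold tuned on a val split
```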

【4】 Towards a Safety Case for Hardware Fault Tolerance in Convolutional Neural Networks Using Activation Range Supervision Link: https://arxiv.org/abs/2108.07019

Authors: Florian Geissler, Syed Qutub, Sayanta Roychowdhury, Ali Asgari, Yang Peng, Akash Dhamasia, Ralf Graefe, Karthik Pattabiraman, Michael Paulitsch Affiliations: Intel, Germany; University of British Columbia, Canada Note: 8 pages, 7 figures
Abstract: Convolutional neural networks (CNNs) have become an established part of numerous safety-critical computer vision applications, including human-robot interaction and automated driving. Real-world implementations will need to guarantee their robustness against hardware soft errors corrupting the underlying platform memory. Based on the previously observed efficacy of activation clipping techniques, we build a prototypical safety case for classifier CNNs by demonstrating that range supervision represents a highly reliable fault detector and mitigator with respect to relevant bit flips, adopting an eight-exponent floating point data representation. We further explore novel, non-uniform range restriction methods that effectively suppress the probability of silent data corruptions and uncorrectable errors. As a safety-relevant end-to-end use case, we showcase the benefit of our approach in a vehicle classification scenario, using ResNet-50 and the traffic camera data set MIOVision. The quantitative evidence provided in this work can be leveraged to inspire further and possibly more complex CNN safety arguments.
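
Range supervision can be prototyped in PyTorch with forward hooks: profile per-layer activation bounds on fault-free data, then clamp at inference so bit-flip-induced outliers are suppressed. Using the min/max over clean data as bounds is a simplification of the paper's restriction schemes.

```python
import torch

def add_range_supervision(model, calib_loader):
    """Clamp each ReLU's output to bounds profiled on fault-free data,
    mitigating out-of-range values caused by memory bit flips."""
    bounds = {}
    def record(name):
        def hook(_, __, out):
            lo, hi = out.min().item(), out.max().item()
            b = bounds.setdefault(name, [lo, hi])
            b[0], b[1] = min(b[0], lo), max(b[1], hi)
        return hook
    handles = [m.register_forward_hook(record(n))
               for n, m in model.named_modules() if isinstance(m, torch.nn.ReLU)]
    with torch.no_grad():                 # profiling pass over clean data
        for x, _ in calib_loader:
            model(x)
    for h in handles:
        h.remove()
    for n, m in model.named_modules():    # install clamping hooks
        if n in bounds:
            lo, hi = bounds[n]
            m.register_forward_hook(lambda _, __, out, lo=lo, hi=hi:
                                    out.clamp(lo, hi))
    return bounds
```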

【5】 Weakly Supervised Temporal Anomaly Segmentation with Dynamic Time Warping Link: https://arxiv.org/abs/2108.06816

Authors: Dongha Lee, Sehun Yu, Hyunjun Ju, Hwanjo Yu Affiliations: University of Illinois at Urbana-Champaign (UIUC), Urbana, IL, United States; Pohang University of Science and Technology (POSTECH), Pohang, South Korea Note: ICCV 2021. 8 pages, references (2 pages), appendix (3 pages), 6 figures
Abstract: Most recent studies on detecting and localizing temporal anomalies have mainly employed deep neural networks to learn the normal patterns of temporal data in an unsupervised manner. Unlike them, the goal of our work is to fully utilize instance-level (or weak) anomaly labels, which only indicate whether any anomalous events occurred or not in each instance of temporal data. In this paper, we present WETAS, a novel framework that effectively identifies anomalous temporal segments (i.e., consecutive time points) in an input instance. WETAS learns discriminative features from the instance-level labels so that it infers the sequential order of normal and anomalous segments within each instance, which can be used as a rough segmentation mask. Based on the dynamic time warping (DTW) alignment between the input instance and its segmentation mask, WETAS obtains the result of temporal segmentation, and simultaneously, it further enhances itself by using the mask as additional supervision. Our experiments show that WETAS considerably outperforms other baselines in terms of the localization of temporal anomalies, and also provides more informative results than point-level detection methods.

【6】 Self-supervised Contrastive Learning of Multi-view Facial Expressions Link: https://arxiv.org/abs/2108.06723

Authors: Shuvendu Roy, Ali Etemad Affiliations: Department of Electrical and Computer Engineering & Ingenuity Labs Research Institute, Queen's University, Kingston, Canada Note: Accepted by the 23rd ACM International Conference on Multimodal Interaction (ICMI 2021)
Abstract: Facial expression recognition (FER) has emerged as an important component of human-computer interaction systems. Despite recent advancements in FER, performance often drops significantly for non-frontal facial images. We propose Contrastive Learning of Multi-view facial Expressions (CL-MEx) to exploit facial images captured simultaneously from different angles for FER. CL-MEx is a two-step training framework. In the first step, an encoder network is pre-trained with the proposed self-supervised contrastive loss, where it learns to generate view-invariant embeddings for different views of a subject. The model is then fine-tuned with labeled data in a supervised setting. We demonstrate the performance of the proposed method on two multi-view FER datasets, KDEF and DDCF, where state-of-the-art performances are achieved. Further experiments show the robustness of our method in dealing with challenging angles and reduced amounts of labeled data.
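
Here is a sketch of the view-invariant contrastive objective: embeddings of two camera views of the same subject are positives in an NT-Xent-style loss. The temperature and SimCLR-style pairing are assumptions, not necessarily the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def multiview_contrastive_loss(z1, z2, tau=0.1):
    """NT-Xent-style loss where z1[i] and z2[i] embed two camera views of the
    same subject/expression (positives); all other pairs act as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                          # (n, n) cosine similarities
    targets = torch.arange(len(z1), device=z1.device)   # matching indices are positives
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```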

【7】 Unsupervised Disentanglement without Autoencoding: Pitfalls and Future Directions Link: https://arxiv.org/abs/2108.06613

Authors: Andrea Burns, Aaron Sarna, Dilip Krishnan, Aaron Maschinot Note: Accepted at the ICML 2021 Self-Supervised Learning for Reasoning and Perception Workshop
Abstract: Disentangled visual representations have largely been studied with generative models such as Variational AutoEncoders (VAEs). While prior work has focused on generative methods for disentangled representation learning, these approaches do not scale to large datasets due to current limitations of generative models. Instead, we explore regularization methods with contrastive learning, which could result in disentangled representations that are powerful enough for large-scale datasets and downstream applications. However, we find that unsupervised disentanglement is difficult to achieve due to optimization and initialization sensitivity, with trade-offs in task performance. We evaluate disentanglement with downstream tasks, analyze the benefits and disadvantages of each regularization used, and discuss future directions.

【8】 Joint Optimization in Edge-Cloud Continuum for Federated Unsupervised Person Re-identification Link: https://arxiv.org/abs/2108.06493

Authors: Weiming Zhuang, Yonggang Wen, Shuai Zhang Affiliations: S-Lab, Nanyang Technological University; SenseTime Research Note: ACMMM'21
Abstract: Person re-identification (ReID) aims to re-identify a person across non-overlapping camera views. Since person ReID data contains sensitive personal information, researchers have adopted federated learning, an emerging distributed training method, to mitigate privacy leakage risks. However, existing studies rely on data labels that are laborious and time-consuming to obtain. We present FedUReID, a federated unsupervised person ReID system that learns person ReID models without any labels while preserving privacy. FedUReID enables in-situ model training on edges with unlabeled data. A cloud server aggregates models from edges instead of centralizing raw data, to preserve data privacy. Moreover, to tackle the problem that edges vary in data volumes and distributions, we personalize training on edges with joint optimization of cloud and edge. Specifically, we propose personalized epochs to reassign computation throughout training, personalized clustering to iteratively predict suitable labels for unlabeled data, and personalized updates to adapt the server-aggregated model to each edge. Extensive experiments on eight person ReID datasets demonstrate that FedUReID not only achieves higher accuracy but also reduces computation cost by 29%. Our FedUReID system with the joint optimization will shed light on implementing federated learning for more multimedia tasks without data labels.

【9】 Collaborative Unsupervised Visual Representation Learning from Decentralized Data Link: https://arxiv.org/abs/2108.06492

Authors: Weiming Zhuang, Xin Gan, Yonggang Wen, Shuai Zhang, Shuai Yi Affiliations: S-Lab, Nanyang Technological University; Nanyang Technological University; SenseTime Research Note: ICCV'21
Abstract: Unsupervised representation learning has achieved outstanding performance using centralized data available on the Internet. However, the increasing awareness of privacy protection limits the sharing of decentralized unlabeled image data that grows explosively across multiple parties (e.g., mobile phones and cameras). As such, a natural problem is how to leverage these data to learn visual representations for downstream tasks while preserving data privacy. To address this problem, we propose a novel federated unsupervised learning framework, FedU. In this framework, each party trains models from unlabeled data independently using contrastive learning with an online network and a target network. Then, a central server aggregates the trained models and updates clients' models with the aggregated model. It preserves data privacy, as each party only has access to its raw data. Decentralized data among multiple parties are normally non-independent and identically distributed (non-IID), leading to performance degradation. To tackle this challenge, we propose two simple but effective methods: 1) we design the communication protocol to upload only the encoders of the online networks for server aggregation and update them with the aggregated encoder; 2) we introduce a new module to dynamically decide how to update predictors based on the divergence caused by non-IID data. The predictor is the other component of the online network. Extensive experiments and ablations demonstrate the effectiveness and significance of FedU. It outperforms training with only one party by over 5% and other methods by over 14% in linear and semi-supervised evaluation on non-IID data.
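
A sketch of the encoder-only aggregation in the described communication protocol; the `encoder.` key prefix is an assumption, and the divergence-based predictor update is omitted.

```python
import copy
import torch

def aggregate_encoders(client_models, server_model):
    """FedAvg-style aggregation of ONLY the online encoders (cf. FedU):
    predictors stay local and are updated separately based on divergence."""
    avg = copy.deepcopy(server_model.state_dict())
    for k in avg:
        if k.startswith("encoder."):       # assumed naming of encoder weights
            avg[k] = torch.stack([m.state_dict()[k].float()
                                  for m in client_models]).mean(dim=0)
    server_model.load_state_dict(avg)
```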

【10】 Focus on the Positives: Self-Supervised Learning for Biodiversity Monitoring Link: https://arxiv.org/abs/2108.06435

Authors: Omiros Pantazis, Gabriel J. Brostow, Kate E. Jones, Oisin Mac Aodha Affiliations: University College London; Niantic; University of Edinburgh Note: ICCV 2021
Abstract: We address the problem of learning self-supervised representations from unlabeled image collections. Unlike existing approaches that attempt to learn useful features by maximizing similarity between augmented versions of each input image or by speculatively picking negative samples, we instead also make use of the natural variation that occurs in image collections captured using static monitoring cameras. To achieve this, we exploit readily available context data that encodes information such as the spatial and temporal relationships between the input images. By first identifying high-probability positive pairs at training time, i.e. those images that are likely to depict the same visual concept, we are able to learn representations that are surprisingly effective for downstream supervised classification. For the critical task of global biodiversity monitoring, this results in image features that can be adapted to challenging visual species classification tasks with limited human supervision. We present results on four different camera trap image collections, across three different families of self-supervised learning methods, and show that careful image selection at training time results in superior performance compared to existing baselines such as conventional self-supervised training and transfer learning.

【11】 Detecting OODs as Datapoints with High Uncertainty Link: https://arxiv.org/abs/2108.06380

Authors: Ramneet Kaur, Susmit Jha, Anirban Roy, Sangdon Park, Oleg Sokolsky, Insup Lee Affiliations: University of Pennsylvania
Abstract: Deep neural networks (DNNs) are known to produce incorrect predictions with very high confidence on out-of-distribution inputs (OODs). This limitation is one of the key challenges in the adoption of DNNs in high-assurance systems such as autonomous driving, air traffic management, and medical diagnosis. This challenge has received significant attention recently, and several techniques have been developed to detect inputs where the model's prediction cannot be trusted. These techniques detect OODs as datapoints with either high epistemic uncertainty or high aleatoric uncertainty. We demonstrate the difference in the detection ability of these techniques and propose an ensemble approach for detection of OODs as datapoints with high uncertainty (epistemic or aleatoric). We perform experiments on vision datasets with multiple DNN architectures, achieving state-of-the-art results in most cases.
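
The ensemble rule can be sketched directly: flag an input when either epistemic uncertainty (here approximated by MC-dropout variance) or aleatoric uncertainty (predictive entropy) is high. The thresholds and the number of MC samples are illustrative, and the model is assumed to contain dropout layers.

```python
import torch

def is_ood(model, x, t_epi=0.05, t_ale=1.5, n_mc=20):
    """Flag inputs with high epistemic OR high aleatoric uncertainty.

    Epistemic: variance of MC-dropout predictions; aleatoric: entropy of the
    mean predictive distribution. Thresholds are illustrative."""
    model.train()                                   # keep dropout active
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(dim=-1) for _ in range(n_mc)])
    mean = probs.mean(dim=0)
    epistemic = probs.var(dim=0).sum(dim=-1)        # spread across MC samples
    aleatoric = -(mean * mean.clamp_min(1e-12).log()).sum(dim=-1)  # entropy
    return (epistemic > t_epi) | (aleatoric > t_ale)
```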

【12】 Weakly Supervised Continual Learning Link: https://arxiv.org/abs/2108.06552

Authors: Matteo Boschini, Pietro Buzzega, Lorenzo Bonicelli, Angelo Porrello, Simone Calderara Note: 11 pages, 4 figures
Abstract: Continual Learning (CL) investigates how to train deep networks on a stream of tasks without incurring catastrophic forgetting. CL settings proposed in the literature assume that every incoming example is paired with ground-truth annotations. However, this clashes with many real-world applications: gathering labeled data, which is in itself tedious and expensive, becomes infeasible when data flow as a stream and must be consumed in real-time. This work explores Weakly Supervised Continual Learning (WSCL): here, only a small fraction of labeled input examples are shown to the learner. We assess how current CL methods (e.g., EWC, LwF, iCaRL, ER, GDumb, DER) perform in this novel and challenging scenario, in which overfitting entangles forgetting. Subsequently, we design two novel WSCL methods which exploit metric learning and consistency regularization to leverage unsupervised data while learning. In doing so, we show not only that our proposals exhibit higher flexibility when supervised information is scarce, but also that less than 25% of labels can be enough to reach or even outperform SOTA methods trained under full supervision.

Transfer | Zero/Few/One-Shot | Adaptation (3 papers)

【1】 Adaptive Selection of Informative Path Planning Strategies via Reinforcement Learning Link: https://arxiv.org/abs/2108.06618

Authors: Taeyeong Choi, Grzegorz Cielniak Note: Published in the proceedings of ECMR 2021
Abstract: In our previous work, we designed a systematic policy to prioritize sampling locations that leads to significant accuracy improvement in spatial interpolation, by using the prediction uncertainty of Gaussian Process Regression (GPR) as an "attraction force" on deployed robots in path planning. Although the integration with Traveling Salesman Problem (TSP) solvers was also shown to produce relatively short travel distances, we here hypothesize several factors that could decrease the overall prediction precision, because sub-optimal locations may eventually be included in the paths. To address this issue, in this paper we first explore "local planning" approaches that adopt various spatial ranges within which next sampling locations are prioritized, to investigate their effects on prediction performance as well as incurred travel distance. Also, Reinforcement Learning (RL)-based high-level controllers are trained to adaptively produce blended plans from a particular set of local planners, inheriting unique strengths from that selection depending on the latest prediction states. Our experiments on use cases of temperature-monitoring robots demonstrate that dynamic mixtures of planners can not only generate sophisticated, informative plans that a single planner could not create alone, but also ensure significantly reduced travel distances, without any assistance from additional shortest-path calculation modules and at no cost to prediction reliability.

【2】 Fractional Transfer Learning for Deep Model-Based Reinforcement Learning Link: https://arxiv.org/abs/2108.06526

Authors: Remo Sasso, Matthia Sabatelli, Marco A. Wiering Affiliations: Dept. Artificial Intelligence, University of Groningen Note: 21 pages, 8 figures, 7 tables
Abstract: Reinforcement learning (RL) is well known for requiring large amounts of data in order for RL agents to learn to perform complex tasks. Recent progress in model-based RL allows agents to be much more data-efficient, as it enables them to learn behaviors of visual environments in imagination by leveraging an internal world model of the environment. Improved sample efficiency can also be achieved by reusing knowledge from previously learned tasks, but transfer learning is still a challenging topic in RL. Parameter-based transfer learning is generally done using an all-or-nothing approach, where the network's parameters are either fully transferred or randomly initialized. In this work we present a simple alternative approach: fractional transfer learning. The idea is to transfer fractions of knowledge, as opposed to discarding potentially useful knowledge as is commonly done with random initialization. Using the world-model-based Dreamer algorithm, we identify which types of components this approach is applicable to, and perform experiments in a new multi-source transfer learning setting. The results show that fractional transfer learning often leads to substantially improved performance and faster learning compared to learning from scratch and random initialization.
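
The core idea fits in a few lines. A hedged PyTorch sketch, in which the exact blending rule (here, a convex combination of transferred weights and a fresh random initialization) is one plausible reading of "transferring a fraction of knowledge":

```python
import torch

def fractional_transfer(target_param, source_param, fraction=0.2):
    """Initialize a layer as a blend of transferred and random weights,
    instead of the usual all-or-nothing transfer (sketch, rule assumed)."""
    with torch.no_grad():
        random_init = torch.randn_like(target_param) * target_param.std()
        target_param.copy_(fraction * source_param + (1 - fraction) * random_init)
```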

【3】 GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints Link: https://arxiv.org/abs/2108.06890

Authors: Ji-Hoon Kim, Sang-Hoon Lee, Ji-Hyun Lee, Hong-Gyu Jung, Seong-Whan Lee Affiliations: Korea University Note: Accepted paper at the IEEE International Conference on Systems, Man, and Cybernetics (SMC 2021)
Abstract: Few-shot speaker adaptation is a specific text-to-speech (TTS) task that aims to reproduce a novel speaker's voice with a few training data. While numerous attempts have been made at few-shot speaker adaptation systems, there is still a gap in terms of speaker similarity to the target speaker depending on the amount of data. To bridge the gap, we propose GC-TTS, which achieves high-quality speaker adaptation with significantly improved speaker similarity. Specifically, we leverage two geometric constraints to learn discriminative speaker representations. Here, a TTS model is pre-trained for base speakers with a sufficient amount of data, and then fine-tuned for novel speakers on a few minutes of data with two geometric constraints. The two geometric constraints enable the model to extract discriminative speaker embeddings from limited data, which leads to the synthesis of intelligible speech. We discuss and verify the effectiveness of GC-TTS by comparing it with popular and essential methods. The experimental results demonstrate that GC-TTS generates high-quality speech from only a few minutes of training data, outperforming standard techniques in terms of speaker similarity to the target speaker.

强化学习(5篇)

【1】 Using Cyber Terrain in Reinforcement Learning for Penetration Testing 标题:网络地形在渗透测试强化学习中的应用 链接:https://arxiv.org/abs/2108.07124

作者:Rohit Gangupantulu,Tyler Cody,Paul Park,Abdul Rahman,Logan Eisenbeiser,Dan Radke,Ryan Clark 机构:Deloitte Consulting LLC, Hume Center for National Security and Technology, Virginia Polytechnic University, Deloitte & Touche LLP, †Co-first authors 摘要:强化学习(RL)已应用于渗透测试的攻击图,但是,训练有素的代理无法反映真实情况,因为攻击图缺乏通常在战场情报准备(IPB)中捕获的作战细微差别,其中包括(网络)地形的概念。特别是,目前的做法专门使用通用漏洞评分系统(CVSS)及其组件构建攻击图。我们提出了利用IPB中关于网络地形的概念构建攻击图的方法,包括障碍物分析、接近通道分析、关键地形分析、观察和火力场分析、掩护和隐蔽分析。我们在一个例子中演示了我们的方法,其中防火墙被视为障碍物,并用(1)奖励空间和(2)状态动力学表示。我们表明,地形分析可以用来为RL攻击图带来真实感。 摘要:Reinforcement learning (RL) has been applied to attack graphs for penetration testing, however, trained agents do not reflect reality because the attack graphs lack operational nuances typically captured within the intelligence preparation of the battlefield (IPB) that include notions of (cyber) terrain. In particular, current practice constructs attack graphs exclusively using the Common Vulnerability Scoring System (CVSS) and its components. We present methods for constructing attack graphs using notions from IPB on cyber terrain analysis of obstacles, avenues of approach, key terrain, observation and fields of fire, and cover and concealment. We demonstrate our methods on an example where firewalls are treated as obstacles and represented in (1) the reward space and (2) the state dynamics. We show that terrain analysis can be used to bring realism to attack graphs for RL.
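
下面是一个高度简化的示意(攻击图的数据结构与数值均为本文示例的假设,并非论文实现),演示“防火墙作为障碍物”如何同时体现在(1)奖励空间和(2)状态转移动力学中:

```python
import random

FIREWALL_PENALTY = -5.0  # (1) 奖励空间:穿越带防火墙的边付出额外代价
FIREWALL_BLOCK_P = 0.5   # (2) 状态动力学:动作被防火墙阻断的概率

def step(state, action, attack_graph):
    """假设 attack_graph 为 {state: {action: edge_dict}} 形式的邻接表。"""
    edge = attack_graph[state][action]
    reward = edge["base_reward"]
    if edge.get("firewall", False):
        reward += FIREWALL_PENALTY
        if random.random() < FIREWALL_BLOCK_P:
            return state, reward  # 转移失败,停留在原状态
    return edge["next_state"], reward
```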

【2】 Introduction to Quantum Reinforcement Learning: Theory and PennyLane-based Implementation 标题:量子强化学习导论:理论与基于PennyLane的实现 链接:https://arxiv.org/abs/2108.06849

作者:Yunseok Kwak,Won Joon Yun,Soyi Jung,Jong-Kook Kim,Joongheon Kim 机构:◦School of Electrical Engineering, Korea University, Seoul, Republic of Korea, †School of Software, Hallym University, Chungcheon, Republic of Korea 摘要:量子计算的出现使研究人员能够将量子电路应用于许多现有的研究中。利用量子电路和量子差分编程进行了许多研究,如量子机器学习(QML)。特别是,量子强化学习是检验量子机器学习可能性的一个很好的领域,目前正在进行大量的研究。本文将介绍利用变分量子电路进行量子强化学习的概念,并通过实现和实验验证其可行性。我们将首先介绍量子强化学习的背景知识和工作原理,然后指导使用PennyLane库的实现方法。我们还将从实验结果中讨论量子强化学习的能力和可能性。 摘要:The emergence of quantum computing enables for researchers to apply quantum circuit on many existing studies. Utilizing quantum circuit and quantum differential programming, many research are conducted such as Quantum Machine Learning (QML). In particular, quantum reinforcement learning is a good field to test the possibility of quantum machine learning, and a lot of research is being done. This work will introduce the concept of quantum reinforcement learning using a variational quantum circuit, and confirm its possibility through implementation and experimentation. We will first present the background knowledge and working principle of quantum reinforcement learning, and then guide the implementation method using the PennyLane library. We will also discuss the power and possibility of quantum reinforcement learning from the experimental results obtained through this work.
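
下面用 PennyLane 的公开 API 给出一个变分量子电路的最小示意(电路结构、层数以及“以各量子比特的期望值作为动作价值近似读出”的方式均为本文示例的假设,并非原文代码):

```python
import pennylane as qml
from pennylane import numpy as np

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def q_policy(inputs, weights):
    for i in range(n_qubits):
        qml.RY(inputs[i], wires=i)  # 角度编码观测值
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))  # 变分纠缠层
    # 每个量子比特上 PauliZ 的期望值,可作为各动作价值的近似读出
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

weights = np.random.uniform(0, np.pi, (3, n_qubits), requires_grad=True)
print(q_policy(np.array([0.1, -0.4]), weights))  # weights 可由经典优化器迭代更新
```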

【3】 Offline-Online Reinforcement Learning for Energy Pricing in Office Demand Response: Lowering Energy and Data Costs 标题:办公室需求响应中能源定价的离线-在线强化学习:降低能源和数据成本 链接:https://arxiv.org/abs/2108.06594

作者:Doseok Jang,Lucas Spangher,Manan Khattar,Utkarsha Agwan,Selvaprabuh Nadarajah,Costas Spanos 机构:. Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, California, USA ,. Department of Information and Decision Sciences, University of Illinois, Chicago 摘要:我们的团队提议在一座办公楼中进行一次全面的能源需求响应实验。尽管这是一项激动人心的工作,将为社区提供价值,但为强化学习代理收集训练数据的成本高昂且有限。在这项工作中,我们将研究如何利用离线训练来最小化数据成本(加速收敛)和项目实施成本。我们提出了两种方法来实现这一点:对模型进行预训练,用模拟的任务来热身开始实验,以及使用经过训练的计划模型来模拟真实世界对代理的奖励。我们给出的结果证明了离线强化学习在能源需求响应问题中有效定价的效用。 摘要:Our team is proposing to run a full-scale energy demand response experiment in an office building. Although this is an exciting endeavor which will provide value to the community, collecting training data for the reinforcement learning agent is costly and will be limited. In this work, we examine how offline training can be leveraged to minimize data costs (accelerate convergence) and program implementation costs. We present two approaches to doing so: pretraining our model to warm start the experiment with simulated tasks, and using a planning model trained to simulate the real world's rewards to the agent. We present results that demonstrate the utility of offline reinforcement learning to efficient price-setting in the energy demand response problem.

【4】 A Microscopic Pandemic Simulator for Pandemic Prediction Using Scalable Million-Agent Reinforcement Learning 标题:一种基于可扩展百万智能体强化学习的用于大流行预测的微观流行病模拟器 链接:https://arxiv.org/abs/2108.06589

作者:Zhenggang Tang,Kai Yan,Liting Sun,Wei Zhan,Changliu Liu 机构:Peking University, University of California, Berkeley, Carnegie Mellon University 备注:14 pages 摘要:微观传染病模型是政府决策者预测和模拟传染病暴发的有力工具,可以捕捉个体行为对宏观现象的影响。然而,现有的模型只考虑简单的基于规则的个体行为,限制了它们的适用性。提出了一种基于深度强化学习的微观模型——微观流行病模拟器(MPS)。MPS将基于规则的代理替换为理性代理,理性代理的行为被驱动以实现回报最大化,从而更好地近似真实世界的动态。为了有效地模拟MPS中的大量代理,我们提出了可扩展的百万代理DQN(SMADQN)。MPS使我们能够有效地评估不同政府战略的影响。本文首先根据美国阿勒格尼的真实数据校准了MPS,然后实证评估了两种政府战略:信息披露和隔离。结果验证了该方法的有效性。作为广泛的影响,本文为DRL在大规模基于代理的网络(如经济和社会网络)中的应用提供了新的见解。 摘要:Microscopic epidemic models are powerful tools for government policy makers to predict and simulate epidemic outbreaks, which can capture the impact of individual behaviors on the macroscopic phenomenon. However, existing models only consider simple rule-based individual behaviors, limiting their applicability. This paper proposes a deep-reinforcement-learning-powered microscopic model named Microscopic Pandemic Simulator (MPS). By replacing rule-based agents with rational agents whose behaviors are driven to maximize rewards, the MPS provides a better approximation of real world dynamics. To efficiently simulate with massive amounts of agents in MPS, we propose Scalable Million-Agent DQN (SMADQN). The MPS allows us to efficiently evaluate the impact of different government strategies. This paper first calibrates the MPS against real-world data in Allegheny, US, then demonstratively evaluates two government strategies: information disclosure and quarantine. The results validate the effectiveness of the proposed method. As a broad impact, this paper provides novel insights for the application of DRL in large scale agent-based networks such as economic and social networks.

【5】 Optimal Scheduling of Isolated Microgrids Using Automated Reinforcement Learning-based Multi-period Forecasting 标题:基于自动强化学习的多周期预测孤立微电网优化调度 链接:https://arxiv.org/abs/2108.06764

作者:Yang Li,Ruinong Wang,Zhen Yang 机构:School of Electrical Engineering, Northeast Electric Power University 备注:Accepted by IEEE Transactions on Sustainable Energy 摘要:为了减少负荷和可再生能源输出的不确定性对微电网运行的负面影响,采用基于自动强化学习的可再生能源发电和负荷多周期预测方法,提出了一种孤立微电网的优化调度模型。首先,设计了一种优先经验回放自动强化学习(PER-AutoRL),以简化基于深度强化学习(DRL)的预测模型的定制部署,首次提出了基于PER-AutoRL的单步多周期预测方法,解决了现有多步预测方法存在的误差积累问题,并通过误差分布对预测值进行修正,提高了预测精度;其次,以微电网总运行成本最小为目标,建立了考虑需求响应的调度模型,以修正后的预测值为调度依据,根据误差分布设置旋转备用机会约束;最后,利用序列运算理论(SOT)将原调度模型转化为易于求解的混合整数线性规划模型,并用CPLEX求解器求解。仿真结果表明,与传统的无预测调度模型相比,该方法通过提高预测精度显著降低了系统运行成本。 摘要:In order to reduce the negative impact of the uncertainty of load and renewable energies outputs on microgrid operation, an optimal scheduling model is proposed for isolated microgrids by using automated reinforcement learning-based multi-period forecasting of renewable power generations and loads. Firstly, a prioritized experience replay automated reinforcement learning (PER-AutoRL) is designed to simplify the deployment of deep reinforcement learning (DRL)-based forecasting model in a customized manner, the single-step multi-period forecasting method based on PER-AutoRL is proposed for the first time to address the error accumulation issue suffered by existing multi-step forecasting methods, then the prediction values obtained by the proposed forecasting method are revised via the error distribution to improve the prediction accuracy; secondly, a scheduling model considering demand response is constructed to minimize the total microgrid operating costs, where the revised forecasting values are used as the dispatch basis, and a spinning reserve chance constraint is set according to the error distribution; finally, by transforming the original scheduling model into a readily solvable mixed integer linear programming via the sequence operation theory (SOT), the transformed model is solved by using CPLEX solver. The simulation results show that compared with the traditional scheduling model without forecasting, this approach manages to significantly reduce the system operating costs by improving the prediction accuracy.

元学习(2篇)

【1】 Efficient Federated Meta-Learning over Multi-Access Wireless Networks 标题:多址无线网络中高效的联合元学习 链接:https://arxiv.org/abs/2108.06453

作者:Sheng Yue,Ju Ren,Jiang Xin,Deyu Zhang,Yaoxue Zhang,Weihua Zhuang 机构:Ju Ren and Yaoxue Zhang are with the Department of Computer Sci-ence and Technology, Tsinghua University 摘要:联邦元学习(FML)已经成为一种很有前途的范式,可以应对当今边缘学习领域中的数据限制和异构性挑战。然而,其性能往往受到收敛速度慢和通信效率低的限制。此外,由于无线带宽和物联网设备的能量容量通常不足,因此在现实无线网络中部署FML时,控制资源分配和能量消耗至关重要。为了克服这些挑战,在本文中,我们首先严格分析每一轮中每个设备对全局损耗降低的贡献,并开发一个FML算法(称为NUFM),该算法采用非均匀设备选择方案来加速收敛。在此基础上,我们提出了一个在多址无线系统中集成NUFM的资源分配问题,以共同提高收敛速度,最小化挂钟时间和能量消耗。通过逐步解构原问题,我们设计了一种联合设备选择和资源分配策略(称为URAL)来解决该问题并提供理论保证。此外,我们还表明,通过结合两种一阶近似技术,NUFM的计算复杂度可以从$O(d^2)$降低到$O(d)$(其中$d$为模型维)。大量的仿真结果表明,与现有的基线相比,所提出的方法是有效的和优越的。 摘要:Federated meta-learning (FML) has emerged as a promising paradigm to cope with the data limitation and heterogeneity challenges in today's edge learning arena. However, its performance is often limited by slow convergence and corresponding low communication efficiency. Besides, since the wireless bandwidth and IoT devices' energy capacity are usually insufficient, it is crucial to control the resource allocation and energy consumption when deploying FML in realistic wireless networks. To overcome these challenges, in this paper, we first rigorously analyze each device's contribution to the global loss reduction in each round and develop an FML algorithm (called NUFM) with a non-uniform device selection scheme to accelerate the convergence. After that, we formulate a resource allocation problem integrating NUFM in multi-access wireless systems to jointly improve the convergence rate and minimize the wall-clock time along with energy cost. By deconstructing the original problem step by step, we devise a joint device selection and resource allocation strategy (called URAL) to solve the problem and provide theoretical guarantees. Further, we show that the computational complexity of NUFM can be reduced from $O(d^2)$ to $O(d)$ (with $d$ being the model dimension) via combining two first-order approximation techniques. Extensive simulation results demonstrate the effectiveness and superiority of the proposed methods by comparing with the existing baselines.
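
以下为非均匀设备选择思想的一个极简示意(以“上一轮贡献估计正比于被选中概率”为假设的简化,并非 NUFM 的原始准则):

```python
import numpy as np

def select_devices(contributions, k, seed=0):
    """按估计贡献非均匀抽样 k 个设备参与本轮训练(示意,非 NUFM 原始准则)。"""
    rng = np.random.default_rng(seed)
    p = np.asarray(contributions, dtype=float)
    p = p / p.sum()  # 归一化为抽样概率
    return rng.choice(len(p), size=k, replace=False, p=p)

# 假设的贡献估计(如上一轮局部更新对全局损失下降的贡献),贡献大者更可能被选中
print(select_devices([0.5, 2.0, 1.2, 0.1, 3.3, 0.8, 1.5, 0.2, 2.7, 0.9], k=3))
```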

【2】 AdaGNN: A multi-modal latent representation meta-learner for GNNs based on AdaBoosting 标题:AdaGNN:一种基于AdaBoosting的多模态潜在表示元学习器 链接:https://arxiv.org/abs/2108.06452

作者:Qinyi Zhu,Yiou Xiao 机构:University of California, Berkeley, Berkeley, California, USA, LinkedIn, Sunnyvale, California, USA 摘要:作为深度学习的一个特殊领域,图神经网络(GNNs)专注于提取网络的固有特征,在学术界和工业界都得到了前所未有的普及。大多数最先进的GNN模型提供了表现力强、健壮、可扩展和归纳的解决方案,使社交网络推荐系统具有丰富的网络特征,这些特征在计算上很难通过基于图遍历的方法加以利用。最新的GNN遵循编码器-解码器范式,将高维异构信息从子图编码到低维嵌入空间。然而,单个嵌入空间通常无法捕获图形信号的所有方面。在这项工作中,我们提出了基于boosting的GNNs元学习器,它可以自动学习多个投影和相应的嵌入空间,从而捕获图形信号的不同方面。结果,子图之间的相似性通过在多个嵌入空间上嵌入邻近性来量化。对于具有丰富多样的节点邻域信息的应用程序,AdaGNN表现得异常出色。此外,对于节点级和边缘级任务,AdaGNN与任何归纳式GNN都是兼容的。 摘要:As a special field in deep learning, Graph Neural Networks (GNNs) focus on extracting intrinsic network features and have drawn unprecedented popularity in both academia and industry. Most of the state-of-the-art GNN models offer expressive, robust, scalable and inductive solutions empowering social network recommender systems with rich network features that are computationally difficult to leverage with graph traversal based methods. Most recent GNNs follow an encoder-decoder paradigm to encode high dimensional heterogeneous information from a subgraph onto one low dimensional embedding space. However, one single embedding space usually fails to capture all aspects of graph signals. In this work, we propose boosting-based meta learner for GNNs, which automatically learns multiple projections and the corresponding embedding spaces that captures different aspects of the graph signals. As a result, similarities between sub-graphs are quantified by embedding proximity on multiple embedding spaces. AdaGNN performs exceptionally well for applications with rich and diverse node neighborhood information. Moreover, AdaGNN is compatible with any inductive GNNs for both node-level and edge-level tasks.

医学相关(3篇)

【1】 Task-wise Split Gradient Boosting Trees for Multi-center Diabetes Prediction 标题:基于任务分割梯度提升树的多中心糖尿病预测 链接:https://arxiv.org/abs/2108.07107

作者:Mingcheng Chen,Zhenghui Wang,Zhiyun Zhao,Weinan Zhang,Xiawei Guo,Jian Shen,Yanru Qu,Jieli Lu,Min Xu,Yu Xu,Tiange Wang,Mian Li,Wei-Wei Tu,Yong Yu,Yufang Bi,Weiqing Wang,Guang Ning 机构:Shanghai Jiao Tong University, Shanghai, China ,Ruijin Hospital, SJTU School of Medicine, Shanghai, China ,Paradigm Inc., Beijing, China 备注:11 pages (2 pages of supplementary), 10 figures, 7 tables. Accepted by ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2021) 摘要:糖尿病预测是数据科学在社会医疗领域的重要应用。在糖尿病预测任务中存在两个主要挑战:由于人口统计学和代谢数据的类型不同,导致数据异质性;由于单个医疗中心的糖尿病病例数量通常有限,导致数据不足。为了应对上述挑战,我们采用梯度提升决策树(GBDT)来处理数据异构性,并引入多任务学习(MTL)来解决数据不足的问题。为此,针对多中心糖尿病预测任务,提出了任务分割梯度提升树(TSGB)。具体来说,我们首先引入任务增益,在树形结构中分别评估每个任务,并对GBDT的学习目标进行了理论分析。其次,我们揭示了在MTL中直接应用GBDT的一个问题,即负任务增益问题。最后,我们提出了一种新的基于任务增益统计的MTL中GBDT分割方法,称为任务分割,作为标准特征分割的替代方法,以克服上述负任务增益问题。在大规模真实糖尿病数据集和常用基准数据集上的大量实验表明,TSGB与几种最先进的方法相比具有优异的性能。详细的案例研究进一步支持了我们对负任务增益问题的分析,并提供了深刻的发现。所提出的TSGB方法已被部署为在线糖尿病风险评估软件,用于早期诊断。 摘要:Diabetes prediction is an important data science application in the social healthcare domain. There exist two main challenges in the diabetes prediction task: data heterogeneity since demographic and metabolic data are of different types, data insufficiency since the number of diabetes cases in a single medical center is usually limited. To tackle the above challenges, we employ gradient boosting decision trees (GBDT) to handle data heterogeneity and introduce multi-task learning (MTL) to solve data insufficiency. To this end, Task-wise Split Gradient Boosting Trees (TSGB) is proposed for the multi-center diabetes prediction task. Specifically, we firstly introduce task gain to evaluate each task separately during tree construction, with a theoretical analysis of GBDT's learning objective. Secondly, we reveal a problem when directly applying GBDT in MTL, i.e., the negative task gain problem. Finally, we propose a novel split method for GBDT in MTL based on the task gain statistics, named task-wise split, as an alternative to standard feature-wise split to overcome the mentioned negative task gain problem. Extensive experiments on a large-scale real-world diabetes dataset and a commonly used benchmark dataset demonstrate TSGB achieves superior performance against several state-of-the-art methods. Detailed case studies further support our analysis of negative task gain problems and provide insightful findings. The proposed TSGB method has been deployed as an online diabetes risk assessment software for early diagnosis.

【2】 Dilated Inception U-Net (DIU-Net) for Brain Tumor Segmentation 标题:扩张Inception U-Net(DIU-Net)在脑肿瘤分割中的应用 链接:https://arxiv.org/abs/2108.06772

作者:Daniel E. Cahall,Ghulam Rasool,Nidhal C. Bouaynaya,Hassan M. Fathallah-Shaykh 机构: Department of Electrical and Computer Engineering, Rowan University, University of Alabama at Birmingham 摘要:磁共振成像(MRI)通常用于脑肿瘤诊断、治疗计划和治疗后监测。最近,基于深度神经网络的各种模型被提出用于脑磁共振成像中肿瘤的像素级分割。然而,磁共振成像的结构变化、空间差异和强度不均匀性使得分割成为一项具有挑战性的任务。我们提出了一种新的基于U-Net的端到端脑肿瘤分割架构,该架构将Inception模块和扩张卷积集成到其收缩和扩展路径中。这使我们能够提取局部结构信息以及全局上下文信息。我们使用脑肿瘤分割(BraTS)2018数据集对胶质瘤亚区域进行分割,包括肿瘤核心、增强肿瘤和整个肿瘤。我们提出的模型在肿瘤核心和整个肿瘤分割方面的表现明显优于最先进的基于U-Net的模型($p<0.05$)。 摘要:Magnetic resonance imaging (MRI) is routinely used for brain tumor diagnosis, treatment planning, and post-treatment surveillance. Recently, various models based on deep neural networks have been proposed for the pixel-level segmentation of tumors in brain MRIs. However, the structural variations, spatial dissimilarities, and intensity inhomogeneity in MRIs make segmentation a challenging task. We propose a new end-to-end brain tumor segmentation architecture based on U-Net that integrates Inception modules and dilated convolutions into its contracting and expanding paths. This allows us to extract local structural as well as global contextual information. We performed segmentation of glioma sub-regions, including tumor core, enhancing tumor, and whole tumor using Brain Tumor Segmentation (BraTS) 2018 dataset. Our proposed model performed significantly better than the state-of-the-art U-Net-based model ($p<0.05$) for tumor core and whole tumor segmentation.
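
下面是一个“扩张 Inception”卷积块的 PyTorch 示意(分支数、扩张率与归一化方式为假设的常见配置,并非论文的精确结构),展示如何用不同扩张率的并联分支同时捕获局部结构与全局上下文:

```python
import torch
import torch.nn as nn

class DilatedInceptionBlock(nn.Module):
    """并联多个扩张率的 3x3 卷积分支后在通道维拼接(示意)。"""
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4)):
        super().__init__()
        branch_ch = out_ch // len(dilations)
        self.branches = nn.ModuleList(
            nn.Sequential(
                # padding=d 配合 dilation=d 可保持空间尺寸不变
                nn.Conv2d(in_ch, branch_ch, 3, padding=d, dilation=d),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )

    def forward(self, x):
        # 各分支感受野不同:小扩张率看局部结构,大扩张率看全局上下文
        return torch.cat([b(x) for b in self.branches], dim=1)

x = torch.randn(1, 32, 64, 64)
print(DilatedInceptionBlock(32, 96)(x).shape)  # torch.Size([1, 96, 64, 64])
```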

【3】 Multi-Slice Dense-Sparse Learning for Efficient Liver and Tumor Segmentation 标题:多层密集-稀疏学习在肝脏和肿瘤分割中的应用 链接:https://arxiv.org/abs/2108.06761

作者:Ziyuan Zhao,Zeyu Ma,Yanjie Liu,Zeng Zeng,Pierce KH Chow 机构:National University of Singapore 备注:Accepted in 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE EMBC 2021 摘要:准确的肝脏和肿瘤自动分割在治疗计划和疾病监测中起着至关重要的作用。近年来,深度卷积神经网络(DCNNs)在二维和三维医学图像分割中取得了巨大的成功。然而,2D DCNN不能充分利用片间信息,而3D DCNN计算成本高且内存密集。为了解决这些问题,我们首先从数据的角度提出了一种新的密集稀疏训练流,其中,密集相邻切片和稀疏相邻切片被提取作为正则化DCNN的输入,从而提高模型性能。此外,我们还从网络的角度设计了一个2.5D轻量级nnU-Net,其中采用了深度可分离卷积来提高效率。在LiTS数据集上的大量实验证明了该方法的优越性。 摘要:Accurate automatic liver and tumor segmentation plays a vital role in treatment planning and disease monitoring. Recently, deep convolutional neural network (DCNNs) has obtained tremendous success in 2D and 3D medical image segmentation. However, 2D DCNNs cannot fully leverage the inter-slice information, while 3D DCNNs are computationally expensive and memory intensive. To address these issues, we first propose a novel dense-sparse training flow from a data perspective, in which, densely adjacent slices and sparsely adjacent slices are extracted as inputs for regularizing DCNNs, thereby improving the model performance. Moreover, we design a 2.5D light-weight nnU-Net from a network perspective, in which, depthwise separable convolutions are adopted to improve the efficiency. Extensive experiments on the LiTS dataset have demonstrated the superiority of the proposed method.
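
下面给出深度可分离卷积的最小 PyTorch 示意(这是该轻量化设计的通用写法,具体通道与核配置为假设):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """深度可分离卷积 = 逐通道卷积 + 1x1 逐点卷积(示意)。

    忽略偏置时,参数量约为普通卷积的 1/out_ch + 1/k^2,因而更轻量。
    """
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 16, 64, 64)
print(DepthwiseSeparableConv(16, 32)(x).shape)  # torch.Size([1, 32, 64, 64])
```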

蒸馏|知识提取(1篇)

【1】 Neural-to-Tree Policy Distillation with Policy Improvement Criterion 标题:具有策略改进准则的神经网络到决策树策略蒸馏 链接:https://arxiv.org/abs/2108.06898

作者:Zhao-Hua Li,Yang Yu,Yingfeng Chen,Ke Chen,Zhipeng Hu,Changjie Fan 机构:National Key Laboratory for Novel Software Technology, Nanjing University, China, Polixir Technologies, China, NetEase Fuxi AI Lab, China, Zhejiang University, China 摘要:虽然深度强化学习在具有挑战性的决策任务中取得了很好的效果,但其成功的主要因素——深度神经网络大多是黑匣子。深入了解黑箱模型的一种可行方法是将其蒸馏为一个可解释的模型,如决策树,该模型由if-then规则组成,易于掌握和验证。然而,传统的模型蒸馏通常是平稳数据分布假设下的有监督学习任务,而这一假设在强化学习中并不成立。因此,典型的策略蒸馏即使在克隆模型行为时只产生很小的误差,也可能带来数据分布偏移,导致蒸馏得到的策略模型保真度低或性能不佳。在本文中,我们建议通过将蒸馏目标从行为克隆改为最大化优势评估来解决这个问题。新的蒸馏目标使近似累积报酬最大化,并更多地关注关键状态下的灾难性行为,从而控制数据偏移效应。我们在几个Gym任务、一款商业格斗游戏和一个自动驾驶汽车模拟器上评估了我们的方法。实证结果表明,与行为克隆相比,该方法能保持较高的累积报酬,并能学习到与原始策略更一致的策略。此外,通过检查从蒸馏得到的决策树中提取的规则,我们证明了所提出的方法提供了合理和稳健的决策。 摘要:While deep reinforcement learning has achieved promising results in challenging decision-making tasks, the main bones of its success --- deep neural networks are mostly black-boxes. A feasible way to gain insight into a black-box model is to distill it into an interpretable model such as a decision tree, which consists of if-then rules and is easy to grasp and be verified. However, the traditional model distillation is usually a supervised learning task under a stationary data distribution assumption, which is violated in reinforcement learning. Therefore, a typical policy distillation that clones model behaviors with even a small error could bring a data distribution shift, resulting in an unsatisfied distilled policy model with low fidelity or low performance. In this paper, we propose to address this issue by changing the distillation objective from behavior cloning to maximizing an advantage evaluation. The novel distillation objective maximizes an approximated cumulative reward and focuses more on disastrous behaviors in critical states, which controls the data shift effect. We evaluate our method on several Gym tasks, a commercial fight game, and a self-driving car simulator. The empirical results show that the proposed method can preserve a higher cumulative reward than behavior cloning and learn a more consistent policy to the original one. Moreover, by examining the extracted rules from the distilled decision trees, we demonstrate that the proposed method delivers reasonable and robust decisions.
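
下面用一个加权决策树拟合来示意“以优势评估取代等权重行为克隆”的思路(以优势值作为样本权重是本文示例的简化假设,并非论文的精确蒸馏目标):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def distill_policy(states, teacher_actions, advantages, max_depth=6):
    """用优势值加权拟合决策树(示意的简化目标)。

    优势大的关键状态获得更大样本权重,从而比等权重的行为克隆
    更关注可能导致灾难性后果的状态。
    """
    weights = np.maximum(advantages, 0.0) + 1e-6  # 假设:仅用正优势部分加权
    tree = DecisionTreeClassifier(max_depth=max_depth)
    tree.fit(states, teacher_actions, sample_weight=weights)
    return tree  # 树中的 if-then 规则即可直接检查与验证
```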

聚类(3篇)

【1】 Robust Hierarchical Clustering for Directed Networks: An Axiomatic Approach 标题:有向网络的鲁棒层次聚类:一种公理化方法 链接:https://arxiv.org/abs/2108.07247

作者:Gunnar Carlsson,Facundo Mémoli,Santiago Segarra 摘要:我们采用公理化方法,对有向网络的鲁棒层次聚类方法给出了完整的分类学刻画。我们首先引入与层次聚类中鲁棒性概念相关的三个实用性质:线性尺度保持性、稳定性和可切除性(excisiveness)。线性尺度保持性保证方法不受度量单位变化的影响,而稳定性确保输入网络中的有界扰动只会导致聚类输出中的有界扰动。可切除性指聚类结果的局部一致性。从算法上讲,可切除性意味着我们可以只对数据子集进行聚类以降低计算复杂度,同时在理论上保证对整个数据集聚类会得到相同的层次结果。与这三个性质并行,我们引入了可表示性的概念,这是一种通过指定聚类方法在一族网络上的行为来描述聚类方法的生成模型。我们的主要结果是利用该生成模型精确刻画有向网络上所有鲁棒的——即可切除、线性尺度保持且稳定的——层次聚类方法。我们还讨论了方法的实现,并描述了在真实数据上的一个应用。 摘要:We provide a complete taxonomic characterization of robust hierarchical clustering methods for directed networks following an axiomatic approach. We begin by introducing three practical properties associated with the notion of robustness in hierarchical clustering: linear scale preservation, stability, and excisiveness. Linear scale preservation enforces imperviousness to change in units of measure whereas stability ensures that a bounded perturbation in the input network entails a bounded perturbation in the clustering output. Excisiveness refers to the local consistency of the clustering outcome. Algorithmically, excisiveness implies that we can reduce computational complexity by only clustering a subset of our data while theoretically guaranteeing that the same hierarchical outcome would be observed when clustering the whole dataset. In parallel to these three properties, we introduce the concept of representability, a generative model for describing clustering methods through the specification of their action on a collection of networks. Our main result is to leverage this generative model to give a precise characterization of all robust -- i.e., excisive, linear scale preserving, and stable -- hierarchical clustering methods for directed networks. We also address the implementation of our methods and describe an application to real data.

【2】 Provable Data Clustering via Innovation Search 标题:基于创新搜索的可证明数据聚类 链接:https://arxiv.org/abs/2108.06888

作者:Weiwei Li,Mostafa Rahmani,Ping Li 机构:Cognitive Computing Lab, Baidu Research, Bellevue, WA, USA 摘要:研究了从高维环境空间采集的数据点位于线性子空间并集的子空间聚类问题。当子空间之间交集的维数较大时,子空间聚类变得具有挑战性,而大多数基于自表示的方法对各聚类张成空间之间的交集较为敏感。与基于自表示的方法形成鲜明对比的是,最近提出的一种称为创新追求的聚类方法计算了一组最佳方向(创新方向)来构建邻接矩阵。本文主要研究创新追求(Innovation Pursuit)算法,以解释其在子空间严重相交时令人印象深刻的性能。结果表明,与大多数现有方法要求子空间彼此足够不相干相比,创新追求只要求子空间的创新成分彼此足够不相干。这些新的充分条件允许簇彼此非常接近。在理论分析的推动下,本文提出了一种简单而有效的基于投影的技术,数值和理论结果表明,该技术可以提升创新追求的性能。 摘要:This paper studies the subspace clustering problem in which data points collected from high-dimensional ambient space lie in a union of linear subspaces. Subspace clustering becomes challenging when the dimension of intersection between subspaces is large and most of the self-representation based methods are sensitive to the intersection between the span of clusters. In sharp contrast to the self-representation based methods, a recently proposed clustering method termed Innovation Pursuit, computed a set of optimal directions (directions of innovation) to build the adjacency matrix. This paper focuses on the Innovation Pursuit Algorithm to shed light on its impressive performance when the subspaces are heavily intersected. It is shown that in contrast to most of the existing methods which require the subspaces to be sufficiently incoherent with each other, Innovation Pursuit only requires the innovative components of the subspaces to be sufficiently incoherent with each other. These new sufficient conditions allow the clusters to be strongly close to each other. Motivated by the presented theoretical analysis, a simple yet effective projection based technique is proposed which we show with both numerical and theoretical results that it can boost the performance of Innovation Pursuit.

【3】 Clustering Filipino Disaster-Related Tweets Using Incremental and Density-Based Spatiotemporal Algorithm with Support Vector Machines for Needs Assessment 2 标题:使用增量和基于密度的时空算法和支持向量机对菲律宾灾难相关推文进行聚类以进行需求评估2 链接:https://arxiv.org/abs/2108.06853

作者:Ocean M. Barba,Franz Arvin T. Calbay,Angelica Jane S. Francisco,Angel Luis D. Santos,Charmaine S. Ponay 摘要:社交媒体在人们获取信息和相互交流方面发挥了巨大作用。它帮助人们表达他们在灾难中的需求。由于通过Twitter发布的帖子在默认情况下是可以公开访问的,因此Twitter是灾难发生时最有用的社交媒体网站之一。因此,这项研究旨在评估菲律宾人在灾难期间在推特上表达的需求。收集数据后,使用Naïve Bayes分类器将其分类为与灾害相关或无关。在此之后,使用增量聚类算法对与灾难相关的tweet按灾难类型进行聚类,然后使用基于密度的时空聚类算法根据tweet的位置和时间进行子聚类。最后,使用支持向量机,根据表达的需求对推特进行分类,如避难所、救援、救济、现金、祈祷和其他。研究结果表明,增量聚类算法和基于密度的时空聚类算法能够对tweet进行聚类,f-measure得分分别为47.20%和82.28%。此外,Naïve Bayes和支持向量机能够分别以97%的平均f-度量分数和77.57%的平均准确率进行分类。 摘要:Social media has played a huge part on how people get informed and communicate with one another. It has helped people express their needs due to distress especially during disasters. Because posts made through it are publicly accessible by default, Twitter is among the most helpful social media sites in times of disaster. With this, the study aims to assess the needs expressed during calamities by Filipinos on Twitter. Data were gathered and classified as either disaster-related or unrelated with the use of Naïve Bayes classifier. After this, the disaster-related tweets were clustered per disaster type using Incremental Clustering Algorithm, and then sub-clustered based on the location and time of the tweet using Density-based Spatiotemporal Clustering Algorithm. Lastly, using Support Vector Machines, the tweets were classified according to the expressed need, such as shelter, rescue, relief, cash, prayer, and others. After conducting the study, results showed that the Incremental Clustering Algorithm and Density-Based Spatiotemporal Clustering Algorithm were able to cluster the tweets with f-measure scores of 47.20% and 82.28% respectively. Also, the Naïve Bayes and Support Vector Machines were able to classify with an average f-measure score of 97% and an average accuracy of 77.57% respectively.
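
基于 scikit-learn 可以搭出这条“相关性判别 + 需求分类”流水线的骨架示意(示例推文与标签均为虚构,中间的增量聚类与时空子聚类步骤从略):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# 第一阶段:判别推文是否与灾害相关(示例数据为虚构)
tweets = ["rescue needed in Marikina", "good morning everyone"]
is_related = [1, 0]
relevance_clf = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(tweets, is_related)

# 末级:对灾害相关推文按表达的需求类型分类
need_texts = ["rescue needed in Marikina", "we lost our home, need shelter"]
need_labels = ["rescue", "shelter"]
need_clf = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(need_texts, need_labels)
print(need_clf.predict(["please send rescue boats"]))
```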

自动驾驶|车辆|车道检测等(2篇)

【1】 Vehicle-counting with Automatic Region-of-Interest and Driving-Trajectory detection 标题:具有自动感兴趣区域和行驶轨迹检测的车辆计数 链接:https://arxiv.org/abs/2108.07135

作者:Malolan Vasu,Nelson Abreu,Raysa Vásquez,Christian López 备注:5 pages with 3 figures and 1 table. Presented in ICML 2021 LatinXAI Workshop 摘要:车辆计数系统有助于车辆分析和交通事故检测。不幸的是,大多数现有的方法都需要一定程度的人工输入来识别感兴趣区域(ROI)、感兴趣的运动,或者建立一个参考点或参考线来从交通摄像头中对车辆进行计数。这项工作介绍了一种从交通视频中计数车辆的方法,该方法可以自动识别摄像头的ROI以及车辆的行驶轨迹。这使得该方法适用于在发展中国家经常使用的云台变焦相机。初步结果表明,在测试的交通摄像机视频上,所提出方法的ROI平均交并比(IoU)为57.05%,车辆计数的平均绝对误差仅为17.44%。 摘要:Vehicle counting systems can help with vehicle analysis and traffic incident detection. Unfortunately, most existing methods require some level of human input to identify the Region of interest (ROI), movements of interest, or to establish a reference point or line to count vehicles from traffic cameras. This work introduces a method to count vehicles from traffic videos that automatically identifies the ROI for the camera, as well as the driving trajectories of the vehicles. This makes the method feasible to use with Pan-Tilt-Zoom cameras, which are frequently used in developing countries. Preliminary results indicate that the proposed method achieves an average intersection over the union of 57.05% for the ROI and a mean absolute error of just 17.44% at counting vehicles of the traffic video cameras tested.

【2】 Time Delay Estimation of Traffic Congestion Propagation based on Transfer Entropy 标题:基于传递熵的交通拥堵传播时延估计 链接:https://arxiv.org/abs/2108.06717

作者:YongKyung Oh,JiIn Kwak,JuYoung Lee,Sungil Kim 摘要:考虑到拥堵在不久的将来将如何传播,了解交通拥堵的传播对于GPS导航系统为用户提供更准确的预计到达时间(ETA)至关重要。然而,由于道路之间复杂的传播过程以及过程未来行为的高度不确定性,在拥堵期间提供准确的ETA是一项挑战。最近的研究主要集中在发现频繁的拥塞传播模式和确定传播概率。相比之下,本研究提出了一种新的基于滞后特定转移熵(TE)的道路间交通拥挤传播时延估计方法。在计算TE时,采用滑动窗口的非线性归一化方法有效地揭示了源时间序列和目标时间序列之间的因果关系。此外,采用马尔可夫bootstrap技术对时延估计器中的不确定性进行量化。据我们所知,本文提出的时延估计方法是第一个确定任何拥塞传播模式下道路之间时延的方法。利用模拟数据以及从韩国应用的一个主要GPS导航系统获得的真实用户轨迹数据,对所提出的方法进行了验证。 摘要:Considering how congestion will propagate in the near future, understanding traffic congestion propagation has become crucial in GPS navigation systems for providing users with a more accurate estimated time of arrival (ETA). However, providing the exact ETA during congestion is a challenge owing to the complex propagation process between roads and high uncertainty regarding the future behavior of the process. Recent studies have focused on finding frequent congestion propagation patterns and determining the propagation probabilities. By contrast, this study proposes a novel time delay estimation method for traffic congestion propagation between roads using lag-specific transfer entropy (TE). Nonlinear normalization with a sliding window is used to effectively reveal the causal relationship between the source and target time series in calculating the TE. Moreover, Markov bootstrap techniques were adopted to quantify the uncertainty in the time delay estimator. To the best of our knowledge, the time delay estimation method presented in this article is the first to determine the time delay between roads for any congestion propagation pattern. The proposed method was validated using simulated data as well as real user trajectory data obtained from a major GPS navigation system applied in South Korea.
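
滞后特定传递熵可以用离散化直方图做一个朴素估计,示意如下(分箱数为假设,论文中的滑动窗口非线性归一化与马尔可夫 bootstrap 不确定性量化从略):

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y, lag, bins=4):
    """TE_{X->Y}(lag) 的直方图估计:度量 x_{t-lag} 对 y_{t+1} 的额外预测信息。"""
    def discretize(s):
        edges = np.quantile(s, np.linspace(0, 1, bins + 1)[1:-1])
        return np.digitize(s, edges)
    xd, yd = discretize(np.asarray(x)), discretize(np.asarray(y))
    # 三元组 (y_{t+1}, y_t, x_{t-lag})
    triples = list(zip(yd[lag + 1:], yd[lag:-1], xd[:-(lag + 1)]))
    n = len(triples)
    c_abc = Counter(triples)
    c_bc = Counter((b, c) for _, b, c in triples)
    c_ab = Counter((a, b) for a, b, _ in triples)
    c_b = Counter(b for _, b, _ in triples)
    # TE = sum p(a,b,c) * log[ p(a|b,c) / p(a|b) ]
    return sum(
        (n_abc / n) * np.log((n_abc / c_bc[(b, c)]) / (c_ab[(a, b)] / c_b[b]))
        for (a, b, c), n_abc in c_abc.items()
    )
```

对一组候选滞后逐一计算 TE,取峰值对应的滞后即可作为拥堵在两条道路间传播的时延估计。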

联邦学习|隐私保护|加密(3篇)

【1】 Aegis: A Trusted, Automatic and Accurate Verification Framework for Vertical Federated Learning 标题:Aegis:一种可信、自动、准确的垂直联合学习验证框架 链接:https://arxiv.org/abs/2108.06958

作者:Cengguang Zhang,Junxue Zhang,Di Chai,Kai Chen 机构:SING Lab, Hong Kong University of Science and Technology, Clustar 备注:7 pages, International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with IJCAI 2021 (FL-IJCAI'21) 摘要:垂直联合学习(VFL)利用各种隐私保护算法,例如同态加密或基于秘密共享的SecureBoost,以确保数据隐私。然而,这些算法都需要一个半诚实的安全定义,这在实际应用中引起了关注。在本文中,我们提出了Aegis,一个可信的、自动的、准确的验证框架来验证VFL作业的安全性。宙斯盾与当地各方分离,以确保框架的安全。此外,它通过将VFL作业定义为有限状态机来统一验证不同的算法,并再现整个作业以提供更精确的验证,从而自动适应不断发展的VFL算法。我们在金融和医疗数据集上使用不同的威胁模型实施和评估宙斯盾。评估结果表明:1)Aegis可以检测95%的威胁模型,2)它在总VFL作业时间的84%内提供细粒度验证结果。 摘要:Vertical federated learning (VFL) leverages various privacy-preserving algorithms, e.g., homomorphic encryption or secret sharing based SecureBoost, to ensure data privacy. However, these algorithms all require a semi-honest secure definition, which raises concerns in real-world applications. In this paper, we present Aegis, a trusted, automatic, and accurate verification framework to verify the security of VFL jobs. Aegis is separated from local parties to ensure the security of the framework. Furthermore, it automatically adapts to evolving VFL algorithms by defining the VFL job as a finite state machine to uniformly verify different algorithms and reproduce the entire job to provide more accurate verification. We implement and evaluate Aegis with different threat models on financial and medical datasets. Evaluation results show that: 1) Aegis can detect 95% threat models, and 2) it provides fine-grained verification results within 84% of the total VFL job time.

【2】 Blockchain-based Trustworthy Federated Learning Architecture 标题:基于区块链的可信联邦学习体系结构 链接:https://arxiv.org/abs/2108.06912

作者:Sin Kit Lo,Yue Liu,Qinghua Lu,Chen Wang,Xiwei Xu,Hye-Young Paik,Liming Zhu 机构:∗Data, CSIRO, Sydney, Australia, †School of Computer Science and Engineering, UNSW, Sydney, Australia 摘要:联合学习(Federated learning)是一种新兴的隐私保护人工智能技术,客户(即组织或设备)在本地训练模型,并基于本地模型更新制定全局模型,而无需向外部传输本地数据。然而,联邦学习系统很难实现可信度并体现负责任的人工智能原则。特别是,由于多方利益相关者的参与和客户端数据分发的异构性,联邦学习系统面临着责任和公平性方面的挑战。为了增强联邦学习系统的可问责性和公平性,我们提出了一种基于区块链的可信联邦学习体系结构。我们首先设计了一个基于智能合约的数据模型出处注册中心,以实现可问责性。此外,我们还提出了一种加权公平数据采样器算法来增强训练数据的公平性。我们使用COVID-19 X射线检测用例来评估所提出的方法。评估结果表明,该方法在实现问责制和提高公平性方面是可行的。在模型的泛化性和准确性方面,该算法比默认的联邦学习设置具有更好的性能。 摘要:Federated learning is an emerging privacy-preserving AI technique where clients (i.e., organisations or devices) train models locally and formulate a global model based on the local model updates without transferring local data externally. However, federated learning systems struggle to achieve trustworthiness and embody responsible AI principles. In particular, federated learning systems face accountability and fairness challenges due to multi-stakeholder involvement and heterogeneity in client data distribution. To enhance the accountability and fairness of federated learning systems, we present a blockchain-based trustworthy federated learning architecture. We first design a smart contract-based data-model provenance registry to enable accountability. Additionally, we propose a weighted fair data sampler algorithm to enhance fairness in training data. We evaluate the proposed approach using a COVID-19 X-ray detection use case. The evaluation results show that the approach is feasible to enable accountability and improve fairness. The proposed algorithm can achieve better performance than the default federated learning setting in terms of the model's generalisation and accuracy.

【3】 Reducing the Communication Cost of Federated Learning through Multistage Optimization 标题:通过多阶段优化降低联邦学习的通信成本 链接:https://arxiv.org/abs/2108.06869

作者:Charlie Hou,Kiran K. Thekumparampil,Giulia Fanti,Sewoong Oh 机构: 1Department of Electrical and Computer Engineering, Carnegie Mellon University, USA 2Department of Electrical and Computer Engineering, USA 3Allen School of Computer Science and Engineer-ing, University of Washington 摘要:联邦学习(FL)中的一个核心问题是如何设计优化算法,以最大限度地降低在分布于多个客户端的异构数据上训练模型的通信成本。减少通信的一种流行技术是使用本地步骤,其中客户端在与服务器通信之前对本地数据执行多个优化步骤(例如,FedAvg、SCAFFOLD)。这与集中式方法形成了对比,在集中式方法中,客户机在每一轮通信中采取一个优化步骤(例如,Minibatch SGD)。最近关于一阶方法通信复杂度的下限表明,集中式方法在高度异构数据上是最优的,而局部方法在纯同质数据上是最优的[Woodworth et al.,2020]。对于中等异质性水平,没有已知的算法匹配下限。在本文中,我们提出了一个多级优化方案,几乎匹配所有异质性水平的下限。其思想是首先运行一个局部方法,直到异质性引起的错误下限;接下来,我们切换到一个集中的方法来完成剩下的步骤。我们的分析可能有助于解释FL中经验上成功的步长衰减方法[Charles et al.,2020;Reddi等人,2020年]。我们证明了该方案在图像分类任务中的实用性。 摘要:A central question in federated learning (FL) is how to design optimization algorithms that minimize the communication cost of training a model over heterogeneous data distributed across many clients. A popular technique for reducing communication is the use of local steps, where clients take multiple optimization steps over local data before communicating with the server (e.g., FedAvg, SCAFFOLD). This contrasts with centralized methods, where clients take one optimization step per communication round (e.g., Minibatch SGD). A recent lower bound on the communication complexity of first-order methods shows that centralized methods are optimal over highly-heterogeneous data, whereas local methods are optimal over purely homogeneous data [Woodworth et al., 2020]. For intermediate heterogeneity levels, no algorithm is known to match the lower bound. In this paper, we propose a multistage optimization scheme that nearly matches the lower bound across all heterogeneity levels. The idea is to first run a local method up to a heterogeneity-induced error floor; next, we switch to a centralized method for the remaining steps. Our analysis may help explain empirically-successful stepsize decay methods in FL [Charles et al., 2020; Reddi et al., 2020]. We demonstrate the scheme's practical utility in image classification tasks.
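
该两阶段方案的控制逻辑可以示意如下(误差估计回调与下限取值均为本文示例假设的接口,并非论文伪代码):

```python
def multistage_fl(model, rounds, local_round, centralized_round,
                  est_error, error_floor):
    """两阶段联邦训练示意:先运行本地多步方法(如 FedAvg)直至
    触及异质性导致的误差下限,再切换为集中式方法(如 Minibatch SGD)。"""
    switched = False
    for _ in range(rounds):
        if not switched and est_error(model) <= error_floor:
            switched = True  # 进入集中式阶段,完成剩余轮次
        model = centralized_round(model) if switched else local_round(model)
    return model
```

这一“先快后稳”的切换在效果上与实践中常见的步长衰减有相似之处,正如摘要所指出的那样。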

推理|分析|理解|解释(8篇)

【1】 Efficient Feature Representations for Cricket Data Analysis using Deep Learning based Multi-Modal Fusion Model 标题:基于深度学习的多模态融合模型在板球数据分析中的高效特征表示 链接:https://arxiv.org/abs/2108.07139

作者:Souridas Alaka,Rishikesh Sreekumar,Hrithwik Shalu 机构:Indian Institute of Technology Madras, India 摘要:数据分析已成为现代板球运动的必要条件。从有效的团队管理到比赛获胜预测,一切都使用某种形式的分析。有效分析数据需要有意义的数据表示。在本研究中,我们研究了自适应(可学习)嵌入的使用,以表示相互关联的特征(如球员、团队等)。本研究使用的数据来自经典的T20锦标赛IPL(印度超级联赛)。为了自然地促进学习有意义的特征表示以进行准确的数据分析,我们制定了一个深度表示学习框架,该框架通过最小化对比损失来共同学习一组自定义嵌入(代表我们感兴趣的特征)。我们的学习目标基于对一局比赛总得分率(run rate)进行层次聚类得到的一组类别。据评估,该框架可确保获得的嵌入具有更大的通用性,在此基础上,对总得分率预测进行了基于任务的分析,以显示该框架的可靠性。 摘要:Data analysis has become a necessity in the modern era of cricket. Everything from effective team management to match win predictions use some form of analytics. Meaningful data representations are necessary for efficient analysis of data. In this study we investigate the use of adaptive (learnable) embeddings to represent inter-related features (such as players, teams, etc). The data used for this study is collected from a classical T20 tournament IPL (Indian Premier League). To naturally facilitate the learning of meaningful representations of features for accurate data analysis, we formulate a deep representation learning framework which jointly learns a custom set of embeddings (which represents our features of interest) through the minimization of a contrastive loss. We base our objective on a set of classes obtained as a result of hierarchical clustering on the overall run rate of an innings. It's been assessed that the framework ensures greater generality in the obtained embeddings, on top of which a task based analysis of overall run rate prediction was done to show the reliability of the framework.

【2】 AIREX: Neural Network-based Approach for Air Quality Inference in Unmonitored Cities 标题:AIREX:基于神经网络的非监测城市空气质量推断方法 链接:https://arxiv.org/abs/2108.07120

作者:Yuya Sasaki,Kei Harada,Shohei Yamasaki,Makoto Onizuka 机构:Osaka university 摘要:城市空气污染是影响人类健康和生活质量的主要环境问题。已建立监测站,以不断获取空气质量信息,但监测站并不覆盖所有地区。因此,有许多方法用于空间细粒度空气质量推断。由于现有方法的目的是仅推断受监测城市中各地点的空气质量,因此它们不假设推断未受监测城市的空气质量。在本文中,我们首先研究了无监测城市的空气质量推断。为了准确推断未受监测城市的空气质量,我们提出了一种基于神经网络的AIREX方法。AIREX的创新之处在于采用了一种混合专家方法,这是一种基于分治原理的机器学习技术,用于学习多个城市之间空气质量的相关性。为了进一步提高性能,它采用注意机制来计算从受监测城市到未受监测城市位置的空气质量推断的影响。我们通过对真实空气质量数据集的实验表明,AIREX比最先进的方法具有更高的精度。 摘要:Urban air pollution is a major environmental problem affecting human health and quality of life. Monitoring stations have been established to continuously obtain air quality information, but they do not cover all areas. Thus, there are numerous methods for spatially fine-grained air quality inference. Since existing methods aim to infer air quality of locations only in monitored cities, they do not assume inferring air quality in unmonitored cities. In this paper, we first study the air quality inference in unmonitored cities. To accurately infer air quality in unmonitored cities, we propose a neural network-based approach AIREX. The novelty of AIREX is employing a mixture-of-experts approach, which is a machine learning technique based on the divide-and-conquer principle, to learn correlations of air quality between multiple cities. To further boost the performance, it employs attention mechanisms to compute impacts of air quality inference from the monitored cities to the locations in the unmonitored city. We show, through experiments on a real-world air quality dataset, that AIREX achieves higher accuracy than state-of-the-art methods.

【3】 A complex network approach to time series analysis with application in diagnosis of neuromuscular disorders 标题:时间序列分析的复杂网络方法及其在神经肌肉疾病诊断中的应用 链接:https://arxiv.org/abs/2108.06920

作者:Samaneh Samiei,Nasser Ghadiri,Behnaz Ansari 摘要:肌电图(EMG)是指指示神经肌肉活动和肌肉形态的生物医学信号。专家们利用这个时间序列准确地诊断神经肌肉疾病。现代数据分析技术最近引入了将时间序列数据映射到图形和复杂网络的新方法,在包括医学在内的各个领域都有应用。由此产生的网络提供了一种全然不同的视角,可以用来补充医生对时间序列的发现。这可以导致更丰富的分析,减少误差,更准确地诊断疾病,并提高治疗过程的准确性和速度。映射过程可能会导致时间序列的基本数据丢失,并且无法保留所有时间序列特征。因此,实现一种既能很好地表示时间序列又能保持基本特征的方法至关重要。本文提出了一种新的网络开发方法GraphTS,以克服现有方法的有限精度,该方法通过可见性图方法处理EMG时间序列。为此,肌电信号通过标准的可见性图算法进行预处理并映射到复杂网络。由此产生的网络可以区分健康和患者样本。在下一步中,在提取最优特征后,所开发网络的属性以特征矩阵的形式作为分类器的输入。使用深度神经网络对所提出的方法进行性能评估,结果表明,训练数据的准确率为99.30%,测试数据的准确率为99.18%。因此,除了丰富网络表示并涵盖健康、肌病和神经病变肌电图的时间序列特征外,所提出的技术还提高了准确性、精确性、召回率和F评分。 摘要:Electromyography (EMG) refers to a biomedical signal indicating neuromuscular activity and muscle morphology. Experts accurately diagnose neuromuscular disorders using this time series. Modern data analysis techniques have recently led to introducing novel approaches for mapping time series data to graphs and complex networks with applications in diverse fields, including medicine. The resulting networks develop a completely different visual acuity that can be used to complement physician findings of time series. This can lead to a more enriched analysis, reduced error, more accurate diagnosis of the disease, and increased accuracy and speed of the treatment process. The mapping process may cause the loss of essential data from the time series and not retain all the time series features. As a result, achieving an approach that can provide a good representation of the time series while maintaining essential features is crucial. This paper proposes a new approach to network development named GraphTS to overcome the limited accuracy of existing methods through EMG time series using the visibility graph method. For this purpose, EMG signals are pre-processed and mapped to a complex network by a standard visibility graph algorithm. The resulting networks can differentiate between healthy and patient samples. In the next step, the properties of the developed networks are given in the form of a feature matrix as input to classifiers after extracting optimal features. Performance evaluation of the proposed approach with deep neural network shows 99.30% accuracy for training data and 99.18% for test data. Therefore, in addition to enriched network representation and covering the features of time series for healthy, myopathy, and neuropathy EMG, the proposed technique improves accuracy, precision, recall, and F-score.
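
把时间序列映射为复杂网络的标准(自然)可见性图算法本身很简单:两点之间若连线高于其间所有采样点则连边。O(n²) 的朴素实现示意如下(论文中针对 EMG 的预处理与特征提取从略):

```python
import numpy as np

def visibility_graph(series):
    """标准可见性图:返回节点对 (i, j) 的边集合(示意实现)。"""
    y = np.asarray(series, dtype=float)
    n = len(y)
    edges = set()
    for i in range(n - 1):
        for j in range(i + 1, n):
            # (i, j) 可见 当且仅当 所有 i<k<j 的 y_k 低于两端点连线
            visible = all(
                y[k] < y[i] + (y[j] - y[i]) * (k - i) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.add((i, j))
    return edges

print(visibility_graph([3.0, 1.0, 2.5, 0.5, 4.0]))
```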

【4】 Locally Interpretable Model Agnostic Explanations using Gaussian Processes 标题:基于高斯过程的局部可解释模型不可知性解释 链接:https://arxiv.org/abs/2108.06907

作者:Aditya Saini,Ranjitha Prasad 机构:Indraprastha Institute of Information Technology Delhi, New Delhi 摘要:由于数据密集型领域性能的巨大改进,机器学习(ML)在研究界引起了极大的兴趣。然而,这些ML模型被证明是黑匣子,很难解释,导致生产力的直接下降。局部可解释模型不可知解释(LIME)是解释单个实例预测的常用技术。尽管LIME简单且用途广泛,但它在生成的解释中存在不稳定性。在本文中,我们提出了一种基于高斯过程(GP)的局部可解释模型变体。我们采用了一种基于贝叶斯优化中采集函数的智能采样策略。此外,我们在GP中使用了基于自动相关确定的协方差函数,每个特征具有单独的长度尺度参数,其中长度尺度参数的倒数用作特征解释。我们在两个真实数据集上演示了该技术的性能,并展示了该技术优越的稳定性。此外,我们证明,与LIME相比,所提出的技术能够使用更少的样本生成可靠的解释。 摘要:Owing to tremendous performance improvements in data-intensive domains, machine learning (ML) has garnered immense interest in the research community. However, these ML models turn out to be black boxes, which are tough to interpret, resulting in a direct decrease in productivity. Local Interpretable Model-Agnostic Explanations (LIME) is a popular technique for explaining the prediction of a single instance. Although LIME is simple and versatile, it suffers from instability in the generated explanations. In this paper, we propose a Gaussian Process (GP) based variation of locally interpretable models. We employ a smart sampling strategy based on the acquisition functions in Bayesian optimization. Further, we employ the automatic relevance determination based covariance function in GP, with separate length-scale parameters for each feature, where the reciprocal of lengthscale parameters serve as feature explanations. We illustrate the performance of the proposed technique on two real-world datasets, and demonstrate the superior stability of the proposed technique. Furthermore, we demonstrate that the proposed technique is able to generate faithful explanations using much fewer samples as compared to LIME.
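
用 scikit-learn 可以示意其中“ARD 核的长度尺度倒数作为特征解释”这一环节(局部代理建模与基于采集函数的采样从略,数据为虚构):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3 * X[:, 0] + 0.1 * X[:, 2] + rng.normal(scale=0.1, size=200)  # 特征 1 基本无关

# ARD:各向异性 RBF,每个特征一个独立长度尺度;拟合后无关特征的长度尺度会变大
gp = GaussianProcessRegressor(kernel=RBF(length_scale=[1.0, 1.0, 1.0]), alpha=1e-2)
gp.fit(X, y)
print(1.0 / gp.kernel_.length_scale)  # 长度尺度的倒数即作为各特征的解释
```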

【5】 Towards Understanding Theoretical Advantages of Complex-Reaction Networks 标题:认识复杂反应网络的理论优势 链接:https://arxiv.org/abs/2108.06711

作者:Shao-Qun Zhang,Gao Wei,Zhi-Hua Zhou 机构:National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China 摘要:近年来,复值神经网络受到了越来越多的关注,但与实值网络相比,复值神经网络的优势仍然是一个未知数。这项工作通过引入具有全连接前馈结构的“复杂反应网络”(complex-reaction network)朝着这个方向迈出了一步。我们证明了复杂反应网络的普适逼近性,并证明了一类径向函数可以用多项式参数数由复杂反应网络逼近,而实值网络至少需要指数参数才能达到相同的逼近水平。对于经验风险最小化,我们的理论结果表明,复杂反应网络的临界点集是实值网络临界点集的一个适当子集,这可能为更容易找到复杂反应网络的最优解提供一些见解。 摘要:Complex-valued neural networks have attracted increasing attention in recent years, while it remains open on the advantages of complex-valued neural networks in comparison with real-valued networks. This work takes one step on this direction by introducing the complex-reaction network with fully-connected feed-forward architecture. We prove the universal approximation property for complex-reaction networks, and show that a class of radial functions can be approximated by a complex-reaction network using the polynomial number of parameters, whereas real-valued networks need at least exponential parameters to reach the same approximation level. For empirical risk minimization, our theoretical result shows that the critical point set of complex-reaction networks is a proper subset of that of real-valued networks, which may show some insights on finding the optimal solutions more easily for complex-reaction networks.

【6】 Prediction Analysis of Optical Tracker Parameters using Machine Learning Approaches for efficient Head Tracking 标题:基于机器学习的高效头部跟踪光学跟踪器参数预测分析 链接:https://arxiv.org/abs/2108.06606

作者:Aman Kataria,Smarajit Ghosh,Vinod Karar 机构: Department of Electrical and Instrumentation Engineering, Thapar University, Patiala-, Punjab, India, Department of Optical Devices and Systems, CSIR-Central Scientific Instruments Organization, Chandigarh 备注:None 摘要:头部跟踪器是头戴式显示系统的关键部分,因为它在飞机/驾驶舱模拟器中跟踪飞行员的头部。头部跟踪器的操作缺陷也取决于不同的环境条件,如不同的照明条件和杂散光干扰。在这封信中,一个光学跟踪器被用来收集不同环境条件下头部运动的6自由度数据。此外,还分析了不同环境条件以及接收器和光发射器之间距离的变化对6自由度数据的影响。 摘要:A head tracker is a crucial part of the head mounted display systems, as it tracks the head of the pilot in the plane/cockpit simulator. The operational flaws of head trackers are also dependent on different environmental conditions like different lighting conditions and stray light interference. In this letter, an optical tracker has been employed to gather the 6-DoF data of head movements under different environmental conditions. Also, the effect of different environmental conditions and variation in distance between the receiver and optical transmitter on the 6-DoF data was analyzed.

【7】 Variational Inference at Glacier Scale 标题:冰川尺度的变分推断 链接:https://arxiv.org/abs/2108.07263

作者:Douglas J. Brinkerhoff 机构:Department of Computer Science, University of Montana, Missoula, MT 摘要:我们利用随机变分推理结合自然梯度下降来寻找一个近似的变分分布,通过对表面速度的观测,刻画了冰盖模型在空间变化的基础牵引力和冰软度参数上的完整联合后验分布。通过将高斯过程置于参数之上,并将问题转化为核的本征函数,我们对参数平滑度和长度尺度的先验假设进行了实质性的控制,同时也使推理易于处理。在一个合成的例子中,我们发现该方法恢复了已知的参数并解释了相互不确定性,这两者都会影响观测到的表面速度。在格陵兰岛东南部海尔海姆冰川的应用中,我们表明我们的方法适用于冰川大小的问题。我们发现,无论观测噪声模型的选择如何,慢流区域的后验不确定性都很高。 摘要:We characterize the complete joint posterior distribution over spatially-varying basal traction and ice softness parameters of an ice sheet model from observations of surface speed by using stochastic variational inference combined with natural gradient descent to find an approximating variational distribution. By placing a Gaussian process prior over the parameters and casting the problem in terms of eigenfunctions of a kernel, we gain substantial control over prior assumptions on parameter smoothness and length scale, while also rendering the inference tractable. In a synthetic example, we find that this method recovers known parameters and accounts for mutual indeterminacy, both of which can influence observed surface speed. In an application to Helheim Glacier in Southeast Greenland, we show that our method scales to glacier-sized problems. We find that posterior uncertainty in regions of slow flow is high regardless of the choice of observational noise model.

【8】 Equity-Directed Bootstrapping: Examples and Analysis 标题:公平导向自举:实例与分析 链接:https://arxiv.org/abs/2108.06624

作者:Harish S. Bhat,Majerle E. Reeves,Sidra Goldman-Mellor 备注:17 pages 摘要:当面临严重不平衡的二元分类问题时,我们通常在自举数据上训练模型,使每个类的实例数以更有利的比率(例如 1:1)出现。我们通过不平衡分类的视角来看待算法不公平性:为了平衡分类器在不同群体间的性能,我们可以通过自举构造在标签和群体身份两方面都平衡的训练集。以一个类别严重不平衡的问题——根据行政医疗记录预测自杀死亡——为例,我们说明了公平导向自举如何使测试集的敏感性和特异性更接近满足等几率(equal odds)准则。在Naïve Bayes和logistic回归的背景下,我们分析了公平导向自举,证明其通过使优势比接近1而起作用,并将其与涉及截距调整、阈值化和加权的方法联系起来。 摘要:When faced with severely imbalanced binary classification problems, we often train models on bootstrapped data in which the number of instances of each class occur in a more favorable ratio, e.g., one. We view algorithmic inequity through the lens of imbalanced classification: in order to balance the performance of a classifier across groups, we can bootstrap to achieve training sets that are balanced with respect to both labels and group identity. For an example problem with severe class imbalance---prediction of suicide death from administrative patient records---we illustrate how an equity-directed bootstrap can bring test set sensitivities and specificities much closer to satisfying the equal odds criterion. In the context of naïve Bayes and logistic regression, we analyze the equity-directed bootstrap, demonstrating that it works by bringing odds ratios close to one, and linking it to methods involving intercept adjustment, thresholding, and weighting.
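
对(标签,群体)的每个组合做有放回的等量重采样,即可得到同时在标签与群体身份上平衡的训练集,示意如下(函数接口与列名均为本文示例的假设):

```python
import numpy as np
import pandas as pd

def equity_bootstrap(df, label_col, group_col, n_per_cell, seed=0):
    """对 (label, group) 的每个组合有放回抽取等量样本(示意实现)。"""
    rng = np.random.default_rng(seed)
    parts = []
    for _, cell in df.groupby([label_col, group_col]):
        idx = rng.choice(cell.index, size=n_per_cell, replace=True)
        parts.append(df.loc[idx])
    return pd.concat(parts, ignore_index=True)

# 用法示意:balanced = equity_bootstrap(train_df, "label", "group", n_per_cell=1000)
```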

检测相关(5篇)

【1】 Detecting and interpreting faults in vulnerable power grids with machine learning 标题:基于机器学习的脆弱电网故障检测与解释 链接:https://arxiv.org/abs/2108.07060

作者:Odin Foldvik Eikeland,Inga Setså Holmstrand,Sigurd Bakkejord,Matteo Chiesa,Filippo Maria Bianchi 机构:Department of Physics and Technology, UiT-the Arctic University of Norway, Arva Power Company, Department of Mathematics and Statistics, NORCE Norwegian Research Centre AS 摘要:计划外的电力干扰会对客户和电网运营商造成严重后果。为了防范此类事件,有必要确定配电网中断的原因。在这项工作中,我们将重点放在北极挪威社区的电网上,该社区经历了几次来源不明的故障。首先,我们构建了一个由相关气象数据和电能质量计记录的当前电能质量信息组成的数据集。然后,我们采用机器学习技术来预测故障的发生。实验结果表明,线性和非线性分类器均能获得良好的分类性能。这表明所考虑的电能质量和天气变量很好地解释了电力扰动。解释分类器的决策过程为理解干扰的主要原因提供了有价值的见解。传统的特征选择方法只能指出平均而言,哪些变量主要解释数据集中的故障发生。除了提供这种全局解释外,还必须确定解释每个单独故障的特定变量集。为了应对这一挑战,我们采用了一种最新的技术来解释深度学习模型的决策过程,称为积分梯度(Integrated Gradients)。所提出的方法允许获得特定故障发生的详细信息,这对于配电系统运营商实施预防和缓解电力干扰的策略非常有价值。 摘要:Unscheduled power disturbances cause severe consequences both for customers and grid operators. To defend against such events, it is necessary to identify the causes of interruptions in the power distribution network. In this work, we focus on the power grid of a Norwegian community in the Arctic that experiences several faults whose sources are unknown. First, we construct a data set consisting of relevant meteorological data and information about the current power quality logged by power-quality meters. Then, we adopt machine-learning techniques to predict the occurrence of faults. Experimental results show that both linear and non-linear classifiers achieve good classification performance. This indicates that the considered power-quality and weather variables explain well the power disturbances. Interpreting the decision process of the classifiers provides valuable insights to understand the main causes of disturbances. Traditional features selection methods can only indicate which are the variables that, on average, mostly explain the fault occurrences in the dataset. Besides providing such a global interpretation, it is also important to identify the specific set of variables that explain each individual fault. To address this challenge, we adopt a recent technique to interpret the decision process of a deep learning model, called Integrated Gradients. The proposed approach allows to gain detailed insights on the occurrence of a specific fault, which are valuable for the distribution system operators to implement strategies to prevent and mitigate power disturbances.
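
积分梯度的定义本身很紧凑:沿从基线到输入的直线路径对梯度取平均,再乘以输入与基线之差。一个 PyTorch 示意如下(模型与基线的选取为假设,实际使用也可直接调用现成库如 Captum):

```python
import torch

def integrated_gradients(model, x, baseline=None, steps=50):
    """积分梯度示意:路径平均梯度 × (x - baseline)。"""
    baseline = torch.zeros_like(x) if baseline is None else baseline
    grads = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        xi = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
        out = model(xi).sum()  # 假设关注的输出为标量(如某故障类别的 logit)
        grads.append(torch.autograd.grad(out, xi)[0])
    return (x - baseline) * torch.stack(grads).mean(dim=0)
```

返回值中每个输入变量的分量即该变量对此次故障预测的归因强度。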

【2】 Task-Sensitive Concept Drift Detector with Metric Learning 标题:基于度量学习的任务敏感型概念漂移检测器 链接:https://arxiv.org/abs/2108.06980

作者:Andrea Castellani,Sebastian Schmitt,Barbara Hammer 机构:CITEC, Bielefeld University, Bielefeld, Germany, Honda Research Institute Europe GmbH, Offenbach, Germany 备注:Preprint. Submitted at SSCI 2021 摘要:检测数据中的漂移对于机器学习应用至关重要,因为处理数据的统计数据的变化通常会对训练模型的性能产生深远的影响。大多数可用的漂移检测方法都需要在推断期间访问真实标签。在真实场景中,真实标签通常仅在模型训练期间可用。在这项工作中,我们提出了一种新的任务敏感漂移检测框架,该框架能够在推理过程中检测漂移,而无需访问真实标签。它利用输入数据的约束低维嵌入表示的度量学习,这最适合于分类任务。它能够检测到真实漂移,漂移影响分类性能,而正确忽略虚拟漂移,分类性能不受漂移影响。在提出的框架中,可以自由选择检测传入数据样本统计数据变化的实际方法。我们还提出了两种变化检测方法,分别基于指数移动平均和修正的$z$-分数。我们使用一种新的度量来评估该框架的性能,该度量将检测准确率、误报率和检测延迟的标准度量累积为一个值。对九个基准数据集的实验评估表明,该框架能够可靠地检测漂移,并且优于最先进的无监督漂移检测方法。 摘要:Detecting drifts in data is essential for machine learning applications, as changes in the statistics of processed data typically has a profound influence on the performance of trained models. Most of the available drift detection methods require access to true labels during inference time. In a real-world scenario, true labels usually available only during model training. In this work, we propose a novel task-sensitive drift detection framework, which is able to detect drifts without access to true labels during inference. It utilizes metric learning of a constrained low-dimensional embedding representation of the input data, which is best suited for the classification task. It is able to detect real drift, where the drift affects the classification performance, while it properly ignores virtual drift, where the classification performance is not affected by the drift. In the proposed framework, the actual method to detect a change in the statistics of incoming data samples can be chosen freely. We also propose the two change detection methods, which are based on the exponential moving average and a modified $z$-score, respectively. We evaluate the performance of the proposed framework with a novel metric, which accumulates the standard metrics of detection accuracy, false positive rate and detection delay into one value. Experimental evaluation on nine benchmarks datasets, with different types of drift, demonstrates that the proposed framework can reliably detect drifts, and outperforms state-of-the-art unsupervised drift detection approaches.
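
其中基于指数移动平均的变化检测器可以示意如下(统计量的具体定义、阈值与方差更新方式均为本文示例假设的简化):

```python
class EMADriftDetector:
    """基于指数移动平均与简化"修正 z 分数"的变化检测示意。"""
    def __init__(self, alpha=0.05, threshold=3.0):
        self.alpha, self.threshold = alpha, threshold
        self.mean, self.var = None, 1.0

    def update(self, score):
        """score: 当前样本的监控统计量(如其在嵌入空间中到类原型的距离)。"""
        if self.mean is None:
            self.mean = score
            return False
        z = (score - self.mean) / (self.var ** 0.5 + 1e-8)
        # 指数移动平均更新均值与方差
        self.mean = (1 - self.alpha) * self.mean + self.alpha * score
        self.var = (1 - self.alpha) * self.var + self.alpha * (score - self.mean) ** 2
        return abs(z) > self.threshold  # True 表示检测到漂移
```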

【3】 Maps Search Misspelling Detection Leveraging Domain-Augmented Contextual Representations 标题:利用域增强上下文表示的地图搜索拼写错误检测 链接:https://arxiv.org/abs/2108.06842

作者:Yutong Li 机构:Apple Inc., Cupertino, CA, USA 摘要:构建一个独立的拼写错误检测器并在纠正之前提供它可以为拼写器和其他搜索组件带来多方面的好处,这对于最常用的基于通道的噪声拼写器系统尤其如此。随着深度学习的快速发展和诸如BERTology等上下文表示学习的实质性进步,构建一个像样的拼写错误检测器而不必依赖于与噪声通道结构相关的手工特征变得比以往任何时候都更容易实现。然而,BERTology模型是在自然语言语料库上训练的,而地图搜索高度领域特定,BERTology能否延续其成功?在本文中,我们设计了从最基本的LSTM到单域增强微调BERT的4个阶段的拼写错误检测模型。我们发现对于Maps搜索,在我们的案例中,其他高级BERTology家族模型(如RoBERTa)并不一定优于BERT,而经典的跨域微调全BERT甚至不如较小的单域微调BERT。通过全面的建模实验和分析,我们分享了更多的发现,我们还简要介绍了数据生成算法的突破。 摘要:Building an independent misspelling detector and serve it before correction can bring multiple benefits to speller and other search components, which is particularly true for the most commonly deployed noisy-channel based speller systems. With rapid development of deep learning and substantial advancement in contextual representation learning such as BERTology, building a decent misspelling detector without having to rely on hand-crafted features associated with noisy-channel architecture becomes more-than-ever accessible. However BERTology models are trained with natural language corpus but Maps Search is highly domain specific, would BERTology continue its success. In this paper we design 4 stages of models for misspelling detection ranging from the most basic LSTM to single-domain augmented fine-tuned BERT. We found for Maps Search in our case, other advanced BERTology family model such as RoBERTa does not necessarily outperform BERT, and a classic cross-domain fine-tuned full BERT even underperforms a smaller single-domain fine-tuned BERT. We share more findings through comprehensive modeling experiments and analysis, we also briefly cover the data generation algorithm breakthrough.

【4】 Topology-Guided Sampling for Fast and Accurate Community Detection 标题:拓扑引导采样实现快速准确的社区检测 链接:https://arxiv.org/abs/2108.06651

作者:Frank Wanye,Vitaliy Gleyzer,Edward Kao,Wu-chun Feng 机构: © 20 2 1 Massachusetts Institute of Technology 摘要:社区检测是一个研究很好的问题,其应用范围从计算机网络到生物信息学。虽然有许多算法可以执行社区检测,但更精确和更具统计鲁棒性的算法往往速度较慢,难以并行化。加速这种算法的一种方法是通过数据缩减。然而,这种方法还没有得到彻底的研究,并且用这种方法得到的结果的质量随着它所应用的图形的不同而不同。在这篇手稿中,我们提出了一种基于拓扑引导采样的加速随机块划分的方法——一种适用于具有复杂和异构社区结构的图的社区检测算法。我们还介绍了一种基于度的阈值方案,该方案在牺牲加速的情况下提高了我们方法的效率。最后,我们对合成生成的图进行了一系列实验,以确定各种图参数如何影响结果质量和使用我们的方法获得的加速比,并在真实数据上验证了我们的方法。我们的结果表明,我们的方法在保持结果质量的同时,在不进行采样的情况下,可以使随机块划分的速度提高15倍,甚至可以使某些类型图的F1分数的结果质量提高150%以上。 摘要:Community detection is a well-studied problem with applications in domains ranging from computer networking to bioinformatics. While there are many algorithms that perform community detection, the more accurate and statistically robust algorithms tend to be slow and hard to parallelize. One way to speed up such algorithms is through data reduction. However, this approach has not been thoroughly studied, and the quality of results obtained with this approach varies with the graph it is applied to. In this manuscript, we present an approach based on topology-guided sampling for accelerating stochastic block partitioning - a community detection algorithm that works well on graphs with complex and heterogeneous community structure. We also introduce a degree-based thresholding scheme that improves the efficacy of our approach at the expense of speedup. Finally, we perform a series of experiments on synthetically generated graphs to determine how various graph parameters affect the quality of results and speedup obtained with our approach, and we validate our approach on real-world data. Our results show that our approach can lead to a speedup of up to 15X over stochastic block partitioning without sampling while maintaining result quality and can even lead to improvements of over 150% in result quality in terms of F1 score on certain kinds of graphs.

【5】 Investigating Bias In Automatic Toxic Comment Detection: An Empirical Study 标题:毒物评论自动检测中的偏差调查:一项实证研究 链接:https://arxiv.org/abs/2108.06487

作者:Ayush Kumar,Pratik Kumar 机构:Georgia Institute of Technology, Atlanta, US 摘要:随着在线平台的激增,用户通过评论和反应在这些平台上的参与度也在激增。这些文字评论中有很大一部分是辱骂性、粗鲁和冒犯观众的。在平台部署机器学习系统来审查评论时,训练数据中存在的偏见会传递到分类器上,导致对某些阶层、宗教和性别的歧视。在这项工作中,我们评估了不同的分类器和特征,以估计这些分类器中的偏差及其在毒性分类下游任务中的性能。结果表明,自动毒性评论检测模型性能的改善与这些模型中偏差的缓解正相关。在我们的工作中,带注意力机制的LSTM被证明是比CNN模型更好的建模策略。进一步的分析表明,在训练毒性评论检测模型时,fastText嵌入略优于GloVe嵌入。更深入的分析揭示了这样一个发现:即使模型具有较高的AUC分数,这种自动模型也特别偏向于特定的身份群体。最后,为了减轻毒性检测模型中的偏差,以毒性亚型为辅助任务训练的多任务设置被证明是有用的,带来高达0.26%(相对6%)的AUC提升。 摘要:With the surge in online platforms, there has been an upsurge in user engagement on these platforms via comments and reactions. A large portion of such textual comments are abusive, rude and offensive to the audience. With machine learning systems in place to check comments coming onto the platform, biases present in the training data get passed onto the classifier, leading to discrimination against certain classes, religions and genders. In this work, we evaluate different classifiers and features to estimate the bias in these classifiers along with their performance on the downstream task of toxicity classification. Results show that improvement in the performance of automatic toxic comment detection models is positively correlated to mitigating biases in these models. In our work, LSTM with attention mechanism proved to be a better modelling strategy than a CNN model. Further analysis shows that fastText embeddings are marginally preferable to GloVe embeddings when training models for toxic comment detection. Deeper analysis reveals that such automatic models are particularly biased towards specific identity groups even though the model has a high AUC score. Finally, in an effort to mitigate bias in toxicity detection models, a multi-task setup trained with an auxiliary task of toxicity sub-types proved to be useful, leading to up to 0.26% (6% relative) gain in AUC scores.

分类|识别(4篇)

【1】 A Physics Informed Neural Network Approach to Solution and Identification of Biharmonic Equations of Elasticity 标题:求解和辨识弹性双调和方程的物理启发式神经网络方法 链接:https://arxiv.org/abs/2108.07243

作者:Mohammad Vahab,Ehsan Haghighat,Maryam Khaleghi,Nasser Khalili 机构:School of Civil and Environmental Engineering, The University of New South Wales, Australia; Massachusetts Institute of Technology, USA; University of British Columbia, Canada; Department of Civil Engineering, Sharif University of Technology 摘要:我们探索了物理信息神经网络(PINNs)与Airy应力函数和Fourier级数相结合的应用,以找到弹性和弹性板理论中几个参考双调和问题的最优解。双调和关系是四阶偏微分方程(PDE),难以用经典数值方法求解,而且此前尚未用PINNs解决。我们的工作展示了经典解析方法的一个新颖应用:用其指导构建参数数量极少、同时求值非常准确且快速的高效神经网络。特别是,我们发现使用Airy应力函数丰富特征空间可以显著提高双调和PDE的PINN解的精度。 摘要:We explore an application of the Physics Informed Neural Networks (PINNs) in conjunction with Airy stress functions and Fourier series to find optimal solutions to a few reference biharmonic problems of elasticity and elastic plate theory. Biharmonic relations are fourth-order partial differential equations (PDEs) that are challenging to solve using classical numerical methods, and have not been addressed using PINNs. Our work highlights a novel application of classical analytical methods to guide the construction of efficient neural networks with the minimal number of parameters that are very accurate and fast to evaluate. In particular, we find that enriching feature space using Airy stress functions can significantly improve the accuracy of PINN solutions for biharmonic PDEs.
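PINN求解双调和问题的核心是把四阶偏导残差写进损失函数。下面是一个最小示意(PyTorch;网络结构、配点采样方式均为假设,且未包含论文中的Airy应力函数特征增强):

```python
import torch
import torch.nn as nn

# 一个小的全连接网络,输入 (x, y),输出标量场 u
net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))

def grad(u, x):
    """对 x 的逐分量一阶导数(保留计算图以便继续求高阶导)。"""
    return torch.autograd.grad(u, x, grad_outputs=torch.ones_like(u),
                               create_graph=True)[0]

def biharmonic_residual(xy):
    xy.requires_grad_(True)
    u = net(xy)
    g = grad(u, xy)                       # [u_x, u_y]
    u_xx = grad(g[:, :1], xy)[:, :1]
    u_yy = grad(g[:, 1:], xy)[:, 1:]
    lap = u_xx + u_yy                     # 拉普拉斯 Δu
    g2 = grad(lap, xy)                    # [Δu 对 x 的导, Δu 对 y 的导]
    lap_xx = grad(g2[:, :1], xy)[:, :1]
    lap_yy = grad(g2[:, 1:], xy)[:, 1:]
    return lap_xx + lap_yy                # 双调和算子 Δ²u

xy = torch.rand(256, 2)                   # 域内配点
loss = (biharmonic_residual(xy) ** 2).mean()  # 实际训练还需加上边界条件项
loss.backward()
```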

【2】 HCR-Net: A deep learning based script independent handwritten character recognition network 标题:HCR-Net:一种基于深度学习的手写体独立字符识别网络 链接:https://arxiv.org/abs/2108.06663

作者:Vinod Kumar Chauhan,Sukhdeep Singh,Anuj Sharma 备注:21 pages, 5 figures, 16 tables (under review) 摘要:手写字符识别(HCR)是模式识别中一个具有挑战性的学习问题,主要原因是字符结构相似、书写风格多样、数据集噪声大以及语言和文字种类繁多。HCR问题已被广泛研究了几十年,但关于与文字无关的模型的研究非常有限。其原因包括:文字种类繁多;大多数传统研究侧重于特定语言/文字的手工特征提取技术,而这些技术并不总是可用;以及缺乏可复现结果的公共数据集和代码。另一方面,深度学习在包括HCR在内的模式识别各个领域取得了巨大成功,并提供端到端学习,即自动特征提取和识别。在本文中,我们提出了一种新的深度学习体系结构,称为HCR-Net,它利用迁移学习和图像增强进行端到端学习,实现与文字无关的手写字符识别。该网络基于一种新的HCR迁移学习方法,其中复用了预训练VGG16网络的一些较低层。得益于迁移学习和图像增强,HCR-Net提供了更快的训练、更好的性能和更好的泛化能力。在孟加拉语、旁遮普语、印地语、英语、瑞典语、乌尔都语、波斯语、藏语、卡纳达语、马拉雅拉姆语、泰卢固语、马拉地语、尼泊尔语和阿拉伯语的公开数据集上的实验结果证明了HCR-Net的有效性,并建立了若干新的基准。为了结果的可复现性和HCR研究的进展,完整代码已在GitHub(https://github.com/jmdvinodjmd/HCR-Net)公开发布。 摘要:Handwritten character recognition (HCR) is a challenging learning problem in pattern recognition, mainly due to similarity in structure of characters, different handwriting styles, noisy datasets and a large variety of languages and scripts. The HCR problem has been studied extensively for a few decades, but there is very limited research on script-independent models. This is because of factors like the diversity of scripts, the focus of most conventional research efforts on language/script-specific handcrafted feature extraction techniques, which are not always available, and the unavailability of public datasets and code to reproduce the results. On the other hand, deep learning has witnessed huge success in different areas of pattern recognition, including HCR, and provides end-to-end learning, i.e., automated feature extraction and recognition. In this paper, we propose a novel deep learning architecture called HCR-Net, which exploits transfer learning and image augmentation for end-to-end learning of script-independent handwritten character recognition. The network is based on a novel transfer learning approach for HCR, where some of the lower layers of a pre-trained VGG16 network are utilised. Due to transfer learning and image augmentation, HCR-Net provides faster training, better performance and better generalisation. The experimental results on publicly available datasets of Bangla, Punjabi, Hindi, English, Swedish, Urdu, Farsi, Tibetan, Kannada, Malayalam, Telugu, Marathi, Nepali and Arabic languages prove the efficacy of HCR-Net and establish several new benchmarks. For reproducibility of the results and for the advancement of HCR research, the complete code is publicly released on GitHub (https://github.com/jmdvinodjmd/HCR-Net).
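复用预训练VGG16较低层的做法大致如下(PyTorch示意;截取的层数、分类头结构和类别数均为假设,并非HCR-Net的精确配置):

```python
import torch.nn as nn
from torchvision import models

vgg = models.vgg16(weights="IMAGENET1K_V1")   # 加载ImageNet预训练权重
lower = nn.Sequential(*list(vgg.features.children())[:10])  # 仅取较低的卷积层
for p in lower.parameters():
    p.requires_grad = False                    # 冻结迁移过来的低层

num_classes = 50                               # 假设的字符类别数
model = nn.Sequential(
    lower,
    nn.AdaptiveAvgPool2d((4, 4)),
    nn.Flatten(),
    nn.Linear(128 * 4 * 4, 256), nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(256, num_classes),
)
```

配合随机旋转、平移等图像增强训练分类头(以及可选地微调低层),即可得到这类迁移学习方案的一个朴素版本。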

【3】 DEXTER: Deep Encoding of External Knowledge for Named Entity Recognition in Virtual Assistants 标题:Dexter:用于虚拟助手命名实体识别的外部知识深度编码 链接:https://arxiv.org/abs/2108.06633

作者:Deepak Muralidharan,Joel Ruben Antony Moniz,Weicheng Zhang,Stephen Pulman,Lin Li,Megan Barnes,Jingjing Pan,Jason Williams,Alex Acero 机构:Apple, USA, Apple, UK, University of Washington, USA 备注:Interspeech 2021 摘要:命名实体识别(NER)通常是在书面来源良好的文本上开发和测试的。然而,在智能语音助理中,NER是一个重要组件,由于用户或语音识别错误,NER的输入可能会有噪声。在应用程序中,实体标签可能会频繁更改,并且可能需要非文本属性(如主题性或流行性)来在备选方案中进行选择。我们描述了一个旨在解决这些问题的NER系统。我们在一个专有的用户派生数据集上测试和训练这个系统。我们将其与基线纯文本NER系统进行比较;使用外部地名录增强基线;我们下面介绍的搜索和间接标记技术增强了基线。最终配置使NER错误率降低约6%。我们还表明,该技术改进了相关任务,如语义分析,错误率提高了5%。 摘要:Named entity recognition (NER) is usually developed and tested on text from well-written sources. However, in intelligent voice assistants, where NER is an important component, input to NER may be noisy because of user or speech recognition error. In applications, entity labels may change frequently, and non-textual properties like topicality or popularity may be needed to choose among alternatives. We describe a NER system intended to address these problems. We test and train this system on a proprietary user-derived dataset. We compare with a baseline text-only NER system; the baseline enhanced with external gazetteers; and the baseline enhanced with the search and indirect labelling techniques we describe below. The final configuration gives around 6% reduction in NER error rate. We also show that this technique improves related tasks, such as semantic parsing, with an improvement of up to 5% in error rate.

【4】 Asymptotic optimality and minimal complexity of classification by random projection 标题:随机投影分类的渐近最优性和最小复杂度 链接:https://arxiv.org/abs/2108.06339

作者:Mireille Boutin,Evzenie Coupkova 摘要:分类器的泛化误差与选择分类器的函数集的复杂性有关。粗略地说,函数族越复杂,分类器的训练误差和总体误差之间的潜在差距就越大。奥卡姆剃刀原则以通俗的方式体现了这一原理,即倾向于低复杂度假设而非复杂假设。我们研究了一族低复杂度分类器:先将数据嵌入到由不超过k阶的单项式参数化的高维空间中,再投影到一条随机直线上,然后对得到的一维特征进行阈值化。更具体地说,扩展后的数据被投影n次,并(基于其在训练数据上的表现)在这n个分类器中选择最佳者。我们得到了这些低复杂度分类器泛化误差的一个界。该界小于任何具有非平凡VC维的分类器的界,因此也小于线性分类器的界。我们还表明,在完全了解类条件密度的情况下,当k和n趋于无穷大时,分类器的误差将收敛到最优(Bayes)误差;如果只给定一个训练数据集,我们证明当k和n趋于无穷大时,分类器将完美地对所有训练点进行分类。 摘要:The generalization error of a classifier is related to the complexity of the set of functions among which the classifier is chosen. Roughly speaking, the more complex the family, the greater the potential disparity between the training error and the population error of the classifier. This principle is embodied in layman's terms by Occam's razor principle, which suggests favoring low-complexity hypotheses over complex ones. We study a family of low-complexity classifiers consisting of thresholding the one-dimensional feature obtained by projecting the data on a random line after embedding it into a higher dimensional space parametrized by monomials of order up to k. More specifically, the extended data is projected n-times and the best classifier among those n (based on its performance on training data) is chosen. We obtain a bound on the generalization error of these low-complexity classifiers. The bound is less than that of any classifier with a non-trivial VC dimension, and thus less than that of a linear classifier. We also show that, given full knowledge of the class conditional densities, the error of the classifiers would converge to the optimal (Bayes) error as k and n go to infinity; if only a training dataset is given, we show that the classifiers will perfectly classify all the training points as k and n go to infinity.
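该分类器用几行NumPy即可实现:先用不超过k阶的单项式扩展特征,再投影到n条随机直线上,在每条直线上枚举一维阈值,最后保留训练误差最小的组合(示意实现,假设标签为0/1的二分类):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

def fit_random_projection_classifier(X, y, k=2, n=200, seed=0):
    rng = np.random.default_rng(seed)
    Z = PolynomialFeatures(degree=k, include_bias=False).fit_transform(X)
    best = None                                  # (训练误差, 方向, 阈值, 符号)
    for _ in range(n):
        w = rng.standard_normal(Z.shape[1])      # 随机投影方向
        p = Z @ w                                # 一维特征
        s = np.sort(p)
        for t in (s[:-1] + s[1:]) / 2:           # 候选阈值取相邻投影值的中点
            for sign in (1, -1):
                err = np.mean(((sign * (p - t)) > 0).astype(int) != y)
                if best is None or err < best[0]:
                    best = (err, w, t, sign)
    return best
```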

表征(1篇)

【1】 Deepfake Representation with Multilinear Regression 标题:基于多线性回归的Deepfake表示 链接:https://arxiv.org/abs/2108.06702

作者:Sara Abdali,M. Alex O. Vasilescu,Evangelos E. Papalexakis 机构:University of California, Riverside, University of California, Los Angeles, Tensor Vision, Los Angeles 摘要:生成型神经网络结构(如GANs)可用于生成合成实例,以弥补实际数据的不足。然而,它们也可能被用来制造可能导致社会、政治或经济动荡的媒体。一种新兴的此类媒体是"深度伪造"(Deepfake)。能够鉴别这类媒体的技术必不可少。在本文中,我们提出了一种改进的多线性(张量)方法,即线性回归和多线性回归的组合,用于表示伪造和真实数据。我们用该改进的多线性(张量)方法表示Deepfake来测试我们的方法,并执行SVM分类,取得了令人鼓舞的结果。 摘要:Generative neural network architectures such as GANs may be used to generate synthetic instances to compensate for the lack of real data. However, they may be employed to create media that may cause social, political or economical upheaval. One emerging media is "Deepfake". Techniques that can discriminate between such media are indispensable. In this paper, we propose a modified multilinear (tensor) method, a combination of linear and multilinear regressions for representing fake and real data. We test our approach by representing Deepfakes with our modified multilinear (tensor) approach and perform SVM classification with encouraging results.

优化|敛散性(5篇)

【1】 Near-Optimal No-Regret Learning in General Games 标题:一般博弈中的近优无遗憾学习 链接:https://arxiv.org/abs/2108.06924

作者:Constantinos Daskalakis,Maxwell Fishelson,Noah Golowich 机构:MIT CSAIL 备注:40 pages 摘要:我们证明了在多人一般和博弈中,乐观对冲(一种常见的具有近期偏差的乘性权重更新变体)可以达到 $\mathrm{poly}(\log T)$ 的遗憾。特别地,当博弈中的每个玩家都使用乐观对冲、根据到目前为止的博弈历史迭代更新其策略时,经过 $T$ 轮交互后,每个玩家的总遗憾为 $\mathrm{poly}(\log T)$。我们的界指数级地改进了:标准无遗憾学习者在博弈中可达到的 $O(T^{1/2})$ 遗憾、具有近期偏差的无遗憾学习者可达到的 $O(T^{1/4})$ 遗憾(Syrgkanis等人,2015),以及最近在两人博弈的特殊情形下为乐观对冲证明的 $O(T^{1/6})$ 界(Chen & Peng, 2020)。我们的界的一个推论是,在一般博弈中,乐观对冲以 $\tilde{O}\left(\frac{1}{T}\right)$ 的速率收敛到粗相关均衡。 摘要:We show that Optimistic Hedge -- a common variant of multiplicative-weights-updates with recency bias -- attains $\mathrm{poly}(\log T)$ regret in multi-player general-sum games. In particular, when every player of the game uses Optimistic Hedge to iteratively update her strategy in response to the history of play so far, then after $T$ rounds of interaction, each player experiences total regret that is $\mathrm{poly}(\log T)$. Our bound improves, exponentially, the $O(T^{1/2})$ regret attainable by standard no-regret learners in games, the $O(T^{1/4})$ regret attainable by no-regret learners with recency bias (Syrgkanis et al., 2015), and the $O(T^{1/6})$ bound that was recently shown for Optimistic Hedge in the special case of two-player games (Chen & Peng, 2020). A corollary of our bound is that Optimistic Hedge converges to coarse correlated equilibrium in general games at a rate of $\tilde{O}\left(\frac{1}{T}\right)$.
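乐观对冲与标准对冲(Hedge)的差别只在于把最近一次损失额外计入一次,作为对下一轮损失的"乐观"预测。最小示意(NumPy;学习率eta为假设的常数):

```python
import numpy as np

def optimistic_hedge(losses, eta=0.1):
    """losses: T x d 损失矩阵;返回每一轮使用的混合策略(T x d)。"""
    T, d = losses.shape
    cum = np.zeros(d)        # 累计损失
    last = np.zeros(d)       # 上一轮损失(近期偏差项)
    out = []
    for t in range(T):
        logits = -eta * (cum + last)      # 乐观项:上一轮损失再计一次
        w = np.exp(logits - logits.max()) # 减去最大值保证数值稳定
        out.append(w / w.sum())
        cum += losses[t]
        last = losses[t]
    return np.array(out)
```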

【2】 Optimal Actor-Critic Policy with Optimized Training Datasets 标题:基于优化训练数据集的最优Actor-Critic策略 链接:https://arxiv.org/abs/2108.06911

作者:Chayan Banerjee,Zhiyong Chen,Nasimul Noman,Mohsen Zamani 机构: for improvingThe authors are with the School of Electrical Engineering and Computing, University of Newcastle 摘要:Actor-critic(AC)算法在解决强化学习问题时具有高效性和高性能,但同时也存在采样效率低的问题。基于AC的策略优化过程是迭代的,需要频繁访问agent环境系统,通过推出策略、收集奖励和状态(即样本)并从中学习来评估和更新策略。学习最优策略最终需要大量样本。为了提高采样效率,我们提出了一种优化训练数据集的策略,该数据集包含从AC过程中采集的显著较少的样本。数据集优化由一个最佳事件操作、一个策略参数适应度模型和一个遗传算法模块组成。通过优化训练数据集训练的最优策略网络在控制自治动态系统方面比许多当代交流算法表现出更高的性能。对标准基准测试的评估表明,该方法提高了采样效率,确保更快地收敛到最优值,并且比同类方法具有更高的数据效率。 摘要:Actor-critic (AC) algorithms are known for their efficacy and high performance in solving reinforcement learning problems, but they also suffer from low sampling efficiency. An AC based policy optimization process is iterative and needs to frequently access the agent-environment system to evaluate and update the policy by rolling out the policy, collecting rewards and states (i.e. samples), and learning from them. It ultimately requires a huge number of samples to learn an optimal policy. To improve sampling efficiency, we propose a strategy to optimize the training dataset that contains significantly less samples collected from the AC process. The dataset optimization is made of a best episode only operation, a policy parameter-fitness model, and a genetic algorithm module. The optimal policy network trained by the optimized training dataset exhibits superior performance compared to many contemporary AC algorithms in controlling autonomous dynamical systems. Evaluation on standard benchmarks show that the method improves sampling efficiency, ensures faster convergence to optima, and is more data-efficient than its counterparts.

【3】 CONet: Channel Optimization for Convolutional Neural Networks 标题:CONet:卷积神经网络的通道优化 链接:https://arxiv.org/abs/2108.06822

作者:Mahdi S. Hosseini,Jia Shu Zhang,Zhe Liu,Andre Fu,Jingxuan Su,Mathieu Tuli,Konstantinos N. Plataniotis 机构:The Department of Electrical and Computer Engineering, University of New Brunswick, The Edward S. Rogers Sr. Department of Electrical & Computer Engineering, University of Toronto 备注:Accepted for Publication in ICCV2021 NeurArch 摘要:神经架构搜索(NAS)已将网络设计从依赖人类直觉转向利用由评估指标引导的搜索算法。我们研究了卷积神经网络(CNN)中的通道大小优化,并确定了它在模型精度和复杂性中所起的作用。当前的通道大小选择方法通常受限于离散的样本空间,同时还依赖手动迭代和简单启发式。为了解决这个问题,我们引入了一种高效的动态缩放算法——CONet——它可以自动优化给定CNN各网络层的通道大小。我们引入了两个度量——"Rank"和"Rank Average Slope"——来识别训练中积累的信息。该算法在固定的搜索阶段内动态地放大或缩小通道大小。我们在CIFAR10/100和ImageNet数据集上进行了实验,结果表明,CONet可以在ResNet、DARTS和DARTS+空间中搜索到高效且准确的体系结构,其性能优于各自的基线模型。 摘要:Neural Architecture Search (NAS) has shifted network design from using human intuition to leveraging search algorithms guided by evaluation metrics. We study channel size optimization in convolutional neural networks (CNN) and identify the role it plays in model accuracy and complexity. Current channel size selection methods are generally limited by discrete sample spaces while suffering from manual iteration and simple heuristics. To solve this, we introduce an efficient dynamic scaling algorithm -- CONet -- that automatically optimizes channel sizes across network layers for a given CNN. Two metrics -- "Rank" and "Rank Average Slope" -- are introduced to identify the information accumulated in training. The algorithm dynamically scales channel sizes up or down over a fixed searching phase. We conduct experiments on CIFAR10/100 and ImageNet datasets and show that CONet can find efficient and accurate architectures searched in ResNet, DARTS, and DARTS+ spaces that outperform their baseline models.

【4】 Optimal Approximation with Sparse Neural Networks and Applications 标题:稀疏神经网络的最优逼近及其应用 链接:https://arxiv.org/abs/2108.06467

作者:Khay Boon Hong 备注:37 pages, no figures. Undergraduate Final Year Project 摘要:我们使用深度稀疏连接的神经网络,通过限制存储神经网络所需的连接数和内存,来度量 $L^2(\mathbb{R}^d)$ 中函数类的复杂性。我们还引入了表示系统——一个用于指导神经网络的可数函数集合,因为基于表示系统的逼近理论在数学上已经发展得很成熟。然后,我们证明了基本界定理,这意味着函数类自身固有的一个量可以提供有关神经网络和表示系统逼近能力的信息。我们还提供了一种将现有的表示系统逼近理论转化为神经网络逼近理论的方法,极大地扩大了神经网络的实用价值。最后,我们利用神经网络逼近用于生成B样条曲线的B样条函数,并利用率失真理论和wedgelets构造分析了一类称为 $\beta$ 卡通函数的复杂性。 摘要:We use deep sparsely connected neural networks to measure the complexity of a function class in $L^2(\mathbb{R}^d)$ by restricting connectivity and memory requirement for storing the neural networks. We also introduce representation system - a countable collection of functions to guide neural networks, since approximation theory with representation system has been well developed in Mathematics. We then prove the fundamental bound theorem, implying a quantity intrinsic to the function class itself can give information about the approximation ability of neural networks and representation system. We also provide a method for transferring existing theories about approximation by representation systems to that of neural networks, greatly amplifying the practical values of neural networks. Finally, we use neural networks to approximate B-spline functions, which are used to generate the B-spline curves. Then, we analyse the complexity of a class called $\beta$ cartoon-like functions using rate-distortion theory and wedgelets construction.

【5】 Optimal and Efficient Algorithms for General Mixable Losses against Switching Oracles 标题:针对切换预言机的一般可混合损失的优化高效算法 链接:https://arxiv.org/abs/2108.06411

作者:Kaan Gokcesu,Hakan Gokcesu 摘要:我们研究在线学习问题。由于其在从机器学习到博弈论等广泛领域的适用性,在线学习近年来受到了广泛关注。具体地说,我们研究动态环境下可混合损失函数的在线优化。我们引入了在线混合方案,它以最优的遗憾冗余渐近地达到切换预言机的最佳动态估计序列的性能。我们与之竞争的最佳动态估计序列是在事后、在完全观察到损失函数的情况下选出的,并且允许在不同的时间区间(分段)内选择不同的最优估计。我们在工作中提出了两种混合方案。第一,我们提出了一种易于处理的多项式时间复杂度算法,它可以达到难以处理的蛮力方法的最优冗余。第二,我们提出了一种高效的对数时间复杂度算法,它可以在相差一个常数倍的意义下达到最优冗余。我们的结果以逐个序列(individual sequence)的方式在强确定性意义下成立。 摘要:We investigate the problem of online learning, which has gained significant attention in recent years due to its applicability in a wide range of fields from machine learning to game theory. Specifically, we study the online optimization of mixable loss functions in a dynamic environment. We introduce online mixture schemes that asymptotically achieve the performance of the best dynamic estimation sequence of the switching oracle with optimal regret redundancies. The best dynamic estimation sequence that we compete against is selected in hindsight with full observation of the loss functions and is allowed to select different optimal estimations in different time intervals (segments). We propose two mixtures in our work. Firstly, we propose a tractable polynomial time complexity algorithm that can achieve the optimal redundancy of the intractable brute force approach. Secondly, we propose an efficient logarithmic time complexity algorithm that can achieve the optimal redundancy up to a constant multiplicity gap. Our results are guaranteed to hold in a strong deterministic sense in an individual sequence manner.

预测|估计(5篇)

【1】 Neural Predictive Monitoring under Partial Observability 标题:部分可观测性下的神经预测监控 链接:https://arxiv.org/abs/2108.07134

作者:Francesca Cairoli,Luca Bortolussi,Nicola Paoletti 机构: Department of Mathematics and Geosciences, Universita di Trieste, Italy, Modeling and Simulation Group, Saarland University, Germany, Department of Computer Science, Royal Holloway University, London 摘要:我们考虑预测监控(PM)问题,即在运行时根据系统当前状态预测其未来的违规行为。我们工作在最贴近现实的设定下:运行时只能获得对状态的部分且带噪的观测。这样的设定直接影响可达性预测的准确性和可靠性,危及系统的安全。在这项工作中,我们提出了一种基于学习的PM方法,该方法在部分可观测(PO)的情况下仍能产生准确可靠的可达性预测。我们以神经预测监控(NPM)——一种利用深度神经网络逼近混合系统可达性的PM方法——为基础,并将其扩展到PO情形。我们提出并比较了两种解决方案:一种是端到端方法,直接对粗糙的观测值进行操作;另一种是两步方法,引入了中间的状态估计步骤。这两种解决方案都依赖保角预测来提供:1)预测区域形式的概率保证;2)对预测不确定性的可靠估计。我们利用后者来识别不可靠(且可能错误)的预测,并针对这些不确定的输入重新训练和改进监控器(即主动学习)。我们的方法产生了高度精确的可达性预测和错误检测,以及具有覆盖率保证的紧致预测区域。 摘要:We consider the problem of predictive monitoring (PM), i.e., predicting at runtime future violations of a system from the current state. We work under the most realistic settings where only partial and noisy observations of the state are available at runtime. Such settings directly affect the accuracy and reliability of the reachability predictions, jeopardizing the safety of the system. In this work, we present a learning-based method for PM that produces accurate and reliable reachability predictions despite partial observability (PO). We build on Neural Predictive Monitoring (NPM), a PM method that uses deep neural networks for approximating hybrid systems reachability, and extend it to the PO case. We propose and compare two solutions, an end-to-end approach, which directly operates on the rough observations, and a two-step approach, which introduces an intermediate state estimation step. Both solutions rely on conformal prediction to provide 1) probabilistic guarantees in the form of prediction regions and 2) sound estimates of predictive uncertainty. We use the latter to identify unreliable (and likely erroneous) predictions and to retrain and improve the monitors on these uncertain inputs (i.e., active learning). Our method results in highly accurate reachability predictions and error detection, as well as tight prediction regions with guaranteed coverage.
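文中赖以提供概率保证的保角预测(conformal prediction)可用"分裂式"版本简要说明:在校准集上取非一致性分数的分位数,得到带覆盖率保证的预测区间(示意,假设回归型输出且样本可交换):

```python
import numpy as np

def split_conformal_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """返回测试点的 (1 - alpha) 预测区间(下界, 上界)。"""
    scores = np.abs(cal_pred - cal_true)          # 非一致性分数: 绝对残差
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    return test_pred - q, test_pred + q
```

区间过宽(即 q 很大)本身就可作为预测不确定、需要送入主动学习重训的信号。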

【2】 A physics-informed variational DeepONet for predicting the crack path in brittle materials 标题:预测脆性材料裂纹路径的物理信息变分DeepONet 链接:https://arxiv.org/abs/2108.06905

作者:Somdatta Goswami,Minglang Yin,Yue Yu,George Karniadakis 机构:Division of Applied Mathematics, Brown University, Providence, RI, Center for Biomedical Engineering, Brown University, Providence, RI, School of Engineering, Brown University, Providence, RI, Department of Mathematics, Lehigh University, Bethlehem, PA 摘要:在脆性断裂应用中,失效轨迹、确定可能的失效区域和损伤统计是一些关键的相关量。存在可靠估计这些相关量的高保真数值解算器,但它们在计算上要求对裂纹进行高分辨率求解。此外,即使域参数和/或材料特性发生微小变化,也需要进行独立的密集模拟。因此,需要快速且可推广的替代模型来减轻计算负担,但断裂力学的不连续性对开发此类模型提出了重大挑战。我们提出了用于脆性断裂分析的DeepONet的物理信息变分公式(V-DeepONet)。V-DeepONet经过训练,能够将缺陷的初始配置映射到相关的感兴趣场(例如,损伤场和位移场)。一旦网络训练完成,就可以针对该域上任何初始裂纹形状和加载步骤快速获得整体解。虽然原始的DeepONet完全是数据驱动的,但我们采用了不同的路径来训练V-DeepONet:以变分形式施加控制方程,同时也使用一些带标签的数据。我们通过两个脆性断裂基准证明了V-DeepONet的有效性,并使用高保真解算器的结果验证了其准确性。考虑到断裂建模对波动非常敏感,将物理定律和部分数据一同编码进网络训练,使替代模型能够准确执行插值和外推任务。所提出的V-DeepONet混合训练方法优于现有方法,可以应用于具有复杂响应的各种动力系统。 摘要:Failure trajectories, identifying the probable failure zones, and damage statistics are some of the key quantities of relevance in brittle fracture applications. High-fidelity numerical solvers that reliably estimate these relevant quantities exist but they are computationally demanding, requiring a high resolution of the crack. Moreover, independent intensive simulations need to be carried out even for a small change in domain parameters and/or material properties. Therefore, fast and generalizable surrogate models are needed to alleviate the computational burden, but the discontinuous nature of fracture mechanics presents a major challenge to developing such models. We propose a physics-informed variational formulation of DeepONet (V-DeepONet) for brittle fracture analysis. V-DeepONet is trained to map the initial configuration of the defect to the relevant fields of interest (e.g., damage and displacement fields). Once the network is trained, the entire global solution can be rapidly obtained for any initial crack configuration and loading steps on that domain. While the original DeepONet is solely data-driven, we take a different path to train the V-DeepONet by imposing the governing equations in variational form and we also use some labelled data. We demonstrate the effectiveness of V-DeepONet through two benchmarks of brittle fracture, and we verify its accuracy using results from high-fidelity solvers. Encoding the physical laws and also some data to train the network renders the surrogate model capable of accurately performing both interpolation and extrapolation tasks, considering that fracture modeling is very sensitive to fluctuations. The proposed hybrid training of V-DeepONet is superior to state-of-the-art methods and can be applied to a wide array of dynamical systems with complex responses.

【3】 Nowcasting-Nets: Deep Neural Network Structures for Precipitation Nowcasting Using IMERG 标题:短时预报网:IMERG降水短时预报的深度神经网络结构 链接:https://arxiv.org/abs/2108.06868

作者:Mohammad Reza Ehsani,Ariyan Zarei,Hoshin V. Gupta,Kobus Barnard,Ali Behrangi 机构: Department of Hydrology and Atmospheric Sciences, The University of Arizona; Tucson, AZ, Department of Computer Science, The University of Arizona; Tucson, AZ, Submitted to: arXiv 备注:41 Pages, 18 Figures 摘要:准确及时地估计降水量对于发布危险警告(如山洪暴发或滑坡)至关重要。由于卫星数据的采集和处理,目前的遥感降水产品有几个小时的延迟。通过对这些产品应用稳健的临近预报系统,可以(原则上)减少延迟并提高其适用性、价值和影响。然而,由于大气的混沌性质以及由此导致的降水系统结构的快速变化,这种系统的开发十分复杂。在这项工作中,我们开发了两种方法(以下称为临近预报网,Nowcasting-Nets),使用递归和卷积深度神经网络结构来应对降水临近预报的挑战。我们使用美国东部毗连地区(CONUS)上空的全球降水测量(GPM)综合多卫星检索(IMERG)降水数据共训练了五个模型,然后在东部和西部CONUS的独立数据上进行测试。这些模型设计用于提供最多1.5小时提前期的预报,并且通过使用反馈回路方法,我们还研究了模型将预报时间延长至4.5小时的能力。我们将模型性能与随机森林(RF)和线性回归(LR)机器学习方法进行了比较,并与使用最新观测作为预报的持久性基准(BM)进行了比较。独立的IMERG观测被用作参考,实验既考察了总体统计数据,也考察了涉及特定降水事件的案例研究。总体而言,临近预报网模型提供的预报更为优越,带残差头的卷积临近预报网络(CNC-R)在测试中分别取得了25%、28%和46%的改进…… 摘要:Accurate and timely estimation of precipitation is critical for issuing hazard warnings (e.g., for flash floods or landslides). Current remotely sensed precipitation products have a few hours of latency, associated with the acquisition and processing of satellite data. By applying a robust nowcasting system to these products, it is (in principle) possible to reduce this latency and improve their applicability, value, and impact. However, the development of such a system is complicated by the chaotic nature of the atmosphere, and the consequent rapid changes that can occur in the structures of precipitation systems. In this work, we develop two approaches (hereafter referred to as Nowcasting-Nets) that use Recurrent and Convolutional deep neural network structures to address the challenge of precipitation nowcasting. A total of five models are trained using Global Precipitation Measurement (GPM) Integrated Multi-satellitE Retrievals for GPM (IMERG) precipitation data over the Eastern Contiguous United States (CONUS) and then tested against independent data for the Eastern and Western CONUS. The models were designed to provide forecasts with a lead time of up to 1.5 hours and, by using a feedback loop approach, the ability of the models to extend the forecast time to 4.5 hours was also investigated. Model performance was compared against the Random Forest (RF) and Linear Regression (LR) machine learning methods, and also against a persistence benchmark (BM) that used the most recent observation as the forecast. Independent IMERG observations were used as a reference, and experiments were conducted to examine both overall statistics and case studies involving specific precipitation events. Overall, the forecasts provided by the Nowcasting-Net models are superior, with the Convolutional Nowcasting Network with Residual Head (CNC-R) achieving 25%, 28%, and 46% improvement in the test ...

【4】 Active Assessment of Prediction Services as Accuracy Surface Over Attribute Combinations 标题:作为属性组合精度曲面的预测服务主动评估 链接:https://arxiv.org/abs/2108.06514

作者:Vihari Piratla,Soumen Chakrabarty,Sunita Sarawagi 机构:Department of Computer Science, Indian Institute of Technology, Bombay 摘要:我们的目标是评估黑盒分类模型的准确性,不是作为给定测试数据分布上的单一聚合值,而是作为在刻画多种测试数据分布的大量属性组合上的一个曲面。随着机器学习模型被部署为一项服务——训练数据分布对客户端隐藏,并且不同客户端可能对数据分布的不同区域感兴趣——这种按属性划分的准确性度量变得非常重要。我们提出了一种基于高斯过程(GP)的概率估计方法——属性准确度分析(AAA),用于此类准确度曲面。每个被称为"臂(arm)"的属性组合都与一个Beta密度相关联,服务的准确性从中采样。我们期望GP在相关臂之间平滑Beta密度的参数,以缓解稀疏性。我们表明,GP的直接应用无法应对在填充稀疏且不均匀的巨大属性空间上的异方差不确定性挑战。对此,我们提出了两个增强:汇集稀疏观测,以及正则化Beta密度的尺度参数。在引入这些创新之后,我们通过大量实验和分析,确证了AAA在估计精度和探索效率两方面的有效性。 摘要:Our goal is to evaluate the accuracy of a black-box classification model, not as a single aggregate on a given test data distribution, but as a surface over a large number of combinations of attributes characterizing multiple test data distributions. Such attributed accuracy measures become important as machine learning models get deployed as a service, where the training data distribution is hidden from clients, and different clients may be interested in diverse regions of the data distribution. We present Attributed Accuracy Assay (AAA), a Gaussian Process (GP) based probabilistic estimator for such an accuracy surface. Each attribute combination, called an 'arm', is associated with a Beta density from which the service's accuracy is sampled. We expect the GP to smooth the parameters of the Beta density over related arms to mitigate sparsity. We show that obvious application of GPs cannot address the challenge of heteroscedastic uncertainty over a huge attribute space that is sparsely and unevenly populated. In response, we present two enhancements: pooling sparse observations, and regularizing the scale parameter of the Beta densities. After introducing these innovations, we establish the effectiveness of AAA in terms of both its estimation accuracy and exploration efficiency, through extensive experiments and analysis.
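AAA的基本构件是给每个属性组合("臂")挂一个Beta密度。去掉GP平滑之后的朴素版本只剩共轭更新与抽样(示意;GP对相关臂参数的平滑这里省略):

```python
import numpy as np

class BetaArm:
    """单个属性组合上服务准确率的 Beta 后验。"""
    def __init__(self, a=1.0, b=1.0):
        self.a, self.b = a, b            # Beta(1,1) 均匀先验

    def update(self, n_correct, n_wrong):
        self.a += n_correct              # 共轭更新: 预测正确数加到 a
        self.b += n_wrong                # 预测错误数加到 b

    def sample_accuracy(self, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        return rng.beta(self.a, self.b)

arm = BetaArm()
arm.update(n_correct=45, n_wrong=5)
print(arm.sample_accuracy())             # 大致在 0.9 附近波动
```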

【5】 Hybrid Gaussian Process Modeling Applied to Economic Stochastic Model Predictive Control of Batch Processes 标题:混合高斯过程建模在间歇过程经济随机模型预测控制中的应用 链接:https://arxiv.org/abs/2108.06430

作者:E. Bradford,L. Imsland,M. Reble,E. A. del Rio-Chanona 机构: Norwegian University of Science and Technology, Department of Chemical Engineering 备注:None 摘要:非线性模型预测控制(NMPC)是控制具有约束的非线性多变量动态系统的一种有效方法,但它需要精确的对象模型。植物模型通常可以根据第一性原理确定,但模型的某些部分很难单独用物理定律推导出来。本文提出了一种混合高斯过程(GP)第一性原理建模方案,利用GPs对动态系统中难以用第一性原理描述的部分进行建模。GPs不仅给出了精确的预测,而且还量化了该模型的剩余不确定性。必须在控制算法中考虑这种不确定性,以防止违反约束和性能恶化。离线生成GPs的蒙特卡罗样本,以收紧NMPC的约束,确保在线满足联合概率约束。我们的方法的优点包括快速的在线评估时间,考虑在线学习的可能性,减少保守性,利用GPs的灵活性和第一原理模型的数据效率。该算法在一个具有挑战性的半间歇式生物反应器的案例研究中得到了验证。 摘要:Nonlinear model predictive control (NMPC) is an efficient approach for the control of nonlinear multivariable dynamic systems with constraints, which however requires an accurate plant model. Plant models can often be determined from first principles, parts of the model are however difficult to derive using physical laws alone. In this paper a hybrid Gaussian process (GP) first principles modeling scheme is proposed to overcome this issue, which exploits GPs to model the parts of the dynamic system that are difficult to describe using first principles. GPs not only give accurate predictions, but also quantify the residual uncertainty of this model. It is vital to account for this uncertainty in the control algorithm, to prevent constraint violations and performance deterioration. Monte Carlo samples of the GPs are generated offline to tighten constraints of the NMPC to ensure joint probabilistic constraint satisfaction online. Advantages of our method include fast online evaluation times, possibility to account for online learning alleviating conservativeness, and exploiting the flexibility of GPs and the data efficiency of first principle models. The algorithm is verified on a case study involving a challenging semi-batch bioreactor.
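离线蒙特卡罗收紧约束的思路可以用scikit-learn的GP粗略演示:对残差模型抽样,取逐点经验分位数作为对名义约束的回退量(back-off)(示意;数据与核函数均为假设,与论文的NMPC实现无关):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
# 假设 (X, y) 是第一性原理模型难以刻画的残差动态的观测数据
X = rng.uniform(-1, 1, size=(30, 1))
y = np.sin(3 * X[:, 0]) + 0.05 * rng.standard_normal(30)

gp = GaussianProcessRegressor().fit(X, y)
X_grid = np.linspace(-1, 1, 50)[:, None]
samples = gp.sample_y(X_grid, n_samples=500, random_state=0)  # 50 x 500

# 逐点 95% 经验分位数与均值之差, 作为约束收紧量
backoff = np.quantile(samples, 0.95, axis=1) - samples.mean(axis=1)
```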

其他神经网络|深度学习|模型|建模(18篇)

【1】 On the Opportunities and Risks of Foundation Models 标题:论基础模型的机遇与风险 链接:https://arxiv.org/abs/2108.07258

作者:Rishi Bommasani,Drew A. Hudson,Ehsan Adeli,Russ Altman,Simran Arora,Sydney von Arx,Michael S. Bernstein,Jeannette Bohg,Antoine Bosselut,Emma Brunskill,Erik Brynjolfsson,Shyamal Buch,Dallas Card,Rodrigo Castellon,Niladri Chatterji,Annie Chen,Kathleen Creel,Jared Quincy Davis,Dora Demszky,Chris Donahue,Moussa Doumbouya,Esin Durmus,Stefano Ermon,John Etchemendy,Kawin Ethayarajh,Li Fei-Fei,Chelsea Finn,Trevor Gale,Lauren Gillespie,Karan Goel,Noah Goodman,Shelby Grossman,Neel Guha,Tatsunori Hashimoto,Peter Henderson,John Hewitt,Daniel E. Ho,Jenny Hong,Kyle Hsu,Jing Huang,Thomas Icard,Saahil Jain,Dan Jurafsky,Pratyusha Kalluri,Siddharth Karamcheti,Geoff Keeling,Fereshte Khani,Omar Khattab,Pang Wei Koh,Mark Krass,Ranjay Krishna,Rohith Kuditipudi,Ananya Kumar,Faisal Ladhak,Mina Lee,Tony Lee,Jure Leskovec,Isabelle Levent,Xiang Lisa Li,Xuechen Li,Tengyu Ma,Ali Malik,Christopher D. Manning,Suvir Mirchandani,Eric Mitchell,Zanele Munyikwa,Suraj Nair,Avanika Narayan,Deepak Narayanan,Ben Newman,Allen Nie,Juan Carlos Niebles,Hamed Nilforoshan,Julian Nyarko,Giray Ogut,Laurel Orr,Isabel Papadimitriou,Joon Sung Park,Chris Piech,Eva Portelance,Christopher Potts,Aditi Raghunathan,Rob Reich,Hongyu Ren,Frieda Rong,Yusuf Roohani,Camilo Ruiz,Jack Ryan,Christopher Ré,Dorsa Sadigh,Shiori Sagawa,Keshav Santhanam,Andy Shih,Krishnan Srinivasan,Alex Tamkin,Rohan Taori,Armin W. Thomas,Florian Tramèr,Rose E. Wang,William Wang,Bohan Wu,Jiajun Wu,Yuhuai Wu,Sang Michael Xie,Michihiro Yasunaga,Jiaxuan You,Matei Zaharia,Michael Zhang,Tianyi Zhang,Xikun Zhang,Yuhui Zhang,Lucia Zheng,Kaitlyn Zhou,Percy Liang 机构:Center for Research on Foundation Models (CRFM), Stanford University 备注:Published by the Center for Research on Foundation Models (this https URL) 摘要:人工智能正在经历一场范式转变:在大规模数据上训练、并能适应广泛下游任务的模型(例如BERT、DALL-E、GPT-3)正在兴起。我们将这些模型称为基础模型,以强调它们的核心地位和不完整性。该报告全面阐述了基础模型的机遇与风险,涵盖其能力(例如语言、视觉、机器人、推理、人机交互)、技术原理(例如模型体系结构、训练程序、数据、系统、安全性、评估、理论)、应用(例如法律、医疗、教育)以及社会影响(例如不平等、滥用、经济和环境影响、法律和道德考虑)。虽然基础模型基于传统的深度学习和迁移学习,但其规模带来了新的涌现能力,而它们在众多任务上的有效性也激励着同质化。同质化提供了强大的杠杆作用,但需要谨慎,因为基础模型的缺陷会被下游所有适配模型继承。尽管基础模型即将被广泛部署,但由于其涌现特性,我们目前对它们如何工作、何时失效、乃至究竟具备何种能力仍缺乏清晰的理解。为了解决这些问题,我们相信关于基础模型的大量关键研究将需要与其根本上的社会技术属性相称的深度跨学科合作。 摘要:AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on conventional deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.

【2】 Hierarchical Infinite Relational Model 标题:分层无限关系模型 链接:https://arxiv.org/abs/2108.07208

作者:Feras A. Saad,Vikash K. Mansinghka 机构: Massachusetts Institute of Technology, Cambridge, MA, USA 备注:11 pages, 6 figures, 4 tables. Appearing in UAI 2021 摘要:本文描述了分层无限关系模型(HIRM),这是一种针对噪声、稀疏和异构关系数据的新概率生成模型。给定一组定义在一组域上的关系,该模型首先使用顶级中餐馆流程推断出多个不重叠的关系簇。在每个关系簇中,使用Dirichlet过程混合对域实体进行划分,并对关系值的概率分布进行建模。HIRM概括了标准的无限关系模型,可用于各种数据分析任务,包括相关性检测、聚类和密度估计。本文提出了一种新的基于Gibbs抽样的完全贝叶斯后验推理算法。我们在20个对象属性数据集的密度估计基准上展示了该方法的有效性,这些数据集包含多达1800万个单元,并使用它来发现来自政治学和基因组学的真实世界数据集中的关系结构。 摘要:This paper describes the hierarchical infinite relational model (HIRM), a new probabilistic generative model for noisy, sparse, and heterogeneous relational data. Given a set of relations defined over a collection of domains, the model first infers multiple non-overlapping clusters of relations using a top-level Chinese restaurant process. Within each cluster of relations, a Dirichlet process mixture is then used to partition the domain entities and model the probability distribution of relation values. The HIRM generalizes the standard infinite relational model and can be used for a variety of data analysis tasks including dependence detection, clustering, and density estimation. We present new algorithms for fully Bayesian posterior inference via Gibbs sampling. We illustrate the efficacy of the method on a density estimation benchmark of twenty object-attribute datasets with up to 18 million cells and use it to discover relational structure in real-world datasets from politics and genomics.
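顶层中餐馆过程(CRP)决定关系如何被划分成簇,其生成式抽样只需几行(示意;集中参数alpha为假设值):

```python
import numpy as np

def crp_partition(n_items, alpha=1.0, seed=0):
    """按中餐馆过程为 n_items 个对象抽一个随机分区,返回簇标号列表。"""
    rng = np.random.default_rng(seed)
    tables, labels = [], []              # tables[k] = 第 k 簇当前成员数
    for _ in range(n_items):
        probs = np.array(tables + [alpha], dtype=float)
        probs /= probs.sum()             # 按人数比例入座, 以 alpha 比例开新桌
        k = rng.choice(len(probs), p=probs)
        if k == len(tables):
            tables.append(1)             # 开一张新桌(新簇)
        else:
            tables[k] += 1
        labels.append(int(k))
    return labels

print(crp_partition(10))                 # 例如 [0, 0, 1, 0, 2, ...]
```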

【3】 Identifying and Exploiting Structures for Reliable Deep Learning 标题:识别和利用结构以实现可靠的深度学习 链接:https://arxiv.org/abs/2108.07083

作者:Amartya Sanyal 机构:St Hugh's College, University of Oxford 备注:Final Thesis for DPhil in Computer Science submitted at the University of Oxford 摘要:深度学习研究最近在包括计算机视觉、自然语言处理和强化学习在内的一系列任务中取得了令人印象深刻的快速进展。这些系统的非凡性能常常给人一种印象:它们可以用来改善我们的生活。然而,正如最近的研究所指出的,这些系统存在一些问题,使得它们在现实世界中使用起来不可靠,包括易受对抗攻击(Szegedy等人[248])、易记忆噪声(Zhang等人[292])、对错误预测过于自信(校准失准)(Guo等人[99]),以及不适合处理私有数据(Gilad-Bachrach等人[88])。在这篇论文中,我们详细研究了上述每一个问题,调查了它们的原因,并提出了在实践中缓解这些问题的低计算开销算法。为此,我们识别了深度神经网络中可用于缓解上述深度学习算法不可靠原因的结构。 摘要:Deep learning research has recently witnessed an impressively fast-paced progress in a wide range of tasks including computer vision, natural language processing, and reinforcement learning. The extraordinary performance of these systems often gives the impression that they can be used to revolutionise our lives for the better. However, as recent works point out, these systems suffer from several issues that make them unreliable for use in the real world, including vulnerability to adversarial attacks (Szegedy et al. [248]), tendency to memorise noise (Zhang et al. [292]), being over-confident on incorrect predictions (miscalibration) (Guo et al. [99]), and unsuitability for handling private data (Gilad-Bachrach et al. [88]). In this thesis, we look at each of these issues in detail, investigate their causes, and propose computationally cheap algorithms for mitigating them in practice. To do this, we identify structures in deep neural networks that can be exploited to mitigate the above causes of unreliability of deep learning algorithms.

【4】 WiseR: An end-to-end structure learning and deployment framework for causal graphical models 标题:WiseR:一个面向因果图模型的端到端结构学习和部署框架 链接:https://arxiv.org/abs/2108.07046

作者:Shubham Maheshwari,Khushbu Pahwa,Tavpritesh Sethi 机构: Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, Near, Govind Puri Metro Station, New Delhi, Delhi , India. , Delhi Technological University, Bawana Rd, Shahbad Daulatpur Village, Rohini, Delhi 摘要:结构学习为复杂生物数据的因果和机制建模提供了一种富有表现力、通用且可解释的方法。我们介绍了wiseR,一个使用图神经网络和贝叶斯网络来学习、评估和部署稳健因果图模型的开源应用程序。我们通过在一个COVID-19临床数据集上进行生物标志物发现的应用,展示了该应用程序的实用性。 摘要:Structure learning offers an expressive, versatile and explainable approach to causal and mechanistic modeling of complex biological data. We present wiseR, an open source application for learning, evaluating and deploying robust causal graphical models using graph neural networks and Bayesian networks. We demonstrate the utility of this application through an application to biomarker discovery in a COVID-19 clinical dataset.

【5】 Challenges for cognitive decoding using deep learning methods 标题:使用深度学习方法进行认知解码面临的挑战 链接:https://arxiv.org/abs/2108.06896

作者:Armin W. Thomas,Christopher Ré,Russell A. Poldrack 机构:★ Stanford Data Science, Stanford University, Stanford, CA, USA, ♢ Department of Psychology, Stanford University, Stanford, CA, USA, ◾Department of Computer Science, Stanford University, Stanford, CA, USA 摘要:在认知解码中,研究人员的目标是通过识别可以从大脑区域活动中辨识出的认知状态(例如,接受/拒绝赌博)来刻画该区域的表征。深度学习(DL)方法在认知解码方面前景广阔,因为它们具有学习复杂数据多样表征的无与伦比的能力。然而,它们普遍缺乏可解释性,难以应用于小数据集,且其可复现性和稳健性难以保证,这些都阻碍了它们在认知解码中的广泛应用。我们建议利用可解释人工智能和迁移学习的最新进展来应对这些挑战,同时就如何提高DL建模结果的可复现性和稳健性提供具体建议。 摘要:In cognitive decoding, researchers aim to characterize a brain region's representations by identifying the cognitive states (e.g., accepting/rejecting a gamble) that can be identified from the region's activity. Deep learning (DL) methods are highly promising for cognitive decoding, with their unmatched ability to learn versatile representations of complex data. Yet, their widespread application in cognitive decoding is hindered by their general lack of interpretability as well as difficulties in applying them to small datasets and in ensuring their reproducibility and robustness. We propose to approach these challenges by leveraging recent advances in explainable artificial intelligence and transfer learning, while also providing specific recommendations on how to improve the reproducibility and robustness of DL modeling results.

【6】 An Investigation of Replay-based Approaches for Continual Learning 标题:基于回放的持续学习方法研究 链接:https://arxiv.org/abs/2108.06758

作者:Benedikt Bagus,Alexander Gepperth 机构:Fulda University of Applied Sciences, Fulda, Germany 备注:Accepted at the IJCNN2021, 9 pages, 1 figure 摘要:持续学习(CL)是机器学习(ML)的一个主要挑战,描述了在不发生灾难性遗忘(CF)的情况下连续学习多个任务的能力。最近的工作表明,CL是一个复杂的主题,当涉及到具有多个约束的真实场景时更是如此。已经提出了若干类解决方案,其中所谓的基于回放的方法由于其简单性和健壮性而非常有前途。这种方法将过去样本的子集存储在专用记忆中以供以后处理:虽然这并不能解决所有问题,但已经获得了良好的结果。在本文中,我们实证研究了基于回放的持续学习方法,并评估了其应用潜力。我们在一组共同的基准上比较了近期的代表性方法以及我们自己的方案,特别侧重于评估不同样本选择策略的性能。我们发现,当存储的样本数量较少时,样本选择的影响会增大。然而,不同回放方法之间的性能差别很大。令人惊讶的是,我们发现我们在这里提出的最朴素的基于排练(rehearsal)的方法可以胜过最近的最先进方法。 摘要:Continual learning (CL) is a major challenge of machine learning (ML) and describes the ability to learn several tasks sequentially without catastrophic forgetting (CF). Recent works indicate that CL is a complex topic, even more so when real-world scenarios with multiple constraints are involved. Several solution classes have been proposed, of which so-called replay-based approaches seem very promising due to their simplicity and robustness. Such approaches store a subset of past samples in a dedicated memory for later processing: while this does not solve all problems, good results have been obtained. In this article, we empirically investigate replay-based approaches of continual learning and assess their potential for applications. Selected recent approaches as well as our own proposals are compared on a common set of benchmarks, with a particular focus on assessing the performance of different sample selection strategies. We find that the impact of sample selection increases when a smaller number of samples is stored. Nevertheless, performance varies strongly between different replay approaches. Surprisingly, we find that the most naive rehearsal-based approaches that we propose here can outperform recent state-of-the-art methods.
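基于回放的方法的共同骨架是一块容量受限的样本记忆,差别主要在样本选择策略。下面给出其中最常见的一种——蓄水池采样——的示意实现(并非文中任一具体方法的复现):

```python
import random

class ReservoirBuffer:
    """容量固定的回放记忆: 任意时刻每个历史样本被保留的概率相同。"""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.n_seen = 0

    def add(self, sample):
        self.n_seen += 1
        if len(self.data) < self.capacity:
            self.data.append(sample)
        else:
            j = random.randrange(self.n_seen)
            if j < self.capacity:
                self.data[j] = sample    # 以 capacity / n_seen 的概率替换旧样本

    def sample_batch(self, k):
        return random.sample(self.data, min(k, len(self.data)))
```

训练新任务时,把记忆中抽出的小批量与新数据混合送入模型,即构成最朴素的排练(rehearsal)。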

【7】 Deep Geospatial Interpolation Networks 标题:深度地理空间插值网络 链接:https://arxiv.org/abs/2108.06670

作者:Sumit Kumar Varshney,Jeetu Kumar,Aditya Tiwari,Rishabh Singh,Venkata M. V. Gunturi,Narayanan C. Krishnan 机构:Indian Institute of Technology Ropar, Punjab, India 摘要:时空数据插值在气候、交通和采矿等领域有着广泛的应用。由于复杂的时空关系,时空插值具有很大的挑战性。然而,克里金法等传统技术运行时间长,并且在空间和时间维度上方差较大的数据上性能较差。为此,我们提出了一种新的深度神经网络,称为深度地理空间插值网络(DGIN),它同时纳入空间和时间关系,并且训练时间显著更短。DGIN由三个主要组件组成:捕捉空间相关性的空间编码器、纳入时间动态的序列模块,以及用于学习间隙周围时间邻域重要性的注意力块。我们在来自两个不同区域的MODIS反射率数据集上评估DGIN。实验结果表明DGIN有两个优点:(a)它优于其他方法(MSE更低,p值<0.01);(b)它的执行时间比克里金法低得多。 摘要:Interpolation in Spatio-temporal data has applications in various domains such as climate, transportation, and mining. Spatio-Temporal interpolation is highly challenging due to the complex spatial and temporal relationships. However, traditional techniques such as Kriging suffer from high running time and poor performance on data that exhibit high variance across space and time dimensions. To this end, we propose a novel deep neural network called as Deep Geospatial Interpolation Network(DGIN), which incorporates both spatial and temporal relationships and has significantly lower training time. DGIN consists of three major components: Spatial Encoder to capture the spatial dependencies, Sequential module to incorporate the temporal dynamics, and an Attention block to learn the importance of the temporal neighborhood around the gap. We evaluate DGIN on the MODIS reflectance dataset from two different regions. Our experimental results indicate that DGIN has two advantages: (a) it outperforms alternative approaches (has lower MSE with p-value < 0.01) and, (b) it has significantly low execution time than Kriging.

【8】 Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach 标题:连续时空中的策略评估与时间差分学习:一种鞅方法 链接:https://arxiv.org/abs/2108.06655

作者:Yanwei Jia,Xun Yu Zhou 备注:46 pages, 9 figures 摘要:我们提出了一个统一的框架来研究连续时间和空间中强化学习的策略评估(PE)及相关的时间差分(TD)方法。我们证明了PE等价于保持一个过程的鞅条件。从这个角度来看,我们发现均方TD误差近似于鞅的二次变差,因此不是PE的合适目标。我们提出了两种利用鞅刻画来设计PE算法的方法。第一种方法最小化一个"鞅损失函数",其解被证明是均方意义下真值函数的最佳逼近。该方法解释了经典的梯度蒙特卡罗算法。第二种方法基于一个称为"鞅正交条件"的方程组和"测试函数"。以不同的方式求解这些方程可以恢复各种经典的TD算法,例如TD($\lambda$)、LSTD和GTD。测试函数的不同选择决定了所得解在何种意义上逼近真值函数。此外,我们还证明了当网格尺寸趋于零时,任何收敛的时间离散化算法都收敛到其连续时间对应算法。我们通过数值实验和应用展示了理论结果和相应的算法。 摘要:We propose a unified framework to study policy evaluation (PE) and the associated temporal difference (TD) methods for reinforcement learning in continuous time and space. We show that PE is equivalent to maintaining the martingale condition of a process. From this perspective, we find that the mean-square TD error approximates the quadratic variation of the martingale and thus is not a suitable objective for PE. We present two methods to use the martingale characterization for designing PE algorithms. The first one minimizes a "martingale loss function", whose solution is proved to be the best approximation of the true value function in the mean-square sense. This method interprets the classical gradient Monte-Carlo algorithm. The second method is based on a system of equations called the "martingale orthogonality conditions" with "test functions". Solving these equations in different ways recovers various classical TD algorithms, such as TD($\lambda$), LSTD, and GTD. Different choices of test functions determine in what sense the resulting solutions approximate the true value function. Moreover, we prove that any convergent time-discretized algorithm converges to its continuous-time counterpart as the mesh size goes to zero. We demonstrate the theoretical results and corresponding algorithms with numerical experiments and applications.
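作为该统一框架所能"恢复"的经典算法之一,下面给出离散时间表格型TD(λ)(带累积资格迹)的示意实现(参数与数据格式均为假设,仅用于说明这类TD更新的形式):

```python
import numpy as np

def td_lambda(episodes, n_states, alpha=0.1, gamma=0.99, lam=0.9):
    """episodes: 每条轨迹是 (s, r, s_next, done) 元组的列表; 返回状态价值 V。"""
    V = np.zeros(n_states)
    for episode in episodes:
        e = np.zeros(n_states)                    # 资格迹
        for s, r, s_next, done in episode:
            target = r + (0.0 if done else gamma * V[s_next])
            delta = target - V[s]                 # TD 误差
            e *= gamma * lam                      # 迹按 γλ 衰减
            e[s] += 1.0                           # 累积当前状态的迹
            V += alpha * delta * e
    return V
```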

【9】 On Multi-Modal Learning of Editing Source Code 标题:论源代码编辑的多模态学习 链接:https://arxiv.org/abs/2108.06645

作者:Saikat Chakraborty,Baishakhi Ray 机构:Department of Computer Science, Columbia University, New York, NY, USA 备注:Accepted for publication in 36th IEEE/ACM conference on Automated Software Engineering (ASE-2021) 摘要:近年来,神经机器翻译(NMT)在自动编辑源代码方面显示出良好的前景。典型的基于NMT的代码编辑器只考虑需要更改的代码作为输入,并建议开发人员从已修补代码的列表中进行选择,其中正确的代码可能并不总是位于列表的顶部。虽然基于NMT的代码编辑系统生成了大量看似合理的补丁,但正确的补丁取决于开发人员的需求,通常取决于应用补丁的上下文。因此,如果开发人员使用自然语言或提供补丁上下文提供一些提示,NMT模型可以从中受益。作为概念证明,在本研究中,我们利用三种信息模式:编辑位置、编辑代码上下文、提交消息(作为开发人员在自然语言中提示的代理)来使用NMT模型自动生成编辑。为此,我们构建了MODIT,一个基于多模式NMT的代码编辑引擎。通过深入的调查和分析,我们发现,开发人员的提示作为一种输入方式,可以缩小补丁的搜索空间,并优于最先进的模型,以生成排名第一的正确补丁代码。 摘要:In recent years, Neural Machine Translator (NMT) has shown promise in automatically editing source code. Typical NMT based code editor only considers the code that needs to be changed as input and suggests developers with a ranked list of patched code to choose from - where the correct one may not always be at the top of the list. While NMT based code editing systems generate a broad spectrum of plausible patches, the correct one depends on the developers' requirement and often on the context where the patch is applied. Thus, if developers provide some hints, using natural language, or providing patch context, NMT models can benefit from them. As a proof of concept, in this research, we leverage three modalities of information: edit location, edit code context, commit messages (as a proxy of developers' hint in natural language) to automatically generate edits with NMT models. To that end, we build MODIT, a multi-modal NMT based code editing engine. With in-depth investigation and analysis, we show that developers' hint as an input modality can narrow the search space for patches and outperform state-of-the-art models to generate correctly patched code in top-1 position.

【10】 LayerPipe: Accelerating Deep Neural Network Training by Intra-Layer and Inter-Layer Gradient Pipelining and Multiprocessor Scheduling 标题:LayerPipe:通过层内和层间梯度流水线及多处理器调度加速深度神经网络训练 链接:https://arxiv.org/abs/2108.06629

作者:Nanda K. Unnikrishnan,Keshab K. Parhi 机构:Dept. Electrical and Computer Engineering, University of Minnesota 备注:Proc. of the 2021 IEEE International Conference on Computer Aided Design (ICCAD) 摘要:训练神经网络所需的时间随着规模、复杂性和深度的增加而增加。通过反向传播对模型参数进行训练固有地产生反馈回路。这些循环阻碍了层内和连续层之间任务的高效流水线和调度。以前的方法,如PipeDream,已经利用延迟梯度来实现层间流水线。然而,这些方法将整个反向传播视为单个任务;这会导致计算时间增加和处理器利用率不足。本文提出了一种新的优化方法,其中关于权重和激活函数的梯度计算是独立考虑的;因此,这些可以并行计算。这称为层内优化。此外,关于激活函数的梯度计算进一步分为两部分并分布到两个连续层。这导致平衡调度,其中每层的计算时间相同。这称为层间优化。建议的系统称为LayerPipe,它减少了训练所需的时钟周期数,同时以最小的处理器间通信开销最大化处理器利用率。与PipeDream相比,LayerPipe使用7到9个处理器实现了25%和80%以上的平均加速比,通信开销更少。 摘要:The time required for training the neural networks increases with size, complexity, and depth. Training model parameters by backpropagation inherently creates feedback loops. These loops hinder efficient pipelining and scheduling of the tasks within the layer and between consecutive layers. Prior approaches, such as PipeDream, have exploited the use of delayed gradient to achieve inter-layer pipelining. However, these approaches treat the entire backpropagation as a single task; this leads to an increase in computation time and processor underutilization. This paper presents novel optimization approaches where the gradient computations with respect to the weights and the activation functions are considered independently; therefore, these can be computed in parallel. This is referred to as intra-layer optimization. Additionally, the gradient computation with respect to the activation function is further divided into two parts and distributed to two consecutive layers. This leads to balanced scheduling where the computation time of each layer is the same. This is referred to as inter-layer optimization. The proposed system, referred to as LayerPipe, reduces the number of clock cycles required for training while maximizing processor utilization with minimal inter-processor communication overhead. LayerPipe achieves an average speedup of 25% and upwards of 80% with 7 to 9 processors with less communication overhead when compared to PipeDream.

【11】 Investigating the Relationship Between Dropout Regularization and Model Complexity in Neural Networks 标题:神经网络中丢弃正则化与模型复杂度关系的研究 链接:https://arxiv.org/abs/2108.06628

作者:Christopher Sun,Jai Sharma,Milind Maiti 摘要:丢弃(Dropout)正则化用于降低方差,在深度学习模型中几乎无处不在。我们通过在所选的三个数据集上各训练2000个神经网络来探索丢弃率和模型复杂性之间的关系,这些网络配置有丢弃率和每个密集层中隐藏单元数的随机组合。生成的图形(z轴为二元交叉熵损失和二元精度)对"在提高丢弃率的同时加深密集层必然会提升性能"这一常见假设提出质疑。我们还发现了这两个超参数之间的复杂关联,并通过构建额外的机器学习和深度学习模型来量化:这些模型在给定每个密集层中隐藏单元数的条件下预测最优丢弃率。线性回归和多项逻辑回归分别需要使用任意阈值来选择回归中包含的成本数据点,以及为成本数据点分配二元分类。这些机器学习模型的性能一般,因为其朴素的本质阻碍了对复杂决策边界的建模。转向深度学习模型,我们构建了神经网络,根据每个密集层中隐藏单元的数量、期望的成本和期望的模型精度来预测最优丢弃率。然而,这一尝试遇到了一个数学问题,可归因于未能通过垂直线测试。最终的深度学习模型是一个决策边界表示先前生成的2000个数据点的神经网络。这个最终模型引导我们设计出一种有前途的超参数调优方法,在最大化性能的同时最小化计算开销。该策略可应用于任何模型超参数,有望在工业模型中实现更高效的调优。 摘要:Dropout Regularization, serving to reduce variance, is nearly ubiquitous in Deep Learning models. We explore the relationship between the dropout rate and model complexity by training 2,000 neural networks configured with random combinations of the dropout rate and the number of hidden units in each dense layer, on each of the three data sets we selected. The generated figures, with binary cross entropy loss and binary accuracy on the z-axis, question the common assumption that adding depth to a dense layer while increasing the dropout rate will certainly enhance performance. We also discover a complex correlation between the two hyperparameters that we proceed to quantify by building additional machine learning and Deep Learning models which predict the optimal dropout rate given some hidden units in each dense layer. Linear regression and polynomial logistic regression require the use of arbitrary thresholds to select the cost data points included in the regression and to assign the cost data points a binary classification, respectively. These machine learning models have mediocre performance because their naive nature prevented the modeling of complex decision boundaries. Turning to Deep Learning models, we build neural networks that predict the optimal dropout rate given the number of hidden units in each dense layer, the desired cost, and the desired accuracy of the model. However, this attempt encounters a mathematical error that can be attributed to the failure of the vertical line test. The ultimate Deep Learning model is a neural network whose decision boundary represents the 2,000 previously generated data points. This final model leads us to devise a promising method for tuning hyperparameters to minimize computational expense yet maximize performance. The strategy can be applied to any model hyperparameters, with the prospect of more efficient tuning in industrial models.

【12】 The Neural Network shifted-Proper Orthogonal Decomposition: a Machine Learning Approach for Non-linear Reduction of Hyperbolic Equations 标题:神经网络移位-本征正交分解:双曲型方程非线性降阶的一种机器学习方法 链接:https://arxiv.org/abs/2108.06558

作者:Davide Papapicco,Nicola Demo,Michele Girfoglio,Giovanni Stabile,Gianluigi Rozza 机构:Mathematics Area, mathLab, SISSA, Via Bonomea, Trieste, Italy, Department of Electronics and Telecommunications, Politecnico di Torino, C.so, Duca degli Abruzzi, Torino, Italy 摘要:对于基于投影的降阶模型,具有主导平流的模型一直是一个难题。最近提出的许多方法都是基于全阶解的预处理来加速Kolmogorov N-宽度衰减,从而获得更小的线性子空间并提高精度。然而,这些方法必须依赖于解的相空间中特征速度的知识,将其适用范围限制在对流场具有显式函数形式的问题上。在这项工作中,我们通过实现深度学习体系结构来解决在统计学习框架中自动检测正确预处理转换的问题。纯数据驱动方法使我们能够将现有的线性子空间处理方法推广到具有未知平流场的非线性双曲问题。该算法通过简单的测试用例进行了验证,以测试其性能,并成功应用于多相仿真。 摘要:Models with dominant advection always posed a difficult challenge for projection-based reduced order modelling. Many methodologies that have recently been proposed are based on the pre-processing of the full-order solutions to accelerate the Kolmogorov N-width decay thereby obtaining smaller linear subspaces with improved accuracy. These methods however must rely on the knowledge of the characteristic speeds in phase space of the solution, limiting their range of applicability to problems with explicit functional form for the advection field. In this work we approach the problem of automatically detecting the correct pre-processing transformation in a statistical learning framework by implementing a deep-learning architecture. The purely data-driven method allowed us to generalise the existing approaches of linear subspace manipulation to non-linear hyperbolic problems with unknown advection fields. The proposed algorithm has been validated against simple test cases to benchmark its performances and later successfully applied to a multiphase simulation.

【13】 Stacked Hourglass Network with a Multi-level Attention Mechanism: Where to Look for Intervertebral Disc Labeling 标题:具有多层次注意机制的堆叠沙漏网络:在哪里寻找椎间盘标记 链接:https://arxiv.org/abs/2108.06554

作者:Reza Azad,Lucas Rouhier,Julien Cohen-Adad 机构: NeuroPoly Lab, Institute of Biomedical Engineering, Polytechnique Montreal, Mila, Quebec AI Institute, Canada, Functional Neuroimaging Unit, CRIUGM, University of Montreal, Montreal 备注:None 摘要:从MRI扫描中标记椎间盘对于正确诊断脊柱相关疾病非常重要,包括多发性硬化症、肌萎缩侧索硬化症、退行性颈脊髓病和癌症。在MRI数据中自动标记椎间盘是一项困难的任务,原因在于椎间盘与骨区域之间的相似性、个体间脊柱和周围组织几何结构的变异性,以及扫描的变异性(制造商、脉冲序列、图像对比度、分辨率和伪影)。在以前的研究中,椎间盘标记通常在椎间盘检测步骤之后进行,当定位算法遗漏椎间盘或出现假阳性检测时,大多会失败。在这项工作中,我们旨在通过使用姿态估计技术重新表述语义椎间盘标记来缓解这个问题。为此,我们提出了一种具有多级注意力机制的堆叠沙漏网络,以联合学习椎间盘位置及其骨架结构。所提出的深度学习模型结合了语义分割和姿态估计技术的优势来处理缺失区域和假阳性检测。为了进一步提高该方法的性能,我们提出了一种基于骨架的搜索空间来减少假阳性检测。该方法在spine-generic公共多中心数据集上进行了评估,在T1w和T2w对比度下均表现出优于先前工作的性能。该方法已在ivadomed(https://ivadomed.org)中实现。 摘要:Labeling vertebral discs from MRI scans is important for the proper diagnosis of spinal related diseases, including multiple sclerosis, amyotrophic lateral sclerosis, degenerative cervical myelopathy and cancer. Automatic labeling of the vertebral discs in MRI data is a difficult task because of the similarity between discs and bone area, the variability in the geometry of the spine and surrounding tissues across individuals, and the variability across scans (manufacturers, pulse sequence, image contrast, resolution and artefacts). In previous studies, vertebral disc labeling is often done after a disc detection step and mostly fails when the localization algorithm misses discs or has false positive detection. In this work, we aim to mitigate this problem by reformulating the semantic vertebral disc labeling using the pose estimation technique. To do so, we propose a stacked hourglass network with multi-level attention mechanism to jointly learn intervertebral disc position and their skeleton structure. The proposed deep learning model takes into account the strength of semantic segmentation and pose estimation technique to handle the missing area and false positive detection. To further improve the performance of the proposed method, we propose a skeleton-based search space to reduce false positive detection. The proposed method was evaluated on the spine-generic public multi-center dataset and demonstrated better performance compared to previous work, on both T1w and T2w contrasts. The method is implemented in ivadomed (https://ivadomed.org).

【14】 Metadata-based Multi-Task Bandits with Bayesian Hierarchical Models Link: https://arxiv.org/abs/2108.06422

Authors: Runzhe Wan, Lin Ge, Rui Song
Affiliations: Department of Statistics, North Carolina State University
Abstract: How to explore efficiently is a central problem in multi-armed bandits. In this paper, we introduce the metadata-based multi-task bandit problem, where the agent needs to solve a large number of related multi-armed bandit tasks and can leverage some task-specific features (i.e., metadata) to share knowledge across tasks. As a general framework, we propose to capture task relations through the lens of Bayesian hierarchical models, upon which a Thompson sampling algorithm is designed to efficiently learn task relations, share information, and minimize the cumulative regret. Two concrete examples, for Gaussian bandits and Bernoulli bandits, are carefully analyzed. The Bayes regret for Gaussian bandits clearly demonstrates the benefits of information sharing with our algorithm. The proposed method is further supported by extensive experiments.
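A minimal sketch of the core idea for the Gaussian case, assuming each arm's prior mean is a linear function of the task metadata; the shared weights `w_hat` would be fit across previous tasks (e.g., by ridge regression), and all names and hyper-parameters below are illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def metadata_ts(x, w_hat, n_arms, horizon=300, sigma=1.0, sigma0=1.0):
    """One task of Thompson sampling with a metadata-informed Gaussian prior."""
    prior_mean = w_hat @ x                        # hierarchical prior layer
    post_mean = prior_mean.copy()
    post_var = np.full(n_arms, sigma0 ** 2)
    pulls, sums = np.zeros(n_arms), np.zeros(n_arms)
    mu = prior_mean + sigma0 * rng.standard_normal(n_arms)  # true arm means
    for _ in range(horizon):
        sample = post_mean + np.sqrt(post_var) * rng.standard_normal(n_arms)
        a = int(sample.argmax())                  # Thompson sampling step
        pulls[a] += 1
        sums[a] += mu[a] + sigma * rng.standard_normal()
        # Conjugate Gaussian posterior update for the pulled arm.
        post_var[a] = 1.0 / (1.0 / sigma0 ** 2 + pulls[a] / sigma ** 2)
        post_mean[a] = post_var[a] * (prior_mean[a] / sigma0 ** 2
                                      + sums[a] / sigma ** 2)
    return mu, post_mean

w_hat = rng.standard_normal((3, 4))   # 3 arms, 4 metadata features
mu, est = metadata_ts(rng.standard_normal(4), w_hat, n_arms=3)
print(mu, est)
```

The benefit of metadata sharing shows up in the prior: a well-fit `w_hat` starts each new task close to the true arm means, so fewer exploratory pulls are wasted.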

【15】 The Sharpe predictor for fairness in machine learning Link: https://arxiv.org/abs/2108.06415

Authors: Suyun Liu, Luis Nunes Vicente
Affiliations: Department of Industrial and Systems Engineering, Lehigh University
Abstract: In machine learning (ML) applications, unfair predictions may discriminate against a minority group. Most existing approaches for fair machine learning (FML) treat fairness as a constraint or a penalization term in the optimization of an ML model, which neither reveals the complete landscape of the trade-offs between learning accuracy and fairness metrics nor integrates fairness in a meaningful way. Recently, we introduced a new paradigm for FML based on Stochastic Multi-Objective Optimization (SMOO), where accuracy and fairness metrics stand as conflicting objectives to be optimized simultaneously. The entire range of trade-offs is defined by the Pareto front of the SMOO problem, which can be computed efficiently using stochastic-gradient-type algorithms. SMOO also allows defining and computing new meaningful predictors for FML, a novel one being the Sharpe predictor that we introduce and explore in this paper, which gives the highest ratio of accuracy to unfairness. Inspired by its namesake ratio in finance, the Sharpe predictor for FML provides the highest prediction return (accuracy) per unit of prediction risk (unfairness).
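As a worked illustration of the definition (not the paper's code), the Sharpe predictor can be read off a computed Pareto front by maximizing the accuracy-to-unfairness ratio; the numbers below are made up, and a closer finance analogue would subtract a baseline "risk-free" accuracy from the numerator:

```python
import numpy as np

# Hypothetical Pareto front of (accuracy, unfairness) trade-offs, as a
# stochastic multi-objective solver might produce.
accuracy = np.array([0.70, 0.78, 0.84, 0.88, 0.90])
unfairness = np.array([0.02, 0.04, 0.07, 0.12, 0.20])

sharpe = accuracy / unfairness     # prediction return per unit of risk
best = int(sharpe.argmax())
print(f"Sharpe predictor: acc={accuracy[best]}, unf={unfairness[best]}, "
      f"ratio={sharpe[best]:.1f}")
```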

【16】 Interpreting and improving deep-learning models with reality checks Link: https://arxiv.org/abs/2108.06847

Authors: Chandan Singh, Wooseok Ha, Bin Yu
Affiliations: University of California, Berkeley, Berkeley CA, USA (* equal contribution)
Abstract: Recent deep-learning models have achieved impressive predictive performance by learning complex functions of many variables, often at the cost of interpretability. This chapter covers recent work aiming to interpret models by attributing importance to features and feature groups for a single prediction. Importantly, the proposed attributions assign importance to interactions between features, in addition to features in isolation. These attributions are shown to yield insights across real-world domains, including bio-imaging, cosmology imaging, and natural-language processing. We then show how these attributions can be used to directly improve the generalization of a neural network or to distill it into a simple model. Throughout the chapter, we emphasize the use of reality checks to scrutinize the proposed interpretation techniques.

【17】 High-dimensional Assisted Generative Model for Color Image Restoration Link: https://arxiv.org/abs/2108.06460

Authors: Kai Hong, Chunhua Wu, Cailian Yang, Minghui Zhang, Yancheng Lu, Yuhao Wang, Qiegen Liu
Note: 12 pages, 11 figures
Abstract: This work presents an unsupervised deep learning scheme that exploits a high-dimensional assisted score-based generative model for color image restoration tasks. Considering that the sample number and internal dimension in a score-based generative model have a key influence on estimating the gradients of the data distribution, two different high-dimensional ways are proposed: a channel-copy transformation increases the sample number, and a pixel-scale transformation decreases the feasible space dimension. Subsequently, a set of high-dimensional tensors represented by these transformations is used to train the network through denoising score matching. Sampling is then performed by annealed Langevin dynamics and alternating data-consistency updates. Furthermore, to alleviate the difficulty of learning high-dimensional representations, a progressive strategy is proposed to leverage performance. The proposed unsupervised learning and iterative restoration algorithm, which involves a pre-trained generative network to obtain the prior, has a transparent and clear interpretation compared to other data-driven approaches. Experimental results on demosaicking and inpainting demonstrate the remarkable performance and diversity of the proposed method.
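The two transformations admit a simple reading, sketched below under our own interpretation (the paper's exact definitions may differ): channel-copy stacks copies of an image along the channel axis, while pixel-scale folds spatial blocks into channels (space-to-depth), shrinking the spatial dimension:

```python
import numpy as np

def channel_copy(img, k=3):
    """One reading of the channel-copy transform: stack k copies of the
    image along the channel axis, inflating the sample dimension."""
    return np.concatenate([img] * k, axis=-1)

def pixel_scale(img, s=2):
    """One reading of the pixel-scale transform: fold each s x s spatial
    block into channels, reducing the feasible spatial dimension."""
    h, w, c = img.shape
    assert h % s == 0 and w % s == 0
    x = img.reshape(h // s, s, w // s, s, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(h // s, w // s, s * s * c)

img = np.random.rand(8, 8, 3)
print(channel_copy(img).shape)  # (8, 8, 9)
print(pixel_scale(img).shape)   # (4, 4, 12)
```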

【18】 A Machine-Learning-Ready Dataset Prepared from the Solar and Heliospheric Observatory Mission Link: https://arxiv.org/abs/2108.06394

Authors: Carl Shneider, Andong Hu, Ajay K. Tiwari, Monica G. Bobra, Karl Battams, Jannis Teunissen, Enrico Camporeale
Affiliations: Center for Mathematics and Computer Science, Multiscale Dynamics, Amsterdam, the Netherlands; W.W. Hansen Experimental Physics Laboratory, Stanford University, Stanford, CA, USA; CIRES, University of Colorado, Boulder, CO, USA
Note: under review
Abstract: We present a Python tool to generate a standard dataset from solar images that allows for user-defined selection criteria and a range of pre-processing steps. Our Python tool works with all image products from both the Solar and Heliospheric Observatory (SoHO) and Solar Dynamics Observatory (SDO) missions. We discuss a dataset produced from the SoHO mission's multi-spectral images which is free of missing or corrupt data as well as planetary transits in coronagraph images, and is temporally synced, making it ready for input to a machine learning system. Machine-learning-ready images are a valuable resource for the community because they can be used, for example, for forecasting space weather parameters. We illustrate the use of this data with a 3-5 day-ahead forecast of the north-south component of the interplanetary magnetic field (IMF) observed at Lagrange point one (L1). For this use case, we apply a deep convolutional neural network (CNN) to a subset of the full SoHO dataset and compare with baseline results from a Gaussian Naive Bayes classifier.

Others (14 papers)

【1】 Who's Waldo? Linking People Across Text and Images Link: https://arxiv.org/abs/2108.07253

Authors: Claire Yuqing Cui, Apoorv Khandelwal, Yoav Artzi, Noah Snavely, Hadar Averbuch-Elor
Affiliations: Cornell University; Cornell Tech
Note: Published in ICCV 2021 (Oral). Project webpage: this https URL
Abstract: We present a task and benchmark dataset for person-centric visual grounding: the problem of linking between people named in a caption and people pictured in an image. In contrast to prior work in visual grounding, which is predominantly object-based, our new task masks out the names of people in captions in order to encourage methods trained on such image-caption pairs to focus on contextual cues (such as rich interactions between multiple people), rather than learning associations between names and appearances. To facilitate this task, we introduce a new dataset, Who's Waldo, mined automatically from image-caption data on Wikimedia Commons. We propose a Transformer-based method that outperforms several strong baselines on this task, and we are releasing our data to the research community to spur work on contextual models that consider both vision and language.

【2】 Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism Link: https://arxiv.org/abs/2108.07153

Authors: Shulun Wang, Bin Liu, Feng Liu
Affiliations: Department of Computer Science, Beijing Jiaotong University, Beijing, China; Key Laboratory of Deep Oil and Gas, China University of Petroleum (East China), Qingdao, China
Note: 18 pages, 16 figures
Abstract: Softmax is widely used in neural networks for multiclass classification, gate structures and attention mechanisms. The statistical assumption that the input is normally distributed supports the gradient stability of Softmax. However, when used in attention mechanisms such as Transformers, since the correlation scores between embeddings are often not normally distributed, a gradient vanishing problem appears, as we confirm experimentally. In this work, we suggest replacing the exponential function with periodic functions, and we delve into some potential periodic alternatives to Softmax from the viewpoint of value and gradient. Through experiments on a simply designed demo referencing LeViT, our method is shown to alleviate the gradient problem and to yield substantial improvements compared to Softmax and its variants. Further, we analyze the impact of pre-normalization on Softmax and our methods through mathematical analysis and experiments. Lastly, we increase the depth of the demo and demonstrate the applicability of our method in deep structures.
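A minimal sketch of one such periodic alternative, assuming a sin²-based surrogate (the paper studies several candidates, and this is not necessarily its best-performing choice): the exponential in the attention normalization is replaced by a non-negative periodic function, then rows are renormalized:

```python
import torch

def periodic_attention(scores, eps=1e-6):
    """Hypothetical periodic alternative to softmax attention: replace
    exp(x) with the non-negative periodic surrogate sin(x)^2, then
    normalize so each row of weights sums to one."""
    w = torch.sin(scores) ** 2 + eps
    return w / w.sum(dim=-1, keepdim=True)

scores = torch.randn(2, 4, 4)            # (batch, queries, keys)
print(periodic_attention(scores).sum(-1))  # rows sum to 1
```

Unlike exp, the surrogate's gradient does not flatten out for large negative scores, which is the intuition behind escaping the vanishing-gradient regime.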

【3】 Implicitly Regularized RL with Implicit Q-Values Link: https://arxiv.org/abs/2108.07041

Authors: Nino Vieillard, Marcin Andrychowicz, Anton Raichuk, Olivier Pietquin, Matthieu Geist
Affiliations: Google Research, Brain Team; Université de Lorraine, CNRS, Inria, IECL, Nancy, France
Abstract: The $Q$-function is a central quantity in many Reinforcement Learning (RL) algorithms, for which the RL agent behaves according to a (soft-)greedy policy w.r.t. $Q$. It is a powerful tool that allows action selection without a model of the environment and even without explicitly modeling the policy. Yet, this scheme can only be used in discrete-action tasks with small numbers of actions, as the softmax cannot be computed exactly otherwise. In particular, the use of function approximation to deal with continuous action spaces in modern actor-critic architectures intrinsically prevents the exact computation of a softmax. We propose to alleviate this issue by parametrizing the $Q$-function implicitly, as the sum of a log-policy and a value function. We use the resulting parametrization to derive a practical off-policy deep RL algorithm, suitable for large action spaces, that enforces the softmax relation between the policy and the $Q$-values. We provide a theoretical analysis of our algorithm: from an Approximate Dynamic Programming perspective, we show its equivalence to a regularized version of value iteration, accounting for both entropy and Kullback-Leibler regularization, which enjoys beneficial error propagation results. We then evaluate our algorithm on classic control tasks, where its results compete with state-of-the-art methods.
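The implicit parametrization itself is easy to sketch; the toy module below (names and sizes are illustrative, not the paper's) returns Q(s, a) = log π(a|s) + V(s), so that a softmax over the Q-values recovers π exactly, which is the relation the algorithm enforces:

```python
import torch
import torch.nn as nn

class ImplicitQ(nn.Module):
    """Sketch of the implicit parametrization Q(s, a) = log pi(a|s) + V(s)."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.policy = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, n_actions))
        self.value = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 1))

    def forward(self, obs):
        log_pi = torch.log_softmax(self.policy(obs), dim=-1)
        v = self.value(obs)
        # Since v is constant across actions, softmax over these Q-values
        # gives back pi exactly: the enforced softmax relation.
        return log_pi + v

net = ImplicitQ(obs_dim=4, n_actions=2)
print(net(torch.randn(3, 4)).shape)  # torch.Size([3, 2])
```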

【4】 A diffusion-map-based algorithm for gradient computation on manifolds and applications Link: https://arxiv.org/abs/2108.06988

Authors: Alvaro Almeida Gomez, Antônio J. Silva Neto, Jorge P. Zubelli
Affiliations: IMPA, Est. D. Castorina, Jardim Botânico, Rio de Janeiro, Brazil; IPRJ-UERJ, R. Bonfim, Nova Friburgo, Brazil; Khalifa University, Abu Dhabi, UAE
Abstract: We recover the gradient of a given function defined on interior points of a submanifold with boundary of Euclidean space, based on a (normally distributed) random sample of function evaluations at points in the manifold. The approach builds on the estimates of the Laplace-Beltrami operator proposed in the theory of Diffusion Maps. Analytical convergence results for the resulting expansion are proved, and an efficient algorithm is proposed to deal with non-convex optimization problems defined on Euclidean submanifolds. We test and validate our methodology as a post-processing tool in cryogenic electron microscopy (Cryo-EM). We also apply the method to the classical sphere packing problem.
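In flat Euclidean space the flavor of such an estimator can be sketched in a few lines, assuming locally uniform samples (the paper's estimator handles the manifold case and uses a different, rigorously derived normalization):

```python
import numpy as np

def kernel_gradient(x0, f0, X, fX, eps=0.01):
    """Kernel-weighted gradient estimate at x0 from scattered function
    evaluations: Gaussian weights concentrate a cloud of finite
    differences around x0. The factor 2/eps matches the second moment
    of the Gaussian kernel exp(-r^2/eps) in the flat, uniform case."""
    w = np.exp(-np.sum((X - x0) ** 2, axis=1) / eps)
    return (2.0 / (eps * w.sum())) * ((w * (fX - f0)) @ (X - x0))

rng = np.random.default_rng(0)
X = rng.uniform(-0.5, 0.5, size=(20_000, 2))
a = np.array([1.0, -2.0])                      # gradient of f(x) = a @ x
print(kernel_gradient(np.zeros(2), 0.0, X, X @ a))  # close to [ 1. -2.]
```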

【5】 WikiChurches: A Fine-Grained Dataset of Architectural Styles with Real-World Challenges Link: https://arxiv.org/abs/2108.06959

Authors: Björn Barz, Joachim Denzler
Affiliations: Computer Vision Group, Friedrich Schiller University Jena, Jena, Germany
Note: 10 pages, 7 figures, 3 tables
Abstract: We introduce a novel dataset for architectural style classification, consisting of 9,485 images of church buildings. Both images and style labels were sourced from Wikipedia. The dataset can serve as a benchmark for various research fields, as it combines numerous real-world challenges: fine-grained distinctions between classes based on subtle visual features, a comparatively small sample size, a highly imbalanced class distribution, a high variance of viewpoints, and a hierarchical organization of labels, where only some images are labeled at the most precise level. In addition, we provide 631 bounding-box annotations of characteristic visual features for 139 churches from four major categories. These annotations can, for example, be useful for research on fine-grained classification, where additional expert knowledge about distinctive object parts is often available. Images and annotations are available at: https://doi.org/10.5281/zenodo.5166987

【6】 Do Proportionate Algorithms Exploit Sparsity? Link: https://arxiv.org/abs/2108.06846

Authors: Markus V. S. Lima, Gabriel S. Chaves, Tadeu N. Ferreira, Paulo S. R. Diniz
Note: 5 pages, 2 figures, 6 sub-figures
Abstract: Adaptive filters exploiting sparsity have been a very active research field, among which algorithms that follow the "proportional-update principle", the so-called proportionate-type algorithms, are very popular. Indeed, there are hundreds of works on proportionate-type algorithms and, therefore, their advantages are widely known. This paper addresses the unexplored drawbacks and limitations of using proportional updates and their practical impact. Our findings include a theoretical justification for the poor performance of these algorithms in several sparse scenarios, and also when dealing with non-stationary and compressible systems. Simulation results corroborating the theory are presented.
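For readers unfamiliar with the family, one PNLMS step, the classic instance of the proportional-update principle, looks roughly as follows (typical textbook parameter choices, not taken from this paper):

```python
import numpy as np

def pnlms_step(w, x, d, mu=0.5, delta=1e-2, rho=0.01):
    """One proportionate NLMS update: each tap's step size is roughly
    proportional to its magnitude, so large (active) taps adapt faster."""
    e = d - w @ x                                   # a-priori error
    g = np.maximum(rho * max(delta, np.abs(w).max()), np.abs(w))
    g = g / g.sum()                                 # proportionate gains
    w = w + mu * e * (g * x) / (x @ (g * x) + delta)
    return w, e

rng = np.random.default_rng(0)
w_true = np.zeros(64); w_true[[3, 40]] = [1.0, -0.5]   # sparse system
w = np.zeros(64)
for _ in range(2000):
    x = rng.standard_normal(64)
    w, e = pnlms_step(w, x, w_true @ x)
print(np.abs(w - w_true).max())   # small residual misalignment
```

The paper's point is precisely about this gain rule: concentrating adaptation energy on large taps is not always a win, e.g., for compressible (rather than exactly sparse) or non-stationary systems.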

【7】 Batched Thompson Sampling for Multi-Armed Bandits Link: https://arxiv.org/abs/2108.06812

Authors: Nikolai Karpov, Qin Zhang
Affiliations: Indiana University Bloomington
Note: 9 pages
Abstract: We study Thompson Sampling algorithms for stochastic multi-armed bandits in the batched setting, in which we want to minimize the regret over a sequence of arm pulls using a small number of policy changes (or batches). We propose two algorithms and demonstrate their effectiveness through experiments on both synthetic and real datasets. We also analyze the proposed algorithms theoretically and obtain almost tight regret-batch trade-offs for the two-arm case.
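A minimal sketch of the batched mechanics for Bernoulli arms, with a geometric batch schedule assumed purely for illustration (the paper's schedules and analysis are more refined): Beta posteriors are frozen within each batch and updated only at the few batch boundaries:

```python
import numpy as np

rng = np.random.default_rng(1)

def batched_thompson(p_true, horizon=10_000, n_batches=10):
    k = len(p_true)
    alpha, beta = np.ones(k), np.ones(k)
    # Geometric batch sizes covering the horizon (illustrative schedule).
    sizes = np.diff(np.unique(np.geomspace(1, horizon, n_batches + 1).astype(int)))
    regret = 0.0
    for size in sizes:
        theta = rng.beta(alpha, beta, size=(size, k))  # frozen posterior
        arms = theta.argmax(axis=1)                    # whole batch at once
        rewards = rng.random(size) < p_true[arms]
        for a in range(k):                             # update at boundary
            alpha[a] += rewards[arms == a].sum()
            beta[a] += (arms == a).sum() - rewards[arms == a].sum()
        regret += (p_true.max() - p_true[arms]).sum()
    return regret

print(batched_thompson(np.array([0.5, 0.6])))
```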

【8】 Implicit Regularization of Bregman Proximal Point Algorithm and Mirror Descent on Separable Data Link: https://arxiv.org/abs/2108.06808

Authors: Yan Li, Caleb Ju, Ethan X. Fang, Tuo Zhao
Affiliations: School of Industrial and Systems Engineering, Georgia Institute of Technology; Department of Statistics, Pennsylvania State University
Abstract: The Bregman proximal point algorithm (BPPA), one of the centerpieces of the optimization toolbox, has been witnessing emerging applications. With a simple and easy-to-implement update rule, the algorithm bears several compelling intuitions for its empirical successes, yet rigorous justifications remain largely unexplored. We study the computational properties of BPPA through classification tasks with separable data, and demonstrate provable algorithmic regularization effects associated with BPPA. We show that BPPA attains a non-trivial margin, which closely depends on the condition number of the distance-generating function inducing the Bregman divergence. We further demonstrate that the dependence on the condition number is tight for a class of problems, thus showing the importance of the divergence in affecting the quality of the obtained solutions. In addition, we extend our findings to mirror descent (MD), for which we establish similar connections between the margin and the Bregman divergence. We demonstrate through a concrete example that BPPA/MD converges in direction to the maximal-margin solution with respect to the Mahalanobis distance. Our theoretical findings are among the first to demonstrate the benign learning properties of BPPA/MD, and also provide corroboration for a careful choice of divergence in algorithmic design.
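A small sketch of the MD side of the story (illustrative code, not the paper's): with the quadratic mirror map ψ(w) = ½ wᵀMw, whose Bregman divergence is the squared Mahalanobis distance, mirror descent on logistic loss is preconditioned gradient descent, and on separable data its normalized iterates are expected to drift toward a max-margin direction with respect to that distance:

```python
import numpy as np

def mirror_descent_mahalanobis(X, y, M, steps=2000, lr=0.1):
    """MD on logistic loss with mirror map psi(w) = 0.5 * w^T M w.
    The mirror step reduces to a gradient step preconditioned by M^{-1}."""
    M_inv = np.linalg.inv(M)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        margins = y * (X @ w)
        # Gradient of mean logistic loss log(1 + exp(-y * x @ w)).
        grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
        w = w - lr * M_inv @ grad
    return w / np.linalg.norm(w)     # direction of the iterates

rng = np.random.default_rng(0)
X = np.r_[rng.normal(2, 0.5, (20, 2)), rng.normal(-2, 0.5, (20, 2))]
y = np.r_[np.ones(20), -np.ones(20)]
print(mirror_descent_mahalanobis(X, y, np.diag([1.0, 4.0])))
```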

【9】 Deep Adversarially-Enhanced k-Nearest Neighbors Link: https://arxiv.org/abs/2108.06797

Authors: Ren Wang, Tianqi Chen
Affiliations: University of Michigan
Abstract: Recent works have shown, theoretically and empirically, that deep neural networks (DNNs) have an inherent vulnerability to small perturbations. Applying the Deep k-Nearest Neighbors (DkNN) classifier, we observe a dramatically increasing robustness-accuracy trade-off as layers go deeper. In this work, we propose a Deep Adversarially-Enhanced k-Nearest Neighbors (DAEkNN) method which achieves higher robustness than DkNN and mitigates the robustness-accuracy trade-off in deep layers through two key elements. First, DAEkNN is based on an adversarially trained model. Second, DAEkNN makes predictions by leveraging a weighted combination of benign and adversarial training data. Empirically, we find that DAEkNN improves both the robustness and the robustness-accuracy trade-off on the MNIST and CIFAR-10 datasets.
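The prediction rule described in the abstract can be sketched as a weighted k-NN vote over the two training sets; in the real method the features would come from an adversarially trained model, and the weighting scheme below is our assumption:

```python
import numpy as np
from collections import Counter

def daeknn_predict(x, benign, adv, k=5, w_benign=0.5):
    """Weighted k-NN vote over benign and adversarial training sets:
    each set contributes its k nearest neighbors, with votes scaled by
    a mixing weight. `benign` and `adv` are (features, labels) pairs."""
    votes = Counter()
    for (Xs, ys), w in ((benign, w_benign), (adv, 1.0 - w_benign)):
        idx = np.argsort(np.linalg.norm(Xs - x, axis=1))[:k]
        for label in ys[idx]:
            votes[label] += w
    return votes.most_common(1)[0][0]

Xb, yb = np.random.rand(100, 8), np.random.randint(0, 3, 100)
Xa, ya = Xb + 0.05 * np.random.randn(100, 8), yb   # stand-in adversarial set
print(daeknn_predict(np.random.rand(8), (Xb, yb), (Xa, ya)))
```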

【10】 Training for the Future: A Simple Gradient Interpolation Loss to Generalize Along Time Link: https://arxiv.org/abs/2108.06721

Authors: Anshul Nasery, Soumyadeep Thakur, Vihari Piratla, Abir De, Sunita Sarawagi
Affiliations: Indian Institute of Technology, Bombay
Abstract: In several real-world applications, machine learning models are deployed to make predictions on data whose distribution changes gradually along time, leading to a drift between the train and test distributions. Such models are often re-trained on new data periodically, and hence need to generalize to data not too far into the future. In this context, there is much prior work on enhancing temporal generalization, e.g., continuous transportation of past data, kernel-smoothed time-sensitive parameters and, more recently, adversarial learning of time-invariant features. However, these methods share several limitations, e.g., poor scalability, training instability, and dependence on unlabeled data from the future. Responding to the above limitations, we propose a simple method that starts with a model with time-sensitive parameters but regularizes its temporal complexity using a Gradient Interpolation (GI) loss. GI allows the decision boundary to change along time and can still prevent overfitting to the limited training time snapshots by allowing task-specific control over changes along time. We compare our method to existing baselines on multiple real-world datasets, which show that GI outperforms more complicated generative and adversarial approaches on the one hand, and simpler gradient-regularization methods on the other.
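The general shape of the idea, a time-conditioned model whose temporal complexity is penalized, can be sketched as below; note that this penalizes the raw time-derivative of the logits and is only in the spirit of the paper's GI loss, which additionally allows task-specific control over changes along time:

```python
import torch
import torch.nn as nn

# Toy time-conditioned classifier: the last input feature is time.
model = nn.Sequential(nn.Linear(11, 32), nn.ReLU(), nn.Linear(32, 2))

def loss_fn(x, t, y, lam=0.1):
    """Task loss plus a penalty on d(logits)/dt, discouraging the
    decision boundary from changing too fast along time (a simplified
    stand-in for the paper's Gradient Interpolation loss)."""
    t = t.clone().requires_grad_(True)
    logits = model(torch.cat([x, t[:, None]], dim=1))
    task = nn.functional.cross_entropy(logits, y)
    dt = torch.autograd.grad(logits.sum(), t, create_graph=True)[0]
    return task + lam * dt.pow(2).mean()

x, t = torch.randn(16, 10), torch.rand(16)
y = torch.randint(0, 2, (16,))
print(loss_fn(x, t, y))
```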

【11】 Neuron Campaign for Initialization Guided by Information Bottleneck Theory Link: https://arxiv.org/abs/2108.06530

Authors: Haitao Mao, Xu Chen, Qiang Fu, Lun Du, Shi Han, Dongmei Zhang
Affiliations: University of Electronic Science and Technology of China, Chengdu, China; Peking University, Beijing, China; Microsoft Research Asia
Note: 5 pages, accepted by CIKM'21
Abstract: Initialization plays a critical role in the training of deep neural networks (DNNs). Existing initialization strategies mainly focus on stabilizing the training process to mitigate gradient vanishing/explosion problems. However, these methods give little consideration to how to enhance generalization ability. The Information Bottleneck (IB) theory is a well-known framework for understanding and explaining the generalization of DNNs. Guided by the insights provided by IB theory, we design two criteria for better initializing DNNs. We further design a neuron campaign initialization algorithm to efficiently select a good initialization for a neural network on a given dataset. Experiments on the MNIST dataset show that our method can lead to better generalization performance with faster convergence.

【12】 DQN Control Solution for KDD Cup 2021 City Brain Challenge Link: https://arxiv.org/abs/2108.06491

Authors: Yitian Chen, Kunlong Chen, Kunjin Chen, Lin Wang
Affiliations: BIGO Technology; Alibaba Group
Note: 5 pages, report for KDD Cup 2021 City Brain Challenge workshop
Abstract: We took part in the City Brain Challenge competition and achieved 8th place. In this competition, the players are provided with a real-world city-scale road network and its traffic demand derived from real traffic data. The players are asked to coordinate the traffic signals with a self-designed agent to maximize the number of vehicles served while maintaining an acceptable delay. In this abstract paper, we present an overall analysis and our detailed solution to this competition. Our approach is mainly based on an adaptation of the deep Q-network (DQN) for real-time traffic signal control. From our perspective, the major challenge of this competition is how to extend the classical DQN framework to traffic signal control in a complex real-world road network and traffic flow situation. After trying and implementing several classical reward functions, we finally chose to apply our newly designed reward in our agent. By applying our newly proposed reward function and carefully tuning the control scheme, an agent based on a single DQN model can rank among the top 15 teams. We hope this paper can serve, to some extent, as a baseline solution for traffic signal control of real-world road networks and inspire further attempts and research.

【13】 Fast predictions of lattice energies by continuous isometry invariants of crystal structures Link: https://arxiv.org/abs/2108.07233

Authors: Jakob Ropers, Marco M Mosca, Olga Anosova, Vitaliy Kurlin, Andrew I Cooper
Affiliations: University of Liverpool
Note: To appear in the proceedings of DACOMSIN (Data and Computation for Materials Science and Innovation) 2021, this https URL
Abstract: Crystal Structure Prediction (CSP) aims to discover solid crystalline materials by optimizing periodic arrangements of atoms, ions or molecules. CSP takes weeks of supercomputer time because of slow energy minimizations for millions of simulated crystals. The lattice energy is a key physical property that determines the thermodynamic stability of a crystal but has no simple analytic expression. Past machine learning approaches to predicting the lattice energy used slow crystal descriptors that depend on manually chosen parameters. The new area of Periodic Geometry offers much faster isometry invariants that are also continuous under perturbations of atoms. Our experiments on simulated crystals confirm that a small distance between the new invariants guarantees a small difference of energies. We compare several kernel methods for invariant-based predictions of energy and achieve a mean absolute error of less than 5 kJ/mol, or 0.05 eV/atom, on a dataset of 5,679 crystals.
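A sketch of the prediction pipeline with scikit-learn, using random placeholder descriptors and fake energies in place of the real Periodic Geometry invariants and lattice energies (the paper compares several kernels on the actual data):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_score

X = np.random.rand(200, 50)    # invariant vectors, one row per crystal
y = np.random.rand(200) * 10   # lattice energies in kJ/mol (placeholder)

model = KernelRidge(kernel="rbf", alpha=1e-2, gamma=0.1)
scores = cross_val_score(model, X, y,
                         scoring="neg_mean_absolute_error", cv=5)
print(-scores.mean())          # cross-validated MAE
```

Because the invariants are continuous under atomic perturbations, nearby descriptor vectors are guaranteed to have nearby energies, which is what makes kernel regression on them well-posed.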

【14】 Robust Trimmed k-means Link: https://arxiv.org/abs/2108.07186

Authors: Olga Dorabiala, J. Nathan Kutz, Aleksandr Aravkin
Affiliations: Department of Applied Mathematics, University of Washington, Seattle, WA
Note: 14 pages, 6 figures, one table
Abstract: Clustering is a fundamental tool in unsupervised learning, used to group objects by distinguishing between similar and dissimilar features of a given data set. One of the most common clustering algorithms is k-means. Unfortunately, when dealing with real-world data, many traditional clustering algorithms are compromised by a lack of clear separation between groups, noisy observations, and/or outlying data points. Thus, robust statistical algorithms are required for successful data analytics. Current methods that robustify k-means clustering are specialized for either single- or multi-membership data, but do not perform competitively in both cases. We propose an extension of the k-means algorithm, which we call Robust Trimmed k-means (RTKM), that simultaneously identifies outliers and clusters points and can be applied to either single- or multi-membership data. We test RTKM on various real-world datasets and show that it performs competitively with other methods on single-membership data with outliers and on multi-membership data without outliers. We also show that RTKM leverages its relative advantages to outperform other methods on multi-membership data containing outliers.
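The single-membership core of the trimming idea can be sketched as follows (RTKM itself also handles multi-membership data and is formulated differently): at each iteration, the fraction of points farthest from their nearest centroid is flagged as outliers and excluded from the centroid update:

```python
import numpy as np

def trimmed_kmeans(X, k, trim=0.1, iters=50, seed=0):
    """Minimal trimmed k-means: jointly clusters points and flags the
    `trim` fraction of farthest points as outliers each iteration."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        labels, dist = d.argmin(axis=1), d.min(axis=1)
        keep = dist <= np.quantile(dist, 1 - trim)   # drop farthest points
        C = np.array([X[keep & (labels == j)].mean(axis=0)
                      if np.any(keep & (labels == j)) else C[j]
                      for j in range(k)])
    return C, labels, ~keep   # centroids, assignments, outlier mask

X = np.r_[np.random.randn(100, 2), np.random.randn(100, 2) + 6,
          np.random.uniform(-20, 20, (10, 2))]       # two blobs + outliers
C, labels, outliers = trimmed_kmeans(X, k=2, trim=0.05)
print(C.round(1), outliers.sum())
```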
