Machine Learning Academic Digest [8.20]

2021-08-24


cs.LG: 80 papers today

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (7 papers)

【1】 EqGNN: Equalized Node Opportunity in Graphs Link: https://arxiv.org/abs/2108.08800

Authors: Uriel Singer, Kira Radinsky Affiliation: Technion, Israel Institute of Technology, Haifa, Israel Remarks: 10 pages, 3 figures, 4 tables, 2 algorithms Abstract: Graph neural networks (GNNs) have been widely used for supervised learning tasks on graphs, reaching state-of-the-art results. However, little work has been dedicated to creating unbiased GNNs, i.e., ones whose classification is uncorrelated with sensitive attributes such as race or gender. Some approaches ignore the sensitive attributes or optimize for the statistical parity criterion of fairness. However, it has been shown that neither approach ensures fairness; rather, both cripple the utility of the prediction task. In this work, we present a GNN framework that allows optimizing representations for the Equalized Odds fairness criterion. The architecture is composed of three components: (1) a GNN classifier predicting the utility class, and (2) a sampler learning the distribution of the sensitive attributes of the nodes given their labels, whose generated samples are fed into (3) a discriminator that distinguishes between true and sampled sensitive attributes using a novel "permutation loss" function. Using these components, we train a model to neglect information regarding the sensitive attribute only with respect to its label. To the best of our knowledge, we are the first to optimize GNNs for the equalized odds criterion. We evaluate our classifier on several graph datasets and sensitive attributes and show that our algorithm reaches state-of-the-art results.
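The adversarial game behind this equalized-odds objective can be pictured compactly. The snippet below is a minimal stand-in, not the authors' code: plain MLPs replace the GNN and the sampler, and an ordinary cross-entropy adversary replaces the paper's "permutation loss"; conditioning the discriminator on the label is what targets equalized odds rather than statistical parity.

```python
# Hedged sketch of label-conditioned adversarial debiasing (illustration only).
import torch
import torch.nn as nn

n, d, n_classes, n_sens = 256, 16, 4, 2
x = torch.randn(n, d)                      # node features (graph structure omitted)
y = torch.randint(0, n_classes, (n,))      # utility labels
s = torch.randint(0, n_sens, (n,))         # sensitive attributes

classifier = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, n_classes))
# The discriminator sees (prediction, label) and guesses the sensitive attribute.
discriminator = nn.Sequential(nn.Linear(n_classes + n_classes, 32), nn.ReLU(),
                              nn.Linear(32, n_sens))

opt_c = torch.optim.Adam(classifier.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

for step in range(100):
    logits = classifier(x)
    y_onehot = nn.functional.one_hot(y, n_classes).float()
    d_in = torch.cat([logits, y_onehot], dim=1)

    # 1) Discriminator step: recover the sensitive attribute from
    #    (prediction, label); conditioning on y targets equalized odds.
    d_loss = ce(discriminator(d_in.detach()), s)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Classifier step: fit the labels while fooling the discriminator.
    c_loss = ce(logits, y) - 0.5 * ce(discriminator(d_in), s)
    opt_c.zero_grad(); c_loss.backward(); opt_c.step()
```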

【2】 Temporal Graph Network Embedding with Causal Anonymous Walks Representations Link: https://arxiv.org/abs/2108.08754

Authors: Ilya Makarov, Andrey Savchenko, Arseny Korovko, Leonid Sherstyuk, Nikita Severin, Aleksandr Mikheev, Dmitrii Babaev Affiliation: HSE University, Moscow, Russia; University of Ljubljana, Ljubljana, Slovenia; Moscow Institute of Physics and Technology; Sber AI Lab Remarks: 10 pages, 3 figures Abstract: Many tasks in graph machine learning, such as link prediction and node classification, are typically solved via representation learning, in which each node or edge in the network is encoded as an embedding. Although many network embedding methods exist for static graphs, the task becomes much more complicated when a dynamic (i.e., temporal) network is analyzed. In this paper, we propose a novel approach to dynamic network representation learning based on Temporal Graph Networks, using a highly customized message-generating function built by extracting Causal Anonymous Walks. For evaluation, we provide a benchmark pipeline for assessing temporal network embeddings. This work offers the first comprehensive comparison framework for temporal network representation learning across every available setting for graph machine learning problems involving node classification and link prediction. The proposed model outperforms state-of-the-art baseline models, and the work characterizes the differences between them through evaluation on various transductive/inductive edge/node classification tasks. In addition, we show the applicability and superior performance of our model on a real-world downstream graph machine learning task provided by one of the top European banks, involving credit scoring based on transaction data.

【3】 SiReN: Sign-Aware Recommendation Using Graph Neural Networks Link: https://arxiv.org/abs/2108.08735

Authors: Changwon Seo, Kyeong-Joong Jeong, Sungsu Lim, Won-Yong Shin Remarks: 14 pages, 5 figures, 6 tables Abstract: In recent years, many recommender systems using network embedding (NE), such as graph neural networks (GNNs), have been studied extensively with the aim of improving recommendation accuracy. However, such attempts have focused mostly on utilizing only the information of positive user-item interactions with high ratings. Thus, a challenge remains in how to make use of low ratings for representing users' preferences, since low ratings can still be informative in designing NE-based recommender systems. In this study, we present SiReN, a new sign-aware recommender system based on GNN models. Specifically, SiReN has three key components: 1) constructing a signed bipartite graph to represent users' preferences more precisely, which is split into two edge-disjoint graphs with positive and negative edges, respectively; 2) generating two embeddings for the partitioned graphs, via a GNN model for the positive-edge graph and a multi-layer perceptron (MLP) for the negative-edge graph, and then using an attention model to obtain the final embeddings; and 3) establishing a sign-aware Bayesian personalized ranking (BPR) loss function in the optimization process. Through comprehensive experiments, we empirically demonstrate that SiReN consistently outperforms state-of-the-art NE-aided recommendation methods.
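The sign-aware BPR loss in component 3) can be illustrated with a short sketch. The exact form SiReN uses is not given in the abstract, so the variant below, which ranks positively rated items above both negatively rated and unobserved ones with a hypothetical weight beta, is an assumption for illustration.

```python
# Hedged sketch of a sign-aware BPR-style loss (an illustrative form, not
# necessarily SiReN's exact definition).
import torch
import torch.nn.functional as F

def sign_aware_bpr(score_pos, score_neg, score_unobs, beta=0.5):
    """Scores are (batch,) predictions for a user's positively rated,
    negatively rated, and unobserved items, respectively."""
    loss_pos_vs_neg = -F.logsigmoid(score_pos - score_neg).mean()
    loss_pos_vs_unobs = -F.logsigmoid(score_pos - score_unobs).mean()
    return loss_pos_vs_neg + beta * loss_pos_vs_unobs

# toy usage with random embeddings and inner-product scores
u = torch.randn(32, 64)                                  # user embeddings
i_pos, i_neg, i_un = (torch.randn(32, 64) for _ in range(3))
loss = sign_aware_bpr((u * i_pos).sum(-1), (u * i_neg).sum(-1), (u * i_un).sum(-1))
```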

【4】 Blockchain Phishing Scam Detection via Multi-channel Graph Classification Link: https://arxiv.org/abs/2108.08456

Authors: Dunjie Zhang, Jinyin Chen Affiliation: College of Information Engineering, Zhejiang University of Technology, Hangzhou; Institute of Cyberspace Security, Zhejiang University of Technology, Hangzhou, China Abstract: With the popularity of blockchain technology, the financial security issues of blockchain transaction networks have become increasingly serious. Phishing scam detection methods can protect potential victims and build a healthier blockchain ecosystem. Existing works usually define phishing scam detection as a node classification task, learning the latent features of users through graph embedding methods such as random walks or graph neural networks (GNNs). However, these detection methods suffer from high complexity due to the large scale of the blockchain transaction network and ignore the temporal information of transactions. To address this problem, we define transaction pattern graphs for users and transform phishing scam detection into a graph classification task. To extract richer information from the input graph, we propose a multi-channel graph classification model (MCGC) with multiple feature extraction channels for the GNN. The transaction pattern graphs and MCGC are better able to detect potential phishing scammers by extracting the transaction pattern features of the target users. Extensive experiments on seven benchmark datasets and an Ethereum dataset demonstrate that the proposed MCGC not only achieves state-of-the-art performance on the graph classification task but also achieves effective phishing scam detection based on the target users' transaction pattern graphs.

【5】 Computing Steiner Trees using Graph Neural Networks Link: https://arxiv.org/abs/2108.08368

Authors: Reyan Ahmed, Md Asadullah Turja, Faryad Darabi Sahneh, Mithun Ghosh, Keaton Hamm, Stephen Kobourov Abstract: Graph neural networks have been successful in many learning problems and real-world applications. A recent line of research explores the power of graph neural networks to solve combinatorial and graph-algorithmic problems such as subgraph isomorphism, clique detection, and the traveling salesman problem. However, many NP-complete problems remain unexplored with this method. In this paper, we tackle the Steiner Tree Problem. We employ four learning frameworks to compute low-cost Steiner trees: feed-forward neural networks, graph neural networks, graph convolutional networks, and a graph attention model. We use these frameworks in two fundamentally different ways: 1) to train the models to learn the actual Steiner tree nodes, and 2) to train the models to learn good Steiner point candidates to be connected to the constructed tree via a shortest path in a greedy fashion. We illustrate the robustness of our heuristics on several random graph generation models as well as the SteinLib data library. Our findings suggest that the out-of-the-box application of GNN methods does worse than the classic 2-approximation method. However, when combined with a greedy shortest-path construction, it performs even slightly better than the 2-approximation algorithm. This result sheds light on the fundamental capabilities and limitations of graph learning techniques on classical NP-complete problems.
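Strategy 2), greedily attaching candidates to the partial tree via shortest paths, looks roughly like the sketch below, where a cheapest-connection rule stands in for the learned candidate scores (the trained model and the exact attachment rule are not specified in the abstract and are assumptions here).

```python
# Hedged sketch of a greedy shortest-path Steiner construction (illustration).
import random
import networkx as nx

G = nx.connected_watts_strogatz_graph(50, 4, 0.3, seed=0)
for u, v in G.edges:
    G[u][v]["weight"] = random.random()
terminals = {0, 7, 23, 41}

tree_nodes = {next(iter(terminals))}
remaining = set(terminals) - tree_nodes
while remaining:
    # Attach the remaining terminal that is cheapest to reach from the tree;
    # a learned model would instead rank candidate Steiner points here.
    best = min(
        ((t, nx.shortest_path(G, n, t, weight="weight"))
         for t in remaining for n in tree_nodes),
        key=lambda x: nx.path_weight(G, x[1], weight="weight"),
    )
    tree_nodes.update(best[1])   # add every node on the connecting path
    remaining.discard(best[0])

print(sorted(tree_nodes))        # node set spanning all terminals
```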

【6】 Multivariate and Propagation Graph Attention Network for Spatial-Temporal Prediction with Outdoor Cellular Traffic Link: https://arxiv.org/abs/2108.08307

Authors: Chung-Yi Lin, Hung-Ting Su, Shen-Lung Tung, Winston Hsu Affiliation: National Taiwan University; Chunghwa Telecom Laboratories Remarks: 5 pages, 5 figures Abstract: Spatial-temporal prediction is a critical problem for intelligent transportation and is helpful for tasks such as traffic control and accident prevention. Previous studies rely on large-scale traffic data collected from sensors. However, it is unlikely that sensors can be deployed in all regions due to device and maintenance costs. This paper addresses the problem via outdoor cellular traffic distilled from over two billion records per day at a telecom company, because outdoor cellular traffic induced by user mobility is highly correlated with transportation traffic. We study road intersections in urban areas and aim to predict the future outdoor cellular traffic of all intersections given historical outdoor cellular traffic. Furthermore, we propose a new model for multivariate spatial-temporal prediction, mainly consisting of two extended graph attention networks (GAT). The first GAT is used to explore correlations among multivariate cellular traffic. The other GAT leverages the attention mechanism in graph propagation to increase the efficiency of capturing spatial dependency. Experiments show that the proposed model significantly outperforms the state-of-the-art methods on our dataset.

【7】 Clustering dynamics on graphs: from spectral clustering to mean shift through Fokker-Planck interpolation Link: https://arxiv.org/abs/2108.08687

Authors: Katy Craig, Nicolás García Trillos, Dejan Slepčev Abstract: In this work we build a unifying framework to interpolate between density-driven and geometry-based algorithms for data clustering and, specifically, to connect the mean shift algorithm with spectral clustering at the discrete and continuum levels. We seek this connection through the introduction of Fokker-Planck equations on data graphs. Besides introducing new forms of mean shift algorithms on graphs, we provide new theoretical insights into the behavior of the family of diffusion maps in the large-sample limit, as well as new connections between diffusion maps and mean shift dynamics on a fixed graph. Several numerical examples illustrate our theoretical findings and highlight the benefits of interpolating density-driven and geometry-based clustering algorithms.

Transformer (4 papers)

【1】 PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers Link: https://arxiv.org/abs/2108.08839

Authors: Xumin Yu, Yongming Rao, Ziyi Wang, Zuyan Liu, Jiwen Lu, Jie Zhou Affiliation: Department of Automation, Tsinghua University, China; State Key Lab of Intelligent Technologies and Systems, China; Beijing National Research Center for Information Science and Technology, China Remarks: Accepted to ICCV 2021 (Oral Presentation) Abstract: Point clouds captured in real-world applications are often incomplete due to limited sensor resolution, single viewpoints, and occlusion. Therefore, recovering complete point clouds from partial ones becomes an indispensable task in many practical applications. In this paper, we present a new method that reformulates point cloud completion as a set-to-set translation problem and design a new model, called PoinTr, that adopts a transformer encoder-decoder architecture for point cloud completion. By representing the point cloud as a set of unordered groups of points with position embeddings, we convert the point cloud into a sequence of point proxies and employ transformers for point cloud generation. To help the transformers better leverage the inductive bias about the 3D geometric structure of point clouds, we further devise a geometry-aware block that models local geometric relationships explicitly. The migration of transformers enables our model to better learn structural knowledge and preserve detailed information for point cloud completion. Furthermore, we propose two more challenging benchmarks with more diverse incomplete point clouds that better reflect real-world scenarios, to promote future research. Experimental results show that our method outperforms state-of-the-art methods by a large margin on both the new benchmarks and the existing ones. Code is available at https://github.com/yuxumin/PoinTr

【2】 Do Vision Transformers See Like Convolutional Neural Networks? Link: https://arxiv.org/abs/2108.08810

Authors: Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy Affiliation: Google Research, Brain Team Abstract: Convolutional neural networks (CNNs) have so far been the de-facto model for visual data. Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or even superior performance on image classification tasks. This raises a central question: how are Vision Transformers solving these tasks? Are they acting like convolutional networks, or learning entirely different visual representations? Analyzing the internal representation structure of ViTs and CNNs on image classification benchmarks, we find striking differences between the two architectures, such as ViT having more uniform representations across all layers. We explore how these differences arise, finding crucial roles played by self-attention, which enables early aggregation of global information, and ViT residual connections, which strongly propagate features from lower to higher layers. We study the ramifications for spatial localization, demonstrating that ViTs successfully preserve input spatial information, with noticeable effects from different classification methods. Finally, we study the effect of (pretraining) dataset scale on intermediate features and transfer learning, and conclude with a discussion on connections to new architectures such as the MLP-Mixer.
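Layer-by-layer comparisons behind findings like "ViT has more uniform representations across all layers" are typically made with centered kernel alignment (CKA). Below is a minimal linear-CKA sketch; it is the standard formulation written for illustration, not code from the paper, and the toy activations are assumptions.

```python
# Minimal linear-CKA sketch for comparing two layers' activations.
import numpy as np

def linear_cka(X, Y):
    """X: (n, d1), Y: (n, d2) activations for the same n examples."""
    X = X - X.mean(axis=0)                      # center each feature
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2  # unnormalized similarity
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
acts_a = rng.normal(size=(512, 768))            # e.g., one block's outputs, flattened
acts_b = acts_a @ rng.normal(size=(768, 256))   # a linearly related second layer
acts_c = rng.normal(size=(512, 256))            # unrelated random features
print(linear_cka(acts_a, acts_b), linear_cka(acts_a, acts_c))
# the related pair should score noticeably higher than the unrelated pair
```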

【3】 A Multi-input Multi-output Transformer-based Hybrid Neural Network for Multi-class Privacy Disclosure Detection Link: https://arxiv.org/abs/2108.08483

Authors: A K M Nuhil Mehdy, Hoda Mehrpouyan Affiliation: Department of Computer Science, Boise State University, Idaho, USA Abstract: Concern regarding users' data privacy has risen to its highest level due to the massive increase in communication platforms and social networking sites, and greater user participation in online public discourse. An increasing number of people exchange private information via emails, text messages, and social media without being aware of the risks and implications. Researchers in the field of Natural Language Processing (NLP) have concentrated on creating tools and strategies to identify, categorize, and sanitize private information in text data, since a substantial amount of data is exchanged in textual form. However, most detection methods rely solely on the presence of pre-identified keywords in the text and disregard the inferred underlying meaning of the utterance in a specific context. Hence, in some situations these tools and algorithms fail to detect disclosure, or the produced results are misclassified. In this paper, we propose a multi-input, multi-output hybrid neural network that utilizes transfer learning, linguistics, and metadata to learn the hidden patterns. Our goal is to better classify disclosure/non-disclosure content in terms of the context of the situation. We trained and evaluated our model on a human-annotated ground-truth dataset containing a total of 5,400 tweets. The results show that, by jointly learning two separate tasks, the proposed model was able to identify privacy disclosure in tweets with an accuracy of 77.4% while classifying the information type of those tweets with an impressive accuracy of 99%.

【4】 Contributions of Transformer Attention Heads in Multi- and Cross-lingual Tasks Link: https://arxiv.org/abs/2108.08375

Authors: Weicheng Ma, Kai Zhang, Renze Lou, Lili Wang, Soroush Vosoughi Affiliation: Department of Computer Science, Dartmouth College; Department of Computer Science and Technology, Tsinghua University; Department of Computer Science, Zhejiang University City College Remarks: In ACL 2021 Abstract: This paper studies the relative importance of attention heads in Transformer-based models to aid their interpretability in cross-lingual and multi-lingual tasks. Prior research has found that only a few attention heads are important in each mono-lingual Natural Language Processing (NLP) task, and pruning the remaining heads leads to comparable or improved performance of the model. However, the impact of pruning attention heads is not yet clear in cross-lingual and multi-lingual tasks. Through extensive experiments, we show that (1) pruning a number of attention heads in a multi-lingual Transformer-based model has, in general, positive effects on its performance in cross-lingual and multi-lingual tasks, and (2) the attention heads to be pruned can be ranked using gradients and identified with a few trial experiments. Our experiments focus on sequence labeling tasks, with potential applicability to other cross-lingual and multi-lingual tasks. For comprehensiveness, we examine two pre-trained multi-lingual models, namely multi-lingual BERT (mBERT) and XLM-R, on three tasks across 9 languages each. We also discuss the validity of our findings and their extensibility to truly resource-scarce languages and other task settings.
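Finding (2), ranking attention heads by gradients, is commonly implemented by attaching a multiplicative gate to each head and scoring heads by |gate x gradient|. The toy single-layer attention below sketches that recipe; the model size and scoring rule are assumptions, not the paper's exact procedure.

```python
# Hedged sketch of gradient-based attention-head ranking (illustration only).
import torch
import torch.nn as nn

n_heads, d_head, d_model, seq, batch = 4, 16, 64, 10, 8
wq, wk = nn.Linear(d_model, d_model), nn.Linear(d_model, d_model)
wv, wo = nn.Linear(d_model, d_model), nn.Linear(d_model, d_model)
gates = torch.ones(n_heads, requires_grad=True)  # one multiplicative gate per head

x = torch.randn(batch, seq, d_model)
labels = torch.randint(0, 5, (batch, seq))       # toy sequence-labeling targets
head = nn.Linear(d_model, 5)

def split(t):  # (b, s, d_model) -> (b, n_heads, s, d_head)
    return t.view(batch, seq, n_heads, d_head).transpose(1, 2)

q, k, v = split(wq(x)), split(wk(x)), split(wv(x))
attn = torch.softmax(q @ k.transpose(-1, -2) / d_head ** 0.5, dim=-1)
out = (attn @ v) * gates.view(1, n_heads, 1, 1)  # gate each head's output
out = wo(out.transpose(1, 2).reshape(batch, seq, d_model))

loss = nn.functional.cross_entropy(head(out).view(-1, 5), labels.view(-1))
loss.backward()
importance = (gates.detach() * gates.grad).abs()  # low score = pruning candidate
print(importance.argsort())                       # ascending: prune-first ordering
```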

GAN | Adversarial | Attacks | Generation (8 papers)

【1】 Provably Efficient Generative Adversarial Imitation Learning for Online and Offline Setting with Linear Function Approximation Link: https://arxiv.org/abs/2108.08765

Authors: Zhihan Liu, Yufeng Zhang, Zuyue Fu, Zhuoran Yang, Zhaoran Wang Remarks: 54 pages, in submission Abstract: In generative adversarial imitation learning (GAIL), the agent aims to learn a policy from an expert demonstration such that its performance cannot be discriminated from the expert policy on a certain predefined reward set. In this paper, we study GAIL in both online and offline settings with linear function approximation, where both the transition and the reward function are linear in the feature maps. Besides the expert demonstration, in the online setting the agent can interact with the environment, while in the offline setting the agent only accesses an additional dataset collected a priori. For online GAIL, we propose an optimistic generative adversarial policy optimization algorithm (OGAP) and prove that OGAP achieves $\widetilde{\mathcal{O}}(H^2 d^{3/2}K^{1/2} + KH^{3/2}dN_1^{-1/2})$ regret. Here $N_1$ represents the number of trajectories of the expert demonstration, $d$ is the feature dimension, and $K$ is the number of episodes. For offline GAIL, we propose a pessimistic generative adversarial policy optimization algorithm (PGAP). For an arbitrary additional dataset, we obtain the optimality gap of PGAP, achieving the minimax lower bound in the utilization of the additional dataset. Assuming sufficient coverage of the additional dataset, we show that PGAP achieves an $\widetilde{\mathcal{O}}(H^{2}dK^{-1/2} + H^2 d^{3/2}N_2^{-1/2} + H^{3/2}dN_1^{-1/2})$ optimality gap. Here $N_2$ represents the number of trajectories of the additional dataset with sufficient coverage.

【2】 Dynamic Difficulty Adjustment in Virtual Reality Exergames through Experience-driven Procedural Content Generation Link: https://arxiv.org/abs/2108.08762

Authors: Tobias Huber, Silvan Mertes, Stanislava Rangelova, Simon Flutura, Elisabeth André Affiliation: University of Augsburg, Augsburg, Germany Abstract: Virtual Reality (VR) games that feature physical activities have been shown to increase players' motivation to do physical exercise. However, for such exercises to have a positive healthcare effect, they have to be repeated several times a week. To maintain player motivation over longer periods of time, games often employ Dynamic Difficulty Adjustment (DDA) to adapt the game's challenge according to the player's capabilities. For exercise games, this is mostly done by tuning specific in-game parameters like the speed of objects. In this work, we propose to use experience-driven Procedural Content Generation for DDA in VR exercise games by procedurally generating levels that match the player's current capabilities. Creating completely new levels, rather than merely fine-tuning specific parameters, has the potential to decrease repetition over longer time periods and allows for the simultaneous adaptation of the cognitive and physical challenge of the exergame. As a proof of concept, we implement an initial prototype in which the player must traverse a maze that includes several exercise rooms, whereby the generation of the maze is realized by a neural network. Passing those exercise rooms requires the player to perform physical activities. To match the player's capabilities, we use Deep Reinforcement Learning to adjust the structure of the maze and to decide which exercise rooms to include in it. We evaluate our prototype in an exploratory user study utilizing both biodata and subjective questionnaires.

【3】 An Innovative Attack Modelling and Attack Detection Approach for a Waiting Time-based Adaptive Traffic Signal Controller Link: https://arxiv.org/abs/2108.08627

Authors: Sagar Dasgupta, Courtland Hollis, Mizanur Rahman, Travis Atkison Affiliation: Department of Civil, Construction & Environmental Engineering and Department of Computer Science, The University of Alabama, Tuscaloosa, AL Abstract: An adaptive traffic signal controller (ATSC) combined with the connected vehicle (CV) concept uses real-time vehicle trajectory data to regulate green time; it can reduce intersection waiting time significantly and thereby improve travel time in a signalized corridor. However, a CV-based ATSC enlarges the attack surface vulnerable to potential cyber-attacks, allowing an attacker to create disastrous traffic congestion in a roadway network. An attacker can congest a route by generating fake vehicles that obey traffic and car-following rules at a slow rate, so that the signal timing and phase change without any abrupt change in the number of vehicles. Because of the adaptive nature of an ATSC, it is challenging both to model this kind of attack and to develop a detection strategy. This paper introduces an innovative "slow poisoning" cyberattack for a waiting-time-based ATSC algorithm and a corresponding detection strategy. Thus, the objectives of this paper are to: (i) develop a "slow poisoning" attack generation strategy for an ATSC, and (ii) develop a prediction-based "slow poisoning" attack detection strategy using a recurrent neural network, i.e., a long short-term memory (LSTM) model. We generated a "slow poisoning" attack modeling strategy using a microscopic traffic simulator, Simulation of Urban Mobility (SUMO), and used data generated from the simulation to develop both the attack model and the detection model. Our analyses reveal that the attack strategy is effective in creating congestion on an approach and that the detection strategy is able to flag the attack.
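Objective (ii)'s prediction-based detector can be sketched as follows: an LSTM learns to forecast the next observation of the controller's input series, and unusually large forecast residuals flag a possible "slow poisoning" attack. The synthetic series, window length, and 3-sigma rule below are illustrative assumptions.

```python
# Hedged sketch of LSTM prediction-residual attack flagging (illustration).
import torch
import torch.nn as nn

torch.manual_seed(0)
series = torch.sin(torch.linspace(0, 30, 400)) + 0.1 * torch.randn(400)  # stand-in for waiting times
win = 20
X = torch.stack([series[i:i + win] for i in range(len(series) - win)]).unsqueeze(-1)
y = series[win:]

lstm = nn.LSTM(input_size=1, hidden_size=32, batch_first=True)
readout = nn.Linear(32, 1)
opt = torch.optim.Adam(list(lstm.parameters()) + list(readout.parameters()), lr=1e-2)

for _ in range(200):                       # fit a one-step-ahead predictor
    out, _ = lstm(X)
    pred = readout(out[:, -1]).squeeze(-1)
    loss = nn.functional.mse_loss(pred, y)
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    resid = (readout(lstm(X)[0][:, -1]).squeeze(-1) - y).abs()
    flag = resid > resid.mean() + 3 * resid.std()   # large residuals flag an attack
print(int(flag.sum()), "windows flagged")
```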

【4】 Image2Lego: Customized LEGO Set Generation from Images Link: https://arxiv.org/abs/2108.08477

Authors: Kyle Lennon, Katharina Fransen, Alexander O'Brien, Yumeng Cao, Matthew Beveridge, Yamin Arefeen, Nikhil Singh, Iddo Drori Affiliation: Massachusetts Institute of Technology Remarks: 9 pages, 10 figures Abstract: Although LEGO sets have entertained generations of children and adults, the challenge of designing customized builds matching the complexity of real-world or imagined scenes remains too great for the average enthusiast. In order to make this feat possible, we implement a system that generates a LEGO brick model from 2D images. We design a novel solution to this problem that uses an octree-structured autoencoder trained on 3D voxelized models to obtain a feasible latent representation for model reconstruction, and a separate network trained to predict this latent representation from 2D images. LEGO models are obtained by algorithmic conversion of the 3D voxelized model to bricks. We demonstrate a first-of-its-kind conversion of photographs to 3D LEGO models. The octree architecture provides the flexibility to produce multiple resolutions to best fit a user's creative vision or design needs. In order to demonstrate the broad applicability of our system, we generate step-by-step building instructions and animations for LEGO models of objects and human faces. Finally, we test these automatically generated LEGO sets by constructing physical builds using real LEGO bricks.

【5】 Exploiting Multi-Object Relationships for Detecting Adversarial Attacks in Complex Scenes Link: https://arxiv.org/abs/2108.08421

Authors: Mingjun Yin, Shasha Li, Zikui Cai, Chengyu Song, M. Salman Asif, Amit K. Roy-Chowdhury, Srikanth V. Krishnamurthy Affiliation: University of California, Riverside, USA Remarks: Accepted at ICCV'21 Abstract: Vision systems that deploy Deep Neural Networks (DNNs) are known to be vulnerable to adversarial examples. Recent research has shown that checking the intrinsic consistencies in the input data is a promising way to detect adversarial attacks (e.g., by checking the object co-occurrence relationships in complex scenes). However, existing approaches are tied to specific models and do not offer generalizability. Motivated by the observation that language descriptions of natural scene images have already captured the object co-occurrence relationships that can be learned by a language model, we develop a novel approach to perform context consistency checks using such language models. The distinguishing aspect of our approach is that it is independent of the deployed object detector and yet offers very high accuracy in terms of detecting adversarial examples in practical scenes with multiple objects.

【6】 Discriminating modelling approaches for Point in Time Economic Scenario Generation Link: https://arxiv.org/abs/2108.08818

Authors: Rui Wang Affiliation: Department of Mathematics, D-MATH, ETH Zürich; Supervisors: Prof. Dr. Patrick Cheridito, Binghuan Lin (UBS) Remarks: 49 pages, 20 figures Abstract: We introduce the notion of Point in Time Economic Scenario Generation (PiT ESG) with a clear mathematical problem formulation to unify and compare economic scenario generation approaches conditional on forward-looking market data. Such PiT ESGs should provide quicker and more flexible reactions to sudden economic changes than traditional ESGs calibrated solely to long periods of historical data. We specifically take the S&P 500 Index as the economic variable, with the VIX Index as forward-looking market data, and compare nonparametric filtered historical simulation, a GARCH model with joint likelihood estimation (parametric), a Restricted Boltzmann Machine, and a conditional Variational Autoencoder (generative networks) for their suitability as PiT ESGs. Our evaluation consists of statistical tests of model fit and benchmarking of out-of-sample forecasting quality, together with a strategy backtest that uses the model output as a stop-loss criterion. We find that both generative networks outperform the nonparametric and classic parametric models in our tests, but that the CVAE seems particularly well suited for our purposes, yielding more robust performance and being computationally lighter.
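Of the compared models, the conditional VAE is the easiest to sketch: train on (return, VIX) pairs, then generate return scenarios conditioned on the current VIX level. The toy data and network sizes below are assumptions; this is a generic CVAE for illustration, not the implementation evaluated in the paper.

```python
# Hedged sketch of a conditional VAE scenario generator (illustration only).
import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, x_dim=1, c_dim=1, z_dim=4, h=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, h), nn.ReLU())
        self.mu = nn.Linear(h, z_dim)
        self.logvar = nn.Linear(h, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim + c_dim, h), nn.ReLU(), nn.Linear(h, x_dim))

    def forward(self, x, c):
        hid = self.enc(torch.cat([x, c], dim=-1))
        mu, logvar = self.mu(hid), self.logvar(hid)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.dec(torch.cat([z, c], dim=-1)), mu, logvar

torch.manual_seed(0)
vix = torch.rand(2048, 1) * 40 + 10          # synthetic "VIX" conditioning values
ret = torch.randn(2048, 1) * vix / 1000      # returns whose volatility grows with VIX
model = CVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(300):
    recon, mu, logvar = model(ret, vix)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    loss = nn.functional.mse_loss(recon, ret) + 1e-3 * kl
    opt.zero_grad(); loss.backward(); opt.step()

# scenario generation: draw z from the prior and condition on today's VIX level
z = torch.randn(10, 4)
cond = torch.full((10, 1), 35.0)
print(model.dec(torch.cat([z, cond], dim=-1)).squeeze())
```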

【7】 A Sensor Fusion-based GNSS Spoofing Attack Detection Framework for Autonomous Vehicles Link: https://arxiv.org/abs/2108.08635

Authors: Sagar Dasgupta, Mizanur Rahman, Mhafuzul Islam, Mashrur Chowdhury Affiliation: Department of Civil, Construction & Environmental Engineering, The University of Alabama, Tuscaloosa, AL; MicroVision Inc., WA, USA Remarks: arXiv admin note: substantial text overlap with arXiv:2106.02982 Abstract: This paper presents a sensor fusion based Global Navigation Satellite System (GNSS) spoofing attack detection framework for autonomous vehicles (AVs) that consists of two concurrent strategies: (i) detection of the vehicle state using the predicted location shift -- i.e., the distance traveled between two consecutive timestamps -- and monitoring of the vehicle motion state -- i.e., standstill/in motion; and (ii) detection and classification of turns (i.e., left or right). Data from multiple low-cost in-vehicle sensors (i.e., accelerometer, steering angle sensor, speed sensor, and GNSS) are fused and fed into a recurrent neural network model, a long short-term memory (LSTM) network, to predict the location shift, i.e., the distance that an AV travels between two consecutive timestamps. This location shift is then compared with the GNSS-based location shift to detect an attack. We then combine the k-Nearest Neighbors (k-NN) and Dynamic Time Warping (DTW) algorithms to detect and classify left and right turns using data from the steering angle sensor. To prove the efficacy of the sensor fusion-based attack detection framework, attack datasets are created for four unique and sophisticated spoofing attacks -- turn-by-turn, overshoot, wrong turn, and stop -- using the publicly available real-world Honda Research Institute Driving Dataset (HDD). Our analysis reveals that the sensor fusion-based detection framework successfully detects all four types of spoofing attacks within the required computational latency threshold.
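Strategy (ii)'s DTW plus k-NN turn classifier can be sketched directly: classify a steering-angle window as a left or right turn by its DTW distance to labeled templates. The synthetic sine-shaped templates and the 1-NN choice below are assumptions for illustration.

```python
# Hedged sketch of DTW + nearest-neighbor turn classification (illustration).
import numpy as np

def dtw(a, b):
    """Classic O(len(a)*len(b)) dynamic-time-warping distance."""
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[len(a), len(b)]

t = np.linspace(0, np.pi, 50)
templates = {"left": np.sin(t) * 30, "right": -np.sin(t) * 30}  # steering-angle shapes (deg)

query = np.sin(np.linspace(0, np.pi, 60)) * 25 + np.random.default_rng(0).normal(0, 2, 60)
label = min(templates, key=lambda k: dtw(query, templates[k]))  # 1-NN by DTW distance
print(label)  # expected: "left"
```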

【8】 A Reinforcement Learning Approach for GNSS Spoofing Attack Detection of Autonomous Vehicles Link: https://arxiv.org/abs/2108.08628

Authors: Sagar Dasgupta, Tonmoy Ghosh, Mizanur Rahman Affiliation: Department of Civil, Construction & Environmental Engineering and Department of Electrical and Computer Engineering, The University of Alabama, Tuscaloosa, AL Abstract: A resilient and robust positioning, navigation, and timing (PNT) system is a necessity for the navigation of autonomous vehicles (AVs). The Global Navigation Satellite System (GNSS) provides satellite-based PNT services. However, a spoofer can tamper with an authentic GNSS signal and transmit wrong position information to an AV. Therefore, a GNSS must have the capability of real-time detection and feedback correction of spoofing attacks related to PNT receivers, thereby helping the end user (an autonomous vehicle in this case) to navigate safely even when compromised. This paper aims to develop a deep reinforcement learning (RL)-based turn-by-turn spoofing attack detection method using low-cost in-vehicle sensor data. We utilized the Honda Driving Dataset to create attack and non-attack datasets, developed a deep RL model, and evaluated the performance of the RL-based attack detection model. We find that the accuracy of the RL model ranges from 99.99% to 100%, and the recall value is 100%, while the precision ranges from 93.44% to 100% and the F1 score ranges from 96.61% to 100%. Overall, the analyses reveal that the RL model is effective in turn-by-turn spoofing attack detection.

Semi-/Weakly-/Un-/Supervised | Uncertainty | Active Learning (6 papers)

【1】 Improving Semi-Supervised Learning for Remaining Useful Lifetime Estimation Through Self-Supervision Link: https://arxiv.org/abs/2108.08721

Authors: Tilman Krokotsch, Mirko Knaak, Clemens Gühmann Affiliation: Chair of Electronic Measurement and Diagnostic Technology, Technische Universität Berlin; Thermodynamics & Power Systems, Power Train & Power Engineering, IAV GmbH Remarks: Manuscript for initial journal submission Abstract: Remaining useful lifetime (RUL) estimation suffers from a severe data imbalance, where data from machines near their end of life is rare. Additionally, the data produced by a machine can only be labeled after the machine has failed. Semi-supervised learning (SSL) can incorporate the unlabeled data produced by machines that have not yet failed. Previous work on SSL evaluated approaches under unrealistic conditions where data near failure was still available. Even so, only moderate improvements were achieved. This paper proposes a novel SSL approach based on self-supervised pre-training. The method can outperform two competing approaches from the literature and a supervised baseline under realistic conditions on the NASA C-MAPSS dataset. Nevertheless, we observe degraded performance in some circumstances and discuss possible causes.

【2】 Teaching Uncertainty Quantification in Machine Learning through Use Cases Link: https://arxiv.org/abs/2108.08712

Authors: Matias Valdenegro-Toro Remarks: 2nd Teaching in Machine Learning Workshop, camera ready, 5 pages, 3 figures Abstract: Uncertainty in machine learning is not generally taught as general knowledge in machine learning course curricula. In this paper we propose a short curriculum for a course about uncertainty in machine learning and complement the course with a selection of use cases aimed at triggering discussion and letting students play with the concepts of uncertainty in a programming setting. Our use cases cover the concept of output uncertainty, Bayesian neural networks and weight distributions, sources of uncertainty, and out-of-distribution detection. We expect that this curriculum and set of use cases will motivate the community to adopt these important concepts into courses for safety in AI.

【3】 Neural density estimation and uncertainty quantification for laser induced breakdown spectroscopy spectra Link: https://arxiv.org/abs/2108.08709

Authors: Katiana Kontolati, Natalie Klein, Nishant Panda, Diane Oyen Affiliation: Johns Hopkins University, Baltimore, MD; Los Alamos National Laboratory, Los Alamos, NM Remarks: 5 pages, 3 figures Abstract: Constructing probability densities for inference in high-dimensional spectral data is often intractable. In this work, we use normalizing flows on structured spectral latent spaces to estimate such densities, enabling downstream inference tasks. In addition, we evaluate a method for uncertainty quantification when predicting unobserved state vectors associated with each spectrum. We demonstrate the capability of this approach on laser-induced breakdown spectroscopy data collected by the ChemCam instrument on the Mars rover Curiosity. Using our approach, we are able to generate realistic spectral samples and to accurately predict state vectors with associated well-calibrated uncertainties. We anticipate that this methodology will enable efficient probabilistic modeling of spectral data, leading to potential advances in several areas, including out-of-distribution detection and sensitivity analysis.
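As a concrete picture of density estimation with normalizing flows, the sketch below fits a small RealNVP-style flow to a toy 2-D "latent" distribution by maximum likelihood. The architecture and data are generic illustrations, not the paper's model of ChemCam spectra.

```python
# Hedged sketch of flow-based density estimation via affine coupling layers.
import torch
import torch.nn as nn

class Coupling(nn.Module):
    """RealNVP-style affine coupling layer for 2-D data."""
    def __init__(self, dim=2, flip=False):
        super().__init__()
        self.flip = flip
        self.net = nn.Sequential(nn.Linear(dim // 2, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=-1)
        if self.flip:
            x1, x2 = x2, x1
        s, t = self.net(x1).chunk(2, dim=-1)
        s = torch.tanh(s)                     # bounded log-scale for stability
        y2 = x2 * s.exp() + t                 # transform one half, conditioned on the other
        y = torch.cat([y2, x1], -1) if self.flip else torch.cat([x1, y2], -1)
        return y, s.sum(-1)                   # log|det J| is the sum of log-scales

flows = nn.ModuleList([Coupling(flip=bool(i % 2)) for i in range(4)])
opt = torch.optim.Adam(flows.parameters(), lr=1e-3)
base = torch.distributions.Normal(0.0, 1.0)

torch.manual_seed(0)
angle = torch.rand(4096) * 2 * torch.pi
data = torch.stack([angle.cos(), angle.sin()], -1) + 0.1 * torch.randn(4096, 2)  # ring-shaped toy data

for _ in range(500):
    z, logdet = data, torch.zeros(len(data))
    for f in flows:
        z, ld = f(z)
        logdet = logdet + ld
    # change of variables: log p(x) = log p_base(z) + log|det J|
    nll = -(base.log_prob(z).sum(-1) + logdet).mean()
    opt.zero_grad(); nll.backward(); opt.step()
print(float(nll))  # negative log-likelihood should decrease over training
```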

【4】 Batch Curation for Unsupervised Contrastive Representation Learning Link: https://arxiv.org/abs/2108.08643

Authors: Michael C. Welle, Petra Poklukar, Danica Kragic Affiliation: KTH Royal Institute of Technology Abstract: The state-of-the-art unsupervised contrastive visual representation learning methods that have emerged recently (SimCLR, MoCo, SwAV) all make use of data augmentations in order to construct a pretext task of instance discrimination consisting of similar and dissimilar pairs of images. Similar pairs are constructed by randomly extracting patches from the same image and applying several other transformations such as color jittering or blurring, while transformed patches from different image instances in a given batch are regarded as dissimilar pairs. We argue that this approach can result in similar pairs that are semantically dissimilar. In this work, we address this problem by introducing a batch curation scheme that selects batches during the training process that are more in line with the underlying contrastive objective. We provide insights into what constitutes beneficial similar and dissimilar pairs, and validate batch curation on CIFAR10 by integrating it into the SimCLR model.
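A minimal way to picture batch curation: score candidate batches with the standard SimCLR NT-Xent objective and keep those most in line with it. The "lowest-loss batch wins" rule below is an assumption for illustration; the abstract does not specify the selection criterion.

```python
# Hedged sketch: NT-Xent scoring plus a simple batch-selection rule.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """Standard SimCLR loss over paired embeddings z1, z2: (B, d)."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.t() / tau
    sim.fill_diagonal_(float("-inf"))            # exclude self-similarity
    b = z1.size(0)
    targets = torch.cat([torch.arange(b, 2 * b), torch.arange(0, b)])
    return F.cross_entropy(sim, targets)         # positives are the paired views

torch.manual_seed(0)
candidates = [(torch.randn(64, 128), torch.randn(64, 128)) for _ in range(4)]
losses = [nt_xent(a, b) for a, b in candidates]
chosen = min(range(4), key=lambda i: losses[i])  # curate: train on the best-aligned batch
print(chosen, [round(float(l), 3) for l in losses])
```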

【5】 Concurrent Discrimination and Alignment for Self-Supervised Feature Learning Link: https://arxiv.org/abs/2108.08562

Authors: Anjan Dutta, Massimiliano Mancini, Zeynep Akata Affiliation: University of Exeter; University of Tübingen Remarks: International Conference on Computer Vision (DeepMTL) 2021 Abstract: Existing self-supervised learning methods learn representations by means of pretext tasks that are either (1) discriminating, explicitly specifying which features should be separated, or (2) aligning, precisely indicating which features should be brought close together, but they ignore the question of how to jointly and principledly define which features should be repelled and which attracted. In this work, we combine the positive aspects of the discriminating and aligning methods and design a hybrid method that addresses the above issue. Our method explicitly specifies the repulsion and attraction mechanisms, respectively, through a discriminative predictive task and by concurrently maximizing mutual information between paired views sharing redundant information. We qualitatively and quantitatively show that our proposed model learns better features that are more effective for diverse downstream tasks ranging from classification to semantic segmentation. Our experiments on nine established benchmarks show that the proposed model consistently outperforms existing state-of-the-art results on self-supervised and transfer learning protocols.

【6】 Self-Supervised Video Representation Learning with Meta-Contrastive Network Link: https://arxiv.org/abs/2108.08426

Authors: Yuanze Lin, Xun Guo, Yan Lu Affiliation: University of Washington; Microsoft Research Asia Remarks: Accepted to ICCV 2021 Abstract: Self-supervised learning has been successfully applied to pre-train video representations, aiming at efficient adaptation from the pre-training domain to downstream tasks. Existing approaches merely leverage a contrastive loss to learn instance-level discrimination. However, the lack of category information leads to a hard-positive problem that constrains the generalization ability of such methods. We find that the multi-task process of meta learning can provide a solution to this problem. In this paper, we propose a Meta-Contrastive Network (MCN), which combines contrastive learning and meta learning to enhance the learning ability of existing self-supervised approaches. Our method contains two training stages based on model-agnostic meta learning (MAML), each of which consists of a contrastive branch and a meta branch. Extensive evaluations demonstrate the effectiveness of our method. For two downstream tasks, i.e., video action recognition and video retrieval, MCN outperforms state-of-the-art approaches on the UCF101 and HMDB51 datasets. More specifically, with an R(2+1)D backbone, MCN achieves Top-1 accuracies of 84.8% and 54.5% for video action recognition, as well as 52.5% and 23.7% for video retrieval.

Transfer | Zero/Few/One-Shot | Adaptation (3 papers)

【1】 IT2CFNN: An Interval Type-2 Correlation-Aware Fuzzy Neural Network to Construct Non-Separable Fuzzy Rules with Uncertain and Adaptive Shapes for Nonlinear Function Approximation Link: https://arxiv.org/abs/2108.08704

Authors: Armin Salimi-Badr Affiliation: Shahid Beheshti University, Tehran, Iran Abstract: In this paper, a new interval type-2 fuzzy neural network able to construct non-separable fuzzy rules with adaptive shapes is introduced. To reflect uncertainty, the shapes of the fuzzy sets are considered to be uncertain. Therefore, a new form of interval type-2 fuzzy sets based on a general Gaussian model able to construct different shapes (including triangular, bell-shaped, and trapezoidal) is proposed. To account for interactions among input variables, input vectors are transformed into new feature spaces with uncorrelated variables suitable for defining each fuzzy rule. Next, the new features are fed to a fuzzification layer using the proposed interval type-2 fuzzy sets with adaptive shapes. Consequently, interval type-2 non-separable fuzzy rules with proper shapes are formed, accounting for the local interactions of variables and for uncertainty. For type reduction, the contributions of the upper and lower firing strengths of each fuzzy rule are selected adaptively and separately. To train the different parameters of the network, the Levenberg-Marquardt optimization method is utilized. The performance of the proposed method is investigated on clean and noisy datasets to show its ability to account for uncertainty. Moreover, the proposed paradigm is successfully applied to real-world time-series prediction, regression problems, and nonlinear system identification. According to the experimental results, our proposed model outperforms other methods while using a more parsimonious structure.

【2】 Order Optimal One-Shot Federated Learning for non-Convex Loss Functions Link: https://arxiv.org/abs/2108.08677

Authors: Arsalan Sharifnassab, Saber Salehkaleybar, S. Jamaloddin Golestani Affiliation: Department of Electrical Engineering, Sharif University of Technology Abstract: We consider the problem of federated learning in a one-shot setting in which there are $m$ machines, each observing $n$ sample functions from an unknown distribution on non-convex loss functions. Let $F:[-1,1]^d\to\mathbb{R}$ be the expected loss function with respect to this unknown distribution. The goal is to find an estimate of the minimizer of $F$. Based on its observations, each machine generates a signal of bounded length $B$ and sends it to a server. The server collects the signals of all machines and outputs an estimate of the minimizer of $F$. We propose a distributed learning algorithm, called Multi-Resolution Estimator for Non-Convex loss functions (MRE-NC), whose expected error is bounded by $\max\big(1/\sqrt{n}(mB)^{1/d},\, 1/\sqrt{mn}\big)$, up to polylogarithmic factors. We also provide a matching lower bound on the performance of any algorithm, showing that MRE-NC is order optimal in terms of $n$ and $m$. Experiments on synthetic and real data show the effectiveness of MRE-NC in the distributed learning of model parameters for non-convex loss functions.

【3】 Spatially-Adaptive Image Restoration using Distortion-Guided Networks Link: https://arxiv.org/abs/2108.08617

Authors: Kuldeep Purohit, Maitreya Suin, A. N. Rajagopalan, Vishnu Naresh Boddeti Affiliation: Michigan State University; Indian Institute of Technology Madras Remarks: Accepted at ICCV 2021 Abstract: We present a general learning-based solution for restoring images suffering from spatially-varying degradations. Prior approaches are typically degradation-specific and employ the same processing across different images and across different pixels within an image. However, we hypothesize that such spatially rigid processing is suboptimal for simultaneously restoring the degraded pixels and reconstructing the clean regions of the image. To overcome this limitation, we propose SPAIR, a network design that harnesses distortion-localization information and dynamically adjusts computation to difficult regions in the image. SPAIR comprises two components: (1) a localization network that identifies degraded pixels, and (2) a restoration network that exploits knowledge from the localization network in the filter and feature domains to selectively and adaptively restore degraded pixels. Our key idea is to exploit the non-uniformity of heavy degradations in the spatial domain and suitably embed this knowledge within distortion-guided modules performing sparse normalization, feature extraction, and attention. Our architecture is agnostic to the physical formation model and generalizes across several types of spatially-varying degradations. We demonstrate the efficacy of SPAIR individually on four restoration tasks: removal of rain streaks, raindrops, shadows, and motion blur. Extensive qualitative and quantitative comparisons with prior art on 11 benchmark datasets demonstrate that our degradation-agnostic network design offers significant performance gains over state-of-the-art degradation-specific architectures. Code is available at https://github.com/human-analysis/spatially-adaptive-image-restoration.
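The two-component design can be pictured with a tiny sketch: a localization branch predicts a per-pixel degradation mask, and the restoration branch gates its features and its residual correction by that mask. The miniature conv stacks and the multiplicative gating below are assumptions for illustration, not the SPAIR modules.

```python
# Hedged sketch of mask-guided, spatially adaptive restoration (illustration).
import torch
import torch.nn as nn

class MaskGuidedRestorer(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.localize = nn.Sequential(                # (1) degradation mask in [0, 1]
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid())
        self.encode = nn.Conv2d(3, ch, 3, padding=1)  # (2) restoration branch
        self.refine = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, x):
        mask = self.localize(x)                     # where is the image degraded?
        feats = torch.relu(self.encode(x))
        correction = self.refine(feats * mask)      # gate features by the mask
        return x + correction * mask, mask          # apply the fix only where degraded

img = torch.rand(1, 3, 64, 64)
restored, mask = MaskGuidedRestorer()(img)
print(restored.shape, mask.shape)  # (1, 3, 64, 64) and (1, 1, 64, 64)
```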

Reinforcement Learning (2 papers)

【1】 Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning Link: https://arxiv.org/abs/2108.08812

Authors: Andrea Zanette, Martin J. Wainwright, Emma Brunskill Affiliation: Institute for Computational and Mathematical Engineering; Departments of Statistics and EECS; Department of Computer Science, Stanford University Remarks: Initial submission; appeared as a spotlight talk at the ICML 2021 Workshop on Theory of RL Abstract: Actor-critic methods are widely used in offline reinforcement learning practice but are not so well understood theoretically. We propose a new offline actor-critic algorithm that naturally incorporates the pessimism principle, leading to several key advantages compared with the state of the art. The algorithm can operate when the Bellman evaluation operator is closed with respect to the action-value functions of the actor's policies; this is a more general setting than the low-rank MDP model. Despite the added generality, the procedure is computationally tractable, as it involves the solution of a sequence of second-order programs. We prove an upper bound on the suboptimality gap of the policy returned by the procedure that depends on the data coverage of any arbitrary, possibly data-dependent comparator policy. The achievable guarantee is complemented by a minimax lower bound that matches up to logarithmic factors.

【2】 Global Convergence of the ODE Limit for Online Actor-Critic Algorithms in Reinforcement Learning Link: https://arxiv.org/abs/2108.08655

作者:Ziheng Wang,Justin Sirignano 摘要:Actor-critic算法广泛应用于强化学习中,但由于非i.i.d.数据样本在线到达,其数学分析颇具挑战性。数据样本的分布随着模型的更新而动态变化,在数据分布和强化学习算法之间引入了一个复杂的反馈回路。我们证明了在时间重标度(time rescaling)下,随着更新次数的增加,具有表格参数化的在线演员-评论家算法收敛到一个常微分方程(ODE)极限。证明首先建立了固定演员策略下数据样本的几何遍历性。然后,利用泊松方程,我们证明了数据样本围绕一个动态概率测度(它是不断演化的演员模型的函数)的波动随着更新次数的增加而消失。一旦导出了ODE极限,我们就使用双时间尺度分析来研究其收敛性,该分析将评论家(critic)ODE与演员(actor)ODE渐近解耦。我们证明了评论家收敛到Bellman方程的解,以及演员收敛到最优策略;此外,还建立了到该全局极小点的收敛速度。我们的收敛性分析在演员-评论家算法的学习率和探索率的特定选择下成立,这可以为演员-评论家算法在实践中的实现提供指导。 摘要:Actor-critic algorithms are widely used in reinforcement learning, but are challenging to mathematically analyze due to the online arrival of non-i.i.d. data samples. The distribution of the data samples dynamically changes as the model is updated, introducing a complex feedback loop between the data distribution and the reinforcement learning algorithm. We prove that, under a time rescaling, the online actor-critic algorithm with tabular parametrization converges to an ordinary differential equation (ODE) limit as the number of updates becomes large. The proof first establishes the geometric ergodicity of the data samples under a fixed actor policy. Then, using a Poisson equation, we prove that the fluctuations of the data samples around a dynamic probability measure, which is a function of the evolving actor model, vanish as the number of updates becomes large. Once the ODE limit has been derived, we study its convergence properties using a two time-scale analysis which asymptotically de-couples the critic ODE from the actor ODE. The convergence of the critic to the solution of the Bellman equation and the actor to the optimal policy are proven. In addition, a convergence rate to this global minimum is also established. Our convergence analysis holds under specific choices for the learning rates and exploration rates in the actor-critic algorithm, which could provide guidance for the implementation of actor-critic algorithms in practice.

元学习(1篇)

【1】 Prior Is All You Need to Improve the Robustness and Safety for the First Time Deployment of Meta RL 标题:先验即是提升Meta RL首次部署时鲁棒性与安全性所需的全部 链接:https://arxiv.org/abs/2108.08448

作者:Lu Wen,Songan Zhang,H. Eric Tseng,Baljeet Singh,Dimitar Filev,Huei Peng 机构:University of Michigan 摘要:元强化学习(Meta-RL)领域近年来取得了长足的进步。特别是,人们开发了离策略(off-policy)方法来提高Meta-RL技术的数据效率。"面向演员-评论家RL的概率嵌入"(Probabilistic embeddings for actor-critic RL,PEARL)是目前解决多MDP自适应问题的主要方法之一。包括PEARL在内的许多现有Meta-RL方法的一个主要缺点是,它们没有明确考虑先验策略在第一次暴露于新任务时的安全性。这对于一些实际应用非常重要,包括野外机器人和自动驾驶车辆(AVs)。在本文中,我们开发了PEARL PLUS(PEARL$^+$)算法,该算法同时针对策略的先验安全性和后验自适应性进行优化。在PEARL算法的基础上,我们提出的PEARL$^+$算法在奖励函数中引入了一个先验正则化项,以及一个在先验上下文假设下恢复状态-动作值的新Q网络,以提高训练所得网络首次暴露于新任务时的鲁棒性和安全性。我们通过求解与机器人和AVs相关的三个安全关键决策问题(包括两个MuJoCo基准问题)展示了PEARL$^+$方法的性能。仿真实验表明,与原PEARL方法相比,先验策略的安全性得到了显著提高。 摘要:The field of Meta Reinforcement Learning (Meta-RL) has seen substantial advancements recently. In particular, off-policy methods were developed to improve the data efficiency of Meta-RL techniques. Probabilistic embeddings for actor-critic RL (PEARL) is currently one of the leading approaches for multi-MDP adaptation problems. A major drawback of many existing Meta-RL methods, including PEARL, is that they do not explicitly consider the safety of the prior policy when it is exposed to a new task for the very first time. This is very important for some real-world applications, including field robots and Autonomous Vehicles (AVs). In this paper, we develop the PEARL PLUS (PEARL$^+$) algorithm, which optimizes the policy for both prior safety and posterior adaptation. Building on top of PEARL, our proposed PEARL$^+$ algorithm introduces a prior regularization term in the reward function and a new Q-network for recovering the state-action value with prior context assumption, to improve the robustness and safety of the trained network when exposed to a new task for the first time. The performance of the PEARL$^+$ method is demonstrated by solving three safety-critical decision-making problems related to robots and AVs, including two MuJoCo benchmark problems. From the simulation experiments, we show that the safety of the prior policy is significantly improved compared to that of the original PEARL method.

医学相关(5篇)

【1】 Surrogate Assisted Strategies (The Parameterisation of an Infectious Disease Agent-Based Model) 标题:替代模型辅助策略(一个基于智能体的传染病模型的参数化) 链接:https://arxiv.org/abs/2108.08809

作者:Rylan Perumal,Terence L van Zyl 备注:arXiv admin note: text overlap with arXiv:2008.11835 摘要:在基于智能体的建模与仿真(ABMS)中,参数标定是一个重大挑战。基于智能体的模型(ABM)的复杂性随着需要校准的参数数量的增加而增加。这种参数扩张导致了ABMS中相应的"维数灾难":特别是,在一个近乎无限的参数空间中搜索会带来不可行的计算需求。我们提出了一个更全面、适应性更强的ABMS框架,可以灵活地替换参数化策略和替代模型(surrogate),用于对传染病ABM进行参数化。该框架允许我们评估不同"策略-替代模型"组合在准确性和效率(加速比)方面的性能。我们表明,各替代模型辅助的抽样策略在准确度上均达到或超过基线。此外,我们还发现,度量随机响应面(Metric Stochastic Response Surface,MSRS)策略与支持向量机替代模型相结合,在最接近真实合成参数方面总体表现最佳。我们还表明,使用响应面模型的动态坐标搜索(DYCORS,以XGBoost作为替代模型)在近似累积合成每日感染数据分布方面达到了最高概率,并且在我们的分析中实现了最显著的加速。最后,我们在真实环境中展示了DYCORS XGBoost和MSRS SVM可以分别以97.12%和96.75%的相似性近似真实世界的每日累积感染分布。 摘要:Parameter calibration is a significant challenge in agent-based modelling and simulation (ABMS). An agent-based model's (ABM) complexity grows as the number of parameters required to be calibrated increases. This parameter expansion leads to the ABMS equivalent of the "curse of dimensionality"; in particular, it results in infeasible computational requirements when searching an effectively infinite parameter space. We propose a more comprehensive and adaptive ABMS Framework that can effectively swap out parameterisation strategies and surrogate models to parameterise an infectious disease ABM. This framework allows us to evaluate different strategy-surrogate combinations' performance in accuracy and efficiency (speedup). We show that we achieve better than parity in accuracy across the surrogate assisted sampling strategies and the baselines. Also, we identify that the Metric Stochastic Response Surface strategy combined with the Support Vector Machine surrogate is the best overall in getting closest to the true synthetic parameters. Also, we show that DYnamic COOrdinate Search Using Response Surface Models (DYCORS) with XGBoost as a surrogate attains in combination the highest probability of approximating a cumulative synthetic daily infection data distribution and achieves the most significant speedup with regards to our analysis. Lastly, we show in a real-world setting that DYCORS XGBoost and MSRS SVM can approximate the real world cumulative daily infection distribution with 97.12% and 96.75% similarity respectively.
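The surrogate-assisted loop is simple to sketch: fit a cheap regressor on the (parameter, simulation-error) pairs already evaluated, then let the surrogate rank many candidate parameters before spending another expensive ABM run. The sketch below uses an SVM surrogate as in the paper's best combination, but the quadratic `simulate` stand-in and all constants are assumptions.

```python
# Minimal surrogate-assisted search sketch (illustrative, not the paper's
# framework): an SVR surrogate proposes the next parameter point to simulate.
import numpy as np
from sklearn.svm import SVR

def simulate(theta):                      # stand-in for an expensive ABM run
    return float(np.sum((theta - 0.3) ** 2))  # error vs. observed epidemic curve

rng = np.random.default_rng(0)
thetas = rng.uniform(0, 1, size=(20, 2))  # initial design of parameter points
errors = np.array([simulate(t) for t in thetas])

surrogate = SVR(kernel="rbf").fit(thetas, errors)
candidates = rng.uniform(0, 1, size=(500, 2))
best = candidates[np.argmin(surrogate.predict(candidates))]
print("next parameters to simulate:", best)
```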

【2】 Feature-weighted Stacking for Nonseasonal Time Series Forecasts: A Case Study of the COVID-19 Epidemic Curves 标题:特征加权叠加在非季节性时间序列预测中的应用--以冠状病毒流行曲线为例 链接:https://arxiv.org/abs/2108.08723

作者:Pieter Cawood,Terence L. van Zyl 机构:School of, Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, South Africa, Institute for Intelligent Systems, University of Johannesburg 摘要:我们研究了预测中的集成(ensembling)技术,并检验了它们在非季节性时间序列上的应用潜力,这类时间序列与新冠肺炎(COVID-19)大流行早期的时间序列相似。开发改进的预测方法至关重要,因为它们能在关键阶段为组织和决策者提供数据驱动的决策。我们建议使用后期数据融合:由两个预测模型和两个在初步预测阶段已证明其预测能力的元特征构成堆叠集成。最终的集成包括Prophet和长短时记忆(LSTM)神经网络作为基础模型。基础模型由多层感知器(MLP)组合,同时考虑与每个基础模型的预测精度相关性最高的元特征。我们进一步表明,纳入元特征通常可以在7天和14天两个预测期内提高集成的预测精度。这项研究加强了以前的工作,并证明了将传统统计模型与深度学习模型相结合、从而为跨领域时间序列生成更精确预测模型的价值。 摘要:We investigate ensembling techniques in forecasting and examine their potential for use in nonseasonal time-series similar to those in the early days of the COVID-19 pandemic. Developing improved forecast methods is essential as they provide data-driven decisions to organisations and decision-makers during critical phases. We propose using late data fusion, using a stacked ensemble of two forecasting models and two meta-features that prove their predictive power during a preliminary forecasting stage. The final ensembles include a Prophet and long short term memory (LSTM) neural network as base models. The base models are combined by a multilayer perceptron (MLP), taking into account meta-features that indicate the highest correlation with each base model's forecast accuracy. We further show that the inclusion of meta-features generally improves the ensemble's forecast accuracy across two forecast horizons of seven and fourteen days. This research reinforces previous work and demonstrates the value of combining traditional statistical models with deep learning models to produce more accurate forecast models for time-series across domains.
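A compact way to see the stacking step: feed the base models' forecasts plus the meta-features into an MLP that learns the combination. The sketch below uses synthetic arrays in place of Prophet/LSTM outputs; the particular meta-features and sizes are assumptions.

```python
# Sketch of feature-weighted stacking with an MLP meta-learner. The two
# "base forecasts" are synthetic stand-ins for Prophet and LSTM outputs.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
n = 200
y = np.cumsum(rng.poisson(5, n)).astype(float)          # toy cumulative curve
f_prophet = y + rng.normal(0, 5, n)                     # base forecast 1 (assumed)
f_lstm = y + rng.normal(0, 8, n)                        # base forecast 2 (assumed)
meta = np.column_stack([np.gradient(y), np.arange(n)])  # simple meta-features

X = np.column_stack([f_prophet, f_lstm, meta])
stacker = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
stacker.fit(X[:150], y[:150])                           # train on the earlier window
print("holdout MAE:", np.mean(np.abs(stacker.predict(X[150:]) - y[150:])))
```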

【3】 Identifying Illicit Drug Dealers on Instagram with Large-scale Multimodal Data Fusion 标题:利用大规模多模态数据融合识别Instagram上的非法毒贩 链接:https://arxiv.org/abs/2108.08301

作者:Chuanbo Hu,Minglei Yin,Bin Liu,Xin Li,Yanfang Ye 机构:Case Western Reserve University 摘要:通过Instagram等社交媒体网站非法贩运毒品已成为一个严重问题,因此引起了执法部门和公共卫生机构的高度关注。由于以下原因,如何从社交媒体数据中识别非法毒贩仍然是一项技术挑战。一方面,由于爬取社交媒体网站涉及隐私问题,可用数据受到限制;另一方面,毒品交易模式的多样性使得很难可靠地将毒品交易者与普通吸毒者区分开来。与现有侧重于基于帖子的检测的方法不同,我们建议通过构建一个名为Instagram上识别毒贩(IDDIG)的大规模多模态数据集来解决非法毒贩识别问题。总共从Instagram收集了近4000个用户帐户,其中1400多个是毒贩,这些帐户包含多种数据源,包括帖子评论、帖子图片、主页简介和主页图片。然后,我们设计了一种基于四元组(quadruple)的多模态融合方法,将与每个用户帐户相关联的多个数据源结合起来,以识别毒贩。在构建的IDDIG数据集上的实验结果证明了该方法在识别毒贩方面的有效性(准确率接近95%)。此外,我们还开发了一种基于话题标签(hashtag)的社区检测技术,用于发现演化模式,特别是与地理和毒品类型相关的模式。 摘要:Illicit drug trafficking via social media sites such as Instagram has become a severe problem, thus drawing a great deal of attention from law enforcement and public health agencies. How to identify illicit drug dealers from social media data has remained a technical challenge due to the following reasons. On the one hand, the available data are limited because of privacy concerns with crawling social media sites; on the other hand, the diversity of drug dealing patterns makes it difficult to reliably distinguish drug dealers from common drug users. Unlike existing methods that focus on posting-based detection, we propose to tackle the problem of illicit drug dealer identification by constructing a large-scale multimodal dataset named Identifying Drug Dealers on Instagram (IDDIG). Totally nearly 4,000 user accounts, of which over 1,400 are drug dealers, have been collected from Instagram with multiple data sources including post comments, post images, homepage bio, and homepage images. We then design a quadruple-based multimodal fusion method to combine the multiple data sources associated with each user account for drug dealer identification. Experimental results on the constructed IDDIG dataset demonstrate the effectiveness of the proposed method in identifying drug dealers (almost 95% accuracy). Moreover, we have developed a hashtag-based community detection technique for discovering evolving patterns, especially those related to geography and drug types.

【4】 MobileCaps: A Lightweight Model for Screening and Severity Analysis of COVID-19 Chest X-Ray Images 标题:MobileCaps:一种用于冠状病毒胸部X线图像筛查和严重性分析的轻量级模型 链接:https://arxiv.org/abs/2108.08775

作者:S J Pawan,Rahul Sankar,Amithash M Prabhudev,P A Mahesh,K Prakashini,Sudha Kiran Das,Jeny Rajan 机构:Department of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal, India, Department of Respiratory Medicine, Kasturba Medical College and Hospital, Manipal, J.S.S. Medical College, Mysore, India 备注:14 pages, 6 figures 摘要:由于新冠疫情对医疗体系和经济造成的灾难性影响,世界正在经历一个具有挑战性的阶段。传播速度、新冠感染后的症状以及新冠病毒新毒株的出现,使全球的医疗体系陷入了混乱。因此,准确筛查新冠病例已成为当务之急。由于病毒感染呼吸系统,胸部X光(CXR)是一种广泛用于初始筛查的成像方式。我们进行了一项利用CXR图像识别新冠病例的综合研究,并认识到有必要建立一个更具泛化能力的模型。我们利用MobileNetV2体系结构作为特征提取器,并将其集成到胶囊网络中,以构建称为MobileCaps的全自动轻量级模型。MobileCaps在公开数据集上使用模型集成和贝叶斯优化策略进行训练和评估,以有效地将新冠肺炎患者的CXR图像与非新冠肺炎性肺炎及健康病例区分开来。所提出的模型还在另外两个经RT-PCR证实的数据集上进行了评估,以证明其泛化能力。我们还介绍了MobileCaps-S,并利用它基于肺水肿放射学评估(RALE)评分技术对新冠肺炎CXR图像进行严重程度评估。我们的分类模型对新冠肺炎、非新冠肺炎性肺炎和健康病例的总召回率分别为91.60、94.60、92.20,精确率分别为98.50、88.21、92.62。此外,严重程度评估模型的R$^2$系数为70.51。由于所提出的模型比文献中报道的最先进模型具有更少的可训练参数,我们相信我们的模型将在帮助医疗系统对抗大流行方面发挥很大作用。 摘要:The world is going through a challenging phase due to the disastrous effect caused by the COVID-19 pandemic on the healthcare system and the economy. The rate of spreading, post-COVID-19 symptoms, and the occurrence of new strains of COVID-19 have put the healthcare systems in disruption across the globe. Due to this, the task of accurately screening COVID-19 cases has become of utmost priority. Since the virus infects the respiratory system, Chest X-Ray is an imaging modality that is adopted extensively for the initial screening. We have performed a comprehensive study that uses CXR images to identify COVID-19 cases and realized the necessity of having a more generalizable model. We utilize MobileNetV2 architecture as the feature extractor and integrate it into Capsule Networks to construct a fully automated and lightweight model termed as MobileCaps. MobileCaps is trained and evaluated on the publicly available dataset with the model ensembling and Bayesian optimization strategies to efficiently classify CXR images of patients with COVID-19 from non-COVID-19 pneumonia and healthy cases. The proposed model is further evaluated on two additional RT-PCR confirmed datasets to demonstrate the generalizability. We also introduce MobileCaps-S and leverage it for performing severity assessment of CXR images of COVID-19 based on the Radiographic Assessment of Lung Edema (RALE) scoring technique. Our classification model achieved an overall recall of 91.60, 94.60, 92.20, and a precision of 98.50, 88.21, 92.62 for COVID-19, non-COVID-19 pneumonia, and healthy cases, respectively. Further, the severity assessment model attained an R$^2$ coefficient of 70.51. Owing to the fact that the proposed models have fewer trainable parameters than the state-of-the-art models reported in the literature, we believe our models will go a long way in aiding healthcare systems in the battle against the pandemic.

【5】 Medical Image Segmentation using 3D Convolutional Neural Networks: A Review 标题:三维卷积神经网络在医学图像分割中的研究进展 链接:https://arxiv.org/abs/2108.08467

作者:S. Niyas,S J Pawan,M Anand Kumar,Jeny Rajan 机构:Department of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal, India, Department of Information Technology 备注:17 pages, 4 figures 摘要:计算机辅助医学图像分析在帮助医生进行专家临床诊断和确定最佳治疗方案方面发挥着重要作用。目前,卷积神经网络(CNN)是医学图像分析的首选方法。此外,随着三维(3D)成像系统的快速发展以及处理大量数据的优秀硬件和软件支持的可用性,3D深度学习方法在医学图像分析中越来越流行。在这里,我们提出了一个广泛的审查,最近发展的三维深度学习方法在医学图像分割。此外,还讨论了三维医学图像分割的研究差距和未来发展方向。 摘要:Computer-aided medical image analysis plays a significant role in assisting medical practitioners for expert clinical diagnosis and deciding the optimal treatment plan. At present, convolutional neural networks (CNN) are the preferred choice for medical image analysis. In addition, with the rapid advancements in three-dimensional (3D) imaging systems and the availability of excellent hardware and software support to process large volumes of data, 3D deep learning methods are gaining popularity in medical image analysis. Here, we present an extensive review of the recently evolved 3D deep learning methods in medical image segmentation. Furthermore, the research gaps and future directions in 3D medical image segmentation are discussed.

蒸馏|知识提取(1篇)

【1】 QUEACO: Borrowing Treasures from Weakly-labeled Behavior Data for Query Attribute Value Extraction 标题:QUEACO:从弱标注行为数据中借用宝藏进行查询属性值提取 链接:https://arxiv.org/abs/2108.08468

作者:Danqing Zhang,Zheng Li,Tianyu Cao,Chen Luo,Tony Wu,Hanqing Lu,Yiwei Song,Bing Yin,Tuo Zhao,Qiang Yang 机构:Georgia Institute of Technology, GA, USA, Hong Kong University of Science and Technology, HK, China 备注:None 摘要:我们研究了查询属性值提取问题,其目的是将用户查询中的命名实体识别为不同的表面形式属性值,然后将其转换为正式的规范形式。这个问题包括两个阶段:命名实体识别(NER)和属性值规范化(AVN)。然而,现有的工作只关注NER阶段,而忽略了同样重要的AVN。为了弥补这一差距,本文提出了一个电子商务搜索中统一的查询属性值提取系统QUEACO,该系统涵盖这两个阶段。此外,通过利用大规模弱标记行为数据,我们以更低的监督成本进一步提高了提取性能。具体而言,对于NER阶段,QUEACO采用了一种新型的教师-学生网络,其中在强标记数据上训练的教师网络生成伪标签来细化弱标记数据,用于训练学生网络。同时,教师网络可以根据学生在强标记数据上的表现反馈进行动态调整,以最大限度地消除弱标签带来的噪声。对于AVN阶段,我们还利用弱标记的查询-属性行为数据,将查询中的表面形式属性值规范化为产品中的规范形式。在真实世界的大规模电子商务数据集上进行的大量实验证明了QUEACO的有效性。 摘要:We study the problem of query attribute value extraction, which aims to identify named entities from user queries as diverse surface form attribute values and afterward transform them into formally canonical forms. Such a problem consists of two phases: named entity recognition (NER) and attribute value normalization (AVN). However, existing works only focus on the NER phase but neglect equally important AVN. To bridge this gap, this paper proposes a unified query attribute value extraction system in e-commerce search named QUEACO, which involves both two phases. Moreover, by leveraging large-scale weakly-labeled behavior data, we further improve the extraction performance with less supervision cost. Specifically, for the NER phase, QUEACO adopts a novel teacher-student network, where a teacher network that is trained on the strongly-labeled data generates pseudo-labels to refine the weakly-labeled data for training a student network. Meanwhile, the teacher network can be dynamically adapted by the feedback of the student's performance on strongly-labeled data to maximally denoise the noisy supervisions from the weak labels. For the AVN phase, we also leverage the weakly-labeled query-to-attribute behavior data to normalize surface form attribute values from queries into canonical forms from products. Extensive experiments on a real-world large-scale E-commerce dataset demonstrate the effectiveness of QUEACO.

超分辨率|去噪|去模糊|去雾(1篇)

【1】 Temporal Kernel Consistency for Blind Video Super-Resolution 标题:盲视频超分辨率的时间核一致性研究 链接:https://arxiv.org/abs/2108.08305

作者:Lichuan Xiang,Royson Lee,Mohamed S. Abdelfattah,Nicholas D. Lane,Hongkai Wen 机构:University of Warwick, University of Cambridge, Samsung AI Center, Cambridge 摘要:基于深度学习的盲超分辨率(SR)方法最近在未知退化的放大帧中取得了前所未有的性能。这些模型能够从给定的低分辨率(LR)图像中准确估计未知的降尺度核,以便在恢复过程中利用核。尽管这些方法在很大程度上取得了成功,但它们主要基于图像,因此不利用多个视频帧中内核的时间特性。在本文中,我们研究了核的时间特性,并强调了它在盲视频超分辨率任务中的重要性。具体地说,我们测量了真实世界视频的内核时间一致性,并说明了在场景及其对象的动态性不同的视频中,估计的内核在每帧中是如何变化的。有了这一新的见解,我们回顾了以前流行的视频SR方法,并表明以前在整个恢复过程中使用固定内核的假设在放大真实世界的视频时会导致视觉伪影。为了解决这个问题,我们定制了现有的单图像和视频SR技术,以在内核估计和视频放大过程中利用内核一致性。对合成视频和真实视频的大量实验表明,从数量和质量上都有很大的恢复收益,实现了盲视频SR的最新技术,并强调了利用内核时间一致性的潜力。 摘要:Deep learning-based blind super-resolution (SR) methods have recently achieved unprecedented performance in upscaling frames with unknown degradation. These models are able to accurately estimate the unknown downscaling kernel from a given low-resolution (LR) image in order to leverage the kernel during restoration. Although these approaches have largely been successful, they are predominantly image-based and therefore do not exploit the temporal properties of the kernels across multiple video frames. In this paper, we investigated the temporal properties of the kernels and highlighted its importance in the task of blind video super-resolution. Specifically, we measured the kernel temporal consistency of real-world videos and illustrated how the estimated kernels might change per frame in videos of varying dynamicity of the scene and its objects. With this new insight, we revisited previous popular video SR approaches, and showed that previous assumptions of using a fixed kernel throughout the restoration process can lead to visual artifacts when upscaling real-world videos. In order to counteract this, we tailored existing single-image and video SR techniques to leverage kernel consistency during both kernel estimation and video upscaling processes. Extensive experiments on synthetic and real-world videos show substantial restoration gains quantitatively and qualitatively, achieving the new state-of-the-art in blind video SR and underlining the potential of exploiting kernel temporal consistency.

联邦学习|隐私保护|加密(6篇)

【1】 Communication-Efficient Federated Learning via Robust Distributed Mean Estimation 标题:基于稳健分布式均值估计的通信高效联邦学习 链接:https://arxiv.org/abs/2108.08842

作者:Shay Vargaftik,Ran Ben Basat,Amit Portnoy,Gal Mendelson,Yaniv Ben-Itzhak,Michael Mitzenmacher 机构:VMware Research, University College London, Ben-Gurion University, Stanford University, Harvard University 备注:A technical report that extends arXiv:2105.08339 摘要:联合学习通常依赖于分布式(小批量)SGD等算法,其中多个客户端计算其梯度,并将其发送给中心协调器,以平均和更新模型。为了优化传输时间和训练过程的可伸缩性,客户端通常使用有损压缩来减少消息大小。DRIVE是一种最新的算法,它使用每个坐标一位来压缩梯度(具有一些较低的阶开销)。在本技术报告中,我们概括了DRIVE以支持任何带宽限制,并将其扩展以支持异构客户端资源,使其对数据包丢失具有鲁棒性。 摘要:Federated learning commonly relies on algorithms such as distributed (mini-batch) SGD, where multiple clients compute their gradients and send them to a central coordinator for averaging and updating the model. To optimize the transmission time and the scalability of the training process, clients often use lossy compression to reduce the message sizes. DRIVE is a recent state of the art algorithm that compresses gradients using one bit per coordinate (with some lower-order overhead). In this technical report, we generalize DRIVE to support any bandwidth constraint as well as extend it to support heterogeneous client resources and make it robust to packet loss.
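The core compression primitive is easy to picture: transmit one sign bit per coordinate plus a single scale. A minimal sketch is below; note that DRIVE additionally applies a shared random rotation before taking signs, which is omitted here, and the least-squares scale rule is a simplification rather than DRIVE's exact construction.

```python
# Toy one-bit compression sketch in the spirit of DRIVE (rotation omitted):
# send sign bits plus one float scale chosen by least squares.
import numpy as np

def compress(g):
    signs = np.sign(g)
    scale = np.dot(g, signs) / len(g)      # argmin_c ||g - c * signs||^2
    return signs.astype(np.int8), scale    # ~1 bit/coordinate + one scalar

def decompress(signs, scale):
    return scale * signs

g = np.random.default_rng(0).normal(size=1000)
signs, scale = compress(g)
g_hat = decompress(signs, scale)
print("relative error:", np.linalg.norm(g - g_hat) / np.linalg.norm(g))
```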

【2】 Client Selection Approach in Support of Clustered Federated Learning over Wireless Edge Networks 标题:支持无线边缘网络分簇联合学习的客户选择方法 链接:https://arxiv.org/abs/2108.08768

作者:Abdullatif Albaseer,Mohamed Abdallah,Ala Al-Fuqaha,Aiman Erbad 机构:Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar 备注:4 figures, 7 pages 摘要:集群联合多任务学习(CFL)是一种在数据不平衡且以非i.i.d.(非独立且相同分布)的方式分布于客户端时获得可靠专门模型的有效方案。虽然类似于余弦相似性的相似性度量指标可以用来为客户机组赋予一个专门的模型,但这一过程可能很困难,因为服务器应该在每个联合学习轮次中涉及所有客户机。因此,由于网络边缘的带宽和延迟限制,必须定期选择客户端子集。为此,本文提出了一种新的客户选择算法,该算法旨在加快收敛速度,以获得专门的机器学习模型,从而实现所有客户群的高测试精度。具体地说,我们引入了一种客户机选择方法,该方法利用设备的异构性,根据客户机的轮延迟来调度客户机,并利用占用更多时间更新模型的客户机的带宽重用。然后,服务器执行模型平均,并根据预定义的阈值对客户端进行集群。当一个特定的集群到达一个固定点时,该算法通过选择延迟较小的客户机来更新模型,从而对该集群使用贪婪调度算法。大量实验表明,该方法在为每个客户端注入适合其局部数据分布的专用模型的同时,减少了训练时间,加快了收敛速度达50%。 摘要:Clustered Federated Multitask Learning (CFL) was introduced as an efficient scheme to obtain reliable specialized models when data is imbalanced and distributed in a non-i.i.d. (non-independent and identically distributed) fashion amongst clients. While a similarity measure metric, like the cosine similarity, can be used to endow groups of the client with a specialized model, this process can be arduous as the server should involve all clients in each of the federated learning rounds. Therefore, it is imperative that a subset of clients is selected periodically due to the limited bandwidth and latency constraints at the network edge. To this end, this paper proposes a new client selection algorithm that aims to accelerate the convergence rate for obtaining specialized machine learning models that achieve high test accuracies for all client groups. Specifically, we introduce a client selection approach that leverages the devices' heterogeneity to schedule the clients based on their round latency and exploits the bandwidth reuse for clients that consume more time to update the model. Then, the server performs model averaging and clusters the clients based on predefined thresholds. When a specific cluster reaches a stationary point, the proposed algorithm uses a greedy scheduling algorithm for that group by selecting the clients with less latency to update the model. Extensive experiments show that the proposed approach lowers the training time and accelerates the convergence rate by up to 50% while imbuing each client with a specialized model that is fit for its local data distribution.

【3】 Multi-Center Federated Learning 标题:多中心联合学习 链接:https://arxiv.org/abs/2108.08647

作者:Ming Xie,Guodong Long,Tao Shen,Tianyi Zhou,Xianzhi Wang,Jing Jiang,Chengqi Zhang 机构:Australian AI Institute, University of Technology Sydney, Paul G. Allen School of Computer Science and Engineering, University of Washington 备注:arXiv admin note: substantial text overlap with arXiv:2005.01026 摘要:联邦学习(FL)可以保护分布式学习中的数据隐私,因为它只收集用户的局部梯度,而不访问他们的数据。然而,面对实际环境中经常遇到的异质性(例如不同用户的非IID数据),FL是脆弱的。现有的FL方法通常更新单个全局模型,通过聚合用户的梯度来获取所有用户的共享知识,而不考虑其数据分布之间的差异。相比之下,如果将用户分配到FL中的不同全局模型(即中心),则多个全局模型的混合可以捕获不同用户之间的异质性。为此,我们提出了一种新的多中心聚合机制。它从数据中学习多个全局模型,同时得出用户和中心之间的最佳匹配。然后,我们将其表述为一个双层优化问题,该问题可以通过随机期望最大化(EM)算法有效地求解。在多个FL基准数据集上的实验表明,我们的方法优于几个流行的FL竞争对手。源代码已在GitHub上开源。 摘要:Federated learning (FL) can protect data privacy in distributed learning since it merely collects local gradients from users without access to their data. However, FL is fragile in the presence of heterogeneity that is commonly encountered in practical settings, e.g., non-IID data over different users. Existing FL approaches usually update a single global model to capture the shared knowledge of all users by aggregating their gradients, regardless of the discrepancy between their data distributions. By comparison, a mixture of multiple global models could capture the heterogeneity across various users if assigning the users to different global models (i.e., centers) in FL. To this end, we propose a novel multi-center aggregation mechanism. It learns multiple global models from data, and simultaneously derives the optimal matching between users and centers. We then formulate it as a bi-level optimization problem that can be efficiently solved by a stochastic expectation maximization (EM) algorithm. Experiments on multiple benchmark datasets of FL show that our method outperforms several popular FL competitors. The source code is open-sourced on GitHub.
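The EM-style alternation can be sketched in a few lines: assign each client's model to the nearest center (E-step), then recompute each center as the mean of its assigned clients (M-step). Client models are reduced to toy parameter vectors here; the distance, initialization and cluster count are assumptions.

```python
# Minimal sketch of multi-center aggregation as a k-means-like EM loop over
# client model parameters (toy vectors, not real networks).
import numpy as np

rng = np.random.default_rng(0)
clients = np.vstack([rng.normal(0, 0.1, (5, 3)),   # one group of client models
                     rng.normal(2, 0.1, (5, 3))])  # a second, different group
centers = clients[[0, 5]].copy()                   # simple deterministic init

for _ in range(10):
    d = np.linalg.norm(clients[:, None] - centers[None], axis=2)
    assign = d.argmin(axis=1)                                  # E-step: match users to centers
    centers = np.vstack([clients[assign == k].mean(axis=0)     # M-step: re-average each center
                         for k in range(2)])
print(assign)   # clients split cleanly into the two centers
```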

【4】 Towards More Efficient Federated Learning with Better Optimization Objects 标题:用更好的优化目标走向更有效的联邦学习 链接:https://arxiv.org/abs/2108.08577

作者:Zirui Zhu,Ziyi Ye 机构:Dept. of Computer Science and Technology, Tsinghua University Beijing, China 摘要:联邦学习(FL)是一种受隐私保护的机器学习范式,它允许在边缘直接训练模型,而无需上传数据。FL在实际应用中面临的最大挑战之一是边缘节点数据的异构性,这将减慢收敛速度并降低模型的性能。对于上述问题,一个有代表性的解决方案是在本地训练中添加附加约束,如FedProx、FedCurv和FedCL。然而,上述算法仍有改进的余地。我们建议使用过去获得的所有模型的聚合作为新的约束目标,以进一步提高此类算法的性能。在不同环境下的实验表明,该方法显著提高了模型的收敛速度和性能。 摘要:Federated Learning (FL) is a privacy-protected machine learning paradigm that allows model to be trained directly at the edge without uploading data. One of the biggest challenges faced by FL in practical applications is the heterogeneity of edge node data, which will slow down the convergence speed and degrade the performance of the model. For the above problems, a representative solution is to add additional constraints in the local training, such as FedProx, FedCurv and FedCL. However, the above algorithms still have room for improvement. We propose to use the aggregation of all models obtained in the past as new constraint target to further improve the performance of such algorithms. Experiments in various settings demonstrate that our method significantly improves the convergence speed and performance of the model.
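The proposed constraint drops into a local update loop as a proximal penalty toward the average of all past global models, in the style of FedProx. The sketch below is an illustrative one-client update on a toy quadratic objective; `mu`, the learning rate and the quadratic penalty form are assumptions.

```python
# Sketch of a local update regularized toward the aggregate of past global
# models (FedProx-style quadratic penalty; illustrative only).
import numpy as np

def local_update(w, grad_fn, past_globals, mu=0.1, lr=0.05, steps=50):
    target = np.mean(past_globals, axis=0)             # aggregate of history
    for _ in range(steps):
        w = w - lr * (grad_fn(w) + mu * (w - target))  # task gradient + constraint
    return w

grad_fn = lambda w: 2 * (w - np.array([1.0, -1.0]))    # toy local objective
history = [np.zeros(2), np.array([0.5, -0.5])]         # past global models
print(local_update(np.zeros(2), grad_fn, history))
```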

【5】 Fair and Consistent Federated Learning 标题:公平一致的联合学习 链接:https://arxiv.org/abs/2108.08435

作者:Sen Cui,Weishen Pan,Jian Liang,Changshui Zhang,Fei Wang 机构:Institute for Artificial Intelligence, Tsinghua University (THUAI), State Key Lab of Intelligent Technologies and Systems, Beijing National Research Center for Information Science and Technology (BNRist) 摘要:联邦学习(FL)因其能够从分布式数据源集中学习、而不需要访问不同数据源的原始数据样本而受到越来越多的关注。到目前为止,FL研究主要集中在提高性能上;而FL所学模型的算法差异(algorithmic disparity)会受到怎样的影响,以及算法差异对效用不一致性的影响,在很大程度上仍未被探索。在本文中,我们提出了一个FL框架,联合考虑不同本地客户端(数据源)之间的性能一致性和算法公平性。我们从约束多目标优化的角度导出该框架,在这个框架中,我们学习一个在所有客户端上满足公平性约束且性能一致的模型。具体地说,我们将算法在每个本地客户端的预测损失作为一个目标,并通过优化一个包含所有目标的代理最大函数,在公平性约束下优化表现最差的客户端。采用基于梯度的方法来求得该优化问题的帕累托最优解。理论分析证明,我们的方法可以收敛到一个帕累托解,在所有客户端都满足公平性约束的情况下达到极小-极大性能。在合成数据集和真实数据集上进行的综合实验表明,我们的方法优于基线,并且能有效地在所有本地客户端上同时实现公平性和一致性。 摘要:Federated learning (FL) has gained growing interest for its capability of learning from distributed data sources collectively without the need of accessing the raw data samples across different sources. So far, FL research has mostly focused on improving the performance; how the algorithmic disparity will be impacted for the model learned from FL, and the impact of algorithmic disparity on the utility inconsistency, are largely unexplored. In this paper, we propose an FL framework to jointly consider performance consistency and algorithmic fairness across different local clients (data sources). We derive our framework from a constrained multi-objective optimization perspective, in which we learn a model satisfying fairness constraints on all clients with consistent performance. Specifically, we treat the algorithm prediction loss at each local client as an objective and maximize the worst-performing client with fairness constraints through optimizing a surrogate maximum function with all objectives involved. A gradient-based procedure is employed to achieve the Pareto optimality of this optimization problem. Theoretical analysis is provided to prove that our method can converge to a Pareto solution that achieves the min-max performance with fairness constraints on all clients. Comprehensive experiments on synthetic and real-world datasets demonstrate the superiority of our approach over baselines and its effectiveness in achieving both fairness and consistency across all local clients.
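Optimizing a "surrogate maximum" over per-client losses can be illustrated with a log-sum-exp/softmax weighting that up-weights the worst-performing client at each step. The sketch below omits the fairness constraints entirely and uses a toy least-squares objective; the temperature and step size are assumptions.

```python
# Sketch of min-max training across clients via a smooth surrogate of the
# maximum loss (fairness constraints omitted; illustrative only).
import numpy as np

def client_losses(w, data):
    return np.array([np.mean((X @ w - y) ** 2) for X, y in data])

def softmax_weights(losses, tau=5.0):
    z = np.exp(tau * (losses - losses.max()))
    return z / z.sum()                      # emphasizes the worst client

rng = np.random.default_rng(0)
data = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]
w = np.zeros(3)
for _ in range(200):
    lam = softmax_weights(client_losses(w, data))
    grad = sum(l * (2 / 50) * X.T @ (X @ w - y) for l, (X, y) in zip(lam, data))
    w -= 0.05 * grad
print("per-client losses:", client_losses(w, data).round(3))
```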

【6】 Federated Variational Learning for Anomaly Detection in Multivariate Time Series 标题:多变量时间序列异常检测的联邦变分学习 链接:https://arxiv.org/abs/2108.08404

作者:Kai Zhang,Yushan Jiang,Lee Seversky,Chengtao Xu,Dahai Liu,Houbing Song 机构:Embry-Riddle Aeronautical University, Daytona Beach, FL , Air Force Research Laboratory, Rome, NY 备注:Accepted paper in the IEEE 40th International Performance Computing and Communications Conference - IPCCC 2021 摘要:由于网络传感器和执行器在网络物理系统(CPS)中生成高维多元时间序列数据,异常检测一直是一项具有挑战性的任务。除了此类时间序列的高度非线性、复杂和动态特性外,缺少标记数据还妨碍了以有监督的方式利用数据,从而妨碍了对异常现象的准确检测。另一方面,在网络边缘收集的数据往往对隐私敏感且数量庞大,这可能会妨碍在主服务器上进行集中训练。为了解决这些问题,我们以联邦方式提出了一个无监督的时间序列异常检测框架,以持续监控网络中互连设备的行为,并对异常事件发出警报,以便在意外后果发生之前采取应对措施。具体地说,我们将训练数据分布在边缘,学习基于卷积选通递归单元(ConvGRU)模型的共享变分自动编码器(VAE),该模型联合捕获多变量时间序列数据中的特征和时间依赖性,用于表示学习和下游异常检测任务。在三个真实网络传感器数据集上的实验说明了我们的方法相对于其他最先进模型的优势。我们还进行了大量实验,以证明我们的检测框架在非联邦和联邦设置下在总体性能和检测延迟方面的有效性。 摘要:Anomaly detection has been a challenging task given high-dimensional multivariate time series data generated by networked sensors and actuators in Cyber-Physical Systems (CPS). Besides the highly nonlinear, complex, and dynamic natures of such time series, the lack of labeled data impedes data exploitation in a supervised manner and thus prevents an accurate detection of abnormal phenomenons. On the other hand, the collected data at the edge of the network is often privacy sensitive and large in quantity, which may hinder the centralized training at the main server. To tackle these issues, we propose an unsupervised time series anomaly detection framework in a federated fashion to continuously monitor the behaviors of interconnected devices within a network and alerts for abnormal incidents so that countermeasures can be taken before undesired consequences occur. To be specific, we leave the training data distributed at the edge to learn a shared Variational Autoencoder (VAE) based on Convolutional Gated Recurrent Unit (ConvGRU) model, which jointly captures feature and temporal dependencies in the multivariate time series data for representation learning and downstream anomaly detection tasks. Experiments on three real-world networked sensor datasets illustrate the advantage of our approach over other state-of-the-art models. We also conduct extensive experiments to demonstrate the effectiveness of our detection framework under non-federated and federated settings in terms of overall performance and detection latency.

推理|分析|理解|解释(2篇)

【1】 Attribute-based Explanations of Non-Linear Embeddings of High-Dimensional Data 标题:基于属性的高维数据非线性嵌入解释 链接:https://arxiv.org/abs/2108.08706

作者:Jan-Tobias Sohns,Michaela Schmitt,Fabian Jirasek,Hans Hasse,Heike Leitte 机构:Laboratory of Engineering Thermodynamics (LTD), TU Kaiserslautern 备注:IEEE VIS (InfoVis/VAST/SciVis) 2021 摘要:高维数据的嵌入广泛用于探索数据、验证分析结果和交流信息。对它们的解释,特别是关于输入属性的解释,通常是困难的。对于PCA等线性投影,坐标轴仍然可以进行有意义的注释;对于非线性投影,这已不再可能,需要基于属性的颜色编码等替代策略。在本文中,我们回顾了现有的增强技术,并讨论了它们的局限性。我们提出了非线性嵌入考察器NoLiES(Non-Linear Embeddings Surveyor),它将一种新的投影数据增强策略(范围集,rangesets)与小倍数(small multiples)视图下的交互式分析相结合。范围集对分箱后的属性值使用基于集合的可视化方法,使用户能够快速观察结构并检测异常值。我们详细说明了代数拓扑和范围集之间的联系,并在具有各种挑战(复杂的属性值分布、多属性、多数据点)的案例研究以及一个理解热力学中矩阵补全潜在特征的实际应用中,展示了NoLiES的效用。 摘要:Embeddings of high-dimensional data are widely used to explore data, to verify analysis results, and to communicate information. Their explanation, in particular with respect to the input attributes, is often difficult. With linear projections like PCA the axes can still be annotated meaningfully. With non-linear projections this is no longer possible and alternative strategies such as attribute-based color coding are required. In this paper, we review existing augmentation techniques and discuss their limitations. We present the Non-Linear Embeddings Surveyor (NoLiES) that combines a novel augmentation strategy for projected data (rangesets) with interactive analysis in a small multiples setting. Rangesets use a set-based visualization approach for binned attribute values that enable the user to quickly observe structure and detect outliers. We detail the link between algebraic topology and rangesets and demonstrate the utility of NoLiES in case studies with various challenges (complex attribute value distribution, many attributes, many data points) and a real-world application to understand latent features of matrix completion in thermodynamics.

【2】 Understanding and Mitigating Annotation Bias in Facial Expression Recognition 标题:面部表情识别中标注偏差的理解与缓解 链接:https://arxiv.org/abs/2108.08504

作者:Yunliang Chen,Jungseock Joo 机构:University of California, Los Angeles 备注:To appear in ICCV 2021 摘要:计算机视觉模型的性能取决于其训练数据的规模和质量。最近的研究揭示了常见图像数据集中先前未知的构成偏差(composition biases),这些偏差会导致模型输出出现偏差,并提出了缓解这些偏差的方法。然而,大多数现有的工作都假设人工生成的标注可以被视为金标准且无偏。在本文中,我们揭示了这个假设可能是有问题的,并且应该特别注意防止模型学习这种标注偏差。我们专注于面部表情识别,并比较实验室控制和野外数据集之间的标签偏差。我们证明了许多表情数据集在性别之间存在显著的标注偏差,特别是在快乐和愤怒表情上,并且传统方法无法完全缓解训练模型中的这种偏差。为了消除表情标注偏差,我们提出了一种AU校准的面部表情识别(AUC-FER)框架,该框架利用面部动作单元(AUs)并将三元组损失(triplet loss)纳入目标函数。实验结果表明,与现有技术相比,该方法在消除表情标注偏差方面更为有效。 摘要:The performance of a computer vision model depends on the size and quality of its training data. Recent studies have unveiled previously-unknown composition biases in common image datasets which then lead to skewed model outputs, and have proposed methods to mitigate these biases. However, most existing works assume that human-generated annotations can be considered gold-standard and unbiased. In this paper, we reveal that this assumption can be problematic, and that special care should be taken to prevent models from learning such annotation biases. We focus on facial expression recognition and compare the label biases between lab-controlled and in-the-wild datasets. We demonstrate that many expression datasets contain significant annotation biases between genders, especially when it comes to the happy and angry expressions, and that traditional methods cannot fully mitigate such biases in trained models. To remove expression annotation bias, we propose an AU-Calibrated Facial Expression Recognition (AUC-FER) framework that utilizes facial action units (AUs) and incorporates the triplet loss into the objective function. Experimental results suggest that the proposed method is more effective in removing expression annotation bias than existing techniques.

检测相关(2篇)

【1】 Efficient remedies for outlier detection with variational autoencoders 标题:利用变分自动编码器进行离群点检测的有效补救方法 链接:https://arxiv.org/abs/2108.08760

作者:Kushal Chauhan,Pradeep Shenoy,Manish Gupta,Devarajan Sridharan 机构:Google Research, Center for Neuroscience, Indian Institute of Science 备注:27 pages 摘要:当使用远离其训练分布的离群数据进行测试时,深度网络通常会做出自信但不正确的预测。由深度生成模型计算的似然度是使用未标记数据进行离群点检测的候选度量。然而,以前的研究表明,这种似然度是不可靠的,并且很容易因输入数据的简单变换而产生偏差。在这里,我们研究了变分自动编码器(VAE)的离群点检测,VAE属于最简单的一类深度生成模型。首先,我们证明了一个有理论依据的校正可以轻松改善VAE似然估计的一个关键偏差。该偏差校正是无模型的、特定于样本的,并且在伯努利和连续伯努利可见分布下可以精确计算。其次,我们展示了一种众所周知的预处理技术,即对比度归一化,可以将偏差校正的有效性扩展到自然图像数据集。第三,我们证明了在VAE集成上计算的似然度的方差也能够实现稳健的离群点检测。我们使用九个(灰度和自然)图像数据集对我们的补救措施进行了全面评估,并证明了与其他四种最先进的方法相比,在速度和准确性方面的显著优势。我们的轻量级补救措施受到生物学的启发,可能有助于在多种类型的深度生成模型上实现高效的离群点检测。 摘要:Deep networks often make confident, yet incorrect, predictions when tested with outlier data that is far removed from their training distributions. Likelihoods computed by deep generative models are a candidate metric for outlier detection with unlabeled data. Yet, previous studies have shown that such likelihoods are unreliable and can be easily biased by simple transformations to input data. Here, we examine outlier detection with variational autoencoders (VAEs), among the simplest class of deep generative models. First, we show that a theoretically-grounded correction readily ameliorates a key bias with VAE likelihood estimates. The bias correction is model-free, sample-specific, and accurately computed with the Bernoulli and continuous Bernoulli visible distributions. Second, we show that a well-known preprocessing technique, contrast normalization, extends the effectiveness of bias correction to natural image datasets. Third, we show that the variance of the likelihoods computed over an ensemble of VAEs also enables robust outlier detection. We perform a comprehensive evaluation of our remedies with nine (grayscale and natural) image datasets, and demonstrate significant advantages, in terms of both speed and accuracy, over four other state-of-the-art methods. Our lightweight remedies are biologically inspired and may serve to achieve efficient outlier detection with many types of deep generative models.
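Two of the remedies are easy to sketch independently of any particular VAE: contrast normalization as input preprocessing, and the variance of log-likelihoods across an ensemble as an outlier score. In the snippet the per-model log-likelihood functions are crude stand-ins (simple quadratic scores), purely to show the scoring logic.

```python
# Illustrative sketch: contrast-normalize the input, then score it by the
# variance of log-likelihoods across an "ensemble" (stand-in functions).
import numpy as np

def contrast_normalize(img, eps=1e-6):
    img = img - img.mean()
    return img / (img.std() + eps)

rng = np.random.default_rng(0)
# Stand-ins for log p(x) under 5 independently trained VAEs:
logliks = [lambda x, b=b: -np.sum((x - b) ** 2) for b in rng.normal(0, 0.1, 5)]

x = contrast_normalize(rng.random((28, 28)))
scores = np.array([ll(x) for ll in logliks])
print("outlier score (ensemble variance):", scores.var())
```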

【2】 Learning to Detect: A Data-driven Approach for Network Intrusion Detection 标题:学会检测:一种数据驱动的网络入侵检测方法 链接:https://arxiv.org/abs/2108.08394

作者:Zachary Tauscher,Yushan Jiang,Kai Zhang,Jian Wang,Houbing Song 机构:Department of Electrical Engineering & Computer Science, Embry-Riddle Aeronautical University, Daytona Beach, FL , USA 备注:Accepted paper in the IEEE 40th International Performance Computing and Communications Conference - IPCCC 2021 摘要:随着每天产生的海量数据和世界互联网基础设施的日益互联,基于机器学习的入侵检测系统(IDS)已成为保护我们的经济和国家安全的重要组成部分。在本文中,我们对网络流量数据集NSL-KDD进行了全面的研究,通过可视化模式和采用不同的基于学习的模型来检测网络攻击。与以往采用单一学习模型方法进行入侵检测的浅层学习和深度学习模型不同,我们采用了分层策略,首先对入侵和正常行为进行分类,然后对特定类型的攻击进行分类。我们证明了无监督表示学习模型在二进制入侵检测任务中的优势。此外,我们还利用SVM-SMOTE过采样技术缓解了四类分类中的数据不平衡问题,并进一步证明了以深度神经网络为基础模型的过采样机制的有效性和缺陷。 摘要:With massive data being generated daily and the ever-increasing interconnectivity of the world's Internet infrastructures, a machine learning based intrusion detection system (IDS) has become a vital component to protect our economic and national security. In this paper, we perform a comprehensive study on NSL-KDD, a network traffic dataset, by visualizing patterns and employing different learning-based models to detect cyber attacks. Unlike previous shallow learning and deep learning models that use the single learning model approach for intrusion detection, we adopt a hierarchy strategy, in which the intrusion and normal behavior are classified firstly, and then the specific types of attacks are classified. We demonstrate the advantage of the unsupervised representation learning model in binary intrusion detection tasks. Besides, we alleviate the data imbalance problem with SVM-SMOTE oversampling technique in 4-class classification and further demonstrate the effectiveness and the drawback of the oversampling mechanism with a deep neural network as a base model.
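The two ideas, a two-stage hierarchy and SVM-SMOTE rebalancing, combine naturally in a pipeline like the sketch below (synthetic data in place of NSL-KDD; class proportions, features and the random-forest base models are assumptions; requires scikit-learn and imbalanced-learn).

```python
# Sketch of hierarchical intrusion detection: stage 1 flags attack vs. normal,
# stage 2 names the attack type after SVM-SMOTE oversampling (illustrative).
import numpy as np
from imblearn.over_sampling import SVMSMOTE
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 10))
labels = rng.choice([0, 1, 2, 3], 600, p=[0.7, 0.2, 0.07, 0.03])  # 0 = normal

stage1 = RandomForestClassifier(random_state=0).fit(X, labels > 0)
mask = labels > 0
X_att, y_att = SVMSMOTE(random_state=0).fit_resample(X[mask], labels[mask])
stage2 = RandomForestClassifier(random_state=0).fit(X_att, y_att)

is_attack = stage1.predict(X[:5])
print(np.where(is_attack, stage2.predict(X[:5]), 0))  # 0 or predicted attack type
```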

分类|识别(3篇)

【1】 Optimally Efficient Sequential Calibration of Binary Classifiers to Minimize Classification Error 标题:最小化分类误差的二元分类器最优高效序贯校准 链接:https://arxiv.org/abs/2108.08780

作者:Kaan Gokcesu,Hakan Gokcesu 摘要:在这项工作中,我们的目标是通过寻找到类概率的“最优”映射来校准二元分类问题估计量的分数输出,其中“最优”映射在某种意义上是最小化分类错误(或等效地,最大化精度)。我们证明,对于给定的目标变量和估计量的分数输出,“最优”软映射(将分数值单调映射到概率)是将分数值映射到$0$和$1$的硬映射。我们表明,对于类加权(其中一类的准确度更为重要)和样本加权(其中样本的准确分类并不同等重要)误差,甚至一般线性损失;这种硬映射特性被保留下来。我们提出了一种顺序递归合并方法,该方法可以按顺序为每个传入的新样本生成一个“最优”硬映射(对于迄今为止观察到的样本)。我们的方法在样本量时间复杂度上是对数的,这是最有效的。 摘要:In this work, we aim to calibrate the score outputs of an estimator for the binary classification problem by finding an 'optimal' mapping to class probabilities, where the 'optimal' mapping is in the sense that minimizes the classification error (or equivalently, maximizes the accuracy). We show that for the given target variables and the score outputs of an estimator, an 'optimal' soft mapping, which monotonically maps the score values to probabilities, is a hard mapping that maps the score values to $0$ and $1$. We show that for class weighted (where the accuracy for one class is more important) and sample weighted (where the samples' accurate classifications are not equally important) errors, or even general linear losses; this hard mapping characteristic is preserved. We propose a sequential recursive merger approach, which produces an 'optimal' hard mapping (for the observed samples so far) sequentially with each incoming new sample. Our approach has a logarithmic in sample size time complexity, which is optimally efficient.
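The paper's core claim, that the accuracy-optimal monotone calibration is a hard 0/1 threshold, can be checked offline by scanning all cut points on sorted scores, an O(n log n) batch analogue of the sequential merger procedure. The synthetic scores/labels below are for illustration only.

```python
# Offline sketch: find the hard threshold minimizing 0/1 classification error
# (decision rule: predict 1 iff score > threshold) by scanning all cuts.
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random(1000)
labels = (scores + rng.normal(0, 0.3, 1000) > 0.5).astype(int)

order = np.argsort(scores)
s, y = scores[order], labels[order]
n_neg = np.sum(1 - y)
# Cut k means: predict 1 for the k-th smallest score and above (k = 0..n).
fn = np.concatenate([[0], np.cumsum(y)])                  # positives predicted 0
fp = np.concatenate([[n_neg], n_neg - np.cumsum(1 - y)])  # negatives predicted 1
errors = fn + fp
k = errors.argmin()
threshold = -np.inf if k == 0 else s[k - 1]
print("best threshold:", threshold, "error rate:", errors[k] / len(y))
```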

【2】 Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification 标题:基于反事实注意学习的细粒度视觉分类与再识别 链接:https://arxiv.org/abs/2108.08728

作者:Yongming Rao,Guangyi Chen,Jiwen Lu,Jie Zhou 机构:Department of Automation, Tsinghua University, China, State Key Lab of Intelligent Technologies and Systems, China, Beijing National Research Center for Information Science and Technology, China 备注:Accepted to ICCV 2021 摘要:注意机制在细粒度视觉识别任务中显示出巨大的潜力。本文提出了一种基于因果推理的反事实注意学习方法来学习更有效的注意。与大多数现有的基于传统似然理论的视觉注意学习方法不同,我们提出用反事实因果关系来学习注意,它提供了一种测量注意质量的工具,并提供了一个强大的监督信号来指导学习过程。具体来说,我们通过反事实干预来分析学习到的视觉注意对网络预测的影响,并最大限度地提高影响,以鼓励网络学习更多有用的注意,用于细粒度图像识别。根据经验,我们在广泛的细粒度识别任务中评估了我们的方法,其中注意力起着至关重要的作用,包括细粒度图像分类、人员重新识别和车辆重新识别。所有基准的持续改进证明了我们方法的有效性。代码可在https://github.com/raoyongming/CAL 摘要:Attention mechanism has demonstrated great potential in fine-grained visual recognition tasks. In this paper, we present a counterfactual attention learning method to learn more effective attention based on causal inference. Unlike most existing methods that learn visual attention based on conventional likelihood, we propose to learn the attention with counterfactual causality, which provides a tool to measure the attention quality and a powerful supervisory signal to guide the learning process. Specifically, we analyze the effect of the learned visual attention on network prediction through counterfactual intervention and maximize the effect to encourage the network to learn more useful attention for fine-grained image recognition. Empirically, we evaluate our method on a wide range of fine-grained recognition tasks where attention plays a crucial role, including fine-grained image categorization, person re-identification, and vehicle re-identification. The consistent improvement on all benchmarks demonstrates the effectiveness of our method. Code is available at https://github.com/raoyongming/CAL
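The counterfactual-intervention idea can be compressed into a few lines: compare predictions under the learned attention with predictions under a random ("counterfactual") attention, and train on that gap in addition to the ordinary loss. The sketch below is a simplified rendering with made-up shapes and a plain multiplicative attention, not the authors' released implementation.

```python
# Simplified sketch of counterfactual attention learning: the "effect" is the
# prediction gap between learned and random attention maps (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

feat = torch.randn(4, 32, 7, 7)            # assumed backbone features (batch=4)
attn_net = nn.Conv2d(32, 1, 1)
classifier = nn.Linear(32, 10)

def predict(a):
    pooled = (feat * a).mean(dim=(2, 3))   # attention-weighted pooling
    return classifier(pooled)

attn = torch.sigmoid(attn_net(feat))
random_attn = torch.rand_like(attn)        # counterfactual intervention
effect = predict(attn) - predict(random_attn)

labels = torch.randint(0, 10, (4,))
loss = F.cross_entropy(predict(attn), labels) + F.cross_entropy(effect, labels)
print(loss.item())
```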

【3】 Classification of Diabetic Retinopathy Severity in Fundus Images with DenseNet121 and ResNet50 标题:DenseNet121和ResNet50在眼底图像中对糖尿病视网膜病变严重程度的分类 链接:https://arxiv.org/abs/2108.08473

作者:Jonathan Zhang,Bowen Xie,Xin Wu,Rahul Ram,David Liang 机构:Commack High School, Commack, NY , Glenda Dawson High School, Pearland, TX , Mira Loma High School, Carmichael, CA , Ward Melville High School, East Setauket, NY , Machine Learning, Camp Illumina, Illumina Learning, arXiv:,.,v, [eess.IV] , Aug 备注:15 pages, 14 figures; Jonathan Zhang - first author, Rahul Ram and David Liang - principal investigators; classifier repository - $url{this https URL}$ 摘要:在这项工作中,深度学习算法用于根据糖尿病视网膜病变的严重程度对眼底图像进行分类。测试了两种模型结构(密集卷积网络-121和残差神经网络-50)以及三种图像类型(RGB、绿色和高对比度)的六种不同组合,以找到性能最高的组合。我们的平均验证损失为0.17,最大验证准确率为85%。通过测试多个组合,某些参数组合的表现优于其他组合,尽管总体上发现最小方差。绿色过滤的效果最差,而与RGB分析相比,放大对比度的效果似乎可以忽略不计。与DenseNet121相比,ResNet50被证明不是一个健壮的模型。 摘要:In this work, deep learning algorithms are used to classify fundus images in terms of diabetic retinopathy severity. Six different combinations of two model architectures, the Dense Convolutional Network-121 and the Residual Neural Network-50 and three image types, RGB, Green, and High Contrast, were tested to find the highest performing combination. We achieved an average validation loss of 0.17 and a max validation accuracy of 85 percent. By testing out multiple combinations, certain combinations of parameters performed better than others, though minimal variance was found overall. Green filtration was shown to perform the poorest, while amplified contrast appeared to have a negligible effect in comparison to RGB analysis. ResNet50 proved to be less of a robust model as opposed to DenseNet121.
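The three image types compared in the study are one-line preprocessing variants before the backbone. A hypothetical torchvision sketch is below; the contrast amplification factor and the way the green channel is replicated are assumptions, and the DenseNet121 is initialized without pretrained weights.

```python
# Sketch of the RGB / green-only / high-contrast input variants feeding a
# 5-way DenseNet121 severity classifier (illustrative preprocessing choices).
import torch
from torchvision import models, transforms

to_green = transforms.Lambda(lambda t: t[1:2].repeat(3, 1, 1))        # keep G channel
high_contrast = transforms.Lambda(lambda t: ((t - 0.5) * 2.0 + 0.5).clamp(0, 1))

img = torch.rand(3, 224, 224)  # stand-in for a fundus image tensor in [0, 1]
variants = {"rgb": img, "green": to_green(img), "contrast": high_contrast(img)}

model = models.densenet121(num_classes=5)  # 5 diabetic-retinopathy grades
for name, v in variants.items():
    print(name, model(v.unsqueeze(0)).shape)   # torch.Size([1, 5])
```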

优化|敛散性(1篇)

【1】 Inverse design optimization framework via a two-step deep learning approach: application to a wind turbine airfoil 标题:基于两步深度学习的逆向设计优化框架:在风力机翼型中的应用 链接:https://arxiv.org/abs/2108.08500

作者:Sunwoong Yang,Sanga Lee,Kwanjung Yee 机构:a Seoul National University, Seoul , Republic of Korea, b Korea Institute of Industrial Technology, Incheon , Republic of Korea 备注:This manuscript is being reviewed in the journal "Engineering with Computers" 摘要:尽管逆方法在气动设计中计算效率高,因为指定了期望的目标性能分布,但它有一些显著的限制,无法实现完全效率。首先,当指定的目标分布发生变化时,应重复迭代过程。可以执行目标分布优化,以澄清指定此分布时的模糊性,但在此过程中会出现一些其他问题,如分布参数化导致的表示能力损失、现实分布的过度约束、,由于理论/经验预测导致的感兴趣数量不准确,以及无法明确施加几何约束。为了解决这些问题,提出了一种基于两步深度学习的逆向设计优化框架。使用变分自动编码器和多层感知器生成真实的目标分布,并根据生成的分布分别预测感兴趣的数量和形状参数。然后,将目标分布优化作为逆设计优化进行。该框架采用主动学习和迁移学习技术来提高学习的准确性和效率。最后,该框架通过风力涡轮机叶片翼型的气动形状优化得到验证,其中逆向设计正在积极应用。优化结果表明,该框架具有足够的精度、效率和灵活性,可应用于其他逆向设计工程应用。 摘要:Though inverse approach is computationally efficient in aerodynamic design as the desired target performance distribution is specified, it has some significant limitations that prevent full efficiency from being achieved. First, the iterative procedure should be repeated whenever the specified target distribution changes. Target distribution optimization can be performed to clarify the ambiguity in specifying this distribution, but several additional problems arise in this process such as loss of the representation capacity due to parameterization of the distribution, excessive constraints for a realistic distribution, inaccuracy of quantities of interest due to theoretical/empirical predictions, and the impossibility of explicitly imposing geometric constraints. To deal with these issues, a novel inverse design optimization framework with a two-step deep learning approach is proposed. A variational autoencoder and multi-layer perceptron are used to generate a realistic target distribution and predict the quantities of interest and shape parameters from the generated distribution, respectively. Then, target distribution optimization is performed as the inverse design optimization. The proposed framework applies active learning and transfer learning techniques to improve accuracy and efficiency. Finally, the framework is validated through aerodynamic shape optimizations of the airfoil of a wind turbine blade, where inverse design is actively being applied. The results of the optimizations show that this framework is sufficiently accurate, efficient, and flexible to be applied to other inverse design engineering applications.

预测|估计(1篇)

【1】 DECA: Deep viewpoint-Equivariant human pose estimation using Capsule Autoencoders 标题:DECA:基于胶囊自动编码器的深度视点等变人体姿态估计 链接:https://arxiv.org/abs/2108.08557

作者:Nicola Garau,Niccolò Bisagno,Piotr Bródka,Nicola Conci 机构:University of Trento, Via Sommarive, Povo, Trento TN 备注:International Conference on Computer Vision 2021 (ICCV 2021), 8 pages, 4 figures, 4 tables, accepted for ICCV 2021 oral 摘要:人体姿势估计(HPE)旨在从图像或视频中检索人体关节的三维位置。我们发现,当前的3D HPE方法缺乏视点等价性,即在处理训练时看不到的视点时,它们往往会失败或表现不佳。深度学习方法通常依赖于缩放不变、平移不变或旋转不变操作,如最大池。然而,采用这样的程序并不一定能提高视点的泛化,反而会导致更多依赖数据的方法。为了解决这个问题,我们提出了一种具有快速变分贝叶斯胶囊路由的新型胶囊自动编码器网络,称为DECA。通过将每个关节建模为胶囊实体,并结合路由算法,我们的方法可以独立于视点在特征空间中保留关节的层次结构和几何结构。通过实现视点等变,我们大大减少了训练时的网络数据依赖性,从而提高了对不可见视点的泛化能力。在实验验证中,我们在可见和不可见视点、俯视和前视图的深度图像上都优于其他方法。在RGB领域,同一网络在具有挑战性的视点转移任务上提供了最先进的结果,也为俯视HPE建立了新的框架。有关代码,请访问https://github.com/mmlab-cv/DECA. 摘要:Human Pose Estimation (HPE) aims at retrieving the 3D position of human joints from images or videos. We show that current 3D HPE methods suffer a lack of viewpoint equivariance, namely they tend to fail or perform poorly when dealing with viewpoints unseen at training time. Deep learning methods often rely on either scale-invariant, translation-invariant, or rotation-invariant operations, such as max-pooling. However, the adoption of such procedures does not necessarily improve viewpoint generalization, rather leading to more data-dependent methods. To tackle this issue, we propose a novel capsule autoencoder network with fast Variational Bayes capsule routing, named DECA. By modeling each joint as a capsule entity, combined with the routing algorithm, our approach can preserve the joints' hierarchical and geometrical structure in the feature space, independently from the viewpoint. By achieving viewpoint equivariance, we drastically reduce the network data dependency at training time, resulting in an improved ability to generalize for unseen viewpoints. In the experimental validation, we outperform other methods on depth images from both seen and unseen viewpoints, both top-view, and front-view. In the RGB domain, the same network gives state-of-the-art results on the challenging viewpoint transfer task, also establishing a new framework for top-view HPE. The code can be found at https://github.com/mmlab-cv/DECA.

其他神经网络|深度学习|模型|建模(15篇)

【1】 Learning Equilibria in Matching Markets from Bandit Feedback 标题:从Bandit反馈学习匹配市场中的均衡 链接:https://arxiv.org/abs/2108.08843

作者:Meena Jagadeesan,Alexander Wei,Yixin Wang,Michael I. Jordan,Jacob Steinhardt 机构:UC Berkeley, EECS and Statistics, UC Berkeley, Statistics 摘要:大规模双边匹配平台必须找到符合用户偏好的市场结果,同时从数据中学习这些偏好。然而,由于偏好在学习过程中固有的不确定性,稳定性的经典概念(Gale和Shapley,1962;Shapley和Shubik,1971)在这些环境中是无法实现的。为了弥补这一差距,我们开发了一个框架和算法,用于在不确定性条件下学习稳定的市场结果。我们的主要设置是与可转移的实用程序相匹配,平台既匹配代理,又设置代理之间的货币转移。我们设计了一个具有激励意识的学习目标,以捕捉市场结果与均衡的距离。利用这个目标,我们分析了学习的复杂性作为偏好结构的函数,将学习归结为一个随机的多臂强盗问题。在算法上,我们证明了“面对不确定性时的乐观主义”(许多bandit算法的基本原理)适用于与传输匹配的原始-对偶公式,并导致接近最优的遗憾边界。我们的工作为阐明大型数据驱动市场中何时以及如何出现稳定匹配迈出了第一步。 摘要:Large-scale, two-sided matching platforms must find market outcomes that align with user preferences while simultaneously learning these preferences from data. However, since preferences are inherently uncertain during learning, the classical notion of stability (Gale and Shapley, 1962; Shapley and Shubik, 1971) is unattainable in these settings. To bridge this gap, we develop a framework and algorithms for learning stable market outcomes under uncertainty. Our primary setting is matching with transferable utilities, where the platform both matches agents and sets monetary transfers between them. We design an incentive-aware learning objective that captures the distance of a market outcome from equilibrium. Using this objective, we analyze the complexity of learning as a function of preference structure, casting learning as a stochastic multi-armed bandit problem. Algorithmically, we show that "optimism in the face of uncertainty," the principle underlying many bandit algorithms, applies to a primal-dual formulation of matching with transfers and leads to near-optimal regret bounds. Our work takes a first step toward elucidating when and how stable matchings arise in large, data-driven marketplaces.

【2】 Learning-to-learn non-convex piecewise-Lipschitz functions 标题:学会学习非凸分段Lipschitz函数 链接:https://arxiv.org/abs/2108.08770

作者:Maria-Florina Balcan,Mikhail Khodak,Dravyansh Sharma,Ameet Talwalkar 摘要:我们分析了分段Lipschitz函数学习算法的初始化和步长的元学习,这是一种非凸设置,可应用于机器学习和算法设计。从指数预测器(exponential forecaster)在具有分散(dispersed)不连续点的损失上的最新遗憾界出发,我们将其推广为依赖于初始化的形式,然后利用这一结果提出了一个实用的元学习过程,该过程从多个在线学习任务中同时学习算法的初始化和步长。渐近地,我们保证跨任务的平均遗憾与任务相似性的一个自然概念成比例,该概念度量不同任务的近似最优区域之间的重叠量。最后,我们在两个重要的设置中实例化了该方法及其保证:鲁棒元学习和多任务数据驱动算法设计。 摘要:We analyze the meta-learning of the initialization and step-size of learning algorithms for piecewise-Lipschitz functions, a non-convex setting with applications to both machine learning and algorithms. Starting from recent regret bounds for the exponential forecaster on losses with dispersed discontinuities, we generalize them to be initialization-dependent and then use this result to propose a practical meta-learning procedure that learns both the initialization and the step-size of the algorithm from multiple online learning tasks. Asymptotically, we guarantee that the average regret across tasks scales with a natural notion of task-similarity that measures the amount of overlap between near-optimal regions of different tasks. Finally, we instantiate the method and its guarantee in two important settings: robust meta-learning and multi-task data-driven algorithm design.

【3】 Threshold Phenomena in Learning Halfspaces with Massart Noise 标题:带有Massart噪声的半空间学习中的阈值现象 链接:https://arxiv.org/abs/2108.08767

作者:Ilias Diakonikolas,Daniel M. Kane,Vasilis Kontonis,Christos Tzamos,Nikos Zarifis 机构: UW Madison, UC San-Diego 摘要:研究了高斯边缘分布下带Massart噪声的$\mathbb{R}^d$上PAC学习半空间的问题。在Massart噪声模型中,允许对手以概率$\eta(\mathbf{x}) \leq \eta$翻转每个点$\mathbf{x}$的标签,其中参数$\eta \in [0,1/2]$。学习者的目标是输出误分类错误为$\mathrm{opt} + \epsilon$的假设,其中$\mathrm{opt}$是目标半空间的错误。之前的工作在假设目标半空间是齐次的且参数$\eta$严格小于$1/2$的前提下研究了这个问题。我们探索了当这些假设中的任何一个被移除时问题复杂性如何变化,并建立了以下阈值现象:对于$\eta = 1/2$,我们证明了该问题的任何统计查询(SQ)算法的复杂性具有$d^{\Omega(\log(1/\epsilon))}$的下界,即使对于齐次半空间也成立。从积极的方面看,我们为该区域内的任意半空间给出了一个新的学习算法,其样本复杂度和运行时间为$O_\epsilon(1) \, d^{O(\log(1/\epsilon))}$。对于$\eta < 1/2$,我们在该问题的SQ复杂性上建立了$d^{\Omega(\log(1/\gamma))}$的下界,其中$\gamma = \max\{\epsilon, \min\{\mathbf{Pr}[f(\mathbf{x}) = 1], \mathbf{Pr}[f(\mathbf{x}) = -1]\}\}$,且$f$是目标半空间。特别地,这意味着学习任意Massart半空间(即使对于小常数$\eta$)的SQ下界为$d^{\Omega(\log(1/\epsilon))}$。我们用一个新的学习算法来补充这个下界,其样本复杂度和运行时间为$d^{O_{\eta}(\log(1/\gamma))} \mathrm{poly}(1/\epsilon)$。总之,我们的结果定性地刻画了Massart模型中学习半空间的复杂性。 摘要:We study the problem of PAC learning halfspaces on $\mathbb{R}^d$ with Massart noise under Gaussian marginals. In the Massart noise model, an adversary is allowed to flip the label of each point $\mathbf{x}$ with probability $\eta(\mathbf{x}) \leq \eta$, for some parameter $\eta \in [0,1/2]$. The goal of the learner is to output a hypothesis with misclassification error $\mathrm{opt} + \epsilon$, where $\mathrm{opt}$ is the error of the target halfspace. Prior work studied this problem assuming that the target halfspace is homogeneous and that the parameter $\eta$ is strictly smaller than $1/2$. We explore how the complexity of the problem changes when either of these assumptions is removed, establishing the following threshold phenomena: For $\eta = 1/2$, we prove a lower bound of $d^{\Omega(\log(1/\epsilon))}$ on the complexity of any Statistical Query (SQ) algorithm for the problem, which holds even for homogeneous halfspaces. On the positive side, we give a new learning algorithm for arbitrary halfspaces in this regime with sample complexity and running time $O_\epsilon(1) \, d^{O(\log(1/\epsilon))}$. For $\eta < 1/2$, we establish a lower bound of $d^{\Omega(\log(1/\gamma))}$ on the SQ complexity of the problem, where $\gamma = \max\{\epsilon, \min\{\mathbf{Pr}[f(\mathbf{x}) = 1], \mathbf{Pr}[f(\mathbf{x}) = -1]\}\}$ and $f$ is the target halfspace. In particular, this implies an SQ lower bound of $d^{\Omega(\log(1/\epsilon))}$ for learning arbitrary Massart halfspaces (even for small constant $\eta$). We complement this lower bound with a new learning algorithm for this regime with sample complexity and runtime $d^{O_{\eta}(\log(1/\gamma))} \mathrm{poly}(1/\epsilon)$. Taken together, our results qualitatively characterize the complexity of learning halfspaces in the Massart model.

【4】 Analyze and Design Network Architectures by Recursion Formulas 标题:用递归公式分析和设计网络结构 链接:https://arxiv.org/abs/2108.08689

作者:Yilin Liao,Hao Wang,Zhaoran Liu,Haozhe Li,Xinggao Liu 机构:State Key Laboratory of Industry Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou , P.R. China 备注:It is hoped that the new network architecture is derived according to a specific purpose 摘要:捷径/跳跃连接的有效性已得到广泛验证,这激发了对神经结构设计的大量探索。这项工作试图找到一种有效的方法来设计新的网络架构。研究发现,网络体系结构之间的主要差异可以反映在它们的递归公式中。在此基础上,提出了一种从数学公式的角度设计新型网络体系结构的方法。然后,通过一个案例分析,提出了一种基于ResNet的改进体系结构。此外,将新的体系结构与ResNet进行了比较,并在基于ResNet的网络上进行了测试。在CIFAR和ImageNet上进行了大量实验,见证了该体系结构提供的显著性能改进。 摘要:The effectiveness of shortcut/skip-connection has been widely verified, which inspires massive explorations on neural architecture design. This work attempts to find an effective way to design new network architectures. It is discovered that the main difference between network architectures can be reflected in their recursion formulas. Based on this, a methodology is proposed to design novel network architectures from the perspective of mathematical formulas. Afterwards, a case study is provided to generate an improved architecture based on ResNet. Furthermore, the new architecture is compared with ResNet and then tested on ResNet-based networks. Massive experiments are conducted on CIFAR and ImageNet, which witnesses the significant performance improvements provided by the architecture.
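The "recursion formula" viewpoint is literally one line of code per architecture: a plain network iterates x_{k+1} = F_k(x_k), while ResNet iterates x_{k+1} = x_k + F_k(x_k), and new designs can be prototyped by editing that recursion. A minimal PyTorch illustration with toy linear blocks and assumed sizes:

```python
# The recursion-formula view of architectures: edit one line to change designs.
import torch
import torch.nn as nn

blocks = nn.ModuleList(nn.Linear(8, 8) for _ in range(4))
x = torch.randn(2, 8)

h_plain, h_res = x, x
for F in blocks:
    h_plain = torch.relu(F(h_plain))        # plain recursion: x_{k+1} = F_k(x_k)
    h_res = h_res + torch.relu(F(h_res))    # ResNet recursion: x_{k+1} = x_k + F_k(x_k)
print(h_plain.shape, h_res.shape)
```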

【5】 Residual Tensor Train: a Flexible and Efficient Approach for Learning Multiple Multilinear Correlations 标题:残差张量训练:一种灵活有效的多重多线性相关学习方法 链接:https://arxiv.org/abs/2108.08659

作者:Yiwei Chen,Yu Pan,Daoyi Dong 机构:Institute of Cyber-Systems and Control, Zhejiang University 备注:11 pages, 6 figures 摘要:张量列(TT)方法已成功应用于特征的多线性交互建模。然而,现有的模型缺乏灵活性和可推广性,因为它们只对单一类型的高阶相关性进行建模。实际上,特征中可能存在多个多线性相关性。在本文中,我们提出了一种新的残差张量列(ResTT),它综合了TT和残差结构的优点,在同一模型中捕获从低阶到高阶的多线性特征相关性。特别地,我们证明了神经网络中的全连接层和Volterra级数可以作为ResTT的特例。此外,我们推导了基于平均场分析的权重初始化规则,以稳定ResTT的训练。我们证明了这种规则比TT的规则宽松得多,这意味着ResTT可以很容易地解决当前TT模型中存在的梯度消失和梯度爆炸问题。数值实验表明,ResTT优于最新的张量网络方法,并且在MNIST和时尚MNIST数据集上与基准深度学习模型具有竞争力。 摘要:Tensor Train (TT) approach has been successfully applied in the modelling of the multilinear interaction of features. Nevertheless, the existing models lack flexibility and generalizability, as they only model a single type of high-order correlation. In practice, multiple multilinear correlations may exist within the features. In this paper, we present a novel Residual Tensor Train (ResTT) which integrates the merits of TT and residual structure to capture the multilinear feature correlations, from low to higher orders, within the same model. In particular, we prove that the fully-connected layer in neural networks and the Volterra series can be taken as special cases of ResTT. Furthermore, we derive the rule for weight initialization that stabilizes the training of ResTT based on a mean-field analysis. We prove that such a rule is much more relaxed than that of TT, which means ResTT can easily address the vanishing and exploding gradient problem that exists in the current TT models. Numerical experiments demonstrate that ResTT outperforms the state-of-the-art tensor network approaches, and is competitive with the benchmark deep learning models on MNIST and Fashion-MNIST datasets.

【6】 Learning System Parameters from Turing Patterns 标题:从图灵模式学习系统参数 链接:https://arxiv.org/abs/2108.08542

作者:David Schnörr,Christoph Schnörr 机构:UK, Heidelberg University 备注:32 pages, 10 figures 摘要:图灵机制描述了反应扩散过程中由自发对称性破缺而涌现的空间图案,并且是许多发育过程的基础。识别生物系统中的图灵机制是一个具有挑战性的问题。本文介绍了一种从观测到的图灵图案预测图灵参数值的方法。参数值对应于以图灵图案为稳态的参数化反应扩散方程组。选择具有四个参数的Gierer-Meinhardt模型作为案例研究。由于局部图案结构的排列高度可变、且依赖于假定未知的初始条件,本文采用了一种新的基于电阻距离直方图的不变图案表示,并配合使用Wasserstein核。这使得我们能够计算图案之间物理上合理的距离、对图案进行聚类,并最终实现模型参数预测:对于小训练集,包括算子值核在内的经典方法优于作用于原始图案数据的神经网络,而对于大训练集,后者更准确。对单个参数值可以获得很好的预测,对联合预测所有参数值也能得到相当准确的结果。 摘要:The Turing mechanism describes the emergence of spatial patterns due to spontaneous symmetry breaking in reaction-diffusion processes and underlies many developmental processes. Identifying Turing mechanisms in biological systems defines a challenging problem. This paper introduces an approach to the prediction of Turing parameter values from observed Turing patterns. The parameter values correspond to a parametrized system of reaction-diffusion equations that generate Turing patterns as steady state. The Gierer-Meinhardt model with four parameters is chosen as a case study. A novel invariant pattern representation based on resistance distance histograms is employed, along with Wasserstein kernels, in order to cope with the highly variable arrangement of local pattern structure that depends on the initial conditions which are assumed to be unknown. This enables to compute physically plausible distances between patterns, to compute clusters of patterns and, above all, model parameter prediction: for small training sets, classical state-of-the-art methods including operator-valued kernels outperform neural networks that are applied to raw pattern data, whereas for large training sets the latter are more accurate. Excellent predictions are obtained for single parameter values and reasonably accurate results for jointly predicting all parameter values.
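作为背景,Gierer-Meinhardt 模型的一维显式差分模拟大致如下(示意代码,参数与步长仅为演示用假设,能否形成斑图取决于具体取值):

```python
import numpy as np

# Gierer-Meinhardt(演示形式):
#   a_t = Da*a_xx + a^2/h - mu_a*a
#   h_t = Dh*h_xx + a^2   - mu_h*h
n, dt, steps = 200, 0.01, 20000
Da, Dh, mu_a, mu_h = 0.01, 1.0, 1.0, 2.0   # 演示用参数

rng = np.random.default_rng(2)
a = 1.0 + 0.01 * rng.normal(size=n)        # 在均匀态附近加小扰动
h = 1.0 + 0.01 * rng.normal(size=n)

def lap(u):                                 # 周期边界的离散拉普拉斯算子
    return np.roll(u, 1) - 2 * u + np.roll(u, -1)

for _ in range(steps):
    a = a + dt * (Da * lap(a) + a**2 / h - mu_a * a)
    h = h + dt * (Dh * lap(h) + a**2 - mu_h * h)

peaks = np.sum((a > np.roll(a, 1)) & (a > np.roll(a, -1)))
print("稳态激活子的峰数(粗略):", peaks)
```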

【7】 Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain 标题:幅相重组:卷积神经网络的频域鲁棒性再思考 链接:https://arxiv.org/abs/2108.08487

作者:Guangyao Chen,Peixi Peng,Li Ma,Jia Li,Lin Du,Yonghong Tian 机构:Department of Computer Science and Technology, Peking University, Peng Cheng Laborotory, State Key Laboratory of Virtual Reality Technology and Systems, SCSE, Beihang University, AI Application Research Center, Huawei 备注:ICCV 2021 摘要:最近,卷积神经网络(CNN)的泛化行为通过频率分量分解的解释技术逐渐变得透明。然而,图像相位谱对于鲁棒视觉系统的重要性仍然被忽视。在本文中,我们注意到CNN趋向于收敛于局部最优,这与训练图像的高频成分密切相关,而振幅谱容易受到干扰,例如噪声或常见的损坏。相比之下,更多的实证研究发现,人类依赖更多的相位成分来实现稳健的识别。这一观察结果进一步解释了CNN在对常见扰动和分布外检测的鲁棒性方面的泛化行为,并激发了通过重新组合当前图像的相位谱和干扰图像的振幅谱来设计数据增强的新视角。也就是说,生成的样本迫使CNN更加关注来自相位分量的结构化信息,并对振幅的变化保持鲁棒性。在多个图像数据集上的实验表明,该方法在多个泛化和校准任务上达到了最先进的性能,包括对常见腐蚀和表面变化的适应性、分布外检测和对抗性攻击。 摘要:Recently, the generalization behavior of Convolutional Neural Networks (CNN) is gradually transparent through explanation techniques with the frequency components decomposition. However, the importance of the phase spectrum of the image for a robust vision system is still ignored. In this paper, we notice that the CNN tends to converge at the local optimum which is closely related to the high-frequency components of the training images, while the amplitude spectrum is easily disturbed such as noises or common corruptions. In contrast, more empirical studies found that humans rely on more phase components to achieve robust recognition. This observation leads to more explanations of the CNN's generalization behaviors in both robustness to common perturbations and out-of-distribution detection, and motivates a new perspective on data augmentation designed by re-combing the phase spectrum of the current image and the amplitude spectrum of the distracter image. That is, the generated samples force the CNN to pay more attention to the structured information from phase components and keep robust to the variation of the amplitude. Experiments on several image datasets indicate that the proposed method achieves state-of-the-art performances on multiple generalizations and calibration tasks, including adaptability for common corruptions and surface variations, out-of-distribution detection, and adversarial attack.
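论文所述数据增强的核心操作用几行 NumPy 即可示意:保留当前图像的相位谱,换用干扰图像的振幅谱重组(示意代码,实际方法中的混合比例等细节以原文为准):

```python
import numpy as np

def amplitude_phase_recombine(img, distracter):
    """保留 img 的相位谱,使用 distracter 的振幅谱重组图像。"""
    amp = np.abs(np.fft.fft2(distracter))      # 干扰图像的振幅谱
    phase = np.angle(np.fft.fft2(img))         # 当前图像的相位谱
    return np.real(np.fft.ifft2(amp * np.exp(1j * phase)))

rng = np.random.default_rng(3)
x1 = rng.uniform(size=(32, 32))               # 当前图像(演示用随机数据)
x2 = rng.uniform(size=(32, 32))               # 干扰图像
aug = amplitude_phase_recombine(x1, x2)
print(aug.shape)
```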

【8】 Language Model Augmented Relevance Score 标题:语言模型增强的相关性得分 链接:https://arxiv.org/abs/2108.08485

作者:Ruibo Liu,Jason Wei,Soroush Vosoughi 机构:Dartmouth College, Google AI Language 备注:In ACL 2021 摘要:虽然自动度量通常用于评估NLG系统,但它们通常与人类的判断关联性较差。较新的指标(如BERTScore)解决了以前的指标(如BLEU和ROUGE)中的许多弱点,这些指标依赖于n-gram匹配。然而,这些新的方法仍然是有限的,因为它们不考虑生成上下文,所以它们不能正确地奖励生成的文本是正确的,但偏离给定的引用。在本文中,我们提出了语言模型增强相关性评分(MARS),一种新的上下文感知NLG评估指标。火星利用现成的语言模型,在强化学习指导下,创建既考虑生成上下文又考虑可用的人类参考的扩充引用,然后将其用作附加得分的生成文本的引用。与三个常见NLG任务中的七个现有指标相比,MARS不仅实现了与人类参考判断的更高相关性,而且在更大程度上区分了形成良好的候选对象和对抗性样本。 摘要:Although automated metrics are commonly used to evaluate NLG systems, they often correlate poorly with human judgements. Newer metrics such as BERTScore have addressed many weaknesses in prior metrics such as BLEU and ROUGE, which rely on n-gram matching. These newer methods, however, are still limited in that they do not consider the generation context, so they cannot properly reward generated text that is correct but deviates from the given reference. In this paper, we propose Language Model Augmented Relevance Score (MARS), a new context-aware metric for NLG evaluation. MARS leverages off-the-shelf language models, guided by reinforcement learning, to create augmented references that consider both the generation context and available human references, which are then used as additional references to score generated text. Compared with seven existing metrics in three common NLG tasks, MARS not only achieves higher correlation with human reference judgements, but also differentiates well-formed candidates from adversarial samples to a larger degree.

【9】 Neural Operator: Learning Maps Between Function Spaces 标题:神经算子:学习函数空间之间的映射 链接:https://arxiv.org/abs/2108.08481

作者:Nikola Kovachki,Zongyi Li,Burigede Liu,Kamyar Azizzadenesheli,Kaushik Bhattacharya,Andrew Stuart,Anima Anandkumar 摘要:神经网络的经典发展主要集中于学习有限维欧氏空间或有限集之间的映射。我们提出了一种神经网络的推广,用于学习无穷维函数空间之间的算子映射。我们通过一类线性积分算子和非线性激活函数的复合来表示算子的逼近,从而使复合算子能够逼近复杂的非线性算子。此外,我们还介绍了四类算子参数化:基于图的算子、低秩算子、基于多极图的算子和傅立叶算子,并描述了每一类算子的高效计算算法。所提出的神经算子具有分辨率不变性:它们在底层函数空间的不同离散化之间共享相同的网络参数,并且可以用于零样本(zero-shot)超分辨率。数值实验表明,在Burgers方程、Darcy流和Navier-Stokes方程上,所提出的模型优于现有的基于机器学习的方法,同时比传统的PDE求解器快几个数量级。 摘要:The classical development of neural networks has primarily focused on learning mappings between finite dimensional Euclidean spaces or finite sets. We propose a generalization of neural networks tailored to learn operators mapping between infinite dimensional function spaces. We formulate the approximation of operators by composition of a class of linear integral operators and nonlinear activation functions, so that the composed operator can approximate complex nonlinear operators. Furthermore, we introduce four classes of operator parameterizations: graph-based operators, low-rank operators, multipole graph-based operators, and Fourier operators and describe efficient algorithms for computing with each one. The proposed neural operators are resolution-invariant: they share the same network parameters between different discretizations of the underlying function spaces and can be used for zero-shot super-resolutions. Numerically, the proposed models show superior performance compared to existing machine learning based methodologies on Burgers' equation, Darcy flow, and the Navier-Stokes equation, while being several orders of magnitude faster compared to conventional PDE solvers.
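以其中的傅立叶算子为例,单层的离散实现大致是"FFT → 在截断的低频模态上做线性变换 → 逆FFT"。下面是一维标量情形的极简草图(示意代码,非官方实现;实际实现中谱权重通常是逐模态的可学习复矩阵):

```python
import numpy as np

def fourier_layer(u, R, modes):
    """u: (n,) 离散函数值;R: (modes,) 复数谱权重(演示用对角形式)。"""
    u_hat = np.fft.rfft(u)
    out_hat = np.zeros_like(u_hat)
    out_hat[:modes] = R * u_hat[:modes]   # 仅在前 modes 个低频模态上作用
    return np.fft.irfft(out_hat, n=len(u))

rng = np.random.default_rng(4)
modes = 16
R = rng.normal(size=modes) + 1j * rng.normal(size=modes)

# 分辨率不变性:同一组参数 R 可直接作用于不同离散化分辨率的输入
for n in (128, 512):
    u = np.sin(np.linspace(0, 2 * np.pi, n))
    print(n, fourier_layer(u, R, modes).shape)
```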

【10】 Improving Human Decision-Making with Machine Learning 标题:利用机器学习提高人类决策能力 链接:https://arxiv.org/abs/2108.08454

作者:Hamsa Bastani,Osbert Bastani,Wichinpong Park Sinchaisri 机构:University of Pennsylvania, University of California, Berkeley 摘要:人类智力的一个关键方面是他们以简洁的形式向他人传达知识的能力。然而,尽管目前的机器学习模型具有很强的预测能力,但它们基本上都是黑匣子,这使得人类很难提取有用的见解。针对顺序决策,我们设计了一种新的机器学习算法,以可解释的“提示”的形式将其见解传达给人类。我们的算法选择最能弥合人类用户和最优策略之间性能差距的提示。我们通过一系列随机对照的用户研究来评估我们的方法,参与者管理一个虚拟厨房。我们的实验表明,与直观的基线相比,我们的算法生成的提示可以显著提高人的表现。此外,我们还讨论了一些经验见解,这些见解有助于为人工智能协作算法的设计提供信息。例如,我们发现有证据表明,参与者并非只是盲目地遵循我们的提示;相反,他们将其与自己的经验相结合,以发现提高绩效的其他策略。 摘要:A key aspect of human intelligence is their ability to convey their knowledge to others in succinct forms. However, despite their predictive power, current machine learning models are largely blackboxes, making it difficult for humans to extract useful insights. Focusing on sequential decision-making, we design a novel machine learning algorithm that conveys its insights to humans in the form of interpretable "tips". Our algorithm selects the tip that best bridges the gap in performance between human users and the optimal policy. We evaluate our approach through a series of randomized controlled user studies where participants manage a virtual kitchen. Our experiments show that the tips generated by our algorithm can significantly improve human performance relative to intuitive baselines. In addition, we discuss a number of empirical insights that can help inform the design of algorithms intended for human-AI collaboration. For instance, we find evidence that participants do not simply blindly follow our tips; instead, they combine them with their own experience to discover additional strategies for improving performance.

【11】 Deep Contrastive Learning for Multi-View Network Embedding 标题:多视点网络嵌入的深度对比学习 链接:https://arxiv.org/abs/2108.08296

作者:Mengqi Zhang,Yanqiao Zhu,Shu Wu,Liang Wang 机构: Center for Research on Intelligent Perception and Computing, Institute of Automation, Chinese Academy of Sciences, School of Artificial Intelligence, University of Chinese Academy of Sciences 备注:Work in progress 摘要:多视图网络嵌入的目的是将网络中的节点投影到低维向量上,同时保留其多种关系和属性信息。基于对比学习的方法在这项任务中已初步显示出良好的性能。然而,大多数基于对比学习的方法依赖于高质量的图嵌入,而对不同图视图之间关系的探索较少。针对这些不足,我们设计了一个新的多视图网络嵌入节点对节点对比学习框架(CREME),该框架主要包含两个对比目标:多视图融合InfoMax和视图间InfoMin。前者从不同的图视图生成的嵌入中提取信息,而后者更好地区分不同的图视图以捕获它们之间的互补信息。具体来说,我们首先应用视图编码器来生成每个图视图的表示,并利用多视图聚合器来融合这些表示。然后,我们将两个对比目标统一为一个学习目标进行训练。在三个真实数据集上的大量实验表明,CREME的性能始终优于现有方法。 摘要:Multi-view network embedding aims at projecting nodes in the network to low-dimensional vectors, while preserving their multiple relations and attribute information. Contrastive learning-based methods have preliminarily shown promising performance in this task. However, most contrastive learning-based methods mostly rely on high-quality graph embedding and explore less on the relationships between different graph views. To deal with these deficiencies, we design a novel node-to-node Contrastive learning framework for Multi-view network Embedding (CREME), which mainly contains two contrastive objectives: Multi-view fusion InfoMax and Inter-view InfoMin. The former objective distills information from embeddings generated from different graph views, while the latter distinguishes different graph views better to capture the complementary information between them. Specifically, we first apply a view encoder to generate each graph view representation and utilize a multi-view aggregator to fuse these representations. Then, we unify the two contrastive objectives into one learning objective for training. Extensive experiments on three real-world datasets show that CREME outperforms existing methods consistently.
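多视图融合 InfoMax 一类的对比目标通常基于 InfoNCE 形式的损失,下面给出其极简实现(示意代码,CREME 的具体目标与正负样本构造以原文为准):

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    """z1, z2: (n, d) 两个视图下同一批节点的嵌入,按行一一对应为正样本对。"""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                       # (n, n) 余弦相似度
    logits -= logits.max(axis=1, keepdims=True)    # 数值稳定
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))             # 对角线为正样本对

rng = np.random.default_rng(5)
z1, z2 = rng.normal(size=(64, 16)), rng.normal(size=(64, 16))
print(info_nce(z1, z2))
```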

【12】 AIRCHITECT: Learning Custom Architecture Design and Mapping Space 标题:AIRCHITECT:学习定制建筑设计和绘图空间 链接:https://arxiv.org/abs/2108.08295

作者:Ananda Samajdar,Jan Moritz Joseph,Matthew Denton,Tushar Krishna 机构:Georgia Tech, Atlanta, GA, RWTH Aachen University, Aachen, Germany 摘要:设计空间探索是定制体系结构设计/部署过程中一个重要但代价高昂的步骤,目的是尽可能提高性能和能效。传统上,优化需要使用模拟或启发式工具对设计空间进行迭代采样。在本文中,我们研究了使用机器学习来学习该优化任务的可能性,从而使用学习到的模型预测定制架构设计与映射空间的最优参数,绕过任何探索步骤。我们使用了三个案例研究,涉及基于脉动阵列的定制架构设计与映射空间中的最优阵列设计、SRAM缓冲区大小、映射和调度确定。在这些案例研究的范围内,我们表明,可以捕获设计空间并训练模型,使其在以工作负载和设计约束作查询时"泛化"地预测最优设计和映射参数。我们对案例研究的优化空间进行了系统的设计感知分析和统计分析,并突出了设计空间中的模式。我们将架构设计和映射表述为一个机器学习问题,从而可以利用现有的ML模型进行训练和推理。我们设计并训练了一个名为AIRCHITECT的定制网络体系结构,它能够以高达94.3%的测试精度学习体系结构设计空间;在一个包含$10^5$个GEMM工作负载的测试数据集上,其预测的最优配置平均(GeoMean)达到最佳可能性能的99.9%。 摘要:Design space exploration is an important but costly step involved in the design/deployment of custom architectures to squeeze out maximum possible performance and energy efficiency. Conventionally, optimizations require iterative sampling of the design space using simulation or heuristic tools. In this paper we investigate the possibility of learning the optimization task using machine learning and hence using the learnt model to predict optimal parameters for the design and mapping space of custom architectures, bypassing any exploration step. We use three case studies involving the optimal array design, SRAM buffer sizing, mapping, and schedule determination for systolic-array-based custom architecture design and mapping space. Within the purview of these case studies, we show that it is possible to capture the design space and train a model to "generalize" prediction the optimal design and mapping parameters when queried with workload and design constraints. We perform systematic design-aware and statistical analysis of the optimization space for our case studies and highlight the patterns in the design space. We formulate the architecture design and mapping as a machine learning problem that allows us to leverage existing ML models for training and inference. We design and train a custom network architecture called AIRCHITECT, which is capable of learning the architecture design space with as high as 94.3% test accuracy and predicting optimal configurations which achieve on average (GeoMean) 99.9% of the best possible performance on a test dataset with $10^5$ GEMM workloads.

【13】 A Framework for an Assessment of the Kernel-target Alignment in Tree Ensemble Kernel Learning 标题:树集成核学习中的核-目标对齐评估框架 链接:https://arxiv.org/abs/2108.08752

作者:Dai Feng,Richard Baumgartner 机构:Data and Statistical Sciences, AbbVie Inc., North Chicago, IL, United States of America, Biometrics Research, Merck & Co., Inc., Kenilworth, NJ, United States of America 摘要:当用于内核学习时,由树集合(如随机森林(RF)或梯度增强树(GBT))生成的内核已被证明与各自的树集合(特别是在高维场景中)具有竞争力。另一方面,研究还表明,核算法的性能取决于核目标对齐的程度。然而,基于树集合的核学习的核-目标对齐还没有被研究,填补这一空白是我们工作的主要目标。利用核矩阵的特征分析,我们证明了对于连续目标,基于树的核学习的良好性能与强核目标对齐有关。此外,我们还表明,性能良好的基于树系综的核具有强目标对齐成分的特征,这些成分通过核矩阵的特征向量与目标之间的标量积表示。这表明,当基于树集成的核学习成功时,有监督问题的相关信息集中在目标对齐组件跨越的低维流形附近。通过landmark学习的敏感性分析进一步支持基于树集成的内核中强目标对齐组件的持久性。除了全面的模拟研究外,我们还提供了与模拟一致的几个真实数据集的实验结果。 摘要:Kernels ensuing from tree ensembles such as random forest (RF) or gradient boosted trees (GBT), when used for kernel learning, have been shown to be competitive to their respective tree ensembles (particularly in higher dimensional scenarios). On the other hand, it has been also shown that performance of the kernel algorithms depends on the degree of the kernel-target alignment. However, the kernel-target alignment for kernel learning based on the tree ensembles has not been investigated and filling this gap is the main goal of our work. Using the eigenanalysis of the kernel matrix, we demonstrate that for continuous targets good performance of the tree-based kernel learning is associated with strong kernel-target alignment. Moreover, we show that well performing tree ensemble based kernels are characterized by strong target aligned components that are expressed through scalar products between the eigenvectors of the kernel matrix and the target. This suggests that when tree ensemble based kernel learning is successful, relevant information for the supervised problem is concentrated near lower dimensional manifold spanned by the target aligned components. Persistence of the strong target aligned components in tree ensemble based kernels is further supported by sensitivity analysis via landmark learning. In addition to a comprehensive simulation study, we also provide experimental results from several real life data sets that are in line with the simulations.
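核-目标对齐度的一个常用定义是 $A(K,y)=\langle K, yy^\top\rangle_F / (\|K\|_F \|yy^\top\|_F)$,并可通过核矩阵特征分解考察各"目标对齐成分"(示意代码,数据为人造示例):

```python
import numpy as np

def kernel_target_alignment(K, y):
    yy = np.outer(y, y)
    return np.sum(K * yy) / (np.linalg.norm(K) * np.linalg.norm(yy))

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 5))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=100))
K = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))  # RBF 核

print("对齐度:", kernel_target_alignment(K, y))

# 目标对齐成分:核矩阵特征向量与目标 y 的标量积(的平方)
w, V = np.linalg.eigh(K)
comps = (V.T @ y) ** 2
print("最大单一成分占比:", comps.max() / comps.sum())
```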

【14】 Determinant-free fermionic wave function using feed-forward neural network 标题:基于前馈神经网络的无行列式费米波函数 链接:https://arxiv.org/abs/2108.08631

作者:Koji Inui,Yasuyuki Kato,Yukitoshi Motome 机构:Department of Applied Physics, The University of Tokyo, Hongo, Tokyo, Japan 摘要:我们提出了一个用前馈神经网络求多体费米系统基态的通用框架。费米子的反对易关系通常通过Slater行列式(或Pfaffian)在变分波函数中实现,由于$N$个粒子的数值代价为$O(N^3)$,这成为计算瓶颈。我们通过显式计算与实空间中粒子交换相关的符号变化,并使用全连接神经网络优化波函数的其余部分,绕过了这个瓶颈。这将计算成本降低到$O(N^2)$或更低。我们表明,通过同时优化能量的“方差”和能量本身,可以提高近似的精度。我们还发现,蒙特卡罗抽样中的重新加权方法可以稳定计算。这些改进可应用于基于变分蒙特卡罗方法的其他方法。此外,我们还表明,通过使用系统的对称性、代表态以及一个实现广义Gutzwiller-Jastrow因子的附加神经网络,可以进一步提高精度。通过将该方法应用于二维Hubbard模型,证明了该方法的有效性。 摘要:We propose a general framework for finding the ground state of many-body fermionic systems by using feed-forward neural networks. The anticommutation relation for fermions is usually implemented to a variational wave function by the Slater determinant (or Pfaffian), which is a computational bottleneck because of the numerical cost of $O(N^3)$ for $N$ particles. We bypass this bottleneck by explicitly calculating the sign changes associated with particle exchanges in real space and using fully connected neural networks for optimizing the rest parts of the wave function. This reduces the computational cost to $O(N^2)$ or less. We show that the accuracy of the approximation can be improved by optimizing the "variance" of the energy simultaneously with the energy itself. We also find that a reweighting method in Monte Carlo sampling can stabilize the calculation. These improvements can be applied to other approaches based on variational Monte Carlo methods. Moreover, we show that the accuracy can be further improved by using the symmetry of the system, the representative states, and an additional neural network implementing a generalized Gutzwiller-Jastrow factor. We demonstrate the efficiency of the method by applying it to a two-dimensional Hubbard model.
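文中"显式计算粒子交换带来的符号变化"对应于计算置换的奇偶性。下面给出朴素的 $O(N^2)$ 实现(假设性示例,与摘要所述复杂度一致):

```python
import numpy as np

def permutation_sign(perm):
    """通过逆序数的奇偶性计算置换符号(+1/-1),朴素 O(N^2) 实现。"""
    perm = list(perm)
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return 1 if inversions % 2 == 0 else -1

# 将粒子按实空间坐标排序,所用置换的符号即反对称波函数的符号因子
positions = np.array([0.7, 0.2, 0.9, 0.4])
print(permutation_sign(np.argsort(positions)))
```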

【15】 Data-driven Modeling for Distribution Grids Under Partial Observability 标题:部分可观测性下的配电网数据驱动建模 链接:https://arxiv.org/abs/2108.08350

作者:Shanny Lin,Hao Zhu 机构:The authors are with the Department of Electrical & Computer Engineering, The University of Texas at Austin 摘要:准确建模配电网对于设计有效的监测和决策算法至关重要。为了提高线路参数估计的精度,本文研究了数据驱动配电网建模中的部分可观测性问题。受住宅负荷稀疏变化的启发,我们主张在双线性估计问题中对不可观测注入的组稀疏性进行正则化。提出了具有收敛性保证的交替极小化方案,以利用具有高效解的凸子问题。在IEEE 123节点测试算例的单相等效系统上使用真实负荷数据的数值结果表明,与现有的参数估计和电压建模工作相比,所提出的方案具有更高的精度。 摘要:Accurately modeling power distribution grids is crucial for designing effective monitoring and decision making algorithms. This paper addresses the partial observability issue of data-driven distribution modeling in order to improve the accuracy of line parameter estimation. Inspired by the sparse changes in residential loads, we advocate to regularize the group sparsity of the unobservable injections in a bi-linear estimation problem. The alternating minimization scheme of guaranteed convergence is proposed to take advantage of convex subproblems with efficient solutions. Numerical results using real-world load data on the single-phase equivalent of the IEEE 123-bus test case have demonstrated the accuracy improvements of the proposed solution over existing work for both parameter estimation and voltage modeling.
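组稀疏正则通常借助组软阈值(group lasso 的近端算子)在交替极小化的子问题中实现,示意如下(假设性示例,非论文原始实现):

```python
import numpy as np

def group_soft_threshold(v, lam):
    """组软阈值:prox 算子 max(0, 1 - lam/||v||_2) * v,整组同时收缩。"""
    norm = np.linalg.norm(v)
    return np.zeros_like(v) if norm <= lam else (1.0 - lam / norm) * v

# 每个节点的(不可观测)注入变化构成一组;多数组被整体压为零,即组稀疏
rng = np.random.default_rng(7)
groups = [rng.normal(scale=s, size=3) for s in (0.05, 0.05, 2.0, 0.05)]
shrunk = [group_soft_threshold(g, lam=0.3) for g in groups]
print([round(float(np.linalg.norm(g)), 3) for g in shrunk])  # 仅幅值大的组留存
```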

其他(12篇)

【1】 Lifelong Computing 标题:终身计算 链接:https://arxiv.org/abs/2108.08802

作者:Danny Weyns,Thomas Bäck,Renè Vidal,Xin Yao,Ahmed Nabil Belbachir 机构:KU Leuven, Belgium, Linnaeus University, Sweden, Leiden University, The Netherlands, NORCE Norwegian Research Centre, René Vidal, Johns Hopkins University, USA, University of Birmingham, UK, SUSTech, China 备注:9 pages 摘要:计算系统构成了我们生活中许多方面的支柱,因此它们对于我们的社会来说就像水、电和道路基础设施一样重要。然而,在不断变化的环境中实现其目标的工程长时间运行的计算系统带来了重大挑战。目前,我们可以构建计算系统,随着时间的推移进行调整或学习,以匹配预期的变化。然而,处理意外的变化,如异常、新奇、新的目标或约束,需要系统进化,这在本质上仍然是人类驱动的活动。考虑到计算系统的日益复杂和需要处理的大量高度复杂的数据,这种方法最终将变得难以管理。为了突破这一现状,我们提出了一种新的计算系统设计和运行模式,我们称之为“终身计算”。该模式从集成计算/服务模块和学习模块的计算学习系统开始。计算仓库提供此类计算元素以及数据表和使用指南。当检测到异常、新奇、新目标或约束时,终身计算系统会激活一个进化自学习引擎,该引擎运行在线实验,以确定计算学习系统需要如何进化以应对变化,从而改变其架构,并根据需要集成计算仓库中的新计算元素。根据手头的领域,终身计算系统的某些活动可以由人类支持。我们通过未来的养鱼场景激发终身计算的需求,勾勒出终身计算系统的蓝图架构,并强调实现终身计算愿景的关键研究挑战。 摘要:Computing systems form the backbone of many aspects of our life, hence they are becoming as vital as water, electricity, and road infrastructures for our society. Yet, engineering long running computing systems that achieve their goals in ever-changing environments pose significant challenges. Currently, we can build computing systems that adjust or learn over time to match changes that were anticipated. However, dealing with unanticipated changes, such as anomalies, novelties, new goals or constraints, requires system evolution, which remains in essence a human-driven activity. Given the growing complexity of computing systems and the vast amount of highly complex data to process, this approach will eventually become unmanageable. To break through the status quo, we put forward a new paradigm for the design and operation of computing systems that we coin "lifelong computing." The paradigm starts from computing-learning systems that integrate computing/service modules and learning modules. Computing warehouses offer such computing elements together with data sheets and usage guides. When detecting anomalies, novelties, new goals or constraints, a lifelong computing system activates an evolutionary self-learning engine that runs online experiments to determine how the computing-learning system needs to evolve to deal with the changes, thereby changing its architecture and integrating new computing elements from computing warehouses as needed. Depending on the domain at hand, some activities of lifelong computing systems can be supported by humans. We motivate the need for lifelong computing with a future fish farming scenario, outline a blueprint architecture for lifelong computing systems, and highlight key research challenges to realise the vision of lifelong computing.

【2】 Simple is better: Making Decision Trees faster using random sampling 标题:越简单越好:使用随机抽样使决策树更快 链接:https://arxiv.org/abs/2108.08790

作者:Vignesh Nanda Kumar,Narayanan U Edakunni 机构:AI Labs, American Express, India 摘要:近年来,梯度增强的决策树在大数据上构建健壮的机器学习模型方面变得非常流行。使这些算法成功的主要技术是在构建决策树的同时分配计算。通过构建大数据集的分位数并从这些分位数集中选择候选分割点,从而实现了分布式决策树的构建。例如,在XGBoost中,使用复杂的分位数构建算法来识别决策树的候选分割点。当计算是分布式的时,这种方法通常会产生更好的结果。在本文中,我们消除了这样一种观念,即这些方法为以分布式方式构建决策树提供了更精确和可伸缩的方法。在一项重要贡献中,我们从理论和经验上证明,随机均匀选择分裂点在精度和计算效率方面提供相同甚至更好的性能。因此,与更复杂的方法相比,简单的随机选择点就足以构建决策树。 摘要:In recent years, gradient boosted decision trees have become popular in building robust machine learning models on big data. The primary technique that has enabled these algorithms success has been distributing the computation while building the decision trees. A distributed decision tree building, in turn, has been enabled by building quantiles of the big datasets and choosing the candidate split points from these quantile sets. In XGBoost, for instance, a sophisticated quantile building algorithm is employed to identify the candidate split points for the decision trees. This method is often projected to yield better results when the computation is distributed. In this paper, we dispel the notion that these methods provide more accurate and scalable methods for building decision trees in a distributed manner. In a significant contribution, we show theoretically and empirically that choosing the split points uniformly at random provides the same or even better performance in terms of accuracy and computational efficiency. Hence, a simple random selection of points suffices for decision tree building compared to more sophisticated methods.
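下面用一个一维回归的小例子对比"分位数候选分裂点"与"均匀随机候选分裂点"(示意代码,数据与候选点数均为演示用假设,结论以论文实验为准):

```python
import numpy as np

def best_split(x, y, candidates):
    """在候选分裂点中选取使左右子节点平方误差之和最小者。"""
    best, best_err = None, np.inf
    for c in candidates:
        left, right = y[x <= c], y[x > c]
        if len(left) == 0 or len(right) == 0:
            continue
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if err < best_err:
            best, best_err = c, err
    return best, best_err

rng = np.random.default_rng(8)
x = rng.uniform(size=5000)
y = (x > 0.37).astype(float) + 0.1 * rng.normal(size=5000)

quantile_cands = np.quantile(x, np.linspace(0.01, 0.99, 32))
random_cands = rng.uniform(x.min(), x.max(), size=32)
print("分位数候选:", best_split(x, y, quantile_cands))
print("随机候选:", best_split(x, y, random_cands))
```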

【3】 Czech News Dataset for Semantic Textual Similarity 标题:面向语义文本相似度的捷克语新闻数据集 链接:https://arxiv.org/abs/2108.08708

作者:Jakub Sido,Michal Seják,Ondřej Pražák,Miloslav Konopík,Václav Moravec 机构: NTIS – New Technologies for the Information Society, Department of Computer Science and Engineering, University of West Bohemia, Czech Republic, Department of Journalism, Charles University, Czech Republic 摘要:本文描述了一个由带有语义相似度注释的句子组成的新数据集。数据来自捷克语的新闻领域。我们详细描述了数据的收集和注释过程。该数据集包含138,556条人类注释,分为训练集和测试集。总共有485名新闻专业学生参与了创建过程。为了提高测试集的可靠性,我们将每条注释取为9个独立注释的平均值。我们通过测量注释者间与注释者内的一致性来评估数据集的质量。除一致性数值外,我们还提供了所收集数据集的详细统计信息。最后,我们以一个构建句子语义相似度预测系统的基线实验结束本文。由于训练注释数量庞大(116,956条),该模型的性能明显优于普通注释者(皮尔逊相关系数分别为0.92和0.86)。 摘要:This paper describes a novel dataset consisting of sentences with semantic similarity annotations. The data originate from the journalistic domain in the Czech language. We describe the process of collecting and annotating the data in detail. The dataset contains 138,556 human annotations divided into train and test sets. In total, 485 journalism students participated in the creation process. To increase the reliability of the test set, we compute the annotation as an average of 9 individual annotations. We evaluate the quality of the dataset by measuring inter- and intra-annotator agreement. Besides agreement numbers, we provide detailed statistics of the collected dataset. We conclude our paper with a baseline experiment of building a system for predicting the semantic similarity of sentences. Due to the massive number of training annotations (116,956), the model can perform significantly better than an average annotator (0.92 versus 0.86 in Pearson's correlation coefficient).

【4】 Settling the Variance of Multi-Agent Policy Gradients 标题:解决多Agent策略梯度差异的方法 链接:https://arxiv.org/abs/2108.08612

作者:Jakub Grudzien Kuba,Muning Wen,Yaodong Yang,Linghui Meng,Shangding Gu,Haifeng Zhang,David Henry Mguni,Jun Wang 机构: 3Shanghai Jiao Tong University, 5Institute of Automation, 6University College London 摘要:策略梯度(PG)方法是一种流行的强化学习(RL)方法,通常使用基线来减少梯度估计的方差。在多agent RL(MARL)中,虽然PG定理可以自然扩展,但随着梯度估计方差随agent数量的增加而迅速增加,多agent PG(MAPG)方法的有效性会下降。在本文中,我们对MAPG方法进行了严格的分析,首先,通过量化代理数量和代理探索对MAPG估计量方差的贡献。基于此分析,我们推导出了达到最小方差的最佳基线(OB)。与OB相比,我们测量了现有MARL算法(如vanilla MAPG和COMA)的超额方差。考虑到使用深度神经网络,我们还提出了OB的替代版本,它可以无缝地插入MARL中任何现有的PG方法。在多代理MuJoCo和星际争霸挑战的基准上,我们的OB技术有效地稳定了训练,并显著提高了多代理PPO和COMA算法的性能。 摘要:Policy gradient (PG) methods are popular reinforcement learning (RL) methods where a baseline is often applied to reduce the variance of gradient estimates. In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectiveness of multi-agent PG (MAPG) methods degrades as the variance of gradient estimates increases rapidly with the number of agents. In this paper, we offer a rigorous analysis of MAPG methods by, firstly, quantifying the contributions of the number of agents and agents' explorations to the variance of MAPG estimators. Based on this analysis, we derive the optimal baseline (OB) that achieves the minimal variance. In comparison to the OB, we measure the excess variance of existing MARL algorithms such as vanilla MAPG and COMA. Considering using deep neural networks, we also propose a surrogate version of OB, which can be seamlessly plugged into any existing PG methods in MARL. On benchmarks of Multi-Agent MuJoCo and StarCraft challenges, our OB technique effectively stabilises training and improves the performance of multi-agent PPO and COMA algorithms by a significant margin.
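基线降低策略梯度方差的基本机制,可用单步多臂老虎机的 REINFORCE 估计量说明(示意代码,仅演示"基线不改变期望、只改变方差",与论文推导的多智能体最优基线 OB 无关):

```python
import numpy as np

rng = np.random.default_rng(9)
probs = np.array([0.2, 0.3, 0.5])        # 策略 pi(a)
rewards = np.array([1.0, 2.0, 3.0])      # 各动作的期望回报

def grad_estimates(baseline, n=100_000):
    a = rng.choice(3, size=n, p=probs)
    r = rewards[a] + rng.normal(size=n)          # 带噪声的回报
    # softmax 参数化下 dlogpi(a)/dtheta_0 = 1{a=0} - pi(0)
    score0 = (a == 0).astype(float) - probs[0]
    return (r - baseline) * score0               # REINFORCE 估计量

for b in (0.0, float(rewards @ probs)):          # 无基线 vs 以期望回报作基线
    g = grad_estimates(b)
    print(f"baseline={b:.2f}: 均值={g.mean():+.4f}, 方差={g.var():.4f}")
```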

【5】 Using Multilevel Circulant Matrix Approximate to Speed Up Kernel Logistic Regression 标题:利用多级循环矩阵近似加速核Logistic回归 链接:https://arxiv.org/abs/2108.08605

作者:Junna Zhang,Shuisheng Zhou,Cui Fu,Zhuan Zhang 机构:School of Mathematics and Statistics, Xidian University 摘要:核逻辑回归(KLR)是统计机器学习中一种经典的非线性分类器。具有二次收敛速度的牛顿法比梯度法能更有效地求解KLR问题。然而,牛顿法用于训练大规模问题的一个明显限制是$O(n^{3})$的时间复杂度和$O(n^{2})$的空间复杂度,其中$n$是训练实例的数量。在本文中,我们采用多级循环矩阵(MCM)近似核矩阵,以节省存储空间并加速KLR的求解。结合MCM的特点和我们的巧妙设计,我们提出了一种MCM近似牛顿迭代法。我们首先根据核矩阵的半正定性对牛顿方向进行简化,然后利用MCM对牛顿方向进行两步逼近。我们的方法通过使用多维快速傅立叶变换(mFFT)将每次迭代的时间复杂度降低到$O(n \log n)$。此外,由于MCM的内建周期性,空间复杂度可以降低到$O(n)$。对一些大规模二分类和多分类问题的实验结果表明,该方法使KLR可扩展到大规模问题,占用内存更少,并且能在更短的时间内无损地收敛到测试精度。 摘要:Kernel logistic regression (KLR) is a classical nonlinear classifier in statistical machine learning. Newton method with quadratic convergence rate can solve KLR problem more effectively than the gradient method. However, an obvious limitation of Newton method for training large-scale problems is the $O(n^{3})$ time complexity and $O(n^{2})$ space complexity, where $n$ is the number of training instances. In this paper, we employ the multilevel circulant matrix (MCM) approximate kernel matrix to save in storage space and accelerate the solution of the KLR. Combined with the characteristics of MCM and our ingenious design, we propose an MCM approximate Newton iterative method. We first simplify the Newton direction according to the semi-positivity of the kernel matrix and then perform a two-step approximation of the Newton direction by using MCM. Our method reduces the time complexity of each iteration to $O(n \log n)$ by using the multidimensional fast Fourier transform (mFFT). In addition, the space complexity can be reduced to $O(n)$ due to the built-in periodicity of MCM. Experimental results on some large-scale binary and multi-classification problems show that our method makes KLR scalable for large-scale problems, with less memory consumption, and converges to test accuracy without sacrifice in a shorter time.
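MCM 加速的基本原理是:循环矩阵与向量的乘法可经 FFT 在 $O(n \log n)$ 内完成。一维循环矩阵的示意如下(示意代码,论文中为多级/多维情形):

```python
import numpy as np

def circulant_matvec(c, x):
    """c 为循环矩阵的第一列;利用 FFT 计算 C @ x,复杂度 O(n log n)。"""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

rng = np.random.default_rng(10)
n = 8
c, x = rng.normal(size=n), rng.normal(size=n)

# 对照:显式构造循环矩阵后做 O(n^2) 乘法
C = np.array([np.roll(c, j) for j in range(n)]).T   # 第 j 列为 roll(c, j)
print(np.allclose(C @ x, circulant_matvec(c, x)))   # True
```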

【6】 Pruning in the Face of Adversaries 标题:在对手面前修剪 链接:https://arxiv.org/abs/2108.08560

作者:Florian Merkle,Maximilian Samsinger,Pascal Schöttle 机构:Management Center Innsbruck, Digital Business and Software Engineering, Pascal Sch¨ottle 摘要:深度神经网络在对抗性示例中的脆弱性——具有小的不可察觉的扰动的输入——最近在研究界获得了很多关注。同时,最先进的深度学习模型的参数数量也在大量增加,这对训练和部署此类模型所需的内存和计算资源产生了影响。控制神经网络大小的一种方法是回顾性地减少参数数量,即所谓的神经网络修剪。关于神经网络修剪对对抗性稳健性的影响的现有研究是零碎的,通常不符合稳健性评估的既定原则。我们通过评估剪枝模型对各种攻击强度、多种体系结构、数据集、剪枝方法和压缩率的L-0、L-2和L-无穷大攻击的鲁棒性来缩小这一差距。我们的结果证实了神经网络剪枝和对抗鲁棒性并不是相互排斥的。相反,可以找到在模型大小和对抗鲁棒性方面有利的最佳点。此外,我们将我们的分析扩展到包含关于对抗场景的附加假设的情况,并表明根据情况,不同的策略是最优的。 摘要:The vulnerability of deep neural networks against adversarial examples - inputs with small imperceptible perturbations - has gained a lot of attention in the research community recently. Simultaneously, the number of parameters of state-of-the-art deep learning models has been growing massively, with implications on the memory and computational resources required to train and deploy such models. One approach to control the size of neural networks is retrospectively reducing the number of parameters, so-called neural network pruning. Available research on the impact of neural network pruning on the adversarial robustness is fragmentary and often does not adhere to established principles of robustness evaluation. We close this gap by evaluating the robustness of pruned models against L-0, L-2 and L-infinity attacks for a wide range of attack strengths, several architectures, data sets, pruning methods, and compression rates. Our results confirm that neural network pruning and adversarial robustness are not mutually exclusive. Instead, sweet spots can be found that are favorable in terms of model size and adversarial robustness. Furthermore, we extend our analysis to situations that incorporate additional assumptions on the adversarial scenario and show that depending on the situation, different strategies are optimal.
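论文比较了多种剪枝方法与压缩率;其中最常见的幅值剪枝可示意如下(示意代码,按全局权重绝对值裁剪到给定压缩率):

```python
import numpy as np

def magnitude_prune(weights, compression):
    """保留绝对值最大的 1/compression 比例权重,其余置零。"""
    flat = np.abs(weights).ravel()
    k = max(1, int(len(flat) / compression))     # 保留的权重个数
    thresh = np.partition(flat, -k)[-k]
    mask = np.abs(weights) >= thresh
    return weights * mask, mask

rng = np.random.default_rng(11)
W = rng.normal(size=(64, 64))
W_pruned, mask = magnitude_prune(W, compression=4)   # 4 倍压缩
print("稀疏度:", 1.0 - mask.mean())
```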

【7】 A Unified Objective for Novel Class Discovery 标题:新类发现的统一目标 链接:https://arxiv.org/abs/2108.08536

作者:Enrico Fini,Enver Sangineto,Stéphane Lathuilière,Zhun Zhong,Moin Nabi,Elisa Ricci 机构:University of Trento, Trento, Italy, LTCI, Télécom Paris, Institut Polytechnique de Paris, France, SAP AI Research, Berlin, Germany, Fondazione Bruno Kessler, Trento, Italy 备注:ICCV 2021 (Oral) 摘要:本文研究了新类发现(NCD)问题。NCD旨在利用由不同但相关的类组成的已标注集合的先验知识,在未标注集合中推断新的对象类别。现有方法通过考虑多个目标函数来解决这个问题,通常分别为有标签样本和无标签样本设置专门的损失项,并且常常需要辅助正则化项。在本文中,我们摆脱了这一传统方案,引入了一个用于发现新类的统一目标函数(UNO),其明确目的是促进有监督学习和无监督学习之间的协同。通过多视图自标注策略,我们生成可与真实标签同等对待的伪标签,从而得到一个同时作用于已知类和未知类的单一分类目标。尽管方法简单,UNO在多个基准上以显著优势超过了现有最佳方法(在CIFAR-100上约+10%,在ImageNet上+8%)。项目页面:https://ncd-uno.github.io 摘要:In this paper, we study the problem of Novel Class Discovery (NCD). NCD aims at inferring novel object categories in an unlabeled set by leveraging from prior knowledge of a labeled set containing different, but related classes. Existing approaches tackle this problem by considering multiple objective functions, usually involving specialized loss terms for the labeled and the unlabeled samples respectively, and often requiring auxiliary regularization terms. In this paper, we depart from this traditional scheme and introduce a UNified Objective function (UNO) for discovering novel classes, with the explicit purpose of favoring synergy between supervised and unsupervised learning. Using a multi-view self-labeling strategy, we generate pseudo-labels that can be treated homogeneously with ground truth labels. This leads to a single classification objective operating on both known and unknown classes. Despite its simplicity, UNO outperforms the state of the art by a significant margin on several benchmarks (~+10% on CIFAR-100 and +8% on ImageNet). The project page is available at: https://ncd-uno.github.io

【8】 Trends in Neural Architecture Search: Towards the Acceleration of Search 标题:神经结构搜索的发展趋势:加速搜索 链接:https://arxiv.org/abs/2108.08474

作者:Youngkee Kim,Won Joon Yun,Youn Kyu Lee,Soyi Jung,Joongheon Kim 机构:†Department of Electrical and Computer Engineering, Korea University, Seoul, Republic of Korea, ◦Department of Computer Engineering, Hongik University, Seoul, Republic of Korea, †School of Software, Hallym University, Chuncheon, Republic of Korea 备注:4 pages, 5 figures, In Proceedings of the 12th International Conference on ICT Convergence (ICTC) 2021 摘要:在现代深度学习研究中,寻找最优(或接近最优)的神经网络模型是主要的研究方向之一,在许多应用中得到了广泛的研究。本文将神经结构搜索的主要研究方向分为神经进化算法、基于强化学习的算法和一次性结构搜索方法。此外,还介绍了每种研究趋势,最后对三种主要趋势进行了比较。最后,讨论了NAS未来的研究方向和发展趋势。 摘要:In modern deep learning research, finding optimal (or near optimal) neural network models is one of major research directions and it is widely studied in many applications. In this paper, the main research trends of neural architecture search (NAS) are classified as neuro-evolutionary algorithms, reinforcement learning based algorithms, and one-shot architecture search approaches. Furthermore, each research trend is introduced and finally all the major three trends are compared. Lastly, the future research directions of NAS research trends are discussed.

【9】 FeelsGoodMan: Inferring Semantics of Twitch Neologisms 标题:FeelsGoodMan:推断抽搐新词的语义 链接:https://arxiv.org/abs/2108.08411

作者:Pavel Dolin,Luc d'Hauthuille,Andrea Vattani 机构:Spiketrap, San Francisco, CA, USA 摘要:Twitch聊天在自然语言理解中提出了一个独特的问题,因为其中存在大量新词,特别是表情(emote)。表情总数达806万个,其中超过40万个在所研究的一周内被使用。关于表情的含义或情绪几乎没有任何信息,而且随着新表情的不断涌入及其使用频率的漂移,维护一个持续更新的人工标注数据集是不可能的。我们的论文有两方面的贡献。首先,我们为Twitch数据的情绪分析建立了一个新的基线,比之前的有监督基准高出7.9个百分点。其次,我们介绍了一个简单但强大的、基于词嵌入和k-NN的无监督框架,以词表外知识丰富现有模型。该框架允许我们自动生成表情的伪词典,并且我们表明,即使将这些表情知识注入到在电影评论或Twitter等无关数据集上训练的情感分类器中,我们也几乎可以匹配上述有监督基准。 摘要:Twitch chats pose a unique problem in natural language understanding due to a large presence of neologisms, specifically emotes. There are a total of 8.06 million emotes, over 400k of which were used in the week studied. There is virtually no information on the meaning or sentiment of emotes, and with a constant influx of new emotes and drift in their frequencies, it becomes impossible to maintain an updated manually-labeled dataset. Our paper makes a twofold contribution. First we establish a new baseline for sentiment analysis on Twitch data, outperforming the previous supervised benchmark by 7.9 percentage points. Secondly, we introduce a simple but powerful unsupervised framework based on word embeddings and k-NN to enrich existing models with out-of-vocabulary knowledge. This framework allows us to auto-generate a pseudo-dictionary of emotes and we show that we can nearly match the supervised benchmark above even when injecting such emote knowledge into sentiment classifiers trained on extraneous datasets such as movie reviews or Twitter.
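"基于词嵌入与 k-NN 的伪词典"思路可示意如下:对未知表情,在已标注词表的嵌入空间中取 k 个近邻,用近邻情感分数的均值作为其伪标签(示意代码,嵌入与情感分数均为人造假设数据):

```python
import numpy as np

def knn_sentiment(emote_vec, vocab_vecs, vocab_sent, k=3):
    """用嵌入空间 k 近邻的平均情感分数,为未登录词(表情)打分。"""
    sims = vocab_vecs @ emote_vec / (
        np.linalg.norm(vocab_vecs, axis=1) * np.linalg.norm(emote_vec)
    )
    topk = np.argsort(-sims)[:k]                 # 余弦相似度最高的 k 个词
    return vocab_sent[topk].mean()

rng = np.random.default_rng(12)
vocab_vecs = rng.normal(size=(1000, 50))         # 已知词的嵌入(假设)
vocab_sent = rng.uniform(-1, 1, size=1000)       # 已知词的情感分数(假设)
emote_vec = rng.normal(size=50)                  # 某个新表情的嵌入
print(knn_sentiment(emote_vec, vocab_vecs, vocab_sent))
```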

【10】 TFRD: A Benchmark Dataset for Research on Temperature Field Reconstruction of Heat-Source Systems 标题:TFRD:热源系统温度场重建研究的基准数据集 链接:https://arxiv.org/abs/2108.08298

作者:Xiaoqian Chen,Zhiqiang Gong,Xiaoyu Zhao,Wen Yao 摘要:热管理在工程中起着重要的作用。用有限的监测传感器重建热源系统的温度场(TFR-HSS)在热管理中起着至关重要的作用。然而,现有的常用插值方法通常不能提供精确的重建。此外,目前还没有可用于广泛研究重建方法的公开数据集,以进一步推动工程中的温度场重建。为了克服这一问题,本工作以常用的插值方法和基于代理模型的方法为基线,为TFR-HSS任务构建了一个专门的数据集,即TFRD,以推进温度场重建的研究。首先,从实际工程问题出发对TFR-HSS任务进行数学建模,并构建了三种类型的数值建模,以将问题转化为离散映射形式。此外,本文选取了四个具有不同热源信息和边界条件的典型重建问题,并生成标准样本作为训练样本和测试样本供进一步研究。最后,对TFR-HSS任务的已有方法以及近来广泛使用的深度学习方法进行了全面回顾,并给出了典型方法在TFRD上的性能分析,可作为该基准的基线结果。 摘要:Heat management plays an important role in engineering. Temperature field reconstruction of heat source systems (TFR-HSS) with limited monitoring sensors plays an essential role in heat management. However, prior methods with common interpolations usually cannot provide accurate reconstruction. In addition, there exists no public dataset for the wide study of reconstruction methods to further boost field reconstruction in engineering. To overcome this problem, this work constructs a specific dataset, namely TFRD, for the TFR-HSS task, with commonly used methods, including interpolation methods and surrogate-model-based methods, as baselines to advance the research over temperature field reconstruction. First, the TFR-HSS task is mathematically modelled from a real-world engineering problem, and three types of numerical modellings are constructed to transform the problem into discrete mapping forms. Besides, this work selects four typical reconstruction problems with different heat source information and boundary conditions, and generates standard samples as training and testing samples for further research. Finally, a comprehensive review of the prior methods for the TFR-HSS task as well as recent widely used deep learning methods is given, and we provide a performance analysis of typical methods on TFRD, which can serve as the baseline results on this benchmark.

【11】 GSVMA: A Genetic-Support Vector Machine-Anova method for CAD diagnosis based on Z-Alizadeh Sani dataset 标题:GSVMA:基于Z-Alizadeh SAI数据集的遗传支持向量机-方差分析CAD诊断方法 链接:https://arxiv.org/abs/2108.08292

作者:Javad Hassannataj Joloudari,Faezeh Azizi,Mohammad Ali Nematollahi,Roohallah Alizadehsani,Edris Hassannataj,Amir Mosavi 机构:Department of Computer Engineering, University of Birjand, Birjand, Iran, Department of Computer Sciences, Fasa University, Fasa, Iran, Institute for Intelligent Systems Research and Innovation, Deakin University, Geelong, VIC, Australia 备注:14 pages, 10 figures 摘要:冠心病(CAD)是全世界中年人心血管疾病死亡的重要原因之一。诊断CAD最典型的工具是血管造影,但其挑战在于成本高昂且有副作用。替代方案之一是使用基于机器学习的模型进行CAD诊断。为此,本文提出了一种新的混合机器学习模型,称为遗传支持向量机与方差分析(GSVMA),其中方差分析(ANOVA)用作支持向量机的核函数。该模型在Z-Alizadeh Sani数据集上进行评估,并采用遗传优化算法选择关键特征。此外,采用ANOVA核支持向量机、线性支持向量机和径向基函数LibSVM对数据集进行分类。结果表明,GSVMA混合方法优于其他方法:在Z-Alizadeh Sani数据集上选取35个特征,通过10折交叉验证,该方法取得了89.45%的最高准确率。因此,遗传优化算法对提高准确率非常有效。计算机辅助的GSVMA方法可以帮助临床医生进行CAD诊断。 摘要:Coronary heart disease (CAD) is one of the crucial reasons for cardiovascular mortality in middle-aged people worldwide. The most typical tool is angiography for diagnosing CAD. The challenges of CAD diagnosis using angiography are that it is costly and has side effects. One of the alternative solutions is the use of machine learning-based patterns for CAD diagnosis. Hence, this paper provides a new hybrid machine learning model called Genetic Support Vector Machine and Analysis of Variance (GSVMA). The ANOVA is known as the kernel function for SVM. The proposed model is evaluated on the Z-Alizadeh Sani dataset. A genetic optimization algorithm is used to select crucial features. In addition, SVM with ANOVA, Linear SVM, and LibSVM with radial basis function methods were applied to classify the dataset. As a result, the GSVMA hybrid method performs better than other methods. The proposed method has the highest accuracy of 89.45% through a 10-fold cross-validation technique with 35 selected features on the Z-Alizadeh Sani dataset. Therefore, the genetic optimization algorithm is very effective for improving accuracy. The computer-aided GSVMA method can help clinicians with CAD diagnosis.

【12】 On Accelerating Distributed Convex Optimizations 标题:关于加速分布式凸优化的研究 链接:https://arxiv.org/abs/2108.08670

作者:Kushal Chakrabarti,Nirupam Gupta,Nikhil Chopra 机构:Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland , U.S.A., École polytechnique fédérale de Lausanne (EPFL), CH-, Lausanne, Switzerland, Department of Mechanical Engineering 摘要:本文研究了一个分布式多智能体凸优化问题。在这个问题中,系统由多个代理组成,每个代理都有一组本地数据点和一个相关的本地成本函数。代理连接到服务器,并且没有代理间通信。代理的目标是学习一个参数向量,该参数向量在不暴露本地数据点的情况下优化本地成本的集合。原则上,代理可以通过使用传统的分布式梯度下降方法与服务器协作来解决此问题。然而,当总成本是病态的时,梯度下降法(i)需要大量迭代才能收敛,(ii)对过程噪声非常不稳定。我们提出了一种迭代预处理技术来减轻代价函数条件对分布式梯度下降收敛速度的不利影响。与传统的预处理技术不同,我们提出的技术中的预处理矩阵迭代更新,以便于在分布式网络上实现。在分布式环境下,我们证明了该算法的线性收敛性,与传统的和自适应的梯度下降方法相比,具有更好的收敛速度。此外,对于总成本最小值唯一的特殊情况,我们的算法超线性收敛。我们证明了我们的算法在解决实际logistic回归问题和通过带噪声的二次模型模拟神经网络训练方面,与著名的分布式算法相比,具有优越的性能,从而表明了该算法在分布式求解非凸优化问题方面的效率。此外,我们的经验表明,该算法在不影响泛化性能的情况下,训练速度更快。 摘要:This paper studies a distributed multi-agent convex optimization problem. The system comprises multiple agents in this problem, each with a set of local data points and an associated local cost function. The agents are connected to a server, and there is no inter-agent communication. The agents' goal is to learn a parameter vector that optimizes the aggregate of their local costs without revealing their local data points. In principle, the agents can solve this problem by collaborating with the server using the traditional distributed gradient-descent method. However, when the aggregate cost is ill-conditioned, the gradient-descent method (i) requires a large number of iterations to converge, and (ii) is highly unstable against process noise. We propose an iterative pre-conditioning technique to mitigate the deleterious effects of the cost function's conditioning on the convergence rate of distributed gradient-descent. Unlike the conventional pre-conditioning techniques, the pre-conditioner matrix in our proposed technique updates iteratively to facilitate implementation on the distributed network. In the distributed setting, we provably show that the proposed algorithm converges linearly with an improved rate of convergence than the traditional and adaptive gradient-descent methods. Additionally, for the special case when the minimizer of the aggregate cost is unique, our algorithm converges superlinearly. We demonstrate our algorithm's superior performance compared to prominent distributed algorithms for solving real logistic regression problems and emulating neural network training via a noisy quadratic model, thereby signifying the proposed algorithm's efficiency for distributively solving non-convex optimization. Moreover, we empirically show that the proposed algorithm results in faster training without compromising the generalization performance.
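预处理梯度下降的基本形式为 $x \leftarrow x - \alpha K \nabla f(x)$,其中 $K$ 为预处理矩阵。病态二次目标上的对比示意如下(示意代码:此处直接取 Hessian 逆作为理想预处理矩阵,论文中的 $K$ 则是可在分布式网络上迭代逼近的版本):

```python
import numpy as np

A = np.diag([1.0, 100.0])                # 病态二次目标 f(x) = 0.5 * x^T A x
K = np.linalg.inv(A)                     # 理想预处理矩阵(演示用)
x_gd = np.array([1.0, 1.0])
x_pre = np.array([1.0, 1.0])

for _ in range(50):
    x_gd = x_gd - 0.009 * (A @ x_gd)         # 普通梯度下降,步长受最大特征值限制
    x_pre = x_pre - 0.9 * (K @ (A @ x_pre))  # 预处理后条件数为 1,收敛快得多

print("GD 残差:", np.linalg.norm(x_gd))
print("预处理残差:", np.linalg.norm(x_pre))
```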
