Machine Learning arXiv Daily Digest [9.2]

2021-09-16 14:57:58


cs.LG: 37 papers today

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (1 paper)

【1】 Position-based Hash Embeddings For Scaling Graph Neural Networks Link: https://arxiv.org/abs/2109.00101

Authors: Maria Kalantzi, George Karypis Affiliation: Computer Science & Engineering, University of Minnesota Note: 10 pages Abstract: Graph Neural Networks (GNNs) bring the power of deep representation learning to graph and relational data and achieve state-of-the-art performance in many applications. GNNs compute node representations by taking into account the topology of the node's ego-network and the features of the ego-network's nodes. When the nodes do not have high-quality features, GNNs learn an embedding layer to compute node embeddings and use them as input features. However, the size of the embedding layer is linear in the graph size and does not scale to graphs with hundreds of millions of nodes. To reduce the memory associated with this embedding layer, hashing-based approaches, commonly used in applications like NLP and recommender systems, can potentially be used. However, a direct application of these ideas fails to exploit the fact that in many real-world graphs, nodes that are topologically close will tend to be related to each other (homophily) and as such their representations will be similar. In this work, we present approaches that take advantage of the nodes' position in the graph to dramatically reduce the memory required, with minimal if any degradation in the quality of the resulting GNN model. Our approaches decompose a node's embedding into two components: a position-specific component and a node-specific component. The position-specific component models homophily and the node-specific component models the node-to-node variation. Extensive experiments using different datasets and GNN models show that in nearly all cases, our methods are able to reduce the memory requirements by 86% to 97% while achieving better classification accuracy than other competing approaches, including the full embeddings.
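A minimal PyTorch sketch of the decomposition described in the abstract, under stated assumptions: the `PositionHashEmbedding` name, the hash function, and the use of a graph partitioner to supply `part_ids` are all illustrative choices, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class PositionHashEmbedding(nn.Module):
    """Node embedding = position-specific component + hashed node-specific component."""
    def __init__(self, num_parts, num_buckets, dim, seed=0):
        super().__init__()
        self.position = nn.Embedding(num_parts, dim)   # shared per partition; models homophily
        self.node = nn.Embedding(num_buckets, dim)     # small hashed table; models node-to-node variation
        self.num_buckets = num_buckets
        self.seed = seed

    def forward(self, node_ids, part_ids):
        # Cheap deterministic hash of node id -> bucket; hash collisions are the
        # price paid for replacing a full |V| x d embedding table.
        bucket = (node_ids * 2654435761 + self.seed) % self.num_buckets
        return self.position(part_ids) + self.node(bucket)

# Usage: part_ids would come from a topological partitioner such as METIS.
emb = PositionHashEmbedding(num_parts=1024, num_buckets=100_000, dim=128)
x = emb(torch.tensor([7, 42]), torch.tensor([3, 3]))  # two nodes in partition 3
```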

Semi/Weakly/Un/Supervised | Uncertainty | Active Learning (2 papers)

【1】 Boosting Cross-Lingual Transfer via Self-Learning with Uncertainty Estimation Link: https://arxiv.org/abs/2109.00194

Authors: Liyan Xu, Xuchao Zhang, Xujiang Zhao, Haifeng Chen, Feng Chen, Jinho D. Choi Affiliation: Emory University; University of Texas at Dallas Note: Accepted to EMNLP 2021 Abstract: Recent multilingual pre-trained language models have achieved remarkable zero-shot performance, where the model is only finetuned on one source language and directly evaluated on target languages. In this work, we propose a self-learning framework that further utilizes unlabeled data of target languages, combined with uncertainty estimation in the process to select high-quality silver labels. Three different uncertainties are adapted and analyzed specifically for the cross-lingual transfer: Language Heteroscedastic/Homoscedastic Uncertainty (LEU/LOU) and Evidential Uncertainty (EVI). We evaluate our framework with uncertainties on two cross-lingual tasks including Named Entity Recognition (NER) and Natural Language Inference (NLI) covering 40 languages in total, which outperforms the baselines significantly by 10 F1 on average for NER and 2.5 accuracy score for NLI.
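A hedged sketch of one self-learning round as described above; the entropy-based score stands in for the paper's LEU/LOU/EVI uncertainties, and the threshold, `predict_proba`/`fit` interfaces, and function names are illustrative assumptions.

```python
import numpy as np

def entropy(probs):
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)

def self_learning_round(model, unlabeled_texts, threshold=0.2):
    probs = model.predict_proba(unlabeled_texts)   # (N, num_labels) on target-language data
    uncertainty = entropy(probs)                   # LEU/LOU/EVI scores would plug in here
    keep = uncertainty < threshold                 # keep only high-quality silver labels
    silver_x = [t for t, k in zip(unlabeled_texts, keep) if k]
    silver_y = probs[keep].argmax(axis=-1)
    model.fit(silver_x, silver_y)                  # continue training on silver data
    return model
```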

【2】 Uncertainty Quantified Deep Learning for Predicting Dice Coefficient of Digital Histopathology Image Segmentation Link: https://arxiv.org/abs/2109.00115

Authors: Sambuddha Ghosal, Audrey Xie, Pratik Shah Affiliation: Massachusetts Institute of Technology, Program in Media Arts and Sciences and Media Laboratory, Cambridge, MA, United States Note: Submitted to the 2022 IEEE International Symposium on Biomedical Imaging (ISBI) Abstract: Deep learning models (DLMs) can achieve state of the art performance in medical image segmentation and classification tasks. However, DLMs that do not provide feedback for their predictions such as Dice coefficients (Dice) have limited deployment potential in real world clinical settings. Uncertainty estimates can increase the trust of these automated systems by identifying predictions that need further review but remain computationally prohibitive to deploy. In this study, we use a DLM with randomly initialized weights and Monte Carlo dropout (MCD) to segment tumors from microscopic Hematoxylin and Eosin (H&E) dye stained prostate core biopsy RGB images. We devise a novel approach that uses multiple clinical region based uncertainties from a single image (instead of the entire image) to predict Dice of the DLM model output by linear models. Image level uncertainty maps were generated and showed correspondence between imperfect model segmentation and high levels of uncertainty associated with specific prostate tissue regions with or without tumors. Results from this study suggest that linear models can learn coefficients of uncertainty-quantified deep learning and correlations (Spearman's correlation, p < 0.05) to predict Dice scores of specific regions of medical images.
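A minimal sketch of the pipeline the abstract outlines, assuming MC-dropout outputs and region masks are available; the choice of per-pixel standard deviation and per-region mean as features is an illustrative assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def mc_dropout_uncertainty(mc_probs):
    # mc_probs: (T, H, W) tumor probabilities from T stochastic forward passes
    return mc_probs.std(axis=0)                 # per-pixel predictive uncertainty

def region_features(unc_map, region_masks):
    # one mean-uncertainty feature per clinical tissue region of the image
    return np.array([unc_map[m].mean() for m in region_masks])

def fit_dice_predictor(X, y):
    # X: (n_images, n_regions) region uncertainties; y: (n_images,) observed Dice
    return LinearRegression().fit(X, y)
```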

Transfer | Zero/Few/One-Shot | Adaptation (1 paper)

【1】 Adapted End-to-End Coreference Resolution System for Anaphoric Identities in Dialogues Link: https://arxiv.org/abs/2109.00185

Authors: Liyan Xu, Jinho D. Choi Affiliation: Computer Science, Emory University, Atlanta, GA Note: Submitted to CRAC 2021 Abstract: We present an effective system adapted from the end-to-end neural coreference resolution model, targeting the task of anaphora resolution in dialogues. Three aspects are specifically addressed in our approach, including the support of singletons, encoding speakers and turns throughout dialogue interactions, and knowledge transfer utilizing existing resources. Despite the simplicity of our adaptation strategies, they are shown to bring significant impact to the final performance, with up to 27 F1 improvement over the baseline. Our final system ranks 1st on the leaderboard of the anaphora resolution track in the CRAC 2021 shared task, and achieves the best evaluation results on all four datasets.

Reinforcement Learning (1 paper)

【1】 A Survey of Exploration Methods in Reinforcement Learning Link: https://arxiv.org/abs/2109.00157

Authors: Susan Amin, Maziar Gomrokchi, Harsh Satija, Herke van Hoof, Doina Precup Affiliation: Department of Computer Science, McGill University, Mila – Québec Artificial Intelligence Institute, Montréal, Québec, Canada; Informatics Institute, University of Amsterdam, Amsterdam, the Netherlands Abstract: Exploration is an essential component of reinforcement learning algorithms, where agents need to learn how to predict and control unknown and often stochastic environments. Reinforcement learning agents depend crucially on exploration to obtain informative data for the learning process, as the lack of enough information could hinder effective learning. In this article, we provide a survey of modern exploration methods in (sequential) reinforcement learning, as well as a taxonomy of exploration methods.

Symbolic | Symbolic Learning (1 paper)

【1】 Complexity Measures for Multi-objective Symbolic Regression Link: https://arxiv.org/abs/2109.00238

Authors: Michael Kommenda, Andreas Beham, Michael Affenzeller, Gabriel Kronberger Abstract: Multi-objective symbolic regression has the advantage that while the accuracy of the learned models is maximized, the complexity is automatically adapted and need not be specified a-priori. The result of the optimization is not a single solution anymore, but a whole Pareto front describing the trade-off between accuracy and complexity. In this contribution we study which complexity measures are most appropriately used in symbolic regression when performing multi-objective optimization with NSGA-II. Furthermore, we present a novel complexity measure that includes semantic information based on the function symbols occurring in the models and test its effects on several benchmark datasets. Results comparing multiple complexity measures are presented in terms of the achieved accuracy and model length to illustrate how the search direction of the algorithm is affected.
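To make the idea of a symbol-aware complexity measure concrete, here is an illustrative recursive sketch over an expression tree; the per-symbol costs in `SYMBOL_COST` are assumptions, not the paper's calibrated values.

```python
# Each function symbol carries a semantic cost; complexity is summed over the tree.
SYMBOL_COST = {"add": 1, "mul": 1, "div": 2, "exp": 4, "sin": 3, "var": 1, "const": 1}

def complexity(node):
    # node: ("add", left, right) / ("sin", child) / ("var",) / ("const",)
    op, *children = node
    return SYMBOL_COST[op] + sum(complexity(c) for c in children)

# In NSGA-II this would serve as the second objective next to prediction error:
expr = ("add", ("mul", ("var",), ("var",)), ("sin", ("var",)))
print(complexity(expr))  # 1 + (1 + 1 + 1) + (3 + 1) = 8
```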

Medicine-Related (2 papers)

【1】 Exploring deep learning methods for recognizing rare diseases and their clinical manifestations from texts Link: https://arxiv.org/abs/2109.00343

Authors: Isabel Segura-Bedmar, David Camino-Perdonas, Sara Guerrero-Aspizua Affiliation: Human Language and Accessibility Technologies, Computer Science Department, Universidad Carlos III de Madrid, Leganés, Madrid, Spain Abstract: Although rare diseases are characterized by low prevalence, approximately 300 million people are affected by a rare disease. The early and accurate diagnosis of these conditions is a major challenge for general practitioners, who do not have enough knowledge to identify them. In addition to this, rare diseases usually show a wide variety of manifestations, which might make the diagnosis even more difficult. A delayed diagnosis can negatively affect the patient's life. Therefore, there is an urgent need to increase the scientific and medical knowledge about rare diseases. Natural Language Processing (NLP) and Deep Learning can help to extract relevant information about rare diseases to facilitate their diagnosis and treatments. The paper explores the use of several deep learning techniques such as Bidirectional Long Short Term Memory (BiLSTM) networks or deep contextualized word representations based on Bidirectional Encoder Representations from Transformers (BERT) to recognize rare diseases and their clinical manifestations (signs and symptoms) in the RareDis corpus. This corpus contains more than 5,000 rare diseases and almost 6,000 clinical manifestations. BioBERT, a domain-specific language representation based on BERT and trained on biomedical corpora, obtains the best results. In particular, this model obtains an F1-score of 85.2% for rare diseases, outperforming all the other models.
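A minimal sketch of the best-performing setup (BioBERT fine-tuned for token-level NER) using the Hugging Face transformers API; the label set below is an assumption based on the two entity types the abstract discusses, and the checkpoint name is a commonly used public BioBERT release, not necessarily the exact one from the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-RAREDISEASE", "I-RAREDISEASE", "B-SIGN", "I-SIGN"]  # assumed BIO scheme
name = "dmis-lab/biobert-base-cased-v1.1"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name, num_labels=len(labels))

enc = tokenizer("Epidermolysis bullosa causes fragile, blistering skin.",
                return_tensors="pt")
with torch.no_grad():
    pred = model(**enc).logits.argmax(-1)  # per-token label ids (head still untrained)
```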

【2】 Federated Learning: Issues in Medical Application Link: https://arxiv.org/abs/2109.00202

Authors: Joo Hun Yoo, Hyejun Jeong, Jaehyeok Lee, Tai-Myoung Chung Affiliation: College of Computing and Informatics, Sungkyunkwan University, Suwon, Korea (Republic of) Note: 20 pages, 3 figures, 1 table, submitted to FDSE2021 Abstract: Since federated learning, which makes AI learning possible without moving local data around, was introduced by Google in 2017, it has been actively studied, particularly in the field of medicine. In fact, the idea of machine learning without collecting data from local clients is very attractive because data remain in local sites. However, federated learning techniques still have various open issues due to their own characteristics, such as non-identical distributions, client participation management, and vulnerable environments. In this presentation, the current issues that keep federated learning from being flawlessly useful in the real world will be briefly overviewed. They are related to data/system heterogeneity, client management, traceability, and security. Also, we introduce the modularized federated learning framework we currently develop to experiment with various techniques and protocols and to find solutions for the aforementioned issues. The framework will be open to the public after development completes.

Federated Learning | Privacy | Encryption (2 papers)

【1】 Asynchronous Federated Learning for Sensor Data with Concept Drift Link: https://arxiv.org/abs/2109.00151

Authors: Yujing Chen, Zheng Chai, Yue Cheng, Huzefa Rangwala Affiliation: Computer Science, George Mason University, Fairfax, USA Abstract: Federated learning (FL) involves multiple distributed devices jointly training a shared model without any of the participants having to reveal their local data to a centralized server. Most previous FL approaches assume that data on devices are fixed and stationary during the training process. However, this assumption is unrealistic because these devices usually have varying sampling rates and different system configurations. In addition, the underlying distribution of the device data can change dynamically over time, which is known as concept drift. Concept drift makes the learning process complicated because of the inconsistency between existing and upcoming data. Traditional concept drift handling techniques such as chunk based and ensemble learning-based methods are not suitable in the federated learning frameworks due to the heterogeneity of local devices. We propose a novel approach, FedConD, to detect and deal with the concept drift on local devices and minimize the effect on the performance of models in asynchronous FL. The drift detection strategy is based on an adaptive mechanism which uses the historical performance of the local models. The drift adaptation is realized by adjusting the regularization parameter of the objective function on each local device. Additionally, we design a communication strategy on the server side to select local updates in a prudent fashion and speed up model convergence. Experimental evaluations on three evolving data streams and two image datasets show that FedConD detects and handles concept drift, and also reduces the overall communication cost compared to other baseline methods.
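A hedged sketch of the two client-side mechanisms the abstract names: drift detection from historical local performance, and adaptation via the regularization parameter. The window size, threshold rule, and halving update below are illustrative assumptions, not FedConD's exact rules.

```python
import numpy as np

class DriftAwareClient:
    def __init__(self, window=10, tolerance=2.0, mu=0.01):
        self.history = []      # historical local-model losses
        self.window = window
        self.tolerance = tolerance
        self.mu = mu           # regularization weight in the local objective

    def observe(self, loss):
        ref = self.history[-self.window:]
        drifted = (len(ref) == self.window
                   and loss > np.mean(ref) + self.tolerance * np.std(ref))
        self.history.append(loss)
        if drifted:
            self.mu *= 0.5     # pull the local model less strongly toward stale global weights
        return drifted

    def local_loss(self, task_loss, w, w_global):
        # proximal-style local objective with the adapted regularization parameter
        return task_loss + self.mu / 2 * np.sum((w - w_global) ** 2)
```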

【2】 Federated Reconnaissance: Efficient, Distributed, Class-Incremental Learning Link: https://arxiv.org/abs/2109.00150

Authors: Sean M. Hendryx, Dharma Raj KC, Bradley Walls, Clayton T. Morrison Affiliation: School of Information, University of Arizona; Department of Computer Science; Areté Associates Abstract: We describe federated reconnaissance, a class of learning problems in which distributed clients learn new concepts independently and communicate that knowledge efficiently. In particular, we propose an evaluation framework and methodological baseline for a system in which each client is expected to learn a growing set of classes and communicate knowledge of those classes efficiently with other clients, such that, after knowledge merging, the clients should be able to accurately discriminate between classes in the superset of classes observed by the set of clients. We compare a range of learning algorithms for this problem and find that prototypical networks are a strong approach in that they are robust to catastrophic forgetting while incorporating new information efficiently. Furthermore, we show that the online averaging of prototype vectors is effective for client model merging and requires only a small amount of communication overhead, memory, and update time per class with no gradient-based learning or hyperparameter tuning. Additionally, to put our results in context, we find that a simple, prototypical network with four convolutional layers significantly outperforms complex, state-of-the-art continual learning algorithms, increasing the accuracy by over 22% after learning 600 Omniglot classes and over 33% after learning 20 mini-ImageNet classes incrementally. These results have important implications for federated reconnaissance and continual learning more generally by demonstrating that communicating feature vectors is an efficient, robust, and effective means for distributed, continual learning.
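The online prototype averaging that the abstract highlights reduces to a count-weighted mean per class, which is why merging needs no gradients or hyperparameters. A minimal sketch under stated assumptions (the dict-based representation and variable names are illustrative):

```python
import numpy as np

def merge_prototypes(a, b):
    # a, b: dict class_id -> (prototype: np.ndarray, count: int), one per client
    merged = dict(a)
    for cls, (proto_b, n_b) in b.items():
        if cls in merged:
            proto_a, n_a = merged[cls]
            n = n_a + n_b
            merged[cls] = ((n_a * proto_a + n_b * proto_b) / n, n)  # weighted average
        else:
            merged[cls] = (proto_b, n_b)   # class unseen by client a
    return merged

def classify(x, prototypes):
    # nearest-prototype rule, as in prototypical networks
    return min(prototypes, key=lambda c: np.linalg.norm(x - prototypes[c][0]))
```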

Inference | Analysis | Understanding | Explanation (2 papers)

【1】 Position Masking for Improved Layout-Aware Document Understanding Link: https://arxiv.org/abs/2109.00442

Authors: Anik Saha, Catherine Finegan-Dollak, Ashish Verma Affiliation: Rensselaer Polytechnic Institute, Troy, NY, USA; IBM Research, Yorktown Heights, NY, USA Note: Document Intelligence Workshop at KDD, 2021 Abstract: Natural language processing for document scans and PDFs has the potential to enormously improve the efficiency of business processes. Layout-aware word embeddings such as LayoutLM have shown promise for classification of and information extraction from such documents. This paper proposes a new pre-training task, position masking, that can improve performance of layout-aware word embeddings that incorporate 2-D position embeddings. We compare models pre-trained with only language masking against models pre-trained with both language masking and position masking, and we find that position masking improves performance by over 5% on a form understanding task.

【2】 Task-Oriented Communication for Multi-Device Cooperative Edge Inference Link: https://arxiv.org/abs/2109.00172

Authors: Jiawei Shao, Yuyi Mao, Jun Zhang Affiliation: Department of Electronic and Information Engineering, Hong Kong Polytechnic University Abstract: This paper investigates task-oriented communication for multi-device cooperative edge inference, where a group of distributed low-end edge devices transmit the extracted features of local samples to a powerful edge server for inference. While cooperative edge inference can overcome the limited sensing capability of a single device, it substantially increases the communication overhead and may incur excessive latency. To enable low-latency cooperative inference, we propose a learning-based communication scheme that optimizes local feature extraction and distributed feature encoding in a task-oriented manner, i.e., to remove data redundancy and transmit information that is essential for the downstream inference task rather than reconstructing the data samples at the edge server. Specifically, we leverage an information bottleneck (IB) principle to extract the task-relevant feature at each edge device and adopt a distributed information bottleneck (DIB) framework to formalize a single-letter characterization of the optimal rate-relevance tradeoff for distributed feature encoding. To admit flexible control of the communication overhead, we extend the DIB framework to a distributed deterministic information bottleneck (DDIB) objective that explicitly incorporates the representational costs of the encoded features. As the IB-based objectives are computationally prohibitive for high-dimensional data, we adopt variational approximations to make the optimization problems tractable. To compensate for the potential performance loss due to the variational approximations, we also develop a selective retransmission (SR) mechanism to identify the redundancy in the encoded features of multiple edge devices to attain additional communication overhead reduction. Extensive experiments evidence that the proposed task-oriented communication scheme achieves a better rate-relevance tradeoff than baseline methods.
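Since IB objectives are intractable in high dimensions, the paper uses variational approximations; a standard variational-IB loss at a single device would look like the sketch below (the Gaussian encoder posterior, standard-normal prior, and `beta` weight are conventional assumptions, not the paper's exact DDIB objective).

```python
import torch
import torch.nn.functional as F

def vib_loss(logits, labels, mu, logvar, beta=1e-3):
    # relevance term: loss of the downstream inference task
    ce = F.cross_entropy(logits, labels)
    # rate term: KL( q(z|x) = N(mu, diag(exp(logvar))) || N(0, I) ) for the transmitted feature z
    kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
    return ce + beta * kl   # beta trades communication rate against task relevance
```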

Detection-Related (2 papers)

【1】 Deep Dual Support Vector Data Description for Anomaly Detection on Attributed Networks Link: https://arxiv.org/abs/2109.00138

Authors: Fengbin Zhang, Haoyi Fan, Ruidong Wang, Zuoyong Li, Tiancai Liang Affiliation: School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China; School of Information Engineering, Zhengzhou University, Zhengzhou, China; Fujian Provincial Key Laboratory of Information Processing and Intelligent Note: Accepted by International Journal of Intelligent Systems. Copyright (c) 2021 Wiley. The source codes will be publicly available at this https URL Abstract: Networks are ubiquitous in the real world such as social networks and communication networks, and anomaly detection on networks aims at finding nodes whose structural or attributed patterns deviate significantly from the majority of reference nodes. However, most of the traditional anomaly detection methods neglect the relation structure information among data points and therefore cannot effectively generalize to the graph structure data. In this paper, we propose an end-to-end model of Deep Dual Support Vector Data description based Autoencoder (Dual-SVDAE) for anomaly detection on attributed networks, which considers both the structure and attribute for attributed networks. Specifically, Dual-SVDAE consists of a structure autoencoder and an attribute autoencoder to learn the latent representation of the node in the structure space and attribute space respectively. Then, a dual-hypersphere learning mechanism is imposed on them to learn two hyperspheres of normal nodes from the structure and attribute perspectives respectively. Moreover, to achieve joint learning between the structure and attribute of the network, we fuse the structure embedding and attribute embedding as the final input of the feature decoder to generate the node attribute. Finally, abnormal nodes can be detected by measuring the distance of nodes to the learned center of each hypersphere in the latent structure space and attribute space respectively. Extensive experiments on the real-world attributed networks show that Dual-SVDAE consistently outperforms the state of the art, which demonstrates the effectiveness of the proposed method.
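A hedged sketch of the dual-hypersphere idea: one Deep-SVDD-style distance term per latent space (structure and attribute), combined with the reconstruction loss. The `alpha` weighting and the omission of radius variables are simplifying assumptions.

```python
import torch

def dual_svdd_loss(z_struct, z_attr, c_struct, c_attr, recon_loss, alpha=0.5):
    # mean squared distance of normal nodes to each learned hypersphere center
    d_s = torch.mean(torch.sum((z_struct - c_struct) ** 2, dim=1))
    d_a = torch.mean(torch.sum((z_attr - c_attr) ** 2, dim=1))
    return alpha * (d_s + d_a) + (1 - alpha) * recon_loss

def anomaly_score(z_struct, z_attr, c_struct, c_attr):
    # nodes far from either center, in either latent space, are flagged as anomalous
    return (torch.sum((z_struct - c_struct) ** 2, dim=1)
            + torch.sum((z_attr - c_attr) ** 2, dim=1))
```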

【2】 Automatic non-invasive Cough Detection based on Accelerometer and Audio Signals Link: https://arxiv.org/abs/2109.00103

Authors: Madhurananda Pahar, Igor Miranda, Andreas Diacon, Thomas Niesler Affiliation: Department of Electrical and Electronic Engineering, Stellenbosch University, South Africa Note: arXiv admin note: text overlap with arXiv:2102.04997 Abstract: We present an automatic non-invasive way of detecting cough events based on both accelerometer and audio signals. The acceleration signals are captured by a smartphone firmly attached to the patient's bed, using its integrated accelerometer. The audio signals are captured simultaneously by the same smartphone using an external microphone. We have compiled a manually-annotated dataset containing such simultaneously-captured acceleration and audio signals for approximately 6000 cough and 68000 non-cough events from 14 adult male patients in a tuberculosis clinic. LR, SVM and MLP are evaluated as baseline classifiers and compared with deep architectures such as CNN, LSTM, and Resnet50 using a leave-one-out cross-validation scheme. We find that the studied classifiers can use either acceleration or audio signals to distinguish between coughing and other activities including sneezing, throat-clearing, and movement on the bed with high accuracy. However, in all cases, the deep neural networks outperform the shallow classifiers by a clear margin, and the Resnet50 offers the best performance by achieving an AUC exceeding 0.98 and 0.99 for acceleration and audio signals respectively. While audio-based classification consistently offers a better performance than acceleration-based classification, we observe that the difference is very small for the best systems. Since the acceleration signal requires less processing power, since the need to record audio is sidestepped and thus privacy is inherently secured, and since the recording device is attached to the bed and not worn, an accelerometer-based highly accurate non-invasive cough detector may represent a more convenient and readily accepted method in long-term cough monitoring.

Classification | Recognition (1 paper)

【1】 An Empirical Study on the Joint Impact of Feature Selection and Data Resampling on Imbalance Classification Link: https://arxiv.org/abs/2109.00201

Authors: Chongsheng Zhang, Paolo Soda, Jingjun Bi, Gaojuan Fan, George Almpanidis, Salvador Garcia Affiliation: School of Computer and Information Engineering, Henan University, China; Department of Engineering, University Campus Bio-Medico of Rome, Italy; Department of Computer Science and Artificial Intelligence, University of Granada, Spain Note: 25 pages, 12 figures Abstract: Real-world datasets often present different degrees of imbalanced (i.e., long-tailed or skewed) distributions. While the majority (a.k.a., head or frequent) classes have sufficient samples, the minority (a.k.a., tail or rare) classes can be under-represented by a rather limited number of samples. On one hand, data resampling is a common approach to tackling class imbalance. On the other hand, dimension reduction, which reduces the feature space, is a conventional machine learning technique for building stronger classification models on a dataset. However, the possible synergy between feature selection and data resampling for high-performance imbalance classification has rarely been investigated before. To address this issue, this paper carries out a comprehensive empirical study on the joint influence of feature selection and resampling on two-class imbalance classification. Specifically, we study the performance of two opposite pipelines for imbalance classification, i.e., applying feature selection before or after data resampling. We conduct a large number of experiments (a total of 9225 experiments) on 52 publicly available datasets, using 9 feature selection methods, 6 resampling approaches for class imbalance learning, and 3 well-known classification algorithms. Experimental results show that there is no constant winner between the two pipelines, thus both of them should be considered to derive the best performing model for imbalance classification. We also find that the performance of an imbalance classification model depends on the classifier adopted, the ratio between the number of majority and minority samples (IR), as well as on the ratio between the number of samples and features (SFR). Overall, this study should provide new reference value for researchers and practitioners in imbalance learning.
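The two opposite pipelines from the study can be sketched with imbalanced-learn, whose `Pipeline` applies samplers only during fitting; here SMOTE, ANOVA-F selection, and logistic regression stand in for the 6 resamplers, 9 selectors, and 3 classifiers actually benchmarked.

```python
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Pipeline A: feature selection, then resampling
fs_then_resample = Pipeline([
    ("select", SelectKBest(f_classif, k=20)),
    ("resample", SMOTE(random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Pipeline B: resampling, then feature selection
resample_then_fs = Pipeline([
    ("resample", SMOTE(random_state=0)),
    ("select", SelectKBest(f_classif, k=20)),
    ("clf", LogisticRegression(max_iter=1000)),
])
# The paper reports no constant winner: both orders should be cross-validated.
```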

Representation (1 paper)

【1】 Sense representations for Portuguese: experiments with sense embeddings and deep neural language models Link: https://arxiv.org/abs/2109.00025

Authors: Jessica Rodrigues da Silva, Helena de Medeiros Caseli Affiliation: Federal University of São Carlos (UFSCar) Abstract: Sense representations have gone beyond word representations like Word2Vec, GloVe and FastText and achieved innovative performance on a wide range of natural language processing tasks. Although very useful in many applications, the traditional approaches for generating word embeddings have a strict drawback: they produce a single vector representation for a given word, ignoring the fact that ambiguous words can assume different meanings. In this paper, we explore unsupervised sense representations which, different from traditional word embeddings, are able to induce different senses of a word by analyzing its contextual semantics in a text. The unsupervised sense representations investigated in this paper are: sense embeddings and deep neural language models. We present the first experiments carried out for generating sense embeddings for Portuguese. Our experiments show that the sense embedding model (Sense2vec) outperformed traditional word embeddings in syntactic and semantic analogy tasks, proving that the language resource generated here can improve the performance of NLP tasks in Portuguese. We also evaluated the performance of pre-trained deep neural language models (ELMo and BERT) in two transfer learning approaches, feature-based and fine-tuning, on the semantic textual similarity task. Our experiments indicate that the fine-tuned Multilingual and Portuguese BERT language models were able to achieve better accuracy than the ELMo model and baselines.

Optimization | Convergence (1 paper)

【1】 Deep $\mathcal{L}^1$ Stochastic Optimal Control Policies for Planetary Soft-Landing Link: https://arxiv.org/abs/2109.00183

Authors: Marcus A. Pereira, Camilo A. Duarte, Ioannis Exarchos, Evangelos A. Theodorou Affiliation: The Institute for Robotics and Intelligent Machines, Georgia Institute of Technology, Atlanta, GA; School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA; Microsoft Abstract: In this paper, we introduce a novel deep learning based solution to the Powered-Descent Guidance (PDG) problem, grounded in principles of nonlinear Stochastic Optimal Control (SOC) and Feynman-Kac theory. Our algorithm solves the PDG problem by framing it as an $\mathcal{L}^1$ SOC problem for minimum fuel consumption. Additionally, it can handle practically useful control constraints, nonlinear dynamics and enforces state constraints as soft-constraints. This is achieved by building off of recent work on deep Forward-Backward Stochastic Differential Equations (FBSDEs) and differentiable non-convex optimization neural-network layers based on stochastic search. In contrast to previous approaches, our algorithm does not require convexification of the constraints or linearization of the dynamics and is empirically shown to be robust to stochastic disturbances and the initial position of the spacecraft. After training offline, our controller can be activated once the spacecraft is within a pre-specified radius of the landing zone and at a pre-specified altitude, i.e., the base of an inverted cone with the tip at the landing zone. We demonstrate empirically that our controller can successfully and safely land all trajectories initialized at the base of this cone while minimizing fuel consumption.

Other Neural Networks | Deep Learning | Models | Modeling (11 papers)

【1】 The Second International Verification of Neural Networks Competition (VNN-COMP 2021): Summary and Results Link: https://arxiv.org/abs/2109.00498

Authors: Stanley Bak, Changliu Liu, Taylor Johnson Affiliation: Liu is with Carnegie Mellon University; Johnson is with Vanderbilt University Abstract: This report summarizes the second International Verification of Neural Networks Competition (VNN-COMP 2021), held as a part of the 4th Workshop on Formal Methods for ML-Enabled Autonomous Systems that was collocated with the 33rd International Conference on Computer-Aided Verification (CAV). Twelve teams participated in this competition. The goal of the competition is to provide an objective comparison of the state-of-the-art methods in neural network verification, in terms of scalability and speed. Along this line, we used standard formats (ONNX for neural networks and VNNLIB for specifications), standard hardware (all tools are run by the organizers on AWS), and tool parameters provided by the tool authors. This report summarizes the rules, benchmarks, participating tools, results, and lessons learned from this competition.

【2】 The Impact of Reinitialization on Generalization in Convolutional Neural Networks Link: https://arxiv.org/abs/2109.00267

Authors: Ibrahim Alabdulmohsin, Hartmut Maennel, Daniel Keysers Affiliation: Google Research, Zürich, Switzerland Note: 12 figures, 7 tables Abstract: Recent results suggest that reinitializing a subset of the parameters of a neural network during training can improve generalization, particularly for small training sets. We study the impact of different reinitialization methods in several convolutional architectures across 12 benchmark image classification datasets, analyzing their potential gains and highlighting limitations. We also introduce a new layerwise reinitialization algorithm that outperforms previous methods and suggest explanations of the observed improved generalization. First, we show that layerwise reinitialization increases the margin on the training examples without increasing the norm of the weights, hence leading to an improvement in margin-based generalization bounds for neural networks. Second, we demonstrate that it settles in flatter local minima of the loss surface. Third, it encourages learning general rules and discourages memorization by placing emphasis on the lower layers of the neural network. Our takeaway message is that the accuracy of convolutional neural networks can be improved for small datasets using bottom-up layerwise reinitialization, where the number of reinitialized layers may vary depending on the available compute budget.
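A hedged PyTorch sketch of the bottom-up idea: keep the lowest layers and re-draw the weights above them between training rounds. The schedule and the use of `reset_parameters` are illustrative assumptions, not the authors' exact algorithm.

```python
import torch.nn as nn

def reinitialize_above(model: nn.Sequential, keep_bottom: int):
    # Keep the first `keep_bottom` layers; reinitialize everything above them.
    for i, layer in enumerate(model):
        if i >= keep_bottom and hasattr(layer, "reset_parameters"):
            layer.reset_parameters()  # preserving lower layers emphasizes general features

# Illustrative schedule: alternate training rounds with reinitialization,
# keeping progressively fewer bottom layers fixed as the budget allows:
# for k in range(len(model) - 1, 0, -1):
#     train(model); reinitialize_above(model, keep_bottom=k)
```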

【3】 Problem Learning: Towards the Free Will of Machines Link: https://arxiv.org/abs/2109.00177

Authors: Yongfeng Zhang Affiliation: Department of Computer Science, Rutgers University, New Brunswick, NJ Note: 17 pages, 1 figure Abstract: A machine intelligence pipeline usually consists of six components: problem, representation, model, loss, optimizer and metric. Researchers have worked hard trying to automate many components of the pipeline. However, one key component of the pipeline--problem definition--is still left mostly unexplored in terms of automation. Usually, it requires extensive efforts from domain experts to identify, define and formulate important problems in an area. However, automatically discovering research or application problems for an area is beneficial since it helps to identify valid and potentially important problems hidden in data that are unknown to domain experts, expand the scope of tasks that we can do in an area, and even inspire completely new findings. This paper describes Problem Learning, which aims at learning to discover and define valid and ethical problems from data or from the machine's interaction with the environment. We formalize problem learning as the identification of valid and ethical problems in a problem space and introduce several possible approaches to problem learning. In a broader sense, problem learning is an approach towards the free will of intelligent machines. Currently, machines are still limited to solving the problems defined by humans, without the ability or flexibility to freely explore various possible problems that are even unknown to humans. Though many machine learning techniques have been developed and integrated into intelligent systems, they still focus on the means rather than the purpose, in that machines are still solving human defined problems. However, proposing good problems is sometimes even more important than solving problems, because a good problem can help to inspire new ideas and gain deeper understandings. The paper also discusses the ethical implications of problem learning under the background of Responsible AI.

【4】 A Weight Initialization Based on the Linear Product Structure for Neural Networks Link: https://arxiv.org/abs/2109.00125

Authors: Qipin Chen, Wenrui Hao, Juncai He Note: 18 pages, 4 figures Abstract: Weight initialization plays an important role in training neural networks and also affects tremendous deep learning applications. Various weight initialization strategies have already been developed for different activation functions with different neural networks. These initialization algorithms are based on minimizing the variance of the parameters between layers and might still fail when neural networks are deep, e.g., dying ReLU. To address this challenge, we study neural networks from a nonlinear computation point of view and propose a novel weight initialization strategy that is based on the linear product structure (LPS) of neural networks. The proposed strategy is derived from the polynomial approximation of activation functions by using theories of numerical algebraic geometry to guarantee to find all the local minima. We also provide a theoretical analysis that the LPS initialization has a lower probability of dying ReLU compared to other existing initialization strategies. Finally, we test the LPS initialization algorithm on both fully connected neural networks and convolutional neural networks to show its feasibility, efficiency, and robustness on public datasets.

【5】 Quantized convolutional neural networks through the lens of partial differential equations Link: https://arxiv.org/abs/2109.00095

Authors: Ido Ben-Yair, Gil Ben Shalom, Moshe Eliasof, Eran Treister Abstract: Quantization of Convolutional Neural Networks (CNNs) is a common approach to ease the computational burden involved in the deployment of CNNs, especially on low-resource edge devices. However, fixed-point arithmetic is not natural to the type of computations involved in neural networks. In this work, we explore ways to improve quantized CNNs using a PDE-based perspective and analysis. First, we harness the total variation (TV) approach to apply edge-aware smoothing to the feature maps throughout the network. This aims to reduce outliers in the distribution of values and promote piece-wise constant maps, which are more suitable for quantization. Secondly, we consider symmetric and stable variants of common CNNs for image classification, and Graph Convolutional Networks (GCNs) for graph node-classification. We demonstrate through several experiments that the property of forward stability preserves the action of a network under different quantization rates. As a result, stable quantized networks behave similarly to their non-quantized counterparts even though they rely on fewer parameters. We also find that at times, stability even aids in improving accuracy. These properties are of particular interest for sensitive, resource-constrained, low-power or real-time applications like autonomous driving.
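A minimal sketch of an anisotropic total-variation penalty on feature maps, promoting the piece-wise constant activations that quantize well; the exact edge-aware weighting the paper uses is omitted here, so this is an illustrative simplification.

```python
import torch

def tv_penalty(fmap):
    # fmap: (N, C, H, W) feature maps; sum of absolute differences along H and W
    dh = torch.abs(fmap[:, :, 1:, :] - fmap[:, :, :-1, :]).mean()
    dw = torch.abs(fmap[:, :, :, 1:] - fmap[:, :, :, :-1]).mean()
    return dh + dw

# Assumed usage during training, with lam controlling the smoothing strength:
# total_loss = task_loss + lam * sum(tv_penalty(f) for f in intermediate_maps)
```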

【6】 Effectiveness of Deep Networks in NLP using BiDAF as an example architecture Link: https://arxiv.org/abs/2109.00074

Authors: Soumyendu Sarkar Affiliation: Stanford Center for Professional Development Abstract: Question Answering with NLP has progressed through the evolution of advanced model architectures like BERT and BiDAF and earlier word, character, and context-based embeddings. As BERT has leapfrogged the accuracy of models, an element of the next frontier can be the introduction of deep networks and an effective way to train them. In this context, I explored the effectiveness of deep networks focussing on the model encoder layer of BiDAF. BiDAF with its heterogeneous layers provides the opportunity not only to explore the effectiveness of deep networks but also to evaluate whether the refinements made in lower layers are additive to the refinements made in the upper layers of the model architecture. I believe the next greatest model in NLP will in fact fold in a solid language modeling like BERT with a composite architecture which will bring in refinements in addition to generic language modeling and will have a more extensive layered architecture. I experimented with the Bypass network, Residual Highway network, and DenseNet architectures. In addition, I evaluated the effectiveness of ensembling the last few layers of the network. I also studied the difference character embeddings make when added to word embeddings, and whether those effects are additive with deep networks. My studies indicate that deep networks are in fact effective in giving a boost. Also, the refinements in the lower layers like embeddings are passed on additively to the gains made through deep networks.

【7】 Data-Driven Reduced-Order Modeling of Spatiotemporal Chaos with Neural Ordinary Differential Equations Link: https://arxiv.org/abs/2109.00060

Authors: Alec J. Linot, Michael D. Graham Affiliation: Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, WI, USA Abstract: Dissipative partial differential equations that exhibit chaotic dynamics tend to evolve to attractors that exist on finite-dimensional manifolds. We present a data-driven reduced order modeling method that capitalizes on this fact by finding the coordinates of this manifold and finding an ordinary differential equation (ODE) describing the dynamics in this coordinate system. The manifold coordinates are discovered using an undercomplete autoencoder -- a neural network (NN) that reduces then expands dimension. Then the ODE, in these coordinates, is approximated by a NN using the neural ODE framework. Both of these methods only require snapshots of data to learn a model, and the data can be widely and/or unevenly spaced. We apply this framework to the Kuramoto-Sivashinsky equation for different domain sizes that exhibit chaotic dynamics. With this system, we find that dimension reduction improves performance relative to predictions in the ambient space, where artifacts arise. Then, with the low-dimensional model, we vary the training data spacing and find excellent short- and long-time statistical recreation of the true dynamics for widely spaced data (spacing of ~0.7 Lyapunov times). We end by comparing performance with various degrees of dimension reduction, and find a "sweet spot" in terms of performance vs. dimension.
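A minimal sketch of the two-stage model, assuming the torchdiffeq package: an undercomplete autoencoder finds manifold coordinates h, and a neural ODE dh/dt = f(h) evolves the dynamics in those coordinates. Layer sizes and the `ManifoldODE` name are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint

class ManifoldODE(nn.Module):
    def __init__(self, ambient=64, latent=8):
        super().__init__()
        # undercomplete autoencoder: reduce then expand dimension
        self.enc = nn.Sequential(nn.Linear(ambient, 32), nn.Tanh(), nn.Linear(32, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 32), nn.Tanh(), nn.Linear(32, ambient))
        # learned right-hand side of the latent ODE, dh/dt = f(h)
        self.f = nn.Sequential(nn.Linear(latent, 32), nn.Tanh(), nn.Linear(32, latent))

    def rhs(self, t, h):
        return self.f(h)

    def forward(self, u0, t):
        h0 = self.enc(u0)
        h_t = odeint(self.rhs, h0, t)   # integrate the latent dynamics over times t
        return self.dec(h_t)            # decode back to the ambient space
```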

【8】 Machine-Learning media bias Link: https://arxiv.org/abs/2109.00024

Authors: Samantha D'Alonzo, Max Tegmark Affiliation: Dept. of Physics and Institute for AI & Fundamental Interactions Note: 29 pages, 23 figs; data available at this https URL Abstract: We present an automated method for measuring media bias. Inferring which newspaper published a given article, based only on the frequencies with which it uses different phrases, leads to a conditional probability distribution whose analysis lets us automatically map newspapers and phrases into a bias space. By analyzing roughly a million articles from roughly a hundred newspapers for bias in dozens of news topics, our method maps newspapers into a two-dimensional bias landscape that agrees well with previous bias classifications based on human judgement. One dimension can be interpreted as traditional left-right bias, the other as establishment bias. This means that although news bias is inherently political, its measurement need not be.
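A hedged sketch of the core measurement: estimate P(newspaper | phrase) from phrase counts, then embed phrases and newspapers into a low-dimensional space. The SVD step below is an illustrative stand-in for the paper's analysis; the 2-D interpretation as left-right and establishment axes is the paper's empirical finding, not something this sketch guarantees.

```python
import numpy as np

def bias_map(counts):
    # counts: (num_phrases, num_papers) matrix of phrase usage frequencies
    p_paper_given_phrase = counts / counts.sum(axis=1, keepdims=True)
    centered = p_paper_given_phrase - p_paper_given_phrase.mean(axis=0)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    phrase_coords = U[:, :2] * S[:2]   # phrases in a 2-D bias space
    paper_coords = Vt[:2].T            # newspapers in the same space
    return phrase_coords, paper_coords
```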

【9】 FADE: FAir Double Ensemble Learning for Observable and Counterfactual Outcomes Link: https://arxiv.org/abs/2109.00173

Authors: Alan Mishler, Edward Kennedy Affiliation: Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, PA, USA; J.P. Morgan AI Research, New York, NY, USA Note: 56 pages, 20 figures Abstract: Methods for building fair predictors often involve tradeoffs between fairness and accuracy and between different fairness criteria, but the nature of these tradeoffs varies. Recent work seeks to characterize these tradeoffs in specific problem settings, but these methods often do not accommodate users who wish to improve the fairness of an existing benchmark model without sacrificing accuracy, or vice versa. These results are also typically restricted to observable accuracy and fairness criteria. We develop a flexible framework for fair ensemble learning that allows users to efficiently explore the fairness-accuracy space or to improve the fairness or accuracy of a benchmark model. Our framework can simultaneously target multiple observable or counterfactual fairness criteria, and it enables users to combine a large number of previously trained and newly trained predictors. We provide theoretical guarantees that our estimators converge at fast rates. We apply our method on both simulated and real data, with respect to both observable and counterfactual accuracy and fairness criteria. We show that, surprisingly, multiple unfairness measures can sometimes be minimized simultaneously with little impact on accuracy, relative to unconstrained predictors or existing benchmark models.

【10】 GFINNs: GENERIC Formalism Informed Neural Networks for Deterministic and Stochastic Dynamical Systems Link: https://arxiv.org/abs/2109.00092

Authors: Zhen Zhang, Yeonjong Shin, George Em Karniadakis Affiliation: Division of Applied Mathematics and School of Engineering, Brown University, Providence, RI, USA Abstract: We propose the GENERIC formalism informed neural networks (GFINNs) that obey the symmetric degeneracy conditions of the GENERIC formalism. GFINNs comprise two modules, each of which contains two components. We model each component using a neural network whose architecture is designed to satisfy the required conditions. The component-wise architecture design provides flexible ways of leveraging available physics information into neural networks. We prove theoretically that GFINNs are sufficiently expressive to learn the underlying equations, hence establishing the universal approximation theorem. We demonstrate the performance of GFINNs in three simulation problems: gas containers exchanging heat and volume, the thermoelastic double pendulum, and Langevin dynamics. In all the examples, GFINNs outperform existing methods, demonstrating good accuracy in predictions for both deterministic and stochastic systems.

【11】 Scalable Spatiotemporally Varying Coefficient Modeling with Bayesian Kernelized Tensor Regression Link: https://arxiv.org/abs/2109.00046

Authors: Mengying Lei, Aurelie Labbe, Lijun Sun Affiliation: McGill University, Montreal, QC, Canada; HEC Montréal, Montreal, QC, Canada Abstract: As a regression technique in spatial statistics, the spatiotemporally varying coefficient model (STVC) is an important tool to discover nonstationary and interpretable response-covariate associations over both space and time. However, it is difficult to apply STVC for large-scale spatiotemporal analysis due to the high computational cost. To address this challenge, we summarize the spatiotemporally varying coefficients using a third-order tensor structure and propose to reformulate the spatiotemporally varying coefficient model as a special low-rank tensor regression problem. The low-rank decomposition can effectively model the global patterns of the large data with a substantially reduced number of parameters. To further incorporate the local spatiotemporal dependencies among the samples, we place Gaussian process (GP) priors on the spatial and temporal factor matrices to better encode local spatial and temporal processes on each factor component. We refer to the overall framework as Bayesian Kernelized Tensor Regression (BKTR). For model inference, we develop an efficient Markov chain Monte Carlo (MCMC) algorithm, which uses Gibbs sampling to update factor matrices and slice sampling to update kernel hyperparameters. We conduct extensive experiments on both synthetic and real-world data sets, and our results confirm the superior performance and efficiency of BKTR for model estimation and parameter inference.

Other (9 papers)

【1】 Fairness based Multi-Preference Resource Allocation in Decentralised Open Markets Link: https://arxiv.org/abs/2109.00207

Authors: Pankaj Mishra, Ahmed Moustafa, Takayuki Ito Affiliation: Nagoya Institute of Technology; Kyoto University Abstract: In this work, we focus on resource allocation in a decentralised open market. A decentralised open market consists of multiple vendors and multiple dynamically-arriving buyers, which makes the market complex and dynamic: negotiations among vendors and buyers take place over multiple conflicting issues such as price, scalability, robustness, delay, etc. As a result, optimising the resource allocation in such open markets becomes directly dependent on two key decisions: incorporating different kinds of buyers' preferences, and a fairness-based vendor elicitation strategy. Towards this end, in this work, we propose a three-step resource allocation approach that employs a reverse-auction paradigm. In the first step, a priority label is attached to each bidding vendor based on the proposed priority mechanism. Then, in the second step, a preference score is calculated for all the different kinds of preferences of the buyers. Finally, in the third step, the winner is determined based on the priority label of the vendor and the preference score. Finally, we compare the proposed approach with two state-of-the-art resource pricing and allocation strategies. The experimental results show that the proposed approach outperforms the other two resource allocation approaches in terms of the independent utilities of buyers and the overall utility of the open market.

【2】 Approximation Properties of Deep ReLU CNNs Link: https://arxiv.org/abs/2109.00190

Authors: Juncai He, Lin Li, Jinchao Xu Affiliations: Peking University; Department of Mathematics Note: 27 pages Abstract: This paper is devoted to establishing $L^2$ approximation properties for deep ReLU convolutional neural networks (CNNs) on two-dimensional space. The analysis is based on a decomposition theorem for convolutional kernels with large spatial size and multiple channels. Given this decomposition and the properties of the ReLU activation function, a universal approximation theorem for deep ReLU CNNs with the classic structure is obtained by showing their connection to ReLU deep neural networks (DNNs) with one hidden layer. Furthermore, approximation properties are also obtained for neural networks with the ResNet, pre-act ResNet, and MgNet architectures, based on the connections between these networks.
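
The decomposition theorem works in the direction suggested by a familiar identity: two stacked (linear) 3x3 convolutions are equivalent to one convolution with the composed 5x5 kernel, since convolving the kernels composes them. The quick numerical check below illustrates only this building block; it is not the paper's construction, which also has to account for multiple channels and the interleaved ReLU activations.

import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
k1 = rng.standard_normal((3, 3))
k2 = rng.standard_normal((3, 3))
x = rng.standard_normal((16, 16))

# Two stacked 3x3 (linear) convolutions ...
two_step = convolve2d(convolve2d(x, k1, mode="full"), k2, mode="full")
# ... equal a single convolution with the composed 5x5 kernel.
k_equiv = convolve2d(k1, k2, mode="full")      # shape (5, 5)
one_step = convolve2d(x, k_equiv, mode="full")

print(k_equiv.shape)                            # (5, 5)
print(np.allclose(two_step, one_step))          # True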

【3】 Implicit Behavioral Cloning Link: https://arxiv.org/abs/2109.00137

Authors: Pete Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian Wong, Johnny Lee, Igor Mordatch, Jonathan Tompson Affiliations: Robotics at Google Abstract: We find that across a wide range of robot policy learning scenarios, treating supervised policy learning with an implicit model generally performs better, on average, than commonly used explicit models. We present extensive experiments on this finding, and we provide both intuitive insight and theoretical arguments distinguishing the properties of implicit models from their explicit counterparts, particularly with respect to approximating complex, potentially discontinuous and multi-valued (set-valued) functions. On robotic policy learning tasks we show that implicit behavioral cloning policies with energy-based models (EBMs) often outperform common explicit (mean squared error or mixture density) behavioral cloning policies, including on tasks with high-dimensional action spaces and visual image inputs. We find these policies provide competitive results or outperform state-of-the-art offline reinforcement learning methods on the challenging human-expert tasks from the D4RL benchmark suite, despite using no reward information. In the real world, robots with implicit policies can learn complex and remarkably subtle behaviors on contact-rich tasks from human demonstrations, including tasks with high combinatorial complexity and tasks requiring 1 mm precision.
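
As a rough sketch of what "implicit" means here: the policy is not a feed-forward map a = F(o) but the minimizer of a learned energy, a* = argmin_a E(o, a). The toy PyTorch code below shows inference with a single round of uniform action sampling; the actual system trains the EBM contrastively and uses iterative derivative-free optimizers, and all sizes and names here are made up for illustration.

import torch

class EnergyModel(torch.nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(obs_dim + act_dim, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):                  # scalar energy per (obs, act) pair
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def implicit_policy(ebm, obs, act_low, act_high, n_samples=1024):
    # Sample candidate actions uniformly in the action box, keep the argmin energy.
    cand = act_low + (act_high - act_low) * torch.rand(n_samples, act_low.numel())
    energies = ebm(obs.expand(n_samples, -1), cand)
    return cand[energies.argmin()]                # a* = argmin_a E(obs, a)

ebm = EnergyModel(obs_dim=8, act_dim=2)
obs = torch.randn(1, 8)
act = implicit_policy(ebm, obs, torch.tensor([-1., -1.]), torch.tensor([1., 1.]))
print(act)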

【4】 Online Dynamic Window (ODW) Assisted Two-stage LSTM Frameworks for Indoor Localization Link: https://arxiv.org/abs/2109.00126

Authors: Mohammadamin Atashi, Mohammad Salimibeni, Arash Mohammadi Affiliations: Electrical and Computer Engineering, Concordia University, Montreal, QC, Canada; Concordia Institute for Information System Engineering (CIISE), Concordia University Abstract: Internet of Things (IoT)-based indoor localization has gained significant popularity recently to satisfy the ever-increasing requirements of indoor location-based services (LBS). In this context, Inertial Measurement Unit (IMU)-based localization is of interest, as it provides a scalable solution independent of any proprietary sensors/modules. Existing IMU-based methodologies, however, are mainly developed based on statistical heading and step-length estimation techniques that suffer from cumulative error issues and have extensive computational time requirements, limiting their application to real-time indoor positioning. To address these issues, we propose the Online Dynamic Window (ODW)-assisted two-stage Long Short-Term Memory (LSTM) localization framework. Three ODWs are proposed: the first model uses a Natural Language Processing (NLP)-inspired Dynamic Window (DW) approach, which significantly reduces the required computational time. The second framework is developed based on a Signal Processing Dynamic Windowing (SP-DW) approach to further reduce the processing time of the two-stage LSTM-based model. The third ODW, referred to as SP-NLP, combines the first two windowing mechanisms to further improve the overall accuracy. Compared to traditional LSTM-based positioning approaches, which suffer from either high tensor-computation requirements or low accuracy, the proposed ODW-assisted models can perform indoor localization in a near-real-time fashion with high accuracy. The performance of the proposed ODW-assisted models is evaluated on a real Pedestrian Dead Reckoning (PDR) dataset. The results illustrate the potential of the proposed ODW-assisted techniques to achieve high classification accuracy with significantly reduced computational time, making them applicable for near-real-time implementations.

【5】 MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics Link: https://arxiv.org/abs/2109.00110

Authors: Kunhao Zheng, Jesse Michael Han, Stanislas Polu Affiliations: École Polytechnique, OpenAI, University of Pittsburgh Abstract: We present miniF2F, a dataset of formal Olympiad-level mathematics problem statements intended to provide a unified cross-system benchmark for neural theorem proving. The miniF2F benchmark currently targets Metamath, Lean, and Isabelle and consists of 488 problem statements drawn from the AIME, the AMC, and the International Mathematical Olympiad (IMO), as well as material from high-school and undergraduate mathematics courses. We report baseline results using GPT-f, a neural theorem prover based on GPT-3, and provide an analysis of its performance. We intend for miniF2F to be a community-driven effort and hope that our benchmark will help spur advances in neural theorem proving.

【6】 It's not Rocket Science: Interpreting Figurative Language in Narratives Link: https://arxiv.org/abs/2109.00087

Authors: Tuhin Chakrabarty, Yejin Choi, Vered Shwartz Affiliations: Columbia University; Allen Institute for Artificial Intelligence; Paul G. Allen School of Computer Science & Engineering, University of Washington; University of British Columbia Abstract: Figurative language is ubiquitous in English. Yet, the vast majority of NLP research focuses on literal language. Existing text representations rely on compositionality by design, while figurative language is often non-compositional. In this paper, we study the interpretation of two kinds of non-compositional figurative language (idioms and similes). We collected datasets of fictional narratives containing a figurative expression, along with crowd-sourced plausible and implausible continuations that hinge on the correct interpretation of the expression. We then trained models to choose or generate the plausible continuation. Our experiments show that models based solely on pre-trained language models perform substantially worse than humans on these tasks. We additionally propose knowledge-enhanced models that adopt human strategies for interpreting figurative language: inferring meaning from the context and relying on the constituent words' literal meanings. The knowledge-enhanced models improve performance on both the discriminative and generative tasks, further narrowing the gap to human performance.
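
One generic way to instantiate the discriminative task (not the paper's models, which additionally inject external knowledge) is to rank candidate continuations by the negative log-likelihood a pretrained language model assigns them. The sketch below scores the full context-plus-continuation sequence for brevity; the example narrative and candidates are made up.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def continuation_loss(context, continuation):
    # Mean token NLL over the whole sequence; lower = more plausible.
    ids = tok(context + " " + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        return lm(ids, labels=ids).loss.item()

narrative = "Fixing the printer wasn't rocket science, she said."
candidates = ["Anyone could have done it.", "It required a team of NASA engineers."]
print(min(candidates, key=lambda c: continuation_loss(narrative, c)))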

【7】 Working Memory Connections for LSTM Link: https://arxiv.org/abs/2109.00020

Authors: Federico Landi, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara Affiliations: Department of Engineering "Enzo Ferrari", University of Modena and Reggio Emilia, Modena, Italy Note: Accepted for publication in Neural Networks Abstract: Recurrent neural networks with Long Short-Term Memory (LSTM) make use of gating mechanisms to mitigate exploding and vanishing gradients when learning long-term dependencies. For this reason, LSTMs and other gated RNNs are widely adopted, being the de facto standard for many sequence modeling tasks. Although the memory cell inside the LSTM contains essential information, it is not allowed to influence the gating mechanism directly. In this work, we improve the gate potential by including information coming from the internal cell state. The proposed modification, named Working Memory Connection, consists of adding a learnable nonlinear projection of the cell content into the network gates. This modification can fit into the classical LSTM gates without any assumption about the underlying task, and is particularly effective when dealing with longer sequences. Previous research efforts in this direction, which go back to the early 2000s, could not bring a consistent improvement over vanilla LSTM. As part of this paper, we identify a key issue with previous connections that heavily limits their effectiveness, hence preventing a successful integration of the knowledge coming from the internal cell state. We show through extensive experimental evaluation that Working Memory Connections consistently improve the performance of LSTMs on a variety of tasks. Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.
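
A hedged PyTorch sketch of the idea: each gate additionally receives a learnable nonlinear projection of the cell content. The exact form and placement below (tanh of the previous cell state, added pre-activation to the input, forget, and output gates) is one reading of the abstract, not the paper's equations.

import torch

class WMCLSTMCell(torch.nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.xh = torch.nn.Linear(input_size + hidden_size, 4 * hidden_size)
        self.cell_proj = torch.nn.Linear(hidden_size, 3 * hidden_size)  # -> i, f, o

    def forward(self, x, state):
        h, c = state
        i, f, g, o = self.xh(torch.cat([x, h], dim=-1)).chunk(4, dim=-1)
        # Working memory connection: nonlinear cell content enters the gates.
        ci, cf, co = self.cell_proj(torch.tanh(c)).chunk(3, dim=-1)
        i = torch.sigmoid(i + ci)
        f = torch.sigmoid(f + cf)
        c = f * c + i * torch.tanh(g)
        o = torch.sigmoid(o + co)   # fed from c_{t-1} here; using the new c_t is also plausible
        h = o * torch.tanh(c)
        return h, (h, c)

cell = WMCLSTMCell(input_size=10, hidden_size=32)
h = c = torch.zeros(1, 32)
out, (h, c) = cell(torch.randn(1, 10), (h, c))
print(out.shape)  # torch.Size([1, 32])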

【8】 Half-Space and Box Constraints as NUV Priors: First Results Link: https://arxiv.org/abs/2109.00036

Authors: Raphael Keusch, Hans-Andrea Loeliger Affiliations: ETH Zurich, Dept. of Information Technology & Electrical Engineering Abstract: Normals with unknown variance (NUV) can represent many useful priors and blend well with Gaussian models and message passing algorithms. NUV representations of sparsifying priors have long been known, and NUV representations of binary (and M-level) priors have been proposed very recently. In this document, we propose NUV representations of half-space constraints and box constraints, which makes it possible to add such constraints to any linear Gaussian model, with any of the previously known NUV priors, without affecting computational tractability.
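
For readers new to NUV representations, the classical example is the sparsifying L1 penalty, which can be written as a Gaussian cost with an individually optimized unknown variance; this is a standard identity, not the paper's new construction:

|x| = \min_{\sigma^2 > 0} \left( \frac{x^2}{2\sigma^2} + \frac{\sigma^2}{2} \right), \qquad \text{with the minimum attained at } \sigma^2 = |x|.

The paper builds analogous variance-parameterized representations for the half-space constraint x \ge a and the box constraint a \le x \le b, so that such constraints can be enforced inside purely Gaussian message-passing machinery; the exact parameterizations are given in the paper itself.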

【9】 Optimizing embedding-related quantum annealing parameters for reducing hardware bias Link: https://arxiv.org/abs/2011.00719

Authors: Aaron Barbosa, Elijah Pelofske, Georg Hahn, Hristo N. Djidjev Affiliations: Los Alamos National Laboratory Abstract: Quantum annealers have been designed to propose near-optimal solutions to NP-hard optimization problems. However, the accuracy of current annealers, such as those of D-Wave Systems, Inc., is limited by environmental noise and hardware biases. One way to deal with these imperfections and to improve the quality of the annealing results is to apply a variety of pre-processing techniques such as spin reversal (SR), anneal offsets (AO), or chain weights (CW). Maximizing the effectiveness of these techniques involves optimizing over a large number of parameters, which would be too costly if it had to be done for each new problem instance. In this work, we show that this parameter optimization can be done for an entire class of problems, given that each instance uses a previously chosen fixed embedding. Specifically, in the training phase, we fix an embedding E of a complete graph onto the hardware of the annealer and then run an optimization algorithm to tune the following set of parameter values: the set of bits to be flipped for SR, the specific qubit offsets for AO, and the distribution of chain weights, optimized over a set of training graphs randomly chosen from that class, where the graphs are embedded onto the hardware using E. In the testing phase, we estimate how well the parameters computed during the training phase work on a random selection of other graphs from the same class. We investigate graph instances of varying densities for the Maximum Clique, Maximum Cut, and Graph Partitioning problems. Our results indicate that, compared to the default behavior, substantial improvements in the annealing results can be achieved by using the optimized parameters for SR, AO, and CW.
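
Of the three techniques, spin reversal is the easiest to make concrete: flipping a chosen subset of spins is a gauge transform that leaves the Ising problem's energies unchanged but redistributes the hardware's per-qubit biases. The sketch below shows only the transform itself; which bits to flip is exactly one of the parameters the paper optimizes per problem class, and the Ising convention and data layout here are illustrative assumptions.

import numpy as np

def spin_reversal(h, J, flip):
    # flip: boolean array, True where spin s_i is replaced by -s_i.
    # Fields flip sign on flipped spins; couplers flip sign when
    # exactly one endpoint is flipped.
    h2 = np.where(flip, -h, h)
    J2 = {(i, j): (-v if flip[i] != flip[j] else v) for (i, j), v in J.items()}
    return h2, J2

def undo(sample, flip):
    # Map a returned sample back to the original problem's frame.
    return np.where(flip, -sample, sample)

h = np.array([0.5, -1.0, 0.3])
J = {(0, 1): -0.7, (1, 2): 0.4}
flip = np.array([True, False, True])
h2, J2 = spin_reversal(h, J, flip)
print(h2, J2)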

