Machine Learning arXiv Daily Digest [7.16]

2021-07-27 11:00:02

Visit www.arxivdaily.com for digests with abstracts covering CS | Physics | Mathematics | Economics | Statistics | Finance | Biology | Electrical Engineering, plus search, bookmarking, and posting features!

cs.LG: 107 papers today

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (8 papers)

【1】 Hierarchical graph neural nets can capture long-range interactions

Authors: Ladislav Rampášek, Guy Wolf
Affiliations: Université de Montréal, Dept. of Math. & Stat.; Mila - Quebec AI Institute, Montreal, QC, Canada
Link: https://arxiv.org/abs/2107.07432
Abstract: Graph neural networks (GNNs) based on message passing between neighboring nodes are known to be insufficient for capturing long-range interactions in graphs. In this project we study hierarchical message passing models that leverage a multi-resolution representation of a given graph. This facilitates learning of features that span large receptive fields without loss of local information, an aspect not studied in preceding work on hierarchical GNNs. We introduce Hierarchical Graph Net (HGNet), which for any two connected nodes guarantees the existence of message-passing paths of at most logarithmic length w.r.t. the input graph size. Yet, under mild assumptions, its internal hierarchy maintains asymptotic size equivalent to that of the input graph. We observe that our HGNet outperforms conventional stacking of GCN layers, particularly on molecular property prediction benchmarks. Finally, we propose two benchmarking tasks designed to elucidate the capability of GNNs to leverage long-range interactions in graphs.
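To make the hierarchical message-passing idea concrete, here is a minimal sketch (not the authors' implementation): nodes are pooled into clusters via an assumed hard assignment matrix, one GCN-style propagation step runs on both the input graph and the pooled graph, and pooled features are broadcast back, letting distant nodes exchange information in few hops.

```python
import numpy as np

def normalize(A):
    # Symmetric normalization with self-loops, as in GCN propagation.
    A = A + np.eye(A.shape[0])
    d = A.sum(1)
    Dinv = np.diag(d ** -0.5)
    return Dinv @ A @ Dinv

# Toy path graph of 6 nodes: the endpoints are 5 hops apart at the fine level.
A = np.zeros((6, 6))
for i in range(5):
    A[i, i + 1] = A[i + 1, i] = 1

# Hypothetical hard cluster assignment: nodes {0,1,2} -> cluster 0, {3,4,5} -> cluster 1.
S = np.zeros((6, 2))
S[:3, 0] = 1
S[3:, 1] = 1

X = np.random.randn(6, 4)                    # node features
A_coarse = S.T @ A @ S                       # pooled (coarse-level) adjacency

H_fine = normalize(A) @ X                    # message passing on the input graph
H_coarse = normalize(A_coarse) @ (S.T @ X)   # message passing on the coarse graph
H = H_fine + S @ H_coarse                    # broadcast coarse messages back to nodes

print(H.shape)  # (6, 4): node 0 now receives information influenced by node 5
```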

【2】 A multi-schematic classifier-independent oversampling approach for imbalanced datasets

Authors: Saptarshi Bej, Kristian Schultz, Prashant Srivastava, Markus Wolfien, Olaf Wolkenhauer
Affiliations: Department of Systems Biology & Bioinformatics, University of Rostock, Germany; Olaf Wolkenhauer is also affiliated with the Leibniz-Institute for Food Systems Biology, Technical University of Munich, and the University of London
Note: 12 tables, 6 figures
Link: https://arxiv.org/abs/2107.07349
Abstract: Over 85 oversampling algorithms, mostly extensions of the SMOTE algorithm, have been built over the past two decades to solve the problem of imbalanced datasets. However, it has been evident from previous studies that different oversampling algorithms have different degrees of efficiency with different classifiers. With numerous algorithms available, it is difficult to decide on an oversampling algorithm for a chosen classifier. Here, we overcome this problem with a multi-schematic and classifier-independent oversampling approach: ProWRAS (Proximity Weighted Random Affine Shadowsampling). ProWRAS integrates the Localized Random Affine Shadowsampling (LoRAS) algorithm and the Proximity Weighted Synthetic oversampling (ProWSyn) algorithm. By controlling the variance of the synthetic samples, as well as through a proximity-weighted clustering system for the minority class data, the ProWRAS algorithm improves performance compared to algorithms that generate synthetic samples by modelling high-dimensional convex spaces of the minority class. ProWRAS has four oversampling schemes, each of which has its unique way of modelling the variance of the generated data. Most importantly, the performance of ProWRAS, with a proper choice of oversampling scheme, is independent of the classifier used. We have benchmarked our newly developed ProWRAS algorithm against five state-of-the-art oversampling models and four different classifiers on 20 publicly available datasets. ProWRAS outperforms other oversampling algorithms in a statistically significant way, in terms of both F1-score and Kappa-score. Moreover, we have introduced a novel measure of classifier independence, the I-score, and shown quantitatively that ProWRAS performs better, independent of the classifier used. In practice, ProWRAS customizes synthetic sample generation according to a classifier of choice and thereby reduces benchmarking efforts.
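The shadowsampling step at the core of LoRAS/ProWRAS can be illustrated as drawing synthetic minority points from random convex (affine, weights summing to one) combinations of nearby minority samples, which keeps synthetic data inside a low-variance local region. The sketch below assumes this simplified form and omits the proximity weighting and the four schemes of the full algorithm.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def affine_oversample(X_min, n_new, k=5, rng=None):
    """Generate synthetic minority samples as random convex combinations
    of k nearest minority neighbors (a simplified LoRAS-style step)."""
    rng = np.random.default_rng(rng)
    nn = NearestNeighbors(n_neighbors=k).fit(X_min)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        _, idx = nn.kneighbors(X_min[i:i + 1])
        w = rng.dirichlet(np.ones(k))           # affine weights, sum to 1
        synthetic.append(w @ X_min[idx[0]])     # convex combination bounds the variance
    return np.array(synthetic)

X_min = np.random.randn(20, 3)                  # toy minority class
print(affine_oversample(X_min, n_new=50).shape) # (50, 3)
```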

【3】 Expert Graphs: Synthesizing New Expertise via Collaboration

Authors: Bijan Mazaheri, Siddharth Jain, Jehoshua Bruck
Affiliations: California Institute of Technology
Note: 13 pages, 11 figures
Link: https://arxiv.org/abs/2107.07054
Abstract: Consider multiple experts with overlapping expertise working on a classification problem under uncertain input. What constitutes a consistent set of opinions? How can we predict the opinions of experts on missing sub-domains? In this paper, we define a framework to analyze this problem, termed "expert graphs." In an expert graph, vertices represent classes and edges represent binary opinions on the topics of their vertices. We derive necessary conditions for expert graph validity and use them to create "synthetic experts" which describe opinions consistent with the observed opinions of other experts. We show this framework to be equivalent to the well-studied linear ordering polytope. We show our conditions are not sufficient for describing all expert graphs on cliques, but are sufficient for cycles.

【4】 GGT: Graph-Guided Testing for Adversarial Sample Detection of Deep Neural Network

Authors: Zuohui Chen, Renxuan Wang, Jingyang Xiang, Yue Yu, Xin Xia, Shouling Ji, Qi Xuan, Xiaoniu Yang
Affiliations: Institute of Cyberspace Security, Zhejiang University of Technology, Hangzhou, China; National University of Defense Technology, Changsha, China; Monash University, Melbourne, Australia; Zhejiang University, Hangzhou, China
Link: https://arxiv.org/abs/2107.07043
Abstract: Deep Neural Networks (DNN) are known to be vulnerable to adversarial samples, the detection of which is crucial for the wide application of these DNN models. Recently, a number of deep testing methods in software engineering were proposed to find the vulnerability of DNN systems, and one of them, i.e., Model Mutation Testing (MMT), was used to successfully detect various adversarial samples generated by different kinds of adversarial attacks. However, the mutated models in MMT are always huge in number (e.g., over 100 models) and lack diversity (e.g., can be easily circumvented by high-confidence adversarial samples), which makes it less efficient in real applications and less effective in detecting high-confidence adversarial samples. In this study, we propose Graph-Guided Testing (GGT) for adversarial sample detection to overcome these aforementioned challenges. GGT generates pruned models with the guidance of graph characteristics, each of which has only about 5% of the parameters of a mutated model in MMT, and the graph-guided models have higher diversity. The experiments on CIFAR10 and SVHN validate that GGT performs much better than MMT with respect to both effectiveness and efficiency.
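The detection logic shared by MMT-style approaches can be sketched as follows: run an input through an ensemble of perturbed (here, pruned) model variants and flag it as adversarial when the label-change rate against the base model is high. Random weight masking stands in for the paper's graph-guided pruning, and the 0.5 threshold is a hypothetical choice.

```python
import copy
import torch
import torch.nn as nn

def random_prune(model, keep=0.05):
    """Stand-in for graph-guided pruning: keep ~5% of weights at random."""
    pruned = copy.deepcopy(model)
    with torch.no_grad():
        for p in pruned.parameters():
            p.mul_((torch.rand_like(p) < keep).float())
    return pruned

def label_change_rate(model, variants, x):
    """Fraction of pruned variants that disagree with the base model's label."""
    base = model(x).argmax(1)
    flips = [(m(x).argmax(1) != base).float().mean() for m in variants]
    return torch.stack(flips).mean()

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
variants = [random_prune(model) for _ in range(10)]
x = torch.randn(4, 32)
score = label_change_rate(model, variants, x)
flag = score > 0.5  # hypothetical threshold: high disagreement suggests adversarial input
print(float(score), bool(flag))
```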

【5】 Classifying Component Function in Product Assemblies with Graph Neural Networks

Authors: Vincenzo Ferrero, Kaveh Hassani, Daniele Grandi, Bryony DuPont
Affiliations: Design Engineering Laboratory, Oregon State University, Corvallis, Oregon; Autodesk Research, Autodesk, Inc., San Rafael, CA
Link: https://arxiv.org/abs/2107.07042
Abstract: Function is defined as the ensemble of tasks that enable the product to complete the designed purpose. Functional tools, such as functional modeling, offer decision guidance in the early phase of product design, where explicit design decisions are yet to be made. Function-based design data is often sparse and grounded in individual interpretation. As such, function-based design tools can benefit from automatic function classification to increase data fidelity and provide function representation models that enable function-based intelligent design agents. Function-based design data is commonly stored in manually generated design repositories. These design repositories are a collection of expert knowledge and interpretations of function in product design bounded by function-flow and component taxonomies. In this work, we represent a structured taxonomy-based design repository as assembly-flow graphs, then leverage a graph neural network (GNN) model to perform automatic function classification. We support automated function classification by learning from repository data to establish the ground truth of component function assignment. Experimental results show that our GNN model achieves a micro-average F$_1$-score of 0.832 for tier 1 (broad), 0.756 for tier 2, and 0.783 for tier 3 (specific) functions. Given the imbalance of data features, the results are encouraging. Our efforts in this paper can be a starting point for more sophisticated applications in knowledge-based CAD systems and Design-for-X consideration in function-based design.

【6】 Short-term Hourly Streamflow Prediction with Graph Convolutional GRU Networks

Authors: Muhammed Sit, Bekir Demiray, Ibrahim Demir
Affiliations: The University of Iowa
Note: 4 pages, accepted to the Tackling Climate Change with Machine Learning workshop at ICML 2021
Link: https://arxiv.org/abs/2107.07039
Abstract: The frequency and impact of floods are expected to increase due to climate change. It is crucial to predict streamflow, and consequently flooding, in order to prepare for and mitigate its consequences in terms of property damage and fatalities. This paper presents a Graph Convolutional GRU-based model to predict the next 36 hours of streamflow for a sensor location using the upstream river network. As shown in the experiment results, the model presented in this study provides better performance than the persistence baseline and a Convolutional Bidirectional GRU network for the selected study area in short-term streamflow prediction.
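A common way to build such a model (an assumption about the architecture, not necessarily the paper's exact cell) is to replace the dense transforms inside a GRU cell with graph convolutions, so the hidden state of each sensor mixes with its neighbors in the river network at every time step:

```python
import torch
import torch.nn as nn

class GraphConvGRUCell(nn.Module):
    """GRU cell whose input/hidden transforms are graph convolutions
    A_hat @ X @ W. A common construction; details here are illustrative."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.lin_z = nn.Linear(in_dim + hid_dim, hid_dim)
        self.lin_r = nn.Linear(in_dim + hid_dim, hid_dim)
        self.lin_h = nn.Linear(in_dim + hid_dim, hid_dim)

    def forward(self, A_hat, x, h):
        xh = A_hat @ torch.cat([x, h], dim=-1)        # neighborhood aggregation
        z = torch.sigmoid(self.lin_z(xh))             # update gate
        r = torch.sigmoid(self.lin_r(xh))             # reset gate
        xh_r = A_hat @ torch.cat([x, r * h], dim=-1)
        h_tilde = torch.tanh(self.lin_h(xh_r))
        return (1 - z) * h + z * h_tilde

# Toy usage: 5 sensors, 3 input features (e.g. stage, rainfall), hidden size 8.
A_hat = torch.eye(5)              # normalized adjacency of the river network (placeholder)
cell = GraphConvGRUCell(3, 8)
h = torch.zeros(5, 8)
for t in range(36):               # roll over a 36-step horizon
    h = cell(A_hat, torch.randn(5, 3), h)
print(h.shape)                    # torch.Size([5, 8])
```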

【7】 Elastic Graph Neural Networks

Authors: Xiaorui Liu, Wei Jin, Yao Ma, Yaxin Li, Hua Liu, Yiqi Wang, Ming Yan, Jiliang Tang
Affiliations: Department of Computer Science and Engineering, Michigan State University, USA; School of Mathematics, Shandong University, China; Department of Computational Mathematics
Note: ICML 2021 (International Conference on Machine Learning)
Link: https://arxiv.org/abs/2107.06996
Abstract: While many existing graph neural networks (GNNs) have been proven to perform $\ell_2$-based graph smoothing that enforces smoothness globally, in this work we aim to further enhance the local smoothness adaptivity of GNNs via $\ell_1$-based graph smoothing. As a result, we introduce a family of GNNs (Elastic GNNs) based on $\ell_1$- and $\ell_2$-based graph smoothing. In particular, we propose a novel and general message passing scheme for GNNs. This message passing algorithm is not only friendly to back-propagation training but also achieves the desired smoothing properties with a theoretical convergence guarantee. Experiments on semi-supervised learning tasks demonstrate that the proposed Elastic GNNs obtain better adaptivity on benchmark datasets and are significantly robust to graph adversarial attacks. The implementation of Elastic GNNs is available at https://github.com/lxiaorui/ElasticGNN.
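The $\ell_1$ component of such objectives is typically handled with soft-thresholding of edge differences. The iteration below is an illustrative smoothing step in that spirit (the clipped difference is the gradient of a Huber-smoothed $\ell_1$ penalty), not the paper's elastic message passing scheme:

```python
import numpy as np

def soft_threshold(x, lam):
    # Proximal operator of the l1 norm.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def elastic_smoothing_step(X, edges, lam1=0.1, lam2=0.1, step=0.5):
    """One illustrative gradient step mixing l2 smoothing (pulls neighbors
    together) with a Huber-smoothed l1 term (tolerates large local jumps)."""
    grad = np.zeros_like(X)
    for i, j in edges:
        d = X[i] - X[j]
        # d - soft_threshold(d, lam1) equals clip(d, -lam1, lam1): bounded pull.
        g = lam2 * d + lam1 * (d - soft_threshold(d, lam1))
        grad[i] += g
        grad[j] -= g
    return X - step * grad

X = np.random.randn(4, 2)
edges = [(0, 1), (1, 2), (2, 3)]
for _ in range(10):
    X = elastic_smoothing_step(X, edges)
print(X.round(2))
```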

【8】 Entropic Inequality Constraints from e-separation Relations in Directed Acyclic Graphs with Hidden Variables

Authors: Noam Finkelstein, Beata Zjawin, Elie Wolfe, Ilya Shpitser, Robert W. Spekkens
Affiliations: Johns Hopkins University, Department of Computer Science, N Charles St, Baltimore, MD, USA; Perimeter Institute for Theoretical Physics, Caroline St. N, Waterloo, Ontario, Canada
Note: 15 pages. This arXiv version is slightly updated relative to the version in the UAI proceedings. (Theorem 5 and Proposition 8 have been strengthened, with Appendix C revised correspondingly. Appendix D has been added.)
Link: https://arxiv.org/abs/2107.07087
Abstract: Directed acyclic graphs (DAGs) with hidden variables are often used to characterize causal relations between variables in a system. When some variables are unobserved, DAGs imply a notoriously complicated set of constraints on the distribution of observed variables. In this work, we present entropic inequality constraints that are implied by $e$-separation relations in hidden variable DAGs with discrete observed variables. The constraints can intuitively be understood to follow from the fact that the capacity of variables along a causal pathway to convey information is restricted by their entropy; e.g., in the extreme case, a variable with entropy $0$ can convey no information. We show how these constraints can be used to learn about the true causal model from an observed data distribution. In addition, we propose a measure of causal influence called the minimal mediary entropy, and demonstrate that it can augment traditional measures such as the average causal effect.

Transformer (1 paper)

【1】 Transformer-based Machine Learning for Fast SAT Solvers and Logic Synthesis

Authors: Feng Shi, Chonghan Lee, Mohammad Khairul Bashar, Nikhil Shukla, Song-Chun Zhu, Vijaykrishnan Narayanan
Affiliations: University of California Los Angeles; The Pennsylvania State University; University of Virginia
Link: https://arxiv.org/abs/2107.07116
Abstract: CNF-based SAT and MaxSAT solvers are central to logic synthesis and verification systems. The increasing popularity of these constraint problems in electronic design automation encourages studies on different SAT problems and their properties for further computational efficiency. There has been both theoretical and practical success of modern conflict-driven clause learning SAT solvers, which allow solving very large industrial instances in a relatively short amount of time. Recently, machine learning approaches have provided a new dimension to solving this challenging problem. Neural symbolic models could serve as generic solvers that can be specialized for specific domains based on data without any changes to the structure of the model. In this work, we propose a one-shot model derived from the Transformer architecture to solve the MaxSAT problem, which is the optimization version of SAT where the goal is to satisfy the maximum number of clauses. Our model has a scale-free structure which can process instances of varying size. We use meta-paths and a self-attention mechanism to capture interactions among homogeneous nodes. We adopt cross-attention mechanisms on the bipartite graph to capture interactions among heterogeneous nodes. We further apply an iterative algorithm to our model to satisfy additional clauses, enabling a solution approaching that of an exact SAT problem. The attention mechanisms leverage parallelism for speedup. Our evaluation indicates improved speedup compared to heuristic approaches and an improved completion rate compared to machine learning approaches.

GAN | Adversarial | Attack | Generation (6 papers)

【1】 Explore and Control with Adversarial Surprise

Authors: Arnaud Fickinger, Natasha Jaques, Samyak Parajuli, Michael Chang, Nicholas Rhinehart, Glen Berseth, Stuart Russell, Sergey Levine
Affiliations: UC Berkeley; Google Research, Brain Team
Link: https://arxiv.org/abs/2107.07394
Abstract: Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards. However, since designing rewards often requires substantial engineering effort, we are interested in the problem of learning without rewards, where agents must discover useful behaviors in the absence of task-specific incentives. Intrinsic motivation is a family of unsupervised RL techniques which develop general objectives for an RL agent to optimize that lead to better exploration or the discovery of skills. In this paper, we propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences. The policies each take turns controlling the agent. The Explore policy maximizes entropy, putting the agent into surprising or unfamiliar situations. Then, the Control policy takes over and seeks to recover from those situations by minimizing entropy. The game harnesses the power of multi-agent competition to drive the agent to seek out increasingly surprising parts of the environment while learning to gain mastery over them. We show empirically that our method leads to the emergence of complex skills by exhibiting clear phase transitions. Furthermore, we show both theoretically (via a latent state space coverage argument) and empirically that our method has the potential to be applied to the exploration of stochastic, partially-observed environments. We show that Adversarial Surprise learns more complex behaviors, and explores more effectively than competitive baselines, outperforming intrinsic motivation methods based on active inference, novelty-seeking (Random Network Distillation (RND)), and multi-agent unsupervised RL (Asymmetric Self-Play (ASP)) in MiniGrid, Atari and VizDoom environments.
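The heart of the game is a surprise reward with opposite signs for the two players. A toy sketch with an assumed count-based density model over discrete observations (the paper's density model and turn schedule differ):

```python
import numpy as np
from collections import Counter

class SurpriseReward:
    """Count-based surprise r = -log p(obs). The Explore policy maximizes it;
    the Control policy receives its negation. A simplified stand-in for the
    learned density model used in the method."""
    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def __call__(self, obs, explorer_turn):
        self.counts[obs] += 1
        self.total += 1
        surprise = -np.log(self.counts[obs] / self.total)
        return surprise if explorer_turn else -surprise

reward = SurpriseReward()
episode = ["room_a", "room_a", "room_b", "room_a"]
for t, obs in enumerate(episode):
    explorer_turn = t < 2   # hypothetical schedule: Explore acts first, then Control
    print(obs, round(reward(obs, explorer_turn), 3))
```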

【2】 SilGAN: Generating driving maneuvers for scenario-based software-in-the-loop testing

Authors: Dhasarathy Parthasarathy, Anton Johansson
Note: Preprint of an article accepted at the Third IEEE International Conference On Artificial Intelligence Testing 2021, Oxford, UK
Link: https://arxiv.org/abs/2107.07364
Abstract: Automotive software testing continues to rely largely upon expensive field tests to ensure quality because alternatives like simulation-based testing are relatively immature. As a step towards lowering reliance on field tests, we present SilGAN, a deep generative model that eases specification, stimulus generation, and automation of automotive software-in-the-loop testing. The model is trained using data recorded from vehicles in the field. Upon training, the model uses a concise specification for a driving scenario to generate realistic vehicle state transitions that can occur during such a scenario. Such authentic emulation of internal vehicle behavior can be used for rapid, systematic and inexpensive testing of vehicle control software. In addition, by presenting a targeted method for searching through the information learned by the model, we show how a test objective like code coverage can be automated. The data-driven end-to-end testing pipeline that we present vastly expands the scope and credibility of automotive simulation-based testing. This reduces time to market while helping maintain required standards of quality.

【3】 Variational Topic Inference for Chest X-Ray Report Generation

Authors: Ivona Najdenkoska, Xiantong Zhen, Marcel Worring, Ling Shao
Affiliations: AIM Lab, University of Amsterdam, The Netherlands
Note: To be published in the International Conference on Medical Image Computing and Computer Assisted Intervention 2021
Link: https://arxiv.org/abs/2107.07314
Abstract: Automating report generation for medical imaging promises to reduce workload and assist diagnosis in clinical practice. Recent work has shown that deep learning models can successfully caption natural images. However, learning from medical data is challenging due to the diversity and uncertainty inherent in the reports written by different radiologists with discrepant expertise and experience. To tackle these challenges, we propose variational topic inference for automatic report generation. Specifically, we introduce a set of topics as latent variables to guide sentence generation by aligning image and language modalities in a latent space. The topics are inferred in a conditional variational inference framework, with each topic governing the generation of a sentence in the report. Further, we adopt a visual attention module that enables the model to attend to different locations in the image and generate more informative descriptions. We conduct extensive experiments on two benchmarks, namely Indiana U. Chest X-rays and MIMIC-CXR. The results demonstrate that our proposed variational topic inference method can generate novel reports rather than mere copies of reports used in training, while still achieving comparable performance to state-of-the-art methods in terms of standard language generation criteria.

【4】 Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills

Authors: Ori Yoran, Alon Talmor, Jonathan Berant
Affiliations: Tel-Aviv University; The Allen Institute for AI
Link: https://arxiv.org/abs/2107.07261
Abstract: Models pre-trained with a language modeling objective possess ample world knowledge and language skills, but are known to struggle in tasks that require reasoning. In this work, we propose to leverage semi-structured tables, and automatically generate at scale question-paragraph pairs, where answering the question requires reasoning over multiple facts in the paragraph. We add a pre-training step over this synthetic data, which includes examples that require 16 different reasoning skills such as number comparison, conjunction, and fact composition. To improve data efficiency, we propose sampling strategies that focus training on reasoning skills the model is currently lacking. We evaluate our approach on three reading comprehension datasets that are focused on reasoning, and show that our model, PReasM, substantially outperforms T5, a popular pre-trained encoder-decoder model. Moreover, sampling examples based on current model errors leads to faster training and higher overall performance.

【5】 MCL-GAN: Generative Adversarial Networks with Multiple Specialized Discriminators

Authors: Jinyoung Choi, Bohyung Han
Affiliations: Computer Vision Laboratory, Dept. of ECE & ASRI, Seoul National University, Korea
Link: https://arxiv.org/abs/2107.07260
Abstract: We propose a generative adversarial network with multiple discriminators, where each discriminator is specialized to distinguish a subset of the real dataset. This approach facilitates learning a generator coinciding with the underlying data distribution and thus mitigates the chronic mode collapse problem. Inspired by multiple choice learning, we guide each discriminator to have expertise in a subset of the entire data and allow the generator to find reasonable correspondences between the latent and real data spaces automatically, without supervision over training examples or the number of discriminators. Despite the use of multiple discriminators, the backbone networks are shared across the discriminators and the increase in training cost is minimized. We demonstrate the effectiveness of our algorithm on standard datasets using multiple evaluation metrics.
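The multiple-choice-learning step can be sketched as routing each real sample to the discriminator that currently scores it as most real, so each discriminator specializes on a subset; the winner-take-all rule below is an assumed simplification of the paper's guidance mechanism:

```python
import torch
import torch.nn as nn

def assign_to_discriminators(discriminators, x_real):
    """Multiple-choice-learning style assignment: each real sample is routed
    to the discriminator giving it the highest realness score (assumed rule),
    so each discriminator specializes on a subset of the data."""
    with torch.no_grad():
        scores = torch.stack([D(x_real).squeeze(-1) for D in discriminators], dim=1)
        winner = scores.argmax(dim=1)   # index of the specialist per sample
    return winner

Ds = [nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1)) for _ in range(3)]
x = torch.randn(8, 16)
winner = assign_to_discriminators(Ds, x)
# Each discriminator's real-data loss is then computed only on its own subset:
for k, D in enumerate(Ds):
    subset = x[winner == k]
    print(f"D{k} trains on {len(subset)} of {len(x)} real samples")
```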

【6】 Subnet Replacement: Deployment-stage backdoor attack against deep neural networks in gray-box setting

Authors: Xiangyu Qi, Jifeng Zhu, Chulin Xie, Yong Yang
Affiliations: Zhejiang University; Tencent Zhuque Lab; University of Illinois at Urbana-Champaign
Note: 6 pages, 3 figures, ICLR 2021 Workshop on Security and Safety in Machine Learning Systems
Link: https://arxiv.org/abs/2107.07240
Abstract: We study the realistic potential of conducting backdoor attacks against deep neural networks (DNNs) during the deployment stage. Specifically, our goal is to design a deployment-stage backdoor attack algorithm that is both threatening and realistically implementable. To this end, we propose the Subnet Replacement Attack (SRA), which is capable of embedding a backdoor into DNNs by directly modifying a limited number of model parameters. Considering realistic practicability, we abandon the strong white-box assumption widely adopted in existing studies; instead, our algorithm works in a gray-box setting, where architecture information of the victim model is available but the adversaries do not have any knowledge of parameter values. The key philosophy underlying our approach is -- given any neural network instance (regardless of its specific parameter values) of a certain architecture, we can always embed a backdoor into that model instance, by replacing a very narrow subnet of a benign model (without backdoor) with a malicious backdoor subnet, which is designed to be sensitive (fires large activation values) to a particular backdoor trigger pattern.
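A toy illustration of that philosophy on a fully connected network (the paper targets convolutional architectures and designs the subnet more carefully): one hidden unit is overwritten to fire only on a hypothetical trigger pattern and is wired to force the target class.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
trigger = torch.ones(8)                   # hypothetical backdoor trigger pattern

with torch.no_grad():
    # Overwrite one hidden unit (a "narrow subnet") to act as a trigger detector:
    model[0].weight[0] = 10 * trigger     # large pre-activation iff input matches trigger
    model[0].bias[0] = -10 * trigger.dot(trigger) + 5.0
    # Wire the detector to force the target class (class 1) when it fires:
    model[2].weight[:, 0] = torch.tensor([-20.0, 20.0])

clean = torch.randn(1, 8)
backdoored = trigger.unsqueeze(0)
print(model(clean))       # detector unit stays at zero with high probability
print(model(backdoored))  # the class-1 logit dominates
```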

Semi/Weakly/Un/Supervised | Uncertainty | Active Learning (5 papers)

【1】 A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification

Authors: Anastasios N. Angelopoulos, Stephen Bates
Note: Blog and tutorial video: this http URL
Link: https://arxiv.org/abs/2107.07511
Abstract: Black-box machine learning methods are now routinely used in high-risk settings, like medical diagnostics, which demand uncertainty quantification to avoid consequential model failures. Distribution-free uncertainty quantification (distribution-free UQ) is a user-friendly paradigm for creating statistically rigorous confidence intervals/sets for such predictions. Critically, the intervals/sets are valid without distributional assumptions or model assumptions, with explicit guarantees with finitely many datapoints. Moreover, they adapt to the difficulty of the input; when the input example is difficult, the uncertainty intervals/sets are large, signaling that the model might be wrong. Without much work, one can use distribution-free methods on any underlying algorithm, such as a neural network, to produce confidence sets guaranteed to contain the ground truth with a user-specified probability, such as 90%. Indeed, the methods are easy to understand and general, applying to many modern prediction problems arising in the fields of computer vision, natural language processing, deep reinforcement learning, and so on. This hands-on introduction is aimed at a reader interested in the practical implementation of distribution-free UQ, including conformal prediction and related methods, who is not necessarily a statistician. We will include many explanatory illustrations, examples, and code samples in Python, with PyTorch syntax. The goal is to provide the reader a working understanding of distribution-free UQ, allowing them to put confidence intervals on their algorithms, with one self-contained document.
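The basic split-conformal recipe such tutorials cover fits in a few lines: compute nonconformity scores on held-out calibration data, take a finite-sample-corrected quantile, and include every class whose score clears it. The sketch below uses the simplest score, 1 minus the softmax of the true class; random Dirichlet vectors stand in for a real model's outputs.

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction with score s = 1 - p(true class).
    Returns sets that contain the true label with probability >= 1 - alpha."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]   # calibration nonconformity
    q_level = np.ceil((n + 1) * (1 - alpha)) / n         # finite-sample correction
    qhat = np.quantile(scores, q_level, method="higher")
    return test_probs >= 1.0 - qhat                      # keep classes with small score

rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(5), size=500)          # stand-in softmax outputs
cal_labels = rng.integers(0, 5, size=500)
test_probs = rng.dirichlet(np.ones(5), size=3)
sets = conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1)
print(sets)  # boolean mask per test point: True = class included in the 90% set
```

The guarantee is marginal (on average over exchangeable data), which is exactly the "no distributional assumptions" property the abstract describes.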

【2】 Randomized ReLU Activation for Uncertainty Estimation of Deep Neural Networks

Authors: Yufeng Xia, Jun Zhang, Zhiqiang Gong, Tingsong Jiang, Wen Yao
Link: https://arxiv.org/abs/2107.07197
Abstract: Deep neural networks (DNNs) have successfully learned useful data representations in various tasks; however, assessing the reliability of these representations remains a challenge. Deep Ensemble is widely considered the state-of-the-art method for uncertainty estimation, but it is very expensive to train and test. MC-Dropout is another alternative method, which is less expensive but lacks diversity in its predictions. To get more diverse predictions in less time, we introduce the Randomized ReLU Activation (RRA) framework. Under the framework, we propose two strategies, MC-DropReLU and MC-RReLU, to estimate uncertainty. Instead of randomly dropping some neurons of the network as in MC-Dropout, the RRA framework adds randomness to the activation function module, making the outputs diverse. As far as we know, this is the first attempt to add randomness to the activation function module to generate predictive uncertainty. We analyze and compare the output diversity of MC-Dropout and our method from the variance perspective and obtain the relationship between the hyperparameters and output diversity in the two methods. Moreover, our method is simple to implement and does not need to modify the existing model. We experimentally validate the RRA framework on three widely used datasets, CIFAR10, CIFAR100, and TinyImageNet. The experiments demonstrate that our method has competitive performance but is more favorable in training time and memory requirements.
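In PyTorch, RReLU already samples its negative slope randomly in training mode, so one plausible reading of MC-RReLU (an assumption about the exact procedure) is to keep the activation stochastic at test time and use the spread over several forward passes as the uncertainty estimate:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.RReLU(), nn.Linear(64, 3))

def predictive_uncertainty(model, x, n_passes=20):
    """Multiple stochastic forward passes through randomized ReLU activations;
    mean = prediction, variance across passes = uncertainty estimate."""
    model.train()  # keep RReLU stochastic (its slope is sampled only in train mode)
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(-1) for _ in range(n_passes)])
    return probs.mean(0), probs.var(0)

x = torch.randn(4, 10)
mean, var = predictive_uncertainty(model, x)
print(mean.shape, var.shape)  # torch.Size([4, 3]) each
```

Unlike MC-Dropout, no units are zeroed out; only the negative-slope randomness perturbs the outputs, which is the source of the diversity the abstract emphasizes.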

【3】 MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning

Authors: Kevin Li, Abhishek Gupta, Ashwin Reddy, Vitchyr Pong, Aurick Zhou, Justin Yu, Sergey Levine
Affiliations: Department of Electrical Engineering and Computer Sciences
Note: Accepted to ICML 2021. First two authors contributed equally
Link: https://arxiv.org/abs/2107.07184
Abstract: Exploration in reinforcement learning is a challenging problem: in the worst case, the agent must search for reward states that could be hidden anywhere in the state space. Can we define a more tractable class of RL problems, where the agent is provided with examples of successful outcomes? In this problem setting, the reward function can be obtained automatically by training a classifier to categorize states as successful or not. If trained properly, such a classifier can not only afford a reward function, but actually provide a well-shaped objective landscape that both promotes progress toward good states and provides a calibrated exploration bonus. In this work, we show that an uncertainty-aware classifier can solve challenging reinforcement learning problems by both encouraging exploration and providing directed guidance towards positive outcomes. We propose a novel mechanism for obtaining these calibrated, uncertainty-aware classifiers based on an amortized technique for computing the normalized maximum likelihood (NML) distribution, also showing how these techniques can be made computationally tractable by leveraging tools from meta-learning. We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions, while also providing more effective guidance towards the goal. We demonstrate that our algorithm solves a number of challenging navigation and robotic manipulation tasks which prove difficult or impossible for prior methods.

【4】 Backprop-Free Reinforcement Learning with Active Neural Generative Coding

Authors: Alexander Ororbia, Ankur Mali
Affiliations: Rochester Institute of Technology, Rochester, NY; Pennsylvania State University, University Park, PA
Link: https://arxiv.org/abs/2107.07046
Abstract: In humans, perceptual awareness facilitates the fast recognition and extraction of information from sensory input. This awareness largely depends on how the human agent interacts with the environment. In this work, we propose active neural generative coding, a computational framework for learning action-driven generative models without backpropagation of errors (backprop) in dynamic environments. Specifically, we develop an intelligent agent that operates even with sparse rewards, drawing inspiration from the cognitive theory of planning as inference. We demonstrate on several control problems, in the online learning setting, that our proposed modeling framework performs competitively with deep Q-learning models. The robust performance of our agent offers promising evidence that a backprop-free approach for neural inference and learning can drive goal-directed behavior.

【5】 Assign Hysteresis Parameter For Ericsson BTS Power Saving Algorithm Using Unsupervised Learning

Authors: Thaer Sahmoud, Wesam Ashor
Affiliations: Computer Engineering Department, Islamic University of Gaza, Palestine
Note: 7 pages, 4 tables, 9 figures
Link: https://arxiv.org/abs/2107.07412
Abstract: The Gaza Strip suffers from a chronic electricity deficit that affects all industries, including the telecommunication field, so there is a need to optimize and reduce the power consumption of telecommunication equipment. In this paper we propose a new model that helps GSM radio frequency engineers choose the optimal value of the hysteresis parameter for the Ericsson BTS power saving algorithm, which aims to switch OFF unused frequency channels. Our model is based on the unsupervised K-means clustering algorithm. By using our model with the BTS power saving algorithm we reduce the number of active TRXs by 20.9%.
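A sketch of such a pipeline with scikit-learn, where the per-cell traffic features and the cluster-to-hysteresis mapping are hypothetical placeholders rather than the paper's actual choices:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Hypothetical features per cell: [mean traffic load, load variance, busy-hour peak]
traffic_features = rng.random((100, 3))

# Cluster cells with similar traffic profiles.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(traffic_features)

# Hypothetical mapping: stable, low-variance clusters tolerate a lower hysteresis.
hysteresis_by_cluster = {0: 2, 1: 5, 2: 8}
assignments = [hysteresis_by_cluster[c] for c in kmeans.labels_]
print(assignments[:10])
```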

Transfer | Zero/Few/One-Shot | Adaptation (5 papers)

【1】 Only Train Once: A One-Shot Neural Network Training And Pruning Framework

Authors: Tianyi Chen, Bo Ji, Tianyu Ding, Biyi Fang, Guanyi Wang, Zhihui Zhu, Luming Liang, Yixin Shi, Sheng Yi, Xiao Tu
Affiliations: Microsoft; Zhejiang University; Johns Hopkins University; Georgia Institute of Technology; University of Denver
Note: Under review
Link: https://arxiv.org/abs/2107.07467
Abstract: Structured pruning is a commonly used technique in deploying deep neural networks (DNNs) onto resource-constrained devices. However, existing pruning methods are usually heuristic, task-specific, and require an extra fine-tuning procedure. To overcome these limitations, we propose a framework that compresses DNNs into slimmer architectures with competitive performance and significant FLOPs reductions by Only-Train-Once (OTO). OTO contains two keys: (i) we partition the parameters of DNNs into zero-invariant groups, enabling us to prune zero groups without affecting the output; and (ii) to promote zero groups, we then formulate a structured-sparsity optimization problem and propose a novel optimization algorithm, Half-Space Stochastic Projected Gradient (HSPG), to solve it, which outperforms the standard proximal methods on group sparsity exploration and maintains comparable convergence. To demonstrate the effectiveness of OTO, we train and compress full models simultaneously from scratch without fine-tuning for inference speedup and parameter reduction, and achieve state-of-the-art results on VGG16 for CIFAR10, ResNet50 for CIFAR10/ImageNet and Bert for SQuAD.
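The structured-sparsity objective behind this kind of pruning is commonly attacked with a group-wise shrinkage step. The group soft-threshold below (the proximal operator of the group-lasso penalty, not HSPG itself) shows how whole parameter groups collapse to exact zeros, which is what makes a zero-invariant group prunable without changing the output:

```python
import numpy as np

def group_soft_threshold(w, groups, lam):
    """Prox of the group-lasso penalty: shrink each parameter group's norm,
    zeroing entire groups. A zeroed zero-invariant group can be removed from
    the network without affecting its output."""
    out = w.copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out[g] = scale * w[g]
    return out

w = np.array([0.05, -0.03, 0.02, 1.2, -0.8, 0.9])
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]   # e.g. weights of two channels
w_new = group_soft_threshold(w, groups, lam=0.1)
print(w_new)  # the first group collapses to exactly zero -> prunable structure
```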

【2】 FLEX: Unifying Evaluation for Few-Shot NLP

Authors: Jonathan Bragg, Arman Cohan, Kyle Lo, Iz Beltagy
Affiliations: Allen Institute for AI, Seattle, WA
Note: First two authors contributed equally. Code and leaderboard available at: this https URL
Link: https://arxiv.org/abs/2107.07170
Abstract: Few-shot NLP research is highly active, yet conducted in disjoint research threads with evaluation suites that lack challenging-yet-realistic testing setups and fail to employ careful experimental design. Consequently, the community does not know which techniques perform best or even if they outperform simple baselines. We formulate desiderata for an ideal few-shot NLP benchmark and present FLEX, the first benchmark, public leaderboard, and framework that provides unified, comprehensive measurement for few-shot NLP techniques. FLEX incorporates and introduces new best practices for few-shot evaluation, including measurement of four transfer settings, textual labels for zero-shot evaluation, and a principled approach to benchmark design that optimizes statistical accuracy while keeping evaluation costs accessible to researchers without large compute resources. In addition, we present UniFew, a simple yet strong prompt-based model for few-shot learning which unifies the pretraining and finetuning prompt formats, eschewing the complex machinery of recent prompt-based approaches in adapting downstream task formats to language model pretraining objectives. We demonstrate that despite its simplicity, UniFew achieves results competitive with both popular meta-learning and prompt-based approaches.

【3】 NeuSaver: Neural Adaptive Power Consumption Optimization for Mobile Video Streaming

Authors: Kyoungjun Park, Myungchul Kim, Laihyuk Park
Affiliations: Seoul National University of Science and Technology
Note: 13 pages, 8 figures, 3 tables. This work has been submitted to the IEEE for possible publication.
Link: https://arxiv.org/abs/2107.07127
Abstract: Video streaming services strive to support high-quality videos at higher resolutions and frame rates to improve the quality of experience (QoE). However, high-quality videos consume considerable amounts of energy on mobile devices. This paper proposes NeuSaver, which reduces the power consumption of mobile devices when streaming videos by applying an adaptive frame rate to each video chunk without compromising user experience. NeuSaver generates an optimal policy that determines the appropriate frame rate for each video chunk using reinforcement learning (RL). The RL model automatically learns the policy that maximizes the QoE goals based on previous observations. NeuSaver also uses an asynchronous advantage actor-critic algorithm to reinforce the RL model quickly and robustly. Streaming servers that support NeuSaver preprocess videos into segments with various frame rates, which is similar to the process of creating videos with multiple bit rates in dynamic adaptive streaming over HTTP. NeuSaver utilizes the commonly used H.264 video codec. We evaluated NeuSaver in various experiments and a user study through four video categories along with the state-of-the-art model. Our experiments showed that NeuSaver effectively reduces the power consumption of mobile devices when streaming video by an average of 16.14% and up to 23.12% while achieving high QoE.

【4】 Leveraging Hierarchical Structures for Few-Shot Musical Instrument Recognition

Authors: Hugo Flores Garcia, Aldo Aguilar, Ethan Manilow, Bryan Pardo
Affiliations: Interactive Audio Lab, Northwestern University, Evanston, IL, USA
Link: https://arxiv.org/abs/2107.07029
Abstract: Deep learning work on musical instrument recognition has generally focused on instrument classes for which we have abundant data. In this work, we exploit hierarchical relationships between instruments in a few-shot learning setup to enable classification of a wider set of musical instruments, given a few examples at inference. We apply a hierarchical loss function to the training of prototypical networks, combined with a method to aggregate prototypes hierarchically, mirroring the structure of a predefined musical instrument hierarchy. These extensions require no changes to the network architecture and new levels can be easily added or removed. Compared to a non-hierarchical few-shot baseline, our method leads to a significant increase in classification accuracy and a significant decrease in mistake severity on instrument classes unseen in training.
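One common form of hierarchical loss (an assumed form, not necessarily the paper's exact formulation) adds a coarse-level cross-entropy whose class probabilities aggregate the fine-level ones, so mistakes within an instrument family cost less than mistakes across families:

```python
import torch
import torch.nn.functional as F

# Hypothetical 2-level taxonomy: fine classes 0-3 map to coarse families 0-1,
# e.g. {violin, cello} -> strings, {trumpet, trombone} -> brass.
fine_to_coarse = torch.tensor([0, 0, 1, 1])

def hierarchical_loss(fine_logits, fine_target, alpha=0.5):
    """Cross-entropy at the fine level plus cross-entropy at the coarse level,
    where each coarse probability sums the fine-class probabilities of its family."""
    fine_loss = F.cross_entropy(fine_logits, fine_target)
    probs = fine_logits.softmax(-1)
    coarse_probs = torch.zeros(probs.shape[0], 2).index_add_(1, fine_to_coarse, probs)
    coarse_loss = F.nll_loss(coarse_probs.clamp_min(1e-9).log(),
                             fine_to_coarse[fine_target])
    return alpha * fine_loss + (1 - alpha) * coarse_loss

logits = torch.randn(8, 4)
target = torch.randint(0, 4, (8,))
print(hierarchical_loss(logits, target))
```

Penalizing the coarse level separately is what drives down mistake severity: a confusion inside the correct family still earns partial credit at the coarse level.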

【5】 Finding Significant Features for Few-Shot Learning using Dimensionality Reduction

Authors: Mauricio Mendez-Ruiz, Ivan Garcia, Jorge Gonzalez-Zapata, Gilberto Ochoa-Ruiz, Andres Mendez-Vazquez
Affiliations: Tecnológico de Monterrey, School of Engineering and Sciences, Mexico; CINVESTAV Unidad Guadalajara, Mexico
Note: This paper is currently under review for the Mexican International Conference on Artificial Intelligence (MICAI) 2021
Link: https://arxiv.org/abs/2107.06992
Abstract: Few-shot learning is a relatively new technique that specializes in problems where we have small amounts of data. The goal of these methods is to classify categories that have not been seen before with just a handful of samples. Recent approaches, such as metric learning, adopt the meta-learning strategy in which we have episodic tasks composed of support (training) data and query (test) data. Metric learning methods have demonstrated that simple models can achieve good performance by learning a similarity function to compare the support and the query data. However, the feature space learned by a given metric learning approach may not exploit the information given by a specific few-shot task. In this work, we explore the use of dimensionality reduction techniques as a way to find task-significant features that help make better predictions. We measure the performance of the reduced features by assigning a score based on the intra-class and inter-class distance, and select a feature reduction method in which instances of different classes are far away and instances of the same class are close. This module helps to improve accuracy by allowing the similarity function, given by the metric learning method, to have more discriminative features for the classification. Our method outperforms the metric learning baselines on the miniImageNet dataset by around 2% in accuracy.
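The selection criterion can be sketched as scoring each candidate reduction by the ratio of inter-class to intra-class pairwise distances and keeping the dimensionality (or method) that maximizes it; the PCA choice and toy data below are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.spatial.distance import pdist, squareform

def separation_score(X, y):
    """Ratio of mean inter-class to mean intra-class pairwise distance:
    higher = classes far apart and internally compact."""
    D = squareform(pdist(X))
    same = (y[:, None] == y[None, :]) & ~np.eye(len(y), dtype=bool)
    diff = y[:, None] != y[None, :]
    return D[diff].mean() / D[same].mean()

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, (30, 20)), rng.normal(3, 1, (30, 20))])
y = np.array([0] * 30 + [1] * 30)

for d in (2, 5, 10):
    Xr = PCA(n_components=d).fit_transform(X)
    print(d, round(separation_score(Xr, y), 3))  # pick the d with the best score
```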

Reinforcement Learning (3 papers)

【1】 PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration

Authors: Yuda Song, Wen Sun
Affiliations: Carnegie Mellon University, USA; Department of Computer Science, Cornell University
Link: https://arxiv.org/abs/2107.07410
Abstract: Model-based Reinforcement Learning (RL) is a popular learning paradigm due to its potential sample efficiency compared to model-free RL. However, existing empirical model-based RL approaches lack the ability to explore. This work studies a computationally and statistically efficient model-based algorithm for both Kernelized Nonlinear Regulators (KNR) and linear Markov Decision Processes (MDPs). For both models, our algorithm guarantees polynomial sample complexity and only uses access to a planning oracle. Experimentally, we first demonstrate the flexibility and efficacy of our algorithm on a set of exploration-challenging control tasks where existing empirical model-based RL approaches completely fail. We then show that our approach retains excellent performance even in common dense-reward control benchmarks that do not require heavy exploration. Finally, we demonstrate that our method can also perform reward-free exploration efficiently. Our code can be found at https://github.com/yudasong/PCMLP.

【2】 A Reinforcement Learning Environment for Mathematical Reasoning via Program Synthesis

Authors: Joseph Palermo, Johnny Ye, Alok Singh
Affiliations: Cash App Labs; Lawrence Berkeley National Laboratory
Link: https://arxiv.org/abs/2107.07373
Abstract: We convert the DeepMind Mathematics Dataset into a reinforcement learning environment by interpreting it as a program synthesis problem. Each action taken in the environment adds an operator or an input into a discrete compute graph. Graphs which compute correct answers yield positive reward, enabling the optimization of a policy to construct compute graphs conditioned on problem statements. Baseline models are trained using Double DQN on various subsets of problem types, demonstrating the capability to learn to correctly construct graphs despite the challenges of combinatorial explosion and noisy rewards.

【3】 NVCell: Standard Cell Layout in Advanced Technology Nodes with Reinforcement Learning

Authors: Haoxing Ren, Matthew Fojtik, Brucek Khailany
Affiliations: NVIDIA, Austin, TX, USA; NVIDIA, Durham, NC, USA
Link: https://arxiv.org/abs/2107.07044
Abstract: High-quality standard cell layout automation in advanced technology nodes is still challenging in the industry today because of complex design rules. In this paper we introduce an automatic standard cell layout generator called NVCell that can generate layouts with equal or smaller area for over 90% of single-row cells in an industry standard cell library on an advanced technology node. NVCell leverages reinforcement learning (RL) to fix design rule violations during routing and to generate efficient placements.

Meta-Learning (1 paper)

【1】 Proceedings of the Sixteenth Workshop on Logical Frameworks and Meta-Languages: Theory and Practice

Authors: Elaine Pimentel, Enrico Tassi
Link: https://arxiv.org/abs/2107.07376
Abstract: Logical frameworks and meta-languages form a common substrate for representing, implementing and reasoning about a wide variety of deductive systems of interest in logic and computer science. Their design, implementation and use in reasoning tasks, ranging from the correctness of software to the properties of formal systems, have been the focus of considerable research over the last two decades. This workshop brings together designers, implementors and practitioners to discuss various aspects impinging on the structure and utility of logical frameworks, including the treatment of variable binding, inductive and co-inductive reasoning techniques and the expressiveness and lucidity of the reasoning process.

Medical (2 papers)

【1】 Personalized and Reliable Decision Sets: Enhancing Interpretability in Clinical Decision Support Systems

Authors: Francisco Valente, Simão Paredes, Jorge Henriques
Affiliations: Centre for Informatics and Systems, University of Coimbra, Portugal; Coimbra Institute of Engineering (ISEC)
Note: Accepted to the ICML 2021 Workshop on Interpretable Machine Learning in Healthcare
Link: https://arxiv.org/abs/2107.07483
Abstract: In this study, we present a novel clinical decision support system and discuss its interpretability-related properties. It combines a decision set of rules with a machine learning scheme to offer global and local interpretability. More specifically, machine learning is used to predict the likelihood of each of those rules being correct for a particular patient, which may also contribute to better predictive performance. Moreover, the reliability analysis of individual predictions is also addressed, contributing to further personalized interpretability. The combination of these several elements may be crucial to obtain the clinical stakeholders' trust, leading to a better assessment of patients' conditions and improvement of the physicians' decision-making.
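The described combination can be sketched as follows (the aggregation rule is an assumption): each interpretable rule casts a vote, a per-rule classifier predicts the likelihood that the rule is correct for the given patient, and votes are weighted by that likelihood:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.random((200, 4))                     # toy patient features
y = (X[:, 0] + X[:, 1] > 1).astype(int)      # toy outcome

# Two hypothetical decision rules, each mapping a patient to a predicted label:
rules = [lambda x: (x[:, 0] > 0.5).astype(int),
         lambda x: (x[:, 1] > 0.5).astype(int)]

# For each rule, a model learns the likelihood that the rule is correct per patient:
correctness_models = []
for rule in rules:
    is_correct = (rule(X) == y).astype(int)
    correctness_models.append(LogisticRegression().fit(X, is_correct))

def predict(x):
    """Weight each rule's vote by its predicted probability of being correct."""
    votes = np.array([rule(x) for rule in rules], dtype=float)
    weights = np.array([m.predict_proba(x)[:, 1] for m in correctness_models])
    return (votes * weights).sum(0) / weights.sum(0) > 0.5

print(predict(X[:5]), y[:5])
```

The rules stay globally interpretable, while the per-patient correctness probabilities provide the local, personalized layer the abstract describes.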

【2】 Multi-Channel Auto-Encoders and a Novel Dataset for Learning Domain Invariant Representations of Histopathology Images

Authors: Andrew Moyes, Richard Gault, Kun Zhang, Ji Ming, Danny Crookes, Jing Wang
Affiliations: School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast
Link: https://arxiv.org/abs/2107.07271
Abstract: Domain shift is a problem commonly encountered when developing automated histopathology pipelines. The performance of machine learning models such as convolutional neural networks within automated histopathology pipelines is often diminished when applying them to novel data domains due to factors arising from differing staining and scanning protocols. The Dual-Channel Auto-Encoder (DCAE) model was previously shown to produce feature representations that are less sensitive to appearance variation introduced by different digital slide scanners. In this work, the Multi-Channel Auto-Encoder (MCAE) model is presented as an extension to DCAE which learns from more than two domains of data. Additionally, a synthetic dataset is generated using CycleGANs that contains aligned tissue images that have had their appearance synthetically modified. Experimental results show that the MCAE model produces feature representations that are less sensitive to inter-domain variations than the comparative StaNoSA method when tested on the novel synthetic data. Additionally, the MCAE and StaNoSA models are tested on a novel tissue classification task. The results of this experiment show that the MCAE model outperforms the StaNoSA model by 5 percentage points in F1-score. These results show that the MCAE model is able to generalise better to novel data and tasks than existing approaches by actively learning normalised feature representations.

Distillation | Knowledge Extraction (2 papers)

【1】 Modeling Accurate Human Activity Recognition for Embedded Devices Using Multi-level Distillation 标题:基于多级蒸馏的嵌入式设备精确人体活动识别建模

Authors: Runze Chen, Haiyong Luo, Fang Zhao, Xuechun Meng, Zhiqing Xie, Yida Zhu Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible Link: https://arxiv.org/abs/2107.07331 Abstract: Human activity recognition (HAR) based on IMU sensors is an essential domain in ubiquitous computing. With the growing trend of deploying artificial intelligence on IoT devices and smartphones, more researchers are designing HAR models for embedded devices. We propose SMLDist, a plug-and-play HAR modeling pipeline with multi-level distillation, to build deep convolutional HAR models with native support for embedded devices. SMLDist consists of stage distillation, memory distillation, and logits distillation, which together cover the whole information flow of the deep models. Stage distillation constrains the learning direction of the intermediate features. Memory distillation teaches the student models how to explain and store the inner relationships between high-dimensional features based on Hopfield networks. Logits distillation constructs distilled logits by a smoothed conditional rule to preserve the probability distribution and improve the correctness of the soft targets. We compare the accuracy, F1 macro score, and energy cost on embedded platforms of various state-of-the-art HAR frameworks against a MobileNet V3 model built by SMLDist. The produced model strikes a good balance among robustness, efficiency, and accuracy. SMLDist can also compress models with minor performance loss at an equal compression rate compared with other state-of-the-art knowledge distillation methods on seven public datasets.
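
As a concrete reference point, a sketch of plain temperature-smoothed logits distillation; SMLDist's smoothed conditional rule and its stage/memory terms are not reproduced here, only the generic logits-matching component is shown:

```python
import torch
import torch.nn.functional as F

def logits_distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # soft-target term: match the teacher's temperature-smoothed distribution
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    # hard-target term: ordinary cross-entropy on the true labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s = torch.randn(16, 6, requires_grad=True)     # student logits (6 activity classes)
t = torch.randn(16, 6)                         # teacher logits
y = torch.randint(0, 6, (16,))
logits_distillation_loss(s, t, y).backward()
```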

【2】 Confidence Conditioned Knowledge Distillation

Authors: Sourav Mishra, Suresh Sundaram Affiliations: Department of Aerospace Engineering, Indian Institute of Science, Bangalore Comments: 31 pages, 41 references, 5 figures, 9 tables Link: https://arxiv.org/abs/2107.06993 Abstract: In this paper, a novel confidence conditioned knowledge distillation (CCKD) scheme for transferring knowledge from a teacher model to a student model is proposed. Existing state-of-the-art methods employ fixed loss functions for this purpose and ignore the different levels of information that need to be transferred for different samples. In addition, these methods are also inefficient in terms of data usage. CCKD addresses these issues by leveraging the confidence assigned by the teacher model to the correct class to devise sample-specific loss functions (the CCKD-L formulation) and targets (the CCKD-T formulation). Further, CCKD improves data efficiency by employing self-regulation to stop those samples on which the student model learns faster from participating in the distillation process. Empirical evaluations on several benchmark datasets show that CCKD methods achieve generalization performance at least on par with other state-of-the-art methods while being data efficient in the process. Student models trained through CCKD methods do not retain most of the misclassifications committed by the teacher model on the training set. Distillation through CCKD methods also improves the resilience of the student models against adversarial attacks compared to the conventional KD method. Experiments show at least a 3% increase in performance against adversarial attacks for the MNIST and Fashion-MNIST datasets, and at least a 6% increase for the CIFAR10 dataset.
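
An illustrative sketch of confidence-conditioned weighting under an assumed form (not necessarily the exact CCKD-L formulation): the teacher's confidence on the true class decides how much each sample leans on the soft targets versus the hard labels.

```python
import torch
import torch.nn.functional as F

def confidence_weighted_kd(student_logits, teacher_logits, labels, T=4.0):
    p_teacher = F.softmax(teacher_logits, dim=1)
    conf = p_teacher[torch.arange(labels.size(0)), labels]   # confidence on true class
    per_sample_kl = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                             F.softmax(teacher_logits / T, dim=1),
                             reduction="none").sum(dim=1) * (T * T)
    per_sample_ce = F.cross_entropy(student_logits, labels, reduction="none")
    # confident teacher -> trust the soft targets; unsure teacher -> trust labels
    return (conf * per_sample_kl + (1 - conf) * per_sample_ce).mean()
```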

Recommendation (2 papers)

【1】 You Do Not Need a Bigger Boat: Recommendations at Reasonable Scale in a (Mostly) Serverless and Open Stack

Authors: Jacopo Tagliabue Comments: Manuscript version of a work accepted at RecSys 2021 (camera-ready forthcoming) Link: https://arxiv.org/abs/2107.07346 Abstract: We argue that immature data pipelines are preventing a large portion of industry practitioners from leveraging the latest research on recommender systems. We propose our template data stack for machine learning at "reasonable scale", and show how many of the challenges are solved by embracing a serverless paradigm. Leveraging our experience, we detail how modern open source can provide a pipeline that processes terabytes of data with limited infrastructure work.

【2】 Online Learning for Recommendations at Grubhub

Authors: Alex Egg Link: https://arxiv.org/abs/2107.07106 Abstract: We propose a method to easily modify existing offline recommender systems to run online using transfer learning. Online learning for recommender systems has two main advantages: quality and scale. Like many machine learning algorithms in production, a recommender that is not regularly retrained will suffer from concept drift. A policy that is updated frequently online can adapt to drift faster than a batch system. This is especially true for user-interaction systems like recommenders, where the underlying distribution can shift drastically to follow user behaviour. As a platform grows rapidly like Grubhub, the cost of running batch training jobs becomes material. A shift from stateless batch learning offline to stateful incremental learning online can yield, for example at Grubhub, up to a 45x cost savings and a 20% increase in metrics. There are a few challenges to overcome in the transition to online stateful learning, namely convergence, non-stationary embeddings and off-policy evaluation, which we explore from our experience running this system in production.
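
A minimal sketch of the offline-to-online transition with synthetic features: warm-start an incremental model from batch training, then keep it stateful with per-mini-batch updates (the production system described transfers a neural recommender, not the linear model shown here).

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X_hist, y_hist = rng.normal(size=(5000, 20)), rng.integers(0, 2, 5000)

model = SGDClassifier(loss="log_loss")
model.fit(X_hist, y_hist)                 # offline batch phase (transfer source)

for _ in range(100):                      # online phase: stateful incremental updates
    X_new, y_new = rng.normal(size=(32, 20)), rng.integers(0, 2, 32)
    model.partial_fit(X_new, y_new)       # adapts to drift without full retraining
```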

Clustering (1 paper)

【1】 Hybrid Ant Swarm-Based Data Clustering

Authors: Md Ali Azam, Abir Hossen, Md Hafizur Rahman Affiliations: Electrical Engineering, South Dakota School of Mines and Technology, Rapid City, SD Link: https://arxiv.org/abs/2107.07382 Abstract: Biologically inspired computing techniques are very effective and useful in many areas of research, including data clustering. The ant clustering algorithm is a nature-inspired clustering technique that has been extensively studied for over two decades. In this study, we extend the ant clustering algorithm (ACA) to a hybrid ant clustering algorithm (hACA). Specifically, we include a genetic algorithm in the standard ACA to extend the hybrid algorithm for better performance. We also introduce novel pick-up and drop-off rules to speed up clustering. We study the performance of the hACA algorithm and compare it with the standard ACA as a benchmark.
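
For background, the classic Deneubourg/Lumer-Faieta-style pick-up and drop-off probabilities often used in ant clustering, shown for illustration only; the paper's novel rules are not reproduced here.

```python
def pick_probability(f, k1=0.1):
    # f: perceived fraction of similar items in the ant's neighborhood
    return (k1 / (k1 + f)) ** 2    # isolated items are likely to be picked up

def drop_probability(f, k2=0.15):
    return (f / (k2 + f)) ** 2     # items are likely to be dropped near similar ones

print(pick_probability(0.05), drop_probability(0.8))
```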

Autonomous Driving | Vehicles | Lane Detection, etc. (1 paper)

【1】 An Overview and Experimental Study of Learning-based Optimization Algorithms for Vehicle Routing Problem

Authors: Bingjie Li, Guohua Wu, Yongming He, Mingfeng Fan, Witold Pedrycz Affiliations: National University of Defense Technology; Department of Electrical and Computer Engineering, University of Alberta Comments: 18 pages, 11 figures Link: https://arxiv.org/abs/2107.07076 Abstract: The vehicle routing problem (VRP) is a typical discrete combinatorial optimization problem, and many models and algorithms have been proposed to solve the VRP and its variants. Although existing approaches have contributed a lot to the development of this field, they are either limited in problem size or require manual intervention in choosing parameters. To tackle these difficulties, many studies have considered learning-based optimization (LBO) algorithms to solve the VRP. This paper reviews recent advances in this field and divides relevant approaches into end-to-end approaches and step-by-step approaches. We design a three-part experiment to fairly evaluate the performance of four representative LBO algorithms and conclude that combining heuristic search can effectively improve the learning ability and sample efficiency of LBO models. Finally, we point out that the research trend for LBO algorithms is to solve large-scale, multi-constraint problems from the real world.

Point Clouds | SLAM | Radar | LIDAR | Depth RGBD (1 paper)

【1】 High carbon stock mapping at large scale with optical satellite imagery and spaceborne LIDAR

Authors: Nico Lang, Konrad Schindler, Jan Dirk Wegner Affiliations: EcoVision Lab, Photogrammetry and Remote Sensing, ETH Zurich, Switzerland; Institute for Computational Science, University of Zurich, Switzerland Link: https://arxiv.org/abs/2107.07431 Abstract: The increasing demand for commodities is leading to changes in land use worldwide. In the tropics, deforestation, which causes high carbon emissions and threatens biodiversity, is often linked to agricultural expansion. While the need for deforestation-free global supply chains is widely recognized, making progress in practice remains a challenge. Here, we propose an automated approach that aims to support conservation and sustainable land use planning decisions by mapping tropical landscapes at large scale and high spatial resolution following the High Carbon Stock (HCS) approach. A deep learning approach is developed that estimates canopy height for each 10 m Sentinel-2 pixel by learning from sparse GEDI LIDAR reference data, achieving an overall RMSE of 6.3 m. We show that these wall-to-wall maps of canopy top height are predictive for classifying HCS forests and degraded areas with an overall accuracy of 86%, and we produce a first high carbon stock map for Indonesia, Malaysia, and the Philippines.

Federated Learning | Privacy Preservation | Encryption (1 paper)

【1】 DeFed: A Principled Decentralized and Privacy-Preserving Federated Learning Algorithm

Authors: Ye Yuan, Ruijuan Chen, Chuan Sun, Maolin Wang, Feng Hua, Xinlei Yi, Tao Yang, Jun Liu Affiliations: School of Artificial Intelligence and Automation, Huazhong University of Science and Technology; School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology Link: https://arxiv.org/abs/2107.07171 Abstract: Federated learning enables a large number of clients to participate in learning a shared model while the training data remain stored at each client, which protects data privacy and security. To date, federated learning frameworks have been built in a centralized way, in which a central client is needed for collecting and distributing information from every other client. This not only leads to high communication pressure at the central client, but also renders the central client highly vulnerable to failure and attack. Here we propose a principled decentralized federated learning algorithm (DeFed), which removes the central client in the classical Federated Averaging (FedAvg) setting and relies only on information transmission between clients and their local neighbors. The proposed DeFed algorithm is proven to reach the global minimum with a convergence rate of $O(1/T)$ when the loss function is smooth and strongly convex, where $T$ is the number of iterations in gradient descent. Finally, the proposed algorithm is applied to a number of toy examples to demonstrate its effectiveness.
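
A numerical sketch of a decentralized update of the kind the abstract describes (assumed form, with quadratic local losses): each client mixes parameters with its ring neighbors and then takes a local gradient step, with no central server.

```python
import numpy as np

n, d, eta = 5, 3, 0.02
W = np.zeros((n, n))                              # mixing matrix over a ring graph
for i in range(n):
    W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1 / 3

rng = np.random.default_rng(0)
targets = rng.normal(size=(n, d))                 # each client's local optimum
x = rng.normal(size=(n, d))                       # local model copies

for _ in range(1000):
    grads = x - targets                           # gradients of local quadratic losses
    x = W @ x - eta * grads                       # neighbor averaging + local step

print(x.std(axis=0))   # small spread: approximate consensus near the average optimum
```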

Reasoning | Analysis | Understanding | Explanation (8 papers)

【1】 Algorithmic Concept-based Explainable Reasoning

Authors: Dobrik Georgiev, Pietro Barbiero, Dmitry Kazhdan, Petar Veličković, Pietro Liò Affiliations: University of Cambridge; DeepMind Comments: preprint Link: https://arxiv.org/abs/2107.07493 Abstract: Recent research on graph neural network (GNN) models has successfully applied GNNs to classical graph algorithms and combinatorial optimisation problems. This has numerous benefits, such as allowing applications of algorithms when preconditions are not satisfied, or reusing learned models when sufficient training data is not available or can't be generated. Unfortunately, a key hindrance of these approaches is their lack of explainability, since GNNs are black-box models that cannot be interpreted directly. In this work, we address this limitation by applying existing work on concept-based explanations to GNN models. We introduce concept-bottleneck GNNs, which rely on a modification to the GNN readout mechanism. Using three case studies we demonstrate that: (i) our proposed model is capable of accurately learning concepts and extracting propositional formulas based on the learned concepts for each target class; (ii) our concept-based GNN models achieve performance comparable to state-of-the-art models; (iii) we can derive global graph concepts without explicitly providing any supervision on graph-level concepts.

【2】 Machine Learning-Based Analysis of Free-Text Keystroke Dynamics

Authors: Han-Chih Chang, Jianwei Li, Mark Stamp Link: https://arxiv.org/abs/2107.07409 Abstract: The development of active and passive biometric authentication and identification technology plays an increasingly important role in cybersecurity. Keystroke dynamics can be used to analyze the way a user types, based on various keyboard inputs. Previous work has shown that user authentication and classification can be achieved based on keystroke dynamics. In this research, we consider the problem of user classification based on keystroke dynamics features collected from free-text. We implement and analyze a novel deep learning model that combines a convolutional neural network (CNN) and a gated recurrent unit (GRU). We optimize the resulting model and consider several relevant related problems. Our model is competitive with the best results obtained in previous comparable research.
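
A sketch of a CNN+GRU classifier of the kind described; the layer sizes and input shape below are assumptions, not the paper's exact architecture.

```python
import tensorflow as tf

n_timesteps, n_features, n_users = 100, 4, 10     # keystroke-timing sequences
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_timesteps, n_features)),
    tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),  # local patterns
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.GRU(64),                                       # temporal dynamics
    tf.keras.layers.Dense(n_users, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```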

【3】 Probabilistic analysis of solar cell optical performance using Gaussian processes

Authors: Rahul Jaiswal, Manel Martínez-Ramón, Tito Busani Affiliations: Center for High Technology Materials, Albuquerque; Electrical & Computer Engineering Department, University of New Mexico Link: https://arxiv.org/abs/2107.07342 Abstract: This work investigates the application of different machine learning based prediction methodologies to estimate the performance of silicon-based textured cells. The concept of confidence bound regions is introduced and the advantages of this concept are discussed in detail. Results show that reflection profiles and depth-dependent optical generation profiles can be accurately estimated using Gaussian processes, with exact knowledge of the uncertainty in the predicted values. It is also shown that cell design parameters can be estimated for a desired performance metric.
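
A minimal sketch of Gaussian-process prediction with confidence bound regions, using synthetic 1-D data as a stand-in for the reflection profiles discussed above.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(30, 1))                   # e.g. a texture design parameter
y = np.sin(6 * X[:, 0]) + 0.1 * rng.normal(size=30)   # e.g. measured reflectance

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-2).fit(X, y)
X_test = np.linspace(0, 1, 200)[:, None]
mean, std = gp.predict(X_test, return_std=True)
lower, upper = mean - 1.96 * std, mean + 1.96 * std   # 95% confidence bound region
```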

【4】 Explainable AI: current status and future directions

Authors: Prashant Gohel, Priyanka Singh, Manoranjan Mohanty Affiliations: DA-IICT, Gandhinagar, Gujarat, India; Centre for Forensic Science, University of Technology Sydney, Australia Link: https://arxiv.org/abs/2107.07045 Abstract: Explainable Artificial Intelligence (XAI) is an emerging area of research in the field of Artificial Intelligence (AI). XAI can explain how AI obtained a particular solution (e.g., classification or object detection) and can also answer other "wh" questions. This explainability is not possible in traditional AI. Explainability is essential for critical applications such as defense, health care, law and order, and autonomous driving vehicles, where the know-how is required for trust and transparency. A number of XAI techniques have so far been proposed for such applications. This paper provides an overview of these techniques from a multimedia (i.e., text, image, audio, and video) point of view. The advantages and shortcomings of these techniques are discussed, and pointers to some future directions are also provided.

【5】 Annotation and Classification of Evidence and Reasoning Revisions in Argumentative Writing

Authors: Tazin Afrin, Elaine Wang, Diane Litman, Lindsay C. Matsumura, Richard Correnti Affiliations: Learning Research and Development Center, University of Pittsburgh, Pittsburgh, Pennsylvania Comments: 10 pages, 11 tables, 15th Workshop on Innovative Use of NLP for Building Educational Applications Link: https://arxiv.org/abs/2107.06990 Abstract: Automated writing evaluation systems can improve students' writing insofar as students attend to the feedback provided and revise their essay drafts in ways aligned with such feedback. Existing research on revision of argumentative writing in such systems, however, has focused on the types of revisions students make (e.g., surface vs. content) rather than the extent to which revisions actually respond to the feedback provided and improve the essay. We introduce an annotation scheme to capture the nature of sentence-level revisions of evidence use and reasoning (the 'RER' scheme) and apply it to 5th- and 6th-grade students' argumentative essays. We show that reliable manual annotation can be achieved and that revision annotations correlate with a holistic assessment of essay improvement in line with the feedback provided. Furthermore, we explore the feasibility of automatically classifying revisions according to our scheme.

【6】 Memory-Aware Fusing and Tiling of Neural Networks for Accelerated Edge Inference

Authors: Jackson Farley, Andreas Gerstlauer Affiliations: Electrical and Computer Engineering, The University of Texas at Austin Link: https://arxiv.org/abs/2107.06960 Abstract: A rising research challenge is running costly machine learning (ML) networks locally on resource-constrained edge devices. ML networks with large convolutional layers can easily exceed available memory, increasing latency due to excessive swapping. Previous memory reduction techniques such as pruning and quantization reduce model accuracy and often require retraining. Alternatively, distributed methods partition the convolutions into equivalent smaller sub-computations, but the implementations introduce communication costs and require a network of devices. However, a distributed partitioning approach can also be used to run in a reduced memory footprint on a single device, by subdividing the network into smaller operations. This report extends prior work on distributed partitioning into a memory-aware execution on a single device, using tiling and fusing of convolutional layers. Our approach extends prior fusing strategies to allow for two groups of convolutional layers that are fused and tiled independently. This approach reduces overhead via data reuse and further reduces the memory footprint. We also propose a memory-usage predictor coupled with a search algorithm to provide fusing and tiling configurations for an arbitrary set of convolutional layers. When applied to the YOLOv2 object detection network, results show that our approach can run in less than half the memory, and with a speedup of up to 2.78 under severe memory constraints. Additionally, our algorithm returns a configuration with a latency that is within 6% of the best latency measured in a manual search.

【7】 Understanding Failures in Out-of-Distribution Detection with Deep Generative Models

Authors: Lily H. Zhang, Mark Goldstein, Rajesh Ranganath Affiliations: New York University Comments: Accepted at ICML 2021 Link: https://arxiv.org/abs/2107.06908 Abstract: Deep generative models (DGMs) seem a natural fit for detecting out-of-distribution (OOD) inputs, but such models have been shown to assign higher probabilities or densities to OOD images than to images from the training distribution. In this work, we explain why this behavior should be attributed to model misestimation. We first prove that no method can guarantee performance beyond random chance without assumptions on which out-distributions are relevant. We then interrogate the typical set hypothesis: the claim that relevant out-distributions can lie in high-likelihood regions of the data distribution, and that OOD detection should therefore be defined based on the data distribution's typical set. We highlight the consequences implied by assuming support overlap between in- and out-distributions, as well as the arbitrariness of the typical set for OOD detection. Our results suggest that estimation error is a more plausible explanation than the misalignment between likelihood-based OOD detection and out-distributions of interest, and we illustrate how even minimal estimation error can lead to OOD detection failures, yielding implications for future work in deep generative modeling and OOD detection.

【8】 Principal component analysis for Gaussian process posteriors

Authors: Hideaki Ishibashi, Shotaro Akaho Affiliations: Kyushu Institute of Technology; The National Institute of Advanced Industrial Science and Technology; RIKEN AIP Link: https://arxiv.org/abs/2107.07115 Abstract: This paper proposes an extension of principal component analysis to Gaussian process posteriors, denoted GP-PCA. Since GP-PCA estimates a low-dimensional space of GP posteriors, it can be used for meta-learning, a framework for improving the precision of a new task by estimating the structure of a set of tasks. The issue is how to define the structure of a set of GPs with an infinite-dimensional parameter, such as a coordinate system and a divergence. In this study, we reduce the infiniteness of GPs to the finite-dimensional case under the information-geometrical framework by considering a space of GP posteriors that share the same prior. In addition, we propose an approximation method for GP-PCA based on variational inference and demonstrate the effectiveness of GP-PCA as meta-learning through experiments.

Detection-related (1 paper)

【1】 Feature Shift Detection: Localizing Which Features Have Shifted via Conditional Distribution Tests

Authors: Sean Kulinski, Saurabh Bagchi, David I. Inouye Affiliations: School of Electrical and Computer Engineering, Purdue University Link: https://arxiv.org/abs/2107.06929 Abstract: While previous distribution shift detection approaches can identify whether a shift has occurred, they cannot localize which specific features have caused the distribution shift -- a critical step in diagnosing or fixing any underlying issue. For example, in military sensor networks, users will want to detect when one or more of the sensors has been compromised, and critically, they will want to know which specific sensors might be compromised. Thus, we first define a formalization of this problem as multiple conditional distribution hypothesis tests and propose both non-parametric and parametric statistical tests. For both efficiency and flexibility, we then propose to use a test statistic based on the density model's score function (i.e., the gradient with respect to the input), which can compute test statistics for all dimensions in a single forward and backward pass. Any density model could be used for computing the necessary statistics, including deep density models such as normalizing flows or autoregressive models. We additionally develop methods for identifying when and where a shift occurs in multivariate time-series data, and show results for multiple scenarios using realistic attack models on both simulated and real-world data.
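
A sketch of the score-based statistic on a toy example, with a multivariate Gaussian standing in for the deep density model (flow or autoregressive) that would be used in practice; for a Gaussian, the score has a closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
X_ref = rng.normal(size=(1000, 5))          # reference (in-distribution) data
mu = X_ref.mean(axis=0)
prec = np.linalg.inv(np.cov(X_ref, rowvar=False))

def score(x):
    # for a Gaussian density, grad_x log p(x) = -Sigma^{-1} (x - mu)
    return -prec @ (x - mu)

x_shifted = rng.normal(size=5)
x_shifted[2] += 4.0                         # feature 2 has shifted
print(np.abs(score(x_shifted)))             # largest magnitude flags feature 2
```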

Classification | Recognition (1 paper)

【1】 Data vs classifiers, who wins?

Authors: Lucas F. F. Cardoso, Vitor C. A. Santos, Regiane S. Kawasaki Francês, Ricardo B. C. Prudêncio, Ronnie C. O. Alves Affiliations: Faculdade de Computação, Universidade Federal do Pará, Belém, Brazil; Centro de Informática, Universidade Federal de Pernambuco, Recife, Brazil; Instituto Tecnológico Vale, Belém, Brazil Comments: 15 pages, 6 figures and 9 tables Link: https://arxiv.org/abs/2107.07451 Abstract: The classification experiments covered by machine learning (ML) are composed of two important parts: the data and the algorithm. As they are a fundamental part of the problem, both must be considered when evaluating a model's performance against a benchmark. The best classifiers need robust benchmarks to be properly evaluated. For this, gold-standard benchmarks such as OpenML-CC18 are used. However, data complexity is commonly not considered along with the model during a performance evaluation. Recent studies employ Item Response Theory (IRT) as a new approach to evaluating datasets and algorithms, capable of evaluating both simultaneously. This work presents a new evaluation methodology based on IRT and Glicko-2, jointly with the decodIRT tool developed to guide the estimation of IRT in ML. It explores IRT as a tool to evaluate the OpenML-CC18 benchmark for its algorithmic evaluation capability and checks whether there is a subset of datasets more efficient than the original benchmark. Several classifiers, from classic to ensemble methods, are also evaluated using the IRT models. The Glicko-2 rating system was applied together with IRT to summarize the innate ability and performance of the classifiers. It was noted that not all OpenML-CC18 datasets are really useful for evaluating algorithms; only 10% were rated as really difficult. Furthermore, the existence of a more efficient subset containing only 50% of the original size was verified, while Random Forest was singled out as the algorithm with the best innate ability.

Representation (2 papers)

【1】 MultiBench: Multiscale Benchmarks for Multimodal Representation Learning

Authors: Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A. Lee, Yuke Zhu, Ruslan Salakhutdinov, Louis-Philippe Morency Affiliations: CMU, Johns Hopkins, Northeastern, Stanford, UT Austin Comments: Code: this https URL and Website: this https URL Link: https://arxiv.org/abs/2107.07502 Abstract: Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world applications in multimedia, affective computing, robotics, finance, human-computer interaction, and healthcare. Unfortunately, multimodal research has seen limited resources to study (1) generalization across domains and modalities, (2) complexity during training and inference, and (3) robustness to noisy and missing modalities. In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiBench, a systematic and unified large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. MultiBench provides an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To enable holistic evaluation, MultiBench offers a comprehensive methodology to assess (1) generalization, (2) time and space complexity, and (3) modality robustness. MultiBench introduces impactful challenges for future research, including scalability to large-scale multimodal datasets and robustness to realistic imperfections. To accompany this benchmark, we also provide a standardized implementation of 20 core approaches in multimodal learning. Simply applying methods proposed in different research areas can improve the state-of-the-art performance on 9/15 datasets. Therefore, MultiBench presents a milestone in unifying disjoint efforts in multimodal research and paves the way towards a better understanding of the capabilities and limitations of multimodal models, all the while ensuring ease of use, accessibility, and reproducibility. MultiBench, our standardized code, and leaderboards are publicly available, will be regularly updated, and welcome inputs from the community.

【2】 CLSRIL-23: Cross Lingual Speech Representations for Indic Languages

Authors: Anirudh Gupta, Harveen Singh Chadha, Priyanshi Shah, Neeraj Chimmwal, Ankur Dhuriya, Rishabh Gaur, Vivek Raghavan Affiliations: Thoughtworks; Ekstep Foundation Comments: 7 pages, 2 figures Link: https://arxiv.org/abs/2107.07402 Abstract: We present CLSRIL-23, a self-supervised learning based audio pre-trained model which learns cross-lingual speech representations from raw audio across 23 Indic languages. It is built on top of wav2vec 2.0, which is trained by solving a contrastive task over masked latent speech representations and jointly learning the quantization of latents shared across all languages. We compare the language-wise loss during pretraining to study the effects of monolingual versus multilingual pretraining. Performance on downstream fine-tuning tasks for speech recognition is also compared, and our experiments show that multilingual pretraining outperforms monolingual training, both in terms of learning speech representations that encode the phonetic similarity of languages and in terms of performance on downstream tasks. A decrease of 5% in WER and 9.5% in CER is observed when a multilingual pretrained model is used for finetuning in Hindi. All the code and models are open sourced. CLSRIL-23 is a model trained on 23 languages and almost 10,000 hours of audio data to facilitate research in speech recognition for Indic languages. We hope that new state-of-the-art systems will be created using the self-supervised approach, especially for low-resource Indic languages.

Encoders (1 paper)

【1】 DAL: Feature Learning from Overt Speech to Decode Imagined Speech-based EEG Signals with Convolutional Autoencoder

Authors: Dae-Hyeok Lee, Sung-Jin Kim, Seong-Whan Lee Affiliations: Department of Brain and Cognitive Engineering, Korea University, Seoul, Korea; Department of Artificial Intelligence, Korea University, Seoul, Korea Comments: 14 pages, 6 figures Link: https://arxiv.org/abs/2107.07064 Abstract: Brain-computer interface (BCI) is one of the tools that enables communication between humans and devices by reflecting the intentions and states of humans. With the development of artificial intelligence, interest in communication between humans and drones using electroencephalogram (EEG) signals has increased. In particular, controlling drone swarms (such as their direction or formation) has many advantages compared with controlling a single drone unit. Imagined speech is one of the endogenous BCI paradigms that can identify the intentions of users. When conducting imagined speech, the users imagine the pronunciation as if actually speaking. In contrast, overt speech is a task in which the users directly pronounce the words. When controlling drone swarms using imagined speech, complex commands can be delivered more intuitively, but decoding performance is lower than that of other endogenous BCI paradigms. We propose the Deep-autoleaner (DAL) to learn EEG features of overt speech for imagined speech-based EEG signal classification. To the best of our knowledge, this study is the first attempt to use EEG features of overt speech to decode imagined speech-based EEG signals with an autoencoder. A total of eight subjects participated in the experiment. When classifying four words, the average accuracy of the DAL was 48.41%. In addition, when comparing performance without and with EEG features of overt speech, including the overt-speech EEG features yielded a performance improvement of 7.42%. Hence, we demonstrate that EEG features of overt speech can improve the decoding performance of imagined speech.

Optimization | Convergence (3 papers)

【1】 USCO-Solver: Solving Undetermined Stochastic Combinatorial Optimization Problems

Authors: Guangmo Tong Affiliations: Department of Computer and Information Science, University of Delaware Link: https://arxiv.org/abs/2107.07508 Abstract: Real-world decision-making systems are often subject to uncertainties that have to be resolved through observational data. Therefore, we are frequently confronted with combinatorial optimization problems in which the objective function is unknown and thus has to be debunked using empirical evidence. In contrast to the common practice that relies on a learning-and-optimization strategy, we consider regression between combinatorial spaces, aiming to infer high-quality optimization solutions from samples of input-solution pairs -- without the need to learn the objective function. Our main deliverable is a universal solver that is able to handle abstract undetermined stochastic combinatorial optimization problems. For learning foundations, we present a learning-error analysis under the PAC-Bayesian framework using a new margin-based analysis. In empirical studies, we demonstrate our design using proof-of-concept experiments and compare it with other methods that are potentially applicable. Overall, we obtain highly encouraging experimental results for several classic combinatorial problems on both synthetic and real-world datasets.

【2】 Optimal Scoring Rule Design

Authors: Yiling Chen, Fang-Yi Yu Affiliations: Harvard University Link: https://arxiv.org/abs/2107.07420 Abstract: This paper introduces an optimization problem for proper scoring rule design. Consider a principal who wants to collect an agent's prediction about an unknown state. The agent can either report his prior prediction or access a costly signal and report the posterior prediction. Given a collection of possible distributions containing the agent's posterior prediction distribution, the principal's objective is to design a bounded scoring rule that maximizes the agent's worst-case payoff increment between reporting his posterior prediction and reporting his prior prediction. We study two settings of such optimization for proper scoring rules: static and asymptotic. In the static setting, where the agent can access one signal, we propose an efficient algorithm to compute an optimal scoring rule when the collection of distributions is finite. In the asymptotic setting, the agent can adaptively and indefinitely refine his prediction. We first consider a sequence of collections of posterior distributions with vanishing covariance, which emulates general estimators with large samples, and show the optimality of the quadratic scoring rule. Then, when the agent's posterior distribution is a Beta-Bernoulli process, we find that the log scoring rule is optimal. We also prove the optimality of the log scoring rule over a smaller set of functions for categorical distributions with Dirichlet priors.
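
For reference, the two classic proper scoring rules that the asymptotic results single out, written for a reported distribution p over n outcomes when outcome i is realized:

```latex
% quadratic (Brier-type) and logarithmic proper scoring rules
S_{\mathrm{quad}}(\mathbf{p}, i) = 2p_i - \sum_{j=1}^{n} p_j^2,
\qquad
S_{\log}(\mathbf{p}, i) = \log p_i
```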

【3】 A Field Guide to Federated Optimization

Authors: Jianyu Wang, Zachary Charles, Zheng Xu, Gauri Joshi, H. Brendan McMahan, Blaise Aguera y Arcas, Maruan Al-Shedivat, Galen Andrew, Salman Avestimehr, Katharine Daly, Deepesh Data, Suhas Diggavi, Hubert Eichner, Advait Gadhikar, Zachary Garrett, Antonious M. Girgis, Filip Hanzely, Andrew Hard, Chaoyang He, Samuel Horvath, Zhouyuan Huo, Alex Ingerman, Martin Jaggi, Tara Javidi, Peter Kairouz, Satyen Kale, Sai Praneeth Karimireddy, Jakub Konecny, Sanmi Koyejo, Tian Li, Luyang Liu, Mehryar Mohri, Hang Qi, Sashank J. Reddi, Peter Richtarik, Karan Singhal, Virginia Smith, Mahdi Soltanolkotabi, Weikang Song, Ananda Theertha Suresh, Sebastian U. Stich, Ameet Talwalkar, Hongyi Wang, Blake Woodworth, Shanshan Wu, Felix X. Yu, Honglin Yuan, Manzil Zaheer, Mi Zhang, Tong Zhang, Chunxiang Zheng, Chen Zhu, Wennan Zhu Affiliations: Carnegie Mellon University, Google Research, École Polytechnique Fédérale de Lausanne, King Abdullah University of Science and Technology, and others Link: https://arxiv.org/abs/2107.06917 Abstract: Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection. The distributed learning process can be formulated as solving federated optimization problems, which emphasize communication efficiency, data heterogeneity, compatibility with privacy and system requirements, and other constraints that are not primary considerations in other problem settings. This paper provides recommendations and guidelines on formulating, designing, evaluating and analyzing federated optimization algorithms through concrete examples and practical implementation, with a focus on conducting effective simulations to infer real-world performance. The goal of this work is not to survey the current literature, but to inspire researchers and practitioners to design federated learning algorithms that can be used in various practical applications.

Prediction | Estimation (5 papers)

【1】 Low-Rank Temporal Attention-Augmented Bilinear Network for financial time-series forecasting

Authors: Mostafa Shabani, Alexandros Iosifidis Affiliations: Department of Engineering, Aarhus University, Aarhus, Denmark Link: https://arxiv.org/abs/2107.06995 Abstract: Financial market analysis, especially the prediction of movements of stock prices, is a challenging problem. The nature of financial time-series data, being non-stationary and nonlinear, is the main cause of these challenges. Deep learning models have led to significant performance improvements in many problems coming from different domains, including prediction problems for financial time-series data. Although prediction performance is the main goal of such models, dealing with ultra-high-frequency data imposes restrictions in terms of the number of model parameters and the inference speed. The Temporal Attention-Augmented Bilinear network was recently proposed as an efficient and high-performing model for Limit Order Book time-series forecasting. In this paper, we propose a low-rank tensor approximation of the model to further reduce the number of trainable parameters and increase its speed.
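
A sketch of where the parameter savings of a low-rank approximation come from, shown on a single dense weight matrix with illustrative sizes (the paper factorizes the bilinear model's tensors; the principle is the same).

```python
import numpy as np

m, n, r = 256, 256, 8
rng = np.random.default_rng(0)
W = rng.normal(size=(m, n))

U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_lowrank = (U[:, :r] * s[:r]) @ Vt[:r]            # best rank-r approximation of W

print(m * n, r * (m + n))                          # 65536 vs 4096 parameters
print(np.linalg.norm(W - W_lowrank) / np.linalg.norm(W))   # relative error
```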

【2】 Physics-informed generative neural network: an application to troposphere temperature prediction

Authors: Zhihao Chen, Jie Gao, Weikai Wang, Zheng Yan Affiliations: East China Regional Air Traffic Management Bureau, CAAC, Shanghai, China; AI Lab, Shanghai Em-Data Technology Co. Ltd., Shanghai, China Link: https://arxiv.org/abs/2107.06991 Abstract: The troposphere is one of the atmospheric layers where most weather phenomena occur. Temperature variations in the troposphere, especially at 500 hPa, a typical level of the middle troposphere, are significant indicators of future weather changes. Numerical weather prediction is effective for temperature prediction, but its computational complexity hinders a timely response. This paper proposes a novel temperature prediction approach in the framework of physics-informed deep learning. The new model, called PGnet, builds upon a generative neural network with a mask matrix. The mask is designed to distinguish the low-quality regions of the prediction generated by the first, physical stage. The generative neural network takes the mask as a prior for the second-stage refined predictions. A mask-loss and a jump-pattern strategy are developed to train the generative neural network without accumulating errors when making time-series predictions. Experiments on ERA5 demonstrate that PGnet can generate more refined temperature predictions than the state of the art.
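
A sketch of a mask-weighted loss in the spirit of the mask-loss described above; the specific weighting scheme is an assumption for illustration.

```python
import torch

def mask_loss(refined, target, mask, w_masked=1.0, w_rest=0.1):
    # mask = 1 where the first-stage prediction is low quality, else 0
    err = (refined - target) ** 2
    weights = w_masked * mask + w_rest * (1 - mask)
    return (weights * err).mean()

refined = torch.randn(2, 1, 32, 32, requires_grad=True)   # refined temperature field
target = torch.randn(2, 1, 32, 32)
mask = (torch.rand(2, 1, 32, 32) > 0.8).float()
mask_loss(refined, target, mask).backward()
```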

【3】 Mid-flight Forecasting for CPA Lines in Online Advertising

Authors: Hao He, Tian Zhou, Lihua Ren, Niklas Karlsson, Aaron Flores Comments: 41st International Symposium on Forecasting, June 27-30, 2021 Link: https://arxiv.org/abs/2107.07494 Abstract: For the Verizon Media Demand Side Platform (DSP), forecasting of ad campaign performance not only feeds key information to the optimization server, allowing the system to operate in a high-performance mode, but also produces actionable insights for advertisers. In this paper, the forecasting problem for CPA lines in the middle of the flight is investigated by taking the bidding mechanism into account. The proposed methodology generates relationships between various key performance metrics and optimization signals. It can also be used to estimate the sensitivity of ad campaign performance metrics to adjustments of the optimization signal, which is important to the design of a campaign management system. The relationship between advertiser spend and effective Cost Per Action (eCPA) is also characterized, which serves as guidance for mid-flight line adjustments by advertisers. Several practical issues in implementation, such as downsampling of the dataset, are also discussed in the paper. Finally, the forecasting results are validated against actual deliveries, demonstrating promising accuracy.

【4】 FastSHAP: Real-Time Shapley Value Estimation

Authors: Neil Jethani, Mukund Sudarshan, Ian Covert, Su-In Lee, Rajesh Ranganath Affiliations: New York University; University of Washington Comments: 20 pages, 10 figures, 3 tables Link: https://arxiv.org/abs/2107.07436 Abstract: Shapley values are widely used to explain black-box models, but they are costly to calculate because they require many model evaluations. We introduce FastSHAP, a method for estimating Shapley values in a single forward pass using a learned explainer model. FastSHAP amortizes the cost of explaining many inputs via a learning approach inspired by the Shapley value's weighted least squares characterization, and it can be trained using standard stochastic gradient optimization. We compare FastSHAP to existing estimation approaches, revealing that it generates high-quality explanations with an orders-of-magnitude speedup.
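
For contrast with FastSHAP's single forward pass, a sketch of the exact Shapley value by subset enumeration, whose exponential cost in the number of features is what FastSHAP amortizes away (toy value function, tiny d on purpose).

```python
from itertools import combinations
from math import factorial

d = 3
def v(S):   # toy coalition value; in practice each call is a model evaluation
    vals = {(): 0.0, (0,): 1.0, (1,): 2.0, (2,): 0.5, (0, 1): 4.0,
            (0, 2): 1.5, (1, 2): 2.5, (0, 1, 2): 5.0}
    return vals[tuple(sorted(S))]

def shapley(i):
    others = [j for j in range(d) if j != i]
    total = 0.0
    for k in range(d):
        for S in combinations(others, k):
            weight = factorial(k) * factorial(d - k - 1) / factorial(d)
            total += weight * (v(S + (i,)) - v(S))
    return total

print([round(shapley(i), 3) for i in range(d)])   # sums to v(full) - v(empty)
```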

【5】 Untrained DNN for Channel Estimation of RIS-Assisted Multi-User OFDM System with Hardware Impairments

Authors: Nipuni Ginige, K. B. Shashika Manosha, Nandana Rajatheva, Matti Latva-aho Affiliations: Center for Wireless Communications, University of Oulu, Finland Link: https://arxiv.org/abs/2107.07423 Abstract: Reconfigurable intelligent surface (RIS) is an emerging technology for improving performance in fifth-generation (5G) and beyond networks. In practice, channel estimation for RIS-assisted systems is challenging due to the passive nature of the RIS. The purpose of this paper is to introduce a deep learning-based, low-complexity channel estimator for the RIS-assisted multi-user single-input multiple-output (SIMO) orthogonal frequency division multiplexing (OFDM) system with hardware impairments. We propose an untrained deep neural network (DNN) based on the deep image prior (DIP) network to denoise the effective channel of the system obtained from conventional pilot-based least-squares (LS) estimation and acquire a more accurate estimate. We show that our proposed method achieves high accuracy with low complexity compared to conventional methods. Further, we show that the proposed estimator is robust to interference caused by hardware impairments at the transceiver and the RIS.
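
A sketch of the two-step idea on synthetic data: a conventional least-squares channel estimate from pilots, followed by denoising; a simple moving average stands in for the untrained DIP-style network the paper actually fits.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sc, n_taps = 64, 4                               # OFDM subcarriers, channel taps
taps = (rng.normal(size=n_taps) + 1j * rng.normal(size=n_taps)) / np.sqrt(2 * n_taps)
h_true = np.fft.fft(taps, n_sc)                    # smooth frequency response
x_pilot = np.exp(2j * np.pi * rng.uniform(size=n_sc))   # known pilot symbols
noise = 0.2 * (rng.normal(size=n_sc) + 1j * rng.normal(size=n_sc))
y = h_true * x_pilot + noise

h_ls = y / x_pilot                                 # conventional LS estimate (noisy)
h_denoised = np.convolve(h_ls, np.ones(5) / 5, mode="same")   # stand-in denoiser

mse = lambda h: np.mean(np.abs(h - h_true) ** 2)
print(mse(h_ls), mse(h_denoised))                  # smoothing reduces the error
```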

Other Neural Networks | Deep Learning | Models | Modeling (23 papers)

【1】 Adaptable Agent Populations via a Generative Model of Policies

Authors: Kenneth Derek, Phillip Isola Affiliations: MIT CSAIL Comments: Website at this https URL Link: https://arxiv.org/abs/2107.07506 Abstract: In the natural world, life has found innumerable ways to survive and often thrive. Between and even within species, each individual is in some manner unique, and this diversity lends adaptability and robustness to life. In this work, we aim to learn a space of diverse and high-reward policies in any given environment. To this end, we introduce a generative model of policies, which maps a low-dimensional latent space to an agent policy space. Our method enables learning an entire population of agent policies, without requiring the use of separate policy parameters. Just as real-world populations can adapt and evolve via natural selection, our method is able to adapt to changes in the environment solely by selecting for policies in latent space. We test our generative model's capabilities in a variety of environments, including an open-ended grid-world and a two-player soccer environment. Code, visualizations, and additional experiments can be found at https://kennyderek.github.io/adap/.

【2】 An Overview of Machine Learning-aided Optical Performance Monitoring Techniques

Authors: Dativa K. Tizikara, Jonathan Serugunda, Andrew Katumba Affiliations: Department of Electrical and Computer Engineering, School of Engineering, College of Design, Art and Technology, Makerere University, Kampala Link: https://arxiv.org/abs/2107.07338 Abstract: Future communication systems face increasing demand for high capacity, dynamic bandwidth, reliability and heterogeneous traffic. To meet these requirements, networks have become more complex and thus require new design methods and monitoring techniques, as they evolve towards becoming autonomous. Machine learning has come to the forefront in recent years as a promising technology to aid in this evolution. Optical fiber communications can already provide the high capacity required for most applications; however, there is a need for increased scalability and adaptability to changing user demands and link conditions. Accurate performance monitoring is an integral part of this transformation. In this paper, we review optical performance monitoring (OPM) techniques where machine learning algorithms have been applied. Moreover, since a lot of OPM depends on knowledge of the signal type, we also review work on modulation format recognition and bitrate identification. We additionally briefly introduce a neuromorphic approach to OPM as an emerging technique that has only recently been applied to this domain.

【3】 Training for temporal sparsity in deep neural networks, application in video processing

Authors: Amirreza Yousefzadeh, Manolis Sifalakis Link: https://arxiv.org/abs/2107.07305 Abstract: Activation sparsity improves compute efficiency and resource utilization in sparsity-aware neural network accelerators. Since the predominant operation in DNNs is the multiply-accumulate (MAC) of activations with weights to compute inner products, skipping operations where (at least) one of the two operands is zero can make inference more efficient in terms of latency and power. Spatial sparsification of activations is a popular topic in the DNN literature, and several methods have already been established to bias a DNN towards it. On the other hand, temporal sparsity is an inherent feature of bio-inspired spiking neural networks (SNNs), which neuromorphic processing exploits for hardware efficiency. Introducing and exploiting spatio-temporal sparsity is a topic much less explored in the DNN literature, but one in perfect resonance with the trend in DNNs to shift from static signal processing to more streaming signal processing. Towards this goal, in this paper we introduce a new DNN layer (called the Delta Activation Layer) whose sole purpose is to promote temporal sparsity of activations during training. A Delta Activation Layer casts temporal sparsity into spatial activation sparsity, to be exploited when performing sparse tensor multiplications in hardware. By employing delta inference and "the usual" spatial sparsification heuristics during training, the resulting model learns to exploit not only spatial but also temporal activation sparsity (for a given input data distribution). One may use the Delta Activation Layer either during vanilla training or during a refinement phase. We have implemented the Delta Activation Layer as an extension of the standard TensorFlow Keras library and applied it to train deep neural networks on the Human Action Recognition (UCF101) dataset. We report an almost 3x improvement in activation sparsity, with a recoverable loss of model accuracy after longer training.
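
A numerical sketch of the delta idea behind the proposed layer: between consecutive video frames most activations barely change, so only supra-threshold deltas need to be recomputed or transmitted (illustrative threshold and statistics).

```python
import numpy as np

rng = np.random.default_rng(0)
T, C = 16, 512                                     # frames, channels
acts = np.cumsum(0.05 * rng.normal(size=(T, C)), axis=0)  # slowly varying activations

theta = 0.1                                        # delta threshold
deltas = np.diff(acts, axis=0)
significant = np.abs(deltas) > theta               # only these require recomputation
print("temporal sparsity:", 1 - significant.mean())
```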

【4】 A Robust Deep Learning Workflow to Predict Multiphase Flow Behavior during Geological CO2 Sequestration Injection and Post-Injection Periods 标题:用于预测地质CO2封存注入和注入后多相流行为的鲁棒深度学习工作流

作者:Bicheng Yan,Bailian Chen,Dylan Robert Harp,Rajesh J. Pawar 机构:Earth and Environmental Sciences, Los Alamos National Laboratory 备注:16 pages, 13 figures, 4 tables 链接:https://arxiv.org/abs/2107.07274 摘要:本文致力于开发和评估一个深度学习工作流,以准确高效地预测地质CO2封存(GCS)作业注入期和注入后期压力与CO2羽流的时空演变。该深度学习工作流基于Fourier神经算子,以岩石性质、井操作控制和时间步长等变量为输入特征,预测压力和CO2饱和度这两个状态变量。为了进一步提高预测保真度,鉴于CO2注入期和注入后期流体流动与传输的主要驱动力不同,针对这两个阶段分别训练了不同的深度学习模型。我们还探索了不同的特征组合来预测状态变量。以三维非均质咸水含水层中的CO2注入与封存为实际算例,应用由基于物理的模拟数据训练的深度学习工作流来仿真物理过程。通过这个数值实验,我们证明:用两个独立的深度学习模型区分注入期与注入后期,可以得到最准确的压力预测;而用一个覆盖整个GCS过程的单一深度学习模型,并把CO2累积注入量作为深度学习特征,则得到最准确的CO2饱和度预测。对于注入后期,无论预测压力还是饱和度,关键都在于利用CO2累积注入量为深度学习模型提供总碳储量的信息。该深度学习工作流不仅在时间和空间尺度上提供了很高的预测保真度,而且与全物理油藏模拟相比可提速250倍,因此将成为工程师管理GCS长期过程的重要预测工具。 摘要:This paper contributes to the development and evaluation of a deep learning workflow that accurately and efficiently predicts the temporal-spatial evolution of pressure and CO2 plumes during injection and post-injection periods of geologic CO2 sequestration (GCS) operations. Based on a Fourier Neuron Operator, the deep learning workflow takes input variables or features including rock properties, well operational controls and time steps, and predicts the state variables of pressure and CO2 saturation. To further improve the predictive fidelity, separate deep learning models are trained for CO2 injection and post-injection periods due to the difference in primary driving force of fluid flow and transport during these two phases. We also explore different combinations of features to predict the state variables. We use a realistic example of CO2 injection and storage in a 3D heterogeneous saline aquifer, and apply the deep learning workflow that is trained from physics-based simulation data and emulate the physics process. Through this numerical experiment, we demonstrate that using two separate deep learning models to distinguish post-injection from injection period generates the most accurate prediction of pressure, and a single deep learning model of the whole GCS process including the cumulative injection volume of CO2 as a deep learning feature, leads to the most accurate prediction of CO2 saturation. For the post-injection period, it is key to use cumulative CO2 injection volume to inform the deep learning models about the total carbon storage when predicting either pressure or saturation. The deep learning workflow not only provides high predictive fidelity across temporal and spatial scales, but also offers a speedup of 250 times compared to full physics reservoir simulation, and thus will be a significant predictive tool for engineers to manage the long term process of GCS.

【5】 Decentralized Bayesian Learning with Metropolis-Adjusted Hamiltonian Monte Carlo 标题:基于Metropolis调整哈密顿蒙特卡罗的去中心化贝叶斯学习

作者:Vyacheslav Kungurtsev,Adam Cobb,Tara Javidi,Brian Jalaian 机构:Department of Computer Science, Czech Technical University in Prague, SRI International, Electrical and Computer Engineering, University of California, San Diego, DEVCOM Army Research Laboratory 链接:https://arxiv.org/abs/2107.07211 摘要:随着嵌入式软件在自主设备上的普及,由去中心化代理网络执行的联邦学习变得越来越重要。贝叶斯学习方法的优势在于能提供更多关于随机量不确定性的信息,而Langevin和Hamiltonian方法能有效实现从高参数维度的不确定分布中采样。这类方法直到最近才出现在去中心化设置中,并且要么只使用随机梯度Langevin和Hamiltonian蒙特卡罗方法——它们需要逐步减小的步长才能渐近地从后验中采样,且在实践中对不确定性的刻画不如带Metropolis修正的恒定步长方法准确——要么假设势函数具有强凸性。我们提出了第一种将恒定步长Metropolis调整HMC纳入去中心化采样框架的方法,给出了关于一致性以及到后验平稳分布的概率距离的理论保证,并在标准现实问题上用数值实验证明了其有效性,包括已知高度非凸的神经网络去中心化学习。 摘要:Federated learning performed by a decentralized networks of agents is becoming increasingly important with the prevalence of embedded software on autonomous devices. Bayesian approaches to learning benefit from offering more information as to the uncertainty of a random quantity, and Langevin and Hamiltonian methods are effective at realizing sampling from an uncertain distribution with large parameter dimensions. Such methods have only recently appeared in the decentralized setting, and either exclusively use stochastic gradient Langevin and Hamiltonian Monte Carlo approaches that require a diminishing stepsize to asymptotically sample from the posterior and are known in practice to characterize uncertainty less faithfully than constant step-size methods with a Metropolis adjustment, or assume strong convexity properties of the potential function. We present the first approach to incorporating constant stepsize Metropolis-adjusted HMC in the decentralized sampling framework, show theoretical guarantees for consensus and probability distance to the posterior stationary distribution, and demonstrate their effectiveness numerically on standard real world problems, including decentralized learning of neural networks which is known to be highly non-convex.
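
作为背景,下面用一个一维玩具目标分布演示恒定步长、带Metropolis接受/拒绝修正的HMC核心步骤(通用示意,与论文的去中心化设置无关;步长与蛙跳步数均为假设值):

```python
import numpy as np

def hmc_step(x, log_prob, grad_log_prob, step_size=0.1, n_leapfrog=10):
    """恒定步长 HMC 单步:蛙跳积分 + Metropolis 接受/拒绝修正。"""
    p = np.random.randn()                            # 重采样动量
    x_new, p_new = x, p
    p_new += 0.5 * step_size * grad_log_prob(x_new)  # 开头半步动量
    for _ in range(n_leapfrog - 1):
        x_new += step_size * p_new
        p_new += step_size * grad_log_prob(x_new)
    x_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_prob(x_new)  # 末尾半步动量
    # Metropolis 修正:比较前后哈密顿量
    h_old = -log_prob(x) + 0.5 * p ** 2
    h_new = -log_prob(x_new) + 0.5 * p_new ** 2
    return x_new if np.log(np.random.rand()) < h_old - h_new else x

# 以标准正态为目标分布
log_prob = lambda x: -0.5 * x ** 2
grad_log_prob = lambda x: -x
samples, x = [], 0.0
for _ in range(5000):
    x = hmc_step(x, log_prob, grad_log_prob)
    samples.append(x)
print(f"样本均值 {np.mean(samples):.3f}, 方差 {np.var(samples):.3f}")
```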

【6】 Lockout: Sparse Regularization of Neural Networks 标题:锁定:神经网络的稀疏正则化

作者:Gilmer Valdes,Wilmer Arbelo,Yannet Interian,Jerome H. Friedman 机构:Department of Radiation Oncology, Department of Epidemiology and Biostatistics, University of California San Francisco, CA, USA, M.S. in Data Science Program, University of San Francisco, San Francisco, CA, USA, Department of Statistics 链接:https://arxiv.org/abs/2107.07160 摘要:许多回归和分类程序根据某种损失准则 $L(y,f)$,将关于预测变量 $x$ 的参数化函数 $f(x;w)$ 拟合到数据 $\{x_{i},y_{i}\}_1^N$。通常,通过对参数 $w$ 的取值施加约束 $P(w)\leq t$ 来应用正则化以提高精度。尽管在 $f$ 为线性函数的特殊情形下,存在对所有 $t\geq 0$ 求解这类约束优化问题的有效方法,但当 $f$ 为非线性函数(例如神经网络)时尚无可用方法。本文提出了一个快速算法,对任何可微函数 $f$、任何损失 $L$,以及任何关于各参数绝对值单调递增的约束 $P$,都能给出全部此类解。文中讨论了稀疏诱导正则化在任意神经网络中的应用。实验结果表明,这些稀疏解在精度和可解释性上通常优于稠密解。这种精度的提升常常使神经网络在分析表格数据时与最先进的方法相竞争,有时甚至更优。 摘要:Many regression and classification procedures fit a parameterized function $f(x;w)$ of predictor variables $x$ to data $\{x_{i},y_{i}\}_1^N$ based on some loss criterion $L(y,f)$. Often, regularization is applied to improve accuracy by placing a constraint $P(w)\leq t$ on the values of the parameters $w$. Although efficient methods exist for finding solutions to these constrained optimization problems for all values of $t\geq 0$ in the special case when $f$ is a linear function, none are available when $f$ is non-linear (e.g. Neural Networks). Here we present a fast algorithm that provides all such solutions for any differentiable function $f$ and loss $L$, and any constraint $P$ that is an increasing monotone function of the absolute value of each parameter. Applications involving sparsity inducing regularization of arbitrary Neural Networks are discussed. Empirical results indicate that these sparse solutions are usually superior to their dense counterparts in both accuracy and interpretability. This improvement in accuracy can often make Neural Networks competitive with, and sometimes superior to, state-of-the-art methods in the analysis of tabular data.

【7】 Deep Learning on a Data Diet: Finding Important Examples Early in Training 标题:关于数据饮食的深度学习:在训练的早期找到重要的例子

作者:Mansheej Paul,Surya Ganguli,Gintare Karolina Dziugaite 机构:Stanford University; Facebook AI Research, Element AI, a ServiceNow Company; Mila 备注:18 pages, 16 figures 链接:https://arxiv.org/abs/2107.07075 摘要:深度学习最近的成功部分源于在越来越大的数据集上训练越来越过参数化的网络。因此很自然地会问:有多少数据是多余的?哪些样本对泛化重要?我们又该如何找到它们?在这项工作中,我们得到了一个惊人的观察:在标准视觉基准上,对若干权重初始化取平均后,单个训练样本的初始损失梯度范数可以用来识别一个对泛化重要的较小训练数据子集。此外,仅经过几个epoch的训练,梯度范数中的信息就会体现在范数误差(normed error)中——即预测概率与one-hot标签之间的L2距离——它可以用来剪除数据集中相当大的一部分而不牺牲测试精度。在此基础上,我们提出了仅使用训练早期局部信息的数据剪枝方法,并将其与最近通过丢弃训练过程中很少被遗忘的样本来剪枝数据的工作联系起来。我们的方法还揭示了底层数据分布如何塑造训练动态:它们根据样本对泛化的重要性对样本排序,检测噪声样本,并识别模型数据表示中在训练过程中相对稳定的子空间。 摘要:The recent success of deep learning has partially been driven by training increasingly overparametrized networks on ever larger datasets. It is therefore natural to ask: how much of the data is superfluous, which examples are important for generalization, and how do we find them? In this work, we make the striking observation that, on standard vision benchmarks, the initial loss gradient norm of individual training examples, averaged over several weight initializations, can be used to identify a smaller set of training data that is important for generalization. Furthermore, after only a few epochs of training, the information in gradient norms is reflected in the normed error--L2 distance between the predicted probabilities and one hot labels--which can be used to prune a significant fraction of the dataset without sacrificing test accuracy. Based on this, we propose data pruning methods which use only local information early in training, and connect them to recent work that prunes data by discarding examples that are rarely forgotten over the course of training. Our methods also shed light on how the underlying data distribution shapes the training dynamics: they rank examples based on their importance for generalization, detect noisy examples and identify subspaces of the model's data representation that are relatively stable over training.
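
下面是文中“范数误差”(EL2N)评分思想的一个极简示意:计算预测概率与one-hot标签之间的L2距离,并据此保留得分较高的样本(玩具数据,保留比例等均为假设,并非论文流程的完整复现):

```python
import numpy as np

def el2n_scores(probs, labels, n_classes):
    """EL2N 评分:预测概率与 one-hot 标签之间的 L2 距离。"""
    onehot = np.eye(n_classes)[labels]
    return np.linalg.norm(probs - onehot, axis=1)

rng = np.random.default_rng(0)
logits = rng.normal(size=(10, 3))                        # 10 个样本、3 类
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = rng.integers(0, 3, size=10)
scores = el2n_scores(probs, labels, 3)
keep = np.argsort(scores)[-6:]   # 示意:保留得分最高的样本用于后续训练
print("保留的样本索引:", keep)
```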

【8】 Mitigating Memorization in Sample Selection for Learning with Noisy Labels 标题:缓解带噪声标签学习样本选择中的记忆效应

作者:Kyeongbo Kong,Junggi Lee,Youngchul Kwak,Young-Rae Cho,Seong-Eun Kim,Woo-Jin Song 机构:Department of Electrical Engineering, Pohang University of Science and Technology, South Korea; Department of Applied Artificial Intelligence, Seoul National University of Science and Technology (SeoulTech) 备注:14 pages, 9 figures, spotlight presented at the ICML 2021 Workshop on Subset Selection in ML 链接:https://arxiv.org/abs/2107.07041 摘要:由于深度学习容易受到噪声标签的影响,仅用干净标注数据训练网络的样本选择技术受到了广泛关注。然而,如果标签主要被少数几个类破坏(这类噪声样本称为主导噪声标签样本),网络也会通过内容感知优化快速学到这些主导噪声标签样本。在这项研究中,我们提出了一个有说服力的准则,借助按类惩罚标签(class-wise penalty labels)对主导噪声标签样本施加强力惩罚。通过对每个观测标签的预测置信度取平均,我们得到合适的惩罚标签:如果标签在很大程度上被某些类破坏,这些惩罚标签就会具有较高的值。实验在基准数据集(CIFAR-10、CIFAR-100、Tiny-ImageNet)和真实数据集(ANIMAL-10N、Clothing1M)上进行,以在不同噪声率的多种场景下评估所提出的准则。使用所提出的样本选择,与现有方法相比,网络的学习过程在多种噪声类型下对噪声标签都变得显著鲁棒。 摘要:Because deep learning is vulnerable to noisy labels, sample selection techniques, which train networks with only clean labeled data, have attracted great attention. However, if the labels are dominantly corrupted by a few classes (these noisy samples are called dominant-noisy-labeled samples), the network also learns dominant-noisy-labeled samples rapidly via content-aware optimization. In this study, we propose a compelling criterion to penalize dominant-noisy-labeled samples intensively through class-wise penalty labels. By averaging prediction confidences for each observed label, we obtain suitable penalty labels that have high values if the labels are largely corrupted by some classes. Experiments were performed using benchmarks (CIFAR-10, CIFAR-100, Tiny-ImageNet) and real-world datasets (ANIMAL-10N, Clothing1M) to evaluate the proposed criterion in various scenarios with different noise rates. Using the proposed sample selection, the learning process of the network becomes significantly robust to noisy labels compared to existing methods in several noise types.
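
按类惩罚标签的构造可以用如下极简示意表达:对每个观测标签类,平均其样本的预测置信度向量(玩具数据,仅示意“平均置信度→惩罚标签”的思路,并非论文原始实现):

```python
import numpy as np

def penalty_labels(probs, observed, n_classes):
    """对每个观测标签类 c,平均被标注为 c 的样本的预测分布;
    若标签大量被某些类破坏,对应惩罚标签的取值会偏高。"""
    pen = np.zeros((n_classes, n_classes))
    for c in range(n_classes):
        mask = observed == c
        if mask.any():
            pen[c] = probs[mask].mean(axis=0)
    return pen

rng = np.random.default_rng(1)
probs = rng.dirichlet(np.ones(4), size=100)   # 100 个样本的预测分布
observed = rng.integers(0, 4, size=100)       # 可能带噪的观测标签
print(penalty_labels(probs, observed, 4).round(2))
```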

【9】 Parsimony-Enhanced Sparse Bayesian Learning for Robust Discovery of Partial Differential Equations 标题:简约增强型稀疏贝叶斯学习在偏微分方程稳健发现中的应用

作者:Zhiming Zhang,Yongming Liu 机构:School for Engineering of Matter, Transport and Energy, Arizona State University, Tempe, AZ, USA 备注:arXiv admin note: text overlap with arXiv:2102.06504 链接:https://arxiv.org/abs/2107.07040 摘要:稳健的物理规律发现在许多科学和工程领域都备受关注。受“有代表性的模型应是尽可能简单的模型”这一原则的启发,提出了一种同时考虑模型简约性(Parsimony)和稀疏性(Sparsity)的新模型选择准则,并发展了一种简约增强稀疏贝叶斯学习(PeSBL)方法,用于发现非线性动力系统的控制偏微分方程(PDEs)。与传统的稀疏贝叶斯学习(SBL)方法相比,PeSBL方法在稀疏性之外还提升了所学模型的简约性。在该方法中,考虑到多项式幂次和空间导数阶数增加所带来的复杂度,首次利用模型项在给定候选库中的位置来评估其简约性。随后,利用原始数据通过贝叶斯推理更新模型参数。这一流程旨在减少稀疏回归之前的数据预处理和数值微分中可能的信息损失所带来的误差。数值算例研究结果表明,对于许多典型动力系统,即使数据噪声很高(本研究中高达50%),所提出的PeSBL方法也能正确识别其控制偏微分方程。其次,该方法被推广到随机PDE学习,把所有参数和建模误差都视为随机变量,并将层次贝叶斯推理(HBI)与所提框架相结合,以便从一组观测中学习随机PDE。最后,演示了所提出的PeSBL在带不确定性的系统响应预测和异常诊断中的应用。本研究所有演示示例的代码可在以下网站获得:https://github.com/ymlasu. 摘要:Robust physics discovery is of great interest for many scientific and engineering fields. Inspired by the principle that a representative model is the one simplest possible, a new model selection criteria considering both model's Parsimony and Sparsity is proposed. A Parsimony Enhanced Sparse Bayesian Learning (PeSBL) method is developed for discovering the governing Partial Differential Equations (PDEs) of nonlinear dynamical systems. Compared with the conventional Sparse Bayesian Learning (SBL) method, the PeSBL method promotes parsimony of the learned model in addition to its sparsity. In this method, the parsimony of model terms is evaluated using their locations in the prescribed candidate library, for the first time, considering the increased complexity with the power of polynomials and the order of spatial derivatives. Subsequently, the model parameters are updated through Bayesian inference with the raw data. This procedure aims to reduce the error associated with the possible loss of information in data preprocessing and numerical differentiation prior to sparse regression. Results of numerical case studies indicate that the governing PDEs of many canonical dynamical systems can be correctly identified using the proposed PeSBL method from highly noisy data (up to 50% in the current study). Next, the proposed methodology is extended for stochastic PDE learning where all parameters and modeling error are considered as random variables. Hierarchical Bayesian Inference (HBI) is integrated with the proposed framework for stochastic PDE learning from a population of observations. Finally, the proposed PeSBL is demonstrated for system response prediction with uncertainties and anomaly diagnosis. Codes of all demonstrated examples in this study are available on the website: https://github.com/ymlasu.

【10】 Hybrid Bayesian Neural Networks with Functional Probabilistic Layers 标题:具有函数概率层的混合贝叶斯神经网络

作者:Daniel T. Chang 链接:https://arxiv.org/abs/2107.07014 摘要:贝叶斯神经网络提供了一种直接而自然的方式来扩展标准深度神经网络:通过使用概率层来支持概率深度学习,而这些概率层传统上编码权重(和偏置)的不确定性。特别地,混合贝叶斯神经网络把标准的确定性层与少量审慎布置在网络中的概率层结合起来进行不确定性估计。贝叶斯推理的一个主要方面和好处是,先验在原则上提供了编码先验知识以用于推理和预测的手段。然而,由于权重没有直观的解释,很难指定权重上的先验;此外,权重先验与网络所计算函数之间的关系也难以刻画。相反,函数直观、易于解释,而且是直接的,因为它们把输入映射到输出。因此,更自然的做法是在函数上指定先验来编码先验知识,并在基于函数的推理和预测中使用它们。为此,我们提出了带函数概率层的混合贝叶斯神经网络,其概率层编码函数(和激活)的不确定性。我们讨论了它们在函数贝叶斯推理、函数变分推理、稀疏高斯过程和稀疏变分高斯过程中的基础。我们还使用GPflux进行了若干概念验证实验;GPflux是一个新的库,提供高斯过程层,并支持将其与确定性Keras层结合使用,以构建混合神经网络和高斯过程模型。 摘要:Bayesian neural networks provide a direct and natural way to extend standard deep neural networks to support probabilistic deep learning through the use of probabilistic layers that, traditionally, encode weight (and bias) uncertainty. In particular, hybrid Bayesian neural networks utilize standard deterministic layers together with few probabilistic layers judiciously positioned in the networks for uncertainty estimation. A major aspect and benefit of Bayesian inference is that priors, in principle, provide the means to encode prior knowledge for use in inference and prediction. However, it is difficult to specify priors on weights since the weights have no intuitive interpretation. Further, the relationships of priors on weights to the functions computed by networks are difficult to characterize. In contrast, functions are intuitive to interpret and are direct since they map inputs to outputs. Therefore, it is natural to specify priors on functions to encode prior knowledge, and to use them in inference and prediction based on functions. To support this, we propose hybrid Bayesian neural networks with functional probabilistic layers that encode function (and activation) uncertainty. We discuss their foundations in functional Bayesian inference, functional variational inference, sparse Gaussian processes, and sparse variational Gaussian processes. We further perform a few proof-of-concept experiments using GPflux, a new library that provides Gaussian process layers and supports their use with deterministic Keras layers to form hybrid neural network and Gaussian process models.

【11】 WeightScale: Interpreting Weight Change in Neural Networks 标题:WeightScale:解释神经网络中的权重变化

作者:Ayush Manish Agrawal,Atharva Tendle,Harshvardhan Sikka,Sahib Singh 机构:University of Nebraska-Lincoln, Georgia Institute of Technology, Ford Motor Company 备注:9 pages, 8 figures. arXiv admin note: text overlap with arXiv:2011.06735 链接:https://arxiv.org/abs/2107.07005 摘要:解读神经网络的学习动态,有助于深入了解网络是如何学习的,并推动更好的训练和设计方法的发展。我们提出了一种解读神经网络学习的方法:逐层度量相对权重变化,并通过降维与聚类相结合来动态聚合新出现的趋势,这使我们能够扩展到非常深的网络。我们使用这种方法研究了多种最先进网络在视觉任务中的学习,并对这些网络的学习行为给出了见解,包括任务复杂度如何影响网络深层的逐层学习。 摘要:Interpreting the learning dynamics of neural networks can provide useful insights into how networks learn and the development of better training and design approaches. We present an approach to interpret learning in neural networks by measuring relative weight change on a per layer basis and dynamically aggregating emerging trends through combination of dimensionality reduction and clustering which allows us to scale to very deep networks. We use this approach to investigate learning in the context of vision tasks across a variety of state-of-the-art networks and provide insights into the learning behavior of these networks, including how task complexity affects layer-wise learning in deeper layers of networks.
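
逐层相对权重变化本身的计算很直接,下面给出一个极简示意(层名与数值均为虚构):

```python
import numpy as np

def relative_weight_change(prev_w, curr_w):
    """逐层计算相对权重变化 ||W_t - W_{t-1}|| / ||W_{t-1}||。"""
    return {name: np.linalg.norm(curr_w[name] - prev_w[name])
                  / (np.linalg.norm(prev_w[name]) + 1e-12)
            for name in prev_w}

rng = np.random.default_rng(2)
prev = {f"layer_{i}": rng.normal(size=(8, 8)) for i in range(3)}
curr = {k: v + 0.01 * rng.normal(size=v.shape) for k, v in prev.items()}
for name, rc in relative_weight_change(prev, curr).items():
    print(f"{name}: 相对变化 {rc:.4f}")
```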

【12】 DeepHyperion: Exploring the Feature Space of Deep Learning-Based Systems through Illumination Search 标题:DeepHyperion:通过光照搜索探索深度学习系统的特征空间

作者:Tahereh Zohdinasab,Vincenzo Riccio,Alessio Gambi,Paolo Tonella 机构:Università della Svizzera Italiana, Lugano, Switzerland, University of Passau, Passau, Germany 备注:To be published in Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA '21), July 11-17, 2021, Virtual, Denmark. ACM, New York, NY, USA, 12 pages 链接:https://arxiv.org/abs/2107.06997 摘要:深度学习(Deep Learning,DL)已成功应用于包括安全关键领域在内的广泛应用领域。文献中最近提出了若干DL测试方法,但没有一种旨在评估所生成输入的不同可解释特征如何影响系统行为。在本文中,我们借助光照搜索(Illumination Search),在表示系统特征空间的地图单元中寻找性能最高的测试用例(即已触发错误行为的用例以及最接近触发错误行为的用例)。我们提出了一套方法学,指导本方法的使用者识别并量化给定领域特征空间的维度。我们开发了DeepHyperion,一个面向DL系统的基于搜索的工具,它通过向开发者提供一张可解释的特征图来“照亮”(即大范围探索)特征空间;自动生成的输入连同其暴露出的行为信息一起放置在该特征图上。 摘要:Deep Learning (DL) has been successfully applied to a wide range of application domains, including safety-critical ones. Several DL testing approaches have been recently proposed in the literature but none of them aims to assess how different interpretable features of the generated inputs affect the system's behaviour. In this paper, we resort to Illumination Search to find the highest-performing test cases (i.e., misbehaving and closest to misbehaving), spread across the cells of a map representing the feature space of the system. We introduce a methodology that guides the users of our approach in the tasks of identifying and quantifying the dimensions of the feature space for a given domain. We developed DeepHyperion, a search-based tool for DL systems that illuminates, i.e., explores at large, the feature space, by providing developers with an interpretable feature map where automatically generated inputs are placed along with information about the exposed behaviours.
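
光照搜索(如MAP-Elites)的核心是在特征空间网格的每个单元里只保留表现最好的“精英”。下面是这一思想的极简示意(特征、得分与网格大小均为假设量,并非DeepHyperion的实现):

```python
import numpy as np

def illuminate(candidates, features, scores, grid=(10, 10)):
    """把候选测试用例按二维可解释特征映射到网格单元,
    每个单元仅保留得分最高的精英。"""
    elites = {}
    for cand, feat, score in zip(candidates, features, scores):
        cell = tuple((np.clip(feat, 0, 0.999) * np.array(grid)).astype(int))
        if cell not in elites or score > elites[cell][1]:
            elites[cell] = (cand, score)
    return elites

rng = np.random.default_rng(3)
feats = rng.random((200, 2))     # 两个可解释特征,已归一化到 [0, 1)
scores = rng.random(200)         # 假设的“接近失效程度”得分
elites = illuminate(list(range(200)), feats, scores)
print(f"特征图中被占据的单元数: {len(elites)}")
```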

【13】 What underlies rapid learning and systematic generalization in humans 标题:人类快速学习和系统归纳的基础是什么?

作者:Andrew Joohun Nam,James L. McClelland 机构:Department of Psychology, Stanford University 备注:22 pages, 48 references, 6 Figures, and one Table, plus SI 链接:https://arxiv.org/abs/2107.06994 摘要:尽管神经网络取得了突破性的成功,但当代模型需要在海量数据集上进行大量训练,且样本外泛化能力较差。一个被提出的解决方案是在模型中内置系统性和特定领域的约束,呼应经典符号认知架构的原则。在本文中,我们通过考察成年人仅凭一个简短的教学教程和对错误回答的解释性反馈来学习抽象推理任务的能力,来检视这种方法的局限性。结果表明,人类的学习动态以及在训练样本范围之外进行泛化的能力与一个有代表性的神经网络模型截然不同,而且该模型对其作者未曾预料的特征变化十分脆弱。我们还从人类数据中给出了进一步的证据:持续解决难题的能力与教育程度(特别是基础数学教育)相关,也与能否对所用策略给出可靠可辨识的有效描述相关。我们提出,人类的快速学习和系统泛化可能依赖于一个渐进的、依赖经验的“学会学习”过程,即利用指令和解释来指导构建支持可泛化推理的显式抽象规则。 摘要:Despite the groundbreaking successes of neural networks, contemporary models require extensive training with massive datasets and exhibit poor out-of-sample generalization. One proposed solution is to build systematicity and domain-specific constraints into the model, echoing the tenets of classical, symbolic cognitive architectures. In this paper, we consider the limitations of this approach by examining human adults' ability to learn an abstract reasoning task from a brief instructional tutorial and explanatory feedback for incorrect responses, demonstrating that human learning dynamics and ability to generalize outside the range of the training examples differ drastically from those of a representative neural network model, and that the model is brittle to changes in features not anticipated by its authors. We present further evidence from human data that the ability to consistently solve the puzzles was associated with education, particularly basic mathematics education, and with the ability to provide a reliably identifiable, valid description of the strategy used. We propose that rapid learning and systematic generalization in humans may depend on a gradual, experience-dependent process of learning-to-learn using instructions and explanations to guide the construction of explicit abstract rules that support generalizable inferences.

【14】 Mapping Learning Algorithms on Data, a useful step for optimizing performances and their comparison 标题:将学习算法映射到数据上——优化性能及比较学习器的有用一步

作者:Filippo Neri 机构: University of Naples 备注:The main classification class for the paper is Machine Learning 链接:https://arxiv.org/abs/2107.06981 摘要:在本文中,我们提出了一种把学习算法映射到数据上的新方法(性能图,performance map),以便更深入地了解学习算法的性能在其参数空间中的分布。这种方法在为手头数据选择学习器的最佳配置时提供了有用的信息,同时也增强了跨学习情境对学习器的比较。为了解释所提出的方法,本研究引入了学习情境、性能图和高性能函数的概念,然后把这些概念应用到多种学习情境中,展示它们的使用如何能为学习器的行为提供更多见解,并增强跨学习情境的学习器比较。本研究最后以一项广泛的实验研究收尾,描述了所提方法的应用方式。 摘要:In the paper, we propose a novel methodology to map learning algorithms on data (performance map) in order to gain more insights in the distribution of their performances across their parameter space. This methodology provides useful information when selecting a learner's best configuration for the data at hand, and it also enhances the comparison of learners across learning contexts. In order to explain the proposed methodology, the study introduces the notions of learning context, performance map, and high performance function. It then applies these concepts to a variety of learning contexts to show how their use can provide more insights in a learner's behavior, and can enhance the comparison of learners across learning contexts. The study is completed by an extensive experimental study describing how the proposed methodology can be applied.

【15】 HTLM: Hyper-Text Pre-Training and Prompting of Language Models 标题:HTLM:语言模型的超文本预训练和提示

作者:Armen Aghajanyan,Dmytro Okhonko,Mike Lewis,Mandar Joshi,Hu Xu,Gargi Ghosh,Luke Zettlemoyer 机构:Facebook AI, University of Washington 链接:https://arxiv.org/abs/2107.06955 摘要:我们介绍了HTLM,一种在大规模网络爬取数据上训练的超文本语言模型。建模超文本有许多优点:(1)易于大规模收集;(2)提供了丰富的文档级和贴近最终任务的监督(例如class和id属性通常编码文档类别信息);(3)允许遵循HTML既定语义的新型结构化提示(例如,通过为包含输入文本的网页填充title标签来进行零样本摘要)。我们表明,直接在简化的HTML上用BART风格的去噪损失进行预训练,可以为广泛的最终任务和监督水平提供高效的迁移。HTLM在零样本提示和分类基准微调方面达到或超过了同等规模的纯文本语言模型,同时还为零样本摘要创造了新的最先进性能水平。我们还发现,就数据效率而言,超文本提示给HTLM带来的价值超过纯文本提示给现有语言模型带来的价值,而且HTLM非常擅长自动为自身生成提示:只需为任何可用的训练数据生成最可能的超文本格式。我们将发布所有代码和模型,以支持未来的HTLM研究。 摘要:We introduce HTLM, a hyper-text language model trained on a large-scale web crawl. Modeling hyper-text has a number of advantages: (1) it is easily gathered at scale, (2) it provides rich document-level and end-task-adjacent supervision (e.g. class and id attributes often encode document category information), and (3) it allows for new structured prompting that follows the established semantics of HTML (e.g. to do zero-shot summarization by infilling title tags for a webpage that contains the input text). We show that pretraining with a BART-style denoising loss directly on simplified HTML provides highly effective transfer for a wide range of end tasks and supervision levels. HTLM matches or exceeds the performance of comparably sized text-only LMs for zero-shot prompting and fine-tuning for classification benchmarks, while also setting new state-of-the-art performance levels for zero-shot summarization. We also find that hyper-text prompts provide more value to HTLM, in terms of data efficiency, than plain text prompts do for existing LMs, and that HTLM is highly effective at auto-prompting itself, by simply generating the most likely hyper-text formatting for any available training data. We will release all code and models to support future HTLM research.
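
下面的片段示意摘要中提到的结构化HTML提示:把输入文本放入网页正文,让模型填充title标签以得到零样本摘要(其中"<mask>"仅为假设的占位符,并非HTLM的确切填充标记):

```python
def build_summarization_prompt(document_text: str) -> str:
    """构造零样本摘要的 HTML 提示:<title> 留待模型填充。"""
    return (
        "<html><head><title><mask></title></head>"
        f"<body><p>{document_text}</p></body></html>"
    )

print(build_summarization_prompt("机器学习在光通信性能监测中的应用综述……"))
```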

【16】 Towards Quantifying the Carbon Emissions of Differentially Private Machine Learning 标题:面向差分隐私机器学习碳排放的量化

作者:Rakshit Naidu,Harshita Diddee,Ajinkya Mulay,Aleti Vardhan,Krithika Ramesh,Ahmed Zamzam 机构:Carnegie Mellon University, Manipal Institute of Technology, Bharati Vidyapeeth's College of Engineering, Purdue University, The National Renewable Energy Laboratory 备注:4 3 pages; 6 figures; 8 tables. Accepted at SRML workshop at ICML'21 链接:https://arxiv.org/abs/2107.06946 摘要:近年来,利用大规模数据集的机器学习技术取得了显著成绩。差分隐私通过加入噪声的方式,为这类学习算法提供了强有力的隐私保障。差分隐私的代价通常是模型精度的降低和收敛速度的减慢。本文从碳足迹的角度,研究了差分隐私因更长的运行时间或失败的实验而对学习算法造成的影响。通过大量实验,本文进一步为选择噪声水平提供了指导,以在所需隐私水平与减少碳排放之间取得平衡。 摘要:In recent years, machine learning techniques utilizing large-scale datasets have achieved remarkable performance. Differential privacy, by means of adding noise, provides strong privacy guarantees for such learning algorithms. The cost of differential privacy is often a reduced model accuracy and a lowered convergence speed. This paper investigates the impact of differential privacy on learning algorithms in terms of their carbon footprint due to either longer run-times or failed experiments. Through extensive experiments, further guidance is provided on choosing the noise levels which can strike a balance between desired privacy levels and reduced carbon emissions.

【17】 FetalNet: Multi-task deep learning framework for fetal ultrasound biometric measurements 标题:FetalNet:胎儿超声生物特征测量的多任务深度学习框架

作者:Szymon Płotka,Tomasz Włodarczyk,Adam Klasa,Michał Lipa,Arkadiusz Sitek,Tomasz Trzciński 机构: Sano Centre for Computational Medicine, Cracow, Poland, Warsaw University of Technology, Warsaw, Poland, Medical University of Warsaw, Warsaw, Poland, Fetai Health Ltd., Tooploox, Wroclaw, Poland 备注:Submitted to ICONIP 2021 链接:https://arxiv.org/abs/2107.06943 摘要:在本文中,我们提出了一种名为FetalNet、带注意力机制和堆叠模块的端到端多任务神经网络,用于时空胎儿超声扫描视频分析。胎儿生物测量是妊娠期的一种标准检查,用于胎儿生长监测以及胎龄和胎儿体重的估计。胎儿超声扫描视频分析的主要目标是找到合适的标准切面,以测量胎儿的头、腹和股骨。由于超声数据中固有的高散斑噪声和阴影,需要医学专业知识和超声检查经验才能找到合适的采集切面并对胎儿进行精确测量。此外,现有的计算机辅助胎儿超声生物特征测量方法只处理单帧图像,没有考虑时间特征。针对这些不足,我们提出了一种端到端的多任务神经网络,用于时空超声扫描视频分析,同时对胎儿身体部位进行定位、分类和测量。我们提出了一个新的、包含分类分支的编码器-解码器分割架构。此外,我们利用带堆叠模块的注意力机制来学习显著图,以抑制不相关的超声区域并实现高效的扫描切面定位。我们在来自700名不同患者常规检查的胎儿超声视频上进行了训练。我们的FetalNet方法在胎儿超声视频记录的分类和分割两方面都优于现有的最先进方法。 摘要:In this paper, we propose an end-to-end multi-task neural network called FetalNet with an attention mechanism and stacked module for spatio-temporal fetal ultrasound scan video analysis. Fetal biometric measurement is a standard examination during pregnancy used for the fetus growth monitoring and estimation of gestational age and fetal weight. The main goal in fetal ultrasound scan video analysis is to find proper standard planes to measure the fetal head, abdomen and femur. Due to natural high speckle noise and shadows in ultrasound data, medical expertise and sonographic experience are required to find the appropriate acquisition plane and perform accurate measurements of the fetus. In addition, existing computer-aided methods for fetal US biometric measurement address only one single image frame without considering temporal features. To address these shortcomings, we propose an end-to-end multi-task neural network for spatio-temporal ultrasound scan video analysis to simultaneously localize, classify and measure the fetal body parts. We propose a new encoder-decoder segmentation architecture that incorporates a classification branch. Additionally, we employ an attention mechanism with a stacked module to learn salient maps to suppress irrelevant US regions and efficient scan plane localization. We trained on fetal ultrasound videos that come from routine examinations of 700 different patients. Our method called FetalNet outperforms existing state-of-the-art methods in both classification and segmentation in fetal ultrasound video recordings.

【18】 Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines 标题:Chimera:利用双向管道高效训练大规模神经网络

作者:Shigang Li,Torsten Hoefler 机构:Department of Computer Science, ETH Zurich, Switzerland 备注:The paper was accepted by the 2021 International Conference for High Performance Computing, Networking, Storage and Analysis (SC'21), in Best Paper Finalist 链接:https://arxiv.org/abs/2107.06925 摘要:大规模训练大型深度学习模型非常具有挑战性。本文提出了一种新的流水线并行方案Chimera,该方案结合双向流水线来高效训练大规模模型。Chimera是一种同步方法,因此不会损失精度,比异步方法更易于收敛。与最新的同步流水线方法相比,Chimera减少了多达50%的气泡数量;得益于双向流水线的精细调度,Chimera拥有更均衡的激活内存消耗。评估是在基于Transformer的语言模型上进行的。对于在Piz Daint超级计算机的2048个GPU节点上运行、具有13亿参数的GPT-2模型,Chimera将训练吞吐量比最先进的同步和异步流水线方法提高了1.16倍至2.34倍。 摘要:Training large deep learning models at scale is very challenging. This paper proposes Chimera, a novel pipeline parallelism scheme which combines bidirectional pipelines for efficiently training large-scale models. Chimera is a synchronous approach and therefore no loss of accuracy, which is more convergence-friendly than asynchronous approaches. Compared with the latest synchronous pipeline approach, Chimera reduces the number of bubbles by up to 50%; benefiting from the sophisticated scheduling of bidirectional pipelines, Chimera has a more balanced activation memory consumption. Evaluations are conducted on Transformer based language models. For a GPT-2 model with 1.3 billion parameters running on 2,048 GPU nodes of the Piz Daint supercomputer, Chimera improves the training throughput by 1.16x-2.34x over the state-of-the-art synchronous and asynchronous pipeline approaches.

【19】 FMNet: Latent Feature-wise Mapping Network for Cleaning up Noisy Micro-Doppler Spectrogram 标题:FMNet:去除噪声微多普勒频谱图的潜在特征映射网络

作者:Chong Tang,Wenda Li,Shelly Vishwakarma,Fangzhan Shi,Simon Julier,Kevin Chetty 机构:∗Department of Security and Crime Science, University College London, UK, †Department of Computer Science, University College London, UK 链接:https://arxiv.org/abs/2107.07312 摘要:微多普勒信号包含了大量的目标动力学信息。然而,雷达传感系统容易受到噪声环境的影响,在微多普勒频谱图上产生难以理解的运动模式。同时,雷达回波信号往往会受到多径、杂波和干扰的影响。这些问题给运动特征提取、利用微多普勒信号(micro-Doppler signatures,简称$\mu$-DS)进行活动分类等带来了困难。本文提出了一种潜在的特征映射策略,称为特征映射网络(Feature Mapping Network,FMNet),对测量的频谱图进行变换,使其在相同条件下更接近模拟的输出。基于实测频谱图和匹配的模拟数据,我们的框架包含三个部分:一个用于提取潜在表征/特征的编码器,一个根据潜在特征输出重构频谱图的解码器,以及一个最小化实测和模拟数据潜在特征距离的鉴别器。我们用六类活动数据和两个实验场景对FMNet进行了演示,最终结果显示出很强的增强模式,并能最大程度地保留实际的运动信息。另一方面,我们还提出了一个新颖的想法:只用模拟数据训练分类器,并在用FMNet清理后预测新的实测样本。从最终的分类结果中,我们可以看到显著的改进。 摘要:Micro-Doppler signatures contain considerable information about target dynamics. However, the radar sensing systems are easily affected by noisy surroundings, resulting in uninterpretable motion patterns on the micro-Doppler spectrogram. Meanwhile, radar returns often suffer from multipath, clutter and interference. These issues lead to difficulty in, for example motion feature extraction, activity classification using micro Doppler signatures ($\mu$-DS), etc. In this paper, we propose a latent feature-wise mapping strategy, called Feature Mapping Network (FMNet), to transform measured spectrograms so that they more closely resemble the output from a simulation under the same conditions. Based on measured spectrogram and the matched simulated data, our framework contains three parts: an Encoder which is used to extract latent representations/features, a Decoder outputs reconstructed spectrogram according to the latent features, and a Discriminator minimizes the distance of latent features of measured and simulated data. We demonstrate the FMNet with six activities data and two experimental scenarios, and final results show strong enhanced patterns and can keep actual motion information to the greatest extent. On the other hand, we also propose a novel idea which trains a classifier with only simulated data and predicts new measured samples after cleaning them up with the FMNet. From final classification results, we can see significant improvements.

【20】 Continuous-variable neural-network quantum states and the quantum rotor model 标题:连续变量神经网络量子态与量子转子模型

作者:James Stokes,Saibal De,Shravan Veerapaneni,Giuseppe Carleo 机构:Flatiron Institute, NY 10010, USA; Department of Mathematics, University of Michigan, MI 48109, USA; Institute of Physics 链接:https://arxiv.org/abs/2107.07105 摘要:我们开启了用神经网络量子态算法在一次量子化框架下分析连续变量晶格量子系统的研究。文中引入了一族简单的连续变量试验波函数,它自然地推广了用于分析量子自旋系统的受限玻尔兹曼机(RBM)波函数。得益于其简单性,为自旋系统的基态确定和时间演化所开发的同一套变分蒙特卡罗训练算法,在连续统中都有天然的对应。我们在一个stoquastic量子转子哈密顿量的基态确定问题上给出了原理性验证,并将结果与基于偏微分方程(PDE)的可扩展特征求解器得到的结果进行了比较。这项研究可作为基准,供未来关于连续变量神经量子态的研究进行比较,并指出需要考虑深层网络架构和更复杂的训练算法。 摘要:We initiate the study of neural-network quantum state algorithms for analyzing continuous-variable lattice quantum systems in first quantization. A simple family of continuous-variable trial wavefunctions is introduced which naturally generalizes the restricted Boltzmann machine (RBM) wavefunction introduced for analyzing quantum spin systems. By virtue of its simplicity, the same variational Monte Carlo training algorithms that have been developed for ground state determination and time evolution of spin systems have natural analogues in the continuum. We offer a proof of principle demonstration in the context of ground state determination of a stoquastic quantum rotor Hamiltonian. Results are compared against those obtained from partial differential equation (PDE) based scalable eigensolvers. This study serves as a benchmark against which future investigation of continuous-variable neural quantum states can be compared, and points to the need to consider deep network architectures and more sophisticated training algorithms.
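
作为背景,自旋系统中常用的RBM试验波函数具有如下标准形式(这是文献中的通用写法,并非本文连续变量推广的具体形式):

$$\Psi_{\mathrm{RBM}}(\sigma)=e^{\sum_i a_i\sigma_i}\prod_j 2\cosh\Big(b_j+\sum_i W_{ij}\sigma_i\Big)$$

其中 $a_i$、$b_j$、$W_{ij}$ 为变分参数,$\sigma_i$ 为自旋构型。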

【21】 Learning-based Spectrum Sensing and Access in Cognitive Radios via Approximate POMDPs 标题:认知无线电中基于学习的近似POMDP频谱感知与接入

作者:Bharath Keshavamurthy,Nicolo Michelusi 机构:The authors are with the School of Electrical, Arizona State University 备注:33 pages, 9 figures, 1 table, Major Revisions under review at IEEE Transactions on Cognitive Communications and Networking (IEEE TCCN) 链接:https://arxiv.org/abs/2107.07049 摘要:提出了一种新的基于学习的频谱感知与接入(LESSA)框架,其中认知无线电(CR)学习无线电生态系统中许可用户(LU)频谱占用背后的时频相关模型,同时在感知约束下设计近似最优的频谱感知与接入策略。提出了一种Baum-Welch算法,基于带噪频谱测量学习LU频谱占用的参数化马尔可夫转移模型。频谱感知和接入被建模为部分可观测马尔可夫决策过程,并通过基于随机点的值迭代进行近似优化。提出了分段、Hamming距离状态滤波和蒙特卡罗方法来缓解固有的计算复杂度,并用加权奖励度量来调节CR吞吐量与LU干扰之间的权衡。数值评估表明,LESSA的性能与预知LU频谱占用的genie辅助上界相差不到5%,并在整个权衡区域内优于最先进的算法:比基于相关的聚类好71%,比Neyman-Pearson检测好26%,比Viterbi算法好6%,比自适应深度Q网络好9%。随后,通过提出新的邻居发现和信道接入等级分配,将LESSA扩展到分布式多代理设置(MA-LESSA)。MA-LESSA将CR吞吐量比合作TD-SARSA提高了43%,比合作贪婪分布式学习提高了84%,比基于g统计量和ACK的非合作学习提高了3倍。最后,MA-LESSA在DARPA SC2平台上实现,在真实的TDWR-UNII WLAN仿真中表现出优于竞争对手的性能;其实现可行性在ESP32电台试验台上得到进一步验证,成功率达96%。 摘要:A novel LEarning-based Spectrum Sensing and Access (LESSA) framework is proposed, wherein a cognitive radio (CR) learns a time-frequency correlation model underlying spectrum occupancy of licensed users (LUs) in a radio ecosystem; concurrently, it devises an approximately optimal spectrum sensing and access policy under sensing constraints. A Baum-Welch algorithm is proposed to learn a parametric Markov transition model of LU spectrum occupancy based on noisy spectrum measurements. Spectrum sensing and access are cast as a Partially-Observable Markov Decision Process, approximately optimized via randomized point-based value iteration. Fragmentation, Hamming-distance state filters and Monte-Carlo methods are proposed to alleviate the inherent computational complexity, and a weighted reward metric to regulate the trade-off between CR throughput and LU interference. Numerical evaluations demonstrate that LESSA performs within 5 percent of a genie-aided upper bound with foreknowledge of LU spectrum occupancy, and outperforms state-of-the-art algorithms across the entire trade-off region: 71 percent over correlation-based clustering, 26 percent over Neyman-Pearson detection, 6 percent over the Viterbi algorithm, and 9 percent over an adaptive Deep Q-Network. LESSA is then extended to a distributed Multi-Agent setting (MA-LESSA), by proposing novel neighbor discovery and channel access rank allocation. MA-LESSA improves CR throughput by 43 percent over cooperative TD-SARSA, 84 percent over cooperative greedy distributed learning, and 3x over non-cooperative learning via g-statistics and ACKs. Finally, MA-LESSA is implemented on the DARPA SC2 platform, manifesting superior performance over competitors in a real-world TDWR-UNII WLAN emulation; its implementation feasibility is further validated on a testbed of ESP32 radios, exhibiting 96 percent success probability.

【22】 Performance of Bayesian linear regression in a model with mismatch 标题:贝叶斯线性回归在失配模型中的性能

作者:Jean Barbier,Wei-Kuo Chen,Dmitry Panchenko,Manuel Sáenz 链接:https://arxiv.org/abs/2107.06936 摘要:对于一个随机设计的高维线性回归模型,我们分析了由高斯先验下对数凹贝叶斯后验分布均值给出的估计量的性能。该模型在以下意义上是失配的:与统计学家假设的模型一样,标签生成过程关于输入数据是线性的,但分类器的真实(ground-truth)先验和高斯噪声方差对她而言都是未知的。这个推理模型可以重新表述为自旋玻璃中Gardner模型的一个版本;我们利用空腔方法,给出了各种重叠序参量的不动点方程,特别是在假设解唯一的前提下,得到了分类器上均方重构误差的表达式,并由此直接推得自由能的表达式。Shcherbina和Tirozzi以及Talagrand已经研究过类似的模型,但我们的论证更为直接,且放宽了一些假设。我们的分析得出的一个有趣结论是:在岭回归的随机设计设置中,后验均值的性能与统计学家假设的噪声方差(或“温度”)无关,并与通常的(零温)岭估计量相一致。 摘要:For a model of high-dimensional linear regression with random design, we analyze the performance of an estimator given by the mean of a log-concave Bayesian posterior distribution with gaussian prior. The model is mismatched in the following sense: like the model assumed by the statistician, the labels-generating process is linear in the input data, but both the classifier ground-truth prior and gaussian noise variance are unknown to her. This inference model can be rephrased as a version of the Gardner model in spin glasses and, using the cavity method, we provide fixed point equations for various overlap order parameters, yielding in particular an expression for the mean-square reconstruction error on the classifier (under an assumption of uniqueness of solutions). As a direct corollary we obtain an expression for the free energy. Similar models have already been studied by Shcherbina and Tirozzi and by Talagrand, but our arguments are more straightforward and some assumptions are relaxed. An interesting consequence of our analysis is that in the random design setting of ridge regression, the performance of the posterior mean is independent of the noise variance (or "temperature") assumed by the statistician, and matches the one of the usual (zero temperature) ridge estimator.
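
文中所说的通常(零温)岭估计量即标准的岭回归解(通用定义,记号为常见约定,$\lambda$ 为正则化强度):

$$\hat{w}_{\mathrm{ridge}}=(X^{\top}X+\lambda I)^{-1}X^{\top}y$$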

【23】 Towards quantifying information flows: relative entropy in deep neural networks and the renormalization group 标题:迈向信息流的量化:深度神经网络和重整化群中的相对熵

作者:Johanna Erdmenger,Kevin T. Grosvenor,Ro Jefferson 机构:Institute for Theoretical Physics and Astrophysics and Würzburg-Dresden Cluster of Excellence, ct.qmat, Julius-Maximilians-Universität Würzburg, Am Hubland, Würzburg, Germany 备注:41 pages, 8 figures; code available at this https URL 链接:https://arxiv.org/abs/2107.06898 摘要:我们研究了重整化群(RG)与深度神经网络之间的类比,其中相继的神经元层类比于沿RG的连续步骤。特别地,我们通过显式计算一维和二维Ising模型在抽取(decimation)RG下,以及前馈神经网络中作为深度函数的相对熵或Kullback-Leibler散度,来量化信息流。我们观察到两者在性质上表现相同,其特征是单调增加到一个依赖参数的渐近值。在量子场论方面,这种单调增长证实了相对熵与c定理之间的联系。对于神经网络,该渐近行为可能对机器学习中的各种信息最大化方法,以及解耦紧致性与可泛化性有所启示。此外,虽然我们考虑的二维Ising模型和随机神经网络都表现出非平凡的临界点,但相对熵对任一系统的相结构都不敏感。从这个意义上说,为了充分阐明这些模型中的信息流,还需要更精细的探针。 摘要:We investigate the analogy between the renormalization group (RG) and deep neural networks, wherein subsequent layers of neurons are analogous to successive steps along the RG. In particular, we quantify the flow of information by explicitly computing the relative entropy or Kullback-Leibler divergence in both the one- and two-dimensional Ising models under decimation RG, as well as in a feedforward neural network as a function of depth. We observe qualitatively identical behavior characterized by the monotonic increase to a parameter-dependent asymptotic value. On the quantum field theory side, the monotonic increase confirms the connection between the relative entropy and the c-theorem. For the neural networks, the asymptotic behavior may have implications for various information maximization methods in machine learning, as well as for disentangling compactness and generalizability. Furthermore, while both the two-dimensional Ising model and the random neural networks we consider exhibit non-trivial critical points, the relative entropy appears insensitive to the phase structure of either system. In this sense, more refined probes are required in order to fully elucidate the flow of information in these models.
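
作为参考,文中用于量化信息流的相对熵(Kullback-Leibler散度)的标准定义为(通用定义,非论文特有记号;离散情形,连续情形将求和换为积分):

$$D_{\mathrm{KL}}(p\,\|\,q)=\sum_x p(x)\log\frac{p(x)}{q(x)}$$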

其他(24篇)

【1】 Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks 标题:Shifts:一个跨多个大规模任务的真实分布偏移数据集

作者:Andrey Malinin,Neil Band,German Chesnokov,Yarin Gal,Mark J. F. Gales,Alexey Noskov,Andrey Ploskonosov,Liudmila Prokhorenkova,Ivan Provilkov,Vatsal Raina,Vyas Raina,Mariya Shmatova,Panos Tigas,Boris Yangel 机构:HSE University, Moscow Institute of Physics and Technology, University of Cambridge, University of Oxford, Alan Turing Institute 链接:https://arxiv.org/abs/2107.07455 摘要:对于如何提高对分布偏移的鲁棒性和不确定性估计,人们已经进行了大量研究。相比之下,在开发用于评估这些方法的标准数据集和基准方面,已有工作还很有限。此外,大多数关于不确定性估计和鲁棒性的工作都是基于小规模回归或图像分类任务来发展新技术的。然而,许多有实际意义的任务具有不同的模态,例如表格数据、音频、文本或传感器数据,它们给回归以及离散或连续结构化预测带来了重大挑战。因此,鉴于该领域的现状,有必要建立一个标准化的、跨多种受分布偏移影响的模态的大规模任务数据集。这将使研究人员能够有意义地评估最近开发的大量不确定性量化方法,以及评估标准和最先进的基线。在这项工作中,我们提出Shifts数据集,用于评估不确定性估计和对分布偏移的鲁棒性。该数据集采集自工业来源和服务,由三个任务组成,每个任务对应一种特定的数据模态:表格天气预报、机器翻译和自动驾驶汽车(SDC)车辆运动预测。所有这些数据模态和任务都受到真实的“野外”分布偏移的影响,并在不确定性估计方面提出了有趣的挑战。在这项工作中,我们提供了数据集的描述以及所有任务的基线结果。 摘要:There has been significant research done on developing methods for improving robustness to distributional shift and uncertainty estimation. In contrast, only limited work has examined developing standard datasets and benchmarks for assessing these approaches. Additionally, most work on uncertainty estimation and robustness has developed new techniques based on small-scale regression or image classification tasks. However, many tasks of practical interest have different modalities, such as tabular data, audio, text, or sensor data, which offer significant challenges involving regression and discrete or continuous structured prediction. Thus, given the current state of the field, a standardized large-scale dataset of tasks across a range of modalities affected by distributional shifts is necessary. This will enable researchers to meaningfully evaluate the plethora of recently developed uncertainty quantification methods, as well as assessment criteria and state-of-the-art baselines. In this work, we propose the Shifts Dataset for evaluation of uncertainty estimates and robustness to distributional shift. The dataset, which has been collected from industrial sources and services, is composed of three tasks, with each corresponding to a particular data modality: tabular weather prediction, machine translation, and self-driving car (SDC) vehicle motion prediction. All of these data modalities and tasks are affected by real, 'in-the-wild' distributional shifts and pose interesting challenges with respect to uncertainty estimation. In this work we provide a description of the dataset and baseline results for all tasks.

【2】 AutoBERT-Zero: Evolving BERT Backbone from Scratch 标题:AutoBERT-Zero:从头开始进化的BERT主干

作者:Jiahui Gao,Hang Xu,Han shi,Xiaozhe Ren,Philip L. H. Yu,Xiaodan Liang,Xin Jiang,Zhenguo Li 机构:The University of Hong Kong, Huawei Noah's Ark Lab, Hong Kong University of Science and Technology, Sun Yat-sen University 备注:9 pages 链接:https://arxiv.org/abs/2107.07445 摘要:基于Transformer的预训练语言模型(如BERT及其变体)最近在各种自然语言处理(NLP)任务中取得了良好的性能。然而,传统范式纯粹通过堆叠人工设计的全局自注意力层来构建主干,引入了归纳偏置,从而导致次优结果。在这项工作中,我们提出了一种操作优先的神经架构搜索(OP-NAS)算法,来自动搜索有前途的混合主干架构。我们精心设计的搜索空间(i)在层内包含原始数学运算以探索新的注意力结构,(ii)在层间利用卷积块作为注意力结构的补充,以更好地学习局部依赖。我们同时优化了搜索算法和候选模型的评估,以提高所提OP-NAS的效率。具体来说,我们提出了操作优先级(OP)进化策略,通过平衡探索与利用来促进模型搜索。此外,我们还设计了一种双分支权重共享(BIWS)训练策略,用于快速模型评估。大量实验表明,搜索得到的架构(名为AutoBERT-Zero)在各种下游任务中显著优于BERT及其不同模型容量的变体,证明了该架构的迁移和泛化能力。值得注意的是,AutoBERT-Zero-base在GLUE测试集上比RoBERTa-base(使用多得多的数据)和BERT-large(模型规模大得多)分别高出2.4和1.4分。代码和预训练模型将公开。 摘要:Transformer-based pre-trained language models like BERT and its variants have recently achieved promising performance in various natural language processing (NLP) tasks. However, the conventional paradigm constructs the backbone by purely stacking the manually designed global self-attention layers, introducing inductive bias and thus leading to sub-optimal. In this work, we propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm to automatically search for promising hybrid backbone architectures. Our well-designed search space (i) contains primitive math operations in the intra-layer level to explore novel attention structures, and (ii) leverages convolution blocks to be the supplementary for attention structure in the inter-layer level to better learn local dependency. We optimize both the search algorithm and evaluation of candidate models to boost the efficiency of our proposed OP-NAS. Specifically, we propose Operation-Priority (OP) evolution strategy to facilitate model search via balancing exploration and exploitation. Furthermore, we design a Bi-branch Weight-Sharing (BIWS) training strategy for fast model evaluation. Extensive experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks, proving the architecture's transfer and generalization abilities. Remarkably, AutoBERT-Zero-base outperforms RoBERTa-base (using much more data) and BERT-large (with much larger model size) by 2.4 and 1.4 higher score on GLUE test set. Code and pre-trained models will be made publicly available.

【3】 Convolutional Neural Bandit: Provable Algorithm for Visual-aware Advertising 标题:卷积神经BANDIT:视觉感知广告的可证明算法

作者:Yikun Ban,Jingrui He 机构:University of Illinois at Urbana-Champaign 备注:23 pages, in submission 链接:https://arxiv.org/abs/2107.07438 摘要:在线广告在网络商务中无处不在,图像展示被认为是与客户交互最常用的形式之一。上下文多臂老虎机(contextual multi-armed bandit)已成功应用于广告领域,以解决推荐过程中存在的探索-利用困境。受视觉感知广告的启发,本文提出了一种上下文bandit算法,利用卷积神经网络(CNN)学习奖励函数,并结合置信上界(UCB)进行探索。我们还证明了当网络过参数化时的近似最优遗憾界 $\tilde{\mathcal{O}}(\sqrt{T})$,并与卷积神经正切核(CNTK)建立了紧密联系。最后,我们评估了所提算法的实验性能,并在真实图像数据集上证明它优于其他最先进的基于UCB的bandit算法。 摘要:Online advertising is ubiquitous in web business. Image displaying is considered as one of the most commonly used formats to interact with customers. Contextual multi-armed bandit has shown success in the application of advertising to solve the exploration-exploitation dilemma existing in the recommendation procedure. Inspired by the visual-aware advertising, in this paper, we propose a contextual bandit algorithm, where the convolutional neural network (CNN) is utilized to learn the reward function along with an upper confidence bound (UCB) for exploration. We also prove a near-optimal regret bound $\tilde{\mathcal{O}}(\sqrt{T})$ when the network is over-parameterized and establish strong connections with convolutional neural tangent kernel (CNTK). Finally, we evaluate the empirical performance of the proposed algorithm and show that it outperforms other state-of-the-art UCB-based bandit algorithms on real-world image data sets.
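
UCB式选臂的骨架非常简单:对每个候选取“预测均值+探索加成”并取最大者。下面是一个通用示意(论文中均值与置信半径由CNN及其梯度给出,此处以占位数值代替):

```python
import numpy as np

def ucb_select(mean_rewards, uncertainties, alpha=1.0):
    """UCB 选臂:预测均值加上按 alpha 缩放的探索加成。"""
    return int(np.argmax(mean_rewards + alpha * uncertainties))

rng = np.random.default_rng(4)
means = rng.random(5)           # 5 个候选广告图像的预测回报
uncert = 0.3 * rng.random(5)    # 对应的置信半径
print("选中的臂:", ucb_select(means, uncert))
```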

【4】 Auditing for Diversity using Representative Examples 标题:利用有代表性的例子对多样性进行审计

作者:Vijay Keswani,L. Elisa Celis 机构:Yale University 链接:https://arxiv.org/abs/2107.07393 摘要:在将与人相关的信息数据集用于下游应用之前,评估其多样性至关重要。对于给定的数据集,这通常涉及计算受保护属性(如性别、方言等)的经验边际分布的不平衡或差异。然而,现实世界中的数据集,例如来自Google搜索的图像或Twitter帖子的集合,通常没有标注的受保护属性。因此,为了得到这类数据集的差异度量,需要对元素进行手工标注或众包标注,而这些过程代价高昂。我们提出了一种具有成本效益的方法:使用一个由带标签的代表性样本组成的控制集,来近似给定未标注数据集关于某个受保护属性的差异。所提算法利用数据集元素与控制集元素之间的成对相似性,以自举方式高效地近似该数据集的差异。重要的是,我们证明了使用一个比数据集规模小得多的控制集,就足以获得较小的近似误差。此外,基于我们的理论框架,我们还提供了一种构造自适应控制集的算法,其近似误差小于随机选择的控制集。在两个图像数据集和一个Twitter数据集上的仿真表明了我们的方法(使用随机和自适应控制集)在审计各种数据集多样性方面的有效性。 摘要:Assessing the diversity of a dataset of information associated with people is crucial before using such data for downstream applications. For a given dataset, this often involves computing the imbalance or disparity in the empirical marginal distribution of a protected attribute (e.g. gender, dialect, etc.). However, real-world datasets, such as images from Google Search or collections of Twitter posts, often do not have protected attributes labeled. Consequently, to derive disparity measures for such datasets, the elements need to hand-labeled or crowd-annotated, which are expensive processes. We propose a cost-effective approach to approximate the disparity of a given unlabeled dataset, with respect to a protected attribute, using a control set of labeled representative examples. Our proposed algorithm uses the pairwise similarity between elements in the dataset and elements in the control set to effectively bootstrap an approximation to the disparity of the dataset. Importantly, we show that using a control set whose size is much smaller than the size of the dataset is sufficient to achieve a small approximation error. Further, based on our theoretical framework, we also provide an algorithm to construct adaptive control sets that achieve smaller approximation errors than randomly chosen control sets. Simulations on two image datasets and one Twitter dataset demonstrate the efficacy of our approach (using random and adaptive control sets) in auditing the diversity of a wide variety of datasets.
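
下面用最近邻指派给出该思路的一个简化示意:把每个未标注元素归到与其最相似的控制样本的受保护属性,再计算边际分布的不平衡(论文采用的是基于成对相似度的自举估计,此处仅为说明性的简化):

```python
import numpy as np

def approx_disparity(sim, control_labels):
    """sim: (N, M) 的相似度矩阵;control_labels: M 个控制样本的属性标签。
    以最相似控制样本的属性近似每个元素的属性,并度量两类的不平衡。"""
    assigned = control_labels[np.argmax(sim, axis=1)]
    p = np.bincount(assigned, minlength=2) / len(assigned)
    return abs(p[0] - p[1])

rng = np.random.default_rng(5)
sim = rng.random((1000, 20))                   # 1000 个元素 × 20 个控制样本
control_labels = rng.integers(0, 2, size=20)   # 控制集的属性标签(如性别)
print(f"近似差异: {approx_disparity(sim, control_labels):.3f}")
```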

【5】 A Fixed Version of Quadratic Program in Gradient Episodic Memory 标题:梯度情景记忆中二次规划的一个固定版本

作者:Wei Zhou,Yiying Li 机构:National University of Defense Technology 链接:https://arxiv.org/abs/2107.07384 摘要:梯度情景记忆(GEM)是一种新颖的持续学习方法,能够在不遗忘既有知识的情况下快速解决新问题。然而,在研读该论文的过程中,我们发现其二次规划对偶问题的证明存在一些问题,因此本文给出了该问题的一个修正版本。 摘要:Gradient Episodic Memory is indeed a novel method for continual learning, which solves new problems quickly without forgetting previously acquired knowledge. However, in the process of studying the paper, we found there were some problems in the proof of the dual problem of Quadratic Program, so here we give our fixed version for this problem.

【6】 Copula-Based Normalizing Flows 标题:基于Copula的归一化流

作者:Mike Laszkiewicz,Johannes Lederer,Asja Fischer 机构:Department of Mathematics 备注:Accepted for presentation at the ICML 2021 Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (INNF 2021) 链接:https://arxiv.org/abs/2107.07352 摘要:规范化流通过将数据变换为来自高斯基分布的样本来学习分布,已被证明是强大的密度近似工具。但它们的表达能力受到这种基分布选择的限制。因此,我们建议将基分布推广为更精细的copula分布,以便更准确地捕捉目标分布的特性。在第一个实证分析中,我们证明了这种替换可以在重尾数据上显著提升普通规范化流的灵活性、稳定性和有效性。我们的结果表明,这种改进与所学习流的局部Lipschitz稳定性增强有关。 摘要:Normalizing flows, which learn a distribution by transforming the data to samples from a Gaussian base distribution, have proven powerful density approximations. But their expressive power is limited by this choice of the base distribution. We, therefore, propose to generalize the base distribution to a more elaborate copula distribution to capture the properties of the target distribution more accurately. In a first empirical analysis, we demonstrate that this replacement can dramatically improve the vanilla normalizing flows in terms of flexibility, stability, and effectivity for heavy-tailed data. Our results suggest that the improvements are related to an increased local Lipschitz-stability of the learned flow.
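
把高斯基分布换成更重尾的基分布,对流的对数似然的影响可以用一个玩具流直观看到(以下用Student-t作为重尾示例,仅为示意,并非论文中的copula构造):

```python
import numpy as np
from scipy import stats

def flow_log_likelihood(x, inverse_fn, log_det_fn, base="gaussian", df=3.0):
    """规范化流的对数似然:
    log p(x) = log p_base(f^{-1}(x)) + log|det J_{f^{-1}}(x)|。"""
    z = inverse_fn(x)
    log_base = (stats.norm.logpdf(z) if base == "gaussian"
                else stats.t.logpdf(z, df=df))
    return log_base + log_det_fn(x)

# 玩具流:f^{-1}(x) = x / 2,因而 log|det J| = log(1/2)
inv = lambda x: x / 2.0
ldj = lambda x: np.log(0.5) * np.ones_like(x)
x = np.array([-4.0, 0.0, 4.0])
print("高斯基:", flow_log_likelihood(x, inv, ldj, "gaussian"))
print("重尾基:", flow_log_likelihood(x, inv, ldj, "student-t"))
```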

【7】 Inferring the Structure of Ordinary Differential Equations 标题:常微分方程结构的推断

作者:Juliane Weilbach,Sebastian Gerwinn,Christian Weilbach,Melih Kandemir 机构:Germany; University of British Columbia 链接:https://arxiv.org/abs/2107.07345 摘要:理解物理现象通常意味着理解支配观测测量的潜在动力系统。虽然黑箱系统可以实现准确的预测,但它们往往缺乏可解释性,不太适合专家的进一步研究。作为替代,可以通过符号回归来分析动力学。在本文中,我们将(Udrescu et al., 2020)提出的名为AIFeynman的方法扩展到动态设置,以便基于所得轨迹的观测对ODE系统执行符号回归。我们在若干真实方程已知且复杂度逐步增加的动力系统上,将这一扩展与最先进的符号回归方法进行了实证比较。尽管所提出的方法在这一基准上表现最好,但我们观察到所有参与比较的符号回归方法在更复杂的系统(例如Cart-Pole)上都遇到了困难。 摘要:Understanding physical phenomena oftentimes means understanding the underlying dynamical system that governs observational measurements. While accurate prediction can be achieved with black box systems, they often lack interpretability and are less amenable for further expert investigation. Alternatively, the dynamics can be analysed via symbolic regression. In this paper, we extend the approach by (Udrescu et al., 2020) called AIFeynman to the dynamic setting to perform symbolic regression on ODE systems based on observations from the resulting trajectories. We compare this extension to state-of-the-art approaches for symbolic regression empirically on several dynamical systems for which the ground truth equations of increasing complexity are available. Although the proposed approach performs best on this benchmark, we observed difficulties of all the compared symbolic regression approaches on more complex systems, such as Cart-Pole.

【8】 Framework for A Personalized Intelligent Assistant to Elderly People for Activities of Daily Living 标题:一种面向老年人日常生活活动的个性化智能助手框架

作者:Nirmalya Thakur,Chia Y. Han 机构:Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati 备注:None 链接:https://arxiv.org/abs/2107.07344 摘要:随着老年人口的不断增长,有必要满足其日益增长的需求,并提供能够提升其在智能家居中生活质量的解决方案。除了对与系统交互的恐惧和焦虑之外,随着年龄的增长,老年人往往会面临认知障碍、记忆力减退、行为紊乱甚至身体限制等问题。要提供基于技术的解决方案来满足老年人的这些需求,并为其打造智能化的辅助生活空间,关键在于开发能够适应老年人多样性、并能围绕其日常目标增强其表现的系统。因此,本研究提出了一个框架,用于开发个性化智能助手,以协助老年人在智能互联的物联网(IoT)环境中完成日常生活活动(ADLs)。这种个性化智能助手可以分析用户执行的不同任务,并结合其日常作息、当前情感状态和潜在的用户体验来推荐活动。为了验证该框架的有效性,我们分别在模拟普通用户和特定用户的数据集上对其进行了测试。结果表明,该模型在对特定用户建模时达到73.12%的准确率,显著高于对普通用户建模时的表现,这支持了该框架的开发与实现价值。 摘要:The increasing population of elderly people is associated with the need to meet their increasing requirements and to provide solutions that can improve their quality of life in a smart home. In addition to fear and anxiety towards interfacing with systems; cognitive disabilities, weakened memory, disorganized behavior and even physical limitations are some of the problems that elderly people tend to face with increasing age. The essence of providing technology-based solutions to address these needs of elderly people and to create smart and assisted living spaces for the elderly; lies in developing systems that can adapt by addressing their diversity and can augment their performances in the context of their day to day goals. Therefore, this work proposes a framework for development of a Personalized Intelligent Assistant to help elderly people perform Activities of Daily Living (ADLs) in a smart and connected Internet of Things (IoT) based environment. This Personalized Intelligent Assistant can analyze different tasks performed by the user and recommend activities by considering their daily routine, current affective state and the underlying user experience. To uphold the efficacy of this proposed framework, it has been tested on a couple of datasets for modelling an average user and a specific user respectively. The results presented show that the model achieves a performance accuracy of 73.12% when modelling a specific user, which is considerably higher than its performance while modelling an average user, this upholds the relevance for development and implementation of this proposed framework.

【9】 Mutation is all you need 标题:变异就是你需要的全部

作者:Lennart Schneider,Florian Pfisterer,Martin Binder,Bernd Bischl 机构:Department of Statistics, LMU Munich, Germany 备注:Accepted for the 8th ICML Workshop on Automated Machine Learning (2021). 10 pages, 1 table, 3 figures 链接:https://arxiv.org/abs/2107.07343 摘要:神经架构搜索(NAS)通过将深度神经网络的架构工程自动化,有望让非专家也能使用深度学习。BANANAS是一种最先进的NAS方法,它嵌入在贝叶斯优化框架中。最近的实验结果表明,BANANAS在NAS-Bench-101基准上的强劲表现取决于其路径编码,而非其代理模型的选择。我们给出的实验结果表明,BANANAS在NAS-Bench-301基准上的表现由其采集函数(acquisition function)优化器决定,而该优化器仅对现任最优解(incumbent)进行最小程度的变异。 摘要:Neural architecture search (NAS) promises to make deep learning accessible to non-experts by automating architecture engineering of deep neural networks. BANANAS is one state-of-the-art NAS method that is embedded within the Bayesian optimization framework. Recent experimental findings have demonstrated the strong performance of BANANAS on the NAS-Bench-101 benchmark being determined by its path encoding and not its choice of surrogate model. We present experimental results suggesting that the performance of BANANAS on the NAS-Bench-301 benchmark is determined by its acquisition function optimizer, which minimally mutates the incumbent.

【10】 Leveraging wisdom of the crowds to improve consensus among radiologists by real time, blinded collaborations on a digital swarm platform 标题:利用群体智慧,通过数字蜂群平台上的实时盲法协作提高放射科医生间的共识

作者:Rutwik Shah,Bruno Astuto,Tyler Gleason,Will Fletcher,Justin Banaga,Kevin Sweetwood,Allen Ye,Rina Patel,Kevin McGill,Thomas Link,Jason Crane,Valentina Pedoia,Sharmila Majumdar 机构:Center for Intelligent Imaging, Dept. of Radiology and Biomedical Imaging, UCSF 备注:24 pages, 2 tables, 7 figures 链接:https://arxiv.org/abs/2107.07341 摘要:如今,放射科医生在做出诊断决定和为训练人工智能算法标注图像方面发挥着关键作用。在解读具有挑战性的病例时,专家之间的评阅者间可靠性(inter-reader reliability, IRR)较低。尽管基于团队的决策被认为优于个人决策,但群体互动中往往会渗入人际偏见,使非主导的参与者难以表达真实观点。为了克服低共识和人际偏见这两个问题,我们探索了一种以蜜蜂生物蜂群为原型的解决方案。两个独立的队列——三名放射科医生和五名放射科住院医师——在一个数字蜂群平台上以实时、盲法的方式协作,对膝关节MR检查中的半月板病变进行分级。这些共识投票以临床(关节镜)和放射学(最资深放射科医生)观察为基准,并将共识投票的IRR与两个队列的多数投票及最高置信度投票的IRR进行比较。放射科医生队列的蜂群投票IRR比多数投票提高了23%;3人住院医师蜂群投票的IRR同样比多数投票提高了23%;5人住院医师蜂群的IRR比多数投票的提升幅度更高,达32%。蜂群共识投票还将特异性提高了多达50%。在放射科医生和住院医师两个队列中,蜂群共识投票都优于个人决策和多数投票决策。5人蜂群的IRR高于3人蜂群,表明蜂群规模增大具有积极作用。主治医师蜂群和住院医师蜂群的表现也优于最先进的人工智能算法的预测。利用数字蜂群平台提高了一致性,并允许参与者表达不受评判干扰的真实意图,从而带来卓越的临床表现和稳健的人工智能训练标签。 摘要:Radiologists today play a key role in making diagnostic decisions and labeling images for training A.I. algorithms. Low inter-reader reliability (IRR) can be seen between experts when interpreting challenging cases. While teams-based decisions are known to outperform individual decisions, inter-personal biases often creep up in group interactions which limit non-dominant participants from expressing true opinions. To overcome the dual problems of low consensus and inter-personal bias, we explored a solution modeled on biological swarms of bees. Two separate cohorts; three radiologists and five radiology residents collaborated on a digital swarm platform in real time and in a blinded fashion, grading meniscal lesions on knee MR exams. These consensus votes were benchmarked against clinical (arthroscopy) and radiological (senior-most radiologist) observations. The IRR of the consensus votes was compared to the IRR of the majority and most confident votes of the two cohorts.The radiologist cohort saw an improvement of 23% in IRR of swarm votes over majority vote. Similar improvement of 23% in IRR in 3-resident swarm votes over majority vote, was observed. The 5-resident swarm had an even higher improvement of 32% in IRR over majority vote. Swarm consensus votes also improved specificity by up to 50%. The swarm consensus votes outperformed individual and majority vote decisions in both the radiologists and resident cohorts. The 5-resident swarm had higher IRR than 3-resident swarm indicating positive effect of increased swarm size. The attending and resident swarms also outperformed predictions from a state-of-the-art A.I. algorithm. Utilizing a digital swarm platform improved agreement and allows participants to express judgement free intent, resulting in superior clinical performance and robust A.I. training labels.

【11】 Tournesol: A quest for a large, secure and trustworthy database of reliable human judgments 标题:Tournesol:构建一个大型、安全、可信的可靠人类判断数据库

作者:Lê-Nguyên Hoang,Louis Faucon,Aidan Jungo,Sergei Volodin,Dalia Papuc,Orfeas Liossatos,Ben Crulis,Mariame Tighanimine,Isabela Constantin,Anastasiia Kucherenko,Alexandre Maurer,Felix Grimberg,Vlad Nitu,Chris Vossen,Sébastien Rouault,El-Mahdi El-Mhamdi 机构:IC, EPFL, Switzerland; Tournesol Association, Switzerland; University of Tours, France; LISE, CNAM-CNRS, France; UM6P, Benguerir, Morocco; CNRS, INSA Lyon, France; École Polytechnique, France 备注:27 pages, 13 figures 链接:https://arxiv.org/abs/2107.07334 摘要:今天的大规模算法已经变得极具影响力,因为它们推荐并调节着数十亿人每天接触的内容。它们实际上是我们社会信息饮食的监管者,从塑造公共卫生观点到组织社会运动群体。这引发了严重的担忧,但也带来了推广高质量信息的巨大机遇。应对这些担忧、抓住这些机遇是一项具有挑战性、规模巨大而又意义非凡的事业,因为直觉上有吸引力的想法往往伴随着不想要的副作用,而且这要求我们思考自己内心深处真正的偏好。了解当今大规模算法是如何构建的,对于确定哪些干预措施最有效至关重要。鉴于这些算法严重依赖机器学习,我们提出如下关键观察:任何在非受控数据上训练的算法都不可信任。事实上,恶意实体可以控制数据,用危险的、具有操纵性的伪造输入对其投毒,从而使训练出的算法极不安全。因此,我们认为,迈向安全且合乎伦理的大规模算法的第一步,必须是收集一个大型、安全、可信的可靠人类判断数据集。为此,我们推出了开源平台 Tournesol(https://tournesol.app)。Tournesol 旨在收集一个大型数据库,记录人类关于算法应当广泛推荐什么(以及应当停止广泛推荐什么)的判断。我们概述了 Tournesol 数据库的结构、Tournesol 平台的主要特性,以及使其成为成功项目所必须克服的主要障碍。最重要的是,我们认为,如果成功,Tournesol 可以作为任何安全且合乎伦理的大规模算法的必要基础。 摘要:Today's large-scale algorithms have become immensely influential, as they recommend and moderate the content that billions of humans are exposed to on a daily basis. They are the de-facto regulators of our societies' information diet, from shaping opinions on public health to organizing groups for social movements. This creates serious concerns, but also great opportunities to promote quality information. Addressing the concerns and seizing the opportunities is a challenging, enormous and fabulous endeavor, as intuitively appealing ideas often come with unwanted side effects, and as it requires us to think about what we deeply prefer. Understanding how today's large-scale algorithms are built is critical to determine what interventions will be most effective. Given that these algorithms rely heavily on machine learning, we make the following key observation: any algorithm trained on uncontrolled data must not be trusted. Indeed, a malicious entity could take control over the data, poison it with dangerously manipulative fabricated inputs, and thereby make the trained algorithm extremely unsafe. We thus argue that the first step towards safe and ethical large-scale algorithms must be the collection of a large, secure and trustworthy dataset of reliable human judgments. To achieve this, we introduce Tournesol, an open source platform available at https://tournesol.app. Tournesol aims to collect a large database of human judgments on what algorithms ought to widely recommend (and what they ought to stop widely recommending). We outline the structure of the Tournesol database, the key features of the Tournesol platform and the main hurdles that must be overcome to make it a successful project. Most importantly, we argue that, if successful, Tournesol may then serve as the essential foundation for any safe and ethical large-scale algorithm.

【12】 Input Dependent Sparse Gaussian Processes 标题:依赖输入的稀疏高斯过程

作者:Bahram Jafrasteh,Carlos Villacampa-Calvo,Daniel Hernández-Lobato 机构:Computer Science Department, Universidad Autónoma de Madrid 链接:https://arxiv.org/abs/2107.07281 摘要:高斯过程(GPs)是一种贝叶斯模型,能为预测提供相应的不确定性估计。由于其非参数性质,它们也非常灵活。然而,随着训练实例数 $N$ 的增加,GPs 的可扩展性较差;更准确地说,其计算成本关于 $N$ 是三次方的,即 $\mathcal{O}(N^3)$。为克服这一问题,通常使用稀疏 GP 近似,在训练过程中引入一组 $M \ll N$ 个诱导点。诱导点的位置被视为近似后验分布 $q$ 的参数来学习。稀疏 GP 与用于推断 $q$ 的变分推理相结合,可将 GPs 的训练成本降低到 $\mathcal{O}(M^3)$。关键在于,诱导点决定了模型的灵活性,它们通常位于输入空间中潜在函数发生变化的区域。然而其局限在于,对某些学习任务而言,可能需要大量诱导点才能获得良好的预测性能。为解决这一局限,我们建议对诱导点位置以及变分后验近似 $q$ 的参数进行摊销(amortize)计算。为此,我们使用一个神经网络,以观测数据为输入,输出诱导点的位置和 $q$ 的参数。我们在多个实验中评估了该方法,结果表明其性能与其他最先进的稀疏变分 GP 方法相当或更好。而且,由于诱导点依赖于输入数据,我们的方法所需的诱导点数量大幅减少。这使得我们的方法能扩展到更大的数据集,并具有更快的训练和预测速度。 摘要:Gaussian Processes (GPs) are Bayesian models that provide uncertainty estimates associated to the predictions made. They are also very flexible due to their non-parametric nature. Nevertheless, GPs suffer from poor scalability as the number of training instances N increases. More precisely, they have a cubic cost with respect to $N$. To overcome this problem, sparse GP approximations are often used, where a set of $M \ll N$ inducing points is introduced during training. The location of the inducing points is learned by considering them as parameters of an approximate posterior distribution $q$. Sparse GPs, combined with variational inference for inferring $q$, reduce the training cost of GPs to $\mathcal{O}(M^3)$. Critically, the inducing points determine the flexibility of the model and they are often located in regions of the input space where the latent function changes. A limitation is, however, that for some learning tasks a large number of inducing points may be required to obtain a good prediction performance. To address this limitation, we propose here to amortize the computation of the inducing points locations, as well as the parameters of the variational posterior approximation q. For this, we use a neural network that receives the observed data as an input and outputs the inducing points locations and the parameters of $q$. We evaluate our method in several experiments, showing that it performs similar or better than other state-of-the-art sparse variational GP approaches. However, with our method the number of inducing points is reduced drastically due to their dependency on the input data. This makes our method scale to larger datasets and have faster training and prediction times.
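下面是一个极简的 numpy 草图,示意"用神经网络由输入小批量摊销地产生诱导点位置"这一思路;网络结构、汇聚方式等均为注解者的假设,与论文实现无关,也未包含变分参数 $q$ 的产生与训练。

```python
import numpy as np

# 示意:摊销计算诱导点位置的小型网络(仅为帮助理解论文思路的草图)。

rng = np.random.default_rng(0)
D, M, H = 2, 8, 32            # 输入维度、诱导点数、隐藏层宽度(均为假设)
W1 = rng.normal(0, 0.1, (D, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.1, (H, M * D)); b2 = np.zeros(M * D)

def inducing_points(X_batch):
    """输入一个小批量 X (N, D),输出依赖于该批量的 M 个诱导点位置 (M, D)。"""
    h = np.tanh(X_batch @ W1 + b1).mean(axis=0)   # 对批量做置换不变的均值汇聚
    return (h @ W2 + b2).reshape(M, D)

X = rng.normal(size=(64, D))
Z = inducing_points(X)        # 诱导点随输入数据而变,训练时与变分参数一同学习
print(Z.shape)                # (8, 2)
```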

【13】 On the expressivity of bi-Lipschitz normalizing flows 标题:关于双Lipschitz标准化流的表达能力

作者:Alexandre Verine,Benjamin Negrevergne,Fabrice Rossi,Yann Chevaleyre 机构:Université Paris-Dauphine, PSL Research University, France 链接:https://arxiv.org/abs/2107.07232 摘要:如果一个可逆函数及其逆函数都具有有界的Lipschitz常数,则称该函数为双Lipschitz的。目前,大多数标准化流在设计上或通过训练即为双Lipschitz的,以便(除其他目的外)限制数值误差。本文讨论双Lipschitz标准化流的表达能力,并找出若干难以用此类模型近似的目标分布。随后,我们给出这些特别不利的分布与其最佳可能近似之间总变差距离的若干下界,以此刻画双Lipschitz标准化流的表达能力。最后,我们讨论可能的补救措施,其中包括使用更复杂的隐分布。 摘要:An invertible function is bi-Lipschitz if both the function and its inverse have bounded Lipschitz constants. Nowadays, most Normalizing Flows are bi-Lipschitz by design or by training to limit numerical errors (among other things). In this paper, we discuss the expressivity of bi-Lipschitz Normalizing Flows and identify several target distributions that are difficult to approximate using such models. Then, we characterize the expressivity of bi-Lipschitz Normalizing Flows by giving several lower bounds on the Total Variation distance between these particularly unfavorable distributions and their best possible approximation. Finally, we discuss potential remedies which include using more complex latent distributions.
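为便于理解,这里按摘要的描述补充"双Lipschitz"的定义(标准表述,非原文公式):可逆函数 $f$ 是双Lipschitz的,当且仅当存在常数 $L_1, L_2$ 使得

$$
\frac{1}{L_2}\,\lVert x-y\rVert \;\le\; \lVert f(x)-f(y)\rVert \;\le\; L_1\,\lVert x-y\rVert \qquad \forall\, x, y,
$$

其中 $L_1$ 是 $f$ 的 Lipschitz 常数上界,$L_2$ 是 $f^{-1}$ 的 Lipschitz 常数上界;两者同时有界即为双Lipschitz。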

【14】 What Image Features Boost Housing Market Predictions? 标题:哪些图像特征能提升房地产市场预测?

作者:Zona Kostic,Aleksandar Jevremovic 机构:Harvard University, Paulson School Of Engineering and Applied Sciences; Singidunum University 链接:https://arxiv.org/abs/2107.07148 摘要:房产吸引力是最有趣但也最具挑战性的建模对象之一。图像特征被用来描述特定属性,并检验视觉因素对挂牌价格或挂牌时长的影响。在本文中,我们提出一组视觉特征提取技术,以便将其高效地以数值形式纳入现代预测算法。我们讨论了香农熵、重心计算、图像分割和卷积神经网络等技术。将这些技术应用于一组房产相关图像(室内、室外和卫星图像)并加以比较后,我们得出以下结论:(i)熵是预测房价最有效的单一数值视觉度量;(ii)图像分割是预测房屋挂牌时长最重要的视觉特征;(iii)深度图像特征可用于量化室内特征,并有助于吸引力建模。这里选出的40个图像特征具有很强的预测能力,优于一些最强的元数据预测因子。本文提出的技术无需在房产估价过程中取代人类专家,即可有效刻画可见特征,从而将感知吸引力作为一种定量度量引入住房预测建模。 摘要:The attractiveness of a property is one of the most interesting, yet challenging, categories to model. Image characteristics are used to describe certain attributes, and to examine the influence of visual factors on the price or timeframe of the listing. In this paper, we propose a set of techniques for the extraction of visual features for efficient numerical inclusion in modern-day predictive algorithms. We discuss techniques such as Shannon's entropy, calculating the center of gravity, employing image segmentation, and using Convolutional Neural Networks. After comparing these techniques as applied to a set of property-related images (indoor, outdoor, and satellite), we conclude the following: (i) the entropy is the most efficient single-digit visual measure for housing price prediction; (ii) image segmentation is the most important visual feature for the prediction of housing lifespan; and (iii) deep image features can be used to quantify interior characteristics and contribute to captivation modeling. The set of 40 image features selected here carries a significant amount of predictive power and outperforms some of the strongest metadata predictors. Without any need to replace a human expert in a real-estate appraisal process, we conclude that the techniques presented in this paper can efficiently describe visible characteristics, thus introducing perceived attractiveness as a quantitative measure into the predictive modeling of housing.
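摘要指出熵是预测房价最有效的单一视觉度量。下面用 numpy 给出计算灰度图像香农熵的最小示例(分箱方式与示例数据均为假设,仅演示该度量的算法本身):

```python
import numpy as np

def image_entropy(gray, bins=256):
    """基于灰度直方图的香农熵,单位为比特。"""
    hist, _ = np.histogram(gray, bins=bins, range=(0, 255))
    p = hist / hist.sum()
    p = p[p > 0]                          # 约定 0*log0 = 0
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
flat = np.full((64, 64), 128)             # 均匀图像:熵为 0
noisy = rng.integers(0, 256, (64, 64))    # 噪声图像:熵接近 8 比特
print(image_entropy(flat), image_entropy(noisy))
```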

【15】 Recurrent Parameter Generators 标题:循环参数生成器

作者:Jiayun Wang,Yubei Chen,Stella X. Yu,Brian Cheung,Yann LeCun 机构:UC Berkeley, ICSI, Facebook AI Research, New York University, MIT CSAIL & BCS 链接:https://arxiv.org/abs/2107.07110 摘要:我们提出了一种通用方法,通过在许多不同的卷积层间循环复用同一组参数来构建深度网络。具体而言,对于一个网络,我们创建一个循环参数生成器(RPG),每个卷积层的参数都由它生成。尽管利用循环模型构建深度卷积神经网络(CNN)并不是全新的做法,但与现有工作相比,我们的方法取得了显著的性能提升。我们展示了如何构建一个单层神经网络,在各种应用和数据集上取得与其他传统CNN模型相近的性能。这种方法使我们能够以任意数量的参数构建任意复杂的神经网络。例如,我们构建了一个模型参数减少超过400倍的ResNet34,它仍能达到41.6%的ImageNet top-1精度。此外,我们证明RPG可以应用于不同尺度,如层、块甚至子网络。具体来说,我们使用RPG构建了一个ResNet18网络,其权重数量仅相当于传统ResNet的一个卷积层,并表明该模型可达到67.2%的ImageNet top-1精度。所提方法可以看作模型压缩的一种逆向思路:它的目标不是从大模型中删除未使用的参数,而是把更多信息压缩进少量参数中。大量实验结果证明了所提循环参数生成器的有效性。 摘要:We present a generic method for recurrently using the same parameters for many different convolution layers to build a deep network. Specifically, for a network, we create a recurrent parameter generator (RPG), from which the parameters of each convolution layer are generated. Though using recurrent models to build a deep convolutional neural network (CNN) is not entirely new, our method achieves significant performance gain compared to the existing works. We demonstrate how to build a one-layer neural network to achieve similar performance compared to other traditional CNN models on various applications and datasets. Such a method allows us to build an arbitrarily complex neural network with any amount of parameters. For example, we build a ResNet34 with model parameters reduced by more than 400 times, which still achieves 41.6% ImageNet top-1 accuracy. Furthermore, we demonstrate the RPG can be applied at different scales, such as layers, blocks, or even sub-networks. Specifically, we use the RPG to build a ResNet18 network with the number of weights equivalent to one convolutional layer of a conventional ResNet and show this model can achieve 67.2% ImageNet top-1 accuracy. The proposed method can be viewed as an inverse approach to model compression. Rather than removing the unused parameters from a large model, it aims to squeeze more information into a small number of parameters. Extensive experiment results are provided to demonstrate the power of the proposed recurrent parameter generator.
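下面给出循环参数生成器思想的一个极简示意:多个卷积层的权重全部取自同一个共享参数库。其中"随机索引加符号翻转"的生成映射是注解者的假设,并非论文的确切机制:

```python
import numpy as np

# 示意:多个"不同"卷积层的权重由同一个共享参数向量生成(草图,非论文实现)。

rng = np.random.default_rng(0)
bank = rng.normal(0, 0.05, size=10_000)   # 全网络共享的参数库

def layer_weights(layer_id, shape):
    """为第 layer_id 层生成形状为 shape 的权重,参数全部来自共享 bank。"""
    n = int(np.prod(shape))
    local = np.random.default_rng(layer_id)   # 每层一个固定种子,映射可复现
    idx = local.integers(0, bank.size, n)     # 从 bank 中取参数(允许重复使用)
    sign = local.choice([-1.0, 1.0], n)       # 符号翻转以降低层间相关性
    return (bank[idx] * sign).reshape(shape)

w1 = layer_weights(1, (64, 3, 3, 3))      # 两个卷积层的权重
w2 = layer_weights(2, (128, 64, 3, 3))    # 实际共享同一万个自由参数
print(w1.shape, w2.shape)
```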

【16】 A Generalized Framework for Edge-preserving and Structure-preserving Image Smoothing 标题:一种广义的保边缘保结构图像平滑框架

作者:Wei Liu,Pingping Zhang,Yinjie Lei,Xiaolin Huang,Jie Yang,Michael Ng 机构:Yinjie Lei is with the School of Electronics and Information Engineering, Sichuan University 备注:This work is accepted by TPAMI. The code is available at this https URL arXiv admin note: substantial text overlap with arXiv:1907.09642 链接:https://arxiv.org/abs/2107.07058 摘要:图像平滑是计算机视觉和图形学应用中的一个基本过程。不同任务所需的平滑性质可能不同,甚至相互矛盾。然而,一个平滑算子固有的平滑性质通常是固定的,因此无法满足不同应用的多样化要求。本文首先引入截断Huber罚函数,它在不同参数设置下表现出很强的灵活性。随后,我们基于所引入的截断Huber罚函数提出了一个广义框架。凭借其强大的灵活性,我们的框架能够实现多样的平滑性质,甚至可以实现相互矛盾的平滑行为;它还能产生以往方法难以实现的平滑行为,从而在具有挑战性的情形下取得优异性能。这些特点共同使我们的框架可用于一系列应用,并在多项任务中超越最先进的方法,例如图像细节增强、剪贴画压缩伪影去除、引导深度图恢复、图像纹理去除等。此外,我们给出了一个高效的数值求解方法,即使优化框架是非凸、非光滑的,其收敛性也有理论保证。我们还进一步提出了一种简单而有效的方法,在保持性能的同时降低计算代价。通过在一系列应用上的全面实验,验证了所提方法的有效性和优越性能。我们的代码见 https://github.com/wliusjtu/Generalized-Smoothing-Framework。 摘要:Image smoothing is a fundamental procedure in applications of both computer vision and graphics. The required smoothing properties can be different or even contradictive among different tasks. Nevertheless, the inherent smoothing nature of one smoothing operator is usually fixed and thus cannot meet the various requirements of different applications. In this paper, we first introduce the truncated Huber penalty function which shows strong flexibility under different parameter settings. A generalized framework is then proposed with the introduced truncated Huber penalty function. When combined with its strong flexibility, our framework is able to achieve diverse smoothing natures where contradictive smoothing behaviors can even be achieved. It can also yield the smoothing behavior that can seldom be achieved by previous methods, and superior performance is thus achieved in challenging cases. These together enable our framework capable of a range of applications and able to outperform the state-of-the-art approaches in several tasks, such as image detail enhancement, clip-art compression artifacts removal, guided depth map restoration, image texture removal, etc. In addition, an efficient numerical solution is provided and its convergence is theoretically guaranteed even the optimization framework is non-convex and non-smooth. A simple yet effective approach is further proposed to reduce the computational cost of our method while maintaining its performance. The effectiveness and superior performance of our approach are validated through comprehensive experiments in a range of applications. Our code is available at https://github.com/wliusjtu/Generalized-Smoothing-Framework.
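作为示意,下面给出截断Huber罚函数的一种可能形式(小残差二次、中等残差线性、大残差截断为常数;具体分段与参数化为注解者的假设,以论文定义为准):

```python
import numpy as np

def truncated_huber(x, a=1.0, b=4.0):
    """|x|<=a 时为二次罚;a<|x|<=b 时为线性罚;|x|>b 时截断为常数。"""
    ax = np.abs(x)
    quad = 0.5 * x**2                 # 小梯度处:二次罚,产生强平滑
    lin = a * ax - 0.5 * a**2         # 中等梯度:Huber 的线性段,保边缘
    cap = a * b - 0.5 * a**2          # 大梯度(结构):截断为常数,不再受罚
    return np.where(ax <= a, quad, np.where(ax <= b, lin, cap))

x = np.linspace(-8, 8, 9)
print(truncated_huber(x))   # 两端为常数,中段线性,原点附近二次
```

直观上,截断段使超过阈值的强梯度完全不受惩罚,因而同一框架通过调节 $a$、$b$ 即可在不同平滑行为之间切换,这与摘要所述"不同参数设置下的强灵活性"相一致。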

【17】 Conditional Teaching Size 标题:条件教学规模

作者:Manuel Garcia-Piqueras,José Hernández-Orallo 机构:Math. Dept., Universidad de Castilla-La Mancha, Albacete, Spain; VRAIN, Universitat Politècnica de València, Valencia, Spain 备注:26 pages 链接:https://arxiv.org/abs/2107.07038 摘要:最近的机器教学研究探索了如何教授以一种通用语言表达的任意概念。在这种组合式背景下,新的实验结果表明,存在比概念描述本身短得惊人的教学数据集。然而,这些显著的实验发现受到一个由教学规模与概念复杂度给出的界的约束,我们在此对其作进一步探讨。由于概念很少被孤立地教授,我们研究教授给定概念集合的最佳概念配置,其中先习得的概念可以被复用于描述新概念。这一新的"条件教学规模"概念揭示了新的洞见,例如插入(interposition)现象:某些先验知识会产生更简单的相容概念,反而增大了我们想要教授的概念的教学规模。条件Kolmogorov复杂度则不会出现这种情况。此外,我们给出了一种基于规避插入现象来构造最优课程的算法。本文给出了一系列理论结果(含证明)以及若干未来工作方向。组合情境下课程教学的新研究空间现已开放,有待探索。 摘要:Recent research in machine teaching has explored the instruction of any concept expressed in a universal language. In this compositional context, new experimental results have shown that there exist data teaching sets surprisingly shorter than the concept description itself. However, there exists a bound for those remarkable experimental findings through teaching size and concept complexity that we further explore here. As concepts are rarely taught in isolation we investigate the best configuration of concepts to teach a given set of concepts, where those that have been acquired first can be reused for the description of new ones. This new notion of conditional teaching size uncovers new insights, such as the interposition phenomenon: certain prior knowledge generates simpler compatible concepts that increase the teaching size of the concept that we want to teach. This does not happen for conditional Kolmogorov complexity. Furthermore, we provide an algorithm that constructs optimal curricula based on interposition avoidance. This paper presents a series of theoretical results, including their proofs, and some directions for future work. New research possibilities in curriculum teaching in compositional scenarios are now wide open to exploration.

【18】 Free-Text Keystroke Dynamics for User Authentication 标题:用于用户身份验证的自由文本击键动力学

作者:Jianwei Li,Han-Chih Chang,Mark Stamp 链接:https://arxiv.org/abs/2107.07009 摘要:在本研究中,我们考虑基于自由文本击键动力学验证用户身份的问题。我们采用一种新颖的特征工程方法,生成类似图像的转移矩阵。对于这种类图像特征,采用带cutout数据增强的卷积神经网络(CNN)可获得最佳效果。我们还表明,由CNN和循环神经网络(RNN)组成的混合模型优于该领域先前的研究。 摘要:In this research, we consider the problem of verifying user identity based on keystroke dynamics obtained from free-text. We employ a novel feature engineering method that generates image-like transition matrices. For this image-like feature, a convolution neural network (CNN) with cutout achieves the best results. A hybrid model consisting of a CNN and a recurrent neural network (RNN) is also shown to outperform previous research in this field.
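下面的 numpy 草图示意如何从自由文本击键序列构造"类图像"的转移矩阵(此处以相邻按键的平均转移时延作为矩阵元素;具体特征定义为注解者的假设):

```python
import numpy as np

KEYS = "abcdefghijklmnopqrstuvwxyz"
IDX = {k: i for i, k in enumerate(KEYS)}

def transition_matrix(keys, press_times):
    """keys: 按键字符序列;press_times: 对应按下时刻(秒)。
    返回 26x26 矩阵,元素 (i, j) 为按键 i 到 j 的平均转移时延。"""
    total = np.zeros((26, 26)); count = np.zeros((26, 26))
    pairs = zip(zip(keys, press_times), zip(keys[1:], press_times[1:]))
    for (k1, t1), (k2, t2) in pairs:
        if k1 in IDX and k2 in IDX:
            total[IDX[k1], IDX[k2]] += t2 - t1
            count[IDX[k1], IDX[k2]] += 1
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)

times = np.cumsum(np.random.default_rng(0).uniform(0.05, 0.3, 16))
M = transition_matrix("thequickbrownfox", times)
print(M.shape)   # (26, 26),可直接作为单通道"图像"输入 CNN
```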

【19】 The Benchmark Lottery 标题:基准彩票

作者:Mostafa Dehghani,Yi Tay,Alexey A. Gritsenko,Zhe Zhao,Neil Houlsby,Fernando Diaz,Donald Metzler,Oriol Vinyals 机构:Google Brain, Google Research, DeepMind 链接:https://arxiv.org/abs/2107.07002 摘要:为了确定不同算法和方法的相对有效性,经验机器学习领域高度依赖基准测试。本文提出"基准彩票"(benchmark lottery)的概念,用以描述ML基准测试流程的整体脆弱性。基准彩票假说认为,除了根本性的算法优越性之外,许多其他因素也可能导致某个方法被认为更优。在ML社区流行的多种基准设置上,我们表明,仅仅选择不同的基准任务,算法的相对性能就可能显著改变,这凸显了当前范式的脆弱性,以及由基准测试得出的结论可能存在的误读。鉴于每个基准都在表达它认为什么是重要的,我们认为这可能导致社区中出现有偏的进展。我们讨论了所观察到现象的影响,并以多个机器学习领域和社区为用例给出缓解建议,包括自然语言处理、计算机视觉、信息检索、推荐系统和强化学习。 摘要:The world of empirical machine learning (ML) strongly relies on benchmarks in order to determine the relative effectiveness of different algorithms and methods. This paper proposes the notion of "a benchmark lottery" that describes the overall fragility of the ML benchmarking process. The benchmark lottery postulates that many factors, other than fundamental algorithmic superiority, may lead to a method being perceived as superior. On multiple benchmark setups that are prevalent in the ML community, we show that the relative performance of algorithms may be altered significantly simply by choosing different benchmark tasks, highlighting the fragility of the current paradigms and potential fallacious interpretation derived from benchmarking ML methods. Given that every benchmark makes a statement about what it perceives to be important, we argue that this might lead to biased progress in the community. We discuss the implications of the observed phenomena and provide recommendations on mitigating them using multiple machine learning domains and communities as use cases, including natural language processing, computer vision, information retrieval, recommender systems, and reinforcement learning.

【20】 On the impossibility of non-trivial accuracy under fairness constraints 标题:论公平约束下非平凡精度的不可能性

作者:Carlos Pinzón,Catuscia Palamidessi,Pablo Piantanida,Frank Valencia 机构:Inria, France; CNRS, France; L2S, CentraleSupélec, Université Paris-Saclay; LIX, École Polytechnique, Institut Polytechnique de Paris 备注:9 pages 链接:https://arxiv.org/abs/2107.06944 摘要:关于机器学习(ML)公平性的一个主要担忧是,为了实现公平,人们可能不得不牺牲一些精度。考虑到这种权衡,Hardt等人提出了机会均等(EO)的概念,其设计初衷是与精度相容。事实上可以证明,如果输入数据源是确定性的,这两个概念可以很好地共存。然而在概率情形下,情况发生了变化:正如我们所展示的,存在一些概率数据源,对它们而言,EO只能以完全牺牲精度为代价来实现,也就是说,在满足EO的模型中,预测不依赖于输入的模型反而具有最高精度。 摘要:One of the main concerns about fairness in machine learning (ML) is that, in order to achieve it, one may have to renounce to some accuracy. Having this trade-off in mind, Hardt et al. have proposed the notion of equal opportunities (EO), designed so as to be compatible with accuracy. In fact, it can be shown that if the source of input data is deterministic, the two notions go well along with each other. In the probabilistic case, however, things change. As we show, there are probabilistic data sources for which EO can only be achieved at the total detriment of accuracy, i.e. among the models that achieve EO, those whose prediction does not depend on the input have the highest accuracy.
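作为背景,补充Hardt等人机会均等(equal opportunity)的标准定义(该方向的常见表述,非本文原文):

$$
\Pr(\hat{Y}=1 \mid A=0,\; Y=1) \;=\; \Pr(\hat{Y}=1 \mid A=1,\; Y=1),
$$

即在真实标签为正($Y=1$)的个体上,预测为正的概率(真阳性率)在受保护属性 $A$ 的不同取值之间相等。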

【21】 Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update 标题:Newton-LESS:草图牛顿更新的无权衡稀疏化

作者:Michał Dereziński,Jonathan Lacotte,Mert Pilanci,Michael W. Mahoney 链接:https://arxiv.org/abs/2107.07480 摘要:在二阶优化中,一个潜在的瓶颈是在每次迭代时计算被优化函数的Hessian矩阵。随机草图(sketching)已成为构造Hessian估计的一种强有力技术,这些估计可用于执行近似牛顿步。这涉及与一个随机草图矩阵相乘,从而在草图的计算代价与优化算法的收敛速度之间引入了权衡。一个理论上可取但实践中代价过高的选择是使用稠密高斯草图矩阵,它给出精确牛顿步的无偏估计,并提供与问题无关的强收敛保证。我们证明了高斯草图矩阵可以被大幅稀疏化,显著降低草图的计算代价,而基本不影响其收敛性质。这种称为Newton-LESS的方法基于最近提出的一种草图技术:杠杆分数稀疏化(LEverage Score Sparsified,LESS)嵌入。我们证明,对于一大类优化任务,Newton-LESS享有与高斯嵌入几乎相同的与问题无关的局部收敛速度:不仅在常数因子意义上一致,甚至在低阶项上也一致。特别地,这为迭代最小二乘求解器带来了新的最先进收敛结果。最后,我们将LESS嵌入扩展到均匀稀疏化的随机符号矩阵,这类矩阵可以高效实现,并且在数值实验中表现良好。 摘要:In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration. Randomized sketching has emerged as a powerful technique for constructing estimates of the Hessian which can be used to perform approximate Newton steps. This involves multiplication by a random sketching matrix, which introduces a trade-off between the computational cost of sketching and the convergence rate of the optimization algorithm. A theoretically desirable but practically much too expensive choice is to use a dense Gaussian sketching matrix, which produces unbiased estimates of the exact Newton step and which offers strong problem-independent convergence guarantees. We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching, without substantially affecting its convergence properties. This approach, called Newton-LESS, is based on a recently introduced sketching technique: LEverage Score Sparsified (LESS) embeddings. We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings, not just up to constant factors but even down to lower order terms, for a large class of optimization tasks. In particular, this leads to a new state-of-the-art convergence result for an iterative least squares solver. Finally, we extend LESS embeddings to include uniformly sparsified random sign matrices which can be implemented efficiently and which perform well in numerical experiments.
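下面的 numpy 草图示意最小二乘问题上的"草图牛顿步":用稀疏随机符号矩阵 $S$ 以 $(SA)^\top (SA)$ 近似 Hessian $A^\top A$。此处采用每列固定非零元个数的均匀稀疏符号草图,仅作说明;论文中的 LESS 嵌入会按杠杆分数安排稀疏模式:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 5000, 50, 400            # 样本数、维度、草图大小(均为示例取值)
A = rng.normal(size=(n, d)); b = rng.normal(size=n)

def sparse_sign_sketch(m, n, s=8):
    """每列含 s 个非零元的随机符号草图矩阵,非零元取 ±1/sqrt(s)。"""
    S = np.zeros((m, n))
    for j in range(n):
        rows = rng.choice(m, size=s, replace=False)
        S[rows, j] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
    return S

x = np.zeros(d)
for _ in range(10):                # 草图牛顿迭代
    S = sparse_sign_sketch(m, n)
    SA = S @ A                     # m x d 的草图,远小于原问题
    g = A.T @ (A @ x - b)          # 精确梯度
    x = x - np.linalg.solve(SA.T @ SA, g)   # 用草图 Hessian 解牛顿方向

print(np.linalg.norm(A.T @ (A @ x - b)))    # 梯度范数应大幅减小
```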

【22】 Multi-label Chaining with Imprecise Probabilities 标题:具有不精确概率的多标签链

作者:Yonatan Carlos Carranza Alarcón,Sébastien Destercke 机构:Sorbonne Universités, Université Technologique de Compiègne 链接:https://arxiv.org/abs/2107.07443 摘要:我们提出了两种不同的策略,将经典的多标签链方法扩展到处理不精确的概率估计。这类估计用分布的凸集(即credal集)而非单个精确分布来刻画不确定性。使用这类估计的主要动机是:(1)当链中检测到高度不确定性时,做出谨慎的预测(或干脆不做决定);(2)通过避免链中早期决策引入的偏差,做出更好的精确预测。借助朴素credal分类器,我们为这两种策略提出了具有理论依据的高效程序。我们在缺失标签设置下考察了两种方法预测的可靠程度,实验结果表明,我们的方法在精确模型失效的那些难以预测的实例上表现出恰当的谨慎性。 摘要:We present two different strategies to extend the classical multi-label chaining approach to handle imprecise probability estimates. These estimates use convex sets of distributions (or credal sets) in order to describe our uncertainty rather than a precise one. The main reasons one could have for using such estimations are (1) to make cautious predictions (or no decision at all) when a high uncertainty is detected in the chaining and (2) to make better precise predictions by avoiding biases caused in early decisions in the chaining. Through the use of the naive credal classifier, we propose efficient procedures with theoretical justifications to solve both strategies. Our experimental results on missing labels, which investigate how reliable these predictions are in both approaches, indicate that our approaches produce relevant cautiousness on those hard-to-predict instances where the precise models fail.
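作为背景,下面给出经典(精确概率)多标签分类器链的最小用例:按固定顺序逐个预测标签,并把已预测的标签作为后续标签的特征。论文讨论的是在此基础上引入credal集的扩展,下例并未实现该扩展,数据亦为随机构造:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# 构造 3 个相关的二值标签,便于体现链式结构的作用
Y = (X[:, :3] + rng.normal(0, 0.5, size=(200, 3)) > 0).astype(int)

chain = ClassifierChain(LogisticRegression(), order=[0, 1, 2], random_state=0)
chain.fit(X, Y)
print(chain.predict(X[:5]))   # 每行是一个实例的 3 个标签预测
```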

【23】 A unified framework for bandit multiple testing 标题:bandit多重检验的统一框架

作者:Ziyu Xu,Ruodu Wang,Aaditya Ramdas 机构:Departments of Statistics and Machine Learning, Carnegie Mellon University; Department of Statistics and Actuarial Science, University of Waterloo 备注:37 pages. 6 figures 链接:https://arxiv.org/abs/2107.07322 摘要:在bandit多重假设检验中,每个臂对应一个我们希望检验的不同零假设,目标是设计自适应算法,正确识别大量有趣的臂(真发现),同时仅错误识别少量无趣的臂(假发现)。非bandit多重检验中的一个常用指标是错误发现率(FDR)。我们提出了一个用于bandit FDR控制的统一、模块化框架,强调将探索与证据汇总相解耦。我们利用强大的基于鞅的"e-过程"概念,在一般的问题设置中确保对任意复合零假设、探索规则和停止时间的FDR控制。特别地,即使各臂的奖励分布可能相互依赖、多个臂可以被同时查询、多个(合作或竞争的)智能体可以查询臂,有效的FDR控制依然成立,这也涵盖了组合半bandit类型的设置。先前的工作已非常详细地考虑了每个臂的奖励分布独立且为次高斯、每步只查询一个臂的设置。在这一特例中,我们的框架恢复了相匹配的样本复杂度保证,且在实践中表现相当或更好。对于其他设置,样本复杂度将取决于问题的更精细细节(被检验的复合零假设、探索算法、数据依赖结构、停止规则),我们不对此展开;我们的贡献在于表明FDR保证是干净的,并且与这些细节完全无关。 摘要:In bandit multiple hypothesis testing, each arm corresponds to a different null hypothesis that we wish to test, and the goal is to design adaptive algorithms that correctly identify large set of interesting arms (true discoveries), while only mistakenly identifying a few uninteresting ones (false discoveries). One common metric in non-bandit multiple testing is the false discovery rate (FDR). We propose a unified, modular framework for bandit FDR control that emphasizes the decoupling of exploration and summarization of evidence. We utilize the powerful martingale-based concept of "e-processes" to ensure FDR control for arbitrary composite nulls, exploration rules and stopping times in generic problem settings. In particular, valid FDR control holds even if the reward distributions of the arms could be dependent, multiple arms may be queried simultaneously, and multiple (cooperating or competing) agents may be querying arms, covering combinatorial semi-bandit type settings as well. Prior work has considered in great detail the setting where each arm's reward distribution is independent and sub-Gaussian, and a single arm is queried at each step. Our framework recovers matching sample complexity guarantees in this special case, and performs comparably or better in practice. For other settings, sample complexities will depend on the finer details of the problem (composite nulls being tested, exploration algorithm, data dependence structure, stopping rule) and we do not explore these; our contribution is to show that the FDR guarantee is clean and entirely agnostic to these details.
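为便于理解,补充"e-过程"的常见定义(该方向的标准表述,非原文公式):对复合零假设 $\mathcal{H}_0$,e-过程是一个非负随机过程 $(E_t)_{t \ge 0}$,满足

$$
\mathbb{E}_{P}\left[E_\tau\right] \;\le\; 1 \qquad \text{for all } P \in \mathcal{H}_0 \text{ and all stopping times } \tau.
$$

这一性质使得在任意数据依赖的停止规则下,仍能给出有效的错误控制。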

【24】 Hida-Matérn Kernel 标题:Hida-Matérn核

作者:Matthew Dowling,Piotr Sokół,Il Memming Park 机构:Department of Neurobiology and Behavior, Department of Electrical and Computer Engineering, Institute for Advanced Computational Sciences, Institute of AI-driven Discovery and Innovation, Stony Brook University, Stony Brook, NY, USA 链接:https://arxiv.org/abs/2107.07098 摘要:我们提出Hida-Matérn核这一类核函数,它是整个平稳Gauss-Markov过程空间上协方差函数的规范族。它扩展了Matérn核,允许对具有振荡成分的过程灵活地构造先验。任何平稳核(包括广泛使用的平方指数核和谱混合核)要么直接属于这一类,要么是其适当的渐近极限,这表明了该类核的普遍性。利用其马尔可夫性质,我们展示了如何仅用核及其导数将此类过程表示为状态空间模型。这进而使我们能够更高效地执行高斯过程推断,规避通常的计算负担。我们还展示了如何利用状态空间表示的特殊性质,在进一步降低计算复杂度的同时提高数值稳定性。 摘要:We present the class of Hida-Matérn kernels, which is the canonical family of covariance functions over the entire space of stationary Gauss-Markov Processes. It extends upon Matérn kernels, by allowing for flexible construction of priors over processes with oscillatory components. Any stationary kernel, including the widely used squared-exponential and spectral mixture kernels, are either directly within this class or are appropriate asymptotic limits, demonstrating the generality of this class. Taking advantage of its Markovian nature we show how to represent such processes as state space models using only the kernel and its derivatives. In turn this allows us to perform Gaussian Process inference more efficiently and side step the usual computational burdens. We also show how exploiting special properties of the state space representation enables improved numerical stability in addition to further reductions of computational complexity.
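下面用 numpy 给出"Matérn核乘以余弦调制"这一带振荡成分的平稳核实例,用以示意摘要中"对具有振荡成分的过程构造先验"的思路;该形式与参数均为注解者的假设,不代表Hida-Matérn核的确切定义:

```python
import numpy as np

def matern32(tau, lengthscale=1.0):
    """Matérn-3/2 平稳核,输入为时间差 tau。"""
    r = np.sqrt(3.0) * np.abs(tau) / lengthscale
    return (1.0 + r) * np.exp(-r)

def oscillatory_kernel(tau, lengthscale=1.0, omega=2.0 * np.pi):
    """Matérn-3/2 包络乘以余弦调制(两个平稳核之积仍是有效的平稳核),
    可用于对具有准周期成分的过程构造先验。"""
    return matern32(tau, lengthscale) * np.cos(omega * tau)

tau = np.linspace(-3, 3, 7)
print(oscillatory_kernel(tau))
```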
