Artificial Intelligence Academic Digest [7.13]

2021-07-27 10:51:22

Visit www.arxivdaily.com for daily digests with abstracts, covering CS | Physics | Math | Economics | Statistics | Finance | Biology | Electrical Engineering, with search, favorites, posting, and more!

cs.AI (Artificial Intelligence): 91 papers in total

【1】 Hierarchical Neural Dynamic Policies

Authors: Shikhar Bahl, Abhinav Gupta, Deepak Pathak
Affiliations: Carnegie Mellon University
Note: Accepted at RSS 2021. Videos and code at this https URL
Link: https://arxiv.org/abs/2107.05627
Abstract: We tackle the problem of generalization to unseen configurations for dynamic tasks in the real world while learning from high-dimensional image input. The family of nonlinear dynamical system-based methods has successfully demonstrated dynamic robot behaviors but has difficulty in generalizing to unseen configurations as well as learning from image inputs. Recent works approach this issue by using deep network policies and reparameterizing actions to embed the structure of dynamical systems, but still struggle in domains with diverse configurations of image goals, and hence find it difficult to generalize. In this paper, we address this dichotomy by embedding the structure of dynamical systems in a hierarchical deep policy learning framework, called Hierarchical Neural Dynamical Policies (H-NDPs). Instead of fitting deep dynamical systems to diverse data directly, H-NDPs form a curriculum by learning local dynamical system-based policies on small regions in state-space and then distill them into a global dynamical system-based policy that operates only from high-dimensional images. H-NDPs additionally provide smooth trajectories, a strong safety benefit in the real world. We perform extensive experiments on dynamic tasks both in the real world (digit writing, scooping, and pouring) and simulation (catching, throwing, picking). We show that H-NDPs are easily integrated with both imitation as well as reinforcement learning setups and achieve state-of-the-art results. Video results are at https://shikharbahl.github.io/hierarchical-ndps/

【2】 Let's Play for Action: Recognizing Activities of Daily Living by Learning from Life Simulation Video Games

Authors: Alina Roitberg, David Schneider, Aulia Djamal, Constantin Seibold, Simon Reiß, Rainer Stiefelhagen
Affiliations: Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Germany
Note: Accepted at IROS. © IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
Link: https://arxiv.org/abs/2107.05617
Abstract: Recognizing Activities of Daily Living (ADL) is a vital process for intelligent assistive robots, but collecting large annotated datasets requires time-consuming temporal labeling and raises privacy concerns, e.g., if the data is collected in a real household. In this work, we explore the concept of constructing training examples for ADL recognition by playing life simulation video games and introduce the SIMS4ACTION dataset created with the popular commercial game THE SIMS 4. We build Sims4Action by specifically executing actions-of-interest in a "top-down" manner, while the gaming circumstances allow us to freely switch between environments, camera angles and subject appearances. While ADL recognition on gaming data is interesting from the theoretical perspective, the key challenge arises from transferring it to real-world applications, such as smart homes or assistive robotics. To meet this requirement, Sims4Action is accompanied by a GamingToReal benchmark, where the models are evaluated on real videos derived from an existing ADL dataset. We integrate two modern algorithms for video-based activity recognition in our framework, revealing the value of life simulation video games as an inexpensive and far less intrusive source of training data. However, our results also indicate that tasks involving a mixture of gaming and real data are challenging, opening a new research direction. We will make our dataset publicly available at https://github.com/aroitberg/sims4action.

【3】 A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution

Authors: Valts Blukis, Chris Paxton, Dieter Fox, Animesh Garg, Yoav Artzi
Affiliations: NVIDIA; Cornell University; University of Washington; University of Toronto; Vector Institute
Note: Submitted to CoRL 2021
Link: https://arxiv.org/abs/2107.05612
Abstract: Natural language provides an accessible and expressive interface to specify long-term tasks for robotic agents. However, non-experts are likely to specify such tasks with high-level instructions, which abstract over specific robot actions through several layers of abstraction. We propose that the key to bridging this gap between language and robot actions over long execution horizons is persistent representations. We propose a persistent spatial semantic representation method, and show how it enables building an agent that performs hierarchical reasoning to effectively execute long-term tasks. We evaluate our approach on the ALFRED benchmark and achieve state-of-the-art results, despite completely avoiding the commonly used step-by-step instructions.

【4】 Active Divergence with Generative Deep Learning -- A Survey and Taxonomy

Authors: Terence Broad, Sebastian Berns, Simon Colton, Mick Grierson
Affiliations: Department of Computing, Goldsmiths, University of London, UK; Creative Computing Institute, University of the Arts London, UK; School of Electronic Engineering and Computer Science, Queen Mary University of London, UK
Link: https://arxiv.org/abs/2107.05599
Abstract: Generative deep learning systems offer powerful tools for artefact generation, given their ability to model distributions of data and generate high-fidelity results. In the context of computational creativity, however, a major shortcoming is that they are unable to explicitly diverge from the training data in creative ways and are limited to fitting the target data distribution. To address these limitations, there have been a growing number of approaches for optimising, hacking and rewriting these models in order to actively diverge from the training data. We present a taxonomy and comprehensive survey of the state of the art of active divergence techniques, highlighting the potential for computational creativity researchers to advance these methods and use deep generative models in truly creative systems.

【5】 ProGS: Property Graph Shapes Language (Extended Version)

Authors: Philipp Seifer, Ralf Lämmel, Steffen Staab
Affiliations: The Software Languages Team, University of Koblenz-Landau, Germany; Institute for Parallel and Distributed Systems, University of Stuttgart, Germany; Web and Internet Science Research Group, University of Southampton, England
Link: https://arxiv.org/abs/2107.05566
Abstract: Property graphs constitute data models for representing knowledge graphs. They allow for the convenient representation of facts, including facts about facts, represented by triples in subject or object position of other triples. Knowledge graphs such as Wikidata are created by a diversity of contributors and a range of sources, leaving them prone to two types of errors. The first type of error, falsity of facts, is addressed by property graphs through the representation of provenance and validity, making triples occur as first-order objects in subject position of metadata triples. The second type of error, violation of domain constraints, has not been addressed with regard to property graphs so far. In RDF representations, this error can be addressed by shape languages such as SHACL or ShEx, which allow for checking whether graphs are valid with respect to a set of domain constraints. Borrowing ideas from the syntax and semantics definitions of SHACL, we design a shape language for property graphs, ProGS, which allows for formulating shape constraints on property graphs including their specific constructs, such as edges with identities and key-value annotations to both nodes and edges. We define a formal semantics of ProGS, investigate the resulting complexity of validating property graphs against sets of ProGS shapes, compare with corresponding results for SHACL, and implement a prototypical validator that utilizes answer set programming.

【6】 Research on Metro Service Quality Improvement Schemes Considering Feasibility

Authors: Chen Weiya, Li Jiajia, Kang Zixuan
Note: In Chinese. Published in the Journal of South China University of Technology; funded by the Hunan Provincial Natural Science Foundation (2018JJ2537)
Link: https://arxiv.org/abs/2107.05558
Abstract: It is an important management task of metro agencies to formulate reasonable improvement schemes based on the results of service quality surveys. Considering scores, weights, and improvement feasibility of service quality attributes in a certain period, this paper integrates Decision Tree (DT) into Importance-Performance Analysis (IPA) to build a DT-IPA model, which is used to determine the improvement priority of attributes, and to quantify the improvement degree. If-then rules extracted from the optimal decision tree and the improvement feasibility computed by the analytic hierarchy process are the two main items derived from the DT-IPA model. They are used to optimize the initial improvement priority of attributes determined by IPA and to quantify the degree of improvement of the adjusted attributes. Then, the overall service quality can reach a high score, realizing the operation goal. The effectiveness of the DT-IPA model was verified through an empirical study which took place in Changsha Metro, China. The proposed method can be a decision-making tool for metro agency managers to improve the quality of metro service.
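For readers unfamiliar with IPA, the quadrant logic that the DT-IPA model starts from can be sketched in a few lines of Python. The quadrant names follow the classic IPA formulation; using the grand means as thresholds is our simplifying assumption, and the paper refines these initial priorities with decision-tree rules and AHP feasibility scores:

```python
def ipa_quadrant(importance, performance, imp_mean, perf_mean):
    # Classic Importance-Performance Analysis: attributes are split into four
    # action quadrants by the grand means of importance and performance.
    if importance >= imp_mean and performance < perf_mean:
        return "concentrate here"        # important but under-performing
    if importance >= imp_mean:
        return "keep up the good work"   # important and performing well
    if performance < perf_mean:
        return "low priority"            # unimportant and under-performing
    return "possible overkill"           # unimportant but over-served
```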

【7】 Towards Better Laplacian Representation in Reinforcement Learning with Generalized Graph Drawing

Authors: Kaixin Wang, Kuangqi Zhou, Qixin Zhang, Jie Shao, Bryan Hooi, Jiashi Feng
Affiliations: National University of Singapore; City University of Hong Kong; ByteDance AI Lab (equal contribution)
Note: ICML 2021
Link: https://arxiv.org/abs/2107.05545
Abstract: The Laplacian representation recently gains increasing attention for reinforcement learning as it provides succinct and informative representation for states, by taking the eigenvectors of the Laplacian matrix of the state-transition graph as state embeddings. Such representation captures the geometry of the underlying state space and is beneficial to RL tasks such as option discovery and reward shaping. To approximate the Laplacian representation in large (or even continuous) state spaces, recent works propose to minimize a spectral graph drawing objective, which however has infinitely many global minimizers other than the eigenvectors. As a result, their learned Laplacian representation may differ from the ground truth. To solve this problem, we reformulate the graph drawing objective into a generalized form and derive a new learning objective, which is proved to have eigenvectors as its unique global minimizer. It enables learning high-quality Laplacian representations that faithfully approximate the ground truth. We validate this via comprehensive experiments on a set of gridworld and continuous control environments. Moreover, we show that our learned Laplacian representations lead to more exploratory options and better reward shaping.
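As a rough sketch of the idea (our reading of the abstract, not the paper's exact notation): the spectral graph drawing objective treats all embedding dimensions symmetrically, so any rotation of the eigenvectors is also a minimizer; assigning strictly decreasing weights to the dimensions breaks this symmetry so that the eigenvectors become the unique minimizer:

```latex
% Spectral graph drawing (symmetric; rotation-invariant minimizers):
\min_{f}\; \mathbb{E}_{(s,s')}\Big[\sum_{k=1}^{d}\big(f_k(s)-f_k(s')\big)^2\Big]
\quad \text{s.t. orthonormality of } f_1,\dots,f_d
% Generalized form with decreasing weights c_1 > c_2 > \dots > c_d:
\min_{f}\; \sum_{k=1}^{d} c_k\, \mathbb{E}_{(s,s')}\Big[\big(f_k(s)-f_k(s')\big)^2\Big]
```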

【8】 End-to-End Natural Language Understanding Pipeline for Bangla Conversational Agent

Authors: Fahim Shahriar Khan, Mueeze Al Mushabbir, Mohammad Sabik Irbaz, MD Abdullah Al Nasim
Affiliations: Department of Computer Science and Engineering, Islamic University of Technology; Machine Learning Team, Pioneer Alpha Ltd.
Note: Under review
Link: https://arxiv.org/abs/2107.05541
Abstract: Chatbots are intelligent software built to be used as a replacement for human interaction. However, existing studies typically do not provide enough support for low-resource languages like Bangla. Moreover, due to the increasing popularity of social media, we can also see the rise of interactions in Bangla transliteration (mostly in English) among the native Bangla speakers. In this paper, we propose a novel approach to build a Bangla chatbot aimed to be used as a business assistant which can communicate in Bangla and Bangla transliteration in English with high confidence consistently. Since annotated data was not available for this purpose, we had to work on the whole machine learning life cycle (data preparation, machine learning modeling, and model deployment) using the Rasa Open Source Framework, fastText embeddings, Polyglot embeddings, Flask, and other systems as building blocks. While working with the skewed annotated dataset, we try out different setups and pipelines to evaluate which works best and provide possible reasoning behind the observed results. Finally, we present a pipeline for intent classification and entity extraction which achieves reasonable performance (accuracy: 83.02%, precision: 80.82%, recall: 83.02%, F1-score: 80%).

【9】 1st Place Solution for ICDAR 2021 Competition on Mathematical Formula Detection

Authors: Yuxiang Zhong, Xianbiao Qi, Shanjun Li, Dengyi Gu, Yihao Chen, Peiyang Ning, Rong Xiao
Affiliations: Visual Computing Group (VCGroup), Ping An Property & Casualty Insurance Company
Note: 1st place solution for the ICDAR 2021 Competition on Mathematical Formula Detection. this http URL
Link: https://arxiv.org/abs/2107.05534
Abstract: In this technical report, we present our 1st place solution for the ICDAR 2021 competition on mathematical formula detection (MFD). The MFD task has three key challenges: a large scale span, large variation of the ratio between height and width, and a rich character set and mathematical expressions. Considering these challenges, we used Generalized Focal Loss (GFL), an anchor-free method, instead of an anchor-based method, and show that the Adaptive Training Sampling Strategy (ATSS) and a proper Feature Pyramid Network (FPN) can well solve the important issue of scale variation. Meanwhile, we also found that some tricks, e.g., Deformable Convolution Network (DCN), SyncBN, and Weighted Box Fusion (WBF), were effective in the MFD task. Our proposed method ranked 1st among the final 15 teams.
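Of the tricks listed, Weighted Box Fusion is easy to try in isolation. A minimal sketch using the open-source `ensemble-boxes` package (coordinates must be normalized to [0, 1]; the weights and thresholds below are placeholder values, not the competition settings):

```python
from ensemble_boxes import weighted_boxes_fusion

# Predictions from two detectors (or two augmented passes) for one image,
# boxes as [x1, y1, x2, y2] normalized to [0, 1].
boxes_list  = [[[0.10, 0.20, 0.40, 0.30]], [[0.12, 0.21, 0.41, 0.32]]]
scores_list = [[0.90], [0.80]]
labels_list = [[0], [0]]  # 0 = formula

# WBF averages overlapping boxes weighted by confidence instead of
# discarding them the way NMS does.
boxes, scores, labels = weighted_boxes_fusion(
    boxes_list, scores_list, labels_list,
    weights=[1, 1], iou_thr=0.55, skip_box_thr=0.05)
print(boxes, scores, labels)
```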

【10】 Anatomy-Constrained Contrastive Learning for Synthetic Segmentation without Ground-truth

Authors: Bo Zhou, Chi Liu, James S. Duncan
Affiliations: Biomedical Engineering, Yale University, New Haven, CT, USA; Radiology and Biomedical Imaging, Yale University, New Haven, CT, USA
Note: Accepted at MICCAI 2021
Link: https://arxiv.org/abs/2107.05482
Abstract: A large amount of manual segmentation is typically required to train a robust segmentation network so that it can segment objects of interest in a new imaging modality. The manual efforts can be alleviated if the manual segmentation in one imaging modality (e.g., CT) can be utilized to train a segmentation network in another imaging modality (e.g., CBCT/MRI/PET). In this work, we developed an anatomy-constrained contrastive synthetic segmentation network (AccSeg-Net) to train a segmentation network for a target imaging modality without using its ground truth. Specifically, we proposed to use anatomy-constraint and patch contrastive learning to ensure the anatomy fidelity during the unsupervised adaptation, such that the segmentation network can be trained on the adapted image with correct anatomical structure/content. The training data for our AccSeg-Net consists of 1) imaging data paired with segmentation ground-truth in the source modality, and 2) unpaired source and target modality imaging data. We demonstrated successful applications on CBCT, MRI, and PET imaging data, and showed superior segmentation performances as compared to previous methods.

【11】 Denoising User-aware Memory Network for Recommendation

Authors: Zhi Bian, Shaojun Zhou, Hao Fu, Qihong Yang, Zhenqi Sun, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li
Affiliations: Alibaba Group
Link: https://arxiv.org/abs/2107.05474
Abstract: For better user satisfaction and business effectiveness, more and more attention has been paid to sequence-based recommendation systems, which are used to infer the evolution of users' dynamic preferences, and recent studies have noticed that the evolution of users' preferences can be better understood from the implicit and explicit feedback sequences. However, most of the existing recommendation techniques do not consider the noise contained in implicit feedback, which will lead to a biased representation of user interest and a suboptimal recommendation performance. Meanwhile, the existing methods utilize the item sequence for capturing the evolution of user interest. The performance of these methods is limited by the length of the sequence, and they cannot effectively model long-term interest over a long period of time. Based on this observation, we propose a novel CTR model named denoising user-aware memory network (DUMN). Specifically, the framework: (i) proposes a feature purification module based on orthogonal mapping, which uses the representation of explicit feedback to purify the representation of implicit feedback and effectively denoises the implicit feedback; (ii) designs a user memory network to model long-term interests in a fine-grained way by improving the memory network, which is ignored by the existing methods; and (iii) develops a preference-aware interactive representation component to fuse the long-term and short-term interests of users based on gating to understand the evolution of unbiased preferences of users. Extensive experiments on two real e-commerce user behavior datasets show that DUMN has a significant improvement over the state-of-the-art baselines. The code of the DUMN model has been uploaded as an additional material.
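The abstract does not spell out the orthogonal mapping, but one plausible reading is a batch-wise projection: keep the component of the implicit-feedback representation that lies along the explicit-feedback representation and discard the orthogonal remainder as noise. A hedged PyTorch sketch of that reading only; the paper's actual module may differ:

```python
import torch

def purify(implicit: torch.Tensor, explicit: torch.Tensor, eps: float = 1e-8):
    # Project each implicit-feedback vector onto its explicit-feedback
    # counterpart; the orthogonal residue is treated as noise and dropped.
    dot = (implicit * explicit).sum(dim=-1, keepdim=True)
    norm_sq = (explicit * explicit).sum(dim=-1, keepdim=True).clamp_min(eps)
    return dot / norm_sq * explicit
```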

【12】 Visual-Tactile Cross-Modal Data Generation using Residue-Fusion GAN with Feature-Matching and Perceptual Losses

Authors: Shaoyu Cai, Kening Zhu, Yuki Ban, Takuji Narumi
Affiliations: City University of Hong Kong; City University of Hong Kong Shenzhen Research Institute
Note: 8 pages, 6 figures. Accepted by IEEE Robotics and Automation Letters
Link: https://arxiv.org/abs/2107.05468
Abstract: Existing psychophysical studies have revealed that cross-modal visual-tactile perception is common for humans performing daily activities. However, it is still challenging to build the algorithmic mapping from one modality space to another, namely the cross-modal visual-tactile data translation/generation, which could be potentially important for robotic operation. In this paper, we propose a deep-learning-based approach for cross-modal visual-tactile data generation by leveraging the framework of generative adversarial networks (GANs). Our approach takes the visual image of a material surface as the visual data, and the accelerometer signal induced by the pen-sliding movement on the surface as the tactile data. We adopt the conditional-GAN (cGAN) structure together with the residue-fusion (RF) module, and train the model with the additional feature-matching (FM) and perceptual losses to achieve the cross-modal data generation. The experimental results show that the inclusion of the RF module, and the FM and the perceptual losses significantly improves cross-modal data generation performance in terms of the classification accuracy upon the generated data and the visual similarity between the ground-truth and the generated data.
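For reference, the usual way such terms are combined (a pix2pixHD-style sketch consistent with the loss names; the paper's exact weights and layer choices may differ) is a weighted sum over discriminator features D_i and pretrained-network features φ_j:

```latex
\mathcal{L}_G \;=\; \mathcal{L}_{\mathrm{adv}}
\;+\; \lambda_{\mathrm{FM}} \sum_{i} \frac{1}{N_i}
      \big\lVert D_i(x, y) - D_i(x, G(x)) \big\rVert_1
\;+\; \lambda_{\mathrm{P}} \sum_{j} \frac{1}{M_j}
      \big\lVert \phi_j(y) - \phi_j(G(x)) \big\rVert_1
```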

【13】 iGrow: A Smart Agriculture Solution to Autonomous Greenhouse Control

Authors: Xiaoyan Cao, Yao Yao, Lanqing Li, Wanpeng Zhang, Zhicheng An, Zhong Zhang, Shihui Guo, Li Xiao, Xiaoyu Cao, Dijun Luo
Affiliations: School of Informatics, Xiamen University, Xiamen, China; Tsinghua-Berkeley Shenzhen Institute, Tsinghua University, Shenzhen, China; AI Lab, Tencent; College of Chemistry and Chemical Engineering, Xiamen University
Note: 10 pages, 6 figures, 4 tables. Submitted to Nature Machine Intelligence
Link: https://arxiv.org/abs/2107.05464
Abstract: Agriculture is the foundation of human civilization. However, the rapid increase and aging of the global population pose challenges on this cornerstone by demanding more healthy and fresh food. Internet of Things (IoT) technology makes the modern autonomous greenhouse a viable and reliable engine of food production. However, the educated and skilled labor capable of overseeing high-tech greenhouses is scarce. Artificial intelligence (AI) and cloud computing technologies are promising solutions for precision control and high-efficiency production in such controlled environments. In this paper, we propose a smart agriculture solution, namely iGrow: (1) we use IoT and cloud computing technologies to measure, collect, and manage growing data, to support iteration of our decision-making AI module, which consists of an incremental model and an optimization algorithm; (2) we propose a three-stage incremental model based on accumulating data, enabling growers/central computers to schedule control strategies conveniently and at low cost; (3) we propose a model-based iterative optimization algorithm, which can dynamically optimize the greenhouse control strategy in real-time production. In the simulated experiment, evaluation results show the accuracy of our incremental model is comparable to an advanced tomato simulator, while our optimization algorithms can beat the champion of the 2nd Autonomous Greenhouse Challenge. Compelling results from the A/B test in real greenhouses demonstrate that our solution significantly increases production (commercially sellable fruits) (+10.15%) and net profit (+87.07%) with statistical significance compared to planting experts.

【14】 Automated Label Generation for Time Series Classification with Representation Learning: Reduction of Label Cost for Training

Authors: Soma Bandyopadhyay, Anish Datta, Arpan Pal
Affiliations: TCS Research, TATA Consultancy Services, Kolkata, India
Note: 8 pages, 5 figures, 3 tables. Accepted at the IJCAI 2021 Weakly Supervised Representation Learning (WSRL) Workshop; this https URL
Link: https://arxiv.org/abs/2107.05458
Abstract: Time-series generated by end-users, edge devices, and different wearables are mostly unlabelled. We propose a method to auto-generate labels for unlabelled time-series, exploiting very few representative labelled time-series. Our method is based on representation learning using Auto Encoded Compact Sequence (AECS) with a choice of best distance measure. It performs self-correction in iterations, by learning latent structure, as well as synthetically boosting representative time-series using a Variational Auto-Encoder (VAE) to improve the quality of labels. We have experimented with the UCR and UCI archives, public real-world univariate and multivariate time-series taken from different application domains. Experimental results demonstrate that the proposed method is very close to the performance achieved by fully supervised classification. The proposed method not only produces close to benchmark results but outperforms the benchmark performance in some cases.
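The core labeling step, assigning each unlabelled series the label of its nearest labelled representative in the learned latent space, can be sketched as follows. The iteration, self-correction, and VAE boosting described above wrap around this; the metric is whichever distance measure the method selects as best:

```python
import numpy as np
from sklearn.metrics import pairwise_distances

def propagate_labels(z_unlabelled, z_labelled, y_labelled, metric="euclidean"):
    # z_*: latent codes from the auto-encoded compact representation.
    # Each unlabelled series takes the label of its closest representative.
    d = pairwise_distances(z_unlabelled, z_labelled, metric=metric)
    return np.asarray(y_labelled)[np.argmin(d, axis=1)]
```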

【15】 Improving the Algorithm of Deep Learning with Differential Privacy

Authors: Mehdi Amian
Affiliations: INRS-EMT, University of Quebec, Montreal, Canada
Link: https://arxiv.org/abs/2107.05457
Abstract: In this paper, an adjustment to the original differentially private stochastic gradient descent (DPSGD) algorithm for deep learning models is proposed. As a matter of motivation, to date, almost no state-of-the-art machine learning algorithm employs the existing privacy-protecting components, despite their vital necessity, because of the serious compromise they would otherwise impose on utility. The idea in this study is natural and interpretable, contributing to improving utility with respect to the state of the art. Another property of the proposed technique is its simplicity, which makes it again more natural and also more appropriate for real-world and especially commercial applications. The intuition is to trim and balance out wild individual discrepancies for privacy reasons, and at the same time, to preserve relative individual differences for seeking performance. The idea proposed here can also be applied to recurrent neural networks (RNN) to solve the gradient exploding problem. The algorithm is applied to the benchmark datasets MNIST and CIFAR-10 for a classification task, and the utility measure is calculated. The results outperform the original work.
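For context, a minimal sketch of the vanilla DPSGD step (Abadi et al., 2016) that the paper adjusts: per-sample clipping followed by calibrated Gaussian noise. The paper's proposal to trim and balance individual discrepancies would replace the clipping rule below:

```python
import torch

def dpsgd_step(per_sample_grads, clip_norm, noise_multiplier):
    # per_sample_grads: list of flattened gradient tensors, one per example.
    clipped = []
    for g in per_sample_grads:
        factor = torch.clamp(clip_norm / (g.norm() + 1e-12), max=1.0)
        clipped.append(g * factor)              # clip each sample's gradient
    avg = torch.stack(clipped).mean(dim=0)      # average over the batch
    sigma = noise_multiplier * clip_norm / len(per_sample_grads)
    return avg + torch.randn_like(avg) * sigma  # add calibrated Gaussian noise
```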

【16】 Source-Free Adaptation to Measurement Shift via Bottom-Up Feature Restoration

Authors: Cian Eastwood, Ian Mason, Christopher K. I. Williams, Bernhard Schölkopf
Affiliations: School of Informatics, University of Edinburgh; Alan Turing Institute, London; Max Planck Institute for Intelligent Systems, Tübingen
Link: https://arxiv.org/abs/2107.05446
Abstract: Source-free domain adaptation (SFDA) aims to adapt a model trained on labelled data in a source domain to unlabelled data in a target domain without access to the source-domain data during adaptation. Existing methods for SFDA leverage entropy-minimization techniques which: (i) apply only to classification; (ii) destroy model calibration; and (iii) rely on the source model achieving a good level of feature-space class-separation in the target domain. We address these issues for a particularly pervasive type of domain shift called measurement shift, characterized by a change in measurement system (e.g. a change in sensor or lighting). In the source domain, we store a lightweight and flexible approximation of the feature distribution under the source data. In the target domain, we adapt the feature-extractor such that the approximate feature distribution under the target data realigns with that saved on the source. We call this method Feature Restoration (FR) as it seeks to extract features with the same semantics from the target domain as were previously extracted from the source. We additionally propose Bottom-Up Feature Restoration (BUFR), a bottom-up training scheme for FR which boosts performance by preserving learnt structure in the later layers of a network. Through experiments we demonstrate that BUFR often outperforms existing SFDA methods in terms of accuracy, calibration, and data efficiency, while being less reliant on the performance of the source model in the target domain.
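A toy version of the restoration objective, under a strong simplifying assumption we are making only for illustration (Gaussian per-dimension marginals rather than the paper's lightweight distribution approximation): save source feature statistics, then adapt the target extractor to minimize the per-dimension KL divergence back to them:

```python
import torch

def marginal_restoration_loss(target_feats, src_mean, src_var, eps=1e-6):
    # target_feats: [batch, dim] features from the adapting extractor.
    # Penalise KL( N(t_mean, t_var) || N(src_mean, src_var) ) per dimension,
    # pushing target features back toward the saved source marginals.
    t_mean = target_feats.mean(dim=0)
    t_var = target_feats.var(dim=0) + eps
    kl = 0.5 * (torch.log(src_var / t_var)
                + (t_var + (t_mean - src_mean) ** 2) / src_var - 1.0)
    return kl.sum()
```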

【17】 Disentangling Transfer and Interference in Multi-Domain Learning

Authors: Yipeng Zhang, Tyler L. Hayes, Christopher Kanan
Affiliations: University of Rochester, Rochester, NY, USA; Rochester Institute of Technology; Paige, New York, NY, USA; Cornell Tech
Link: https://arxiv.org/abs/2107.05445
Abstract: Humans are incredibly good at transferring knowledge from one domain to another, enabling rapid learning of new tasks. Likewise, transfer learning has enabled enormous success in many computer vision problems using pretraining. However, the benefits of transfer in multi-domain learning, where a network learns multiple tasks defined by different datasets, have not been adequately studied. Learning multiple domains could be beneficial, or these domains could interfere with each other given limited network capacity. In this work, we decipher the conditions where interference and knowledge transfer occur in multi-domain learning. We propose new metrics disentangling interference and transfer and set up experimental protocols. We further examine the roles of network capacity, task grouping, and dynamic loss weighting in reducing interference and facilitating transfer. We demonstrate our findings on the CIFAR-100, MiniPlaces, and Tiny-ImageNet datasets.

【18】 CoBERL: Contrastive BERT for Reinforcement Learning

Authors: Andrea Banino, Adrià Puigdomènech Badia, Jacob Walker, Tim Scholtes, Jovana Mitrovic, Charles Blundell
Affiliations: DeepMind, London, UK
Note: 9 pages, 2 figures, 6 tables
Link: https://arxiv.org/abs/2107.05431
Abstract: Many reinforcement learning (RL) agents require a large amount of experience to solve tasks. We propose Contrastive BERT for RL (CoBERL), an agent that combines a new contrastive loss and a hybrid LSTM-transformer architecture to tackle the challenge of improving data efficiency. CoBERL enables efficient, robust learning from pixels across a wide range of domains. We use bidirectional masked prediction in combination with a generalization of recent contrastive methods to learn better representations for transformers in RL, without the need for hand-engineered data augmentations. We find that CoBERL consistently improves performance across the full Atari suite, a set of control tasks and a challenging 3D environment.

【19】 MECT: Multi-Metadata Embedding based Cross-Transformer for Chinese Named Entity Recognition

Authors: Shuang Wu, Xiaoning Song, Zhenhua Feng
Affiliations: School of Artificial Intelligence and Computer Science, Jiangnan University, China; Department of Computer Science, University of Surrey, UK; Centre for Vision, Speech and Signal Processing, University of Surrey, UK
Note: Accepted to ACL 2021
Link: https://arxiv.org/abs/2107.05418
Abstract: Recently, word enhancement has become very popular for Chinese Named Entity Recognition (NER), reducing segmentation errors and increasing the semantic and boundary information of Chinese words. However, these methods tend to ignore the information of the Chinese character structure after integrating the lexical information. Chinese characters have evolved from pictographs since ancient times, and their structure often reflects more information about the characters. This paper presents a novel Multi-metadata Embedding based Cross-Transformer (MECT) to improve the performance of Chinese NER by fusing the structural information of Chinese characters. Specifically, we use multi-metadata embedding in a two-stream Transformer to integrate Chinese character features with the radical-level embedding. With the structural characteristics of Chinese characters, MECT can better capture the semantic information of Chinese characters for NER. The experimental results obtained on several well-known benchmarking datasets demonstrate the merits and superiority of the proposed MECT method. The source code of the proposed method is publicly available at https://github.com/CoderMusou/MECT4CNER.

【20】 PonderNet: Learning to Ponder

Authors: Andrea Banino, Jan Balaguer, Charles Blundell
Affiliations: DeepMind, London, UK
Note: 16 pages, 2 figures, 2 tables. 8th ICML Workshop on Automated Machine Learning (2021)
Link: https://arxiv.org/abs/2107.05407
Abstract: In standard neural networks the amount of computation used grows with the size of the inputs, but not with the complexity of the problem being learnt. To overcome this limitation we introduce PonderNet, a new algorithm that learns to adapt the amount of computation based on the complexity of the problem at hand. PonderNet learns end-to-end the number of computational steps to achieve an effective compromise between training prediction accuracy, computational cost and generalization. On a complex synthetic problem, PonderNet dramatically improves performance over previous adaptive computation methods and additionally succeeds at extrapolation tests where traditional neural networks fail. Also, our method matched the current state-of-the-art results on a real-world question-answering dataset, but using less compute. Finally, PonderNet reached state-of-the-art results on a complex task designed to test the reasoning capabilities of neural networks.
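The halting mechanism behind PonderNet is simple to state: at step n the network emits a halting probability λ_n, inducing a distribution p_n over halting steps, and the loss mixes per-step prediction losses under p with a KL regularizer toward a geometric prior with parameter λ_p:

```latex
p_n = \lambda_n \prod_{m=1}^{n-1} (1-\lambda_m), \qquad
\mathcal{L} = \underbrace{\sum_{n} p_n\, \mathcal{L}\big(y, \hat{y}_n\big)}_{\text{reconstruction}}
\;+\; \beta\, \mathrm{KL}\big(p \,\|\, \mathrm{Geometric}(\lambda_p)\big)
```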

【21】 Identifying Hijacked Reviews

Authors: Monika Daryani, James Caverlee
Affiliations: Texas A&M University, College Station, TX
Note: To be published in the ACL-IJCNLP 2021 Workshop on e-Commerce and NLP (ECNLP)
Link: https://arxiv.org/abs/2107.05385
Abstract: Fake reviews and review manipulation are growing problems on online marketplaces globally. Review hijacking is a new review manipulation tactic in which unethical sellers "hijack" an existing product page (usually one with many positive reviews), then update the product details like title, photo, and description with those of an entirely different product. With the earlier reviews still attached, the new item appears well-reviewed. However, there are no public datasets of review hijacking and little is known in the literature about this tactic. Hence, this paper proposes a three-part study: (i) we propose a framework to generate synthetically labeled data for review hijacking by swapping products and reviews; (ii) then, we evaluate the potential of both a Twin LSTM network and a BERT sequence pair classifier to distinguish legitimate reviews from hijacked ones using this data; and (iii) we then deploy the best performing model on a collection of 31K products (with 6.5M reviews) in the original data, where we find hundreds of previously unknown examples of review hijacking.
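Part (i) of the study amounts to a simple swap construction. A sketch of how such synthetically labeled pairs could be generated (the field names here are hypothetical, and the paper's actual swapping may be more careful, e.g., about product categories):

```python
import random

def make_hijacked_pairs(products, seed=0):
    # Pair each review with the metadata of a different, randomly chosen
    # product to create synthetic "hijacked" examples (label 1), keeping the
    # original pairings as legitimate examples (label 0).
    rng = random.Random(seed)
    examples = []
    for p in products:
        for review in p["reviews"]:
            examples.append((p["title"], review, 0))          # legitimate
            other = rng.choice([q for q in products if q is not p])
            examples.append((other["title"], review, 1))      # hijacked
    return examples
```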

【22】 Not Quite 'Ask a Librarian': AI on the Nature, Value, and Future of LIS

Authors: Jesse David Dinneen, Helen Bubinger
Note: Final version to appear in ASIS&T '21: Proceedings of the 84th Annual Meeting of the Association for Information Science & Technology, 58
Link: https://arxiv.org/abs/2107.05383
Abstract: AI language models trained on Web data generate prose that reflects human knowledge and public sentiments, but can also contain novel insights and predictions. We asked the world's best language model, GPT-3, fifteen difficult questions about the nature, value, and future of library and information science (LIS), topics that receive perennial attention from LIS scholars. We present highlights from its 45 different responses, which range from platitudes and caricatures to interesting perspectives and worrisome visions of the future, thus providing an LIS-tailored demonstration of the current performance of AI language models. We also reflect on the viability of using AI to forecast or generate research ideas in this way today. Finally, we have shared the full response log online for readers to consider and evaluate for themselves.

【23】 DISCO: efficient unsupervised decoding for discrete natural language problems via convex relaxation

Authors: Anish Acharya, Rudrajit Das, Greg Durrett, Inderjit Dhillon, Sujay Sanghavi
Affiliations: University of Texas at Austin; Amazon
Link: https://arxiv.org/abs/2107.05380
Abstract: In this paper we study test-time decoding, a ubiquitous step in almost all sequential text generation tasks spanning a wide array of natural language processing (NLP) problems. Our main contribution is to develop a continuous relaxation framework for the combinatorial NP-hard decoding problem and propose Disco, an efficient algorithm based on standard first-order gradient methods. We provide a tight analysis and show that our proposed algorithm linearly converges to within an $\epsilon$-neighborhood of the optima. Finally, we perform preliminary experiments on the task of adversarial text generation and show superior performance of Disco over several popular decoding approaches.

【24】 On the Evaluation of Commit Message Generation Models: An Experimental Study

Authors: Wei Tao, Yanlin Wang, Ensheng Shi, Lun Du, Hongyu Zhang, Dongmei Zhang, Wenqiang Zhang
Affiliations: Fudan University; Xi'an Jiaotong University; The University of Newcastle
Link: https://arxiv.org/abs/2107.05373
Abstract: Commit messages are natural language descriptions of code changes, which are important for program understanding and maintenance. However, writing commit messages manually is time-consuming and laborious, especially when the code is updated frequently. Various approaches utilizing generation or retrieval techniques have been proposed to automatically generate commit messages. To achieve a better understanding of how the existing approaches perform in solving this problem, this paper conducts a systematic and in-depth analysis of the state-of-the-art models and datasets. We find that: (1) Different variants of the BLEU metric are used in previous works, which affects the evaluation and understanding of existing methods. (2) Most existing datasets are crawled only from Java repositories while repositories in other programming languages are not sufficiently explored. (3) Dataset splitting strategies can influence the performance of existing models by a large margin. Some models show better performance when the datasets are split by commit, while other models perform better when the datasets are split by timestamp or by project. Based on our findings, we conduct a human evaluation and find the BLEU metric that best correlates with the human scores for the task. We also collect a large-scale, information-rich, and multi-language commit message dataset MCMD and evaluate existing models on this dataset. Furthermore, we conduct extensive experiments under different dataset splitting strategies and suggest the suitable models under different scenarios. Based on the experimental results and findings, we provide feasible suggestions for comprehensively evaluating commit message generation models and discuss possible future research directions. We believe this work can help practitioners and researchers better evaluate and select models for automatic commit message generation.
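Finding (1) is easy to reproduce: different BLEU variants (here, NLTK smoothing methods as a stand-in for the variants used across papers) score the same commit-message hypothesis quite differently. The example message below is made up:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

ref = [["fix", "null", "pointer", "dereference", "in", "parser"]]
hyp = ["fix", "null", "pointer", "in", "parser"]

sf = SmoothingFunction()
for name, fn in [("no smoothing", None),
                 ("smoothing method1", sf.method1),
                 ("smoothing method4", sf.method4)]:
    # Each variant yields a different score for the same hypothesis.
    print(name, round(sentence_bleu(ref, hyp, smoothing_function=fn), 4))
```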

【25】 How to Approximate Ontology-Mediated Queries

Authors: Anneke Haga, Carsten Lutz, Leif Sabellek, Frank Wolter
Affiliations: Department of Computer Science, University of Bremen, Germany; Department of Computer Science, University of Liverpool, UK
Link: https://arxiv.org/abs/2107.05369
Abstract: We introduce and study several notions of approximation for ontology-mediated queries based on the description logics ALC and ALCI. Our approximations are of two kinds: we may (1) replace the ontology with one formulated in a tractable ontology language such as ELI or certain TGDs and (2) replace the database with one from a tractable class such as the class of databases whose treewidth is bounded by a constant. We determine the computational complexity and the relative completeness of the resulting approximations. (Almost) all of them reduce the data complexity from coNP-complete to PTime, in some cases even to fixed-parameter tractable and to linear time. While approximations of kind (1) also reduce the combined complexity, this tends to not be the case for approximations of kind (2). In some cases, the combined complexity even increases.

【26】 A Three Phase Semantic Web Matchmaker

Authors: Golsa Heidari, Kamran Zamanifar, Naser Nematbakhsh
Affiliations: Dept. of Computer Science, Islamic Azad University, Najafabad Branch, Isfahan, Iran
Link: https://arxiv.org/abs/2107.05368
Abstract: Environments built according to the service-oriented architecture give us more effective and dynamic applications. The semantic matchmaking process finds valuable candidate services for substitution, a very important aspect of using semantic Web Services. Our proposed matchmaker algorithm performs semantic matching of Web Services on the basis of the input and output descriptions of semantic Web Services. The technique takes advantage of a graph structure and flow networks. Our novel approach assigns matchmaking scores to the semantics of the input and output parameters and their types. It builds a flow network in which the edge weights are these scores; using the Ford-Fulkerson algorithm, we find the matching rate of two web services. Therefore, all services should be described in the same Ontology Web Language. Among the candidates, the best one is chosen for substitution in case of an execution failure. Our approach uses the algorithm with the least running time among those that can be used for bipartite matching. The importance of the problem is that, in real systems, many fundamental problems occur because of late answers; a system's services should always be on, and if one of them crashes it should be replaced quickly. A semantic web matchmaker eases this process.
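Our reading of the construction, sketched with networkx (the unit capacities at source and sink are our assumption; the paper's exact network may differ): semantic scores cap the flow on pairing edges, and the max flow, computable by Ford-Fulkerson-style algorithms, serves as the matching rate:

```python
import networkx as nx

def matching_rate(scores):
    # scores: {(out_param_of_candidate, in_param_of_request): similarity in [0, 1]}
    # Unit capacities keep every parameter in at most one full match, while
    # the semantic scores cap the flow on the pairing edges.
    g = nx.DiGraph()
    for (a, b), s in scores.items():
        g.add_edge("src", ("out", a), capacity=1.0)
        g.add_edge(("out", a), ("in", b), capacity=s)
        g.add_edge(("in", b), "sink", capacity=1.0)
    value, _ = nx.maximum_flow(g, "src", "sink")
    return value
```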

【27】 HCGR: Hyperbolic Contrastive Graph Representation Learning for Session-based Recommendation

Authors: Naicheng Guo, Xiaolei Liu, Shaoshuai Li, Qiongxu Ma, Yunan Zhao, Bing Han, Lin Zheng, Kaixin Gao, Xiaobo Guo
Affiliations: Department of Computer Science, Shantou University; School of Mathematics, Tianjin University
Link: https://arxiv.org/abs/2107.05366
Abstract: Session-based recommendation (SBR) learns users' preferences by capturing the short-term and sequential patterns from the evolution of user behaviors. Among the studies in the SBR field, graph-based approaches are relatively powerful, generally extracting item information by message aggregation in Euclidean space. However, such methods cannot effectively extract the hierarchical information contained among consecutive items in a session, which is critical for representing users' preferences. In this paper, we present a hyperbolic contrastive graph recommender (HCGR), a principled session-based recommendation framework involving Lorentz hyperbolic space to adequately capture the coherence and hierarchical representations of the items. Within this framework, we design a novel adaptive hyperbolic attention computation to aggregate the graph message of each user's preference in a session-based behavior sequence. In addition, contrastive learning is leveraged to optimize the item representation by considering the geodesic distance between positive and negative samples in hyperbolic space. Extensive experiments on four real-world datasets demonstrate that HCGR consistently outperforms state-of-the-art baselines by 0.43%-28.84% in terms of HitRate, NDCG and MRR.
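For reference, the standard Lorentz-model formulas such hyperbolic recommenders build on: the Lorentzian inner product and the induced geodesic distance used when contrasting positive and negative samples:

```latex
\langle \mathbf{x}, \mathbf{y} \rangle_{\mathcal{L}} = -x_0 y_0 + \sum_{i=1}^{d} x_i y_i,
\qquad
d_{\mathcal{L}}(\mathbf{x}, \mathbf{y}) = \operatorname{arcosh}\big(-\langle \mathbf{x}, \mathbf{y} \rangle_{\mathcal{L}}\big)
```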

【28】 Towards solving the 7-in-a-row game

Authors: Domonkos Czifra, Endre Csóka, Zsolt Zombori, Géza Makay
Affiliations: Alfréd Rényi Institute of Mathematics, Budapest, Hungary; Eötvös Loránd University; University of Szeged, Szeged, Hungary
Link: https://arxiv.org/abs/2107.05363
Abstract: Our paper explores the game-theoretic value of the 7-in-a-row game. We reduce the problem to solving a finite board game, which we target using Proof Number Search. We present a number of heuristic improvements to Proof Number Search and examine their effect within the context of this particular game. Although our paper does not solve the 7-in-a-row game, our experiments indicate that we have made significant progress towards it.
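The backup rule at the heart of Proof Number Search (the part such heuristics typically modify) is standard and worth stating; a minimal sketch for internal nodes:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    is_or: bool                      # True at OR nodes (the prover to move)
    pn: float = 1.0                  # proof number
    dn: float = 1.0                  # disproof number
    children: list = field(default_factory=list)

def backup(node: Node) -> None:
    # An OR node is proved by proving any child (min over proof numbers) and
    # disproved only by disproving all children (sum of disproof numbers);
    # AND nodes are the dual case.
    pns = [c.pn for c in node.children]
    dns = [c.dn for c in node.children]
    if node.is_or:
        node.pn, node.dn = min(pns), sum(dns)
    else:
        node.pn, node.dn = sum(pns), min(dns)
```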

【29】 Zero-shot Visual Question Answering using Knowledge Graph

Authors: Zhuo Chen, Jiaoyan Chen, Yuxia Geng, Jeff Z. Pan, Zonggang Yuan, Huajun Chen
Note: Accepted at the International Semantic Web Conference '21 (ISWC 2021)
Link: https://arxiv.org/abs/2107.05348
Abstract: Incorporating external knowledge into Visual Question Answering (VQA) has become a vital practical need. Existing methods mostly adopt pipeline approaches with different components for knowledge matching and extraction, feature learning, etc. However, such pipeline approaches suffer when some component does not perform well, which leads to error propagation and poor overall performance. Furthermore, the majority of existing approaches ignore the answer bias issue: many answers may have never appeared during training (i.e., unseen answers) in real-world applications. To bridge these gaps, in this paper, we propose a Zero-shot VQA algorithm using knowledge graphs and a mask-based learning mechanism for better incorporating external knowledge, and present new answer-based Zero-shot VQA splits for the F-VQA dataset. Experiments show that our method can achieve state-of-the-art performance in Zero-shot VQA with unseen answers, while dramatically augmenting existing end-to-end models on the normal VQA task.

【30】 SimDem: A Multi-agent Simulation Environment to Model Persons with Dementia and their Assistance

Authors: Muhammad Salman Shaukat, Bjarne Christian Hiller, Sebastian Bader, Thomas Kirste
Affiliations: Department of Computer Science, University of Rostock, Rostock, Germany
Note: 5 pages. Accepted at ARIAL@IJCAI 2021: 4th Workshop on AI for Aging, Rehabilitation, and Intelligent Assisted Living
Link: https://arxiv.org/abs/2107.05346
Abstract: Developing artificial intelligence based assistive systems to aid Persons with Dementia (PwD) requires large amounts of training data. However, data collection poses ethical, legal, economic, and logistic issues. Synthetic data generation tools, in this regard, provide a potential solution. However, we believe that such already-available tools do not adequately reflect cognitive deficiencies in behavior simulation. To counter these issues, we propose a simulation model (SimDem) that primarily focuses on the cognitive impairments suffered by PwD and can be easily configured and adapted by users to model and evaluate assistive solutions.

【31】 Post Triangular Rewiring Method for Shorter RRT Robot Path Planning

Authors: Jin-Gu Kang, Jin-Woo Jung
Affiliations: Department of Computer Science and Engineering, Dongguk University, Seoul, Korea
Note: Under review at IJFIS (International Journal of Fuzzy Logic and Intelligent Systems; this http URL)
Link: https://arxiv.org/abs/2107.05344
Abstract: This paper proposes the 'Post Triangular Rewiring' method, which minimizes the sacrifice of planning time and overcomes the optimality limitations of sampling-based algorithms such as the Rapidly-exploring Random Tree (RRT) algorithm. Through the triangle inequality principle, the proposed 'Post Triangular Rewiring' method creates a path closer to the optimum than the RRT algorithm alone. Experiments were conducted to verify the performance of the proposed method. When the proposed method is applied to the RRT algorithm, optimality efficiency increases relative to the planning time.
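The triangle-inequality idea can be sketched directly as a generic post-processing shortcut pass, assuming a user-supplied collision checker; the paper's exact rewiring order may differ:

```python
def triangular_rewire(path, collision_free):
    # If the direct segment (a, c) around a middle node b is collision-free,
    # the triangle inequality guarantees the shortcut is no longer than the
    # detour through b, so b can be dropped. Repeat until nothing is removable.
    changed = True
    while changed:
        changed = False
        i = 0
        while i + 2 < len(path):
            a, c = path[i], path[i + 2]
            if collision_free(a, c):
                del path[i + 1]
                changed = True
            else:
                i += 1
    return path
```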

【32】 Real-Time Super-Resolution System of 4K-Video Based on Deep Learning

Authors: Yanpeng Cao, Chengcheng Wang, Changjun Song, He Li, Yongming Tang
Affiliations: Joint International Research Laboratory of Information Display and Visualization, Southeast University, Nanjing, CN; Department of Engineering, University of Cambridge, Cambridge, UK
Note: 8 pages, 7 figures, ASAP
Link: https://arxiv.org/abs/2107.05307
Abstract: Video super-resolution (VSR) technology excels in reconstructing low-quality video, avoiding the unpleasant blur effects caused by interpolation-based algorithms. However, vast computational complexity and memory occupation hamper edge deployability and runtime inference in real-life applications, especially for large-scale VSR tasks. This paper explores the possibility of a real-time VSR system and designs an efficient and generic VSR network, termed EGVSR. The proposed EGVSR is based on spatio-temporal adversarial learning for temporal coherence. In order to pursue faster VSR processing ability up to 4K resolution, this paper tries to choose a lightweight network structure and efficient upsampling method to reduce the computation required by the EGVSR network under the guarantee of high visual quality. Besides, we implement batch normalization computation fusion, convolutional acceleration algorithms and other neural network acceleration techniques on the actual hardware platform to optimize the inference process of the EGVSR network. Finally, our EGVSR achieves a real-time processing capacity of 4K@29.61FPS. Compared with TecoGAN, the most advanced VSR network at present, we achieve an 85.04% reduction of computation density and a 7.92x performance speedup. In terms of visual quality, the proposed EGVSR tops the list of most metrics (such as LPIPS, tOF, tLP, etc.) on the public test dataset Vid4 and surpasses other state-of-the-art methods in overall performance score. The source code of this project can be found at https://github.com/Thmen/EGVSR.
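On "efficient upsampling": a common lightweight choice in real-time SR work is sub-pixel convolution (pixel shuffle), sketched below; this illustrates the design trade-off and is not EGVSR's actual head:

```python
import torch.nn as nn

def upsample_head(in_channels: int, scale: int = 4) -> nn.Sequential:
    # A cheap 3x3 convolution expands channels by scale^2, then PixelShuffle
    # rearranges channels into space: far cheaper than transposed
    # convolutions at 4K output resolutions.
    return nn.Sequential(
        nn.Conv2d(in_channels, 3 * scale * scale, kernel_size=3, padding=1),
        nn.PixelShuffle(scale),
    )
```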

【33】 HEMP: High-order Entropy Minimization for neural network comPression 标题:HEMP:神经网络压缩的高阶熵最小化算法

作者:Enzo Tartaglione,Stéphane Lathuilière,Attilio Fiandrotti,Marco Cagnazzo,Marco Grangetto 机构:University of Torino, Torino, Italy, T´el´ecom Paris, Paris, France 链接:https://arxiv.org/abs/2107.05298 摘要:我们将量化人工神经网络的熵表示为一个可微函数,可以作为正则项插入到通过梯度下降最小化的代价函数中。我们的公式有效地扩展到一阶以上,并且对量子化方案是不可知的。然后对网络进行训练,使量化参数的熵最小化,从而通过熵编码对量化参数进行最优压缩。我们用我们的熵公式在多个数据集上对已知的网络结构进行量化和压缩。我们的方法优于类似的方法,享受高阶熵估计的好处,显示出对非均匀量化的灵活性(我们使用Lloyd max量化),对任何熵阶的可伸缩性最小化和压缩方面的效率。我们表明,HEMP能够与其他旨在修剪或量化模型本身的方法协同工作,在不损害模型性能的情况下,在存储大小可压缩性方面提供显著的好处。 摘要:We formulate the entropy of a quantized artificial neural network as a differentiable function that can be plugged as a regularization term into the cost function minimized by gradient descent. Our formulation scales efficiently beyond the first order and is agnostic of the quantization scheme. The network can then be trained to minimize the entropy of the quantized parameters, so that they can be optimally compressed via entropy coding. We experiment with our entropy formulation at quantizing and compressing well-known network architectures over multiple datasets. Our approach compares favorably over similar methods, enjoying the benefits of higher order entropy estimate, showing flexibility towards non-uniform quantization (we use Lloyd-max quantization), scalability towards any entropy order to be minimized and efficiency in terms of compression. We show that HEMP is able to work in synergy with other approaches aiming at pruning or quantizing the model itself, delivering significant benefits in terms of storage size compressibility without harming the model's performance.
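
HEMP的出发点是把量化网络参数的熵写成可微函数并作为正则项。下面用softmax“软分配”给出一个一阶熵正则项的极简PyTorch示意(仅说明可微熵的一种构造方式;论文的公式可扩展到高阶,具体形式以原文为准):

```python
import torch

def soft_entropy(weights, centers, temperature=0.1):
    """一阶熵的可微估计(示意):用softmax软分配近似每个权重对各量化
    中心的隶属度,再对经验分布求熵,可作为正则项加入损失。"""
    d = (weights.view(-1, 1) - centers.view(1, -1)) ** 2   # 与各中心的平方距离
    p = torch.softmax(-d / temperature, dim=1).mean(dim=0) # 各中心的占用概率
    return -(p * torch.log2(p + 1e-12)).sum()

w = torch.randn(1000, requires_grad=True)
centers = torch.linspace(-1, 1, 16)                # 假设16级量化中心
loss = soft_entropy(w, centers)                    # total = task_loss + lam * loss
loss.backward()
```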

【34】 Continuous Time Bandits With Sampling Costs 标题:具有采样成本的连续时间多臂赌博机

作者:Rahul Vaze,Manjesh K. Hanawal 机构:School of Technology and Computer Science, Tata Institute of Fundamental Research, Mumbai, Maharashtra, India; Industrial Engineering and Operations Research, Indian Institute of Technology Bombay, Mumbai, Maharashtra, India 链接:https://arxiv.org/abs/2107.05289 摘要:我们考虑一个连续时间的多臂赌博机问题(CTMAB):学习者可以在给定区间内对各臂进行任意多次采样,并从每次采样中获得随机奖励,然而提高采样频率会带来附加的惩罚/成本。因此,作为采样频率的函数,在获得更多奖励与产生采样成本之间存在权衡。目标是设计一个最小化遗憾的学习算法,遗憾定义为oracle策略的收益与学习算法收益之差。CTMAB与通常的多臂赌博机问题(MAB)有本质不同:例如在CTMAB中即使单臂情形也并不平凡,因为最优采样频率取决于臂的均值,而均值需要被估计。我们首先建立了任何算法可达到的遗憾下界,然后提出了在对数因子范围内达到该下界的算法。对于单臂情形,我们证明遗憾的下界为$\Omega((\log T)^2/\mu)$,其中$\mu$是臂的均值,$T$是时间范围。对于多臂情形,我们证明遗憾的下界为$\Omega((\log T)^2\mu/\Delta^2)$,其中$\mu$表示最优臂的均值,$\Delta$是最优臂与次优臂均值之差。最后,我们提出了一个在常数因子范围内达到该下界的算法。 摘要:We consider a continuous-time multi-arm bandit problem (CTMAB), where the learner can sample arms any number of times in a given interval and obtain a random reward from each sample, however, increasing the frequency of sampling incurs an additive penalty/cost. Thus, there is a tradeoff between obtaining large reward and incurring sampling cost as a function of the sampling frequency. The goal is to design a learning algorithm that minimizes regret, that is defined as the difference of the payoff of the oracle policy and that of the learning algorithm. CTMAB is fundamentally different than the usual multi-arm bandit problem (MAB), e.g., even the single-arm case is non-trivial in CTMAB, since the optimal sampling frequency depends on the mean of the arm, which needs to be estimated. We first establish lower bounds on the regret achievable with any algorithm and then propose algorithms that achieve the lower bound up to logarithmic factors. For the single-arm case, we show that the lower bound on the regret is $\Omega((\log T)^2/\mu)$, where $\mu$ is the mean of the arm, and $T$ is the time horizon. For the multiple arms case, we show that the lower bound on the regret is $\Omega((\log T)^2 \mu/\Delta^2)$, where $\mu$ now represents the mean of the best arm, and $\Delta$ is the difference of the mean of the best and the second-best arm. We then propose an algorithm that achieves the bound up to constant terms.

【35】 Constrained Sampling from a Kernel Density Estimator to Generate Scenarios for the Assessment of Automated Vehicles 标题:来自核密度估计器的受限抽样以生成用于自动车辆评估的场景

作者:Erwin de Gelder,Eric Cator,Jan-Pieter Paardekooper,Olaf Op den Camp,Bart De Schutter 机构:Delft University of Technology, The Netherlands; Radboud University, The Netherlands; Radboud University, Donders Institute for Brain 备注:6 pages, 3 figures, to be published in the proceedings of the IEEE Intelligent Vehicle Symposium Workshops (IV workshop) 链接:https://arxiv.org/abs/2107.05278 摘要:自动车辆(AV)的安全评估是自动车辆开发周期的一个重要方面。作为完整安全评估的一部分,基于场景的评估方法已被该领域的许多参与者所接受。场景是道路上某种情境的表示,AV需要对其作出恰当响应。生成所需的基于场景的测试描述的一种方法是将场景参数化,并从概率密度函数(pdf)中抽取这些参数。由于pdf的形状事先未知,假设pdf的函数形式并用数据拟合参数可能导致不准确的拟合。作为替代,核密度估计(KDE)是估计底层pdf的一个很有希望的候选方法,因为它对参数的底层分布具有灵活性。从用KDE估计的pdf中抽取随机样本无需计算实际的pdf值,这使其适合为蒙特卡罗(Monte Carlo)等方法提供随机样本。然而,据作者所知,文献中尚未描述在样本需满足线性等式约束时如何从KDE中抽样。本文提出了一种从KDE所估计的pdf中抽样的方法,使得样本满足线性等式约束,并以伪代码给出了算法。该方法可用于生成具有预定初始速度等属性的场景,或生成不同类型的场景。文中还表明,当使用奇异值分解(SVD)对参数向量降维时,该场景采样方法同样适用。 摘要:The safety assessment of automated vehicles (AVs) is an important aspect of the development cycle of AVs. A scenario-based assessment approach is accepted by many players in the field as part of the complete safety assessment. A scenario is a representation of a situation on the road to which the AV needs to respond appropriately. One way to generate the required scenario-based test descriptions is to parameterize the scenarios and to draw these parameters from a probability density function (pdf). Because the shape of the pdf is unknown beforehand, assuming a functional form of the pdf and fitting the parameters to the data may lead to inaccurate fits. As an alternative, Kernel Density Estimation (KDE) is a promising candidate for estimating the underlying pdf, because it is flexible with the underlying distribution of the parameters. Drawing random samples from a pdf estimated with KDE is possible without the need of evaluating the actual pdf, which makes it suitable for drawing random samples for, e.g., Monte Carlo methods. Sampling from a KDE while the samples satisfy a linear equality constraint, however, has not been described in the literature, as far as the authors know. In this paper, we propose a method to sample from a pdf estimated using KDE, such that the samples satisfy a linear equality constraint. We also present an algorithm of our method in pseudo-code. The method can be used to generating scenarios that have, e.g., a predetermined starting speed or to generate different types of scenarios. This paper also shows that the method for sampling scenarios can be used in case a Singular Value Decomposition (SVD) is used to reduce the dimension of the parameter vectors.
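
对高斯核KDE,“高斯分布在线性等式约束下的条件分布仍为高斯分布”这一事实可以直接用于约束抽样:按各核在约束面上的密度选择核,再从该核的条件高斯中抽样。下面是基于这一思路的numpy示意(对论文方法的一种示意性复现,细节以原文为准):

```python
import numpy as np

def sample_kde_with_constraint(data, h, A, b, n_samples, rng=None):
    """从带宽为 h 的高斯核KDE中抽样,并使样本满足线性等式约束 A x = b(示意)。
    data: (n, d) 原始样本;A: (m, d);b: (m,)。"""
    rng = rng or np.random.default_rng()
    n, d = data.shape
    H = h ** 2 * np.eye(d)                 # 各向同性高斯核的协方差
    S = A @ H @ A.T                        # Ax 在单个核下的协方差(各核相同)
    S_inv = np.linalg.inv(S)
    diff = b - data @ A.T                  # (n, m):各核均值到约束的残差
    # 以各核在约束面上的(对数)密度作为选核概率,先减最大值保证数值稳定
    log_w = -0.5 * np.einsum('ij,jk,ik->i', diff, S_inv, diff)
    w = np.exp(log_w - log_w.max()); w /= w.sum()
    K = H @ A.T @ S_inv                    # (d, m) 条件均值的增益矩阵
    cond_cov = H - K @ A @ H               # 条件协方差(奇异,落在约束面内)
    idx = rng.choice(n, size=n_samples, p=w)
    means = data[idx] + diff[idx] @ K.T
    return means + rng.multivariate_normal(np.zeros(d), cond_cov, size=n_samples)
```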

【36】 Impact of Energy Efficiency on the Morphology and Behaviour of Evolved Robots 标题:能效对进化机器人形态和行为的影响

作者:Margarita Rebolledo,Daan Zeeuwe,Thomas Bartz-Beielstein,A. E. Eiben 机构:Institute for Data Science, Engineering, and Analytics, TH Köln, Gummersbach, Germany; Department of Computer Science, Vrije Universiteit, Amsterdam, Netherlands 链接:https://arxiv.org/abs/2107.05249 摘要:大多数进化机器人研究只关注进化某种目标行为,而不考虑能量的使用。这限制了此类系统的实用价值,因为能效是现实世界自主机器人的重要特性。在本文中,我们通过为模拟器扩展电池模型、并在适应度评估中考虑能耗来缓解这一问题。利用该系统,我们研究了能量感知如何影响机器人的进化。由于我们的系统同时进化形态与控制器,主要研究问题有两个:若将能耗纳入适应度评估,(i) 对进化出的机器人形态有何影响;(ii) 对其行为有何影响?结果表明,以多目标方式(通过NSGA-II)将能耗纳入适应度,会在降低机器人速度的同时减小其平均体积;不过,未发生尺寸缩减的机器人仍能达到与基线机器人相当的速度。 摘要:Most evolutionary robotics studies focus on evolving some targeted behavior without taking the energy usage into account. This limits the practical value of such systems because energy efficiency is an important property for real-world autonomous robots. In this paper, we mitigate this problem by extending our simulator with a battery model and taking energy consumption into account during fitness evaluations. Using this system we investigate how energy awareness affects the evolution of robots. Since our system is to evolve morphologies as well as controllers, the main research question is twofold: (i) what is the impact on the morphologies of the evolved robots, and (ii) what is the impact on the behavior of the evolved robots if energy consumption is included in the fitness evaluation? The results show that including the energy consumption in the fitness in a multi-objective fashion (by NSGA-II) reduces the average size of robot bodies while at the same time reducing their speed. However, robots generated without size reduction can achieve speeds comparable to robots from the baseline set.

【37】 Cautious Actor-Critic 标题:谨慎的演员-评论家

作者:Lingwei Zhu,Toshinori Kitamura,Takamitsu Matsubara 机构:Nara Institute of Science and Technology, Japan 备注:23 pages 链接:https://arxiv.org/abs/2107.05217 摘要:离策略(off-policy)学习的性能振荡以及演员-评论家(Actor-Critic,AC)框架中的持续误差,要求算法能够更保守地学习,以更好地适用于对稳定性要求严格的应用。本文提出了一种新的离策略AC算法——谨慎演员-评论家(CAC)。“谨慎”之名源于其双重保守的性质:演员(actor)采用保守策略迭代中的经典策略插值,评论家(critic)采用保守值迭代的熵正则化。我们的关键观察是,熵正则化的评论家促进并简化了原本繁琐的插值演员更新,同时仍能确保稳健的策略改进。在一系列具有挑战性的连续控制问题上,我们将CAC与最新的AC方法进行了比较,结果表明CAC在显著稳定学习过程的同时取得了相当的性能。 摘要:The oscillating performance of off-policy learning and persisting errors in the actor-critic (AC) setting call for algorithms that can conservatively learn to suit the stability-critical applications better. In this paper, we propose a novel off-policy AC algorithm cautious actor-critic (CAC). The name cautious comes from the doubly conservative nature that we exploit the classic policy interpolation from conservative policy iteration for the actor and the entropy-regularization of conservative value iteration for the critic. Our key observation is the entropy-regularized critic facilitates and simplifies the unwieldy interpolated actor update while still ensuring robust policy improvement. We compare CAC to state-of-the-art AC methods on a set of challenging continuous control problems and demonstrate that CAC achieves comparable performance while significantly stabilizes learning.
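
CAC中“保守策略插值”的演员更新可以用离散动作的概率表简单示意(示意代码;alpha 的取法以及与熵正则评论家的耦合方式以论文为准):

```python
import numpy as np

def cautious_actor_update(pi_old, pi_greedy, alpha):
    """保守的策略插值(示意):新策略是旧策略与贪心策略的凸组合,
    alpha 越小更新越保守。此处以离散动作的概率向量为例。"""
    pi_new = (1 - alpha) * pi_old + alpha * pi_greedy
    return pi_new / pi_new.sum(axis=-1, keepdims=True)   # 数值上重新归一化

pi_old = np.array([0.25, 0.25, 0.5])
pi_greedy = np.array([0.1, 0.8, 0.1])    # 例如由熵正则评论家导出的softmax策略
print(cautious_actor_update(pi_old, pi_greedy, alpha=0.2))
```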

【38】 A Simple Reward-free Approach to Constrained Reinforcement Learning 标题:一种简单的无报酬约束强化学习方法

作者:Sobhan Miryoosefi,Chi Jin 机构:Princeton University 链接:https://arxiv.org/abs/2107.05216 摘要:在约束强化学习(RL)中,学习主体不仅要优化总体奖励,还要满足额外的安全性、多样性或预算约束。因此,现有的约束RL解决方案需要若干明显不同于标准RL的新算法组件。另一方面,无报酬RL是在无约束文献中独立发展起来的,它不需要报酬信息就可以学习转移动力学,因此天然能够在共享动力学下处理多目标RL。本文将无报酬RL和约束RL联系起来。特别地,我们提出了一个简单的元算法:给定任意无报酬RL oracle,可达性(approachability)问题与约束RL问题都可以直接求解,且样本复杂度的额外开销可以忽略不计。利用现有的无报酬RL求解器,我们的框架在表格型MDP设定下为约束RL给出了精确(sharp)的样本复杂度结果,与现有最优结果至多相差一个与时间步长(horizon)相关的因子;我们的框架还可直接扩展到表格型两人Markov博弈,并为线性函数逼近下的约束RL给出了新结果。 摘要:In constrained reinforcement learning (RL), a learning agent seeks to not only optimize the overall reward but also satisfy the additional safety, diversity, or budget constraints. Consequently, existing constrained RL solutions require several new algorithmic ingredients that are notably different from standard RL. On the other hand, reward-free RL is independently developed in the unconstrained literature, which learns the transition dynamics without using the reward information, and thus naturally capable of addressing RL with multiple objectives under the common dynamics. This paper bridges reward-free RL and constrained RL. Particularly, we propose a simple meta-algorithm such that given any reward-free RL oracle, the approachability and constrained RL problems can be directly solved with negligible overheads in sample complexity. Utilizing the existing reward-free RL solvers, our framework provides sharp sample complexity results for constrained RL in the tabular MDP setting, matching the best existing results up to a factor of horizon dependence; our framework directly extends to a setting of tabular two-player Markov games, and gives a new result for constrained RL with linear function approximation.

【39】 TransClaw U-Net: Claw U-Net with Transformers for Medical Image Segmentation 标题:TransClaw U-Net:用于医学图像分割的带Transformer的爪形U-Net

作者:Yao Chang,Hu Menghan,Zhai Guangtao,Zhang Xiao-Ping 机构:Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University; Institute of Image Communication and Information Processing, Shanghai Jiao Tong University; Department of Electrical 备注:8 page, 3 figures 链接:https://arxiv.org/abs/2107.05188 摘要:近年来,计算机辅助诊断已成为一个日益热门的话题。基于卷积神经网络的方法在医学图像分割和分类中取得了良好的效果。但由于卷积运算的局限性,长程空间特征往往难以被准确提取。因此,我们提出了一种在编码部分将卷积运算与Transformer运算相结合的TransClaw U-Net网络结构。卷积部分用于提取浅层空间特征,便于上采样后图像分辨率的恢复;Transformer部分用于对图像块(patch)进行编码,利用自注意力机制获取序列间的全局信息。解码部分保留了底部上采样结构,以获得更好的细节分割性能。在Synapse多器官分割数据集上的实验结果表明,TransClaw U-Net的性能优于其他网络结构。消融实验也证明了TransClaw U-Net的泛化性能。 摘要:In recent years, computer-aided diagnosis has become an increasingly popular topic. Methods based on convolutional neural networks have achieved good performance in medical image segmentation and classification. Due to the limitations of the convolution operation, the long-term spatial features are often not accurately obtained. Hence, we propose a TransClaw U-Net network structure, which combines the convolution operation with the transformer operation in the encoding part. The convolution part is applied for extracting the shallow spatial features to facilitate the recovery of the image resolution after upsampling. The transformer part is used to encode the patches, and the self-attention mechanism is used to obtain global information between sequences. The decoding part retains the bottom upsampling structure for better detail segmentation performance. The experimental results on Synapse Multi-organ Segmentation Datasets show that the performance of TransClaw U-Net is better than other network structures. The ablation experiments also prove the generalization performance of TransClaw U-Net.
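
下面用PyTorch给出一个“卷积+Transformer”混合编码块的极简示意(层配置为假设,并非TransClaw U-Net的官方结构):卷积分支提取浅层空间特征,自注意力在展平的空间位置序列上建模全局依赖。

```python
import torch
import torch.nn as nn

class HybridEncoderBlock(nn.Module):
    """卷积 + Transformer 混合编码块(示意)。"""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.attn = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, batch_first=True)

    def forward(self, x):
        x = self.conv(x)                       # (B, C, H, W):浅层空间特征
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)     # (B, H*W, C):每个位置视为一个token
        seq = self.attn(seq)                   # 自注意力建模全局依赖
        return seq.transpose(1, 2).reshape(b, c, h, w)

print(HybridEncoderBlock(64)(torch.randn(2, 64, 16, 16)).shape)
```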

【40】 Deep Transfer Learning Based Intrusion Detection System for Electric Vehicular Networks 标题:基于深度迁移学习的电动车载网络入侵检测系统

作者:Sk. Tanzir Mehedi,Adnan Anwar,Ziaur Rahman,Kawsar Ahmed 机构:Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University; Centre for Cyber Security Research and Innovation (CSRI), Deakin University, Geelong, Australia 备注:None 链接:https://arxiv.org/abs/2107.05172 摘要:控制器局域网(CAN)总线以其简单、适用、健壮的体系结构成为实时车载网络(IVN)系统中的重要协议。由于复杂的数据密集型架构大大增加了未授权网络的可达性和遭受各类网络攻击的可能性,IVN设备仍然不安全且易受攻击。因此,在IVN设备中检测网络攻击已成为人们日益关注的问题。随着车载网络(IVN)的迅速发展和威胁类型的不断演变,传统的基于机器学习的入侵检测系统(IDS)必须不断更新,以适应当前环境的安全需求。目前,深度学习、深度迁移学习的发展及其在多个领域的有效成果,已成为网络入侵检测的有效解决方案。本文提出了一种基于深度迁移学习的IVN入侵检测模型,并与现有的几种入侵检测模型进行了比较。其独特贡献包括:最适合识别恶意CAN消息、能准确检测正常与异常活动的有效属性选择,设计基于深度迁移学习的LeNet模型,以及基于真实数据的评估。为此,我们进行了广泛的实验性能评估。实验结果表明,与主流的机器学习、深度学习和基准深度迁移学习模型相比,本文提出的入侵检测系统的检测精度有了很大提高,并在实时IVN安全方面表现出更好的性能。 摘要:The Controller Area Network (CAN) bus works as an important protocol in the real-time In-Vehicle Network (IVN) systems for its simple, suitable, and robust architecture. The risk of IVN devices has still been insecure and vulnerable due to the complex data-intensive architectures which greatly increase the accessibility to unauthorized networks and the possibility of various types of cyberattacks. Therefore, the detection of cyberattacks in IVN devices has become a growing interest. With the rapid development of IVNs and evolving threat types, the traditional machine learning-based IDS has to update to cope with the security requirements of the current environment. Nowadays, the progression of deep learning, deep transfer learning, and its impactful outcome in several areas has guided as an effective solution for network intrusion detection. This manuscript proposes a deep transfer learning-based IDS model for IVN along with improved performance in comparison to several other existing models. The unique contributions include effective attribute selection which is best suited to identify malicious CAN messages and accurately detect the normal and abnormal activities, designing a deep transfer learning-based LeNet model, and evaluating considering real-world data. To this end, an extensive experimental performance evaluation has been conducted. The architecture along with empirical analyses shows that the proposed IDS greatly improves the detection accuracy over the mainstream machine learning, deep learning, and benchmark deep transfer learning models and has demonstrated better performance for real-time IVN security.
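
下面给出一个LeNet风格迁移学习分类器的PyTorch示意(输入尺寸与层配置均为假设,论文的具体结构以原文为准):先在源域上预训练,随后冻结卷积特征层、仅微调分类层,即深度迁移学习的常见做法。

```python
import torch
import torch.nn as nn

class LeNetIDS(nn.Module):
    """LeNet风格的CAN报文分类器(示意):输入为把CAN报文字段
    重排成的二维"图像",输出正常/异常两类。"""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, 5, padding=2), nn.ReLU(), nn.AvgPool2d(2),
            nn.Conv2d(6, 16, 5), nn.ReLU(), nn.AvgPool2d(2))
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(), nn.Linear(84, num_classes))

    def forward(self, x):                  # x: (B, 1, 28, 28)
        return self.classifier(self.features(x))

model = LeNetIDS()
for p in model.features.parameters():      # 迁移学习:冻结特征提取层
    p.requires_grad = False
```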

【41】 Stateful Detection of Model Extraction Attacks 标题:模型提取攻击的状态检测

作者:Soham Pal,Yash Gupta,Aditya Kanade,Shirish Shevade 机构:Indian Institute of Science, Bangalore, nference 链接:https://arxiv.org/abs/2107.05166 摘要:机器学习即服务提供商通过应用程序编程接口(API)向开发人员公开机器学习(ML)模型。最近的工作表明,攻击者可以利用这些API,通过使用自己选择的样本查询这些ML模型,从而提取这些模型的良好近似值。我们提出VarDetect,一个有状态的监视器,它可以跟踪这样一个服务的用户所做的查询的分布,来检测模型提取攻击。VarDetect利用改进的变分自动编码器学习到的潜在分布,将三种类型的攻击者样本从良性样本中鲁棒地分离出来,并成功地为每种类型发出警报。此外,由于VarDetect被部署为一种自动防御机制,提取的替代模型显示出预期的较差性能和可转移性。最后,我们证明了即使是预先知道VarDetect部署的自适应攻击者,也能被它检测到。 摘要:Machine-Learning-as-a-Service providers expose machine learning (ML) models through application programming interfaces (APIs) to developers. Recent work has shown that attackers can exploit these APIs to extract good approximations of such ML models, by querying them with samples of their choosing. We propose VarDetect, a stateful monitor that tracks the distribution of queries made by users of such a service, to detect model extraction attacks. Harnessing the latent distributions learned by a modified variational autoencoder, VarDetect robustly separates three types of attacker samples from benign samples, and successfully raises an alarm for each. Further, with VarDetect deployed as an automated defense mechanism, the extracted substitute models are found to exhibit poor performance and transferability, as intended. Finally, we demonstrate that even adaptive attackers with prior knowledge of the deployment of VarDetect, are detected by it.
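
VarDetect的要点是“有状态”地跟踪每个用户查询在潜空间中的分布。下面是一个极简示意(encoder、漂移度量与阈值均为假设;论文实际使用改进的变分自编码器,并能区分三类攻击样本):

```python
import numpy as np

class StatefulQueryMonitor:
    """有状态查询监视器(示意):用编码器把每个查询映射到潜空间,
    为每个用户维护滑动窗口,当窗口内潜向量均值偏离良性基线过远时报警。"""
    def __init__(self, encoder, benign_mean, threshold, window=256):
        self.encoder, self.benign_mean = encoder, benign_mean
        self.threshold, self.window = threshold, window
        self.buffers = {}                        # 每个用户一个查询缓冲区

    def observe(self, user_id, query):
        buf = self.buffers.setdefault(user_id, [])
        buf.append(self.encoder(query))
        if len(buf) > self.window:
            buf.pop(0)
        drift = np.linalg.norm(np.mean(buf, axis=0) - self.benign_mean)
        return drift > self.threshold            # True 表示疑似模型提取攻击
```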

【42】 MOOCRep: A Unified Pre-trained Embedding of MOOC Entities 标题:MOOCRep:MOOC实体的统一预训练嵌入

作者:Shalini Pandey,Jaideep Srivastava 链接:https://arxiv.org/abs/2107.05154 摘要:许多机器学习模型都是为了解决大规模在线开放课程(MOOC)平台上的信息过载问题而建立的。这些模型依赖于学习MOOC实体的强大表示。然而,他们面临着缺乏专家标签数据的问题。为了克服这个问题,我们提出利用MOOC结构中丰富的未标记数据来学习MOOC实体的预训练表示,这些数据可以直接应用于下游任务。虽然现有的预训练方法在自然语言处理领域已经取得了成功,因为它们学习到了强大的文本表示,但是它们的模型并没有利用MOOC实体的丰富信息。这些丰富的信息包括讲座、概念和课程之间的图形关系,以及关于概念复杂性的领域知识。我们开发了MOOCRep,这是一种基于Transformer语言模型的新方法,训练有两个预训练目标:1)基于图的目标来捕捉存在于图中的实体和关系的强大信号;2)面向领域的目标来有效地整合概念的复杂度。我们的实验表明,MOOCRep的嵌入在概念前提预测和讲座推荐这两项对教育界非常重要的任务上优于最先进的表征学习方法。 摘要:Many machine learning models have been built to tackle information overload issues on Massive Open Online Courses (MOOC) platforms. These models rely on learning powerful representations of MOOC entities. However, they suffer from the problem of scarce expert label data. To overcome this problem, we propose to learn pre-trained representations of MOOC entities using abundant unlabeled data from the structure of MOOCs which can directly be applied to the downstream tasks. While existing pre-training methods have been successful in NLP areas as they learn powerful textual representation, their models do not leverage the richer information about MOOC entities. This richer information includes the graph relationship between the lectures, concepts, and courses along with the domain knowledge about the complexity of a concept. We develop MOOCRep, a novel method based on Transformer language model trained with two pre-training objectives : 1) graph-based objective to capture the powerful signal of entities and relations that exist in the graph, and 2) domain-oriented objective to effectively incorporate the complexity level of concepts. Our experiments reveal that MOOCRep's embeddings outperform state-of-the-art representation learning methods on two tasks important for education community, concept pre-requisite prediction and lecture recommendation.

【43】 Document Embedding for Scientific Articles: Efficacy of Word Embeddings vs TFIDF 标题:科技论文的文档嵌入:Word嵌入与TFIDF的效果

作者:H. J. Meijer,J. Truong,R. Karimi 机构:Karimi,[,−,−,−,], University of Amsterdam, Science park , WX Amsterdam, The, Elsevier BV, Radarweg , NX Amsterdam, The Netherlands 链接:https://arxiv.org/abs/2107.05151 摘要:近几年来,基于神经网络的词嵌入技术在自然语言处理领域得到了广泛的应用。所进行的研究主要集中于在维基百科或其他新闻和社交媒体来源等公共语料库上训练的单词嵌入的质量和应用。然而,这些研究仅限于一般文本,因此缺乏技术和科学上的细微差别,如特定领域的词汇、缩写或学术背景中常用的科学公式。本研究主要探讨词嵌入在大型学术语料库中的应用效果。更具体地说,我们比较了训练词嵌入和TFIDF表示在科学文章内容建模中的质量和效率。我们使用word2vec skip-gram模型,对大约7000万篇科学文章的标题和摘要进行训练。此外,我们还开发了一个在科学背景下评估内容模型的基准。该基准基于一项分类任务,将2017年发表的130万篇文章与期刊进行匹配。我们的结果表明,基于词嵌入的内容模型更适合标题(短文本),而TFIDF更适合摘要(长文本)。然而,对于较大的文本,TFIDF的微小改进是以牺牲3.7倍的内存需求和高达184倍的计算时间为代价的,这可能使其对于在线应用程序效率低下。此外,我们还建立了一个二维可视化的期刊模型,通过嵌入定性检查嵌入模型。此图显示了有用的见解,可用于查找有竞争力的期刊或差距,以提出新的期刊。 摘要:Over the last few years, neural network derived word embeddings became popular in the natural language processing literature. Studies conducted have mostly focused on the quality and application of word embeddings trained on public available corpuses such as Wikipedia or other news and social media sources. However, these studies are limited to generic text and thus lack technical and scientific nuances such as domain specific vocabulary, abbreviations, or scientific formulas which are commonly used in academic context. This research focuses on the performance of word embeddings applied to a large scale academic corpus. More specifically, we compare quality and efficiency of trained word embeddings to TFIDF representations in modeling content of scientific articles. We use a word2vec skip-gram model trained on titles and abstracts of about 70 million scientific articles. Furthermore, we have developed a benchmark to evaluate content models in a scientific context. The benchmark is based on a categorization task that matches articles to journals for about 1.3 million articles published in 2017. Our results show that content models based on word embeddings are better for titles (short text) while TFIDF works better for abstracts (longer text). However, the slight improvement of TFIDF for larger text comes at the expense of 3.7 times more memory requirement as well as up to 184 times higher computation times which may make it inefficient for online applications. In addition, we have created a 2-dimensional visualization of the journals modeled via embeddings to qualitatively inspect embedding model. This graph shows useful insights and can be used to find competitive journals or gaps to propose new journals.
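
上述对比可以用几行代码说明:skip-gram词向量取平均得到文档向量,并与TFIDF稀疏表示对照(示意代码,语料与超参数均为演示用假设):

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["deep learning for image segmentation",
        "word embeddings capture lexical semantics"]
tokens = [d.split() for d in docs]

# 词嵌入文档表示:skip-gram(sg=1)词向量取平均,得到低维稠密向量
w2v = Word2Vec(tokens, vector_size=50, sg=1, min_count=1)
doc_vecs = np.array([np.mean([w2v.wv[t] for t in d], axis=0) for d in tokens])

# TFIDF文档表示:高维稀疏向量,长文本上略优但内存与计算开销大得多
tfidf = TfidfVectorizer().fit_transform(docs)
print(doc_vecs.shape, tfidf.shape)
```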

【44】 Repo2Vec: A Comprehensive Embedding Approach for Determining Repository Similarity 标题:Repo2Vec:一种确定知识库相似度的综合嵌入方法

作者:Md Omar Faruk Rokon,Pei Yan,Risul Islam,Michalis Faloutsos 机构:UC Riverside 备注:10 pages, 8 figures, 5 tables. In press: ICSME'21 链接:https://arxiv.org/abs/2107.05112 摘要:我们如何在大型在线存档(如GitHub)中识别相似的存储库和聚类?确定存储库相似性是研究此类软件生态系统的动态与演化的一个重要组成部分。关键的挑战是为多样的存储库特性确定正确的表示方式,使其:(a) 捕获可用信息的所有方面,以及 (b) 易于被机器学习(ML)算法使用。我们提出了Repo2Vec,这是一种综合的嵌入方法,通过组合来自三种类型信息源的特征,将存储库表示为一个分布式向量。作为我们的关键创新,我们考虑三种类型的信息:(a) 元数据,(b) 存储库的结构,以及 (c) 源代码。我们还介绍了一系列嵌入方法来表示这些信息类型,并将它们组合为单一嵌入。我们使用来自GitHub的两个真实数据集(共1013个存储库)对方法进行了评估。首先,我们展示了我们的方法在精确度方面优于以前的方法(93%对78%),强相似存储库的数量几乎是以前的两倍,误报率减少了30%。第二,我们展示了Repo2Vec如何为以下任务提供坚实基础:(a) 区分恶意软件与良性存储库,以及 (b) 识别有意义的层次聚类。例如,在区分恶意软件和良性存储库方面,我们实现了98%的准确率和96%的召回率。总的来说,我们的工作是实现许多存储库分析功能的基本构建块,例如按目标平台或意图对存储库进行分类、检测代码重用和克隆,以及识别谱系和演化。 摘要:How can we identify similar repositories and clusters among a large online archive, such as GitHub? Determining repository similarity is an essential building block in studying the dynamics and the evolution of such software ecosystems. The key challenge is to determine the right representation for the diverse repository features in a way that: (a) it captures all aspects of the available information, and (b) it is readily usable by ML algorithms. We propose Repo2Vec, a comprehensive embedding approach to represent a repository as a distributed vector by combining features from three types of information sources. As our key novelty, we consider three types of information: (a) metadata, (b) the structure of the repository, and (c) the source code. We also introduce a series of embedding approaches to represent and combine these information types into a single embedding. We evaluate our method with two real datasets from GitHub for a combined 1013 repositories. First, we show that our method outperforms previous methods in terms of precision (93% vs 78%), with nearly twice as many Strongly Similar repositories and 30% fewer False Positives. Second, we show how Repo2Vec provides a solid basis for: (a) distinguishing between malware and benign repositories, and (b) identifying a meaningful hierarchical clustering. For example, we achieve 98% precision and 96% recall in distinguishing malware and benign repositories. Overall, our work is a fundamental building block for enabling many repository analysis functions such as repository categorization by target platform or intention, detecting code-reuse and clones, and identifying lineage and evolution.
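
Repo2Vec需要把三类信息的嵌入合成单一仓库向量。下面示意“归一化后加权拼接”这一种合成方式(子嵌入模型与论文实际采用的合成细节以原文为准):

```python
import numpy as np

def repo_embedding(meta_vec, struct_vec, code_vec, weights=(1.0, 1.0, 1.0)):
    """将元数据、仓库结构与源代码三类嵌入合成单一仓库向量(示意)。"""
    parts = [w * v / (np.linalg.norm(v) + 1e-12)   # 先归一化再加权,避免量纲主导
             for w, v in zip(weights, (meta_vec, struct_vec, code_vec))]
    return np.concatenate(parts)
```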

【45】 Machine Learning Challenges and Opportunities in the African Agricultural Sector -- A General Perspective 标题:机器学习在非洲农业部门的挑战和机遇--总体观点

作者:Racine Ly 机构:AKADEMIYA, Kigali, Rwanda 备注:This paper has been submitted as an internal discussion paper at AKADEMIYA2063. It has 13 pages and contains 4 images and 2 tables 链接:https://arxiv.org/abs/2107.05101 摘要:计算机能力的提高、算法技术的进步以及可用数据的显著增加促成了人工智能技术的最新发展。它的一个分支,叫做机器学习(ML),在模仿人类智能的特征方面表现出很强的能力,比如视觉、语言和解决问题的能力。然而,正如以前的技术革命所表明的那样,它们最显著的影响可能主要发生在非传统技术使用者的其他部门。农业部门对非洲经济至关重要;在气候变化时代,提高产量、减少损失和有效管理自然资源至关重要。机器学习是一种具有预测附加值的技术,因此有可能减少跨部门(在本例中是农业部门)的不确定性和风险。本文的目的是背景和讨论障碍ML为基础的解决方案,为非洲农业。在第二部分中,我们从历史和技术的角度概述了ML技术及其主要驱动力。在第三部分中,我们简要回顾了目前ML在农业中的应用。最后,在第4节中,我们讨论了对非洲日益增长的ML兴趣以及在农业部门创建和使用基于ML的解决方案的潜在障碍。 摘要:The improvement of computers' capacities, advancements in algorithmic techniques, and the significant increase of available data have enabled the recent developments of Artificial Intelligence (AI) technology. One of its branches, called Machine Learning (ML), has shown strong capacities in mimicking characteristics attributed to human intelligence, such as vision, speech, and problem-solving. However, as previous technological revolutions suggest, their most significant impacts could be mostly expected on other sectors that were not traditional users of that technology. The agricultural sector is vital for African economies; improving yields, mitigating losses, and effective management of natural resources are crucial in a climate change era. Machine Learning is a technology with an added value in making predictions, hence the potential to reduce uncertainties and risk across sectors, in this case, the agricultural sector. The purpose of this paper is to contextualize and discuss barriers to ML-based solutions for African agriculture. In the second section, we provided an overview of ML technology from a historical and technical perspective and its main driving force. In the third section, we provided a brief review of the current use of ML in agriculture. Finally, in section 4, we discuss ML growing interest in Africa and the potential barriers to creating and using ML-based solutions in the agricultural sector.

【46】 Fairer Software Made Easier (using "Keys") 标题:利用“关键特征”让更公平的软件更易实现

作者:Tim Menzies,Kewen Peng,Andre Lustosa 机构:Computer Science, NC State, USA 备注:Submitted to NIER ASE 2021 (new ideas, emerging research) 链接:https://arxiv.org/abs/2107.05088 摘要:我们能简化软件分析的解释吗?也许可以。最近的研究结果表明,系统往往表现出“键效应”(keys effect),即少数关键特征控制着其余特征。显然,对于由少数关键特征控制的系统,解释与控制只需在这些关键特征上运行若干“what-if”查询即可。通过利用键效应,即使是伦理AI系统所需的复杂解释,也有望被大幅简化。 摘要:Can we simplify explanations for software analytics? Maybe. Recent results show that systems often exhibit a "keys effect"; i.e. a few key features control the rest. Just to say the obvious, for systems controlled by a few keys, explanation and control is just a matter of running a handful of "what-if" queries across the keys. By exploiting the keys effect, it should be possible to dramatically simplify even complex explanations, such as those required for ethical AI systems.
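
“键效应”意味着解释可以约化为在少数关键特征上的what-if枚举,示意如下(model 与特征取值均为假设的占位):

```python
from itertools import product

def what_if_over_keys(model, base_config, key_values):
    """"键效应"下的解释(示意):仅在少数关键特征上枚举取值,
    对每种组合做what-if查询,即可覆盖系统的主要行为空间。"""
    keys = list(key_values)
    results = {}
    for combo in product(*key_values.values()):
        cfg = {**base_config, **dict(zip(keys, combo))}
        results[combo] = model(cfg)        # model 为任意可调用的预测器
    return results
```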

【47】 SGD: The Role of Implicit Regularization, Batch-size and Multiple-epochs 标题:SGD:隐式正则化、批量和多期的作用

作者:Satyen Kale,Ayush Sekhari,Karthik Sridharan 机构:Google Research, NY, Cornell University 链接:https://arxiv.org/abs/2107.05074 摘要:多轮(multi-epoch)、小批量的随机梯度下降(SGD)是学习大型过参数化模型的首选方法。一个流行的理论解释认为,SGD在实践中表现良好是因为该算法有一种隐式正则化,使其输出偏向于好的解。也许理论上理解最透彻的SGD学习设定是随机凸优化(SCO),众所周知,SGD在其中以$O(1/\sqrt{n})$的速率学习,其中$n$是样本数。在本文中,我们考虑SCO问题,并探讨隐式正则化、批量大小和多轮训练对SGD的作用。我们的主要贡献有三个方面:(a) 我们证明了对于任何正则化子,都存在一个正则化经验风险最小化(RERM)无法学习的SCO问题,这自动排除了任何基于隐式正则化来解释SGD成功的可能。(b) 在样本复杂度意义上,我们给出了SGD与基于经验损失的梯度下降(GD)之间的分离:存在一个SCO问题,使得任何步长和迭代次数下的GD都只能以次优速率学习,其误差至少为$\widetilde{\Omega}(1/n^{5/12})$。(c) 我们提出了一种实践中常用的SGD多轮变体,并证明在最坏情况下该算法至少与单遍(single-pass)SGD一样好;而对某些SCO问题,对数据集进行多次遍历可以显著优于单遍SGD。我们将结果推广到一般学习设定:给出了一个对任何数据分布都可学习的问题,对该问题而言,无论采用何种正则化函数,SGD都严格优于RERM。最后,我们讨论了结果对深度学习的启示,并展示了两层对角神经网络上SGD与ERM之间的分离。 摘要:Multi-epoch, small-batch, Stochastic Gradient Descent (SGD) has been the method of choice for learning with large over-parameterized models. A popular theory for explaining why SGD works well in practice is that the algorithm has an implicit regularization that biases its output towards a good solution. Perhaps the theoretically most well understood learning setting for SGD is that of Stochastic Convex Optimization (SCO), where it is well known that SGD learns at a rate of $O(1/\sqrt{n})$, where $n$ is the number of samples. In this paper, we consider the problem of SCO and explore the role of implicit regularization, batch size and multiple epochs for SGD. Our main contributions are threefold: (a) We show that for any regularizer, there is an SCO problem for which Regularized Empirical Risk Minimzation fails to learn. This automatically rules out any implicit regularization based explanation for the success of SGD. (b) We provide a separation between SGD and learning via Gradient Descent on empirical loss (GD) in terms of sample complexity. We show that there is an SCO problem such that GD with any step size and number of iterations can only learn at a suboptimal rate: at least $\widetilde{\Omega}(1/n^{5/12})$. (c) We present a multi-epoch variant of SGD commonly used in practice. We prove that this algorithm is at least as good as single pass SGD in the worst case. However, for certain SCO problems, taking multiple passes over the dataset can significantly outperform single pass SGD. We extend our results to the general learning setting by showing a problem which is learnable for any data distribution, and for this problem, SGD is strictly better than RERM for any regularization function. We conclude by discussing the implications of our results for deep learning, and show a separation between SGD and ERM for two layer diagonal neural networks.

【48】 Locality Relationship Constrained Multi-view Clustering Framework 标题:位置关系约束的多视图聚类框架

作者:Xiangzhu Meng,Wei Wei,Wenzhe Liu 链接:https://arxiv.org/abs/2107.05073 摘要:在大多数实际应用中,通常使用来自不同视图的多个特征来表示一个对象。其中,基于多视图子空间的聚类方法得到了众多研究者的广泛关注,其目的是为多视图数据提供聚类解决方案。然而,现有的方法大多不能充分利用多视图场景下样本间的局部几何结构和相似关系。为了解决这些问题,我们提出了一种新的基于局部关系约束的多视图学习方法来研究多视图聚类问题,称为局部关系约束多视图聚类框架(LRC-MCF)。LRC-MCF通过捕获多个视图之间的局部关系信息和共同的相似关系,探索不同视图之间的多样性、几何性、一致性和互补性信息。此外,LRC-MCF在寻找跨视图的共同局部结构时充分考虑了不同视图的权重,并直接生成最终的聚类结果。为了有效地减少所学表示的冗余度,还额外考虑了对公共相似矩阵的低秩约束。为了解决LRC-MCF的极小化问题,提出了一种交替方向极小化(ADM)方法,迭代求解LRC-MCF中的所有变量。在七个基准多视图数据集上的大量实验结果验证了LRC-MCF方法的有效性。 摘要:In most practical applications, it's common to utilize multiple features from different views to represent one object. Among these works, multi-view subspace-based clustering has gained extensive attention from many researchers, which aims to provide clustering solutions to multi-view data. However, most existing methods fail to take full use of the locality geometric structure and similarity relationship among samples under the multi-view scenario. To solve these issues, we propose a novel multi-view learning method with locality relationship constraint to explore the problem of multi-view clustering, called Locality Relationship Constrained Multi-view Clustering Framework (LRC-MCF). LRC-MCF aims to explore the diversity, geometric, consensus and complementary information among different views, by capturing the locality relationship information and the common similarity relationships among multiple views. Moreover, LRC-MCF takes sufficient consideration to weights of different views in finding the common-view locality structure and straightforwardly produce the final clusters. To effectually reduce the redundancy of the learned representations, the low-rank constraint on the common similarity matrix is considered additionally. To solve the minimization problem of LRC-MCF, an Alternating Direction Minimization (ADM) method is provided to iteratively calculate all variables of LRC-MCF. Extensive experimental results on seven benchmark multi-view datasets validate the effectiveness of the LRC-MCF method.

【49】 Machine Learning based CVD Virtual Metrology in Mass Produced Semiconductor Process 标题:基于机器学习的大规模生产半导体CVD虚拟测量

作者:Yunsong Xie,Ryan Stearrett 机构:Samsung Austin Semiconductor Company, Austin, TX, USA 链接:https://arxiv.org/abs/2107.05071 摘要:针对基于机器学习的化学气相沉积(CVD)虚拟计量(VM),我们对数据插补(data imputing)、特征选择和回归算法三个关键方面进行了交叉基准测试。结果表明,线性的特征选择与回归算法会对虚拟计量数据产生严重欠拟合。为了获得更高的预测精度,数据插补也是必要的,因为在取得最佳精度时数据可用率仅约70%。本文的结果表明,非线性特征选择与回归算法结合最近邻数据插补算法,可提供高达0.7的预测精度。这可使CVD工艺波动降低70%,有望降低物理计量的频率,并使大规模量产的晶圆质量更高、更可靠。 摘要:A cross-benchmark has been done on three critical aspects, data imputing, feature selection and regression algorithms, for machine learning based chemical vapor deposition (CVD) virtual metrology (VM). The result reveals that linear feature selection regression algorithm would extensively under-fit the VM data. Data imputing is also necessary to achieve a higher prediction accuracy as the data availability is only ~70% when optimal accuracy is obtained. This work suggests a nonlinear feature selection and regression algorithm combined with nearest data imputing algorithm would provide a prediction accuracy as high as 0.7. This would lead to 70% reduced CVD processing variation, which is believed to will lead to reduced frequency of physical metrology as well as more reliable mass-produced wafer with improved quality.
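
摘要的结论(最近邻插补 + 非线性特征选择 + 非线性回归)可以用sklearn组成如下示意流水线(具体模型与超参数为假设,仅体现组合方式):

```python
from sklearn.pipeline import Pipeline
from sklearn.impute import KNNImputer
from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import RandomForestRegressor

# 示意流水线:最近邻插补 -> 非线性特征选择 -> 非线性回归
vm_pipeline = Pipeline([
    ("impute", KNNImputer(n_neighbors=5)),            # 数据可用率仅约70%,需插补
    ("select", SelectFromModel(RandomForestRegressor(n_estimators=100))),
    ("regress", RandomForestRegressor(n_estimators=300)),
])
# 用法:vm_pipeline.fit(X_train, y_train); r2 = vm_pipeline.score(X_test, y_test)
```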

【50】 BCNet: A Deep Convolutional Neural Network for Breast Cancer Grading 标题:BCNet:一种用于乳腺癌分级的深度卷积神经网络

作者:Pouya Hallaj Zavareh,Atefeh Safayari,Hamidreza Bolhasani 链接:https://arxiv.org/abs/2107.05037 摘要:乳腺癌已成为全世界最常见的癌症之一,对人类、尤其是女性构成严重威胁。为了对这种癌症提供有效的治疗或预防,疾病的早期诊断非常重要。检测这种疾病的方法多种多样,其中基于图像的方法发挥着主导作用。近年来,深度学习被广泛应用于各个科学领域,尤其是医学领域。在乳腺癌检测问题中,多种深度学习技术已在不同的数据集上得到发展并取得了很好的准确性。在这篇文章中,我们提出了一个深度神经网络模型,用于对来自Databiox图像数据集的组织病理学图像进行分类,这是在该图像数据库上的首次应用。我们提出的BCNet模型利用了迁移学习的方法,从可用的预训练模型中选择VGG16作为特征提取器。此外,为了解决数据不足的问题,我们采用数据增强技术来扩充输入数据集。本研究中的所有实现,从预处理操作到模型架构图的绘制,均使用tf.keras API完成。最终,所提模型取得了88%的验证准确率和72%的评估准确率。 摘要:Breast cancer has become one of the most prevalent cancers by which people all over the world are affected and is posed serious threats to human beings, in particular women. In order to provide effective treatment or prevention of this cancer, disease diagnosis in the early stages would be of high importance. There have been various methods to detect this disorder in which using images have to play a dominant role. Deep learning has been recently adopted widely in different areas of science, especially medicine. In breast cancer detection problems, some diverse deep learning techniques have been developed on different datasets and resulted in good accuracy. In this article, we aimed to present a deep neural network model to classify histopathological images from the Databiox image dataset as the first application on this image database. Our proposed model named BCNet has taken advantage of the transfer learning approach in which VGG16 is selected from available pertained models as a feature extractor. Furthermore, to address the problem of insufficient data, we employed the data augmentation technique to expand the input dataset. All implementations in this research, ranging from pre-processing actions to depicting the diagram of the model architecture, have been carried out using tf.keras API. As a consequence of the proposed model execution, a significant validation accuracy of 88% and evaluation accuracy of 72% were obtained.
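
摘要明确提到使用tf.keras API、以VGG16为特征提取器并配合数据增强,可按此描述给出如下极简示意(输入尺寸与增强方式为假设;输出类别数此处假设为3,以Databiox实际分级数为准):

```python
import tensorflow as tf

# VGG16作特征提取器(迁移学习:冻结预训练卷积层)
base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(224, 224, 3))
base.trainable = False

model = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),   # 数据增强,缓解数据不足
    tf.keras.layers.RandomRotation(0.1),
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),  # 假设输出3个分级类别
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```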

【51】 Blending Pruning Criteria for Convolutional Neural Networks 标题:卷积神经网络的混合剪枝准则

作者:Wei He,Zhongzhan Huang,Mingfu Liang,Senwei Liang,Haizhao Yang 机构:Yang, Nanyang Technological University , Tsinghua University, Northwestern University , Purdue University 链接:https://arxiv.org/abs/2107.05033 摘要:卷积神经网络(CNN)在各种视觉应用中的进展引起了人们的广泛关注。然而,大多数CNN无法满足实际部署的严格要求。为了克服这个问题,近来流行的网络剪枝是一种减少模型冗余的有效方法。但是,不同剪枝准则对滤波器“重要性”的排序可能互不一致:根据某个准则一个滤波器可能是重要的,而根据另一个准则它又是不必要的,这表明每个准则只是综合“重要性”的一个局部视图。基于这一动机,我们提出了一个新的框架,通过挖掘准则的多样性来整合现有的滤波器剪枝准则。该框架包括两个阶段:准则聚类和滤波器重要性校正。首先,根据“重要性”得分的排序,通过逐层的层次聚类来压缩剪枝准则;第二,在每个聚类中,我们提出一个校正因子来调整每个被选中的混合候选准则的重要性,并通过进化算法搜索最优的混合准则。在CIFAR-100和ImageNet基准上的定量结果表明,就剪枝后紧凑模型的性能而言,我们的框架优于最先进的基线方法。 摘要:The advancement of convolutional neural networks (CNNs) on various vision applications has attracted lots of attention. Yet the majority of CNNs are unable to satisfy the strict requirement for real-world deployment. To overcome this, the recent popular network pruning is an effective method to reduce the redundancy of the models. However, the ranking of filters according to their "importance" on different pruning criteria may be inconsistent. One filter could be important according to a certain criterion, while it is unnecessary according to another one, which indicates that each criterion is only a partial view of the comprehensive "importance". From this motivation, we propose a novel framework to integrate the existing filter pruning criteria by exploring the criteria diversity. The proposed framework contains two stages: Criteria Clustering and Filters Importance Calibration. First, we condense the pruning criteria via layerwise clustering based on the rank of "importance" score. Second, within each cluster, we propose a calibration factor to adjust their significance for each selected blending candidates and search for the optimal blending criterion via Evolutionary Algorithm. Quantitative results on the CIFAR-100 and ImageNet benchmarks show that our framework outperforms the state-of-the-art baselines, regarding the compact model performance after pruning.
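
框架第一阶段“准则聚类”可示意如下:以各剪枝准则给出的重要性排序之间的Kendall tau相关定义距离,再做层次聚类(示意实现,距离与聚类细节以论文为准):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import kendalltau

def cluster_criteria(rankings, n_clusters=3):
    """准则聚类(示意):rankings[i] 为第 i 个剪枝准则对全部滤波器的
    重要性排序;排序越相似的准则,距离越小,被归入同一簇。"""
    k = len(rankings)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            tau, _ = kendalltau(rankings[i], rankings[j])
            dist[i, j] = dist[j, i] = 1 - tau        # 相关性越高距离越小
    condensed = dist[np.triu_indices(k, 1)]          # 压缩距离向量
    Z = linkage(condensed, method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```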

【52】 Semi-Supervised Object Detection with Adaptive Class-Rebalancing Self-Training 标题:自适应类重平衡自训练的半监督目标检测

作者:Fangyuan Zhang,Tianxiang Pan,Bin Wang 机构: Software School of Tsinghua University 链接:https://arxiv.org/abs/2107.05031 摘要:本研究深入探讨半监督目标侦测(SSOD),以提高额外未标记资料的侦测器效能。最近,通过自我训练,SSOD的表现达到了最先进的水平,其中训练监督由基本事实和伪标签组成。在目前的研究中,我们发现SSOD的班级不平衡严重阻碍了自我训练的有效性。为了解决类不平衡的问题,我们提出了一种基于CropBank的自适应类再平衡自训练(ACRST)。ACRST自适应地用CropBank中提取的前景实例重新平衡训练数据,从而缓解了类的不平衡。由于检测任务的高度复杂性,我们观察到SSOD中的自训练和数据再平衡都会受到噪声伪标签的影响。因此,我们提出了一种新的两阶段滤波算法来产生精确的伪标签。我们的方法在MS-COCO和VOC基准上取得了令人满意的改进。当MS-COCO中仅使用1%的标记数据时,我们的方法比监督基线的mAP提高了17.02,比现有方法的mAP提高了5.32。 摘要:This study delves into semi-supervised object detection (SSOD) to improve detector performance with additional unlabeled data. State-of-the-art SSOD performance has been achieved recently by self-training, in which training supervision consists of ground truths and pseudo-labels. In current studies, we observe that class imbalance in SSOD severely impedes the effectiveness of self-training. To address the class imbalance, we propose adaptive class-rebalancing self-training (ACRST) with a novel memory module called CropBank. ACRST adaptively rebalances the training data with foreground instances extracted from the CropBank, thereby alleviating the class imbalance. Owing to the high complexity of detection tasks, we observe that both self-training and data-rebalancing suffer from noisy pseudo-labels in SSOD. Therefore, we propose a novel two-stage filtering algorithm to generate accurate pseudo-labels. Our method achieves satisfactory improvements on MS-COCO and VOC benchmarks. When using only 1% labeled data in MS-COCO, our method achieves 17.02 mAP improvement over supervised baselines, and 5.32 mAP improvement compared with state-of-the-art methods.

【53】 Improving Low-resource Reading Comprehension via Cross-lingual Transposition Rethinking 标题:跨语言换位反思提高低资源阅读理解水平

作者:Gaochen Wu,Bin Xu,Yuxin Qin,Fei Kong,Bangchang Liu,Hongwen Zhao,Dejie Chang 机构:Computer Science and Technology, Tsinghua University, Beijing, China, Beijing MoreHealth Technology Group Co. Ltd 链接:https://arxiv.org/abs/2107.05002 摘要:在大规模高质量训练数据的支持下,抽取式阅读理解(ERC)取得了巨大进步。然而,尽管进展迅速、应用广泛,除英语等高资源语言外,其他语言的数据集仍然稀缺。为了解决这个问题,我们提出了一个跨语言换位反思(XLTT)模型,在多语言环境下对现有的高质量抽取式阅读理解数据集进行建模。具体而言,我们提出多语种自适应注意力(MAA),结合内部注意力(intra-attention)与相互注意力(inter-attention),从每一对语系中学习更具泛化性的语义和词汇知识。此外,为了充分利用现有数据集,我们采用了一个新的训练框架,通过计算每个现有数据集与目标数据集之间的任务级相似度来训练模型。实验结果表明,我们的XLTT模型在两个多语种ERC基准上超过了6个基线方法,对低资源语言尤为有效,F1和EM分别平均提升3.9和4.1。 摘要:Extractive Reading Comprehension (ERC) has made tremendous advances enabled by the availability of large-scale high-quality ERC training data. Despite of such rapid progress and widespread application, the datasets in languages other than high-resource languages such as English remain scarce. To address this issue, we propose a Cross-Lingual Transposition ReThinking (XLTT) model by modelling existing high-quality extractive reading comprehension datasets in a multilingual environment. To be specific, we present multilingual adaptive attention (MAA) to combine intra-attention and inter-attention to learn more general generalizable semantic and lexical knowledge from each pair of language families. Furthermore, to make full use of existing datasets, we adopt a new training framework to train our model by calculating task-level similarities between each existing dataset and target dataset. The experimental results show that our XLTT model surpasses six baselines on two multilingual ERC benchmarks, especially more effective for low-resource languages with 3.9 and 4.1 average improvement in F1 and EM, respectively.

【54】 Prediction Surface Uncertainty Quantification in Object Detection Models for Autonomous Driving 标题:自动驾驶目标检测模型中预测表面不确定性的量化

作者:Ferhat Ozgur Catak,Tao Yue,Shaukat Ali 机构:Simula Research Laboratory, Fornebu, Norway, Nanjing University of Aeronautics and Astronautics 备注:Accepted in AITest 2021, The Third IEEE International Conference On Artificial Intelligence Testing 链接:https://arxiv.org/abs/2107.04991 摘要:自动驾驶汽车中的目标检测通常基于摄像头图像和激光雷达输入,这些数据常被用于训练预测模型(如深度人工神经网络),以支持目标识别、车速调整等决策。此类决策中的错误可能是有害的;因此,通过不确定性度量来衡量这些预测模型所作决策的可靠性至关重要。在深度学习模型中,不确定性度量通常针对分类问题;然而,自动驾驶中的深度学习模型往往是多输出回归模型。因此,我们提出了一种名为PURE(Prediction sURface uncErtainty,预测表面不确定性)的新方法来度量此类回归模型的预测不确定性。我们将目标识别问题表述为一个多输出回归模型,用于在二维摄像机视图中确定目标位置。为了评估,我们修改了三个广泛应用的目标识别模型(即YoLo、SSD300和SSD512),并使用了KITTI、Stanford Cars、Berkeley DeepDrive和NEXET数据集。结果表明,预测表面不确定性与预测精度之间存在统计上显著的负相关,说明不确定性对自动驾驶决策有显著影响。 摘要:Object detection in autonomous cars is commonly based on camera images and Lidar inputs, which are often used to train prediction models such as deep artificial neural networks for decision making for object recognition, adjusting speed, etc. A mistake in such decision making can be damaging; thus, it is vital to measure the reliability of decisions made by such prediction models via uncertainty measurement. Uncertainty, in deep learning models, is often measured for classification problems. However, deep learning models in autonomous driving are often multi-output regression models. Hence, we propose a novel method called PURE (Prediction sURface uncErtainty) for measuring prediction uncertainty of such regression models. We formulate the object recognition problem as a regression model with more than one outputs for finding object locations in a 2-dimensional camera view. For evaluation, we modified three widely-applied object recognition models (i.e., YoLo, SSD300 and SSD512) and used the KITTI, Stanford Cars, Berkeley DeepDrive, and NEXET datasets. Results showed the statistically significant negative correlation between prediction surface uncertainty and prediction accuracy suggesting that uncertainty significantly impacts the decisions made by autonomous driving.

【55】 Leveraging Domain Adaptation for Low-Resource Geospatial Machine Learning 标题:利用领域自适应实现低资源地理空间机器学习

作者:Jack Lynch,Sam Wookey 机构: 1Department of Electrical Engineering 备注:Tackling Climate Change with Machine Learning Workshop at ICML 2021 链接:https://arxiv.org/abs/2107.04983 摘要:随着地理空间图像在可用性和分辨率上的激增,遥感机器学习已走向成熟,但其实用性仍受制于对标注数据的需求。更重要的是,许多有标注的地理空间数据集仅适用于特定地区、特定仪器或特定极端天气事件。我们研究了将现代领域自适应方法应用于多个已提出的地理空间基准的效果,揭示了其中特有的挑战并提出了相应的解决方案。 摘要:Machine learning in remote sensing has matured alongside a proliferation in availability and resolution of geospatial imagery, but its utility is bottlenecked by the need for labeled data. What's more, many labeled geospatial datasets are specific to certain regions, instruments, or extreme weather events. We investigate the application of modern domain-adaptation to multiple proposed geospatial benchmarks, uncovering unique challenges and proposing solutions to them.

【56】 Out-of-Distribution Dynamics Detection: RL-Relevant Benchmarks and Results 标题:分布外动态检测:与RL相关的基准和结果

作者:Mohamad H Danesh,Alan Fern 机构: a trained 1School of Electrical Engineering and Computer Science, Ore-gon State University 备注:ICML 2021 Workshop on Uncertainty and Robustness in Deep Learning 链接:https://arxiv.org/abs/2107.04982 摘要:我们研究了分布外动态(OODD)检测问题,该问题涉及到与训练分布动态相比,时间过程的动态何时发生变化的检测。这与控制、强化学习(RL)和多变量时间序列的应用有关,其中测试时间动态的变化会以未知的方式影响学习控制器/预测器的性能。这个问题在深度RL环境下尤其重要,在深度RL环境下,学习的控制器常常过度适应训练环境。然而,目前对于RL研究中常用的环境类型,还缺乏已建立的OODD基准。我们的第一个贡献是设计一组OODD基准,这些基准源于具有不同OODD类型和强度的通用RL环境。我们的第二个贡献是设计了一种基于递归隐式分位数网络(RIQNs)的强OODD基线方法,用于监控OODD检测的自回归预测误差。我们的最终贡献是评估基准上的RIQN方法,为将来的比较提供基线结果。 摘要:We study the problem of out-of-distribution dynamics (OODD) detection, which involves detecting when the dynamics of a temporal process change compared to the training-distribution dynamics. This is relevant to applications in control, reinforcement learning (RL), and multi-variate time-series, where changes to test time dynamics can impact the performance of learning controllers/predictors in unknown ways. This problem is particularly important in the context of deep RL, where learned controllers often overfit to the training environment. Currently, however, there is a lack of established OODD benchmarks for the types of environments commonly used in RL research. Our first contribution is to design a set of OODD benchmarks derived from common RL environments with varying types and intensities of OODD. Our second contribution is to design a strong OODD baseline approach based on recurrent implicit quantile networks (RIQNs), which monitors autoregressive prediction errors for OODD detection. Our final contribution is to evaluate the RIQN approach on the benchmarks to provide baseline results for future comparison.

【57】 Self-service Data Classification Using Interactive Visualization and Interpretable Machine Learning 标题:基于交互式可视化和可解释机器学习的自助式数据分类

作者:Sridevi Narayana Wagle,Boris Kovalerchuk 机构:Department of Computer Science, Central Washington University, USA 备注:37 pages, 33 figures, 7 tables 链接:https://arxiv.org/abs/2107.04971 摘要:机器学习算法通常产生被最终用户和开发人员视为复杂黑盒模型的模型。他们无法从设计的领域来解释模型。提出的迭代视觉逻辑分类器(IVLC)是一种可解释的机器学习算法,它允许最终用户设计一个模型并对数据进行分类,具有更高的可信度,而且不必牺牲准确性。这种技术特别有助于处理敏感和关键的数据,如癌症数据在医疗领域的高成本的错误。借助于所提出的交互式无损多维可视化,最终用户可以识别数据中的模式,并据此做出可解释的决策。这种选择在黑箱机器学习方法中是不可能的。交互式移位成对坐标软件系统(SPCVis)支持可解释IVLC算法。它是一个具有用户交互功能的无损多维数据可视化系统。交互式方法为最终用户提供了灵活性,使其能够以自助方式执行数据分类,而不必依赖于机器学习专家。在处理具有数百个维度/特征的大型数据集时,交互式模式发现变得很有挑战性。为了克服这个问题,本章提出了一种结合新的坐标顺序优化算法和遗传算法的自动分类方法。COO算法自动生成最能代表数据分离的坐标对序列,遗传算法通过自动生成数据分类区域来优化IVLC算法。实验结果表明,该方法是可行的,包括用于数据分类的交互式和自动化过程的基准数据集。 摘要:Machine learning algorithms often produce models considered as complex black-box models by both end users and developers. They fail to explain the model in terms of the domain they are designed for. The proposed Iterative Visual Logical Classifier (IVLC) is an interpretable machine learning algorithm that allows end users to design a model and classify data with more confidence and without having to compromise on the accuracy. Such technique is especially helpful when dealing with sensitive and crucial data like cancer data in the medical domain with high cost of errors. With the help of the proposed interactive and lossless multidimensional visualization, end users can identify the pattern in the data based on which they can make explainable decisions. Such options would not be possible in black box machine learning methodologies. The interpretable IVLC algorithm is supported by the Interactive Shifted Paired Coordinates Software System (SPCVis). It is a lossless multidimensional data visualization system with user interactive features. The interactive approach provides flexibility to the end user to perform data classification as self-service without having to rely on a machine learning expert. Interactive pattern discovery becomes challenging while dealing with large data sets with hundreds of dimensions/features. To overcome this problem, this chapter proposes an automated classification approach combined with new Coordinate Order Optimizer (COO) algorithm and a Genetic algorithm. The COO algorithm automatically generates the coordinate pair sequences that best represent the data separation and the genetic algorithm helps optimizing the proposed IVLC algorithm by automatically generating the areas for data classification. The feasibility of the approach is shown by experiments on benchmark datasets covering both interactive and automated processes used for data classification.

【58】 Learn from Anywhere: Rethinking Generalized Zero-Shot Learning with Limited Supervision 标题:无处不在的学习:有限监督下的广义零射学习的再思考

作者:Gaurav Bhatt,Shivam Chandok,Vineeth N Balasubramanian 机构:IIT Hyderabad 链接:https://arxiv.org/abs/2107.04952 摘要:大多数零样本(zero-shot)与少样本(few-shot)学习方法的一个通病是,它们偏向于已见类别,从而导致次优性能。现有工作试图在训练中利用来自未见类别的无标注图像(即直推式零样本学习)来提升泛化能力。然而,这限制了它们在实际场景中的应用——在这些场景中,目标未见类别的数据不可获得或难以收集。在这项工作中,我们提出了一个实用的归纳式零样本与少样本学习设定:利用既不属于已见类别也不属于未见类别的“数据外类别”的无标注图像,来提升任意样本数(any-shot)学习的泛化能力。我们采用了基于专家乘积(product-of-experts)的表述,并引入了一个新的AUD模块,使我们能够使用数据外类别的无标注样本;这类样本通常很容易获得,且实际上几乎不需要任何标注成本。此外,我们还证明了该模型适用于更实际、更具挑战性的有限监督下的广义零样本设定——在该设定中,即使是已见的基类也没有足够的标注样本。 摘要:A common problem with most zero and few-shot learning approaches is they suffer from bias towards seen classes resulting in sub-optimal performance. Existing efforts aim to utilize unlabeled images from unseen classes (i.e transductive zero-shot) during training to enable generalization. However, this limits their use in practical scenarios where data from target unseen classes is unavailable or infeasible to collect. In this work, we present a practical setting of inductive zero and few-shot learning, where unlabeled images from other out-of-data classes, that do not belong to seen or unseen categories, can be used to improve generalization in any-shot learning. We leverage a formulation based on product-of-experts and introduce a new AUD module that enables us to use unlabeled samples from out-of-data classes which are usually easily available and practically entail no annotation cost. In addition, we also demonstrate the applicability of our model to address a more practical and challenging, Generalized Zero-shot under a limited supervision setting, where even base seen classes do not have sufficient annotated samples.

【59】 Partial Video Domain Adaptation with Partial Adversarial Temporal Attentive Network 标题:部分对抗性时间注意网络的部分视频域自适应

作者:Yuecong Xu,Jianfei Yang,Haozhi Cao,Qi Li,Kezhi Mao,Zhenghua Chen 机构:School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore; Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore 备注:The new datasets for PVDA: HMDB-ARID(partial), MiniKinetics-UCF, HMDB-ARID(partial) can be downloaded from this https URL 链接:https://arxiv.org/abs/2107.04941 摘要:部分域适配(PDA)是一种实用且通用的域适配设定,它放松了标签空间完全共享的假设,允许源标签空间包含目标标签空间。PDA的主要挑战是仅源域类(source-only classes)引起的负迁移问题。对于视频而言,这种负迁移可能由空间和时间特征共同触发,从而带来更具挑战性的部分视频域适配(PVDA)问题。本文提出了一种新的部分对抗式时间注意网络(PATAN),利用空间与时间特征来过滤仅源域类,以解决PVDA问题。此外,PATAN通过关注对类别过滤过程贡献更大的局部时间特征,构造有效的整体时间特征。我们还引入了新的基准以促进对PVDA问题的研究,覆盖了广泛的PVDA场景。实证结果表明,我们提出的PATAN在多个PVDA基准上均取得了最先进的性能。 摘要:Partial Domain Adaptation (PDA) is a practical and general domain adaptation scenario, which relaxes the fully shared label space assumption such that the source label space subsumes the target one. The key challenge of PDA is the issue of negative transfer caused by source-only classes. For videos, such negative transfer could be triggered by both spatial and temporal features, which leads to a more challenging Partial Video Domain Adaptation (PVDA) problem. In this paper, we propose a novel Partial Adversarial Temporal Attentive Network (PATAN) to address the PVDA problem by utilizing both spatial and temporal features for filtering source-only classes. Besides, PATAN constructs effective overall temporal features by attending to local temporal features that contribute more toward the class filtration process. We further introduce new benchmarks to facilitate research on PVDA problems, covering a wide range of PVDA scenarios. Empirical results demonstrate the state-of-the-art performance of our proposed PATAN across the multiple PVDA benchmarks.

【60】 Aligning Correlation Information for Domain Adaptation in Action Recognition 标题:动作识别中域自适应的相关信息对齐

作者:Yuecong Xu,Jianfei Yang,Haozhi Cao,Kezhi Mao,Jianxiong Yin,Simon See 机构:School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore; NVIDIA AI Tech Center, Singapore 备注:The dataset HMDB-ARID is available at this https URL 链接:https://arxiv.org/abs/2107.04932 摘要:域适配(DA)方法解决域偏移问题,使网络能够应用于不同的场景。虽然近年来已提出多种图像DA方法,但针对视频DA的研究仍十分有限。这在一定程度上源于适配视频中多种模态特征的复杂性,其中包括跨时空维度提取的、刻画像素间长程依赖的相关特征。相关特征与动作类别高度相关,其在准确提取视频特征方面的有效性已通过有监督动作识别任务得到验证。然而,由于域偏移,同一动作的相关特征在不同域中会有所差异。因此,我们提出了一种新的对抗性相关适配网络(ACAN),通过对齐像素相关性来对齐动作视频。ACAN旨在最小化相关信息的分布差异,称为像素相关差异(Pixel Correlation Discrepancy,PCD)。此外,视频DA研究还受限于缺乏具有较大域偏移的跨域视频数据集。为此,我们引入了一个新的HMDB-ARID数据集,其域间统计差异更大,因而域偏移也更大;该数据集旨在利用现有数据集推进暗光视频分类研究。实验结果表明,我们提出的ACAN在现有及新的视频DA数据集上均取得了最先进的性能。 摘要:Domain adaptation (DA) approaches address domain shift and enable networks to be applied to different scenarios. Although various image DA approaches have been proposed in recent years, there is limited research towards video DA. This is partly due to the complexity in adapting the different modalities of features in videos, which includes the correlation features extracted as long-term dependencies of pixels across spatiotemporal dimensions. The correlation features are highly associated with action classes and proven their effectiveness in accurate video feature extraction through the supervised action recognition task. Yet correlation features of the same action would differ across domains due to domain shift. Therefore we propose a novel Adversarial Correlation Adaptation Network (ACAN) to align action videos by aligning pixel correlations. ACAN aims to minimize the distribution of correlation information, termed as Pixel Correlation Discrepancy (PCD). Additionally, video DA research is also limited by the lack of cross-domain video datasets with larger domain shifts. We, therefore, introduce a novel HMDB-ARID dataset with a larger domain shift caused by a larger statistical difference between domains. This dataset is built in an effort to leverage current datasets for dark video classification. Empirical results demonstrate the state-of-the-art performance of our proposed ACAN for both existing and the new video DA datasets.
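
ACAN的核心是最小化源/目标域特征的“像素相关差异”(PCD)。下面用“两个域相关矩阵的均方差”给出一个相关性对齐损失的极简PyTorch示意(这只是近似的示意写法,并非论文中PCD的精确定义):

```python
import torch

def correlation_alignment_loss(feat_s, feat_t):
    """相关性对齐损失(示意):将源/目标域特征展平为 (N, C) 后各自计算
    特征维相关矩阵,以二者的均方差近似相关信息的分布差异。"""
    def corr(f):
        f = f.flatten(1)                       # (N, C...) -> (N, 特征维)
        f = (f - f.mean(0)) / (f.std(0) + 1e-6)
        return (f.T @ f) / (f.shape[0] - 1)    # 特征维相关矩阵
    c_s, c_t = corr(feat_s), corr(feat_t)
    return ((c_s - c_t) ** 2).mean()

loss = correlation_alignment_loss(torch.randn(8, 64), torch.randn(8, 64))
```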

【61】 Distributed Deep Reinforcement Learning for Intelligent Traffic Monitoring with a Team of Aerial Robots 标题:分布式深度强化学习在空中机器人智能交通监控中的应用

作者:Behzad Khamidehi,Elvino S. Sousa 机构:UniversityofToronto 备注:IEEE International Conference on Intelligent Transportation - ITSC2021 链接:https://arxiv.org/abs/2107.04924 摘要:本文研究了一组空中机器人在路网中的交通监控问题。由于两个主要原因,这个问题具有挑战性。首先,交通事件在时间和空间上都是随机的。其次,当交通事件以不同的速率到达路网的不同位置时,问题具有非齐次结构。因此,与其他位置相比,某些位置需要更多的机器人访问。为了解决这些问题,我们为道路网络的每个位置定义了一个不确定性度量,并为空中机器人提出了一个路径规划问题,以最小化网络的平均不确定性。将该问题表示为部分可观测马尔可夫决策过程(POMDP),提出了一种基于深度强化学习的分布式可扩展算法。我们考虑两种不同的场景,取决于代理(空中机器人)和交通管理中心(TMC)之间的通信模式。第一个场景假设代理连续地与TMC通信,以发送/接收关于流量事件的实时信息。因此,代理具有全局和实时的环境知识。然而,在第二种情况下,我们考虑一个具有挑战性的设置,其中的空中机器人的观测是部分的,并局限于它们的感测范围。此外,与第一种情况相比,空中机器人和TMC之间的信息交换仅限于特定的时间实例。我们评估了我们提出的算法在这两种情况下的性能,并在一个交通监控系统中演示了它的功能。 摘要:This paper studies the traffic monitoring problem in a road network using a team of aerial robots. The problem is challenging due to two main reasons. First, the traffic events are stochastic, both temporally and spatially. Second, the problem has a non-homogeneous structure as the traffic events arrive at different locations of the road network at different rates. Accordingly, some locations require more visits by the robots compared to other locations. To address these issues, we define an uncertainty metric for each location of the road network and formulate a path planning problem for the aerial robots to minimize the network's average uncertainty. We express this problem as a partially observable Markov decision process (POMDP) and propose a distributed and scalable algorithm based on deep reinforcement learning to solve it. We consider two different scenarios depending on the communication mode between the agents (aerial robots) and the traffic management center (TMC). The first scenario assumes that the agents continuously communicate with the TMC to send/receive real-time information about the traffic events. Hence, the agents have global and real-time knowledge of the environment. However, in the second scenario, we consider a challenging setting where the observation of the aerial robots is partial and limited to their sensing ranges. Moreover, in contrast to the first scenario, the information exchange between the aerial robots and the TMC is restricted to specific time instances. We evaluate the performance of our proposed algorithm in both scenarios for a real road network topology and demonstrate its functionality in a traffic monitoring system.

【62】 From Common Sense Reasoning to Neural Network Models through Multiple Preferences: an overview 标题:从常识推理到多偏好神经网络模型:综述

作者:Laura Giordano,Valentina Gliozzi,Daniele Theseider Dupré 备注:17 pages. arXiv admin note: text overlap with arXiv:2008.13278, arXiv:2012.13421, arXiv:2103.06854 链接:https://arxiv.org/abs/2107.04870 摘要:本文基于多偏好语义,讨论了条件逻辑、偏好逻辑与神经网络模型之间的关系。我们提出将面向概念的多偏好语义作为为神经网络模型提供语义解释的工具;该语义最近被引入可废止描述逻辑,用以刻画针对不同概念的偏好。这种方法已经在无监督神经网络模型(自组织映射)和有监督神经网络模型(多层感知器)上得到了探索,我们期望同样的方法可以推广到其他神经网络模型。它允许在捕获网络输入输出行为的解释上(通过模型检验)检查网络的逻辑性质。对于多层感知器,深度网络本身可以看作一个条件知识库,其中突触连接对应于加权条件。本文通过自组织映射和多层感知器两个案例描述了该一般方法,并讨论了一些有待解决的问题和展望。 摘要:In this paper we discuss the relationships between conditional and preferential logics and neural network models, based on a multi-preferential semantics. We propose a concept-wise multipreference semantics, recently introduced for defeasible description logics to take into account preferences with respect to different concepts, as a tool for providing a semantic interpretation to neural network models. This approach has been explored both for unsupervised neural network models (Self-Organising Maps) and for supervised ones (Multilayer Perceptrons), and we expect that the same approach might be extended to other neural network models. It allows for logical properties of the network to be checked (by model checking) over an interpretation capturing the input-output behavior of the network. For Multilayer Perceptrons, the deep network itself can be regarded as a conditional knowledge base, in which synaptic connections correspond to weighted conditionals. The paper describes the general approach, through the cases of Self-Organising Maps and Multilayer Perceptrons, and discusses some open issues and perspectives.

【63】 Propagation-aware Social Recommendation by Transfer Learning 标题:基于迁移学习的传播感知社交推荐

作者:Haodong Chang,Yabo Chu 机构:University of Technology Sydney; Northeastern University 链接:https://arxiv.org/abs/2107.04846 摘要:社会感知推荐方法是解决传统推荐系统数据稀疏问题的有效方法。其背后的假设是,社交用户连接中的知识可以共享并迁移到用户-物品交互领域,从而帮助了解用户偏好。然而,现有的方法大多只采用迁移学习过程中用户之间的一阶连接,忽略了高阶连接。我们认为,更好的推荐性能也可以受益于高阶社会关系。本文提出了一种基于社会关系传播的传播感知迁移学习网络(PTLN)。我们的目标是更好地挖掘隐藏在社交网络中的共享知识,从而进一步提高推荐性能。具体而言,我们从两个方面来探讨社会影响:(a)通过阶次偏置考虑高阶朋友;(b)同一阶的不同朋友通过注意力机制对推荐具有不同的重要性。此外,我们还设计了一种新的正则化方法来弥合社交关系和用户-物品交互之间的鸿沟。我们在两个真实世界的数据集上进行了大量的实验,并在排名准确性方面击败了其他同行,特别是对于历史交互很少的冷启动用户。 摘要:Social-aware recommendation approaches have been recognized as an effective way to solve the data sparsity issue of traditional recommender systems. The assumption behind is that the knowledge in social user-user connections can be shared and transferred to the domain of user-item interactions, whereby to help learn user preferences. However, most existing approaches merely adopt the first-order connections among users during transfer learning, ignoring those connections in higher orders. We argue that better recommendation performance can also benefit from high-order social relations. In this paper, we propose a novel Propagation-aware Transfer Learning Network (PTLN) based on the propagation of social relations. We aim to better mine the sharing knowledge hidden in social networks and thus further improve recommendation performance. Specifically, we explore social influence in two aspects: (a) higher-order friends have been taken into consideration by order bias; (b) different friends in the same order will have distinct importance for recommendation by an attention mechanism. Besides, we design a novel regularization to bridge the gap between social relations and user-item interactions. We conduct extensive experiments on two real-world datasets and beat other counterparts in terms of ranking accuracy, especially for the cold-start users with few historical interactions.

【64】 Identifying Layers Susceptible to Adversarial Attacks 标题:识别易受对抗性攻击的层

作者:Shoaib Ahmed Siddiqui,Thomas Breuel 机构:German Research Center for Artificial Intelligence (DFKI), TU Kaiserslautern, NVIDIA Research 链接:https://arxiv.org/abs/2107.04827 摘要:常见的神经网络结构容易受到对抗样本的攻击。神经网络结构通常被认为分为低层特征提取层和高层分类层;网络对对抗样本的敏感性通常被归结为与分类有关的问题,而非与特征提取有关的问题。我们通过在CIFAR-10、Imagenette和ImageNet上使用非对抗和对抗数据有选择地重新训练VGG和ResNet架构的不同部分来检验这一观点。实验结果表明,对对抗样本的敏感性与低层特征提取层有关,因此仅重新训练高层不足以获得鲁棒性。这种现象可能有两种解释:要么对抗攻击使早期层产生与攻击目标类特征无法区分的输出;要么对抗攻击使早期层产生在统计上不同于非对抗样本特征的输出,从而使后续层无法进行一致的分类。通过对隐藏层特征向量分布进行大规模非线性降维和密度建模,我们发现非对抗样本和对抗样本的特征分布差异显著。我们的结果为对抗样本的统计起源和可能的防御提供了新的见解。 摘要:Common neural network architectures are susceptible to attack by adversarial samples. Neural network architectures are commonly thought of as divided into low-level feature extraction layers and high-level classification layers; susceptibility of networks to adversarial samples is often thought of as a problem related to classification rather than feature extraction. We test this idea by selectively retraining different portions of VGG and ResNet architectures on CIFAR-10, Imagenette and ImageNet using non-adversarial and adversarial data. Our experimental results show that susceptibility to adversarial samples is associated with low-level feature extraction layers. Therefore, retraining high-level layers is insufficient for achieving robustness. This phenomenon could have two explanations: either, adversarial attacks yield outputs from early layers that are indistinguishable from features found in the attack classes, or adversarial attacks yield outputs from early layers that differ statistically from features for non-adversarial samples and do not permit consistent classification by subsequent layers. We test this question by large-scale non-linear dimensionality reduction and density modeling on distributions of feature vectors in hidden layers and find that the feature distributions between non-adversarial and adversarial samples differ substantially. Our results provide new insights into the statistical origins of adversarial samples and possible defenses.
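
"选择性重训练"是该文定位对抗脆弱性的实验手段。下面给出一个示意性的PyTorch草图:冻结其余部分,仅在(对抗)数据上重训练被视为低层特征提取的若干层;层的划分点与超参数均为假设,仅用于说明做法:

```python
import torch
import torchvision

# 示意:选择性重训练。冻结全部参数后,仅解冻"低层特征提取层"
model = torchvision.models.resnet18(num_classes=10)

for p in model.parameters():
    p.requires_grad = False
# 假设将 conv1/bn1/layer1 视为低层特征提取层(划分点仅为示例)
for module in [model.conv1, model.bn1, model.layer1]:
    for p in module.parameters():
        p.requires_grad = True

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3)

x = torch.randn(8, 3, 224, 224)   # 这里用随机张量代替(对抗)训练样本
y = torch.randint(0, 10, (8,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()                   # 只有被解冻的低层会被更新
```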

【65】 Not End-to-End: Explore Multi-Stage Architecture for Online Surgical Phase Recognition 标题:非端到端:探索在线手术分期识别的多级体系结构

作者:Fangqiu Yi,Tingting Jiang 机构:NELVT, Department of Computer Science, Peking University, China 备注:Not accepted by M2CAI2021 链接:https://arxiv.org/abs/2107.04810 摘要:手术阶段识别是计算机辅助手术系统特别关注的问题,其目标是预测手术视频每一帧所处的手术阶段。具有多阶段结构的网络已被广泛应用于许多模式丰富的计算机视觉任务中,其中预测器阶段首先输出初始预测,附加的细化阶段对初始预测进行进一步细化。现有的研究表明,手术视频内容具有良好的有序性和丰富的时间模式,使得多阶段结构非常适合于手术阶段识别任务。然而,我们观察到,当简单地将多阶段架构应用于手术阶段识别任务时,端到端的训练方式会使细化能力达不到预期的效果。针对这一问题,我们提出了一种新的非端到端训练策略,并探讨了手术阶段识别任务的多阶段结构设计。对于非端到端的训练策略,在细化阶段分别用两类扰动序列进行训练。同时,我们评估了三种不同的细化模型选择,以表明我们的分析和解决方案对特定多阶段模型的选择是稳健的。我们在两个公共基准上进行了实验,M2CAI16工作流挑战和Cholec80数据集。结果表明,采用我们的策略训练的多阶段体系结构大大提高了当前最先进的单阶段模型的性能。代码见 https://github.com/ChinaYi/casual_tcn。 摘要:Surgical phase recognition is of particular interest to computer assisted surgery systems, in which the goal is to predict what phase is occurring at each frame for a surgery video. Networks with multi-stage architecture have been widely applied in many computer vision tasks with rich patterns, where a predictor stage first outputs initial predictions and an additional refinement stage operates on the initial predictions to perform further refinement. Existing works show that surgical video contents are well ordered and contain rich temporal patterns, making the multi-stage architecture well suited for the surgical phase recognition task. However, we observe that when simply applying the multi-stage architecture to the surgical phase recognition task, the end-to-end training manner will make the refinement ability fall short of its wishes. To address the problem, we propose a new non end-to-end training strategy and explore different designs of multi-stage architecture for surgical phase recognition task. For the non end-to-end training strategy, the refinement stage is trained separately with proposed two types of disturbed sequences. Meanwhile, we evaluate three different choices of refinement models to show that our analysis and solution are robust to the choices of specific multi-stage models. We conduct experiments on two public benchmarks, the M2CAI16 Workflow Challenge, and the Cholec80 dataset. Results show that multi-stage architecture trained with our strategy largely boosts the performance of the current state-of-the-art single-stage model. Code is available at https://github.com/ChinaYi/casual_tcn.

【66】 Speech2Video: Cross-Modal Distillation for Speech to Video Generation 标题:Speech2Video:面向语音到视频生成的跨模态蒸馏

作者:Shijing Si,Jianzong Wang,Xiaoyang Qu,Ning Cheng,Wenqi Wei,Xinghua Zhu,Jing Xiao 机构:Ping An Technology (Shenzhen) Co., Ltd., China 备注:Accepted by InterSpeech2021 链接:https://arxiv.org/abs/2107.04806 摘要:本文研究了仅由语音生成说话人脸视频这一新任务。语音到视频生成技术可以在娱乐、客户服务和人机交互行业激发有趣的应用。事实上,语音中的音色、口音和语速可能包含与说话人外貌相关的丰富信息。挑战主要在于从音频信号中分离出不同的视觉属性。在这篇文章中,我们提出了一种轻量级的跨模态蒸馏方法,从未标记的视频输入中提取解耦的情感和身份信息。提取的特征然后被一个生成对抗网络集成到说话人脸视频片段中。通过精心设计的鉴别器,该框架获得了真实的生成结果。对被观察个体的实验表明,该框架仅从语音中捕捉情感表达,并在视频输出中产生自然的面部运动。与将语音与说话人静态图像相结合的基线方法相比,该框架的结果几乎是不可区分的。用户研究还表明,该方法在生成视频的情感表达方面优于现有算法。 摘要:This paper investigates a novel task of talking face video generation solely from speeches. The speech-to-video generation technique can spark interesting applications in entertainment, customer service, and human-computer-interaction industries. Indeed, the timbre, accent and speed in speeches could contain rich information relevant to speakers' appearance. The challenge mainly lies in disentangling the distinct visual attributes from audio signals. In this article, we propose a light-weight, cross-modal distillation method to extract disentangled emotional and identity information from unlabelled video inputs. The extracted features are then integrated by a generative adversarial network into talking face video clips. With carefully crafted discriminators, the proposed framework achieves realistic generation results. Experiments with observed individuals demonstrated that the proposed framework captures the emotional expressions solely from speeches, and produces spontaneous facial motion in the video output. Compared to the baseline method where speeches are combined with a static image of the speaker, the results of the proposed framework is almost indistinguishable. User studies also show that the proposed method outperforms the existing algorithms in terms of emotion expression in the generated videos.

【67】 Formal context reduction in deriving concept hierarchies from corpora using adaptive evolutionary clustering algorithm star 标题:利用自适应进化聚类算法STAR从语料库导出概念层次的形式背景约简

作者:Bryar A. Hassan,Tarik A. Rashid,Seyedali Mirjalili 机构: Department of Computer Networks, Technical College of Informatics, Sulaimani Polytechnic University, Kurdistan Institution for Strategic Studies and Scientific Research, Sulaimani, Iraq 备注:Complex Intell. Syst. (2021) 链接:https://arxiv.org/abs/2107.04781 摘要:从语料库中自动生成概念层次结构是有益的,因为手动构建概念层次结构通常是一个耗时且资源密集的过程。从语料库学习概念层次结构的整个过程包含一系列步骤:将文本解析为句子、拆分句子,然后进行分词。在词形还原步骤之后,使用形式概念分析(FCA)提取词对。然而,形式上下文中可能存在一些无意义和错误的词对。生成形式上下文可能是一个耗时的过程,因此需要缩减形式上下文的规模,以删除无意义和错误的词对,从而相应减少提取概念格和概念层次结构所需的时间。在此前提下,本研究旨在提出两个框架:(1)一个框架,用于回顾当前利用FCA从语料库中导出概念层次结构的过程;(2)一个框架,使用自适应版本的ECA*来降低第一个框架中形式上下文的歧义。实验通过在两个框架上应用385个Wikipedia样本语料库,检验形式上下文的规模缩减及其产生的概念格与概念层次结构。利用概念格不变量,将所得形式上下文的格与标准格进行比较评估。结果显示,两个格之间的同态使所得概念层次结构相对于基本层次结构保持了89%的质量,且缩减后的概念格继承了标准概念格的结构关系。我们还将自适应ECA*与四种对应的基线算法进行比较,在具有不同密度(填充率)的随机数据集上测量执行时间。结果表明,在不同填充率下,自适应ECA*构建概念格的速度快于其他提及的竞争技术。 摘要:It is beneficial to automate the process of deriving concept hierarchies from corpora since a manual construction of concept hierarchies is typically a time-consuming and resource-intensive process. As such, the overall process of learning concept hierarchies from corpora encompasses a set of steps: parsing the text into sentences, splitting the sentences and then tokenising it. After the lemmatisation step, the pairs are extracted using FCA. However, there might be some uninteresting and erroneous pairs in the formal context. Generating formal context may lead to a time-consuming process, so formal context size reduction is required to remove uninterested and erroneous pairs, taking less time to extract the concept lattice and concept hierarchies accordingly. In this premise, this study aims to propose two frameworks: (1) A framework to review the current process of deriving concept hierarchies from corpus utilising FCA; (2) A framework to decrease the formal contexts ambiguity of the first framework using an adaptive version of ECA*. Experiments are conducted by applying 385 sample corpora from Wikipedia on the two frameworks to examine the reducing size of formal context, which leads to yield concept lattice and concept hierarchy. The resulting lattice of formal context is evaluated to the standard one using concept lattice-invariants. Accordingly, the homomorphic between the two lattices preserves the quality of resulting concept hierarchies by 89% in contrast to the basic ones, and the reduced concept lattice inherits the structural relation of the standard one. The adaptive ECA* is examined against its four counterpart baseline algorithms to measure the execution time on random datasets with different densities (fill ratios). The results show that adaptive ECA* performs concept lattice faster than other mentioned competitive techniques in different fill ratios.

【68】 LS3: Latent Space Safe Sets for Long-Horizon Visuomotor Control of Iterative Tasks 标题:LS3:迭代任务长视距视觉运动控制的潜在空间安全集

作者:Albert Wilcox,Ashwin Balakrishna,Brijen Thananjeyan,Joseph E. Gonzalez,Ken Goldberg 备注:Preprint, Under Review. First two authors contributed equally 链接:https://arxiv.org/abs/2107.04775 摘要:强化学习(RL)算法在探索高维环境以学习复杂的、长时间范围的任务方面取得了令人印象深刻的成功,但在探索不受约束的情况下,它往往表现出不安全的行为,需要大量的环境交互。动态不确定环境下安全学习的一个很有前途的策略是要求agent能够鲁棒地返回到任务成功(从而保证安全)的状态。虽然这种方法在低维环境中取得了成功,但在具有高维状态空间(如图像)的环境中实施这种约束是一个挑战。我们提出了潜在空间安全集(LS3),通过使用次优演示和学习到的动力学模型,将该策略扩展到迭代的、长视距的图像观测任务,把探索限制在已学习的安全集的邻域内(在该邻域内任务较可能完成)。我们在4个领域评估了LS3,包括一个具有挑战性的模拟顺序推动任务和一个物理电缆布线任务。我们发现,LS3能够利用先前任务的成功经验来限制探索,在满足约束的同时比现有算法学习得更高效。代码和补充材料见 https://tinyurl.com/latent-ss。 摘要:Reinforcement learning (RL) algorithms have shown impressive success in exploring high-dimensional environments to learn complex, long-horizon tasks, but can often exhibit unsafe behaviors and require extensive environment interaction when exploration is unconstrained. A promising strategy for safe learning in dynamically uncertain environments is requiring that the agent can robustly return to states where task success (and therefore safety) can be guaranteed. While this approach has been successful in low-dimensions, enforcing this constraint in environments with high-dimensional state spaces, such as images, is challenging. We present Latent Space Safe Sets (LS3), which extends this strategy to iterative, long-horizon tasks with image observations by using suboptimal demonstrations and a learned dynamics model to restrict exploration to the neighborhood of a learned Safe Set where task completion is likely. We evaluate LS3 on 4 domains, including a challenging sequential pushing task in simulation and a physical cable routing task. We find that LS3 can use prior task successes to restrict exploration and learn more efficiently than prior algorithms while satisfying constraints. See https://tinyurl.com/latent-ss for code and supplementary material.

【69】 Similar Cases Recommendation using Legal Knowledge Graphs 标题:基于法律知识图的相似案例推荐

作者:Jaspreet Singh Dhani,Ruchika Bhatt,Balaji Ganesan,Parikshet Sirohi,Vasudha Bhatnagar 备注:4 pages. 5 figures. KG Workshop at KDD 2021 链接:https://arxiv.org/abs/2107.04771 摘要:从法院案例、判决、法律和其他法律文件构建的法律知识图可以实现问答、文档相似性和搜索等多种应用。在NLP任务中使用知识图进行远程监督已经得到了充分研究,而将知识图用于下游图任务(如节点相似性)在选择节点类型及其特征方面存在挑战。在这个演示中,我们描述了从我们的法律知识图导出的案例图中预测相似节点的解决方案。 摘要:A legal knowledge graph constructed from court cases, judgments, laws and other legal documents can enable a number of applications like question answering, document similarity, and search. While the use of knowledge graphs for distant supervision in NLP tasks is well researched, using knowledge graphs for downstream graph tasks like node similarity presents challenges in selecting node types and their features. In this demo, we describe our solution for predicting similar nodes in a case graph derived from our legal knowledge graph.

【70】 DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering 标题:DualVGR:一种用于视频问答的双视觉图推理单元

作者:Jianyu Wang,Bing-Kun Bao,Changsheng Xu 机构:School of Artificial Intelligence 备注:None 链接:https://arxiv.org/abs/2107.04768 摘要:视频问答是一项具有挑战性的任务,它要求agent能够理解丰富的视频内容并进行时空推理。然而,现有的基于图的方法不能很好地进行多步推理,忽略了VideoQA的两个性质:(1)即使对于同一个视频,不同的问题也可能需要不同数量的视频片段或对象来用关系推理来推断答案;(2)在推理过程中,外观特征和运动特征具有复杂的相互依赖关系,二者相互关联、相互补充。基于这些观察,我们提出了一种双视觉图形推理单元(DualVGR),它以端到端的方式推理视频。我们的DualVGR的第一个贡献是设计了一个可解释的查询惩罚模块,该模块可以通过多个推理周期过滤出不相关的视觉特征。第二个贡献是提出了基于视频的多视点图形注意网络,它捕获了外观和运动特征之间的关系。我们的双VGR网络在基准MSVD-QA和SVQA数据集上实现了最先进的性能,并在基准MSRVTT-QA数据集上展示了竞争结果。我们的代码在https://github.com/MMIR/DualVGR-VideoQA. 摘要:Video question answering is a challenging task, which requires agents to be able to understand rich video contents and perform spatial-temporal reasoning. However, existing graph-based methods fail to perform multi-step reasoning well, neglecting two properties of VideoQA: (1) Even for the same video, different questions may require different amount of video clips or objects to infer the answer with relational reasoning; (2) During reasoning, appearance and motion features have complicated interdependence which are correlated and complementary to each other. Based on these observations, we propose a Dual-Visual Graph Reasoning Unit (DualVGR) which reasons over videos in an end-to-end fashion. The first contribution of our DualVGR is the design of an explainable Query Punishment Module, which can filter out irrelevant visual features through multiple cycles of reasoning. The second contribution is the proposed Video-based Multi-view Graph Attention Network, which captures the relations between appearance and motion features. Our DualVGR network achieves state-of-the-art performance on the benchmark MSVD-QA and SVQA datasets, and demonstrates competitive results on benchmark MSRVTT-QA datasets. Our code is available at https://github.com/MMIR/DualVGR-VideoQA.

【71】 Anomaly Detection in Residential Video Surveillance on Edge Devices in IoT Framework 标题:物联网框架下边缘设备住宅视频监控中的异常检测

作者:Mayur R. Parate,Kishor M. Bhurchandi,Ashwin G. Kothari 机构:Department of Electronics and Communication Engineering, Indian Institute of Information Technology 备注:7 Pages, 7 Figures and 3 Tables 链接:https://arxiv.org/abs/2107.04767 摘要:智能居民监控是智能社区最基本的服务之一。日益增长的安全需求要求监控系统能够检测监控场景中的异常情况。在住宅社会中,采用高容量的计算设备进行智能监控成本高昂,而且不可行。因此,本文提出了一种基于CPU边缘设备的智能监控异常检测方法。开发了一个用于捕获对象级推理和跟踪的模块化框架。为了处理部分遮挡、姿态变形和复杂场景,我们采用了特征编码和轨迹关联。对异常检测框架的元素进行了优化,使其能够在具有足够FPS的仅CPU边缘设备上运行。实验结果表明,该方法是可行的,在实际场景中取得了满意的效果。 摘要:Intelligent resident surveillance is one of the most essential smart community services. The increasing demand for security needs surveillance systems to be able to detect anomalies in surveillance scenes. Employing high-capacity computational devices for intelligent surveillance in residential societies is costly and not feasible. Therefore, we propose anomaly detection for intelligent surveillance using CPU-only edge devices. A modular framework to capture object-level inferences and tracking is developed. To cope with partial occlusions, posture deformations, and complex scenes we employed feature encoding and trajectory associations. Elements of the anomaly detection framework are optimized to run on CPU-only edge devices with sufficient FPS. The experimental results indicate the proposed method is feasible and achieves satisfactory results in real-life scenarios.

【72】 Hack The Box: Fooling Deep Learning Abstraction-Based Monitors 标题:破解盒子:愚弄深度学习基于抽象的监视器

作者:Sara Hajj Ibrahim,Mohamed Nassar 机构: American University of Beirut (AUB), University of New Haven 链接:https://arxiv.org/abs/2107.04764 摘要:深度学习是一种采用深层概念层次结构的机器学习方法。深度学习分类器将输入层最基本形式的概念与输出层最抽象形式的概念(也称为类或标签)联系起来。然而,一旦在一组有限的类别上完成训练,深度学习模型就无法表示给定输入不属于其中任何一个类别、根本无法建立联系。正确地否定无关类别的预测是一个具有挑战性的问题,文献中已有多种解决方法。新颖性检测使深度学习能够对新颖/未见过的类输出"不知道"。尽管如此,新颖性检测的安全性方面尚未得到关注。本文以基于抽象的新颖性检测为案例,表明其对对抗样本并不鲁棒。此外,我们还证明了构造对抗样本的可行性:这些样本既能欺骗深度学习分类器,又能同时绕过新颖性检测监控。换句话说,这些监控器是可以被攻破的。我们证明了新颖性检测本身最终成为一个攻击面。 摘要:Deep learning is a type of machine learning that adapts a deep hierarchy of concepts. Deep learning classifiers link the most basic version of concepts at the input layer to the most abstract version of concepts at the output layer, also known as a class or label. However, once trained over a finite set of classes, a deep learning model does not have the power to say that a given input does not belong to any of the classes and simply cannot be linked. Correctly invalidating the prediction of unrelated classes is a challenging problem that has been tackled in many ways in the literature. Novelty detection gives deep learning the ability to output "do not know" for novel/unseen classes. Still, no attention has been given to the security aspects of novelty detection. In this paper, we consider the case study of abstraction-based novelty detection and show that it is not robust against adversarial samples. Moreover, we show the feasibility of crafting adversarial samples that fool the deep learning classifier and bypass the novelty detection monitoring at the same time. In other words, these monitoring boxes are hackable. We demonstrate that novelty detection itself ends up as an attack surface.
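
文中所攻击的基于抽象的新颖性监控,常见做法是为每个类别在某一隐藏层特征上记录逐维最小/最大值构成的"盒"抽象,推理时落在所有盒之外的输入被判为新颖。下面是一个示意性的numpy草图(盒抽象的具体形式与容差均为常见做法的假设):

```python
import numpy as np

def fit_boxes(features, labels):
    """为每个类别记录隐藏层特征的逐维 [min, max] 盒抽象。"""
    return {c: (features[labels == c].min(0), features[labels == c].max(0))
            for c in np.unique(labels)}

def is_novel(f, boxes, eps=0.0):
    """若特征向量不落入任何类别的盒中,则报告"新颖/不知道"。"""
    return not any(np.all(f >= lo - eps) and np.all(f <= hi + eps)
                   for lo, hi in boxes.values())

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 8)); labels = rng.integers(0, 3, 100)
boxes = fit_boxes(feats, labels)
# 对抗样本可能被构造得恰好落入某个盒内,从而绕过监控(即文中的攻击面)
print(is_novel(rng.normal(size=8) * 10, boxes))  # 远离训练分布 -> True
```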

【73】 Beyond Low-pass Filtering: Graph Convolutional Networks with Automatic Filtering 标题:超越低通滤波:具有自动滤波功能的图卷积网络

作者:Zonghan Wu,Shirui Pan,Guodong Long,Jing Jiang,Chengqi Zhang 机构: Pan is with the Department of Data Science and AI 备注:11 pages 链接:https://arxiv.org/abs/2107.04755 摘要:图卷积网络对于图结构数据的深度学习是必不可少的。现有的大多数图卷积网络都有两大缺点。首先,它们本质上是低通滤波器,因此忽略了图形信号中可能有用的中高频带。其次,现有图卷积滤波器的带宽是固定的。图卷积滤波器的参数只变换图的输入,而不改变图卷积滤波器函数的曲率。在现实中,除非我们有专业领域的知识,否则我们不确定是否应该在某一点上保持或切断频率。本文提出了自动图卷积网络(AutoGCN)来捕获图信号的全频谱,并自动更新图卷积滤波器的带宽。虽然它是基于图谱理论,我们的AutoGCN也局限于空间,并具有空间形式。实验结果表明,与仅作为低通滤波器的基线方法相比,AutoGCN算法有显著的改进。 摘要:Graph convolutional networks are becoming indispensable for deep learning from graph-structured data. Most of the existing graph convolutional networks share two big shortcomings. First, they are essentially low-pass filters, thus the potentially useful middle and high frequency band of graph signals are ignored. Second, the bandwidth of existing graph convolutional filters is fixed. Parameters of a graph convolutional filter only transform the graph inputs without changing the curvature of a graph convolutional filter function. In reality, we are uncertain about whether we should retain or cut off the frequency at a certain point unless we have expert domain knowledge. In this paper, we propose Automatic Graph Convolutional Networks (AutoGCN) to capture the full spectrum of graph signals and automatically update the bandwidth of graph convolutional filters. While it is based on graph spectral theory, our AutoGCN is also localized in space and has a spatial form. Experimental results show that AutoGCN achieves significant improvement over baseline methods which only work as low-pass filters.
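
为说明"超越固定低通滤波"的思路,下面给出一个带可学习系数的多项式谱滤波器的PyTorch草图:y = Σ_k θ_k L^k x,其中 θ 可学习,因而滤波器的频率响应形状可以自适应。这只是谱滤波的一种常见参数化,并非AutoGCN的具体实现:

```python
import torch

def normalized_laplacian(A):
    """由邻接矩阵计算对称归一化拉普拉斯 L = I - D^{-1/2} A D^{-1/2}。"""
    d = A.sum(1)
    D_inv_sqrt = torch.diag(d.clamp(min=1e-8).pow(-0.5))
    return torch.eye(A.size(0)) - D_inv_sqrt @ A @ D_inv_sqrt

class PolyGraphFilter(torch.nn.Module):
    """y = sum_k theta_k * L^k x:theta 可学习,滤波器不必局限于低通。"""
    def __init__(self, K=3):
        super().__init__()
        self.theta = torch.nn.Parameter(torch.randn(K + 1) * 0.1)

    def forward(self, L, x):
        out, Lx = self.theta[0] * x, x
        for k in range(1, len(self.theta)):
            Lx = L @ Lx                    # 逐阶累乘 L^k x
            out = out + self.theta[k] * Lx
        return out

A = torch.tensor([[0., 1, 0], [1, 0, 1], [0, 1, 0]])  # 3节点链式图
x = torch.randn(3, 4)                                  # 节点信号
print(PolyGraphFilter()(normalized_laplacian(A), x).shape)
```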

【74】 Lifelong Twin Generative Adversarial Networks 标题:终身孪生生成对抗网络

作者:Fei Ye,Adrian G. Bors 机构:Department of Computer Science, University of York, York YO,GH, UK 备注:Accepted at International Conference on Image Processing (ICIP 2021) 链接:https://arxiv.org/abs/2107.04708 摘要:本文提出了一种新的持续学习生成模型,称为终身孪生生成对抗网络(LT-GANs)。LT-GANs从多个数据库中学习一系列任务,其体系结构由三个组件组成:两个相同的生成器(即教师和助手)以及一个鉴别器。为了使LT-GANs在学习新概念时不发生遗忘,我们引入了一种新的终身训练方法,即终身对抗知识蒸馏(LAKD),它鼓励教师和助手在学习新数据库的同时交替互相教学。这种训练方法有利于将知识从知识较丰富的一方迁移到对先前任务所知较少的另一方。 摘要:In this paper, we propose a new continuously learning generative model, called the Lifelong Twin Generative Adversarial Networks (LT-GANs). LT-GANs learns a sequence of tasks from several databases and its architecture consists of three components: two identical generators, namely the Teacher and Assistant, and one Discriminator. In order to allow for the LT-GANs to learn new concepts without forgetting, we introduce a new lifelong training approach, namely Lifelong Adversarial Knowledge Distillation (LAKD), which encourages the Teacher and Assistant to alternately teach each other, while learning a new database. This training approach favours transferring knowledge from a more knowledgeable player to another player which knows less information about a previously given task.

【75】 InfoVAEGAN : learning joint interpretable representations by information maximization and maximum likelihood 标题:InfoVAEGAN:基于信息最大化和最大似然的联合可解释表示学习

作者:Fei Ye,Adrian G. Bors 机构:Department of Computer Science, University of York, York YO,GH, UK 备注:Accepted at International Conference on Image Processing (ICIP 2021) 链接:https://arxiv.org/abs/2107.04705 摘要:学习解耦且可解释的表示是在流形上实现全面数据表示的重要一步。本文提出了一种新的表示学习算法,该算法结合了变分自编码器(VAE)的推理能力和生成对抗网络(GAN)的泛化能力。该模型称为InfoVAEGAN,由三个网络组成:编码器、生成器和鉴别器。InfoVAEGAN的目标是通过对从生成器分布中采样的变量使用两个不同的无数据对数似然函数,以无监督的方式联合学习离散和连续的可解释表示。我们提出了一个两阶段的算法,将推理网络与生成器的训练分开优化。此外,我们通过最大化现有潜变量与通过生成和推理过程产生的潜变量之间的互信息来加强可解释表示的学习。 摘要:Learning disentangled and interpretable representations is an important step towards accomplishing comprehensive data representations on the manifold. In this paper, we propose a novel representation learning algorithm which combines the inference abilities of Variational Autoencoders (VAE) with the generalization capability of Generative Adversarial Networks (GAN). The proposed model, called InfoVAEGAN, consists of three networks: Encoder, Generator and Discriminator. InfoVAEGAN aims to jointly learn discrete and continuous interpretable representations in an unsupervised manner by using two different data-free log-likelihood functions onto the variables sampled from the generator's distribution. We propose a two-stage algorithm for optimizing the inference network separately from the generator training. Moreover, we enforce the learning of interpretable representations through the maximization of the mutual information between the existing latent variables and those created through generative and inference processes.

【76】 Lifelong Mixture of Variational Autoencoders 标题:变分自动编码器的终身混合

作者:Fei Ye,Adrian G. Bors 机构:Department of Computer Science, University of York, York YO,GH, UK 备注:Accepted by IEEE Transactions on Neural Networks and Learning Systems 链接:https://arxiv.org/abs/2107.04694 摘要:在本文中,我们提出了一个端到端的终身学习混合专家模型。每个专家由一个变分自动编码器(VAE)实现。混合系统中的专家通过在给定训练样本的对数似然上最大化各分量证据下界的混合(MELBO)进行联合训练。混合系数控制各专家对目标表示的贡献,它们采样自狄利克雷分布,其参数在终身学习期间通过非参数估计确定。当新任务与先前学习的任务相似时,该模型可以快速学习新任务。所提出的终身混合VAE(L-MVAE)在学习全新任务时,会用新的组件扩展其体系结构。经过训练后,我们的模型可以自动确定输入新数据样本时要使用的相关专家。由于推理过程中只使用一个专家,这种机制既提高了存储效率,又减少了计算量。L-MVAE推理模型能够在与不同任务相关的数据域之间的联合潜在空间中进行插值,并且被证明对于解耦表示学习是有效的。 摘要:In this paper, we propose an end-to-end lifelong learning mixture of experts. Each expert is implemented by a Variational Autoencoder (VAE). The experts in the mixture system are jointly trained by maximizing a mixture of individual component evidence lower bounds (MELBO) on the log-likelihood of the given training samples. The mixing coefficients in the mixture, control the contributions of each expert in the goal representation. These are sampled from a Dirichlet distribution whose parameters are determined through non-parametric estimation during lifelong learning. The model can learn new tasks fast when these are similar to those previously learnt. The proposed Lifelong mixture of VAE (L-MVAE) expands its architecture with new components when learning a completely new task. After the training, our model can automatically determine the relevant expert to be used when fed with new data samples. This mechanism benefits both the memory efficiency and the required computational cost as only one expert is used during the inference. The L-MVAE inference model is able to perform interpolation in the joint latent space across the data domains associated with different tasks and is shown to be efficient for disentangled learning representation.
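
按摘要的描述,混合系统通过最大化各分量证据下界的混合(MELBO)进行联合训练,其形式大致如下。记号为示意性重构(π 为由狄利克雷分布采样的混合系数),并非论文原文公式:

```latex
\mathcal{L}_{\mathrm{MELBO}}(x)
  = \sum_{i=1}^{K} \pi_i \Big(
      \mathbb{E}_{q_i(z \mid x)}\big[\log p_i(x \mid z)\big]
      - \mathrm{KL}\big(q_i(z \mid x) \,\|\, p(z)\big)
    \Big),
\qquad \pi \sim \mathrm{Dirichlet}(\alpha).
```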

【77】 Lifelong Teacher-Student Network Learning 标题:终身师生网络学习

作者:Fei Ye,Adrian G. Bors 机构:Department of Computer Science, University of York, York YO,GH, UK 备注:18 pages, 18 figures. in IEEE Transactions on Pattern Analysis and Machine Intelligence 链接:https://arxiv.org/abs/2107.04689 摘要:人类独特的认知能力在于从一系列经验中获得新知识和新技能的能力。同时,人工智能系统擅长只学习最后一个给定的任务,而不能记住过去学习的数据库。我们提出了一个新的终身学习方法,采用师生网络框架。当学生模块使用一个新的给定数据库进行训练时,教师模块会提醒学生过去所学的信息。教师由一个生成对抗网络(GAN)实现,被训练来保存和回放与先前学习数据库的概率表示相对应的过去知识。同时,学生模块由一个变分自动编码器(VAE)实现,VAE从教师模块的输出和新的可用数据库中推断其潜在变量的表示。此外,学生模块被训练来捕获跨不同领域的连续和离散的基础数据表示。将所提出的终身学习框架应用于有监督、半监督和无监督训练中。代码见 https://github.com/dtuzi123/Lifelong-Teacher-Student-Network-Learning。 摘要:A unique cognitive capability of humans consists in their ability to acquire new knowledge and skills from a sequence of experiences. Meanwhile, artificial intelligence systems are good at learning only the last given task without being able to remember the databases learnt in the past. We propose a novel lifelong learning methodology by employing a Teacher-Student network framework. While the Student module is trained with a new given database, the Teacher module would remind the Student about the information learnt in the past. The Teacher, implemented by a Generative Adversarial Network (GAN), is trained to preserve and replay past knowledge corresponding to the probabilistic representations of previously learn databases. Meanwhile, the Student module is implemented by a Variational Autoencoder (VAE) which infers its latent variable representation from both the output of the Teacher module as well as from the newly available database. Moreover, the Student module is trained to capture both continuous and discrete underlying data representations across different domains. The proposed lifelong learning framework is applied in supervised, semi-supervised and unsupervised training. The code is available at https://github.com/dtuzi123/Lifelong-Teacher-Student-Network-Learning

【78】 A Framework and Benchmarking Study for Counterfactual Generating Methods on Tabular Data 标题:表格数据反事实生成方法的框架与标杆研究

作者:Raphael Mazzine,David Martens 链接:https://arxiv.org/abs/2107.04680 摘要:反事实解释被认为是解释机器学习预测的有效方法。这种兴趣反映在一个相对年轻的文献中,已经有几十种算法旨在产生这样的解释。这些算法的重点是发现如何修改特征来改变输出分类。然而,这个相当普遍的目标可以通过不同的方式实现,这就需要一种方法来测试和比较这些算法。这项工作的贡献是多方面的:首先,对22个表格数据集上的10种算法方法进行了大型基准研究,使用了9个相关的评估指标。第二,介绍了一种新颖的、首创的、用于测试反事实生成算法的框架。第三,提出了一套客观指标来评估和比较反事实结果。最后,从基准测试结果中得出的见解指明了哪些方法在哪种类型的数据集上获得最佳性能。这种基准研究和框架可以帮助实践者确定哪种技术和构建块最适合他们的环境,并且可以帮助研究人员设计和评估当前和未来的反事实生成算法。我们的研究结果表明,总体而言,没有单一最佳的反事实解释生成算法,因为性能高度依赖于数据集、模型、评分以及事实点自身特性等相关属性。 摘要:Counterfactual explanations are viewed as an effective way to explain machine learning predictions. This interest is reflected by a relatively young literature with already dozens of algorithms aiming to generate such explanations. These algorithms are focused on finding how features can be modified to change the output classification. However, this rather general objective can be achieved in different ways, which brings about the need for a methodology to test and benchmark these algorithms. The contributions of this work are manifold: First, a large benchmarking study of 10 algorithmic approaches on 22 tabular datasets is performed, using 9 relevant evaluation metrics. Second, the introduction of a novel, first of its kind, framework to test counterfactual generation algorithms. Third, a set of objective metrics to evaluate and compare counterfactual results. And finally, insight from the benchmarking results that indicate which approaches obtain the best performance on what type of dataset. This benchmarking study and framework can help practitioners in determining which technique and building blocks most suit their context, and can help researchers in the design and evaluation of current and future counterfactual generation algorithms. Our findings show that, overall, there's no single best algorithm to generate counterfactual explanations as the performance highly depends on properties related to the dataset, model, score and factual point specificities.
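
比较反事实生成算法时,常用的客观指标包括有效性(是否翻转预测)、近邻性(与事实点的距离)和稀疏性(改动的特征数)。下面给出这类指标的一个示意性numpy草图;指标选取与具体定义为常见做法的假设,并非该基准框架的原始代码:

```python
import numpy as np

def evaluate_counterfactual(model_predict, x, x_cf):
    """返回 (有效性, L1近邻性, 稀疏性) 三个常用的反事实评价指标。"""
    validity = model_predict(x_cf) != model_predict(x)   # 预测类别是否被翻转
    proximity = float(np.abs(x_cf - x).sum())            # 与事实点的 L1 距离
    sparsity = int((~np.isclose(x_cf, x)).sum())         # 被修改的特征个数
    return validity, proximity, sparsity

predict = lambda v: int(v.sum() > 1.0)                   # 玩具分类器,仅作演示
x = np.array([0.2, 0.3, 0.1])
x_cf = np.array([0.2, 0.9, 0.1])                         # 某算法生成的反事实
print(evaluate_counterfactual(predict, x, x_cf))         # 约为 (True, 0.6, 1)
```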

【79】 Training Over-parameterized Models with Non-decomposable Objectives 标题:训练具有不可分解目标的过参数化模型

作者:Harikrishna Narasimhan,Aditya Krishna Menon 机构:Google Research, Mountain View, Google Research, New York 链接:https://arxiv.org/abs/2107.04641 摘要:许多现代机器学习应用程序都有着复杂而微妙的设计目标,比如最小化最坏情况下的错误,满足给定的精确率或召回率目标,或者强制执行组公平性约束。用于优化此类不可分解目标的流行技术将问题简化为一系列成本敏感的学习任务,然后通过使用样本特定的成本重新加权训练损失来求解每个任务。我们指出,当用于训练过参数化模型时,这种通过对损失重新加权来纳入标签成本的标准方法可能产生不理想的结果。作为补救,我们提出了新的成本敏感损失,将经典的logit调整思想扩展到处理更一般的成本矩阵。我们的损失是经过校准的,并且可以通过教师模型蒸馏得到的标签进一步改进。通过在基准图像数据集上的实验,我们展示了该方法在以常见的鲁棒与约束优化目标训练ResNet模型方面的有效性。 摘要:Many modern machine learning applications come with complex and nuanced design goals such as minimizing the worst-case error, satisfying a given precision or recall target, or enforcing group-fairness constraints. Popular techniques for optimizing such non-decomposable objectives reduce the problem into a sequence of cost-sensitive learning tasks, each of which is then solved by re-weighting the training loss with example-specific costs. We point out that the standard approach of re-weighting the loss to incorporate label costs can produce unsatisfactory results when used to train over-parameterized models. As a remedy, we propose new cost-sensitive losses that extend the classical idea of logit adjustment to handle more general cost matrices. Our losses are calibrated, and can be further improved with distilled labels from a teacher model. Through experiments on benchmark image datasets, we showcase the effectiveness of our approach in training ResNet models with common robust and constrained optimization objectives.
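
文中扩展的出发点是经典的logit调整:将与类别先验相关的偏移加到logits上再计算交叉熵。下面用PyTorch给出这一经典形式的草图(均匀先验仅作演示;推广到一般成本矩阵的具体损失以论文为准):

```python
import torch
import torch.nn.functional as F

def logit_adjusted_loss(logits, targets, class_priors, tau=1.0):
    """经典 logit 调整:对 logits 加上 tau*log(类别先验) 后再做交叉熵。"""
    adjusted = logits + tau * torch.log(class_priors).unsqueeze(0)
    return F.cross_entropy(adjusted, targets)

logits = torch.randn(16, 10, requires_grad=True)
targets = torch.randint(0, 10, (16,))
priors = torch.full((10,), 0.1)          # 假设:均匀类别先验,仅作演示
loss = logit_adjusted_loss(logits, targets, priors)
loss.backward()
print(float(loss))
```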

【80】 Playing Angry Birds with a Domain-Independent PDDL+ Planner 标题:使用与域无关的PDDL+规划器玩愤怒的小鸟

作者:Wiktor Piotrowski,Roni Stern,Matthew Klenk,Alexandre Perez,Shiwali Mohan,Johan de Kleer,Jacob Le 备注:2 pages, submitted to ICAPS 2021 Demonstration Track 链接:https://arxiv.org/abs/2107.04635 摘要:这个演示文件介绍了第一个使用领域无关规划器玩流行的《愤怒的小鸟》游戏的系统。我们的系统使用PDDL+(一种面向离散/连续混合域的规划语言)对《愤怒的小鸟》关卡进行建模,并使用独立于域的PDDL+规划器生成和执行规划。在这个演示文件中,我们给出了该领域的系统PDDL+模型,指出了降低问题复杂性的关键设计决策,并将我们的系统性能与该领域的特定模型方法进行了比较。结果表明,我们系统的性能与其他针对《愤怒的小鸟》的领域特定系统相当,表明领域无关规划适用于这一基准AI挑战。 摘要:This demo paper presents the first system for playing the popular Angry Birds game using a domain-independent planner. Our system models Angry Birds levels using PDDL+, a planning language for mixed discrete/continuous domains. It uses a domain-independent PDDL+ planner to generate plans and executes them. In this demo paper, we present the system's PDDL+ model for this domain, identify key design decisions that reduce the problem complexity, and compare the performance of our system to model-specific methods for this domain. The results show that our system's performance is on par with other domain-specific systems for Angry Birds, suggesting the applicability of domain-independent planning to this benchmark AI challenge.

【81】 Learning Probabilistic Reward Machines from Non-Markovian Stochastic Reward Processes 标题:从非马尔可夫随机奖励过程中学习概率奖励机

作者:Alvaro Velasquez,Andre Beckus,Taylor Dohmen,Ashutosh Trivedi,Noah Topper,George Atia 机构:Air Force Research Laboratory, University of Colorado, Boulder, University of Central Florida 链接:https://arxiv.org/abs/2107.04633 摘要:在典型环境下,强化学习的成功部分取决于对奖励信号的马尔可夫假设,agent正是基于该奖励信号学习最优策略。近年来,奖励机的使用放宽了这一假设,使非马尔可夫奖励的结构化表示成为可能。特别地,这种表示可以用来扩充底层决策过程的状态空间,从而促进非马尔可夫强化学习。然而,这些奖励机无法捕捉随机奖励信号的语义。在本文中,我们通过引入概率奖励机(PRMs)作为非马尔可夫随机奖励的表示,在这方面取得了进展。我们提出了一种算法,既能从底层决策过程中学习PRM,也能学习给定决策策略的PRM表示。 摘要:The success of reinforcement learning in typical settings is, in part, predicated on underlying Markovian assumptions on the reward signal by which an agent learns optimal policies. In recent years, the use of reward machines has relaxed this assumption by enabling a structured representation of non-Markovian rewards. In particular, such representations can be used to augment the state space of the underlying decision process, thereby facilitating non-Markovian reinforcement learning. However, these reward machines cannot capture the semantics of stochastic reward signals. In this paper, we make progress on this front by introducing probabilistic reward machines (PRMs) as a representation of non-Markovian stochastic rewards. We present an algorithm to learn PRMs from the underlying decision process as well as to learn the PRM representation of a given decision-making policy.
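
概率奖励机可以粗略理解为:在奖励机的基础上,把每条转移上的确定性奖励替换为一个奖励分布。下面用Python给出一个最小的数据结构与采样草图,状态名、命题标签与分布形式均为示意性假设:

```python
import random

class ProbabilisticRewardMachine:
    """极简PRM:状态 + 由(状态, 命题标签)索引的转移,奖励为离散分布。"""
    def __init__(self):
        # (u, label) -> (u', [(reward, prob), ...]);示例转移仅作演示
        self.delta = {
            ("u0", "coffee"): ("u1", [(0.0, 1.0)]),
            ("u1", "office"): ("u0", [(1.0, 0.8), (0.0, 0.2)]),  # 随机奖励
        }

    def step(self, u, label):
        # 未定义的转移默认自环且奖励为0(简化假设)
        u_next, dist = self.delta.get((u, label), (u, [(0.0, 1.0)]))
        rewards, probs = zip(*dist)
        return u_next, random.choices(rewards, weights=probs)[0]

prm, u = ProbabilisticRewardMachine(), "u0"
for lab in ["coffee", "office"]:
    u, r = prm.step(u, lab)
    print(u, r)
```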

【82】 Algorithmic Causal Effect Identification with causaleffect 标题:基于causaleffect的算法化因果效应识别

作者:Martí Pedemonte,Jordi Vitrià,Álvaro Parafita 机构:Universitat de Barcelona, Department of Mathematics and Computer Science 备注:40 pages, 27 figures 链接:https://arxiv.org/abs/2107.04632 摘要:当我们理解了因果关系,我们作为一个物种的进化便向前迈出了一大步。对于某些事件,这些关联可能显而易见,但在复杂场景中并非如此。为了严格证明某些事件是由其他事件引起的,引入了$do$-算子及其相关规则,将因果理论和因果推理形式化。本报告的主要目标是回顾一些从观测数据计算条件与非条件因果查询的算法,并用Python加以实现。为此,我们首先介绍了概率论和图论的一些基本背景知识,然后介绍了用于构建算法的因果理论的重要结果。接着,我们深入研究了Shpitser和Pearl在2006年提出的识别算法,并同时解释我们的Python实现。主识别算法可以看作是$do$-演算规则的重复应用,它最终或者从实验概率返回因果查询的表达式,或者不能识别因果效应,在这种情况下,因果效应是不可识别的。我们还将介绍我们新开发的Python库并给出一些使用示例。 摘要:Our evolution as a species made a huge step forward when we understood the relationships between causes and effects. These associations may be trivial for some events, but they are not in complex scenarios. To rigorously prove that some occurrences are caused by others, causal theory and causal inference were formalized, introducing the $do$-operator and its associated rules. The main goal of this report is to review and implement in Python some algorithms to compute conditional and non-conditional causal queries from observational data. To this end, we first present some basic background knowledge on probability and graph theory, before introducing important results on causal theory, used in the construction of the algorithms. We then thoroughly study the identification algorithms presented by Shpitser and Pearl in 2006, explaining our implementation in Python alongside. The main identification algorithm can be seen as a repeated application of the rules of $do$-calculus, and it eventually either returns an expression for the causal query from experimental probabilities or fails to identify the causal effect, in which case the effect is non-identifiable. We introduce our newly developed Python library and give some usage examples.
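
识别算法的输出是把因果查询改写为可从数据估计的表达式。以最简单的后门调整 P(y|do(x)) = Σ_z P(y|x,z)P(z) 为例,下面给出一个直接从经验分布计算该表达式的pandas草图;示例数据与函数名均为假设,并非causaleffect库的API:

```python
import pandas as pd

def backdoor_adjustment(df, x_val, y_val):
    """P(Y=y | do(X=x)) = sum_z P(Y=y | X=x, Z=z) * P(Z=z),Z 为后门调整集。"""
    total = 0.0
    for z_val, p_z in df["Z"].value_counts(normalize=True).items():
        sub = df[(df["X"] == x_val) & (df["Z"] == z_val)]
        if len(sub):
            total += (sub["Y"] == y_val).mean() * p_z
    return total

# 玩具观测数据:Z 混杂 X 与 Y(数值仅作演示)
df = pd.DataFrame({"Z": [0, 0, 1, 1, 0, 1, 0, 1],
                   "X": [0, 1, 0, 1, 1, 1, 0, 0],
                   "Y": [0, 1, 0, 1, 1, 0, 0, 1]})
print(backdoor_adjustment(df, x_val=1, y_val=1))
```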

【83】 Ill-posed Surface Emissivity Retrieval from Multi-Geometry Hyperspectral Images using a Hybrid Deep Neural Network 标题:基于混合深度神经网络的多几何高光谱图像病态表面发射率反演

作者:Fangcao Xu,Jian Sun,Guido Cervone,Mark Salvador 机构:Department of Geography, the Pennsylvania State University, University Park, PA, USA, Institute for Computational and Data Sciences, the Pennsylvania State University, University Park, PA, USA, Zi INC, Washington D.C., USA 链接:https://arxiv.org/abs/2107.04631 摘要:大气校正是遥感中的一项基本任务,因为观测要么是对大气本身进行的,要么是透过大气进行的。大气校正误差会显著改变观测的光谱特征,导致分类或目标探测的失效。在处理高光谱数据时,这一点更为关键,因为高光谱数据需要对光谱特性进行精确测量。最先进的基于物理的大气校正方法需要广泛的传感器特性、采集几何结构和采集场景的环境特性的先验知识。这些方法的计算成本很高,由于缺乏足够的环境和采集信息,容易出现不准确的情况,并且通常不可能用于实时应用。本文提出了一种几何相关的混合神经网络,用于多扫描高光谱数据的自动大气校正。所提出的网络可以在不需要任何额外气象数据的情况下表征大气。还提出了一种网格搜索方法来解决温度-发射率分离问题。结果表明,该网络能够准确地表征大气,并对29种不同材料以低于0.02的平均绝对误差(MAE)估计目标发射率谱。该解决方案可实现精确的大气校正,从而提升实时应用中的目标检测。 摘要:Atmospheric correction is a fundamental task in remote sensing because observations are taken either of the atmosphere or looking through the atmosphere. Atmospheric correction errors can significantly alter the spectral signature of the observations, and lead to invalid classifications or target detection. This is even more crucial when working with hyperspectral data, where a precise measurement of spectral properties is required. State-of-the-art physics-based atmospheric correction approaches require extensive prior knowledge about sensor characteristics, collection geometry, and environmental characteristics of the scene being collected. These approaches are computationally expensive, prone to inaccuracy due to lack of sufficient environmental and collection information, and often impossible for real-time applications. In this paper, a geometry-dependent hybrid neural network is proposed for automatic atmospheric correction using multi-scan hyperspectral data collected from different geometries. The proposed network can characterize the atmosphere without any additional meteorological data. A grid-search method is also proposed to solve the temperature emissivity separation problem. Results show that the proposed network has the capacity to accurately characterize the atmosphere and estimate target emissivity spectra with a Mean Absolute Error (MAE) under 0.02 for 29 different materials. This solution can lead to accurate atmospheric correction to improve target detection for real time applications.

【84】 Diverse Video Generation using a Gaussian Process Trigger 标题:使用高斯过程触发器的多样化视频生成

作者:Gaurav Shrivastava,Abhinav Shrivastava 机构:University of Maryland, College Park 备注:International Conference on Learning Representations, 2021 链接:https://arxiv.org/abs/2107.04619 摘要:在给定一些上下文(或过去)帧的情况下生成未来帧是一项具有挑战性的任务。它需要对视频的时间一致性进行建模,并根据潜在未来状态的多样性对多模态进行建模。当前用于视频生成的变分方法倾向于对多模态的未来结果进行边缘化。相反,我们建议对未来结果中的多模态进行显式建模,并利用它来采样多样化的未来。我们的方法"多样化视频生成器"使用高斯过程(GP)学习在给定过去条件下未来状态的先验,并对给定样本的可能未来维持一个概率分布。此外,我们利用该分布随时间的变化,通过估计正在进行的序列何时结束,来控制对多样化未来状态的采样。也就是说,我们使用GP在输出函数空间上的方差来触发动作序列的切换。在重建质量和生成序列的多样性方面,我们在多样化未来帧生成上取得了最先进的结果。 摘要:Generating future frames given a few context (or past) frames is a challenging task. It requires modeling the temporal coherence of videos and multi-modality in terms of diversity in the potential future states. Current variational approaches for video generation tend to marginalize over multi-modal future outcomes. Instead, we propose to explicitly model the multi-modality in the future outcomes and leverage it to sample diverse futures. Our approach, Diverse Video Generator, uses a Gaussian Process (GP) to learn priors on future states given the past and maintains a probability distribution over possible futures given a particular sample. In addition, we leverage the changes in this distribution over time to control the sampling of diverse future states by estimating the end of ongoing sequences. That is, we use the variance of GP over the output function space to trigger a change in an action sequence. We achieve state-of-the-art results on diverse future frame generation in terms of reconstruction quality and diversity of the generated sequences.
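
摘要中"用GP在输出函数空间上的方差触发动作序列切换"的思路,可以用一个一维回归的示意草图表达:在已观测的"过去"上拟合GP,当对"未来"的预测标准差超过阈值时触发一次新的未来状态采样。数据与阈值均为假设:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# 在已观测的"过去"上拟合GP,然后用预测方差决定何时触发序列切换
t_past = np.linspace(0, 1, 20).reshape(-1, 1)
y_past = np.sin(6 * t_past).ravel()
gp = GaussianProcessRegressor().fit(t_past, y_past)

t_future = np.linspace(1, 2, 10).reshape(-1, 1)
mean, std = gp.predict(t_future, return_std=True)

THRESHOLD = 0.5  # 假设的触发阈值
for t, s in zip(t_future.ravel(), std):
    if s > THRESHOLD:  # 方差增大 -> 当前序列可能结束,采样一个新的未来状态
        print(f"t={t:.2f}: trigger new future sample (std={s:.2f})")
```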

【85】 Tropical cyclone intensity estimations over the Indian ocean using Machine Learning 标题:基于机器学习的印度洋热带气旋强度估计

作者:Koushik Biswas,Sandeep Kumar,Ashish Kumar Pandey 机构:Department of Computer Science, IIIT Delhi, New Delhi, India,., &, Shaheed Bhagat Singh College, University of Delhi, Department of Mathematics, IIIT Delhi 备注:10 pages 链接:https://arxiv.org/abs/2107.05573 摘要:热带气旋是地球上最强大、破坏力最大的自然现象之一。热带风暴和暴雨会引起洪水,导致人员伤亡和经济损失。伴随气旋而来的毁灭性大风不仅严重影响沿海地区,甚至波及遥远的内陆地区。我们的研究侧重于北印度洋热带气旋的强度估计,特别是气旋等级和最大持续地面风速(MSWS)。我们使用多种机器学习算法来估计气旋等级和MSWS,并以起源海盆、日期、时间、纬度、经度、估计的中心气压和气压降作为模型的属性。对于分类结果变量(气旋等级)我们使用多类分类模型,对于连续变量MSWS则使用回归模型。利用北印度洋28年的最佳路径数据,我们以88%的准确率估计气旋等级,以2.3的均方根误差(RMSE)估计MSWS。对于更高级别的类别(5-7),平均准确率提高到98.84%。我们用北印度洋最近的两个热带气旋"瓦尤"和"法尼"测试了我们的模型。对于等级,我们分别获得了93.22%和95.23%的准确率;对于MSWS,RMSE分别为2.2和3.4,$R^2$分别为0.99和0.99。 摘要:Tropical cyclones are one of the most powerful and destructive natural phenomena on earth. Tropical storms and heavy rains can cause floods, which lead to human lives and economic loss. Devastating winds accompanying cyclones heavily affect not only the coastal regions, even distant areas. Our study focuses on the intensity estimation, particularly cyclone grade and maximum sustained surface wind speed (MSWS) of a tropical cyclone over the North Indian Ocean. We use various machine learning algorithms to estimate cyclone grade and MSWS. We have used the basin of origin, date, time, latitude, longitude, estimated central pressure, and pressure drop as attributes of our models. We use multi-class classification models for the categorical outcome variable, cyclone grade, and regression models for MSWS as it is a continuous variable. Using the best track data of 28 years over the North Indian Ocean, we estimate grade with an accuracy of 88% and MSWS with a root mean square error (RMSE) of 2.3. For higher grade categories (5-7), accuracy improves to an average of 98.84%. We tested our model with two recent tropical cyclones in the North Indian Ocean, Vayu and Fani. For grade, we obtained an accuracy of 93.22% and 95.23% respectively, while for MSWS, we obtained RMSE of 2.2 and 3.4 and $R^2$ of 0.99 and 0.99, respectively.

【86】 Synthesizing Multi-Tracer PET Images for Alzheimer's Disease Patients using a 3D Unified Anatomy-aware Cyclic Adversarial Network 标题:基于三维统一解剖感知循环对抗网络的阿尔茨海默病患者多示踪PET图像合成

作者:Bo Zhou,Rui Wang,Ming-Kai Chen,Adam P. Mecca,Ryan S. O'Dell,Christopher H. Van Dyck,Richard E. Carson,James S. Duncan,Chi Liu 机构:Department of Biomedical Engineering, Yale University, USA, Department of Radiology and Biomedical Imaging, Yale University, USA, Department of Engineering Physics, Tsinghua University, China, Department of Psychiatry, Yale University, USA 备注:Accepted at MICCAI 2021 链接:https://arxiv.org/abs/2107.05491 摘要:正电子发射断层扫描(PET)是研究阿尔茨海默病(AD)的重要工具。PET扫描可以作为诊断工具,并提供认知障碍患者的分子特征。然而,需要多种示踪剂来测量葡萄糖代谢(18F-FDG)、突触囊泡蛋白(11C-UCB-J)和β-淀粉样蛋白(11C-PiB)。给病人使用多种示踪剂会导致高辐射剂量和高成本。此外,使用新的或不易获得的示踪剂(其生产方法复杂、同位素半衰期短)进行PET扫描的机会可能非常有限。因此,有必要建立一个有效的多示踪剂PET合成模型,从单示踪剂PET合成多示踪剂PET。以往的医学图像合成工作主要集中在一对一的固定域平移,不能同时从多个示踪剂域中学习特征。给定3个或更多的示踪剂,依赖以前的方法也会对要训练的模型数量造成沉重负担。为了解决这些问题,我们提出了一个三维统一的解剖感知循环对抗网络(UCAN),用一个统一的生成模型来转换多示踪PET体积,其中融入了包含解剖信息的MR。对多示踪剂PET数据集的评估表明,我们的UCAN可以生成高质量的多示踪剂PET体积,所有PET示踪剂的NMSE小于15%。 摘要:Positron Emission Tomography (PET) is an important tool for studying Alzheimer's disease (AD). PET scans can be used as diagnostics tools, and to provide molecular characterization of patients with cognitive disorders. However, multiple tracers are needed to measure glucose metabolism (18F-FDG), synaptic vesicle protein (11C-UCB-J), and $\beta$-amyloid (11C-PiB). Administering multiple tracers to patient will lead to high radiation dose and cost. In addition, access to PET scans using new or less-available tracers with sophisticated production methods and short half-life isotopes may be very limited. Thus, it is desirable to develop an efficient multi-tracer PET synthesis model that can generate multi-tracer PET from single-tracer PET. Previous works on medical image synthesis focus on one-to-one fixed domain translations, and cannot simultaneously learn the feature from multi-tracer domains. Given 3 or more tracers, relying on previous methods will also create a heavy burden on the number of models to be trained. To tackle these issues, we propose a 3D unified anatomy-aware cyclic adversarial network (UCAN) for translating multi-tracer PET volumes with one unified generative model, where MR with anatomical information is incorporated. Evaluations on a multi-tracer PET dataset demonstrate the feasibility that our UCAN can generate high-quality multi-tracer PET volumes, with NMSE less than 15% for all PET tracers.

【87】 Bayesian brains and the Rényi divergence 标题:贝叶斯大脑与Rényi散度

作者:Noor Sajid,Francesco Faccio,Lancelot Da Costa,Thomas Parr,Jürgen Schmidhuber,Karl Friston 机构:WCHN, University College London, UK, Swiss AI Lab IDSIA, Switzerland., Imperial College London, UK & 备注:23 pages, 5 figures 链接:https://arxiv.org/abs/2107.05438 摘要:在贝叶斯脑假设下,行为变异可归因于生成模型参数的不同先验。这为个体在面对类似选择时为何表现出不一致的行为偏好提供了一个形式化解释。例如,贪婪的偏好是对某些结果的自信(或精确)信念的结果。在这里,我们使用Rényi散度及其相关变分界,为行为变异性提供了一种替代解释。Rényi界类似于变分自由能(或证据下界),并且可以在相同的假设下导出。重要的是,在先验固定的情况下,这些界提供了一种通过$\alpha$参数刻画行为差异的正式方法。这依赖于$\alpha$的变化:它(在连续尺度上)改变该界,从而导致不同的后验估计和随之而来的行为变化。因此,看起来就好像个体拥有不同的先验、得出了不同的结论。更具体地说,$\alpha \to 0^{+}$的优化会导致质量覆盖型的变分估计,并增加选择行为的可变性;而$\alpha \to \infty$的优化会导致质量寻求型的变分后验和贪婪偏好。我们通过多臂赌博机任务的模拟来举例说明这一表述。我们注意到,当真实后验与假设的(更简单的)近似密度不属于同一分布族时(这在许多现实场景中可能成立),这些$\alpha$参数化可能尤其相关,即塑造偏好。由此产生的对普通变分推理的偏离,在假设大脑执行变分贝叶斯推理的前提下,为生物(或人工)智能体的行为偏好差异提供了一个潜在有用的解释。 摘要:Under the Bayesian brain hypothesis, behavioural variations can be attributed to different priors over generative model parameters. This provides a formal explanation for why individuals exhibit inconsistent behavioural preferences when confronted with similar choices. For example, greedy preferences are a consequence of confident (or precise) beliefs over certain outcomes. Here, we offer an alternative account of behavioural variability using Rényi divergences and their associated variational bounds. Rényi bounds are analogous to the variational free energy (or evidence lower bound) and can be derived under the same assumptions. Importantly, these bounds provide a formal way to establish behavioural differences through an $\alpha$ parameter, given fixed priors. This rests on changes in $\alpha$ that alter the bound (on a continuous scale), inducing different posterior estimates and consequent variations in behaviour. Thus, it looks as if individuals have different priors, and have reached different conclusions. More specifically, $\alpha \to 0^{+}$ optimisation leads to mass-covering variational estimates and increased variability in choice behaviour. Furthermore, $\alpha \to \infty$ optimisation leads to mass-seeking variational posteriors and greedy preferences. We exemplify this formulation through simulations of the multi-armed bandit task. We note that these $\alpha$ parameterisations may be especially relevant, i.e., shape preferences, when the true posterior is not in the same family of distributions as the assumed (simpler) approximate density, which may be the case in many real-world scenarios. The ensuing departure from vanilla variational inference provides a potentially useful explanation for differences in behavioural preferences of biological (or artificial) agents under the assumption that the brain performs variational Bayesian inference.
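
文中使用的Rényi变分界可以示意性地写成如下形式(即以Rényi散度替代KL散度得到的证据界;记号按常见文献重构,并非论文原文)。$\alpha \to 0^{+}$ 对应质量覆盖型估计,$\alpha \to \infty$ 对应质量寻求型估计,$\alpha \to 1$ 时退化为标准ELBO:

```latex
\mathcal{L}_{\alpha}(q; x)
  = \frac{1}{1-\alpha}\,
    \log \mathbb{E}_{q(z)}\!\left[
      \left(\frac{p(x,z)}{q(z)}\right)^{\!1-\alpha}\right],
\qquad
\lim_{\alpha \to 1} \mathcal{L}_{\alpha}(q; x) = \mathrm{ELBO}(q; x).
```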

【88】 Metalearning Linear Bandits by Prior Update 标题:基于先验更新的线性赌博机元学习

作者:Amit Peleg,Naama Pearl,Ron Meir 机构:Technion, Israel, University of Haifa, Israel 链接:https://arxiv.org/abs/2107.05320 摘要:序贯决策的完全贝叶斯方法假设问题参数是由已知的先验生成的,而在实际应用中,这种信息往往是缺乏的,需要通过学习来估计。这一问题在具有部分信息的决策设置中更加严重,使用错误的先验可能导致较差的探索和较差的性能。在这项工作中,我们证明,在随机线性赌博机和高斯先验的情况下,只要先验估计足够接近真实先验,使用错误先验的算法的性能接近使用真实先验的算法的性能。接下来,我们讨论通过元学习(metalearning)学习先验的任务,即学习者在多个任务实例中更新先验的估计,以提高未来任务的性能。在每个任务中,估计的先验根据传入的观测进行更新,同时选择行动以最大化预期回报。我们将该方案应用于线性赌博机设置,给出了算法和遗憾界,并通过与知道正确先验的算法相比较来证明其有效性。我们的结果适用于一类广泛的算法,例如汤普森采样和信息定向采样。 摘要:Fully Bayesian approaches to sequential decision-making assume that problem parameters are generated from a known prior, while in practice, such information is often lacking, and needs to be estimated through learning. This problem is exacerbated in decision-making setups with partial information, where using a misspecified prior may lead to poor exploration and inferior performance. In this work we prove, in the context of stochastic linear bandits and Gaussian priors, that as long as the prior estimate is sufficiently close to the true prior, the performance of an algorithm that uses the misspecified prior is close to that of the algorithm that uses the true prior. Next, we address the task of learning the prior through metalearning, where a learner updates its estimate of the prior across multiple task instances in order to improve performance on future tasks. The estimated prior is then updated within each task based on incoming observations, while actions are selected in order to maximize expected reward. In this work we apply this scheme within a linear bandit setting, and provide algorithms and regret bounds, demonstrating its effectiveness, as compared to an algorithm that knows the correct prior. Our results hold for a broad class of algorithms, including, for example, Thompson Sampling and Information Directed Sampling.
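
"跨任务更新先验估计 + 任务内汤普森采样"的方案可以用一个一维线性赌博机的numpy草图示意。先验以历任务后验均值的经验均值/方差更新,奖励噪声方差假设已知;具体更新方式为示意性假设,并非论文算法:

```python
import numpy as np

rng = np.random.default_rng(0)
mu0, tau0 = 0.0, 1.0              # 先验估计 N(mu0, 1/tau0),跨任务元学习更新
noise_prec = 1.0 / 0.01           # 奖励噪声精度(假设已知)
theta_hats = []

for task in range(30):
    theta = rng.normal(0.5, 0.3)              # 真实任务参数,由未知的真先验生成
    mu, tau = mu0, tau0                       # 任务内后验从当前先验估计出发
    for t in range(50):
        a = 1.0 if rng.normal(mu, tau ** -0.5) > 0 else -1.0  # 汤普森采样选动作
        r = a * theta + rng.normal(0, 0.1)
        y = r / a                              # 等价于观测到带噪声的 theta
        mu = (tau * mu + noise_prec * y) / (tau + noise_prec)  # 共轭高斯更新
        tau = tau + noise_prec
    theta_hats.append(mu)
    # 元学习:用历任务的后验均值更新先验估计(实际算法需更稳健的更新)
    mu0 = float(np.mean(theta_hats))
    tau0 = 1.0 / max(float(np.var(theta_hats)), 1e-2)

print(f"meta-learned prior: mean={mu0:.2f}, std={tau0 ** -0.5:.2f}")
```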

【89】 Improving Efficiency and Accuracy of Causal Discovery Using a Hierarchical Wrapper 标题:使用分层包装器提高因果发现的效率和准确性

作者:Shami Nisimov,Yaniv Gurwicz,Raanan Y. Rohekar,Gal Novik 机构:Intel Labs 备注:The 37th Conference on Uncertainty in Artificial Intelligence (UAI 2021), Workshop on Tractable Probabilistic Modeling 链接:https://arxiv.org/abs/2107.05001 摘要:从观测数据中发现因果关系是许多科学分支的重要工具。在某些假设下,它允许科学家解释现象、预测和做出决定。在大样本极限下,已有可靠且完备(sound and complete)的因果发现算法被提出,用于搜索表示因果关系的有向无环图(DAG)或其等价类。然而,在现实世界中,只有有限的训练数据可用,这限制了这些算法所使用的统计检验的功效,导致推断的因果模型中出现错误。这通常通过设计一个使用尽可能少的统计检验的策略来解决。在本文中,我们以递归包装器的形式为现有的基于约束的因果发现算法引入了这样一种策略,它保持了算法的可靠性和完备性。它从一开始就使用归一化最小割准则递归地对观测变量进行聚类,并在回溯过程中使用基线因果发现算法来学习局部子图,然后将它们组合起来并确保完备性。通过消融实验、合成数据以及常见的真实世界基准,我们证明了我们的方法需要显著更少的统计检验、学习到更精确的图,并且运行时间比基线算法更短。 摘要:Causal discovery from observational data is an important tool in many branches of science. Under certain assumptions it allows scientists to explain phenomena, predict, and make decisions. In the large sample limit, sound and complete causal discovery algorithms have been previously introduced, where a directed acyclic graph (DAG), or its equivalence class, representing causal relations is searched. However, in real-world cases, only finite training data is available, which limits the power of statistical tests used by these algorithms, leading to errors in the inferred causal model. This is commonly addressed by devising a strategy for using as few as possible statistical tests. In this paper, we introduce such a strategy in the form of a recursive wrapper for existing constraint-based causal discovery algorithms, which preserves soundness and completeness. It recursively clusters the observed variables using the normalized min-cut criterion from the outset, and uses a baseline causal discovery algorithm during backtracking for learning local sub-graphs. It then combines them and ensures completeness. By an ablation study, using synthetic data, and by common real-world benchmarks, we demonstrate that our approach requires significantly fewer statistical tests, learns more accurate graphs, and requires shorter run-times than the baseline algorithm.

【90】 Dense-Sparse Deep CNN Training for Image Denoising 标题:用于图像去噪的稠密-稀疏深度CNN训练

作者:Basit O. Alawode,Mudassir Masood,Tarig Ballal,Tareq Al-Naffouri 机构:Electrical Engineering Department, Computer, Electrical, and Mathematical Sciences and Engineering, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia 链接:https://arxiv.org/abs/2107.04857 摘要:近年来,卷积神经网络(CNN)等深度学习方法在图像去噪领域得到了广泛关注。这是因为它们被证明有能力超越最先进的经典图像去噪算法,如BM3D。深度去噪CNN(DnCNN)使用多个前馈卷积层,并加入批量归一化和残差学习的正则化方法,显著提高了去噪性能。然而,这是以大量可训练参数为代价的。在本文中,我们通过减少参数的数量来解决这个问题,同时达到相当的性能水平。我们的动机来自于采用稠密-稀疏-稠密(DSD)训练方法训练网络所获得的性能提升。我们将这种训练方法扩展到一个简化的DnCNN(RDnCNN)网络,得到一个更快的去噪网络,其参数显著减少,性能与DnCNN相当。 摘要:Recently, deep learning (DL) methods such as convolutional neural networks (CNNs) have gained prominence in the area of image denoising. This is owing to their proven ability to surpass state-of-the-art classical image denoising algorithms such as BM3D. Deep denoising CNNs (DnCNNs) use many feedforward convolution layers with added regularization methods of batch normalization and residual learning to improve denoising performance significantly. However, this comes at the expense of a huge number of trainable parameters. In this paper, we address this issue by reducing the number of parameters while achieving a comparable level of performance. We derive motivation from the improved performance obtained by training networks using the dense-sparse-dense (DSD) training approach. We extend this training approach to a reduced DnCNN (RDnCNN) network resulting in a faster denoising network with significantly reduced parameters and comparable performance to the DnCNN.
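
DSD训练的三个步骤(稠密训练、按幅值剪枝后的稀疏重训练、恢复全部权重的稠密微调)可以用如下PyTorch草图示意;剪枝比例、步数等超参数均为假设:

```python
import torch

def magnitude_mask(weight, sparsity=0.5):
    """按权重幅值生成剪枝掩码:幅值最小的 sparsity 比例置零。"""
    k = int(weight.numel() * sparsity)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

layer = torch.nn.Linear(64, 64)                 # 用单层代替完整的去噪CNN,仅作演示
opt = torch.optim.SGD(layer.parameters(), lr=1e-2)
x, y = torch.randn(32, 64), torch.randn(32, 64)

def train_step(mask=None):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(layer(x), y)
    loss.backward()
    opt.step()
    if mask is not None:                        # 稀疏阶段:保持被剪枝的权重为零
        with torch.no_grad():
            layer.weight.mul_(mask)

for _ in range(10): train_step()                # Dense:常规稠密训练
mask = magnitude_mask(layer.weight.detach())
for _ in range(10): train_step(mask)            # Sparse:剪枝后稀疏重训练
for _ in range(10): train_step()                # Dense:恢复全部权重再微调
```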

【91】 Machine Learning for Financial Forecasting, Planning and Analysis: Recent Developments and Pitfalls 标题:用于财务预测、规划和分析的机器学习:最新进展和陷阱

作者:Helmut Wasserbacher,Martin Spindler 备注:31 pages, 3 figures, 4 tables 链接:https://arxiv.org/abs/2107.04851 摘要:本文介绍了用于财务预测、计划和分析(FP&A)的机器学习。机器学习似乎非常适合支持FP&A,从大量数据中高度自动化地提取信息。然而,由于大多数传统的机器学习技术侧重于预测(prediction),因此我们讨论了在使用它们进行规划和资源分配(因果推理)时必须特别注意的问题。虽然机器学习的朴素应用通常在这种情况下失败,但最近开发的双重机器学习(double machine learning)框架可以解决感兴趣的因果问题。我们回顾了FP&A中机器学习的最新文献,并在一个模拟研究中说明了如何将机器学习同时用于预测和规划。我们还研究了随着数据点数量的增加,预测和规划是如何改进的。 摘要:This article is an introduction to machine learning for financial forecasting, planning and analysis (FP&A). Machine learning appears well suited to support FP&A with the highly automated extraction of information from large amounts of data. However, because most traditional machine learning techniques focus on forecasting (prediction), we discuss the particular care that must be taken to avoid the pitfalls of using them for planning and resource allocation (causal inference). While the naive application of machine learning usually fails in this context, the recently developed double machine learning framework can address causal questions of interest. We review the current literature on machine learning in FP&A and illustrate in a simulation study how machine learning can be used for both forecasting and planning. We also investigate how forecasting and planning improve as the number of data points increases.
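
双重机器学习从预测走向因果推断(规划)的基本配方是"部分剔除":分别用机器学习模型从协变量预测结果变量与处理变量,再用两组残差做回归得到处理效应。下面是一个示意性的sklearn草图;数据为模拟,且为简洁省略了实际应用中必需的交叉拟合:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))                      # 协变量(如宏观与业务驱动因素)
d = X[:, 0] + rng.normal(size=2000)                 # "处理"变量(如营销投入)
y = 1.5 * d + X[:, 0] ** 2 + rng.normal(size=2000)  # 模拟的真实处理效应为 1.5

# 部分剔除:ML 模型分别拟合 E[y|X] 与 E[d|X]
res_y = y - RandomForestRegressor(n_estimators=50).fit(X, y).predict(X)
res_d = d - RandomForestRegressor(n_estimators=50).fit(X, d).predict(X)

# 残差对残差回归,得到处理效应估计(无交叉拟合时可能有偏,仅作示意)
effect = LinearRegression().fit(res_d.reshape(-1, 1), res_y).coef_[0]
print(f"estimated effect: {effect:.2f}")
```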
