Artificial Intelligence Academic Digest [7.19]

2021-07-27 11:03:49

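Visit www.arxivdaily.com for digests with abstracts, covering CS, physics, mathematics, economics, statistics, finance, biology, and electrical engineering, with search, bookmarking, and posting features.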

cs.AI (Artificial Intelligence): 29 papers in total

【1】 Graph Kernel Attention Transformers

Authors: Krzysztof Choromanski, Han Lin, Haoxian Chen, Jack Parker-Holder
Affiliations: Google Brain Robotics & Columbia University, University of Oxford
Comments: 18 pages, 9 figures
Link: https://arxiv.org/abs/2107.07999
Abstract: We introduce a new class of graph neural networks (GNNs) by combining several concepts that were so far studied independently: graph kernels, attention-based networks with structural priors and, more recently, efficient Transformer architectures that apply small-memory-footprint implicit attention methods via low-rank decomposition techniques. The goal of the paper is twofold. The Graph Kernel Attention Transformers (GKATs) we propose are much more expressive than SOTA GNNs, as they are capable of modeling longer-range dependencies within a single layer. Consequently, they can use shallower architecture designs. Furthermore, GKAT attention layers scale linearly rather than quadratically in the number of nodes of the input graphs, even when those graphs are dense, requiring less compute than their regular graph attention counterparts. They achieve this by applying new classes of graph kernels that admit random feature map decomposition via random walks on graphs. As a byproduct of the introduced techniques, we obtain a new class of learnable graph sketches, called graphots, that compactly encode topological graph properties as well as node features. We conducted an exhaustive empirical comparison of our method with nine different GNN classes on tasks ranging from motif detection through social network classification to bioinformatics challenges, showing consistent gains from GKATs.

【2】 Blockchain Technology: Bitcoins, Cryptocurrency and Applications

Authors: Bosubabu Sambana
Affiliations: Assistant Professor, Department of Computer Science and Engineering
Comments: 7 pages, 4 figures
Link: https://arxiv.org/abs/2107.07964
Abstract: Blockchain is a decentralized ledger used to securely exchange digital currency and to perform deals and transactions in an efficient manner. Each user of the network has access to the latest copy of the encrypted ledger so that they can validate a new transaction. The blockchain ledger is a collection of all Bitcoin transactions executed in the past. Basically, it is a distributed database that maintains a continuously growing, tamper-proof data structure of blocks, each holding a batch of individual transactions. The completed blocks are added in linear, chronological order. Each block contains a timestamp and an information link that points to the previous block. Bitcoin is a peer-to-peer permissionless network that allows every user to connect to the network and send new transactions to verify and create new blocks. Satoshi Nakamoto described the design of the Bitcoin digital currency in a research paper posted to a cryptography listserv in 2008. Nakamoto's proposal solved a long-pending problem of cryptography and laid the foundation stone for digital currency. This paper explains the concept of Bitcoin, its characteristics, the need for Blockchain, and how Bitcoin works. It attempts to highlight the role of Blockchain in shaping the future of banking, financial services, and the adoption of the Internet of Things and future technologies.
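
The chained structure described in this abstract (timestamped blocks, each linked to its predecessor) can be illustrated with a minimal sketch. This is not Bitcoin's actual block format: the field names, the JSON serialization, and the helper functions below are all assumptions made for illustration, and there is no mining, signing, or networking.

```python
import hashlib
import json
import time


def block_hash(block):
    """Return the SHA-256 hex digest of a block's canonical JSON form."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_block(transactions, prev_hash, timestamp=None):
    """Build a block holding a batch of transactions and a link to its predecessor."""
    return {
        "timestamp": time.time() if timestamp is None else timestamp,
        "transactions": transactions,
        "prev_hash": prev_hash,
    }


def append_block(chain, transactions, timestamp=None):
    """Append a new block in chronological order, linked to the current tip."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append(make_block(transactions, prev_hash, timestamp))
    return chain


def is_valid(chain):
    """Check that every block points at the hash of the block before it."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


# Hypothetical transactions, used only to exercise the structure.
chain = []
append_block(chain, ["alice->bob:1"], timestamp=1.0)
append_block(chain, ["bob->carol:2"], timestamp=2.0)
```

Editing any earlier block changes its hash, so the stored `prev_hash` of the next block no longer matches and `is_valid` returns `False`. That is the sense in which the linked structure is tamper-evident.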

【3】 MODRL/D-EL: Multiobjective Deep Reinforcement Learning with Evolutionary Learning for Multiobjective Optimization

Authors: Yongxin Zhang, Jiahai Wang, Zizhen Zhang, Yalan Zhou
Affiliations: School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, P. R. China; College of Information, Guangdong University of Finance and Economics
Link: https://arxiv.org/abs/2107.07961
Abstract: Learning-based heuristics for solving combinatorial optimization problems have recently attracted much academic attention. While most existing works only consider single-objective problems with simple constraints, many real-world problems are multiobjective and contain a rich set of constraints. This paper proposes a multiobjective deep reinforcement learning algorithm with evolutionary learning for a typical complex problem, the multiobjective vehicle routing problem with time windows (MO-VRPTW). In the proposed algorithm, a decomposition strategy is applied to generate subproblems for a set of attention models. Comprehensive context information is introduced to further enhance the attention models. Evolutionary learning is also employed to fine-tune the parameters of the models. Experimental results on MO-VRPTW instances demonstrate the superiority of the proposed algorithm over other learning-based and iterative-based approaches.
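
The abstract does not say which decomposition strategy is used. As a minimal sketch, assuming a weighted-sum decomposition of a two-objective minimization problem, each weight vector defines one scalar subproblem. In the paper each subproblem is handled by an attention model; here a lookup over a hypothetical list of candidate objective values stands in for that solver.

```python
def weight_vectors(n_subproblems):
    """Evenly spaced weights (w, 1 - w) for a two-objective problem."""
    if n_subproblems < 2:
        raise ValueError("need at least two subproblems")
    step = 1.0 / (n_subproblems - 1)
    return [(i * step, 1.0 - i * step) for i in range(n_subproblems)]


def scalarize(objectives, weights):
    """Weighted-sum scalarization of one solution's objective values."""
    return sum(w * f for w, f in zip(weights, objectives))


def best_per_subproblem(candidates, n_subproblems):
    """For each weight vector, pick the candidate with the lowest scalarized cost."""
    return [
        min(candidates, key=lambda objs: scalarize(objs, w))
        for w in weight_vectors(n_subproblems)
    ]


# Hypothetical (objective 1, objective 2) values of three candidate solutions.
candidates = [(3.0, 9.0), (4.0, 5.0), (8.0, 4.0)]
picks = best_per_subproblem(candidates, 3)
```

With three weight vectors, the two extreme subproblems each favor one objective and the middle one picks the balanced candidate, so together the subproblems trace out an approximation of the Pareto front.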

【4】 Temporal-aware Language Representation Learning From Crowdsourced Labels

Authors: Yang Hao, Xiao Zhai, Wenbiao Ding, Zitao Liu
Affiliations: TAL Education Group, Beijing, China
Comments: The 59th Annual Meeting of the Association for Computational Linguistics Workshop on Representation Learning for NLP (ACL RepL4NLP 2021)
Link: https://arxiv.org/abs/2107.07958
Abstract: Learning effective language representations from crowdsourced labels is crucial for many real-world machine learning tasks. A challenging aspect of this problem is that the quality of crowdsourced labels suffers from high intra- and inter-observer variability. Since high-capacity deep neural networks can easily memorize all disagreements among crowdsourced labels, directly applying existing supervised language representation learning algorithms may yield suboptimal solutions. In this paper, we propose TACMA, a temporal-aware language representation learning heuristic for crowdsourced labels with multiple annotators. The proposed approach (1) explicitly models the intra-observer variability with an attention mechanism; (2) computes and aggregates per-sample confidence scores from multiple workers to address the inter-observer disagreements. The proposed heuristic is extremely easy to implement in around 5 lines of code. It is evaluated on four synthetic and four real-world data sets. The results show that our approach outperforms a wide range of state-of-the-art baselines in terms of prediction accuracy and AUC. To encourage reproducible results, we make our code publicly available at https://github.com/CrowdsourcingMining/TACMA.

【5】 Automatic Task Requirements Writing Evaluation via Machine Reading Comprehension

Authors: Shiting Xu, Guowei Xu, Peilei Jia, Wenbiao Ding, Zhongqin Wu, Zitao Liu
Affiliations: TAL Education Group, Beijing, China
Comments: AIED'21: The 22nd International Conference on Artificial Intelligence in Education, 2021
Link: https://arxiv.org/abs/2107.07957
Abstract: Task requirements (TRs) writing is an important question type in the Key English Test and the Preliminary English Test. A TR writing question may include multiple requirements, and a high-quality essay must respond to each requirement thoroughly and accurately. However, limited teacher resources prevent students from getting detailed grading instantly. The majority of existing automatic essay scoring systems focus on giving a holistic score but rarely provide reasons to support it. In this paper, we propose an end-to-end framework based on machine reading comprehension (MRC) to address this problem to some extent. The framework not only detects whether an essay responds to a requirement question, but also clearly marks where the essay answers the question. Our framework consists of three modules: a question normalization module, an ELECTRA-based MRC module, and a response locating module. We extensively explore state-of-the-art MRC methods. Our approach achieves an accuracy of 0.93 and an F1 score of 0.85 on a real-world educational dataset. To encourage reproducible results, we make our code publicly available at https://github.com/aied2021TRMRC/AIED_2021_TRMRC_code.

【6】 Unsupervised Discovery of Object Radiance Fields

Authors: Hong-Xing Yu, Leonidas J. Guibas, Jiajun Wu
Affiliations: Stanford University
Comments: Project page: this https URL
Link: https://arxiv.org/abs/2107.07905
Abstract: We study the problem of inferring an object-centric scene representation from a single image, aiming to derive a representation that explains the image formation process, captures the scene's 3D nature, and is learned without supervision. Most existing methods for scene decomposition lack one or more of these characteristics, due to the fundamental challenge of integrating the complex 3D-to-2D image formation process into powerful inference schemes like deep networks. In this paper, we propose unsupervised discovery of Object Radiance Fields (uORF), integrating recent progress in neural 3D scene representations and rendering with deep inference networks for unsupervised 3D scene decomposition. Trained on multi-view RGB images without annotations, uORF learns to decompose complex scenes with diverse, textured backgrounds from a single image. We show that uORF performs well on unsupervised 3D scene segmentation, novel view synthesis, and scene editing on three datasets.

【7】 Ranking labs-of-origin for genetically engineered DNA using Metric Learning

Authors: I. Muniz, F. H. F. Camargo, A. Marques
Comments: 4 pages, 2 figures, 1 algorithm
Link: https://arxiv.org/abs/2107.07878
Abstract: With the constant advancement of genetic engineering, a common concern is being able to identify the lab-of-origin of genetically engineered DNA sequences. For that reason, AltLabs hosted the Genetic Engineering Attribution Challenge, gathering many teams to propose new tools to solve this problem. Here we present our method for ranking the most likely labs-of-origin and generating embeddings for DNA sequences and labs. These embeddings can also be used for various other tasks, such as clustering both DNA sequences and labs, or serving as features for machine learning models applied to other problems. This work demonstrates that our method outperforms the classic training method for this task while generating other helpful information.

【8】 Deep Learning Beam Optimization in Millimeter-Wave Communication Systems

Authors: Rafail Ismayilov, Renato L. G. Cavalcante, Sławomir Stańczak
Affiliations: Fraunhofer Heinrich Hertz Institute, Berlin, Germany; Technical University of Berlin, Berlin, Germany
Link: https://arxiv.org/abs/2107.07846
Abstract: We propose a method that combines fixed point algorithms with a neural network to jointly optimize discrete and continuous variables in millimeter-wave communication systems, so that the users' rates are allocated fairly in a well-defined sense. In more detail, the discrete variables include the user-access point assignments and the beam configurations, while the continuous variables refer to the power allocation. The beam configuration is predicted from user-related information using a neural network. Given the predicted beam configuration, a fixed point algorithm allocates power and assigns users to access points so that the users achieve the maximum fraction of their interference-free rates. The proposed method predicts the beam configuration in a "one-shot" manner, which significantly reduces the complexity of the beam search procedure. Moreover, even if the predicted beam configurations are not optimal, the fixed point algorithm still provides the optimal power allocation and user-access point assignments for the given beam configuration.

【9】 Versatile modular neural locomotion control with fast learning

Authors: Mathias Thor, Poramate Manoonpong
Affiliations: Embodied AI and Neurorobotics Laboratory, SDU Biorobotics, The Mærsk Mc-Kinney Møller Institute, The University of Southern Denmark, Campusvej, Odense, Denmark; Bio-Inspired Robotics and Neural Engineering Laboratory
Comments: For supplementary video files see: this https URL
Link: https://arxiv.org/abs/2107.07844
Abstract: Legged robots have significant potential to operate in highly unstructured environments. The design of locomotion control is, however, still challenging. Currently, controllers must be either manually designed for specific robots and tasks, or automatically designed via machine learning methods that require long training times and yield large opaque controllers. Drawing inspiration from animal locomotion, we propose a simple yet versatile modular neural control structure with fast learning. The key advantages of our approach are that behavior-specific control modules can be added incrementally to obtain increasingly complex emergent locomotion behaviors, and that neural connections interfacing with existing modules can be quickly and automatically learned. In a series of experiments, we show how eight modules can be quickly learned and added to a base control module to obtain emergent adaptive behaviors that allow a hexapod robot to navigate in complex environments. We also show that modules can be added and removed during operation without affecting the functionality of the remaining controller. Finally, the control approach was successfully demonstrated on a physical hexapod robot. Taken together, our study represents a significant step towards fast automatic design of versatile neural locomotion control for complex robotic systems.

【10】 A Survey of Knowledge Graph Embedding and Their Applications

Authors: Shivani Choudhary, Tarun Luthra, Ashima Mittal, Rajat Singh
Affiliations: Indian Institute of Technology Delhi, Hauz Khas, Delhi, India
Comments: 11 pages, 9 figures
Link: https://arxiv.org/abs/2107.07842
Abstract: Knowledge graph embedding provides a versatile technique for representing knowledge. These techniques can be used in a variety of applications, such as knowledge graph completion to predict missing information, recommender systems, question answering, and query expansion. The information embedded in a knowledge graph, though structured, is challenging to consume in a real-world application. Knowledge graph embedding enables real-world applications to consume this information to improve performance. Knowledge graph embedding is an active research area. Most embedding methods focus on structure-based information. Recent research has extended the boundary to include text-based and image-based information in entity embedding. Efforts have been made to enhance the representation with context information. This paper traces the growth of the field of KG embedding from simple translation-based models to enrichment-based models, and covers the utility of knowledge graphs in real-world applications.

【11】 Graph Representation Learning for Road Type Classification

Authors: Zahra Gharaee, Shreyas Kowshik, Oliver Stromann, Michael Felsberg
Affiliations: Computer Vision Laboratory (CVL), Department of Electrical Engineering, University of Linköping, Linköping, Sweden; Department of Mathematics, Indian Institute of Technology Kharagpur, India; Autonomous Transport Solutions Research, Scania CV AB, Sweden
Link: https://arxiv.org/abs/2107.07791
Abstract: We present a novel learning-based approach to graph representations of road networks employing state-of-the-art graph convolutional neural networks. Our approach is applied to realistic road networks of 17 cities from OpenStreetMap. While edge features are crucial for generating descriptive graph representations of road networks, graph convolutional networks usually rely on node features only. We show that the highly representative edge features can still be integrated into such networks by applying a line graph transformation. We also propose a method for neighborhood sampling based on a topological neighborhood composed of both local and global neighbors. We compare the performance of learning representations using different types of neighborhood aggregation functions in transductive and inductive tasks and in supervised and unsupervised learning. Furthermore, we propose a novel aggregation approach, the Graph Attention Isomorphism Network (GAIN). Our results show that GAIN outperforms state-of-the-art methods on the road type classification problem.
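
The line graph transformation mentioned in this abstract turns every edge of the original graph into a node, and connects two such nodes whenever the original edges share an endpoint. Edge features (for example, road attributes) then become node features that a standard graph convolutional network can consume. A minimal sketch for undirected graphs follows; the toy road network and function name are hypothetical.

```python
from itertools import combinations


def line_graph(edges):
    """Return the line graph of an undirected graph as an adjacency dict.

    Each original edge becomes a node; two such nodes are connected when the
    original edges share an endpoint.
    """
    nodes = sorted({tuple(sorted(edge)) for edge in edges})
    adjacency = {node: set() for node in nodes}
    for a, b in combinations(nodes, 2):
        if set(a) & set(b):  # the two road segments meet at a junction
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Hypothetical toy road network: junctions A-D, road segments as edges.
roads = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")]
lg = line_graph(roads)
```

In this toy network the segment A-B touches B-C and B-D at junction B but not C-D, so `lg[("A", "B")]` contains exactly those two neighbors.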

【12】 Know Deeper: Knowledge-Conversation Cyclic Utilization Mechanism for Open-domain Dialogue Generation

Authors: Yajing Sun, Yue Hu, Luxi Xing, Yuqiang Xie, Xiangpeng Wei
Affiliations: Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Link: https://arxiv.org/abs/2107.07771
Abstract: End-to-end intelligent neural dialogue systems suffer from generating inconsistent and repetitive responses. Existing dialogue models focus on unilaterally incorporating personal knowledge into the dialogue, ignoring the fact that feeding personality-related conversation information back into personal knowledge, as a bilateral information flow, boosts the quality of the subsequent conversation. Besides, it is indispensable to control personal knowledge utilization at the conversation level. In this paper, we propose a conversation-adaptive, multi-view, persona-aware response generation model that aims to enhance conversation consistency and alleviate repetition in two ways. First, we consider conversation consistency from multiple views. From the view of the persona profile, we design a novel interaction module that not only iteratively incorporates personalized knowledge into each turn of the conversation but also captures personality-related information from the conversation to enhance the semantic representation of personalized knowledge. From the view of speaking style, we introduce a speaking style vector and feed it into the decoder to keep the speaking style consistent. Second, to avoid conversation repetition, we devise a coverage mechanism to keep track of the activation of personal knowledge utilization. Experiments with both automatic and human evaluation verify the superiority of our model over previous models.

【13】 Architecture of Automated Crypto-Finance Agent

Authors: Ali Raheman, Anton Kolonin, Ben Goertzel, Gergely Hegykozi, Ikram Ansari
Affiliations: Autonio Foundation Ltd., High Street, Bristol, UK; SingularityNET Foundation, Barbara Strozzilaan (Eurocenter II), Amsterdam, Netherlands
Comments: 9 pages, 7 figures
Link: https://arxiv.org/abs/2107.07769
Abstract: We present the cognitive architecture of an autonomous agent for active portfolio management in decentralized finance, involving activities such as asset selection, portfolio balancing, liquidity provision, and trading. A partial implementation of the architecture is provided, together with preliminary results and conclusions.

【14】 MS-MDA: Multisource Marginal Distribution Adaptation for Cross-subject and Cross-session EEG Emotion Recognition

Authors: Hao Chen, Ming Jin, Zhunan Li, Cunhang Fan, Jinpeng Li, Huiguang He
Affiliations: School of Computer Science and Technology
Comments: 10 pages, 8 figures
Link: https://arxiv.org/abs/2107.07740
Abstract: As an essential element in the diagnosis and rehabilitation of psychiatric disorders, electroencephalogram (EEG) based emotion recognition has achieved significant progress due to its high precision and reliability. However, one obstacle to practicality lies in the variability between subjects and sessions. Although several studies have adopted domain adaptation (DA) approaches to tackle this problem, most of them treat EEG data from different subjects and sessions together as a single source domain for transfer, which either fails to satisfy the domain adaptation assumption that the source has a certain marginal distribution, or increases the difficulty of adaptation. We therefore propose multi-source marginal distribution adaptation (MS-MDA) for EEG emotion recognition, which takes both domain-invariant and domain-specific features into consideration. First, we assume that different EEG data share the same low-level features; then we construct independent branches for the multiple EEG source domains to perform one-to-one domain adaptation and extract domain-specific features. Finally, inference is made by the multiple branches. We evaluate our method on SEED and SEED-IV for recognizing three and four emotions, respectively. Experimental results show that MS-MDA outperforms the comparison methods and state-of-the-art models in cross-session and cross-subject transfer scenarios in our settings. Code is available at https://github.com/VoiceBeer/MS-MDA.

【15】 Semi-supervised Learning for Marked Temporal Point Processes

Authors: Shivshankar Reddy, Anand Vir Singh Chauhan, Maneet Singh, Karamjit Singh
Affiliations: AI Garage, Mastercard, India
Link: https://arxiv.org/abs/2107.07729
Abstract: Temporal Point Processes (TPPs) are often used to represent sequences of events ordered by time of occurrence. Owing to their flexible nature, TPPs have been used to model different scenarios and have shown applicability in various real-world applications. While TPPs focus on modeling event occurrence, Marked Temporal Point Processes (MTPPs) also model the category or class of the event (termed the marker). Research on MTPPs has garnered substantial attention over the past few years, with an extensive focus on supervised algorithms. Despite this research focus, limited attention has been given to the challenging problem of developing solutions in semi-supervised settings, where algorithms have access to a mix of labeled and unlabeled data. This research proposes a novel algorithm for Semi-supervised Learning for Marked Temporal Point Processes (SSL-MTPP) applicable in such scenarios. The proposed SSL-MTPP algorithm uses a combination of labeled and unlabeled data to learn a robust marker prediction model, and an RNN-based Encoder-Decoder module to learn effective representations of the time sequence. The efficacy of the proposed algorithm has been demonstrated via multiple protocols on the Retweet dataset, where SSL-MTPP shows improved performance in comparison to the traditional supervised learning approach.

【16】 Neural Contextual Anomaly Detection for Time Series

Authors: Chris U. Carmona, François-Xavier Aubet, Valentin Flunkert, Jan Gasthaus
Affiliations: University of Oxford, AWS AI Labs
Comments: Chris and François-Xavier contributed equally
Link: https://arxiv.org/abs/2107.07702
Abstract: We introduce Neural Contextual Anomaly Detection (NCAD), a framework for anomaly detection on time series that scales seamlessly from the unsupervised to the supervised setting, and is applicable to both univariate and multivariate time series. This is achieved by effectively combining recent developments in representation learning for multivariate time series with techniques for deep anomaly detection originally developed for computer vision, which we tailor to the time series setting. Our window-based approach facilitates learning the boundary between normal and anomalous classes by injecting generic synthetic anomalies into the available data. Moreover, our method can effectively take advantage of all the available information, be it domain knowledge or training labels in the semi-supervised setting. We demonstrate empirically on standard benchmark datasets that our approach obtains state-of-the-art performance in these settings.
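
The abstract does not specify what the injected synthetic anomalies look like. As a minimal sketch of the general idea, assuming simple point outliers (spikes of fixed magnitude at random positions) and a labeling rule in which a window counts as anomalous when it contains any injected point, one could write the following. Both assumptions and all function names are illustrative and not taken from the paper.

```python
import random


def inject_point_outliers(series, n_outliers, magnitude, seed=0):
    """Return a copy of `series` with spikes added, plus 0/1 anomaly labels."""
    rng = random.Random(seed)
    corrupted = list(series)
    labels = [0] * len(series)
    for idx in rng.sample(range(len(series)), n_outliers):
        sign = rng.choice((-1.0, 1.0))
        corrupted[idx] += sign * magnitude
        labels[idx] = 1
    return corrupted, labels


def sliding_windows(series, labels, width):
    """Yield (window, is_anomalous) pairs; a window is anomalous if any point is."""
    for start in range(len(series) - width + 1):
        end = start + width
        yield series[start:end], int(any(labels[start:end]))


# A flat toy series, so the injected spikes are the only deviations.
clean = [0.0] * 20
noisy, labels = inject_point_outliers(clean, n_outliers=2, magnitude=5.0)
windows = list(sliding_windows(noisy, labels, width=5))
```

The resulting labeled windows could serve as training pairs for a classifier even when the original data carries no anomaly labels, which is the role synthetic injection plays in the unsupervised setting.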

【17】 Constrained Feedforward Neural Network Training via Reachability Analysis

Authors: Long Kiu Chung, Adam Dai, Derek Knowles, Shreyas Kousik, Grace X. Gao
Affiliations: Dai is with the Department of Electrical Engineering
Comments: 5 pages, 4 figures
Link: https://arxiv.org/abs/2107.07696
Abstract: Neural networks have recently become popular for a wide variety of uses, but have seen limited application in safety-critical domains such as robotics near and around humans. This is because it remains an open challenge to train a neural network to obey safety constraints. Most existing safety-related methods only seek to verify that already-trained networks obey constraints, requiring alternating training and verification. Instead, this work proposes a constrained method to simultaneously train and verify a feedforward neural network with rectified linear unit (ReLU) nonlinearities. Constraints are enforced by computing the network's output-space reachable set and ensuring that it does not intersect with unsafe sets; training is achieved by formulating a novel collision-check loss function between the reachable set and unsafe portions of the output space. The reachable and unsafe sets are represented by constrained zonotopes, a convex polytope representation that enables differentiable collision checking. The proposed method is demonstrated successfully on a network with one nonlinearity layer and approximately 50 parameters.
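
The paper represents reachable sets with constrained zonotopes and trains through a differentiable collision check. The sketch below deliberately swaps in a much coarser technique, axis-aligned interval bounds, and only performs the verification step (no training). It is meant to illustrate what "the output-space reachable set does not intersect the unsafe set" means; the tiny network weights and the unsafe intervals are hypothetical.

```python
def affine_bounds(lower, upper, weights, bias):
    """Propagate an input box through y = W x + b using interval arithmetic."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            # A positive weight maps lower to lower; a negative weight swaps them.
            lo += w * l if w >= 0 else w * u
            hi += w * u if w >= 0 else w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


def reachable_box(lower, upper, layers):
    """Over-approximate the output set of a ReLU network given as [(W, b), ...]."""
    for i, (weights, bias) in enumerate(layers):
        lower, upper = affine_bounds(lower, upper, weights, bias)
        if i < len(layers) - 1:  # no ReLU after the output layer
            lower, upper = relu_bounds(lower, upper)
    return lower, upper


def boxes_intersect(lo_a, hi_a, lo_b, hi_b):
    """Axis-aligned boxes overlap iff they overlap in every dimension."""
    return all(la <= hb and lb <= ha
               for la, ha, lb, hb in zip(lo_a, hi_a, lo_b, hi_b))


# Hypothetical network: 2 inputs, one hidden ReLU layer of width 2, 1 output.
layers = [([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]), ([[1.0, 1.0]], [0.0])]
out_lo, out_hi = reachable_box([0.0, 0.0], [1.0, 1.0], layers)
```

For inputs in the unit square this box is the interval [0, 2], so an unsafe output region of [3, 4] is provably avoided, while [1.5, 3] cannot be ruled out. Interval bounds are looser than constrained zonotopes, so a reported intersection may be a false alarm, but a reported non-intersection is sound.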

【18】 Imitate The World: A Search Engine Simulation Platform

Authors: Yongqing Gao, Guangda Huzhang, Weijie Shen, Yawen Liu, Wen-Ji Zhou, Qing Da, Dan Shen, Yang Yu
Link: https://arxiv.org/abs/2107.07693
Abstract: Recent e-commerce applications benefit from the growth of deep learning techniques. However, we notice that many works attempt to maximize business objectives by closely matching offline labels, following the supervised learning paradigm. This results in models that obtain high offline performance in terms of Area Under Curve (AUC) and Normalized Discounted Cumulative Gain (NDCG), but cannot consistently increase revenue metrics such as user purchase amounts. To address these issues, we build a simulated search engine, AESim, which acts as a dynamic dataset: a well-trained discriminator gives appropriate feedback on generated pages. Unlike previous simulation platforms, which lose connection with the real world, ours depends on real data from AliExpress Search: we use adversarial learning to generate virtual users and Generative Adversarial Imitation Learning (GAIL) to capture users' behavior patterns. Our experiments also show that AESim reflects the online performance of ranking models better than classic ranking metrics do, implying that AESim can serve as a surrogate for AliExpress Search and evaluate models without going online.
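
For reference, NDCG, one of the offline ranking metrics this abstract contrasts with online revenue, can be computed as follows. This sketch uses the common linear-gain form with a log2 discount; another widely used variant replaces the gain `rel` with `2**rel - 1`. The abstract does not say which variant the authors use.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with the common log2(rank + 1) discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances, k=None):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    k = len(relevances) if k is None else k
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0


perfect = ndcg([3, 2, 1])   # items already sorted by relevance
reversed_ = ndcg([1, 2, 3])  # most relevant item ranked last
```

A ranking that is already sorted by relevance scores exactly 1.0, and any other ordering of the same items scores lower. The metric depends only on logged relevance labels, which is why it can diverge from online behavior.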

【19】 CutDepth: Edge-aware Data Augmentation in Depth Estimation

Authors: Yasunori Ishii, Takayoshi Yamashita Affiliations: Panasonic, Kadoma, Kadoma City, Osaka, Japan; Chubu University, Matsumotocho, Kasugai, Aichi, Japan Link: https://arxiv.org/abs/2107.07684 Abstract: It is difficult to collect data on a large scale in a monocular depth estimation because the task requires the simultaneous acquisition of RGB images and depths. Data augmentation is thus important to this task. However, there has been little research on data augmentation for tasks such as monocular depth estimation, where the transformation is performed pixel by pixel. In this paper, we propose a data augmentation method, called CutDepth. In CutDepth, part of the depth is pasted onto an input image during training. The method extends variations data without destroying edge features. Experiments objectively and subjectively show that the proposed method outperforms conventional methods of data augmentation. The estimation accuracy is improved with CutDepth even though there are few training data at long distances.
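To make the pasting step in the CutDepth abstract concrete, here is a minimal sketch of one way it might look. It is not the authors' implementation: the function name `cut_depth`, the uniform sampling of the rectangle, the assumption that depth is already scaled to [0, 1], and the choice to copy the depth value into all three colour channels are all illustrative assumptions. The ground-truth depth map is left untouched.

```python
import random


def cut_depth(image, depth, rect=None, rng=random):
    """Return a copy of `image` with a rectangular patch replaced by depth values.

    image: H x W nested list of [r, g, b] values in [0, 1]
    depth: H x W nested list of depth values, assumed pre-scaled to [0, 1]
    rect:  optional (top, left, height, width); sampled uniformly when omitted
           (the sampling scheme is an assumption, not taken from the paper)
    """
    h, w = len(image), len(image[0])
    if rect is None:
        ph = rng.randint(1, h)          # patch height in [1, h]
        pw = rng.randint(1, w)          # patch width in [1, w]
        top = rng.randint(0, h - ph)    # keep the patch inside the image
        left = rng.randint(0, w - pw)
    else:
        top, left, ph, pw = rect

    # Copy so the caller's image is not modified.
    out = [[list(px) for px in row] for row in image]
    for y in range(top, top + ph):
        for x in range(left, left + pw):
            d = depth[y][x]
            out[y][x] = [d, d, d]       # same depth value in every channel
    return out
```

Because the pasted patch comes from the depth map of the same scene, object boundaries inside the patch line up with those in the surrounding RGB pixels, which is one reading of the abstract's claim that edge features are not destroyed.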

【20】 TAPEX: Table Pre-training via Learning a Neural SQL Executor

Authors: Qian Liu, Bei Chen, Jiaqi Guo, Zeqi Lin, Jian-guang Lou Affiliations: Beihang University, Beijing, China; Microsoft Research, Beijing, China; Xi’an Jiaotong University, Xi’an, China Comments: Work in progress, the project homepage is at this https URL Link: https://arxiv.org/abs/2107.07653 Abstract: Recent years pre-trained language models hit a success on modeling natural language sentences and (semi-)structured tables. However, existing table pre-training techniques always suffer from low data quality and low pre-training efficiency. In this paper, we show that table pre-training can be realized by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries. By pre-training on the synthetic corpus, our approach TAPEX dramatically improves the performance on downstream tasks, boosting existing language models by at most 19.5%. Meanwhile, TAPEX has remarkably high pre-training efficiency and yields strong results when using a small pre-trained corpus. Experimental results demonstrate that TAPEX outperforms previous table pre-training approaches by a large margin, and our model achieves new state-of-the-art results on four well-known datasets, including improving the WikiSQL denotation accuracy to 89.6% (+4.9%), the WikiTableQuestions denotation accuracy to 57.5% (+4.8%), the SQA denotation accuracy to 74.5% (+3.5%), and the TabFact accuracy to 84.6% (+3.6%). Our work opens the way to reason over structured data by pre-training on synthetic executable programs.

【21】 Align before Fuse: Vision and Language Representation Learning with Momentum Distillation

Authors: Junnan Li, Ramprasaath R. Selvaraju, Akhilesh Deepak Gotmare, Shafiq Joty, Caiming Xiong, Steven Hoi Affiliation: Salesforce Research Link: https://arxiv.org/abs/2107.07651 Abstract: Large-scale vision and language representation learning has shown promising improvements on various vision-language tasks. Most existing methods employ a transformer-based multimodal encoder to jointly model visual tokens (region-based image features) and word tokens. Because the visual tokens and word tokens are unaligned, it is challenging for the multimodal encoder to learn image-text interactions. In this paper, we introduce a contrastive loss to ALign the image and text representations BEfore Fusing (ALBEF) them through cross-modal attention, which enables more grounded vision and language representation learning. Unlike most existing methods, our method does not require bounding box annotations nor high-resolution images. In order to improve learning from noisy web data, we propose momentum distillation, a self-training method which learns from pseudo-targets produced by a momentum model. We provide a theoretical analysis of ALBEF from a mutual information maximization perspective, showing that different training tasks can be interpreted as different ways to generate views for an image-text pair. ALBEF achieves state-of-the-art performance on multiple downstream vision-language tasks. On image-text retrieval, ALBEF outperforms methods that are pre-trained on orders of magnitude larger datasets. On VQA and NLVR$^2$, ALBEF achieves absolute improvements of 2.37% and 3.84% compared to the state-of-the-art, while enjoying faster inference speed. Code and pre-trained models are available at https://github.com/salesforce/ALBEF/.
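The abstract only states that momentum distillation learns from pseudo-targets produced by a momentum model. As a rough illustration of that general idea, the sketch below assumes the momentum model is an exponential moving average (EMA) of the trained model's parameters and that the training target is a convex blend of the original label and the momentum model's prediction. Both assumptions, the function names, and the constants `m` and `alpha` are illustrative placeholders, not details confirmed by the digest.

```python
def ema_update(momentum_params, online_params, m=0.995):
    """Move each momentum parameter a small step toward the online parameter.

    Assumption: the momentum model is an EMA of the online model;
    `m` is an illustrative placeholder value.
    """
    return [m * pm + (1.0 - m) * p
            for pm, p in zip(momentum_params, online_params)]


def blend_targets(label_dist, pseudo_dist, alpha=0.4):
    """Mix the (possibly noisy) label distribution with the pseudo-targets.

    Assumption: a simple convex combination; `alpha` is an illustrative
    placeholder value.
    """
    return [(1.0 - alpha) * t + alpha * q
            for t, q in zip(label_dist, pseudo_dist)]


# Usage: one EMA step, then a softened training target for a two-class example.
momentum = ema_update([1.0, 0.0], [0.0, 1.0], m=0.5)
target = blend_targets([1.0, 0.0], [0.5, 0.5], alpha=0.5)
```

The intuition under these assumptions is that a slowly moving copy of the model gives more stable predictions than the model being trained, so blending them into the target softens the effect of noisy web labels.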

【22】 Evaluation of Human-AI Teams for Learned and Rule-Based Agents in Hanabi

Authors: Ho Chit Siu, Jaime D. Peña, Kimberlee C. Chang, Edenna Chen, Yutai Zhou, Victor J. Lopez, Kyle Palko, Ross E. Allen Link: https://arxiv.org/abs/2107.07630 Abstract: Deep reinforcement learning has generated superhuman AI in competitive games such as Go and StarCraft. Can similar learning techniques create a superior AI teammate for human-machine collaborative games? Will humans prefer AI teammates that improve objective team performance or those that improve subjective metrics of trust? In this study, we perform a single-blind evaluation of teams of humans and AI agents in the cooperative card game Hanabi, with both rule-based and learning-based agents. In addition to the game score, used as an objective metric of the human-AI team performance, we also quantify subjective measures of the human's perceived performance, teamwork, interpretability, trust, and overall preference of AI teammate. We find that humans have a clear preference toward a rule-based AI teammate (SmartBot) over a state-of-the-art learning-based AI teammate (Other-Play) across nearly all subjective metrics, and generally view the learning-based agent negatively, despite no statistical difference in the game score. This result has implications for future AI design and reinforcement learning benchmarking, highlighting the need to incorporate subjective metrics of human-AI teaming rather than a singular focus on objective task performance.

【23】 A Channel Coding Benchmark for Meta-Learning

Authors: Rui Li, Ondrej Bohdal, Rajesh Mishra, Hyeji Kim, Da Li, Nicholas Lane, Timothy Hospedales Affiliations: Cambridge, UK, School of Informatics, University of Edinburgh, UK, UT Austin, US, Samsung AI Center, UK, Samsung AI Center and Link: https://arxiv.org/abs/2107.07579 Abstract: Meta-learning provides a popular and effective family of methods for data-efficient learning of new tasks. However, several important issues in meta-learning have proven hard to study thus far. For example, performance degrades in real-world settings where meta-learners must learn from a wide and potentially multi-modal distribution of training tasks; and when distribution shift exists between meta-train and meta-test task distributions. These issues are typically hard to study since the shape of task distributions, and shift between them are not straightforward to measure or control in standard benchmarks. We propose the channel coding problem as a benchmark for meta-learning. Channel coding is an important practical application where task distributions naturally arise, and fast adaptation to new tasks is practically valuable. We use this benchmark to study several aspects of meta-learning, including the impact of task distribution breadth and shift, which can be controlled in the coding problem. Going forward, this benchmark provides a tool for the community to study the capabilities and limitations of meta-learning, and to drive research on practically robust and effective meta-learners.

【24】 Real-Time Violence Detection Using CNN-LSTM

Authors: Mann Patel Affiliation: Charotar University of Science and Technology Comments: 5 pages, 9 figures Link: https://arxiv.org/abs/2107.07578 Abstract: Violence rates however have been brought down about 57% during the span of the past 4 decades yet it doesn't change the way that the demonstration of violence actually happens, unseen by the law. Violence can be mass controlled sometimes by higher authorities, however, to hold everything in line one must "Microgovern" over each movement occurring in every road of each square. To address the butterfly effects impact in our setting, I made a unique model and a theorized system to handle the issue utilizing deep learning. The model takes the input of the CCTV video feeds and after drawing inference, recognizes if a violent movement is going on. And hypothesized architecture aims towards probability-driven computation of video feeds and reduces overhead from naively computing for every CCTV video feeds.

【25】 Beyond Goldfish Memory: Long-Term Open-Domain Conversation

Authors: Jing Xu, Arthur Szlam, Jason Weston Affiliation: Facebook AI Research Link: https://arxiv.org/abs/2107.07567 Abstract: Despite recent improvements in open-domain dialogue models, state of the art models are trained and evaluated on short conversations with little context. In contrast, the long-term conversation setting has hardly been studied. In this work we collect and release a human-human dataset consisting of multiple chat sessions whereby the speaking partners learn about each other's interests and discuss the things they have learnt from past sessions. We show how existing models trained on existing datasets perform poorly in this long-term conversation setting in both automatic and human evaluations, and we study long-context models that can perform much better. In particular, we find retrieval-augmented methods and methods with an ability to summarize and recall previous conversations outperform the standard encoder-decoder architectures currently considered state of the art.

【26】 Internet-Augmented Dialogue Generation

Authors: Mojtaba Komeili, Kurt Shuster, Jason Weston Affiliation: Facebook AI Research Link: https://arxiv.org/abs/2107.07566 Abstract: The largest store of continually updating knowledge on our planet can be accessed via internet search. In this work we study giving access to this information to conversational agents. Large language models, even though they store an impressive amount of knowledge within their weights, are known to hallucinate facts when generating dialogue (Shuster et al., 2021); moreover, those facts are frozen in time at the point of model training. In contrast, we propose an approach that learns to generate an internet search query based on the context, and then conditions on the search results to finally generate a response, a method that can employ up-to-the-minute relevant information. We train and evaluate such models on a newly collected dataset of human-human conversations whereby one of the speakers is given access to internet search during knowledge-driven discussions in order to ground their responses. We find that search-query based access of the internet in conversation provides superior performance compared to existing approaches that either use no augmentation or FAISS-based retrieval (Lewis et al., 2020).

【27】 A New Robust Multivariate Mode Estimator for Eye-tracking Calibration

Authors: Adrien Brilhault, Sergio Neuenschwander, Ricardo Araujo Rios Link: https://arxiv.org/abs/2107.08030 Abstract: We propose in this work a new method for estimating the main mode of multivariate distributions, with application to eye-tracking calibrations. When performing eye-tracking experiments with poorly cooperative subjects, such as infants or monkeys, the calibration data generally suffer from high contamination. Outliers are typically organized in clusters, corresponding to the time intervals when subjects were not looking at the calibration points. In this type of multimodal distributions, most central tendency measures fail at estimating the principal fixation coordinates (the first mode), resulting in errors and inaccuracies when mapping the gaze to the screen coordinates. Here, we developed a new algorithm to identify the first mode of multivariate distributions, named BRIL, which rely on recursive depth-based filtering. This novel approach was tested on artificial mixtures of Gaussian and Uniform distributions, and compared to existing methods (conventional depth medians, robust estimators of location and scatter, and clustering-based approaches). We obtained outstanding performances, even for distributions containing very high proportions of outliers, both grouped in clusters and randomly distributed. Finally, we demonstrate the strength of our method in a real-world scenario using experimental data from eye-tracking calibrations with Capuchin monkeys, especially for distributions where other algorithms typically lack accuracy.

【28】 Deep Learning Based Hybrid Precoding in Dual-Band Communication Systems

Authors: Rafail Ismayilov, Renato L. G. Cavalcante, Sławomir Stańczak Affiliations: Fraunhofer Heinrich Hertz Institute, Berlin, Germany; Technical University of Berlin, Berlin, Germany Link: https://arxiv.org/abs/2107.07843 Abstract: We propose a deep learning-based method that uses spatial and temporal information extracted from the sub-6GHz band to predict/track beams in the millimeter-wave (mmWave) band. In more detail, we consider a dual-band communication system operating in both the sub-6GHz and mmWave bands. The objective is to maximize the achievable mutual information in the mmWave band with a hybrid analog/digital architecture where analog precoders (RF precoders) are taken from a finite codebook. Finding a RF precoder using conventional search methods incurs large signalling overhead, and the signalling scales with the number of RF chains and the resolution of the phase shifters. To overcome the issue of large signalling overhead in the mmWave band, the proposed method exploits the spatiotemporal correlation between sub-6GHz and mmWave bands, and it predicts/tracks the RF precoders in the mmWave band from sub-6GHz channel measurements. The proposed method provides a smaller candidate set so that performing a search over that set significantly reduces the signalling overhead compared with conventional search heuristics. Simulations show that the proposed method can provide reasonable achievable rates while significantly reducing the signalling overhead.

【29】 Depth Estimation from Monocular Images and Sparse Radar using Deep Ordinal Regression Network

Authors: Chen-Chou Lo, Patrick Vandewalle Affiliation: EAVISE, PSI, Dept. of Electrical Engineering (ESAT), KU Leuven, Jan de Nayerlaan, Sint-Katelijne-Waver, Belgium Comments: Accepted to ICIP2021 Link: https://arxiv.org/abs/2107.07596 Abstract: We integrate sparse radar data into a monocular depth estimation model and introduce a novel preprocessing method for reducing the sparseness and limited field of view provided by radar. We explore the intrinsic error of different radar modalities and show our proposed method results in more data points with reduced error. We further propose a novel method for estimating dense depth maps from monocular 2D images and sparse radar measurements using deep learning based on the deep ordinal regression network by Fu et al. Radar data are integrated by first converting the sparse 2D points to a height-extended 3D measurement and then including it into the network using a late fusion approach. Experiments are conducted on the nuScenes dataset. Our experiments demonstrate state-of-the-art performance in both day and night scenes.
