Visit www.arxivdaily.com for daily paper digests with abstracts, covering CS, Physics, Mathematics, Economics, Statistics, Finance, Biology, and Electrical Engineering, plus search, bookmarking, and posting features!
cs.AI (Artificial Intelligence), 73 papers in total
【1】 Doing good by fighting fraud: Ethical anti-fraud systems for mobile payments
Authors: Zainul Abi Din, Hari Venugopalan, Henry Lin, Adam Wushensky, Steven Liu, Samuel T. King
Affiliations: University of California, Davis; Bouncer Technologies
Link: https://arxiv.org/abs/2106.14861
Abstract: App builders commonly use security challenges, a form of step-up authentication, to add security to their apps. However, the ethical implications of this type of architecture have not been studied previously. In this paper, we present a large-scale measurement study of running an existing anti-fraud security challenge, Boxer, in real apps running on mobile devices. We find that although Boxer works well overall, it is unable to scan effectively on devices that run its machine learning models at less than one frame per second (FPS), blocking users who use inexpensive devices. With the insights from our study, we design Daredevil, a new anti-fraud system for scanning payment cards that works well across the broad range of performance characteristics and hardware configurations found on modern mobile devices. Daredevil reduces the number of devices that run at less than one FPS by an order of magnitude compared to Boxer, providing a more equitable system for fighting fraud. In total, we collect data from 5,085,444 real devices spread across 496 real apps running production software and interacting with real users.
【2】 K-Net: Towards Unified Image Segmentation
Authors: Wenwei Zhang, Jiangmiao Pang, Kai Chen, Chen Change Loy
Affiliations: S-Lab, Nanyang Technological University; CUHK-SenseTime Joint Lab, The Chinese University of Hong Kong; SenseTime Research; Shanghai AI Laboratory
Note: Technical Report
Link: https://arxiv.org/abs/2106.14855
Abstract: Semantic, instance, and panoptic segmentation have been addressed using different and specialized frameworks despite their underlying connections. This paper presents a unified, simple, and effective framework for these essentially similar tasks. The framework, named K-Net, segments both instances and semantic categories consistently by a group of learnable kernels, where each kernel is responsible for generating a mask for either a potential instance or a stuff class. To remedy the difficulty of distinguishing various instances, we propose a kernel update strategy that makes each kernel dynamic and conditional on its meaningful group in the input image. K-Net can be trained in an end-to-end manner with bipartite matching, and its training and inference are naturally NMS-free and box-free. Without bells and whistles, K-Net surpasses all previous state-of-the-art single-model results on panoptic segmentation on MS COCO and semantic segmentation on ADE20K, with 52.1% PQ and 54.3% mIoU, respectively. Its instance segmentation performance is also on par with Cascade Mask R-CNN on MS COCO, with 60%-90% faster inference speeds. Code and models will be released at https://github.com/open-mmlab/mmdetection.
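The kernel idea in the abstract above can be pictured with a small sketch. This is an illustrative toy, not the authors' implementation: the shapes, the sigmoid gating, and the simplified update step are all assumptions; the only idea taken from the abstract is that each of N learnable kernels produces one mask and is then updated conditioned on its own group of pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

C, H, W, N = 16, 8, 8, 4                 # channels, height, width, number of kernels
features = rng.standard_normal((C, H, W))
kernels = rng.standard_normal((N, C))    # one learnable C-dim kernel per mask

# Each kernel yields an H x W mask logit map via a per-pixel dot product.
masks = np.einsum('nc,chw->nhw', kernels, features)
assert masks.shape == (N, H, W)

# Toy "kernel update": condition each kernel on the feature content of its
# own soft-assigned pixel group, making the kernel dynamic per image.
gates = 1.0 / (1.0 + np.exp(-masks))                  # soft pixel assignment
group_feats = np.einsum('nhw,chw->nc', gates, features) / (H * W)
updated_kernels = kernels + 0.1 * group_feats          # simplified update step
```

In the real model this update is iterated and trained with bipartite matching; here it only shows how a mask and its kernel condition each other.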
【3】 Improving Prediction of Low-Prior Clinical Events with Simultaneous General Patient-State Representation Learning
Authors: Matthew Barren, Milos Hauskrecht
Affiliations: University of Pittsburgh
Note: Accepted at the 19th International Conference on Artificial Intelligence in Medicine (AIME 2021)
Link: https://arxiv.org/abs/2106.14838
Abstract: Low-prior targets are common among many important clinical events, which introduces the challenge of having enough data to support learning of their predictive models. Many prior works have addressed this problem by first building a general patient-state representation model, and then adapting it to a new low-prior prediction target. In this schema, there is potential for the predictive performance to be hindered by the misalignment between the general patient-state model and the target task. To overcome this challenge, we propose a new method that simultaneously optimizes a shared model through multi-task learning of both the low-prior supervised target and general purpose patient-state representation (GPSR). More specifically, our method improves prediction performance of a low-prior task by jointly optimizing a shared model that combines the loss of the target event and a broad range of generic clinical events. We study the approach in the context of Recurrent Neural Networks (RNNs). Through extensive experiments on multiple clinical event targets using MIMIC-III data, we show that the inclusion of general patient-state representation tasks during model training improves the prediction of individual low-prior targets.
【4】 Virtual Agents in Live Coding: A Short Review
Authors: Anna Xambó
Note: Preprint version submitted to eContact! (this https URL) for the special issue 21.1 - Take Back the Stage: Live coding, live audiovisual, laptop orchestra
Link: https://arxiv.org/abs/2106.14835
Abstract: The combination of AI and live coding has been little explored. This article contributes a short review of different perspectives on using virtual agents in the practice of live coding, looking at past and present as well as pointing to future directions.
【5】 Feature Importance Guided Attack: A Model Agnostic Adversarial Attack
Authors: Gilad Gressel, Niranjan Hegde, Archana Sreekumar, Michael Darling
Affiliations: Center for Cybersecurity Systems and Networks, Amrita University, Kerala, India; Sandia National Laboratories, Albuquerque, USA
Link: https://arxiv.org/abs/2106.14815
Abstract: Machine learning models are susceptible to adversarial attacks which dramatically reduce their performance. Reliable defenses to these attacks are an unsolved challenge. In this work, we present a novel evasion attack: the Feature Importance Guided Attack (FIGA), which generates adversarial evasion samples. FIGA is model agnostic: it assumes no prior knowledge of the defending model's learning algorithm, but does assume knowledge of the feature representation. FIGA leverages feature importance rankings; it perturbs the most important features of the input in the direction of the target class we wish to mimic. We demonstrate FIGA against eight phishing detection models. We keep the attack realistic by perturbing phishing website features that an adversary would have control over. Using FIGA we are able to reduce the F1-score of a phishing detection model from 0.96 to 0.41 on average. Finally, we implement adversarial training as a defense against FIGA and show that while it is sometimes effective, it can be evaded by changing the parameters of FIGA.
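The core perturbation step described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions: the feature vectors, the source of the importance ranking, and the step size `eps` are all placeholders, and the target-class direction is approximated by the target class's feature means rather than anything specified in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

n_features = 10
x = rng.random(n_features)            # a phishing sample's feature vector (toy)
target_mean = rng.random(n_features)  # mean features of the class to mimic (toy)
importance = rng.random(n_features)   # e.g. from a surrogate model's ranking (toy)

def figa_perturb(x, target_mean, importance, k=3, eps=0.2):
    """Nudge the k most important features one step toward the target class."""
    x_adv = x.copy()
    top_k = np.argsort(importance)[-k:]            # most important feature indices
    direction = np.sign(target_mean[top_k] - x_adv[top_k])
    x_adv[top_k] += eps * direction
    return x_adv

x_adv = figa_perturb(x, target_mean, importance)
changed = np.flatnonzero(x_adv != x)
assert len(changed) <= 3              # only the top-k features are touched
```

Restricting the perturbation to features an adversary actually controls, as the abstract emphasizes, would amount to masking `importance` before ranking.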
【6】 Hyperspectral Remote Sensing Image Classification Based on Multi-scale Cross Graphic Convolution
Authors: Yunsong Zhao, Yin Li, Zhihan Chen, Tianchong Qiu, Guojin Liu
Link: https://arxiv.org/abs/2106.14804
Abstract: The mining and utilization of features directly affect the classification performance of models used in the classification and recognition of hyperspectral remote sensing images. Traditional models usually conduct feature mining from a single perspective, with the features mined being limited and the internal relationships between them being ignored. Consequently, useful features are lost and classification results are unsatisfactory. To fully mine and utilize image features, a new multi-scale feature-mining learning algorithm (MGRNet) is proposed. The model uses principal component analysis to reduce the dimensionality of the original hyperspectral image (HSI), retaining 99.99% of its semantic information, and to extract dimensionality-reduction features. Using a multi-scale convolution algorithm, the input dimensionality-reduction features are mined to obtain shallow features, which then serve as inputs to a multi-scale graph convolution algorithm that constructs the internal relationships between eigenvalues at different scales. We then carry out cross fusion of the multi-scale information obtained by graph convolution, before inputting the new information into the residual network algorithm for deep feature mining. Finally, a flexible maximum transfer function classifier is used to predict the final features and complete the classification. Experiments on three common hyperspectral datasets show the proposed MGRNet algorithm to be superior to traditional methods in recognition accuracy.
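The PCA preprocessing step stated above (keep enough components for 99.99% of the variance) can be sketched with plain NumPy. The cube shape and data here are synthetic placeholders; only the variance-threshold selection follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(2)

H, W, bands = 16, 16, 32
cube = rng.standard_normal((H, W, bands))      # toy hyperspectral cube
X = cube.reshape(-1, bands)                    # one spectrum per pixel
X = X - X.mean(axis=0)                         # center each band

# PCA via SVD: columns of Vt.T are principal axes, s**2 the variances.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
var_ratio = (s ** 2) / np.sum(s ** 2)
# Smallest k whose cumulative variance ratio reaches 99.99%.
k = int(np.searchsorted(np.cumsum(var_ratio), 0.9999) + 1)

reduced_cube = (X @ Vt.T[:, :k]).reshape(H, W, k)
assert reduced_cube.shape == (H, W, k)
```

The reduced cube would then feed the multi-scale convolution stage; on real HSI data (highly correlated bands), `k` is typically far smaller than the original band count.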
【7】 Reasoning on $\textit{DL-Lite}_{\cal R}$ with Defeasibility in ASP
Authors: Loris Bozzato, Thomas Eiter, Luciano Serafini
Affiliations: Fondazione Bruno Kessler, Trento, Italy; Technische Universität Wien, Vienna, Austria
Note: Under consideration in Theory and Practice of Logic Programming (TPLP). This paper is an extended and revised version of a conference paper appearing in the proceedings of the 3rd International Joint Conference on Rules and Reasoning (RuleML+RR 2019). arXiv admin note: text overlap with arXiv:1905.09221
Link: https://arxiv.org/abs/2106.14801
Abstract: Reasoning on defeasible knowledge is a topic of interest in the area of description logics, as it is related to the need of representing exceptional instances in knowledge bases. In this direction, in our previous works we presented a framework for representing (contextualized) OWL RL knowledge bases with a notion of justified exceptions on defeasible axioms: reasoning in such a framework is realized by a translation into ASP programs. The resulting reasoning process for OWL RL, however, introduces a complex encoding in order to capture reasoning on the negative information needed for reasoning on exceptions. In this paper, we apply the justified exception approach to knowledge bases in $\textit{DL-Lite}_{\cal R}$, i.e., the language underlying OWL QL. We provide a definition for $\textit{DL-Lite}_{\cal R}$ knowledge bases with defeasible axioms and study their semantic and computational properties. In particular, we study the effects of exceptions over unnamed individuals. The limited form of $\textit{DL-Lite}_{\cal R}$ axioms allows us to formulate a simpler ASP encoding, where reasoning on negative information is managed by direct rules. The resulting materialization method gives rise to a complete reasoning procedure for instance checking in $\textit{DL-Lite}_{\cal R}$ with defeasible axioms. Under consideration in Theory and Practice of Logic Programming (TPLP).
【8】 Training Massive Deep Neural Networks in a Smart Contract: A New Hope
Authors: Yin Yang
Link: https://arxiv.org/abs/2106.14763
Abstract: Deep neural networks (DNNs) could be very useful in blockchain applications such as DeFi and NFT trading. However, training / running large-scale DNNs as part of a smart contract is infeasible on today's blockchain platforms, due to two fundamental design issues of these platforms. First, blockchains nowadays typically require that each node maintain the complete world state at any time, meaning that the node must execute all transactions in every block. This is prohibitively expensive for computationally intensive smart contracts involving DNNs. Second, existing blockchain platforms expect smart contract transactions to have deterministic, reproducible results and effects. In contrast, DNNs are usually trained / run lock-free on massively parallel computing devices such as GPUs, TPUs and / or computing clusters, which often do not yield deterministic results. This paper proposes novel platform designs, collectively called A New Hope (ANH), that address the above issues. The main ideas are (i) computing-intensive smart contract transactions are only executed by nodes who need their results, or by specialized service providers, and (ii) a non-deterministic smart contract transaction leads to uncertain results, which can still be validated, though at a relatively high cost; specifically for DNNs, the validation cost can often be reduced by verifying properties of the results instead of their exact values. In addition, we discuss various implications of ANH, including its effects on token fungibility, sharding, private transactions, and the fundamental meaning of a smart contract.
【9】 A Theory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning
Authors: Pan Zhou, Caiming Xiong, Xiao-Tong Yuan, Steven Hoi
Affiliations: Salesforce Research; Nanjing University of Information Science & Technology
Note: Under review. arXiv admin note: substantial text overlap with arXiv:1903.11680 by other authors
Link: https://arxiv.org/abs/2106.14749
Abstract: For an image query, unsupervised contrastive learning labels crops of the same image as positives, and other image crops as negatives. Although intuitive, such a native label assignment strategy cannot reveal the underlying semantic similarity between a query and its positives and negatives, and impairs performance, since some negatives are semantically similar to the query or even share the same semantic class as the query. In this work, we first prove that for contrastive learning, inaccurate label assignment heavily impairs its generalization for semantic instance discrimination, while accurate labels benefit its generalization. Inspired by this theory, we propose a novel self-labeling refinement approach for contrastive learning. It improves the label quality via two complementary modules: (i) self-labeling refinery (SLR) to generate accurate labels and (ii) momentum mixup (MM) to enhance the similarity between a query and its positive. SLR uses a positive of a query to estimate semantic similarity between the query and its positives and negatives, and combines the estimated similarity with the vanilla label assignment in contrastive learning to iteratively generate more accurate and informative soft labels. We theoretically show that our SLR can exactly recover the true semantic labels of label-corrupted data, and supervises networks to achieve zero prediction error on classification tasks. MM randomly combines queries and positives to increase semantic similarity between the generated virtual queries and their positives so as to improve label accuracy. Experimental results on CIFAR10, ImageNet, VOC and COCO show the effectiveness of our method. PyTorch code and model will be released online.
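The momentum-mixup (MM) idea above can be illustrated with a tiny embedding-space sketch. This is an assumption-laden toy, not the paper's formulation: the embeddings are random, the mixing coefficient is drawn from a uniform Beta, and only the stated effect is shown, namely that a convex combination of a query and its positive is at least as cosine-similar to the positive as the original query was.

```python
import numpy as np

rng = np.random.default_rng(3)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

q = rng.standard_normal(128)        # query embedding (toy)
p = rng.standard_normal(128)        # its positive's embedding (toy)
lam = rng.beta(1.0, 1.0)            # mixup coefficient in [0, 1]

# Virtual query: convex combination of query and positive.
virtual_q = lam * q + (1 - lam) * p

# Moving along the segment toward p can only increase cosine similarity to p.
assert cos(virtual_q, p) >= cos(q, p) - 1e-9
```

The monotonicity holds for any `lam` in [0, 1], which is why mixing toward the positive makes the (mixed) soft label more accurate for the virtual query.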
【10】 TENT: Tensorized Encoder Transformer for Temperature Forecasting
Authors: Onur Bilgin, Paweł Mąka, Thomas Vergutz, Siamak Mehrkanoon
Affiliations: Department of Data Science and Knowledge Engineering, Maastricht University, The Netherlands
Note: 9 pages, 10 figures
Link: https://arxiv.org/abs/2106.14742
Abstract: Reliable weather forecasting is of great importance in science, business and society. The best performing data-driven models for weather prediction tasks rely on recurrent or convolutional neural networks, some of which incorporate attention mechanisms. In this work, we introduce a new model based on the Transformer architecture for weather forecasting. The proposed Tensorial Encoder Transformer (TENT) model is equipped with tensorial attention and thus exploits the spatiotemporal structure of weather data by processing it in multidimensional tensorial format. We show that, compared to the encoder part of the original Transformer and to 3D convolutional neural networks, the proposed TENT model can better model the underlying complex patterns of weather data for the studied temperature prediction task. Experiments on two real-life weather datasets are performed. The datasets consist of historical measurements from US, Canadian and European cities. The first dataset contains hourly measurements of weather attributes for 30 cities in the USA and Canada from October 2012 to November 2017. The second dataset contains daily measurements of weather attributes of 18 cities across Europe from May 2005 to April 2020. We use attention scores calculated by our attention mechanism to shed light on the decision-making process of our model and to gain insight into the cities most important for the task.
【11】 Fractal Pyramid Networks
Authors: Zhiqiang Deng, Huimin Yu, Yangqi Long
Affiliations: Department of Information Science and Electronic Engineering, Zhejiang University, China; State Key Laboratory of CAD & CG, China
Link: https://arxiv.org/abs/2106.14694
Abstract: We propose a new network architecture, the Fractal Pyramid Networks (PFNs), for pixel-wise prediction tasks, as an alternative to the widely used encoder-decoder structure. In the encoder-decoder structure, the input is processed by an encoding-decoding pipeline that tries to obtain a semantic large-channel feature. Different from that, our proposed PFNs hold multiple information processing pathways and encode the information into multiple separate small-channel features. On the task of self-supervised monocular depth estimation, even without ImageNet pretraining, our models can compete with or outperform the state-of-the-art methods on the KITTI dataset with far fewer parameters. Moreover, the visual quality of the prediction is significantly improved. An experiment on semantic segmentation provides evidence that PFNs can be applied to other pixel-wise prediction tasks, and demonstrates that our models can capture more global structure information.
【12】 Robust Learning-Augmented Caching: An Experimental Study
Authors: Jakub Chłędowski, Adam Polak, Bartosz Szabucki, Konrad Zolna
Affiliations: Jagiellonian University
Note: ICML 2021
Link: https://arxiv.org/abs/2106.14693
Abstract: Effective caching is crucial for the performance of modern-day computing systems. A key optimization problem arising in caching -- which item to evict to make room for a new item -- cannot be optimally solved without knowing the future. There are many classical approximation algorithms for this problem, but more recently researchers started to successfully apply machine learning to decide what to evict by discovering implicit input patterns and predicting the future. While machine learning typically does not provide any worst-case guarantees, the new field of learning-augmented algorithms proposes solutions that leverage classical online caching algorithms to make the machine-learned predictors robust. We are the first to comprehensively evaluate these learning-augmented algorithms on real-world caching datasets and state-of-the-art machine-learned predictors. We show that a straightforward method -- blindly following either a predictor or a classical robust algorithm, and switching whenever one becomes worse than the other -- has only a low overhead over a well-performing predictor, while competing with classical methods when the coupled predictor fails, thus providing a cheap worst-case insurance.
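The "straightforward method" described above is easy to sketch. This toy assumes unit miss costs and a simple switching rule (follow one policy until its cumulative misses exceed the other's); the actual experimental setup in the paper may differ.

```python
def switching_policy(miss_pred, miss_classic):
    """Given per-step miss indicators (0/1) that each policy WOULD incur,
    return which policy a switching meta-algorithm follows at each step."""
    total_pred = total_classic = 0
    following = "pred"                    # start by trusting the predictor
    choices = []
    for m_pred, m_classic in zip(miss_pred, miss_classic):
        choices.append(following)
        total_pred += m_pred
        total_classic += m_classic
        # Switch when the currently followed policy falls behind the other.
        if following == "pred" and total_pred > total_classic:
            following = "classic"
        elif following == "classic" and total_classic > total_pred:
            following = "pred"
    return choices

# Toy trace: the predictor is good early, then degrades; the combiner switches.
pred    = [0, 0, 0, 1, 1, 1, 1, 1]
classic = [1, 0, 1, 0, 0, 1, 0, 0]
choices = switching_policy(pred, classic)
```

Because the follower's extra cost is bounded by how long it lags behind the better policy, this kind of combiner pays only a small overhead when the predictor is good, yet tracks the classical algorithm when the predictor fails.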
【13】 Using Issues to Explain Legal Decisions
Authors: Trevor Bench-Capon
Affiliations: University of Liverpool
Note: Presented at the XAILA workshop 2021
Link: https://arxiv.org/abs/2106.14688
Abstract: The need to explain the output from Machine Learning systems designed to predict the outcomes of legal cases has led to a renewed interest in the explanations offered by traditional AI and Law systems, especially those using factor-based reasoning and precedent cases. In this paper we consider what sort of explanations we should expect from such systems, with a particular focus on the structure that can be provided by the use of issues in cases.
【14】 Improving Uncertainty Calibration of Deep Neural Networks via Truth Discovery and Geometric Optimization
Authors: Chunwei Ma, Ziyun Huang, Jiayi Xian, Mingchen Gao, Jinhui Xu
Affiliations: Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY, USA; Computer Science and Software Engineering, Penn State Erie, Erie, PA, USA
Note: 37th Conference on Uncertainty in Artificial Intelligence (UAI 2021)
Link: https://arxiv.org/abs/2106.14662
Abstract: Deep Neural Networks (DNNs), despite their tremendous success in recent years, could still cast doubts on their predictions due to the intrinsic uncertainty associated with their learning process. Ensemble techniques and post-hoc calibrations are two types of approaches that have individually shown promise in improving the uncertainty calibration of DNNs. However, the synergistic effect of the two types of methods has not been well explored. In this paper, we propose a truth discovery framework to integrate ensemble-based and post-hoc calibration methods. Using the geometric variance of the ensemble candidates as a good indicator for sample uncertainty, we design an accuracy-preserving truth estimator with provably no accuracy drop. Furthermore, we show that post-hoc calibration can also be enhanced by truth discovery-regularized optimization. On large-scale datasets including CIFAR and ImageNet, our method shows consistent improvement over state-of-the-art calibration approaches on both histogram-based and kernel density-based evaluation metrics. Our code is available at https://github.com/horsepurve/truly-uncertain.
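The idea of using the spread of ensemble predictions as a per-sample uncertainty signal can be sketched as follows. The paper's exact "geometric variance" definition is not reproduced here; as a stand-in, this toy uses the mean squared distance of the ensemble's probability vectors from their centroid, which behaves the same way: near zero when members agree, large when they disagree.

```python
import numpy as np

def ensemble_uncertainty(probs):
    """probs: (n_models, n_classes) softmax outputs for one sample."""
    centroid = probs.mean(axis=0)
    return float(((probs - centroid) ** 2).sum(axis=1).mean())

# Two toy samples: one where the ensemble agrees, one where it disagrees.
agreeing = np.array([[0.90, 0.10], [0.88, 0.12], [0.92, 0.08]])
disagreeing = np.array([[0.90, 0.10], [0.20, 0.80], [0.55, 0.45]])

assert ensemble_uncertainty(agreeing) < ensemble_uncertainty(disagreeing)
```

Samples with low spread can then be treated as confidently labeled by the ensemble, which is the intuition behind an accuracy-preserving truth estimate.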
【15】 Context-aware Heterogeneous Graph Attention Network for User Behavior Prediction in Local Consumer Service Platform
Authors: Peiyuan Zhu, Xiaofeng Wang
Affiliations: Alibaba Group, Hangzhou, China
Link: https://arxiv.org/abs/2106.14652
Abstract: As a new type of e-commerce platform developed in recent years, local consumer service platforms provide users with software for consuming services at nearby stores or at home, such as Groupon and Koubei. Different from other common e-commerce platforms, the behavior of users on a local consumer service platform is closely related to their real-time local context information. Therefore, building a context-aware user behavior prediction system can provide both merchants and users better service on local consumer service platforms. However, most previous work simply treats the contextual information as an ordinary feature in the prediction model to obtain the prediction list under a specific context, which ignores the fact that a user's interests in different contexts are often significantly different. Hence, in this paper, we propose a context-aware heterogeneous graph attention network (CHGAT) to dynamically generate the representation of the user and to estimate the probability of future behavior. Specifically, we first construct meta-path-based heterogeneous graphs from historical behaviors from multiple sources and comprehend the heterogeneous vertices in the graph with a novel unified knowledge representation approach. Next, a multi-level attention mechanism is introduced for context-aware aggregation over graph vertices, comprising a vertex-level attention network and a path-level attention network. Both aim to capture the semantic correlation between information contained in the graph and the outside real-time contextual information in the search system. The proposed model then aggregates specific graphs with their corresponding context features, obtains the representation of user interest under a specific context, and inputs it into the prediction network to finally obtain the predicted probability of user behavior.
【16】 Expert Q-learning: Deep Q-learning With State Values From Expert Examples
Authors: Li Meng, Anis Yazidi, Morten Goodwin, Paal Engelstad
Affiliations: University of Oslo; University of Agder; Oslo Metropolitan University
Link: https://arxiv.org/abs/2106.14642
Abstract: We propose a novel algorithm named Expert Q-learning. Expert Q-learning was inspired by Dueling Q-learning and aims to incorporate ideas from semi-supervised learning into reinforcement learning by splitting Q-values into state values and action advantages. Different from Generative Adversarial Imitation Learning and Deep Q-Learning from Demonstrations, the offline expert we use only predicts the value of a state from {-1, 0, 1}, indicating whether it is a bad, neutral or good state. An expert network is designed in addition to the Q-network and is updated after each regular offline minibatch update whenever the expert example buffer is not empty. Our algorithm also keeps asynchronous copies of the Q-network and expert network, predicting target values in the same manner as Double Q-learning. On the game of Othello, we compared our algorithm with the state-of-the-art Q-learning algorithm, a combination of Double Q-learning and Dueling Q-learning. The results showed that Expert Q-learning is indeed useful and more resistant to the overestimation bias of Q-learning. The baseline Q-learning algorithm exhibited unstable and suboptimal behavior, especially when playing against a stochastic player, whereas Expert Q-learning demonstrated more robust performance with higher scores. Expert Q-learning without using examples also achieved better results than the baseline algorithm when trained and tested against a fixed player. On the other hand, Expert Q-learning without examples cannot win against the baseline Q-learning algorithm in direct game competitions, despite also showing the strength of reducing the overestimation bias.
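The split described in the abstract (a coarse expert state value plus learned action advantages) can be sketched in tabular form. This is a speculative toy under stated assumptions: the tables stand in for the Q- and expert networks, the expert ratings are hand-picked, and a single shared table replaces the asynchronous network copies; only the decomposition Q(s, a) = V_expert(s) + A(s, a) and a Double-Q-style target are taken from the abstract.

```python
import numpy as np

n_states, n_actions = 3, 2
advantages = np.zeros((n_states, n_actions))   # learned action advantages (toy)
expert_value = {0: -1, 1: 0, 2: 1}             # expert's coarse rating in {-1, 0, 1}

def q_values(s):
    # Q(s, .) = V_expert(s) + A(s, .), mirroring the dueling-style split.
    return expert_value[s] + advantages[s]

# One tabular update for a transition (s, a, r, s'): the action is selected
# by argmax (as one network would) and evaluated by q_values (standing in
# for the asynchronous copy in Double Q-learning).
gamma, alpha = 0.9, 0.5
s, a, r, s_next = 0, 1, 1.0, 2
a_star = int(np.argmax(q_values(s_next)))
target = r + gamma * q_values(s_next)[a_star]
advantages[s, a] += alpha * (target - q_values(s)[a])
```

Because the expert's contribution is clipped to {-1, 0, 1}, it anchors the state value while the advantages absorb the learned per-action differences.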
【17】 AMU-EURANOVA at CASE 2021 Task 1: Assessing the stability of multilingual BERT
Authors: Léo Bouscarrat, Antoine Bonnefoy, Cécile Capponi, Carlos Ramisch
Affiliations: EURA NOVA, Marseille, France; Aix Marseille Univ, Université de Toulon, CNRS, LIS, Marseille, France
Link: https://arxiv.org/abs/2106.14625
Abstract: This paper explains our participation in Task 1 of the CASE 2021 shared task, which is about multilingual event extraction from news. We focused on sub-task 4, event information extraction. This sub-task has a small training dataset, and we fine-tuned a multilingual BERT to solve it. We studied the instability problem on the dataset and tried to mitigate it.
【18】 Overview of BioASQ 2020: The eighth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering 标题:BioASQ 2020综述:第八届BioASQ关于大规模生物医学语义索引和问题回答的挑战
作者:Anastasios Nentidis,Anastasia Krithara,Konstantinos Bougiatiotis,Martin Krallinger,Carlos Rodriguez-Penagos,Marta Villegas,Georgios Paliouras 机构:Paliouras, National Center for Scientific Research “Demokritos”, Athens, Greece, Aristotle University of Thessaloniki, Thessaloniki, Greece, National and Kapodistrian University of Athens, Athens, Greece, Barcelona Supercomputing Center, Barcelona, Spain 备注:None 链接:https://arxiv.org/abs/2106.14618 摘要:在本文中,我们对BioASQ挑战的第八版进行了概述,该挑战作为评估论坛(CLEF)2020会议和实验室的一个实验室运行。BioASQ是一系列挑战,旨在促进大规模生物医学语义索引和问答的系统和方法。为此,自2012年起,每年组织一次共享任务,不同的团队开发系统,在相同要求的基准数据集上竞争,这些数据集代表生物医学领域专家的真实信息需求。今年,随着西班牙语医学语义索引新任务的推出,这一挑战得到了扩展。共有34支系统超过100个的队伍参加了挑战赛的三项任务。与前几年一样,评估结果显示,表现最好的系统成功地超越了强大的基线,这表明最先进的系统通过不断改进不断推动研究前沿。 摘要:In this paper, we present an overview of the eighth edition of the BioASQ challenge, which ran as a lab in the Conference and Labs of the Evaluation Forum (CLEF) 2020. BioASQ is a series of challenges aiming at the promotion of systems and methodologies for large-scale biomedical semantic indexing and question answering. To this end, shared tasks are organized yearly since 2012, where different teams develop systems that compete on the same demanding benchmark datasets that represent the real information needs of experts in the biomedical domain. This year, the challenge has been extended with the introduction of a new task on medical semantic indexing in Spanish. In total, 34 teams with more than 100 systems participated in the three tasks of the challenge. As in previous years, the results of the evaluation reveal that the top-performing systems managed to outperform the strong baselines, which suggests that state-of-the-art systems keep pushing the frontier of research through continuous improvements.
【19】 An evaluation of template and ML-based generation of user-readable text from a knowledge graph 标题:从知识图生成用户可读文本的模板和基于ML的评估
作者:Zola Mahlaza,C. Maria Keet,Jarryd Dunn,Matthew Poulter 机构:University of Cape Town, University Avenue, Cape Town, South Africa 备注:15 pages, 6 figures 链接:https://arxiv.org/abs/2106.14613 摘要:典型的用户友好的知识图呈现是可视化和自然语言文本。在后一种HCI解决方案中,数据驱动的自然语言生成系统受到越来越多的关注,但由于受到内容丢失、幻觉或重复等错误的影响,它们的性能往往优于基于模板的系统。目前还不清楚这些错误中的哪些与文本所针对的人的低质量判断显著相关,这妨碍了根据错误对改进人类评估的影响来解决错误。我们通过一个实验评估了它们之间的可能联系,该实验利用了专家和众包的方法来评估人类编写的文本、模板生成的文本和序列到序列模型生成的文本。结果表明,有错误的文本与人类对自然性和质量的低判断之间没有显著的关联。在机器学习生成的文本中,有丢失或产生幻觉的插槽与人类对自然性和质量的低判断之间也没有显著的联系。因此,这两种方法似乎都是为知识图设计自然语言接口的可行选择。 摘要:Typical user-friendly renderings of knowledge graphs are visualisations and natural language text. Within the latter HCI solution approach, data-driven natural language generation systems receive increased attention, but they are often outperformed by template-based systems due to suffering from errors such as content dropping, hallucination, or repetition. It is unknown which of those errors are associated significantly with low quality judgements by humans who the text is aimed for, which hampers addressing errors based on their impact on improving human evaluations. We assessed their possible association with an experiment availing of expert and crowdsourced evaluations of human authored text, template generated text, and sequence-to-sequence model generated text. The results showed that there was no significant association between human authored texts with errors and the low human judgements of naturalness and quality. There was also no significant association between machine learning generated texts with dropped or hallucinated slots and the low human judgements of naturalness and quality. Thus, both approaches appear to be viable options for designing a natural language interface for knowledge graphs.
【20】 An Adversarial Learning based Multi-Step Spoken Language Understanding System through Human-Computer Interaction 标题:一种基于对抗性学习的人机交互多步骤口语理解系统
作者:Yu Wang,Yilin Shen,Hongxia Jin 机构:AI Center, Samsung Research America, Mountain View, CA, USA 备注:5 pages, original work published at ICASSP 2021 链接:https://arxiv.org/abs/2106.14611 摘要:现有的大多数口语理解系统只能基于单轮用户查询进行语义框架解析。它们不能通过与用户的多轮交互来获取用户的反馈,从而更新/添加/删除槽值。本文介绍了一种新的基于对抗式学习的多步口语理解系统,该系统可以利用多轮用户反馈信息来更新槽值。我们在基准ATIS数据集上进行了两个实验,结果表明,新系统只需一轮反馈,就F1而言,至少可以提高2.5%的解析性能。当反馈轮数增加时,改进会变得更大。此外,我们还将新系统与最新的对话状态跟踪系统进行了比较,结果表明,新的交互系统在多轮口语理解任务中,在槽和句子水平上的准确率都有更好的表现。 摘要:Most of the existing spoken language understanding systems can perform only semantic frame parsing based on a single-round user query. They cannot take users' feedback to update/add/remove slot values through multi-round interactions with users. In this paper, we introduce a novel multi-step spoken language understanding system based on adversarial learning that can leverage multi-round user feedback to update slot values. We perform two experiments on the benchmark ATIS dataset and demonstrate that the new system can improve parsing performance by at least 2.5% in terms of F1, with only one round of feedback. The improvement becomes even larger when the number of feedback rounds increases. Furthermore, we also compare the new system with state-of-the-art dialogue state tracking systems and demonstrate that the new interactive system can perform better on multi-round spoken language understanding tasks in terms of slot- and sentence-level accuracy.
【21】 Neural Models for Offensive Language Detection 标题:用于攻击性语言检测的神经模型
作者:Ehab Hamdy 机构:Prüfer: Prof. Dr. Jelena Mitrović, Prof. Dr. Michael Granitzer 链接:https://arxiv.org/abs/2106.14609 摘要:攻击性语言检测是一种不断发展的自然语言处理应用。这种增长主要是因为社交网络的广泛使用,社交网络已成为人们交流、工作和享受娱乐内容的主流渠道。许多分享攻击性和冒犯性内容的事件在很大程度上对社会产生了负面影响。我们相信,改进和比较不同的机器学习模型来对抗这些有害内容是本论文的一个重要而富有挑战性的目标。我们针对攻击性语言检测问题,构建了有效的自动化检测模型。NLP模型的最新进展,特别是Transformer模型,解决了标准seq-to-seq技术的许多缺点。BERT模型已经在许多NLP任务上显示了最新的结果,但文献仍在探讨BERT在自然语言处理领域取得成就的原因。其他有效的变体已经被开发出来,以改进标准的BERT,例如RoBERTa和ALBERT。此外,由于社交媒体上文本的多语种性质可能会影响模型对给定推文的判断,因此有必要研究多语种模型,例如在100种语言上训练的XLM-RoBERTa,以及它与单语种模型的比较。基于RoBERTa的模型被证明是最有能力的模型,并在这些任务中取得了最高的F1分数。全面的攻击性语言检测系统的另一个关键方面是模型训练和推理的速度。在这方面,我们考虑了模型运行时间,并对名为BlazingText的FastText高效实现进行了微调,它取得了良好的效果,且比基于BERT的模型快得多。 摘要:Offensive language detection is an ever-growing natural language processing (NLP) application. This growth is mainly because of the widespread usage of social networks, which have become a mainstream channel for people to communicate, work, and enjoy entertainment content. Many incidents of sharing aggressive and offensive content have negatively impacted society to a great extent. We believe contributing to improving and comparing different machine learning models to fight such harmful content is an important and challenging goal for this thesis. We targeted the problem of offensive language detection by building efficient automated detection models. Recent advancements in NLP models, specifically the Transformer model, have tackled many shortcomings of standard seq-to-seq techniques. The BERT model has shown state-of-the-art results on many NLP tasks, although the literature is still exploring the reasons for BERT's achievements in the NLP field. Other efficient variants have been developed to improve upon the standard BERT, such as RoBERTa and ALBERT.
Moreover, because the multilingual nature of text on social media can affect the model's decision on a given tweet, it is becoming essential to examine multilingual models, such as XLM-RoBERTa trained on 100 languages, and how they compare to monolingual models. The RoBERTa-based model proved to be the most capable and achieved the highest F1 score on the tasks. Another critical aspect of a well-rounded offensive language detection system is the speed at which a model can be trained and make inferences. In that respect, we considered model run-time and fine-tuned BlazingText, a very efficient implementation of FastText, which achieved good results and is much faster than BERT-based models.
【22】 Modeling and Reasoning in Event Calculus using Goal-Directed Constraint Answer Set Programming 标题:基于目标导向约束答案集编程的事件演算建模与推理
作者:Joaquín Arias,Manuel Carro,Zhuo Chen,Gopal Gupta 机构:CETINIA, Universidad Rey Juan Carlos, IMDEA Software Institute, Universidad Politécnica de Madrid, University of Texas at Dallas 备注:Under consideration in Theory and Practice of Logic Programming (TPLP) 链接:https://arxiv.org/abs/2106.14566 摘要:自动常识推理对于构建具有可解释性人工智能等特征的类人人工智能系统至关重要。事件演算(EC)是一类以可靠的逻辑基础来建模常识推理的形式化方法。以往使用EC进行机械化推理的尝试在处理密集域(如时间和其他物理量)中的连续变化、变量之间的约束、默认否定以及不同推理方法的统一应用等方面面临困难。我们建议使用s(CASP)(一个查询驱动、自顶向下的带约束谓词答案集编程执行模型)来基于EC进行建模和推理。我们展示了EC场景如何自然且直接地编码在s(CASP)中,以及它如何在同时包含密集时间和密集流(fluent)约束的领域中实现演绎和溯因推理任务。 摘要:Automated commonsense reasoning is essential for building human-like AI systems featuring, for example, explainable AI. Event Calculus (EC) is a family of formalisms that model commonsense reasoning with a sound, logical basis. Previous attempts to mechanize reasoning using EC faced difficulties in the treatment of the continuous change in dense domains (e.g., time and other physical quantities), constraints among variables, default negation, and the uniform application of different inference methods, among others. We propose the use of s(CASP), a query-driven, top-down execution model for Predicate Answer Set Programming with Constraints, to model and reason using EC. We show how EC scenarios can be naturally and directly encoded in s(CASP) and how it enables deductive and abductive reasoning tasks in domains featuring constraints involving both dense time and dense fluents.
【23】 Unsupervised Continual Learning via Self-Adaptive Deep Clustering Approach 标题:基于自适应深度聚类的无监督连续学习
作者:Mahardhika Pratama,Andri Ashfahani,Edwin Lughofer 机构:SCSE, Nanyang Technological University, Singapore, DKBMS, Johannes Kepler University, Linz, Austria 备注:currently under review 链接:https://arxiv.org/abs/2106.14563 摘要:在现有文献中,无监督的持续学习仍然是一个相对未知的领域,因为现有的绝大多数工作都要求无限制地获取真实标注,这会产生昂贵的标注成本。另一个问题在于任务边界和任务ID,这些信息必须在模型更新或模型预测时已知,从而妨碍了实时部署的可行性。本文提出了自适应深度持续学习者中的知识保持方法(KIERA)。KIERA是从灵活的深度聚类方法的概念发展而来的,它具有弹性的网络结构,能够及时地应对不断变化的环境。为了克服灾难性遗忘问题,提出了基于质心的经验回放方法。KIERA不利用任何标记样本进行模型更新,同时具有任务无关的优点。KIERA的优势已经在流行的持续学习问题中得到了数值验证,与最先进的方法相比,KIERA具有很强的竞争力。我们的实现在 https://github.com/ContinualAL/KIERA 提供。 摘要:Unsupervised continual learning remains a relatively uncharted territory in the existing literature because the vast majority of existing works call for unlimited access to ground truth, incurring expensive labelling costs. Another issue lies in the problem of task boundaries and task IDs, which must be known for a model's updates or predictions, hindering feasibility for real-time deployment. Knowledge Retention in Self-Adaptive Deep Continual Learner (KIERA) is proposed in this paper. KIERA is developed from the notion of a flexible deep clustering approach possessing an elastic network structure to cope with changing environments in a timely manner. The centroid-based experience replay is put forward to overcome the catastrophic forgetting problem. KIERA does not exploit any labelled samples for model updates while featuring a task-agnostic merit. The advantage of KIERA has been numerically validated in popular continual learning problems where it shows highly competitive performance compared to state-of-the-art approaches. Our implementation is available at https://github.com/ContinualAL/KIERA.
【24】 Contrastive Counterfactual Visual Explanations With Overdetermination 标题:超定的反事实对比视觉解释
作者:Adam White,Kwun Ho Ngan,James Phelan,Saman Sadeghi Afgeh,Kevin Ryan,Constantino Carlos Reyes-Aldasoro,Artur d'Avila Garcez 机构:School of Mathematics, Computer Science and Engineering, City, University of London, London, EC,V ,HB, UK, Saman.Sadeghiafgeh, Kevin.Ryan 链接:https://arxiv.org/abs/2106.14556 摘要:本文提出了一种新的可解释人工智能方法——清晰图像法。清晰的图像是基于这样一种观点:一个令人满意的解释应该是对比的、反事实的和可测量的。清晰图像通过对比图像和通过对抗学习自动生成的相应图像来解释图像的分类概率。这使得显著的分割和扰动都能忠实地确定每个片段的重要性。CLEAR Image已成功应用于医学影像案例研究,使用一种新的定点博弈度量,它比Grad-CAM和LIME等方法平均高出27%。CLEAR Image擅长识别“因果过度确定”的情况,即图像中存在多个斑块,其中任何一个斑块本身足以导致分类概率接近1。 摘要:A novel explainable AI method called CLEAR Image is introduced in this paper. CLEAR Image is based on the view that a satisfactory explanation should be contrastive, counterfactual and measurable. CLEAR Image explains an image's classification probability by contrasting the image with a corresponding image generated automatically via adversarial learning. This enables both salient segmentation and perturbations that faithfully determine each segment's importance. CLEAR Image was successfully applied to a medical imaging case study where it outperformed methods such as Grad-CAM and LIME by an average of 27% using a novel pointing game metric. CLEAR Image excels in identifying cases of "causal overdetermination" where there are multiple patches in an image, any one of which is sufficient by itself to cause the classification probability to be close to one.
【25】 Cheating Detection Pipeline for Online Interviews and Exams 标题:在线面试和考试的作弊检测管道
作者:Azmi Can Özgen,Mahiye Uluyağmur Öztürk,Umut Bayraktar 机构:Huawei Turkey R&D Center, Istanbul, Turkey 链接:https://arxiv.org/abs/2106.14483 摘要:由于流行病和远程工作环境的优势,远程考试和工作面试越来越受欢迎并成为不可或缺的。大多数公司和学术机构利用这些系统进行招聘和在线考试。然而,远程考试系统的关键问题之一是在可靠的环境中进行考试。在这项工作中,我们提出了在线面试和考试作弊分析管道。这套系统只需要考生的一段视频,这段视频是在考试期间录制的。然后利用作弊检测管道来检测另一个人、电子设备使用情况和候选人缺勤状态。该流水线由人脸检测、人脸识别、目标检测和人脸跟踪算法组成。为了评估管道的性能,我们收集了一个私有视频数据集。视频数据集包括作弊活动和干净的视频。最终,我们的管道提供了一个有效和快速的准则,以检测和分析作弊活动在网上面试和考试视频。 摘要:Remote examination and job interviews have gained popularity and become indispensable because of both pandemics and the advantage of remote working circumstances. Most companies and academic institutions utilize these systems for their recruitment processes and also for online exams. However, one of the critical problems of the remote examination systems is conducting the exams in a reliable environment. In this work, we present a cheating analysis pipeline for online interviews and exams. The system only requires a video of the candidate, which is recorded during the exam. Then cheating detection pipeline is employed to detect another person, electronic device usage, and candidate absence status. The pipeline consists of face detection, face recognition, object detection, and face tracking algorithms. To evaluate the performance of the pipeline we collected a private video dataset. The video dataset includes both cheating activities and clean videos. Ultimately, our pipeline presents an efficient and fast guideline to detect and analyze cheating activities in an online interview and exam video.
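To make the per-frame decision logic of such a pipeline concrete, here is a minimal Python sketch assuming hypothetical detector outputs (recognized face IDs and object labels); the flag names and the device-label set are illustrative assumptions, not taken from the paper.

```python
def analyze_frame(faces, objects, enrolled_face_id):
    """Flag cheating cues in one frame, given upstream detector outputs.

    faces: list of face IDs produced by face detection + recognition.
    objects: list of object labels produced by an object detector.
    """
    flags = []
    if len(faces) == 0:
        # Face detector found nobody in the frame.
        flags.append("candidate_absent")
    if any(f != enrolled_face_id for f in faces):
        # A recognized face does not match the enrolled candidate.
        flags.append("another_person")
    if any(o in {"cell phone", "laptop", "tablet"} for o in objects):
        flags.append("electronic_device")
    return flags

flags = analyze_frame(["candidate", "stranger"], ["cell phone"], "candidate")
```

Face tracking would then smooth these per-frame flags over time before an incident is reported, so that single-frame detector noise is not flagged as cheating.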
【26】 Dizygotic Conditional Variational AutoEncoder for Multi-Modal and Partial Modality Absent Few-Shot Learning 标题:缺少少射学习的多模态和部分模态双合子条件变分自动编码器
作者:Yi Zhang,Sheng Huang,Xi Peng,Dan Yang 机构: Chongqing University 备注:13 pages 链接:https://arxiv.org/abs/2106.14467 摘要:数据增强技术是提高Few-Shot分类性能的有力手段。它生成更多的样本作为补充,然后将该任务转化为一个共同的有监督学习问题进行求解。然而,大多数主流的基于数据增强的方法只考虑单一模态的信息,导致生成的特征的多样性和质量较低。针对上述问题,本文提出了一种新的多模态数据增强方法双合子条件变分自动编码器(DCVAE)。DCVAE通过将两个条件变分自编码器(CVAEs)以双合子共生的方式与相同种子但不同模态条件配对来进行特征合成。随后,将两个cvae生成的特征自适应地组合起来,得到最终的特征,并将其转换回配对条件,同时保证这些条件不仅在表示上而且在函数上与原始条件一致。DCVAE本质上提供了一种新的思路,即利用不同模态先验信息的互补性,在不同的多模态场景中进行数据扩充。大量的实验结果表明,我们的工作在minimagenet、CIFAR-FS和CUB数据集上取得了最先进的性能,并且能够在部分模态缺失的情况下很好地工作。 摘要:Data augmentation is a powerful technique for improving the performance of the few-shot classification task. It generates more samples as supplements, and then this task can be transformed into a common supervised learning issue for solution. However, most mainstream data augmentation based approaches only consider the single modality information, which leads to the low diversity and quality of generated features. In this paper, we present a novel multi-modal data augmentation approach named Dizygotic Conditional Variational AutoEncoder (DCVAE) for addressing the aforementioned issue. DCVAE conducts feature synthesis via pairing two Conditional Variational AutoEncoders (CVAEs) with the same seed but different modality conditions in a dizygotic symbiosis manner. Subsequently, the generated features of two CVAEs are adaptively combined to yield the final feature, which can be converted back into its paired conditions while ensuring these conditions are consistent with the original conditions not only in representation but also in function. DCVAE essentially provides a new idea of data augmentation in various multi-modal scenarios by exploiting the complement of different modality prior information. Extensive experimental results demonstrate our work achieves state-of-the-art performances on miniImageNet, CIFAR-FS and CUB datasets, and is able to work well in the partial modality absence case.
【27】 Enhancing the Generalization for Intent Classification and Out-of-Domain Detection in SLU 标题:增强SLU中意图分类和域外检测的泛化能力
作者:Yilin Shen,Yen-Chang Hsu,Avik Ray,Hongxia Jin 机构:Samsung Research America 链接:https://arxiv.org/abs/2106.14464 摘要:意图分类是口语理解中的一项重要任务。由于大多数模型都是用预先收集的域内(IND)训练话语建立的,因此它们检测不受支持的域外(OOD)话语的能力在实际应用中起着至关重要的作用。最近的研究表明,使用额外的数据和标签可以提高OOD的检测性能,但是收集这些数据的成本可能会很高。该文提出了一种只训练IND数据的模型,同时支持IND-intent分类和OOD检测。我们的方法设计了一个新的域正则化模块(DRM)来减少香草分类器的过度自信现象,在这两种情况下都取得了较好的泛化效果。此外,DRM还可以作为基于神经网络的意图分类器中最后一层的替代品,提供了一种低成本的改进策略。对四个数据集的评估表明,我们基于BERT和RoBERTa模型的方法与现有方法和我们为比较创建的强大基线相比,达到了最先进的性能。 摘要:Intent classification is a major task in spoken language understanding (SLU). Since most models are built with pre-collected in-domain (IND) training utterances, their ability to detect unsupported out-of-domain (OOD) utterances has a critical effect in practical use. Recent works have shown that using extra data and labels can improve the OOD detection performance, yet it could be costly to collect such data. This paper proposes to train a model with only IND data while supporting both IND intent classification and OOD detection. Our method designs a novel domain-regularized module (DRM) to reduce the overconfident phenomenon of a vanilla classifier, achieving a better generalization in both cases. Besides, DRM can be used as a drop-in replacement for the last layer in any neural network-based intent classifier, providing a low-cost strategy for a significant improvement. The evaluation on four datasets shows that our method built on BERT and RoBERTa models achieves state-of-the-art performance against existing approaches and the strong baselines we created for the comparisons.
【28】 RadGraph: Extracting Clinical Entities and Relations from Radiology Reports 标题:RadGraph:从放射学报告中提取临床实体和关系
作者:Saahil Jain,Ashwin Agrawal,Adriel Saporta,Steven QH Truong,Du Nguyen Duong,Tan Bui,Pierre Chambon,Yuhao Zhang,Matthew P. Lungren,Andrew Y. Ng,Curtis P. Langlotz,Pranav Rajpurkar 机构:Stanford University, VinBrain, VinUniversity 链接:https://arxiv.org/abs/2106.14463 摘要:从自由文本放射报告中提取结构化的临床信息可以使放射报告信息用于各种重要的医疗保健应用程序。在我们的工作中,我们提出了RadGraph,一个数据集的实体和关系在全文胸部X射线放射学报告的基础上,我们设计了一个新的信息提取模式结构放射学报告。我们发布了一个开发数据集,该数据集包含来自MIMIC-CXR数据集(14579个实体和10889个关系)的500份放射报告的经委员会认证的放射学家注释,以及一个测试数据集,它包含两组独立的board certified radiologist注解,用于100份放射报告,在MIMIC-CXR和CheXpert数据集中平均分配。利用这些数据集,我们训练并测试了一个深度学习模型RadGraph Benchmark,该模型在MIMIC-CXR和CheXpert测试集上的关系提取的micro F1分别达到0.82和0.73。此外,我们还发布了一个推断数据集,其中包含由RadGraph Benchmark自动生成的注释,这些注释跨越220763份MIMIC-CXR报告(约600万个实体和400万个关系)和500份CheXpert报告(13783个实体和9908个关系),并映射到相关胸片。我们免费提供的数据集可以促进医学自然语言处理方面的广泛研究,以及计算机视觉和多模式学习(与胸片相关)。 摘要:Extracting structured clinical information from free-text radiology reports can enable the use of radiology report information for a variety of critical healthcare applications. In our work, we present RadGraph, a dataset of entities and relations in full-text chest X-ray radiology reports based on a novel information extraction schema we designed to structure radiology reports. We release a development dataset, which contains board-certified radiologist annotations for 500 radiology reports from the MIMIC-CXR dataset (14,579 entities and 10,889 relations), and a test dataset, which contains two independent sets of board-certified radiologist annotations for 100 radiology reports split equally across the MIMIC-CXR and CheXpert datasets. Using these datasets, we train and test a deep learning model, RadGraph Benchmark, that achieves a micro F1 of 0.82 and 0.73 on relation extraction on the MIMIC-CXR and CheXpert test sets respectively. 
Additionally, we release an inference dataset, which contains annotations automatically generated by RadGraph Benchmark across 220,763 MIMIC-CXR reports (around 6 million entities and 4 million relations) and 500 CheXpert reports (13,783 entities and 9,908 relations) with mappings to associated chest radiographs. Our freely available dataset can facilitate a wide range of research in medical natural language processing, as well as computer vision and multi-modal learning when linked to chest radiographs.
【29】 Feature Combination Meets Attention: Baidu Soccer Embeddings and Transformer based Temporal Detection 标题:特征组合备受关注:百度足球嵌入和基于Transformer的时间检测
作者:Xin Zhou,Le Kang,Zhiyu Cheng,Bo He,Jingyu Xin 机构:Baidu Research, Bordeaux Dr, Sunnyvale, CA , USA 备注:Tech Report. Authors Xin Zhou, Le Kang, and Zhiyu Cheng made equal contributions 链接:https://arxiv.org/abs/2106.14447 摘要:随着互联网技术和新兴工具的迅速发展,与体育相关的在线视频以前所未有的速度增长。为了实现体育视频编辑/亮点生成过程的自动化,一个关键的任务是准确地识别和定位长视频中的事件。在这份技术报告中,我们提出了一个两阶段的范例来检测足球广播视频中发生了什么以及什么时候发生的事件。具体来说,我们对足球数据中的多个动作识别模型进行了微调,以提取高层语义特征,并设计了一个基于变换器的时态检测模块来定位目标事件。在CVPR 2021 ActivityNet研讨会的SoccerNet-v2挑战赛中,这种方法在动作捕捉和重放接地这两项任务中都取得了最先进的性能。我们的足球嵌入功能发布于https://github.com/baidu-research/vidpress-sports. 通过与更广泛的社区分享这些特征,我们希望能够加速足球视频理解的研究。 摘要:With rapidly evolving internet technologies and emerging tools, sports related videos generated online are increasing at an unprecedentedly fast pace. To automate sports video editing/highlight generation process, a key task is to precisely recognize and locate the events in the long untrimmed videos. In this tech report, we present a two-stage paradigm to detect what and when events happen in soccer broadcast videos. Specifically, we fine-tune multiple action recognition models on soccer data to extract high-level semantic features, and design a transformer based temporal detection module to locate the target events. This approach achieved the state-of-the-art performance in both two tasks, i.e., action spotting and replay grounding, in the SoccerNet-v2 Challenge, under CVPR 2021 ActivityNet workshop. Our soccer embedding features are released at https://github.com/baidu-research/vidpress-sports. By sharing these features with the broader community, we hope to accelerate the research into soccer video understanding.
【30】 Modelling Monotonic and Non-Monotonic Attribute Dependencies with Embeddings: A Theoretical Analysis 标题:带嵌入的单调和非单调属性依赖模型的理论分析
作者:Steven Schockaert 链接:https://arxiv.org/abs/2106.14431 摘要:在过去的十年中,实体嵌入在人工智能中已经变得无处不在。这种嵌入本质上是对感兴趣的实体的紧凑但语义上有意义的表示。在大多数方法中,向量用于表示实体本身,以及表示它们的相关属性。使用属性嵌入的一个重要优点是可以捕获属性之间的语义依赖关系。然而,对于什么样的语义依赖可以用这种方式建模却知之甚少。本文的目的是阐明这个问题,集中在一个实体的嵌入是通过汇集其已知属性的嵌入来获得的设置。我们特别关注的是研究不同嵌入策略的理论局限性,而不是它们在实践中有效学习属性依赖性的能力。我们首先展示了一些负面的结果,揭示了一些最流行的嵌入模型甚至不能捕获基本的Horn规则。然而,我们也发现一些嵌入策略在原则上能够模拟单调和非单调属性依赖。 摘要:During the last decade, entity embeddings have become ubiquitous in Artificial Intelligence. Such embeddings essentially serve as compact but semantically meaningful representations of the entities of interest. In most approaches, vectors are used for representing the entities themselves, as well as for representing their associated attributes. An important advantage of using attribute embeddings is that (some of the) semantic dependencies between the attributes can thus be captured. However, little is known about what kinds of semantic dependencies can be modelled in this way. The aim of this paper is to shed light on this question, focusing on settings where the embedding of an entity is obtained by pooling the embeddings of its known attributes. Our particular focus is on studying the theoretical limitations of different embedding strategies, rather than their ability to effectively learn attribute dependencies in practice. We first show a number of negative results, revealing that some of the most popular embedding models are not able to capture even basic Horn rules. However, we also find that some embedding strategies are capable, in principle, of modelling both monotonic and non-monotonic attribute dependencies.
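A minimal sketch of the pooling setting this abstract analyzes: an entity's embedding is obtained by pooling (here, averaging or summing) the embeddings of its known attributes. The toy attribute vectors are illustrative assumptions, not learned parameters.

```python
import numpy as np

# Toy attribute embeddings (illustrative, not learned).
attribute_vecs = {
    "red": np.array([1.0, 0.0]),
    "round": np.array([0.0, 1.0]),
}

def entity_embedding(attributes, pool="mean"):
    """Pool the embeddings of an entity's known attributes into one vector."""
    vecs = np.stack([attribute_vecs[a] for a in attributes])
    return vecs.mean(axis=0) if pool == "mean" else vecs.sum(axis=0)

e = entity_embedding(["red", "round"])
```

The paper's question is which attribute dependencies (e.g. Horn rules such as "red and round implies apple") can in principle be expressed as constraints on vectors pooled this way.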
【31】 Capturing the temporal constraints of gradual patterns 标题:捕捉渐变模式的时间约束
作者:Dickson Odhiambo Owuor 机构:Sous la direction de Prof. Anne LAURENT et Dr. Joseph Onderi ORERO; Devant le jury composé de: Anne Laurent, Professeur, LIRMM, Université de Montpellier, Directrice; Joseph Onderi Orero, Maître de Conférence, FIT, Strathmore University, Co-encadrant 备注:155 pages, Doctoral thesis, Montpellier 链接:https://arxiv.org/abs/2106.14417 摘要:渐进模式挖掘允许通过渐进规则(如:"X越多,Y越多")提取属性相关性。这种相关性在识别和隔离属性之间的关系时很有用,这些属性通过对数据集的快速扫描可能不明显。例如,研究人员可以应用渐进模式挖掘来确定数据集的哪些属性表现出不熟悉的相关性,以便将它们分离出来进行更深入的探索或分析。在这项工作中,我们提出了一种蚁群优化技术,它使用一种流行的概率方法,模仿生物蚂蚁寻找食物的最短路径来解决组合问题。在我们的第二个贡献中,我们扩展了现有的渐进模式挖掘技术,允许提取渐进模式以及受影响的渐进项目集之间的近似时间延迟。这种模式被称为模糊时间渐进模式,其形式可能是:"X越多,Y越多,大约3个月后"。 摘要:Gradual pattern mining allows for extraction of attribute correlations through gradual rules such as: "the more X, the more Y". Such correlations are useful in identifying and isolating relationships among the attributes that may not be obvious through quick scans on a data set. For instance, a researcher may apply gradual pattern mining to determine which attributes of a data set exhibit unfamiliar correlations in order to isolate them for deeper exploration or analysis. In this work, we propose an ant colony optimization technique which uses a popular probabilistic approach that mimics the behavior of biological ants as they search for the shortest path to food, in order to solve combinatorial problems. In our second contribution, we extend an existing gradual pattern mining technique to allow for extraction of gradual patterns together with an approximated temporal lag between the affected gradual item sets. Such a pattern is referred to as a fuzzy-temporal gradual pattern and it may take the form: "the more X, the more Y, almost 3 months later".
In our third contribution, we propose a data crossing model that allows for integration of mostly gradual pattern mining algorithm implementations into a Cloud platform. This contribution is motivated by the proliferation of IoT applications in almost every area of our society and this comes with provision of large-scale time-series data from different sources.
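To illustrate what a gradual rule such as "the more X, the more Y" asserts, its support can be approximated as the fraction of concordant object pairs (both attributes move in the same direction). This is a generic sketch of the rule semantics, not the thesis's ant-colony or fuzzy-temporal algorithm.

```python
from itertools import combinations

def gradual_support(xs, ys):
    """Fraction of object pairs concordant with 'the more X, the more Y'."""
    pairs = list(combinations(range(len(xs)), 2))
    concordant = sum(
        1 for i, j in pairs
        if (xs[i] - xs[j]) * (ys[i] - ys[j]) > 0  # both increase or both decrease
    )
    return concordant / len(pairs)

# All four objects respect the rule, so support is 1.0.
support = gradual_support([1, 2, 3, 4], [10, 20, 25, 40])
```

A mining algorithm would keep only those rules whose support exceeds a user-chosen threshold; the fuzzy-temporal extension additionally estimates the lag between the two attribute series.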
【32】 Single RGB-D Camera Teleoperation for General Robotic Manipulation 标题:用于通用机器人操作的单RGB-D摄像机遥操作
作者:Quan Vuong,Yuzhe Qin,Runlin Guo,Xiaolong Wang,Hao Su,Henrik Christensen 机构:University of California San Diego 链接:https://arxiv.org/abs/2106.14396 摘要:提出了一种利用单台RGB-D摄像机作为人体运动捕捉设备的遥操作系统。我们的系统可以执行一般的操作任务,如折叠布、锤击和3毫米间隙的插孔装配。我们建议使用非笛卡尔斜坐标系、动态运动缩放和操作员坐标系的重新定位来增加遥操作系统的灵活性。我们假设,降低遥操作的进入门槛将允许更广泛地部署有监督的自治系统,这将反过来生成现实的数据集,从而释放机器学习用于机器人操作的潜力。我们的系统演示可在线获取:https://sites.google.com/view/manipulation-teleop-with-rgbd 摘要:We propose a teleoperation system that uses a single RGB-D camera as the human motion capture device. Our system can perform general manipulation tasks such as cloth folding, hammering and 3 mm clearance peg-in-hole insertion. We propose the use of a non-Cartesian oblique coordinate frame, dynamic motion scaling and repositioning of operator frames to increase the flexibility of our teleoperation system. We hypothesize that lowering the barrier of entry to teleoperation will allow for wider deployment of supervised autonomy systems, which will in turn generate realistic datasets that unlock the potential of machine learning for robotic manipulation. A demo of our system is available online at https://sites.google.com/view/manipulation-teleop-with-rgbd
【33】 Word2Box: Learning Word Representation Using Box Embeddings 标题:Word2Box:使用Box嵌入学习单词表示
作者:Shib Sankar Dasgupta,Michael Boratko,Shriya Atmakuri,Xiang Lorraine Li,Dhruvesh Patel,Andrew McCallum 机构:College of Information and Computer Sciences, University of Massachusetts, Amherst 备注:Work in progress 链接:https://arxiv.org/abs/2106.14361 摘要:词汇的学习向量表示是自然语言处理中最基本的主题之一,它能够捕捉在各种自然语言处理任务中有用的句法和语义关系。然而,向量表示可能会受到限制,因为点积相似性等典型的评分将向量在空间中的位置和大小交织在一起。表征学习领域的令人兴奋的创新已经提出了替代的基本表征,如分布、双曲向量或区域。我们的模型Word2Box采用了一种基于区域的方法来解决单词表示问题,将单词表示为$n$维的矩形。这些表示独立地编码位置和宽度,并提供额外的几何操作,如交集和包含,这使得它们能够对向量难以处理的共现模式进行建模。我们展示了在各种单词相似性任务上的改进性能,特别是在不太常见的单词上,并对Word2Box提供的额外独特表达能力进行了定性分析。 摘要:Learning vector representations for words is one of the most fundamental topics in NLP, capable of capturing syntactic and semantic relationships useful in a variety of downstream NLP tasks. Vector representations can be limiting, however, in that typical scoring such as dot product similarity intertwines position and magnitude of the vector in space. Exciting innovations in the space of representation learning have proposed alternative fundamental representations, such as distributions, hyperbolic vectors, or regions. Our model, Word2Box, takes a region-based approach to the problem of word representation, representing words as $n$-dimensional rectangles. These representations encode position and breadth independently and provide additional geometric operations such as intersection and containment which allow them to model co-occurrence patterns vectors struggle with. We demonstrate improved performance on various word similarity tasks, particularly on less common words, and perform a qualitative analysis exploring the additional unique expressivity provided by Word2Box.
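To illustrate the geometric operations the abstract mentions, here is a minimal NumPy sketch of axis-aligned box intersection and volume; the 2-d coordinates are toy values, not trained Word2Box parameters.

```python
import numpy as np

def box_volume(lo, hi):
    """Volume of an axis-aligned box; empty (inverted) boxes get volume 0."""
    return float(np.prod(np.clip(hi - lo, 0.0, None)))

def box_intersection(lo1, hi1, lo2, hi2):
    """Intersection of two axis-aligned boxes as (lo, hi) corners."""
    return np.maximum(lo1, lo2), np.minimum(hi1, hi2)

# Two 2-d "word boxes" with toy coordinates.
lo1, hi1 = np.array([0.0, 0.0]), np.array([2.0, 2.0])
lo2, hi2 = np.array([1.0, 1.0]), np.array([3.0, 3.0])
lo, hi = box_intersection(lo1, hi1, lo2, hi2)
overlap = box_volume(lo, hi)  # area of the overlapping region
```

Because position (corners) and breadth (side lengths) are independent, overlap volumes can model asymmetric co-occurrence patterns, e.g. one word's box containing another's, which a dot product between point vectors cannot express.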
【34】 Unsupervised Skill Discovery with Bottleneck Option Learning 标题:具有瓶颈选项学习的无监督技能发现
作者:Jaekyeom Kim,Seohong Park,Gunhee Kim 机构:Department of Computer Science and Engineering, Seoul National University 备注:Accepted to ICML 2021. Code at this https URL 链接:https://arxiv.org/abs/2106.14305 摘要:像人类一样,在没有任何外部奖励或监督的情况下从环境中获得固有技能的能力,是一个重要的问题。我们提出了一种新的无监督技能发现方法:信息瓶颈选项学习(IBOL)。除了环境的线性化可以促进更多样、更远距离的状态转换之外,IBOL还能够发现多样化的技能。它利用信息瓶颈框架为所学技能提供抽象,使所得选项具有更好的稳定性,并鼓励解耦。我们的经验证明,IBOL在MuJoCo环境(包括Ant、HalfCheetah、Hopper和D'Kitty)中的信息论评估和下游任务上优于多种最先进的无监督技能发现方法。 摘要:Having the ability to acquire inherent skills from environments without any external rewards or supervision, as humans do, is an important problem. We propose a novel unsupervised skill discovery method named Information Bottleneck Option Learning (IBOL). On top of the linearization of environments that promotes more various and distant state transitions, IBOL enables the discovery of diverse skills. It provides the abstraction of the skills learned with the information bottleneck framework for the options with improved stability and encouraged disentanglement. We empirically demonstrate that IBOL outperforms multiple state-of-the-art unsupervised skill discovery methods on the information-theoretic evaluations and downstream tasks in MuJoCo environments, including Ant, HalfCheetah, Hopper and D'Kitty.
【35】 ASK: Adversarial Soft k-Nearest Neighbor Attack and Defense 标题:问:对抗性软k近邻攻击与防御
作者:Ren Wang,Tianqi Chen,Philip Yao,Sijia Liu,Indika Rajapakse,Alfred Hero 机构:University of Michigan, Michigan State University 链接:https://arxiv.org/abs/2106.14300 摘要:基于K近邻(kNN)的深度学习方法由于其简单性和几何可解释性而被广泛应用。然而,基于kNN的分类模型的鲁棒性还没有得到充分的研究,kNN攻击策略还不成熟。在本文中,我们提出了一种对抗性的软kNN(ASK)损失,以设计更有效的kNN攻击策略,并开发更好的防御。我们的ASK-loss方法有两个优点。首先,ASK-loss比以往提出的目标更能逼近kNN的分类错误概率。其次,ASK损失是可解释的:它保留了扰动输入和未扰动输入的kNN之间的互信息。我们利用ASK丢失产生了一种新的攻击方法ASK攻击(ASK-Atk),它比以往的kNN攻击具有更高的攻击效率和准确率。在ASK-Atk算法的基础上,我们提出了一种ASK-Def算法来优化ASK-Atk引起的最坏训练损失。 摘要:K-Nearest Neighbor (kNN)-based deep learning methods have been applied to many applications due to their simplicity and geometric interpretability. However, the robustness of kNN-based classification models has not been thoroughly explored and kNN attack strategies are underdeveloped. In this paper, we propose an Adversarial Soft kNN (ASK) loss to both design more effective kNN attack strategies and to develop better defenses against them. Our ASK loss approach has two advantages. First, ASK loss can better approximate the kNN's probability of classification error than objectives proposed in previous works. Second, the ASK loss is interpretable: it preserves the mutual information between the perturbed input and the kNN of the unperturbed input. We use the ASK loss to generate a novel attack method called the ASK-Attack (ASK-Atk), which shows superior attack efficiency and accuracy degradation relative to previous kNN attacks. Based on the ASK-Atk, we then derive an ASK-Defense (ASK-Def) method that optimizes the worst-case training loss induced by ASK-Atk.
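A minimal sketch of a soft kNN class probability of the kind the ASK loss builds on: neighbors are weighted by a softmax-style function of their distance to the query, so the classification error becomes differentiable. The temperature and toy data are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def soft_knn_probs(query, neighbors, labels, num_classes, temperature=1.0):
    """Soft kNN class probabilities: distance-weighted vote over neighbors."""
    d = np.linalg.norm(neighbors - query, axis=1)  # Euclidean distances
    w = np.exp(-d / temperature)                   # closer neighbors weigh more
    w /= w.sum()
    probs = np.zeros(num_classes)
    for wi, yi in zip(w, labels):
        probs[yi] += wi
    return probs

neighbors = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 2.0]])
labels = [0, 0, 1]
p = soft_knn_probs(np.array([0.0, 0.1]), neighbors, labels, num_classes=2)
```

Because the soft probabilities are differentiable in the query, an attacker can ascend a loss on them to craft perturbations (ASK-Atk), and a defender can minimize the same worst-case loss during training (ASK-Def).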
【36】 Pairing Conceptual Modeling with Machine Learning 标题:将概念建模与机器学习配对
作者:Wolfgang Maass,Veda C. Storey 机构:German Research Center for Artificial Intelligence (DFKI), Saarbrücken, Germany, Saarland University, Saarland Informatics Campus, Saarbrücken, Germany, J. Mack Robinson College of Business, Georgia State University 备注:None 链接:https://arxiv.org/abs/2106.14251 摘要:概念建模和机器学习一直被认为是重要的研究领域。随着越来越重视为商业和其他应用数字化和处理大量数据,考虑这些研究领域如何相互补充将是有益的。为了理解它们如何配对,我们提供了机器学习基础和开发周期的概述。然后,我们研究了如何将概念建模应用于机器学习,并提出了一个将概念建模纳入数据科学项目的框架。通过将该框架应用于一个医疗保健应用程序来说明该框架。对于逆配对,机器学习可以通过文本和规则挖掘以及知识图来影响概念建模。以这种方式将概念建模与机器学习结合起来,为以后的研究打下基础。 摘要:Both conceptual modeling and machine learning have long been recognized as important areas of research. With the increasing emphasis on digitizing and processing large amounts of data for business and other applications, it would be helpful to consider how these areas of research can complement each other. To understand how they can be paired, we provide an overview of machine learning foundations and development cycle. We then examine how conceptual modeling can be applied to machine learning and propose a framework for incorporating conceptual modeling into data science projects. The framework is illustrated by applying it to a healthcare application. For the inverse pairing, machine learning can impact conceptual modeling through text and rule mining, as well as knowledge graphs. The pairing of conceptual modeling and machine learning in this way should help lay the foundations for future research.
【37】 AI based Presentation Creator With Customized Audio Content Delivery 标题:基于AI的演示文稿创建器,支持定制音频内容交付
作者:Muvazima Mansoor,Srikanth Chandar,Ramamoorthy Srinath 机构:ECE, PES University, Bengaluru, India, CSE 链接:https://arxiv.org/abs/2106.14213 摘要:在本文中,我们提出了一个架构来解决一个新的问题陈述,这个问题陈述在最近随着COVID-19流行对虚拟内容交付需求的增加而更加突出。所有的教育机构、工作场所、研究中心等都在试图通过使用在线内容传递来弥合这个社会距离遥远的时代的沟通鸿沟。现在的趋势是创建演示文稿,然后使用各种虚拟会议平台进行演示。我们试图通过本文来减少和消除创建演示文稿和交付演示文稿所花费的时间,本论文旨在使用机器学习(ML)算法和自然语言处理(NLP)模块来自动化从文档创建基于幻灯片的演示文稿的过程,然后使用最先进的语音克隆模型,以所需作者的声音传递内容。我们认为结构化文档(如研究论文)是必须呈现的内容。研究论文首先使用BERT摘要技术进行总结,并浓缩成幻灯片中的要点。以Tacotron为灵感的架构,带有编码器、合成器和基于生成对抗网络(GAN)的声码器,用于以作者的声音(或任何定制的声音)来传达幻灯片的内容。现在,几乎所有的学习都转向了在线模式,专业人士现在在舒适的家里工作。由于目前的情况,教师和专业人员已转向介绍,以帮助他们传授信息。在本文中,我们的目标是通过自动化创建演示文稿的过程并随后以定制的声音交付演示文稿,从而减少创建演示文稿所需的大量时间,使用一种可以使用短音频片段克隆任何声音的内容交付机制。 摘要:In this paper, we propose an architecture to solve a novel problem statement that has stemmed more so in recent times with an increase in demand for virtual content delivery due to the COVID-19 pandemic. All educational institutions, workplaces, research centers, etc. are trying to bridge the gap of communication during these socially distanced times with the use of online content delivery. The trend now is to create presentations, and then subsequently deliver the same using various virtual meeting platforms. The time being spent in such creation of presentations and delivering is what we try to reduce and eliminate through this paper which aims to use Machine Learning (ML) algorithms and Natural Language Processing (NLP) modules to automate the process of creating a slides-based presentation from a document, and then use state-of-the-art voice cloning models to deliver the content in the desired author's voice. We consider a structured document such as a research paper to be the content that has to be presented. The research paper is first summarized using BERT summarization techniques and condensed into bullet points that go into the slides. 
A Tacotron-inspired architecture with an encoder, a synthesizer, and a Generative Adversarial Network (GAN) based vocoder is used to convey the contents of the slides in the author's voice (or any customized voice). Almost all learning has now been shifted to online mode, and professionals are now working from the comfort of their homes. Due to the current situation, teachers and professionals have shifted to presentations to help them in imparting information. In this paper, we aim to reduce the considerable amount of time that is taken in creating a presentation by automating this process and subsequently delivering this presentation in a customized voice, using a content delivery mechanism that can clone any voice using a short audio clip.
【38】 Learning to solve geometric construction problems from images 标题:学习从图像中解决几何构造问题
作者:J. Macke,J. Sedlar,M. Olsak,J. Urban,J. Sivic 机构:Charles University in Prague, Czech Republic; Czech Technical University in Prague; University of Innsbruck 备注:16 pages, 7 figures, 3 tables 链接:https://arxiv.org/abs/2106.14195 摘要:我们描述了一种纯基于图像的方法,用于在Euclidea几何游戏中寻找尺规几何作图。该方法在最新的Mask R-CNN图像处理神经网络结构的基础上进行改造,并加入了基于树的搜索过程。在有监督的设置下,该方法从Euclidea的前六个关卡包中学习求解全部68种几何作图问题,平均准确率为92%。在新类型的问题上进行评估时,该方法可以解决68种Euclidea问题中的31种。我们相信,这是首次训练纯基于图像的学习方法来解决如此难度的几何作图问题。 摘要:We describe a purely image-based method for finding geometric constructions with a ruler and compass in the Euclidea geometric game. The method is based on adapting the Mask R-CNN state-of-the-art image processing neural architecture and adding a tree-based search procedure to it. In a supervised setting, the method learns to solve all 68 kinds of geometric construction problems from the first six level packs of Euclidea with an average 92% accuracy. When evaluated on new kinds of problems, the method can solve 31 of the 68 kinds of Euclidea problems. We believe that this is the first time that a purely image-based learning has been trained to solve geometric construction problems of this difficulty.
【39】 PeCoQ: A Dataset for Persian Complex Question Answering over Knowledge Graph 标题:PeCoQ:基于知识图的波斯复杂问答数据集
作者:Romina Etezadi,Mehrnoush Shamsfard 机构:Natural Language Processing Lab, Shahid Beheshti University, Tehran, Iran 备注:5 pages, 4 figures 链接:https://arxiv.org/abs/2106.14167 摘要:问答系统可以从非结构化文本或结构化数据(如知识图)中找到用户问题的答案。使用包括深度学习模型在内的监督学习方法回答问题需要大量的训练数据集。近年来,针对知识图上的问答任务(本文的重点)提出了一些数据集。虽然已有许多英语数据集,但波斯语问答数据集还很少。本文介绍了波斯语问答数据集PeCoQ。该数据集包含10000个复杂问题及其答案,它们提取自波斯语知识图FarsBase。对于每个问题,还提供了SPARQL查询和两个由语言学家编写的释义。数据集中包含不同类型的复杂性,如多关系、多实体、顺序和时间约束。在本文中,我们讨论了数据集的特点,并描述了构建数据集的方法。 摘要:Question answering systems may find the answers to users' questions from either unstructured texts or structured data such as knowledge graphs. Answering questions using supervised learning approaches including deep learning models need large training datasets. In recent years, some datasets have been presented for the task of Question answering over knowledge graphs, which is the focus of this paper. Although many datasets in English were proposed, there have been a few question-answering datasets in Persian. This paper introduces PeCoQ, a dataset for Persian question answering. This dataset contains 10,000 complex questions and answers extracted from the Persian knowledge graph, FarsBase. For each question, the SPARQL query and two paraphrases that were written by linguists are provided as well. There are different types of complexities in the dataset, such as multi-relation, multi-entity, ordinal, and temporal constraints. In this paper, we discuss the dataset's characteristics and describe our methodology for building it.
【40】 Effective Cascade Dual-Decoder Model for Joint Entity and Relation Extraction 标题:一种有效的联合实体和关系提取的级联双解码器模型
作者:Lianbo Ma,Huimin Ren,Xiliang Zhang 链接:https://arxiv.org/abs/2106.14163 摘要:从文本中提取关系三元组是知识图构造中的一项基本任务。现有方法普遍采用单一模型联合提取实体和关系,这种方法往往存在三元组重叠问题,即在一个句子中有多个关系三元组共享相同的实体。在这项工作中,我们提出了一种有效的级联双解码器方法来提取重叠的关系三元组,其中包括一个文本特定的关系解码器和一个关系对应的实体解码器。我们的方法很简单:文本特定关系解码器根据句子的文本语义检测句子中的关系,并将其作为额外的特征来指导实体提取;对于每个带有可训练嵌入的已提取关系,关系对应实体解码器使用基于跨度的标记方案来检测对应的头实体和尾实体。这样就很自然地解决了三元组重叠问题。在两个公共数据集上的实验表明,在严格的评价指标下,该方法优于现有方法,并获得了更好的F1分数。我们的实现可在https://github.com/prastunlp/DualDec. 摘要:Extracting relational triples from texts is a fundamental task in knowledge graph construction. The popular way of existing methods is to jointly extract entities and relations using a single model, which often suffers from the overlapping triple problem. That is, there are multiple relational triples that share the same entities within one sentence. In this work, we propose an effective cascade dual-decoder approach to extract overlapping relational triples, which includes a text-specific relation decoder and a relation-corresponded entity decoder. Our approach is straightforward: the text-specific relation decoder detects relations from a sentence according to its text semantics and treats them as extra features to guide the entity extraction; for each extracted relation, which is with trainable embedding, the relation-corresponded entity decoder detects the corresponding head and tail entities using a span-based tagging scheme. In this way, the overlapping triple problem is tackled naturally. Experiments on two public datasets demonstrate that our proposed approach outperforms state-of-the-art methods and achieves better F1 scores under the strict evaluation metric. Our implementation is available at https://github.com/prastunlp/DualDec.
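The span-based tagging scheme mentioned in the abstract can be illustrated with a tiny decoder: threshold per-token start/end probabilities and pair each start with the nearest following end. This is a generic sketch of span tagging, not the authors' exact decoder; the threshold and pairing rule are assumptions.

```python
def decode_spans(start_probs, end_probs, threshold=0.5):
    """Span-based tagging decode: mark tokens whose start/end probabilities
    exceed a threshold, then pair each start with the nearest following end
    to obtain (head, tail) entity spans."""
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [i for i, p in enumerate(end_probs) if p >= threshold]
    spans = []
    for s in starts:
        following = [e for e in ends if e >= s]
        if following:
            spans.append((s, min(following)))
    return spans
```

In the paper's setting, one such decode would be run per extracted relation, conditioned on that relation's embedding.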
【41】 Continuous Control with Deep Reinforcement Learning for Autonomous Vessels 标题:基于深度强化学习的自治船舶连续控制
作者:Nader Zare,Bruno Brandoli,Mahtab Sarvmaili,Amilcar Soares,Stan Matwin 机构:Institute for Big Data Analytics, Dalhousie University, Halifax, Department of Computer Science, Memorial University of Newfoundland, St. John's, Institute for Computer Science, Polish Academy of Sciences, Warsaw 链接:https://arxiv.org/abs/2106.14130 摘要:海上自主运输在世界经济全球化进程中发挥了重要作用。深度强化学习(DRL)已被应用于自动路径规划,以模拟船舶在公海中的避碰情况。直接从输入中学习复杂映射的端到端方法在不同环境下达到目标的泛化能力较差。在这项工作中,我们提出了一种称为状态-动作旋转的新策略,通过旋转获得的经验(状态-动作-状态)并将其保存在重放缓冲区中,来提高智能体在未见情况下的性能。我们设计了基于深度确定性策略梯度、局部视图生成器和规划器的模型。我们的智能体使用两个深度卷积神经网络来估计策略和动作价值函数。所提出的模型经过了详尽的训练,并用蒙特利尔和哈利法克斯等城市的真实地图在海上场景中进行了测试。实验结果表明,在CVN之上加入状态-动作旋转,相对于具有规划器和局部视图的船舶导航器(VNPLV),将到达目的地率(RATD)最多提高了11.96%,并在未见过的地图上取得最多30.82%的性能提升。我们提出的方法在新环境中测试时表现出鲁棒性方面的优势,支持了通过状态-动作旋转实现泛化的思想。 摘要:Maritime autonomous transportation has played a crucial role in the globalization of the world economy. Deep Reinforcement Learning (DRL) has been applied to automatic path planning to simulate vessel collision avoidance situations in open seas. End-to-end approaches that learn complex mappings directly from the input have poor generalization to reach the targets in different environments. In this work, we present a new strategy called state-action rotation to improve agent's performance in unseen situations by rotating the obtained experience (state-action-state) and preserving them in the replay buffer. We designed our model based on Deep Deterministic Policy Gradient, local view maker, and planner. Our agent uses two deep Convolutional Neural Networks to estimate the policy and action-value functions. The proposed model was exhaustively trained and tested in maritime scenarios with real maps from cities such as Montreal and Halifax.
Experimental results show that the state-action rotation on top of the CVN consistently improves the rate of arrival to a destination (RATD) by up to 11.96% with respect to the Vessel Navigator with Planner and Local View (VNPLV), and achieves superior performance in unseen mappings by up to 30.82%. Our proposed approach exhibits advantages in terms of robustness when tested in a new environment, supporting the idea that generalization can be achieved by using state-action rotation.
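The state-action rotation idea (rotate each stored transition and keep all rotated copies in the replay buffer) can be sketched in a few lines, here simplified to planar 2-D states and actions; the real state representation in the paper is richer, so this is an illustrative assumption.

```python
import numpy as np

def rot90(v, k):
    """Rotate a 2-D vector by k * 90 degrees counter-clockwise."""
    c, s = [(1, 0), (0, 1), (-1, 0), (0, -1)][k % 4]
    x, y = v
    return np.array([c * x - s * y, s * x + c * y])

def augment_transition(state, action, next_state):
    """State-action rotation: return the original transition plus its three
    90-degree rotations; all four are stored in the replay buffer."""
    return [(rot90(state, k), rot90(action, k), rot90(next_state, k))
            for k in range(4)]

replay_buffer = []
replay_buffer.extend(
    augment_transition(np.array([1.0, 0.0]),   # state: planar position
                       np.array([0.0, 1.0]),   # action: planar heading change
                       np.array([1.0, 1.0])))  # next state
```

Each environment step thus yields four buffer entries, which is what lets the agent reuse experience in geometrically transformed (unseen) situations.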
【42】 Visual Conceptual Blending with Large-scale Language and Vision Models 标题:大规模语言和视觉模型的视觉概念融合
作者:Songwei Ge,Devi Parikh 机构:University of Maryland, Facebook AI Research & Georgia Tech 链接:https://arxiv.org/abs/2106.14127 摘要:我们提出这样一个问题:最近的大规模语言和图像生成模型能在多大程度上融合视觉概念?给定一个任意的对象,我们识别一个相关的对象,并使用一个语言模型生成两个对象混合的一个句子描述。然后,我们使用基于文本的图像生成模型生成混合的视觉描述。定量和定性评价表明,语言模型优于传统的概念整合方法,最近的大规模图像生成模型优于先前的视觉描述模型。 摘要:We ask the question: to what extent can recent large-scale language and image generation models blend visual concepts? Given an arbitrary object, we identify a relevant object and generate a single-sentence description of the blend of the two using a language model. We then generate a visual depiction of the blend using a text-based image generation model. Quantitative and qualitative evaluations demonstrate the superiority of language models over classical methods for conceptual blending, and of recent large-scale image generation models over prior models for the visual depiction.
【43】 Graph Convolutional Memory for Deep Reinforcement Learning 标题:深度强化学习的图卷积记忆
作者:Steven D. Morad,Stephan Liwicki,Amanda Prorok 机构:Department of Computer Science and Technology, University of Cambridge UK, Toshiba Europe Ltd. 链接:https://arxiv.org/abs/2106.14117 摘要:解决部分可观测马尔可夫决策过程(POMDPs)是将深度强化学习(DRL)应用于现实世界机器人问题的关键,在现实世界中,agent对世界的看法是不完全的。提出了一种利用深度强化学习求解POMDPs的图卷积存储器(GCM)。与递归神经网络(RNN)或变换器不同,GCM通过知识图将特定领域的先验知识嵌入到记忆召回过程中。通过在图中封装先验知识,GCM可以适应特定的任务,但仍然适用于任何DRL任务。利用图卷积,GCM提取层次图特征,类似于卷积神经网络(CNN)中的图像特征。我们发现GCM在控制、长期非顺序回忆和3D导航任务上优于长-短期记忆(LSTM)、强化学习的门控Transformer(GTrXL)和可微神经计算机(DNCs),同时使用的参数明显较少。 摘要:Solving partially-observable Markov decision processes (POMDPs) is critical when applying deep reinforcement learning (DRL) to real-world robotics problems, where agents have an incomplete view of the world. We present graph convolutional memory (GCM) for solving POMDPs using deep reinforcement learning. Unlike recurrent neural networks (RNNs) or transformers, GCM embeds domain-specific priors into the memory recall process via a knowledge graph. By encapsulating priors in the graph, GCM adapts to specific tasks but remains applicable to any DRL task. Using graph convolutions, GCM extracts hierarchical graph features, analogous to image features in a convolutional neural network (CNN). We show GCM outperforms long short-term memory (LSTM), gated transformers for reinforcement learning (GTrXL), and differentiable neural computers (DNCs) on control, long-term non-sequential recall, and 3D navigation tasks while using significantly fewer parameters.
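GCM extracts hierarchical graph features with graph convolutions; below is a minimal sketch of one standard Kipf-and-Welling-style graph convolution layer. The symmetric degree normalization is a common choice assumed here for illustration, not necessarily the paper's exact operator.

```python
import numpy as np

def graph_conv(A, H, W):
    """One graph convolution layer: aggregate neighbor features with a
    degree-normalized adjacency (self-loops added), then apply a linear
    map and a ReLU. A generic GCN step, not the paper's exact operator."""
    A_hat = A + np.eye(len(A))                  # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt    # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)
```

Stacking such layers over the knowledge-graph memory yields the hierarchical features the abstract compares to CNN image features.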
【44】 Time-Series Representation Learning via Temporal and Contextual Contrasting 标题:基于时间和上下文对比的时间序列表征学习
作者:Emadeldeen Eldele,Mohamed Ragab,Zhenghua Chen,Min Wu,Chee Keong Kwoh,Xiaoli Li,Cuntai Guan 机构:School of Computer Science and Engineering, Nanyang Technological University, Singapore, Institute for Infocomm Research, ASTAR, Singapore 备注:Accepted in IJCAI-21 conference ... please cite the conference version 链接:https://arxiv.org/abs/2106.14112 摘要:从时间动态的未标记时间序列数据中学习合适的表示是一项非常具有挑战性的任务。本文提出了一种基于时间和上下文对比的无监督时间序列表示学习框架(TS-TCC),用于从未标记数据中学习时间序列表示。首先,利用弱增广和强增广将原始时间序列数据转换成两种不同但相关的视图。其次,我们提出了一个新的时间对比模块,通过设计一个困难的交叉视图预测任务来学习鲁棒的时间表示。最后,为了进一步学习区分性表征,我们提出了一个基于时间对比模块的上下文对比模块。它试图最大化同一样本的不同上下文之间的相似性,同时最小化不同样本的上下文之间的相似性。在三个真实的时间序列数据集上进行了实验。结果表明,在所提出的TS-TCC学习的特征基础上训练线性分类器与有监督训练的效果相当。此外,我们提出的TS-TCC在较少的标记数据和迁移学习场景中表现出很高的效率。该代码在https://github.com/emadeldeen24/TS-TCC. 摘要:Learning decent representations from unlabeled time-series data with temporal dynamics is a very challenging task. In this paper, we propose an unsupervised Time-Series representation learning framework via Temporal and Contextual Contrasting (TS-TCC), to learn time-series representation from unlabeled data. First, the raw time-series data are transformed into two different yet correlated views by using weak and strong augmentations. Second, we propose a novel temporal contrasting module to learn robust temporal representations by designing a tough cross-view prediction task. Last, to further learn discriminative representations, we propose a contextual contrasting module built upon the contexts from the temporal contrasting module. It attempts to maximize the similarity among different contexts of the same sample while minimizing similarity among contexts of different samples. Experiments have been carried out on three real-world time-series datasets. The results manifest that training a linear classifier on top of the features learned by our proposed TS-TCC performs comparably with the supervised training. Additionally, our proposed TS-TCC shows high efficiency in few-labeled data and transfer learning scenarios. 
The code is publicly available at https://github.com/emadeldeen24/TS-TCC.
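The contextual contrasting objective above (pull contexts of the same sample together, push contexts of different samples apart) is typically implemented as an NT-Xent-style loss; the following is a hedged numpy sketch, with the temperature and exact form assumed rather than taken from the TS-TCC code.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent-style contrastive loss: z1[i] and z2[i] are context vectors
    of two views of sample i. Similarity to the paired context is maximized,
    similarity to all other contexts is minimized. Generic sketch only."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    z = np.vstack([z1, z2])                  # 2N x d stack of both views
    sim = z @ z.T / tau                      # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)           # exclude self-similarity
    n = len(z1)
    loss = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)                # index of the positive pair
        loss += -sim[i, j] + np.log(np.exp(sim[i]).sum())
    return loss / (2 * n)
```

In TS-TCC the two views come from the weak and strong augmentations of the same time series, processed through the temporal contrasting module.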
【45】 Image Classification with CondenseNeXt for ARM-Based Computing Platforms 标题:基于ARM计算平台的CondenseNeXt图像分类
作者:Priyank Kalgaonkar,Mohamed El-Sharkawy 机构:Department of Electrical and Computer Engineering, Purdue School of Engineering and Technology, Indianapolis, Indiana, USA. 备注:6 pages, 7 figures, published IEEE Conference paper 链接:https://arxiv.org/abs/2106.14102 摘要:在本文中,我们展示了我们的超高效深度卷积神经网络架构CondenseNeXt在NXP BlueBox上的实现,NXP BlueBox是一个为自动驾驶车辆开发的自主驾驶开发平台。我们证明CondenseNeXt在FLOPs方面非常高效,它是为计算资源有限的基于ARM的嵌入式计算平台设计的,无需支持CUDA的GPU即可执行图像分类。CondenseNeXt利用最先进的深度可分离卷积和模型压缩技术来实现显著的计算效率。我们在CIFAR-10、CIFAR-100和ImageNet数据集上进行了广泛的分析,以验证该卷积神经网络(CNN)结构的性能。它在三个基准数据集上实现了最先进的图像分类性能:CIFAR-10(4.79% top-1误差)、CIFAR-100(21.98% top-1误差)和ImageNet(7.91% single model, single crop top-5误差)。与CondenseNet相比,CondenseNeXt最终训练模型的尺寸减小了2.9 MB,前向FLOPs最多减少了59.98%,并且无需支持CUDA的GPU即可在基于ARM的计算平台上高效执行图像分类。 摘要:In this paper, we demonstrate the implementation of our ultra-efficient deep convolutional neural network architecture: CondenseNeXt on NXP BlueBox, an autonomous driving development platform developed for self-driving vehicles. We show that CondenseNeXt is remarkably efficient in terms of FLOPs, designed for ARM-based embedded computing platforms with limited computational resources and can perform image classification without the need of a CUDA enabled GPU. CondenseNeXt utilizes the state-of-the-art depthwise separable convolution and model compression techniques to achieve a remarkable computational efficiency. Extensive analyses are conducted on CIFAR-10, CIFAR-100 and ImageNet datasets to verify the performance of CondenseNeXt Convolutional Neural Network (CNN) architecture. It achieves state-of-the-art image classification performance on three benchmark datasets including CIFAR-10 (4.79% top-1 error), CIFAR-100 (21.98% top-1 error) and ImageNet (7.91% single model, single crop top-5 error).
CondenseNeXt achieves a final trained model size reduction of 2.9 MB and up to 59.98% reduction in forward FLOPs compared to CondenseNet and can perform image classification on ARM-based computing platforms without needing CUDA-enabled GPU support, with outstanding efficiency.
【46】 Real-time 3D Object Detection using Feature Map Flow 标题:基于特征地图流的实时三维目标检测
作者:Youshaa Murhij,Dmitry Yudin 机构:Moscow Institute of Physics and Technology, Dolgoprudny, Russia 备注:CVPR 2021 Workshop on autonomous driving (Waymo Real-time 3D Detection) 链接:https://arxiv.org/abs/2106.14101 摘要:本文提出了一种实时三维检测方法,它对深度神经模型推理的不同时间步的时空特征图进行聚合(称为特征图流,FMF)。该方法提高了基于中心(center-based)的三维检测基线的质量,并在nuScenes和Waymo基准上达到实时性能。代码位于https://github.com/YoushaaMurhij/FMFNet 摘要:In this paper, we present a real-time 3D detection approach considering time-spatial feature map aggregation from different time steps of deep neural model inference (named feature map flow, FMF). Proposed approach improves the quality of 3D detection center-based baseline and provides real-time performance on the nuScenes and Waymo benchmark. Code is available at https://github.com/YoushaaMurhij/FMFNet
【47】 Generalized Zero-Shot Learning using Multimodal Variational Auto-Encoder with Semantic Concepts 标题:基于语义概念的多模态变分自动编码器的广义零点学习
作者:Nihar Bendre,Kevin Desai,Peyman Najafirad 机构:Secure Artificial Intelligent Laboratory and Autonomy (AILA), The University of Texas at San Antonio, Texas, USA 备注:5 pages, 2 figures, 2 tables 链接:https://arxiv.org/abs/2106.14082 摘要:随着数据量的不断增加,多模态学习面临的主要挑战是标记样本的局限性。对于分类任务,元学习、Zero-Shot学习和Few-Shot学习等技术展示了基于先验知识学习新类信息的能力。最近的技术试图学习语义空间和图像空间之间的跨模态映射。然而,它们往往忽略了局部和全局的语义知识。为了克服这个问题,我们提出了一种多模态变分自动编码器(M-VAE),它可以学习图像特征和语义空间的共享潜空间。在我们的方法中,我们将多模态数据连接成单个嵌入,然后将其传递给VAE以学习潜空间。我们建议在通过解码器重建特征嵌入的过程中使用多模态损失。我们的方法能够关联不同模态,并利用局部和全局语义知识进行新样本预测。在四个基准数据集上使用MLP分类器的实验结果表明,我们提出的模型优于目前最先进的广义Zero-Shot学习方法。 摘要:With the ever-increasing amount of data, the central challenge in multimodal learning involves limitations of labelled samples. For the task of classification, techniques such as meta-learning, zero-shot learning, and few-shot learning showcase the ability to learn information about novel classes based on prior knowledge. Recent techniques try to learn a cross-modal mapping between the semantic space and the image space. However, they tend to ignore the local and global semantic knowledge. To overcome this problem, we propose a Multimodal Variational Auto-Encoder (M-VAE) which can learn the shared latent space of image features and the semantic space. In our approach we concatenate multimodal data to a single embedding before passing it to the VAE for learning the latent space. We propose the use of a multi-modal loss during the reconstruction of the feature embedding through the decoder. Our approach is capable to correlating modalities and exploit the local and global semantic knowledge for novel sample predictions. Our experimental results using a MLP classifier on four benchmark datasets show that our proposed model outperforms the current state-of-the-art approaches for generalized zero-shot learning.
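The core mechanic (concatenate the modalities into a single embedding, then encode it into a shared latent space via the reparameterization trick) can be sketched with a toy linear encoder; the dimensions and random weights here are illustrative assumptions, not the M-VAE architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, logvar):
    """VAE reparameterization trick: z = mu + sigma * eps."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

# Toy multimodal batch: 8-d image features and 4-d semantic embeddings
# (dimensions are illustrative assumptions).
img_feat = rng.standard_normal((5, 8))
sem_feat = rng.standard_normal((5, 4))

# Concatenate modalities into a single embedding before encoding.
x = np.concatenate([img_feat, sem_feat], axis=1)

# Minimal linear encoder producing mean and log-variance of a 3-d latent.
W_mu = rng.standard_normal((12, 3)) * 0.1
W_logvar = rng.standard_normal((12, 3)) * 0.1
mu, logvar = x @ W_mu, x @ W_logvar

z = reparameterize(mu, logvar)   # shared latent code for both modalities
```

A decoder would then reconstruct the concatenated embedding from `z`, with the multi-modal reconstruction loss applied there.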
【48】 Model-Advantage Optimization for Model-Based Reinforcement Learning 标题:基于模型的强化学习的模型优势优化
作者:Nirbhay Modhe,Harish Kamath,Dhruv Batra,Ashwin Kalyan 机构:Georgia Tech, Allen Institute for AI 链接:https://arxiv.org/abs/2106.14080 摘要:传统上,基于模型的强化学习(MBRL)算法的设计目标是学习环境的精确动态。这导致了模型学习的目标与寻找最优策略的整体学习问题之间的不匹配。价值感知模型学习是最大似然法的一种替代模型学习范式,它通过所学策略的价值函数为模型学习提供信息。虽然这一范式在理论上是合理的,但它无法扩展到玩具设置之外。在这项工作中,我们提出了一个新的价值感知目标,它是策略在两个模型之间绝对性能差异的上界。此外,我们提出了一个修改标准MBRL流程的通用算法,使其能够使用价值感知目标进行学习。我们提出的目标与该算法相结合,是价值感知MBRL在具有挑战性的连续控制环境中的首个成功实例,其性能优于以往的价值感知目标,并与基于MLE的MBRL方法相比具有竞争力。 摘要:Model-based Reinforcement Learning (MBRL) algorithms have been traditionally designed with the goal of learning accurate dynamics of the environment. This introduces a mismatch between the objectives of model-learning and the overall learning problem of finding an optimal policy. Value-aware model learning, an alternative model-learning paradigm to maximum likelihood, proposes to inform model-learning through the value function of the learnt policy. While this paradigm is theoretically sound, it does not scale beyond toy settings. In this work, we propose a novel value-aware objective that is an upper bound on the absolute performance difference of a policy across two models. Further, we propose a general purpose algorithm that modifies the standard MBRL pipeline -- enabling learning with value aware objectives. Our proposed objective, in conjunction with this algorithm, is the first successful instantiation of value-aware MBRL on challenging continuous control environments, outperforming previous value-aware objectives and with competitive performance w.r.t. MLE-based MBRL approaches.
【49】 Vision-driven Compliant Manipulation for Reliable, High-Precision Assembly Tasks 标题:视觉驱动的顺应性操作,实现可靠、高精度的装配任务
作者:Andrew S. Morgan,Bowen Wen,Junchi Liang,Abdeslam Boularias,Aaron M. Dollar,Kostas Bekris 机构:Deparmtent of Mechanical Engineering and Materials Science, Yale University, USA, Department of Computer Science, Rutgers University, USA 链接:https://arxiv.org/abs/2106.14070 摘要:高度受限的操作任务对于自主机器人来说仍然是一个挑战,因为它们需要高水平的精度,通常小于1毫米,这与传统的感知系统所能达到的目标是不相容的。本文证明了将最先进的目标跟踪技术与被动自适应机械硬件相结合,可以在工业相关公差(0.25mm)很小的情况下完成精密操作任务。所提出的控制方法通过视觉跟踪相关工作空间中物体的相对6D姿态来实现闭环控制。它通过手内操作调整柔顺机械手和手的控制基准,完成物体插入任务。与以前的插入工作相反,我们的方法不需要昂贵的力传感器、精确的机械手,也不需要耗时的在线学习,因为在线学习需要大量的数据。相反,这项工作利用了机械柔顺性,并利用了离线学习的手的对象不可知操作模型、现成的运动规划和仅用合成数据训练的基于RGBD的对象跟踪器。这些特性使得所提出的系统可以很容易地推广和转移到新的任务和环境中。本文详细描述了该系统的组成部分,并通过大量实验证明了其有效性,包括各种几何图形的紧公差孔内插钉任务以及开放世界约束放置任务。 摘要:Highly constrained manipulation tasks continue to be challenging for autonomous robots as they require high levels of precision, typically less than 1mm, which is often incompatible with what can be achieved by traditional perception systems. This paper demonstrates that the combination of state-of-the-art object tracking with passively adaptive mechanical hardware can be leveraged to complete precision manipulation tasks with tight, industrially-relevant tolerances (0.25mm). The proposed control method closes the loop through vision by tracking the relative 6D pose of objects in the relevant workspace. It adjusts the control reference of both the compliant manipulator and the hand to complete object insertion tasks via within-hand manipulation. Contrary to previous efforts for insertion, our method does not require expensive force sensors, precision manipulators, or time-consuming, online learning, which is data hungry. Instead, this effort leverages mechanical compliance and utilizes an object agnostic manipulation model of the hand learned offline, off-the-shelf motion planning, and an RGBD-based object tracker trained solely with synthetic data. These features allow the proposed system to easily generalize and transfer to new tasks and environments. 
This paper describes in detail the system components and showcases its efficacy with extensive experiments involving tight tolerance peg-in-hole insertion tasks of various geometries as well as open-world constrained placement tasks.
【50】 A Neural-symbolic Approach for Ontology-mediated Query Answering 标题:本体介导的查询回答的神经符号方法
作者:Medina Andresel,Csaba Domokos,Daria Stepanova,Trung-Kien Tran 机构:Bosch Center for AI, TU Wien 链接:https://arxiv.org/abs/2106.14052 摘要:近年来,知识图(KG)的低维向量空间表示已被应用于在不完全KG上求解合取查询(CQ)。然而,目前的方法主要集中于归纳推理,即基于从数据中学习到的模式通过预测事实来回答查询,缺乏应用外部领域知识进行演绎推理的能力。这样的(专家或常识)领域知识是一种宝贵的资源,可以用来提升机器智能。为了解决这一不足,我们引入了一种在嵌入空间中运行的神经符号方法,用于不完全KG上的本体介导的CQ应答。更具体地说,我们提出了多种数据增强策略,使用基于查询重写的方法生成训练查询,并利用一种新的损失函数来训练模型。实验结果证明了我们的训练策略和新损失函数的有效性,即在同时需要归纳推理和演绎推理的设置下,我们的方法明显优于基线。 摘要:Recently, low-dimensional vector space representations of knowledge graphs (KGs) have been applied to find answers to conjunctive queries (CQs) over incomplete KGs. However, the current methods only focus on inductive reasoning, i.e. answering CQs by predicting facts based on patterns learned from the data, and lack the ability of deductive reasoning by applying external domain knowledge. Such (expert or commonsense) domain knowledge is an invaluable resource which can be used to advance machine intelligence. To address this shortcoming, we introduce a neural-symbolic method for ontology-mediated CQ answering over incomplete KGs that operates in the embedding space. More specifically, we propose various data augmentation strategies to generate training queries using query-rewriting based methods and then exploit a novel loss function for training the model. The experimental results demonstrate the effectiveness of our training strategies and the new loss function, i.e., our method significantly outperforms the baseline in the settings that require both inductive and deductive reasoning.
【51】 Improved Approximation Algorithms for Individually Fair Clustering 标题:改进的个体公平聚类近似算法
作者:Ali Vakilian,Mustafa Yalçıner 链接:https://arxiv.org/abs/2106.14043 摘要:在Jung等人[2020]提出的个体公平性概念下,我们考虑了具有$\ell_p$-范数成本的$k$-聚类问题,其中包括$k$-中位数、$k$-均值和$k$-中心成本函数:给定一个大小为$n$的点集$P$,若$P$中的每个点$v$都能在其$n/k$个最近邻中找到一个中心,则这组$k$个中心诱导一个公平聚类。最近,Mahabadi和Vakilian[2020]展示了如何为具有$\ell_p$-范数代价的公平$k$-聚类问题获得$(p^{O(p)},7)$-双准则逼近:每个点都能在其到第$(n/k)$近邻的距离的至多$7$倍范围内找到一个中心,且解的$\ell_p$-范数代价至多是最优公平解代价的$p^{O(p)}$倍。在这项工作中,对于任意$\varepsilon>0$,我们为具有$\ell_p$-范数成本的公平$k$-聚类给出了改进的$(16^p+\varepsilon,3)$-双准则逼近。为了实现这一保证,我们扩展了[Charikar et al., 2002, Swamy, 2016]的框架,并设计了一个拟阵约束下具有$\ell_p$-范数成本的设施选址问题的$16^p$-近似算法,该结果可能具有独立的价值。此外,我们的方法给出了从个体公平聚类到Kleindessner等人[2019]提出的具有组公平性要求的聚类的归约,后者本质上是中位数拟阵问题[Krishnaswamy等人,2011]。 摘要:We consider the $k$-clustering problem with $\ell_p$-norm cost, which includes $k$-median, $k$-means and $k$-center cost functions, under an individual notion of fairness proposed by Jung et al. [2020]: given a set of points $P$ of size $n$, a set of $k$ centers induces a fair clustering if for every point $v \in P$, $v$ can find a center among its $n/k$ closest neighbors. Recently, Mahabadi and Vakilian [2020] showed how to get a $(p^{O(p)},7)$-bicriteria approximation for the problem of fair $k$-clustering with $\ell_p$-norm cost: every point finds a center within distance at most $7$ times its distance to its $(n/k)$-th closest neighbor and the $\ell_p$-norm cost of the solution is at most $p^{O(p)}$ times the cost of an optimal fair solution. In this work, for any $\varepsilon>0$, we present an improved $(16^p+\varepsilon,3)$-bicriteria approximation for the fair $k$-clustering with $\ell_p$-norm cost. To achieve our guarantees, we extend the framework of [Charikar et al., 2002, Swamy, 2016] and devise a $16^p$-approximation algorithm for the facility location with $\ell_p$-norm cost under matroid constraint which might be of an independent interest. Besides, our approach suggests a reduction from our individually fair clustering to a clustering with a group fairness requirement proposed by Kleindessner et al.
[2019], which is essentially the median matroid problem [Krishnaswamy et al., 2011].
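The individual fairness condition used above is easy to state as a checker: every point must have a center no farther than its $(n/k)$-th closest neighbor. A small sketch follows, counting a point among its own neighbors (a convention assumed here; the paper's exact definition may differ on that detail).

```python
import numpy as np

def is_individually_fair(points, centers):
    """Check individual fairness in the sense described above: every point
    must have some center within its distance to its ceil(n/k)-th closest
    neighbor, where n = number of points and k = number of centers."""
    n, k = len(points), len(centers)
    rank = int(np.ceil(n / k))
    for v in points:
        # Distance to the rank-th closest neighbor (self included, rank 1).
        r = np.sort(np.linalg.norm(points - v, axis=1))[rank - 1]
        if np.linalg.norm(centers - v, axis=1).min() > r:
            return False
    return True
```

The bicriteria guarantees above relax exactly this radius `r` (by factors 7 and 3, respectively) while bounding the clustering cost.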
【52】 The Feasibility and Inevitability of Stealth Attacks 标题:论隐形攻击的可行性和必然性
作者:Ivan Y. Tyukin,Desmond J. Higham,Eliyas Woldegeorgis,Alexander N. Gorban 机构:University of Leicester, Leicester, LE,RH, UK, University of Edinburgh, Edinburgh, EH,FD, UK 链接:https://arxiv.org/abs/2106.13997 摘要:我们开发并研究了新的对抗性扰动,使攻击者能够控制包括深度学习神经网络在内的通用人工智能(AI)系统中的决策。与对抗性数据修改不同,我们这里考虑的攻击机制涉及对AI系统本身的修改。这样的隐形攻击可能由软件开发团队中一个恶作剧、腐败或心怀不满的成员实施,也可能由希望利用"人工智能民主化"议程的人实施,在这一议程下,网络架构和训练好的参数集被公开共享。在[Tyukin等人,International Joint Conference on Neural Networks,2020]的工作基础上,我们开发了一系列新的可实施攻击策略并给出相应分析,表明隐形攻击很有可能做到透明:在攻击者未知的固定验证集上系统性能保持不变,同时在感兴趣的触发输入上引发任何期望的输出。攻击者只需要估计验证集的大小和AI相关潜空间的分布。 摘要:We develop and study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence (AI) systems including deep learning neural networks. In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself. Such a stealth attack could be conducted by a mischievous, corrupt or disgruntled member of a software development team. It could also be made by those wishing to exploit a "democratization of AI" agenda, where network architectures and trained parameter sets are shared publicly. Building on work by [Tyukin et al., International Joint Conference on Neural Networks, 2020], we develop a range of new implementable attack strategies with accompanying analysis, showing that with high probability a stealth attack can be made transparent, in the sense that system performance is unchanged on a fixed validation set which is unknown to the attacker, while evoking any desired output on a trigger input of interest. The attacker only needs to have estimates of the size of the validation set and the spread of the AI's relevant latent space.
In the case of deep learning neural networks, we show that a one neuron attack is possible - a modification to the weights and bias associated with a single neuron - revealing a vulnerability arising from over-parameterization. We illustrate these concepts in a realistic setting. Guided by the theory and computational results, we also propose strategies to guard against stealth attacks.
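The one-neuron attack can be illustrated on a toy over-parameterized network: a redundant hidden unit is repurposed so that it stays silent on a held-out validation set but fires on a chosen trigger input. The network sizes, the trigger direction, and the thresholding below are all illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

# Toy over-parameterized net: 4 inputs -> 8 hidden ReLU units -> 1 output,
# where hidden unit 0 is redundant (all-zero weights, never contributes).
W1 = rng.standard_normal((8, 4)) * 0.1
b1 = np.zeros(8)
w2 = rng.standard_normal(8) * 0.1
W1[0] = 0.0
w2[0] = 0.0

def net(x):
    return float(w2 @ relu(W1 @ x + b1))

validation = rng.standard_normal((100, 4))   # "normal" inputs, bounded region
trigger = np.array([10.0, 0.0, 0.0, 0.0])    # attacker's input, far outside it

before = np.array([net(x) for x in validation])
trigger_before = net(trigger)

# One-neuron stealth attack: repurpose the redundant unit so that it fires
# only when the input projects strongly onto the trigger direction.
W1[0] = trigger / np.linalg.norm(trigger)
b1[0] = -0.9 * np.linalg.norm(trigger)       # threshold above validation projections
w2[0] = 100.0                                # large swing in the output when it fires

after = np.array([net(x) for x in validation])
```

Because the validation inputs never reach the new unit's activation threshold, validation behavior is bit-for-bit unchanged while the trigger output is shifted arbitrarily, which is the over-parameterization vulnerability the abstract describes.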
【53】 Rise of the Autonomous Machines 标题:自主机器的崛起
作者:Shaoshan Liu,Jean-Luc Gaudiot 机构:PerceptIn Inc, University of California, Irvine, U.S.A. 备注:to appear in IEEE Computer Magazine 链接:https://arxiv.org/abs/2106.13987 摘要:经过几十年的不断进步和发展,信息技术已经发展到可以说我们正在进入自主机器时代,但是在实现这一点的道路上还存在许多障碍。在本文中,我们对自主机器的技术和非技术挑战进行了初步的认识和分类;对于我们已经确定的十个领域中的每一个,我们回顾了当前的状况、障碍和潜在的研究方向。希望这将有助于社区为未来定义清晰、有效和更正式的开发目标。 摘要:After decades of uninterrupted progress and growth, information technology has so evolved that it can be said we are entering the age of autonomous machines, but there exist many roadblocks in the way of making this a reality. In this article, we make a preliminary attempt at recognizing and categorizing the technical and non-technical challenges of autonomous machines; for each of the ten areas we have identified, we review current status, roadblocks, and potential research directions. It is hoped that this will help the community define clear, effective, and more formal development goalposts for the future.
【54】 ShapeEditer: a StyleGAN Encoder for Face Swapping 标题:ShapeEditer:一种面向人脸交换的StyleGAN编码器
作者:Shuai Yang,Kai Qiao 机构:Henan Key Laboratory of Imaging and Intelligent Processing, PLA Strategic Support Force Information Engineering University, Zhengzhou, China 备注:13 pages, 3 figures 链接:https://arxiv.org/abs/2106.13984 摘要:在本文中,我们提出了一种新的编码器,称为ShapeEditor,用于高分辨率、逼真和高保真的人脸交换。首先,为了保证足够的清晰度和真实性,我们的核心思想是采用一种先进的预训练高质量随机人脸图像发生器StyleGAN作为主干。其次,设计了两步编码器ShapeEditor,使交换后的人脸融合了输入人脸的身份和属性。第一步,分别提取源图像的身份向量和目标图像的属性向量;在第二步中,我们将身份向量和属性向量的连接映射到$\mathcal{W}$势空间。此外,为了学习如何映射到StyleGAN的潜在空间,我们提出了一组无需人工标注训练数据的自监督损失函数。在测试数据集上的大量实验表明,该方法的结果不仅在清晰度和真实性方面比现有方法有很大的优势,而且体现了身份和属性的充分集成。 摘要:In this paper, we propose a novel encoder, called ShapeEditor, for high-resolution, realistic and high-fidelity face exchange. First of all, in order to ensure sufficient clarity and authenticity, our key idea is to use an advanced pretrained high-quality random face image generator, i.e. StyleGAN, as backbone. Secondly, we design ShapeEditor, a two-step encoder, to make the swapped face integrate the identity and attribute of the input faces. In the first step, we extract the identity vector of the source image and the attribute vector of the target image respectively; in the second step, we map the concatenation of identity vector and attribute vector into the $\mathcal{W}$ potential space. In addition, for learning to map into the latent space of StyleGAN, we propose a set of self-supervised loss functions with which the training data do not need to be labeled manually. Extensive experiments on the test dataset show that the results of our method not only have a great advantage in clarity and authenticity over other state-of-the-art methods, but also reflect the sufficient integration of identity and attribute.
【55】 Explanatory Pluralism in Explainable AI 标题:可解释人工智能中的解释多元化
作者:Yiheng Yao 机构:Philosophy-Neuroscience-Psychology Program, Washington University in St. Louis, St. Louis, MO, USA 备注:To be published in CD-MAKE 2021 conference proceedings 链接:https://arxiv.org/abs/2106.13976 摘要:人工智能模型越来越广泛的应用,激发了不同利益相关者对解释的需求。然而,这种要求是模棱两可的,因为有许多类型的“解释”具有不同的评价标准。本着多元论的精神,我绘制了解释类型的分类法,以及可以处理它们的相关XAI方法。当我们试图揭示人工智能模型的内在机制时,我们会开发诊断解释。当我们试图使模型输出易于理解时,我们会给出解释。当我们希望对我们的模型形成稳定的概括时,我们会给出期望解释。最后,当我们想证明模型的使用是正确的时,我们会给出角色解释,将模型置于其社会背景中。这种多元化观点的动机源于对原因的考虑,即可操作的关系,以及不同类型的解释,即确定我们可以干预的人工智能系统中的相关点,以影响我们期望的变化。本文减少了XAI领域中“解释”一词的歧义,为从业者和利益相关者提供了一个避免歧义、评估XAI方法和假定解释的有用模板。 摘要:The increasingly widespread application of AI models motivates increased demand for explanations from a variety of stakeholders. However, this demand is ambiguous because there are many types of 'explanation' with different evaluative criteria. In the spirit of pluralism, I chart a taxonomy of types of explanation and the associated XAI methods that can address them. When we look to expose the inner mechanisms of AI models, we develop Diagnostic-explanations. When we seek to render model output understandable, we produce Explication-explanations. When we wish to form stable generalizations of our models, we produce Expectation-explanations. Finally, when we want to justify the usage of a model, we produce Role-explanations that situate models within their social context. The motivation for such a pluralistic view stems from a consideration of causes as manipulable relationships and the different types of explanations as identifying the relevant points in AI systems we can intervene upon to affect our desired changes. This paper reduces the ambiguity in use of the word 'explanation' in the field of XAI, allowing practitioners and stakeholders a useful template for avoiding equivocation and evaluating XAI methods and putative explanations.
【56】 Intrinsically Motivated Self-supervised Learning in Reinforcement Learning 标题:强化学习中的内在激励自监督学习
作者:Yue Zhao,Chenzhuang Du,Hang Zhao,Tiejun Li 机构:Peking University, Tsinghua University, Shanghai Qi Zhi Institute 链接:https://arxiv.org/abs/2106.13970 摘要:在基于视觉的强化学习(RL)任务中,为了获得更多的语义表示和提高样本效率,通常采用一种具有代理自监督损失的辅助任务分配方法。然而,由于表示学习部分和决策部分是分离的,自监督辅助任务中的大量信息被忽略了。为了充分利用辅助任务中的信息,我们提出了一种简单而有效的方法,将自我监督损失作为内在奖励,称为强化学习中的内在动机自我监督学习(IM-SSR)。形式化地证明了自监督损失可以分解为新状态的探索和干扰消除的鲁棒性改进。IM-SSR可以毫不费力地插入任何强化学习与自我监督辅助目标几乎没有额外的费用。与IM-SSR相结合,以前的算法在基于视觉的机器人任务中,特别是在奖励信号稀疏的情况下,在样本效率和泛化能力上都有显著的提高。 摘要:In vision-based reinforcement learning (RL) tasks, it is prevalent to assign the auxiliary task with a surrogate self-supervised loss so as to obtain more semantic representations and improve sample efficiency. However, abundant information in self-supervised auxiliary tasks has been disregarded, since the representation learning part and the decision-making part are separated. To sufficiently utilize information in the auxiliary task, we present a simple yet effective idea to employ self-supervised loss as an intrinsic reward, called Intrinsically Motivated Self-Supervised learning in Reinforcement learning (IM-SSR). We formally show that the self-supervised loss can be decomposed as exploration for novel states and robustness improvement from nuisance elimination. IM-SSR can be effortlessly plugged into any reinforcement learning with self-supervised auxiliary objectives with nearly no additional cost. Combined with IM-SSR, the previous underlying algorithms achieve salient improvements on both sample efficiency and generalization in various vision-based robotics tasks from the DeepMind Control Suite, especially when the reward signal is sparse.
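The reward-shaping idea at the heart of IM-SSR can be illustrated in a few lines. Below is a minimal sketch, not the paper's implementation: a stand-in self-supervised loss value is scaled by an assumed weight `beta` and added to the extrinsic reward, so poorly predicted (novel) states earn a larger intrinsic bonus.

```python
# Minimal sketch of using a self-supervised (SSL) loss as an intrinsic reward,
# in the spirit of IM-SSR. `beta` and the loss values are illustrative
# assumptions, not taken from the paper.

def shaped_reward(extrinsic: float, ssl_loss: float, beta: float = 0.1) -> float:
    """Total reward = environment reward + beta-scaled SSL loss as a bonus."""
    return extrinsic + beta * ssl_loss

# A novel (poorly predicted) state has a high SSL loss, hence a larger bonus,
# which drives exploration when the extrinsic signal is sparse.
novel_bonus = shaped_reward(extrinsic=0.0, ssl_loss=2.0)
familiar_bonus = shaped_reward(extrinsic=0.0, ssl_loss=0.1)
```

In a training loop, `ssl_loss` would come from the auxiliary objective (e.g. a contrastive or reconstruction loss) computed on the current observation.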
【57】 Autonomous Deep Quality Monitoring in Streaming Environments 标题:流媒体环境下的自主深度质量监控
作者:Andri Ashfahani,Mahardhika Pratama,Edwin Lughofer,Edward Yapp Kien Yee 机构:SCSE, NTU, Singapore, DKBMS, JKU, Austria, E. Y. K. Yee, SIMTech, ASTAR 备注:None 链接:https://arxiv.org/abs/2106.13955 摘要:在工业中,质量监控的通常做法是依靠手动检查,众所周知,手动检查速度慢、容易出错且依赖于操作员。这一问题对数据驱动方法开发的自动化实时质量监控提出了强烈的需求,从而减轻了对操作员的依赖,并适应了各种过程的不确定性。尽管如此,当前的方法并没有考虑到感官信息的流性质,而严重依赖于手工制作的特性,使其具有特定于应用程序的特性。本文提出了基于最近发展起来的数据流深度学习算法的在线质量监控方法,即动态进化容量神经网络NADINE 。它的特点是集成了1-D和2-D卷积层,以提取时间序列的自然特征和从我们自己的项目中的注塑机的传感器和摄像头捕获的视觉数据流。实时实验中,在线质量监控任务在预先测试的基础上进行动态模拟,然后采用显著的数据流评估协议进行训练。与最先进的技术相比,NADINE 在流媒体环境中的质量监控任务平均提高了4.68%。为了支持可复制的研究计划,NADINE 的代码、结果以及补充材料和注塑数据集在 https://github.com/ContinualAL/NADINE-IJCNN2021 提供。 摘要:The common practice of quality monitoring in industry relies on manual inspection well-known to be slow, error-prone and operator-dependent. This issue raises strong demand for automated real-time quality monitoring developed from data-driven approaches thus alleviating from operator dependence and adapting to various process uncertainties. Nonetheless, current approaches do not take into account the streaming nature of sensory information while relying heavily on hand-crafted features making them application-specific. This paper proposes the online quality monitoring methodology developed from recently developed deep learning algorithms for data streams, Neural Networks with Dynamically Evolved Capacity (NADINE), namely NADINE . It features the integration of 1-D and 2-D convolutional layers to extract natural features of time-series and visual data streams captured from sensors and cameras of the injection molding machines from our own project. Real-time experiments have been conducted where the online quality monitoring task is simulated on the fly under the prequential test-then-train fashion - the prominent data stream evaluation protocol.
Comparison with the state-of-the-art techniques clearly exhibits the advantage of NADINE with 4.68% improvement on average for the quality monitoring task in streaming environments. To support the reproducible research initiative, codes, results of NADINE along with supplementary materials and injection molding dataset are made available at https://github.com/ContinualAL/NADINE-IJCNN2021.
【58】 Continual Learning via Inter-Task Synaptic Mapping 标题:基于任务间突触映射的持续学习
作者:Mao Fubing,Weng Weiwei,Mahardhika Pratama,Edward Yapp Kien Yee 机构:National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 备注:None 链接:https://arxiv.org/abs/2106.13954 摘要:从流任务中学习会导致一个模型灾难性地抹去它从以前的片段中吸收的独特经验。虽然LWF、SI、EWC等正则化技术通过限制旧任务的重要参数在接受新概念时的变化,证明了它们是克服这一问题的有效途径,但这些方法没有利用每个任务的公共信息,这些信息可以共享给现有的神经元。因此,由于参数重要性变量迅速爆炸,它们不能很好地扩展到大规模问题。本文提出了一种任务间突触映射(ISYANA)方法来支持持续学习中的知识保留。ISYANA将任务与神经元的关系以及概念与概念的关系结合起来,从而使神经元只接受相关概念,避免同时容纳互不相关的概念。对基准连续学习问题进行了数值研究,并与常用的连续学习算法进行了比较。ISYANA的性能与最新方法相比具有竞争力。ISYANA的代码在 https://github.com/ContinualAL/ISYANAKBS 提供。 摘要:Learning from streaming tasks leads a model to catastrophically erase unique experiences it absorbs from previous episodes. While regularization techniques such as LWF, SI, EWC have proven themselves as an effective avenue to overcome this issue by constraining important parameters of old tasks from changing when accepting new concepts, these approaches do not exploit common information of each task which can be shared to existing neurons. As a result, they do not scale well to large-scale problems since the parameter importance variables quickly explode. An Inter-Task Synaptic Mapping (ISYANA) is proposed here to underpin knowledge retention for continual learning. ISYANA combines task-to-neuron relationship as well as concept-to-concept relationship such that it prevents a neuron from embracing distinct concepts while merely accepting a relevant concept. Numerical study in the benchmark continual learning problems has been carried out followed by comparison against prominent continual learning algorithms. ISYANA exhibits competitive performance compared to the state of the art. Code for ISYANA is made available at https://github.com/ContinualAL/ISYANAKBS.
【59】 Discovering Generalizable Skills via Automated Generation of Diverse Tasks 标题:通过自动生成不同的任务来发现可概括的技能
作者:Kuan Fang,Yuke Zhu,Silvio Savarese,Li Fei-Fei 机构: Stanford University, UT Austin, Nvidia 备注:RSS 2021 链接:https://arxiv.org/abs/2106.13935 摘要:智能体的学习效率和泛化能力可以通过使用一组有用的技能得到很大的提高。然而,机器人技能的设计在现实世界的应用中往往是棘手的,因为它需要大量的努力和专业知识。在这项工作中,我们介绍了在多样化环境中的技能学习(SLIDE),一种通过自动生成一组不同的任务来发现可概括技能的方法。与以往在无监督下发现技能的工作不同,我们的方法鼓励技能在相同的环境中产生不同的结果,我们将每一项技能与一个可训练的任务生成器生成的唯一任务配对。为了鼓励归纳技能的出现,我们的方法训练每一项技能,使之专门化成对的任务,并最大限度地提高生成任务的多样性。根据机器人在生成的任务中的行为,联合训练一个任务鉴别器来估计多样性目标的证据下界。所学的技能,然后可以组成一个分层强化学习算法来解决看不见的目标任务。实验结果表明,该方法能有效地学习两个桌面操作领域的机器人技能。结果表明,与现有的强化学习和技能学习方法相比,所学习的技能能够有效地提高机器人在各种未知目标任务中的性能。 摘要:The learning efficiency and generalization ability of an intelligent agent can be greatly improved by utilizing a useful set of skills. However, the design of robot skills can often be intractable in real-world applications due to the prohibitive amount of effort and expertise that it requires. In this work, we introduce Skill Learning In Diversified Environments (SLIDE), a method to discover generalizable skills via automated generation of a diverse set of tasks. As opposed to prior work on unsupervised discovery of skills which incentivizes the skills to produce different outcomes in the same environment, our method pairs each skill with a unique task produced by a trainable task generator. To encourage generalizable skills to emerge, our method trains each skill to specialize in the paired task and maximizes the diversity of the generated tasks. A task discriminator defined on the robot behaviors in the generated tasks is jointly trained to estimate the evidence lower bound of the diversity objective. The learned skills can then be composed in a hierarchical reinforcement learning algorithm to solve unseen target tasks. We demonstrate that the proposed method can effectively learn a variety of robot skills in two tabletop manipulation domains. 
Our results suggest that the learned skills can effectively improve the robot's performance in various unseen target tasks compared to existing reinforcement learning and skill learning methods.
【60】 Toward Less Hidden Cost of Code Completion with Acceptance and Ranking Models 标题:采用验收和排序模型降低代码完成的隐性成本
作者:Jingxuan Li,Rui Huang,Wei Li,Kai Yao,Weiguo Tan 机构:Huawei Technologies Co., Ltd, Shenzhen, China 备注:10 pages, 7 figures, accepted by ICSME 2021 链接:https://arxiv.org/abs/2106.13928 摘要:代码完成被软件开发人员广泛地用来为部分编写的代码片段提供编码建议。除了传统的代码完成方法只支持最小位置的单标记完成外,最近的研究表明,在更灵活的位置提供更长的代码完成时间的能力。然而,这种频繁触发和较长的完成结果会降低总体精度,因为它们会生成更多无效结果。而且,不同的研究大多互不相容。因此,开发一个集成框架是非常重要的,它可以将多个模型的结果结合起来,从而得出每个模型的优点和缺点。本文进行了编码模拟,从代码上下文和不同的代码完成模型中收集数据,然后将数据应用到两个任务中。首先,我们引入一个接受模型,它可以动态地控制是否向开发人员显示完成结果。它使用模拟特征来预测这些模型的输出是否存在正确的结果。我们的最佳模型将假阳性完成率从55.09%降低到17.44%。其次,我们设计了一个融合排序方案,可以自动识别完成结果的优先级,并对多个代码完成模型中的候选代码进行重新排序。该方案可以灵活地处理各种模型,而不管其完成结果的类型或长度。我们将这个排名方案与两个频率模型和一个GPT-2风格的语言模型以及接受模型相结合,使得TOP1和TOP5的准确率分别提高了27.80%和37.64%。此外,我们还提出了一种新的代码完成评估指标,即效益成本比(BCR),该指标考虑了节省击键的好处和完成列表浏览的隐藏成本,更接近真实的编码体验场景。 摘要:Code completion is widely used by software developers to provide coding suggestions given a partially written code snippet. Apart from the traditional code completion methods, which only support single token completion at minimal positions, recent studies show the ability to provide longer code completion at more flexible positions. However, such frequently triggered and longer completion results reduce the overall precision as they generate more invalid results. Moreover, different studies are mostly incompatible with each other. Thus, it is vital to develop an ensemble framework that can combine results from multiple models to draw merits and offset defects of each model. This paper conducts a coding simulation to collect data from code context and different code completion models and then apply the data in two tasks. First, we introduce an acceptance model which can dynamically control whether to display completion results to the developer. It uses simulation features to predict whether correct results exist in the output of these models. Our best model reduces the percentage of false-positive completion from 55.09% to 17.44%. 
Second, we design a fusion ranking scheme that can automatically identify the priority of the completion results and reorder the candidates from multiple code completion models. This scheme is flexible in dealing with various models, regardless of the type or the length of their completion results. We integrate this ranking scheme with two frequency models and a GPT-2 styled language model, along with the acceptance model to yield 27.80% and 37.64% increases in TOP1 and TOP5 accuracy, respectively. In addition, we propose a new code completion evaluation metric, Benefit-Cost Ratio (BCR), taking into account the benefit of keystrokes saving and hidden cost of completion list browsing, which is closer to the real coding experience.
【61】 Quantum Computing for Artificial Intelligence Based Mobile Network Optimization 标题:基于人工智能的移动网络优化的量子计算
作者:Furqan Ahmed,Petri Mähönen 机构:Elisa Corporation, Helsinki, Finland; Institute for Networked Systems, RWTH Aachen University, Aachen, Germany 备注:Accepted in 2021 IEEE 32nd Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC) - Track 4: Mobile and Wireless Networks 链接:https://arxiv.org/abs/2106.13917 摘要:本文讨论了如何利用人工智能中的约束满足问题的概念对无线接入网优化问题进行建模,并用量子计算机进行大规模求解。作为一个案例研究,我们讨论了根序列索引(RSI)分配问题-一个重要的LTE/NR物理随机接入信道配置相关的自动化用例。我们将RSI分配描述为二次无约束二元优化(QUBO)问题,该问题是使用从商业移动网络中获取的数据构造的,并使用基于云的商业量子计算平台来解决。结果表明,量子退火算法能够成功地分配无冲突的RSI。与已有的启发式算法相比,一些经典算法在求解质量和计算时间上更为有效。非量子优势是由于当前的实现是一个半量子概念证明算法。而且,结果取决于所用量子计算机的类型。尽管如此,提出的框架是高度灵活的,并拥有在移动网络自动化中利用量子计算力量的巨大潜力。 摘要:In this paper, we discuss how certain radio access network optimization problems can be modelled using the concept of constraint satisfaction problems in artificial intelligence, and solved at scale using a quantum computer. As a case study, we discuss root sequence index (RSI) assignment problem - an important LTE/NR physical random access channel configuration related automation use-case. We formulate RSI assignment as quadratic unconstrained binary optimization (QUBO) problem constructed using data ingested from a commercial mobile network, and solve it using a cloud-based commercially available quantum computing platform. Results show that quantum annealing solver can successfully assign conflict-free RSIs. Comparison with well-known heuristics reveals that some classic algorithms are even more effective in terms of solution quality and computation time. The non-quantum advantage is due to the fact that current implementation is a semi-quantum proof-of-concept algorithm. Also, the results depend on the type of quantum computer used. Nevertheless, the proposed framework is highly flexible and holds tremendous potential for harnessing the power of quantum computing in mobile network automation.
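As a toy illustration of the QUBO formulation described above, the sketch below encodes a tiny conflict-free index-assignment problem (3 cells, 2 indices, hypothetical penalty weights `A` and `B` - all sizes and names are invented, not taken from the paper) and solves it by brute force in place of a quantum annealer.

```python
import itertools
import numpy as np

# Hedged sketch: cast "assign each cell one index, neighbors must differ" as a
# QUBO. Variable x[c,k] = 1 iff cell c gets index k. Penalty A enforces the
# one-hot constraint (sum_k x_ck - 1)^2; penalty B punishes neighbors sharing
# an index. A real deployment would hand Q to an annealer instead of brute force.

N_CELLS, N_IDX = 3, 2            # 3 cells, 2 available indices
NEIGHBORS = [(0, 1), (1, 2)]     # neighboring cells must not share an index
A, B = 2.0, 1.0                  # one-hot penalty, conflict penalty

n = N_CELLS * N_IDX
var = lambda c, k: c * N_IDX + k  # flatten (cell, index) -> QUBO variable

Q = np.zeros((n, n))
for c in range(N_CELLS):          # expand A * (sum_k x_ck - 1)^2
    for k in range(N_IDX):
        Q[var(c, k), var(c, k)] -= A
        for k2 in range(k + 1, N_IDX):
            Q[var(c, k), var(c, k2)] += 2 * A
for c1, c2 in NEIGHBORS:          # B * x_{c1,k} * x_{c2,k} per shared index
    for k in range(N_IDX):
        Q[var(c1, k), var(c2, k)] += B

def energy(x):
    x = np.array(x)
    return x @ Q @ x

best = min(itertools.product([0, 1], repeat=n), key=energy)
assignment = [k for c in range(N_CELLS) for k in range(N_IDX) if best[var(c, k)]]
```

The ground state is any one-hot, conflict-free labeling (e.g. alternating indices along the 0-1-2 chain), with energy `-3A`.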
【62】 Midpoint Regularization: from High Uncertainty Training to Conservative Classification 标题:中点正则化:从高不确定性训练到保守分类
作者:Hongyu Guo 机构:National Research Council Canada, Montreal Road, Ottawa, Ontario 备注:Accepted to ECML-PKDD 2021. arXiv admin note: substantial text overlap with arXiv:2012.01559 链接:https://arxiv.org/abs/2106.13913 摘要:标签平滑(LS)通过惩罚产生过度自信输出分布的模型,提高了模型的泛化能力。对于每个训练样本,LS策略通过将其分布质量分布在非真实标签类别上来平滑独热编码的训练信号。我们通过考虑示例对来扩展这一技术,即PLS。PLS首先通过平均随机样本对来创建中点样本,然后在训练过程中学习每个中点样本的平滑分布,从而得到具有高不确定性标签的中点用于训练。实验表明,PLS显著优于LS,实现了高达30%的相对分类误差降低。我们还通过可视化发现,PLS对分布内和分布外样本都产生非常低的最大softmax分数。 摘要:Label Smoothing (LS) improves model generalization through penalizing models from generating overconfident output distributions. For each training sample the LS strategy smooths the one-hot encoded training signal by distributing its distribution mass over the non-ground truth classes. We extend this technique by considering example pairs, coined PLS. PLS first creates midpoint samples by averaging random sample pairs and then learns a smoothing distribution during training for each of these midpoint samples, resulting in midpoints with high uncertainty labels for training. We empirically show that PLS significantly outperforms LS, achieving up to 30% of relative classification error reduction. We also visualize that PLS produces very low winning softmax scores for both in and out of distribution samples.
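A minimal sketch of the midpoint construction follows. Note the assumption: in the paper the smoothing distribution is learned during training, whereas a fixed uniform smoothing stands in for that learned component here.

```python
import numpy as np

# Hedged sketch of the PLS midpoint idea: average random sample pairs and give
# the midpoints heavily smoothed (high-uncertainty) labels. The fixed `smooth`
# weight is an illustrative stand-in for the learned smoothing distribution.

rng = np.random.default_rng(0)

def midpoint_batch(x, y_onehot, smooth=0.5):
    """Average random sample pairs; smooth the averaged labels toward uniform."""
    n, k = y_onehot.shape
    perm = rng.permutation(n)
    x_mid = 0.5 * (x + x[perm])
    y_mid = 0.5 * (y_onehot + y_onehot[perm])
    uniform = np.full((n, k), 1.0 / k)
    return x_mid, (1 - smooth) * y_mid + smooth * uniform

x = rng.normal(size=(4, 8))              # 4 samples, 8 features
y = np.eye(3)[np.array([0, 1, 2, 0])]    # one-hot labels, 3 classes
x_mid, y_mid = midpoint_batch(x, y)
```

Each smoothed label still sums to 1, but no class ever receives full probability mass, which is the "high uncertainty" training signal the abstract describes.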
【63】 Predictive Control Using Learned State Space Models via Rolling Horizon Evolution 标题:基于滚动时域进化学习状态空间模型的预测控制
作者:Alvaro Ovalle,Simon M. Lucas 机构:School of Electronic Engineering and Computer Science, Queen Mary University of London, London, United Kingdom 备注:Accepted at the Bridging the Gap Between AI Planning and Reinforcement Learning (PRL) Workshop at ICAPS 2021 链接:https://arxiv.org/abs/2106.13911 摘要:基于模型的强化学习的兴趣很大一部分来自于获得一个能够进行战略性长期决策的前向模型的潜在效用。假设一个agent成功地学习了一个有用的预测模型,它仍然需要一种机制来利用它来生成和选择相互竞争的模拟计划。在本文中,我们将进化算法规划技术与通过深度学习和变分推理学习的模型相结合来探索这一主题。我们演示了一个代理的方法,该代理在一组视觉导航任务中可靠地执行在线规划。 摘要:A large part of the interest in model-based reinforcement learning derives from the potential utility to acquire a forward model capable of strategic long term decision making. Assuming that an agent succeeds in learning a useful predictive model, it still requires a mechanism to harness it to generate and select among competing simulated plans. In this paper, we explore this theme combining evolutionary algorithmic planning techniques with models learned via deep learning and variational inference. We demonstrate the approach with an agent that reliably performs online planning in a set of visual navigation tasks.
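The plan-generation-and-selection mechanism the abstract refers to can be sketched with a rolling-horizon evolutionary planner. The toy 1-D navigation model below stands in for the learned variational forward model; population sizes, horizon, and the mutation scheme are all illustrative assumptions.

```python
import random

# Hedged sketch of rolling-horizon evolution with a (here, perfect) forward
# model: evolve candidate action sequences, execute the first action of the
# best plan, then re-plan from the new state.

random.seed(0)
ACTIONS = [-1, 0, 1]
GOAL, HORIZON, POP, GENS = 5, 6, 16, 10

def rollout_value(pos, plan):
    """Score a plan with the forward model: negative final distance to goal."""
    for a in plan:
        pos += a
    return -abs(GOAL - pos)

def plan_step(pos):
    pop = [[random.choice(ACTIONS) for _ in range(HORIZON)] for _ in range(POP)]
    for _ in range(GENS):                      # simple elitist evolution loop
        pop.sort(key=lambda p: rollout_value(pos, p), reverse=True)
        elite = pop[: POP // 2]
        children = []
        for p in elite:                        # mutate one gene per child
            child = p[:]
            child[random.randrange(HORIZON)] = random.choice(ACTIONS)
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda p: rollout_value(pos, p))[0]  # first action only

pos = 0
for _ in range(8):                             # receding-horizon control loop
    pos += plan_step(pos)
```

With a learned model, `rollout_value` would instead roll the plan through the model's latent dynamics and score predicted outcomes.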
【64】 Compositional Reinforcement Learning from Logical Specifications 标题:来自逻辑规范的组合强化学习
作者:Kishor Jothimurugan,Suguman Bansal,Osbert Bastani,Rajeev Alur 机构:University of Pennsylvania 链接:https://arxiv.org/abs/2106.13906 摘要:研究了逻辑规范下复杂任务的学习控制策略问题。最近的方法自动从一个给定的规范生成一个奖励函数,并使用适当的强化学习算法来学习一个使期望奖励最大化的策略。然而,这些方法对于需要高层次规划的复杂任务的扩展性很差。在这项工作中,我们开发了一种组合学习方法,称为DiRL,它将高级规划和强化学习交织在一起。首先,DiRL将规范编码为抽象图;直观地说,图的顶点和边分别对应于状态空间和简单子任务的区域。然后,我们的方法结合强化学习来学习Dijkstra式规划算法中每个边(子任务)的神经网络策略,以计算图中的高级规划。在一组具有连续状态空间和动作空间的具有挑战性的控制基准上对所提出的方法进行了评估,结果表明该方法优于最新的基准。 摘要:We study the problem of learning control policies for complex tasks given by logical specifications. Recent approaches automatically generate a reward function from a given specification and use a suitable reinforcement learning algorithm to learn a policy that maximizes the expected reward. These approaches, however, scale poorly to complex tasks that require high-level planning. In this work, we develop a compositional learning approach, called DiRL, that interleaves high-level planning and reinforcement learning. First, DiRL encodes the specification as an abstract graph; intuitively, vertices and edges of the graph correspond to regions of the state space and simpler sub-tasks, respectively. Our approach then incorporates reinforcement learning to learn neural network policies for each edge (sub-task) within a Dijkstra-style planning algorithm to compute a high-level plan in the graph. An evaluation of the proposed approach on a set of challenging control benchmarks with continuous state and action spaces demonstrates that it outperforms state-of-the-art baselines.
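The planning half of the approach - a Dijkstra-style search over the abstract graph whose edges are sub-tasks - can be sketched directly. The graph, region names, and edge costs below are invented for illustration; in DiRL the costs would be tied to the learned sub-policies.

```python
import heapq

# Hedged sketch: regions of the state space are vertices, sub-tasks are edges
# with an estimated cost (e.g. derived from a sub-policy's failure rate), and
# Dijkstra picks the cheapest high-level plan.

EDGES = {                      # region -> [(next_region, subtask_cost)]
    "start": [("door", 1.0), ("window", 4.0)],
    "door": [("hall", 1.0)],
    "window": [("hall", 0.5)],
    "hall": [("goal", 2.0)],
    "goal": [],
}

def dijkstra_plan(src, dst):
    """Return (cost, list of regions) for the cheapest sub-task sequence."""
    pq, seen = [(0.0, src, [src])], set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in EDGES[node]:
            if nxt not in seen:
                heapq.heappush(pq, (cost + w, nxt, path + [nxt]))
    return float("inf"), []

cost, plan = dijkstra_plan("start", "goal")
```

Each edge of the returned plan would then be executed by the neural sub-policy trained for that sub-task.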
【65】 Building Bridges: Generative Artworks to Explore AI Ethics 标题:搭建桥梁:探索人工智能伦理的生成性艺术品
作者:Ramya Srinivasan,Devi Parikh 机构:Fujitsu Research of America, Georgia Tech and Facebook AI Research 链接:https://arxiv.org/abs/2106.13901 摘要:近年来,人们越来越重视理解和减轻人工智能技术对社会的不利影响。在学术界、工业界和政府机构中,各种各样的努力都在努力加强人工智能道德。道德人工智能系统设计中的一个重大挑战是,人工智能管道中有多个利益相关者,每个利益相关者都有自己的约束和利益。这些不同的观点往往不被理解,部分原因是沟通上的差距。例如,设计和开发人工智能模型的人工智能研究人员不一定意识到人工智能决策的复合效应在消费者生活中引发的不稳定性。有必要在更广泛的背景下教育不同的利益相关者他们的角色和责任。在这份立场文件中,我们概述了一些潜在的方式,在其中,生成性艺术作品可以发挥这一作用,作为访问和强大的教育工具,以呈现不同的观点。我们希望激发跨学科的讨论,广泛的计算创造力作为一种工具,提高人工智能伦理。 摘要:In recent years, there has been an increased emphasis on understanding and mitigating adverse impacts of artificial intelligence (AI) technologies on society. Across academia, industry, and government bodies, a variety of endeavours are being pursued towards enhancing AI ethics. A significant challenge in the design of ethical AI systems is that there are multiple stakeholders in the AI pipeline, each with their own set of constraints and interests. These different perspectives are often not understood, due in part to communication gaps. For example, AI researchers who design and develop AI models are not necessarily aware of the instability induced in consumers' lives by the compounded effects of AI decisions. Educating different stakeholders about their roles and responsibilities in the broader context becomes necessary. In this position paper, we outline some potential ways in which generative artworks can play this role by serving as accessible and powerful educational tools for surfacing different perspectives. We hope to spark interdisciplinary discussions about computational creativity broadly as a tool for enhancing AI ethics.
【66】 Domain Conditional Predictors for Domain Adaptation 标题:域自适应的域条件预报器
作者:Joao Monteiro,Xavier Gibert,Jianqiao Feng,Vincent Dumoulin,Dar-Shyang Lee 机构:Google 备注:Part of the pre-registration workshop at NeurIPS 2020: this https URL 链接:https://arxiv.org/abs/2106.13899 摘要:学习保证通常依赖于i.i.d.数据的假设,一旦预测者被部署到执行现实任务中,在实践中很可能会违反这些假设。因此,域自适应方法作为一个有用的框架出现,在支持不同的训练和测试数据分布时产生额外的灵活性,前提是满足其他假设,如协变量移位,即期望标签上的条件分布独立于基础数据分布。为了在不同的训练数据源和测试数据源之间进行泛化,引入了几种方法,这些方法通常依赖于域不变性的一般思想,使得预测模型忽略了数据生成分布。在本文中,我们通过从相反的方向来处理跨数据源的泛化问题:我们考虑一种条件建模方法,其中预测除了依赖于输入数据外,还使用与底层数据生成分布相关的信息。例如,该模型有一个明确的机制来适应不断变化的环境和/或新的数据源。我们认为,这种方法比现有的域自适应方法更具普遍适用性,因为它不需要额外的假设,如协变量移位,并进一步产生更简单的训练算法,避免了通常在域不变方法中使用的minimax公式引起的训练不稳定性的共同来源。 摘要:Learning guarantees often rely on assumptions of i.i.d. data, which will likely be violated in practice once predictors are deployed to perform real-world tasks. Domain adaptation approaches thus appeared as a useful framework yielding extra flexibility in that distinct train and test data distributions are supported, provided that other assumptions are satisfied such as covariate shift, which expects the conditional distributions over labels to be independent of the underlying data distribution. Several approaches were introduced in order to induce generalization across varying train and test data sources, and those often rely on the general idea of domain-invariance, in such a way that the data-generating distributions are to be disregarded by the prediction model. In this contribution, we tackle the problem of generalizing across data sources by approaching it from the opposite direction: we consider a conditional modeling approach in which predictions, in addition to being dependent on the input data, use information relative to the underlying data-generating distribution. For instance, the model has an explicit mechanism to adapt to changing environments and/or new data sources. 
We argue that such an approach is more generally applicable than current domain adaptation methods since it does not require extra assumptions such as covariate shift and further yields simpler training algorithms that avoid a common source of training instabilities caused by minimax formulations, often employed in domain-invariant methods.
【67】 Closed-form Continuous-Depth Models 标题:闭合形式的连续深度模型
作者:Ramin Hasani,Mathias Lechner,Alexander Amini,Lucas Liebenwein,Max Tschaikowski,Gerald Teschl,Daniela Rus 机构: 3Aalborg University, 4University of Vienna 备注:17 pages 链接:https://arxiv.org/abs/2106.13898 摘要:连续深度神经模型,其中模型隐藏状态的导数由神经网络定义,具有强大的顺序数据处理能力。然而,这些模型依赖于先进的数值微分方程(DE)解算器,在计算成本和模型复杂性方面都产生了巨大的开销。在本文中,我们提出了一个新的模型族,称为闭式连续深度(CfC)网络,该网络描述简单,速度至少快一个数量级,同时与基于ODE的网络模型相比具有同样强大的建模能力。这些模型由此从时间连续模型的表达子集的解析闭式解导出,从而减轻了对所有复杂解算器的需求。在我们的实验评估中,我们证明了CfC网络在一系列不同的时间序列预测任务(包括那些具有长期依赖性和不规则采样数据的任务)上优于先进的递归模型。我们相信,我们的发现为在资源受限的环境中训练和部署丰富的、连续的神经模型提供了新的机会,这些环境对性能和效率都有要求。 摘要:Continuous-depth neural models, where the derivative of the model's hidden state is defined by a neural network, have enabled strong sequential data processing capabilities. However, these models rely on advanced numerical differential equation (DE) solvers resulting in a significant overhead both in terms of computational cost and model complexity. In this paper, we present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster while exhibiting equally strong modeling abilities compared to their ODE-based counterparts. The models are hereby derived from the analytical closed-form solution of an expressive subset of time-continuous models, thus alleviating the need for complex DE solvers all together. In our experimental evaluations, we demonstrate that CfC networks outperform advanced, recurrent models over a diverse set of time-series prediction tasks, including those with long-term dependencies and irregularly sampled data. We believe our findings open new opportunities to train and deploy rich, continuous neural models in resource-constrained settings, which demand both performance and efficiency.
【68】 Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits 标题:关系环的具有上界置信度的知识注入政策梯度
作者:Kaushik Roy,Qi Zhang,Manas Gaur,Amit Sheth 机构:Artificial Intelligence Institute, University of South Carolina, Columbia, USA 备注:Accepted for publication in the research track at ECML-PKDD 2021 链接:https://arxiv.org/abs/2106.13895 摘要:上下文老虎机(Contextual Bandits)在各种现实场景中有重要的用例,如在线广告、推荐系统、医疗保健等。然而,大多数算法使用平面特征向量来表示上下文,而在现实世界中,上下文中需要建模的对象数量不定,对象之间还存在各种关系。例如,在音乐推荐系统中,用户上下文包含他们所听的音乐、艺术家创作的音乐、艺术家专辑等。添加更丰富的关系上下文表示还引入了更大的上下文空间,使得探索-利用更加困难。为了提高探索-利用的效率,可以注入有关上下文的知识来指导探索-利用策略。由于其描述性,关系上下文表示为人类指定知识提供了一种自然的方式。我们提出了一种将知识注入策略梯度适配到上下文老虎机设定的方法,以及一种新的知识注入策略梯度置信上界算法,并在一个模拟音乐推荐数据集和多个真实数据集上进行了实验分析,考察专家知识在哪些情形下能大幅降低总后悔值、在哪些情形下不能。 摘要:Contextual Bandits find important use cases in various real-life scenarios such as online advertising, recommendation systems, healthcare, etc. However, most of the algorithms use flat feature vectors to represent context whereas, in the real world, there is a varying number of objects and relations among them to model in the context. For example, in a music recommendation system, the user context contains what music they listen to, which artists create this music, the artist albums, etc. Adding richer relational context representations also introduces a much larger context space making exploration-exploitation harder. To improve the efficiency of exploration-exploitation knowledge about the context can be infused to guide the exploration-exploitation strategy. Relational context representations allow a natural way for humans to specify knowledge owing to their descriptive nature. We propose an adaptation of Knowledge Infused Policy Gradients to the Contextual Bandit setting and a novel Knowledge Infused Policy Gradients Upper Confidence Bound algorithm and perform an experimental analysis of a simulated music recommendation dataset and various real-life datasets where expert knowledge can drastically reduce the total regret and where it cannot.
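A minimal sketch of the knowledge-infusion idea in an upper-confidence-bound setting follows. The decaying prior bonus `KNOWLEDGE[a] / (1 + n)` and all numbers are illustrative assumptions, not the paper's formulation, and the rewards are deterministic stand-ins for Bernoulli draws.

```python
import math

# Hedged sketch: a standard UCB1 score augmented with a (hypothetical) expert
# prior whose influence decays as an arm accumulates pulls, so knowledge steers
# early exploration without overriding observed evidence.

TRUE_MEANS = [0.3, 0.7]      # arm 1 is genuinely better
KNOWLEDGE = [0.0, 0.5]       # expert prior (correctly) favors arm 1

counts, sums = [0, 0], [0.0, 0.0]
for t in range(1, 201):
    def score(a):
        if counts[a] == 0:
            return float("inf")               # pull each arm at least once
        explore = math.sqrt(2.0 * math.log(t) / counts[a])
        prior = KNOWLEDGE[a] / (1 + counts[a])  # knowledge bonus decays
        return sums[a] / counts[a] + explore + prior
    arm = max(range(2), key=score)
    counts[arm] += 1
    sums[arm] += TRUE_MEANS[arm]              # deterministic reward stand-in
```

With a correct prior the better arm is committed to earlier; a misleading prior only costs a bounded number of extra pulls because the bonus vanishes with `n`.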
【69】 Self-paced Principal Component Analysis 标题:自定步主成分分析
作者:Zhao Kang,Hongfei Liu,Jiangxin Li,Xiaofeng Zhu,Ling Tian 机构:School of Computer Science and Engineering, University of Electronic Science and Technology of China 链接:https://arxiv.org/abs/2106.13880 摘要:主成分分析(PCA)在降维和特征提取方面有着广泛的应用。鲁棒主元分析(RPCA)在l1范数、l2,p范数等不同的鲁棒距离度量下,能在一定程度上处理噪声或异常值。然而,现实世界中的数据可能显示这些简单函数无法完全捕获的结构。另外,现有方法对复杂样本和简单样本一视同仁。相比之下,人类通常采用的学习模式是从简单到复杂,从少到多。基于这一原理,我们提出了一种新的方法,称为自步PCA(SPCA),以进一步降低噪声和异常值的影响。值得注意的是,在每次迭代开始时计算每个样本的复杂度,以便将从简单到更复杂的样本集成到训练中。基于交替优化,SPCA找到一个最优的投影矩阵,并迭代地滤除异常值。理论分析证明了SPCA的合理性。在流行数据集上的大量实验表明,该方法能显著提高现有结果。 摘要:Principal Component Analysis (PCA) has been widely used for dimensionality reduction and feature extraction. Robust PCA (RPCA), under different robust distance metrics, such as l1-norm and l2,p-norm, can deal with noise or outliers to some extent. However, real-world data may display structures that can not be fully captured by these simple functions. In addition, existing methods treat complex and simple samples equally. By contrast, a learning pattern typically adopted by human beings is to learn from simple to complex and less to more. Based on this principle, we propose a novel method called Self-paced PCA (SPCA) to further reduce the effect of noise and outliers. Notably, the complexity of each sample is calculated at the beginning of each iteration in order to integrate samples from simple to more complex into training. Based on an alternating optimization, SPCA finds an optimal projection matrix and filters out outliers iteratively. Theoretical analysis is presented to show the rationality of SPCA. Extensive experiments on popular data sets demonstrate that the proposed method can improve the state-of-the-art results considerably.
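The self-paced filtering loop can be sketched on toy data. Assumptions to note: the paper solves an alternating optimization with a self-paced regularizer, whereas this sketch uses a plain SVD fit and a hand-picked quantile threshold schedule; the synthetic low-rank data and all constants are invented for illustration.

```python
import numpy as np

# Hedged sketch of the self-paced idea behind SPCA: repeatedly fit PCA on the
# samples currently deemed "easy" (low reconstruction error) and admit more
# samples as the threshold grows, so gross outliers are filtered out.

rng = np.random.default_rng(0)
V = np.linalg.qr(rng.normal(size=(5, 2)))[0].T   # orthonormal 2-D subspace in R^5
X = rng.normal(size=(50, 2)) @ V * 5.0           # inliers lie in span(V)
X[:3] += rng.normal(size=(3, 5)) * 8.0           # 3 gross off-subspace outliers

def pca_fit(Xs, k=2):
    _, _, vt = np.linalg.svd(Xs, full_matrices=False)
    return vt[:k]                                # top-k principal directions

def recon_error(X, comps):
    return ((X - X @ comps.T @ comps) ** 2).sum(axis=1)

comps = pca_fit(X)                               # naive fit, skewed by outliers
for step in range(5):
    err = recon_error(X, comps)
    thresh = np.quantile(err, 0.6 + 0.08 * step)  # self-paced: easy -> harder
    comps = pca_fit(X[err <= thresh])             # refit on easy samples only

final_err = recon_error(X, comps)
```

After a few rounds the fit is driven by the inliers, so the outliers end up with far larger reconstruction error than any inlier.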
【70】 Rationale-Inspired Natural Language Explanations with Commonsense 标题:理性启发的常识自然语言解释
Authors: Bodhisattwa Prasad Majumder, Oana-Maria Camburu, Thomas Lukasiewicz, Julian McAuley
Affiliations: Department of Computer Science and Engineering, UC San Diego, USA; Department of Computer Science, University of Oxford, UK; Alan Turing Institute, London, UK
Link: https://arxiv.org/abs/2106.13876
Abstract: Explainable machine learning models primarily justify predicted labels using either extractive rationales (i.e., subsets of input features) or free-text natural language explanations (NLEs) as abstractive justifications. While NLEs can be more comprehensive than extractive rationales, machine-generated NLEs have been shown to sometimes lack commonsense knowledge. Here, we show that commonsense knowledge can act as a bridge between extractive rationales and NLEs, rendering both types of explanations better. More precisely, we introduce a unified framework, called RExC (Rationale-Inspired Explanations with Commonsense), that (1) extracts rationales as a set of features responsible for machine predictions, (2) expands the extractive rationales using available commonsense resources, and (3) uses the expanded knowledge to generate natural language explanations. Our framework surpasses by a large margin the previous state of the art in generating NLEs across five tasks in both natural language processing and vision-language understanding, with human annotators consistently rating the explanations generated by RExC as more comprehensive, grounded in commonsense, and overall preferred compared to previous state-of-the-art models. Moreover, our work shows that commonsense-grounded explanations can enhance both task performance and rationale extraction capabilities.
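The three RExC stages can be illustrated with a deliberately toy pipeline. The real system uses neural rationale extractors, large commonsense resources, and a generative language model; here those are replaced by hypothetical stand-ins (top-k importance scores, a dictionary standing in for a knowledge base, and a template):

```python
def rexc_pipeline(tokens, scores, commonsense_kb, k=2):
    """Toy sketch of the three RExC stages on a single example; every
    component here is a hypothetical stand-in for a learned model."""
    # (1) extract rationales: the top-k input features by importance score
    ranked = sorted(zip(tokens, scores), key=lambda p: -p[1])
    rationales = [tok for tok, _ in ranked[:k]]
    # (2) expand the rationales with commonsense knowledge
    expanded = {tok: commonsense_kb.get(tok, []) for tok in rationales}
    # (3) generate a natural language explanation from the expanded
    # knowledge (a template stands in for a conditioned language model)
    facts = "; ".join(
        f"{tok} implies {', '.join(exp)}" for tok, exp in expanded.items() if exp
    )
    return f"Predicted because {', '.join(rationales)} ({facts})."
```

The point of the sketch is only the data flow: rationales condition the knowledge lookup, and the expanded knowledge conditions the explanation.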
【71】 A multi-stage machine learning model on diagnosis of esophageal manometry
Authors: Wenjun Kou, Dustin A. Carlson, Alexandra J. Baumann, Erica N. Donnan, Jacob M. Schauer, Mozziyar Etemadi, John E. Pandolfino
Link: https://arxiv.org/abs/2106.13869
Abstract: High-resolution manometry (HRM) is the primary procedure used to diagnose esophageal motility disorders. Its interpretation and classification include an initial evaluation of swallow-level outcomes, followed by derivation of a study-level diagnosis based on the Chicago Classification (CC), using a tree-like algorithm. This diagnostic approach to motility disorders using HRM was mirrored by a multi-stage modeling framework developed using a combination of various machine learning approaches. Specifically, the framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage. In the swallow-level stage, three models based on convolutional neural networks (CNNs) were developed to predict swallow type, swallow pressurization, and integrated relaxation pressure (IRP). At the study-level stage, model selection was conducted among families of expert-knowledge-based rule models, xgboost models, and artificial neural network (ANN) models, with the latter two families designed and augmented with motivation from the expert knowledge. A simple model-agnostic strategy of model balancing motivated by Bayesian principles was utilized, which gave rise to model averaging weighted by precision scores. The averaged (blended) models and individual models were compared and evaluated; the best performance on the test dataset was 0.81 for top-1 prediction and 0.92 for top-2 predictions. This is the first artificial-intelligence-style model to automatically predict the CC diagnosis of an HRM study from raw multi-swallow data. Moreover, the proposed modeling framework could easily be extended to multi-modal tasks, such as diagnosis of esophageal patients based on clinical data from both HRM and functional luminal imaging probe panometry (FLIP).
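The precision-weighted model averaging used at the study-level stage can be sketched as follows. Only the weighting-by-precision rule comes from the abstract; the function names and the renormalization step are illustrative assumptions:

```python
import numpy as np

def precision_weighted_blend(probas, precisions):
    """Blend the class-probability outputs of several models, weighting
    each model by its precision score (a Bayesian-motivated heuristic)."""
    probas = np.asarray(probas, dtype=float)      # shape (n_models, n_classes)
    w = np.asarray(precisions, dtype=float)
    w = w / w.sum()                               # normalize the weights
    blended = (w[:, None] * probas).sum(axis=0)   # precision-weighted average
    return blended / blended.sum()                # renormalize to a distribution

def top_k(blended, k):
    """Indices of the k most probable diagnoses under the blended model."""
    return list(np.argsort(blended)[::-1][:k])
```

For example, a high-precision model's prediction dominates the blend, which is how the averaged model can beat each individual model on top-1 and top-2 accuracy.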
【72】 Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits
Authors: Wenshuo Guo, Kumar Krishna Agrawal, Aditya Grover, Vidya Muthukumar, Ashwin Pananjady
Affiliations: Department of Electrical Engineering and Computer Sciences, University of California, Berkeley; Facebook AI Research; School of Electrical & Computer Engineering and School of Industrial & Systems Engineering, Georgia Institute of Technology
Link: https://arxiv.org/abs/2106.14866
Abstract: We introduce the "inverse bandit" problem of estimating the rewards of a multi-armed bandit instance from observing the learning process of a low-regret demonstrator. Existing approaches to the related problem of inverse reinforcement learning assume the execution of an optimal policy, and thereby suffer from an identifiability issue. In contrast, our paradigm leverages the demonstrator's behavior en route to optimality, and in particular, the exploration phase, to obtain consistent reward estimates. We develop simple and efficient reward estimation procedures for demonstrations within a class of upper-confidence-based algorithms, showing that reward estimation gets progressively easier as the regret of the algorithm increases. We match these upper bounds with information-theoretic lower bounds that apply to any demonstrator algorithm, thereby characterizing the optimal tradeoff between exploration and reward estimation. Extensive empirical evaluations on both synthetic data and simulated experimental design data from the natural sciences corroborate our theoretical results.
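The underlying intuition can be illustrated with a toy simulation: a UCB demonstrator pulls a suboptimal arm roughly in proportion to 1/gap², so its pull counts alone (without ever observing the rewards) reveal the suboptimality gaps. The heuristic estimator below is a simplification for illustration, not the paper's procedure:

```python
import numpy as np

def run_ucb(means, T, seed=0):
    """A UCB1 demonstrator whose arm-pull counts the observer records."""
    rng = np.random.default_rng(seed)
    k = len(means)
    counts, sums = np.zeros(k), np.zeros(k)
    for t in range(T):
        if t < k:
            arm = t                        # pull each arm once to initialize
        else:
            ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(ucb))      # pull the highest-index arm
        counts[arm] += 1
        sums[arm] += rng.normal(means[arm], 1.0)   # reward unseen by observer
    return counts

def estimate_gaps_from_counts(pull_counts, T):
    """Toy 'inverse bandit' heuristic: UCB pulls a suboptimal arm about
    n_i ~ 2 ln T / gap_i^2 times, so gap_i ~ sqrt(2 ln T / n_i)."""
    counts = np.asarray(pull_counts, dtype=float)
    best = np.argmax(counts)               # most-pulled arm is presumed optimal
    gaps = np.sqrt(2.0 * np.log(T) / counts)
    gaps[best] = 0.0                       # optimal arm has zero gap
    return gaps
```

This also makes the paper's tradeoff concrete: the more a demonstrator explores (higher regret), the larger the suboptimal pull counts and the more accurate the gap estimates become.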
【73】 Topos and Stacks of Deep Neural Networks
Authors: Jean-Claude Belfiore, Daniel Bennequin
Affiliations: Huawei Paris Research Center, Mathematical and Algorithmic Sciences Lab; University of Paris
Link: https://arxiv.org/abs/2106.14587
Abstract: Every known artificial deep neural network (DNN) corresponds to an object in a canonical Grothendieck topos; its learning dynamic corresponds to a flow of morphisms in this topos. Invariance structures in the layers (like CNNs or LSTMs) correspond to Giraud's stacks. This invariance is supposed to be responsible for the generalization property, that is, extrapolation from learning data under constraints. The fibers represent pre-semantic categories (Culioli, Thom), over which artificial languages are defined, with internal logics: intuitionistic, classical, or linear (Girard). The semantic functioning of a network is its ability to express theories in such a language for answering questions in output about input data. Quantities and spaces of semantic information are defined by analogy with the homological interpretation of Shannon's entropy (P. Baudot and D.B., 2015). They generalize the measures found by Carnap and Bar-Hillel (1952). Amazingly, the above semantic structures are classified by geometric fibrant objects in a closed model category of Quillen; they then give rise to homotopical invariants of DNNs and of their semantic functioning. Intensional type theories (Martin-Löf) organize these objects and the fibrations between them. Information contents and exchanges are analyzed by Grothendieck's derivators.