cs.AI (Artificial Intelligence): 84 papers in total
【1】 How Do Adam and Training Strategies Help BNNs Optimization?
Authors: Zechun Liu, Zhiqiang Shen, Shichao Li, Koen Helwegen, Dong Huang, Kwang-Ting Cheng
Affiliations: Hong Kong University of Science and Technology; Carnegie Mellon University; Plumerai
Note: ICML 2021. Code and models are available at https://github.com/liuzechun/AdamBNN
Link: https://arxiv.org/abs/2106.11309
Abstract: The best performing Binary Neural Networks (BNNs) are usually attained using Adam optimization and its multi-step training variants. However, to the best of our knowledge, few studies explore the fundamental reasons why Adam is superior to other optimizers like SGD for BNN optimization or provide analytical explanations that support specific training strategies. To address this, in this paper we first investigate the trajectories of gradients and weights in BNNs during the training process. We show the regularization effect of second-order momentum in Adam is crucial to revitalize the weights that are dead due to the activation saturation in BNNs. We find that Adam, through its adaptive learning rate strategy, is better equipped to handle the rugged loss surface of BNNs and reaches a better optimum with higher generalization ability. Furthermore, we inspect the intriguing role of the real-valued weights in binary networks, and reveal the effect of weight decay on the stability and sluggishness of BNN optimization.
Through extensive experiments and analysis, we derive a simple training scheme, building on existing Adam-based optimization, which achieves 70.5% top-1 accuracy on the ImageNet dataset using the same architecture as the state-of-the-art ReActNet while achieving 1.1% higher accuracy. Code and models are available at https://github.com/liuzechun/AdamBNN.
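To make the role of the second moment more concrete, the following is a minimal sketch (not the authors' code) that compares one plain SGD step with one Adam step on a scalar weight. The function names and hyperparameter values are illustrative assumptions; the Adam defaults are the commonly used ones.

```python
import math

def sgd_step(w, grad, lr=0.1):
    """Plain SGD: the step size is proportional to the raw gradient."""
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar weight; returns (new_w, new_m, new_v)."""
    m = beta1 * m + (1 - beta1) * grad          # first moment: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment: running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero initialisation
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

# A weight that receives only a tiny gradient (hypothetical value),
# as may happen when activations saturate.
tiny_grad = 1e-4
w_sgd = sgd_step(1.0, tiny_grad)
w_adam, _, _ = adam_step(1.0, tiny_grad, m=0.0, v=0.0, t=1)
print("SGD moved by ", abs(1.0 - w_sgd))
print("Adam moved by", abs(1.0 - w_adam))
```

Because Adam divides by the square root of the second moment, the update magnitude stays close to the learning rate even when the raw gradient is very small, which is one plausible reading of how such weights can be "revitalized".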
【2】 Boundary Graph Neural Networks for 3D Simulations
Authors: Andreas Mayr, Sebastian Lehner, Arno Mayrhofer, Christoph Kloss, Sepp Hochreiter, Johannes Brandstetter
Affiliations: ELLIS Unit Linz, LIT AI Lab, Johannes Kepler University Linz; DCS Computing GmbH, Linz, Austria; Institute of Advanced Research in Artificial Intelligence (IARAI); University of Amsterdam
Link: https://arxiv.org/abs/2106.11299
Abstract: The abundance of data has given machine learning huge momentum in natural sciences and engineering. However, the modeling of simulated physical processes remains difficult. A key problem in doing so is the correct handling of geometric boundaries. While triangularized geometric boundaries are very common in engineering applications, they are notoriously difficult to model by machine learning approaches due to their heterogeneity with respect to size and orientation. In this work, we introduce Boundary Graph Neural Networks (BGNNs), which dynamically modify graph structures to address boundary conditions. Boundary graph structures are constructed via modifying edges, augmenting node features, and dynamically inserting virtual nodes. The new BGNNs are tested on complex 3D granular flow processes of hoppers and rotating drums, which are standard parts of industrial machinery. Using precise simulations that are obtained by an expensive and complex discrete element method, BGNNs are evaluated in terms of computational efficiency as well as prediction accuracy of particle flows and mixing entropies.
Even if complex boundaries are present, BGNNs are able to accurately reproduce 3D granular flows within simulation uncertainties over hundreds of thousands of simulation timesteps, and most notably particles completely stay within the geometric objects without using handcrafted conditions or restrictions.
【3】 VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning
Authors: Hao Tan, Jie Lei, Thomas Wolf, Mohit Bansal
Affiliations: UNC Chapel Hill; Huggingface
Note: Under review, 23 pages
Link: https://arxiv.org/abs/2106.11250
Abstract: Video understanding relies on perceiving the global content and modeling its internal connections (e.g., causality, movement, and spatio-temporal correspondence). To learn these interactions, we apply a mask-then-predict pre-training task on discretized video tokens generated via VQ-VAE. Unlike language, where the text tokens are more independent, neighboring video tokens typically have strong correlations (e.g., consecutive video frames usually look very similar), and hence uniformly masking individual tokens will make the task too trivial to learn useful representations. To deal with this issue, we propose a block-wise masking strategy where we mask neighboring video tokens in both spatial and temporal domains. We also add an augmentation-free contrastive learning method to further capture the global content by predicting whether the video clips are sampled from the same video. We pre-train our model on uncurated videos and show that our pre-trained model can reach state-of-the-art results on several video understanding datasets (e.g., SSV2, Diving48). Lastly, we provide detailed analyses on model scalability and pre-training method design. Code is released at https://github.com/airsplay/vimpac.
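The block-wise masking idea can be illustrated with a short sketch. This is not the VIMPAC implementation: the function name, block size, masking ratio, and grid dimensions below are assumptions chosen for illustration. It masks whole spatio-temporal blocks of a token grid instead of independent single tokens.

```python
import random

def block_mask(t, h, w, block=(2, 2, 2), ratio=0.5, seed=0):
    """Return a t x h x w nested list of booleans; True marks a masked token.

    Whole blocks of neighbouring tokens are masked until at least `ratio`
    of the grid is covered. Each block dimension must fit inside the grid.
    """
    rng = random.Random(seed)
    mask = [[[False] * w for _ in range(h)] for _ in range(t)]
    target = int(ratio * t * h * w)
    masked = 0
    bt, bh, bw = block
    while masked < target:
        # Pick the corner of a block that fits entirely inside the grid.
        t0 = rng.randrange(t - bt + 1)
        h0 = rng.randrange(h - bh + 1)
        w0 = rng.randrange(w - bw + 1)
        for i in range(t0, t0 + bt):
            for j in range(h0, h0 + bh):
                for k in range(w0, w0 + bw):
                    if not mask[i][j][k]:
                        mask[i][j][k] = True
                        masked += 1
    return mask

mask = block_mask(t=4, h=8, w=8)
n_masked = sum(cell for frame in mask for row in frame for cell in row)
print(n_masked, "of", 4 * 8 * 8, "tokens masked")
```

Since a masked token's immediate neighbours in space and time are usually masked as well, the model cannot recover it by copying a near-identical neighbour, which is the motivation the abstract gives.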
【4】 Can poachers find animals from public camera trap images?
Authors: Sara Beery, Elizabeth Bondi (equal contribution)
Affiliations: California Institute of Technology; Harvard University
Note: CV4Animals Workshop at CVPR 2021
Link: https://arxiv.org/abs/2106.11236
Abstract: To protect the location of camera trap data containing sensitive, high-target species, many ecologists randomly obfuscate the latitude and longitude of the camera when publishing their data. For example, they may publish a random location within a 1 km radius of the true camera location for each camera in their network. In this paper, we investigate the robustness of geo-obfuscation for maintaining camera trap location privacy, and show via a case study that a few simple, intuitive heuristics and publicly available satellite rasters can be used to reduce the area likely to contain the camera by 87% (assuming random obfuscation within 1 km), demonstrating that geo-obfuscation may be less effective than previously believed.
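The obfuscation step described above can be sketched as follows. This is an illustrative reconstruction, not the procedure used by any particular data publisher: the function names, the uniform-over-area sampling, the flat-Earth approximation, and the example coordinates are all assumptions.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def obfuscate(lat, lon, radius_km=1.0, rng=random):
    """Return a point drawn uniformly from a disc of radius_km around (lat, lon)."""
    # The square root makes the draw uniform over area instead of clustered at the centre.
    dist = radius_km * math.sqrt(rng.random())
    bearing = 2 * math.pi * rng.random()
    # Small-offset approximation: adequate for ~1 km displacements away from the poles.
    dlat = dist * math.cos(bearing) / EARTH_RADIUS_KM
    dlon = dist * math.sin(bearing) / (EARTH_RADIUS_KM * math.cos(math.radians(lat)))
    return lat + math.degrees(dlat), lon + math.degrees(dlon)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

true_lat, true_lon = 34.0, -118.0  # hypothetical camera location
pub_lat, pub_lon = obfuscate(true_lat, true_lon, rng=random.Random(0))
print("published point is", haversine_km(true_lat, true_lon, pub_lat, pub_lon), "km away")
```

Anyone who knows the obfuscation radius can invert this reasoning: the true camera must lie inside a disc of the same radius centred on the published point, and the paper's heuristics then shrink that candidate area.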
【5】 Corruption Robust Active Learning
Authors: Yifang Chen, Simon S. Du, Kevin Jamieson
Affiliations: Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA
Link: https://arxiv.org/abs/2106.11220
Abstract: We conduct theoretical studies on streaming-based active learning for binary classification under unknown adversarial label corruptions. In this setting, every time before the learner observes a sample, the adversary decides whether to corrupt the label or not. First, we show that, in a benign corruption setting (which includes the misspecification setting as a special case), with a slight enlargement on the hypothesis elimination threshold, the classical RobustCAL framework can (surprisingly) achieve nearly the same label complexity guarantee as in the non-corrupted setting. However, this algorithm can fail in the general corruption setting. To resolve this drawback, we propose a new algorithm which is provably correct without any assumptions on the presence of corruptions. Furthermore, this algorithm enjoys the minimax label complexity in the non-corrupted setting (which is achieved by RobustCAL) and only requires $\tilde{\mathcal{O}}(C_{\mathrm{total}})$ additional labels in the corrupted setting to achieve $\mathcal{O}(\varepsilon + \frac{C_{\mathrm{total}}}{n})$, where $\varepsilon$ is the target accuracy, $C_{\mathrm{total}}$ is the total number of corruptions and $n$ is the total number of unlabeled samples.
【6】 Iterative Network Pruning with Uncertainty Regularization for Lifelong Sentiment Classification
Authors: Binzong Geng, Min Yang, Fajie Yuan, Shupeng Wang, Xiang Ao, Ruifeng Xu
Affiliations: University of Science and Technology of China; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; Westlake University; Key Lab of Intelligent Information Processing of Chinese Academy of Sciences, Institute of Computing Technology, CAS
Note: Accepted by the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2021
Link: https://arxiv.org/abs/2106.11197
Abstract: Lifelong learning capabilities are crucial for sentiment classifiers to process continuous streams of opinionated information on the Web. However, performing lifelong learning is non-trivial for deep neural networks, as continual training on incrementally available information inevitably results in catastrophic forgetting or interference. In this paper, we propose a novel iterative network pruning with uncertainty regularization method for lifelong sentiment classification (IPRLS), which leverages the principles of network pruning and weight regularization. By performing network pruning with uncertainty regularization in an iterative manner, IPRLS can adapt a single BERT model to work with continuously arriving data from multiple domains while avoiding catastrophic forgetting and interference.
Specifically, we leverage an iterative pruning method to remove redundant parameters in large deep networks so that the freed-up space can then be employed to learn new tasks, tackling the catastrophic forgetting problem. Instead of keeping the old tasks fixed when learning new tasks, we also use an uncertainty regularization based on the Bayesian online learning framework to constrain the update of old-task weights in BERT, which enables positive backward transfer, i.e., learning new tasks improves performance on past tasks while protecting old knowledge from being lost. In addition, we propose a task-specific low-dimensional residual function in parallel to each layer of BERT, which makes IPRLS less prone to losing the knowledge saved in the base BERT network when learning a new task. Extensive experiments on 16 popular review corpora demonstrate that the proposed IPRLS method significantly outperforms the strong baselines for lifelong sentiment classification. For reproducibility, we submit the code and data at: https://github.com/siat-nlp/IPRLS.
【7】 On fine-tuning of Autoencoders for Fuzzy rule classifiers
Authors: Rahul Kumar Sevakula, Nishchal Kumar Verma, Hisao Ishibuchi
Link: https://arxiv.org/abs/2106.11182
Abstract: Recent discoveries in Deep Neural Networks are allowing researchers to tackle some very complex problems such as image classification and audio classification, with improved theoretical and empirical justifications. This paper presents a novel scheme to incorporate the use of autoencoders in Fuzzy rule classifiers (FRC). Autoencoders, when stacked, can learn the complex non-linear relationships amongst data, and the proposed framework built towards FRC can allow users to input expert knowledge to the system. This paper further introduces four novel fine-tuning strategies for autoencoders to improve the FRC's classification and rule reduction performance. The proposed framework has been tested across five real-world benchmark datasets. Elaborate comparisons with over 15 previous studies, and across 10-fold cross validation performance, suggest that the proposed methods are capable of building FRCs which can provide state-of-the-art accuracies.
【8】 Vehicle Trajectory Prediction in City-scale Road Networks using a Direction-based Sequence-to-Sequence Model with Spatiotemporal Attention Mechanisms
Authors: Yuebing Liang, Zhan Zhao
Affiliations: Department of Urban Planning and Design, University of Hong Kong
Link: https://arxiv.org/abs/2106.11175
Abstract: Trajectory prediction of vehicles at the city scale is of great importance to various location-based applications such as vehicle navigation, traffic management, and location-based recommendations. Existing methods typically represent a trajectory as a sequence of grid cells, road segments or intention sets. None of them is ideal, as the cell-based representation ignores the road network structures and the other two are less efficient in analyzing city-scale road networks. In addition, most models focus on predicting the immediate next position, and are difficult to generalize for longer sequences. To address these problems, we propose a novel sequence-to-sequence model named D-LSTM (Direction-based Long Short-Term Memory), which represents each trajectory as a sequence of intersections and associated movement directions, and then feeds them into an LSTM encoder-decoder network for future trajectory generation. Furthermore, we introduce a spatial attention mechanism to capture dynamic spatial dependencies in road networks, and a temporal attention mechanism with a sliding context window to capture both short- and long-term temporal dependencies in trajectory data.
Extensive experiments based on two real-world large-scale taxi trajectory datasets show that D-LSTM outperforms the existing state-of-the-art methods for vehicle trajectory prediction, validating the effectiveness of the proposed trajectory representation method and spatiotemporal attention mechanisms.
【9】 Curriculum-Driven Multi-Agent Learning and the Role of Implicit Communication in Teamwork
Authors: Niko A. Grupen, Daniel D. Lee, Bart Selman
Affiliations: Cornell University; Cornell Tech
Note: 18 pages, 10 figures
Link: https://arxiv.org/abs/2106.11156
Abstract: We propose a curriculum-driven learning strategy for solving difficult multi-agent coordination tasks. Our method is inspired by a study of animal communication, which shows that two straightforward design features (mutual reward and decentralization) support a vast spectrum of communication protocols in nature. We highlight the importance of similarly interpreting emergent communication as a spectrum. We introduce a toroidal, continuous-space pursuit-evasion environment and show that naive decentralized learning does not perform well. We then propose a novel curriculum-driven strategy for multi-agent learning. Experiments with pursuit-evasion show that our approach enables decentralized pursuers to learn to coordinate and capture a superior evader, significantly outperforming sophisticated analytical policies. We argue through additional quantitative analysis (including influence-based measures such as Instantaneous Coordination) that emergent implicit communication plays a large role in enabling superior levels of coordination.
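As a small illustration of what "toroidal, continuous-space" implies for pursuit-evasion, the sketch below computes the shortest distance between two agents when the arena wraps around at its edges. The function name and the unit-square arena are assumptions, not details taken from the paper.

```python
import math

def toroidal_distance(p, q, width=1.0, height=1.0):
    """Shortest distance between points p and q on a width x height torus."""
    dx = abs(p[0] - q[0]) % width
    dy = abs(p[1] - q[1]) % height
    # On a torus, going "around the edge" may be shorter than going straight.
    dx = min(dx, width - dx)
    dy = min(dy, height - dy)
    return math.hypot(dx, dy)

pursuer = (0.05, 0.5)   # hypothetical positions near opposite edges
evader = (0.95, 0.5)
print(toroidal_distance(pursuer, evader))
```

In such an arena there are no walls or corners to trap an evader against, so a faster evader can always flee from a single pursuer, which suggests why coordination among pursuers matters.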
【10】 GraphMixup: Improving Class-Imbalanced Node Classification on Graphs by Self-supervised Context Prediction
Authors: Lirong Wu, Haitao Lin, Zhangyang Gao, Cheng Tan, Stan Z. Li
Affiliations: AI Lab, School of Engineering, Westlake University, Hangzhou, Zhejiang Province, China; Institute of Advanced Technology, Westlake Institute for Advanced Study, Hangzhou, Zhejiang Province, China; Zhejiang University, Hangzhou, Zhejiang Province, China
Link: https://arxiv.org/abs/2106.11133
Abstract: Recent years have witnessed great success in handling node classification tasks with Graph Neural Networks (GNNs). However, most existing GNNs are based on the assumption that node samples for different classes are balanced, while for many real-world graphs, there exists the problem of class imbalance, i.e., some classes may have much fewer samples than others. In this case, directly training a GNN classifier with raw data would under-represent samples from those minority classes and result in sub-optimal performance. This paper presents GraphMixup, a novel mixup-based framework for improving class-imbalanced node classification on graphs. However, directly performing mixup in the input space or embedding space may produce out-of-domain samples due to the extreme sparsity of minority classes; hence we construct semantic relation spaces that allow the Feature Mixup to be performed at the semantic level. Moreover, we apply two context-based self-supervised techniques to capture both local and global information in the graph structure and then propose Edge Mixup specifically for graph data.
Finally, we develop a Reinforcement Mixup mechanism to adaptively determine how many samples are to be generated by mixup for those minority classes. Extensive experiments on three real-world datasets show that GraphMixup yields truly encouraging results for class-imbalanced node classification tasks.
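For readers unfamiliar with mixup, the basic interpolation it builds on can be sketched in a few lines. This is the generic mixup operation on feature vectors, not GraphMixup's semantic-level Feature Mixup or its Edge Mixup; the function name and example vectors are illustrative assumptions.

```python
def mixup(x_i, x_j, lam):
    """Convex combination of two equal-length feature vectors, with lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(x_i) != len(x_j):
        raise ValueError("feature vectors must have the same length")
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]

# Synthesise a new minority-class sample from two existing ones (hypothetical features).
minority_a = [1.0, 0.0]
minority_b = [0.0, 1.0]
synthetic = mixup(minority_a, minority_b, lam=0.25)
print(synthetic)
```

The abstract's concern is where this interpolation happens: doing it directly on raw inputs or embeddings of very sparse minority classes may yield samples that no longer resemble real data.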
【11】 Decadal Forecasts with ResDMD: a Residual DMD Neural Network
Authors: Eduardo Rodrigues, Bianca Zadrozny, Campbell Watson, David Gold
Note: Accepted to ICML 2021 Workshop Tackling Climate Change with Machine Learning
Link: https://arxiv.org/abs/2106.11111
Abstract: Operational forecasting centers are investing in decadal (1-10 year) forecast systems to support long-term decision making for a more climate-resilient society. One method that has previously been employed is the Dynamic Mode Decomposition (DMD) algorithm (also known as the Linear Inverse Model), which fits linear dynamical models to data. While the DMD usually approximates non-linear terms in the true dynamics as a linear system with random noise, we investigate an extension to the DMD that explicitly represents the non-linear terms as a neural network. Our weight initialization allows the network to produce sensible results before training and then improve the prediction after training as data becomes available. In this short paper, we evaluate the proposed architecture for simulating global sea surface temperatures and compare the results with the standard DMD and seasonal forecasts produced by the state-of-the-art dynamical model, CFSv2.
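In symbols, the two model forms described in this abstract might be written as below. The notation ($x_t$ for the state, $A$ for the linear operator, $\epsilon_t$ for noise, $f_\theta$ for the network, $X$ and $X'$ for the snapshot matrices) is assumed here for illustration and is not taken from the paper.

```latex
% Standard DMD / linear inverse model: linear dynamics, with the
% non-linear terms approximated as random noise
x_{t+1} = A\,x_t + \epsilon_t,
\qquad
A = \arg\min_{A} \,\lVert X' - A X \rVert_F^2

% Extension described in the abstract: the non-linear terms are
% represented explicitly by a neural network f_\theta
x_{t+1} = A\,x_t + f_\theta(x_t)
```

Under this reading, if $f_\theta$ outputs values near zero at initialization, the model starts out behaving like the fitted linear DMD, which would be consistent with the claim that the network produces sensible results before training.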
【12】 EML Online Speech Activity Detection for the Fearless Steps Challenge Phase-III
Authors: Omid Ghahabi, Volker Fischer
Affiliations: EML Speech Technology GmbH, Berliner Straße, Heidelberg, Germany
Link: https://arxiv.org/abs/2106.11075
Abstract: Speech Activity Detection (SAD), locating speech segments within an audio recording, is a main part of most speech technology applications. Robust SAD is usually more difficult in noisy conditions with varying signal-to-noise ratios (SNR). The Fearless Steps challenge has recently provided such data from the NASA Apollo-11 mission for different speech processing tasks including SAD. Most audio recordings are degraded by different kinds and levels of noise varying within and between channels. This paper describes the EML online algorithm for the most recent phase of this challenge. The proposed algorithm can be trained both in a supervised and unsupervised manner and assigns speech and non-speech labels at runtime approximately every 0.1 sec. The experimental results show a competitive accuracy on both development and evaluation datasets with a real-time factor of about 0.002 using a single CPU machine.
【13】 Techniques for Symbol Grounding with SATNet
Authors: Sever Topan, David Rolnick, Xujie Si
Affiliations: McGill University and NVIDIA; McGill University and Mila – Quebec AI Institute
Note: Code available at this https URL
Link: https://arxiv.org/abs/2106.11072
Abstract: Many experts argue that the future of artificial intelligence is limited by the field's ability to integrate symbolic logical reasoning into deep learning architectures. The recently proposed differentiable MAXSAT solver, SATNet, was a breakthrough in its capacity to integrate with a traditional neural network and solve visual reasoning problems. For instance, it can learn the rules of Sudoku purely from image examples. Despite its success, SATNet was shown to succumb to a key challenge in neurosymbolic systems known as the Symbol Grounding Problem: the inability to map visual inputs to symbolic variables without explicit supervision ("label leakage"). In this work, we present a self-supervised pre-training pipeline that enables SATNet to overcome this limitation, thus broadening the class of problems that SATNet architectures can solve to include datasets where no intermediary labels are available at all. We demonstrate that our method allows SATNet to attain full accuracy even with a harder problem setup that prevents any label leakage. We additionally introduce a proofreading method that further improves the performance of SATNet architectures, beating the state-of-the-art on Visual Sudoku.
【14】 QuaPy: A Python-Based Framework for Quantification
Authors: Alejandro Moreo, Andrea Esuli, Fabrizio Sebastiani
Affiliations: Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Via Giuseppe Moruzzi, Pisa, Italy
Link: https://arxiv.org/abs/2106.11057
Abstract: QuaPy is an open-source framework for performing quantification (a.k.a. supervised prevalence estimation), written in Python. Quantification is the task of training quantifiers via supervised learning, where a quantifier is a predictor that estimates the relative frequencies (a.k.a. prevalence values) of the classes of interest in a sample of unlabelled data. While quantification can be trivially performed by applying a standard classifier to each unlabelled data item and counting how many data items have been assigned to each class, it has been shown that this "classify and count" method is outperformed by methods specifically designed for quantification. QuaPy provides implementations of a number of baseline methods and advanced quantification methods, of routines for quantification-oriented model selection, of several broadly accepted evaluation measures, and of robust evaluation protocols routinely used in the field. QuaPy also makes available datasets commonly used for testing quantifiers, and offers visualization tools for facilitating the analysis and interpretation of the results. The software is open-source and publicly available under a BSD-3 licence via https://github.com/HLT-ISTI/QuaPy, and can be installed via pip (https://pypi.org/project/QuaPy/).
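The difference between "classify and count" and a method designed for quantification can be shown with a plain-Python sketch. It deliberately does not use QuaPy's API; the function names are hypothetical, and the correction shown is the well-known adjusted classify-and-count formula for the binary case, with made-up classifier rates and predictions.

```python
def classify_and_count(predictions):
    """Estimate positive-class prevalence as the fraction of positive predictions."""
    return sum(predictions) / len(predictions)

def adjusted_classify_and_count(predictions, tpr, fpr):
    """Correct the raw estimate using the classifier's true-positive and
    false-positive rates, which would be estimated on held-out labelled data."""
    cc = classify_and_count(predictions)
    if tpr == fpr:
        return cc  # correction undefined; fall back to the raw estimate
    # Invert cc = p * tpr + (1 - p) * fpr for the prevalence p, clipped to [0, 1].
    return min(1.0, max(0.0, (cc - fpr) / (tpr - fpr)))

# Hypothetical sample: 100 unlabelled items, 24 of which the classifier labels positive.
predictions = [1] * 24 + [0] * 76
raw = classify_and_count(predictions)
adjusted = adjusted_classify_and_count(predictions, tpr=0.8, fpr=0.1)
print("classify and count:", raw)
print("adjusted estimate: ", adjusted)
```

With these assumed error rates, a sample whose true prevalence is 0.2 would be expected to produce a raw count of 0.2 × 0.8 + 0.8 × 0.1 = 0.24, so the raw estimate is biased while the adjusted one recovers roughly 0.2.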
【15】 Paradigm selection for Data Fusion of SAR and Multispectral Sentinel data applied to Land-Cover Classification
Authors: Alessandro Sebastianelli, Maria Pia Del Rosso, Pierre Philippe Mathieu, Silvia Liberata Ullo
Affiliations: University of Sannio
Note: This work has been submitted to the IEEE Geoscience and Remote Sensing Letters for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Link: https://arxiv.org/abs/2106.11056
Abstract: Data fusion is a well-known technique that is becoming more and more popular in the Artificial Intelligence for Earth Observation (AI4EO) domain, mainly due to its ability to reinforce AI4EO applications by combining multiple data sources and thus bringing better results. On the other hand, like other methods for satellite data analysis, data fusion itself is also benefiting and evolving thanks to the integration of Artificial Intelligence (AI). In this letter, four data fusion paradigms, based on Convolutional Neural Networks (CNNs), are analyzed and implemented. The goals are to provide a systematic procedure for choosing the best data fusion framework, resulting in the best classification results, once the basic structure for the CNN has been defined, and to help interested researchers in their work when data fusion applied to remote sensing is involved. The procedure has been validated for land-cover classification but it can be transferred to other cases.
【16】 Performance Evaluation of Classification Models for Household Income, Consumption and Expenditure Data Set
Authors: Mersha Nigus, Dorsewamy
Affiliations: Department of Computer Science, Mangalore University, Karnataka, India
Link: https://arxiv.org/abs/2106.11055
Abstract: Food security is more prominent on the policy agenda today than it has been in the past, thanks to recent food shortages at both the regional and global levels as well as renewed promises from major donor countries to combat chronic hunger. One field where machine learning can be used is the classification of household food insecurity. In this study, we establish a robust methodology to categorize whether a household is food secure or food insecure using machine learning algorithms. We use ten machine learning algorithms to classify the food security status of the household: Gradient Boosting (GB), Random Forest (RF), Extra Tree (ET), Bagging, K-Nearest Neighbor (KNN), Decision Tree (DT), Support Vector Machine (SVM), Logistic Regression (LR), Ada Boost (AB) and Naive Bayes (NB). We then perform classification tasks on a household food security status data set developed by gathering data from the HICE survey and validating it with domain experts. All classifiers perform well on all performance metrics.
The performance of the Random Forest and Gradient Boosting models is outstanding, with a testing accuracy of 0.9997, while the other classifiers (Bagging, Decision Tree, Ada Boost, Extra Tree, K-Nearest Neighbor, Logistic Regression, SVM and Naive Bayes) score 0.9996, 0.09996, 0.9994, 0.95675, 0.9415, 0.8915, 0.7853 and 0.7595, respectively.
【17】 Visual Probing: Cognitive Framework for Explaining Self-Supervised Image Representations
Authors: Witold Oleszkiewicz, Dominika Basaj, Igor Sieradzki, Michał Górszczak, Barbara Rychalska, Koryna Lewandowska, Tomasz Trzciński, Bartosz Zieliński
Affiliations: Department of Cognitive Neuroscience and Neuroergonomics (K. Lewandowska)
Link: https://arxiv.org/abs/2106.11054
Abstract: Recently introduced self-supervised methods for image representation learning provide on par or superior results to their fully supervised competitors, yet the corresponding efforts to explain the self-supervised approaches lag behind. Motivated by this observation, we introduce a novel visual probing framework for explaining the self-supervised models by leveraging probing tasks employed previously in natural language processing. The probing tasks require knowledge about semantic relationships between image parts. Hence, we propose a systematic approach to obtain analogs of natural language in vision, such as visual words, context, and taxonomy. Our proposal is grounded in Marr's computational theory of vision and concerns features like textures, shapes, and lines. We show the effectiveness and applicability of those analogs in the context of explaining self-supervised representations. Our key findings emphasize that relations between language and vision can serve as an effective yet intuitive tool for discovering how machine learning models work, independently of data modality. Our work opens a plethora of research pathways towards more explainable and transparent AI.
【18】 Leveraging Language to Learn Program Abstractions and Search Heuristics
Authors: Catherine Wong, Kevin Ellis, Joshua B. Tenenbaum, Jacob Andreas
Affiliations: MIT, Cornell University, Center for Brains
Comments: Appeared in the Thirty-eighth International Conference on Machine Learning (ICML 2021)
Link: https://arxiv.org/abs/2106.11053
Abstract: Inductive program synthesis, or inferring programs from examples of desired behavior, offers a general paradigm for building interpretable, robust, and generalizable machine learning systems. Effective program synthesis depends on two key ingredients: a strong library of functions from which to build programs, and an efficient search strategy for finding programs that solve a given task. We introduce LAPS (Language for Abstraction and Program Search), a technique for using natural language annotations to guide joint learning of libraries and neurally-guided search models for synthesis. When integrated into a state-of-the-art library learning system (DreamCoder), LAPS produces higher-quality libraries and improves search efficiency and generalization on three domains -- string editing, image composition, and abstract reasoning about scenes -- even when no natural language hints are available at test time.
【19】 Institutionalising Ethics in AI through Broader Impact Requirements
Authors: Carina Prunkl, Carolyn Ashurst, Markus Anderljung, Helena Webb, Jan Leike, Allan Dafoe
Affiliations: Institute for Ethics in AI, University of Oxford; Future of Humanity Institute, University of Oxford; Department of Computer Science, University of Oxford
Link: https://arxiv.org/abs/2106.11039
Abstract: Turning principles into practice is one of the most pressing challenges of artificial intelligence (AI) governance. In this article, we reflect on a novel governance initiative by one of the world's largest AI conferences. In 2020, the Conference on Neural Information Processing Systems (NeurIPS) introduced a requirement for submitting authors to include a statement on the broader societal impacts of their research. Drawing insights from similar governance initiatives, including institutional review boards (IRBs) and impact requirements for funding applications, we investigate the risks, challenges and potential benefits of such an initiative. Among the challenges, we list a lack of recognised best practice and procedural transparency, researcher opportunity costs, institutional and social pressures, cognitive biases, and the inherently difficult nature of the task. The potential benefits, on the other hand, include improved anticipation and identification of impacts, better communication with policy and governance experts, and a general strengthening of the norms around responsible research. To maximise the chance of success, we recommend measures to increase transparency, improve guidance, create incentives to engage earnestly with the process, and facilitate public deliberation on the requirement's merits and future. Perhaps the most important contribution of this analysis is the insights we can gain regarding effective community-based governance and the role and responsibility of the AI research community more broadly.
【20】 Know Your Model (KYM): Increasing Trust in AI and Machine Learning
Authors: Mary Roszel, Robert Norvill, Jean Hilger, Radu State
Affiliations: University of Luxembourg; Banque et Caisse d’Epargne de l’Etat
Comments: 10 pages
Link: https://arxiv.org/abs/2106.11036
Abstract: The widespread utilization of AI systems has drawn attention to the potential impacts of such systems on society. Of particular concern are the consequences that prediction errors may have on real-world scenarios, and the trust humanity places in AI systems. It is necessary to understand how we can evaluate trustworthiness in AI and how individuals and entities alike can develop trustworthy AI systems. In this paper, we analyze each element of trustworthiness and provide a set of 20 guidelines that can be leveraged to ensure optimal AI functionality while taking into account the greater ethical, technical, and practical impacts on humanity. Moreover, the guidelines help ensure that trustworthiness is provable and can be demonstrated, they are implementation agnostic, and they can be applied to any AI system in any sector.
【21】 Teaching Machine Learning in K-12 Computing Education: Potential and Pitfalls
Authors: Matti Tedre, Tapani Toivonen, Juho Kaihila, Henriikka Vartiainen, Teemu Valtonen, Ilkka Jormanainen, Arnold Pears
Affiliations: School of Applied Educational Science and Teacher Education, University of Eastern Finland, Joensuu, Finland
Link: https://arxiv.org/abs/2106.11034
Abstract: Over the past decades, numerous practical applications of machine learning techniques have shown the potential of data-driven approaches in a large number of computing fields. Machine learning is increasingly included in computing curricula in higher education, and a quickly growing number of initiatives are expanding it in K-12 computing education, too. As machine learning enters K-12 computing education, understanding how intuition and agency in the context of such systems are developed becomes a key research area. But as schools and teachers are already struggling with integrating traditional computational thinking and traditional artificial intelligence into school curricula, understanding the challenges behind teaching machine learning in K-12 is an even more daunting challenge for computing education research. Despite the central position of machine learning in the field of modern computing, the computing education research body of literature contains remarkably few studies of how people learn to train, test, improve, and deploy machine learning systems. This is especially true of the K-12 curriculum space. This article charts the emerging trajectories in educational practice, theory, and technology related to teaching machine learning in K-12 education. The article situates the existing work in the context of computing education in general, and describes some differences that K-12 computing educators should take into account when facing this challenge. The article focuses on key aspects of the paradigm shift that will be required in order to successfully integrate machine learning into the broader K-12 computing curricula. A crucial step is abandoning the belief that rule-based "traditional" programming is a central aspect and building block in developing next generation computational thinking.
【22】 Hard Choices in Artificial Intelligence
Authors: Roel Dobbe, Thomas Krendl Gilbert, Yonatan Mintz
Comments: Pre-print. Shorter versions published at NeurIPS 2019 Workshop on AI for Social Good and Conference on AI, Ethics and Society 2020
Link: https://arxiv.org/abs/2106.11022
Abstract: As AI systems are integrated into high stakes social domains, researchers now examine how to design and operate them in a safe and ethical manner. However, the criteria for identifying and diagnosing safety risks in complex social contexts remain unclear and contested. In this paper, we examine the vagueness in debates about the safety and ethical behavior of AI systems. We show how this vagueness cannot be resolved through mathematical formalism alone, instead requiring deliberation about the politics of development as well as the context of deployment. Drawing from a new sociotechnical lexicon, we redefine vagueness in terms of distinct design challenges at key stages in AI system development. The resulting framework of Hard Choices in Artificial Intelligence (HCAI) empowers developers by 1) identifying points of overlap between design decisions and major sociotechnical challenges, and 2) motivating the creation of stakeholder feedback channels so that safety issues can be exhaustively addressed. As such, HCAI contributes to a timely debate about the status of AI development in democratic societies, arguing that deliberation should be the goal of AI Safety, not just the procedure by which it is ensured.
【23】 Wheelchair automation by a hybrid BCI system using SSVEP and eye blinks
Authors: Lizy Kanungo, Nikhil Garg, Anish Bhobe, Smit Rajguru, Veeky Baths
Affiliations: Cognitive Neuroscience Lab, Department of Biological Science, BITS Pilani, K. K. Birla Goa Campus, Goa, India
Link: https://arxiv.org/abs/2106.11008
Abstract: This work proposes a hybrid Brain Computer Interface system for the automation of a wheelchair for the disabled. Herein a working prototype of a BCI-based wheelchair is detailed that can navigate inside a typical home environment with minimum structural modification and without any visual obstruction and discomfort to the user. The prototype is based on a combined mechanism of steady-state visually evoked potential and eye blinks. To elicit SSVEP, LEDs flickering at 13 Hz and 15 Hz were used to select the left and right direction, respectively, and EEG data was recorded. In addition, the occurrence of three continuous blinks was used as an indicator for stopping an ongoing action. The wavelet packet denoising method was applied, followed by feature extraction methods such as Wavelet Packet Decomposition and Canonical Correlation Analysis over narrowband reconstructed EEG signals. Bayesian optimization with 5-fold cross-validation was used to optimize the hyperparameters of the Support Vector Machine. The resulting new model was tested, and an average cross-validation accuracy of 89.65% ± 6.6% (SD) and a testing accuracy of 83.53% ± 8.59% (SD) were obtained. The wheelchair was controlled by a Raspberry Pi through WiFi. The developed prototype demonstrated an average success rate of 86.97% over all trials, with 4.015 s for each command execution. The prototype can be used efficiently in a home environment without causing any discomfort to the user.
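The frequency-selection idea in entry 【23】 can be illustrated with a minimal single-channel sketch. It is not the authors' pipeline (which combines wavelet packet denoising, Wavelet Packet Decomposition, Canonical Correlation Analysis and an SVM); it only scores one channel against sine/cosine references at 13 Hz and 15 Hz, which is what canonical correlation reduces to when there is a single channel and a single sin/cos reference pair. The function names, the `COMMANDS` mapping layout and the assumption that the window holds a whole number of cycles are illustrative choices, not taken from the paper.

```python
import math

# Assumed mapping from flicker frequency (Hz) to command, following the abstract.
COMMANDS = {13.0: "left", 15.0: "right"}

def reference_score(signal, freq_hz, fs_hz):
    """Multiple correlation between one channel and a sin/cos pair at freq_hz.

    Assumes the window holds a whole number of cycles, so the two references
    are zero-mean and mutually orthogonal.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    energy = sum(c * c for c in centered)
    if energy == 0.0:
        return 0.0
    score_sq = 0.0
    for wave in (math.sin, math.cos):
        ref = [wave(2.0 * math.pi * freq_hz * i / fs_hz) for i in range(n)]
        dot = sum(c * r for c, r in zip(centered, ref))
        score_sq += dot * dot / (energy * sum(r * r for r in ref))
    return math.sqrt(score_sq)

def decode_command(signal, fs_hz):
    """Return the command whose reference frequency correlates best with the signal."""
    best = max(COMMANDS, key=lambda f: reference_score(signal, f, fs_hz))
    return COMMANDS[best]
```

A practical decoder would probably also reject windows whose best score is low, so that no command is issued when the user is not attending to either LED; that threshold is left out here.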
【24】 BernNet: Learning Arbitrary Graph Spectral Filters via Bernstein Approximation
Authors: Mingguo He, Zhewei Wei, Zengfeng Huang, Hongteng Xu
Affiliations: Renmin University of China; Fudan University
Comments: 14 pages, 31 figures
Link: https://arxiv.org/abs/2106.10994
Abstract: Many representative graph neural networks, e.g., GPR-GNN and ChebyNet, approximate graph convolutions with graph spectral filters. However, existing work either applies predefined filter weights or learns them without necessary constraints, which may lead to oversimplified or ill-posed filters. To overcome these issues, we propose BernNet, a novel graph neural network with theoretical support that provides a simple but effective scheme for designing and learning arbitrary graph spectral filters. In particular, for any filter over the normalized Laplacian spectrum of a graph, our BernNet estimates it by an order-$K$ Bernstein polynomial approximation and designs its spectral property by setting the coefficients of the Bernstein basis. Moreover, we can learn the coefficients (and the corresponding filter weights) based on observed graphs and their associated signals and thus achieve the BernNet specialized for the data. Our experiments demonstrate that BernNet can learn arbitrary spectral filters, including complicated band-rejection and comb filters, and it achieves superior performance in real-world graph modeling tasks.
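To make the order-$K$ Bernstein approximation in entry 【24】 concrete, here is a small sketch that evaluates a Bernstein-basis filter response on the normalized Laplacian spectrum $[0, 2]$, using $h(\lambda) = \sum_{k=0}^{K} \theta_k \binom{K}{k} 2^{-K} (2-\lambda)^{K-k} \lambda^k$. It works on scalar eigenvalues rather than on a graph, and the coefficients are simply sampled from a target response instead of being learned; the function names and the band-rejection target are hypothetical.

```python
from math import comb

def bernstein_filter(theta, lam):
    """Evaluate the order-K Bernstein filter response at eigenvalue lam in [0, 2]."""
    K = len(theta) - 1
    return sum(
        theta[k] * comb(K, k) / 2**K * (2 - lam) ** (K - k) * lam**k
        for k in range(K + 1)
    )

def fit_coefficients(target, K):
    """Set theta_k to the target response sampled at the evenly spaced points 2k/K."""
    return [target(2 * k / K) for k in range(K + 1)]

def band_reject(lam):
    """Hypothetical target response that suppresses frequencies around lam = 1."""
    return abs(lam - 1.0)
```

Because every Bernstein basis function is non-negative on $[0, 2]$, keeping all $\theta_k \ge 0$ keeps the filter response non-negative, which is one way to read the "necessary constraints" mentioned in the abstract.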
【25】 Pre-training also Transfers Non-Robustness
Authors: Jiaming Zhang, Jitao Sang, Qi Yi, Huiwen Dong, Jian Yu
Affiliations: Beijing Jiaotong University; Beijing Normal University
Link: https://arxiv.org/abs/2106.10989
Abstract: Pre-training has enabled state-of-the-art results on many tasks. In spite of its recognized contribution to generalization, we observed in this study that pre-training also transfers non-robustness from the pre-trained model into the fine-tuned model. Using image classification as an example, we first conducted experiments on various datasets and network backbones to explore the factors influencing robustness. We then examine the difference between the fine-tuned model and the standard model to uncover the reason for the non-robustness transfer. Finally, we introduce a simple robust pre-training solution by regularizing the difference between target and source tasks. Results validate the effectiveness in alleviating non-robustness and preserving generalization.
【26】 Attribute Selection using Contranominal Scales
Authors: Dominik Dürrschnabel, Maren Koyda, Gerd Stumme
Affiliations: Knowledge & Data Engineering Group, University of Kassel, Germany; Interdisciplinary Research Center for Information System Design
Comments: 17 pages, 2 figures, 3 tables, 1 algorithm, 26th International Conference on Conceptual Structures
Link: https://arxiv.org/abs/2106.10978
Abstract: Formal Concept Analysis (FCA) allows analyzing binary data by deriving concepts and ordering them in lattices. One of the main goals of FCA is to enable humans to comprehend the information that is encapsulated in the data; however, the large size of concept lattices is a limiting factor for the feasibility of understanding the underlying structural properties. The size of such a lattice depends on the number of subcontexts in the corresponding formal context that are isomorphic to a contranominal scale of high dimension. In this work, we propose the algorithm ContraFinder that enables the computation of all contranominal scales of a given formal context. Leveraging this algorithm, we introduce delta-adjusting, a novel approach to decrease the number of contranominal scales in a formal context by the selection of an appropriate attribute subset. We demonstrate that delta-adjusting a context reduces the size of the hereby emerging sub-semilattice and that the implication set is restricted to meaningful implications. This is evaluated with respect to its associated knowledge by means of a classification task. Hence, our proposed technique strongly improves understandability while preserving important conceptual structures.
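For readers unfamiliar with the term in entry 【26】: a contranominal scale of dimension $k$ is the formal context $(\{1..k\}, \{1..k\}, \neq)$, in which each object has every attribute except its own, and its concept lattice has $2^k$ concepts. The sketch below is a naive brute-force search for such subcontexts, not the ContraFinder algorithm from the paper. The dict-of-sets context representation and the function names are assumptions made for illustration, and the search is intended for $k \ge 2$.

```python
from itertools import combinations

def is_contranominal(context, objects, attributes):
    """True if the subcontext on (objects, attributes) is a contranominal scale.

    Each chosen object must lack exactly one chosen attribute, and no two
    objects may lack the same one.
    """
    if len(objects) != len(attributes):
        return False
    missing = []
    for g in objects:
        absent = [m for m in attributes if m not in context[g]]
        if len(absent) != 1:
            return False
        missing.append(absent[0])
    return len(set(missing)) == len(attributes)

def contranominal_scales(context, k):
    """Enumerate all k-dimensional contranominal subcontexts by brute force."""
    all_attributes = sorted(set().union(*context.values()))
    found = []
    for objs in combinations(sorted(context), k):
        for attrs in combinations(all_attributes, k):
            if is_contranominal(context, objs, attrs):
                found.append((objs, attrs))
    return found
```

The enumeration is exponential in $k$, which is exactly why a dedicated algorithm is needed for contexts of realistic size.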
【27】 Defeasible Reasoning via Datalog$^\neg$
Authors: Michael J. Maher
Affiliations: Reasoning Research Institute, Canberra, Australia
Comments: Under consideration in Theory and Practice of Logic Programming (TPLP)
Link: https://arxiv.org/abs/2106.10946
Abstract: We address the problem of compiling defeasible theories to Datalog$^\neg$ programs. We prove the correctness of this compilation, for the defeasible logic $DL(\partial_{||})$, but the techniques we use apply to many other defeasible logics. Structural properties of $DL(\partial_{||})$ are identified that support efficient implementation and/or approximation of the conclusions of defeasible theories in the logic, compared with other defeasible logics. We also use previously well-studied structural properties of logic programs to adapt to incomplete Datalog$^\neg$ implementations.
【28】 Hard hat wearing detection based on head keypoint localization
Authors: Bartosz Wójcik, Mateusz Żarski, Kamil Książek, Jarosław Adam Miszczak, Mirosław Jan Skibniewski
Affiliations: Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Bałtycka, Gliwice, Poland; Electronics and Computer Science, Silesian University of Technology, Akademicka, Gliwice, Poland
Comments: 15 pages, 9 figures and 9 tables
Link: https://arxiv.org/abs/2106.10944
Abstract: In recent years, a lot of attention has been paid to deep learning methods in the context of vision-based construction site safety systems, especially regarding personal protective equipment. However, despite all this attention, there is still no reliable way to establish the relationship between workers and their hard hats. To answer this problem, a combination of deep learning, object detection and head keypoint localization, with simple rule-based reasoning, is proposed in this article. In tests, this solution surpassed the previous methods based on the relative bounding box position of different instances, as well as direct detection of hard hat wearers and non-wearers. The results show that the conjunction of novel deep learning methods with humanly-interpretable rule-based systems can result in a solution that is both reliable and can successfully mimic manual, on-site supervision. This work is the next step in the development of fully autonomous construction site safety systems and shows that there is still room for improvement in this area.
【29】 On Limited-Memory Subsampling Strategies for Bandits
Authors: Dorian Baudry, Yoan Russac, Olivier Cappé
Affiliations: Université PSL
Link: https://arxiv.org/abs/2106.10935
Abstract: There has been a recent surge of interest in nonparametric bandit algorithms based on subsampling. One drawback, however, of these approaches is the additional complexity required by random subsampling and the storage of the full history of rewards. Our first contribution is to show that a simple deterministic subsampling rule, proposed in the recent work of Baudry et al. (2020) under the name of "last-block subsampling", is asymptotically optimal in one-parameter exponential families. In addition, we prove that these guarantees also hold when limiting the algorithm memory to a polylogarithmic function of the time horizon. These findings open up new perspectives, in particular for non-stationary scenarios in which the arm distributions evolve over time. We propose a variant of the algorithm in which only the most recent observations are used for subsampling, achieving optimal regret guarantees under the assumption of a known number of abrupt changes. Extensive numerical simulations highlight the merits of this approach, particularly when the changes are not only affecting the means of the rewards.
【30】 GRAND: Graph Neural Diffusion
Authors: Benjamin Paul Chamberlain, James Rowbottom, Maria Gorinova, Stefan Webb, Emanuele Rossi, Michael M. Bronstein
Comments: 15 pages, 4 figures. Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s)
Link: https://arxiv.org/abs/2106.10934
Abstract: We present Graph Neural Diffusion (GRAND) that approaches deep learning on graphs as a continuous diffusion process and treats Graph Neural Networks (GNNs) as discretisations of an underlying PDE. In our model, the layer structure and topology correspond to the discretisation choices of temporal and spatial operators. Our approach allows a principled development of a broad new class of GNNs that are able to address the common plights of graph learning models such as depth, oversmoothing, and bottlenecks. Key to the success of our models is stability with respect to perturbations in the data, and this is addressed for both implicit and explicit discretisation schemes. We develop linear and nonlinear versions of GRAND, which achieve competitive results on many standard graph benchmarks.
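The "GNN layer as a discretisation step" view in entry 【30】 can be sketched with an explicit Euler scheme for the linear diffusion equation $\dot{x} = (A - I)x$ on a graph. This is a toy illustration only: the matrix `A` is a fixed row-stochastic matrix supplied by the caller (in GRAND it would come from learned attention), each node carries a single scalar feature, and the function names are made up for this example.

```python
def diffusion_step(x, A, tau):
    """One explicit Euler step of dx/dt = (A - I) x with step size tau."""
    n = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return [x[i] + tau * (Ax[i] - x[i]) for i in range(n)]

def diffuse(x, A, tau, steps):
    """Apply `steps` Euler steps; each step plays the role of one layer."""
    for _ in range(steps):
        x = diffusion_step(x, A, tau)
    return x
```

With a row-stochastic `A` and `0 < tau <= 1`, each update equals `(1 - tau) * x + tau * A x`, a convex combination of node values, so the features stay within their initial range. Running many steps drives all nodes toward a common value, which is the oversmoothing behaviour the abstract refers to.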
【31】 STEP-EZ: Syntax Tree guided semantic ExPlanation for Explainable Zero-shot modeling of clinical depression symptoms from text
Authors: Nawshad Farruque, Randy Goebel, Osmar Zaiane, Sudhakar Sivapalan
Affiliations: Department of Computing Science, University of Alberta; Alberta Machine Intelligence Institute (AMII), University of Alberta; Department of Psychiatry, University of Alberta
Link: https://arxiv.org/abs/2106.10928
Abstract: We focus on exploring various approaches of Zero-Shot Learning (ZSL) and their explainability for a challenging yet important supervised learning task notorious for training data scarcity, i.e. Depression Symptoms Detection (DSD) from text. We start with a comprehensive synthesis of different components of our ZSL modeling and analysis of our ground truth samples and Depression symptom clues curation process with the help of a practicing clinician. We next analyze the accuracy of various state-of-the-art ZSL models and their potential enhancements for our task. Further, we sketch a framework for the use of ZSL for a hierarchical text-based explanation mechanism, which we call Syntax Tree-Guided Semantic Explanation (STEP). Finally, we summarize experiments from which we conclude that we can use ZSL models and achieve reasonable accuracy and explainability, measured by a proposed Explainability Index (EI). This work is, to our knowledge, the first work to exhaustively explore the efficacy of ZSL models for the DSD task, both in terms of accuracy and explainability.
【32】 Open-set Label Noise Can Improve Robustness Against Inherent Label Noise
Authors: Hongxin Wei, Lue Tao, Renchunzi Xie, Bo An
Affiliations: Nanyang Technological University, Singapore; Nanjing University of Aeronautics and Astronautics
Link: https://arxiv.org/abs/2106.10891
Abstract: Learning with noisy labels is a practically challenging problem in weakly supervised learning. In the existing literature, open-set noises are always considered to be poisonous for generalization, similar to closed-set noises. In this paper, we empirically show that open-set noisy labels can be non-toxic and even benefit the robustness against inherent noisy labels. Inspired by the observations, we propose a simple yet effective regularization by introducing Open-set samples with Dynamic Noisy Labels (ODNL) into training. With ODNL, the extra capacity of the neural network can be largely consumed in a way that does not interfere with learning patterns from clean data. Through the lens of SGD noise, we show that the noises induced by our method are random-direction, conflict-free and biased, which may help the model converge to a flat minimum with superior stability and enforce the model to produce conservative predictions on Out-of-Distribution instances. Extensive experimental results on benchmark datasets with various types of noisy labels demonstrate that the proposed method not only enhances the performance of many existing robust algorithms but also achieves significant improvement on Out-of-Distribution detection tasks even in the label noise setting.
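The data-side mechanics of ODNL in entry 【32】 might look roughly like the sketch below: auxiliary open-set samples are mixed into each epoch with freshly drawn labels. The abstract only says the labels are "dynamic"; drawing them uniformly at random once per epoch is an assumption of this sketch, as are the function names and the plain-list data representation.

```python
import random

def relabel_open_set(num_aux, num_classes, rng):
    """Draw a fresh uniformly random label for every auxiliary open-set sample."""
    return [rng.randrange(num_classes) for _ in range(num_aux)]

def epoch_pairs(clean, aux_inputs, num_classes, rng):
    """Return (input, label) pairs for one epoch: clean data plus re-labelled aux data."""
    aux_labels = relabel_open_set(len(aux_inputs), num_classes, rng)
    return list(clean) + list(zip(aux_inputs, aux_labels))
```

Calling `epoch_pairs` once per epoch with the same `rng` gives the auxiliary samples different labels each time, so the network cannot settle on a consistent mapping for them; shuffling and batching are left to the training loop.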
【33】 Proceedings Eighteenth Conference on Theoretical Aspects of Rationality and Knowledge
Authors: Joseph Halpern, Andrés Perea
Link: https://arxiv.org/abs/2106.10886
Abstract: The TARK conference (Theoretical Aspects of Rationality and Knowledge) is a biannual conference that aims to bring together researchers from a wide variety of fields, including computer science, artificial intelligence, game theory, decision theory, philosophy, logic, linguistics, and cognitive science. Its goal is to further our understanding of interdisciplinary issues involving reasoning about rationality and knowledge. Topics of interest include, but are not limited to, semantic models for knowledge, belief, awareness and uncertainty, bounded rationality and resource-bounded reasoning, commonsense epistemic reasoning, epistemic logic, epistemic game theory, knowledge and action, applications of reasoning about knowledge and other mental states, belief revision, and foundations of multi-agent systems. These proceedings contain the papers that have been accepted for presentation at the Eighteenth Conference on Theoretical Aspects of Rationality and Knowledge (TARK 2021), held between June 25 and June 27, 2021, at Tsinghua University in Beijing, China.
【34】 Total Generate: Cycle in Cycle Generative Adversarial Networks for Generating Human Faces, Hands, Bodies, and Natural Scenes
Authors: Hao Tang, Nicu Sebe
Affiliations: Department of Information Engineering and Computer Science (DISI), University of Trento
Comments: Accepted to TMM, an extended version of a paper published in ACM MM 2019. arXiv admin note: substantial text overlap with arXiv:1908.00999
Link: https://arxiv.org/abs/2106.10876
Abstract: We propose a novel and unified Cycle in Cycle Generative Adversarial Network (C2GAN) for generating human faces, hands, bodies, and natural scenes. Our proposed C2GAN is a cross-modal model exploring the joint exploitation of the input image data and guidance data in an interactive manner. C2GAN contains two different generators, i.e., an image-generation generator and a guidance-generation generator. Both generators are mutually connected and trained in an end-to-end fashion and explicitly form three cycled subnets, i.e., one image generation cycle and two guidance generation cycles. Each cycle aims at reconstructing the input domain and simultaneously produces a useful output involved in the generation of another cycle. In this way, the cycles constrain each other implicitly, providing complementary information from both image and guidance modalities and bringing an extra supervision gradient across the cycles, facilitating a more robust optimization of the whole model. Extensive results on four guided image-to-image translation subtasks demonstrate that the proposed C2GAN is effective in generating more realistic images compared with state-of-the-art models. The code is available at https://github.com/Ha0Tang/C2GAN.
【35】 On the Importance of Environments in Human-Robot Coordination
Authors: Matthew C. Fontaine, Ya-Chuan Hsu, Yulun Zhang, Bryon Tjakana, Stefanos Nikolaidis
Affiliations: Department of Computer Science, University of Southern California, Los Angeles, CA, USA
Comments: Accepted to Robotics: Science and Systems (RSS) 2021
Link: https://arxiv.org/abs/2106.10853
Abstract: When studying robots collaborating with humans, much of the focus has been on robot policies that coordinate fluently with human teammates in collaborative tasks. However, less emphasis has been placed on the effect of the environment on coordination behaviors. To thoroughly explore environments that result in diverse behaviors, we propose a framework for procedural generation of environments that are (1) stylistically similar to human-authored environments, (2) guaranteed to be solvable by the human-robot team, and (3) diverse with respect to coordination measures. We analyze the procedurally generated environments in the Overcooked benchmark domain via simulation and an online user study. Results show that the environments result in qualitatively different emerging behaviors and statistically significant differences in collaborative fluency metrics, even when the robot runs the same planning algorithm.
【36】 Trainable Class Prototypes for Few-Shot Learning
Authors: Jianyi Li, Guizhong Liu
Affiliations: School of Information and Communications, Xi'an Jiaotong University, Xi'an, P.R. China
Comments: 8 pages, 2 figures, and 3 tables. arXiv admin note: substantial text overlap with arXiv:2008.09942
Link: https://arxiv.org/abs/2106.10846
Abstract: Metric learning is a widely used method for few-shot learning in which the quality of the prototypes plays a key role. In this paper we propose trainable prototypes for the distance measure, instead of artificial ones, within the meta-training and task-training framework. Also, to avoid the disadvantages of episodic meta-training, we adopt non-episodic meta-training based on self-supervised learning. Overall, we solve few-shot tasks in two phases: meta-training a transferable feature extractor via self-supervised learning, and training the prototypes for metric classification. In addition, a simple attention mechanism is used in both meta-training and task-training. Our method achieves state-of-the-art performance in a variety of established few-shot tasks on the standard few-shot visual classification dataset, with about a 20% increase compared to the available unsupervised few-shot learning methods.
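The idea of training prototypes for metric classification can be illustrated with a minimal sketch. It assumes squared Euclidean distance, prototypes initialised as class means, and a plain cross-entropy gradient step. The function names, the toy "cat"/"dog" features, and the learning rate are hypothetical, and the paper's actual training procedure and attention mechanism are not reproduced here.

```python
import math

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_prototypes(support):
    """Initialise one prototype per class as the mean of its support features."""
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return prototypes

def class_probabilities(query, prototypes):
    """Softmax over negative squared distances from the query to each prototype."""
    logits = {label: -squared_distance(query, p) for label, p in prototypes.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - top) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

def train_step(prototypes, query, true_label, lr=0.1):
    """One gradient-descent step on the cross-entropy loss w.r.t. the prototypes."""
    probs = class_probabilities(query, prototypes)
    updated = {}
    for label, proto in prototypes.items():
        target = 1.0 if label == true_label else 0.0
        # For logits -||q - c_k||^2 the gradient is dL/dc_k = 2 * (p_k - y_k) * (q - c_k).
        coeff = 2.0 * (probs[label] - target)
        updated[label] = [c - lr * coeff * (q - c) for c, q in zip(proto, query)]
    return updated

# Toy 2-D features (hypothetical data, not from the paper).
support = {"cat": [[0.0, 0.0], [0.0, 2.0]], "dog": [[4.0, 0.0], [4.0, 2.0]]}
prototypes = mean_prototypes(support)
query = [1.5, 1.0]
before = class_probabilities(query, prototypes)
after = class_probabilities(query, train_step(prototypes, query, "cat"))
print(max(before, key=before.get), before["cat"], after["cat"])
```

The gradient step pulls the true-class prototype toward the query and pushes the others away, so the probability assigned to the true class rises after the update.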
【37】 Pay Better Attention to Attention: Head Selection in Multilingual and Multi-Domain Sequence Modeling
Authors: Hongyu Gong, Yun Tang, Juan Pino, Xian Li
Affiliations: Facebook AI Research
Link: https://arxiv.org/abs/2106.10840
Abstract: Multi-head attention has each of the attention heads collect salient information from different parts of an input sequence, making it a powerful mechanism for sequence modeling. Multilingual and multi-domain learning are common scenarios for sequence modeling, where the key challenge is to maximize positive transfer and mitigate negative transfer across languages and domains. In this paper, we find that non-selective attention sharing is sub-optimal for achieving good generalization across all languages and domains. We further propose attention sharing strategies to facilitate parameter sharing and specialization in multilingual and multi-domain sequence modeling. Our approach automatically learns shared and specialized attention heads for different languages and domains to mitigate their interference. Evaluated in various tasks including speech recognition, text-to-text and speech-to-text translation, the proposed attention sharing strategies consistently bring gains to sequence models built upon multi-head attention. For speech-to-text translation, our approach yields an average of $+2.0$ BLEU over $13$ language directions in the multilingual setting and $+2.0$ BLEU over $3$ domains in the multi-domain setting.
【38】 Online Handbook of Argumentation for AI: Volume 2
Authors: OHAAI Collaboration, Andreas Brannstrom, Federico Castagna, Theo Duchatelle, Matt Foulis, Timotheus Kampik, Isabelle Kuhlmann, Lars Malmqvist, Mariela Morveli-Espinoza, Jack Mumford, Stipe Pandzic, Robin Schaefer, Luke Thorburn, Andreas Xydis, Antonio Yuste-Ginel, Heng Zheng
Affiliations: Francesca Mosca, Ștefan Sarkadi
Link: https://arxiv.org/abs/2106.10832
Abstract: This volume contains revised versions of the papers selected for the second volume of the Online Handbook of Argumentation for AI (OHAAI). Previously, formal theories of argument and argument interaction have been proposed and studied, and this has led to the more recent study of computational models of argument. Argumentation, as a field within artificial intelligence (AI), is highly relevant for researchers interested in symbolic representations of knowledge and defeasible reasoning. The purpose of this handbook is to provide an open access and curated anthology for the argumentation research community. OHAAI is designed to serve as a research hub to keep track of the latest and upcoming PhD-driven research on the theory and application of argumentation in all areas related to AI.
【39】 DiGS: Divergence guided shape implicit neural representation for unoriented point clouds
Authors: Yizhak Ben-Shabat, Chamin Hewa Koneputugodage, Stephen Gould
Affiliations: The Australian National University; Technion Israel Institute of Technology
Link: https://arxiv.org/abs/2106.10811
Abstract: Neural shape representations have recently been shown to be effective in shape analysis and reconstruction tasks. Existing neural network methods require point coordinates and corresponding normal vectors to learn the implicit level sets of the shape. Normal vectors are often not provided as raw data; therefore, approximation and reorientation are required as pre-processing stages, both of which can introduce noise. In this paper, we propose a divergence guided shape representation learning approach that does not require normal vectors as input. We show that incorporating a soft constraint on the divergence of the distance function favours smooth solutions that reliably orient gradients to match the unknown normal at each point, in some cases even better than approaches that use ground truth normal vectors directly. Additionally, we introduce a novel geometric initialization method for sinusoidal shape representation networks that further improves convergence to the desired solution. We evaluate the effectiveness of our approach on the task of surface reconstruction and show state-of-the-art performance compared to other unoriented methods and on-par performance compared to oriented methods.
【40】 Adversarial Attack on Graph Neural Networks as An Influence Maximization Problem
Authors: Jiaqi Ma, Junwei Deng, Qiaozhu Mei
Affiliations: School of Information, University of Michigan; Department of EECS
Link: https://arxiv.org/abs/2106.10785
Abstract: Graph neural networks (GNNs) have attracted increasing interest. With broad deployments of GNNs in real-world applications, there is an urgent need for understanding the robustness of GNNs under adversarial attacks, especially in realistic setups. In this work, we study the problem of attacking GNNs in a restricted and realistic setup, by perturbing the features of a small set of nodes, with no access to model parameters and model predictions. Our formal analysis draws a connection between this type of attack and an influence maximization problem on the graph. This connection not only enhances our understanding of the problem of adversarial attack on GNNs, but also allows us to propose a group of effective and practical attack strategies. Our experiments verify that the proposed attack strategies significantly degrade the performance of three popular GNN models and outperform baseline adversarial attack strategies.
【41】 OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation
Authors: Jongmin Lee, Wonseok Jeon, Byung-Jun Lee, Joelle Pineau, Kee-Eung Kim
Affiliations: School of Computing; Quebec AI Institute; School of Computer Science; Facebook AI Research; Graduate School of AI
Comments: 26 pages, 11 figures, Accepted at ICML 2021
Link: https://arxiv.org/abs/2106.10783
Abstract: We consider the offline reinforcement learning (RL) setting where the agent aims to optimize the policy solely from the data without further environment interactions. In offline RL, the distributional shift becomes the primary source of difficulty, which arises from the deviation of the target policy being optimized from the behavior policy used for data collection. This typically causes overestimation of action values, which poses severe problems for model-free algorithms that use bootstrapping. To mitigate the problem, prior offline RL algorithms often used sophisticated techniques that encourage underestimation of action values, which introduces an additional set of hyperparameters that need to be tuned properly. In this paper, we present an offline RL algorithm that prevents overestimation in a more principled way. Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy and does not rely on policy-gradients, unlike previous offline RL algorithms. Using an extensive set of benchmark datasets for offline RL, we show that OptiDICE performs competitively with the state-of-the-art methods.
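For readers unfamiliar with the term, a "stationary distribution correction" is usually understood as a density ratio between the state-action distribution a policy would induce and the distribution of the offline data. The notation below ($d^{\pi}$, $d^{D}$, $w_{\pi}$, $r$) is assumed here for illustration and is not taken from the abstract. The identity holds only where $d^{D}(s,a) > 0$ whenever $d^{\pi}(s,a) > 0$.

```latex
% Assumed notation (not from the abstract):
%   d^{D}(s,a)   : state-action distribution of the offline dataset
%   d^{\pi}(s,a) : stationary state-action distribution induced by policy \pi
%   w_{\pi}(s,a) : stationary distribution correction (density ratio)
\[
  w_{\pi}(s,a) = \frac{d^{\pi}(s,a)}{d^{D}(s,a)},
  \qquad
  \mathbb{E}_{(s,a)\sim d^{\pi}}\bigl[ r(s,a) \bigr]
  = \mathbb{E}_{(s,a)\sim d^{D}}\bigl[ w_{\pi}(s,a)\, r(s,a) \bigr].
\]
```

Under this reading, the expected reward of a policy can be rewritten as a reweighted expectation over the dataset, which is why estimating the ratio for the optimal policy avoids querying action values outside the data.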
【42】 Learning to Track Object Position through Occlusion
Authors: Satyaki Chakraborty, Martial Hebert
Affiliations: Carnegie Mellon University
Link: https://arxiv.org/abs/2106.10766
Abstract: Occlusion is one of the most significant challenges encountered by object detectors and trackers. While both object detection and tracking have received a lot of attention in the past, most existing methods in this domain do not target detecting or tracking objects when they are occluded. However, being able to detect or track an object of interest through occlusion has been a long-standing challenge for different autonomous tasks. Traditional methods that employ visual object trackers with explicit occlusion modeling experience drift and make several fundamental assumptions about the data. We propose to address this with a "tracking-by-detection" approach that builds upon the success of region-based video object detectors. Our video-level object detector uses a novel recurrent computational unit at its core that enables long-term propagation of object features even under occlusion. Finally, we compare our approach with existing state-of-the-art video object detectors and show that our approach achieves superior results on a dataset of furniture assembly videos collected from the internet, where small objects like screws, nuts, and bolts often get occluded from the camera viewpoint.
【43】 Is Shapley Value fair? Improving Client Selection for Mavericks in Federated Learning
Authors: Jiyue Huang, Chi Hong, Lydia Y. Chen, Stefanie Roos
Affiliations: Delft University of Technology
Link: https://arxiv.org/abs/2106.10734
Abstract: Shapley Value is commonly adopted to measure and incentivize client participation in federated learning. In this paper, we show, both theoretically and through simulations, that Shapley Value underestimates the contribution of a common type of client: the Maverick. Mavericks are clients that differ both in data distribution and data quantity and can be the sole owners of certain types of data. Selecting the right clients at the right moment is important for federated learning to reduce convergence times and improve accuracy. We propose FedEMD, an adaptive client selection strategy based on the Wasserstein distance between the local and global data distributions. As FedEMD adapts the selection probability such that Mavericks are preferably selected when the model benefits from improvement on rare classes, it consistently ensures fast convergence in the presence of different types of Mavericks. Compared with existing strategies, including Shapley Value-based ones, FedEMD improves the convergence of neural network classifiers by at least 26.9% under FedAvg aggregation.
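A distance-based selection rule of this kind could be sketched as follows. The sketch assumes label histograms per client, class indices placed on a unit-spaced line so that the 1-D Wasserstein distance equals the summed absolute CDF difference, and a softmax over distances as the selection rule. The softmax, the `temperature` parameter, and the toy counts are hypothetical and are not FedEMD's actual (adaptive) formula.

```python
import math

def normalize(counts):
    """Turn per-class sample counts into a probability histogram."""
    total = sum(counts)
    return [c / total for c in counts]

def wasserstein_1d(p, q):
    """W1 distance between two histograms on the same unit-spaced bins."""
    distance, cumulative = 0.0, 0.0
    for pi, qi in zip(p, q):
        cumulative += pi - qi          # running difference of the two CDFs
        distance += abs(cumulative)
    return distance

def selection_probabilities(client_counts, temperature=1.0):
    """Hypothetical rule: clients far from the global distribution get more weight."""
    global_counts = [sum(column) for column in zip(*client_counts)]
    global_hist = normalize(global_counts)
    distances = [wasserstein_1d(normalize(c), global_hist) for c in client_counts]
    weights = [math.exp(d / temperature) for d in distances]
    total = sum(weights)
    return distances, [w / total for w in weights]

# Two ordinary clients and one Maverick that alone owns class 2 (toy counts).
client_counts = [[50, 50, 0], [50, 50, 0], [0, 0, 20]]
distances, probs = selection_probabilities(client_counts)
print(distances, probs)
```

In this toy setup the Maverick's histogram is farthest from the global one, so it receives the highest selection probability. An adaptive scheme would additionally vary that preference over training rounds.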
【44】 Neighborhood Contrastive Learning for Novel Class Discovery
Authors: Zhun Zhong, Enrico Fini, Subhankar Roy, Zhiming Luo, Elisa Ricci, Nicu Sebe
Affiliations: University of Trento; Xiamen University; Fondazione Bruno Kessler
Comments: CVPR 2021
Link: https://arxiv.org/abs/2106.10731
Abstract: In this paper, we address Novel Class Discovery (NCD), the task of unveiling new classes in a set of unlabeled samples given a labeled dataset with known classes. We exploit the peculiarities of NCD to build a new framework, named Neighborhood Contrastive Learning (NCL), to learn discriminative representations that are important to clustering performance. Our contribution is twofold. First, we find that a feature extractor trained on the labeled set generates representations in which a generic query sample and its neighbors are likely to share the same class. We exploit this observation to retrieve and aggregate pseudo-positive pairs with contrastive learning, thus encouraging the model to learn more discriminative representations. Second, we notice that most of the instances are easily discriminated by the network, contributing less to the contrastive loss. To overcome this issue, we propose to generate hard negatives by mixing labeled and unlabeled samples in the feature space. We experimentally demonstrate that these two ingredients significantly contribute to clustering performance and lead our model to outperform state-of-the-art methods by a large margin (e.g., clustering accuracy +13% on CIFAR-100 and +8% on ImageNet).
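Mixing in feature space can be pictured with a small sketch. It assumes that the unlabeled features most similar to the query are the "hardest" candidates and that a linear interpolation with labeled features, followed by L2 normalisation, yields the synthetic negatives. The function names, `top_k`, the `mix` coefficient, and the toy vectors are hypothetical, and the paper's actual selection and mixing scheme may differ. The sketch also assumes no mixed vector is exactly zero.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean norm (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))

def mix_hard_negatives(query, unlabeled, labeled, top_k=2, mix=0.5):
    """Interpolate the top_k unlabeled features closest to the query with labeled features."""
    hardest = sorted(unlabeled, key=lambda u: cosine(query, u), reverse=True)[:top_k]
    negatives = []
    for u in hardest:
        for l in labeled:
            mixed = [mix * a + (1.0 - mix) * b for a, b in zip(u, l)]
            negatives.append(l2_normalize(mixed))
    return negatives

# Toy 2-D features (hypothetical data).
query = [1.0, 0.0]
unlabeled = [[0.9, 0.1], [0.0, 1.0], [0.8, -0.2]]
labeled = [[0.0, 1.0], [-1.0, 0.0]]
negatives = mix_hard_negatives(query, unlabeled, labeled)
print(len(negatives), negatives[0])
```

Because every synthetic negative contains a labeled (known-class) component, it stays a negative for a novel-class query while lying closer to it than the labeled feature alone.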
【45】 Plant Disease Detection Using Image Processing and Machine Learning
Authors: Pranesh Kulkarni, Atharva Karwande, Tejas Kolhe, Soham Kamble, Akshay Joshi, Medha Wyawahare
Affiliations: Department of Electronics and Telecommunication, Vishwakarma Institute of Technology, Pune, India
Link: https://arxiv.org/abs/2106.10698
Abstract: One of the important and tedious tasks in agricultural practice is detecting disease in crops. It requires a great deal of time as well as skilled labor. This paper proposes a smart and efficient technique for crop disease detection that uses computer vision and machine learning. The proposed system is able to detect 20 different diseases of 5 common plants with 93% accuracy.
【46】 Fast PDN Impedance Prediction Using Deep Learning
Authors: Ling Zhang, Jack Juang, Zurab Kiguradze, Bo Pu, Shuai Jin, Songping Wu, Zhiping Yang, Chulsoon Hwang
Affiliations: EMC Laboratory, Missouri University of Science and Technology, Rolla, MO; Google Inc., Mountain View, CA, USA
Link: https://arxiv.org/abs/2106.10693
Abstract: Modeling and simulating a power distribution network (PDN) for printed circuit boards (PCBs) with irregular board shapes and multi-layer stackup is computationally inefficient using full-wave simulations. This paper presents a new concept of using deep learning for PDN impedance prediction. A boundary element method (BEM) is applied to efficiently calculate the impedance for arbitrary board shape and stackup. Then over one million boards with different shapes, stackup, IC location, and decap placement are randomly generated to train a deep neural network (DNN). The trained DNN can predict the impedance accurately for new board configurations that have not been used for training. The time consumed using the trained DNN is only 0.1 seconds, which is over 100 times faster than the BEM method and 5000 times faster than full-wave simulations.
【47】 MILP, pseudo-boolean, and OMT solvers for optimal fault-tolerant placements of relay nodes in mission critical wireless networks
Authors: Quian Matteo Chen, Alberto Finzi, Toni Mancini, Igor Melatti, Enrico Tronci
Affiliations: Department of Electrical Engineering and Information Technology, University of Naples Federico II, Italy; Computer Science Department, Sapienza University of Rome, Italy
Link: https://arxiv.org/abs/2106.10685
Abstract: In critical infrastructures like airports, much care has to be devoted to protecting radio communication networks from external electromagnetic interference. Protection of such mission-critical radio communication networks is usually tackled by exploiting radiogoniometers: at least three suitably deployed radiogoniometers, and a gateway gathering information from them, make it possible to monitor and localise sources of electromagnetic emissions that are not supposed to be present in the monitored area. Typically, radiogoniometers are connected to the gateway through relay nodes. As a result, some degree of fault-tolerance for the network of relay nodes is essential in order to offer reliable monitoring. On the other hand, deployment of relay nodes is typically quite expensive. As a result, we have two conflicting requirements: minimise costs while guaranteeing a given fault-tolerance. In this paper, we address the problem of computing a deployment for relay nodes that minimises the relay node network cost while at the same time guaranteeing proper working of the network even when some of the relay nodes (up to a given maximum number) become faulty (fault-tolerance).
We show that, by means of a computation-intensive pre-processing on an HPC infrastructure, the above optimisation problem can be encoded as a 0/1 Linear Program, becoming suitable to be approached with standard Artificial Intelligence reasoners like MILP, PB-SAT, and SMT/OMT solvers. Our problem formulation enables us to present experimental results comparing the performance of these three solving technologies on a real case study of a relay node network deployment in areas of the Leonardo da Vinci Airport in Rome, Italy.
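The shape of the optimisation problem (choose a 0/1 value per candidate relay site, minimise total cost, stay connected under up to k relay failures) can be made concrete on a toy instance. The sketch below solves it by brute-force enumeration rather than with a MILP, PB-SAT or OMT solver, so it is only a stand-in for the paper's encoding. The site names, costs, and link topology are hypothetical.

```python
from itertools import combinations

def build_links(edges):
    """Undirected adjacency map from a list of (node, node) radio links."""
    links = {}
    for a, b in edges:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links

def reaches_gateway(sensor, gateway, links, active):
    """Depth-first search that may only hop through active relay nodes."""
    stack, seen = [sensor], {sensor}
    while stack:
        node = stack.pop()
        if node == gateway:
            return True
        for nxt in links.get(node, ()):
            if nxt not in seen and (nxt == gateway or nxt in active):
                seen.add(nxt)
                stack.append(nxt)
    return False

def tolerates_failures(chosen, sensors, gateway, links, max_faults):
    """True if every sensor stays connected for every failure set of size <= max_faults."""
    for n_faults in range(max_faults + 1):
        for failed in combinations(chosen, n_faults):
            active = set(chosen) - set(failed)
            if not all(reaches_gateway(s, gateway, links, active) for s in sensors):
                return False
    return True

def cheapest_placement(costs, sensors, gateway, links, max_faults):
    """Enumerate all 0/1 assignments to relay sites and keep the cheapest feasible one."""
    best = None
    sites = sorted(costs)
    for size in range(len(sites) + 1):
        for chosen in combinations(sites, size):
            cost = sum(costs[r] for r in chosen)
            if (best is None or cost < best[0]) and tolerates_failures(
                chosen, sensors, gateway, links, max_faults
            ):
                best = (cost, chosen)
    return best

# One sensor "S", one gateway "G", three candidate relay sites (hypothetical instance).
costs = {"A": 3, "B": 2, "C": 2}
links = build_links([("S", "A"), ("S", "B"), ("S", "C"), ("A", "G"), ("B", "G"), ("C", "G")])
print(cheapest_placement(costs, ["S"], "G", links, max_faults=1))
```

Enumeration is exponential in the number of candidate sites, which is the practical reason for handing realistic instances to dedicated solvers.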
【48】 Optimal personalised treatment computation through in silico clinical trials on patient digital twins
Authors: Stefano Sinisi, Vadim Alimguzhin, Toni Mancini, Enrico Tronci, Federico Mari, Brigitte Leeners
Affiliations: Computer Science Department, Sapienza University of Rome, Italy; Department of Movement, Human and Health Sciences, University of Rome Foro Italico, Italy; Division of Reproductive Endocrinology, University Hospital Zurich, Switzerland
Link: https://arxiv.org/abs/2106.10684
Abstract: In Silico Clinical Trials (ISCT), i.e., clinical experimental campaigns carried out by means of computer simulations, hold the promise to decrease time and cost for the safety and efficacy assessment of pharmacological treatments, reduce the need for animal and human testing, and enable precision medicine. In this paper we present methods and an algorithm that, by means of extensive computer simulation-based experimental campaigns (ISCT) guided by intelligent search, optimise a pharmacological treatment for an individual patient (precision medicine). We show the effectiveness of our approach on a case study involving a real pharmacological treatment, namely the downregulation phase of a complex clinical protocol for assisted reproduction in humans.
【49】 Solution for Large-scale Long-tailed Recognition with Noisy Labels
Authors: Yuqiao Xian, Jia-Xin Zhuang, Fufu Yu
Affiliations: Sun Yat-sen University; Tencent Youtu Lab
Link: https://arxiv.org/abs/2106.10683
Abstract: This is a technical report for the CVPR 2021 AliProducts Challenge. The AliProducts Challenge is a competition proposed for studying the large-scale and fine-grained commodity image recognition problem encountered by world-leading e-commerce companies. Large-scale product recognition simultaneously faces the challenges of noisy annotations, imbalanced (long-tailed) data distribution, and fine-grained classification. In our solution, we adopt state-of-the-art model architectures of both CNNs and Transformers, including ResNeSt, EfficientNetV2, and DeiT. We found that iterative data cleaning, classifier weight normalization, high-resolution finetuning, and test time augmentation are key components to improve the performance of training with the noisy and imbalanced dataset. Finally, we obtain a 6.4365% mean class error rate on the leaderboard with our ensemble model.
【50】 Tag, Copy or Predict: A Unified Weakly-Supervised Learning Framework for Visual Information Extraction using Sequences
Authors: Jiapeng Wang, Tianwei Wang, Guozhi Tang, Lianwen Jin, Weihong Ma, Kai Ding, Yichao Huang
Affiliations: School of Electronic and Information Engineering, South China University of Technology, China; IntSig Information Co., Ltd, Shanghai, China; Guangdong Artificial Intelligence and Digital Economy Laboratory (Pazhou Lab), Guangzhou, China
Comments: IJCAI2021
Link: https://arxiv.org/abs/2106.10681
Abstract: Visual information extraction (VIE) has attracted increasing attention in recent years. Existing methods usually first organize optical character recognition (OCR) results into plain text and then use token-level entity annotations as supervision to train a sequence tagging model. However, this incurs high annotation costs and may be exposed to label confusion, and OCR errors also significantly affect the final performance. In this paper, we propose a unified weakly-supervised learning framework called TCPN (Tag, Copy or Predict Network), which introduces 1) an efficient encoder to simultaneously model the semantic and layout information in 2D OCR results; 2) a weakly-supervised training strategy that utilizes only key information sequences as supervision; and 3) a flexible and switchable decoder which contains two inference modes: one (Copy or Predict Mode) outputs key information sequences of different categories by copying a token from the input or predicting one at each time step, and the other (Tag Mode) directly tags the input sequence in a single forward pass. Our method shows new state-of-the-art performance on several public benchmarks, which demonstrates its effectiveness.
【51】 A Comprehensive Review on Non-Neural Networks Collaborative Filtering Recommendation Systems
Authors: Carmel Wenga, Majirus Fansi, Sébastien Chabrier, Jean-Martial Mari, Alban Gabillon
Affiliations: Université de la Polynésie Française
Comments: 29 pages, 7 tables and 2 figures
Link: https://arxiv.org/abs/2106.10679
Abstract: Over the past two decades, recommender systems have attracted a lot of interest due to the explosion in the amount of data in online applications. Particular attention has been paid to collaborative filtering, the most widely used approach in applications that involve information recommendations. Collaborative filtering (CF) uses the known preferences of a group of users to make predictions and recommendations about the unknown preferences of other users (recommendations are made based on the past behavior of users). Since CF was first introduced in the 1990s, a wide variety of increasingly successful models have been proposed. Due to the success of machine learning techniques in many areas, there has been a growing emphasis on the application of such algorithms in recommendation systems. In this article, we present an overview of the CF approaches for recommender systems, their two main categories, and their evaluation metrics. We focus on the application of classical Machine Learning algorithms to CF recommender systems by presenting their evolution from their first use-cases to advanced Machine Learning models. We attempt to provide a comprehensive and comparative overview of CF systems (with Python implementations) that can serve as a guideline for research and practice in this area.
【52】 CAMERAS: Enhanced Resolution And Sanity preserving Class Activation Mapping for image saliency
Authors: Mohammad A. A. K. Jalwana, Naveed Akhtar, Mohammed Bennamoun, Ajmal Mian
Affiliations: Computer Science and Software Engineering, The University of Western Australia
Comments: IEEE CVPR 2021 paper
Link: https://arxiv.org/abs/2106.10649
Abstract: Backpropagation image saliency aims at explaining model predictions by estimating model-centric importance of individual pixels in the input. However, class-insensitivity of the earlier layers in a network only allows saliency computation with low resolution activation maps of the deeper layers, resulting in compromised image saliency. Remedying this can lead to sanity failures. We propose CAMERAS, a technique to compute high-fidelity backpropagation saliency maps without requiring any external priors and preserving the map sanity. Our method systematically performs multi-scale accumulation and fusion of the activation maps and backpropagated gradients to compute precise saliency maps. From accurate image saliency to articulation of relative importance of input features for different models, and precise discrimination between model perception of visually similar objects, our high-resolution mapping offers multiple novel insights into the black-box deep visual models, which are presented in the paper. We also demonstrate the utility of our saliency maps in an adversarial setup by drastically reducing the norm of attack signals by focusing them on the precise regions identified by our maps. Our method also inspires new evaluation metrics and a sanity check for this developing research direction. Code is available at https://github.com/VisMIL/CAMERAS
【53】 Discrepancies in Epidemiological Modeling of Aggregated Heterogeneous Data
Authors: Anna L. Trella, Peniel N. Argaw, Michelle M. Li, James A. Hay
Affiliations: Harvard John A. Paulson School of Engineering and Applied Sciences; Harvard Medical School; Harvard T.H. Chan School of Public Health
Comments: Accepted to IJCAI 2021 Workshop on Artificial Intelligence for Social Good
Link: https://arxiv.org/abs/2106.10610
Abstract: Within epidemiological modeling, the majority of analyses assume a single epidemic process for generating ground-truth data. However, this assumed data generation process can be unrealistic, since data sources for epidemics are often aggregated across geographic regions and communities. As a result, state-of-the-art models for estimating epidemiological parameters, e.g., transmission rates, can be inappropriate when faced with complex systems. Our work empirically demonstrates some limitations of applying epidemiological models to aggregated datasets. We generate three complex outbreak scenarios by combining incidence curves from multiple epidemics that are independently simulated via SEIR models with different sets of parameters. Using these scenarios, we assess the robustness of a state-of-the-art Bayesian inference method that estimates the epidemic trajectory from viral load surveillance data. We evaluate two data-generating models within this Bayesian inference framework: a simple exponential growth model and a highly flexible Gaussian process prior model.
Our results show that both models generate accurate transmission rate estimates for the combined incidence curve at the cost of generating biased estimates for each underlying epidemic, reflecting highly heterogeneous underlying population dynamics. The exponential growth model, while interpretable, is unable to capture the complexity of the underlying epidemics. With sufficient surveillance data, the Gaussian process prior model captures the shape of complex trajectories, but is imprecise for periods of low data coverage. Thus, our results highlight the potential pitfalls of neglecting complexity and heterogeneity in the data generation process, which can mask underlying location- and population-specific epidemic dynamics.
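The aggregation effect described here can be reproduced in miniature. The sketch below assumes a deterministic SEIR model advanced with one-day Euler steps, two epidemics that differ only in transmission rate, and a two-point log-ratio as the growth-rate estimate. The parameter values and function names are hypothetical, and the paper's Bayesian inference from viral load data is not reproduced.

```python
import math

def seir_incidence(population, beta, sigma, gamma, days, initial_infectious=1.0):
    """Daily new infections from a deterministic SEIR model (one-day Euler steps)."""
    s = population - initial_infectious
    e, i = 0.0, initial_infectious
    incidence = []
    for _ in range(days):
        new_exposed = beta * s * i / population   # S -> E
        new_infectious = sigma * e                # E -> I
        new_recovered = gamma * i                 # I -> R (R itself is not tracked)
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        incidence.append(new_exposed)
    return incidence

def growth_rate(curve, start, end):
    """Average exponential growth rate per day between two time points."""
    return math.log(curve[end] / curve[start]) / (end - start)

# Two independently simulated epidemics with different transmission rates.
fast = seir_incidence(100_000, beta=0.6, sigma=0.2, gamma=0.1, days=40)
slow = seir_incidence(100_000, beta=0.3, sigma=0.2, gamma=0.1, days=40)
combined = [a + b for a, b in zip(fast, slow)]    # what an aggregated data source reports

rate_fast = growth_rate(fast, 10, 20)
rate_slow = growth_rate(slow, 10, 20)
rate_combined = growth_rate(combined, 10, 20)
print(rate_fast, rate_slow, rate_combined)
```

The growth rate fitted to the combined curve falls strictly between the two component rates, so it describes neither underlying epidemic, which is the kind of bias the abstract warns about.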
【54】 Attack to Fool and Explain Deep Networks 标题:攻击愚人并解释深层网络
作者:Naveed Akhtar,Muhammad A. A. K. Jalwana,Mohammed Bennamoun,Ajmal Mian 机构:Department of Computer Science and Software Engineering, University of Western Australia 备注:To appear in IEEE TPAMI. arXiv admin note: text overlap with arXiv:1905.11544 链接:https://arxiv.org/abs/2106.10606 摘要:深度视觉模型容易受到输入的对抗性干扰。尽管这些信号是经过精心设计的,但在人类看来,它们仍然像噪音一样。这一观察结果导致了深层视觉表征与人类感知不一致的论点。我们通过提供人类对抗性干扰中有意义模式的证据来反驳这一论点。我们首先提出一种攻击,愚弄一个网络,使其混淆整个类别的对象(源类)和目标标签。我们的攻击还限制了来自非源类的样本的非故意愚弄,从而限制了用于网络愚弄的人类定义的语义概念。我们证明了所提出的攻击不仅导致了扰动中规则几何模式的出现,而且揭示了深模型决策边界的深刻信息。进一步探讨这一现象,我们改变了攻击的“对抗性”目标,将其作为“解释”深层视觉表象的工具。我们表明,通过仔细的通道和投影的扰动计算我们的方法,我们可以可视化一个模型的理解人类定义的语义概念。最后,我们利用扰动的可解释性,通过攻击具有对抗性的鲁棒“分类器”来进行图像生成、修复和交互式图像处理。总之,我们的主要贡献是一种新的实用主义对抗攻击,它随后被转化为一种解释视觉模型的工具。这篇文章还为我们的攻击在多个有趣的应用中超越敌方目标做出了次要贡献。 摘要:Deep visual models are susceptible to adversarial perturbations to inputs. Although these signals are carefully crafted, they still appear as noise-like patterns to humans. This observation has led to the argument that deep visual representation is misaligned with human perception. We counter-argue by providing evidence of human-meaningful patterns in adversarial perturbations. We first propose an attack that fools a network to confuse a whole category of objects (source class) with a target label. Our attack also limits the unintended fooling by samples from non-source classes, thereby circumscribing human-defined semantic notions for network fooling. We show that the proposed attack not only leads to the emergence of regular geometric patterns in the perturbations, but also reveals insightful information about the decision boundaries of deep models. Exploring this phenomenon further, we alter the 'adversarial' objective of our attack to use it as a tool to 'explain' deep visual representation. We show that by careful channeling and projection of the perturbations computed by our method, we can visualize a model's understanding of human-defined semantic notions.
Finally, we exploit the explainability properties of our perturbations to perform image generation, inpainting and interactive image manipulation by attacking adversarially robust 'classifiers'. In all, our major contribution is a novel pragmatic adversarial attack that is subsequently transformed into a tool to interpret the visual models. The article also makes secondary contributions in terms of establishing the utility of our attack beyond the adversarial objective with multiple interesting applications.
【55】 Improving Label Quality by Jointly Modeling Items and Annotators 标题:通过联合建模项目和注释器来提高标签质量
作者:Tharindu Cyril Weerasooriya,Alexander G. Ororbia,Christopher M. Homan 机构:Department of Computer Science, Rochester Institute of Technology, Rochester, NY 链接:https://arxiv.org/abs/2106.10600 摘要:我们提出了一个完全贝叶斯框架,用于从噪声注释器中学习地面真值标签。我们的框架通过将标签分布上的生成性贝叶斯软聚类模型分解到经典的Dawid和Skene联合注释器数据模型中来确保可伸缩性。早期沿着这些路线的研究既没有完全纳入标签分布,也没有探索仅由注释器或数据进行聚类。我们的框架包含了所有这些特性:(1)一个图形模型,设计用于提供注释器响应的更好的地面真值估计,作为任意黑盒监督学习算法的输入;(2)一个独立的神经模型,其内部结构捕获了图形模型的许多特性。我们使用这两种模型进行监督学习实验,并将其与一个基线和一个最新模型的性能进行比较。 摘要:We propose a fully Bayesian framework for learning ground truth labels from noisy annotators. Our framework ensures scalability by factoring a generative, Bayesian soft clustering model over label distributions into the classic Dawid and Skene joint annotator-data model. Earlier research along these lines has neither fully incorporated label distributions nor explored clustering by annotators only or data only. Our framework incorporates all of these properties as: (1) a graphical model designed to provide better ground truth estimates of annotator responses as input to any black box supervised learning algorithm, and (2) a standalone neural model whose internal structure captures many of the properties of the graphical model. We conduct supervised learning experiments using both models and compare them to the performance of one baseline and a state-of-the-art model.
【56】 TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition 标题:TGRNet:一种面向表格结构识别的表格图形重构网络
作者:Wenyuan Xue,Baosheng Yu,Wen Wang,Dacheng Tao,Qingyong Li 机构:Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, China, School of Computer Science, The University of Sydney, Australia, JD Explore Academy, China 链接:https://arxiv.org/abs/2106.10598 摘要:表是一种非常有效的数据结构,在商业和科学研究中得到了广泛的应用。考虑到在线和离线文档中的大量表格数据,表格自动识别越来越受到文档分析界的关注。虽然人类可以很容易地理解表格的结构,但对于机器来说,理解表格结构仍然是一个挑战,特别是由于表格的布局和样式多种多样。现有的方法通常将表建模为不同表单元之间的标记序列或邻接矩阵,没有考虑到表单元逻辑位置的重要性,例如单元格位于表的第一行和第二列。本文将表结构识别问题转化为表图重构问题,提出了一种用于表结构识别的端到端可训练表图重构网络。具体地说,该方法有两个主要分支,一个单元检测分支和一个单元逻辑位置分支,共同预测不同单元的空间位置和逻辑位置。在三个流行的表识别数据集和一个新的带有表图标注的数据集(TableGraph-350K)上的实验结果证明了所提出的TGRNet在表结构识别中的有效性。代码和注释将公开。 摘要:A table arranging data in rows and columns is a very effective data structure, which has been widely used in business and scientific research. Considering large-scale tabular data in online and offline documents, automatic table recognition has attracted increasing attention from the document analysis community. Though humans can easily understand the structure of tables, it remains a challenge for machines to do so, especially due to the variety of table layouts and styles. Existing methods usually model a table as either the markup sequence or the adjacency matrix between different table cells, failing to address the importance of the logical location of table cells, e.g., a cell is located in the first row and the second column of the table. In this paper, we reformulate the problem of table structure recognition as table graph reconstruction, and propose an end-to-end trainable table graph reconstruction network (TGRNet) for table structure recognition. Specifically, the proposed method has two main branches, a cell detection branch and a cell logical location branch, to jointly predict the spatial location and the logical location of different cells.
Experimental results on three popular table recognition datasets and a new dataset with table graph annotations (TableGraph-350K) demonstrate the effectiveness of the proposed TGRNet for table structure recognition. Code and annotations will be made publicly available.
【57】 Accelerated Policy Evaluation: Learning Adversarial Environments with Adaptive Importance Sampling 标题:加速策略评估:利用自适应重要性抽样学习对抗性环境
作者:Mengdi Xu,Peide Huang,Fengpei Li,Jiacheng Zhu,Xuewei Qi,Kentaro Oguchi,Zhiyuan Huang,Henry Lam,Ding Zhao 机构:Carnegie Mellon University, Columbia University, Morgan Stanley AI CoE, Toyota Motor North America R&D, Tongji University 备注:10 pages, 5 figures 链接:https://arxiv.org/abs/2106.10566 摘要:对罕见但高风险事件的评估仍然是从智能代理获取可靠策略的主要困难之一,特别是在大型或连续的状态/动作空间中,有限的可伸缩性强制使用大量的测试迭代。另一方面,安全关键系统中有偏见或不准确的策略评估可能会在部署过程中导致意外的灾难性故障。本文提出了一种加速策略评估(APE)方法,该方法能同时发现马尔可夫决策过程中的稀有事件并估计稀有事件概率。APE方法将环境本质看作一个对抗性的agent,通过自适应重要性抽样,学习零方差抽样分布进行策略评估。此外,通过引入函数逼近器,APE可以扩展到大的离散或连续空间。在适当的正则条件下,我们研究了算法的收敛性。我们的实证研究表明,在多智能体和单智能体环境下,APE估计稀有事件概率的方差较小,同时只使用了比基线方法少几个数量级的样本。 摘要:The evaluation of rare but high-stakes events remains one of the main difficulties in obtaining reliable policies from intelligent agents, especially in large or continuous state/action spaces where limited scalability enforces the use of a prohibitively large number of testing iterations. On the other hand, a biased or inaccurate policy evaluation in a safety-critical system could potentially cause unexpected catastrophic failures during deployment. In this paper, we propose the Accelerated Policy Evaluation (APE) method, which simultaneously uncovers rare events and estimates the rare event probability in Markov decision processes. The APE method treats the environment nature as an adversarial agent and learns, through adaptive importance sampling, towards the zero-variance sampling distribution for the policy evaluation. Moreover, APE is scalable to large discrete or continuous spaces by incorporating function approximators. We investigate the convergence properties of the proposed algorithms under suitable regularity conditions. Our empirical studies show that APE estimates the rare event probability with a smaller variance while using orders of magnitude fewer samples than baseline methods in both multi-agent and single-agent environments.
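Illustration (not from the paper): the "zero-variance sampling distribution" mentioned above has a simple closed form in a toy discrete setting, namely the nominal distribution restricted to the rare event and renormalised. The sketch below assumes the event probabilities are already known, which is exactly what APE cannot assume and has to learn adaptively; the outcome names, the numbers, and the function `importance_estimate` are hypothetical.

```python
import random


def importance_estimate(p, q, event, n_samples, rng):
    """Estimate P_p(event) by sampling x ~ q and weighting by p(x) / q(x)."""
    outcomes = list(q)
    weights = [q[x] for x in outcomes]
    total = 0.0
    for _ in range(n_samples):
        x = rng.choices(outcomes, weights=weights)[0]
        if x in event:
            total += p[x] / q[x]  # likelihood ratio corrects for sampling from q
    return total / n_samples


# Toy nominal distribution with two rare outcomes (hypothetical numbers).
p = {"safe": 0.999, "near_miss": 0.0009, "crash": 0.0001}
event = ("near_miss", "crash")
true_prob = sum(p[x] for x in event)

# Zero-variance proposal: p restricted to the event and renormalised.
q_star = {x: p[x] / true_prob for x in event}

rng = random.Random(0)
estimate = importance_estimate(p, q_star, event, n_samples=10, rng=rng)
```

Every draw from `q_star` lands in the event and carries the same weight `true_prob`, so the estimator has no sampling variance even with ten samples. Sampling from `p` directly would need on the order of thousands of draws to see the event at all.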
【58】 Score-Based Explanations in Data Management and Machine Learning: An Answer-Set Programming Approach to Counterfactual Analysis 标题:数据管理和机器学习中基于分数的解释:反事实分析的答案集编程方法
作者:Leopoldo Bertossi 机构:Universidad Adolfo Ibáñez, and Millennium Inst. for Foundational Research on Data (IMFD), Santiago, Chile 备注:Paper associated to forthcoming short course at Fall School. arXiv admin note: text overlap with arXiv:2007.12799 链接:https://arxiv.org/abs/2106.10562 摘要:我们描述了一些最近的方法,以分数为基础的解释数据库中的查询答案和结果分类模型在机器学习。重点是作者和合作者所做的工作。特别强调了基于答案集编程的陈述式方法,以及反事实推理在分数说明和计算中的应用。几个例子说明了这些方法的灵活性。 摘要:We describe some recent approaches to score-based explanations for query answers in databases and outcomes from classification models in machine learning. The focus is on work done by the author and collaborators. Special emphasis is placed on declarative approaches based on answer-set programming to the use of counterfactual reasoning for score specification and computation. Several examples that illustrate the flexibility of these methods are shown.
【59】 Learning Space Partitions for Path Planning 标题:用于路径规划的学习空间划分
作者:Kevin Yang,Tianjun Zhang,Chris Cummins,Brandon Cui,Benoit Steiner,Linnan Wang,Joseph E. Gonzalez,Dan Klein,Yuandong Tian 备注:Under submission to NeurIPS 2021 链接:https://arxiv.org/abs/2106.10544 摘要:路径规划是一个有效地发现高报酬轨迹的问题,通常需要优化高维多模态报酬函数。像CEM和CMA-ES这样的流行方法贪婪地关注搜索空间中有前途的区域,并且可能陷入局部极大值。DOO和VOOT平衡探索和开发,但使用独立于奖励函数的空间划分策略进行优化。最近,LaMCTS在经验上学会了以奖励敏感的方式划分搜索空间进行黑盒优化。在本文中,我们发展了一种新的形式遗憾分析,以确定这种自适应区域划分方案何时以及为什么有效。我们还提出了一种新的路径规划方法PlaLaM,它改进了每个子区域内的函数值估计,并使用了搜索空间的潜在表示。根据经验,PlaLaM在二维导航任务中的性能优于现有的路径规划方法,特别是在存在难以逃逸的局部最优解的情况下,并且当插入带有规划组件(如PETS)的基于模型的RL时显示出优势。这些增益转移到高度多模态的现实世界任务中,在编译器相位排序方面,我们的性能比强基线高出245%,在分子设计方面,我们的性能比强基线高出0.4,在0-1的范围内。 摘要:Path planning, the problem of efficiently discovering high-reward trajectories, often requires optimizing a high-dimensional and multimodal reward function. Popular approaches like CEM and CMA-ES greedily focus on promising regions of the search space and may get trapped in local maxima. DOO and VOOT balance exploration and exploitation, but use space partitioning strategies independent of the reward function to be optimized. Recently, LaMCTS empirically learns to partition the search space in a reward-sensitive manner for black-box optimization. In this paper, we develop a novel formal regret analysis for when and why such an adaptive region partitioning scheme works. We also propose a new path planning method PlaLaM which improves the function value estimation within each sub-region, and uses a latent representation of the search space. Empirically, PlaLaM outperforms existing path planning methods in 2D navigation tasks, especially in the presence of difficult-to-escape local optima, and shows benefits when plugged into model-based RL with planning components such as PETS. These gains transfer to highly multimodal real-world tasks, where we outperform strong baselines in compiler phase ordering by up to 245% and in molecular design by up to 0.4 on properties on a 0-1 scale.
【60】 Learning and Generalization in Overparameterized Normalizing Flows 标题:过参数化正规化流动中的学习与泛化
作者:Kulin Shah,Amit Deshpande,Navin Goyal 机构:Microsoft Research 备注:80 pages, 79 figures 链接:https://arxiv.org/abs/2106.10535 摘要:在有监督学习中,当使用随机梯度下降法训练时,具有一个隐层的超参数化神经网络具有足够小的学习率和适当的初始化,可以证明是有效的学习和推广。相比之下,过度参数化在无监督学习中的好处并没有得到很好的理解。规范化流(NFs)是无监督学习中用于抽样和密度估计的一类重要模型。在本文中,我们从理论和实证上分析了当底层神经网络是一个隐层超参数化网络时这些模型。我们的主要贡献有两个方面:(1)一方面,我们提供了理论和经验证据,证明对于一类包含大多数现有NF模型的NFs,过度参数化会损害训练(2) 另一方面,我们证明了最近引入的无约束NFs模型在底层网络被过度参数化时,可以在最小的假设下有效地学习任何合理的数据分布。 摘要:In supervised learning, it is known that overparameterized neural networks with one hidden layer provably and efficiently learn and generalize, when trained using stochastic gradient descent with sufficiently small learning rate and suitable initialization. In contrast, the benefit of overparameterization in unsupervised learning is not well understood. Normalizing flows (NFs) constitute an important class of models in unsupervised learning for sampling and density estimation. In this paper, we theoretically and empirically analyze these models when the underlying neural network is one-hidden-layer overparameterized network. Our main contributions are two-fold: (1) On the one hand, we provide theoretical and empirical evidence that for a class of NFs containing most of the existing NF models, overparametrization hurts training. (2) On the other hand, we prove that unconstrained NFs, a recently introduced model, can efficiently learn any reasonable data distribution under minimal assumptions when the underlying network is overparametrized.
【61】 QUBO transformation using Eigenvalue Decomposition 标题:基于特征值分解的QUBO变换
作者:Amit Verma,Mark Lewis 备注:Preprint submitted to Springer 链接:https://arxiv.org/abs/2106.10532 摘要:二次无约束二元优化(Quadratic Unconstrained Binary Optimization,QUBO)是一种用于组合优化问题的通用建模框架,是量子退火机的要求。本文利用基本Q矩阵的特征值分解,通过从优势特征值和特征向量中提取信息来改变和改进搜索过程,从而隐式地引导搜索向解决方案领域中有前途的领域发展。在基准数据集上的计算结果说明了我们的程序的有效性,证明了在具有显性特征值的问题上有显著的性能改进。 摘要:Quadratic Unconstrained Binary Optimization (QUBO) is a general-purpose modeling framework for combinatorial optimization problems and is a requirement for quantum annealers. This paper utilizes the eigenvalue decomposition of the underlying Q matrix to alter and improve the search process by extracting the information from dominant eigenvalues and eigenvectors to implicitly guide the search towards promising areas of the solution landscape. Computational results on benchmark datasets illustrate the efficacy of our routine demonstrating significant performance improvements on problems with dominant eigenvalues.
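Illustration (not from the paper): the two ingredients named above, the QUBO objective over a Q matrix and the dominant eigenpair of Q, can be made concrete on a tiny instance. The sketch assumes the minimisation form of QUBO and uses plain power iteration; the abstract does not say how the eigen-information enters the search, so no such step is shown. All function names and the 3x3 matrix are hypothetical.

```python
from itertools import product


def qubo_value(Q, x):
    """Objective x^T Q x for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def brute_force_minimum(Q):
    """Exhaustive search over {0,1}^n; only viable for very small n."""
    return min(product((0, 1), repeat=len(Q)), key=lambda x: qubo_value(Q, x))


def dominant_eigenpair(Q, iterations=200):
    """Power iteration: eigenvalue of largest magnitude and its eigenvector."""
    n = len(Q)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(Q[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    Qv = [sum(Q[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * Qv[i] for i in range(n))  # Rayleigh quotient
    return eigenvalue, v


# Small symmetric example: rewards on the diagonal, penalties for neighbours.
Q = [[-1, 2, 0],
     [2, -1, 2],
     [0, 2, -1]]
best_x = brute_force_minimum(Q)
eigenvalue, eigenvector = dominant_eigenpair(Q)
```

For this Q the minimiser selects the two non-adjacent variables, and the dominant eigenvector has opposite signs on the middle and outer coordinates, which loosely mirrors that structure.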
【62】 A Max-Min Entropy Framework for Reinforcement Learning 标题:一种强化学习的极大-最小熵框架
作者:Seungyul Han,Youngchul Sung 机构:Dept. of Electrical Engineering, KAIST, Daejeon, South Korea 备注:Submitted to NIPS 2021 链接:https://arxiv.org/abs/2106.10517 摘要:针对最大熵强化学习框架在无模型样本学习中的局限性,提出了一种最大最小熵强化学习框架。最大熵RL框架指导学习策略到达高熵状态,而max-min-entropy框架旨在学习访问低熵状态并最大化这些低熵状态的熵以促进探索。对于一般马尔可夫决策过程(MDPs),在提出的最大最小熵框架下,基于探索与开发的分离,构造了一种有效的算法。数值结果表明,与现有的RL算法相比,该算法的性能有了很大的提高。 摘要:In this paper, we propose a max-min entropy framework for reinforcement learning (RL) to overcome the limitation of the maximum entropy RL framework in model-free sample-based learning. Whereas the maximum entropy RL framework guides learning for policies to reach states with high entropy in the future, the proposed max-min entropy framework aims to learn to visit states with low entropy and maximize the entropy of these low-entropy states to promote exploration. For general Markov decision processes (MDPs), an efficient algorithm is constructed under the proposed max-min entropy framework based on disentanglement of exploration and exploitation. Numerical results show that the proposed algorithm yields drastic performance improvement over the current state-of-the-art RL algorithms.
【63】 JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs 标题:JointGT:用于从知识图生成文本的图文联合表示学习
作者:Pei Ke,Haozhe Ji,Yu Ran,Xin Cui,Liwei Wang,Linfeng Song,Xiaoyan Zhu,Minlie Huang 机构:The CoAI group, Department of Computer Science and Technology, Institute for Artificial Intelligence, State Key Lab of Intelligent Technology and Systems, Beijing National Research Center for Information Science and Technology 备注:ACL 2021 (Findings) 链接:https://arxiv.org/abs/2106.10502 摘要:现有的知识图到文本(KG-to-text)生成的预训练模型只是对文本到文本的预训练模型进行微调,例如在KG-to-text数据集上的BART或T5,这在很大程度上忽略了编码过程中的图结构,并且缺乏详细的预训练任务来显式地建模图-文本对齐。为了解决这些问题,我们提出了一个称为JointGT的图文联合表示学习模型。在编码过程中,我们设计了一个结构感知的语义聚合模块,嵌入到每个转换层中,以保持图的结构。此外,我们提出了三个新的预训练任务来显式地增强图-文本对齐,包括各自的文本/图重建,以及通过最优传输在嵌入空间进行图-文本对齐。实验表明,JointGT在不同的KG-to-text数据集上获得了最新的性能。 摘要:Existing pre-trained models for knowledge-graph-to-text (KG-to-text) generation simply fine-tune text-to-text pre-trained models such as BART or T5 on KG-to-text datasets, which largely ignore the graph structure during encoding and lack elaborate pre-training tasks to explicitly model graph-text alignments. To tackle these problems, we propose a graph-text joint representation learning model called JointGT. During encoding, we devise a structure-aware semantic aggregation module which is plugged into each Transformer layer to preserve the graph structure. Furthermore, we propose three new pre-training tasks to explicitly enhance the graph-text alignment including respective text / graph reconstruction, and graph-text alignment in the embedding space via Optimal Transport. Experiments show that JointGT obtains new state-of-the-art performance on various KG-to-text datasets.
【64】 Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication 标题:用分块矩阵-矩阵乘法评估空间加速器体系结构
作者:Gordon E. Moon,Hyoukjun Kwon,Geonhwa Jeong,Prasanth Chatarasi,Sivasankaran Rajamanickam,Tushar Krishna 链接:https://arxiv.org/abs/2106.10499 摘要:人们对机器学习应用中的定制空间加速器越来越感兴趣。这些加速器采用了一个处理元素(PE)的空间阵列,通过定制的缓冲区层次结构和芯片上的网络进行交互。这些加速器的效率来自于采用优化的数据流(即跨PEs的数据的空间/时间分区和细粒度调度)策略来优化数据重用。这项工作的重点是评估这些加速器架构使用平铺通用矩阵乘法(GEMM)内核。为此,我们开发了一个框架,利用运行时和能量的分析成本模型,为给定空间加速器和工作负载组合的平铺GEMM找到优化映射(数据流和平铺大小)。我们对五个空间加速器的评估表明,由我们的框架系统生成的平铺GEMM映射在各种GEMM工作负载和加速器上实现了高性能。 摘要:There is a growing interest in custom spatial accelerators for machine learning applications. These accelerators employ a spatial array of processing elements (PEs) interacting via custom buffer hierarchies and networks-on-chip. The efficiency of these accelerators comes from employing optimized dataflow (i.e., spatial/temporal partitioning of data across the PEs and fine-grained scheduling) strategies to optimize data reuse. The focus of this work is to evaluate these accelerator architectures using a tiled general matrix-matrix multiplication (GEMM) kernel. To do so, we develop a framework that finds optimized mappings (dataflow and tile sizes) for a tiled GEMM for a given spatial accelerator and workload combination, leveraging an analytical cost model for runtime and energy. Our evaluations over five spatial accelerators demonstrate that the tiled GEMM mappings systematically generated by our framework achieve high performance on various GEMM workloads and accelerators.
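Illustration (not from the paper): a tiled GEMM computes the same product as the textbook triple loop but visits the operands block by block, so each block can be reused while it sits in a small buffer. The pure-Python sketch below shows only the loop structure; the single square tile size and the name `tiled_matmul` are assumptions, and the paper's framework additionally searches over dataflows and per-level tile sizes with a cost model.

```python
def tiled_matmul(A, B, tile):
    """C = A @ B, computed one (tile x tile) block at a time."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):          # block row of C
        for j0 in range(0, n, tile):      # block column of C
            for k0 in range(0, k, tile):  # block along the reduction axis
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = C[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += A[i][kk] * B[kk][j]
                        C[i][j] = acc
    return C


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
B = [[1, 0], [0, 1], [1, 1], [2, 3]]
C = tiled_matmul(A, B, tile=2)
```

The `min(...)` bounds handle matrices whose sizes are not multiples of the tile, as with the 3-row `A` here.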
【65】 Advances in Speech Vocoding for Text-to-Speech with Continuous Parameters 标题:连续参数文语转换语音声码研究进展
作者:Mohammed Salah Al-Radhi,Tamás Gábor Csapó,Géza Németh 机构:Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary, MTA-ELTE Lendület Lingual Articulation Research Group, Budapest, Hungary 备注:6 pages, 3 figures, International Conference on Artificial Intelligence and Speech Technology (AIST2020) 链接:https://arxiv.org/abs/2106.10481 摘要:声码器作为统计参数文本到语音(TTS)合成和语音转换系统中的主要部件,受到了新的关注。尽管有一些声码技术提供了几乎被接受的合成语音,但是它们的高计算复杂度和不规则结构仍然被认为是具有挑战性的问题,这会导致各种各样的语音质量下降。因此,本文提出了一种新的连续声码器技术,即所有特征都是连续的,并提出了一种灵活的语音合成系统。首先,提出了一种新的基于相位失真的连续噪声掩蔽方法,消除了残余噪声对感知的影响,使噪声特性得到准确的重建。其次,针对基于递归网络的TTS任务,提出了神经序列到序列的建模方法。研究了双向长短时记忆(LSTM)和选通递归单元(GRU),并将其应用于连续参数建模,使其听起来更像人。评价结果表明,与其他传统方法相比,该模型达到了语音合成的最新水平。 摘要:Vocoders have received renewed attention as main components in statistical parametric text-to-speech (TTS) synthesis and speech transformation systems. Even though some vocoding techniques give almost acceptable synthesized speech, their high computational complexity and irregular structures are still considered challenging concerns, and they lead to a variety of voice quality degradations. Therefore, this paper presents new techniques for a continuous vocoder, in which all features are continuous, and presents a flexible speech synthesis system. First, a new continuous noise masking based on phase distortion is proposed to eliminate the perceptual impact of the residual noise and allow an accurate reconstruction of noise characteristics. Second, we address the need for a neural sequence-to-sequence modeling approach for the TTS task based on recurrent networks. Bidirectional long short-term memory (LSTM) and gated recurrent unit (GRU) networks are studied and applied to model continuous parameters for more natural, human-like speech. The evaluation results show that the proposed model achieves state-of-the-art speech synthesis performance compared with other traditional methods.
【66】 Practical Transferability Estimation for Image Classification Tasks 标题:一种实用的图像分类任务可转移性估计
作者:Yang Tan,Yang Li,Shao-Lun Huang 机构:Tsinghua-Berkeley Shenzhen Institute, Tsinghua University, China 备注:12 pages 链接:https://arxiv.org/abs/2106.10479 摘要:迁移性估计是迁移学习中的一个重要问题,它用来预测将源模型(源任务)迁移到目标任务时的性能。最近的分析性可转移性度量被广泛应用于源模型选择和多任务学习。在具有挑战性的跨域跨任务传输设置下,早期的指标并不能很好地工作,但是最近的OTCE评分通过使用辅助任务获得了显著的性能。一个名为OT-based NCE score的简化版本牺牲了准确度以提高效率,但它可以进一步改进。因此,我们提出了一个实用的可转移性度量JC-NCE评分,以进一步提高跨域跨任务可转移性估计的性能,该评分比OTCE评分更有效,比基于OT的NCE评分更准确。具体来说,我们通过求解一个同时考虑样本距离和标签距离的最优传输问题来建立源数据和目标数据之间的联合对应关系,然后计算可传输性得分作为负条件熵。在数据集内和数据集间转移设置下的广泛验证表明,我们的JC-NCE得分优于基于OT的NCE得分,分别获得约7%和12%的收益。 摘要:Transferability estimation is an essential problem in transfer learning: predicting how good the performance will be when transferring a source model (source task) to a target task. Recent analytical transferability metrics have been widely used for source model selection and multi-task learning. Earlier metrics do not work sufficiently well under the challenging cross-domain cross-task transfer settings, but the recent OTCE score achieves noteworthy performance using auxiliary tasks. A simplified version, the OT-based NCE score, sacrifices accuracy for efficiency, but it can be further improved. Consequently, we propose a practical transferability metric called the JC-NCE score to further improve the cross-domain cross-task transferability estimation performance, which is more efficient than the OTCE score and more accurate than the OT-based NCE score. Specifically, we build the joint correspondences between source and target data by solving an optimal transport problem that considers both the sample distance and the label distance, and then compute the transferability score as the negative conditional entropy. Extensive validations under the intra-dataset and inter-dataset transfer settings demonstrate that our JC-NCE score outperforms the OT-based NCE score with about 7% and 12% gains, respectively.
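Illustration (not from the paper): once a coupling between source and target samples is available, a negative-conditional-entropy score can be computed from the label pairs it links. The sketch assumes the score is $-H(Y_t \mid Y_s)$ and takes the coupling as given. In the paper it would come from the optimal transport step, which is not reproduced here. The function name and toy couplings are hypothetical.

```python
import math
from collections import defaultdict


def negative_conditional_entropy(coupling, source_labels, target_labels):
    """-H(Y_t | Y_s) under the label joint induced by a sample coupling.

    coupling[i][j] is the mass linking source sample i to target sample j;
    the entries are assumed to sum to 1.
    """
    joint = defaultdict(float)
    for i, row in enumerate(coupling):
        for j, mass in enumerate(row):
            joint[(source_labels[i], target_labels[j])] += mass
    marginal = defaultdict(float)
    for (ys, _), mass in joint.items():
        marginal[ys] += mass
    return sum(mass * math.log(mass / marginal[ys])
               for (ys, _), mass in joint.items() if mass > 0)


src = [0, 1]
tgt = ["a", "b"]
aligned = negative_conditional_entropy([[0.5, 0.0], [0.0, 0.5]], src, tgt)
mixed = negative_conditional_entropy([[0.25, 0.25], [0.25, 0.25]], src, tgt)
```

The score is at most zero. It reaches zero when each source label maps to a single target label (`aligned`) and drops to $-\log 2$ when the source label carries no information about a binary target label (`mixed`).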
【67】 Learning Timestamp-Level Representations for Time Series with Hierarchical Contrastive Loss 标题:具有分层对比损失的时间序列的学习时间戳层次表示
作者:Zhihan Yue,Yujing Wang,Juanyong Duan,Tianmeng Yang,Congrui Huang,Bixiong Xu 机构:Peking University,Microsoft 备注:20 pages, 6 figures 链接:https://arxiv.org/abs/2106.10466 摘要:提出了一种学习时间序列时间戳级表示的通用框架TS2Vec。与现有方法不同,TS2Vec执行时间戳区分,它直接为每个时间戳学习上下文表示向量。我们发现学习到的表征具有很强的预测能力。在有监督的时间序列预测中,基于学习表示的线性回归算法的性能优于以往的sota算法。此外,实例级表示可以简单地通过在所有时间戳的学习表示之上应用最大池层来获得。我们对时间序列分类任务进行了大量的实验,以评估实例级表示的质量。结果表明,在125个UCR数据集和29个UEA数据集上,TS2Vec与现有的无监督时间序列表示方法相比有了显著的改进。源代码在https://github.com/yuezhihan/ts2vec. 摘要:This paper presents TS2Vec, a universal framework for learning timestamp-level representations of time series. Unlike existing methods, TS2Vec performs timestamp-wise discrimination, which learns a contextual representation vector directly for each timestamp. We find that the learned representations have superior predictive ability. A linear regression trained on top of the learned representations outperforms previous SOTAs for supervised time series forecasting. Also, the instance-level representations can be simply obtained by applying a max pooling layer on top of learned representations of all timestamps. We conduct extensive experiments on time series classification tasks to evaluate the quality of instance-level representations. As a result, TS2Vec achieves significant improvement compared with existing SOTAs of unsupervised time series representation on 125 UCR datasets and 29 UEA datasets. The source code is publicly available at https://github.com/yuezhihan/ts2vec.
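Illustration (not from the paper's code): the step from timestamp-level to instance-level representations described above is a max pool over the time axis. The sketch uses plain lists and made-up numbers; the function name is hypothetical and is not part of the linked repository's API.

```python
def instance_representation(timestamp_representations):
    """Max-pool over the time axis: one value per feature dimension."""
    return [max(feature) for feature in zip(*timestamp_representations)]


# Four timestamps, three feature dimensions (made-up values).
per_timestamp = [
    [0.1, -0.4, 0.3],
    [0.7, -0.2, 0.0],
    [0.2, 0.5, -0.1],
    [-0.3, 0.1, 0.2],
]
instance_vector = instance_representation(per_timestamp)  # [0.7, 0.5, 0.3]
```

The result has the same dimensionality regardless of series length, which is what lets series of different lengths feed one downstream classifier.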
【68】 Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering 标题:满足您的需求:用于视频答疑的运动外观协同网络
作者:Ahjeong Seo,Gi-Cheon Kang,Joonhan Park,Byoung-Tak Zhang 机构:AI Institute for Seoul National University (AIIS), Hanyang University 备注:ACL 2021 链接:https://arxiv.org/abs/2106.10446 摘要:视频问答是一项要求人工智能代理根据视频回答问题的任务。这项任务包含三个关键挑战:(1)理解各种问题的意图,(2)捕捉输入视频的各种元素(例如,对象、动作、因果关系),以及(3)语言和视觉信息之间的跨模态基础。我们提出了运动-外观协同网络(MASN),它嵌入了基于运动和外观信息的两个跨模态特征,并根据问题的意图有选择地利用它们。MASN由运动模块、外观模块和运动外观融合模块组成。运动模块计算面向动作的跨模态关节表示,而外观模块关注输入视频的外观。最后,运动-外观融合模块将运动模块和外观模块的每个输出作为输入,进行问题引导融合。因此,MASN在TGIF-QA和MSVD-QA数据集上实现了最新的性能。通过可视化MASN的推理结果进行定性分析。代码可在https://github.com/ahjeongseo/MASN-pytorch. 摘要:Video Question Answering is a task which requires an AI agent to answer questions grounded in video. This task entails three key challenges: (1) understanding the intention of various questions, (2) capturing various elements of the input video (e.g., object, action, causality), and (3) cross-modal grounding between language and vision information. We propose Motion-Appearance Synergistic Networks (MASN), which embed two cross-modal features grounded on motion and appearance information and selectively utilize them depending on the question's intentions. MASN consists of a motion module, an appearance module, and a motion-appearance fusion module. The motion module computes the action-oriented cross-modal joint representations, while the appearance module focuses on the appearance aspect of the input video. Finally, the motion-appearance fusion module takes each output of the motion module and the appearance module as input, and performs question-guided fusion. As a result, MASN achieves new state-of-the-art performance on the TGIF-QA and MSVD-QA datasets. We also conduct qualitative analysis by visualizing the inference results of MASN. The code is available at https://github.com/ahjeongseo/MASN-pytorch.
【69】 Nearly Minimax Optimal Adversarial Imitation Learning with Known and Unknown Transitions 标题:已知和未知转移的近极小极大最优对抗模仿学习
作者:Tian Xu,Ziniu Li,Yang Yu 机构:National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing , China, Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, Shenzhen , China, Pazhou Lab, Guangzhou, China, Polixir.ai 链接:https://arxiv.org/abs/2106.10424 摘要:本文致力于设计可证明有效的对抗性模仿学习(AIL)算法,直接从专家演示中优化策略。首先,在已知的过渡设置下,我们开发了一个专家样本复杂度为$\tilde{O}(H^{3/2} |S|/\varepsilon)$的过渡感知的AIL算法,其中,$H$是规划范围,$|S|$是状态空间大小,$\varepsilon$是期望的策略值差距。这改进了以前所有方法的$\tilde{O}(H^2 |S| / \varepsilon^2)$的最佳界限,并与[Rajaraman et al.,2021]中的$\tilde{\Omega}(H^{3/2} |S|/\varepsilon)$的下界匹配到对数因子。TAIL的核心是专家状态行为分布的细粒度估计,它明确地利用了转移函数信息。其次,考虑到实际环境中过渡函数通常是未知的,但允许环境交互,因此我们提出了一种基于模型的过渡感知AIL算法MB-TAIL。特别地,MB-TAIL通过与环境的交互作用建立了一个经验转换模型,并在恢复的经验模型下进行了仿真。MB-TAIL的交互复杂度为$\tilde{O}(H^3 |S|^2 |A| / \varepsilon^2)$,这改进了[Shani et al.,2021]中$\tilde{O}(H^4 |S|^2 |A| / \varepsilon^2)$的最著名结果。最后,我们的理论结果得到了两个具有挑战性的mdp的数值计算和详细分析的支持。 摘要:This paper is dedicated to designing provably efficient adversarial imitation learning (AIL) algorithms that directly optimize policies from expert demonstrations. Firstly, we develop a transition-aware AIL algorithm named TAIL with an expert sample complexity of $\tilde{O}(H^{3/2} |S|/\varepsilon)$ under the known transition setting, where $H$ is the planning horizon, $|S|$ is the state space size and $\varepsilon$ is desired policy value gap. This improves upon the previous best bound of $\tilde{O}(H^2 |S| / \varepsilon^2)$ for AIL methods and matches the lower bound of $\tilde{\Omega}(H^{3/2} |S|/\varepsilon)$ in [Rajaraman et al., 2021] up to a logarithmic factor. The key ingredient of TAIL is a fine-grained estimator for expert state-action distribution, which explicitly utilizes the transition function information. Secondly, considering practical settings where the transition functions are usually unknown but environment interaction is allowed, we accordingly develop a model-based transition-aware AIL algorithm named MB-TAIL.
In particular, MB-TAIL builds an empirical transition model by interacting with the environment and performs imitation under the recovered empirical model. The interaction complexity of MB-TAIL is $\tilde{O}(H^3 |S|^2 |A| / \varepsilon^2)$, which improves the best known result of $\tilde{O}(H^4 |S|^2 |A| / \varepsilon^2)$ in [Shani et al., 2021]. Finally, our theoretical results are supported by numerical evaluation and detailed analysis on two challenging MDPs.
【70】 Predicting Critical Nodes in Temporal Networks by Dynamic Graph Convolutional Networks 标题:基于动态图卷积网络的时态网络关键节点预测
作者:En-Yu Yu,Yan Fu,Jun-Lin Zhou,Hong-Liang Sun,Duan-Bing Chen 机构:Big Data Research Center, University of Electronic Science and Technology of China, School of Information Engineering, Nanjing University of Finance and Economics, School of Computer Science and Technology, University of Nottingham, Ningbo, P. 链接:https://arxiv.org/abs/2106.10419 摘要:许多真实世界的系统都可以用时态网络来表示,节点在结构和功能上扮演着截然不同的角色,边缘表示节点之间的关系。识别关键节点可以帮助我们控制舆论或流行病的传播,预测学术界的领军人物,为各种商品做广告,等等。然而,在时态网络中,由于网络结构随时间变化,识别关键节点相当困难。本文结合时态网络的序列拓扑信息,提出了一种新颖有效的基于特殊GCNs和RNNs相结合的学习框架,用于识别具有最佳扩展能力的节点。通过加权SIR模型评价了该方法的有效性。在四个真实时态网络上的实验结果表明,该方法在Kendall $\tau$ 系数和top $k$ 命中率方面均优于传统和深度学习基准方法。 摘要:Many real-world systems can be expressed as temporal networks, with nodes playing far different roles in structure and function and edges representing the relationships between nodes. Identifying critical nodes can help us control the spread of public opinions or epidemics, predict leading figures in academia, conduct advertisements for various commodities, and so on. However, it is rather difficult to identify critical nodes because the network structure changes over time in temporal networks. In this paper, considering the sequence topological information of temporal networks, a novel and effective learning framework based on the combination of special GCNs and RNNs is proposed to identify nodes with the best spreading ability. The effectiveness of the approach is evaluated by a weighted Susceptible-Infected-Recovered model. Experimental results on four real-world temporal networks demonstrate that the proposed method outperforms both traditional and deep learning benchmark methods in terms of the Kendall $\tau$ coefficient and top $k$ hit rate.
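Illustration (not from the paper): the two evaluation measures named above compare a predicted node ranking against a reference ranking, for example spreading ability measured by simulation. The sketch implements the tie-free Kendall tau-a and one common reading of top-k hit rate (overlap of the two top-k sets divided by k). The paper's exact definitions may differ, and the scores below are made up.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


def top_k_hit_rate(predicted, actual, k):
    """Fraction of the predicted top-k nodes that are in the actual top-k."""
    def top(scores):
        order = sorted(range(len(scores)), key=lambda i: -scores[i])
        return set(order[:k])
    return len(top(predicted) & top(actual)) / k


predicted_scores = [0.9, 0.1, 0.8, 0.3]  # model output per node
actual_scores = [0.7, 0.2, 0.1, 0.9]     # reference spreading ability per node
tau = kendall_tau(predicted_scores, actual_scores)
hit_rate = top_k_hit_rate(predicted_scores, actual_scores, k=2)
```

Kendall tau rewards getting the whole ordering right, while the hit rate only checks whether the most influential nodes were found, so the two can disagree.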
【71】 Boosting Offline Reinforcement Learning with Residual Generative Modeling 标题:基于残差产生式建模的离线强化学习
作者:Hua Wei,Deheng Ye,Zhao Liu,Hao Wu,Bo Yuan,Qiang Fu,Wei Yang,Zhenhui Li 机构:Tencent AI Lab, Shenzhen, China, The Pennsylvania State University, University Park, USA 备注:Accepted by IJCAI 2021, appendix included, 9 pages, 4 figures, 2 tables 链接:https://arxiv.org/abs/2106.10411 摘要:离线强化学习(RL)试图在不进行在线探索的情况下,通过记录离线经验来学习近似最优策略。现有的离线RL研究包括:1)生成性建模,即使用固定数据近似策略;2)学习状态-动作-价值函数。目前的研究主要集中在状态作用函数部分,通过减少训练数据分布偏移引起的值函数逼近的自举误差,而忽略了生成建模中误差传播的影响。本文分析了生成性建模中的误差。我们提出了一个残差生成模型AQL(action-conditioned Q-learning)来减少离线RL的策略逼近误差。我们证明了我们的方法可以在不同的基准数据集中学习更精确的策略近似。此外,在多人在线竞技场(MOBA)游戏《王者荣耀》中,我们还证明了所提出的离线RL方法可以在复杂的控制任务中学习到更多有竞争力的人工智能。 摘要:Offline reinforcement learning (RL) tries to learn a near-optimal policy from recorded offline experience without online exploration. Current offline RL research includes: 1) generative modeling, i.e., approximating a policy using fixed data; and 2) learning the state-action value function. While most research focuses on the state-action value function part through reducing the bootstrapping error in value function approximation induced by the distribution shift of training data, the effects of error propagation in generative modeling have been neglected. In this paper, we analyze the error in generative modeling. We propose AQL (action-conditioned Q-learning), a residual generative model to reduce policy approximation error for offline RL. We show that our method can learn more accurate policy approximations in different benchmark datasets. In addition, we show that the proposed offline RL method can learn more competitive AI agents in complex control tasks under the multiplayer online battle arena (MOBA) game Honor of Kings.
【72】 Scenic4RL: Programmatic Modeling and Generation of Reinforcement Learning Environments
Authors: Abdus Salam Azad, Edward Kim, Qiancheng Wu, Kimin Lee, Ion Stoica, Pieter Abbeel, Sanjit A. Seshia Affiliations: Department of Electrical Engineering and Computer Sciences, University of California, Berkeley Comments: First two authors contributed equally. Currently under review Link: https://arxiv.org/abs/2106.10365 Abstract: The capability of a reinforcement learning (RL) agent directly depends on the diversity of learning scenarios the environment generates and how closely it captures real-world situations. However, existing environments/simulators lack support for systematically modeling distributions over initial states and transition dynamics. Furthermore, in complex domains such as soccer, the space of possible scenarios is infinite, which makes it impossible for one research group to provide a comprehensive set of scenarios to train, test, and benchmark RL algorithms. To address this issue, for the first time, we adopt an existing formal scenario specification language, SCENIC, to intuitively model and generate interactive scenarios. We interfaced SCENIC to the Google Research Soccer environment to create a platform called SCENIC4RL. Using this platform, we provide a dataset consisting of 36 scenario programs encoded in SCENIC and demonstration data generated from a subset of them. We share our experimental results to show the effectiveness of our dataset and the platform for training, testing, and benchmarking RL algorithms. More importantly, we open-source our platform to enable the RL community to collectively contribute to constructing a comprehensive set of scenarios.
【73】 The Perils of Learning Before Optimizing
Authors: Chris Cameron, Jason Hartford, Taylor Lundy, Kevin Leyton-Brown Affiliations: Department of Computer Science, University of British Columbia Link: https://arxiv.org/abs/2106.10349 Abstract: Formulating real-world optimization problems often begins with making predictions from historical data (e.g., an optimizer that aims to recommend fast routes relies upon travel-time predictions). Typically, learning the prediction model used to generate the optimization problem and solving that problem are performed in two separate stages. Recent work has shown how such prediction models can be learned end-to-end by differentiating through the optimization task. Such methods often yield empirical improvements, which are typically attributed to end-to-end making better error tradeoffs than the standard loss function used in a two-stage solution. We refine this explanation and more precisely characterize when end-to-end can improve performance. When prediction targets are stochastic, a two-stage solution must make an a priori choice about which statistics of the target distribution to model -- we consider expectations over prediction targets -- while an end-to-end solution can make this choice adaptively. We show that the performance gap between a two-stage and end-to-end approach is closely related to the price of correlation (POC) concept in stochastic optimization and show the implications of some existing POC results for our predict-then-optimize problem. We then consider a novel and particularly practical setting, where coefficients in the objective function depend on multiple prediction targets. We give explicit constructions where (1) two-stage performs unboundedly worse than end-to-end; and (2) two-stage is optimal. We identify a large set of real-world applications whose objective functions rely on multiple prediction targets but which nevertheless deploy two-stage solutions. We also use simulations to experimentally quantify performance gaps.
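The gap this abstract describes can be seen in a toy instance. The sketch below is not a construction from the paper: the two actions, the joint distribution, and the fixed cost are hypothetical, and "end-to-end" is idealized as choosing the action with the lowest true expected cost. It only shows why plugging in expectations fails when a cost coefficient is the product of two correlated targets, since $E[XY] \neq E[X]E[Y]$ in general.

```python
# Joint distribution of two stochastic prediction targets (X, Y): perfectly correlated.
SCENARIOS = [((0.0, 0.0), 0.5), ((2.0, 2.0), 0.5)]   # ((x, y), probability)
FIXED_COST = 1.5                                      # cost of the safe action "B"


def cost(action, x, y):
    """Action "A" costs the product x * y; action "B" costs a fixed amount."""
    return x * y if action == "A" else FIXED_COST


def expected_cost(action):
    """True expected cost of an action under the joint distribution."""
    return sum(p * cost(action, x, y) for (x, y), p in SCENARIOS)


def two_stage_choice():
    """Stage 1: predict E[X] and E[Y]. Stage 2: optimize using those point predictions."""
    ex = sum(p * x for (x, _), p in SCENARIOS)
    ey = sum(p * y for (_, y), p in SCENARIOS)
    return min(("A", "B"), key=lambda a: cost(a, ex, ey))


def end_to_end_choice():
    """Idealized end-to-end decision: minimize the true expected cost directly."""
    return min(("A", "B"), key=expected_cost)


for name, choose in (("two-stage", two_stage_choice), ("end-to-end", end_to_end_choice)):
    action = choose()
    print(name, action, expected_cost(action))
```

Here the two-stage pipeline estimates the cost of "A" as $E[X]E[Y] = 1$ and picks it, although its true expected cost is $E[XY] = 2$; the end-to-end choice pays 1.5.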
【74】 A system of vision sensor based deep neural networks for complex driving scene analysis in support of crash risk assessment and prevention
Authors: Muhammad Monjurul Karim, Yu Li, Ruwen Qin, Zhaozheng Yin Affiliations: Department of Civil Engineering, Stony Brook University, Stony Brook, NY, USA, Department of Computer Science Comments: 11 pages, 8 figures, presented at TRB conference Link: https://arxiv.org/abs/2106.10319 Abstract: To assist human drivers and autonomous vehicles in assessing crash risks, driving scene analysis using dash cameras on vehicles and deep learning algorithms is of paramount importance. Although these technologies are increasingly available, driving scene analysis for this purpose remains a challenge. This is mainly due to the lack of large annotated image datasets for analyzing crash risk indicators and crash likelihood, and the lack of an effective method to extract the large amount of required information from complex driving scenes. To fill the gap, this paper develops a scene analysis system. The Multi-Net of the system includes two multi-task neural networks that perform scene classification to provide four labels for each scene. The system combines DeepLab v3 and YOLO v3 to detect and locate risky pedestrians and the nearest vehicles. All identified information can provide situational awareness to autonomous vehicles or human drivers for identifying crash risks from the surrounding traffic. To address the scarcity of annotated image datasets for studying traffic crashes, this paper develops two completely new datasets and makes them publicly available; they proved effective in training the proposed deep neural networks. The paper further evaluates the performance of the Multi-Net and the efficiency of the developed system. Comprehensive scene analysis is further illustrated with representative examples. Results demonstrate the effectiveness of the developed system and datasets for driving scene analysis, and their support for crash risk assessment and crash prevention.
【75】 Sample Efficient Social Navigation Using Inverse Reinforcement Learning
Authors: Bobak H. Baghi, Gregory Dudek Affiliations: School of Computer Science, McGill University, 3480 Rue University Link: https://arxiv.org/abs/2106.10318 Abstract: In this paper, we present an algorithm to efficiently learn socially-compliant navigation policies from observations of human trajectories. As mobile robots come to inhabit and traffic social spaces, they must account for social cues and behave in a socially compliant manner. We focus on learning such cues from examples. We describe an inverse reinforcement learning based algorithm which learns from human trajectory observations without knowing their specific actions. We increase the sample efficiency of our approach over alternative methods by leveraging the notion of a replay buffer (found in many off-policy reinforcement learning methods) to eliminate the additional sample complexity associated with inverse reinforcement learning. We evaluate our method by training agents using publicly available pedestrian motion data sets and compare it to related methods. We show that our approach yields better performance while also decreasing training time and sample complexity.
【76】 Proper Value Equivalence
Authors: Christopher Grimm, André Barreto, Gregory Farquhar, David Silver, Satinder Singh Affiliations: Computer Science & Engineering, University of Michigan, DeepMind Link: https://arxiv.org/abs/2106.10316 Abstract: One of the main challenges in model-based reinforcement learning (RL) is to decide which aspects of the environment should be modeled. The value-equivalence (VE) principle proposes a simple answer to this question: a model should capture the aspects of the environment that are relevant for value-based planning. Technically, VE distinguishes models based on a set of policies and a set of functions: a model is said to be VE to the environment if the Bellman operators it induces for the policies yield the correct result when applied to the functions. As the number of policies and functions increases, the set of VE models shrinks, eventually collapsing to a single point corresponding to a perfect model. A fundamental question underlying the VE principle is thus how to select the smallest sets of policies and functions that are sufficient for planning. In this paper we take an important step towards answering this question. We start by generalizing the concept of VE to order-$k$ counterparts defined with respect to $k$ applications of the Bellman operator. This leads to a family of VE classes that increase in size as $k \rightarrow \infty$. In the limit, all functions become value functions, and we have a special instantiation of VE which we call proper VE or simply PVE. Unlike VE, the PVE class may contain multiple models even in the limit when all value functions are used. Crucially, all these models are sufficient for planning, meaning that they will yield an optimal policy despite the fact that they may ignore many aspects of the environment. We construct a loss function for learning PVE models and argue that popular algorithms such as MuZero and Muesli can be understood as minimizing an upper bound for this loss. We leverage this connection to propose a modification to MuZero and show that it can lead to improved performance in practice.
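The definitions in this abstract can be made concrete with a small tabular check. The sketch below is an illustration under simplifying assumptions, not the paper's formulation: it uses a single fixed policy, a hypothetical three-state environment, and a hypothetical model that routes state 0 to a different successor. The two agree on one Bellman application only for functions that cannot tell the two successors apart, yet they agree for any starting function once the operator is applied many times, because both induce the same value function.

```python
GAMMA = 0.9
REWARD = [0.0, 1.0, 1.0]   # per-state reward under the single fixed policy

# Row s holds the next-state probabilities from state s.
P_ENV = [[0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
P_MODEL = [[0.0, 0.0, 1.0],   # differs from the environment in state 0 only
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]


def bellman(P, v):
    """One application of the policy's Bellman operator under transition matrix P."""
    n = len(v)
    return [REWARD[s] + GAMMA * sum(P[s][t] * v[t] for t in range(n)) for s in range(n)]


def bellman_k(P, v, k):
    """k successive applications of the Bellman operator."""
    for _ in range(k):
        v = bellman(P, v)
    return v


def equivalent(v, k=1, tol=1e-12):
    """True if model and environment give the same result after k applications to v."""
    a, b = bellman_k(P_ENV, v, k), bellman_k(P_MODEL, v, k)
    return all(abs(x - y) <= tol for x, y in zip(a, b))


print(equivalent([0.0, 5.0, 5.0]))                     # v(1) == v(2): one step agrees
print(equivalent([0.0, 5.0, 2.0]))                     # v(1) != v(2): one step disagrees
print(equivalent([0.0, 5.0, 2.0], k=200, tol=1e-6))    # many steps: both reach the same values
```

The last line is the intuition behind the growing order-$k$ classes: a model that is wrong about one-step dynamics can still be right about values.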
【77】 Towards Single Stage Weakly Supervised Semantic Segmentation
Authors: Peri Akiva, Kristin Dana Affiliations: Rutgers University Link: https://arxiv.org/abs/2106.10309 Abstract: The costly process of obtaining semantic segmentation labels has driven research towards weakly supervised semantic segmentation (WSSS) methods, using only image-level, point, or box labels. The lack of dense scene representation requires methods to increase complexity to obtain additional semantic information about the scene, often done through multiple stages of training and refinement. Current state-of-the-art (SOTA) models leverage image-level labels to produce class activation maps (CAMs) which go through multiple stages of refinement before they are thresholded to make pseudo-masks for supervision. The multi-stage approach is computationally expensive, and dependency on image-level labels for CAM generation lacks generalizability to more complex scenes. In contrast, our method offers a single-stage approach generalizable to arbitrary datasets that is trainable from scratch, without any dependency on pre-trained backbones, classification, or separate refinement tasks. We utilize point annotations to generate reliable, on-the-fly pseudo-masks through refined and filtered features. While our method requires point annotations that are only slightly more expensive than image-level annotations, we demonstrate SOTA performance on benchmark datasets (PascalVOC 2012) and significantly outperform other SOTA WSSS methods on recent real-world datasets (CRAID, CityPersons, IAD).
【78】 Multi-Task Learning for User Engagement and Adoption in Live Video Streaming Events
Authors: Stefanos Antaris, Dimitrios Rafailidis, Romina Arriaza Affiliations: KTH Royal Institute of Technology, Sweden, Hive Streaming AB, Sweden, University of Thessaly, Greece Link: https://arxiv.org/abs/2106.10305 Abstract: Nowadays, live video streaming events have become a mainstay of viewer communication in large international enterprises. Given that viewers are distributed worldwide, the main challenge lies in how to schedule the optimal event time so as to improve both viewer engagement and adoption. In this paper we present a multi-task deep reinforcement learning model to select the time of a live video streaming event, aiming to optimize viewer engagement and adoption at the same time. We consider the engagement and adoption of the viewers as independent tasks and formulate a unified loss function to learn a common policy. In addition, we account for the fact that each task might contribute differently to the training strategy of the agent. Therefore, to determine the contribution of each task to the agent's training, we design a Transformer architecture for the state-action transitions of each task. We evaluate our proposed model on four real-world datasets, generated by the live video streaming events of four large enterprises spanning from January 2019 until March 2021. Our experiments demonstrate the effectiveness of the proposed model when compared with several state-of-the-art strategies. For reproduction purposes, our evaluation datasets and implementation are publicly available at https://github.com/stefanosantaris/merlin.
【79】 Dependency Structure Misspecification in Multi-Source Weak Supervision Models
Authors: Salva Rühling Cachay, Benedikt Boecking, Artur Dubrawski Affiliations: Carnegie Mellon University Comments: Oral presentation at the Workshop on Weakly Supervised Learning at ICLR 2021 Link: https://arxiv.org/abs/2106.10302 Abstract: Data programming (DP) has proven to be an attractive alternative to costly hand-labeling of data. In DP, users encode domain knowledge into labeling functions (LFs), heuristics that label a subset of the data noisily and may have complex dependencies. A label model is then fit to the LFs to produce an estimate of the unknown class label. The effects of label model misspecification on test set performance of a downstream classifier are understudied. This presents a serious awareness gap to practitioners, in particular since the dependency structure among LFs is frequently ignored in field applications of DP. We analyse modeling errors due to structure over-specification. We derive novel theoretical bounds on the modeling error and empirically show that this error can be substantial, even when modeling a seemingly sensible structure.
【80】 UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control
Authors: Minsu Kang, Sungjae Kim, Injung Kim Affiliations: Department of Computer Science and Electronic Engineering, Handong Global University Link: https://arxiv.org/abs/2106.11171 Abstract: We propose a novel high-fidelity expressive speech synthesis model, UniTTS, that learns and controls overlapping style attributes while avoiding interference. UniTTS represents multiple style attributes in a single unified embedding space by the residuals between the phoneme embeddings before and after applying the attributes. The proposed method is especially effective in controlling multiple attributes that are difficult to separate cleanly, such as speaker ID and emotion, because it minimizes redundancy when adding variance in speaker ID and emotion, and additionally predicts duration, pitch, and energy based on the speaker ID and emotion. In experiments, the visualization results show that the proposed method learned multiple attributes harmoniously in a manner that can be easily separated again. In addition, UniTTS synthesized high-fidelity speech signals while controlling multiple style attributes. The synthesized speech samples are presented at https://jackson-kang.github.io/paper_works/UniTTS/demos.
【81】 EMG Signal Classification Using Reflection Coefficients and Extreme Value Machine
Authors: Reza Bagherian Azhiri, Mohammad Esmaeili, Mohsen Jafarzadeh, Mehrdad Nourani Affiliations: Department of Mechanical Engineering, Richardson, TX, USA, Department of Electrical and Computer Engineering, El Pomar Institute for Innovation and Commercialization, University of Colorado Colorado Springs, Colorado Springs, CO, USA Link: https://arxiv.org/abs/2106.10561 Abstract: Electromyography is a promising approach to human gesture recognition if an efficient classifier with high accuracy is available. In this paper, we propose to utilize Extreme Value Machine (EVM) as a high-performance algorithm for the classification of EMG signals. We employ reflection coefficients obtained from an Autoregressive (AR) model to train a set of classifiers. Our experimental results indicate that EVM achieves better accuracy than the conventional classifiers established in the literature, which are based on K-Nearest Neighbors (KNN) and Support Vector Machine (SVM).
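The abstract does not say how the reflection coefficients are computed. A common route, assumed here, is the Levinson-Durbin recursion on the autocorrelation sequence of a signal window; the function names and the biased autocorrelation estimator are illustrative choices, not details taken from the paper. Sign conventions vary between texts: with the one used below, each coefficient is the negative of the corresponding partial autocorrelation.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] of a mean-removed signal."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    return [sum(c[i] * c[i + lag] for i in range(n - lag)) / n
            for lag in range(max_lag + 1)]


def reflection_coefficients(r):
    """Levinson-Durbin recursion on autocorrelations r[0..p]; returns [k_1, ..., k_p]."""
    if r[0] <= 0.0:
        raise ValueError("r[0] must be positive (signal has no energy)")
    a = [1.0]      # prediction-error filter coefficients, a[0] = 1
    err = r[0]     # prediction error power at the current order
    ks = []
    for i in range(1, len(r)):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Order update: a_new[j] = a[j] + k * a[i - j], and a_new[i] = k.
        a = [1.0] + [a[j] + k * a[i - j] for j in range(1, i)] + [k]
        err *= 1.0 - k * k
        ks.append(k)
    return ks


# Autocorrelation of an ideal AR(1) process with coefficient 0.5: r[lag] = 0.5 ** lag.
print(reflection_coefficients([1.0, 0.5, 0.25]))
```

For the AR(1) autocorrelation in the usage line, only the first coefficient is non-zero, which matches the model order. In a classification pipeline, the list returned for each signal window would serve as that window's feature vector.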
【82】 QFCNN: Quantum Fourier Convolutional Neural Network
Authors: Feihong Shen, Jun Liu Affiliations: Jilin University, Singapore University of Technology and Design Comments: 14 pages, 6 figures Link: https://arxiv.org/abs/2106.10421 Abstract: Neural networks and quantum computing are both significant and appealing fields, and their intersection is promising for large-scale computing tasks that conventional computers cannot tackle. However, both developments are restricted by the state of hardware development. Nevertheless, many neural network algorithms were proposed before GPUs became powerful enough to run very deep models. Similarly, quantum algorithms can also be proposed as knowledge reserves before real quantum computers are easily accessible. Specifically, taking advantage of both neural networks and quantum computation and designing quantum deep neural networks (QDNNs) for acceleration on Noisy Intermediate-Scale Quantum (NISQ) processors is also an important research problem. As one of the most widely used neural network architectures, the convolutional neural network (CNN) remains to be accelerated by quantum mechanisms, with only a few attempts demonstrated so far. In this paper, we propose a new hybrid quantum-classical circuit, namely Quantum Fourier Convolutional Network (QFCN). Our model theoretically achieves an exponential speed-up compared with classical CNNs and improves over the existing best result for quantum CNNs. We demonstrate the potential of this architecture by applying it to different deep learning tasks, including traffic prediction and image classification.
【83】 Learning the Preferences of Uncertain Humans with Inverse Decision Theory
Authors: Cassidy Laidlaw, Stuart Russell Affiliations: University of California, Berkeley Link: https://arxiv.org/abs/2106.10394 Abstract: Existing observational approaches for learning human preferences, such as inverse reinforcement learning, usually make strong assumptions about the observability of the human's environment. However, in reality, people make many important decisions under uncertainty. To better understand preference learning in these cases, we study the setting of inverse decision theory (IDT), a previously proposed framework where a human is observed making non-sequential binary decisions under uncertainty. In IDT, the human's preferences are conveyed through their loss function, which expresses a tradeoff between different types of mistakes. We give the first statistical analysis of IDT, providing conditions necessary to identify these preferences and characterizing the sample complexity -- the number of decisions that must be observed to learn the tradeoff the human is making to a desired precision. Interestingly, we show that it is actually easier to identify preferences when the decision problem is more uncertain. Furthermore, uncertain decision problems allow us to relax the unrealistic assumption that the human is an optimal decision maker but still identify their exact preferences; we give sample complexities in this suboptimal case as well. Our analysis contradicts the intuition that partial observability should make preference learning more difficult. It also provides a first step towards understanding and improving preference learning methods for uncertain and suboptimal humans.
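The claim that uncertainty helps identification can be illustrated with a minimal sketch. It assumes an optimal decision maker and, unlike a realistic setting, an observer who knows the posterior probability $q = P(y = 1 \mid x)$ behind each decision; the cost names and numbers are hypothetical. Deciding 1 has expected loss $(1 - q)\,c_{FP}$ and deciding 0 has expected loss $q\,c_{FN}$, so the optimal rule is a threshold at $c = c_{FP} / (c_{FP} + c_{FN})$, and observed decisions bracket that threshold.

```python
def optimal_decision(q, cost_fp, cost_fn):
    """Decide 1 iff its expected loss (1 - q) * cost_fp is at most q * cost_fn."""
    return 1 if q * cost_fn >= (1.0 - q) * cost_fp else 0


def threshold_bounds(observations):
    """Bracket the threshold c = cost_fp / (cost_fp + cost_fn) from (q, decision) pairs."""
    lower = max((q for q, d in observations if d == 0), default=0.0)
    upper = min((q for q, d in observations if d == 1), default=1.0)
    return lower, upper


COST_FP, COST_FN = 1.0, 3.0   # hypothetical preferences; true threshold is 0.25


def observe(posteriors):
    """Simulate an optimal human deciding on cases with the given posteriors."""
    return [(q, optimal_decision(q, COST_FP, COST_FN)) for q in posteriors]


near_certain = observe([0.01, 0.02, 0.98, 0.99])   # the human is almost sure each time
uncertain = observe([0.20, 0.24, 0.26, 0.30])      # posteriors close to the threshold

print(threshold_bounds(near_certain))   # wide bracket: preferences barely constrained
print(threshold_bounds(uncertain))      # tight bracket around 0.25
```

When every case is nearly certain, any reasonable loss function produces the same decisions, so the decisions reveal little about the tradeoff; cases near the threshold are the informative ones.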
【84】 On the benefits of maximum likelihood estimation for Regression and Forecasting
Authors: Pranjal Awasthi, Abhimanyu Das, Rajat Sen, Ananda Theertha Suresh Affiliations: Google Research Link: https://arxiv.org/abs/2106.10370 Abstract: We advocate for a practical Maximum Likelihood Estimation (MLE) approach for regression and forecasting, as an alternative to the typical approach of Empirical Risk Minimization (ERM) for a specific target metric. This approach is better suited to capture inductive biases such as prior domain knowledge in datasets, and can output post-hoc estimators at inference time that can optimize different types of target metrics. We present theoretical results to demonstrate that our approach is always competitive with any estimator for the target metric under some general conditions, and in many practical settings (such as Poisson Regression) can actually be much superior to ERM. We demonstrate empirically that our method, instantiated with a well-designed general-purpose mixture likelihood family, can obtain superior performance over ERM for a variety of tasks across time-series forecasting and regression datasets with different data distributions.
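The idea of fitting a likelihood once and reading off metric-specific estimators afterwards can be sketched in its simplest form. The example below is a deliberate simplification and not the paper's method: it fits a single Poisson rate with no covariates (the paper discusses regression and a mixture likelihood family), and the metric names and count data are hypothetical.

```python
import math


def poisson_mle(counts):
    """The maximum likelihood estimate of a Poisson rate is the sample mean."""
    return sum(counts) / len(counts)


def poisson_quantile(rate, level):
    """Smallest integer m with P(Y <= m) >= level for Y ~ Poisson(rate)."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    pmf = math.exp(-rate)   # P(Y = 0)
    cdf, m = pmf, 0
    while cdf < level:
        m += 1
        pmf *= rate / m     # P(Y = m) from P(Y = m - 1)
        cdf += pmf
    return m


def post_hoc_estimate(rate, metric):
    """Read the minimizer of a target metric off the fitted distribution."""
    if metric == "squared_error":
        return rate                          # the mean minimizes expected squared error
    if metric == "absolute_error":
        return poisson_quantile(rate, 0.5)   # a median minimizes expected absolute error
    if metric == "pinball_0.9":
        return poisson_quantile(rate, 0.9)   # the 0.9-quantile minimizes that pinball loss
    raise ValueError(f"unknown metric: {metric}")


rate = poisson_mle([2, 4, 3, 3])             # hypothetical count data
for metric in ("squared_error", "absolute_error", "pinball_0.9"):
    print(metric, post_hoc_estimate(rate, metric))
```

An ERM approach would train a separate predictor per metric; here one fitted rate serves all three, which is the kind of reuse at inference time the abstract points to.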