AI Academic Digest [6.24]

2021-07-02 18:17:23

Visit www.arxivdaily.com for digests with abstracts, covering CS | Physics | Math | Economics | Statistics | Finance | Biology | Electrical Engineering, with search, bookmarking, posting, and more!

cs.AI (Artificial Intelligence): 44 papers in total

【1】 DeepStochLog: Neural Stochastic Logic Programming

Authors: Thomas Winters, Giuseppe Marra, Robin Manhaeve, Luc De Raedt
Affiliations: KU Leuven, Dept. of Computer Science; Leuven.AI, B-, Leuven, Belgium; AASS, Örebro University, Sweden
Note: Thomas Winters and Giuseppe Marra contributed equally to this work
Link: https://arxiv.org/abs/2106.12574
Abstract: Recent advances in neural symbolic learning, such as DeepProbLog, extend probabilistic logic programs with neural predicates. Like graphical models, these probabilistic logic programs define a probability distribution over possible worlds, for which inference is computationally hard. We propose DeepStochLog, an alternative neural symbolic framework based on stochastic definite clause grammars, a type of stochastic logic program, which defines a probability distribution over possible derivations. More specifically, we introduce neural grammar rules into stochastic definite clause grammars to create a framework that can be trained end-to-end. We show that inference and learning in neural stochastic logic programming scale much better than for neural probabilistic logic programs. Furthermore, the experimental evaluation shows that DeepStochLog achieves state-of-the-art results on challenging neural symbolic learning tasks.

【2】 Gradient-Based Interpretability Methods and Binarized Neural Networks

Authors: Amy Widdicombe, Simon J. Julier
Affiliations: Department of Computer Science, University College London
Note: Accepted at the ICML 2021 Workshop on Theoretic Foundation, Criticism & Application Trend of Explainable AI
Link: https://arxiv.org/abs/2106.12569
Abstract: Binarized Neural Networks (BNNs) have the potential to revolutionize the way that deep learning is carried out in edge computing platforms. However, the effectiveness of interpretability methods on these networks has not been assessed. In this paper, we compare the performance of several widely used saliency map-based interpretability techniques (Gradient, SmoothGrad and GradCAM), when applied to Binarized or Full Precision Neural Networks (FPNNs). We found that the basic Gradient method produces very similar-looking maps for both types of network. However, SmoothGrad produces significantly noisier maps for BNNs. GradCAM also produces saliency maps which differ between network types, with some of the BNNs having seemingly nonsensical explanations. We comment on possible reasons for these differences in explanations and present this as an example of why interpretability techniques should be tested on a wider range of network types.

【3】 Multi-Class Classification of Blood Cells - End to End Computer Vision based diagnosis case study

Authors: Sai Sukruth Bezugam
Affiliations: Electrical Engineering Department, Indian Institute of Technology Delhi
Note: 18 pages, 10 figures
Link: https://arxiv.org/abs/2106.12548
Abstract: The diagnosis of blood-based diseases often involves identifying and characterizing patient blood samples. Automated methods to detect and classify blood cell subtypes have important medical applications, and automated medical image processing and analysis offers a powerful tool for medical diagnosis. In this work we tackle the problem of white blood cell classification based on the morphological characteristics of their outer contour and color. We explore a set of preprocessing and segmentation algorithms (color-based segmentation, morphological processing, contouring) along with a set of feature extraction methods (corner detection algorithms and Histogram of Oriented Gradients (HOG)) and dimensionality reduction algorithms (Principal Component Analysis (PCA)) that are able to recognize and classify, through various unsupervised (k-nearest neighbors) and supervised (Support Vector Machine, Decision Trees, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Naive Bayes) algorithms, different categories of white blood cells: Eosinophil, Lymphocyte, Monocyte, and Neutrophil. We take a step further and explore various deep convolutional neural network architectures (SqueezeNet, MobileNetV1, MobileNetV2, InceptionNet, etc.) both with and without preprocessing/segmentation. We explore many algorithms in order to identify a robust algorithm with the least time complexity and low resource requirements. The outcome of this work can serve as a guide for selecting algorithms as per the requirements of automated blood cell classification.
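To make the classical branch of such a pipeline concrete, here is a minimal sketch combining HOG features, PCA, and an SVM using scikit-image and scikit-learn; the random image array, labels, and every parameter below are illustrative placeholders rather than the paper's actual data or settings.

```python
# A minimal sketch, assuming images are preloaded as a (N, 64, 64) grayscale
# array; all names and parameters here are illustrative, not the paper's setup.
import numpy as np
from skimage.feature import hog
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def hog_features(images):
    # One HOG descriptor per image; typical default parameters.
    return np.array([hog(im, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
                     for im in images])

X_img = np.random.rand(200, 64, 64)    # placeholder for real cell images
y = np.random.randint(0, 4, size=200)  # eosinophil/lymphocyte/monocyte/neutrophil

X = hog_features(X_img)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = make_pipeline(PCA(n_components=50), SVC(kernel="rbf"))
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```

Swapping the SVC for any of the other listed classifiers (decision trees, LDA, QDA, Naive Bayes) is a one-line change in the pipeline.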

【4】 Synthetic Benchmarks for Scientific Research in Explainable Machine Learning

Authors: Yang Liu, Sujay Khandagale, Colin White, Willie Neiswanger
Affiliations: AI, Stanford University
Link: https://arxiv.org/abs/2106.12543
Abstract: As machine learning models grow more complex and their applications become more high-stakes, tools for explaining model predictions have become increasingly important. Despite the widespread use of explainability techniques, evaluating and comparing different feature attribution methods remains challenging: evaluations ideally require human studies, and empirical evaluation metrics are often computationally prohibitive on real-world datasets. In this work, we address this issue by releasing XAI-Bench: a suite of synthetic datasets along with a library for benchmarking feature attribution algorithms. Unlike real-world datasets, synthetic datasets allow the efficient computation of conditional expected values that are needed to evaluate ground-truth Shapley values and other metrics. The synthetic datasets we release offer a wide variety of parameters that can be configured to simulate real-world data. We demonstrate the power of our library by benchmarking popular explainability techniques across several evaluation metrics and identifying failure modes for popular explainers. The efficiency of our library will help bring new explainability methods from development to deployment.

【5】 Coarse-to-Fine Q-attention: Efficient Learning for Visual Robotic Manipulation via Discretisation

Authors: Stephen James, Kentaro Wada, Tristan Laidlow, Andrew J. Davison
Affiliations: Dyson Robotics Lab, Imperial College London
Note: Videos and code found at this https URL
Link: https://arxiv.org/abs/2106.12534
Abstract: Reflecting on the last few years, the biggest breakthroughs in deep reinforcement learning (RL) have been in the discrete action domain. Robotic manipulation, however, is inherently a continuous control environment, but these continuous control reinforcement learning algorithms often depend on actor-critic methods that are sample-inefficient and inherently difficult to train, due to the joint optimisation of the actor and critic. To that end, we explore how we can bring the stability of discrete action RL algorithms to the robot manipulation domain. We extend the recently released ARM algorithm, by replacing the continuous next-best pose agent with a discrete next-best pose agent. Discretisation of rotation is trivial given its bounded nature, while translation is inherently unbounded, making discretisation difficult. We formulate the translation prediction as the voxel prediction problem by discretising the 3D space; however, voxelisation of a large workspace is memory intensive and would not work with a high density of voxels, crucial to obtaining the resolution needed for robotic manipulation. We therefore propose to apply this voxel prediction in a coarse-to-fine manner by gradually increasing the resolution. In each step, we extract the highest valued voxel as the predicted location, which is then used as the centre of the higher-resolution voxelisation in the next step. This coarse-to-fine prediction is applied over several steps, giving a near-lossless prediction of the translation. We show that our new coarse-to-fine algorithm is able to accomplish RLBench tasks much more efficiently than the continuous control equivalent, and even train some real-world tasks, tabula rasa, in less than 7 minutes, with only 3 demonstrations. Moreover, we show that by moving to a voxel representation, we are able to easily incorporate observations from multiple cameras.
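The coarse-to-fine translation prediction can be pictured as a simple zoom-in loop. In the sketch below, q_values is a stand-in oracle for the paper's learned Q-attention network, and the grid size and number of steps are illustrative assumptions.

```python
# A minimal sketch of coarse-to-fine voxel prediction, assuming a q_values
# oracle that scores an n^3 voxelisation of a cube; here it is a stand-in,
# not the paper's Q-attention network.
import numpy as np

def q_values(center, extent, n):
    # Placeholder: the real system scores each voxel with a neural network.
    target = np.array([0.123, -0.456, 0.789])
    grid = np.stack(np.meshgrid(*[np.linspace(-extent, extent, n)] * 3,
                                indexing="ij"), axis=-1) + center
    return -np.linalg.norm(grid - target, axis=-1)  # higher = closer to target

def coarse_to_fine(workspace_center, workspace_extent, n=16, steps=3):
    center, extent = np.asarray(workspace_center, float), workspace_extent
    for _ in range(steps):
        q = q_values(center, extent, n)
        idx = np.unravel_index(np.argmax(q), q.shape)       # highest-valued voxel
        axis = np.linspace(-extent, extent, n)
        center = center + np.array([axis[i] for i in idx])  # recenter on it
        extent = extent / (n / 2)                           # zoom in
    return center

print(coarse_to_fine([0.0, 0.0, 0.0], 1.0))  # converges near the hidden target
```

Each step keeps the voxel count constant while the covered volume shrinks, which is how a fixed memory budget yields near-lossless resolution.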

【6】 Generative Self-training for Cross-domain Unsupervised Tagged-to-Cine MRI Synthesis

Authors: Xiaofeng Liu, Fangxu Xing, Maureen Stone, Jiachen Zhuo, Reese Timothy, Jerry L. Prince, Georges El Fakhri, Jonghye Woo
Affiliations: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts; Dept. of Neural and Pain Sciences, University of Maryland School of Dentistry, Baltimore, MD, USA; Athinoula A. Martinos Center for Biomedical Imaging, Dept. of Radiology
Note: MICCAI 2021 (early accept <13%)
Link: https://arxiv.org/abs/2106.12499
Abstract: Self-training based unsupervised domain adaptation (UDA) has shown great potential to address the problem of domain shift, when applying a trained deep learning model in a source domain to unlabeled target domains. However, while the self-training UDA has demonstrated its effectiveness on discriminative tasks, such as classification and segmentation, via the reliable pseudo-label selection based on the softmax discrete histogram, the self-training UDA for generative tasks, such as image synthesis, is not fully investigated. In this work, we propose a novel generative self-training (GST) UDA framework with continuous value prediction and regression objective for cross-domain image synthesis. Specifically, we propose to filter the pseudo-label with an uncertainty mask, and quantify the predictive confidence of generated images with practical variational Bayes learning. The fast test-time adaptation is achieved by a round-based alternative optimization scheme. We validated our framework on the tagged-to-cine magnetic resonance imaging (MRI) synthesis problem, where datasets in the source and target domains were acquired from different scanners or centers. Extensive validations were carried out to verify our framework against popular adversarial training UDA methods. Results show that our GST, with tagged MRI of test subjects in new target domains, improved the synthesis quality by a large margin, compared with the adversarial training UDA methods.

【7】 Adapting Off-the-Shelf Source Segmenter for Target Medical Image Segmentation

Authors: Xiaofeng Liu, Fangxu Xing, Chao Yang, Georges El Fakhri, Jonghye Woo
Affiliations: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA; Facebook Artificial Intelligence, Boston, MA
Note: To appear in MICCAI 2021
Link: https://arxiv.org/abs/2106.12497
Abstract: Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a labeled source domain to an unlabeled and unseen target domain, which is usually trained on data from both domains. Access to the source domain data at the adaptation stage, however, is often limited, due to data storage or privacy issues. To alleviate this, in this work, we target source free UDA for segmentation, and propose to adapt an "off-the-shelf" segmentation model pre-trained in the source domain to the target domain, with an adaptive batch-wise normalization statistics adaptation framework. Specifically, the domain-specific low-order batch statistics, i.e., mean and variance, are gradually adapted with an exponential momentum decay scheme, while the consistency of domain shareable high-order batch statistics, i.e., scaling and shifting parameters, is explicitly enforced by our optimization objective. The transferability of each channel is adaptively measured first, from which to balance the contribution of each channel. Moreover, the proposed source free UDA framework is orthogonal to unsupervised learning methods, e.g., self-entropy minimization, which can thus be simply added on top of our framework. Extensive experiments on the BraTS 2018 database show that our source free UDA framework outperformed existing source-relaxed UDA methods for the cross-subtype UDA segmentation task and yielded comparable results for the cross-modality UDA segmentation task, compared with a supervised UDA method that uses the source data.
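A minimal sketch of the low-order statistics adaptation in PyTorch: run target-domain batches through a frozen source model while decaying the batch-norm momentum exponentially. The schedule, model, and helper name are assumptions for illustration, not the authors' exact implementation.

```python
# A minimal sketch, assuming a frozen source model; the decay schedule is
# illustrative only.
import torch
import torch.nn as nn

@torch.no_grad()
def adapt_bn_statistics(model, target_loader, momentum0=0.1, decay=0.94):
    model.train()  # BN layers update running statistics in train mode
    for step, (x, _) in enumerate(target_loader):
        m = momentum0 * (decay ** step)  # exponentially decayed momentum
        for layer in model.modules():
            if isinstance(layer, nn.BatchNorm2d):
                layer.momentum = m
        model(x)  # forward pass updates the running mean/variance
    model.eval()
    return model

# Tiny demo with random stand-in target-domain batches.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
loader = [(torch.randn(16, 3, 32, 32), None) for _ in range(10)]
adapt_bn_statistics(model, loader)
```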

【8】 Classifying Textual Data with Pre-trained Vision Models through Transfer Learning and Data Transformations

Authors: Charaf Eddine Benarab
Note: 5 pages, 6 figures, 1 table
Link: https://arxiv.org/abs/2106.12479
Abstract: Knowledge is acquired by humans through experience, and no boundary is set between the kinds of knowledge or skill levels we can achieve on different tasks at the same time. When it comes to neural networks, that is not the case: the major breakthroughs in the field are extremely task- and domain-specific. Vision and language are dealt with in separate manners, using separate methods and different datasets. In this work, we propose to use knowledge acquired by benchmark vision models which are trained on ImageNet to help a much smaller architecture learn to classify text. After transforming the textual data contained in the IMDB dataset to grayscale images, an analysis of different domains and of the transfer learning method is carried out. Despite the challenge posed by the very different datasets, promising results are achieved. The main contribution of this work is a novel approach which links large pretrained models on both language and vision to achieve state-of-the-art results in different sub-fields from the original task, without needing high-compute-capacity resources. Specifically, sentiment analysis is achieved after transferring knowledge between vision and language models: BERT embeddings are transformed into grayscale images, and these images are then used as training examples for pretrained vision models such as VGG16 and ResNet. Index Terms: Natural language, Vision, BERT, Transfer Learning, CNN, Domain Adaptation.
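One plausible realization of the embedding-to-image step is sketched below, assuming a precomputed 768-dimensional BERT sentence embedding; the reshaping and scaling choices are assumptions made here for illustration, since the abstract does not fix them.

```python
# A minimal sketch of one plausible embedding-to-image transformation; the
# padding, reshape, and scaling choices are illustrative, the paper may differ.
import numpy as np
from PIL import Image

def embedding_to_grayscale(vec, side=32):
    v = np.asarray(vec, dtype=np.float32)
    v = (v - v.min()) / (v.max() - v.min() + 1e-8)   # min-max to [0, 1]
    pixels = np.zeros(side * side, dtype=np.float32)
    pixels[: v.size] = v[: side * side]              # pad (or crop) to side^2
    img = (pixels.reshape(side, side) * 255).astype(np.uint8)
    return Image.fromarray(img, mode="L")            # grayscale image

emb = np.random.randn(768)  # stand-in for a real BERT sentence embedding
embedding_to_grayscale(emb).resize((224, 224)).save("example.png")  # VGG16-sized
```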

【9】 How Well do Feature Visualizations Support Causal Understanding of CNN Activations?

Authors: Roland S. Zimmermann, Judy Borowski, Robert Geirhos, Matthias Bethge, Thomas S. A. Wallis, Wieland Brendel
Affiliations: University of Tübingen, Germany; Technische Universität Darmstadt
Note: ICML 2021 XAI workshop version. Joint first and last authors. Project website at this https URL
Link: https://arxiv.org/abs/2106.12447
Abstract: One widely used approach towards understanding the inner workings of deep convolutional neural networks is to visualize unit responses via activation maximization. Feature visualizations via activation maximization are thought to provide humans with precise information about the image features that cause a unit to be activated. If this is indeed true, these synthetic images should enable humans to predict the effect of an intervention, such as whether occluding a certain patch of the image (say, a dog's head) changes a unit's activation. Here, we test this hypothesis by asking humans to predict which of two square occlusions causes a larger change to a unit's activation. Both a large-scale crowdsourced experiment and measurements with experts show that on average, the extremely activating feature visualizations by Olah et al. (2017) indeed help humans on this task (67 ± 4% accuracy; baseline performance without any visualizations is 60 ± 3%). However, they do not provide any significant advantage over other visualizations (such as e.g. dataset samples), which yield similar performance (66 ± 3% to 67 ± 3% accuracy). Taken together, we propose an objective psychophysical task to quantify the benefit of unit-level interpretability methods for humans, and find no evidence that feature visualizations provide humans with better "causal understanding" than simple alternative visualizations.

【10】 Beyond Predictions in Neural ODEs: Identification and Interventions

Authors: Hananeh Aliee, Fabian J. Theis, Niki Kilbertus
Affiliations: TUM, Helmholtz Center, Munich
Link: https://arxiv.org/abs/2106.12430
Abstract: Spurred by tremendous success in pattern matching and prediction tasks, researchers increasingly resort to machine learning to aid original scientific discovery. Given large amounts of observational data about a system, can we uncover the rules that govern its evolution? Solving this task holds the great promise of fully understanding the causal interactions and being able to make reliable predictions about the system's behavior under interventions. We take a step towards answering this question for time-series data generated from systems of ordinary differential equations (ODEs). While the governing ODEs might not be identifiable from data alone, we show that combining simple regularization schemes with flexible neural ODEs can robustly recover the dynamics and causal structures from time-series data. Our results on a variety of (non-)linear first- and second-order systems as well as real data validate our method. We conclude by showing that we can also make accurate predictions under interventions on variables or the system itself.
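In spirit, the approach amounts to fitting a neural ODE to observed trajectories under a sparsity-inducing penalty. The sketch below uses the torchdiffeq package on a toy linear system; the L1 penalty, learning rate, and architecture are illustrative, not the paper's exact regularization scheme.

```python
# A minimal sketch of fitting a neural ODE with an L1 sparsity penalty,
# assuming the torchdiffeq package; all hyperparameters are illustrative.
import torch
import torch.nn as nn
from torchdiffeq import odeint

class Dynamics(nn.Module):
    def __init__(self, dim=2):
        super().__init__()
        self.net = nn.Linear(dim, dim, bias=False)  # linear system for simplicity
    def forward(self, t, y):
        return self.net(y)

t = torch.linspace(0, 2, 50)
true_A = torch.tensor([[0.0, 1.0], [-1.0, 0.0]])    # harmonic oscillator
with torch.no_grad():
    y_obs = odeint(lambda tt, y: y @ true_A.T, torch.tensor([1.0, 0.0]), t)

f = Dynamics()
opt = torch.optim.Adam(f.parameters(), lr=0.05)
for _ in range(200):
    opt.zero_grad()
    y_hat = odeint(f, torch.tensor([1.0, 0.0]), t)
    loss = ((y_hat - y_obs) ** 2).mean()
    loss = loss + 1e-3 * sum(p.abs().sum() for p in f.parameters())  # sparsity
    loss.backward()
    opt.step()
print(f.net.weight.data)  # should approach the sparse true matrix A
```

The learned weight matrix can then be read as a (sparse) causal-structure estimate: near-zero entries indicate the absence of a direct influence.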

【11】 Alias-Free Generative Adversarial Networks

Authors: Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, Timo Aila
Affiliations: Aalto University and NVIDIA; NVIDIA and Aalto University
Link: https://arxiv.org/abs/2106.12423
Abstract: We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner. This manifests itself as, e.g., detail appearing to be glued to image coordinates instead of the surfaces of depicted objects. We trace the root cause to careless signal processing that causes aliasing in the generator network. Interpreting all signals in the network as continuous, we derive generally applicable, small architectural changes that guarantee that unwanted information cannot leak into the hierarchical synthesis process. The resulting networks match the FID of StyleGAN2 but differ dramatically in their internal representations, and they are fully equivariant to translation and rotation even at subpixel scales. Our results pave the way for generative models better suited for video and animation.

【12】 Fairness in Cardiac MR Image Analysis: An Investigation of Bias Due to Data Imbalance in Deep Learning Based Segmentation

Authors: Esther Puyol-Anton, Bram Ruijsink, Stefan K. Piechnik, Stefan Neubauer, Steffen E. Petersen, Reza Razavi, Andrew P. King
Affiliations: School of Biomedical Engineering & Imaging Sciences, King's College London, UK; Guy's and St Thomas' Hospital, London, UK; William Harvey Research Institute, NIHR Barts Biomedical Research Centre
Note: MICCAI 2021 conference
Link: https://arxiv.org/abs/2106.12387
Abstract: The subject of "fairness" in artificial intelligence (AI) refers to assessing AI algorithms for potential bias based on demographic characteristics such as race and gender, and the development of algorithms to address this bias. Most applications to date have been in computer vision, although some work in healthcare has started to emerge. The use of deep learning (DL) in cardiac MR segmentation has led to impressive results in recent years, and such techniques are starting to be translated into clinical practice. However, no work has yet investigated the fairness of such models. In this work, we perform such an analysis for racial/gender groups, focusing on the problem of training data imbalance, using a nnU-Net model trained and evaluated on cine short axis cardiac MR data from the UK Biobank dataset, consisting of 5,903 subjects from 6 different racial groups. We find statistically significant differences in Dice performance between different racial groups. To reduce the racial bias, we investigated three strategies: (1) stratified batch sampling, in which batch sampling is stratified to ensure balance between racial groups (see the sketch below); (2) fair meta-learning for segmentation, in which a DL classifier is trained to classify race and jointly optimized with the segmentation model; and (3) protected group models, in which a different segmentation model is trained for each racial group. We also compared the results to the scenario where we have a perfectly balanced database. To assess fairness we used the standard deviation (SD) and skewed error ratio (SER) of the average Dice values. Our results demonstrate that the racial bias results from the use of imbalanced training data, and that all proposed bias mitigation strategies improved fairness, with the best SD and SER resulting from the use of protected group models.
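Strategy (1) is easy to sketch: each batch draws an equal number of subjects from every group. In the snippet below, group_ids is an assumed mapping from sample index to group label, introduced only for illustration.

```python
# A minimal sketch of stratified batch sampling: equal subjects per group
# in every batch; group_ids is an assumed index -> group mapping.
import random
from collections import defaultdict

def stratified_batches(group_ids, batch_size, n_batches):
    by_group = defaultdict(list)
    for idx, g in enumerate(group_ids):
        by_group[g].append(idx)
    groups = sorted(by_group)
    per_group = batch_size // len(groups)  # equal share per group
    for _ in range(n_batches):
        batch = [i for g in groups
                 for i in random.sample(by_group[g], per_group)]
        random.shuffle(batch)
        yield batch

# e.g. 6 groups, batches of 12 -> 2 subjects per group per batch
demo_groups = [i % 6 for i in range(600)]
print(next(iter(stratified_batches(demo_groups, batch_size=12, n_batches=5))))
```

Each yielded batch can simply replace uniform sampling in an otherwise unchanged training loop.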

【13】 AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural Networks

Authors: Alexandra Peste, Eugenia Iofinova, Adrian Vladu, Dan Alistarh
Affiliations: Université de Paris
Link: https://arxiv.org/abs/2106.12379
Abstract: The increasing computational requirements of deep neural networks (DNNs) have led to significant interest in obtaining DNN models that are sparse, yet accurate. Recent work has investigated the even harder case of sparse training, where the DNN weights are, for as much as possible, already sparse to reduce computational costs during training. Existing sparse training methods are mainly empirical and often have lower accuracy relative to the dense baseline. In this paper, we present a general approach called Alternating Compressed/DeCompressed (AC/DC) training of DNNs, demonstrate convergence for a variant of the algorithm, and show that AC/DC outperforms existing sparse training methods in accuracy at similar computational budgets; at high sparsity levels, AC/DC even outperforms existing methods that rely on accurate pre-trained dense models. An important property of AC/DC is that it allows co-training of dense and sparse models, yielding accurate sparse-dense model pairs at the end of the training process. This is useful in practice, where compressed variants may be desirable for deployment in resource-constrained settings without re-doing the entire training flow, and also provides us with insights into the accuracy gap between dense and compressed models.
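A minimal sketch of the alternating schedule: compressed phases train under a top-k magnitude mask, decompressed phases drop it. Note that in AC/DC the mask is held fixed across a whole compressed phase; the simplified per-step helper below (with an assumed loss_fn(model, batch) signature) is for illustration only.

```python
# A minimal sketch of alternating compressed/decompressed training; the real
# method freezes the mask for an entire compressed phase rather than per step.
import torch

def magnitude_mask(w, sparsity=0.9):
    k = int(w.numel() * sparsity)            # number of weights to zero out
    thresh = w.abs().flatten().kthvalue(k).values
    return (w.abs() > thresh).float()

def acdc_step(model, loss_fn, batch, opt, sparse_phase):
    masks = {}
    if sparse_phase:                         # compressed phase: prune by magnitude
        for name, p in model.named_parameters():
            if p.dim() > 1:                  # prune weight matrices only
                masks[name] = magnitude_mask(p.data)
                p.data *= masks[name]
    opt.zero_grad()
    loss_fn(model, batch).backward()         # loss_fn(model, batch) is assumed
    opt.step()
    for name, p in model.named_parameters():  # keep pruned weights at zero
        if name in masks:
            p.data *= masks[name]
    return masks
```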

【14】 PALRACE: Reading Comprehension Dataset with Human Data and Labeled Rationales

Authors: Jiajie Zou, Yuran Zhang, Peiqing Jin, Cheng Luo, Xunyi Pan, Nai Ding
Affiliations: Zhejiang Lab, Hangzhou, China; Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
Link: https://arxiv.org/abs/2106.12373
Abstract: Pre-trained language models achieve high performance on machine reading comprehension (MRC) tasks but the results are hard to explain. An appealing approach to making models explainable is to provide rationales for their decisions. To facilitate supervised learning of human rationales, here we present PALRACE (Pruned And Labeled RACE), a new MRC dataset with human-labeled rationales for 800 passages selected from the RACE dataset. We further classified the question for each passage into 6 types. Each passage was read by at least 26 participants, who labeled their rationales to answer the question. In addition, we conducted a rationale evaluation session in which participants were asked to answer the question solely based on the labeled rationales, confirming that the labeled rationales were of high quality and can sufficiently support question answering.

【15】 Should You Go Deeper? Optimizing Convolutional Neural Network Architectures without Training by Receptive Field Analysis

Authors: Mats L. Richter, Julius Schöning, Ulf Krumnack
Affiliations: Osnabrueck, Germany; Osnabrück University of Applied Sciences
Note: Preprint
Link: https://arxiv.org/abs/2106.12307
Abstract: When applying artificial neural networks (ANN) to specific tasks, researchers, programmers, and other specialists usually overshoot the number of convolutional layers in their designs. By implication, these ANNs hold too many parameters, which are trained unnecessarily without impacting the result. The features a convolutional layer can process are strictly limited by its receptive field. By analyzing the expansion of the receptive fields layer by layer, we can reliably predict sequences of layers that will not contribute qualitatively to the inference in the given ANN architecture. Based on these analyses, we propose design strategies to resolve these inefficiencies, optimizing the explainability and the computational performance of ANNs. Since neither the strategies nor the analysis requires training of the actual model, these insights allow for a very efficient design process of ANN architectures, which might be automated in the future.

【16】 3D human tongue reconstruction from single "in-the-wild" images

作者:Stylianos Ploumpis,Stylianos Moschoglou,Vasileios Triantafyllou,Stefanos Zafeiriou 机构:Imperial College London, UK, Huawei Technologies Co. Ltd 备注:10 pages, 9 figures 链接:https://arxiv.org/abs/2106.12302 摘要:基于单个图像的三维人脸重建是计算机视觉领域的一个研究热点,特别是由于其在真实感三维化身创建、姿态不变人脸识别和人脸幻觉等领域的广泛应用。自从90年代末引入三维变形模型以来,我们目睹了一场旨在解决这一问题的研究热潮。然而,尽管主要归因于深度学习进步的单个图像的3D面部重建中的细节水平不断提高,但是在文献中所有3D面部模型中仍然缺少更精细和高度可变形的面部组件,例如舌头,尽管这对于3D化身表示的真实性非常重要。在这项工作中,我们提出了第一个,据我们所知,端到端的训练管道,准确地重建三维人脸和舌头一起。此外,我们通过引入一种新的适合于三维舌面生成的GAN方法,使得该管道在野外图像中具有很强的鲁棒性。最后,我们向社区公开了第一个不同的舌头数据集,包括1800个原始扫描,700个不同性别、年龄和种族背景的个体。正如我们在一系列大量的定量和定性实验中所证明的,我们的模型被证明是健壮的,并且真实地捕捉到了三维舌头结构,即使在不利的“野外”条件下也是如此。 摘要:3D face reconstruction from a single image is a task that has garnered increased interest in the Computer Vision community, especially due to its broad use in a number of applications such as realistic 3D avatar creation, pose invariant face recognition and face hallucination. Since the introduction of the 3D Morphable Model in the late 90's, we witnessed an explosion of research aiming at particularly tackling this task. Nevertheless, despite the increasing level of detail in the 3D face reconstructions from single images mainly attributed to deep learning advances, finer and highly deformable components of the face such as the tongue are still absent from all 3D face models in the literature, although being very important for the realness of the 3D avatar representations. In this work we present the first, to the best of our knowledge, end-to-end trainable pipeline that accurately reconstructs the 3D face together with the tongue. Moreover, we make this pipeline robust in "in-the-wild" images by introducing a novel GAN method tailored for 3D tongue surface generation. Finally, we make publicly available to the community the first diverse tongue dataset, consisting of 1,800 raw scans of 700 individuals varying in gender, age, and ethnicity backgrounds. As we demonstrate in an extensive series of quantitative as well as qualitative experiments, our model proves to be robust and realistically captures the 3D tongue structure, even in adverse "in-the-wild" conditions.

【17】 Behavior Mimics Distribution: Combining Individual and Group Behaviors for Federated Learning

Authors: Hua Huang, Fanhua Shang, Yuanyuan Liu, Hongying Liu
Affiliations: Key Lab of Intelligent Perception and Image Understanding of Ministry of Education, School of Artificial Intelligence, Xidian University, China; Peng Cheng Lab, Shenzhen, China
Note: This paper has been accepted by the International Joint Conference on Artificial Intelligence (IJCAI) 2021
Link: https://arxiv.org/abs/2106.12300
Abstract: Federated Learning (FL) has become an active and promising distributed machine learning paradigm. As a result of statistical heterogeneity, recent studies clearly show that the performance of popular FL methods (e.g., FedAvg) deteriorates dramatically due to the client drift caused by local updates. This paper proposes a novel Federated Learning algorithm (called IGFL), which leverages both Individual and Group behaviors to mimic distribution, thereby improving the ability to deal with heterogeneity. Unlike existing FL methods, our IGFL can be applied to both client and server optimization. As a by-product, we propose a new attention-based federated learning in the server optimization of IGFL. To the best of our knowledge, this is the first time attention mechanisms have been incorporated into federated optimization. We conduct extensive experiments and show that IGFL can significantly improve the performance of existing federated learning methods. Especially when the distributions of data among individuals are diverse, IGFL can improve the classification accuracy by about 13% compared with prior baselines.

【18】 A Label Management Mechanism for Retinal Fundus Image Classification of Diabetic Retinopathy

Authors: Mengdi Gao, Ximeng Feng, Mufeng Geng, Zhe Jiang, Lei Zhu, Xiangxi Meng, Chuanqing Zhou, Qiushi Ren, Yanye Lu
Note: 10 pages, 9 figures
Link: https://arxiv.org/abs/2106.12284
Abstract: Diabetic retinopathy (DR) remains the most prevalent cause of vision impairment and irreversible blindness in working-age adults. Due to the renaissance of deep learning (DL), DL-based DR diagnosis has become a promising tool for the early screening and severity grading of DR. However, training deep neural networks (DNNs) requires an enormous amount of carefully labeled data, and noisy labels may be introduced when labeling large amounts of data, degrading the performance of models. In this work, we propose a novel label management mechanism (LMM) for DNNs to overcome overfitting on noisy data. LMM utilizes the maximum a posteriori probability (MAP) from Bayesian statistics and a time-weighted technique to selectively correct the labels of unclean data, which gradually purifies the training data and improves classification performance. Comprehensive experiments on both synthetic noise data (Messidor & our collected DR dataset) and real-world noise data (ANIMAL-10N) demonstrated that LMM can boost the performance of models and is superior to three state-of-the-art methods.

【19】 Unsupervised Speech Enhancement using Dynamical Variational Auto-Encoders

Authors: Xiaoyu Bie, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin
Link: https://arxiv.org/abs/2106.12271
Abstract: Dynamical variational auto-encoders (DVAEs) are a class of deep generative models with latent variables, dedicated to time series data modeling. DVAEs can be considered extensions of the variational autoencoder (VAE) that include the modeling of temporal dependencies between successive observed and/or latent vectors in data sequences. Previous work has shown the interest of DVAEs and their better performance over the VAE for speech signal (spectrogram) modeling. Independently, the VAE has been successfully applied to speech enhancement in noise, in an unsupervised noise-agnostic set-up that does not require a parallel dataset of clean and noisy speech samples for training, but only requires clean speech signals. In this paper, we extend those works to DVAE-based single-channel unsupervised speech enhancement, hence exploiting both unsupervised representation learning and dynamics modeling of speech signals. We propose an unsupervised speech enhancement algorithm based on the most general form of DVAEs, which we then adapt to three specific DVAE models to illustrate the versatility of the framework. More precisely, we combine DVAE-based speech priors with a noise model based on nonnegative matrix factorization, and we derive a variational expectation-maximization (VEM) algorithm to perform speech enhancement. Experimental results show that the proposed approach based on DVAEs outperforms its VAE counterpart and a supervised speech enhancement baseline.

【20】 Improved Acyclicity Reasoning for Bayesian Network Structure Learning with Constraint Programming

Authors: Fulya Trösser, Simon de Givry, George Katsirelos
Affiliations: Université de Toulouse, INRAE, UR MIAT, F-, Castanet-Tolosan, France; UMR MIA-Paris, INRAE, AgroParisTech, Univ. Paris-Saclay, Paris, France
Link: https://arxiv.org/abs/2106.12269
Abstract: Bayesian networks are probabilistic graphical models with a wide range of application areas including gene regulatory networks inference, risk analysis and image processing. Learning the structure of a Bayesian network (BNSL) from discrete data is known to be an NP-hard task with a superexponential search space of directed acyclic graphs. In this work, we propose a new polynomial time algorithm for discovering a subset of all possible cluster cuts, a greedy algorithm for approximately solving the resulting linear program, and a generalised arc consistency algorithm for the acyclicity constraint. We embed these in the constraint programming-based branch-and-bound solver CPBayes and show that, despite being suboptimal, they improve performance by orders of magnitude. The resulting solver also compares favourably with GOBNILP, a state-of-the-art solver for the BNSL problem which solves an NP-hard problem to discover each cut and solves the linear program exactly.

【21】 ADAVI: Automatic Dual Amortized Variational Inference Applied To Pyramidal Bayesian Models

Authors: Louis Rouillard, Demian Wassermann
Affiliations: Université Paris-Saclay, Inria, CEA, Palaiseau, France
Link: https://arxiv.org/abs/2106.12248
Abstract: Frequently, population studies feature pyramidally-organized data represented using Hierarchical Bayesian Models (HBM) enriched with plates. These models can become prohibitively large in settings such as neuroimaging, where a sample is composed of a functional MRI signal measured on 64 thousand brain locations, across 4 measurement sessions, and at least tens of subjects. Even a reduced example on a specific cortical region of 300 brain locations features around 1 million parameters, hampering the usage of modern density estimation techniques such as Simulation-Based Inference (SBI). To infer parameter posterior distributions in this challenging class of problems, we designed a novel methodology that automatically produces a variational family dual to a target HBM. This variational family, represented as a neural network, consists of the combination of an attention-based hierarchical encoder feeding summary statistics to a set of normalizing flows. Our automatically-derived neural network exploits exchangeability in the plate-enriched HBM and factorizes its parameter space. The resulting architecture reduces by orders of magnitude its parameterization with respect to that of a typical SBI representation, while maintaining expressivity. Our method performs inference on the specified HBM in an amortized setup: once trained, it can readily be applied to a new data sample to compute the parameters' full posterior. We demonstrate the capability of our method on simulated data, as well as a challenging high-dimensional brain parcellation experiment. We also open up several questions that lie at the intersection between SBI techniques and structured Variational Inference.

【22】 A Unified Approach to Fair Online Learning via Blackwell Approachability

Authors: Evgenii Chzhen, Christophe Giraud, Gilles Stoltz
Affiliations: Université Paris-Saclay, CNRS, Laboratoire de mathématiques d'Orsay, Orsay, France
Link: https://arxiv.org/abs/2106.12242
Abstract: We provide a setting and a general approach to fair online learning with stochastic sensitive and non-sensitive contexts. The setting is a repeated game between the Player and Nature, where at each stage both pick actions based on the contexts. Inspired by the notion of unawareness, we assume that the Player can only access the non-sensitive context before making a decision, while we discuss both the case of Nature accessing the sensitive contexts and of Nature unaware of the sensitive contexts. Adapting Blackwell's approachability theory to handle the case of an unknown contexts' distribution, we provide a general necessary and sufficient condition for learning objectives to be compatible with some fairness constraints. This condition is instantiated on (group-wise) no-regret and (group-wise) calibration objectives, and on demographic parity as an additional constraint. When the objective is not compatible with the constraint, the provided framework permits characterising the optimal trade-off between the two.

【23】 Sentinel-1 and Sentinel-2 Spatio-Temporal Data Fusion for Clouds Removal

Authors: Alessandro Sebastianelli, Artur Nowakowski, Erika Puglisi, Maria Pia Del Rosso, Jamila Mifdal, Fiora Pirri, Pierre Philippe Mathieu, Silvia Liberata Ullo
Link: https://arxiv.org/abs/2106.12226
Abstract: The abundance of clouds, located both spatially and temporally, often makes remote sensing applications with optical images difficult or even impossible. In this manuscript, a novel method for the restoration of clouds-corrupted optical images is presented and developed, based on a joint data fusion paradigm, where three deep neural networks are combined in order to fuse spatio-temporal features extracted from Sentinel-1 and Sentinel-2 time series of data. It is worth highlighting that both the code and the dataset have been implemented from scratch and made available to interested researchers for further analysis and investigation.

【24】 Not all users are the same: Providing personalized explanations for sequential decision making problems

Authors: Utkarsh Soni, Sarath Sreedharan, Subbarao Kambhampati
Affiliations: School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ, USA
Link: https://arxiv.org/abs/2106.12207
Abstract: There is a growing interest in designing autonomous agents that can work alongside humans. Such agents will undoubtedly be expected to explain their behavior and decisions. While generating explanations is an actively researched topic, most works tend to focus on methods that generate explanations that are one-size-fits-all, in that the specifics of the user model are completely ignored. The handful of works that look at tailoring their explanations to the user's background rely on having specific models of the users (either analytic models or learned labeling models). The goal of this work is thus to propose an end-to-end adaptive explanation generation system that begins by learning the different types of users that the agent could interact with. Then, during the interaction with the target user, it is tasked with identifying the type on the fly and adjusting its explanations accordingly. The former is achieved by a data-driven clustering approach, while for the latter we compile our explanation generation problem into a POMDP. We demonstrate the usefulness of our system on two domains using state-of-the-art POMDP solvers. We also report the results of a user study that investigates the benefits of providing personalized explanations in a human-robot interaction setting.

【25】 A Review of Assistive Technologies for Activities of Daily Living of Elderly

Authors: Nirmalya Thakur, Chia Y. Han
Affiliations: Department of Electrical Engineering and Computer Science, College of Engineering and Applied Sciences, University of Cincinnati, Ohio, US
Link: https://arxiv.org/abs/2106.12183
Abstract: One of the distinct features of this century has been the population of older adults, which has been on a constant rise. Elderly people have several needs and requirements due to physical disabilities, cognitive issues, weakened memory and disorganized behavior that they face with increasing age. The extent of these limitations also differs according to the varying diversities in the elderly, which include age, gender, background, experience, skills, knowledge and so on. These varying needs and challenges with increasing age limit the abilities of older adults to perform Activities of Daily Living (ADLs) in an independent manner. To add to it, the shortage of caregivers creates a looming need for technology-based services for elderly people to assist them in performing their daily routine tasks, to sustain their independent living and active aging. To address these needs, this work makes three major contributions in this field. First, it provides a rather comprehensive review of assisted living technologies aimed at helping elderly people perform ADLs. Second, the work discusses the challenges identified through this review that currently exist in the context of implementation of assisted living services for elderly care in Smart Homes and Smart Cities. Finally, the work also outlines an approach for implementation, extension and integration of the existing works in this field for the development of a much-needed framework that can provide personalized assistance and user-centered behavior interventions to the elderly as per their varying and ever-changing needs.

【26】 Deep Neural Network Based Respiratory Pathology Classification Using Cough Sounds

Authors: Balamurali B T, Hwan Ing Hee, Saumitra Kapoor, Oon Hoe Teoh, Sung Shin Teng, Khai Pin Lee, Dorien Herremans, Jer Ming Chen
Link: https://arxiv.org/abs/2106.12174
Abstract: Intelligent systems are transforming the world, as well as our healthcare system. We propose a deep learning-based cough sound classification model that can distinguish between children with healthy coughs and children with pathological coughs such as asthma, upper respiratory tract infection (URTI), and lower respiratory tract infection (LRTI). In order to train a deep neural network model, we collected a new dataset of cough sounds, labelled with the clinician's diagnosis. The chosen model is a bidirectional long short-term memory network (BiLSTM) based on Mel Frequency Cepstral Coefficient (MFCC) features. When trained to classify two classes of coughs, healthy or pathological (in general or belonging to a specific respiratory pathology), the resulting model reaches an accuracy exceeding 84% when classifying coughs against the labels provided by the physicians' diagnosis. In order to classify a subject's respiratory pathology condition, the results of multiple cough epochs per subject were combined. The resulting prediction accuracy exceeds 91% for all three respiratory pathologies. However, when the model is trained to classify and discriminate among the four classes of coughs, overall accuracy drops: one class of pathological cough is often misclassified as another. However, if one considers healthy coughs classified as healthy and pathological coughs classified as having some kind of pathology, then the overall accuracy of the four-class model is above 84%. A longitudinal study of the MFCC feature space, comparing pathological and recovered coughs collected from the same subjects, revealed that pathological coughs, irrespective of the underlying conditions, occupy the same feature space, making them harder to differentiate using MFCC features alone.
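The core recipe, MFCC sequences fed to a BiLSTM classifier, can be sketched as follows; the MFCC settings, layer sizes, and audio handling are illustrative choices, not the paper's exact configuration.

```python
# A minimal sketch of an MFCC + BiLSTM cough classifier; all hyperparameters
# are illustrative, not the paper's.
import librosa
import torch
import torch.nn as nn

def mfcc_sequence(wav_path, n_mfcc=13):
    y, sr = librosa.load(wav_path, sr=16000)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return torch.tensor(m.T, dtype=torch.float32)        # (frames, n_mfcc)

class CoughBiLSTM(nn.Module):
    def __init__(self, n_mfcc=13, hidden=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)
    def forward(self, x):             # x: (batch, frames, n_mfcc)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # classify from the last time step

model = CoughBiLSTM()
dummy = torch.randn(4, 100, 13)      # 4 clips, 100 frames each
print(model(dummy).shape)            # torch.Size([4, 2])
```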

【27】 APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores

Authors: Boyuan Feng, Yuke Wang, Tong Geng, Ang Li, Yufei Ding
Affiliations: University of California, Santa Barbara; Pacific Northwest National Lab.
Note: Accepted by SC'21
Link: https://arxiv.org/abs/2106.12169
Abstract: Over the years, accelerating neural networks with quantization has been widely studied. Unfortunately, prior efforts with diverse precisions (e.g., 1-bit weights and 2-bit activations) are usually restricted by limited precision support on GPUs (e.g., int1 and int4). To break such restrictions, we introduce the first Arbitrary Precision Neural Network framework (APNN-TC) to fully exploit quantization benefits on Ampere GPU Tensor Cores. Specifically, APNN-TC first incorporates a novel emulation algorithm to support arbitrary short bit-width computation with int1 compute primitives and XOR/AND Boolean operations. Second, APNN-TC integrates arbitrary precision layer designs to efficiently map our emulation algorithm to Tensor Cores with novel batching strategies and specialized memory organization. Third, APNN-TC embodies a novel arbitrary precision NN design to minimize memory access across layers and further improve performance. Extensive evaluations show that APNN-TC can achieve significant speedup over CUTLASS kernels and various NN models, such as ResNet and VGG.
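The emulation idea can be checked in pure Python: decomposing each operand into 1-bit planes reduces an arbitrary-precision dot product to AND plus popcount, the primitives that int1 Tensor Core operations accelerate. The batching and memory layout on actual Tensor Cores are the paper's real contribution; this sketch only verifies the arithmetic identity.

```python
# A minimal sketch of bit-plane emulation for unsigned values: an arbitrary-
# precision dot product expressed with 1-bit AND + popcount operations.
import numpy as np

def bitplanes(v, bits):
    # Integer vector -> list of boolean planes, least significant bit first.
    return [((v >> i) & 1).astype(bool) for i in range(bits)]

def emulated_dot(x, w, x_bits=2, w_bits=1):
    acc = 0
    for i, xp in enumerate(bitplanes(x, x_bits)):
        for j, wp in enumerate(bitplanes(w, w_bits)):
            acc += (1 << (i + j)) * np.count_nonzero(xp & wp)  # AND + popcount
    return acc

x = np.random.randint(0, 4, size=128)    # 2-bit activations
w = np.random.randint(0, 2, size=128)    # 1-bit weights
assert emulated_dot(x, w) == int(x @ w)  # matches the ordinary dot product
print(emulated_dot(x, w))
```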

【28】 Neural Fashion Image Captioning: Accounting for Data Diversity

Authors: Gilles Hacheme, Noureini Sayouti
Affiliations: Ai,Innov; Aix-Marseille University (Aix-Marseille School of Economics), CNRS & EHESS, Marseille, France
Link: https://arxiv.org/abs/2106.12154
Abstract: Image captioning has increasingly large domains of application, and fashion is not an exception. Having automatic item descriptions is of great interest for fashion web platforms, which sometimes host hundreds of thousands of images. This paper is one of the first tackling image captioning for fashion images. To contribute to addressing dataset diversity issues, we introduce the InFashAIv1 dataset, containing almost 16,000 African fashion item images with their titles, prices and general descriptions. We also used the well-known DeepFashion dataset in addition to InFashAIv1. Captions are generated using the Show and Tell model, made of a CNN encoder and an RNN decoder. We showed that jointly training the model on both datasets improves caption quality for African-style fashion images, suggesting transfer learning from Western-style data. The InFashAIv1 dataset is released on GitHub (https://github.com/hgilles06/infashai) to encourage work with more diversity inclusion.

【29】 Width-based Lookaheads with Learnt Base Policies and Heuristics Over the Atari-2600 Benchmark

Authors: Stefan O'Toole, Nir Lipovetzky, Miquel Ramirez, Adrian Pearce
Affiliations: School of Computing and Information Systems / Electrical and Electronic Engineering, University of Melbourne, Australia
Link: https://arxiv.org/abs/2106.12151
Abstract: We propose new width-based planning and learning algorithms applied over the Atari-2600 benchmark. The algorithms presented are inspired by a careful analysis of the design decisions made by previous width-based planners. We benchmark our new algorithms over the Atari-2600 games and show that our best performing algorithm, RIW_C+CPV, outperforms the previously introduced width-based planning and learning algorithms π-IW(1), π-IW(1)+ and π-HIW(n, 1). Furthermore, we present a taxonomy of the set of Atari-2600 games according to some of their defining characteristics. This analysis of the games provides further insight into the behaviour and performance of the width-based algorithms introduced. Namely, for games with large branching factors, and games with sparse meaningful rewards, RIW_C+CPV outperforms π-IW(1), π-IW(1)+ and π-HIW(n, 1).
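The novelty pruning at the heart of width-based methods such as IW(1) is easy to sketch: a state is expanded only if it makes at least one atom (feature-value pair) true for the first time in the search. The state encoding and toy domain below are illustrative, not the authors' implementation.

```python
# A minimal sketch of IW(1)-style novelty pruning in a breadth-first search;
# states are tuples, atoms are (feature index, value) pairs.
def iw1_search(initial_state, successors, is_goal):
    seen_atoms = set()                  # atoms made true so far
    frontier = [initial_state]
    while frontier:
        state = frontier.pop(0)         # breadth-first
        atoms = {(i, v) for i, v in enumerate(state)}
        if not (atoms - seen_atoms):
            continue                    # no new atom: prune (novelty > 1)
        seen_atoms |= atoms
        if is_goal(state):
            return state
        frontier.extend(successors(state))
    return None

# Toy domain: move a counter from 0 to 5 by +1/-1 steps.
print(iw1_search((0,), lambda s: [(s[0] + 1,), (s[0] - 1,)],
                 lambda s: s[0] == 5))
```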

【30】 NodePiece: Compositional and Parameter-Efficient Representations of Large Knowledge Graphs 标题:NodePiece:大型知识图的组合式参数高效表示

作者:Mikhail Galkin,Jiapeng Wu,Etienne Denis,William L. Hamilton 机构:Mila, McGill University, Montreal, Canada 链接:https://arxiv.org/abs/2106.12144 摘要:传统的知识图表示学习算法将每个实体映射到一个唯一的嵌入向量。这种浅层查找导致用于存储嵌入矩阵的内存消耗的线性增长,并且在处理真实世界的KG时产生高计算成本。与NLP中常用的子词标记化方法相比较,我们探索了具有可能的次线性内存需求的更具参数效率的节点嵌入策略。为此,我们提出了NodePiece,一种基于锚的方法来学习固定大小的实体词汇表。在NodePiece中,子词/子实体单元的词汇表是由具有已知关系类型的图中的锚节点构造的。给定这样一个固定大小的词汇表,可以引导任何实体的编码和嵌入,包括在训练期间看不到的实体。实验表明,NodePiece在节点分类、链路预测和关系预测任务中表现出很强的竞争力,同时在图中保留不到10%的显式节点作为锚,并且参数通常减少10倍。 摘要:Conventional representation learning algorithms for knowledge graphs (KG) map each entity to a unique embedding vector. Such a shallow lookup results in a linear growth of memory consumption for storing the embedding matrix and incurs high computational costs when working with real-world KGs. Drawing parallels with subword tokenization commonly used in NLP, we explore the landscape of more parameter-efficient node embedding strategies with possibly sublinear memory requirements. To this end, we propose NodePiece, an anchor-based approach to learn a fixed-size entity vocabulary. In NodePiece, a vocabulary of subword/sub-entity units is constructed from anchor nodes in a graph with known relation types. Given such a fixed-size vocabulary, it is possible to bootstrap an encoding and embedding for any entity, including those unseen during training. Experiments show that NodePiece performs competitively in node classification, link prediction, and relation prediction tasks while retaining less than 10% of explicit nodes in a graph as anchors and often having 10x fewer parameters.
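
NodePiece的令牌化思想可以用如下假设性示意(基于networkx,函数与令牌格式均为本文虚构,并非官方实现)来说明:用若干锚节点到实体的最短路径距离,加上该实体关联边的关系类型,构成固定词表上的令牌序列,再交由任意编码器组合出实体嵌入。

```python
import networkx as nx

def nodepiece_tokens(G, anchors, node, k=3):
    # 示意:把实体表示为 k 个最近锚点(带跳数)加上其关联边的关系类型,
    # 所有令牌都落在一个固定大小的词表上,未见过的实体同样可以被编码。
    dists = []
    for a in anchors:
        try:
            dists.append((nx.shortest_path_length(G, node, a), a))
        except nx.NetworkXNoPath:
            continue
    nearest = sorted(dists)[:k]
    rels = {data.get("relation") for _, _, data in G.edges(node, data=True)}
    return [f"anchor:{a}@d{d}" for d, a in nearest] + sorted(f"rel:{r}" for r in rels)

G = nx.Graph()
G.add_edge("paris", "france", relation="capital_of")
G.add_edge("france", "europe", relation="located_in")
print(nodepiece_tokens(G, anchors=["europe"], node="paris"))
# ['anchor:europe@d2', 'rel:capital_of']
```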

【31】 IQ-Learn: Inverse soft-Q Learning for Imitation 标题:IQ-Learn:用于模仿学习的逆软Q学习

作者:Divyansh Garg,Shuvam Chakraborty,Chris Cundy,Jiaming Song,Stefano Ermon 机构:Stanford University 链接:https://arxiv.org/abs/2106.12142 摘要:在许多顺序决策问题(如机器人控制、游戏、序列预测)中,人类或专家数据包含关于任务的有用信息。然而,在具有复杂动力学的高维环境中,从少量专家数据进行模仿学习(IL)具有挑战性。行为克隆是一种简单的方法,因实现简单、收敛稳定而被广泛使用,但它不利用任何有关环境动力学的信息。许多利用动力学信息的现有方法在实践中难以训练,原因在于需要对奖励和策略近似器进行对抗性优化,或者梯度估计有偏且方差高。我们提出了一种动力学感知的IL方法,它通过学习单个Q函数来避免对抗性训练,该Q函数隐式地同时表示奖励和策略。在标准基准上,隐式学习到的奖励与真实奖励呈高度正相关,说明我们的方法也可用于逆强化学习(IRL)。我们的方法逆软Q学习(IQ-Learn)在离线和在线模仿学习设置中均取得了最先进的结果,在所需环境交互次数和高维空间中的可扩展性两方面都超过了现有方法。 摘要:In many sequential decision-making problems (e.g., robotics control, game playing, sequential prediction), human or expert data is available containing useful information about the task. However, imitation learning (IL) from a small amount of expert data can be challenging in high-dimensional environments with complex dynamics. Behavioral cloning is a simple method that is widely used due to its simplicity of implementation and stable convergence but doesn't utilize any information involving the environment's dynamics. Many existing methods that exploit dynamics information are difficult to train in practice due to an adversarial optimization process over reward and policy approximators or biased, high variance gradient estimators. We introduce a method for dynamics-aware IL which avoids adversarial training by learning a single Q-function, implicitly representing both reward and policy. On standard benchmarks, the implicitly learned rewards show a high positive correlation with the ground-truth rewards, illustrating our method can also be used for inverse reinforcement learning (IRL). Our method, Inverse soft-Q learning (IQ-Learn) obtains state-of-the-art results in offline and online imitation learning settings, surpassing existing methods both in the number of required environment interactions and scalability in high-dimensional spaces.
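
按照我们对摘要的理解,其核心可以用逆软贝尔曼算子粗略示意如下(假设性PyTorch草图,Q为任意将状态映射到各动作值的网络,并非论文官方实现):单个Q函数经softmax即得策略,经下式即得隐式奖励。

```python
import torch

def soft_value(Q, states, alpha=1.0):
    # 软状态值 V(s) = alpha * logsumexp(Q(s, ·)/alpha),对应 softmax 策略
    return alpha * torch.logsumexp(Q(states) / alpha, dim=1)

def implicit_reward(Q, s, a, s_next, gamma=0.99):
    # 逆软贝尔曼形式:r(s, a) = Q(s, a) - gamma * V(s')。
    # 只在专家数据上训练 Q,即可同时隐式恢复奖励(可用于 IRL)和策略,
    # 无需对抗性优化。
    q_sa = Q(s).gather(1, a.unsqueeze(1)).squeeze(1)
    return q_sa - gamma * soft_value(Q, s_next)
```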

【32】 PatentNet: A Large-Scale Incomplete Multiview, Multimodal, Multilabel Industrial Goods Image Database 标题:PatentNet:一个大规模不完整的多视图、多模态、多标签工业品图像数据库

作者:Fangyuan Lei,Da Huang,Jianjian Jiang,Ruijun Ma,Senhong Wang,Jiangzhong Cao,Yusen Lin,Qingyun Dai 机构:a Guangdong Provincial Key Laboratory of Intellectual Property & Big Data, Guangzhou, China, b School of Electronic and Information, Guangdong Polytechnic Normal University, Guangzhou, China 备注:12 pages, 7 figures 链接:https://arxiv.org/abs/2106.12139 摘要:在深度学习领域,大规模图像数据集带来了目标识别和检索的突破。如今,作为创新的体现,工业产品的多样性显著增大,其不完全多视图、多模态、多标签的特点不同于传统数据集。在本文中,我们介绍了一个工业品数据集PatentNet,它包含大量高度多样、准确且标注详细的工业品图像及相应文本。在PatentNet中,图像和文本来源于外观设计专利。PatentNet拥有超过600万张经专业人员手工核对标注的工业品图像及相应文本,是第一个持续更新的工业品图像数据库,其品类比以往用于基准测试的工业品数据集更为广泛。PatentNet根据洛迦诺(Locarno)分类协定将数百万张图像组织为32个大类和219个子类。通过在图像分类、图像检索和不完全多视图聚类上的大量实验,我们证明PatentNet比现有工业图像数据集更多样、更复杂、更具挑战性,拥有更高的潜力。此外,PatentNet的不完全多视图、多模态和多标签特性能为人工智能及其他领域提供前所未有的机会。 摘要:In deep learning area, large-scale image datasets bring a breakthrough in the success of object recognition and retrieval. Nowadays, as the embodiment of innovation, the diversity of the industrial goods is significantly larger, in which the incomplete multiview, multimodal and multilabel are different from the traditional dataset. In this paper, we introduce an industrial goods dataset, namely PatentNet, with numerous highly diverse, accurate and detailed annotations of industrial goods images, and corresponding texts. In PatentNet, the images and texts are sourced from design patent. Within over 6M images and corresponding texts of industrial goods labeled manually checked by professionals, PatentNet is the first ongoing industrial goods image database whose varieties are wider than industrial goods datasets used previously for benchmarking. PatentNet organizes millions of images into 32 classes and 219 subclasses based on the Locarno Classification Agreement. Through extensive experiments on image classification, image retrieval and incomplete multiview clustering, we demonstrate that our PatentNet is much more diverse, complex, and challenging, enjoying higher potentials than existing industrial image datasets. Furthermore, the characteristics of incomplete multiview, multimodal and multilabel in PatentNet are able to offer unparalleled opportunities in the artificial intelligence community and beyond.

【33】 NAX: Co-Designing Neural Network and Hardware Architecture for Memristive Xbar based Computing Systems 标题:NAX:面向忆阻交叉开关(Xbar)计算系统的神经网络与硬件架构协同设计

作者:Shubham Negi,Indranil Chakraborty,Aayush Ankit,Kaushik Roy 机构:Purdue University, Microsoft Corporation 备注:10 pages, 9 figures 链接:https://arxiv.org/abs/2106.12125 摘要:使用忆阻交叉开关阵列(MCA)的内存计算(IMC)硬件越来越流行,用于加速深度神经网络(DNN),因为它缓解了与冯·诺依曼体系结构相关的"内存墙"问题。映射到这类硬件的DNN的硬件效率(能量、延迟和面积)以及应用精度(考虑器件和电路的非理想性)与网络参数(如内核大小、深度等)和硬件架构参数(如交叉开关阵列大小)相互耦合。然而,网络和硬件参数的协同优化带来了一个具有挑战性的搜索空间,其中不同的内核大小要映射到不同的交叉开关阵列大小。为此,我们提出了NAX:一个高效的神经架构搜索引擎,它协同设计神经网络和基于IMC的硬件架构。NAX探索上述搜索空间,为每个DNN层确定内核及相应的交叉开关阵列大小,从而在硬件效率和应用精度之间实现最佳折衷。NAX的结果表明,所得网络在不同层上具有异构的交叉开关阵列大小,并且在考虑交叉开关阵列非理想性的情况下实现了最佳的硬件效率和精度。与基线ResNet-20和ResNet-18模型相比,在CIFAR-10和Tiny ImageNet上,我们的模型精度分别提高了0.8%和0.2%,EDAP(能量-延迟-面积积)分别降低了17%和4%。 摘要:In-Memory Computing (IMC) hardware using Memristive Crossbar Arrays (MCAs) are gaining popularity to accelerate Deep Neural Networks (DNNs) since it alleviates the "memory wall" problem associated with von-Neumann architecture. The hardware efficiency (energy, latency and area) as well as application accuracy (considering device and circuit non-idealities) of DNNs mapped to such hardware are co-dependent on network parameters, such as kernel size, depth etc. and hardware architecture parameters such as crossbar size. However, co-optimization of both network and hardware parameters presents a challenging search space comprising of different kernel sizes mapped to varying crossbar sizes. To that effect, we propose NAX -- an efficient neural architecture search engine that co-designs neural network and IMC based hardware architecture. NAX explores the aforementioned search space to determine kernel and corresponding crossbar sizes for each DNN layer to achieve optimal tradeoffs between hardware efficiency and application accuracy. Our results from NAX show that the networks have heterogeneous crossbar sizes across different network layers, and achieves optimal hardware efficiency and accuracy considering the non-idealities in crossbars. On CIFAR-10 and Tiny ImageNet, our models achieve 0.8%, 0.2% higher accuracy, and 17%, 4% lower EDAP (energy-delay-area product) compared to a baseline ResNet-20 and ResNet-18 models, respectively.

【34】 Exploiting Negative Learning for Implicit Pseudo Label Rectification in Source-Free Domain Adaptive Semantic Segmentation 标题:无源域自适应语义分割中利用负学习进行隐式伪标签校正

作者:Xin Luo,Wei Chen,Yusong Tan,Chen Li,Yulin He,Xiaogang Jia 机构:College of Computers, National University of Defense Technology 备注:8 pages, 4 figures 链接:https://arxiv.org/abs/2106.12123 摘要:在没有源数据的情况下,需要将训练好的源模型中存储的知识迁移到无标注的目标域。然而,最新的无源域自适应(SFDA)方法受到严格限制:1)必须能访问源模型的内部规范;2)自训练过程中伪标签必须是干净的,这使得依赖语义分割的关键任务变得不可靠。针对这些缺陷,本文提出了一种带伪标签校正的域自适应语义分割方法(PR-SFDA),分为两个阶段:1)置信度正则化的无监督学习:采用最大平方损失对目标模型进行正则化,以保证预测的置信度;2)噪声感知的伪标签学习:负学习使训练能够容忍噪声伪标签,同时正学习实现快速收敛。我们在域自适应语义分割基准GTA5 $\to$ Cityscapes上进行了大量实验。总体而言,PR-SFDA达到了49.0 mIoU的性能,与最先进的同类方法非常接近。值得注意的是,后者需要访问源模型的内部规范,而PR-SFDA完全不需要,形成鲜明对比。 摘要:It is desirable to transfer the knowledge stored in a well-trained source model onto non-annotated target domain in the absence of source data. However, state-of-the-art methods for source free domain adaptation (SFDA) are subject to strict limits: 1) access to internal specifications of source models is a must; and 2) pseudo labels should be clean during self-training, making critical tasks relying on semantic segmentation unreliable. Aiming at these pitfalls, this study develops a domain adaptive solution to semantic segmentation with pseudo label rectification (namely PR-SFDA), which operates in two phases: 1) Confidence-regularized unsupervised learning: Maximum squares loss applies to regularize the target model to ensure the confidence in prediction; and 2) Noise-aware pseudo label learning: Negative learning enables tolerance to noisy pseudo labels in training, meanwhile positive learning achieves fast convergence. Extensive experiments have been performed on domain adaptive semantic segmentation benchmark, GTA5 $\to$ Cityscapes. Overall, PR-SFDA achieves a performance of 49.0 mIoU, which is very close to that of the state-of-the-art counterparts. Note that the latter demand accesses to the source model's internal specifications, whereas the PR-SFDA solution needs none as a sharp contrast.
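
文中两个阶段所依赖的两类损失,其常见形式可粗略示意如下(假设性PyTorch草图,按(B, C)分类形状书写,与论文的分割实现细节未必一致):

```python
import torch
import torch.nn.functional as F

def max_squares_loss(logits):
    # 最大平方损失(常见形式):最大化概率平方和,
    # 把预测推离均匀分布,从而提高预测置信度
    p = F.softmax(logits, dim=1)
    return -(p ** 2).sum(dim=1).mean() / 2

def negative_learning_loss(logits, complementary_labels):
    # 负学习(常见形式):互补标签指出样本"不属于"的类别;
    # 最小化 -log(1 - p_wrong) 对噪声伪标签具有容忍性
    p = F.softmax(logits, dim=1)
    p_wrong = p.gather(1, complementary_labels.unsqueeze(1)).squeeze(1)
    return -torch.log(1.0 - p_wrong + 1e-7).mean()
```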

【35】 Prevention and Resolution of Conflicts in Social Navigation -- a Survey 标题:社交导航中冲突的预防与化解--综述

作者:Reuth Mirsky,Xuesu Xiao,Justin Hart,Peter Stone 链接:https://arxiv.org/abs/2106.12113 摘要:随着机器人在共享人机环境中协作这一目标的日益临近,在这种环境下的导航变得既关键又令人期待。机器人技术的最新发展已经遇到并解决了混合人机环境中导航的一些挑战;近年来,我们观察到一批相关工作专门针对如何处理社交导航中智能体之间的冲突这一问题。这些贡献提供了模型、算法和评估指标,但由于这一研究领域本质上是跨学科的,许多相关论文不具有可比性,研究人员之间也没有标准词汇。本综述的主要目的是通过提出这样一种共同语言来弥合这一差距,用它来梳理现有工作,并突出开放问题。它首先定义社交导航中的冲突,并给出其组成部分的详细分类。随后,本综述在所提分类法的框架下梳理现有工作并讨论相关论文。最后,本文提出了当前处于社交导航前沿的一些未来方向和问题,以期帮助聚焦今后的研究工作。 摘要:With the approaching goal of having robots collaborate in shared human-robot environments, navigation in this context becomes both crucial and desirable. Recent developments in robotics have encountered and tackled some of the challenges of navigating in mixed human-robot environments, and in recent years we observe a surge of related work that specifically targets the question of how to handle conflicts between agents in social navigation. These contributions offer models, algorithms, and evaluation metrics, however as this research area is inherently interdisciplinary, many of the relevant papers are not comparable and there is no standard vocabulary between the researchers. The main goal of this survey is to bridge this gap by proposing such a common language, using it to survey existing work, and highlighting open problems. It starts by defining a conflict in social navigation, and offers a detailed taxonomy of its components. This survey then maps existing work while discussing papers using the framing of the proposed taxonomy. Finally, this paper proposes some future directions and problems that are currently in the frontier of social navigation to help focus research efforts.

【36】 A Federated Data-Driven Evolutionary Algorithm for Expensive Multi/Many-objective Optimization 标题:面向昂贵多/超多目标优化的联邦数据驱动进化算法

作者:Jinjin Xu,Yaochu Jin,Wenli Du 链接:https://arxiv.org/abs/2106.12086 摘要:数据驱动优化在现实世界中有许多成功的应用,并在进化优化领域受到越来越多的关注。大多数现有算法假设用于优化的数据总能汇集在中央服务器上以构建代理模型。然而,当数据必须以分布式方式收集且受隐私限制时,这一假设可能不成立。本文旨在提出一种联邦数据驱动的多/超多目标进化优化算法。为此,我们利用联邦学习构建代理模型,让多个客户端协作训练一个径向基函数网络作为全局代理。随后提出了一种新的联邦采集函数,供中央服务器使用全局代理近似目标值,并基于局部模型估计近似目标值的不确定性水平。通过与两种最新的代理辅助多目标进化算法比较,在一系列多/超多目标基准问题上验证了所提算法的性能。 摘要:Data-driven optimization has found many successful applications in the real world and received increased attention in the field of evolutionary optimization. Most existing algorithms assume that the data used for optimization is always available on a central server for construction of surrogates. This assumption, however, may fail to hold when the data must be collected in a distributed way and is subject to privacy restrictions. This paper aims to propose a federated data-driven evolutionary multi-/many-objective optimization algorithm. To this end, we leverage federated learning for surrogate construction so that multiple clients collaboratively train a radial-basis-function-network as the global surrogate. Then a new federated acquisition function is proposed for the central server to approximate the objective values using the global surrogate and estimate the uncertainty level of the approximated objective values based on the local models. The performance of the proposed algorithm is verified on a series of multi/many-objective benchmark problems by comparing it with two state-of-the-art surrogate-assisted multi-objective evolutionary algorithms.
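
其服务器端流程可以示意如下(假设性NumPy草图,函数名均为虚构):客户端本地训练RBF网络,服务器按FedAvg思路加权聚合为全局代理,再以各局部模型预测的离散度充当不确定性估计,构成采集函数。

```python
import numpy as np

def fed_avg(client_params, client_sizes):
    # FedAvg 式聚合:按本地数据量加权平均各客户端的 RBF 网络参数(示意)
    total = float(sum(client_sizes))
    return sum(p * (n / total) for p, n in zip(client_params, client_sizes))

def federated_acquisition(x, global_surrogate, local_surrogates, kappa=2.0):
    # 假设性采集函数:全局代理给出目标近似值,
    # 局部模型预测的标准差充当不确定性估计(LCB 风格,针对最小化问题)
    mean = global_surrogate(x)
    std = np.std([m(x) for m in local_surrogates], axis=0)
    return mean - kappa * std
```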

【37】 Towards Consistent Predictive Confidence through Fitted Ensembles 标题:通过拟合集成实现一致的预测置信度

作者:Navid Kardan,Ankit Sharma,Kenneth O. Stanley 机构:Department of Computer Science, University of Central Florida, Orlando, USA 备注:IJCNN 2021 链接:https://arxiv.org/abs/2106.12070 摘要:深度神经网络是机器学习应用近来许多成功的幕后功臣。然而,这些模型在遇到分布外(OOD)样本或做出错误预测时会产生过度自信的决策。这种不一致的预测置信度限制了将独立训练的学习模型集成到更大的系统中。本文引入了可分离概念学习框架,以在存在OOD样本的情况下真实地度量分类器的性能。在此设置中,分类器的多个实例分别在类集合一个划分的不同部分上进行训练;随后,在单独的测试集上评估这些模型组合的性能。与现有的OOD检测技术不同,该框架既不需要辅助OOD数据集,也不把分类性能与检测性能分开。此外,我们提出了一个新的强基线,用于在深度模型中获得更一致的预测置信度,称为拟合集成(fitted ensembles),其中过度自信的预测由原始分类任务的变换版本来纠正。通过观察组件间相互矛盾的预测,拟合集成可以自然地检测OOD样本,而不需要辅助数据。在MNIST、SVHN、CIFAR-10/100和ImageNet上的实验表明,拟合集成在OOD样本上显著优于传统集成,并且可以扩展。 摘要:Deep neural networks are behind many of the recent successes in machine learning applications. However, these models can produce overconfident decisions while encountering out-of-distribution (OOD) examples or making a wrong prediction. This inconsistent predictive confidence limits the integration of independently-trained learning models into a larger system. This paper introduces separable concept learning framework to realistically measure the performance of classifiers in presence of OOD examples. In this setup, several instances of a classifier are trained on different parts of a partition of the set of classes. Later, the performance of the combination of these models is evaluated on a separate test set. Unlike current OOD detection techniques, this framework does not require auxiliary OOD datasets and does not separate classification from detection performance. Furthermore, we present a new strong baseline for more consistent predictive confidence in deep models, called fitted ensembles, where overconfident predictions are rectified by transformed versions of the original classification task. Fitted ensembles can naturally detect OOD examples without requiring auxiliary data by observing contradicting predictions among its components. Experiments on MNIST, SVHN, CIFAR-10/100, and ImageNet show fitted ensembles significantly outperform conventional ensembles on OOD examples and are able to scale.
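
"通过观察组件间相互矛盾的预测来检测OOD"这一点可以示意如下(假设性草图,阈值与判据仅为演示,并非论文的具体规则):

```python
import numpy as np

def ood_by_disagreement(member_probs, threshold=0.5):
    # member_probs: 每个集成成员的 (N, C) softmax 输出列表。
    # 成员间 argmax 一致率过低的输入被标记为分布外(OOD)样本。
    votes = np.stack([p.argmax(axis=1) for p in member_probs])  # (M, N)
    agreement = np.array([np.bincount(col).max() / len(col) for col in votes.T])
    return agreement < threshold
```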

【38】 On Positivity Bias in Negative Reviews 标题:论负面评论中的正面偏向

作者:Madhusudhan Aithal,Chenhao Tan 机构:University of Colorado Boulder, University of Chicago 备注:11 pages, 17 figures, ACL 2021 链接:https://arxiv.org/abs/2106.12056 摘要:先前的研究表明,在人类表达中,积极词语比消极词语出现得更频繁,这通常被归因于积极性偏差,即人们倾向于报告对现实的积极看法。但负面评论中使用的语言又如何?与先前的研究结果一致,我们基于多种数据集发现,英语负面评论中的正面词多于负面词。我们将这一观察与先前关于否定语用学的研究结果相统一,并表明在负面评论中,否定词通常与正面词相关联。此外,根据情感分类器的判断,负面评论中大多数含正面词的句子表达的是负面意见,表明存在某种形式的否定。 摘要:Prior work has revealed that positive words occur more frequently than negative words in human expressions, which is typically attributed to positivity bias, a tendency for people to report positive views of reality. But what about the language used in negative reviews? Consistent with prior work, we show that English negative reviews tend to contain more positive words than negative words, using a variety of datasets. We reconcile this observation with prior findings on the pragmatics of negation, and show that negations are commonly associated with positive words in negative reviews. Furthermore, in negative reviews, the majority of sentences with positive words express negative opinions based on sentiment classifiers, indicating some form of negation.

【39】 ABCD: A Graph Framework to Convert Complex Sentences to a Covering Set of Simple Sentences 标题:ABCD:一个将复句转换为简单句覆盖集的图框架

作者:Yanjun Gao,Ting-Hao Huang,Rebecca J. Passonneau 机构:Pennsylvania State University 备注:To appear in the proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL 2021) Main Conference 链接:https://arxiv.org/abs/2106.12027 摘要:原子分句是理解复杂句的基本语篇单位。识别复杂句中的原子句对摘要、论辩挖掘、语篇分析、语篇解析、问答等应用都很重要。以前的工作主要依赖基于句法分析的规则方法。我们提出了一个新任务:将每个复杂句分解为由源句中时态从句派生出的简单句,并将其新颖地形式化为图编辑任务。我们的神经模型学习对结合了词邻接和语法依赖的图的元素执行接受(Accept)、断开(Break)、复制(Copy)或删除(Drop)操作。完整的处理流程包括图构造、图编辑以及从输出图生成句子的模块。我们介绍了专为训练和评估复杂句分解而设计的新数据集DeSSE,以及MinWikiSplit的子集MinWiki。在MinWiki上,ABCD取得了与两个基于句法分析的基线相当的性能。在复杂句类型分布更均衡的DeSSE上,我们的模型在原子句数量上的准确率高于编码器-解码器基线。结果包含详细的误差分析。 摘要:Atomic clauses are fundamental text units for understanding complex sentences. Identifying the atomic sentences within complex sentences is important for applications such as summarization, argument mining, discourse analysis, discourse parsing, and question answering. Previous work mainly relies on rule-based methods dependent on parsing. We propose a new task to decompose each complex sentence into simple sentences derived from the tensed clauses in the source, and a novel problem formulation as a graph edit task. Our neural model learns to Accept, Break, Copy or Drop elements of a graph that combines word adjacency and grammatical dependencies. The full processing pipeline includes modules for graph construction, graph editing, and sentence generation from the output graph. We introduce DeSSE, a new dataset designed to train and evaluate complex sentence decomposition, and MinWiki, a subset of MinWikiSplit. ABCD achieves comparable performance as two parsing baselines on MinWiki. On DeSSE, which has a more even balance of complex sentence types, our model achieves higher accuracy on the number of atomic sentences than an encoder-decoder baseline. Results include a detailed error analysis.

【40】 The Neurally-Guided Shape Parser: A Monte Carlo Method for Hierarchical Labeling of Over-segmented 3D Shapes 标题:神经引导的形状解析器:超分割三维形状分层标注的蒙特卡罗方法

作者:R. Kenny Jones,Rana Hanocka,Daniel Ritchie 机构:Brown University, University of Chicago 链接:https://arxiv.org/abs/2106.12026 摘要:许多基于学习的三维形状语义分割方法以端到端训练的单遍方式将标签分配给形状原子(例如点云中的点或网格中的面)。这类方法性能出色,但需要大量标注训练数据。这一范式纠缠了两个可分离的子问题:(1)将形状分解为区域;(2)为这些区域分配语义标签。我们认为,解开这两个子问题可以减轻标注数据负担:(1)区域分解不需要语义标签,可以无监督地执行;(2)标注形状区域而非原子会带来更小的搜索空间,应该能用更少的标注训练数据学习。在本文中,我们通过提出神经引导形状解析器(NGSP)来研究第二个论断,NGSP学习如何为过分割三维形状的区域分配语义标签。我们通过最大后验(MAP)推理来解决这个问题,对以输入形状为条件的标注分配的后验概率进行建模。我们采用一种由神经提议网络引导的蒙特卡罗重要性采样方法;由于假设输入形状已被分解为离散区域,这种基于搜索的方法变得可行。我们在PartNet中人造三维形状的层次语义分割任务上评估了NGSP。我们发现,相比先学习标注形状原子、再对每个形状区域聚合预测的基线,NGSP带来了显著的性能提升,在低数据情形下尤其明显。最后,我们证明NGSP对区域粒度具有鲁棒性:即使区域发生严重破坏,它仍能保持很强的分割性能。 摘要:Many learning-based 3D shape semantic segmentation methods assign labels to shape atoms (e.g. points in a point cloud or faces in a mesh) with a single-pass approach trained in an end-to-end fashion. Such methods achieve impressive performance but require large amounts of labeled training data. This paradigm entangles two separable subproblems: (1) decomposing a shape into regions and (2) assigning semantic labels to these regions. We claim that disentangling these subproblems reduces the labeled data burden: (1) region decomposition requires no semantic labels and could be performed in an unsupervised fashion, and (2) labeling shape regions instead of atoms results in a smaller search space and should be learnable with less labeled training data. In this paper, we investigate this second claim by presenting the Neurally-Guided Shape Parser (NGSP), a method that learns how to assign semantic labels to regions of an over-segmented 3D shape. We solve this problem via MAP inference, modeling the posterior probability of a labeling assignment conditioned on an input shape. We employ a Monte Carlo importance sampling approach guided by a neural proposal network, a search-based approach made feasible by assuming the input shape is decomposed into discrete regions. We evaluate NGSP on the task of hierarchical semantic segmentation on manufactured 3D shapes from PartNet. We find that NGSP delivers significant performance improvements over baselines that learn to label shape atoms and then aggregate predictions for each shape region, especially in low-data regimes. Finally, we demonstrate that NGSP is robust to region granularity, as it maintains strong segmentation performance even as the regions undergo significant corruption.
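
其搜索过程可以粗略示意如下(假设性草图:proposal代表神经提议网络给出的各区域类别分布,log_posterior为未归一化后验打分;并非论文实现):

```python
import numpy as np

def ngsp_style_search(regions, proposal, log_posterior, n_samples=1000, rng=None):
    # 从(神经)提议分布中采样区域级标注,保留未归一化后验得分最高者,
    # 作为 MAP 标注的近似。按区域而非原子标注使得这种搜索变得可行。
    rng = rng or np.random.default_rng(0)
    best, best_score = None, -np.inf
    for _ in range(n_samples):
        probs = proposal(regions)                 # 每个区域一个类别分布
        labeling = [int(rng.choice(len(p), p=p)) for p in probs]
        score = log_posterior(regions, labeling)
        if score > best_score:
            best, best_score = labeling, score
    return best, best_score
```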

【41】 Q-Learning Lagrange Policies for Multi-Action Restless Bandits 标题:多动作无休止多臂老虎机的Q-学习拉格朗日策略

作者:Jackson A. Killian,Arpita Biswas,Sanket Shah,Milind Tambe 机构:Harvard University, Cambridge, MA, USA 备注:13 pages, 6 figures, to be published in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data 链接:https://arxiv.org/abs/2106.12024 摘要:多动作无休止多臂老虎机(RMAB)是一个强大的受限资源分配框架,用于管理$N$个独立过程。然而,以往的工作只研究了问题动力学已知的离线设置。针对这一限制性假设,我们结合拉格朗日松弛和Q-学习,设计了首批在线学习多动作RMAB良好策略的算法。我们的第一种方法MAIQL将二元动作RMAB中Whittle指标的Q-学习方法扩展到多动作设置。我们导出了一个广义更新规则和收敛性证明,并证明在标准假设下,当$t\rightarrow\infty$时,MAIQL收敛到渐近最优的多动作RMAB策略。然而,MAIQL依赖于在两个时间尺度上学习Q函数和指标,这导致收敛缓慢,并且需要合适的问题结构才能表现良好。因此,我们设计了第二种算法LPQL,它通过Q-学习的一个变体学习最小化拉格朗日界,从而为多动作RMAB学习性能良好且更通用的拉格朗日策略。为了保证快速收敛,我们采用了一种能在单个时间尺度上学习的近似策略,并给出了当$t\rightarrow\infty$时将近似精度与LPQL回报上界联系起来的保证。最后,我们表明,我们的方法在多种设置(包括一个源自真实世界药物依从性数据的设置)中始终优于基线。 摘要:Multi-action restless multi-armed bandits (RMABs) are a powerful framework for constrained resource allocation in which $N$ independent processes are managed. However, previous work only studies the offline setting where problem dynamics are known. We address this restrictive assumption, designing the first algorithms for learning good policies for Multi-action RMABs online using combinations of Lagrangian relaxation and Q-learning. Our first approach, MAIQL, extends a method for Q-learning the Whittle index in binary-action RMABs to the multi-action setting. We derive a generalized update rule and convergence proof and establish that, under standard assumptions, MAIQL converges to the asymptotically optimal multi-action RMAB policy as $t\rightarrow{}\infty$. However, MAIQL relies on learning Q-functions and indexes on two timescales which leads to slow convergence and requires problem structure to perform well. Thus, we design a second algorithm, LPQL, which learns the well-performing and more general Lagrange policy for multi-action RMABs by learning to minimize the Lagrange bound through a variant of Q-learning. To ensure fast convergence, we take an approximation strategy that enables learning on a single timescale, then give a guarantee relating the approximation's precision to an upper bound of LPQL's return as $t\rightarrow{}\infty$. Finally, we show that our approaches always outperform baselines across multiple settings, including one derived from real-world medication adherence data.
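
拉格朗日松弛与Q-学习结合的基本思想可用如下表格式草图示意(假设性示例:动作成本经乘子lambda折入奖励,从而把各臂解耦;并非MAIQL/LPQL的完整算法):

```python
import numpy as np

def lagrange_q_update(Q, s, a, r, cost, s_next, lam, alpha=0.1, gamma=0.95):
    # 拉格朗日乘子 lam 把预算约束折算进奖励 (r - lam * cost),
    # 使每个臂可以独立地做标准 Q-学习更新
    target = (r - lam * cost) + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

Q = np.zeros((5, 3))          # 5 个状态,3 个动作(成本各不相同)
Q = lagrange_q_update(Q, s=0, a=2, r=1.0, cost=2.0, s_next=1, lam=0.3)
```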

【42】 Exploring the Representational Power of Graph Autoencoder 标题:探索图自动编码器的表示能力

作者:Maroun Haddad,Mohamed Bouguessa 机构:Department of Computer Science, University of Quebec at Montreal, Montreal, Quebec, Canada 链接:https://arxiv.org/abs/2106.12005 摘要:尽管表示学习在许多图学习任务中取得了巨大成功,但对这些嵌入所捕获的结构背后的理解却很少。例如,我们想知道三角形计数、节点度和其他中心性度量等拓扑特征是否被具体编码在嵌入中。此外,我们追问这些结构在嵌入中的存在,对于下游任务(如聚类和分类)取得更好性能是否必要。为了解决这些问题,我们对三类无监督图嵌入模型和七种不同的图自动编码器变体进行了广泛的实证研究。结果表明,在模型保持二阶邻近性的前提下,采用SUM(求和)聚合规则的图自动编码器的第一层具体保留了五个拓扑特征:度、局部聚类得分、介数中心性、特征向量中心性和三角形计数。我们通过揭示上述模型嵌入中拓扑特征分布的层次结构,为这些特征的存在补充了进一步证据。我们还证明,具有这些性质的模型在某些下游任务上的性能优于其他模型,特别是当保留的特征与手头任务相关时。最后,我们通过一个与社会影响预测相关的测试案例来评估我们研究结果的适用性。 摘要:While representation learning has yielded a great success on many graph learning tasks, there is little understanding behind the structures that are being captured by these embeddings. For example, we wonder if the topological features, such as the Triangle Count, the Degree of the node and other centrality measures are concretely encoded in the embeddings. Furthermore, we ask if the presence of these structures in the embeddings is necessary for a better performance on the downstream tasks, such as clustering and classification. To address these questions, we conduct an extensive empirical study over three classes of unsupervised graph embedding models and seven different variants of Graph Autoencoders. Our results show that five topological features: the Degree, the Local Clustering Score, the Betweenness Centrality, the Eigenvector Centrality, and Triangle Count are concretely preserved in the first layer of the graph autoencoder that employs the SUM aggregation rule, under the condition that the model preserves the second-order proximity. We supplement further evidence for the presence of these features by revealing a hierarchy in the distribution of the topological features in the embeddings of the aforementioned model. We also show that a model with such properties can outperform other models on certain downstream tasks, especially when the preserved features are relevant to the task at hand. Finally, we evaluate the suitability of our findings through a test case study related to social influence prediction.
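
文中探查的五个拓扑特征都可以直接计算;下面是一个基于networkx的小示例(示意:随后可用线性探针从嵌入回归这些特征列,以检验其是否被编码):

```python
import networkx as nx
import numpy as np

G = nx.karate_club_graph()
nodes = list(G.nodes())
clust = nx.clustering(G)
btw = nx.betweenness_centrality(G)
eig = nx.eigenvector_centrality(G, max_iter=1000)
tri = nx.triangles(G)
topo = np.column_stack([
    [G.degree(n) for n in nodes],   # 度
    [clust[n] for n in nodes],      # 局部聚类得分
    [btw[n] for n in nodes],        # 介数中心性
    [eig[n] for n in nodes],        # 特征向量中心性
    [tri[n] for n in nodes],        # 三角形计数
])
# 探针思路:用线性回归从嵌入预测 topo 的每一列,
# R^2 越高,说明该拓扑特征在嵌入中被编码得越具体
```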

【43】 On the Diversity and Limits of Human Explanations 标题:论人类解释的多样性与局限性

作者:Chenhao Tan 机构:Department of Computer Science & Harris School of Public Policy, University of Chicago 备注:15 pages, 12 tables 链接:https://arxiv.org/abs/2106.11988 摘要:在自然语言处理领域,越来越多的工作致力于构建人类解释的数据集。然而,"解释"一词涵盖了广泛的概念,每种概念都有不同的性质和影响。我们的目标是概述不同类型的解释和人类的局限,并讨论在NLP中收集和使用解释的意涵。受心理学和认知科学已有工作的启发,我们将NLP中已有的人类解释分为三类:近端机制、证据和过程。这三种类型性质不同,并对由此产生的解释有不同影响。例如,在心理学中,"过程"不被视为解释,而是与大量关于从指令中学习的工作相联系。注释者在理解和回答开放式"为什么"问题时需要借助代理问题,这进一步印证了解释的多样性。最后,解释可能需要与预测不同的、往往更深入的理解,这让人怀疑人类能否在某些任务中提供有用的解释。 摘要:A growing effort in NLP aims to build datasets of human explanations. However, the term explanation encompasses a broad range of notions, each with different properties and ramifications. Our goal is to provide an overview of diverse types of explanations and human limitations, and discuss implications for collecting and using explanations in NLP. Inspired by prior work in psychology and cognitive sciences, we group existing human explanations in NLP into three categories: proximal mechanism, evidence, and procedure. These three types differ in nature and have implications for the resultant explanations. For instance, procedure is not considered explanations in psychology and connects with a rich body of work on learning from instructions. The diversity of explanations is further evidenced by proxy questions that are needed for annotators to interpret and answer open-ended why questions. Finally, explanations may require different, often deeper, understandings than predictions, which casts doubt on whether humans can provide useful explanations in some tasks.

【44】 Bounds on Causal Effects and Application to High Dimensional Data 标题:因果效应的界限及其在高维数据中的应用

作者:Ang Li,Judea Pearl 机构:University of California, Los Angeles, Computer Science Department 链接:https://arxiv.org/abs/2106.12121 摘要:本文讨论了当后门或前门准则中的调整变量只被部分观测时因果效应的估计问题。对于这类情形,我们通过求解两个非线性优化问题得到因果效应的界,并证明这些界是充分的。利用这种优化方法,我们提出了一个降维框架,允许以偏差换取估计能力,并通过仿真研究证明其性能。 摘要:This paper addresses the problem of estimating causal effects when adjustment variables in the back-door or front-door criterion are partially observed. For such scenarios, we derive bounds on the causal effects by solving two non-linear optimization problems, and demonstrate that the bounds are sufficient. Using this optimization method, we propose a framework for dimensionality reduction that allows one to trade bias for estimation power, and demonstrate its performance using simulation studies.
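
其求界思路可以用一个玩具例子示意(假设性草图:论文中的优化问题是特定因果设定下的非线性规划,这里仅演示"对同一目标分别最小化与最大化得到上下界"的一般做法):

```python
import numpy as np
from scipy.optimize import minimize

def effect_bounds(effect, constraints, p0):
    # 在与观测约束一致的全部分布 p 上,分别最小化与最大化
    # 目标泛函 effect(p),得到其下界与上界
    lo = minimize(effect, p0, bounds=[(0, 1)] * len(p0), constraints=constraints)
    hi = minimize(lambda p: -effect(p), p0, bounds=[(0, 1)] * len(p0),
                  constraints=constraints)
    return lo.fun, -hi.fun

# 例:effect(p) = sum_z p_z * q_z,q 已知,p 只知道归一化约束
q = np.array([0.2, 0.5, 0.9])
cons = [{"type": "eq", "fun": lambda p: p.sum() - 1.0}]
print(effect_bounds(lambda p: float(p @ q), cons, np.ones(3) / 3))  # 约为 (0.2, 0.9)
```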
