cs.AI (Artificial Intelligence): 108 papers in total
【1】 Functional Regularization for Reinforcement Learning via Learned Fourier Features
Link: https://arxiv.org/abs/2112.03257
Authors: Alexander C. Li, Deepak Pathak
Affiliations: Carnegie Mellon University
Notes: Accepted at NeurIPS 2021. Website at this https URL
Abstract: We propose a simple architecture for deep reinforcement learning by embedding inputs into a learned Fourier basis and show that it improves the sample efficiency of both state-based and image-based RL. We perform an infinite-width analysis of our architecture using the Neural Tangent Kernel and theoretically show that tuning the initial variance of the Fourier basis is equivalent to functional regularization of the learned deep network. That is, these learned Fourier features allow for adjusting the degree to which networks underfit or overfit different frequencies in the training data, and hence provide a controlled mechanism to improve the stability and performance of RL optimization. Empirically, this allows us to prioritize learning low-frequency functions and speed up learning by reducing networks' susceptibility to noise in the optimization process, such as during Bellman updates. Experiments on standard state-based and image-based RL benchmarks show clear benefits of our architecture over the baselines. Website at https://alexanderli.com/learned-fourier-features
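A minimal PyTorch sketch of the core idea: embed inputs into a trainable Fourier basis whose initial variance controls which frequencies the downstream network can fit. The module and parameter names are ours, not the authors'.

```python
import torch
import torch.nn as nn

class LearnedFourierFeatures(nn.Module):
    """Embed inputs into a learned Fourier basis: phi(x) = [sin(Bx), cos(Bx)].

    The standard deviation used to initialize B is the knob the paper tunes:
    smaller values bias the downstream network toward low-frequency functions.
    """
    def __init__(self, in_dim, n_features, init_std=1.0):
        super().__init__()
        # Trainable frequency matrix, initialized with tunable variance.
        self.B = nn.Parameter(torch.randn(n_features, in_dim) * init_std)

    def forward(self, x):
        proj = x @ self.B.t()                         # (batch, n_features)
        return torch.cat([proj.sin(), proj.cos()], dim=-1)

# Usage: prepend the embedding to a Q-network so RL inputs pass through it.
state_dim, n_feat = 8, 64
encoder = LearnedFourierFeatures(state_dim, n_feat, init_std=0.1)
q_head = nn.Sequential(nn.Linear(2 * n_feat, 256), nn.ReLU(), nn.Linear(256, 4))
q_values = q_head(encoder(torch.randn(32, state_dim)))  # (32, 4)
```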
【2】 Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention
Link: https://arxiv.org/abs/2112.03254
Authors: Yichong Xu, Chenguang Zhu, Shuohang Wang, Siqi Sun, Hao Cheng, Xiaodong Liu, Jianfeng Gao, Pengcheng He, Michael Zeng, Xuedong Huang
Affiliations: Microsoft Corporation
Notes: 11 pages, 1 figure, 7 tables
Abstract: Most of today's AI systems focus on using self-attention mechanisms and transformer architectures on large amounts of diverse data to achieve impressive performance gains. In this paper, we propose to augment the transformer architecture with an external attention mechanism to bring external knowledge and context to bear. By integrating external information into the prediction process, we hope to reduce the need for ever-larger models and increase the democratization of AI systems. We find that the proposed external attention mechanism can significantly improve the performance of existing AI systems, allowing practitioners to easily customize foundation AI models to many diverse downstream applications. In particular, we focus on the task of Commonsense Reasoning, demonstrating that the proposed external attention mechanism can augment existing transformer models and significantly improve the model's reasoning capabilities. The proposed system, Knowledge External Attention for Reasoning (KEAR), reaches human parity on the open CommonsenseQA research benchmark with an accuracy of 89.4% in comparison to the human accuracy of 88.9%.
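A compact way to picture external attention is to let the input tokens attend over the concatenation of themselves and embedded knowledge text. The PyTorch sketch below is our simplified reading under assumed shapes; the actual KEAR system first retrieves knowledge from external sources and runs it through the full transformer stack.

```python
import torch
import torch.nn as nn

# Input tokens attend over [input; retrieved knowledge], so predictions can
# draw on external context as well as self-attention (illustrative shapes).
embed_dim, n_heads = 128, 8
attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)

x = torch.randn(2, 32, embed_dim)           # input token embeddings
knowledge = torch.randn(2, 16, embed_dim)   # embedded retrieved knowledge
context = torch.cat([x, knowledge], dim=1)  # keys/values include knowledge

out, _ = attn(query=x, key=context, value=context)
print(out.shape)  # (2, 32, 128): inputs enriched with external context
```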
【3】 GAM Changer: Editing Generalized Additive Models with Interactive Visualization
Link: https://arxiv.org/abs/2112.03245
Authors: Zijie J. Wang, Alex Kale, Harsha Nori, Peter Stella, Mark Nunnally, Duen Horng Chau, Mihaela Vorvoreanu, Jennifer Wortman Vaughan, Rich Caruana
Affiliations: Georgia Tech, University of Washington, Microsoft Research, NYU Langone Health
Notes: 7 pages, 15 figures, accepted to the Research2Clinics workshop at NeurIPS 2021. For a demo video, see this https URL For a live demo, visit this https URL
Abstract: Recent strides in interpretable machine learning (ML) research reveal that models exploit undesirable patterns in the data to make predictions, which potentially causes harms in deployment. However, it is unclear how we can fix these models. We present our ongoing work, GAM Changer, an open-source interactive system to help data scientists and domain experts easily and responsibly edit their Generalized Additive Models (GAMs). With novel visualization techniques, our tool puts interpretability into action -- empowering human users to analyze, validate, and align model behaviors with their knowledge and values. Built using modern web technologies, our tool runs locally in users' computational notebooks or web browsers without requiring extra compute resources, lowering the barrier to creating more responsible ML models. GAM Changer is available at https://interpret.ml/gam-changer.
【4】 Unsupervised Domain Adaptation for Semantic Image Segmentation: a Comprehensive Survey
Link: https://arxiv.org/abs/2112.03241
Authors: Gabriela Csurka, Riccardo Volpi, Boris Chidlovskii
Affiliations: Naver Labs Europe, France
Notes: 33 pages
Abstract: Semantic segmentation plays a fundamental role in a broad variety of computer vision applications, providing key information for the global understanding of an image. Yet, state-of-the-art models rely on large amounts of annotated samples, which are more expensive to obtain than in tasks such as image classification. Since unlabelled data is significantly cheaper to obtain, it is not surprising that Unsupervised Domain Adaptation has reached broad success within the semantic segmentation community. This survey is an effort to summarize five years of this incredibly rapidly growing field, which embraces the importance of semantic segmentation itself and a critical need for adapting segmentation models to new environments. We present the most important semantic segmentation methods; we provide a comprehensive survey of domain adaptation techniques for semantic segmentation; we unveil newer trends such as multi-domain learning, domain generalization, test-time adaptation and source-free domain adaptation; and we conclude this survey by describing the datasets and benchmarks most widely used in semantic segmentation research. We hope that this survey will provide researchers across academia and industry with a comprehensive reference guide and will help them foster new research directions in the field.
【5】 Simulation Intelligence: Towards a New Generation of Scientific Methods
Link: https://arxiv.org/abs/2112.03235
Authors: Alexander Lavin, Hector Zenil, Brooks Paige, David Krakauer, Justin Gottschlich, Tim Mattson, Anima Anandkumar, Sanjay Choudry, Kamil Rocki, Atılım Güneş Baydin, Carina Prunkl, Olexandr Isayev, Erik Peterson, Peter L. McMahon, Jakob Macke, Kyle Cranmer, Jiaxin Zhang, Haruko Wainwright, Adi Hanuka, Manuela Veloso, Samuel Assefa, Stephan Zheng, Avi Pfeffer
Affiliations: Institute for Simulation Intelligence, Alan Turing Institute, Santa Fe Institute, Intel Labs, Nvidia, Neuralink, University of Oxford, Carnegie Mellon University, Cornell University, University of Tübingen, New York University
Abstract: The original "Seven Motifs" set forth a roadmap of essential methods for the field of scientific computing, where a motif is an algorithmic method that captures a pattern of computation and data movement. We present the "Nine Motifs of Simulation Intelligence", a roadmap for the development and integration of the essential algorithms necessary for a merger of scientific computing, scientific simulation, and artificial intelligence. We call this merger simulation intelligence (SI), for short. We argue the motifs of simulation intelligence are interconnected and interdependent, much like the components within the layers of an operating system. Using this metaphor, we explore the nature of each layer of the simulation intelligence operating system stack (SI-stack) and the motifs therein: (1) Multi-physics and multi-scale modeling; (2) Surrogate modeling and emulation; (3) Simulation-based inference; (4) Causal modeling and inference; (5) Agent-based modeling; (6) Probabilistic programming; (7) Differentiable programming; (8) Open-ended optimization; (9) Machine programming. We believe coordinated efforts between motifs offers immense opportunity to accelerate scientific discovery, from solving inverse problems in synthetic biology and climate science, to directing nuclear energy experiments and predicting emergent behavior in socioeconomic settings. We elaborate on each layer of the SI-stack, detailing the state-of-art methods, presenting examples to highlight challenges and opportunities, and advocating for specific ways to advance the motifs and the synergies from their combinations. Advancing and integrating these technologies can enable a robust and efficient hypothesis-simulation-analysis type of scientific method, which we introduce with several use-cases for human-machine teaming and automated science.
【6】 CALVIN: A Benchmark for Language-conditioned Policy Learning for Long-horizon Robot Manipulation Tasks
Link: https://arxiv.org/abs/2112.03227
Authors: Oier Mees, Lukas Hermann, Erick Rosete-Beas, Wolfram Burgard
Affiliations: University of Freiburg
Notes: this http URL
Abstract: General-purpose robots coexisting with humans in their environment must learn to relate human language to their perceptions and actions to be useful in a range of daily tasks. Moreover, they need to acquire a diverse repertoire of general-purpose skills that allow composing long-horizon tasks by following unconstrained language instructions. In this paper, we present CALVIN (Composing Actions from Language and Vision), an open-source simulated benchmark for learning long-horizon language-conditioned tasks. Our aim is to make it possible to develop agents that can solve many robotic manipulation tasks over a long horizon, from onboard sensors, specified only via human language. CALVIN tasks are more complex in terms of sequence length, action space, and language than existing vision-and-language task datasets, and the benchmark supports flexible specification of sensor suites. We evaluate the agents zero-shot on novel language instructions and on novel environments and objects. We show that a baseline model based on multi-context imitation learning performs poorly on CALVIN, suggesting that there is significant room for developing innovative agents that learn to relate human language to their world models with this benchmark.
【7】 Context-Aware Transfer Attacks for Object Detection
Link: https://arxiv.org/abs/2112.03223
Authors: Zikui Cai, Xinxin Xie, Shasha Li, Mingjun Yin, Chengyu Song, Srikanth V. Krishnamurthy, Amit K. Roy-Chowdhury, M. Salman Asif
Affiliations: Electrical and Computer Engineering, University of California Riverside; Computer Science and Engineering, University of California Riverside
Notes: accepted to AAAI 2022
Abstract: Blackbox transfer attacks for image classifiers have been extensively studied in recent years. In contrast, little progress has been made on transfer attacks for object detectors. Object detectors take a holistic view of the image and the detection of one object (or lack thereof) often depends on other objects in the scene. This makes such detectors inherently context-aware and adversarial attacks in this space are more challenging than those targeting image classifiers. In this paper, we present a new approach to generate context-aware attacks for object detectors. We show that by using co-occurrence of objects and their relative locations and sizes as context information, we can successfully generate targeted mis-categorization attacks that achieve higher transfer success rates on blackbox object detectors than the state-of-the-art. We test our approach on a variety of object detectors with images from the PASCAL VOC and MS COCO datasets and demonstrate up to 20 percentage points improvement in performance compared to other state-of-the-art methods.
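The context heuristic admits a compact sketch: score candidate target labels by how often they co-occur with the other objects detected in the scene, and attack toward a plausible label. Class names and counts below are illustrative; the perturbation step and the use of relative locations and sizes are omitted.

```python
import numpy as np

# `cooc[i, j]` counts how often classes i and j appear together in training
# data (all names and numbers here are made up for illustration).
classes = ["person", "bicycle", "car", "dog"]
cooc = np.array([[0, 50, 40, 30],
                 [50, 0, 10,  2],
                 [40, 10, 0,  5],
                 [30,  2,  5, 0]])

def plausible_target(victim_idx, scene_idxs):
    """Pick a mis-categorization target that fits the scene context."""
    scores = cooc[:, scene_idxs].sum(axis=1).astype(float)
    scores[victim_idx] = -np.inf            # the label must actually change
    return int(np.argmax(scores))

# Attack a "bicycle" detection in a scene that also contains a "person".
print(classes[plausible_target(1, [0])])    # -> "car" under these counts
```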
【8】 Multi-scale Feature Learning Dynamics: Insights for Double Descent
Link: https://arxiv.org/abs/2112.03215
Authors: Mohammad Pezeshki, Amartya Mitra, Yoshua Bengio, Guillaume Lajoie
Affiliations: Mila, Université de Montréal; University of California, Riverside
Abstract: A key challenge in building theoretical foundations for deep learning is the complex optimization dynamics of neural networks, resulting from the high-dimensional interactions between the large number of network parameters. Such non-trivial dynamics lead to intriguing behaviors such as the phenomenon of "double descent" of the generalization error. The more commonly studied aspect of this phenomenon corresponds to model-wise double descent where the test error exhibits a second descent with increasing model complexity, beyond the classical U-shaped error curve. In this work, we investigate the origins of the less studied epoch-wise double descent in which the test error undergoes two non-monotonous transitions, or descents, as the training time increases. By leveraging tools from statistical physics, we study a linear teacher-student setup exhibiting epoch-wise double descent similar to that in deep neural networks. In this setting, we derive closed-form analytical expressions for the evolution of generalization error over training. We find that double descent can be attributed to distinct features being learned at different scales: as fast-learning features overfit, slower-learning features start to fit, resulting in a second descent in test error. We validate our findings through numerical experiments where our theory accurately predicts empirical findings and remains consistent with observations in deep neural networks.
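A toy version of the linear teacher-student setup can be simulated directly; here a learning-rate split between two feature groups stands in for the paper's fast- and slow-learning features. This is an illustrative sketch only and is not guaranteed to reproduce the exact epoch-wise double-descent curves.

```python
import numpy as np

rng = np.random.default_rng(0)
d_fast, d_slow, n = 20, 20, 40
w_true = rng.normal(size=d_fast + d_slow)   # teacher weights

X = rng.normal(size=(n, d_fast + d_slow))
y = X @ w_true + 0.5 * rng.normal(size=n)   # noisy training labels
X_test = rng.normal(size=(1000, d_fast + d_slow))
y_test = X_test @ w_true

# Gradient descent where the two feature groups learn at different speeds,
# mimicking the multi-scale dynamics the paper analyzes.
lr = np.concatenate([np.full(d_fast, 1e-2), np.full(d_slow, 1e-4)])
w = np.zeros(d_fast + d_slow)
for t in range(20000):
    grad = X.T @ (X @ w - y) / n
    w -= lr * grad
    if t % 2000 == 0:                       # track test error over "epochs"
        print(t, np.mean((X_test @ w - y_test) ** 2))
```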
【9】 Physically Consistent Neural Networks for building thermal modeling: theory and analysis
Link: https://arxiv.org/abs/2112.03212
Authors: Loris Di Natale, Bratislav Svetozarevic, Philipp Heer, Colin N. Jones
Affiliations: Swiss Federal Institute of Technology Lausanne (EPFL)
Notes: Preprint submitted to Applied Energy. 12 pages in the main text, 5 in appendix, 11 figures
Abstract: Due to their high energy intensity, buildings play a major role in the current worldwide energy transition. Building models are ubiquitous since they are needed at each stage of the life of buildings, i.e. for design, retrofitting, and control operations. Classical white-box models, based on physical equations, are bound to follow the laws of physics but the specific design of their underlying structure might hinder their expressiveness and hence their accuracy. On the other hand, black-box models are better suited to capture nonlinear building dynamics and thus can often achieve better accuracy, but they require a lot of data and might not follow the laws of physics, a problem that is particularly common for neural network (NN) models. To counter this known generalization issue, physics-informed NNs have recently been introduced, where researchers introduce prior knowledge in the structure of NNs to ground them in known underlying physical laws and avoid classical NN generalization issues. In this work, we present a novel physics-informed NN architecture, dubbed Physically Consistent NN (PCNN), which only requires past operational data and no engineering overhead, including prior knowledge in a linear module running in parallel to a classical NN. We formally prove that such networks are physically consistent -- by design and even on unseen data -- with respect to different control inputs and temperatures outside and in neighboring zones. We demonstrate their performance on a case study, where the PCNN attains an accuracy up to 50% better than a classical physics-based resistance-capacitance model on 3-day prediction horizons. Furthermore, despite their constrained structure, PCNNs attain similar performance to classical NNs on the validation data, overfitting the training data less and retaining high expressiveness to tackle the generalization issue.
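A minimal sketch of the parallel decomposition, assuming a single zone and a drastically simplified energy balance: a black-box NN handles residual dynamics while a sign-constrained linear module guarantees that heating raises, and a colder outside temperature lowers, the predicted zone temperature. Names and the exact physics term are our assumptions, not the authors' model.

```python
import torch
import torch.nn as nn

class PCNN(nn.Module):
    """Toy physically consistent NN: NN branch + constrained linear physics."""
    def __init__(self, hist_dim, hidden=64):
        super().__init__()
        self.nn = nn.Sequential(nn.Linear(hist_dim, hidden), nn.ReLU(),
                                nn.Linear(hidden, 1))
        # Unconstrained parameters mapped through softplus to fix their signs.
        self.a = nn.Parameter(torch.zeros(1))  # heating gain (>= 0)
        self.b = nn.Parameter(torch.zeros(1))  # heat loss to outside (>= 0)

    def forward(self, history, t_zone, power, t_out):
        physics = (nn.functional.softplus(self.a) * power
                   - nn.functional.softplus(self.b) * (t_zone - t_out))
        return t_zone + self.nn(history).squeeze(-1) + physics

model = PCNN(hist_dim=10)
pred = model(torch.randn(4, 10), torch.full((4,), 21.0),
             torch.rand(4), torch.full((4,), 5.0))
```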
【10】 An unsupervised extractive summarization method based on multi-round computation
Link: https://arxiv.org/abs/2112.03203
Authors: Dehao Tao, Yingzhu Xiong, Jin He, Skevin, Yongfeng Huang
Affiliations: NGN Lab, Tsinghua University, Beijing, China; University of Wisconsin-Madison, Madison, United States; VMware Inc
Abstract: Text summarization methods have attracted much attention. In recent years, deep learning has been applied to text summarization and has proved quite effective. However, most current deep-learning-based text summarization methods require large-scale datasets, which are difficult to obtain in practical applications. In this paper, an unsupervised extractive text summarization method based on multi-round calculation is proposed. Based on a directed-graph algorithm, we change the traditional method of calculating the sentence ranking in a single pass to a multi-round calculation, and the summary sentences are dynamically optimized after each round to better match the characteristics of the text. Experiments are carried out on four datasets, respectively containing Chinese, English, long, and short texts. The results show that our method outperforms both baseline methods and other unsupervised methods and is robust across datasets.
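The multi-round idea can be sketched as repeated PageRank-style ranking on a sentence-similarity graph, attenuating the edges of already-selected sentences between rounds so the remainder is re-ranked. The damping and attenuation values below are illustrative assumptions.

```python
import numpy as np

def multi_round_summary(sim, k, damping=0.85, redundancy=0.7):
    """Pick k summary sentences in k rounds of graph ranking (sketch).

    `sim` is a sentence-similarity adjacency matrix; after each round the
    edges of selected sentences are attenuated before re-ranking."""
    sim = sim.copy().astype(float)
    selected = []
    for _ in range(k):
        # Power iteration for PageRank-style scores on the current graph.
        P = sim / np.maximum(sim.sum(axis=1, keepdims=True), 1e-12)
        r = np.ones(len(sim)) / len(sim)
        for _ in range(50):
            r = (1 - damping) / len(sim) + damping * P.T @ r
        r[selected] = -np.inf
        best = int(np.argmax(r))
        selected.append(best)
        sim[best, :] *= redundancy   # dynamic re-weighting between rounds
        sim[:, best] *= redundancy
    return selected

sim = np.random.default_rng(1).random((6, 6))
print(multi_round_summary(sim, k=2))
```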
【11】 Player of Games
Link: https://arxiv.org/abs/2112.03178
Authors: Martin Schmid, Matej Moravcik, Neil Burch, Rudolf Kadlec, Josh Davidson, Kevin Waugh, Nolan Bard, Finbarr Timbers, Marc Lanctot, Zach Holland, Elnaz Davoodi, Alden Christianson, Michael Bowling
Affiliations: DeepMind
Abstract: Games have a long history of serving as a benchmark for progress in artificial intelligence. Recently, approaches using search and learning have shown strong performance across a set of perfect information games, and approaches using game-theoretic reasoning and learning have shown strong performance for specific imperfect information poker variants. We introduce Player of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning. Player of Games is the first algorithm to achieve strong empirical performance in large perfect and imperfect information games -- an important step towards truly general algorithms for arbitrary environments. We prove that Player of Games is sound, converging to perfect play as available computation time and approximation capacity increases. Player of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker (Slumbot), and defeats the state-of-the-art agent in Scotland Yard, an imperfect information game that illustrates the value of guided search, learning, and game-theoretic reasoning.
【12】 Tele-EvalNet: A Low-cost, Teleconsultation System for Home based Rehabilitation of Stroke Survivors using Multiscale CNN-LSTM Architecture
Link: https://arxiv.org/abs/2112.03168
Authors: Aditya Kanade, Mansi Sharma, M. Manivannan
Affiliations: Department of Electrical Engineering, Indian Institute of Technology Madras; Department of Applied Mechanics, Indian Institute of Technology Madras
Abstract: Technology has an important role to play in the field of rehabilitation, improving patient outcomes and reducing healthcare costs. However, existing approaches lack clinical validation, robustness and ease of use. We propose Tele-EvalNet, a novel system consisting of two components: a live feedback model and an overall performance evaluation model. The live feedback model gives feedback on exercise correctness with easy-to-understand instructions highlighted using color markers. The overall performance evaluation model learns a mapping of joint data to the scores given to the performance by clinicians. The model does this by extracting clinically approved features from joint data. Further, these features are encoded to a lower-dimensional space with an autoencoder. A novel multi-scale CNN-LSTM network is proposed to learn a mapping of performance data to the scores by leveraging features extracted at multiple scales. The proposed system shows a high degree of improvement in score predictions and outperforms state-of-the-art rehabilitation models.
【13】 UniLog: Deploy One Model and Specialize it for All Log Analysis Tasks
Link: https://arxiv.org/abs/2112.03159
Authors: Yichen Zhu, Weibin Meng, Ying Liu, Shenglin Zhang, Tao Han, Shimin Tao, Dan Pei
Affiliations: University of Toronto; Huawei; Tsinghua University; Nankai University
Notes: Technical report
【14】 Requirements for Open Political Information: Transparency Beyond Open Data
Link: https://arxiv.org/abs/2112.03119
Authors: Andong Luis Li Zhao, Andrew Paley, Rachel Adler, Harper Pack, Sergio Servantez, Alexander Einarsson, Cameron Barrie, Marko Sterbentz, Kristian Hammond
Affiliations: Northwestern University, Northeastern Illinois University
Notes: Presented at AAAI FSS-21: Artificial Intelligence in Government and Public Sector, Washington, DC, USA
Abstract: A politically informed citizenry is imperative for a well-developed democracy. While the US government has pursued policies for open data, these efforts have been insufficient in achieving an open government because only people with technical and domain knowledge can access information in the data. In this work, we conduct user interviews to identify wants and needs among stakeholders. We further use this information to sketch out the foundational requirements for a functional political information technical system.
【15】 Active Learning Meets Optimized Item Selection
Link: https://arxiv.org/abs/2112.03105
Authors: Bernard Kleynhans, Xin Wang, Serdar Kadıoğlu
Affiliations: AI Center of Excellence, Fidelity Investments, Boston, USA
Notes: IJCAI 2021 Data Science Meets Optimization Workshop (DSO@IJCAI 2021)
Abstract: Designing recommendation systems with limited or no available training data remains a challenge. To that end, a new combinatorial optimization problem is formulated to generate optimized item selection for experimentation with the goal to shorten the time for collecting randomized training data. We first present an overview of the optimized item selection problem and a multi-level optimization framework to solve it. The approach integrates techniques from discrete optimization, unsupervised clustering, and latent text embeddings. We then discuss how to incorporate optimized item selection with active learning as part of randomized exploration in an ongoing fashion.
【16】 Extracting Domain-specific Concepts from Large-scale Linked Open Data
Link: https://arxiv.org/abs/2112.03102
Authors: Satoshi Kume, Kouji Kozaki
Affiliations: Osaka Electro-Communication University, Neyagawa, Osaka, Japan
Abstract: We propose a methodology for extracting concepts for a target domain from large-scale linked open data (LOD) to support the construction of domain ontologies providing field-specific knowledge and definitions. The proposed method defines search entities by linking the LOD vocabulary with technical terms related to the target domain. The search entities are then used as a starting point for obtaining upper-level concepts in the LOD, and the occurrences of common upper-level entities and the chain-of-path relationships are examined to determine the range of conceptual connections in the target domain. A technical dictionary index and natural language processing are used to evaluate whether the extracted concepts cover the domain. As an example of extracting a class hierarchy from LOD, we used Wikidata to construct a domain ontology for polymer materials and physical properties. The proposed method can be applied to general datasets with class hierarchies, and it allows ontology developers to create an initial model of the domain ontology for their own purposes.
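For the step that walks from a search entity to its upper-level concepts, a sketch against the public Wikidata SPARQL endpoint follows. The query follows subclass-of (P279) chains; the seed QID Q11173 is assumed here to denote "chemical compound", and a real pipeline would start from entities linked to domain technical terms.

```python
import requests

# Collect the ancestor classes of a seed entity by following Wikidata's
# subclass-of property (P279) transitively.
query = """
SELECT ?ancestor ?ancestorLabel WHERE {
  wd:Q11173 wdt:P279+ ?ancestor .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 20
"""
resp = requests.get("https://query.wikidata.org/sparql",
                    params={"query": query, "format": "json"},
                    headers={"User-Agent": "lod-concept-demo/0.1"})
for row in resp.json()["results"]["bindings"]:
    print(row["ancestorLabel"]["value"])   # upper-level concept labels
```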
【17】 Flexible Option Learning
Link: https://arxiv.org/abs/2112.03097
Authors: Martin Klissarov, Doina Precup
Affiliations: Mila, McGill University and DeepMind
Notes: NeurIPS 2021 Spotlight
Abstract: Temporal abstraction in reinforcement learning (RL) offers the promise of improving generalization and knowledge transfer in complex environments, by propagating information more efficiently over time. Although option learning was initially formulated in a way that allows updating many options simultaneously, using off-policy, intra-option learning (Sutton, Precup & Singh, 1999), many of the recent hierarchical reinforcement learning approaches only update a single option at a time: the option currently executing. We revisit and extend intra-option learning in the context of deep reinforcement learning, in order to enable updating all options consistent with current primitive action choices, without introducing any additional estimates. Our method can therefore be naturally adopted in most hierarchical RL frameworks. When we combine our approach with the option-critic algorithm for option discovery, we obtain significant improvements in performance and data-efficiency across a wide variety of domains.
【18】 Transfer learning to improve streamflow forecasts in data sparse regions
Link: https://arxiv.org/abs/2112.03088
Authors: Roland Oruche, Lisa Egede, Tracy Baker, Fearghal O'Donncha
Affiliations: Department of Electrical Engineering & Computer Science, University of Missouri-Columbia, USA; Human Computer Interaction Institute, Carnegie Mellon University, USA; The Nature Conservancy, New York, NY, USA; IBM Research Europe, IE
Notes: 9 pages, 5 figures, 1 table
Abstract: Effective water resource management requires information on water availability, both in terms of quality and quantity, spatially and temporally. In this paper, we study the methodology behind Transfer Learning (TL) through fine-tuning and parameter transferring for better generalization performance of streamflow prediction in data-sparse regions. We propose a standard recurrent neural network in the form of Long Short-Term Memory (LSTM) to fit on a sufficiently large source domain dataset and repurpose the learned weights to significantly smaller, yet similar, target domain datasets. We present a methodology to implement transfer learning approaches for spatiotemporal applications by separating the spatial and temporal components of the model and training the model to generalize based on categorical datasets representing spatial variability. The framework is developed on a rich benchmark dataset from the US and evaluated on a smaller dataset collected by The Nature Conservancy in Kenya. The LSTM model exhibits generalization performance through our TL technique. Results from the current experiment demonstrate effective predictive skill in forecasting streamflow responses when knowledge transfer and static descriptors are used to improve hydrologic model generalization in data-sparse regions.
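A sketch of the transfer recipe under our assumptions: pretrain an LSTM that consumes dynamic forcings concatenated with static catchment descriptors (the spatial component), then freeze the recurrent weights and fine-tune the head on the small target-domain dataset. File names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class StreamflowLSTM(nn.Module):
    """Source-domain model: dynamic forcings are concatenated with static
    catchment descriptors at every time step (our assumed architecture)."""
    def __init__(self, dyn_dim, static_dim, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(dyn_dim + static_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, dyn, static):
        static = static.unsqueeze(1).expand(-1, dyn.size(1), -1)
        out, _ = self.lstm(torch.cat([dyn, static], dim=-1))
        return self.head(out[:, -1])          # streamflow at the last step

model = StreamflowLSTM(dyn_dim=5, static_dim=3)
# model.load_state_dict(torch.load("camels_pretrained.pt"))  # hypothetical checkpoint
for p in model.lstm.parameters():             # freeze the temporal component
    p.requires_grad = False
optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-3)  # fine-tune head
```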
【19】 Active Learning for Event Extraction with Memory-based Loss Prediction Model
Link: https://arxiv.org/abs/2112.03073
Authors: Shirong Shen, Zhen Li, Guilin Qi
Affiliations: School of Computer Science and Engineering, Southeast University, China
Abstract: Event extraction (EE) plays an important role in many industrial application scenarios, and high-quality EE methods require a large amount of manually annotated data to train supervised learning models. However, the cost of obtaining annotated data is very high, especially for annotating domain events, which requires the participation of experts from the corresponding domains. We therefore introduce active learning (AL) technology to reduce the cost of event annotation. However, existing AL methods have two main problems that prevent them from working well for event extraction. First, existing pool-based selection strategies have limitations in terms of computational cost and sample validity. Second, existing evaluations of sample importance lack the use of local sample information. In this paper, we present a novel deep AL method for EE. We propose a batch-based selection strategy and a Memory-Based Loss Prediction model (MBLP) to select unlabeled samples efficiently. During the selection process, we use an internal-external sample loss ranking method to evaluate sample importance using local information. Finally, we propose a delayed training strategy to train the MBLP model. Extensive experiments are performed on three domain datasets, and our method outperforms other state-of-the-art methods.
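The batch selection step can be sketched with a small loss-prediction head: predict each unlabeled sample's loss and send the top-k for annotation. The memory mechanism and the internal-external loss ranking of MBLP are omitted; this is a simplified sketch.

```python
import torch
import torch.nn as nn

class LossPredictor(nn.Module):
    """Predict a scalar training loss from a sample's feature vector."""
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, feats):
        return self.net(feats).squeeze(-1)

def select_batch(loss_predictor, unlabeled_feats, k):
    """Choose the k unlabeled samples with the highest predicted loss."""
    with torch.no_grad():
        pred_loss = loss_predictor(unlabeled_feats)
    return torch.topk(pred_loss, k).indices   # indices to send for labeling

lp = LossPredictor(feat_dim=128)
print(select_batch(lp, torch.randn(1000, 128), k=16))
```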
【20】 Thinking Beyond Distributions in Testing Machine Learned Models
Link: https://arxiv.org/abs/2112.03057
Authors: Negar Rostamzadeh, Ben Hutchinson, Christina Greer, Vinodkumar Prabhakaran
Affiliations: Google Research
Abstract: Testing practices within the machine learning (ML) community have centered around assessing a learned model's predictive performance measured against a test dataset, often drawn from the same distribution as the training dataset. While recent work on robustness and fairness testing within the ML community has pointed to the importance of testing against distributional shifts, these efforts also focus on estimating the likelihood of the model making an error against a reference dataset/distribution. We argue that this view of testing actively discourages researchers and developers from looking into other sources of robustness failures, for instance corner cases which may have severe undesirable impacts. We draw parallels with decades of work within software engineering testing focused on assessing a software system against various stress conditions, including corner cases, as opposed to solely focusing on average-case behaviour. Finally, we put forth a set of recommendations to broaden the view of machine learning testing to a rigorous practice.
【21】 Keeping it Simple: Language Models can learn Complex Molecular Distributions
Link: https://arxiv.org/abs/2112.03041
Authors: Daniel Flam-Shepherd, Kevin Zhu, Alán Aspuru-Guzik
Affiliations: Department of Computer Science, University of Toronto; Vector Institute for Artificial Intelligence, Toronto; Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
Abstract: Deep generative models of molecules have grown immensely in popularity, trained on relevant datasets, these models are used to search through chemical space. The downstream utility of generative models for the inverse design of novel functional compounds depends on their ability to learn a training distribution of molecules. The most simple example is a language model that takes the form of a recurrent neural network and generates molecules using a string representation. More sophisticated are graph generative models, which sequentially construct molecular graphs and typically achieve state-of-the-art results. However, recent work has shown that language models are more capable than once thought, particularly in the low data regime. In this work, we investigate the capacity of simple language models to learn distributions of molecules. For this purpose, we introduce several challenging generative modeling tasks by compiling especially complex distributions of molecules. On each task, we evaluate the ability of language models as compared with two widely used graph generative models. The results demonstrate that language models are powerful generative models, capable of adeptly learning complex molecular distributions -- and yield better performance than the graph models. Language models can accurately generate: distributions of the highest scoring penalized LogP molecules in ZINC15, multi-modal molecular distributions as well as the largest molecules in PubChem.
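A minimal character-level language model over SMILES strings, in the spirit of the simple baseline the paper studies. Vocabulary and sizes are illustrative; real models are trained on ZINC/PubChem-scale corpora.

```python
import torch
import torch.nn as nn

vocab = ["<pad>", "<bos>", "<eos>", "C", "c", "O", "N", "(", ")", "=", "1", "2"]
stoi = {ch: i for i, ch in enumerate(vocab)}

class SmilesLM(nn.Module):
    """Character-level RNN language model over SMILES strings."""
    def __init__(self, vocab_size, emb=64, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):
        h, _ = self.lstm(self.emb(tokens))
        return self.out(h)       # next-token logits at every position

model = SmilesLM(len(vocab))
smiles = "CC(=O)O"               # acetic acid
tokens = torch.tensor([[stoi["<bos>"]] + [stoi[c] for c in smiles]])
logits = model(tokens)           # train with cross-entropy against the shifted input
```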
【22】 Unsupervised Law Article Mining based on Deep Pre-Trained Language Representation Models with Application to the Italian Civil Code
Link: https://arxiv.org/abs/2112.03033
Authors: Andrea Tagarelli, Andrea Simeri
Affiliations: University of Calabria
Abstract: Modeling law search and retrieval as prediction problems has recently emerged as a predominant approach in law intelligence. Focusing on the law article retrieval task, we present a deep learning framework named LamBERTa, which is designed for civil-law codes, and specifically trained on the Italian civil code. To our knowledge, this is the first study proposing an advanced approach to law article prediction for the Italian legal system based on a BERT (Bidirectional Encoder Representations from Transformers) learning framework, which has recently attracted increased attention among deep learning approaches, showing outstanding effectiveness in several natural language processing and learning tasks. We define LamBERTa models by fine-tuning an Italian pre-trained BERT on the Italian civil code or its portions, for law article retrieval as a classification task. One key aspect of our LamBERTa framework is that we conceived it to address an extreme classification scenario, which is characterized by a high number of classes, the few-shot learning problem, and the lack of test query benchmarks for Italian legal prediction tasks. To solve such issues, we define different methods for the unsupervised labeling of the law articles, which can in principle be applied to any law article code system. We provide insights into the explainability and interpretability of our LamBERTa models, and we present an extensive experimental analysis over query sets of different type, for single-label as well as multi-label evaluation tasks. Empirical evidence has shown the effectiveness of LamBERTa, and also its superiority against widely used deep-learning text classifiers and a few-shot learner conceived for an attribute-aware prediction task.
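A sketch of the setup as we read it: law article retrieval cast as sequence classification with one class per article, fine-tuned from an Italian pre-trained BERT. The checkpoint name and class count below are placeholders, not the paper's exact configuration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "dbmdz/bert-base-italian-uncased"   # assumed Italian BERT checkpoint
num_articles = 3000                              # hypothetical number of articles/classes

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=num_articles)

# Retrieval = classify a query over article classes; fine-tuning on
# (article text, article id) pairs is done with the usual cross-entropy loss.
query = "Chi trova una cosa mobile deve restituirla al proprietario."
inputs = tokenizer(query, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
top_articles = torch.topk(logits, k=5).indices   # candidate article ids
```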
【23】 On the algebraic structures of the space of interval-valued intuitionistic fuzzy numbers
Link: https://arxiv.org/abs/2112.03026
Authors: Xinxing Wu, Chaoyue Tan, Gul Deniz Cayli, Peide Liu
Affiliations: School of Sciences, Southwest Petroleum University, Chengdu, Sichuan, China; Zhuhai College of Jilin University, Zhuhai, Guangdong, China; Department of Mathematics, Karadeniz Technical University, Trabzon, Turkey
Abstract: This study is inspired by those of Huang et al. (Soft Comput. 25, 2513--2520, 2021) and Wang et al. (Inf. Sci. 179, 3026--3040, 2009), in which some ranking techniques for interval-valued intuitionistic fuzzy numbers (IVIFNs) were introduced. In this study, we prove that the space of all IVIFNs, under the relation used in the method for comparing any two IVIFNs based on a score function and three types of entropy functions, is a complete chain, and we obtain that this relation is an admissible order. Moreover, we demonstrate that IVIFNs form complete chains under the relation used in the comparison method for IVIFNs based on score, accuracy, membership uncertainty index, and hesitation uncertainty index functions.
【24】 Isomer: Transfer enhanced Dual-Channel Heterogeneous Dependency Attention Network for Aspect-based Sentiment Classification
Link: https://arxiv.org/abs/2112.03011
Authors: Yukun Cao, Yijia Tang, Ziyue Wei, ChengKun Jin, Zeyu Miao, Yixin Fang, Haizhou Du, Feifei Xu
Affiliations: Shanghai University of Electric Power
Notes: 6 pages, 2 figures
Abstract: Aspect-based sentiment classification aims to predict the sentiment polarity of a specific aspect in a sentence. However, most existing methods attempt to compile dependency relations into a homogeneous dependency graph, which suffers from sparsity and ambiguity, cannot cover the comprehensive contextualized features of short texts, and does not consider additional node types or semantic relation information. To solve those issues, we present a sentiment analysis model named Isomer, which performs dual-channel attention on heterogeneous dependency graphs incorporating external knowledge, to effectively integrate additional information. Specifically, a transfer-enhanced dual-channel heterogeneous dependency attention network is devised in Isomer to model short texts using heterogeneous dependency graphs. These heterogeneous dependency graphs not only consider different types of information but also incorporate external knowledge. Experimental studies show that our model outperforms recent models on benchmark datasets. Furthermore, the results suggest that our method captures the importance of various information features to focus on informative context words.
【25】 Weakly Supervised Prototype Topic Model with Discriminative Seed Words: Modifying the Category Prior by Self-exploring Supervised Signals
Link: https://arxiv.org/abs/2112.03009
Authors: Bing Wang, Yue Wang, Ximing Li, Jihong Ouyang
Affiliations: College of Computer Science and Technology, Jilin University, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University
Abstract: Dataless text classification, i.e., a new paradigm of weakly supervised learning, refers to the task of learning with unlabeled documents and a few predefined representative words of categories, known as seed words. The recent generative dataless methods construct document-specific category priors by using seed word occurrences only; however, such category priors often contain very limited and even noisy supervised signals. To remedy this problem, in this paper we propose a novel formulation of the category prior. First, for each document, we consider its label membership degree by not only counting seed word occurrences, but also using a novel prototype scheme, which captures pseudo-nearest neighboring categories. Second, for each label, we consider its frequency prior knowledge of the corpus, which is also a discriminative knowledge for classification. By incorporating the proposed category prior into the previous generative dataless method, we suggest a novel generative dataless method, namely Weakly Supervised Prototype Topic Model (WSPTM). The experimental results on real-world datasets demonstrate that WSPTM outperforms the existing baseline methods.
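The flavor of the proposed category prior can be sketched in a few lines: count seed-word hits per label and smooth them with corpus-level label-frequency knowledge, so documents with no seed hits still receive an informative prior. Seed sets, frequencies, and the smoothing rule are illustrative; the paper's prototype scheme over pseudo-nearest neighboring categories is omitted.

```python
import numpy as np

seed_words = {"sports": {"game", "team", "score"},
              "politics": {"election", "senate", "vote"}}
label_freq = np.array([0.6, 0.4])        # corpus-level label frequency prior

def category_prior(doc_tokens, alpha=1.0):
    """Document-specific category prior from seed hits + frequency knowledge."""
    counts = np.array([sum(t in s for t in doc_tokens)
                       for s in seed_words.values()], dtype=float)
    prior = counts + alpha * label_freq  # smooth seed counts with frequencies
    return prior / prior.sum()

print(category_prior("the team won the game after the vote".split()))
# -> roughly [0.65, 0.35]: two sports seed hits vs. one politics seed hit
```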
【26】 How News Evolves? Modeling News Text and Coverage using Graphs and Hawkes Process
Link: https://arxiv.org/abs/2112.03008
Authors: Honggen Zhang, June Zhang
Affiliations: Electrical and Computer Engineering, University of Hawaiʻi at Mānoa, Honolulu, USA
Abstract: Monitoring news content automatically is an important problem. News content, unlike traditional text, has a temporal component. However, few works have explored the combination of natural language processing and dynamic system models. One reason is that it is challenging to mathematically model the nuances of natural language. In this paper, we discuss how we built a novel dataset of news articles collected over time. Then, we present a method of converting news text collected over time into a sequence of directed multi-graphs, which represent semantic triples (Subject → Predicate → Object). We model the dynamics of specific topological changes from these graphs using discrete-time Hawkes processes. With our real-world data, we show that analyzing the structures of the graphs and the discrete-time Hawkes process model can yield insights on how news events were covered and how to predict how they may be covered in the future.
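A sketch of the two ingredients: semantic triples stored as a directed multigraph (one graph per time slice), and a discrete-time Hawkes intensity in which past occurrences of a motif excite future ones. Edge labels, kernel, and parameters are illustrative.

```python
import networkx as nx
import numpy as np

# Triples (Subject -> Predicate -> Object) as edges of a directed multigraph.
g = nx.MultiDiGraph()
g.add_edge("senate", "bill", predicate="passes")
g.add_edge("president", "bill", predicate="signs")

def hawkes_intensity(event_counts, mu=0.1, alpha=0.5, beta=0.8):
    """lambda_t = mu + alpha * sum_{s<t} beta^(t-s) * N_s (discrete time)."""
    T = len(event_counts)
    lam = np.zeros(T)
    for t in range(T):
        kernel = beta ** (t - np.arange(t))   # geometric decay of excitation
        lam[t] = mu + alpha * np.sum(kernel * event_counts[:t])
    return lam

# Daily counts of a specific topological change (e.g., a new edge motif).
daily_motif_counts = np.array([0, 1, 3, 2, 0, 0, 1])
print(hawkes_intensity(daily_motif_counts))
```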
【27】 How to Build Robust FAQ Chatbot with Controllable Question Generator?
Link: https://arxiv.org/abs/2112.03007
Authors: Yan Pan, Mingyang Ma, Bernhard Pflugfelder, Georg Groh
Affiliations: Technical University of Munich, Department of Informatics, Research Group Social Computing, Garching, Germany; BMW Group, Munich, Germany
Abstract: Many unanswerable adversarial questions fool the question-answer (QA) system with some plausible answers. Building a robust, frequently asked questions (FAQ) chatbot needs a large amount of diverse adversarial examples. Recent question generation methods are ineffective at generating many high-quality and diverse adversarial question-answer pairs from unstructured text. We propose the diversity controllable semantically valid adversarial attacker (DCSA), a high-quality, diverse, controllable method to generate standard and adversarial samples with a semantic graph. The fluently and semantically generated QA pairs fool our passage retrieval model successfully. We then study the robustness and generalization of QA models trained with the generated QA pairs across different domains. We find that the generated dataset improves the generalizability of the QA model to new target domains and the robustness of the QA model in detecting unanswerable adversarial questions.
【28】 CU-UD: text-mining drug and chemical-protein interactions with ensembles of BERT-based models
Link: https://arxiv.org/abs/2112.03004
Authors: Mehmet Efruz Karabulut, K. Vijay-Shanker, Yifan Peng
Affiliations: Computer & Information Sciences, University of Delaware, Newark, DE, USA; Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
Notes: Proceedings of the BioCreative VII Challenge Evaluation Workshop
Abstract: Identifying the relations between chemicals and proteins is an important text mining task. The BioCreative VII track 1 DrugProt task aims to promote the development and evaluation of systems that can automatically detect relations between chemical compounds/drugs and genes/proteins in PubMed abstracts. In this paper, we describe our submission, which is an ensemble system, including multiple BERT-based language models. We combine the outputs of individual models using majority voting and a multilayer perceptron. Our system obtained 0.7708 in precision and 0.7770 in recall, for an F1 score of 0.7739, demonstrating the effectiveness of using ensembles of BERT-based language models for automatically detecting relations between chemicals and proteins. Our code is available at https://github.com/bionlplab/drugprot_bcvii.
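The majority-voting half of the ensemble is easy to sketch: each fine-tuned model emits a relation label per candidate (chemical, protein) pair, and the most common label wins. The MLP combiner over model probabilities, which the paper also uses, is omitted here.

```python
import numpy as np

def majority_vote(model_predictions):
    """model_predictions: (n_models, n_pairs) integer label matrix."""
    n_labels = model_predictions.max() + 1
    return np.apply_along_axis(
        lambda col: np.bincount(col, minlength=n_labels).argmax(),
        axis=0, arr=model_predictions)

# Three models vote on three candidate (chemical, protein) pairs.
preds = np.array([[1, 0, 2],
                  [1, 2, 2],
                  [0, 2, 2]])
print(majority_vote(preds))   # -> [1 2 2]
```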
【29】 GraphPrompt: Biomedical Entity Normalization Using Graph-based Prompt Templates
Link: https://arxiv.org/abs/2112.03002
Authors: Jiayou Zhang, Zhirui Wang, Shizhuo Zhang, Megh Manoj Bhalerao, Yucong Liu, Dawei Zhu, Sheng Wang
Affiliations: Tsinghua University, Peking University, Nanyang Technological University, University of Washington, University of Chicago
Notes: 12 pages
Abstract: Biomedical entity normalization unifies the language across biomedical experiments and studies, and further enables us to obtain a holistic view of life sciences. Current approaches mainly study the normalization of more standardized entities such as diseases and drugs, while disregarding the more ambiguous but crucial entities such as pathways, functions and cell types, hindering their real-world applications. To achieve biomedical entity normalization on these under-explored entities, we first introduce an expert-curated dataset OBO-syn encompassing 70 different types of entities and 2 million curated entity-synonym pairs. To utilize the unique graph structure in this dataset, we propose GraphPrompt, a prompt-based learning approach that creates prompt templates according to the graphs. GraphPrompt obtained 41.0% and 29.9% improvement on zero-shot and few-shot settings respectively, indicating the effectiveness of these graph-based prompt templates. We envision that our method GraphPrompt and OBO-syn dataset can be broadly applied to graph-based NLP tasks, and serve as the basis for analyzing diverse and accumulating biomedical data.
【30】 Development of a robust cascaded architecture for intelligent robot grasping using limited labelled data
Link: https://arxiv.org/abs/2112.03001
Authors: Priya Shukla, Vandana Kushwaha, G. C. Nandi
Notes: 12 pages
Abstract: Grasping objects intelligently is a challenging task even for humans, and we spend a considerable amount of time during our childhood learning how to grasp objects correctly. In the case of robots, we cannot afford to spend that much time on making them learn how to grasp objects effectively. Therefore, in the present research we propose an efficient learning architecture based on a VQVAE so that robots can be taught with sufficient data corresponding to correct grasping. However, getting sufficient labelled data is extremely difficult in the robot grasping domain. To help solve this problem, a semi-supervised learning based model, which has much greater generalization capability even with a limited labelled dataset, has been investigated. Its performance shows a 6% improvement when compared with existing state-of-the-art models, including our earlier model. During experimentation, it has been observed that our proposed model, RGGCNN2, performs significantly better, both in grasping isolated objects and objects in a cluttered environment, compared to existing approaches that do not use unlabelled data for generating grasping rectangles. To the best of our knowledge, an intelligent robot grasping model (based on semi-supervised learning) trained through representation learning, exploiting the high-quality learning ability of the GGCNN2 architecture with a limited labelled dataset together with the learned latent embeddings, can be used as a de-facto training method; this has been established and validated in this paper through rigorous hardware experimentation using the Baxter (Anukul) research robot.
【31】 Sequential Randomized Smoothing for Adversarially Robust Speech Recognition
Link: https://arxiv.org/abs/2112.03000
Authors: Raphael Olivier, Bhiksha Raj
Affiliations: Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA
Notes: to be published in the proceedings of EMNLP 2021
Abstract: While Automatic Speech Recognition has been shown to be vulnerable to adversarial attacks, defenses against these attacks are still lagging. Existing, naive defenses can be partially broken with an adaptive attack. In classification tasks, the Randomized Smoothing paradigm has been shown to be effective at defending models. However, it is difficult to apply this paradigm to ASR tasks, due to their complexity and the sequential nature of their outputs. Our paper overcomes some of these challenges by leveraging speech-specific tools like enhancement and ROVER voting to design an ASR model that is robust to perturbations. We apply adaptive versions of state-of-the-art attacks, such as the Imperceptible ASR attack, to our model, and show that our strongest defense is robust to all attacks that use inaudible noise, and can only be broken with very high distortion.
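The smoothing-plus-voting idea in miniature: transcribe several noise-perturbed copies of the audio and return the most common hypothesis. A real system would vote at the word level with ROVER-style alignment and add enhancement; the plain string vote and the dummy transcriber below are simplifications.

```python
import numpy as np
from collections import Counter

def smoothed_transcribe(asr_fn, audio, sigma=0.01, n=8, rng=None):
    """Transcribe n Gaussian-perturbed copies and vote on the outputs."""
    rng = rng or np.random.default_rng()
    hypotheses = [asr_fn(audio + sigma * rng.normal(size=audio.shape))
                  for _ in range(n)]
    return Counter(hypotheses).most_common(1)[0][0]

# `asr_fn` stands in for any transcriber; here a dummy that thresholds energy.
dummy_asr = lambda a: "hello" if float(np.mean(a ** 2)) > 0.5 else "world"
print(smoothed_transcribe(dummy_asr, np.zeros(16000)))
```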
【32】 TaskDrop: A Competitive Baseline for Continual Learning of Sentiment Classification 标题:TaskDrop:一种好胜情感分类持续学习的基线 链接:https://arxiv.org/abs/2112.02995
作者:Jianping Mei,Yilun Zheng,Qianwei Zhou,Rui Yan 机构:Zhejiang University of Technology 摘要:在本文中,我们研究了连续学习环境下的多任务情绪分类问题,即,对一个模型进行顺序训练,以对特定类别的产品评论的情绪进行分类。在不同产品类别的评论中使用常见的情感词会导致巨大的跨任务相似性,这与其他领域的持续学习不同。这种知识共享的性质使得以遗忘减少为中心的方法对所考虑的问题的效果较差。与现有方法(其中任务特定掩码通过特定假定的训练目标学习得到)不同,我们提出了一种称为任务感知Dropout(TaskDrop)的方法,以随机方式生成掩码。标准dropout为每个历元的每个训练实例生成并应用随机掩码以实现有效的正则化,而TaskDrop则为任务级容量分配和重用应用随机掩码。我们在三个多任务评论数据集上进行了实验研究,并对各种基线和最先进的方法进行了比较。我们的实证结果表明,尽管方法简单,TaskDrop在所有三个数据集上都取得了有竞争力的表现,特别是在相对长期的学习之后。这表明所提出的随机容量分配机制适用于连续情绪分类。 摘要:In this paper, we study the multi-task sentiment classification problem in the continual learning setting, i.e., a model is sequentially trained to classify the sentiment of reviews of products in a particular category. The use of common sentiment words in reviews of different product categories leads to large cross-task similarity, which differentiates it from continual learning in other domains. This knowledge sharing nature renders forgetting reduction focused approaches less effective for the problem under consideration. Unlike existing approaches, where task-specific masks are learned with specifically presumed training objectives, we propose an approach called Task-aware Dropout (TaskDrop) to generate masks in a random way. While the standard dropout generates and applies random masks for each training instance per epoch for effective regularization, TaskDrop applies random masking for task-wise capacity allocation and reuse. We conducted experimental studies on three multi-task review datasets and made comparison to various baselines and state-of-the-art approaches. Our empirical results show that despite its simplicity, TaskDrop overall achieved competitive performance on all three datasets, especially after relatively long-term learning. This demonstrates that the proposed random capacity allocation mechanism works well for continual sentiment classification.
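为直观说明TaskDrop"按任务一次性采样掩码并持续复用"与标准dropout"每个样本重新采样"的差异,下面给出一个最小示意(笔者假设的实现,类名与超参数均非原论文代码):

```python
# TaskDrop 的最小示意:每个任务固定一份随机掩码并缓存复用。
import torch

class TaskDrop(torch.nn.Module):
    def __init__(self, hidden_dim, keep_prob=0.5):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.keep_prob = keep_prob
        self.masks = {}  # task_id -> 该任务固定的随机掩码

    def forward(self, x, task_id):
        # 标准 dropout 会为每个 batch 重新采样;TaskDrop 对每个任务只采样一次
        if task_id not in self.masks:
            self.masks[task_id] = (
                torch.rand(self.hidden_dim, device=x.device) < self.keep_prob
            ).float()
        return x * self.masks[task_id]  # 为简洁起见省略 1/keep_prob 缩放
```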
【33】 IBERT: Idiom Cloze-style reading comprehension with Attention 标题:IBERT:基于注意力的成语完形填空式阅读理解 链接:https://arxiv.org/abs/2112.02994
作者:Ruiyang Qin,Haozheng Luo,Zheheng Fan,Ziang Ren 机构:Georgia Institute of Technology, Atlanta, Georgia, USA 摘要:习语是一种特殊的固定短语,通常来源于故事。它们通常用于非正式对话和文学作品中。它们的含义通常是高度非组合的。习语完形填空是自然语言处理(NLP)研究中的一个挑战性问题。以前完成这项任务的方法是建立在序列到序列(Seq2Seq)模型上的,并且在现有数据集上取得了相当好的性能。然而,它们在理解习语表达的高度非组合意义方面还存在不足,也没有同时考虑局部和全局上下文。在本文中,我们提出了一个基于BERT的嵌入Seq2Seq模型,该模型对惯用表达式进行编码,并在全局和局部上下文中考虑它们。我们的模型使用XLNET作为编码器,使用RoBERTa为给定上下文选择最可能的惯用语。在EPIE静态语料库数据集上的实验表明,该模型的性能优于现有的最新研究成果。 摘要:Idioms are special fixed phrases usually derived from stories. They are commonly used in casual conversations and literary writings. Their meanings are usually highly non-compositional. The idiom cloze task is a challenging problem in Natural Language Processing (NLP) research. Previous approaches to this task are built on sequence-to-sequence (Seq2Seq) models and achieved reasonably good performance on existing datasets. However, they fall short in understanding the highly non-compositional meaning of idiomatic expressions. They also do not consider both the local and global context at the same time. In this paper, we proposed a BERT-based embedding Seq2Seq model that encodes idiomatic expressions and considers them in both global and local context. Our model uses XLNET as the encoder and RoBERTa for choosing the most probable idiom for a given context. Experiments on the EPIE Static Corpus dataset show that our model performs better than the existing state of the art.
【34】 Towards More Robust Natural Language Understanding 标题:走向更健壮的自然语言理解 链接:https://arxiv.org/abs/2112.02992
作者:Xinliang Frederick Zhang 备注:Undergraduate Research Thesis, The Ohio State University 摘要:自然语言理解(NLU)是自然语言处理(NLP)的一个分支,它使用智能计算机软件来理解编码人类知识的文本。近年来,通过深度学习技术,特别是预训练语言模型,在各种NLU任务中取得了显著进展。除了提出更先进的模型体系结构外,构建更可靠和可信的数据集在改进NLU系统方面也发挥着巨大的作用,没有这些,就不可能训练出一个像样的NLU模型。值得注意的是,人类理解自然语言的能力是灵活而强大的。相反,大多数现有NLU系统无法在域外数据上实现理想的性能,或者难以处理现实世界中具有挑战性的项目(例如,固有的模糊项目、敌对项目)。因此,为了让自然语言理解模型更有效地理解人类语言,人们期望优先研究鲁棒自然语言理解。在本论文中,我们认为NLU系统由两部分组成:NLU模型和NLU数据集。因此,我们认为,为了实现健壮的NLU,模型体系结构/训练和数据集同样重要。具体而言,我们将关注三个NLU任务,以说明不同NLU任务中的鲁棒性问题,以及我们的贡献(即新模型和新数据集),以帮助实现更健壮的自然语言理解。展望未来,强大的自然语言理解的最终目标是建立能够人性化的NLU模型。也就是说,期望健壮的NLU系统能够更可靠地将知识从训练语料库转移到看不见的文档中,并在遇到挑战性项目时存活下来,即使系统事先不知道用户的输入。 摘要:Natural Language Understanding (NLU) is a branch of Natural Language Processing (NLP) that uses intelligent computer software to understand texts that encode human knowledge. Recent years have witnessed notable progress across various NLU tasks with deep learning techniques, especially with pretrained language models. Besides proposing more advanced model architectures, constructing more reliable and trustworthy datasets also plays a huge role in improving NLU systems, without which it would be impossible to train a decent NLU model. It's worth noting that the human ability to understand natural language is flexible and robust. On the contrary, most existing NLU systems fail to achieve desirable performance on out-of-domain data or struggle on handling challenging items (e.g., inherently ambiguous items, adversarial items) in the real world. Therefore, in order to have NLU models understand human language more effectively, it is expected to prioritize the study on robust natural language understanding. In this thesis, we deem that NLU systems consist of two components: NLU models and NLU datasets. As such, we argue that, to achieve robust NLU, the model architecture/training and the dataset are equally important. Specifically, we will focus on three NLU tasks to illustrate the robustness problem in different NLU tasks and our contributions (i.e., novel models and new datasets) to help achieve more robust natural language understanding. Moving forward, the ultimate goal for robust natural language understanding is to build NLU models which can behave humanly. That is, it's expected that robust NLU systems are capable of transferring the knowledge from the training corpus to unseen documents more reliably and of surviving challenging items even if the system doesn't know users' inputs a priori.
【35】 Cross-Modality Attentive Feature Fusion for Object Detection in Multispectral Remote Sensing Imagery 标题:多光谱遥感图像目标检测的跨模态注意特征融合 链接:https://arxiv.org/abs/2112.02991
作者:Qingyun Fang,Zhaokui Wang 备注:23 pages,11 figures, under consideration at Pattern Recognition 摘要:跨模态融合多光谱遥感图像对的互补信息可以提高检测算法的感知能力,使其在更广泛的应用(如夜间检测)中更加稳健和可靠。与以前的方法相比,我们认为不同的特征应该被专门处理,模态特定的特征应该被保留和增强,而模态共享的特征应该从RGB和热红外模态中挑选出来。基于这一思想,提出了一种新颖、轻量级的多光谱特征融合方法,即跨模态注意特征融合(CMAFF)。给定RGB和IR图像的中间特征映射,我们的模块从共模和差模两个独立分支并行推断注意图,然后将注意图分别与输入特征图相乘,以进行自适应特征增强或选择。大量的实验表明,我们提出的方法可以在较低的计算成本下实现最先进的性能。 摘要:Cross-modality fusing complementary information of multispectral remote sensing image pairs can improve the perception ability of detection algorithms, making them more robust and reliable for a wider range of applications, such as nighttime detection. Compared with prior methods, we think different features should be processed specifically, the modality-specific features should be retained and enhanced, while the modality-shared features should be cherry-picked from the RGB and thermal IR modalities. Following this idea, a novel and lightweight multispectral feature fusion approach with joint common-modality and differential-modality attentions is proposed, named Cross-Modality Attentive Feature Fusion (CMAFF). Given the intermediate feature maps of RGB and IR images, our module infers attention maps in parallel from two separate modalities, common- and differential-modality; the attention maps are then multiplied with the input feature maps respectively for adaptive feature enhancement or selection. Extensive experiments demonstrate that our proposed approach can achieve the state-of-the-art performance at a low computation cost.
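共模/差模注意的思想可以用如下最小示意表达(假设实现:通道注意的具体形式、共模取均值、差模取差以及输出的组合方式都是笔者的假设,并非CMAFF原代码):

```python
# 共模/差模通道注意融合的最小示意(假设实现)。
import torch
import torch.nn as nn

class CommonDifferentialAttention(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        def gate():  # 简单的 squeeze-and-excitation 式通道门控
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(channels, channels // reduction), nn.ReLU(),
                nn.Linear(channels // reduction, channels), nn.Sigmoid())
        self.common_gate = gate()
        self.diff_gate = gate()

    def forward(self, f_rgb, f_ir):
        common = 0.5 * (f_rgb + f_ir)   # 模态共享成分
        diff = f_rgb - f_ir             # 模态特有成分
        w_c = self.common_gate(common).unsqueeze(-1).unsqueeze(-1)
        w_d = self.diff_gate(diff).unsqueeze(-1).unsqueeze(-1)
        # 共模注意做"选择",差模注意做"增强";符号约定为示意性假设
        return f_rgb * w_c + diff * w_d, f_ir * w_c - diff * w_d
```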
【36】 On the complexity of Dark Chinese Chess 标题:论中国暗棋(揭棋)的复杂性 链接:https://arxiv.org/abs/2112.02989
作者:Cong Wang,Tongwei Lu 机构:School of Computer Science and, Engineering, Wuhan Institute of Technology, Wuhan, China 摘要:本文对中国象棋的一种变体"揭棋"(暗棋)进行了复杂性分析。揭棋结合了棋盘和纸牌游戏中一些最复杂的方面,如长期战略或规划、大状态空间、随机性和不完全信息,这使得它更接近于现实世界的决策问题,并对游戏AI提出了巨大挑战。在这里,我们设计了一个自博弈程序来计算游戏树的复杂度和游戏的平均信息集大小,并提出了一种计算信息集数量的算法。 摘要:This paper provides a complexity analysis for the game of dark Chinese chess (a.k.a. "JieQi"), a variation of Chinese chess. Dark Chinese chess combines some of the most complicated aspects of board and card games, such as long-term strategy or planning, large state space, stochasticity, and imperfect information, which make it closer to the real world decision-making problem and pose great challenges to game AI. Here we design a self-play program to calculate the game tree complexity and average information set size of the game, and propose an algorithm to calculate the number of information sets.
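游戏树复杂度通常以"平均分支因子的平均对局长度次幂"来估计,下面给出用自博弈统计该量的最小示意(env为假设的环境接口,reset/legal_moves/step的签名均为笔者假设):

```python
# 自博弈估计博弈树复杂度 log10(b^L) = L * log10(b) 的最小示意。
import math
import random

def estimate_game_tree_complexity(env, n_games=1000):
    """env.reset() 开局;env.legal_moves() 返回合法着法列表;
    env.step(m) 执行着法并返回是否终局(以上接口均为假设)。"""
    total_branching, total_plies = 0, 0
    for _ in range(n_games):
        env.reset()
        done = False
        while not done:
            moves = env.legal_moves()
            total_branching += len(moves)   # 累计分支因子
            total_plies += 1
            done = env.step(random.choice(moves))  # 随机走子自博弈
    avg_b = total_branching / total_plies   # 平均分支因子 b
    avg_len = total_plies / n_games         # 平均对局长度 L
    return avg_len * math.log10(avg_b)      # 返回 log10(复杂度),避免数值溢出

# 例:若 b≈30、L≈100,则 log10 复杂度约为 100*log10(30) ≈ 148。
```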
【37】 Lecture Notes on Partially Known MDPs 标题:关于部分已知的MDP的课堂讲稿 链接:https://arxiv.org/abs/2112.02976
作者:Guillermo A. Perez 机构:Universiteit Antwerpen 摘要:在这些说明中,我们将解决为我们并不完全了解的马尔可夫决策过程(MDP)寻找最优策略的问题。我们的目的是从离线环境慢慢过渡到在线(学习)环境。也就是说,我们正在走向强化学习。 摘要:In these notes we will tackle the problem of finding optimal policies for Markov decision processes (MDPs) which are not fully known to us. Our intention is to slowly transition from an offline setting to an online (learning) setting. Namely, we are moving towards reinforcement learning.
【38】 DANets: Deep Abstract Networks for Tabular Data Classification and Regression 标题:DANet:用于表格数据分类和回归的深层抽象网络 链接:https://arxiv.org/abs/2112.02962
作者:Jintai Chen,Kuanlun Liao,Yao Wan,Danny Z. Chen,Jian Wu 机构: College of Computer Science and Technology, Zhejiang University, Hangzhou, China, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China 备注:@inproceedings{danets, title={DANets: Deep Abstract Networks for Tabular Data Classification and Regression}, author={Chen, Jintai and Liao, Kuanlun and Wan, Yao and Chen, Danny Z and Wu, Jian}, booktitle={AAAI}, year={2022} } 摘要:表格数据在现实世界的应用中无处不在。尽管许多常用的神经组件(例如卷积)和可扩展神经网络(例如ResNet)已由机器学习社区开发,但其中很少有对表格数据有效的,也很少有设计适合表格数据结构。在本文中,我们提出了一种新颖灵活的表格数据神经组件,称为抽象层(AbstLay),它学习显式地对相关输入特征进行分组,并生成更高层次的语义抽象特征。此外,我们还设计了一种结构重参数化方法来压缩AbstLay,从而在推理阶段显著降低计算复杂度。使用AbstLay构建了一个特殊的基本块,并通过叠加这些块构造了一系列用于表格数据分类和回归的深度抽象网络(DANets)。在DANets中,引入了一个特殊的快捷路径,以从原始表格特征中获取信息,从而帮助不同级别的特征交互。在七个实际表格数据集上的综合实验表明,我们的AbstLay和DANets对于表格数据分类和回归是有效的,并且计算复杂度优于竞争方法。此外,我们还评估了DANet随深度增加的性能增益,验证了我们方法的可扩展性。我们的代码可在https://github.com/WhatAShot/DANet. 摘要:Tabular data are ubiquitous in real world applications. Although many commonly-used neural components (e.g., convolution) and extensible neural networks (e.g., ResNet) have been developed by the machine learning community, few of them were effective for tabular data and few designs were adequately tailored for tabular data structures. In this paper, we propose a novel and flexible neural component for tabular data, called Abstract Layer (AbstLay), which learns to explicitly group correlative input features and generate higher-level features for semantics abstraction. Also, we design a structure re-parameterization method to compress AbstLay, thus reducing the computational complexity by a clear margin in the inference phase. A special basic block is built using AbstLays, and we construct a family of Deep Abstract Networks (DANets) for tabular data classification and regression by stacking such blocks. In DANets, a special shortcut path is introduced to fetch information from raw tabular features, assisting feature interactions across different levels. Comprehensive experiments on seven real-world tabular datasets show that our AbstLay and DANets are effective for tabular data classification and regression, and the computational complexity is superior to competitive methods. Besides, we evaluate the performance gains of DANet as it goes deep, verifying the extendibility of our method. Our code is available at https://github.com/WhatAShot/DANet.
【39】 Does constituency analysis enhance domain-specific pre-trained BERT models for relation extraction? 标题:成分分析是否增强了用于关系提取的特定领域预先训练的BERT模型? 链接:https://arxiv.org/abs/2112.02955
作者:Anfu Tang,Louise Deléger,Robert Bossy,Pierre Zweigenbaum,Claire Nédellec 机构:Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France, Université Paris-Saclay, CNRS, Laboratoire interdisciplinaire des sciences du numérique, Orsay, France 备注:None 摘要:近年来,人们对关系抽取进行了大量的研究。BioCreative VII的DrugProt track提供了一个手动注释的语料库,用于开发和评估关系提取系统,其中研究了化学品和基因之间的相互作用。我们描述了提交时使用的集成系统,该系统通过多数投票将微调的bioBERT、sciBERT和const-bioBERT模型的预测结合起来。我们特别测试了句法信息对基于BERT的关系提取的贡献。我们观察到,将基于成分的句法信息添加到BERT中提高了精确度,但降低了召回率,因为训练集中很少出现的关系不太可能被注入了句法信息的BERT模型预测出来。我们的代码可以在线获取[https://github.com/Maple177/drugprot-relation-extraction]. 摘要:Recently many studies have been conducted on the topic of relation extraction. The DrugProt track at BioCreative VII provides a manually-annotated corpus for the purpose of the development and evaluation of relation extraction systems, in which interactions between chemicals and genes are studied. We describe the ensemble system that we used for our submission, which combines predictions of fine-tuned bioBERT, sciBERT and const-bioBERT models by majority voting. We specifically tested the contribution of syntactic information to relation extraction with BERT. We observed that adding constituent-based syntactic information to BERT improved precision, but decreased recall, since relations rarely seen in the train set were less likely to be predicted by BERT models in which the syntactic information is infused. Our code is available online [https://github.com/Maple177/drugprot-relation-extraction].
【40】 Pairwise Learning for Neural Link Prediction 标题:神经链路预测的成对学习方法 链接:https://arxiv.org/abs/2112.02936
作者:Zhitao Wang,Yong Zhou,Litao Hong,Yuanhang Zou,Hanjing Su 机构:WeChat Pay, Tencent, WeChat Search, Tencent 摘要:在本文中,我们旨在提供一个有效的成对学习神经链接预测(PLNLP)框架。该框架将链路预测视为一个成对学习排序问题,由四个主要部分组成,即邻域编码器、链路预测器、负采样器和目标函数。该框架是灵活的,任何通用的图神经卷积或链路预测特定的神经结构都可以用作邻域编码器。对于链路预测器,我们设计了不同的评分函数,可以根据不同类型的图进行选择。在负采样器中,我们提供了几种特定于问题的采样策略。对于目标函数,我们建议使用一个有效的排序损失,它近似地最大化标准排序度量AUC。我们在开放图基准的4个链路属性预测数据集上评估了所提出的PLNLP框架,包括\texttt{ogbl-ddi}、\texttt{ogbl-collab}、\texttt{ogbl-ppa}和\texttt{ogbl-citation2}。PLNLP仅使用基本的神经结构,就在\texttt{ogbl-ddi}上达到第1名,在\texttt{ogbl-collab}和\texttt{ogbl-citation2}上达到第2名。该性能证明了PLNLP的有效性。 摘要:In this paper, we aim at providing an effective Pairwise Learning Neural Link Prediction (PLNLP) framework. The framework treats link prediction as a pairwise learning to rank problem and consists of four main components, i.e., neighborhood encoder, link predictor, negative sampler and objective function. The framework is flexible that any generic graph neural convolution or link prediction specific neural architecture could be employed as neighborhood encoder. For link predictor, we design different scoring functions, which could be selected based on different types of graphs. In negative sampler, we provide several sampling strategies, which are problem specific. As for objective function, we propose to use an effective ranking loss, which approximately maximizes the standard ranking metric AUC. We evaluate the proposed PLNLP framework on 4 link property prediction datasets of Open Graph Benchmark, including \texttt{ogbl-ddi}, \texttt{ogbl-collab}, \texttt{ogbl-ppa} and \texttt{ogbl-citation2}. PLNLP achieves Top 1 performance on \texttt{ogbl-ddi}, and Top 2 performance on \texttt{ogbl-collab} and \texttt{ogbl-citation2} only with basic neural architecture. The performance demonstrates the effectiveness of PLNLP.
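文中"近似最大化AUC的排序损失"可以用成对替代损失来理解:对每个(正边,负边)对,惩罚正边得分未超出负边得分一定间隔的情况。下面是一个最小示意(平方hinge形式与间隔gamma均为笔者假设):

```python
# 成对排序损失近似最大化 AUC 的最小示意(假设形式)。
import torch

def pairwise_auc_loss(pos_scores, neg_scores, gamma=0.1):
    """pos_scores: 正样本边得分 (P,);neg_scores: 负采样边得分 (N,)。
    AUC 等于随机正边得分高于随机负边得分的概率,这里用可微替代损失逼近。"""
    diff = pos_scores.unsqueeze(1) - neg_scores.unsqueeze(0)  # (P, N) 两两差
    return torch.clamp(gamma - diff, min=0).pow(2).mean()     # 间隔违反的平方惩罚
```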
【41】 Is Class-Incremental Enough for Continual Learning? 标题:类增量对持续学习而言足够吗? 链接:https://arxiv.org/abs/2112.02925
作者:Andrea Cossu,Gabriele Graffieti,Lorenzo Pellegrini,Davide Maltoni,Davide Bacciu,Antonio Carta,Vincenzo Lomonaco 机构:Pervasive AI Lab, Computer Science Department, University of Pisa, Pisa, Italy, Scuola Normale Superiore, Pisa, Italy, Biometric System & Smart City Lab, Computer Science Department, University of, Bologna, Bologna, Italy, Correspondence: 备注:Under review 摘要:模型持续学习的能力可以在不同的持续学习场景中进行经验评估。每个场景都定义了学习环境的约束和机会。在这里,我们挑战了持续学习文献中的当前趋势,即主要在类增量场景中进行实验,在这种场景中,一次经历中出现的类别永远不会被重温。我们认为,过度关注这一场景可能会限制未来关于持续学习的研究,因为类增量场景人为地加剧灾难性遗忘,而牺牲了其他重要目标,如前向迁移和计算效率。事实上,在许多现实环境中,重复先前遇到的概念是自然发生的,有助于缓和先前知识的中断。我们主张对替代性持续学习场景进行更深入的研究,在该场景中,重复通过设计整合到传入信息流中。从已经存在的建议开始,我们描述了此类带重复的类增量场景可以为持续学习模型的更全面评估提供的优势。 摘要:The ability of a model to learn continually can be empirically assessed in different continual learning scenarios. Each scenario defines the constraints and the opportunities of the learning environment. Here, we challenge the current trend in the continual learning literature to experiment mainly on class-incremental scenarios, where classes present in one experience are never revisited. We posit that an excessive focus on this setting may be limiting for future research on continual learning, since class-incremental scenarios artificially exacerbate catastrophic forgetting, at the expense of other important objectives like forward transfer and computational efficiency. In many real-world environments, in fact, repetition of previously encountered concepts occurs naturally and contributes to softening the disruption of previous knowledge. We advocate for a more in-depth study of alternative continual learning scenarios, in which repetition is integrated by design in the stream of incoming information. Starting from already existing proposals, we describe the advantages such class-incremental with repetition scenarios could offer for a more comprehensive assessment of continual learning models.
【42】 A Tale of Color Variants: Representation and Self-Supervised Learning in Fashion E-Commerce 标题:颜色变体的故事:服装电子商务中的表征和自我监督学习 链接:https://arxiv.org/abs/2112.02910
作者:Ujjal Kr Dutta,Sandeep Repakula,Maulik Parmar,Abhinav Ravi 机构:Data Sciences-Image Sciences, Myntra 备注:In Annual Conference on Innovative Applications of Artificial Intelligence (IAAI)/ AAAI Conference on Artificial Intelligence (AAAI) 2022. arXiv admin note: substantial text overlap with arXiv:2104.08581 摘要:在本文中,我们讨论了时尚电子商务中的一个关键问题(与客户体验以及收入有关):颜色变体识别,即识别在设计(或风格)上完全匹配但仅在颜色上不同的时尚产品。我们提出了一个通用框架,该框架的核心是利用深度视觉表征学习,为我们的时尚电子商务平台解决这个问题。我们的框架可以通过手动获取的三元组形式的监控信号进行训练。然而,在捕获所有困难的情况下,为时尚电子商务平台(如我们的平台)中通常存在的整个庞大数据集合获取手动注释是不可行的。但是,为了拯救我们,有趣的是,我们观察到,时尚电子商务中的这一关键问题也可以通过简单的基于颜色抖动的图像增强来解决,这一点最近在对比自监督学习(SSL)文献中广为流行,该文献旨在学习视觉表示,而不使用手动标签。这自然会在我们的脑海中引出一个问题:我们是否可以在用例中利用SSL,并且仍然可以获得与受监管框架相当的性能?答案是,是的!因为,颜色变化的时尚对象只不过是一种风格的表现形式,不同的颜色,一个经过训练对颜色保持不变的模型(有监督或没有监督)应该能够识别这一点!这是本文进一步从定性和定量两方面论证的内容,同时评估了两种最先进的SSL技术,并提出了一种新方法。 摘要:In this paper, we address a crucial problem in fashion e-commerce (with respect to customer experience, as well as revenue): color variants identification, i.e., identifying fashion products that match exactly in their design (or style), but only to differ in their color. We propose a generic framework, that leverages deep visual Representation Learning at its heart, to address this problem for our fashion e-commerce platform. Our framework could be trained with supervisory signals in the form of triplets, that are obtained manually. However, it is infeasible to obtain manual annotations for the entire huge collection of data usually present in fashion e-commerce platforms, such as ours, while capturing all the difficult corner cases. But, to our rescue, interestingly we observed that this crucial problem in fashion e-commerce could also be solved by simple color jitter based image augmentation, which recently became widely popular in the contrastive Self-Supervised Learning (SSL) literature and seeks to learn visual representations without using manual labels. This naturally led to a question in our mind: Could we leverage SSL in our use-case, and still obtain comparable performance to our supervised framework? The answer is: Yes! Because color variant fashion objects are nothing but manifestations of a style in different colors, and a model trained to be invariant to color (with or without supervision) should be able to recognize this! This is what the paper further demonstrates, both qualitatively, and quantitatively, while evaluating a couple of state-of-the-art SSL techniques, and also proposing a novel method.
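用颜色抖动构造正样本对、再以InfoNCE目标训练颜色不变表示,可用如下最小示意说明(非原论文流程,增广强度与温度等超参数均为笔者假设):

```python
# 颜色抖动增广 + InfoNCE 对比损失的最小示意(假设实现)。
import torch
import torch.nn.functional as F
from torchvision import transforms

# 对 PIL 图像施加强颜色抖动,两次独立增广可构成一个正样本对
jitter = transforms.Compose([
    transforms.ColorJitter(brightness=0.8, contrast=0.8, saturation=0.8, hue=0.2),
    transforms.ToTensor(),
])

def nt_xent(z1, z2, tau=0.5):
    """z1, z2: (B, d) 同一批图像两个颜色视图的嵌入;对角线为正对。"""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                       # (B, B) 相似度
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)           # InfoNCE 损失
```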
【43】 Interpretable Image Classification with Differentiable Prototypes Assignment 标题:基于不同原型赋值的可解释图像分类 链接:https://arxiv.org/abs/2112.02902
作者:Dawid Rymarczyk,Łukasz Struski,Michał Górszczak,Koryna Lewandowska,Jacek Tabor,Bartosz Zieliński 机构:Michał G´orszczak, Bartosz Zieli´nski, Jagiellonian University, Ardigen SA, Department of Cognitive Neuroscience and Neuroergonomics, Institute of Applied Psychology 备注:Code will be published after paper acceptance 摘要:我们介绍ProtoPool,一个可解释的图像分类模型,它有一个由类共享的原型池。与现有方法相比,该训练更为直接,因为它不需要修剪阶段。它是通过引入原型对特定类的完全可微赋值来实现的。此外,我们还引入了一种新的焦点相似性函数,将模型聚焦在罕见的前景特征上。我们表明,ProtoPool在CUB-200-2011和斯坦福汽车数据集上获得了最先进的准确性,大大减少了原型的数量。我们提供了该方法的理论分析和用户研究,以表明我们的原型比通过竞争方法获得的原型更具特色。 摘要:We introduce ProtoPool, an interpretable image classification model with a pool of prototypes shared by the classes. The training is more straightforward than in the existing methods because it does not require the pruning stage. It is obtained by introducing a fully differentiable assignment of prototypes to particular classes. Moreover, we introduce a novel focal similarity function to focus the model on the rare foreground features. We show that ProtoPool obtains state-of-the-art accuracy on the CUB-200-2011 and the Stanford Cars datasets, substantially reducing the number of prototypes. We provide a theoretical analysis of the method and a user study to show that our prototypes are more distinctive than those obtained with competitive methods.
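"原型对类别的完全可微分配"可以用Gumbel-Softmax松弛来直观理解:每个类别的每个"槽位"学习一个关于原型池的分布,训练中软选择、可端到端求导。下面是一个最小示意(假设实现,与ProtoPool原代码无关):

```python
# 原型到类别可微分配的最小示意(假设用 Gumbel-Softmax 松弛)。
import torch
import torch.nn.functional as F

class PrototypeAssignment(torch.nn.Module):
    def __init__(self, n_prototypes, n_slots_per_class, n_classes):
        super().__init__()
        # 每个类别的每个槽位对共享原型池学习一个分配分布
        self.logits = torch.nn.Parameter(
            torch.randn(n_classes, n_slots_per_class, n_prototypes))

    def forward(self, proto_sim, tau=0.5):
        """proto_sim: (B, n_prototypes) 图像与各原型的相似度。"""
        assign = F.gumbel_softmax(self.logits, tau=tau, dim=-1)   # 软分配
        # 聚合为 (B, n_classes):每类对其各槽位选中的原型相似度求和
        return torch.einsum('bp,csp->bc', proto_sim, assign)
```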
【44】 Seeing BDD100K in dark: Single-Stage Night-time Object Detection via Continual Fourier Contrastive Learning 标题:在黑暗中看到BDD100K:基于连续傅立叶对比学习的单级夜间目标检测 链接:https://arxiv.org/abs/2112.02891
作者:Ujjal Kr Dutta 摘要:尽管最先进的目标检测器已取得巨大进步,针对夜间目标检测的研究仍然很少,且数量有限的现有论文采用了不统一的评估协议。除了缺乏解决这一问题的方法外,还缺乏足够大的基准数据集来研究夜间目标检测。最近,大规模的BDD100K被提出,我们认为应该选择它作为基准,以启动这一领域的研究。至于方法,现有的方法(数量有限)主要是基于生成式图像转换的方法,或者基于图像增强/照明的方法,这两种方法都不自然,不符合人类在夜间看到物体的方式(通过聚焦物体轮廓)。在本文中,我们填补了这3个空白:1)缺乏统一的评估方案(由于其有效性和效率,使用单级检测器);2)选择用于基准测试夜间目标检测的数据集;3)一种解决当前备选方案局限性的新方法。我们的方法利用了基于对比学习的特征提取器,通过傅立叶变换从频域中借用信息,并以基于持续学习的方式进行训练。当用于目标检测时(在微调分类和回归层后),学习到的特征有助于实现新的最先进的经验性能,轻松超越大量竞争对手。 摘要:Despite tremendous improvements in state-of-the-art object detectors, addressing object detection in the night-time has been studied only sparsely, and the limited available papers use non-uniform evaluation protocols. In addition to the lack of methods to address this problem, there was also a lack of an adequately large benchmark dataset to study night-time object detection. Recently, the large scale BDD100K was introduced, which, in our opinion, should be chosen as the benchmark, to kickstart research in this area. Now, coming to the methods, existing approaches (limited in number) are mainly either generative image translation based, or image enhancement/ illumination based, neither of which is natural, conforming to how humans see objects in the night time (by focusing on object contours). In this paper, we bridge these 3 gaps: 1. Lack of a uniform evaluation protocol (using a single-stage detector, due to its efficacy, and efficiency), 2. Choice of dataset for benchmarking night-time object detection, and 3. A novel method to address the limitations of current alternatives. Our method leverages a Contrastive Learning based feature extractor, borrowing information from the frequency domain via Fourier transformation, and trained in a continual learning based fashion. The learned features when used for object detection (after fine-tuning the classification and regression layers), help achieve a new state-of-the-art empirical performance, comfortably outperforming an extensive number of competitors.
【45】 Invitation in Crowdsourcing Contests 标题:众包大赛邀请函 链接:https://arxiv.org/abs/2112.02884
作者:Qi Shi,Dong Hao 机构:University of Electronic Science and Technology of China 摘要:在众包竞赛中,持有某项任务的请求者将其发布到人群中。然后人群中的人相互竞争以赢得奖励。尽管在现实生活中,人群通常是网络化的,人们通过社会关系相互影响,但现有的众包竞赛理论并不旨在回答人际关系如何影响人们的激励和行为,从而影响众包绩效。在这项工作中,我们新颖地将人们的社会关系作为建模和设计代理人众包竞赛激励的关键因素。然后,我们建立了一种新的竞争机制,通过该机制,请求者可以促使代理邀请其邻居参与任务。该机制有一个简单的规则,很容易让代理发挥作用。根据我们的均衡分析,在贝叶斯纳什均衡中,代理人的行为表现出巨大的多样性,捕捉到除了内在能力之外,代理人之间的社会联系也在决策中起着核心作用。然后,我们设计了一个有效的算法来自动计算邀请众包竞赛的贝叶斯纳什均衡,并进一步将其应用于大型图。理论和实证结果都表明,邀请众包竞赛可以大幅扩大贡献者的数量,从而请求者可以在不需要大量广告支出的情况下获得显著更好的解决方案。 摘要:In a crowdsourcing contest, a requester holding a task posts it to a crowd. People in the crowd then compete with each other to win the rewards. Although in real life, a crowd is usually networked and people influence each other via social ties, existing crowdsourcing contest theories do not aim to answer how interpersonal relationships influence people's incentives and behaviors, and thereby affect the crowdsourcing performance. In this work, we novelly take people's social ties as a key factor in the modeling and designing of agents' incentives for crowdsourcing contests. We then establish a new contest mechanism by which the requester can impel agents to invite their neighbours to contribute to the task. The mechanism has a simple rule and is very easy for agents to play. According to our equilibrium analysis, in the Bayesian Nash equilibrium agents' behaviors show a vast diversity, capturing that besides the intrinsic ability, the social ties among agents also play a central role for decision-making. After that, we design an effective algorithm to automatically compute the Bayesian Nash equilibrium of the invitation crowdsourcing contest and further adapt it to large graphs. Both theoretical and empirical results show that, the invitation crowdsourcing contest can substantially enlarge the number of contributors, whereby the requester can obtain significantly better solutions without a large advertisement expenditure.
【46】 Distance and Hop-wise Structures Encoding Enhanced Graph Attention Networks 标题:增强型图注意网络的距离和跳数结构编码 链接:https://arxiv.org/abs/2112.02868
作者:Zhiguo Huang,Xiaowei Chen,Bojuan Wang 机构:Sci-Tech Academy of ZheJiang University;Research Center of Hundsun LTD., Hangzhou, China, School of Finance, NanKai University, Tianjin, China 备注:11 pages; 1 figure 摘要:大量工作已经证明,现有的邻域平均图神经网络不能有效地捕捉结构特征;许多工作表明,注入结构、距离、位置或空间特征可以显著提高GNN的性能。然而,将总体结构和距离注入GNN是一个直观但尚未被触及的想法。在这项工作中,我们阐明了这一方向。我们首先提取节点的逐跳结构信息并计算距离分布信息,将其与节点的内在特征结合,嵌入到同一向量空间中,然后相加。得到的嵌入向量被送入GATs(如GAT、AGDN)中,再经过Correct and Smooth后处理。实验表明DHSEGATs取得了有竞争力的结果。代码可在https://github.com/hzg0601/DHSEGATs获取。 摘要:Numerous works have proven that existing neighbor-averaging Graph Neural Networks cannot efficiently capture structural features, and many works show that injecting structure, distance, position or spatial features can significantly improve the performance of GNNs; however, injecting overall structure and distance into GNNs is an intuitive but so far untouched idea. In this work, we shed light on this direction. We first extract hop-wise structure information and compute distance distribution information, combine them with nodes' intrinsic features, embed them into the same vector space, and then add them up. The derived embedding vectors are then fed into GATs (like GAT, AGDN), followed by Correct and Smooth post-processing. Experiments show that DHSEGATs achieve competitive results. The code is available at https://github.com/hzg0601/DHSEGATs.
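"逐跳结构信息+距离分布"特征的一种直观构造方式如下(最小示意,假设节点编号为0..n-1;具体特征定义与归一化方式均为笔者假设,可能与原文不同):

```python
# 提取逐跳邻居计数与最短路距离直方图并入节点特征的最小示意。
import networkx as nx
import numpy as np

def structure_distance_features(g: nx.Graph, k_hops=3, max_dist=5):
    n = g.number_of_nodes()  # 假设节点标签为 0..n-1 的整数
    feats = np.zeros((n, k_hops + max_dist))
    for v in g.nodes():
        lengths = nx.single_source_shortest_path_length(
            g, v, cutoff=max(k_hops, max_dist))
        for u, d in lengths.items():
            if 1 <= d <= k_hops:
                feats[v, d - 1] += 1           # 第 d 跳邻居数(逐跳结构)
            if 1 <= d <= max_dist:
                feats[v, k_hops + d - 1] += 1  # 距离分布直方图
    # 归一化后可与节点固有特征拼接或相加,再送入 GAT 类模型
    return feats / max(n - 1, 1)
```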
【47】 Target Entropy Annealing for Discrete Soft Actor-Critic 标题:离散软角色-批评者的目标熵退火算法 链接:https://arxiv.org/abs/2112.02852
作者:Yaosheng Xu,Dailin Hu,Litian Liang,Stephen McAleer,Pieter Abbeel,Roy Fox 机构:Department of Computer Science, University of California, Irvine, Department of Electrical Engineering and Computer Science, University of California, Berkeley 备注:None 摘要:软演员评论家(SAC)被认为是连续动作空间环境中最先进的算法。它使用最大熵框架来提高效率和稳定性,并应用启发式温度拉格朗日项来调整温度$\alpha$,这决定了策略的"软"程度。与直觉相反的是,经验证据表明SAC在离散领域表现不佳。在本文中,我们研究了这一现象的可能解释,并提出了目标熵调度SAC(TES-SAC),一种应用于SAC的目标熵参数的退火方法。目标熵是温度拉格朗日项中的常数,表示离散SAC中的目标策略熵。我们在Atari 2600游戏上比较了我们的方法和不同的恒定目标熵SAC,并分析了我们的调度如何影响SAC。 摘要:Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in continuous action space settings. It uses the maximum entropy framework for efficiency and stability, and applies a heuristic temperature Lagrange term to tune the temperature $\alpha$, which determines how "soft" the policy should be. It is counter-intuitive that empirical evidence shows SAC does not perform well in discrete domains. In this paper we investigate the possible explanations for this phenomenon and propose Target Entropy Scheduled SAC (TES-SAC), an annealing method for the target entropy parameter applied on SAC. Target entropy is a constant in the temperature Lagrange term and represents the target policy entropy in discrete SAC. We compare our method on Atari 2600 games with different constant target entropy SAC, and analyze on how our scheduling affects SAC.
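目标熵退火的调度形式可以很简单,例如从接近均匀策略的最大熵log|A|逐步退火到较小值;下面是一个最小示意(线性调度形式与起止比例均为笔者假设,并非论文超参数):

```python
# 离散 SAC 目标熵线性退火调度的最小示意(假设形式)。
import math

def target_entropy(step, total_steps, n_actions,
                   start_frac=0.98, end_frac=0.3):
    """离散动作空间下策略熵的上界是 log|A|(均匀策略);
    目标熵从 start_frac*log|A| 线性退火到 end_frac*log|A|。"""
    max_ent = math.log(n_actions)
    frac = start_frac + (end_frac - start_frac) * min(step / total_steps, 1.0)
    return frac * max_ent

# 温度 alpha 的拉格朗日更新方向(示意):
# alpha_loss = -log_alpha * (target_entropy - policy_entropy).detach()
```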
【48】 Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Conquers All StarCraftII Tasks 标题:离线预训练多Agent决策转换器:一个大序列模型征服所有StarCraftII任务 链接:https://arxiv.org/abs/2112.02845
作者:Linghui Meng,Muning Wen,Yaodong Yang,Chenyang Le,Xiyun Li,Weinan Zhang,Ying Wen,Haifeng Zhang,Jun Wang,Bo Xu 机构:Institute of Automation, CAS, China,School of Artificial Intelligence, UCAS, China, Shanghai Jiao Tong University,King’s College London, University College London 备注:17 pages, 6 figures 摘要:离线强化学习利用静态数据集学习最佳策略,无需访问环境。由于多智能体在线交互的昂贵性和训练过程中对样本数量的要求,这种技术适合于多智能体学习任务。然而,在多智能体强化学习(MARL)中,离线预训练与在线微调的范例从未被研究过,离线MARL研究的数据集或基准也不可用。在本文中,我们试图回答以下问题:MARL中的离线预训练是否能够学习有助于提高多个下游任务性能的通用策略表示。我们首先介绍了第一个基于StarCraftII环境的具有不同质量级别的离线MARL数据集,然后提出了一种新的用于有效离线学习的多智能体决策转换器(MADT)体系结构。MADT利用Transformer的时间表示建模能力,并将其与离线和在线MARL任务集成。MADT的一个重要优点是,它学习可在不同任务场景下在不同类型的代理之间传输的通用策略。在星际争霸II离线数据集上进行评估时,MADT的性能优于最先进的离线RL基线。当应用于在线任务时,预先训练的MADT显著提高了样本效率,即使在Zero-Shot的情况下也有很强的性能。据我们所知,这是第一项研究和证明离线预训练模型在MARL中的样本效率和通用性增强方面的有效性的工作。 摘要:Offline reinforcement learning leverages static datasets to learn optimal policies with no necessity to access the environment. This technique is desirable for multi-agent learning tasks due to the expensiveness of agents' online interactions and the demanding number of samples during training. Yet, in multi-agent reinforcement learning (MARL), the paradigm of offline pre-training with online fine-tuning has never been studied, nor are datasets or benchmarks for offline MARL research available. In this paper, we try to answer the question of whether offline pre-training in MARL is able to learn generalisable policy representations that can help improve the performance of multiple downstream tasks. We start by introducing the first offline MARL dataset with diverse quality levels based on the StarCraftII environment, and then propose the novel architecture of multi-agent decision transformer (MADT) for effective offline learning. MADT leverages Transformer's modelling ability of temporal representations and integrates it with both offline and online MARL tasks. A crucial benefit of MADT is that it learns generalisable policies that can transfer between different types of agents under different task scenarios. When evaluated on the StarCraft II offline dataset, MADT demonstrates superior performance to state-of-the-art offline RL baselines. When applied to online tasks, the pre-trained MADT significantly improves sample efficiency, and enjoys strong performance even in zero-shot cases. To the best of our knowledge, this is the first work that studies and demonstrates the effectiveness of offline pre-trained models in terms of sample efficiency and generalisability enhancements in MARL.
【49】 MoCA: Incorporating Multi-stage Domain Pretraining and Cross-guided Multimodal Attention for Textbook Question Answering 标题:MOCA:结合多阶段领域预训练和交叉引导多模态注意的教科书答疑 链接:https://arxiv.org/abs/2112.02839
作者:Fangzhi Xu,Qika Lin,Jun Liu,Lingling Zhang,Tianzhe Zhao,Qi Chai,Yudai Pan 机构:Department of Computer Science and Technology, Xi’an Jiaotong University, China, National Engineering Lab for Big Data Analytics, Xi’an Jiaotong University, China 备注:9 pages, 6 figures 摘要:教科书问答(TQA)是一项复杂的多模态任务,通过大量的上下文描述和丰富的图表来推断答案。与可视化问答(VQA)相比,TQA包含大量不常见的术语和各种图表输入。它对特定领域跨语言模型的表示能力提出了新的挑战。它还将多模态融合推进到一个更复杂的层次。为了解决上述问题,我们提出了一个新的模型MoCA,该模型结合了多阶段领域预训练和多模式交叉注意的TQA任务。首先,我们引入一个多级域预训练模块,利用span掩码策略和有监督的预微调进行无监督的后预训练。特别是对于域后预训练,我们提出了一种启发式生成算法来使用术语语料库。其次,为了充分考虑语境和图式的丰富输入,提出了一种基于循序渐进策略的跨学科多模式注意更新文本、问题图和教学图特征的方法。此外,采用了双选通机制来改进模型集成。实验结果显示了我们的模型的优越性,在验证和测试分割方面,我们的模型分别比最先进的方法高2.21%和2.43%。 摘要:Textbook Question Answering (TQA) is a complex multimodal task to infer answers given large context descriptions and abundant diagrams. Compared with Visual Question Answering (VQA), TQA contains a large number of uncommon terminologies and various diagram inputs. It brings new challenges to the representation capability of language model for domain-specific spans. And it also pushes the multimodal fusion to a more complex level. To tackle the above issues, we propose a novel model named MoCA, which incorporates multi-stage domain pretraining and multimodal cross attention for the TQA task. Firstly, we introduce a multi-stage domain pretraining module to conduct unsupervised post-pretraining with the span mask strategy and supervised pre-finetune. Especially for domain post-pretraining, we propose a heuristic generation algorithm to employ the terminology corpus. Secondly, to fully consider the rich inputs of context and diagrams, we propose cross-guided multimodal attention to update the features of text, question diagram and instructional diagram based on a progressive strategy. Further, a dual gating mechanism is adopted to improve the model ensemble. The experimental results show the superiority of our model, which outperforms the state-of-the-art methods by 2.21% and 2.43% for validation and test split respectively.
【50】 SyntEO: Synthetic Dataset Generation for Earth Observation with Deep Learning -- Demonstrated for Offshore Wind Farm Detection 标题:SyntEO:基于深度学习的对地观测综合数据集生成--以海上风电场探测为例 链接:https://arxiv.org/abs/2112.02829
作者:Thorsten Hoeser,Claudia Kuenzer 机构:German Remote Sensing Data Center (DFD), German Aerospace Center (DLR), Department of Remote Sensing, Institute of Geography and Geology, University of Wuerzburg 备注:25 pages, 12 figures 摘要:随着过去几年深度学习的出现,地球观测研究出现了新的机遇。然而,它们也带来了新的挑战。深度学习模型的数据饥渴训练过程需要大量的、资源昂贵的、带注释的数据集,并部分取代了知识驱动的方法,因此模型行为和最终预测过程成为一个黑箱。拟议的SyntEO方法使地球观测研究人员能够自动生成大型深度学习准备数据集,从而释放其他占用的资源。SyntEO通过在数据生成过程中以高度结构化的方式包含专家知识来实现这一点。通过这种方式,建立了完全可控的实验环境,支持模型训练中的洞察。因此,SyntEO使学习过程可接近,模型行为可解释,这是可解释机器学习的重要基石。我们通过在全球最大的两个海上风能生产基地的Sentinel-1图像中预测海上风电场来演示SyntEO方法。生成的最大数据集有90000个训练示例。用于目标检测的基本卷积神经网络(仅针对该合成数据进行训练)通过在具有挑战性的环境中最小化错误检测,自信地检测海上风电场。此外,还生成了四个顺序数据集,演示了SyntEO方法如何精确定义数据集结构并影响训练过程。因此,SyntEO是一种混合方法,它在专家知识和数据驱动的图像分析之间创建了一个接口。 摘要:With the emergence of deep learning in the last years, new opportunities arose in Earth observation research. Nevertheless, they also brought with them new challenges. The data-hungry training processes of deep learning models demand large, resource expensive, annotated datasets and partly replaced knowledge-driven approaches, so that model behaviour and the final prediction process became a black box. The proposed SyntEO approach enables Earth observation researchers to automatically generate large deep learning ready datasets and thus free up otherwise occupied resources. SyntEO does this by including expert knowledge in the data generation process in a highly structured manner. In this way, fully controllable experiment environments are set up, which support insights in the model training. Thus, SyntEO makes the learning process approachable and model behaviour interpretable, an important cornerstone for explainable machine learning. We demonstrate the SyntEO approach by predicting offshore wind farms in Sentinel-1 images on two of the world's largest offshore wind energy production sites. The largest generated dataset has 90,000 training examples. A basic convolutional neural network for object detection, that is only trained on this synthetic data, confidently detects offshore wind farms by minimising false detections in challenging environments. In addition, four sequential datasets are generated, demonstrating how the SyntEO approach can precisely define the dataset structure and influence the training process. SyntEO is thus a hybrid approach that creates an interface between expert knowledge and data-driven image analysis.
【51】 Fast Test Input Generation for Finding Deviated Behaviors in Compressed Deep Neural Network 标题:用于发现压缩深度神经网络偏差行为的快速测试输入生成 链接:https://arxiv.org/abs/2112.02819
作者:Yongqiang Tian,Wuqi Zhang,Ming Wen,Shing-Chi Cheung,Chengnian Sun,Shiqing Ma,Yu Jiang 机构: University of Waterloo, Hong Kong University of Science and Technology, Huazhong University of Science and Technology, Rutgers University, Tsinghua University 摘要:模型压缩可以显著减少深度神经网络(DNN)模型的大小,以便压缩后的大型复杂模型可以部署在资源有限的移动和物联网设备上。然而,模型压缩通常会在压缩模型中引入偏差行为:原始模型和压缩模型对相同的输入输出不同的预测结果。因此,在部署之前警告开发人员并帮助他们全面评估此类行为可能产生的后果是至关重要的。为此,我们提出了TriggerFinder,这是一种新颖、有效和高效的测试方法,用于自动识别输入以触发压缩模型中的偏差行为。将输入i作为种子,TriggerFinder迭代地应用一系列变异操作来更改i,直到生成的输入触发偏离行为。然而,压缩模型通常隐藏其结构和梯度信息;如果没有指导等内部信息,就很难有效地触发偏差行为。为了应对这一挑战,我们提出了一种新的适应度函数,以确定更接近可能触发偏差预测的输入的突变输入。此外,TriggerFinder将该搜索问题建模为马尔可夫链过程,并利用Metropolis-Hasting算法指导变异算子的选择。我们在18个压缩模型和两个数据集上评估了TriggerFinder。实验结果表明,TriggerFinder能够为所有种子输入成功找到触发输入,而基线在某些情况下会失败。至于效率,TriggerFinder的速度是基线的5.2到115.8倍。此外,TriggerFinder找到一个触发输入所需的查询次数仅为基线的1/51.8到1/535.6。 摘要:Model compression can significantly reduce sizes of deep neural network (DNN) models so that large, sophisticated models after compression can be deployed on resource-limited mobile and IoT devices. However, model compression often introduces deviated behaviors into a compressed model: the original and compressed models output different prediction results for the same input. Hence, it is critical to warn developers and help them comprehensively evaluate possible consequences of such behaviors before deployment. To this end, we propose TriggerFinder, a novel, effective and efficient testing approach to automatically identifying inputs to trigger deviated behaviors in compressed models. Given an input i as a seed, TriggerFinder iteratively applies a series of mutation operations to change i until the resulting input triggers a deviated behavior. However, compressed models usually hide their architecture and gradient information; without such internal information as guidance, it becomes difficult to effectively and efficiently trigger deviated behaviors. To tackle this challenge, we propose a novel fitness function to determine the mutated input that is closer to the inputs that can trigger the deviated predictions. Furthermore, TriggerFinder models this search problem as a Markov Chain process and leverages the Metropolis-Hasting algorithm to guide the selection of mutation operators. We evaluated TriggerFinder on 18 compressed models with two datasets. The experiment results demonstrate that TriggerFinder can successfully find triggering inputs for all seed inputs while the baseline fails in certain cases. As for efficiency, TriggerFinder is 5.2x-115.8x as fast as the baselines. Furthermore, the number of queries required by TriggerFinder to find one triggering input is only 1/51.8-1/535.6 of that of the baselines.
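以适应度函数为引导、用Metropolis-Hastings准则决定是否接受变异输入的黑盒搜索骨架大致如下(最小示意;fitness、deviates与变异算子均为假设接口,接受准则的温度形式也是笔者假设):

```python
# Metropolis-Hastings 引导的变异搜索最小示意(假设实现)。
import math
import random

def mh_search(seed, mutate_ops, fitness, deviates, max_iters=1000, temp=0.1):
    """fitness(x) 越大表示 x 越接近能使压缩模型与原模型预测分歧的输入;
    deviates(x) 为真表示两模型对 x 的预测已经不一致(搜索成功)。"""
    x, fx = seed, fitness(seed)
    for _ in range(max_iters):
        op = random.choice(mutate_ops)   # 也可按各算子的历史接受率加权选择
        x_new = op(x)
        if deviates(x_new):              # 找到触发偏差行为的输入
            return x_new
        f_new = fitness(x_new)
        # MH 接受准则:更优必收,更差以一定概率接受,避免陷入局部最优
        if f_new >= fx or random.random() < math.exp((f_new - fx) / temp):
            x, fx = x_new, f_new
    return None  # 未找到触发输入
```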
【52】 ED2: An Environment Dynamics Decomposition Framework for World Model Construction 标题:ED2:一种用于构建世界模型的环境动力学分解框架 链接:https://arxiv.org/abs/2112.02817
作者:Cong Wang,Tianpei Yang,Jianye Hao,Yan Zheng,Hongyao Tang,Fazl Barez,Jinyi Liu,Jiajie Peng,Haiyin Piao,Zhixiao Sun 机构:College of Intelligence and Computing, Tianjin University, School of Computer Science, Northwestern Polytechnical University 摘要:基于模型的强化学习方法在许多任务中取得了显著的样本效率,但其性能往往受到模型误差的限制。为了减少模型误差,以前的工作使用一个设计良好的网络来拟合整个环境动力学,将环境动力学视为一个黑箱。然而,这些方法没有考虑环境的可分解性质:动力学可能包含多个可分别建模的子动力学,从而使我们能够更准确地构建世界模型。在本文中,我们提出了环境动力学分解(ED2),一种新的世界模型构建框架,以分解的方式对环境进行建模。ED2包含两个关键组件:子动力学发现(SD2)和动力学分解预测(D2P)。SD2发现环境中的子动力学,然后D2P根据子动力学构造分解的世界模型。ED2可以很容易地与现有的MBRL算法相结合,实证结果表明,ED2显著降低了模型误差,提高了最先进的MBRL算法在各种任务上的性能。 摘要:Model-based reinforcement learning methods achieve significant sample efficiency in many tasks, but their performance is often limited by the existence of the model error. To reduce the model error, previous works use a single well-designed network to fit the entire environment dynamics, which treats the environment dynamics as a black box. However, these methods fail to consider the decomposed nature of the environment: the dynamics may contain multiple sub-dynamics that can be modeled separately, allowing us to construct the world model more accurately. In this paper, we propose the Environment Dynamics Decomposition (ED2), a novel world model construction framework that models the environment in a decomposing manner. ED2 contains two key components: sub-dynamics discovery (SD2) and dynamics decomposition prediction (D2P). SD2 discovers the sub-dynamics in an environment and then D2P constructs the decomposed world model following the sub-dynamics. ED2 can be easily combined with existing MBRL algorithms and empirical results show that ED2 significantly reduces the model error and boosts the performance of the state-of-the-art MBRL algorithms on various tasks.
【53】 A Survey of Deep Learning for Low-Shot Object Detection 标题:深度学习在低镜头目标检测中的研究进展 链接:https://arxiv.org/abs/2112.02814
作者:Qihan Huang,Haofei Zhang,Jie Song,Mingli Song 摘要:目标检测是计算机视觉和图像处理中的一项基本任务。目前,基于深度学习的目标检测器已经取得了巨大的成功,拥有丰富的标记数据。但在现实生活中,并不能保证每个对象类别都有足够的标记样本用于训练。当训练数据有限时,这些大目标检测器容易过度拟合。因此,有必要在目标检测中引入少量镜头学习和Zero-Shot学习,将其统称为低镜头目标检测。低镜头目标检测(LSOD)的目的是从少量甚至零标记的数据中检测目标,这可以分为Few-Shot目标检测(FSOD)和Zero-Shot目标检测(ZSD)。本文对基于FSOD和ZSD的深度学习进行了全面调查。首先,本调查将FSOD和ZSD的方法分为不同的类别,并讨论了它们的优缺点。其次,本次调查回顾了FSOD和ZSD的数据集设置和评估指标,然后分析了不同方法在这些基准上的性能。最后,本调查讨论了FSOD和ZSD的未来挑战和前景。 摘要:Object detection is a fundamental task in computer vision and image processing. Current deep learning based object detectors have been highly successful with abundant labeled data. But in real life, it is not guaranteed that each object category has enough labeled samples for training. These large object detectors are prone to overfitting when the training data is limited. Therefore, it is necessary to introduce few-shot learning and zero-shot learning into object detection, which together can be termed low-shot object detection. Low-Shot Object Detection (LSOD) aims to detect objects from a few or even zero labeled data, which can be categorized into few-shot object detection (FSOD) and zero-shot object detection (ZSD), respectively. This paper conducts a comprehensive survey for deep learning based FSOD and ZSD. First, this survey classifies methods for FSOD and ZSD into different categories and discusses their pros and cons. Second, this survey reviews dataset settings and evaluation metrics for FSOD and ZSD, then analyzes the performance of different methods on these benchmarks. Finally, this survey discusses future challenges and promising directions for FSOD and ZSD.
【54】 MDPGT: Momentum-based Decentralized Policy Gradient Tracking 标题:MDPGT:基于动量的分散策略梯度跟踪 链接:https://arxiv.org/abs/2112.02813
作者:Zhanhong Jiang,Xian Yeow Lee,Sin Yong Tan,Kai Liang Tan,Aditya Balu,Young M. Lee,Chinmay Hegde,Soumik Sarkar 机构:Johnson Controls Inc., East Michigan St, Milwaukee, WI , Iowa State University, Ames, IA , New York University, MetroTech Center, Brooklyn, NY 摘要:我们提出了一种新的多智能体强化学习策略梯度方法,该方法利用了两种不同的方差缩减技术,并且不需要在迭代中使用大批量样本。具体地说,我们提出了一种基于动量的分散策略梯度跟踪(MDPGT),其中使用一种新的基于动量的方差缩减技术来近似具有重要抽样的局部策略梯度代理,并采用一个中间参数来跟踪两个连续的策略梯度代理。此外,MDPGT可证明实现了$\mathcal{O}(N^{-1}\epsilon^{-3})$的最佳可用样本复杂度,以便收敛到$N$个局部性能函数(可能非凹)的全局平均值的$\epsilon$平稳点。这比分散无模型强化学习中的最新样本复杂度要好,当使用单个轨迹初始化时,样本复杂度与现有分散策略梯度方法获得的样本复杂度匹配。我们进一步验证了高斯策略函数的理论主张。当所需的误差容限$\epsilon$足够小时,MDPGT会导致线性加速,这在分散随机优化中已有先例,但在强化学习中尚属首次。最后,我们提供了一个多智能体强化学习基准环境的实证结果,以支持我们的理论发现。 摘要:We propose a novel policy gradient method for multi-agent reinforcement learning, which leverages two different variance-reduction techniques and does not require large batches over iterations. Specifically, we propose a momentum-based decentralized policy gradient tracking (MDPGT) where a new momentum-based variance reduction technique is used to approximate the local policy gradient surrogate with importance sampling, and an intermediate parameter is adopted to track two consecutive policy gradient surrogates. Moreover, MDPGT provably achieves the best available sample complexity of $\mathcal{O}(N^{-1}\epsilon^{-3})$ for converging to an $\epsilon$-stationary point of the global average of $N$ local performance functions (possibly nonconcave). This outperforms the state-of-the-art sample complexity in decentralized model-free reinforcement learning, and when initialized with a single trajectory, the sample complexity matches those obtained by the existing decentralized policy gradient methods. We further validate the theoretical claim for the Gaussian policy function. When the required error tolerance $\epsilon$ is small enough, MDPGT leads to a linear speed up, which has been previously established in decentralized stochastic optimization, but not for reinforcement learning. Lastly, we provide empirical results on a multi-agent reinforcement learning benchmark environment to support our theoretical findings.
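基于动量的方差缩减(STORM式)与梯度跟踪相结合的单步更新形式大致如下(示意性实现,非论文伪代码;重要性采样重估出的上一步梯度g_prev_is假定已在外部算好,步长与动量系数均为假设):

```python
# 单个智能体 i 的 MDPGT 式更新的最小示意(假设形式)。
import numpy as np

def mdpgt_step(theta, y_prev, v_prev_i, g_curr, g_prev_is, W, i,
               lr=1e-2, beta=0.9):
    """theta: (N, d) 各智能体参数;y_prev: (N, d) 梯度跟踪变量;
    v_prev_i: (d,) 智能体 i 上一步的动量代理;g_curr / g_prev_is: (d,)
    当前梯度与用重要性采样在新参数下重估的上一步梯度;W: (N, N) 双随机混合矩阵。"""
    # 基于动量的方差缩减:动量项与梯度差修正的凸组合
    v_i = beta * g_curr + (1 - beta) * (v_prev_i + g_curr - g_prev_is)
    # 梯度跟踪:邻居混合后累加本地代理的增量,逼近全局平均梯度
    y_i = W[i] @ y_prev + v_i - v_prev_i
    # 参数更新:邻居共识混合 + 沿跟踪方向的梯度上升步(最大化回报)
    theta_i = W[i] @ theta + lr * y_i
    return theta_i, v_i, y_i
```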
【55】 An Effective GCN-based Hierarchical Multi-label classification for Protein Function Prediction 标题:一种有效的基于GCN的蛋白质功能预测分层多标签分类方法 链接:https://arxiv.org/abs/2112.02810
作者:Kyudam Choi,Yurim Lee,Cheongwon Kim,Minsung Yoon 机构:Department of Software Convergence, Sejong University, Seoul, South Korea, Department of Artificial Intelligence and Linguistic Engineering, Sejong University, Seoul, South Korea 摘要:我们提出了一种有效的方法来改进蛋白质功能预测(PFP),利用基因本体(GO)术语的层次特征。我们的方法包括用于编码蛋白质序列的语言模型和用于表示GO术语的图卷积网络(GCN)。为了反映GO-To-GCN的层次结构,我们采用包含整个层次信息的节点(GO-term)表示。与以前的模型相比,我们的算法通过扩展GO图在大规模图中显示了有效性。实验结果表明,我们的方法优于现有的PFP方法。 摘要:We propose an effective method to improve Protein Function Prediction (PFP) utilizing hierarchical features of Gene Ontology (GO) terms. Our method consists of a language model for encoding the protein sequence and a Graph Convolutional Network (GCN) for representing GO terms. To reflect the hierarchical structure of GO to GCN, we employ node(GO term)-wise representations containing the whole hierarchical information. Our algorithm shows effectiveness in a large-scale graph by expanding the GO graph compared to previous models. Experimental results show that our method outperformed state-of-the-art PFP approaches.
【56】 Texture Reformer: Towards Fast and Universal Interactive Texture Transfer 标题:纹理重建器:迈向快速、通用的交互式纹理转换 链接:https://arxiv.org/abs/2112.02788
作者:Zhizhong Wang,Lei Zhao,Haibo Chen,Ailin Li,Zhiwen Zuo,Wei Xing,Dongming Lu 机构:College of Computer Science and Technology, Zhejiang University 备注:Accepted by AAAI2022 摘要:在本文中,我们提出了纹理重整器,这是一种基于神经网络的快速通用框架,用于用户指定指导下的交互式纹理传输。挑战在于三个方面:1)任务的多样性,2)制导图的简单性,以及3)执行效率。为了应对这些挑战,我们的关键思想是使用一种新的前馈多视图和多阶段合成过程,包括I)全局视图结构对齐阶段,II)局部视图纹理细化阶段,以及III)整体效果增强阶段,以粗到细的方式合成具有连贯结构和精细纹理细节的高质量结果。此外,我们还引入了一种新的无学习视图特定纹理重构(VSTR)操作,并采用了一种新的语义地图引导策略,以实现更精确的语义引导和结构保持纹理传输。在各种应用场景上的实验结果证明了该框架的有效性和优越性。与最先进的交互式纹理转移算法相比,它不仅获得了更高质量的结果,而且更显著的是,它的速度也快了2-5个数量级。代码可在https://github.com/EndyWon/Texture-Reformer. 摘要:In this paper, we present the texture reformer, a fast and universal neural-based framework for interactive texture transfer with user-specified guidance. The challenges lie in three aspects: 1) the diversity of tasks, 2) the simplicity of guidance maps, and 3) the execution efficiency. To address these challenges, our key idea is to use a novel feed-forward multi-view and multi-stage synthesis procedure consisting of I) a global view structure alignment stage, II) a local view texture refinement stage, and III) a holistic effect enhancement stage to synthesize high-quality results with coherent structures and fine texture details in a coarse-to-fine fashion. In addition, we also introduce a novel learning-free view-specific texture reformation (VSTR) operation with a new semantic map guidance strategy to achieve more accurate semantic-guided and structure-preserved texture transfer. The experimental results on a variety of application scenarios demonstrate the effectiveness and superiority of our framework. And compared with the state-of-the-art interactive texture transfer algorithms, it not only achieves higher quality results but, more remarkably, also is 2-5 orders of magnitude faster. Code is available at https://github.com/EndyWon/Texture-Reformer.
【57】 A General Framework for Debiasing in CTR Prediction 标题:CTR预测中去偏的一个通用框架 链接:https://arxiv.org/abs/2112.02767
作者:Wenjie Chu,Shen Li,Chao Chen,Longfei Xu,Hengbin Cui,Kaikui Liu 机构:Alibaba Group 摘要:大多数现有的点击率(CTR)预测去偏方法依赖于一个过于简单的假设,即点击概率是观察概率和相关概率的乘积。然而,由于这两种概率之间存在复杂的相互作用,因此这些方法不能应用于其他场景,例如查询自动完成(QAC)和路由推荐。我们在不简化变量之间关系的情况下提出了一个通用的去偏框架,它可以处理CTR预测中的所有场景。仿真实验表明:在最简单的情况下,我们的方法保持了与现有方法相似的AUC;在其他场景中,与现有方法相比,我们的方法实现了相当大的改进。同时,在在线实验中,该框架也得到了持续的显著改进。 摘要:Most of the existing methods for debiasing in click-through rate (CTR) prediction depend on an oversimplified assumption, i.e., the click probability is the product of observation probability and relevance probability. However, since there is a complicated interplay between these two probabilities, these methods cannot be applied to other scenarios, e.g. query auto completion (QAC) and route recommendation. We propose a general debiasing framework without simplifying the relationships between variables, which can handle all scenarios in CTR prediction. Simulation experiments show that: under the simplest scenario, our method maintains a similar AUC with the state-of-the-art methods; in other scenarios, our method achieves considerable improvements compared with existing methods. Meanwhile, in online experiments, the framework also gains significant improvements consistently.
【58】 BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery 标题:BCD网:贝叶斯因果发现的可扩展变分方法 链接:https://arxiv.org/abs/2112.02761
作者:Chris Cundy,Aditya Grover,Stefano Ermon 机构:Department of Computer Science, Stanford University, Facebook AI Research, University of California, Los Angeles 备注:Neural Information Processing Systems 2021 摘要:结构方程模型(SEM)是通过有向无环图(DAG)对因果关系进行推理的有效框架。最近的进展使得能够从观测数据对DAG进行有效的最大似然点估计。然而,在实际场景中,点估计可能无法准确捕获推断基础图时的不确定性,其中真实DAG是不可识别的和/或观测数据集是有限的。我们提出了贝叶斯因果发现网(BCD网),这是一种变分推理框架,用于估计线性高斯SEM的DAG上的分布。由于图的离散性和组合性,在DAG上开发一个完整的贝叶斯后验具有挑战性。我们分析了DAG上可伸缩VI的关键设计选择,例如1)通过表达性变分族对DAG进行参数化,2)实现低方差随机优化的连续松弛,以及3)对潜在变量的适当先验。我们对真实数据和合成数据进行了一系列实验,结果表明,BCD网络在低数据区的结构汉明距离等标准因果发现指标上优于最大似然方法。 摘要:A structural equation model (SEM) is an effective framework to reason over causal relationships represented via a directed acyclic graph (DAG). Recent advances have enabled effective maximum-likelihood point estimation of DAGs from observational data. However, a point estimate may not accurately capture the uncertainty in inferring the underlying graph in practical scenarios, wherein the true DAG is non-identifiable and/or the observed dataset is limited. We propose Bayesian Causal Discovery Nets (BCD Nets), a variational inference framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM. Developing a full Bayesian posterior over DAGs is challenging due to the discrete and combinatorial nature of graphs. We analyse key design choices for scalable VI over DAGs, such as 1) the parametrization of DAGs via an expressive variational family, 2) a continuous relaxation that enables low-variance stochastic optimization, and 3) suitable priors over the latent variables. We provide a series of experiments on real and synthetic data showing that BCD Nets outperform maximum-likelihood methods on standard causal discovery metrics such as structural Hamming distance in low data regimes.
【59】 End-to-end Adaptive Distributed Training on PaddlePaddle 标题:基于PaddlePaddle的端到端自适应分布式训练 链接:https://arxiv.org/abs/2112.02752
作者:Yulong Ao,Zhihua Wu,Dianhai Yu,Weibao Gong,Zhiqing Kui,Minxu Zhang,Zilingfeng Ye,Liang Shen,Yanjun Ma,Tian Wu,Haifeng Wang,Wei Zeng,Chao Yang 机构: Baidu Inc., Peking University 备注:16 pages, 10 figures, 4 tables 摘要:分布式训练已成为训练处理海量数据的大型神经网络(NN)模型的一种普遍而有效的方法。然而,要满足不同的神经网络模型、不同的计算资源及其在训练工作中的动态变化的需求是非常具有挑战性的。在本研究中,我们以系统的端到端视图设计了分布式训练框架,通过充分考虑资源分配、模型划分、任务布置和分布式执行,为不同场景(尤其是工业应用和生产环境)提供内置的自适应能力。基于统一分布图和统一集群对象,我们的自适应框架配备了全局代价模型和全局规划器,可以实现任意并行、资源感知布局、多模式执行、容错和弹性分布式训练。实验表明,该框架能够满足应用程序的多样性和资源的异构性等方面的需求,具有很强的竞争力。具有2600亿个参数的ERNIE语言模型在数千个AI处理器上得到有效训练,弱扩展效率达到91.7%。通过采用异构流水线异步执行,来自推荐系统的模型吞吐量可分别提高到仅GPU和仅CPU训练的2.1倍和3.3倍。此外,容错和弹性分布式训练已成功应用于在线工业应用中,使长期训练作业失败的数量减少了34.49%,生产环境中的全局调度效率提高了33.91%。 摘要:Distributed training has become a pervasive and effective approach for training a large neural network (NN) model with processing massive data. However, it is very challenging to satisfy requirements from various NN models, diverse computing resources, and their dynamic changes during a training job. In this study, we design our distributed training framework in a systematic end-to-end view to provide the built-in adaptive ability for different scenarios, especially for industrial applications and production environments, by fully considering resource allocation, model partition, task placement, and distributed execution. Based on the unified distributed graph and the unified cluster object, our adaptive framework is equipped with a global cost model and a global planner, which can enable arbitrary parallelism, resource-aware placement, multi-mode execution, fault-tolerant, and elastic distributed training. The experiments demonstrate that our framework can satisfy various requirements from the diversity of applications and the heterogeneity of resources with highly competitive performance. The ERNIE language model with 260 billion parameters is efficiently trained on thousands of AI processors with 91.7% weak scalability. The throughput of the model from the recommender system by employing the heterogeneous pipeline asynchronous execution can be increased up to 2.1 times and 3.3 times that of the GPU-only and CPU-only training respectively. Moreover, the fault-tolerant and elastic distributed training have been successfully applied to the online industrial applications, which give a reduction of 34.49% in the number of failed long-term training jobs and an increase of 33.91% for the global scheduling efficiency in the production environment.
【60】 Team Hitachi @ AutoMin 2021: Reference-free Automatic Minuting Pipeline with Argument Structure Construction over Topic-based Summarization 标题:团队Hitachi@AutoMin 2021:在基于主题的摘要上构建具有论元结构的无引用自动会议记录流水线 链接:https://arxiv.org/abs/2112.02741
作者:Atsuki Yamaguchi,Gaku Morio,Hiroaki Ozaki,Ken-ichi Yokote,Kenji Nagamatsu 机构:Research and Development Group, Hitachi, Ltd., Kokubunji, Tokyo, Japan 备注:8 pages, 4 figures 摘要:本文介绍了日立团队提出的第一个共享任务自动记录系统(AutoMin-2021)。我们采用无参考的方法(即不使用训练分钟数)进行自动记录(任务A),该方法首先根据主题将成绩单拆分为块,然后使用预先训练的BART模型对聊天对话摘要语料库进行微调,对这些块进行总结。此外,我们对生成的会议记录应用了一种论元挖掘技术,以一种结构良好且连贯的方式重新组织它们。我们利用多个相关性得分来确定一分钟是否来源于同一次会议,同时给出一份会议记录或另一分钟(任务B和C)。在这些分数的基础上,我们训练一个传统的机器学习模型来组合它们并做出最终决定。因此,我们的任务A方法在所有提交材料中获得了最佳的充分性分数,并且在语法正确性和流利性方面接近最佳系统。对于任务B和C,所提出的模型成功地优于多数票基线。 摘要:This paper introduces the proposed automatic minuting system of the Hitachi team for the First Shared Task on Automatic Minuting (AutoMin-2021). We utilize a reference-free approach (i.e., without using training minutes) for automatic minuting (Task A), which first splits a transcript into blocks on the basis of topics and subsequently summarizes those blocks with a pre-trained BART model fine-tuned on a summarization corpus of chat dialogue. In addition, we apply a technique of argument mining to the generated minutes, reorganizing them in a well-structured and coherent way. We utilize multiple relevance scores to determine whether or not a minute is derived from the same meeting when either a transcript or another minute is given (Task B and C). On top of those scores, we train a conventional machine learning model to bind them and to make final decisions. Consequently, our approach for Task A achieves the best adequacy score among all submissions and performance close to the best system in terms of grammatical correctness and fluency. For Task B and C, the proposed model successfully outperformed a majority vote baseline.
【61】 JointLK: Joint Reasoning with Language Models and Knowledge Graphs for Commonsense Question Answering 标题:JointLK:常识问答的语言模型和知识图联合推理 链接:https://arxiv.org/abs/2112.02732
作者:Yueqing Sun,Qi Shi,Le Qi,Yu Zhang 机构:Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, Harbin, China 摘要:现有的用于问答的KG增强模型主要集中于设计精细图神经网络(GNN)来建模知识图(KG)。然而,它们忽略了(i)对问题上下文表示和KG表示的有效融合和推理,以及(ii)在推理过程中从嘈杂的KG中自动选择相关节点。在本文中,我们提出了一种新的模型JointLK,它通过LMs和GNNs的联合推理和动态KG剪枝机制来解决上述限制。具体地说,JointLK通过一个新颖的稠密双向注意模块在LMs和GNN之间执行联合推理,其中每个问题token关注KG节点,每个KG节点关注问题token,两种模态表示通过多步交互相互融合和更新。然后,动态剪枝模块使用联合推理产生的注意权重递归剪枝不相关的KG节点。我们在CommonsenseQA和OpenBookQA数据集上的结果表明,我们的模态融合和知识剪枝方法可以更好地利用相关知识进行推理。 摘要:Existing KG-augmented models for question answering primarily focus on designing elaborate Graph Neural Networks (GNNs) to model knowledge graphs (KGs). However, they ignore (i) the effective fusion of and reasoning over the question context representations and the KG representations, and (ii) automatically selecting relevant nodes from the noisy KGs during reasoning. In this paper, we propose a novel model, JointLK, which solves the above limitations through the joint reasoning of LMs and GNNs and the dynamic KGs pruning mechanism. Specifically, JointLK performs joint reasoning between the LMs and the GNNs through a novel dense bidirectional attention module, in which each question token attends on KG nodes and each KG node attends on question tokens, and the two modal representations fuse and update mutually by multi-step interactions. Then, the dynamic pruning module uses the attention weights generated by joint reasoning to recursively prune irrelevant KG nodes. Our results on the CommonsenseQA and OpenBookQA datasets demonstrate that our modal fusion and knowledge pruning methods can make better use of relevant knowledge for reasoning.
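"问题token与KG节点之间的稠密双向注意"的核心计算可以写成如下最小示意(假设实现;论文中注意权重还被用于递归剪枝KG节点,这里仅以注释点出):

```python
# 问题 token 与 KG 节点之间稠密双向注意的最小示意(假设实现)。
import torch
import torch.nn.functional as F

def bidirectional_attention(q_tokens, kg_nodes):
    """q_tokens: (Lq, d) 语言模型的 token 表示;kg_nodes: (Ln, d) 图节点表示。
    返回相互融合后的两组表示;可堆叠多步以实现多轮交互。"""
    sim = q_tokens @ kg_nodes.t() / q_tokens.size(-1) ** 0.5  # (Lq, Ln) 亲和度
    q2n = F.softmax(sim, dim=1) @ kg_nodes       # 每个 token 汇聚相关节点的信息
    n2q = F.softmax(sim.t(), dim=1) @ q_tokens   # 每个节点汇聚相关 token 的信息
    # 注意权重(如 softmax(sim) 的列和)可进一步用于剪除低相关度的 KG 节点
    return q_tokens + q2n, kg_nodes + n2q        # 残差式相互更新
```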
【62】 NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation 标题:NL-Augmenter:一种任务敏感型自然语言增强框架 链接:https://arxiv.org/abs/2112.02721
作者:Kaustubh D. Dhole,Varun Gangal,Sebastian Gehrmann,Aadesh Gupta,Zhenhao Li,Saad Mahamood,Abinaya Mahendiran,Simon Mille,Ashish Srivastava,Samson Tan,Tongshuang Wu,Jascha Sohl-Dickstein,Jinho D. Choi,Eduard Hovy,Ondrej Dusek,Sebastian Ruder,Sajant Anand,Nagender Aneja,Rabin Banjade,Lisa Barthe,Hanna Behnke,Ian Berlot-Attwell,Connor Boyle,Caroline Brun,Marco Antonio Sobrevilla Cabezudo,Samuel Cahyawijaya,Emile Chapuis,Wanxiang Che,Mukund Choudhary,Christian Clauss,Pierre Colombo,Filip Cornell,Gautier Dagan,Mayukh Das,Tanay Dixit,Thomas Dopierre,Paul-Alexis Dray,Suchitra Dubey,Tatiana Ekeinhor,Marco Di Giovanni,Rishabh Gupta,Rishabh Gupta,Louanes Hamla,Sang Han,Fabrice Harel-Canada,Antoine Honore,Ishan Jindal,Przemyslaw K. Joniak,Denis Kleyko,Venelin Kovatchev,Kalpesh Krishna,Ashutosh Kumar,Stefan Langer,Seungjae Ryan Lee,Corey James Levinson,Hualou Liang,Kaizhao Liang,Zhexiong Liu,Andrey Lukyanenko,Vukosi Marivate,Gerard de Melo,Simon Meoni,Maxime Meyer,Afnan Mir,Nafise Sadat Moosavi,Niklas Muennighoff,Timothy Sum Hon Mun,Kenton Murray,Marcin Namysl,Maria Obedkova,Priti Oli,Nivranshu Pasricha,Jan Pfister,Richard Plant,Vinay Prabhu,Vasile Pais,Libo Qin,Shahab Raji,Pawan Kumar Rajpoot,Vikas Raunak,Roy Rinberg,Nicolas Roberts,Juan Diego Rodriguez,Claude Roux,Vasconcellos P. H. S.,Ananya B. Sai,Robin M. Schmidt,Thomas Scialom,Tshephisho Sefara,Saqib N. Shamsi,Xudong Shen,Haoyue Shi,Yiwen Shi,Anna Shvets,Nick Siegel,Damien Sileo,Jamie Simon,Chandan Singh,Roman Sitelew,Priyank Soni,Taylor Sorensen,William Soto,Aman Srivastava,KV Aditya Srivatsa,Tony Sun,Mukund Varma T,A Tabassum,Fiona Anting Tan,Ryan Teehan,Mo Tiwari,Marie Tolkiehn,Athena Wang,Zijian Wang,Gloria Wang,Zijie J. Wang,Fuxuan Wei,Bryan Wilie,Genta Indra Winata,Xinyi Wu,Witold Wydmański,Tianbao Xie,Usama Yaseen,M. Yee,Jing Zhang,Yue Zhang 机构:ACKO,Agara Labs,Amelia R&D, New York,Applied Research Laboratories, The University of Texas at Austin, Bloomberg,Brigham Young University,Carnegie Mellon University,Center for Data and Computing in Natural Sciences 备注:39 pages, repository at this https URL 摘要:数据扩充是自然语言处理(NLP)中模型鲁棒性评估和增强训练数据多样性的重要组成部分。在本文中,我们介绍了NL-Augmenter,这是一个新的基于Python的参与式自然语言增强框架,它支持创建转换(对数据的修改)和过滤器(根据特定特征进行数据拆分)。我们描述了该框架以及用于各种自然语言任务的初始的117个转换和23个过滤器。我们通过使用NL-Augmenter的若干转换来分析流行自然语言模型的鲁棒性,从而证明了NL-Augmenter的有效性。基础架构、数据卡和鲁棒性分析结果已在NL-Augmenter存储库(https://github.com/GEM-benchmark/NL-Augmenter)上公开。 摘要:Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards and robustness analysis results are available publicly on the NL-Augmenter repository (https://github.com/GEM-benchmark/NL-Augmenter).
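A standalone sketch of the two plug-in types the framework distinguishes, transformations and filters, is given below; the class names and the toy word-shuffle perturbation are illustrative, not NL-Augmenter's exact interfaces (see the repository above for the real base classes):

import random

class Transformation:
    """Perturb an input sentence, e.g. for robustness evaluation."""
    def generate(self, sentence: str) -> list:
        words = sentence.split()
        random.shuffle(words)          # toy perturbation: shuffle word order
        return [" ".join(words)]

class Filter:
    """Keep only examples with a given property, e.g. short sentences."""
    def keep(self, sentence: str, max_words: int = 10) -> bool:
        return len(sentence.split()) <= max_words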
【63】 Joint Symmetry Detection and Shape Matching for Non-Rigid Point Cloud 标题:非刚性点云的联合对称性检测与形状匹配 链接:https://arxiv.org/abs/2112.02713
作者:Abhishek Sharma,Maks Ovsjanikov 机构:LIX, Ecole Polytechnique, IPParis, France 备注:Under Review. arXiv admin note: substantial text overlap with arXiv:2110.02994 摘要:尽管深度函数映射在非刚性三维形状匹配中取得了成功,但目前还没有同时对自对称性和形状匹配进行建模的学习框架,尽管对称性失配导致的误差是非刚性形状匹配中的一个主要挑战。在本文中,我们提出了一个新的框架,可同时学习自对称性以及一对形状之间的成对映射。我们的关键思想是通过一个正则化项将自对称映射和成对映射耦合在一起,该正则化项为二者提供联合约束,从而获得更精确的映射。我们在多个基准上验证了我们的方法,它在这两个任务上都优于许多有竞争力的基线。 摘要:Despite the success of deep functional maps in non-rigid 3D shape matching, there exists no learning framework that models both self-symmetry and shape matching simultaneously. This is despite the fact that errors due to symmetry mismatch are a major challenge in non-rigid shape matching. In this paper, we propose a novel framework that simultaneously learns both self symmetry as well as a pairwise map between a pair of shapes. Our key idea is to couple a self symmetry map and a pairwise map through a regularization term that provides a joint constraint on both of them, thereby, leading to more accurate maps. We validate our method on several benchmarks where it outperforms many competitive baselines on both tasks.
【64】 Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot Sentiment Classification 标题:开放词汇脑电转文本解码与零样本情感分类 链接:https://arxiv.org/abs/2112.02690
作者:Zhenhailong Wang,Heng Ji 机构:University of Illinois at Urbana-Champaign 备注:9 pages, 2 figures, Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI2022) 摘要:最先进的脑-文本系统在使用神经网络直接从大脑信号解码语言方面取得了巨大成功。然而,目前的方法仅限于小型封闭词汇表,这远远不足以支持自然交流。此外,大多数高性能方法需要来自侵入性设备(如ECoG)的数据。在本文中,我们将问题扩展到自然阅读任务中的开放词汇脑电图(EEG)到文本的序列到序列解码和零样本句子情感分类。我们假设人脑是一种特殊的文本编码器,并提出了一个利用预训练语言模型(如BART)的新框架。我们的模型在EEG到文本解码上获得40.1%的BLEU-1分数,在基于EEG的零样本三元情感分类上获得55.6%的F1分数,显著优于有监督基线。此外,我们还表明,我们提出的模型可以处理来自不同受试者和来源的数据,这表明一旦有足够的数据可用,高性能的开放词汇脑-文本系统大有潜力。 摘要:State-of-the-art brain-to-text systems have achieved great success in decoding language directly from brain signals using neural networks. However, current approaches are limited to small closed vocabularies which are far from enough for natural communication. In addition, most of the high-performing approaches require data from invasive devices (e.g., ECoG). In this paper, we extend the problem to open vocabulary Electroencephalography(EEG)-To-Text Sequence-To-Sequence decoding and zero-shot sentence sentiment classification on natural reading tasks. We hypothesize that the human brain functions as a special text encoder and propose a novel framework leveraging pre-trained language models (e.g., BART). Our model achieves a 40.1% BLEU-1 score on EEG-To-Text decoding and a 55.6% F1 score on zero-shot EEG-based ternary sentiment classification, which significantly outperforms supervised baselines. Furthermore, we show that our proposed model can handle data from various subjects and sources, showing great potential for a high-performance open vocabulary brain-to-text system once sufficient data is available.
【65】 BERTMap: A BERT-based Ontology Alignment System 标题:BERTMap:一种基于BERT的本体对齐系统 链接:https://arxiv.org/abs/2112.02682
作者:Yuan He,Jiaoyan Chen,Denvar Antonyrajah,Ian Horrocks 机构: Department of Computer Science, University of Oxford, UK, Samsung Research, UK 备注:Full version (with appendix) of the accepted paper in 36th AAAI Conference on Artificial Intelligence 2022 摘要:本体对齐(又称本体匹配,OM)在知识集成中起着至关重要的作用。得益于机器学习在许多领域的成功,它也被应用于OM。然而,现有方法通常采用临时的特征工程或非上下文词嵌入,尚未超过基于规则的系统,尤其是在无监督设置下。在本文中,我们提出了一种新的OM系统BERTMap,它可以同时支持无监督和半监督设置。该方法首先使用一个分类器预测映射,该分类器基于在从本体中提取的文本语义语料库上微调的上下文嵌入模型BERT;然后利用本体结构和逻辑,通过扩展和修复对映射进行细化。我们在三个生物医学本体对齐任务上的评估表明,BERTMap通常比领先的OM系统LogMap和AML性能更好。 摘要:Ontology alignment (a.k.a. ontology matching (OM)) plays a critical role in knowledge integration. Owing to the success of machine learning in many domains, it has been applied in OM. However, the existing methods, which often adopt ad-hoc feature engineering or non-contextual word embeddings, have not yet outperformed rule-based systems, especially in an unsupervised setting. In this paper, we propose a novel OM system named BERTMap which can support both unsupervised and semi-supervised settings. It first predicts mappings using a classifier based on fine-tuning the contextual embedding model BERT on text semantics corpora extracted from ontologies, and then refines the mappings through extension and repair by utilizing the ontology structure and logic. Our evaluation with three alignment tasks on biomedical ontologies demonstrates that BERTMap can often perform better than the leading OM systems LogMap and AML.
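The mapping-prediction step can be sketched as a BERT cross-encoder that scores pairs of class labels; the checkpoint below is a placeholder for the model fine-tuned on synonym pairs extracted from the ontologies, so the probabilities are only meaningful after such fine-tuning:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

def mapping_score(label_a: str, label_b: str) -> float:
    """Probability that two class labels refer to the same concept."""
    inputs = tok(label_a, label_b, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Candidate mappings above a threshold would then be refined via the
# ontology structure (extension) and logical consistency checks (repair).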
【66】 Diverse, Global and Amortised Counterfactual Explanations for Uncertainty Estimates 标题:不确定性估计的多样化、全局性和摊销反事实解释 链接:https://arxiv.org/abs/2112.02646
作者:Dan Ley,Umang Bhatt,Adrian Weller 机构:University of Cambridge, UK,The Alan Turing Institute, UK 备注:Accepted as a conference paper to AAAI 2022 摘要:为了解释可微概率模型的不确定性估计,最近的工作建议为模型不确定的给定数据点生成单个反事实潜在不确定性解释(CLUE),识别输入在流形上的单个变化,从而使模型在其预测中变得更加确定。我们将探索范围扩大到检查δ-CLUE,即潜在空间中原始输入的δ球内的一组潜在CLUE。我们研究了这些集合的多样性,发现许多CLUE是冗余的;因此,我们提出了多样化CLUE(∇-CLUE),这是一组CLUE,其中每个CLUE都对如何减少与输入相关的不确定性提出了不同的解释。然后,我们进一步提出了全局摊销CLUE(GLAM-CLUE),这是一种独特而新颖的方法,它学习特定不确定输入组上的摊销映射,并在单个函数调用中将其有效地转换为模型确定的输入。我们的实验表明,δ-CLUE、∇-CLUE和GLAM-CLUE都解决了CLUE的缺点,并为实践者提供了不确定性估计的有益解释。 摘要:To interpret uncertainty estimates from differentiable probabilistic models, recent work has proposed generating a single Counterfactual Latent Uncertainty Explanation (CLUE) for a given data point where the model is uncertain, identifying a single, on-manifold change to the input such that the model becomes more certain in its prediction. We broaden the exploration to examine δ-CLUE, the set of potential CLUEs within a δ ball of the original input in latent space. We study the diversity of such sets and find that many CLUEs are redundant; as such, we propose DIVerse CLUE (∇-CLUE), a set of CLUEs which each propose a distinct explanation as to how one can decrease the uncertainty associated with an input. We then further propose GLobal AMortised CLUE (GLAM-CLUE), a distinct and novel method which learns amortised mappings on specific groups of uncertain inputs, taking them and efficiently transforming them in a single function call into inputs for which a model will be certain. Our experiments show that δ-CLUE, ∇-CLUE, and GLAM-CLUE all address shortcomings of CLUE and provide beneficial explanations of uncertainty estimates to practitioners.
【67】 Modeling Live Video Streaming: Real-Time Classification, QoE Inference, and Field Evaluation 标题:实时视频流建模:实时分类、QoE推断和现场评估 链接:https://arxiv.org/abs/2112.02637
作者:Sharat Chandra Madanapalli,Alex Mathai,Hassan Habibi Gharakheili,Vijay Sivaraman 摘要:在Twitch和YouTube live等平台上,社交媒体、职业体育和视频游戏正在推动实时视频流的快速增长。实时流媒体体验非常容易受到短时网络拥塞的影响,因为客户端播放缓冲区通常不超过几秒钟。不幸的是,识别此类流并测量其QoE以进行网络管理是一项挑战,因为内容提供商在直播和视频点播(VoD)流中基本上使用相同的交付基础设施,并且数据包检查技术(包括SNI/DNS查询监控)无法始终区分这两者。在本文中,我们设计、构建并部署了ReCLive:一种基于网络级行为特征的实时视频检测和QoE测量的机器学习方法。我们的贡献有四个方面:(1)我们分析了来自Twitch和YouTube的约23000个视频流,并确定了其流量特征中区分直播和点播流的关键特征。我们将我们的流量跟踪作为公开数据发布给公众;(2) 我们开发了一个基于LSTM的二进制分类器模型,该模型实时区分实时流和按需流,跨提供商的准确率超过95%;(3) 我们开发了一种方法,根据分辨率和缓冲区暂停事件估计实时流媒体流的QoE度量,总体准确率分别为93%和90%;(4)最后,我们对我们的解决方案进行了原型化,在实验室对其进行了训练,并将其部署到一个服务于7000多名用户的实时ISP网络中。我们的方法为ISP提供对实时视频流的细粒度可见性,使他们能够测量和改善用户体验。 摘要:Social media, professional sports, and video games are driving rapid growth in live video streaming, on platforms such as Twitch and YouTube Live. Live streaming experience is very susceptible to short-time-scale network congestion since client playback buffers are often no more than a few seconds. Unfortunately, identifying such streams and measuring their QoE for network management is challenging, since content providers largely use the same delivery infrastructure for live and video-on-demand (VoD) streaming, and packet inspection techniques (including SNI/DNS query monitoring) cannot always distinguish between the two. In this paper, we design, build, and deploy ReCLive: a machine learning method for live video detection and QoE measurement based on network-level behavioral characteristics. Our contributions are four-fold: (1) We analyze about 23,000 video streams from Twitch and YouTube, and identify key features in their traffic profile that differentiate live and on-demand streaming. We release our traffic traces as open data to the public; (2) We develop an LSTM-based binary classifier model that distinguishes live from on-demand streams in real-time with over 95% accuracy across providers; (3) We develop a method that estimates QoE metrics of live streaming flows in terms of resolution and buffer stall events with overall accuracies of 93% and 90%, respectively; and (4) Finally, we prototype our solution, train it in the lab, and deploy it in a live ISP network serving more than 7,000 subscribers. Our method provides ISPs with fine-grained visibility into live video streams, enabling them to measure and improve user experience.
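A minimal sketch of the live-vs-VoD classifier, assuming per-flow sequences of simple traffic features; the feature set and layer sizes here are illustrative, not the deployed ReCLive configuration:

import torch
import torch.nn as nn

class LiveStreamClassifier(nn.Module):
    def __init__(self, n_features=4, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):            # x: (batch, time_windows, n_features)
        _, (h, _) = self.lstm(x)     # h: (1, batch, hidden) final state
        return torch.sigmoid(self.head(h[-1]))  # P(live) per flow

model = LiveStreamClassifier()
flows = torch.randn(8, 120, 4)       # 8 flows, 120 time windows, 4 features
print(model(flows).shape)            # torch.Size([8, 1])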
【68】 The Complexity of Data-Driven Norm Synthesis and Revision 标题:数据驱动的规范合成与修订的复杂性 链接:https://arxiv.org/abs/2112.02626
作者:Davide Dell'Anna,Natasha Alechina,Brian Logan,Maarten Löffler,Fabiano Dalpiaz,Mehdi Dastani 机构:Utrecht University 摘要:在多智能体系统(MAS)中,规范作为协调和控制智能体活动的一种方式被广泛提出。规范规定了智能体为实现MAS的目标应遵循的行为。然而,设计规范以实现特定的系统目标可能很困难,特别是当陈述系统目标的语言与表达规范的语言之间没有直接联系时。在本文中,我们考虑从智能体行为轨迹中综合规范的问题,其中每条轨迹都被标记了该行为是否满足系统目标。我们证明了规范综合问题是NP完全的。 摘要:Norms have been widely proposed as a way of coordinating and controlling the activities of agents in a multi-agent system (MAS). A norm specifies the behaviour an agent should follow in order to achieve the objective of the MAS. However, designing norms to achieve a particular system objective can be difficult, particularly when there is no direct link between the language in which the system objective is stated and the language in which the norms can be expressed. In this paper, we consider the problem of synthesising a norm from traces of agent behaviour, where each trace is labelled with whether the behaviour satisfies the system objective. We show that the norm synthesis problem is NP-complete.
【69】 Explainable Deep Learning in Healthcare: A Methodological Survey from an Attribution View 标题:医疗保健中可解释的深度学习:归因视角的方法论考察 链接:https://arxiv.org/abs/2112.02625
作者:Di Jin,Elena Sergeeva,Wei-Hung Weng,Geeticka Chauhan,Peter Szolovits 机构:Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA 备注:The first four authors contributed equally, psz is the corresponding author. To appear as an advanced review in WIREs Mechanisms of Disease Journal 摘要:电子病历(EHR)数据的大量收集和深度学习(DL)领域前所未有的技术进步,引发了人们对开发基于DL的诊断、预后和治疗临床决策支持系统的研究兴趣。尽管人们认识到深度学习在医疗保健中的价值,但由于DL的黑箱性质,在实际医疗保健环境中进一步采用的障碍仍然存在。因此,出现了对可解释DL的需求,它允许最终用户在采取行动之前评估模型决策,以了解是接受还是拒绝预测和建议。在这篇综述中,我们重点讨论了DL模型在医疗保健中的可解释性。我们从深入全面地介绍解释性方法开始,作为该领域未来研究人员或临床工作者的方法学参考。除了这些方法的细节之外,我们还讨论了这些方法的优缺点,以及每种方法适用于哪些场景,以便感兴趣的读者能够了解如何在这些方法中进行比较和选择。此外,我们还讨论了这些最初为解决一般领域问题而开发的方法是如何适应和应用于医疗保健问题的,以及它们如何帮助医生更好地理解这些数据驱动技术。总的来说,我们希望这项调查能够帮助人工智能(AI)和临床领域的研究人员和实践者了解我们有哪些方法来增强其DL模型的可解释性,并据此选择最佳的方法。 摘要:The increasing availability of large collections of electronic health record (EHR) data and unprecedented technical advances in deep learning (DL) have sparked a surge of research interest in developing DL based clinical decision support systems for diagnosis, prognosis, and treatment. Despite the recognition of the value of deep learning in healthcare, impediments to further adoption in real healthcare settings remain due to the black-box nature of DL. Therefore, there is an emerging need for interpretable DL, which allows end users to evaluate the model decision making to know whether to accept or reject predictions and recommendations before an action is taken. In this review, we focus on the interpretability of the DL models in healthcare. We start by introducing the methods for interpretability in depth and comprehensively as a methodological reference for future researchers or clinical practitioners in this field. Besides the methods' details, we also include a discussion of advantages and disadvantages of these methods and which scenarios each of them is suitable for, so that interested readers can know how to compare and choose among them for use. Moreover, we discuss how these methods, originally developed for solving general-domain problems, have been adapted and applied to healthcare problems and how they can help physicians better understand these data-driven technologies. Overall, we hope this survey can help researchers and practitioners in both artificial intelligence (AI) and clinical fields understand what methods we have for enhancing the interpretability of their DL models and choose the optimal one accordingly.
【70】 Dynamic Token Normalization Improves Vision Transformer 标题:动态标记归一化改进了视觉转换器 链接:https://arxiv.org/abs/2112.02624
作者:Wenqi Shao,Yixiao Ge,Zhaoyang Zhang,Xuyuan Xu,Xiaogang Wang,Ying Shan,Ping Luo 机构: The Chinese University of Hong Kong, ARC Lab, Tencent PCG, AI Technology Center of Tencent Video, The University of Hong Kong 备注:18 pages, 12 Tables, 9 Figures 摘要:视觉Transformer(ViT)及其变体(例如Swin、PVT)在各种计算机视觉任务中取得了巨大成功,这归功于它们学习远程上下文信息的能力。层归一化(LN)是这些模型中的一个重要组成部分。然而,我们发现普通LN使不同位置的标记在大小上相似,因为它对每个标记内的嵌入进行归一化。使用LN时,Transformer很难捕捉诸如图像中位置上下文这样的归纳偏置。我们通过提出一种新的归一化器来解决这个问题,称为动态标记归一化(DTN),其中归一化在每个标记内(标记内)和不同标记间(标记间)同时执行。DTN有几个优点。首先,它建立在一个统一的公式上,因此可以表示各种现有的归一化方法。其次,DTN学习以标记内和标记间两种方式归一化标记,使Transformer能够同时捕获全局上下文信息和局部位置上下文。第三,只需替换LN层,DTN就可以方便地插入各种视觉Transformer,如ViT、Swin、PVT、LeViT、T2T-ViT、BigBird和Reformer。大量实验表明,配备DTN的Transformer在极少的额外参数和计算开销下始终优于基线模型。例如,DTN在ImageNet上的top-1精度比LN高0.5%-1.2%,在COCO基准的目标检测中box AP高1.2-1.4,在ImageNet-C的鲁棒性实验中mCE改善2.3%-3.9%,在Long-Range Arena的Long ListOps上精度高0.5%-0.8%。代码将在https://github.com/wqshao126/DTN 公开。 摘要:Vision Transformer (ViT) and its variants (e.g., Swin, PVT) have achieved great success in various computer vision tasks, owing to their capability to learn long-range contextual information. Layer Normalization (LN) is an essential ingredient in these models. However, we found that the ordinary LN makes tokens at different positions similar in magnitude because it normalizes embeddings within each token. It is difficult for Transformers to capture inductive bias such as the positional context in an image with LN. We tackle this problem by proposing a new normalizer, termed Dynamic Token Normalization (DTN), where normalization is performed both within each token (intra-token) and across different tokens (inter-token). DTN has several merits. Firstly, it is built on a unified formulation and thus can represent various existing normalization methods. Secondly, DTN learns to normalize tokens in both intra-token and inter-token manners, enabling Transformers to capture both the global contextual information and the local positional context. Thirdly, by simply replacing LN layers, DTN can be readily plugged into various vision transformers, such as ViT, Swin, PVT, LeViT, T2T-ViT, BigBird and Reformer. Extensive experiments show that the transformer equipped with DTN consistently outperforms the baseline model with minimal extra parameters and computational overhead. For example, DTN outperforms LN by 0.5% - 1.2% top-1 accuracy on ImageNet, by 1.2 - 1.4 box AP in object detection on the COCO benchmark, by 2.3% - 3.9% mCE in robustness experiments on ImageNet-C, and by 0.5% - 0.8% accuracy in Long ListOps on Long-Range Arena. Codes will be made public at https://github.com/wqshao126/DTN
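A simplified sketch of the intra/inter-token idea (not the paper's exact DTN formulation): normalize within each token as LN does, normalize each channel across tokens, and mix the two with a learned gate:

import torch
import torch.nn as nn

class IntraInterNorm(nn.Module):
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.gate = nn.Parameter(torch.zeros(1))    # 0 -> pure LN behaviour
        self.weight = nn.Parameter(torch.ones(dim))
        self.bias = nn.Parameter(torch.zeros(dim))

    def forward(self, x):                           # x: (B, N, D) tokens
        intra = (x - x.mean(-1, keepdim=True)) / (
            x.var(-1, unbiased=False, keepdim=True) + self.eps).sqrt()
        inter = (x - x.mean(1, keepdim=True)) / (
            x.var(1, unbiased=False, keepdim=True) + self.eps).sqrt()
        g = torch.sigmoid(self.gate)                # learned intra/inter mix
        return ((1 - g) * intra + g * inter) * self.weight + self.bias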
【71】 Probabilistic Deep Learning to Quantify Uncertainty in Air Quality Forecasting 标题:概率深度学习在空气质量预报不确定性量化中的应用 链接:https://arxiv.org/abs/2112.02622
作者:Abdulmajid Murad,Frank Alexander Kraemer,Kerstin Bach,Gavin Taylor 机构:Norwegian University of Science and Technology, United States Naval Academy 备注:None 摘要:数据驱动的空气质量预测最近实现了更准确的短期预测。尽管取得了成功,但大多数当前的数据驱动解决方案都缺乏对模型不确定性的适当量化,无法传达对预测的信任程度。最近,在概率深度学习中开发了几种实用的不确定性估计工具。然而,这些工具在空气质量预测领域尚未得到实证应用和广泛比较。因此,这项工作将最先进的不确定性量化技术应用于实际环境中的空气质量预测。通过大量实验,我们描述了训练概率模型,并基于经验性能、置信度估计的可靠性和实用性评估了它们的预测不确定性。我们还建议使用"自由"对抗训练和利用空气质量数据固有的时间和空间相关性来改进这些模型。我们的实验表明,提出的模型在量化数据驱动的空气质量预测中的不确定性方面比以前的工作表现得更好。总的来说,贝叶斯神经网络提供了一个更可靠的不确定性估计,但在实现和扩展方面具有挑战性。其他可扩展的方法,如deep ensemble、Monte Carlo(MC)dropout和随机加权平均高斯(SWAG),如果应用正确,可以表现良好,但在性能指标上有不同的权衡和细微的变化。最后,我们的结果显示了不确定性估计的实际影响,并证明了概率模型确实更适合于做出明智的决策。代码和数据集位于https://github.com/Abdulmajid-Murad/deep_probabilistic_forecast。 摘要:Data-driven forecasts of air quality have recently achieved more accurate short-term predictions. Despite their success, most of the current data-driven solutions lack proper quantifications of model uncertainty that communicate how much to trust the forecasts. Recently, several practical tools to estimate uncertainty have been developed in probabilistic deep learning. However, there have not been empirical applications and extensive comparisons of these tools in the domain of air quality forecasts. Therefore, this work applies state-of-the-art techniques of uncertainty quantification in a real-world setting of air quality forecasts. Through extensive experiments, we describe training probabilistic models and evaluate their predictive uncertainties based on empirical performance, reliability of confidence estimate, and practical applicability. We also propose improving these models using "free" adversarial training and exploiting temporal and spatial correlation inherent in air quality data. Our experiments demonstrate that the proposed models perform better than previous works in quantifying uncertainty in data-driven air quality forecasts. Overall, Bayesian neural networks provide a more reliable uncertainty estimate but can be challenging to implement and scale. Other scalable methods, such as deep ensemble, Monte Carlo (MC) dropout, and stochastic weight averaging-Gaussian (SWAG), can perform well if applied correctly but with different tradeoffs and slight variations in performance metrics. Finally, our results show the practical impact of uncertainty estimation and demonstrate that, indeed, probabilistic models are more suitable for making informed decisions. Code and dataset are available at https://github.com/Abdulmajid-Murad/deep_probabilistic_forecast
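As one example of the compared tools, here is a sketch of Monte Carlo dropout, which keeps dropout active at inference and reads the spread of repeated forecasts as predictive uncertainty; the network and input shapes are illustrative, not the paper's configuration:

import torch
import torch.nn as nn

# toy forecaster: 24 lagged pollutant readings -> next-hour value
net = nn.Sequential(nn.Linear(24, 64), nn.ReLU(), nn.Dropout(0.2),
                    nn.Linear(64, 1))

def mc_dropout_forecast(x, n_samples=50):
    net.train()                        # keep dropout stochastic at test time
    with torch.no_grad():
        preds = torch.stack([net(x) for _ in range(n_samples)])
    return preds.mean(0), preds.std(0)  # point forecast and its uncertainty

mean, std = mc_dropout_forecast(torch.randn(1, 24))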
【72】 PSI: A Pedestrian Behavior Dataset for Socially Intelligent Autonomous Car 标题:PSI:一种面向社会智能自动驾驶汽车的行人行为数据集 链接:https://arxiv.org/abs/2112.02604
作者:Tina Chen,Renran Tian,Yaobin Chen,Joshua Domeyer,Heishiro Toyoda,Rini Sherony,Taotao Jing,Zhengming Ding 机构:Electrical & Computer Engineering, IUPUI, Computer Information Technology, Collaborative Safety Research Center, Toyota Motor North America, Toyota Research Institute, Computer Science, Tulane University 摘要:行人行为预测对于全自动车辆在繁忙的城市街道上安全高效地行驶至关重要。未来的自动驾驶汽车需要适应混合环境,不仅要具备技术能力,还要具备社会能力。随着越来越多用于预测行人行为的算法和数据集被开发出来,这些工作仍缺乏基准标签,也缺乏估计行人时变动态意图、解释交互场景并支持具有社会智能的算法的能力。本文提出并共享了一个称为IUPUI-CSRC行人情境意图(PSI)的基准数据集,除了全面的计算机视觉标签外,它还具有两个创新的标签。第一个新颖的标签是行人从自车(ego-vehicle)前方穿过的动态意图变化,由24名具有不同背景的驾驶员标注得到。第二个是驾驶员在估计行人意图和预测其交互期间行为时推理过程的基于文本的解释。这些创新的标签可以支持多个计算机视觉任务,包括行人意图/行为预测、车辆-行人交互分割以及面向可解释算法的视频到语言映射。发布的数据集可以从根本上改进行人行为预测模型的开发,并帮助开发能与行人高效交互的社会智能自动驾驶汽车。该数据集已通过不同的任务进行了评估,并已向公众开放。 摘要:Prediction of pedestrian behavior is critical for fully autonomous vehicles to drive in busy city streets safely and efficiently. The future autonomous cars need to fit into mixed conditions with not only technical but also social capabilities. As more algorithms and datasets have been developed to predict pedestrian behaviors, these efforts lack the benchmark labels and the capability to estimate the temporal-dynamic intent changes of the pedestrians, provide explanations of the interaction scenes, and support algorithms with social intelligence. This paper proposes and shares another benchmark dataset called the IUPUI-CSRC Pedestrian Situated Intent (PSI) data with two innovative labels besides comprehensive computer vision labels. The first novel label is the dynamic intent changes for the pedestrians to cross in front of the ego-vehicle, achieved from 24 drivers with diverse backgrounds. The second one is the text-based explanations of the driver reasoning process when estimating pedestrian intents and predicting their behaviors during the interaction period. These innovative labels can enable several computer vision tasks, including pedestrian intent/behavior prediction, vehicle-pedestrian interaction segmentation, and video-to-language mapping for explainable algorithms. The released dataset can fundamentally improve the development of pedestrian behavior prediction models and develop socially intelligent autonomous cars to interact with pedestrians efficiently. The dataset has been evaluated with different tasks and is released to the public to access.
【73】 A Novel Approach to Solving Goal-Achieving Problems for Board Games 标题:一种解决棋类游戏目标实现问题的新方法 链接:https://arxiv.org/abs/2112.02563
作者:Chung-Chin Shih,Ti-Rong Wu,Ting Han Wei,I-Chen Wu 机构:National Yang Ming Chiao Tung University, Hsinchu, Taiwan, Research Center for Information Technology Innovation, Academia Sinica, Taiwan, Department of Computing Science, University of Alberta, Edmonton, Canada 备注:Accepted by AAAI2022. In this version, supplementary materials are added 摘要:目标实现问题是设定了特定局面和明确目标的谜题。一个研究得很充分的例子是围棋的生死(L&D)问题类别,它帮助棋手磨练识别区域安全的技能。以前的许多方法(如lambda搜索)都是先尝试空着,然后推导出所谓的关联区域(RZ),对手不需要在该区域之外搜索。本文首先提出了一种新的基于RZ的方法,称为基于RZ的搜索(RZS),用于解决围棋的L&D问题。RZS先尝试着法,事后再判断它们是否为空着。这意味着我们不需要依赖空着启发式,从而得到一个更优雅的算法,使其也可以无缝地融入我们求解器中AlphaZero的超人级别对弈。为了将AlphaZero重新用于求解问题,我们还提出了一种新的训练方法,称为Faster to Life(FTL),它对AlphaZero进行修改,促使其更快地赢得比赛。我们使用RZS和FTL解决围棋中的L&D问题,即解决了专业L&D书籍中106个问题中的68个,而以前的程序仅解决11个。最后,我们讨论了该方法的通用性,即RZS适用于解决许多其他棋盘游戏的目标实现问题。 摘要:Goal-achieving problems are puzzles that set up a specific situation with a clear objective. An example that is well-studied is the category of life-and-death (L&D) problems for Go, which helps players hone their skill of identifying region safety. Many previous methods like lambda search try null moves first, then derive so-called relevance zones (RZs), outside of which the opponent does not need to search. This paper first proposes a novel RZ-based approach, called the RZ-Based Search (RZS), to solving L&D problems for Go. RZS tries moves before determining whether they are null moves post-hoc. This means we do not need to rely on null move heuristics, resulting in a more elegant algorithm, so that it can also be seamlessly incorporated into AlphaZero's super-human level play in our solver. To repurpose AlphaZero for solving, we also propose a new training method called Faster to Life (FTL), which modifies AlphaZero to entice it to win more quickly. We use RZS and FTL to solve L&D problems on Go, namely solving 68 among 106 problems from a professional L&D book while a previous program solves 11 only. Finally, we discuss that the approach is generic in the sense that RZS is applicable to solving many other goal-achieving problems for board games.
【74】 Interpretable Privacy Preservation of Text Representations Using Vector Steganography 标题:基于矢量隐写的文本表示的可解释隐私保护 链接:https://arxiv.org/abs/2112.02557
作者:Geetanjali Bihani 机构:Purdue University 备注:Accepted at AAAI 2022 Doctoral Consortium 摘要:由语言模型(LM)生成的上下文词表示会学习训练语料库中存在的虚假关联。最近的发现表明,攻击者可以利用这些关联对语料库中提及的实体的私有属性进行逆向工程。这些发现促使人们努力将语言模型的隐私风险降至最低。然而,现有的方法缺乏可解释性,在数据效用上有所妥协,并且不能提供隐私保证。因此,我的博士研究目标是开发可解释的文本表示隐私保护方法,在保证隐私的同时保留数据效用。为此,我的目标是研究和开发在向量几何中加入隐写修改的方法,以模糊潜在的虚假关联,并保留训练过程中学到的分布语义属性。 摘要:Contextual word representations generated by language models (LMs) learn spurious associations present in the training corpora. Recent findings reveal that adversaries can exploit these associations to reverse-engineer the private attributes of entities mentioned within the corpora. These findings have led to efforts towards minimizing the privacy risks of language models. However, existing approaches lack interpretability, compromise on data utility and fail to provide privacy guarantees. Thus, the goal of my doctoral research is to develop interpretable approaches towards privacy preservation of text representations that retain data utility while guaranteeing privacy. To this end, I aim to study and develop methods to incorporate steganographic modifications within the vector geometry to obfuscate underlying spurious associations and preserve the distributional semantic properties learnt during training.
【75】 Robust Active Learning: Sample-Efficient Training of Robust Deep Learning Models 标题:鲁棒主动学习:稳健深度学习模型的样本效率训练 链接:https://arxiv.org/abs/2112.02542
作者:Yuejun Guo,Qiang Hu,Maxime Cordy,Mike Papadakis,Yves Le Traon 机构:University of Luxembourg, Luxembourg 备注:10 pages 摘要:主动学习是一种成熟的技术,可以降低标注成本,从而构建高质量的机器学习模型。主动学习的一个核心组成部分是采集函数,用于确定应选择哪些数据进行标注。最先进的采集函数——更广泛地说,主动学习技术——被设计为最大限度地提高干净性能(例如准确率),而忽略了鲁棒性这一日益受到关注的重要质量属性。因此,主动学习产生的模型准确但不鲁棒。在本文中,我们提出了鲁棒主动学习(robust active learning),这是一种集成了对抗训练(产生鲁棒模型的最成熟方法)的主动学习过程。通过对11个采集函数、4个数据集、6个DNN体系结构和15105个经过训练的DNN的实证研究,我们表明鲁棒主动学习可以生成鲁棒性(对抗样本上的准确率)在2.35%到63.85%之间的模型,而标准主动学习系统性地只能实现可忽略不计的鲁棒性(小于0.20%)。然而,我们的研究还表明,在鲁棒性方面,准确率表现良好的采集函数比随机采样更差。因此,我们研究了这背后的原因,并设计了一种同时针对干净性能和鲁棒性的新采集函数。我们的采集函数——命名为基于密度的鲁棒熵采样(DRE)——在鲁棒性方面比其他采集函数(包括随机采样)高出多达24.40%(特别是比随机采样高出3.84%),同时在准确率方面保持竞争力。此外,我们还证明了DRE可作为模型再训练的测试选择指标,在鲁棒性方面比所有被比较的函数高出多达8.21%。 摘要:Active learning is an established technique to reduce the labeling cost to build high-quality machine learning models. A core component of active learning is the acquisition function that determines which data should be selected to annotate. State-of-the-art acquisition functions -- and more largely, active learning techniques -- have been designed to maximize the clean performance (e.g. accuracy) and have disregarded robustness, an important quality property that has received increasing attention. Active learning, therefore, produces models that are accurate but not robust. In this paper, we propose robust active learning, an active learning process that integrates adversarial training -- the most established method to produce robust models. Via an empirical study on 11 acquisition functions, 4 datasets, 6 DNN architectures, and 15105 trained DNNs, we show that robust active learning can produce models with the robustness (accuracy on adversarial examples) ranging from 2.35% to 63.85%, whereas standard active learning systematically achieves negligible robustness (less than 0.20%). Our study also reveals, however, that the acquisition functions that perform well on accuracy are worse than random sampling when it comes to robustness. We, therefore, examine the reasons behind this and devise a new acquisition function that targets both clean performance and robustness. Our acquisition function -- named density-based robust sampling with entropy (DRE) -- outperforms the other acquisition functions (including random) in terms of robustness by up to 24.40% (in particular, 3.84% more than random), while remaining competitive on accuracy. Additionally, we prove that DRE is applicable as a test selection metric for model retraining and stands out from all compared functions by up to 8.21% robustness.
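One plausible reading of a density-weighted entropy acquisition in the spirit of DRE is sketched below: score unlabeled points by predictive entropy times an inverse kNN-distance density estimate, so that uncertain and typical points are labeled first; the exact DRE formulation in the paper may differ:

import numpy as np
from sklearn.neighbors import NearestNeighbors

def dre_scores(probs, features, k=10):
    """probs: (n, n_classes) softmax outputs; features: (n, d) embeddings."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    knn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    dists, _ = knn.kneighbors(features)      # column 0 is the point itself
    density = 1.0 / (dists[:, 1:].mean(axis=1) + 1e-12)
    return entropy * density                 # annotate the argmax points first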
【76】 Inf-CP: A Reliable Channel Pruning based on Channel Influence 标题:Inf-CP:一种基于通道影响的可靠通道剪枝 链接:https://arxiv.org/abs/2112.02521
作者:Bilan Lai,Haoran Xiang,Furao Shen 机构:School of Artificial Intelligence, Nanjing University 摘要:通道剪枝最有效的方法之一是根据每个神经元的重要性进行修剪。然而,测量每个神经元的重要性是一个NP难问题。以前的工作建议通过考虑单层或多层连续神经元的统计数据进行修剪。这些工作无法消除重建误差中不同数据对模型的影响,而且目前还没有工作证明参数的绝对值可以直接用作判断权重重要性的依据。一种更合理的方法是消除批次数据之间的差异,以便准确测量权重的影响。在本文中,我们建议使用集成学习为不同批次的数据训练模型,并利用影响函数(鲁棒统计中的经典技术)来追踪模型的预测并将其归因于训练参数的梯度,从而确定每个参数在预测过程中的责任,我们称之为"影响"。此外,我们从理论上证明了深度网络的反向传播是权重影响函数的一阶泰勒近似。我们进行了大量实验,证明利用集成学习思想的基于影响函数的剪枝比只关注误差重建更有效。在CIFAR上的实验表明,基于影响的剪枝达到了最先进的效果。 摘要:One of the most effective methods of channel pruning is to trim on the basis of the importance of each neuron. However, measuring the importance of each neuron is an NP-hard problem. Previous works have proposed to trim by considering the statistics of a single layer or a plurality of successive layers of neurons. These works cannot eliminate the influence of different data on the model in the reconstruction error, and currently, there is no work to prove that the absolute values of the parameters can be directly used as the basis for judging the importance of the weights. A more reasonable approach is to eliminate the difference between batch data so that the influence of the weights can be measured accurately. In this paper, we propose to use ensemble learning to train a model for different batches of data and use the influence function (a classic technique from robust statistics) to track the model's predictions back to the gradients of its training parameters, so that we can determine the responsibility of each parameter, which we call "influence", in the prediction process. In addition, we theoretically prove that the back-propagation of the deep network is a first-order Taylor approximation of the influence function of the weights. We perform extensive experiments to prove that pruning based on the influence function using the idea of ensemble learning will be much more effective than just focusing on error reconstruction. Experiments on CIFAR show that influence-based pruning achieves state-of-the-art results.
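The first-order Taylor connection stated above suggests scoring a channel by |weight x gradient| accumulated over batches; the sketch below illustrates that criterion on a single batch, not the full ensemble-based Inf-CP procedure:

import torch
import torch.nn as nn

conv = nn.Conv2d(16, 32, 3, padding=1)
x, target = torch.randn(8, 16, 32, 32), torch.randn(8, 32, 32, 32)
loss = nn.functional.mse_loss(conv(x), target)
loss.backward()                                  # populates conv.weight.grad

# per-output-channel influence score: |w * dL/dw| summed over each channel's kernel
influence = (conv.weight * conv.weight.grad).abs().sum(dim=(1, 2, 3))
prune_candidates = influence.argsort()[:8]       # 8 least influential channels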
【77】 Neural Photometry-guided Visual Attribute Transfer 标题:神经光度学引导的视觉属性传递 链接:https://arxiv.org/abs/2112.02520
作者:Carlos Rodriguez-Pardo,Elena Garces 机构: Spain) and withUniversidad Carlos III de Madrid ( 2800 5, Spain) and with UniversidadRey Juan Carlos ( 289 3 3 备注:13 pages. To be published in Transactions on Visualizations and Computer Graphics. Project website: this http URL 摘要:我们提出了一种基于深度学习的方法,用于将空间变化的视觉材质属性(例如纹理贴图或图像样式化)传播到相同或类似材质的较大样本。对于训练,我们利用在多个照明下拍摄的材料图像和专用的数据增强策略,使传输对新的照明条件和仿射变形具有鲁棒性。我们的模型依赖于一个有监督的图像到图像的转换框架,并且对转换域是不可知的;我们展示了语义分割、法线映射和样式化。采用图像类比法,该方法只要求训练数据包含与输入制导相同的视觉结构。我们的方法以交互速率工作,使其适合于材料编辑应用程序。我们在一个受控的环境中全面评估我们的学习方法,提供定量的绩效衡量。最后,我们证明了在单一材料上训练模型足以推广到相同类型的材料,而不需要大量数据集。 摘要:We present a deep learning-based method for propagating spatially-varying visual material attributes (e.g. texture maps or image stylizations) to larger samples of the same or similar materials. For training, we leverage images of the material taken under multiple illuminations and a dedicated data augmentation policy, making the transfer robust to novel illumination conditions and affine deformations. Our model relies on a supervised image-to-image translation framework and is agnostic to the transferred domain; we showcase a semantic segmentation, a normal map, and a stylization. Following an image analogies approach, the method only requires the training data to contain the same visual structures as the input guidance. Our approach works at interactive rates, making it suitable for material edit applications. We thoroughly evaluate our learning methodology in a controlled setup providing quantitative measures of performance. Last, we demonstrate that training the model on a single material is enough to generalize to materials of the same type without the need for massive datasets.
【78】 Intention Recognition for Multiple Agents 标题:多智能体的意图识别 链接:https://arxiv.org/abs/2112.02513
作者:Zhang Zhang,Yifeng Zeng,Yingke Chen 机构:National Engineering Laboratory for Electric Vehicles, Beijing Institute of Technology, Beijing , China, University of Northumbria, Sutherland Building Newcastle-upon-Tyne, NE,ST, United Kingdom 备注:17pages, 30figures, 1 table, 2 algorithms, journal 摘要:意图识别是多智能体系统中促进协作的重要步骤。现有的研究主要集中在单智能体环境下的意图识别,并在识别过程中使用描述性模型,如贝叶斯网络。在本文中,我们采用一种规定性的方法来建模智能体的行为,其意图隐藏在其计划的执行之中。我们在行为模型中引入了地标,从而增强了用于识别多个智能体共同意图的信息特征。我们进一步细化模型,只关注其计划中的动作序列,并提供一个轻量级模型来识别和比较它们的意图。新模型提供了一种简单的方法,可以根据在智能体交互中观察到的部分计划对智能体的共同意图进行分组。我们提供了支持性的实验结果。 摘要:Intention recognition is an important step to facilitate collaboration in multi-agent systems. Existing work mainly focuses on intention recognition in a single-agent setting and uses a descriptive model, e.g. Bayesian networks, in the recognition process. In this paper, we resort to a prescriptive approach to model agents' behaviour, where their intentions are hidden in the implementation of their plans. We introduce landmarks into the behavioural model, thereby enhancing informative features for identifying common intentions of multiple agents. We further refine the model by focusing only on action sequences in their plans and provide a light model for identifying and comparing their intentions. The new models provide a simple approach to grouping agents' common intentions based on partial plans observed in agents' interactions. We provide experimental results in support.
【79】 Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI 标题:使用无网格MMI进行端到端语音识别的一致训练和解码 链接:https://arxiv.org/abs/2112.02498
作者:Jinchuan Tian,Jianwei Yu,Chao Weng,Shi-Xiong Zhang,Dan Su,Dong Yu,Yuexian Zou 机构:ADSPLAB, School of ECE, Peking University, Shenzhen, China, Tencent AI Lab 摘要:最近,端到端(E2E)框架在各种自动语音识别(ASR)任务中取得了显著的成果。然而,无格最大互信息(LF-MMI)作为在混合ASR系统中表现出优异性能的区分性训练准则之一,在E2E ASR框架中很少被采用。在这项工作中,我们提出了一种在训练和解码阶段将LF-MMI准则集成到E2E ASR框架中的新方法。该方法在两种最广泛使用的E2E框架上显示了其有效性,包括基于注意的编码器-解码器(AED)和神经传感器(NT)。实验表明,LF-MMI准则的引入在不同的数据集和不同的E2E ASR框架上一致地带来了显著的性能改进。我们最好的模型在Aishell-1开发/测试集上实现了4.1%/4.4%的有竞争力的CER;在强基线之上,我们还在Aishell-2和Librispeech数据集上显著降低了错误率。 摘要:Recently, End-to-End (E2E) frameworks have achieved remarkable results on various Automatic Speech Recognition (ASR) tasks. However, Lattice-Free Maximum Mutual Information (LF-MMI), as one of the discriminative training criteria that show superior performance in hybrid ASR systems, is rarely adopted in E2E ASR frameworks. In this work, we propose a novel approach to integrate LF-MMI criterion into E2E ASR frameworks in both training and decoding stages. The proposed approach shows its effectiveness on two of the most widely used E2E frameworks including Attention-Based Encoder-Decoders (AEDs) and Neural Transducers (NTs). Experiments suggest that the introduction of the LF-MMI criterion consistently leads to significant performance improvements on various datasets and different E2E ASR frameworks. The best of our models achieves competitive CER of 4.1% / 4.4% on the Aishell-1 dev/test set; we also achieve significant error reduction on Aishell-2 and Librispeech datasets over strong baselines.
【80】 Augmentation-Free Self-Supervised Learning on Graphs 标题:图上的无增广自监督学习 链接:https://arxiv.org/abs/2112.02472
作者:Namkyeong Lee,Junseok Lee,Chanyoung Park 机构: Dept. of Industrial and Systems Engineering, KAIST, Daejeon, Republic of Korea, Graduate School of Artificial Intelligence, KAIST, Daejeon, Republic of Korea 摘要:受图像自监督方法最近取得的成功的启发,图形结构数据的自监督学习得到了快速发展,尤其是基于增强的对比方法。然而,我们认为,如果没有精心设计的增广技术,图上的增广可能会表现得任意,因为图的底层语义可能会发生剧烈的变化。因此,现有基于增强的方法的性能高度依赖于增强方案的选择,即与增强相关的超参数。在本文中,我们提出了一种新的无增广自监督图学习框架AFGRL。具体来说,我们通过发现与图共享局部结构信息和全局语义的节点来生成图的另一种视图。对各种节点级任务(即节点分类、聚类和各种真实数据集上的相似性搜索)的大量实验证明了AFGRL的优越性。AFGRL的源代码可在https://github.com/Namkyeong/AFGRL. 摘要:Inspired by the recent success of self-supervised methods applied on images, self-supervised learning on graph structured data has seen rapid growth especially centered on augmentation-based contrastive methods. However, we argue that without carefully designed augmentation techniques, augmentations on graphs may behave arbitrarily in that the underlying semantics of graphs can drastically change. As a consequence, the performance of existing augmentation-based methods is highly dependent on the choice of augmentation scheme, i.e., hyperparameters associated with augmentations. In this paper, we propose a novel augmentation-free self-supervised learning framework for graphs, named AFGRL. Specifically, we generate an alternative view of a graph by discovering nodes that share the local structural information and the global semantics with the graph. Extensive experiments towards various node-level tasks, i.e., node classification, clustering, and similarity search on various real-world datasets demonstrate the superiority of AFGRL. The source code for AFGRL is available at https://github.com/Namkyeong/AFGRL.
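The augmentation-free view generation can be sketched as a nearest-neighbor search in embedding space, intersected with the graph neighborhood to respect local structure; this is a simplification of AFGRL's actual positive-selection scheme, which the paper refines further:

import torch

def knn_positives(embeddings, adjacency, k=4):
    """embeddings: (N, D) node embeddings; adjacency: (N, N) float 0/1 matrix."""
    sim = torch.nn.functional.normalize(embeddings, dim=1)
    sim = sim @ sim.t()                          # cosine similarity matrix
    knn = sim.topk(k + 1, dim=1).indices[:, 1:]  # drop self-similarity
    mask = torch.zeros_like(adjacency).scatter_(1, knn, 1.0)
    return mask * adjacency                      # kNN positives that are also graph neighbors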
【81】 Artificial Cognitively-inspired Generation of the Notion of Topological Group in the Context of Artificial Mathematical Intelligence 标题:人工认知启发下拓扑群概念在人工数学智能中的生成 链接:https://arxiv.org/abs/2112.02457
作者:Danny A. J. Gomez-Ramirez,Yoe A. Herrera-Jaramillo,Florian Geismann 摘要:在人工数学智能的研究计划中引入了概念计算的新计算范式。我们为拓扑群这一基本数学概念提供了显式的人工生成(或概念计算)。具体来说,我们从属于拓扑和抽象代数的两个基本概念开始,并用通用代数规范语言(CASL)递归地描述形式规范。这些概念空间之间的概念整合可以在异构工具集(HETS)中通过计算实现。拓扑群的基本概念是通过基于概念整合和概念识别的三种不同的人工规范显式生成的,其起点是连续函数和数学群(以最少的集合论条件描述)的概念。这为人工数学智能的第三支柱提供了额外的启发性证据。 摘要:The new computational paradigm of conceptual computation has been introduced in the research program of Artificial Mathematical Intelligence. We provide the explicit artificial generation (or conceptual computation) for the fundamental mathematical notion of topological groups. Specifically, we start with two basic notions belonging to topology and abstract algebra, and we describe recursively formal specifications in the Common Algebraic Specification Language (CASL). The notion of conceptual blending between such conceptual spaces can be materialized computationally in the Heterogeneous Tool Set (HETS). The fundamental notion of topological groups is explicitly generated through three different artificial specifications based on conceptual blending and conceptual identification, starting with the concepts of continuous functions and mathematical groups (described with minimal set-theoretical conditions). This constitutes additional heuristic evidence for the third pillar of Artificial Mathematical Intelligence.
【82】 Emojich -- zero-shot emoji generation using Russian language: a technical report 标题:Emojich——使用俄语的零样本表情符号生成:技术报告 链接:https://arxiv.org/abs/2112.02448
作者:Alex Shonenkov,Daria Bakshandaeva,Denis Dimitrov,Aleksandr Nikolich 机构:Sber AI, MIPT, Sber AI, Lomonosov MSU, Sber AI, MIREA 备注:5 pages, 4 figures and big figure at appendix, technical report 摘要:本技术报告介绍了一个文本到图像的神经网络“Emojich”,该网络以俄语字幕为条件生成表情符号。我们的目标是在微调阶段保持预训练大模型ruDALL-E Malevich(XL)1.3B参数的泛化能力,同时为生成的图像提供特殊风格。这里介绍了一些工程方法、代码实现、用于复制结果的所有超参数以及每个人都可以创建自己的定制贴纸集的电报机器人。此外,还展示了通过“Emojich”模型获得的一些新生成的表情。 摘要:This technical report presents a text-to-image neural network "Emojich" that generates emojis using captions in Russian language as a condition. We aim to keep the generalization ability of a pretrained big model ruDALL-E Malevich (XL) 1.3B parameters at the fine-tuning stage, while giving special style to the images generated. Here are presented some engineering methods, code realization, all hyper-parameters for reproducing results and a Telegram bot where everyone can create their own customized sets of stickers. Also, some newly generated emojis obtained by "Emojich" model are demonstrated.
【83】 Functional Task Tree Generation from a Knowledge Graph to Solve Unseen Problems 标题:从知识图生成求解未知问题的功能任务树 链接:https://arxiv.org/abs/2112.02433
作者:Md. Sadman Sakib,David Paulius,Yu Sun 机构: which is part of the Department of ComputerScience & Engineering at the University of South Florida, David Paulius is a postdoctoral researcher in the Human-centered AssistiveRobotics (HCR) group at the Technical University of Munich 摘要:开发智能和自主机器人的一个主要组成部分是合适的知识表示,机器人可以从中获取有关其动作或世界的知识。然而,与人类不同,机器人无法创造性地适应新的场景,因为它们的知识和环境是严格定义的。为了解决生成称为任务树的新颖灵活的任务计划的问题,我们探索了如何使用机器人知识库中最初没有的概念导出计划。以知识图形式存在的知识被用作参考基础,以创建任务树,并使用新的对象或状态组合进行修改。为了展示我们方法的灵活性,我们从Recipe1M 数据集中随机选择了配方,并生成了它们的任务树。然后,使用可视化工具对任务树进行彻底检查,该工具描述了每种成分如何随着每个动作而变化,从而生成所需的膳食。我们的结果表明,所提出的方法可以产生高精度的任务计划,即使是从未见过的成分组合。 摘要:A major component for developing intelligent and autonomous robots is a suitable knowledge representation, from which a robot can acquire knowledge about its actions or world. However, unlike humans, robots cannot creatively adapt to novel scenarios, as their knowledge and environment are rigidly defined. To address the problem of producing novel and flexible task plans called task trees, we explore how we can derive plans with concepts not originally in the robot's knowledge base. Existing knowledge in the form of a knowledge graph is used as a base of reference to create task trees that are modified with new object or state combinations. To demonstrate the flexibility of our method, we randomly selected recipes from the Recipe1M dataset and generated their task trees. The task trees were then thoroughly checked with a visualization tool that portrays how each ingredient changes with each action to produce the desired meal. Our results indicate that the proposed method can produce task plans with high accuracy even for never-before-seen ingredient combinations.
【84】 Predicting Bandwidth Utilization on Network Links Using Machine Learning 标题:基于机器学习的网络链路带宽利用率预测 链接:https://arxiv.org/abs/2112.02417
作者:Maxime Labonne,Charalampos Chatzinakis,Alexis Olivereau 机构:Institut LIST, CEA, F-, Palaiseau, France 摘要:预测网络链路上的带宽利用率对于检测拥塞非常有用,以便在拥塞发生之前进行纠正。在本文中,我们提出了一种解决方案来预测不同网络链路之间的带宽利用率,具有非常高的精度。创建一个模拟网络,以收集与每个接口上的网络链路性能相关的数据。这些数据通过特征工程进行处理和扩展,以创建训练集。为了预测未来的带宽消耗,我们评估并比较了三种机器学习算法,即ARIMA(自回归综合移动平均)、MLP(多层感知器)和LSTM(长-短期记忆)。LSTM比ARIMA和MLP的预测更准确,误差很少超过3%(ARIMA为40%,MLP为20%)。然后,我们证明了所提出的解决方案可以通过软件定义网络(SDN)平台管理的反应实时使用。 摘要:Predicting the bandwidth utilization on network links can be extremely useful for detecting congestion in order to correct them before they occur. In this paper, we present a solution to predict the bandwidth utilization between different network links with a very high accuracy. A simulated network is created to collect data related to the performance of the network links on every interface. These data are processed and expanded with feature engineering in order to create a training set. We evaluate and compare three types of machine learning algorithms, namely ARIMA (AutoRegressive Integrated Moving Average), MLP (Multi Layer Perceptron) and LSTM (Long Short-Term Memory), in order to predict the future bandwidth consumption. The LSTM outperforms ARIMA and MLP with very accurate predictions, rarely exceeding a 3% error (40% for ARIMA and 20% for the MLP). We then show that the proposed solution can be used in real time with a reaction managed by a Software-Defined Networking (SDN) platform.
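A minimal sketch of the LSTM forecaster compared above: given a window of past utilization readings for a link, predict the next one; the window length and layer sizes are illustrative, not the paper's configuration:

import torch
import torch.nn as nn

class BandwidthLSTM(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):              # x: (batch, 12, 1) past utilization
        _, (h, _) = self.lstm(x)
        return self.out(h[-1])         # next-step utilization per link

model = BandwidthLSTM()
history = torch.rand(16, 12, 1)        # 16 links, 12 past readings in [0, 1]
next_util = model(history)             # (16, 1)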
【85】 PointCLIP: Point Cloud Understanding by CLIP 标题:PointCLIP:通过剪辑了解点云 链接:https://arxiv.org/abs/2112.02413
作者:Renrui Zhang,Ziyu Guo,Wei Zhang,Kunchang Li,Xupeng Miao,Bin Cui,Yu Qiao,Peng Gao,Hongsheng Li 机构:Shanghai AI Laboratory, Peking University, The Chinese University of Hong Kong 备注:Open sourced, Code and Model Available 摘要:最近,通过对比视觉语言预训练(CLIP)进行的零样本和少样本学习在2D视觉识别上表现出了鼓舞人心的性能,它学习在开放词汇设置下将图像与其对应的文本相匹配。然而,由大规模二维图像-文本对预训练的CLIP能否推广到三维识别,这一问题仍有待研究。在本文中,我们通过提出PointCLIP来确定这种设置是可行的,PointCLIP在CLIP编码的点云和3D类别文本之间进行对齐。具体地说,我们通过将点云投影到多视图深度图中而不进行渲染来对其进行编码,并聚合各视图的零样本预测以实现从二维到三维的知识迁移。在此基础上,我们设计了一个视图间适配器,以更好地提取全局特征,并将从三维少样本学习到的知识自适应地融合到二维预训练的CLIP中。只需在少样本设置中微调轻量级适配器,PointCLIP的性能就可以大大提高。此外,我们还观察到PointCLIP和经典3D监督网络之间的互补性。通过简单的集成,PointCLIP提高了基线的性能,甚至超过了最先进的模型。因此,在低资源成本和数据条件下,PointCLIP是通过CLIP进行有效三维点云理解的一种有希望的替代方案。我们在广泛采用的ModelNet10、ModelNet40和具有挑战性的ScanObjectNN上进行了彻底的实验,以证明PointCLIP的有效性。该代码发布于https://github.com/ZrrSkywalker/PointCLIP。 摘要:Recently, zero-shot and few-shot learning via Contrastive Vision-Language Pre-training (CLIP) have shown inspirational performance on 2D visual recognition, which learns to match images with their corresponding texts in open-vocabulary settings. However, it remains underexplored whether CLIP, pre-trained by large-scale image-text pairs in 2D, can be generalized to 3D recognition. In this paper, we identify such a setting is feasible by proposing PointCLIP, which conducts alignment between CLIP-encoded point cloud and 3D category texts. Specifically, we encode a point cloud by projecting it into multi-view depth maps without rendering, and aggregate the view-wise zero-shot prediction to achieve knowledge transfer from 2D to 3D. On top of that, we design an inter-view adapter to better extract the global feature and adaptively fuse the few-shot knowledge learned from 3D into CLIP pre-trained in 2D. By just fine-tuning the lightweight adapter in the few-shot settings, the performance of PointCLIP could be largely improved. In addition, we observe the complementary property between PointCLIP and classical 3D-supervised networks. By simple ensembling, PointCLIP boosts baseline's performance and even surpasses state-of-the-art models. Therefore, PointCLIP is a promising alternative for effective 3D point cloud understanding via CLIP under low resource cost and data regime. We conduct thorough experiments on widely-adopted ModelNet10, ModelNet40 and the challenging ScanObjectNN to demonstrate the effectiveness of PointCLIP. The code is released at https://github.com/ZrrSkywalker/PointCLIP.
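The render-free projection step can be sketched as an orthographic depth projection of the point cloud onto one image plane (the method uses several views); the resolution and the [-1, 1] normalization below are illustrative assumptions:

import numpy as np

def depth_map(points, res=64):
    """points: (N, 3) array, roughly normalized to [-1, 1]^3; front view."""
    img = np.full((res, res), np.inf)
    u = ((points[:, 0] + 1) / 2 * (res - 1)).astype(int)
    v = ((points[:, 1] + 1) / 2 * (res - 1)).astype(int)
    z = points[:, 2]
    for ui, vi, zi in zip(u, v, z):
        img[vi, ui] = min(img[vi, ui], zi)       # keep the nearest surface
    img[np.isinf(img)] = 0.0                     # empty pixels -> background
    return img                                   # fed to CLIP's image encoder

dm = depth_map(np.random.uniform(-1, 1, (1024, 3)))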
【86】 Understanding Dynamic Spatio-Temporal Contexts in Long Short-Term Memory for Road Traffic Speed Prediction 标题:道路交通速度预测中长短期记忆中动态时空语境的理解 链接:https://arxiv.org/abs/2112.02409
作者:Won Kyung Lee,Deuk Sin Kwon,So Young Sohn 机构:Department of Industrial Engineering, Yonsei University, Shinchon-dong, Seoul ,-, Republic of Korea 备注:10pages, 2 tables, 4 figures, 2017 KDD Cup 摘要:可靠的交通流预测对于创建智能交通系统至关重要。已经开发了许多基于大数据的预测方法,但它们不能反映考虑时间和位置的道路之间复杂的动态交互。在这项研究中,我们提出了一个动态局部长短时记忆(LSTM)模型,该模型涉及道路之间的空间和时间依赖性。为此,我们使用局部动态空间权重矩阵及其动态变化。此外,LSTM模型可以处理具有长相关性以及复杂非线性特征的序列数据。实证结果表明,与两种不同的基线方法相比,该模型具有更好的预测性能。 摘要:Reliable traffic flow prediction is crucial to creating intelligent transportation systems. Many big-data-based prediction approaches have been developed but they do not reflect complicated dynamic interactions between roads considering time and location. In this study, we propose a dynamically localised long short-term memory (LSTM) model that involves both spatial and temporal dependence between roads. To do so, we use a localised dynamic spatial weight matrix along with its dynamic variation. Moreover, the LSTM model can deal with sequential data with long dependency as well as complex non-linear features. Empirical results indicated superior prediction performances of the proposed model compared to two different baseline methods.
【87】 Towards automated verification of multi-party consensus protocols 标题:走向多方共识协议的自动验证 链接:https://arxiv.org/abs/2112.02397
作者:Ivan Fedotov,Anton Khritankov,Artem Barger 机构: Moscow Institute of Physics and Technology 备注:Accepted in the 5th International Conference on Big Data and Smart Computing 摘要:区块链技术和相关框架最近受到了广泛关注。区块链系统使用多方共识协议就交易达成一致。Hyperledger Fabric框架基于背书策略协议公开多方共识,以就交易达成共识。在本文中,我们定义了一个具有概率性质的区块链多方共识验证问题。进一步,我们提出了一种使用统计模型检验和假设检验的背书策略验证技术。我们分析了策略的几个方面,包括为组织分配权重的能力和组织的拒绝概率。我们通过实验演示了验证技术的工作方式,以及如何利用实验结果使模型满足规范要求。人们可以使用我们的技术基于Hyperledger Fabric框架设计企业应用程序。 摘要:Blockchain technology and related frameworks have recently received extensive attention. Blockchain systems use multi-party consensus protocols to reach agreements on transactions. Hyperledger Fabric framework exposes a multi-party consensus, based on endorsement policy protocol, to reach a consensus on a transaction. In this paper, we define a problem of verification of a blockchain multi-party consensus with probabilistic properties. Further, we propose a verification technique of endorsement policies using statistical model checking and hypothesis testing. We analyze several aspects of the policies, including the ability to assign weights to organizations and the refusal probabilities of organizations. We demonstrate on experiments the work of our verification technique and how one can use experimental results to make the model satisfiable the specification. One can use our technique to design enterprise applications with the Hyperledger Fabric framework.
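The statistical-model-checking idea can be sketched as simulating endorsement outcomes and testing a probability threshold with a one-sided binomial test; the refusal probabilities, policy, and threshold below are illustrative inputs, not values from the paper:

import numpy as np
from scipy.stats import binomtest

refusal_p = [0.05, 0.10, 0.20]          # per-organization refusal probability
required = 2                             # policy: at least 2 of 3 must endorse

def simulate(n=10_000, rng=np.random.default_rng(0)):
    endorses = rng.random((n, len(refusal_p))) >= np.array(refusal_p)
    return int((endorses.sum(axis=1) >= required).sum())

successes = simulate()
# H0: P(policy satisfied) <= 0.95; a small p-value supports the spec P >= 0.95
result = binomtest(successes, 10_000, p=0.95, alternative="greater")
print(successes / 10_000, result.pvalue)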
【88】 Overcome Anterograde Forgetting with Cycled Memory Networks 标题:用循环记忆网络克服顺行遗忘 链接:https://arxiv.org/abs/2112.02342
作者:Jian Peng,Dingqi Ye,Bo Tang,Yinjie Lei,Yu Liu,Haifeng Li 机构: School of Geosciences and Info-Physics, Central South University, Changsha , Department of Electrical and Computer Engineering, Mississippi State University, Starkville, MS , USA 备注:14 pages, 15 figures 摘要:终生从一系列任务中学习对于智能体走向人工通用智能至关重要。这要求代理不断地学习和记忆新知识而不受干扰。本文首先论证了利用神经网络进行终身学习的一个基本问题,即顺行遗忘,即保存和转移记忆可能会抑制新知识的学习。这归因于这样一个事实,即神经网络的学习能力会随着它对历史知识的记忆而降低,并且当它将不相关的旧知识转移到当前任务时,可能会出现概念混淆。本研究提出了一个称为循环记忆网络(CMN)的通用框架来解决终身学习神经网络中的顺行遗忘问题。CMN由两个独立的内存网络组成,用于存储短期和长期内存,以避免容量缩减。设计了一个转移单元来连接这两个记忆网络,使知识从长期记忆网络转移到短期记忆网络,以减轻概念上的混淆,并开发了一种记忆整合机制,将短期知识整合到长期记忆网络中以积累知识。实验结果表明,CMN能够有效地解决多个任务相关、任务冲突、类增量和跨域基准上的顺行遗忘问题。 摘要:Learning from a sequence of tasks for a lifetime is essential for an agent towards artificial general intelligence. This requires the agent to continuously learn and memorize new knowledge without interference. This paper first demonstrates a fundamental issue of lifelong learning using neural networks, named anterograde forgetting, i.e., preserving and transferring memory may inhibit the learning of new knowledge. This is attributed to the fact that the learning capacity of a neural network will be reduced as it keeps memorizing historical knowledge, and the fact that conceptual confusion may occur as it transfers irrelevant old knowledge to the current task. This work proposes a general framework named Cycled Memory Networks (CMN) to address the anterograde forgetting in neural networks for lifelong learning. The CMN consists of two individual memory networks to store short-term and long-term memories to avoid capacity shrinkage. A transfer cell is designed to connect these two memory networks, enabling knowledge transfer from the long-term memory network to the short-term memory network to mitigate the conceptual confusion, and a memory consolidation mechanism is developed to integrate short-term knowledge into the long-term memory network for knowledge accumulation. Experimental results demonstrate that the CMN can effectively address the anterograde forgetting on several task-related, task-conflict, class-incremental and cross-domain benchmarks.
【89】 Efficient Pressure: Improving efficiency for signalized intersections 标题:有效压力:提高信号交叉口的效率 链接:https://arxiv.org/abs/2112.02336
作者:Qiang Wu,Liang Zhang,Jun Shen,Linyuan Lü,Bo Du,Jianqing Wu 机构:Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of, China, Chengdu , China, School of Life Sciences, Lanzhou University, Lanzhou , China 备注:7pages, 3figures 摘要:由于传统方法不能适应动态交通条件,强化学习(RL)在解决交通信号控制(TSC)问题上受到了越来越多的关注。然而,考虑到现有的基于RL的方法在计算资源方面既不经济,又不比传统方法更鲁棒,它们很少被实际部署,这就提出了一个关键的研究问题:如何基于RL方法为TSC构造训练更少、复杂度更低的自适应控制器?为了解决这个问题,在本文中,我们(1)创新性地将交通运动表示指定为交通网络中车辆排队的一种简单而有效的压力,即有效压力(EP);(2)建立包括相位持续时间、信号相位数和EP在内的TSC交通信号设置协议;(3)在传统的最大压力(MP)方法的基础上,设计了一种利用EP捕捉交通状态的TSC方法,即高效最大压力(Efficient-MP);(4)开发了一个通用的基于RL的TSC算法模板:EP下的Efficient-XLight。通过在我们的TSC交通信号设置协议下对多个真实数据集的综合实验,我们证明了有效压力对传统建模和基于RL的建模都是补充,可用于设计更好的TSC方法。我们的代码已在GitHub上发布。 摘要:Since conventional approaches could not adapt to dynamic traffic conditions, reinforcement learning (RL) has attracted more attention to help solve the traffic signal control (TSC) problem. However, existing RL-based methods are rarely deployed considering that they are neither cost-effective in terms of computing resources nor more robust than traditional approaches, which raises a critical research question: how to construct an adaptive controller for TSC with less training and reduced complexity based on RL-based approach? To address this question, in this paper, we (1) innovatively specify the traffic movement representation as a simple but efficient pressure of vehicle queues in a traffic network, namely efficient pressure (EP); (2) build a traffic signal settings protocol, including phase duration, signal phase number and EP for TSC; (3) design a TSC approach based on the traditional max pressure (MP) approach, namely efficient max pressure (Efficient-MP) using the EP to capture the traffic state; and (4) develop a general RL-based TSC algorithm template: efficient Xlight (Efficient-XLight) under EP. Through comprehensive experiments on multiple real-world datasets in our traffic signal settings' protocol for TSC, we demonstrate that efficient pressure is complementary to traditional and RL-based modeling to design better TSC methods. Our code is released on Github.
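Under one plausible reading of the EP statistic, a movement's pressure is the average vehicle-queue length per incoming lane minus the average per outgoing lane, and the max-pressure controller activates the phase whose movements have the largest total pressure; the sketch below shows that reading, not necessarily the paper's exact definition:

def efficient_pressure(in_queues, out_queues):
    """in_queues/out_queues: vehicle-queue lengths per lane of one movement."""
    return sum(in_queues) / len(in_queues) - sum(out_queues) / len(out_queues)

def choose_phase(phases):
    """phases: {phase_id: [(in_queues, out_queues), ...] per movement}."""
    return max(phases, key=lambda p: sum(efficient_pressure(i, o)
                                         for i, o in phases[p]))

# example: phase "NS" serves a congested movement, phase "EW" a light one
print(choose_phase({"NS": [([8, 6], [1, 0])], "EW": [([2, 1], [1, 1])]}))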
【90】 LoNLI: An Extensible Framework for Testing Diverse Logical Reasoning Capabilities for NLI 标题:LoNLI:一个测试NLI多种逻辑推理能力的可扩展框架 链接:https://arxiv.org/abs/2112.02333
作者:Ishan Tarunesh,Somak Aditya,Monojit Choudhury 机构: Samsung Korea , Microsoft Research India, Bengaluru, India 备注:arXiv admin note: substantial text overlap with arXiv:2107.07229 摘要:自然语言推理(NLI)被认为是测试自然语言理解(NLU)的代表性任务。在这项工作中,我们提出了一个可扩展的框架,以集体而又分类的方式测试NLI(以及推而广之,NLU)所需的多种逻辑推理能力。受行为测试的启发,我们创建了一个半合成的大型测试台(363个模板,363k个示例)和一个相关框架,该框架提供以下实用工具:1)沿17个推理维度(包括语用推理)单独测试和分析推理能力;2)设计实验研究跨能力的信息含量(留一去除或单独引入一种能力);3)其合成性质使我们能够控制人为因素和偏差。从自由形式的自然语言模板自动实例化测试用例的能力(使用CheckList)以及定义良好的能力分类,使我们能够在改变自然语言复杂性的同时扩展到(认知上)更难的测试用例。通过我们对最先进的NLI系统的分析,我们观察到我们的基准确实很难(即使在额外资源上训练也并非轻而易举)。某些能力明显更难。进一步的细粒度分析和微调实验揭示了关于这些能力和模型的更多见解,支持并扩展了以前的观察结果。最后,我们还进行了一项用户研究,以调查是否可以利用行为信息使某些模型比其他模型泛化得更好。 摘要:Natural Language Inference (NLI) is considered a representative task to test natural language understanding (NLU). In this work, we propose an extensible framework to collectively yet categorically test diverse Logical reasoning capabilities required for NLI (and by extension, NLU). Motivated by behavioral testing, we create a semi-synthetic large test-bench (363 templates, 363k examples) and an associated framework that offers the following utilities: 1) individually test and analyze reasoning capabilities along 17 reasoning dimensions (including pragmatic reasoning); 2) design experiments to study cross-capability information content (leave one out or bring one in); and 3) the synthetic nature enables us to control for artifacts and biases. The inherited power of automated test case instantiation from free-form natural language templates (using CheckList), and a well-defined taxonomy of capabilities enable us to extend to (cognitively) harder test cases while varying the complexity of natural language. Through our analysis of state-of-the-art NLI systems, we observe that our benchmark is indeed hard (and non-trivial even with training on additional resources). Some capabilities stand out as harder. Further fine-grained analysis and fine-tuning experiments reveal more insights about these capabilities and the models -- supporting and extending previous observations. Towards the end, we also perform a user study to investigate whether behavioral information can be utilised to generalize much better for some models compared to others.
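Template instantiation of the kind the framework automates can be sketched as follows; the template, lexicons, and the numerical-reasoning capability shown are illustrative, not entries from the actual test-bench:

import itertools

template = {
    "premise": "{name} bought {n} apples and ate {m} of them.",
    "hypothesis": "{name} has {k} apples left.",
}

def instantiate(names=("Ravi", "Maya"), pairs=((5, 2), (4, 1))):
    # cross-product of lexicon entries yields labeled NLI test cases
    for name, (n, m) in itertools.product(names, pairs):
        yield {
            "premise": template["premise"].format(name=name, n=n, m=m),
            "hypothesis": template["hypothesis"].format(name=name, k=n - m),
            "label": "entailment",     # capability: numerical reasoning
        }

for case in instantiate():
    print(case["premise"], "=>", case["hypothesis"])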
【91】 An Annotated Video Dataset for Computing Video Memorability 标题:一种用于计算视频记忆性的带注释的视频数据集 链接:https://arxiv.org/abs/2112.02303
作者:Rukiye Savran Kiziltepe,Lorin Sweeney,Mihai Gabriel Constantin,Faiyaz Doctor,Alba Garcia Seco de Herrera,Claire-Helene Demarty,Graham Healy,Bogdan Ionescu,Alan F. Smeaton 机构:University of Essex, UK, Insight Centre for Data Analytics, Dublin City University, Glasnevin, Dublin, Ireland, University Politehnica of Bucharest, Romania, InterDigital, R&I, France 备注:None 摘要:1275名用户使用一组公开的短视频剪辑链接(每个视频剪辑的平均时长为6秒),对每个视频进行多次手动标注,以表明视频的长期和短期记忆性。这些标注是作为在线记忆游戏的一部分收集的,测量了参与者在观看一组视频时再认出此前看过的视频的能力。短期记忆性的再认任务在几分钟前看过的视频上执行,长期记忆性的再认任务则在24到72小时前看过的视频上执行。数据包括每个视频每次再认的反应时间。与每个视频相关联的是文本描述(字幕),以及应用于从每个视频提取的3帧(开始、中间和结束)的图像级特征集合。还提供了视频级特征。该数据集在2020年作为MediaEval基准的一部分用于视频可记忆性任务。 摘要:Using a collection of publicly available links to short form video clips of an average of 6 seconds duration each, 1,275 users manually annotated each video multiple times to indicate both long-term and short-term memorability of the videos. The annotations were gathered as part of an online memory game and measured a participant's ability to recall having seen the video previously when shown a collection of videos. The recognition tasks were performed on videos seen within the previous few minutes for short-term memorability and within the previous 24 to 72 hours for long-term memorability. Data includes the reaction times for each recognition of each video. Associated with each video are text descriptions (captions) as well as a collection of image-level features applied to 3 frames extracted from each video (start, middle and end). Video-level features are also provided. The dataset was used in the Video Memorability task as part of the MediaEval benchmark in 2020.
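作为说明,下面用几条虚构的标注记录示意如何把"是否成功再认"聚合为每个视频的短期/长期记忆性得分(真实数据集还包含反应时间、字幕与帧级特征):

```python
from collections import defaultdict

# 虚构记录:(视频id, 时间尺度, 是否成功再认)
records = [
    ("v1", "short", True), ("v1", "short", True), ("v1", "short", False),
    ("v1", "long", True),  ("v1", "long", False),
    ("v2", "short", True), ("v2", "long", True),
]

def memorability_scores(records):
    """每个(视频, 时间尺度)的记忆性 = 正确再认的比例。"""
    hits, total = defaultdict(int), defaultdict(int)
    for vid, horizon, ok in records:
        total[(vid, horizon)] += 1
        hits[(vid, horizon)] += int(ok)
    return {k: hits[k] / total[k] for k in total}

print(memorability_scores(records))
```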
【92】 Stage Conscious Attention Network (SCAN) : A Demonstration-Conditioned Policy for Few-Shot Imitation 标题:阶段感知注意网络(SCAN):一种演示条件化的Few-Shot模仿策略 链接:https://arxiv.org/abs/2112.02278
作者:Jia-Fong Yeh,Chi-Ming Chung,Hung-Ting Su,Yi-Ting Chen,Winston H. Hsu 机构:National Taiwan University, National Yang Ming Chiao Tung University, Mobile Drive Technology 备注:Accepted by AAAI 2022, preprint version, first two authors contribute equally 摘要:在Few-Shot模仿学习(FSIL)中,使用行为克隆(BC)以较少的专家演示来解决未见过的任务已成为一个流行的研究方向。以下能力在机器人应用中至关重要:(1)在包含多个阶段的复合任务中执行;(2)从少量长度可变且未对齐的演示中检索知识;(3)向不同的专家学习。此前没有任何工作能同时实现这些能力。在这项工作中,我们在上述设置的并集下研究FSIL问题,并引入一种新的阶段感知注意网络(SCAN),同时从少量演示中检索知识。SCAN使用注意模块识别长度可变演示中的每个阶段。此外,它基于演示条件化策略设计,学习专家与智能体之间的关系。实验结果表明,SCAN无需微调即可向不同的专家学习,在复杂的复合任务中优于基线,并具有可解释的可视化效果。 摘要:In few-shot imitation learning (FSIL), using behavioral cloning (BC) to solve unseen tasks with few expert demonstrations becomes a popular research direction. The following capabilities are essential in robotics applications: (1) Behaving in compound tasks that contain multiple stages. (2) Retrieving knowledge from few length-variant and misalignment demonstrations. (3) Learning from a different expert. No previous work can achieve these abilities at the same time. In this work, we conduct FSIL problem under the union of above settings and introduce a novel stage conscious attention network (SCAN) to retrieve knowledge from few demonstrations simultaneously. SCAN uses an attention module to identify each stage in length-variant demonstrations. Moreover, it is designed under demonstration-conditioned policy that learns the relationship between experts and agents. Experiment results show that SCAN can learn from different experts without fine-tuning and outperform baselines in complicated compound tasks with explainable visualization.
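下面用numpy给出一个单头点积注意力的最小示意:智能体当前状态对一条长度可变演示的各帧做注意力,以定位"当前处于哪个阶段"。维度、单头设计与随机数据均为简化假设,并非论文注意力模块的实现。

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

d = 8
demo = rng.normal(size=(12, d))   # 一条演示的12帧嵌入
state = rng.normal(size=(d,))     # 智能体当前观测的嵌入

scores = demo @ state / np.sqrt(d)   # 与每帧的相似度
weights = softmax(scores)            # 注意力权重:我们处于演示的哪个阶段
context = weights @ demo             # 阶段感知的演示上下文向量

print("注意力最高的帧:", int(weights.argmax()))
print("上下文向量维度:", context.shape)
```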
【93】 A Multi-Strategy based Pre-Training Method for Cold-Start Recommendation 标题:一种基于多策略的冷启动推荐预训练方法 链接:https://arxiv.org/abs/2112.02275
作者:Bowen Hao,Hongzhi Yin,Jing Zhang,Cuiping Li,Hong Chen 备注:12 pages, 6 figures 摘要:冷启动问题是推荐任务面临的一个基本挑战。最近,基于图神经网络(GNN)的自监督学习(SSL)模型PT-GNN对GNN模型进行预训练以重构冷启动嵌入,展现出在冷启动推荐上的巨大潜力。然而,由于过平滑问题,PT-GNN最多只能捕获3阶关系,无法提供太多有用的辅助信息来刻画目标冷启动用户或项目。此外,嵌入重构任务只考虑用户和项目子图内部的相关性,而忽略了不同子图之间的相互相关性。为了解决上述问题,我们提出了一种基于多策略的冷启动推荐预训练方法(MPT),该方法从模型架构和前置任务(pretext task)两个角度对PT-GNN进行扩展,以提高冷启动推荐性能。具体来说,在模型架构方面,除了GNN编码器捕获的用户和项目的短程依赖之外,我们还引入了Transformer编码器来捕获长程依赖。在前置任务方面,除了通过嵌入重构任务考虑用户和项目的内部相关性(intra-correlations)外,我们还添加了嵌入对比学习任务来捕捉用户和项目之间的相互相关性(inter-correlations)。我们在元学习设置下基于这些前置任务训练GNN和Transformer编码器,以模拟真实的冷启动场景,使模型能够轻松快速地适应新的冷启动用户和项目。在三个公共推荐数据集上的实验表明,在用户/项目嵌入推断和推荐任务上,所提出的MPT模型优于普通GNN模型和预训练GNN模型。 摘要:Cold-start problem is a fundamental challenge for recommendation tasks. The recent self-supervised learning (SSL) on Graph Neural Networks (GNNs) model, PT-GNN, pre-trains the GNN model to reconstruct the cold-start embeddings and has shown great potential for cold-start recommendation. However, due to the over-smoothing problem, PT-GNN can only capture up to 3-order relation, which can not provide much useful auxiliary information to depict the target cold-start user or item. Besides, the embedding reconstruction task only considers the intra-correlations within the subgraph of users and items, while ignoring the inter-correlations across different subgraphs. To solve the above challenges, we propose a multi-strategy based pre-training method for cold-start recommendation (MPT), which extends PT-GNN from the perspective of model architecture and pretext tasks to improve the cold-start recommendation performance. Specifically, in terms of the model architecture, in addition to the short-range dependencies of users and items captured by the GNN encoder, we introduce a Transformer encoder to capture long-range dependencies. In terms of the pretext task, in addition to considering the intra-correlations of users and items by the embedding reconstruction task, we add embedding contrastive learning task to capture inter-correlations of users and items. We train the GNN and Transformer encoders on these pretext tasks under the meta-learning setting to simulate the real cold-start scenario, making the model easily and rapidly being adapted to new cold-start users and items. Experiments on three public recommendation datasets show the superiority of the proposed MPT model against the vanilla GNN models, the pre-training GNN model on user/item embedding inference and the recommendation task.
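下面是嵌入重构这一前置任务的目标函数示意(编码器用随机线性映射代替,数据与维度均为假设;真实模型是GNN加Transformer编码器,并在元学习设置下训练):

```python
import numpy as np

rng = np.random.default_rng(1)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

d = 16
target = rng.normal(size=(d,))            # 由完整交互得到的"真实"嵌入
W = rng.normal(size=(d, d)) * 0.1         # 代替编码器的随机线性映射(假设)
cold_view = target + rng.normal(size=d)   # 稀疏、带噪的冷启动视图
reconstructed = W @ cold_view

loss = 1.0 - cosine(reconstructed, target)   # 越小越好,0表示完美重构
print(f"重构损失: {loss:.3f}")
```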
【94】 Self-supervised Graph Learning for Occasional Group Recommendation 标题:用于偶发群体推荐的自监督图学习 链接:https://arxiv.org/abs/2112.02274
作者:Bowen Hao,Hongzhi Yin,Jing Zhang,Cuiping Li,Hong Chen 机构:Renmin University of China, The University of Queensland, Australia 备注:11 pages, 6 figures 摘要:我们研究向临时组(occasional groups,又称冷启动组)推荐项目的问题,这类组是临时形成的,几乎没有历史交互项目。由于临时组与项目交互的极端稀疏性问题,很难为这些临时组学习高质量的嵌入。尽管图神经网络(GNN)的最新进展通过引入高阶协同信号来缓解该问题,但在GNN的图卷积过程中并未明确考虑高阶冷启动邻居。本文提出了一种自监督图学习范式,该范式在元学习设置下联合训练主干GNN模型重构组/用户/项目嵌入,从而直接提高嵌入质量,并能轻松适应新的临时组。为了进一步减少冷启动邻居的影响,我们加入了一个基于自注意的元聚合器,以增强每个图卷积步骤的聚合能力。此外,我们还添加了对比学习(CL)适配器,以明确考虑组成员与非组成员之间的相关性。在三个公共推荐数据集上的实验结果显示了我们提出的模型相对于最先进的群体推荐方法的优越性。 摘要:We study the problem of recommending items to occasional groups (a.k.a. cold-start groups), where the occasional groups are formed ad-hoc and have few or no historical interacted items. Due to the extreme sparsity issue of the occasional groups' interactions with items, it is difficult to learn high-quality embeddings for these occasional groups. Despite the recent advances on Graph Neural Networks (GNNs) that incorporate high-order collaborative signals to alleviate the problem, the high-order cold-start neighbors are not explicitly considered during the graph convolution in GNNs. This paper proposes a self-supervised graph learning paradigm, which jointly trains the backbone GNN model to reconstruct the group/user/item embeddings under the meta-learning setting, such that it can directly improve the embedding quality and can be easily adapted to the new occasional groups. To further reduce the impact from the cold-start neighbors, we incorporate a self-attention-based meta aggregator to enhance the aggregation ability of each graph convolution step. Besides, we add a contrastive learning (CL) adapter to explicitly consider the correlations between the group and non-group members. Experimental results on three public recommendation datasets show the superiority of our proposed model against the state-of-the-art group recommendation methods.
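对比学习(CL)适配器的核心可以用一个InfoNCE风格的损失来示意:把临时组嵌入拉近其成员、推离非成员。下面是通用写法,嵌入与温度参数均为假设取值,并非论文的具体结构:

```python
import numpy as np

rng = np.random.default_rng(2)

def info_nce(group, members, non_members, tau=0.2):
    """InfoNCE风格损失:组嵌入与成员为正样本对,与非成员为负样本对。"""
    def sim(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    pos = np.array([sim(group, m) for m in members]) / tau
    neg = np.array([sim(group, n) for n in non_members]) / tau
    # 每个正样本与全部负样本做对比
    losses = [-p + np.log(np.exp(p) + np.exp(neg).sum()) for p in pos]
    return float(np.mean(losses))

d = 16
group = rng.normal(size=d)
members = rng.normal(size=(3, d)) + group   # 与组相关的成员
non_members = rng.normal(size=(5, d))       # 无关用户
print(f"对比损失: {info_nce(group, members, non_members):.3f}")
```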
【95】 Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding 标题:为源代码理解架起预先训练的模型和下游任务之间的桥梁 链接:https://arxiv.org/abs/2112.02268
作者:Deze Wang,Zhouyang Jia,Shanshan Li,Yue Yu,Yun Xiong,Wei Dong,Xiangke Liao 机构:National University of Defense Technology, Changsha, Hunan, China, Fudan University, Shanghai, China 备注:Accepted to the 44th International Conference on Software Engineering (ICSE 2022) 摘要:随着预训练模型的巨大成功,pretrain-then-finetune范式已被广泛应用于源代码理解的下游任务。然而,与从头训练大规模模型的高昂成本相比,如何有效地使预训练模型适应新任务还没有得到充分的探索。在本文中,我们提出了一种桥接预训练模型和代码相关任务的方法。我们利用保语义变换来丰富下游数据的多样性,并帮助预训练模型学习对这些语义等价变换保持不变的语义特征。此外,我们还引入课程学习,以由易到难的方式组织变换后的数据来微调现有的预训练模型。我们将我们的方法应用于一系列预训练模型,它们在源代码理解任务(如算法分类、代码克隆检测和代码搜索)上的表现明显优于最先进的模型。我们的实验甚至表明,在没有对代码数据进行大量预训练的情况下,使用我们的轻量级方法微调的自然语言预训练模型RoBERTa,可以优于或媲美在上述任务上微调的现有代码预训练模型(如CodeBERT和GraphCodeBERT)。这一发现表明,代码预训练模型仍有很大的改进空间。 摘要:With the great success of pre-trained models, the pretrain-then-finetune paradigm has been widely adopted on downstream tasks for source code understanding. However, compared to costly training a large-scale model from scratch, how to effectively adapt pre-trained models to a new task has not been fully explored. In this paper, we propose an approach to bridge pre-trained models and code-related tasks. We exploit semantic-preserving transformation to enrich downstream data diversity, and help pre-trained models learn semantic features invariant to these semantically equivalent transformations. Further, we introduce curriculum learning to organize the transformed data in an easy-to-hard manner to fine-tune existing pre-trained models. We apply our approach to a range of pre-trained models, and they significantly outperform the state-of-the-art models on tasks for source code understanding, such as algorithm classification, code clone detection, and code search. Our experiments even show that without heavy pre-training on code data, natural language pre-trained model RoBERTa fine-tuned with our lightweight approach could outperform or rival existing code pre-trained models fine-tuned on the above tasks, such as CodeBERT and GraphCodeBERT. This finding suggests that there is still much room for improvement in code pre-trained models.
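下面示意一种保语义变换(统一重命名标识符)以及"由易到难"的课程排序。注意:为简洁起见,这里对所有标识符(含自由变量)都做重命名,真实实现应跳过自由变量和内置名;代码片段与排序键均为本文假设。

```python
import ast

def rename_variables(source, prefix="v"):
    """对源代码中的标识符做一致重命名(保语义变换的简化版)。"""
    tree = ast.parse(source)
    mapping, counter = {}, iter(range(1000))

    class Renamer(ast.NodeTransformer):
        def visit_Name(self, node):
            if node.id not in mapping:
                mapping[node.id] = f"{prefix}{next(counter)}"
            node.id = mapping[node.id]
            return node

    return ast.unparse(Renamer().visit(tree))  # 需要Python 3.9+

snippet = "total = 0\nfor item in data:\n    total = total + item"
print(rename_variables(snippet))

# 课程学习:按施加的变换数量由易到难排序(0 = 原始代码,最易)
examples = [(2, "t2_code"), (0, "orig_code"), (1, "t1_code")]
curriculum = [code for _, code in sorted(examples)]
print(curriculum)  # ['orig_code', 't1_code', 't2_code']
```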
【96】 Towards the One Learning Algorithm Hypothesis: A System-theoretic Approach 标题:迈向“单一学习算法”假设:一种系统论方法 链接:https://arxiv.org/abs/2112.02256
作者:Christos Mavridis,John Baras 机构:Department of Electrical and Computer Engineering and the Institute for Systems Research, University of Maryland 备注:arXiv admin note: text overlap with arXiv:2102.05836 摘要:人类认知中存在一种通用学习架构,是一个得到神经科学实验结果支持、广为流传的猜想。虽然还无法给出低层次的实现,但人们认为人类感知和学习的抽象轮廓包含三个基本属性:(a)层次化注意和处理,(b)基于记忆的知识表示,以及(c)渐进学习和知识压缩。我们从系统理论的角度来设计这样一个学习体系结构,开发了一个包含三个主要组件的闭环系统:(i)多分辨率分析预处理器,(ii)群不变特征提取器,(iii)基于知识的渐进式学习模块。多分辨率反馈回路用于学习,即用于使系统参数适应在线观测。为了设计(i)和(ii),我们以成熟的基于小波的多分辨率分析理论和群卷积算子的性质为基础。关于(iii),我们介绍了一种新的学习算法,该算法在多个分辨率下构造逐步增长的知识表示。该算法是基于退火优化的在线确定性退火(ODA)算法的扩展,采用无梯度随机近似求解。ODA具有固有的鲁棒性和正则化特性,并提供了一种通过直观的分岔现象逐步增加学习模型复杂性的方法,即根据需要增加神经元的数量。所提出的多分辨率方法具有层次性、渐进性、基于知识和可解释性。我们在最先进的学习算法和深度学习方法的背景下说明了所提出的体系结构的特性。 摘要:The existence of a universal learning architecture in human cognition is a widely spread conjecture supported by experimental findings from neuroscience. While no low-level implementation can be specified yet, an abstract outline of human perception and learning is believed to entail three basic properties: (a) hierarchical attention and processing, (b) memory-based knowledge representation, and (c) progressive learning and knowledge compaction. We approach the design of such a learning architecture from a system-theoretic viewpoint, developing a closed-loop system with three main components: (i) a multi-resolution analysis pre-processor, (ii) a group-invariant feature extractor, and (iii) a progressive knowledge-based learning module. Multi-resolution feedback loops are used for learning, i.e., for adapting the system parameters to online observations. To design (i) and (ii), we build upon the established theory of wavelet-based multi-resolution analysis and the properties of group convolution operators. Regarding (iii), we introduce a novel learning algorithm that constructs progressively growing knowledge representations in multiple resolutions. The proposed algorithm is an extension of the Online Deterministic Annealing (ODA) algorithm based on annealing optimization, solved using gradient-free stochastic approximation. ODA has inherent robustness and regularization properties and provides a means to progressively increase the complexity of the learning model i.e. the number of the neurons, as needed, through an intuitive bifurcation phenomenon. The proposed multi-resolution approach is hierarchical, progressive, knowledge-based, and interpretable. We illustrate the properties of the proposed architecture in the context of the state-of-the-art learning algorithms and deep learning methods.
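下面给出在线确定性退火思想的极简示意:在逐渐降低的温度T下,用无梯度的随机近似式更新对两个码向量做Gibbs软分配;论文中的分岔/码向量分裂机制在此省略,数据、温度调度与步长均为演示用的假设取值(真实ODA使用随机近似增益而非常数步长):

```python
import numpy as np

rng = np.random.default_rng(3)

# 两个高斯簇构成的在线数据流(虚构数据)
data = np.concatenate([rng.normal(-2, 0.3, (500, 2)),
                       rng.normal(+2, 0.3, (500, 2))])
rng.shuffle(data)

mu = rng.normal(size=(2, 2)) * 0.01   # 初始几乎重合的两个码向量
T, cooling, step = 2.0, 0.995, 0.05   # 温度调度与步长(演示用简化取值)

for x in data:
    d2 = ((mu - x) ** 2).sum(axis=1)
    p = np.exp(-(d2 - d2.min()) / T)  # Gibbs软分配(减最小值保证数值稳定)
    p /= p.sum()
    mu += step * p[:, None] * (x - mu)   # 无梯度的随机近似式更新
    T *= cooling                          # 退火:温度缓慢下降

print(np.round(mu, 1))   # 两个码向量应分别落在两个簇中心(约±2)附近
```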
【97】 In Search of Ambiguity: A Three-Stage Workflow Design to Clarify Annotation Guidelines for Crowd Workers 标题:寻找歧义:为众包工作者澄清标注指南的三阶段工作流设计 链接:https://arxiv.org/abs/2112.02255
作者:Vivek Krishna Pradhan,Mike Schaekermann,Matthew Lease 机构:University of Texas at Austin, Amazon AI (work done prior to Amazon) 备注:22 pages, 5 figures, 4 tables 摘要:我们提出了一种新的三阶段众包标注工作流FIND-RESOLVE-LABEL(查找-解决-标注),以减少任务指令中的歧义,从而提高标注质量。第1阶段(查找)要求人群在给定任务说明的情况下查找正确标签不明确的示例。工作人员还被要求提供一个简短的标记(tag),描述所发现的特定实例所体现的歧义概念。我们在此阶段比较了协作设计与非协作设计。在第2阶段(解决),请求者从这些歧义示例中选择一个或多个进行标注(从而消解歧义)。新标签会自动注入到任务指令中,以提高清晰度。最后,在第3阶段(标注),工人使用附带澄清示例的修订指南进行实际标注。我们比较了使用这些示例的三种设计:仅示例、仅标记或两者兼用。我们使用亚马逊的Mechanical Turk报告了六种任务设计下的图像标注实验。结果表明标注准确性有所提高,并为众包标注任务的有效设计提供了进一步的见解。 摘要:We propose a novel three-stage FIND-RESOLVE-LABEL workflow for crowdsourced annotation to reduce ambiguity in task instructions and thus improve annotation quality. Stage 1 (FIND) asks the crowd to find examples whose correct label seems ambiguous given task instructions. Workers are also asked to provide a short tag which describes the ambiguous concept embodied by the specific instance found. We compare collaborative vs. non-collaborative designs for this stage. In Stage 2 (RESOLVE), the requester selects one or more of these ambiguous examples to label (resolving ambiguity). The new label(s) are automatically injected back into task instructions in order to improve clarity. Finally, in Stage 3 (LABEL), workers perform the actual annotation using the revised guidelines with clarifying examples. We compare three designs for using these examples: examples only, tags only, or both. We report image labeling experiments over six task designs using Amazon's Mechanical Turk. Results show improved annotation accuracy and further insights regarding effective design for crowdsourced annotation tasks.
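第2阶段"把消解后的示例注回任务说明"可以用一个简单的字符串模板来示意(说明文本与数据结构均为本文虚构,并非论文系统的实现):

```python
def revise_instructions(base_instructions, resolved):
    """把请求者标注过的歧义示例作为澄清案例附到任务说明末尾。"""
    lines = [base_instructions, "", "澄清示例:"]
    for item in resolved:
        lines.append(f"- 图像「{item['example']}」(歧义点: {item['tag']})"
                     f"的正确标签是「{item['label']}」")
    return "\n".join(lines)

base = "如果图像中包含狗,请标注为'dog'。"
resolved = [{"example": "狼的照片", "tag": "犬科易混淆", "label": "not-dog"}]
print(revise_instructions(base, resolved))
```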
【98】 A Game-Theoretic Approach for AI-based Botnet Attack Defence 标题:基于人工智能的僵尸网络攻击防御的博弈论方法 链接:https://arxiv.org/abs/2112.02223
作者:Hooman Alavizadeh,Julian Jang-Jaccard,Tansu Alpcan,Seyit A. Camtepe 机构:Cybersecurity Lab, Comp Sci/Info Tech, Massey University, Auckland, New Zealand; Department of Electrical and Electronic Engineering, University of Melbourne, Parkville, Australia; Data61, CSIRO 摘要:新一代僵尸网络利用人工智能(AI)技术隐藏僵尸网络主控者(botmaster)的身份和攻击意图以避免被检测。不幸的是,目前还没有一个评估工具能够评估现有防御策略对这种基于人工智能的僵尸网络攻击的有效性。在本文中,我们提出了一个序贯博弈模型,能够细致分析僵尸网络攻击者和防御者为达到纳什均衡(NE)而可能使用的潜在策略。效用函数是在如下假设下计算的:攻击者以最小的攻击成本发动最大数量的DDoS攻击,而防御者以最小的防御成本使用最大数量的防御策略。我们针对不同(模拟)云带大小和不同攻击成功率下所涉及的各种防御策略进行了数值分析。我们的实验结果证实,防御的成功在很大程度上取决于根据对攻击率的仔细评估所使用的防御策略数量。 摘要:The new generation of botnets leverages Artificial Intelligent (AI) techniques to conceal the identity of botmasters and the attack intention to avoid detection. Unfortunately, there has not been an existing assessment tool capable of evaluating the effectiveness of existing defense strategies against this kind of AI-based botnet attack. In this paper, we propose a sequential game theory model that is capable to analyse the details of the potential strategies botnet attackers and defenders could use to reach Nash Equilibrium (NE). The utility function is computed under the assumption when the attacker launches the maximum number of DDoS attacks with the minimum attack cost while the defender utilises the maximum number of defense strategies with the minimum defense cost. We conduct a numerical analysis based on a various number of defense strategies involved on different (simulated) cloud-band sizes in relation to different attack success rate values. Our experimental results confirm that the success of defense highly depends on the number of defense strategies used according to careful evaluation of attack rates.
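作为说明,下面用一个虚构的攻防收益矩阵示意如何暴力检查纯策略纳什均衡(论文的模型是序贯博弈且成本结构更丰富,这里将其简化为零和矩阵博弈,属于本文的假设):

```python
import numpy as np

# 行:攻击者策略(DDoS攻击强度档位);列:防御者策略档位;数值为虚构
atk_payoff = np.array([[3, 1, 0],
                       [5, 2, -1],
                       [6, 1, -2]])
def_payoff = -atk_payoff   # 零和简化(假设)

def pure_nash(A, D):
    """返回所有纯策略纳什均衡 (i, j):双方都无单边偏离动机。"""
    eqs = []
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            if A[i, j] >= A[:, j].max() and D[i, j] >= D[i, :].max():
                eqs.append((i, j))
    return eqs

print(pure_nash(atk_payoff, def_payoff))   # 本例中为 [(0, 2)]
```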
【99】 Math Programming based Reinforcement Learning for Multi-Echelon Inventory Management 标题:基于数学规划的强化学习在多级库存管理中的应用 链接:https://arxiv.org/abs/2112.02215
作者:Pavithra Harsha,Ashish Jagmohan,Jayant R. Kalagnanam,Brian Quanz,Divya Singhvi 机构:IBM Research, Thomas J. Watson Research Center, Yorktown Heights, NY, USA, Stern School of Business, New York University, New York, NY, USA 备注:Accepted to NeurIPS 2021 Deep RL Workshop. Authors are listed in alphabetical order 摘要:强化学习在机器人、游戏和许多其他领域取得了重大突破。但是,在复杂的现实决策问题中,RL的应用仍然有限。运营管理中的许多问题(例如库存和收入管理)的特点是大动作空间和随机系统动力学。这些特点使得依赖枚举技术求解每步动作问题的现有RL方法很难解决此类问题。为了解决这些问题,我们开发了可编程参与者强化学习(PARL),这是一种使用整数规划和样本平均近似技术的策略迭代方法。通过理论分析,我们证明了对于给定的评论家(critic),当底层不确定性的样本数趋于无穷大时,每次迭代中学习到的策略收敛到最优策略。在实践中,我们证明了对底层不确定性分布进行适当选择的离散化,即使样本很少,也能产生接近最优的参与者策略。然后,我们将我们的算法应用于具有复杂供应链结构的实际库存管理问题,并表明PARL在这些设置下优于最先进的RL和库存优化方法。我们发现,在不同的供应链环境中,PARL平均比常用的基础库存(base stock)启发式方法高出44.7%,比表现最好的RL方法平均最多高出12.1%。 摘要:Reinforcement learning has led to considerable breakthroughs in diverse areas such as robotics, games and many others. But the application of RL in complex real-world decision making problems remains limited. Many problems in operations management (inventory and revenue management, for example) are characterized by large action spaces and stochastic system dynamics. These characteristics make the problem considerably harder to solve for existing RL methods that rely on enumeration techniques to solve per step action problems. To resolve these issues, we develop Programmable Actor Reinforcement Learning (PARL), a policy iteration method that uses techniques from integer programming and sample average approximation. Analytically, we show that for a given critic, the learned policy in each iteration converges to the optimal policy as the underlying samples of the uncertainty go to infinity. Practically, we show that a properly selected discretization of the underlying uncertain distribution can yield near optimal actor policy even with very few samples from the underlying uncertainty. We then apply our algorithm to real-world inventory management problems with complex supply chain structures and show that PARL outperforms state-of-the-art RL and inventory optimization methods in these settings. We find that PARL outperforms commonly used base stock heuristic by 44.7% and the best performing RL method by up to 12.1% on average across different supply chain environments.
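下面示意PARL中"样本平均近似(SAA)求解单步动作"的思路:对每个候选订货量,在抽样的需求情景上平均"即时收益 + 评论家价值"。评论家函数、价格参数与动作网格均为本文假设;真实PARL用整数规划求解该子问题,这里因动作网格很小而直接枚举。

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)

def critic(inv):                       # 代替已学评论家的简单函数(假设)
    return -0.1 * np.abs(inv).sum()

price, cost, holding = 5.0, 2.0, 0.5
demand_samples = rng.poisson(lam=[4, 7], size=(200, 2))   # 两种商品的需求情景

def saa_value(order, inv, samples):
    """订货量order的SAA目标:情景平均(即时收益 + critic(下期库存))。"""
    vals = []
    for d in samples:
        stock = inv + order
        sales = np.minimum(stock, d)
        nxt = stock - sales
        reward = price * sales.sum() - cost * order.sum() - holding * nxt.sum()
        vals.append(reward + critic(nxt))
    return np.mean(vals)

inv = np.array([1, 0])
actions = [np.array(a) for a in product(range(0, 11, 2), repeat=2)]
best = max(actions, key=lambda a: saa_value(a, inv, demand_samples))
print("最优订货量:", best)
```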
【100】 Behind the Curtain: Learning Occluded Shapes for 3D Object Detection 标题:幕后:学习用于3D对象检测的遮挡形状 链接:https://arxiv.org/abs/2112.02205
作者:Qiangeng Xu,Yiqi Zhong,Ulrich Neumann 机构:University of Southern California 备注:None 摘要:激光雷达传感器的进步提供了支持3D场景理解的丰富3D数据。然而,由于遮挡和信号缺失,激光雷达点云实际上是2.5D的,因为它们只覆盖部分底层形状,这对3D感知提出了根本性的挑战。为了应对这一挑战,我们提出了一种新的基于激光雷达的3D物体检测模型,称为幕后探测器(Behind the Curtain Detector, BtcDet),该模型学习物体形状先验,并估计点云中被部分遮挡(curtained)的完整物体形状。BtcDet首先识别受遮挡和信号缺失影响的区域。在这些区域中,我们的模型预测占据概率,以表明区域是否包含物体形状。结合此概率图,BtcDet可以生成高质量的3D提案。最后,占据概率还被整合到提案细化模块中,以生成最终边界框。在KITTI数据集和Waymo开放数据集上的大量实验证明了BtcDet的有效性。特别是在KITTI基准上对汽车和自行车的3D检测方面,BtcDet以显著优势超过了所有已发表的最新方法。代码已发布:https://github.com/Xharlie/BtcDet 摘要:Advances in LiDAR sensors provide rich 3D data that supports 3D scene understanding. However, due to occlusion and signal miss, LiDAR point clouds are in practice 2.5D as they cover only partial underlying shapes, which poses a fundamental challenge to 3D perception. To tackle the challenge, we present a novel LiDAR-based 3D object detection model, dubbed Behind the Curtain Detector (BtcDet), which learns the object shape priors and estimates the complete object shapes that are partially occluded (curtained) in point clouds. BtcDet first identifies the regions that are affected by occlusion and signal miss. In these regions, our model predicts the probability of occupancy that indicates if a region contains object shapes. Integrated with this probability map, BtcDet can generate high-quality 3D proposals. Finally, the probability of occupancy is also integrated into a proposal refinement module to generate the final bounding boxes. Extensive experiments on the KITTI Dataset and the Waymo Open Dataset demonstrate the effectiveness of BtcDet. Particularly, for the 3D detection of both cars and cyclists on the KITTI benchmark, BtcDet surpasses all of the published state-of-the-art methods by remarkable margins. Code is released at https://github.com/Xharlie/BtcDet.
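"把占据概率图整合进提案打分"的思想可以用如下玩具代码示意(体素网格、提案框和概率均为随机假设;BtcDet中的占据概率由学到的形状先验预测,而非随机生成):

```python
import numpy as np

rng = np.random.default_rng(5)

occ = rng.uniform(size=(20, 20, 20))   # P(该体素区域包含物体形状)

def rescore(proposal_score, box, occ):
    """用框内占据概率的均值调制3D提案的置信度。"""
    x0, y0, z0, x1, y1, z1 = box
    return proposal_score * occ[x0:x1, y0:y1, z0:z1].mean()

box = (4, 4, 4, 10, 10, 8)   # 体素坐标下的提案框(假设)
print(f"重打分后的置信度: {rescore(0.9, box, occ):.3f}")
```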
【101】 NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference 标题:NN-LUT:用于有效Transformer推理的非线性运算的神经逼近 链接:https://arxiv.org/abs/2112.02191
作者:Joonsang Yu,Junki Park,Seongmin Park,Minsoo Kim,Sihwa Lee,Dong Hyun Lee,Jungwook Choi 机构:NAVER AI Lab, Face, NAVER Clova, SAIT, Hanyang University 备注:7 pages, 3 figures 摘要:GELU、层归一化和Softmax等非线性操作是Transformer模型必不可少但代价高昂的组成部分。之前的一些工作通过查找表或整数计算简化了这些操作,但此类近似要么精度较低,要么硬件成本高、延迟长。本文提出了一种精确且硬件友好的近似框架,用于高效的Transformer推理。我们的框架采用简单的神经网络作为通用近似器,其结构被等价地转换为查找表(LUT)。所提出的框架称为NN-LUT,可以准确地替换流行的BERT模型中的所有非线性操作,并显著减少面积、功耗和延迟。 摘要:Non-linear operations such as GELU, Layer normalization, and Softmax are essential yet costly building blocks of Transformer models. Several prior works simplified these operations with look-up tables or integer computations, but such approximations suffer inferior accuracy or considerable hardware cost with long latency. This paper proposes an accurate and hardware-friendly approximation framework for efficient Transformer inference. Our framework employs a simple neural network as a universal approximator with its structure equivalently transformed into a LUT. The proposed framework called NN-LUT can accurately replace all the non-linear operations in popular BERT models with significant reductions in area, power consumption, and latency.
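下面用GELU演示"查找表 + 线性插值"这一LUT侧的最小示意:NN-LUT实际是先用小网络学习分段点再等价转换为LUT,这里改用均匀分段点,属于本文为简洁而做的假设。

```python
import numpy as np

def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

breaks = np.linspace(-6, 6, 33)    # 32段均匀分段(假设;NN-LUT学习分段位置)
table = gelu(breaks)

def gelu_lut(x):
    return np.interp(x, breaks, table)   # 分段线性插值,表范围外取端点值

x = np.linspace(-6, 6, 10001)
err = np.abs(gelu_lut(x) - gelu(x)).max()
print(f"32段LUT在表内的最大绝对误差: {err:.4f}")
```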
【102】 CTIN: Robust Contextual Transformer Network for Inertial Navigation 标题:CTIN:面向惯性导航的鲁棒上下文Transformer网络 链接:https://arxiv.org/abs/2112.02143
作者:Bingbing Rao,Ehsan Kazemi,Yifan Ding,Devu M Shila,Frank M. Tucker,Liqiang Wang 机构:Department of Computer Science, University of Central Florida, Orlando, FL, USA, Unknot.id Inc., Orlando, FL, USA, U.S. Army CCDC SC, Orlando, FL, USA 备注:Accepted as technical research paper in 36th AAAI Conference on Artificial Intelligence, 2022 摘要:近年来,数据驱动的惯性导航方法已经证明,利用训练良好的神经网络可以从惯性测量单元(IMU)的测量中获得精确的位置估计。本文提出了一种新的鲁棒的基于上下文Transformer的惯性导航网络(CTIN),以准确预测速度和轨迹。为此,我们首先设计了一个由局部和全局多头自注意增强的基于ResNet的编码器,以从IMU测量中捕获空间上下文信息。然后,我们通过在Transformer解码器中利用多头注意,将这些空间表示与时间知识融合。最后,利用带不确定性削减的多任务学习来提高速度和轨迹的学习效率与预测精度。通过在各种惯性数据集(例如RIDI、OxIOD、RoNIN、IDOL以及我们自建的数据集)上的大量实验,CTIN表现得非常稳健,并优于最先进的模型。 摘要:Recently, data-driven inertial navigation approaches have demonstrated their capability of using well-trained neural networks to obtain accurate position estimates from inertial measurement units (IMU) measurements. In this paper, we propose a novel robust Contextual Transformer-based network for Inertial Navigation (CTIN) to accurately predict velocity and trajectory. To this end, we first design a ResNet-based encoder enhanced by local and global multi-head self-attention to capture spatial contextual information from IMU measurements. Then we fuse these spatial representations with temporal knowledge by leveraging multi-head attention in the Transformer decoder. Finally, multi-task learning with uncertainty reduction is leveraged to improve learning efficiency and prediction accuracy of velocity and trajectory. Through extensive experiments over a wide range of inertial datasets (e.g., RIDI, OxIOD, RoNIN, IDOL, and our own), CTIN is very robust and outperforms state-of-the-art models.
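其中"带不确定性的多任务学习"常用Kendall式的同方差不确定性加权来实现。下面给出这种损失组合的示意(速度与轨迹损失值、可学习的log方差均为假设;论文的具体公式可能不同,此处仅为常见做法的演示):

```python
import numpy as np

def uncertainty_weighted_loss(losses, log_vars):
    """L = sum_i exp(-s_i) * L_i + s_i,其中 s_i 为第i个任务的log方差。"""
    losses, log_vars = np.asarray(losses), np.asarray(log_vars)
    return float(np.sum(np.exp(-log_vars) * losses + log_vars))

vel_loss, traj_loss = 0.8, 2.5     # 两个任务的损失(假设值)
s = np.array([0.0, 1.0])           # 可学习参数:更不确定的任务权重更小
print(f"组合损失: {uncertainty_weighted_loss([vel_loss, traj_loss], s):.3f}")
```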
【103】 Combining Embeddings and Fuzzy Time Series for High-Dimensional Time Series Forecasting in Internet of Energy Applications 标题:嵌入与模糊时间序列相结合的高维时间序列预测在能源互联网中的应用 链接:https://arxiv.org/abs/2112.02140
作者:Hugo Vinicius Bitencourt,Luiz Augusto Facury de Souza,Matheus Cascalho dos Santos,Petrônio Cândido de Lima e Silva,Frederico Gadelha Guimarães 机构:Graduate Program in Electrical Engineering, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil 备注:18 pages, 3 figures 摘要:住宅用电预测对于帮助智能电网管理和节约能源以确保高效使用至关重要。在客户层面进行准确的能源预测将直接反映为整个电网系统的效率提升,然而,由于气象和占用模式等诸多影响因素,预测建筑能源使用是一项复杂的任务。此外,随着多传感器环境的出现以及能源消费者与智能电网之间的双向通信,高维时间序列越来越多地出现在能源互联网(IoE)中。因此,能够处理高维时间序列的方法在智能建筑和IoE应用中具有重要价值。模糊时间序列(FTS)模型作为数据驱动的非参数模型,具有易于实现和高精度的特点。不幸的是,如果使用所有特征来训练模型,现有的FTS模型可能不可行。我们提出了一种处理高维时间序列的新方法:将原始高维数据投影到低维嵌入空间,并在这种低维表示中使用多元FTS方法。结合这些技术能够更好地表示多元时间序列的复杂内容,并得到更准确的预测。 摘要:The prediction of residential power usage is essential in assisting a smart grid to manage and preserve energy to ensure efficient use. An accurate energy forecasting at the customer level will reflect directly into efficiency improvements across the power grid system, however forecasting building energy use is a complex task due to many influencing factors, such as meteorological and occupancy patterns. In addition, high-dimensional time series increasingly arise in the Internet of Energy (IoE), given the emergence of multi-sensor environments and the two way communication between energy consumers and the smart grid. Therefore, methods that are capable of computing high-dimensional time series are of great value in smart building and IoE applications. Fuzzy Time Series (FTS) models stand out as data-driven non-parametric models of easy implementation and high accuracy. Unfortunately, the existing FTS models can be unfeasible if all features were used to train the model. We present a new methodology for handling high-dimensional time series, by projecting the original high-dimensional data into a low dimensional embedding space and using multivariate FTS approach in this low dimensional representation. Combining these techniques enables a better representation of the complex content of multivariate time series and more accurate forecasts.
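下面把整个流程压缩成一个单变量示意:先用PCA把高维序列投影到一维(嵌入步骤),再用一阶模糊时间序列(均匀划分论域、学习A_i→A_j规则、以后件中点均值做预测)进行预报。论文使用更丰富的嵌入与多元FTS;这里的数据、划分数与一阶规则均为简化假设。

```python
import numpy as np

rng = np.random.default_rng(6)

T, D = 300, 12
X = rng.normal(size=(T, D)) + np.sin(np.linspace(0, 20, T))[:, None]

# 嵌入步骤:PCA取第一主成分
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
z = Xc @ Vt[0]

# 一阶FTS:划分论域并统计模糊集间的转移规则
k = 7
edges = np.linspace(z.min(), z.max(), k + 1)
mids = (edges[:-1] + edges[1:]) / 2
lab = np.clip(np.digitize(z, edges) - 1, 0, k - 1)

rules = [[] for _ in range(k)]
for a, b in zip(lab[:-1], lab[1:]):
    rules[a].append(b)

def forecast(value):
    i = int(np.clip(np.digitize(value, edges) - 1, 0, k - 1))
    nxt = rules[i] or [i]          # 无规则时退化为自身
    return mids[nxt].mean()        # 后件模糊集中点的均值

print(f"下一步预测值: {forecast(z[-1]):.3f}")
```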
【104】 Can OpenAI Codex and Other Large Language Models Help Us Fix Security Bugs? 标题:OpenAI Codex和其他大型语言模型能帮助我们修复安全漏洞吗? 链接:https://arxiv.org/abs/2112.02125
作者:Hammond Pearce,Benjamin Tan,Baleegh Ahmad,Ramesh Karri,Brendan Dolan-Gavitt 备注:16 pages, 16 figures 摘要:人类开发人员可能写出带有网络安全弱点的代码。新兴的"智能"代码补全工具能否帮助修复这些弱点?在这项工作中,我们研究了使用面向代码的大型语言模型(LLM,如OpenAI的Codex和AI21的Jurassic J-1)进行零样本(zero-shot)漏洞修复。我们调查了提示设计中的挑战,即如何设计能诱导LLM生成不安全代码修复版本的提示。这很困难,因为用自然语言表达关键信息的方式在语义和语法上都多种多样。通过对四种商用、黑盒、"现成"LLM以及一个本地训练的模型,在合成、手工构造和真实世界安全漏洞场景的混合集合上进行大规模研究,我们的实验表明,这些LLM合在一起能够修复我们全部(100%)的合成与手工构造场景,以及从真实开源项目中选取的历史bug中58%的漏洞。 摘要:Human developers can produce code with cybersecurity weaknesses. Can emerging 'smart' code completion tools help repair those weaknesses? In this work, we examine the use of large language models (LLMs) for code (such as OpenAI's Codex and AI21's Jurassic J-1) for zero-shot vulnerability repair. We investigate challenges in the design of prompts that coax LLMs into generating repaired versions of insecure code. This is difficult due to the numerous ways to phrase key information -- both semantically and syntactically -- with natural languages. By performing a large scale study of four commercially available, black-box, "off-the-shelf" LLMs, as well as a locally-trained model, on a mix of synthetic, hand-crafted, and real-world security bug scenarios, our experiments show that LLMs could collectively repair 100% of our synthetically generated and hand-crafted scenarios, as well as 58% of vulnerabilities in a selection of historical bugs in real-world open-source projects.
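零样本修复的关键在提示词的组织方式。下面给出一种提示构造的示意(漏洞代码片段为虚构示例;注释掉的complete()只是任意补全接口的占位符,并非任何真实API,也不代表论文使用的提示模板):

```python
VULNERABLE = '''\
char buf[64];
strcpy(buf, user_input);   /* CWE-121: 可能的栈缓冲区溢出 */
'''

def build_repair_prompt(code, language="C"):
    """把不安全代码包进"请给出修复版本"的上下文,供LLM续写。"""
    return (
        f"/* 下面这段{language}代码存在安全漏洞,请改写出修复后的版本。 */\n"
        f"/* 有漏洞的版本 */\n{code}\n"
        f"/* 修复后的版本 */\n"
    )

prompt = build_repair_prompt(VULNERABLE)
print(prompt)
# repaired = complete(prompt, temperature=0.2)   # 占位:换成实际的补全接口
```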
【105】 Two-step Lookahead Bayesian Optimization with Inequality Constraints 标题:不等式约束下的两步超前贝叶斯优化 链接:https://arxiv.org/abs/2112.02833
作者:Yunxiang Zhang,Xiangyu Zhang,Peter I. Frazier 机构:Cornell University 摘要:与预期改进等传统短视方法相比,计算效率高的非短视贝叶斯优化(BO)技术的最新进展提高了查询效率,而计算成本仅略有增加。然而,这些进展在很大程度上局限于无约束优化。对于约束优化,现有的少数非短视BO方法需要大量计算。例如,一种现有的非短视约束BO方法[Lam和Willcox,2017]依赖于对蒙特卡罗rollout采集函数进行计算成本高且不可靠的蛮力无导数优化。借助重参数化技巧在无约束环境下对非短视采集函数进行更高效的基于导数优化的方法,如样本平均近似和无穷小扰动分析,无法推广到约束情形:约束会在采样得到的采集函数曲面中引入不连续性,从而阻碍其优化。此外,我们认为,在约束问题中,非短视甚至更为重要,因为对违反约束的恐惧会使短视方法远离可行域与不可行域之间的边界进行采样,从而减缓紧约束最优解的发现。在本文中,我们提出了一个计算效率高的两步前瞻约束贝叶斯优化采集函数(2-OPT-C),同时支持序贯和批量设置。为了实现快速的采集函数优化,我们开发了一种新的基于似然比的两步最优采集函数梯度无偏估计器,该估计器不使用重参数化技巧。在数值实验中,2-OPT-C通常比以前的方法提高查询效率2倍或更多,在某些情况下提高10倍或更多。 摘要:Recent advances in computationally efficient non-myopic Bayesian optimization (BO) improve query efficiency over traditional myopic methods like expected improvement while only modestly increasing computational cost. These advances have been largely limited, however, to unconstrained optimization. For constrained optimization, the few existing non-myopic BO methods require heavy computation. For instance, one existing non-myopic constrained BO method [Lam and Willcox, 2017] relies on computationally expensive unreliable brute-force derivative-free optimization of a Monte Carlo rollout acquisition function. Methods that use the reparameterization trick for more efficient derivative-based optimization of non-myopic acquisition functions in the unconstrained setting, like sample average approximation and infinitesimal perturbation analysis, do not extend: constraints introduce discontinuities in the sampled acquisition function surface that hinder its optimization. Moreover, we argue here that being non-myopic is even more important in constrained problems because fear of violating constraints pushes myopic methods away from sampling the boundary between feasible and infeasible regions, slowing the discovery of optimal solutions with tight constraints. In this paper, we propose a computationally efficient two-step lookahead constrained Bayesian optimization acquisition function (2-OPT-C) supporting both sequential and batch settings. To enable fast acquisition function optimization, we develop a novel likelihood-ratio-based unbiased estimator of the gradient of the two-step optimal acquisition function that does not use the reparameterization trick. In numerical experiments, 2-OPT-C typically improves query efficiency by 2x or more over previous methods, and in some cases by 10x or more.
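论文估计器的底层工具是似然比(得分函数)技巧:无需重参数化即可估计期望对参数的梯度,因而对不连续的被积函数(约束造成的那种)同样适用。下面在一个玩具问题上演示该技巧本身(仅为原理演示,并非论文针对两步采集函数的估计器):

```python
import numpy as np

rng = np.random.default_rng(7)

def f(x):
    return np.where(x > 0.5, x**2, 0.0)   # 不连续的被积函数

theta, n = 1.0, 200_000
x = rng.normal(theta, 1.0, size=n)
score = x - theta                          # d log N(x; theta, 1) / d theta
grad_lr = np.mean(f(x) * score)            # 似然比梯度估计(无偏)

eps = 1e-2                                 # 有限差分作为参考值
mean = lambda t: np.mean(f(rng.normal(t, 1.0, size=n)))
grad_fd = (mean(theta + eps) - mean(theta - eps)) / (2 * eps)
print(f"似然比估计: {grad_lr:.3f}, 有限差分参考: {grad_fd:.3f}")
```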
【106】 Learning to Search in Local Branching 标题:在局部分支中学习搜索 链接:https://arxiv.org/abs/2112.02195
作者:Defeng Liu,Matteo Fischetti,Andrea Lodi 机构:Canada Excellence Research Chair, Polytechnique Montréal, Department of Information Engineering, University of Padova, Jacobs Technion-Cornell Institute, Cornell University 摘要:寻找混合整数线性规划问题(MILP)的高质量解对于许多实际应用具有重要意义。在这方面,精化启发式方法局部分支(LB)被提出用于产生改进解,并对MILP中局部搜索方法的发展产生了很大影响。该算法迭代地探索由所谓的局部分支约束定义的解邻域序列,即一条限制与参考解距离的线性不等式。对于LB算法,邻域大小的选择对性能至关重要。虽然在原始LB方案中它被初始化为一个保守值,但我们的新观察是,最佳大小强烈依赖于特定的MILP实例。在这项工作中,我们研究了搜索邻域的大小与底层LB算法行为之间的关系,并设计了一个基于学习的框架来指导LB启发式算法的邻域搜索。该框架由两阶段策略组成。在第一阶段,通过回归任务训练一个缩放回归模型,在第一次迭代时预测LB邻域的大小。在第二阶段,我们利用强化学习,设计一种强化邻域搜索策略,在后续迭代中动态调整邻域大小。我们的计算实验表明,邻域大小确实是可以学习的,并带来性能提升;整个算法无论在实例规模上还是跨实例上都具有良好的泛化能力。 摘要:Finding high-quality solutions to mixed-integer linear programming problems (MILPs) is of great importance for many practical applications. In this respect, the refinement heuristic local branching (LB) has been proposed to produce improving solutions and has been highly influential for the development of local search methods in MILP. The algorithm iteratively explores a sequence of solution neighborhoods defined by the so-called local branching constraint, namely, a linear inequality limiting the distance from a reference solution. For a LB algorithm, the choice of the neighborhood size is critical to performance. Although it was initialized by a conservative value in the original LB scheme, our new observation is that the best size is strongly dependent on the particular MILP instance. In this work, we investigate the relation between the size of the search neighborhood and the behavior of the underlying LB algorithm, and we devise a learning-based framework for guiding the neighborhood search of the LB heuristic. The framework consists of a two-phase strategy. For the first phase, a scaled regression model is trained to predict the size of the LB neighborhood at the first iteration through a regression task. In the second phase, we leverage reinforcement learning and devise a reinforced neighborhood search strategy to dynamically adapt the size at the subsequent iterations. We computationally show that the neighborhood size can indeed be learned, leading to improved performances and that the overall algorithm generalizes well both with respect to the instance size and, remarkably, across instances.
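局部分支约束本身只是一条关于0-1变量汉明距离的线性不等式。下面示意如何由参考解x*和邻域大小k构造这条约束的系数与右端项(接入真实MIP求解器的部分从略;论文的要点正是k由回归模型预测、再由强化学习动态调整,而非取保守常数):

```python
def local_branching_row(x_ref, k):
    """构造局部分支约束 sum_{x*_j=0} x_j + sum_{x*_j=1} (1-x_j) <= k,
    整理成 sum(c_j * x_j) <= rhs 的形式返回。"""
    coeffs = [(-1 if v == 1 else 1) for v in x_ref]
    rhs = k - sum(x_ref)       # (1 - x_j) 项产生的常数移到右端
    return coeffs, rhs

x_ref = [1, 0, 1, 1, 0]        # 参考解(假设)
coeffs, rhs = local_branching_row(x_ref, k=2)
print(coeffs, rhs)             # [-1, 1, -1, -1, 1] -1
```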
【107】 Intelligent Trading Systems: A Sentiment-Aware Reinforcement Learning Approach 标题:智能交易系统:一种情感感知强化学习方法 链接:https://arxiv.org/abs/2112.02095
作者:Francisco Caio Lima Paiva,Leonardo Kanashiro Felizardo,Reinaldo Augusto da Costa Bianchi,Anna Helena Reali Costa 机构:Universidade de São Paulo, São Paulo, SP, Brazil, Centro Universitário FEI, São Bernardo do Campo, SP, Brazil 备注:9 pages, 5 figures, To appear in the Proceedings of the 2nd ACM International Conference on AI in Finance (ICAIF'21), November 3-5, 2021, Virtual Event, USA 摘要:基于模式识别在证券交易所对单一资产进行有利可图交易的可行性一直吸引着研究人员。强化学习(RL)和自然语言处理在这些单资产交易任务中已受到广泛关注,但只有少数工作探索了二者的结合。此外,一些问题仍未得到解决,例如通过显式捕获反映市场随时间变化状况的情绪特征来提取市场情绪动量,以及评估RL结果在不同情形下的一致性和稳定性。为填补这一空白,我们提出了情绪感知RL(SentARL)智能交易系统,该系统通过从文本新闻中提取自适应数量的过去情绪特征来利用市场情绪,从而提高利润稳定性。我们在20种资产、两种交易成本以及五个不同时期和初始化设置下评估了SentARL,以显示其相对基线的一致有效性。随后,这一全面评估使我们能够确定新闻报道与市场情绪在价格时间序列相关性方面的界限:高于该界限时,SentARL的有效性十分突出。 摘要:The feasibility of making profitable trades on a single asset on stock exchanges based on patterns identification has long attracted researchers. Reinforcement Learning (RL) and Natural Language Processing have gained notoriety in these single-asset trading tasks, but only a few works have explored their combination. Moreover, some issues are still not addressed, such as extracting market sentiment momentum through the explicit capture of sentiment features that reflect the market condition over time and assessing the consistency and stability of RL results in different situations. Filling this gap, we propose the Sentiment-Aware RL (SentARL) intelligent trading system that improves profit stability by leveraging market mood through an adaptive amount of past sentiment features drawn from textual news. We evaluated SentARL across twenty assets, two transaction costs, and five different periods and initializations to show its consistent effectiveness against baselines. Subsequently, this thorough assessment allowed us to identify the boundary between news coverage and market sentiment regarding the correlation of price-time series above which SentARL's effectiveness is outstanding.
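"自适应数量的过去情绪特征"可以用一个带指数衰减的滑动窗口来示意:越新的新闻情绪权重越大,聚合成一个标量特征并入RL状态。窗口长度与衰减率为本文示例超参数,并非SentARL的实际取值:

```python
import numpy as np

rng = np.random.default_rng(8)

sentiment = rng.uniform(-1, 1, size=60)   # 每日新闻情绪得分(虚构)

def sentiment_momentum(scores, window=10, decay=0.8):
    """指数衰减加权的近期情绪动量,最新一天权重最大。"""
    recent = scores[-window:]
    w = decay ** np.arange(len(recent))[::-1]
    return float(np.dot(recent, w) / w.sum())

feature = sentiment_momentum(sentiment)
print(f"情绪动量特征: {feature:+.3f}")   # 该标量与价格状态一起输入RL智能体
```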
【108】 Breaking the Convergence Barrier: Optimization via Fixed-Time Convergent Flows 标题:打破收敛壁垒:通过固定时间收敛流进行优化 链接:https://arxiv.org/abs/2112.01363
作者:Param Budhraja,Mayank Baranwal,Kunal Garg,Ashish Hota 机构:Indian Institute of Technology Kharagpur, Tata Consultancy Services Research & Innovation, Mumbai, University of California, Santa Cruz 备注:Accepted at AAAI Conference on Artificial Intelligence, 2022, to appear 摘要:加速梯度法是机器学习和其他数据分析领域中自然产生的大规模数据驱动优化问题的基石。我们基于最近提出的动态系统固定时间稳定性概念,引入了一个实现加速的基于梯度的优化框架。该方法可视为简单梯度方法的推广,经适当缩放后可在固定时间内收敛到优化器,且与初始化无关。我们首先利用连续时间框架设计固定时间稳定的动态系统,然后提供一致的离散化策略,使得等效的离散时间算法能在实际上固定的迭代次数内跟踪优化器,从而实现这一点。我们还从理论上分析了所提出的梯度流的收敛行为,以及对于一类满足强凸性、严格凸性、或可能非凸但满足Polyak-Łojasiewicz不等式的函数,它们对加性扰动的鲁棒性。我们还证明了由于固定时间收敛,收敛速度上的遗憾界是常数。超参数具有直观的解释,并且可以进行调整,以符合所需收敛速度的要求。我们通过一系列数值算例,对照最先进的优化算法验证了所提格式的加速收敛特性。我们的工作为通过连续时间流的离散化开发新的优化算法提供了见解。 摘要:Accelerated gradient methods are the cornerstones of large-scale, data-driven optimization problems that arise naturally in machine learning and other fields concerning data analysis. We introduce a gradient-based optimization framework for achieving acceleration, based on the recently introduced notion of fixed-time stability of dynamical systems. The method presents itself as a generalization of simple gradient-based methods suitably scaled to achieve convergence to the optimizer in a fixed-time, independent of the initialization. We achieve this by first leveraging a continuous-time framework for designing fixed-time stable dynamical systems, and later providing a consistent discretization strategy, such that the equivalent discrete-time algorithm tracks the optimizer in a practically fixed number of iterations. We also provide a theoretical analysis of the convergence behavior of the proposed gradient flows, and their robustness to additive disturbances for a range of functions obeying strong convexity, strict convexity, and possibly nonconvexity but satisfying the Polyak-Łojasiewicz inequality. We also show that the regret bound on the convergence rate is constant by virtue of the fixed-time convergence. The hyperparameters have intuitive interpretations and can be tuned to fit the requirements on the desired convergence rates. We validate the accelerated convergence properties of the proposed schemes on a range of numerical examples against the state-of-the-art optimization algorithms. Our work provides insights on developing novel optimization algorithms via discretization of continuous-time flows.
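固定时间收敛流的一个常见模板是把"远处强"与"近处强"两项归一化梯度组合起来。下面在一个强凸二次函数上用前向欧拉离散做演示;其中的指数与增益是为说明而选的假设取值,并非论文的具体格式:

```python
import numpy as np

Q = np.diag([1.0, 10.0])          # 强凸二次目标 f(x) = x^T Q x / 2

def grad(x):
    return Q @ x

x = np.array([5.0, -5.0])
c1, c2, h = 1.0, 1.0, 1e-3        # 增益与步长(示例值)
for _ in range(20_000):
    g = grad(x)
    n = np.linalg.norm(g)
    if n < 1e-12:
        break
    # x' = -c1*g/||g||^(1/2) - c2*g*||g||^(1/2):远处第二项主导,近处第一项主导
    x = x - h * (c1 * g / np.sqrt(n) + c2 * g * np.sqrt(n))

print("最终迭代点:", np.round(x, 4))   # 收敛到最优点0附近
```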
机器翻译,仅供参考