Artificial Intelligence Academic Digest [9.13]

2021-09-16 17:28:32


cs.AI (Artificial Intelligence), 35 papers in total

【1】 PWPAE: An Ensemble Framework for Concept Drift Adaptation in IoT Data Streams Link: https://arxiv.org/abs/2109.05013

Authors: Li Yang, Dimitrios Michael Manias, Abdallah Shami
Affiliations: Western University, London, Ontario, Canada
Note: Accepted and to appear in IEEE GlobeCom 2021; code is available at GitHub link: this https URL
Abstract: As the number of Internet of Things (IoT) devices and systems has surged, IoT data analytics techniques have been developed to detect malicious cyber-attacks and secure IoT systems; however, concept drift issues often occur in IoT data analytics, as IoT data often consists of dynamic data streams that change over time, causing model degradation and attack detection failure. This is because traditional data analytics models are static models that cannot adapt to data distribution changes. In this paper, we propose a Performance Weighted Probability Averaging Ensemble (PWPAE) framework for drift-adaptive IoT anomaly detection through IoT data stream analytics. Experiments on two public datasets show the effectiveness of our proposed PWPAE method compared against state-of-the-art methods.
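The performance-weighted probability averaging that gives PWPAE its name can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, the toy probabilities, and the use of each base learner's recent accuracy as its weight are assumptions for the sketch (the paper combines drift-adaptive base learners inside a streaming pipeline).

```python
import numpy as np

def pwpae_predict(proba_per_model, recent_accuracy):
    """Combine base-learner class probabilities, weighting each model
    by its recent accuracy on the stream (simplified sketch)."""
    proba = np.asarray(proba_per_model, dtype=float)   # shape: (n_models, n_classes)
    acc = np.asarray(recent_accuracy, dtype=float)     # shape: (n_models,)
    weights = acc / acc.sum()                          # normalize to a convex combination
    combined = weights @ proba                         # weighted probability averaging
    return combined, int(np.argmax(combined))

# Three hypothetical drift-adaptive base learners scoring one binary sample;
# the model degraded by drift (accuracy 0.50) contributes the least.
proba = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
acc = [0.95, 0.80, 0.50]
combined, label = pwpae_predict(proba, acc)
```

Under this weighting the drifted third model cannot flip the ensemble decision, which is the intuition behind performance weighting under concept drift.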

【2】 Fairness without the sensitive attribute via Causal Variational Autoencoder Link: https://arxiv.org/abs/2109.04999

Authors: Vincent Grari, Sylvain Lamprier, Marcin Detyniecki
Affiliations: Sorbonne Université, LIP, CNRS, Paris, France; AXA, Paris, France; Polish Academy of Science, IBS PAN, Warsaw, Poland
Note: 8 pages, 9 figures
Abstract: In recent years, most fairness strategies in machine learning models focus on mitigating unwanted biases by assuming that the sensitive information is observed. However, this is not always possible in practice. Due to privacy purposes and various regulations such as RGPD in EU, many personal sensitive attributes are frequently not collected. We notice a lack of approaches for mitigating bias in such difficult settings, in particular for achieving classical fairness objectives such as Demographic Parity and Equalized Odds. By leveraging recent developments in approximate inference, we propose an approach to fill this gap. Based on a causal graph, we rely on a new variational auto-encoding based framework named SRCVAE to infer a sensitive information proxy, which serves for bias mitigation in an adversarial fairness approach. We empirically demonstrate significant improvements over existing works in the field. We observe that the generated proxy's latent space recovers sensitive information and that our approach achieves a higher accuracy while obtaining the same level of fairness on two real datasets, as measured using common fairness definitions.

【3】 Topic-Aware Contrastive Learning for Abstractive Dialogue Summarization Link: https://arxiv.org/abs/2109.04994

Authors: Junpeng Liu, Yanyan Zou, Hainan Zhang, Hongshen Chen, Zhuoye Ding, Caixia Yuan, Xiaojie Wang
Affiliations: Beijing University of Posts and Telecommunications, Beijing, China
Note: EMNLP 2021
Abstract: Unlike well-structured text, such as news reports and encyclopedia articles, dialogue content often comes from two or more interlocutors, exchanging information with each other. In such a scenario, the topic of a conversation can vary upon progression and the key information for a certain topic is often scattered across multiple utterances of different speakers, which poses challenges for abstractive dialogue summarization. To capture the various topic information of a conversation and outline salient facts for the captured topics, this work proposes two topic-aware contrastive learning objectives, namely coherence detection and sub-summary generation objectives, which are expected to implicitly model the topic change and handle information scattering challenges for the dialogue summarization task. The proposed contrastive objectives are framed as auxiliary tasks for the primary dialogue summarization task, united via an alternative parameter updating strategy. Extensive experiments on benchmark datasets demonstrate that the proposed simple method significantly outperforms strong baselines and achieves new state-of-the-art performance. The code and trained models are publicly available at https://github.com/Junpliu/ConDigSum.

【4】 LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation Link: https://arxiv.org/abs/2109.04993

Authors: Mohammad Abuzar Shaikh, Zhanghexuan Ji, Dana Moukheiber, Sargur Srihari, Mingchen Gao
Affiliations: Department of Computer Science and Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA
Note: 14 pages, 10 figures, 5 tables
Abstract: Pre-training visual and textual representations from large-scale image-text pairs is becoming a standard approach for many downstream vision-language tasks. The transformer-based models learn inter- and intra-modal attention through a list of self-supervised learning tasks. This paper proposes LAViTeR, a novel architecture for visual and textual representation learning. The main module, Visual Textual Alignment (VTA), will be assisted by two auxiliary tasks, GAN-based image synthesis and Image Captioning. We also propose a new evaluation metric measuring the similarity between the learnt visual and textual embedding. The experimental results on two public datasets, CUB and MS-COCO, demonstrate superior visual and textual representation alignment in the joint feature embedding space.

【5】 Detection of GAN-synthesized street videos Link: https://arxiv.org/abs/2109.04991

Authors: Omran Alamayreh, Mauro Barni
Affiliations: Department of Information Engineering and Mathematics, University of Siena, Via Roma, Siena, Italy
Note: 2021 29th European Association for Signal Processing (EURASIP)
Abstract: Research on the detection of AI-generated videos has focused almost exclusively on face videos, usually referred to as deepfakes. Manipulations like face swapping, face reenactment and expression manipulation have been the subject of intense research, with the development of a number of efficient tools to distinguish artificial videos from genuine ones. Much less attention has been paid to the detection of artificial non-facial videos. Yet, new tools for the generation of such kind of videos are being developed at a fast pace and will soon reach the quality level of deepfake videos. The goal of this paper is to investigate the detectability of a new kind of AI-generated videos framing driving street sequences (here referred to as DeepStreets videos), which, by their nature, cannot be analysed with the same tools used for facial deepfakes. Specifically, we present a simple frame-based detector, achieving very good performance on state-of-the-art DeepStreets videos generated by the Vid2vid architecture. Noticeably, the detector retains very good performance on compressed videos, even when the compression level used during training does not match that used for the test videos.

【6】 Examining Cross-lingual Contextual Embeddings with Orthogonal Structural Probes Link: https://arxiv.org/abs/2109.04921

Authors: Tomasz Limisiewicz, David Mareček
Affiliations: Institute of Formal and Applied Linguistics, Charles University, Prague, Czech Republic
Note: EMNLP 2021 Main Conference
Abstract: State-of-the-art contextual embeddings are obtained from large language models available only for a few languages. For others, we need to learn representations using a multilingual model. There is an ongoing debate on whether multilingual embeddings can be aligned in a space shared across many languages. The novel Orthogonal Structural Probe (Limisiewicz and Mareček, 2021) allows us to answer this question for specific linguistic features and learn a projection based only on monolingual annotated datasets. We evaluate syntactic (UD) and lexical (WordNet) structural information encoded in mBERT's contextual representations for nine diverse languages. We observe that for languages closely related to English, no transformation is needed. The evaluated information is encoded in a shared cross-lingual embedding space. For other languages, it is beneficial to apply orthogonal transformation learned separately for each language. We successfully apply our findings to zero-shot and few-shot cross-lingual parsing.

【7】 ReasonBERT: Pre-trained to Reason with Distant Supervision Link: https://arxiv.org/abs/2109.04912

Authors: Xiang Deng, Yu Su, Alyssa Lees, You Wu, Cong Yu, Huan Sun
Affiliations: The Ohio State University, Columbus, OH; Google Research, New York, NY
Note: Accepted to EMNLP 2021. Our code and pre-trained models are available at this https URL
Abstract: We present ReasonBert, a pre-training method that augments language models with the ability to reason over long-range relations and multiple, possibly hybrid contexts. Unlike existing pre-training methods that only harvest learning signals from local contexts of naturally occurring texts, we propose a generalized notion of distant supervision to automatically connect multiple pieces of text and tables to create pre-training examples that require long-range reasoning. Different types of reasoning are simulated, including intersecting multiple pieces of evidence, bridging from one piece of evidence to another, and detecting unanswerable cases. We conduct a comprehensive evaluation on a variety of extractive question answering datasets ranging from single-hop to multi-hop and from text-only to table-only to hybrid that require various reasoning capabilities and show that ReasonBert achieves remarkable improvement over an array of strong baselines. Few-shot experiments further demonstrate that our pre-training method substantially improves sample efficiency.

【8】 How Can Subgroup Discovery Help AIOps? Link: https://arxiv.org/abs/2109.04909

Authors: Youcef Remil
Affiliations: INSA Lyon, CNRS, LIRIS UMR, France; Infologic R&D, Bourg-les-Valence, France
Abstract: The genuine supervision of modern IT systems brings new challenges as it requires higher standards of scalability, reliability and efficiency when analysing and monitoring big data streams. Rule-based inference engines are a key component of maintenance systems in detecting anomalies and automating their resolution. However, they remain confined to simple and general rules and cannot handle the huge amount of data, nor the large number of alerts raised by IT systems, a lesson learned from the expert-systems era. Artificial Intelligence for Operation Systems (AIOps) proposes to take advantage of advanced analytics and machine learning on big data to improve and automate every step of supervision systems and aid incident management in detecting outages, identifying root causes and applying appropriate healing actions. Nevertheless, the best AIOps techniques rely on opaque models, strongly limiting their adoption. As a part of this PhD thesis, we study how Subgroup Discovery can help AIOps. This promising data mining technique offers possibilities to extract interesting hypotheses from data and understand the underlying process behind predictive models. To ensure the relevancy of our propositions, this project involves both data mining researchers and practitioners from Infologic, a French software vendor.

【9】 SO-SLAM: Semantic Object SLAM with Scale Proportional and Symmetrical Texture Constraints Link: https://arxiv.org/abs/2109.04884

Authors: Ziwei Liao, Yutong Hu, Jiadong Zhang, Xianyu Qi, Xiaoyu Zhang, Wei Wang
Note: Submitted to RAL & ICRA 2022
Abstract: Object SLAM introduces the concept of objects into Simultaneous Localization and Mapping (SLAM) and helps understand indoor scenes for mobile robots and object-level interactive applications. State-of-the-art object SLAM systems face challenges such as partial observations, occlusions, and unobservable problems, which limit mapping accuracy and robustness. This paper proposes a novel monocular Semantic Object SLAM (SO-SLAM) system that addresses the introduction of object spatial constraints. We explore three representative spatial constraints, including scale proportional constraint, symmetrical texture constraint and plane supporting constraint. Based on these semantic constraints, we propose two new methods: a more robust object initialization method and an orientation fine-optimization method. We have verified the performance of the algorithm on public datasets and an author-recorded mobile robot dataset and achieved a significant improvement in mapping effects. We will release the code here: https://github.com/XunshanMan/SoSLAM.

【10】 Efficient Test Time Adapter Ensembling for Low-resource Language Varieties Link: https://arxiv.org/abs/2109.04877

Authors: Xinyi Wang, Yulia Tsvetkov, Sebastian Ruder, Graham Neubig
Affiliations: Language Technologies Institute, Carnegie Mellon University; Paul G. Allen School of Computer Science & Engineering, University of Washington; DeepMind
Note: EMNLP 2021 Findings
Abstract: Adapters are light-weight modules that allow parameter-efficient fine-tuning of pretrained models. Specialized language and task adapters have recently been proposed to facilitate cross-lingual transfer of multilingual pretrained models (Pfeiffer et al., 2020b). However, this approach requires training a separate language adapter for every language one wishes to support, which can be impractical for languages with limited data. An intuitive solution is to use a related language adapter for the new language variety, but we observe that this solution can lead to sub-optimal performance. In this paper, we aim to improve the robustness of language adapters to uncovered languages without training new adapters. We find that ensembling multiple existing language adapters makes the fine-tuned model significantly more robust to other language varieties not included in these adapters. Building upon this observation, we propose Entropy Minimized Ensemble of Adapters (EMEA), a method that optimizes the ensemble weights of the pretrained language adapters for each test sentence by minimizing the entropy of its predictions. Experiments on three diverse groups of language varieties show that our method leads to significant improvements on both named entity recognition and part-of-speech tagging across all languages.
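The core of the EMEA idea, minimizing prediction entropy over ensemble weights for each test sentence, can be sketched numerically. This is a toy illustration under stated assumptions: the function name is invented, finite-difference gradients stand in for backpropagation, and the two "adapters" are just fixed label distributions rather than real adapter modules.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a probability vector."""
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum()

def emea_weights(adapter_probs, steps=100, lr=0.5, eps=1e-5):
    """Tune softmax-parametrized ensemble weights over adapters by
    minimizing the entropy of the averaged prediction (sketch of EMEA)."""
    probs = np.asarray(adapter_probs, dtype=float)  # (n_adapters, n_labels)
    logits = np.zeros(len(probs))
    for _ in range(steps):
        grad = np.zeros_like(logits)
        for i in range(len(logits)):
            d = np.zeros_like(logits)
            d[i] = eps
            w_plus = np.exp(logits + d); w_plus /= w_plus.sum()
            w_minus = np.exp(logits - d); w_minus /= w_minus.sum()
            # central finite difference of the ensemble entropy
            grad[i] = (entropy(w_plus @ probs) - entropy(w_minus @ probs)) / (2 * eps)
        logits -= lr * grad
    w = np.exp(logits)
    return w / w.sum()

# Two hypothetical language adapters: one confident, one uncertain.
w = emea_weights([[0.97, 0.03], [0.5, 0.5]])
```

As expected from the entropy objective, the weight mass shifts toward the adapter whose prediction is most confident for this input.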

【11】 MultiAzterTest: a Multilingual Analyzer on Multiple Levels of Language for Readability Assessment Link: https://arxiv.org/abs/2109.04870

Authors: Kepa Bengoetxea, Itziar Gonzalez-Dios
Affiliations: Ixa group, HiTZ center, University of the Basque Country (UPV/EHU)
Note: 33 pages
Abstract: Readability assessment is the task of determining how difficult or easy a text is, or which level/grade it has. Traditionally, language-dependent readability formulae have been used, but these formulae take few text characteristics into account. However, Natural Language Processing (NLP) tools that assess the complexity of texts are able to measure more different features and can be adapted to different languages. In this paper, we present the MultiAzterTest tool: (i) an open source NLP tool which analyzes texts on over 125 measures of cohesion, language, and readability for English, Spanish and Basque, but whose architecture is designed to easily adapt to other languages; (ii) readability assessment classifiers that improve the performance of Coh-Metrix in English, Coh-Metrix-Esp in Spanish and ErreXail in Basque; (iii) a web tool. MultiAzterTest obtains 90.09% accuracy when classifying into three reading levels (elementary, intermediate, and advanced) in English, and 95.50% in Basque and 90% in Spanish when classifying into two reading levels (simple and complex) using an SMO classifier. Using cross-lingual features, MultiAzterTest also obtains competitive results, above all in the complex vs. simple distinction.

【12】 Emerging AI Security Threats for Autonomous Cars -- Case Studies Link: https://arxiv.org/abs/2109.04865

Authors: Shanthi Lekkala, Tanya Motwani, Manojkumar Parmar, Amit Phadke
Affiliations: Robert Bosch Engineering and Business Solutions Private Limited, Bengaluru, India
Note: 6 pages, 4 figures; manuscript accepted at the ESCAR Europe 2021 conference
Abstract: Artificial Intelligence has made a significant contribution to autonomous vehicles, from object detection to path planning. However, AI models require a large amount of sensitive training data and are usually computationally intensive to build. The commercial value of such models motivates attackers to mount various attacks. Adversaries can launch model extraction attacks for monetization purposes or as a stepping-stone towards other attacks like model evasion. In specific cases, it even results in destroying brand reputation, differentiation, and value proposition. In addition, IP laws and AI-related legalities are still evolving and are not uniform across countries. We discuss model extraction attacks in detail with two use-cases and a generic kill-chain that can compromise autonomous cars. It is essential to investigate strategies to manage and mitigate the risk of model theft.

【13】 An Evaluation Dataset and Strategy for Building Robust Multi-turn Response Selection Model Link: https://arxiv.org/abs/2109.04834

Authors: Kijong Han, Seojin Lee, Wooin Lee, Joosung Lee, Dong-hun Lee
Note: EMNLP 2021
Abstract: Multi-turn response selection models have recently shown comparable performance to humans in several benchmark datasets. However, in the real environment, these models often have weaknesses, such as making incorrect predictions based heavily on superficial patterns without a comprehensive understanding of the context. For example, these models often give a high score to a wrong response candidate that contains several keywords related to the context but uses an inconsistent tense. In this study, we analyze the weaknesses of open-domain Korean multi-turn response selection models and publish an adversarial dataset to evaluate these weaknesses. We also suggest a strategy to build a robust model in this adversarial environment.

【14】 Solving the Extended Job Shop Scheduling Problem with AGVs -- Classical and Quantum Approaches Link: https://arxiv.org/abs/2109.04830

Authors: Marc Geitz, Cristian Grozea, Wolfgang Steigerwald, Robin Stöhr, Armin Wolf
Abstract: The subject of Job Scheduling Optimisation (JSO) deals with the scheduling of jobs in an organization, so that the single working steps are optimally organized regarding the postulated targets. In this paper a use case is provided which deals with a sub-aspect of JSO, the Job Shop Scheduling Problem (JSSP or JSP). Like many optimization problems, the JSSP is NP-complete, which means the complexity increases exponentially with every node in the system. The goal of the use case is to show how to create an optimized duty roster for certain workpieces in a flexibly organized machinery, combined with an Autonomous Ground Vehicle (AGV), using Constraint Programming (CP) and Quantum Computing (QC) alternatively. The results of a classical solution based on CP and of a Quantum Annealing model are presented and discussed. All presented results have been elaborated in the research project PlanQK.

【15】 Secondary control activation analysed and predicted with explainable AI Link: https://arxiv.org/abs/2109.04802

Authors: Johannes Kruse, Benjamin Schäfer, Dirk Witthaut
Affiliations: Institute for Energy and Climate Research - Systems Analysis and Technology Evaluation (IEK-STE), Forschungszentrum Jülich, Jülich, Germany; Institute for Theoretical Physics, University of Cologne, Köln, Germany
Note: 8 pages, 6 figures
Abstract: The transition to a renewable energy system poses challenges for power grid operation and stability. Secondary control is key in restoring the power system to its reference following a disturbance. Underestimating the necessary control capacity may require emergency measures, such as load shedding. Hence, a solid understanding of the emerging risks and the driving factors of control is needed. In this contribution, we establish an explainable machine learning model for the activation of secondary control power in Germany. Training gradient boosted trees, we obtain an accurate description of control activation. Using SHapley Additive exPlanations (SHAP) values, we investigate the dependency between control activation and external features such as the generation mix, forecasting errors, and electricity market data. Thereby, our analysis reveals drivers that lead to high reserve requirements in the German power system. Our transparent approach, utilizing open data and making machine learning models interpretable, opens new avenues for scientific discovery.
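SHAP values are grounded in the Shapley value from cooperative game theory: each feature's attribution is its marginal contribution to the model output, averaged over all orderings in which features can be "switched on". For a model with only a few features this can be computed exactly, as sketched below. The toy model and baseline are invented for illustration (the paper explains gradient boosted trees via the SHAP library, not this brute-force enumeration), but the efficiency property shown in the test is the same one SHAP guarantees.

```python
from itertools import permutations

def shapley_values(f, x, baseline):
    """Exact Shapley-style attributions for a small model: average each
    feature's marginal contribution over all feature orderings. Features
    not yet 'added' are held at their baseline value."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = f(current)
        for i in order:
            current[i] = x[i]      # switch feature i on
            val = f(current)
            phi[i] += (val - prev) / len(orders)
            prev = val
    return phi

# Toy 'control activation' model: a linear term plus an interaction.
f = lambda z: 2.0 * z[0] + z[0] * z[1]
phi = shapley_values(f, x=[1.0, 1.0], baseline=[0.0, 0.0])
```

The interaction term is split fairly between the two features, and the attributions sum exactly to f(x) - f(baseline), the "efficiency" axiom that makes SHAP summaries additive.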

【16】 Self-Attention Channel Combinator Frontend for End-to-End Multichannel Far-field Speech Recognition Link: https://arxiv.org/abs/2109.04783

Authors: Rong Gong, Carl Quillen, Dushyant Sharma, Andrew Goderre, José Laínez, Ljubomir Milanović
Affiliations: Nuance Communications GmbH, Vienna, Austria; Nuance Communications Inc., Burlington, USA; Nuance Communications S.A., Madrid, Spain
Note: In Proceedings of Interspeech 2021
Abstract: When sufficiently large far-field training data is presented, jointly optimizing a multichannel frontend and an end-to-end (E2E) Automatic Speech Recognition (ASR) backend shows promising results. Recent literature has shown traditional beamformer designs, such as MVDR (Minimum Variance Distortionless Response) or fixed beamformers, can be successfully integrated as the frontend into an E2E ASR system with learnable parameters. In this work, we propose the self-attention channel combinator (SACC) ASR frontend, which leverages the self-attention mechanism to combine multichannel audio signals in the magnitude spectral domain. Experiments conducted on multichannel playback test data show that the SACC achieved a 9.3% WERR compared to a state-of-the-art fixed beamformer-based frontend, both jointly optimized with a ContextNet-based ASR backend. We also demonstrate the connection between the SACC and the traditional beamformers, and analyze the intermediate outputs of the SACC.

【17】 Improving Multilingual Translation by Representation and Gradient Regularization Link: https://arxiv.org/abs/2109.04778

Authors: Yilin Yang, Akiko Eriguchi, Alexandre Muzio, Prasad Tadepalli, Stefan Lee, Hany Hassan
Affiliations: Oregon State University; Microsoft
Note: EMNLP 2021 (Long)
Abstract: Multilingual Neural Machine Translation (NMT) enables one model to serve all translation directions, including ones that are unseen during training, i.e. zero-shot translation. Despite being theoretically attractive, current models often produce low quality translations -- commonly failing to even produce outputs in the right target language. In this work, we observe that off-target translation is dominant even in strong multilingual systems, trained on massive multilingual corpora. To address this issue, we propose a joint approach to regularize NMT models at both the representation level and the gradient level. At the representation level, we leverage an auxiliary target language prediction task to regularize decoder outputs to retain information about the target language. At the gradient level, we leverage a small amount of direct data (in thousands of sentence pairs) to regularize model gradients. Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance by 5.59 and 10.38 BLEU on WMT and OPUS datasets respectively. Moreover, experiments show that our method also works well when the small amount of direct data is not available.

【18】 Automated Machine Learning, Bounded Rationality, and Rational Metareasoning Link: https://arxiv.org/abs/2109.04744

Authors: Eyke Hüllermeier, Felix Mohr, Alexander Tornede, Marcel Wever
Affiliations: LMU Munich, Munich, Germany; Universidad de La Sabana, Chía, Cundinamarca, Colombia; Paderborn University, Paderborn, Germany
Note: Accepted at the ECMLPKDD Workshop on Automating Data Science (ADS 2021) - this https URL
Abstract: The notion of bounded rationality originated from the insight that perfectly rational behavior cannot be realized by agents with limited cognitive or computational resources. Research on bounded rationality, mainly initiated by Herbert Simon, has a longstanding tradition in economics and the social sciences, but also plays a major role in modern AI and intelligent agent design. Taking actions under bounded resources requires an agent to reflect on how to use these resources in an optimal way - hence, to reason and make decisions on a meta-level. In this paper, we will look at automated machine learning (AutoML) and related problems from the perspective of bounded rationality, essentially viewing an AutoML tool as an agent that has to train a model on a given set of data, and the search for a good way of doing so (a suitable "ML pipeline") as deliberation on a meta-level.

【19】 Boosting Graph Search with Attention Network for Solving the General Orienteering Problem Link: https://arxiv.org/abs/2109.04730

Authors: Zongtao Liu, Jing Xu, Jintao Su, Tao Xiao, Yang Yang
Affiliations: Zhejiang University; State Grid Corporation of China
Note: 7 pages, 3 figures
Abstract: Recently, several studies have explored the use of neural networks to solve different routing problems, which is an auspicious direction. These studies usually design an encoder-decoder based framework that uses encoder embeddings of nodes and the problem-specific context to produce a node sequence (path), and further optimize the produced result on top by beam search. However, existing models can only support node coordinates as input, ignore the self-referential property of the studied routing problems, and lack consideration of the low reliability in the initial stage of node selection, and are thus hard to apply in the real world. In this paper, we take the orienteering problem as an example to tackle these limitations. We propose a novel combination of a variant beam search algorithm and a learned heuristic for solving the general orienteering problem. We acquire the heuristic with an attention network that takes the distances among nodes as input, and learn it via a reinforcement learning framework. The empirical studies show that our method can surpass a wide range of baselines and achieve results close to the optimal or highly specialized approaches. Also, our proposed framework can be easily applied to other routing problems. Our code is publicly available.
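The "beam search guided by a heuristic" recipe can be sketched generically for orienteering (collect the most prize within a travel budget). In the toy below, a hand-written prize-per-distance ratio stands in for the paper's learned attention-network heuristic; the function name, the small instance, and the fixed start node are all assumptions made for illustration, not the authors' algorithm.

```python
def beam_search_orienteering(prizes, dist, budget, beam_width=3, score=None):
    """Beam search for an orienteering tour starting at node 0 (sketch).
    `score` plays the role of the learned heuristic; by default a greedy
    prize-per-distance ratio is used instead of an attention network."""
    if score is None:
        score = lambda path, j, left: prizes[j] / (dist[path[-1]][j] + 1e-9)
    beams = [([0], budget, prizes[0])]  # (path, remaining budget, collected prize)
    best = beams[0]
    while beams:
        candidates = []
        for path, left, gain in beams:
            for j in range(len(prizes)):
                if j in path or dist[path[-1]][j] > left:
                    continue  # already visited, or unreachable within budget
                cand = (path + [j], left - dist[path[-1]][j], gain + prizes[j])
                candidates.append((score(path, j, left), cand))
                if cand[2] > best[2]:
                    best = cand
        candidates.sort(key=lambda t: -t[0])        # keep the top-scoring extensions
        beams = [c for _, c in candidates[:beam_width]]
    return best

prizes = [0, 5, 4, 3]
dist = [[0, 2, 2, 5], [2, 0, 1, 2], [2, 1, 0, 2], [5, 2, 2, 0]]
path, left, total = beam_search_orienteering(prizes, dist, budget=6)
```

Widening the beam trades computation for solution quality, which is exactly the knob the paper's variant beam search tunes; swapping in a learned `score` function is what turns this sketch into their method.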

【20】 A Simple and Effective Method To Eliminate the Self Language Bias in Multilingual Representations 标题:一种简单有效的消除多语种表征中自我语言偏差的方法 链接:https://arxiv.org/abs/2109.04727

作者:Ziyi Yang,Yinfei Yang,Daniel Cer,Eric Darve 机构:Stanford University, Google Research 备注:Accepted to the 2021 Conference on Empirical Methods in Natural Language Processing 摘要:语言不可知和语义语言信息隔离是多语言表示模型的一个新兴研究方向。我们从几何代数和语义空间的新角度来探讨这个问题。一种简单而高效的方法“语言信息去除(LIR)”能够从在多份单语数据上预训练的多语言表示的语义相关成分中分离并剔除语言身份信息。作为一种后训练且与模型无关的方法,LIR只使用简单的线性运算,例如矩阵分解和正交投影。LIR表明,对于弱对齐多语言系统,语义空间的主成分主要编码语言身份信息。我们首先在跨语言问答检索任务(LAReQA)上评估LIR,该任务要求多语言嵌入空间具有强对齐。实验表明,LIR在这项任务上非常有效,对于弱对齐模型,MAP的相对改善率几乎为100%。然后,我们在Amazon Reviews和XEVAL数据集上评估LIR,观察到删除语言信息能够提高跨语言迁移性能。 摘要:Language agnostic and semantic-language information isolation is an emerging research direction for multilingual representations models. We explore this problem from a novel angle of geometric algebra and semantic space. A simple but highly effective method "Language Information Removal (LIR)" factors out language identity information from semantic related components in multilingual representations pre-trained on multi-monolingual data. A post-training and model-agnostic method, LIR only uses simple linear operations, e.g. matrix factorization and orthogonal projection. LIR reveals that for weak-alignment multilingual systems, the principal components of semantic spaces primarily encode language identity information. We first evaluate the LIR on a cross-lingual question answer retrieval task (LAReQA), which requires the strong alignment for the multilingual embedding space. Experiments show that LIR is highly effective on this task, yielding almost 100% relative improvement in MAP for weak-alignment models. We then evaluate the LIR on Amazon Reviews and XEVAL dataset, with the observation that removing language information is able to improve the cross-lingual transfer performance.
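LIR只依赖矩阵分解与正交投影这一点,可以用“幂迭代求主方向 投影剔除”的玩具示例直观说明(纯Python示意:嵌入向量与“语言方向”均为构造数据,非论文实现):

```python
# LIR 思路示意:估计一组嵌入的主方向(假设其携带语言身份信息),
# 再用正交投影把该方向上的分量剔除。

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def top_direction(vectors, iters=50):
    # 对 X^T X 做幂迭代,得到主成分方向(矩阵分解步骤的简化替身)
    d = len(vectors[0])
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(dot(x, v) * x[i] for x in vectors) for i in range(d)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v

def remove_component(x, v):
    # 正交投影到 span{v} 的补空间
    c = dot(x, v)
    return [xi - c * vi for xi, vi in zip(x, v)]

# 构造嵌入:第一维"编码语言身份",第二维保留语义差异
embs = [[5.0, 1.0], [5.0, -1.0], [-5.0, 0.5], [-5.0, -0.5]]
lang_dir = top_direction(embs)
cleaned = [remove_component(x, lang_dir) for x in embs]
print(cleaned)
```

剔除主方向后,各向量在“语言维”上的分量趋近于0,而“语义维”保持不变,这正对应论文中“主成分主要编码语言身份”的观察。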

【21】 6MapNet: Representing soccer players from tracking data by a triplet network 标题:6MapNet:用三元组网络表示足球运动员的跟踪数据 链接:https://arxiv.org/abs/2109.04720

作者:Hyunsung Kim,Jihun Kim,Dongwook Chung,Jonghyun Lee,Jinsung Yoon,Sang-Ki Ko 机构: Fitogether Inc., Seoul, South Korea, Seoul National University, Seoul, South Korea, Kangwon National University, Chuncheon, South Korea 备注:12 pages, 4 figures, In 8th Workshop on Machine Learning and Data Mining for Sports Analytics (MLSA21) 摘要:虽然足球运动员个人的身价已经达到天文数字,但主观判断仍然在球员分析中发挥着重要作用。最近,有新的尝试使用基于视频的事件流数据定量地把握球员的风格。然而,由于注释成本高和事件流数据稀疏,它们在可伸缩性方面存在一些限制。在本文中,我们构建了一个名为6MapNet的三元组网络,该网络可以使用比赛中的GPS数据有效地捕获球员的移动风格。在没有任何足球特定动作注释的情况下,我们使用球员的位置和速度生成两种类型的热图。然后,我们的子网络将这些热图对映射为特征向量,这些特征向量的相似性对应于比赛风格的实际相似性。实验结果表明,该方法只需少量比赛就能准确识别球员。 摘要:Although the values of individual soccer players have become astronomical, subjective judgments still play a big part in the player analysis. Recently, there have been new attempts to quantitatively grasp players' styles using video-based event stream data. However, they have some limitations in scalability due to high annotation costs and sparsity of event stream data. In this paper, we build a triplet network named 6MapNet that can effectively capture the movement styles of players using in-game GPS data. Without any annotation of soccer-specific actions, we use players' locations and velocities to generate two types of heatmaps. Our subnetworks then map these heatmap pairs into feature vectors whose similarity corresponds to the actual similarity of playing styles. The experimental results show that players can be accurately identified with only a small number of matches by our method.
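6MapNet这类三元组网络的训练目标可以用一个极简的 triplet margin loss 来示意(纯Python;特征向量为假设数值,并非6MapNet子网络的真实输出):

```python
# 三元组损失示意:anchor 与 positive 来自同一球员的热图,
# negative 来自另一球员;损失要求 anchor-negative 距离
# 至少比 anchor-positive 距离大出 margin。

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=1.0):
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)

anchor, positive = [0.0, 1.0], [0.1, 0.9]   # 同一球员的两场比赛(假设数值)
negative = [2.0, -1.0]                       # 不同球员(假设数值)

well_separated = triplet_loss(anchor, positive, negative)  # 已分开:损失为0
violating = triplet_loss(anchor, negative, positive)       # 次序颠倒:损失为正
print(well_separated, violating)
```

损失为0说明该三元组已满足间隔约束;训练正是不断最小化违例三元组的损失,使同一球员的表示彼此靠近。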

【22】 Heterogeneous Graph Neural Networks for Keyphrase Generation 标题:用于关键词生成的异构图神经网络 链接:https://arxiv.org/abs/2109.04703

作者:Jiacheng Ye,Ruijian Cai,Tao Gui,Qi Zhang 机构:School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai, China, Institute of Modern Languages and Linguistics, Fudan University 备注:Accepted by EMNLP 2021 摘要:编码器-解码器框架通过同时预测出现在源文档中的关键短语(present)和未出现的关键短语(absent),在关键短语生成(KG)任务中实现最先进的结果。但是,仅依赖源文档可能会导致生成无法控制且不准确的缺失关键短语。为了解决这些问题,我们提出了一种新的基于图的方法,可以从相关参考文献中获取明确的知识。我们的模型首先从预定义的索引中检索一些与源文档相似的文档-关键短语对作为参考。然后构造一个异构图来捕获源文档与其参考之间不同粒度的关系。为了指导解码过程,引入了分层注意和复制机制,该机制根据相关性和重要性,直接从源文档及其参考中复制适当的单词。在多个KG基准上的实验结果表明,与其他基线模型相比,该模型取得了显著的改进,尤其是在缺失关键短语预测方面。 摘要:The encoder-decoder framework achieves state-of-the-art results in keyphrase generation (KG) tasks by predicting both present keyphrases that appear in the source document and absent keyphrases that do not. However, relying solely on the source document can result in generating uncontrollable and inaccurate absent keyphrases. To address these problems, we propose a novel graph-based method that can capture explicit knowledge from related references. Our model first retrieves some document-keyphrases pairs similar to the source document from a pre-defined index as references. Then a heterogeneous graph is constructed to capture relationships of different granularities between the source document and its references. To guide the decoding process, a hierarchical attention and copy mechanism is introduced, which directly copies appropriate words from both the source document and its references based on their relevance and significance. The experimental results on multiple KG benchmarks show that the proposed model achieves significant improvements against other baseline models, especially with regard to the absent keyphrase prediction.

【23】 Generating Self-Contained and Summary-Centric Question Answer Pairs via Differentiable Reward Imitation Learning 标题:利用可区分奖励模仿学习生成自含式和以摘要为中心的问题答案对 链接:https://arxiv.org/abs/2109.04689

作者:Li Zhou,Kevin Small,Yong Zhang,Sandeep Atluri 机构:Amazon Alexa 备注:To appear in Proceedings of EMNLP 2021 摘要:受会话新闻推荐系统中建议问题生成的启发,我们提出了一个生成问答对(QA对)的模型:问题自成一体、以摘要为中心,答案则是长度受限的文章摘要。我们首先收集一个新的新闻文章数据集,这些文章以问题为标题,并与不同长度的摘要配对。该数据集用于学习一个QA对生成模型,使作为答案的摘要在简洁性与充分性之间取得平衡,并与相应问题联合生成。然后,我们用一个可微的奖励函数来强化QA对生成过程,以减轻暴露偏差这一自然语言生成中的常见问题。自动度量和人工评估都表明,这些QA对成功地抓住了文章的中心要点,并实现了较高的答案准确性。 摘要:Motivated by suggested question generation in conversational news recommendation systems, we propose a model for generating question-answer pairs (QA pairs) with self-contained, summary-centric questions and length-constrained, article-summarizing answers. We begin by collecting a new dataset of news articles with questions as titles and pairing them with summaries of varying length. This dataset is used to learn a QA pair generation model producing summaries as answers that balance brevity with sufficiency jointly with their corresponding questions. We then reinforce the QA pair generation process with a differentiable reward function to mitigate exposure bias, a common problem in natural language generation. Both automatic metrics and human evaluation demonstrate these QA pairs successfully capture the central gists of the articles and achieve high answer accuracy.

【24】 Enhancing Unsupervised Anomaly Detection with Score-Guided Network 标题:利用分数导引网络增强无监督异常检测 链接:https://arxiv.org/abs/2109.04684

作者:Zongyuan Huang,Baohua Zhang,Guoqiang Hu,Longyuan Li,Yanyan Xu,Yaohui Jin 机构: MoE Key Laboratory of Artificial Intelligence and AI Institute 摘要:异常检测在各种实际应用中起着至关重要的作用,包括医疗保健和金融系统。由于这些复杂系统中异常标签的数量有限,无监督异常检测方法近年来受到了广泛关注。现有无监督方法面临的两大挑战是:(i)在过渡场中区分正常数据和异常数据,其中正常数据和异常数据高度混合在一起;(ii)定义一个有效的度量,以最大化由表征学习者建立的假设空间中正常和异常数据之间的差距。为此,本研究提出了一种新的评分网络,该网络采用评分引导的正则化方法来学习和扩大正常数据和异常数据之间的异常评分差异。通过这种分数引导策略,表征学习者可以在模型训练阶段逐步学习更多的信息表征,特别是对于过渡场中的样本。接下来,我们提出了一种分数引导自动编码器(SG-AE),将分数网络合并到用于异常检测的自动编码器框架中,以及其他三种最先进的模型,以进一步证明设计的有效性和可转移性。在合成数据集和真实数据集上的大量实验证明了这些分数引导模型(SGM)的最新性能。 摘要:Anomaly detection plays a crucial role in various real-world applications, including healthcare and finance systems. Owing to the limited number of anomaly labels in these complex systems, unsupervised anomaly detection methods have attracted great attention in recent years. Two major challenges faced by the existing unsupervised methods are: (i) distinguishing between normal and abnormal data in the transition field, where normal and abnormal data are highly mixed together; (ii) defining an effective metric to maximize the gap between normal and abnormal data in a hypothesis space, which is built by a representation learner. To that end, this work proposes a novel scoring network with a score-guided regularization to learn and enlarge the anomaly score disparities between normal and abnormal data. With such score-guided strategy, the representation learner can gradually learn more informative representation during the model training stage, especially for the samples in the transition field. We next propose a score-guided autoencoder (SG-AE), incorporating the scoring network into an autoencoder framework for anomaly detection, as well as other three state-of-the-art models, to further demonstrate the effectiveness and transferability of the design. Extensive experiments on both synthetic and real-world datasets demonstrate the state-of-the-art performance of these score-guided models (SGMs).
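“评分引导正则化拉大正常/异常样本的得分差距”这一思想,可以用一个高度简化的损失函数示意(阈值、边际与权重均为本文假设,并非论文的具体公式):

```python
def score_guided_loss(recon_errors, scores, threshold=1.0, weight=0.5):
    # 重构误差 + 评分正则项:视为正常的样本把得分压向0,
    # 疑似异常的样本把得分推到边际2.0以上,从而拉大得分差距。
    total = 0.0
    for err, s in zip(recon_errors, scores):
        if err <= threshold:
            reg = s ** 2                      # 正常:惩罚偏大的得分
        else:
            reg = max(0.0, 2.0 - s) ** 2      # 疑似异常:惩罚偏小的得分
        total += err + weight * reg
    return total / len(recon_errors)

errors = [0.1, 0.2, 3.0]                          # 第三个样本重构误差明显偏大
good = score_guided_loss(errors, [0.0, 0.0, 2.5])  # 得分差距大:正则项为0
bad = score_guided_loss(errors, [1.0, 1.0, 0.0])   # 得分差距颠倒:被正则惩罚
print(good, bad)
```

得分与误差一致时损失更小,因此最小化该损失会促使评分网络放大正常/异常之间的得分间隔,这正是SG-AE所依赖的机制。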

【25】 PIP: Physical Interaction Prediction via Mental Imagery with Span Selection 标题:PIP:基于带跨度选择的心理意象的物理交互预测 链接:https://arxiv.org/abs/2109.04683

作者:Jiafei Duan,Samson Yu,Soujanya Poria,Bihan Wen,Cheston Tan 机构:Institute for Infocomm Research, ASTAR, Singapore University of Technology and Design, Nanyang Technological University of Singapore 摘要:为了使高级人工智能(AI)与人类价值观保持一致并促进安全的AI,AI预测物理交互的结果非常重要。即使关于人类如何预测现实世界中物体之间物理交互的结果的争论还在继续,也有人试图通过认知启发的人工智能方法来解决这一问题。然而,仍然缺乏模拟人类用于预测现实世界中物理交互的心理意象的人工智能方法。在这项工作中,我们提出了一种新的PIP方案:通过跨度选择的心理意象进行物理交互预测。PIP利用深度生成模型输出对象间物理交互的未来帧,然后通过使用跨度选择关注显著帧来提取预测物理交互的关键信息。为了评估我们的模型,我们提出了一个大规模的SPACE合成视频帧数据集,包括三维环境中的三类物理交互事件。我们的实验表明,PIP在可见和不可见对象的物理交互预测方面都优于基线和人类性能。此外,PIP的跨度选择方案可以有效地识别在生成的帧中对象之间发生物理交互的帧,从而增加可解释性。 摘要:To align advanced artificial intelligence (AI) with human values and promote safe AI, it is important for AI to predict the outcome of physical interactions. Even with the ongoing debates on how humans predict the outcomes of physical interactions among objects in the real world, there are works attempting to tackle this task via cognitive-inspired AI approaches. However, there is still a lack of AI approaches that mimic the mental imagery humans use to predict physical interactions in the real world. In this work, we propose a novel PIP scheme: Physical Interaction Prediction via Mental Imagery with Span Selection. PIP utilizes a deep generative model to output future frames of physical interactions among objects before extracting crucial information for predicting physical interactions by focusing on salient frames using span selection. To evaluate our model, we propose a large-scale SPACE dataset of synthetic video frames, including three physical interaction events in a 3D environment. Our experiments show that PIP outperforms baselines and human performance in physical interaction prediction for both seen and unseen objects. Furthermore, PIP's span selection scheme can effectively identify the frames where physical interactions among objects occur within the generated frames, allowing for added interpretability.

【26】 Investigating Numeracy Learning Ability of a Text-to-Text Transfer Model 标题:一种文本到文本迁移模型的数值学习能力研究 链接:https://arxiv.org/abs/2109.04672

作者:Kuntal Kumar Pal,Chitta Baral 机构:Department of Computer Science, Arizona State University, Tempe, Arizona, USA 备注:7 pages, 10 figures, 5 tables, Accepted in the Findings of EMNLP 2021 摘要:基于transformer的预训练语言模型在大多数传统NLP任务中都非常成功。但在那些需要数字理解的任务中,它们往往会遇到困难。可能的原因之一是其分词器和预训练目标并非专门为学习和保持数值能力而设计。在这里,我们研究了在传统NLP任务中表现优于前人的文本到文本迁移学习模型(T5)学习数值能力的情况。我们考虑了四项数值任务:数字读写(numeration)、数量级预测、序列最小值与最大值查找以及排序。我们发现,尽管T5模型在插值设置中表现相当好,但在全部四项任务的外推设置中都表现得相当吃力。 摘要:The transformer-based pre-trained language models have been tremendously successful in most of the conventional NLP tasks. But they often struggle in those tasks where numerical understanding is required. Some possible reasons can be the tokenizers and pre-training objectives which are not specifically designed to learn and preserve numeracy. Here we investigate the ability of text-to-text transfer learning model (T5), which has outperformed its predecessors in the conventional NLP tasks, to learn numeracy. We consider four numeracy tasks: numeration, magnitude order prediction, finding minimum and maximum in a series, and sorting. We find that, although T5 models perform reasonably well in the interpolation setting, they struggle considerably in the extrapolation setting across all four tasks.
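这类数值探测任务都可以写成T5式的“文本到文本”样本。下面的生成器只是示意(任务前缀与数值区间均为本文假设),重点展示插值(测试数落在训练区间内)与外推(落在区间外)两种数据的构造方式:

```python
import random

def make_example(numbers, task):
    # 把数值任务写成 (源文本, 目标文本) 形式的文本到文本样本
    src = f"{task}: " + " ".join(str(n) for n in numbers)
    if task == "minimum":
        tgt = str(min(numbers))
    elif task == "maximum":
        tgt = str(max(numbers))
    else:  # "sort"
        tgt = " ".join(str(n) for n in sorted(numbers))
    return src, tgt

random.seed(0)
interp = [random.randint(0, 99) for _ in range(5)]        # 插值:训练区间内
extrap = [random.randint(1000, 9999) for _ in range(5)]   # 外推:训练区间外
print(make_example(interp, "minimum"))
print(make_example(extrap, "sort"))
```

论文的发现正对应这里的两种区间:模型在 `interp` 式样本上表现尚可,而在 `extrap` 式样本上四项任务都明显退化。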

【27】 Dynamic Collective Intelligence Learning: Finding Efficient Sparse Model via Refined Gradients for Pruned Weights 标题:动态集体智能学习:通过精化剪枝权重梯度找到有效的稀疏模型 链接:https://arxiv.org/abs/2109.04660

作者:Jangho Kim,Jayeon Yoo,Yeji Song,KiYoon Yoo,Nojun Kwak 摘要:随着深度神经网络(DNN)的发展,DNN参数的数量急剧增加。这使得DNN模型很难部署在资源有限的嵌入式系统上。为了缓解这一问题,出现了动态剪枝方法,该方法通过使用直通估计器(STE)来近似剪枝权重的梯度,试图在训练期间发现不同的稀疏模式。STE可以帮助修剪后的权重在发现动态稀疏模式的过程中恢复。然而,由于STE近似的梯度信号不可靠,使用这些粗梯度会导致训练不稳定和性能下降。在这项工作中,为了解决这个问题,我们引入了改进的梯度,通过从两个权重集(修剪和未修剪)形成双转发路径来更新修剪后的权重。我们提出了一种新的动态集体智能学习(DCIL),它利用了两个权重集的集体智能之间的学习协同作用。我们通过在CIFAR和ImageNet数据集上显示训练稳定性和模型性能的增强来验证改进梯度的有用性。DCIL优于以前提出的各种剪枝方案,包括其他动态剪枝方法,在训练期间具有更强的稳定性。 摘要:With the growth of deep neural networks (DNN), the number of DNN parameters has drastically increased. This makes DNN models hard to be deployed on resource-limited embedded systems. To alleviate this problem, dynamic pruning methods have emerged, which try to find diverse sparsity patterns during training by utilizing Straight-Through-Estimator (STE) to approximate gradients of pruned weights. STE can help the pruned weights revive in the process of finding dynamic sparsity patterns. However, using these coarse gradients causes training instability and performance degradation owing to the unreliable gradient signal of the STE approximation. In this work, to tackle this issue, we introduce refined gradients to update the pruned weights by forming dual forwarding paths from two sets (pruned and unpruned) of weights. We propose a novel Dynamic Collective Intelligence Learning (DCIL) which makes use of the learning synergy between the collective intelligence of both weight sets. We verify the usefulness of the refined gradients by showing enhancements in the training stability and the model performance on the CIFAR and ImageNet datasets. DCIL outperforms various previously proposed pruning schemes including other dynamic pruning methods with enhanced stability during training.
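用单个线性神经元可以直观对比STE粗梯度与DCIL式“双前向路径”细化梯度的区别(纯Python示意;权重、学习率与混合系数均为本文假设,并非论文的完整网络):

```python
def forward(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def prune(w, keep=2):
    # 幅值剪枝:只保留绝对值最大的 keep 个权重
    ranked = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    return [w[i] if i in ranked[:keep] else 0.0 for i in range(len(w))]

def ste_step(w, x, y, lr=0.1):
    # STE:用剪枝后的权重前向,但把梯度"直通"回所有稠密权重,
    # 使被剪掉的权重有机会在后续掩码更新中复活。
    err = forward(prune(w), x) - y
    return [wi - lr * err * xi for wi, xi in zip(w, x)]

def dual_step(w, x, y, lr=0.1, alpha=0.5):
    # DCIL 风格的双前向:混合剪枝/未剪枝两条路径的误差,
    # 得到比纯 STE 更平滑的细化梯度(混合系数 alpha 为假设)。
    err = alpha * (forward(prune(w), x) - y) + (1 - alpha) * (forward(w, x) - y)
    return [wi - lr * err * xi for wi, xi in zip(w, x)]

w = [0.5, -0.4, 0.05]
print(ste_step(w, [1.0, 1.0, 1.0], 1.0))
print(dual_step(w, [1.0, 1.0, 1.0], 1.0))
```

两者的更新方向一致,但双前向的误差信号同时反映了两套权重的状态,这正是论文中“集体智能协同”缓解STE训练不稳定的直观来源。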

【28】 AI Agents in Emergency Response Applications 标题:人工智能Agent在应急响应中的应用 链接:https://arxiv.org/abs/2109.04646

作者:Aryan Naim,Ryan Alimo,Jay Braun 摘要:应急人员应对各种情况,从火灾、医疗、危险材料、工业事故到自然灾害。自然灾害或恐怖行为等情况需要消防员、医护人员、危险品小组和其他机构做出多方面的反应。设计用于辅助应急人员的人工智能系统被证明是一个困难的系统工程问题。任务关键型“边缘AI”场景需要低延迟、高可靠的分析。更为复杂的是,当生命受到威胁时,需要很高的模型精度,因此需要将高精度但计算密集的模型部署到资源受限的设备上。为了解决所有这些问题,我们提出了一种基于代理的体系结构,用于通过基于5G服务的体系结构部署AI代理。 摘要:Emergency personnel respond to various situations ranging from fire, medical, hazardous materials, industrial accidents, to natural disasters. Situations such as natural disasters or terrorist acts require a multifaceted response of firefighters, paramedics, hazmat teams, and other agencies. Engineering AI systems that aid emergency personnel proves to be a difficult system engineering problem. Mission-critical "edge AI" situations require low-latency, reliable analytics. To further add complexity, a high degree of model accuracy is required when lives are at stake, creating a need for the deployment of highly accurate, however computationally intensive models to resource-constrained devices. To address all these issues, we propose an agent-based architecture for deployment of AI agents via 5G service-based architecture.

【29】 Knowledge-Assisted Reasoning of Model-Augmented System Requirements with Event Calculus and Goal-Directed Answer Set Programming 标题:基于事件演算和目标定向答案集编程的增模系统需求知识辅助推理 链接:https://arxiv.org/abs/2109.04634

作者:Brendan Hall,Sarat Chandra Varanasi,Jan Fiedor,Joaquín Arias,Kinjal Basu,Fang Li,Devesh Bhatt,Kevin Driscoll,Elmer Salazar,Gopal Gupta 机构:Honeywell Advanced Technology, Plymouth, USA, The University of Texas at Dallas, Richardson, USA, Honeywell International s.r.o & Brno Univ. of Technology, Brno, Czech Republic, Universidad Rey Juan Carlos, Madrid, Spain 备注:None 摘要:我们考虑以受限自然语言表示的网络物理系统需求。我们提出了新的自动化技术,以帮助开发这些需求,从而使它们保持一致,并能够承受感知到的故障。我们展示了如何使用事件演算(EC)对网络物理系统的需求进行建模,事件演算是AI中用于表示动作和变化的一种形式。我们还展示了如何使用应答集编程(ASP)及其查询驱动实现s(CASP)直接实现需求的事件演算模型。此事件演算模型可用于自动验证需求。由于ASP是一种表达性的知识表示语言,因此它还可以用来表示有关网络物理系统的上下文知识,而这反过来又可以用来查找其需求规范中的漏洞。我们通过航空电子领域的高度告警系统来说明我们的方法。 摘要:We consider requirements for cyber-physical systems represented in constrained natural language. We present novel automated techniques for aiding in the development of these requirements so that they are consistent and can withstand perceived failures. We show how cyber-physical systems' requirements can be modeled using the event calculus (EC), a formalism used in AI for representing actions and change. We also show how answer set programming (ASP) and its query-driven implementation s(CASP) can be used to directly realize the event calculus model of the requirements. This event calculus model can be used to automatically validate the requirements. Since ASP is an expressive knowledge representation language, it can also be used to represent contextual knowledge about cyber-physical systems, which, in turn, can be used to find gaps in their requirements specifications. We illustrate our approach through an altitude alerting system from the avionics domain.

【30】 Efficiently Identifying Task Groupings for Multi-Task Learning 标题:多任务学习中任务分组的有效识别 链接:https://arxiv.org/abs/2109.04617

作者:Christopher Fifty,Ehsan Amid,Zhe Zhao,Tianhe Yu,Rohan Anil,Chelsea Finn 机构:Google Brain, Google Research, Stanford University 摘要:多任务学习可以利用一个任务学习到的信息来帮助其他任务的训练。尽管有这种能力,但在一个模型中天真地将所有任务训练在一起通常会降低性能,并且通过任务分组组合进行彻底搜索的成本可能会高得令人望而却步。因此,在没有明确解决方案的情况下,有效地确定将从联合训练中受益的任务仍然是一个具有挑战性的设计问题。在本文中,我们提出了一种在多任务学习模型中选择哪些任务应该一起训练的方法。我们的方法通过将所有任务共同训练并量化一个任务的梯度对另一个任务损失的影响来确定单个训练运行中的任务分组。在大规模Taskonomy计算机视觉数据集上,我们发现,与简单地同时训练所有任务相比,这种方法可以减少10.0%的测试损失,同时操作速度比最先进的任务分组方法快11.6倍。 摘要:Multi-task learning can leverage information learned by one task to benefit the training of other tasks. Despite this capacity, naively training all tasks together in one model often degrades performance, and exhaustively searching through combinations of task groupings can be prohibitively expensive. As a result, efficiently identifying the tasks that would benefit from co-training remains a challenging design question without a clear solution. In this paper, we suggest an approach to select which tasks should train together in multi-task learning models. Our method determines task groupings in a single training run by co-training all tasks together and quantifying the effect to which one task's gradient would affect another task's loss. On the large-scale Taskonomy computer vision dataset, we find this method can decrease test loss by 10.0% compared to simply training all tasks together while operating 11.6 times faster than a state-of-the-art task grouping method.
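“量化一个任务的梯度步对另一个任务损失的影响”这一核心度量,可以在一个共享标量参数上直观演示(示意性玩具例子,损失函数均为本文假设):

```python
def affinity(w, grad_a, loss_b, lr=0.1):
    # 只沿任务a的梯度走一步,度量任务b损失的相对改善;
    # 为正说明两任务可能适合共同训练,为负说明相互冲突。
    w_after = w - lr * grad_a(w)
    return 1.0 - loss_b(w_after) / loss_b(w)

loss1 = lambda w: (w - 2.0) ** 2
grad1 = lambda w: 2.0 * (w - 2.0)
loss2 = lambda w: (w - 1.8) ** 2   # 与任务1的最优点接近
loss3 = lambda w: (w + 2.0) ** 2   # 与任务1的最优点冲突

helpful = affinity(0.0, grad1, loss2)
harmful = affinity(0.0, grad1, loss3)
print(helpful, harmful)
```

把这种两两亲和度在一次联合训练中累积起来,再据此划分任务分组,就是论文避免穷举分组组合的做法。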

【31】 EVOQUER: Enhancing Temporal Grounding with Video-Pivoted BackQuery Generation 标题:EVOQUER:通过视频旋转的BackQuery生成增强时间基础 链接:https://arxiv.org/abs/2109.04600

作者:Yanjun Gao,Lulu Liu,Jason Wang,Xin Chen,Huayan Wang,Rui Zhang 机构:Pennsylvania State University, Kwai Inc 备注:Accepted by Visually Grounded Interaction and Language (ViGIL) Workshop at NAACL 2021 摘要:时间接地(temporal grounding)旨在预测与自然语言查询输入相对应的视频片段的时间间隔。在这项工作中,我们提出了EVOQUER,一个结合现有文本到视频接地模型和视频辅助查询生成网络的时间接地框架。给定一个查询和一个未剪辑的视频,时间接地模型预测目标间隔,并通过生成输入查询的简化版本,将预测的视频剪辑送入视频翻译任务。EVOQUER将时间接地和查询生成两者的损失函数结合起来作为反馈,形成闭环学习。我们在两个广泛使用的数据集(Charades-STA和ActivityNet)上的实验表明,EVOQUER在R@0.7指标上分别取得了1.05和1.31的可喜提升。我们还讨论了查询生成任务如何通过解释时间接地模型行为来促进错误分析。 摘要:Temporal grounding aims to predict a time interval of a video clip corresponding to a natural language query input. In this work, we present EVOQUER, a temporal grounding framework incorporating an existing text-to-video grounding model and a video-assisted query generation network. Given a query and an untrimmed video, the temporal grounding model predicts the target interval, and the predicted video clip is fed into a video translation task by generating a simplified version of the input query. EVOQUER forms closed-loop learning by incorporating loss functions from both temporal grounding and query generation serving as feedback. Our experiments on two widely used datasets, Charades-STA and ActivityNet, show that EVOQUER achieves promising improvements by 1.05 and 1.31 at R@0.7. We also discuss how the query generation task could facilitate error analysis by explaining temporal grounding model behavior.

【32】 Deciphering Environmental Air Pollution with Large Scale City Data 标题:利用大比例尺城市数据破译环境空气污染 链接:https://arxiv.org/abs/2109.04572

作者:Mayukh Bhattacharyya,Sayan Nag,Udita Ghosh 机构:Stony Brook University, University of Toronto, Zendrive 摘要:在21世纪对可持续环境条件构成威胁的众多危害中,只有少数危害的影响比空气污染更严重。它在确定城市环境中的健康和生活水平方面的重要性只会随着时间的推移而增加。从交通和发电厂的排放、家庭排放到自然原因,各种因素都是造成空气污染水平上升的主要成因或影响因素。然而,缺乏涉及这些主要因素的大规模数据,阻碍了对支配不同空气污染物变化的成因及关系的研究。通过这项工作,我们引入了一个按城市划分的大规模数据集,用于探索这些因素之间的长期关系。我们对数据集进行分析和探索,给出通过对数据建模可以得出的推论。此外,我们使用一组多样的模型和方法,为污染物水平的估计与预测问题提供了一套基准。通过我们的论文,我们试图为这一领域的进一步研究奠定基础,该领域在不久的将来将需要我们的高度重视。 摘要:Out of the numerous hazards posing a threat to sustainable environmental conditions in the 21st century, only a few have a graver impact than air pollution. Its importance in determining the health and living standards in urban settings is only expected to increase with time. Various factors ranging from emissions from traffic and power plants, household emissions, natural causes are known to be primary causal agents or influencers behind rising air pollution levels. However, the lack of large scale data involving the major factors has hindered the research on the causes and relations governing the variability of the different air pollutants. Through this work, we introduce a large scale city-wise dataset for exploring the relationships among these agents over a long period of time. We analyze and explore the dataset to bring out inferences which we can derive by modeling the data. Also, we provide a set of benchmarks for the problem of estimating or forecasting pollutant levels with a set of diverse models and methodologies. Through our paper, we seek to provide a ground base for further research into this domain that will demand critical attention of ours in the near future.

【33】 TENET: Temporal CNN with Attention for Anomaly Detection in Automotive Cyber-Physical Systems 标题:TENET:用于汽车网络物理系统异常检测的带注意力机制的时序CNN 链接:https://arxiv.org/abs/2109.04565

作者:S. V. Thiruloga,V. K. Kukkala,S. Pasricha 机构:Electrical and Computer Engineering, Colorado State University, Fort Collins, CO, USA 摘要:现代车辆具有多个电子控制单元(ECU),它们作为复杂的分布式网络物理系统(CPS)的一部分连接在一起。ECU与外部电子系统之间日益增加的通信使这些车辆特别容易受到各种网络攻击。在这项工作中,我们提出了一种新的异常检测框架,称为TENET,用于检测由对车辆的网络攻击引起的异常。TENET使用带有集成注意力机制的时间卷积神经网络来检测异常攻击模式。与之前在汽车异常检测方面表现最佳的工作相比,TENET在假阴性率、Matthews相关系数和ROC-AUC度量上分别取得32.70%、19.14%和17.25%的改进,同时模型参数减少94.62%,内存占用减少86.95%,推理时间减少48.14%。 摘要:Modern vehicles have multiple electronic control units (ECUs) that are connected together as part of a complex distributed cyber-physical system (CPS). The ever-increasing communication between ECUs and external electronic systems has made these vehicles particularly susceptible to a variety of cyber-attacks. In this work, we present a novel anomaly detection framework called TENET to detect anomalies induced by cyber-attacks on vehicles. TENET uses temporal convolutional neural networks with an integrated attention mechanism to detect anomalous attack patterns. TENET is able to achieve an improvement of 32.70% in False Negative Rate, 19.14% in the Matthews Correlation Coefficient, and 17.25% in the ROC-AUC metric, with 94.62% fewer model parameters, 86.95% decrease in memory footprint, and 48.14% lower inference time when compared to the best performing prior work on automotive anomaly detection.

【34】 Identifying Morality Frames in Political Tweets using Relational Learning 标题:利用关系学习识别政治推文中的道德框架 链接:https://arxiv.org/abs/2109.04535

作者:Shamik Roy,Maria Leonor Pacheco,Dan Goldwasser 机构:Department of Computer Science, Purdue University, West Lafayette, IN, USA 备注:Accepted to EMNLP 2021 摘要:从文本中提取道德情感是理解公众舆论、社会运动和政策决策的重要组成部分。道德基础理论确定了五个道德基础,每个道德基础都有正负极性。然而,道德情感往往是由其目标驱动的,目标可以对应于个人或集体实体。在本文中,我们介绍了道德框架,这是一个组织针对不同实体的道德态度的表示框架,并构建了一个新颖、高质量的标注数据集,由美国政客撰写的推文组成。然后,我们提出了一个关系学习模型来联合预测对实体和道德基础的道德态度。我们做了定性和定量的评估,表明不同的政治意识形态对实体的道德情感差异很大。 摘要:Extracting moral sentiment from text is a vital component in understanding public opinion, social movements, and policy decisions. The Moral Foundation Theory identifies five moral foundations, each associated with a positive and negative polarity. However, moral sentiment is often motivated by its targets, which can correspond to individuals or collective entities. In this paper, we introduce morality frames, a representation framework for organizing moral attitudes directed at different entities, and come up with a novel and high-quality annotated dataset of tweets written by US politicians. Then, we propose a relational learning model to predict moral attitudes towards entities and moral foundations jointly. We do qualitative and quantitative evaluations, showing that moral sentiment towards entities differs highly across political ideologies.

【35】 Bootstrapped Meta-Learning 标题:自助式元学习 链接:https://arxiv.org/abs/2109.04504

作者:Sebastian Flennerhag,Yannick Schroecker,Tom Zahavy,Hado van Hasselt,David Silver,Satinder Singh 机构:DeepMind 备注:31 pages, 19 figures, 7 tables 摘要:元学习使人工智能能够通过学习如何学习来提高其效率。释放这种潜力需要克服一个具有挑战性的元优化问题,该问题通常表现出病态和短视的元目标。我们提出了一种算法,通过让元学习者自学来解决这些问题。该算法首先从元学习器中引导一个目标,然后通过在选择的(伪)度量下最小化到该目标的距离来优化元学习器。以梯度元学习为重点,我们建立了保证性能改进的条件,并表明改进与目标距离有关。因此,通过控制曲率,可以使用距离度量来简化元优化,例如通过减少病态。此外,自举机制可以扩展有效的元学习范围,而无需通过所有更新进行反向传播。该算法通用性强,易于实现。我们在Atari ALE基准上实现了无模型代理的最新水平,在Few-Shot学习中改进了MAML,并展示了我们的方法如何通过在Q-learning代理中进行元学习和高效探索来打开新的可能性。 摘要:Meta-learning empowers artificial intelligence to increase its efficiency by learning how to learn. Unlocking this potential involves overcoming a challenging meta-optimisation problem that often exhibits ill-conditioning, and myopic meta-objectives. We propose an algorithm that tackles these issues by letting the meta-learner teach itself. The algorithm first bootstraps a target from the meta-learner, then optimises the meta-learner by minimising the distance to that target under a chosen (pseudo-)metric. Focusing on meta-learning with gradients, we establish conditions that guarantee performance improvements and show that the improvement is related to the target distance. Thus, by controlling curvature, the distance measure can be used to ease meta-optimization, for instance by reducing ill-conditioning. Further, the bootstrapping mechanism can extend the effective meta-learning horizon without requiring backpropagation through all updates. The algorithm is versatile and easy to implement. We achieve a new state-of-the art for model-free agents on the Atari ALE benchmark, improve upon MAML in few-shot learning, and demonstrate how our approach opens up new possibilities by meta-learning efficient exploration in a Q-learning agent.
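“先从学习器自举出一个目标,再在所选(伪)度量下最小化到该目标的距离”这一机制,可以用一个标量玩具例子示意(损失函数、前瞻步数与学习率均为本文假设,并非论文算法的完整实现):

```python
def sgd_steps(w, grad, lr, n):
    # 让学习器多走 n 步,得到自举目标(target)
    for _ in range(n):
        w = w - lr * grad(w)
    return w

def bootstrapped_update(w, grad, inner_lr=0.1, meta_lr=0.5, lookahead=5):
    target = sgd_steps(w, grad, inner_lr, lookahead)
    # 在平方距离(伪度量)下向目标走一步,
    # 而无需把梯度反传穿过全部 lookahead 次更新
    return w + meta_lr * (target - w)

grad = lambda w: 2.0 * (w - 3.0)   # 损失 (w-3)^2,最优点在 3
w = 0.0
for _ in range(10):
    w = bootstrapped_update(w, grad)
print(round(w, 3))
```

参数单调逼近最优点3,且每次外层更新只需一次“目标减当前”的差值,直观体现了自举如何在不反传所有更新的情况下延长有效的元学习视野。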

机器翻译,仅供参考
