AI Academic Digest [8.19]

2021-08-24 16:35:41


cs.AI (Artificial Intelligence): 33 papers in total

【1】 Fake News and Phishing Detection Using a Machine Learning Trained Expert System 标题:基于机器学习训练的专家系统在假新闻和钓鱼检测中的应用 链接:https://arxiv.org/abs/2108.08264

作者:Benjamin Fitzpatrick,Xinyu "Sherwin" Liang,Jeremy Straub 机构:Department of Electrical and Computer Engineering, University of Alabama, H.M. Comer,th Avenue, Tuscaloosa, AL , Phone: , (,) ,-, Fax: , (,) ,-, Xinyu “Sherwin” Liang, Dallas College – North Lake, N. MacArthur Blvd., Irving, TX , Corresponding Author 摘要:专家系统已被用来使计算机能够作出建议和决定。本文介绍了一个机器学习训练专家系统(MLES)在钓鱼网站检测和假新闻检测中的应用。这两个主题都有一个相似的目标:设计一个规则事实网络,允许计算机像领域专家一样在各个领域做出可解释的决策。钓鱼网站检测研究使用MLES通过分析网站属性(如URL长度和过期时间)来检测潜在的钓鱼网站。虚假新闻检测研究使用MLES规则事实网络,根据情感、说话人的政治背景和工作等因素来衡量新闻报道的真实性。这两项研究使用不同的MLES网络实现,本文对此进行了介绍和比较。假新闻研究采用了更线性的设计,而网络钓鱼项目则采用了更复杂的连接结构。这两个网络的输入都基于常用的数据集。 摘要:Expert systems have been used to enable computers to make recommendations and decisions. This paper presents the use of a machine learning trained expert system (MLES) for phishing site detection and fake news detection. Both topics share a similar goal: to design a rule-fact network that allows a computer to make explainable decisions like domain experts in each respective area. The phishing website detection study uses a MLES to detect potential phishing websites by analyzing site properties (like URL length and expiration time). The fake news detection study uses a MLES rule-fact network to gauge news story truthfulness based on factors such as emotion, the speaker's political affiliation status, and job. The two studies use different MLES network implementations, which are presented and compared herein. The fake news study utilized a more linear design while the phishing project utilized a more complex connection structure. Both networks' inputs are based on commonly available data sets.

【2】 Combating Informational Denial-of-Service (IDoS) Attacks: Modeling and Mitigation of Attentional Human Vulnerability 标题:对抗信息拒绝服务(IDOS)攻击:关注的人类脆弱性的建模和缓解 链接:https://arxiv.org/abs/2108.08255

作者:Linan Huang,Quanyan Zhu 摘要:这项工作提出了一类新的主动攻击,称为信息拒绝服务(IDoS)攻击,它利用人的注意力弱点。IDoS攻击通过产生大量假动作,耗尽了人类操作员的认知资源,从而阻止人类识别隐藏在假动作中的真实攻击。这项工作旨在正式定义IDoS攻击,量化其后果,并开发人类辅助安全技术,以减轻IDoS攻击的严重程度和风险。为此,我们将带有类别标签的假攻击和真攻击的顺序到达建模为半马尔可夫过程。辅助技术通过定期突出显示选择性警报以防止其他警报分散注意力,从而战略性地管理人类注意力。采用数据驱动的方法对不同注意管理策略下的人的绩效进行评估。在一个典型的特殊情况下,我们建立了两个动态规划表示之间的计算等价性,以简化理论计算和在线学习。案例研究证实了学习框架的有效性。数值结果说明了AM策略如何减轻IDoS攻击的严重程度和风险。此外,我们描述了所有AM策略下最低严重性水平的基本限值,以及减少IDoS风险的检查期的最大长度。 摘要:This work proposes a new class of proactive attacks called the Informational Denial-of-Service (IDoS) attacks that exploit the attentional human vulnerability. By generating a large volume of feints, IDoS attacks deplete the cognition resources of human operators to prevent humans from identifying the real attacks hidden among feints. This work aims to formally define IDoS attacks, quantify their consequences, and develop human-assistive security technologies to mitigate the severity level and risks of IDoS attacks. To this end, we model the feint and real attacks' sequential arrivals with category labels as a semi-Markov process. The assistive technology strategically manages human attention by highlighting selective alerts periodically to prevent the distraction of other alerts. A data-driven approach is applied to evaluate human performance under different Attention Management (AM) strategies. Under a representative special case, we establish the computational equivalency between two dynamic programming representations to simplify the theoretical computation and the online learning. A case study corroborates the effectiveness of the learning framework. The numerical results illustrate how AM strategies can alleviate the severity level and the risk of IDoS attacks. Furthermore, we characterize the fundamental limits of the minimum severity level under all AM strategies and the maximum length of the inspection period to reduce the IDoS risks.
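A toy illustration of the arrival model described in this abstract: feint and real alerts arriving according to a two-state semi-Markov process. This is not the authors' implementation; the exponential holding times, rates, and switching probability below are made-up assumptions.

```python
import numpy as np

def simulate_idos_alerts(horizon=100.0, rates=(5.0, 0.5), p_stay=0.8, seed=0):
    """Simulate feint/real alert arrivals as a two-state semi-Markov process.
    States: 0 = feint, 1 = real attack. Sojourn times are exponential with
    state-dependent rates; the next category follows a simple Markov switch.
    All parameter values are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    t, state, events = 0.0, 0, []
    while True:
        t += rng.exponential(1.0 / rates[state])   # sojourn time in current state
        if t >= horizon:
            break
        events.append((t, "feint" if state == 0 else "real"))
        if rng.random() > p_stay:                  # occasionally switch category
            state = 1 - state
    return events

alerts = simulate_idos_alerts()
print(len(alerts), "alerts,", sum(1 for _, c in alerts if c == "real"), "real")
```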

【3】 Deep Natural Language Processing for LinkedIn Search Systems 标题:LinkedIn搜索系统的深度自然语言处理 链接:https://arxiv.org/abs/2108.08252

作者:Weiwei Guo,Xiaowei Liu,Sida Wang,Michaeel Kazi,Zhoutong Fu,Huiji Gao,Jun Jia,Liang Zhang,Bo Long 机构:LinkedIn, Mountain View, CA 摘要:许多搜索系统处理大量自然语言数据,例如搜索查询、用户配置文件和文档,其中基于深度学习的自然语言处理技术(deep-NLP)可以提供很大帮助。在这篇文章中,我们介绍了一项综合研究,将深度自然语言处理技术应用于搜索引擎中的五个代表性任务。通过五个任务的模型设计和实验,读者可以找到三个重要问题的答案:(1)深度自然语言处理在搜索系统中什么时候有用/不有用(2) 如何应对延迟挑战(3) 如何确保模型的健壮性?这项工作建立在LinkedIn搜索现有工作的基础上,并在商业搜索引擎上进行了大规模测试。我们相信,我们的经验可以为行业和研究界提供有用的见解。 摘要:Many search systems work with large amounts of natural language data, e.g., search queries, user profiles and documents, where deep learning based natural language processing techniques (deep NLP) can be of great help. In this paper, we introduce a comprehensive study of applying deep NLP techniques to five representative tasks in search engines. Through the model design and experiments of the five tasks, readers can find answers to three important questions: (1) When is deep NLP helpful/not helpful in search systems? (2) How to address latency challenges? (3) How to ensure model robustness? This work builds on existing efforts of LinkedIn search, and is tested at scale on a commercial search engine. We believe our experiences can provide useful insights for the industry and research communities.

【4】 Semi-Supervised Learning for Channel Charting-Aided IoT Localization in Millimeter Wave Networks 标题:毫米波网络信道图辅助物联网定位的半监督学习 链接:https://arxiv.org/abs/2108.08241

作者:Qianqian Zhang,Walid Saad 机构:Semi-Supervised Learning for Channel Charting-Aided IoTLocalization in Millimeter Wave NetworksQianqian Zhang and Walid SaadBradley Department of Electrical and Computer Engineering 摘要:本文提出了一种新的毫米波网络信道图辅助定位框架。特别地,提出了一种卷积自编码器模型,用于基于不同基站接收到的多径信道状态信息(CSI)估计无线用户设备(UE)的三维位置。为了学习无线电几何图并捕获每个UE的相对位置,以无监督的方式构造基于自动编码器的信道图,使得物理空间中的相邻UE在信道图中保持接近。接下来,将信道图模型扩展到半监督框架中,其中自动编码器分为两个组件:编码器和解码器,并且使用带有相关位置信息的标记CSI数据集对每个组件进行单独优化,以进一步提高定位精度。仿真结果表明,与已有的有监督定位方法和传统的无监督CC定位方法相比,提出的CC辅助半监督定位方法具有更高的定位精度。 摘要:In this paper, a novel framework is proposed for channel charting (CC)-aided localization in millimeter wave networks. In particular, a convolutional autoencoder model is proposed to estimate the three-dimensional location of wireless user equipment (UE), based on multipath channel state information (CSI), received by different base stations. In order to learn the radio-geometry map and capture the relative position of each UE, an autoencoder-based channel chart is constructed in an unsupervised manner, such that neighboring UEs in the physical space will remain close in the channel chart. Next, the channel charting model is extended to a semi-supervised framework, where the autoencoder is divided into two components: an encoder and a decoder, and each component is optimized individually, using the labeled CSI dataset with associated location information, to further improve positioning accuracy. Simulation results show that the proposed CC-aided semi-supervised localization yields a higher accuracy, compared with existing supervised positioning and conventional unsupervised CC approaches.

【5】 LOKI: Long Term and Key Intentions for Trajectory Prediction 标题:LOKI:轨迹预测的长期和关键意图 链接:https://arxiv.org/abs/2108.08236

作者:Harshayu Girase,Haiming Gang,Srikanth Malla,Jiachen Li,Akira Kanehara,Karttikeya Mangalam,Chiho Choi 机构:Honda Research Institute USA, University of California, Berkeley, Honda R&D Co., Ltd. 备注:ICCV 2021 (The dataset is available at this https URL) 摘要:轨迹预测方面的最新进展表明,关于主体意图的明确推理对于准确预测其运动非常重要。然而,目前的研究活动并不直接适用于智能和安全关键系统。这主要是因为很少有公共数据集是可用的,他们只考虑行人特定意图短暂的时间跨度从限制自我中心的观点。为此,我们提出了LOKI(长期和关键意图),这是一种新型的大规模数据集,旨在解决自主驾驶环境中异构交通代理(行人和车辆)的联合轨迹和意图预测问题。创建LOKI数据集是为了发现可能影响意图的几个因素,包括i)代理人自身意愿,ii)社会互动,iii)环境约束,以及iv)上下文信息。我们还提出了一个联合执行轨迹和意图预测的模型,表明关于意图的循环推理可以辅助轨迹预测。我们展示了我们的方法比最先进的轨迹预测方法高出27%$,并且还为基于帧的意图估计提供了基线。 摘要:Recent advances in trajectory prediction have shown that explicit reasoning about agents' intent is important to accurately forecast their motion. However, the current research activities are not directly applicable to intelligent and safety critical systems. This is mainly because very few public datasets are available, and they only consider pedestrian-specific intents for a short temporal horizon from a restricted egocentric view. To this end, we propose LOKI (LOng term and Key Intentions), a novel large-scale dataset that is designed to tackle joint trajectory and intention prediction for heterogeneous traffic agents (pedestrians and vehicles) in an autonomous driving setting. The LOKI dataset is created to discover several factors that may affect intention, including i) agent's own will, ii) social interactions, iii) environmental constraints, and iv) contextual information. We also propose a model that jointly performs trajectory and intention prediction, showing that recurrently reasoning about intention can assist with trajectory prediction. We show our method outperforms state-of-the-art trajectory prediction methods by upto $27%$ and also provide a baseline for frame-wise intention estimation.

【6】 Streaming and Learning the Personal Context 标题:流媒体和学习个人情境 链接:https://arxiv.org/abs/2108.08234

作者:Fausto Giunchiglia,Marcelo Rodas Britez,Andrea Bontempelli,Xiaoyue Li 机构:University of Trento, Trento, Italy 备注:9 pages, 4 figures 摘要:个人环境的表示是复杂的,对于提高机器对人类理解世界的帮助以及人类对机器提高效率的帮助至关重要。我们的目标是设计一个新的个人背景模型表示,并设计一个学习过程,以便更好地与机器学习相结合。我们的目标是将这些元素实现到现代系统体系结构中,重点是在现实环境中。此外,我们还将在具体相关的工作文件中展示如何改进我们的建议。最后,我们通过改进的模型、学习过程的实现以及这些组件的架构设计,进一步改进了个人上下文表示。 摘要:The representation of the personal context is complex and essential to improve the help machines can give to humans for making sense of the world, and the help humans can give to machines to improve their efficiency. We aim to design a novel model representation of the personal context and design a learning process for better integration with machine learning. We aim to implement these elements into a modern system architecture focus in real-life environments. Also, we show how our proposal can improve in specifically related work papers. Finally, we are moving forward with a better personal context representation with an improved model, the implementation of the learning process, and the architectural design of these components.

【7】 Analogical Learning in Tactical Decision Games 标题:战术决策博弈中的类比学习 链接:https://arxiv.org/abs/2108.08227

作者:Tom Hinrichs,Greg Dunham,Ken Forbus 机构:Qualitative Reasoning Group, Northwestern University, Maple Avenue, Evanston, IL , USA 备注:6 pages, 2 figures, unpublished 摘要:战术决策游戏(TDG)是在地图上以文字和图形方式呈现的军事冲突场景。这些场景为机器学习提供了一个具有挑战性的领域,因为它们是开放的、高度结构化的,并且通常包含许多不同相关性的细节。我们已经开发了一个交互式同伴系统的问题解决组件,该系统提出了军事任务,使用类比检索、映射和约束传播的组合来解决TDG场景。我们使用这个问题解决组件来探索类比学习。在本文中,我们描述了在该领域的学习中遇到的问题,以及我们为解决这些问题而开发的方法,例如类比映射对应的分区约束和使用增量重映射来提高鲁棒性。我们给出的学习实验结果表明,尽管存在弱域理论,但通过简单的示例积累,性能有所提高。 摘要:Tactical Decision Games (TDGs) are military conflict scenarios presented both textually and graphically on a map. These scenarios provide a challenging domain for machine learning because they are open-ended, highly structured, and typically contain many details of varying relevance. We have developed a problem-solving component of an interactive companion system that proposes military tasks to solve TDG scenarios using a combination of analogical retrieval, mapping, and constraint propagation. We use this problem-solving component to explore analogical learning. In this paper, we describe the problems encountered in learning for this domain, and the methods we have developed to address these, such as partition constraints on analogical mapping correspondences and the use of incremental remapping to improve robustness. We present the results of learning experiments that show improvement in performance through the simple accumulation of examples, despite a weak domain theory.

【8】 X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics 标题:X-modaler:一个用于跨模态分析的通用高性能代码库 链接:https://arxiv.org/abs/2108.08217

作者:Yehao Li,Yingwei Pan,Jingwen Chen,Ting Yao,Tao Mei 机构:JD AI Research, Beijing, China 备注:Accepted by 2021 ACMMM Open Source Software Competition. Source code: this https URL 摘要:随着深度学习在过去十年中的兴起和发展,出现了稳定的创新和突破势头,令人信服地推动了多媒体领域视觉和语言之间的跨模态分析的最新发展。然而,还没有一个开源代码库来支持以统一和模块化的方式训练和部署用于跨模态分析的众多神经网络模型。在这项工作中,我们提出了X-modaler——一种通用的高性能代码库,它将最先进的跨模态分析封装到几个通用阶段(例如预处理、编码器、跨模态交互、解码器和解码策略)。每个阶段都具有功能,涵盖了一系列在最新技术中广泛采用的模块,并允许在这些模块之间无缝切换。这种方式自然能够灵活地实现图像字幕、视频字幕和视觉语言预训练的最新算法,以促进研究社区的快速发展。同时,由于几个阶段的有效模块化设计(例如,跨模态交互)在不同的视觉语言任务中共享,因此X-modaler可以简单地扩展为跨模态分析中其他任务的启动原型,包括视觉问答、视觉常识推理和跨模态检索。X-modaler是Apache许可的代码库,其源代码、示例项目和预先训练的模型可在线获取:https://github.com/YehLi/xmodaler. 摘要:With the rise and development of deep learning over the past decade, there has been a steady momentum of innovation and breakthroughs that convincingly push the state-of-the-art of cross-modal analytics between vision and language in multimedia field. Nevertheless, there has not been an open-source codebase in support of training and deploying numerous neural network models for cross-modal analytics in a unified and modular fashion. In this work, we propose X-modaler -- a versatile and high-performance codebase that encapsulates the state-of-the-art cross-modal analytics into several general-purpose stages (e.g., pre-processing, encoder, cross-modal interaction, decoder, and decode strategy). Each stage is empowered with the functionality that covers a series of modules widely adopted in state-of-the-arts and allows seamless switching in between. This way naturally enables a flexible implementation of state-of-the-art algorithms for image captioning, video captioning, and vision-language pre-training, aiming to facilitate the rapid development of research community. Meanwhile, since the effective modular designs in several stages (e.g., cross-modal interaction) are shared across different vision-language tasks, X-modaler can be simply extended to power startup prototypes for other tasks in cross-modal analytics, including visual question answering, visual commonsense reasoning, and cross-modal retrieval. X-modaler is an Apache-licensed codebase, and its source codes, sample projects and pre-trained models are available on-line: https://github.com/YehLi/xmodaler.

【9】 SHAQ: Single Headed Attention with Quasi-Recurrence 标题:沙克:单头注意力与准重复 链接:https://arxiv.org/abs/2108.08207

作者:Nashwin Bharwani,Warren Kushner,Sangeet Dandona,Ben Schreiber 机构: This is fea-sible for researchers at big tech companies and leading re-search universities 备注:8 pages, 11 figures 摘要:近年来,自然语言处理的研究主要集中在大规模变换模型上。尽管Transformer在许多重要的语言任务上都达到了最先进的水平,但它通常需要昂贵的计算资源,并且需要几天到几周的时间来训练。这对大型科技公司和一流研究型大学的研究人员来说是可行的,但对好斗的创业者、学生和独立研究人员来说则不然。Stephen Merity的SHA-RNN是一种紧凑的混合注意力RNN模型,专为消费者级建模而设计,因为它需要更少的参数和更少的训练时间才能达到接近最先进的结果。我们在这里通过对几个体系结构单元的探索性模型分析来分析Merity的模型,在我们的评估中考虑了训练时间和总体质量。最后,我们将这些发现结合到一个新的结构中,我们称之为SHAQ:单头注意准递归神经网络。通过我们的新架构,我们实现了与SHA-RNN相似的精度结果,同时在训练中实现了4倍的速度提升。 摘要:Natural Language Processing research has recently been dominated by large scale transformer models. Although they achieve state of the art on many important language tasks, transformers often require expensive compute resources, and days spanning to weeks to train. This is feasible for researchers at big tech companies and leading research universities, but not for scrappy start-up founders, students, and independent researchers. Stephen Merity's SHA-RNN, a compact, hybrid attention-RNN model, is designed for consumer-grade modeling as it requires significantly fewer parameters and less training time to reach near state of the art results. We analyze Merity's model here through an exploratory model analysis over several units of the architecture considering both training time and overall quality in our assessment. Ultimately, we combine these findings into a new architecture which we call SHAQ: Single Headed Attention Quasi-recurrent Neural Network. With our new architecture we achieved similar accuracy results as the SHA-RNN while accomplishing a 4x speed boost in training.
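Neither SHAQ's code nor its exact layer layout is given in the abstract; the PyTorch sketch below only illustrates the two ingredients its name refers to, a quasi-recurrent (QRNN-style) pooling layer and a single causal attention head. The module names, kernel size, and the way the blocks are stacked are assumptions for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuasiRecurrentPooling(nn.Module):
    """QRNN-style layer: a temporal convolution produces candidate values and
    forget gates, followed by a cheap element-wise recurrence (fo-pooling)."""
    def __init__(self, d_model, kernel_size=2):
        super().__init__()
        self.conv = nn.Conv1d(d_model, 2 * d_model, kernel_size, padding=kernel_size - 1)

    def forward(self, x):                            # x: (batch, time, d_model)
        z, f = self.conv(x.transpose(1, 2))[:, :, :x.size(1)].chunk(2, dim=1)
        z, f = torch.tanh(z), torch.sigmoid(f)       # candidate and forget gate
        h, outputs = torch.zeros_like(z[:, :, 0]), []
        for t in range(z.size(2)):                   # element-wise recurrence
            h = f[:, :, t] * h + (1 - f[:, :, t]) * z[:, :, t]
            outputs.append(h)
        return torch.stack(outputs, dim=1)           # (batch, time, d_model)

class SingleHeadAttention(nn.Module):
    """One causally masked attention head, in the spirit of SHA-RNN."""
    def __init__(self, d_model):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)

    def forward(self, x):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / x.size(-1) ** 0.5
        mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
        return F.softmax(scores.masked_fill(mask, float("-inf")), dim=-1) @ v

x = torch.randn(4, 16, 64)
y = SingleHeadAttention(64)(QuasiRecurrentPooling(64)(x))
print(y.shape)  # torch.Size([4, 16, 64])
```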

【10】 CARE: Coherent Actionable Recourse based on Sound Counterfactual Explanations 标题:关注:基于合理的反事实解释的连贯的可诉资源 链接:https://arxiv.org/abs/2108.08197

作者:Peyman Rasouli,Ingrid Chieh Yu 机构:Department of Informatics, University of Oslo, Oslo, Norway 摘要:反事实解释方法以“假设情景”的形式解释机器学习模型的输出,而不损害保真度可解释性权衡。他们解释了如何通过建议对输入特征进行小的更改,从模型中获得所需的预测。我们认为,应根据源自地面真相数据分布并与领域知识相联系的合理反事实解释,建立可采取行动的追索权。此外,它需要在满足用户/域指定约束的同时保持更改/未更改特征之间的一致性。本文介绍了CARE,这是一个模块化的解释框架,它以连续和结构化的方式处理模型和用户级别的需求。我们通过在多目标优化框架中提出新颖高效的解决方案来解决现有需求。所设计的框架能够包含任意需求,并生成反事实解释和可选择的可诉追索权。作为一种模型不可知的方法,CARE为表格分类和回归设置中的任何黑盒模型生成多种多样的解释。在标准数据集和黑盒模型上的几个实验证明了我们的模块化框架的有效性及其与基线相比的优越性能。 摘要:Counterfactual explanation methods interpret the outputs of a machine learning model in the form of "what-if scenarios" without compromising the fidelity-interpretability trade-off. They explain how to obtain a desired prediction from the model by recommending small changes to the input features, aka recourse. We believe an actionable recourse should be created based on sound counterfactual explanations originating from the distribution of the ground-truth data and linked to the domain knowledge. Moreover, it needs to preserve the coherency between changed/unchanged features while satisfying user/domain-specified constraints. This paper introduces CARE, a modular explanation framework that addresses the model- and user-level desiderata in a consecutive and structured manner. We tackle the existing requirements by proposing novel and efficient solutions that are formulated in a multi-objective optimization framework. The designed framework enables including arbitrary requirements and generating counterfactual explanations and actionable recourse by choice. As a model-agnostic approach, CARE generates multiple, diverse explanations for any black-box model in tabular classification and regression settings. Several experiments on standard data sets and black-box models demonstrate the effectiveness of our modular framework and its superior performance compared to the baselines.

【11】 DeepExpress: Heterogeneous and Coupled Sequence Modeling for Express Delivery Prediction 标题:DeepExpress:用于快递预测的异构耦合序列建模 链接:https://arxiv.org/abs/2108.08170

作者:Siyuan Ren,Bin Guo,Longbing Cao,Ke Li,Jiaqi Liu,Zhiwen Yu 机构: Northwestern Polytechnical UniversityBIN GUO, Northwestern Polytechnical UniversityLONGBING CAO, University of Technology SydneyKE LI, Northwestern Polytechnical UniversityJIAQI LIU, Northwestern Polytechnical UniversityZHIWEN YU 摘要:快递顺序的预测,即建模和估计每日进出包裹的数量,对于在线业务、物流和积极的客户体验,特别是对于资源分配优化和促销活动安排至关重要。对消费者交付请求的精确估计必须涉及连续因素,如购物行为、天气条件、事件、商业活动及其耦合。此外,传统的序列预测假设序列演化稳定,无法处理复杂的非线性序列和上述多源数据中的各种特征效应。尽管深层网络和注意机制显示了复杂序列建模的潜力,但现有网络忽略了特征和序列之间的异构和耦合情况,导致预测精度低下。为了解决这些问题,我们提出了基于深度学习的快递序列预测模型DeepExpress,该模型将经典的seq2seq框架扩展到学习序列和特征之间的复杂耦合。DeepExpress利用express delivery seq2seq学习、精心设计的异构特征表示和新颖的联合训练注意机制自适应映射异构数据,并捕获序列特征耦合以进行精确估计。对真实数据的实验结果表明,该方法优于浅基线和深基线模型。 摘要:The prediction of express delivery sequence, i.e., modeling and estimating the volumes of daily incoming and outgoing parcels for delivery, is critical for online business, logistics, and positive customer experience, and specifically for resource allocation optimization and promotional activity arrangement. A precise estimate of consumer delivery requests has to involve sequential factors such as shopping behaviors, weather conditions, events, business campaigns, and their couplings. Besides, conventional sequence prediction assumes a stable sequence evolution, failing to address complex nonlinear sequences and various feature effects in the above multi-source data. Although deep networks and attention mechanisms demonstrate the potential of complex sequence modeling, extant networks ignore the heterogeneous and coupling situation between features and sequences, resulting in weak prediction accuracy. To address these issues, we propose DeepExpress - a deep-learning based express delivery sequence prediction model, which extends the classic seq2seq framework to learning complex coupling between sequence and features. DeepExpress leverages an express delivery seq2seq learning, a carefully-designed heterogeneous feature representation, and a novel joint training attention mechanism to adaptively map heterogeneous data, and capture sequence-feature coupling for precise estimation. Experimental results on real-world data demonstrate that the proposed method outperforms both shallow and deep baseline models.

【12】 Active Observer Visual Problem-Solving Methods are Dynamically Hypothesized, Deployed and Tested 标题:动态假设、部署和测试主动观察者可视化问题解决方法 链接:https://arxiv.org/abs/2108.08145

作者:Markus D. Solbach,John K. Tsotsos 机构:Electrical Engineering and Computer Science, York University, Toronto, Canada 备注:15 pages, 6 figures 摘要:STAR体系结构旨在测试复杂现实视觉空间任务和行为的视觉注意力全选择性调节模型的价值。然而,关于人类如何以主动观察者的身份在3D中解决这些任务的知识是贫乏的。因此,我们设计了一种新的实验装置并研究了这种行为。我们发现,人类表现出各种各样的问题解决策略,其广度和复杂性令人惊讶,并且不容易被当前的方法处理。很明显,解决方法是由假设的动作序列动态组合而成的,对它们进行测试,如果失败,则尝试不同的动作序列。积极观察的重要性是惊人的,因为缺乏任何学习效果。这些结果告诉我们,STAR的认知程序表示扩展了其与现实任务的相关性。 摘要:The STAR architecture was designed to test the value of the full Selective Tuning model of visual attention for complex real-world visuospatial tasks and behaviors. However, knowledge of how humans solve such tasks in 3D as active observers is lean. We thus devised a novel experimental setup and examined such behavior. We discovered that humans exhibit a variety of problem-solving strategies whose breadth and complexity are surprising and not easily handled by current methodologies. It is apparent that solution methods are dynamically composed by hypothesizing sequences of actions, testing them, and if they fail, trying different ones. The importance of active observation is striking as is the lack of any learning effect. These results inform our Cognitive Program representation of STAR extending its relevance to real-world tasks.

【13】 Fighting Game Commentator with Pitch and Loudness Adjustment Utilizing Highlight Cues 标题:利用高亮线索调整音高和响度的格斗游戏解说员 链接:https://arxiv.org/abs/2108.08112

作者:Junjie H. Xu,Zhou Fang,Qihang Chen,Satoru Ohno,Pujana Paliyawan 机构:∗Graduate School of Comprehensive Human Sciences, University of Tsukuba, Japan, †Department of Sociology, Doshisha University, Japan, ‡Research Organization of Science and Technology, Ritsumeikan University, Japan 备注:None 摘要:本文介绍了一个在格斗游戏中提供实时游戏评论的评论员。评论将通过分析游戏过程中的场景获得的高亮提示作为输入,通过使用文本到语音(TTS)技术调整评论的音调和音量。我们研究了不同的音调和响度调整设计。该人工智能由两部分组成:用于控制TTS音高和响度的动态调节器和实时游戏评论生成器。我们对一款格斗游戏进行了初步研究,结果表明,根据游戏的突出程度显著调整响度,可以提高游戏的娱乐性。 摘要:This paper presents a commentator for providing real-time game commentary in a fighting game. The commentary takes into account highlight cues, obtained by analyzing scenes during gameplay, as input to adjust the pitch and loudness of commentary to be spoken by using a Text-to-Speech (TTS) technology. We investigate different designs for pitch and loudness adjustment. The proposed AI consists of two parts: a dynamic adjuster for controlling pitch and loudness of the TTS and a real-time game commentary generator. We conduct a pilot study on a fighting game, and our result shows that by adjusting the loudness significantly according to the level of game highlight, the entertainment of the gameplay can be enhanced.

【14】 MeDiaQA: A Question Answering Dataset on Medical Dialogues 标题:MeDiaQA:医学对话问答数据集 链接:https://arxiv.org/abs/2108.08074

作者:Huqun Suri,Qi Zhang,Wenhua Huo,Yan Liu,Chunsheng Guan 机构:Institute of Science and Technology, Taikang Insurance Group 摘要:在本文中,我们介绍了MeDiaQA,一种新的问答(QA)数据集,它构建在真实的在线医疗对话上。它包含22k个由人类注释的选择题,用于患者和医生之间的11k多个对话和120k个话语,涵盖150个疾病专业,这些问题来自haodf.com和dxy.com。MeDiaQA是第一个对医学对话,尤其是定量内容进行推理的QA数据集。该数据集有可能测试跨多回合对话的模型的计算、推理和理解能力,这与现有数据集相比具有挑战性。为了应对这些挑战,我们设计了媒体BERT,其准确率达到64.3%,而人因绩效的准确率达到93%,这表明仍有很大的改进空间。 摘要:In this paper, we introduce MeDiaQA, a novel question answering(QA) dataset, which constructed on real online Medical Dialogues. It contains 22k multiple-choice questions annotated by human for over 11k dialogues with 120k utterances between patients and doctors, covering 150 specialties of diseases, which are collected from haodf.com and dxy.com. MeDiaQA is the first QA dataset where reasoning over medical dialogues, especially their quantitative contents. The dataset has the potential to test the computing, reasoning and understanding ability of models across multi-turn dialogues, which is challenging compared with the existing datasets. To address the challenges, we design MeDia-BERT, and it achieves 64.3% accuracy, while human performance of 93% accuracy, which indicates that there still remains a large room for improvement.

【15】 Few-Shot Batch Incremental Road Object Detection via Detector Fusion 标题:基于检测器融合的Few-Shot批量增量道路目标检测 链接:https://arxiv.org/abs/2108.08048

作者:Anuj Tambwekar,Kshitij Agrawal,Anay Majee,Anbumani Subramanian 机构:PES University, Intel Corporation 备注:accepted in 2nd Autonomous Vehicle Vision Workshop, ICCV2021 摘要:增量Few-Shot学习已经成为深度学习中一个新的和具有挑战性的领域,其目标是使用很少的新类数据样本,而不使用任何旧类数据来训练深度学习模型。在这项工作中,我们解决了使用来自印度驾驶数据集(IDD)的数据进行批量增量Few-Shot道路目标检测的问题。我们的方法,DualFusion,以一种允许我们学习用非常有限的数据检测稀有对象的方式结合了对象检测器,所有这些都不会严重降低检测器在丰富类上的性能。在IDD OpenSet增量少数镜头检测任务中,我们在基类上实现了40.0的mAP50分数和38.8的总体mAP50分数,两者都是迄今为止最高的。在COCO批量增量Few-Shot检测任务中,我们获得了9.9分的新AP分数,比同类最先进的新级别性能提高了6.6倍以上。 摘要:Incremental few-shot learning has emerged as a new and challenging area in deep learning, whose objective is to train deep learning models using very few samples of new class data, and none of the old class data. In this work we tackle the problem of batch incremental few-shot road object detection using data from the India Driving Dataset (IDD). Our approach, DualFusion, combines object detectors in a manner that allows us to learn to detect rare objects with very limited data, all without severely degrading the performance of the detector on the abundant classes. In the IDD OpenSet incremental few-shot detection task, we achieve a mAP50 score of 40.0 on the base classes and an overall mAP50 score of 38.8, both of which are the highest to date. In the COCO batch incremental few-shot detection task, we achieve a novel AP score of 9.9, surpassing the state-of-the-art novel class performance on the same by over 6.6 times.

【16】 Variational Graph Normalized Auto-Encoders 标题:变分图归一化自动编码器 链接:https://arxiv.org/abs/2108.08046

作者:Seong Jin Ahn,Myoung Ho Kim 机构:KAIST, Daejeon, Republic of Korea 摘要:链接预测是图结构数据的关键问题之一。随着图神经网络的发展,图自动编码器(GAEs)和变分图自动编码器(VGAEs)被提出以无监督的方式学习图嵌入。结果表明,这些方法对于链路预测任务是有效的。然而,当涉及度为零的节点(即孤立节点)时,它们在链路预测中不能很好地工作。我们发现,GAEs/VGAEs使孤立节点的嵌入接近于零,而与它们的内容特征无关。在本文中,我们提出了一种新的变分图规范化自动编码器(VGNAE),它利用$L_2$-规范化为孤立节点生成更好的嵌入。我们表明,我们的VGNAE在链路预测任务方面优于现有的最新模型。代码可在 https://github.com/SeongJinAhn/VGNAE 获取。 摘要:Link prediction is one of the key problems for graph-structured data. With the advancement of graph neural networks, graph autoencoders (GAEs) and variational graph autoencoders (VGAEs) have been proposed to learn graph embeddings in an unsupervised way. It has been shown that these methods are effective for link prediction tasks. However, they do not work well in link prediction when a node whose degree is zero (i.e., an isolated node) is involved. We have found that GAEs/VGAEs make embeddings of isolated nodes close to zero regardless of their content features. In this paper, we propose a novel Variational Graph Normalized AutoEncoder (VGNAE) that utilizes $L_2$-normalization to derive better embeddings for isolated nodes. We show that our VGNAEs outperform the existing state-of-the-art models for link prediction tasks. The code is available at https://github.com/SeongJinAhn/VGNAE.
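A minimal, illustrative sketch of the idea the abstract highlights (not the authors' implementation): node features are linearly transformed and L2-normalized before a simple row-normalized propagation, so an isolated node keeps a non-zero, feature-driven embedding. The class name, scaling constant, and dense-adjacency propagation rule are assumptions made for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedGraphEncoder(nn.Module):
    """Toy GAE-style encoder: transformed node features are L2-normalized
    (and rescaled) before neighborhood averaging, so isolated nodes keep a
    non-vanishing, feature-driven embedding."""
    def __init__(self, in_dim, out_dim, scale=1.8):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)
        self.scale = scale

    def forward(self, x, adj):
        h = self.scale * F.normalize(self.lin(x), p=2, dim=-1)
        adj_hat = adj + torch.eye(adj.size(0))   # add self-loops
        return (adj_hat / adj_hat.sum(dim=1, keepdim=True)) @ h

x = torch.randn(5, 16)
adj = torch.zeros(5, 5)
adj[0, 1] = adj[1, 0] = 1.0                      # nodes 2-4 stay isolated
z = NormalizedGraphEncoder(16, 8)(x, adj)
print(z.norm(dim=1))                             # isolated nodes keep non-zero norms
```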

【17】 SIFN: A Sentiment-aware Interactive Fusion Network for Review-based Item Recommendation 标题:SIFN:一种情感感知的交互式融合网络,用于基于评论的项目推荐 链接:https://arxiv.org/abs/2108.08022

作者:Kai Zhang,Hao Qian,Qi Liu,Zhiqiang Zhang,Jun Zhou,Jianhui Ma,Enhong Chen 机构:Anhui Province Key Lab of Big Data Analysis and Application, School of Data Science, University of Science and Technology of China; ,Ant Group, Hangzhou, China, School of Computer Science and Technology, University of Science and Technology of China 备注:5 pages, 3 figures 摘要:最近对推荐系统的研究通过利用评论预测评级,成功地实现了显著改进的性能。然而,尽管被广泛研究,这些方法仍然受到一些限制。首先,以前的研究要么对文档进行编码,要么通过神经网络提取潜在情绪,这很难直观地解释评论者的情绪。其次,他们忽略了评论与用户/项目的个性化交互,即在建模用户/项目的情感偏好时,每个评论都有不同的贡献。为了解决这些问题,我们提出了一种基于情绪感知的交互式融合网络(SIFN),用于基于评论的项目推荐。具体来说,我们首先通过BERT对用户/项目评论进行编码,并提出一个轻量级情感学习器来提取每个评论的语义特征。然后,我们提出了一个情绪预测任务,指导情绪学习者通过显式情绪标签提取情绪感知特征。最后,我们设计了一个评分预测任务,该任务包含一个评分学习者,该评分学习者具有一个交互和融合模块,用于融合身份(即用户和项目ID)和每个评审表示,以便各种交互特征能够协同影响最终评分。在五个真实数据集上的实验结果表明,该模型优于现有的模型。 摘要:Recent studies in recommender systems have managed to achieve significantly improved performance by leveraging reviews for rating prediction. However, despite being extensively studied, these methods still suffer from some limitations. First, previous studies either encode the document or extract latent sentiment via neural networks, which are difficult to interpret the sentiment of reviewers intuitively. Second, they neglect the personalized interaction of reviews with user/item, i.e., each review has different contributions when modeling the sentiment preference of user/item. To remedy these issues, we propose a Sentiment-aware Interactive Fusion Network (SIFN) for review-based item recommendation. Specifically, we first encode user/item reviews via BERT and propose a light-weighted sentiment learner to extract semantic features of each review. Then, we propose a sentiment prediction task that guides the sentiment learner to extract sentiment-aware features via explicit sentiment labels. Finally, we design a rating prediction task that contains a rating learner with an interactive and fusion module to fuse the identity (i.e., user and item ID) and each review representation so that various interactive features can synergistically influence the final rating score. Experimental results on five real-world datasets demonstrate that the proposed model is superior to state-of-the-art models.

【18】 RANK-NOSH: Efficient Predictor-Based Architecture Search via Non-Uniform Successive Halving 标题:RANK-NOSH:基于非均匀连续减半的高效预测器体系结构搜索 链接:https://arxiv.org/abs/2108.08019

作者:Ruochen Wang,Xiangning Chen,Minhao Cheng,Xiaocheng Tang,Cho-Jui Hsieh 机构:Department of Computer Science, UCLA, DiDi AI Labs 备注:To Appear in ICCV2021. The code will be released shortly at this https URL 摘要:基于预测器的算法在神经结构搜索(NAS)任务中取得了显著的性能。然而,这些方法的计算成本很高,因为训练性能预测器通常需要从头开始训练和评估数百种体系结构。以前的工作主要集中在减少适应预测器所需的体系结构数量。在这项工作中,我们从另一个角度来应对这一挑战——通过减少体系结构训练的计算预算来提高搜索效率。我们提出了非均匀连续减半(NOSH)算法,这是一种分层调度算法,可以提前终止对性能不佳的体系结构的训练,以避免浪费预算。为了有效地利用NOSH产生的非均匀监督信号,我们将基于预测器的架构搜索描述为通过成对比较学习排序。由此产生的RANK-NOSH方法将搜索预算减少了约5倍,同时在各种空间和数据集上实现了比以前基于最先进的预测器的方法更具竞争力甚至更好的性能。 摘要:Predictor-based algorithms have achieved remarkable performance in the Neural Architecture Search (NAS) tasks. However, these methods suffer from high computation costs, as training the performance predictor usually requires training and evaluating hundreds of architectures from scratch. Previous works along this line mainly focus on reducing the number of architectures required to fit the predictor. In this work, we tackle this challenge from a different perspective - improve search efficiency by cutting down the computation budget of architecture training. We propose NOn-uniform Successive Halving (NOSH), a hierarchical scheduling algorithm that terminates the training of underperforming architectures early to avoid wasting budget. To effectively leverage the non-uniform supervision signals produced by NOSH, we formulate predictor-based architecture search as learning to rank with pairwise comparisons. The resulting method - RANK-NOSH, reduces the search budget by ~5x while achieving competitive or even better performance than previous state-of-the-art predictor-based methods on various spaces and datasets.
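RANK-NOSH's non-uniform schedule and its pairwise ranking predictor are not reproduced here; the sketch below shows only the plain successive-halving baseline that such schedulers build on, with made-up candidates and a toy scoring function.

```python
import random

def successive_halving(candidates, train_one_round, rounds=4, keep=0.5):
    """Each round, train the surviving architectures a bit longer, then keep
    only the better-scoring fraction. `train_one_round(arch)` must return a
    higher-is-better validation score for one additional round of training."""
    survivors = list(candidates)
    for _ in range(rounds):
        scored = sorted(((train_one_round(a), a) for a in survivors), reverse=True)
        survivors = [a for _, a in scored[:max(1, int(len(scored) * keep))]]
    return survivors[0]

# Toy usage: "architectures" are integers and the score is a noisy measurement.
best = successive_halving(range(16), lambda a: a + random.gauss(0.0, 2.0))
print("selected:", best)
```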

【19】 XAI Methods for Neural Time Series Classification: A Brief Review 标题:神经时间序列分类的XAI方法综述 链接:https://arxiv.org/abs/2108.08009

作者:Ilija Šimić,Vedran Sabol,Eduardo Veas 机构:Austria 2University of Technology Graz 备注:None 摘要:深度学习模型最近在各种任务中显示出显著的效果,这就是为什么它们越来越多地应用于高风险领域,如工业、医学和金融。考虑到这些领域中的自动预测可能会对一个人的福祉产生重大影响,以及对个人或公司造成相当大的财务和法律后果,应用这些模型产生的所有行动和决定都必须负责。鉴于在高风险领域收集的大量数据是以时间序列的形式存在的,本文研究了可解释人工智能(XAI)方法的现状,重点探讨了为时间序列分类任务打开深度学习黑盒的方法。最后,我们的贡献还旨在为未来的工作得出有希望的方向,以推进XAI对时间序列数据的深入学习。 摘要:Deep learning models have recently demonstrated remarkable results in a variety of tasks, which is why they are being increasingly applied in high-stake domains, such as industry, medicine, and finance. Considering that automatic predictions in these domains might have a substantial impact on the well-being of a person, as well as considerable financial and legal consequences to an individual or a company, all actions and decisions that result from applying these models have to be accountable. Given that a substantial amount of data that is collected in high-stake domains are in the form of time series, in this paper we examine the current state of eXplainable AI (XAI) methods with a focus on approaches for opening up deep learning black boxes for the task of time series classification. Finally, our contribution also aims at deriving promising directions for future work, to advance XAI for deep learning on time series data.

【20】 Contrastive Identification of Covariate Shift in Image Data 标题:图像数据中协变量漂移的对比识别 链接:https://arxiv.org/abs/2108.08000

作者:Matthew L. Olson,Thuy-Vy Nguyen,Gaurav Dixit,Neale Ratzlaff,Weng-Keen Wong,Minsuk Kahng 机构:Oregon State University 摘要:识别协变量转移对于使机器学习系统在现实世界中具有鲁棒性以及检测测试数据中未反映的训练数据偏差至关重要。然而,检测协变量偏移是一个挑战,特别是当数据由高维图像组成时,以及当多种类型的局部协变量偏移影响数据的不同子空间时。虽然自动化技术可用于检测协变量移位的存在,但我们的目标是通过无缝集成从检测算法获得的信息的接口,帮助人类用户描述大型图像数据集中协变量移位的程度。在本文中,我们设计并评估了一个新的可视化界面,该界面便于比较训练和测试数据的局部分布。我们对多属性人脸数据进行定量用户研究,比较两种不同的学习低维潜在表征(预训练ImageNet CNN与密度比)和两种用户分析工作流(最近邻与簇对簇)。我们的结果表明,我们的密度比模型的潜在表示,结合最近邻比较,是帮助人类识别协变量转移的最有效方法。 摘要:Identifying covariate shift is crucial for making machine learning systems robust in the real world and for detecting training data biases that are not reflected in test data. However, detecting covariate shift is challenging, especially when the data consists of high-dimensional images, and when multiple types of localized covariate shift affect different subspaces of the data. Although automated techniques can be used to detect the existence of covariate shift, our goal is to help human users characterize the extent of covariate shift in large image datasets with interfaces that seamlessly integrate information obtained from the detection algorithms. In this paper, we design and evaluate a new visual interface that facilitates the comparison of the local distributions of training and test data. We conduct a quantitative user study on multi-attribute facial data to compare two different learned low-dimensional latent representations (pretrained ImageNet CNN vs. density ratio) and two user analytic workflows (nearest-neighbor vs. cluster-to-cluster). Our results indicate that the latent representation of our density ratio model, combined with a nearest-neighbor comparison, is the most effective at helping humans identify covariate shift.
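As a rough illustration of the density-ratio idea mentioned above (not the paper's learned latent representation), a probabilistic classifier that separates training from test samples yields per-point estimates of w(x) = p_test(x)/p_train(x); points with large ratios flag regions of covariate shift. The logistic-regression choice and the toy shifted-Gaussian data are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def density_ratio_scores(train_X, test_X):
    """Classifier-based estimate of w(x) = p_test(x) / p_train(x)."""
    X = np.vstack([train_X, test_X])
    y = np.r_[np.zeros(len(train_X)), np.ones(len(test_X))]
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    p = clf.predict_proba(test_X)[:, 1]
    prior_ratio = len(test_X) / len(train_X)
    return (p / (1.0 - p)) / prior_ratio      # correct for class imbalance

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 2))
test = rng.normal(0.7, 1.0, size=(500, 2))    # shifted test distribution
print(density_ratio_scores(train, test)[:5])
```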

【21】 A Unified Framework for Cross-Domain and Cross-System Recommendations 标题:跨域和跨系统推荐的统一框架 链接:https://arxiv.org/abs/2108.07976

作者:Feng Zhu,Yan Wang,Jun Zhou,Chaochao Chen,Longfei Li,Guanfeng Liu 机构:Department of Computing, Macquarie University 备注:14 pages, this paper has been accepted as a regular paper in an upcoming issue of the Transactions on Knowledge and Data Engineering (TKDE) 摘要:跨域推荐(CDR)和跨系统推荐(CSR)被提出,以借助信息相对丰富的源数据集来提高目标数据集(域/系统)中的推荐精度。然而,大多数现有的CDR和CSR方法都是单目标的,即只存在一个单一的目标数据集,只能帮助目标数据集,因此不能使源数据集受益。在本文中,我们重点讨论了三种新的场景,即双目标CDR(DTCDR)、多目标CDR(MTCDR)和CDR+CSR,旨在同时提高所有场景中所有数据集的推荐准确性。为此,我们为所有三种场景提出了一个统一的框架,称为GA(基于图嵌入和注意力技术)。在GA中,我们首先构造独立的异构图来生成更具代表性的用户和项目嵌入。然后,我们提出了一种逐元素的注意力机制来有效地结合从不同数据集中学习到的公共实体(用户/项目)的嵌入。此外,为了避免负迁移,我们进一步提出了一种个性化的训练策略,以最小化较丰富数据集和较稀疏数据集之间公共实体的嵌入差异,分别针对这三种场景推导出GA-DTCDR-P、GA-MTCDR-P和GA-CDR+CSR-P三种新模型。在四个真实数据集上进行的大量实验表明,我们提出的GA模型显著优于最先进的方法。 摘要:Cross-Domain Recommendation (CDR) and Cross-System Recommendation (CSR) have been proposed to improve the recommendation accuracy in a target dataset (domain/system) with the help of a source one with relatively richer information. However, most existing CDR and CSR approaches are single-target, namely, there is a single target dataset, which can only help the target dataset and thus cannot benefit the source dataset. In this paper, we focus on three new scenarios, i.e., Dual-Target CDR (DTCDR), Multi-Target CDR (MTCDR), and CDR+CSR, and aim to improve the recommendation accuracy in all datasets simultaneously for all scenarios. To do this, we propose a unified framework, called GA (based on Graph embedding and Attention techniques), for all three scenarios. In GA, we first construct separate heterogeneous graphs to generate more representative user and item embeddings. Then, we propose an element-wise attention mechanism to effectively combine the embeddings of common entities (users/items) learned from different datasets. Moreover, to avoid negative transfer, we further propose a Personalized training strategy to minimize the embedding difference of common entities between a richer dataset and a sparser dataset, deriving three new models, i.e., GA-DTCDR-P, GA-MTCDR-P, and GA-CDR+CSR-P, for the three scenarios respectively. Extensive experiments conducted on four real-world datasets demonstrate that our proposed GA models significantly outperform the state-of-the-art approaches.
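A hypothetical PyTorch sketch of an element-wise attention fusion of two embeddings of the same entity (e.g., learned on two datasets): per-dimension weights are obtained by a softmax over the sources. The scoring map and its sharing across sources are assumptions, not GA's exact formulation.

```python
import torch
import torch.nn as nn

class ElementWiseAttention(nn.Module):
    """Fuse two embeddings of one entity with per-dimension attention weights."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, dim)    # shared scoring map (an assumption)

    def forward(self, e_a, e_b):
        w = torch.softmax(torch.stack([self.score(e_a), self.score(e_b)]), dim=0)
        return w[0] * e_a + w[1] * e_b      # per-element convex combination

e_src, e_tgt = torch.randn(32, 64), torch.randn(32, 64)
print(ElementWiseAttention(64)(e_src, e_tgt).shape)   # torch.Size([32, 64])
```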

【22】 Scalable regret for learning to control network-coupled subsystems with unknown dynamics 标题:学习控制具有未知动态的网络耦合子系统的可扩展后悔 链接:https://arxiv.org/abs/2108.07970

作者:Sagar Sudhakara,Aditya Mahajan,Ashutosh Nayyar,Yi Ouyang 机构:Department of Electrical and Computer Engineering, University of Southern California 备注:12 pages 摘要:我们考虑控制一个由多个子系统通过网络互连而成的未知线性二次高斯(LQG)系统的问题。我们的目标是最小化并量化我们的策略相对于已知系统模型的oracle的遗憾(即性能损失)。将互连子系统视为一个整体并直接对全局系统使用现有的LQG学习算法,会导致遗憾随子系统数量超线性增长。相反,我们提出了一种新的基于汤普森采样的学习算法,该算法利用了底层网络的结构。我们证明了该算法的期望遗憾以$\tilde{\mathcal{O}}\big(n\sqrt{T}\big)$为界,其中$n$是子系统的数量,$T$是时间范围,$\tilde{\mathcal{O}}(\cdot)$符号隐藏了$n$和$T$中的对数项。因此,遗憾与子系统的数量成线性关系。我们通过数值实验来说明该算法的显著特点。 摘要:We consider the problem of controlling an unknown linear quadratic Gaussian (LQG) system consisting of multiple subsystems connected over a network. Our goal is to minimize and quantify the regret (i.e. loss in performance) of our strategy with respect to an oracle who knows the system model. Viewing the interconnected subsystems globally and directly using existing LQG learning algorithms for the global system results in a regret that increases super-linearly with the number of subsystems. Instead, we propose a new Thompson sampling based learning algorithm which exploits the structure of the underlying network. We show that the expected regret of the proposed algorithm is bounded by $\tilde{\mathcal{O}}\big(n\sqrt{T}\big)$ where $n$ is the number of subsystems, $T$ is the time horizon and the $\tilde{\mathcal{O}}(\cdot)$ notation hides logarithmic terms in $n$ and $T$. Thus, the regret scales linearly with the number of subsystems. We present numerical experiments to illustrate the salient features of the proposed algorithm.

【23】 Look Before You Leap! Designing a Human-Centered AI System for Change Risk Assessment 标题:三思而后行!设计一个以人为中心的变更风险评估人工智能系统 链接:https://arxiv.org/abs/2108.07951

作者:Binay Gupta,Anirban Chatterjee,Harika Matha,Kunal Banerjee,Lalitdutt Parsai,Vijay Agneeswaran 机构:Walmart Global Tech, Bangalore, India 摘要:减少生产系统中的故障数量是技术驱动行业(如在线零售行业)中最具挑战性的问题之一。为了应对这一挑战,变更管理已经成为运营中一个很有前途的子领域,它以系统的方式管理和审查将在生产中部署的变更。然而,每天手动审查大量变更并评估与之相关的风险实际上是不可能的。这就需要开发一个自动化系统来评估与大量变更相关的风险。有一些商业解决方案可以解决这个问题,但这些解决方案缺乏将领域知识和领域专家的持续反馈纳入风险评估过程的能力。作为这项工作的一部分,我们的目标是通过在风险评估过程中建立一个持续的反馈回路,弥合模型驱动的变更请求风险评估和领域专家评估之间的差距。在这里,我们介绍了我们构建端到端机器学习系统的工作,并讨论了我们面临的一些实际挑战,这些挑战涉及到类分布的极端偏斜、概念漂移、与模型预测相关的不确定性估计以及系统的整体可伸缩性。 摘要:Reducing the number of failures in a production system is one of the most challenging problems in technology driven industries, such as, the online retail industry. To address this challenge, change management has emerged as a promising sub-field in operations that manages and reviews the changes to be deployed in production in a systematic manner. However, it is practically impossible to manually review a large number of changes on a daily basis and assess the risk associated with them. This warrants the development of an automated system to assess the risk associated with a large number of changes. There are a few commercial solutions available to address this problem but those solutions lack the ability to incorporate domain knowledge and continuous feedback from domain experts into the risk assessment process. As part of this work, we aim to bridge the gap between model-driven risk assessment of change requests and the assessment of domain experts by building a continuous feedback loop into the risk assessment process. Here we present our work to build an end-to-end machine learning system along with the discussion of some of practical challenges we faced related to extreme skewness in class distribution, concept drift, estimation of the uncertainty associated with the model's prediction and the overall scalability of the system.

【24】 DeepFake MNIST+: A DeepFake Facial Animation Dataset 标题:DeepFake MNIST+:一个DeepFake人脸动画数据集 链接:https://arxiv.org/abs/2108.07949

作者:Jiajun Huang,Xueyu Wang,Bo Du,Pei Du,Chang Xu 机构:The University of Sydney, Wuhan University, Ant Group 备注:14 pages 摘要:深度伪造(DeepFake)是一种面部操纵技术,是数字社会的新威胁。人们提出了各种各样的DeepFake检测方法和数据集来检测这类数据,特别是针对人脸交换。然而,最近的研究较少考虑面部动画,而它在DeepFake攻击侧同样非常重要。面部动画试图用驱动视频提供的动作来驱动一张人脸图像,这也引发了对近期支付系统安全性的担忧:这些系统依赖活体检测,通过识别一系列用户面部动作来验证真实用户。然而,我们的实验表明,现有的数据集不足以开发可靠的检测方法,而当前的活体检测器无法防御此类视频攻击。作为回应,我们提出了一个新的人脸动画数据集,称为DeepFake MNIST+,由SOTA图像动画生成器生成。它包括10个不同动作中的10000个面部动画视频,可以欺骗最近的活体检测器。本文还介绍了一种基线检测方法,并对该方法进行了综合分析。此外,我们还分析了该数据集的性质,揭示了在不同类型的运动和压缩质量下检测动画数据集的困难和重要性。 摘要:The DeepFakes, which are the facial manipulation techniques, is the emerging threat to digital society. Various DeepFake detection methods and datasets are proposed for detecting such data, especially for face-swapping. However, recent researches less consider facial animation, which is also important in the DeepFake attack side. It tries to animate a face image with actions provided by a driving video, which also leads to a concern about the security of recent payment systems that rely on liveness detection to authenticate real users via recognising a sequence of user facial actions. However, our experiments show that the existed datasets are not sufficient to develop reliable detection methods. While the current liveness detector cannot defend such videos as the attack. As a response, we propose a new human face animation dataset, called DeepFake MNIST+, generated by a SOTA image animation generator. It includes 10,000 facial animation videos in ten different actions, which can spoof the recent liveness detectors. A baseline detection method and a comprehensive analysis of the method is also included in this paper. In addition, we analyze the proposed dataset's properties and reveal the difficulty and importance of detecting animation datasets under different types of motion and compression quality.

【25】 Object Disparity 标题:物体视差 链接:https://arxiv.org/abs/2108.07939

作者:Ynjiun Paul Wang 机构:Cupertino, CA 备注:10 pages, 13 figures, 7 tables 摘要:大多数立体视觉工作都集中于计算给定左右图像对的稠密像素视差。相机对通常需要镜头不失真和立体校准,以提供不失真的外极线校准图像对,用于精确的密集像素视差计算。由于噪声、物体遮挡、重复或缺乏纹理以及匹配算法的限制,这些物体边界区域的像素视差精度通常受到最大影响。尽管统计上像素视差误差总数可能较低(根据当前排名靠前的算法的Kitti Vision基准,低于2%),但这些视差误差在对象边界处的百分比非常高。这使得子序列三维对象距离检测的精度远低于预期。本文提出了一种解决三维物体距离检测的不同方法,即直接检测物体的视差,而无需进行密集像素视差计算。构建了一个示例squeezenet对象视差SSD(OD-SSD),与Kitti数据集像素视差地面真实值相比,该SSD具有相当的精度,能够有效地检测对象视差。使用多个不同立体系统捕获的混合图像数据集的进一步训练和测试结果可能表明,OD-SSD可能对立体系统参数不可知,例如基线、FOV、镜头畸变,甚至左/右摄像机极线错位。 摘要:Most of stereo vision works are focusing on computing the dense pixel disparity of a given pair of left and right images. A camera pair usually required lens undistortion and stereo calibration to provide an undistorted epipolar line calibrated image pair for accurate dense pixel disparity computation. Due to noise, object occlusion, repetitive or lack of texture and limitation of matching algorithms, the pixel disparity accuracy usually suffers the most at those object boundary areas. Although statistically the total number of pixel disparity errors might be low (under 2% according to the Kitti Vision Benchmark of current top ranking algorithms), the percentage of these disparity errors at object boundaries are very high. This renders the subsequence 3D object distance detection with much lower accuracy than desired. This paper proposed a different approach for solving a 3D object distance detection by detecting object disparity directly without going through a dense pixel disparity computation. An example squeezenet Object Disparity-SSD (OD-SSD) was constructed to demonstrate an efficient object disparity detection with comparable accuracy compared with Kitti dataset pixel disparity ground truth. Further training and testing results with mixed image dataset captured by several different stereo systems may suggest that an OD-SSD might be agnostic to stereo system parameters such as a baseline, FOV, lens distortion, even left/right camera epipolar line misalignment.

【26】 Learning Implicit User Profiles for Personalized Retrieval-Based Chatbot 标题:基于个性化检索的聊天机器人隐式用户模型学习 链接:https://arxiv.org/abs/2108.07935

作者:Hongjin Qian,Zhicheng Dou,Yutao Zhu,Yueyuan Ma,Ji-Rong Wen 机构: Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China, Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China, Université de Montréal, Québec, Canada 备注:Accepted by CIKM 2021, codes and dataset will be released at this https URL 摘要:在本文中,我们探讨了开发个性化聊天机器人的问题。个性化聊天机器人是为用户设计的数字聊天助手。个性化聊天机器人的关键特征是它应该与相应的用户具有一致的个性。当它被授权响应其他人的消息时,它可以像用户一样说话。我们提出了一个基于检索的个性化聊天机器人模型,即IMPChat,用于从用户的对话历史中学习隐式用户配置文件。我们认为隐式用户配置文件在可访问性和灵活性方面优于显式用户配置文件。IMPChat旨在通过分别建模用户的个性化语言风格和个性化偏好来学习隐式用户配置文件。为了了解用户的个性化语言风格,我们利用用户的历史反应从浅到深精心构建语言模型;为了模拟用户的个性化偏好,我们探索了用户每个post响应对下的条件关系。个性化偏好是动态的,并且上下文感知:在聚合个性化偏好时,我们为那些与当前查询局部相关的历史对分配更高的权重。我们分别用个性化的语言风格和个性化的偏好来匹配每个响应候选,并融合这两个匹配信号来确定最终的排名分数。在两个大型数据集上的综合实验表明,我们的方法优于所有的基线模型。 摘要:In this paper, we explore the problem of developing personalized chatbots. A personalized chatbot is designed as a digital chatting assistant for a user. The key characteristic of a personalized chatbot is that it should have a consistent personality with the corresponding user. It can talk the same way as the user when it is delegated to respond to others' messages. We present a retrieval-based personalized chatbot model, namely IMPChat, to learn an implicit user profile from the user's dialogue history. We argue that the implicit user profile is superior to the explicit user profile regarding accessibility and flexibility. IMPChat aims to learn an implicit user profile through modeling user's personalized language style and personalized preferences separately. To learn a user's personalized language style, we elaborately build language models from shallow to deep using the user's historical responses; To model a user's personalized preferences, we explore the conditional relations underneath each post-response pair of the user. The personalized preferences are dynamic and context-aware: we assign higher weights to those historical pairs that are topically related to the current query when aggregating the personalized preferences. We match each response candidate with the personalized language style and personalized preference, respectively, and fuse the two matching signals to determine the final ranking score. Comprehensive experiments on two large datasets show that our method outperforms all baseline models.

【27】 M-ar-K-Fast Independent Component Analysis 标题:M-Ar-K-快速独立分量分析 链接:https://arxiv.org/abs/2108.07908

作者:Luca Parisi 机构:Coventry, United Kingdom, PhD in Machine Learning for Clinical Decision Support Systems, MBA Candidate with Artificial Intelligence Specialism 备注:17 pages, 2 listings/Python code snippets, 2 figures, 5 tables. arXiv admin note: text overlap with arXiv:2009.07530 摘要:本研究提出了用于特征提取的m-arcinh核(m-ar-K)快速独立分量分析(FastICA)方法(m-ar-K-FastICA)。核技巧使降维技术能够捕获数据中更大程度的非线性;然而,用于辅助特征提取的可重复的开源内核仍然有限,并且在从熵数据投影特征时可能不可靠。m-ar-K函数在Python中免费提供,并与其开源库“scikit learn”兼容,因此与FastICA结合使用,以在数据存在高度随机性的情况下实现更可靠的特征提取,从而减少预白化的需要。被认为是不同的分类任务,作为相关的五(n=5)开放存取数据集的不同程度的信息熵,可从SCIKIT学习和大学加利福尼亚欧文(UCI)机器学习库。实验结果表明,该特征提取方法提高了分类性能。新的m-ar-K-FastICA降维方法与“FastICA”金标准方法进行了比较,支持其更高的可靠性和计算效率,而不考虑数据中潜在的不确定性。 摘要:This study presents the m-arcsinh Kernel ('m-ar-K') Fast Independent Component Analysis ('FastICA') method ('m-ar-K-FastICA') for feature extraction. The kernel trick has enabled dimensionality reduction techniques to capture a higher extent of non-linearity in the data; however, reproducible, open-source kernels to aid with feature extraction are still limited and may not be reliable when projecting features from entropic data. The m-ar-K function, freely available in Python and compatible with its open-source library 'scikit-learn', is hereby coupled with FastICA to achieve more reliable feature extraction in presence of a high extent of randomness in the data, reducing the need for pre-whitening. Different classification tasks were considered, as related to five (N = 5) open access datasets of various degrees of information entropy, available from scikit-learn and the University California Irvine (UCI) Machine Learning repository. Experimental results demonstrate improvements in the classification performance brought by the proposed feature extraction. The novel m-ar-K-FastICA dimensionality reduction approach is compared to the 'FastICA' gold standard method, supporting its higher reliability and computational efficiency, regardless of the underlying uncertainty in the data.
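The paper couples its m-arcsinh function with FastICA; one plausible integration point in scikit-learn is the custom contrast-function hook shown below. Plain arcsinh is used here as a stand-in, since the exact m-arcsinh form is defined in the paper, and the Laplace-source demo data are made up.

```python
import numpy as np
from sklearn.decomposition import FastICA

def arcsinh_contrast(x):
    """Custom contrast for scikit-learn's FastICA: returns g(x) and the mean of
    g'(x) over the last axis, here for g(x) = arcsinh(x) (a stand-in, not the
    paper's exact m-arcsinh)."""
    return np.arcsinh(x), (1.0 / np.sqrt(x ** 2 + 1.0)).mean(axis=-1)

rng = np.random.default_rng(0)
S = rng.laplace(size=(1000, 3))                    # independent non-Gaussian sources
X = S @ rng.normal(size=(3, 3)).T                  # linearly mixed observations
ica = FastICA(n_components=3, fun=arcsinh_contrast, random_state=0)
S_hat = ica.fit_transform(X)
print(S_hat.shape)                                 # (1000, 3)
```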

【28】 Statistically Near-Optimal Hypothesis Selection 标题:统计上接近最优的假设选择 链接:https://arxiv.org/abs/2108.07880

作者:Olivier Bousquet,Mark Braverman,Klim Efremenko,Gillat Kol,Shay Moran 机构:Department of Computer Science, Ben Gurion University; Department of Computer Science, Princeton University 备注:Accepted to FOCS 2021 摘要:假设选择是一个基本的分布学习问题:给定一个由分布组成的比较器类$Q=\{q_1,\ldots,q_n\}$,以及对未知目标分布$p$的抽样访问,目标是输出一个分布$q$,使得$\mathsf{TV}(p,q)$接近$opt$,其中$opt=\min_i\{\mathsf{TV}(p,q_i)\}$,$\mathsf{TV}(\cdot,\cdot)$表示总变差距离。尽管这一问题自19世纪以来就被研究,但其在基本资源(如样本数量和近似保证)方面的复杂性仍然没有得到解决(例如Devroye和Lugosi`00的著作中对此进行了讨论)。这与其他(较年轻的)学习设定(例如PAC学习)形成了鲜明对比,后者的复杂性已经被很好地理解。我们为假设选择问题导出了一个最优的$2$-近似学习策略,输出$q$使得$\mathsf{TV}(p,q)\leq 2\cdot opt+\epsilon$,并具有(接近)最优的样本复杂度$\tilde O(\log n/\epsilon^2)$。这是第一个同时达到最佳近似因子和最佳样本复杂度的算法:此前,Bousquet、Kane和Moran(COLT`19)给出的学习者达到最优的$2$-近似,但样本复杂度呈指数级变差,为$\tilde O(\sqrt{n}/\epsilon^{2.5})$;Yatracos(Annals of Statistics`85)给出的学习者具有最优样本复杂度$O(\log n/\epsilon^2)$,但近似因子为次优的$3$。 摘要:Hypothesis Selection is a fundamental distribution learning problem where given a comparator-class $Q=\{q_1,\ldots,q_n\}$ of distributions, and a sampling access to an unknown target distribution $p$, the goal is to output a distribution $q$ such that $\mathsf{TV}(p,q)$ is close to $opt$, where $opt = \min_i\{\mathsf{TV}(p,q_i)\}$ and $\mathsf{TV}(\cdot,\cdot)$ denotes the total-variation distance. Despite the fact that this problem has been studied since the 19th century, its complexity in terms of basic resources, such as number of samples and approximation guarantees, remains unsettled (this is discussed, e.g., in the charming book by Devroye and Lugosi `00). This is in stark contrast with other (younger) learning settings, such as PAC learning, for which these complexities are well understood. We derive an optimal $2$-approximation learning strategy for the Hypothesis Selection problem, outputting $q$ such that $\mathsf{TV}(p,q) \leq 2\cdot opt + \epsilon$, with a (nearly) optimal sample complexity of $\tilde O(\log n/\epsilon^2)$. This is the first algorithm that simultaneously achieves the best approximation factor and sample complexity: previously, Bousquet, Kane, and Moran (COLT `19) gave a learner achieving the optimal $2$-approximation, but with an exponentially worse sample complexity of $\tilde O(\sqrt{n}/\epsilon^{2.5})$, and Yatracos (Annals of Statistics `85) gave a learner with optimal sample complexity of $O(\log n/\epsilon^2)$ but with a sub-optimal approximation factor of $3$.
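For intuition, the sketch below implements the classical pairwise (Scheffé/Yatracos-style) comparison rule for discrete candidate distributions, the kind of baseline the abstract contrasts against rather than the paper's new 2-approximation algorithm; the candidate set and sample size are toy assumptions.

```python
import numpy as np

def scheffe_select(candidates, sample, n_outcomes):
    """Pairwise hypothesis selection over discrete candidates: for each pair
    (i, j), compare masses on the Scheffe set {x : q_i(x) > q_j(x)} against the
    empirical mass and credit the closer candidate; return the candidate with
    the most wins."""
    emp = np.bincount(sample, minlength=n_outcomes) / len(sample)
    wins = np.zeros(len(candidates))
    for i, qi in enumerate(candidates):
        for j, qj in enumerate(candidates):
            if i == j:
                continue
            A = qi > qj
            if abs(qi[A].sum() - emp[A].sum()) <= abs(qj[A].sum() - emp[A].sum()):
                wins[i] += 1
    return int(np.argmax(wins))

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])                      # unknown target
candidates = [np.array([0.6, 0.2, 0.2]),
              np.array([0.1, 0.8, 0.1]),
              np.array([0.45, 0.35, 0.2])]
sample = rng.choice(3, size=2000, p=p)
print("selected candidate:", scheffe_select(candidates, sample, 3))
```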

【29】 Edge AI without Compromise: Efficient, Versatile and Accurate Neurocomputing in Resistive Random-Access Memory 标题:不妥协的边缘人工智能:电阻性随机存取存储器中的高效、通用和精确的神经计算 链接:https://arxiv.org/abs/2108.07879

作者:Weier Wan,Rajkumar Kubendran,Clemens Schaefer,S. Burc Eryilmaz,Wenqiang Zhang,Dabin Wu,Stephen Deiss,Priyanka Raina,He Qian,Bin Gao,Siddharth Joshi,Huaqiang Wu,H. -S. Philip Wong,Gert Cauwenberghs 机构: Stanford University, CA, USA; , University of California San Diego, CA, USA; , Tsinghua University, Beijing, China; , University of Notre Dame, IN, USA; , University of Pittsburgh, PA, USA 备注:34 pages, 14 figures, 1 table 摘要:直接在分布在互联网边缘的设备上实现当今的云级人工智能功能,需要能够以前所未有的能效处理多种感官数据(如视频、音频)的边缘硬件。今天的人工智能硬件体系结构无法满足需求,因为存在一道基本的“内存墙”:单独的计算和内存单元之间的数据移动会消耗大量的能量,并且会产生较长的延迟。基于电阻随机存取存储器(RRAM)的内存中计算(CIM)体系结构通过直接在内存中执行计算,有望带来几个数量级的能效改进。然而,CIM硬件设计的传统方法限制了其处理各种AI工作负载所需的功能灵活性,并且必须克服降低推理精度的硬件缺陷。这种效率、多功能性和准确性之间的权衡不能通过对任何单一设计层次的单独改进来解决。通过在从算法和架构到电路和设备的所有设计层次上进行协同优化,我们展示了Neuram——第一款使用RRAM CIM的多模边缘AI芯片,可同时为各种模型架构提供高度的通用性,在各种计算位精度方面,记录的能效比现有技术高$5倍$-$8倍$,推理精度可与所有测量的标准AI基准上具有4位权重的软件模型相媲美,包括MNIST上99.0%的精度和CIFAR-10图像分类上85.7%的精度,谷歌语音命令识别的准确率为84.7%,贝叶斯图像恢复任务的图像重建误差降低了70%。这项工作为构建高效、可重构的边缘人工智能硬件平台铺平了道路,以满足未来更高要求和更异构的人工智能应用。 摘要:Realizing today's cloud-level artificial intelligence functionalities directly on devices distributed at the edge of the internet calls for edge hardware capable of processing multiple modalities of sensory data (e.g. video, audio) at unprecedented energy-efficiency. AI hardware architectures today cannot meet the demand due to a fundamental "memory wall": data movement between separate compute and memory units consumes large energy and incurs long latency. Resistive random-access memory (RRAM) based compute-in-memory (CIM) architectures promise to bring orders of magnitude energy-efficiency improvement by performing computation directly within memory. However, conventional approaches to CIM hardware design limit its functional flexibility necessary for processing diverse AI workloads, and must overcome hardware imperfections that degrade inference accuracy. Such trade-offs between efficiency, versatility and accuracy cannot be addressed by isolated improvements on any single level of the design. By co-optimizing across all hierarchies of the design from algorithms and architecture to circuits and devices, we present NeuRRAM - the first multimodal edge AI chip using RRAM CIM to simultaneously deliver a high degree of versatility for diverse model architectures, record energy-efficiency $5times$ - $8times$ better than prior art across various computational bit-precisions, and inference accuracy comparable to software models with 4-bit weights on all measured standard AI benchmarks including accuracy of 99.0% on MNIST and 85.7% on CIFAR-10 image classification, 84.7% accuracy on Google speech command recognition, and a 70% reduction in image reconstruction error on a Bayesian image recovery task. This work paves a way towards building highly efficient and reconfigurable edge AI hardware platforms for the more demanding and heterogeneous AI applications of the future.

【30】 Channel-Temporal Attention for First-Person Video Domain Adaptation 标题:第一人称视频域自适应的通道-时间注意 链接:https://arxiv.org/abs/2108.07846

作者:Xianyuan Liu,Shuo Zhou,Tao Lei,Haiping Lu 机构:Institute of Optical and Electronics, Chinese Academy of Sciences, China, University of Chinese Academy of Sciences, China, University of Sheffield, United Kingdom 摘要:无监督领域自适应(UDA)可以将知识从标记的源数据转移到相同类别的未标记的目标数据。然而,第一人称动作识别的UDA是一个探索不足的问题,缺乏数据集,对第一人称视频特征的考虑也有限。本文着重于解决这一问题。首先,我们提出了两个小规模的第一人称视频域适配数据集:ADL$_{small}$和GTEA-KITCHEN。其次,我们引入通道时间注意块来捕捉通道和时间的关系,并建模它们对第一人称视觉重要的相互依赖关系。最后,我们提出了一个通道时间注意网络(CTAN)来将这些模块集成到现有的体系结构中。CTAN在两个拟议数据集和一个现有数据集EPIC$_{cvpr20}$上优于基线。 摘要:Unsupervised Domain Adaptation (UDA) can transfer knowledge from labeled source data to unlabeled target data of the same categories. However, UDA for first-person action recognition is an under-explored problem, with lack of datasets and limited consideration of first-person video characteristics. This paper focuses on addressing this problem. Firstly, we propose two small-scale first-person video domain adaptation datasets: ADL$_{small}$ and GTEA-KITCHEN. Secondly, we introduce channel-temporal attention blocks to capture the channel-wise and temporal-wise relationships and model their inter-dependencies important to first-person vision. Finally, we propose a Channel-Temporal Attention Network (CTAN) to integrate these blocks into existing architectures. CTAN outperforms baselines on the two proposed datasets and one existing dataset EPIC$_{cvpr20}$.

【31】 Compressing gradients by exploiting temporal correlation in momentum-SGD 标题:利用动量-SGD的时间相关性压缩梯度 链接:https://arxiv.org/abs/2108.07827

作者:Tharindu B. Adikari,Stark C. Draper 机构: University of Toronto 备注:None 摘要:分散优化中一个日益增长的瓶颈是通信。更大的模型和不断增长的数据集意味着计算的分散非常重要,交换的信息量正在迅速增长。虽然已经引入了压缩技术来处理后者,但没有一种技术考虑利用连续向量更新中存在的时间相关性。一个重要的例子是分布式动量SGD,其中通过应用动量的低通滤波效应增强了时间相关性。在本文中,我们设计并分析了在有误差反馈和无误差反馈的系统中利用时间相关性的压缩方法。使用ImageNet数据集进行的实验表明,我们提出的方法在计算复杂度几乎不增加的情况下显著降低了通信速率。我们进一步分析了当采用误差反馈压缩时,SGD的收敛性。在文献中,收敛保证仅针对提供逐点误差边界的压缩器开发,即针对压缩器的每个输入。相比之下,许多重要的代码(例如率失真代码)仅在预期情况下提供错误界限,从而提供更一般的保证。本文通过建立最小梯度范数的界,证明了在期望误差假设下SGD的收敛性。 摘要:An increasing bottleneck in decentralized optimization is communication. Bigger models and growing datasets mean that decentralization of computation is important and that the amount of information exchanged is quickly growing. While compression techniques have been introduced to cope with the latter, none has considered leveraging the temporal correlations that exist in consecutive vector updates. An important example is distributed momentum-SGD where temporal correlation is enhanced by the low-pass-filtering effect of applying momentum. In this paper we design and analyze compression methods that exploit temporal correlation in systems both with and without error-feedback. Experiments with the ImageNet dataset demonstrate that our proposed methods offer significant reduction in the rate of communication at only a negligible increase in computation complexity. We further analyze the convergence of SGD when compression is applied with error-feedback. In the literature, convergence guarantees are developed only for compressors that provide error-bounds point-wise, i.e., for each input to the compressor. In contrast, many important codes (e.g. rate-distortion codes) provide error-bounds only in expectation and thus provide a more general guarantee. In this paper we prove the convergence of SGD under an expected error assumption by establishing a bound for the minimum gradient norm.
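The paper's codes exploit temporal correlation between consecutive (momentum-filtered) updates; the sketch below shows only the standard top-k compression with error feedback that such schemes extend, with made-up hyperparameters and a toy quadratic objective.

```python
import numpy as np

def topk_compress(v, k):
    """Keep the k largest-magnitude coordinates of v and zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def momentum_sgd_with_error_feedback(grad_fn, x0, steps=200, lr=0.1, beta=0.9, k=2):
    """Worker-side loop: the momentum update is compressed before 'transmission'
    and the dropped residual is fed back into the next step."""
    x, m, err = x0.copy(), np.zeros_like(x0), np.zeros_like(x0)
    for _ in range(steps):
        m = beta * m + grad_fn(x)        # momentum low-pass filters the gradients
        update = lr * m + err            # add back the previously dropped part
        sent = topk_compress(update, k)  # what would actually be communicated
        err = update - sent              # error-feedback memory
        x -= sent
    return x

grad = lambda x: 2.0 * (x - np.arange(4.0))   # gradient of ||x - (0,1,2,3)||^2
print(momentum_sgd_with_error_feedback(grad, np.zeros(4)))
```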

【32】 A Framework for Understanding AI-Induced Field Change: How AI Technologies are Legitimized and Institutionalized 标题:理解人工智能引起的领域变化的框架:人工智能技术如何合法化和制度化 链接:https://arxiv.org/abs/2108.07804

作者:Benjamin Cedric Larsen 机构:Department of Economics, Government & Business, Copenhagen Business School, Copenhagen, Denmark 备注:None 摘要:人工智能(AI)系统在越来越多样化的领域中运行,从医疗保健到面部识别、股票市场、自动驾驶汽车等等。虽然人工智能系统的基础数字基础设施发展迅速,但每个实施领域都受到不同程度和合法化过程的影响。本文结合制度理论和信息系统理论,提出了一个分析和理解人工智能引起的场域变化的概念框架。将新型人工智能代理引入新的或现有的领域创造了一种动态,在这种动态中,算法(重新)塑造组织和机构,而现有的机构基础设施决定了允许组织变革发生的范围和速度。在制度基础设施和治理安排(如标准、规则和法规)仍然缺乏劳动力的地方,这一领域可以快速发展,但也更有可能受到质疑。围绕人工智能诱导领域的制度基础设施通常很少详细阐述,这可能会阻碍人工智能系统更广泛的制度化。 摘要:Artificial intelligence (AI) systems operate in increasingly diverse areas, from healthcare to facial recognition, the stock market, autonomous vehicles, and so on. While the underlying digital infrastructure of AI systems is developing rapidly, each area of implementation is subject to different degrees and processes of legitimization. By combining elements from institutional theory and information systems-theory, this paper presents a conceptual framework to analyze and understand AI-induced field-change. The introduction of novel AI-agents into new or existing fields creates a dynamic in which algorithms (re)shape organizations and institutions while existing institutional infrastructures determine the scope and speed at which organizational change is allowed to occur. Where institutional infrastructure and governance arrangements, such as standards, rules, and regulations, still are unelaborate, the field can move fast but is also more likely to be contested. The institutional infrastructure surrounding AI-induced fields is generally little elaborated, which could be an obstacle to the broader institutionalization of AI-systems going forward.

【33】 Moser Flow: Divergence-based Generative Modeling on Manifolds 标题:Moser流:基于散度的流形产生式建模 链接:https://arxiv.org/abs/2108.08052

作者:Noam Rozen,Aditya Grover,Maximilian Nickel,Yaron Lipman 机构:FAIR and WIS 摘要:我们感兴趣的是学习通过流形描述的复杂几何体的生成模型,例如球体、环面和其他隐式曲面。现有(欧几里得)生成模型的当前扩展仅限于特定的几何结构,并且通常会遭受较高的计算成本。我们介绍了连续规范化流(CNF)族中的一类新的生成模型——Moser流(MF)。MF还通过变量公式变化的解决方案生成CNF,但与其他CNF方法不同,其模型(学习)密度被参数化为源(先验)密度减去神经网络(NN)的散度。散度是一个局部线性微分算子,易于在流形上逼近和计算。因此,与其他CNF不同,MF不需要在训练期间通过ODE解算器调用或反向传播。此外,将模型密度明确表示为NN的散度,而不是ODE的解,有助于学习高保真密度。理论上,我们在适当的假设下证明了MF是一个普适密度近似器。从经验上看,我们首次展示了使用流动模型从一般曲面采样,并在密度估计、样本质量和训练复杂性方面比现有CNF有了显著改进,从而挑战了地球和气候科学的合成几何和真实基准。 摘要:We are interested in learning generative models for complex geometries described via manifolds, such as spheres, tori, and other implicit surfaces. Current extensions of existing (Euclidean) generative models are restricted to specific geometries and typically suffer from high computational costs. We introduce Moser Flow (MF), a new class of generative models within the family of continuous normalizing flows (CNF). MF also produces a CNF via a solution to the change-of-variable formula, however differently from other CNF methods, its model (learned) density is parameterized as the source (prior) density minus the divergence of a neural network (NN). The divergence is a local, linear differential operator, easy to approximate and calculate on manifolds. Therefore, unlike other CNFs, MF does not require invoking or backpropagating through an ODE solver during training. Furthermore, representing the model density explicitly as the divergence of a NN rather than as a solution of an ODE facilitates learning high fidelity densities. Theoretically, we prove that MF constitutes a universal density approximator under suitable assumptions. Empirically, we demonstrate for the first time the use of flow models for sampling from general curved surfaces and achieve significant improvements in density estimation, sample quality, and training complexity over existing CNFs on challenging synthetic geometries and real-world benchmarks from the earth and climate sciences.
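In flat Euclidean space the core parameterization is easy to state: the model density equals the prior density minus the divergence of a learned vector field. The PyTorch sketch below evaluates that quantity with autograd for an untrained toy network; the manifold generalization and the training loss that keeps the density non-negative are omitted, and the network size is an assumption.

```python
import math
import torch
import torch.nn as nn

def divergence(vector_field, x):
    """Exact divergence of a network u: R^d -> R^d at the points x, via autograd."""
    x = x.clone().requires_grad_(True)
    u = vector_field(x)
    parts = [torch.autograd.grad(u[:, i].sum(), x, create_graph=True)[0][:, i]
             for i in range(x.shape[1])]
    return sum(parts)

d = 2
u_theta = nn.Sequential(nn.Linear(d, 64), nn.Tanh(), nn.Linear(64, d))
x = torch.randn(8, d)
prior = torch.exp(-0.5 * (x ** 2).sum(dim=1)) / (2 * math.pi)   # standard normal, d=2
model_density = prior - divergence(u_theta, x)   # Moser-Flow-style parameterization
print(model_density)   # may be negative for an untrained field; training penalizes this
```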
