cs.AI (Artificial Intelligence): 31 papers in total
【1】 Data-Driven and SE-assisted AI Model Signal-Awareness Enhancement and Introspection
Link: https://arxiv.org/abs/2111.05827
Authors: Sahil Suneja, Yufan Zhuang, Yunhui Zheng, Jim Laredo, Alessandro Morari
Affiliations: IBM Research, Yorktown Heights, NY, USA
Note: Submitted September 2021
Abstract: AI modeling for source code understanding tasks has been making significant progress, and is being adopted in production development pipelines. However, reliability concerns, especially whether the models are actually learning task-related aspects of source code, are being raised. While recent model-probing approaches have observed a lack of signal awareness in many AI-for-code models, i.e. models not capturing task-relevant signals, they do not offer solutions to rectify this problem. In this paper, we explore data-driven approaches to enhance models' signal-awareness: 1) we combine the SE concept of code complexity with the AI technique of curriculum learning; 2) we incorporate SE assistance into AI models by customizing Delta Debugging to generate simplified signal-preserving programs, augmenting the training dataset with them. With our techniques, we achieve up to 4.8x improvement in model signal awareness. Using the notion of code complexity, we further present a novel model learning introspection approach from the perspective of the dataset.
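The first idea pairs a software-engineering complexity metric with easy-to-hard curriculum scheduling. A minimal sketch of that pairing, assuming a crude keyword-count proxy in place of true cyclomatic complexity and a hypothetical train_model_on step, might look like:

```python
# Hedged sketch of complexity-ordered curriculum learning for code models.
# The complexity proxy below (counting branch keywords) only approximates
# cyclomatic complexity and is not the paper's exact SE metric.
import re

BRANCH_KEYWORDS = re.compile(r"\b(if|elif|for|while|case|catch|except)\b")

def complexity_proxy(source: str) -> int:
    """Crude stand-in for cyclomatic complexity: 1 + number of branch points."""
    return 1 + len(BRANCH_KEYWORDS.findall(source))

def curriculum_stages(programs, labels, n_stages=3):
    """Yield growing training pools ordered from simplest to most complex."""
    ranked = sorted(zip(programs, labels), key=lambda pl: complexity_proxy(pl[0]))
    step = max(1, len(ranked) // n_stages)
    for end in range(step, len(ranked) + step, step):
        yield ranked[:end]

programs = ["return a + b;",
            "if (a > b) return a; else return b;",
            "for (i = 0; i < n; i++) if (v[i] < m) m = v[i];"]
labels = [0, 0, 1]
for pool in curriculum_stages(programs, labels):
    print(len(pool), [complexity_proxy(p) for p, _ in pool])
    # train_model_on(pool)  # hypothetical training step per curriculum stage
```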
【2】 A Two-Stage Approach towards Generalization in Knowledge Base Question Answering
Link: https://arxiv.org/abs/2111.05825
Authors: Srinivas Ravishankar, June Thai, Ibrahim Abdelaziz, Nandana Mihidukulasooriya, Tahira Naseem, Pavan Kapanipathi, Gaetano Rossiello, Achille Fokoue
Affiliations: IBM Research; UMass Amherst
Abstract: Most existing approaches for Knowledge Base Question Answering (KBQA) focus on a specific underlying knowledge base either because of inherent assumptions in the approach, or because evaluating it on a different knowledge base requires non-trivial changes. However, many popular knowledge bases share similarities in their underlying schemas that can be leveraged to facilitate generalization across knowledge bases. To achieve this generalization, we introduce a KBQA framework based on a 2-stage architecture that explicitly separates semantic parsing from the knowledge base interaction, facilitating transfer learning across datasets and knowledge graphs. We show that pretraining on datasets with a different underlying knowledge base can nevertheless provide significant performance gains and reduce sample complexity. Our approach achieves comparable or state-of-the-art performance for LC-QuAD (DBpedia), WebQSP (Freebase), SimpleQuestions (Wikidata) and MetaQA (Wikimovies-KG).
【3】 Look Before You Leap: Safe Model-Based Reinforcement Learning with Human Intervention
Link: https://arxiv.org/abs/2111.05819
Authors: Yunkun Xu, Zhenyu Liu, Guifang Duan, Jiangcheng Zhu, Xiaolong Bai, Jianrong Tan
Affiliations: State Key Laboratory of CAD&CG, Zhejiang University; Huawei Cloud
Abstract: Safety has become one of the main challenges of applying deep reinforcement learning to real-world systems. Currently, the incorporation of external knowledge such as human oversight is the only means to prevent the agent from visiting the catastrophic state. In this paper, we propose MBHI, a novel framework for safe model-based reinforcement learning, which ensures safety at the state level and can effectively avoid both "local" and "non-local" catastrophes. An ensemble of supervised learners is trained in MBHI to imitate human blocking decisions. Similar to the human decision-making process, MBHI will roll out an imagined trajectory in the dynamics model before executing actions in the environment, and estimate its safety. When the imagination encounters a catastrophe, MBHI will block the current action and use an efficient MPC method to output a safety policy. We evaluate our method on several safety tasks, and the results show that MBHI achieves better performance in terms of sample efficiency and number of catastrophes compared to the baselines.
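A toy sketch of the look-before-you-leap loop described above: a learned one-dimensional dynamics model, an ensemble of "blockers" imitating human vetoes, and a simple stand-in for the MPC fallback. Everything here is an illustrative assumption, not the authors' MBHI implementation:

```python
# Hedged sketch: imagine a rollout in a learned dynamics model and let an
# ensemble of imitation-trained blockers veto unsafe actions before acting.
import numpy as np

dynamics = lambda s, a: s + 0.1 * a          # toy learned model: s' = s + 0.1a
blockers = [lambda s, a, w=w: float(abs(s + 0.1 * a) > w)  # "unsafe if |s'| > w"
            for w in (0.9, 1.0, 1.1)]        # ensemble imitating human blocking
policy = lambda s: 1.0                       # toy policy: always push right

def is_blocked(s, a, threshold=0.5):
    return np.mean([b(s, a) for b in blockers]) > threshold

def safe_action(s, candidates=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Stand-in for the MPC fallback: pick the smallest allowed action."""
    allowed = [a for a in candidates if not is_blocked(s, a)]
    return min(allowed, key=abs) if allowed else 0.0

def act(s, horizon=5):
    a, si = policy(s), s
    for _ in range(horizon):                 # imagine before executing
        if is_blocked(si, a):
            return safe_action(s)            # imagined catastrophe: fall back
        si = dynamics(si, a)
        a = policy(si)
    return a

print(act(0.9))   # near the unsafe boundary: returns 0.0 instead of policy's 1.0
```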
【4】 PIMIP: An Open Source Platform for Pathology Information Management and Integration
Link: https://arxiv.org/abs/2111.05794
Authors: Jialun Wu, Anyu Mao, Xinrui Bao, Haichuan Zhang, Zeyu Gao, Chunbao Wang, Tieliang Gong, Chen Li
Affiliations: School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China; National Engineering Lab for Big Data Analytics, Xi’an Jiaotong University, Xi’an, China; School of Automation Science and Engineering, Xi’an Jiaotong University, Xi’an, China
Note: Accepted at BIBM 2021; 8 pages, 8 figures
Abstract: Digital pathology plays a crucial role in the development of artificial intelligence in the medical field. A digital pathology platform can make pathological resources digital and networked, and realize permanent storage of visual data and synchronous browsing and processing without the limitations of time and space. It has been widely used in various fields of pathology. However, there is still a lack of an open and universal digital pathology platform to assist doctors in the management and analysis of digital pathological sections, as well as the management and structured description of relevant patient information. Most platforms cannot integrate image viewing, annotation and analysis, and text information management. To solve these problems, we propose a comprehensive and extensible platform, PIMIP. PIMIP provides image annotation functions based on the visualization of digital pathological sections. Our annotation functions support multi-user collaborative annotation and multi-device annotation, and automate some annotation tasks. In the annotation task, we invited a professional pathologist for guidance. We introduce a machine learning module for image analysis. The data we collected include public data and clinical examples from local hospitals, making the platform suitable for clinical use. In addition to image data, we also structured the management and display of text information, so the platform is comprehensive. The platform framework is built in a modular way, allowing users to add machine learning modules independently, which makes the platform extensible.
【5】 A framework for comprehensible multi-modal detection of cyber threats
Link: https://arxiv.org/abs/2111.05764
Authors: Jan Kohout, Čeněk Škarda, Kyrylo Shcherbin, Martin Kopp, Jan Brabec
Affiliations: Cisco Systems, Prague, Czech Republic
Abstract: Detection of malicious activities in corporate environments is a very complex task and much effort has been invested in automating it. However, the vast majority of existing methods operate only in a narrow scope, which limits them to capturing only fragments of the evidence of malware's presence. Consequently, such an approach is not aligned with the way cyber threats are studied and described by domain experts. In this work, we discuss these limitations and design a detection framework which combines observed events from different sources of data. Thanks to this, it provides full insight into the attack life cycle and enables detection of threats that require this coupling of observations from different telemetries to identify the full scope of the incident. We demonstrate the applicability of the framework on a case study of a real malware infection observed in a corporate network.
【6】 Prune Once for All: Sparse Pre-Trained Language Models
Link: https://arxiv.org/abs/2111.05754
Authors: Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat
Affiliations: Intel Labs, Israel; Intel Corporation
Note: ENLSP NeurIPS Workshop 2021, 12 pages
Abstract: Transformer-based language models are applied to a wide range of applications in natural language processing. However, they are inefficient and difficult to deploy. In recent years, many compression algorithms have been proposed to increase the implementation efficiency of large Transformer-based models on target hardware. In this work, we present a new method for training sparse pre-trained Transformer language models by integrating weight pruning and model distillation. These sparse pre-trained models can be used for transfer learning on a wide range of tasks while maintaining their sparsity pattern. We demonstrate our method with three known architectures to create sparse pre-trained BERT-Base, BERT-Large and DistilBERT. We show how the compressed sparse pre-trained models we trained transfer their knowledge to five different downstream natural language tasks with minimal accuracy loss. Moreover, we show how to further compress the sparse models' weights to 8-bit precision using quantization-aware training. For example, with our sparse pre-trained BERT-Large fine-tuned on SQuADv1.1 and quantized to 8-bit, we achieve a compression ratio of 40x for the encoder with less than 1% accuracy loss. To the best of our knowledge, our results show the best compression-to-accuracy ratio for BERT-Base, BERT-Large, and DistilBERT.
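The two ingredients named above, magnitude weight pruning and knowledge distillation, can be sketched on a toy torch model as follows; the paper's pruning schedule, sparsity-pattern transfer across tasks, and BERT specifics are not reproduced:

```python
# Hedged sketch of magnitude pruning combined with a distillation loss.
import torch
import torch.nn.functional as F

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Keep the largest-|w| entries; zero out the `sparsity` fraction."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL between temperature-softened teacher and student distributions."""
    return F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T

teacher = torch.nn.Linear(16, 4)
student = torch.nn.Linear(16, 4)
mask = magnitude_mask(student.weight.data, sparsity=0.9)
student.weight.data *= mask                     # prune once ...
x = torch.randn(8, 16)
loss = distillation_loss(student(x), teacher(x).detach())
loss.backward()
student.weight.grad *= mask                     # ... keep the sparsity pattern
print(float(loss), float((student.weight == 0).float().mean()))
```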
【7】 Important Sentence Identification in Legal Cases Using Multi-Class Classification
Link: https://arxiv.org/abs/2111.05721
Authors: Sahan Jayasinghe, Lakith Rambukkanage, Ashan Silva, Nisansa de Silva, Amal Shehan Perera
Affiliations: Department of Computer Science & Engineering, University of Moratuwa
Abstract: The advancement of Natural Language Processing (NLP) is spreading through various domains in the form of practical applications and academic interest. Inherently, the legal domain contains a vast amount of data in text format, and therefore requires the application of NLP to cater to the analytically demanding needs of the domain. Identifying important sentences, facts and arguments in a legal case is a tedious task for legal professionals. In this research we explore the usage of sentence embeddings for multi-class classification to identify important sentences in a legal case, from the perspective of the main parties present in the case. In addition, a task-specific loss function is defined to improve the accuracy, which is otherwise limited by the straightforward use of categorical cross-entropy loss.
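The abstract does not spell out the task-specific loss, so as a hedged illustration of one common way to go beyond plain categorical cross-entropy on imbalanced sentence classes, here is a class-weighted variant in torch; the paper's actual loss may differ:

```python
# Hedged sketch: inverse-frequency class weighting for cross-entropy, so rare
# "important" sentence classes are not swamped by the majority class.
import torch
import torch.nn as nn

counts = torch.tensor([900.0, 60.0, 40.0])        # imbalanced class frequencies
weights = counts.sum() / (len(counts) * counts)   # inverse-frequency weights
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(5, 3)                        # classifier over embeddings
labels = torch.tensor([0, 2, 1, 0, 2])
print(criterion(logits, labels))
```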
【8】 Counterfactual Explanations for Models of Code
Link: https://arxiv.org/abs/2111.05711
Authors: Jürgen Cito, Isil Dillig, Vijayaraghavan Murali, Satish Chandra
Affiliations: TU Wien and Facebook, Austria; UT Austin, U.S.A.
Note: 10 pages, 6 listings, 2 algorithms, 2 tables, 1 figure
Abstract: Machine learning (ML) models play an increasingly prevalent role in many software engineering tasks. However, because most models are now powered by opaque deep neural networks, it can be difficult for developers to understand why the model came to a certain conclusion and how to act upon the model's prediction. Motivated by this problem, this paper explores counterfactual explanations for models of source code. Such counterfactual explanations constitute minimal changes to the source code under which the model "changes its mind". We integrate counterfactual explanation generation into models of source code in a real-world setting. We describe considerations that impact both the ability to find realistic and plausible counterfactual explanations, as well as the usefulness of such explanations to the user of the model. In a series of experiments we investigate the efficacy of our approach on three different models, each based on a BERT-like architecture operating over source code.
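A toy sketch of the search such explanations imply: greedily substitute tokens until the classifier changes its mind, and return the minimal edit. The substitution table and stand-in model are hypothetical; real systems additionally constrain edits so the perturbed code stays natural and compilable:

```python
# Hedged sketch of one-token counterfactual search for a code classifier.
SUBSTITUTES = {"<": [">", "<="], ">": ["<", ">="], "==": ["!="], "+": ["-"]}

def toy_model(tokens):
    """Stand-in classifier: flags a 'bug' iff the code compares with '<'."""
    return "buggy" if "<" in tokens else "ok"

def counterfactual(tokens, predict):
    base = predict(tokens)
    for i, tok in enumerate(tokens):
        for sub in SUBSTITUTES.get(tok, []):
            cand = tokens[:i] + [sub] + tokens[i + 1:]
            if predict(cand) != base:        # the model "changed its mind"
                return cand, (i, tok, sub)   # minimal one-token edit
    return None

tokens = ["if", "(", "i", "<", "n", ")", "return", "a", "[", "i", "]", ";"]
print(counterfactual(tokens, toy_model))
```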
【9】 LSP: Acceleration and Regularization of Graph Neural Networks via Locality Sensitive Pruning of Graphs
Link: https://arxiv.org/abs/2111.05694
Authors: Eitan Kosman, Joel Oren, Dotan Di Castro
Affiliations: Bosch Center for Artificial Intelligence, Haifa, Israel
Abstract: Graph Neural Networks (GNNs) have emerged as highly successful tools for graph-related tasks. However, real-world problems involve very large graphs, and the compute resources needed to fit GNNs to those problems grow rapidly. Moreover, the noisy nature and size of real-world graphs cause GNNs to over-fit if not regularized properly. Surprisingly, recent works show that large graphs often involve many redundant components that can be removed without compromising the performance too much. This includes node or edge removals during inference through GNNs layers or as a pre-processing step that sparsifies the input graph. This intriguing phenomenon enables the development of state-of-the-art GNNs that are both efficient and accurate. In this paper, we take a further step towards demystifying this phenomenon and propose a systematic method called Locality-Sensitive Pruning (LSP) for graph pruning based on Locality-Sensitive Hashing. We aim to sparsify a graph so that similar local environments of the original graph result in similar environments in the resulting sparsified graph, which is an essential feature for graph-related tasks. To justify the application of pruning based on local graph properties, we exemplify the advantage of locality-based pruning over other pruning strategies in various scenarios. Extensive experiments on synthetic and real-world datasets demonstrate the superiority of LSP, which removes a significant number of edges from large graphs without compromising performance, accompanied by considerable acceleration.
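A toy sketch in the spirit of LSP, using random-hyperplane LSH over node features and keeping only edges whose endpoints agree on most hash bits; the paper's actual criterion operates on local environments and will differ:

```python
# Hedged sketch: locality-sensitive graph sparsification with a
# random-hyperplane LSH over node features.
import numpy as np

rng = np.random.default_rng(1)
n, d, n_planes = 20, 8, 4
features = rng.normal(size=(n, d))
edges = {(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < 0.3}

planes = rng.normal(size=(n_planes, d))
signatures = (features @ planes.T > 0)            # n x n_planes bit signatures

def similar(u, v, min_agree=3):
    """Endpoints must agree on most LSH bits for the edge to survive."""
    return (signatures[u] == signatures[v]).sum() >= min_agree

pruned = {(u, v) for (u, v) in edges if similar(u, v)}
print(len(edges), "->", len(pruned), "edges after LSH pruning")
```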
【10】 Efficient Neural Network Training via Forward and Backward Propagation Sparsification
Link: https://arxiv.org/abs/2111.05685
Authors: Xiao Zhou, Weizhong Zhang, Zonghao Chen, Shizhe Diao, Tong Zhang
Affiliations: Hong Kong University of Science and Technology; Tsinghua University
Abstract: Sparse training is a natural idea to accelerate the training speed of deep neural networks and save memory usage, especially since large modern neural networks are significantly over-parameterized. However, most of the existing methods cannot achieve this goal in practice because the chain-rule-based gradient estimators (w.r.t. structure parameters) adopted by previous methods require dense computation at least in the backward propagation step. This paper solves this problem by proposing an efficient sparse training method with completely sparse forward and backward passes. We first formulate the training process as a continuous minimization problem under a global sparsity constraint. We then separate the optimization process into two steps, corresponding to the weight update and the structure parameter update. For the former step, we use the conventional chain rule, which can be made sparse by exploiting the sparse structure. For the latter step, instead of using chain-rule-based gradient estimators as in existing methods, we propose a variance-reduced policy gradient estimator, which only requires two forward passes without backward propagation, thus achieving completely sparse training. We prove that the variance of our gradient estimator is bounded. Extensive experimental results on real-world datasets demonstrate that compared to previous methods, our algorithm is much more effective in accelerating the training process, up to an order of magnitude faster.
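The structure-parameter step can be illustrated with a generic paired REINFORCE estimator that uses exactly two forward passes and no backpropagation, each sample's loss serving as the other's baseline. This is a hedged stand-in, not the paper's exact variance-reduced estimator:

```python
# Hedged sketch: update Bernoulli keep-probabilities (structure parameters)
# from two forward passes only, via a paired score-function estimator.
import numpy as np

rng = np.random.default_rng(0)
d = 10
theta = np.full(d, 0.5)            # Bernoulli keep-probabilities per weight
w = rng.normal(size=d)             # toy fixed weights

def loss(mask):                    # forward pass only, no backprop
    return float(((w * mask).sum() - 1.0) ** 2)

for step in range(200):
    m1 = (rng.random(d) < theta).astype(float)
    m2 = (rng.random(d) < theta).astype(float)
    # score function of a Bernoulli: d/d(theta) log p(m) = (m - theta)/(theta(1-theta))
    s1 = (m1 - theta) / (theta * (1 - theta) + 1e-8)
    s2 = (m2 - theta) / (theta * (1 - theta) + 1e-8)
    g = 0.5 * ((loss(m1) - loss(m2)) * s1 + (loss(m2) - loss(m1)) * s2)
    theta = np.clip(theta - 0.01 * g, 0.05, 0.95)

print(loss((theta > 0.5).astype(float)))   # loss of the learned sparse mask
```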
【11】 Social Fraud Detection Review: Methods, Challenges and Analysis
Link: https://arxiv.org/abs/2111.05645
Authors: Saeedreza Shehnepoor, Roberto Togneri, Wei Liu, Mohammed Bennamoun
Affiliations: University of Western Australia, Australia
Abstract: Social reviews have dominated the web and become a plausible source of product information. People and businesses use such information for decision-making. Businesses also make use of social information to spread fake information using a single user, groups of users, or a bot trained to generate fraudulent content. Many studies proposed approaches based on user behaviors and review text to address the challenges of fraud detection. To provide an exhaustive literature review, social fraud detection is reviewed using a framework that considers three key components: the review itself, the user who carries out the review, and the item being reviewed. As features are extracted for the component representation, a feature-wise review is provided based on behavioral features, text-based features, and their combination. With this framework, a comprehensive overview of approaches is presented, including supervised, semi-supervised, and unsupervised learning. The supervised approaches for fraud detection are introduced and categorized into two sub-categories: classical and deep learning. The lack of labeled datasets is explained and potential solutions are suggested. To help new researchers in the area develop a better understanding, a topic analysis and an overview of future directions are provided at each step of the proposed systematic framework.
【12】 Machine Learning Models Disclosure from Trusted Research Environments (TRE), Challenges and Opportunities
Link: https://arxiv.org/abs/2111.05628
Authors: Esma Mansouri-Benssassi, Simon Rogers, Jim Smith, Felix Ritchie, Emily Jefferson
Affiliations: University of Dundee; NHS National Services Scotland; University of the West of England
Abstract: Trusted Research Environments (TREs) are safe and secure environments in which researchers can access sensitive data. With the growth and diversity of medical data such as Electronic Health Records (EHR), medical imaging and genomic data, there is an increase in the use of Artificial Intelligence (AI) in general, and the subfield of Machine Learning (ML) in particular, in the healthcare domain. This generates the desire to disclose new types of outputs from TREs, such as trained machine learning models. Although specific guidelines and policies exist for statistical disclosure controls in TREs, they do not satisfactorily cover these new types of output request. In this paper, we define some of the challenges around the application and disclosure of machine learning for healthcare within TREs. We describe various vulnerabilities that the introduction of AI brings to TREs. We also provide an introduction to the different types and levels of risk associated with the disclosure of trained ML models. Finally, we describe new research opportunities in developing and adapting policies and tools for safely disclosing machine learning outputs from TREs.
【13】 Conversational Recommendation: Theoretical Model and Complexity Analysis
Link: https://arxiv.org/abs/2111.05578
Authors: Tommaso Di Noia, Francesco Donini, Dietmar Jannach, Fedelucio Narducci, Claudio Pomo
Affiliations: Politecnico di Bari, Italy; Universita degli Studi della Tuscia, Italy; University of Klagenfurt, Austria
Abstract: Recommender systems are software applications that help users find items of interest in situations of information overload in a personalized way, using knowledge about the needs and preferences of individual users. In conversational recommendation approaches, these needs and preferences are acquired by the system in an interactive, multi-turn dialog. A common approach in the literature to drive such dialogs is to incrementally ask users about their preferences regarding desired and undesired item features or regarding individual items. A central research goal in this context is efficiency, evaluated with respect to the number of required interactions until a satisfying item is found. This is usually accomplished by making inferences about the best next question to ask the user. Today, research on dialog efficiency is almost entirely empirical, aiming to demonstrate, for example, that one strategy for selecting questions is better than another one in a given application. With this work, we complement empirical research with a theoretical, domain-independent model of conversational recommendation. This model, which is designed to cover a range of application scenarios, allows us to investigate the efficiency of conversational approaches in a formal way, in particular with respect to the computational complexity of devising optimal interaction strategies. Through such a theoretical analysis we show that finding an efficient conversational strategy is NP-hard, and in PSPACE in general, but for particular kinds of catalogs the upper bound lowers to POLYLOGSPACE. From a practical point of view, this result implies that catalog characteristics can strongly influence the efficiency of individual conversational strategies and should therefore be considered when designing new strategies. A preliminary empirical analysis on datasets derived from a real-world one aligns with our findings.
【14】 Deep Attention-guided Graph Clustering with Dual Self-supervision
Link: https://arxiv.org/abs/2111.05548
Authors: Zhihao Peng, Hui Liu, Yuheng Jia, Junhui Hou
Affiliations: H. Liu is with the School of Computing & Information Sciences, Caritas Institute of Higher Education
Abstract: Existing deep embedding clustering works only consider the deepest layer to learn a feature embedding and thus fail to make good use of the available discriminative information from cluster assignments, resulting in limited performance. To this end, we propose a novel method, namely deep attention-guided graph clustering with dual self-supervision (DAGC). Specifically, DAGC first utilizes a heterogeneity-wise fusion module to adaptively integrate the features of an auto-encoder and a graph convolutional network in each layer, and then uses a scale-wise fusion module to dynamically concatenate the multi-scale features in different layers. Such modules are capable of learning a discriminative feature embedding via an attention-based mechanism. In addition, we design a distribution-wise fusion module that leverages cluster assignments to acquire clustering results directly. To better explore the discriminative information from the cluster assignments, we develop a dual self-supervision solution consisting of a soft self-supervision strategy with a triplet Kullback-Leibler divergence loss and a hard self-supervision strategy with a pseudo supervision loss. Extensive experiments validate that our method consistently outperforms state-of-the-art methods on six benchmark datasets. In particular, our method improves the ARI by more than 18.14% over the best baseline.
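As a hedged sketch of the "soft self-supervision" ingredient, here is the DEC-style pattern of sharpening soft assignments into a target distribution and minimizing a KL divergence; DAGC's triplet KL loss and hard pseudo-supervision term are not reproduced:

```python
# Hedged sketch: sharpen soft cluster assignments Q into a target P and
# minimize KL(P || Q), a common self-supervision signal in deep clustering.
import torch
import torch.nn.functional as F

def target_distribution(q: torch.Tensor) -> torch.Tensor:
    """Sharpened target: p_ij proportional to q_ij^2 / cluster frequency."""
    weight = q ** 2 / q.sum(dim=0, keepdim=True)
    return weight / weight.sum(dim=1, keepdim=True)

q = F.softmax(torch.randn(32, 6), dim=1)       # soft assignments to 6 clusters
p = target_distribution(q).detach()            # target is held fixed
kl = F.kl_div(q.log(), p, reduction="batchmean")   # KL(p || q)
print(kl)
```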
【15】 Lightweight machine unlearning in neural network
Link: https://arxiv.org/abs/2111.05528
Authors: Kongyang Chen, Yiwen Wang, Yao Huang
Abstract: In recent years, machine learning neural networks have penetrated deeply into people's lives. As the price of this convenience, people's private information is at risk of disclosure. The "right to be forgotten" was introduced in a timely manner, stipulating that individuals have the right to withdraw consent to the processing of their personal information. To address this, machine unlearning is proposed, which allows the model to erase all memory of private information. Previous approaches, including retraining and incremental learning to update models, often take up extra storage space or are difficult to apply to neural networks. Our method only needs to make a small perturbation to the weights of the target model and iterate it in the direction of a model trained on the remaining data subset, until the contribution of the unlearned data to the model is completely eliminated. In this paper, experiments on five datasets prove the effectiveness of our method for machine unlearning, and our method is 15 times faster than retraining.
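A toy sketch of the described perturbation idea on a linear model, where the reference model on the remaining data is computable in closed form; the paper's actual update rule for neural networks is not reproduced:

```python
# Hedged sketch: nudge the full model's weights iteratively toward a model
# fit on the remaining data only, so the forgotten subset's effect fades
# without full retraining.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
keep = np.arange(100) >= 20                    # first 20 rows to be "forgotten"

fit = lambda A, b: np.linalg.lstsq(A, b, rcond=None)[0]
w_full = fit(X, y)                             # model trained on all data
w_keep = fit(X[keep], y[keep])                 # reference on remaining data

w = w_full.copy()
for _ in range(50):                            # small perturbations toward w_keep
    w += 0.1 * (w_keep - w)
print(np.linalg.norm(w - w_keep))              # ~0: forgotten subset's effect gone
```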
【16】 LUMINOUS: Indoor Scene Generation for Embodied AI Challenges
Link: https://arxiv.org/abs/2111.05527
Authors: Yizhou Zhao, Kaixiang Lin, Zhiwei Jia, Qiaozi Gao, Govind Thattai, Jesse Thomason, Gaurav S. Sukhatme
Affiliations: University of California, Los Angeles; Amazon Alexa AI; University of California, San Diego; University of Southern California
Note: 2021 paper, Amazon
Abstract: Learning-based methods for training embodied agents typically require a large number of high-quality scenes that contain realistic layouts and support meaningful interactions. However, current simulators for Embodied AI (EAI) challenges only provide simulated indoor scenes with a limited number of layouts. This paper presents Luminous, the first research framework that employs state-of-the-art indoor scene synthesis algorithms to generate large-scale simulated scenes for Embodied AI challenges. Further, we automatically and quantitatively evaluate the quality of generated indoor scenes via their ability to support complex household tasks. Luminous incorporates a novel scene generation algorithm (Constrained Stochastic Scene Generation (CSSG)), which achieves competitive performance with human-designed scenes. Within Luminous, the EAI task executor, task instruction generation module, and video rendering toolkit can collectively generate a massive multimodal dataset of new scenes for the training and evaluation of Embodied AI agents. Extensive experimental results demonstrate the effectiveness of the data generated by Luminous, enabling the comprehensive assessment of embodied agents on generalization and robustness.
【17】 Discovering Latent Representations of Relations for Interacting Systems
Link: https://arxiv.org/abs/2111.05514
Authors: Dohae Lee, Young Jin Oh, In-Kwon Lee
Note: Accepted by IEEE Access on Oct. 25, 2021. This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITP.
Abstract: Systems whose entities interact with each other are common. In many interacting systems, it is difficult to observe the relations between entities, which is the key information for analyzing the system. In recent years, there has been increasing interest in discovering the relationships between entities using graph neural networks. However, existing approaches are difficult to apply if the number of relations is unknown or if the relations are complex. We propose the DiScovering Latent Relation (DSLR) model, which is flexibly applicable even if the number of relations is unknown or many types of relations exist. The flexibility of our DSLR model comes from the design concept of our encoder, which represents the relation between entities in a latent space rather than as a discrete variable, and a decoder that can handle many types of relations. We performed experiments on synthetic and real-world graph data with various relationships between entities, and compared the qualitative and quantitative results with other approaches. The experiments show that the proposed method is suitable for analyzing dynamic graphs with an unknown number of complex relations.
【18】 Training Generative Adversarial Networks with Adaptive Composite Gradient
Link: https://arxiv.org/abs/2111.05508
Authors: Huiqing Qi, Fang Li, Shengli Tan, Xiangyun Zhang
Affiliations: School of Mathematical Sciences, East China Normal University, Shanghai, China
Abstract: The wide applications of Generative Adversarial Networks benefit from successful training methods, which guarantee that an objective function converges to a local minimum. Nevertheless, designing an efficient and competitive training method is still a challenging task due to the cyclic behaviors of some gradient-based methods and the expensive computational cost of methods based on the Hessian matrix. This paper proposes the Adaptive Composite Gradients (ACG) method, which is linearly convergent in bilinear games under suitable settings. Theory and toy-function experiments suggest that our approach can alleviate the cyclic behaviors and converge faster than recently proposed algorithms. Notably, the ACG method can be used to find stable fixed points not only in bilinear games but also in general games. The ACG method is a novel semi-gradient-free algorithm since it does not need to calculate the gradient at each step, reducing the computational cost of gradients and Hessians by utilizing predictive information from future iterations. We conducted two mixture-of-Gaussians experiments by integrating ACG into existing algorithms with linear GANs. Results show ACG is competitive with the previous algorithms. Realistic experiments on four prevalent datasets (MNIST, Fashion-MNIST, CIFAR-10, and CelebA) with DCGANs show that our ACG method outperforms several baselines, which illustrates the superiority and efficacy of our method.
【19】 Inclusive Speaker Verification with Adaptive Thresholding
Link: https://arxiv.org/abs/2111.05501
Authors: Navdeep Jain, Hongcheng Wang
Affiliations: Comcast Applied AI Research
Abstract: While using a speaker verification (SV) based system in a commercial application, it is important that customers have an inclusive experience irrespective of their gender, age, or ethnicity. In this paper, we analyze the impact of gender and age on SV and find that, for a desired common False Acceptance Rate (FAR) across different gender and age groups, the False Rejection Rate (FRR) is different for different gender and age groups. To optimize FRR for all users for a desired FAR, we propose a context (e.g. gender, age) adaptive thresholding framework for SV. The context can be available as prior information for many practical applications. We also propose a concatenated gender/age detection model to algorithmically derive the context in the absence of such prior information. We experimentally show that our context-adaptive thresholding method is effective in building a more efficient inclusive SV system. Specifically, we show that we can reduce FRR for a specific gender for a desired FAR on the voxceleb1 test set by using gender-specific thresholds. Similar analysis on the OGI kids' speech corpus shows that by using an age-specific threshold, we can significantly reduce FRR for certain age groups for a desired FAR.
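The calibration step this implies is simple: given impostor-trial scores per context group, pick each group's threshold at the (1 - FAR) quantile. A sketch on synthetic scores (the group names and score distributions are illustrative):

```python
# Hedged sketch: calibrate a separate decision threshold per context group so
# every group meets the same target FAR, instead of one global threshold.
import numpy as np

rng = np.random.default_rng(0)
target_far = 0.01

def threshold_for_far(impostor_scores, far):
    """Threshold whose false-acceptance rate on impostor trials is <= far."""
    return np.quantile(impostor_scores, 1.0 - far)

groups = {"group_a": rng.normal(0.0, 1.0, 5000),   # impostor-trial scores
          "group_b": rng.normal(0.3, 1.2, 5000)}   # shifted score distribution

thresholds = {g: threshold_for_far(s, target_far) for g, s in groups.items()}
print(thresholds)   # group_b needs a higher threshold for the same FAR
```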
【20】 Attention Approximates Sparse Distributed Memory
Link: https://arxiv.org/abs/2111.05498
Authors: Trenton Bricken, Cengiz Pehlevan
Affiliations: Systems, Synthetic and Quantitative Biology, and Applied Mathematics, Harvard University
Abstract: While Attention has come to be an important mechanism in deep learning, there remains limited intuition for why it works so well. Here, we show that Transformer Attention can be closely related under certain data conditions to Kanerva's Sparse Distributed Memory (SDM), a biologically plausible associative memory model. We confirm that these conditions are satisfied in pre-trained GPT2 Transformer models. We discuss the implications of the Attention-SDM map and provide new computational and biological interpretations of Attention.
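For reference, the object being related is standard softmax attention; the paper's claim, loosely stated, is that with suitably normalized queries and keys these exponential weights approximate SDM's read operation, which weights stored values by intersections of Hamming-distance circles. A sketch of the attention side (the inverse temperature β, which equals 1/sqrt(d_k) in the standard Transformer, is the only free parameter here):

```latex
% Standard softmax attention, the quantity the paper relates to SDM's read.
\[
  \operatorname{Attention}(Q,K,V) = \operatorname{softmax}\!\big(\beta\, QK^{\top}\big)\,V,
  \qquad
  \operatorname{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}},
\]
% with $\beta = 1/\sqrt{d_k}$ in the standard Transformer.
```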
【21】 Spatially and Seamlessly Hierarchical Reinforcement Learning for State Space and Policy Space in Autonomous Driving
Link: https://arxiv.org/abs/2111.05479
Authors: Jaehyun Kim, Jaeseung Jeong
Note: 14 pages, 8 figures, and 3 tables
Abstract: Despite advances in hierarchical reinforcement learning, its applications to path planning in autonomous driving on highways remain challenging. One reason is that conventional hierarchical reinforcement learning approaches are not amenable to autonomous driving due to its riskiness: the agent must move while avoiding multiple obstacles, such as other agents, that are highly unpredictable, so safe regions are small, scattered, and changeable over time. To overcome this challenge, we propose a spatially hierarchical reinforcement learning method for state space and policy space. The high-level policy selects not only a behavioral sub-policy but also the regions to attend to in state space and the outline in policy space. Subsequently, the low-level policy elaborates the agent's short-term goal position within the outline of the region selected by the high-level command. The network structure and optimization suggested in our method are as concise as those of single-level methods. Experiments in environments with roads of various shapes showed that our method finds nearly optimal policies from early episodes, outperforming a baseline hierarchical reinforcement learning method, especially on narrow and complex roads. The resulting trajectories on the roads were similar to those of human strategies at the behavioral planning level.
【22】 Multi-Task Prediction of Clinical Outcomes in the Intensive Care Unit using Flexible Multimodal Transformers
Link: https://arxiv.org/abs/2111.05431
Authors: Benjamin Shickel, Patrick J. Tighe, Azra Bihorac, Parisa Rashidi
Affiliations: University of Florida, Gainesville, FL, United States
Abstract: Recent deep learning research based on Transformer model architectures has demonstrated state-of-the-art performance across a variety of domains and tasks, mostly within the computer vision and natural language processing domains. While some recent studies have implemented Transformers for clinical tasks using electronic health records data, they are limited in scope, flexibility, and comprehensiveness. In this study, we propose a flexible Transformer-based EHR embedding pipeline and predictive model framework that introduces several novel modifications of existing workflows that capitalize on data attributes unique to the healthcare domain. We showcase the feasibility of our flexible design in a case study in the intensive care unit, where our models accurately predict seven clinical outcomes pertaining to readmission and patient mortality over multiple future time horizons.
【23】 Convolutional Neural Network Dynamics: A Graph Perspective
Link: https://arxiv.org/abs/2111.05410
Authors: Fatemeh Vahedian, Ruiyu Li, Puja Trivedi, Di Jin, Danai Koutra
Affiliations: University of Michigan
Abstract: The success of neural networks (NNs) in a wide range of applications has led to increased interest in understanding the underlying learning dynamics of these models. In this paper, we go beyond mere descriptions of the learning dynamics by taking a graph perspective and investigating the relationship between the graph structure of NNs and their performance. Specifically, we propose (1) representing the neural network learning process as a time-evolving graph (i.e., a series of static graph snapshots over epochs), (2) capturing the structural changes of the NN during the training phase in a simple temporal summary, and (3) leveraging the structural summary to predict the accuracy of the underlying NN in a classification or regression task. For the dynamic graph representation of NNs, we explore structural representations for fully-connected and convolutional layers, which are key components of powerful NN models. Our analysis shows that a simple summary of graph statistics, such as weighted degree and eigenvector centrality, over just a few epochs can be used to accurately predict the performance of NNs. For example, a weighted degree-based summary of the time-evolving graph that is constructed based on 5 training epochs of the LeNet architecture achieves classification accuracy of over 93%. Our findings are consistent for different NN architectures, including LeNet, VGG, AlexNet and ResNet.
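As a rough sketch of the graph view, assuming a single fully-connected layer and plain NumPy in place of the paper's pipeline, one epoch's structural summary could be computed like this:

```python
# Hedged sketch: treat a fully-connected layer's |weights| as a bipartite
# adjacency and summarize the epoch with weighted degree and a leading
# eigenvector computed by power iteration.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 64))                  # one layer's weights at an epoch

A = np.zeros((96, 96))                         # bipartite graph: 32 + 64 nodes
A[:32, 32:] = np.abs(W)
A[32:, :32] = np.abs(W).T

weighted_degree = A.sum(axis=1)

v = np.ones(96)                                # power iteration for centrality
for _ in range(100):
    v = A @ v
    v /= np.linalg.norm(v)

summary = [weighted_degree.mean(), weighted_degree.std(), v.max()]
print(summary)   # one epoch's entry in the time-evolving summary
```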
【24】 Pipeline for 3D reconstruction of the human body from AR/VR headset mounted egocentric cameras
Link: https://arxiv.org/abs/2111.05409
Authors: Shivam Grover, Kshitij Sidana, Vanita Jain
Affiliations: Bharati Vidyapeeth’s College of Engineering, New Delhi, India
Note: 11 pages, 12 figures and 2 tables. The first authors contributed equally to this work.
Abstract: In this paper, we propose a novel pipeline for the 3D reconstruction of the full body from egocentric viewpoints. 3D reconstruction of the human body from egocentric viewpoints is a challenging task, as the view is skewed and the body parts farther from the cameras are occluded. One such example is the view from cameras installed below VR headsets. To achieve this task, we first make use of conditional GANs to translate the egocentric views to full-body third-person views. This increases the comprehensibility of the image and caters to occlusions. The generated third-person view is further sent through the 3D reconstruction module that generates a 3D mesh of the body. We also train a network that can take the third-person full-body view of the subject and generate the texture maps for applying on the mesh. The generated mesh has fairly realistic body proportions and is fully rigged, allowing for further applications such as real-time animation and pose transfer in games. This approach can be key to a new domain of mobile human telepresence.
【25】 Statistical Perspectives on Reliability of Artificial Intelligence Systems
Link: https://arxiv.org/abs/2111.05391
Authors: Yili Hong, Jiayi Lian, Li Xu, Jie Min, Yueyao Wang, Laura J. Freeman, Xinwei Deng
Affiliations: Department of Statistics, Virginia Tech, Blacksburg, VA
Note: 40 pages
Abstract: Artificial intelligence (AI) systems have become increasingly popular in many areas. Nevertheless, AI technologies are still in their developing stages, and many issues need to be addressed. Among those, the reliability of AI systems needs to be demonstrated so that the AI systems can be used with confidence by the general public. In this paper, we provide statistical perspectives on the reliability of AI systems. Different from other considerations, the reliability of AI systems focuses on the time dimension. That is, the system can perform its designed functionality for the intended period. We introduce a so-called SMART statistical framework for AI reliability research, which includes five components: Structure of the system, Metrics of reliability, Analysis of failure causes, Reliability assessment, and Test planning. We review traditional methods in reliability data analysis and software reliability, and discuss how those existing methods can be transformed for reliability modeling and assessment of AI systems. We also describe recent developments in modeling and analysis of AI reliability and outline statistical research challenges in this area, including out-of-distribution detection, the effect of the training set, adversarial attacks, model accuracy, and uncertainty quantification, and discuss how those topics can be related to AI reliability, with illustrative examples. Finally, we discuss data collection and test planning for AI reliability assessment and how to improve system designs for higher AI reliability. The paper closes with some concluding remarks.
【26】 DataWords: Getting Contrarian with Text, Structured Data and Explanations
Link: https://arxiv.org/abs/2111.05384
Authors: Stephen I. Gallant, Mirza Nasir Hossain
Affiliations: Textician, Cambridge, MA
Note: 11 pages
Abstract: Our goal is to build classification models using a combination of free-text and structured data. To do this, we represent structured data by text sentences, DataWords, so that similar data items are mapped into the same sentence. This permits modeling a mixture of text and structured data by using only text-modeling algorithms. Several examples illustrate that it is possible to improve text classification performance by first running extraction tools (named entity recognition), then converting the output to DataWords, and adding the DataWords to the original text before model building and classification. This approach also allows us to produce explanations for inferences in terms of both free text and structured data.
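A minimal sketch of the DataWords idea, with illustrative sentence templates and bucketing rules that are our assumptions rather than the paper's:

```python
# Hedged sketch: render structured fields as text sentences so similar data
# items map to the same sentence, then prepend them to the free text before
# ordinary text-only classification.
def datawords(record: dict) -> str:
    sentences = []
    for field, value in record.items():
        if isinstance(value, (int, float)):          # bucket numbers so similar
            value = "high" if value > 100 else "low"  # items share a sentence
        sentences.append(f"The {field} is {value}.")
    return " ".join(sentences)

note = "Patient reports persistent cough and fatigue."
structured = {"age_group": "adult", "white_cell_count": 140, "smoker": "yes"}
augmented = datawords(structured) + " " + note
print(augmented)   # feed this to any text-only classifier
```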
【27】 Towards Tractable Mathematical Reasoning: Challenges, Strategies, and Opportunities for Solving Math Word Problems
Link: https://arxiv.org/abs/2111.05364
Authors: Keyur Faldu, Amit Sheth, Prashant Kikani, Manas Gaur, Aditi Avasthi
Affiliations: Embibe; University of South Carolina
Note: 15 pages, 2 tables, 4 figures
Abstract: Mathematical reasoning would be one of the next frontiers for artificial intelligence to make significant progress. The ongoing surge to solve math word problems (MWPs), and hence achieve better mathematical reasoning ability, will continue to be a key line of research in the time to come. We inspect non-neural and neural methods for solving math word problems narrated in natural language. We also highlight the ability of these methods to be generalizable, mathematically reasonable, interpretable, and explainable. Neural approaches dominate the current state of the art, and we survey them highlighting three strategies for MWP solving: (1) direct answer generation, (2) expression tree generation for inferring answers, and (3) template retrieval for answer computation. Moreover, we discuss technological approaches, review the evolution of intuitive design choices to solve MWPs, and examine them for mathematical reasoning ability. We finally identify several gaps that warrant the need for external knowledge and knowledge-infused learning, among several other opportunities in solving MWPs.
【28】 Classifying Human Activities with Inertial Sensors: A Machine Learning Approach
Link: https://arxiv.org/abs/2111.05333
Authors: Hamza Ali Imran, Saad Wazir, Usman Iftikhar, Usama Latif
Affiliations: Department of Computing, School of Electrical Engineering & Computer Science, National University of Science and Technology, Islamabad, Pakistan; Pakistan Mobile Communications Limited
Abstract: Human Activity Recognition (HAR) is an ongoing research topic. It has applications in medical support, sports, fitness, social networking, human-computer interfaces, senior care, entertainment, surveillance, and the list goes on. Traditionally, computer vision methods were employed for HAR, which have numerous problems such as privacy concerns, the influence of environmental factors, limited mobility, higher running costs, occlusion, and so on. A new trend in the use of sensors, especially inertial sensors, has lately emerged. There are several advantages of employing sensor data as an alternative to traditional computer vision algorithms. Many of the limitations of computer vision algorithms have been documented in the literature, including research on Deep Neural Network (DNN) and Machine Learning (ML) approaches for activity categorization utilizing sensor data. We examined and analyzed different Machine Learning and Deep Learning approaches for Human Activity Recognition using inertial sensor data from smartphones, in order to identify which approach is best suited for this application.
【29】 Early Myocardial Infarction Detection over Multi-view Echocardiography
Link: https://arxiv.org/abs/2111.05790
Authors: Aysen Degerli, Serkan Kiranyaz, Tahir Hamid, Rashid Mazhar, Moncef Gabbouj
Affiliations: Tampere University; Department of Electrical Engineering, Qatar University
Abstract: Myocardial infarction (MI) is the leading cause of mortality in the world; it occurs due to a blockage of the coronary arteries feeding the myocardium. An early diagnosis of MI and its localization can mitigate the extent of myocardial damage by facilitating early therapeutic interventions. Following the blockage of a coronary artery, the regional wall motion abnormality (RWMA) of the ischemic myocardial segments is the earliest change to set in. Echocardiography is the fundamental tool to assess any RWMA. Assessing the motion of the left ventricle (LV) wall from only a single echocardiography view may lead to missing the diagnosis of MI, as the RWMA may not be visible on that specific view. Therefore, in this study, we propose to fuse apical 4-chamber (A4C) and apical 2-chamber (A2C) views, in which a total of 11 myocardial segments can be analyzed for MI detection. The proposed method first estimates the motion of the LV wall by Active Polynomials (APs), which extract and track the endocardial boundary to compute myocardial segment displacements. The features are extracted from the A4C and A2C view displacements, which are fused and fed into the classifiers to detect MI. The main contributions of this study are 1) the creation of a new benchmark dataset including both A4C and A2C views in a total of 260 echocardiography recordings, which is publicly shared with the research community, 2) improving the performance of the prior work on threshold-based APs with a Machine Learning based approach, and 3) a pioneering MI detection approach via multi-view echocardiography that fuses information from the A4C and A2C views. Experimental results show that the proposed method achieves 90.91% sensitivity and 86.36% precision for MI detection over multi-view echocardiography.
【30】 HASA-net: A non-intrusive hearing-aid speech assessment network
Link: https://arxiv.org/abs/2111.05691
Authors: Hsin-Tien Chiang, Yi-Chiao Wu, Cheng Yu, Tomoki Toda, Hsin-Min Wang, Yih-Chun Hu, Yu Tsao
Affiliations: Research Center for Information Technology Innovation, Academia Sinica, Taiwan; Information Technology Center, Nagoya University, Japan; Institute of Information Science, Academia Sinica, Taiwan; University of Illinois at Urbana-Champaign, USA
Abstract: Without the need for a clean reference, non-intrusive speech assessment methods have attracted great attention for objective evaluations. Recently, deep neural network (DNN) models have been applied to build non-intrusive speech assessment approaches and have been confirmed to provide promising performance. However, most DNN-based approaches are designed for normal-hearing listeners without considering hearing-loss factors. In this study, we propose a DNN-based hearing aid speech assessment network (HASA-Net), formed by a bidirectional long short-term memory (BLSTM) model, to predict speech quality and intelligibility scores simultaneously according to input speech signals and specified hearing-loss patterns. To the best of our knowledge, HASA-Net is the first work to incorporate quality and intelligibility assessments utilizing a unified DNN-based non-intrusive model for hearing aids. Experimental results show that the predicted speech quality and intelligibility scores of HASA-Net are highly correlated to two well-known intrusive hearing-aid evaluation metrics, the hearing aid speech quality index (HASQI) and the hearing aid speech perception index (HASPI), respectively.
【31】 Federated Expectation Maximization with heterogeneity mitigation and variance reduction
Link: https://arxiv.org/abs/2111.02083
Authors: Aymeric Dieuleveut, Gersende Fort, Eric Moulines, Geneviève Robin
Affiliations: Centre de Mathématiques Appliquées, Ecole Polytechnique, Institut Polytechnique de Paris, France; Institut de Mathématiques de Toulouse, Université de Toulouse, CNRS, UPS, Toulouse, France; CS Dpt, HSE University, Russian Federation
Abstract: The Expectation Maximization (EM) algorithm is the default algorithm for inference in latent variable models. As in any other field of machine learning, applications of latent variable models to very large datasets make the use of advanced parallel and distributed architectures mandatory. This paper introduces FedEM, which is the first extension of the EM algorithm to the federated learning context. FedEM is a new communication-efficient method, which handles partial participation of local devices, and is robust to heterogeneous distributions of the datasets. To alleviate the communication bottleneck, FedEM compresses appropriately defined complete-data sufficient statistics. We also develop and analyze an extension of FedEM to further incorporate a variance reduction scheme. In all cases, we derive finite-time complexity bounds for smooth non-convex problems. Numerical results are presented to support our theoretical findings, as well as an application to federated missing-value imputation for biodiversity monitoring.
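FedEM's E-step aggregates complete-data sufficient statistics computed locally. A toy sketch under simplifying assumptions (two Gaussian components with unit variance and equal weights, full client participation, and no compression operator or variance reduction, all of which FedEM actually addresses):

```python
# Hedged sketch: federated EM for a two-component 1-D Gaussian mixture, where
# each client sends local sufficient statistics and the server runs the M-step.
import numpy as np

rng = np.random.default_rng(0)
clients = [np.concatenate([rng.normal(-2, 1, 50), rng.normal(3, 1, 50)])
           for _ in range(4)]                       # heterogeneous local data
mu = np.array([-1.0, 1.0])                          # global means (unit variances)

def local_stats(x, mu):
    """E-step on one client: responsibilities -> sufficient statistics."""
    logp = -0.5 * (x[:, None] - mu[None, :]) ** 2
    r = np.exp(logp - logp.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    return r.sum(axis=0), (r * x[:, None]).sum(axis=0)   # (N_k, S_k)

for _ in range(20):
    stats = [local_stats(x, mu) for x in clients]        # uplink payloads
    N = sum(s[0] for s in stats)
    S = sum(s[1] for s in stats)
    mu = S / N                                           # server M-step
print(mu)   # approximately [-2, 3]
```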