人工智能学术速递[12.24]

2021-12-27 17:07:13

cs.AI人工智能,共计26篇

【1】 ELSA: Enhanced Local Self-Attention for Vision Transformer 标题:ELSA:增强视觉转换器的局部自我注意 链接:https://arxiv.org/abs/2112.12786

作者:Jingkai Zhou,Pichao Wang,Fan Wang,Qiong Liu,Hao Li,Rong Jin 机构:South China University of Technology, Alibaba Group 备注:Project at this https URL 摘要:自注意力在建模长距离依赖方面很强大,但在局部细粒度特征学习方面较弱。局部自注意力(LSA)的性能仅与卷积相当,且不如动态滤波器,这让研究者困惑:究竟该使用LSA还是它的对应方法,哪一个更好,又是什么让LSA表现平庸。为了澄清这些问题,我们从通道设置(channel setting)和空间处理(spatial processing)两个方面对LSA及其对应方法进行了全面研究。我们发现,问题在于空间注意力的生成与应用,其中相对位置嵌入和近邻滤波器的应用是关键因素。基于这些发现,我们提出了带有Hadamard注意力和ghost head的增强局部自注意力(ELSA)。Hadamard注意力引入Hadamard乘积,在保持高阶映射的同时,在近邻情形下高效地生成注意力;ghost head将注意力图与静态矩阵相结合,以增加通道容量。实验证明了ELSA的有效性。在不修改体系结构/超参数的情况下,用ELSA直接替换LSA可将Swin Transformer的top-1精度最多提高1.4。从D1到D5,ELSA也持续使VOLO受益,其中ELSA-VOLO-D5在不使用额外训练图像的情况下在ImageNet-1K上达到87.2。此外,我们还在下游任务中评估了ELSA:在COCO上,ELSA将基线最多提高1.9 box AP / 1.3 mask AP;在ADE20K上最多提高1.9 mIoU。代码见 https://github.com/damo-cv/ELSA 。 摘要:Self-attention is powerful in modeling long-range dependencies, but it is weak in local finer-level feature learning. The performance of local self-attention (LSA) is just on par with convolution and inferior to dynamic filters, which puzzles researchers on whether to use LSA or its counterparts, which one is better, and what makes LSA mediocre. To clarify these, we comprehensively investigate LSA and its counterparts from two sides: channel setting and spatial processing. We find that the devil lies in the generation and application of spatial attention, where relative position embeddings and the neighboring filter application are key factors. Based on these findings, we propose the enhanced local self-attention (ELSA) with Hadamard attention and the ghost head. Hadamard attention introduces the Hadamard product to efficiently generate attention in the neighboring case, while maintaining the high-order mapping. The ghost head combines attention maps with static matrices to increase channel capacity. Experiments demonstrate the effectiveness of ELSA. Without architecture / hyperparameter modification, drop-in replacing LSA with ELSA boosts Swin Transformer by up to +1.4 on top-1 accuracy. ELSA also consistently benefits VOLO from D1 to D5, where ELSA-VOLO-D5 achieves 87.2 on ImageNet-1K without extra training images. In addition, we evaluate ELSA in downstream tasks. ELSA significantly improves the baseline by up to +1.9 box AP / +1.3 mask AP on COCO, and by up to +1.9 mIoU on ADE20K. Code is available at https://github.com/damo-cv/ELSA.
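
摘要中"Hadamard注意力 + ghost head"的思路可以用下面的极简示意来帮助理解(这只是按摘要字面描述写的示意性草图,并非论文官方实现;窗口大小、归一化方式、与静态矩阵的混合系数等细节均为假设):

```python
import torch
import torch.nn.functional as F

def hadamard_local_attention(q, k, v, static_logits, window=3):
    """示意:在每个位置的局部窗口内,用Hadamard乘积生成注意力;
    ghost head 用一个与位置无关的静态分布与注意力图混合以增加容量。"""
    B, C, H, W = q.shape
    pad = window // 2
    k_unf = F.unfold(k, window, padding=pad).view(B, C, window * window, H * W)
    v_unf = F.unfold(v, window, padding=pad).view(B, C, window * window, H * W)
    q_flat = q.view(B, C, 1, H * W)
    attn = (q_flat * k_unf).sum(dim=1) / C ** 0.5        # Hadamard乘积后沿通道求和
    attn = attn.softmax(dim=1)                           # 在 window*window 个邻居上归一化
    ghost = static_logits.view(1, -1, 1).softmax(dim=1)  # 静态"注意力"(ghost head示意)
    attn = 0.5 * attn + 0.5 * ghost                      # 混合系数为假设
    out = (v_unf * attn.unsqueeze(1)).sum(dim=2)         # 邻域加权求和
    return out.view(B, C, H, W)

x = torch.randn(2, 32, 14, 14)
print(hadamard_local_attention(x, x, x, torch.zeros(9)).shape)  # torch.Size([2, 32, 14, 14])
```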

【2】 An Ontological Knowledge Representation for Smart Agriculture 标题:一种面向智能农业的本体知识表示方法 链接:https://arxiv.org/abs/2112.12768

作者:Bikram Pratim Bhuyan,Ravi Tomar,Maanak Gupta,Amar Ramdane-Cherif 机构:∗†Dept. of Informatics, School of Computer Science, University of Petroleum and Energy Studies, Dehradun, India, ‡Dept. of Computer Science, Tennessee Technological University, Cookeville, Tennessee , USA 摘要:为了向农业产业提供利用先进技术所需的基础设施,如大数据、云和物联网(IoT);智能农业是一种管理理念,其重点是提供跟踪、监控、自动化和分析运营所需的基础设施。表示从收集的原始数据中提取的知识至关重要。提出了一个智能农业系统的农业本体框架。知识图表示为一个格,用于捕获时空农业数据并进行推理。 摘要:In order to provide the agricultural industry with the infrastructure it needs to take advantage of advanced technology, such as big data, the cloud, and the internet of things (IoT); smart farming is a management concept that focuses on providing the infrastructure necessary to track, monitor, automate, and analyse operations. To represent the knowledge extracted from the primary data collected is of utmost importance. An agricultural ontology framework for smart agriculture systems is presented in this study. The knowledge graph is represented as a lattice to capture and perform reasoning on spatio-temporal agricultural data.

【3】 Toward a New Science of Common Sense 标题:走向一门新的常识科学 链接:https://arxiv.org/abs/2112.12754

作者:Ronald J. Brachman,Hector J. Levesque 机构: Jacobs Technion-Cornell Institute and Cornell University, Dept. of Computer Science, University of Toronto 备注:To be published in Proceedings of AAAI-22, 36th AAAI Conference on Artificial Intelligence 摘要:常识一直是人工智能的兴趣所在,但很少占据中心地位。尽管约翰·麦卡锡(John McCarthy)最早的一篇论文中提到了这一点,并且多年来一直致力于研究这一点,但可以说,迄今为止还没有出现过一个具有大量常识的人工智能系统。为什么呢?少了什么?人工智能系统在常识上的失败例子比比皆是,它们指出人工智能经常把注意力集中在专业知识上是原因。那些试图打破脆弱性障碍的人,即使是在现代深度学习的背景下,也倾向于将精力投入大量的小常识知识。但是,世界上所有的常识知识碎片加起来并不能构成一个以类似人类的方式展示常识的系统。我们主张从比过去更广泛的角度审视常识。常识比人们想象的更复杂,值得自己进行科学探索。 摘要:Common sense has always been of interest in AI, but has rarely taken center stage. Despite its mention in one of John McCarthy's earliest papers and years of work by dedicated researchers, arguably no AI system with a serious amount of general common sense has ever emerged. Why is that? What's missing? Examples of AI systems' failures of common sense abound, and they point to AI's frequent focus on expertise as the cause. Those attempting to break the brittleness barrier, even in the context of modern deep learning, have tended to invest their energy in large numbers of small bits of commonsense knowledge. But all the commonsense knowledge fragments in the world don't add up to a system that actually demonstrates common sense in a human-like way. We advocate examining common sense from a broader perspective than in the past. Common sense is more complex than it has been taken to be and is worthy of its own scientific exploration.

【4】 Assessing the Impact of Attention and Self-Attention Mechanisms on the Classification of Skin Lesions 标题:评估注意力和自我注意机制对皮损分类的影响 链接:https://arxiv.org/abs/2112.12748

作者:Rafael Pedro,Arlindo L. Oliveira 机构:Lisbon, Portugal, INESC-ID Instituto Superior T´ecnico 摘要:注意机制已经引起了研究界的极大兴趣,因为它们有望显著改善神经网络结构的性能。然而,在任何特定的问题上,我们仍然缺乏一种原则性的方法来选择特定的机制和超参数,从而保证改进。最近,自关注被提出并广泛应用于Transformer式结构中,在一些应用中取得了重大突破。在这项工作中,我们关注两种形式的注意机制:注意模块和自我注意。注意模块用于重新加权各层输入张量的特征。不同的模块有不同的方式在完全连接或卷积层中执行此重新称重。研究的注意力模型是完全模块化的,在这项工作中,它们将与流行的ResNet架构一起使用。自我注意,最初是在自然语言处理领域提出的,它使得把输入序列中的所有项目联系起来成为可能。自我关注在计算机视觉中变得越来越流行,在计算机视觉中,自我关注有时与卷积层结合在一起,尽管最近的一些体系结构完全消除了卷积。在这项工作中,我们研究并客观比较了在一项特定的计算机视觉任务中的许多不同注意机制,即广泛使用的皮肤癌MNIST数据集中的样本分类。结果表明,注意模块有时确实改善了卷积神经网络结构的性能,但这种改进虽然明显且具有统计学意义,但在不同的设置下并不一致。另一方面,通过自我注意机制获得的结果显示出一致且显著的改进,即使在参数数量减少的体系结构中也能获得最佳结果。 摘要:Attention mechanisms have raised significant interest in the research community, since they promise significant improvements in the performance of neural network architectures. However, in any specific problem, we still lack a principled way to choose specific mechanisms and hyper-parameters that lead to guaranteed improvements. More recently, self-attention has been proposed and widely used in transformer-like architectures, leading to significant breakthroughs in some applications. In this work we focus on two forms of attention mechanisms: attention modules and self-attention. Attention modules are used to reweight the features of each layer input tensor. Different modules have different ways to perform this reweighting in fully connected or convolutional layers. The attention models studied are completely modular and in this work they will be used with the popular ResNet architecture. Self-Attention, originally proposed in the area of Natural Language Processing makes it possible to relate all the items in an input sequence. Self-Attention is becoming increasingly popular in Computer Vision, where it is sometimes combined with convolutional layers, although some recent architectures do away entirely with convolutions. In this work, we study and perform an objective comparison of a number of different attention mechanisms in a specific computer vision task, the classification of samples in the widely used Skin Cancer MNIST dataset. The results show that attention modules do sometimes improve the performance of convolutional neural network architectures, but also that this improvement, although noticeable and statistically significant, is not consistent in different settings. The results obtained with self-attention mechanisms, on the other hand, show consistent and significant improvements, leading to the best results even in architectures with a reduced number of parameters.
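
摘要中所说的"注意力模块对各层输入张量的特征进行重新加权",可以用一个SE(squeeze-and-excitation)风格的通道注意力模块来示意(仅为说明概念的草图,并非论文中实际比较的那些模块的实现,结构细节均为假设):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """示意性的通道注意力模块:为卷积特征图的每个通道学习一个权重并重新加权。"""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                      # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                 # squeeze:全局平均池化 -> (B, C)
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                           # excitation:逐通道重新加权

feat = torch.randn(4, 64, 28, 28)
print(ChannelAttention(64)(feat).shape)        # torch.Size([4, 64, 28, 28])
```

这类模块可以直接插入ResNet等骨干网的各层之后,对输入特征做逐通道(或逐空间位置)的重新加权,这正是摘要中"注意力模块"一类方法的共同形式。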

【5】 Learning Cooperative Multi-Agent Policies with Partial Reward Decoupling 标题:基于部分报酬解耦的协作多Agent策略学习 链接:https://arxiv.org/abs/2112.12740

作者:Benjamin Freed,Aditya Kapoor,Ian Abraham,Jeff Schneider,Howie Choset 备注:in IEEE Robotics and Automation Letters 摘要:将多智能体强化学习扩展到大量智能体的一个突出障碍,是为单个智能体的行为分配信用。在本文中,我们使用一种称为部分报酬解耦(partial reward decoupling,PRD)的方法来解决这个信用分配问题,该方法试图将大型合作多智能体RL问题分解为只涉及智能体子集的解耦子问题,从而简化信用分配。我们的实验表明,与其他各种actor-critic方法相比,在actor-critic算法中使用PRD分解RL问题会得到方差更低的策略梯度估计,从而在广泛的多智能体RL任务上提高数据效率、学习稳定性和渐近性能。此外,我们将我们的方法与最先进的MARL算法之一的反事实多智能体策略梯度(COMA)联系起来,并通过实验证明,我们的方法通过更好地利用智能体奖励流中的信息、并使优势估计方面的最新进展得以应用,从而优于COMA。 摘要:One of the preeminent obstacles to scaling multi-agent reinforcement learning to large numbers of agents is assigning credit to individual agents' actions. In this paper, we address this credit assignment problem with an approach that we call partial reward decoupling (PRD), which attempts to decompose large cooperative multi-agent RL problems into decoupled subproblems involving subsets of agents, thereby simplifying credit assignment. We empirically demonstrate that decomposing the RL problem using PRD in an actor-critic algorithm results in lower variance policy gradient estimates, which improves data efficiency, learning stability, and asymptotic performance across a wide array of multi-agent RL tasks, compared to various other actor-critic approaches. Additionally, we relate our approach to counterfactual multi-agent policy gradient (COMA), a state-of-the-art MARL algorithm, and empirically show that our approach outperforms COMA by making better use of information in agents' reward streams, and by enabling recent advances in advantage estimation to be used.
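
"把大问题拆成智能体子集的解耦子问题来做信用分配"这一思想,可以用下面的极简示意理解(纯属示意:相关性掩码如何得到、以及论文实际采用的优势估计方法这里都做了简化假设,并非论文算法的复现):

```python
import numpy as np

def prd_advantage(rewards, values, relevance, gamma=0.99):
    """示意:智能体i的回报只累加与其"相关"的智能体的奖励(relevance[i, j] = 1),
    再减去该智能体自己的价值基线,从而在解耦的子问题内完成信用分配。"""
    T, N = rewards.shape
    adv = np.zeros((T, N))
    for i in range(N):
        r_i = rewards @ relevance[i]              # 每步只保留相关智能体的奖励之和
        ret, running = np.zeros(T), 0.0
        for t in reversed(range(T)):
            running = r_i[t] + gamma * running    # 折扣回报
            ret[t] = running
        adv[:, i] = ret - values[:, i]            # 减去基线得到优势
    return adv

rng = np.random.default_rng(0)
adv = prd_advantage(rng.random((5, 3)), np.zeros((5, 3)), np.eye(3))
print(adv.shape)   # (5, 3):每个时间步、每个智能体一个优势值
```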

【6】 Forward Composition Propagation for Explainable Neural Reasoning 标题:前向合成传播在可解释神经推理中的应用 链接:https://arxiv.org/abs/2112.12717

作者:Isel Grau,Gonzalo Nápoles,Marilyn Bello,Yamisleydi Salgueiro 机构:Information Systems Group, Eindhoven University of Technology, The Netherlands., Department of Cognitive Science & Artificial Intelligence, Tilburg University, The, Department of Computer Science, Central University of Las Villas, Cuba. 摘要:本文提出了一种称为前向合成传播(Forward Composition Propagation,FCP)的算法,用于解释前馈神经网络在结构化模式识别问题上的预测。在所提出的FCP算法中,每个神经元由一个合成向量描述,该向量表示每个问题特征在该神经元中所起的作用。合成向量使用给定的输入实例初始化,然后在整个网络中逐层传播,直到到达输出层。值得一提的是,该算法是在网络训练完成之后执行的。每个合成值的符号表示相应的特征是刺激还是抑制该神经元,而其绝对值则量化了这种影响的大小。为了验证FCP算法的正确性,我们针对一个已知真值(ground truth)的前沿问题中的偏差检测开展了案例研究。仿真结果表明,合成值与受保护特征的预期行为高度吻合。 摘要:This paper proposes an algorithm called Forward Composition Propagation (FCP) to explain the predictions of feed-forward neural networks operating on structured pattern recognition problems. In the proposed FCP algorithm, each neuron is described by a composition vector indicating the role of each problem feature in that neuron. Composition vectors are initialized using a given input instance and subsequently propagated through the whole network until we reach the output layer. It is worth mentioning that the algorithm is executed once the network's training is done. The sign of each composition value indicates whether the corresponding feature excites or inhibits the neuron, while the absolute value quantifies such an impact. Aiming to validate the FCP algorithm's correctness, we develop a case study concerning bias detection in a state-of-the-art problem in which the ground truth is known. The simulation results show that the composition values closely align with the expected behavior of protected features.
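
"合成向量随输入实例初始化、再逐层传播到输出层"的过程可以用下面的草图来理解(按摘要字面意思给出的示意;对ReLU的处理方式、是否归一化等细节均为假设,并非论文算法的精确复现):

```python
import numpy as np

def forward_composition(weights, biases, x):
    """示意:每个神经元维护一个"合成向量",记录各输入特征对它的带符号贡献。"""
    comp = np.diag(x.astype(float))          # 初始:第i个输入神经元完全由特征i构成
    act = x.astype(float)
    for W, b in zip(weights, biases):
        pre = W @ act + b                    # 正常的前向传播
        comp = W @ comp                      # 合成向量按同样的线性组合传播(假设)
        mask = (pre > 0).astype(float)       # 假设激活为ReLU:被抑制的神经元贡献清零
        act = pre * mask
        comp = comp * mask[:, None]
    return act, comp                         # comp[j, i]:特征i对输出神经元j的贡献

rng = np.random.default_rng(0)
Ws = [rng.normal(size=(5, 3)), rng.normal(size=(2, 5))]
bs = [np.zeros(5), np.zeros(2)]
out, comp = forward_composition(Ws, bs, np.array([1.0, -2.0, 0.5]))
print(comp.shape)  # (2, 3):每个输出神经元对应一个长度为3的合成向量
```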

【7】 Explainable Artificial Intelligence Methods in Combating Pandemics: A Systematic Review 标题:可解释人工智能方法在抗击流行病中的系统评价 链接:https://arxiv.org/abs/2112.12705

作者:Felipe Giuste,Wenqi Shi,Yuanda Zhu,Tarun Naren,Monica Isgut,Ying Sha,Li Tong,Mitali Gupte,May D. Wang 机构: Georgia Institute of Technology 备注:This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. arXiv admin note: text overlap with arXiv:2006.11371 by other authors 摘要:尽管疫情期间有大量经同行评审的论文展示了应对COVID-19挑战的基于人工智能(AI)的新型解决方案,但鲜有取得显著临床影响者。人工智能在COVID-19大流行期间的影响在很大程度上受限于模型透明度的缺乏。这篇系统综述考察了可解释人工智能(XAI)在疫情期间的使用情况,以及使用XAI如何克服在现实世界中取得成功的障碍。我们发现,成功使用XAI可以提高模型性能,增强最终用户的信任,并提供影响用户决策所需的价值。我们向读者介绍了常见的XAI技术、它们的用途以及具体的应用示例。我们还讨论了对XAI结果的评估,这是使基于AI的临床决策支持系统价值最大化的重要一步。我们阐述了XAI的经典、现代和潜在的未来趋势,以说明新型XAI技术的演进。最后,我们提供了一份实验设计过程中可参考的建议清单,这些建议均有近期文献支持。实施AI解决方案过程中的常见挑战也通过潜在解决方案的具体示例加以讨论。我们希望这篇综述可以作为指导,以改善未来基于人工智能的解决方案的临床影响。 摘要:Despite the myriad peer-reviewed papers demonstrating novel Artificial Intelligence (AI)-based solutions to COVID-19 challenges during the pandemic, few have made significant clinical impact. The impact of artificial intelligence during the COVID-19 pandemic was greatly limited by lack of model transparency. This systematic review examines the use of Explainable Artificial Intelligence (XAI) during the pandemic and how its use could overcome barriers to real-world success. We find that successful use of XAI can improve model performance, instill trust in the end-user, and provide the value needed to affect user decision-making. We introduce the reader to common XAI techniques, their utility, and specific examples of their application. Evaluation of XAI results is also discussed as an important step to maximize the value of AI-based clinical decision support systems. We illustrate the classical, modern, and potential future trends of XAI to elucidate the evolution of novel XAI techniques. Finally, we provide a checklist of suggestions during the experimental design process supported by recent publications. Common challenges during the implementation of AI solutions are also addressed with specific examples of potential solutions. We hope this review may serve as a guide to improve the clinical impact of future AI-based solutions.

【8】 TagLab: A human-centric AI system for interactive semantic segmentation 标题:TagLab:一个以人为中心的交互式语义切分人工智能系统 链接:https://arxiv.org/abs/2112.12702

作者:Gaia Pavoni,Massimiliano Corsini,Federico Ponchio,Alessandro Muntoni,Paolo Cignoni 机构:Visual Computing Lab, ISTI-CNR, Pisa, Italy 备注:Accepted at Human Centered AI workshop at NeurIPS 2021, this https URL 摘要:高度特定语义类和复杂形状的全自动语义分割可能无法满足科学家要求的精度标准。在这种情况下,以人为中心的人工智能解决方案能够帮助操作员,同时保持对复杂任务的人工控制,这是一个很好的折衷方案,可以在保持高精度水平的同时加快图像标记速度。TagLab是一个开源的人工智能辅助软件,用于注释大型正射影像,利用不同程度的自动化;它通过辅助工具从零开始加速图像注释,创建定制的全自动语义分割模型,最后允许快速编辑自动预测。由于orthoimages分析适用于多个科学学科,TagLab设计了灵活的标记管道。我们在两个不同的场景中报告我们的结果,海洋生态和建筑遗产。 摘要:Fully automatic semantic segmentation of highly specific semantic classes and complex shapes may not meet the accuracy standards demanded by scientists. In such cases, human-centered AI solutions, able to assist operators while preserving human control over complex tasks, are a good trade-off to speed up image labeling while maintaining high accuracy levels. TagLab is an open-source AI-assisted software for annotating large orthoimages which takes advantage of different degrees of automation; it speeds up image annotation from scratch through assisted tools, creates custom fully automatic semantic segmentation models, and, finally, allows the quick edits of automatic predictions. Since the orthoimages analysis applies to several scientific disciplines, TagLab has been designed with a flexible labeling pipeline. We report our results in two different scenarios, marine ecology, and architectural heritage.

【9】 Black-Box Testing of Deep Neural Networks through Test Case Diversity 标题:基于测试用例多样性的深度神经网络黑盒测试 链接:https://arxiv.org/abs/2112.12591

作者:Zohreh Aghababaeyan,Manel Abdellatif,Lionel Briand,Ramesh S,Mojtaba Bagherzadeh 摘要:深度神经网络(DNN)已广泛应用于许多领域,包括图像处理、医疗诊断和自动驾驶。然而,DNN可能表现出错误行为,可能导致严重错误,尤其是在安全关键系统中使用时。受传统软件系统测试技术的启发,研究人员提出了神经元覆盖率标准,作为源代码覆盖率的类比,以指导DNN模型的测试。尽管对DNN覆盖率进行了非常积极的研究,但最近的几项研究对此类标准在指导DNN测试中的有用性提出了质疑。此外,从实际角度来看,这些标准是白盒,因为它们需要访问DNN模型的内部或训练数据,这在许多情况下是不可行或不方便的。在本文中,我们研究黑盒输入多样性度量作为白盒覆盖标准的替代。为此,我们首先选择并调整三种多样性指标,并以可控的方式研究它们测量输入集中实际多样性的能力。然后,我们使用两个数据集和三个DNN模型分析它们与故障检测的统计关联。我们进一步将多样性与最先进的白盒覆盖标准进行比较。我们的实验表明,依靠嵌入在测试输入集中的图像特征的多样性是比覆盖标准更可靠的指标,可以有效地指导DNN的测试。事实上,我们发现我们选择的一个黑盒多样性度量在故障揭示能力和计算时间方面远远优于现有的覆盖标准。结果也证实了这样的怀疑,即最先进的覆盖率指标不足以指导测试输入集的构建,从而用自然输入检测尽可能多的故障。 摘要:Deep Neural Networks (DNNs) have been extensively used in many areas including image processing, medical diagnostics, and autonomous driving. However, DNNs can exhibit erroneous behaviours that may lead to critical errors, especially when used in safety-critical systems. Inspired by testing techniques for traditional software systems, researchers have proposed neuron coverage criteria, as an analogy to source code coverage, to guide the testing of DNN models. Despite very active research on DNN coverage, several recent studies have questioned the usefulness of such criteria in guiding DNN testing. Further, from a practical standpoint, these criteria are white-box as they require access to the internals or training data of DNN models, which is in many contexts not feasible or convenient. In this paper, we investigate black-box input diversity metrics as an alternative to white-box coverage criteria. To this end, we first select and adapt three diversity metrics and study, in a controlled manner, their capacity to measure actual diversity in input sets. We then analyse their statistical association with fault detection using two datasets and three DNN models. We further compare diversity with state-of-the-art white-box coverage criteria. Our experiments show that relying on the diversity of image features embedded in test input sets is a more reliable indicator than coverage criteria to effectively guide the testing of DNNs. Indeed, we found that one of our selected black-box diversity metrics far outperforms existing coverage criteria in terms of fault-revealing capability and computational time. Results also confirm the suspicions that state-of-the-art coverage metrics are not adequate to guide the construction of test input sets to detect as many faults as possible with natural inputs.
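
摘要中"依靠测试输入集中图像特征嵌入的多样性来指导DNN测试"的思路,可以用类似下面的方式度量一个输入集的多样性(示意性草图:这里以归一化特征的Gram矩阵行列式近似"几何多样性",未必与论文实际选用的三种黑盒多样性指标一致,均为假设):

```python
import numpy as np

def geometric_diversity(embeddings):
    """示意:用单位化特征向量的Gram矩阵行列式衡量一个测试输入集的多样性。"""
    E = np.asarray(embeddings, dtype=float)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)     # 单位化每个输入的特征嵌入
    gram = E @ E.T                                        # 两两余弦相似度矩阵
    return np.linalg.det(gram)                            # 越接近1表示输入彼此越"分散"

rng = np.random.default_rng(1)
diverse = rng.normal(size=(5, 128))                       # 随机方向,彼此差异较大
redundant = np.tile(rng.normal(size=(1, 128)), (5, 1)) + 0.01 * rng.normal(size=(5, 128))
print(geometric_diversity(diverse) > geometric_diversity(redundant))   # 通常为 True
```

这类度量完全基于输入(黑盒),不需要访问模型内部或训练数据,这正是摘要中将其与白盒覆盖率标准对比的出发点。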

【10】 A deep reinforcement learning model for predictive maintenance planning of road assets: Integrating LCA and LCCA 标题:公路资产预见性养护计划的深度强化学习模型:LCA和LCCA的集成 链接:https://arxiv.org/abs/2112.12589

作者:Fateme Golivand Darvishvand,Moen Latifi 机构:†These authors contributed equally to this work 摘要:道路养护规划是道路资产管理的一个组成部分。养护与修复(M&R)实践中的主要挑战之一是确定养护的类型和时机。本研究提出了一个基于长期路面性能(LTPP)数据库的强化学习(RL)框架,以确定M&R实践的类型和时机。该算法首先建立一个预测性DNN模型,作为RL算法的环境。对于RL模型的策略估计,我们同时开发了DQN和PPO模型,但由于收敛性更好、样本效率更高,最终选择了PPO。本研究使用的指标为国际平整度指数(IRI)和车辙深度(RD)。最初我们将开裂指标(CM)作为第三个指标,但由于其数据量远少于其他指标、导致结果精度较低,故将其排除。此外,在成本效益计算(奖励)中,我们同时考虑了M&R处治的经济影响和环境影响;成本和环境影响使用paLATE 2.0软件进行评估。我们的方法在一个假设的案例研究中得到验证:位于气候温暖潮湿的德克萨斯州的一条长23公里的六车道高速公路。结果给出了一个20年的M&R计划,在该计划下道路状况始终保持在优良(excellent)状态范围内。由于道路早期状态处于良好的服务水平,最初几年无需进行重度养护;而在重度M&R措施之后,会出现若干个1~2年的时段无需任何处治。所有这些都表明所提出的计划具有合理的结果。决策者和运输机构可以使用该方案开展更好的养护实践,在避免预算浪费的同时将环境影响降至最低。 摘要:Road maintenance planning is an integral part of road asset management. One of the main challenges in Maintenance and Rehabilitation (M&R) practices is to determine maintenance type and timing. This research proposes a framework using Reinforcement Learning (RL) based on the Long Term Pavement Performance (LTPP) database to determine the type and timing of M&R practices. A predictive DNN model is first developed in the proposed algorithm, which serves as the Environment for the RL algorithm. For the Policy estimation of the RL model, both DQN and PPO models are developed. However, PPO has been selected in the end due to better convergence and higher sample efficiency. Indicators used in this study are International Roughness Index (IRI) and Rutting Depth (RD). Initially, we considered Cracking Metric (CM) as the third indicator, but it was then excluded due to the much fewer data compared to other indicators, which resulted in lower accuracy of the results. Furthermore, in cost-effectiveness calculation (reward), we considered both the economic and environmental impacts of M&R treatments. Costs and environmental impacts have been evaluated with paLATE 2.0 software. Our method is tested on a hypothetical case study of a six-lane highway with 23 kilometers length located in Texas, which has a warm and wet climate. The results propose a 20-year M&R plan in which road condition remains in an excellent condition range. Because the early state of the road is at a good level of service, there is no need for heavy maintenance practices in the first years. Later, after heavy M&R actions, there are several 1-2 years of no need for treatments. All of these show that the proposed plan has a logical result. Decision-makers and transportation agencies can use this scheme to conduct better maintenance practices that can prevent budget waste and, at the same time, minimize the environmental impacts.

【11】 Preprocessing in Inductive Logic Programming 标题:归纳逻辑程序设计中的预处理 链接:https://arxiv.org/abs/2112.12551

作者:Brad Hunter 机构:Linacre College, University of Oxford. A dissertation submitted for the degree of Master of Mathematics and Foundations of Computer Science 备注:91 pages, 6 figures, Masters thesis 摘要:归纳逻辑编程(ILP)是一种从示例中学习逻辑程序的机器学习方法,这种学习通常相对于以逻辑程序形式给出的背景知识进行。本文介绍了底部预处理(bottom preprocessing),这是一种为ILP系统必须考虑的程序生成初始约束的方法。底部预处理将逆蕴涵(inverse entailment)的思想应用于现代ILP系统;逆蕴涵是随Progol一起提出的一种有影响力的早期ILP方法。本文还介绍了$\bot$-Popper,即在现代ILP系统Popper上对底部预处理的一个实现。实验表明,底部预处理可以缩短ILP系统在困难问题上的学习时间;当问题中的背景知识量很大时,这种缩减尤其显著。 摘要:Inductive logic programming is a type of machine learning in which logic programs are learned from examples. This learning typically occurs relative to some background knowledge provided as a logic program. This dissertation introduces bottom preprocessing, a method for generating initial constraints on the programs an ILP system must consider. Bottom preprocessing applies ideas from inverse entailment to modern ILP systems. Inverse entailment is an influential early ILP approach introduced with Progol. This dissertation also presents $\bot$-Popper, an implementation of bottom preprocessing for the modern ILP system Popper. It is shown experimentally that bottom preprocessing can reduce learning times of ILP systems on hard problems. This reduction can be especially significant when the amount of background knowledge in the problem is large.

【12】 Neuroevolution deep learning architecture search for estimation of river surface elevation from photogrammetric Digital Surface Models 标题:神经进化深度学习结构在从摄影测量数字表面模型估算河流表面高程中的搜索 链接:https://arxiv.org/abs/2112.12510

作者:Radosław Szostak,Marcin Pietroń,Mirosław Zimnoch,Przemysław Wachniew,Paweł Ćwiąkała,Edyta Puniach 机构:AGH UST, Marcin Pietro´n, Paweł ´Cwi ˛akała 备注:extended version of NeurIPS 2021 Workshop paper - ML4PhysicalSciences 摘要:鉴于与全球变暖有关的极端水文事件日益频繁,对水的需求日益增加,开发地表水观测的新方法至关重要。使用无人机摄影测量获得的正射影像和数字表面模型(DSM)可用于确定河流的水面高程(WSE)。然而,由于摄影测量算法的限制,DSMs上的水面受到干扰,这项任务很困难。在这项研究中,机器学习用于从受干扰的摄影测量数据中提取WSE值。水文学和摄影测量专家为此专门准备了一个全新的数据集。新方法是实现高时空分辨率水面测量自动化的重要一步。这些数据可用于验证和校准水文、水力和水动力模型,使水文预报更加准确,特别是预测洪水或干旱等极端和危险事件。据我们所知,这是第一种为此目的创建数据集并为此任务使用深度学习模型的方法。此外,神经进化算法被设置为探索不同的架构以找到局部最优模型,并执行非梯度搜索以微调模型参数。与通过摄影测量DSM确定WSE的手动方法相比,所获得的结果具有更好的精度。 摘要:Development of the new methods of surface water observation is crucial in the perspective of increasingly frequent extreme hydrological events related to global warming and increasing demand for water. Orthophotos and digital surface models (DSMs) obtained using UAV photogrammetry can be used to determine the Water Surface Elevation (WSE) of a river. However, this task is difficult due to disturbances of the water surface on DSMs caused by limitations of photogrammetric algorithms. In this study, machine learning was used to extract a WSE value from disturbed photogrammetric data. A brand new dataset has been prepared specifically for this purpose by hydrology and photogrammetry experts. The new method is an important step toward automating water surface level measurements with high spatial and temporal resolution. Such data can be used to validate and calibrate of hydrological, hydraulic and hydrodynamic models making hydrological forecasts more accurate, in particular predicting extreme and dangerous events such as floods or droughts. For our knowledge this is the first approach in which dataset was created for this purpose and deep learning models were used for this task. Additionally, neuroevolution algorithm was set to explore different architectures to find local optimal models and non-gradient search was performed to fine-tune the model parameters. The achieved results have better accuracy compared to manual methods of determining WSE from photogrammetric DSMs.

【13】 From Procedures, Objects, Actors, Components, Services, to Agents -- A Comparative Analysis of the History and Evolution of Programming Abstractions 标题:从过程、对象、参与者、组件、服务到Agent--编程抽象的历史与演变比较分析 链接:https://arxiv.org/abs/2112.12508

作者:Jean-Pierre Briot 机构:Sorbonne Université, CNRS, LIP, Paris, France 备注:This article has been submitted to a project of book about the French school of programming, coordinated by Bertrand Meyer 摘要:本章的目的是对编程抽象的演进提出一些回顾性分析,从过程(procedures)、对象(objects)、参与者(actors)、组件(components)、服务(services)直到代理(agents),并将它们置于一个总体的历史视角中加以考察。我们选取了一个具有三个轴/维度的公共参照系:单个实体层面的动作选择(action selection)、实体之间的耦合灵活性(coupling flexibility),以及抽象层次(abstraction level)。我们确实可以观察到对更高灵活性(通过后期绑定、连接的具体化等概念)和更高抽象层次的持续追求。组件、服务和代理这些概念有一些共同的目标(特别是软件的模块化与可重构性),而多代理系统进一步提出了自治与协调的概念,尤其是通过自组织的概念和知识的使用。我们希望这一分析有助于突出推动编程抽象进展的一些基本力量,因此它可能为思考未来的编程抽象提供一些种子。 摘要:The objective of this chapter is to propose some retrospective analysis of the evolution of programming abstractions, from procedures, objects, actors, components, services, up to agents, by replacing them within a general historical perspective. Some common referential with three axes/dimensions is chosen: action selection at the level of one entity, coupling flexibility between entities, and abstraction level. We indeed may observe some continuous quest for higher flexibility (through notions such as late binding, or reification of connections) and higher level of abstraction. Concepts of components, services and agents have some common objectives (notably, software modularity and reconfigurability), with multi-agent systems raising further concepts of autonomy and coordination, notably through the notion of auto-organization and the use of knowledge. We hope that this analysis helps at highlighting some of the basic forces motivating the progress of programming abstractions and therefore that it may provide some seeds for the reflection about future programming abstractions.

【14】 FedFR: Joint Optimization Federated Framework for Generic and Personalized Face Recognition 标题:FedFR:面向通用和个性化人脸识别的联合优化联邦框架 链接:https://arxiv.org/abs/2112.12496

作者:Chih-Ting Liu,Chien-Yi Wang,Shao-Yi Chien,Shang-Hong Lai 机构: Graduate Institute of Electronics and Engineering, National Taiwan University, Microsoft AI R&D Center, Taiwan 备注:This paper was accepted by AAAI 2022 Conference on Artificial Intelligence 摘要:当前最先进的基于深度学习的人脸识别(FR)模型需要大量的人脸身份进行集中训练。然而,由于隐私意识的增强,禁止在用户设备上访问人脸图像以不断改进人脸识别模型。联合学习(FL)是一种解决隐私问题的技术,它可以协同优化模型,而无需在客户端之间共享数据。在这项工作中,我们提出了一个基于FL的框架,称为FedFR,用于以隐私感知的方式改进通用人脸表示。此外,该框架通过提出的解耦特征定制模块,为相应的客户机联合优化个性化模型。特定于客户端的个性化模型可以满足在本地设备上注册身份的优化人脸识别体验的需要。据我们所知,我们是第一个在FL设置中探索个性化人脸识别的人。所提出的框架被验证为优于以前的方法在几个通用和个性化的人脸识别基准与不同的FL场景。FL设置下的源代码和我们建议的个性化FR基准可在https://github.com/jackie840129/FedFR. 摘要:Current state-of-the-art deep learning based face recognition (FR) models require a large number of face identities for central training. However, due to the growing privacy awareness, it is prohibited to access the face images on user devices to continually improve face recognition models. Federated Learning (FL) is a technique to address the privacy issue, which can collaboratively optimize the model without sharing the data between clients. In this work, we propose a FL based framework called FedFR to improve the generic face representation in a privacy-aware manner. Besides, the framework jointly optimizes personalized models for the corresponding clients via the proposed Decoupled Feature Customization module. The client-specific personalized model can serve the need of optimized face recognition experience for registered identities at the local device. To the best of our knowledge, we are the first to explore the personalized face recognition in FL setup. The proposed framework is validated to be superior to previous approaches on several generic and personalized face recognition benchmarks with diverse FL scenarios. The source codes and our proposed personalized FR benchmark under FL setup are available at https://github.com/jackie840129/FedFR.
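
摘要中"联邦优化通用人脸表示、同时为每个客户端解耦出本地个性化模块"的思路,可以用FedAvg加上仅在本地保留的个性化头来示意(纯属示意,与论文的Decoupled Feature Customization模块实现无关,模型结构与训练细节均为假设):

```python
import copy
import torch
import torch.nn as nn

def fedavg(global_backbone, client_loaders, personalized_heads, rounds=3, lr=0.01):
    """示意:骨干网参与联邦平均(跨客户端共享),每个客户端的个性化头只在本地更新、不上传。"""
    for _ in range(rounds):
        client_states = []
        for loader, head in zip(client_loaders, personalized_heads):
            backbone = copy.deepcopy(global_backbone)
            opt = torch.optim.SGD(list(backbone.parameters()) + list(head.parameters()), lr=lr)
            for x, y in loader:                       # 本地训练若干步
                loss = nn.functional.cross_entropy(head(backbone(x)), y)
                opt.zero_grad(); loss.backward(); opt.step()
            client_states.append(backbone.state_dict())
        # 服务器端:对骨干网参数做简单平均
        avg = {k: torch.stack([s[k] for s in client_states]).mean(0) for k in client_states[0]}
        global_backbone.load_state_dict(avg)
    return global_backbone

# 两个"客户端",各自的个性化头对应不同数量的本地注册身份
backbone = nn.Sequential(nn.Linear(16, 8), nn.ReLU())
heads = [nn.Linear(8, 3), nn.Linear(8, 5)]
loaders = [[(torch.randn(4, 16), torch.randint(0, 3, (4,)))],
           [(torch.randn(4, 16), torch.randint(0, 5, (4,)))]]
print(fedavg(backbone, loaders, heads))
```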

【15】 Curriculum Learning for Safe Mapless Navigation 标题:关于安全无人驾驶的课程学习 链接:https://arxiv.org/abs/2112.12490

作者:Luca Marzari,Davide Corsi,Enrico Marchesini,Alessandro Farinelli 机构:Computer Science Department, University of Verona, Verona, Italy 备注:8 pages, 5 figures. The poster version of this paper has been accepted by The 37th ACM/SIGAPP Symposium on Applied Computing Proceedings (SAC IRMAS 2022) 摘要:这项工作调查了基于课程学习(CL)的方法对代理绩效的影响。我们特别关注mapless机器人导航的安全方面,与标准端到端(E2E)训练策略进行比较。为此,我们提出了一种CL方法,利用基于统一的模拟中的学习转移(ToL)和微调,以Robotnik Kairos作为机器人代理。为了进行公平比较,我们的评估考虑了每种学习方法的同等计算需求(即,相同数量的交互和环境难度),并确认我们基于CL的使用ToL的方法优于E2E方法。特别是,我们提高了平均成功率和经过训练的策略的安全性,从而在看不见的测试场景中减少了10%的冲突。为了进一步证实这些结果,我们使用了一个正式的验证工具来量化强化学习策略在期望规范下的正确行为数量。 摘要:This work investigates the effects of Curriculum Learning (CL)-based approaches on the agent's performance. In particular, we focus on the safety aspect of robotic mapless navigation, comparing over a standard end-to-end (E2E) training strategy. To this end, we present a CL approach that leverages Transfer of Learning (ToL) and fine-tuning in a Unity-based simulation with the Robotnik Kairos as a robotic agent. For a fair comparison, our evaluation considers an equal computational demand for every learning approach (i.e., the same number of interactions and difficulty of the environments) and confirms that our CL-based method that uses ToL outperforms the E2E methodology. In particular, we improve the average success rate and the safety of the trained policy, resulting in 10% fewer collisions in unseen testing scenarios. To further confirm these results, we employ a formal verification tool to quantify the number of correct behaviors of Reinforcement Learning policies over desired specifications.

【16】 Local Advantage Networks for Cooperative Multi-Agent Reinforcement Learning 标题:基于局部优势网络的协作式多智能体强化学习 链接:https://arxiv.org/abs/2112.12458

作者:Raphaël Avalos,Mathieu Reymond,Ann Nowé,Diederik M. Roijers 机构:Vrije Universiteit Brussel, HU Univ. of Appl. Sci. Utrecht 摘要:多智能体强化学习(MARL)使我们能够在具有挑战性的环境中创建自适应智能体,即使这些智能体的观察能力有限。到目前为止,现代马尔方法的重点是寻找因式分解的值函数。虽然这种方法已被证明是成功的,但由此产生的方法具有复杂的网络结构。我们采取了一种完全不同的方法,建立在独立Q-学习者的结构之上。受基于影响的抽象的启发,我们从观察开始,观察动作历史的紧凑表示足以学习接近最优分散策略。将这一观察结果与决斗体系结构相结合,我们的算法LAN将这些策略表示为单独的个人优势功能w.r.t.一个集中的批评家。这些局部优势网络仅以单个代理的局部观察动作历史为条件。集中的值函数条件取决于代理的表示以及环境的完整状态。值函数在执行之前被丢弃,作为一个稳定器,协调学习并在学习期间制定DQN目标。与其他方法相比,这使LAN能够保持其集中网络的网络参数数量与代理数量无关,而不会施加诸如单调值函数之类的附加约束。当在星际争霸多代理挑战基准上进行评估时,LAN表现出最先进的性能,在两张之前未解决的地图“走廊”和“3s5z_vs_3s6z”中得分超过80%,从而在14张地图上的平均性能比QPLEX提高了10%。此外,当代理数量变大时,LAN使用的参数明显少于QPLEX甚至QMIX。因此,我们表明,局域网的结构形成了一个关键的改进,有助于MARL方法保持可扩展性。 摘要:Multi-agent reinforcement learning (MARL) enables us to create adaptive agents in challenging environments, even when the agents have limited observation. Modern MARL methods have hitherto focused on finding factorized value functions. While this approach has proven successful, the resulting methods have convoluted network structures. We take a radically different approach, and build on the structure of independent Q-learners. Inspired by influence-based abstraction, we start from the observation that compact representations of the observation-action histories can be sufficient to learn close to optimal decentralized policies. Combining this observation with a dueling architecture, our algorithm, LAN, represents these policies as separate individual advantage functions w.r.t. a centralized critic. These local advantage networks condition only on a single agent's local observation-action history. The centralized value function conditions on the agents' representations as well as the full state of the environment. The value function, which is cast aside before execution, serves as a stabilizer that coordinates the learning and to formulate DQN targets during learning. In contrast with other methods, this enables LAN to keep the number of network parameters of its centralized network independent in the number of agents, without imposing additional constraints like monotonic value functions. When evaluated on the StarCraft multi-agent challenge benchmark, LAN shows state-of-the-art performance and scores more than 80% wins in two previously unsolved maps `corridor' and `3s5z_vs_3s6z', leading to an improvement of 10% over QPLEX on average performance on the 14 maps. Moreover when the number of agents becomes large, LAN uses significantly fewer parameters than QPLEX or even QMIX. We thus show that LAN's structure forms a key improvement that helps MARL methods remain scalable.
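
"以局部观测-动作历史为条件的局部优势函数,加上以全局状态为条件、仅在训练时使用的集中式价值函数"这一决斗式组合,可以用下面的极简示意理解(示意性草图,网络结构与论文的实际实现可能不同,均为假设):

```python
import torch
import torch.nn as nn

class LocalAdvantageNet(nn.Module):
    """示意:每个智能体用自己的局部历史编码计算优势 A_i;集中式价值 V 以全局状态为条件。
    Q_i = V + A_i 仅用于构造学习目标;执行时只需 A_i 做贪心动作选择。"""
    def __init__(self, obs_dim, state_dim, n_actions, hidden=32):
        super().__init__()
        self.hist_enc = nn.GRU(obs_dim, hidden, batch_first=True)   # 编码局部观测历史
        self.adv = nn.Linear(hidden, n_actions)                      # 局部优势头
        self.value = nn.Linear(state_dim, 1)                         # 集中式价值(训练后丢弃)

    def forward(self, obs_history, global_state):
        _, h = self.hist_enc(obs_history)             # h: (1, B, hidden)
        adv = self.adv(h.squeeze(0))                   # (B, n_actions)
        adv = adv - adv.mean(dim=1, keepdim=True)      # 决斗结构的常见做法(假设)
        q = self.value(global_state) + adv             # 用于构造DQN式目标
        return q, adv

net = LocalAdvantageNet(obs_dim=10, state_dim=24, n_actions=5)
q, adv = net(torch.randn(4, 7, 10), torch.randn(4, 24))
print(q.shape, adv.shape)   # torch.Size([4, 5]) torch.Size([4, 5])
```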

【17】 Making sense of electrical vehicle discussions using sentiment analysis on closely related news and user comments 标题:利用对密切相关新闻和用户评论的情感分析来理解电动汽车讨论 链接:https://arxiv.org/abs/2112.12327

作者:Josh Everts,Xuan Jiang 摘要:我们对新闻和用户评论数据集使用了无监督和有监督模型,使用了令牌和文档情感分析。我们的代币式情绪分析发现两组之间的情绪存在统计上的显著差异(均为非常大的N),我们的文档式监督情绪分析发现情绪没有显著差异。 摘要:We used a token-wise and document-wise sentiment analysis using both unsupervised and supervised models applied to both news and user reviews dataset. And our token-wise sentiment analysis found a statistically significant difference in sentiment between the two groups (both of which were very large N), our document-wise supervised sentiment analysis found no significant difference in sentiment.

【18】 Investigating Effect of Dialogue History in Multilingual Task Oriented Dialogue Systems 标题:对话历史在面向多语言任务的对话系统中的作用研究 链接:https://arxiv.org/abs/2112.12318

作者:Michael Sun,Kaili Huang,Mehrad Moradshahi 机构:Department of Computer Science, Stanford University 摘要:虽然英语虚拟助理借助大量训练资源取得了令人振奋的成绩,但非英语使用者的需求并没有得到很好的满足。截至2021年12月,世界上最受欢迎的智能音箱之一Alexa能够支持9种不同的语言[1],而世界上有成千上万种语言,根据2019年发表的统计[2],其中有91种语言的使用者超过1000万人。然而,用英语以外的语言训练虚拟助理往往更加困难,对于低资源语言尤其如此:高质量训练数据的缺乏限制了模型的性能,导致用户满意度较差。因此,我们为多语言任务导向对话系统设计了一个高效且有效的训练方案,采用与BiToD[5]相同的数据集生成管道和端到端对话系统架构。BiToD采用了一些面向极简自然语言设计的关键设计选择,用形式化的对话状态代替自然语言输入,这减少了较弱的自然语言模型带来的出错空间,并确保模型能够正确提取执行对话状态跟踪(DST)所需的关键槽值。我们的目标是减少每一轮所编码的自然语言的数量,所研究的关键参数是作为历史输入给模型的轮数(H)。我们首先探索一个转折点,即H增大到何处开始对整体性能收益递减;然后检查小H的模型出错的例子能否被归类,以便对模型进行少样本(few-shot)微调;最后探讨这种方法的局限性,以及是否存在该方法无法解决的特定类型的例子。 摘要:While the English virtual assistants have achieved exciting performance with an enormous amount of training resources, the needs of non-English-speakers have not been satisfied well. Up to Dec 2021, Alexa, one of the most popular smart speakers around the world, is able to support 9 different languages [1], while there are thousands of languages in the world, 91 of which are spoken by more than 10 million people according to statistics published in 2019 [2]. However, training a virtual assistant in other languages than English is often more difficult, especially for those low-resource languages. The lack of high-quality training data restricts the performance of models, resulting in poor user satisfaction. Therefore, we devise an efficient and effective training solution for multilingual task-orientated dialogue systems, using the same dataset generation pipeline and end-to-end dialogue system architecture as BiToD [5], which adopted some key design choices for a minimalistic natural language design where formal dialogue states are used in place of natural language inputs. This reduces the room for error brought by weaker natural language models, and ensures the model can correctly extract the essential slot values needed to perform dialogue state tracking (DST). Our goal is to reduce the amount of natural language encoded at each turn, and the key parameter we investigate is the number of turns (H) to feed as history to model. We first explore the turning point where increasing H begins to yield limiting returns on the overall performance. Then we examine whether the examples a model with small H gets wrong can be categorized in a way for the model to do few-shot finetuning on. Lastly, we will explore the limitations of this approach, and whether there is a certain type of examples that this approach will not be able to resolve.
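
摘要中研究的关键参数H(作为历史喂给模型的对话轮数)对应的输入构造方式大致如下(极简示意;输入格式、形式化对话状态的写法均为假设,与BiToD的真实格式无关):

```python
def build_model_input(dialogue_turns, formal_state, H=2):
    """示意:只取最近 H 轮对话作为自然语言历史,并与形式化对话状态拼接成模型输入。"""
    recent = dialogue_turns[-H:] if H > 0 else []
    history = " | ".join(f"{spk}: {utt}" for spk, utt in recent)
    return f"STATE: {formal_state} HISTORY: {history}"

turns = [("user", "帮我订一家餐厅"), ("system", "好的,哪个城市?"),
         ("user", "香港,今晚七点"), ("system", "请问几位?"), ("user", "四位")]
for H in (1, 2, 3):
    print(H, build_model_input(turns, "restaurant{city=香港, time=19:00}", H))
```

通过在不同H下训练并评估同一模型,就可以观察摘要中所说的"H增大后收益递减"的转折点。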

【19】 Adversarial Attacks against Windows PE Malware Detection: A Survey of the State-of-the-Art 标题:针对Windows PE恶意软件检测的对抗性攻击:现状综述 链接:https://arxiv.org/abs/2112.12310

作者:Xiang Ling,Lingfei Wu,Jiangyu Zhang,Zhenqing Qu,Wei Deng,Xiang Chen,Chunming Wu,Shouling Ji,Tianyue Luo,Jingzheng Wu,Yanjun Wu 机构:Sciences, China 摘要:该恶意软件一直是跨多个操作系统和各种文件格式的计算机面临的最具破坏性的威胁之一。为了抵御不断增加和不断演变的恶意软件威胁,人们做出了巨大的努力,提出了各种恶意软件检测方法,试图有效地检测恶意软件。最近的研究表明,一方面,现有的ML和DL能够更好地检测新出现的和以前看不见的恶意软件。然而,另一方面,ML和DL模型天生容易受到以对抗性示例形式出现的对抗性攻击,这些攻击是通过稍微小心地干扰合法输入以混淆目标模型而恶意生成的。基本上,对抗性攻击最初是在计算机视觉领域广泛研究的,有些攻击很快扩展到其他领域,包括NLP、语音识别甚至恶意软件检测。本文以Windows操作系统家族中可移植可执行文件(PE)格式的恶意软件,即Windows PE恶意软件为代表,研究在此类对抗环境下的对抗攻击方法。具体而言,我们首先概述了基于ML/DL的Windows PE恶意软件检测的一般学习框架,然后重点介绍了在PE恶意软件环境中执行对抗性攻击的三个独特挑战。然后,我们进行全面和系统的审查,对针对PE恶意软件检测的最先进的对抗性攻击进行分类,以及相应的防御措施,以提高PE恶意软件检测的鲁棒性。我们首先介绍了针对Windows PE恶意软件检测的其他相关攻击,而不是对抗性攻击,然后阐明了未来的研究方向和机会。 摘要:The malware has been being one of the most damaging threats to computers that span across multiple operating systems and various file formats. To defend against the ever-increasing and ever-evolving threats of malware, tremendous efforts have been made to propose a variety of malware detection methods that attempt to effectively and efficiently detect malware. Recent studies have shown that, on the one hand, existing ML and DL enable the superior detection of newly emerging and previously unseen malware. However, on the other hand, ML and DL models are inherently vulnerable to adversarial attacks in the form of adversarial examples, which are maliciously generated by slightly and carefully perturbing the legitimate inputs to confuse the targeted models. Basically, adversarial attacks are initially extensively studied in the domain of computer vision, and some quickly expanded to other domains, including NLP, speech recognition and even malware detection. In this paper, we focus on malware with the file format of portable executable (PE) in the family of Windows operating systems, namely Windows PE malware, as a representative case to study the adversarial attack methods in such adversarial settings. To be specific, we start by first outlining the general learning framework of Windows PE malware detection based on ML/DL and subsequently highlighting three unique challenges of performing adversarial attacks in the context of PE malware. We then conduct a comprehensive and systematic review to categorize the state-of-the-art adversarial attacks against PE malware detection, as well as corresponding defenses to increase the robustness of PE malware detection. We conclude the paper by first presenting other related attacks against Windows PE malware detection beyond the adversarial attacks and then shedding light on future research directions and opportunities.

【20】 Algorithmic Probability of Large Datasets and the Simplicity Bubble Problem in Machine Learning 标题:大数据集的算法概率与机器学习中的简单性泡沫问题 链接:https://arxiv.org/abs/2112.12275

作者:Felipe S. Abrahão,Hector Zenil,Fabio Porto,Klaus Wehmuth 机构:oratory for Scientific Computing (LNCC),-, Petr´opolis, RJ, Brazil., for the Natural and Digital Sciences, Paris, France., The Alan Turing Institute, British Library,QR, Euston Rd, Lon-, don NW,DB. Algorithmic Dynamics Lab, Unit of Computational 摘要:在挖掘大型数据集以预测新数据时,统计机器学习背后原理的局限性不仅对大数据泛滥构成了严重挑战,也对数据生成过程偏向于低算法复杂性的传统假设构成了严重挑战。即使假设在有限数据集生成器中存在一种潜在的算法信息偏向于简单性,我们也表明,无论是否使用伪随机生成器,完全自动化的可计算学习算法,特别是当前机器学习(包括深度学习)方法中使用的统计性质的算法,总是会被足够大的数据集自然或人为地欺骗。特别是,我们证明,对于每个有限学习算法,都有一个足够大的数据集大小,超过该数据集,不可预测欺骗者的算法概率是任何其他较大数据集的算法概率的上界(最多一个乘法常数,仅取决于学习算法)。换句话说,与任何其他特定数据集一样,非常大和复杂的数据集也可能将学习算法欺骗成“简单泡沫”。这些欺骗性的数据集保证了任何预测都会偏离高算法复杂度的全局最优解,同时收敛到低算法复杂度的局部最优解。我们讨论了规避这种欺骗性现象的框架和经验条件,从统计机器学习转向基于算法信息理论和可计算性理论的内在力量或受其驱动的更强类型的机器学习。 摘要:When mining large datasets in order to predict new data, limitations of the principles behind statistical machine learning pose a serious challenge not only to the Big Data deluge, but also to the traditional assumptions that data generating processes are biased toward low algorithmic complexity. Even when one assumes an underlying algorithmic-informational bias toward simplicity in finite dataset generators, we show that fully automated, with or without access to pseudo-random generators, computable learning algorithms, in particular those of statistical nature used in current approaches to machine learning (including deep learning), can always be deceived, naturally or artificially, by sufficiently large datasets. In particular, we demonstrate that, for every finite learning algorithm, there is a sufficiently large dataset size above which the algorithmic probability of an unpredictable deceiver is an upper bound (up to a multiplicative constant that only depends on the learning algorithm) for the algorithmic probability of any other larger dataset. In other words, very large and complex datasets are as likely to deceive learning algorithms into a "simplicity bubble" as any other particular dataset. These deceiving datasets guarantee that any prediction will diverge from the high-algorithmic-complexity globally optimal solution while converging toward the low-algorithmic-complexity locally optimal solution. We discuss the framework and empirical conditions for circumventing this deceptive phenomenon, moving away from statistical machine learning towards a stronger type of machine learning based on, or motivated by, the intrinsic power of algorithmic information theory and computability theory.

【21】 Entropy-Regularized Partially Observed Markov Decision Processes 标题:熵正则部分观测马尔可夫决策过程 链接:https://arxiv.org/abs/2112.12255

作者:Timothy L. Molloy,Girish N. Nair 机构:Department of Electrical and Electronic Engineering, University of Melbourne 备注:20 pages, 2 figures, submitted 摘要:我们研究了代价函数由描述状态、观测和控制不确定性的熵项正则化的部分观测马尔可夫决策过程(POMDP)。我们证明,标准POMDP技术可为这些熵正则化POMDP提供有界误差解;当正则化项为状态、观测和控制轨迹的联合熵时,可以得到精确解。联合熵这一结果尤其令人惊讶,因为它构成了一种新颖且易于处理的主动状态估计表述。 摘要:We investigate partially observed Markov decision processes (POMDPs) with cost functions regularized by entropy terms describing state, observation, and control uncertainty. Standard POMDP techniques are shown to offer bounded-error solutions to these entropy-regularized POMDPs, with exact solutions when the regularization involves the joint entropy of the state, observation, and control trajectories. Our joint-entropy result is particularly surprising since it constitutes a novel, tractable formulation of active state estimation.
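
摘要所说"代价函数由描述状态、观测和控制不确定性的熵项正则化",大致可以写成如下形式(仅为示意性写法,记号与各项权重均为笔者所加的假设,并非论文中的精确定义):

$$J(\pi)=\mathbb{E}_\pi\Big[\sum_{t=0}^{T} c(x_t,u_t)\Big]+\alpha\,H(X_{0:T})+\beta\,H(Y_{0:T})+\gamma\,H(U_{0:T}),$$

其中 $X_{0:T}$、$Y_{0:T}$、$U_{0:T}$ 分别表示状态、观测和控制轨迹。按摘要的说法,当正则化项取为联合熵 $H(X_{0:T},Y_{0:T},U_{0:T})$ 时,标准POMDP技术可以给出精确解,其余情形给出有界误差解。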

【22】 ML4CO: Is GCNN All You Need? Graph Convolutional Neural Networks Produce Strong Baselines For Combinatorial Optimization Problems, If Tuned and Trained Properly, on Appropriate Data 标题:ML4CO:GCNN是你需要的全部吗?如果对适当的数据进行适当的调整和训练,图形卷积神经网络可以为组合优化问题提供强大的基线 链接:https://arxiv.org/abs/2112.12251

作者:Amin Banitalebi-Dehkordi,Yong Zhang 机构:Huawei Technologies Canada Co., Ltd. 备注:Runner-up in the 2021 ML4CO NeurIPS Competition 摘要:2021年NeurIPS机器学习组合优化(ML4CO)竞赛的目标是用机器学习模型替换关键启发式组件,从而改进最先进的组合优化求解器。竞赛的主要科学问题是:当历史数据可用时,机器学习是否是在特定问题分布上改进传统组合优化求解器的可行选择?其背景在于,在许多实际场景中,组合优化问题的多次重复之间数据只会发生轻微变化,而这正是机器学习模型特别擅长的领域。本文总结了华为EI-OROAS团队在本次竞赛对偶(dual)任务中的解决方案和经验教训。我们队在最终排名中获得第二名,与第一名的差距非常小;此外,在最终评估之前的数次每周排行榜更新中,我们的方案一直排名第一。我们给出了从大量实验中获得的洞见,并认为简单的图卷积神经网络(GCNN)如果经过适当的训练和调参,就可以取得最先进的结果。 摘要:The 2021 NeurIPS Machine Learning for Combinatorial Optimization (ML4CO) competition was designed with the goal of improving state-of-the-art combinatorial optimization solvers by replacing key heuristic components with machine learning models. The competition's main scientific question was the following: is machine learning a viable option for improving traditional combinatorial optimization solvers on specific problem distributions, when historical data is available? This was motivated by the fact that in many practical scenarios, the data changes only slightly between the repetitions of a combinatorial optimization problem, and this is an area where machine learning models are particularly powerful at. This paper summarizes the solution and lessons learned by the Huawei EI-OROAS team in the dual task of the competition. The submission of our team achieved the second place in the final ranking, with a very close distance to the first spot. In addition, our solution was ranked first consistently for several weekly leaderboard updates before the final evaluation. We provide insights gained from a large number of experiments, and argue that a simple Graph Convolutional Neural Network (GCNNs) can achieve state-of-the-art results if trained and tuned properly.
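
在ML4CO这类任务中,"用简单GCNN替换求解器中的关键启发式组件"的常见做法是把MILP实例表示为"约束-变量"二部图并做消息传递,下面是一个极简示意(与参赛方案的实际网络无关,特征维度、传播轮数等均为假设):

```python
import torch
import torch.nn as nn

class BipartiteGCNN(nn.Module):
    """示意:约束节点与变量节点交替做一轮消息传递,最后为每个变量输出一个分数。"""
    def __init__(self, var_dim, cons_dim, hidden=32):
        super().__init__()
        self.var_emb = nn.Linear(var_dim, hidden)
        self.cons_emb = nn.Linear(cons_dim, hidden)
        self.v2c = nn.Linear(hidden, hidden)
        self.c2v = nn.Linear(hidden, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, var_feat, cons_feat, A):
        # A: (n_cons, n_vars) 的约束-变量邻接/系数矩阵
        v = torch.relu(self.var_emb(var_feat))
        c = torch.relu(self.cons_emb(cons_feat))
        c = torch.relu(c + A @ self.v2c(v))        # 变量 -> 约束
        v = torch.relu(v + A.t() @ self.c2v(c))    # 约束 -> 变量
        return self.score(v).squeeze(-1)           # 每个变量一个分数(如分支倾向)

net = BipartiteGCNN(var_dim=6, cons_dim=4)
scores = net(torch.randn(10, 6), torch.randn(3, 4), torch.rand(3, 10))
print(scores.shape)   # torch.Size([10])
```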

【23】 Fine-grained Multi-Modal Self-Supervised Learning 标题:细粒度多模态自监督学习 链接:https://arxiv.org/abs/2112.12182

作者:Duo Wang,Salah Karout 机构:Department of Computer Science and, Technology, University of Cambridge, Cambridge, UK, Huawei R&D UK, Cambridge Science Park 备注:Accepted at BMVC 2021 摘要:视频多模式自监督学习已被证明可以提高模型在各种下游任务上的性能。然而,由于未处理数据中存在噪声,这种自监督预训练需要大量批量和大量计算资源。这部分是由于流行的训练方案是在粗粒度设置上训练的,其中表示整个视频片段或自然语言句子的向量用于计算相似度。由于视频片段的一部分和其他模态输入(如文本描述)完全不相关,这种方案使得训练变得有噪声。在本文中,我们提出了一种细粒度多模态自监督训练方案,该方案在更精细的尺度上计算嵌入之间的相似性(例如单个特征映射嵌入和短语嵌入),并使用注意机制来减少噪声对在损失函数中的权重。我们表明,通过所提出的预训练方案,我们可以训练更小的模型,更小的批量和更少的计算资源,以实现与最新技术相当的下游任务性能,包括动作识别和文本图像检索任务。 摘要:Multi-Modal Self-Supervised Learning from videos has been shown to improve model's performance on various downstream tasks. However, such Self-Supervised pre-training requires large batch sizes and a large amount of computation resources due to the noise present in the uncurated data. This is partly due to the fact that the prevalent training scheme is trained on coarse-grained setting, in which vectors representing the whole video clips or natural language sentences are used for computing similarity. Such scheme makes training noisy as part of the video clips can be totally not correlated with the other-modality input such as text description. In this paper, we propose a fine-grained multi-modal self-supervised training scheme that computes the similarity between embeddings at finer-scale (such as individual feature map embeddings and embeddings of phrases), and uses attention mechanisms to reduce noisy pairs' weighting in the loss function. We show that with the proposed pre-training scheme, we can train smaller models, with smaller batch-size and much less computational resources to achieve downstream tasks performances comparable to State-Of-The-Art, for tasks including action recognition and text-image retrievals.

【24】 Multimodal Personality Recognition using Cross-Attention Transformer and Behaviour Encoding 标题:基于交叉注意变换和行为编码的多模态人格识别 链接:https://arxiv.org/abs/2112.12180

作者:Tanay Agrawal,Dhruv Agarwal,Michal Balazia,Neelabh Sinha,Francois Bremond 机构:INRIA Sophia Antipolis - M´editerran´ee, France, Universit´e Cˆote d’Azur, France, Indian Institute of Information Technology, Allahabad, India, Birla Institute of Technology and Science, Pilani, India 备注:Preprint. Final paper accepted at the 17th International Conference on Computer Vision Theory and Applications, VISAPP 2021, Virtual, February 6-8, 2022. 8 pages 摘要:人格计算和情感计算近年来在许多研究领域引起了人们的兴趣。该任务的数据集通常有多种模式,如视频、音频、语言和生物信号。在本文中,我们提出了一个灵活的任务模型,它利用了所有可用的数据。这项任务涉及复杂的关系,为了避免在视频处理中使用大型模型,我们建议使用行为编码,在对模型进行最小更改的情况下提高性能。使用Transformer的交叉关注在最近变得很流行,并被用于不同模式的融合。由于可能存在长期关系,因此不希望将输入分成块,因此建议的模型将整个输入一起处理。我们的实验表明了上述每一项贡献的重要性 摘要:Personality computing and affective computing have gained recent interest in many research areas. The datasets for the task generally have multiple modalities like video, audio, language and bio-signals. In this paper, we propose a flexible model for the task which exploits all available data. The task involves complex relations and to avoid using a large model for video processing specifically, we propose the use of behaviour encoding which boosts performance with minimal change to the model. Cross-attention using transformers has become popular in recent times and is utilised for fusion of different modalities. Since long term relations may exist, breaking the input into chunks is not desirable, thus the proposed model processes the entire input together. Our experiments show the importance of each of the above contributions

【25】 AI-based Reconstruction for Fast MRI -- A Systematic Review and Meta-analysis 标题:基于人工智能的快速MRI重建--系统评价和荟萃分析 链接:https://arxiv.org/abs/2112.12744

作者:Yutong Chen,Carola-Bibiane Schönlieb,Pietro Liò,Tim Leiner,Pier Luigi Dragotti,Ge Wang,Daniel Rueckert,David Firmin,Guang Yang 机构: National Heart & Lung Institute, Imperial College London, London SW,NP, U.K., Cardiovascular Research Centre, Royal Brompton Hospital, London SW,NP, U.K., University of Cambridge, Cambridge CB,RX, U.K.. 备注:42 pages, 5 figures, Proceedings of the IEEE 摘要:压缩感知(CS)在加速磁共振成像(MRI)采集过程中起着关键作用。随着人工智能的复兴,深度神经网络和CS算法正在被集成,以重新定义快速MRI的最新技术。在过去的几年中,基于深度学习的CS技术在复杂性、多样性和性能方面有了长足的发展,这些技术致力于快速MRI。在这项荟萃分析中,我们系统地回顾了用于快速MRI的基于深度学习的CS技术,描述了关键模型设计,突出了突破,并讨论了有希望的方向。我们还引入了一个综合分析框架和分类系统,以评估深度学习在基于CS的MRI加速中的关键作用。 摘要:Compressed sensing (CS) has been playing a key role in accelerating the magnetic resonance imaging (MRI) acquisition process. With the resurgence of artificial intelligence, deep neural networks and CS algorithms are being integrated to redefine the state of the art of fast MRI. The past several years have witnessed substantial growth in the complexity, diversity, and performance of deep learning-based CS techniques that are dedicated to fast MRI. In this meta-analysis, we systematically review the deep learning-based CS techniques for fast MRI, describe key model designs, highlight breakthroughs, and discuss promising directions. We have also introduced a comprehensive analysis framework and a classification system to assess the pivotal role of deep learning in CS-based acceleration for MRI.
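
作为背景,摘要中提到的压缩感知(CS)MRI重建通常可以写成如下优化问题(这是该领域的常见表述,仅用于帮助理解,记号为本文所加;综述中讨论的深度学习方法大多可视为用学习到的先验或展开网络替换其中的正则项或迭代步骤):

$$\hat{x}=\arg\min_{x}\ \tfrac{1}{2}\,\|\mathcal{F}_{u}x-y\|_{2}^{2}+\lambda\,\|\Psi x\|_{1},$$

其中 $y$ 为欠采样的k空间测量,$\mathcal{F}_{u}$ 为欠采样傅里叶算子,$\Psi$ 为稀疏变换,$\lambda$ 为正则化权重。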

【26】 A Deep Reinforcement Learning Approach for Solving the Traveling Salesman Problem with Drone 标题:一种求解无人机旅行商问题的深度强化学习方法 链接:https://arxiv.org/abs/2112.12545

作者:Aigerim Bogyrbayeva. Taehyun Yoon,Hanbum Ko,Sungbin Lim,Hyokun Yun,Changhyun Kwon 机构:Suleyman Demirel University, Kazakhstan, UNIST, South Korea, Amazon, U.S.A., University of South Florida, U.S.A., KAIST, South Korea 摘要:最近,强化学习在许多组合优化问题中显示出学习高质量解的前景。特别是,基于注意的编码器-解码器模型对各种路由问题,包括旅行商问题(TSP)表现出很高的有效性。不幸的是,它们在有无人机的TSP(TSP-D)中的性能很差,需要协调地路由不同的车队——卡车和无人机。在TSP-D中,两辆车串联移动,可能需要在节点处等待另一辆车加入。基于状态较少注意的解码器无法在车辆之间进行这种协调。我们提出了一种注意编码器-LSTM-解码器混合模型,其中解码器的隐藏状态可以表示动作序列。我们的经验表明,这种混合模型在解决方案质量和计算效率方面都优于纯粹基于注意力的模型。我们在最小-最大容量约束车辆路径问题(mmCVRP)上的实验也证实了混合模型比基于注意的模型更适合于多车辆的协调路径问题。 摘要:Reinforcement learning has recently shown promise in learning quality solutions in many combinatorial optimization problems. In particular, the attention-based encoder-decoder models show high effectiveness on various routing problems, including the Traveling Salesman Problem (TSP). Unfortunately, they perform poorly for the TSP with Drone (TSP-D), requiring routing a heterogeneous fleet of vehicles in coordination -- a truck and a drone. In TSP-D, the two vehicles are moving in tandem and may need to wait at a node for the other vehicle to join. State-less attention-based decoder fails to make such coordination between vehicles. We propose an attention encoder-LSTM decoder hybrid model, in which the decoder's hidden state can represent the sequence of actions made. We empirically demonstrate that such a hybrid model improves upon a purely attention-based model for both solution quality and computational efficiency. Our experiments on the min-max Capacitated Vehicle Routing Problem (mmCVRP) also confirm that the hybrid model is more suitable for coordinated routing of multiple vehicles than the attention-based model.

机器翻译,仅供参考
