cs.AI (Artificial Intelligence), 60 papers in total
【1】 Information is Power: Intrinsic Control via Information Capture
Link: https://arxiv.org/abs/2112.03899
Authors: Nicholas Rhinehart, Jenny Wang, Glen Berseth, John D. Co-Reyes, Danijar Hafner, Chelsea Finn, Sergey Levine
Affiliations: UC Berkeley, University of Toronto, Google Research (Brain Team), Stanford University
Note: NeurIPS 2021
Abstract: Humans and animals explore their environment and acquire useful skills even in the absence of clear goals, exhibiting intrinsic motivation. The study of intrinsic motivation in artificial agents is concerned with the following question: what is a good general-purpose objective for an agent? We study this question in dynamic partially-observed environments, and argue that a compact and general learning objective is to minimize the entropy of the agent's state visitation estimated using a latent state-space model. This objective induces an agent to both gather information about its environment, corresponding to reducing uncertainty, and to gain control over its environment, corresponding to reducing the unpredictability of future world states. We instantiate this approach as a deep reinforcement learning agent equipped with a deep variational Bayes filter. We find that our agent learns to discover, represent, and exercise control of dynamic objects in a variety of partially-observed environments sensed with visual observations without extrinsic reward.
【2】 Gradient and Projection Free Distributed Online Min-Max Resource Optimization
Link: https://arxiv.org/abs/2112.03896
Authors: Jingrong Wang, Ben Liang
Affiliations: Department of Electrical and Computer Engineering, University of Toronto, Canada
Abstract: We consider distributed online min-max resource allocation with a set of parallel agents and a parameter server. Our goal is to minimize the pointwise maximum over a set of time-varying convex and decreasing cost functions, without a priori information about these functions. We propose a novel online algorithm, termed Distributed Online resource Re-Allocation (DORA), where non-stragglers learn to relinquish resources and share them with stragglers. A notable feature of DORA is that it does not require gradient calculation or projection operation, unlike most existing online optimization strategies. This allows it to substantially reduce the computation overhead in large-scale and distributed networks. We show that the dynamic regret of the proposed algorithm is upper bounded by $O\left(T^{\frac{3}{4}}(1+P_T)^{\frac{1}{4}}\right)$, where $T$ is the total number of rounds and $P_T$ is the path-length of the instantaneous minimizers. We further consider an application to the bandwidth allocation problem in distributed online machine learning. Our numerical study demonstrates the efficacy of the proposed solution and its performance advantage over gradient- and/or projection-based resource allocation algorithms in reducing wall-clock time.
【3】 Is Complexity Important for Philosophy of Mind?
Link: https://arxiv.org/abs/2112.03877
Authors: Kristina Šekrst, Sandro Skansi
Affiliations: University of Zagreb
Abstract: Computational complexity has often been ignored in philosophy of mind and in philosophical artificial intelligence studies. The purpose of this paper is threefold. First and foremost, to show the importance of complexity rather than computability in philosophical and AI problems. Second, to rephrase the notion of computability in terms of solvability, i.e. treating computability as non-sufficient for establishing intelligence. The Church-Turing thesis is therefore revisited and rephrased in order to capture the ontological background of spatial and temporal complexity. Third, to emphasize ontological differences between different time complexities, which seem to provide a solid base towards a better understanding of artificial intelligence in general.
【4】 Universalizing Weak Supervision
Link: https://arxiv.org/abs/2112.03865
Authors: Changho Shin, Winfred Li, Harit Vishwakarma, Nicholas Roberts, Frederic Sala
Affiliations: Department of Computer Sciences, University of Wisconsin-Madison
Abstract: Weak supervision (WS) frameworks are a popular way to bypass hand-labeling large datasets for training data-hungry models. These approaches synthesize multiple noisy but cheaply-acquired estimates of labels into a set of high-quality pseudolabels for downstream training. However, the synthesis technique is specific to a particular kind of label, such as binary labels or sequences, and each new label type requires manually designing a new synthesis algorithm. Instead, we propose a universal technique that enables weak supervision over any label type while still offering desirable properties, including practical flexibility, computational efficiency, and theoretical guarantees. We apply this technique to important problems previously not tackled by WS frameworks including learning to rank, regression, and learning in hyperbolic manifolds. Theoretically, our synthesis approach produces a consistent estimator for learning a challenging but important generalization of the exponential family model. Experimentally, we validate our framework and show improvement over baselines in diverse settings including real-world learning-to-rank and regression problems along with learning on hyperbolic manifolds.
【5】 Grounded Language-Image Pre-training
Link: https://arxiv.org/abs/2112.03857
Authors: Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, Jianwei Yang, Chunyuan Li, Yiwu Zhong, Lijuan Wang, Lu Yuan, Lei Zhang, Jenq-Neng Hwang, Kai-Wei Chang, Jianfeng Gao
Affiliations: UCLA, Microsoft Research, University of Washington, University of Wisconsin-Madison, Microsoft Cloud and AI, International Digital Economy Academy
Note: Code will be released at https://github.com/microsoft/GLIP
Abstract: This paper presents a grounded language-image pre-training (GLIP) model for learning object-level, language-aware, and semantic-rich visual representations. GLIP unifies object detection and phrase grounding for pre-training. The unification brings two benefits: 1) it allows GLIP to learn from both detection and grounding data to improve both tasks and bootstrap a good grounding model; 2) GLIP can leverage massive image-text pairs by generating grounding boxes in a self-training fashion, making the learned representation semantic-rich. In our experiments, we pre-train GLIP on 27M grounding data, including 3M human-annotated and 24M web-crawled image-text pairs. The learned representations demonstrate strong zero-shot and few-shot transferability to various object-level recognition tasks. 1) When directly evaluated on COCO and LVIS (without seeing any images in COCO during pre-training), GLIP achieves 49.8 AP and 26.9 AP, respectively, surpassing many supervised baselines. 2) After fine-tuning on COCO, GLIP achieves 60.8 AP on val and 61.5 AP on test-dev, surpassing prior SoTA. 3) When transferred to 13 downstream object detection tasks, a 1-shot GLIP rivals a fully-supervised Dynamic Head. Code will be released at https://github.com/microsoft/GLIP.
【6】 Policy Search for Model Predictive Control with Application to Agile Drone Flight
Link: https://arxiv.org/abs/2112.03850
Authors: Yunlong Song, Davide Scaramuzza
Affiliations: University of Zurich; Department of Neuroinformatics, University of Zurich and ETH Zurich
Note: Currently under review at TRO
Abstract: Policy Search and Model Predictive Control (MPC) are two different paradigms for robot control: policy search has the strength of automatically learning complex policies using experienced data, while MPC can offer optimal control performance using models and trajectory optimization. An open research question is how to leverage and combine the advantages of both approaches. In this work, we provide an answer by using policy search for automatically choosing high-level decision variables for MPC, which leads to a novel policy-search-for-model-predictive-control framework. Specifically, we formulate the MPC as a parameterized controller, where the hard-to-optimize decision variables are represented as high-level policies. Such a formulation allows optimizing policies in a self-supervised fashion. We validate this framework by focusing on a challenging problem in agile drone flight: flying a quadrotor through fast-moving gates. Experiments show that our controller achieves robust and real-time control performance in both simulation and the real world. The proposed framework offers a new perspective for merging learning and control.
【7】 Natural Answer Generation: From Factoid Answer to Full-length Answer using Grammar Correction
Link: https://arxiv.org/abs/2112.03849
Authors: Manas Jain, Sriparna Saha, Pushpak Bhattacharyya, Gladvin Chinnadurai, Manish Kumar Vatsa
Affiliations: Indian Institute of Technology Bombay, Mumbai; Indian Institute of Technology Patna; LG Soft India
Abstract: Question Answering systems these days typically use template-based language generation. Though adequate for a domain-specific task, these systems are too restrictive and predefined for domain-independent systems. This paper proposes a system that outputs a full-length answer given a question and the extracted factoid answer (short spans such as named entities) as the input. Our system uses constituency and dependency parse trees of questions. A transformer-based Grammar Error Correction model, GECToR (2020), is used as a post-processing step for better fluency. We compare our system with (i) Modified Pointer Generator (SOTA) and (ii) Fine-tuned DialoGPT for factoid questions. We also test our approach on existential (yes-no) questions with better results. Our model generates more accurate and fluent answers than the state-of-the-art (SOTA) approaches. The evaluation is done on the NewsQA and SQuAD datasets, with gains of 0.4 and 0.9 percentage points in ROUGE-1 score respectively. The inference time is also reduced by 85% compared to the SOTA. The improved datasets used for our evaluation will be released as part of the research contribution.
【8】 A Deep Learning Driven Algorithmic Pipeline for Autonomous Navigation in Row-Based Crops
Link: https://arxiv.org/abs/2112.03816
Authors: Simone Cerrato, Vittorio Mazzia, Francesco Salvetti, Marcello Chiaberge
Note: Submitted to IEEE/ASME Transactions on Mechatronics (TMECH)
Abstract: Expensive sensors and inefficient algorithmic pipelines significantly affect the overall cost of autonomous machines. However, affordable robotic solutions are essential to practical usage, and their financial impact constitutes a fundamental requirement to employ service robotics in most fields of application. Among all, researchers in the precision agriculture domain strive to devise robust and cost-effective autonomous platforms in order to provide genuinely large-scale competitive solutions. In this article, we present a complete algorithmic pipeline for row-based crops autonomous navigation, specifically designed to cope with low-range sensors and seasonal variations. Firstly, we build on a robust data-driven methodology to generate a viable path for the autonomous machine, covering the full extension of the crop with only the occupancy grid map information of the field. Moreover, our solution leverages the latest advances in deep learning optimization techniques and synthetic data generation to provide an affordable solution that efficiently tackles the well-known Global Navigation Satellite System unreliability and degradation due to vegetation growing inside rows. Extensive experimentation and simulations against computer-generated environments and real-world crops demonstrated the robustness and intrinsic generalizability of our methodology, which opens the possibility of highly affordable and fully autonomous machines.
【9】 Learning a Robust Multiagent Driving Policy for Traffic Congestion Reduction
Link: https://arxiv.org/abs/2112.03759
Authors: Yulin Zhang, William Macke, Jiaxun Cui, Daniel Urieli, Peter Stone
Affiliations: The University of Texas at Austin, United States; General Motors R&D Labs, Israel; Sony AI
Note: 9 pages, 7 figures
Abstract: The advent of automated and autonomous vehicles (AVs) creates opportunities to achieve system-level goals using multiple AVs, such as traffic congestion reduction. Past research has shown that multiagent congestion-reducing driving policies can be learned in a variety of simulated scenarios. While initial proofs of concept were in small, closed traffic networks with a centralized controller, recently successful results have been demonstrated in more realistic settings with distributed control policies operating in open road networks where vehicles enter and leave. However, these driving policies were mostly tested under the same conditions they were trained on, and have not been thoroughly tested for robustness to different traffic conditions, which is a critical requirement in real-world scenarios. This paper presents a learned multiagent driving policy that is robust to a variety of open-network traffic conditions, including vehicle flows, the fraction of AVs in traffic, AV placement, and different merging road geometries. A thorough empirical analysis investigates the sensitivity of such a policy to the amount of AVs in both a simple merge network and a more complex road with two merging ramps. It shows that the learned policy achieves significant improvement over simulated human-driven policies even with AV penetration as low as 2%. The same policy is also shown to be capable of reducing traffic congestion in more complex roads with two merging ramps.
【10】 Bridging the Model-Reality Gap with Lipschitz Network Adaptation
Link: https://arxiv.org/abs/2112.03756
Authors: Siqi Zhou, Karime Pereida, Wenda Zhao, Angela P. Schoellig
Affiliations: University of Toronto
Abstract: As robots venture into the real world, they are subject to unmodeled dynamics and disturbances. Traditional model-based control approaches have been proven successful in relatively static and known operating environments. However, when an accurate model of the robot is not available, model-based design can lead to suboptimal and even unsafe behaviour. In this work, we propose a method that bridges the model-reality gap and enables the application of model-based approaches even if dynamic uncertainties are present. In particular, we present a learning-based model reference adaptation approach that makes a robot system, with possibly uncertain dynamics, behave as a predefined reference model. In turn, the reference model can be used for model-based controller design. In contrast to typical model reference adaptation control approaches, we leverage the representative power of neural networks to capture highly nonlinear dynamics uncertainties and guarantee stability by encoding a certifying Lipschitz condition in the architectural design of a special type of neural network called the Lipschitz network. Our approach applies to a general class of nonlinear control-affine systems even when our prior knowledge about the true robot system is limited. We demonstrate our approach in flying inverted pendulum experiments, where an off-the-shelf quadrotor is challenged to balance an inverted pendulum while hovering or tracking circular trajectories.
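To make the architectural idea concrete, the following is a minimal sketch, not the authors' exact Lipschitz network, of how a feed-forward map with a certified Lipschitz bound can be built: spectral normalization bounds each linear layer's operator norm by 1, 1-Lipschitz activations preserve the bound, and the overall network is then `scale`-Lipschitz (the `scale` parameter and layer sizes here are illustrative assumptions).

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class LipschitzMLP(nn.Module):
    """Toy feed-forward network with a certified Lipschitz bound.

    Every linear map is spectrally normalized (operator norm <= 1) and
    ReLU is 1-Lipschitz, so the composition is `scale`-Lipschitz. This is
    a generic construction for illustration; the paper's Lipschitz
    network may differ in detail.
    """
    def __init__(self, in_dim: int, hidden: int, out_dim: int, scale: float = 2.0):
        super().__init__()
        self.scale = scale  # certified Lipschitz constant of the whole network
        self.net = nn.Sequential(
            spectral_norm(nn.Linear(in_dim, hidden)), nn.ReLU(),
            spectral_norm(nn.Linear(hidden, hidden)), nn.ReLU(),
            spectral_norm(nn.Linear(hidden, out_dim)),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.scale * self.net(x)

model = LipschitzMLP(in_dim=6, hidden=64, out_dim=3)
y = model(torch.randn(8, 6))  # input-output gain is bounded by `scale`
```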
【11】 Tell me why! -- Explanations support learning of relational and causal structure
Link: https://arxiv.org/abs/2112.03753
Authors: Andrew K. Lampinen, Nicholas A. Roy, Ishita Dasgupta, Stephanie C. Y. Chan, Allison C. Tam, James L. McClelland, Chen Yan, Adam Santoro, Neil C. Rabinowitz, Jane X. Wang, Felix Hill
Affiliations: DeepMind, London, UK
Note: 22 pages
Abstract: Explanations play a considerable role in human learning, especially in areas that remain major challenges for AI -- forming abstractions, and learning about the relational and causal structure of the world. Here, we explore whether reinforcement learning agents might likewise benefit from explanations. We outline a family of relational tasks that involve selecting an object that is the odd one out in a set (i.e., unique along one of many possible feature dimensions). Odd-one-out tasks require agents to reason over multi-dimensional relationships among a set of objects. We show that agents do not learn these tasks well from reward alone, but achieve >90% performance when they are also trained to generate language explaining object properties or why a choice is correct or incorrect. In further experiments, we show how predicting explanations enables agents to generalize appropriately from ambiguous, causally-confounded training, and even to meta-learn to perform experimental interventions to identify causal structure. We show that explanations help overcome the tendency of agents to fixate on simple features, and explore which aspects of explanations make them most beneficial. Our results suggest that learning from explanations is a powerful principle that could offer a promising path towards training more robust and general machine learning systems.
【12】 Wild ToFu: Improving Range and Quality of Indirect Time-of-Flight Depth with RGB Fusion in Challenging Environments
Link: https://arxiv.org/abs/2112.03750
Authors: HyunJun Jung, Nikolas Brasch, Ales Leonardis, Nassir Navab, Benjamin Busam
Affiliations: Technical University of Munich; Huawei Noah's Ark Lab
Abstract: Indirect Time-of-Flight (I-ToF) imaging is a widespread way of depth estimation for mobile devices due to its small size and affordable price. Previous works have mainly focused on quality improvement for I-ToF imaging, especially curing the effect of Multi Path Interference (MPI). These investigations are typically done in specifically constrained scenarios at close distance, indoors and under little ambient light. Surprisingly little work has investigated I-ToF quality improvement in real-life scenarios where strong ambient light and far distances pose difficulties due to an extreme amount of induced shot noise and signal sparsity, caused by the attenuation with limited sensor power and light scattering. In this work, we propose a new learning based end-to-end depth prediction network which takes noisy raw I-ToF signals as well as an RGB image and fuses their latent representation based on a multi step approach involving both implicit and explicit alignment to predict a high quality long range depth map aligned to the RGB viewpoint. We test our approach on challenging real-world scenes and show more than 40% RMSE improvement on the final depth map compared to the baseline approach.
【13】 Dilated convolution with learnable spacings
Link: https://arxiv.org/abs/2112.03740
Authors: Ismail Khalfaoui Hassani, Thomas Pellegrini, Timothée Masquelier
Affiliations: ANITI, Université de Toulouse, France; IRIT, CNRS; CerCo UMR, CNRS
Note: 15 pages
Abstract: Dilated convolution is basically a convolution with a wider kernel created by regularly inserting spaces between the kernel elements. In this article, we present a new version of the dilated convolution in which the spacings are made learnable via backpropagation through an interpolation technique. We call this method "Dilated Convolution with Learnable Spacings" (DCLS) and we generalize its approach to the n-dimensional convolution case. However, our main focus here will be the 2D case for which we developed two implementations: a naive one that constructs the dilated kernel, suitable for small dilation rates, and a more time/memory efficient one that uses a modified version of the "im2col" algorithm. We then illustrate how this technique improves the accuracy of existing architectures on the semantic segmentation task on the Pascal Voc 2012 dataset via a simple drop-in replacement of the classical dilated convolutional layers by DCLS ones. Furthermore, we show that DCLS makes it possible to reduce the number of learnable parameters of the depthwise convolutions used in the recent ConvMixer architecture by a factor of 3, with no or very low reduction in accuracy, by replacing large dense kernels with sparse DCLS ones. The code of the method is based on PyTorch and available at: https://github.com/K-H-Ismail/Dilated-Convolution-with-Learnable-Spacings-PyTorch.
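The core mechanism can be illustrated with a short sketch of the "naive" variant described above: each kernel element gets a continuous, learnable position, and a dense kernel is built by spreading every weight bilinearly over the four surrounding integer cells, which keeps the positions differentiable. This is an illustration only (the names, sizes and loop-based scatter are assumptions for clarity); the authors' repository contains the real, much faster implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DCLS2dSketch(nn.Module):
    """Toy 2D convolution with learnable kernel-element spacings."""
    def __init__(self, ch_in, ch_out, k=3, d=7):
        super().__init__()
        self.d = d  # size of the dilated field holding the k*k elements
        self.weight = nn.Parameter(0.1 * torch.randn(ch_out, ch_in, k * k))
        g = torch.linspace(0, d - 1, k)          # regular grid initialization
        yy, xx = torch.meshgrid(g, g, indexing="ij")
        self.pos = nn.Parameter(torch.stack([yy, xx], -1).reshape(-1, 2))

    def dense_kernel(self):
        p = self.pos.clamp(0, self.d - 1 - 1e-4)  # keep positions in the field
        i0 = p.floor().long()
        f = p - i0                                 # bilinear fractions in [0, 1)
        co, ci, _ = self.weight.shape
        kernel = self.weight.new_zeros(co, ci, self.d, self.d)
        for n in range(p.shape[0]):                # scatter each weight element
            y, x = i0[n, 0], i0[n, 1]
            fy, fx = f[n, 0], f[n, 1]
            w = self.weight[:, :, n]
            kernel[:, :, y, x] = kernel[:, :, y, x] + w * (1 - fy) * (1 - fx)
            kernel[:, :, y + 1, x] = kernel[:, :, y + 1, x] + w * fy * (1 - fx)
            kernel[:, :, y, x + 1] = kernel[:, :, y, x + 1] + w * (1 - fy) * fx
            kernel[:, :, y + 1, x + 1] = kernel[:, :, y + 1, x + 1] + w * fy * fx
        return kernel

    def forward(self, x):
        return F.conv2d(x, self.dense_kernel(), padding=self.d // 2)

layer = DCLS2dSketch(8, 16)
out = layer(torch.randn(1, 8, 32, 32))  # gradients flow to weights and positions
```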
【14】 A coarse space acceleration of deep-DDM
Link: https://arxiv.org/abs/2112.03732
Authors: Valentin Mercier, Serge Gratton, Pierre Boudier
Abstract: The use of deep learning methods for solving PDEs is a field in full expansion. In particular, Physics-Informed Neural Networks, which implement a sampling of the physical domain and use a loss function that penalizes the violation of the partial differential equation, have shown their great potential. Yet, to address large scale problems encountered in real applications and compete with existing numerical methods for PDEs, it is important to design parallel algorithms with good scalability properties. In the vein of traditional domain decomposition methods (DDM), we consider the recently proposed deep-DDM approach. We present an extension of this method that relies on the use of a coarse space correction, similarly to what is done in traditional DDM solvers. Our investigation shows that the coarse correction is able to alleviate the deterioration of the convergence of the solver when the number of subdomains is increased, thanks to an instantaneous information exchange between subdomains at each iteration. Experimental results demonstrate that our approach induces a remarkable acceleration of the original deep-DDM method, at a reduced additional computational cost.
【15】 Flexible Networks for Learning Physical Dynamics of Deformable Objects
Link: https://arxiv.org/abs/2112.03728
Authors: Jinhyung Park, DoHae Lee, In-Kwon Lee
Affiliations: Yonsei University
Abstract: Learning the physical dynamics of deformable objects with particle-based representation has been the objective of many computational models in machine learning. While several state-of-the-art models have achieved this objective in simulated environments, most existing models impose a precondition, such that the input is a sequence of ordered point sets - i.e., the order of the points in each point set must be the same across the entire input sequence. This restrains the model from generalizing to real-world data, which is considered to be a sequence of unordered point sets. In this paper, we propose a model named time-wise PointNet (TP-Net) that solves this problem by directly consuming a sequence of unordered point sets to infer the future state of a deformable object with particle-based representation. Our model consists of a shared feature extractor that extracts global features from each input point set in parallel and a prediction network that aggregates and reasons on these features for future prediction. The key concept of our approach is that we use global features rather than local features to achieve invariance to input permutations and ensure the stability and scalability of our model. Experiments demonstrate that our model achieves state-of-the-art performance on both synthetic and real-world datasets, with real-time prediction speed. We provide quantitative and qualitative analysis on why our approach is more effective and efficient than existing approaches.
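The permutation-invariance argument hinges on a symmetric pooling over points. A minimal stand-in for this design (assumed layer sizes and a GRU aggregator for illustration; not the authors' code) looks like this: a shared point-wise MLP followed by a max-pool gives one global feature per unordered point set, so permuting the points of any frame leaves the feature unchanged, and a recurrent network then aggregates the per-frame features over time.

```python
import torch
import torch.nn as nn

class TPNetSketch(nn.Module):
    """Sketch of the TP-Net idea: order-invariant per-frame features + RNN."""
    def __init__(self, feat=128, hidden=256, n_points=1024):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat), nn.ReLU())
        self.rnn = nn.GRU(feat, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_points * 3)  # predicted next positions

    def forward(self, seq):                 # seq: (B, T, N, 3), any point order
        B, T, N, _ = seq.shape
        h = self.point_mlp(seq)             # shared MLP on every point: (B, T, N, feat)
        g = h.max(dim=2).values             # symmetric max-pool -> permutation-invariant
        out, _ = self.rnn(g)                # aggregate global features over time
        return self.head(out[:, -1]).view(B, -1, 3)

net = TPNetSketch()
pred = net(torch.randn(2, 4, 1024, 3))     # next-frame particle positions
```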
【16】 RFGAN: RF-Based Human Synthesis
Link: https://arxiv.org/abs/2112.03727
Authors: Cong Yu, Zhi Wu, Dongheng Zhang, Zhi Lu, Yang Hu, Yan Chen
Abstract: This paper demonstrates human synthesis based on Radio Frequency (RF) signals, which leverages the fact that RF signals can record human movements with the signal reflections off the human body. Different from existing RF sensing works that can only perceive humans roughly, this paper aims to generate fine-grained optical human images by introducing a novel cross-modal RFGAN model. Specifically, we first build a radio system equipped with horizontal and vertical antenna arrays to transceive RF signals. Since the reflected RF signals are processed as obscure signal projection heatmaps on the horizontal and vertical planes, we design an RF-Extractor with RNN in RFGAN for RF heatmap encoding and combining to obtain the human activity information. Then we inject the information extracted by the RF-Extractor and RNN as the condition into GAN using the proposed RF-based adaptive normalizations. Finally, we train the whole model in an end-to-end manner. To evaluate our proposed model, we create two cross-modal datasets (RF-Walk & RF-Activity) that contain thousands of optical human activity frames and corresponding RF signals. Experimental results show that the RFGAN can generate target human activity frames using RF signals. To the best of our knowledge, this is the first work to generate optical images based on RF signals.
【17】 Shrub Ensembles for Online Classification
Link: https://arxiv.org/abs/2112.03723
Authors: Sebastian Buschjäger, Sibylle Hess, Katharina Morik
Affiliations: Technische Universiteit Eindhoven
Note: 9 pages main content, 13 pages appendix, accepted at AAAI-2022
Abstract: Online learning algorithms have become a ubiquitous tool in the machine learning toolbox and are frequently used in small, resource-constrained environments. Among the most successful online learning methods are Decision Tree (DT) ensembles. DT ensembles provide excellent performance while adapting to changes in the data, but they are not resource efficient. Incremental tree learners keep adding new nodes to the tree but never remove old ones, increasing the memory consumption over time. Gradient-based tree learning, on the other hand, requires the computation of gradients over the entire tree which is costly for even moderately sized trees. In this paper, we propose a novel memory-efficient online classification ensemble called shrub ensembles for resource-constrained systems. Our algorithm trains small to medium-sized decision trees on small windows and uses stochastic proximal gradient descent to learn the ensemble weights of these 'shrubs'. We provide a theoretical analysis of our algorithm and include an extensive discussion on the behavior of our approach in the online setting. In a series of 2,959 experiments on 12 different datasets, we compare our method against 8 state-of-the-art methods. Our Shrub Ensembles retain an excellent performance even when only little memory is available. We show that SE offers a better accuracy-memory trade-off in 7 of 12 cases, while having statistically significantly better performance than most other methods. Our implementation is available at https://github.com/sbuschjaeger/se-online.
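A hedged sketch of the weight-learning step described above: one stochastic proximal-gradient update on the ensemble weights, here with a squared-error surrogate loss and an L1 proximal operator so that some shrub weights are driven exactly to zero (and the corresponding shrubs could be pruned). The exact loss and regularizer of the paper may differ; this only illustrates the mechanism.

```python
import numpy as np

def prox_l1(w, lam):
    """Soft-thresholding: the proximal operator of lam * ||w||_1."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def update_ensemble_weights(w, shrub_preds, y, lr=0.1, lam=0.01):
    """One stochastic proximal-gradient step on the ensemble weights.

    shrub_preds: (M, K) class scores of M shrubs on one example,
    y: true class index. Loss/regularizer choices here are assumptions.
    """
    f = w @ shrub_preds                          # ensemble scores, shape (K,)
    target = np.eye(shrub_preds.shape[1])[y]     # one-hot true label
    grad = shrub_preds @ (2.0 * (f - target))    # d loss / d w, shape (M,)
    return prox_l1(w - lr * grad, lr * lam)      # gradient step + prox step

w = np.full(5, 0.2)                              # 5 shrubs, uniform init
preds = np.random.randn(5, 3)                    # scores for 3 classes
w = update_ensemble_weights(w, preds, y=1)       # some weights may hit exactly 0
```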
【18】 Two-stage Deep Stacked Autoencoder with Shallow Learning for Network Intrusion Detection System
Link: https://arxiv.org/abs/2112.03704
Authors: Nasreen Fathima, Akshara Pramod, Yash Srivastava, Anusha Maria Thomas, Syed Ibrahim S P, Chandran K R
Affiliations: School of Computer Science and Engineering, Vellore Institute of Technology, Chennai Campus, Tamil Nadu, India; School of Electronics Engineering, Vellore Institute of Technology, Chennai Campus, Tamil Nadu, India
Note: 8 pages, 3 figures
Abstract: Sparse events, such as malign attacks in real-time network traffic, have caused big organisations an immense hike in revenue loss. This is due to the excessive growth of the network and its exposure to a plethora of people. The standard methods used to detect intrusions are not promising and fail significantly to identify new malware. Moreover, the challenges in handling high-volume data with sparsity, high false positives, lower detection rates in the minority class, training time, and feature engineering of high-dimensional data have promoted deep learning to take over the task with less time and great results. The existing systems need improvement in solving real-time network traffic issues along with feature engineering. Our proposed work overcomes these challenges by giving promising results using deep-stacked autoencoders in two stages. The two-stage deep learning is combined with shallow learning, using a random forest for classification in the second stage. This makes the model perform well on the latest Canadian Institute for Cybersecurity - Intrusion Detection System 2017 (CICIDS-2017) dataset. Zero false positives with admirable detection accuracy were achieved.
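As a rough end-to-end sketch of such a two-stage pipeline (dimensions, layer sizes and dummy data are made up here, not taken from the paper): an autoencoder is first trained on flow features for reconstruction, then its encoder output feeds a random forest for classification.

```python
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier

# Stage 1: stacked autoencoder learns a compact encoding of flow features.
X = torch.randn(512, 78)                     # toy CICIDS-2017-style feature vectors
ae = nn.Sequential(nn.Linear(78, 32), nn.ReLU(), nn.Linear(32, 16), nn.ReLU(),  # encoder
                   nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 78))             # decoder
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
for _ in range(50):                          # reconstruction training
    opt.zero_grad()
    loss = nn.functional.mse_loss(ae(X), X)
    loss.backward()
    opt.step()

# Stage 2: shallow learner (random forest) classifies the encoded features.
encoder = ae[:4]                             # keep only the encoder layers
with torch.no_grad():
    Z = encoder(X).numpy()
y = torch.randint(0, 2, (512,)).numpy()      # 0 = benign, 1 = attack (dummy labels)
clf = RandomForestClassifier(n_estimators=100).fit(Z, y)
```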
【19】 Safe Distillation Box
Link: https://arxiv.org/abs/2112.03695
Authors: Jingwen Ye, Yining Mao, Jie Song, Xinchao Wang, Cheng Jin, Mingli Song
Affiliations: Zhejiang University, Hangzhou; National University of Singapore; Fudan University
Note: Accepted at AAAI 2022
Abstract: Knowledge distillation (KD) has recently emerged as a powerful strategy to transfer knowledge from a pre-trained teacher model to a lightweight student, and has demonstrated its unprecedented success over a wide spectrum of applications. In spite of the encouraging results, the KD process per se poses a potential threat to network ownership protection, since the knowledge contained in the network can be effortlessly distilled and hence exposed to a malicious user. In this paper, we propose a novel framework, termed as Safe Distillation Box (SDB), that allows us to wrap a pre-trained model in a virtual box for intellectual property protection. Specifically, SDB preserves the inference capability of the wrapped model for all users, but precludes KD from unauthorized users. For authorized users, on the other hand, SDB carries out a knowledge augmentation scheme to strengthen the KD performances and the results of the student model. In other words, all users may employ a model in SDB for inference, but only authorized users get access to KD from the model. The proposed SDB imposes no constraints over the model architecture, and may readily serve as a plug-and-play solution to protect the ownership of a pre-trained network. Experiments across various datasets and architectures demonstrate that, with SDB, the performance of an unauthorized KD drops significantly while that of an authorized one is enhanced, demonstrating the effectiveness of SDB.
【20】 Low-rank Tensor Decomposition for Compression of Convolutional Neural Networks Using Funnel Regularization
Link: https://arxiv.org/abs/2112.03690
Authors: Bo-Shiuan Chu, Che-Rung Lee
Abstract: Tensor decomposition is one of the fundamental techniques for model compression of deep convolutional neural networks owing to its ability to reveal the latent relations among complex structures. However, most existing methods compress the networks layer by layer, which cannot provide a satisfactory solution to achieve global optimization. In this paper, we propose a model reduction method to compress the pre-trained networks using low-rank tensor decomposition of the convolution layers. Our method is based on optimization techniques to select the proper ranks of the decomposed network layers. A new regularization method, called the funnel function, is proposed to suppress the unimportant factors during the compression, so the proper ranks can be revealed much more easily. The experimental results show that our algorithm can reduce more model parameters than other tensor compression methods. For ResNet18 with ImageNet2012, our reduced model can reach more than two times speedup in terms of GMACs with merely a 0.7% Top-1 accuracy drop, which outperforms most existing methods in both metrics.
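To illustrate what a low-rank decomposition of a convolution layer looks like structurally, here is a minimal sketch of a Tucker-2-style replacement (1x1 channel compression, k x k core, 1x1 expansion). It is illustrative only: the factors below are freshly initialized, whereas a real compression pipeline would fit them to the pretrained weights and choose the rank with a criterion such as the paper's funnel regularization.

```python
import torch
import torch.nn as nn

def tucker2_replace(conv: nn.Conv2d, rank: int) -> nn.Sequential:
    """Replace a conv layer by a Tucker-2-style low-rank chain."""
    return nn.Sequential(
        nn.Conv2d(conv.in_channels, rank, kernel_size=1, bias=False),   # compress channels
        nn.Conv2d(rank, rank, kernel_size=conv.kernel_size,             # low-rank spatial core
                  stride=conv.stride, padding=conv.padding, bias=False),
        nn.Conv2d(rank, conv.out_channels, kernel_size=1,               # expand channels
                  bias=conv.bias is not None),
    )

layer = nn.Conv2d(256, 256, 3, padding=1)
low_rank = tucker2_replace(layer, rank=32)
x = torch.randn(1, 256, 56, 56)
assert low_rank(x).shape == layer(x).shape   # same interface, far fewer weights
```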
【21】 Hybrid Self-Attention NEAT: A novel evolutionary approach to improve the NEAT algorithm
Link: https://arxiv.org/abs/2112.03670
Authors: Saman Khamesian, Hamed Malek
Abstract: This article presents a "Hybrid Self-Attention NEAT" method to improve the original NeuroEvolution of Augmenting Topologies (NEAT) algorithm in high-dimensional inputs. Although the NEAT algorithm has shown a significant result in different challenging tasks, as input representations are high dimensional, it cannot create a well-tuned network. Our study addresses this limitation by using self-attention as an indirect encoding method to select the most important parts of the input. In addition, we improve its overall performance with the help of a hybrid method to evolve the final network weights. The main conclusion is that Hybrid Self-Attention NEAT can eliminate the restriction of the original NEAT. The results indicate that, in comparison with evolutionary algorithms, our model can get comparable scores in Atari games with raw pixel input with a much lower number of parameters.
【22】 Saliency Diversified Deep Ensemble for Robustness to Adversaries
Link: https://arxiv.org/abs/2112.03615
Authors: Alex Bogun, Dimche Kostadinov, Damian Borth
Affiliations: University of St. Gallen
Note: Accepted to AAAI Workshop on Adversarial Machine Learning and Beyond 2022
Abstract: Deep learning models have shown incredible performance on numerous image recognition, classification, and reconstruction tasks. Although very appealing and valuable due to their predictive capabilities, one common threat remains challenging to resolve. A specifically trained attacker can introduce malicious input perturbations to fool the network, thus causing potentially harmful mispredictions. Moreover, these attacks can succeed when the adversary has full access to the target model (white-box) and even when such access is limited (black-box setting). An ensemble of models can protect against such attacks but might be brittle under shared vulnerabilities in its members (attack transferability). To that end, this work proposes a novel diversity-promoting learning approach for deep ensembles. The idea is to promote saliency map diversity (SMD) on ensemble members to prevent the attacker from targeting all ensemble members at once by introducing an additional term in our learning objective. During training, this helps us minimize the alignment between model saliencies to reduce shared member vulnerabilities and, thus, increase ensemble robustness to adversaries. We empirically show a reduced transferability between ensemble members and improved performance compared to the state-of-the-art ensemble defense against medium and high strength white-box attacks. In addition, we demonstrate that our approach combined with existing methods outperforms state-of-the-art ensemble algorithms for defense under white-box and black-box attacks.
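A hedged sketch of what such a diversity term could look like (details differ from the paper's exact SMD formulation): compute each member's input-gradient saliency map and penalize pairwise cosine alignment, so members do not share the same sensitive input directions. The penalty would be added to the sum of the members' task losses during training.

```python
import torch
import torch.nn.functional as F

def smd_penalty(models, x, y):
    """Penalize pairwise alignment of members' input-gradient saliencies."""
    sals = []
    for m in models:
        xg = x.detach().clone().requires_grad_(True)
        loss = F.cross_entropy(m(xg), y)
        (g,) = torch.autograd.grad(loss, xg, create_graph=True)  # differentiable saliency
        sals.append(g.flatten(1))                                # (B, D) per member
    penalty = x.new_zeros(())
    for i in range(len(sals)):
        for j in range(i + 1, len(sals)):
            penalty = penalty + F.cosine_similarity(sals[i], sals[j], dim=1).mean()
    return penalty

# Toy usage with hypothetical ensemble members:
models = [torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 10))
          for _ in range(3)]
x, y = torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,))
reg = smd_penalty(models, x, y)   # add to the task losses before backprop
```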
【23】 Predict and Optimize: Through the Lens of Learning to Rank
Link: https://arxiv.org/abs/2112.03609
Authors: Jayanta Mandi, Víctor Bucarey, Maxime Mulamba, Tias Guns
Affiliations: Data Analytics Laboratory, Vrije Universiteit Brussel, Belgium; Institute of Engineering Sciences, Universidad de O'Higgins, Rancagua, Chile; Department of Computer Science, KU Leuven, Belgium
Note: Working paper
Abstract: In recent years, predict-and-optimize approaches (Elmachtoub and Grigas 2021; Wilder, Dilkina, and Tambe 2019) have received increasing attention. These are settings where the predictions of predictive machine learning (ML) models are fed to downstream optimization problems for decision making. Predict-and-optimize approaches propose to train the ML models, often neural network models, by directly optimizing the quality of decisions made by the optimization solvers. However, one major bottleneck of predict-and-optimize approaches is solving the optimization problem for each training instance at every epoch. To address this challenge, Mulamba et al. (2021) propose noise contrastive estimation by caching feasible solutions. In this work, we show that noise contrastive estimation can be considered a case of learning to rank the solution cache. We also develop pairwise and listwise ranking loss functions, which can be differentiated in closed form without the need to solve the optimization problem. By training with respect to these surrogate loss functions, we empirically show that we are able to minimize the regret of the predictions.
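A minimal sketch of the pairwise-ranking idea for a linear objective min_s c^T s (the margin, normalization, and adjacent-pair scheme below are illustrative assumptions, not the paper's exact loss): order the cached feasible solutions by their true cost and ask the predicted cost to reproduce that ordering with a hinge margin. No solver call is needed, and the loss is differentiable in the predicted costs.

```python
import torch

def pairwise_rank_loss(c_hat, cache, c_true, margin=0.1):
    """Rank the solution cache by predicted objective consistently with the
    true objective. cache: (S, n) feasible solutions; c_hat/c_true: (n,)."""
    true_obj = cache @ c_true                 # (S,) true objective values
    pred_obj = cache @ c_hat                  # (S,) predicted objective values
    order = torch.argsort(true_obj)           # best (lowest cost) first
    loss = c_hat.new_zeros(())
    for a in range(len(order) - 1):
        i, j = order[a], order[a + 1]         # s_i truly better than s_j
        loss = loss + torch.relu(margin + pred_obj[i] - pred_obj[j])
    return loss / (len(order) - 1)

cache = torch.tensor([[1., 0., 1.], [0., 1., 1.], [1., 1., 0.]])
c_true = torch.tensor([2., 1., 3.])
c_hat = torch.randn(3, requires_grad=True)    # would come from an upstream ML model
pairwise_rank_loss(c_hat, cache, c_true).backward()
```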
【24】 Handwritten Mathematical Expression Recognition via Attention Aggregation based Bi-directional Mutual Learning
Link: https://arxiv.org/abs/2112.03603
Authors: Xiaohang Bian, Bo Qin, Xiaozhe Xin, Jianwu Li, Xuefeng Su, Yanfeng Wang
Affiliations: Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science and Technology, Beijing Institute of Technology, China; AI Interaction Department, Tencent, China
Abstract: Handwritten mathematical expression recognition aims to automatically generate LaTeX sequences from given images. Currently, attention-based encoder-decoder models are widely used in this task. They typically generate target sequences in a left-to-right (L2R) manner, leaving the right-to-left (R2L) contexts unexploited. In this paper, we propose an Attention aggregation based Bi-directional Mutual learning Network (ABM) which consists of one shared encoder and two parallel inverse decoders (L2R and R2L). The two decoders are enhanced via mutual distillation, which involves one-to-one knowledge transfer at each training step, making full use of the complementary information from two inverse directions. Moreover, in order to deal with mathematical symbols in diverse scales, an Attention Aggregation Module (AAM) is proposed to effectively integrate multi-scale coverage attentions. Notably, in the inference phase, given that the model already learns knowledge from two inverse directions, we only use the L2R branch for inference, keeping the original parameter size and inference speed. Extensive experiments demonstrate that our proposed approach achieves recognition accuracies of 56.85% on CROHME 2014, 52.92% on CROHME 2016, and 53.96% on CROHME 2019 without data augmentation and model ensembling, substantially outperforming the state-of-the-art methods. The source code is available in the supplementary materials.
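A hedged sketch of the bidirectional mutual-learning idea (not the exact ABM loss; the temperature and symmetric-KL form are assumptions): align the R2L decoder's per-step distributions with the L2R ones by flipping them along the time axis, then distill each branch toward the other's detached soft targets.

```python
import torch
import torch.nn.functional as F

def mutual_distill_loss(logits_l2r, logits_r2l, tau=1.0):
    """Symmetric mutual distillation between two inverse decoders.

    logits_*: (B, T, V) per-step vocabulary logits of the two decoders.
    Step t of the R2L branch corresponds to step T-1-t of the L2R branch.
    """
    r2l_aligned = logits_r2l.flip(dims=[1])            # time-align the two branches

    def kl(student, teacher):
        return F.kl_div(F.log_softmax(student / tau, -1),
                        F.softmax(teacher.detach() / tau, -1),
                        reduction="batchmean")

    # Each branch learns from the other's (detached) predictions.
    return kl(logits_l2r, r2l_aligned) + kl(r2l_aligned, logits_l2r)

loss = mutual_distill_loss(torch.randn(4, 20, 110), torch.randn(4, 20, 110))
```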
【25】 Pragmatic Implementation of Reinforcement Algorithms For Path Finding On Raspberry Pi
Link: https://arxiv.org/abs/2112.03577
Authors: Serena Raju, Sherin Shibu, Riya Mol Raji, Joel Thomas
Affiliations: Computer Department, Fr. C. Rodrigues Institute of Technology, Navi Mumbai, India
Note: 5 pages, 7 figures
Abstract: In this paper, a pragmatic implementation of an indoor autonomous delivery system that exploits Reinforcement Learning algorithms for path planning and collision avoidance is presented. The proposed system is a cost-efficient approach that is implemented to facilitate a Raspberry Pi controlled four-wheel-drive non-holonomic robot mapping a grid. This approach computes and navigates the shortest path from a source key point to a destination key point to carry out the desired delivery. Q-learning and Deep Q-learning are used to find the optimal path while avoiding collision with static obstacles. This work defines an approach to deploy these two algorithms on a robot. A novel algorithm to decode an array of directions into accurate movements in a certain action space is also proposed. The procedure followed to dispatch this system with the said requirements is described, thus presenting our proof of concept for indoor autonomous delivery vehicles.
【26】 MESA: Offline Meta-RL for Safe Adaptation and Fault Tolerance
Link: https://arxiv.org/abs/2112.03575
Authors: Michael Luo, Ashwin Balakrishna, Brijen Thananjeyan, Suraj Nair, Julian Ibarz, Jie Tan, Chelsea Finn, Ion Stoica, Ken Goldberg
Abstract: Safe exploration is critical for using reinforcement learning (RL) in risk-sensitive environments. Recent work learns risk measures which measure the probability of violating constraints, which can then be used to enable safety. However, learning such risk measures requires significant interaction with the environment, resulting in excessive constraint violations during learning. Furthermore, these measures are not easily transferable to new environments. We cast safe exploration as an offline meta-RL problem, where the objective is to leverage examples of safe and unsafe behavior across a range of environments to quickly adapt learned risk measures to a new environment with previously unseen dynamics. We then propose MEta-learning for Safe Adaptation (MESA), an approach for meta-learning a risk measure for safe RL. Simulation experiments across 5 continuous control domains suggest that MESA can leverage offline data from a range of different environments to reduce constraint violations in unseen environments by up to a factor of 2 while maintaining task performance. See https://tinyurl.com/safe-meta-rl for code and supplementary material.
【27】 Question Answering Survey: Directions, Challenges, Datasets, Evaluation Matrices
Link: https://arxiv.org/abs/2112.03572
Authors: Hariom A. Pandya, Brijesh S. Bhatt
Affiliations: Computer Engineering Department, Dharmsinh Desai University, Nadiad, Gujarat, India
Abstract: The usage and amount of information available on the internet have increased over the past decade. This digitization leads to the need for automated answering systems to extract fruitful information from redundant and transitional knowledge sources. Such systems are designed to deliver the most prominent answer from this giant knowledge source to the user's query using natural language understanding (NLU), and thus eminently depend on the Question Answering (QA) field. Question answering involves, but is not limited to, steps like mapping the user question to a pertinent query, retrieval of relevant information, and finding the best suitable answer from the retrieved information. Current improvements in deep learning models evince compelling performance improvements in all these tasks. In this review work, the research directions of the QA field are analyzed based on the type of question, answer type, source of evidence-answer, and modeling approach. This detailing is followed by open challenges of the field, like automatic question generation, similarity detection, and low resource availability for a language. In the end, a survey of available datasets and evaluation measures is presented.
【28】 Graph Neural Controlled Differential Equations for Traffic Forecasting
Link: https://arxiv.org/abs/2112.03558
Authors: Jeongwhan Choi, Hwangyong Choi, Jeehyun Hwang, Noseong Park
Affiliations: Yonsei University, Seoul, South Korea
Note: Accepted at AAAI 2022
Abstract: Traffic forecasting is one of the most popular spatio-temporal tasks in the field of machine learning. A prevalent approach in the field is to combine graph convolutional networks and recurrent neural networks for the spatio-temporal processing. There has been fierce competition and many novel methods have been proposed. In this paper, we present the method of spatio-temporal graph neural controlled differential equation (STG-NCDE). Neural controlled differential equations (NCDEs) are a breakthrough concept for processing sequential data. We extend the concept and design two NCDEs: one for the temporal processing and the other for the spatial processing. After that, we combine them into a single framework. We conduct experiments with 6 benchmark datasets and 20 baselines. STG-NCDE shows the best accuracy in all cases, outperforming all those 20 baselines by non-trivial margins.
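For readers unfamiliar with NCDEs, the underlying mechanism is the controlled differential equation dz = f_theta(z) dX, where X is an (interpolated) path built from the observations. A minimal Euler-discretized illustration follows; it is not the authors' solver (they build on adaptive ODE solvers) and omits the spatial graph processing entirely.

```python
import torch
import torch.nn as nn

class CDEFuncSketch(nn.Module):
    """Vector field f_theta of a neural CDE: maps hidden state z to a matrix
    that is contracted with the increment dX of the control path."""
    def __init__(self, hidden, in_ch):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(hidden, 64), nn.Tanh(),
                               nn.Linear(64, hidden * in_ch))
        self.hidden, self.in_ch = hidden, in_ch

    def forward(self, z):
        return self.f(z).view(-1, self.hidden, self.in_ch)

def cde_euler(z0, X, func):
    """Euler discretization of dz = f(z) dX along a control path X.
    X: (B, T, in_ch) observations, z0: (B, hidden) initial hidden state."""
    z = z0
    for t in range(X.shape[1] - 1):
        dX = X[:, t + 1] - X[:, t]                          # control increment
        z = z + torch.bmm(func(z), dX.unsqueeze(-1)).squeeze(-1)
    return z

func = CDEFuncSketch(hidden=32, in_ch=2)
zT = cde_euler(torch.zeros(8, 32), torch.randn(8, 12, 2), func)
```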
【29】 Multi-speaker Emotional Text-to-speech Synthesizer
Link: https://arxiv.org/abs/2112.03557
Authors: Sungjae Cho, Soo-Young Lee
Affiliations: Korea Institute of Science and Technology, Republic of Korea; Korea Advanced Institute of Science and Technology, Republic of Korea
Abstract: We present a methodology to train our multi-speaker emotional text-to-speech synthesizer that can express speech for 10 speakers' 7 different emotions. All silences from audio samples are removed prior to learning. This results in fast learning by our model. Curriculum learning is applied to train our model efficiently. Our model is first trained with a large single-speaker neutral dataset, and then trained with neutral speech from all speakers. Finally, our model is trained using datasets of emotional speech from all speakers. In each stage, training samples of each speaker-emotion pair have equal probability to appear in mini-batches. Through this procedure, our model can synthesize speech for all targeted speakers and emotions. Our synthesized audio sets are available on our web page.
【30】 Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training
Link: https://arxiv.org/abs/2112.03552
Authors: Haofei Zhang, Jiarui Duan, Mengqi Xue, Jie Song, Li Sun, Mingli Song
Affiliations: Zhejiang University; Xidian University
Note: 10 pages, under review at CVPR 2022
Abstract: Recently, vision Transformers (ViTs) have been developing rapidly and starting to challenge the domination of convolutional neural networks (CNNs) in the realm of computer vision (CV). With the general-purpose Transformer architecture replacing the hard-coded inductive biases of convolution, ViTs have surpassed CNNs, especially in data-sufficient circumstances. However, ViTs are prone to over-fitting on small datasets and thus rely on large-scale pre-training, which consumes enormous time. In this paper, we strive to liberate ViTs from pre-training by introducing CNNs' inductive biases back into ViTs while preserving their network architectures for a higher upper bound and setting up more suitable optimization objectives. To begin with, an agent CNN is designed based on the given ViT with inductive biases. Then a bootstrapping training algorithm is proposed to jointly optimize the agent and the ViT with weight sharing, during which the ViT learns inductive biases from the intermediate features of the agent. Extensive experiments on CIFAR-10/100 and ImageNet-1k with limited training data have shown encouraging results: the inductive biases help ViTs converge significantly faster and outperform conventional CNNs with even fewer parameters.
【31】 Self-Organized Polynomial-Time Coordination Graphs
Link: https://arxiv.org/abs/2112.03547
Authors: Qianlan Yang, Weijun Dong, Zhizhou Ren, Jianhao Wang, Tonghan Wang, Chongjie Zhang
Affiliations: Institute for Interdisciplinary Information Sciences, Tsinghua University; Department of Computer Science, University of Illinois at Urbana-Champaign
Abstract: The coordination graph is a promising approach to model agent collaboration in multi-agent reinforcement learning. It factorizes a large multi-agent system into a suite of overlapping groups that represent the underlying coordination dependencies. One critical challenge in this paradigm is the complexity of computing maximum-value actions for a graph-based value factorization. It refers to the decentralized constraint optimization problem (DCOP), which is NP-hard, as is its constant-ratio approximation. To bypass this fundamental hardness, this paper proposes a novel method, named Self-Organized Polynomial-time Coordination Graphs (SOP-CG), which uses structured graph classes to guarantee the optimality of the induced DCOPs with sufficient function expressiveness. We extend the graph topology to be state-dependent, formulate the graph selection as an imaginary agent, and finally derive an end-to-end learning paradigm from the unified Bellman optimality equation. In experiments, we show that our approach learns interpretable graph topologies, induces effective coordination, and improves performance across a variety of cooperative multi-agent tasks.
【32】 A Conditional Point Diffusion-Refinement Paradigm for 3D Point Cloud Completion 标题:一种三维点云补全的条件点扩散-细化范式 链接:https://arxiv.org/abs/2112.03530
作者:Zhaoyang Lyu,Zhifeng Kong,Xudong Xu,Liang Pan,Dahua Lin 机构:CUHK-SenseTime Joint Lab, The Chinese University of Hong Kong, University of California, San Diego, S-Lab, Nanyang Technological University, Shanghai AI Laboratory, Centre of Perceptual and Interactive Intelligence 摘要:三维点云是捕捉真实世界三维对象的重要三维表示。然而,实际扫描的三维点云通常是不完整的,为下游应用程序恢复完整的点云非常重要。大多数现有的点云完成方法使用倒角距离(CD)损失进行训练。CD loss通过搜索最近邻来估计两个点云之间的对应关系,这不会捕获生成形状上的整体点密度分布,因此可能导致点云生成不均匀。为了解决这个问题,我们提出了一种新的点扩散细化(PDR)模式来完成点云。PDR由条件生成网络(CGNet)和优化网络(RFNet)组成。CGNet使用一种称为去噪扩散概率模型(DDPM)的条件生成模型来生成以部分观测为条件的粗略完成。DDPM在生成的点云和均匀地面真值之间建立一对一的逐点映射,然后优化均方误差损失以实现均匀生成。RFNet细化了CGNet的粗略输出,并进一步提高了完成的点云的质量。此外,我们还为这两个网络开发了一种新的双路径结构。该体系结构可以(1)从部分观测的点云中有效地提取多层次特征以指导完成;(2)精确地操纵三维点的空间位置以获得平滑的表面和清晰的细节。在各种基准数据集上的大量实验结果表明,我们的PDR范式优于以前最先进的点云完成方法。值得注意的是,在RFNet的帮助下,我们可以将DDPM的迭代生成过程加快50倍,而不会有太大的性能下降。 摘要:3D point cloud is an important 3D representation for capturing real world 3D objects. However, real-scanned 3D point clouds are often incomplete, and it is important to recover complete point clouds for downstream applications. Most existing point cloud completion methods use Chamfer Distance (CD) loss for training. The CD loss estimates correspondences between two point clouds by searching nearest neighbors, which does not capture the overall point density distribution on the generated shape, and therefore likely leads to non-uniform point cloud generation. To tackle this problem, we propose a novel Point Diffusion-Refinement (PDR) paradigm for point cloud completion. PDR consists of a Conditional Generation Network (CGNet) and a ReFinement Network (RFNet). The CGNet uses a conditional generative model called the denoising diffusion probabilistic model (DDPM) to generate a coarse completion conditioned on the partial observation. DDPM establishes a one-to-one pointwise mapping between the generated point cloud and the uniform ground truth, and then optimizes the mean squared error loss to realize uniform generation. The RFNet refines the coarse output of the CGNet and further improves quality of the completed point cloud. Furthermore, we develop a novel dual-path architecture for both networks. The architecture can (1) effectively and efficiently extract multi-level features from partially observed point clouds to guide completion, and (2) accurately manipulate spatial locations of 3D points to obtain smooth surfaces and sharp details. Extensive experimental results on various benchmark datasets show that our PDR paradigm outperforms previous state-of-the-art methods for point cloud completion. Remarkably, with the help of the RFNet, we can accelerate the iterative generation process of the DDPM by up to 50 times without much performance drop.
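For reference, the Chamfer Distance criticized above matches every point to its nearest neighbor in the other cloud, which is exactly why it is blind to how uniformly points cover a shape. A minimal NumPy version:

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point sets p (N,3) and q (M,3).
    Each point is scored against its nearest neighbor only, so two clouds
    with very different point-density distributions can still score well."""
    d2 = ((p[:, None, :] - q[None, :, :]) ** 2).sum(-1)  # (N, M) squared distances
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

p = np.random.rand(128, 3)
q = np.random.rand(256, 3)
print(chamfer_distance(p, q))
```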
【33】 UNITER-Based Situated Coreference Resolution with Rich Multimodal Input 标题:基于UNITER的富多模态输入情境共指消解 链接:https://arxiv.org/abs/2112.03521
作者:Yichen Huang,Yuchen Wang,Yik-Cheung Tam 机构: New York University Shanghai 摘要:作为第十届对话系统技术挑战赛(DSTC10)的一部分,我们介绍了我们在情境和交互式多模态对话2.0(SIMMC 2.0)数据集的多模态共指消解任务方面的工作。我们提出了一个基于UNITER的模型,利用丰富的多模态上下文(如文本对话历史、对象知识库和可视对话场景)来确定当前场景中的每个对象是否在当前对话回合中被提及。结果表明,所提出的方法大大优于官方DSTC10基线,在开发集上,对象F1得分从36.6%提高到77.3%,证明了所提出的对象表示法在丰富的多模态输入中的有效性。我们的模型在对象共指消解任务的官方评估中排名第二,模型融合后F1得分为73.3%。 摘要:We present our work on the multimodal coreference resolution task of the Situated and Interactive Multimodal Conversation 2.0 (SIMMC 2.0) dataset as a part of the tenth Dialog System Technology Challenge (DSTC10). We propose a UNITER-based model utilizing rich multimodal context such as textual dialog history, object knowledge base and visual dialog scenes to determine whether each object in the current scene is mentioned in the current dialog turn. Results show that the proposed approach outperforms the official DSTC10 baseline substantially, with the object F1 score boosted from 36.6% to 77.3% on the development set, demonstrating the effectiveness of the proposed object representations from rich multimodal input. Our model ranks second in the official evaluation on the object coreference resolution task with an F1 score of 73.3% after model ensembling.
【34】 Genetic Algorithm for Constrained Molecular Inverse Design 标题:面向约束分子逆向设计的遗传算法 链接:https://arxiv.org/abs/2112.03518
作者:Yurim Lee,Gyudam Choi,Minsung Yoon,Cheongwon Kim 机构:Department of Artificial Intelligence and Language Engineering, Sejong University, Department of Software Convergence, Communication & Media Research Laboratory, Electronics and Telecommunications Research Institute 摘要:遗传算法适合于探索较大的搜索空间,因为它能找到近似解。正是由于这一优势,遗传算法能够有效地探索分子搜索空间等广阔而未知的空间。虽然该算法适用于搜索广阔的化学空间,但在保持分子子结构的同时,很难优化药理学性质。为了解决这个问题,我们引入了一种面向约束分子逆向设计的遗传算法。该算法成功地产生了用于交叉和变异的有效分子。此外,它使用两阶段优化在遵守结构约束的同时优化特定属性。实验证明,我们的算法在保持结构约束的同时,能有效地找到满足特定性质的分子。 摘要:A genetic algorithm is suitable for exploring large search spaces as it finds an approximate solution. Because of this advantage, a genetic algorithm is effective in exploring vast and unknown spaces such as the molecular search space. Though the algorithm is suitable for searching vast chemical space, it is difficult to optimize pharmacological properties while maintaining molecular substructure. To solve this issue, we introduce a genetic algorithm featuring a constrained molecular inverse design. The proposed algorithm successfully produces valid molecules for crossover and mutation. Furthermore, it optimizes specific properties while adhering to structural constraints using a two-phase optimization. Experiments prove that our algorithm effectively finds molecules that satisfy specific properties while maintaining structural constraints.
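A generic sketch of such a constraint-aware genetic loop; the fitness, crossover, mutation, and validity callables are hypothetical placeholders for the molecular operators described above (and the loop assumes they produce valid offspring often enough), not the paper's exact operators:

```python
import random

def constrained_ga(population, fitness, crossover, mutate, is_valid,
                   n_generations=100, mutation_rate=0.2, elite=2):
    """GA skeleton with a structural-constraint filter: offspring that violate
    the constraint (e.g., lose a required substructure) are simply discarded."""
    for _ in range(n_generations):
        scored = sorted(population, key=fitness, reverse=True)
        next_pop = scored[:elite]                                  # elitism
        while len(next_pop) < len(population):
            a, b = random.sample(scored[:max(2, len(scored) // 2)], 2)  # truncation selection
            child = crossover(a, b)
            if random.random() < mutation_rate:
                child = mutate(child)
            if is_valid(child):                                    # keep constraint-satisfying offspring only
                next_pop.append(child)
        population = next_pop
    return max(population, key=fitness)
```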
【35】 Combining Learning from Human Feedback and Knowledge Engineering to Solve Hierarchical Tasks in Minecraft 标题:人类反馈学习与知识工程相结合解决“我的世界”中的分层任务 链接:https://arxiv.org/abs/2112.03482
作者:Vinicius G. Goecks,Nicholas Waytowich,David Watkins,Bharat Prakash 机构:Army Research Laboratory, Aberdeen Proving Ground, Maryland, USA, Columbia University, New York City, New York, USA, University of Maryland, Baltimore, Maryland, USA 备注:Submitted to the AAAI 2022 Spring Symposium on Machine Learning and Knowledge Engineering for Hybrid Intelligence (AAAI-MAKE 2022) 摘要:现实世界中感兴趣的任务通常由人类可读的描述来定义,并且没有预定义的奖励信号,除非由人类设计师定义。相反,数据驱动算法通常被设计用于解决特定的、狭义定义的任务,并具有驱动代理学习的性能指标。在这项工作中,我们介绍了在2021 NeurIPS竞赛MineRL BASALT挑战(Learning from Human Feedback in Minecraft)中获得第一名并被评为最像人类的智能体的解决方案;该挑战要求参与者使用人类数据来解决仅由自然语言描述定义、没有奖励函数的四个任务。我们的方法使用可用的人类演示数据来训练用于导航的模仿学习策略,并使用额外的人类反馈来训练图像分类器。这些模块连同估计的里程图,随后组合成一个状态机;该状态机根据人类对任务的知识设计,将任务按自然层次结构分解,并控制学习代理在任何时刻应遵循的宏观行为。我们将这种混合智能方法与端到端机器学习和纯工程解决方案进行比较,然后由人工评估人员进行判断。代码库可在https://github.com/viniciusguigo/kairos_minerl_basalt. 摘要:Real-world tasks of interest are generally poorly defined by human-readable descriptions and have no pre-defined reward signals unless it is defined by a human designer. Conversely, data-driven algorithms are often designed to solve a specific, narrowly defined, task with performance metrics that drive the agent's learning. In this work, we present the solution that won first place and was awarded the most human-like agent in the 2021 NeurIPS Competition MineRL BASALT Challenge: Learning from Human Feedback in Minecraft, which challenged participants to use human data to solve four tasks defined only by a natural language description and no reward function. Our approach uses the available human demonstration data to train an imitation learning policy for navigation and additional human feedback to train an image classifier. These modules, together with an estimated odometry map, are then combined into a state-machine designed based on human knowledge of the tasks that breaks them down into a natural hierarchy and controls which macro behavior the learning agent should follow at any instant. We compare this hybrid intelligence approach to both end-to-end machine learning and pure engineered solutions, which are then judged by human evaluators. Codebase is available at https://github.com/viniciusguigo/kairos_minerl_basalt.
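A toy sketch of the hand-designed state machine that dispatches between macro behaviors; the perception predicates and policies below are hypothetical stand-ins for the paper's image classifier, odometry map, and imitation-learned navigation policy:

```python
class TaskStateMachine:
    """Minimal controller sketch: a state machine decides which macro behavior
    (learned or scripted) runs at each instant, as in the hybrid design above."""

    def __init__(self, goal_visible, at_goal, navigate, finish):
        self.state = "search"
        self.goal_visible, self.at_goal = goal_visible, at_goal
        self.navigate, self.finish = navigate, finish

    def act(self, obs):
        if self.state == "search" and self.goal_visible(obs):
            self.state = "approach"
        if self.state == "approach" and self.at_goal(obs):
            self.state = "execute"
        return self.finish(obs) if self.state == "execute" else self.navigate(obs)

# Usage with trivial stubs standing in for learned modules:
fsm = TaskStateMachine(lambda o: o.get("goal_seen", False),
                       lambda o: o.get("at_goal", False),
                       lambda o: "move_forward", lambda o: "place_block")
print(fsm.act({"goal_seen": True}))  # -> "move_forward"; internal state advances to "approach"
```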
【36】 Defending against Model Stealing via Verifying Embedded External Features 标题:通过验证嵌入的外部特征来防御模型窃取 链接:https://arxiv.org/abs/2112.03476
作者:Yiming Li,Linghui Zhu,Xiaojun Jia,Yong Jiang,Shu-Tao Xia,Xiaochun Cao 机构:Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China, Peng Cheng Laboratory, Shenzhen, China, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China 备注:This work is accepted by the AAAI 2022. The first two authors contributed equally to this work. 11 pages 摘要:获得训练有素的模型需要昂贵的数据收集和训练程序,因此该模型是一项宝贵的知识产权。最近的研究表明,对手可以“窃取”部署的模型,即使他们没有训练样本,也无法获得模型参数或结构。目前,有一些防御方法可以缓解这种威胁,主要是通过增加模型窃取的成本。在本文中,我们通过验证可疑模型是否包含防御者指定的外部特征(external features)的知识,从另一个角度探讨了防御。具体来说,我们通过风格迁移修改少量训练样本来嵌入外部特征。然后,我们训练一个元分类器来确定模型是否从受害者那里被盗。这种方法的灵感来自于这样一种理解,即被盗模型应该包含受害者模型学习到的特征知识。我们在CIFAR-10和ImageNet数据集上检查了我们的方法。实验结果表明,我们的方法可以有效地同时检测不同类型的模型窃取,即使窃取的模型是通过多阶段窃取过程获得的。再现主要结果的代码可在Github上获得(https://github.com/zlh-thu/StealingVerification). 摘要:Obtaining a well-trained model involves expensive data collection and training procedures, therefore the model is a valuable intellectual property. Recent studies revealed that adversaries can `steal' deployed models even when they have no training samples and cannot get access to the model parameters or structures. Currently, there are some defense methods to alleviate this threat, mostly by increasing the cost of model stealing. In this paper, we explore the defense from another angle by verifying whether a suspicious model contains the knowledge of defender-specified external features. Specifically, we embed the external features by tampering with a few training samples via style transfer. We then train a meta-classifier to determine whether a model is stolen from the victim. This approach is inspired by the understanding that the stolen models should contain the knowledge of features learned by the victim model. We examine our method on both CIFAR-10 and ImageNet datasets. Experimental results demonstrate that our method is effective in detecting different types of model stealing simultaneously, even if the stolen model is obtained via a multi-stage stealing process. The codes for reproducing main results are available at Github (https://github.com/zlh-thu/StealingVerification).
【37】 Spectral Complexity-scaled Generalization Bound of Complex-valued Neural Networks 标题:复值神经网络的谱复杂度泛化界 链接:https://arxiv.org/abs/2112.03467
作者:Haowen Chen,Fengxiang He,Shiye Lei,Dacheng Tao 机构:University of Hong Kong 摘要:复值神经网络(CVNN)在信号处理和图像识别等领域有着广泛的应用。然而,很少有工作关注CVNN的泛化,尽管这对于确保CVNN在未知数据上的性能至关重要。本文首次证明了复值神经网络的泛化界。该界随谱复杂度缩放,其主导因子是权重矩阵的谱范数乘积。此外,当训练数据为序列数据时,我们的工作也为CVNN提供了一个泛化界,该界同样受谱复杂度影响。理论上,这些界是通过Maurey稀疏化引理和Dudley熵积分推导出来的。根据经验,我们通过在不同的数据集上训练复值卷积神经网络来进行实验:MNIST、FashionMNIST、CIFAR-10、CIFAR-100、Tiny ImageNet和IMDB。Spearman秩相关系数和这些数据集上相应的p值有力地证明了网络的谱复杂度(通过权重矩阵谱范数乘积测量)与泛化能力具有统计显著相关性。 摘要:Complex-valued neural networks (CVNNs) have been widely applied to various fields, especially signal processing and image recognition. However, few works focus on the generalization of CVNNs, although it is vital to ensure the performance of CVNNs on unseen data. This paper is the first work that proves a generalization bound for the complex-valued neural network. The bound scales with the spectral complexity, the dominant factor of which is the spectral norm product of weight matrices. Further, our work provides a generalization bound for CVNNs when training data is sequential, which is also affected by the spectral complexity. Theoretically, these bounds are derived via the Maurey Sparsification Lemma and the Dudley Entropy Integral. Empirically, we conduct experiments by training complex-valued convolutional neural networks on different datasets: MNIST, FashionMNIST, CIFAR-10, CIFAR-100, Tiny ImageNet, and IMDB. Spearman's rank-order correlation coefficients and the corresponding p-values on these datasets give strong proof that the spectral complexity of the network, measured by the spectral norm product of the weight matrices, has a statistically significant correlation with the generalization ability.
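Computing the dominant factor of the bound is straightforward; a small PyTorch helper for the spectral norm product (shown on a real-valued network for simplicity):

```python
import torch

def spectral_norm_product(model):
    """Product of the spectral norms (largest singular values) of the weight
    matrices -- the dominant factor of the spectral-complexity bound above."""
    product = 1.0
    for w in model.parameters():
        if w.dim() >= 2:                        # weight matrices / conv kernels
            mat = w.reshape(w.shape[0], -1)     # flatten conv kernels to matrices
            product *= torch.linalg.matrix_norm(mat, ord=2).item()
    return product

model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10))
print(spectral_norm_product(model))
```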
【38】 Glue: Adaptively Merging Single Table Cardinality to Estimate Join Query Size 标题:GLUE:自适应地合并单表基数以估计连接查询大小 链接:https://arxiv.org/abs/2112.03458
作者:Rong Zhu,Tianjing Zeng,Andreas Pfadler,Wei Chen,Bolin Ding,Jingren Zhou 机构:Alibaba Group,Renmin University of China 摘要:基数估计(CardEst)是查询优化器的核心组件,在DBMS中生成高质量的查询计划中起着重要作用。CardEst问题在过去几十年中得到了广泛的研究,使用了传统方法和ML增强方法。然而,CardEst中最困难的问题,即如何估计多个表上的连接查询大小,尚未得到很好的解决。目前的方法要么依赖独立性假设,要么采用开销沉重的技术,其性能仍远不能令人满意。更糟糕的是,现有的CardEst方法往往设计为只优化一个目标,即推理速度或估计精度,不能适应不同的场合。在本文中,我们提出了一个非常通用的框架,称为Glue,以应对这些挑战。它的关键思想是优雅地解耦不同表之间的相关性,并无损地合并单个表的CardEst结果以估计连接查询的大小。Glue支持使用任何现有的CardEst方法获得单个表的CardEst结果,并且可以处理任何复杂的连接模式。因此,它很容易适应具有不同性能要求的不同场景,即具有快速估计时间的OLTP或具有高估计精度的OLAP。同时,我们展示了Glue可以无缝地集成到计划搜索过程中,并且能够支持非重复值(distinct value)计数。所有这些特性都显示了在实际DBMS中部署Glue的潜在优势。 摘要:Cardinality estimation (CardEst), a central component of the query optimizer, plays a significant role in generating high-quality query plans in DBMS. The CardEst problem has been extensively studied in the last several decades, using both traditional and ML-enhanced methods. However, the hardest problem in CardEst, i.e., how to estimate the join query size on multiple tables, has not been well solved. Current methods either rely on independence assumptions or apply techniques with a heavy computational burden, whose performance is still far from satisfactory. Even worse, existing CardEst methods are often designed to optimize one goal, i.e., inference speed or estimation accuracy, which cannot adapt to different occasions. In this paper, we propose a very general framework, called Glue, to tackle these challenges. Its key idea is to elegantly decouple the correlations across different tables and losslessly merge single table CardEst results to estimate the join query size. Glue supports obtaining the single table-wise CardEst results using any existing CardEst method and can process any complex join schema. Therefore, it easily adapts to different scenarios having different performance requirements, i.e., OLTP with fast estimation time or OLAP with high estimation accuracy. Meanwhile, we show that Glue can be seamlessly integrated into the plan search process and is able to support counting the number of distinct values. All these properties exhibit the potential advances of deploying Glue in real-world DBMS.
【39】 Producing augmentation-invariant embeddings from real-life imagery 标题:从真实图像生成增广不变嵌入 链接:https://arxiv.org/abs/2112.03415
作者:Sergio Manuel Papadakis,Sanjay Addicam 摘要:本文提出了一种从真实图像中生成特征丰富、高维嵌入空间的有效方法。所生成的特征被设计为独立于社交媒体真实场景中出现的各种增强操作。我们的方法使用卷积神经网络(CNN)生成嵌入空间,并通过自动生成的增强,使用ArcFace头部来训练模型。此外,我们还提出了一种将包含相同语义信息的不同嵌入进行集成的方法,一种使用外部数据集对生成的嵌入进行规范化的方法,以及一种在ArcFace头部中使用大量类对这些模型进行快速训练的新方法。凭借这种方法,我们在2021 Facebook AI Image Similarity Challenge: Descriptor Track中获得了第二名。 摘要:This article presents an efficient way to produce feature-rich, high-dimensionality embedding spaces from real-life images. The features produced are designed to be independent from augmentations used in real-life cases which appear on social media. Our approach uses convolutional neural networks (CNN) to produce an embedding space. An ArcFace head was used to train the model by employing automatically produced augmentations. Additionally, we present a way to make an ensemble out of different embeddings containing the same semantic information, a way to normalize the resulting embedding using an external dataset, and a novel way to perform quick training of these models with a high number of classes in the ArcFace head. Using this approach we achieved 2nd place in the 2021 Facebook AI Image Similarity Challenge: Descriptor Track.
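A standard ArcFace margin head of the kind referenced above, in the common cos(theta + m) formulation with scale s; hyperparameters and wiring here are illustrative, not the authors' exact training code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """Cosine similarity between L2-normalized embeddings and class centers,
    with an additive angular margin m on the target class and scale s."""

    def __init__(self, emb_dim, n_classes, s=30.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_classes, emb_dim))
        self.s, self.m = s, m

    def forward(self, emb, labels):
        cos = F.linear(F.normalize(emb), F.normalize(self.weight)).clamp(-1 + 1e-7, 1 - 1e-7)
        theta = torch.acos(cos)
        target = F.one_hot(labels, cos.shape[1]).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cos)
        return self.s * logits  # feed to cross-entropy

head = ArcFaceHead(emb_dim=128, n_classes=1000)
emb, labels = torch.randn(8, 128), torch.randint(0, 1000, (8,))
loss = F.cross_entropy(head(emb, labels), labels)
```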
【40】 JUSTICE: A Benchmark Dataset for Supreme Court's Judgment Prediction 标题:正义:最高法院判决预测的基准数据集 链接:https://arxiv.org/abs/2112.03414
作者:Mohammad Alali,Shaayan Syed,Mohammed Alsayed,Smit Patel,Hemanth Bodala 机构:University of Southern California 备注:6 pages, 6 figures 摘要:最近,人工智能在许多领域得到了应用,法律体系也不例外。然而,就目前而言,与美国最高法院(SCOTUS)法律文件相关的注释良好的数据集数量非常有限,供公众使用。尽管最高法院的裁决是公共领域的知识,但由于每次都需要从零开始手动收集和处理数据,因此尝试对其进行有意义的工作就成了一项更大的任务。因此,我们的目标是创建一个高质量的SCOTUS法院案例数据集,以便在自然语言处理(NLP)研究和其他数据驱动的应用中方便地使用它们。此外,NLP的最新进展为我们提供了构建预测模型的工具,可用于揭示影响法院判决的模式。通过使用先进的NLP算法来分析以前的法院案例,经过训练的模型能够根据原告和被告提供的案件事实以文本形式预测和分类法院的判决;换言之,该模型通过生成最终裁决来模拟人类陪审团。 摘要:Artificial intelligence is being utilized in many domains as of late, and the legal system is no exception. However, as it stands now, the number of well-annotated datasets pertaining to legal documents from the Supreme Court of the United States (SCOTUS) is very limited for public use. Even though the Supreme Court rulings are public domain knowledge, trying to do meaningful work with them becomes a much greater task due to the need to manually gather and process that data from scratch each time. Hence, our goal is to create a high-quality dataset of SCOTUS court cases so that they may be readily used in natural language processing (NLP) research and other data-driven applications. Additionally, recent advances in NLP provide us with the tools to build predictive models that can be used to reveal patterns that influence court decisions. By using advanced NLP algorithms to analyze previous court cases, the trained models are able to predict and classify a court's judgment given the case's facts from the plaintiff and the defendant in textual format; in other words, the model is emulating a human jury by generating a final verdict.
【41】 Extrapolation Frameworks in Cognitive Psychology Suitable for Study of Image Classification Models 标题:适用于图像分类模型研究的认知心理学外推框架 链接:https://arxiv.org/abs/2112.03411
作者:Roozbeh Yousefzadeh,Jessica A. Mollick 机构:Yale Center for Medical Informatics, Yale University, and VA Connecticut Healthcare System, New Haven, CT, Department of Psychiatry, Yale School of Medicine 备注:1st Workshop on Human and Machine Decisions (WHMD 2021) at NeurIPS 2021 摘要:我们研究了深度学习图像分类模型的功能任务,并表明图像分类需要外推能力。这表明,为了理解深度学习,必须开发新的理论,因为当前的理论假设模型只是插值,留下了许多关于它们的问题没有答案。我们研究了像素空间以及由训练模型从图像中提取的特征空间(在其隐藏层中,包括预训练残差神经网络最后一个隐藏层中的64维特征空间),以及通过小波/剪切波提取的特征空间。在所有这些领域中,测试样本都大大超出了训练集的凸包,图像分类需要外推。与深度学习文献相反,在认知科学、心理学和神经科学中,外推和学习通常是同时研究的。此外,据报道,人类视觉认知和行为的许多方面都涉及外推。我们提出了一个新的外推框架,用于深度学习模型的数学研究。在我们的框架中,我们在这一特定意义上使用“外推”一词:在训练集的凸包之外(在像素空间或特征空间中)、但在训练数据所定义的特定范围之内进行外推,这与认知科学许多研究中对外推的定义一致。我们解释说,我们的外推框架可以为深度学习的开放性研究问题提供新的答案,包括其过度参数化、训练机制、分布外检测等等。我们还发现,在那些据报道深度学习并不优于简单模型的学习任务中,外推的程度可以忽略不计。 摘要:We study the functional task of deep learning image classification models and show that image classification requires extrapolation capabilities. This suggests that new theories have to be developed for the understanding of deep learning as the current theory assumes models are solely interpolating, leaving many questions about them unanswered. We investigate the pixel space and also the feature spaces extracted from images by trained models (in their hidden layers, including the 64-dimensional feature space in the last hidden layer of pre-trained residual neural networks), and also the feature space extracted by wavelets/shearlets. In all these domains, testing samples fall considerably outside the convex hull of the training sets, and image classification requires extrapolation. In contrast to the deep learning literature, in cognitive science, psychology, and neuroscience, extrapolation and learning are often studied in tandem. Moreover, many aspects of human visual cognition and behavior are reported to involve extrapolation. We propose a novel extrapolation framework for the mathematical study of deep learning models. In our framework, we use the term extrapolation in this specific way of extrapolating outside the convex hull of the training set (in the pixel space or feature space) but within the specific scope defined by the training data, the same way extrapolation is defined in many studies in cognitive science. We explain that our extrapolation framework can provide novel answers to open research problems about deep learning including their over-parameterization, their training regime, out-of-distribution detection, etc. We also see that the extent of extrapolation is negligible in learning tasks where deep learning is reported to have no advantage over simple models.
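The hull-membership question underlying this framework can be answered with a feasibility linear program: a sample x lies in the convex hull of the training points iff it is a convex combination of them. A small SciPy sketch (illustrative; the paper works with far larger, higher-dimensional feature sets):

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, points):
    """Feasibility LP: is x a convex combination of the rows of `points`?
    Solves  min 0  s.t.  points.T @ lam = x,  sum(lam) = 1,  lam >= 0."""
    n = points.shape[0]
    A_eq = np.vstack([points.T, np.ones((1, n))])
    b_eq = np.concatenate([x, [1.0]])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
    return res.success

train = np.random.rand(200, 64)                         # stand-in for a feature space
print(in_convex_hull(train.mean(axis=0), train))        # True: the mean is always inside
print(in_convex_hull(train.max(axis=0) + 1.0, train))   # False: clearly outside the hull
```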
【42】 Causal Analysis and Classification of Traffic Crash Injury Severity Using Machine Learning Algorithms 标题:基于机器学习算法的交通碰撞伤严重程度原因分析与分类 链接:https://arxiv.org/abs/2112.03407
作者:Meghna Chakraborty,Timothy Gates,Subhrajit Sinha 机构:Department of Civil and Environmental Engineering, Michigan State University, Pacific Northwest National Laboratory, Richland, WA 摘要:应用非参数方法对交通事故进行损伤严重程度的因果分析和分类受到的关注有限。本研究提出了一个方法框架,使用格兰杰因果分析进行因果推断,并采用包括决策树(DT)、随机森林(RF)、极端梯度提升(XGBoost)和深度神经网络(DNN)在内的不同机器学习技术,对发生在州际公路上的交通事故进行伤害严重程度分类。本研究中使用的数据来自2014年至2019年六年间德克萨斯州所有州际公路上的交通事故。建议的严重性分类方法的输出包括三类:致命和严重伤害(KA)碰撞、非严重和可能伤害(BC)碰撞以及仅财产损失(PDO)碰撞。格兰杰因果关系有助于确定影响碰撞严重性的最具影响力的因素,而基于学习的模型以不同的性能预测了严重性等级。格兰杰因果分析的结果确定,限速、路面和天气条件、交通量、工作区的存在、工作区内的工人以及高占用率车辆(HOV)车道等是影响碰撞严重性的最重要因素。分类器的预测性能在不同类别中产生不同的结果。具体而言,决策树和随机森林分类器分别为PDO和BC严重性提供了最佳性能,而对于数据中最稀有的KA类,深度神经网络分类器优于所有其他算法,这很可能是由于其逼近非线性模型的能力。本研究有助于丰富使用非参数方法对交通碰撞损伤严重程度进行因果分析和分类预测这一方面的有限知识。 摘要:Causal analysis and classification of injury severity applying non-parametric methods for traffic crashes has received limited attention. This study presents a methodological framework for causal inference, using Granger causality analysis, and injury severity classification of traffic crashes, occurring on interstates, with different machine learning techniques including decision trees (DT), random forest (RF), extreme gradient boosting (XGBoost), and deep neural network (DNN). The data used in this study were obtained for traffic crashes on all interstates across the state of Texas over a six-year period between 2014 and 2019. The output of the proposed severity classification approach includes three classes for fatal and severe injury (KA) crashes, non-severe and possible injury (BC) crashes, and property damage only (PDO) crashes. While Granger Causality helped identify the most influential factors affecting crash severity, the learning-based models predicted the severity classes with varying performance. The results of Granger causality analysis identified the speed limit, surface and weather conditions, traffic volume, presence of workzones, workers in workzones, and high occupancy vehicle (HOV) lanes, among others, as the most important factors affecting crash severity. The prediction performance of the classifiers yielded varying results across the different classes. Specifically, while decision tree and random forest classifiers provided the greatest performance for PDO and BC severities, respectively, for the KA class, the rarest class in the data, the deep neural net classifier outperformed all other algorithms, most likely due to its capability of approximating nonlinear models. This study contributes to the limited body of knowledge pertaining to causal analysis and classification prediction of traffic crash injury severity using non-parametric approaches.
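A minimal example of the kind of Granger-causality screening described above, using statsmodels on synthetic stand-in series (the study's actual inputs are crash-record variables):

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

# Does `candidate` help predict `severity` beyond severity's own past?
rng = np.random.default_rng(0)
candidate = rng.normal(size=300)
severity = np.roll(candidate, 2) + 0.5 * rng.normal(size=300)  # built-in lag-2 dependence

# Column order matters: the test asks whether column 2 Granger-causes column 1.
res = grangercausalitytests(np.column_stack([severity, candidate]), maxlag=4)
for lag, (tests, _) in res.items():
    f_stat, p_value = tests["ssr_ftest"][:2]
    print(f"lag={lag}: F={f_stat:.2f}, p={p_value:.4f}")  # small p suggests Granger causality
```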
【43】 A Novel Deep Parallel Time-series Relation Network for Fault Diagnosis 标题:一种用于故障诊断的新型深度并行时序关系网络 链接:https://arxiv.org/abs/2112.03405
作者:Chun Yang 机构:School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China 摘要:考虑到应用时间序列数据上下文信息的模型可以提高故障诊断性能,一些神经网络结构,如RNN、LSTM和GRU,被提出以有效地对工业过程进行建模。然而,这些模型受到串行计算的限制,因此无法实现高诊断效率。并行CNN也很难高效地实现故障诊断,因为它需要更大的卷积核或更深的结构来获得长期特征提取能力。另外,BERT模型采用绝对位置嵌入的方法将上下文信息引入模型中,会给原始数据带来噪声,因此不能直接应用于故障诊断。为了解决上述问题,本文提出了一种名为深度并行时序关系网络(DPTRN)的故障诊断模型。DPTRN主要有三个优点:(1)我们提出的时间关系单元基于全多层感知器(MLP)结构,因此,DPTRN以并行方式执行故障诊断,显著提高了计算效率。(2)通过对绝对位置嵌入的改进,我们的新型解耦位置嵌入单元可以直接应用于故障诊断,并学习上下文信息。(3)我们提出的DPTRN在特征可解释性方面具有明显的优势。我们的模型在TE和KDD-CUP99数据集上都优于其他方法,这证实了所提出的DPTRN模型的有效性、效率和可解释性。 摘要:Considering that models that apply the contextual information of time-series data could improve fault diagnosis performance, some neural network structures such as RNN, LSTM, and GRU were proposed to model industrial processes effectively. However, these models are restricted by their serial computation and hence cannot achieve high diagnostic efficiency. It is also difficult for a parallel CNN to implement fault diagnosis efficiently, because it requires larger convolution kernels or a deep structure to achieve long-term feature extraction capabilities. Besides, the BERT model applies absolute position embedding to introduce contextual information to the model, which would bring noise to the raw data and therefore cannot be applied to fault diagnosis directly. In order to address the above problems, a fault diagnosis model named deep parallel time-series relation network (DPTRN) has been proposed in this paper. There are mainly three advantages of DPTRN: (1) Our proposed time relationship unit is based on a full multilayer perceptron (MLP) structure; therefore, DPTRN performs fault diagnosis in a parallel way and improves computing efficiency significantly. (2) By improving the absolute position embedding, our novel decoupling position embedding unit can be applied to fault diagnosis directly and learn contextual information. (3) Our proposed DPTRN has an obvious advantage in feature interpretability. Our model outperforms other methods on both TE and KDD-CUP99 datasets, which confirms the effectiveness, efficiency and interpretability of the proposed DPTRN model.
【44】 Feature Importance-aware Graph Attention Network and Dueling Double Deep Q-Network Combined Approach for Critical Node Detection Problems 标题:关键节点检测问题的特征重要度图关注网络和双深度Q网络相结合的方法 链接:https://arxiv.org/abs/2112.03404
作者:Xuwei Tan,Yangming Zhou,Zhang-Hua Fu,Mengchu Zhou 机构:Department of Computer Science and Engineering, East China University of Science and Technology, Sino-US Global Logistics Institute, Shanghai Jiao Tong University, Macau Institute of Systems Engineering, Macau University of Science and Technology 备注:10 pages, 3 figures 摘要:检测稀疏网络中的关键节点在许多应用领域都很重要。关键节点问题(CNP)的目的是从一个网络中找到一组关键节点,这些节点的删除会最大程度地降低剩余网络的成对连通性。由于其一般NP难性质,最先进的CNP解决方案基于启发式方法。在设计此类方法时,通常需要领域知识和反复试验,因此需要花费大量的精力和时间。本文提出了一种特征重要性感知的图注意网络用于节点表示,并将其与决斗双深度Q网络相结合,首次提出了一种端到端的算法来求解CNP。它不需要任何特定于问题的知识,也不需要大多数现有方法所要求的标注数据集。一旦对模型进行了训练,就可以将其推广到处理各种类型的CNP(具有不同的大小和拓扑结构),而无需重新训练。在28个真实网络上进行的大量实验表明,该方法与最先进的方法具有很高的可比性。它不需要任何特定于问题的知识,因此可以适用于许多应用,包括现有方法无法处理的应用。它还可以与一些局部搜索方法相结合,进一步提高其解的质量。大量的比较结果表明了该方法在解决CNP问题上的有效性。 摘要:Detecting critical nodes in sparse networks is important in a variety of application domains. A Critical Node Problem (CNP) aims to find a set of critical nodes from a network whose deletion maximally degrades the pairwise connectivity of the residual network. Due to its general NP-hard nature, state-of-the-art CNP solutions are based on heuristic approaches. Domain knowledge and trial-and-error are usually required when designing such approaches, thus consuming considerable effort and time. This work proposes a feature importance-aware graph attention network for node representation and combines it with a dueling double deep Q-network to create an end-to-end algorithm to solve CNP for the first time. It does not need any problem-specific knowledge or labeled datasets as required by most existing methods. Once the model is trained, it can be generalized to cope with various types of CNPs (with different sizes and topological structures) without re-training. Extensive experiments on 28 real-world networks show that the proposed method is highly comparable to state-of-the-art methods. It does not require any problem-specific knowledge and, hence, can be applied to many applications, including those that existing approaches cannot handle. It can be combined with some local search methods to further improve its solution quality. Extensive comparison results are given to show its effectiveness in solving CNP.
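The CNP objective itself is easy to state in code: pairwise connectivity is the number of still-connected node pairs in the residual graph. A NetworkX sketch (brute force over single nodes only, for illustration; the general problem is NP-hard):

```python
import networkx as nx

def pairwise_connectivity(g):
    """Number of connected node pairs: sum over components of C(|c|, 2).
    CNP seeks the node subset whose removal minimizes this quantity."""
    return sum(len(c) * (len(c) - 1) // 2 for c in nx.connected_components(g))

def residual_connectivity(g, removed):
    h = g.copy()
    h.remove_nodes_from(removed)
    return pairwise_connectivity(h)

g = nx.karate_club_graph()
# Brute-force the single most critical node (illustration only).
best = min(g.nodes, key=lambda v: residual_connectivity(g, [v]))
print(pairwise_connectivity(g), best, residual_connectivity(g, [best]))
```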
【45】 Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design 标题:嵌套双曲空间降维与双曲神经网络设计 链接:https://arxiv.org/abs/2112.03402
作者:Xiran Fan,Chun-Hao Yang,Baba C. Vemuri 机构:Department of Statistics, National Taiwan University, Institute of Applied Mathematical Science, Department of CISE, University of Florida 备注:19 pages, 6 figures 摘要:双曲线神经网络由于能够有效地表示分层数据集,在最近的一段时间里受到了广泛的欢迎。开发这些网络的挑战在于嵌入空间即双曲空间的非线性。双曲空间是洛伦兹群的齐次黎曼流形。大多数现有方法(除了一些例外)使用局部线性化来定义各种操作,这些操作与欧氏空间中传统深度神经网络中使用的操作并行。在本文中,我们提出了一种新的完全双曲型神经网络,它使用了投影(嵌入)的概念,然后在双曲空间中使用了内在聚集和非线性。这里的新颖之处在于投影,该投影设计用于将数据投影到低维嵌入双曲空间,从而导致嵌套双曲空间表示独立用于降维。主要的理论贡献是在洛伦兹变换下证明了所提出的嵌入是等距的和等变的。该投影在计算上是有效的,因为它可以用简单的线性运算来表示,并且由于上述等变特性,它允许权重共享。嵌套双曲空间表示是我们网络的核心组成部分,因此,我们首先将该嵌套双曲空间表示与其他降维方法(如切线PCA、主测地分析(PGA)和HoroPCA)进行比较。基于这种等变嵌入,我们开发了一种新的全双曲图卷积神经网络结构来学习投影参数。最后,我们在几个公开的数据集上展示了我们网络的比较性能。 摘要:Hyperbolic neural networks have been popular in the recent past due to their ability to represent hierarchical data sets effectively and efficiently. The challenge in developing these networks lies in the nonlinearity of the embedding space namely, the Hyperbolic space. Hyperbolic space is a homogeneous Riemannian manifold of the Lorentz group. Most existing methods (with some exceptions) use local linearization to define a variety of operations paralleling those used in traditional deep neural networks in Euclidean spaces. In this paper, we present a novel fully hyperbolic neural network which uses the concept of projections (embeddings) followed by an intrinsic aggregation and a nonlinearity all within the hyperbolic space. The novelty here lies in the projection which is designed to project data on to a lower-dimensional embedded hyperbolic space and hence leads to a nested hyperbolic space representation independently useful for dimensionality reduction. The main theoretical contribution is that the proposed embedding is proved to be isometric and equivariant under the Lorentz transformations. This projection is computationally efficient since it can be expressed by simple linear operations, and, due to the aforementioned equivariance property, it allows for weight sharing. The nested hyperbolic space representation is the core component of our network and therefore, we first compare this ensuing nested hyperbolic space representation with other dimensionality reduction methods such as tangent PCA, principal geodesic analysis (PGA) and HoroPCA. Based on this equivariant embedding, we develop a novel fully hyperbolic graph convolutional neural network architecture to learn the parameters of the projection. Finally, we present experiments demonstrating comparative performance of our network on several publicly available data sets.
【46】 Guided Imitation of Task and Motion Planning 标题:任务与运动规划的引导式模仿 链接:https://arxiv.org/abs/2112.03386
作者:Michael James McDonald,Dylan Hadfield-Menell 机构:Massachusetts Institute of Technology 备注:16 pages, 6 figures, 2 tables, submitted to Conference on Robot Learning 2021, to be published in Proceedings of Machine Learning Research 摘要:虽然现代策略优化方法可以从感官数据中完成复杂的操作,但它们在长时间范围和多个子目标的问题上仍存在困难。另一方面,任务和运动规划(TAMP)方法可以扩展到很长的时间范围,但它们的计算成本很高,并且需要精确跟踪世界状态。我们提出了一种利用这两种方法优点的方法:我们训练策略来模仿TAMP求解器的输出。这产生了一个前馈策略,可以从感官数据完成多步骤任务。首先,我们构建了一个异步分布式TAMP求解器,该求解器能够以足够快的速度生成用于模仿学习的监督数据。然后,我们提出了一个分层策略架构,允许我们使用部分训练的控制策略来加速TAMP求解器。在具有7自由度关节控制的机器人操作任务中,部分训练的策略将规划所需的时间减少了至多2.6倍。在这些任务中,我们可以学到一个策略,该策略能在88%的情况下从物体位姿观测中解决RoboSuite四物体抓放(pick-place)任务,以及一个策略,能在79%的情况下从RGB图像中解决RoboDesk九目标基准(9个不同任务的平均值)。 摘要:While modern policy optimization methods can do complex manipulation from sensory data, they struggle on problems with extended time horizons and multiple sub-goals. On the other hand, task and motion planning (TAMP) methods scale to long horizons but they are computationally expensive and need to precisely track world state. We propose a method that draws on the strength of both methods: we train a policy to imitate a TAMP solver's output. This produces a feed-forward policy that can accomplish multi-step tasks from sensory data. First, we build an asynchronous distributed TAMP solver that can produce supervision data fast enough for imitation learning. Then, we propose a hierarchical policy architecture that lets us use partially trained control policies to speed up the TAMP solver. In robotic manipulation tasks with 7-DoF joint control, the partially trained policies reduce the time needed for planning by a factor of up to 2.6. Among these tasks, we can learn a policy that solves the RoboSuite 4-object pick-place task 88% of the time from object pose observations and a policy that solves the RoboDesk 9-goal benchmark 79% of the time from RGB images (averaged across the 9 disparate tasks).
【47】 Differentiable Generalised Predictive Coding 标题:可微广义预测编码 链接:https://arxiv.org/abs/2112.03378
作者:André Ofner,Sebastian Stober 机构:Otto-von-Guericke University, Magdeburg, Germany 摘要:本文讨论了与神经科学中的神经过程理论一致的可微动力学模型,该模型将大脑功能视为分层过滤,旨在改进解释观察结果的内部生成模型。我们的工作扩展了精确梯度预测编码的现有实现,并允许与深度神经网络集成以实现非线性潜在状态参数化。与梯度下降结合误差反向传播相比,这种基于梯度的预测编码通过优化从数据向潜在状态传播的精度加权预测误差,在每一层局部优化神经网络。预测从潜在状态向较低层反向流动。这里提出的模型,GPC,使用精确的梯度来学习较低潜在状态的层次和动力学预测。分层预测对感知内容及其结构进行编码。动态预测处理编码内容的变化。因此,层次和动态预测解决了相同潜在状态的不同方面。由于潜在状态的变化受其所代表的内容的影响,反之亦然,因此这两种途径相互作用,并允许跨时空尺度、甚至在时间上向后地编码内容-动态依赖关系的表示。我们将GPC应用于具有自适应采样率的序列数据上的各种感知任务。我们讨论了放宽线性分层模型布局假设,支持任意图结构的可能性。最后,我们勾勒出在嵌套的时空层次结构中有效感知和规划的想法,并讨论与大脑中马尔可夫毯的联系。 摘要:This paper deals with differentiable dynamical models congruent with neural process theories in neuroscience that cast brain function as hierarchical filtering aiming at the refinement of an internal generative model explaining observations. Our work extends existing implementations of predictive coding with exact gradients and allows integration with deep neural networks for non-linear latent state parameterization. In contrast to Gradient Descent in combination with error backpropagation, such gradient based predictive coding optimises neural networks locally in each layer by optimising precision-weighted prediction errors that propagate from data towards latent states. Predictions flow backwards, from latent states towards lower layers. The model suggested here, GPC, uses exact gradients to learn hierarchical and dynamical predictions of lower latent states. Hierarchical predictions encode the perceived content and its structure. Dynamical predictions address changes in the encoded content. As a result, hierarchical and dynamical predictions address different aspects of the same latent states. Since changes in latent states are influenced by the content they represent and vice versa, both pathways interact and allow encoding representations of content-dynamics dependencies across spatio-temporal scales and even backwards in time. We apply GPC to various perception tasks on sequential data with adaptive sampling rates. We discuss possibilities to relax the assumption of linearly hierarchical model layout in favour of arbitrary graph structure. Finally, we sketch out ideas for efficient perception and planning in nested spatio-temporal hierarchies and discuss the connection to Markov Blankets in the brain.
【48】 Audio Deepfake Perceptions in College Going Populations 标题:大学在校生的音频深伪认知 链接:https://arxiv.org/abs/2112.03351
作者:Gabrielle Watson,Zahra Khanjani,Vandana P. Janeja 备注:Summary of study findings 摘要:深度伪造(deepfake)是使用人工智能方法生成或操纵、以冒充真实的内容或材料。它有四种不同类型:音频、视频、图像和文本。在这项研究中,我们主要关注音频深度伪造以及人们如何感知它。目前有几种音频深度伪造生成框架,但我们选择了MelGAN,这是一种非自回归的快速音频深度伪造生成框架,需要的参数较少。本研究试图评估不同专业大学生对音频深度伪造的感知,并回答他们的背景和专业如何影响他们对AI生成的深度伪造的认知的问题。我们还从不同方面对结果进行了分析:年级水平、音频剪辑中使用的语法的复杂性、音频剪辑的长度、知道和不知道deepfake这个词的人,以及政治角度。有趣的是,研究结果显示,当一段音频片段具有政治内涵时,即使内容相当相似,它也会影响人们对其真假的看法。本研究还探讨了背景和专业如何影响人们对深度伪造的认知。 摘要:A deepfake is content or material that is generated or manipulated using AI methods to pass it off as real. There are four different deepfake types: audio, video, image and text. In this research we focus on audio deepfakes and how people perceive them. There are several audio deepfake generation frameworks, but we chose MelGAN, which is a non-autoregressive and fast audio deepfake generating framework requiring fewer parameters. This study tries to assess audio deepfake perceptions among college students from different majors. This study also answers the question of how their background and major can affect their perception towards AI generated deepfakes. We also analyzed the results based on different aspects of: grade level, complexity of the grammar used in the audio clips, length of the audio clips, those who knew the term deepfakes and those who did not, as well as the political angle. It is interesting that the results show when an audio clip has a political connotation, it can affect what people think about whether it is real or fake, even if the content is fairly similar. This study also explores the question of how background and major can affect perception towards deepfakes.
【49】 Multidimensional Assignment Problem for multipartite entity resolution 标题:面向多方实体解析的多维指派问题 链接:https://arxiv.org/abs/2112.03346
作者:Alla Kammerdiner,Alexander Semenov,Eduardo Pasiliao 摘要:多部分实体解析旨在将来自多个数据集的记录集成到一个实体中。我们推导了一个数学公式,将跨多个数据集的多部分实体解析中的一类记录链接问题表述为一个称为多维指派问题的组合优化问题。作为我们方法的动机,我们说明了多部分实体解析相对于顺序二部匹配的优势。由于该优化问题是NP难问题,我们采用贪婪算法和超大规模邻域搜索两种启发式方法来解决指派问题,并从多个数据集中找到最可能匹配到单个实体的记录。我们在合成生成的数据上评估和比较了这些算法及其修改版本的性能。我们进行了计算实验,以比较最近的启发式算法(超大规模邻域搜索)与贪婪算法(另一种MAP启发式算法)以及两种版本的遗传算法(一种通用元启发式算法)的性能。重要的是,我们进行实验,比较两种重新启动前一种启发式搜索的替代方法,即随机抽样多启动和基于确定性设计的多启动。我们发现,有证据表明,随着数据库规模的扩大,基于设计的多启动可能会更加高效。此外,我们还证明了超大规模搜索,特别是它的多启动版本,优于简单的贪婪启发式搜索。贪婪搜索与超大规模邻域搜索的混合提高了性能。使用多启动,只需额外运行三次超大规模搜索,就可以提高超大规模搜索过程的性能。最后,我们提出了一种评估超大规模邻域搜索复杂性的方法。 摘要:Multipartite entity resolution aims at integrating records from multiple datasets into one entity. We derive a mathematical formulation for a general class of record linkage problems in multipartite entity resolution across many datasets as a combinatorial optimization problem known as the multidimensional assignment problem. As a motivation for our approach, we illustrate the advantage of multipartite entity resolution over sequential bipartite matching. Because the optimization problem is NP-hard, we apply two heuristic procedures, a Greedy algorithm and very large scale neighborhood search, to solve the assignment problem and find the most likely matching of records from multiple datasets into a single entity. We evaluate and compare the performance of these algorithms and their modifications on synthetically generated data. We perform computational experiments to compare the performance of a recent heuristic, the very large-scale neighborhood search, with a Greedy algorithm, another heuristic for the MAP, as well as with two versions of a genetic algorithm, a general metaheuristic. Importantly, we perform experiments to compare two alternative methods of re-starting the search for the former heuristic, specifically a random-sampling multi-start and a deterministic design-based multi-start. We find evidence that design-based multi-start can be more efficient as the size of the databases grows large. In addition, we show that very large scale search, especially its multi-start version, outperforms the simple Greedy heuristic. Hybridization of Greedy search with very large scale neighborhood search improves the performance. Using multi-start with as few as three additional runs of very large scale search offers some improvement in the performance of the very large scale search procedure. Lastly, we propose an approach to evaluating the complexity of the very large-scale neighborhood search.
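A sketch of a Greedy-style heuristic for the multidimensional-assignment view of multipartite entity resolution; for simplicity it scores candidates against the seed record only (a star-shaped dissimilarity), which is one of several possible greedy variants, not necessarily the paper's:

```python
import numpy as np

def greedy_map(datasets, dist):
    """Seed each entity with a record from the first dataset, then greedily
    attach the closest unused record from every other dataset."""
    used = [set() for _ in datasets]
    entities = []
    for i, seed in enumerate(datasets[0]):
        entity = [i]
        for k in range(1, len(datasets)):
            cands = [j for j in range(len(datasets[k])) if j not in used[k]]
            j = min(cands, key=lambda j: dist(seed, datasets[k][j]))
            used[k].add(j)
            entity.append(j)
        entities.append(entity)
    return entities  # each entity lists the index of its record in every dataset

# Toy records: noisy copies of the same latent entities across 3 datasets.
rng = np.random.default_rng(1)
latent = rng.normal(size=(5, 4))
data = [latent + 0.05 * rng.normal(size=latent.shape) for _ in range(3)]
print(greedy_map(data, lambda a, b: float(np.linalg.norm(a - b))))
```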
【50】 Neuro-Symbolic Inductive Logic Programming with Logical Neural Networks 标题:基于逻辑神经网络的神经符号归纳逻辑程序设计 链接:https://arxiv.org/abs/2112.03324
作者:Prithviraj Sen,Breno W. S. R. de Carvalho,Ryan Riegel,Alexander Gray 机构:IBM Research 摘要:最近关于神经符号归纳逻辑编程的研究已经产生了一些有希望的方法,可以从嘈杂的真实数据中学习解释性规则。虽然一些建议使用模糊逻辑或实值逻辑中的可微算子来近似逻辑算子,这些算子是无参数的,因此降低了它们拟合数据的能力,但其他方法只是松散地基于逻辑,因此很难解释所学的“规则”。在本文中,我们提出了学习规则和最近提出的逻辑神经网络(LNN)。与其他方法相比,LNN提供了与经典布尔逻辑的强大连接,从而允许精确解释学习的规则,同时包含可通过基于梯度的优化进行训练的参数,以有效拟合数据。我们将LNNs扩展为一阶逻辑中的规则。我们在标准基准测试任务上的实验证实,LNN规则具有高度的可解释性,并且由于其灵活的参数化,可以达到相当或更高的精度。 摘要:Recent work on neuro-symbolic inductive logic programming has led to promising approaches that can learn explanatory rules from noisy, real-world data. While some proposals approximate logical operators with differentiable operators from fuzzy or real-valued logic that are parameter-free thus diminishing their capacity to fit the data, other approaches are only loosely based on logic making it difficult to interpret the learned "rules". In this paper, we propose learning rules with the recently proposed logical neural networks (LNN). Compared to others, LNNs offer strong connection to classical Boolean logic thus allowing for precise interpretation of learned rules while harboring parameters that can be trained with gradient-based optimization to effectively fit the data. We extend LNNs to induce rules in first-order logic. Our experiments on standard benchmarking tasks confirm that LNN rules are highly interpretable and can achieve comparable or higher accuracy due to their flexible parameterization.
【51】 Dynamic Graph Learning-Neural Network for Multivariate Time Series Modeling 标题:多变量时间序列建模的动态图学习-神经网络 链接:https://arxiv.org/abs/2112.03273
作者:Zhuoling Li,Gaowei Zhang,Lingyu Xu,Jie Yu 机构:School of Computer Engineering and Science, Shanghai University, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Shanghai Institute for Advanced Communication and Data Science, Shanghai University 摘要:多元时间序列预测是一项具有挑战性的任务,因为数据涉及长期和短期模式的混合,变量之间具有动态时空依赖性。现有的图神经网络(GNN)通常使用预定义的空间图或学习到的固定邻接图来建模多变量关系。这限制了GNN的应用,无法应对上述挑战。在本文中,我们提出了一个新的框架,即静态-动态图学习神经网络(SDGL)。该模型从数据中获取静态和动态图矩阵,分别对长期和短期模式进行建模。静态矩阵通过节点嵌入来捕获固定的长期关联模式,并利用图正则性来控制学习到的静态图的质量。为了捕获变量之间的动态依赖关系,我们提出了一种动态图学习方法,基于变化的节点特征和静态节点嵌入生成时变矩阵。在该方法中,我们将学习到的静态图信息整合为归纳偏置,从而更好地构建动态图并捕获局部时空模式。在两个具有额外结构信息的交通数据集和四个时间序列数据集上进行了大量实验,结果表明,我们的方法在几乎所有数据集上都达到了最先进的性能。如果论文被接受,我将在GitHub上开源代码。 摘要:Multivariate time series forecasting is a challenging task because the data involves a mixture of long- and short-term patterns, with dynamic spatio-temporal dependencies among variables. Existing graph neural networks (GNN) typically model multivariate relationships with a pre-defined spatial graph or a learned fixed adjacency graph. This limits the application of GNNs and leaves them unable to handle the above challenges. In this paper, we propose a novel framework, namely static- and dynamic-graph learning-neural network (SDGL). The model acquires static and dynamic graph matrices from data to model long- and short-term patterns respectively. The static matrix is developed to capture the fixed long-term association pattern via node embeddings, and we leverage graph regularity for controlling the quality of the learned static graph. To capture dynamic dependencies among variables, we propose a dynamic graph learning method to generate time-varying matrices based on changing node features and static node embeddings. In this method, we integrate the learned static graph information as an inductive bias to better construct dynamic graphs and capture local spatio-temporal patterns. Extensive experiments are conducted on two traffic datasets with extra structural information and four time series datasets, which show that our approach achieves state-of-the-art performance on almost all datasets. If the paper is accepted, I will open-source the code on GitHub.
【52】 Synthetic ECG Signal Generation Using Generative Neural Networks 标题:基于产生式神经网络的合成心电信号生成 链接:https://arxiv.org/abs/2112.03268
作者:Edmond Adib,Fatemeh Afghah,John J. Prevost 机构:Electrical and Computer Engineering Department, University of Texas at San Antonio (UTSA), San Antonio, TX, USA, Department of Electrical and Computer Engineering, Clemson University, Clemson, SC, USA 摘要:由于缺乏异常病例,心电图(ECG)数据集往往高度不平衡。此外,由于隐私问题,真实患者心电图的使用受到高度管制。因此,总是需要更多的ECG数据,尤其是训练自动诊断机器学习模型,当在平衡数据集上训练时,这些模型的性能更好。我们研究了生成对抗网络(GAN)家族中5种不同模型的合成心电生成能力,并比较了它们的性能,重点仅放在正常心动周期上。采用动态时间规整(DTW)、Fréchet和欧几里德距离函数对性能进行定量测量。提出并应用了五种不同的方法来评估生成的心拍。我们还提出了3个新概念(阈值、可接受心拍和生产率),并将其与上述方法结合使用,作为模型间比较的系统方法。结果表明,所有被测试的模型都能在一定程度上成功地大量生成形态特征高度相似的可接受心拍,并且所有这些模型都有可能用于扩充不平衡的数据集。然而,对生成心拍的目视检查更有利于BiLSTM-DC GAN和WGAN,因为它们产生的心拍在统计上更可接受。此外,就生产率而言,经典GAN以72%的生产率更胜一筹。 摘要:Electrocardiogram (ECG) datasets tend to be highly imbalanced due to the scarcity of abnormal cases. Additionally, the use of real patients' ECG is highly regulated due to privacy issues. Therefore, there is always a need for more ECG data, especially for the training of automatic diagnosis machine learning models, which perform better when trained on a balanced dataset. We studied the synthetic ECG generation capability of 5 different models from the generative adversarial network (GAN) family and compared their performances, the focus being only on Normal cardiac cycles. Dynamic Time Warping (DTW), Fréchet, and Euclidean distance functions were employed to quantitatively measure performance. Five different methods for evaluating generated beats were proposed and applied. We also proposed 3 new concepts (threshold, accepted beat and productivity rate) and employed them along with the aforementioned methods as a systematic way for comparison between models. The results show that all the tested models can to an extent successfully mass-generate acceptable heartbeats with high similarity in morphological features, and potentially all of them can be used to augment imbalanced datasets. However, visual inspections of generated beats favor BiLSTM-DC GAN and WGAN, as they produce statistically more acceptable beats. Also, with regards to productivity rate, the Classic GAN is superior with a 72% productivity rate.
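The threshold/accepted-beat/productivity-rate idea can be illustrated with a plain DTW distance: a generated beat counts as accepted when its DTW distance to the nearest real beat falls under a chosen threshold. This is a guess at the concept rather than the paper's exact protocol:

```python
import numpy as np

def dtw(a, b):
    """Plain O(len(a)*len(b)) dynamic-time-warping distance between 1-D signals."""
    n, m = len(a), len(b)
    d = np.full((n + 1, m + 1), np.inf)
    d[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i, j] = cost + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
    return d[n, m]

def productivity_rate(generated, references, threshold):
    """Fraction of generated beats accepted, i.e., whose DTW distance to the
    nearest real reference beat is under `threshold`."""
    accepted = sum(min(dtw(g, r) for r in references) < threshold for g in generated)
    return accepted / len(generated)
```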
【53】 Communication and Energy Efficient Slimmable Federated Learning via Superposition Coding and Successive Decoding 标题:基于叠加编码和逐次解码的通信和能量高效的可精简联邦学习 链接:https://arxiv.org/abs/2112.03267
作者:Hankyul Baek,Won Joon Yun,Soyi Jung,Jihong Park,Mingyue Ji,Joongheon Kim,Mehdi Bennis 机构:Korea University, Deakin University, The University of Utah, University of Oulu 备注:11 pages, 10 figures, presented at the International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with ICML 2021 (FL-ICML'21). arXiv admin note: substantial text overlap with arXiv:2112.02543 摘要:移动设备是大数据不可或缺的来源。联邦学习(FL)通过交换本地训练的模型而不是原始数据,在利用这些私有数据方面具有巨大的潜力。然而,移动设备通常能量有限且无线连接,FL无法灵活应对其异构且随时间变化的能量容量和通信吞吐量,从而限制了采用。基于这些问题,我们提出了一个新的能量和通信高效FL框架,即SlimFL。为了解决能量容量不均匀的问题,SlimFL中的每个设备都运行一个宽度可调的可精简神经网络(SNN)。为了解决异构通信吞吐量问题,每个全宽(1.0x)SNN模型及其半宽(0.5x)模型在传输前进行叠加编码,并在接收后根据信道质量连续解码为0.5x或1.0x模型。仿真结果表明,SlimFL可以同时训练0.5x和1.0x两个模型,具有合理的精度和收敛速度,而香草FL则需要使用2x的通信资源分别训练这两个模型。令人惊讶的是,对于较差的信道和非IID数据分布,SlimFL比香草FL在更低的能量足迹下实现了更高的精度,在此情况下香草FL收敛较慢。 摘要:Mobile devices are indispensable sources of big data. Federated learning (FL) has a great potential in exploiting these private data by exchanging locally trained models instead of their raw data. However, mobile devices are often energy limited and wirelessly connected, and FL cannot cope flexibly with their heterogeneous and time-varying energy capacity and communication throughput, limiting the adoption. Motivated by these issues, we propose a novel energy and communication efficient FL framework, coined SlimFL. To resolve the heterogeneous energy capacity problem, each device in SlimFL runs a width-adjustable slimmable neural network (SNN). To address the heterogeneous communication throughput problem, each full-width (1.0x) SNN model and its half-width (0.5x) model are superposition-coded before transmission, and successively decoded after reception as the 0.5x or 1.0x model depending on the channel quality. Simulation results show that SlimFL can simultaneously train both 0.5x and 1.0x models with reasonable accuracy and convergence speed, compared to its vanilla FL counterpart separately training the two models using 2x more communication resources. Surprisingly, SlimFL achieves even higher accuracy with lower energy footprints than vanilla FL for poor channels and non-IID data distributions, under which vanilla FL converges slowly.
【54】 Efficient Calibration of Multi-Agent Market Simulators from Time Series with Bayesian Optimization 标题:基于贝叶斯优化的时间序列多智能体市场模拟器的有效校准 链接:https://arxiv.org/abs/2112.03874
作者:Yuanlu Bai,Henry Lam,Svitlana Vyetrenko,Tucker Balch 机构:Columbia University, USA, J.P.Morgan AI Research, USA 摘要:多代理市场模拟通常用于为下游机器学习或强化学习任务创建环境,例如在将交易策略部署到实时交易之前对其进行训练或测试。在电子交易市场中,通常只能直接观察到由多个市场参与者相互作用产生的价格或交易量时间序列。因此,需要校准多智能体市场环境,以便模拟智能体交互产生的时间序列与历史数据相似——这相当于解决一个高度复杂的大规模优化问题。在本文中,我们提出了一个简单而有效的框架,用于根据历史时间序列观测值校准多智能体市场模拟器参数。首先,我们引入资格集(eligibility set)这一新概念,以绕过潜在的不可识别性问题。其次,我们推广了带有Bonferroni校正的两样本Kolmogorov-Smirnov(K-S)检验来检验两个高维时间序列分布之间的相似性,这给出了一个简单但有效的时间序列样本集之间的距离度量。第三,我们建议使用贝叶斯优化(BO)和信赖域BO(TuRBO)来最小化上述距离度量。最后,我们通过数值实验证明了该框架的有效性。 摘要:Multi-agent market simulation is commonly used to create an environment for downstream machine learning or reinforcement learning tasks, such as training or testing trading strategies before deploying them to real-time trading. In electronic trading markets only the price or volume time series, that result from interaction of multiple market participants, are typically directly observable. Therefore, multi-agent market environments need to be calibrated so that the time series that result from interaction of simulated agents resemble the historical ones -- which amounts to solving a highly complex large-scale optimization problem. In this paper, we propose a simple and efficient framework for calibrating multi-agent market simulator parameters from historical time series observations. First, we consider a novel concept of an eligibility set to bypass the potential non-identifiability issue. Second, we generalize the two-sample Kolmogorov-Smirnov (K-S) test with Bonferroni correction to test the similarity between two high-dimensional time series distributions, which gives a simple yet effective distance metric between the time series sample sets. Third, we suggest using Bayesian optimization (BO) and trust-region BO (TuRBO) to minimize the aforementioned distance metric. Finally, we demonstrate the efficiency of our framework using numerical experiments.
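A minimal version of the Bonferroni-corrected two-sample K-S comparison, applied dimension by dimension (illustrative; the paper's generalization operates on high-dimensional time-series summaries):

```python
import numpy as np
from scipy.stats import ks_2samp

def ks_bonferroni(x, y, alpha=0.05):
    """Per-dimension two-sample K-S tests with a Bonferroni-corrected level.
    Returns the largest K-S statistic as a simple distance between the two
    sample sets, plus whether equality is rejected in any dimension."""
    results = [ks_2samp(x[:, d], y[:, d]) for d in range(x.shape[1])]
    stat = max(r.statistic for r in results)
    reject = min(r.pvalue for r in results) < alpha / x.shape[1]  # Bonferroni
    return stat, reject

sim = np.random.default_rng(0).normal(size=(500, 3))              # simulated summaries
hist = np.random.default_rng(1).normal(0.5, 1.0, size=(500, 3))   # "historical" summaries
print(ks_bonferroni(sim, hist))  # clear mean shift, so rejection is expected
```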
【55】 Hard Sample Aware Noise Robust Learning for Histopathology Image Classification 标题:硬样本感知噪声鲁棒学习在组织病理学图像分类中的应用 链接:https://arxiv.org/abs/2112.03694
作者:Chuang Zhu,Wenkai Chen,Ting Peng,Ying Wang,Mulan Jin 机构:School of Artificial Intelligence, Beijing University of Posts and Telecommunications 备注:14 pages, 20 figures, IEEE Transactions on Medical Imaging 摘要:基于深度学习的组织病理学图像分类是帮助医生提高癌症诊断准确性和及时性的关键技术。然而,在复杂的人工标注过程中,标签噪声往往是不可避免的,从而误导了分类模型的训练。在这项工作中,我们介绍了一种新的硬样本感知噪声鲁棒学习方法用于组织病理学图像分类。为了区分信息性硬样本和有害噪声样本,我们利用样本训练历史建立了易/硬/噪声(EHN)检测模型。然后,我们将EHN集成到一个自训练结构中,通过逐步标记校正来降低噪声率。利用获得的几乎干净的数据集,我们进一步提出了一种噪声抑制和硬增强(NSHE)方案来训练噪声鲁棒模型。与以前的工作相比,我们的方法可以节省更多的干净样本,并且可以直接应用于真实的有噪声数据集场景,而不需要使用干净的子集。实验结果表明,无论是在合成数据集还是在真实噪声数据集,该方法都优于目前最新的方法。源代码和数据可在https://github.com/bupt-ai-cz/HSA-NRL/. 摘要:Deep learning-based histopathology image classification is a key technique to help physicians in improving the accuracy and promptness of cancer diagnosis. However, the noisy labels are often inevitable in the complex manual annotation process, and thus mislead the training of the classification model. In this work, we introduce a novel hard sample aware noise robust learning method for histopathology image classification. To distinguish the informative hard samples from the harmful noisy ones, we build an easy/hard/noisy (EHN) detection model by using the sample training history. Then we integrate the EHN into a self-training architecture to lower the noise rate through gradually label correction. With the obtained almost clean dataset, we further propose a noise suppressing and hard enhancing (NSHE) scheme to train the noise robust model. Compared with the previous works, our method can save more clean samples and can be directly applied to the real-world noisy dataset scenario without using a clean subset. Experimental results demonstrate that the proposed scheme outperforms the current state-of-the-art methods in both the synthetic and real-world noisy datasets. The source code and data are available at https://github.com/bupt-ai-cz/HSA-NRL/.
【56】 QKSA: Quantum Knowledge Seeking Agent -- resource-optimized reinforcement learning using quantum process tomography 标题:QKSA:量子知识寻求Agent--基于量子过程层析成像的资源优化强化学习 链接:https://arxiv.org/abs/2112.03643
作者:Aritra Sarkar,Zaid Al-Ars,Harshitta Gandhi,Koen Bertels 机构:Department of Quantum & Computer Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands, Guru Gobind Singh Indraprastha University, Dwarka, India, QBee.eu, Leuven, Belgium 备注:superseding initial QKSA framework as presented in arXiv:2107.01429 摘要:在本研究中,我们将人工通用智能的通用强化学习(URL)代理模型扩展到量子环境。将经典探索性随机知识搜索代理KL-KSA的效用函数推广到密度矩阵上量子信息理论的距离测度。量子过程层析成像(QPT)算法形成了环境动力学建模程序的可处理子集。基于基于算法复杂度和计算资源复杂度的可变代价函数选择最优QPT策略。我们不使用图灵机,而是在高级语言上估计成本指标,以便进行真实的实验。整个代理设计封装在一个自复制的quine中,该quine根据最优策略选择方案的预测值对成本函数进行变异。因此,具有帕累托最优QPT策略的多个代理使用遗传规划进化,模仿物理理论的发展,每个代理具有不同的资源权衡。这种形式化的框架称为量子知识寻求代理(QKSA)。尽管量子强化学习非常重要,但与当前量子机器学习的发展方向相比,量子强化学习模型并不多见。QKSA是第一个类似于经典URL模型的框架提案。类似于AIXI tl是Solomonoff universal感应的资源受限主动版本,QKSA是最近提出的基于算法信息的量子力学重建的资源受限参与式观察者框架。QKSA可用于模拟和研究量子信息论的各个方面。具体来说,我们证明了它可以用来加速量子变分算法,其中包括层析重建作为其积分子程序。 摘要:In this research, we extend the universal reinforcement learning (URL) agent models of artificial general intelligence to quantum environments. The utility function of a classical exploratory stochastic Knowledge Seeking Agent, KL-KSA, is generalized to distance measures from quantum information theory on density matrices. Quantum process tomography (QPT) algorithms form the tractable subset of programs for modeling environmental dynamics. The optimal QPT policy is selected based on a mutable cost function based on algorithmic complexity as well as computational resource complexity. Instead of Turing machines, we estimate the cost metrics on a high-level language to allow realistic experimentation. The entire agent design is encapsulated in a self-replicating quine which mutates the cost function based on the predictive value of the optimal policy choosing scheme. Thus, multiple agents with pareto-optimal QPT policies evolve using genetic programming, mimicking the development of physical theories each with different resource trade-offs. This formal framework is termed Quantum Knowledge Seeking Agent (QKSA). Despite its importance, few quantum reinforcement learning models exist in contrast to the current thrust in quantum machine learning. QKSA is the first proposal for a framework that resembles the classical URL models. Similar to how AIXI-tl is a resource-bounded active version of Solomonoff universal induction, QKSA is a resource-bounded participatory observer framework to the recently proposed algorithmic information-based reconstruction of quantum mechanics. QKSA can be applied for simulating and studying aspects of quantum information theory. Specifically, we demonstrate that it can be used to accelerate quantum variational algorithms which include tomographic reconstruction as its integral subroutine.
【57】 A Time-domain Generalized Wiener Filter for Multi-channel Speech Separation 标题:一种用于多通道语音分离的时域广义维纳滤波器 链接:https://arxiv.org/abs/2112.03533
作者:Yi Luo 机构:Tencent AI Lab, Shenzhen, China 摘要:频域神经波束形成器是目前多通道语音分离模型的主流方法。尽管这些频域波束形成器行为明确且有效,但它们仍然存在oracle性能受限的局限性,并且难以为复数值运算设计合适的网络。在本文中,我们提出了一种时域广义维纳滤波器(TD-GWF),它是对传统频域波束形成器的扩展,具有更高的oracle性能,并且只涉及实值运算。我们还讨论了TD-GWF如何与传统的频域波束形成器相联系。实验结果表明,在最近提出的顺序神经波束形成管道中,用TD-GWF代替频域波束形成器可以显著提高性能。 摘要:Frequency-domain neural beamformers are the mainstream methods for recent multi-channel speech separation models. Despite their well-defined behaviors and effectiveness, such frequency-domain beamformers still have the limitations of a bounded oracle performance and the difficulty of designing proper networks for the complex-valued operations. In this paper, we propose a time-domain generalized Wiener filter (TD-GWF), an extension to the conventional frequency-domain beamformers that has higher oracle performance and only involves real-valued operations. We also provide discussions on how TD-GWF can be connected to conventional frequency-domain beamformers. Experiment results show that a significant performance improvement can be achieved by replacing frequency-domain beamformers with the TD-GWF in the recently proposed sequential neural beamforming pipelines.
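The flavor of a time-domain Wiener solution can be sketched as a multichannel least-squares FIR filter mapping mixture channels to a target signal, using only real-valued operations; this is a generic construction, not the paper's exact TD-GWF formulation:

```python
import numpy as np

def fit_time_domain_filter(mix, target, taps=16):
    """Find per-channel FIR taps w so that sum_c (mix[c] * w[c]) approximates
    `target` in the least-squares sense (a time-domain Wiener solution)."""
    n_ch, n = mix.shape
    X = np.zeros((n, n_ch * taps))
    for c in range(n_ch):
        for t in range(taps):
            X[t:, c * taps + t] = mix[c, : n - t]   # delayed copies of channel c
    w, *_ = np.linalg.lstsq(X, target, rcond=None)
    return w.reshape(n_ch, taps), X @ w             # taps and the filtered estimate

rng = np.random.default_rng(0)
src = rng.normal(size=4000)
mix = np.stack([np.convolve(src, rng.normal(size=8), mode="full")[:4000] for _ in range(2)])
taps, est = fit_time_domain_filter(mix, src)
print(np.mean((est - src) ** 2) / np.mean(src ** 2))  # relative error, near 0 here
```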
【58】 Training Deep Models to be Explained with Fewer Examples 标题:训练深度模型使其能用更少的示例来解释 链接:https://arxiv.org/abs/2112.03508
作者:Tomoharu Iwata,Yuya Yoshikawa 机构:NTT Communication Science Laboratories, Software Technology and Artificial Intelligence Research Laboratory, Chiba Institute of Technology 摘要:尽管深度模型具有很高的预测性能,但人类很难理解他们所做的预测。解释性对于真实应用程序来说很重要,以证明其可靠性。已经提出了许多基于示例的解释方法,例如重新输入点选择,其中由一组训练示例定义的解释模型用于解释预测模型。为了提高解释性,减少解释模型中的示例数非常重要。然而,使用较少实例的解释可能是不可靠的,因为用这种基于实例的解释模型很难很好地逼近预测模型。不忠实的解释意味着可解释模型的预测与预测模型的预测不同。我们提出了一种训练深度模型的方法,使得它们的预测能够被解释模型用少量的例子忠实地解释。我们使用稀疏正则化器同时训练预测和解释模型,以减少示例数。该方法可用于任何基于神经网络的预测模型。使用多个数据集的实验表明,该方法在保持预测性能的同时提高了信度。 摘要:Although deep models achieve high predictive performance, it is difficult for humans to understand the predictions they made. Explainability is important for real-world applications to justify their reliability. Many example-based explanation methods have been proposed, such as representer point selection, where an explanation model defined by a set of training examples is used for explaining a prediction model. For improving the interpretability, reducing the number of examples in the explanation model is important. However, the explanations with fewer examples can be unfaithful since it is difficult to approximate prediction models well by such example-based explanation models. The unfaithful explanations mean that the predictions by the explainable model are different from those by the prediction model. We propose a method for training deep models such that their predictions are faithfully explained by explanation models with a small number of examples. We train the prediction and explanation models simultaneously with a sparse regularizer for reducing the number of examples. The proposed method can be incorporated into any neural network-based prediction models. Experiments using several datasets demonstrate that the proposed method improves faithfulness while keeping the predictive performance.
【59】 RSBNet: One-Shot Neural Architecture Search for A Backbone Network in Remote Sensing Image Recognition 标题:RSBNet:遥感图像识别中骨干网络的one-shot神经结构搜索 链接:https://arxiv.org/abs/2112.03456
作者:Cheng Peng,Yangyang Li,Ronghua Shang,Licheng Jiao 机构:Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, School of Artificial Intelligence, Xidian University, Xi’an, China 摘要:近年来，大量基于深度学习的方法已成功应用于各种遥感图像(RSI)识别任务。然而，目前RSI领域的深度学习方法大多依赖于人工设计的主干网络所提取的特征，由于RSI的复杂性和先验知识的局限性，这严重限制了深度学习模型的潜力。在本文中，我们研究了RSI识别任务(包括场景分类、土地覆盖分类和目标检测)中主干网络结构的一种新设计范式。提出了一种新的基于权重共享策略和进化算法的one-shot结构搜索框架RSBNet，该框架包括三个阶段：首先，基于集成单路径训练策略，在自行收集整理的大规模RSI数据集上预训练在分层搜索空间中构造的超网。然后，通过可切换的识别模块为预训练的超网配备不同的识别头，并分别在目标数据集上进行微调，以获得特定于任务的超网。最后，在不需要任何网络训练的情况下，基于进化算法为不同识别任务搜索最优主干结构。在五个基准数据集上对不同的识别任务进行了大量实验，结果表明了所提出的搜索范式的有效性，并证明了所搜索到的主干能够灵活地适应不同的RSI识别任务，并取得了令人印象深刻的性能。 摘要:Recently, a massive number of deep learning based approaches have been successfully applied to various remote sensing image (RSI) recognition tasks. However, most existing advances of deep learning methods in the RSI field rely heavily on the features extracted by manually designed backbone networks, which severely hinders the potential of deep learning models due to the complexity of RSI and the limitation of prior knowledge. In this paper, we investigate a new design paradigm for the backbone architecture in RSI recognition tasks, including scene classification, land-cover classification and object detection. A novel one-shot architecture search framework based on a weight-sharing strategy and an evolutionary algorithm, called RSBNet, is proposed; it consists of three stages. First, a supernet constructed in a layer-wise search space is pretrained on a self-assembled large-scale RSI dataset using an ensemble single-path training strategy. Next, the pretrained supernet is equipped with different recognition heads through a switchable recognition module and fine-tuned on each target dataset to obtain a task-specific supernet. Finally, we search for the optimal backbone architecture for each recognition task with an evolutionary algorithm, without any network training. Extensive experiments on five benchmark datasets covering different recognition tasks show the effectiveness of the proposed search paradigm and demonstrate that the searched backbone can flexibly adapt to different RSI recognition tasks and achieve impressive performance.
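The training-free final stage can be illustrated as follows (all names are hypothetical and the fitness function is a stub standing in for supernet-based validation accuracy): an evolutionary loop mutates layer-wise operator choices and ranks candidate paths without any gradient updates.

```python
# Toy sketch of evolutionary search over a weight-sharing supernet's paths;
# our own illustration under stated assumptions, not the released RSBNet code.
import random

OPS = ["conv3x3", "conv5x5", "sep3x3", "skip"]
NUM_LAYERS = 12

def evaluate_path(path):
    # Stand-in for: run validation data through the pretrained supernet
    # restricted to `path` and return accuracy. Deterministic per path.
    rng = random.Random(hash(tuple(path)))
    return sum(op != "skip" for op in path) + rng.random()

def mutate(path, p=0.1):
    return [random.choice(OPS) if random.random() < p else op for op in path]

def search(pop_size=20, generations=30):
    population = [[random.choice(OPS) for _ in range(NUM_LAYERS)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=evaluate_path, reverse=True)  # fittest paths first
        parents = population[: pop_size // 2]
        children = [mutate(random.choice(parents)) for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=evaluate_path)

print(search())
```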
【60】 Quality control for more reliable integration of deep learning-based image segmentation into medical workflows 标题:将基于深度学习的图像分割更可靠地集成到医疗工作流中的质量控制 链接:https://arxiv.org/abs/2112.03277
作者:Elena Williams,Sebastian Niehaus,Janis Reinelt,Alberto Merola,Paul Glad Mihai,Ingo Roeder,Nico Scherf,Maria del C. Valdés Hernández 机构:AICURA medical, Bessemerstrasse, Berlin, Germany; Centre for Clinical Brain Sciences, University of Edinburgh; Institute for Medical Informatics and Biometry, Technische Universität Dresden, Fetscherstrasse, Dresden, Germany 备注:25 pages 摘要:机器学习算法是现代诊断辅助软件的基础，此类软件已在临床实践(尤其是放射学)中被证明是有价值的。然而，主要由于可用于训练这些算法的临床样本有限而造成的不准确性，妨碍了它们在临床医生中的广泛适用性、接受度和认可度。我们分析了可以在这些算法中实现、用于估计其输出确定性的最先进的自动质量控制(QC)方法。我们在一项脑图像分割任务中验证了最有希望的方法，即在磁共振成像数据中识别白质高信号(WMH)。WMH与成年中后期常见的小血管疾病相关，由于其大小和分布模式多变，分割尤其具有挑战性。我们的结果表明，不确定性聚合和Dice预测在该任务的失败检测中最为有效。两种方法各自独立地将平均Dice从0.82提高到0.84。我们的工作揭示了QC方法如何帮助检测分割失败的案例，从而使自动分割更可靠、更适合临床实践。 摘要:Machine learning algorithms underpin modern diagnostic-aiding software, which has proved valuable in clinical practice, particularly in radiology. However, inaccuracies, mainly due to the limited availability of clinical samples for training these algorithms, hamper their wider applicability, acceptance, and recognition amongst clinicians. We present an analysis of state-of-the-art automatic quality control (QC) approaches that can be implemented within these algorithms to estimate the certainty of their outputs. We validated the most promising approaches on a brain image segmentation task identifying white matter hyperintensities (WMH) in magnetic resonance imaging data. WMH are a correlate of small vessel disease common in mid-to-late adulthood and are particularly challenging to segment due to their varied size and distribution patterns. Our results show that the aggregation of uncertainty and Dice prediction were the most effective in failure detection for this task. Both methods independently improved the mean Dice from 0.82 to 0.84. Our work reveals how QC methods can help to detect failed segmentation cases and therefore make automatic segmentation more reliable and suitable for clinical practice.
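The two QC signals highlighted above lend themselves to a compact sketch (our simplified rendition with synthetic inputs, not the paper's pipeline): aggregate per-voxel predictive entropy over Monte Carlo forward passes into one per-case scalar, and compute the reference Dice score that a Dice-prediction regressor would be trained to estimate; cases whose signals cross a threshold are flagged for human review.

```python
# Minimal numpy sketch of segmentation QC signals; an illustration under our
# own assumptions, not the authors' implementation.
import numpy as np

def aggregated_uncertainty(mc_probs):
    """mc_probs: (N_samples, H, W) foreground probabilities, e.g. from MC dropout."""
    p = mc_probs.mean(axis=0)
    entropy = -(p * np.log(p + 1e-8) + (1 - p) * np.log(1 - p + 1e-8))
    return entropy.mean()  # one scalar per case; high values suggest failure

def dice(pred, ref):
    inter = np.logical_and(pred, ref).sum()
    return 2.0 * inter / (pred.sum() + ref.sum() + 1e-8)

rng = np.random.default_rng(0)
mc = rng.uniform(0.4, 0.6, size=(20, 64, 64))      # near-chance, uncertain case
print("uncertainty:", aggregated_uncertainty(mc))  # high -> candidate failure
seg = mc.mean(axis=0) > 0.5
print("dice sanity check:", dice(seg, seg))        # identical masks give 1.0
```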
机器翻译,仅供参考