cs.RO (Robotics): 15 papers in total
【1】 Task-Driven Out-of-Distribution Detection with Statistical Guarantees for Robot Learning
Authors: Alec Farid, Sushant Veer, Anirudha Majumdar
Affiliation: Department of Mechanical and Aerospace Engineering, Princeton University
Link: https://arxiv.org/abs/2106.13703
Abstract: Our goal is to perform out-of-distribution (OOD) detection, i.e., to detect when a robot is operating in environments that are drawn from a different distribution than the environments used to train the robot. We leverage Probably Approximately Correct (PAC)-Bayes theory in order to train a policy with a guaranteed bound on performance on the training distribution. Our key idea for OOD detection then relies on the following intuition: violation of the performance bound on test environments provides evidence that the robot is operating OOD. We formalize this via statistical techniques based on p-values and concentration inequalities. The resulting approach (i) provides guaranteed confidence bounds on OOD detection, and (ii) is task-driven and sensitive only to changes that impact the robot's performance. We demonstrate our approach on a simulated example of grasping objects with unfamiliar poses or shapes. We also present both simulation and hardware experiments for a drone performing vision-based obstacle avoidance in unfamiliar environments (including wind disturbances and different obstacle densities). Our examples demonstrate that we can perform task-driven OOD detection within just a handful of trials. Comparisons with baselines also demonstrate the advantages of our approach in terms of providing statistical guarantees and being insensitive to task-irrelevant distribution shifts.
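A minimal sketch of the intuition in entry 【1】, assuming per-trial costs bounded in [0, 1] and using Hoeffding's inequality as the concentration bound. The abstract does not say which inequality the authors use, so this is one plausible instance rather than their method. The function name `hoeffding_p_value` and all numbers are hypothetical, and the sketch treats the performance bound as if it held exactly, ignoring the failure probability of the PAC-Bayes bound itself.

```python
import math


def hoeffding_p_value(test_costs, cost_bound):
    """Upper bound on the p-value for the null hypothesis that the test
    environments come from the training distribution.

    test_costs: per-trial costs in [0, 1] observed at test time.
    cost_bound: guaranteed upper bound on the expected cost under the
                training distribution (assumed given, e.g. from PAC-Bayes).
    """
    n = len(test_costs)
    if n == 0:
        raise ValueError("need at least one test trial")
    mean_cost = sum(test_costs) / n
    # Only a violation of the bound counts as evidence against the null.
    gap = max(mean_cost - cost_bound, 0.0)
    # Hoeffding: P(empirical mean - true mean >= gap) <= exp(-2 n gap^2).
    return math.exp(-2.0 * n * gap * gap)


# Hypothetical numbers: expected cost guaranteed to be at most 0.5,
# then 9 failures (cost 1) in 10 test trials.
costs = [1.0] * 9 + [0.0]
p = hoeffding_p_value(costs, cost_bound=0.5)
is_ood = p < 0.05  # flag OOD at the 5% significance level
```

With these illustrative numbers the bound is violated by 0.4 over 10 trials, so the p-value bound is exp(-3.2), roughly 0.04, which is consistent with the abstract's claim that a handful of trials can suffice when the violation is large.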
【2】 Active Learning in Robotics: A Review of Control Principles
Authors: Annalisa T. Taylor, Thomas A. Berrueta, Todd D. Murphey
Affiliation: Mechanical Engineering, Northwestern University, Sheridan Rd., Evanston, Illinois, United States
Link: https://arxiv.org/abs/2106.13697
Abstract: Active learning is a decision-making process. In both abstract and physical settings, active learning demands both analysis and action. This is a review of active learning in robotics, focusing on methods amenable to the demands of embodied learning systems. Robots must be able to learn efficiently and flexibly through continuous online deployment. This poses a distinct set of control-oriented challenges -- one must choose suitable measures as objectives, synthesize real-time control, and produce analyses that guarantee performance and safety with limited knowledge of the environment or robot itself. In this work, we survey the fundamental components of robotic active learning systems. We discuss classes of learning tasks that robots typically encounter, measures with which they gauge the information content of observations, and algorithms for generating action plans. Moreover, we provide a variety of examples -- from environmental mapping to nonparametric shape estimation -- that highlight the qualitative differences between learning tasks, information measures, and control techniques. We conclude with a discussion of control-oriented open challenges, including safety-constrained learning and distributed learning.
【3】 Move Beyond Trajectories: Distribution Space Coupling for Crowd Navigation
Authors: Muchen Sun, Francesca Baldini, Peter Trautman, Todd Murphey
Affiliations: Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA; Honda Research Institute, San Jose, CA, USA; California Institute of Technology, Pasadena, CA, USA
Link: https://arxiv.org/abs/2106.13667
Abstract: Cooperatively avoiding collision is a critical functionality for robots navigating in dense human crowds, failure of which could lead to either overaggressive or overcautious behavior. A necessary condition for cooperative collision avoidance is to couple the prediction of the agents' trajectories with the planning of the robot's trajectory. However, it is unclear whether trajectory-based cooperative collision avoidance captures the correct agent attributes. In this work we migrate from trajectory-based coupling to a formalism that couples agent preference distributions. In particular, we show that preference distributions (probability density functions representing agents' intentions) can capture higher-order statistics of agent behaviors, such as willingness to cooperate. Thus, coupling in distribution space exploits more information about inter-agent cooperation than coupling in trajectory space. We thus introduce a general objective for coupled prediction and planning in distribution space, and propose an iterative best response optimization method based on variational analysis with guaranteed sufficient decrease.
Based on this analysis, we develop a sampling-based motion planning framework called DistNav that runs in real time on a laptop CPU. We evaluate our approach on challenging scenarios from both real world datasets and simulation environments, and benchmark against a wide variety of model based and machine learning based approaches. The safety and efficiency statistics of our approach outperform all other models. Finally, we find that DistNav is competitive with human safety and efficiency performance.
【4】 Navigating A Mobile Robot Using Switching Distributed Sensor Networks
Authors: Xingkang He, Ehsan Hashemi, Karl H. Johansson
Affiliations: School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology; Department of Mechanical Engineering, University of Alberta (Hashemi)
Link: https://arxiv.org/abs/2106.13529
Abstract: This paper proposes a method to navigate a mobile robot by estimating its state over a number of distributed sensor networks (DSNs) such that it can successively accomplish a sequence of tasks, i.e., its state enters each targeted set and stays inside no less than the desired time, under a resource-aware, time-efficient, and computation- and communication-constrained setting. We propose a new robot state estimation and navigation architecture, which integrates an event-triggered task-switching feedback controller for the robot and a two-time-scale distributed state estimator for each sensor. The architecture has three major advantages over existing approaches: first, in each task only one DSN is active for sensing and estimating the robot state, and for different tasks the robot can switch the active DSN by taking resource saving and system performance into account; second, the robot only needs to communicate with one active sensor at each time to obtain its state information from the active DSN; third, no online optimization is required. With the controller, the robot is able to accomplish a task by following a reference trajectory and switch to the next task when an event-triggered condition is fulfilled.
With the estimator, each active sensor is able to estimate the robot state. Under proper conditions, we prove that the state estimation error and the trajectory tracking deviation are upper bounded by two time-varying sequences respectively, which play an essential role in the event-triggered condition. Furthermore, we find a sufficient condition for accomplishing a task and provide an upper bound of running time for the task. Numerical simulations of an indoor robot's localization and navigation are provided to validate the proposed architecture.
【5】 Non-Parametric Neuro-Adaptive Control Subject to Task Specifications
Authors: Christos K. Verginis, Zhe Xu, Ufuk Topcu
Affiliations: University of Texas at Austin; Arizona State University
Link: https://arxiv.org/abs/2106.13498
Abstract: We develop a learning-based algorithm for the control of robotic systems governed by unknown, nonlinear dynamics to satisfy tasks expressed as signal temporal logic specifications. Most existing algorithms either assume certain parametric forms for the dynamic terms or resort to unnecessarily large control inputs (e.g., using reciprocal functions) in order to provide theoretical guarantees. The proposed algorithm avoids the aforementioned drawbacks by innovatively integrating neural network-based learning with adaptive control. More specifically, the algorithm learns a controller, represented as a neural network, using training data that correspond to a collection of different tasks and robot parameters. It then incorporates this neural network into an online closed-loop adaptive control mechanism in such a way that the resulting behavior satisfies a user-defined task. The proposed algorithm does not use any information on the unknown dynamic terms or any approximation schemes. We provide formal theoretical guarantees on the satisfaction of the task and we demonstrate the effectiveness of the algorithm in a virtual simulator using a 6-DOF robotic manipulator.
【6】 Collision Avoidance for Unmanned Aerial Vehicles in the Presence of Static and Moving Obstacles
Authors: Andrei Marchidan, Efstathios Bakolas
Affiliation: The University of Texas at Austin, Austin, TX
Link: https://arxiv.org/abs/2106.13451
Abstract: This paper presents a new collision avoidance procedure for unmanned aerial vehicles in the presence of static and moving obstacles. The proposed procedure is based on a new form of local parametrized guidance vector fields, called collision avoidance vector fields, that produce smooth and intuitive maneuvers around obstacles. The maneuvers follow nominal collision-free paths which we refer to as streamlines of the collision avoidance vector fields. In the case of multiple obstacles, the proposed procedure determines a mixed vector field that blends the collision avoidance vector field of each obstacle and assumes its form whenever a pre-defined distance threshold is reached. Then, in accordance with the computed guidance vector fields, different collision avoidance controllers that generate collision-free maneuvers are developed. Furthermore, it is shown that any tracking controller with convergence guarantees can be used with the avoidance controllers to track the streamlines of the collision avoidance vector fields. Finally, numerical simulations demonstrate the efficacy of the proposed approach and its ability to avoid collisions with static and moving pop-up threats in three different practical scenarios.
【7】 Building Intelligent Autonomous Navigation Agents
Author: Devendra Singh Chaplot
Affiliation: Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA. Thesis committee: Ruslan Salakhutdinov (Chair), Abhinav Gupta, Deva Ramanan, Jitendra Malik
Comments: CMU Ph.D. Thesis, March 2021
Link: https://arxiv.org/abs/2106.13415
Abstract: Breakthroughs in machine learning in the last decade have led to "digital intelligence", i.e., machine learning models capable of learning from vast amounts of labeled data to perform several digital tasks such as speech recognition, face recognition, machine translation and so on. The goal of this thesis is to make progress towards designing algorithms capable of "physical intelligence", i.e., building intelligent autonomous navigation agents capable of learning to perform complex navigation tasks in the physical world involving visual perception, natural language understanding, reasoning, planning, and sequential decision making. Despite several advances in classical navigation methods in the last few decades, current navigation agents struggle at long-term semantic navigation tasks. In the first part of the thesis, we discuss our work on short-term navigation using end-to-end reinforcement learning to tackle challenges such as obstacle avoidance, semantic perception, language grounding, and reasoning.
In the second part, we present a new class of navigation methods based on modular learning and structured explicit map representations, which leverage the strengths of both classical and end-to-end learning methods, to tackle long-term navigation tasks. We show that these methods are able to effectively tackle challenges such as localization, mapping, long-term planning, exploration and learning semantic priors. These modular learning methods are capable of long-term spatial and semantic understanding and achieve state-of-the-art results on various navigation tasks.
【8】 Scalable Perception-Action-Communication Loops with Convolutional and Graph Neural Networks
Authors: Ting-Kuei Hu, Fernando Gama, Tianlong Chen, Wenqing Zheng, Zhangyang Wang, Alejandro Ribeiro, Brian M. Sadler
Link: https://arxiv.org/abs/2106.13358
Abstract: In this paper, we present a perception-action-communication loop design using Vision-based Graph Aggregation and Inference (VGAI). This multi-agent decentralized learning-to-control framework maps raw visual observations to agent actions, aided by local communication among neighboring agents. Our framework is implemented by a cascade of a convolutional and a graph neural network (CNN / GNN), addressing agent-level visual perception and feature learning, as well as swarm-level communication, local information aggregation and agent action inference, respectively. By jointly training the CNN and GNN, image features and communication messages are learned in conjunction to better address the specific task. We use imitation learning to train the VGAI controller in an offline phase, relying on a centralized expert controller. This results in a learned VGAI controller that can be deployed in a distributed manner for online execution. Additionally, the controller exhibits good scaling properties, with training in smaller teams and application in larger teams. Through a multi-agent flocking application, we demonstrate that VGAI yields performance comparable to or better than other decentralized controllers, using only the visual input modality and without accessing precise location or motion state information.
【9】 Distributed IDA-PBC for a Class of Nonholonomic Mechanical Systems
Authors: Anastasios Tsolakis, Tamas Keviczky
Affiliation: Delft University of Technology
Comments: Longer version of a 6-page conference paper submitted to MICNON 2021 in order to illustrate more results
Link: https://arxiv.org/abs/2106.13338
Abstract: Nonholonomic mechanical systems encompass a large class of practically interesting robotic structures, such as wheeled mobile robots, space manipulators, and multi-fingered robot hands. However, few results exist on the cooperative control of such systems in a generic, distributed approach. In this work we extend a recently developed distributed Interconnection and Damping Assignment Passivity-Based Control (IDA-PBC) method to such systems. More specifically, relying on port-Hamiltonian system modelling for networks of mechanical systems, we propose a full-state stabilization control law for a class of nonholonomic systems within the framework of distributed IDA-PBC. This enables the cooperative control of heterogeneous, underactuated and nonholonomic systems with a unified control law. This control law primarily relies on the notion of Passive Configuration Decomposition (PCD) and a novel, non-smooth desired potential energy function proposed here. A low-level collision avoidance protocol is also implemented in order to achieve dynamic inter-agent collision avoidance, enhancing the practical relevance of this work. Theoretical results are tested in different simulation scenarios in order to highlight the applicability of the derived method.
【10】 Factor Graphs for Heterogeneous Bayesian Decentralized Data Fusion
Authors: Ofer Dagan, Nisar R. Ahmed
Affiliation: Smead Aerospace Engineering Sciences Dept., University of Colorado Boulder, Boulder, USA
Comments: 8 pages, 6 figures, 1 table, submitted to the 24th International Conference on Information Fusion
Link: https://arxiv.org/abs/2106.13285
Abstract: This paper explores the use of factor graphs as an inference and analysis tool for Bayesian peer-to-peer decentralized data fusion. We propose a framework by which agents can each use local factor graphs to represent relevant partitions of a complex global joint probability distribution, thus allowing them to avoid reasoning over the entirety of a more complex model and saving communication as well as computation cost. This allows heterogeneous multi-robot systems to cooperate on a variety of real-world, task-oriented missions, where scalability and modularity are key. To develop the initial theory and analyze the limits of this approach, we focus our attention on static linear Gaussian systems in tree-structured networks and use Channel Filters (also represented by factor graphs) to explicitly track common information. We discuss how this representation can be used to describe various multi-robot applications and to design and analyze new heterogeneous data fusion algorithms. We validate our method in simulations of multi-agent multi-target tracking and cooperative multi-agent mapping problems, and discuss the computation and communication gains of this approach.
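As a rough illustration of the "track common information" idea mentioned in entry 【10】, the sketch below fuses two agents' estimates of a static scalar state in information form and subtracts the information they already share, which is the standard channel-filter rule for linear Gaussian estimates. It is not the paper's factor-graph framework; the scalar state, the function names, and the numbers are assumptions made for the example.

```python
def to_information(mean, variance):
    """Return (information vector y, information matrix Y) of a scalar Gaussian."""
    return mean / variance, 1.0 / variance


def add_measurement(info, z, meas_variance):
    """Bayesian update with a direct measurement z of the state."""
    y, Y = info
    return y + z / meas_variance, Y + 1.0 / meas_variance


def channel_filter_fuse(info_i, info_j, info_common):
    """Add both agents' information and subtract the shared part once,
    so the common prior is not double counted."""
    return (info_i[0] + info_j[0] - info_common[0],
            info_i[1] + info_j[1] - info_common[1])


# Both agents start from the same prior N(0, 10): this is the common information.
prior = to_information(0.0, 10.0)
agent_i = add_measurement(prior, 1.0, 1.0)   # agent i observes z = 1
agent_j = add_measurement(prior, 3.0, 1.0)   # agent j observes z = 3

fused = channel_filter_fuse(agent_i, agent_j, prior)
fused_mean = fused[0] / fused[1]
fused_variance = 1.0 / fused[1]

# Reference: a single centralized estimator that sees both measurements.
centralized = add_measurement(agent_i, 3.0, 1.0)
```

In this toy case the fused estimate equals the centralized one, whereas naively adding the two agents' information without the subtraction would count the prior twice.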
【11】 Brax -- A Differentiable Physics Engine for Large Scale Rigid Body Simulation
Authors: C. Daniel Freeman, Erik Frey, Anton Raichuk, Sertan Girgin, Igor Mordatch, Olivier Bachem
Affiliation: Google Research
Comments: 9 pages plus 12 pages of appendices and references. In submission at NeurIPS 2021 Datasets and Benchmarks Track
Link: https://arxiv.org/abs/2106.13281
Abstract: We present Brax, an open source library for rigid body simulation with a focus on performance and parallelism on accelerators, written in JAX. We present results on a suite of tasks inspired by the existing reinforcement learning literature, but remade in our engine. Additionally, we provide reimplementations of PPO, SAC, ES, and direct policy optimization in JAX that compile alongside our environments, allowing the learning algorithm and the environment processing to occur on the same device, and to scale seamlessly on accelerators. Finally, we include notebooks that facilitate training of performant policies on common OpenAI Gym MuJoCo-like tasks in minutes.
【12】 Multi-Robot Deep Reinforcement Learning for Mobile Navigation
Authors: Katie Kang, Gregory Kahn, Sergey Levine
Affiliation: University of California, Berkeley
Link: https://arxiv.org/abs/2106.13280
Abstract: Deep reinforcement learning algorithms require large and diverse datasets in order to learn successful policies for perception-based mobile navigation. However, gathering such datasets with a single robot can be prohibitively expensive. Collecting data with multiple different robotic platforms with possibly different dynamics is a more scalable approach to large-scale data collection. But how can deep reinforcement learning algorithms leverage such heterogeneous datasets? In this work, we propose a deep reinforcement learning algorithm with hierarchically integrated models (HInt). At training time, HInt learns separate perception and dynamics models, and at test time, HInt integrates the two models in a hierarchical manner and plans actions with the integrated model. This method of planning with hierarchically integrated models allows the algorithm to train on datasets gathered by a variety of different platforms, while respecting the physical capabilities of the deployment robot at test time. Our mobile navigation experiments show that HInt outperforms conventional hierarchical policies and single-source approaches.
【13】 Towards Exploiting Geometry and Time for Fast Off-Distribution Adaptation in Multi-Task Robot Learning
Authors: K. R. Zentner, Ryan Julian, Ujjwal Puri, Yulun Zhang, Gaurav Sukhatme
Affiliation: University of Southern California, Los Angeles, CA
Comments: Accepted to Challenges of Real World Reinforcement Learning, Virtual Workshop at NeurIPS 2020
Link: https://arxiv.org/abs/2106.13237
Abstract: We explore possible methods for multi-task transfer learning which seek to exploit the shared physical structure of robotics tasks. Specifically, we train policies for a base set of pre-training tasks, then experiment with adapting to new off-distribution tasks, using simple architectural approaches for re-using these policies as black-box priors. These approaches include learning an alignment of either the observation space or action space from a base to a target task to exploit rigid body structure, and methods for learning a time-domain switching policy across base tasks which solves the target task, to exploit temporal coherence. We find that combining low-complexity target policy classes, base policies as black-box priors, and simple optimization algorithms allows us to acquire new tasks outside the base task distribution, using small amounts of offline training data.
【14】 Post Selections Using Test Sets (PSUTS) and How Developmental Networks Avoid Them
Author: Juyang Weng
Affiliations: Department of Computer Science and Engineering, Cognitive Science Program, and Neuroscience Program, Michigan State University, East Lansing, MI, USA; GENISAMA LLC, Okemos, MI, USA
Comments: 13 pages, 2 figures. The first part has been accepted as an IJCNN 2021 paper and the second has been accepted as an ICDL 2021 paper
Link: https://arxiv.org/abs/2106.13233
Abstract: This paper raises a rarely reported practice in Artificial Intelligence (AI) called Post Selection Using Test Sets (PSUTS). Consequently, the popular error-backprop methodology in deep learning lacks an acceptable generalization power. All AI methods fall into two broad schools, connectionist and symbolic. The PSUTS fall into two kinds, machine PSUTS and human PSUTS. The connectionist school received criticisms for its "scruffiness" due to a huge number of network parameters and now the worse machine PSUTS; but the seemingly "clean" symbolic school seems more brittle because of a weaker generalization power using human PSUTS. This paper formally defines what PSUTS is, analyzes why error-backprop methods with random initial weights suffer from severe local minima, why PSUTS violates well-established research ethics, and how every paper that used PSUTS should have at least transparently reported PSUTS.
For improved transparency in future publications, this paper proposes a new standard for performance evaluation of AI, called developmental errors for all networks trained, along with Three Learning Conditions: (1) an incremental learning architecture, (2) a training experience and (3) a limited amount of computational resources. Developmental Networks avoid PSUTS and are not "scruffy" because they drive Emergent Turing Machines and are optimal in the sense of maximum-likelihood across lifetime.
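A toy simulation, not taken from the paper, of why choosing a model by its test-set score can overstate generalization, which is the concern entry 【14】 raises. Every "model" here is pure noise (true accuracy 0.5), yet the best of many still looks good on the test set used to pick it. The model count, set sizes, seed, and helper names are arbitrary assumptions.

```python
import random


def accuracy(predictions, labels):
    """Fraction of positions where prediction and label agree."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


def random_guesses(rng, n):
    """n independent fair coin flips, used for both labels and predictions."""
    return [rng.randint(0, 1) for _ in range(n)]


rng = random.Random(0)
n_models, n_test, n_fresh = 200, 100, 10000

test_labels = random_guesses(rng, n_test)

# Each "trained network" guesses at random, so its true accuracy is 0.5.
test_scores = [accuracy(random_guesses(rng, n_test), test_labels)
               for _ in range(n_models)]

best_test = max(test_scores)                 # score reported after post-selection
average_test = sum(test_scores) / n_models   # score averaged over all models

# The selected model is still a coin flip on data it was not selected on.
fresh_labels = random_guesses(rng, n_fresh)
fresh_score = accuracy(random_guesses(rng, n_fresh), fresh_labels)
```

Reporting `average_test` alongside `best_test` is in the spirit of the abstract's call to report errors for all networks trained, though the paper's precise definition of developmental error is not given here.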
【15】 $\mathcal{N}$IPM-HLSP: An Efficient Interior-Point Method for Hierarchical Least-Squares Programs
Authors: Kai Pfeiffer, Adrien Escande, Ludovic Righetti
Affiliations: Tandon School of Engineering, New York University; National Institute of Advanced Industrial Science and Technology (AIST), Japan. Part of this work was supported by New York University
Comments: 17 pages, 7 figures
Link: https://arxiv.org/abs/2106.13602
Abstract: Hierarchical least-squares programs with linear constraints (HLSP) are a type of optimization problem very common in robotics. Each priority level contains an objective in least-squares form which is subject to the linear constraints of the higher priority hierarchy levels. Active-set methods (ASM) are a popular choice for solving them. However, they can perform poorly in terms of computational time if there are large changes of the active set. We therefore propose a computationally efficient primal-dual interior-point method (IPM) for HLSPs which is able to maintain constant numbers of solver iterations in these situations. We base our IPM on the null-space method, which requires only a single decomposition per Newton iteration instead of two as is the case for other IPM solvers. After a priority level has converged we compose a set of active constraints judging upon the dual and project lower priority levels into their null-space. We show that the IPM-HLSP can be expressed in least-squares form which avoids the formation of the quadratic Karush-Kuhn-Tucker (KKT) Hessian. Due to our choice of the null-space basis the IPM-HLSP is as fast as the state-of-the-art ASM-HLSP solver for equality-only problems.
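To make the null-space idea in entry 【15】 concrete, here is a deliberately tiny sketch of a two-level, equality-only hierarchy in two variables: level 1 is a single linear equation that must hold, and level 2 is a least-squares objective solved only within the null space of level 1. This is the classical null-space projection, not the interior-point method the paper proposes, and `solve_two_level` and the example data are hypothetical.

```python
def solve_two_level(a1, b1, A2, b2):
    """Two-level hierarchical least squares in 2D.

    Level 1 (high priority): a1 . x = b1, with a1 a nonzero 2-vector.
    Level 2 (low priority):  minimise ||A2 x - b2||^2 without disturbing level 1.
    """
    norm_sq = a1[0] ** 2 + a1[1] ** 2
    if norm_sq == 0.0:
        raise ValueError("a1 must be nonzero")
    # Minimum-norm solution of level 1.
    x_p = (a1[0] * b1 / norm_sq, a1[1] * b1 / norm_sq)
    # In 2D the null space of the row a1 is spanned by its perpendicular.
    null = (-a1[1], a1[0])
    # Substitute x = x_p + null * z and solve the scalar least-squares problem in z.
    cols = [row[0] * null[0] + row[1] * null[1] for row in A2]
    resid = [b - (row[0] * x_p[0] + row[1] * x_p[1]) for row, b in zip(A2, b2)]
    denom = sum(c * c for c in cols)
    z = sum(c * r for c, r in zip(cols, resid)) / denom if denom > 0.0 else 0.0
    return (x_p[0] + null[0] * z, x_p[1] + null[1] * z)


# Level 1: x1 + x2 = 2.  Level 2: get as close as possible to the point (3, 0).
x = solve_two_level((1.0, 1.0), 2.0, [(1.0, 0.0), (0.0, 1.0)], (3.0, 0.0))
```

Because level 2 only moves along the null space of level 1, the higher-priority equation is still satisfied exactly; here the result is the projection of (3, 0) onto the line x1 + x2 = 2.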