cs.RO (Robotics): 12 papers
【1】 Influencing Towards Stable Multi-Agent Interactions
Link: https://arxiv.org/abs/2110.08229
Authors: Woodrow Z. Wang, Andy Shih, Annie Xie, Dorsa Sadigh
Note: 15 pages, 5 figures, published as an Oral at the Conference on Robot Learning (CoRL) 2021
Abstract: Learning in multi-agent environments is difficult due to the non-stationarity introduced by an opponent's or partner's changing behaviors. Instead of reactively adapting to the other agent's (opponent or partner) behavior, we propose an algorithm to proactively influence the other agent's strategy to stabilize, which can restrain the non-stationarity caused by the other agent. We learn a low-dimensional latent representation of the other agent's strategy and the dynamics of how the latent strategy evolves with respect to our robot's behavior. With this learned dynamics model, we can define an unsupervised stability reward to train our robot to deliberately influence the other agent to stabilize towards a single strategy. We demonstrate the effectiveness of stabilizing in improving the efficiency of maximizing the task reward in a variety of simulated environments, including autonomous driving, emergent communication, and robotic manipulation. We show qualitative results on our website: https://sites.google.com/view/stable-marl/.
【2】 Dual-Arm Adversarial Robot Learning
Link: https://arxiv.org/abs/2110.08066
Authors: Elie Aljalbout
Affiliation: Technical University of Munich
Note: Accepted at CoRL 2021, Blue Sky Track
Abstract: Robot learning is a very promising topic for the future of automation and machine intelligence. Future robots should be able to autonomously acquire skills, learn to represent their environment, and interact with it. While these topics have been explored in simulation, real-world robot learning research seems to be still limited. This is due to the additional challenges encountered in the real world, such as noisy sensors and actuators, safe exploration, non-stationary dynamics, autonomous environment resetting, as well as the cost of running experiments for long periods of time. Unless we develop scalable solutions to these problems, learning complex tasks involving hand-eye coordination and rich contacts will remain an untouched vision that is only feasible in controlled lab environments. We propose dual-arm settings as platforms for robot learning. Such settings enable safe data collection for acquiring manipulation skills as well as training perception modules in a robot-supervised manner. They also ease the process of resetting the environment. Furthermore, adversarial learning could potentially boost the generalization capability of robot learning methods by maximizing exploration based on game-theoretic objectives while ensuring safety based on collaborative task spaces. In this paper, we discuss the potential benefits of this setup as well as the challenges and research directions that can be pursued.
【3】 A Broad-persistent Advising Approach for Deep Interactive Reinforcement Learning in Robotic Environments
Link: https://arxiv.org/abs/2110.08003
Authors: Hung Son Nguyen, Francisco Cruz, Richard Dazeley
Affiliation: School of Information Technology, Deakin University
Note: 10 pages
Abstract: Deep Reinforcement Learning (DeepRL) methods have been widely used in robotics to learn about the environment and acquire behaviors autonomously. Deep Interactive Reinforcement Learning (DeepIRL) includes interactive feedback from an external trainer or expert giving advice to help the learner choose actions and speed up the learning process. However, current research has been limited to interactions that offer actionable advice for only the current state of the agent. Additionally, the agent discards this information after a single use, which causes a duplicate process at the same state upon a revisit. In this paper, we present Broad-persistent Advising (BPA), a broad-persistent advising approach that retains and reuses the processed information. It not only helps trainers give more general advice relevant to similar states rather than only the current state, but also allows the agent to speed up the learning process. We test the proposed approach in two continuous robotic scenarios, namely a cart-pole balancing task and a simulated robot navigation task. The obtained results show that the performance of the agent using BPA improves, while the number of interactions required of the trainer remains comparable to the DeepIRL approach.
【4】 On-Policy Model Errors in Reinforcement Learning
Link: https://arxiv.org/abs/2110.07985
Authors: Lukas P. Fröhlich, Maksym Lefarov, Melanie N. Zeilinger, Felix Berkenkamp
Affiliations: Bosch Center for Artificial Intelligence, Renningen, Germany; Institute for Dynamic Systems and Control, ETH Zürich, Zurich, Switzerland
Abstract: Model-free reinforcement learning algorithms can compute policy gradients given sampled environment transitions, but require large amounts of data. In contrast, model-based methods can use the learned model to generate new data, but model errors and bias can render learning unstable or sub-optimal. In this paper, we present a novel method that combines real-world data and a learned model in order to get the best of both worlds. The core idea is to exploit the real-world data for on-policy predictions and use the learned model only to generalize to different actions. Specifically, we use the data as time-dependent on-policy correction terms on top of a learned model, to retain the ability to generate data without accumulating errors over long prediction horizons. We motivate this method theoretically and show that it counteracts an error term for model-based policy improvement. Experiments on MuJoCo and PyBullet benchmarks show that our method can drastically improve existing model-based approaches without introducing additional tuning parameters.
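The correction idea above can be illustrated with a minimal sketch. Everything concrete here is an illustrative assumption, not the paper's actual setup: a hypothetical linear "learned model" and a true dynamics that differs from it by an unmodelled, action-independent drift. The residual of the model on the observed on-policy transition serves as the time-dependent correction term when predicting the outcome of a different action:

```python
import numpy as np

# Hypothetical linear "learned model": f(s, a) = A s + B a (A, B illustrative).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.5]])
f = lambda s, a: A @ s + B @ a

# True dynamics differ from the model by an unmodelled drift d (also illustrative).
d = np.array([0.05, -0.02])
true_step = lambda s, a: A @ s + B @ a + d

# One observed on-policy transition (s_t, a_t) -> s_next.
s_t, a_t = np.array([1.0, 0.0]), np.array([0.2])
s_next = true_step(s_t, a_t)

# Time-dependent on-policy correction: the model's residual on the real data.
k_t = s_next - f(s_t, a_t)

# Predict the outcome of a *different* action using model + correction.
a_alt = np.array([-0.3])
s_pred = f(s_t, a_alt) + k_t
```

Because the drift in this toy example is action-independent, the corrected prediction matches the true dynamics for the alternative action exactly; in general, the correction keeps predictions anchored to real data while the model only has to account for the change in action.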
【5】 Estimation and Prediction of Deterministic Human Intent Signal to augment Haptic Glove aided Control of Robotic Hand
Link: https://arxiv.org/abs/2110.07953
Authors: Rajesh Kumar, Pimmy Gandotra, Brejesh Lall, Arzad A. Kherani, Sudipto Mukherjee
Abstract: The paper focuses on Haptic Glove (HG) based control of a Robotic Hand (RH) executing in-hand manipulation. A control algorithm is presented to allow the RH to relocate the held object to a goal pose. The motion signals for both the HG and the RH are high dimensional. The RH kinematics is usually different from the HG kinematics. The variability of the kinematics of the two devices, combined with incomplete information about the human hand kinematics, makes it difficult to directly map the high-dimensional motion signal of the HG to the RH. Hence, a method is proposed to estimate the human intent from the high-dimensional HG motion signal and reconstruct the signal at the RH to ensure object relocation. It is also shown that the lag in synthesis of the motion signal of the human hand, combined with the control latency of the RH, creates a need to predict the human intent signal. A recurrent neural network (RNN) is then proposed to predict the human intent signal ahead of time.
【6】 Anomaly Detection in Multi-Agent Trajectories for Automated Driving
Link: https://arxiv.org/abs/2110.07922
Authors: Julian Wiederer, Arij Bouazizi, Marco Troina, Ulrich Kressel, Vasileios Belagiannis
Affiliations: Mercedes-Benz AG, Ulm University
Note: 15 pages incl. supplementary material, 8 figures, 4 tables (accepted by CoRL 2021)
Abstract: Human drivers can quickly recognise abnormal driving situations to avoid accidents. Similar to humans, automated vehicles are supposed to perform anomaly detection. In this work, we propose the spatio-temporal graph auto-encoder for learning normal driving behaviours. Our innovation is the ability to jointly learn multiple trajectories of a dynamic number of agents. To perform anomaly detection, we first estimate a density function of the learned trajectory feature representation and then detect anomalies in low-density regions. Due to the lack of multi-agent trajectory datasets for anomaly detection in automated driving, we introduce our dataset using a driving simulator for normal and abnormal manoeuvres. Our evaluations show that our approach learns the relation between different agents and delivers promising results compared to related work. The code, simulation and the dataset are publicly available on the project page: https://github.com/againerju/maad_highway.
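The density-based detection step described above (fit a density over learned feature representations, flag low-density regions) can be sketched in a few lines. This is a generic illustration, not the paper's model: the 2-D "embeddings" are synthetic stand-ins for the auto-encoder's trajectory features, and a simple unnormalized Gaussian kernel density estimate replaces whatever density estimator the authors use:

```python
import numpy as np

def fit_kde(train, bandwidth=0.5):
    """Return a log-density function: an unnormalized Gaussian KDE over the
    training embeddings (the normalization constant is omitted since only
    relative densities matter for thresholding)."""
    def log_density(x):
        sq = np.sum((train - x) ** 2, axis=1)
        return np.log(np.mean(np.exp(-sq / (2 * bandwidth ** 2))) + 1e-12)
    return log_density

# Hypothetical 2-D embeddings: "normal driving" clusters near the origin.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 0.3, size=(200, 2))
log_p = fit_kde(normal)

# Threshold at the 5th percentile of in-sample log-density
# (i.e. roughly a 5% false-alarm rate on normal data).
threshold = np.quantile([log_p(x) for x in normal], 0.05)
is_anomaly = lambda x: log_p(np.asarray(x, dtype=float)) < threshold
```

A point far from the normal cluster then falls in a low-density region and is flagged, while points near the cluster centre are not.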
【7】 Learning to Infer Kinematic Hierarchies for Novel Object Instances
Link: https://arxiv.org/abs/2110.07911
Authors: Hameed Abdul-Rashid, Miles Freeman, Ben Abbatematteo, George Konidaris, Daniel Ritchie
Affiliations: Brown University, University of Illinois at Urbana-Champaign
Abstract: Manipulating an articulated object requires perceiving its kinematic hierarchy: its parts, how each can move, and how those motions are coupled. Previous work has explored perception for kinematics, but none infers a complete kinematic hierarchy on never-before-seen object instances, without relying on a schema or template. We present a novel perception system that achieves this goal. Our system infers the moving parts of an object and the kinematic couplings that relate them. To infer parts, it uses a point cloud instance segmentation neural network, and to infer kinematic hierarchies, it uses a graph neural network to predict the existence, direction, and type of edges (i.e. joints) that relate the inferred parts. We train these networks using simulated scans of synthetic 3D models. We evaluate our system on simulated scans of 3D objects, and we demonstrate a proof-of-concept use of our system to drive real-world robotic manipulation.
【8】 Toward Learning Context-Dependent Tasks from Demonstration for Tendon-Driven Surgical Robots
Link: https://arxiv.org/abs/2110.07789
Authors: Yixuan Huang, Michael Bentley, Tucker Hermans, Alan Kuntz
Affiliation: Robotics Center and School of Computing, University of Utah
Note: 7 pages, 6 figures, to be published in the proceedings of the 2021 International Symposium on Medical Robotics (ISMR)
Abstract: Tendon-driven robots, a type of continuum robot, have the potential to reduce the invasiveness of surgery by enabling access to difficult-to-reach anatomical targets. In the future, the automation of surgical tasks for these robots may help reduce surgeon strain in the face of a rapidly growing population. However, directly encoding surgical tasks and their associated context for these robots is infeasible. In this work we take steps toward a system that is able to learn to successfully perform context-dependent surgical tasks by learning directly from a set of expert demonstrations. We present three models trained on the demonstrations, conditioned on a vector encoding the context of the demonstration. We then use these models to plan and execute motions for the tendon-driven robot, similar to the demonstrations, for novel contexts not seen in the training set. We demonstrate the efficacy of our method on three surgery-inspired tasks.
【9】 An Independent Study of Reinforcement Learning and Autonomous Driving
Link: https://arxiv.org/abs/2110.07729
Authors: Hanzhi Yang
Affiliation: University of Michigan
Note: 32 pages in total, 7 figures, 3 appendices, 5 tables
Abstract: Reinforcement learning has become one of the most trending subjects in the recent decade. It has seen applications in various fields such as robot manipulation, autonomous driving, path planning, computer gaming, etc. We accomplished three tasks during the course of this project. Firstly, we studied the Q-learning algorithm for tabular environments and applied it successfully to an OpenAI Gym environment, Taxi. Secondly, we gained an understanding of and implemented the deep Q-network algorithm for the Cart-Pole environment. Thirdly, we studied the application of reinforcement learning in autonomous driving and its combination with safety check constraints (safety controllers). We trained a rough autonomous driving agent using the highway-gym environment and explored the effects of various environment configurations, such as reward functions, on the agent's training performance.
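The tabular Q-learning algorithm mentioned above follows a standard update rule, Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. A minimal self-contained sketch follows; the toy chain environment stands in for Taxi, and all hyperparameters are illustrative defaults rather than values from the report:

```python
import random

def q_learning(n_states, n_actions, step, episodes=500,
               alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.
    `step(s, a)` returns (next_state, reward, done); episodes start at state 0."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < eps:                       # explore
                a = rng.randrange(n_actions)
            else:                                        # exploit
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])        # TD update
            s = s2
    return Q

# Toy 3-state chain: action 1 moves right toward a terminal reward at state 2;
# action 0 stays put at a small cost.
def step(s, a):
    if a == 1:
        return s + 1, (1.0 if s + 1 == 2 else 0.0), s + 1 == 2
    return s, -0.1, False

Q = q_learning(n_states=3, n_actions=2, step=step)
```

After training, the learned Q-values prefer action 1 (move toward the reward) in every non-terminal state, which is the optimal policy for this chain.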
【10】 Safety-aware Policy Optimisation for Autonomous Racing
Link: https://arxiv.org/abs/2110.07699
Authors: Bingqing Chen, Jonathan Francis, James Herman, Jean Oh, Eric Nyberg, Sylvia L. Herbert
Affiliations: School of Computer Science, Carnegie Mellon University, Pittsburgh, PA; Human-Machine Collaboration, Bosch Research Pittsburgh, Pittsburgh, PA; Safe Autonomous Systems, University of California, San Diego, CA
Note: 22 pages, 14 figures, 3 tables
Abstract: To be viable for safety-critical applications, such as autonomous driving and assistive robotics, autonomous agents should adhere to safety constraints throughout their interactions with their environments. Instead of learning about safety by collecting samples, including unsafe ones, methods such as Hamilton-Jacobi (HJ) reachability compute safe sets with theoretical guarantees using models of the system dynamics. However, HJ reachability is not scalable to high-dimensional systems, and the guarantees hinge on the quality of the model. In this work, we inject HJ reachability theory into the constrained Markov decision process (CMDP) framework, as a control-theoretical approach for safety analysis via model-free updates on state-action pairs. Furthermore, we demonstrate that the HJ safety value can be learned directly on vision context, the highest-dimensional problem studied via this method to date. We evaluate our method on several benchmark tasks, including Safety Gym and Learn-to-Race (L2R), a recently released high-fidelity autonomous racing environment. Our approach has significantly fewer constraint violations in comparison to other constrained RL baselines, and achieves new state-of-the-art results on the L2R benchmark task.
【11】 Shaping embodied agent behavior with activity-context priors from egocentric video
Link: https://arxiv.org/abs/2110.07692
Authors: Tushar Nagarajan, Kristen Grauman
Affiliation: UT Austin and Facebook AI Research
Abstract: Complex physical tasks entail a sequence of object interactions, each with its own preconditions, which can be difficult for robotic agents to learn efficiently solely through their own experience. We introduce an approach to discover activity-context priors from in-the-wild egocentric video captured with human-worn cameras. For a given object, an activity-context prior represents the set of other compatible objects that are required for activities to succeed (e.g., a knife and cutting board brought together with a tomato are conducive to cutting). We encode our video-based prior as an auxiliary reward function that encourages an agent to bring compatible objects together before attempting an interaction. In this way, our model translates everyday human experience into embodied agent skills. We demonstrate our idea using egocentric EPIC-Kitchens video of people performing unscripted kitchen activities to benefit virtual household robotic agents performing various complex tasks in AI2-iTHOR, significantly accelerating agent learning. Project page: http://vision.cs.utexas.edu/projects/ego-rewards/
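The auxiliary-reward encoding described above can be sketched in miniature. The prior table, bonus value, and function names below are all hypothetical illustrations of the general idea (add a bonus for each compatible object already brought near the target), not the paper's actual reward formulation:

```python
# Hypothetical activity-context prior: target object -> set of compatible
# objects that should be brought together before interacting (values illustrative).
prior = {"tomato": {"knife", "cutting_board"}}

def shaped_reward(task_reward, target, nearby_objects, bonus=0.1):
    """Task reward plus an auxiliary bonus for each compatible object
    (per the prior) that is already near the target object."""
    compatible = prior.get(target, set())
    return task_reward + bonus * len(compatible & set(nearby_objects))
```

With a prior like this, an agent that brings the knife and cutting board next to the tomato collects shaping reward before the cutting interaction itself succeeds, which is the mechanism the abstract credits for accelerating learning.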
【12】 Augmenting Imitation Experience via Equivariant Representations
Link: https://arxiv.org/abs/2110.07668
Authors: Dhruv Sharma, Alihusein Kuwajerwala, Florian Shkurti
Affiliation: Robotics Institute, University of Toronto
Note: 7 pages (including references), 15 figures
Abstract: The robustness of visual navigation policies trained through imitation often hinges on the augmentation of the training image-action pairs. Traditionally, this has been done by collecting data from multiple cameras, by using standard data augmentations from computer vision, such as adding random noise to each image, or by synthesizing training images. In this paper we show that there is another practical alternative for data augmentation for visual navigation, based on extrapolating viewpoint embeddings and actions near those observed in the training data. Our method makes use of the geometry of the visual navigation problem in 2D and 3D and relies on policies that are functions of equivariant embeddings, as opposed to images. Given an image-action pair from a training navigation dataset, our neural network model predicts the latent representations of images at nearby viewpoints, using the equivariance property, and augments the dataset. We then train a policy on the augmented dataset. Our simulation results indicate that policies trained in this way exhibit reduced cross-track error and require fewer interventions compared to policies trained using standard augmentation methods. We also show similar results in autonomous visual navigation by a real ground robot along a path of over 500 m.