Robotics arXiv Digest [9.10]

2021-09-16 16:55:25

cs.RO (Robotics): 26 papers in total

【1】 Leveraging Local Domains for Image-to-Image Translation Link: https://arxiv.org/abs/2109.04468

Authors: Anthony Dell'Eva, Fabio Pizzati, Massimo Bertozzi, Raoul de Charette Affiliations: VisLab, Parma, Italy; Inria, Paris, France; University of Parma, Parma, Italy Comments: Submitted to conference Abstract: Image-to-image (i2i) networks struggle to capture local changes because they do not affect the global scene structure. For example, when translating from highway scenes to offroad, i2i networks easily focus on global color features but ignore traits that are obvious to humans, like the absence of lane markings. In this paper, we leverage human knowledge about spatial domain characteristics, which we refer to as 'local domains', and demonstrate its benefit for image-to-image translation. Relying on simple geometrical guidance, we train a patch-based GAN on few source data and hallucinate a new unseen domain, which subsequently eases transfer learning to the target. We experiment on three tasks ranging from unstructured environments to adverse weather. Our comprehensive evaluation setting shows we are able to generate realistic translations with minimal priors while training on only a few images. Furthermore, when trained on our translated images, all tested proxy tasks are significantly improved, without ever seeing the target domain at training.

【2】 NEAT: Neural Attention Fields for End-to-End Autonomous Driving Link: https://arxiv.org/abs/2109.04456

Authors: Kashyap Chitta, Aditya Prakash, Andreas Geiger Affiliations: Max Planck Institute for Intelligent Systems, Tübingen; University of Tübingen Comments: ICCV 2021 Abstract: Efficient reasoning about the semantic, spatial, and temporal structure of a scene is a crucial prerequisite for autonomous driving. We present NEural ATtention fields (NEAT), a novel representation that enables such reasoning for end-to-end imitation learning models. NEAT is a continuous function which maps locations in Bird's Eye View (BEV) scene coordinates to waypoints and semantics, using intermediate attention maps to iteratively compress high-dimensional 2D image features into a compact representation. This allows our model to selectively attend to relevant regions in the input while ignoring information irrelevant to the driving task, effectively associating the images with the BEV representation. In a new evaluation setting involving adverse environmental conditions and challenging scenarios, NEAT outperforms several strong baselines and achieves driving scores on par with the privileged CARLA expert used to generate its training data. Furthermore, visualizing the attention maps for models with NEAT intermediate representations provides improved interpretability.

【3】 Mini Cheetah, the Falling Cat: A Case Study in Machine Learning and Trajectory Optimization for Robot Acrobatics Link: https://arxiv.org/abs/2109.04424

Authors: Vince Kurtz, He Li, Patrick M. Wensing, Hai Lin Affiliations: University of Notre Dame Abstract: Seemingly in defiance of basic physics, cats consistently land on their feet after falling. In this paper, we design a controller that lands the Mini Cheetah quadruped robot on its feet as well. Specifically, we explore how trajectory optimization and machine learning can work together to enable highly dynamic bioinspired behaviors. We find that a reflex approach, in which a neural network learns entire state trajectories, outperforms a policy approach, in which a neural network learns a mapping from states to control inputs. We validate our proposed controller in both simulation and hardware experiments, and are able to land the robot on its feet from falls with initial pitch angles between -90 and 90 degrees.

【4】 Dynamic Modeling of Hand-Object Interactions via Tactile Sensing Link: https://arxiv.org/abs/2109.04378

Authors: Qiang Zhang, Yunzhu Li, Yiyue Luo, Wan Shou, Michael Foshey, Junchi Yan, Joshua B. Tenenbaum, Wojciech Matusik, Antonio Torralba Comments: IROS 2021. First two authors contributed equally. Project page: this http URL Abstract: Tactile sensing is critical for humans to perform everyday tasks. While significant progress has been made in analyzing object grasping from vision, it remains unclear how we can utilize tactile sensing to reason about and model the dynamics of hand-object interactions. In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diversified set of objects. We build our model on a cross-modal learning framework and generate the labels using a visual processing pipeline to supervise the tactile model, which can then be used on its own at test time. The tactile model aims to predict the 3D locations of both the hand and the object purely from the touch data by combining a predictive model and a contrastive learning module. This framework can reason about the interaction patterns from the tactile data, hallucinate the changes in the environment, estimate the uncertainty of the prediction, and generalize to unseen objects. We also provide detailed ablation studies regarding different system designs as well as visualizations of the predicted trajectories. This work takes a step toward dynamics modeling in hand-object interactions from dense tactile sensing, which opens the door for future applications in activity learning, human-computer interaction, and imitation learning for robotics.

【5】 Learning Vision-Guided Dynamic Locomotion Over Challenging Terrains Link: https://arxiv.org/abs/2109.04322

Authors: Zhaocheng Liu, Fernando Acero, Zhibin Li Affiliations: Institute of Perception, Action and Behaviour, School of Informatics, University of Edinburgh Comments: 9 pages, 27 figures, 1 table Abstract: Legged robots have become increasingly powerful and popular in recent years for their potential to bring the mobility of autonomous agents to the next level. This work presents a deep reinforcement learning approach that learns a robust Lidar-based perceptual locomotion policy in a partially observable environment using Proximal Policy Optimisation. Visual perception is critical to actively overcome challenging terrains, and to do so, we propose a novel learning strategy: Dynamic Reward Strategy (DRS), which serves as an effective heuristic to learn a versatile gait using a neural network architecture without the need to access the history data. Moreover, in a modified version of the OpenAI gym environment, the proposed work achieves success rates above 90% in all tested challenging terrains.

【6】 OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching Link: https://arxiv.org/abs/2109.04307

Authors: Hana Hoshino, Kei Ota, Asako Kanezaki, Rio Yokota Affiliations: School of Computing, Department of Computer Science, Tokyo Institute of Technology Comments: Under submission Abstract: Inverse Reinforcement Learning (IRL) is attractive in scenarios where reward engineering can be tedious. However, prior IRL algorithms use on-policy transitions, which require intensive sampling from the current policy for stable and optimal performance. This limits IRL applications in the real world, where environment interactions can become highly expensive. To tackle this problem, we present Off-Policy Inverse Reinforcement Learning (OPIRL), which (1) adopts an off-policy data distribution instead of an on-policy one and enables a significant reduction in the number of interactions with the environment, (2) learns a stationary reward function that is transferable with high generalization capabilities on changing dynamics, and (3) leverages mode-covering behavior for faster convergence. We demonstrate through experiments that our method is considerably more sample efficient and generalizes to novel environments. Our method achieves better or comparable results on policy performance baselines with significantly fewer interactions. Furthermore, we empirically show that the recovered reward function generalizes to different tasks where prior art is prone to fail.

【7】 Energy-Efficient Mobile Robot Control via Run-time Monitoring of Environmental Complexity and Computing Workload Link: https://arxiv.org/abs/2109.04285

Authors: Sherif A. S. Mohamed, Mohammad-Hashem Haghbayan, Antonio Miele, Onur Mutlu, Juha Plosila Affiliations: University of Turku Comments: Accepted for publication at the 2021 International Conference on Intelligent Robots and Systems (IROS) Abstract: We propose an energy-efficient controller to minimize the energy consumption of a mobile robot by dynamically manipulating the mechanical and computational actuators of the robot. The mobile robot performs real-time vision-based applications based on an event-based camera. The actuators of the controller are CPU voltage/frequency for the computation part and motor voltage for the mechanical part. We show that independently considering speed control of the robot and voltage/frequency control of the CPU does not necessarily result in an energy-efficient solution. In fact, to obtain the highest efficiency, the computation and mechanical parts should be controlled together in synergy. We propose a fast hill-climbing optimization algorithm to allow the controller to find the best CPU/motor configuration at run-time and whenever the mobile robot is facing a new environment during its travel. Experimental results on a robot with brushless DC motors, a Jetson TX2 board as the computing unit, and a DAVIS-346 event-based camera show that the proposed control algorithm can save battery energy by an average of 50.5%, 41%, and 30% in low-complexity, medium-complexity, and high-complexity environments, respectively, over baselines.
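
The hill-climbing search mentioned in the abstract of 【7】 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the discrete `(cpu_level, motor_level)` grid, the `hill_climb` helper, and the made-up `toy_energy` surface are hypothetical and are not taken from the paper, whose controller measures energy on real hardware.

```python
from typing import Callable, Tuple

# (cpu_level, motor_level): hypothetical indices into discrete setting tables
Config = Tuple[int, int]


def hill_climb(energy: Callable[[Config], float], start: Config,
               n_cpu: int, n_motor: int) -> Config:
    """Greedy descent over a grid of CPU/motor settings; stops at a local minimum."""
    current, current_cost = start, energy(start)
    while True:
        cpu, motor = current
        # Neighbours differ by one step in exactly one of the two settings.
        neighbours = [
            (cpu + dc, motor + dm)
            for dc, dm in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= cpu + dc < n_cpu and 0 <= motor + dm < n_motor
        ]
        best = min(neighbours, key=energy, default=current)
        best_cost = energy(best)
        if best_cost >= current_cost:
            return current  # no neighbour uses less energy
        current, current_cost = best, best_cost


def toy_energy(config: Config) -> float:
    """Made-up energy surface with its minimum at (3, 5); not measured data."""
    cpu, motor = config
    return (cpu - 3) ** 2 + (motor - 5) ** 2


best_config = hill_climb(toy_energy, start=(0, 0), n_cpu=8, n_motor=8)
```

Because the search moves in both dimensions at once, it treats the CPU and motor settings as one joint configuration, which is the point the abstract makes about controlling the two parts together. On a real energy surface a greedy search like this can stop at a local minimum.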

【8】 Solving Simultaneous Target Assignment and Path Planning Efficiently with Time-Independent Execution Link: https://arxiv.org/abs/2109.04264

Authors: Keisuke Okumura, Xavier Défago Affiliations: School of Computing, Tokyo Institute of Technology, Tokyo, Japan Comments: 19 pages, preprint Abstract: Real-time planning for a combined problem of target assignment and path planning for multiple agents, also known as the unlabeled version of Multi-Agent Path Finding (MAPF), is crucial for high-level coordination in multi-agent systems, e.g., pattern formation by robot swarms. This paper studies two aspects of unlabeled-MAPF: (1) offline scenario: solving large instances by centralized approaches with small computation time, and (2) online scenario: executing unlabeled-MAPF despite timing uncertainties of real robots. For this purpose, we propose TSWAP, a novel complete algorithm consisting of target assignment with lazy evaluation and path planning with target swapping. TSWAP can adapt to both offline and online scenarios. We empirically demonstrate that Offline TSWAP is highly scalable, providing near-optimal solutions while reducing runtime by orders of magnitude compared to existing approaches. In addition, we present the benefits of Online TSWAP, such as delay tolerance, through real-robot demos.

【9】 Learning Forceful Manipulation Skills from Multi-modal Human Demonstrations Link: https://arxiv.org/abs/2109.04222

Authors: An T. Le, Meng Guo, Niels van Duijkeren, Leonel Rozo, Robert Krug, Andras G. Kupcsik, Mathias Buerger Affiliations: University of Stuttgart; Bosch Center for Artificial Intelligence (BCAI) Abstract: Learning from Demonstration (LfD) provides an intuitive and fast approach to program robotic manipulators. Task-parameterized representations allow easy adaptation to new scenes and online observations. However, this approach has been limited to pose-only demonstrations and thus only skills with spatial and temporal features. In this work, we extend the LfD framework to address forceful manipulation skills, which are of great importance for industrial processes such as assembly. For such skills, multi-modal demonstrations including robot end-effector poses, force and torque readings, and operation scene are essential. Our objective is to reproduce such skills reliably according to the demonstrated pose and force profiles within different scenes. The proposed method combines our previous work on task-parameterized optimization and attractor-based impedance control. The learned skill model consists of (i) the attractor model that unifies the pose and force features, and (ii) the stiffness model that optimizes the stiffness for different stages of the skill. Furthermore, an online execution algorithm is proposed to adapt the skill execution to real-time observations of robot poses, measured forces, and changed scenes. We validate this method rigorously on a 7-DoF robot arm over several steps of an E-bike motor assembly process, which require different types of forceful interaction such as insertion, sliding and twisting.

【10】 Performance, Precision, and Payloads: Adaptive Nonlinear MPC for Quadrotors Link: https://arxiv.org/abs/2109.04210

Authors: Drew Hanover, Philipp Foehn, Sihao Sun, Elia Kaufmann, Davide Scaramuzza Affiliations: University of Zurich and ETH Zurich Comments: 8 pages, 5 figures, submitted to RAL ICRA 22 Abstract: Agile quadrotor flight in challenging environments has the potential to revolutionize shipping, transportation, and search and rescue applications. Nonlinear model predictive control (NMPC) has recently shown promising results for agile quadrotor control, but relies on highly accurate models for maximum performance. Hence, model uncertainties in the form of unmodeled complex aerodynamic effects, varying payloads and parameter mismatch will degrade overall system performance. In this paper, we propose L1-NMPC, a novel hybrid adaptive NMPC to learn model uncertainties online and immediately compensate for them, drastically improving performance over the non-adaptive baseline with minimal computational overhead. Our proposed architecture generalizes to many different environments, of which we evaluate wind, unknown payloads, and highly agile flight conditions. The proposed method demonstrates immense flexibility and robustness, with more than 90% tracking error reduction over non-adaptive NMPC under large unknown disturbances and without any gain tuning. In addition, the same controller with identical gains can accurately fly highly agile racing trajectories exhibiting top speeds of 70 km/h, offering tracking performance improvements of around 50% relative to the non-adaptive NMPC baseline. We will release our code fully open-sourced upon acceptance.

【11】 DAN: Decentralized Attention-based Neural Network to Solve the MinMax Multiple Traveling Salesman Problem Link: https://arxiv.org/abs/2109.04205

Authors: Yuhong Cao, Zhanhong Sun, Guillaume Sartoretti Comments: Submitted to IEEE Robotics and Automation Letters (RA-L) on September 9, 2021 Abstract: The multiple traveling salesman problem (mTSP) is a well-known NP-hard problem with numerous real-world applications. In particular, this work addresses MinMax mTSP, where the objective is to minimize the max tour length (sum of Euclidean distances) among all agents. The mTSP is normally considered as a combinatorial optimization problem, but due to its computational complexity, search-based exact and heuristic algorithms become inefficient as the number of cities increases. Encouraged by the recent developments in deep reinforcement learning (dRL), this work considers the mTSP as a cooperative task and introduces a decentralized attention-based neural network method to solve the MinMax mTSP, named DAN. In DAN, agents learn fully decentralized policies to collaboratively construct a tour, by predicting the future decisions of other agents. Our model relies on the Transformer architecture, and is trained using multi-agent RL with parameter sharing, which provides natural scalability to the numbers of agents and cities. We experimentally demonstrate our model on small- to large-scale mTSP instances, which involve 50 to 1000 cities and 5 to 20 agents, and compare against state-of-the-art baselines. For small-scale problems (fewer than 100 cities), DAN is able to closely match the performance of the best solver available (OR Tools, a meta-heuristic solver) given the same computation time budget. In larger-scale instances, DAN outperforms both conventional and dRL-based solvers, while keeping computation times low, and exhibits enhanced collaboration among agents.
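
The MinMax objective described in the abstract of 【11】 (minimize the longest tour among all agents, with tour length measured as a sum of Euclidean distances) can be written down in a few lines. This is only a sketch of the cost being minimized, not of DAN itself. The assumption that every agent starts and ends at a shared depot, and the helper names `tour_length` and `minmax_cost`, are illustrative and not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(depot: Point, cities: Sequence[Point]) -> float:
    """Length of the closed tour depot -> cities in order -> depot."""
    path = [depot, *cities, depot]
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def minmax_cost(depot: Point, tours: List[Sequence[Point]]) -> float:
    """MinMax mTSP cost: the longest tour among all agents."""
    return max(tour_length(depot, tour) for tour in tours)


# Two agents: one visits a single far city, the other two nearby cities.
depot = (0.0, 0.0)
tours = [[(3.0, 4.0)], [(0.0, 1.0), (0.0, 2.0)]]
cost = minmax_cost(depot, tours)
```

Here the first agent travels 10 units and the second 4, so the cost is set entirely by the first agent. Shortening the second tour would not improve the objective, which is why MinMax formulations reward balancing work across agents.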

【12】 Comfort and Sickness while Virtually Aboard an Autonomous Telepresence Robot Link: https://arxiv.org/abs/2109.04177

Authors: Markku Suomalainen, Katherine J. Mimnaugh, Israel Becerra, Eliezer Lozano, Rafael Murrieta-Cid, Steven M. LaValle Affiliations: Center for Ubiquitous Computing, University of Oulu, Oulu, Finland; Centro de Investigacion en Matematicas (CIMAT), Guanajuato, Mexico Comments: Accepted for publication in EuroXR 2021 Abstract: In this paper, we analyze how different path aspects affect a user's experience, mainly VR sickness and overall comfort, while immersed in an autonomously moving telepresence robot through a virtual reality headset. In particular, we focus on how the robot turns and the distance it keeps from objects, with the goal of planning suitable trajectories for an autonomously moving immersive telepresence robot in mind; rotational acceleration is known for causing the majority of VR sickness, and distance to objects modulates the optical flow. We ran a within-subjects user study (n = 36, women = 18) in which the participants watched three panoramic videos recorded in a virtual museum while aboard an autonomously moving telepresence robot taking three different paths varying in aspects such as turns, speeds, or distances to walls and objects. We found a moderate correlation between the users' sickness as measured by the SSQ and comfort on a 6-point Likert scale across all paths. However, we detected no association between sickness and the choice of the most comfortable path, showing that sickness is not the only factor affecting the comfort of the user. The subjective experience of turn speed did not correlate with either the SSQ scores or comfort, even though people often mentioned turning speed as a source of discomfort in the open-ended questions. A closer look at the open-ended answers suggests a possible reason: the length and lack of predictability of turns also play a large role in making people perceive them as uncomfortable. A larger subjective distance from walls and objects increased comfort and decreased sickness both in quantitative and qualitative data. Finally, the SSQ subscales and total weighted scores showed differences by age group and by gender.

【13】 Safe, Deterministic Trajectory Planning for Unstructured and Partially Occluded Environments Link: https://arxiv.org/abs/2109.04175

Authors: Sebastian vom Dorff, Maximilian Kneissl, Martin Fränzle Abstract: Ensuring safe behavior for automated vehicles in unregulated traffic areas poses a complex challenge for the industry. It is an open problem to provide scalable and certifiable solutions to this challenge. We derive a trajectory planner based on model predictive control which interoperates with a monitoring system for pedestrian safety based on cellular automata. The combined planner-monitor system is demonstrated on the example of a narrow indoor parking environment. The system features deterministic behavior, mitigating the immanent risk of black boxes and offering full certifiability. By using fundamental and conservative prediction models of pedestrians, the monitor is able to determine a safe drivable area in the partially occluded and unstructured parking environment. The information is fed to the trajectory planner, which ensures the vehicle remains in the safe drivable area at any time through constrained optimization. We show how the approach enables solving plenty of situations in tight parking garage scenarios. Even though conservative prediction models are applied, evaluations indicate a performant system for the tested low-speed navigation.

【14】 Self-supervised Reinforcement Learning with Independently Controllable Subgoals Link: https://arxiv.org/abs/2109.04150

Authors: Andrii Zadaianchuk, Georg Martius, Fanny Yang Affiliations: Max Planck Institute for Intelligent Systems, Tübingen, Germany; Department of Computer Science, ETH Zurich Abstract: To successfully tackle challenging manipulation tasks, autonomous agents must learn a diverse set of skills and how to combine them. Recently, self-supervised agents that set their own abstract goals by exploiting the discovered structure in the environment were shown to perform well on many different tasks. In particular, some of them were applied to learn basic manipulation skills in compositional multi-object environments. However, these methods learn skills without taking the dependencies between objects into account. Thus, the learned skills are difficult to combine in realistic environments. We propose a novel self-supervised agent that estimates relations between environment components and uses them to independently control different parts of the environment state. In addition, the estimated relations between objects can be used to decompose a complex goal into a compatible sequence of subgoals. We show that, by using this framework, an agent can efficiently and automatically learn manipulation tasks in multi-object environments with different relations between objects.

【15】 Robot Localization and Navigation through Predictive Processing using LiDAR Link: https://arxiv.org/abs/2109.04139

Authors: Daniel Burghardt, Pablo Lanillos Affiliations: Radboud University, NL; Donders Institute for Brain, Department of Artificial Intelligence Comments: 2nd International Workshop on Active Inference IWAI2021, European Conference on Machine Learning (ECML/PKDD 2021) Abstract: Knowing the position of the robot in the world is crucial for navigation. Nowadays, Bayesian filters, such as Kalman and particle-based filters, are standard approaches in mobile robotics. Recently, end-to-end learning has allowed for scaling up to high-dimensional inputs and improved generalization. However, there are still limitations to providing reliable laser navigation. Here we show a proof of concept of a predictive-processing-inspired approach to perception, applied to localization and navigation using laser sensors, without the need for odometry. We learn the generative model of the laser through self-supervised learning and perform both online state estimation and navigation through stochastic gradient descent on the variational free-energy bound. We evaluated the algorithm on a mobile robot (TIAGo Base) with a laser sensor (SICK) in Gazebo. Results showed improved state-estimation performance compared to a state-of-the-art particle filter in the absence of odometry. Furthermore, in contrast to standard Bayesian estimation approaches, our method also enables the robot to navigate when given a desired goal, by inferring the actions that minimize the prediction error.
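
The idea in the abstract of 【15】 of estimating state by descending on a prediction error can be illustrated with a heavily simplified sketch. It is not the paper's method: the squared error below stands in for the variational free-energy bound, plain (not stochastic) gradient descent is used, and the hand-written `corridor_model` replaces the learned generative model of the laser. All names and parameters are hypothetical.

```python
from typing import Callable, List, Sequence


def estimate_state(observation: Sequence[float],
                   predict: Callable[[float], List[float]],
                   x0: float, lr: float = 0.1, steps: int = 200,
                   eps: float = 1e-4) -> float:
    """Find the 1D state whose predicted readings best match the observation."""

    def error(x: float) -> float:
        # Squared prediction error between observed and predicted readings.
        return sum((o - p) ** 2 for o, p in zip(observation, predict(x)))

    x = x0
    for _ in range(steps):
        # Central-difference estimate of the gradient of the error.
        grad = (error(x + eps) - error(x - eps)) / (2 * eps)
        x -= lr * grad
    return x


def corridor_model(x: float) -> List[float]:
    """Toy generative model: range readings to the front and back walls
    of a 10 m corridor for a robot at position x (an assumed setup)."""
    return [10.0 - x, x]


# Readings of 7 m ahead and 3 m behind are consistent with x = 3.
estimate = estimate_state([7.0, 3.0], corridor_model, x0=0.0)
```

No odometry enters the estimate: the position is recovered purely by adjusting the state until the model's predicted readings match the observed ones.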

【16】 Learning Cross-Scale Visual Representations for Real-Time Image Geo-Localization Link: https://arxiv.org/abs/2109.04087

Authors: Tianyi Zhang, Matthew Johnson-Roberson Affiliations: Robotics Institute, University of Michigan; Department of Naval Architecture and Marine Engineering Abstract: Robot localization remains a challenging task in GPS-denied environments. State estimation approaches based on local sensors, e.g. cameras or IMUs, are drift-prone for long-range missions as error accumulates. In this study, we aim to address this problem by localizing image observations in a 2D multi-modal geospatial map. We introduce the cross-scale dataset and a methodology to produce additional data from cross-modality sources. We propose a framework that learns cross-scale visual representations without supervision. Experiments are conducted on data from two different domains, underwater and aerial. In contrast to existing studies in cross-view image geo-localization, our approach a) performs better on smaller-scale multi-modal maps; b) is more computationally efficient for real-time applications; c) can serve directly in concert with state estimation pipelines.

【17】 Risk-Averse Decision Making Under Uncertainty Link: https://arxiv.org/abs/2109.04082

Authors: Mohamadreza Ahmadi, Ugo Rosolia, Michel D. Ingham, Richard M. Murray, Aaron D. Ames Affiliations: Control and Dynamical Systems (CDS), California Institute of Technology Comments: arXiv admin note: substantial text overlap with arXiv:2012.02423 Abstract: A large class of decision making under uncertainty problems can be described via Markov decision processes (MDPs) or partially observable MDPs (POMDPs), with application to artificial intelligence and operations research, among others. Traditionally, policy synthesis techniques are proposed such that a total expected cost or reward is minimized or maximized. However, optimality in the total expected cost sense is only reasonable if system behavior over a large number of runs is of interest, which has limited the use of such policies in practical mission-critical scenarios, wherein large deviations from the expected behavior may lead to mission failure. In this paper, we consider the problem of designing policies for MDPs and POMDPs with objectives and constraints in terms of dynamic coherent risk measures, which we refer to as the constrained risk-averse problem. For MDPs, we reformulate the problem into an inf-sup problem via the Lagrangian framework and propose an optimization-based method to synthesize Markovian policies. For MDPs, we demonstrate that the formulated optimization problems are in the form of difference convex programs (DCPs) and can be solved by the disciplined convex-concave programming (DCCP) framework. We show that these results generalize linear programs for constrained MDPs with total discounted expected costs and constraints. For POMDPs, we show that, if the coherent risk measures can be defined as a Markov risk transition mapping, an infinite-dimensional optimization can be used to design Markovian belief-based policies. For stochastic finite-state controllers (FSCs), we show that the latter optimization simplifies to a (finite-dimensional) DCP and can be solved by the DCCP framework. We incorporate these DCPs in a policy iteration algorithm to design risk-averse FSCs for POMDPs.
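
The abstract of 【17】 argues that an expected cost can hide rare but mission-critical outcomes. A small numerical sketch shows the gap using conditional value-at-risk (CVaR), a standard example of a coherent risk measure. CVaR is chosen here as an assumption for illustration; the abstract does not say which measures the paper uses. The `cvar` helper is a simple empirical estimator over equally likely samples, and the cost data are made up.

```python
import math
from typing import Sequence


def expected_cost(costs: Sequence[float]) -> float:
    """Risk-neutral objective: the plain average cost."""
    return sum(costs) / len(costs)


def cvar(costs: Sequence[float], tail_fraction: float) -> float:
    """Empirical CVaR: average of the worst `tail_fraction` of the samples,
    treating every sample as equally likely."""
    if not 0.0 < tail_fraction <= 1.0:
        raise ValueError("tail_fraction must be in (0, 1]")
    ordered = sorted(costs, reverse=True)  # worst (highest) costs first
    k = max(1, math.ceil(tail_fraction * len(ordered)))
    return sum(ordered[:k]) / k


# Made-up run costs: nine cheap runs and one expensive failure.
costs = [1.0] * 9 + [11.0]
mean_cost = expected_cost(costs)
tail_cost = cvar(costs, tail_fraction=0.1)
```

For these samples the mean is 2 while the average over the worst 10% is 11, so a policy tuned only for the mean would look far safer than its worst runs. With `tail_fraction=1.0` the estimator reduces to the ordinary mean.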

【18】 Keeping an Eye on Things: Deep Learned Features for Long-Term Visual Localization Link: https://arxiv.org/abs/2109.04041

Authors: Mona Gridseth, Timothy D. Barfoot Affiliations: University of Toronto Institute for Aerospace Studies (UTIAS) Abstract: In this paper, we learn visual features that we use to first build a map and then localize a robot driving autonomously across a full day of lighting change, including in the dark. We train a neural network to predict sparse keypoints with associated descriptors and scores that can be used together with a classical pose estimator for localization. Our training pipeline includes a differentiable pose estimator such that training can be supervised with ground truth poses from data collected earlier, in our case from 2016 and 2017 gathered with multi-experience Visual Teach and Repeat (VT&R). We then insert the learned features into the existing VT&R pipeline to perform closed-loop path-following in unstructured outdoor environments. We show successful path following across all lighting conditions despite the robot's map being constructed using daylight conditions. Moreover, we explore generalizability of the features by driving the robot across all lighting conditions in two new areas not present in the feature training dataset. In all, we validated our approach with 30 km of autonomous path-following experiments in challenging conditions.

【19】 Taxim: An Example-based Simulation Model for GelSight Tactile Sensors Link: https://arxiv.org/abs/2109.04027

Authors: Zilin Si, Wenzhen Yuan Affiliations: Robotics Institute, Carnegie Mellon University Abstract: Simulation is widely used in robotics for system verification and large-scale data collection. However, simulating sensors, including tactile sensors, has been a long-standing challenge. In this paper, we propose Taxim, a realistic and high-speed simulation model for a vision-based tactile sensor, GelSight. A GelSight sensor uses a piece of soft elastomer as the medium of contact and embeds optical structures to capture the deformation of the elastomer, from which the geometry and forces applied at the contact surface are inferred. We propose an example-based method for simulating GelSight: we simulate the optical response to the deformation with a polynomial look-up table. This table maps the deformed geometries to pixel intensity sampled by the embedded camera. In order to simulate the surface markers' motion that is caused by the surface stretch of the elastomer, we apply the linear elastic deformation theory and the superposition principle. The simulation model is calibrated with fewer than 100 data points from a real sensor. The example-based approach enables the model to easily migrate to other GelSight sensors or their variations. To the best of our knowledge, our simulation framework is the first to incorporate marker motion field simulation that derives from elastomer deformation together with the optical simulation, creating a comprehensive and computationally efficient tactile simulation framework. Experiments reveal that our optical simulation has the lowest pixel-wise intensity errors compared to prior work and can run online with CPU computing.

【20】 Active Multi-Object Exploration and Recognition via Tactile Whiskers Link: https://arxiv.org/abs/2109.03976

Authors: Chenxi Xiao, Shujia Xu, Wenzhuo Wu, Juan Wachs Affiliations: School of Industrial Engineering, Purdue University Abstract: Robotic exploration under uncertain environments is challenging when optical information is not available. In this paper, we propose an autonomous solution for exploring an unknown task space based on tactile sensing alone. We first designed a whisker sensor based on MEMS barometer devices. This sensor can acquire contact information by interacting with the environment non-intrusively. The sensor is accompanied by a planning technique that generates exploration trajectories using tactile perception alone. This technique relies on a hybrid policy for tactile exploration, which includes a proactive informative path planner for object searching and a reactive Hopf oscillator for contour tracing. Results indicate that the hybrid exploration policy can increase the efficiency of object discovery. Finally, scene understanding was facilitated by object segmentation and classification. A classifier was developed to recognize the object categories based on the geometric features collected by the whisker sensor. Such an approach demonstrates that the whisker sensor, together with the tactile intelligence, can provide sufficiently discriminative features to distinguish objects.
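
The abstract names a Hopf oscillator as the reactive component for contour tracing but gives no equations. The sketch below integrates the textbook Hopf normal form, whose state settles onto a circular limit cycle of radius sqrt(mu) from any nonzero start. The parameter values, the explicit-Euler integrator, and the names `hopf_step` and `simulate` are assumptions for illustration, and how the paper couples the oscillator to whisker contact is not covered here.

```python
import math

def hopf_step(x, y, mu, omega, dt):
    """One explicit-Euler step of the Hopf oscillator in Cartesian form."""
    r2 = x * x + y * y
    dx = (mu - r2) * x - omega * y
    dy = (mu - r2) * y + omega * x
    return x + dt * dx, y + dt * dy

def simulate(x0, y0, mu=1.0, omega=2.0, dt=0.001, steps=20000):
    """Integrate from (x0, y0) and return the final state."""
    x, y = x0, y0
    for _ in range(steps):
        x, y = hopf_step(x, y, mu, omega, dt)
    return x, y

x_in, y_in = simulate(0.1, 0.0)    # starts inside the limit cycle
x_out, y_out = simulate(2.0, 0.0)  # starts outside the limit cycle
radius_in = math.hypot(x_in, y_in)
radius_out = math.hypot(x_out, y_out)
```

Both trajectories end close to radius 1, which is why such an oscillator is a natural generator of stable periodic motion that recovers from perturbations.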

【21】 Quality-Diversity Meta-Evolution: customising behaviour spaces to a meta-objective Link: https://arxiv.org/abs/2109.03918

Authors: David M. Bossens, Danesh Tarapore Abstract: Quality-Diversity (QD) algorithms evolve behaviourally diverse and high-performing solutions. To illuminate the elite solutions for a space of behaviours, QD algorithms require the definition of a suitable behaviour space. If the behaviour space is high-dimensional, a suitable dimensionality reduction technique is required to maintain a limited number of behavioural niches. While current methodologies for automated behaviour spaces focus on changing the geometry or on unsupervised learning, there remains a need for customising behavioural diversity to a particular meta-objective specified by the end-user. In the newly emerging framework of QD Meta-Evolution, or QD-Meta for short, one evolves a population of QD algorithms, each with different algorithmic and representational characteristics, to optimise the algorithms and their resulting archives to a user-defined meta-objective. Despite promising results compared to traditional QD algorithms, QD-Meta has yet to be compared to state-of-the-art behaviour space automation methods such as Centroidal Voronoi Tessellations Multi-dimensional Archive of Phenotypic Elites Algorithm (CVT-MAP-Elites) and Autonomous Robots Realising their Abilities (AURORA). This paper performs an empirical study of QD-Meta on function optimisation and multilegged robot locomotion benchmarks. Results demonstrate that QD-Meta archives provide improved average performance and faster adaptation to a priori unknown changes to the environment when compared to CVT-MAP-Elites and AURORA. A qualitative analysis shows how the resulting archives are tailored to the meta-objectives provided by the end-user.
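
The idea of keeping one elite per behavioural niche can be sketched with a minimal grid-archive loop in the style of MAP-Elites. This is not QD-Meta, CVT-MAP-Elites, or AURORA: the one-dimensional grid, the Gaussian mutation, the toy fitness and behaviour functions, and the name `map_elites` are all assumptions made for illustration.

```python
import random

def map_elites(fitness, behaviour, bins=10, iters=2000, seed=0):
    """Minimal grid-archive QD loop: keep the best solution found per niche."""
    rng = random.Random(seed)
    archive = {}  # niche index -> (fitness, solution)
    for _ in range(iters):
        if archive and rng.random() < 0.9:
            # Mutate a randomly chosen elite, clipping genes to [0, 1].
            _, parent = archive[rng.choice(sorted(archive))]
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, 0.1))) for g in parent]
        else:
            # Occasionally sample a fresh random solution.
            child = [rng.random(), rng.random()]
        niche = min(bins - 1, int(behaviour(child) * bins))
        score = fitness(child)
        if niche not in archive or score > archive[niche][0]:
            archive[niche] = (score, child)
    return archive

# Toy problem: behaviour is the first gene, fitness rewards a second gene near 0.5.
archive = map_elites(fitness=lambda s: -abs(s[1] - 0.5), behaviour=lambda s: s[0])
```

The returned archive holds at most `bins` entries, each the best solution seen for its slice of the behaviour space, so diversity comes from the niches and quality from the per-niche competition.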

【22】 Dynamic Locomotion Teleoperation of a Wheeled Humanoid Robot Reduced Model with a Whole-Body Human-Machine Interface Link: https://arxiv.org/abs/2109.03906

Authors: Sunyu Wang, Joao Ramos Affiliations: Department of Mechanical Science and Engineering, University of Illinois at Urbana-Champaign; one author is currently with the Robotics Institute, Carnegie Mellon University Abstract: Bilateral teleoperation provides humanoid robots with human planning intelligence while enabling the human to feel what the robot feels. It has the potential to transform physically capable humanoid robots into dynamically intelligent ones. However, dynamic bilateral locomotion teleoperation remains a challenge due to the complex dynamics it involves. This work presents our initial step to tackle this challenge via the concept of wheeled humanoid robot locomotion teleoperation by body tilt. Specifically, we developed a force-feedback-capable whole-body human-machine interface (HMI), and designed a force feedback mapping and two teleoperation mappings that map the human's body tilt to the robot's velocity or acceleration. We compared the two mappings and studied the force feedback's effect via an experiment, where seven human subjects teleoperated a simulated robot with the HMI to perform dynamic target tracking tasks. The experimental results suggest that all subjects accomplished the tasks with both mappings after practice, and the force feedback improved their performance. However, the subjects exhibited two distinct teleoperation styles, which benefited from the force feedback differently. Moreover, the force feedback affected the subjects' preferences for the teleoperation mappings, though most subjects performed better with the velocity mapping.
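
The difference between the two teleoperation mappings can be made concrete with a small sketch. The abstract only says that body tilt is mapped to the robot's velocity or acceleration; the linear (proportional) form, the gain values, the time step, and the function names below are assumptions for illustration rather than the paper's actual mappings.

```python
def velocity_mapping(tilts, gain):
    """Robot velocity command proportional to the current body tilt."""
    return [gain * tilt for tilt in tilts]

def acceleration_mapping(tilts, gain, dt, v0=0.0):
    """Robot acceleration proportional to tilt; velocity is its running integral."""
    velocities = []
    v = v0
    for tilt in tilts:
        v += gain * tilt * dt
        velocities.append(v)
    return velocities

# Hypothetical tilt profile: lean forward for five samples, then stand upright.
tilts = [0.1] * 5 + [0.0] * 5
v_vel = velocity_mapping(tilts, gain=2.0)
v_acc = acceleration_mapping(tilts, gain=2.0, dt=0.5)
```

Under the velocity mapping the commanded speed returns to zero as soon as the operator stands upright, whereas under the acceleration mapping the robot keeps the speed it has built up until the operator leans back.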

【23】 Interpretable Run-Time Prediction and Planning in Co-Robotic Environments Link: https://arxiv.org/abs/2109.03893

Authors: Rahul Peddi, Nicola Bezzo Affiliations: Department of Systems and Information Engineering and Charles L. Brown Department of Electrical and Computer Engineering, University of Virginia Comments: Final version to be presented at IROS 2021 Abstract: Mobile robots are traditionally developed to be reactive and avoid collisions with surrounding humans, often moving in unnatural ways without following social protocols, forcing people to behave very differently from human-human interaction rules. Humans, on the other hand, are seamlessly able to understand why they may interfere with surrounding humans and change their behavior based on their reasoning, resulting in smooth, intuitive avoiding behaviors. In this paper, we propose an approach for a mobile robot to avoid interfering with the desired paths of surrounding humans. We leverage a library of previously observed trajectories to design a decision-tree-based interpretable monitor that: i) predicts whether the robot is interfering with surrounding humans, ii) explains what behaviors are causing either prediction, and iii) plans corrective behaviors if interference is predicted. We also propose a validation scheme to improve the predictive model at run-time. The proposed approach is validated with simulations and experiments involving an unmanned ground vehicle (UGV) performing go-to-goal operations in the presence of humans, demonstrating non-interfering behaviors and run-time learning.

【24】 SORNet: Spatial Object-Centric Representations for Sequential Manipulation Link: https://arxiv.org/abs/2109.03891

Authors: Wentao Yuan, Chris Paxton, Karthik Desingh, Dieter Fox Affiliations: University of Washington, NVIDIA Abstract: Sequential manipulation tasks require a robot to perceive the state of an environment and plan a sequence of actions leading to a desired goal state, where the ability to reason about spatial relationships among object entities from raw sensor inputs is crucial. Prior works relying on explicit state estimation or end-to-end learning struggle with novel objects. In this work, we propose SORNet (Spatial Object-Centric Representation Network), which extracts object-centric representations from RGB images conditioned on canonical views of the objects of interest. We show that the object embeddings learned by SORNet generalize zero-shot to unseen object entities on three spatial reasoning tasks: spatial relationship classification, skill precondition classification and relative direction regression, significantly outperforming baselines. Further, we present real-world robotic experiments demonstrating the usage of the learned object embeddings in task planning for sequential manipulation.

【25】 Integrated and Adaptive Guidance and Control for Endoatmospheric Missiles via Reinforcement Learning Link: https://arxiv.org/abs/2109.03880

Authors: Brian Gaudet, Isaac Charcos, Roberto Furfaro Affiliations: University of Arizona, E. Roger Way, Tucson, Arizona Abstract: We apply the meta reinforcement learning framework to optimize an integrated and adaptive guidance and flight control system for an air-to-air missile, implementing the system as a deep neural network (the policy). The policy maps observations directly to commanded rates of change for the missile's control surface deflections, with the observations derived with minimal processing from the computationally stabilized line of sight unit vector measured by a strapdown seeker, estimated rotational velocity from rate gyros, and control surface deflection angles. The system induces intercept trajectories against a maneuvering target that satisfy control constraints on fin deflection angles, and path constraints on look angle and load. We test the optimized system in a six degrees-of-freedom simulator that includes a non-linear radome model and a strapdown seeker model. Through extensive simulation, we demonstrate that the system can adapt to a large flight envelope and off-nominal flight conditions that include perturbation of aerodynamic coefficient parameters and center of pressure locations. Moreover, we find that the system is robust to the parasitic attitude loop induced by radome refraction, imperfect seeker stabilization, and sensor scale factor errors. Finally, we compare our system's performance to two benchmarks: a proportional navigation guidance system benchmark in a simplified 3-DOF environment, which we take as an upper bound on performance attainable with separate guidance and flight control systems, and a longitudinal model of proportional navigation coupled with a three-loop autopilot. We find that our system moderately outperforms the former, and outperforms the latter by a large margin.
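
For readers unfamiliar with the proportional navigation baseline mentioned in the abstract, the law is commonly written in the textbook form below. This is the standard formulation, not a formula taken from the paper, and the symbols are notation introduced here for illustration.

```latex
a_c = N \, V_c \, \dot{\lambda}
```

Here $a_c$ is the commanded lateral acceleration, $N$ is the dimensionless navigation gain (typically chosen between 3 and 5), $V_c$ is the closing velocity, and $\dot{\lambda}$ is the line-of-sight rate. The command drives the line-of-sight rate toward zero, which places the missile on a collision course with a non-maneuvering target.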

【26】 Recurrent Neural Network Controllers Synthesis with Stability Guarantees for Partially Observed Systems Link: https://arxiv.org/abs/2109.03861

Authors: Fangda Gu, He Yin, Laurent El Ghaoui, Murat Arcak, Peter Seiler, Ming Jin Affiliations: University of California, Berkeley, California; University of Michigan, Ann Arbor, Michigan; Virginia Tech, Blacksburg, Virginia Abstract: Neural network controllers have become popular in control tasks thanks to their flexibility and expressivity. Stability is a crucial property for safety-critical dynamical systems, while stabilization of partially observed systems, in many cases, requires controllers to retain and process long-term memories of the past. We consider the important class of recurrent neural networks (RNN) as dynamic controllers for nonlinear uncertain partially-observed systems, and derive convex stability conditions based on integral quadratic constraints, S-lemma and sequential convexification. To ensure stability during the learning and control process, we propose a projected policy gradient method that iteratively enforces the stability conditions in the reparametrized space, taking advantage of mild additional information on system dynamics. Numerical experiments show that our method learns stabilizing controllers while using fewer samples and achieving higher final performance compared with policy gradient.
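
The general pattern behind a projected gradient method, an unconstrained update followed by a projection back onto a feasible set, can be sketched as follows. In the paper the feasible set is defined by convex stability conditions in a reparametrized space, which the abstract does not spell out; the Euclidean ball used here is only a stand-in, and the toy objective, step size, and function names are assumptions for illustration.

```python
import math

def project_to_ball(theta, radius):
    """Euclidean projection onto {theta : ||theta|| <= radius} (stand-in feasible set)."""
    norm = math.sqrt(sum(t * t for t in theta))
    if norm <= radius:
        return list(theta)
    return [t * radius / norm for t in theta]

def projected_gradient_ascent(theta, grad_fn, radius, lr=0.1, iters=200):
    """Alternate an ascent step with a projection that restores feasibility."""
    for _ in range(iters):
        grad = grad_fn(theta)
        theta = [t + lr * g for t, g in zip(theta, grad)]  # unconstrained ascent step
        theta = project_to_ball(theta, radius)             # back into the feasible set
    return theta

# Toy objective J(theta) = -||theta - target||^2 whose maximiser is infeasible.
target = [3.0, 4.0]

def grad_fn(theta):
    return [-2.0 * (t - g) for t, g in zip(theta, target)]

theta_star = projected_gradient_ascent([0.0, 0.0], grad_fn, radius=1.0)
```

Every iterate stays inside the feasible set, and the result is the feasible point closest to the unconstrained optimum, here approximately (0.6, 0.8).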

Machine translation, for reference only.
