A Meta Generative Intrinsic Reward Based Robot Manipulation Skill Learning Method

WU Pei-liang 1,2, QU You-yuan 1,2, LI Yao 1,2, CHEN Wen-bai 3, GAO Guo-wei 3

Acta Metrologica Sinica ›› 2023, Vol. 44 ›› Issue (6): 923-930. DOI: 10.3969/j.issn.1000-1158.2023.06.13
Mechanical Metrology


Abstract

To address the low learning efficiency of complex tasks under sparse rewards, a meta generative intrinsic reward (MGIR) algorithm is proposed, built on off-policy reinforcement learning, and applied to robot manipulation skill learning. Specifically, a meta generative intrinsic reward framework first decomposes a complex task into multiple subtasks and evaluates the agent's competence on each subtask. A generative intrinsic reward module is then introduced that takes the novelty of the states the agent explores as an intrinsic reward, which, together with the environment reward, guides the agent in exploring the environment and learning the specified task. Finally, comparative experiments against off-policy reinforcement learning methods are conducted in the MuJoCo Fetch simulation environment. The results show that the proposed MGIR algorithm performs better in terms of both training efficiency and success rate.
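The abstract describes two coupled ingredients: a generative model that scores the novelty of visited states, and the addition of that novelty bonus to the sparse environment reward for an off-policy learner. The following Python sketch illustrates that general idea only; it is not the authors' MGIR implementation. The autoencoder-based novelty score, the names StateAutoencoder, intrinsic_reward and combined_reward, the network sizes, and the weighting factor beta are all illustrative assumptions.

```python
# Minimal sketch of novelty-as-intrinsic-reward, assuming an autoencoder
# whose reconstruction error serves as the novelty score. NOT the paper's
# MGIR implementation; all names and hyperparameters are illustrative.
import torch
import torch.nn as nn

class StateAutoencoder(nn.Module):
    """Toy generative model: states it reconstructs poorly count as novel."""
    def __init__(self, state_dim: int, latent_dim: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(),
                                     nn.Linear(32, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(),
                                     nn.Linear(32, state_dim))

    def forward(self, s):
        return self.decoder(self.encoder(s))

def intrinsic_reward(model: StateAutoencoder, state) -> float:
    """Reconstruction error of a single state, used as a novelty bonus."""
    s = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        return float(((model(s) - s) ** 2).mean())

def combined_reward(r_env: float, r_int: float, beta: float = 0.1) -> float:
    """Sparse environment reward plus the scaled novelty bonus that
    jointly guides exploration, as described in the abstract."""
    return r_env + beta * r_int
```

In an off-policy loop, the generative model would be fit on states drawn from the replay buffer, so frequently visited states stop yielding a bonus while unfamiliar ones continue to attract the agent.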

Key words

metrology / robot manipulation skill learning / sparse reward / reinforcement learning / meta learning / generative intrinsic reward

Cite this article

WU Pei-liang, QU You-yuan, LI Yao, CHEN Wen-bai, GAO Guo-wei. A Meta Generative Intrinsic Reward Based Robot Manipulation Skill Learning Method [J]. Acta Metrologica Sinica, 2023, 44(6): 923-930. https://doi.org/10.3969/j.issn.1000-1158.2023.06.13
CLC number: TB93; TB973

Funding

National Key Research and Development Program of China (2018YFB1308300); National Natural Science Foundation of China (62276028, U20A20167); Beijing Natural Science Foundation (4202026); Natural Science Foundation of Hebei Province (F202103079); Hebei Province Innovation Capability Improvement Program (22567626H)
