1. School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China
2. The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao, Hebei 066004, China
3. School of Automation, Beijing Information Science and Technology University, Beijing 100192, China
Abstract: To address the problem of low learning efficiency on complex tasks under sparse rewards, a meta generative intrinsic reward (MGIR) algorithm is proposed based on off-policy reinforcement learning and applied to robot manipulation skill learning. Specifically, a meta-generated intrinsic reward framework is first used to decompose a complex task into multiple subtasks and to evaluate the agent's competence on each subtask. An intrinsic reward module is then introduced that takes the novelty of the states explored by the agent as an intrinsic reward, which, together with the environment reward, jointly guides the agent to explore the environment and learn the specific task. Finally, comparative off-policy reinforcement learning experiments are conducted in the MuJoCo Fetch simulation environment. The experimental results show that the proposed meta generative intrinsic reward algorithm performs better in terms of both training efficiency and success rate.
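The reward combination described in the abstract, where an environment reward and a novelty-based intrinsic bonus jointly guide exploration, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `NoveltyIntrinsicReward` class, the nearest-neighbor novelty measure, and the weighting coefficient `beta` are all hypothetical stand-ins for the paper's generative intrinsic reward module.

```python
import numpy as np


class NoveltyIntrinsicReward:
    """Toy novelty estimator: rewards states far from previously visited ones.

    Illustrative stand-in for a generative intrinsic reward module; the
    distance-to-nearest-visited-state measure is an assumption, not the
    paper's method.
    """

    def __init__(self, memory_size=1000):
        self.memory = []                  # recently visited states
        self.memory_size = memory_size

    def novelty(self, state):
        if not self.memory:
            return 1.0
        # Novelty = distance to the nearest previously visited state.
        dists = [np.linalg.norm(state - s) for s in self.memory]
        return float(min(dists))

    def update(self, state):
        self.memory.append(state)
        if len(self.memory) > self.memory_size:
            self.memory.pop(0)


def combined_reward(env_reward, state, intrinsic, beta=0.1):
    """Joint reward: sparse extrinsic reward plus weighted intrinsic bonus."""
    r = env_reward + beta * intrinsic.novelty(state)
    intrinsic.update(state)
    return r


if __name__ == "__main__":
    intrinsic = NoveltyIntrinsicReward()
    rng = np.random.default_rng(0)
    for step in range(5):
        state = rng.normal(size=3)        # placeholder for a Fetch observation
        env_reward = 0.0                  # sparse: zero until the goal is reached
        print(step, combined_reward(env_reward, state, intrinsic))
```

In an off-policy setting, the combined reward would be stored in the replay buffer alongside each transition, so early training is driven mostly by the intrinsic term while the sparse environment reward dominates once the task begins to succeed.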