Acta Metrologica Sinica  2023, Vol. 44 Issue (6): 923-930    DOI: 10.3969/j.issn.1000-1158.2023.06.13
Mechanics Metrology
A Meta Generative Intrinsic Reward Based Robot Manipulation Skill Learning Method
WU Pei-liang1,2, QU You-yuan1,2, LI Yao1,2, CHEN Wen-bai3, GAO Guo-wei3
1. School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China
2. The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao, Hebei 066004, China
3. School of Automation, Beijing Information Science and Technology University, Beijing 100192, China
Abstract: To address the low learning efficiency of complex tasks under sparse rewards, a meta generative intrinsic reward (MGIR) algorithm is proposed on the basis of off-policy reinforcement learning and applied to robot manipulation skill learning. Specifically, a meta generative intrinsic reward framework that can decompose a complex task into multiple subtasks is first used to evaluate the agent's competence on each subtask. A generative intrinsic reward module is then introduced, which takes the novelty of the states the agent encounters during exploration as an intrinsic reward and, jointly with the environment reward, guides the agent to explore the environment and learn the specific task. Finally, comparative experiments against off-policy reinforcement learning methods are conducted in the MuJoCo simulation environment Fetch. The experimental results show that the proposed MGIR algorithm performs well in both training efficiency and success rate.
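The abstract describes the mechanism only at a high level. As a minimal sketch, the following Python/PyTorch code illustrates one common way to realize a novelty-based intrinsic reward (here in the style of random network distillation) and combine it with a sparse environment reward; the names `NoveltyReward` and `combined_reward`, the network sizes, and the weighting factor `beta` are illustrative assumptions, not the authors' MGIR implementation.

```python
# Illustrative only: a novelty bonus via random network distillation (RND),
# used here as a stand-in for the paper's generative intrinsic reward module.
import torch
import torch.nn as nn

class NoveltyReward(nn.Module):
    """Scores state novelty as the prediction error of a trained predictor
    network against a fixed, randomly initialized target network."""
    def __init__(self, state_dim: int, feat_dim: int = 64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                    nn.Linear(128, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                       nn.Linear(128, feat_dim))
        for p in self.target.parameters():  # target network is never trained
            p.requires_grad_(False)
        self.opt = torch.optim.Adam(self.predictor.parameters(), lr=1e-4)

    def intrinsic_reward(self, states: torch.Tensor) -> torch.Tensor:
        # Rarely visited states are poorly predicted, so the error is large.
        err = (self.predictor(states) - self.target(states)).pow(2).mean(-1)
        self.opt.zero_grad()
        err.mean().backward()  # predictor catches up on familiar states
        self.opt.step()
        return err.detach()

def combined_reward(r_env, r_int, beta=0.1):
    # Environment reward and scaled novelty bonus jointly guide the agent.
    return r_env + beta * r_int

# Hypothetical usage: 10-dimensional states, sparse (all-zero) env reward.
novelty = NoveltyReward(state_dim=10)
states = torch.randn(32, 10)
r_total = combined_reward(torch.zeros(32), novelty.intrinsic_reward(states))
```

Under a sparse environment reward of the kind used in the Fetch tasks, where informative feedback arrives essentially only on success, such a dense novelty bonus gives the off-policy learner a training signal long before the first success is observed.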
Key words: metrology; robot manipulation skill learning; sparse reward; reinforcement learning; meta learning; generative intrinsic reward
Received: 2023-01-03      Published online: 2023-06-25
PACS: TB93; TB973
Funding: National Key Research and Development Program of China (2018YFB1308300); National Natural Science Foundation of China (62276028, U20A20167); Beijing Natural Science Foundation (4202026); Natural Science Foundation of Hebei Province (F202103079); Hebei Province Innovation Capability Improvement Plan (22567626H)
About the author: WU Pei-liang (1981-), born in Shijiazhuang, Hebei, is a professor and doctoral supervisor at Yanshan University. His research focuses on robot cognition and manipulation skill learning and on multi-agent systems. Email: peiliangwu@ysu.edu.cn
Cite this article:
WU Pei-liang, QU You-yuan, LI Yao, CHEN Wen-bai, GAO Guo-wei. A Meta Generative Intrinsic Reward Based Robot Manipulation Skill Learning Method. Acta Metrologica Sinica, 2023, 44(6): 923-930.
Link to this article:
http://jlxb.china-csm.org:81/Jwk_jlxb/CN/10.3969/j.issn.1000-1158.2023.06.13     or     http://jlxb.china-csm.org:81/Jwk_jlxb/CN/Y2023/V44/I6/923