1. School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China
2. The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao, Hebei 066004, China
3. School of Automation, Beijing Information Science and Technology University, Beijing 100192, China
Abstract: To address the problem of low learning efficiency on complex tasks under sparse rewards, a meta generative intrinsic reward (MGIR) algorithm is proposed based on off-policy reinforcement learning and applied to robot manipulation skill learning. Specifically, a meta-generated intrinsic reward framework is first used to decompose a complex task into multiple subtasks and to evaluate the agent's competence on each subtask. An intrinsic reward module is then introduced that takes the novelty of the states explored by the agent as an intrinsic reward, which, together with the environment reward, jointly guides the agent to explore the environment and learn the specific task. Finally, comparative off-policy reinforcement learning experiments are conducted in the MuJoCo Fetch simulation environment. The experimental results show that the proposed meta generative intrinsic reward algorithm performs better in terms of both training efficiency and success rate.
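The reward combination described in the abstract, where an environment reward and a novelty-based intrinsic bonus jointly guide exploration, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `NoveltyIntrinsicReward` class, the nearest-neighbor novelty measure, and the weighting coefficient `beta` are all hypothetical stand-ins for the paper's generative intrinsic reward module.

```python
import numpy as np


class NoveltyIntrinsicReward:
    """Toy novelty estimator: rewards states far from previously visited ones.

    Illustrative stand-in for a generative intrinsic reward module; the
    distance-to-nearest-visited-state measure is an assumption, not the
    paper's method.
    """

    def __init__(self, memory_size=1000):
        self.memory = []                  # recently visited states
        self.memory_size = memory_size

    def novelty(self, state):
        if not self.memory:
            return 1.0
        # Novelty = distance to the nearest previously visited state.
        dists = [np.linalg.norm(state - s) for s in self.memory]
        return float(min(dists))

    def update(self, state):
        self.memory.append(state)
        if len(self.memory) > self.memory_size:
            self.memory.pop(0)


def combined_reward(env_reward, state, intrinsic, beta=0.1):
    """Joint reward: sparse extrinsic reward plus weighted intrinsic bonus."""
    r = env_reward + beta * intrinsic.novelty(state)
    intrinsic.update(state)
    return r


if __name__ == "__main__":
    intrinsic = NoveltyIntrinsicReward()
    rng = np.random.default_rng(0)
    for step in range(5):
        state = rng.normal(size=3)        # placeholder for a Fetch observation
        env_reward = 0.0                  # sparse: zero until the goal is reached
        print(step, combined_reward(env_reward, state, intrinsic))
```

In an off-policy setting, the combined reward would be stored in the replay buffer alongside each transition, so early training is driven mostly by the intrinsic term while the sparse environment reward dominates once the task begins to succeed.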