Abstract
Reinforcement Learning (RL) has been widely used to solve sequential decision-making problems. However, it often suffers from slow learning in complex scenarios. Teacher-student frameworks address this issue by enabling agents to ask for and give advice, so that a student agent can leverage the knowledge of a teacher agent to accelerate its learning. In this paper, we consider the effect of reusing previous advice and propose a novel memory-based teacher-student framework in which student agents memorize and reuse the previous advice from teacher agents. In particular, we propose two methods for deciding whether previous advice should be reused: Q-Change per Step, which reuses advice if it leads to an increase in Q-values, and Decay Reusing Probability, which reuses advice with a decaying probability. Experiments on diverse RL tasks (Mario, Predator-Prey and Half Field Offense) confirm that our proposed framework significantly outperforms existing frameworks in which previous advice is not reused.
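To make the two reuse-decision rules concrete, the following is a minimal sketch of how a student agent might consult its advice memory; the function names, the exponential decay form, and the `decay_rate` parameter are illustrative assumptions and not the paper's exact formulation.

```python
import random


def reuse_by_q_change(q_before: float, q_after: float) -> bool:
    """Q-Change per Step (sketch): keep reusing the memorized advice only if
    following it previously increased the Q-value of the advised state-action
    pair; otherwise fall back to the student's own policy."""
    return q_after > q_before


def reuse_by_decaying_probability(times_reused: int, decay_rate: float = 0.99) -> bool:
    """Decay Reusing Probability (sketch): reuse memorized advice with a
    probability that decays as the advice is reused, so the student gradually
    relies on its own learned policy.  The exponential schedule is an assumed
    example of a decaying probability."""
    reuse_prob = decay_rate ** times_reused
    return random.random() < reuse_prob
```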