摘要

Constrained by the numbers of action space and state space, Q-learning cannot be applied to continuous state space. Targeting this problem, the double deep Q network (DDQN) algorithm and the corresponding improvement methods were explored. First of all, to improve the accuracy of the DDNQ algorithm in estimating the target Q value in the training process, a multi-step guided strategy was introduced into the traditional DDQN algorithm, for which the single-step reward was replaced with the reward obtained in continuous multi-step interactions of mobile robots. Furthermore, an experience classification training method was introduced into the traditional DDQN algorithm, for which the state transition generated by the mobile robot-environment interaction was divided into two different types of experience pools, and experience pools were trained by the Q network, and the sampling proportions of the two experience pools were updated through the training loss. Afterward, the advantages of a multi-step guided DDQN (MS-DDQN) algorithm and experience classification DDQN (EC-DDQN) algorithm were combined to develop a novel experience classification multi-step DDQN (ECMS-DDQN) algorithm. Finally, the path planning of these four algorithms, including DDQN, MS-DDQN, EC-DDQN, and ECMS-DDQN, was simulated on the OpenAI Gym platform. The simulation results revealed that the ECMS-DDQN algorithm outperforms the other three in the total return value and generalization in path planning.

  • 单位
    桂林理工大学