Abstract

This paper presents an integrated approach to metro service scheduling and train unit deployment, based on proximal policy optimization within a deep reinforcement learning framework. The optimization problem is formulated as a Markov decision process (MDP) subject to a set of operational constraints. To address the computational complexity, the value function and control policy are parameterized by artificial neural networks (ANNs), into which the operational constraints are incorporated through a purpose-built masking scheme. A proximal policy optimization (PPO) approach is developed for training the ANNs via successive transition simulations. The optimization framework is implemented and tested on a real-world scenario based on the Victoria Line of the London Underground, UK. The results show that the proposed methodology outperforms a set of selected evolutionary heuristics in both solution quality and computational efficiency. They also illustrate the advantages of flexible train composition in saving operational costs and reducing service irregularities. This study contributes to real-time metro operations under limited resources using state-of-the-art optimization techniques.
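
The abstract does not spell out the masking scheme or the training objective, so the following is only a minimal sketch of how such pieces are commonly built, not the authors' implementation. It assumes a discrete action set, a boolean feasibility mask derived from the operational constraints, and the standard PPO clipped surrogate. The names `masked_softmax`, `ppo_clip_term`, and the value `epsilon=0.2` are illustrative assumptions.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over feasible actions only; infeasible actions get probability 0.

    `mask[i]` is True when action i satisfies the constraints (assumed input).
    """
    if not any(mask):
        raise ValueError("at least one action must be feasible")
    # Subtract the largest feasible logit for numerical stability.
    peak = max(l for l, ok in zip(logits, mask) if ok)
    weights = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


def ppo_clip_term(new_prob, old_prob, advantage, epsilon=0.2):
    """Standard PPO clipped surrogate for one sampled transition."""
    ratio = new_prob / old_prob
    clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    # Take the more pessimistic of the unclipped and clipped estimates.
    return min(ratio * advantage, clipped * advantage)
```

Under these assumptions, an infeasible action can never be sampled, because its probability is exactly zero, while the clipping keeps each policy update close to the policy that generated the simulated transitions.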

  • Affiliation
    Beijing Jiaotong University