Abstract
The emergence of cooperation among competing agents has commonly been studied through evolutionary games, but such cooperation often requires a mechanism or a third party to arise and be sustained. To investigate how a mechanism affects the evolution of cooperation, this paper proposes a novel reinforcement learning-based strategy-updating model. The model consists of two symmetric sets of convolutional neural networks. The agents' strategy-updating rules are defined as follows: first, the agents learn and predict the environment and the behaviors of neighboring agents; next, they estimate their future payoffs from this information; finally, they choose their strategies based on these estimated payoffs. An investigation of the behavioral characteristics and stable states of the network, for highly intelligent agents capable of memory-based learning and prediction in the evolutionary prisoner's dilemma game, shows that game initiators who adopt the mixed optimal payoff approach can increase the number of cooperators and facilitate "global cooperation" and "repaying kindness with kindness". Although the temptation factor has little effect on the population, increasing the discount factor can expand the scale of the cooperative cluster and can even achieve dynamic stability. Additionally, a smaller minibatch size favors the evolution of cooperation when the experience replay pool is small, whereas a larger minibatch size is more conducive to cooperation as the capacity of the experience replay pool increases. This research offers a novel reinforcement learning perspective for understanding the evolution of cooperation.
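The abstract does not specify the network architecture, loss, or payoff matrix, so the following is only a rough illustration of where the discount factor, the experience replay pool capacity, and the minibatch size enter a reinforcement learning strategy update. It deliberately swaps the paper's two convolutional networks for a tabular Q-learning agent. The weak prisoner's dilemma payoffs (R=1, T=b, S=P=0), the one-step-memory state (the opponent's previous move), and all names (`pd_payoff`, `ReplayQAgent`, `play`) are hypothetical assumptions, not the authors' method.

```python
import random
from collections import deque

COOPERATE, DEFECT = 1, 0


def pd_payoff(me: int, other: int, b: float) -> float:
    """Assumed weak prisoner's dilemma: R=1, T=b (temptation), S=P=0."""
    if me == COOPERATE:
        return 1.0 if other == COOPERATE else 0.0
    return b if other == COOPERATE else 0.0


class ReplayQAgent:
    """Hypothetical tabular stand-in for a CNN-based agent.

    State = opponent's previous action (a one-step memory).
    Q[(state, action)] estimates the discounted future payoff.
    """

    def __init__(self, gamma=0.9, lr=0.1, capacity=100, batch_size=8,
                 eps=0.1, seed=0):
        self.gamma, self.lr, self.eps = gamma, lr, eps
        self.batch_size = batch_size
        # Experience replay pool: the oldest transitions are evicted at capacity.
        self.pool = deque(maxlen=capacity)
        self.q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
        self.rng = random.Random(seed)

    def act(self, state: int) -> int:
        # Epsilon-greedy: usually pick the action with the higher estimated payoff.
        if self.rng.random() < self.eps:
            return self.rng.choice((COOPERATE, DEFECT))
        return max((COOPERATE, DEFECT), key=lambda a: self.q[(state, a)])

    def remember(self, s, a, r, s_next):
        self.pool.append((s, a, r, s_next))

    def learn(self):
        # Sample a minibatch from the pool and apply one-step Q-learning targets.
        if len(self.pool) < self.batch_size:
            return
        for s, a, r, s2 in self.rng.sample(list(self.pool), self.batch_size):
            target = r + self.gamma * max(self.q[(s2, 0)], self.q[(s2, 1)])
            self.q[(s, a)] += self.lr * (target - self.q[(s, a)])


def play(rounds=500, b=1.2, **kw):
    """Two agents play repeated PD; returns the cooperation fraction and agents."""
    a1 = ReplayQAgent(seed=1, **kw)
    a2 = ReplayQAgent(seed=2, **kw)
    s1 = s2 = COOPERATE  # each agent's state: the other's last move
    coop = 0
    for _ in range(rounds):
        x, y = a1.act(s1), a2.act(s2)
        a1.remember(s1, x, pd_payoff(x, y, b), y)
        a2.remember(s2, y, pd_payoff(y, x, b), x)
        a1.learn()
        a2.learn()
        s1, s2 = y, x
        coop += x + y
    return coop / (2 * rounds), a1, a2
```

Calling `play` with different `gamma`, `capacity`, and `batch_size` values is one way to probe these parameters in this toy two-player setting. It makes no claim to reproduce the network-level results reported above.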
Affiliation: Wuhan University
