Emotion-sensitive deep dyna-Q learning for task-completion dialogue policy learning
Abstract
In recent years, task-oriented dialogue systems have received extensive attention from academia and industry. Training a dialogue agent through reinforcement learning is often costly because it requires many interactions with real users. Although the Deep Dyna-Q (DDQ) framework uses simulated experience to reduce the cost of direct reinforcement learning, it still suffers from challenges such as delayed rewards and policy degradation. This paper proposes an Emotion-Sensitive Deep Dyna-Q (ES-DDQ) model that (1) introduces an emotional world model, which uses emotion-related cues to improve the ability of the traditional DDQ framework to model and simulate users, and (2) designs two kinds of emotion-related immediate rewards to mitigate the delayed-reward problem. Experimental results show that the proposed approach effectively simulates users' behaviors and outperforms the state-of-the-art benchmarks. © 2021 Published by Elsevier B.V.
