Abstract

The visual question generation task aims to generate meaningful questions about an image that target a given answer. Existing methods focus on the visual concepts in the image when generating questions. However, humans inevitably draw on their knowledge about the visual objects in an image to construct questions. In this paper, we propose a knowledge-based visual question generation model that integrates visual concepts and non-visual knowledge to generate questions. To obtain visual concepts, we use a pre-trained object detection model to extract object-level features for each object in the image. To obtain useful non-visual knowledge, we first retrieve knowledge related to the visual objects in the image from a knowledge base. Because not all retrieved knowledge is helpful for this task, we introduce an answer-aware module that selects, from the retrieved knowledge, the candidate knowledge related to the answer, so that the generated question is targeted at the answer. Finally, object-level representations containing both visual concepts and non-visual knowledge are fed to a decoder module to generate questions. Extensive experiments on the FVQA and KBVQA datasets show that the proposed model outperforms state-of-the-art models.
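
As a rough illustration of the answer-aware idea, the sketch below weights retrieved knowledge vectors by their relevance to an answer vector and attaches the pooled result to an object's visual feature. It is a minimal sketch under stated assumptions, not the model's actual architecture: the dot-product scoring, the softmax pooling, the concatenation-based fusion, and the names `answer_aware_pool` and `fuse` are all hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def answer_aware_pool(answer_vec, knowledge_vecs):
    """Hypothetical answer-aware step: score each retrieved knowledge
    vector against the answer, then return the attention weights and
    the weighted sum of the knowledge vectors."""
    weights = softmax([dot(answer_vec, k) for k in knowledge_vecs])
    dim = len(answer_vec)
    pooled = [
        sum(w * k[i] for w, k in zip(weights, knowledge_vecs))
        for i in range(dim)
    ]
    return weights, pooled


def fuse(visual_vec, knowledge_vec):
    """Assumed fusion: concatenate the visual feature of an object with
    its answer-aware knowledge vector before decoding."""
    return list(visual_vec) + list(knowledge_vec)


# Toy example: the first knowledge vector aligns with the answer.
answer = [1.0, 0.0]
knowledge = [[1.0, 0.0], [0.0, 1.0]]
weights, pooled = answer_aware_pool(answer, knowledge)
object_repr = fuse([0.5, 0.5], pooled)
```

In this toy setting the knowledge vector aligned with the answer receives the larger weight, so the fused object representation is dominated by answer-relevant knowledge. A real system would use learned embeddings and learned attention parameters rather than fixed vectors.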