Abstract

In recent years, with the emergence and rapid development of Generative Adversarial Networks (GANs), generating realistic images that are semantically consistent with a given text description has become one of the most popular research directions in computer vision. Although attention mechanisms have been adopted in many existing methods, these methods tend to treat all sub-regions of the generated image equally. For this reason, this paper proposes a novel generative adversarial network, rdAttnGAN, which generates fine-grained images from text by training multiple generator-discriminator pairs. Compared with conventional models, it pays more attention to the representativeness and diversity of the generated images. In addition, an optimized method for computing the similarity between the generated image and the text description is introduced to strengthen the assessment of image representativeness. By focusing on the generation of important image sub-regions, the model further improves the training of the generators. To verify the effectiveness of the proposed framework, a comprehensive set of experiments is conducted on the CUB and COCO datasets. The results demonstrate that rdAttnGAN improves the representativeness and diversity of the generated images.
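To make the image-text similarity idea concrete, the following is a minimal sketch of an attention-weighted matching score, in the spirit of the word-level attention used by AttnGAN-style models. It is an illustration under stated assumptions, not the paper's actual method: the tensor shapes, the names `region_feats`, `word_embs`, and the smoothing factor `gamma` are all hypothetical. The key property it demonstrates is that words attend to their most relevant image sub-regions, so important sub-regions contribute more to the score instead of all regions being weighted equally.

```python
import torch
import torch.nn.functional as F

def image_text_similarity(region_feats, word_embs, gamma=5.0):
    """Score how well generated image regions match a text description.

    region_feats: (B, R, D) -- D-dim features for R image sub-regions
    word_embs:    (B, T, D) -- D-dim embeddings for T caption words
    Returns a (B,) similarity score per image-caption pair.
    """
    region_feats = F.normalize(region_feats, dim=-1)
    word_embs = F.normalize(word_embs, dim=-1)

    # Cosine similarity between every word and every region: (B, T, R)
    sim = torch.bmm(word_embs, region_feats.transpose(1, 2))

    # Each word attends to its most relevant regions, so important
    # sub-regions dominate the score rather than being weighted equally.
    attn = F.softmax(gamma * sim, dim=-1)          # (B, T, R)
    context = torch.bmm(attn, region_feats)        # (B, T, D)

    # Word-level relevance, aggregated with log-sum-exp pooling.
    word_scores = F.cosine_similarity(context, word_embs, dim=-1)  # (B, T)
    return torch.logsumexp(gamma * word_scores, dim=-1) / gamma
```

Such a score can serve as an auxiliary matching loss during generator training, rewarding images whose salient sub-regions align with the words of the conditioning caption.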