Abstract
Fusing infrared intensity and polarization images can produce a single image that offers better visual perception and retains more vital information. Existing fusion methods based on convolutional neural networks (CNNs) rely on local feature extraction, which limits their ability to fully exploit the salient target features of polarization. In this Letter, we propose a transformer-based deep network to improve the performance of infrared polarization image fusion. Unlike existing CNN-based methods, our model uses the self-attention mechanism to encode long-range features of infrared polarization images and thereby obtain global contextual information. We also design a loss function with a self-supervised constraint to boost fusion performance. Experiments on a public infrared polarization dataset validate the effectiveness of the proposed method, which achieves better fusion performance than state-of-the-art methods.
