TFIV: Multigrained Token Fusion for Infrared and Visible Image via Transformer

Authors: Li, Jing; Yang, Bin; Bai, Lu; Dou, Hao; Li, Chang; Ma, Lingfei*
Source: IEEE Transactions on Instrumentation and Measurement, 2023, 72: 2526414.
DOI: 10.1109/TIM.2023.3312755

Abstract

Existing transformer-based infrared and visible image fusion methods mainly focus on the intra-modal self-attention correlations within each source image; however, they neglect the inter-modal discrepancies between the two source images at the same position, where the information carried by the corresponding infrared and visible tokens is unbalanced. Therefore, we develop a pure transformer fusion model that reconstructs the fused image in the token dimension, which not only perceives intra-modal long-range dependencies through the transformer's self-attention mechanism, but also captures inter-modal attentive correlations in token space. Moreover, to enhance and balance the interaction of inter-modal tokens when fusing the corresponding infrared and visible tokens, learnable attentive weights are applied to dynamically measure the correlation of inter-modal tokens at the same position. Concretely, infrared and visible tokens are first computed by two independent transformers, owing to their modal difference, to extract intra-modal long-range dependencies. Then, the corresponding inter-modal infrared and visible tokens are fused in token space to reconstruct the fused image. In addition, to comprehensively extract multiscale long-range dependencies and capture the attentive correlations of corresponding multimodal tokens at different token sizes, we extend the fusion to multigrained token-based fusion. Ablation studies and extensive experiments demonstrate the effectiveness and superiority of our model compared with nine state-of-the-art methods.
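The core fusion step described above can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the scoring vectors, token shapes, and the coarse-grain pooling below are hypothetical placeholders that only demonstrate the idea of per-position learnable attentive weights balancing corresponding inter-modal tokens, applied at two token granularities.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attentive_token_fusion(ir_tokens, vis_tokens, w_ir, w_vis):
    """Fuse corresponding inter-modal tokens with attentive weights.

    ir_tokens, vis_tokens: (N, D) token sequences from the two modal
    branches (in the paper, outputs of two independent transformers).
    w_ir, w_vis: (D,) scoring vectors standing in for learnable weights
    (random here, purely for illustration).
    """
    s_ir = ir_tokens @ w_ir           # (N,) one score per token position
    s_vis = vis_tokens @ w_vis        # (N,)
    # Softmax across the two modalities at each position, so the fused
    # token is a convex combination of the corresponding pair.
    w = softmax(np.stack([s_ir, s_vis], axis=-1), axis=-1)  # (N, 2)
    return w[:, :1] * ir_tokens + w[:, 1:] * vis_tokens     # (N, D)

# Multigrained fusion: apply the same rule at two token sizes.
N, D = 16, 32
ir = rng.standard_normal((N, D))
vis = rng.standard_normal((N, D))
fused_fine = attentive_token_fusion(
    ir, vis, rng.standard_normal(D), rng.standard_normal(D))
# Coarse grain (hypothetical): pool adjacent token pairs before fusing.
ir_c = ir.reshape(N // 2, 2, D).mean(axis=1)
vis_c = vis.reshape(N // 2, 2, D).mean(axis=1)
fused_coarse = attentive_token_fusion(
    ir_c, vis_c, rng.standard_normal(D), rng.standard_normal(D))
print(fused_fine.shape, fused_coarse.shape)  # (16, 32) (8, 32)
```

Because the two modality weights at each position sum to one, every fused token lies between its infrared and visible counterparts element-wise, which is how the scheme keeps the unbalanced inter-modal information in check.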

  • Affiliation
    Central University of Finance and Economics