Abstract
Background subtraction is a fundamental and challenging task in computer vision that aims to segment moving objects from the background. Attention mechanisms have recently become an active topic in neural network research, and algorithms based on encoder-decoder and multi-scale networks have achieved impressive results in background subtraction. In this paper, we propose a multi-scale inputs and labels (MSIL) model built on an encoder-decoder network with channel attention. The multi-scale fusion encoding (MSFE) module makes effective use of multi-scale inputs by fusing high-level and low-level feature details. The channel attention (CA) module connects the encoder and the decoder and models channel-wise attention. The multi-label supervision decoding (MLSD) module learns richer hierarchical features and achieves better performance through a new multi-label supervision scheme. The proposed model is evaluated on the CDnet-2014 and LASIESTA datasets, where it achieves average F-Measures of 0.9851 and 0.9633, respectively, demonstrating its effectiveness and superiority. In addition, scene-independent evaluation experiments on the CDnet-2014 dataset demonstrate the effectiveness of the model on unseen videos.
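For readers unfamiliar with the reported metric, the F-Measure of a binary foreground mask is the harmonic mean of precision and recall. The following is a minimal sketch of the standard definition, not the authors' evaluation code. The function name `f_measure` and the flat 0/1 mask representation are illustrative assumptions, and dataset-specific conventions such as ignored regions are not handled.

```python
def f_measure(pred, truth):
    """Compute the F-Measure for two flat sequences of 0/1 labels.

    pred  -- predicted foreground mask (1 = foreground, 0 = background)
    truth -- ground-truth mask with the same length and encoding
    Both the name and the flat-list format are illustrative assumptions.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")

    # Count true positives, false positives and false negatives.
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)

    # With no true positives, precision and recall are zero or undefined;
    # this sketch returns 0.0 in that case.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)
```

A per-dataset average such as the ones quoted above would then be obtained by averaging this score over videos or categories, according to the benchmark's own protocol.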