Pixel Voting Decoder: A novel decoder that regresses pixel relationships for segmentation

Abstract

With the rapid development of convolutional neural networks, both instance segmentation and semantic segmentation have achieved remarkable performance. Recently, many efforts have been made to solve these two segmentation tasks simultaneously with a unified Encoder-Decoder architecture, in which the encoder extracts high-level features from the input images for both tasks. However, existing decoders cannot meet the requirements of both tasks: semantic segmentation decoders are not flexible enough for instance segmentation, and instance segmentation decoders lack the precision needed for semantic segmentation. We therefore introduce a novel Pixel Voting Decoder that provides both precision and flexibility. The proposed decoder regresses the interlayer pixel relationships between the input and output feature maps across the convolutional layers. These pixel relationships are then treated as pixel votes for dynamically decoding the higher-level information from the encoder. Finally, we propose a dynamic deconvolution that makes full use of the votes for each pixel during decoding, together with a matrix formulation of the dynamic deconvolution that accelerates its computation. Experiments show that the proposed method achieves better performance than well-known methods on both instance segmentation on the COCO dataset and semantic segmentation on the Cityscapes dataset. The matrix implementation of the dynamic deconvolution also proves efficient and feasible.

Keywords

Convolutional neural network; Dynamic deconvolution; Encoder-Decoder; Image segmentation; Pixel voting; Residual block