摘要
Downsampling operations such as max pooling or strided convolution are ubiquitously utilized in Convolutional Neural Networks (CNNs) to aggregate local features, enlarge receptive field, and minimize computational overhead. However, for a semantic segmentation task, pooling features over the local neighbourhood may result in the loss of important spatial information, which is conducive for pixel-wise predictions. To address this issue, we introduce a simple yet effective pooling operation called the Haar Wavelet-based Downsampling (HWD) module. This module can be easily integrated into CNNs to enhance the performance of semantic segmentation models. The core idea of HWD is to apply Haar wavelet transform for reducing the spatial resolution of feature maps while preserving as much information as possible. Furthermore, to investigate the benefits of HWD, we propose a novel metric, named as feature entropy index (FEI), which measures the degree of information uncertainty after downsampling in CNNs. Specifically, the FEI can be used to indicate the ability of downsampling methods to preserve essential information in semantic segmentation. Our comprehensive experiments demonstrate that the proposed HWD module could (1) effectively improve the segmentation performance across different modality image datasets with various CNN architectures, and (2) efficiently reduce information uncertainty compared to the conventional downsampling methods. Our implementation are available at https:// github.com/apple1986/HWD.
-
单位武汉工程大学; 华中科技大学