Arbitrarily shaped scene text detection with dynamic convolution
Abstract
Arbitrarily shaped scene text detection has developed rapidly in recent years, and segmentation-based text detection has proven to be an effective approach. However, problems caused by the diverse attributes of text instances, such as shape, scale, and presentation style (dense or sparse), persist. In this paper, we propose a novel text detector, termed DText, which formulates arbitrarily shaped scene text detection as a task based on dynamic convolution. Our method dynamically generates independent, text-instance-aware convolutional parameters for each text instance from multifeatures, thus overcoming some intractable limitations of arbitrary text detection, such as the splitting of similar adjacent text, which poses challenges to methods based on fixed, instance-shared convolutional parameters. Unlike standard segmentation methods that rely on region-of-interest bounding boxes, DText focuses on enhancing the flexibility of the network to retain details of instances at diverse resolutions while effectively improving prediction accuracy. Moreover, we propose encoding shape and position information according to the characteristics of the text instance, termed text-shape sensitive position embedding. This embedding provides explicit shape and position information to the generator of the dynamic convolution parameters. Experiments on five benchmarks (Total-Text, SCUT-CTW1500, MSRA-TD500, ICDAR2015, and MLT) show that our method achieves superior detection performance.
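
To make the idea of instance-aware dynamic convolution concrete, the following is a minimal NumPy sketch, not the DText implementation. It assumes a two-layer 1x1 convolutional mask head, a hypothetical linear `controller` that generates the head's parameters from the feature vector at each instance's center, and plain relative-coordinate channels as a simple stand-in for the text-shape sensitive position embedding, whose exact form is not given in the abstract. All names, sizes, and the random inputs are illustrative assumptions.

```python
import numpy as np

def relative_coords(h, w, cy, cx):
    """Per-pixel offsets from an instance center, shape (2, h, w).

    Assumption: a plain relative-coordinate map stands in for the
    text-shape sensitive position embedding described in the abstract.
    """
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return np.stack([(ys - cy) / h, (xs - cx) / w]).astype(np.float32)

def dynamic_mask_head(features, params, channels=4):
    """Apply a two-layer 1x1 conv head whose weights come from `params`.

    features: (c_in, h, w) shared feature map plus position channels.
    params:   flat vector holding w1, b1, w2, b2 for ONE text instance.
    """
    c_in, h, w = features.shape
    sizes = [c_in * channels, channels, channels, 1]
    assert params.size == sum(sizes)
    w1, b1, w2, b2 = np.split(params, np.cumsum(sizes)[:-1])
    x = features.reshape(c_in, -1)                      # (c_in, h*w)
    # A 1x1 convolution is a matrix product over the channel axis.
    x = np.maximum(w1.reshape(channels, c_in) @ x + b1[:, None], 0.0)
    x = w2.reshape(1, channels) @ x + b2[:, None]
    return 1.0 / (1.0 + np.exp(-x.reshape(h, w)))       # sigmoid mask

rng = np.random.default_rng(0)
H = W = 16
shared = rng.standard_normal((8, H, W)).astype(np.float32)

# Hypothetical parameter generator: a linear map from an 8-d instance
# feature to the 49 parameters of the head (10*4 + 4 + 4 + 1).
controller = 0.1 * rng.standard_normal((49, 8)).astype(np.float32)

masks = []
for cy, cx in [(4, 4), (12, 10)]:                       # two instance centers
    params = controller @ shared[:, cy, cx]             # instance-specific weights
    feats = np.concatenate([shared, relative_coords(H, W, cy, cx)])
    masks.append(dynamic_mask_head(feats, params))
```

Because each instance gets its own filter weights and its own position channels, two adjacent instances with similar appearance can still produce different masks from the same shared feature map, which is the property the abstract contrasts with fixed, instance-shared parameters.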
