摘要
In this paper, we propose the Glyph-Semanteme fusion Embedding (GSE) for Chinese character and apply it to Offline Handwritten Chinese Text Recognition (offline-HCTR). It is well known that the number of Chinese characters is very large and the glyphs of these characters are complex, but few researchers realize that the underlying reason for this phenomenon is that Chinese is a form of ideogram, which indicates that there are correlations between the glyph and semanteme of a character. In order to utilize this feature and create better representations for Chinese characters, firstly, we extract the glyph embedding and semanteme embedding for each Chinese character; then we propose a parameterized gated fusion strategy to automatically calculate the Glyph-Semanteme fusion Embedding for each character by fusing its glyph embedding and semanteme embedding. We apply the proposed GSE to an attention-based Encoder-decoder network for the offline-HCTR task. Furthermore, two kinds of GSE, Character-level GSE (CGSE) and Text-level GSE (TGSE), are applied to the decoder phase to yield the predictions. On the standard benchmark ICDAR-2013 HCTR competition dataset, the proposed method achieves 96.65% character-level recognition accuracy, which demonstrates the effectiveness of the proposed glyph-semanteme fusion embedding.