
An effective negative sampling approach for contrastive learning of sentence embedding

Tan, Qitao; Song, Xiaoying; Ye, Guanghui; Wu, Chuan*
Science Citation Index Expanded

Abstract

Unsupervised sentence embedding learning is a fundamental task in natural language processing. Recently, unsupervised contrastive learning built on pre-trained language models has shown impressive performance on this task: it aligns positive sentence pairs while pushing apart negative pairs to achieve semantic uniformity in the representation space. However, most prior work samples negative pairs at random, which risks selecting uninformative negative examples (e.g., easily distinguishable examples or anisotropic representations) and thus degrades the quality of the learned representations. To address this issue, we propose nmCSE, a negative-mining contrastive learning method for sentence embeddings. Specifically, we identify distance moderation and spatial uniformity as two properties of informative negative examples, and devise distance-based weighting and grid sampling, respectively, as strategies to preserve them. Our method outperforms random sampling across seven semantic textual similarity datasets. Furthermore, it can easily be adapted to other contrastive learning scenarios (e.g., vision) and does not introduce significant computational overhead.
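The two strategies named above can be illustrated with a minimal sketch. The code below is not the paper's implementation; the function names, the Gaussian weighting form, the `mu`/`sigma` hyperparameters, and the random-projection grid are all illustrative assumptions. `distance_weights` up-weights negatives at a moderate cosine distance from the anchor (distance moderation), and `grid_sample` partitions candidates into grid cells and keeps one per cell (spatial uniformity).

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def distance_weights(anchor, negatives, mu=0.4, sigma=0.15):
    """Hypothetical distance-based weighting: negatives whose similarity
    to the anchor is 'moderate' (near mu) get the largest weights, so
    trivially easy or near-duplicate negatives are down-weighted.
    mu and sigma are illustrative values, not taken from the paper."""
    sims = np.array([cosine(anchor, n) for n in negatives])
    w = np.exp(-((sims - mu) ** 2) / (2 * sigma ** 2))
    return w / w.sum()  # normalized weights for the contrastive loss

def grid_sample(negatives, n_bins=4, seed=0):
    """Hypothetical grid sampling: project candidates onto a random
    direction, bucket the projections into n_bins cells, and pick one
    candidate per non-empty cell to spread negatives over the space."""
    rng = np.random.default_rng(seed)
    proj = negatives @ rng.normal(size=negatives.shape[1])
    edges = np.linspace(proj.min(), proj.max() + 1e-9, n_bins + 1)
    cells = np.digitize(proj, edges[1:-1])  # cell index per candidate
    picked = [rng.choice(np.flatnonzero(cells == c))
              for c in np.unique(cells)]
    return negatives[np.array(picked)]
```

In a full contrastive objective, the weights from `distance_weights` would rescale each negative's term in the InfoNCE denominator, while `grid_sample` would pre-filter the negative pool before the loss is computed.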

Keywords

Unsupervised learning; Sentence representation learning; Contrastive learning; Negative mining