Abstract
Unsupervised sentence embedding learning is a fundamental task in natural language processing. Recently, unsupervised contrastive learning based on pre-trained language models has shown impressive performance on this task. This approach aligns positive sentence pairs while pushing apart negative pairs, encouraging semantic uniformity in the representation space. However, most prior work samples negative pairs at random, which risks selecting uninformative negatives (e.g., easily distinguishable examples or anisotropic representations) and thereby degrades the quality of the learned representations. To address this issue, we propose nmCSE, a negative mining contrastive learning method for sentence embeddings. Specifically, we identify distance moderation and spatial uniformity as two properties of informative negative examples, and devise distance-based weighting and grid sampling as two strategies to preserve these properties, respectively. Our method outperforms random negative sampling across seven semantic textual similarity (STS) datasets. Furthermore, it readily transfers to other contrastive learning settings (e.g., vision) and introduces no significant computational overhead.
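To make the two strategies concrete, the sketch below illustrates one plausible realization in PyTorch. It is a minimal illustration under our own assumptions, not nmCSE's exact implementation: the Gaussian weighting function (its center `mu` and width `sigma` are hypothetical choices) stands in for distance moderation, and a simple coordinate-quantization grid (`grid_size`, `per_cell` are likewise hypothetical) stands in for spatially uniform negative selection.

```python
import torch
import torch.nn.functional as F


def weighted_infonce(anchors, positives, negatives, tau=0.05, mu=0.3, sigma=0.2):
    """InfoNCE-style loss with distance-moderated negative weighting.

    anchors, positives: (B, D) embeddings of positive pairs.
    negatives:          (B, K, D) candidate negatives per anchor.

    Negatives at a moderate cosine distance from the anchor receive higher
    weight; very close candidates (likely false negatives) and very far
    ones (easily distinguishable) are down-weighted.
    """
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    n = F.normalize(negatives, dim=-1)

    pos_sim = (a * p).sum(-1, keepdim=True)        # (B, 1) cosine similarity
    neg_sim = torch.einsum('bd,bkd->bk', a, n)     # (B, K) cosine similarities

    # Hypothetical Gaussian weight centered at similarity mu: one way to
    # encode "distance moderation", not the paper's exact weighting function.
    with torch.no_grad():
        w = torch.exp(-((neg_sim - mu) ** 2) / (2 * sigma ** 2))
        w = w / w.sum(-1, keepdim=True).clamp_min(1e-8)

    # Weighted denominator: exp(pos/tau) + sum_k w_k * exp(neg_k/tau).
    logits = torch.cat(
        [pos_sim / tau, neg_sim / tau + torch.log(w.clamp_min(1e-8))], dim=-1
    )
    return (torch.logsumexp(logits, dim=-1) - pos_sim.squeeze(-1) / tau).mean()


def grid_sample_negatives(candidates, grid_size=0.5, per_cell=1):
    """Spatially uniform negative selection: quantize normalized embeddings
    onto a coarse grid and keep at most `per_cell` candidates per occupied
    cell (a hypothetical stand-in for the paper's grid sampling)."""
    cells = torch.floor(F.normalize(candidates, dim=-1) / grid_size).long()
    keep, counts = [], {}
    for i, cell in enumerate(map(tuple, cells.tolist())):
        if counts.get(cell, 0) < per_cell:
            counts[cell] = counts.get(cell, 0) + 1
            keep.append(i)
    return candidates[keep]
```

In a training loop, one would plausibly first filter the in-batch candidate pool with `grid_sample_negatives` and then compute `weighted_infonce` over the survivors; both steps run in O(BK) per batch, consistent with the abstract's claim of negligible computational overhead.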