Contrastive learning for unsupervised sentence embeddings using negative samples with diminished semantics


Research field: Artificial Intelligence



Venue: The Journal of Supercomputing


Abstract

Unsupervised learning has made significant progress in recent years, driven by advances in contrastive learning. However, current methods for generating negative samples often lead to false negatives and feature suppression. In this paper, we propose a new contrastive learning method for unsupervised sentence embeddings using negative samples with diminished semantics (DSCSE), which includes three optimizations to produce more robust representations with less dependence on undesired features. First, we introduce semantically weakened negative samples, called mild negatives, by blurring the main parts of the sentence in the attention mechanism, allowing the model to learn sentence embeddings that are sensitive to semantic differences between sentences. Second, we leverage the mild negatives to eliminate false negative samples that can harm the model and to identify hard negative samples that can improve its performance. This filtering process improves the quality of the negative samples used for training. Finally, we introduce a novel loss function called triplet fusion loss (TFL) that considers negative samples at different levels and uses the filtered negative samples to improve the quality of the learned sentence embeddings. Experimental results on multiple semantic textual similarity tasks show that the proposed DSCSE outperforms unsupervised SimCSE by +1.52% in Spearman’s correlation, demonstrating its effectiveness in learning sentence embeddings.
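The abstract does not give the filtering rule or the form of TFL, so the following is only a minimal sketch of the general idea, not the authors' method. It shows a SimCSE-style InfoNCE loss in NumPy with one extra mild negative per anchor and a mask over suspected false negatives. The function name `info_nce_with_mild_negatives`, the temperature of 0.05, and the masking rule (an in-batch negative is dropped when it is more similar to the anchor than the anchor's own mild negative) are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def info_nce_with_mild_negatives(anchor, positive, mild_negative, temperature=0.05):
    """Hypothetical sketch: InfoNCE with one mild negative per anchor.

    anchor, positive, mild_negative: arrays of shape (N, D).
    Returns (mean loss, boolean mask of in-batch pairs treated as false negatives).
    """
    a = l2_normalize(anchor)
    p = l2_normalize(positive)
    m = l2_normalize(mild_negative)

    # (N, N) scaled cosine similarities; the diagonal holds the positive pairs.
    sim_ap = a @ p.T / temperature
    # (N,) similarity between each anchor and its own mild negative.
    sim_am = np.sum(a * m, axis=1) / temperature

    n = a.shape[0]
    off_diagonal = ~np.eye(n, dtype=bool)
    # ASSUMED rule (not from the paper): an in-batch negative that scores higher
    # than the anchor's own mild negative is treated as a likely false negative.
    false_negative = off_diagonal & (sim_ap > sim_am[:, None])
    masked = np.where(false_negative, -np.inf, sim_ap)

    # Append the mild negative as one extra negative column: shape (N, N + 1).
    logits = np.concatenate([masked, sim_am[:, None]], axis=1)

    # Numerically stable log-sum-exp over each row.
    row_max = logits.max(axis=1, keepdims=True)
    log_z = row_max[:, 0] + np.log(np.exp(logits - row_max).sum(axis=1))

    positive_logit = np.diag(sim_ap)
    loss = float(np.mean(log_z - positive_logit))
    return loss, false_negative
```

In this sketch the masked entries simply drop out of the softmax denominator; a loss that weights negatives at several levels, as the abstract describes for TFL, would need the definitions from the paper itself.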


Author Profile
Zhiyi Yu

School of Computer Science and Engineering, Central South University, Changsha, China

Andorra
Author Profile
Hong Li

School of Computer Science and Engineering, Central South University, Changsha, China

Andorra
Author Profile
Jialin Feng

School of Computer Science and Engineering, Central South University, Changsha, China

Andorra

Paper information

Publication year: 2023
Citations: 0
Country of publication: Andorra
Site: Springer
