SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples

Cited by: 10
Authors
Wang, Hao [1]
Dou, Yong [1]
Affiliations
[1] Natl Univ Def Technol, Changsha 410073, Peoples R China
Keywords
Unsupervised Sentence Embedding; Contrastive Learning; Feature Suppression; Soft Negative Samples; Bidirectional Margin Loss
DOI
10.1007/978-981-99-4752-2_35
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Unsupervised sentence embedding aims to obtain the most appropriate embedding for a sentence to reflect its semantics. Contrastive learning has been attracting growing attention. For a given sentence, current models utilize diverse data augmentation methods to generate positive samples, while treating other independent sentences as negative samples. They then adopt the InfoNCE loss to pull the embeddings of positive pairs closer together and push those of negative pairs apart. Although these models have made great progress, we argue that they may suffer from feature suppression: the models fail to distinguish and decouple textual similarity from semantic similarity. They may overestimate the semantic similarity of any sentence pair with similar text regardless of the actual semantic difference between them, and vice versa. Herein, we propose contrastive learning for unsupervised sentence embedding with soft negative samples (SNCSE). Soft negative samples share highly similar text with the original samples but have clearly different semantics. Specifically, we take the negation of the original sentences as soft negative samples, and propose Bidirectional Margin Loss (BML) to introduce them into the traditional contrastive learning framework. Our experimental results on the semantic textual similarity (STS) task show that SNCSE can obtain state-of-the-art performance with different encoders, indicating its strength on unsupervised sentence embedding. Our code and models are released at https://github.com/Sense-GVT/SNCSE.
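The abstract describes two loss components: the standard InfoNCE loss over positive and negative pairs, and a Bidirectional Margin Loss (BML) that constrains how much less similar a soft negative (e.g. a negated sentence) may be than a positive sample. A minimal numerical sketch follows; this is not the authors' released implementation, and the margin values `alpha`/`beta`, the temperature `tau`, and the exact form of the similarity gap `delta` are illustrative assumptions:

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two 1-D vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def infonce_loss(anchor, positive, negatives, tau=0.05):
    # Standard InfoNCE: cross-entropy that pulls the positive pair
    # together and pushes the negatives apart.
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = np.array(sims) / tau
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[0]))             # positive sits at index 0

def bml_loss(anchor, positive, soft_negative, alpha=0.05, beta=0.2):
    # Bidirectional Margin Loss (sketch): let
    #   delta = cos(anchor, soft_negative) - cos(anchor, positive).
    # The soft negative's text is close to the anchor, so delta should be
    # negative but not arbitrarily so; this penalizes delta outside an
    # assumed band [-beta, -alpha].
    delta = cosine(anchor, soft_negative) - cosine(anchor, positive)
    return max(0.0, delta + alpha) + max(0.0, -delta - beta)
```

In training, the two terms would be combined, e.g. `infonce_loss(a, p, negs) + lam * bml_loss(a, p, sn)` for some weight `lam`; the weight and margins here are placeholders, not values from the paper.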
Pages: 419 - 431
Page count: 13
Related Papers
50 items in total
  • [41] DistillCSE: Distilled Contrastive Learning for Sentence Embeddings
    Xu, Jiahao
    Shao, Wei
    Chen, Lihui
    Liu, Lemao
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 8153 - 8165
  • [42] MCSE: Multimodal Contrastive Learning of Sentence Embeddings
    Zhang, Miaoran
    Mosbach, Marius
    Adelani, David Ifeoluwa
    Hedderich, Michael A.
    Klakow, Dietrich
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 5959 - 5969
  • [43] Unsupervised Contrastive Learning of Image Representations from Ultrasound Videos with Hard Negative Mining
    Basu, Soumen
    Singla, Somanshu
    Gupta, Mayank
    Rana, Pratyaksha
    Gupta, Pankaj
    Arora, Chetan
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT IV, 2022, 13434 : 423 - 433
  • [44] Contrastive Learning of Sentence Embeddings from Scratch
    Zhang, Junlei
    Lan, Zhenzhong
    He, Junxian
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 3916 - 3932
  • [45] SimCSE: Simple Contrastive Learning of Sentence Embeddings
    Gao, Tianyu
    Yao, Xingcheng
    Chen, Danqi
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 6894 - 6910
  • [46] Composition-contrastive Learning for Sentence Embeddings
    Chanchani, Sachin
    Huang, Ruihong
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 15836 - 15848
  • [47] Pairwise Supervised Contrastive Learning of Sentence Representations
    Zhang, Dejiao
    Li, Shang-Wen
    Xiao, Wei
    Zhu, Henghui
    Nallapati, Ramesh
    Arnold, Andrew O.
    Xiang, Bing
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 5786 - 5798
  • [48] Cross-lingual Sentence Embedding for Low-resource Chinese-Vietnamese Based on Contrastive Learning
    Huang, Yuxin
    Liang, Yin
    Wu, Zhaoyuan
    Zhu, Enchang
    Yu, Zhengtao
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (06)
  • [49] Negative Samples Mining Matters: Reconsidering Hyperspectral Image Classification With Contrastive Learning
    Liu, Hui
    Huang, Chenjia
    Chen, Ning
    Xie, Tao
    Lu, Mingyue
    Huang, Zhou
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [50] Feature extraction framework based on contrastive learning with adaptive positive and negative samples
    Zhang, Hongjie
    Zhao, Siyu
    Qiang, Wenwen
    Chen, Yingyi
    Jing, Ling
    NEURAL NETWORKS, 2022, 156 : 244 - 257