WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach

Cited by: 0
Authors
Huang, Junjie [1 ,6 ]
Tang, Duyu [4 ]
Zhong, Wanjun [2 ]
Lu, Shuai [3 ,6 ]
Shou, Linjun [5 ]
Gong, Ming [5 ]
Jiang, Daxin [5 ]
Duan, Nan [4 ]
Affiliations
[1] Beihang Univ, Beijing, Peoples R China
[2] Sun Yat Sen Univ, Guangzhou, Peoples R China
[3] Peking Univ, Beijing, Peoples R China
[4] Microsoft Res Asia, Beijing, Peoples R China
[5] Microsoft STC Asia, Beijing, Peoples R China
[6] Microsoft, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Producing sentence embeddings in an unsupervised way is valuable for practical natural language matching and retrieval problems. In this work, we conduct a thorough examination of unsupervised sentence embeddings derived from pretrained models. We study four pretrained models and conduct extensive experiments on seven datasets concerning sentence semantics. We have three main findings. First, averaging all token representations is better than using only the [CLS] vector. Second, combining the top and bottom layers is better than using only the top layers. Lastly, a simple whitening-based vector normalization strategy, requiring fewer than 10 lines of code, consistently boosts performance.
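The whitening normalization mentioned in the abstract can be sketched in a few lines. The snippet below is a minimal illustration, assuming the sentence vectors (for example, averaged token embeddings combined from bottom and top layers of a pretrained model) are stacked row-wise in a NumPy array; the function name whiten, the eps guard, and the toy data are illustrative assumptions, not the authors' released code.

    import numpy as np

    def whiten(embeddings, eps=1e-9):
        # Center the embeddings so they have zero mean.
        mu = embeddings.mean(axis=0, keepdims=True)
        centered = embeddings - mu
        # Covariance matrix across embedding dimensions (d x d).
        cov = np.cov(centered.T)
        # SVD of the symmetric covariance: cov = U diag(s) U^T.
        u, s, _ = np.linalg.svd(cov)
        # Whitening matrix W = U diag(s)^(-1/2); eps guards tiny singular values.
        w = u @ np.diag(1.0 / np.sqrt(s + eps))
        # Whitened vectors have approximately identity covariance.
        return centered @ w

    # Toy usage: 1000 random 768-dimensional "sentence embeddings".
    vecs = np.random.randn(1000, 768)
    whitened = whiten(vecs)
    print(whitened.shape)  # (1000, 768)

After whitening, similarity between sentences is typically scored with cosine or dot-product similarity on the transformed vectors, which is the setting evaluated on the semantic textual similarity datasets referenced above.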
Pages: 238-244
Number of pages: 7
Related Papers
50 records in total
  • [1] Keyword Extractor for Contrastive Learning of Unsupervised Sentence Embedding
    Cai, Hua
    Chen, Weihong
    Shi, Kehuan
    Li, Shuaishuai
    Xu, Qing
    2022 5TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND NATURAL LANGUAGE PROCESSING, MLNLP 2022, 2022, : 88 - 93
  • [2] Instance Smoothed Contrastive Learning for Unsupervised Sentence Embedding
    He, Hongliang
    Zhang, Junlei
    Lan, Zhenzhong
    Zhang, Yue
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 11, 2023, : 12863 - 12871
  • [3] An Unsupervised Sentence Embedding Method by Mutual Information Maximization
Zhang, Yan
    He, Ruidan
    Liu, Zuozhu
    Lim, Kwan Hui
    Bing, Lidong
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 1601 - 1610
  • [4] Contrastive Learning for Unsupervised Sentence Embedding with False Negative Calibration
    Chiu, Chi-Min
    Lin, Ying-Jia
    Kao, Hung-Yu
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT III, PAKDD 2024, 2024, 14647 : 290 - 301
  • [5] Prefix Data Augmentation for Contrastive Learning of Unsupervised Sentence Embedding
    Wang, Chunchun
    Lv, Shu
    APPLIED SCIENCES-BASEL, 2024, 14 (07):
  • [6] DebCSE: Rethinking Unsupervised Contrastive Sentence Embedding Learning in the Debiasing Perspective
    Miao, Pu
    Du, Zeyao
    Zhang, Junlin
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 1847 - 1856
  • [7] SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples
    Wang, Hao
    Dou, Yong
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT IV, 2023, 14089 : 419 - 431
  • [8] Unsupervised Sentence Embedding Using Document Structure-Based Context
    Lee, Taesung
    Park, Youngja
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II, 2020, 11907 : 633 - 647
  • [9] An Unsupervised Sentence Embedding Method by Maximizing the Mutual Information of Augmented Text Representations
    Sheng, Tianye
    Wang, Lisong
    He, Zongfeng
    Sun, Mingjie
    Jiang, Guohua
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT II, 2022, 13530 : 174 - 185
  • [10] OssCSE: Overcoming Surface Structure Bias in Contrastive Learning for Unsupervised Sentence Embedding
    Shi, Zhan
    Wang, Guoyin
    Bai, Ke
    Li, Jiwei
    Li, Xiang
    Cui, Qingjun
    Zeng, Belinda
    Chilimbi, Trishul
    Zhu, Xiaodan
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 7242 - 7254