WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach

Cited by: 0
Authors
Huang, Junjie [1 ,6 ]
Tang, Duyu [4 ]
Zhong, Wanjun [2 ]
Lu, Shuai [3 ,6 ]
Shou, Linjun [5 ]
Gong, Ming [5 ]
Jiang, Daxin [5 ]
Duan, Nan [4 ]
Affiliations
[1] Beihang Univ, Beijing, Peoples R China
[2] Sun Yat Sen Univ, Guangzhou, Peoples R China
[3] Peking Univ, Beijing, Peoples R China
[4] Microsoft Res Asia, Beijing, Peoples R China
[5] Microsoft STC Asia, Beijing, Peoples R China
[6] Microsoft, Beijing, Peoples R China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Producing sentence embeddings in an unsupervised way is valuable for natural language matching and retrieval problems in practice. In this work, we conduct a thorough examination of unsupervised sentence embeddings based on pretrained models. We study four pretrained models and conduct extensive experiments on seven datasets covering sentence semantics. We have three main findings. First, averaging all token vectors is better than using only the [CLS] vector. Second, combining both top and bottom layers is better than using only top layers. Lastly, an easy whitening-based vector normalization strategy, implemented in fewer than 10 lines of code, consistently boosts performance.
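The three findings can be illustrated with a minimal NumPy sketch. It is not the authors' code and rests on several assumptions: hidden states are taken to be already extracted from a pretrained encoder (random arrays stand in for them here), "combining top and bottom layers" is interpreted as averaging the mean-pooled first and last layers, and whitening uses the common SVD-based formulation (center the vectors, then map their covariance to the identity). The names `mean_pool`, `combine_layers`, `whiten`, and `cosine` are hypothetical, and the paper's exact procedure may differ.

```python
import numpy as np


def mean_pool(hidden, mask):
    """Average token vectors of one layer, ignoring padding positions.

    hidden: (batch, seq_len, dim) hidden states
    mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    """
    m = mask[..., None].astype(hidden.dtype)
    return (hidden * m).sum(axis=1) / np.clip(m.sum(axis=1), 1e-9, None)


def combine_layers(layer_states, mask, layers=(0, -1)):
    """Mean-pool each selected layer, then average the pooled vectors.

    layer_states: list of (batch, seq_len, dim) arrays ordered bottom to top.
    Averaging the first and last layer is an illustrative assumption.
    """
    pooled = [mean_pool(layer_states[i], mask) for i in layers]
    return np.mean(pooled, axis=0)


def whiten(emb, eps=1e-9):
    """Center embeddings and map their covariance to the identity (SVD-based)."""
    mu = emb.mean(axis=0, keepdims=True)
    cov = np.cov((emb - mu).T)               # (dim, dim) sample covariance
    u, s, _ = np.linalg.svd(cov)             # cov = u @ diag(s) @ u.T
    w = u @ np.diag(1.0 / np.sqrt(s + eps))  # whitening matrix
    return (emb - mu) @ w


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Demo with random arrays standing in for real encoder hidden states.
rng = np.random.default_rng(0)
batch, seq_len, dim, n_layers = 200, 12, 16, 4
layer_states = [rng.normal(size=(batch, seq_len, dim)) for _ in range(n_layers)]
mask = np.ones((batch, seq_len))
mask[:, 8:] = 0  # treat the last 4 positions of every sequence as padding

emb = combine_layers(layer_states, mask)  # (200, 16) sentence vectors
white = whiten(emb)                       # zero mean, ~identity covariance
print(cosine(white[0], white[1]))
```

In this sketch the mean vector and whitening matrix are estimated once from the whole sentence collection and applied to every vector; similarity is then measured with cosine on the whitened vectors. The whitening step itself fits in a handful of lines, which is consistent with the "fewer than 10 lines" claim above.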
Pages: 238-244
Page count: 7
Related papers
50 in total
  • [21] SwiftRank: An Unsupervised Statistical Approach of Keyword and Salient Sentence Extraction for Individual Documents
    Lynn, Htet Myet
    Lee, Eunji
    Choi, Chang
    Kim, Pankoo
    8TH INTERNATIONAL CONFERENCE ON EMERGING UBIQUITOUS SYSTEMS AND PERVASIVE NETWORKS (EUSPN 2017) / 7TH INTERNATIONAL CONFERENCE ON CURRENT AND FUTURE TRENDS OF INFORMATION AND COMMUNICATION TECHNOLOGIES IN HEALTHCARE (ICTH-2017) / AFFILIATED WORKSHOPS, 2017, 113 : 472 - 477
  • [22] TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning
    Wang, Kexin
    Reimers, Nils
    Gurevych, Iryna
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 671 - 688
  • [23] Smart Vulnerability Assessment for Scientific Cyberinfrastructure: An Unsupervised Graph Embedding Approach
    Ullman, Steven
    Samtani, Sagar
    Lazarine, Ben
    Zhu, Hongyi
    Ampel, Benjamin
    Patton, Mark
    Chen, Hsinchun
    2020 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENCE AND SECURITY INFORMATICS (ISI), 2020, : 135 - 140
  • [24] MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction
    Zhang, Linhan
    Chen, Qian
    Wang, Wen
    Deng, Chong
    Zhang, Shiliang
    Li, Bing
    Wang, Wei
    Cao, Xin
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 396 - 409
  • [25] Probabilistic Unsupervised Chinese Sentence Compression
    Chen, Jinguang
    He, Tingting
    Gui, Zhuoming
    Li, Fang
    2009 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING (GRC 2009), 2009, : 61 - +
  • [26] Unsupervised multilingual sentence boundary detection
    Kiss, Tibor
    Strunk, Jan
    COMPUTATIONAL LINGUISTICS, 2006, 32 (04) : 485 - 525
  • [27] Sentence Centrality Revisited for Unsupervised Summarization
    Zheng, Hao
    Lapata, Mirella
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 6236 - 6247
  • [28] Connecting Supervised and Unsupervised Sentence Embeddings
    Levi, Gil
    REPRESENTATION LEARNING FOR NLP, 2018, : 79 - 83
  • [29] Unsupervised Large Graph Embedding
    Nie, Feiping
    Zhu, Wei
    Li, Xuelong
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2422 - 2428
  • [30] Unsupervised Embedding Quality Evaluation
    Tsitsulin, Anton
    Munkhoeva, Marina
    Perozzi, Bryan
    TOPOLOGICAL, ALGEBRAIC AND GEOMETRIC LEARNING WORKSHOPS 2023, VOL 221, 2023, 221