WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach

Cited by: 0
Authors
Huang, Junjie [1, 6]
Tang, Duyu [4]
Zhong, Wanjun [2]
Lu, Shuai [3, 6]
Shou, Linjun [5]
Gong, Ming [5]
Jiang, Daxin [5]
Duan, Nan [4]
Affiliations
[1] Beihang Univ, Beijing, Peoples R China
[2] Sun Yat Sen Univ, Guangzhou, Peoples R China
[3] Peking Univ, Beijing, Peoples R China
[4] Microsoft Res Asia, Beijing, Peoples R China
[5] Microsoft STC Asia, Beijing, Peoples R China
[6] Microsoft, Beijing, Peoples R China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Producing sentence embeddings in an unsupervised way is valuable for natural language matching and retrieval problems in practice. In this work, we conduct a thorough examination of unsupervised sentence embeddings based on pretrained models. We study four pretrained models and conduct extensive experiments on seven datasets concerning sentence semantics. We have three main findings. First, averaging all tokens is better than using only the [CLS] vector. Second, combining both top and bottom layers is better than using only the top layers. Lastly, an easy whitening-based vector normalization strategy with fewer than 10 lines of code consistently boosts performance.
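The three findings in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names `mean_pool`, `combine_layers`, and `whiten` are hypothetical, the inputs are assumed to be token vectors already extracted from a pretrained model, and the whitening step shown is a standard SVD-based whitening transform, which may differ in detail from the one used in the paper.

```python
import numpy as np


def mean_pool(token_vectors, mask):
    """Average token vectors over non-padding positions.

    token_vectors: array of shape (batch, seq_len, dim)
    mask: array of shape (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = mask[..., None].astype(token_vectors.dtype)
    summed = (token_vectors * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


def combine_layers(layer_embeddings):
    """Average sentence embeddings taken from several layers (assumed choice)."""
    return np.mean(np.stack(layer_embeddings, axis=0), axis=0)


def whiten(embeddings, eps=1e-9):
    """Center the embeddings and rescale them to (near) identity covariance."""
    mu = embeddings.mean(axis=0, keepdims=True)
    centered = embeddings - mu
    cov = centered.T @ centered / embeddings.shape[0]
    u, s, _ = np.linalg.svd(cov)
    w = u / np.sqrt(s + eps)  # same as u @ diag(1 / sqrt(s))
    return centered @ w
```

In this sketch, whitening is fitted on the same set of sentence embeddings it transforms; whether to fit it on a separate corpus is a design choice the abstract does not specify.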
Pages: 238-244
Number of pages: 7
Related papers
50 in total
  • [41] Unsupervised abstractive summarization via sentence rewriting
    Zhang, Zhihao
    Liang, Xinnian
    Zuo, Yuan
    Li, Zhoujun
    COMPUTER SPEECH AND LANGUAGE, 2023, 78
  • [42] Unsupervised Relation Extraction Using Sentence Encoding
    Ali, Manzoor
    Saleem, Muhammad
    Ngomo, Axel-Cyrille Ngonga
    SEMANTIC WEB: ESWC 2021 SATELLITE EVENTS, 2021, 12739 : 136 - 140
  • [43] Unsupervised Rewriter for Multi-Sentence Compression
    Zhao, Yang
    Shen, Xiaoyu
    Bi, Wei
    Aizawa, Akiko
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 2235 - 2240
  • [44] An unsupervised method for extractive multi-document summarization based on centroid approach and sentence embeddings
    Lamsiyah, Salima
    El Mahdaouy, Abdelkader
    Espinasse, Bernard
    Ouatik, Said El Alaoui
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 167
  • [45] Scalable Attentive Sentence-Pair Modeling via Distilled Sentence Embedding
    Barkan, Oren
    Razin, Noam
    Malkiel, Itzik
    Katz, Ori
    Caciularu, Avi
    Koenigstein, Noam
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 3235 - 3242
  • [46] Strong systematicity through sensorimotor conceptual grounding: an unsupervised, developmental approach to connectionist sentence processing
    Jansen, Peter A.
    Watter, Scott
    CONNECTION SCIENCE, 2012, 24 (01) : 25 - 55
  • [47] UNSUPERVISED DISCRIMINANT EMBEDDING IN CLUSTER SPACES
    Szekely, Eniko
    Bruno, Eric
    Marchand-Maillet, Stephane
    KDIR 2009: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND INFORMATION RETRIEVAL, 2009, : 70 - 76
  • [48] Unsupervised Attributed Multiplex Network Embedding
    Park, Chanyoung
    Kim, Donghyun
    Han, Jiawei
    Yu, Hwanjo
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 5371 - 5378
  • [49] CLSEP: Contrastive learning of sentence embedding with prompt
    Wang, Qian
    Zhang, Weiqi
    Lei, Tianyi
    Cao, Yu
    Peng, Dezhong
    Wang, Xu
    KNOWLEDGE-BASED SYSTEMS, 2023, 266
  • [50] Sign Language Translation with Sentence Embedding Supervision
    Hamidullah, Yasser
    van Genabith, Josef
    Espana-Bonet, Cristina
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2: SHORT PAPERS, 2024, : 425 - 434