Improving Embedding-based Large-scale Retrieval via Label Enhancement

Cited by: 0
Authors
Liu, Peiyang [1, 2]
Wang, Xi [2 ]
Wang, Sen [2 ]
Ye, Wei [1 ]
Xi, Xiangyu [1, 3]
Zhang, Shikun [1 ]
Affiliations
[1] Peking Univ, Natl Engn Res Ctr Software Engn, Beijing, Peoples R China
[2] PX Secur, Beijing, Peoples R China
[3] Meituan Dianping Grp, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Current embedding-based large-scale retrieval models are trained with 0-1 hard labels that indicate only whether a query is relevant to a document, ignoring the rich information carried by the degree of relevance. This paper proposes to improve embedding-based retrieval by better characterizing the query-document relevance degree, introducing label enhancement (LE) to this setting for the first time. To generate label distributions in the retrieval scenario, we design a novel and effective supervised LE method that incorporates prior knowledge from dynamic term weighting methods into contextual embeddings. Trained with the generated label distributions as auxiliary supervision, our method significantly outperforms four competitive existing retrieval models and its counterparts equipped with two alternative LE techniques. The improvement is observed on English and Chinese large-scale retrieval tasks under both standard and cold-start settings.
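The abstract does not state the training objective, so the following is only a minimal sketch of what "label distribution as auxiliary supervision" could look like, not the authors' implementation. It assumes a softmax over query-document similarity scores, a KL-divergence term against a soft label distribution, and a mixing weight `alpha`; the function names, the toy scores, and the soft labels are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hard_label_loss(scores, positive_index):
    """Cross-entropy against a 0-1 label: only the positive document counts."""
    probs = softmax(scores)
    return -math.log(probs[positive_index])


def label_distribution_loss(scores, label_distribution):
    """KL(label_distribution || model distribution) over the candidates."""
    probs = softmax(scores)
    return sum(
        t * math.log(t / p)
        for t, p in zip(label_distribution, probs)
        if t > 0.0  # terms with zero target mass contribute nothing
    )


def combined_loss(scores, positive_index, label_distribution, alpha=0.5):
    """Hard-label loss plus the soft-label term as auxiliary supervision.

    alpha is an assumed mixing weight, not a value from the paper.
    """
    hard = hard_label_loss(scores, positive_index)
    soft = label_distribution_loss(scores, label_distribution)
    return hard + alpha * soft


# Toy example: one query scored against three candidate documents.
scores = [2.0, 1.0, -1.0]        # made-up similarity scores
soft_labels = [0.7, 0.25, 0.05]  # hypothetical relevance degrees, summing to 1
print(combined_loss(scores, 0, soft_labels))
```

With `alpha=0` the objective reduces to ordinary hard-label training, which makes the auxiliary role of the soft term explicit; how the label distribution itself is produced (the LE step) is outside this sketch.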
Pages: 133 - 142
Page count: 10
Related papers
50 in total
  • [1] QuadrupletBERT: An Efficient Model For Embedding-Based Large-Scale Retrieval
    Liu, Peiyang
    Wang, Sen
    Wang, Xi
    Ye, Wei
    Zhang, Shikun
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 3734 - 3739
  • [2] PEFA: Parameter-Free Adapters for Large-scale Embedding-based Retrieval Models
    Chang, Wei-Cheng
    Jiang, Jyun-Yu
    Zhang, Jiong
    Al-Darabsah, Mutasem
    Teo, Choon Hui
    Hsieh, Cho-Jui
    Yu, Hsiang-Fu
    Vishwanathan, S. V. N.
    PROCEEDINGS OF THE 17TH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, WSDM 2024, 2024, : 77 - 86
  • [3] Improving Embedding-Based Retrieval in Friend Recommendation with ANN Query Expansion
    Kung, Pau Perng-Hwa
    Fan, Zihao
    Zhao, Tong
    Liu, Yozen
    Lai, Zhixin
    Shi, Jiahui
    Wu, Yan
    Yu, Jun
    Shah, Neil
    Venkataraman, Ganesh
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 2930 - 2934
  • [4] Binary Embedding-based Retrieval at Tencent
    Gan, Yukang
    Ge, Yixiao
    Zhou, Chang
    Su, Shupeng
    Xu, Zhouchuan
    Xu, Xuyuan
    Hui, Quanchao
    Chen, Xiang
    Wang, Yexin
    Shan, Ying
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 4056 - 4067
  • [5] Embedding-based Retrieval in Facebook Search
    Huang, Jui-Ting
    Sharma, Ashish
    Sun, Shuying
    Xia, Li
    Zhang, David
    Pronin, Philip
    Padmanabhan, Janani
    Ottaviano, Giuseppe
    Yang, Linjun
    KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020, : 2553 - 2561
  • [6] Coupled Binary Embedding for Large-Scale Image Retrieval
    Zheng, Liang
    Wang, Shengjin
    Tian, Qi
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2014, 23 (08) : 3368 - 3380
  • [7] Embedding-based Product Retrieval in Taobao Search
    Li, Sen
    Lv, Fuyu
    Jin, Taiwei
    Lin, Guli
    Yang, Keping
    Zeng, Xiaoyi
    Wu, Xiao-Ming
    Ma, Qianli
    KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 3181 - 3189
  • [8] Forward Compatible Training for Large-Scale Embedding Retrieval Systems
    Ramanujan, Vivek
    Vasu, Pavan Kumar Anasosalu
    Farhadi, Ali
    Tuzel, Oncel
    Pouransari, Hadi
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 19364 - 19373
  • [9] Random Projection Tree and Multiview Embedding for Large-Scale Image Retrieval
    Xie, Bo
    Mu, Yang
    Song, Mingli
    Tao, Dacheng
    NEURAL INFORMATION PROCESSING: MODELS AND APPLICATIONS, PT II, 2010, 6444 : 641 - +
  • [10] Large-scale image annotation via random forest based label propagation
    She, Qiaoqiao
    Yu, Yang
    Jiang, Yuan
    Zhou, Zhihua
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2012, 49 (11) : 2289 - 2295