Improving Embedding-based Large-scale Retrieval via Label Enhancement

Cited by: 0
Authors
Liu, Peiyang [1, 2]
Wang, Xi [2]
Wang, Sen [2]
Ye, Wei [1]
Xi, Xiangyu [1, 3]
Zhang, Shikun [1]
Affiliations
[1] Peking Univ, Natl Engn Res Ctr Software Engn, Beijing, Peoples R China
[2] PX Secur, Beijing, Peoples R China
[3] Meituan Dianping Grp, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Current embedding-based large-scale retrieval models are trained with 0-1 hard labels that indicate only whether a query is relevant to a document, ignoring rich information about the degree of relevance. This paper proposes to improve embedding-based retrieval by better characterizing the query-document relevance degree, introducing label enhancement (LE) to this setting for the first time. To generate label distributions in the retrieval scenario, we design a novel and effective supervised LE method that incorporates prior knowledge from dynamic term weighting methods into contextual embeddings. Models trained with the generated label distribution as auxiliary supervision significantly outperform four competitive existing retrieval models, as well as their counterparts equipped with two alternative LE techniques. This superiority is observed on English and Chinese large-scale retrieval tasks under both standard and cold-start settings.
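The idea of using a label distribution as auxiliary supervision can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the loss, so the KL-divergence term, the weight `alpha`, and all function names below are assumptions made for illustration only. The label distribution is taken as given; how the paper generates it (term weighting plus contextual embeddings) is not reproduced here.

```python
import math


def softmax(scores):
    """Turn raw query-document similarity scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hard_label_loss(scores, positive_index):
    """Cross-entropy against a 0-1 label: only the single relevant document counts."""
    probs = softmax(scores)
    return -math.log(probs[positive_index])


def label_distribution_loss(scores, label_distribution):
    """KL(label_distribution || softmax(scores)); zero-probability targets are skipped."""
    probs = softmax(scores)
    return sum(
        t * math.log(t / p)
        for t, p in zip(label_distribution, probs)
        if t > 0
    )


def combined_loss(scores, positive_index, label_distribution, alpha=0.5):
    """Hard-label loss plus an auxiliary soft-label term (alpha is a hypothetical weight)."""
    return hard_label_loss(scores, positive_index) + alpha * label_distribution_loss(
        scores, label_distribution
    )


if __name__ == "__main__":
    # Similarity scores of one query against three candidate documents (made-up values).
    scores = [2.0, 1.0, 0.1]
    # A made-up graded relevance distribution over the same three documents.
    soft_labels = [0.7, 0.2, 0.1]
    print(combined_loss(scores, positive_index=0, label_distribution=soft_labels))
```

With `alpha=0` the sketch reduces to ordinary hard-label training; a larger `alpha` pulls the model's score distribution toward the graded relevance targets.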
Pages: 133-142
Page count: 10