An efficient long-text semantic retrieval approach via utilizing presentation learning on short-text

Cited by: 2
Authors
Wang, Junmei [1, 3]
Huang, Jimmy X. X. [2]
Sheng, Jinhua [1, 3]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp, Hangzhou 310018, Peoples R China
[2] York Univ, Sch Informat Technol, Informat Retrieval & Knowledge Management Res Lab, Toronto, ON, Canada
[3] Minist Ind & Informat Technol China, Key Lab Intelligent Image Anal Sensory & Cognit Hl, Hangzhou 310018, Peoples R China
Funding
National Natural Science Foundation of China; Natural Sciences and Engineering Research Council of Canada;
Keywords
Neural information retrieval; Long-text similarity; Pretrained language model; Efficiency;
DOI
10.1007/s40747-023-01192-3
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Although BERT-based short-text retrieval models achieve significant performance improvements, long-text retrieval still faces challenges in both efficiency and effectiveness. This study therefore proposes an efficient long-text retrieval model based on BERT, called LTR-BERT, which improves speed while retaining most of the long-text retrieval performance. Specifically, the LTR-BERT model is trained on the relevance between short texts. The long text is then segmented, encoded, and stored offline. In the retrieval stage, only the encoding of the query and the matching scores are computed, which speeds up retrieval. Moreover, a query expansion strategy is designed to enhance the representation of the original query and to reserve an encoding region for the query, which helps the model learn missing information in the representation stage. An interaction mechanism without trainable parameters takes into account both local semantic details and overall relevance, ensuring retrieval accuracy and further shortening the response time. Experiments are carried out on the MS MARCO Document Ranking dataset, which is specially designed for long-text retrieval. Compared with the interaction-focused semantic matching method BERT-CLS, the proposed LTR-BERT method increases MRR@10 by 2.74%. Moreover, the number of documents processed per millisecond increased by 333 times.
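The retrieval flow described in the abstract (segment and encode documents offline, then encode only the query and compute matching scores at search time) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the LTR-BERT implementation: `embed_token` is a toy stand-in for a BERT encoder, the fixed-size word segmentation is arbitrary, and the parameter-free interaction is a ColBERT-style maximum-similarity sum per segment followed by a maximum over segments, which the abstract does not confirm.

```python
import math
import random

DIM = 64  # toy embedding size (assumption; a real encoder would be BERT)


def embed_token(token):
    """Toy stand-in for a learned encoder: one deterministic unit vector per token."""
    rng = random.Random(token)  # a string seed is reproducible across runs
    vec = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def encode(text):
    """Encode a text as a list of token vectors."""
    return [embed_token(tok) for tok in text.lower().split()]


def segment(document, size=8):
    """Split a long text into consecutive segments of at most `size` words."""
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_index(documents, size=8):
    """Offline stage: segment every document and store the encoded segments."""
    return {
        doc_id: [encode(seg) for seg in segment(text, size)]
        for doc_id, text in documents.items()
    }


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def segment_score(query_vecs, seg_vecs):
    """Local detail: each query token keeps its best match inside the segment."""
    return sum(max(dot(q, d) for d in seg_vecs) for q in query_vecs)


def document_score(query_vecs, doc_segments):
    """Overall relevance: the best-matching segment represents the document."""
    return max((segment_score(query_vecs, s) for s in doc_segments), default=0.0)


def search(query, index):
    """Online stage: encode only the query, then score the stored segments."""
    query_vecs = encode(query)
    scored = [(document_score(query_vecs, segs), doc_id)
              for doc_id, segs in index.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Hypothetical two-document corpus for illustration.
DOCUMENTS = {
    "d1": "the cat sat on the mat near the warm fireplace in winter",
    "d2": "stock markets fell sharply after the central bank raised interest rates",
}
INDEX = build_index(DOCUMENTS)
RANKING = search("central bank interest rates", INDEX)
```

Because the document side is encoded once in `build_index`, the per-query cost in this sketch is one query encoding plus dot products with no learned parameters, which is the source of the speedup the abstract describes. The scoring functions could be swapped for any other parameter-free aggregation without touching the offline index.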
Pages: 963 - 979
Number of pages: 17
Related papers
50 in total
  • [11] Asymmetric Short-Text Clustering via Prompt
    Wang, Zhi
    Zhu, Yi
    Li, Yun
    Qiang, Jipeng
    Yuan, Yunhao
    Zhang, Chaowei
    NEW GENERATION COMPUTING, 2024, 42 (04) : 599 - 615
  • [12] Short-text learning in social media: a review
    Tommasel, Antonela
    Godoy, Daniela
    KNOWLEDGE ENGINEERING REVIEW, 2019, 34 : 1 - 38
  • [13] EnsembleGAN: Adversarial Learning for Retrieval-Generation Ensemble Model on Short-Text Conversation
    Zhang, Jiayi
    Tao, Chongyang
    Xu, Zhenjing
    Xie, Qiaojing
    Chen, Wei
    Yan, Rui
    PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19), 2019, : 435 - 444
  • [14] A Short-Text Similarity Model Combining Semantic and Syntactic Information
    Zhou, Ya
    Li, Cheng
    Huang, Guimin
    Guo, Qingkai
    Li, Hui
    Wei, Xiong
    ELECTRONICS, 2023, 12 (14)
  • [15] CFIR: Fast and Effective Long-Text To Image Retrieval for Large Corpora
    Long, Zijun
    Ge, Xuri
    McCreadie, Richard
    Jose, Joemon M.
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 2188 - 2198
  • [16] SyMSS: A syntax-based measure for short-text semantic similarity
    Oliva, Jesus
    Ignacio Serrano, Jose
    Dolores del Castillo, Maria
    Iglesias, Angel
    DATA & KNOWLEDGE ENGINEERING, 2011, 70 (04) : 390 - 405
  • [17] Short-Text Semantic Similarity (STSS): Techniques, Challenges and Future Perspectives
    Amur, Zaira Hassan
    Hooi, Yew Kwang
    Bhanbhro, Hina
    Dahri, Kamran
    Soomro, Gul Muhammad
    APPLIED SCIENCES-BASEL, 2023, 13 (06)
  • [18] Exploiting Global Semantic Similarity Biterms for Short-text Topic Discovery
    Lu, Heng-yang
    Ge, Gao-jian
    Li, Yun
    Wang, Chong-jun
    Xie, Jun-yuan
    2018 IEEE 30TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2018, : 975 - 982
  • [19] Classification of Sentiments in Short-Text: An approach using mSMTP measure
    Kumar, H. M. Keerthi
    Harish, B. S.
    Kumar, S. V. Aruna
    Aradhya, V. N. Manjunath
    2ND INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND SOFT COMPUTING (ICMLSC 2018), 2015, : 145 - 150
  • [20] Spectral Approach to find number of Clusters of Short-text Documents
    Goyal, Anil
    Jadon, Mukesh K.
    Pujari, Arun K.
    2013 FOURTH NATIONAL CONFERENCE ON COMPUTER VISION, PATTERN RECOGNITION, IMAGE PROCESSING AND GRAPHICS (NCVPRIPG), 2013