An efficient long-text semantic retrieval approach via utilizing presentation learning on short-text

Cited by: 2
Authors
Wang, Junmei [1, 3]
Huang, Jimmy X. X. [2]
Sheng, Jinhua [1, 3]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp, Hangzhou 310018, Peoples R China
[2] York Univ, Sch Informat Technol, Informat Retrieval & Knowledge Management Res Lab, Toronto, ON, Canada
[3] Minist Ind & Informat Technol China, Key Lab Intelligent Image Anal Sensory & Cognit Hl, Hangzhou 310018, Peoples R China
Funding
National Natural Science Foundation of China; Natural Sciences and Engineering Research Council of Canada
Keywords
Neural information retrieval; Long-text similarity; Pretrained language model; Efficiency;
DOI
10.1007/s40747-023-01192-3
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Although BERT-based short-text retrieval models achieve significant performance improvements, long-text retrieval still faces challenges in both efficiency and effectiveness. Therefore, this study proposes an efficient long-text retrieval model based on BERT (called LTR-BERT). The model improves retrieval speed while retaining most of the long-text retrieval performance. Specifically, the LTR-BERT model is trained using the relevance between short texts. The long text is then segmented and stored offline. In the retrieval stage, only the query encoding and the matching scores are computed, which speeds up retrieval. Moreover, a query expansion strategy is designed to enhance the representation of the original query and reserve an encoding region for the query. This is beneficial for learning missing information in the representation stage. The interaction mechanism, which has no trainable parameters, takes into account both local semantic details and overall relevance to ensure retrieval accuracy and further shorten the response time. Experiments are carried out on the MS MARCO Document Ranking dataset, which is specially designed for long-text retrieval. Compared with the interaction-focused semantic matching method based on BERT-CLS, the proposed LTR-BERT method increases MRR@10 by 2.74%. Moreover, the number of documents processed per millisecond increases by 333 times.
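The pipeline described in the abstract (offline segmentation, query-only encoding at retrieval time, and a parameter-free interaction that combines local detail with overall relevance) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: `toy_encode` is a hash-based stand-in for a BERT encoder, the segment size is arbitrary, the per-segment score is a MaxSim-style late interaction, and the `alpha` blend of the best segment (local) with the mean over segments (overall) is a hypothetical choice.

```python
import hashlib
import numpy as np

DIM = 64  # width of the stand-in embeddings (assumption)

def toy_encode(tokens):
    """Stand-in for a BERT encoder (assumption): one deterministic
    unit vector per token, seeded from a hash of the token."""
    vecs = []
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        seed = int.from_bytes(digest[:4], "little")
        v = np.random.default_rng(seed).standard_normal(DIM)
        vecs.append(v / np.linalg.norm(v))
    return np.stack(vecs)

def segment(tokens, size=8):
    """Split a long token list into consecutive fixed-size segments."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def build_index(docs):
    """Offline stage: segment every document and store segment encodings."""
    return {
        doc_id: [toy_encode(seg) for seg in segment(text.lower().split())]
        for doc_id, text in docs.items()
    }

def score(query_vecs, segment_vecs, alpha=0.5):
    """Parameter-free interaction (hypothetical form): for each segment,
    average over query tokens of the best cosine match in that segment;
    then blend the best segment (local) with the segment mean (overall)."""
    seg_scores = np.array(
        [(query_vecs @ seg.T).max(axis=1).mean() for seg in segment_vecs]
    )
    return alpha * seg_scores.max() + (1 - alpha) * seg_scores.mean()

def search(query, index):
    """Online stage: encode only the query, then score stored segments."""
    q = toy_encode(query.lower().split())
    ranked = [(float(score(q, segs)), doc_id) for doc_id, segs in index.items()]
    return sorted(ranked, reverse=True)

docs = {
    "d1": "bert ranking of long documents with segment encoding "
          "stored offline for fast retrieval",
    "d2": "recipes for baking bread at home with yeast and flour",
}
index = build_index(docs)
for s, doc_id in search("long documents retrieval", index):
    print(doc_id, round(s, 3))
```

Because document segments are encoded once in `build_index`, the online cost in this sketch is one query encoding plus a few matrix products, which is the general source of the speed-up the abstract reports. The sketch does not handle empty queries or empty documents.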
Pages: 963-979
Page count: 17