An efficient long-text semantic retrieval approach via utilizing presentation learning on short-text

Cited by: 2
Authors
Wang, Junmei [1, 3]
Huang, Jimmy X. X. [2]
Sheng, Jinhua [1, 3]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp, Hangzhou 310018, Peoples R China
[2] York Univ, Sch Informat Technol, Informat Retrieval & Knowledge Management Res Lab, Toronto, ON, Canada
[3] Minist Ind & Informat Technol China, Key Lab Intelligent Image Anal Sensory & Cognit Hl, Hangzhou 310018, Peoples R China
Funding
National Natural Science Foundation of China; Natural Sciences and Engineering Research Council of Canada
Keywords
Neural information retrieval; Long-text similarity; Pretrained language model; Efficiency;
DOI
10.1007/s40747-023-01192-3
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Although BERT-based short-text retrieval models achieve significant performance improvements, long-text retrieval still faces challenges in both efficiency and effectiveness. This study therefore proposes an efficient BERT-based long-text retrieval model, called LTR-BERT, which improves speed while retaining most of the long-text retrieval performance. Specifically, LTR-BERT is trained on the relevance between short texts. Long texts are then segmented and stored offline. In the retrieval stage, only the query encoding and the matching scores are computed, which speeds up retrieval. In addition, a query expansion strategy enhances the representation of the original query and reserves an encoding region for the query, which helps the model learn missing information in the representation stage. The interaction mechanism, which has no trainable parameters, takes into account both local semantic details and overall relevance, ensuring retrieval accuracy while further shortening the response time. Experiments are carried out on the MS MARCO Document Ranking dataset, which is specially designed for long-text retrieval. Compared with BERT-CLS, an interaction-focused semantic matching method, the proposed LTR-BERT increases MRR@10 by 2.74%. Moreover, the number of documents processed per millisecond increases by a factor of 333.
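The pipeline outlined in the abstract (offline segmentation and storage, query-only encoding at retrieval time, and a parameter-free interaction score) can be illustrated with a minimal sketch. This is not the authors' implementation: the hash-based `toy_token_vector` is a hypothetical stand-in for a BERT encoder, and the segment size, the max-similarity interaction, and the 0.5/0.5 mix of best-segment and mean-segment scores are all assumptions made for illustration, since the abstract does not specify them.

```python
import hashlib
import math
from typing import Dict, List, Tuple

Vector = List[float]


def toy_token_vector(token: str) -> Vector:
    """Deterministic stand-in for a BERT token embedding (assumption, not a real encoder)."""
    digest = hashlib.sha256(token.lower().encode("utf-8")).digest()  # 32 bytes
    vec = [(b - 127.5) / 127.5 for b in digest]  # values in [-1, 1], never exactly 0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]  # unit length, so dot product = cosine similarity


def segment(text: str, size: int = 4) -> List[List[str]]:
    """Split a long text into consecutive segments of at most `size` words."""
    words = text.split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def build_index(docs: Dict[str, str], size: int = 4) -> Dict[str, List[List[Vector]]]:
    """Offline stage: segment every document and store its token vectors."""
    return {
        doc_id: [[toy_token_vector(w) for w in seg] for seg in segment(text, size)]
        for doc_id, text in docs.items()
    }


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def segment_score(query_vecs: List[Vector], seg_vecs: List[Vector]) -> float:
    """Parameter-free interaction: best-matching segment token per query token, summed."""
    return sum(max(dot(q, t) for t in seg_vecs) for q in query_vecs)


def search(query: str, index: Dict[str, List[List[Vector]]], top_k: int = 3) -> List[Tuple[str, float]]:
    """Online stage: encode only the query, then score it against stored segments."""
    query_vecs = [toy_token_vector(w) for w in query.split()]
    scores = {}
    for doc_id, segs in index.items():
        if not segs:  # skip empty documents
            continue
        seg_scores = [segment_score(query_vecs, s) for s in segs]
        # Assumed mix: best segment (local detail) and mean over segments (whole relevance).
        scores[doc_id] = 0.5 * max(seg_scores) + 0.5 * sum(seg_scores) / len(seg_scores)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


docs = {
    "d1": "bert retrieval model ranking",
    "d2": "slow cooked pasta with tomato sauce and fresh basil leaves",
}
index = build_index(docs)            # done once, offline
results = search("BERT retrieval", index)  # only the query is encoded here
print(results[0][0])
```

Because document vectors are computed once in `build_index`, the cost of `search` depends only on encoding the query and on inexpensive dot products, which is the source of the speed-up the abstract describes.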
Pages: 963-979
Page count: 17