An efficient long-text semantic retrieval approach via utilizing presentation learning on short-text

Times cited: 2
Authors
Wang, Junmei [1, 3]
Huang, Jimmy X. [2]
Sheng, Jinhua [1, 3]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp, Hangzhou 310018, Peoples R China
[2] York Univ, Sch Informat Technol, Informat Retrieval & Knowledge Management Res Lab, Toronto, ON, Canada
[3] Minist Ind & Informat Technol China, Key Lab Intelligent Image Anal Sensory & Cognit Hl, Hangzhou 310018, Peoples R China
Funding
National Natural Science Foundation of China; Natural Sciences and Engineering Research Council of Canada
Keywords
Neural information retrieval; Long-text similarity; Pretrained language model; Efficiency
DOI
10.1007/s40747-023-01192-3
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Although BERT-based short-text retrieval models achieve significant performance improvements, the efficiency and effectiveness of long-text retrieval remain challenging. This study therefore proposes an efficient BERT-based long-text retrieval model, called LTR-BERT, which improves speed while retaining most of the long-text retrieval performance. Specifically, LTR-BERT is trained on the relevance between short texts. Long texts are then segmented and stored offline. In the retrieval stage, only the query encoding and the matching scores are computed, which speeds up retrieval. In addition, a query expansion strategy is designed to enrich the representation of the original query and to reserve an encoding region for the query, which helps the model learn missing information in the representation stage. An interaction mechanism without trainable parameters accounts for both local semantic details and overall relevance, preserving retrieval accuracy while further shortening response time. Experiments are conducted on the MS MARCO Document Ranking dataset, which is designed specifically for long-text retrieval. Compared with the interaction-focused BERT-CLS semantic matching method, LTR-BERT improves MRR@10 by 2.74%, and the number of documents processed per millisecond increases by a factor of 333.
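The abstract does not give LTR-BERT's scoring formula, so the following is only a minimal sketch of one plausible reading of the retrieval stage, not the authors' implementation. It assumes (1) document segments are encoded offline into token vectors and stored in an index, (2) only the query is encoded at retrieval time, (3) the parameter-free interaction combines a ColBERT-style MaxSim term for "local semantic details" with a cosine between mean-pooled vectors for "whole relevance", and (4) a long document is scored by its best segment. The names `segment_score`, `document_score`, the mixing weight `alpha`, and the max-over-segments aggregation are hypothetical. Toy 2-d vectors stand in for BERT outputs. The sketch also shows how MRR@10, the metric reported above, is computed.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean_pool(vectors: Sequence[Vector]) -> List[float]:
    """Average token vectors into one vector (used for the 'whole relevance' term)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def segment_score(query: Sequence[Vector], segment: Sequence[Vector], alpha: float = 0.5) -> float:
    """Parameter-free interaction between query token vectors and one stored segment.

    local: each query token takes its best cosine match in the segment (MaxSim), then averaged.
    whole: cosine between the mean-pooled query and the mean-pooled segment.
    alpha is a hypothetical fixed mixing weight, not a learned parameter.
    """
    local = sum(max(cosine(q, d) for d in segment) for q in query) / len(query)
    whole = cosine(mean_pool(query), mean_pool(segment))
    return alpha * local + (1 - alpha) * whole


def document_score(query: Sequence[Vector], segments: Sequence[Sequence[Vector]], alpha: float = 0.5) -> float:
    """Score a long document by its best-matching segment (assumed aggregation)."""
    return max(segment_score(query, seg, alpha) for seg in segments)


def mrr_at_10(ranked: Dict[str, List[str]], relevant: Dict[str, Set[str]]) -> float:
    """Mean reciprocal rank of the first relevant document within each query's top 10."""
    total = 0.0
    for qid, docs in ranked.items():
        for rank, doc_id in enumerate(docs[:10], start=1):
            if doc_id in relevant.get(qid, set()):
                total += 1.0 / rank
                break
    return total / len(ranked)


if __name__ == "__main__":
    # Offline stage (hypothetical index): doc_id -> list of segments -> list of token vectors.
    index = {
        "d1": [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0]]],
        "d2": [[[0.0, 1.0], [0.1, 0.9]]],
    }
    # Online stage: only the query is encoded (toy vectors here).
    query = [[1.0, 0.0], [0.8, 0.2]]
    ranking = sorted(index, key=lambda d: document_score(query, index[d]), reverse=True)
    print(ranking)                                  # → ['d1', 'd2']
    print(mrr_at_10({"q1": ranking}, {"q1": {"d1"}}))  # → 1.0
```

Because the interaction has no trainable weights and the segment vectors are precomputed, the online cost reduces to one query encoding plus similarity arithmetic, which is the source of the speed-up the abstract describes; how closely this matches LTR-BERT's actual interaction is an open assumption.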
Pages: 963-979
Page count: 17
Related papers
50 in total
  • [1] An efficient long-text semantic retrieval approach via utilizing presentation learning on short-text
    Wang, Junmei
    Huang, Jimmy X.
    Sheng, Jinhua
    Complex & Intelligent Systems, 2024, 10 : 963 - 979
  • [2] Efficient Long-Text Understanding with Short-Text Models
    Ivgi, Maor
    Shaham, Uri
    Berant, Jonathan
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2023, 11 : 284 - 299
  • [3] Long-Text Sentiment Analysis Based on Semantic Graph
    Zhang, Linkun
    Lei, Yuxia
    Wang, Zhengyan
    2020 IEEE INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (ICESS), 2020
  • [4] Research on semantic sentiment analysis of Chinese short-text
    Yu, Jian
    Gao, Jie
    Yu, Mei
    Han, Xu
    Zhang, Xu
    ICIC Express Letters, 2015, 9 (12) : 3237 - 3244
  • [5] A lightweight semantic-enhanced interactive network for efficient short-text matching
    Yu, Chuanming
    Xue, Haodong
    An, Lu
    Li, Gang
    JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2023, 74 (02) : 283 - 300
  • [6] Language independent semantic kernels for short-text classification
    Kim, Kwanho
    Chung, Beom-suk
    Choi, Yerim
    Lee, Seungjun
    Jung, Jae-Yoon
    Park, Jonghun
    EXPERT SYSTEMS WITH APPLICATIONS, 2014, 41 (02) : 735 - 743
  • [7] A Comparative Analysis of Strategies for Semantic Short-Text Categorization
    Rosas, Maria V.
    Errecalde, Marcelo L.
    Rosso, Paolo
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2010, (44) : 11 - 18
  • [8] Transductive learning for short-text classification problems using latent semantic indexing
    Zelikovitz, S
    Marquez, F
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2005, 19 (02) : 143 - 163
  • [9] Learning short-text semantic similarity with word embeddings and external knowledge sources
    Nguyen, Hien T.
    Duong, Phuc H.
    Cambria, Erik
    KNOWLEDGE-BASED SYSTEMS, 2019, 182
  • [10] Transfer Learning in Long-Text Keystroke Dynamics
    Ceker, Hayreddin
    Upadhyaya, Shambhu
    2017 IEEE INTERNATIONAL CONFERENCE ON IDENTITY, SECURITY AND BEHAVIOR ANALYSIS (ISBA), 2017