An efficient long-text semantic retrieval approach via utilizing presentation learning on short-text

Cited by: 2
Authors
Wang, Junmei [1, 3]
Huang, Jimmy X. X. [2]
Sheng, Jinhua [1, 3]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp, Hangzhou 310018, Peoples R China
[2] York Univ, Sch Informat Technol, Informat Retrieval & Knowledge Management Res Lab, Toronto, ON, Canada
[3] Minist Ind & Informat Technol China, Key Lab Intelligent Image Anal Sensory & Cognit Hl, Hangzhou 310018, Peoples R China
Funding
National Natural Science Foundation of China; Natural Sciences and Engineering Research Council of Canada;
Keywords
Neural information retrieval; Long-text similarity; Pretrained language model; Efficiency;
DOI
10.1007/s40747-023-01192-3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Although BERT-based short-text retrieval models achieve significant performance improvements, long-text retrieval still faces challenges in both efficiency and effectiveness. This study therefore proposes an efficient BERT-based long-text retrieval model, called LTR-BERT, which improves retrieval speed while retaining most of the long-text retrieval performance. Specifically, the LTR-BERT model is trained on the relevance between short texts. Long texts are then segmented and stored offline. At retrieval time, only the query encoding and the matching scores are computed, which speeds up retrieval. In addition, a query expansion strategy is designed to enhance the representation of the original query and to reserve an encoding region for the query, which helps the model learn missing information in the representation stage. An interaction mechanism without trainable parameters takes into account both local semantic details and overall relevance, ensuring retrieval accuracy while further shortening the response time. Experiments are carried out on the MS MARCO Document Ranking dataset, which is specifically designed for long-text retrieval. Compared with the interaction-focused semantic matching method BERT-CLS, the proposed LTR-BERT method increases MRR@10 by 2.74%, and the number of documents processed per millisecond increases 333-fold.
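The pipeline described in the abstract (segment long texts offline, encode only the query at retrieval time, score with a parameter-free interaction) can be illustrated with a minimal sketch. This is not the authors' implementation. The names `segment`, `toy_encode`, `maxsim`, `build_index`, and `search` are hypothetical, the bag-of-characters encoder is a toy stand-in for BERT, and the scoring uses a ColBERT-style MaxSim with a maximum over segments as an assumed form of interaction, since the abstract does not specify the exact mechanism.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def segment(tokens: List[str], size: int, stride: int) -> List[List[str]]:
    """Split a long token list into overlapping fixed-size windows."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    segments, start = [], 0
    while True:
        segments.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
        start += stride
    return segments


def toy_encode(tokens: List[str]) -> List[Vector]:
    """Toy stand-in for a BERT encoder: unit-length bag-of-letters vectors."""
    vectors = []
    for tok in tokens:
        v = [0.0] * 26
        for ch in tok.lower():
            if "a" <= ch <= "z":
                v[ord(ch) - ord("a")] += 1.0
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def maxsim(query_vecs: List[Vector], seg_vecs: List[Vector]) -> float:
    """Parameter-free interaction (assumed form): for each query token,
    take its best cosine match in the segment, then sum over query tokens."""
    if not seg_vecs:
        return 0.0
    return sum(
        max(sum(a * b for a, b in zip(q, d)) for d in seg_vecs)
        for q in query_vecs
    )


def build_index(docs: Dict[str, List[str]], size: int,
                stride: int) -> Dict[str, List[List[Vector]]]:
    """Offline step: segment every document and store segment encodings."""
    return {
        doc_id: [toy_encode(seg) for seg in segment(tokens, size, stride)]
        for doc_id, tokens in docs.items()
    }


def search(query_tokens: List[str], index: Dict[str, List[List[Vector]]],
           k: int) -> List[Tuple[str, float]]:
    """Online step: encode only the query, then score stored segments.
    The document score is the best segment score (an assumption)."""
    query_vecs = toy_encode(query_tokens)
    scored = [
        (doc_id, max(maxsim(query_vecs, seg) for seg in segs))
        for doc_id, segs in index.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


docs = {
    "d1": "the cat sat on the mat and purred".split(),
    "d2": "stock markets fell sharply today".split(),
}
index = build_index(docs, size=4, stride=2)
ranked = search("cat mat".split(), index, k=2)
```

In this sketch the speed-up comes from the split of work: `build_index` runs once offline, while `search` encodes only the short query and performs dot products against stored vectors.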
Pages: 963-979
Page count: 17