Lexicon-based probabilistic indexing of handwritten text images

Authors
Enrique Vidal
Alejandro H. Toselli
Joan Puigcerver
Institution
[1] Universitat Politècnica de València, PRHLT Research Center
Keywords
Pattern recognition; Posteriorgram; Relevance probability; Hidden Markov model; Recurrent neural network; Handwritten text analysis and recognition; Keyword spotting; Large-scale indexing and search
DOI
Not available
Abstract
Keyword Spotting (KWS) is considered here as a basic technology for the Probabilistic Indexing (PrIx) of large collections of handwritten text images, enabling fast textual access to their contents. From this perspective, a probabilistic framework for lexicon-based KWS in text images is presented. The presentation aims to provide formal insights that help in understanding the classical statements of KWS (from which PrIx borrows fundamental concepts), as well as the relative challenges entailed by these statements. The development of the proposed framework makes it clear that word recognition or classification implicitly or explicitly underlies any formulation of KWS. Moreover, it suggests that the same statistical models and training methods successfully used for handwritten text recognition can also be used to advantage for PrIx, even though PrIx does not generally require or rely on any kind of previously produced image transcripts. Experiments carried out using these approaches support the consistency and general interest of the proposed framework. Results on three datasets traditionally used for KWS benchmarking (IAMDB, George Washington and Parzival) are significantly better than those previously published for these datasets. In addition, good results are reported on two new, larger handwritten text image datasets (Bentham and Plantas), showing the great potential of the proposed methods for indexing and textual search in large collections of untranscribed handwritten documents. Specifically, we achieved the following Average Precision (AP) values: IAMDB: 0.89, George Washington: 0.91, Parzival: 0.95, Bentham: 0.91 and Plantas: 0.92.
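As a rough illustration of two quantities the abstract relies on, the following sketch (an assumption, not the authors' implementation) scores each image for a keyword by the maximum of its posteriorgram over pixel positions, a simple approximation of the relevance probability. It then ranks the images by that score and computes non-interpolated Average Precision. The function names `relevance_probability` and `average_precision`, the toy posteriorgrams and the choice of this AP variant are all hypothetical; the paper's actual estimator and evaluation protocol may differ.

```python
# Hypothetical sketch: max-over-positions relevance probability + non-interpolated AP.
# Not the paper's implementation; names and data are illustrative assumptions.
import numpy as np


def relevance_probability(posteriorgram: np.ndarray) -> float:
    """Approximate P(R | x, v) by max_{i,j} P(v | x, i, j).

    `posteriorgram` holds, for one keyword v and one image x, the assumed
    per-pixel posterior P(v | x, i, j), with values in [0, 1].
    """
    return float(np.max(posteriorgram))


def average_precision(scores, relevant) -> float:
    """Non-interpolated AP of the ranking induced by descending scores.

    `relevant[k]` is the ground-truth relevance of item k.
    Returns 0.0 when there are no relevant items.
    """
    scores = np.asarray(scores, dtype=float)
    relevant = np.asarray(relevant, dtype=bool)
    order = np.argsort(-scores, kind="stable")  # best score first; ties keep input order
    rel_sorted = relevant[order]
    n_rel = int(rel_sorted.sum())
    if n_rel == 0:
        return 0.0
    hits = np.cumsum(rel_sorted)  # number of relevant items retrieved up to rank k
    ranks = np.arange(1, len(rel_sorted) + 1)
    precision_at_k = hits / ranks
    # Average the precision at each rank where a relevant item appears.
    return float(precision_at_k[rel_sorted].sum() / n_rel)


if __name__ == "__main__":
    # Toy posteriorgrams for one keyword in three images (made-up values).
    pg_a = np.array([[0.1, 0.2], [0.9, 0.3]])
    pg_b = np.array([[0.05, 0.4], [0.2, 0.1]])
    pg_c = np.array([[0.7, 0.0], [0.1, 0.2]])
    scores = [relevance_probability(p) for p in (pg_a, pg_b, pg_c)]
    print(scores)  # [0.9, 0.4, 0.7]
    # Ranking is a, c, b; with a and c relevant, AP = (1/1 + 2/2) / 2 = 1.0
    print(average_precision(scores, [True, False, True]))
```

Taking the maximum keeps each score a probability in [0, 1] without segmenting the image into words, which fits the transcript-free setting of PrIx. Note that averaging AP per query (mean AP) and computing a single global AP over all query-image pairs are both common in KWS evaluation, and the two can yield different numbers.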
Published in
Neural Computing & Applications, 2023, 35 (24): 17501–17520