Lexicon-based probabilistic indexing of handwritten text images

Authors
Enrique Vidal
Alejandro H. Toselli
Joan Puigcerver
Affiliations
[1] Universitat Politècnica de València, PRHLT Research Center
Keywords
Pattern recognition; Posteriorgram; Relevance probability; Hidden Markov model; Recurrent neural network; Handwritten text analysis and recognition; Keyword spotting; Large-scale indexing and search
DOI
Not available
Abstract
Keyword Spotting (KWS) is considered here as a basic technology for Probabilistic Indexing (PrIx) of large collections of handwritten text images, allowing fast textual access to the contents of these collections. From this perspective, a probabilistic framework for lexicon-based KWS in text images is presented. The presentation aims to provide formal insights that help in understanding classical statements of KWS (from which PrIx borrows fundamental concepts), as well as the relative challenges entailed by these statements. The development of the proposed framework makes it clear that word recognition or classification implicitly or explicitly underlies any formulation of KWS. Moreover, it suggests that the same statistical models and training methods successfully used for handwritten text recognition can also be used to advantage for PrIx, even though PrIx does not generally require or rely on any previously produced image transcripts. Experiments carried out using these approaches support the consistency and the general interest of the proposed framework. Results on three datasets traditionally used for KWS benchmarking are significantly better than those previously published for these datasets. In addition, good results are reported on two new, larger handwritten text image datasets (Bentham and Plantas), showing the great potential of the methods proposed in this paper for indexing and textual search in large collections of untranscribed handwritten documents. Specifically, we achieved the following Average Precision values: IAMDB: 0.89, George Washington: 0.91, Parzival: 0.95, Bentham: 0.91, and Plantas: 0.92.
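For readers unfamiliar with the Average Precision figures quoted in the abstract, the following is a minimal sketch of how AP is commonly computed for a ranked list of spotting results. It assumes each candidate word image carries a relevance probability and a ground-truth label; the function name `average_precision` and the toy data are illustrative assumptions, and the paper's exact evaluation protocol may differ.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], relevant: Sequence[bool]) -> float:
    """Average Precision of items ranked by decreasing score.

    scores   -- hypothetical relevance probabilities, one per candidate
    relevant -- ground-truth flags, True if the candidate is a real match
    """
    # Rank candidate indices from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if relevant[i]:
            hits += 1
            # Precision measured at the rank of each relevant item.
            precision_sum += hits / rank
    # Every candidate is ranked, so hits equals the number of relevant items.
    return precision_sum / hits if hits else 0.0


# Toy example (made-up values, not taken from the paper).
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_relevant = [True, False, True, False]
toy_ap = average_precision(toy_scores, toy_relevant)
```

In this toy case the relevant items sit at ranks 1 and 3, so the result is the mean of 1/1 and 2/3. Lowering the threshold on relevance probability trades precision for recall, and AP summarizes that trade-off in a single number.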
Pages: 17501-17520
Number of pages: 19