Lexicon-based probabilistic indexing of handwritten text images

Authors
Enrique Vidal
Alejandro H. Toselli
Joan Puigcerver
Affiliation
[1] Universitat Politècnica de València, PRHLT Research Center
Keywords
Pattern recognition; Posteriorgram; Relevance probability; Hidden Markov model; Recurrent neural network; Handwritten text analysis and recognition; Keyword spotting; Large-scale indexing and search
DOI
Not available
Abstract
Keyword Spotting (KWS) is considered here as a basic technology for Probabilistic Indexing (PrIx) of large collections of handwritten text images, allowing fast textual access to the contents of these collections. From this perspective, a probabilistic framework for lexicon-based KWS in text images is presented. The presentation aims to provide formal insights that help in understanding classical statements of KWS (from which PrIx borrows fundamental concepts), as well as the relative challenges entailed by these statements. The development of the proposed framework makes it clear that word recognition or classification implicitly or explicitly underlies any formulation of KWS. Moreover, it suggests that the same statistical models and training methods successfully used for handwritten text recognition can also be used advantageously for PrIx, even though PrIx does not generally require or rely on any kind of previously produced image transcripts. Experiments carried out using these approaches support the consistency and the general interest of the proposed framework. Results on three datasets traditionally used for KWS benchmarking are significantly better than those previously published for these datasets. In addition, good results are also reported on two new, larger handwritten text image datasets (Bentham and Plantas), showing the great potential of the methods proposed in this paper for indexing and textual search in large collections of untranscribed handwritten documents. Specifically, we achieved the following Average Precision values: IAMDB: 0.89, George Washington: 0.91, Parzival: 0.95, Bentham: 0.91 and Plantas: 0.92.
Pages: 17501-17520
Page count: 19