Using the Web to create dynamic dictionaries in handwritten out-of-vocabulary word recognition

Cited by: 6
Authors
Oprean, Cristina [1]
Likforman-Sulem, Laurence [1]
Popescu, Adrian [2]
Mokbel, Chafic [3]
Affiliations
[1] Telecom ParisTech, Inst Mines Telecom, 46 Rue Barrault, F-75013 Paris, France
[2] CEA, LIST, LVIC, F-91190 Gif Sur Yvette, France
[3] Univ Balamand, Fac Engn, Tripoli, Lebanon
DOI
10.1109/ICDAR.2013.199
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Handwriting recognition systems rely on predefined dictionaries obtained from training data. Small and static dictionaries are usually exploited to obtain high in-vocabulary (IV) accuracy at the expense of coverage. Thus the recognition of out-of-vocabulary (OOV) words cannot be handled efficiently. To improve OOV recognition while keeping IV dictionaries small, we introduce a multi-step approach that exploits Web resources. After an initial IV-OOV sequence classification, external resources are used to create OOV sequence-adapted dynamic dictionaries. A final Viterbi-based decoding is performed over the dynamic dictionary to determine the most probable word for the OOV sequence. We validate our approach with experiments conducted on RIMES, a publicly available database. Results show that improvements are obtained compared to standard handwriting recognition, performed with a static dictionary. Both domain-adapted and generic dynamic dictionaries are studied and we show that domain adaptation is beneficial.
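The three steps named in the abstract (IV-OOV classification, dynamic dictionary creation, final decoding) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the confidence threshold, the toy word lists, and the use of `difflib.SequenceMatcher` string similarity in place of Viterbi decoding over handwriting models are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in score (assumption): string similarity replaces the
    # Viterbi likelihood a real recognizer would compute.
    return SequenceMatcher(None, a, b).ratio()


def classify_iv_oov(static_scores, threshold):
    """Step 1: return the best in-vocabulary word, or None if treated as OOV."""
    word, score = max(static_scores.items(), key=lambda kv: kv[1])
    return word if score >= threshold else None


def build_dynamic_dictionary(rough, external_words, max_size=2):
    """Step 2: keep the external words closest to the rough transcription."""
    ranked = sorted(external_words, key=lambda w: similarity(rough, w), reverse=True)
    return ranked[:max_size]


def recognize(static_scores, rough, external_words, threshold=0.5):
    """Run the three steps and return the selected word."""
    iv_word = classify_iv_oov(static_scores, threshold)
    if iv_word is not None:
        return iv_word
    dynamic_dictionary = build_dynamic_dictionary(rough, external_words)
    # Step 3: decode over the small dynamic dictionary only.
    return max(dynamic_dictionary, key=lambda w: similarity(rough, w))


# Hypothetical inputs: recognizer scores for a static dictionary, a rough
# character-level transcription, and words gathered from an external resource.
static_scores = {"bonjour": 0.2, "merci": 0.3}
external_words = ["facture", "fracture", "voiture", "maison"]
result = recognize(static_scores, "facturr", external_words)
```

With these toy inputs no static-dictionary word clears the threshold, so the sequence is treated as OOV and decoded against the two closest external words. In the paper's setting the external words would come from Web resources and the scores from the handwriting recognizer.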
Pages: 989-993
Number of pages: 5