Word and sentence extraction using irregular pyramid

Authors
Loo, PK [1]
Tan, CL
Affiliations
[1] Singapore Polytech, Sch Built Environm & Design, Singapore 139651, Singapore
[2] Natl Univ Singapore, Sch Comp, Singapore 117543, Singapore
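Source
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract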
This paper presents the results of our continued work on enhancing our previously proposed algorithm. Moving beyond the extraction of word groups, and based on the same irregular pyramid structure, the newly proposed algorithm groups the extracted words into sentences. The uniqueness of the algorithm lies in its ability to process text that varies widely in size, font, orientation and layout within the same document image. No assumption is made about the document type. The algorithm is based on the irregular pyramid structure with the application of four fundamental concepts. The first is the inclusion of background information. The second is the concept of closeness, where text elements within a group are closer to each other, in terms of spatial distance, than to other text areas. The third is the "majority win" strategy, which is more suitable in a greatly varying environment than a constant threshold value. The final concept is uniformity and continuity among words belonging to the same sentence.
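The abstract does not give the details of the "majority win" strategy, so the following is only a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes text components are reduced to hypothetical centre points with known group labels, and that a new component joins whichever group most of its `k` nearest neighbours belong to. The function name `majority_group`, the parameter `k`, and the sample coordinates are all assumptions made for illustration.

```python
from collections import Counter
from math import dist


def majority_group(point, labelled, k=3):
    """Return the group label held by most of the k nearest labelled points.

    Illustrative assumption: closeness is plain Euclidean distance between
    component centres, and the vote is taken over the k nearest neighbours.
    """
    nearest = sorted(labelled, key=lambda item: dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical component centres (x, y): tightly spaced small text in
# group "A", widely spaced large text in group "B".
labelled = [
    ((0, 0), "A"), ((2, 0), "A"), ((4, 0), "A"),
    ((40, 0), "B"), ((60, 0), "B"), ((80, 0), "B"),
]

small_text_component = majority_group((6, 0), labelled)
large_text_component = majority_group((100, 0), labelled)
```

In this toy data the gap inside group "B" is 20 units while the gap inside group "A" is 2 units, so any single fixed distance threshold small enough to suit "A" would leave the component at (100, 0) ungrouped. A relative, vote-based decision does not depend on one global constant, which is the kind of robustness to varying text size the abstract attributes to the "majority win" strategy.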
Pages: 307-318
Number of pages: 12