Word and sentence extraction using irregular pyramid

Authors
Loo, PK [1]
Tan, CL
Affiliations
[1] Singapore Polytech, Sch Built Environm & Design, Singapore 139651, Singapore
[2] Natl Univ Singapore, Sch Comp, Singapore 117543, Singapore
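Source
Keywords
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract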
This paper presents further work on our previously proposed algorithm. Moving beyond the extraction of word groups, and based on the same irregular pyramid structure, the new algorithm groups the extracted words into sentences. The algorithm is distinctive in its ability to process text that varies widely in size, font, orientation and layout within the same document image. No assumption is made about the document type. The algorithm is based on the irregular pyramid structure and applies four fundamental concepts. The first is the inclusion of background information. The second is closeness: text within a group is closer together, in terms of spatial distance, than it is to other text areas. The third is the "majority win" strategy, which suits a greatly varying environment better than a constant threshold value. The final concept is uniformity and continuity among words belonging to the same sentence.
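The abstract contrasts a "majority win" strategy with a constant threshold but gives no procedure. The Python sketch below is only one possible reading of that contrast, not the authors' irregular pyramid algorithm: it groups one-dimensional word boxes by closeness and derives the split limit from the most frequent gap between neighbours. The function names, the `bin_size` and `slack` parameters, and the box representation as `(x_start, x_end)` pairs are all assumptions made for illustration.

```python
from collections import Counter


def majority_gap(gaps, bin_size=2):
    """Return the upper edge of the most frequent gap bin (hypothetical helper)."""
    # Quantize gaps into bins so that near-equal gaps vote together.
    votes = Counter(int(g // bin_size) for g in gaps)
    winner, _ = votes.most_common(1)[0]
    return (winner + 1) * bin_size


def group_by_closeness(boxes, slack=1.5, bin_size=2):
    """Group (x_start, x_end) boxes; split where a gap exceeds slack * majority gap."""
    boxes = sorted(boxes)
    if not boxes:
        return []
    if len(boxes) == 1:
        return [boxes]
    # Horizontal gap between each box and its right-hand neighbour.
    gaps = [max(0, right[0] - left[1]) for left, right in zip(boxes, boxes[1:])]
    # The limit adapts to the text scale instead of being a fixed constant.
    limit = slack * majority_gap(gaps, bin_size)
    groups = [[boxes[0]]]
    for box, gap in zip(boxes[1:], gaps):
        if gap <= limit:
            groups[-1].append(box)
        else:
            groups.append([box])
    return groups


# Assumed sample data: the same layout at two text sizes.
small_text = [(0, 10), (14, 24), (28, 38), (42, 52), (72, 82), (86, 96), (100, 110)]
large_text = [(a * 5, b * 5) for a, b in small_text]
small_groups = group_by_closeness(small_text)
large_groups = group_by_closeness(large_text)
```

With this sample data both sizes split into the same two groups, whereas a single fixed pixel threshold tuned for the small text would separate every box in the large text.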
Pages: 307-318
Page count: 12