Language identification of Kannada, Hindi and English text words through visual discriminating features

Cited by: 0
Authors
Padma M.C. [1]
Vijaya P.A. [2]
Affiliations
[1] Dept. of Computer Science and Engineering, PES College of Engineering, Mandya-571401, Karnataka
[2] Dept. of Electronics and Communication Engineering, Malnad College of Engineering, Hassan-573201, Karnataka
Keywords
Document image processing; Feature extraction; Horizontal lines; Language identification; Multi-lingual document; Vertical lines
DOI
10.2991/ijcis.2008.1.2.2
Abstract
In a multilingual country like India, a document may contain text words in more than one language. Reading such documents requires a multilingual Optical Character Recognition (OCR) system. It is therefore necessary to identify the different language regions of a document before passing each region to the OCR of the corresponding language. The objective of this paper is to propose a procedure based on visual clues to identify the Kannada, Hindi and English text portions of an Indian multilingual document.
Pages: 116 / 126
Page count: 10
Related papers
50 in total
  • [1] Code-Borrowedness of English words in Hindi Language
    Mohan, Ram
    Arif, Muhammad
    Wilson, Jobin
    Chaudhury, Santanu
    Lall, Brejesh
    PROCEEDINGS OF THE FOURTH ACM IKDD CONFERENCES ON DATA SCIENCES (CODS '17), 2017,
  • [2] Automatic Language Identification system for code-mixed English-Kannada Social Media Text
    Lakshmi, Sowmya B. S.
    Shambhavi, B. R.
    2017 2ND INTERNATIONAL CONFERENCE ON COMPUTATIONAL SYSTEMS AND INFORMATION TECHNOLOGY FOR SUSTAINABLE SOLUTION (CSITSS-2017), 2017, : 214 - 218
  • [3] Character Embedding for Language Identification in Hindi-English Code-mixed Social Media Text
    Veena, P. V.
    Kumar, M. Anand
    Soman, K. P.
    COMPUTACION Y SISTEMAS, 2018, 22 (01): : 65 - 74
  • [4] Text-Independent Automatic Accent Identification System for Kannada Language
    Soorajkumar, R.
    Girish, G. N.
    Ramteke, Pravin B.
    Joshi, Shreyas S.
    Koolagudi, Shashidhar G.
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON DATA ENGINEERING AND COMMUNICATION TECHNOLOGY, ICDECT 2016, VOL 2, 2017, 469 : 411 - 418
  • [5] Importance of Visual Support Through Lipreading in the Identification of Words in Spanish Language
    Gomez-Vicente, Violeta
    Esquiva, Gema
    Lancho, Carmen
    Benzerdjeb, Kawthar
    Jerez, Antonia Angulo
    Auso, Eva
    LANGUAGE AND SPEECH, 2024,
  • [6] Word Level Language Identification in Assamese-Bengali-Hindi-English Code-Mixed Social Media Text
    Sarma, Neelakshi
    Singh, Sanasam Ranbir
    Goswami, Diganta
    2018 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2018, : 261 - 266
  • [7] Aggressive and Offensive Language Identification in Hindi, Bangla, and English: A Comparative Study
    Kumar R.
    Lahiri B.
    Ojha A.K.
    SN Computer Science, 2021, 2 (1)
  • [8] Language Identification Using Visual Features
    Newman, Jacob L.
    Cox, Stephen J.
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (07): : 1936 - 1947
  • [9] Parallel Text Identification Using Lexical and Corpus Features for the English-Maori Language Pair
    Mohaghegh, Mahsa
    Sarrafzadeh, Abdolhossein
    2016 15TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2016), 2016, : 910 - 915
  • [10] Cross-Lingual Short-Text Semantic Similarity for Kannada-English Language Pair
    Muralikrishna, S. N.
    Holla, Raghurama
    Harivinod, N.
    Ganiga, Raghavendra
    COMPUTERS, 2024, 13 (09)