Language identification of Kannada, Hindi and English text words through visual discriminating features

Cited by: 0

Authors:

Padma M.C. [1]

Vijaya P.A. [2]

Affiliations:

[1] Dept. of Computer Science and Engineering, PES College of Engineering, Mandya-571401, Karnataka

[2] Dept. of Electronics and Communication Engineering, Malnad College of Engineering, Hassan-573201, Karnataka

Source:

International Journal of Computational Intelligence Systems | 2008, Vol. 1, No. 2

Keywords:

Document image processing; Feature extraction; Horizontal lines; Language identification; Multi-lingual document; Vertical lines

DOI:

10.2991/ijcis.2008.1.2.2

Abstract:

In a multilingual country like India, a document may contain text words in more than one language. In such a multilingual environment, a multilingual Optical Character Recognition (OCR) system is needed to read these documents. It is therefore necessary to identify the different language regions of a document before feeding it to the OCR of each individual language. The objective of this paper is to propose a procedure based on visual clues to identify the Kannada, Hindi and English text portions of an Indian multilingual document.

Pages: 116-126

Page count: 10

