Feature reinforcement approach to poly-lingual text categorization

Times cited: 0
Authors
Wei, Chih-Ping [1]
Shi, Huihua [2]
Yang, Christopher C. [3]
Affiliations
[1] Natl Tsing Hua Univ, Inst Technol Management, Hsinchu, Taiwan
[2] Natl Sun Yat Sen Univ, Dept Informat Management, Kaohsiung 80424, Taiwan
[3] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Sha Tin, Peoples R China
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology; computer technology]
Discipline classification code
0812
Abstract
With the rapid emergence and proliferation of the Internet and the trend toward globalization, a tremendous number of textual documents written in different languages have become electronically accessible online. Poly-lingual text categorization (PLTC) refers to the automatic learning of one or more text categorization models from a set of preclassified training documents written in different languages, and the subsequent assignment of unclassified poly-lingual documents to predefined categories on the basis of the induced model(s). Although PLTC can be approached as multiple independent monolingual text categorization problems, this naive approach uses only the training documents of a given language to construct the monolingual classifier for that language, and thus fails to exploit the opportunity offered by poly-lingual training documents. In this study, we propose a feature reinforcement approach to PLTC that takes into account the training documents of all languages when constructing the monolingual classifier for a specific language. Using the independent monolingual text categorization (MnTC) technique as the performance benchmark, our empirical evaluation shows that the proposed PLTC technique achieves higher classification accuracy than the benchmark on both the English and the Chinese corpora.
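The abstract describes the naive MnTC benchmark concretely: one classifier per language, each trained only on documents written in that language. The following is a minimal Python sketch of that baseline only, assuming documents arrive pre-tokenized (Chinese text already word-segmented) and using multinomial Naive Bayes as a stand-in per-language learner. The learner choice and all names (`MultinomialNB`, `train_mntc`, `classify`) are illustrative assumptions, not the classifier used in the study. The proposed feature reinforcement approach is not reproduced, because the abstract does not specify how features from other languages are combined.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes with add-one smoothing (illustrative stand-in learner)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per category
        self.token_counts = defaultdict(Counter)     # category -> token frequencies
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_tokens = {c: sum(cnt.values()) for c, cnt in self.token_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict(self, tokens):
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n_docs)      # log prior of the category
            for t in tokens:
                if t not in self.vocab:
                    continue  # skip tokens unseen in this language's training set
                score += math.log((self.token_counts[c][t] + 1) / (self.total_tokens[c] + v))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def train_mntc(training_set):
    """Hypothetical MnTC baseline: one independent classifier per language.

    training_set: iterable of (language, tokens, category) triples.
    Each classifier sees only the documents written in its own language.
    """
    by_lang = defaultdict(lambda: ([], []))
    for lang, tokens, category in training_set:
        by_lang[lang][0].append(tokens)
        by_lang[lang][1].append(category)
    return {lang: MultinomialNB().fit(docs, labels) for lang, (docs, labels) in by_lang.items()}


def classify(models, lang, tokens):
    """Route an unclassified document to the classifier for its language."""
    return models[lang].predict(tokens)


if __name__ == "__main__":
    training = [
        ("en", ["stock", "market", "shares"], "finance"),
        ("en", ["football", "match", "goal"], "sports"),
        ("zh", ["股票", "市场"], "finance"),
        ("zh", ["足球", "比赛"], "sports"),
    ]
    models = train_mntc(training)
    print(classify(models, "en", ["market", "shares"]))  # → finance
    print(classify(models, "zh", ["足球"]))              # → sports
```

The weakness the abstract points to is visible in `train_mntc`: the English model never sees the Chinese finance documents and vice versa, so each language's classifier learns from only a fraction of the labeled data.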
Pages: 99 / +
Number of pages: 2
Related papers
50 in total
  • [41] A Heuristic Feature Selection Approach for Text Categorization by Using Chaos Optimization and Genetic Algorithm
    Chen, Hao
    Jiang, Wen
    Li, Canbing
    Li, Rui
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2013, 2013
  • [42] A rough set-based CBR approach for feature and document reduction in text categorization
    Li, Y
    Shiu, SCK
    Pal, SK
    Liu, JNK
    PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2004: 2438 - 2443
  • [43] Computing with words for text processing: An approach to the text categorization
    Zadrozny, S
    Kacprzyk, J
    INFORMATION SCIENCES, 2006, 176 (04) : 415 - 437
  • [44] An Overview of Unsupervised Deep Feature Representation for Text Categorization
    Wang, Shiping
    Cai, Jinyu
    Lin, Qihao
    Guo, Wenzhong
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2019, 6 (03) : 504 - 517
  • [45] Feature space restructuring for SVMs with application to text categorization
    Takamura, H
    Matsumoto, Y
    PROCEEDINGS OF THE 2001 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, 2001: 51 - 57
  • [46] Enhancement of DTP feature selection method for text categorization
    Moyotl-Hernández, E
    Jiménez-Salazar, H
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2005, 3406 : 719 - 722
  • [47] TOFA: Trace Oriented Feature Analysis in Text Categorization
    Yan, Jun
    Liu, Ning
    Yang, Qiang
    Fan, Weiguo
    Chen, Zheng
    ICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2008: 668 - +
  • [48] Applying cascaded feature selection to SVM text categorization
    Masuyama, T
    Nakagawa, H
    13TH INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2002: 241 - 245
  • [49] Feature selection for support vector machines in text categorization
    Liu, Y
    Lu, HM
    Lu, ZX
    Wang, P
    MLMTA'03: INTERNATIONAL CONFERENCE ON MACHINE LEARNING; MODELS, TECHNOLOGIES AND APPLICATIONS, 2003: 129 - 134
  • [50] Feature Generation for Text Categorization Using World Knowledge
    Gabrilovich, Evgeniy
    Markovitch, Shaul
    19TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-05), 2005: 1048 - 1053