Feature reinforcement approach to poly-lingual text categorization

Authors
Wei, Chih-Ping [1]
Shi, Huihua [2]
Yang, Christopher C. [3]
Affiliations
[1] Natl Tsing Hua Univ, Inst Technol Management, Hsinchu, Taiwan
[2] Natl Sun Yat Sen Univ, Dept Informat Management, Kaohsiung 80424, Taiwan
[3] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Sha Tin, Peoples R China
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
With the rapid emergence and proliferation of the Internet and the trend toward globalization, a tremendous number of textual documents written in different languages are electronically accessible online. Poly-lingual text categorization (PLTC) refers to the automatic learning of one or more text categorization models from a set of preclassified training documents written in different languages, and the subsequent assignment of unclassified poly-lingual documents to predefined categories on the basis of the induced model(s). Although PLTC can be approached as multiple independent monolingual text categorization problems, this naive approach uses only the training documents of the same language to construct each monolingual classifier and fails to exploit the opportunity offered by poly-lingual training documents. In this study, we propose a feature reinforcement approach to PLTC that takes into account the training documents of all languages when constructing a monolingual classifier for a specific language. Using the independent monolingual text categorization (MnTC) technique as the performance benchmark, our empirical evaluation shows that the proposed PLTC technique achieves higher classification accuracy than the benchmark on both the English and Chinese corpora.
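The benchmark named in the abstract, independent monolingual text categorization (MnTC), can be illustrated with a minimal sketch: one classifier per language, each trained only on documents of that language. The abstract does not say which learning algorithm the authors used, so the multinomial Naive Bayes learner below is an assumption, as are the function names (`train_nb`, `classify_nb`, `train_mntc`, `classify_mntc`) and the toy documents. The sketch does not implement the proposed feature reinforcement technique, whose details are not given in this record.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Train a multinomial Naive Bayes model (assumed learner, not from the paper).

    docs: list of (tokens, category) pairs, all in one language.
    """
    class_counts = Counter()            # number of training documents per category
    term_counts = defaultdict(Counter)  # term frequencies per category
    vocab = set()
    for tokens, category in docs:
        class_counts[category] += 1
        term_counts[category].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab


def classify_nb(model, tokens):
    """Return the category with the highest add-one-smoothed log-probability."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for category, n_docs in class_counts.items():
        total_terms = sum(term_counts[category].values())
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            if tok in vocab:  # ignore terms never seen in training
                score += math.log(
                    (term_counts[category][tok] + 1) / (total_terms + len(vocab))
                )
        if score > best_score:
            best, best_score = category, score
    return best


def train_mntc(training_docs):
    """MnTC baseline: build one independent classifier per language.

    training_docs: list of (language, tokens, category) triples.
    Each classifier sees only the documents written in its own language.
    """
    by_lang = defaultdict(list)
    for lang, tokens, category in training_docs:
        by_lang[lang].append((tokens, category))
    return {lang: train_nb(docs) for lang, docs in by_lang.items()}


def classify_mntc(models, lang, tokens):
    """Route a document to the classifier for its own language."""
    return classify_nb(models[lang], tokens)


# Hypothetical toy corpus, for illustration only.
training = [
    ("en", ["stock", "market", "bank"], "finance"),
    ("en", ["match", "goal", "team"], "sports"),
    ("zh", ["股票", "银行"], "finance"),
    ("zh", ["比赛", "球队"], "sports"),
]
models = train_mntc(training)
```

In this baseline the Chinese training documents contribute nothing to the English classifier and vice versa, which is the limitation the abstract attributes to the naive approach.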
Pages: 99 / +
Page count: 2