A comparative study on supervised and unsupervised learning approaches for multilingual text categorization

Cited by: 0
Authors
Lee, Chung-Hong [1]
Yang, Hsin-Chang [2]
Chen, Ting-Chung [1]
Ma, Sheng-Min [1]
Affiliations
[1] Natl Kaohsiung Univ Appl Sci, Dept Elect Engn, Kaohsiung, Taiwan
[2] Chang Jung Univ, Dept Informat Management, Tainan, Taiwan
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Users of internationally distributed information networks need tools and methods that enable them to discover, retrieve, and categorize relevant information in whatever language and form it has been stored. This need is driving a convergence of interests from diverse research communities around the problem of multilingual text categorization. In this work we compare and evaluate the performance of leading supervised and unsupervised approaches for multilingual text categorization, using various performance measures and standard document corpora. For simplicity, we selected Support Vector Machines (SVM) and Latent Semantic Indexing (LSI) as representatives of supervised and unsupervised methods, respectively. The preliminary results show that our platform models, which include both supervised and unsupervised learning methods, have potential for multilingual text categorization.
Pages: 511 / +
Number of pages: 2
Related papers
50 in total
  • [31] A comparative study of evolving fuzzy grammar and machine learning techniques for text categorization
    Sharef, Nurfadhlina Mohd
    Martin, Trevor
    Kasmiran, Khairul Azhar
    Mustapha, Aida
    Sulaiman, Md Nasir
    Azmi-Murad, Masrah Azrifah
    SOFT COMPUTING, 2015, 19 (06) : 1701 - 1714
  • [32] A comparative study of evolving fuzzy grammar and machine learning techniques for text categorization
    Nurfadhlina Mohd Sharef
    Trevor Martin
    Khairul Azhar Kasmiran
    Aida Mustapha
    Md. Nasir Sulaiman
    Masrah Azrifah Azmi-Murad
    Soft Computing, 2015, 19 : 1701 - 1714
  • [33] Indian Language Text Representation and Categorization using Supervised Learning Algorithm
    Swamy, M. Narayana
    Hanumanthappa, M.
    Jyothi, N. M.
    2014 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING APPLICATIONS (ICICA 2014), 2014, : 406 - 410
  • [34] Bangla Content Categorization Using Text Based Supervised Learning Methods
    Al Mostakim, Sadek
    Ehsan, Faiza
    Hasan, Syeda Mahdiea
    Islam, Sadia
    Shatabda, Swakkhar
    2018 INTERNATIONAL CONFERENCE ON BANGLA SPEECH AND LANGUAGE PROCESSING (ICBSLP), 2018,
  • [35] JOINT UNSUPERVISED AND SUPERVISED TRAINING FOR MULTILINGUAL ASR
    Bai, Junwen
    Li, Bo
    Zhang, Yu
    Bapna, Ankur
    Siddhartha, Nikhil
    Sim, Khe Chai
    Sainath, Tara N.
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6402 - 6406
  • [36] Chinese Short Text Categorization Based on Semi-Supervised Learning
    Ma, Jie
    Xiong, Zhong-Yang
    Zhang, Yu-Fang
    Wang, Liu-Qian
    Xie, Jiang
    3RD INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND MECHANICAL AUTOMATION (CSMA 2017), 2017, : 45 - 54
  • [37] Multilingual emotion classification using supervised learning: Comparative experiments
    Becker, Karin
    Moreira, Viviane P.
    dos Santos, Aline G. L.
    INFORMATION PROCESSING & MANAGEMENT, 2017, 53 (03) : 684 - 704
  • [38] A comparative study on feature weight in text categorization
    Deng, Zhi-Hong
    Tang, Shi-Wei
    Yang, Dong-Qing
    Zhang, Ming
    Li, Li-Yu
    Xie, Kun-Qing
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2004, 3007 : 588 - 597
  • [39] A comparative study on feature weight in text categorization
    Deng, ZH
    Tang, SW
    Yang, DQ
    Zhang, M
    Li, LY
    Xie, KQ
    ADVANCED WEB TECHNOLOGIES AND APPLICATIONS, 2004, 3007 : 588 - 597
  • [40] Minimally Supervised Categorization of Text with Metadata
    Zhang, Yu
    Meng, Yu
    Huang, Jiaxin
    Xu, Frank F.
    Wang, Xuan
    Han, Jiawei
    PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 1231 - 1240