A feature selection approach for automatic e-book classification based on discourse segmentation

Cited by: 3
Authors
Guo, Jiunn-Liang [1]
Wang, Hei-Chia [2]
Lai, Ming-Way [2]
Affiliations
[1] ROC Taiwan Air Force Acad, Kaohsiung, Taiwan
[2] Natl Cheng Kung Univ, Inst Informat Management, Tainan 70101, Taiwan
Keywords
Discourse segmentation; Feature selection; Text classification; Word sense disambiguation; INFORMATION; TEXT; MODEL
DOI
10.1108/PROG-12-2012-0071
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Purpose - The purpose of this paper is to develop a novel feature selection (FS) approach for the automatic text classification of large digital documents, specifically the e-books of an online library system. The main idea is to improve the feature selection process by automatically identifying discourse features, rather than by focussing on the size of the corpus.
Design/methodology/approach - The proposed framework automatically identifies the discourse segments within e-books, captures the discourse subtopics that are cohesively expressed in those segments, and treats these subtopics as informative and prominent features. The selected set of features is then used to train and perform the e-book classification task based on the support vector machine technique.
Findings - The evaluation of the proposed framework shows that identifying discourse segments and capturing subtopic features leads to better performance than two conventional feature selection techniques: TFIDF and mutual information. It also demonstrates that discourse features play an important role among textual features, especially for large documents such as e-books.
Research limitations/implications - Automatically extracted subtopic features cannot be fed directly into the FS process; doing so requires control of the threshold.
Practical implications - The proposed technique demonstrates a promising application of discourse analysis to enhance the classification of large digital documents such as e-books, compared with conventional techniques.
Originality/value - A new FS technique is proposed that can inspect the narrative structure of large documents; it is new to the text classification domain. A further contribution is that it encourages the consideration of discourse information in future text analysis by providing further evidence through the evaluation of the results. The proposed system can be integrated into other library management systems.
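The abstract names mutual information as one of the two conventional feature selection baselines but gives no implementation details. As a rough illustration only, the sketch below scores each term by the mutual information between its presence in a document and the document's class label, then keeps the top-scoring terms. It is not the authors' code: the function names `mutual_information` and `select_top_k`, the whitespace tokenisation, and the four toy documents are all assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information(docs, labels):
    """Score each term by the mutual information (in bits) between
    its presence in a document and the document's class label."""
    n = len(docs)
    class_counts = Counter(labels)
    # Naive whitespace tokenisation; an assumption for this sketch.
    doc_sets = [set(d.lower().split()) for d in docs]
    vocab = set().union(*doc_sets)
    scores = {}
    for term in vocab:
        # Joint counts of (term present?, class label).
        joint = Counter((term in s, y) for s, y in zip(doc_sets, labels))
        n_present = sum(1 for s in doc_sets if term in s)
        score = 0.0
        for present in (True, False):
            n_t = n_present if present else n - n_present
            for c, n_c in class_counts.items():
                n_tc = joint[(present, c)]
                if n_tc == 0:
                    continue  # the 0 * log(0) term is taken as 0
                # P(t, c) * log2( P(t, c) / (P(t) * P(c)) )
                score += (n_tc / n) * math.log2(n_tc * n / (n_t * n_c))
        scores[term] = score
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus, invented for illustration; not data from the paper.
docs = [
    "stars planets orbit the sun",
    "the telescope shows distant stars",
    "the recipe needs flour and butter",
    "bake the bread with flour",
]
labels = ["astronomy", "astronomy", "cooking", "cooking"]

scores = mutual_information(docs, labels)
top_terms = select_top_k(scores, 2)
```

In this toy corpus a word that occurs in every document, such as "the", carries no class information and scores zero, while words confined to one class score highest. The selected terms would then serve as the feature set for a classifier such as the support vector machine mentioned in the abstract.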
Pages: 2-22
Number of pages: 21