A feature selection approach for automatic e-book classification based on discourse segmentation

Cited by: 3
Authors
Guo, Jiunn-Liang [1]
Wang, Hei-Chia [2]
Lai, Ming-Way [2]
Affiliations
[1] ROC Taiwan Air Force Acad, Kaohsiung, Taiwan
[2] Natl Cheng Kung Univ, Inst Informat Management, Tainan 70101, Taiwan
Keywords
Discourse segmentation; Feature selection; Text classification; Word sense disambiguation; INFORMATION; TEXT; MODEL;
DOI
10.1108/PROG-12-2012-0071
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Purpose - This paper develops a novel feature selection (FS) approach for the automatic text classification of large digital documents, namely the e-books of an online library system. The main idea is to automatically identify discourse features in order to improve the feature selection process, rather than to focus on the size of the corpus.
Design/methodology/approach - The proposed framework automatically identifies the discourse segments within e-books, captures the discourse subtopics that are cohesively expressed in those segments, and treats these subtopics as informative and prominent features. The selected set of features is then used to train a classifier and perform the e-book classification task with the support vector machine technique.
Findings - The evaluation of the proposed framework shows that identifying discourse segments and capturing subtopic features leads to better performance than two conventional feature selection techniques: TFIDF and mutual information. It also demonstrates that discourse features play an important role among textual features, especially for large documents such as e-books.
Research limitations/implications - Automatically extracted subtopic features cannot be entered directly into the FS process but require control of a threshold.
Practical implications - The proposed technique demonstrates the promise of using discourse analysis to enhance the classification of large digital documents such as e-books, as opposed to conventional techniques.
Originality/value - A new FS technique is proposed that can inspect the narrative structure of large documents, and it is new to the text classification domain. The other contribution is that it encourages the consideration of discourse information in future text analysis by providing more evidence through the evaluation of the results. The proposed system can be integrated into other library management systems.
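For readers unfamiliar with the two conventional baselines named in the abstract, the following is a minimal sketch of TFIDF and mutual information scoring for feature selection. It is not the authors' discourse-based method, and the toy corpus, the function names (`tfidf_scores`, `mutual_information`, `top_k`), and the exact weighting formulas are assumptions chosen for illustration; the paper may define both baselines differently.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return each term's highest TF-IDF weight over tokenized documents."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    best = {}
    for doc in docs:
        for term, count in Counter(doc).items():
            weight = (count / len(doc)) * math.log(n / df[term])
            best[term] = max(best.get(term, 0.0), weight)
    return best


def mutual_information(docs, labels):
    """Return MI (in bits) between each term's presence and the class label."""
    n = len(docs)
    vocab = {term for doc in docs for term in doc}
    scores = {}
    for term in vocab:
        mi = 0.0
        for present in (True, False):
            p_term = sum(1 for d in docs if (term in d) == present) / n
            for c in set(labels):
                joint = sum(
                    1 for d, y in zip(docs, labels)
                    if (term in d) == present and y == c
                )
                if joint == 0:
                    continue  # a zero joint probability contributes nothing
                p_joint = joint / n
                p_class = labels.count(c) / n
                mi += p_joint * math.log2(p_joint / (p_term * p_class))
        scores[term] = mi
    return scores


def top_k(scores, k):
    """Pick the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus: four tokenized "documents" in two classes.
docs = [
    ["stock", "market", "the"],
    ["market", "trade", "the"],
    ["goal", "match", "the"],
    ["match", "team", "the"],
]
labels = ["finance", "finance", "sport", "sport"]

selected = top_k(mutual_information(docs, labels), 2)
```

In this toy corpus a term that occurs in every document, such as "the", carries no class information under either score, while terms confined to one class rank highest under mutual information. The selected terms would then serve as the feature set passed to a classifier such as a support vector machine.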
Pages: 2-22
Page count: 21
Related papers
50 in total
  • [21] A Deep Learning Co-training Framework for e-book Classification
    Chang, Tsui-Ping
    Chen, Hung-Ming
    Chen, Jian-Qun
    2020 INTERNATIONAL SYMPOSIUM ON COMPUTER, CONSUMER AND CONTROL (IS3C 2020), 2021, : 376 - 379
  • [22] Using a task-based approach in evaluating the usability of BoBIs in an e-book environment
    Abdullah, Noorhidawati
    Gibb, Forbes
    ADVANCES IN INFORMATION RETRIEVAL, 2008, 4956 : 246 - +
  • [23] Exploring the behavioral patterns of students learning with a Facebook-based e-book approach
    Zarzour, Hafed
    Bendjaballah, Sabrina
    Harirche, Hadjer
    COMPUTERS & EDUCATION, 2020, 156
  • [24] An E-Book Hub Service Based on a Cloud Platform
    Cheng, Jinn-Shing
    Huang, Echo
    Lin, Chuan-Lang
    INTERNATIONAL REVIEW OF RESEARCH IN OPEN AND DISTANCE LEARNING, 2012, 13 (05) : 39 - 55
  • [25] An RDF-Based Platform for E-Book Publishing
    Dittawit, Kornschnok
    Wuwongse, Vilas
    DIGITAL LIBRARIES: FOR CULTURAL HERITAGE, KNOWLEDGE DISSEMINATION, AND FUTURE CREATION: ICADL 2011, 2011, 7008 : 267 - 276
  • [26] E-book Reader Research Based on Kansei Engineering
    Li, Yan
    Liu, Xiaofei
    Qu, Zhenbo
    MATERIALS, TRANSPORTATION AND ENVIRONMENTAL ENGINEERING, PTS 1 AND 2, 2013, 779-780 : 1727 - 1730
  • [27] Feature Extracted Deep Neural Collaborative Filtering for E-Book Service Recommendations
    Kim, Ji-Yoon
    Lim, Chae-Kwan
    APPLIED SCIENCES-BASEL, 2023, 13 (11):
  • [28] A Study of Eye Image Extraction-based Automatic Character Creation E-Book Tool Application
    Lim, YangMi
    Nam, SangHun
    2017 INTERNATIONAL CONFERENCE ON PLATFORM TECHNOLOGY AND SERVICE (PLATCON), 2017, : 222 - 225
  • [29] A filter-based feature selection approach in multilabel classification
    Shaikh, Rafia
    Rafi, Muhammad
    Mahoto, Naeem Ahmed
    Sulaiman, Adel
    Shaikh, Asadullah
    MACHINE LEARNING-SCIENCE AND TECHNOLOGY, 2023, 4 (04):
  • [30] A Feature Selection Approach Based on Information Theory for Classification Tasks
    Jesus, Jhoseph
    Canuto, Anne
    Araujo, Daniel
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, PT II, 2017, 10614 : 359 - 367