Generic features selection for structure classification of diverse styled scholarly articles

Cited by: 1
Authors
Waqas, Muhammad [1]
Anjum, Nadeem [1]
Affiliations
[1] Capital Univ Sci & Technol, Dept Comp Sci, ICT, Expressway,Kahuta Rd,Zone 5, Islamabad, Pakistan
Keywords
Features Engineering; Machine Learning; Research Article; Metadata Extraction; Text mining; KNOWLEDGE; SYSTEM;
DOI
10.1007/s11042-023-16128-9
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
The enormous growth of online research publications across diverse domains has led the research community to mine these scientific resources by searching online digital libraries and publishers' websites. A precise search should list the most relevant articles by applying semantic queries to a document's metadata and structural elements. Online search engines and digital libraries, however, offer only keyword-based search over the full body text, which returns an excessive number of results. Online research publishers therefore need to store each article's structural and metadata information in machine-comprehensible form. In recent years, the research community has adopted different approaches to extracting structural information from research documents, including rule-based heuristics and machine-learning-based approaches. Studies suggest that machine-learning-based techniques produce the best results for document structure extraction across publishers with diverse publication layouts. In this paper, we propose thirteen logical layout structural (LLS) components and identify a novel two-stage set of generic features associated with them. This approach gives our technique an advantage over the state of the art in structural classification of digital scientific articles with diverse publication styles. We apply chi-square (χ²) for feature selection, and the final results show that an SVM with a kernel function produces the best result, with an overall F-measure of 0.95.
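The abstract names chi-square feature selection but gives no implementation details. The following is a minimal sketch of the general idea only, assuming categorical layout features and the Pearson chi-square statistic on a feature-by-class contingency table. The feature names (`is_bold`, `is_centered`), the class labels, and the toy data are hypothetical and not taken from the paper; the SVM classification step is omitted.

```python
from collections import Counter


def chi_square_score(feature_values, labels):
    """Pearson chi-square statistic between one categorical feature and the class labels."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    feature_totals = Counter(feature_values)
    label_totals = Counter(labels)
    score = 0.0
    for f, f_count in feature_totals.items():
        for c, c_count in label_totals.items():
            # Expected count if the feature and the class were independent.
            expected = f_count * c_count / n
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score


def select_top_k(rows, labels, feature_names, k):
    """Rank features by chi-square score and return the k highest-scoring names."""
    scores = {
        name: chi_square_score([row[i] for row in rows], labels)
        for i, name in enumerate(feature_names)
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: one row per text line, columns = (is_bold, is_centered).
feature_names = ["is_bold", "is_centered"]
rows = [(1, 1), (1, 0), (1, 1), (1, 0), (0, 1), (0, 0), (0, 1), (0, 0)]
labels = ["heading"] * 4 + ["paragraph"] * 4

selected, scores = select_top_k(rows, labels, feature_names, k=1)
print(selected, scores)
```

A higher score means the feature's distribution differs more across classes, so it is kept; features that score near zero carry little class information and are dropped before training the classifier.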
Pages: 16623-16655
Page count: 33