Building semantically annotated corpus for text classification of Indian defence news articles

Cited by: 3
Authors
Kanekar S.A. [1]
Sharma A. [2]
Patkar G.S. [1]
Tilve A.K.S. [1]
Affiliations
[1] Department of Computer Engineering, Don Bosco College of Engineering, Margao, 403602, Goa
[2] Centre for Artificial Intelligence and Robotics (CAIR), DRDO Complex, C V Raman Nagar, Bengaluru, Karnataka
Keywords
Annotation; Dataset; Inter annotator agreement (IAA); Machine learning (ML); Natural language processing (NLP); Text classification
DOI
10.1007/s41870-021-00679-x
Abstract
With rapid growth and technological advancement, large amounts of textual data are generated online. Deriving useful patterns such as opinions, summaries and facts from this text is a challenging task. Currently, there is no dataset for subjectivity/objectivity classification in the Indian National Security domain. A news dataset has been created for the purpose of subjective/objective sentence classification. This paper defines the news corpus annotation guidelines and employs an inter-annotator agreement metric to assess the quality of the dataset. The proposed methodology also highlights the challenges and limitations of building a corpus in the National Security domain. The corpus can be used in research on developing robust subjective/objective sentence classifiers. Furthermore, text categorization experiments conducted on the corpus demonstrate that a neural-network-based classifier gives promising results. © 2021, Bharati Vidyapeeth's Institute of Computer Applications and Management.
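The abstract does not name the inter-annotator agreement metric used. As a minimal sketch, assuming two annotators who each label every sentence as subjective or objective and assuming Cohen's kappa as the metric (a common IAA choice, not confirmed by the source), agreement could be computed as below. The function name `cohens_kappa` and the label strings are hypothetical and used for illustration only.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Illustrative sketch only: the paper's actual IAA metric is not stated
    in the abstract; Cohen's kappa is assumed here.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotators must label the same, non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's marginal label distribution.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined,
        # so treat this degenerate case as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical sentence labels from two annotators ("subj" = subjective, "obj" = objective).
annotator_1 = ["subj", "obj", "obj", "subj", "obj", "obj"]
annotator_2 = ["subj", "obj", "subj", "subj", "obj", "obj"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.667
```

Kappa corrects raw agreement (here 5 of 6 sentences) for the agreement expected by chance given each annotator's label proportions, which is why it is lower than the raw 0.833.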
Pages: 1539-1544
Page count: 5