Building semantically annotated corpus for text classification of Indian defence news articles

Cited by: 3
Authors
Kanekar S.A. [1]
Sharma A. [2]
Patkar G.S. [1]
Tilve A.K.S. [1]
Affiliations
[1] Department of Computer Engineering, Don Bosco College of Engineering, Margao, 403602, Goa
[2] Centre for Artificial Intelligence and Robotics (CAIR), DRDO Complex, C V Raman Nagar, Bengaluru, Karnataka
Keywords
Annotation; Dataset; Inter-annotator agreement (IAA); Machine learning (ML); Natural language processing (NLP); Text classification
DOI
10.1007/s41870-021-00679-x
Abstract
A large amount of textual data is generated online as technology advances and grows rapidly. Deriving interesting patterns such as opinions, summaries and facts from text data is a challenging task. Currently, there is no dataset for subjectivity/objectivity classification in the Indian national security domain. A news dataset has been created for the purpose of subjective/objective sentence classification. This paper defines the news corpus annotation guidelines and employs an inter-annotator agreement metric to assess the quality of the dataset. The proposed methodology also highlights the challenges and limitations of building a corpus in the national security domain. The corpus can be used for research on developing a robust subjective/objective sentence classifier. Furthermore, text categorization experiments conducted on the corpus demonstrate that a neural network based classifier gives promising results. © 2021, Bharati Vidyapeeth's Institute of Computer Applications and Management.
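The abstract does not say which inter-annotator agreement metric the authors used. As an illustration only, the sketch below computes Cohen's kappa, a common agreement measure for two annotators, on made-up subjective/objective sentence labels. The function name `cohen_kappa` and the sample labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: the paper's actual IAA metric is not named in the
    abstract; kappa is used here purely as a familiar example.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement from each annotator's label distribution.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical sentence-level annotations (not data from the corpus).
annotator_1 = ["subjective", "objective", "objective",
               "subjective", "objective", "objective"]
annotator_2 = ["subjective", "objective", "subjective",
               "subjective", "objective", "objective"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one class (for example, objective sentences in news text) dominates the data.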
Pages: 1539-1544
Page count: 5