Building semantically annotated corpus for text classification of Indian defence news articles

Cited by: 3

Authors:

Kanekar S.A. [1]

Sharma A. [2]

Patkar G.S. [1]

Tilve A.K.S. [1]

Affiliations:

[1] Department of Computer Engineering, Don Bosco College of Engineering, Margao, 403602, Goa

[2] Centre for Artificial Intelligence and Robotics (CAIR), DRDO Complex, C V Raman Nagar, Bengaluru, Karnataka

Source:

International Journal of Information Technology | 2021 / Vol. 13 / Issue 4

Keywords:

Annotation; Dataset; Inter-annotator agreement (IAA); Machine learning (ML); Natural language processing (NLP); Text classification

DOI:

10.1007/s41870-021-00679-x

Abstract:

A large amount of textual data is generated online as technology advances rapidly. Deriving interesting patterns such as opinions, summaries and facts from text data is a challenging task. Currently, there is no dataset for subjectivity/objectivity classification in the Indian National Security domain. A news dataset has been created for the purpose of subjective/objective sentence classification. This paper defines the news corpus annotation guidelines and employs an inter-annotator agreement metric to assess the quality of the dataset. The proposed methodology also highlights the challenges and limitations of building a corpus in the National Security domain. The corpus can be used for research on developing robust subjective/objective sentence classifiers. Furthermore, text categorization experiments conducted on the corpus demonstrate that a neural-network-based classifier gives promising results. © 2021, Bharati Vidyapeeth's Institute of Computer Applications and Management.


Pages: 1539-1544

Page count: 5

