Semi-Automatic Creation of a Reference News Corpus for Fine-Grained Multi-Label Scenarios

Authors
Teixeira, Jorge [1 ]
Sarmento, Luis [1 ]
Oliveira, Eugenio [2 ]
Affiliations
[1] Labs SAPO UP, FEUP LIACC, Rua Dr Roberto Frias S-N, P-4200465 Oporto, Portugal
[2] FEUP LIACC, P-4200465 Oporto, Portugal
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this paper we tackle the problem of creating a reference corpus for classifying news items in fine-grained multi-label scenarios. These scenarios are particularly challenging for text classification techniques, and the limited availability of reference corpora is an important bottleneck for developing and testing new classification strategies. We propose a semi-automatic approach to creating such a corpus. It uses three auxiliary classification methods (one based on Support Vector Machines, one based on Nearest Neighbor classifiers, and one based on a dictionary-based classification heuristic) to suggest topic-related labels to human annotators, labels that can describe different facets of the news item being annotated. Using this approach, we semi-automatically produce a corpus of 1,600 news items with 865 different labels and an average of 3.63 labels per news item. We evaluate the contribution of each auxiliary classification method to the annotation process and conclude that: (i) no single method is capable of suggesting all relevant labels, (ii) the dictionary-based classification heuristic contributes significantly, and (iii) the Nearest Neighbor classifier performs very efficiently in the most extreme multi-label part of the problem and is robust to the highly unbalanced item-to-class distribution.
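The abstract describes the annotation loop only at a high level. The following is a minimal Python sketch, under stated assumptions, of how label suggestions from auxiliary methods might be pooled and how each method's contribution to the final annotations could be measured. Everything named here is hypothetical and for illustration only: the keyword lexicon `LEXICON`, the functions `dictionary_suggest`, `knn_suggest`, `contribution` and `label_cardinality`, the Jaccard token-set similarity, and the "fraction of accepted labels that a method suggested" metric are assumptions, not the authors' implementation. The SVM-based suggester is left out to keep the sketch dependency-free; any per-label classifier could be added as one more entry in each item's suggestion dictionary.

```python
"""Sketch: pooling label suggestions from auxiliary methods (assumed design)."""

# Hypothetical keyword dictionary (label -> trigger terms); it stands in for
# whatever dictionary the real heuristic uses, which is not described here.
LEXICON = {
    "football": {"goal", "striker", "league"},
    "elections": {"ballot", "vote", "candidate"},
    "economy": {"inflation", "gdp", "budget"},
}


def tokenize(text):
    """Lower-cased whitespace tokens as a set (deliberately naive)."""
    return set(text.lower().split())


def dictionary_suggest(text, lexicon=LEXICON):
    """Dictionary heuristic: suggest every label with a trigger term in the text."""
    tokens = tokenize(text)
    return {label for label, terms in lexicon.items() if tokens & terms}


def knn_suggest(text, annotated, k=3):
    """Nearest-neighbour suggester: union of the labels of the k most similar
    already-annotated (text, labels) pairs, ranked by Jaccard similarity."""
    tokens = tokenize(text)

    def jaccard(other_text):
        other = tokenize(other_text)
        union = tokens | other
        return len(tokens & other) / len(union) if union else 0.0

    ranked = sorted(annotated, key=lambda item: jaccard(item[0]), reverse=True)
    suggested = set()
    for _, labels in ranked[:k]:
        suggested |= labels
    return suggested


def contribution(suggestions, accepted):
    """For each method, the fraction of all accepted labels it suggested, plus
    the fraction of accepted labels that no method suggested (labels the
    annotator had to add by hand)."""
    total = sum(len(labels) for labels in accepted)
    per_method = {}
    for method in suggestions[0]:
        hits = sum(len(s[method] & a) for s, a in zip(suggestions, accepted))
        per_method[method] = hits / total
    missed = sum(
        len(a - set().union(*s.values())) for s, a in zip(suggestions, accepted)
    )
    return per_method, missed / total


def label_cardinality(accepted):
    """Average number of labels per annotated item."""
    return sum(len(labels) for labels in accepted) / len(accepted)


if __name__ == "__main__":
    # Made-up toy items (text, labels); not data from the corpus.
    annotated = [
        ("the striker scored a late goal", {"football"}),
        ("voters cast a ballot for the candidate", {"elections"}),
        ("budget cuts and inflation worry voters", {"economy", "elections"}),
    ]
    print(sorted(dictionary_suggest("inflation rises before the vote")))  # ['economy', 'elections']
    print(sorted(knn_suggest("the striker missed a goal", annotated, k=1)))  # ['football']

    # Per-item suggestions by method, and the labels the annotator finally kept.
    suggestions = [
        {"dictionary": {"economy"}, "knn": {"economy", "elections"}},
        {"dictionary": set(), "knn": {"football"}},
    ]
    accepted = [{"economy", "elections"}, {"football", "transfers"}]
    print(contribution(suggestions, accepted))  # ({'dictionary': 0.25, 'knn': 0.75}, 0.25)
    print(label_cardinality(accepted))  # 2.0
```

Under this assumed metric, a per-method fraction below 1 means that method alone missed some accepted labels (the situation of finding (i)), and the per-method fractions can sum to more than 1 because methods overlap. Applied to the statistics reported above, 1,600 × 3.63 ≈ 5,808, so the corpus holds roughly 5,800 item-label assignments spread over 865 labels, i.e. fewer than seven items per label on average, which shows how little per-label training data each suggester has to work with.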
Pages: 749 / +
Number of pages: 2
Related papers
50 in total
  • [1] Fine-grained local label correlation for multi-label classification
    Zhao, Tianna
    Zhang, Yuanjian
    Miao, Duoqian
    Pedrycz, Witold
    KNOWLEDGE-BASED SYSTEMS, 2025, 314
  • [2] Fine-grained Multi-label Sexism Classification Using Semi-supervised Learning
    Abburi, Harika
    Parikh, Pulkit
    Chhaya, Niyati
    Varma, Vasudeva
    WEB INFORMATION SYSTEMS ENGINEERING, WISE 2020, PT II, 2020, 12343 : 531 - 547
  • [3] Multi-label adversarial fine-grained cross-modal retrieval
    Sun, Chunpu
    Zhang, Huaxiang
    Liu, Li
    Liu, Dongmei
    Wang, Lin
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2023, 117
  • [4] DeepHSAR: Semi-supervised fine-grained learning for multi-label human sexual activity recognition
    Gangwar, Abhishek
    Gonzalez-Castro, Victor
    Alegre, Enrique
    Fidalgo, Eduardo
    Martinez-Mendoza, Alicia
    INFORMATION PROCESSING & MANAGEMENT, 2024, 61 (05)
  • [5] Semi-automatic Labeling with Active Learning for Multi-label Image Classification
    Wu, Jian
    Ye, Chen
    Sheng, Victor S.
    Yao, Yufeng
    Zhao, Pengpeng
    Cui, Zhiming
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2015, PT I, 2015, 9314 : 473 - 482
  • [6] Fine-Grained Multi-label Sexism Classification Using a Semi-Supervised Multi-level Neural Approach
    Abburi, Harika
    Parikh, Pulkit
    Chhaya, Niyati
    Varma, Vasudeva
    DATA SCIENCE AND ENGINEERING, 2021, 6 : 359 - 379
  • [7] Fine-Grained Multi-label Sexism Classification Using a Semi-Supervised Multi-level Neural Approach
    Abburi, Harika
    Parikh, Pulkit
    Chhaya, Niyati
    Varma, Vasudeva
    DATA SCIENCE AND ENGINEERING, 2021, 6 (04) : 359 - 379
  • [8] A Novel Fuzzy Logic Model for Multi-label Fine-Grained Emotion Retrieval
    Wang, Chu
    Wang, Daling
    Feng, Shi
    Zhang, Yifei
    SOCIAL MEDIA PROCESSING, SMP 2017, 2017, 774 : 218 - 231
  • [9] A semi-automatic creation of an annotated corpus for opinion mining
    Sadoun, Driss
    5E CONGRES MONDIAL DE LINGUISTIQUE FRANCAISE, 2016, 27
  • [10] Fine-Grained Emotion Analysis of Arabic Tweets: A Multi-Target Multi-Label Approach
    Badarneh, Omar
    Al-Ayyoub, Mahmoud
    Alhindawi, Nouh
    Tawalbeh, Lo'ai A.
    Jararweh, Yaser
    2018 IEEE 12TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2018, : 340 - 345