Semi-Automatic Creation of a Reference News Corpus for Fine-Grained Multi-Label Scenarios

Times cited: 0
Authors
Teixeira, Jorge [1]
Sarmento, Luis [1]
Oliveira, Eugenio [2]
Affiliations
[1] Labs SAPO UP, FEUP LIACC, Rua Dr Roberto Frias S-N, P-4200465 Oporto, Portugal
[2] FEUP LIACC, P-4200465 Oporto, Portugal
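Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract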
In this paper we tackle the problem of creating a reference corpus for the classification of news items in fine-grained multi-label scenarios. These scenarios are particularly challenging for text classification techniques, and the lack of reference corpora is an important bottleneck for developing and testing new classification strategies. We propose a semi-automatic approach for creating a reference corpus that uses three auxiliary classification methods (one based on Support Vector Machines, one based on Nearest Neighbor classifiers, and one based on a dictionary-based classification heuristic) to suggest to human annotators topic-related labels that describe different facets of the news item being annotated. Using this approach, we semi-automatically produce a corpus of 1,600 news items with 865 different labels, with an average of 3.63 labels per news item. We evaluate the contribution of each auxiliary classification method to the annotation process and conclude that: (i) none of the methods alone is capable of suggesting all relevant labels, (ii) the dictionary-based classification heuristic contributes significantly, and (iii) the Nearest Neighbor classifier performs very efficiently in the most extreme multi-label part of the problem and is robust to the very unbalanced item-to-class distribution.
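The suggestion-and-review loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`merge_suggestions`, `method_contribution`, `average_labels_per_item`), the method keys, and the toy labels are all hypothetical, and the three classifiers are represented only by the label lists they are assumed to return. The sketch merges the suggestions shown to an annotator, counts how many accepted labels each method proposed (and how many only that method proposed), and computes the average number of labels per item.

```python
from collections import Counter


def merge_suggestions(suggestions_by_method):
    """Union the labels suggested by all methods.

    Returns a dict mapping each label to the set of methods that proposed it.
    """
    merged = {}
    for method, labels in suggestions_by_method.items():
        for label in labels:
            merged.setdefault(label, set()).add(method)
    return merged


def method_contribution(merged, accepted_labels):
    """Count accepted labels per method.

    `suggested[m]` counts accepted labels that method m proposed;
    `exclusive[m]` counts accepted labels that only method m proposed.
    """
    suggested = Counter()
    exclusive = Counter()
    for label in accepted_labels:
        methods = merged.get(label, set())
        for method in methods:
            suggested[method] += 1
        if len(methods) == 1:
            exclusive[next(iter(methods))] += 1
    return suggested, exclusive


def average_labels_per_item(corpus):
    """Average number of labels per annotated item (corpus: item id -> label set)."""
    return sum(len(labels) for labels in corpus.values()) / len(corpus)


# Toy data, invented for illustration only.
suggestions = {
    "svm": ["politics", "economy"],
    "knn": ["economy", "budget-2010"],
    "dictionary": ["finance-minister"],
}
merged = merge_suggestions(suggestions)

# Labels the (hypothetical) human annotator kept after reviewing the suggestions.
accepted = {"economy", "budget-2010", "finance-minister"}
suggested, exclusive = method_contribution(merged, accepted)

corpus = {"news-1": accepted, "news-2": {"sports"}}
avg = average_labels_per_item(corpus)
```

In this toy run no single method covers all accepted labels, which mirrors conclusion (i) in spirit only; the counts carry no evidential weight about the actual corpus.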
Pages: 749 / +
Page count: 2