Metadata Generation for Multi-Text Classification in Structured Data

Authors
Trejo, Karla [1 ]
Garcia, Pere [1 ]
Puyol-Gruart, Josep [1 ]
Affiliation
[1] IIIA CSIC, UAB Campus, E-08193 Bellaterra, Catalonia, Spain
Keywords
text analysis; text mining; data formatting; multi-text classification; topology; metadata; structured data;
DOI
10.3233/FAIA190154
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In today's information-saturated world, text analysis has become an indispensable resource for extracting useful data from massive amounts of text. A large portion of this information is unstructured, which has created a need for methodologies such as Named Entity Recognition (NER), Part-of-Speech (PoS) tagging, n-grams, and Term Frequency-Inverse Document Frequency (TF-IDF) that can read and understand information based on its meaning, context, and linguistic cohesion. However, these approaches fall short on their own when applied to data that is already structured. This paper proposes generating metadata that can simultaneously provide situational information from structured text data. Abstracting text as a "group of concepts" can boost the relevance of a word in a collection of documents, which allows a more refined separation of classes and better performance in multi-text classification tasks.
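The abstract does not say how concepts are generated or weighted, so the following is only a rough illustration of the general idea, not the authors' method. This Python sketch assumes a hand-written, hypothetical concept dictionary (`CONCEPTS`) and tags every term with the name of the structured field it came from, as a simple stand-in for "situational" metadata. It then compares plain TF-IDF vectors built from raw words with vectors built from field-qualified concepts. The helpers `record_to_terms`, `tf_idf` and `cosine`, and the sample records, are all illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical concept map (an assumption, not from the paper):
# different surface words are grouped under one shared concept label.
CONCEPTS = {"pump": "EQUIPMENT", "valve": "EQUIPMENT",
            "leak": "FAULT", "crack": "FAULT"}


def record_to_terms(record, concepts=CONCEPTS):
    """Turn a structured record (field -> text) into field-qualified terms.

    Each word is replaced by its concept label when one exists and is
    prefixed with its field name, e.g. "title:FAULT".
    """
    terms = []
    for field, text in record.items():
        for word in text.lower().split():
            terms.append(f"{field}:{concepts.get(word, word)}")
    return terms


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf = raw count / document length; idf = log(N / df), where df is the
    number of documents that contain the term (no smoothing).
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return weights


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical structured records: two fault reports and one routine report.
records = [
    {"title": "pump leak", "body": "leak near pump"},
    {"title": "valve crack", "body": "crack in valve"},
    {"title": "monthly report", "body": "no issues"},
]

plain = tf_idf([record_to_terms(r, concepts={}) for r in records])
concept = tf_idf([record_to_terms(r) for r in records])

print(cosine(plain[0], plain[1]))      # raw words: the fault reports share no terms
print(cosine(concept[0], concept[1]))  # concepts: the fault reports now overlap
print(cosine(concept[0], concept[2]))  # fault vs. routine report stays at 0
```

With raw words, the two fault reports look unrelated. After the (assumed) concept abstraction, they share `title:FAULT`, `body:EQUIPMENT` and similar terms, while the routine report stays separate. This mirrors, in a toy setting, the abstract's claim that a concept-level representation can separate classes more cleanly.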
Pages: 417-421
Number of pages: 5