Inductive Model Generation for Text Classification Using a Bipartite Heterogeneous Network

Cited by: 17
Authors
Rossi, Rafael Geraldeli [1]
Lopes, Alneu de Andrade [1]
Faleiros, Thiago de Paulo [1]
Rezende, Solange Oliveira [1]
Affiliations
[1] Univ Sao Paulo, Inst Math & Comp Sci, Sao Carlos, SP, Brazil
Funding
São Paulo Research Foundation, Brazil
Keywords
heterogeneous network; text classification; inductive model generation; classifiers
DOI
10.1007/s11390-014-1436-7
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Algorithms for numeric data classification have been applied to text classification. Usually the vector space model is used to represent text collections. The characteristics of this representation, such as sparsity and high dimensionality, sometimes impair the quality of general-purpose classifiers. Networks can also be used to represent text collections, avoiding the high sparsity and making it possible to model relationships among the different objects that compose a text collection. Such network-based representations can improve the quality of the classification results. One of the simplest ways to represent a text collection as a network is through a bipartite heterogeneous network, which is composed of objects that represent the documents connected to objects that represent the terms. Bipartite heterogeneous networks do not require the computation of similarities or relations among the objects and can be used to model any type of text collection. Given these advantages, in this article we present a text classifier that builds a classification model using the structure of a bipartite heterogeneous network. The algorithm, referred to as IMBHN (Inductive Model Based on Bipartite Heterogeneous Network), induces a classification model by assigning weights to the objects that represent the terms for each class of the text collection. An empirical evaluation using a large number of text collections from different domains shows that the proposed IMBHN algorithm produces significantly better results than the k-NN, C4.5, SVM, and Naive Bayes algorithms.
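The idea described in the abstract (documents linked to terms, with one weight induced per term and class) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the weights are induced, so the error-correction update below, the learning rate, the epoch count, and all function names (`build_bipartite_network`, `train_term_weights`, `classify`) are assumptions made for illustration only.

```python
from collections import defaultdict


def build_bipartite_network(docs):
    """Return {doc_id: {term: frequency}}; each entry is a weighted document-term edge."""
    edges = {}
    for doc_id, text in enumerate(docs):
        counts = defaultdict(int)
        for term in text.lower().split():
            counts[term] += 1
        edges[doc_id] = dict(counts)
    return edges


def train_term_weights(edges, labels, classes, rate=0.1, epochs=50):
    """Induce one weight per (term, class) pair.

    Assumption: a simple error-correction update is used here because the
    abstract does not specify the induction rule.
    """
    weights = defaultdict(float)  # (term, class) -> weight, starts at 0.0
    for _ in range(epochs):
        for doc_id, terms in edges.items():
            for c in classes:
                # Score of the document for class c: edge weight times term weight.
                score = sum(f * weights[(t, c)] for t, f in terms.items())
                target = 1.0 if labels[doc_id] == c else 0.0
                error = target - score
                # Push the weights of the document's terms toward the target.
                for t, f in terms.items():
                    weights[(t, c)] += rate * f * error
    return weights


def classify(text, weights, classes):
    """Label an unseen document using only the induced term weights."""
    counts = defaultdict(int)
    for term in text.lower().split():
        counts[term] += 1
    scores = {
        c: sum(f * weights.get((t, c), 0.0) for t, f in counts.items())
        for c in classes
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy data, invented for this sketch.
    docs = ["goal match team", "team score goal",
            "stock market price", "market price trade"]
    labels = ["sports", "sports", "finance", "finance"]
    classes = ["sports", "finance"]
    model = train_term_weights(build_bipartite_network(docs), labels, classes)
    print(classify("goal team", model, classes))
    print(classify("stock price", model, classes))
```

Because the model consists only of term weights, an unseen document can be classified without rebuilding the network, which is the sense in which such a model is inductive. No document-to-document similarity is computed at any point.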
Pages: 361 - 375
Number of pages: 15
Related papers
50 in total
  • [1] Inductive Model Generation for Text Classification Using a Bipartite Heterogeneous Network
    Rafael Geraldeli Rossi
    Alneu de Andrade Lopes
    Thiago de Paulo Faleiros
    Solange Oliveira Rezende
    Journal of Computer Science and Technology, 2014, 29 : 361 - 375
  • [2] Inductive Model Generation for Text Classification Using a Bipartite Heterogeneous Network
    Rafael Geraldeli Rossi
    Alneu de Andrade Lopes
    Thiago de Paulo Faleiros
    Solange Oliveira Rezende
    Journal of Computer Science & Technology, 2014, 29 (03) : 361 - 375
  • [3] Inductive Model Generation for Text Categorization using a Bipartite Heterogeneous Network
    Rossi, Rafael Geraldeli
    Faleiros, Thiago de Paulo
    Lopes, Alneu de Andrade
    Rezende, Solange Oliveira
    12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2012), 2012 : 1086 - 1091
  • [4] A Heterogeneous Directed Graph Attention Network for inductive text classification using multilevel semantic embeddings
    Lin, Mu
    Wang, Tao
    Zhu, Yifan
    Li, Xiaobo
    Zhou, Xin
    Wang, Weiping
    KNOWLEDGE-BASED SYSTEMS, 2024, 295
  • [5] Using bipartite heterogeneous networks to speed up inductive semi-supervised learning and improve automatic text categorization
    Rossi, Rafael Geraldeli
    Lopes, Alneu de Andrade
    Rezende, Solange Oliveira
    KNOWLEDGE-BASED SYSTEMS, 2017, 132 : 94 - 118
  • [6] Text Classification with Heterogeneous Information Network Kernels
    Wang, Chenguang
    Song, Yangqiu
    Li, Haoran
    Zhang, Ming
    Han, Jiawei
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016 : 2130 - 2136
  • [7] Online Sensitive Text Classification Model Based on Heterogeneous Graph Convolutional Network
    Gao, Haoxin
    Sun, Lijuan
    Wu, Jingchen
    Gao, Yutong
    Wu, Xu
    Data Analysis and Knowledge Discovery, 2023, 7 (11) : 26 - 36
  • [8] Boosting inductive transfer for text classification using Wikipedia
    Banerjee, Somnath
    ICMLA 2007: SIXTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS, 2007 : 148 - 153
  • [9] Deeply integrating unsupervised semantics and syntax into heterogeneous graphs for inductive text classification
    Yue Gao
    Xiangling Fu
    Xien Liu
    Ji Wu
    Complex & Intelligent Systems, 2024, 10 : 1565 - 1579
  • [10] Deeply integrating unsupervised semantics and syntax into heterogeneous graphs for inductive text classification
    Gao, Yue
    Fu, Xiangling
    Liu, Xien
    Wu, Ji
    COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (01) : 1565 - 1579