Fuzzy Bag-of-Topics Model for Short Text Representation

Authors
Jia, Hao [1]
Li, Qing [1]
Affiliation
[1] Shanghai Univ, Sch Comp Engn & Sci, Shanghai, Peoples R China
Keywords
Short text; Representation learning; Word communities
DOI
10.1007/978-3-030-04221-9_42
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text representation is the keystone of many NLP tasks. For short-text representation learning, the traditional Bag-of-Words (BoW) model is often criticized for its sparseness and for neglecting semantic information. The Fuzzy Bag-of-Words (FBoW) and Fuzzy Bag-of-Words Cluster (FBoWC) models improve on BoW and can learn dense, meaningful document vectors. However, the word clusters in FBoWC are obtained with the K-means clustering algorithm, which is unstable and may produce incoherent word clusters if not initialized properly. In this paper, we propose the Fuzzy Bag-of-Topics (FBoT) model to learn short-text vectors. In FBoT, word communities, which are more coherent than the word clusters in FBoWC, serve as the basis terms of the text vector. Short-text classification experiments on two datasets show that FBoT achieves the highest classification accuracy among the compared models.
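To illustrate how a fuzzy bag-of-X vector over word groups differs from a sparse BoW vector, here is a minimal Python sketch. It assumes pre-trained word embeddings and already-computed word groups (clusters for FBoWC, communities for FBoT) are supplied as inputs, and it assumes a max-cosine-similarity membership function. The paper's exact membership function, similarity measure, and community-detection step are not stated in this abstract, and the names `cosine` and `fuzzy_bag_vector` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def fuzzy_bag_vector(
    doc: List[str],
    embeddings: Dict[str, Vector],
    groups: List[List[str]],
) -> Vector:
    """Hypothetical sketch: one dimension per word group (basis term).

    ASSUMPTION: a word's membership in a group is its maximum cosine
    similarity to any group member; the paper may define it differently.
    Words without an embedding are skipped.
    """
    vec: Vector = []
    for group in groups:
        members = [embeddings[w] for w in group if w in embeddings]
        total = 0.0
        for word in doc:
            if word not in embeddings or not members:
                continue
            # Every word contributes a graded score to every dimension.
            total += max(cosine(embeddings[word], m) for m in members)
        vec.append(total)
    return vec


if __name__ == "__main__":
    # Toy 2-D embeddings and two hand-made word groups (illustrative only).
    emb = {
        "cat": [1.0, 0.0],
        "dog": [0.8, 0.6],
        "stock": [0.0, 1.0],
        "market": [0.6, 0.8],
    }
    groups = [["cat", "dog"], ["stock", "market"]]
    print(fuzzy_bag_vector(["cat", "market"], emb, groups))  # approx. [1.96, 1.6]
```

Because each word adds a graded score to every dimension rather than a single count to its own dimension, the resulting vector is dense even for a two-word text; swapping K-means clusters for word communities changes only the `groups` input, not the vector construction.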
Pages: 473-482
Page count: 10