Fuzzy Bag-of-Topics Model for Short Text Representation

Authors
Jia, Hao [1]
Li, Qing [1]
Affiliation
[1] Shanghai University, School of Computer Engineering and Science, Shanghai, People's Republic of China
Keywords
Short text; Representation learning; Word communities
DOI
10.1007/978-3-030-04221-9_42
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text representation is a cornerstone of many NLP tasks. For short text representation learning, the traditional Bag-of-Words (BoW) model is often criticized for its sparseness and for neglecting semantic information. The Fuzzy Bag-of-Words (FBoW) and Fuzzy Bag-of-Words Cluster (FBoWC) models improve on BoW and can learn dense, meaningful document vectors. However, the word clusters in the FBoWC model are obtained with the K-means clustering algorithm, which is unstable and may produce incoherent word clusters if it is not initialized properly. In this paper, we propose the Fuzzy Bag-of-Topics (FBoT) model for learning short text vectors. In the FBoT model, word communities, which are more coherent than the word clusters in FBoWC, serve as the basis terms of the text vector. Experimental results on short text classification over two datasets show that FBoT achieves the highest classification accuracy.
Pages: 473-482
Page count: 10
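
Illustrative note: the abstract describes text vectors whose basis terms are word communities rather than single words or K-means clusters. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes that each community is summarized by the mean of its members' unit-length embeddings and that a word's fuzzy membership in a community is its cosine similarity clipped at zero; both choices, the function names `unit` and `fuzzy_text_vector`, and the toy 2-d embeddings are hypothetical. How the communities themselves are detected is not shown here.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def fuzzy_text_vector(tokens, embeddings, communities):
    """Map a token list to a dense vector with one dimension per word community."""
    # Assumption: a community is summarized by the mean of its members' unit vectors.
    centroids = np.stack([
        unit(np.mean([unit(embeddings[w]) for w in members], axis=0))
        for members in communities
    ])
    vec = np.zeros(len(communities))
    for tok in tokens:
        if tok not in embeddings:
            continue  # out-of-vocabulary tokens contribute nothing
        sims = centroids @ unit(embeddings[tok])  # cosine similarity to each community
        vec += np.maximum(sims, 0.0)  # assumed fuzzy membership: clip negatives to 0
    return vec

# Toy 2-d embeddings, invented for illustration only.
embeddings = {
    "cat": np.array([1.0, 0.1]),
    "dog": np.array([0.9, 0.2]),
    "stock": np.array([0.1, 1.0]),
    "bond": np.array([0.2, 0.9]),
}
communities = [["cat", "dog"], ["stock", "bond"]]

doc_vec = fuzzy_text_vector(["cat", "dog", "unknown"], embeddings, communities)
print(doc_vec)
```

In this toy setup the first component, corresponding to the animal community, comes out larger than the second, and every component is non-negative, so even a very short text receives a dense vector over all communities.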