Fuzzy Bag-of-Topics Model for Short Text Representation

Times cited: 0
Authors
Jia, Hao [1]
Li, Qing [1]
Affiliation
[1] Shanghai Univ, Sch Comp Engn & Sci, Shanghai, Peoples R China
Keywords
Short text; Representation learning; Word communities
DOI
10.1007/978-3-030-04221-9_42
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text representation is a keystone of many NLP tasks. For short text representation learning, the traditional Bag-of-Words (BoW) model is often criticized for its sparseness and its neglect of semantic information. The Fuzzy Bag-of-Words (FBoW) and Fuzzy Bag-of-Words Cluster (FBoWC) models are improved variants of BoW that can learn dense and meaningful document vectors. However, the word clusters in the FBoWC model are obtained with the K-means clustering algorithm, which is unstable and may produce incoherent word clusters if it is not initialized properly. In this paper, we propose the Fuzzy Bag-of-Topics (FBoT) model to learn short text vectors. In the FBoT model, word communities, which are more coherent than the word clusters in FBoWC, are used as the basis terms of the text vector. Experimental results of short text classification on two datasets show that FBoT achieves the highest classification accuracy among the compared models.
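The abstract describes the fuzzy "bag of basis terms" construction only at a high level. The Python sketch below illustrates the general idea under stated assumptions, and is not the authors' implementation: word embeddings are given as a plain dict, word "communities" are approximated by connected components of a thresholded cosine-similarity graph (a simple stand-in; the abstract does not name the community detection algorithm FBoT uses), and a word's membership degree in a community is assumed to be its mean cosine similarity to the community's members. The function names `word_communities` and `fuzzy_bag_vector` and the `threshold` parameter are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def word_communities(embeddings, threshold):
    """Hypothetical stand-in for community detection: connect two words when
    their cosine similarity is >= threshold, then return the connected
    components of that graph as sorted word lists."""
    words = sorted(embeddings)
    adj = defaultdict(set)
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                adj[a].add(b)
                adj[b].add(a)
    seen, communities = set(), []
    for w in words:
        if w in seen:
            continue
        # Iterative depth-first search collects one component.
        stack, comp = [w], []
        seen.add(w)
        while stack:
            cur = stack.pop()
            comp.append(cur)
            for nb in adj[cur]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        communities.append(sorted(comp))
    return communities


def fuzzy_bag_vector(tokens, embeddings, basis):
    """Dense document vector with one dimension per basis term (community).
    Each in-vocabulary token adds its assumed membership degree, the mean
    cosine similarity to the community's words, to every dimension."""
    vec = [0.0] * len(basis)
    for tok in tokens:
        if tok not in embeddings:
            continue  # out-of-vocabulary tokens are skipped
        for j, comm in enumerate(basis):
            sims = [cosine(embeddings[tok], embeddings[w]) for w in comm]
            vec[j] += sum(sims) / len(sims)
    return vec


if __name__ == "__main__":
    # Toy 2-d embeddings, invented purely for illustration.
    emb = {"cat": [1.0, 0.1], "dog": [0.9, 0.2],
           "stock": [0.1, 1.0], "market": [0.2, 0.9]}
    comms = word_communities(emb, threshold=0.9)
    print(comms)  # [['cat', 'dog'], ['market', 'stock']]
    print(fuzzy_bag_vector(["cat", "sat"], emb, comms))
```

Unlike a BoW vector, whose length equals the vocabulary size and which is zero for every word absent from the text, this vector has one entry per community, and every entry receives some mass from each in-vocabulary token. Grouping by a fixed similarity threshold is also deterministic, which loosely mirrors the abstract's motivation for avoiding randomly initialized K-means clusters.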
Pages: 473-482
Page count: 10