Word Clustering Algorithms Based on Word Similarity

Cited by: 2
Authors
Yuan, Lichi [1]
Affiliation
[1] Jiangxi Univ Finance & Econ, Sch Informat Technol, Nanchang 330013, Peoples R China
Keywords
Word similarity; Word clustering; Statistical language model
DOI
10.1109/IHMSC.2015.36
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Category-based statistical language models are an important way to address data sparsity, but they have two bottlenecks: (1) word clustering: it is hard to find a clustering method that performs well without a large amount of computation; (2) class-based methods always lose some predictive ability when adapting to text from different domains. To address these problems, a definition of word similarity based on mutual information is presented. Building on word similarity, a definition of word set similarity is given. Experiments show that the similarity-based word clustering algorithm outperforms the conventional greedy clustering method in both speed and performance: perplexity is reduced from 283 to 218.
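As background for the abstract, the sketch below illustrates the standard class-based bigram factorization, P(w_i | w_{i-1}) = P(c_i | c_{i-1}) * P(w_i | c_i), and how perplexity is computed under it. It is not the paper's clustering algorithm or its similarity definition, which the abstract does not spell out. The function name `class_bigram_perplexity`, the toy corpus, and the hand-assigned classes are all hypothetical, and the estimate is unsmoothed maximum likelihood, so it assumes every test bigram's class pair and word were seen in training.

```python
import math
from collections import Counter


def class_bigram_perplexity(train, test, word2class):
    """Perplexity of `test` under an unsmoothed class-based bigram model.

    Uses P(w_i | w_{i-1}) = P(c_i | c_{i-1}) * P(w_i | c_i), where c is the
    class of a word according to `word2class` (a hypothetical, hand-made map).
    """
    word_count = Counter(train)
    class_count = Counter(word2class[w] for w in train)
    # Class-to-class transition counts taken from adjacent training words.
    class_bigram = Counter(
        (word2class[a], word2class[b]) for a, b in zip(train, train[1:])
    )
    # How often each class appears as the left element of a bigram.
    prev_class_count = Counter(word2class[w] for w in train[:-1])

    log_prob = 0.0
    n = 0
    for a, b in zip(test, test[1:]):
        ca, cb = word2class[a], word2class[b]
        p_class = class_bigram[(ca, cb)] / prev_class_count[ca]
        p_word = word_count[b] / class_count[cb]
        # No smoothing: an unseen event would make this log undefined.
        log_prob += math.log2(p_class * p_word)
        n += 1
    return 2 ** (-log_prob / n)


# Toy data (assumed for illustration only).
train = "the cat sat the dog sat".split()
word2class = {"the": "D", "cat": "N", "dog": "N", "sat": "V"}
test = "the dog sat".split()
print(class_bigram_perplexity(train, test, word2class))
```

Because "cat" and "dog" share a class, the counts for "the cat" and "the dog" are pooled when estimating the class transition, which is how class-based models reduce sparsity. The quality of the word-to-class assignment therefore directly affects perplexity.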
Pages: 21-24
Page count: 4