Intention-guided deep semi-supervised document clustering via metric learning

Cited by: 2
Authors
Li, Jingnan [1 ,2 ]
Lin, Chuan [1 ,2 ,3 ]
Huang, Ruizhang [1 ,2 ]
Qin, Yongbin [1 ,2 ]
Chen, Yanping [1 ,2 ]
Affiliations
[1] Guizhou Univ, State Key Lab Publ Big Data, Guiyang 550025, Peoples R China
[2] Guizhou Univ, Coll Comp Sci & Technol, Guiyang 550025, Peoples R China
[3] Guizhou Univ, Guiyang 550025, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Intention; Semi-supervised; Clustering; Metric learning; Networks
DOI
10.1016/j.jksuci.2022.12.010
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
An intention expresses a user's preference for how document structure should be divided. Intention-guided document structure division is an important task in text mining, and deep semi-supervised document clustering offers a promising solution for such personalized document clustering. However, traditional deep semi-supervised clustering models rely on a limited number of constraints, which is insufficient for intention-guided document clustering. Moreover, document representations normally carry various emphases that reflect different structural opinions. In this paper, we propose an intention-guided deep semi-supervised document clustering model, IGSC, which divides document structure based on a small amount of user-provided supervised information. IGSC designs a deep metric learning network to address these problems. The deep metric learner explores the user's global intention and outputs an intention matrix. The intention is learned from a small number of user-provided pairwise constraints and is used to guide representation learning. IGSC also uses the intention matrix to guide the clustering process, so that the clustering results best meet the user's intention. We compare IGSC with a number of document clustering models on four real-world text datasets: Reu-10k, BBC, ACM, and Abstract. The results show that IGSC clearly improves clustering performance and outperforms the best result of the benchmark models by 7% on average. The comparison with other models and the visualization results demonstrate that IGSC is effective.
© 2022 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
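The idea of learning a metric from pairwise constraints can be illustrated with a toy example. The sketch below is not the IGSC model: it swaps the deep metric learning network for a plain linear transform trained by gradient descent, and every name in it (`learn_metric`, `metric_dist2`, the `margin`, `lr`, and `steps` parameters, the four toy "documents") is a hypothetical choice made for illustration. It only shows how must-link and cannot-link pairs can shape a matrix that then changes which documents count as close.

```python
import numpy as np

def learn_metric(X, must_link, cannot_link, margin=2.0, lr=0.05, steps=100):
    """Learn a PSD matrix M = L.T @ L from pairwise constraints.

    Hypothetical helper: a linear stand-in for a deep metric learner.
    Must-link pairs are pulled together; cannot-link pairs are pushed
    apart until their squared distance reaches `margin`.
    """
    L = np.eye(X.shape[1])
    for _ in range(steps):
        grad = np.zeros_like(L)
        for i, j in must_link:
            d = X[i] - X[j]
            grad += 2.0 * L @ np.outer(d, d)      # gradient of ||L d||^2
        for i, j in cannot_link:
            d = X[i] - X[j]
            if margin - np.sum((L @ d) ** 2) > 0:  # hinge is still active
                grad -= 2.0 * L @ np.outer(d, d)
        L -= lr * grad
    return L.T @ L

def metric_dist2(M, a, b):
    """Squared distance between a and b under the learned matrix M."""
    d = a - b
    return float(d @ M @ d)

# Toy documents: feature 0 stands for "topic", feature 1 for "style".
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

# Assumed user intention: group by topic and ignore style.
must_link = [(0, 1), (2, 3)]
cannot_link = [(0, 2), (1, 3)]

M = learn_metric(X, must_link, cannot_link)
same_topic = metric_dist2(M, X[0], X[1])
diff_topic = metric_dist2(M, X[0], X[2])
print(same_topic, diff_topic)
```

Under the Euclidean metric, documents 0 and 1 are exactly as far apart as documents 0 and 2. After training on the constraints, the learned matrix shrinks the "style" direction and stretches the "topic" direction, so any distance-based clustering step run with `M` would favor the topic-based grouping the constraints express.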
Pages: 416-425
Page count: 10
Related papers
50 in total
  • [21] Deep Semi-supervised Metric Learning with Mixed Label Propagation
    Zhuang, Furen
    Moulin, Pierre
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023 : 3429 - 3438
  • [22] Semi-Supervised Distance Metric Learning for Collaborative Image Retrieval and Clustering
    Hoi, Steven C. H.
    Liu, Wei
    Chang, Shih-Fu
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2010, 6 (03)
  • [23] Effective semi-supervised document clustering via active learning with instance-level constraints
    Weizhong Zhao
    Qing He
    Huifang Ma
    Zhongzhi Shi
    Knowledge and Information Systems, 2012, 30 : 569 - 587
  • [24] Semi-supervised concept factorization for document clustering
    Lu, Mei
    Zhao, Xiang-Jun
    Zhang, Li
    Li, Fan-Zhang
    INFORMATION SCIENCES, 2016, 331 : 86 - 98
  • [25] Effective semi-supervised document clustering via active learning with instance-level constraints
    Zhao, Weizhong
    He, Qing
    Ma, Huifang
    Shi, Zhongzhi
    KNOWLEDGE AND INFORMATION SYSTEMS, 2012, 30 (03) : 569 - 587
  • [26] An active learning framework for semi-supervised document clustering with language modeling
    Huang, Ruizhang
    Lam, Wai
    DATA & KNOWLEDGE ENGINEERING, 2009, 68 (01) : 49 - 67
  • [27] Semi-supervised deep embedded clustering
    Ren, Yazhou
    Hu, Kangrong
    Dai, Xinyi
    Pan, Lili
    Hoi, Steven C. H.
    Xu, Zenglin
    NEUROCOMPUTING, 2019, 325 : 121 - 130
  • [28] Semi-supervised deep density clustering
    Xu, Xiao
    Hou, Haiwei
    Ding, Shifei
    APPLIED SOFT COMPUTING, 2023, 148
  • [29] Distributed Semi-Supervised Metric Learning
    Shen, Pengcheng
    Du, Xin
    Li, Chunguang
    IEEE ACCESS, 2016, 4 : 8558 - 8571
  • [30] Soft Semi-Supervised Deep Learning-Based Clustering
    Alzuhair, Mona Suliman
    Ben Ismail, Mohamed Maher
    Bchir, Ouiem
    APPLIED SCIENCES-BASEL, 2023, 13 (17)