InfoCTM: A Mutual Information Maximization Perspective of Cross-Lingual Topic Modeling

Times cited: 0
Authors
Wu, Xiaobao [1]
Dong, Xinshuai [2]
Nguyen, Thong [3]
Liu, Chaoqun [1, 4]
Pan, Liang-Ming [3]
Luu, Anh Tuan [1]
Affiliations
[1] Nanyang Technological University, Singapore, Singapore
[2] Carnegie Mellon University, Pittsburgh, PA, USA
[3] National University of Singapore, Singapore, Singapore
[4] DAMO Academy, Alibaba Group, Singapore, Singapore
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cross-lingual topic models are widely used for cross-lingual text analysis because they reveal aligned latent topics. However, most existing methods suffer from two problems: they produce repetitive topics, which hinder further analysis, and their performance declines when the available dictionaries have low coverage. In this paper, we propose Cross-lingual Topic Modeling with Mutual Information (InfoCTM). Instead of the direct alignment used in previous work, we propose a topic alignment with mutual information method. It acts as a regularizer that properly aligns topics and prevents degenerate topic representations of words, which mitigates the repetitive topic issue. To address the low-coverage dictionary issue, we further propose a cross-lingual vocabulary linking method that finds more linked cross-lingual words for topic alignment beyond the translations in a given dictionary. Extensive experiments on English, Chinese, and Japanese datasets demonstrate that our method outperforms state-of-the-art baselines: it produces more coherent, diverse, and well-aligned topics and transfers better to cross-lingual classification tasks.
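The abstract does not give the exact form of the mutual information objective. As a rough illustration only, the following sketch shows a standard InfoNCE estimator, a common contrastive lower bound on mutual information, applied to linked cross-lingual word pairs. It assumes that each word's topic representation is a K-dimensional vector (for example, its column in a topic-word matrix) and that the linked pairs come from a dictionary or a vocabulary-linking step. The function names, the temperature value, and these representation choices are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def l2_normalize(m, axis=-1, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return m / (np.linalg.norm(m, axis=axis, keepdims=True) + eps)


def infonce_alignment_loss(word_topic_x, word_topic_y, links, temperature=0.2):
    """Hypothetical InfoNCE loss over linked cross-lingual word pairs.

    word_topic_x: (Vx, K) assumed topic representations of words in language X
    word_topic_y: (Vy, K) assumed topic representations of words in language Y
    links: list of (i, j) pairs; word i in X is linked to word j in Y
    """
    xi = np.array([i for i, _ in links])
    yj = np.array([j for _, j in links])
    u = l2_normalize(word_topic_x[xi])  # (N, K) anchor words in language X
    v = l2_normalize(word_topic_y[yj])  # (N, K) linked words in language Y
    logits = u @ v.T / temperature      # (N, N) similarity of every X word to every Y word
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Row n's linked partner sits in column n; the other columns act as negatives.
    return -np.mean(np.diag(log_probs))


# Toy example: four words per language with one-hot topic representations.
x_topics = np.eye(4)
y_topics = np.eye(4)
aligned = infonce_alignment_loss(x_topics, y_topics, [(0, 0), (1, 1), (2, 2), (3, 3)])
shuffled = infonce_alignment_loss(x_topics, y_topics, [(0, 1), (1, 2), (2, 3), (3, 0)])
print(aligned < shuffled)   # True
print(np.log(4) - aligned)  # InfoNCE lower bound on mutual information, close to log 4
```

Each linked pair is pulled together while the other linked words in the batch serve as negatives, so a word cannot score well by having a topic representation that resembles every other word's. This is one intuition for how a contrastive mutual information term can discourage collapsed, repetitive topic representations. With N linked pairs, log N minus the loss is the usual InfoNCE lower bound on mutual information.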
Pages: 13763-13771
Number of pages: 9
Related papers
50 in total
  • [1] Incorporating Word Embedding into Cross-lingual Topic Modeling
    Chang, Chia-Hsuan
    Hwang, San-Yih
    Xui, Tou-Hsiang
    2018 IEEE INTERNATIONAL CONGRESS ON BIG DATA (IEEE BIGDATA CONGRESS), 2018: 17-24
  • [2] Cross-Lingual Latent Topic Extraction
    Zhang, Duo
    Mei, Qiaozhu
    Zhai, ChengXiang
    ACL 2010: 48TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2010: 1128-1137
  • [3] Cross-lingual information retrieval model based on bilingual topic correlation
    Luo, Yuansheng
    Le, Zhongjian
    Wang, Mingwen
    Journal of Computational Information Systems, 2013, 9 (06): 2433-2440
  • [4] A word embedding-based approach to cross-lingual topic modeling
    Chang, Chia-Hsuan
    Hwang, San-Yih
    Knowledge and Information Systems, 2021, 63: 1529-1555
  • [5] A word embedding-based approach to cross-lingual topic modeling
    Chang, Chia-Hsuan
    Hwang, San-Yih
    KNOWLEDGE AND INFORMATION SYSTEMS, 2021, 63 (06): 1529-1555
  • [6] Cross-lingual embeddings with auxiliary topic models
    Zhou, Dong
    Peng, Xiaoya
    Li, Lin
    Han, Jun-mei
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 190
  • [7] Encoding Category Correlations into Bilingual Topic Modeling for Cross-Lingual Taxonomy Alignment
    Wu, Tianxing
    Zhang, Lei
    Qi, Guilin
    Cui, Xuan
    Xu, Kang
    SEMANTIC WEB - ISWC 2017, PT I, 2017, 10587: 728-744
  • [8] CROSS-LINGUAL TOPIC PREDICTION FOR SPEECH USING TRANSLATIONS
    Bansal, Sameer
    Kamper, Herman
    Lopez, Adam
    Goldwater, Sharon
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 8164-8168
  • [9] Monolingual and Cross-Lingual Knowledge Transfer for Topic Classification
    Karpov, D.
    Burtsev, M.
    Journal of Mathematical Sciences, 2024, 285 (1): 36-48
  • [10] Semantic Cross-Lingual Information Retrieval
    Pourmahmoud, Solmaz
    Shamsfard, Mehrnoush
    23RD INTERNATIONAL SYMPOSIUM ON COMPUTER AND INFORMATION SCIENCES, 2008: 80-+