InfoCTM: A Mutual Information Maximization Perspective of Cross-Lingual Topic Modeling

Cited by: 0
Authors
Wu, Xiaobao [1 ]
Dong, Xinshuai [2 ]
Nguyen, Thong [3 ]
Liu, Chaoqun [1 ,4 ]
Pan, Liang-Ming [3 ]
Luu, Anh Tuan [1 ]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Carnegie Mellon Univ, Pittsburgh, PA USA
[3] Natl Univ Singapore, Singapore, Singapore
[4] DAMO Acad, Alibaba Grp, Singapore, Singapore
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cross-lingual topic models are widely used for cross-lingual text analysis because they reveal aligned latent topics. However, most existing methods suffer from two problems: they produce repetitive topics that hinder further analysis, and their performance declines when the bilingual dictionaries they rely on have low coverage. In this paper, we propose Cross-lingual Topic Modeling with Mutual Information (InfoCTM). Instead of the direct alignment used in previous work, we propose a topic alignment with mutual information method. This works as a regularization that properly aligns topics and prevents degenerate topic representations of words, which mitigates the repetitive topic issue. To address the low-coverage dictionary issue, we further propose a cross-lingual vocabulary linking method that finds more linked cross-lingual words for topic alignment beyond the translations in a given dictionary. Extensive experiments on English, Chinese, and Japanese datasets demonstrate that our method outperforms state-of-the-art baselines, producing more coherent, diverse, and well-aligned topics and showing better transferability for cross-lingual classification tasks.
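The abstract describes aligning topics by maximizing mutual information between linked cross-lingual words. The following is a minimal sketch, assuming an InfoNCE-style contrastive estimator (a standard lower bound on mutual information; the abstract does not name the estimator) and assuming each word is represented by a vector of topic weights. Linked word pairs act as positives and the other pairs in the batch act as negatives. The helper names `cosine`, `info_nce`, and `mi_lower_bound`, the use of cosine similarity, and the temperature of 0.2 are hypothetical illustrative choices, not the paper's exact objective.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two topic-weight vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchors: List[Vector], positives: List[Vector], temperature: float = 0.2) -> float:
    """Average InfoNCE loss over a batch of linked word pairs (hypothetical helper).

    anchors[i] and positives[i] are the topic representations of one linked
    pair (e.g. an English word and its Chinese translation); every
    positives[j] with j != i serves as a negative for anchors[i].
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / n


def mi_lower_bound(anchors: List[Vector], positives: List[Vector], temperature: float = 0.2) -> float:
    """InfoNCE lower bound on mutual information: I(X; Y) >= log(n) - loss."""
    return math.log(len(anchors)) - info_nce(anchors, positives, temperature)


# Toy topic representations over 3 topics for 3 linked English/Chinese word pairs.
en = [[0.9, 0.1, 0.0], [0.0, 0.8, 0.2], [0.1, 0.0, 0.9]]
zh = [[0.85, 0.1, 0.05], [0.05, 0.9, 0.05], [0.0, 0.1, 0.9]]
zh_shuffled = [zh[1], zh[2], zh[0]]  # deliberately mislinked pairs

print(info_nce(en, zh), info_nce(en, zh_shuffled))
```

Correctly linked pairs give a much lower loss (and therefore a higher mutual information bound) than the mislinked ones. Minimizing such a loss pulls the topic representations of linked words together while pushing unrelated words apart, which is one plausible way a mutual information objective can act as the regularizer the abstract describes.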
Pages: 13763-13771
Number of pages: 9
Related papers
50 in total
  • [41] Cross-lingual latent semantic analysis for language modeling
    Kim, W
    Khudanpur, S
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, pp. 257-260
  • [42] Modeling Language Discrepancy for Cross-Lingual Sentiment Analysis
    Chen, Qiang
    Li, Chenliang
    Li, Wenjie
    CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017, pp. 117-126
  • [43] Cross-lingual lexical triggers in statistical language modeling
    Kim, W
    Khudanpur, S
    PROCEEDINGS OF THE 2003 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, 2003, pp. 17-24
  • [44] An Improvement in Statistical Machine Translation in Perspective of Hindi-English Cross-Lingual Information Retrieval
    Sharma, Vijay Kumar
    Mittal, Namita
    COMPUTACION Y SISTEMAS, 2018, 22(04), pp. 1277-1285
  • [45] What Is in a <unittitle>? Cross-lingual Topic Detection & Information Retrieval in Archives Portal Europe
    Musso, Marta
    Arnold, Kerstin
    Nanni, Federico
    Cannelli, Beatrice
    ACM JOURNAL ON COMPUTING AND CULTURAL HERITAGE, 2024, 17(02)
  • [46] Cross-Lingual Topic Discovery From Multilingual Search Engine Query Log
    Jiang, Di
    Tong, Yongxin
    Song, Yuanfeng
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2016, 35(02)
  • [47] Labeled Bilingual Topic Model for Cross-Lingual Text Classification and Label Recommendation
    Tian, Ming-Jie
    Huang, Zheng-Hao
    Cui, Rong-Yi
    2018 5TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING (ICISCE 2018), 2018, pp. 285-289
  • [48] Mongolian-Chinese Cross-lingual Topic Detection Based on Knowledge Distillation
    Wang, Yanli
    Ji, Yatu
    Sun, Baolei
    Ren, Qing-Dao-Er-Ji
    Wu, Nier
    Liu, Na
    Lu, Min
    Zhao, Chen
    Jia, Yepai
    2024 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, IALP 2024, 2024, pp. 383-388
  • [49] Realistic Zero-Shot Cross-Lingual Transfer in Legal Topic Classification
    Xenouleas, Stratos
    Tsoukara, Alexia
    Panagiotakis, Giannis
    Chalkidis, Ilias
    Androutsopoulos, Ion
    PROCEEDINGS OF THE 12TH HELLENIC CONFERENCE ON ARTIFICIAL INTELLIGENCE, SETN 2022, 2022
  • [50] Coarse Alignment of Topic and Sentiment: A Unified Model for Cross-Lingual Sentiment Classification
    Wang, Deqing
    Jing, Baoyu
    Lu, Chenwei
    Wu, Junjie
    Liu, Guannan
    Du, Chenguang
    Zhuang, Fuzhen
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32(02), pp. 736-747