Cross-lingual document clustering

被引:0
|
作者
Wu, Ke [1 ]
Lu, Bao-Liang [1 ]
机构
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, 800 Dong Chuan Rd, Shanghai 200240, Peoples R China
基金
中国国家自然科学基金;
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The ever-increasing numbers of Web-accessible documents are available in languages other than English. The management of these heterogeneous document collections has posed a challenge. This paper proposes a novel model, called a domain alignment translation model, to conduct cross-lingual document clustering. While most existing cross-lingual document clustering methods make use of an expensive machine translation system to fill the gap between two languages, our model aims to effectively handle the cross-lingual document clustering by learning a cross-lingual domain alignment model and a domain-specific term translation model in a collaborative way. Experimental results show our method, i.e. C-TLS, without any resources other than a bilingual dictionary can achieve comparable performance to the direct machine translation method via, a machine translation system, e.g. Google language tool. Also, our method is more efficient.
引用
收藏
页码:956 / +
页数:2
相关论文
共 50 条
  • [31] Cross-Lingual Word Embeddings
    Søgaard A.
    Vulić I.
    Ruder S.
    Faruqui M.
    Synthesis Lectures on Human Language Technologies, 2019, 12 (02): : 1 - 132
  • [32] Cross-lingual Continual Learning
    M'hamdi, Meryem
    Ren, Xiang
    May, Jonathan
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 3908 - 3943
  • [33] Cross-Lingual Phrase Retrieval
    Zheng, Heqi
    Zhang, Xiao
    Chi, Zewen
    Huang, Heyan
    Yan, Tan
    Lan, Tian
    Wei, Wei
    Mao, Xian-Ling
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 4193 - 4204
  • [34] Evaluation of a Cross-lingual Romanian-English Multi-document Summariser
    Orasan, Constantin
    Chiorean, Oana Andreea
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 2114 - 2119
  • [35] Cross-lingual timeline summarization
    Cagliero, Luca
    La Quatra, Moreno
    Garza, Paolo
    Baralis, Elena
    2021 IEEE FOURTH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND KNOWLEDGE ENGINEERING (AIKE 2021), 2021, : 45 - 53
  • [36] Cross-Lingual Sentiment Quantification
    Esuli, Andrea
    Moreo, Alejandro
    Sebastiani, Fabrizio
    IEEE INTELLIGENT SYSTEMS, 2020, 35 (03) : 106 - 113
  • [37] Cross-Lingual Word Embeddings
    Corro, Caio Filippo
    TRAITEMENT AUTOMATIQUE DES LANGUES, 2019, 60 (01): : 46 - 48
  • [38] Cross-Lingual Word Embeddings
    Agirre, Eneko
    COMPUTATIONAL LINGUISTICS, 2020, 46 (01) : 245 - 248
  • [39] Cross-lingual CCG Induction
    Evang, Kilian
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 1577 - 1587
  • [40] Cross-lingual and Multilingual CLIP
    Carlsson, Fredrik
    Eisen, Philipp
    Rekathati, Faton
    Sahlgren, Magnus
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6848 - 6854