Cross-lingual document clustering

Cited by: 0
Authors
Wu, Ke [1]
Lu, Bao-Liang [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, 800 Dong Chuan Rd, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
An ever-increasing number of Web-accessible documents are available in languages other than English, and managing these heterogeneous document collections poses a challenge. This paper proposes a novel model, called a domain alignment translation model, for cross-lingual document clustering. While most existing cross-lingual document clustering methods rely on an expensive machine translation system to bridge the gap between two languages, our model handles cross-lingual document clustering by learning a cross-lingual domain alignment model and a domain-specific term translation model in a collaborative way. Experimental results show that our method, C-TLS, using no resources other than a bilingual dictionary, achieves performance comparable to the direct machine translation method, which relies on a machine translation system such as the Google language tool. Our method is also more efficient.
Pages: 956 / +
Page count: 2