Cross-lingual document clustering

Authors
Wu, Ke [1 ]
Lu, Bao-Liang [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, 800 Dong Chuan Rd, Shanghai 200240, Peoples R China
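Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC number
TP18 [Artificial intelligence theory];
Discipline classification codes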
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
An ever-increasing number of Web-accessible documents are available in languages other than English, and managing these heterogeneous document collections poses a challenge. This paper proposes a novel model, called a domain alignment translation model, for cross-lingual document clustering. While most existing cross-lingual document clustering methods rely on an expensive machine translation system to bridge the gap between two languages, our model handles cross-lingual document clustering by learning a cross-lingual domain alignment model and a domain-specific term translation model collaboratively. Experimental results show that our method, C-TLS, using no resources other than a bilingual dictionary, achieves performance comparable to the direct machine translation method, which relies on a machine translation system such as the Google language tool. Our method is also more efficient.
Pages: 956 / +
Page count: 2