Cross-lingual document clustering

Cited by: 0

Authors:

Wu, Ke [1]

Lu, Bao-Liang [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, 800 Dong Chuan Rd, Shanghai 200240, Peoples R China

Source:

ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS | 2007 / Vol. 4426

Funding:

National Natural Science Foundation of China

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

An ever-increasing number of Web-accessible documents are available in languages other than English, and managing these heterogeneous document collections poses a challenge. This paper proposes a novel model, called the domain alignment translation model, for cross-lingual document clustering. While most existing cross-lingual document clustering methods rely on an expensive machine translation system to bridge the gap between two languages, our model handles cross-lingual document clustering by learning a cross-lingual domain alignment model and a domain-specific term translation model collaboratively. Experimental results show that our method, C-TLS, using no resources other than a bilingual dictionary, achieves performance comparable to direct translation with a machine translation system such as the Google language tool. Our method is also more efficient.


Pages: 956 / +

Number of pages: 2

