Cross-lingual Similar Documents Retrieval Based on Co-occurrence Projection

Cited by: 0
Authors
Liu, Jiao [1]
Cui, Rong-Yi [1]
Zhao, Ya-Hui [1]
Affiliation
[1] Yanbian Univ, Dept Comp Sci & Technol, Yanji, Jilin, Peoples R China
Keywords
cross-lingual; similar documents retrieval; word co-occurrence; latent semantic analysis
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper studies an approach to calculating the similarity between cross-lingual documents in a multilingual collection covering Chinese, English, and Korean. First, each document is represented as a vector in the space of another language by co-occurrence projection. Then, latent semantic analysis is used to compensate for the information lost from the vector because of polysemy across languages. Finally, the cross-lingual cosine similarity of documents is calculated in a single language space that carries equivalent semantic information. The method avoids external dictionaries and knowledge bases by using a translation corpus to establish the lexical correspondence among Chinese, English, and Korean. The results show that co-occurrence projection is highly effective for calculating cross-lingual document similarity; moreover, the retrieval accuracy for translations can reach 95%, which verifies the effectiveness of the proposed method.
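The abstract gives no formulas, so the following Python sketch is only one plausible reading of the three steps (projection, latent semantic analysis, cosine similarity), not the authors' implementation. It assumes term-document matrices built from an aligned translation corpus, a co-occurrence matrix formed as their product, and a truncated SVD as the latent semantic step. All function names, the toy matrices, and the choice `k=2` are hypothetical.

```python
import numpy as np

def cooccurrence_matrix(src_td, tgt_td):
    """Count source/target term co-occurrence over aligned document pairs.

    src_td: (n_src_terms, n_docs), tgt_td: (n_tgt_terms, n_docs);
    column j of each matrix is assumed to describe the same document.
    """
    return src_td @ tgt_td.T

def project(doc_vec, cooc):
    """Map a source-language document vector into the target term space."""
    return cooc.T @ doc_vec

def lsa_basis(tgt_td, k):
    """Return the top-k left singular vectors of the target matrix."""
    u, _, _ = np.linalg.svd(tgt_td, full_matrices=False)
    return u[:, :k]

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def cross_lingual_similarity(src_doc, tgt_doc, cooc, basis):
    """Project the source document, then compare both in the latent space."""
    projected = project(src_doc, cooc)
    return cosine(basis.T @ projected, basis.T @ tgt_doc)

# Toy aligned corpus (assumed data): 3 terms per language, 3 document pairs.
src_td = np.diag([2.0, 1.0, 3.0])
tgt_td = np.diag([3.0, 2.0, 1.0])

cooc = cooccurrence_matrix(src_td, tgt_td)
basis = lsa_basis(tgt_td, k=2)

sim_same = cross_lingual_similarity(src_td[:, 0], tgt_td[:, 0], cooc, basis)
sim_other = cross_lingual_similarity(src_td[:, 0], tgt_td[:, 1], cooc, basis)
```

In this toy setup the first source document scores higher against its own translation than against an unrelated target document. Retrieval would then rank all target documents by this score.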
Pages: 11-15
Number of pages: 5