Cross-lingual Similar Documents Retrieval Based on Co-occurrence Projection

Cited by: 0
Authors
Liu, Jiao [1]
Cui, Rong-Yi [1]
Zhao, Ya-Hui [1]
Affiliations
[1] Yanbian Univ, Dept Comp Sci & Technol, Yanji, Jilin, Peoples R China
Keywords
cross-lingual; similar documents retrieval; word co-occurrence; latent semantic analysis
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
This paper studies an approach to computing the similarity between cross-lingual documents in Chinese, English, and Korean. First, each document is represented as a vector in the space of another language by co-occurrence projection. Then, latent semantic analysis is used to compensate for the information the projected vector loses because of polysemy across languages. Finally, the cross-lingual cosine similarity of documents is computed in a single language space that carries equivalent semantic information. The method avoids external dictionaries and knowledge bases by using a translation corpus to establish the lexical correspondence among Chinese, English, and Korean. The results show that co-occurrence projection is effective for computing cross-lingual document similarity, and that the retrieval accuracy for translations can reach 95%, which supports the effectiveness of the proposed method.
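The projection and similarity steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the co-occurrence definition (counting aligned translation pairs in which a source word and a target word both appear), the function names `build_cooccurrence`, `project`, `cosine`, and `retrieve`, and the toy corpus are all assumptions made for illustration. The latent semantic analysis step is left out, so this shows only projection followed by cosine ranking.

```python
import math
from collections import Counter, defaultdict


def build_cooccurrence(parallel_pairs):
    """For each source word, count the aligned pairs that also contain each target word.

    Assumed definition: document-level co-occurrence over a translation corpus.
    """
    cooc = defaultdict(Counter)
    for src_tokens, tgt_tokens in parallel_pairs:
        for s in set(src_tokens):
            for t in set(tgt_tokens):
                cooc[s][t] += 1
    return cooc


def project(src_tokens, cooc):
    """Map a source-language bag of words into the target-language term space."""
    projected = Counter()
    for s, freq in Counter(src_tokens).items():
        for t, count in cooc.get(s, {}).items():
            projected[t] += freq * count
    return projected


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dict-like objects."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query_tokens, candidates, cooc):
    """Return the index of the target-language candidate most similar to the query."""
    q = project(query_tokens, cooc)
    scores = [cosine(q, Counter(doc)) for doc in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


# Hypothetical toy translation corpus (Chinese -> English), already tokenized.
pairs = [
    (["猫", "睡觉"], ["cat", "sleeps"]),
    (["狗", "跑"], ["dog", "runs"]),
    (["猫", "跑"], ["cat", "runs"]),
]
cooc = build_cooccurrence(pairs)
candidates = [["cat", "sleeps"], ["dog", "runs"]]
best = retrieve(["猫", "睡觉"], candidates, cooc)
print(candidates[best])
```

In this toy setting the Chinese query is projected onto English terms and ranked against English candidates without any dictionary. A fuller version would apply a truncated singular value decomposition to the term-document matrix before the cosine step, as the abstract's use of latent semantic analysis suggests.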
Pages: 11-15
Number of pages: 5