Cross-lingual Similar Documents Retrieval Based on Co-occurrence Projection

Times cited: 0
Authors
Liu, Jiao [1]
Cui, Rong-Yi [1]
Zhao, Ya-Hui [1]
Affiliation
[1] Yanbian University, Department of Computer Science and Technology, Yanji, Jilin, People's Republic of China
Keywords
cross-lingual; similar documents retrieval; word co-occurrence; latent semantic analysis
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
This paper studies an approach to calculating the similarity between cross-lingual documents in a multilingual collection containing Chinese, English, and Korean documents. First, each document is represented as a vector in the space of another language by co-occurrence projection. Then, latent semantic analysis is applied to compensate for the vector information lost because of polysemy across languages. Finally, the cross-lingual cosine similarity of the documents is calculated in a common language space that carries equivalent semantic information. External dictionaries and knowledge bases are avoided by using a translation corpus to establish lexical correspondences among Chinese, English, and Korean. The results show that co-occurrence projection is highly effective for calculating cross-lingual document similarity; moreover, the retrieval accuracy for translated documents reaches 95%, which verifies the effectiveness of the proposed method.
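The abstract outlines a three-step pipeline: project a document into another language's vocabulary space through word co-occurrence in a translation corpus, move into a latent semantic space, and compare with cosine similarity. The sketch below is one plausible reading under stated assumptions, not the authors' implementation: it assumes co-occurrence is counted per aligned sentence pair, that each source word spreads its frequency over target words by row-normalized co-occurrence counts, and that LSA is a truncated SVD of the target-language term-document matrix (computed here by power iteration to stay dependency-free). All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def cooccurrence(parallel_pairs):
    """Count, for each source word, how many aligned pairs it shares with each target word."""
    counts = {}
    for src_tokens, tgt_tokens in parallel_pairs:
        for s in set(src_tokens):
            row = counts.setdefault(s, Counter())
            for t in set(tgt_tokens):
                row[t] += 1
    return counts

def project(doc_tokens, counts, tgt_vocab):
    """Co-occurrence projection of a source document into the target vocabulary space.

    Assumed weighting: each source word spreads its frequency over target words
    in proportion to its row-normalized co-occurrence counts."""
    index = {t: i for i, t in enumerate(tgt_vocab)}
    vec = [0.0] * len(tgt_vocab)
    for s, freq in Counter(doc_tokens).items():
        row = counts.get(s)
        if not row:
            continue  # word never seen in the parallel corpus
        total = sum(row.values())
        for t, c in row.items():
            if t in index:
                vec[index[t]] += freq * c / total
    return vec

def tf_vector(doc_tokens, vocab):
    """Raw term-frequency vector of a document over a fixed vocabulary."""
    tf = Counter(doc_tokens)
    return [float(tf[t]) for t in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def lsa_basis(doc_vectors, k, iters=300):
    """Top-k left singular vectors of the term-document matrix A (documents as columns),
    found by power iteration with deflation on A A^T. Assumes k <= rank(A)."""
    n = len(doc_vectors[0])
    m = [[sum(d[i] * d[j] for d in doc_vectors) for j in range(n)] for i in range(n)]
    basis = []
    for _ in range(k):
        u = [1.0 / (i + 1) for i in range(n)]  # deterministic, non-degenerate start
        s = math.sqrt(sum(x * x for x in u))
        u = [x / s for x in u]
        for _ in range(iters):
            w = [sum(m[i][j] * u[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            u = [x / norm for x in w]
        lam = sum(u[i] * m[i][j] * u[j] for i in range(n) for j in range(n))
        basis.append(u)
        # Remove the found direction so the next pass converges to the next one.
        m = [[m[i][j] - lam * u[i] * u[j] for j in range(n)] for i in range(n)]
    return basis

def fold_in(vec, basis):
    """Coordinates of a term vector in the k-dimensional latent space (U_k^T q)."""
    return [sum(b * x for b, x in zip(row, vec)) for row in basis]

# Toy usage with made-up data: a Chinese query against two English documents.
parallel = [
    (["猫", "吃", "鱼"], ["cat", "eats", "fish"]),
    (["狗", "吃", "肉"], ["dog", "eats", "meat"]),
    (["猫", "睡觉"], ["cat", "sleeps"]),
    (["狗", "跑"], ["dog", "runs"]),
]
targets = [["cat", "eats", "fish", "cat", "sleeps"], ["dog", "runs", "dog", "eats", "meat"]]
vocab = sorted({t for doc in targets for t in doc})
basis = lsa_basis([tf_vector(d, vocab) for d in targets], k=2)
query = fold_in(project(["猫", "吃", "鱼"], cooccurrence(parallel), vocab), basis)
scores = [cosine(query, fold_in(tf_vector(d, vocab), basis)) for d in targets]
print(scores)  # the cat/fish document scores higher than the dog/meat one
```

In this toy setup the target documents lie in the span of the LSA basis, so folding in preserves their inner products with the query. With k smaller than the rank of the term-document matrix, the latent space additionally merges words that co-occur across documents, which is the general mechanism LSA uses to soften polysemy and synonymy effects.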
Pages: 11-15
Number of pages: 5
Related papers
50 in total
  • [1] Context-based generic cross-lingual retrieval of documents and automated summaries
    Lam, W
    Chan, K
    Radev, D
    Saggion, H
    Teufel, S
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2005, 56 (02): : 129 - 139
  • [2] A Learning to rank framework based on cross-lingual loss function for cross-lingual information retrieval
    Ghanbari, Elham
    Shakery, Azadeh
    APPLIED INTELLIGENCE, 2022, 52 (03) : 3156 - 3174
  • [3] Cross-Lingual Phrase Retrieval
    Zheng, Heqi
    Zhang, Xiao
    Chi, Zewen
    Huang, Heyan
    Yan, Tan
    Lan, Tian
    Wei, Wei
    Mao, Xian-Ling
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 4193 - 4204
  • [4] A Learning to rank framework based on cross-lingual loss function for cross-lingual information retrieval
    Elham Ghanbari
    Azadeh Shakery
    Applied Intelligence, 2022, 52 : 3156 - 3174
  • [5] A Sense Based Similarity Measure for Cross-Lingual Documents
    Huang, Hsun-Hui
    Yang, Horng-Chang
    Kuo, Yau-Hwang
    ISDA 2008: EIGHTH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, VOL 1, PROCEEDINGS, 2008, : 9 - +
  • [6] Using query-relevant documents pairs for cross-lingual information retrieval
    Pinto, David
    Juan, Alfons
    Rosso, Paolo
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2007, 4629 : 630 - 637
  • [7] Semantic Cross-Lingual Information Retrieval
    Pourmahmoud, Solmaz
    Shamsfard, Mehrnoush
    23RD INTERNATIONAL SYMPOSIUM ON COMPUTER AND INFORMATION SCIENCES, 2008, : 80 - +
  • [8] Cross-lingual projection for class-based language models
    Gfeller, Beat
    Schogol, Vlad
    Hall, Keith
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2016), VOL 2, 2016, : 83 - 88
  • [9] Translating Justice: A Cross-Lingual Information Retrieval System for Maltese Case Law Documents
    Azzopardi, Joel
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT V, 2024, 14612 : 236 - 240
  • [10] Target Language Monolingual Translation Memory based NMT by Cross-lingual Retrieval of Similar Translations and Reranking
    Tamura, Takuya
    Wang, Xiaotian
    Utsuro, Takehito
    Nagata, Masaaki
    MT Summit 2023 - Proceedings of 19th Machine Translation Summit, 2023, 1 : 313 - 323