Cross-lingual Similar Documents Retrieval Based on Co-occurrence Projection

Authors
Liu, Jiao [1 ]
Cui, Rong-Yi [1 ]
Zhao, Ya-Hui [1 ]
Affiliations
[1] Yanbian Univ, Dept Comp Sci & Technol, Yanji, Jilin, Peoples R China
Keywords
cross-lingual; similar documents retrieval; word co-occurrence; latent semantic analysis
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper investigates an approach to computing the similarity between cross-lingual documents in a multilingual collection covering Chinese, English, and Korean. First, each document is represented as a vector in the space of another language by co-occurrence projection. Latent semantic analysis is then used to compensate for the information lost from the projected vector because of polysemy across languages. Finally, the cross-lingual cosine similarity of documents is calculated in a single language space carrying equivalent semantic information. External dictionaries and knowledge bases are avoided by using a translation corpus to establish the lexical correspondence among Chinese, English, and Korean. The results show that co-occurrence projection is effective for computing cross-lingual document similarity; the retrieval accuracy for translations can reach 95%, which verifies the effectiveness of the proposed method.
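The three steps named in the abstract (projection, latent semantic analysis, cosine similarity) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the co-occurrence matrix is taken as the product of two term-document matrices built from an aligned translation corpus, rows are normalised to sum to one, LSA is a truncated SVD of the target-language matrix, and the function names (`cooccurrence_weights`, `project`, `lsa_basis`, `retrieve`) and the toy data are hypothetical rather than the authors' implementation.

```python
import numpy as np


def cooccurrence_weights(src_td, tgt_td):
    """Row-normalised source-term x target-term co-occurrence weights.

    src_td: (n_src_terms, n_docs) term-document matrix, source language.
    tgt_td: (n_tgt_terms, n_docs) term-document matrix, target language.
    Column j of both matrices is assumed to describe the same translated document.
    """
    cooc = src_td.astype(float) @ tgt_td.astype(float).T
    row_sums = cooc.sum(axis=1, keepdims=True)
    # Leave all-zero rows as zeros instead of dividing by zero.
    return np.divide(cooc, row_sums, out=np.zeros_like(cooc), where=row_sums > 0)


def project(doc_vec, weights):
    """Map a source-language document vector into the target term space."""
    return doc_vec.astype(float) @ weights


def lsa_basis(tgt_td, k):
    """Top-k left singular vectors of the target term-document matrix."""
    u, _, _ = np.linalg.svd(tgt_td.astype(float), full_matrices=False)
    return u[:, :k]


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def retrieve(query_vec, src_td, tgt_td, k):
    """Return target document indices, most similar first."""
    weights = cooccurrence_weights(src_td, tgt_td)
    basis = lsa_basis(tgt_td, k)
    query_latent = project(query_vec, weights) @ basis
    docs_latent = tgt_td.astype(float).T @ basis  # one row per target document
    scores = np.array([cosine(query_latent, d) for d in docs_latent])
    return np.argsort(-scores)


# Toy aligned corpus: 4 source terms, 4 target terms, 3 translated document pairs.
src_td = np.array([[2, 0, 0],
                   [1, 0, 0],
                   [0, 2, 0],
                   [0, 0, 2]])
tgt_td = np.array([[2, 0, 0],
                   [0, 2, 0],
                   [0, 1, 0],
                   [0, 0, 2]])

ranking = retrieve(src_td[:, 0], src_td, tgt_td, k=3)
best_match = int(ranking[0])
```

In this toy setting the latent dimension `k` equals the rank of the target matrix, so the SVD step does not discard anything; on a real corpus a smaller `k` would be chosen, and its value is not stated in the abstract.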
Pages: 11-15
Page count: 5