Cross-Lingual Document Retrieval Using Regularized Wasserstein Distance

Cited by: 2
Authors
Balikas, Georgios [1]
Laclau, Charlotte [1]
Redko, Ievgen [2]
Amini, Massih-Reza [1]
Affiliations
[1] Univ Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France
[2] Univ Lyon, Univ Claude Bernard Lyon 1, INSA Lyon, F69XXX, UJM St Etienne, CNRS, Inserm, CREATIS UMR 5220, U1206, Lyon, France
DOI
10.1007/978-3-319-76941-7_30
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Many information retrieval algorithms rely on a good distance measure that allows objects of different natures to be compared efficiently. Recently, a promising new metric called Word Mover's Distance was proposed to measure the divergence between text passages. In this paper, we demonstrate that this metric can be extended to incorporate term-weighting schemes and, by using entropic regularization, to provide more accurate and computationally efficient matching between documents. We evaluate the benefits of both extensions on the task of cross-lingual document retrieval (CLDR). Our experimental results on eight CLDR problems suggest that the proposed methods achieve remarkable improvements in Mean Reciprocal Rank over several baselines.
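The abstract names two ingredients: term-weighted document histograms and an entropically regularized transport distance. The following is a minimal sketch of how these might fit together, not the authors' implementation. It assumes Sinkhorn iterations as the solver, Euclidean distance between word embeddings as the ground cost, and hypothetical toy values throughout (the function name `sinkhorn_distance`, the 2-D embeddings, the weights, and `reg` do not come from the paper).

```python
import numpy as np


def sinkhorn_distance(a, b, cost, reg=0.1, n_iters=200):
    """Entropy-regularized transport cost between histograms a and b.

    a, b : non-negative term weights, each summing to 1 (e.g. normalized tf-idf).
    cost : pairwise ground-cost matrix of shape (len(a), len(b)).
    reg  : regularization strength (an illustrative value, not from the paper).
    Returns the transport cost under the regularized plan and the plan itself.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cost = np.asarray(cost, dtype=float)
    K = np.exp(-cost / reg)  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)  # rescale to match column marginals
        u = a / (K @ v)    # rescale to match row marginals
    plan = u[:, None] * K * v[None, :]
    return float(np.sum(plan * cost)), plan


def pairwise_euclidean(X, Y):
    """Euclidean distances between every row of X and every row of Y."""
    return np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)


# Hypothetical toy data: a two-word query and two two-word candidates, with all
# words embedded in one shared 2-D space (standing in for cross-lingual embeddings).
query = np.array([[0.0, 0.0], [1.0, 0.0]])
near_doc = np.array([[0.0, 0.1], [1.0, 0.1]])
far_doc = np.array([[5.0, 5.0], [6.0, 5.0]])

# Assumed term weights (e.g. normalized idf); uniform weights would correspond
# to the unweighted variant.
a = np.array([0.7, 0.3])
b = np.array([0.7, 0.3])

near, plan_near = sinkhorn_distance(a, b, pairwise_euclidean(query, near_doc))
far, plan_far = sinkhorn_distance(a, b, pairwise_euclidean(query, far_doc))
print("near:", near, "far:", far)
```

In a retrieval setting, candidates would be ranked by increasing distance to the query, so `near_doc` would rank ahead of `far_doc` here. Smaller values of `reg` bring the result closer to the unregularized distance but can require more iterations and more numerical care.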
Pages: 398-410
Page count: 13