Using EuroWordNet in a concept-based approach to cross-language text retrieval

Authors
Gonzalo, Julio [1, 2]
Verdejo, Felisa [1]
Chugur, Irina [1]
Affiliations
[1] UNED, Ciudad Universitaria, Madrid, Spain
[2] Depto. Ing. Electrica, E., UNED, Ciudad Universitaria, s.n., 28040 Madrid, Spain
Keywords
Computational linguistics - Database systems - Errors - Indexing (of information) - Mathematical models - Natural language processing systems - Sensitivity analysis - Text processing
DOI
Not available
Abstract
We present an approach to cross-language text retrieval based on the EuroWordNet (EWN) multilingual semantic database. EuroWordNet is a multilingual, WordNet-like database with basic semantic relations between words for several European languages (English, Dutch, Spanish, Italian, German, French, Czech, and Estonian). In addition to the relations in WordNet 1.5, EWN includes domain labels, cross-language relations, and cross-part-of-speech relations, which are directly useful for multilingual information retrieval. In our approach, documents in any language covered by EuroWordNet are indexed in a space of language-independent concepts (the EuroWordNet Inter Lingual Index), thus turning term weighting and query/document matching into language-independent tasks. We report on the results of a number of experiments that measure the potential benefits of the approach and its tolerance to word sense disambiguation errors. In our monolingual experiments, the classical vector space model for text retrieval is shown to give better results (up to 29% better) when WordNet synsets, rather than word forms, are chosen as the indexing space. This result is obtained for a manually disambiguated test collection derived from the Semcor annotated corpus. The sensitivity of retrieval performance to (automatic) disambiguation errors is also measured. Our preliminary bilingual experiments, also reported here, show that our approach can appreciably outperform a naive dictionary-based translation of the query terms into the target language.
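The idea of indexing in a language-independent concept space can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon `TOY_ILI`, its concept identifiers, and the helper names are hypothetical, word sense disambiguation is assumed to have already happened, and a plain TF-IDF weighting with cosine similarity stands in for whatever weighting the paper uses.

```python
import math
from collections import Counter

# Hypothetical toy lexicon: (language, word) -> made-up concept id.
# Real EuroWordNet data and a disambiguation step are NOT modeled here.
TOY_ILI = {
    ("en", "car"): "ili:car",
    ("en", "automobile"): "ili:car",
    ("es", "coche"): "ili:car",
    ("en", "engine"): "ili:engine",
    ("es", "motor"): "ili:engine",
    ("en", "river"): "ili:river",
    ("es", "río"): "ili:river",
    ("en", "bank"): "ili:riverbank",  # pretend the sense was already chosen
}


def to_concepts(text, lang):
    """Map each known word to its concept id; unknown words are dropped."""
    return [TOY_ILI[(lang, w)] for w in text.lower().split() if (lang, w) in TOY_ILI]


def tfidf_vectors(concept_docs):
    """Build sparse TF-IDF vectors over concept ids (smoothed IDF)."""
    n = len(concept_docs)
    df = Counter(c for doc in concept_docs for c in set(doc))
    idf = {c: math.log((1 + n) / (1 + df[c])) + 1.0 for c in df}
    vectors = [{c: tf * idf[c] for c, tf in Counter(doc).items()} for doc in concept_docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, query_lang, docs):
    """Rank (text, lang) documents against a query in any covered language."""
    concept_docs = [to_concepts(text, lang) for text, lang in docs]
    vectors, idf = tfidf_vectors(concept_docs)
    q_counts = Counter(to_concepts(query, query_lang))
    q_vec = {c: tf * idf.get(c, 0.0) for c, tf in q_counts.items()}
    scores = [cosine(q_vec, v) for v in vectors]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores


docs = [("the automobile engine", "en"), ("the river bank", "en")]
order, scores = rank("coche motor", "es", docs)
```

Because the Spanish query and the English documents are both reduced to concept identifiers before weighting, the matching step never needs to know which language either side was written in.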
Pages: 647-678