Using EuroWordNet in a concept-based approach to cross-language text retrieval

Authors
Gonzalo, Julio [1 ,2 ]
Verdejo, Felisa [1 ]
Chugur, Irina [1 ]
Affiliations
[1] UNED, Ciudad Universitaria, Madrid, Spain
[2] Depto. Ing. Electrica, E., UNED, Ciudad Universitaria, s.n., 28040 Madrid, Spain
Keywords
Computational linguistics - Database systems - Errors - Indexing (of information) - Mathematical models - Natural language processing systems - Sensitivity analysis - Text processing
Abstract
We present an approach to cross-language text retrieval based on the EuroWordNet (EWN) multilingual semantic database. EuroWordNet is a multilingual, WordNet-like database that encodes basic semantic relations between words for several European languages (English, Dutch, Spanish, Italian, German, French, Czech, and Estonian). In addition to the relations in WordNet 1.5, EWN includes domain labels, cross-language relations, and cross-part-of-speech relations, all of which are directly useful for multilingual information retrieval. In our approach, documents in any language covered by EuroWordNet are indexed in a space of language-independent concepts (the EuroWordNet Inter-Lingual Index), which turns term weighting and query/document matching into language-independent tasks. We report the results of a number of experiments that measure the potential benefits of the approach and its tolerance to word sense disambiguation errors. In our monolingual experiments, the classical vector space model for text retrieval gives better results (up to 29% better) when WordNet synsets, rather than word forms, are chosen as the indexing space. This result is obtained on a manually disambiguated test collection derived from the Semcor annotated corpus. The sensitivity of retrieval performance to (automatic) disambiguation errors is also measured. Our preliminary bilingual experiments, also reported here, show that our approach can appreciably outperform a naive, dictionary-based translation of the query terms into the target language.
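The indexing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the lexicon below is a hypothetical toy table whose concept identifiers merely stand in for Inter-Lingual Index records, the tf-idf weighting is one common choice among many, and word sense disambiguation is skipped by giving every word exactly one concept. The names `TOY_LEXICON`, `to_concepts`, `build_index`, and `search` are assumptions introduced for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy lexicon: (language, word) -> concept identifier.
# The identifiers are invented and only stand in for Inter-Lingual Index records.
TOY_LEXICON = {
    ("en", "car"): "ili:vehicle",
    ("es", "coche"): "ili:vehicle",
    ("en", "engine"): "ili:motor",
    ("es", "motor"): "ili:motor",
    ("en", "bank"): "ili:financial_institution",
    ("es", "banco"): "ili:financial_institution",
    ("en", "loan"): "ili:loan",
    ("es", "préstamo"): "ili:loan",
}


def to_concepts(lang, tokens):
    """Map tokens to concept ids, skipping words missing from the toy lexicon."""
    return [TOY_LEXICON[(lang, t)] for t in tokens if (lang, t) in TOY_LEXICON]


def build_index(docs):
    """docs: {doc_id: (lang, tokens)} -> ({doc_id: {concept: weight}}, idf table)."""
    bags = {d: Counter(to_concepts(lang, toks)) for d, (lang, toks) in docs.items()}
    df = Counter(c for bag in bags.values() for c in bag)  # document frequency
    n = len(bags)
    # Smoothed idf; an assumption, not the weighting used in the paper.
    idf = {c: math.log((1 + n) / (1 + k)) + 1.0 for c, k in df.items()}
    index = {d: {c: tf * idf[c] for c, tf in bag.items()} for d, bag in bags.items()}
    return index, idf


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def search(query_lang, query_tokens, index, idf):
    """Rank documents against a query in any language covered by the lexicon."""
    q = Counter(to_concepts(query_lang, query_tokens))
    qvec = {c: tf * idf.get(c, 0.0) for c, tf in q.items()}
    scores = {d: cosine(qvec, vec) for d, vec in index.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


DOCS = {
    "d_en": ("en", ["car", "engine"]),
    "d_es": ("es", ["banco", "préstamo"]),
}
INDEX, IDF = build_index(DOCS)
RANKING = search("es", ["coche"], INDEX, IDF)  # Spanish query, mixed collection
```

Because both the query and the documents are mapped to concept identifiers before weighting, the Spanish query `coche` matches the English document about cars without any query translation step. A realistic system would need sense disambiguation at the `to_concepts` stage, which is where the error sensitivity discussed in the abstract would enter.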
Pages: 647-678