Using EuroWordNet in a concept-based approach to cross-language text retrieval

Cited by: 0

Authors:

Gonzalo, Julio [1,2]

Verdejo, Felisa [1]

Chugur, Irina [1]

Affiliations:

[1] UNED, Ciudad Universitaria, Madrid, Spain

[2] Depto. Ing. Electrica, E., UNED, Ciudad Universitaria, s.n., 28040 Madrid, Spain

Source:

Applied Artificial Intelligence, Vol. 13, No. 7

Keywords:

Computational linguistics - Database systems - Errors - Indexing (of information) - Mathematical models - Natural language processing systems - Sensitivity analysis - Text processing

DOI:

Not available

Abstract:

We present an approach to cross-language text retrieval based on the EuroWordNet (EWN) multilingual semantic database. EuroWordNet is a multilingual, WordNet-like database with basic semantic relations between words for several European languages (English, Dutch, Spanish, Italian, German, French, Czech, and Estonian). In addition to the relations in WordNet 1.5, EWN includes domain labels, cross-language relations, and cross-part-of-speech relations, all of which are directly useful for multilingual information retrieval. In our approach, documents in any language covered by EuroWordNet are indexed in a space of language-independent concepts (the EuroWordNet Inter-Lingual-Index, ILI), which turns term weighting and query/document matching into language-independent tasks. We report the results of several experiments that measure the potential benefits of the approach and its tolerance to word sense disambiguation errors. In our monolingual experiments, the classical vector space model for text retrieval is shown to give better results (up to 29% better in our experiments) when WordNet synsets, rather than word forms, are used as the indexing space. This result is obtained on a manually disambiguated test collection derived from the SemCor annotated corpus. We also measure the sensitivity of retrieval performance to (automatic) disambiguation errors. Our preliminary bilingual experiments, also reported here, show that our approach can noticeably outperform a naive dictionary-based translation of the query terms into the target language.
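The core idea above, indexing every document in a shared space of ILI concepts instead of word forms, can be illustrated with a toy sketch. Everything here is an assumption for illustration: the lexicon and the `ili-*` concept ids are invented (they are not real EuroWordNet records), tokens are assumed to arrive already lemmatized and disambiguated, weighting is plain tf x log(N/df) with cosine similarity (the exact weighting used in the experiments is not stated here), and `to_concepts` injects WSD errors with a simple uniform-wrong-sense model that is not necessarily the procedure the authors used.

```python
import math
import random
from collections import Counter

# Hypothetical toy lexicon: (language, lemma) -> candidate ILI concept ids.
# These ids are invented for illustration; they are not real EuroWordNet ILI records.
LEXICON = {
    ("en", "bank"): ["ili-bank-finance", "ili-bank-river"],
    ("en", "money"): ["ili-money"],
    ("en", "river"): ["ili-river"],
    ("es", "banco"): ["ili-bank-finance", "ili-bench"],
    ("es", "dinero"): ["ili-money"],
}


def to_concepts(lang, tagged, wsd_error_rate=0.0, rng=None):
    """Map (lemma, gold_sense) pairs to ILI ids, optionally simulating WSD errors.

    Assumed error model: with probability wsd_error_rate, a token whose lemma
    has other candidate senses gets a wrong sense chosen uniformly among them.
    """
    rng = rng or random.Random(0)
    concepts = []
    for lemma, gold in tagged:
        wrong = [c for c in LEXICON.get((lang, lemma), []) if c != gold]
        if wrong and rng.random() < wsd_error_rate:
            concepts.append(rng.choice(wrong))
        else:
            concepts.append(gold)
    return concepts


def tfidf(terms, df, n_docs):
    """Weight each term by raw frequency times log(N / df); unseen terms are dropped."""
    tf = Counter(terms)
    return {t: f * math.log(n_docs / df[t]) for t, f in tf.items() if df.get(t)}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query_terms, doc_terms):
    """Rank documents against a query in whatever index space the terms live in."""
    n = len(doc_terms)
    df = Counter(t for terms in doc_terms.values() for t in set(terms))
    doc_vecs = {d: tfidf(terms, df, n) for d, terms in doc_terms.items()}
    q = tfidf(query_terms, df, n)
    scores = {d: cosine(q, vec) for d, vec in doc_vecs.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])


# Two English documents and a Spanish query, each token tagged with its gold sense.
docs = {
    "d1": ("en", [("bank", "ili-bank-finance"), ("money", "ili-money")]),
    "d2": ("en", [("bank", "ili-bank-river"), ("river", "ili-river")]),
}
query = ("es", [("dinero", "ili-money"), ("banco", "ili-bank-finance")])

# Word-form index: Spanish query words never match English document words.
forms = {d: [w for w, _ in toks] for d, (_, toks) in docs.items()}
print(rank([w for w, _ in query[1]], forms))

# Concept index: query and documents share one language-independent space.
concepts = {d: to_concepts(lang, toks) for d, (lang, toks) in docs.items()}
print(rank(to_concepts(*query), concepts))
```

In this toy setting the word-form ranking scores both documents 0, while the concept ranking places `d1` (the financial "bank" document) first and `d2` at 0. Calling `to_concepts` with a nonzero `wsd_error_rate` on the documents, the query, or both is one simple way to probe how ranking degrades as disambiguation gets noisier.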

Pages: 647-678
