Efficient bilingual lexicon extraction from comparable corpora based on formal concepts analysis

Authors
Chebel, Mohamed [1]
Latiri, Chiraz [1]
Gaussier, Eric [2]
Affiliations
[1] Tunis EL Manar Univ, Fac Sci Tunis, LIPAH Res Lab, Tunis, Tunisia
[2] Univ Grenoble Alpes, LIG, Grenoble INP, CNRS, Grenoble, France
Keywords
Corpus linguistics; Evaluation; Information extraction; Information retrieval; Multilinguality
DOI
10.1017/S135132492100022X
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Bilingual corpora are an essential resource used to cross the language barrier in multilingual natural language processing tasks. Among bilingual corpora, comparable corpora have been the subject of many studies as they are both frequent and easily available. In this paper, we propose to make use of formal concept analysis to first construct concept vectors which can be used to enhance comparable corpora through clustering techniques. We then show how one can extract bilingual lexicons of improved quality from these enhanced corpora. We finally show that the bilingual lexicons obtained can complement existing bilingual dictionaries and improve cross-language information retrieval systems.
Pages: 138-161
Page count: 24