Efficient bilingual lexicon extraction from comparable corpora based on formal concepts analysis

Cited by: 0
Authors
Chebel, Mohamed [1 ]
Latiri, Chiraz [1 ]
Gaussier, Eric [2 ]
Affiliations
[1] Tunis EL Manar Univ, Fac Sci Tunis, LIPAH Res Lab, Tunis, Tunisia
[2] Univ Grenoble Alpes, LIG, Grenoble INP, CNRS, Grenoble, France
Keywords
Corpus linguistics; Evaluation; Information extraction; Information retrieval; Multilinguality;
DOI
10.1017/S135132492100022X
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Bilingual corpora are an essential resource used to cross the language barrier in multilingual natural language processing tasks. Among bilingual corpora, comparable corpora have been the subject of many studies as they are both frequent and easily available. In this paper, we propose to make use of formal concept analysis to first construct concept vectors which can be used to enhance comparable corpora through clustering techniques. We then show how one can extract bilingual lexicons of improved quality from these enhanced corpora. We finally show that the bilingual lexicons obtained can complement existing bilingual dictionaries and improve cross-language information retrieval systems.
Pages: 138 - 161
Page count: 24
Related papers
50 in total
  • [31] French-English terminology extraction from comparable corpora
    Daille, B
    Morin, E
    NATURAL LANGUAGE PROCESSING - IJCNLP 2005, PROCEEDINGS, 2005, 3651 : 707 - 718
  • [32] Bilingual Lexicon Extraction at the Morpheme Level Using Distributional Analysis
    Hazem, Amir
    Daille, Beatrice
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016: 3110 - 3115
  • [33] Evaluating a Pivot-Based Approach for Bilingual Lexicon Extraction
    Kim, Jae-Hoon
    Kwon, Hong-Seok
    Seo, Hyeong-Won
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2015, 2015
  • [34] The treatment of polysemy in the extraction of bilingual lexics from parallel corpora
    Gamallo Otero, Pablo
    Sotelo Docio, Susana
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2005, (35): 103 - 110
  • [35] Extraction of alignment relationships in comparable corpora based on Singular Value Decomposition
    Oliveira F.
    Wong F.
    Ho A.
    Li Y.-P.
    Chao S.
    Information Technology Journal, 2011, 10 (11) : 2076 - 2083
  • [36] Parallel Sentence Extraction from Comparable Corpora with Neural Network Features
    Chu, Chenhui
    Dabre, Raj
    Kurohashi, Sadao
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016: 2931 - 2935
  • [37] A Multilingual Dataset for Evaluating Parallel Sentence Extraction from Comparable Corpora
    Zweigenbaum, Pierre
    Sharoff, Serge
    Rapp, Reinhard
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018: 3828 - 3833
  • [38] TExSIS Bilingual terminology extraction from parallel corpora using chunk-based alignment
    Macken, Lieve
    Lefever, Els
    Hoste, Veronique
    TERMINOLOGY, 2013, 19 (01): 1 - 30
  • [39] Chinese-Uyghur Bilingual Lexicon Extraction Based on Weak Supervision
    Aysa, Anwar
    Ablimit, Mijit
    Yilahun, Hankiz
    Hamdulla, Askar
    INFORMATION, 2022, 13 (04)
  • [40] Automatic Parallel Corpora and Bilingual Terminology extraction from Parallel WebSites
    Almeida, Jose Joao
    Simoes, Alberto
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010: 50 - 55