Efficient bilingual lexicon extraction from comparable corpora based on formal concepts analysis

Cited by: 0
Authors
Chebel, Mohamed [1 ]
Latiri, Chiraz [1 ]
Gaussier, Eric [2 ]
Affiliations
[1] Tunis EL Manar Univ, Fac Sci Tunis, LIPAH Res Lab, Tunis, Tunisia
[2] Univ Grenoble Alpes, LIG, Grenoble INP, CNRS, Grenoble, France
Keywords
Corpus linguistics; Evaluation; Information extraction; Information retrieval; Multilinguality
DOI
10.1017/S135132492100022X
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Bilingual corpora are an essential resource used to cross the language barrier in multilingual natural language processing tasks. Among bilingual corpora, comparable corpora have been the subject of many studies as they are both frequent and easily available. In this paper, we propose to make use of formal concept analysis to first construct concept vectors which can be used to enhance comparable corpora through clustering techniques. We then show how one can extract bilingual lexicons of improved quality from these enhanced corpora. We finally show that the bilingual lexicons obtained can complement existing bilingual dictionaries and improve cross-language information retrieval systems.
Pages: 138-161
Page count: 24
Related papers
50 in total
  • [1] Bilingual Lexicon Extraction from Comparable Corpora Based on Closed Concepts Mining
    Chebel, Mohamed
    Latiri, Chiraz
    Gaussier, Eric
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2017, PT I, 2017, 10234 : 586 - 598
  • [2] Addressing polysemy in bilingual lexicon extraction from comparable corpora
    Fiser, Darja
    Ljubesic, Nikola
    Kubelka, Ozren
    LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 3031 - 3035
  • [3] Bilingual Lexicon Extraction with Forced Correlation from Comparable Corpora
    Zhang, Chunyue
    Zhao, Tiejun
    NEURAL INFORMATION PROCESSING, PT II, 2015, 9490 : 528 - 535
  • [4] Adaptive Dictionary for Bilingual Lexicon Extraction from Comparable Corpora
    Hazem, Amir
    Morin, Emmanuel
    LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 288 - 292
  • [5] Looking at Unbalanced Specialized Comparable Corpora for Bilingual Lexicon Extraction
    Morin, Emmanuel
    Hazem, Amir
    PROCEEDINGS OF THE 52ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2014, : 1284 - 1293
  • [6] Exploiting unbalanced specialized comparable corpora for bilingual lexicon extraction
    Morin, Emmanuel
    Hazem, Amir
    NATURAL LANGUAGE ENGINEERING, 2016, 22 (04) : 575 - 601
  • [7] Iterative Bilingual Lexicon Extraction from Comparable Corpora with Topical and Contextual Knowledge
    Chu, Chenhui
    Nakazawa, Toshiaki
    Kurohashi, Sadao
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, CICLING 2014, PART II, 2014, 8404 : 296 - 309
  • [8] Bilingual Lexicon Extraction with Temporal Distributed Word Representation from Comparable Corpora
    Zhang, Chunyue
    Zhao, Tiejun
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2015, 2015, 9362 : 380 - 387
  • [9] Bilingual Lexicon Extraction using Locally Weighted Linear Regression from Comparable Corpora
    Zhang, Chunyue
    Zhao, Tiejun
    PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2015, : 13 - 16
  • [10] Combining Lexical Context with Pseudo-alignment for Bilingual Lexicon Extraction from Comparable Corpora
    Li, Bo
    Zhu, Qunyan
    He, Tingting
    Chen, Qianjun
    CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA, CCL 2014, 2014, 8801 : 223 - 233