Unsupervised word-sense disambiguation using bilingual comparable corpora

Cited by: 6
Authors
Kaji, H [1]
Morimoto, Y [1]
Affiliation
[1] Hitachi Ltd, Cent Res Lab, Kokubunji, Tokyo 1858601, Japan
Keywords
word-sense disambiguation; unsupervised learning; comparable corpora
DOI
10.1093/ietisy/E88-D.2.289
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
An unsupervised method for word-sense disambiguation using bilingual comparable corpora was developed. First, it extracts word associations, i.e., statistically significant pairs of associated words, from the corpus of each language. Then, it aligns the word associations by consulting a bilingual dictionary and calculates the correlation between the senses of a target polysemous word and its associated words, which can be regarded as clues for identifying the sense of the target word. To overcome the disparity in topical coverage between the corpora of the two languages, as well as ambiguity in word-association alignment, an algorithm was devised that iteratively calculates a sense-vs.-clue correlation matrix for each target word. Word-sense disambiguation for each instance of the target word is done by selecting the sense that maximizes a score, i.e., a weighted sum of the correlations between that sense and the clues appearing in the context of the instance. An experiment using the Wall Street Journal and Nihon Keizai Shimbun corpora together with the EDR bilingual dictionary showed that the new method performs promisingly: the F-measure of its sense selection was 74.6%, compared with a baseline of 62.8%. The method may be extended into a fully unsupervised method that automatically divides and defines word senses.
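The final sense-selection step (choosing the sense whose weighted sum of correlations with the context clues is largest) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes an already computed sense-vs.-clue correlation matrix as input (the iterative algorithm that builds it is not reproduced), the function names and the "tank" correlation values are hypothetical, each distinct context word is counted once, unweighted clues default to weight 1.0, and the function abstains (returns `None`) when no known clue occurs, since the abstract does not specify these details.

```python
from typing import Dict, Iterable, Optional

# Sense-vs.-clue correlation matrix for one target word:
# correlation[sense][clue] -> correlation value.
CorrelationMatrix = Dict[str, Dict[str, float]]


def score_senses(
    correlation: CorrelationMatrix,
    context: Iterable[str],
    weights: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Score each sense as a weighted sum of its correlations with the context clues.

    Assumptions (not from the paper): each distinct context word is counted once,
    and a clue without an explicit weight gets weight 1.0.
    """
    weights = weights or {}
    clues = set(context)
    return {
        sense: sum(weights.get(clue, 1.0) * row[clue] for clue in clues if clue in row)
        for sense, row in correlation.items()
    }


def select_sense(
    correlation: CorrelationMatrix,
    context: Iterable[str],
    weights: Optional[Dict[str, float]] = None,
) -> Optional[str]:
    """Return the highest-scoring sense, or None (abstain) if no known clue occurs."""
    clues = set(context)
    if not any(clue in row for row in correlation.values() for clue in clues):
        return None
    scores = score_senses(correlation, clues, weights)
    return max(scores, key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical correlation values for two senses of "tank".
    correlation = {
        "tank/container": {"water": 0.8, "fuel": 0.6, "army": 0.1},
        "tank/vehicle": {"water": 0.1, "fuel": 0.3, "army": 0.9},
    }
    print(select_sense(correlation, ["army", "fuel", "soldiers"]))  # tank/vehicle
    print(select_sense(correlation, ["stock", "price"]))  # None
```

Abstaining when no clue is known is one way to make precision and recall differ, which is what an F-measure evaluation presupposes; a system that always answers would instead fall back to a default sense.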
Pages: 289-301
Number of pages: 13
Related papers
50 in total
  • [41] Using part-of-speech and word-sense disambiguation for boosting string-edit distance spelling correction
    Ruch, P
    Baud, R
    Geissbühler, A
    Lovis, C
    Rassinoux, AM
    Rivière, A
    ARTIFICIAL INTELLIGENCE IN MEDICINE, PROCEEDINGS, 2001, 2101 : 249 - 257
  • [42] Unsupervised Hindi Word Sense Disambiguation based on Network Agglomeration
    Jain, Amita
    Lobiyal, D. K.
    2015 2ND INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT (INDIACOM), 2015: 195 - 200
  • [43] Parallel corpora make sense: Bypassing the knowledge acquisition bottleneck for Word Sense Disambiguation
    Lefever, Els
    Hoste, Veronique
    INTERNATIONAL JOURNAL OF CORPUS LINGUISTICS, 2014, 19 (03) : 333 - 367
  • [44] Unsupervised word sense disambiguation with N-gram features
    Preotiuc-Pietro, Daniel
    Hristea, Florentina
    ARTIFICIAL INTELLIGENCE REVIEW, 2014, 41 (02) : 241 - 260
  • [45] The optimization of Gibbs sampling model in unsupervised word sense disambiguation
    Li, Xu
    Shen, Lan
    Yao, Chunlong
    Yu, Xiaoqiang
    ICIC Express Letters, Part B: Applications, 2012, 3 (04): 861 - 868
  • [46] A clustering-based Approach for Unsupervised Word Sense Disambiguation
    Martin-Wanton, Tamara
    Berlanga-Llavori, Rafael
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2012, (49): 49 - 56
  • [47] Research on dual pattern of unsupervised and supervised Word Sense Disambiguation
    Wang, Yao-Feng
    Zhang, Yue-Jie
    Xu, Zhi-Ting
    Zhang, Tao
    PROCEEDINGS OF 2006 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2006: 2665 - +
  • [48] AN IMPROVED UNSUPERVISED LEARNING PROBABILISTIC MODEL OF WORD SENSE DISAMBIGUATION
    Li, Xu
    Zhao, Xiuyan
    Ban, Fenglong
    Liu, Bai
    PROCEEDINGS OF THE 2012 WORLD CONGRESS ON INFORMATION AND COMMUNICATION TECHNOLOGIES, 2012: 1071 - 1075
  • [49] Combining unsupervised lexical knowledge methods for word sense disambiguation
    Rigau, G
    Atserias, J
    Agirre, E
    35TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 8TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE, 1997: 48 - 55
  • [50] Graph Connectivity for Unsupervised Word Sense Disambiguation for HINDI Language
    Nandanwar, Lokesh
    2015 INTERNATIONAL CONFERENCE ON INNOVATIONS IN INFORMATION, EMBEDDED AND COMMUNICATION SYSTEMS (ICIIECS), 2015