Unsupervised word-sense disambiguation using bilingual comparable corpora

Cited by: 6
Authors
Kaji, H [1]
Morimoto, Y [1]
Affiliations
[1] Hitachi Ltd, Cent Res Lab, Kokubunji, Tokyo 1858601, Japan
Keywords
word-sense disambiguation; unsupervised learning; comparable corpora
DOI
10.1093/ietisy/E88-D.2.289
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
An unsupervised method for word-sense disambiguation using bilingual comparable corpora was developed. First, it extracts word associations, i.e., statistically significant pairs of associated words, from the corpus of each language. Then, it aligns word associations by consulting a bilingual dictionary and calculates correlation between senses of a target polysemous word and its associated words, which can be regarded as clues for identifying the sense of the target word. To overcome the problem of disparity of topical coverage between corpora of the two languages as well as the problem of ambiguity in word-association alignment, an algorithm for iteratively calculating a sense-vs.-clue correlation matrix for each target word was devised. Word-sense disambiguation for each instance of the target word is done by selecting the sense that maximizes the score, i.e., a weighted sum of the correlations between each sense and clues appearing in the context of the instance. An experiment using Wall Street Journal and Nihon Keizai Shimbun corpora together with the EDR bilingual dictionary showed that the new method has promising performance; namely, the F-measure of its sense selection was 74.6% compared to a baseline of 62.8%. The developed method will possibly be extended into a fully unsupervised method that features automatic division and definition of word senses.
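The final sense-selection step described in the abstract (choosing the sense whose weighted sum of sense-vs.-clue correlations is largest) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `select_sense`, the sense labels, the clue words, and all correlation values are hypothetical, and because the abstract does not say how clue weights are computed, uniform weights of 1.0 are assumed by default.

```python
from typing import Dict, Iterable, Optional, Tuple


def select_sense(
    correlation: Dict[str, Dict[str, float]],
    context_clues: Iterable[str],
    weights: Optional[Dict[str, float]] = None,
) -> Tuple[str, Dict[str, float]]:
    """Return the sense with the highest weighted sum of correlations.

    correlation maps sense -> clue -> correlation value (one row of the
    sense-vs.-clue matrix per sense). Clues missing from a row count as 0.
    weights maps clue -> weight; missing clues default to 1.0 (assumption).
    """
    if not correlation:
        raise ValueError("correlation matrix has no senses")
    clues = list(context_clues)
    scores: Dict[str, float] = {}
    for sense, clue_corr in correlation.items():
        total = 0.0
        for clue in clues:
            w = 1.0 if weights is None else weights.get(clue, 1.0)
            total += w * clue_corr.get(clue, 0.0)
        scores[sense] = total
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy matrix for the polysemous word "tank".
toy_correlation = {
    "tank/container": {"water": 0.8, "fuel": 0.9, "army": 0.1},
    "tank/vehicle": {"army": 0.9, "soldier": 0.7, "fuel": 0.3},
}

# Clues assumed to appear in the context of one instance of "tank".
toy_context = ["army", "fuel", "soldier"]

best, scores = select_sense(toy_correlation, toy_context)
print(best, scores)
```

With these toy values, the military-vehicle sense scores higher than the container sense because more of its strongly correlated clues appear in the context. In the method itself, the matrix would come from the iterative sense-vs.-clue correlation algorithm rather than from hand-set numbers.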
Pages: 289-301
Page count: 13
Related papers
50 in total
  • [1] Unsupervised bilingual word sense disambiguation using Web statistics
    Wang, Y
    Hoffmann, A
    AI 2005: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2005, 3809 : 1167 - 1172
  • [2] Word sense acquisition from bilingual comparable corpora
    Kaji, H
    HLT-NAACL 2003: HUMAN LANGUAGE TECHNOLOGY CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE MAIN CONFERENCE, 2003, : 111 - 118
  • [3] Word-Sense Disambiguation of Korean Predicates using Sejong Electronic Dictionary and Unsupervised learning
    Kang, Sangwook
    Oh, Yeontaek
    Kim, Minho
    Kwon, Hyuk-chul
    CIT/IUCC/DASC/PICOM 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY - UBIQUITOUS COMPUTING AND COMMUNICATIONS - DEPENDABLE, AUTONOMIC AND SECURE COMPUTING - PERVASIVE INTELLIGENCE AND COMPUTING, 2015, : 257 - 261
  • [4] Knowledge lean word-sense disambiguation
    Pedersen, T
    Bruce, R
    FIFTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-98) AND TENTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICAL INTELLIGENCE (IAAI-98) - PROCEEDINGS, 1998, : 800 - 805
  • [5] A statistical model for parsing and word-sense disambiguation
    Bikel, DM
    PROCEEDINGS OF THE 2000 JOINT SIGDAT CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND VERY LARGE CORPORA, 2000, : 155 - 163
  • [6] EXAMPLE-BASED WORD-SENSE DISAMBIGUATION
    URAMOTO, N
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 1994, E77D (02) : 240 - 246
  • [7] Unsupervised Word Sense Disambiguation Using Word Embeddings
    Moradi, Behzad
    Ansari, Ebrahim
    Zabokrtsky, Zdenek
    PROCEEDINGS OF THE 2019 25TH CONFERENCE OF OPEN INNOVATIONS ASSOCIATION (FRUCT), 2019, : 228 - 233
  • [8] Unsupervised Translated Word Sense Disambiguation in Constructing Bilingual Lexical Database
    Lynn, Htet Myet
    Choi, Chang
    Kim, Pankoo
    33RD ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, 2018, : 1824 - 1827
  • [9] Unsupervised Word Sense Disambiguation Using The WWW
    Klapaftis, Ioannis P.
    Manandhar, Suresh
    STAIRS 2006, 2006, 142 : 174 - 183
  • [10] Machine Learning Techniques for Myanmar Word-Sense Disambiguation
    Khaing, Phyu Phyu
    Aung, Than Nwe
    GENETIC AND EVOLUTIONARY COMPUTING, VOL I, 2016, 387 : 175 - 185