Unsupervised word-sense disambiguation using bilingual comparable corpora

被引:6
|
作者
Kaji, H [1 ]
Morimoto, Y [1 ]
机构
[1] Hitachi Ltd, Cent Res Lab, Kokubunji, Tokyo 1858601, Japan
关键词
word-sense disambiguation; unsupervised learning; comparable corpora;
D O I
10.1093/ietisy/E88-D.2.289
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
An unsupervised method for word-sense disambiguation using bilingual comparable corpora was developed. First, it extracts word associations, i.e., statistically significant pairs of associated words, from the corpus of each language. Then, it aligns word associations by consulting a bilingual dictionary and calculates correlation between senses of a target polysemous word and its associated words, which can be regarded as clues for identifying the sense of the target word. To overcome the problem of disparity of topical coverage between corpora of the two languages as well as the problem of ambiguity in word-association alignment, an algorithm for iteratively calculating a sense-vs.-clue correlation matrix for each target word was devised. Word-sense disambiguation for each instance of the target word is done by selecting the sense that maximizes the score, i.e., a weighted sum of the correlations between each sense and clues appearing in the context of the instance. An experiment using Wall Street Journal and Nihon Keizai Shimbun corpora together with the EDR bilingual dictionary showed that the new method has promising performance; namely, the F-measure of its sense selection was 74.6% compared to a baseline of 62.8%. The developed method will possibly be extended into a fully unsupervised method that features automatic division and definition of word senses.
引用
收藏
页码:289 / 301
页数:13
相关论文
共 50 条
  • [21] Unsupervised Approach to Word Sense Disambiguation in Malayalam
    Sankar, Sruthi K. P.
    Raj, P. C. Reghu
    Jayan, V
    INTERNATIONAL CONFERENCE ON EMERGING TRENDS IN ENGINEERING, SCIENCE AND TECHNOLOGY (ICETEST - 2015), 2016, 24 : 1507 - 1513
  • [22] Word Sense Disambiguation in Bengali: an Unsupervised Approach
    Pal, Alok Ranjan
    Saha, Diganta
    PROCEEDINGS OF THE 2017 IEEE SECOND INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER AND COMMUNICATION TECHNOLOGIES (ICECCT), 2017,
  • [23] An unsupervised method for word sense tagging using parallel corpora
    Diab, M
    Resnik, P
    40TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE, 2002, : 255 - 262
  • [24] Selecting Decomposable Models for Word-Sense Disambiguation: TheGrling-Sdm System
    Tom O'Hara
    Janyce Wiebe
    Rebecca Bruce
    Computers and the Humanities, 2000, 34 : 159 - 164
  • [25] Learning Sense Representation from Word Representation for Unsupervised Word Sense Disambiguation
    Wang, Jie
    Fu, Zhenxin
    Li, Moxin
    Zhang, Haisong
    Zhao, Dongyan
    Yan, Rui
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 13947 - 13948
  • [26] Unsupervised Word Sense Disambiguation based on Word Embedding and Collocation
    Han, Shangzhuang
    Shirai, Kiyoaki
    ICAART: PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 2, 2021, : 1218 - 1225
  • [27] Word Sense Disambiguation in Bengali language using unsupervised methodology with modifications
    Alok Ranjan Pal
    Diganta Saha
    Sādhanā, 2019, 44
  • [28] Improving Subjectivity Detection using Unsupervised Subjectivity Word Sense Disambiguation
    Ortega, Reynier
    Fonseca, Adrian
    Gutierrez, Yoan
    Montoyo, Andres
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2013, (51): : 179 - 186
  • [29] An unsupervised & statistical word sense tagging using bilingual sources
    Oliveira, F
    Wong, F
    Li, YP
    Proceedings of 2005 International Conference on Machine Learning and Cybernetics, Vols 1-9, 2005, : 3749 - 3754
  • [30] Word Sense Disambiguation in Bengali language using unsupervised methodology with modifications
    Pal, Alok Ranjan
    Saha, Diganta
    SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2019, 44 (07):