Unsupervised word-sense disambiguation using bilingual comparable corpora

Cited by: 6
Authors
Kaji, H [1]
Morimoto, Y [1]
Affiliation
[1] Hitachi Ltd, Cent Res Lab, Kokubunji, Tokyo 1858601, Japan
Keywords
word-sense disambiguation; unsupervised learning; comparable corpora
DOI
10.1093/ietisy/E88-D.2.289
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
An unsupervised method for word-sense disambiguation using bilingual comparable corpora was developed. First, it extracts word associations, i.e., statistically significant pairs of associated words, from the corpus of each language. Then, it aligns word associations by consulting a bilingual dictionary and calculates correlation between senses of a target polysemous word and its associated words, which can be regarded as clues for identifying the sense of the target word. To overcome the problem of disparity of topical coverage between corpora of the two languages as well as the problem of ambiguity in word-association alignment, an algorithm for iteratively calculating a sense-vs.-clue correlation matrix for each target word was devised. Word-sense disambiguation for each instance of the target word is done by selecting the sense that maximizes the score, i.e., a weighted sum of the correlations between each sense and clues appearing in the context of the instance. An experiment using Wall Street Journal and Nihon Keizai Shimbun corpora together with the EDR bilingual dictionary showed that the new method has promising performance; namely, the F-measure of its sense selection was 74.6% compared to a baseline of 62.8%. The developed method will possibly be extended into a fully unsupervised method that features automatic division and definition of word senses.
Pages: 289-301
Number of pages: 13
Related papers
50 in total
  • [31] Selecting decomposable models for word-sense disambiguation: The grling-sdm system
    O'Hara, T
    Wiebe, J
    Bruce, R
    COMPUTERS AND THE HUMANITIES, 2000, 34 (1-2): : 159 - 164
  • [32] Word sense disambiguation of Thai language with unsupervised learning
    Pongpinigpinyo, S
    Rivepiboon, W
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 1, PROCEEDINGS, 2005, 3681 : 1275 - 1283
  • [33] Graph Connectivity Measures for Unsupervised Word Sense Disambiguation
    Navigli, Roberto
    Lapata, Mirella
    20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 1683 - 1688
  • [34] The Noisy Channel Model for Unsupervised Word Sense Disambiguation
    Yuret, Deniz
    Yatbaz, Mehmet Ali
    COMPUTATIONAL LINGUISTICS, 2010, 36 (01) : 111 - 127
  • [35] Sense-Annotated Corpora for Word Sense Disambiguation in Multiple Languages and Domains
    Scarlini, Bianca
    Pasini, Tommaso
    Navigli, Roberto
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 5897 - 5903
  • [36] Unsupervised Word Sense Disambiguation Using Markov Random Field and Dependency Parser
    Chaplot, Devendra Singh
    Bhattacharyya, Pushpak
    Paranjape, Ashwin
    PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015, : 2217 - 2223
  • [37] UPC: An Open Word-Sense Annotated Parallel Corpora for Machine Translation Study
    Van-Hai Vu
    Quang-Phuoc Nguyen
    Shin, Joon-Choul
    Ock, Cheol-Young
    APPLIED SCIENCES-BASEL, 2020, 10 (11):
  • [38] A Unified and Unsupervised Framework for Bilingual Phrase Alignment on Specialized Comparable Corpora
    Liu, Jingshu
    Morin, Emmanuel
    Saldarriaga, Sebastian Pena
    Lark, Joseph
    ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 2093 - 2100
  • [39] Unsupervised graph-based word sense disambiguation using measures of word semantic similarity
    Sinha, Ravi
    Mihalcea, Rada
    ICSC 2007: INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING, PROCEEDINGS, 2007, : 363 - +
  • [40] UoR-NCL at SemEval-2023 Task 1: Learning Word-Sense and Image Embeddings for Word Sense Disambiguation
    Markchom, Thanet
    Liang, Huizhi
    Gitau, Joyce
    Liu, Zehao
    Ojha, Varun
    Taylor, Lee
    Bonnici, Jake
    Alshadadi, Abdullah
    17TH INTERNATIONAL WORKSHOP ON SEMANTIC EVALUATION, SEMEVAL-2023, 2023, : 16 - 22