Unsupervised Word Decomposition with the Promodes Algorithm

Cited by: 0
Authors
Spiegler, Sebastian [1]
Golenia, Bruno [1]
Flach, Peter A. [1]
Affiliations
[1] Univ Bristol, Dept Comp Sci, Machine Learning Grp, Bristol BS8 1TH, Avon, England
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
We present PROMODES, an algorithm for unsupervised word decomposition that is based on a probabilistic generative model. The model treats segment boundaries as hidden variables and includes probabilities for letter transitions within segments. For Morpho Challenge 2009, we demonstrate three versions of PROMODES. The first uses a simple segmentation algorithm on a subset of the data and applies maximum likelihood estimates for the model parameters when decomposing words of the original language data. The second estimates its parameters through expectation maximization (EM). The third is a committee of unsupervised learners, where the learners correspond to different EM initializations; the solution is found by majority vote, which decides whether or not to segment at each word position. In this paper, we describe the probabilistic model, the parameter estimation, and how the most likely decomposition of an input word is found. We have tested PROMODES on non-vowelized and vowelized Arabic as well as on English, Finnish, German and Turkish. All three methods achieved competitive results.
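The committee step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `majority_vote_boundaries` and `split_at`, the representation of each learner's output as a set of boundary positions, and the rule that ties yield no boundary are all assumptions made for illustration. The example word and votes are invented.

```python
from typing import List, Set


def majority_vote_boundaries(word: str, proposals: List[Set[int]]) -> List[int]:
    """Return the positions where a strict majority of learners propose a boundary.

    Position i means "split between word[i-1] and word[i]", so valid
    positions run from 1 to len(word) - 1. Each element of `proposals`
    is one learner's set of boundary positions (hypothetical format).
    """
    n_learners = len(proposals)
    chosen = []
    for pos in range(1, len(word)):
        votes = sum(1 for p in proposals if pos in p)
        # Assumption: a tie counts as "do not segment".
        if 2 * votes > n_learners:
            chosen.append(pos)
    return chosen


def split_at(word: str, boundaries: List[int]) -> List[str]:
    """Cut the word at the given boundary positions."""
    pieces = []
    start = 0
    for b in sorted(boundaries):
        pieces.append(word[start:b])
        start = b
    pieces.append(word[start:])
    return pieces


# Three hypothetical learners (e.g. different EM initializations)
# vote on the boundaries of one word.
word = "unhelpful"
proposals = [{2, 6}, {2}, {2, 6, 7}]
boundaries = majority_vote_boundaries(word, proposals)
segments = split_at(word, boundaries)
```

Here position 2 receives three votes, position 6 two, and position 7 one, so with three learners only positions 2 and 6 survive the vote and the word is split into `un`, `help`, `ful`.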
Pages: 625-632
Number of pages: 8