Unsupervised Word Decomposition with the Promodes Algorithm

Cited by: 0
Authors
Spiegler, Sebastian [1]
Golenia, Bruno [1]
Flach, Peter A. [1]
Affiliations
[1] Univ Bristol, Dept Comp Sci, Machine Learning Grp, Bristol BS8 1TH, Avon, England
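Keywords
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract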
We present PROMODES, an algorithm for unsupervised word decomposition based on a probabilistic generative model. The model treats segment boundaries as hidden variables and includes probabilities for letter transitions within segments. For Morpho Challenge 2009, we demonstrate three versions of PROMODES. The first applies a simple segmentation algorithm to a subset of the data and uses maximum likelihood estimates of the model parameters when decomposing words of the original language data. The second estimates its parameters through expectation maximization (EM). The third is a committee of unsupervised learners, where the learners correspond to different EM initializations; a majority vote decides, at each word position, whether to segment. In this paper, we describe the probabilistic model, parameter estimation, and how the most likely decomposition of an input word is found. We tested PROMODES on non-vowelized and vowelized Arabic as well as on English, Finnish, German and Turkish. All three methods achieved competitive results.
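The committee step described in the abstract, where a majority vote decides at each word position whether to segment, can be illustrated with a small sketch. This is not the authors' implementation: the function names `majority_vote_boundaries` and `apply_boundaries`, the boolean-list encoding of boundaries, and the rule that ties yield no boundary are all assumptions made for illustration.

```python
from typing import List, Sequence


def majority_vote_boundaries(votes: Sequence[Sequence[bool]]) -> List[bool]:
    """Combine per-position boundary decisions from several learners.

    votes[k][i] is True if learner k places a boundary after letter i.
    Assumption: a boundary is kept only on a strict majority, so ties
    result in no boundary.
    """
    if not votes:
        raise ValueError("need at least one learner")
    n_positions = len(votes[0])
    if any(len(v) != n_positions for v in votes):
        raise ValueError("all learners must vote on the same positions")
    n_learners = len(votes)
    return [
        2 * sum(1 for v in votes if v[i]) > n_learners
        for i in range(n_positions)
    ]


def apply_boundaries(word: str, boundaries: Sequence[bool]) -> List[str]:
    """Split a word at every position whose boundary flag is True."""
    if len(boundaries) != len(word) - 1:
        raise ValueError("expected one boundary flag per letter gap")
    segments, start = [], 0
    for i, cut in enumerate(boundaries):
        if cut:
            segments.append(word[start:i + 1])
            start = i + 1
    segments.append(word[start:])
    return segments


# Three hypothetical learners vote on the five letter gaps of "walked".
committee_votes = [
    [False, False, False, True, False],   # walk|ed
    [False, False, False, True, False],   # walk|ed
    [False, False, True, False, False],   # wal|ked
]
combined = majority_vote_boundaries(committee_votes)
segments = apply_boundaries("walked", combined)
print(segments)
```

In this encoding each learner would correspond to one EM initialization, and the vote is taken independently at every letter gap rather than over whole segmentations.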
Pages: 625-632
Page count: 8