Unsupervised Word Decomposition with the Promodes Algorithm

Authors
Spiegler, Sebastian [1]
Golenia, Bruno [1]
Flach, Peter A. [1]
Affiliations
[1] Univ Bristol, Dept Comp Sci, Machine Learning Grp, Bristol BS8 1TH, Avon, England
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We present PROMODES, an algorithm for unsupervised word decomposition that is based on a probabilistic generative model. The model treats segment boundaries as hidden variables and includes probabilities for letter transitions within segments. For the Morpho Challenge 2009, we demonstrate three versions of PROMODES. The first uses a simple segmentation algorithm on a subset of the data and applies maximum likelihood estimates for the model parameters when decomposing words of the original language data. The second estimates its parameters through expectation maximization (EM). The third is a committee of unsupervised learners, where the learners correspond to different EM initializations; the solution is found by a majority vote that decides, for each word position, whether to segment there or not. In this paper, we describe the probabilistic model, the parameter estimation, and how the most likely decomposition of an input word is found. We have tested PROMODES on non-vowelized and vowelized Arabic as well as on English, Finnish, German and Turkish. All three methods achieved competitive results.
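The committee step described in the abstract can be illustrated with a minimal sketch. It assumes each learner outputs one yes/no boundary decision per position between adjacent letters. The function names `majority_vote` and `segment`, the tie-breaking rule (ties count as "no boundary"), and the example votes are all hypothetical choices made for illustration; they are not taken from the paper.

```python
from typing import List, Sequence


def majority_vote(votes: Sequence[Sequence[bool]]) -> List[bool]:
    """Combine per-position boundary decisions from several learners.

    votes[k][i] is True if learner k places a boundary after letter i.
    Assumption (not from the paper): a tie means "no boundary".
    """
    if not votes:
        raise ValueError("need at least one learner")
    n_positions = len(votes[0])
    if any(len(v) != n_positions for v in votes):
        raise ValueError("all learners must vote on the same positions")
    n_learners = len(votes)
    # A boundary is kept only if strictly more than half the learners agree.
    return [
        2 * sum(1 for v in votes if v[i]) > n_learners
        for i in range(n_positions)
    ]


def segment(word: str, boundaries: Sequence[bool]) -> List[str]:
    """Split `word` after every letter index i where boundaries[i] is True."""
    if len(boundaries) != len(word) - 1:
        raise ValueError("expected one decision per gap between letters")
    pieces, start = [], 0
    for i, cut in enumerate(boundaries):
        if cut:
            pieces.append(word[start:i + 1])
            start = i + 1
    pieces.append(word[start:])
    return pieces


# Hypothetical votes from three learners for the word "walked".
votes = [
    [False, False, False, True, False],   # walk|ed
    [False, False, False, True, False],   # walk|ed
    [False, False, True, False, False],   # wal|ked
]
print(segment("walked", majority_vote(votes)))
```

Voting per position rather than per whole segmentation means the committee can produce a decomposition that no single learner proposed, which is one plausible reading of "decides whether to segment at a word position or not".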
Pages: 625-632
Number of pages: 8