Unsupervised morphological parsing of Bengali

Cited by: 15
Authors
Dasgupta, Sajib [1]
Ng, Vincent [1]
Affiliation
[1] Univ Texas, Human Language Technol Res Inst, Richardson, TX 75083 USA
Keywords
morphological parsing; word segmentation; data annotation; unsupervised learning; Asian language processing; Bengali
DOI
10.1007/s10579-007-9031-y
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Unsupervised morphological analysis is the task of segmenting words into prefixes, suffixes and stems without prior knowledge of language-specific morphotactics and morpho-phonological rules. This paper introduces a simple, yet highly effective algorithm for unsupervised morphological learning for Bengali, an Indo-Aryan language that is highly inflectional in nature. When evaluated on a set of 4,110 human-segmented Bengali words, our algorithm achieves an F-score of 83%, substantially outperforming Linguistica, one of the most widely-used unsupervised morphological parsers, by about 23%.
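The abstract reports an F-score without defining how segmentations are scored. As a hedged illustration only, unsupervised segmentation work often scores predicted morpheme boundaries against a human-segmented gold standard; the sketch below assumes that boundary-level, micro-averaged scheme, which may differ from the exact evaluation protocol used in the paper. The function names `boundaries` and `boundary_f_score` and the toy English words are hypothetical and are not taken from the paper or from Linguistica.

```python
def boundaries(word, morphs):
    """Return the set of internal boundary offsets for a segmentation.

    Hypothetical helper: e.g. ("walked", ["walk", "ed"]) -> {4}.
    """
    if "".join(morphs) != word:
        raise ValueError(f"morphs {morphs!r} do not spell {word!r}")
    offsets = set()
    pos = 0
    for morph in morphs[:-1]:  # the word end is not an internal boundary
        pos += len(morph)
        offsets.add(pos)
    return offsets


def boundary_f_score(gold, predicted):
    """Micro-averaged boundary precision, recall and F-score (assumed scheme).

    gold, predicted: dicts mapping each word to its list of morphs.
    A word missing from `predicted` is treated as unsegmented.
    """
    correct = proposed = reference = 0
    for word, gold_morphs in gold.items():
        g = boundaries(word, gold_morphs)
        p = boundaries(word, predicted.get(word, [word]))
        correct += len(g & p)    # boundaries both agree on
        proposed += len(p)       # boundaries the system proposed
        reference += len(g)      # boundaries in the gold standard
    precision = correct / proposed if proposed else 0.0
    recall = correct / reference if reference else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f
```

With gold segmentations `walk+ed`, `un+happi+ness`, `cat` and predictions `walk+ed`, `unhappi+ness`, `ca+t`, two of three proposed boundaries are correct and two of three gold boundaries are found, so precision, recall and F-score are all 2/3. Counting boundaries rather than whole-word matches gives partial credit when a parser finds a suffix but misses a prefix.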
Pages: 311-330
Page count: 20