Unsupervised morphological parsing of Bengali

Cited by: 15
Authors
Dasgupta, Sajib [1]
Ng, Vincent [1]
Affiliations
[1] Univ Texas, Human Language Technol Res Inst, Richardson, TX 75083 USA
Keywords
morphological parsing; word segmentation; data annotation; unsupervised learning; Asian language processing; Bengali
DOI
10.1007/s10579-007-9031-y
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Unsupervised morphological analysis is the task of segmenting words into prefixes, suffixes and stems without prior knowledge of language-specific morphotactics and morpho-phonological rules. This paper introduces a simple yet highly effective algorithm for unsupervised morphological learning for Bengali, an Indo-Aryan language that is highly inflectional in nature. When evaluated on a set of 4,110 human-segmented Bengali words, our algorithm achieves an F-score of 83%, substantially outperforming Linguistica, one of the most widely used unsupervised morphological parsers, by about 23%.
Pages: 311-330
Page count: 20
Related papers
50 in total
  • [1] Unsupervised morphological parsing of Bengali
    Sajib Dasgupta
    Vincent Ng
    Language Resources and Evaluation, 2006, 40 : 311 - 330
  • [2] Probabilistic Approach of Parsing Bengali Sentences
    Khatun, Ayesha
    Hoque, Mohammed Moshiul
    2018 4TH IEEE INTERNATIONAL WIE CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (IEEE WIECON-ECE 2018), 2018, : 55 - 58
  • [3] Multilingual Unsupervised Dependency Parsing with Unsupervised POS Tags
    Marecek, David
    ADVANCES IN ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, MICAI 2015, PT I, 2015, 9413 : 72 - 82
  • [4] UnsuParse: Unsupervised Parsing with unsupervised Part of Speech tagging
    Haenig, Christian
    Bordag, Stefan
    Quasthoff, Uwe
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 1109 - 1114
  • [5] Bengali Noun Morphological Analyzer
    Das, Priyanka
    Das, Arjun
    2013 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2013, : 1538 - 1543
  • [6] Word Sense Disambiguation in Bengali: an Unsupervised Approach
    Pal, Alok Ranjan
    Saha, Diganta
    PROCEEDINGS OF THE 2017 IEEE SECOND INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER AND COMMUNICATION TECHNOLOGIES (ICECCT), 2017,
  • [7] Unsupervised Abstractive Summarization of Bengali Text Documents
    Chowdhury, Radia Rayan
    Nayeem, Mir Tafseer
    Mim, Tahsin Tasnim
    Chowdhury, Md Saifur Rahman
    Jannat, Taufiqul
    16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 2612 - 2619
  • [8] Unsupervised Parts-of-Speech Induction for Bengali
    Nath, Joydeep
    Choudhury, Monojit
    Mukherjee, Animesh
    Biemann, Chris
    Ganguly, Niloy
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 1220 - 1227
  • [9] Rule Augmented Unsupervised Constituency Parsing
    Sahay, Atul
    Nasery, Anshul
    Maheshwari, Ayush
    Ramakrishnan, Ganesh
    Iyer, Rishabh
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 4923 - 4932
  • [10] Unsupervised Semantic Parsing of Video Collections
    Sener, Ozan
    Zamir, Amir R.
    Savarese, Silvio
    Saxena, Ashutosh
    2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 4480 - 4488