Morphological Analysis by Multiple Sequence Alignment

Authors
Tchoukalov, Tzvetan [1]
Monson, Christian [2]
Roark, Brian [2]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Oregon Hlth & Sci Univ, Ctr Spoken Language Understanding, Portland, OR USA
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In biological sequence processing, Multiple Sequence Alignment (MSA) techniques capture information about long-distance dependencies and the three-dimensional structure of protein and nucleotide sequences without resorting to polynomial-complexity context-free models. But MSA techniques have rarely been used in natural language (NL) processing, and never for NL morphology induction. Our MetaMorph algorithm is a first attempt at leveraging MSA techniques to induce NL morphology in an unsupervised fashion. Given a text corpus in any language, MetaMorph sequentially aligns the words of the corpus to form an MSA and then segments the MSA to produce morphological analyses. Over corpora that contain millions of unique word types, MetaMorph identifies morphemes at an F1 score below state-of-the-art performance. But when restricted to smaller sets of orthographically related words, MetaMorph outperforms the state-of-the-art ParaMor-Morfessor Union morphology induction system. Tested on 5,000 orthographically similar Hungarian word types, MetaMorph reaches 54.1%, while ParaMor-Morfessor reaches just 41.9%. Hence, we conclude that MSA is a promising approach to unsupervised morphology induction. Future research directions are discussed.
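The sequential alignment step described in the abstract can be illustrated with a minimal sketch. This is not the MetaMorph implementation: the abstract does not give the scoring scheme, the word order, or the segmentation procedure, so the match/mismatch scores (+1/-1), the gap penalty, and the function names `align_to_msa` and `build_msa` are all assumptions made for illustration. The sketch aligns each new word against the columns of the alignment built so far, using a Needleman-Wunsch style dynamic program, and leaves the segmentation step out.

```python
GAP = "-"  # assumed gap symbol; words are assumed not to contain it


def align_to_msa(msa, word, gap_penalty=-1.0):
    """Align `word` to an existing alignment (list of equal-length rows).

    Returns a new list of rows with `word` appended as the last row.
    Scoring is an illustrative assumption, not the paper's scheme.
    """
    if not msa:
        return [word]
    n, m = len(msa[0]), len(word)

    def col_score(col, ch):
        # Average over rows: +1 if the row's character matches, else -1.
        return sum(1.0 if row[col] == ch else -1.0 for row in msa) / len(msa)

    # score[i][j]: best score aligning the first i columns with word[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap_penalty
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + col_score(i - 1, word[j - 1]),
                score[i - 1][j] + gap_penalty,  # gap in the new word
                score[i][j - 1] + gap_penalty,  # new all-gap column in the MSA
            )

    # Trace back from the bottom-right corner, building rows in reverse.
    old_rows = [[] for _ in msa]
    new_row = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                score[i][j] == score[i - 1][j - 1] + col_score(i - 1, word[j - 1])):
            for r, row in enumerate(msa):
                old_rows[r].append(row[i - 1])
            new_row.append(word[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap_penalty):
            for r, row in enumerate(msa):
                old_rows[r].append(row[i - 1])
            new_row.append(GAP)
            i -= 1
        else:
            for r in range(len(msa)):
                old_rows[r].append(GAP)
            new_row.append(word[j - 1])
            j -= 1

    rows = ["".join(reversed(r)) for r in old_rows]
    rows.append("".join(reversed(new_row)))
    return rows


def build_msa(words):
    """Add words one at a time, in the given order, to a growing alignment."""
    msa = []
    for w in words:
        msa = align_to_msa(msa, w)
    return msa


if __name__ == "__main__":
    for row in build_msa(["walk", "walks", "walked", "walking"]):
        print(row)
```

In this sketch every row of the result has the same length, and removing the gap symbols from a row recovers the original word. A segmentation step would then operate on the columns of the alignment; how MetaMorph does this is not stated in the abstract.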
Pages: 666 / +
Page count: 2