Morphological Analysis by Multiple Sequence Alignment

Authors
Tchoukalov, Tzvetan [1]
Monson, Christian [2]
Roark, Brian [2]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Oregon Hlth & Sci Univ, Ctr Spoken Language Understanding, Portland, OR USA
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In biological sequence processing, Multiple Sequence Alignment (MSA) techniques capture information about long-distance dependencies and the three-dimensional structure of protein and nucleotide sequences without resorting to polynomial-complexity context-free models. But MSA techniques have rarely been used in natural language (NL) processing, and never for NL morphology induction. Our MetaMorph algorithm is a first attempt at leveraging MSA techniques to induce NL morphology in an unsupervised fashion. Given a text corpus in any language, MetaMorph sequentially aligns the words of the corpus to form an MSA and then segments the MSA to produce morphological analyses. Over corpora that contain millions of unique word types, MetaMorph identifies morphemes at an F1 score below state-of-the-art performance. But when restricted to smaller sets of orthographically related words, MetaMorph outperforms the state-of-the-art ParaMor-Morfessor Union morphology induction system. Tested on 5,000 orthographically similar Hungarian word types, MetaMorph reaches 54.1% and ParaMor-Morfessor just 41.9%. Hence, we conclude that MSA is a promising approach to unsupervised morphology induction. Future research directions are discussed.
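The alignment step described in the abstract can be illustrated with a minimal sketch of pairwise global alignment (Needleman-Wunsch), the building block that sequential MSA construction typically repeats word by word. This is not MetaMorph's actual procedure: the function name `align_pair` and the scores (match = 1, mismatch = -1, gap = -1) are assumptions chosen for illustration, and the segmentation step is not sketched because the abstract gives no detail about it.

```python
def align_pair(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two words; return both padded with '-' to equal length.

    Hypothetical helper: the scoring values are illustrative assumptions,
    not the parameters used by MetaMorph.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning character a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    top, bottom = align_pair("walk", "walked")
    print(top)     # walk--
    print(bottom)  # walked
```

In this toy example the columns where one word has gaps and the other has characters line up with the suffix "ed", which hints at why column patterns in an alignment of many related words could suggest morpheme boundaries.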
Pages: 666 / +
Page count: 2