Morphological Analysis by Multiple Sequence Alignment

Cited by: 0
Authors
Tchoukalov, Tzvetan [1]
Monson, Christian [2]
Roark, Brian [2]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Oregon Hlth & Sci Univ, Ctr Spoken Language Understanding, Portland, OR USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In biological sequence processing, Multiple Sequence Alignment (MSA) techniques capture information about long-distance dependencies and the three-dimensional structure of protein and nucleotide sequences without resorting to polynomial-complexity context-free models. But MSA techniques have rarely been used in natural language (NL) processing, and never for NL morphology induction. Our MetaMorph algorithm is a first attempt at leveraging MSA techniques to induce NL morphology in an unsupervised fashion. Given a text corpus in any language, MetaMorph sequentially aligns words of the corpus to form an MSA and then segments the MSA to produce morphological analyses. Over corpora that contain millions of unique word types, MetaMorph identifies morphemes at an F1 score below state-of-the-art performance. But when restricted to smaller sets of orthographically related words, MetaMorph outperforms the state-of-the-art ParaMor-Morfessor Union morphology induction system. Tested on 5,000 orthographically similar Hungarian word types, MetaMorph reaches 54.1% and ParaMor-Morfessor just 41.9%. Hence, we conclude that MSA is a promising approach for unsupervised morphology induction. Future research directions are discussed.
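To make the align-then-segment idea concrete, the following is a minimal sketch and not the MetaMorph algorithm itself. It aligns only two words with a standard Needleman-Wunsch global alignment and then splits them at the end of their shared leading columns. The function names `align_pair` and `split_on_shared_prefix`, the scoring values (match 1, mismatch -1, gap -1), and the prefix-based segmentation rule are all assumptions made for illustration; the abstract does not specify how MetaMorph scores alignments or segments the MSA.

```python
def align_pair(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two words (Needleman-Wunsch); '-' marks a gap.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            row_a.append(a[i - 1]); row_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append('-')
            i -= 1
        else:
            row_a.append('-'); row_b.append(b[j - 1])
            j -= 1
    return ''.join(reversed(row_a)), ''.join(reversed(row_b))


def split_on_shared_prefix(a, b):
    """Hypothetical toy segmentation: shared leading columns form the stem."""
    row_a, row_b = align_pair(a, b)
    k = 0
    while k < len(row_a) and row_a[k] == row_b[k]:
        k += 1
    # Gap symbols are dropped from the remainders to give plain suffixes.
    return row_a[:k], row_a[k:].replace('-', ''), row_b[k:].replace('-', '')


print(align_pair("walk", "walks"))
print(split_on_shared_prefix("walked", "walking"))
```

A full MSA would extend this pairwise step to many words at once, for example by aligning each new word against the columns built so far; that extension is left out here to keep the sketch short.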
Pages: 666 / +
Page count: 2