Morphological Analysis by Multiple Sequence Alignment

Authors
Tchoukalov, Tzvetan [1]
Monson, Christian [2]
Roark, Brian [2]
Affiliations
[1] Stanford University, Stanford, CA 94305, USA
[2] Oregon Health & Science University, Center for Spoken Language Understanding, Portland, OR, USA
Source
MULTILINGUAL INFORMATION ACCESS EVALUATION I: TEXT RETRIEVAL EXPERIMENTS | 2010, Vol. 6241
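Keywords
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract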
In biological sequence processing, Multiple Sequence Alignment (MSA) techniques capture information about long-distance dependencies and the three-dimensional structure of protein and nucleotide sequences without resorting to polynomial-complexity context-free models. MSA techniques have rarely been used in natural language (NL) processing, however, and never for NL morphology induction. Our MetaMorph algorithm is a first attempt at leveraging MSA techniques to induce NL morphology in an unsupervised fashion. Given a text corpus in any language, MetaMorph sequentially aligns the words of the corpus to form an MSA and then segments the MSA to produce morphological analyses. Over corpora that contain millions of unique word types, MetaMorph identifies morphemes at an F1 score below state-of-the-art performance. When restricted to smaller sets of orthographically related words, however, MetaMorph outperforms the state-of-the-art ParaMor-Morfessor Union morphology induction system: tested on 5,000 orthographically similar Hungarian word types, MetaMorph reaches 54.1% and ParaMor-Morfessor just 41.9%. Hence, we conclude that MSA is a promising technique for unsupervised morphology induction. Future research directions are discussed.
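The align-then-segment pipeline described in the abstract can be illustrated with a minimal sketch. This is not MetaMorph's actual procedure: the scoring function, the gap penalty, and the "conserved column" segmentation rule are toy assumptions, and the names `align_to_msa`, `build_msa`, and `segment` are hypothetical. Each word is aligned to the growing MSA with a standard global dynamic-programming alignment, and the MSA is then cut wherever columns switch between fully conserved and variable.

```python
from collections import Counter

GAP = "-"

def align_to_msa(rows, word, gap_penalty=-1.0):
    """Globally align `word` against an existing MSA (equal-length strings)."""
    n_rows, n_cols, n_chars = len(rows), len(rows[0]), len(word)
    columns = [Counter(row[c] for row in rows) for c in range(n_cols)]

    def match_score(c, ch):
        # Toy score (assumption): share of rows agreeing with `ch`, scaled to [-1, 1].
        return 2.0 * columns[c][ch] / n_rows - 1.0

    # best[i][j]: best score for the first i columns vs. the first j letters.
    best = [[0.0] * (n_chars + 1) for _ in range(n_cols + 1)]
    step = [[None] * (n_chars + 1) for _ in range(n_cols + 1)]
    for i in range(1, n_cols + 1):
        best[i][0], step[i][0] = best[i - 1][0] + gap_penalty, "col"
    for j in range(1, n_chars + 1):
        best[0][j], step[0][j] = best[0][j - 1] + gap_penalty, "chr"
    for i in range(1, n_cols + 1):
        for j in range(1, n_chars + 1):
            options = [
                (best[i - 1][j - 1] + match_score(i - 1, word[j - 1]), "both"),
                (best[i - 1][j] + gap_penalty, "col"),  # column faces a gap in the word
                (best[i][j - 1] + gap_penalty, "chr"),  # letter opens a new column
            ]
            best[i][j], step[i][j] = max(options, key=lambda o: o[0])

    # Trace back from the bottom-right corner to recover the alignment steps.
    ops, i, j = [], n_cols, n_chars
    while i > 0 or j > 0:
        op = step[i][j]
        ops.append(op)
        if op in ("both", "col"):
            i -= 1
        if op in ("both", "chr"):
            j -= 1
    ops.reverse()

    # Rebuild the old rows (padded with gaps where needed) plus the new row.
    new_rows, aligned, i, j = [""] * n_rows, "", 0, 0
    for op in ops:
        if op == "chr":
            new_rows = [r + GAP for r in new_rows]
        else:
            new_rows = [r + old[i] for r, old in zip(new_rows, rows)]
            i += 1
        if op == "col":
            aligned += GAP
        else:
            aligned += word[j]
            j += 1
    return new_rows + [aligned]

def build_msa(words):
    """Sequentially align each word to the MSA built so far."""
    rows = [words[0]]
    for word in words[1:]:
        rows = align_to_msa(rows, word)
    return rows

def segment(rows):
    """Toy rule (assumption): cut where columns switch between conserved and variable."""
    n_cols = len(rows[0])
    conserved = [
        len({row[c] for row in rows}) == 1 and rows[0][c] != GAP
        for c in range(n_cols)
    ]
    cuts = [c for c in range(1, n_cols) if conserved[c] != conserved[c - 1]]
    bounds = [0] + cuts + [n_cols]
    analyses = []
    for row in rows:
        pieces = [row[a:b].replace(GAP, "") for a, b in zip(bounds, bounds[1:])]
        analyses.append([p for p in pieces if p])
    return analyses

msa = build_msa(["walk", "walks", "walked", "walking"])
analyses = segment(msa)
for parts in analyses:
    print("+".join(parts))
```

On this toy input the sketch separates the shared stem walk from the endings s, ed, and ing. A realistic system would need a learned or tuned scoring scheme and a far more careful segmentation criterion than the all-or-nothing conservation test used here.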
Pages: 666 / +
Page count: 2