A Maximum-Likelihood Formulation and EM Algorithm for the Protein Multiple Alignment Problem

Cited by: 0
Authors
Sulimova, Valentina [1]
Razin, Nikolay [2]
Mottl, Vadim [3]
Muchnik, Ilya [4]
Kulikowski, Casimir [5]
Affiliations
[1] Tula State Univ, Lenine Ave 92, Tula 300600, Russia
[2] MIPT, Moscow 117303, Russia
[3] Ctr Comp, RAS, Moscow 119333, Russia
[4] Rutgers State Univ, DIMACS, New Brunswick, NJ 08901 USA
[5] Rutgers State Univ, Dept Comp Sci, New Brunswick, NJ 08901 USA
Source
PATTERN RECOGNITION IN BIOINFORMATICS | 2010 / Vol. 6282
Keywords
Multiple alignment problem; protein sequences analysis; EM-algorithm; HMM; common ancestor; SEQUENCE ALIGNMENT
DOI
Not available
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
A given group of protein sequences of different lengths is considered as resulting from random transformations of independent random ancestor sequences of the same preset smaller length, each produced in accordance with an unknown common probabilistic profile. We describe the process of transformation by a Hidden Markov Model (HMM) which is a direct generalization of the PAM model for amino acids. We formulate the problem of finding the maximum likelihood probabilistic ancestor profile and demonstrate its practicality. The proposed method of solving this problem allows for obtaining simultaneously the ancestor profile and the posterior distribution of its HMM, which permits efficient determination of the most probable multiple alignment of all the sequences. Results obtained on the BAliBASE 3.0 protein alignment benchmark indicate that the proposed method is generally more accurate than popular methods of multiple alignment such as CLUSTALW, DIALIGN and ProbAlign.
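The idea of estimating a fixed-length ancestor profile by EM can be illustrated with a deliberately simplified sketch. It is not the authors' HMM: it assumes a toy 4-letter alphabet, a made-up PAM-like substitution matrix, a uniform background for inserted letters, no deletions (every sequence is at least as long as the profile), and a uniform prior over alignments, so the likelihood is computed only up to a constant. All function names and parameters below are hypothetical.

```python
import math

ALPHABET = "ACDE"  # toy alphabet (assumption); real proteins use 20 amino acids
K = len(ALPHABET)

def substitution_matrix(p_same=0.85):
    """Toy PAM-like matrix S[a][x] = P(observed x | ancestor a); values are illustrative."""
    off = (1.0 - p_same) / (K - 1)
    return [[p_same if a == x else off for x in range(K)] for a in range(K)]

def forward_backward(seq, profile, S, background):
    """Return (lik, post, q); post[i][t] = P(profile column i is matched at position t | seq)."""
    n, L = len(profile), len(seq)
    # q[i][t]: probability that column i, after substitution, emits the letter at position t
    q = [[sum(profile[i][a] * S[a][seq[t]] for a in range(K)) for t in range(L)]
         for i in range(n)]
    # F[i][t]: likelihood of the first t letters with i profile columns already used
    F = [[0.0] * (L + 1) for _ in range(n + 1)]
    F[0][0] = 1.0
    for t in range(1, L + 1):
        b = background[seq[t - 1]]
        for i in range(n + 1):
            F[i][t] = F[i][t - 1] * b                         # letter t-1 is an insertion
            if i > 0:
                F[i][t] += F[i - 1][t - 1] * q[i - 1][t - 1]  # letter t-1 matches column i-1
    # B[i][t]: likelihood of letters t..L-1 given that i columns are already used
    B = [[0.0] * (L + 1) for _ in range(n + 1)]
    B[n][L] = 1.0
    for t in range(L - 1, -1, -1):
        b = background[seq[t]]
        for i in range(n + 1):
            B[i][t] = B[i][t + 1] * b
            if i < n:
                B[i][t] += q[i][t] * B[i + 1][t + 1]
    lik = F[n][L]
    post = [[F[i][t] * q[i][t] * B[i + 1][t + 1] / lik for t in range(L)]
            for i in range(n)]
    return lik, post, q

def em_step(seqs, profile, S, background):
    """One EM iteration; returns the updated profile and the log-likelihood of the old one."""
    n = len(profile)
    counts = [[0.0] * K for _ in range(n)]
    loglik = 0.0
    for seq in seqs:
        lik, post, q = forward_backward(seq, profile, S, background)
        loglik += math.log(lik)
        for i in range(n):
            for t, x in enumerate(seq):
                for a in range(K):
                    # E-step: expected count of ancestor letter a in column i
                    counts[i][a] += post[i][t] * profile[i][a] * S[a][x] / q[i][t]
    # M-step: normalize expected counts column by column
    new_profile = [[c / sum(row) for c in row] for row in counts]
    return new_profile, loglik

def fit_profile(seqs, n, iters=20):
    """Run EM from a uniform profile; seqs are strings over ALPHABET, each of length >= n."""
    S = substitution_matrix()
    background = [1.0 / K] * K
    profile = [[1.0 / K] * K for _ in range(n)]
    encoded = [[ALPHABET.index(ch) for ch in s] for s in seqs]
    logliks = []
    for _ in range(iters):
        profile, ll = em_step(encoded, profile, S, background)
        logliks.append(ll)
    return profile, logliks
```

For example, `fit_profile(["ACDE", "ACCDE", "AACDE", "ACDEE"], n=4)` returns a 4-column profile and the per-iteration log-likelihoods, which EM should never decrease. The `post` matrices from `forward_backward` play the role of the posterior alignment distribution from which a most probable alignment could be read off; unscaled probabilities are used here, so long sequences would need log-space or rescaled recursions.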
Pages: 171 / +
Number of pages: 3