Statistical assessment of discriminative features for protein-coding and non coding cross-species conserved sequence elements

Cited by: 2
Authors
Creanza, Teresa M. [1 ,2 ]
Horner, David S. [2 ]
D'Addabbo, Annarita [1 ]
Maglietta, Rosalia [1 ]
Mignone, Flavio [3 ]
Ancona, Nicola [1 ]
Pesole, Graziano [4 ,5 ]
Affiliations
[1] CNR, Ist Studi Sistemi Intelligenti Automaz, I-70126 Bari, Italy
[2] Univ Milan, Dipartimento Sci Biomol & Biotecnol, Milan, Italy
[3] Univ Milan, Dipartimento Chim Strutturale & Stereochim Inorga, Milan, Italy
[4] Univ Bari, Dipartimento Biochim & Biol Mol, Bari, Italy
[5] CNR, Ist Tecnol Biomed, I-70126 Bari, Italy
Source
BMC BIOINFORMATICS | 2009 / Volume 10
Keywords
IDENTIFICATION; TOOL; REGIONS; SEARCH; MOUSE; BLAST; TAGS; RAT;
DOI
10.1186/1471-2105-10-S6-S2
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: The identification of protein-coding elements in sets of mammalian conserved elements is one of the major challenges in current molecular biology research. Many features have been proposed for automatically distinguishing coding from non-coding conserved sequences, which makes a systematic statistical assessment of their differences necessary. A comprehensive study should comprise an association study, i.e. a comparison of the distributions of the features in the two classes, and a prediction study in which the prediction accuracies of classifiers trained on single features and on groups of features are analyzed, conditional on the compared species and on the sequence lengths.
Results: In this paper we compared the distributions of a set of comparative and non-comparative features and evaluated the prediction accuracy of classifiers trained to discriminate sequence elements conserved among human, mouse and rat. The association study showed that the analyzed features differ statistically between the two classes. To study the influence of sequence length on feature performance, a predictive study was performed on different data sets composed of equal numbers of equally long coding and non-coding alignments, with increasing average length. We found that the most discriminant feature was a comparative measure indicating the proportion of synonymous nucleotide substitutions per synonymous site. Moreover, linear discriminant classifiers trained on comparative features in general outperformed classifiers based on intrinsic ones. Finally, the prediction accuracy of classifiers trained on comparative features increased significantly when intrinsic features were added to the set of input variables, independently of sequence length (Kolmogorov-Smirnov P-value <= 0.05).
Conclusion: We observed distinct and consistent patterns for individual and combined use of comparative and intrinsic classifiers, both with respect to different lengths of sequences/alignments and with respect to error rates in the classification of coding and non-coding elements. In particular, we noted that comparative features tend to be more accurate in the classification of coding sequences. This is likely related to the fact that such features capture deviations from strictly neutral evolution expected as a consequence of the characteristics of the genetic code.
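Illustrative note (not part of the published abstract): the abstract reports a Kolmogorov-Smirnov P-value when comparing classifier accuracies, and describes the association study as a comparison of feature distributions between the two classes without naming the test used there. As a minimal sketch of how two empirical distributions can be compared, the two-sample Kolmogorov-Smirnov statistic is the largest gap between the two empirical cumulative distribution functions. The function name `ks_statistic` and the feature values below are hypothetical and made up for illustration; they are not taken from the paper, and no P-value is computed.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical
    cumulative distribution functions (ECDFs) of the two samples.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")

    d = 0.0
    # The ECDFs only change at observed values, so the maximum gap
    # is attained at one of the pooled sample points.
    for x in a + b:
        ecdf_a = bisect_right(a, x) / len(a)  # fraction of a <= x
        ecdf_b = bisect_right(b, x) / len(b)  # fraction of b <= x
        d = max(d, abs(ecdf_a - ecdf_b))
    return d


# Hypothetical feature values for the two classes (illustration only).
coding = [0.10, 0.15, 0.20, 0.25]
noncoding = [0.60, 0.70, 0.80, 0.90]

print(ks_statistic(coding, noncoding))
```

A statistic near 1 means the two samples barely overlap, and a statistic near 0 means their empirical distributions are nearly identical. Turning the statistic into a P-value requires its null distribution, which is left out of this sketch.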
Pages: 12