CSI: Clustered segment indexing for efficient approximate searching on the secondary structure of protein sequences

Authors
Seo, M [1]
Park, S [1]
Won, JI [1]
Affiliations
[1] Yonsei Univ, Dept Comp Sci, Seoul 120749, South Korea
Keywords
indexing method; secondary structure of proteins; approximate searching
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Approximate searching on the primary structure (i.e., the amino acid arrangement) of protein sequences is an essential part of predicting the functions and evolutionary histories of proteins. However, because evolutionarily distant proteins do not conserve their amino acid residue arrangements, approximate searching on the secondary structure of proteins is important for detecting distant homology. In this paper, we propose an indexing scheme for efficient approximate searching on the secondary structure of protein sequences that can be easily implemented in an RDBMS. By exploiting the concepts of clustering and lookahead, the proposed indexing scheme quickly processes three types of secondary structure queries: exact match, range match, and wildcard match. To evaluate the performance of the proposed method, we conducted extensive experiments using a set of actual protein sequences. According to the experimental results, the proposed method was faster than existing indexing methods by up to 6.3 times for exact match, 3.3 times for range match, and 1.5 times for wildcard match queries.
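The three query types named in the abstract can be illustrated with a small sketch. This is not the CSI index from the paper: it is a naive linear scan, and the segment model is an assumption made here for illustration. It assumes a secondary structure is a string over H (helix), E (strand), and L (loop), that runs of the same letter form segments, and that a query is a list of (type, min_length, max_length) predicates in which the type `?` matches any segment type. The names `to_segments` and `find_matches` are hypothetical.

```python
from itertools import groupby


def to_segments(ss):
    """Run-length encode a secondary-structure string into (type, length) segments.

    Assumed alphabet for this sketch: H = helix, E = strand, L = loop.
    """
    return [(kind, len(list(run))) for kind, run in groupby(ss)]


def find_matches(segments, query):
    """Return the start indices of consecutive segment windows that satisfy `query`.

    `query` is a list of (type, lo, hi) predicates (an assumed format):
      - exact match:    lo == hi, e.g. ("H", 4, 4)
      - range match:    lo <  hi, e.g. ("E", 2, 5)
      - wildcard match: type "?", which accepts any segment type
    Simplification: each predicate consumes exactly one segment.
    """
    hits = []
    m = len(query)
    for i in range(len(segments) - m + 1):
        window = segments[i:i + m]
        if all(
            (q_type == "?" or q_type == s_type) and lo <= s_len <= hi
            for (s_type, s_len), (q_type, lo, hi) in zip(window, query)
        ):
            hits.append(i)
    return hits


segments = to_segments("HHHHEEELLHHHH")
exact_hits = find_matches(segments, [("H", 4, 4), ("E", 3, 3)])
range_hits = find_matches(segments, [("E", 2, 5), ("L", 1, 3)])
wildcard_hits = find_matches(segments, [("H", 4, 4), ("?", 1, 5), ("L", 2, 2)])
```

A scan like this visits every segment window of every sequence. An index such as the one the abstract proposes would aim to avoid that cost; how it applies clustering and lookahead is not described in this record.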
Pages: 237-247
Page count: 11
Related papers
50 in total
  • [41] Fold recognition using predicted secondary structure sequences and hidden Markov models of protein folds
    Di Francesco, V
    Geetha, V
    Garnier, J
    Munson, PJ
    PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 1997, : 123 - 128
  • [42] Variable Length Character N-Gram Embedding of Protein Sequences for Secondary Structure Prediction
    Sharma, Ashish Kumar
    Srivastava, Rajeev
    PROTEIN AND PEPTIDE LETTERS, 2021, 28 (05): : 501 - 507
  • [43] Prediction of protein structural classes for low-homology sequences based on predicted secondary structure
    Yang, Jian-Yi
    Peng, Zhen-Ling
    Chen, Xin
    BMC BIOINFORMATICS, 2010, 11
  • [44] Side-chain conformations cooperatively restricted in protein secondary structure .1. A novel method for exhaustive structure searching
    Nakamura, H
    Tanimura, R
    Kidera, A
    PROCEEDINGS OF THE JAPAN ACADEMY SERIES B-PHYSICAL AND BIOLOGICAL SCIENCES, 1996, 72 (07): : 143 - 148
  • [45] An efficient three-Level parallel ABC algorithm for secondary structure prediction of complex RNA sequences
    Lalwani, Soniya
    Kumar, Rajesh
    APPLIED SOFT COMPUTING, 2021, 99 (99)
  • [46] Prediction of Domain Boundaries in Protein Sequences Using Predicted Secondary Structure and Physicochemical Properties of Amino Acids
    Chakraborty, Srija
    Das, Subhasish
    Chatterjee, Piyali
    2014 IEEE INTERNATIONAL CONFERENCE ON CIRCUIT, POWER AND COMPUTING TECHNOLOGIES (ICCPCT-2014), 2014, : 1022 - 1026
  • [47] An efficient tool for searching maximal and super maximal repeats in large DNA/protein sequences via induced-enhanced suffix array
    Kumar S.
    Agarwal S.
    Ranvijay
    Recent Patents on Computer Science, 2019, 12 (02) : 128 - 134
  • [48] A novel method of protein secondary structure prediction with high segment overlap measure: Support vector machine approach
    Hua, SJ
    Sun, ZR
    JOURNAL OF MOLECULAR BIOLOGY, 2001, 308 (02) : 397 - 407
  • [49] RCSB Protein Data Bank: Integrated Searching and Efficient Access to Macromolecular Structure Data from the PDB Archive
    Hudson, Brian
    Rose, Yana
    Duarte, Jose M.
    Lowe, Robert
    Bi, Chunxiao
    Bhikadiya, Charmi
    Chen, Li
    Bittrich, Sebastian
    Segura, Joan
    Burley, Stephen
    Westbrook, John
    Rose, Alexander S.
    ACTA CRYSTALLOGRAPHICA A-FOUNDATION AND ADVANCES, 2021, 77 : A253 - A253
  • [50] AdoMet radical proteins - from structure to evolution - alignment of divergent protein sequences reveals strong secondary structure element conservation
    Nicolet, Y
    Drennan, CL
    NUCLEIC ACIDS RESEARCH, 2004, 32 (13) : 4015 - 4025