Allerdictor: fast allergen prediction using text classification techniques

Cited by: 35
Authors
Dang, Ha X. [1]
Lawrence, Christopher B. [1, 2]
Affiliations
[1] Virginia Tech, Virginia Bioinformatics Institute, Blacksburg, VA 24061 USA
[2] Virginia Tech, Department of Biological Sciences, Blacksburg, VA 24061 USA
Keywords
WEB SERVER; PROTEINS; ALGORITHM; DATABASE; ASTHMA
DOI
10.1093/bioinformatics/btu004
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Accurately identifying and eliminating allergens from biotechnology-derived products are important for human health. From a biomedical research perspective, it is also important to identify allergens in sequenced genomes. Many allergen prediction tools have been developed in recent years. Although these tools have achieved certain levels of specificity, when applied to large-scale allergen discovery (e.g. at a whole-genome scale), they still yield many false positives and thus low precision (even at low recall) because the data are extremely skewed (allergens are rare). Moreover, the most accurate tools are relatively slow because they use protein sequence alignment to build feature vectors for allergen classifiers. Additionally, the current allergen prediction tools are publicly available only as web server implementations, without the capability of large batch submission. These weaknesses make large-scale allergen discovery ineffective and inefficient in the public domain.
Results: We developed Allerdictor, a fast and accurate sequence-based allergen prediction tool that models protein sequences as text documents and uses a support vector machine in text classification for allergen prediction. Test results on multiple highly skewed datasets demonstrated that Allerdictor predicted allergens with high precision over high recall at fast speed. For example, Allerdictor took only ~6 min on a single-core PC to scan the whole Swiss-Prot database of ~540 000 sequences and identified <1% of them as allergens.
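The idea of treating a protein sequence as a text document can be illustrated with a minimal sketch. The abstract does not say how Allerdictor defines its "words", so the overlapping k-mer tokenization, the value k = 3, and the function names `sequence_to_words` and `bag_of_words` below are all assumptions made for illustration only. The resulting word counts are the kind of feature vector a text classifier such as a support vector machine could be trained on; the classifier itself is not shown.

```python
from collections import Counter


def sequence_to_words(sequence, k=3):
    """Split a protein sequence into overlapping k-mers ("words").

    The k-mer scheme and k=3 are illustrative assumptions, not the
    tokenization confirmed by the abstract.
    """
    seq = sequence.strip().upper()
    # A sequence shorter than k yields no words.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def bag_of_words(sequence, k=3):
    """Count k-mer occurrences, giving a sparse bag-of-words vector."""
    return Counter(sequence_to_words(sequence, k))


if __name__ == "__main__":
    toy_sequence = "MKVLAAGMKV"  # made-up sequence, not a real allergen
    print(sequence_to_words(toy_sequence))
    print(bag_of_words(toy_sequence))
```

Because tokenizing and counting are linear in sequence length and need no alignment against a reference database, a representation of this kind is consistent with the speed advantage the abstract reports over alignment-based feature construction.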
Pages: 1120-1128
Page count: 9