Allerdictor: fast allergen prediction using text classification techniques

Cited by: 35
Authors
Dang, Ha X. [1]
Lawrence, Christopher B. [1,2]
Affiliations
[1] Virginia Tech, Virginia Bioinformat Inst, Blacksburg, VA 24061 USA
[2] Virginia Tech, Dept Biol Sci, Blacksburg, VA 24061 USA
Keywords
WEB SERVER; PROTEINS; ALGORITHM; DATABASE; ASTHMA
DOI
10.1093/bioinformatics/btu004
Chinese Library Classification (CLC)
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Accurately identifying and eliminating allergens from biotechnology-derived products is important for human health. From a biomedical research perspective, it is also important to identify allergens in sequenced genomes. Many allergen prediction tools have been developed in recent years. Although these tools have achieved certain levels of specificity, when applied to large-scale allergen discovery (e.g. at a whole-genome scale), they still yield many false positives and thus low precision (even at low recall) because of the extreme skewness of the data (allergens are rare). Moreover, the most accurate tools are relatively slow because they use protein sequence alignment to build feature vectors for allergen classifiers. Additionally, only web server implementations of the current allergen prediction tools are publicly available, and these lack the capability for large batch submission. These weaknesses make large-scale allergen discovery ineffective and inefficient in the public domain.

Results: We developed Allerdictor, a fast and accurate sequence-based allergen prediction tool that models protein sequences as text documents and uses a support vector machine for text classification to predict allergens. Test results on multiple highly skewed datasets demonstrated that Allerdictor predicted allergens with high precision at high recall and at fast speed. For example, Allerdictor took only ~6 min on a single-core PC to scan the whole Swiss-Prot database of ~540 000 sequences, and identified < 1% of them as allergens.
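The abstract states only that protein sequences are modeled as text documents and classified with a support vector machine. The sketch below illustrates that general idea and is not Allerdictor's actual implementation. It assumes that overlapping amino-acid k-mers play the role of "words" (a bag-of-words representation). It also substitutes a from-scratch Pegasos linear SVM (stochastic sub-gradient descent on the L2-regularized hinge loss) for whatever SVM software the authors used. The function names, k = 3, `lam`, `epochs`, and the toy sequences are all hypothetical choices for illustration.

```python
from collections import Counter
import random


def kmer_document(seq, k=3):
    """Treat a protein sequence as a 'document' whose 'words' are overlapping k-mers."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def normalize(vec):
    """Scale a sparse count vector to unit L2 norm so long proteins do not dominate."""
    norm = sum(v * v for v in vec.values()) ** 0.5
    return {t: v / norm for t, v in vec.items()} if norm else {}


def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_linear_svm(docs, labels, lam=0.01, epochs=50, seed=0):
    """Pegasos linear SVM (no bias term).

    docs: list of sparse feature dicts; labels: +1 (allergen) or -1 (non-allergen).
    """
    rng = random.Random(seed)
    w = {}
    t = 0
    order = list(range(len(docs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)          # decreasing step size
            x, y = docs[i], labels[i]
            margin = y * dot(w, x)         # margin under the current weights
            scale = 1.0 - eta * lam        # shrink step from the L2 regularizer
            for key in w:
                w[key] *= scale
            if margin < 1:                 # hinge loss is active: move toward y * x
                for key, v in x.items():
                    w[key] = w.get(key, 0.0) + eta * y * v
    return w


def predict(w, seq, k=3):
    """Return +1 (predicted allergen) or -1 (predicted non-allergen)."""
    score = dot(w, normalize(kmer_document(seq, k)))
    return 1 if score > 0 else -1


if __name__ == "__main__":
    # Hypothetical toy data: "allergens" use only C/G, "non-allergens" only A/K.
    pos = ["CGCGGCCGCG", "GGCCGGCCGC", "CCGCGCGGCC"]
    neg = ["AKAKKAAKAK", "KKAAKKAKAA", "AAKAKAKKAA"]
    docs = [normalize(kmer_document(s)) for s in pos + neg]
    w = train_linear_svm(docs, [1] * len(pos) + [-1] * len(neg))
    print(predict(w, "CGCGGCCGCGGG"), predict(w, "AKAKKAAKAKAA"))
```

A linear SVM suits this representation because k-mer vectors are high-dimensional and sparse, and because scoring a new sequence costs one sparse dot product with no alignment step. That property is consistent with the speed argument in the abstract. This toy version rescales every weight on each step for clarity, whereas a production implementation would typically keep a single scalar multiplier instead.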
Pages: 1120-1128
Number of pages: 9