Allerdictor: fast allergen prediction using text classification techniques

Cited by: 35
Authors
Dang, Ha X. [1]
Lawrence, Christopher B. [1, 2]
Affiliations
[1] Virginia Tech, Virginia Bioinformatics Institute, Blacksburg, VA 24061 USA
[2] Virginia Tech, Department of Biological Sciences, Blacksburg, VA 24061 USA
Keywords
WEB SERVER; PROTEINS; ALGORITHM; DATABASE; ASTHMA;
DOI
10.1093/bioinformatics/btu004
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Motivation: Accurately identifying and eliminating allergens from biotechnology-derived products is important for human health. From a biomedical research perspective, it is also important to identify allergens in sequenced genomes. Many allergen prediction tools have been developed in recent years. Although these tools have achieved certain levels of specificity, when applied to large-scale allergen discovery (e.g. at a whole-genome scale), they still yield many false positives and thus low precision (even at low recall) because the data are extremely skewed (allergens are rare). Moreover, the most accurate tools are relatively slow because they use protein sequence alignment to build feature vectors for allergen classifiers. Additionally, the current allergen prediction tools are publicly available only as web servers, which do not support large batch submission. These weaknesses make large-scale allergen discovery ineffective and inefficient in the public domain.
Results: We developed Allerdictor, a fast and accurate sequence-based allergen prediction tool that models protein sequences as text documents and uses a support vector machine, as in text classification, for allergen prediction. Test results on multiple highly skewed datasets demonstrated that Allerdictor predicted allergens with high precision at high recall and at fast speed. For example, Allerdictor took only ~6 min on a single-core PC to scan the whole Swiss-Prot database of ~540 000 sequences and identified <1% of them as allergens.
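Two ideas in the abstract can be made concrete with a minimal Python sketch. The abstract does not say how a protein sequence is turned into "words", so the overlapping k-mer tokenization below (with k = 3) is an assumption for illustration, not necessarily the scheme Allerdictor uses. The function names are hypothetical, and the sensitivity, specificity, and prevalence values are made-up numbers chosen only to show why rare positives depress precision.

```python
from collections import Counter


def kmer_tokens(sequence, k=3):
    """Split a protein sequence into overlapping k-mers ("words").

    Assumption: k-mers stand in for words; the abstract does not
    specify the tokenization actually used.
    """
    seq = sequence.strip().upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def bag_of_words(sequence, k=3):
    """Count k-mer occurrences: a sparse 'document' vector that a
    text classifier such as a linear SVM could take as input."""
    return Counter(kmer_tokens(sequence, k))


def precision(sensitivity, specificity, prevalence):
    """Precision implied by sensitivity, specificity, and the
    fraction of true positives in the data (prevalence)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Toy peptide, not a real allergen sequence.
    print(kmer_tokens("MKVLA"))  # ['MKV', 'KVL', 'VLA']
    print(bag_of_words("AAAA", k=2))

    # Hypothetical figures: 90% sensitivity, 99% specificity,
    # 1 allergen per 1000 proteins.
    print(round(precision(0.90, 0.99, 0.001), 4))  # 0.0826
```

With these hypothetical figures, a classifier that is 99% specific still yields only about 8% precision when 1 in 1000 proteins is an allergen, which is the skewness problem the Motivation describes.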
Pages: 1120-1128
Page count: 9