Machine Learning Based Prediction of Enzymatic Degradation of Plastics Using Encoded Protein Sequence and Effective Feature Representation

Cited by: 14
Authors
Jiang, Renjing [1]
Shang, Lanyu [2]
Wang, Ruohan [1]
Wang, Dong [2]
Wei, Na [1]
Affiliations
[1] Univ Illinois, Dept Civil & Environm Engn, Urbana, IL 61801 USA
[2] Univ Illinois, Sch Informat Sci, Champaign, IL 61820 USA
Funding
U.S. National Science Foundation
Keywords
Machine learning; plastic waste; enzymatic degradation; enzyme function; sequence representation; HEAT-CAPACITY; PORE-SIZE; TECHNOLOGIES; DEPOLYMERASE; HYDROLYSIS; DIFFUSION; SUBSTRATE
DOI
10.1021/acs.estlett.3c00293
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline code
08; 0830
Abstract
Enzyme biocatalysis for plastic treatment and recycling is an emerging field of growing interest. However, identifying plastic-degrading enzymes with desirable functionality is challenging and time-consuming, given the large number of putative enzyme sequences. There is a critical need for an effective approach to accurately predict enzyme activity in degrading different types of plastics. In this study, we developed a machine-learning-based plastic enzymatic degradation (PED) framework. PED predicts the ability of an enzyme to degrade plastics of interest by exploring and recognizing hidden patterns in protein sequences. We created a data set that integrates information from a wide range of experimentally verified enzymes and various common plastic substrates. We also developed a new context-aware enzyme sequence representation (CESR) mechanism to learn the abundant contextual information in enzyme sequences. Features were extracted for enzymes at both the amino acid level and the global sequence level. Thirteen machine learning classification algorithms were compared, and XGBoost was identified as the best-performing algorithm. PED achieved an overall accuracy of 90.2% and outperformed sequence-based protein classification models from the existing literature. Furthermore, important enzyme features in plastic degradation were identified and comprehensively interpreted. This study demonstrates a new tool for the prediction and discovery of plastic-degrading enzymes.
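The abstract does not specify how CESR or the amino-acid-level and global-sequence-level features are computed. The sketch below is only a rough illustration of how a protein sequence can be turned into a fixed-length numeric vector for a classifier. It uses three generic, commonly used descriptors:

- amino acid composition;
- dipeptide (adjacent-pair) composition, as a crude stand-in for local sequence context;
- two global descriptors: sequence length and mean Kyte-Doolittle hydropathy.

This is not the authors' CESR mechanism. The function names (`aa_composition`, `dipeptide_composition`, `global_features`, `featurize`) are hypothetical.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale; used here only as an illustrative global descriptor.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def aa_composition(seq):
    """Amino-acid-level features: fraction of each standard residue (20 values)."""
    counts = Counter(seq)
    n = len(seq)
    return [counts[a] / n for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Hypothetical stand-in for context: frequency of each adjacent pair (400 values)."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(len(seq) - 1, 1)  # avoid division by zero for length-1 sequences
    return [pairs[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)]


def global_features(seq):
    """Global-sequence-level features: length and mean hydropathy (GRAVY)."""
    gravy = sum(KYTE_DOOLITTLE[a] for a in seq) / len(seq)
    return [float(len(seq)), gravy]


def featurize(seq):
    """Concatenate all descriptors into one 422-element vector (20 + 400 + 2)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return aa_composition(seq) + dipeptide_composition(seq) + global_features(seq)


if __name__ == "__main__":
    vec = featurize("ACDA")
    print(len(vec))  # 422
```

For a set of labeled enzymes, vectors like these could be stacked into a matrix and passed to a gradient-boosted classifier such as `xgboost.XGBClassifier`, from the XGBoost package, which the abstract reports as the best-performing algorithm. The paper's actual features, labels, and hyperparameters are not given in this record.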
Pages: 557-564
Number of pages: 8