A Robust TabNet-Based Multi-Classification Algorithm for Infrared Spectral Data of Chinese Herbal Medicine with High-Dimensional Small Samples

Cited by: 1
Authors
Wang, Yongjun [1]
Jin, Chengliang [2]
Ma, Li [3]
Liu, Xiao [4]
Affiliations
[1] Wenzhou Polytech, Sch Artificial Intelligence, Wenzhou 325035, Peoples R China
[2] Wenzhou Business Coll, Sch Informat Engn, Wenzhou 325035, Peoples R China
[3] Shanghai JianQiao Univ, Coll Informat Technol, Shanghai, Peoples R China
[4] Wenzhou Hosp Tradit Chinese Med, Dept Rehabil, Wenzhou 325000, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Origins identification; Infrared spectroscopic data; High dimension; TabNet; Small sample size; FEATURE-SELECTION; FEATURES;
DOI
10.1016/j.jpba.2024.116031
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Subject classification codes
070302; 081704
Abstract
Robust classification algorithms for high-dimensional, small-sample datasets are valuable in practical applications. Working with an infrared spectroscopic dataset of 568 samples and 3448 wavelengths (features) used to identify the origins of Chinese medicinal materials, this paper proposes a novel embedded multi-classification algorithm, ITabNet, derived from the TabNet framework. First, a refined data pre-processing (DP) mechanism was designed to efficiently select the best-suited of 50 DP methods with the help of a Support Vector Machine (SVM). Next, an innovative focal loss function was designed and combined with a cross-validation experiment strategy to mitigate the impact of sample imbalance on the algorithm. Detailed investigations of ITabNet were conducted, including comparisons of ITabNet with SVM under DP and non-DP conditions and under GPU and CPU computer settings, as well as a comparison of ITabNet against XGBT (Extreme Gradient Boosting). The numerical results demonstrate that ITabNet can significantly improve prediction effectiveness. The best accuracy score is 1.0000, and the best Area Under the Curve (AUC) score is 1.0000. Suggestions on how to use the models effectively are given. Furthermore, ITabNet shows potential for application to the analysis of the medicinal efficacy and chemical composition of medicinal materials. The paper also provides ideas for multi-classification modeling of data with small sample sizes and high-dimensional features.
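The abstract mentions a focal loss designed to reduce the effect of class imbalance but does not give its form. As a rough illustration only, the sketch below implements the standard multi-class focal loss, FL = -alpha_t * (1 - p_t)^gamma * log(p_t), which is an assumed stand-in and not necessarily the variant used in ITabNet. The function names, the default gamma of 2.0, and the optional per-class weights are all hypothetical choices made for this example.

```python
import math
from typing import Optional, Sequence


def softmax(logits: Sequence[float]) -> list:
    """Numerically stable softmax over one sample's class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(
    logits: Sequence[float],
    target: int,
    gamma: float = 2.0,  # assumed focusing parameter, not taken from the paper
    class_weights: Optional[Sequence[float]] = None,  # assumed per-class alpha
) -> float:
    """Standard focal loss for a single sample (illustrative stand-in)."""
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    alpha_t = 1.0 if class_weights is None else class_weights[target]
    # (1 - p_t)^gamma shrinks the loss of samples that are already well classified
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(
    batch_logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    gamma: float = 2.0,
    class_weights: Optional[Sequence[float]] = None,
) -> float:
    """Average focal loss over a batch of samples."""
    losses = [
        focal_loss(row, t, gamma, class_weights)
        for row, t in zip(batch_logits, targets)
    ]
    return sum(losses) / len(losses)
```

With gamma set to 0 and no class weights, this reduces to ordinary cross-entropy; larger gamma values put relatively more weight on hard or minority-class samples.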
Pages: 13