A Robust TabNet-Based Multi-Classification Algorithm for Infrared Spectral Data of Chinese Herbal Medicine with High-Dimensional Small Samples

Cited by: 1
Authors
Wang, Yongjun [1]
Jin, Chengliang [2]
Ma, Li [3]
Liu, Xiao [4]
Affiliations
[1] Wenzhou Polytech, Sch Artificial Intelligence, Wenzhou 325035, Peoples R China
[2] Wenzhou Business Coll, Sch Informat Engn, Wenzhou 325035, Peoples R China
[3] Shanghai JianQiao Univ, Coll Informat Technol, Shanghai, Peoples R China
[4] Wenzhou Hosp Tradit Chinese Med, Dept Rehabil, Wenzhou 325000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Origins identification; Infrared spectroscopic data; High dimension; TabNet; Small sample size; FEATURE-SELECTION; FEATURES;
DOI
10.1016/j.jpba.2024.116031
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry];
Discipline classification code
070302 ; 081704 ;
Abstract
Robust classification algorithms for high-dimensional, small-sample datasets are valuable in practical applications. Working with an infrared spectroscopic dataset of 568 samples and 3448 wavelengths (features) for identifying the origins of Chinese medicinal materials, this paper proposes a novel embedded multi-classification algorithm, ITabNet, derived from the TabNet framework. First, a refined data pre-processing (DP) mechanism was designed to efficiently select the best-suited of 50 DP methods with the help of a Support Vector Machine (SVM). Next, an innovative focal loss function was designed and combined with a cross-validation experimental strategy to mitigate the impact of sample imbalance on the algorithm. Detailed investigations of ITabNet were conducted, including comparisons of ITabNet with SVM under DP and non-DP conditions and under GPU and CPU settings, as well as of ITabNet against XGBT (Extreme Gradient Boosting). The numerical results demonstrate that ITabNet can significantly improve prediction effectiveness. The best accuracy score is 1.0000, and the best Area Under the Curve (AUC) score is 1.0000. Suggestions on how to use the models effectively are given. Furthermore, ITabNet shows potential for application to the analysis of the medicinal efficacy and chemical composition of medicinal materials. The paper also provides ideas for multi-classification modeling of data with small sample sizes and high-dimensional features.
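The abstract does not specify the form of the focal loss used in ITabNet. As a rough illustration of the general idea only, the sketch below implements the standard multi-class focal loss, which down-weights well-classified samples so that hard or minority-class samples contribute more to the loss. The function name `focal_loss`, the default `gamma=2.0`, and the optional per-class `alpha` weights are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0, alpha=None, eps=1e-12):
    """Standard multi-class focal loss (illustrative, not the paper's variant).

    probs  : (n_samples, n_classes) predicted class probabilities
    labels : (n_samples,) integer class indices
    gamma  : focusing parameter; gamma = 0 reduces to plain cross-entropy
    alpha  : optional (n_classes,) per-class weights (hypothetical choice)
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Probability assigned to the true class of each sample.
    p_t = probs[np.arange(len(labels)), labels]
    p_t = np.clip(p_t, eps, 1.0)
    # Modulating factor: close to 0 for confident correct predictions.
    weights = (1.0 - p_t) ** gamma
    if alpha is not None:
        weights = weights * np.asarray(alpha, dtype=float)[labels]
    return float(np.mean(-weights * np.log(p_t)))

# Example inputs: two samples, three classes (made-up probabilities).
example_probs = [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]]
example_labels = [0, 1]
loss_focal = focal_loss(example_probs, example_labels, gamma=2.0)
loss_ce = focal_loss(example_probs, example_labels, gamma=0.0)
```

With `gamma=0.0` the function reduces to ordinary cross-entropy, so comparing `loss_focal` with `loss_ce` shows how much the easy samples are discounted.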
Pages: 13
Related papers
37 in total
  • [1] A classification method for high-dimensional imbalanced multi-classification data
    Li, Mengmeng
    Zheng, Qibin
    Liu, Yi
    Li, Gengsong
    Qin, Wei
    Ren, Xiaoguang
    ELECTRONICS LETTERS, 2023, 59 (20)
  • [2] Multi-classification for high-dimensional data using probabilistic neural networks
    Li, Jingyi
    Chao, Xiaojie
    Xu, Qin
    JOURNAL OF RADIATION RESEARCH AND APPLIED SCIENCES, 2022, 15 (02) : 111 - 118
  • [3] Decision theory classification of high-dimensional vectors based on small samples
    Bradshaw, David
    Pensky, Marianna
    TEST, 2008, 17 (01) : 83 - 100
  • [4] Decision theory classification of high-dimensional vectors based on small samples
    David Bradshaw
    Marianna Pensky
    TEST, 2008, 17 : 83 - 100
  • [5] Classification of high-dimensional imbalanced biomedical data based on spectral clustering SMOTE and marine predators algorithm
    Qin X.
    Zhang S.
    Dong X.
    Shi H.
    Yuan L.
    Journal of Intelligent and Fuzzy Systems, 2024, 46 (04) : 8709 - 8728
  • [6] Classifier for Chinese traditional medicine with high-dimensional and small sample-size data
    Zhang, LX
    Zhao, YN
    Yang, ZH
    Wang, JX
    Cai, SQ
    Liu, HY
    PROCEEDINGS OF THE 4TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-4, 2002, : 330 - 334
  • [7] The classification method based on evolutionary algorithm for high-dimensional imbalanced missing data
    Liu, Yi
    Li, Gengsong
    Li, Xiang
    Qin, Wei
    Zheng, Qibin
    Ren, Xiaoguang
    ELECTRONICS LETTERS, 2023, 59 (12)
  • [8] A depth-based nearest neighbor algorithm for high-dimensional data classification
    Harikumar S.
    Aravindakshan Savithri A.
    Kaimal R.
    Turkish Journal of Electrical Engineering and Computer Sciences, 2019, 27 (06) : 4082 - 4101
  • [9] Defining and Evaluating Classification Algorithm for High-Dimensional Data Based on Latent Topics
    Luo, Le
    Li, Li
    PLOS ONE, 2014, 9 (01)
  • [10] A depth-based nearest neighbor algorithm for high-dimensional data classification
    Harikumar, Sandhya
    Aravindakshan Savithri, Akhil
    Kaimal, Ramachandra
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2019, 27 (06) : 4082 - 4101