Classification of gasoline data obtained by gas chromatography using a piecewise alignment algorithm combined with feature selection and principal component analysis

Cited by: 138
Authors
Pierce, KM
Hope, JL
Johnson, KJ
Wright, BW
Synovec, RE
Affiliations
[1] Univ Washington, Dept Chem, Seattle, WA 98195 USA
[2] Pacific NW Natl Lab, Richland, WA 99352 USA
Keywords
alignment; gas chromatography; feature selection; principal component analysis; ANOVA; fuel; chemometrics
DOI
10.1016/j.chroma.2005.04.078
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
A fast and objective chemometric classification method is developed and applied to the analysis of gas chromatography (GC) data from five commercial gasoline samples. The gasoline samples serve as model mixtures; the focus is on the development and demonstration of the classification method. The method is based on objective retention time alignment (referred to as piecewise alignment) coupled with analysis of variance (ANOVA) feature selection prior to classification by principal component analysis (PCA) using optimal parameters. The degree-of-class-separation is used as a metric to objectively optimize the alignment and feature selection parameters using a suitable training set, thereby reducing user subjectivity, as well as to indicate the success of the PCA clustering and classification. The degree-of-class-separation is calculated using Euclidean distances between the PCA scores of a subset of the replicate runs from two of the five fuel types, i.e., the training set. The unaligned training set that was directly submitted to PCA had a low degree-of-class-separation (0.4), and the PCA scores plot for the raw training set combined with the raw test set failed to correctly cluster the five sample types. After submitting the training set to piecewise alignment, the degree-of-class-separation increased (1.2), but when the same alignment parameters were applied to the training set combined with the test set, the scores plot clustering still did not yield five distinct groups. Applying feature selection to the unaligned training set increased the degree-of-class-separation (4.8), but chemical variations were still obscured by retention time variation, and when the same feature selection conditions were used for the training set combined with the test set, only one of the five fuels was clustered correctly. However, piecewise alignment coupled with feature selection yielded a reasonably optimal degree-of-class-separation for the training set (9.2), and when the same alignment and ANOVA parameters were applied to the training set combined with the test set, the PCA scores plot correctly classified the gasoline fingerprints into five distinct clusters. © 2005 Elsevier B.V. All rights reserved.
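The pipeline described in the abstract (windowed retention time alignment, ANOVA feature selection, PCA, and a distance-based separation metric) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. All function names are hypothetical, the alignment is a simplified stand-in that shifts each window to maximize its correlation with a target chromatogram, and the separation metric is an assumed definition (centroid distance over pooled within-class spread), since the abstract only states that Euclidean distances between PCA scores are used.

```python
import numpy as np


def _corr(a, b):
    """Pearson correlation, returning 0.0 when either segment is flat."""
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])


def piecewise_align(sample, target, window, max_shift):
    """Simplified windowed alignment (an assumption, not the published method):
    each window of `sample` is shifted by the lag within +/- max_shift that
    maximizes its correlation with the same window of `target`."""
    sample = np.asarray(sample, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(sample)
    aligned = np.empty(n)
    # Try small shifts first so ties are resolved in favor of the smallest lag.
    lags = sorted(range(-max_shift, max_shift + 1), key=abs)
    for start in range(0, n, window):
        pos = np.arange(start, min(start + window, n))
        best_lag, best_r = 0, -np.inf
        for lag in lags:
            r = _corr(sample[np.clip(pos + lag, 0, n - 1)], target[pos])
            if r > best_r:
                best_lag, best_r = lag, r
        aligned[pos] = sample[np.clip(pos + best_lag, 0, n - 1)]
    return aligned


def anova_f_ratio(X, y):
    """One-way ANOVA F-ratio for every column (retention time point) of X."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    grand = X.mean(axis=0)
    between = sum((y == c).sum() * (X[y == c].mean(axis=0) - grand) ** 2
                  for c in classes)
    within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0)
                 for c in classes)
    df_between = len(classes) - 1
    df_within = len(y) - len(classes)
    # Small constant avoids division by zero for noise-free columns.
    return (between / df_between) / (within / df_within + 1e-12)


def pca_scores(X, n_components=2):
    """PCA scores of mean-centered data via singular value decomposition."""
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def class_separation(scores, y):
    """Assumed metric for exactly two classes: Euclidean distance between the
    class centroids divided by the pooled RMS distance of runs to their own
    centroid."""
    scores, y = np.asarray(scores, dtype=float), np.asarray(y)
    a, b = np.unique(y)
    A, B = scores[y == a], scores[y == b]
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    var_a = ((A - ca) ** 2).sum(axis=1).mean()
    var_b = ((B - cb) ** 2).sum(axis=1).mean()
    return np.linalg.norm(ca - cb) / np.sqrt(var_a + var_b)
```

Under these assumptions, a parameter search in the spirit of the abstract would align every training run to a chosen target with a candidate `window` and `max_shift`, keep only the columns whose F-ratio exceeds a candidate threshold, compute `pca_scores` on the reduced matrix, and retain the parameter combination that gives the largest `class_separation`.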
Pages: 101-110
Page count: 10