Cluster analysis for the selection of potential discriminatory variables and the identification of subgroups in archaeometry

Cited by: 3
Authors
Lopez-Garcia, Pedro A. [1 ]
Argote, Denisse L. [2 ]
Affiliations
[1] Escuela Nacl Antropol Hist, Posgrad Arqueol, Perifer Sur Esq,Calle Zapote,Col Isidro Fabela, Mexico City, Mexico
[2] Inst Nacl Antropol & Hist, Direcc Estudios Arqueol, Tacuba 76,Colonia Ctr, Mexico City, Mexico
Keywords
Archaeological glass; High-dimensional data; Dimensionality reduction; Feature selection; Databionic Swarm; Data visualization; COMPOSITIONAL DATA-ANALYSIS; R PACKAGE; MODEL; GLASS; CLASSIFICATION; KNOWLEDGE; ANTWERP;
DOI
10.1016/j.jasrep.2023.104022
Chinese Library Classification (CLC) number
K85 [Cultural Relics and Archaeology];
Discipline classification code
0601 ;
Abstract
In this article, three variable selection methods based on Gaussian mixture models were compared to find a subset of variables that provided the "best" clustering. The use of an appropriate transformation for compositional data, whose geometric space is the Simplex, is emphasized. The comparison showed that the models can cluster data in multiple phases, and that selecting the relevant variables is more convenient than performing an analysis based on 2D plots or including all available variables simultaneously in a multivariate analysis. Once the informative variables for the clustering were obtained, we used a method called Databionic Swarm (DBS). This unsupervised machine learning method exploits emergence and swarm intelligence to find natural chemical groups in the input data space. DBS can visualize high-dimensional distances in the projection through a 3D topographic map with hypsometric tints. The results were compared in terms of accuracy, both in the selection of the variables and in the classification, using a supervised accuracy index for clustering and two unsupervised indexes (the heatmap and the silhouette plot). The concepts and methods were illustrated by applying them to two published archaeological glass data sets: the first consisted of 245 Romano-British glass vessels, and the second of 180 glass vessels from 15th-17th century Antwerp. In these applications, the variable selection methods increased the accuracy of the classification compared to traditional methods.
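The abstract stresses transforming compositional data before clustering but does not name the transformation. As a minimal sketch, assuming the centered log-ratio (clr) transform, one common choice for data on the simplex, the step could look like the following. The function names and the sample values are hypothetical and are not taken from the article.

```python
import math

def closure(parts):
    """Rescale strictly positive parts so they sum to 1 (a point on the simplex)."""
    total = sum(parts)
    return [p / total for p in parts]

def clr(parts):
    """Centered log-ratio: log of each part minus the mean of the logs.

    Assumption: clr is used here for illustration only; the article may
    rely on a different log-ratio transformation.
    """
    if any(p <= 0 for p in parts):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical oxide weight percentages for one glass sample (illustrative only).
sample = [70.0, 18.0, 7.0, 5.0]
coords = clr(closure(sample))
```

The clr coordinates sum to zero and do not depend on the scale of the input, so percentages and proportions give the same result. Clustering methods such as Gaussian mixture models would then operate on these coordinates rather than on the raw percentages.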
Pages: 22
Related papers
50 in total
  • [31] The identification of empirically derived cancer patient subgroups using psychosocial variables
    Trask, PC
    Griffith, KA
    JOURNAL OF PSYCHOSOMATIC RESEARCH, 2004, 57 (03) : 287 - 295
  • [32] SELECTION OF VARIABLES IN DISCRIMINANT ANALYSIS
    MERSCH, G
    ANNALES DE LA SOCIETE SCIENTIFIQUE DE BRUXELLES SERIES 1-SCIENCES MATHEMATIQUES ASTRONOMIQUES ET PHYSIQUES, 1973, 87 (03): : 299 - 309
  • [33] Analysis, identification and visualization of subgroups in genomics
    Voelkel, Gunnar
    Laban, Simon
    Fuerstberger, Axel
    Kuehlwein, Silke D.
    Ikonomi, Nensi
    Hoffmann, Thomas K.
    Brunner, Cornelia
    Neuberg, Donna S.
    Gaidzik, Verena
    Doehner, Hartmut
    Kraus, Johann M.
    Kestler, Hans A.
    BRIEFINGS IN BIOINFORMATICS, 2021, 22 (03)
  • [34] Cluster analysis using data from a survey of patients with asthma: Identification of asthma subgroups with history of exacerbations
    Miller, D. P.
    Li, H.
    Emmett, A.
    Sharma, S.
    Ortega, H. G.
    AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE, 2010, 181
  • [35] EHMTI-0056. Self-medication of headache: identification of subgroups of patients through cluster analysis
    E Mehuys
    K Paemeleire
    G Crombez
    T Van Hees
    T Christiaens
    L Van Bortel
    I Van Tongelen
    JP Remon
    K Boussery
    The Journal of Headache and Pain, 2014, 15
  • [36] Novel subgroups of adult-onset diabetes and their association with outcomes: a data-driven cluster analysis of six variables
    Ahlqvist, Emma
    Storm, Petter
    Karajamaki, Annemari
    Martinell, Mats
    Dorkhan, Mozhgan
    Carlsson, Annelie
    Vikman, Petter
    Prasad, Rashmi B.
    Aly, Dina Mansour
    Almgren, Peter
    Wessman, Ylva
    Shaat, Nael
    Spegel, Peter
    Mulder, Hindrik
    Lindholm, Eero
    Melander, Olle
    Hansson, Ola
    Malmqvist, Ulf
    Lernmark, Ake
    Lahti, Kaj
    Forsen, Tom
    Tuomi, Tiinamaija
    Rosengren, Anders H.
    Groop, Leif
    LANCET DIABETES & ENDOCRINOLOGY, 2018, 6 (05): : 361 - 369
  • [37] Integration of Cluster Analysis and Rock Physics for the Identification of Potential Hydrocarbon Reservoir
    Ali, Amjad
    Chen Sheng-Chang
    Shah, Munawar
    NATURAL RESOURCES RESEARCH, 2021, 30 (02) : 1395 - 1409
  • [38] Integration of Cluster Analysis and Rock Physics for the Identification of Potential Hydrocarbon Reservoir
    Amjad Ali
    Chen Sheng-Chang
    Munawar Shah
    Natural Resources Research, 2021, 30 : 1395 - 1409
  • [39] CLUSTATIS: Cluster analysis of blocks of variables
    Llobell, Fabien
    Qannari, El Mostafa
    ELECTRONIC JOURNAL OF APPLIED STATISTICAL ANALYSIS, 2020, 13 (02) : 436 - 453
  • [40] Who are the obese? A cluster analysis exploring subgroups of the obese
    Green, M. A.
    Strong, M.
    Razak, F.
    Subramanian, S. V.
    Relton, C.
    Bissell, P.
    JOURNAL OF PUBLIC HEALTH, 2016, 38 (02) : 258 - 264