Embedded Cluster Modelling - A novel method for analysing embedded data sets

Cited by: 0
Authors
Worth, AP [1]
Cronin, MTD
Affiliations
[1] Commiss European Communities, Joint Res Ctr, European Ctr Validat Alternat Methods, Inst Hlth & Consumer Protect, I-21020 Ispra, VA, Italy
[2] Liverpool John Moores Univ, Sch Pharm & Chem, Liverpool L3 3AF, Merseyside, England
Source
Keywords
bootstrap resampling; cluster significance analysis; embedded cluster modelling; Minitab macro
DOI
10.1002/(SICI)1521-3838(199907)18:3<229::AID-QSAR229>3.0.CO;2-G
Chinese Library Classification number
R914 [Medicinal chemistry]
Discipline classification code
100701
Abstract
Cluster Significance Analysis (CSA) is a method for analysing embedded data sets, i.e. data sets in which the objects (chemicals) are divided into two classes (active/inactive or toxic/non-toxic) and in which one class of objects (typically, the active or toxic chemicals) is found to cluster along one or more variables (e.g. physicochemical descriptors), forming an 'embedded cluster' surrounded by the 'diffuse cluster' of objects in the other class (typically, the inactive or non-toxic chemicals). The aim of CSA is to identify variables along which clustering is statistically significant. Having identified significant variables, the investigator may wish to derive a model for classifying active and inactive chemicals on the basis of these variables. In this paper, a method called 'embedded cluster modelling' (ECM) is proposed for the derivation of such classification models. If ECM is applied to a single variable, the resulting model consists of two cut-off values (an upper and a lower limit) between which the active (toxic) chemicals are predicted to lie. If ECM is applied to two or more variables, the resulting model is best described as an 'elliptic model' of cluster membership, since the active (or toxic) chemicals are predicted to lie inside the boundary of a two-dimensional or three-dimensional ellipse, which is regarded as the boundary of the embedded cluster. The combined use of CSA and ECM for the analysis of embedded data sets is illustrated by their application to a data set of methacycline derivatives. The algorithms for CSA and ECM have been coded in the form of Minitab macros, which the authors are making freely available.
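The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Minitab macro, and the abstract does not say how the cut-off values or the ellipse are derived, so the model parameters below are simply supplied by hand. The significance test assumes one common formulation of CSA (comparing the mean squared distance among the actives with that of random subsets of the same size); the function names, the axis-aligned form of the ellipse, and the example data are all hypothetical.

```python
import random
from itertools import combinations


def mean_squared_distance(values):
    """Mean squared pairwise distance among one-dimensional values."""
    pairs = list(combinations(values, 2))
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def csa_p_value(actives, inactives, n_samples=2000, seed=0):
    """Estimate how often a random subset is at least as tight as the actives.

    Assumed formulation: draw random subsets of the same size as the active
    class from the pooled data and count those whose mean squared distance
    does not exceed the observed value for the actives.
    """
    rng = random.Random(seed)
    pooled = list(actives) + list(inactives)
    observed = mean_squared_distance(actives)
    hits = 0
    for _ in range(n_samples):
        subset = rng.sample(pooled, len(actives))
        if mean_squared_distance(subset) <= observed:
            hits += 1
    return hits / n_samples


def predict_interval(x, lower, upper):
    """One-variable model: active if x lies between the two cut-off values."""
    return lower <= x <= upper


def predict_ellipse(point, centre, semi_axes):
    """Multi-variable model: active if the point lies inside the ellipse.

    Assumes an axis-aligned ellipse, which is a simplification made here.
    """
    total = sum(((p - c) / a) ** 2 for p, c, a in zip(point, centre, semi_axes))
    return total <= 1.0


# Hypothetical descriptor values, invented for illustration only.
actives = [4.8, 5.0, 5.1, 5.3]
inactives = [0.5, 1.2, 2.0, 2.6, 3.1, 6.5, 7.0, 8.2, 9.0, 9.7]

p = csa_p_value(actives, inactives)
print("estimated p-value:", p)
print("interval model, x = 5.0:", predict_interval(5.0, 4.5, 5.5))
print("ellipse model, (6.0, 1.0):", predict_ellipse((6.0, 1.0), (5.0, 1.0), (0.5, 0.2)))
```

In this sketch a small p-value would mark the variable as one along which the actives cluster, and the interval or ellipse test then plays the role of the classification model built on such variables.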
Pages: 229-235
Page count: 7