The Difference Between the Accuracy of Real and the Corresponding Random Model is a Useful Parameter for Validation of Two-State Classification Model Quality

Cited by: 17

Authors:

Batista, Jadranko [1]

Vikic-Topic, Drazen [2]

Lucic, Bono [2]

Affiliations:

[1] Univ Mostar, Fac Sci & Educ, Mostar, Bosnia & Herceg

[2] Rudjer Boskovic Inst, NMR Ctr, POB 180, HR-10002 Zagreb, Croatia

Source:

CROATICA CHEMICA ACTA | 2016 / Volume 89 / Issue 04

Keywords:

classification model; Q(2) accuracy; overall classification accuracy; random classification accuracy; classification accuracy difference; correct class estimation; under-prediction; over-prediction; class imbalance; membrane structure modeling; QSAR classification modeling; PREDICTION; TOPOLOGY; PROTEINS; INDEXES;

DOI:

10.5562/cca3117

Chinese Library Classification number:

O6 [Chemistry];

Subject classification code:

0703 ;

Abstract:

The simplest and most commonly used measure for assessing the quality of a classification model is the parameter $Q_2 = 100\,(p + n)/N$ (%), called the classification accuracy, where p, n and N are the total number of correctly predicted compounds in the first class, the total number of correctly predicted compounds in the second class, and the total number of elements (compounds) in the data set, respectively. Moreover, the most probable accuracy that can be obtained by a random model is calculated for a two-state model by the formula $Q_{2,\mathrm{rnd}} = 100\,[(p + u)(p + o) + (n + u)(n + o)]/N^2$ (%), where u and o are the total numbers of under-predictions (class 1 predicted by the model as class 2) and over-predictions (class 2 predicted by the model as class 1) in the data set, respectively. Finally, the difference between these two parameters, $\Delta Q_2 = Q_2 - Q_{2,\mathrm{rnd}}$, is introduced, and it is suggested that $\Delta Q_2$ be computed and reported for each two-state classification model to assess its contribution over the accuracy of the corresponding random model. When the data set is ideally balanced, with the same number of elements in both classes, the two-state classification problem is the most difficult, with maximal $Q_2 = 100$ % and $Q_{2,\mathrm{rnd}} = 50$ %, giving the maximal $\Delta Q_2 = 50$ %. The usefulness of the $\Delta Q_2$ parameter is illustrated in a comparative analysis of two-class classification models from the literature for prediction of the secondary structure of membrane proteins, and of several quantitative structure-property models. The real contributions of these models over the random level of accuracy are calculated, and their $\Delta Q_2$ values are compared with each other and with the value of $\Delta Q_2$ (= 50 %) for the most difficult two-state classification model.
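As an illustration of the three quantities defined in the abstract, the following minimal sketch computes the classification accuracy, the random-model accuracy and their difference from the four confusion counts p, n, u and o. The names `TwoStateCounts`, `q2`, `q2_rnd` and `delta_q2`, and the example counts, are hypothetical and chosen only for this illustration; they do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoStateCounts:
    """Confusion counts of a two-state classifier (hypothetical helper type)."""
    p: int  # class 1 elements predicted correctly
    n: int  # class 2 elements predicted correctly
    u: int  # under-predictions: class 1 predicted as class 2
    o: int  # over-predictions: class 2 predicted as class 1

    @property
    def total(self) -> int:
        # N, the total number of elements in the data set
        return self.p + self.n + self.u + self.o


def q2(c: TwoStateCounts) -> float:
    """Classification accuracy: 100 (p + n) / N, in percent."""
    if c.total == 0:
        raise ValueError("data set is empty")
    return 100.0 * (c.p + c.n) / c.total


def q2_rnd(c: TwoStateCounts) -> float:
    """Random-model accuracy: 100 [(p+u)(p+o) + (n+u)(n+o)] / N^2, in percent."""
    if c.total == 0:
        raise ValueError("data set is empty")
    agree = (c.p + c.u) * (c.p + c.o) + (c.n + c.u) * (c.n + c.o)
    return 100.0 * agree / c.total ** 2


def delta_q2(c: TwoStateCounts) -> float:
    """Contribution of the real model over the corresponding random model."""
    return q2(c) - q2_rnd(c)


# Made-up balanced data set: 50 elements in each class.
balanced = TwoStateCounts(p=40, n=45, u=10, o=5)
# Made-up imbalanced data set: 10 elements in class 1, 90 in class 2.
imbalanced = TwoStateCounts(p=5, n=85, u=5, o=5)

for name, c in (("balanced", balanced), ("imbalanced", imbalanced)):
    print(f"{name}: Q2={q2(c):.1f}%  Q2,rnd={q2_rnd(c):.1f}%  dQ2={delta_q2(c):.1f}%")
```

With these made-up counts, the imbalanced example has the higher accuracy (90 % against 85 %) but a much smaller difference over the random model (8 % against 35 %), which is the kind of comparison the abstract argues for.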


Pages: 527-534

Number of pages: 8
