Categorical missing data imputation for software cost estimation by multinomial logistic regression

Cited by: 45
Authors
Sentas, P [1 ]
Angelis, L [1 ]
Affiliation
[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece
Keywords
software effort prediction; cost estimation; missing data; imputation; multinomial logistic regression;
DOI
10.1016/j.jss.2005.02.026
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
A common problem in software cost estimation is the manipulation of incomplete or missing data in databases used for the development of prediction models. In such cases, the most popular and simple method of handling missing data is to ignore either the projects or the attributes with missing observations. This technique causes the loss of valuable information and therefore may lead to inaccurate cost estimation models. On the other hand, there are various imputation methods used to estimate the missing values in a data set. These methods are applied mainly on numerical data and produce continuous estimates. However, it is well known that the majority of the cost data sets contain software projects with mostly categorical attributes with many missing values. It is therefore reasonable to use some estimating method producing categorical rather than continuous values. The purpose of this paper is to investigate the possibility of using such a method for estimating categorical missing values in software cost databases. Specifically, the method known as multinomial logistic regression (MLR) is suggested for imputation and is applied on projects of the ISBSG multi-organizational software database. Comparisons of MLR with other techniques for handling missing data, such as listwise deletion (LD), mean imputation (MI), expectation maximization (EM) and regression imputation (RI) under different patterns and percentages of missing data, show the high efficiency of the proposed method. © 2005 Elsevier Inc. All rights reserved.
Pages: 404-414
Number of pages: 11