Efficient technique of microarray missing data imputation using clustering and weighted nearest neighbour

Authors
Aditya Dubey
Akhtar Rasool
Affiliation
[1] Maulana Azad National Institute of Technology, Department of Computer Science & Engineering
Abstract
Most statistical methods in bioinformatics, particularly those for gene expression data classification, prognosis, and prediction, require a complete dataset. Gene sample values can be missing because of hardware failures, software failures, or manual mistakes, and missing data dramatically affect the analysis of the collected gene expression data. Missing values have therefore become a critical problem that calls for an efficient imputation algorithm. This paper proposes a technique that exploits the local similarity structure of the data, predicting missing values through clustering followed by a top K nearest neighbor approach. A similarity-based spectral clustering approach combined with K-means is used. The spectral clustering parameters, the cluster size, and the weighting factors are optimized first, and the missing values are then predicted. To impute the missing values within each cluster, the top K nearest neighbor approach uses weighted distances. The evaluation is carried out on numerous datasets from a variety of biological areas, with missing values experimentally inserted at rates from 5% to 25%. The imputation experiments use microarray gene expression datasets containing information on different cancers and tumors. Experimental results show that the proposed imputation technique makes more accurate predictions than other imputation procedures. The main contribution of this research is to show that local similarity-based techniques can be used for imputation even when datasets differ in dimensionality and characteristics.