Efficient technique of microarray missing data imputation using clustering and weighted nearest neighbour

Authors
Aditya Dubey
Akhtar Rasool
Affiliation
[1] Maulana Azad National Institute of Technology, Department of Computer Science & Engineering
Abstract
Most statistical methods in bioinformatics, particularly those for gene expression data classification, prognosis, and prediction, require a complete dataset. Gene sample values can be missing because of hardware failure, software failure, or manual mistakes, and such missing data strongly affects the analysis of the collected data. Handling it has therefore become a critical problem that calls for an efficient imputation algorithm. This paper proposes a technique that exploits the local similarity structure of the data and predicts missing values using clustering together with a top-K nearest neighbor approach. A similarity-based spectral clustering approach is combined with K-means. The spectral clustering parameters, cluster size, and weighting factors are optimized first, and the missing values are then predicted. To impute the missing values within each cluster, the top-K nearest neighbor approach uses a weighted distance. The evaluation is carried out on numerous datasets from a variety of biological areas, with experimentally inserted missing values ranging from 5% to 25%. The imputation experiments use microarray gene expression datasets containing information on different cancers and tumors. The experimental results indicate that the proposed technique predicts missing values more accurately than the other imputation procedures it was compared with. The main contribution of this research is to show that local similarity-based techniques can be used for imputation even when datasets vary in dimensionality and characteristics.
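The within-cluster imputation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes cluster labels have already been produced by an earlier clustering step, and the function name `weighted_knn_impute`, the inverse-distance weighting, and the distance computed over jointly observed columns are all assumptions, since the abstract does not give the exact weighting formula.

```python
import math


def weighted_knn_impute(rows, labels, k=2, eps=1e-9):
    """Fill None entries using the k nearest rows from the same cluster.

    rows   : list of lists of floats, with None marking a missing value
    labels : one cluster label per row (assumed to come from a prior
             clustering step, e.g. spectral clustering plus K-means)
    Weights are inverse distances; this choice is an assumption.
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # Donors: a different row in the same cluster with column j observed.
                if m == i or labels[m] != labels[i] or other[j] is None:
                    continue
                # Distance uses only the columns observed in both rows.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            if not candidates:
                continue  # leave the gap when the cluster offers no donor
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            # Closer donors receive larger weights.
            weights = [1.0 / (d + eps) for d, _ in nearest]
            total = sum(w * v for w, (_, v) in zip(weights, nearest))
            imputed[i][j] = total / sum(weights)
    return imputed


# Hypothetical toy data: three genes in cluster 0, one in cluster 1.
example_rows = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 5.0],
    [10.0, 20.0, 30.0],
]
example_labels = [0, 0, 0, 1]
filled = weighted_knn_impute(example_rows, example_labels, k=2)
```

Restricting donors to the same cluster is what makes the estimate "local": a gene in another cluster never contributes, however many neighbors are requested.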