Efficient technique of microarray missing data imputation using clustering and weighted nearest neighbour

Authors
Aditya Dubey
Akhtar Rasool
Affiliation
[1] Maulana Azad National Institute of Technology, Department of Computer Science & Engineering
DOI
Not available
Abstract
Most statistical methods in bioinformatics, particularly those for gene expression data classification, prognosis, and prediction, require a complete dataset. Gene sample values can be missing because of hardware failure, software failure, or manual mistakes, and such missing data substantially affects the analysis of the collected data. An efficient imputation algorithm is therefore needed. This paper proposes a technique that exploits the local similarity structure of the data, predicting missing values with a combination of clustering and a top-K nearest neighbor approach. A similarity-based spectral clustering approach is combined with K-means. The spectral clustering parameters, cluster size, and weighting factors are optimized before the missing values are predicted. Within each cluster, the top-K nearest neighbor approach imputes missing values using a weighted distance. The technique is evaluated on microarray gene expression datasets covering different cancers and tumors, drawn from a variety of biological areas, with missing values experimentally inserted at rates from 5% to 25%. The experimental results show that the proposed technique predicts missing values more accurately than the other imputation procedures it is compared with. The main contribution of this research is to show that local similarity-based techniques can be used for imputation even when datasets vary in dimensionality and characteristics.
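The per-cluster imputation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cluster labels are taken as given (the spectral clustering and K-means stage is not reproduced), inverse-distance weighting is an assumption standing in for the "weighted distance" the abstract mentions, and the function name `weighted_knn_impute` and its parameters `k` and `eps` are hypothetical.

```python
import math


def weighted_knn_impute(rows, labels, k=2, eps=1e-9):
    """Fill None entries from the k nearest rows of the same cluster.

    Assumption: neighbours are weighted by inverse distance; the source
    only states that a weighted distance is used.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # Only rows in the same cluster with column j observed qualify.
                if m == i or labels[m] != labels[i] or other[j] is None:
                    continue
                # Root-mean-square distance over columns observed in both rows.
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum(shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                weights = [1.0 / (d + eps) for d, _ in nearest]
                total = sum(w * v for w, (_, v) in zip(weights, nearest))
                filled[i][j] = total / sum(weights)
    return filled


# Toy expression matrix: rows are genes, columns are samples (made-up values).
demo_rows = [
    [0.0, 2.0, 2.0],
    [1.0, 2.0, None],    # missing value to impute
    [2.0, 2.0, 4.0],
    [1.0, 2.0, 100.0],   # identical on observed columns, but in another cluster
]
demo_labels = [0, 0, 0, 1]
demo_filled = weighted_knn_impute(demo_rows, demo_labels, k=2)
```

In this toy input the last row matches the incomplete row exactly on the observed columns, yet it is ignored because it carries a different cluster label. That restriction to within-cluster neighbours is the local-similarity idea the abstract describes.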