Efficient technique of microarray missing data imputation using clustering and weighted nearest neighbour

Authors
Aditya Dubey
Akhtar Rasool
Affiliation
[1] Maulana Azad National Institute of Technology, Department of Computer Science & Engineering
Source
Scientific Reports, 2021, 11 (01)
Abstract
Most statistical methods in bioinformatics, particularly those for gene expression data classification, prognosis, and prediction, require a complete dataset. Gene sample values can be missing because of hardware failure, software failure, or manual mistakes, and such missing data substantially affects the analysis of the collected data. An efficient imputation algorithm is therefore needed. This paper proposes a technique that exploits the local similarity structure of the data, predicting missing values using clustering and a top-K nearest neighbor approach. A similarity-based spectral clustering approach is combined with K-means. The spectral clustering parameters, cluster size, and weighting factors are optimized first, and the missing values are then predicted. To impute the missing values in each cluster, the top-K nearest neighbor approach uses a weighted distance. The evaluation is carried out on numerous datasets from a variety of biological areas, with experimentally inserted missing values ranging from 5% to 25%. Experimental results show that the proposed technique predicts missing values more accurately than the other imputation procedures it was compared with. The imputation experiments use microarray gene expression datasets containing information on different cancers and tumors. The main contribution of this research is to show that local similarity-based techniques can be used for imputation even when datasets vary in dimensionality and characteristics.
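As a rough illustration of the weighted nearest-neighbor step described in the abstract, the sketch below fills each missing entry with a weighted mean of the K most similar rows. It is not the authors' implementation. The function names (`distance`, `impute_weighted_knn`, `mask_entries`), the inverse-distance weighting, and the overlap-normalized Euclidean distance are all assumptions made for illustration. The spectral clustering and K-means stage is omitted, so the whole matrix is treated as a single cluster; in a cluster-based scheme the same routine would presumably be run on the rows of each cluster separately.

```python
import math
import random


def distance(a, b):
    """Root mean squared difference over columns observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # rows with no common observed column are not comparable
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def impute_weighted_knn(matrix, k=3, eps=1e-9):
    """Return a copy of matrix with None entries filled by weighted k-NN.

    Assumption: weights are inverse distances, so closer rows count more.
    """
    filled = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours are the other rows that observe column j.
            candidates = [
                (distance(row, other), other[j])
                for r, other in enumerate(matrix)
                if r != i and other[j] is not None
            ]
            candidates = [c for c in candidates if math.isfinite(c[0])]
            nearest = sorted(candidates, key=lambda c: c[0])[:k]
            if not nearest:
                continue  # no usable neighbour: leave the entry missing
            weights = [1.0 / (d + eps) for d, _ in nearest]
            total = sum(w * v for w, (_, v) in zip(weights, nearest))
            filled[i][j] = total / sum(weights)
    return filled


def mask_entries(matrix, rate, seed=0):
    """Hide roughly `rate` of the entries at random, as in a masking experiment."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    hidden = rng.sample(cells, int(round(rate * len(cells))))
    masked = [list(row) for row in matrix]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


if __name__ == "__main__":
    # Toy expression matrix (rows = genes, columns = samples); values are made up.
    genes = [
        [1.0, 2.0, None],
        [1.1, 2.1, 3.0],
        [0.9, 1.9, 2.8],
        [10.0, 20.0, 30.0],
    ]
    print(impute_weighted_knn(genes, k=2))
```

A masking experiment of the kind the abstract mentions would call `mask_entries` on a complete matrix at rates such as 0.05 to 0.25, impute the result, and compare the filled values at the hidden cells with the originals.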
Related papers
  • [1] Efficient technique of microarray missing data imputation using clustering and weighted nearest neighbour
    Dubey, Aditya
    Rasool, Akhtar
    SCIENTIFIC REPORTS, 2021, 11 (01)
  • [2] Usage of Clustering and Weighted Nearest Neighbors for Efficient Missing Data Imputation of Microarray Gene Expression Dataset
    Dubey, Aditya
    Rasool, Akhtar
    ADVANCED THEORY AND SIMULATIONS, 2022, 5 (11)
  • [3] An Efficient Technique for Missing value Imputation in Microarray Gene Expression Data
    Valarmathie, P.
    Dinakaran, K.
    2014 IEEE INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND SYSTEMS (ICCCS'14), 2014, : 73 - 80
  • [4] MICROARRAY MISSING DATA IMPUTATION USING REGRESSION
    Bayrak, Tuncay
    Ogul, Hasan
    2017 13TH IASTED INTERNATIONAL CONFERENCE ON BIOMEDICAL ENGINEERING (BIOMED), 2017, : 68 - 73
  • [5] A cluster-directed framework for neighbour based imputation of missing value in microarray data
    Keerin, Phimmarin
    Kurutach, Werasak
    Boongoen, Tossapon
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2016, 15 (02) : 165 - 193
  • [6] An evaluation of k-nearest neighbour imputation using Likert data
    Jönsson, P
    Wohlin, C
    10TH INTERNATIONAL SYMPOSIUM ON SOFTWARE METRICS, PROCEEDINGS, 2004, : 108 - 118
  • [7] Missing data imputation by nearest-neighbor trained BP for fuzzy clustering
    Zhang, Li, 1600, Binary Information Press (11):
  • [8] Missing value imputation improves clustering and interpretation of gene expression microarray data
    Tuikkala, Johannes
    Elo, Laura L.
    Nevalainen, Olli S.
    Aittokallio, Tero
    BMC BIOINFORMATICS, 2008, 9 (1)
  • [9] Missing value imputation improves clustering and interpretation of gene expression microarray data
    Johannes Tuikkala
    Laura L Elo
    Olli S Nevalainen
    Tero Aittokallio
    BMC Bioinformatics, 9
  • [10] An adaptive k nearest neighbour method for imputation of missing traffic data based on two similarity metrics
    Wang Y.
    Xiao Y.
    Lai J.
    Chen Y.
    Archives of Transport, 2020, 54 (02) : 59 - 73