Multilevel Stochastic Optimization for Imputation in Massive Medical Data Records

Cited by: 1
Authors
Li, Wenrui [1]
Wang, Xiaoyu [1]
Sun, Yuetian [1]
Milanovic, Snezana [1, 2]
Kon, Mark [1]
Castrillon-Candas, Julio Enrique [1]
Affiliations
[1] Boston Univ, Dept Math & Stat, Boston, MA 02215 USA
[2] Sunov Pharmaceut, Marlborough, MA 01752 USA
Funding
National Science Foundation (USA);
Keywords
Covariance matrices; Optimization; Stochastic processes; Deep learning; Iterative methods; Costs; Big Data; Best linear unbiased predictor; computational applied mathematics; machine learning; massive datasets; numerical stability; APPROXIMATION; EQUATIONS; PDES;
DOI
10.1109/TBDATA.2023.3328433
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
It has long been a recognized problem that many datasets contain significant levels of missing numerical data. A potentially critical predicate for application of machine learning methods to datasets involves addressing this problem. However, this is a challenging task. In this article, we apply a recently developed multi-level stochastic optimization approach to the problem of imputation in massive medical records. The approach is based on computational applied mathematics techniques and is highly accurate. In particular, for the Best Linear Unbiased Predictor (BLUP) this multi-level formulation is exact, and is significantly faster and more numerically stable. This permits practical application of Kriging methods to data imputation problems for massive datasets. We test this approach on data from the National Inpatient Sample (NIS) data records, Healthcare Cost and Utilization Project (HCUP), Agency for Healthcare Research and Quality. Numerical results show that the multi-level method significantly outperforms current approaches and is numerically robust. It has superior accuracy as compared with methods recommended in the recent report from HCUP. Benchmark tests show up to 75% reductions in error. Furthermore, the results are also superior to recent state-of-the-art methods such as discriminative deep learning.
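For readers unfamiliar with the BLUP named in the abstract, the following is a minimal sketch of the textbook dense form (ordinary Kriging with an unknown constant mean) used to fill in missing values. It is not the multi-level formulation of the article, which this record does not describe in detail. The squared-exponential covariance, the `length_scale` and `nugget` parameters, and the function names `sq_exp_cov` and `blup_impute` are all assumptions chosen for illustration.

```python
import numpy as np


def sq_exp_cov(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between rows of a and b (assumed model)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2 / length_scale**2)


def blup_impute(x_obs, y_obs, x_miss, length_scale=1.0, nugget=1e-8):
    """Textbook BLUP (ordinary Kriging) of y at x_miss from observed (x_obs, y_obs).

    Dense O(n^3) solve; shown only to illustrate the predictor, not the
    multi-level method of the article.
    """
    n = len(x_obs)
    # Covariance among observed points; the small nugget aids conditioning.
    C = sq_exp_cov(x_obs, x_obs, length_scale) + nugget * np.eye(n)
    # Cross-covariance between observed and missing locations: shape (n, m).
    c0 = sq_exp_cov(x_obs, x_miss, length_scale)
    ones = np.ones(n)
    # Generalized least squares estimate of the constant mean.
    beta = (ones @ np.linalg.solve(C, y_obs)) / (ones @ np.linalg.solve(C, ones))
    # Kriging weights applied to the mean-centered observations.
    weights = np.linalg.solve(C, y_obs - beta * ones)
    return beta + c0.T @ weights


# Hypothetical usage: one covariate, four observed records, two missing ones.
x_obs = np.array([[0.0], [1.0], [2.0], [3.0]])
y_obs = np.array([1.0, 2.0, 0.5, 1.5])
x_miss = np.array([[0.5], [2.5]])
imputed = blup_impute(x_obs, y_obs, x_miss)
```

The dense solve costs on the order of n^3 operations and can become ill-conditioned as n grows, which is the kind of limitation the abstract says the multi-level formulation addresses.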
Pages: 122-131
Page count: 10