Structured Matrix Completion with Applications to Genomic Data Integration

Cited by: 51
Authors
Cai, Tianxi [1]
Cai, T. Tony [1]
Zhang, Anru [1]
Affiliations
[1] Univ Penn, Dept Stat, Wharton Sch, Philadelphia, PA 19104 USA
Funding
US National Science Foundation;
Keywords
Constrained minimization; Genomic data integration; Low-rank matrix; Matrix completion; Singular value decomposition; Structured matrix completion; LOW-RANK MATRIX; MISSING VALUE ESTIMATION; GENE-EXPRESSION DATA; OVARIAN-CANCER; GENOTYPE IMPUTATION; PENALIZATION; ALGORITHM; MODEL;
DOI
10.1080/01621459.2015.1021005
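Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code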
020208 ; 070103 ; 0714 ;
Abstract
Matrix completion has attracted significant recent attention in many fields including statistics, applied mathematics, and electrical engineering. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. We provide theoretical justification for the proposed SMC method and derive lower bound for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite sample under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extent of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival. Supplementary materials for this article are available online.
Pages: 621 - 633
Number of pages: 13
Related papers
50 in total
  • [41] Matrix Completion, Counterfactuals, and Factor Analysis of Missing Data
    Bai, Jushan
    Ng, Serena
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2021, 116 (536) : 1746 - 1763
  • [42] FUNCTIONAL DATA ANALYSIS BY MATRIX COMPLETION
    Descary, Marie-Helene
    Panaretos, Victor M.
    ANNALS OF STATISTICS, 2019, 47 (01) : 1 - 38
  • [43] The em algorithm for kernel matrix completion with auxiliary data
    Tsuda, K
    Akaho, S
    Asai, K
    JOURNAL OF MACHINE LEARNING RESEARCH, 2004, 4 (01) : 67 - 81
  • [44] Data Poisoning Attacks on Graph Convolutional Matrix Completion
    Zhou, Qi
    Ren, Yizhi
    Xia, Tianyu
    Yuan, Lifeng
    Chen, Linqiang
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2019, PT II, 2020, 11945 : 427 - 439
  • [45] Graph integration of structured, semistructured and unstructured data for data journalism
    Anadiotis, Angelos Christos
    Balalau, Oana
    Conceicao, Catarina
    Galhardas, Helena
    Haddad, Mhd Yamen
    Manolescu, Ioana
    Merabti, Tayeb
    You, Jingmao
    INFORMATION SYSTEMS, 2022, 104
  • [46] Some Observations about Ramanujan Graphs With Applications to Matrix Completion
    Burnwal, Shantanu Prasad
    Vidyasagar, Mathukumalli
    2019 SIXTH INDIAN CONTROL CONFERENCE (ICC), 2019, : 403 - 406
  • [47] SMART 4.0: towards genomic data integration
    Letunic, I
    Copley, RR
    Schmidt, S
    Ciccarelli, FD
    Doerks, T
    Schultz, J
    Ponting, CP
    Bork, P
    NUCLEIC ACIDS RESEARCH, 2004, 32 : D142 - D144
  • [48] Genomic, Proteomic, and Metabolomic Data Integration Strategies
    Wanichthanarak, Wanjeera
    Fahrmann, Johannes F.
    Grapov, Dmitry
    BIOMARKER INSIGHTS, 2015, 10 : 1 - 6
  • [49] ALL classification—integration of genomic and cytogenetic data
    Alessia Errico
    Nature Reviews Clinical Oncology, 2014, 11 (8) : 440 - 440
  • [50] Genomic data integration using guided clustering
    Maneck, Matthias
    Schrader, Alexandra
    Kube, Dieter
    Spang, Rainer
    BIOINFORMATICS, 2011, 27 (16) : 2231 - 2238