Probabilistic count matrix factorization for single cell expression data analysis

Cited by: 23
Authors
Durif, Ghislain [1, 2, 3]
Modolo, Laurent [1, 4, 5]
Mold, Jeff E. [5]
Lambert-Lacroix, Sophie [6]
Picard, Franck [1]
Affiliations
[1] Univ Lyon 1, Univ Lyon, CNRS, LBBE UMR 5558, F-69622 Villeurbanne, France
[2] Univ Grenoble Alpes, INRIA, CNRS, Grenoble INP, LJK UMR 5224, F-38000 Grenoble, France
[3] Univ Montpellier, CNRS, IMAG UMR 5149, F-34090 Montpellier, France
[4] Univ Lyon 1, Univ Lyon, ENS Lyon, CNRS, LBMC UMR 5239, F-69007 Lyon, France
[5] Karolinska Inst, Dept Cell & Mol Biol, Stockholm, Sweden
[6] Univ Grenoble Alpes, CNRS, TIMC IMAG UMR 5525, F-38041 Grenoble, France
Funding
European Research Council
Keywords
GENE-EXPRESSION; REVEALS; SUBPOPULATIONS; HETEROGENEITY; POPULATION;
DOI
10.1093/bioinformatics/btz177
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The development of high-throughput single-cell sequencing technologies now allows the investigation of the population diversity of cellular transcriptomes. The expression dynamics (gene-to-gene variability) can be quantified more accurately, thanks to the measurement of lowly expressed genes. In addition, the cell-to-cell variability is high, with a low proportion of cells expressing the same genes at the same time or level. These emerging patterns are statistically challenging, especially when the goal is a summarized view of single-cell expression data. Principal component analysis (PCA) is a powerful tool for representing high-dimensional data: it searches for the latent directions that capture the most variability in the data. Unfortunately, classical PCA is based on Euclidean distance and projections, which work poorly in the presence of over-dispersed count data with dropout events, such as single-cell expression data.
Results: We propose a probabilistic Count Matrix Factorization (pCMF) approach for single-cell expression data analysis that relies on a sparse Gamma-Poisson factor model. This hierarchical model is inferred using a variational EM algorithm. It jointly builds a low-dimensional representation of cells and genes. We show how this probabilistic framework induces a geometry that is suitable for single-cell data visualization and produces a compression of the data that is very powerful for clustering purposes. We compare our method against other standard representation methods, such as t-SNE, and illustrate its performance for the representation of single-cell expression data.
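To make the kind of data described in the abstract concrete, the following minimal Python sketch simulates a count matrix from a generic Gamma-Poisson factor model with dropout events. It is an illustration only, not the authors' pCMF implementation: the function name `simulate_gamma_poisson_counts`, the Gamma shape and scale values, the matrix sizes, and the independent Bernoulli dropout mechanism are all assumptions chosen for the example, and the sketch covers data generation only, not the variational EM inference.

```python
import numpy as np


def simulate_gamma_poisson_counts(n_cells=100, n_genes=50, n_factors=2,
                                  dropout_prob=0.3, seed=0):
    """Simulate counts from a generic Gamma-Poisson factor model (hypothetical helper).

    All hyperparameters below are illustrative assumptions, not values
    taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Non-negative latent factors for cells (U) and genes (V), drawn from Gamma priors.
    U = rng.gamma(shape=2.0, scale=1.0, size=(n_cells, n_factors))
    V = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_factors))
    # The Poisson rate of each (cell, gene) entry is the inner product of their factors.
    rate = U @ V.T
    counts = rng.poisson(rate)
    # Dropout events: each entry is independently set to zero with probability dropout_prob.
    keep = rng.random((n_cells, n_genes)) >= dropout_prob
    X = counts * keep
    return X, U, V


X, U, V = simulate_gamma_poisson_counts()
zero_fraction = float((X == 0).mean())
print(X.shape, zero_fraction)
```

The resulting matrix `X` holds non-negative integer counts with many zeros. Data of this kind is over-dispersed and zero-inflated, which is the setting where the abstract argues that Euclidean projections such as classical PCA work poorly.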
Pages: 4011-4019
Number of pages: 9