Probabilistic count matrix factorization for single cell expression data analysis

Times cited: 23

Authors:

Durif, Ghislain [1, 2, 3]

Modolo, Laurent [1, 4, 5]

Mold, Jeff E. [5]

Lambert-Lacroix, Sophie [6]

Picard, Franck [1]

Affiliations:

[1] Univ Lyon 1, Univ Lyon, CNRS, LBBE UMR 5558, F-69622 Villeurbanne, France

[2] Univ Grenoble Alpes, INRIA, CNRS, Grenoble INP, LJK UMR 5224, F-38000 Grenoble, France

[3] Univ Montpellier, CNRS, IMAG UMR 5149, F-34090 Montpellier, France

[4] Univ Lyon 1, Univ Lyon, ENS Lyon, CNRS, LBMC UMR 5239, F-69007 Lyon, France

[5] Karolinska Inst, Dept Cell & Mol Biol, Stockholm, Sweden

[6] Univ Grenoble Alpes, CNRS, TIMC IMAG UMR 5525, F-38041 Grenoble, France

Source:

BIOINFORMATICS | 2019 / Vol. 35 / No. 20

Funding:

European Research Council;

Keywords:

GENE-EXPRESSION; REVEALS; SUBPOPULATIONS; HETEROGENEITY; POPULATION;

DOI:

10.1093/bioinformatics/btz177

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Subject classification codes:

071010; 081704;

Abstract:

Motivation: The development of high-throughput single-cell sequencing technologies now allows the investigation of the population diversity of cellular transcriptomes. The expression dynamics (gene-to-gene variability) can be quantified more accurately, thanks to the measurement of lowly expressed genes. In addition, cell-to-cell variability is high, with a low proportion of cells expressing the same genes at the same time or level. These emerging patterns are very challenging from a statistical point of view, especially when the goal is to provide a summarized view of single-cell expression data. Principal component analysis (PCA) is one of the most powerful tools for representing high-dimensional data: it searches for latent directions that capture most of the variability in the data. Unfortunately, classical PCA relies on Euclidean distances and projections, which perform poorly on over-dispersed count data with dropout events, such as single-cell expression data.

Results: We propose a probabilistic Count Matrix Factorization (pCMF) approach for single-cell expression data analysis that relies on a sparse Gamma-Poisson factor model. This hierarchical model is inferred using a variational EM algorithm, and it jointly builds low-dimensional representations of cells and genes. We show how this probabilistic framework induces a geometry that is suitable for single-cell data visualization and produces a compression of the data that is very powerful for clustering purposes. We compare our method against other standard representation methods such as t-SNE, and we illustrate its performance for the representation of single-cell expression data.
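To make the Gamma-Poisson factor model and its variational EM inference more concrete, here is a minimal sketch in NumPy. It is not the pCMF implementation: it omits the sparsity and dropout components, keeps the Gamma shape parameters fixed, and estimates only the prior rates in the M-step. The model assumed here is X_ij ~ Poisson(sum_k U_ik V_jk) with Gamma priors on the cell factors U and gene factors V; the E-step uses the standard Poisson-factorization trick of splitting each count across the K factors. The names `digamma` and `fit_gamma_poisson`, the default hyperparameters, and the simulated data are all illustrative assumptions.

```python
import numpy as np


def digamma(x):
    """Digamma function (recurrence + asymptotic series), to avoid a SciPy dependency."""
    x = np.array(x, dtype=float)
    result = np.zeros_like(x)
    # psi(x) = psi(x + 1) - 1/x: shift every positive entry up to x >= 6
    for _ in range(6):
        small = x < 6.0
        result = result - np.where(small, 1.0 / x, 0.0)
        x = x + small
    inv2 = 1.0 / (x * x)
    # Asymptotic expansion of psi(x) for large x
    return result + np.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def fit_gamma_poisson(X, K, a_shape=1.0, b_shape=1.0, n_iter=200, seed=0):
    """Hypothetical sketch: variational EM for X_ij ~ Poisson(sum_k U_ik V_jk),
    with priors U_ik ~ Gamma(a_shape, a_rate_k) and V_jk ~ Gamma(b_shape, b_rate_k)
    (shape/rate parameterization). No sparsity or dropout terms.
    Returns the variational posterior means E[U] (n x K) and E[V] (p x K)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Variational Gamma(shape, rate) parameters; random shapes break factor symmetry
    A, B = a_shape + rng.uniform(0.5, 1.5, (n, K)), np.ones((n, K))  # q(U)
    C, D = b_shape + rng.uniform(0.5, 1.5, (p, K)), np.ones((p, K))  # q(V)
    a_rate, b_rate = np.ones(K), np.ones(K)
    for _ in range(n_iter):
        # E-step (a): split each count X_ij over the K factors,
        # phi_ijk proportional to exp(E[log U_ik] + E[log V_jk])
        log_phi = (digamma(A) - np.log(B))[:, None, :] + (digamma(C) - np.log(D))[None, :, :]
        phi = np.exp(log_phi - log_phi.max(axis=2, keepdims=True))
        phi /= phi.sum(axis=2, keepdims=True)
        Z = X[:, :, None] * phi  # expected latent counts, shape (n, p, K)
        # E-step (b): conjugate Gamma updates for q(U), then q(V)
        A = a_shape + Z.sum(axis=1)
        B = a_rate + (C / D).sum(axis=0)  # rate is the same for every cell i
        C = b_shape + Z.sum(axis=0)
        D = b_rate + (A / B).sum(axis=0)  # rate is the same for every gene j
        # M-step: closed-form update of the prior rates (shapes held fixed)
        a_rate = a_shape / (A / B).mean(axis=0)
        b_rate = b_shape / (C / D).mean(axis=0)
    return A / B, C / D


# Usage on simulated counts drawn from the same (non-sparse) model
rng = np.random.default_rng(1)
U_true = rng.gamma(shape=2.0, scale=1.0, size=(100, 2))  # 100 "cells"
V_true = rng.gamma(shape=2.0, scale=1.0, size=(50, 2))   # 50 "genes"
X = rng.poisson(U_true @ V_true.T)
U_hat, V_hat = fit_gamma_poisson(X, K=2)
print(U_hat.shape, V_hat.shape)
```

The rows of `U_hat` play the role of a low-dimensional representation of cells and the rows of `V_hat` of genes. Storing the full n × p × K allocation array keeps the sketch short, but since zero counts contribute nothing to `Z`, a practical implementation would iterate only over the nonzero entries of a sparse count matrix.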


Pages: 4011–4019

Number of pages: 9
