An information-theoretic approach to single cell sequencing analysis

Cited by: 3
Authors
Casey, Michael J. [1, 2]
Fliege, Joerg [1]
Sanchez-Garcia, Ruben J. [1, 2, 3]
MacArthur, Ben D. [1, 2, 3, 4]
Affiliations
[1] Univ Southampton, Math Sci, Southampton, England
[2] Univ Southampton, Inst Life Sci, Southampton, England
[3] Alan Turing Inst, London, England
[4] Univ Southampton, Fac Med, Ctr Human Dev Stem Cells & Regenerat, Southampton, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
RNA-SEQ; INFERENCE; ALGORITHM; NOISE
DOI
10.1186/s12859-023-05424-8
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Single-cell sequencing (sc-Seq) experiments are producing increasingly large data sets. However, large data sets do not necessarily contain large amounts of information.
Results: Here, we formally quantify the information obtained from a sc-Seq experiment and show that it corresponds to an intuitive notion of gene expression heterogeneity. We demonstrate a natural relation between our notion of heterogeneity and that of cell type, decomposing heterogeneity into that component attributable to differential expression between cell types (inter-cluster heterogeneity) and that remaining (intra-cluster heterogeneity). We test our definition of heterogeneity as the objective function of a clustering algorithm, and show that it is a useful descriptor for gene expression patterns associated with different cell types.
Conclusions: Thus, our definition of gene heterogeneity leads to a biologically meaningful notion of cell type, as groups of cells that are statistically equivalent with respect to their patterns of gene expression. Our measure of heterogeneity, and its decomposition into inter- and intra-cluster components, is non-parametric, intrinsic, unbiased, and requires no additional assumptions about expression patterns. Based on this theory, we develop an efficient method for the automatic unsupervised clustering of cells from sc-Seq data, and provide an R package implementation.
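The abstract does not state the formula behind the heterogeneity measure, so the following is only an illustrative sketch under an assumption: that the heterogeneity of one gene is taken to be the Kullback-Leibler divergence between how its transcripts are actually spread across cells and a uniform spread. Under that assumption, the chain rule for relative entropy splits the total exactly into an inter-cluster and an intra-cluster part. The function names `heterogeneity` and `decompose` are hypothetical and are not the API of the authors' R package.

```python
import math
from collections import defaultdict


def heterogeneity(counts):
    """KL divergence (in nats) of a gene's transcript distribution across
    cells from the uniform distribution. Assumes sum(counts) > 0."""
    total = sum(counts)
    n = len(counts)
    # Cells with zero counts contribute nothing (0 * log 0 is taken as 0).
    return sum((x / total) * math.log(n * x / total) for x in counts if x > 0)


def decompose(counts, labels):
    """Split the total heterogeneity into (inter, intra) cluster parts
    for a given assignment of cells to clusters (assumed formulation)."""
    total = sum(counts)
    n = len(counts)
    groups = defaultdict(list)
    for x, label in zip(counts, labels):
        groups[label].append(x)

    inter = 0.0
    intra = 0.0
    for members in groups.values():
        mass = sum(members) / total  # share of transcripts in this cluster
        size = len(members) / n      # share of cells in this cluster
        if mass > 0:
            # Divergence of cluster-level transcript shares from cell shares.
            inter += mass * math.log(mass / size)
            # Heterogeneity remaining inside the cluster, weighted by its mass.
            intra += mass * heterogeneity(members)
    return inter, intra


if __name__ == "__main__":
    counts = [3, 0, 7, 1, 9, 2]   # made-up transcript counts for one gene
    labels = [0, 0, 1, 1, 2, 2]   # made-up cluster labels, one per cell
    inter, intra = decompose(counts, labels)
    print(heterogeneity(counts), inter + intra)
```

In this sketch, a gene expressed evenly in every cell has heterogeneity zero, and a clustering that captures all of a gene's variation leaves an intra-cluster part of zero. That is one way to read the abstract's idea of cell types as groups of cells that are statistically equivalent in their expression patterns.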
Pages: 24