An information-theoretic approach to single cell sequencing analysis

Cited by: 3
Authors
Casey, Michael J. [1, 2]
Fliege, Joerg [1]
Sanchez-Garcia, Ruben J. [1, 2, 3]
MacArthur, Ben D. [1, 2, 3, 4]
Affiliations
[1] Univ Southampton, Math Sci, Southampton, England
[2] Univ Southampton, Inst Life Sci, Southampton, England
[3] Alan Turing Inst, London, England
[4] Univ Southampton, Fac Med, Ctr Human Dev Stem Cells & Regenerat, Southampton, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
RNA-SEQ; INFERENCE; ALGORITHM; NOISE;
DOI
10.1186/s12859-023-05424-8
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Single-cell sequencing (sc-Seq) experiments are producing increasingly large data sets. However, large data sets do not necessarily contain large amounts of information.

Results: Here, we formally quantify the information obtained from a sc-Seq experiment and show that it corresponds to an intuitive notion of gene expression heterogeneity. We demonstrate a natural relation between our notion of heterogeneity and that of cell type, decomposing heterogeneity into that component attributable to differential expression between cell types (inter-cluster heterogeneity) and that remaining (intra-cluster heterogeneity). We test our definition of heterogeneity as the objective function of a clustering algorithm, and show that it is a useful descriptor for gene expression patterns associated with different cell types.

Conclusions: Thus, our definition of gene heterogeneity leads to a biologically meaningful notion of cell type, as groups of cells that are statistically equivalent with respect to their patterns of gene expression. Our measure of heterogeneity, and its decomposition into inter- and intra-cluster, is non-parametric, intrinsic, unbiased, and requires no additional assumptions about expression patterns. Based on this theory, we develop an efficient method for the automatic unsupervised clustering of cells from sc-Seq data, and provide an R package implementation.
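The abstract does not state the formula behind its heterogeneity measure. As a minimal sketch, assume that a gene's heterogeneity is the Kullback-Leibler divergence between the observed distribution of its transcripts across cells and the uniform distribution. Under that assumption, the chain rule for relative entropy gives an exact split into an inter-cluster and an intra-cluster term. The function names `heterogeneity` and `decompose` are hypothetical and are not taken from the authors' R package.

```python
import numpy as np


def heterogeneity(counts):
    """KL divergence (in nats) of a gene's transcript distribution
    across cells from the uniform distribution (assumed definition)."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total == 0:
        return 0.0  # no transcripts observed: treat as no information
    p = counts / total          # share of the gene's transcripts in each cell
    n = counts.size             # number of cells
    nz = p > 0                  # 0 * log(0) is taken as 0
    return float(np.sum(p[nz] * np.log(p[nz] * n)))


def decompose(counts, labels):
    """Split heterogeneity into (inter-cluster, intra-cluster) parts
    for a given assignment of cells to clusters."""
    counts = np.asarray(counts, dtype=float)
    labels = np.asarray(labels)
    total = counts.sum()
    if total == 0:
        return 0.0, 0.0
    n = counts.size
    inter, intra = 0.0, 0.0
    for k in np.unique(labels):
        mask = labels == k
        y = counts[mask].sum() / total   # share of transcripts in cluster k
        q = mask.sum() / n               # share of cells in cluster k
        if y > 0:
            inter += y * np.log(y / q)                # between-cluster term
            intra += y * heterogeneity(counts[mask])  # weighted within-cluster term
    return float(inter), float(intra)


# Toy example: a gene expressed only in the second of two equal-sized clusters.
counts = [0, 0, 0, 0, 5, 5, 5, 5]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
inter, intra = decompose(counts, labels)
print(heterogeneity(counts), inter, intra)
```

In this toy case all of the heterogeneity is explained by the clustering, so the intra-cluster term vanishes. For any labelling, the two terms sum to the total, which is what would make such a quantity usable as a clustering objective.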
Pages: 24
Related papers
50 in total
  • [1] An information-theoretic approach to single cell sequencing analysis
    Michael J. Casey
    Jörg Fliege
    Rubén J. Sánchez-García
    Ben D. MacArthur
    BMC Bioinformatics, 24
  • [2] Information-theoretic analysis of multivariate single-cell signaling responses
    Jetka, Tomasz
    Nienaltowski, Karol
    Winarski, Tomasz
    Blonski, Slawomir
    Komorowski, Michal
    PLOS COMPUTATIONAL BIOLOGY, 2019, 15 (07)
  • [3] An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data
    Garrett Jenkinson
    Jordi Abante
    Andrew P. Feinberg
    John Goutsias
    BMC Bioinformatics, 19
  • [4] An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data
    Jenkinson, Garrett
    Abante, Jordi
    Feinberg, Andrew P.
    Goutsias, John
    BMC BIOINFORMATICS, 2018, 19
  • [5] Information-Theoretic Approach to Optimal Differential Fault Analysis
    Sakiyama, Kazuo
    Li, Yang
    Iwamoto, Mitsugu
    Ohta, Kazuo
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2012, 7 (01) : 109 - 120
  • [6] Models and information-theoretic bounds for nanopore sequencing
    Mao, Wei
    Diggavi, Suhas
    Kannan, Sreeram
    2017 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2017, : 2458 - 2462
  • [7] An information-theoretic approach to the analysis of location and colocation patterns
    van Dam, Alje
    Gomez-Lievano, Andres
    Neffke, Frank
    Frenken, Koen
    JOURNAL OF REGIONAL SCIENCE, 2023, 63 (01) : 173 - 213
  • [8] Models and Information-Theoretic Bounds for Nanopore Sequencing
    Mao, Wei
    Diggavi, Suhas N.
    Kannan, Sreeram
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2018, 64 (04) : 3216 - 3236
  • [9] Information-theoretic analysis of information hiding
    Moulin, P
    O'Sullivan, JA
    2000 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY, PROCEEDINGS, 2000, : 19 - 19
  • [10] Information-theoretic analysis of information hiding
    Moulin, P
    O'Sullivan, JA
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2003, 49 (03) : 563 - 593