Deep generative models for T cell receptor protein sequences

Cited by: 49
Authors
Davidsen, Kristian [1, 2]
Olson, Branden J. [1, 2]
DeWitt, William S., III [1, 2]
Feng, Jean [1, 2]
Harkins, Elias [1, 2]
Bradley, Philip [1, 2]
Matsen, Frederick A. [1, 2]
Affiliations
[1] University of Washington, Seattle, WA 98195 USA
[2] Fred Hutchinson Cancer Research Center, 1124 Columbia St, Seattle, WA 98104 USA
Funding
National Institutes of Health (USA)
DOI
10.7554/eLife.46935
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Probabilistic models of adaptive immune repertoire sequence distributions can be used to infer the expansion of immune cells in response to stimulus, differentiate genetic from environmental factors that determine repertoire sharing, and evaluate the suitability of various target immune sequences for stimulation via vaccination. Classically, these models are defined in terms of a probabilistic V(D)J recombination model which is sometimes combined with a selection model. In this paper we take a different approach, fitting variational autoencoder (VAE) models parameterized by deep neural networks to T cell receptor (TCR) repertoires. We show that simple VAE models can perform accurate cohort frequency estimation, learn the rules of VDJ recombination, and generalize well to unseen sequences. Further, we demonstrate that VAE-like models can distinguish between real sequences and sequences generated according to a recombination-selection model, and that many characteristics of VAE-generated sequences are similar to those of real sequences.
Pages: 18