In vitro, long-range sequence information for de novo genome assembly via transposase contiguity

Cited by: 128
Authors
Adey, Andrew [1]
Kitzman, Jacob O. [1]
Burton, Joshua N. [1]
Daza, Riza [1]
Kumar, Akash [1]
Christiansen, Lena [2]
Ronaghi, Mostafa [2]
Amini, Sasan [2]
Gunderson, Kevin L. [2]
Steemers, Frank J. [2]
Shendure, Jay [1]
Affiliations
[1] Univ Washington, Dept Genome Sci, Seattle, WA 98115 USA
[2] Illumina Inc, Adv Res Grp, San Diego, CA 92122 USA
Funding
National Science Foundation (USA)
Keywords
LOW-INPUT; CONSTRUCTION; CHROMATIN; DATABASE; READS;
DOI
10.1101/gr.178319.114
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
We describe a method that exploits contiguity preserving transposase sequencing (CPT-seq) to facilitate the scaffolding of de novo genome assemblies. CPT-seq is an entirely in vitro means of generating libraries comprised of 9216 indexed pools, each of which contains thousands of sparsely sequenced long fragments ranging from 5 kilobases to >1 megabase. These pools are "subhaploid," in that the lengths of fragments contained in each pool sums to ~5% to 10% of the full genome. The scaffolding approach described here, termed fragScaff, leverages coincidences between the content of different pools as a source of contiguity information. Specifically, CPT-seq data is mapped to a de novo genome assembly, followed by the identification of pairs of contigs or scaffolds whose ends disproportionately co-occur in the same indexed pools, consistent with true adjacency in the genome. Such candidate "joins" are used to construct a graph, which is then resolved by a minimum spanning tree. As a proof-of-concept, we apply CPT-seq and fragScaff to substantially boost the contiguity of de novo assemblies of the human, mouse, and fly genomes, increasing the scaffold N50 of de novo assemblies by eight- to 57-fold with high accuracy. We also demonstrate that fragScaff is complementary to Hi-C-based contact probability maps, providing midrange contiguity to support robust, accurate chromosome-scale de novo genome assemblies without the need for laborious in vivo cloning steps. Finally, we demonstrate CPT-seq as a means of anchoring unplaced novel human contigs to the reference genome as well as for detecting misassembled sequences.
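The join-and-resolve idea in the abstract can be illustrated with a small sketch. This is not the fragScaff implementation: the function names, the `(contig, side)` keying, the `min_shared` threshold, the expected-overlap scoring, and the toy pool sets are all assumptions made for illustration. It scores pairs of contig ends by how many indexed pools they share relative to chance, then keeps a minimum spanning forest of the resulting candidate joins using Kruskal's algorithm.

```python
from itertools import combinations


def candidate_joins(pools_by_end, n_pools, min_shared=2):
    """Return (weight, end_a, end_b) for contig ends sharing more pools than chance.

    Hypothetical scoring: weight = expected / observed shared pools,
    so a lower weight means a stronger candidate join.
    """
    joins = []
    for (end_a, pools_a), (end_b, pools_b) in combinations(sorted(pools_by_end.items()), 2):
        if end_a[0] == end_b[0]:
            continue  # skip the two ends of the same contig
        shared = len(pools_a & pools_b)
        expected = len(pools_a) * len(pools_b) / n_pools  # assumes independent pools
        if shared >= min_shared and shared > expected:
            joins.append((expected / shared, end_a, end_b))
    return joins


def minimum_spanning_forest(joins):
    """Kruskal's algorithm with a union-find over contig ends."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kept = []
    for weight, a, b in sorted(joins):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # edge connects two separate components
            parent[root_a] = root_b
            kept.append((a, b))
    return kept


# Toy input (invented): contig ends keyed by (contig, side); values are pool indices.
pools_by_end = {
    ("c1", "R"): {1, 2, 3, 4},
    ("c2", "L"): {2, 3, 4, 5},
    ("c2", "R"): {10, 11, 12},
    ("c3", "L"): {10, 11, 12, 13},
    ("c3", "R"): {20, 21},
}
joins = candidate_joins(pools_by_end, n_pools=9216)
edges = minimum_spanning_forest(joins)
```

In this toy case the retained edges link the right end of `c1` to the left end of `c2`, and the right end of `c2` to the left end of `c3`, which would order the three contigs as c1, c2, c3. A real scaffolder would also need to handle orientation, repeats, and conflicting joins, which this sketch ignores.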
Pages: 2041-2049
Page count: 9