Predicting genome-wide redundancy using machine learning
Cited by: 7
Authors:
Chen, Huang-Wen [2]
Bandyopadhyay, Sunayan [2,3]
Shasha, Dennis E. [2]
Birnbaum, Kenneth D. [1]
Affiliations:
[1] NYU, Dept Biol, Ctr Genom & Syst Biol, New York, NY 10003 USA
[2] NYU, Dept Comp Sci, Courant Inst Math Sci, New York, NY 10003 USA
[3] Univ Minnesota Twin Cities, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA
Background: Gene duplication can lead to genetic redundancy, which masks the function of mutated genes in genetic analyses. Methods that identify genetic redundancy with greater sensitivity can improve the efficiency of reverse genetics and lend insight into the evolutionary outcomes of gene duplication. Machine learning techniques are well suited to classifying gene family members into redundant and non-redundant gene pairs in model species where sufficient genetic and genomic data are available, such as Arabidopsis thaliana, the test case used here.
Results: Machine learning techniques that combine multiple attributes led to a dramatic improvement in predicting genetic redundancy over single-attribute classifiers, such as BLAST E-values or expression correlation. In withholding analysis, one of the methods used here, Support Vector Machines, was two-fold more precise than single-attribute classifiers, reaching a level where the majority of redundant calls were correctly labeled. With this higher confidence in identifying redundancy, machine learning predicts that about half of all genes in Arabidopsis show the signature of redundancy with at least one, but typically fewer than three, other family members. Interestingly, a large proportion of predicted redundant gene pairs were relatively old duplications (e.g., Ks > 1), suggesting that redundancy is stable over long evolutionary periods.
Conclusions: Machine learning predicts that most genes will have a functionally redundant paralog but will exhibit redundancy with relatively few genes within a family. The predictions and gene pair attributes for Arabidopsis provide a new resource for research in genetics and genome evolution. These techniques can now be applied to other organisms.
Lee, S. (Mem Sloan Kettering Canc Ctr, Dept Med Phys, New York, NY 10021 USA)
Oh, J. (Mem Sloan Kettering Canc Ctr, Dept Med Phys, New York, NY 10021 USA)
Kerns, S. (Univ Rochester, Med Ctr, Dept Radiat Oncol, Rochester, NY 14627 USA)
Rosenstein, B. (Icahn Sch Med Mt Sinai, Dept Radiat Oncol & Genet & Genom Sci, New York, NY 10029 USA)
Ostrer, H. (Albert Einstein Coll Med, Dept Pathol, New York, NY USA; Albert Einstein Coll Med, Dept Pediat, New York, NY USA)
Deasy, J. (Mem Sloan Kettering Canc Ctr, Dept Med Phys, New York, NY 10021 USA)