Extraction of gene/protein names involved in each stage of spermatogenesis based on literature mining

Authors
Zhu, Jun [1 ,3 ]
Yin, Jianping [1 ]
Zhao, Zhiheng [1 ]
Zhu, En [1 ]
Ban, Rongjun [2 ]
Source
Zhu, J. (cqzhujun@126.com) | Science Press, vol. 51
Keywords
Classification (of information) - Text processing - Extraction - Statistical tests
DOI
10.7544/issn1000-1239.2014.20121057
Abstract
Spermatogenesis is an important biological process in the life of male mammals and has a profound effect on mammalian reproduction. Abnormal spermatogenesis is a major cause of male infertility; however, treatments for it are limited. Characterizing the genes/proteins involved in spermatogenesis is fundamental to understanding the mechanisms underlying this process and to developing treatments for disorders of spermatogenesis. However, most of the crucial information on spermatogenesis-related genes/proteins is scattered across a vast number of research articles, so manually curating these genes/proteins is time-consuming. In this paper, a novel literature-mining strategy is proposed to automatically extract the names of spermatogenesis-related genes/proteins that function in the different stages of spermatogenesis. First, it compares the performance of three different algorithms on different terms and applies an SVM classifier, trained on a manually prepared dataset, to classify spermatogenesis-related texts into three classes corresponding to the three stages of spermatogenesis. Then, integrating expert knowledge and grammar rules, it recognizes and extracts, with high confidence, the gene/protein names associated with each stage. Finally, a manually curated test dataset is used to evaluate the strategy, which achieves an accuracy of 71.9%, verifying the reliability of the proposed method and demonstrating its practical value.
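The abstract describes the two-step pipeline (stage classification, then name extraction) only at a high level. The Python sketch below illustrates the shape of such a pipeline under stated assumptions; it is not the authors' implementation. Assumptions: the stage labels "mitosis", "meiosis" and "spermiogenesis" are assumed names for the three stages; the SVM is a minimal linear SVM (L2 penalty plus hinge loss, trained by plain subgradient descent, extended to three classes by one-vs-rest) swapped in for whatever SVM package the paper used; the training sentences are invented toy examples, not the paper's dataset; and the "capitalised token containing a digit" rule and the small lexicon are hypothetical stand-ins for the paper's grammar rules and expert knowledge.

```python
import re
from collections import Counter

# Assumed labels for the three stages; the abstract does not name them.
STAGES = ["mitosis", "meiosis", "spermiogenesis"]

# Invented toy sentences (NOT data from the paper), one stage label each.
TOY_TRAIN = [
    ("spermatogonia proliferate by mitosis and self-renew in the niche", "mitosis"),
    ("spermatogonial stem cells undergo mitotic division and self-renewal", "mitosis"),
    ("spermatocytes enter meiosis with synapsis and crossover", "meiosis"),
    ("homologous chromosomes pair during meiotic prophase synapsis", "meiosis"),
    ("round spermatids elongate and form the acrosome and flagellum", "spermiogenesis"),
    ("spermatid chromatin condenses as protamines replace histones", "spermiogenesis"),
]


def tokenize(text):
    """Lowercase bag-of-words tokenizer (alphanumeric runs only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(texts):
    """Map every training token to a feature index."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def featurize(text, vocab):
    """Sparse term-frequency vector {feature_index: count}; unknown words are dropped."""
    counts = Counter(tok for tok in tokenize(text) if tok in vocab)
    return {vocab[tok]: float(c) for tok, c in counts.items()}


def train_binary_svm(xs, ys, dim, lam=0.01, eta=0.1, epochs=30):
    """Linear SVM without bias: subgradient descent on lam/2*||w||^2 + hinge loss."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * sum(w[i] * v for i, v in x.items())
            w = [(1.0 - eta * lam) * wi for wi in w]  # step on the L2 penalty
            if margin < 1.0:  # hinge loss active: push w toward the correct side
                for i, v in x.items():
                    w[i] += eta * y * v
    return w


def train_one_vs_rest(texts, labels):
    """Train one binary SVM per stage (that stage vs. the other two)."""
    vocab = build_vocab(texts)
    xs = [featurize(t, vocab) for t in texts]
    models = {}
    for stage in STAGES:
        ys = [1.0 if lab == stage else -1.0 for lab in labels]
        models[stage] = train_binary_svm(xs, ys, len(vocab))
    return vocab, models


def predict_stage(text, vocab, models):
    """Return the stage whose linear score is highest."""
    x = featurize(text, vocab)
    scores = {s: sum(w[i] * v for i, v in x.items()) for s, w in models.items()}
    return max(scores, key=scores.get)


# Hypothetical "grammar rule": a capitalised token containing a digit looks like a gene symbol.
SYMBOL_LIKE = re.compile(r"[A-Z][A-Za-z]*\d+[A-Za-z]*")


def extract_gene_candidates(sentence, lexicon):
    """Return candidate gene/protein names (lexicon hit or symbol-like) in order of appearance."""
    found = []
    for tok in re.findall(r"[A-Za-z][A-Za-z0-9]*", sentence):
        if (tok in lexicon or SYMBOL_LIKE.fullmatch(tok)) and tok not in found:
            found.append(tok)
    return found


if __name__ == "__main__":
    texts, labels = zip(*TOY_TRAIN)
    vocab, models = train_one_vs_rest(texts, labels)
    print(predict_stage("synapsis crossover spermatocytes", vocab, models))
    print(extract_gene_candidates(
        "STRA8 and SYCP3 are expressed, and DAZL is required; DNA repair was measured.",
        {"DAZL"},  # hypothetical lexicon standing in for expert knowledge
    ))
```

On the toy corpus, the sentence about synapsis and crossover is assigned to the meiosis class, and the extractor keeps STRA8, SYCP3 and the lexicon entry DAZL while skipping DNA (no digit, not in the lexicon). One-vs-rest with an argmax over the three linear scores is one common way to apply a binary SVM to three classes; the abstract does not say which multiclass scheme the paper used.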
Pages: 1352-1358