Information extraction from semi-structured data in the protein data bank by induction of a data description pattern

Cited by: 0
Authors
Kawaguchi, Y [1]
Kaneta, Y [1]
Ohkawa, T [1]
Nakamura, H [1]
Ito, N [1]
Affiliations
[1] Osaka Univ, Grad Sch Informat Sci & Technol, Osaka, Japan
Keywords
Protein Data Bank; XML; information extraction; description pattern; induction
DOI
Not available
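Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)]
Subject classification code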
Abstract
The Protein Data Bank (PDB) is a primary database that stores three-dimensional protein structure data. This paper proposes a system, the PDB REMARK transcoder, that semi-automatically extracts significant data from REMARK lines, a part of each PDB entry, and transcodes them into XML (eXtensible Markup Language) format. To tolerate gradual variations in REMARK lines, the system induces a description pattern from a sample of protein entries. Tokens (words and phrases) are clustered by the similarity of their token attributes, and their contents are identified by cluster labels. Description patterns are induced using finite-state automata, and iterative structures are nested correspondingly in the XML output. Log files are used to check the confidence of the output XML data. Applying the system to the REMARK lines of 8,906 protein entries demonstrated the effectiveness of the method.
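The pipeline outlined in the abstract (tokenize REMARK lines, label tokens, induce a pattern, emit XML) can be illustrated with a small Python sketch. This is not the authors' implementation: regex-based token classes stand in for similarity clustering, the most frequent class sequence stands in for an induced finite-state automaton, and the sample lines, function names, and XML element names are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def tokenize(line):
    # Drop the "REMARK nnn" prefix; a colon becomes its own token.
    body = re.sub(r"^REMARK\s+\d+\s*", "", line)
    return re.findall(r"[^\s:]+|:", body)


def token_class(token):
    # Coarse attribute class (assumption: a stand-in for token clustering).
    if token == ":":
        return "SEP"
    if re.fullmatch(r"[+-]?\d+(\.\d+)?", token):
        return "NUM"
    return "WORD"


def signature(tokens):
    # Collapse runs of the same class, so "WORD WORD WORD" becomes "WORD".
    collapsed = []
    for cls in map(token_class, tokens):
        if not collapsed or collapsed[-1] != cls:
            collapsed.append(cls)
    return tuple(collapsed)


def induce_pattern(lines):
    # Assumption: the most frequent signature serves as the description pattern.
    counts = Counter(signature(tokenize(line)) for line in lines)
    return counts.most_common(1)[0][0]


def to_xml(lines, pattern):
    root = ET.Element("remark")
    for line in lines:
        tokens = tokenize(line)
        if signature(tokens) != pattern or ":" not in tokens:
            # Lines that do not fit the pattern are kept for manual review.
            ET.SubElement(root, "unparsed").text = line
            continue
        i = tokens.index(":")
        item = ET.SubElement(root, "item", {"name": " ".join(tokens[:i])})
        item.text = " ".join(tokens[i + 1:])
    return root


# Hypothetical sample lines in the style of PDB REMARK records.
lines = [
    "REMARK   3   RESOLUTION RANGE HIGH : 2.00",
    "REMARK   3   RESOLUTION RANGE LOW : 20.00",
    "REMARK   3   NUMBER OF REFLECTIONS : 12345",
    "REMARK   3   REFINEMENT.",
]
pattern = induce_pattern(lines)
root = to_xml(lines, pattern)
print(ET.tostring(root, encoding="unicode"))
```

Lines that match the induced pattern become `item` elements with the label as an attribute; the one line that does not match is preserved in an `unparsed` element, loosely mirroring the role the abstract assigns to log files.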
Pages: 94-99
Number of pages: 6