Information extraction from semi-structured data in the protein data bank by induction of a data description pattern

Times cited: 0
Authors
Kawaguchi, Y [1 ]
Kaneta, Y [1 ]
Ohkawa, T [1 ]
Nakamura, H [1 ]
Ito, N [1 ]
Affiliation
[1] Osaka Univ, Grad Sch Informat Sci & Technol, Osaka, Japan
Keywords
Protein Data Bank; XML; information extraction; description pattern; induction
DOI
Not available
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)]
Discipline classification code
Abstract
PDB (Protein Data Bank) is a primary database that stores three-dimensional structural data for proteins. This paper proposes a system, the PDB REMARK transcoder, that semi-automatically extracts significant data from REMARK lines (one part of the PDB data) and transcodes them into XML (eXtensible Markup Language) format. To accommodate gradual variations in how REMARK lines are written, the system induces a description pattern from a sample of protein entries. Tokens (words and phrases) are clustered by evaluating their similarity using token attributes, and their contents are recognized by cluster labels. Description patterns are induced using finite-state automata, and iterative structures are then nested correspondingly in the XML output. The reliability of the output XML data is verified with log files. Applying the system to the REMARK lines of 8,906 protein entries demonstrated the effectiveness of the method.
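The abstract gives only an outline of the method. As a rough illustration of the general idea, the Python sketch below does three things: it labels the tokens of REMARK-style lines, induces a finite-state automaton from the label sequences of a small sample, and nests accepted lines under a shared XML element. Rejected lines are collected for review.

This is not the authors' implementation, and several parts are assumptions:

- The fixed rule-based `token_class` function is a hypothetical stand-in for the paper's similarity-based token clustering.
- The automaton is a plain prefix-tree acceptor, the usual starting point for automaton induction, with no state merging.
- The sample lines, XML tag names, and helper functions (`merge_words`, `label_line`, `induce`, `accepts`, `to_xml`) are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical layout assumption: "REMARK <number> <body>", tokens split on whitespace.
HEADER = re.compile(r"REMARK\s+(\d+)\s+(.*)")


def token_class(tok):
    """Hypothetical rule-based token label (a stand-in for similarity clustering)."""
    if re.fullmatch(r"-?\d+(\.\d+)?", tok):
        return "NUM"
    if tok == ":":
        return "SEP"
    if tok.startswith("(") and tok.endswith(")"):
        return "UNIT"
    return "WORD"


def merge_words(tokens):
    """Join runs of WORD tokens into one phrase, e.g. 'NUMBER OF CRYSTALS USED'."""
    merged = []
    for tok in tokens:
        if merged and token_class(tok) == "WORD" and token_class(merged[-1]) == "WORD":
            merged[-1] += " " + tok
        else:
            merged.append(tok)
    return merged


def label_line(line):
    """Return (remark number, tokens, labels), or None if the line is not a REMARK line."""
    m = HEADER.match(line)
    if not m:
        return None
    tokens = merge_words(m.group(2).split())
    return m.group(1), tokens, [token_class(t) for t in tokens]


def induce(lines):
    """Build a prefix-tree acceptor that accepts exactly the sample label sequences."""
    transitions = [{}]  # transitions[state][label] -> next state; state 0 is the start
    accepting = set()
    for line in lines:
        parsed = label_line(line)
        if parsed is None:
            continue
        state = 0
        for label in parsed[2]:
            if label not in transitions[state]:
                transitions.append({})
                transitions[state][label] = len(transitions) - 1
            state = transitions[state][label]
        accepting.add(state)
    return transitions, accepting


def accepts(automaton, labels):
    """Run the deterministic automaton over a label sequence."""
    transitions, accepting = automaton
    state = 0
    for label in labels:
        if label not in transitions[state]:
            return False
        state = transitions[state][label]
    return state in accepting


def to_xml(lines, automaton):
    """Nest accepted lines under one <remark> element per REMARK number."""
    root = ET.Element("remarks")
    groups = {}
    rejected = []  # lines the pattern does not accept, kept for review like a log
    for line in lines:
        parsed = label_line(line)
        if parsed is None or not accepts(automaton, parsed[2]):
            rejected.append(line)
            continue
        num, tokens, labels = parsed
        if num not in groups:
            groups[num] = ET.SubElement(root, "remark", num=num)
        item = ET.SubElement(groups[num], "item")
        for tok, label in zip(tokens, labels):
            if label == "WORD":
                item.set("key", tok)
            elif label == "UNIT":
                item.set("unit", tok[1:-1])
            elif label == "NUM":
                item.text = tok
    return root, rejected


# Hypothetical sample lines written in the style of PDB REMARK records.
sample = [
    "REMARK 200  TEMPERATURE           (KELVIN) : 100",
    "REMARK 200  PH                             : 7.5",
    "REMARK 200  NUMBER OF CRYSTALS USED        : 1",
]
automaton = induce(sample)
root, rejected = to_xml(sample + ["REMARK 200  NOTE FREE TEXT"], automaton)
print(ET.tostring(root, encoding="unicode"))
print(rejected)
```

A prefix-tree acceptor accepts only the label sequences seen in the sample. A line with a new layout is therefore rejected rather than guessed at. A fuller induction procedure would merge states so that the pattern generalizes to unseen but related variations.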
Pages: 94-99
Number of pages: 6