Information extraction from semi-structured data in the protein data bank by induction of a data description pattern

Cited by: 0
Authors
Kawaguchi, Y [1]
Kaneta, Y [1]
Ohkawa, T [1]
Nakamura, H [1]
Ito, N [1]
Affiliation
[1] Osaka Univ, Grad Sch Informat Sci & Technol, Osaka, Japan
Keywords
Protein Data Bank; XML; information extraction; description pattern; induction
DOI
Not available
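Chinese Library Classification (CLC) number
R19 [Health Organization and Services (Health Services Management)]
Subject classification number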
Abstract
PDB (Protein Data Bank) is a primary database that stores three-dimensional structural data of proteins. This paper proposes a system, the PDB REMARK transcoder, that semi-automatically extracts significant data from REMARK lines, one part of a PDB entry, and transcodes them into XML (eXtensible Markup Language) format. To accommodate gradual variations in how REMARK lines are written, the system induces a description pattern from a set of protein entries. Tokens (words and phrases) are clustered by evaluating their similarity on token attributes, and their contents are recognized by cluster labels. Description patterns are induced as finite-state automata, and iterative structures are then nested correspondingly in the XML output. The reliability of the output XML data can be verified through log files. Applying the system to the REMARK lines of 8,906 protein entries demonstrated the effectiveness of the method.
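The pipeline summarized in the abstract (label tokens, induce a pattern as a finite-state automaton, nest iterative structures in XML, log what cannot be handled) can be illustrated with a minimal Python sketch. This is not the PDB REMARK transcoder. It makes several simplifying assumptions: the fixed-rule labeler `token_label` replaces the paper's similarity-based clustering, the automaton is a prefix tree over label sequences in which runs of identical labels are collapsed (a crude stand-in for real automaton induction), and all function names (`token_label`, `runs`, `induce_pattern`, `accepts`, `to_xml`, `transcode`) and the sample line format are hypothetical.

```python
import logging
import re
import xml.etree.ElementTree as ET
from itertools import groupby


def token_label(token):
    """Hypothetical fixed rules standing in for attribute-based token clustering."""
    if re.fullmatch(r"[-+]?\d+(?:\.\d+)?", token):
        return "NUM"
    if token.endswith(":"):
        return "KEY"
    if token.isalpha():
        return "WORD"
    return "OTHER"


def runs(tokens):
    """Group consecutive tokens that share a label; each run is one iterative unit."""
    return [(label, list(group)) for label, group in groupby(tokens, key=token_label)]


def induce_pattern(sample_lines):
    """Build a prefix-tree automaton over collapsed label sequences.

    Collapsing a run of identical labels into one symbol behaves like a
    self-loop, so lines that differ only in how many values they list
    follow the same path (a simplification, not the paper's algorithm).
    """
    transitions, accepting, next_state = {}, set(), 1  # state 0 is the start
    for line in sample_lines:
        state = 0
        for label, _ in runs(line.split()):
            if (state, label) not in transitions:
                transitions[(state, label)] = next_state
                next_state += 1
            state = transitions[(state, label)]
        accepting.add(state)
    return transitions, accepting


def accepts(pattern, line):
    """Run the collapsed label sequence of `line` through the automaton."""
    transitions, accepting = pattern
    state = 0
    for label, _ in runs(line.split()):
        state = transitions.get((state, label))
        if state is None:
            return False
    return state in accepting


def to_xml(line):
    """Transcode one line; runs of more than one token become nested lists."""
    root = ET.Element("remark")
    for label, tokens in runs(line.split()):
        tag = label.lower()
        if len(tokens) == 1:
            ET.SubElement(root, tag).text = tokens[0]
        else:
            group = ET.SubElement(root, tag + "_list")
            for tok in tokens:
                ET.SubElement(group, tag).text = tok
    return ET.tostring(root, encoding="unicode")


def transcode(pattern, lines):
    """Transcode accepted lines and log rejected ones for manual review."""
    out = []
    for line in lines:
        if accepts(pattern, line):
            out.append(to_xml(line))
        else:
            logging.warning("rejected: %s", line)
    return out


if __name__ == "__main__":
    pattern = induce_pattern(["MISSING RESIDUES: 12 13 14", "MISSING RESIDUES: 7"])
    print(accepts(pattern, "MISSING RESIDUES: 3 4"))  # True
    print(accepts(pattern, "MISSING RESIDUES: ALA"))  # False
    print(to_xml("MISSING RESIDUES: 12 13 14"))
```

Because the two sample lines differ only in how many numbers they list, both collapse to the label path WORD, KEY, NUM, so the induced pattern also accepts a line with any other count of numbers. Those numbers are emitted as a nested `num_list` element, and a line that does not fit the pattern is written to the log instead of being transcoded.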
Pages: 94-99
Number of pages: 6
Related papers
50 in total
  • [1] Analyzing semi-structured data for ontological information extraction
    Han, H
    Elmasri, R
    IC'2001: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INTERNET COMPUTING, VOLS I AND II, 2001, : 21 - 27
  • [2] WICCAO: From semi-structured data to structured data
    Li, Z
    Ng, WK
    11TH IEEE INTERNATIONAL CONFERENCE AND WORKSHOP ON THE ENGINEERING OF COMPUTER-BASED SYSTEMS, PROCEEDINGS, 2004, : 86 - 93
  • [3] Web Service for Data Extraction from Semi-structured Data Sources
    Yashina, Marina V.
    Nakonechnyy, Ivan I.
    PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON DEPENDABILITY AND COMPLEX SYSTEMS DEPCOS-RELCOMEX, 2014, 286 : 499 - 510
  • [4] Exploratory Visual Analysis and Interactive Pattern Extraction from Semi-Structured Data
    Soto, Axel J.
    Kiros, Ryan
    Keselj, Vlado
    Milios, Evangelos
    ACM TRANSACTIONS ON INTERACTIVE INTELLIGENT SYSTEMS, 2015, 5 (03)
  • [5] Interactive tuples extraction from semi-structured data
    Gilleron, Remi
    Marty, Patrick
    Tommasi, Marc
    Torre, Fabien
    2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, (WI 2006 MAIN CONFERENCE PROCEEDINGS), 2006, : 997 - 1004
  • [6] Interactive Data Extraction from Semi-Structured Text
    Broman, Per
    Thalheim, Bernhard
    INFORMATION MODELLING AND KNOWLEDGE BASES XXIII, 2012, 237 : 1 - 19
  • [7] Information extraction from Web pages using semi-structured data alignment
    Kuboyama, Tetsuji
    Miyahara, Tetsuhiro
    Hirokawa, Sachio
    Itou, Eisuke
    WMSCI 2005: 9th World Multi-Conference on Systemics, Cybernetics and Informatics, Vol 1, 2005, : 42 - 47
  • [8] List data extraction in semi-structured document
    Xu, H
    Li, JZ
    Xu, P
    WEB INFORMATION SYSTEMS ENGINEERING - WISE 2005, 2005, 3806 : 584 - 585
  • [9] Data extraction from semi-structured web pages by clustering
    Vuong, Le Phong Bao
    Gao, Xiaoying
    Zhang, Mengjie
    2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, (WI 2006 MAIN CONFERENCE PROCEEDINGS), 2006, : 374 - +
  • [10] Data Integration Approach for Semi-structured and Structured Data (Linked Data)
    Kettouch, Mohamed Salah
    Luca, Cristina
    Hobbs, Mike
    Fatima, Arooj
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), 2015, : 820 - 825