Design and Implement of Information Extraction System Based on XML

Cited by: 0
Authors
Xuan, Yanyan [1 ]
Hu, Yan [1 ]
Affiliation
[1] Wuhan Univ Technol, Dept Comp Sci & Technol, Wuhan 430070, Peoples R China
Keywords
Information Extraction; XML; XPath; XSLT; Extraction Rule
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
By studying the structure of HTML documents, this paper addresses the problem of Web information extraction using standard XML technology and proposes an XML-based information extraction method. First, HTML pages are analyzed and an HTML DOM tree is constructed to clean the pages and generate XHTML documents. The XHTML files are then analyzed with the DOM methods of Xerces-J, and an XPath generation algorithm is constructed. Finally, the strengths of XSLT and XPath in data location and conversion are used to automatically learn and generate information extraction rules, and Web information extraction is performed according to the generated XPath.
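The step from a sample node to a reusable XPath can be illustrated with a minimal sketch. It is not the paper's algorithm, which the abstract does not spell out, and it uses Python's standard `xml.etree.ElementTree` rather than Xerces-J. The sample page, the function name `build_xpath`, and the assumption that the XHTML is already cleaned and carries no namespace are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical cleaned XHTML page (no namespace, well-formed).
XHTML = """<html><body>
<div><p>Header</p></div>
<div>
  <table>
    <tr><td>Title</td><td>Price</td></tr>
    <tr><td>XML Basics</td><td>30</td></tr>
  </table>
</div>
</body></html>"""


def build_xpath(root, target):
    """Return a positional path from root to target, e.g. ./body[1]/div[2]."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        # 1-based position among siblings that share the same tag.
        same_tag = [c for c in parent if c.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    return "./" + "/".join(reversed(steps)) if steps else "."


root = ET.fromstring(XHTML)
# A user-marked sample value stands in for the learning step.
sample = next(td for td in root.iter("td") if td.text == "XML Basics")
rule = build_xpath(root, sample)

# Apply the generated rule as an extraction rule.
extracted = root.find(rule).text
print(rule, "->", extracted)
```

A positional path like this is brittle when the page layout changes, which is one reason a practical system might generalize the rule or express it as an XSLT template instead.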
Pages: 1400-1404
Page count: 5