A Hybrid Method for Extracting Deep Web Information

Authors
Zhang, Yuanpeng [1]
Wang, Li [1]
Jiang, Kui [1]
Qian, Danmin [1]
Dong, Jiancheng [1]
Affiliation
[1] Nantong Univ, Sch Med, Dept Med Informat, Nantong 226001, Jiangsu, Peoples R China
Keywords
information extraction; clinic expert information; domain model; block importance model; SVM
DOI
Not available
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Previous studies show that more than 60% of the information available on the Web resides in Deep Web databases, which search engines cannot index directly. In this paper, a hybrid method composed of a domain model and a block importance model is proposed to extract Deep Web information. The domain model classifies forms and identifies whether a form is a WQI (Web query interface). The block importance model filters noisy information out of response pages. Both models are compared with rule-based baselines. The experimental results indicate that the domain model yields a precision 6.44% higher than that of the rule-based method, whereas the block importance model yields an F1 measure 10.5% higher than that of the XPath method.
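The comparison above is stated in terms of precision and F1. As a rough illustration of what those measures capture for a noise filter, here is a minimal Python sketch of the conventional definitions. The function name `precision_recall_f1` and the block IDs are hypothetical; the record does not describe the paper's evaluation protocol, nor whether the reported gains are absolute percentage points or relative improvements.

```python
def precision_recall_f1(predicted: set, relevant: set) -> tuple:
    """Return (precision, recall, F1) of predicted items against ground truth.

    Conventional set-based definitions; this is an illustrative sketch,
    not the evaluation code used in the paper.
    """
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: blocks of a response page kept by a noise filter
# versus the blocks an annotator marked as informative.
kept_blocks = {"b1", "b2", "b3", "b5"}
informative_blocks = {"b1", "b2", "b4", "b5"}

p, r, f = precision_recall_f1(kept_blocks, informative_blocks)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Here three of the four kept blocks are informative and three of the four informative blocks are kept, so precision, recall, and F1 are all 0.75. F1 is used for the block filter because a filter can trivially reach high precision by discarding almost everything, which recall penalizes.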
Pages: 777-782
Number of pages: 6