A Hybrid Method for Extracting Deep Web Information

Cited by: 0
Authors
Zhang, Yuanpeng [1 ]
Wang, Li [1 ]
Jiang, Kui [1 ]
Qian, Danmin [1 ]
Dong, Jiancheng [1 ]
Affiliation
[1] Nantong Univ, Sch Med, Dept Med Informat, Nantong 226001, Jiangsu, Peoples R China
Source
PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON AUTOMATION, MECHANICAL CONTROL AND COMPUTATIONAL ENGINEERING | 2015 / Vol. 124
Keywords
information extraction; clinic expert information; domain model; block importance model; SVM;
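DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract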
Some previous works show that more than 60% of the information available on the Web is located in Deep Web databases. Such information cannot be directly indexed by search engines. In this paper, a hybrid method composed of a domain model and a block importance model is proposed to extract information from the Deep Web. The domain model is used to classify forms and identify whether a form is a WQI. The block importance model is used to filter noisy information in response pages. Both models are compared with a rule-based method. The experimental results indicate that the domain model yields a precision 6.44% higher than that of the rule-based method, whereas the block importance model yields an F1 measure 10.5% higher than that of the XPath method.
Pages: 777-782
Number of pages: 6