A Hybrid Method for Extracting Deep Web Information

Authors
Zhang, Yuanpeng [1]
Wang, Li [1]
Jiang, Kui [1]
Qian, Danmin [1]
Dong, Jiancheng [1]
Affiliations
[1] Nantong Univ, Sch Med, Dept Med Informat, Nantong 226001, Jiangsu, Peoples R China
Keywords
information extraction; clinic expert information; domain model; block importance model; SVM;
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Previous work indicates that more than 60% of the information available on the Web resides in Deep Web databases, where it cannot be directly indexed by search engines. This paper proposes a hybrid method, composed of a domain model and a block importance model, for extracting information from the Deep Web. The domain model classifies forms, identifying whether a given form is a Web query interface (WQI). The block importance model filters noisy information from response pages. Both models are compared with a rule-based method. The experimental results indicate that the domain model yields a precision 6.44% higher than that of the rule-based method, whereas the block importance model yields an F1 measure 10.5% higher than that of the XPath method.
Pages: 777-782
Page count: 6