A Hybrid Method for Extracting Deep Web Information

Cited by: 0
Authors
Zhang, Yuanpeng [1 ]
Wang, Li [1 ]
Jiang, Kui [1 ]
Qian, Danmin [1 ]
Dong, Jiancheng [1 ]
Affiliation
[1] Nantong Univ, Sch Med, Dept Med Informat, Nantong 226001, Jiangsu, Peoples R China
Keywords
information extraction; clinic expert information; domain model; block importance model; SVM;
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Previous work suggests that more than 60% of the information available on the Web resides in Deep Web databases, which search engines cannot index directly. This paper proposes a hybrid method, composed of a domain model and a block importance model, for extracting information from the Deep Web. The domain model classifies forms and identifies whether a given form is a Web query interface (WQI). The block importance model filters noisy information out of response pages. Both models are compared with a rule-based method. The experimental results indicate that the domain model yields a precision 6.44% higher than that of the rule-based method, whereas the block importance model yields an F1 measure 10.5% higher than that of the XPath method.
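The abstract reports its results as precision and F1. As a reminder of how these two measures relate, the following is a minimal sketch using the standard definitions; the function name `precision_recall_f1` and the counts in the usage example are illustrative assumptions and are not taken from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from raw counts.

    tp: items correctly extracted (true positives)
    fp: items extracted but wrong (false positives)
    fn: items that should have been extracted but were missed (false negatives)
    """
    # Guard against division by zero when nothing was extracted or expected.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not results from the paper).
    p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because F1 combines precision and recall, a method can improve its F1 by reducing either wrongly extracted noise (false positives) or missed content (false negatives).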
Pages: 777-782
Number of pages: 6