A Hybrid Method for Extracting Deep Web Information

Cited by: 0

Authors:

Zhang, Yuanpeng [1]

Wang, Li [1]

Jiang, Kui [1]

Qian, Danmin [1]

Dong, Jiancheng [1]

Affiliations:

[1] Nantong Univ, Sch Med, Dept Med Informat, Nantong 226001, Jiangsu, Peoples R China

Source:

PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON AUTOMATION, MECHANICAL CONTROL AND COMPUTATIONAL ENGINEERING | 2015, Vol. 124

Keywords:

information extraction; clinic expert information; domain model; block importance model; SVM;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Some previous works show that more than 60% of the information available on the Web is located in Deep Web databases. Such information cannot be directly indexed by search engines. In this paper, a hybrid method, composed of a domain model and a block importance model, is proposed to extract information from the Deep Web. The domain model is used to classify forms and identify whether a form is a WQI. The block importance model is used to filter noisy information in response pages. Both models are compared with a rule-based method. The experimental results indicate that the domain model yields a precision 6.44% higher than that of the rule-based method, whereas the block importance model yields an F1 measure 10.5% higher than that of the XPath method.


Pages: 777-782

Page count: 6
