A Hybrid Method for Extracting Deep Web Information

Cited by: 0

Authors:

Zhang, Yuanpeng [1]

Wang, Li [1]

Jiang, Kui [1]

Qian, Danmin [1]

Dong, Jiancheng [1]

Affiliation:

[1] Nantong Univ, Sch Med, Dept Med Informat, Nantong 226001, Jiangsu, Peoples R China

Source:

PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON AUTOMATION, MECHANICAL CONTROL AND COMPUTATIONAL ENGINEERING | 2015 / Vol. 124

Keywords:

information extraction; clinic expert information; domain model; block importance model; SVM;

DOI:

Not available

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812 ;

Abstract:

Previous work suggests that more than 60% of the information available on the Web resides in Deep Web databases. Such information cannot be directly indexed by search engines. This paper proposes a hybrid method, composed of a domain model and a block importance model, for extracting information from the Deep Web. The domain model classifies forms and identifies whether a form is a Web query interface (WQI). The block importance model filters noisy information from response pages. Both models are compared with a rule-based method. The experimental results indicate that the domain model yields a precision 6.44% higher than that of the rule-based method, whereas the block importance model yields an F1 measure 10.5% higher than that of the XPath method.


Pages: 777-782

Page count: 6
