A Hybrid Method for Extracting Deep Web Information

Cited by: 0

Authors:

Zhang, Yuanpeng [1]

Wang, Li [1]

Jiang, Kui [1]

Qian, Danmin [1]

Dong, Jiancheng [1]

Affiliation:

[1] Nantong Univ, Sch Med, Dept Med Informat, Nantong 226001, Jiangsu, Peoples R China

Source:

PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON AUTOMATION, MECHANICAL CONTROL AND COMPUTATIONAL ENGINEERING | 2015 / Vol. 124

Keywords:

information extraction; clinic expert information; domain model; block importance model; SVM;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812;

Abstract:

Previous work shows that more than 60% of the information available on the Web is located in Deep Web databases. Such information cannot be directly indexed by search engines. In this paper, a hybrid method composed of a domain model and a block importance model is proposed to extract information from the Deep Web. The domain model is used to classify forms and identify whether a form is a WQI. The block importance model is used to filter noisy information in response pages. Both models are compared with a rule-based method. The experimental results indicate that the domain model yields a precision 6.44% higher than that of the rule-based method, whereas the block importance model yields an F1 measure 10.5% higher than that of the XPath method.
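The abstract reports its results as precision and F1 measure. As a reminder of how these metrics are conventionally defined, here is a minimal Python sketch. The function name `precision_recall_f1` and the counts in the usage example are hypothetical illustrations, not the paper's code or data, and the abstract does not say how the authors counted correct and incorrect extractions.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts.

    tp: items extracted and correct (true positives)
    fp: items extracted but wrong (false positives)
    fn: correct items that were missed (false negatives)
    """
    # Precision: share of extracted items that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of correct items that were extracted.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both high, which is presumably why it is the metric reported for the noise-filtering step.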


Pages: 777-782

Page count: 6
